

Authoring Techniques for XHTML & HTML Internationalization: Handling Bidirectional Text 1.0

W3C Working Draft 9 May 2004

This version:
Latest version:
Previous version:
Editor:
Richard Ishida, W3C


This document provides advice for the use of markup and CSS to create pages for languages that use bidirectional text, such as Arabic and Hebrew. It attempts to counter many of the misunderstandings or over-complexities that currently abound. It also offers advice to those preparing content that will be localized into scripts that behave like Arabic and Hebrew.

This document is one of a series of documents providing HTML authors with techniques for developing internationalized HTML using XHTML 1.0 or HTML 4.01, supported by CSS1, CSS2 and some aspects of CSS3. It focuses specifically on advice about handling bidirectional text. It is produced by the Guidelines, Education & Outreach Task Force (GEO) of the W3C Internationalization Working Group (I18N WG). The GEO Task Force encourages feedback about the content of this document as well as participation in the development of the techniques by people who have experience creating Web content that conforms to internationalization needs.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the First Public Working Draft of a document produced by the GEO (Guidelines, Education & Outreach) Task Force of the W3C Internationalization Working Group (I18N WG). The Internationalization Working Group is part of the W3C Internationalization Activity. This is a draft document that does not fully represent the consensus of the group at this time. The Working Group expects to advance this Working Draft to Working Group Note.

The document provides practical techniques related to the creation of content in scripts such as Arabic and Hebrew, or content in other languages that includes fragments of these languages. It also offers advice that HTML content authors can use to ensure that their HTML is easily adaptable for an international audience. These are techniques that are best addressed from the start of content development if unnecessary costs and resource issues are to be avoided later on.

This document was last published as part of a larger document entitled Authoring Techniques for XHTML & HTML Internationalization 1.0. The material in that document will now be published as a number of smaller independent documents to allow for easier ongoing improvements and updates. The total number of such documents is not fixed, but will grow as material and resources become available. The title of all related documents will begin with "Authoring Techniques for XHTML & HTML Internationalization:..." and they can be found in the W3C technical reports index.

The Task Force encourages feedback about the content of this document as well as participation in the development of the guidelines by people who have experience creating Web content that conforms to internationalization needs. Send comments about this document to www-i18n-comments@w3.org. The archives for this list are publicly available.

The Internationalization Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced under the 24 January 2002 CPP as amended by the W3C Patent Policy Transition Procedure. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy. At the time of publication, the Working Group believed there were no patent disclosures relevant to this specification.

Table of Contents

1 Introduction
    1.1 Who should use this document
    1.2 How to use this document
    1.3 Standards addressed
    1.4 User agents addressed
    1.5 Editorial notes
2 Definitions
3 Enabling easy localization for RTL scripts
4 General use of bidi markup
5 Basic setup for pages in RTL scripts
6 Changing the directionality of a block element
7 Mixing text direction inline
8 Handling parentheses & other mirrored characters
9 Overriding the Unicode bidirectional algorithm


A Acknowledgements
B References

1 Introduction

1.2 How to use this document

If you are new to this topic you may wish to read this document from end to end. It is, however, expected that this document will normally be used for reference purposes - the reader dipping in to a particular section to find out how to perform a specific task with internationalization in mind.

This document is one of several documents relating to the design of XHTML and HTML documents. An overview document is available that summarises all the recommendations of this and its companion documents together, organized according to tasks that a developer of XHTML/HTML content may want to perform. When this material is used as a reference, it is recommended that the overview document is used as a starting point.

Cross references and further resources are summarized at the end of each section.

Editorial notes have been left in this version of the document. These are marked .

For information about the applicability of recommendations to user agents see below.

1.3 Standards addressed

This document provides techniques for developing pages using HTML 4.01, XHTML 1.0 and XHTML 1.1 with CSS1, CSS2 and some parts of CSS3.

Note that XHTML source can be served as XML (using MIME types application/xhtml+xml, application/xml or text/xml) or HTML (using the MIME type text/html).

It is very common for XHTML to be served as HTML, following the compatibility guidelines in Appendix C of the XHTML 1.0 specification. This allows authors with the right editing tools to produce valid XML code, which therefore lends itself to processing with such things as scripting or XSLT, but is also well supported for display by most mainstream browsers. (XHTML served as application/xhtml+xml is not well supported for browser display at the moment.) In this document we wish to reflect practical reality for content authors, so we cover XHTML served as text/html in the techniques.

Indeed we encourage the use of XHTML, and all the examples (unless trying to make a specific point about HTML 4.01) are written in XHTML.

For XHTML served as XML, this document limits its advice to documents served as application/xhtml+xml. Note that user agent support for XHTML served as XML is still patchy.

1.4 User agents addressed

In order to improve the value of this information to the user we try to ground techniques with information about their applicability to particular user agents.

User agents, in this current version, means a number of mainstream browsers. (The scope may grow as resources and test results become available for other user agents.)

In an attempt to make the task of tracking browser applicability manageable, we have chosen a 'base version' for each of the user agents we are tracking for applicability. This base version represents a fairly recent, standards-compliant version of the browser. Where a browser operates in both standards- and quirks-mode, standards-mode is assumed (ie. you should use a DOCTYPE statement).
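
As a minimal sketch of a page that begins with a DOCTYPE statement, consider the following XHTML 1.0 skeleton. The choice of the Strict DTD and the placeholder title and paragraph are assumptions made for illustration only.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Placeholder title, for illustration only -->
  <title>Example page</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```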

The base versions considered for this version of the document include:

If the technique is applicable to a base version of a user agent the name of that user agent will appear immediately below the summary of the technique. If the technique is not applicable, the name will appear crossed out. If the name does not appear at all, this signifies that further investigation is needed. If the technique is applicable to a later version than the chosen base version, this will be indicated by adding the version number to the name.

Detailed information may also be provided from time to time about behavior of a user agent in an earlier version than the base version, or about some particular aspect of the behavior of a base version or later user agent. This is provided in a special boxed section within the body of the text.

2 Definitions

'Bidirectional', or 'bidi', text typically refers to text written using or including a script such as Arabic or Hebrew. In Arabic and Hebrew text the content flows predominantly from right to left, but embedded numbers or text in other scripts (such as Latin script) still run left to right. Text in other languages, such as English, can also be bidirectional if it includes excerpts from languages such as Arabic and Hebrew.

Scripts such as Arabic and Hebrew, which are predominantly right-to-left in orientation, may also be referred to as 'RTL' (right-to-left) scripts.

3 Enabling easy localization for RTL scripts

 IE(Win)  Mozilla  Opera  NNav  Safari  IE(Mac) 

Values of right and left in attributes need to be converted when the document is translated into a language that uses the Arabic or Hebrew script. Using CSS stylesheets to achieve the same effect can save a lot of time and reduce risk. One should expect the stylesheet to be converted as part of the translation process.

Attributes in HTML 4.01 that have values of right and left are align and clear. align is used with the elements hr, div, hx, p, col, colgroup, tbody, td, tfoot, th, thead and tr. clear is used with br.

For example, to right align a paragraph you could use the following CSS rule:


p { text-align: right; }

(Note that this technique does not refer to the values rtl and ltr that are used with the dir attribute.)
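
The contrast can be sketched as follows, assuming a hypothetical class name, sidenote, and placeholder paragraph text. The first paragraph hard-codes the alignment in the markup, so every such attribute has to be found and changed during translation. The second leaves the alignment to a style rule, so only the stylesheet needs to change.

```html
<!-- Alignment in the markup: each attribute value must be flipped by hand -->
<p align="right">Last updated in May.</p>

<!-- Alignment in the stylesheet. This style element belongs in the head. -->
<style type="text/css">
  /* "sidenote" is a hypothetical class name used for this sketch */
  p.sidenote { text-align: right; }
</style>

<!-- In the body: the markup stays the same in every language version -->
<p class="sidenote">Last updated in May.</p>
```

In a version localized for Arabic or Hebrew, only the rule would change, for example to text-align: left.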

4 General use of bidi markup

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

Because directionality is an integral part of the document structure, markup should always be used to set the directionality for a document or chunk of information, or to indicate places in the text where the Unicode bidi algorithm is insufficient to achieve desired directionality.

The CSS2 specification recommends the use of markup for bidi text in HTML. In fact it goes as far as to say that conforming HTML user agents may ignore CSS bidi properties. This is because the HTML specification clearly defines the expected behavior of user agents with respect to the bidi markup.

See CSS vs. markup for bidi support for a fuller explanation.

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

Once you have established the appropriate directionality for the html element you will only need to apply bidi markup to a block element if you want that element's directionality to be different. The same applies for inline markup. Do not use inline bidi markup unless the Unicode bidi algorithm is insufficient on its own.

The following Arabic example shows bad usage. None of the dir attributes would be needed if dir="rtl" were added to the html element. Removing them would significantly simplify the document and reduce bandwidth, which may be an important consideration in countries where Arabic is spoken.


Bad practice. Do not copy!

<h2 dir="rtl">القاموس</h2>
<dl>
<dt dir="rtl">المنالية</dt>
<dd dir="rtl">سهولة منال للويب من قبل الجميع بصرف النّظر عن إعاقةهم . </dd>
<dt dir="rtl">برنامج التصديق</dt>
<dd dir="rtl">
أو "الفاليديتور" أداة للتّحقّق من صلاحيّة صفحة ويب. على سبيل المثال، للتّحقّق من صلاحيّة
<span dir="ltr">HTML</span> ، يمكن أن تستخدم برنامج تصديق
<span dir="ltr">W3C</span>
</dd>
<dt dir="rtl">التّدويل</dt>
<dd dir="rtl">
تدويل الويب يسمح و يجعله سهل لاستخدام موقعك باللّغات و السّيناريوهات و الثّقافات المختلفة.
</dd>
</dl>
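
For comparison, here is a sketch of the same glossary with the directionality declared once, on the html element. The surrounding html, head and body tags, the language attributes and the title text are assumptions added to make the fragment self-contained, and only two of the glossary entries are repeated.

```html
<html xmlns="http://www.w3.org/1999/xhtml" dir="rtl" xml:lang="ar" lang="ar">
<head>
  <title>القاموس</title>
</head>
<body>
  <!-- No dir attributes below: everything inherits rtl from the html element -->
  <h2>القاموس</h2>
  <dl>
    <dt>المنالية</dt>
    <dd>سهولة منال للويب من قبل الجميع بصرف النّظر عن إعاقةهم.</dd>
    <dt>التّدويل</dt>
    <dd>تدويل الويب يسمح و يجعله سهل لاستخدام موقعك باللّغات و السّيناريوهات و الثّقافات المختلفة.</dd>
  </dl>
</body>
</html>
```

Latin-script words such as HTML or W3C inside the Arabic text need no span element either, since the bidirectional algorithm already runs them left to right.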



The Unicode Bidirectional Algorithm is applied to text that is stored in logical order, and determines the appropriate display direction of a sequence of characters. It does this on the basis of semantics associated with those characters by the Unicode Standard.


The following Arabic text contains the number 1996 that runs left to right within the overall right to left flow of the Arabic letters. No special markup or styling is needed to achieve this. The bidirectional algorithm alone is enough.

بدأ تطوير إكس إم إل في 1996 و صارت...

Occasionally the Unicode bidirectional algorithm is not sufficient to correctly order chunks of embedded text. Alternatively, you may want to override the effects of the bidirectional algorithm for a part of the page. In these cases you can apply additional markup to produce the ordering you want.




5 Basic setup for pages in RTL scripts

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

Adding dir="rtl" to the html element will cause block elements and table columns to start on the right and flow from right to left. All block elements in the document will inherit this setting unless it is explicitly overridden.

No dir attribute is needed for documents that have a base directionality of ltr, since this is the default.

Having established the directionality at the html tag level, you should not use the dir attribute on other elements unless you want to change the directionality for that element. Unnecessary use of the dir attribute impacts bandwidth and potentially creates unnecessary additional work for page maintenance.
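
A minimal sketch of such a page follows. The language attributes, the title and the English paragraph are assumptions added for illustration. The first paragraph inherits its direction from the html element, and only the second, which is meant to differ, carries its own dir attribute.

```html
<html xmlns="http://www.w3.org/1999/xhtml" dir="rtl" xml:lang="he" lang="he">
<head>
  <title>W3C</title>
</head>
<body>
  <!-- Inherits rtl from the html element; no dir attribute needed -->
  <p>להוביל את הרשת למיצוי הפוטנציאל שלה…</p>

  <!-- Explicitly overridden: this block alone flows left to right -->
  <p dir="ltr" xml:lang="en" lang="en">An English paragraph in a Hebrew page.</p>
</body>
</html>
```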

User Agent Notes:

Microsoft recommends that the dir attribute be attached to the html element rather than the body element for several reasons relating to the functionality associated with the browser.

User Agent Notes:

In Internet Explorer adding the dir attribute to the html tag also moves the scroll bar to the left of the browser window.

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

Although the HTML specification recommends the use of the dir attribute on the html element, this guideline is motivated more by practical considerations relating to user agent behavior.

According to the Microsoft article Authoring HTML for Middle Eastern Content, the following behaviors can only be expected in Internet Explorer 5 if the dir attribute is on the html element, rather than the body element.

  • The OLE/COM ambient property of the document is set to AMBIENT_RIGHTTOLEFT

  • The document direction can be toggled through the document object model (DOM) (document.direction="ltr/rtl")

  • An HTML Dialog will get the correct extended windows styles set so it displays as a RTL dialog on a Bidi enabled system.

  • If the document has vertical scrollbars, they will be used on the left side if dir="rtl".

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

'Visual ordering' of text was common for old user agents that didn't support the Unicode bidirectional algorithm. Text was stored in the source code in the same order you would expect to see it displayed. This also involved such things as disabling any line wrapping, explicit right-alignment of text in paragraphs and table cells, and reverse-ordering of table columns when translating from English to a language using a bidi script. The result is very fragile code that is difficult to maintain. For example, if you want to add a few words in the middle of a paragraph, you would have to move text to and from every line that followed it in the paragraph.

Visually ordered bidirectional HTML does not conform to the HTML specification.

With 'logical ordering' text is stored in memory in the order in which it would normally be typed (and usually pronounced). The Unicode bidirectional algorithm is then applied by the browser to render the correct visual display.

Visual ordering is rarely seen for Arabic. Because Arabic letters join to one another, Arabic implementers had a stronger motivation to enable the logical ordering approach.

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

It is usually best to use a Unicode encoding, such as UTF-8. This technique applies if, for some reason, you choose to serve your Hebrew page in an ISO encoding instead.

According to RFC1555 and RFC1556, there are special conventions for the use of charset parameter values to indicate bidirectional treatment in MIME mail, in particular to distinguish between visual, implicit, and explicit directionality. 'Visual' refers to the practice of typing in the Hebrew characters in reverse order and preventing automatic line breaks. Formatting the document visually in this way is typically done to ensure reasonable display on older user agents that do not handle bidirectionality. Such documents do not conform to the HTML specification. 'Implicit' is also called logical ordering, and refers to an approach where all characters are stored in memory in the order in which they would normally be typed. Correct ordering for display is then done by a special algorithm (this is the preferred approach). 'Explicit' refers to the use of explicit markers in the text to indicate directional changes.

The charset parameter value ISO-8859-8 for Hebrew denotes visual ordering, ISO-8859-8-i denotes implicit bidirectionality, and ISO-8859-8-e denotes explicit directionality.

Because HTML uses the Unicode bidirectional algorithm, conforming documents encoded using ISO 8859-8 must be labeled as ISO-8859-8-i. Explicit directional control is also possible with HTML, but cannot be expressed with ISO 8859-8, so "ISO-8859-8-e" should not be used.
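
As a sketch, a logically ordered Hebrew page served in ISO 8859-8 could declare its encoding in the document head as shown below. The title text is a placeholder. Where the server sends a charset in the HTTP Content-Type header, that header takes precedence over the meta element.

```html
<head>
  <!-- The "-i" suffix marks implicit (logical) ordering, as HTML requires -->
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-8-i" />
  <title>Placeholder title</title>
</head>
```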

Contrary to what is said in RFC1555 and RFC1556, ISO-8859-6 (Arabic) is not visual ordering.




6 Changing the directionality of a block element

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

The following example illustrates the effect of applying a change in directionality to a block level element using the dir attribute.


The following paragraph inherits the LTR directionality of this page, and its source contains some Hebrew text, followed by punctuation, followed by a graphic.

להוביל את הרשת למיצוי הפוטנציאל שלה…  Small picture of a globe.

The following is exactly the same code, but with an explicit dir="rtl" added to the paragraph tag to turn this into a right-to-left paragraph embedded in this left-to-right page.

להוביל את הרשת למיצוי הפוטנציאל שלה…  Small picture of a globe.

Note, in particular, that the positions of the image and punctuation in the example above change relative to the text, because the overall directional flow has been changed. Note also, however, that the Hebrew characters are still read in the same direction. Their sequence is determined by the Unicode bidirectional algorithm, not by the dir attribute.
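
The markup behind an example like this might look as follows. The image file name globe.gif is hypothetical; the alt text is the one shown in the example.

```html
<!-- Inherits ltr from the page -->
<p>להוביל את הרשת למיצוי הפוטנציאל שלה… <img src="globe.gif" alt="Small picture of a globe." /></p>

<!-- Same content, but the paragraph itself is now right-to-left -->
<p dir="rtl">להוביל את הרשת למיצוי הפוטנציאל שלה… <img src="globe.gif" alt="Small picture of a globe." /></p>
```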

The content of all nested block elements will inherit directionality (unless of course a nested element explicitly changes its directionality using dir). Remember that the base directionality for a document should already be established by the html element. There is no need to add dir attributes to block level elements unless you want to apply a different direction to that set by the html tag or an explicit setting on a parent block element.

Visual user agents that support bidirectional display will typically right-align block elements in a rtl context, and vice versa. (See the example above.)

The dir attribute setting also affects the flow of columns in a table.


The following table element has a dir attribute set to rtl.

مكتب W3C הישראליمكتب W3C הישראליمكتب W3C הישראלי

Here is the same table element with the dir attribute removed. The directionality of the columns is now set by the next ancestor element that specifies directionality - in this case the default ltr setting of the html tag of this document.

مكتب W3C הישראליمكتب W3C הישראליمكتب W3C הישראלי

Note how the cells inherit the directionality set for the table. This produces the alignment of text in the cell, the order of text relative to the number, and the position of the question mark.

Note also that in most browsers, unlike other block elements, adding a dir attribute to the table will not cause the table to be aligned differently. It will only affect the order of columns and table content. If you want the table to be aligned with the other side of the content area you will need to wrap the table in another block element (eg. a div) that carries a dir attribute.
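
A sketch of the wrapping approach is shown below. The cell contents are placeholders, and the resulting alignment should be checked in the user agents you need to support.

```html
<!-- The div carries the directionality; the table and its cells inherit rtl
     from it, so the columns flow from right to left -->
<div dir="rtl">
  <table>
    <tr>
      <td>1</td>
      <td>2</td>
      <td>3</td>
    </tr>
  </table>
</div>
```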

7 Mixing text direction inline

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

You need to be familiar with the concepts in What you need to know about the bidi algorithm and inline markup to understand this technique.

Unfortunately, the bidirectional algorithm may not always produce the desired result with regard to the placement of punctuation. For instance, the overall context of the example below is LTR. If we introduce some punctuation between the Arabic and Latin letters it will produce the following (undesirable) result.


The title is "مفتاح معايير الويب!" in Arabic.

The exclamation mark is part of the Arabic phrase and should have appeared to its left. It appears to the right because it falls between an Arabic character and a Latin character, and the overall paragraph direction is LTR. It is therefore treated as part of the English text.

An easy way to fix this is to insert the Unicode character U+200F, called the RIGHT-TO-LEFT MARK, after the exclamation mark. There is a similar character, U+200E, called the LEFT-TO-RIGHT MARK.

The best way to represent these characters is with the pre-defined HTML character entities, &rlm; and &lrm;.

Now, with a strong RTL character on either side, the exclamation mark too will be treated as part of the RTL directional run and we will get the following (correct) result.


The title is "مفتاح معايير الويب!‏" in Arabic.

Note that it is possible to use the actual Unicode characters or Numeric Character References (ie. &#x200E; and &#x200F;) rather than the character entities mentioned above. The character entities are recommended because they provide maximum clarity in the code. The characters themselves would be invisible in the source, and a numeric value is easily mistaken.


مشس هخصث خهس title in english!&lrm; تخت تخهثز.

مشس هخصث خهس title in english!&#x200E; تخت تخهثز.
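
In source form, the corrected Arabic title sentence from earlier in this technique could be written as follows. The p element is an assumption; the entity sits immediately after the exclamation mark, inside the quotation marks.

```html
<p>The title is "مفتاح معايير الويب!&rlm;" in Arabic.</p>
```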

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

You need to be familiar with the concepts in What you need to know about the bidi algorithm and inline markup to understand this technique.

The Unicode characters RLM (right-to-left mark) and LRM (left-to-right mark) can be useful to achieve the correct ordering of text items that are only separated by directionally neutral characters. We will show two examples of this.

In our first example, below, the list order is incorrect because the first two Arabic words should be reversed and the intervening comma, which is part of the English text, should appear immediately to the right of the first word. The reason for the failure is that, with a strongly typed right-to-left (RTL) character on either side, the bidirectional algorithm sees the neutral comma as part of the Arabic text.



The names of these states in Arabic are مصر, البحرين and الكويت respectively.


The names of these states in Arabic are مصر,‎ البحرين and الكويت, respectively.

The correct result was obtained by simply placing a &lrm; entity immediately after the comma. This has the effect of placing the neutral comma between two strongly typed characters, one left-to-right and the other right-to-left. Because neutral characters in this position take on the directionality of the overall context (here the paragraph), the bidi algorithm will now see it as part of the English left-to-right flow and will see the two Arabic words as separate.
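
As a sketch, the source for the corrected list might read as follows. The p element is an assumption added for illustration.

```html
<!-- &lrm; directly after the comma keeps the comma in the English flow -->
<p>The names of these states in Arabic are مصر,&lrm; البحرين and الكويت respectively.</p>
```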

In the second example, this time in a RTL Hebrew paragraph, the beginning of the sentence is badly jumbled. This is because the text from "W3C" to "Consortium" is seen as a single directional run of LTR characters. (The second parenthesis from the right falls between LTR and RTL characters, so it assumes the directionality of the paragraph - RTL.)



W3C - (World Wide Web Consortium) מעביר את שירותי הארחה באירופה ל - ERCIM.


W3C -‏ (World Wide Web Consortium) מעביר את שירותי הארחה באירופה ל - ERCIM.

It is very simple to obtain the correct result. Simply put a &rlm; entity immediately after the hyphen. This causes the hyphen and the nearby parenthesis to be seen as part of the paragraph's text flow.

Note that the dir attribute is not appropriate to resolve this case.
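
A sketch of the source for the corrected Hebrew sentence follows. The dir="rtl" on the p element is an assumption that keeps the fragment self-contained; it would be unnecessary in a page whose html element already sets the direction.

```html
<!-- &rlm; directly after the first hyphen splits the LTR run -->
<p dir="rtl">W3C -&rlm; (World Wide Web Consortium) מעביר את שירותי הארחה באירופה ל - ERCIM.</p>
```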

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

At a simple level the Unicode bidirectional algorithm takes care of the reordering of inline text, but where there is nesting of directionality the dir attribute may need to be used.

The Unicode bidirectional algorithm organizes characters into directional runs - sequences of characters with the same directionality. Directionally neutral characters such as spaces and punctuation take on the directionality of surrounding characters, allowing directional runs to span several words. In the example below there are three directional runs - English, Arabic, and English. These are ordered according to the prevailing directionality of the paragraph - in this case left-to-right.


The title is مفتاح معايير الويب in Arabic.

Unfortunately, the bidirectional algorithm alone does not produce the desired result if one of the directional runs contains mixed direction text, as can be seen in the following example.

The incorrect line of text is coded as a simple sequence of characters without any inline markup. Note that the order of the two Hebrew words is correct, but the text "W3C" should appear on the left hand side of the quotation and the comma should appear between the Hebrew text and "W3C".



The title says "פעילות הבינאום, W3C" in Hebrew.


The title says "פעילות הבינאום, W3C" in Hebrew.

To get the correct result we have to create a new 'embedding level' by surrounding the text within the quote marks with a span element and setting its dir attribute to rtl as shown here. (The language information has been omitted to make the example clearer.)


<p>The title says "<span dir="rtl">פעילות הבינאום, W3C</span>" in Hebrew.</p>

This causes the comma to take on the same RTL directionality as the whole span, and orders the Hebrew directional runs appropriately.

Note that we have used a span element to carry the dir attribute in this case. If the quote had already been surrounded by an element, the dir attribute should be attached to that. A span element should only be used where there is nothing else available.

Note also that we placed the span element inside the quotation marks, since these are a part of the English text.
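
For instance, if the title were already marked up with a cite element (an assumption made for this sketch), the attribute would go on that element and no span would be needed.

```html
<p>The title says "<cite dir="rtl">פעילות הבינאום, W3C</cite>" in Hebrew.</p>
```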

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

@@ make sure to refer to the title element

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

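There are a number of control characters in Unicode that can be used to create the same effect as markup for bidirectional text. These are:

  • U+202A LEFT-TO-RIGHT EMBEDDING (LRE)

  • U+202B RIGHT-TO-LEFT EMBEDDING (RLE)

  • U+202C POP DIRECTIONAL FORMATTING (PDF)

  • U+202D LEFT-TO-RIGHT OVERRIDE (LRO)

  • U+202E RIGHT-TO-LEFT OVERRIDE (RLO)
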
Both Unicode in Markup Languages and the HTML 4.01 specification advise against using these when markup is available, and they particularly advise against mixing control codes and markup.

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 




8 Handling parentheses & other mirrored characters

 IE(Win)  Mozilla  NNav 

The shape of the glyphs used for a pair of mirrored characters will be determined at run time according to the directional context in which they appear.
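
A sketch, assuming a right-to-left paragraph: the author types the opening parenthesis before the parenthesized text and the closing parenthesis after it, in logical order, exactly as in left-to-right text. The characters are not swapped in the source; the user agent chooses the glyph shapes.

```html
<!-- "(" is typed before W3C and ")" after it, as in LTR text -->
<p dir="rtl">פעילות הבינאום (W3C)</p>
```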

9 Overriding the Unicode bidirectional algorithm

 IE(Win)  Mozilla  Opera7.2  NNav  Safari  IE(Mac) 

bdo stands for 'bidirectional override'. This inline element can be used to override the Unicode bidirectional algorithm if the dir attribute doesn't produce the desired result or if you want to produce a different result.


Illustrations of the characters as stored in memory in earlier examples are produced by simply applying a bdo tag. This causes the characters to flow left to right, regardless of the directionality of the characters involved. For instance, an example showing how text is stored in the computer's memory such as

The title says "פעילות הבינאום" in Hebrew.

can be produced using the following underlying code

<p><bdo dir="ltr">The title says "פעילות הבינאום" in Hebrew.</bdo></p>

Without the bdo tag, the Unicode bidirectional algorithm would have produced the following result. Note how the characters in the Hebrew words run in a different direction.

The title says "פעילות הבינאום" in Hebrew.

A Acknowledgements

The following GEO Task Force members have contributed their time and valuable comments to shaping these guidelines:

Phil Arko, Steve Billings, Deborah Cawkwell, Wendy Chisholm, Andrew Cunningham, Martin Dürst, Lloyd Honomichl, Russ Rolfe, Peter Sigrist, Tex Texin, Najib Tounsi

B References

Bert Bos, Håkon Wium Lie, Chris Lilley, Ian Jacobs, Eds., Cascading Style Sheets, level 2 (CSS2 Specification), W3C Recommendation. (See http://www.w3.org/TR/REC-CSS2.)
Dave Raggett, Arnaud Le Hors, Ian Jacobs, Eds., HTML 4.01 Specification, W3C Recommendation. (See http://www.w3.org/TR/html401.)
H. Nussbacher and Y. Bourvine, Hebrew Character Encoding for Internet Messages, RFC 1555, December 1993. (See http://www.ietf.org/rfc/rfc1555.txt.)
H. Nussbacher, Handling of Bi-directional Texts in MIME, RFC 1556, December 1993. (See http://www.ietf.org/rfc/rfc1556.txt.)
Martin Dürst and Asmus Freytag, Unicode in XML and other Markup Languages, Unicode Technical Report #20 and W3C Note. (See http://www.w3.org/TR/unicode-xml.)