
W3C

Authoring HTML: Handling Right-to-left Scripts

W3C Working Group Note 08 September 2009

This version:
http://www.w3.org/TR/2009/NOTE-i18n-html-tech-bidi-20090908/
Latest version:
http://www.w3.org/TR/i18n-html-tech-bidi/
Previous version:
http://www.w3.org/TR/2009/WD-i18n-html-tech-bidi-20090714/
Editor:
Richard Ishida, W3C

Abstract

This document provides advice for the use of HTML markup and CSS style sheets to create pages for languages that use right-to-left scripts, such as Arabic, Hebrew, Persian, Thaana, Urdu, etc. It explains how to create content in right-to-left scripts that builds on but goes beyond the Unicode bidirectional algorithm, as well as how to prepare content for localization into right-to-left scripts.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document provides advice on practical techniques related to the creation of content in languages that use right-to-left scripts, such as Arabic and Hebrew, or content in other languages that includes fragments of text in these scripts.

This is a W3C Working Group Note produced by the Internationalization Core Working Group, part of the W3C Internationalization Activity.

Please send comments on this document to www-international@w3.org (publicly archived).

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This document may be updated, replaced or obsoleted by other documents at any time. Therefore, quotes or references to specific information in the document should include the publication date of this version, 08 September 2009. It is inappropriate to cite this document as other than a Working Group Note, which is not an endorsed W3C Recommendation.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Important concepts
3 Problems with bidirectional source text
4 Authoring with localization in mind
5 Setting up a right-to-left page
6 Changing direction on block elements
7 Mixing text direction inline
8 Handling parentheses & other mirrored characters
9 Overriding the Unicode bidirectional algorithm
A Acknowledgments

1 Introduction

1.1 Who should use this document?

This document is intended for all content authors using HTML and CSS who work with text in a language that uses a right-to-left script, or whose content will be localized into such a language. The term 'author' refers to a person who creates content either directly or via a script or program that generates HTML documents.

This document provides guidance that helps HTML developers support international deployment. Enabling international deployment is the responsibility of all content authors, not just localization groups or vendors, and is relevant from the very start of development. Ignoring the advice in this document, or relegating it to a later phase in the development process, will only add unnecessary costs and resource issues at a later date.

It is assumed that readers of this document are proficient in developing HTML and XHTML pages - this document limits itself to providing advice specifically related to internationalization.

1.2 How to use this document

Note: This document assumes prior familiarity with the concepts introduced in the tutorial Creating HTML Pages in Arabic & Hebrew. That tutorial provides an overview of how to create pages in right-to-left scripts.

This document lists a number of techniques related to handling pages in right-to-left scripts, with explanations. The key recommendation for each technique is summarized tersely on a light blue background.

The text that follows the summary gives concise advice on how to implement the technique, and additional explanations and discussion follow that where appropriate. In some cases, the applicability of the recommendation may vary, depending on your aims and context. Where there are pros and cons for a given recommendation, we try to clearly indicate those.

The document is primarily designed for use as a reference source, where readers look up techniques one by one. For this reason, you may find a significant amount of duplication if you read the whole document in one go.

1.2.1 Examples

We show examples of text in native scripts using images, to ensure that the example looks correct regardless of the fonts and rendering capabilities of the reader's platform. Clicking on the 'View code' graphic will typically display the actual text version in a separate window, where you can also examine the source text.

To make it easier to understand the flow of characters in an example if you do not read Arabic or Hebrew text, we also provide an ASCII-only version of many examples. This version uses uppercase translations to represent the Arabic or Hebrew characters, while all Latin text is lowercased. The order of characters represents the way you would see the native text arranged on screen (so you usually read the translations from right to left).

See also the note in the section Example source text in this document about the difficulties of representing code examples, and the approach taken in the light of numerous possibilities.

1.2.2 Browser-specific notes

If there is something you should know about how, or whether, a technique is supported on particular browsers, you can find this information by following the 'Get more information' link at the bottom of each technique. This information is kept in a separate document so that it can be kept up to date as time passes. You should check the notes document regularly for changes.

Any browser-specific notes generally relate to the latest versions of a selection of browsers that were widely deployed at the time this document was published. However, they may also include information about later versions, as they are released, and may also eventually include information about other browsers.

In the absence of browser-specific notes, you can assume that the technique works interoperably on the following browser versions (which are those tested prior to the release of the document):

  • Internet Explorer v6-8
  • Firefox v3.5.2
  • Opera v9.64
  • Google Chrome v2.0.172.33
  • Safari v4.0

Three versions of Internet Explorer are listed, since they still account for a large proportion of the user base.

1.2.3 Further information

When you follow the 'Get more information' link at the bottom of each technique you will also find links to one or more locations in a techniques index. These links lead to further how-to information, useful links, tests and results pages, etc.

The information in the techniques index is updated as new resources are discovered or developed.

1.3 Technologies addressed

This document provides techniques for developing pages using HTML 4.01 and XHTML 1.0 served as text/html with CSS.

Note: XHTML 1.0 can be served as XML (using MIME types application/xhtml+xml, application/xml or text/xml) or HTML (using the MIME type text/html). It is very common for XHTML 1.0 to be served as HTML, ideally following the compatibility guidelines in Appendix C of the XHTML 1.0 specification. This allows authors to produce valid XML code, which has benefits for processing with scripts or XSLT, while remaining well supported for display by all mainstream browsers. (By contrast, XHTML served as application/xhtml+xml is not well supported by some browsers at the moment.) This document does not concern itself with XHTML served as XML.

Where a browser operates in both standards- and quirks-mode, standards-mode is assumed (ie. you should use a DOCTYPE statement).
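For example, the first declaration below is the standard DOCTYPE for HTML 4.01 Strict and the second is the one for XHTML 1.0 Strict. Use only one, matching your markup, as the very first thing in the file. This is a sketch for orientation; see the HTML 4.01 and XHTML 1.0 specifications for the other (Transitional, Frameset) variants.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```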

1.4 Terminology

base direction
The overall directional context for a document or block of text. In HTML, the default is left-to-right, but this can be changed using the dir attribute.
bidirectional (bidi) algorithm
An algorithm, described in the Unicode Standard, for producing the correct visual ordering of right-to-left and bidirectional text.
block element
Block elements are elements such as p, div, ol, ul, blockquote, body, etc. The opposite of a block element is an inline element, such as span, em, strong, a, etc.
directional run
A sequence of adjacent characters in bidirectional text that all have the same directionality. In bidirectional text there will always be a minimum of two directional runs, one RTL and the other LTR.
inline element
Inline elements are elements such as span, em, strong, a, etc. The opposite of an inline element is a block element, such as p, div, ol, ul, blockquote, body, etc.
inline text
Text that lies wholly within a single block element, ie. text within a paragraph. Inline text may include inline markup.
logical order
Characters arranged in memory in, for the most part, the order in which they are pronounced. Compare to visual order.
LRE
A short name for the Unicode character U+202A LEFT-TO-RIGHT EMBEDDING. This invisible control character is used to begin a range of text with an embedded base direction of left-to-right.
LRO
A short name for the Unicode character U+202D LEFT-TO-RIGHT OVERRIDE. This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from left to right.
PDF
A short name for the Unicode character U+202C POP DIRECTIONAL FORMATTING. This invisible control character is used to signal the end of a range of text that was started with one of the RLE, LRE, RLO or LRO characters.
RLE
A short name for the Unicode character U+202B RIGHT-TO-LEFT EMBEDDING. This invisible control character is used to begin a range of text with an embedded base direction of right-to-left.
RLO
A short name for the Unicode character U+202E RIGHT-TO-LEFT OVERRIDE. This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from right to left.
visual order
Characters arranged in memory in the order in which they are read on-screen. Compare to logical order.

2 Important concepts

2.1 Bidirectional (or bidi) text

'Bidirectional', or 'bidi', text typically refers to text written using a mixture of right-to-left and left-to-right scripts. For example, in Arabic and Hebrew text the content flows predominantly from right to left, but embedded numbers or text in other scripts (such as Latin script) still runs left to right. Text in other languages, such as English, can also be bidirectional if it includes excerpts from languages such as Arabic and Hebrew.

Scripts such as Arabic and Hebrew, which are predominantly right-to-left in orientation, may be referred to as 'RTL' (right-to-left) scripts.

This document will use the Arabic and Hebrew languages for most of its examples. Many languages use the Arabic script, and several other scripts run predominantly right-to-left: these include Thaana, N'ko, and Syriac, as well as other scripts no longer in common use, such as Cypriot, Phoenician and Kharoshthi.

2.2 Relationships between language and directionality

Direction is a property of scripts, not language.

Some people think that information about directionality can be inferred from information about the language of the text, but this is not true. For that to work there would have to be a one-to-one mapping between language and directionality, and there isn't. For example, Azerbaijani can be written using both right-to-left and left-to-right scripts, and the language code az is relevant for either.

In addition, when using directional markup inline, the markup and the values of that markup do not necessarily coincide with language declarations.

Also, markup used to indicate directionality has values that indicate that the normal directionality should be overridden; it is not possible to express that using language-related values.

In the same way, attributes indicating text direction in HTML and XHTML do not, and should not, provide information about the language of text.

Separate mechanisms already exist for declaring language and directionality in HTML and XHTML, and the two should not be confused.

Other W3C techniques documents describe how to declare character encoding and language.

3 Problems with bidirectional source text

There is currently a lack of good editing environments for creating HTML pages using right-to-left scripts. Because HTML markup and escapes contain punctuation and strongly-typed letters, you are always working with bidirectional source text. However, if the editor does not recognize that the markup is not ordinary text (which is usually the case), it can produce some odd effects and make coding difficult.

This section simply mentions some of those problems, so that you are forewarned. It doesn't propose a full solution, but it does offer some advice which may help with problematic editing environments.

3.1 Working with markup

Unless your editor recognizes markup in source text as not being normal text, the strongly typed letters and punctuation in the markup will appear in places you wouldn't expect, and sometimes interfere with the order of the content itself.

If you are creating a large amount of right-to-left text, it makes sense to set the base direction of the editing window in your editor to right-to-left. This helps ensure that the content is correctly ordered. Unfortunately, this tends to increase the likelihood that your markup looks strange in the source text.

Example 1 shows some simple markup in a left-to-right context.

<p class="myclass" title="العربي">مشس هخصث خهس تخت تخهثز.</p>

The source contains a p tag followed by a class attribute, followed by a title attribute with some Arabic text as its value. The content of the paragraph itself starts with Arabic text. The resulting order in a left-to-right environment (where Arabic text is indicated by text in square brackets) is

<p class="myclass" title="[paragraph_content]<"[title_value].</p>.

As Example 2 shows, things are hardly better if the overall context for the source code is right-to-left. In this case, the resulting order for the same source text is

<p/>[paragraph_content]<"[title_value]"=p class="myclass" title>.

<p class="myclass" title="العربي">مشس هخصث خهس تخت تخهثز.</p>

Note, however, that this source will display correctly in a user agent. This is just a problem for reading and maintaining the source text.

The title attribute with Arabic text makes the situation much worse than normal in the above examples. The problem arises because there is only 'punctuation' between two runs of strongly-typed right-to-left text, so the Unicode bidirectional algorithm treats them as a single right-to-left run. It helps a little, if you can do it, to ensure that an attribute with an LTR value (here, the class attribute) appears last. This makes the text in a left-to-right context look as expected, and in a right-to-left context it prevents the markup from interacting with the content (see Example 3).

<p title="العربي" class="myclass">مشس هخصث خهس تخت تخهثز.</p>

If you are dealing with content that is predominantly in a right-to-left script, then, you need to look for a source editor that recognizes markup as a special construct, and produces a sensible order.

It can also help to start the content on a new line (see Example 4); however, this doesn't always help with inline markup. Also, try to avoid including white space before the closing markup, as this can lead to other problems (see 7.6 Watch out for white space).

<p class="myclass" title="العربي">

مشس هخصث خهس تخت تخهثز.</p>

Ideally, if your markup includes a dir attribute to change the directional context of the content, your editor should also recognize this and change the order of the source code accordingly.

3.2 Adding escapes to the content

Note: See 7.2 Weak/neutral characters at the edge of a directional run and 7.3 Adjacent, same-direction directional runs for details about how escapes can be used to correctly order bidirectional inline text.

If you insert Unicode control characters such as RIGHT-TO-LEFT MARK (RLM) or ZERO WIDTH NON-JOINER (ZWNJ) directly as characters, you will not usually be able to see them in the source text, since they are invisible. For this reason you may think that a useful way to represent these characters is with the predefined HTML character entities, &rlm; and &zwnj;, or their numeric equivalents, &#x200F; and &#x200C;.

Unfortunately, such an approach typically has its problems, too. As described in the previous section related to markup in source text, the strongly-typed left-to-right characters and 'punctuation' characters in the escapes will normally cause the Unicode bidirectional algorithm to display very odd looking source text.

Very few editors currently recognize, for example, the sequence of characters &#x200F; as a single unit representing a character with a strong right-to-left direction. They treat this as simply text containing punctuation, numbers and two strongly-typed left-to-right characters (x and F), and apply the Unicode bidirectional algorithm to that as they would to any normal text.

Example 5 shows a typical view of source text after adding an escape to bidirectional text in right-to-left ordered source text. The sequence &#x200F; embedded in right-to-left text is displayed ;x200F#&. At the beginning or end of embedded English text the escape is broken into fragments, and appears as x200F;text in english#& or ;text in english&#x200F, respectively.

Note that the source will still display correctly in a user agent. This is just a problem for reading and maintaining the source text.

مشس&#x200F; هخصث خهس text in english تخت تخهثز.

مشس هخصث خهس &#x200F;text in english تخت تخهثز.

مشس هخصث خهس text in english&#x200F; تخت تخهثز.

Various approaches are possible, if you want to avoid using invisible characters:

  • use an editor that recognizes an escape as a single unit representing an RLM/LRM character and produces the expected effect on the surrounding source text

  • use an editor that provides a symbolic visual representation of the RLM/LRM character, so that you don't lose sight of it

  • break the source code line around the escape - works in some cases

  • learn to live with the undesirable reordering effects for escapes.

3.3 Example source text in this document

Given the discussion above, representing examples of source text in this document can be quite difficult. Should we show source text in right-to-left order, or left-to-right? Should we assume that the editor recognizes and handles markup and escapes as separate entities from the content, and create source fragments that look like that - or should we show source as it really looks for many people who don't have such clever editors? And particularly, should we assume that the bidirectional algorithm is properly applied in the source editor, picking up cues from the markup, or not?

We will avoid source code examples unless they are very useful. We will try to describe how to apply the markup rather than show it.

We will typically represent examples in a left-to-right context, and use invisible markup to make content and markup look as you might expect it to be displayed by an intelligent editor, since this will provide maximum clarity about the point being made, even if it doesn't reflect how the markup will look for many people.

4 Authoring with localization in mind

Whenever possible, avoid HTML attributes with values of right and left. Use CSS in a linked style sheet instead.
How to

Attributes in HTML 4.01 that have values of right and left are align and clear.

The align attribute is used with the elements hr, div, h1-h6, p, table, caption, col, colgroup, tbody, td, tfoot, th, thead and tr. The use of this attribute is deprecated in HTML 4.01 for all but the table-related elements.

clear is used with br, but is also deprecated in HTML 4.01.

For example, to right-align an image, such as the icons in this document that return you to the table of contents, you could use the CSS rule in Example 6:
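The listing for Example 6 is not reproduced in this copy. A minimal sketch of such a rule follows. It assumes the icons carry a hypothetical class name, toc-link, and the image file name is also a placeholder. The rule is shown in a style element only to keep the example in one piece; following the recommendation above, it belongs in a linked style sheet.

```html
<!-- In the head (or, better, in the linked style sheet): -->
<style type="text/css">
  /* Replaces align="right" on the img element.
     toc-link is a hypothetical class marking the "back to contents" icons. */
  img.toc-link { float: right; }
</style>

<!-- In the body: no align attribute on the image. -->
<h2><img class="toc-link" src="toc-icon.png"
         alt="Go to the table of contents.">1.2 How to use this document</h2>
```

For a right-to-left localization, only the style rule would change (for example to float: left); the HTML stays as it is.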

You can achieve the same effect as the clear attribute using the CSS shown in Example 7. This rule ensures that the h2 element has no floated content to its left.
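The listing for Example 7 is not reproduced in this copy either. The rule described above can be sketched as follows; as before, it is shown in a style element only for compactness.

```html
<!-- Replaces <br clear="left"> placed before the heading. -->
<style type="text/css">
  /* The h2 starts below any content floated to its left. */
  h2 { clear: left; }
</style>
```

In the style sheet for a right-to-left version, where floated content moves to the other side, this might become clear: right.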

These style rules would, of course, need to be changed in the style sheet for a version of the page that was localized into a right-to-left script, but it should be much easier to do that than to go through all the HTML content.

(Note that this technique does not refer to the values rtl and ltr that are used with the dir attribute.)

Discussion

Values of right and left in attributes need to be reversed when translating the document into a language using a right-to-left script.

Whether you are authoring an LTR document or an RTL document, using CSS style sheets to achieve the same effect can save a lot of time and risk, since one small change in a CSS file can save the trouble of editing code in many HTML documents. (One should expect the style sheet to need conversion anyway as part of the translation process.)

Only use text-align where you specifically want to override the current default alignment.
Discussion

Values of right and left need to be reversed when translating the document into a language using a script with a different direction. To reduce the effort and complexity of adapting the styles it is better to only use text-align where it is actually needed to override the current default alignment.

Often people apply it by default when it is not actually required. This overrides the default alignment derived from the base direction, and leads to more work when localizing a document - particularly if the document contains blocks of text in more than one direction. By default the dir attribute setting should produce the correct alignment.
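The following sketch, for a page whose html element carries dir="rtl", contrasts a redundant rule with a justified one. The selectors are illustrative only.

```html
<style type="text/css">
  /* Avoid: paragraphs in a dir="rtl" page are already right-aligned,
     so this only adds a value that must be reversed when localizing.
     p { text-align: right; } */

  /* Fine: centering deliberately overrides the default alignment,
     and works unchanged whatever the base direction. */
  h1 { text-align: center; }
</style>
```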

5 Setting up a right-to-left page

Add dir="rtl" to the html tag any time the overall page direction is right-to-left.
How to

Add dir="rtl" to the html tag any time the overall page direction is right-to-left.

This will cause block elements and table columns to start on the right and flow from right to left. All block elements in the document will inherit this setting unless it is explicitly overridden.

No dir attribute is needed for pages that have a base direction of ltr, since this is the default.
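A minimal sketch of a right-to-left page, assuming Arabic content encoded in UTF-8. The only directional markup is the dir attribute on the html element; everything else inherits it. The lang attribute is declared separately, since direction cannot be inferred from language.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html dir="rtl" lang="ar">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>مثال</title>
</head>
<body>
<!-- No dir attribute here or on the paragraph: both inherit rtl
     from the html element, so the text starts at the right. -->
<p>هذه فقرة.</p>
</body>
</html>
```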

Discussion

Setting the dir attribute on the html element sets the default direction for all elements in the page, including the head element. Note, however, the effect of this on the user interface of some browsers (the 'browser chrome') as described in the browser-specific notes.

Having established the base direction at the level of the html tag, you should not use the dir attribute on other elements unless you want to change the base direction for that element. Unnecessary use of the dir attribute impacts bandwidth and creates unnecessary additional work for page maintenance (see 6.2 Use bidi markup only when necessary).
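To illustrate, in the following sketch of a Hebrew page (content elided with "...") the first dir attribute is unnecessary and the second is justified.

```html
<html dir="rtl" lang="he">
<head>
<title>...</title>
</head>
<body>
<!-- Unnecessary: this paragraph already inherits rtl from the html
     element, so the attribute only adds bytes and maintenance work. -->
<p dir="rtl">...</p>

<!-- Necessary: this English paragraph needs a base direction that
     differs from the rest of the page. -->
<p dir="ltr" lang="en">This paragraph is in English.</p>
</body>
</html>
```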

There is not usually any good reason for using dir on the body element. Placing the attribute on the html element has the same effect, but also covers the text in the head element, too.

If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element.
How to
Note: This technique is relevant only if you consider it to be a problem that putting dir in the html tag may affect the user interface of some browsers. See the discussion below before implementing this technique.

To avoid this behavior without tagging every block element in the document, you could add, immediately inside the body element, a div element that surrounds all the other content in the document, and apply the dir attribute to that. The directionality will then be inherited by all other block elements in the body of the document, but will not set off the changes to the browser.

If you do this, you must ensure that you add a dir attribute to the head element also, to cover its title element, attribute values, etc.

<html lang="he">
<head dir="rtl">
    ...
</head>
<body>
<div dir="rtl">
 ...
</div>
</body>
</html>
Discussion

In some browsers, applying a right-to-left direction in the html or body tag will affect the user interface, too. If the page has a scroll bar, it will appear on the left side of the window. JavaScript alert boxes may also be mirror imaged.

Note: At the time of writing, the scroll bar effect can be seen in Internet Explorer and Opera, but the JavaScript dialog boxes are only different for Internet Explorer and only in certain circumstances. You can find more detailed and up-to-date information in the notes page associated with this document.

Some speakers of languages that use right-to-left scripts prefer the directionality of the user interface to be associated with the desktop environment, not with the content of a particular document. Because of this, they may prefer to not declare the document directionality on the html or body tag.

The approach outlined in the How To section above shows the simplest way to avoid this behavior, and yet still ensure that the default base direction for the whole document is right-to-left.

If you want to know more about this, read the Microsoft article Authoring HTML for Middle Eastern Content. According to this, the following behaviors can only be expected in Internet Explorer 5+ if the dir attribute is on the html element, rather than the body element.

  • The OLE/COM ambient property of the document is set to AMBIENT_RIGHTTOLEFT

  • The document direction can be toggled through the document object model (DOM) (document.direction="ltr" or document.direction="rtl")

  • An HTML Dialog will get the correct extended windows styles set so it displays as a RTL dialog on a Bidi enabled system.

  • If the document has vertical scrollbars, they will appear on the left side if dir="rtl".

Use logical order, not visual ordering for Hebrew, and choose an appropriate encoding.
How to

Create and store your Hebrew content in logical order (ie. usually as you would pronounce it), not the order you expect to see it displayed.

You will need to use an appropriate character encoding. It is usually best to use a Unicode encoding, such as UTF-8. If, for some reason, you choose to serve your Hebrew page in an ISO encoding instead, then specify ISO-8859-8-i, not ISO-8859-8.

Discussion

Visual ordering. Visual ordering of text was common for old user agents that didn't support the Unicode bidirectional algorithm. Text was stored in the source code in the same order you would expect to see it displayed. This also involved such things as disabling any line wrapping, explicit right-alignment of text in paragraphs and table cells, and reverse-ordering of table columns when translating from English to a language using a bidi script. Editing was laborious too: for example, if you wanted to add a few words in the middle of a paragraph, you would have to move text to and from every line that followed it in the paragraph (see the tutorial Creating HTML Pages in Arabic & Hebrew for an example).

Note, too, that if you have inline markup, such as emphasis or link text, that spans more than one line, you will need to mark up the text runs on each line separately. Again, adding text before such markup in a paragraph would mean carefully changing this markup to reflect the new position of the text.

The result is very fragile code that is difficult to maintain. In addition, all the extra tags needed to manage the text would bloat your code and impact not only authoring time, but also bandwidth. Visually ordered bidirectional HTML does not conform to the HTML specification unless bdo markup is used.

Logical ordering. Using logical ordering, on the other hand, makes it almost trivial to create long paragraphs of flowing text that automatically wrap to the width of the block element. It also makes it much easier to address accessibility, for example with screen readers.

Logically ordered text is stored in memory in the order in which it would normally be typed (and usually pronounced). The Unicode bidirectional algorithm is then applied by the browser to render the correct visual display.

Note: Visual ordering isn't really seen much for Arabic. Since the Arabic letters are all joined up there was a stronger motivation on the part of Arabic implementers to enable the logical ordering approach.

Character encoding considerations. Certain character encodings are associated with visual vs. logical ordering of text. Text in a Unicode encoding, such as UTF-8, is always logical. Unicode is generally the best choice for a character encoding, but if you wish to use an ISO code page for Hebrew, read the next technique, which explains why you should declare ISO-8859-8-i rather than ISO-8859-8.

If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.
How to

It is usually best to use a Unicode encoding, such as UTF-8. If, for some reason, you choose to serve your Hebrew page in an ISO encoding instead, then declare the encoding to be ISO-8859-8-i, not ISO-8859-8, and create and store your Hebrew content in logical order, not the order you expect to see it displayed.
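In HTML 4.01 the encoding can be declared with a meta element in the head (an HTTP Content-Type header sent by the server takes precedence over it). A sketch of the two declarations discussed here follows; a page should use only one of them.

```html
<!-- Preferred: a Unicode encoding. -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<!-- Only if you must use an ISO encoding for Hebrew: the "-i" suffix
     declares implicit (logical) ordering, whereas plain ISO-8859-8
     denotes visual ordering. -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-8-i">
```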

Discussion

Certain character encodings are associated with visual vs. logical ordering of text. Text in a Unicode encoding, such as UTF-8, is always logical.

According to RFC1555 and RFC1556, there are special conventions for the use of charset parameter values to indicate bidirectional treatment in MIME mail, in particular to distinguish between visual, implicit, and explicit directionality. 'Visual' refers to the practice of storing Hebrew characters in presentation order, so that there is no reliance on reordering performed by the operating system or the display subsystem. 'Implicit' is also called logical ordering, and refers to an approach where all characters are stored in memory in the order in which they would normally be typed. Correct ordering for display is then done by a special algorithm (this is the preferred approach). 'Explicit' refers to the use of explicit markers in the text to indicate directional changes.

The charset parameter value ISO-8859-8 for Hebrew denotes visual ordering, ISO-8859-8-i denotes implicit bidirectionality, and ISO-8859-8-e denotes explicit directionality. (The latter is not supported by any common browser.)

HTML assumes by default that bidi data is stored in logical order, and that rendering agents will have to use the Unicode Bidirectional Algorithm to present the text in correct visual order. If the encoding is ISO-8859-8, the corresponding charset specification must be ISO-8859-8-i.

Explicit directional control is also possible with HTML, but cannot be expressed with ISO 8859-8, so "ISO-8859-8-e" should not be used.

Note, also, that ISO-8859-8 doesn't include Hebrew diacritics; if you need these, use a logical encoding such as a Unicode encoding or Windows-1255.

By the way, contrary to what is said in RFC1555 and RFC1556, ISO-8859-6 (Arabic) does not imply visual ordering.

Do not use CSS styling to control directionality in HTML. Use markup.
How to

Just use the dir attribute when you need to indicate direction, and don't use CSS properties.

Discussion

It is possible to express direction for a range of text using the CSS direction and unicode-bidi properties. However, because directionality is an integral part of the document structure and needs to persist even when style sheets are not applied, you should always use dedicated markup to set the base direction for a document or chunk of information, or to indicate places in the text where the Unicode bidi algorithm is insufficient to achieve the desired inline directionality.

The way in which a browser is expected to handle the dir attribute and its values is clearly defined in the HTML specification, so CSS is not needed. The CSS2 specification also recommends the use of markup for bidi text in HTML. In fact it goes as far as to say that conforming HTML user agents may ignore CSS bidi properties, since the HTML specification clearly defines the expected behavior of user agents with respect to the bidi markup.

Although XHTML uses XML syntax, it is usually served to browsers using the text/html MIME type, so the browser treats it as HTML. Therefore the same principle applies: use the markup, don't use CSS for direction.

See the article CSS vs. markup for bidi support for a fuller explanation.

6 Changing direction on block elements

Add the dir attribute to a block element to change base direction. Don't use CSS or Unicode control characters.
How to

Add the dir attribute to a block element where you want to change the base direction. Example 10 shows how you might mark up a blockquote element to render a left-aligned English quotation in a right-to-left page.

<blockquote dir="ltr" lang="en"
   cite="http://www.example.org/romeoandjuliet#2.2.2">
<p>But, soft! What light through yonder window breaks?<br>
It is the east, and Juliet is the sun.</p>
</blockquote>

Do not try to achieve the same effect using CSS or Unicode control characters.

Note also that you should only use the dir attribute on block elements when you need to change the base direction from the current default (see 6.2 Use bidi markup only when necessary).

Tables are slightly different from other block elements. Using a dir attribute directly on a table tag will reorder the columns and contents as expected, but will not cause the table to move to the other side of the displayed page. If you want that to happen, you should put the table in a block element, such as div, and add the dir attribute to that, rather than put the attribute on the table element.

Discussion

Apart from the fact that it can be difficult to manage Unicode control characters because they are invisible, they don't really work for managing base direction across block elements because of questions of scoping and inheritance.

CSS is not needed for bidi support in HTML, and it is best to rely on the dedicated markup that HTML provides, with all needed behavior built in (see 5.5 Don't use CSS styling for direction in HTML).

For more information see the tutorial Creating HTML Pages in Arabic & Hebrew.

Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
How to

Once you have established the appropriate base direction for the html element you will only need to apply bidi markup to another element if you want that element's base direction to be different from that currently in force.

The same principle applies for inline markup. Do not use inline bidi markup unless the Unicode bidi algorithm is insufficient on its own to produce the expected results.

Discussion

The following Arabic example shows bad usage. None of the dir attributes are needed if dir="rtl" is added to the html element. Removing them will significantly simplify the document and reduce bandwidth requirements.

Example 11: [Bad practice. Do not copy!] Directional markup used far too often in a document.

<h2 dir="rtl">القاموس</h2>
<dl>
<dt dir="rtl">المنالية</dt>
<dd dir="rtl">سهولة منال للويب من قبل الجميع بصرف النّظر عن إعاقتهم.</dd>
<dt dir="rtl">برنامج التصديق</dt>
<dd dir="rtl">أو "الفاليديتور" أداة للتّحقّق من صلاحيّة صفحة ويب. على سبيل المثال، للتّحقّق من صلاحيّة <span dir="ltr">HTML</span>، يمكن أن تستخدم برنامج تصديق <span dir="ltr">W3C</span></dd>
<dt dir="rtl">التّدويل</dt>
<dd dir="rtl">تدويل الويب يسمح و يجعله سهل لاستخدام موقعك باللّغات و السّيناريوهات و الثّقافات المختلفة.</dd>
</dl>

The block elements inherit their direction from the html element, or from the nearest ancestor element on which a change was made. The inline 'HTML' and 'W3C' words in Example 11 do not need markup because the bidi algorithm produces the right result automatically.
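The inheritance rule above can be checked mechanically. The following JavaScript is a minimal sketch, assuming a hypothetical, simplified document tree of plain objects ({ tag, dir, children }) rather than a real DOM: it drops every dir attribute that merely repeats the direction already in force. It cannot decide whether an inline change of direction is needed at all (such as the spans around 'HTML' and 'W3C' in Example 11), since that depends on how the bidi algorithm handles the text, so it keeps those.

```javascript
// Hypothetical, simplified document tree: { tag, dir?, children }, where each
// child is either a text string or another node. This is not a DOM API.
function pruneRedundantDir(node, inherited = "ltr") {
  const { dir, children = [], ...rest } = node;
  const effective = dir ?? inherited; // direction in force inside this node
  const result = { ...rest };
  // Keep dir only where it changes the direction inherited from the parent.
  if (dir !== undefined && dir !== inherited) result.dir = dir;
  result.children = children.map((child) =>
    typeof child === "string" ? child : pruneRedundantDir(child, effective)
  );
  return result;
}

// A cut-down version of Example 11, with the text elided.
const page = {
  tag: "html",
  dir: "rtl",
  children: [
    { tag: "h2", dir: "rtl", children: ["..."] },
    {
      tag: "dl",
      children: [
        { tag: "dt", dir: "rtl", children: ["..."] },
        { tag: "dd", dir: "rtl", children: ["...", { tag: "span", dir: "ltr", children: ["HTML"] }] },
      ],
    },
  ],
};

const cleaned = pruneRedundantDir(page);
```

Among the block elements, only the dir on the html element survives, which is exactly the simplification recommended above.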

Occasionally the Unicode bidirectional algorithm is not sufficient to correctly order certain inline sequences of bidirectional text. Alternatively, you may want to override the effects of the bidirectional algorithm for a part of the page. In these cases you can apply additional markup to produce the ordering you want. These scenarios are discussed in other techniques in this document.

7 Mixing text direction inline

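There are three main scenarios that cause problems when dealing with bidirectional inline text. These are:

  • bidirectional text nested within inline text of a different direction
  • weak or neutral characters or objects that appear on the wrong side of a directional run
  • adjacent but separate directional runs with the same directionality that are displayed in the wrong order

We address these scenarios here with proposals for solutions.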

When you have bidirectional text nested in inline text of a different direction, and markup can be used, use the dir attribute to make the text display correctly. Otherwise, use RLE/LRE and PDF control characters to create an embedded base direction.
How to

Add the dir attribute to an element surrounding the embedded text. If there is no element surrounding the text, use a span element. Set the value of the dir attribute to either ltr or rtl, depending on the base direction of the embedded text, as shown in the examples below.

If it is not possible to use markup, e.g., in the title element or in attribute values, you will need to use Unicode control characters.

For examples and more detailed explanations see the discussion that follows.

Discussion

This technique is useful where nested, inline text, such as a quotation, is bidirectional. At a simple level the Unicode bidirectional algorithm takes care of the reordering of inline text, but where nested text is bidirectional you need to set up an embedding level, i.e., indicate a range of text for which a different base direction will be applied.

This can be done using markup around the relevant text, or by adding Unicode control characters to the text. It is recommended that markup be used in preference to control characters because the latter are difficult to manage well, given that they are invisible.

You need to be familiar with the concepts in the article What you need to know about the bidi algorithm and inline markup to understand this technique.

Using markup. Example 12 shows a sentence that, because we rely solely on the bidirectional algorithm, is incorrectly ordered, but that can be fixed with markup.

If we rely solely on the bidirectional algorithm, the text 'W3C' in the sentence below will appear in the wrong place. It is part of the quotation and should appear after, i.e., to the left of, the Hebrew text, and the comma should be just to its right.

Incorrectly ordered text, because no embedding.

Visual ASCII version: the title is "YTIVITCA NOITAZILANOITANRETNI, w3c" in hebrew.

This is how the sentence should look.

Correctly ordered text via embedding.

Visual ASCII version: the title is "w3c ,YTIVITCA NOITAZILANOITANRETNI" in hebrew.

Here is the markup that would produce it.

<p>The title is "<span dir="rtl" lang="he">...</span>" in Hebrew</p>

The markup sets up a new base direction for the embedded text. This RTL base direction causes the directional runs in the embedded text to proceed from right to left, and causes the comma between the two directional runs to take on RTL directionality.

It is possible that the embedded text is not surrounded by markup, and you may therefore need to add it, but note that the quotation here was already surrounded by markup. A span was used in order to label the language of the quotation. This is likely to be a common occurrence. In addition to marking up for language, quotations may be marked up with such things as a span or q element for styling or semantic properties. Given that the boundary of the quotation is already clearly marked, adding the dir attribute is simple and quick.

Note also, by the way, that we placed the span element inside the quotation marks, since these are a part of the English text.

Using control characters. Where markup is not available, such as in a title attribute value or an option element, you will have to use Unicode control characters to demarcate the required range of text and assign a base direction to it.

To mark the beginning of the embedded section you use one of U+202B RIGHT-TO-LEFT EMBEDDING (RLE) or U+202A LEFT-TO-RIGHT EMBEDDING (LRE) to set the base direction. This corresponds to the markup <span dir="rtl"> or <span dir="ltr">, respectively. At the other end of the embedded section is U+202C POP DIRECTIONAL FORMATTING (PDF). This corresponds to </span> in markup terms.

These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section Adding escapes to the content.)

Example 13 shows how.

Here is the text we plan to use in a page description, rendered incorrectly because we are relying only on the bidi algorithm heuristics.

Hebrew for 'Report on the XML at 10! event.'

This is how the sentence should look.

Hebrew for 'Report on the XML at 10! event.'

Here is the markup that would produce it.

<meta name="description" content="Report on the &#x202B;...&#x202C; event." />

The RLE character sets up a new RTL base direction for the embedded text. This RTL base direction causes the directional runs to all proceed from right to left. The limit of the embedded text is indicated using a PDF character. We used numeric character references for the source text so that you can see what we did.
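When such strings are generated by a script, the wrapping and the escaping can be done in one place. The following JavaScript is a minimal sketch with hypothetical helper names (embed and toCharRefs); the escape format matches the hexadecimal references used in the example above. If you also escape the string for HTML (for example & as &amp;), do that before calling toCharRefs, or the references it produces will be escaped too.

```javascript
const RLE = "\u202B"; // RIGHT-TO-LEFT EMBEDDING
const LRE = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

// Hypothetical helper: gives `text` its own base direction in contexts
// where markup cannot be used, such as attribute values.
function embed(text, dir) {
  if (dir !== "rtl" && dir !== "ltr") {
    throw new RangeError(`unknown direction: ${dir}`);
  }
  return (dir === "rtl" ? RLE : LRE) + text + PDF;
}

// Hypothetical helper: rewrites the invisible bidi control characters
// (LRM, RLM, LRE, RLE, PDF, LRO, RLO) as hexadecimal numeric character
// references, so that they remain visible in the HTML source.
function toCharRefs(text) {
  return text.replace(/[\u200E\u200F\u202A-\u202E]/g, (ch) =>
    `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`
  );
}

// "..." stands in for the Hebrew event name, as in the example above.
const content = "Report on the " + toCharRefs(embed("...", "rtl")) + " event.";
```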

When weak or neutral characters or objects appear at the wrong side of a directional run, fix it using dir if there is markup already in place, or use an RLM/LRM.
How to

If the directional run is surrounded by markup, you can simply add the dir attribute to the element surrounding it.

If not, rather than use the RLE/LRE plus PDF controls to create embedded text, place U+200F RIGHT-TO-LEFT MARK (RLM) or U+200E LEFT-TO-RIGHT MARK (LRM) alongside misplaced characters to produce the desired result.

The RLM/LRM characters can be added as either characters or as escapes. (But see the issues associated with escapes in the section Adding escapes to the content.)

Note: Although we talk in terms of characters in this technique, the same principles apply to objects such as checkboxes, images, radio buttons, etc., since they are treated in the same way as neutral characters.

For examples and more detailed explanations see the discussion that follows.

Discussion

You need to be familiar with the concepts in the article What you need to know about the bidi algorithm and inline markup to understand this technique.

Weakly-typed or neutral characters between different directional runs take on the directionality of the base direction. This can be an issue if the character in question is part of, but on the edge of, a directional run which has a different direction from the current base direction.

You can deal with misplaced characters by either providing a different base direction, or by making sure the problematic character is followed by an appropriate strongly-typed character. Example 14 illustrates the problem and both of these solutions.

In the example text immediately below, the exclamation mark is part of the Arabic phrase and should have appeared to its left. It appears to the right because it falls, in memory, between an Arabic and a Latin character and the overall paragraph direction is LTR. It is therefore treated as part of the English text (as is the adjacent quotation mark).

Visual ASCII version: the title is "SDRADNATS BEW OT YEK EHT!".

This is what we should have seen.

Visual ASCII version: the title is "!SDRADNATS BEW OT YEK EHT".

An easy way to fix this is to insert the Unicode character U+200F, called the RIGHT-TO-LEFT MARK (RLM), after the exclamation mark. Now with two strong RTL characters on either side, the exclamation mark too will be treated as part of the RTL directional run and we will get the correct result. Here's the markup that would produce it. We use a numeric character reference so that you can see the character.

<p>The title is "...&#x200F;".</p>

Note, however, that in a case such as this you are likely to have markup in place around the Arabic text to identify its semantics, assign a language tag or apply appropriate styling. If that is the case, it is equally simple to just add a dir attribute to the existing markup, as shown here.

<p>The title is "<cite lang="ar" dir="rtl">...</cite>".</p>
Note: The use of an RLM/LRM character only works in simple cases like Example 14, where the embedded text in the sentence is a single directional run. If it contains bidirectional elements, you will need to apply the approach outlined in 7.1 Use the dir attribute for inline nested segments.

Although our base text for Example 14 was in Latin script, you are more likely to encounter this kind of problem in an Arabic paragraph that includes English text followed by punctuation. In that case you would use U+200E LEFT-TO-RIGHT MARK (LRM) to address the problem. Here is an example.

In the example text immediately below, the parenthesis to the left is part of the English phrase and should have appeared to its right. It appears to the left because it falls, in memory, between a Latin and Arabic character and the overall paragraph direction is RTL. It is therefore treated as part of the Arabic text. Note: the shape of the parenthesis is irrelevant, since it is a mirrored character.

Visual ASCII version: DNA NOITCUDORTNI NA ROF (web content accessibility guidelines (wcag EES
.wcag ROF LAIRETAM LANOITACUDE DNA LACINHCET OT SKNIL

This is what we should have seen.

Visual ASCII version: DNA NOITCUDORTNI NA ROF web content accessibility guidelines (wcag) EES
.wcag ROF LAIRETAM LANOITACUDE DNA LACINHCET OT SKNIL

The easy way to fix this is to insert the Unicode character U+200E, called the LEFT-TO-RIGHT MARK (LRM), after the parenthesis that is in the wrong place. Now with two strong LTR characters on either side, the parenthesis will be treated as part of the LTR directional run and we will get the correct result.

If, however, the text "Web Content Accessibility Guidelines (WCAG)" is surrounded by markup, it is equally simple and effective to just add a dir attribute to the existing markup, rather than insert the character.
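Where such phrases are inserted by a script, the mark can be added at the point of insertion. The following JavaScript is a minimal sketch, assuming a hypothetical helper named closeRun that is told the direction of the phrase it receives. As the note above explains, this only works when the phrase is a single directional run; bidirectional phrases need an embedding instead.

```javascript
const MARKS = { rtl: "\u200F", ltr: "\u200E" }; // RLM, LRM

// Hypothetical helper: ends a phrase whose direction (`phraseDir`) differs
// from the surrounding text with a strong mark of the phrase's own direction,
// so that trailing neutral characters such as "!" or ")" stay in its run.
function closeRun(phrase, phraseDir) {
  const mark = MARKS[phraseDir];
  if (!mark) throw new RangeError(`unknown direction: ${phraseDir}`);
  // Do not add a second mark if one is already there.
  return phrase.endsWith(mark) ? phrase : phrase + mark;
}

// Example 14: an RTL title ending in "!" inside an English sentence
// ("..." stands in for the Arabic text).
const sentence = `The title is "${closeRun("...!", "rtl")}".`;
```

For the second example, the English phrase inside the Arabic paragraph would be passed as closeRun("Web Content Accessibility Guidelines (WCAG)", "ltr").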

Here is a slightly different-looking example that turns out to be the same problem. This one is particularly troublesome because it is not obvious that a mistake has been made.

The first part of this MAC number has been moved to the right. This is because the characters between the Hebrew text and the 'aa' are all neutral or weak, and so they take on the base direction, and are associated with the Hebrew directional run.

Visual ASCII version: aa:04:bb:06:01:02 REBMUN

This is what we should have seen.

Visual ASCII version: 01:02:aa:04:bb:06 REBMUN

Again, we can produce the right ordering just by putting an LRM character immediately before the start of the number. This puts the initial digits and colons between two strong LTR characters, which associates them with the rest of the number.

Telephone numbers that use certain separators can be affected in the same way.
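If such identifiers are inserted into right-to-left text by a script, the LRM can be added automatically. The following JavaScript is a minimal sketch with a hypothetical helper name, assuming MAC addresses written as six colon-separated pairs of hexadecimal digits; telephone numbers would need their own pattern.

```javascript
const LRM = "\u200E"; // LEFT-TO-RIGHT MARK

// Six pairs of hexadecimal digits separated by colons, e.g. 01:02:aa:04:bb:06.
const MAC_ADDRESS = /\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b/g;

// Hypothetical helper: puts an LRM immediately before every MAC address in
// the text, so that its leading digits and colons stay with the rest of it.
function protectMacAddresses(text) {
  return text.replace(MAC_ADDRESS, (mac) => LRM + mac);
}
```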

When adjacent but separate directional runs with the same directionality are rendered in the wrong order, use RLM/LRM.
How to

If the base direction is right-to-left, place an RLM character (U+200F RIGHT-TO-LEFT MARK) between the directional runs to produce the desired result. Otherwise, use an LRM character (U+200E LEFT-TO-RIGHT MARK).

These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section Adding escapes to the content.)

Note that the dir attribute is not appropriate to resolve this case.

Discussion

This technique is relevant when you have a list or sequence of items in text that includes two or more adjacent items with the same directionality, where that directionality is different from the current base direction.

It will be easiest to describe this using some examples.

In the sentence that follows, the first and second Arabic words in the list of states are in the wrong order, and the comma is misplaced.

Visual ASCII version: the names of these states in arabic are NIARHAB ,TPYGE and TIAWUK respectively.

This is because the first two Arabic words have the same direction and are separated only by neutral and weak characters. These adopt the directionality of the surrounding characters, so the two words and the characters between them constitute a single right-to-left directional run. The text should look like this.

Visual ASCII version: the names of these states in arabic are TPYGE, NIARHAB and TIAWUK respectively.

To achieve the desired effect, we need to break the directional run by adding a strongly-typed left-to-right character between the two words. This has the additional effect of changing the directionality of the comma to LTR: it now falls between two characters of differing directionality, and so takes on the base direction. The character we added is an invisible LRM. Here is the markup.

<p>The names of these states in Arabic are ...,&#x200E; ... and ... respectively.</p>
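When such a list is assembled by a script, the mark can be inserted between items automatically. The following JavaScript is a minimal sketch with a hypothetical helper name; the mark must match the base direction of the surrounding text (LRM for left-to-right, RLM for right-to-left). The mark only needs to fall somewhere between the two runs: the markup above places it straight after the comma, while this sketch places it after the whole separator.

```javascript
const MARKS = { ltr: "\u200E", rtl: "\u200F" }; // LRM, RLM

// Hypothetical helper: joins list items, adding a strong mark of the base
// direction after each separator so that adjacent items written in the
// other direction cannot merge into a single directional run.
function joinItems(items, separator, baseDir) {
  const mark = MARKS[baseDir];
  if (!mark) throw new RangeError(`unknown base direction: ${baseDir}`);
  return items.join(separator + mark);
}

// "A" and "B" stand in for the first two Arabic state names.
const states = joinItems(["A", "B"], ", ", "ltr");
```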

Example 17 shows a case that occurs only rarely in English. Because of the likelihood of Latin text showing up in languages written with the Arabic or Hebrew scripts, this situation is more common when writing in those languages. Example 18 shows a typical case.

In the next sentence, which is right-to-left, the acronym and the following number are incorrectly ordered, and the neutral parentheses add to the confusion.

Visual ASCII version: wcag) 2.0) SENILEDIUG YTILIBISSECCA TNETNOC BEW

This is what we expected to see.

Visual ASCII version: 2.0 (wcag) SENILEDIUG YTILIBISSECCA TNETNOC BEW

The problem is caused by the assumption on the part of the bidi algorithm that WCAG and 2.0 are part of the same directional run, whereas in reality the 2.0 is related to the Hebrew title, rather than the acronym (which is what should be in parentheses). To solve the problem, we add an RLM character between the two, to break this into two directional runs, which are then ordered right to left.

The same problem can occur when, say, a Persian sentence ends in an English word followed by a period, and then the next sentence starts with an English word. To avoid the two words being swapped around you need to put an RLM after the end of the first sentence.

This same issue also applies to sequences of items such as checkbox or radio button labels or other lists of items on a page. See Example 19 for an illustration.

In this checklist of languages the author had intended English to be nearer the beginning of the sentence than French (i.e., further to the right), but the checkbox is treated as a directionally neutral item in the text, and so the English and French items are treated as a single directional run. This leads to them being displayed the wrong way round. To add to the confusion, it looks as if Arabic and English have been selected by the user, whereas in fact the user has selected Arabic and French!

Visual ASCII version: UDRU NAISREP english français CIBARA :SEGAUGNAL

The solution here is the same as before. Simply add an RLM character after the English label, and you get the following.

Visual ASCII version: UDRU NAISREP français english CIBARA :SEGAUGNAL

There are a number of scenarios where you need to look out for this issue. For example, it is common to create navigation bars from items listed in a ul element that are then rendered as inline items using CSS display: inline. For this to work, you will need to end the relevant li content with an RLM or LRM (depending on the base direction of the context in which the items will be displayed).

The same applies to lists that are created at run time using scripting, such as the one shown at the top left of the page in Example 20, where the language links are generated automatically from information about which language versions are available for that page. In this case, the same mechanism and labels are used as for left-to-right pages; the script detects that the language of the page is written right-to-left, and then adds an RLM to all of the labels, whatever their language.
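The following JavaScript is a minimal sketch of that approach, not the script used on that page. It assumes a hypothetical list of right-to-left languages and deliberately looks only at the primary language subtag, so a tag such as az-Arab would be misclassified; in practice you might read the page's dir value instead.

```javascript
// Hypothetical list of primary language subtags whose usual script is
// written right to left; adjust it for the languages your site supports.
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur", "dv", "yi"]);

function pageDirection(langTag) {
  // Simplification: looks only at the primary subtag, so tags such as
  // "az-Arab" (a right-to-left script) are not recognized.
  const primary = langTag.toLowerCase().split("-")[0];
  return RTL_LANGUAGES.has(primary) ? "rtl" : "ltr";
}

// Ends every generated label with a mark matching the page's base direction,
// whatever the language of the label itself.
function navLabels(labels, pageLang) {
  const mark = pageDirection(pageLang) === "rtl" ? "\u200F" : "\u200E"; // RLM : LRM
  return labels.map((label) => label + mark);
}
```

In a page, each returned label could then be assigned to the textContent of its li or a element.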

Use stateful Unicode control characters for bidirectional control only for attribute text or element text that allows no internal markup.
How to

In HTML do not use the Unicode characters RLE, LRE, RLO, LRO and PDF where markup is available. To show the limits of embedded text with a different base direction, use the dir attribute, and to override the bidirectional algorithm use the bdo element.

Note: Two non-embedding directional control characters provided by Unicode have no corresponding markup, and should therefore be used where needed. These are U+200F RIGHT-TO-LEFT MARK (RLM) and U+200E LEFT-TO-RIGHT MARK (LRM).

On the other hand, attribute values and the text of elements that allow no internal markup, such as the title, textarea and option elements, cannot use dir on a span or other element to label part of their content.

In these cases you need to use Unicode characters to do the job. The following table shows correspondences between markup and Unicode control codes:

| Markup | Code | Code point | Description |
|---|---|---|---|
| dir="rtl" | RLE | U+202B | Same effect as the start tag of a block or inline element with the dir attribute set to rtl. |
| dir="ltr" | LRE | U+202A | Same effect as the start tag of a block or inline element with the dir attribute set to ltr. |
| <bdo dir="rtl"> | RLO | U+202E | Same effect as the start tag of a bdo element with the dir attribute set to rtl. |
| <bdo dir="ltr"> | LRO | U+202D | Same effect as the start tag of a bdo element with the dir attribute set to ltr. |
| end tag | PDF | U+202C | When used to terminate RLE or LRE, equivalent to the end tag of the element carrying the dir attribute. When used to terminate RLO or LRO, equivalent to the </bdo> tag. |

These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section Adding escapes to the content.)
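The correspondence in the table can be applied mechanically when text prepared with markup has to go into an attribute value or a title, textarea or option element. The following JavaScript is a minimal sketch with a hypothetical helper name, assuming the input marks direction only with span and bdo elements and that every such element carries a dir attribute. Other tags and character references are passed through unchanged; it uses regular expressions and is not a general HTML parser.

```javascript
// Start-tag equivalents from the table above.
const OPENERS = {
  "span:rtl": "\u202B", // RLE
  "span:ltr": "\u202A", // LRE
  "bdo:rtl": "\u202E", // RLO
  "bdo:ltr": "\u202D", // LRO
};
const PDF = "\u202C"; // end-tag equivalent

const TAG = /<(\/?)(span|bdo)\b([^>]*)>/g;
const DIR_ATTR = /\bdir\s*=\s*"(rtl|ltr)"/;

// Hypothetical helper: replaces <span dir> and <bdo dir> start tags and their
// end tags with the corresponding Unicode control characters.
function markupToControls(fragment) {
  const open = []; // element names, to check that tags nest properly
  let result = "";
  let last = 0;
  for (const m of fragment.matchAll(TAG)) {
    const [tagText, closing, name, attrs] = m;
    result += fragment.slice(last, m.index); // text before this tag
    last = m.index + tagText.length;
    if (closing) {
      if (open.pop() !== name) throw new Error(`unexpected </${name}>`);
      result += PDF;
    } else {
      const dir = DIR_ATTR.exec(attrs);
      if (!dir) throw new Error(`<${name}> has no dir attribute`);
      open.push(name);
      result += OPENERS[`${name}:${dir[1]}`];
    }
  }
  if (open.length > 0) throw new Error(`unclosed <${open[open.length - 1]}>`);
  return result + fragment.slice(last);
}
```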

Discussion

The HTML 4 specification specifically warns against mixing the two approaches because of the increased likelihood of improper nesting. It also recommends the use of markup because it "offers a better guarantee of document structural integrity and alleviates some problems when editing bidirectional HTML text with a simple text editor". It does not proscribe the use of Unicode bidi formatting codes.

The joint Unicode Technical Report #20 and W3C Note, Unicode in XML and other Markup Languages, goes further. It explicitly recommends that only the markup be used. It also recommends that the Unicode bidi formatting codes should be ignored if detected in a browser context, and replaced by appropriate markup when received in an editing context.

Of course, in attribute values or for the three elements listed above markup cannot be used, so the Unicode control characters are the only option available.

For further discussion, see the article Bidi formatting codes vs. markup in (X)HTML.

Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes.
How to
Note: This technique is described as a way to work around the fact that some browsers don't always do what you would expect. Note, also, that this is only a problem for bidirectional text. Monodirectional text should look fine, because there is no need to correctly order a sequence of different directional runs.

Put the Unicode characters RLE (U+202B) or LRE (U+202A) at the beginning, and PDF (U+202C) at the end of bidirectional text that you expect to be displayed in one of the following situations:

  • as a tooltip
  • in the page title
  • on a JavaScript dialog box

These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section Adding escapes to the content.)

Discussion

Bidirectional text that is displayed in page titles or JavaScript dialog boxes in a browser with a left-to-right locale is typically displayed with a base direction of left-to-right. This means that directional runs in these contexts are also ordered left-to-right. At the time of writing, Internet Explorer and Firefox respect the base direction of the content when displaying tooltips for title attributes, but other browsers do not.

Here is a screenshot of a tooltip in a right-to-left page that displays correctly.

Picture of bidirectional text in a tooltip, with directional runs in the correct order.

On a different browser, the text in the same tooltip has the two Arabic words on the wrong sides of the 'W3C' text, because a base direction of LTR has been applied.

Picture of bidirectional text in a tooltip, with directional runs in the wrong order.

Since markup is not effective in any of these situations, Unicode control characters can be used to establish the base direction as right-to-left. This produces the desired effect on most browsers.

For more information about handling of these situations in browsers and examples, follow the link to more information, below, and look at the test and results pages.
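The following JavaScript is a minimal sketch of this technique with hypothetical helper names. Because JavaScript regular expressions have no property for bidi class, it approximates strong right-to-left and left-to-right characters by script, which misses some characters (and counts Arabic-Indic digits as right-to-left). It wraps text only when it contains both, since single-direction text is not affected.

```javascript
// Rough stand-ins for strong right-to-left and left-to-right characters,
// based on Unicode script properties rather than bidi classes.
const STRONG_RTL = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;
const STRONG_LTR = /[\p{Script=Latin}\p{Script=Greek}\p{Script=Cyrillic}]/u;

function isBidirectional(text) {
  return STRONG_RTL.test(text) && STRONG_LTR.test(text);
}

// Hypothetical helper: for text headed for a tooltip (title attribute),
// the page title or a dialog box, wraps bidirectional text in RLE ... PDF
// or LRE ... PDF so that it gets the intended base direction.
function forPlainTextUi(text, baseDir) {
  if (!isBidirectional(text)) return text; // single-direction text is fine as is
  const start = baseDir === "rtl" ? "\u202B" : "\u202A"; // RLE : LRE
  return start + text + "\u202C"; // ... PDF
}
```

In a page, the result might be assigned to document.title or to a title attribute, or passed to alert().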

Do not leave white space at the end of inline elements that mark a directional boundary.
How to

Remove all white space from before the end tag of an inline element that changes the base direction.

Discussion

Spaces between directional runs may appear to collapse at the boundary of an embedding if there is a space just before the end tag of the inline element that surrounds the embedded text. Here is an example.

The following picture shows the problem. (Note the missing space between the Hebrew word on the right and the word 'in'.)

An example of text that is apparently missing a space.

Here is the source text that produced that result.

<p>The title says <span dir="rtl" lang="he">... W3C </span> in Hebrew.</p>

Note carefully the space between the C of W3C and the < of the following </span>. This is what causes the effect. If you simply eliminate that space, you get what you expected, which is what is shown next.

The same text, now displayed with the expected space between the Hebrew word and 'in'.

Although this seems paradoxical, that an extra space can cause a missing space, it is not a bug. For a detailed explanation of why this happens see the article Bidi space loss.
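Where HTML is generated or cleaned up by a script, this can be automated. The following JavaScript is a minimal sketch with a hypothetical helper name, assuming the inline elements of interest are limited to those listed in the pattern. It uses a regular expression rather than an HTML parser, so it knows nothing about comments, pre elements or script content. Moving the space outside the end tag is usually harmless even for elements that do not change direction.

```javascript
// White space immediately before the end tag of one of these inline elements.
const SPACE_BEFORE_END_TAG = /(\s+)(<\/(?:span|bdo|cite|q|em|strong)>)/g;

// Hypothetical helper: moves that white space to just after the end tag.
function moveSpaceOutOfInlineEnd(html) {
  return html.replace(SPACE_BEFORE_END_TAG, "$2$1");
}
```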

8 Handling parentheses & other mirrored characters

Treat mirrored characters as if any word left in the name meant 'opening', and right meant 'closing'.
How to

Whatever the base direction of the text you are authoring, always use U+0028 LEFT PARENTHESIS and U+0029 RIGHT PARENTHESIS (or their equivalents in non-Unicode but logical encodings) as the opening parenthesis and closing parenthesis, respectively. Ignore the actual names of these characters. Allow the rendering algorithms to choose the appropriate shape for you.

The same applies to the other mirrored characters.

The following text runs right to left. The first parenthesis in memory is U+0028 LEFT PARENTHESIS and the second is U+0029 RIGHT PARENTHESIS. The rendering automatically produces the correct shape for the displayed glyphs.

Example of parentheses changing shape.
Discussion

Mirrored characters are used according to their Unicode semantics, rather than their actual displayed shape. Many of them are paired punctuation characters, but some are single characters. The shape of a mirrored character when displayed will automatically change according to the directional context.

The following picture shows some text before and after an RLM has been inserted alongside the first parenthesis (shown in red). Before the RLM was added, the bidi algorithm assumed that the parenthesis was part of the LTR directional run, i.e., the Latin text. After the RLM was added, the parenthesis was between two characters with different directionality, and therefore took on the directionality of the base direction, i.e., RTL. Note how the shape is automatically changed to reflect this. No change was made to the text other than inserting the RLM.

Example of parentheses changing shape.

It is unfortunate that Unicode character names cannot be changed; otherwise the parentheses you see in Example 24 would have been named OPENING PARENTHESIS and CLOSING PARENTHESIS instead, to make their use clearer.
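To see why the author never needs to pick a glyph, here is a toy JavaScript model of what the rendering system does with a single right-to-left run. It is an illustration only, under stated simplifications: the function name is hypothetical, the mirror map covers just a few pairs (real renderers use the Unicode Bidi_Mirroring_Glyph data), and it assumes the whole string is one run at a right-to-left level. The characters are painted in reverse memory order and each mirrored character is swapped for its mirror image, so the U+0028 LEFT PARENTHESIS stored first ends up at the right-hand edge with a ')' shape, which is the opening parenthesis for a right-to-left reader.

```javascript
// A few mirrored pairs; real renderers use the full Unicode mirroring data.
const MIRROR = { "(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{", "<": ">", ">": "<" };

// Toy model: returns, from left to right, the glyphs painted for a single
// right-to-left run whose characters are stored in logical order.
function paintRtlRun(logical) {
  return [...logical].reverse().map((ch) => MIRROR[ch] ?? ch).join("");
}
```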

9 Overriding the Unicode bidirectional algorithm

Use the bdo element to force the directionality of a sequence of inline characters.
No UA specific notes.
How to

Surround the text with a bdo element. Set the value of the dir attribute on the bdo tag to either ltr or rtl, depending on the direction in which you want the characters to be laid out, as shown in the example below.

If it is not possible to use markup, e.g., in the title element or in attribute values, you will need to use Unicode control characters.

For examples and more detailed explanations see the discussion that follows.

Discussion

bdo stands for 'bidirectional override'. This inline element can be used to override the Unicode bidirectional algorithm, so that the characters it contains are simply laid out in the order in which they are stored in memory, in the direction given by its dir attribute.

This is not often required, but it can be very useful to correctly display part numbers or to show how text is stored in memory.
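When preparing such an illustration, it can help to confirm the stored order of the characters independently of how they are displayed. The following JavaScript is a minimal sketch with a hypothetical helper name that lists each character as a U+XXXX label, in memory order.

```javascript
// Hypothetical helper: lists the characters of `text` in the order they are
// stored in memory, as "U+XXXX" labels (at least four hex digits).
function memoryOrder(text) {
  return [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}
```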

This can be done using markup around the relevant text, or by adding Unicode control characters to the text. It is recommended that markup is used in preference to control characters because control characters create states with invisible boundaries, and are difficult to manage.

Using markup. Example 25 shows how to use bdo.

The following picture shows the same text as the bidirectional algorithm would display it, and how it would look if you use bdo markup to remove the effects of the bidi algorithm.

Visual ASCII version:
in the phrase "w3c ,YTIVITCA NOITAZILANOITANRETNI",
the order of characters in memory is:

INTERNATIONALIZATION ACTIVITY, w3c

Here is the markup that would produce it.

<p>In the phrase "<span dir="rtl" lang="he">...</span>" the order of characters in memory is:</p>

<p><bdo dir="ltr"> ... </bdo></p>

It is possible that the embedded text is not surrounded by markup, and you may need to add it, but note that the quotation here was already surrounded by markup. A span was used in order to label the language of the quotation. This is likely to be a common occurrence. In addition to marking up for language, quotations may be marked up with such things as a span or q element for styling or semantic properties. Given that the boundary of the quotation is already clearly marked, adding the dir attribute is simple and quick.

Note also, by the way, that we placed the span element inside the quotation marks, since these are a part of the English text.

Using control characters. Where markup is not available, such as in a title attribute value or an option element, you will have to use Unicode control characters to demarcate the required range of text and override its directionality.

To mark the beginning of the overridden section, use U+202E RIGHT-TO-LEFT OVERRIDE (RLO) or U+202D LEFT-TO-RIGHT OVERRIDE (LRO) to set the direction. These correspond to the markup <bdo dir="rtl"> and <bdo dir="ltr">, respectively. At the other end of the section, use U+202C POP DIRECTIONAL FORMATTING (PDF), which corresponds to </bdo> in markup terms.

These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section Adding escapes to the content.)

A Acknowledgments

Members of the Internationalization Working Group and former GEO Working Group have contributed their time and valuable comments to shaping these guidelines.