11. XHTML Bi-directional Text Module


This section is normative.

The Bi-directional Text module defines an element that can be used to declare the bi-directional rules for the element's content.

Elements    Attributes                     Minimal Content Model
bdo         Core, dir* ("ltr" | "rtl")     (PCDATA | Inline)*

When this module is used, the bdo element is added to the Inline content set of the Text Module. Selecting this module also adds the attribute dir* ("ltr" | "rtl") to the I18N attribute collection.

Implementation: DTD

11.1. The bdo element

The bidirectional algorithm and the dir attribute generally suffice to manage embedded direction changes. However, some situations may arise when the bidirectional algorithm results in incorrect presentation. The bdo element allows authors to turn off the bidirectional algorithm for selected fragments of text.


The Core collection
A collection of basic attributes used on all elements, including class, id, title.
dir = "ltr|rtl"
This mandatory attribute specifies the base direction of the element's text content. This direction overrides the inherent directionality of characters as defined in [UNICODE]. Possible values:
  • ltr: Left-to-right text.
  • rtl: Right-to-left text.
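
As a rough illustration of what "mandatory" means in practice, the following Python sketch checks that every bdo element in a fragment carries a dir attribute with one of the two permitted values. The helper name find_invalid_bdo and the sample fragment are hypothetical, and XML namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

ALLOWED_DIR_VALUES = {"ltr", "rtl"}


def find_invalid_bdo(fragment: str) -> list[str]:
    """Return the text of each bdo element whose dir is missing or invalid.

    Hypothetical helper: namespaces are ignored, and the fragment must be
    well-formed XML with a single root element.
    """
    root = ET.fromstring(fragment)
    invalid = []
    for bdo in root.iter("bdo"):
        # A missing attribute yields None, which is not an allowed value.
        if bdo.get("dir") not in ALLOWED_DIR_VALUES:
            invalid.append("".join(bdo.itertext()))
    return invalid


sample = (
    "<p>"
    '<bdo dir="ltr">part A-12</bdo>'
    "<bdo>missing direction</bdo>"
    '<bdo dir="up">bad value</bdo>'
    "</p>"
)
print(find_invalid_bdo(sample))
```

A validating parser working from the DTD would normally perform this check; the sketch only makes the rule concrete.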

Consider a document containing the same text as before:

english1 HEBREW2 english3 HEBREW4 english5 HEBREW6

but assume that this text has already been put in visual order. One reason for this may be that the MIME standard ([RFC2045], [RFC1556]) favors visual order, i.e., that right-to-left character sequences are inserted right-to-left in the byte stream. In an email, the above might be formatted, including line breaks, as:

english1 2WERBEH english3
4WERBEH english5 6WERBEH

This conflicts with the [UNICODE] bidirectional algorithm, because that algorithm would invert 2WERBEH, 4WERBEH, and 6WERBEH a second time, displaying the Hebrew words left-to-right instead of right-to-left.
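
The double inversion can be sketched with a toy model in Python. This is not the [UNICODE] bidirectional algorithm: it assumes, purely for illustration and following the notation of the example, that any word containing an uppercase letter is right-to-left and is reversed as a whole. The function name reverse_rtl_words is hypothetical.

```python
def reverse_rtl_words(line: str) -> str:
    """Toy stand-in for the bidirectional algorithm.

    Assumption for illustration only: a word containing an uppercase letter
    is right-to-left and is reversed; every other word is left untouched.
    """
    return " ".join(
        word[::-1] if any(ch.isupper() for ch in word) else word
        for word in line.split(" ")
    )


logical = "english1 HEBREW2 english3"
visual = reverse_rtl_words(logical)    # what the email already stores
displayed = reverse_rtl_words(visual)  # what a second inversion would show

print(visual)
print(displayed)
```

Under this model the second pass restores the logical sequence, so the Hebrew word would appear left-to-right on screen, which is the conflict described above.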

The solution in this case is to override the bidirectional algorithm by putting the email excerpt in a pre element (to conserve line breaks) and each line in a bdo element, whose dir attribute is set to ltr:

<pre>
<bdo dir="ltr">english1 2WERBEH english3</bdo>
<bdo dir="ltr">4WERBEH english5 6WERBEH</bdo>
</pre>

This tells the bidirectional algorithm "Leave me left-to-right!" and would produce the desired presentation:

english1 2WERBEH english3
4WERBEH english5 6WERBEH

The bdo element should be used in scenarios where absolute control over sequence order is required (e.g., multi-language part numbers). The dir attribute is mandatory for this element.

Authors may also use special Unicode characters to override the bidirectional algorithm: LEFT-TO-RIGHT OVERRIDE (hexadecimal 202D) or RIGHT-TO-LEFT OVERRIDE (hexadecimal 202E). The POP DIRECTIONAL FORMATTING (hexadecimal 202C) character ends either bidirectional override.
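
A minimal Python sketch of the character-based override follows. The three code points are the ones named above; the helper override and its "ltr"/"rtl" argument are hypothetical conveniences, not part of any specification.

```python
import unicodedata

LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def override(text: str, direction: str) -> str:
    """Wrap text in an override character and a closing PDF (hypothetical helper)."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    opener = LRO if direction == "ltr" else RLO
    return opener + text + PDF


wrapped = override("english1 2WERBEH english3", "ltr")

# Show the invisible characters that now bracket the text.
for ch in (wrapped[0], wrapped[-1]):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Because these characters are invisible in most editors, markup such as bdo is usually easier to review than the raw characters.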

Note. Recall that conflicts can arise if the dir attribute is used on inline elements (including bdo) concurrently with the corresponding [UNICODE] formatting characters.

Bidirectionality and character encoding. According to [RFC1555] and [RFC1556], there are special conventions for the use of "charset" parameter values to indicate bidirectional treatment in MIME mail, in particular to distinguish between visual, implicit, and explicit directionality. The parameter value "ISO-8859-8" (for Hebrew) denotes visual encoding, "ISO-8859-8-i" denotes implicit bidirectionality, and "ISO-8859-8-e" denotes explicit directionality.

Because XHTML uses the Unicode bidirectionality algorithm, conforming documents encoded using ISO 8859-8 must be labeled as "ISO-8859-8-i". Explicit directional control is also possible with XHTML, but cannot be expressed with ISO 8859-8, so "ISO-8859-8-e" should not be used.
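
The labeling rule can be summarized in a small Python sketch. The table restates the three labels described above; the helper label_is_acceptable is hypothetical and covers only the ISO 8859-8 family.

```python
# Directionality implied by each "charset" label, as described in the text.
CHARSET_DIRECTIONALITY = {
    "iso-8859-8": "visual",
    "iso-8859-8-i": "implicit",
    "iso-8859-8-e": "explicit",
}


def label_is_acceptable(charset: str) -> bool:
    """Return True if a document encoded in ISO 8859-8 may carry this label.

    Hypothetical helper: only the implicit label qualifies. Labels outside
    the ISO 8859-8 family are out of scope and raise KeyError.
    """
    return CHARSET_DIRECTIONALITY[charset.lower()] == "implicit"


for label in ("ISO-8859-8", "ISO-8859-8-i", "ISO-8859-8-e"):
    print(label, label_is_acceptable(label))
```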

The value "ISO-8859-8" implies that the document is formatted visually, misusing some markup (such as table with right alignment and no line wrapping) to ensure reasonable display on older user agents that do not handle bidirectionality. Such documents do not conform to the present specification. If necessary, they can be made to conform to the current specification (and at the same time will be displayed correctly on older user agents) by adding bdo markup where necessary. Contrary to what is said in [RFC1555] and [RFC1556], ISO-8859-6 (Arabic) is not visual ordering.

11.1.1. Character references for directionality and joining control

Since ambiguities sometimes arise as to the directionality of certain characters (e.g., punctuation), the [UNICODE] specification includes characters to enable their proper resolution. Also, Unicode includes some characters to control joining behavior where this is necessary (e.g., some situations with Arabic letters). XHTML includes character references for these characters.

The following DTD excerpt presents some of the directional entities:

   <!ENTITY zwnj "&#8204;"> <!-- zero width non-joiner -->
   <!ENTITY zwj  "&#8205;"> <!-- zero width joiner -->
   <!ENTITY lrm  "&#8206;"> <!-- left-to-right mark -->
   <!ENTITY rlm  "&#8207;"> <!-- right-to-left mark -->
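
For readers who want to confirm the mapping, the following Python sketch decodes the four named references and their numeric equivalents. It relies on the entity table in Python's standard html module, which is assumed here to agree with the XHTML entity set for these four names.

```python
from html import unescape
from html.entities import name2codepoint

for name in ("zwnj", "zwj", "lrm", "rlm"):
    code_point = name2codepoint[name]
    # The named and numeric references decode to the same character.
    assert unescape(f"&{name};") == unescape(f"&#{code_point};") == chr(code_point)
    print(f"&{name};  &#{code_point};  U+{code_point:04X}")
```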

The zwnj entity is used to block joining behavior in contexts where joining will occur but shouldn't. The zwj entity does the opposite; it forces joining when it wouldn't occur but should. For example, the Arabic letter "HEH" is used to abbreviate "Hijri", the name of the Islamic calendar system. Since the isolated form of "HEH" looks like the digit five as employed in Arabic script (based on Indic digits), the initial form of "HEH" is used to prevent readers from confusing "HEH" with a final digit five in a year. However, there is no following context (i.e., a joining letter) to which the "HEH" can join. The zwj character provides that context.

Similarly, in Persian texts, there are cases where a letter that normally would join a subsequent letter in a cursive connection should not. The character zwnj is used to block joining in such cases.
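
The "HEH" case can be written out as a character sequence. The Python sketch below only builds the string and prints the character names; whether the initial form is actually displayed depends on the shaping engine and font, which are outside its scope. The variable name hijri_abbreviation is hypothetical.

```python
import unicodedata

HEH = "\u0647"   # ARABIC LETTER HEH
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# HEH followed by ZWJ: the joiner supplies the missing following context,
# so a shaping engine can select the initial form of HEH.
hijri_abbreviation = HEH + ZWJ

for ch in hijri_abbreviation + ZWNJ:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```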

The other characters, lrm and rlm, are used to force directionality of directionally neutral characters. For example, if a double quotation mark comes between an Arabic (right-to-left) and a Latin (left-to-right) letter, the direction of the quotation mark is not clear (is it quoting the Arabic text or the Latin text?). The lrm and rlm characters have a directional property but no width and no word/line break property. Please consult [UNICODE] for more details.
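
The directional properties involved can be inspected with Python's unicodedata module, as in the sketch below. The choice of sample characters (a quotation mark, a Latin letter, and the Arabic letter alef) is an assumption made for illustration.

```python
import unicodedata

samples = {
    "quotation mark": '"',
    "Latin letter a": "a",
    "Arabic letter alef": "\u0627",
    "lrm": "\u200E",
    "rlm": "\u200F",
}

# Print the Unicode bidirectional class of each sample character.
for label, ch in samples.items():
    print(f"{label}: {unicodedata.bidirectional(ch)}")
```

The quotation mark reports a neutral class (ON), while lrm and rlm report the strong classes L and R. Placing one of the two marks next to the quotation mark therefore gives it a strong neighbor in the intended direction.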

Mirrored character glyphs. In general, the bidirectional algorithm does not mirror character glyphs but leaves them unaffected. The exception is characters such as parentheses (see [UNICODE], table 4-7). In cases where mirroring is desired, for example for Egyptian Hieroglyphs, Greek Boustrophedon, or special design effects, this should be controlled with styles.
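
Which characters carry the mirrored property can be checked with Python's unicodedata module. The sample characters in this sketch are chosen only for illustration.

```python
import unicodedata

# unicodedata.mirrored returns 1 for characters that are mirrored in
# right-to-left text and 0 otherwise.
for ch in "()[]<>aA":
    status = "mirrored" if unicodedata.mirrored(ch) else "not mirrored"
    print(f"{ch!r}: {status}")
```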

11.1.2. The effect of style sheets on bidirectionality

In general, using style sheets to change an element's visual rendering from block-level to inline or vice-versa is straightforward. However, because the bidirectional algorithm relies on the inline/block-level distinction, special care must be taken during the transformation.

When an inline element that does not have a dir attribute is transformed to the style of a block-level element by a style sheet, it inherits the dir attribute from its closest parent block element to define the base direction of the block.

When a block element that does not have a dir attribute is transformed to the style of an inline element by a style sheet, the resulting presentation should be equivalent, in terms of bidirectional formatting, to the formatting obtained by explicitly adding a dir attribute (assigned the inherited value) to the transformed element.
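
The inheritance described in this section can be sketched as a walk up the element tree. The Python example below is a simplification under stated assumptions: it treats every ancestor as a candidate rather than distinguishing block-level from inline elements, it ignores namespaces, and its fallback value "ltr" for a tree with no dir attribute is an assumption. The helper name inherited_dir is hypothetical.

```python
import xml.etree.ElementTree as ET


def inherited_dir(root: ET.Element, target: ET.Element, default: str = "ltr") -> str:
    """Return target's own dir attribute, or that of its closest ancestor.

    Hypothetical helper: does not distinguish block-level from inline
    ancestors, and falls back to `default` when no dir is found.
    """
    # ElementTree keeps no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        if node.get("dir") is not None:
            return node.get("dir")
        node = parent_of.get(node)
    return default


doc = ET.fromstring('<div dir="rtl"><p><span>text</span></p></div>')
span = doc.find(".//span")
print(inherited_dir(doc, span))
```

The returned value is the one that would be written explicitly as a dir attribute on the transformed element to obtain equivalent bidirectional formatting.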