3.2.6 Requirements relating to bidirectional-algorithm formatting characters

Text content in HTML elements with child Text nodes, and text in attributes of HTML elements that allow free-form text, may contain characters in the range U+202A to U+202E (the bidirectional-algorithm formatting characters). However, the use of these characters is restricted so that any embeddings or overrides generated by these characters do not start and end with different parent elements, and so that all such embeddings and overrides are explicitly terminated by a U+202C POP DIRECTIONAL FORMATTING character. This helps reduce incidences of text being reused in a manner that has unforeseen effects on the bidirectional algorithm.

The aforementioned restrictions are defined by specifying that certain parts of documents form bidirectional-algorithm formatting character ranges, and then imposing a requirement on such ranges.

The strings resulting from applying the following algorithm to an HTML element (called element in the steps below) are bidirectional-algorithm formatting character ranges:

  1. Let output be an empty list of strings.

  2. Let string be an empty string.

  3. Let node be the first child node of element, if any, or null otherwise.

  4. Loop: If node is null, jump to the step labeled end.

  5. Process node according to the first matching step from the following list:

    If node is a Text node

      Append the text data of node to string.

    If node is a br element
    If node is an HTML element that is flow content but that is not also phrasing content

      If string is not the empty string, push string onto output, and let string be the empty string.

    Otherwise

      Do nothing.

  6. Let node be node's next sibling, if any, or null otherwise.

  7. Jump to the step labeled loop.

  8. End: If string is not the empty string, push string onto output.

  9. Return output as the bidirectional-algorithm formatting character ranges.
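The steps above can be sketched in Python as follows. This is an illustrative sketch, not a conforming implementation: the Text and Element classes are a hypothetical minimal stand-in for a real DOM, the function name formatting_character_ranges is invented for the example, and FLOW_NOT_PHRASING is only a partial list of elements that are flow content but not phrasing content. The explicit loop/jump steps are expressed as an ordinary for loop over the element's direct children.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical, minimal node model standing in for a real DOM.
@dataclass
class Text:
    data: str


@dataclass
class Element:
    tag: str
    children: List[Node] = field(default_factory=list)


Node = Union[Text, Element]

# Illustrative subset only: elements that are flow content but not phrasing
# content. A real checker would derive this from the full content models.
FLOW_NOT_PHRASING = {
    "address", "article", "aside", "blockquote", "div", "dl", "fieldset",
    "figure", "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6",
    "header", "hr", "main", "nav", "ol", "p", "pre", "section", "table", "ul",
}


def formatting_character_ranges(element: Element) -> List[str]:
    output: List[str] = []
    string = ""
    # Only direct children are examined; the contents of child elements form
    # their own ranges when the algorithm is applied to those elements.
    for node in element.children:
        if isinstance(node, Text):
            # Text node: append its data to the current range.
            string += node.data
        elif node.tag == "br" or node.tag in FLOW_NOT_PHRASING:
            # Range boundary: flush the current range if it is non-empty.
            if string:
                output.append(string)
                string = ""
        # Otherwise (e.g. a phrasing element such as span): do nothing.
    # End step: flush whatever remains.
    if string:
        output.append(string)
    return output
```

For example, a div whose children are the text "abc\u202B", a span, the text "def\u202C", a p element, and the text "ghi" yields the two ranges "abc\u202Bdef\u202C" and "ghi". The span's own text is not part of either range, and the embedding opened before the span is still closed within the same parent element.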

The value of a namespace-less attribute of an HTML element is a bidirectional-algorithm formatting character range.

Any strings that, as described above, are bidirectional-algorithm formatting character ranges must match the string production in the following ABNF, the character set for which is Unicode. [ABNF]

string        = *( plaintext ( embedding / override ) ) plaintext
embedding     = ( lre / rle ) string pdf
override      = ( lro / rlo ) string pdf
lre           = %x202A ; U+202A LEFT-TO-RIGHT EMBEDDING
rle           = %x202B ; U+202B RIGHT-TO-LEFT EMBEDDING
lro           = %x202D ; U+202D LEFT-TO-RIGHT OVERRIDE
rlo           = %x202E ; U+202E RIGHT-TO-LEFT OVERRIDE
pdf           = %x202C ; U+202C POP DIRECTIONAL FORMATTING
plaintext     = *( %x0000-2029 / %x202F-10FFFF )
                ; any string with no bidirectional-algorithm formatting characters
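Because every embedding and every override is closed by the same PDF character, the string production above describes balanced brackets: LRE, RLE, LRO, and RLO act as opening brackets, PDF as the closing bracket, and all other characters are plaintext. A depth counter is therefore enough to check a range, whether it comes from element text or from an attribute value. The following Python sketch assumes this reading of the grammar; the function name matches_string_production is hypothetical.

```python
# Characters that open an embedding (LRE, RLE) or an override (LRO, RLO).
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}
PDF = "\u202C"  # U+202C POP DIRECTIONAL FORMATTING


def matches_string_production(s: str) -> bool:
    """Return True if s matches the `string` production of the ABNF.

    Openers increase the nesting depth and PDF decreases it; every other
    character is plaintext and leaves the depth unchanged.
    """
    depth = 0
    for ch in s:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # a PDF with no open embedding or override
            depth -= 1
    return depth == 0  # every embedding/override must be explicitly closed
```

Under this sketch, "abc\u202Bdef\u202C" and the empty string are accepted, while "\u202Eabc" (an override that is never terminated) and "abc\u202C" (a stray PDF) are rejected.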

Authors are encouraged to use the dir attribute, the bdo element, and the bdi element, rather than maintaining the bidirectional-algorithm formatting characters manually. The bidirectional-algorithm formatting characters interact poorly with CSS.