Text content in HTML elements with child Text
nodes, and text in attributes of HTML elements that allow free-form text, may
contain characters in the range U+202A to U+202E (the
bidirectional-algorithm formatting characters). However, the use of
these characters is restricted so that any embeddings or overrides
generated by these characters do not start and end in different
parent elements, and so that all such embeddings and overrides are
explicitly terminated by a U+202C POP DIRECTIONAL FORMATTING
character. This reduces the chance that text reused in a new
context has unforeseen effects on the bidirectional algorithm.
[BIDI]
The aforementioned restrictions are defined by specifying that certain parts of documents form bidirectional-algorithm formatting character ranges, and then imposing a requirement on such ranges.
The strings resulting from applying the following algorithm to an HTML element, referred to as element in the steps below, are bidirectional-algorithm formatting character ranges:
1. Let output be an empty list of strings.
2. Let string be an empty string.
3. Let node be the first child node of element, if any, or null otherwise.
4. Loop: If node is null, jump to the step labeled end.
5. Process node according to the first matching step from the following list:
   - If node is a Text node: Append the text data of node to string.
   - If node is a br element: If string is not the empty string, push string onto output, and let string be the empty string.
   - Otherwise: Do nothing.
6. Let node be node's next sibling, if any, or null otherwise.
7. Jump to the step labeled loop.
8. End: If string is not the empty string, push string onto output.
9. Return output as the bidirectional-algorithm formatting character ranges.
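The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a DOM implementation: the `Text` and `Element` classes are hypothetical stand-ins for DOM nodes, and `formatting_character_ranges` is an illustrative name. As in the steps, only the element's direct children are examined, and a child that is neither a Text node nor a br element is skipped without ending the current range.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    """Hypothetical stand-in for a DOM Text node."""
    data: str


@dataclass
class Element:
    """Hypothetical stand-in for an HTML element: a tag name plus child nodes."""
    name: str
    children: List[Node] = field(default_factory=list)


Node = Union[Text, Element]


def formatting_character_ranges(element: Element) -> List[str]:
    """Split the text of element's child Text nodes at each br child."""
    output: List[str] = []
    string = ""
    # Walk the direct children in document order (the "loop" steps).
    for node in element.children:
        if isinstance(node, Text):
            string += node.data
        elif isinstance(node, Element) and node.name == "br":
            # A br ends the current range, but empty ranges are never emitted.
            if string:
                output.append(string)
                string = ""
        # Any other node (e.g. a span) is skipped: its own text is ignored,
        # and the Text nodes on either side of it join the same range.
    # The "end" step: flush whatever text remains.
    if string:
        output.append(string)
    return output


p = Element("p", [Text("abc"), Element("br"), Text("def"),
                  Element("span", [Text("x")]), Text("ghi")])
print(formatting_character_ranges(p))  # ['abc', 'defghi']
```

Note how the span's own text (`"x"`) is not part of any range computed for `p`; it would instead be covered when the algorithm is applied to the span itself.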
The value of a namespace-less attribute of an HTML element is a bidirectional-algorithm formatting character range.
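A minimal sketch of the attribute rule, assuming attributes are stored in a dictionary keyed by `(namespace, local_name)` pairs, with `None` meaning "no namespace". That representation and the name `attribute_ranges` are hypothetical, chosen only for illustration. Unlike element text, an attribute value is never split: the whole value, even an empty one, is a single range.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical attribute storage: (namespace URI or None, local name) -> value.
Attributes = Dict[Tuple[Optional[str], str], str]


def attribute_ranges(attributes: Attributes) -> List[str]:
    """Return the value of every namespace-less attribute, each as one range."""
    return [
        value
        for (namespace, _local_name), value in attributes.items()
        if namespace is None  # namespaced attributes are not covered by the rule
    ]


attrs: Attributes = {
    (None, "title"): "\u202Bshalom\u202C",  # RLE ... PDF inside a title value
    ("http://www.w3.org/XML/1998/namespace", "lang"): "en",  # namespaced: excluded
}
ranges = attribute_ranges(attrs)
```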
Every string that, as described above, is a bidirectional-algorithm
formatting character range must match the string
production in the following ABNF, whose character
set is Unicode. [ABNF]
    string    = *( plaintext ( embedding / override ) ) plaintext
    embedding = ( lre / rle ) string pdf
    override  = ( lro / rlo ) string pdf

    lre = %x202A ; U+202A LEFT-TO-RIGHT EMBEDDING
    rle = %x202B ; U+202B RIGHT-TO-LEFT EMBEDDING
    lro = %x202D ; U+202D LEFT-TO-RIGHT OVERRIDE
    rlo = %x202E ; U+202E RIGHT-TO-LEFT OVERRIDE
    pdf = %x202C ; U+202C POP DIRECTIONAL FORMATTING

    plaintext = *( %x0000-2029 / %x202F-10FFFF )
                ; any string with no bidirectional-algorithm formatting characters
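Because plaintext may appear freely between formatting characters, and every embedding or override must be closed by its own U+202C, the string production describes properly nested openers and PDFs. A depth counter is therefore enough to test a range against it. The grammar does not pair a PDF with a particular kind of opener, so the sketch below does not track which character opened each level. The function name `matches_string_production` is illustrative, not taken from any library.

```python
# The five bidirectional-algorithm formatting characters named in the ABNF.
LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
OPENERS = {LRE, RLE, LRO, RLO}


def matches_string_production(s: str) -> bool:
    """Return True if s matches the ABNF `string` production."""
    depth = 0  # number of embeddings/overrides currently open
    for ch in s:  # Python iterates by code point, covering all of Unicode
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # a PDF with nothing left to terminate
            depth -= 1
        # every other code point counts as plaintext
    return depth == 0  # every opener must be explicitly terminated


ok = matches_string_production("a" + RLE + "b" + LRO + "c" + PDF + PDF + "d")
bad = matches_string_production(RLE + "unterminated")
```

For example, if a paragraph's children are the text RLE followed by "abc", a br element, and the text "def" followed by PDF, its two ranges are RLE + "abc" and "def" + PDF, and neither matches: the embedding would span the br, which the restriction forbids.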
Authors are encouraged to use the dir
attribute, the bdo
element, and the bdi
element, rather than maintaining the
bidirectional-algorithm formatting characters manually. The
bidirectional-algorithm formatting characters interact poorly with
CSS.