Abstract

Authoring a web app that needs to support both right-to-left and left-to-right interfaces, or to take as input and display both left-to-right and right-to-left data, usually presents a number of challenges that make it an especially laborious and bug-prone task. Some of these are due to browser bugs, but some can be traced to a lapse in the specification of the bidirectional aspects of a given HTML or CSS feature. And some of these challenges can be greatly simplified by adding a few strategically placed new HTML and CSS features.

This document was used to work through and communicate recommendations made to the HTML and CSS Working Groups for some of the most repetitive pain points. It is being published now for the historical record in order to capture some of the thinking that lay behind the evolution of the specifications and to help people in the future working on bidi issues understand the history of the decisions taken. Notes have been added to give a brief summary of what was actually implemented in the HTML or CSS specifications.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document contains initial proposals for features to be added to HTML to support bidirectional text in languages such as Arabic, Hebrew, Persian, Thaana, Urdu, etc., and describes the eventual solutions that were adopted by HTML5. This is a W3C Draft produced by the Internationalization Working Group, part of the W3C Internationalization Activity. The Working Group expects to advance this Working Draft to Working Group Note. Please send comments on this document to public-i18n-bidi@w3.org (publicly archived).

This document was published by the Internationalization Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-i18n-bidi@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 August 2014 W3C Process Document.

1. Introduction

Authoring a web app that needs to support both right-to-left and left-to-right interfaces, or to take as input and display both left-to-right and right-to-left data, usually presents a number of challenges that make it an especially laborious and bug-prone task. Some of these are due to browser bugs, but some can be traced to a lapse in the specification of the bidirectional aspects of a given HTML or CSS feature. And some of these challenges could be greatly simplified by adding a few strategically placed new HTML and CSS features.

Note

While HTML5 and CSS Level 3 were in development, this document was used to work through and communicate recommendations made to the HTML and CSS Working Groups for some of the most repetitive pain points. It is being published now for the historical record, in order to capture some of the thinking that lay behind the evolution of the specifications and to help people in the future working on bidi issues understand the history of the decisions taken. Much of the text is unchanged from that of document versions published some considerable time ago, but some notes have been added more recently to give a brief summary of what was actually changed in the HTML or CSS specifications. An appendix captures a snapshot, as of July 2014, of a set of links and notes that were originally in a separate document, and that were used to track bugs raised against implementations for the features described in the body of this document.

1.1 Notation

All examples in this document are in "fake bidi", i.e. use uppercase English to represent RTL characters and lowercase English for LTR characters. They will usually first give the characters in the order in which they are stored in memory, and then in the visual order in which they appear when displayed. For example, we would say: When displayed, "RTL SENTENCE" comes out as "ECNETNES LTR".

1.2 Base direction: a recurrent theme

Much of this proposal deals with determining and declaring the base direction of text. This is because text displayed in the wrong direction is often garbled.

For example, "10 main st." is displayed in RTL as

.main st 10

and "MAKE html WORK FOR YOU" is displayed in LTR as

EKAM html UOY ROF KROW

instead of the intended

UOY ROF KROW html EKAM

and is quite unreadable.
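The effect of the base direction on the examples above can be reproduced with a small toy model. The sketch below is not the UBA: it is a simplification that assumes the fake-bidi notation of this document (uppercase letters are RTL, lowercase letters are LTR, everything else is neutral), and it deliberately ignores digits, mirroring, and the explicit formatting characters. The function name fake_bidi_visual is hypothetical.

```python
def fake_bidi_visual(text: str, base: str) -> str:
    """Toy visual ordering of fake-bidi text for base direction 'ltr' or 'rtl'.

    Assumption: only letters and neutral characters are handled. Digits are
    NOT given number semantics, so strings containing numbers come out wrong.
    """
    if base not in ("ltr", "rtl"):
        raise ValueError("base must be 'ltr' or 'rtl'")
    base_dir = "L" if base == "ltr" else "R"
    n = len(text)

    # Uppercase letter = strong RTL, lowercase letter = strong LTR, else neutral.
    types = [("R" if ch.isupper() else "L") if ch.isalpha() else None
             for ch in text]

    # A run of neutrals takes the direction of its neighbours when they agree,
    # and the base direction otherwise (text edges count as the base direction).
    resolved = list(types)
    i = 0
    while i < n:
        if types[i] is not None:
            i += 1
            continue
        j = i
        while j < n and types[j] is None:
            j += 1
        before = types[i - 1] if i > 0 else base_dir
        after = types[j] if j < n else base_dir
        fill = before if before == after else base_dir
        for k in range(i, j):
            resolved[k] = fill
        i = j

    # Embedding levels: even levels are LTR, odd levels are RTL.
    if base == "ltr":
        levels = [0 if d == "L" else 1 for d in resolved]
    else:
        levels = [2 if d == "L" else 1 for d in resolved]

    # From the highest level down to 1, reverse each maximal run of
    # characters whose level is at least the current level.
    chars = list(text)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            i = j
    return "".join(chars)


print(fake_bidi_visual("MAKE html WORK FOR YOU", "ltr"))  # EKAM html UOY ROF KROW
print(fake_bidi_visual("MAKE html WORK FOR YOU", "rtl"))  # UOY ROF KROW html EKAM
```

Because digits are treated as neutrals here, the "10 main st." example is outside what this sketch can model.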

1.3 Terminology

base direction
The overall directional context, LTR or RTL, in which text is displayed, and which often affects the way it is displayed when using the Unicode Bidirectional Algorithm.
computed direction
In HTML, the base direction of an element can be specified using the dir attribute, but can also be inherited from an ancestor element, as well as set using CSS. We call the "bottom line" base direction (LTR or RTL) applied to an element after considering all these factors the computed direction.
inline element
Inline elements are elements such as span, em, strong, a, etc. The opposite of an inline element is a block element, such as p, div, ol, ul, blockquote, body, etc.
inline text
Text that lies wholly within a single block element, i.e. text within a paragraph. Inline text may include inline markup.
LRE
A short name for the Unicode character U+202A LEFT-TO-RIGHT EMBEDDING. This invisible control character is used to begin a range of text with an embedded base direction of left-to-right.
LRO
A short name for the Unicode character U+202D LEFT-TO-RIGHT OVERRIDE. This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from left to right.
LTR
Left-to-right.
PDF
A short name for the Unicode character U+202C POP DIRECTIONAL FORMATTING. This invisible control character is used to signal the end of a range of text that was started with one of the RLE, LRE, RLO or LRO characters.
RLE
A short name for the Unicode character U+202B RIGHT-TO-LEFT EMBEDDING. This invisible control character is used to begin a range of text with an embedded base direction of right-to-left.
RLO
A short name for the Unicode character U+202E RIGHT-TO-LEFT OVERRIDE. This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from right to left.
RTL
Right-to-left.
UBA (Unicode Bidirectional Algorithm)
The Unicode Bidirectional Algorithm, which determines the visual order in which bidi text is to be displayed, given a base direction that is either set explicitly or "guessed" from the text itself using a standard algorithm. In HTML up to version 4, the UBA is always passed a specific base direction, and never asked to guess it.
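
For reference, the five formatting characters defined above can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard unicodedata module; the dictionary name BIDI_CONTROLS is just an illustrative choice.

```python
import unicodedata

# Short name -> character, as listed in the terminology above.
BIDI_CONTROLS = {
    "LRE": "\u202a",
    "RLE": "\u202b",
    "PDF": "\u202c",
    "LRO": "\u202d",
    "RLO": "\u202e",
}

for short_name, ch in BIDI_CONTROLS.items():
    # The bidi class reported for each control matches its short name.
    assert unicodedata.bidirectional(ch) == short_name
    print(f"U+{ord(ch):04X} {short_name}: {unicodedata.name(ch)}")
```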

2. New HTML and CSS features

2.1 Support bidi isolation of inline element content

2.1.1 Background

The UBA's rendering of a piece of text depends not only on the explicitly declared direction in which it appears (e.g. the dir attribute value on the parent element) and the characters it contains, but also on the implicit directional properties of the characters preceding and following it. For example, in an RTL context, "john: " is displayed as "john: " when followed by "susan" (i.e. "john: susan"), but as " :john" when followed by "SUSAN" (i.e. "NASUS :john") - note the change in colon positioning.

The bidi formatting characters LRO, RLO, LRE, RLE, and PDF have a particularly strong influence on what surrounds them. For example, RLO makes all text up to the next PDF behave as RTL characters, making "hello" display as "olleh".

2.1.2 The problem

Most documents contain a large number of self-contained entities whose content must not influence the directional rendering of what precedes or follows them. Furthermore, the document author naively expects such an entity to be displayed visually between what precedes it and what follows it.

Examples of such entities are legion: the title of an article, the name of an author, a description, etc.

As long as the entire document and all the entities it contains are of uniform direction, there is no problem. Arbitrary-direction entities also don't cause a problem when they are displayed as a separate display:block or display:inline-block element (which are treated as separate "paragraphs" in UBA terms). However, when an inline entity is allowed to contain text of arbitrary direction, bad things start happening, and existing HTML mark-up is powerless to stop it.

Isolation example 1.

PURPLE PIZZA - 3 reviews

The entity here is the RTL name of a restaurant, being displayed in an LTR context. The intent is to have it appear as

AZZIP ELPRUP - 3 reviews

However, it is actually displayed as

3 - AZZIP ELPRUP reviews

and is effectively unreadable. This happens because according to the UBA, a number "sticks" to the strong-directional run preceding it.

Isolation example 2.

<span dir="rtl">PURPLE PIZZA</span> - 3 reviews

This is a common first attempt at fixing Isolation example 1. In fact, wrapping opposite-direction text in mark-up indicating its direction is generally a good idea, and is in many cases essential. Here, however, it makes no difference at all - the result is exactly the same as in Isolation example 1.

That the "fix" does not work is, in fact, to be expected: the <span dir="rtl"> only explicitly states the direction of the text inside it, and does not say anything at all about what surrounds it.

In fact, the currently recommended way to fix our purple pizza is not to use mark-up at all, but to insert an LRM character (U+200E, &lrm;) after the PURPLE PIZZA. This prevents the RTL text from "sticking" to the number that happens to follow it. If the context had been RTL and the entity LTR, the same magic would be worked by the RLM character (U+200F, &rlm;). The same technique is supposed to be applied to Isolation example 3 and Isolation example 4 below. Unfortunately, using LRM/RLM marks like this is less than ideal, for reasons we will discuss below.

Isolation example 3.

USE css (<span dir="ltr">position:relative</span>).

The entity here is a code snippet ("position:relative"), marked with a span and its LTR direction, to be displayed in an RTL context. Despite the RTL context, it is preceded by the LTR word "css" because technical terms and brand names often appear in their original Latin script in RTL text. The intent is to have it appear as

.(position:relative) css ESU

However, it is actually displayed as

.(css (position:relative ESU

This happens because the LTR word "css" before the entity "sticks" to the LTR entity according to normal UBA rules.

Isolation example 4.

documents &gt; <span dir="rtl">MY FIRST NOVEL</span> &gt; <span dir="rtl">CHAPTER 1</span>

The entities here are folder names displayed in "breadcrumbs" in an LTR context, where two of the folder names happen to be RTL. The intent is to have it displayed as

documents > LEVON TSRIF YM > 1 RETPAHC

However, it is actually displayed as

documents > 1 RETPAHC < LEVON TSRIF YM

i.e. with the RTL folder names visually in the wrong order (and the arrow between them reversed). This happens because according to the UBA, the two RTL entities "stick" together, whether or not they are wrapped in spans as shown here.

Isolation example 5.

joe hackerRLO: overdrawn

The entity here is the name of a user, as chosen by a malicious user to include the invisible RLO character (U+202E), followed by a status string. Obviously, the user's name is "HTML-escaped" when displayed, but this does not do anything to the RLO character. The outcome is that this is displayed as 

joe hacker: nwardrevo

where the entity influenced the display of what follows it, reversing its characters. This has security implications and has surfaced on blogs. On the other hand, it does not even have to be due to malicious use, only to the inadvertently bad trimming of an overly-long string.

In HTML4, there is no reliable way to deal with Isolation example 1 to Isolation example 4 using mark-up (while retaining inline display), except by redundantly marking an entity's surroundings with the base direction, which is counterintuitive and painful to implement. The usual way to deal with Isolation example 1 to Isolation example 4 is to surround an entity in either LRM or RLM characters - LRM in an LTR context, and RLM in an RTL context. This prevents the entity from "sticking" to what precedes or follows it.
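
As a rough illustration of the technique just described, an application might wrap each inserted entity as follows. This is only a sketch: the function name wrap_with_marks and the way the context direction is passed in are assumptions, not part of any specification.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def wrap_with_marks(entity: str, context_dir: str) -> str:
    """Surround an entity with the mark that matches the context direction."""
    if context_dir not in ("ltr", "rtl"):
        raise ValueError("context_dir must be 'ltr' or 'rtl'")
    mark = LRM if context_dir == "ltr" else RLM
    return f"{mark}{entity}{mark}"


# Isolation example 1: an RTL restaurant name inserted into an LTR context.
line = wrap_with_marks("PURPLE PIZZA", "ltr") + " - 3 reviews"
```

Note that the caller has to know the direction of the context, which is exactly the information that is often hard to come by in application code.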

However, using the LRM/RLM technique has several disadvantages, particularly in a web application:

  • The LRM or RLM is being used to address a layout issue that reflects the structure of the document, i.e. to indicate the boundary of an entity. There should be a way to express it in mark-up, not magic Unicode characters. In fact, the entity is typically already surrounded by an element that either gives it style or indicates its direction; why can't the element itself be used to indicate an entity?

  • In a web application, having to add logic to choose between an LRM and an RLM is a pain, especially when the existing code layer does not happen to have easy access to the context's direction.

  • Not all search engines (e.g. the browser's own CTRL-F) are smart enough to ignore invisible Unicode characters such as LRM and RLM. This makes a document using such characters less searchable: the user searches for "A B", but does not find "A B" because there is an invisible character between them. Or, conversely, the user copy-pastes text - accidentally including the LRM or RLM character - from the page into some search box, and does not get hits in any other documents because they do not contain the LRM/RLM. In a manually-authored HTML document using a few judiciously placed LRMs/RLMs, such problems do not amount to much. In a web application, however, the simplest way to use this technique is to do it wholesale, around every inserted entity. This results in very real searchability problems. Avoiding them requires implementing quite complicated logic to decide whether the LRM or RLM is really necessary.

Furthermore, LRMs and RLMs do not help in Isolation example 5. Nor is there any mark-up to solve it. The only current way to deal with it is for the application to either remove any LRE, RLE, LRO, RLO, or PDF characters in it, or to remove any extra PDFs and then add any missing ones at the end. This is a rarely-implemented pain in the neck.
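
The two clean-up strategies mentioned above could look roughly like this. The sketch assumes the application sees the string before it is HTML-escaped; the function names are hypothetical.

```python
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def strip_bidi_controls(text: str) -> str:
    """Strategy 1: remove every LRE, RLE, LRO, RLO and PDF character."""
    return "".join(ch for ch in text if ch not in OPENERS and ch != PDF)


def balance_bidi_controls(text: str) -> str:
    """Strategy 2: drop unmatched PDFs, then close any range still open."""
    out = []
    depth = 0  # number of embeddings/overrides currently open
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                continue  # extra PDF with nothing to close: drop it
            depth -= 1
        out.append(ch)
    out.append(PDF * depth)  # add the missing PDFs at the end
    return "".join(out)


# Isolation example 5: the stray RLO no longer leaks into what follows.
safe_name = balance_bidi_controls("joe hacker" + RLO)
```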

2.1.3 Proposed solution

  1. Add a new value to the CSS property unicode-bidi, tentatively named isolate. It would directionally isolate an inline element from its surroundings, just like display:inline-block elements are already directionally isolated from their surroundings: neither affects the bidi ordering of the other, and no part of the surrounding content gets ordered between parts of the element's content. Furthermore, the effects of LRE, RLE, LRO, RLO, and PDF characters appearing in the unicode-bidi:isolate element never extend beyond it. Like the unicode-bidi property generally, unicode-bidi:isolate does not inherit.

    The exact definition of the effects of unicode-bidi:isolate on an element:

    • For the purposes of the bidi resolution within the element, the contents of the element are treated as a separate, independent paragraph or paragraphs whose base direction is the element's computed direction. For the purpose of the bidi resolution of the element as a whole in its containing paragraph (if any), the element is treated as if it were an Object Replacement Character (U+FFFC).

    • The unicode-bidi property will allow combining the isolate and bidi-override values, i.e. unicode-bidi:isolate bidi-override.

    • unicode-bidi:isolate has no effect on an element that already introduces a UBA paragraph break, e.g. <br bidibreak=hard>.

    Since unicode-bidi:isolate works by making an element that would otherwise be part of a containing UBA paragraph into a separate UBA paragraph, it also has no effect on elements that already constitute a separate UBA paragraph:

    • unicode-bidi:isolate only affects elements that have display:inline (or display:run-in when it behaves as display:inline). Thus, it has no effect on display:inline-block elements (which should continue to use bidi isolation, as already stated in the CSS specification, regardless of whether they have unicode-bidi:isolate or not) or inline elements whose display has been set to something other than inline.

    • unicode-bidi:isolate has no effect on floating and position:absolute (and position:fixed) elements, even when they have display:inline.

    Please note, however, that unicode-bidi:isolate has its usual effect on block elements that have been assigned display:inline. As we will see below, such elements will in fact have unicode-bidi:isolate by default.

    The independent application of the UBA to segments of structured text by higher level protocols such as HTML and CSS is explicitly allowed by the UBA's section 4.3, HL4 ("Apply the Bidirectional Algorithm to segments").

  2. Expose unicode-bidi:isolate in HTML by adding an element attribute tentatively named ubi, for "Unicode Bidi Isolate", as in <span dir="rtl" ubi>. Like unicode-bidi:isolate, it would be used to directionally isolate an inline-display element from its surroundings: neither affects the bidi ordering of the other, and no part of the surrounding content gets ordered between parts of the element's content. Furthermore, the effects of LRE, RLE, LRO, RLO, and PDF characters appearing in the element never extend beyond it. The attribute would take three values:

    • off, specifying no special action. This is the default except in the two cases indicated below (for which off would have to be explicitly specified when no isolation is desired). There is no inheritance.

    • ubi, specifying isolation. (Alternatively, this value could be named on. We chose ubi for similarity with pre-HTML5 boolean attributes, e.g. selected. It is up to the HTML WG to decide which is better.) It is implemented by setting the unicode-bidi CSS property for the element to isolate - or isolate bidi-override for a bdo element. It is the default value in the two cases described elsewhere in this document: block elements that have been assigned display:inline, and elements with dir=auto.

    • empty string, specifying isolation just like the ubi value. The empty string value allows specifying the attribute without a value for conciseness, e.g. <span dir="rtl" ubi>.

Applications generating HTML would use ubi routinely on elements that wrap an inserted data string (usually in conjunction with indicating its direction using the dir attribute). In particular, it will be recommended to use ubi on the a element once any browsers support it.

Although in theory unicode-bidi:isolate could be used directly to achieve the same effect as ubi, the recommended approach will be to use ubi, since the bidi properties of content are an intimate part of the content and should be specified directly on it as HTML mark-up (see CSS vs. markup for bidi support). They are not simply an issue of presentation that should properly be specified separately in CSS. The new unicode-bidi:isolate CSS property value is being proposed because, like the dir attribute, the ubi attribute has to be implemented via CSS.

Note

Implementation update:

CSS Writing Modes Level 3 module added the proposed isolate value to the unicode-bidi property.

Instead of the suggested ubi attribute, however, HTML5 added a new bdi element, which gets unicode-bidi:isolate in the default stylesheet, and whose dir attribute value is auto by default. The output element, previously added to HTML5, is also assigned unicode-bidi:isolate in the default stylesheet.

After further discussion and research around possible side-effects, a recommendation was made to alter the styling of the dir attribute in the default browser style sheet, so that the browser isolates the content of each element with a dir attribute on it. This provides a much simpler solution for content authors. After research into the suitability of this approach, including implementation in the Chrome browser, HTML5 added this.
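
With the bdi element that HTML5 eventually adopted, an application generating HTML no longer needs to know the context direction when inserting a data string. A minimal sketch, assuming a hypothetical helper named isolate_inline and using only the standard html module for escaping:

```python
import html


def isolate_inline(text: str) -> str:
    """Escape a data string and wrap it in a bdi element for bidi isolation."""
    return f"<bdi>{html.escape(text)}</bdi>"


# Isolation example 1, generated without LRM/RLM characters.
markup = isolate_inline("PURPLE PIZZA") + " - 3 reviews"
```

Since bdi has dir=auto by default, the helper does not need to declare or estimate the direction of the inserted string itself.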

2.2 Support auto-direction

2.2.1 The problem

Many web applications with an RTL-language interface or an RTL-language data source need to display and/or accept as input both LTR and RTL data. Furthermore, the application often does not know and cannot control the direction of the data.

For example, an online book store that carries books in many languages needs to display the original book titles regardless of the language of the user interface. Thus, a Hebrew or Arabic book title may appear in an English interface, and vice-versa. The direction of the title may be available as a separate attribute, but more likely it isn't, and needs to be guessed. The safest guess is on the basis of the characters making up the title.

If this site also allows user comments or reviews, it is unreasonable to limit these to one language. For example, for an English book listed in an Arabic or Hebrew interface, it is perfectly reasonable to get comments in English, the book's language. The application does not know what the user will type until the user types it. Only displaying the comments in the language matching the UI does not make much sense, since a customer interested in the book is probably interested in comments in both the book's and the UI's language. Such a situation will often necessitate mixing text in one direction (e.g. the comments) and text in the opposite direction (e.g. the interface).

Unless opposite-direction data is explicitly declared as such, it is often displayed garbled as shown above. Perhaps even worse, the user experience of typing opposite-direction data is quite awkward due to the cursor and punctuation jumping around during data entry and difficulty in selecting text.

Currently, avoiding such problems requires that the application implement logic to estimate the data's direction - and use it in the many places where it is needed. Such logic is not easy to implement, since it requires using long tables of strong-RTL and/or strong-LTR characters, and becomes non-obvious when a string contains both. For an input element, where the direction must be automatically set as the user types the text, there is no choice but to implement the estimation logic in page scripts, thus requiring even more advanced programming skills. As a result, few applications wind up doing direction estimation, and a poor user experience is quite common for web pages mixing LTR data in an RTL interface or vice-versa.

2.2.2 Not the problem

The issue at hand is with text data that is basically compatible with the UBA. That is, given the correct base direction, applying the UBA will display the text intelligibly. The only problem is that we don't know the correct base direction.

This is distinct from a different, harder issue: text mixing LTR and RTL without using the formatting characters necessary to display it intelligibly using standard UBA rules. Whichever base direction is applied, the text will not be displayed as intended. Examples of such data are not as rare as one might think:

  • Path or URL that includes consecutive RTL folder or file names (one would expect the path components to proceed in a uniform direction)
  • "Tweets" that include both an RTL phrase and LTR parts like @name and a URL
  • An RTL sentence that attempts to give a phone number with spaces in it
  • Sentence containing an opposite-direction quotation that starts with a number or ends with punctuation
  • Multi-paragraph text containing both LTR and RTL paragraphs, e.g. an RTL restaurant review followed by restaurant address in Latin script.

Such text does not include the Unicode formatting characters that could fix its display either because it must conform to a syntax that would misinterpret such characters, or simply because it was created by a human user that does not know such characters exist, much less how to enter or use them. Given the text's syntax, or at least a set of patterns for the problematic parts, the text could, in theory, be parsed into its constituent parts, and formatting characters added to make the text display correctly.

Although this is a painful real-world problem, it is unrelated to HTML per se and currently lacks a mature solution. We are not proposing one here.

2.2.3 Estimation algorithms

A data string's direction is obvious when it contains either LTR or RTL characters, but not both. The following heuristic algorithms have been used when the data does contain both LTR and RTL characters:

  • First character with strong direction. This is the algorithm specifically mandated by the UBA for choosing a paragraph's base direction (unless overridden by a higher-level protocol, which is what currently always happens in HTML). This has the advantage of being easy to understand (and even surmise) for the user, and text is usually more readable when starting with a word in its overall direction. Nevertheless, it is not uncommon for an RTL phrase to start with an LTR word like a brand name or a technical term, in which case this algorithm fails.
  • Does the string contain any RTL characters? This fails for LTR text that includes some RTL, which is quite uncommon, but not unheard of.
  • Word count: does the percentage of RTL words exceed some threshold value? Works well for a mixture of the RTL languages with English, but untested for other languages, and not well-defined for the languages that do not use spaces to separate words. Also, it proves unintuitive to the user in some circumstances.
  • Character count: in an attempt to get around some of the issues of word counting, an alternative has been proposed of comparing RTL and LTR character counts, after applying script-specific coefficients for the counts of characters in different scripts. This remains untried and unproven.

Different approaches have been preferred in different contexts: first-strong for search boxes, any-RTL for advertisements, and word-count for longer texts like e-mails. Nevertheless, it is worth pointing out that the choice of the precise algorithm is an optimization. For most real-world data strings, all these estimation algorithms will give the same correct result.

In addition to the basic algorithm choice, there are also side issues:

  • How should those parts of the string bracketed in LRE / RLE / LRO / RLO and PDF characters be treated? It is possible to simply ignore such bracketing characters, and this is actually specified by the UBA for its first-strong algorithm. Another possibility is to ignore them together with the substrings they bracket. The rationale for the latter approach is that the direction we estimate for the whole string will not be applied to the bracketed substrings anyway. In fact, if part of a string is explicitly declared LTR, it is usually because the overall string is RTL, and vice-versa. On the other hand, if the string contains no strong-directional characters outside the declared substrings, and all the declared substrings give the same direction, then it might be best to estimate the string overall to be of the same direction as the declared substrings.
  • Is it really necessary to scan the whole string, or can the decision be safely limited to a maximum-sized prefix? This is particularly relevant to browsers, which must start displaying an element's content before having downloaded the whole document or even the whole element.
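
To make the word-count heuristic listed above more concrete, here is one possible sketch. Several details are assumptions made for illustration only: words are split on whitespace, a word is classified by its first strong character, words without strong characters are ignored, and the threshold of 0.4 is an arbitrary placeholder rather than a recommended value.

```python
import unicodedata


def word_direction(word: str):
    """Classify a word by its first strong character, or None if it has none."""
    for ch in word:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


def word_count_direction(text: str, threshold: float = 0.4) -> str:
    """Return 'rtl' if the share of RTL words exceeds the threshold, else 'ltr'."""
    directions = [word_direction(word) for word in text.split()]
    rtl = directions.count("rtl")
    ltr = directions.count("ltr")
    if rtl + ltr == 0:
        return "ltr"  # no strong words at all, e.g. a bare number
    return "rtl" if rtl / (rtl + ltr) > threshold else "ltr"
```

As the text notes, splitting on whitespace is not well-defined for languages that do not separate words with spaces, so this sketch inherits that limitation.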

2.2.4 Proposed solution

Make simple direction estimation functionality available in the browser by allowing the dir attribute to take on a new auto value indicating that the user agent is responsible for estimating the direction of the element's contents.

Specifying dir=auto directs the user agent to examine the element's text content and estimate whether it is LTR or RTL using one of a small set of simple, well-defined algorithms (see below). Once this direction value (ltr or rtl) is determined, dir=auto behaves as if that value had been assigned to the dir attribute. Thus, this is the value assigned to the CSS direction property, and thus the one used for inheritance to descendants using the usual mechanism. The auto value is never inherited.

Other than the use of the estimation algorithm, the one other difference introduced by dir=auto is that the default value for the ubi attribute becomes ubi, thus applying directional isolation from the element's surroundings.

Since there is no one perfect, practical direction estimation algorithm currently known, and since different known algorithms are heuristic and work best in different use cases, we support two algorithms and allow a new HTML element attribute, autodirmethod="first-strong"|"any-rtl", to specify which algorithm dir=auto should use. We allow autodirmethod to inherit, so it can be specified directly on the dir=auto element or on an ancestor, e.g. the root element in order to apply to all the dir=auto elements on the page. The default value of autodirmethod on the root element is first-strong.

The estimation algorithms take as input the in-order traversal of the element's descendant text nodes. However, they exclude any text node for which the path from the element down to that node passes through an element with a unicode-bidi style other than "normal", e.g. a bdo element or an element with a dir attribute (including dir=auto).
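
The traversal described above can be sketched on a small, well-formed fragment. This is only an approximation: instead of consulting computed styles, it assumes that a unicode-bidi style other than "normal" is signalled by a bdo element or a dir attribute, and it uses Python's xml.etree.ElementTree as a stand-in for a real DOM. The function name auto_dir_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def auto_dir_text(element: ET.Element) -> str:
    """Collect, in document order, the text that direction estimation would see."""
    parts = [element.text or ""]
    for child in element:
        # Skip the whole subtree of a descendant that sets its own direction.
        if not (child.tag == "bdo" or "dir" in child.attrib):
            parts.append(auto_dir_text(child))
        # Text following the child still belongs to the outer element.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(
    '<p dir="auto">see <span dir="rtl">ABC</span> and <em>more</em> text</p>'
)
collected = auto_dir_text(root)  # the span's content is left out
```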

The first-strong algorithm, identical to that defined by the UBA's rules P1, P2, and P3, returns the direction of the first strong (Unicode bidi class L, R, or AL) character it encounters in this traversal. If it does not encounter any, it returns ltr. Note that this last case includes "formatted numbers", e.g. "(617) 987-6543" and "-15.2%".
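
A minimal sketch of the first-strong estimation, operating on an already-collected text string and using the bidi classes reported by Python's unicodedata module. The function name first_strong is an illustrative choice.

```python
import unicodedata


def first_strong(text: str) -> str:
    """Return 'ltr' or 'rtl' based on the first strong character in the text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found, e.g. a formatted number.
    return "ltr"
```

For instance, first_strong("(617) 987-6543") yields ltr because digits and punctuation are not strong characters, while a string beginning with a Hebrew or Arabic letter yields rtl even if Latin text follows.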

The any-rtl algorithm returns rtl if it encounters any strong RTL (bidi class R or AL) characters, and ltr otherwise. However, it excludes parts of the text encountered in the traversal:

  • Parts of the text between an LRE, RLE, LRO, or RLO and its matching PDF.

  • The part of the text after the first 100 characters, where the text excluded by the rules above is not part of the count. This should alleviate efficiency concerns.
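
The any-rtl rules above might be sketched as follows. Two points are assumptions not settled by the text: the formatting characters themselves are not counted toward the 100-character limit, and an opening character with no matching PDF excludes the rest of the text. The function name any_rtl and the limit parameter are illustrative.

```python
import unicodedata


def any_rtl(text: str, limit: int = 100) -> str:
    """Return 'rtl' if a strong RTL character occurs in the examined text."""
    depth = 0    # nesting depth of LRE/RLE/LRO/RLO ranges
    counted = 0  # characters examined so far, excluding bracketed text
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("LRE", "RLE", "LRO", "RLO"):
            depth += 1
            continue
        if bidi_class == "PDF":
            if depth > 0:
                depth -= 1
            continue
        if depth > 0:
            continue  # inside an explicitly bracketed range: excluded
        counted += 1
        if counted > limit:
            break  # only the first `limit` characters are examined
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "ltr"
```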

No attempt will be made to exclude "hidden" content, whether using display:none style or any other invisibility technique. For backward compatibility, dir=auto will not be the default for any currently defined elements.

Please note that the direction that any direction estimation algorithm will assign to text mixing LTR and RTL characters, although well-defined, may not always be correct as judged by a human user.

Also note that dir=auto is primarily intended for elements wrapping a "single-origin" piece of text, e.g. a text input or a span displaying an item name or description as read from a database. The more complex the element's structure, the higher the chances that it mixes LTR and RTL content, and the lower the chances that an estimation algorithm will succeed in displaying the contents intelligibly. It is meaningless to use an estimation algorithm on content mixed to the extent that it is unintelligible in both LTR and RTL (when displayed by standard UBA rules).

Note

Implementation update:

CSS Writing Modes Level 3 module added a plaintext value to the unicode-bidi property in order to allow per-paragraph auto-direction, primarily for use on textarea and output elements.

HTML5 added a new auto value for the dir attribute, but not autodirmethod. The effect of dir=auto is to set the unicode-bidi CSS property to plaintext for textarea and output elements, to bidi-override isolate for bdo elements, and to isolate otherwise. Either way, it estimates a direction according to the UBA method applied to the element’s text content (while skipping over descendant elements with an explicit dir attribute, as well as bdi, script and style elements), and sets the CSS direction property accordingly.

2.3 Support reporting the chosen direction of input and textarea in form submissions

2.3.1 Background

In many applications, it is necessary to allow the user to enter text of either direction into a given input type="text" or textarea element, regardless of the page's direction.

Although algorithms for estimating the direction of a string exist (and hopefully will be exposed by the browser as described in Support auto-direction), they remain heuristic for mixed-script strings.

As a result, all major browsers provide some way for the user to explicitly set the direction of an input type="text" or textarea element, e.g. via keyboard shortcuts, so the text being entered by the user is displayed correctly.

2.3.2 The problem

Once the text entered by the user has been submitted to the server, the direction in which it was displayed in the page is lost, unless explicitly added to the form as an invisible input by page scripts. However, scripts are not available in all environments, e.g. e-mail forms. As a result, in such an environment, the application is forced to guess at the direction of a string submitted by the user, will sometimes get it wrong, and as a result display it incorrectly in subsequent pages.

2.3.3 Proposed solution

Support a new attribute, tentatively named submitdir, in input and textarea. Its presence will specify that when the element is a "successful control" (i.e. its value is to be included in the form submission), then the value of the element's computed direction (at submission time) is also to be included in the submission, as an additional "successful control". (Reminder: the computed direction is the bottom-line ltr or rtl being used to display the element; it never takes on any other value. It is available as the value of the CSS direction property for the element.)

The additional control's name is to be the element's control name suffixed with "_dir". If the form contains other control(s) with the same control name as the additional control, the additional control will still be submitted alongside them; it is up to the application to sort out what the different control values mean.

The value of the submitdir attribute is immaterial; it would normally be an empty string (when the attribute is present without a value) or "submitdir".

For example, let's assume that auto is the dir attribute value that indicates direction estimation, and that an RTL page contains the following form:

<form action="foo" method="get"> 
  <input type="text" name="mytest" dir="auto" submitdir /> 
</form>

Then, if the user typed in the LTR value "hello", the submission URL would be foo?mytest=hello&mytest_dir=ltr.
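On the receiving side, an application could pair each submitted value with its "_dir" companion along the following lines. This is only a sketch: the helper name readFieldWithDirection is hypothetical, and it assumes the submission arrives as a standard URL query string, parsed here with the standard URLSearchParams class.

```javascript
// Hypothetical server-side helper: reads a field and its "<name>_dir" companion.
function readFieldWithDirection(queryString, name) {
  const params = new URLSearchParams(queryString);
  const value = params.get(name);
  const dir = params.get(name + "_dir");
  return {
    value,
    // A computed direction is only ever ltr or rtl; anything else is ignored.
    dir: dir === "ltr" || dir === "rtl" ? dir : null,
  };
}

const field = readFieldWithDirection("mytest=hello&mytest_dir=ltr", "mytest");
// field.value is "hello" and field.dir is "ltr"
```

When the direction is missing (for example, because the browser does not support the attribute), the helper returns null for dir, and the application is back to estimating the direction itself.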

Note

Implementation update:

HTML5 added a new attribute, dirname, with roughly the proposed semantics, for the textarea element and the input element in the text and search states. The dirname attribute has to have a value, and that value is the name used for the field that carries the direction information.

2.4 Support option for images to be flipped horizontally in RTL

2.4.1 Background

Although most images, e.g. photos, are equally applicable to LTR and RTL pages, some images are inherently and primarily "handed" or "directional", and need to appear in a mirror image in an RTL page. Common examples include various arrow and "connector" images. A less obvious example might be star rating images: the "full" half of a half-star needs to be on the left in LTR and on the right in RTL.

Images can be introduced into a page via CSS, e.g. the background property, in addition to HTML features like the img element. In fact, CSS is the recommended way to introduce images that are part of the presentation, which is what all or at least most "handed" images are. Furthermore, introducing images through CSS using the "sprite" technique can be a great deal more efficient than using HTML to add the images.

Using the directional selection (:rtl) CSS feature in combination with the graphic transformation (transform:scaleX(-1)) CSS feature, both of which have already been proposed for CSS3, can achieve the horizontal flipping of an element containing a "handed" image based on direction.

2.4.2 The problem

Currently, the author of a page to be localized into both LTR and RTL languages is forced to create two separate versions of each "handed" image, stored in two separate files, and use one or the other depending on the page language by changing the src attribute of the img. This process is monotonous and error-prone.

This headache can be avoided using the directional selection (:rtl) and graphic transformation (transform:scaleX(-1)) features proposed for CSS3, since a combination of these can horizontally flip an element depending on the UI direction. Being able to tell the browser to do such flips automatically makes it that much easier for web applications to support both LTR and RTL interfaces. Only one image needs to be provided by the page, and the attributes of the element displaying the image do not even have to differ between the LTR and RTL pages.

Nevertheless, this technique is not very easy to use or even to discover. Furthermore, it is limited to horizontally flipping the element as a whole, and is thus inappropriate for elements that combine several images or display more than images (although it is probably sufficient in the case of simple img elements).

2.4.3 Proposed solution

Expand the syntax of each of the possible ways that an image can be specified in CSS3 Images, e.g. url, sprite, image-list, linear-gradient, and radial-gradient by allowing a new keyword, rtlflip. Examples would be:

  • background-image: linear-gradient(45deg, white, black) rtlflip

  • list-style-image:url('sprite.png#xywh=10,30,60,20') rtlflip

The presence of the rtlflip keyword means that the image must be horizontally flipped when the element's CSS direction (or, in the case of list-style-image, the list item marker's direction, as defined by the list-style-direction CSS property) is rtl.

An alternative syntax for these same contexts would be to allow one of two new keywords: ltr or rtl. The presence of one of these would declare the image's direction and specify that the image should be horizontally flipped when this direction does not match the element's CSS direction (or, in the case of list-style-image, the list item marker's direction). Being declarative as opposed to instructive, this alternative is more elegant than rtlflip, but requires two new keywords instead of one. It is up to the CSS WG to choose the better syntax.
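The difference between the two syntaxes comes down to the flip decision, which can be sketched as two small predicates. The function names are hypothetical, and elementDir stands for the computed direction of the element (or of the list item marker, for list-style-image).

```javascript
// rtlflip syntax: the keyword is an instruction to flip in RTL.
function shouldFlipWithRtlflip(hasRtlflip, elementDir) {
  return hasRtlflip && elementDir === "rtl";
}

// Declarative syntax: imageDir is "ltr", "rtl", or null when no keyword is given.
// The image is flipped only when its declared direction differs from the element's.
function shouldFlipDeclared(imageDir, elementDir) {
  return imageDir !== null && imageDir !== elementDir;
}
```

With the declarative form, an arrow drawn for an RTL page and declared rtl would be flipped on an LTR page, a case that rtlflip alone cannot express.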

3. Standardizing Bidi Aspects of Existing HTML and CSS Features

3.1 br should serve as a bidi separator

3.1.1 Background

The UBA's rendering of a piece of text depends not only on the explicitly declared direction in which it appears (e.g. the dir attribute value on the parent element) and the characters it contains, but also on the implicit directional properties of the characters preceding and following it. For example, in an RTL context, "john: " is displayed as "john: " when followed by "susan" (i.e. "john: susan"), but as " :john" when followed by "SUSAN" (i.e. "NASUS :john") - note the change in colon positioning.

The bidi formatting characters LRO, RLO, LRE, RLE, and PDF have particularly strong influence on what surrounds them. For example, RLO makes all text up to the next PDF behave as RTL characters, making "hello" display as "olleh".

In the UBA, whitespace and punctuation characters provide almost no separation against either kind of bidi influence.

On the other hand, the UBA's sections 3.3.1 and 3.3.2 require that the bidi state be completely reset at a "paragraph break". This means that strong-directional text (e.g. letters) and explicit bidi formatting characters (e.g. RLE and RLO) in one paragraph have no effect on the formatting of the text in the next paragraph and vice-versa. This is a very high level of bidi separation.

In plain text, line breaks (line feed (U+000A), carriage return (U+000D) and their combinations) are commonly used both to end paragraphs and simply to wrap logical lines. The former usage needs a UBA paragraph break, while the latter usage wants no more bidi separation than other kinds of whitespace. The UBA resolves this ambiguity in favor of the paragraph break because of its importance. All common UBA implementations for plain text treat line breaks as a UBA paragraph break, in accordance with the UBA specification.

The UBA leaves the definition of a "paragraph" in higher-level protocols like HTML up to the protocol.

It is well-accepted that HTML block elements like div and p form UBA paragraphs, and this is implemented by all major browsers. Thus, whatever happens inside a block element has no effect on the bidirectional rendering of the text before it or after it.

3.1.2 The problem

The HTML 4 standard explicitly specifies that br is to be treated for bidi purposes as whitespace, and not as a UBA paragraph break. The arguments for this decision seem to be that:

  • br is defined as an inline element.

  • The preferred way to demarcate a paragraph in HTML is as a p or some other block element.

Firefox and Opera and (in standards mode) Internet Explorer 8 and 9 follow this specification and treat br as whitespace for UBA purposes.

In actual usage, however, br is a very popular element and is used to form paragraphs at least as often as p, just like line breaks in plain text. In fact, unlike line breaks in plain text, it is almost always used for that purpose, as opposed to just wrapping a line to fit in a limited amount of space, simply because HTML normally takes care of line wrapping by itself.

As a result, Firefox's implementation of br as UBA whitespace, despite being in accordance with the current HTML specification, is regularly reported as a bug. It results in innocent-looking HTML like

1. his name is JOHN.<br>
2. SUSAN is a friend of his.

being rendered as

1. his name is .NHOJ
NASUS .2 is a friend of his.

Because the "JOHN.<br>2. SUSAN" forms a single RTL run despite the br, the "2" goes to the right of SUSAN. (Please note that wrapping the "JOHN" and "SUSAN" in separate dir="rtl" spans, i.e. "<span dir="rtl">JOHN</span>.<br>2. <span dir="rtl">SUSAN</span>", does not make any difference.)

Although this LTR example is somewhat contrived, the RTL equivalent is quite realistic because it is common for LTR brand names, acronyms, etc. to be used in RTL text:

1. IT IS IMPORTANT TO LEARN html.<br>
2. css IS IMPORTANT TOO.

which is rendered in Firefox and Opera as

html. NRAEL OT TNATROPMI SI TI .1
.OOT TNATROPMI SI 2. css

As a result, IE 7 and WebKit treat br as a UBA paragraph break. Although this is not in conformance with the HTML 4 spec, the bidi separation it provides does seem to follow most users' expectations.

If IE and WebKit were to change their br behavior to conform to the current standard, many existing RTL HTML documents would be broken, especially given that they tend to be authored mostly with IE in mind.

While the bidi separation provided by treating br as a UBA paragraph separator is useful, the strong nature of this separation (closing all open embedding levels) also creates problems. Being an inline element, br can be nested within an arbitrary number of other inline elements. If these inline ancestors have explicit dir attribute values of their own, should the br terminate their effects as UBA's definition of a paragraph separator says it should? That is what a line break in plain text does when it comes between an LRE or RLE and its matching PDF. So, should the second line in <div dir="rtl"><span dir="ltr">1. hello!<br>2. goodbye!</span></div> be displayed as RTL? That would conform to the definition of a UBA paragraph break, but would go against the spirit of HTML. This is, in fact, what WebKit currently does (although it is now being treated as a bug).

To avoid this problem, IE apparently re-opens the directional embedding levels specified on ancestor elements via mark-up (dir attribute, bdo element) or CSS up to the closest ancestor block element after closing them at a br paragraph break. On the other hand, it does not reopen the directional embedding levels stemming from surrounding LRE/RLE/LRO/RLO and PDF characters. Should this be specified as the correct behavior?

And what about those rare uses of br when it is simply being used to wrap a line and a UBA paragraph break is undesirable?

3.1.3 Proposed solution

Support a new HTML element attribute, bidibreak=hard|soft. On a br element, the soft value means that the br is to be treated as a UBA bidi class WS (whitespace) character, as was required in HTML 4. The hard value means that the br is to be treated as UBA bidi class B, i.e. paragraph break. If neither is specified, the bidibreak attribute value is inherited from the parent. Thus, when specified on an element other than br, bidibreak serves to determine the behavior of descendant br elements. For the root element, the default is hard (which, of course, spreads to every br element in the document, unless an intervening element sets bidibreak otherwise). Alternatively, if and only if all major browser makers reach unanimous consensus that the default value for the root element should be soft and commit to implementing it as such to the HTML WG prior to the new HTML specification publication, that too would be fine.

When the author wants to use br just to wrap a line without adding bidi separation, <br bidibreak="soft"> will do the trick.

Reasonable use cases for specifying bidibreak="soft" on non-br elements would include an element containing poetry, as well as the root element of a document that relies on the bidi behavior specified for br by HTML 4.
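The inheritance rule for the proposed bidibreak attribute can be sketched as a walk up the ancestor chain. The node shape used here (a plain object with optional bidibreak and parent fields) and the function names are assumptions made for illustration, not a DOM API.

```javascript
// node: { bidibreak?: "hard" | "soft", parent?: node }  (hypothetical shape)
function effectiveBidibreak(node) {
  for (let n = node; n; n = n.parent) {
    if (n.bidibreak === "hard" || n.bidibreak === "soft") return n.bidibreak;
  }
  return "hard"; // proposed default for the root element
}

// UBA bidi class that a br would be treated as under the proposal.
function bidiClassForBr(br) {
  return effectiveBidibreak(br) === "hard" ? "B" : "WS";
}
```

For example, a br inside an element carrying bidibreak="soft" (such as a block of poetry) resolves to WS, while a br with no bidibreak on any ancestor resolves to B.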

UTR #20 and UAX #13 will need to be updated to reflect this change. In the former, 'In HTML, use <xhtml:br /> instead of U+2028' should be replaced with 'In HTML, use <xhtml:br bidibreak="soft" /> instead of U+2028'. In the latter, 'line separators basically correspond to HTML <br>' should be replaced with 'line separators basically correspond to HTML <BR BIDIBREAK="soft">'.

When br introduces a UBA paragraph break, the base direction of the new UBA paragraph will be determined by the computed direction of the nearest ancestor element whose bidi properties require its contents to be in a separate UBA paragraph (or sequence of paragraphs), e.g. a block element or an element directionally isolated by the ubi attribute. Furthermore, for every element between there and the br that results in the creation of an embedding or override level, e.g. a bdo element or any element with a dir attribute or a value other than normal for the unicode-bidi CSS property, the corresponding embedding or override level is re-introduced at the start of the new UBA paragraph (to be closed at the end of the element or the UBA paragraph, whichever comes first).
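The re-opening rule can be sketched by expressing each re-introduced level as the equivalent Unicode formatting character. This is a simplified illustration under stated assumptions: the element shape and the name paragraphStateAfterBr are hypothetical, the chain runs from the nearest paragraph-forming ancestor down to the parent of the br, and only the dir attribute and the bdo element are handled (not the unicode-bidi CSS property).

```javascript
const LRE = "\u202A", RLE = "\u202B", LRO = "\u202D", RLO = "\u202E";

// chain[0]: { computedDir } for the paragraph-forming ancestor (e.g. a block).
// chain[1..]: inline ancestors of the br, outermost first: { tag, dir? }.
function paragraphStateAfterBr(chain) {
  const [container, ...inlineAncestors] = chain;
  const reopened = [];
  for (const el of inlineAncestors) {
    if (el.dir !== "ltr" && el.dir !== "rtl") continue; // creates no level
    if (el.tag === "bdo") {
      reopened.push(el.dir === "rtl" ? RLO : LRO);      // override level
    } else {
      reopened.push(el.dir === "rtl" ? RLE : LRE);      // embedding level
    }
  }
  return { baseDirection: container.computedDir, reopened };
}
```

Applied to the earlier example of a span with dir="ltr" inside a div with dir="rtl", the new paragraph has an RTL base direction but starts with a re-opened LTR embedding, so its text is still laid out as LTR.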

Note

Implementation update:

HTML5 was changed to define br as a bidi paragraph separator. The bidibreak attribute was not implemented, given the lack of incontrovertibly good use cases for the value soft, and that LINE SEPARATOR will hopefully be supported in future and thus cover the use cases that have been given.

3.2 Line breaks should serve as bidi separators inside output, textarea, and script dialog text

3.2.1 Background

As in br should serve as a bidi separator.

3.2.2 The problem

IE and WebKit treat line break characters as a UBA paragraph break in output, textarea, and the text displayed in dialogs by the page's scripts using functions such as Javascript's alert() and confirm(). Given that in these contexts line breaks are expected to behave as they do in plain text, this would seem to be in accordance with the UBA. Firefox, however, treats line breaks in all these contexts as UBA whitespace, while Opera treats them as UBA paragraph separators in textarea and dialog text, but as whitespace in output. See br should serve as a bidi separator for examples where this makes a difference.

While one might think that one could use LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) characters in these environments to disambiguate the desired behavior, the HTML 4 specification explicitly prohibits treating them as line breaks or even as whitespace. While this makes sense for HTML generally, it does not seem to make much sense inside textarea and output, since these characters are useful in plain text, and these environments are meant to format plain text by plain text rules.

3.2.3 Proposed solution

The HTML specification should state that in elements where line breaks are not collapsed, e.g. textarea and elements with white-space:pre|pre-line|pre-wrap, the LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) characters should also break lines. Line breaks and PARAGRAPH SEPARATOR characters in these elements will constitute UBA paragraph breaks, while LINE SEPARATOR characters and wrapped lines will constitute UBA whitespace. At UBA paragraph breaks introduced by line breaks and PARAGRAPH SEPARATOR characters, the new UBA paragraph will be opened with a base direction and initial embeddings as specified for UBA paragraph breaks introduced by br.

Page script services for displaying plain text, such as Javascript's alert() and confirm() functions, should also break lines at LINE SEPARATOR and PARAGRAPH SEPARATOR characters. They should treat line breaks and PARAGRAPH SEPARATOR characters as UBA paragraph breaks, and LINE SEPARATOR characters and wrapped lines as UBA whitespace.
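The distinction between where lines break and where UBA paragraphs break can be sketched with two splitting functions. The function names are hypothetical; the character sets follow the proposal above.

```javascript
// UBA paragraph breaks: line breaks (CR LF, CR, LF) and PARAGRAPH SEPARATOR.
function splitIntoUbaParagraphs(text) {
  return text.split(/\r\n|\r|\n|\u2029/);
}

// Displayed line breaks: all of the above plus LINE SEPARATOR.
function splitIntoDisplayLines(text) {
  return text.split(/\r\n|\r|\n|\u2028|\u2029/);
}
```

A LINE SEPARATOR therefore starts a new displayed line but stays inside its UBA paragraph, where it behaves as whitespace for bidi purposes.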

Note

Implementation update:

HTML5 does not specifically call out the handling of LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029), since this is a matter for CSS and Unicode. It no longer contains the prohibition on their use that HTML4 contained.

HTML5 added a note that newlines constitute UBA paragraph breaks in output elements, and added a similar note for the textarea element in its raw value.

For script dialog text, HTML5 delegates implementation of line breaks, LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) to the Unicode Standard. However, it affirms that "User agents are expected to honor the Unicode semantics of text that is exposed in user interfaces, for example supporting the bidirectional algorithm in text shown in dialogs, title bars, pop-up menus, and tooltips." and adds an example of required Unicode support in script dialogs that says that LINE FEED should constitute a UBA paragraph break.

3.3 Block elements as bidi separators

3.3.1 Background

As in br should serve as a bidi separator.

3.3.2 The problem

There is no standard definition of whether a block element serves as a UBA break between the text preceding and following it, i.e. whether the text preceding a div or an hr (defined to be a block element) should behave as if it were in the same UBA paragraph as the text following it. For short, we will call block elements with text on both sides "embedded".

Different browsers treat embedded block elements differently. Just as with br, in Firefox and Opera, an embedded block element provides no bidi separation between the text preceding and following it, while IE and WebKit treat it as a UBA paragraph break. See br should serve as a bidi separator for examples where this discrepancy makes a difference; just replace br with hr.

It is difficult to justify Firefox and Opera's treatment of embedded block elements. Besides breaking a line, the embedded block elements:

  • Include among them the paragraph element, p. It seems reasonable to expect the insertion of a paragraph to break the text before it and the text after it into two UBA paragraphs.
  • Turn the text before and after them into "anonymous blocks", and it is well accepted that each block should constitute a separate UBA paragraph.

Thus, it seems reasonable to resolve the discrepancy in favor of treating embedded block elements as UBA paragraph breaks.

Nevertheless, should this hold for all block elements, even those that have been "taken out of the flow" via CSS like position:absolute? Currently, IE treats these as UBA paragraph breaks, but WebKit doesn't.

What about inline elements that have been given display:block in CSS? Currently, browsers make no bidi-wise distinction between real block elements and display:block ones: IE and WebKit treat them all as UBA paragraph breaks, and Firefox and Opera do not.

The converse situation of block elements that have been given display:inline in CSS has further complications. (One use case for this technique is getting an inline auto-numbered list with <ol style="display:inline">.) At one time, all browsers treated such an element for bidi purposes as an ordinary inline element (no special treatment). However, HTML 4 explicitly specified that such an element, when lacking the dir attribute, should be treated as if it had a dir attribute of its inherited base direction. This was done to prevent the bidi ordering within such an element from being affected by the element's surroundings, as it would not be affected if it still had block display. To date, the only browser to implement this specification is Firefox. And, as we have seen in Support bidi isolation of inline element content, this specification does not go far enough, since adding a dir attribute to an inline element does not prevent it from affecting the bidi ordering of its surroundings in ways that a separate block would not. So, should we now also start treating block elements with inline display as UBA paragraph breaks?

3.3.3 Proposed solution

An element with display:block, except when it has been taken out of document flow with CSS such as float or position:absolute, but regardless of whether it is a block element or inline element, should be specified as introducing a UBA paragraph break between the content preceding and following it. This does not present a problem for backward compatibility because there has been no browser interoperability in this respect.
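The proposed rule can be sketched as a predicate over an element's computed style. The function name and the plain-object style shape are hypothetical; treating position:fixed as out of flow alongside position:absolute is an assumption that goes slightly beyond the examples named above.

```javascript
// style: { display, float?, position? } holding computed CSS values.
function introducesParagraphBreak(style) {
  if (style.display !== "block") return false;
  const floated = style.float !== undefined && style.float !== "none";
  // Assumption: fixed positioning is out of flow in the same way as absolute.
  const outOfFlow = style.position === "absolute" || style.position === "fixed";
  return !floated && !outOfFlow;
}
```

Note that the predicate looks only at the computed display value, so an inline element styled with display:block is treated the same as a native block element.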

If the display:block element has display:inline ancestors that have bidi properties (e.g. the dir attribute or the bdo element), these bidi properties should be applied to the anonymous block boxes created for these ancestors, in accordance with CSS specs for anonymous block boxes. This is analogous to the re-opening of initial embeddings in the new UBA paragraph as specified for the UBA paragraph breaks introduced by br.

The default ubi attribute value for a block element with display:inline should be specified to be ubi, isolating the element directionally from its surroundings by default. The condition "When a block element that does not have a dir attribute is transformed to the style of an inline element by a style sheet, the resulting presentation should be equivalent, in terms of bidirectional formatting, to the formatting obtained by explicitly adding a dir attribute (assigned the inherited value) to the transformed element" should be removed from the HTML specification. Thus, when such an element is explicitly given ubi=off, it will be displayed the way it had been before this condition was added to the HTML specification and the way it is currently displayed in all major browsers except Firefox. These changes do not present a problem for backward compatibility because the HTML 4 specification was never implemented in this respect by most browsers.

Note

Implementation update:

For display:block elements, except those taken out of document flow with CSS such as float or position:absolute, to be specified as introducing a UBA paragraph break between the content preceding and following it, HTML relies on CSS.

However, HTML5 changed the default stylesheet to specify unicode-bidi:isolate for all “block” elements (in case they get forced to display:inline), where it and HTML 4 had previously specified unicode-bidi:embed.

3.4 Script dialog text direction

3.4.1 Background

The W3C recommends that in HTML, the direction of text be declared using the dir attribute, avoiding the use of Unicode formatting characters LRE, RLE, and PDF except where the dir attribute is inapplicable.

3.4.2 The problem

Services like Javascript's alert() and confirm() functions that let page scripts display plain text in a dialog currently do not take a parameter to indicate the direction of the text to be displayed. In fact, there is not even a specification or interoperability for the directional context in which such text will be displayed, i.e. the default direction assumed for the text:

  • In IE, the script dialog text is displayed in the page's direction, as set using <html dir=...> or <body dir=...>.

  • In the other major browsers, the directional context used for dialog text appears to be either the OS or the browser chrome's default direction, which neither the server nor page scripts can even determine, let alone control.

  • The i18n test suite being developed by the W3C currently starts with IE's approach, but goes further, asserting that dialog text should be displayed in the triggering element's direction. While this does give a measure of control of the direction in which the text will be displayed, it also makes things quite difficult for the page developer, since the same function when called for events triggered by different elements - or even the same element after its computed direction changes - will result in different dialog displays. No known browser takes this approach.

Since a value displayed in the wrong direction can come out garbled, pages wind up having to wrap their RTL dialog text in RLE + PDF characters for correct display on LTR systems. On the other hand, pages dare not wrap their LTR dialog text in LRE + PDF characters for correct display on RTL systems, since most computers in the world are running an LTR OS without RTL script support turned on, and thus display LRE and PDF as rectangles. (This is not a concern in the case of RTL dialog text, since a system that does not have RTL script support will not display RTL text correctly anyway.) Also, since dialog text can contain line breaks and thus UBA paragraph breaks, the formatting characters have to be applied separately to each line of the text. Finally, these formatting characters are little-known, lack named entities, and are generally undesirable in HTML documents. It is unacceptable to force applications to use them when they need to display a message whose direction differs from that in which the user agent will display it (even if there were a way to determine that).
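The workaround that pages currently resort to for RTL dialog text looks roughly like the following sketch. The helper name wrapRtlDialogText is hypothetical; it wraps each line separately because every line break starts a new UBA paragraph, which closes any open embedding.

```javascript
const RLE = "\u202B"; // RIGHT-TO-LEFT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

// Wraps every non-empty line of an RTL message in RLE ... PDF.
function wrapRtlDialogText(text) {
  return text
    .split("\n")
    .map((line) => (line === "" ? line : RLE + line + PDF))
    .join("\n");
}

// Typical use in a page script (alert is a browser API, not called here):
// alert(wrapRtlDialogText(message));
```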

3.4.3 Proposed solution

Ideally, functions like Javascript's alert(), confirm() and prompt() should take an optional parameter indicating the text's direction. Such a proposal should be made to the relevant Ecmascript bodies.

However, there is also a need for reasonably useful default behavior when the script does not make use of such a parameter, even if and when such parameters in fact become available, and certainly before that blessed day.

Thus, the HTML specification should state that plain text passed by page scripts without specifying an explicit direction to whatever services script languages provide for dialog display should be displayed according to the UBA's rules P1, P2, and P3, which estimate the direction of each paragraph according to its first strong character.

The i18n test suite being developed by the W3C should be modified to conform with the proposed solution.

Note

Implementation update:

HTML5 says "Text from the contents of elements is expected to be rendered in a manner that honors the directionality of the element from which the text was obtained. Text from attributes is expected to be rendered in a manner that honours the directionality of the attribute.".

A proposal for HTML to require user agents to implement the Unicode specification regarding Default Ignorable Code Points (Unicode Standard version 5.2, Chapter 5, section 5.21), even if the underlying platform does not handle them properly, was rejected because if the underlying platform handles them incorrectly, then that's a bug in the underlying platform. Either the platform should be fixed, or the user agent should work around it.

3.5 title should support the dir attribute

3.5.1 Background

As in Script dialog text direction.

3.5.2 The problem

One would expect that the page's direction set using <html dir=...> would apply to the page's title element. Unfortunately, however, this is not the case in any major browser. The directional context all major browsers use for title is either the OS or the browser chrome's default direction, which neither the server nor page scripts can even determine, let alone control.

Nor does setting the dir attribute directly on the title element have any effect in any major browser.

Since a value displayed using the wrong direction can come out garbled, pages wind up having to wrap their RTL title in RLE + PDF characters. This has the same problems as with script dialog text, see Script dialog text direction.

3.5.3 Proposed solution

The HTML specification should explicitly state that the title's text will be displayed in the title's computed direction.

It is easy enough for a browser to implement this, since it knows the default directional context in which the text will be displayed. If and only if this differs from the desired direction, the browser needs to wrap the title text in RLE + PDF when RTL is desired and LRE + PDF when LTR is desired.
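The wrapping step a browser would perform can be sketched as follows. The function name and parameters are hypothetical: desiredDir stands for the title's computed direction, and contextDir for the default direction of the OS or browser chrome in which the title is shown.

```javascript
const LRE = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const RLE = "\u202B"; // RIGHT-TO-LEFT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

// Wraps the title only when the display context's direction differs from the desired one.
function wrapTitleForContext(title, desiredDir, contextDir) {
  if (desiredDir === contextDir) return title;
  return (desiredDir === "rtl" ? RLE : LRE) + title + PDF;
}
```

Because the text is left untouched when the two directions already agree, an LTR title shown in an LTR context never gains formatting characters that might otherwise be displayed as rectangles.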

In principle, this could break existing RTL documents that count on their title being displayed in LTR, as is usually the case today. The change should be made despite this, because:

  • Such documents can't really count on the current behavior anyway: on an RTL OS / browser the title is already displayed RTL.

  • In many cases, RTL documents work around the problem by having a title that looks the same whether displayed in LTR or RTL.

  • This will fix more documents than it will break.

  • Forcing backward compatibility will perpetuate an ugly exception.

Note

Implementation update:

HTML5 was changed to require that the title element text will be displayed in the element's computed direction. In addition, HTML5 changed to require text from elements generally to be rendered in native user interfaces in a manner that honors the directionality of the element from which the text was obtained, mentioning dialogs, title bars, pop-up menus, and tooltips as particular examples.

3.6 title and alt attribute text direction

3.6.1 Background

As in Script dialog text direction.

3.6.2 The problem

Currently all major browsers (IE, FF, Chrome, Safari, Opera) display the tooltips specified by a title or alt attribute in the direction of the element to which it belongs, but this does not appear to be formally specified anywhere. Furthermore, this consensus seems fragile because in principle, the direction of an element and the text of its tooltip do not have to coincide. Here is a reasonable counterexample: an RTL web page displays an LTR address (e.g. for a location in Europe), with a tooltip on the address element saying "ADDRESS" in the page's language. The tooltip thus needs to be RTL while the element needs to be LTR.

Until recently, Chrome displayed tooltips in the OS / browser's default direction. When fixing this bug, the initial inclination was to apply only the page's direction, not the element's, due to the "in principle" consideration above.

Apparently not trusting browser behavior, the W3C suggests that tooltip direction may have to be set using LRE | RLE + PDF. This is actually quite difficult to do properly, since wrapping an LTR tooltip in LRE + PDF just in case the browser winds up displaying it in an RTL context will result in the LRE and PDF displaying as rectangles on LTR OS's without RTL support enabled, i.e. the vast majority of computers.

3.6.3 Proposed solution

Support two new attributes, titledir=ltr|rtl|auto and altdir=ltr|rtl|auto that, when present, specify the title and alt attributes' direction, respectively. (The values should have the same meaning as for the dir attribute, including the auto value's reliance on the autodirmethod's value.) In the absence of titledir and altdir, respectively, title and alt attribute text will be displayed in the element's computed direction, as should be stated in the specification.

Note

Implementation update:

HTML5 was changed to require that text from elements generally, including their attribute values, be rendered in native user interfaces in a manner that honors the directionality of the element from which the text was obtained, mentioning dialogs, title bars, pop-up menus, and tooltips as particular examples.

Addition of titledir and altdir attributes was rejected for HTML5 because they are inelegant (cannot add ...lang and ...dir attributes for every attribute with a displayed text value), and cases where the element direction and the title or alt text direction need to be different are rare, and a workaround can be used.

3.7 option should support the dir attribute and be displayed accordingly both in the dropdown and after being chosen

3.7.1 Background

As in Script dialog text direction.

3.7.2 The problem

In a single select, the values of different options may have different directions. Currently, however, out of all major browsers, only FF supports the dir attribute on option, and does so poorly: once the value is chosen, it is displayed in the select's direction.

IE and Opera display all options in the select's direction.

Safari automatically estimates the direction of each option and displays it as such both in the dropdown and after it has been chosen regardless of the select's direction (which is only used to place the down-arrow button and to align the values). This is all very nice, but direction estimation algorithms do make mistakes, so it would be good to be able to specify the actual dir value for a given option - and Safari does not support that.
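To illustrate why estimation can go wrong, here is a rough sketch of a "first letter wins" heuristic. It is not Safari's actual algorithm, which this document does not describe; the function name estimateDirection and the code point ranges used to approximate RTL letters are assumptions made for the example.

```javascript
// Rough approximation of RTL letters: Hebrew, Arabic, Syriac, Thaana and
// NKo blocks plus the Hebrew/Arabic presentation forms. This is an
// assumption for illustration, not a full Unicode Bidi_Class lookup.
const RTL_LETTER = /[\u0591-\u07FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;
const ANY_LETTER = /\p{L}/u;

// Hypothetical helper: returns "rtl" or "ltr" based on the first letter
// found in the text, or null if the text contains no letters at all.
function estimateDirection(text) {
  for (const ch of text) {
    if (ANY_LETTER.test(ch)) {
      return RTL_LETTER.test(ch) ? "rtl" : "ltr";
    }
  }
  return null;
}

const shalom = "\u05E9\u05DC\u05D5\u05DD"; // a Hebrew word

estimateDirection("123 " + shalom);   // digits and spaces are skipped
estimateDirection("Nokia " + shalom); // RTL phrase starting with an LTR name
```

The last call shows the kind of mistake the text refers to: an RTL option that happens to begin with an LTR brand name is estimated as LTR, which is why an explicit dir value on option remains necessary.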

Chrome does not support the dir attribute on option and is on its way to doing what Safari does.

As a result, the only practical way to specify option value direction is using LRE | RLE + PDF, which is cumbersome.

3.7.3 Proposed solution

The HTML specification should state that an option element's computed direction will take its dir attribute into account, and will be used to display the option's text in both the dropdown and after being chosen.

The HTML specification should also state that setting an option element's alignment via CSS or the align attribute will affect its display accordingly in both the dropdown and after being chosen.

Note

Implementation update:

HTML5 was changed to require text from elements generally to be rendered in native user interfaces in a manner that honors the directionality of the element from which the text was obtained. The section gives a detailed example for the option element, including the correct rendering in the select.

HTML5 was also changed to specify that “User agents are expected to render the labels in a select in such a manner that any alignment remains consistent whether the label is being displayed as part of the page or in a menu control.” The align attribute is no longer supported for option elements in HTML5.

Currently, no browser lets the text-align CSS property affect an option’s alignment either in the drop-down or after being chosen. 

3.8 input type="text" and textarea should support interoperable "set direction" functionality

3.8.1 Background

Garbling by incorrect direction also applies to text being entered by the user in an input control. In fact, entering text of direction opposite to the input's declared direction is an unpleasant experience even if the full text does not wind up being garbled, due to the cursor and punctuation jumping around during data entry and difficulty in selecting text. All major browsers thus provide some way for the user to set the direction of each input type="text" and textarea element.

3.8.2 The problem

Unfortunately, the way "set direction" functionality interacts with page scripts varies significantly between browsers, which makes it difficult to write scripts that are informed of the user's choice.

IE: Direction is set using keyboard shortcuts - CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL. (These key combinations are also adopted for this purpose by most Microsoft products, e.g. Windows dialogs, notepad and Word.) They set the value of the element's dir attribute, which is then available to scripts. They trigger the onpropertychange event, at which time the dir value is already changed. They also trigger onkeyup, but before the dir value has been changed, so setTimeout(0) has to be used to get the updated dir value. They do not trigger onkeypress.

FF: Direction is set using the CTRL + SHIFT + X keyboard shortcut, which cycles through LTR and RTL. It does not set the value of the element's dir attribute, and is thus invisible to scripts.

Opera: same keyboard shortcuts as IE. They do not set the value of the element's dir attribute, and are thus invisible to scripts.

Chrome: same keyboard shortcuts as IE. They set the value of the element's dir attribute, which is then available to scripts. They trigger the onkeyup event, at which time the dir value is already changed. They do not trigger onkeypress or oninput. They also do not trigger onpropertychange, since this event exists only in IE.

Safari: Right-click on the input or textarea provides a "Set paragraph direction" submenu. Using "Set paragraph direction" sets the value of the element's dir attribute, which is then available to scripts. However, it does not trigger onkeyup, onkeypress, or oninput. It also doesn't trigger onpropertychange, since this event exists only in IE.

Besides the various "on"-style events mentioned above, there are also the DOM2 mutation events, specifically DOMAttrModified and DOMSubtreeModified. Unfortunately, DOM2 mutation events have not been interoperably implemented, and are currently deprecated in DOM3. Specifically, DOMAttrModified has not been implemented in WebKit, and it is generally unclear whether DOMAttrModified is supposed to be triggered on attribute changes made via the browser UI rather than via script. As for DOMSubtreeModified, while it is implemented in WebKit, it only seems to be triggered by the initial addition of the dir attribute, but not when its value changes. Furthermore, it has not been implemented in Opera.

3.8.3 Proposed solution

The HTML specification should state that some way to set the direction of input type="text" and textarea elements should be exposed to the user, and using it will:

  • Set the element's dir attribute value accordingly.
  • Trigger oninput after the dir attribute has been set; even though no actual input took place, the user did change the recommended interpretation of the input already collected.

Furthermore, it should be recommended that on an OS that has a widespread convention for setting direction (such as CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL on Windows), the user agent will support that convention (although it may provide other methods too).
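If user agents behaved as proposed above, a page script could learn of the user's choice with a single listener. The sketch below assumes exactly that behavior (dir is updated first, then an input event fires); the helper name watchDirection is hypothetical.

```javascript
// Hypothetical helper: calls onChange(newDir) whenever an "input" event
// arrives and the control's dir value differs from the last one seen.
// `control` can be any object exposing `dir` and `addEventListener`,
// such as an input or textarea element in a browser.
function watchDirection(control, onChange) {
  let lastDir = control.dir;
  control.addEventListener("input", () => {
    if (control.dir !== lastDir) {
      lastDir = control.dir;
      onChange(lastDir);
    }
  });
}
```

In a page, this might be called as watchDirection(someTextarea, dir => { ... }), where someTextarea is a textarea element obtained by the script. Ordinary typing also fires input events, so the helper compares dir values and only reports actual direction changes.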

Note

Implementation update:

HTML5 was changed for the input element in text and search states and the textarea element to specify that when the user agent user interface allows the user to set the direction of the element it will set the element's dir attribute value accordingly and trigger the oninput event after the dir attribute has been set.

The proposal that HTML should specify that on an OS that has a widespread convention for setting direction, the user agent should support that convention on input and textarea elements was rejected because “user agents are naturally incentivised to follow platform conventions, but should not be required to do so, since they may have perfectly good reasons for doing entirely different things. It's out of scope of the specification to require behaviour that is not a factor for interoperability.”

3.9 When an input value is remembered, its direction should be remembered too

3.9.1 Background

Some browsers implement auto-completion, a feature whereby values previously entered into an element like input type="text" are remembered and under certain conditions presented to the user in a dropdown. When the user selects one of the items in the dropdown, this value is assigned to the element. At different times, the user may enter values of different direction for the same input. The direction of a value is set either directly by the user through a "set direction" command exposed by the browser (e.g. via keyboard shortcuts, see 3.8 input type="text" and textarea should support interoperable "set direction" functionality) or by page scripts that automatically set the input's dir attribute after estimating the direction of the value on the fly.

3.9.2 The problem

Browsers do not remember the direction of previously-entered values. Some display them in the dropdown in the OS or browser default direction. Some display them in the input's current direction. Finally, some display each value in its own estimated direction. Each of these will result in some values being displayed incorrectly; even the last approach will sometimes fail because estimation algorithms do make mistakes, and this may not have been the direction originally set by the user or page scripts.

After the user chooses a value from the dropdown, the value is usually displayed in the input's current direction, which may or may not be correct for it.

3.9.3 Proposed solution

The HTML specification should state that whenever a user agent stores a user-provided input type="text" or textarea value for later use (such as auto-completion), it should also store the nominal direction value the element had when displaying this value. This may be the original direction of the element, or may have been set by the user for that value via keyboard shortcuts, or may have been set for that value by page scripts. If the user agent later displays the value in an auto-completion dropdown, it should be displayed in its stored direction. If the value is assigned to an element, the element's dir value should be set to its stored direction.
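The proposed behavior can be sketched as a small store that keeps each remembered value together with its direction. This is an illustration only: the class name DirectionalHistory and its methods are hypothetical and do not correspond to any browser's form-memory implementation.

```javascript
// Hypothetical auto-completion store that remembers the direction a value
// had when it was entered, alongside the value itself.
class DirectionalHistory {
  constructor() {
    this.byField = new Map(); // field name -> array of { value, dir }
  }

  // Store a value with the nominal direction the control had at the time.
  // Re-entering the same value replaces its previously stored direction.
  remember(field, value, dir) {
    const entries = (this.byField.get(field) || []).filter(
      (entry) => entry.value !== value
    );
    entries.push({ value, dir });
    this.byField.set(field, entries);
  }

  // Entries to show in the dropdown; each carries its own stored direction.
  suggestions(field) {
    return (this.byField.get(field) || []).map((entry) => ({ ...entry }));
  }

  // Assign a chosen entry to a control, restoring its direction as well.
  apply(control, entry) {
    control.value = entry.value;
    control.dir = entry.dir;
  }
}
```

The key design point is that apply sets dir along with the value, so a remembered RTL value is not displayed in whatever direction the control happens to have at the time.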

3.10 The rendering of numbering or bullets in a list should be independent of the direction of individual li elements

3.10.1 Background

The HTML specification gives no indication of how the bullet or number of an li element should be displayed when its computed direction is the opposite of its parent's (usually an ol or ul).

3.10.2 The problem

In practice, for li elements whose "list-style-position" CSS property has the default "outside" value, different browsers do different things. Furthermore, the effects vary depending on the list's alignment, and whether it is ordered or unordered:

<ul dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul>

IE:
 * item a.
 * longer item b.
.C METI LTR * 

Firefox, Opera and WebKit:
 * item a.
 * longer item b.
.C METI LTR 

<ol dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ol>

IE:
 1. item a.
 2. longer item b.
 .C METI LTR . 

Firefox, Opera and WebKit:
 1. item a.
 2. longer item b.
 .C METI LTR 

<ul dir="ltr" style="text-align:left"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul>

IE:
 * item a.
 * longer item b.
 .C METI LTR *

Firefox and Opera:
 * item a.
 * longer item b.
   .C METI LTR *

WebKit:
 * item a.
 * longer item b.
   .C METI LTR

<ul dir="ltr" style="text-align:right"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul>

IE:
* item a. 
* longer item b. 
.C METI LTR * 

Firefox and Opera:
* item a. 
* longer item b. 
.C METI LTR 

WebKit:
 *
item a. 
 *
longer item b. 
.C METI LTR 

In our opinion, not only is browser behavior unacceptably incompatible and inconsistent, but none of the above provides a usable display of opposite-direction list items.

3.10.3 Proposed solution

The HTML specification should state that, by default, the markers of all "list-style-position:outside" items should disregard the list item element's computed direction, using the list element's computed direction instead for the marker's display and positioning.

CSS will provide means to control this. If the CSS default in this respect must differ, the default stylesheet should achieve the default behavior specified above.

Furthermore, the list marker text will be directionally isolated from the list item text, appearing in its own UBA paragraph.

The outcome should look like this:

<ul dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul>

 * item a.
 * longer item b.
 *
.C METI LTR 

<ol dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ol>

 1. item a.
 2. longer item b.
 3.
.C METI LTR 

<ul dir="ltr" style="text-align:left"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul>

 * item a.
 * longer item b.
 * .C METI LTR

<ul dir="ltr" style="text-align:right"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul>

 *
item a. 
 *
longer item b. 
 *
.C METI LTR 

The CSS specification should also state that setting an li element's alignment via CSS or the align attribute will affect its display accordingly.

3.11 A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction

3.11.1 Background

The vertical scrollbar in an LTR UI is normally placed on the right side of a window or widget, and on the left side in an RTL UI.

3.11.2 The problem

In a browser open on a given page, the UI is made up of two parts: the chrome of the browser itself (e.g. its menus and toolbars), and the page being displayed in the browser. The two parts can be and often are in two different languages and thus directions.

It is unclear which of the two is the principal part of the UI. Certainly the page takes up most of the window and is presumably the user's focus of attention. As a result, it seems natural that the vertical scrollbar should be on the "end" edge relative to the page's (i.e. the body element's) overall direction - and not the browser's chrome direction.

However, this usually results in a usability issue when surfing: the scrollbar moves from side to side when going from an LTR page to an RTL page or vice-versa, confusing the user and making the scrollbar surprisingly difficult to find visually and click on physically. It is also arguable that the overall scrollbar is a part of the browser chrome, not the page, so it has no business being dependent on the page direction.

As a result, Firefox, Chrome and Safari place the scrollbar on the "end" edge relative to the browser's chrome direction. Furthermore, this is the behavior required by the dir on html, vertical scrollbar alignment and dir on body, vertical scrollbar alignment tests in the i18n test suite being developed by the W3C.

However, IE and Opera continue to put the scrollbar relative to the page direction.

3.11.3 Proposed solution

The HTML and CSS specifications should state that the user agent window's overall vertical scrollbar should be located independent of the direction of any page element, despite being otherwise controlled by the style of the body element. (Thus, it should be located on the "end" side relative to the user agent chrome direction.)

Note

Implementation update:

This proposal was rejected as out-of-scope for the HTML5 specification: “The HTML spec doesn't even require that there be a window in the first place, let alone a scroll bar. Plus, user interface decisions are explicitly left up to user agents since they represent quality-of-implementation issues and not interoperability issues.”

It seems that a proposal to add this to the CSS spec is also inappropriate, since “CSS explicitly mentions only a scrolling mechanism and considers out-of-scope how that mechanism is implemented. It could be scroll bars, or a little panner map in the corner, arrow buttons along each edge of the box, or a joystick control with no on-screen representation, or something else.”

It will be necessary to address the proposal directly to the browser developers.

3.12 The vertical scrollbar of an element below body should be on the "end" side relative to the element's direction

3.12.1 Background

As in A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction.

3.12.2 The problem

Users expect the vertical scrollbar of a "widget" inside the page to be on an LTR widget's right side, and on an RTL widget's left side. The reasons given in A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction for making the browser chrome direction determine the location of the body element's vertical scrollbar apply only to the body element:

  • Only the body's scrollbars could conceivably be in the same window location across all pages.

  • Only the body's scrollbars can be conceived of as being part of the browser chrome.

However, due to the usability problem with the page's overall vertical scrollbar described in A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction, Firefox, Chrome and Safari place every element's vertical scrollbar on the "end" edge relative to the browser's chrome direction, regardless of the element's direction. While this is indeed desirable for the body element as indicated in A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction, it is not desirable for the elements below it.

3.12.3 Proposed solution

The HTML and CSS specifications should state that the vertical scrollbar of an element below body and of the body element of a document being displayed in a frame or iframe should be on the "end" side relative to the element's direction.
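Taken together with the proposal for the page's overall scrollbar, the intended placement rule can be summarized in a short sketch. The function and parameter names below are hypothetical; the code only restates the two proposals and does not describe how any browser decides scrollbar placement.

```javascript
// The "end" side is the right edge for LTR and the left edge for RTL.
function endSide(dir) {
  return dir === "rtl" ? "left" : "right";
}

// Hypothetical summary of the proposed rule:
// - the body of the top-level document follows the browser chrome direction;
// - any element below body, and the body of a document displayed in a
//   frame or iframe, follows its own direction.
function verticalScrollbarSide({ isTopLevelBody, chromeDir, elementDir }) {
  return endSide(isTopLevelBody ? chromeDir : elementDir);
}

// Example: an RTL widget inside a page viewed in an LTR browser.
const widgetSide = verticalScrollbarSide({
  isTopLevelBody: false,
  chromeDir: "ltr",
  elementDir: "rtl",
});
```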

Note

Implementation update:

This proposal was rejected as out-of-scope for the HTML5 specification: “Where the scroll bars render is a presentational concern, and out of scope for HTML. HTML barely mentions scroll bars, in fact (and only in relation to historical APIs).”

It seems that a proposal to add this to the CSS spec is also inappropriate, since “CSS explicitly mentions only a scrolling mechanism and considers out-of-scope how that mechanism is implemented. It could be scroll bars, or a little panner map in the corner, arrow buttons along each edge of the box, or a joystick control with no on-screen representation, or something else.”

It will be necessary to address the proposal directly to the browser developers.

4. Implementation status information

This appendix captures a snapshot, as of July 2014, of a set of links that were used to track implementations of the features described above. It is captured here for the historical record. It may not show the complete set of bugs raised against implementations.

4.1 Support bidi isolation of inline element content

4.2 Support auto-direction

4.3 Support reporting the chosen direction of input and textarea in form submissions

  1. Limited it to textareas and inputs of type “text” and “search”.
  2. Called it dirname. The problem with the name “submitdir” is that most people apparently take it to mean the plausible “directory you submit to” (i.e. similar to “action” - someone even suggested renaming it “actiondir”).
  3. Required that it be given a value.
  1. There has to be something better than the "dirname" name. How about "addDirection"? - Rejected: too long, ddd.
  2. Would it be possible to allow no value, e.g. to use the name value suffixed with "_dir"? - Rejected: feature is already too complicated.
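As a rough illustration of what dirname does, the sketch below builds the entries of a form submission from plain objects. It is a simplification, not the HTML form submission algorithm: the function name buildFormEntries and the shape of the control objects are assumptions, and each control's dir is taken to be already resolved to "ltr" or "rtl".

```javascript
// Hypothetical simplification of form submission with dirname.
// Each control is a plain object: { name, value, dir, dirname? }.
function buildFormEntries(controls) {
  const params = new URLSearchParams();
  for (const control of controls) {
    params.append(control.name, control.value);
    // When dirname has a value, the control's direction is submitted
    // as an extra entry under that name.
    if (control.dirname) {
      params.append(control.dirname, control.dir === "rtl" ? "rtl" : "ltr");
    }
  }
  return params;
}

const entries = buildFormEntries([
  { name: "comment", value: "\u05E9\u05DC\u05D5\u05DD", dir: "rtl", dirname: "comment.dir" },
  { name: "email", value: "user@example.com", dir: "ltr" },
]);
```

Here the server receives a comment.dir entry alongside comment, while email, which has no dirname, contributes only its value.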

4.4 Support option for images to be flipped horizontally in RTL

4.5 br should serve as a bidi separator

4.6 Line breaks should serve as bidi separators inside output, textarea, and script dialog text

4.7 Block elements as bidi separators

  1. Updated bug 65617 on WebKit.
  2. Updated bug 676245 on Mozilla.

4.8 Script dialog text direction

4.9 title should support the dir attribute

4.10 title and alt attribute text direction

  1. If the element has dir=auto (explicitly or by default, as is the case for the bdi element), or if the element inherits its directionality from such an element, then the directionality of each of the element's attributes must be computed as if attribsdir="auto" had been specified.
  2. Otherwise, the directionality of the element's attributes is the same as the element's directionality.

4.11 option should support the dir attribute and be displayed accordingly both in the dropdown and after being chosen

  1. HTML5 no longer includes the align attribute.
  2. This is thus purely a CSS matter.
  3. Currently, no browser lets the text-align CSS property affect an option’s alignment either in the drop-down or after being chosen.
  4. If text-align, which is “start” by default, were allowed to influence option’s alignment, the results would be undesirable: by default, LTR and RTL entries would have opposite alignment, which in most cases is undesirable.

4.12 input type="text" and textarea should support interoperable "set direction" functionality

4.13 When an input value is remembered, its direction should be remembered too

  1. “We definitely don't want session storage state memory mutating the DOM dynamically.” But form memory already does so when it sets the input/textarea's value/content.
  2. “In any case, that's unlikely to ever come up, since the page isn't likely to have a different dir="" attribute than it did when the user filled the form the first time.” But it does, when the user sets the control's direction via the UI provided by the user agent, which sets the control's dir attribute, when filling the form the first time.
  3. “The spec doesn't define how any of the form memory stuff should work, so it's not clear to me that it'd be appropriate to go into this much detail.” This is a fair point: it’s difficult to be normative in something that the spec really doesn’t define very much.

4.14 The rendering of numbering or bullets in a list should be independent of the direction of individual li elements

4.15 A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction

4.16 The vertical scrollbar of an element below body should be on the "end" side relative to the element's direction

4.17 Notes

  1. Mozilla bug 613154 lists all the Mozilla bugs opened so far that deal with Additional Requirements for Bidi in HTML and CSS.
  2. WebKit bug 50910 is the tracking bug for the WebKit project.

A. Revision Log

This version introduces the following changes:

B. Acknowledgements

The editors owe a debt of gratitude to the following contributors: Adil Allawi, Technical Director, Diwan Software; Matitiahu Allouche, Bidi Architect, IBM; Uri Bernstein, Google; Douglas Davidson, Apple; Mark Davis, Senior I18n Architect, Google & President of the Unicode Consortium; Martin J. Dürst, W3C I18n Interest Group Chair; Asmus Freytag, President, ASMUS, Inc.; Richard Ishida, I18n Lead, W3C; Shanjian Li, Google; Mohamed Mohie, IBM; Jeremy Moskovich, Google; Shachar Shemesh, Lingnu Open Source Consulting; Gaal Yahas, Google.