Document Status Update 2022-08-30: This version is outdated! For the latest version, please look at https://www.w3.org/International/techniques/authoring-html#direction.
This document provides advice on practical techniques related to the creation of content in languages that use right-to-left scripts, such as Arabic and Hebrew, or content in other languages that includes fragments of text in these scripts. This is a W3C Draft produced by the Internationalization Working Group, part of the W3C Internationalization Activity. The Working Group expects to advance this Working Draft to Working Group Note. Please send comments on this document to www-international@w3.org (publicly archived).
This document is intended for all authors and producers of HTML and CSS who are working with text in a language that uses a right-to-left script, or whose content will be localized to a language that uses a right-to-left script.
This document provides guidance for developers of HTML that enables support for international deployment. Enabling international deployment is the responsibility of all content authors, not just localization groups or vendors, and is relevant from the very start of development.
It is assumed that readers of this document are proficient in developing HTML and XHTML pages; this document limits itself to providing advice specifically related to internationalization.
This document lists a number of do's and don'ts, which we will refer to as techniques, related to authoring pages in right-to-left scripts. Each technique is followed by a 'detail' link which provides further information. Where needed, you can get additional information and explanations by following the links to the appropriate section of the techniques index, listed alongside each section.
If a technique says 'consider', there are usually pros and cons involved in following the advice given, and you should follow the link to more detailed information to be sure you understand these. In some cases it may be that not all browsers support the features described. In other cases, it may be purely up to you to decide whether or not this is a good idea.
'Bidirectional', or 'bidi', text typically refers to text written using a mixture of right-to-left and left-to-right scripts. For example, in Arabic and Hebrew text the content flows predominantly from right to left, but embedded numbers or text in other scripts (such as Latin script) still runs left to right. Text in other languages, such as English, can also be bidirectional if it includes excerpts from languages such as Arabic and Hebrew.
Scripts such as Arabic and Hebrew, which are predominantly right-to-left in orientation, may be referred to as 'RTL' (right-to-left) scripts.
Several languages use the Arabic script, such as Urdu and Persian. Several other scripts run predominantly right-to-left: these include Thaana, N'ko, and Syriac, as well as other scripts no longer in common use, such as Cypriot, Phoenician and Kharoshthi.
Direction is a property of scripts, not language.
Be careful about assuming that information about directionality can be inferred from information about the language of the text, as this is not always true. There must be a one-to-one mapping between directionality and language for this to work, and there often isn't. For example, Azerbaijani can be written using both right-to-left and left-to-right scripts, and the language code az is relevant for either.
In addition, when using directional markup inline, the markup and the values of that markup do not necessarily coincide with language declarations.
Also, markup used to indicate directionality has values that indicate that the normal directionality should be overridden; it is not possible to indicate that using language-related values.
In the same way, attributes indicating text direction in HTML do not, and should not, provide information about the language of text.
Although it is theoretically possible to infer direction correctly much of the time from language information (no browser does so at the time of writing), it is much better to use directional markup.
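As a minimal sketch of why language and direction are declared separately, the markup below gives two paragraphs the same az language subtag but different base directions. The script subtags and the placeholder text are illustrative assumptions, and UPPERCASE text stands in for right-to-left characters.

```html
<!-- Same language (az), different scripts, so different base directions. -->
<!-- UPPERCASE is a placeholder for text written in the Arabic script. -->
<p lang="az-Arab" dir="rtl">AZERBAIJANI TEXT IN THE ARABIC SCRIPT</p>
<p lang="az-Latn" dir="ltr">azerbaijani text in the latin script</p>
```

Neither attribute is derived from the other here: lang describes the language, and dir describes the base direction.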
There is currently a lack of good editing environments for creating HTML pages using right-to-left scripts. Because HTML markup and escapes contain punctuation and strongly typed letters, you are always working with bidirectional source text. However, if the editing application is not aware (as is usually the case) that the markup is not ordinary text, then it can produce some odd effects and make coding difficult.
This section simply mentions some of those problems, so that you are forewarned. It doesn't propose a full solution, but it does offer some advice which may help with problematic editing environments.
Unless your editor recognizes markup in source text as not being normal text, the strongly typed letters and punctuation in the markup will appear in places you wouldn't expect, and sometimes interfere with the order of the content itself.
If you are creating a large amount of right-to-left text, it makes sense to set the base direction of the editing window in your editor to right-to-left. This helps ensure that the content is correctly ordered. Unfortunately, this tends to increase the likelihood that your markup looks strange in the source text.
Consider some simple markup in a left-to-right context. The source contains a p tag followed by a class attribute, followed by a title attribute with some Arabic text (العربي) as its value. The content of the paragraph itself (مشس هخصث خهس تخت تخهثز) starts with Arabic text. In a left-to-right environment, the order in which this source is displayed does not match the order in which it is stored.
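The markup described above might look like the following in logical (stored) order. This is a reconstruction from the description rather than the original example: the class value is an assumption, and how the line appears on screen depends on how your viewer applies the bidirectional algorithm.

```html
<p class="example" title="العربي">مشس هخصث خهس تخت تخهثز</p>
```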
Things are hardly better if the overall context for the source code is right-to-left. In this case, the same source text is again displayed in an order that is hard to read.
Note, however, that this source will display correctly in a user agent. This is just a problem for reading and maintaining the source text.
The title attribute with Arabic text makes the situation much worse than normal in the above examples. The problem arises because there is only 'punctuation' (i.e. the quote and angle bracket) between two runs of strongly-typed right-to-left text, so the Unicode bidirectional algorithm considers this to be a single run of text.
It helps a little, if you can do it, to ensure that an attribute with a value that uses left-to-right script text (in the example, the class attribute) appears last. This would make the text in a left-to-right context look as expected, and in a right-to-left context it would prevent the interaction of markup with content. There are still some issues, however: things are still a little jumbled, and the quotation marks are not where you would expect.
It can also help to start the content on a new line; however, this doesn't always help with inline markup. Also, you should try to avoid including white space before the closing markup, as this can lead to other problems.
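A sketch of the two workarounds just described, applied to a hypothetical paragraph: the attribute with a left-to-right value (class) comes last, and the content starts on a new line with no white space before the closing tag. The class value is an assumption.

```html
<!-- Right-to-left attribute value first, left-to-right attribute value last. -->
<p title="العربي" class="example">
مشس هخصث خهس تخت تخهثز</p>
```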
If you are dealing with content that is predominantly in a right-to-left script, the ideal solution would be a source editor that recognizes markup as a special construct, and protects it to produce a sensible order for the characters in the source text. Not only that, but if your markup includes a dir attribute to change the directional context of the content, your editor should recognize this and produce a corresponding change in the order of the source code.
For small edits, if they are unable to find a bidi-aware editor, some authors actually prefer to use an editor that knows nothing about bidi. This means that they have to read the right-to-left content backwards, but at least it makes it easier to locate and change the items they are interested in.
If you use a Unicode control character such as the RIGHT-TO-LEFT MARK (RLM) or ZERO WIDTH NON-JOINER, you will not usually be able to see it in the source text, since it is invisible. For this reason you may think that a useful way to represent these characters is with the pre-defined HTML character entities, &rlm; and &zwnj;, or their numeric equivalents, &#x200F; and &#x200C;.
Unfortunately, such an approach typically has its problems, too. As described in the previous section related to markup in source text, the strongly-typed left-to-right characters and non-alphabetic characters in the escapes will normally cause the Unicode bidirectional algorithm to display very odd looking source text.
Very few editors currently recognize, for example, the sequence of characters in &#x200F; as a single unit representing a character with a strong right-to-left direction. They treat this as simply text containing punctuation, numbers and two strongly-typed left-to-right characters (x and F), and apply the Unicode bidirectional algorithm to that as they would to any normal text.
Consider a typical view of source text after adding an escape to bidirectional text in right-to-left ordered source text. Focus on the constituent parts of the character escape itself, rather than the order of the Arabic text. The sequence &#x200F; is displayed as ;x200F#& when embedded in right-to-left text. At the beginning or end of embedded English text the escape is broken into fragments, and appears as x200F;text in english#& or ;text in english&#x200F, respectively.
Note that the source will still display correctly in a user agent. This is just a problem for reading and maintaining the source text.
Various approaches are possible, if you want to avoid using characters that are invisible in your source code:
use an editor that recognizes an escape as a single unit representing a RLM/LRM character and produces the expected effect on the surrounding source text
use an editor that provides a symbolic visual representation of the RLM/LRM character, so that you don't lose sight of it
break the source code line around the escape (this works in some cases)
Otherwise, you just have to learn to live with the undesirable reordering effects for escapes.
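One way to picture the line-breaking option: placing the escape on its own source line keeps its characters away from the neighbouring right-to-left text in the editor. This is only a sketch. UPPERCASE stands in for right-to-left text, and because a line break in HTML source is treated as white space, it only suits places where white space next to the mark is acceptable.

```html
<!-- The escape sits alone on its line, so the editor has nothing to mix it with. -->
<p dir="rtl">ARABIC TEXT text in english
&rlm;
MORE ARABIC TEXT</p>
```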
Given the discussion above, representing source text in examples can be quite difficult. Should we show source text in right-to-left order, or left-to-right? Should we assume that the editor recognizes and handles markup and escapes as separate entities from the content, and create source fragments that look like that, or should we show source as it really looks for many people who don't have such clever editors? And particularly, should we assume that the bidirectional algorithm is properly applied in the source editor, picking up cues from the markup, or not?
In most of our articles right-to-left text in code samples is represented by UPPERCASE TRANSLATIONS, and left-to-right text by lowercase. In this case, text in code samples reflects the direction of characters as stored in memory, rather than the displayed result. The original version of text in uppercase translations would be read from right-to-left.
Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction. detail
Add dir="rtl" to the html tag any time the overall document direction is right-to-left. detail
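A minimal sketch of a page whose overall direction is right-to-left. The lang value and the placeholder content are assumptions, and UPPERCASE stands in for right-to-left text.

```html
<!DOCTYPE html>
<!-- dir on the html tag sets the base direction for the whole document. -->
<html dir="rtl" lang="ar">
<head>
<meta charset="utf-8">
<title>PAGE TITLE IN ARABIC</title>
</head>
<body>
<p>ARABIC PARAGRAPH TEXT</p>
</body>
</html>
```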
Don't add dir="rtl" to the body tag. detail
If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element. detail
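The scroll bar workaround might be laid out as follows. This is a sketch under the assumption that the page is Hebrew; the lang value and placeholder content are illustrative, and UPPERCASE stands in for right-to-left text.

```html
<!DOCTYPE html>
<!-- No dir on html or body, so the position of the scroll bar is unaffected. -->
<html lang="he">
<head dir="rtl">
<meta charset="utf-8">
<title>PAGE TITLE IN HEBREW</title>
</head>
<body>
<!-- A single wrapper just inside body carries the base direction for the content. -->
<div dir="rtl">
<p>HEBREW PARAGRAPH TEXT</p>
</div>
</body>
</html>
```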
Use logical, not visual, ordering for Hebrew, and choose an appropriate encoding. detail
Except in very rare circumstances you should always use the Unicode encoding UTF-8. detail
If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8. detail
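The two encoding declarations mentioned above could be written as follows. A page would carry only one of them; showing both in one block is purely for comparison.

```html
<!-- Preferred: Unicode, with Hebrew text stored in logical order. -->
<meta charset="utf-8">

<!-- Only if an ISO encoding is unavoidable: the label for logically ordered text. -->
<meta charset="ISO-8859-8-i">
```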
Do not use CSS styling to control directionality in HTML. Use markup. detail
Add the dir attribute to a block element to change base direction. detail
Don't use CSS or Unicode control characters to control directionality in HTML. Use markup. detail
Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction. detail
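A sketch of changing the base direction for one block inside a left-to-right page. The element choice, lang value and placeholder text are assumptions, and UPPERCASE stands in for right-to-left text.

```html
<p>the quotation reads:</p>
<!-- Only the block whose base direction differs carries dir. -->
<blockquote dir="rtl" lang="he">
  <p>HEBREW QUOTATION, FIRST PARAGRAPH</p>
  <p>HEBREW QUOTATION, SECOND PARAGRAPH</p>
</blockquote>
<!-- No dir needed here: the base direction is inherited from the page. -->
<p>the text continues left to right.</p>
```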
Add dir="auto" to input tags to automatically align text to the correct side of an input field. detail
Add dir="auto" to textarea and pre tags to make paragraphs align to the left or right according to the initial strong character. detail
Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control. detail
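The three form techniques above could be combined as in the sketch below. The action URL and the field names q, q.dir, comment and comment.dir are hypothetical.

```html
<form action="/search" method="get">
  <!-- dir="auto" picks the direction from the first strong character typed. -->
  <!-- dirname submits an extra field (here q.dir) holding "ltr" or "rtl". -->
  <input type="search" name="q" dir="auto" dirname="q.dir">
  <textarea name="comment" dir="auto" dirname="comment.dir"></textarea>
  <button type="submit">search</button>
</form>

<!-- UPPERCASE is a placeholder for text whose direction is not known in advance. -->
<pre dir="auto">TEXT OF UNKNOWN DIRECTION</pre>
```

Whether dirname is honoured depends on browser support, which is why the technique says 'consider'.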
If you know the phrase's direction, or can work it out for injected text, tightly wrap every opposite-direction phrase in markup. Add the CSS shim to your style sheet, and use the dir attribute on that markup. Be sure to nest markup to show the structure. detail
If you want to bullet-proof your code for browsers that don't support the CSS shim where tightly-wrapped text is followed inline by a number or a logically separate opposite-direction phrase, add &rlm; or &lrm; immediately after the phrase. detail
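A sketch of tight wrapping in a left-to-right sentence. The CSS shim itself is not reproduced in this text, so the rules below are an assumed form based on the standard unicode-bidi property; the choice of &lrm; (a mark matching the base direction of the surrounding text) is likewise an assumption. UPPERCASE stands in for right-to-left text.

```html
<style>
/* Assumed form of the shim: isolate any element that sets a direction. */
[dir="ltr"], [dir="rtl"] { unicode-bidi: isolate; }
bdo[dir="ltr"], bdo[dir="rtl"] { unicode-bidi: isolate-override; }
</style>

<!-- The span wraps the opposite-direction phrase tightly, with no stray white space. -->
<p>the title is <span dir="rtl" lang="he">HEBREW TITLE</span> in hebrew.</p>

<!-- Followed inline by a number: the mark keeps the number from joining the phrase. -->
<p><span dir="rtl" lang="he">HEBREW NAME</span>&lrm; - 3 reviews</p>
```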
If you don't know the phrase's direction, i.e. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or if the phrase is tightly wrapped by an element already, just add dir="auto" to that element. detail
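Both options for text of unknown direction might look like this. The class name and the placeholder text are hypothetical; in practice the placeholder would be replaced by text injected at run time.

```html
<!-- Option 1: wrap the injected phrase in bdi; no dir attribute is needed. -->
<p><bdi>INJECTED USER NAME</bdi> - 3 reviews</p>

<!-- Option 2: the phrase already has a tight wrapper, so add dir="auto" to it. -->
<p><span class="username" dir="auto">INJECTED USER NAME</span> - 3 reviews</p>
```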
Use Unicode control characters for bidirectional control only for attribute text or element text that allows no internal markup. detail
Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tool tips, page titles, or on JavaScript dialog boxes. detail
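As an illustration of control characters in attribute text, which cannot contain markup, the sketch below wraps a tool tip in U+202B RIGHT-TO-LEFT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING. The choice of these two characters, the file name and the text are assumptions; UPPERCASE stands in for right-to-left text.

```html
<!-- &#x202B; opens a right-to-left embedding and &#x202C; closes it. -->
<img src="cover.png" alt="book cover"
     title="&#x202B;HEBREW TITLE (2nd edition)&#x202C;">
```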
Do not leave white space at the end of inline elements that mark a directional boundary. detail
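The difference is easy to miss in source, so here is a sketch of both forms. UPPERCASE stands in for right-to-left text.

```html
<!-- Avoid: the trailing space is inside the directional boundary. -->
<p>the title is <span dir="rtl">HEBREW TITLE </span>in hebrew.</p>

<!-- Prefer: the space sits outside the element. -->
<p>the title is <span dir="rtl">HEBREW TITLE</span> in hebrew.</p>
```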
Treat mirrored characters as if the word 'left' in the character name meant 'opening', and 'right' meant 'closing'. detail
Use the bdo element to force the directionality of a sequence of inline characters. detail
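A sketch of an override, for example to show right-to-left characters in the order in which they are stored. The sentence is hypothetical and UPPERCASE stands in for right-to-left text.

```html
<!-- bdo requires dir; here every character inside is forced to run left to right. -->
<p>the letters are stored in this order: <bdo dir="ltr">HEBREW LETTERS</bdo></p>
```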
This Editor's Draft has been changed as follows:
Members of the Internationalization Working Group and former GEO Working Group have contributed their time and valuable comments to shaping these guidelines.