Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts

Intended audience: HTML/XHTML and CSS content authors who are creating pages in right-to-left scripts such as Arabic and Hebrew, or who have to deal with embedded right-to-left text. This material applies whether you create documents in an editor or via scripting.


This tutorial gathers together and organizes pointers to articles that, taken together, help you understand the essential aspects of how to work with languages in right-to-left scripts and bidirectional text when authoring HTML and CSS.

In a nutshell

This section is for people in a hurry who just want to know some of the key recommendations from the tutorial. If you don't understand something, or if you want more detail, work through the rest of the tutorial.

Add a dir attribute to the html tag to set the default base direction of your page if it is right-to-left. Use the dir attribute on block elements within the page only where you need to change the base direction.
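If you generate pages via scripting, the recommendation above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the tutorial: the function `page_skeleton` and its parameters are hypothetical, and directions are assumed to be the strings "ltr" or "rtl". It sets `dir` once on the `html` element and adds `dir` to a block element only when that block's base direction differs from the page default.

```python
from html import escape

def page_skeleton(lang: str, direction: str, blocks: list[tuple[str, str]]) -> str:
    """Build a minimal page; each block is (text, direction).

    Hypothetical helper for illustration only.
    """
    # The page-wide base direction goes on the html element.
    parts = [f'<html lang="{escape(lang)}" dir="{direction}">', "<body>"]
    for text, block_dir in blocks:
        # Add dir only where the block's base direction changes.
        attr = f' dir="{block_dir}"' if block_dir != direction else ""
        parts.append(f"<p{attr}>{escape(text)}</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)

print(page_skeleton("ar", "rtl", [("مرحبا", "rtl"), ("Hello", "ltr")]))
```

Note that the Arabic paragraph carries no `dir` attribute at all: it simply inherits the right-to-left base direction from the `html` element.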

For inline text, tightly wrap all opposite-direction phrases in markup that sets their base direction.
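"Tightly" means the markup starts at the first character of the opposite-direction phrase and ends at its last, so that surrounding spaces and punctuation stay outside and belong to the surrounding text. The Python sketch below shows the idea; the helper name `wrap_phrase` is hypothetical, and it uses a `span` with a `dir` attribute as one way of setting the base direction of inline text.

```python
from html import escape

def wrap_phrase(phrase: str, direction: str) -> str:
    """Wrap an opposite-direction phrase tightly in markup that sets its base direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    # Strip outer whitespace so the element hugs the phrase itself.
    return f'<span dir="{direction}">{escape(phrase.strip())}</span>'

# A Hebrew phrase embedded in English text; the "!" stays outside the span.
sentence = f"The title is {wrap_phrase('שלום עולם', 'rtl')}!"
print(sentence)
```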

Use dir=auto to automatically set the base direction of form fields, pre elements or text inserted into the page. Use the dirname attribute if you need to pass information about the base direction of form input to the server.
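Roughly speaking, `dir=auto` looks for the first character with a strong direction (Unicode bidirectional type L, R or AL) and uses it to set the base direction, falling back to left-to-right if there is none. The Python sketch below approximates that heuristic for a plain string using the standard `unicodedata` module. It is a simplification: a browser also skips certain descendant elements when it looks for that character, and the function name is hypothetical.

```python
import unicodedata

def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Approximate dir=auto: return 'ltr' or 'rtl' from the first strong character."""
    for ch in text:
        bidi_type = unicodedata.bidirectional(ch)
        if bidi_type == "L":
            return "ltr"
        if bidi_type in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    # Digits, spaces and punctuation are not strong, so use the default.
    return default

print(first_strong_direction("123 שלום"))     # rtl: the digits are skipped
print(first_strong_direction("Hello مرحبا"))  # ltr
print(first_strong_direction("2024!"))        # ltr: no strong character
```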

Avoid using CSS or Unicode control codes for managing direction where you can use markup.

Use logical ordering of bidirectional text, rather than visual ordering, and let the Unicode Bidirectional Algorithm take the strain.
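Logical ordering means characters are stored in the order in which they are typed and read, not the order in which they appear on screen. The short Python check below illustrates this with an illustrative string: the first stored Hebrew letter is the one a reader meets first, even though it is displayed at the right-hand end of the word.

```python
# A string in logical order: code points follow reading order.
text = "Say שלום"
hebrew = text[4:]

# The first stored Hebrew letter is shin (U+05E9): read first, displayed
# rightmost. The bidi algorithm does the reordering at display time.
# A visually ordered store would hold the letters reversed (hebrew[::-1]).
print(hex(ord(hebrew[0])))  # 0x5e9
print(hex(ord(hebrew[-1])))  # 0x5dd (final mem, read last)
```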

Parts of this material describe the latest thinking embodied in the HTML5 specification. Note that the HTML5 specification is not yet completely stable.

Definitions

Bidirectional text
In languages that use right-to-left scripts, any embedded text from a left-to-right script, as well as all numbers, progresses visually left-to-right within the right-to-left visual flow of the text. (Of course, English text on this page could also contain bidirectional text if it included, say, Arabic or Hebrew examples.)
Bidirectional text is commonplace in right-to-left scripts such as Arabic, Hebrew, Syriac, and Thaana. Many languages are written with these scripts, including Arabic, Hebrew, Pashto, Persian, Sindhi, Syriac, Dhivehi, Urdu, and Yiddish.
Bidi
A short form for 'bidirectional'.
RTL
A short form for 'right-to-left'.
LTR
A short form for 'left-to-right'.
Base direction
In order for text to look right when an HTML page is displayed, we need to establish the directional context of that text. We refer to that directional context as the 'base direction'.
It is fundamentally important to establish the appropriate base direction for text so that the bidirectional algorithm produces the expected ordering of the text when displayed. Correct specification of the base direction also establishes a proper default alignment for the text.
In HTML, the base direction is set explicitly by the dir attribute on the element itself or on its nearest ancestor that has one. In the absence of such an attribute, the base direction falls back to the default direction of the document, which is left-to-right (LTR).
Unicode Bidirectional Algorithm
The Unicode Bidirectional Algorithm (UBA), often referred to simply as the 'bidi algorithm', is part of the Unicode Standard. It describes how to determine the directionality of bidirectional Unicode text, and it is widely supported by web browsers and other applications. For the details, see Unicode Standard Annex #9.
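The base-direction rule defined above can be modelled as a walk up the element tree. The Python sketch below is a simplified model, not browser code: the `Element` class is hypothetical, the lookup checks the element itself before its ancestors, and only the values `ltr` and `rtl` are understood (`dir=auto` is ignored).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    """Hypothetical, minimal stand-in for an HTML element."""
    tag: str
    dir_attr: Optional[str] = None  # "ltr", "rtl", or None if there is no dir attribute
    parent: Optional[Element] = None

def base_direction(el: Optional[Element]) -> str:
    """The nearest explicit dir (self first, then ancestors) wins; otherwise LTR."""
    while el is not None:
        if el.dir_attr in ("ltr", "rtl"):
            return el.dir_attr
        el = el.parent
    return "ltr"  # the document default

html = Element("html", dir_attr="rtl")
body = Element("body", parent=html)
p = Element("p", parent=body)
quote = Element("blockquote", dir_attr="ltr", parent=body)

print(base_direction(p))      # rtl, inherited from <html>
print(base_direction(quote))  # ltr, set explicitly
```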
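The bidi algorithm starts from a bidirectional character type that Unicode assigns to every character. Python's standard `unicodedata.bidirectional()` exposes these types, which gives a quick way to inspect the raw input the algorithm works from. The sketch shows only this classification step, not the reordering itself; the helper name is illustrative.

```python
import unicodedata

def bidi_types(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode bidirectional type."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# L = strong left-to-right, EN = European number, WS = whitespace,
# AL = strong right-to-left (Arabic letter), ON = other neutral.
for ch, bidi_type in bidi_types("a1 ب!"):
    print(f"U+{ord(ch):04X} {bidi_type}")
```

Only the strong types (L, R, AL) set direction on their own; numbers, spaces and punctuation take their direction from context, which is why inline markup is sometimes needed.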

Markup for text direction

In this section we cover the basics of markup for text direction. The first article deals with setting direction at the document and structural level. The second deals with inline text elements, which is somewhat more complicated because that is where you have to handle bidirectional text.

The third article describes the difference between visually and logically ordered text, in case you ever happen to come across the former. These days you are generally unlikely to have to deal with visually-ordered content.

Text direction and structural markup looks at basic usage of the dir attribute at the document level and for structural markup in HTML, e.g. paragraphs, tables, and forms. It also looks at new developments in HTML5 for dealing with direction in form elements, pre elements and inserted text. It includes the following:

What you need to know about the bidi algorithm and inline markup begins by describing, in simple terms, how the Unicode Bidirectional Algorithm works. This algorithm is the basis for directional control of text in all browsers, but it has limitations that must be addressed with markup. The article looks at the problems and proposes simple solutions. It includes the following:

At the time of writing, some final decisions about the HTML5 markup for inline text were still pending because of recent developments in Unicode; that information will be added in due course.

Visual vs. logical ordering of text compares visual vs. logical approaches to storing bidirectional text and makes the case for the logical model. It covers the following:

CSS and Unicode control characters

Generally speaking, you should manage text direction in HTML using markup rather than CSS or Unicode control characters, although in some places Unicode control characters are the only option. These articles look at the reasons for this in detail.
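Where markup cannot be used, for example inside an attribute value such as title, one option is to wrap the opposite-direction text in Unicode isolate controls, roughly the plain-text counterpart of tightly wrapped markup. The Python sketch below assumes the rendering software supports the isolates added in Unicode 6.3: LRI (U+2066), RLI (U+2067) and FSI (U+2068), each closed by PDI (U+2069). The helper name `isolate` is hypothetical; which controls to use in a given case is covered in the articles listed below.

```python
# Unicode directional isolate controls (assumes renderer support, Unicode 6.3+).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction from the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes any of the three above

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}

def isolate(text: str, direction: str) -> str:
    """Wrap text in an isolate so its direction does not leak into surrounding text."""
    return _OPENERS[direction] + text + PDI

# Illustrative value for a title attribute: a Hebrew phrase in English text.
title_value = f"Book: {isolate('שלום עולם', 'rtl')} (2nd ed.)"
print(repr(title_value))
```

Because the controls are invisible, they are hard to spot and maintain in source text, which is one reason to prefer markup wherever it is available.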

CSS vs. markup for bidi support covers:

Unicode controls vs. markup for bidi support discusses why markup is better than Unicode control characters, where it is available. It covers:

Using Unicode controls for bidi text explains how to use Unicode control characters where they are the only option. It covers: