Inline markup and bidirectional text in HTML

Useful markup and control codes

NOTE! This section includes references to markup introduced by HTML5 that should simplify various aspects of handling inline bidi text. The new features are not yet fully implemented in all browsers. We point out what is new. Use the new features where you can, and encourage browser developers to continue to implement them.

The `dir` attribute

The dir attribute sets the base direction for the content of an element.

To set the default direction of the whole HTML document to right-to-left, add dir="rtl" to the html tag. This will result in all elements in the document inheriting a base direction of RTL.

You can change the base direction for content within a page by surrounding that content with an element and adding a dir attribute to indicate the desired direction.

In principle, the right thing to do for every opposite-direction phrase is to set its base direction by using the dir attribute on an element tightly wrapping the phrase.

HTML5 changes the semantics of the dir attribute. In browsers that implement this change, the content of the element on which the dir attribute sits will be isolated, in terms of the bidi algorithm, from the content surrounding it. Wrapping the opposite-direction phrases in an element with a dir attribute helps address some of the problems listed in the previous section; adding isolation helps resolve some more.

Check out the worked examples below to see how this works.

LRM/RLM

The visual order in which text is displayed can sometimes be modified using two invisible Unicode control characters: LRM (U+200E LEFT-TO-RIGHT MARK), which can be added to the source text using the character itself or the escapes &#x200E; or &lrm;, and RLM (U+200F RIGHT-TO-LEFT MARK), for which the escapes are &#x200F; or &rlm;. Each has the strong type indicated by its name, like an A or an א, but is invisible.

One use of LRM and RLM is to extend a directional run through neutral or weak characters at the start or end of an opposite-direction phrase, by putting a mark of the same direction as the phrase on the other side of those neutral or weak characters. You can see an example of how it works in the advanced usage notes for use case 1 below.

Another use is to separate an opposite-direction phrase from some neighboring but independent text that would otherwise be incorrectly treated as the same directional run (see use case 3 for a good example). To do this you can put between them a directional mark with the same directionality as the overall context.

In HTML5, where the dir attribute is isolating, both cases are better addressed by adding the dir attribute to an element wrapping the opposite-direction phrase, so there is really no need to use LRM/RLM. See below for details.

dir="auto"

HTML5 addresses another need: text dropped into a page, say from a database, when you don't know its base direction. Before HTML5, you could only set the dir attribute to ltr or rtl, and had to somehow determine yourself which of them was appropriate.

HTML5 provides a new value for the dir attribute: auto. The auto value tells the browser to look at the first strongly typed character in the element. If it's a right-to-left typed character such as a Hebrew or Arabic letter, the element will get a direction of rtl. If it's, say, a Latin character, the direction will be ltr.

There are corner cases where this heuristic picks the wrong direction, but it usually produces the desired result.

Note that the browser ignores any neutral or weak characters at the beginning of the text when looking for the first strong character. It also ignores anything inside a bdi element or an element with a dir attribute of its own, including auto.
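The first-strong lookup described above can be approximated in script. The following is only a rough sketch, not the browser's actual algorithm: the function name `firstStrongDirection` and its `fallback` parameter are hypothetical, "strong" characters are approximated as letters (plus LRM and RLM), and only a handful of right-to-left scripts are recognized. A real implementation would consult the full Unicode bidi character data.

```javascript
// ASSUMPTION: strong right-to-left characters are approximated as letters
// from these scripts; the list is illustrative, not exhaustive.
const RTL_SCRIPTS = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;
// ASSUMPTION: any other letter is treated as strong left-to-right.
const LETTER = /\p{L}/u;

// Hypothetical helper: returns "rtl" or "ltr" based on the first strong
// character, skipping digits, punctuation and spaces (neutral or weak).
function firstStrongDirection(text, fallback = "ltr") {
  for (const ch of text) {
    if (ch === "\u200E") return "ltr"; // LRM is strong left-to-right
    if (ch === "\u200F") return "rtl"; // RLM is strong right-to-left
    if (!LETTER.test(ch)) continue;    // not strong in this approximation
    return RTL_SCRIPTS.test(ch) ? "rtl" : "ltr";
  }
  return fallback; // no strong character found (the fallback is an assumption)
}

console.log(firstStrongDirection("123 - \u05E9\u05DC\u05D5\u05DD")); // Hebrew after digits
console.log(firstStrongDirection("(hello) \u05E9\u05DC\u05D5\u05DD")); // Latin comes first
```

Unlike the browser, this sketch works on a plain string, so it has no notion of skipping nested bdi elements or elements that carry their own dir attribute.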

Like any other use of the dir attribute in HTML5, dir="auto" also directionally isolates its content from its surroundings.

The `bdi` element

HTML5 also introduces a new element, bdi (bidirectional isolate). It is just like a span except that, whether or not it is used with a dir attribute, it directionally isolates its content from the surrounding text.

bdi comes with the dir attribute set to the new auto value by default (see above). However, it is also possible to use an explicit dir attribute on bdi, set to ltr or rtl, if you know the direction of the phrase and just want to isolate it.

The choice of whether to attach dir="auto" on an existing element or to wrap the phrase in a bdi depends on whether you already have an inline element tightly wrapping the potentially opposite-direction phrase, and whether you happen to know the phrase's direction (or can guess at it better than the browser's dir="auto" logic).

The `bdo` element

The bdo (bidirectional override) element prevents the bidirectional algorithm from rearranging the sequence of characters it encloses, and allows you to display the sequence from right to left or from left to right in the order in which the characters are stored in memory.

There are important use cases for bdo, but they are rare. For more information see Overriding the algorithm below. Do not confuse this element with bdi, and do not use it for managing normal bidi text.

The CSS shim

The CSS shim can be applied when a browser supports the CSS needed to isolate text, but doesn't support isolation for the dir attribute. It was particularly useful during the transition period, while several major browsers had still not implemented isolation for the dir attribute but did support the necessary CSS. Now it is only useful for Safari (Edge never supported the required CSS), and hopefully it won't be needed for that browser much longer either.

Browsers that don't yet support the CSS will simply behave in the same way as before, but most recent versions of major browsers do support the desired behavior already.

The CSS shim is as follows:

[dir='ltr'], [dir='rtl'] {
	unicode-bidi: -webkit-isolate;
	unicode-bidi: -moz-isolate;
	unicode-bidi: -ms-isolate;
	unicode-bidi: isolate;
}
bdo[dir='ltr'], bdo[dir='rtl'] {
	unicode-bidi: bidi-override;
	unicode-bidi: -webkit-isolate-override;
	unicode-bidi: -moz-isolate-override;
	unicode-bidi: -ms-isolate-override;
	unicode-bidi: isolate-override;
}

At the time of writing, all browser versions that support isolation in CSS also support the bdi element.

Steps for handling inline bidirectional text in HTML

Here we summarize default guidelines for working with bidirectional inline text. Often alternative approaches will work, but the approaches outlined here are simple to apply and should work for all cases.

When none of the problematic use cases apply, the approaches outlined here will not have any visible effect. But when one of them does apply, these approaches provide a simple solution that doesn't require you to figure out specifically what the problem is.

Descriptions of the markup used can be found in the previous section. Following sections will provide worked examples. Some of the alternatives are also explored in the worked examples.

Tightly wrap opposite-direction phrases

The best way to begin to address bidi issues in your content is to tightly wrap every opposite-direction phrase in markup that sets its base direction. By tightly wrap, we mean that the element contains the entire opposite-direction phrase, and nothing but the opposite-direction phrase.

If you know the direction of the phrase

Most of the potential problems pointed out earlier simply melt away when you add a dir attribute to tightly wrapped phrases of opposite-direction text. When browsers that support the HTML5 specification encounter the dir attribute on an element, they isolate the text inside the element from the text surrounding it.

If the phrase is already tightly wrapped in an inline element, you can use the existing element for this purpose. If not, add a span element.

Examples for text in a left-to-right context

1	Before:	`<p>`ltr-text RTL-TEXT`</p>`
	After:	`<p>`ltr-text `<span dir=rtl>`RTL-TEXT`</span></p>`
2	Before:	`<p>`ltr-text `<cite>`RTL-TEXT`</cite></p>`
	After:	`<p>`ltr-text `<cite dir=rtl>`RTL-TEXT`</cite></p>`
3	Before:	`<p>`ltr-text `<cite>`RTL-TEXT ltr-text-in-rtl`</cite></p>`
	After:	`<p>`ltr-text `<cite dir=rtl>`RTL-TEXT `<span dir=ltr>`ltr-text-in-rtl`</span></cite></p>`

Bulletproofing your code for legacy browsers

You can bulletproof this approach to handle Edge, Safari, or other legacy browsers that still don't directionally isolate elements with a dir attribute. Having tightly wrapped the phrase as just shown, add a directional mark (RLM or LRM) immediately after the markup of that phrase. Choose the one that matches the direction of the surrounding context. This is mostly needed when the phrase is followed by a number or by a separate opposite-direction phrase (i.e. in a list), but you can add a directional mark matching the direction of the context after every tightly wrapped phrase.

Examples for text in a left-to-right context

1	Before:	`<p>`ltr-text `<span dir=rtl>`RTL-TEXT`</span> 1234</p>`
	After:	`<p>`ltr-text `<span dir=rtl>`RTL-TEXT`</span>&lrm; 1234</p>`
2	Before:	`<p>`ltr-text `<span dir=rtl>`RTL-TEXT-1`</span>, <span dir=rtl>`RTL-TEXT-2`</span></p>`
	After:	`<p>`ltr-text `<span dir=rtl>`RTL-TEXT-1`</span>&lrm;, <span dir=rtl>`RTL-TEXT-2`</span></p>`
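When markup is generated by a script or template, the same pattern can be captured in a small helper. This is a minimal sketch under stated assumptions: the function name `wrapWithMark` is hypothetical, the phrase passed in is assumed to be already HTML-escaped, and attribute values are quoted (equivalent to the unquoted form in the table above).

```javascript
// Escape for the directional mark to append, keyed by the direction of the
// surrounding context (not the direction of the phrase itself).
const CONTEXT_MARK = new Map([
  ["ltr", "&lrm;"],
  ["rtl", "&rlm;"],
]);

// Hypothetical helper: tightly wraps a phrase in a span with a dir attribute,
// then appends the mark matching the context for browsers without isolation.
// ASSUMPTION: phraseHtml is already HTML-escaped.
function wrapWithMark(phraseHtml, phraseDir, contextDir) {
  if (!CONTEXT_MARK.has(phraseDir) || !CONTEXT_MARK.has(contextDir)) {
    throw new RangeError('directions must be "ltr" or "rtl"');
  }
  return `<span dir="${phraseDir}">${phraseHtml}</span>${CONTEXT_MARK.get(contextDir)}`;
}

// Example 1 from the table: an RTL phrase followed by a number in an LTR context.
const withNumber = `<p>ltr-text ${wrapWithMark("RTL-TEXT", "rtl", "ltr")} 1234</p>`;
console.log(withNumber);
```

The helper adds a mark after every wrapped phrase, which the text above notes is harmless even when the mark is not strictly needed.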

If you don't know the direction of the phrase

When text will be added to your HTML page at runtime, you may not be able to predict the base direction of the injected text in advance. To handle this eventuality you have two options.

If the phrase is tightly wrapped by an element already, you could just add dir="auto" to that element. This directionally isolates the element's text and looks at the first strong character to determine what base direction to apply.

Otherwise, wrap the phrase in a bdi element (or in a span element with dir set to auto, if you prefer). Without a dir attribute, the bdi element behaves as if dir="auto" had been applied.

Examples of how to cater for text that is inserted at runtime

1	Before:	`<p>`static-text `<cite>`injected-text`</cite></p>`
	After:	`<p>`static-text `<cite dir="auto">`injected-text`</cite></p>`
2	Before:	`<p>`static-text injected-text`</p>`
	After:	`<p>`static-text `<bdi>`injected-text`</bdi></p>` or `<p>`static-text `<span dir=auto>`injected-text`</span></p>`
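On the generating side, both options from the table above can be sketched as one helper. The names `escapeHtml` and `wrapInjected`, and the `existingTag` option, are hypothetical; the escaping step is an addition assumed here because injected text usually comes from an untrusted source such as a database.

```javascript
// Minimal HTML escaping for text placed inside element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: if an element already tightly wraps the injected text,
// give that element dir="auto"; otherwise wrap the text in a bdi element.
function wrapInjected(text, { existingTag = null } = {}) {
  const safe = escapeHtml(text);
  return existingTag
    ? `<${existingTag} dir="auto">${safe}</${existingTag}>`
    : `<bdi>${safe}</bdi>`;
}

console.log(`<p>static-text ${wrapInjected("injected-text", { existingTag: "cite" })}</p>`);
console.log(`<p>static-text ${wrapInjected("injected-text")}</p>`);
```

Either form leaves the choice of base direction to the browser, so the script never needs to inspect the injected text itself.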

Bulletproofing your code for legacy browsers

The isolation can be addressed using LRM/RLM as described above, but the key benefit of the bdi element or dir="auto" is that they can guess at the appropriate base direction for the inserted text.

Unfortunately, there isn't really an alternative way to automatically determine the base direction for the inserted text, other than to use scripting.

Worked examples for static use cases

In this section we will look at how to write code that addresses various use cases where the content is written by the author. The section following this deals with use cases where content is injected into the page.

Use case 1: Nested bidi

In this example a right-to-left book title is embedded in a left-to-right context, and the book title itself contains an embedded left-to-right phrase. Here is the code without any additional bidi markup:

What one would expect to see is:

Unfortunately, the bidirectional algorithm cannot tell where the boundaries of the nested changes in base direction should be. The result, without help in the markup, is:

Fixing use case 1

To address this in HTML5, if there is no other markup around the opposite-direction phrases, wrap both in markup with the appropriate dir value. (Note, by the way, how the markup appears inside the quotation marks, which are part of the English text.)

the title is "AN INTRODUCTION TO c++" in arabic.
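The nested wrapping described for this use case can be sketched with a small template helper. This is an illustration only: the function name `span` is hypothetical, and the text follows the convention of the examples in this article, where uppercase stands for right-to-left text and lowercase for left-to-right text.

```javascript
// Hypothetical helper: tightly wraps already-escaped content in a span
// that sets its base direction.
function span(dir, innerHtml) {
  if (dir !== "ltr" && dir !== "rtl") {
    throw new RangeError('dir must be "ltr" or "rtl"');
  }
  return `<span dir="${dir}">${innerHtml}</span>`;
}

// The RTL title wraps the whole title, and the nested LTR phrase inside it
// gets its own wrapper. The quotation marks stay outside the markup because
// they belong to the surrounding left-to-right sentence.
const title = span("rtl", `AN INTRODUCTION TO ${span("ltr", "c++")}`);
const sentence = `<p>the title is "${title}" in arabic.</p>`;
console.log(sentence);
```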

RLI	U+2067 RIGHT-TO-LEFT ISOLATE	Sets direction to rtl
LRI	U+2066 LEFT-TO-RIGHT ISOLATE	Sets direction to ltr
FSI	U+2068 FIRST STRONG ISOLATE	Sets direction according to the first strong character
PDI	U+2069 POP DIRECTIONAL ISOLATE	Terminates the range set by RLI, LRI or FSI
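As an illustration of how the characters in this table pair up, the sketch below wraps a plain-text string in an opening isolate character and a closing PDI. The function name `isolate` is hypothetical, and the sketch assumes a plain-text setting where markup such as the dir attribute or bdi is not available.

```javascript
// Opening isolate characters from the table above, keyed by desired direction.
const ISOLATE_START = new Map([
  ["rtl", "\u2067"],  // RLI: sets direction to rtl
  ["ltr", "\u2066"],  // LRI: sets direction to ltr
  ["auto", "\u2068"], // FSI: direction from the first strong character
]);
const PDI = "\u2069"; // terminates the range set by RLI, LRI or FSI

// Hypothetical helper: isolates a plain-text phrase from its surroundings.
function isolate(text, dir = "auto") {
  const start = ISOLATE_START.get(dir);
  if (start === undefined) {
    throw new RangeError('dir must be "ltr", "rtl" or "auto"');
  }
  return start + text + PDI;
}

// Each isolated phrase gains exactly two invisible characters.
console.log(isolate("RTL-TEXT", "rtl").length === "RTL-TEXT".length + 2);
```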