Inline markup and bidirectional text in HTML

Intended audience: content developers working with right-to-left scripts, HTML and SVG coders (using editors or scripting), script developers (PHP, JSP, etc.), schema developers (DTDs, XML Schema, RelaxNG, etc.), and anyone who is struggling to understand how to make their mixed direction text look right in markup.

Many examples in this document are shown as images to avoid problems for those with a browser that doesn't produce what was intended or doesn't have non-ASCII fonts.

Code samples containing Arabic and Hebrew text may be displayed in different ways depending on which editor is used. In this article right-to-left text in code samples is represented by UPPERCASE TRANSLATIONS, and left-to-right text by lowercase. All text in code samples reflects the direction of characters as stored in memory, rather than the displayed result. The original version of text in uppercase translations would be read from right-to-left.

To see the full source, click on the "Test in your browser" links and view the source of the page that displays.

It is common for content in Arabic, Hebrew, and other languages that use right-to-left scripts to include numerals or include text from other scripts. Both of these typically flow left-to-right within the overall right-to-left context.

This article tells you how to write HTML where text with different writing directions is mixed within a paragraph or other HTML block (i.e. inline or phrasal content). (A companion article, Structural markup and right-to-left text in HTML, tells you how to use HTML markup for elements such as html, and structural markup such as p or div and forms.)

Just tell me what I need to do

If you know the direction of all the text involved, tightly wrap every opposite-direction phrase in markup, and use the dir attribute on that markup. Be sure to nest markup to show the structure.

<p>the title is <cite dir="rtl">AN INTRODUCTION TO <span dir="ltr">c++</span></cite> in arabic.</p>

If you want to bullet-proof your code for legacy or non-conformant browsers where tightly-wrapped text is followed inline by a number or a logically separate opposite-direction phrase, add &rlm; or &lrm; (choose the one corresponding to the base direction of the surrounding text) immediately after the phrase.

<p>we find the phrase '<span dir="rtl">INTERNATIONALIZATION ACTIVITY</span>&lrm;' 5 times on the page.</p>

If you don't know the direction of text that will be inserted at run time, add dir=auto to any markup that tightly wraps the location. If there is no markup, wrap the location with a bdi element.

foreach ($restaurants as $restaurant) { echo "<p><bdi>{$restaurant['name']}</bdi> - {$restaurant['count']} reviews</p>"; }

Tell me more

The article first describes basic principles underlying how the Unicode bidirectional algorithm works. Then it looks at some of the more common scenarios where the bidi algorithm requires assistance through the addition of markup or control codes. It is written in a tutorial style that helps the reader with little or no background in handling bidirectional text progress from one concept to the next.

If you want just a little more detail, jump to the section Steps for handling inline bidirectional text in HTML.

If you'd like to understand inline bidirectional text better, and see worked examples, read the rest of this article.

The article is focused on markup usage in HTML, but most of the concepts are also relevant for other markup languages.

How the bidi algorithm works

If you're not really familiar with the Unicode Bidirectional Algorithm, then before reading further you should read the basic introduction to how the bidi algorithm works.

Where the bidi algorithm needs help

In the sections below, we will examine specific examples of what can go wrong, why it goes wrong, and what fixes it. Nevertheless, it is important to realize that, basically, the problems all occur when content in one direction includes an inline phrase in the opposite direction. We will call these opposite-direction phrases. An opposite-direction phrase may be a single directional run (such as a word), or may be a set of embedded directional runs with changes in base direction.

In the following example, the English sentence contains an opposite-direction phrase between the quotation marks. That phrase, itself, also contains an opposite-direction phrase (the word C++), and an exclamation mark that has to appear at the end of the Arabic phrase. The arrows show the opposite-direction phrases.

Displayed result of previous code

Common examples of such phrases include quotations, titles of books, articles or plays, formatted numbers (e.g. phone numbers and MAC addresses), street and email addresses, and various names, such as brand names, acronyms, part numbers, site names, place names, file names (and paths), etc.

The problem is worse in applications that drop text into a page, say from a database. The application often does not know a-priori whether such text is (or perhaps contains) an opposite-direction phrase, and has to estimate its direction at run-time by checking the Unicode ranges of its characters. HTML5 introduces a feature for doing so in the browser.

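Whenever an opposite-direction phrase occurs, things can go wrong. That is, something will go wrong if the text includes, without any special "wrapping", an inline opposite-direction phrase that does any of the things illustrated in the worked examples later in this article: it contains a nested phrase in the original direction (use cases 1 and 4), is followed by a logically separate number (use cases 2 and 5), is followed by another, logically separate opposite-direction phrase, as in a list (use case 3), or ends with punctuation or another neutral character that belongs to it (use case 6).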

Although this list seems daunting, there is no need to determine which, if any, of these cases applies to a particular phrase. There is a simple, default way of "wrapping" opposite-direction phrases that will prevent problems in all of the cases above, and do no harm when none of them apply. We will describe how to do such wrapping for HTML5-aware browsers and for others.

Useful markup and control codes

NOTE! This section includes references to markup introduced by HTML5 that should simplify various aspects of handling inline bidi text. The new features are not yet fully implemented in all browsers. We point out what is new. Use the new features where you can, and encourage browser developers to continue to implement them.

The dir attribute

The dir attribute sets the base direction for the content of an element.

To set the default direction of the whole HTML document to right-to-left, add dir="rtl" to the html tag. This will result in all elements in the document inheriting a base direction of RTL.

You can change the base direction for content within a page by surrounding that content with an element and adding a dir attribute to indicate the desired direction.

In principle, the right thing to do for every opposite-direction phrase is to set its base direction by using the dir attribute on an element tightly wrapping the phrase.

HTML5 changes the semantics of the dir attribute. In browsers that implement this change, the content of the element on which the dir attribute sits will be isolated, in terms of the bidi algorithm, from the content surrounding it. Wrapping the opposite-direction phrases in an element with a dir attribute helps address some of the problems listed in the previous section; adding isolation helps resolve some more.

Check out the worked examples below to see how this works.

LRM/RLM

The visual order in which text is displayed can sometimes be modified using two invisible Unicode control characters: LRM (U+200E LEFT-TO-RIGHT MARK), which can be added to the source text using the character itself or the escapes &#x200E; or &lrm;, and RLM (U+200F RIGHT-TO-LEFT MARK), for which the escapes are &#x200F; or &rlm;. Each has the strong type indicated by its name, like an A or an א, but is invisible.

One use of LRM and RLM is to extend a directional run through neutral or weak characters at the start or end of an opposite-direction phrase, by putting a mark of the same direction as the phrase on the other side of those neutral or weak characters. You can see an example of how it works in the advanced usage notes for use case 1 below.

Another use is to separate an opposite-direction phrase from some neighboring but independent text that would otherwise be incorrectly treated as the same directional run (see use case 3 for a good example). To do this you can put between them a directional mark with the same directionality as the overall context.

In HTML5, where the dir attribute is isolating, both cases are better addressed by adding the dir attribute to an element wrapping the opposite-direction phrase, so there is really no need to use LRM/RLM. See below for details.

dir="auto"

HTML5 addresses another need: text dropped into a page, say from a database, when you don't know its base direction. Before HTML5, you could only set the dir attribute to ltr or rtl, and had to somehow determine yourself which of them was appropriate.

HTML5 provides a new value for the dir attribute: auto. The auto value tells the browser to look at the first strongly typed character in the element. If it's a right-to-left typed character such as a Hebrew or Arabic letter, the element will get a direction of rtl. If it's, say, a Latin character, the direction will be ltr.

There are corner cases where this may not give the desired outcome, but it should usually produce the desired result.

Note that the browser ignores any neutral or weak characters at the beginning of the text when looking for the first strong character. It also ignores anything inside a bdi element or an element with a dir attribute of its own, including auto.

Like any other use of the dir attribute in HTML5, dir="auto" also directionally isolates its content from its surroundings.
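If the same decision has to be made in server-side code before the markup reaches the browser, the first-strong rule described above can be approximated as follows. This is a minimal sketch in Python, not the browser's implementation: the function name first_strong_direction is made up for illustration, it works on plain text only (so it does not reproduce the skipping of bdi elements or of elements with their own dir attribute), and it returns None when no strong character is found so that the caller can choose a fallback.

```python
import unicodedata

def first_strong_direction(text):
    """Return 'ltr' or 'rtl' based on the first strong character, or None if there is none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right, e.g. Latin letters
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left, e.g. Hebrew or Arabic letters
            return "rtl"
    return None  # only neutral or weak characters (digits, spaces, punctuation)

print(first_strong_direction("aroma"))                              # ltr
print(first_strong_direction("123 \u05e9\u05dc\u05d5\u05dd css"))   # rtl (leading digits are skipped)
print(first_strong_direction("42 - ..."))                           # None
```

The second call shows the behavior described above: the weak and neutral characters at the start are ignored, and the first Hebrew letter decides the result even though Latin text follows.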

The bdi element

HTML5 also introduces a new element, bdi (bidirectional isolate). It is just like a span except that, whether or not it is used with a dir attribute, it directionally isolates its content from the surrounding text.

bdi comes with the dir attribute set to the new auto value by default (see above), however it is also possible to use an explicit dir attribute on bdi with values set to ltr or rtl, if you know the direction of the phrase and just want to isolate it.

The choice of whether to attach dir="auto" on an existing element or to wrap the phrase in a bdi depends on whether you already have an inline element tightly wrapping the potentially opposite-direction phrase, and whether you happen to know the phrase's direction (or can guess at it better than the browser's dir="auto" logic).

The bdo element

The bdo (bidirectional override) element prevents the bidirectional algorithm from rearranging the sequence of characters it encloses, and allows you to display the sequence from right to left or from left to right in the order in which the characters are stored in memory.

There are important use cases for bdo, but they are rare. For more information see Overriding the algorithm below. Do not confuse this element with bdi, and do not use it for managing normal bidi text.

The CSS shim

The CSS shim can be applied when a browser supports the CSS needed to isolate text, but doesn't support isolation for the dir attribute. It was particularly useful during the transition period, while several major browsers had still not implemented isolation for the dir attribute but did support the necessary CSS. Now it is only useful for Safari (Edge never supported the required CSS), although hopefully it won't be needed for that browser either much longer.

Browsers that don't yet support the CSS will simply behave in the same way as before, but most recent versions of major browsers do support the desired behavior already.

The CSS shim is as follows:

[dir='ltr'], [dir='rtl'] {
	unicode-bidi: -webkit-isolate;
	unicode-bidi: -moz-isolate;
	unicode-bidi: -ms-isolate;
	unicode-bidi: isolate;
}
bdo[dir='ltr'], bdo[dir='rtl'] {
	unicode-bidi: bidi-override;
	unicode-bidi: -webkit-isolate-override;
	unicode-bidi: -moz-isolate-override;
	unicode-bidi: -ms-isolate-override;
	unicode-bidi: isolate-override;
}

At the time of writing, all browser versions that support isolation in CSS also support the bdi element.

Steps for handling inline bidirectional text in HTML

Here we summarize default guidelines for working with bidirectional inline text. Often alternative approaches will work, but the approaches outlined here are simple to apply and should work for all cases.

When none of the problematic use cases apply, the approaches outlined here will not have any visible effect. But when one of them does apply, these approaches provide a simple solution, that doesn't require you to figure out specifically what the problem is.

Descriptions of the markup used can be found in the previous section. Following sections will provide worked examples. Some of the alternatives are also explored in the worked examples.

Tightly wrap opposite-direction phrases

The best way to begin to address bidi issues in your content is to tightly wrap every opposite-direction phrase in markup that sets its base direction. By tightly wrap, we mean that the element contains the entire opposite-direction phrase, and nothing but the opposite-direction phrase.

If you know the direction of the phrase

Most of the potential problems pointed out earlier simply melt away when you add a dir attribute to tightly wrapped phrases of opposite-direction text. When browsers that support the HTML5 specification encounter the dir attribute on an element, they isolate the text inside the element from the text surrounding it.

If the phrase is already tightly wrapped in an inline element, you can use the existing element for this purpose. If not, add a span element.

Examples for text in a left-to-right context
1 Before: <p>ltr-text RTL-TEXT</p>
After: <p>ltr-text <span dir=rtl>RTL-TEXT</span></p>
2 Before: <p>ltr-text <cite>RTL-TEXT</cite></p>
After: <p>ltr-text <cite dir=rtl>RTL-TEXT</cite></p>
3 Before: <p>ltr-text <cite>RTL-TEXT ltr-text-in-rtl</cite></p>
After: <p>ltr-text <cite dir=rtl>RTL-TEXT <span dir=ltr>ltr-text-in-rtl</span></cite></p>

Bulletproofing your code for legacy browsers.

You can bulletproof this approach to handle Edge and Safari or other legacy browsers which still don't directionally isolate elements with a dir attribute. Having tightly wrapped the phrase as just shown, add a directional mark (RLM or LRM) immediately after the markup of that phrase. Choose one that matches the direction of the surrounding context. This is mostly needed when the phrase is followed by a number or a separate opposite-direction phrase (e.g. in a list), but you can add a directional mark matching the direction of the context after every tightly-wrapped phrase.

Examples for text in a left-to-right context
1 Before: <p>ltr-text <span dir=rtl>RTL-TEXT</span> 1234</p>
After: <p>ltr-text <span dir=rtl>RTL-TEXT</span>&lrm; 1234</p>
2 Before: <p>ltr-text <span dir=rtl>RTL-TEXT-1</span>, <span dir=rtl>RTL-TEXT-2</span></p>
After: <p>ltr-text <span dir=rtl>RTL-TEXT-1</span>&lrm;, <span dir=rtl>RTL-TEXT-2</span></p>
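When markup like this is generated by a script rather than typed by hand, the wrapping and the trailing mark can be produced in one step. The following is a small sketch in Python under stated assumptions: the helper name wrap_phrase and the MARKS table are invented for this illustration, and the caller is assumed to already know both the direction of the phrase and the direction of the surrounding context.

```python
from html import escape

# Directional mark to emit after the phrase, keyed by the direction of the CONTEXT.
MARKS = {"ltr": "&lrm;", "rtl": "&rlm;"}

def wrap_phrase(phrase, phrase_dir, context_dir):
    """Tightly wrap a phrase of known direction, then add a mark matching the context."""
    return f'<span dir="{phrase_dir}">{escape(phrase)}</span>{MARKS[context_dir]}'

print(f'<p>ltr-text {wrap_phrase("RTL-TEXT", "rtl", "ltr")} 1234</p>')
# prints <p>ltr-text <span dir="rtl">RTL-TEXT</span>&lrm; 1234</p>
```

Note that the mark is chosen from the direction of the context, not from the direction of the phrase, which is the point most easily confused when applying this rule.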

If you don't know the direction of the phrase

When text will be added at run time to your HTML page you may not be able to predict the base direction of the injected text in advance. To handle this eventuality you have two options.

If the phrase is tightly wrapped by an element already, you could just add dir="auto" to that element. This directionally isolates the element's text and looks at the first strong character to determine what base direction to apply.

Otherwise, wrap the phrase in a bdi element (or in a span element with dir set to auto, if you prefer). Without a dir attribute, the bdi element behaves as if dir="auto" had been applied.

Examples of how to cater for text that is inserted at runtime
1 Before: <p>static-text <cite>injected-text</cite></p>
After: <p>static-text <cite dir="auto">injected-text</cite></p>
2 Before: <p>static-text injected-text</p>
After: <p>static-text <bdi>injected-text</bdi></p>
or
<p>static-text <span dir=auto>injected-text</span></p>

Bulletproofing your code for legacy browsers.

The isolation can be addressed using LRM/RLM as described above, but the key benefit of the bdi element or dir="auto" is that they can guess at the appropriate base direction for the inserted text.

Unfortunately, there isn't really an alternative way to automatically determine the base direction for the inserted text, other than to use scripting.

Worked examples for static use cases

In this section we will look at how to write code that addresses various use cases where the content is written by the author. The section following this deals with use cases where content is injected into the page.

Use case 1: Nested bidi

In this example a right-to-left book title is embedded in a left-to-right context, and the book title itself contains an embedded left-to-right phrase. Here is the code without any additional bidi markup:

 Bad code. Don't copy!

<p>the title is "AN INTRODUCTION TO c++" in arabic.</p>

What one would expect to see is:

Displayed result of previous code

Unfortunately, the bidirectional algorithm cannot tell where the boundaries of the nested changes in base direction should be. The result, without help in the markup, is:

Displayed result of previous code

Fixing use case 1

To address this in HTML5, if there is no other markup around the opposite-direction phrases, wrap both in markup with the appropriate dir value. (Note, by the way, how the markup appears inside the quotation marks, which are part of the English text.)

<p>the title is "<span dir="rtl">AN INTRODUCTION TO <span dir="ltr">c++</span></span>" in arabic.</p>

It is important to note that each phrase is nested. Just wrapping the Arabic in one span followed by a span containing the C++ would result in no improvement at all.

advanced usage notes: Note that two elements with dir are needed in this case. This is because there are two opposite-direction phrases. If only one was used, like this:

 Bad code. Don't copy!

<p>the title is "<span dir="rtl">AN INTRODUCTION TO c++</span>"</p>

the displayed text would be as shown below. This moves the C++ to the left, as needed, but the + signs appear on the wrong side of the C.

Displayed result of previous code

This fails because the "C++" is an opposite-direction (LTR) phrase within the title, ending in neutral characters and the phrase is now being displayed with an RTL base direction. The bidi algorithm has no way of knowing that the plus signs are part of an LTR phrase, not of the RTL context, and thus displays them to the left of the "C" instead of to its right.

To solve this problem, wrap the overall RTL phrase in a <span dir="rtl">, and the LTR phrase nested inside it in its own <span dir="ltr">, as shown.

If there is already suitable markup to surround the book title, such as a cite element, add the dir attribute to that.

<p>the title is <cite dir="rtl">AN INTRODUCTION TO <span dir="ltr">c++</span></cite> in arabic.</p>

advanced usage notes: If the "C++" in this example was an ordinary Latin-script word, such as "Python" you wouldn't actually need to mark it up to get the right display. The bidi algorithm would take care of it. However marking up text in this way avoids you having to understand why these two cases are different, and having to work out which case applies for your content.

Similarly, if the title contained no embedded left-to-right text, you wouldn't actually need directional markup at all, but adding it avoids possible issues related to following inline text, such as where the text is edited to add a following number or another title, like this:

<p>the titles are <cite dir="rtl">AN INTRODUCTION TO ARABIC</cite>, <cite dir="rtl">FIRST STEPS IN URDU</cite>, and <cite dir="rtl">MASTERING HEBREW</cite>.</p>

Bulletproofing for legacy browsers

The solution outlined for HTML5 aware browsers will work equally well for browsers that don't support HTML5 features.

advanced usage notes: As noted earlier, one use of LRM and RLM is to extend a directional run through neutral or weak characters at the start or end of an opposite-direction phrase, by putting a mark of the same direction as the phrase on the other side of those neutral or weak characters. For this example, instead of wrapping the "C++" in a <span dir="ltr">, we could add &lrm; after the second plus:

<p>the title is <cite dir="rtl">AN INTRODUCTION TO c++&lrm;</cite></p>

The result is what we need:

Displayed result of previous code

Because the LRM is a strongly left-to-right character, the neutral pluses are now between two strong left-to-right characters (the C and the LRM). They therefore also become left-to-right in direction, making a single directional run of the four characters.
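The character types involved can be inspected directly. The sketch below uses Python's standard unicodedata module to list the bidirectional class of each character; the helper name bidi_classes is made up for this illustration. In the Unicode data the plus sign has class ES (European separator), which is strictly a weak type, but in this position its direction is resolved from the strong characters around it, just as described above.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

print(bidi_classes("c++"))        # ['L', 'ES', 'ES']
print(bidi_classes("c++" + LRM))  # ['L', 'ES', 'ES', 'L']
```

Without the mark, the two plus signs sit between a strong left-to-right character and whatever follows in the right-to-left title. With the mark, they are enclosed by two strong left-to-right characters.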

Used this way, however, LRM and RLM are a bit like gotos in programming languages: a quick hack that, unlike the dir attribute, says nothing about the structure of the text. And they simply cannot be used to deal with an opposite-direction phrase that happens to contain a nested phrase in the original direction, like our complete "Introduction to C++" example above. That may seem like an esoteric case, but it is surprisingly common when displaying right-to-left data in a left-to-right page, because the use of left-to-right words (like "C++") is not uncommon in right-to-left text.

So, if you don't want to analyze whether LRM and RLM can replace the use of the dir attribute in your case, just use the dir attribute.

Use case 2: Following numbers

In the next example, the opposite-direction phrase is followed by a logically separate number. This is the code without any bidi markup:

 Bad code. Don't copy!

<p>we find the phrase 'INTERNATIONALIZATION ACTIVITY' 5 times on the page.</p>

You would expect to see:

Displayed result of previous code

You would actually see:

Displayed result of previous code

This happens because the bidi algorithm tells the browser to treat the "5" as part of the Hebrew text. This is not appropriate here though. We need to find a way to say that the phrase and the number are separate things, i.e. to isolate the phrase from the number.

Fixing use case 2

Wrap the opposite-direction phrase (the title) in markup and add the appropriate dir value. There is no need to add anything else, since the dir attribute automatically isolates its content.

<p>we find the phrase '<span dir="rtl">INTERNATIONALIZATION ACTIVITY</span>' 5 times on the page.</p>

If there is already suitable markup surrounding the phrase, such as an a element, add the dir attribute to it.

<p>we find the phrase '<a href="..." dir="rtl">INTERNATIONALIZATION ACTIVITY</a>' 5 times on the page.</p>

Bulletproofing for legacy browsers

For browsers where dir doesn't isolate, you would fix this by not only adding the markup around the opposite-direction Hebrew text, but also adding an LRM character after it. That would prevent the number being associated with the right-to-left text.

<p>we find the phrase '<span dir="rtl">INTERNATIONALIZATION ACTIVITY</span>&lrm;' 5 times on the page.</p>

If the phrase was already tightly wrapped by an element, add the dir attribute to that element, and add the LRM character after it.

Of course, if the overall context is right-to-left, e.g. Arabic or Hebrew text, and the embedded phrase was in English, you would need to add an RLM character rather than an LRM character.

Use case 3: Lists

Neutrals between directional runs of the same direction can sometimes be misinterpreted by the bidi algorithm. In this use case we have several country names in Arabic listed in an LTR paragraph. This is an example of an opposite-direction phrase followed by another, but logically separate, opposite-direction phrase. Here is the source code without any bidi markup:

 Bad code. Don't copy!

<p>the names of these states in arabic are EGYPT, BAHRAIN and KUWAIT respectively.</p>

We expect to see the following:

Egypt appears to the left of Bahrain.

In the actual result, the first two Arabic words are reversed and the intervening comma is moved to the right side of the space between the words.

Bahrain appears to the left of Egypt.

The reason for the failure is that, with a strongly typed right-to-left (RTL) character on either side, the bidirectional algorithm sees the neutral comma as part of the Arabic text. It is interpreting the first two Arabic words and the comma as a single directional run in Arabic. In fact the comma is part of the English text, and should mark the boundary between the two separate right-to-left directional runs in Arabic.

The solution for this use case is similar to that for the previous use case, so we will keep the notes below brief, and assume that you have read the solutions for use cases 1 and 2. We will present just the default markup approach.

Fixing use case 3

Simply wrap each Arabic word with markup and add the appropriate dir value.

<p>the names of these states in arabic are <span dir="rtl">EGYPT</span>, <span dir="rtl">BAHRAIN</span> and <span dir="rtl">KUWAIT</span> respectively.</p>

If there is already markup surrounding the Arabic text, such as an a element, add the dir attribute to it.

<p>the names of these states in arabic are <a href="..." dir="rtl">EGYPT</a>, <a href="..." dir="rtl">BAHRAIN</a> and <a href="..." dir="rtl">KUWAIT</a> respectively.</p>

Bulletproofing for legacy browsers

Add markup around the Arabic text, but add also an LRM character after it whenever that text is followed by another opposite-direction phrase. Use an RLM character if the surrounding context is right-to-left.

<p>the names of these states in arabic are <span dir="rtl">EGYPT</span>&lrm;, <span dir="rtl">BAHRAIN</span> and <span dir="rtl">KUWAIT</span> respectively.</p>

As before, if the Arabic text was already tightly wrapped by an element, add the dir attribute to that element.

Worked examples for dynamic use cases

In this section we will look at use cases that involve injecting content into a page at run time.

It is important to note that we cannot address markup inside the injected content. In all cases below, if the injected phrases contain embedded opposite-direction phrases themselves, these need to be already marked up when the phrase is injected into the page, either in the database, or added by scripting when the injected phrase is fetched. If this is not done, the injected text will look alright for simple cases, but may be problematic for more complex ones.

Use case 4: Nested bidi

In the article Structural markup and right-to-left text in HTML there is an example of a page for an online book store that carries books in many languages and needs to display the original book titles regardless of the language of the user interface. Thus, a Hebrew or Arabic book title may appear in an English interface, and vice-versa.

Let us suppose that you searched for the book הצהרות קידוד תװי CSS and let's further suppose that the book wasn't found. The bookstore might generate a message that says so. The image below shows what one would expect to see.

Book not found message.

Note how the 'CSS' is to the left of the Hebrew text because it is part of the book title. However with the following source code ...

 Bad code. Don't copy!

<p>your search - <cite class="booktitle">CHARACTER ENCODING IN css</cite> - did not match any documents.</p>

... here is the actual result. Note how the 'CSS' is now on the right of the Hebrew text.

Book not found message.

Fixing use case 4

The default rule when there is no other element around the injected text, is to wrap it in bdi.

<p>your search - <bdi><?php echo $theString; ?></bdi> - did not match any documents.</p>

The bdi tag automatically assigns a direction based on the first strong character in the injected string.

advanced usage notes: It is possible that the search string in this example begins with a strong left-to-right character, for example, if the book title that we are searching for begins with 'CSS', rather than ending with it. In that case, there is not much we can do by default in the markup. To cover this case you would have to use scripting to detect the direction of the string as a whole and apply that to the markup.
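The article does not prescribe how such scripting should work. One possible heuristic, shown here purely as an assumption, is to count the strong characters of each direction and pick the majority. The function name estimate_direction and the tie-breaking default are hypothetical, and the sample title is made-up data written with escapes so that it displays the same way in any editor.

```python
import unicodedata

def estimate_direction(text, default="ltr"):
    """Guess the overall direction of text by counting its strong characters."""
    rtl = ltr = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            rtl += 1
        elif bidi_class == "L":
            ltr += 1
    if rtl == ltr:
        return default  # no strong characters, or an exact tie
    return "rtl" if rtl > ltr else "ltr"

# A mostly Hebrew title that happens to start with a Latin-script word.
title = "css \u05e7\u05d9\u05d3\u05d5\u05d3"
print(estimate_direction(title))  # rtl, although the first strong character is Latin
```

A majority count is only a guess and will be wrong for some strings, so it is best treated as a fallback for cases where the direction is not stored alongside the data.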

If there is another element around the injected text, use dir="auto" or wrap the injected phrase in bdi.

<p>your search - <cite dir="auto"><?php echo $theString; ?></cite> - did not match any documents.</p>

<p>your search - <cite><bdi><?php echo $theString; ?></bdi></cite> - did not match any documents.</p>

Bulletproofing for legacy browsers

Without HTML5 markup behavior we can't really address this use case using markup, since we need to know in advance the direction of the text. This can only be achieved by knowing the direction of or examining the injected phrase before insertion, and applying the appropriate directional information by scripting.

Use case 5: Following numbers

Here's an example where the names of restaurants are added to a page from a database and followed by a number. You don't know in advance the directionality of the injected text. This is the code produced by the script that injects the phrases, without bidi markup:

 Bad code. Don't copy!
<p><span class="name">aroma</span> - 3 reviews</p>
<p><span class="name">PURPLE PIZZA</span> - 5 reviews</p>
<p><span class="name">PURPLE PIZZA roma</span> - 3 reviews</p>

And here's what one would expect to see, and what you'd actually see.

What it should look like.

AZZIP ELPRUP - 5 reviews

What it actually looks like.

5 - AZZIP ELPRUP reviews

The problem with the second restaurant name arises because the browser thinks that the " - 5" is part of the Hebrew text. This is what the Unicode Bidi Algorithm tells it to do, and usually it is correct. Not here though. We need to find a way to say that the name and the number are separate things, i.e. to isolate the inserted name from the number.

In the third restaurant name the number is back in the right place, but the word 'Roma' is part of the Hebrew name, and should appear to the left of the Hebrew text. In other words, we need to apply a base direction of RTL to the whole of the injected text.

Fixing use case 5

Once again, the default rule when there is no other element around the injected text, is to wrap it in bdi. The bdi element automatically isolates the injected phrase from the number, and sets the direction for the phrase based on its first strong character.

foreach ($restaurants as $restaurant) { echo "<p><bdi>{$restaurant['name']}</bdi> - {$restaurant['count']} reviews</p>"; }

The bdi tag automatically assigns a direction based on the first strong character in the injected string.

You'll notice that the example above puts bdi around the name Aroma too. Of course, you don't actually need that, but it won't do any harm. On the other hand, it simplifies the necessary script code, and means you can handle any name that comes out of the database, whatever script it is in.

If there is another element around the injected text, use dir="auto".

foreach ($restaurants as $restaurant) { echo "<p><a href='...' dir='auto' class='name'>{$restaurant['name']}</a> - {$restaurant['count']} reviews</p>"; }

Bulletproofing for legacy browsers

Again, without HTML5 markup behavior, all we can do is add an LRM character after the injected phrase, to ensure that it is isolated from the number. This would be sufficient to correctly render the second item in the list, because it is a very simple case, with no embedded opposite-direction phrases or neutral characters. The third case, however, will not work so well, since the base direction has to be set to right-to-left for the word 'Roma' to appear on the left. This can only be properly rendered if the injected phrase has markup added to it before insertion.

The code would look something like this.

foreach ($restaurants as $restaurant) { echo "<p><span class='name' dir='auto'>{$restaurant['name']}</span>&lrm; - {$restaurant['count']} reviews</p>"; }
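For browsers that do not understand dir="auto" at all, the script itself would have to work out the direction and write an explicit ltr or rtl value. The following Python sketch illustrates that idea under stated assumptions: the function names, the sample restaurant data and the first-strong heuristic are all chosen for this illustration, and the page context is assumed to be left-to-right unless the caller says otherwise.

```python
import unicodedata
from html import escape

def first_strong_direction(text, default="ltr"):
    """Return the direction of the first strong character, or default if there is none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default

def render_row(name, count, context_dir="ltr"):
    """Wrap the injected name with an explicit dir, then add a mark matching the context."""
    mark = "&lrm;" if context_dir == "ltr" else "&rlm;"
    name_dir = first_strong_direction(name, default=context_dir)
    return (f'<p><span class="name" dir="{name_dir}">{escape(name)}</span>'
            f'{mark} - {count} reviews</p>')

# Hypothetical sample data: a Latin-script name and a Hebrew name.
rows = [("aroma", 3), ("\u05e4\u05d9\u05e6\u05d4 \u05e1\u05d2\u05d5\u05dc\u05d4", 5)]
for name, count in rows:
    print(render_row(name, count))
```

Because the base direction is set explicitly on the span, a Hebrew name that ends in a Latin-script word is given a right-to-left base direction, and the mark after the span keeps the number out of the name's directional run.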

Additional examples

Use case 6: Punctuation at the end of an opposite-direction phrase

It is a very common situation for punctuation or some other neutral character to appear at the end of an opposite direction phrase and belong with that phrase.

Unfortunately, such neutrals between different directional runs are typically misinterpreted unless there is additional bidi markup. In the following example, the exclamation mark should appear at the end of the Arabic text, ie. to the left, like this:

An exclamation mark appearing to the left of Arabic text.

Unfortunately, if we rely solely on the bidirectional algorithm we see this:

An exclamation mark appearing to the right of Arabic text.

Given our understanding of the bidi algorithm, it is easy to see why this happened. Because the exclamation mark was typed in between the last RTL letter 'ب' (on the left) and the LTR letter 'i' (of the word 'in'), its directionality is determined by the base direction of the paragraph, ie. LTR in this case.

Because the exclamation mark is seen as LTR it joins the directional run that includes the text 'in Arabic'.
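If you want to check this reasoning for yourself, the following Python sketch lists the Unicode bidi class of each character around the exclamation mark. The sample string is a cut-down stand-in for the real sentence, and bidi_classes is just an illustrative helper name.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidi class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Arabic letter beh, exclamation mark, space, Latin 'i', in memory order.
sample = "\u0628! i"
for ch, bidi_class in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {bidi_class}")
# U+0628 AL  (strong right-to-left)
# U+0021 ON  (neutral)
# U+0020 WS  (whitespace, also neutral)
# U+0069 L   (strong left-to-right)
```

The exclamation mark has no direction of its own, and it sits between a strong RTL character and a strong LTR one, so the paragraph's base direction decides where it goes.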

Fixing use case 6 when the direction is known

The general solution mentioned above works fine: just put the opposite-direction phrase in an element with a dir attribute. If there isn't already an element present, use a span.

<p>the title is "<cite dir="rtl" lang="ar">INTERNATIONALIZATION ACTIVITY!</cite>" in arabic.</p>

Advanced usage notes: You could also simply place an RLM after the exclamation mark, but we have already discussed why that is a less ideal fix. Note also that with this markup-free solution the Arabic text is not marked up for language or styling. Adding markup around the embedded title is probably a better way to solve the problem.

Fixing use case 6 for injected text

Use bdi if there isn't already a surrounding element, otherwise put a dir="auto" on the surrounding element.

<p>the title is "<bdi lang="ar">INTERNATIONALIZATION ACTIVITY!</bdi>" in arabic.</p>

<p>the title is "<cite dir="auto" lang="ar">INTERNATIONALIZATION ACTIVITY!</cite>" in arabic.</p>

Use case 7: Telephone numbers, MAC addresses, etc.

The picture below shows the expected result of displaying a telephone number in a right-to-left context, where the area code is surrounded by parentheses, and where the number appears at the beginning of a line or after some right-to-left text.

Telephone number correctly ordered.

The next picture shows what you actually see, if you rely solely on the bidi algorithm.

Telephone number incorrectly ordered.

Because these are numbers, the order applied by the bidirectional algorithm is slightly different from what we've seen before, but the fix is essentially the same.

Here is another, somewhat more problematic example of the same thing. The picture below shows a MAC address number as you would expect to see it displayed in a right-to-left context. The sequence 01:02:aa:4a:bb:06 looks exactly the same as it would in a left-to-right context.

MAC address correctly ordered.

Here, however, is what you will see when relying solely on the bidirectional algorithm.

MAC address incorrectly ordered.

This is particularly worrisome, since it's not obvious when the order is incorrect. Even if you did know it was incorrect, it is not at all clear how it should be read.

Although there are more characters involved, this problem arises because the bidirectional algorithm assumes that the initial run of numbers (and colons, since they are neutral) is associated with the preceding Hebrew text, rather than being part of the MAC address.

This example indicates that you should always wrap MAC addresses, and similar numbers, with directional information.

Fixing use case 7 when the direction is known

The solution is the same. Put the opposite-direction phrase in an element with a dir attribute. If there isn't already an element present, use a span. The following code would be used in an overall right-to-left context.

<p>... <span dir="ltr">(012) 345 6789</span> ...</p>

<p>כתובת <span dir="ltr">‎‎01:02:aa:4a:bb:06</span> ...</p>
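If values like these are produced by a script, a small helper can apply the wrapper consistently. This is only a sketch in Python: wrap_ltr is a hypothetical name, and it assumes you already know the value is a left-to-right sequence such as a phone number or MAC address.

```python
import html

def wrap_ltr(value):
    """Wrap a phone number, MAC address, or similar value in an LTR span."""
    return f'<span dir="ltr">{html.escape(value)}</span>'

print(wrap_ltr("(012) 345 6789"))     # <span dir="ltr">(012) 345 6789</span>
print(wrap_ltr("01:02:aa:4a:bb:06"))  # <span dir="ltr">01:02:aa:4a:bb:06</span>
```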

Fixing use case 7 for injected text

Use bdi if there isn't already a surrounding element, or put dir="auto" on a surrounding element. We just show the simplest case here. The following code would be used in an overall right-to-left context.

<p>... <bdi>(012) 345 6789</bdi> ...</p>

<p>כתובת <bdi>‎‎01:02:aa:4a:bb:06</bdi> ...</p>

Advanced usage notes: You could also solve both of these cases by simply inserting an LRM immediately before the number. Adding markup around the number is probably a safer way to solve the problem.

What if I can't use markup?

There are some situations where you may not be able to use the markup described in the previous section. In HTML these include the title element and any attribute value.

In these situations you have to use the invisible Unicode characters that produce the same results.

To replicate the effect of the markup described in the example above related to nested base directions, we can use pairs of characters to surround the embedded text. The first character is one of U+202B RIGHT-TO-LEFT EMBEDDING (RLE) or U+202A LEFT-TO-RIGHT EMBEDDING (LRE). This corresponds to the pre-HTML5 markup <span dir="rtl"> or <span dir="ltr">, respectively, ie. they do not isolate. The second character is U+202C POP DIRECTIONAL FORMATTING (PDF). This corresponds to the </span> in the markup. Here's an example.

<title>the title says "&#x202B;INTERNATIONALIZATION ACTIVITY, w3c&#x202C;" in hebrew.</title>

These control characters should only be used for inline phrases, not for block elements such as paragraphs. In general, it is recommended that you use markup where it is available, rather than these character pairs, because it is easier to see and therefore manage the markup, and it is consistent with the approach used for block elements. Where markup is not available, of course, this is the only option.
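When the title text is assembled by a script, the paired characters can be added by a helper rather than typed by hand. The Python sketch below is one way to do that. The function name embed is made up, and the uppercase text stands in for right-to-left characters, following the convention used in this article's code samples.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def embed(text, direction):
    """Surround text with RLE...PDF or LRE...PDF (these do not isolate)."""
    if direction == "rtl":
        opener = RLE
    elif direction == "ltr":
        opener = LRE
    else:
        raise ValueError("direction must be 'rtl' or 'ltr'")
    return f"{opener}{text}{PDF}"

# Uppercase is a placeholder for real right-to-left text.
phrase = embed("INTERNATIONALIZATION ACTIVITY, w3c", "rtl")
title = f'the title says "{phrase}" in hebrew.'
```

Always emitting the opening and closing characters together makes it harder to leave an embedding unterminated.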

The two characters we already met in the above text, U+200F RIGHT-TO-LEFT MARK (RLM) and U+200E LEFT-TO-RIGHT MARK (LRM) can also be used, where appropriate.

<title>the title says "INTERNATIONALIZE THE WEB!&#x200F;" in arabic.</title>

If isolation is necessary, either within the text or when the text is used with surrounding content, in addition to RLE/LRE...PDF, you may also need to add the LRM or RLM marks as described in the section about legacy browser support.

Note that in the example just shown the Arabic text is no longer marked up for language or styling. Also, because the character is invisible you may prefer to actually type in a numeric character reference (&#x200F;) as we did here, or, if available, a character entity (such as &rlm; in HTML).
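Because these characters are invisible, it can help to convert them to numeric character references before saving or reviewing the source. Here is a minimal Python sketch of that idea. The function name to_ncr and the particular set of characters covered are choices made for this example, not a complete list of formatting characters.

```python
# Bidi control characters mentioned in this article, plus the override pair.
BIDI_CONTROLS = {
    0x200E, 0x200F,                          # LRM, RLM
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,  # LRE, RLE, PDF, LRO, RLO
    0x2066, 0x2067, 0x2068, 0x2069,          # LRI, RLI, FSI, PDI
}

def to_ncr(text):
    """Replace invisible bidi controls with visible &#x....; references."""
    return "".join(
        f"&#x{ord(ch):04X};" if ord(ch) in BIDI_CONTROLS else ch
        for ch in text
    )

print(to_ncr("INTERNATIONALIZE THE WEB!\u200f"))
# INTERNATIONALIZE THE WEB!&#x200F;
```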

From Unicode version 6.3 onwards, the Unicode Standard contains new control codes (RLI, LRI, FSI, PDI) to enable authors to express isolation at the same time as direction in inline bidirectional text. The Unicode Consortium recommends that isolation be used as the default for all future inline bidirectional text embeddings. To use these new control codes, however, it will be necessary to wait until the browsers support them. The new control codes are:

RLI U+2067 RIGHT-TO-LEFT ISOLATE Sets direction to rtl
LRI U+2066 LEFT-TO-RIGHT ISOLATE Sets direction to ltr
FSI U+2068 FIRST STRONG ISOLATE Sets direction according to the first strong character
PDI U+2069 POP DIRECTIONAL ISOLATE Terminates the range set by RLI, LRI or FSI
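For contexts where these isolate characters are supported, a helper along the following lines could wrap injected text without markup. This is a sketch only: the function name isolate and the use of "auto" to select FSI are conventions invented for this example, chosen to mirror the dir attribute values.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(text, direction="auto"):
    """Wrap text in an isolate; 'auto' uses FSI, much like dir="auto"."""
    openers = {"ltr": LRI, "rtl": RLI, "auto": FSI}
    if direction not in openers:
        raise ValueError("direction must be 'ltr', 'rtl', or 'auto'")
    return f"{openers[direction]}{text}{PDI}"
```

Using FSI as the default plays a similar role to bdi for injected text whose direction is not known in advance.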

Mirrored characters

The Unicode Bidirectional Algorithm has rules for displaying mirrored characters. The visible shape of these characters depends on whether they are displayed in a LTR or RTL context. These are commonly pairs of characters such as parentheses, brackets, and the like, but also include some characters that are not typically paired, such as ≠ [U+2260 NOT EQUAL TO].

The character > [U+003E GREATER-THAN SIGN] points to the right when displayed in a LTR context, but to the left in an RTL context. Test in your browser

This is completely automatic. You do not have to change the character for the shape to change.

The ends of an opening parenthesis always face in the direction of the text flow, and closing parentheses face the other way. This means that, whether the stored content is in Arabic/Hebrew or Latin script, you would use the same ( [U+0028 LEFT PARENTHESIS] character at the beginning of the parenthesized text. In other words, treat mirrored characters as if the word 'left' in the character name meant 'opening', and 'right' meant 'closing'.
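Whether a character is mirrored is recorded as a property in the Unicode Character Database, so you can look it up rather than guess. The short Python sketch below uses the standard unicodedata module for this. The wrapper name is_mirrored is just for illustration.

```python
import unicodedata

def is_mirrored(ch):
    """True if the character has the Unicode Bidi_Mirrored property."""
    return bool(unicodedata.mirrored(ch))

for ch in "()<>a":
    print(f"U+{ord(ch):04X} mirrored: {is_mirrored(ch)}")
```

Parentheses and the less-than and greater-than signs report True, while an ordinary letter such as 'a' reports False.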

But up-to-date implementations of the bidi algorithm go further, and attempt to balance parentheses. In the picture below, the top shows how parentheses used to look (without intervention), and the bottom shows how they look with the balancing in play.

Parentheses, as they used to be (top) and balanced (bottom) using the Unicode Bidirectional Algorithm. Test in your browser

Again, you don't need to take any action to enable these improvements. The browser should just do this.

Overriding the algorithm

There may be occasions where you don't want the bidi algorithm to do its reordering work at all. In these cases you need some additional markup to surround the text you don't want reordered. In HTML this is achieved using the inline bdo element. Note that you shouldn't find yourself using bdo for normal management of bidi text; it's only for special cases, mostly educational. And don't confuse it with bdi.

In other XML applications, such as XHTML2, it may be implemented as a value of rlo or lro on the dir attribute, enabling it to be applied to any element. There are also Unicode control characters you could use to achieve the same result, but because they create states with invisible boundaries this is generally not recommended.

Examples that show the characters as ordered in memory use the bdo tag to achieve that effect. You must provide a dir attribute with the bdo element, and the value must be either rtl or ltr (it cannot be auto). For example, the picture below shows Hebrew text as ordered in memory.

Shows Hebrew text in the order stored in memory.

Text using a bidirectional override (bottom line).

For the bottom line we would use the following markup in HTML:

<p><bdo dir="ltr">INTERNATIONALIZATION ACTIVITY, w3c</bdo></p>
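If you generate such examples from a script, it may be worth enforcing the rule that bdo needs an explicit rtl or ltr value. The following Python sketch shows one way to do that. The function name bdo is hypothetical, and the uppercase text again stands in for right-to-left characters.

```python
import html

def bdo(text, direction):
    """Wrap text in a bdo element; only 'ltr' or 'rtl' is accepted."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("bdo requires dir to be 'ltr' or 'rtl', not 'auto'")
    return f'<bdo dir="{direction}">{html.escape(text)}</bdo>'

# Uppercase is a placeholder for real right-to-left text.
print(f"<p>{bdo('INTERNATIONALIZATION ACTIVITY, w3c', 'ltr')}</p>")
```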