Intended audience: content developers working with right-to-left scripts, HTML/XHTML and SVG coders (using editors or scripting), script developers (PHP, JSP, etc.), schema developers (DTDs, XML Schema, RelaxNG, etc.), and anyone who is struggling to understand how to make their mixed direction text look right in markup.
Code samples containing Arabic and Hebrew text may be displayed in different ways depending on which editor is used. In this article right-to-left text in code samples is represented by UPPERCASE TRANSLATIONS, and left-to-right text by lowercase. All text in code samples reflects the direction of characters as stored in memory, rather than the displayed result. The original version of text in uppercase translations would be read from right-to-left.
This article tells you how to write HTML where text with different writing directions is mixed within a paragraph or other HTML block (i.e. inline or phrasal content). (A companion article, Structural markup and right-to-left text in HTML, tells you how to use HTML markup for elements such as html, and structural markup such as div and forms.)
If you know the direction of all the text involved, tightly wrap every opposite-direction phrase in markup. Add the CSS shim to your style sheet, and use the dir attribute on that markup. Be sure to nest markup to show the structure.
If you want to bullet-proof your code for browsers that don't support the CSS shim, then wherever tightly-wrapped text is followed inline by a number or a logically separate opposite-direction phrase, add &lrm; or &rlm; (whichever matches the direction of the surrounding text) immediately after the phrase.
If you don't know the direction of text that will be inserted at run time, add dir=auto to any markup that tightly wraps the location. If there is no markup, wrap the location with a bdi element.
If you want just a little more detail, jump to the section Steps for handling inline bidirectional text in HTML.
If you'd like to understand inline bidirectional text better, and see worked examples, read the rest of this article.
It is common for content in Arabic, Hebrew, and other languages that use right-to-left scripts to include numerals or include text from other scripts. Both of these typically flow left-to-right within the overall right-to-left context.
The article first describes some of the basic principles underlying how the Unicode bidirectional algorithm works, so that you better understand the problems you have to deal with. Then it looks at some of the more common scenarios where the bidi algorithm requires assistance through the addition of markup or control codes. It is written in a tutorial style that helps the reader with little or no background in handling bidirectional text progress from one concept to the next.
The article is focused on markup usage in HTML, but most of the concepts are also relevant for other markup languages.
NOTE! This is a major update of the article formerly titled What You Need to Know About the Bidi Algorithm and Inline Markup, and reflects the recent changes in bidi markup in the HTML5 specification.
Technically speaking, the main change is that the dir attribute now isolates text by default with respect to the bidi algorithm. Isolation as a default is the recommendation of the Unicode Standard as of version 6.3.
For the less technically minded, the main advantage of this change is a much simpler transition for both content authors and browser developers who want to reap the benefits of isolation. At the same time, these approaches have good results for existing legacy content.
If you're not really familiar with the Unicode Bidirectional Algorithm, then before reading further you should read a basic introduction to how the bidi algorithm works.
For how dir should be used on the html tag and for structural markup, see Structural markup and right-to-left text in HTML.
In the sections below, we will examine specific examples of what can go wrong, why it goes wrong, and what fixes it. Nevertheless, it is important to realize that, basically, the problems all occur when content in one direction includes an inline phrase in the opposite direction. We will call these opposite-direction phrases. An opposite-direction phrase may be a single directional run (such as a word), or may be a set of directional runs with an embedded change in base direction.
In the following example, the English sentence contains an opposite-direction phrase between the quotation marks. That phrase, itself, also contains an opposite-direction phrase: the word C++.
Common examples of such phrases include quotations, titles of books, articles or plays, formatted numbers (e.g. phone numbers and MAC addresses), street and email addresses, and various names, such as brand names, acronyms, part numbers, site names, place names, file names (and paths), etc.
The problem is worse in applications that drop text into a page, say from a database. The application often does not know a priori whether such text is (or perhaps contains) an opposite-direction phrase, and has to estimate its direction at run time by checking the Unicode ranges of its characters. HTML5 introduces a feature for doing so in the browser.
Whenever an opposite-direction phrase occurs, things can go wrong. That is, something will go wrong if the text includes, without any special "wrapping", an inline opposite-direction phrase that:
Although this list seems daunting, there is no need to determine which, if any, of these cases applies to a particular phrase. There is a simple, default way of "wrapping" opposite-direction phrases that will prevent problems in all of the cases above, and do no harm when none of them apply. We will describe how to do such wrapping for HTML5-aware browsers and for others.
NOTE! This section includes references to markup being introduced by HTML5 that should simplify various aspects of handling inline bidi text. The HTML5 spec cannot yet be said to be stable, nor are the new features implemented in all browsers. We point out what is new. Use the new features where you can, and encourage browser developers to continue to implement them.
The dir attribute sets the base direction for the content of an element.
To set the default direction of the whole HTML document to right-to-left, add dir="rtl" to the html tag. This will result in all elements in the document inheriting a base direction of RTL.
You can change the base direction for content within a page by surrounding that content with an element and adding a dir attribute to indicate the desired direction.
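As a minimal sketch of these two steps, consider the page below. The content is invented for illustration and, as elsewhere in this article, uppercase stands for right-to-left text.

```html
<!DOCTYPE html>
<!-- dir="rtl" on html: every element inherits a right-to-left base direction -->
<html dir="rtl" lang="he">
<head>
  <meta charset="utf-8">
  <title>PAGE TITLE IN HEBREW</title>
</head>
<body>
  <p>A PARAGRAPH OF HEBREW TEXT, LAID OUT RIGHT-TO-LEFT.</p>
  <!-- This element overrides the inherited direction for its own content -->
  <blockquote dir="ltr" lang="en">an english quotation, laid out left-to-right.</blockquote>
</body>
</html>
```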
In principle, the right thing to do for every opposite-direction phrase is to set its base direction by using the dir attribute on an element tightly wrapping the phrase.
HTML5 changes the semantics of the dir attribute. In browsers that implement this change, the content of the element on which the dir attribute sits will be isolated, in terms of the bidi algorithm, from the content surrounding it. Wrapping the opposite-direction phrases in an element with a dir attribute helps address some of the problems listed in the previous section; adding isolation helps resolve some more.
Check out the worked examples below to see how this works.
The visual order in which text is displayed can sometimes be modified using two invisible Unicode control characters: LRM (U+200E LEFT-TO-RIGHT MARK), which can be added to the source text using the character itself or the escapes &lrm; or &#x200E;, and RLM (U+200F RIGHT-TO-LEFT MARK), for which the escapes are &rlm; or &#x200F;. Each has the strong type indicated by its name, like an A or an א, but is invisible.
One use of LRM and RLM is to extend a directional run through neutral or weak characters at the start or end of an opposite-direction phrase, by putting a mark of the same direction as the phrase on the other side of those neutral or weak characters. You can see an example of how it works in the advanced usage notes for use case 1 below.
Another use is to separate an opposite-direction phrase from some neighboring but independent text that would otherwise be incorrectly treated as the same directional run (see use case 3 for a good example). To do this you can put between them a directional mark with the same directionality as the overall context.
In HTML5, where the dir attribute is isolating, both cases are better addressed by adding the dir attribute to an element wrapping the opposite-direction phrase, so there is really no need to use LRM/RLM. See below for details.
HTML5 addresses another need: text dropped into a page, say from a database, when you don't know its base direction. Before HTML5, you could only set the dir attribute to ltr or rtl, and had to somehow determine yourself which of them was appropriate.
HTML5 provides a new value for the dir attribute: auto. The auto value tells the browser to look at the first strongly typed character in the element. If it's a right-to-left typed character such as a Hebrew or Arabic letter, the element will get a direction of rtl. If it's, say, a Latin character, the direction will be ltr.
There are corner cases where this may not give the desired outcome, but it should usually produce the desired result.
Note that the browser ignores any neutral or weak characters at the beginning of the text when looking for the first strong character. It also ignores anything inside a bdi element or an element with a dir attribute of its own.
Like any other use of the dir attribute in HTML5, dir="auto" also directionally isolates its content from its surroundings.
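To illustrate the behaviour of the auto value described above, here is a small sketch. The sentences and titles are invented. The same template is used twice, once with a right-to-left title (shown in uppercase) and once with a left-to-right one.

```html
<!-- The leading quotation mark is neutral, so it is skipped.
     The first strong character is a Hebrew letter: the cite gets rtl. -->
<p>Latest review: <cite dir="auto">"HEBREW BOOK TITLE"</cite> (4 stars)</p>

<!-- Same markup, Latin-script title: the cite gets ltr. -->
<p>Latest review: <cite dir="auto">"english book title"</cite> (4 stars)</p>
```

In both cases the content of the cite element is also isolated from the "(4 stars)" that follows it.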
See which browsers support this.
HTML5 also introduces a new element, bdi. It is just like a span except that, whether or not it is used with a dir attribute, it directionally isolates its content from the surrounding text; "bdi" stands for "bidirectional isolate".
By default, bdi comes with the dir attribute set to the new auto value (see above). However, it is also possible to use an explicit dir attribute on bdi, with a value of ltr or rtl, if you know the direction of the phrase and just want to isolate it.
The choice of whether to attach dir="auto" to an existing element or to wrap the phrase in a bdi depends on whether you already have an inline element tightly wrapping the potentially opposite-direction phrase, and whether you happen to know the phrase's direction (or can guess at it better than the browser's first-strong-character heuristic).
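The three situations just described might look like this in markup. This is a sketch only: the sentence, the user name and the href value are hypothetical.

```html
<!-- No inline element yet, direction unknown: bdi implies dir="auto". -->
<p>Posted by <bdi>USER NAME</bdi>: 12 comments</p>

<!-- Direction known in advance: state it explicitly; bdi still isolates. -->
<p>Posted by <bdi dir="rtl">USER NAME</bdi>: 12 comments</p>

<!-- An inline element already wraps the phrase tightly: reuse it. -->
<p>Posted by <a href="/users/42" dir="auto">USER NAME</a>: 12 comments</p>
```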
See which browsers support this.
Here we summarize default guidelines for working with bidirectional inline text. Often alternative approaches will work, but the approaches outlined here are simple to apply and should work for all cases.
If an opposite-direction phrase makes up the whole content of an existing element, such as a paragraph, you can add the dir attribute to that element. In some cases this can lead to the block of text being aligned on the page in a way that is not desirable. To avoid this, you can add an inline element immediately inside the tags of the existing markup, and follow the rules below for marking up that inline element. For more information about handling direction in non-inline elements, see Structural markup and right-to-left text in HTML.
The easiest way to address bidi issues in your content is to tightly wrap every opposite-direction phrase in markup that sets its base direction.
By tightly wrapping, we mean that the element contains the entire opposite-direction phrase, and nothing but the opposite-direction phrase.
When none of the problematic use cases apply, this will not have any visible effect. But when one of them does apply, this provides a simple solution, that doesn't require you to figure out specifically what the problem is.
The latest version of the HTML5 specification says that browsers should change their default style sheet so that the dir attribute isolates the text inside the element from that surrounding it. This means that more of the potential problems pointed out earlier simply melt away when you tightly wrap all opposite-direction phrases with elements carrying a dir attribute.
The CSS Shim. Unfortunately, not all browsers yet apply isolation when dir is used. For this reason, during the transitional phase, we recommend that you provide some CSS yourself to produce the effect of the default style sheet. The following browser versions are known to support the necessary CSS:
Internet Explorer 8-10 doesn't support the CSS, but does use a hack that produces a similar effect, and is usually good enough.
Browsers that don't yet support the CSS will simply behave in the same way as before.
The CSS shim is as follows:
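A minimal sketch of such a shim, written as a style element, might look like the following. The selectors and the prefixed -webkit- and -moz- values are assumptions, based on the CSS unicode-bidi property and on the vendor prefixes browsers used while isolation was experimental. Check them against the browsers you target.

```html
<style>
  /* Make dir="ltr" and dir="rtl" isolate their content.
     A browser uses the last declaration it understands
     and ignores the values it does not recognise. */
  [dir="ltr"], [dir="rtl"] {
    unicode-bidi: -webkit-isolate;
    unicode-bidi: -moz-isolate;
    unicode-bidi: isolate;
  }
  /* bdo must keep overriding character order while also isolating. */
  bdo[dir="ltr"], bdo[dir="rtl"] {
    unicode-bidi: bidi-override;
    unicode-bidi: -webkit-isolate-override;
    unicode-bidi: -moz-isolate-override;
    unicode-bidi: isolate-override;
  }
</style>
```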
At the time of writing, all browser versions that support isolation in CSS also support the
To make sure that a phrase that contains any opposite-direction characters is displayed correctly, do the following.
If you know the phrase's direction, or can work it out for injected text, wrap all opposite-direction phrases in an element with a dir attribute. This is not always necessary, but never does any harm. If the phrase is already tightly wrapped in an inline element, you can use the existing element for this purpose. If not, add a span.
If you don't know the phrase's direction, i.e. unknown text that will be injected at run time, then either:
wrap the phrase in <bdi>...</bdi>. (If already wrapped by a span element, you may replace the span with bdi.) Without an explicit dir attribute, dir="auto" is implied.
or alternatively, if the phrase is tightly wrapped by an element already, you could just add dir="auto" to that element.
Where dir isolation or the CSS shim is not supported by a browser or browser version, it is not possible to isolate phrases from their surrounding content. It is often possible to achieve a similar effect, however, using an RLM or LRM Unicode control code, if you know the direction of the surrounding text.
To make sure that a phrase that contains any opposite-direction characters is displayed correctly, do the following:
If you know the direction of the text surrounding a phrase, or can work it out for injected text:
tightly wrap opposite-direction phrases in an inline element that uses the dir attribute to set the direction of the phrase, as described above.
if the tightly-wrapped phrase in the previous step is followed inline (possibly after some intervening neutral characters) by a number or a logically separate opposite-direction phrase, then add a directional mark (RLM or LRM) immediately after the markup of that phrase. Choose one that matches the direction of the surrounding context. If you do not want to or cannot check whether the phrase is followed by one of those things, you can add a directional mark matching the direction of the context after every tightly-wrapped phrase.
If you don't know the phrase's direction, i.e. unknown text that will be injected at run time, there isn't really a good way to automatically apply the right base direction. However, if you know that one injected phrase may be followed inline by a number or a logically separate opposite-direction phrase, you can add a directional mark immediately after the phrase that matches the direction of the surrounding context in order to separate the phrase from what follows.
If the phrase is being injected at run time and its overall direction is unknown, it can be estimated from the direction of its individual characters, e.g. by using the direction of its first strongly typed character. Open-source code for doing so is available, but HTML does not offer any features for easing this task. It is possible (but not necessary) to skip the steps above only if both the overall direction of the phrase and the direction of the last strongly typed character in the phrase are the same as the context direction.
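As a rough sketch of first-strong estimation done by script, the fragment below sets dir on a wrapper before display. The function name estimateDirection, the id value and the sample string are hypothetical, and the character ranges are a deliberately small sample (basic Latin, Hebrew and Arabic letters), not a complete classification of strong characters.

```html
<p>Your search for <span id="query"></span> returned no results.</p>
<script>
  // Hypothetical helper: returns 'ltr', 'rtl', or null when the text
  // contains none of the sampled strong characters.
  function estimateDirection(text) {
    // First alternative: basic Latin letters (strong left-to-right).
    // Second alternative: Hebrew and Arabic letters (strong right-to-left).
    var strong = /[A-Za-z]|[\u05D0-\u05EA\u0620-\u064A]/.exec(text);
    if (!strong) return null;
    return /[A-Za-z]/.test(strong[0]) ? 'ltr' : 'rtl';
  }

  var query = document.getElementById('query');
  // Sample injected text: a Hebrew word followed by "CSS".
  query.textContent = '\u05E9\u05DC\u05D5\u05DD CSS';
  var dir = estimateDirection(query.textContent);
  if (dir) query.setAttribute('dir', dir); // sets dir="rtl" for this sample
</script>
```

Neutral and weak characters such as digits, spaces and punctuation match neither alternative, so they are skipped in the same way as the first-strong approach described above.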
In this section we will look at how to write code that addresses various use cases where the content is written by the author. The section following this deals with use cases where content is injected into the page.
In all cases, the sections related to use of HTML5 features assume the availability of the CSS shim described above for browsers that support it but don't support isolation with dir.
In this example a right-to-left book title is embedded in a left-to-right context, and the book title itself contains an embedded left-to-right phrase. Here is the code without any additional bidi markup:
What one would expect to see is:
Unfortunately, the bidirectional algorithm cannot tell where the boundaries of the nested changes in base direction should be. The result, without help in the markup, is:
To address this in HTML5, if there is no other markup around the opposite-direction phrases, wrap both in markup with the appropriate dir value. (Note, by the way, how the markup appears inside the quotation marks, which are part of the English text.)
It is important to note that the phrases are nested. Just wrapping the Arabic in one span followed by a span containing the C++ would result in no improvement at all.
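The nesting described above can be sketched as follows. The sentence is a stand-in for the article's example, with the Arabic shown as an uppercase translation.

```html
<!-- Without bidi markup: nothing marks where the nested phrases start and end. -->
<p>The title is "INTRODUCTION TO C++" in Arabic.</p>

<!-- Nested markup: the rtl title contains the ltr phrase.
     The spans sit inside the quotation marks, which belong to the English text. -->
<p>The title is "<span dir="rtl">INTRODUCTION TO <span dir="ltr">C++</span></span>" in Arabic.</p>

<!-- The pattern the text warns against: two sibling spans, not nested. -->
<p>The title is "<span dir="rtl">INTRODUCTION TO</span> <span dir="ltr">C++</span>" in Arabic.</p>
```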
advanced usage notes: Note that two elements with dir are needed in this case. This is because there are two opposite-direction phrases. If only one were used, like this:
the displayed text would be as shown below. This moves the C++ to the left, as needed, but the + signs appear on the wrong side of the C.
This fails because the "C++" is an opposite-direction (LTR) phrase within the title, ending in neutral characters and the phrase is now being displayed with an RTL base direction. The bidi algorithm has no way of knowing that the plus signs are part of an LTR phrase, not of the RTL context, and thus displays them to the left of the "C" instead of to its right.
To solve this problem, wrap the overall RTL phrase in a <span dir="rtl">, and the LTR phrase nested inside it in its own <span dir="ltr">, as shown.
If there is already suitable markup to surround the book title, such as a cite element, add the dir attribute to that.
advanced usage notes: If the "C++" in this example were an ordinary Latin-script word, such as "Python", you wouldn't actually need to mark it up to get the right display. The bidi algorithm would take care of it. However, marking up text in this way saves you from having to understand why these two cases are different, and from having to work out which case applies to your content.
Similarly, if the title contained no embedded left-to-right text, you wouldn't actually need directional markup at all, but adding it avoids possible issues related to following inline text, such as where the text is edited to add a following number or another title, like this:
The solution outlined for HTML5 aware browsers will work equally well for browsers that don't support HTML5 features.
advanced usage notes: As noted earlier, one use of LRM and RLM is to extend a directional run through neutral or weak characters at the start or end of an opposite-direction phrase, by putting a mark of the same direction as the phrase on the other side of those neutral or weak characters. For this example, instead of wrapping the "C++" in a <span dir="ltr">, we could add &lrm; after the second plus:
The result is what we need:
Because the LRM is a strongly left-to-right character, the neutral pluses are now between two strong left-to-right characters (the C and the LRM). They therefore also become left-to-right in direction, making a single directional run of the four characters.
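In markup, this alternative might be sketched as follows, again using an uppercase stand-in for the Arabic:

```html
<!-- The &lrm; sits immediately after the second plus, inside the rtl span,
     so the pluses fall between two strong left-to-right characters. -->
<p>The title is "<span dir="rtl">INTRODUCTION TO C++&lrm;</span>" in Arabic.</p>
```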
Used this way, however, LRM and RLM are a bit like gotos in programming languages: a quick hack that, unlike the dir attribute, says nothing about the structure of the text. And they simply cannot be used to deal with an opposite-direction phrase that happens to contain a nested phrase in the original direction, like our complete "Introduction to C++" example above. That may seem like an esoteric case, but it is surprisingly common when displaying right-to-left data in a left-to-right page, because the use of left-to-right words (like "C++") is not uncommon in right-to-left text.
So, if you don't want to analyze whether LRM and RLM can replace the use of the dir attribute in your case, just use the dir attribute.
In the next example, the opposite-direction phrase is followed by a logically separate number. This is the code without any bidi markup:
You would expect to see:
You would actually see:
This happens because the bidi algorithm tells the browser to treat the "5" as part of the Hebrew text. This is not appropriate here, though. We need to find a way to say that the name and the number are separate things, i.e. to isolate the inserted name from the number.
In a browser that supports isolating dir or the CSS shim, wrap the opposite-direction phrase (the title) in markup and add the appropriate dir value. There is no need to add anything else, since the dir attribute automatically isolates its content.
If there is already suitable markup to surround the book title, such as an a element, add the dir attribute to it.
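A sketch of both variants follows. The sentence, the title and the href value are invented for illustration, and the uppercase text stands for Hebrew.

```html
<!-- No existing markup around the title: add a span. -->
<p>We found the book <span dir="rtl">HEBREW BOOK TITLE</span> - 5 copies in stock.</p>

<!-- The title is already tightly wrapped in a link: reuse that element. -->
<p>We found the book <a href="/books/123" dir="rtl">HEBREW BOOK TITLE</a> - 5 copies in stock.</p>
```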
For browsers where dir doesn't isolate, you would fix this not only by adding markup around the opposite-direction Hebrew text, but also by adding an LRM character after it. That would prevent the number being associated with the right-to-left text.
If the phrase was already tightly wrapped by an element, add the dir attribute to that element's tag, and add the LRM character after the element.
Of course, if the overall context is right-to-left, e.g. Arabic or Hebrew text, and the book title was in English, you would need to add an RLM character rather than an LRM character.
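For browsers without isolation, the two situations just described might be sketched like this (the sentences and titles are invented):

```html
<!-- LTR context, RTL title: mark up the title, then add LRM after the markup. -->
<p>We found the book <span dir="rtl">HEBREW BOOK TITLE</span>&lrm; - 5 copies in stock.</p>

<!-- RTL context, LTR title: same pattern, but with RLM to match the context. -->
<p dir="rtl">WE FOUND THE BOOK <span dir="ltr">english book title</span>&rlm; - 5 COPIES IN STOCK.</p>
```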
Neutrals between same directional runs can sometimes be misinterpreted by the bidi algorithm. In this use case we have several country names in Arabic listed in a LTR paragraph. This is an example of an opposite-direction phrase followed by another, but logically separate, opposite-direction phrase. Here is the source code without any bidi markup:
We expect to see the following:
In the actual result, the first two Arabic words are reversed and the intervening comma is moved to the right side of the space between the words.
The reason for the failure is that, with a strongly typed right-to-left (RTL) character on either side, the bidirectional algorithm sees the neutral comma as part of the Arabic text. It is interpreting the first two Arabic words and the comma as a single directional run in Arabic. In fact it is part of the English text, and should mark the boundary between the two separate right-to-left directional runs in Arabic.
The solution for this use case is similar to that for the previous use case, so we will keep the notes below brief, and assume that you have read the solutions for use cases 1 and 2. We will present just the default markup approach.
Simply wrap each Arabic word with markup and add the appropriate dir value.
If there is already markup surrounding the Arabic text, such as an a element, add the dir attribute to it.
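As a sketch, with invented wording and uppercase translations standing for the Arabic country names:

```html
<!-- Each name is wrapped separately, so the commas and the word "and"
     stay part of the English text between the names. -->
<p>The names of these states in Arabic are <span dir="rtl">EGYPT</span>,
<span dir="rtl">BAHRAIN</span> and <span dir="rtl">KUWAIT</span> respectively.</p>
```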
In HTML4, add markup around the Arabic text, but also add an LRM character after it whenever that text is followed by another opposite-direction phrase. Use an RLM character if the surrounding context is right-to-left.
As before, if the Arabic text was already tightly wrapped by an element, use that element tag to add the dir attribute, and add the LRM character after it.
In this section we will look at use cases that involve injecting content into a page at run time.
It is important to note that we cannot address markup inside the injected content. In all cases below, if the injected phrases contain embedded opposite-direction phrases themselves, these need to be already marked up when the phrase is injected into the page, either in the database, or added by scripting when the injected phrase is fetched. If this is not done, the injected text will look alright for simple cases, but may be problematic for more complex ones.
In the article Structural markup and right-to-left text in HTML there is an example of a page for an online book store that carries books in many languages and needs to display the original book titles regardless of the language of the user interface. Thus, a Hebrew or Arabic book title may appear in an English interface, and vice-versa.
Let us suppose that you searched for the book הצהחת קידוד תװי CSS and that that book wasn't found. The bookstore might generate a message that says so. The image above shows what one would expect to see.
Note how the 'CSS' is to the left of the Hebrew text because it is part of the book title. However with the following source code ...
... here is the actual result. Note how the 'CSS' is now on the right of the Hebrew text.
The default rule when there is no other element around the injected text is to wrap it in bdi. The bdi tag automatically assigns a direction based on the first strong character in the injected string.
advanced usage notes: It is possible that the search string in this example begins with a strong left-to-right character, for example, if the book title that we are searching for begins with 'CSS', rather than ending with it. In that case, there is not much we can do by default in the markup. To cover this case you would have to use scripting to detect the direction of the string as a whole and apply that to the markup.
If there is another element around the injected text, use dir="auto" on that element or wrap the injected phrase in bdi.
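The options above might be sketched as follows. The message wording is invented, and the injected search string is shown as an uppercase stand-in for the Hebrew followed by lowercase "css".

```html
<!-- No element around the injected string: wrap it in bdi. -->
<p>Your search for <bdi>HEBREW BOOK TITLE css</bdi> returned no results.</p>

<!-- An element already wraps the string tightly: add dir="auto" to it... -->
<p>Your search for <em dir="auto">HEBREW BOOK TITLE css</em> returned no results.</p>

<!-- ...or put bdi inside it. -->
<p>Your search for <em><bdi>HEBREW BOOK TITLE css</bdi></em> returned no results.</p>
```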
In HTML4 we can't really address this use case using markup, since we need to know in advance the direction of the text. This can only be achieved by knowing the direction of or examining the injected phrase before insertion, and applying the appropriate directional information by scripting.
Here's an example where the names of restaurants are added to a page from a database and followed by a number. You don't know in advance the directionality of the injected text. This is the code produced by the script that injects the phrases, without bidi markup:
And here's what one would expect to see, and what you'd actually see.
The problem with the second restaurant name arises because the browser thinks that the " – 5" is part of the Hebrew text. This is what the Unicode Bidi Algorithm tells it to do, and usually it is correct. Not here, though. We need to find a way to say that the name and the number are separate things, i.e. to isolate the inserted name from the number.
In the third restaurant name the number is back in the right place, but the word 'Roma' is part of the Hebrew name, and should appear to the left of the Hebrew text. In other words, we need to apply a base direction of RTL to the whole of the injected text.
Once again, the default rule when there is no other element around the injected text is to wrap it in bdi. The bdi element automatically isolates the injected phrase from the number, and sets the direction for the phrase based on its first strong character.
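A sketch of the list with each injected name wrapped in bdi is shown here. The names and review counts are illustrative stand-ins: a Latin-script name, a Hebrew name, and a Hebrew name that ends with the Latin-script word "roma".

```html
<ul>
  <!-- Every injected name gets the same wrapper, whatever its script. -->
  <li><bdi>aroma</bdi> - 3 reviews</li>
  <li><bdi>PURPLE PIZZA</bdi> - 5 reviews</li>
  <li><bdi>PIZZA roma</bdi> - 2 reviews</li>
</ul>
```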
You'll notice that the example above puts bdi around the name Aroma too. Of course, you don't actually need that, but it won't do any harm. On the other hand, it simplifies the necessary script code, and means you can handle any name that comes out of the database, whatever script it is in.
If there is another element around the injected text, wrap the injected phrase in bdi or use dir="auto" on that element.
Again, in HTML4, all we can do is add an LRM character after the injected phrase, to ensure that it is isolated from the number. This would be sufficient to correctly render the second item in the list, because it is a very simple case, with no embedded opposite-direction phrases or neutral characters. The third case, however, will not work so well, since the base direction has to be set to right-to-left for the word 'Roma' to appear on the left. This can only be properly rendered if the injected phrase has markup added to it before insertion.
The code would look something like this.
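As a sketch of this legacy approach (the names and counts are illustrative, and the span in the third item is assumed to have been added to the phrase before insertion):

```html
<ul>
  <!-- An LRM after each injected name separates it from the number. -->
  <li>aroma&lrm; - 3 reviews</li>
  <li>PURPLE PIZZA&lrm; - 5 reviews</li>
  <!-- This name also needs an rtl base direction so that "roma" lands on the left. -->
  <li><span dir="rtl">PIZZA roma</span>&lrm; - 2 reviews</li>
</ul>
```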
It is a very common situation for punctuation or some other neutral character to appear at the end of an opposite direction phrase and belong with that phrase.
Unfortunately, such neutrals between different directional runs are typically misinterpreted unless there is additional bidi markup. In the following example, the exclamation mark should appear at the end of the Arabic text, ie. to the left, like this:
Unfortunately, if we rely solely on the bidirectional algorithm we see this:
Given our understanding of the bidi algorithm we can easily understand why this happened. Because the exclamation mark was typed in between the last RTL letter 'ب' (on the left) and the LTR letter 'i' (of the word 'in') its directionality is determined by the base direction of the paragraph, ie. LTR in this case.
Because the exclamation mark is seen as LTR it joins the directional run that includes the text 'in Arabic'.
The general solution mentioned above works fine: just put the opposite-direction phrase in an element with a dir attribute. If there isn't already an element present, use a span.
advanced usage notes: You could also simply place an RLM after the exclamation mark, but we have already discussed earlier why that is a less ideal fix. Note, also, that when using this solution, without markup, the Arabic text is not marked up for language or styling. Adding markup around the embedded title is probably a better way to solve the problem.
For injected text whose direction is not known in advance, use bdi if there isn't already a surrounding element; otherwise put dir="auto" on the surrounding element, or put bdi inside it.
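Both variants might be sketched like this, with an uppercase stand-in for the Arabic title:

```html
<!-- Direction known: the exclamation mark is inside the rtl span,
     so it is displayed at the end (the left) of the Arabic text. -->
<p>The title is <span dir="rtl">ARABIC TITLE!</span> in Arabic.</p>

<!-- Injected text of unknown direction: bdi takes its direction from the first strong character. -->
<p>The title is <bdi>ARABIC TITLE!</bdi> in Arabic.</p>
```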
The picture below shows the expected result of displaying a telephone number in a right-to-left context, where the area code is surrounded by parentheses, and where the number appears at the beginning of a line or after some right-to-left text.
The next picture shows what you actually see, if you rely solely on the bidi algorithm.
Because these are numbers, the order applied by the bidirectional algorithm is slightly different from what we've seen before, but the fix is essentially the same.
Here is another, somewhat more problematic example of the same thing. The picture below shows a MAC address number as you would expect to see it displayed in a right-to-left context. The sequence 01:02:aa:4a:bb:06 looks exactly the same as it would in a left-to-right context.
Here, however, is what you will see when relying solely on the bidirectional algorithm.
This is particularly worrisome, since it's not obvious when the order is incorrect. Even if you did know it was incorrect, it is not at all clear how it should be read.
Although there are more characters involved, this problem arises because the bidirectional algorithm assumes that the initial run of numbers (and colons, since they are neutral) is associated with the preceding Hebrew text, rather than being part of the MAC address.
This example indicates that you should always wrap MAC addresses, and similar numbers, with directional information.
The solution is the same. Put the opposite-direction phrase in an element with a dir attribute. If there isn't already an element present, use a span. The following code would be used in an overall right-to-left context.
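A sketch for both kinds of number follows. The labels are uppercase stand-ins for Hebrew text, the phone number is made up, and the MAC address is the one discussed above.

```html
<!-- Each number is wrapped tightly and given an ltr base direction. -->
<p dir="rtl">PHONE: <span dir="ltr">(02) 555 0100</span></p>
<p dir="rtl">MAC ADDRESS: <span dir="ltr">01:02:aa:4a:bb:06</span></p>
```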
For injected text whose direction is not known in advance, use bdi if there isn't already a surrounding element, or put dir="auto" on a surrounding element, or put bdi inside it. We just show the simplest case here. The following code would be used in an overall right-to-left context.
advanced usage notes: You could also solve both of these cases by simply inserting an RLM immediately before the number. Adding markup around the number is probably a safer way to solve the problem.
There are some situations where you may not be able to use the markup described in the previous section. In HTML these include the title element and any attribute value.
In these situations you have to use the invisible Unicode characters that produce the same results.
To replicate the effect of the markup described in the example above related to nested base directions, we can use pairs of characters to surround the embedded text. The first character is one of U+202B RIGHT-TO-LEFT EMBEDDING (RLE) or U+202A LEFT-TO-RIGHT EMBEDDING (LRE). This corresponds to the pre-HTML5 markup <span dir="rtl"> or <span dir="ltr">, respectively, i.e. they do not isolate. The second character is U+202C POP DIRECTIONAL FORMATTING (PDF). This corresponds to the </span> in the markup. Here's an example.
These control characters should only be used for inline phrases, not for block elements such as paragraphs. In general, it is recommended that you use markup where it is available, rather than these character pairs, because it is easier to see and therefore manage the markup, and it is consistent with the approach used for block elements. Where markup is not available, of course, this is the only option.
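As a sketch of the RLE/LRE...PDF pairs in places where markup cannot be used, the nested title example could be written with numeric character references as below. The wording and the visible paragraph text are invented.

```html
<!-- &#x202B; = RLE, &#x202A; = LRE, &#x202C; = PDF.
     RLE opens the rtl title, LRE opens the nested ltr "C++",
     and each PDF closes the most recently opened embedding. -->
<title>&#x202B;INTRODUCTION TO &#x202A;C++&#x202C;&#x202C;</title>

<p title="The title is &quot;&#x202B;INTRODUCTION TO &#x202A;C++&#x202C;&#x202C;&quot; in Arabic.">Hover for the book title.</p>
```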
The two characters we have already met in the text above, U+200F RIGHT-TO-LEFT MARK (RLM) and U+200E LEFT-TO-RIGHT MARK (LRM), can also be used where appropriate.
If isolation is necessary, either within the text or when the text is combined with surrounding content, then in addition to RLE/LRE...PDF you may also need to add LRM or RLM marks, as described in the section about legacy browser support.
Note that in the example just shown the Arabic text is no longer marked up for language or styling. Also, because the character is invisible, you may prefer to actually type in a numeric character reference (&#x200F;) as we did here, or, if available, a character entity (such as &rlm; in HTML).
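One way to keep invisible directional characters manageable in source code is to convert them to numeric character references before the text is written out. The sketch below assumes a hypothetical helper name, `to_ncr`, and an arbitrary choice of which characters to convert.

```python
import re

# LRM and RLM, the embedding and override controls, and the isolate controls.
_BIDI_CONTROLS = re.compile("[\u200E\u200F\u202A-\u202E\u2066-\u2069]")


def to_ncr(text: str) -> str:
    """Replace each invisible directional character with a hexadecimal NCR."""
    return _BIDI_CONTROLS.sub(lambda m: f"&#x{ord(m.group()):04X};", text)


# Following this article's convention, UPPERCASE stands in for Arabic text.
print(to_ncr("ARABIC\u200F 123"))  # ARABIC&#x200F; 123
```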
From Unicode version 6.3 onwards, the Unicode Standard contains new control codes (RLI, LRI, FSI, PDI) to enable authors to express isolation at the same time as direction in inline bidirectional text. The Unicode Consortium recommends that isolation be used as the default for all future inline bidirectional text embeddings. To use these new control codes, however, it will be necessary to wait until the browsers support them. The new control codes are:
| Abbreviation | Character | Effect |
|---|---|---|
| RLI | U+2067 RIGHT-TO-LEFT ISOLATE | Sets direction to rtl |
| LRI | U+2066 LEFT-TO-RIGHT ISOLATE | Sets direction to ltr |
| FSI | U+2068 FIRST STRONG ISOLATE | Sets direction according to the first strong character |
| PDI | U+2069 POP DIRECTIONAL ISOLATE | Terminates the range set by RLI, LRI or FSI |
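The isolate controls are used in pairs in the same way as RLE/LRE...PDF. As a rough sketch, assuming a hypothetical helper called `isolate` and an `"auto"` key chosen here to mirror dir="auto", a script could add them like this and check the code points against the Unicode character database:

```python
import unicodedata

# Opening isolate control for each direction; "auto" maps to FSI.
ISOLATE_OPENERS = {"rtl": "\u2067", "ltr": "\u2066", "auto": "\u2068"}
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: str = "auto") -> str:
    """Surround an inline phrase with RLI, LRI or FSI, and a closing PDI."""
    if direction not in ISOLATE_OPENERS:
        raise ValueError("direction must be 'rtl', 'ltr' or 'auto'")
    return ISOLATE_OPENERS[direction] + text + PDI


# List each control with its bidirectional class and official name.
for ch in "\u2067\u2066\u2068\u2069":
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))
```

The lookup relies on Python's `unicodedata` module being built on Unicode 6.3 or later, which is the case for current Python 3 releases.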
You may have noticed that, in addition to changing position, one of the parentheses in the previous example actually changed shape, too. This was completely automatic, and happens because these characters are what are known as mirrored characters in Unicode.
Mirrored characters are usually pairs of characters, such as parentheses, brackets, and the like, whose displayed shape depends on whether they are part of an LTR or an RTL context. You do not have to change the character for the shape to change.
The ends of an opening parenthesis always face in the direction of the text flow, and closing parentheses face the other way.
In the picture below, the parenthesis circled in red faces to the right in the top line and to the left in the bottom line. The only difference between the two lines is that we put a span around the Latin text in the bottom line and set the base direction to ltr. What you are seeing is exactly the same character; we have only changed the markup.
On the top line, the bidi algorithm thinks the closing parenthesis is part of the RTL text, and so it faces right. On the bottom line, the bidi algorithm treats it as a LTR closing parenthesis, so it faces left.
This means that, whether the stored content is in Arabic/Hebrew or Latin script, you would use the same LEFT PARENTHESIS character at the beginning of the parenthesized text. In other words, treat mirrored characters as if the word LEFT in the character name meant 'opening', and RIGHT meant 'closing'.
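You can check which characters behave this way by looking at the mirrored property in the Unicode character database. The following sketch uses Python's `unicodedata` module; the helper name `is_mirrored` and the sample strings are assumptions made for illustration.

```python
import unicodedata


def is_mirrored(ch: str) -> bool:
    """True if Unicode marks this character as mirrored in bidirectional text."""
    return unicodedata.mirrored(ch) == 1


# The same LEFT PARENTHESIS opens parenthesized text in memory, whatever the
# script. The Hebrew letters here (alef, bet) are placeholders.
stored_ltr = "(" + "ab" + ")"
stored_rtl = "(" + "\u05D0\u05D1" + ")"

# Show the name and mirrored property of some brackets and a plain letter.
for ch in "()[]a":
    print(ascii(ch), unicodedata.name(ch), is_mirrored(ch))
```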
Unicode 6.3 introduces to the Bidirectional Algorithm some new rules for handling paired characters, such as brackets and parentheses. These should help to reduce problems in some of the more difficult cases. You don't need to take any action to enable these improvements. It's simply a case of waiting for the browser to implement the new behaviour.
There may be occasions where you don't want the bidi algorithm to do its reordering work at all. In these cases you need some additional markup to surround the text whose order you want left unchanged.
In HTML this is achieved using the inline bdo element. (In other XML applications, such as XHTML2, it may be implemented as a value of lro on the dir attribute, enabling it to be applied to any element.) Again, there are Unicode control characters you could use to achieve the same result, but because they create states with invisible boundaries this is generally not recommended.
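For script developers, generating the override markup might look something like the sketch below. The function name `override` is hypothetical, and the sketch deliberately uses the bdo element rather than the control characters, in line with the recommendation above.

```python
from html import escape


def override(text: str, direction: str = "ltr") -> str:
    """Wrap text in bdo so that it is displayed in a single direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return f'<bdo dir="{direction}">{escape(text)}</bdo>'


# UPPERCASE stands in for right-to-left text, shown here as ordered in memory.
print(override("HEBREW TEXT"))  # <bdo dir="ltr">HEBREW TEXT</bdo>
```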
Examples that show the characters as ordered in memory use the bdo tag to achieve that effect. For example, the picture below shows Hebrew text as ordered in memory.
For the bottom line we would use the following markup in HTML:
Note that the CSS shim described earlier in this article contains code that applies isolation to the