Bidi space loss

Intended audience: HTML coders, script developers (PHP, JSP, etc.), Web project managers, browser implementers, and anyone who needs to understand why spaces are apparently dropped in bidi text.

Question

Why does my browser collapse spaces between Latin and Arabic/Hebrew text, and how can I fix it?

Answer

A likely cause

Right-to-left text in code samples is represented here by UPPERCASE TRANSLATIONS, and left-to-right text by lowercase.

Spaces between Latin and Arabic/Hebrew text may appear to collapse if the text inside an inline element that has a dir attribute is followed by white space before the element's end tag.

In the following code pattern, the problem space is the one immediately before the </span> end tag.

<p dir="rtl">RTL_TEXT <span dir="ltr">ltr_text </span>RTL_TEXT</p>

Code such as the above would produce the following result, if we substitute Arabic and English for the content.

Picture of the result, showing no space to left of Latin text.

Note that this effect also occurs when right-to-left text is embedded in a left-to-right passage.

How to fix it

If your code matches the pattern described in the previous section, the solution is to remove all white space before the end tag of the inline element, or to remove the dir attribute (if appropriate).

Here is the new pattern:

<p dir="rtl">RTL_TEXT <span dir="ltr">ltr_text</span> RTL_TEXT</p>

For example, moving the space after W3C outside the span in the real example above would produce a result that looks as expected.

Picture of the result, showing space on both sides of Latin text.

In this instance the span element around the text W3C is not actually needed to produce the correct ordering. Leaving out the attribute or the whole span element will also solve the problem (although we generally recommend marking up all opposite-direction text).
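Where markup is generated by a script, the first fix in this section (moving the space outside the inline element) could be automated. The following is a minimal Python sketch, not a general HTML rewriter: the function name move_trailing_space_out is hypothetical, and it assumes simple span elements that carry a dir attribute and contain only text (no nested tags).

```python
import re

# Matches a <span ... dir=...> start tag, its text content, the white space
# just before the end tag, and the end tag itself.
# Assumption: the span contains only text, so [^<]* never has to cross a tag.
_TRAILING_SPACE = re.compile(
    r"(<span\b[^>]*\sdir\s*=[^>]*>[^<]*?)(\s+)(</span>)",
    re.IGNORECASE,
)


def move_trailing_space_out(markup: str) -> str:
    """Move white space that precedes </span> to just after the end tag."""
    # \1 = start tag plus text, \2 = trailing white space, \3 = end tag.
    return _TRAILING_SPACE.sub(r"\1\3\2", markup)


problem = '<p dir="rtl">RTL_TEXT <span dir="ltr">ltr_text </span>RTL_TEXT</p>'
print(move_trailing_space_out(problem))
# <p dir="rtl">RTL_TEXT <span dir="ltr">ltr_text</span> RTL_TEXT</p>
```

Spans without a dir attribute are left untouched, since they do not start a new embedding level and so do not show the problem.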

Additional information

Why does this happen?

Only read this section if you want the technical details about why this happens.

The expected behavior when the text is displayed is not described in detail in the HTML specifications, but is described in CSS specifications. Although the examples on this page do not use CSS, the same principles apply. The following is taken from the CSS Text Module Level 3 Working Draft:

Any space immediately following another collapsible space—even one outside the boundary of the inline containing that space, provided they are both within the same inline formatting context—is collapsed to have zero advance width. (It is invisible, but retains its soft wrap opportunity, if any.)
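The quoted rule can be illustrated with a short Python sketch. It is a simplification made for this page, not the full CSS white space processing model: the function name collapse_spaces is hypothetical, only U+0020 is handled (no tabs or line breaks), spaces at the start or end of a line are ignored, and tags are assumed not to contain a ">" inside an attribute value.

```python
import re


def collapse_spaces(markup: str) -> str:
    """Drop every space that directly follows another space, looking through tags."""
    # The capturing group makes re.split keep the tags:
    # even indexes hold text, odd indexes hold tags.
    pieces = re.split(r"(<[^>]*>)", markup)
    out = []
    prev_was_space = False
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            out.append(piece)  # a tag: copied as-is, does not reset the state
            continue
        for ch in piece:
            if ch == " ":
                if prev_was_space:
                    continue  # collapsed: it follows a space, possibly across a tag
                prev_was_space = True
            else:
                prev_was_space = False
            out.append(ch)
    return "".join(out)


before = 'RTL_TEXT <span dir="ltr">ltr_text </span> RTL_TEXT'
print(collapse_spaces(before))
# RTL_TEXT <span dir="ltr">ltr_text </span>RTL_TEXT
```

Here the space after the end tag is the one that collapses, because it follows the space inside the span. The space that survives is the one inside the left-to-right element, which is the problem pattern described earlier on this page.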

Given a scenario as follows, where each gap between a letter and a tag is a single space (U+0020):

<ltr>a <rtl> B </rtl> c</ltr>

the spec says that the space after a is kept, the space before B is removed (it immediately follows the space after a), the space after B is kept, and the space before c is removed (it immediately follows the space after B), which leaves us with:

<ltr>a <rtl>B </rtl>c</ltr>

This is then rendered according to the Unicode bidirectional algorithm, and the end result is:

a  Bc

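Note that there are actually two spaces between a and B, and none between B and c. The space kept after B belongs to the right-to-left run, so it is displayed to the left of B, next to the space after a. The embedding levels of the five remaining characters (a, space, B, space, c) can be expressed as follows: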

00110
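To see how those embedding levels lead to the displayed result, the reordering step can be sketched in Python. This is only the reversal step of the Unicode bidirectional algorithm (rule L2), with the levels supplied by hand as a string of digits; it does not resolve levels from the text. The names reorder_visual and level_digits are hypothetical.

```python
def reorder_visual(text: str, level_digits: str) -> str:
    """Reorder characters for display, given one embedding level digit per character."""
    if len(text) != len(level_digits):
        raise ValueError("need exactly one level digit per character")
    levels = [int(d) for d in level_digits]
    chars = list(text)
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        return text  # nothing is right-to-left, so nothing moves
    # From the highest level down to the lowest odd level, reverse every
    # contiguous run of characters that are at that level or higher.
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)


# The collapsed text is a, space, B, space, c with levels 0, 0, 1, 1, 0.
print(repr(reorder_visual("a B c", "00110")))
# 'a  Bc'
```

The only run at level 1 is "B" plus the space after it. Reversing that run puts the space before B, which is why both spaces end up between a and B.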

What happens in my browser?

The following boxes show code samples, each followed by a live rendering of that code on this page, so that you can test the behavior of your current browser. The surrounding context is right-to-left for all examples. In each code sample, note where the space characters fall relative to the span tags.

ARABIC <span dir="ltr">latin </span>ARABIC

صفحة الترجمة لموقع W3C على الرابط

ARABIC <span dir="ltr">latin </span> ARABIC

صفحة الترجمة لموقع W3C على الرابط

ARABIC <span dir="ltr">latin</span> ARABIC

صفحة الترجمة لموقع W3C على الرابط

ARABIC <span>latin </span>ARABIC

صفحة الترجمة لموقع W3C على الرابط

ARABIC<span dir="ltr"> latin</span> ARABIC

صفحة الترجمة لموقع W3C على الرابط