Why does my browser collapse spaces between Latin and Arabic/Hebrew text, and how can I fix it?
Right-to-left text in code samples is represented here by UPPERCASE TRANSLATIONS, and left-to-right text by lowercase.
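Spaces between Latin and Arabic/Hebrew text may appear to collapse if text is followed by white space inside an inline element that has a dir attribute setting the opposite direction, and more white space immediately follows that element's end tag.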
In the following code pattern, the problem space is the one immediately before the end tag of the inline element. (The uppercase letters represent RTL text, and the lowercase content is LTR.)
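Here is a minimal markup sketch of this pattern. It assumes a right-to-left paragraph that contains a left-to-right span. The p and span elements and the placeholder text are illustrative and are not taken from a real page.

```html
<!-- Problem: the span's content ends with a space, and another space follows </span>. -->
<!-- The space before </span> is kept and the one after it collapses, so the only -->
<!-- visible space ends up on the wrong side of the left-to-right text. -->
<p dir="rtl">UPPERCASE TEXT <span dir="ltr">lowercase text </span> MORE UPPERCASE TEXT</p>
```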
Code such as the above would produce the following result, if we substitute Arabic and English for the content.
Note that this effect also occurs when right-to-left text is embedded in a left-to-right passage.
If the previous section describes the look of your code, the solution is to remove all white space before the end tag of the inline element, or to remove the dir attribute (if appropriate).
Here is the new pattern:
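Here is a minimal sketch of the corrected pattern. It assumes a right-to-left paragraph with an embedded left-to-right span and placeholder text. No white space remains before the end tag, and a single space after the end tag separates the two directions.

```html
<!-- Fixed: no white space before </span>; one space after it separates the runs -->
<p dir="rtl">UPPERCASE TEXT <span dir="ltr">lowercase text</span> MORE UPPERCASE TEXT</p>
```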
For example, moving the space after W3C outside the span in the real example above would produce a result that looks as expected.
In this instance the span element around the text W3C is not actually needed to produce the correct ordering. Leaving out the dir attribute or the whole span element will also solve the problem (although we generally recommend marking up all opposite-direction text).
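The two alternatives can be sketched as follows, with uppercase placeholders standing in for the Arabic text. Neither version creates a separate left-to-right embedding. The bidirectional algorithm therefore resolves the space after W3C to the paragraph direction, and the space stays between W3C and the text that follows.

```html
<!-- Alternative 1: keep the span (for example, as a styling hook) but omit dir -->
<p dir="rtl">ARABIC TEXT <span>W3C </span> MORE ARABIC TEXT</p>

<!-- Alternative 2: omit the span altogether -->
<p dir="rtl">ARABIC TEXT W3C MORE ARABIC TEXT</p>
```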
Only read this section if you want the technical details about why this happens.
The expected behavior when the text is displayed is not described in detail in the HTML specifications, but is described in CSS specifications. Although the examples on this page do not use CSS, the same principles apply. The following is taken from the CSS Text Module Level 3 Working Draft:
Any space immediately following another collapsible space—even one outside the boundary of the inline containing that space, provided they are both within the same inline formatting context—is collapsed to have zero advance width. (It is invisible, but retains its soft wrap opportunity, if any.)
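Because this collapsing rule comes from CSS, the same effect can be sketched with CSS properties instead of a dir attribute. The sketch assumes the embedding is created with direction plus unicode-bidi: isolate, which is roughly what browsers apply to an element that carries a dir attribute. The inline style attributes are there only to keep the example self-contained.

```html
<!-- The collapsing rule applies in the same way when the embedding comes from CSS -->
<p style="direction: rtl">UPPERCASE TEXT <span style="direction: ltr; unicode-bidi: isolate">lowercase text </span> MORE UPPERCASE TEXT</p>
```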
Given a scenario as follows, where each space is an ordinary space character (U+0020):
<ltr>a <rtl> B </rtl> c</ltr>
the collapsing rule means that the space after a is kept, the space before B is collapsed, the space after B is kept, and the space before c is collapsed, which leaves us with:
<ltr>a <rtl>B </rtl>c</ltr>
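In real HTML, this scenario corresponds to markup along the following lines. The sketch assumes that the ltr and rtl placeholders stand for elements whose dir attribute sets that direction.

```html
<!-- As written: spaces on both sides of each boundary of the right-to-left span -->
<p dir="ltr">a <span dir="rtl"> B </span> c</p>

<!-- Renders like this once the collapsed spaces are given zero width -->
<p dir="ltr">a <span dir="rtl">B </span>c</p>
```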
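This is then rendered according to the Unicode bidirectional algorithm. The space after B is inside the right-to-left embedding, so it is reordered along with B and displayed to the left of B, beside the space that follows a. Nothing visible is left between B and c.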
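Note that there are actually two spaces between a and B. The embedding levels are as follows: a and the space after it are at level 0; B and the space after it, both inside the right-to-left embedding, are at level 1; c is at level 0.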
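To check how a particular browser behaves, you can save a small standalone test page like the one below as a UTF-8 file and open it locally. This is only a sketch, and it makes a few choices worth noting:

- It uses real Arabic text, because Latin uppercase placeholders are left-to-right characters and would not reproduce the effect.
- The html element sets a right-to-left context.
- The orange outline is an arbitrary styling choice. It shows where each span's box ends, including any trailing space.

```html
<!DOCTYPE html>
<html lang="ar" dir="rtl">
<head>
<meta charset="utf-8">
<title>Space collapsing between opposite-direction text</title>
<style>
  /* Outline each span so the position of its trailing space is visible */
  span { outline: 1px solid orange; }
</style>
</head>
<body>
  <!-- 1. Problem: space before </span> plus another space after it -->
  <p>نص عربي <span dir="ltr">W3C </span> نص آخر</p>

  <!-- 2. Fix: no space before </span> -->
  <p>نص عربي <span dir="ltr">W3C</span> نص آخر</p>

  <!-- 3. Alternative: the span has no dir attribute -->
  <p>نص عربي <span>W3C </span> نص آخر</p>

  <!-- 4. Right-to-left text embedded in a left-to-right paragraph -->
  <p dir="ltr">some text <span dir="rtl">نص عربي </span> more text</p>
</body>
</html>
```

In cases 1 and 4, a browser that follows the rules described above should show two spaces on one side of the embedded text and none on the other. Cases 2 and 3 should show a single space on each side.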