Intended audience: HTML coders, script developers (PHP, JSP, etc.), Web project managers, browser implementers, and anyone who needs to understand why spaces are apparently dropped in bidi text.
Why does my browser collapse spaces between Latin and Arabic/Hebrew text, and how can I fix it?
Spaces between Latin and Arabic/Hebrew text may appear to collapse if text is followed by white space inside an inline element that has a dir attribute.
Here is an example, where the colored rectangle indicates the problem space.
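The markup below is a minimal sketch of this pattern, not a copy of any particular page. It assumes a right-to-left paragraph, and the Hebrew words are placeholders chosen only for illustration. The problem space is the one between W3C and the end tag of the span.

```html
<!-- Hypothetical example: the Hebrew words are placeholder text -->
<!-- Note the space after W3C, inside the span, just before </span> -->
<p dir="rtl">שלום <span dir="ltr">W3C </span>עולם</p>
```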
Code such as the above would produce the following result.
Note that this effect also occurs when right-to-left text is embedded in a left-to-right passage.
If your markup matches the pattern described in the previous section, the solution is to remove all white space before the end tag of the inline element, or to remove the dir attribute (if appropriate).
For example, moving the space after W3C outside the span, so that it follows the end tag, would produce a result that looks as expected.
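A sketch of that change might look as follows. It assumes a right-to-left paragraph, and the Hebrew words are placeholders used only for illustration.

```html
<!-- Hypothetical example: the Hebrew words are placeholder text -->
<!-- The space now sits after </span>, in the right-to-left context -->
<p dir="rtl">שלום <span dir="ltr">W3C</span> עולם</p>
```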
In this instance the span element around the text W3C is not actually needed to produce the correct ordering. Leaving out the attribute or the whole span element will also solve the problem (although we generally recommend marking up all opposite-direction text).
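Those two alternatives could be sketched like this, again assuming a right-to-left paragraph with placeholder Hebrew words. In both cases the trailing space is left where it was, since it no longer sits inside an element with a dir attribute.

```html
<!-- Hypothetical example: the Hebrew words are placeholder text -->

<!-- Alternative 1: keep the span, but leave out the dir attribute -->
<p dir="rtl">שלום <span>W3C </span>עולם</p>

<!-- Alternative 2: leave out the span element altogether -->
<p dir="rtl">שלום W3C עולם</p>
```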
Only read this section if you want the technical details about why this happens.
The expected behavior when the text is displayed is not described in detail in the HTML specifications, but is described in CSS specifications. Although the examples on this page do not use CSS, the same principles apply. The following is taken from the CSS Text Module Level 3 Working Draft:
Any space immediately following another collapsible space—even one outside the boundary of the inline containing that space, provided they are both within the same inline formatting context—is collapsed to have zero advance width. (It is invisible, but retains its soft wrap opportunity, if any.)
Given a scenario as follows, where the colors represent spaces (U+0020):
<ltr>A <rtl> B </rtl> C</ltr>
the spec says that the space after A is kept, the space before B is removed, the space after B is kept, and the space before C is removed. This leaves us with:
<ltr>A <rtl>B </rtl>C</ltr>
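The ltr and rtl tags above are shorthand rather than real elements. Assuming they stand for elements carrying dir attributes, one hypothetical way to write the same scenario in HTML, before and after the collapsing rule is applied, is:

```html
<!-- As authored: a space on each side of both span boundaries -->
<p dir="ltr">A <span dir="rtl"> B </span> C</p>

<!-- Effective text after collapsing: the second space of each pair is dropped -->
<p dir="ltr">A <span dir="rtl">B </span>C</p>
```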
This is then rendered according to the Unicode bidirectional algorithm, and the end result is:
Note that there are actually two spaces between A and B. The embedding levels can be expressed as follows:
The following boxes show code samples followed by an implementation of that code on this page, so that you can test the behavior of your current browser. The surrounding context is right-to-left for all examples.