
Bidi space loss

Intended audience: users, XHTML/HTML coders (using editors or scripting), script developers (PHP, JSP, etc.), CSS coders, schema developers (DTDs, XML Schema, RelaxNG, etc.), XSLT developers, Web project managers, and anyone who is new to internationalization and needs guidance on topics to consider and ways to get into the material on the site.

Updated 2003-11-06 09:10:35

Question

Why does my browser collapse spaces between Latin and Arabic/Hebrew text?

Background

In some browsers, spaces between Latin and Arabic/Hebrew text may appear to collapse when the content of an inline element that has a dir attribute ends with white space.

For example, in such browsers the code:

<p dir="rtl"> العالمية <span dir="ltr">(W3C) </span> تخلق قواعد </p>

would produce a result that looks as follows, where the arrow indicates the location of the missing space:

Picture of the result, showing no space to the left of the Latin text.

Note that this effect also occurs when right-to-left text is embedded in a left-to-right passage.

Answer

If the previous section matches what you see, the solution is to remove all white space before the end tag of the inline element, or to remove the dir attribute (if appropriate).

For example, removing the space between (W3C) and </span>:

<p dir="rtl"> العالمية <span dir="ltr">(W3C)</span> تخلق قواعد </p>

would produce a result that looks like:

Picture of the result, showing space on both sides of the Latin text.

Note also that in this example the dir="ltr" attribute on the <span> element around the text (W3C) is not actually needed to produce the correct ordering. Leaving out the attribute, or the whole span element, also solves the problem.
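Where markup is generated by a script, the first fix can be applied automatically. The Python sketch below is one illustrative approach, not a W3C tool: the hypothetical helper strip_space_before_end_tag removes the white space sitting directly before </span> when the opening tag carries a dir attribute, and leaves other spans alone. It assumes the span holds plain text with no nested tags, and it only strips the ASCII white space characters that HTML collapses, so no-break spaces are kept; anything more complex calls for a real HTML parser.

```python
import re

# Opening <span ... dir="ltr|rtl" ...>, then text with no nested tags, then
# the ASCII white space that HTML collapses, then the end tag.
_TRAILING_SPACE_IN_DIR_SPAN = re.compile(
    r'(<span\b[^>]*\sdir=["\'](?:ltr|rtl)["\'][^>]*>[^<]*?)'
    r'[ \t\r\n\f]+'
    r'(</span>)'
)


def strip_space_before_end_tag(html: str) -> str:
    """Remove white space just before </span> in spans that carry dir."""
    return _TRAILING_SPACE_IN_DIR_SPAN.sub(r'\1\2', html)


before = '<p dir="rtl"> العالمية <span dir="ltr">(W3C) </span> تخلق قواعد </p>'
fixed = strip_space_before_end_tag(before)
# fixed == '<p dir="rtl"> العالمية <span dir="ltr">(W3C)</span> تخلق قواعد </p>'
```

Deleting the space, rather than moving it outside the element, is enough here because the text after </span> already begins with a space, exactly as in the corrected example.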

How does it look for me?

The following boxes show code samples followed by an implementation of that code on this page, so that you can test the behavior of your current user agent.

Code: <p dir="rtl"> العالمية <span dir="ltr">(W3C) </span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية <span dir="ltr">(W3C)</span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية <span>(W3C) </span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية <span>(W3C)</span> تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

Code: <p dir="rtl"> العالمية (W3C) تخلق قواعد </p>

العالمية (W3C) تخلق قواعد

By the way

Only read this section if you want the gory details about why this happens.

The expected behavior when the text is displayed is not described in detail in the XHTML/HTML specifications, but is described in recent CSS specifications. Although the examples on this page do not use CSS, the same principles apply. The following is taken from the CSS 2.1 Working Draft:

  1. If 'white-space' is set to 'normal', 'nowrap', or 'pre-line',
    1. every tab (U+0009) is converted to a space (U+0020)
    2. any space (U+0020) following another space (U+0020) — even a space before the inline, if that space also has 'white-space' set to 'normal', 'nowrap' or 'pre-line' — is removed.
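The two sub-steps quoted above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not how any browser implements it: the hypothetical InlineChar type stands for one character of the flattened inline content, tagged with the computed 'white-space' value of the element that owns it, plus its direction (which the rule ignores, but which matters once the text is reordered for display). Only the two quoted sub-steps are modeled; the other white space processing steps in CSS 2.1, such as line-feed handling, are left out.

```python
from typing import NamedTuple

# Values of 'white-space' to which the quoted rule applies.
COLLAPSING = {"normal", "nowrap", "pre-line"}


class InlineChar(NamedTuple):
    ch: str           # the character itself
    white_space: str  # computed 'white-space' of the element that owns it
    direction: str    # 'ltr' or 'rtl'; carried along, not used by the rule


def flatten(runs: list[tuple[str, str, str]]) -> list[InlineChar]:
    """Turn (text, white_space, direction) runs into one character list."""
    return [InlineChar(ch, ws, d) for text, ws, d in runs for ch in text]


def collapse_spaces(chars: list[InlineChar]) -> list[InlineChar]:
    """Apply the two quoted sub-steps to characters in logical order."""
    # Sub-step 1: every tab in a collapsing context becomes a space.
    converted = [
        c._replace(ch=" ") if c.ch == "\t" and c.white_space in COLLAPSING else c
        for c in chars
    ]
    kept = []
    for i, c in enumerate(converted):
        prev = converted[i - 1] if i > 0 else None
        # Sub-step 2: drop a space that follows another space, even across an
        # element boundary, when both spaces have a collapsing 'white-space'.
        if (
            c.ch == " "
            and c.white_space in COLLAPSING
            and prev is not None
            and prev.ch == " "
            and prev.white_space in COLLAPSING
        ):
            continue
        kept.append(c)
    return kept


# <ltr>A <rtl> B </rtl> C</ltr>, with every element at white-space: normal
scenario = flatten([("A ", "normal", "ltr"),
                    (" B ", "normal", "rtl"),
                    (" C", "normal", "ltr")])
kept = collapse_spaces(scenario)
# "".join(c.ch for c in kept) == "A B C"
# [c.direction for c in kept] == ["ltr", "ltr", "rtl", "rtl", "ltr"]
```

The kept space after A belongs to the left-to-right element and the kept space after B belongs to the right-to-left one, which is what later decides where each space is displayed.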

Given the following scenario (where <ltr> and <rtl> stand for inline elements with left-to-right and right-to-left direction, and every space is U+0020):

<ltr>A <rtl> B </rtl> C</ltr>

the spec says that the space after A is kept, the space before B is removed, the space after B is kept, and the space before C is removed. The remaining text is then reordered for display according to the Unicode bidirectional algorithm, and the end result is:

A  BC

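Note that there are two spaces between A and B, and none between B and C! The space after B belongs to the right-to-left run, so it is displayed on the left of B. The embedding levels of the five remaining characters (A, space, B, space, C) can be expressed as follows: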

11221
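The reordering can be reproduced with rule L2 of the Unicode bidirectional algorithm (UAX #9), which, from the highest level down to the lowest odd level, reverses every contiguous run of characters at that level or higher. The Python sketch below assumes the levels have already been resolved and uses the standard UAX #9 numbering, in which a left-to-right paragraph is at level 0 and a right-to-left embedding inside it is at level 1. That gives 0, 0, 1, 1, 0 for A, space, B, space, C: the same nesting as the digits above, shifted down by one. The function name reorder_line is hypothetical.

```python
def reorder_line(text: str, levels: list[int]) -> str:
    """Return the visual order of one line, following UAX #9 rule L2."""
    if len(text) != len(levels):
        raise ValueError("need exactly one embedding level per character")
    chars, lvls = list(text), list(levels)
    odd_levels = [lv for lv in lvls if lv % 2 == 1]
    if not odd_levels:
        return text  # nothing right-to-left, so nothing to reverse
    # From the highest level down to the lowest odd level (including levels
    # that do not occur in the text), reverse each maximal run of characters
    # at that level or higher.
    for level in range(max(lvls), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if lvls[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and lvls[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            lvls[i:j] = lvls[i:j][::-1]
            i = j
    return "".join(chars)


# A, space, B, space, C after white space collapsing (logical order).
visual = reorder_line("A B C", [0, 0, 1, 1, 0])
# visual == "A  BC"  (two spaces between A and B, none between B and C)
```

Only the right-to-left run "B " is reversed, to " B", which places its trailing space next to the space kept after A.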


By: Richard Ishida, W3C.


Content first published 2003-07-04. Last substantive update 2003-11-06 09:10:35 GMT. This version 2011-08-31 9:45 GMT

For the history of document changes, search for qa-bidi-space in the i18n blog.