Styling using language attributes

Question

What is the most appropriate way to associate CSS styles with text in a particular language in a multilingual HTML or XML document?

Presentation styles are commonly used to control changes in fonts, font sizes and line heights when language changes occur in the document. This can be particularly useful when dealing with Simplified versus Traditional Chinese, where users tend to prefer different fonts, even though they may be using many of the same characters. It can also be useful to better harmonize the look of mixed, script-specific fonts, such as when mixing Arabic and Latin fonts.

This page looks at available options for doing this most effectively.

Quick answer

The best way to style content by language in HTML is to use the :lang selector in your CSS style sheet. For example:

:lang(ta) {
    font-family: Latha, "Tamil MN", serif;
    font-size: 120%;
}

The rest of the article adds some detail about :lang, and compares it with two other approaches.

Alternatives

Three CSS selectors are commonly used to apply styles where the language changes in a document.

  1. [lang = "..."]
  2. [lang |= "..."]
  3. :lang()

All match the value of a lang attribute in HTML, and all are supported by major browsers (see the test results).

The [lang="..."] selector

Use this selector to style an element where the lang value exactly matches that in the selector.

The following CSS:

*[lang="zh"] {font-family: Kaiti, Kai, serif;}

will style the span element below:

<p>"This is <em>English</em>" translates as <span lang="zh">这是<em>英文</em></span>.</p>

However, it will not match a span element with the lang value of zh-Hans. The attribute value has to match the selector value exactly.
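With exact matching, every tag used in the document needs its own selector. A minimal sketch, assuming the document only uses the three tags listed (the list is illustrative, not exhaustive):

```css
/* Exact matching: each lang value has to be listed separately. */
[lang="zh"],
[lang="zh-Hans"],
[lang="zh-Hant"] {
    font-family: Kaiti, Kai, serif;
}
```

Any tag missing from the list, such as zh-Hans-CN, would still be left unstyled.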

The [lang|="..."] selector

Use this selector to style an element where the lang value starts with the value in the selector.

The following CSS:

*[lang|="zh"] {font-family: Kaiti, Kai, serif;}

will style the span element below:

<p>"This is <em>English</em>" translates as <span lang="zh-Hans">这是<em>英文</em></span>.</p>

In fact, it will match any element with a lang value that starts with the zh language subtag, including zh, zh-Hant, zh-TW, zh-Hans-CN, etc.

Inheritance of language values

A significant difference between :lang and the other methods is that it recognizes the language of the content of an element even if the language is declared outside the element in question.

Suppose, for example, that in an English document containing Japanese text you wanted to style emphasized Japanese text using the CSS text-emphasis properties, rather than italicization (which doesn't always work well with the complex characters of Japanese). You might have the following rules in your style sheet:

em { font-style: italic; }
em:lang(ja)  { font-style: normal; text-emphasis: dot; text-emphasis-position: over right; }

Now assume that you have the following content, that the user agent supports :lang, and that the html tag states that this is an English document.

<p>This is <em>English</em>, but <span lang="ja">これは<em>日本語</em>です。</span></p>

You would expect to see the emphasized English word italicized, but the emphasized Japanese word in regular text with small dots above each character, something like this:

[Figure: the sample sentence, with the English word in italics and the Japanese word upright with a small dot above each character.]

The important point is that this would not be possible using the [lang|="..."] or [lang="..."] selectors, because they only match elements that carry the lang attribute themselves. For those to work you would have to declare the language explicitly on each Japanese em tag.

This is a significant difference between the usefulness of these different selectors.
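For comparison, here is roughly what the attribute-selector version of the same rules might look like. This is a sketch that reuses the properties from the :lang rule above; it only takes effect if the markup is changed as described in the comment.

```css
em { font-style: italic; }

/* Matches <em lang="ja">...</em> only. An em that merely sits inside
   <span lang="ja"> has no lang attribute of its own, so it is not matched. */
em[lang|="ja"] {
    font-style: normal;
    text-emphasis: dot;
    text-emphasis-position: over right;
}
```

With these rules, the sample sentence would need <em lang="ja">日本語</em> for the dots to appear.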

Which language attribute?

The lang attribute is used to identify the language of text served as HTML. Text served as XML should use the xml:lang attribute.

For XHTML that is served as text/html, it is recommended that you use both attributes, since the HTML parser will pick up on the lang attribute, whereas if you parse the content as XML the xml:lang attribute will be used by your XML parser.

Most of this article discusses the options for styling by language in HTML, using the lang attribute. A separate section, Using CSS selectors in XML with xml:lang, describes how to style XML documents based on xml:lang.

The :lang(...) pseudo-class selector

The HTML fragment:


<p>It is polite to welcome people in their own language:</p>
<ul>
    <li lang="zh-Hans">欢迎</li>
    <li lang="zh-Hant">歡迎</li>
    <li lang="el">Καλοσωρίσατε</li>
    <li lang="ar">اهلا وسهلا</li>
    <li lang="ru">Добро пожаловать</li>
    <li lang="din">Kudual</li>
</ul>

could have the following styling:

body 		{font-family: "Times New Roman",serif;}
:lang(ar) 	{font-family: "Scheherazade",serif; 
                 font-size: 120%;}
:lang(zh-Hant) 	{font-family: Kai,KaiTi,serif;}
:lang(zh-Hans) 	{font-family: DFKai-SB,BiauKai,serif;}
:lang(din) 	{font-family: "Doulos SIL",serif;}

The Greek and Russian use the styling set for the body element.

This is the ideal way to style language fragments, because it is the only selector that can apply styling to the content of an element when the language of that content is declared on an ancestor element rather than on the element itself.

A rule for :lang(zh) would match elements with a language value of zh. It would also match more specific language specifications such as zh-Hant, zh-Hans and zh-TW.

The selector :lang(zh-Hant) will only match elements that have a language value of zh-Hant (or a longer tag that begins with zh-Hant-, such as zh-Hant-TW), whether declared on the element or inherited. If the CSS rule specified :lang(zh-TW), the rule would not match any item in our sample list.
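The two levels of matching can be combined, with a general rule for all Chinese content and a narrower one for Traditional Chinese. The following is only a sketch: the line-height value is a hypothetical choice, not taken from the examples above.

```css
/* Matches zh, zh-Hans, zh-Hant, zh-TW and any other tag starting with zh- */
:lang(zh) {
    line-height: 1.8; /* hypothetical value, for illustration only */
}

/* Matches zh-Hant and longer tags such as zh-Hant-TW, but not zh-TW */
:lang(zh-Hant) {
    font-family: Kai, KaiTi, serif;
}
```

Because the two rules set different properties, a zh-Hant list item would pick up both the line height and the font.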

A [lang|="..."] selector that matches the beginning of a value of an attribute

For the markup example we saw in the previous section, the style sheet could be written as:

body 		   {font-family: "Times New Roman",serif;}
*[lang|="ar"] 	   {font-family: "Scheherazade",serif; 
                    font-size: 120%;}
*[lang|="zh-Hant"] {font-family: Kai,KaiTi,serif;}
*[lang|="zh-Hans"] {font-family: DFKai-SB,BiauKai,serif;}
*[lang|="din"]     {font-family: "Doulos SIL",serif;}

Unlike :lang, this selector will only work for elements which carry a lang attribute (see Inheritance of language values).

There is a significant difference between this selector and [lang="..."]. Whereas [lang="..."] will only match elements when the selector value and the attribute value are identical, this selector will also match a language attribute value that has additional hyphen-separated subtags. Therefore the selector [lang|="sl"] would match sl-IT, sl-nedis or sl-IT-nedis, and the selector [lang|="zh-Hans"] would also match zh-Hans-CN.

Generic class or id selectors

This method avoids the need to match the language declarations at all, and relies on class or id attribute markup. Using an ordinary CSS class or id selector works with most browsers that support CSS. The disadvantage is that adding the attributes takes up time and bandwidth.

For the markup example above, this would require us to change the HTML code by adding class attributes as follows:


<p>It is polite to welcome people in their own language:</p>
<ul>
    <li class="zhs" lang="zh-Hans">欢迎</li>
    <li class="zht" lang="zh-Hant">歡迎</li>
    <li class="el" lang="el">Καλοσωρίσατε</li>
    <li class="ar" lang="ar">اهلا وسهلا</li>
    <li class="ru" lang="ru">Добро пожаловать</li>
    <li class="din" lang="din">Kudual</li>
</ul>

We could then have the following styling:

body 	{font-family: "Times New Roman",serif; }
.ar 	{font-family: "Scheherazade",serif; 
         font-size: 120%;}
.zht 	{font-family: Kai,KaiTi,serif;}
.zhs 	{font-family: DFKai-SB,BiauKai,serif;}
.din	{font-family: "Doulos SIL",serif;}

Using CSS selectors in XML with xml:lang

As mentioned earlier, in a document that is parsed as XML you need to use the xml:lang attribute (rather than the lang attribute) to express language information.

Using :lang

Use of :lang is straightforward. If the document is parsed as HTML, the :lang selector will match content where the language was defined using a lang attribute value. However, if the document is parsed as XML, the :lang selector will match content labeled with an xml:lang attribute value and ignore any lang attribute value.

Using attr= and attr|=

Use of these selectors involves some additional considerations.

The xml: part of the xml:lang attribute indicates that this is the lang attribute used in the XML namespace. CSS3 Namespaces describes how to handle xml:lang as an attribute in a namespace. Basically you need to declare the namespace and then replace the colon with a vertical bar. For example:

@namespace xml "http://www.w3.org/XML/1998/namespace";
*[xml|lang |= 'ar'] { ... }

or:

@namespace xml "http://www.w3.org/XML/1998/namespace";
*[xml|lang = 'ar'] { ... }

Any @namespace rules must follow all @charset and @import rules and precede all other non-ignored at-rules and rule sets in a style sheet. Note, also, that the URI for the namespace declaration must be exactly correct.
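The ordering requirement might look like this in practice. This is a sketch only: the imported file name base.css is hypothetical, and the font rule is borrowed from the earlier Arabic example.

```css
@charset "utf-8";
@import url("base.css"); /* hypothetical file name */

/* The namespace declaration comes after @charset and @import,
   and before any style rules. Note the terminating semicolon. */
@namespace xml "http://www.w3.org/XML/1998/namespace";

*[xml|lang|="ar"] {
    font-family: "Scheherazade", serif;
    font-size: 120%;
}
```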

Fallbacks

For browsers that are not namespace aware, you can fall back to escaped characters. For this you need no @namespace declaration, just one of the following:

*[xml\:lang |= '..'] { ... }

or:

*[xml\:lang = '..'] { ... }

Note, however, that if you try to use this approach with a namespace-aware browser (i.e. most recent major browsers), it will not work, so if you feel it is needed, you should use this approach in addition to the namespace-based selectors.
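One way to combine the two approaches is sketched below, assuming the same Arabic font rule as in the earlier examples. The selectors are kept in separate rules rather than one comma-separated list, because a browser that cannot parse one selector in a list discards the whole rule.

```css
@namespace xml "http://www.w3.org/XML/1998/namespace";

/* Used by namespace-aware browsers */
*[xml|lang|="ar"] {
    font-family: "Scheherazade", serif;
}

/* Fallback for browsers that are not namespace aware */
*[xml\:lang|="ar"] {
    font-family: "Scheherazade", serif;
}
```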

By the way

I have used the language codes zh-Hant and zh-Hans. These language codes do not represent specific languages. zh-Hant would indicate Chinese written in Traditional Chinese script. Similarly zh-Hans represents Chinese written in Simplified Chinese script. This could refer to Mandarin or many other Chinese languages.

Until the zh-Hans and zh-Hant language tags were available, the codes zh-TW and zh-CN were used to indicate Traditional and Simplified versions of Chinese writing, respectively. This is not actually appropriate because zh-TW indicates the Chinese language spoken in Taiwan, although more than one Chinese language is spoken there. Similarly zh-CN really indicates a generic Chinese spoken language used in China (PRC), rather than Simplified Chinese writing. It could refer to Mandarin or any other Chinese language. The same code was also used incorrectly for the Simplified Chinese written in Singapore.

If you need to use language tags to differentiate between Chinese languages, the IANA language subtag registry has more precise language codes for a range of Chinese languages. For more information see Language tags in HTML and XML.