Why use the language attribute?

Details

We list here a few of the ways that language information is useful at the moment, however, as specifications and browsers evolve in the future there could be numerous additional applications for language information.

Styling pages

Language attributes allow you to vary the styling of your content by language. For more information about how to do this, see Styling using the lang attribute.

For example, fonts or line spacing may need to change to accommodate different alphabets, style-generated quotation marks may need to be different by language, emphasis may need to be expressed in language dependent ways, etc.

The following example shows how you could set a specific font for embedded Arabic text in a page.

body {
    font-family: "Palatino Linotype", "Book Antiqua", Palatino, serif;
    }
:lang(ar) {
    font-family: "Traditional Arabic", "Al Bayan", serif;}

Another example of language-dependent behavior is hyphenation. Hyphenation rules are very language dependent. The description of the hyphens property in CSS (which at the time of writing is just starting to see adoption by browsers) says "Correct automatic hyphenation requires a hyphenation resource appropriate to the language of the text being broken. The UA is therefore only required to automatically hyphenate text for which the author has declared a language (e.g. via HTML lang or XML xml:lang) and for which it has an appropriate hyphenation resource."

The same German text is displayed in two narrow columns side by side — German hyphenation rules are applied when `lang="de"` is specified, breaking long compound words properly and improving text flow. (Live test.)

Other typographic and layout features that are affected by language include line-breaking, justification, and case conversion, and more are coming as the specifications develop.

Font selection

User-agents can (and do) use language information to select language-appropriate fonts, which improves the overall user experience of the page.

For example, in a page encoded in Unicode, text in Simplified Chinese, Traditional Chinese, Japanese, and Korean languages may share the same code point for an ideographic character, but speakers of these languages expect the glyphs used to vary in small details from language to language. In the absence of explicit styling applied by the content author, some browsers automatically assign appropriate fonts according to the language of the content. The illustration in the picture below shows the effect on text of changing nothing but the language attribute value in a browser such as Firefox or Internet Explorer.

The glyphs for the Han character for snow (雪) with in different scripts (en, ko, zh-Hant, zh-Hans, and ja).

Search

Although automatic language detection is commonly used by major search engines to identify the language of resources, page internal markup can be used to improve the quality of search results based on the user's linguistic preferences.

Spelling and grammar checkers

Authoring tools can adapt spelling and grammar checking based on the language of the content, or ignore content that is not in the language of the spelling checker. This can significantly improve efficiency when checking spelling.

Browsers have also recently begun to allow users to check the spelling of the text they type into forms or elements with the contenteditable attribute set. A browser that takes into account information about the language of the content can provide a more effective user experience.

Translation

Translation tools can use the language attributes to recognize pages or sections of text in a particular language and automatically adjust the workflow process or protect text from changes by the translator in translation tools.

Non-text readers

Language information assists speech synthesizers and Braille translators to produce usable results. These applications need to know whether they can produce output from the text, or whether perhaps they need to switch to a different language mode.

Language tagging is recommended by the W3C Web Accessibility Guidelines, which is enforced by governmental policies in some countries, eg. UK - Disability Discrimination Act (UK).

Parsers and scripts

Tagging content with language information also allows for language-specific processing.

For example, a script or XSLT style sheet could be used to do various things, including:

extract language-specific text from a page
look for and select information, from pages that are in a particular language
reorder content in the appropriate way for a given language (sort orders are very language dependent)
apply culture-specific styling, such as appropriate quote substitution or emphasis, during conversion to another format, such as XSL-FO.

Bear in mind that when you create the information you do not always know how people will want to process your information later.

Why use the language attribute?

Question

Quick answer