Why use the language attribute?

Question

Why should I use the language attribute in web pages?

The lang attribute (or sometimes the xml:lang attribute) specifies the natural language of the content of a web page. A language attribute on the html tag sets the language for all the text on the page. If part of the page is in a different language, you can add a language attribute with a different value to the element that surrounds that content. For information about how to use language attributes, see Declaring language in HTML.

Quick answer

Identifying the language of your content allows a number of things to happen automatically, from changing the look and behavior of a page, to extracting information, to changing the way that an application works. Some language-aware applications work at the level of the document as a whole; others work on appropriately labeled document fragments.

It is best to add language information to your content now in order to be able to reap the benefits when new developments arise. It is simple to do when creating content, but more difficult to retrofit later.

Details

We list here a few of the ways that language information is useful at the moment. As specifications and browsers evolve, there could be numerous additional applications for language information.

Styling pages

Language attributes allow you to vary the styling of your content by language. For more information about how to do this, see Styling using the lang attribute.

For example, fonts or line spacing may need to change to accommodate different scripts, style-generated quotation marks may need to differ by language, and emphasis may need to be expressed in language-dependent ways.

The following example shows how you could set a specific font for embedded Arabic text in a page.

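/* Default font stack for the page as a whole */
body {
    font-family: "Palatino Linotype", "Book Antiqua", Palatino, serif;
}
/* Any element whose declared language is Arabic gets an Arabic font */
:lang(ar) {
    font-family: "Traditional Arabic", "Al Bayan", serif;
}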
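The style-generated quotation marks mentioned above can be handled in a similar way. The following is a minimal sketch, assuming that quotations are marked up with the q element; the choice of French and German, and the particular quotation characters, are illustrative assumptions and not part of the original example.

```css
/* Default (English-style) quotation marks: outer pair, then nested pair */
q {
    quotes: "\201C" "\201D" "\2018" "\2019";
}
/* French: guillemets separated from the text by no-break spaces */
q:lang(fr) {
    quotes: "\00AB\00A0" "\00A0\00BB" "\201C" "\201D";
}
/* German: low opening mark, high closing mark */
q:lang(de) {
    quotes: "\201E" "\201C" "\201A" "\2018";
}
/* Generate the marks around each quotation */
q::before { content: open-quote; }
q::after  { content: close-quote; }
```

Because :lang() takes inherited language into account, a q element inside a paragraph marked lang="fr" picks up the French marks without needing its own attribute.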

Another example of language-dependent behavior is hyphenation, for which the rules differ considerably from one language to another. The description of the hyphens property in CSS (which at the time of writing is just starting to see adoption by browsers) says "Correct automatic hyphenation requires a hyphenation resource appropriate to the language of the text being broken. The UA is therefore only required to automatically hyphenate text for which the author has declared a language (e.g. via HTML lang or XML xml:lang) and for which it has an appropriate hyphenation resource."
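As a rough illustration of the property just quoted, automatic hyphenation might be requested as follows. This is a sketch under stated assumptions: the p selector and the per-language override are hypothetical, and whether any text is actually hyphenated depends on the language having been declared and on the browser having a hyphenation resource for it.

```css
/* Ask the browser to hyphenate paragraph text automatically where it can.
   This is only expected to work where the language of the text has been
   declared, for example with a lang attribute on the html tag. */
p {
    hyphens: auto;
}

/* Hypothetical override: for text declared as German, break words only at
   hyphenation opportunities the author has marked (e.g. soft hyphens). */
p:lang(de) {
    hyphens: manual;
}
```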

Other typographic and layout features that are affected by language include line-breaking, justification, and case conversion, and more are coming as the specifications develop.
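Case conversion and line-breaking can be sketched in the same way. The selectors and language choices below are assumptions made for illustration; the actual results depend on how fully a given browser implements the language-specific rules.

```css
/* Case conversion is language-sensitive. In text declared as Turkish
   (lang="tr"), a conforming browser is expected to uppercase "i" to the
   dotted capital "İ"; in text declared as English it becomes "I". */
h1 {
    text-transform: uppercase;
}

/* Line-breaking strictness: the precise set of rules applied may vary
   with the declared language of the content. */
:lang(ja) {
    line-break: strict;
}
```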

Font selection

User agents can (and do) use language information to select language-appropriate fonts, which improves the overall user experience of the page.

For example, in a page encoded in Unicode, text in Simplified Chinese, Traditional Chinese, Japanese, and Korean languages may share the same code point for an ideographic character, but speakers of these languages expect the glyphs used to vary in small details from language to language. In the absence of explicit styling applied by the content author, some browsers automatically assign appropriate fonts according to the language of the content. The illustration in the picture below shows the effect on text of changing nothing but the language attribute value in a browser such as Firefox or Internet Explorer.
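Where a content author prefers to set fonts explicitly and not rely on browser defaults, the same language information can drive the choice. The sketch below assumes that the named fonts are available to the reader; the font names are illustrative and do not come from the original article.

```css
/* One font stack per language, so that shared ideographic code points
   are drawn with the glyph shapes readers of that language expect.
   Font names are assumptions; substitute fonts you know are available. */
:lang(zh-Hans) { font-family: "Noto Sans SC", sans-serif; }  /* Simplified Chinese */
:lang(zh-Hant) { font-family: "Noto Sans TC", sans-serif; }  /* Traditional Chinese */
:lang(ja)      { font-family: "Noto Sans JP", sans-serif; }  /* Japanese */
:lang(ko)      { font-family: "Noto Sans KR", sans-serif; }  /* Korean */
```

Note that :lang(zh-Hans) matches content tagged zh-Hans or zh-Hans-CN, but not content tagged only zh or zh-CN, so rules like these depend on the content being tagged with a script subtag.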

Search

Although major search engines commonly rely on automatic language detection to identify the language of resources, language markup within the page can be used to improve the quality of search results based on the user's linguistic preferences.

Spelling and grammar checkers

Authoring tools can adapt spelling and grammar checking based on the language of the content, or ignore content that is not in the language of the spelling checker. This can significantly improve efficiency when checking spelling.

Browsers have also recently begun to allow users to check the spelling of the text they type into forms or elements with the contenteditable attribute set. A browser that takes into account information about the language of the content can provide a more effective user experience.

Translation

Translation tools can use language attributes to recognize pages or sections of text in a particular language, and then automatically adjust the workflow or protect that text from changes by the translator.

Non-text readers

Language information assists speech synthesizers and Braille translators to produce usable results. These applications need to know whether they can produce output from the text, or whether perhaps they need to switch to a different language mode.

Language tagging is recommended by the W3C Web Content Accessibility Guidelines (WCAG), which are enforced by governmental policies in some countries, e.g. the Disability Discrimination Act in the UK.

Parsers and scripts

Tagging content with language information also allows for language-specific processing.

For example, a script or XSLT style sheet could use the language information to select or process content differently according to its language.

Bear in mind that when you create the information you do not always know how people will want to process your information later.

By the way

The usefulness of language tagging has increased over recent years, as technology has progressed, and it will continue to increase as we go forward. In many cases, these applications may not be things you see as important when first developing your content, but may grow in value as time progresses. However, we are currently faced with a circular problem. People who don't see the applications of language information do not provide information about the language of their content. Language-related applications are slow to be deployed until this information is widely applied to content. This cycle can be broken by content authors declaring language information as a matter of course. The more content is tagged and tagged correctly, the more useful and pervasive such applications will become. Adding language information is usually easy to do, and carries no penalties.