FAQ: Why use the language attribute?

Intended audience: XHTML/HTML coders (using editors or scripting), script developers (PHP, JSP, etc.), and anyone who is wondering why they should use language attributes in HTML.

Question

Why should I use the language attribute in web pages?

Answer

Applications can use information about the natural language of content to deliver the most relevant information to users, based on their language preferences. The more content is tagged, and tagged correctly, the more useful and pervasive such applications will become.

Language declarations specify the natural language of web page content. Always use a declaration to indicate the language of the page as a whole. If the language changes within the page, also declare the new language on the element that contains the text in question, e.g. span, div, td, or p.
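As a minimal sketch of these two cases, assuming an HTML 4.01 page (the title and sentences are invented for illustration), the page-level declaration goes on the html element, and each change of language is declared on the element that contains it:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <title>Language declarations (illustrative page)</title>
</head>
<body>
  <!-- This paragraph inherits lang="en" from the html element. -->
  <p>She has a certain
    <!-- The language changes inside the paragraph, so the span declares it. -->
    <span lang="fr">je ne sais quoi</span>.
  </p>
  <!-- A whole block in another language carries its own declaration. -->
  <p lang="fr">Bonjour tout le monde.</p>
</body>
</html>
```

Elements without their own lang attribute inherit the language of their nearest ancestor that has one, so only the changes need to be marked.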

HTML, XHTML and XML vary in the way the language attribute is specified. See the tutorial Declaring Language in XHTML and HTML.
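For instance, HTML uses the lang attribute alone and generic XML uses xml:lang alone. For an XHTML 1.0 page served as text/html, the usual practice is to give both attributes with the same value. The sketch below assumes that last case (the content is invented for illustration; the tutorial mentioned above has the authoritative details):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
  <title>Illustrative XHTML page</title>
</head>
<body>
  <!-- Both attributes are repeated wherever the language changes. -->
  <p>The French for cat is <span lang="fr" xml:lang="fr">chat</span>.</p>
</body>
</html>
```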

By the way, do not confuse language declarations with charset (character encoding) declarations. These are separate things.

Information that indicates content language can be useful for many applications. Some of these work at the level of the document as a whole, some work on appropriately labeled document fragments. What follows is a list of a few possible applications for language information:

Authoring tools

Authoring tools can supply appropriate spelling and grammar checking based on the language of a segment.

Translation tools

Translation tools can use the tags to help recognize sections of text in a particular language.

Accessibility

Language information assists speech synthesizers and Braille translators. It is required by the W3C Web Accessibility Initiative (WAI) and enforced by governmental policies in some countries, e.g. the Disability Discrimination Act in the UK.

Font selection

User-agents can (and do) use the content language to select language-appropriate fonts, which improves the overall user experience of the page.

Page rendering

CSS2 exposes language information through the :lang() pseudo-class. For example, you might want to use a different font family and font size depending on the language:

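<style type="text/css">
/* Arabic text: a font designed for Arabic script, at a larger size. */
:lang(ar) {
    font-family: "Traditional Arabic", serif;
    font-size: 125%;
}
/* French text: a sans-serif font at the default size. */
:lang(fr) {
    font-family: Arial, sans-serif;
    font-size: 100%;
}
</style>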

At the time of writing, the :lang() pseudo-class is not implemented in the current version of Microsoft Internet Explorer, but it does work in other browsers such as Mozilla.

Search

Search engines can group or filter results based on the user's linguistic preferences.

It is also common to use meta tags to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the meta elements, using associated language attributes, and display search results according to the language preferences of the user.
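A sketch of what such language-dependent meta elements might look like, modeled on the pattern described in the HTML 4.01 specification (the keywords themselves are invented for illustration):

```html
<head>
  <title>Holidays in Greece (illustrative page)</title>
  <!-- One keyword list per language; lang tells a search engine which to use. -->
  <meta name="keywords" lang="en-us" content="vacation, Greece, sunshine">
  <meta name="keywords" lang="en" content="holiday, Greece, sunshine">
  <meta name="keywords" lang="fr" content="vacances, Gr&egrave;ce, soleil">
</head>
```

Whether and how a given search engine uses these attributes is up to that engine; the markup only makes the information available.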

Parsing

You or other people may need to process the text in your document with tools such as XSL or scripting. During processing, language information can be used to extract or identify specific text, or to apply linguistically appropriate output (e.g. sorting or quotation marks), styling, etc. Bear in mind that when you create content you do not always know how people will want to process it later.

By the way

You might think that the natural language could be inferred from the character encoding. However, a character encoding does not unambiguously identify a natural language: the inference would only work if there were a 1:1 mapping between encodings and languages, and there isn't one. A single character encoding can be used for many languages, e.g. Latin 1 (ISO-8859-1) can encode French and English, as well as a great many other languages. Conversely, a single language can be written in several encodings, e.g. Arabic can be encoded as Windows-1256, ISO-8859-6, or UTF-8 (or another Unicode encoding).
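To illustrate, here is a sketch of a page that declares its encoding and its languages separately, assuming an HTML 4.01 page saved as UTF-8 (the content is invented for illustration). The charset declaration says how the bytes map to characters; only the lang attributes say which language the characters spell out:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <!-- Character encoding: how the bytes of the file map to characters. -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Encoding and language are declared separately</title>
</head>
<body>
  <!-- One encoding, two languages: only the lang attributes tell them apart. -->
  <p>The Arabic word for "hello" is <span lang="ar">مرحبا</span>.</p>
</body>
</html>
```

Re-saving the same page in a different encoding would change the charset declaration but none of the lang attributes.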

Author: Deborah Cawkwell, BBC World Service.

Content first published 2004-06-21. Last substantive update 2005-08-23 16:35 GMT. This version 2006-11-20 16:33 GMT

Page location: http://www.w3.org/International/questions/qa-lang-why.en.php

For the history of document changes, search for qa-lang-why in the i18n blog.