W3C Internationalization

HTTP and meta for language information

Question

Should I declare the language of my XHTML document using a language attribute, the Content-Language HTTP header, or a meta element?

Background

In current practice one can find XHTML documents that provide information about the language of a page in a number of different ways.

One method is to use the lang and xml:lang attributes on the html tag.

Example:
<html lang="en" xml:lang="en">

Alternatively, you may find documents that provide this information using a meta element.

Example:
<meta http-equiv="Content-Language" content="en"/>

Language information may also be found in the HTTP header that is sent with a document (see the last line in the following example of an HTTP header).

Example:
HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: CSS2-REC.en.html
Vary: negotiate,accept-language,accept-charset
TCN: choice
P3P: policyref=http://www.w3.org/2001/05/P3P/p3p.xml
Cache-Control: max-age=21600
Expires: Wed, 05 Nov 2003 16:46:04 GMT
Last-Modified: Tue, 12 May 1998 22:18:49 GMT
ETag: "3558cac9;36f99e2b"
Accept-Ranges: bytes
Content-Length: 10734
Connection: close
Content-Type: text/html; charset=iso-8859-1
Content-Language: en

It is also worth noting that the meta element and the HTTP header both support a list of values. The example below declares the primary languages of the document to be (in equal measure) German, French and Italian.

Example:
<meta http-equiv="Content-Language" content="de, fr, it"/>
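As a rough illustration of how a consumer might read such a list, the following Python sketch splits a Content-Language value into individual language tags. The value can come either from the HTTP header or from the meta element's content attribute. The function name parse_content_language is hypothetical. The sketch assumes a simple comma-separated list and does not validate the tags or handle every syntax edge case.

```python
def parse_content_language(value: str) -> list:
    """Split a Content-Language value into its language tags.

    Hypothetical helper: assumes a comma-separated list such as "de, fr, it".
    Surrounding whitespace and empty entries are discarded; tags are not validated.
    """
    return [tag.strip() for tag in value.split(",") if tag.strip()]


# The same value can come from "Content-Language: de, fr, it" in the HTTP header
# or from content="de, fr, it" on the meta element.
print(parse_content_language("de, fr, it"))  # ['de', 'fr', 'it']
```

Either source yields the same list of three equally weighted languages. This is the list-valued information that the attributes on individual elements cannot express.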

The question is, which of these methods is the best approach?

Answer

To answer this question, a distinction needs to be drawn between (1) specifying the language used for processing content, and (2) using metadata to identify the audience for the document. It is recommended that the lang and xml:lang attributes be used for the former, and the HTTP Content-Language setting or a meta element be used, if appropriate or needed, for the latter. The rest of this answer will attempt to explain this distinction a little more clearly.

Specifying the text-processing language

When specifying the language for text processing, you are declaring the language in which a specific range of text is actually written, so that user agents or applications such as voice browsers or spell checkers can handle that text effectively. By necessity, then, we are associating a single language with a specific range of text. Enclosed elements inherit the declared value, but you can override it by specifying a different language on embedded elements wherever the language changes.
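The inheritance rule can be sketched in Python with the standard-library xml.etree.ElementTree module. The helper resolve_languages is hypothetical. It walks an XHTML fragment, a German page containing an embedded Chinese phrase, and reports the text-processing language that applies to each element. Following XHTML 1.0 Appendix C, it assumes that xml:lang takes precedence over lang when both are present.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def resolve_languages(root: ET.Element) -> List[Tuple[str, Optional[str]]]:
    """Return (element name, text-processing language) pairs in document order.

    Hypothetical helper: an element uses its own xml:lang (preferred) or lang
    attribute if present; otherwise it inherits its parent's language.
    """
    results: List[Tuple[str, Optional[str]]] = []

    def walk(elem: ET.Element, inherited: Optional[str]) -> None:
        own = elem.get(XML_LANG)
        if own is None:
            own = elem.get("lang")
        lang = own if own is not None else inherited
        results.append((elem.tag.replace(XHTML_NS, ""), lang))
        for child in elem:  # child elements inherit this element's language
            walk(child, lang)

    walk(root, None)
    return results


doc = """<html xmlns="http://www.w3.org/1999/xhtml" lang="de" xml:lang="de">
  <body>
    <p>Nützliche Redewendungen: <span lang="zh" xml:lang="zh">你好</span></p>
  </body>
</html>"""

for name, lang in resolve_languages(ET.fromstring(doc)):
    print(name, lang)
# html de
# body de
# p de
# span zh
```

Only the span carries its own declaration. Every other element takes its language from the html element.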

Specifying file metadata

Many documents on the Web contain embedded fragments of content in a language different from that of the overall content. Generally speaking, however, the page is still aimed at speakers of one particular language. For example, a German city guide for Beijing may contain useful phrases in Chinese, but the primary language of the page is German, i.e., it is aimed at a German-speaking audience.

It is possible, however, to imagine a situation where a Web page contains the same information in more than one language. For example, a page might welcome Canadian readers with French content in the left-hand column and the same content in English in the right-hand column. Here the document is equally targeted at speakers of both languages. This situation is less common on the Web than in printed material, since on the Web it is easy to link to a separate page for each audience, but it is still possible.

It may be useful to use metadata to express whether a particular document addresses its readers in one language or more, and what those languages are. Note, however, that we are talking about the document as a whole here. We are not saying that any particular range of text in the document is in a particular language, as we would if we were specifying the language for text-processing. Nor are we listing the languages of all fragments of text in the document. We are providing metadata to indicate who is the intended audience for the document.

Choosing between language attributes, HTTP headers and meta elements

To specify the language of fragments within an XHTML document, there is no choice but to use the lang and/or xml:lang attributes on the appropriate elements. Information in the HTTP header or meta element is not relevant here.

By extension, you should specify the text-processing language of the content as a whole by using the lang and/or xml:lang attributes on the html element. Since all other elements in the document are descendants of the html element, they inherit this value until it is overridden by attributes on an enclosed element.

This conforms to current best-practice recommendations, and existing user agents already recognize language values declared in this way when applying language-specific styling, default fonts for Chinese, Japanese and Korean, and so on.

Language declarations in the HTTP header or meta elements should be used for expressing metadata about a file.

The following are reasons that they are inappropriate for declaring the language of text-processing:

On the other hand, the fact that an HTTP header or meta element can specify more than one language at a time is very useful for describing metadata. Since the lang and xml:lang attributes describe elements and are limited to specifying only a single language at a time, they are not suitable for this task.

The usefulness of declaring metadata about a file using the HTTP header or the meta element, and the choice of which is the best approach, depend on conventions for use of that information. Such things are not specified by the HTML specification, but could be standardized by other technologies, such as search engines or other tools. On the other hand, it cannot hurt to use such constructs.

Other considerations

Relying on the HTTP Content-Language header raises potential issues around the maintenance and use of server-side information. Many authors find it difficult to access server settings, particularly when their site is hosted by an ISP, so this approach is not always available.

In addition, it can be useful for troubleshooters or translation process managers to be able to identify the language or languages of a file by looking inside the document. So having the information inside the document is useful, even if you use the HTTP header.
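As an illustration of reading this information from inside the document, here is a minimal Python sketch using the standard-library html.parser module. The class LanguageInfoParser and the function describe_languages are hypothetical. The sketch assumes the HTTP headers are already available as a plain dict keyed exactly "Content-Language". It reports the text-processing language declared on the html element separately from the audience metadata, keeping the distinction drawn above.

```python
from html.parser import HTMLParser
from typing import Optional


class LanguageInfoParser(HTMLParser):
    """Collect language declarations found inside an (X)HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.processing_lang: Optional[str] = None  # lang/xml:lang on <html>
        self.meta_audience: list = []  # from <meta http-equiv="Content-Language">

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags such
        # as <meta .../> reach this method via the default handle_startendtag.
        a = dict(attrs)
        if tag == "html":
            self.processing_lang = a.get("xml:lang") or a.get("lang")
        elif tag == "meta" and (a.get("http-equiv") or "").lower() == "content-language":
            content = a.get("content") or ""
            self.meta_audience = [t.strip() for t in content.split(",") if t.strip()]


def describe_languages(html_text: str, headers: dict) -> dict:
    """Report processing language and audience metadata from both sources."""
    parser = LanguageInfoParser()
    parser.feed(html_text)
    parser.close()
    http_value = headers.get("Content-Language", "")  # assumes exact key casing
    return {
        "processing": parser.processing_lang,
        "audience_meta": parser.meta_audience,
        "audience_http": [t.strip() for t in http_value.split(",") if t.strip()],
    }


page = """<html lang="fr" xml:lang="fr"><head>
<meta http-equiv="Content-Language" content="fr, en"/>
<title>Bienvenue / Welcome</title></head><body>...</body></html>"""

print(describe_languages(page, {"Content-Language": "fr, en"}))
# {'processing': 'fr', 'audience_meta': ['fr', 'en'], 'audience_http': ['fr', 'en']}
```

A troubleshooter working from a saved copy of the file, with no headers at all, still gets the processing language and the meta-based audience list.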

By the way

Note that this discussion is very different from that about the use of the meta charset declaration and HTTP charset header. There is no alternative markup construct in XHTML for declaring the character encoding of a document, and the HTTP header takes precedence over the meta declaration.

Where a document has two or more equal primary languages, as described above, it is hard to know how to handle the title element if it contains text in more than one language. The title can only be labeled with one language, because it cannot contain markup and cannot be repeated.

Author: Richard Ishida, W3C.

Content first published 2004-07-02. Last substantive update 2004-07-02 15:51 GMT. This version 2006-11-19 11:44 GMT

For the history of document changes, search for qa-http-and-lang in the i18n blog.