HTTP headers, meta elements and language information

In addition to the lang (and/or xml:lang) attribute on the html tag, you may come across language information in HTML meta elements, or in the HTTP header which is served with an HTML page. Here we look at whether these are useful when declaring language for HTML content, and if so, how they should be used.

This article is specifically about language declarations in HTTP headers and meta elements. It's not a general guide to setting language on an HTML page: for that, see Declaring language in HTML.

This article builds on the distinction between (1) using file metadata to identify the audience for the document, and (2) specifying the language used for the purpose of processing content. If you want to better understand the distinction see the article Types of language declaration.

The meta element

You may come across a language-related meta element near the top of an HTML file. It looks like this.

 Bad code. Don't copy!

<meta http-equiv="content-language" content="de">

This use of the content-language value for an http-equiv attribute is non-conforming in the HTML specification (in effect, deprecated), and should no longer be used. Instead, you should always use a lang attribute on the html tag to declare the default language of the text in the page.

If you want to know why this approach is deprecated, see below. To learn about how to use the lang attribute, see Declaring language in HTML.

HTTP headers

When you retrieve a web page or resource from a server, the server sends with it various bits of information about the thing you are retrieving (metadata). It uses a format referred to as HTTP headers. One of the items you may find in such metadata is language-related. The example below shows the HTTP response headers that accompany this article; the language information is in the last line.

HTTP/1.1 200 OK
Date: Sat, 23 Jul 2011 07:28:50 GMT
Server: Apache/2
Content-Location: qa-http-and-lang.en.php
Vary: negotiate,accept-language,Accept-Encoding
TCN: choice
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Content-Language: en

Here is an example of an HTTP Content-Language header that declares the resource to be intended for an audience that reads English, Hindi and Punjabi. Unlike a lang attribute on an HTML element, which takes a single language tag, the HTTP header allows you to use a comma-separated list of languages if your intended audience speaks more than one language.

Content-Language: en, hi, pa
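As a small illustration of the comma-separated format, the following Python sketch splits a Content-Language value into its individual language tags. The function name parse_content_language is hypothetical, and the sketch does not validate the tags themselves.

```python
from typing import List


def parse_content_language(header_value: str) -> List[str]:
    """Split a Content-Language header value into its language tags."""
    # The value is a comma-separated list; whitespace around each item is optional.
    tags = [tag.strip() for tag in header_value.split(",")]
    # Drop empty items, which a stray comma would otherwise produce.
    return [tag for tag in tags if tag]


print(parse_content_language("en, hi, pa"))  # ['en', 'hi', 'pa']
```

A result with more than one tag describes a multilingual audience, which is why such a value cannot be used directly as the language of a run of text.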

The Content-Language header is associated with a particular page by settings on the server or by server-side scripting. For example, you will typically find a Content-Language header in the HTTP metadata when a server holds more than one version of a resource, each in a different language. If the server uses information it has about you to automatically select a particular language version ('content negotiation'), the language version selected will be identified in the HTTP header.
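The following Python sketch shows one way server-side scripting might perform this kind of content negotiation. It is an illustration under stated assumptions, not a description of how any particular server works: the function names, the use of the browser's Accept-Language request header as the "information about you", and the simple prefix matching are all choices made for this example.

```python
from typing import Dict, List, Tuple


def parse_accept_language(header_value: str) -> List[Tuple[str, float]]:
    """Return (language-range, quality) pairs, highest quality first."""
    ranges = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        lang_range, _, params = item.partition(";")
        quality = 1.0  # a range without a q parameter counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable q value as "not wanted"
        ranges.append((lang_range.strip().lower(), quality))
    # sorted() is stable, so ranges of equal quality keep the order they were sent in.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_language: str, available: List[str], default: str) -> Dict[str, str]:
    """Pick a language version and return the headers to send with it."""
    chosen = default
    for lang_range, quality in parse_accept_language(accept_language):
        if quality <= 0:
            continue
        # Simplified matching (an assumption): exact match, "*", or a prefix
        # followed by a hyphen, so that "pt" matches "pt-BR".
        matches = [tag for tag in available
                   if lang_range == "*"
                   or tag.lower() == lang_range
                   or tag.lower().startswith(lang_range + "-")]
        if matches:
            chosen = matches[0]
            break
    return {"Content-Language": chosen, "Vary": "Accept-Language"}


print(negotiate("fr;q=0.8, de", ["en", "de", "fr"], default="en"))
# {'Content-Language': 'de', 'Vary': 'Accept-Language'}
```

The Content-Language value in the result identifies the version that was selected, which is the behaviour described above.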

The HTTP Content-Language header can provide useful language data related to the page or resource you are retrieving, but the HTTP specification indicates clearly that the intent of this information is to provide metadata about the intended audience of the document, rather than to indicate the actual language of the text itself. Such metadata may be used for serving the right language version, workflow management, classification, searching, etc. See also Inferring the language of the text from an HTTP header.

Because language information in the HTTP header is sent by the server, this information is simply not available if your page is accessed from a hard drive, memory stick, or other non-server based location. There is currently no widely recognized way of representing this kind of metadata inside the page.

Specifying the text-processing language

In HTML, the lang attribute should be used to specify the language of text content so that the browser can correctly display or process your content (e.g. for hyphenation, styling, spell checking, etc.). You should always use it on the html tag, and then also use it on any elements that surround fragments of content in a different language.
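To make the inheritance concrete, here is a minimal Python sketch that walks a small sample page and reports which language applies to each run of text. The class name LangTracker, the sample markup and the short list of void elements are assumptions made for this illustration; real browsers handle many more cases.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have an end tag (a partial list, enough for this sketch).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class LangTracker(HTMLParser):
    """Record each run of text together with the language in effect for it."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: List[Optional[str]] = [None]  # None means "unknown"
        self.runs: List[Tuple[Optional[str], str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attributes = dict(attrs)
        if "lang" in attributes:
            # An explicit lang value overrides the inherited one;
            # an empty value means the language is unknown.
            self._stack.append(attributes["lang"] or None)
        else:
            # No declaration here, so inherit from the parent element.
            self._stack.append(self._stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self._stack[-1], text))


SAMPLE = ('<html lang="en"><body><p>The motto was '
          '<span lang="fr">je ne sais quoi</span>.</p></body></html>')

tracker = LangTracker()
tracker.feed(SAMPLE)
tracker.close()
print(tracker.runs)
# [('en', 'The motto was'), ('fr', 'je ne sais quoi'), ('en', '.')]
```

The lang attribute on the html tag supplies the default, and the span overrides it only for the French fragment.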

If you want to better understand the distinction between (1) using file metadata to identify the audience for the document, and (2) specifying the language used for processing content, see the article Types of language declaration.

For more information about how to use the lang attribute, see Declaring language in HTML.

Inferring the language of the text from an HTTP header

If no language is declared on the html tag, some mainstream browsers, but not all, use the value declared in the HTTP header to set the default language of the text in the page. Even in a browser that does this, however, the information seems to be applied to some language-sensitive features and not to others.

The HTML5 specification says that if there is no lang attribute on the html tag, no meta element with the http-equiv attribute set to Content-Language, and only a single language tag in the HTTP header declaration, then a browser must use that information to guess at the default language of the text in the page.

However, since you should always use a language attribute on the html tag, and the language attribute always overrides the HTTP header information, this really becomes a fine point. The HTTP header should be used only to provide metadata about the intended audience of the document as a whole, and the language attribute on the html tag should be used to declare the default language of the content.
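The order of precedence described in this section can be sketched in Python as follows. This is a simplified illustration, not the algorithm as worded in the HTML specification: the function and parameter names are hypothetical, and it assumes that a meta value listing several language tags is simply ignored, so that the HTTP header is still consulted.

```python
from typing import Optional


def single_tag(value: Optional[str]) -> Optional[str]:
    """Return the value if it holds exactly one language tag, else None."""
    if not value:
        return None
    tags = [tag.strip() for tag in value.split(",") if tag.strip()]
    return tags[0] if len(tags) == 1 else None


def default_text_language(html_lang: Optional[str],
                          meta_pragma: Optional[str],
                          http_header: Optional[str]) -> Optional[str]:
    """Guess the default text language; None means "unknown".

    Pass None for any declaration that is absent.
    """
    # 1. A lang attribute on the html tag always wins.
    if html_lang is not None:
        return html_lang or None  # lang="" means the language is unknown
    # 2. Legacy fallback: a meta Content-Language value with a single tag.
    from_meta = single_tag(meta_pragma)
    if from_meta is not None:
        return from_meta
    # 3. Last resort: an HTTP Content-Language header with a single tag.
    return single_tag(http_header)


print(default_text_language(None, None, "en"))          # en
print(default_text_language(None, None, "en, hi, pa"))  # None
print(default_text_language("fr", "de", "en"))          # fr
```

Because step 1 short-circuits everything else, a page that declares its language on the html tag never depends on the fallbacks.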

Additional information

The information in this section is less likely to be useful, but is provided for completeness.

Why you shouldn't use the meta element

The use of a meta element in the document head with the http-equiv attribute set to Content-Language is not mentioned directly in the HTML 4.01 specification. For a long time, however, much of the informal guidance on the Web about how to declare language for your HTML page suggested its use, and some HTML authoring tools automatically created such elements when you specified language information using dialog boxes. Here is an example that declares the language to be English.

 Bad code. Don't copy!

<meta http-equiv="Content-Language" content="en">

Unlike the lang and xml:lang attributes, the value of the content attribute can be a comma-separated list of language tags. The example below declares the primary languages of the document to be (in equal measure) German, French and Italian.

 Bad code. Don't copy!

<meta http-equiv="Content-Language" content="de, fr, it">

If the name of the meta element wasn't a clear enough clue, the fact that the value supports multiple languages indicates that this element is really about document level metadata. If you are to usefully indicate the language of a range of text, you have to be specific – it can only be in one language at a time. The meta element, then, is an in-document location for expressing metadata about the language of the intended audience of the document as a whole.

Until recently, few browsers paid any attention to this meta element. Then several major browsers began using this element, if there was no language attribute on the html tag, to set the default language of the text in the document (which is the job of the language attribute on the html tag). The way this was implemented was inconsistent, and therefore unreliable, from one browser to another.

Because of the history of confusion and inconsistent implementation surrounding this kind of declaration, in 2011 the HTML Working Group decided to make the meta element with http-equiv set to Content-Language non-conforming in HTML. This means that you should no longer use it in HTML5, and therefore, though technically not illegal in other types of HTML, it is best not to use it anywhere.

HTML5 did, however, make a concession for backwards compatibility. If there is a meta element with http-equiv set to Content-Language in the markup, and if there is no language attribute on the html tag, and if the meta element has a value that is a single language tag, then a browser must use that information to guess at the default language of the text on the page. Having said that, this is only for backwards compatibility, and you really shouldn't use this approach any more. Simply use a language attribute on the html tag.
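The backwards-compatibility rule can be sketched in Python as shown below, working directly from the markup. This is an illustration only: the class and function names are hypothetical, and the sketch looks at nothing but the lang attribute on the html tag and the first Content-Language meta element it finds.

```python
from html.parser import HTMLParser
from typing import Optional


class LanguageDeclarations(HTMLParser):
    """Record the lang attribute on <html> and any Content-Language meta value."""

    def __init__(self) -> None:
        super().__init__()
        self.html_lang: Optional[str] = None  # None means no lang attribute
        self.pragma: Optional[str] = None     # None means no such meta element

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attributes = dict(attrs)
        if tag == "html" and "lang" in attributes:
            self.html_lang = attributes["lang"] or ""
        elif tag == "meta" and self.pragma is None:
            http_equiv = (attributes.get("http-equiv") or "").lower()
            if http_equiv == "content-language":
                self.pragma = attributes.get("content") or ""


def legacy_default_language(markup: str) -> Optional[str]:
    """Return the default text language implied by the markup, or None."""
    parser = LanguageDeclarations()
    parser.feed(markup)
    parser.close()
    # A lang attribute on the html tag takes priority over the meta element.
    if parser.html_lang is not None:
        return parser.html_lang or None
    # The meta value only counts if it is a single language tag.
    if parser.pragma and "," not in parser.pragma:
        return parser.pragma.strip() or None
    return None


page = '<html><head><meta http-equiv="Content-Language" content="de"></head></html>'
print(legacy_default_language(page))  # de
```

With content="de, fr, it" the function returns None, and with a lang attribute on the html tag the meta element is never consulted.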

Document-internal metadata

One implication of HTML5 dropping the meta element for declaring language is that there is now no obvious way to provide language metadata about the document inside the document itself. In theory, such metadata would be quite useful for content management systems, translation processes, and the like. This kind of information can be carried by an HTTP header (as described in the HTTP headers section above), but such systems and processes tend to work on documents that are not sent from a server with an HTTP header, and so in-document metadata would be useful.

Perhaps another approach, such as RDFa, would provide a way of representing such information in the future.

The WHATWG Wiki MetaExtensions page provides an extended list of values that could be used with the meta element's name attribute, though none have been formally accepted yet. One such value is dc.language, used to express language information with Dublin Core notation.

 Do not use this

<meta name="dc.language" content="en">

It is unclear, however, whether this information is ever used by browsers, or to what extent it is used by any other application. The WHATWG page recommends that the lang attribute be used instead. That recommendation is good for declaring the text-processing language, but doesn't address the possible use of such a value for expressing metadata about the page as a whole.