How should I set the language of the content in my HTML page?
This page describes how to mark up an HTML page so that it gives information about the language of the page. It begins with an overall summary, then provides additional details in subsequent sections.
Always use a language attribute on the html tag to declare the default language of the text in the page. This is inherited by all other elements. For example:
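A minimal sketch of such a declaration, assuming the page is in English. The title and paragraph text are placeholders, not taken from a real page:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language declarations</title>
</head>
<body>
  <!-- Inherits lang="en" from the html element -->
  <p>This paragraph is in English.</p>
</body>
</html>
```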
Note that you should use the html element rather than the body element, since the body element doesn't cover the text inside the document's head element.
When the page contains content in another language, add a language attribute to an element surrounding that content. This allows you to style or process it differently. For example:
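One way this might look, assuming an English page that quotes a French word. The span element and the style rule are illustrative choices, not requirements:

```html
<style>
  /* Style French fragments differently from the surrounding text */
  :lang(fr) { font-style: italic; }
</style>
<p>The French word for cat is <span lang="fr">chat</span>.</p>
```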
Use the lang attribute for pages served as HTML. (For pages served as XML, including XHTML 1.x and HTML5 polyglot documents, see Choosing the right attribute.)
Use language tags from the IANA Language Subtag Registry. You can find subtags using the unofficial Language Subtag Lookup tool.
Some parts of a page cannot be marked up for a change of language. If you have multilingual text in the title element, you cannot mark up parts of the text for different languages, because the title element only allows characters and no markup. The same goes for multiple languages in attribute values. There is no good solution for this at the moment.
Use nested elements to take care of content and attribute values on the same element that are in different languages.
Never use a meta element with the http-equiv attribute set to Content-Language to indicate the language of a page. In certain circumstances you may want to serve language information in the HTTP header to indicate the intended audience of your page. Whether or not you use the HTTP header, always declare the language of the text in a page using a language attribute on the html tag. For more information see the companion article, HTTP headers, meta elements and language information.
This section provides more detailed information on a variety of topics related to declaring language in HTML.
Occasionally the text in an attribute and the content of the same element are in different languages. For example, at the top right corner of this article there are links to translated versions of this page. The link text names the language of the target page in that language, but an associated title attribute contains a hint in the language of the current page.
If your code looks as follows, the language attribute would indicate that not only the content but also the title attribute text is in Spanish. This is incorrect.
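The problematic markup might look like the sketch below. The href value is a hypothetical file name used only for illustration:

```html
<!-- Bad: lang="es" applies to the title attribute text as well as the link text -->
<a lang="es" title="Spanish" href="page.es.html">Español</a>
```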
Instead, move the attribute containing text in a different language to another element, as shown in this example, where the a element inherits the default en setting of the html element.
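A sketch of the corrected markup, again with a hypothetical href value and assuming the page's html tag declares the language as en:

```html
<!-- Good: the title attribute inherits "en"; only the link text is marked as Spanish -->
<a title="Spanish" href="page.es.html"><span lang="es">Español</span></a>
```

The nested span carries the Spanish declaration, so the title attribute on the a element is no longer covered by it.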
If you want to specify the language of some content but there is no markup around it, use an element such as div around the content.
Here is an example:
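A minimal sketch, assuming an English page with a short French phrase inside a sentence. An inline span is used here because the phrase sits within a paragraph; a div would suit a larger block of content:

```html
<p>You would say that in French as <span lang="fr">c'est la vie</span>.</p>
```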
To be sure that all user agents recognize which language you mean, you need to follow a standard approach when providing language attribute values. You also need to consider how to refer in a standard way to dialectal differences within a language, such as the difference between US English and British English, which diverge significantly in terms of spelling and pronunciation.
The rules for creating language attribute values are described by an IETF specification called BCP 47. In addition to specifying how to use simple language tags, such as en for English or fr for French, BCP 47 describes how to compose language tags that allow you to specify regional dialects, scripts and other variants related to that language.
BCP 47 incorporates, but goes beyond, the ISO sets of language and country codes. To find relevant codes you should consult the IANA Language Subtag Registry.
For a gentle but fairly thorough introduction to the syntax of BCP 47 tags, read Language tags in HTML and XML. For help in choosing the right language tag out of the many possible tags and combinations, see Choosing a language tag.
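As an illustration of composed tags, the sketch below uses region subtags (GB, US) and script subtags (Hans, Hant). The sample sentences are placeholders chosen for this example:

```html
<!-- Region subtags distinguish British and US English -->
<p lang="en-GB">What colour is it?</p>
<p lang="en-US">What color is it?</p>

<!-- Script subtags distinguish Simplified and Traditional Chinese -->
<p lang="zh-Hans">简体中文</p>
<p lang="zh-Hant">繁體中文</p>
```

Whether you need a longer tag depends on your content; a simple tag such as en is often enough.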
If your document is HTML (i.e. served as text/html), use the lang attribute to set the language of the document or a range of text. For example, the following sets the default language to French:
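A minimal sketch of a page served as text/html, with placeholder French content:

```html
<!DOCTYPE html>
<html lang="fr">
<head>
  <meta charset="utf-8">
  <title>Bonjour</title>
</head>
<body>
  <p>Bonjour le monde.</p>
</body>
</html>
```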
When serving XHTML 1.x or polyglot pages as text/html, use both the lang attribute and the xml:lang attribute together every time you want to set the language. The xml:lang attribute is the standard way to identify language information in XML. Ensure that the values for both attributes are identical.
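A sketch of this pattern, assuming an XHTML 1.0 Strict page in French served as text/html. The content is a placeholder:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Both attributes are present and carry the same value -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
<head>
  <title>Bonjour</title>
</head>
<body>
  <p>Bonjour le monde.</p>
</body>
</html>
```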
The xml:lang attribute is not actually useful for handling the file as HTML, but takes over from the lang attribute any time you process or serve the document as XML. The lang attribute is allowed by the syntax of XHTML, and may also be recognized by browsers. When using other XML processing tools, however (such as the lang() function in XSLT), you can't rely on the lang attribute being recognized.
If you are serving your page as XML (i.e. using a MIME type such as application/xhtml+xml), you do not need the lang attribute. The xml:lang attribute alone will suffice.
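For a page served as application/xhtml+xml, the opening tag might look like the sketch below. Only the opening tag of a hypothetical French document is shown:

```html
<!-- Served as XML: xml:lang alone declares the language -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr">
```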
The information in this section is less likely to be useful, but is provided for completeness.
In addition to including an in-page language attribute on the html tag (which you should always do), you may also have come across language declarations in the HTTP header (which is served with the page), or as meta elements in the page.
Importantly, the in-page language declaration always overrides the HTTP information when it comes to determining the actual language of the text, but the HTTP information may provide more general information about the intended use of the resource. Use of meta elements in the HTML page for declaring language is not recommended.
For information about Content-Language in HTTP and in meta elements see HTTP headers, meta elements and language information.
For the sake of thoroughness, it is worth mentioning a few other mechanisms that are sometimes mistaken for language declarations but do not declare the language of text.
Firstly, it is not possible to declare the language of text using CSS.
Secondly, the DOCTYPE that should start any HTML file may contain what looks to some people like a language declaration. The DOCTYPE in the example below contains the text EN, which stands for 'English'. This, however, indicates the language of the schema associated with the document; it has nothing to do with the language of the document itself.
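As an illustration, here are the opening lines of a hypothetical French page that uses the XHTML 1.0 Strict DOCTYPE. The EN in the DOCTYPE and the fr in the language attributes are unrelated:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- "EN" above describes the DTD; the language of the page is declared here -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
```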
Thirdly, people sometimes assume that information about natural language can be inferred from the character encoding. However, a character encoding does not enable unambiguous identification of a natural language: there would have to be a one-to-one mapping between encoding and language for this inference to work, and there isn't one. For example, a single character encoding can be used for many languages: Latin 1 (ISO-8859-1) can encode both French and English, as well as a great many other languages. In addition, the character encoding can vary for a single language: Arabic, for example, could use encodings such as Windows-1256, ISO-8859-6 or UTF-8.
All these encoding examples, however, are nowadays moot, since all content should be authored in UTF-8, which covers all but the rarest of languages in a single character encoding.
The same goes for text direction: it cannot be inferred from the language declaration. As with encodings and language, there is not always a one-to-one mapping between language and script, and therefore directionality. For example, Azerbaijani can be written using both right-to-left (Arabic) and left-to-right (Latin or Cyrillic) scripts, and the language code az can be relevant for either. In addition, text direction markup used with inline text applies a range of different values to the text, whereas language is a simple switch that is not up to the tasks required.