From HTML WG Wiki
Revision as of 18:38, 9 February 2010 by Kkrueger

Changed attribute: HTML 4 "lang" attribute

Issue: Justification for changing the "lang" attribute so that it can be specified globally, on any element.

What Is Being Changed?

Quick Overview

Compared with HTML 4.01, HTML5 adds the "lang" attribute to BASE, BR, PARAM, and SCRIPT, on which it has essentially no practical effect.

HTML5 also adds the "lang" attribute to IFRAME, where it could be useful when the IFRAME visually renders fallback content: for example, a long description of a graphic, or the original natural-language version of a document instance embedded in an IFRAME within an auto-translated version of that document instance.

HTML5 also drops the following HTML 4.01 elements, which did not allow the "lang" attribute: APPLET, BASEFONT, FRAME, and FRAMESET.

The "lang" attribute as Defined by HTML 4.01

8. Language information and text direction

This section of the document discusses two important issues that affect the internationalization of HTML: specifying the language (the lang attribute) and direction (the dir attribute) of text in a document.

8.1 Specifying the language of content: the "lang" attribute

Attribute definitions

lang = language-code [CI] (case-insensitive)
    This attribute specifies the base language of an element's attribute values and text content. The default value of this attribute is unknown.

Language information specified via the lang attribute may be used by a user agent to control rendering in a variety of ways. Some situations where author-supplied language information may be helpful include:

  • Assisting search engines
  • Assisting speech synthesizers
  • Helping a user agent select glyph variants for high quality typography
  • Helping a user agent choose a set of quotation marks
  • Helping a user agent make decisions about hyphenation, ligatures, and spacing
  • Assisting spell checkers and grammar checkers

The lang attribute specifies the language of element content and attribute values; whether it is relevant for a given attribute depends on the syntax and semantics of the attribute and the operation involved.

The intent of the lang attribute is to allow user agents to render content more meaningfully based on accepted cultural practice for a given language. This does not imply that user agents should render characters that are atypical for a particular language in less meaningful ways; user agents must make a best attempt to render all characters, regardless of the value specified by lang.

For instance, if characters from the Greek alphabet appear in the midst of English text:

<P><Q lang="en">Her super-powers were the result of
&gamma;-radiation,</Q> he explained.</P>

a user agent (1) should try to render the English content in an appropriate manner (e.g., in its handling of the quotation marks) and (2) must make a best attempt to render γ even though it is not an English character.

Please consult the section on undisplayable characters for related information.

8.1.1 Language codes

The lang attribute's value is a language code that identifies a natural language spoken, written, or otherwise used for the communication of information among people. Computer languages are explicitly excluded from language codes.

RFC1766 defines and explains the language codes that must be used in HTML documents.

Briefly, language codes consist of a primary code and a possibly empty series of subcodes:

        language-code = primary-code ( "-" subcode )*

Here are some sample language codes:

  • "en": English.
  • "en-US": the U.S. version of English.
  • "en-cockney": the Cockney version of English.
  • "i-navajo": the Navajo language spoken by some Native Americans.
  • "x-klingon": the primary tag "x" indicates an experimental language tag.

Two-letter primary codes are reserved for ISO639 language abbreviations. Two-letter codes include fr (French), de (German), it (Italian), nl (Dutch), el (Greek), es (Spanish), pt (Portuguese), ar (Arabic), he (Hebrew), ru (Russian), zh (Chinese), ja (Japanese), hi (Hindi), ur (Urdu), and sa (Sanskrit).

Any two-letter subcode is understood to be an ISO3166 country code.
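The code structure described above can be illustrated with a short Python sketch. It relies only on the rules stated in this section, plus RFC 1766's reservation of the "i" primary tag for IANA-registered tags; the names LanguageCode, parse_language_code, and describe are hypothetical and not part of any HTML or browser API. The sketch splits a code into its primary code and subcodes and labels each part the way the text above describes.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageCode:
    primary: str                                        # e.g. "en"
    subcodes: list[str] = field(default_factory=list)   # e.g. ["us"]


def parse_language_code(value: str) -> LanguageCode:
    """Split 'primary-code ( "-" subcode )*' into its parts.

    Language codes are case-insensitive ([CI]), so this sketch lowercases them.
    """
    parts = value.lower().split("-")
    if not parts[0]:
        raise ValueError(f"missing primary code in {value!r}")
    if any(not part for part in parts[1:]):
        raise ValueError(f"empty subcode in {value!r}")
    return LanguageCode(primary=parts[0], subcodes=parts[1:])


def describe(code: LanguageCode) -> list[str]:
    """Label each part using the conventions described in the text above."""
    notes = []
    if len(code.primary) == 2:
        notes.append(f"{code.primary}: ISO639 language abbreviation")
    elif code.primary == "i":
        notes.append("i: IANA-registered tag (per RFC 1766)")
    elif code.primary == "x":
        notes.append("x: experimental language tag")
    for sub in code.subcodes:
        if len(sub) == 2:  # two-letter subcodes are country codes
            notes.append(f"{sub}: ISO3166 country code")
    return notes


for sample in ["en", "en-US", "en-cockney", "i-navajo", "x-klingon"]:
    print(sample, describe(parse_language_code(sample)))
```

Note that a longer subcode such as "cockney" gets no label: the rules above only assign a meaning to two-letter subcodes.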

8.1.2 Inheritance of language codes

An element inherits language code information according to the following order of precedence (highest to lowest):

  • The lang attribute set for the element itself.
  • The closest parent element that has the lang attribute set (i.e., the lang attribute is inherited).
  • The HTTP "Content-Language" header (which may be configured in a server). For example:
        Content-Language: en-cockney
  • User agent default values and user preferences.

In this example, the primary language of the document is French ("fr"). One paragraph is declared to be in Spanish ("es"), after which the primary language returns to French. The following paragraph includes an embedded Japanese ("ja") phrase, after which the primary language returns to French.

<HTML lang="fr">
<TITLE>Un document multilingue</TITLE>
...Interpreted as French...
<P lang="es">...Interpreted as Spanish...
<P>...Interpreted as French again...
<P>...French text interrupted by <EM lang="ja">some
         Japanese</EM> French begins here again...

Note. Table cells may inherit lang values not from their parent but from the first cell in a span. Please consult the section on alignment inheritance for details.
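The precedence order in 8.1.2 amounts to walking up the element tree and then falling back to protocol and user agent defaults. The following is a minimal Python sketch under simplifying assumptions: Element and effective_language are hypothetical names, the Content-Language header is assumed to carry a single language tag, and the table-cell exception described in the note is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    tag: str
    lang: Optional[str] = None          # the element's own lang attribute, if present
    parent: Optional[Element] = None    # None for the root element


def effective_language(element: Element,
                       content_language: Optional[str] = None,
                       ua_default: Optional[str] = None) -> Optional[str]:
    """Resolve an element's language using the HTML 4.01 precedence order."""
    # 1-2. The element itself, then the closest ancestor with lang set.
    node: Optional[Element] = element
    while node is not None:
        if node.lang is not None:
            return node.lang
        node = node.parent
    # 3. The HTTP Content-Language header (assumed here to hold a single tag).
    if content_language:
        return content_language
    # 4. User agent default values and user preferences (None means unknown).
    return ua_default


# The multilingual French document from the example above:
html = Element("html", lang="fr")
p_es = Element("p", lang="es", parent=html)
p_fr = Element("p", parent=html)
em_ja = Element("em", lang="ja", parent=p_fr)
print([effective_language(e) for e in (html, p_es, p_fr, em_ja)])
```

Because the walk stops at the first ancestor with lang set, the Spanish paragraph does not affect the paragraph after it; both paragraphs only see the HTML element's "fr".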

8.1.3 Interpretation of language codes

In the context of HTML, a language code should be interpreted by user agents as a hierarchy of tokens rather than a single token. When a user agent adjusts rendering according to language information (say, by comparing style sheet language codes and lang values), it should always favor an exact match, but should also consider matching primary codes to be sufficient. Thus, if the lang attribute value of "en-US" is set for the HTML element, a user agent should prefer style information that matches "en-US" first, then the more general value "en".

Note. Language code hierarchies do not guarantee that all languages with a common prefix will be understood by those fluent in one or more of those languages. They do allow a user to request this commonality when it is true for that user.
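The matching preference described in 8.1.3 can be sketched as a fallback that trims subcodes from the end until something matches. This generalizes the text's "exact match first, then the primary code" rule to codes with several subcodes; that generalization and the function name best_match are assumptions made for illustration, not behavior required by HTML 4.01.

```python
from typing import Optional, Sequence


def best_match(lang: str, available: Sequence[str]) -> Optional[str]:
    """Return the most specific code in `available` that matches `lang`.

    Tries the full code first, then drops trailing subcodes one at a time
    (e.g. "en-US" -> "en"). Comparison is case-insensitive.
    """
    by_lower = {code.lower(): code for code in available}
    parts = lang.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()  # fall back to a more general code
    return None


rules = ["en", "en-US", "fr"]      # hypothetical style sheet language codes
print(best_match("en-US", rules))  # exact match preferred: en-US
print(best_match("en-GB", rules))  # falls back to the primary code: en
print(best_match("de", rules))     # no match: None
```

Matching only runs from specific to general: an element marked "en" does not pick up style information written for "en-US", which is consistent with the note that a shared prefix does not imply mutual intelligibility.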

The "lang" attribute as Defined by HTML5

The lang attribute specifies the primary language for the element's contents and for any of the element's attributes that contain text. Its value must be a valid RFC3066 language code, or the empty string.
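A syntax-only check for this requirement might look like the Python sketch below. The regular expression transcribes the RFC 3066 grammar (a primary subtag of 1 to 8 letters, followed by any number of subtags of 1 to 8 letters or digits); is_valid_lang_value is a hypothetical helper, and it does not check whether a subtag is actually registered.

```python
import re

# RFC 3066 grammar (ABNF):
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_valid_lang_value(value: str) -> bool:
    """Syntax check for an HTML5 lang value: an RFC 3066 tag or the empty string.

    Only the shape is checked; whether the subtags are registered is not.
    """
    return value == "" or RFC3066_TAG.fullmatch(value) is not None


for value in ["en-US", "", "x-klingon", "en_US", "-en"]:
    print(repr(value), is_valid_lang_value(value))
```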

The xml:lang attribute is defined in XML.

If these attributes are omitted from an element, the language of the element is the same as the language of its parent element. Setting the attribute to the empty string indicates that the primary language is unknown.

The "lang" attribute may only be used on elements of HTML documents. Authors must not use the "lang" attribute in XML documents.

The xml:lang attribute may only be used on elements of XML documents. Authors must not use the "xml:lang" attribute in HTML documents.

To determine the language of a node, user agents must look at the nearest ancestor element (including the element itself if the node is an element) that has a lang or xml:lang attribute set. That specifies the language of the node.

If both the xml:lang attribute and the lang attribute are set on an element, user agents must use the xml:lang attribute, and the lang attribute must be ignored for the purposes of determining the element's language.

If no explicit language is given for the root element, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language. In the absence of any language information, the default value is unknown (the empty string).
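The HTML5 rules above (nearest ancestor with an attribute set, xml:lang taking precedence over lang on the same element, an empty value meaning unknown, and a fallback to a higher-level protocol) can be sketched in Python as follows. The Node class and node_language function are hypothetical; attributes are modeled as a plain dictionary keyed by "lang" and "xml:lang", and the sketch assumes it is called on an element (for a text node, one would start from its parent element).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    attributes: dict[str, str] = field(default_factory=dict)
    parent: Optional[Node] = None   # None for the root element


def node_language(element: Node, protocol_language: Optional[str] = None) -> str:
    """Return the element's language; "" means unknown."""
    current: Optional[Node] = element
    while current is not None:
        attrs = current.attributes
        if "xml:lang" in attrs:    # xml:lang wins when both are set
            return attrs["xml:lang"]
        if "lang" in attrs:        # an empty value still stops the search
            return attrs["lang"]
        current = current.parent
    # Nothing set on the root either: use the higher-level protocol, if any.
    if protocol_language is not None:
        return protocol_language
    return ""


root = Node({"lang": "fr"})
print(repr(node_language(Node(parent=root))))
print(repr(node_language(Node({"lang": "en", "xml:lang": "de"}, root))))
print(repr(node_language(Node({"lang": ""}, root))))
```

Treating lang="" as "set" is what lets an author mark a subtree as unknown even when an ancestor declares a language.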

User agents may use the element's language to determine proper processing or rendering (e.g. in the selection of appropriate fonts or pronunciations, or for dictionary selection).

The lang DOM attribute must reflect the lang content attribute.
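Reflection means the DOM attribute and the content attribute are two views of the same value. A minimal Python sketch, assuming the usual rule for reflecting a plain string attribute (reading returns the content attribute's value, or the empty string if it is absent; writing sets the content attribute); the class and its get_attribute/set_attribute methods are hypothetical stand-ins for a DOM element's getAttribute/setAttribute.

```python
from typing import Optional


class HTMLElementSketch:
    """Hypothetical element with a content attribute map and a reflecting `lang` property."""

    def __init__(self) -> None:
        self._content_attributes: dict[str, str] = {}

    def get_attribute(self, name: str) -> Optional[str]:
        return self._content_attributes.get(name)

    def set_attribute(self, name: str, value: str) -> None:
        self._content_attributes[name] = value

    @property
    def lang(self) -> str:
        # Getting: return the content attribute, or "" if it is absent.
        value = self.get_attribute("lang")
        return "" if value is None else value

    @lang.setter
    def lang(self, value: str) -> None:
        # Setting: write through to the content attribute.
        self.set_attribute("lang", value)


element = HTMLElementSketch()
element.lang = "en-US"
print(element.get_attribute("lang"))
```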

Rationale: Why this Attribute Should be Changed

  1. List Rationale
  2. List Rationale
  3. List Rationale
  4. List Rationale
  5. List Rationale
  6. Applicable Design Principles (proposed)
    • Specific Principle
    • Specific Principle
    • Specific Principle
    • Specific Principle

Rationale: Why this Attribute Should Not be Changed

  1. List Rationale
  2. List Rationale
  3. List Rationale
  4. List Rationale
  5. List Rationale
  6. Applicable Design Principles (proposed)
    • Specific Principle
    • Specific Principle
    • Specific Principle
    • Specific Principle

Advice From Authorities



Use Cases

Policies, Guidelines, and Law

Related References

Related E-mail

