This is a DRAFT resource that supports Working Drafts of WCAG 3. Content in this resource is not mature and should not be considered authoritative. It may be changed, replaced or removed at any time.

Method: HTML lang attribute indicates the language of text

Outcome

This method supports the outcome Changes Of Natural Language.

Platform

All platforms that support HTML.

Technology

HTML

Input aspects for testing

CSS styling
DOM tree

Summary

When authors specify the correct natural language of content, user agents, including assistive technologies, can present text more accurately. Screen readers can load the correct pronunciation rules and braille tables for languages.

How it solves user need

Specifying both the natural language of the view's main content and any changes in the language of blocks of content in that view can help:

People who use screen readers or other technologies that convert text into synthetic speech;
People who use braille and need the correct tables to be applied to text; and
People with certain cognitive, language, and learning disabilities who use text-to-speech software.

When to use and what to do

Ensure that the <html> element has a lang attribute. The lang attribute value must be a valid language tag; for example, lang="en" indicates that the language is English. The attribute value must also match the natural language of the page's title element. Indicate changes in language by using the lang attribute on the closest parent element of the content whose language changes.
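A minimal sketch of the first check, assuming the page source is available as a string. It uses only Python's standard html.parser module; the names HtmlLangFinder and html_lang are hypothetical. Treating a value that is empty or only ASCII whitespace as missing is an assumption of this sketch. It also reads the markup as written rather than the DOM a browser builds, so a real test tool would inspect the DOM instead.

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlLangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.found_html = False
        self.lang: Optional[str] = None  # None means "no lang attribute"

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so LANG and lang match.
        if tag == "html" and not self.found_html:
            self.found_html = True
            for name, value in attrs:
                if name == "lang":
                    # A bare `lang` attribute is reported with value None.
                    self.lang = value or ""
                    break  # HTML keeps only the first of duplicate attributes


def html_lang(source: str) -> Optional[str]:
    """Return the lang value on <html>, or None if it is missing or blank.

    Assumption: a value made only of ASCII whitespace counts as missing.
    """
    finder = HtmlLangFinder()
    finder.feed(source)
    finder.close()
    if finder.lang is None:
        return None
    value = finder.lang.strip(" \t\n\f\r")  # the five ASCII whitespace characters
    return value or None
```

For example, html_lang('<!doctype html><html lang="fr">...') gives "fr", while a page whose <html> element has no lang attribute gives None. Whether that value is also a valid language tag is a separate check.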

W3C Resources

Internationalization Best Practices: Specifying Language in XHTML & HTML Content
Explainer: Improving Spoken Presentation on the Web
HTML page has lang attribute - WCAG2 ACT Rule
HTML element language subtag matches language - WCAG2 ACT Rule
Element with lang attribute has valid language tag - WCAG2 ACT Rule

Non-W3C Resources

Language codes - IANA assignments language subtag registry
The lang and xml:lang attributes
RFC 5646: Tags for Identifying Languages

Accessibility Support

Assistive technologies differ in how they handle unknown and invalid language tags. Some fall back to the language of the page, whereas others fall back to the language of the closest ancestor with a valid lang attribute. Languages with no known pronunciation, such as Latin, are exempt from this method and outcome. Major screen readers cannot correctly pronounce a page written in Haryanvi, so Haryanvi is not considered an accessibility-supported language.

Because most assistive technologies consistently use lang over xml:lang when both are present, a violation of this method is not necessarily a violation of the outcome. A violation of the Changes Of Natural Language outcome occurs only when assistive technologies are inconsistent about which attribute they use to determine the language.

Assumptions

The language of the page can be set by methods other than the lang attribute, for example HTTP headers or the meta element. Not all assistive technologies support these methods, so this method assumes that they are insufficient to satisfy the outcome.

Passed

{Example Name}

{explanation}

Failed

{Example Name}

{explanation}

Inapplicable

{Example Name}

{explanation}

Get Started

To start testing the language attributes of a web page, inspect its code in your browser. Once the page has loaded, open your browser's developer tools and find the <html> element near the top of the code; it is usually on the second line, just after the <!doctype html> declaration. Check that it has a lang attribute that correctly indicates the main language of the page. For example, if the main language of the page is French, look for <html lang="fr">.

Once you've checked that the page itself is marked up correctly, you need to read the page to look for blocks of text that are in a different language. If you find a block of content in another language, use your browser's developer tools to inspect the code to check that there is a lang attribute that correctly indicates the language that the block of text is written in. Complete this check for each block of text that's in a different human language from that of the main page.
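The manual review above can be supported by listing each run of text together with the lang value it inherits, so that a tester only has to confirm that the listed language matches what they read. The sketch below is a hypothetical helper (LangWalker, text_languages) built on Python's standard html.parser. It assumes reasonably well-formed markup with explicit end tags, ignores CSS and the accessibility tree, and follows this document's definition that only a non-empty lang attribute stops inheritance. A browser-based tool working on the real DOM would be more reliable.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose contents are not natural language text.
NON_TEXT = {"script", "style"}


class LangWalker(HTMLParser):
    """Pairs each run of text with the lang value it inherits."""

    def __init__(self) -> None:
        super().__init__()
        # Stack of (tag name, inherited lang or None); the root stands for the document.
        self.stack: List[Tuple[str, Optional[str]]] = [("#document", None)]
        self.runs: List[Tuple[Optional[str], str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        inherited = self.stack[-1][1]
        for name, value in attrs:
            if name == "lang":
                if value:  # only a non-empty lang stops inheritance
                    inherited = value
                break
        self.stack.append((tag, inherited))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Pop back to the most recent matching start tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack[-1][0] in NON_TEXT:
            return
        text = " ".join(data.split())  # collapse whitespace for readability
        if text:
            self.runs.append((self.stack[-1][1], text))


def text_languages(source: str) -> List[Tuple[Optional[str], str]]:
    """Return (inherited lang, text) pairs in document order."""
    walker = LangWalker()
    walker.feed(source)
    walker.close()
    return walker.runs
```

Printing the pairs returned for a page gives the tester a checklist: any run whose text is in a different language from its listed lang value is a candidate failure.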

Summary

For each block of content that is in a different human language from that of its parent, ensure there is a lang attribute that correctly matches the language of the content in the block.

Applicability

This outcome applies to any block of text that is included in the accessibility tree.

Expectations

Expectation 1: For each test target, there is an ancestor with a lang attribute.
Expectation 2: For each test target, the ancestor from expectation 1 has a lang attribute value that is a valid language tag.
Expectation 3: For each test target, the language tag from expectation 2 matches that target's default language.
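The three expectations build on each other, which a small evaluator makes explicit: expectation 2 can only pass if expectation 1 does, and expectation 3 only if expectation 2 does. This is a hypothetical sketch, not a published test procedure. It assumes the inherited lang value has already been found, that validity is supplied as a predicate implementing the Valid Language Tag definition, and that "matches" means equal primary language subtags compared case-insensitively. The actual_language input represents the tester's judgement of the target's language; no algorithm for it is given here.

```python
from typing import Callable, Dict, Optional


def primary_subtag(tag: str) -> str:
    """First hyphen-separated subtag, lowercased (subtags are case-insensitive)."""
    return tag.split("-", 1)[0].lower()


def check_expectations(
    inherited_lang: Optional[str],
    is_valid: Callable[[str], bool],
    actual_language: str,
) -> Dict[str, bool]:
    """Evaluate Expectations 1 to 3 for one test target (hypothetical helper).

    inherited_lang: lang value of the closest ancestor that has one, or None.
    is_valid: predicate for the Valid Language Tag definition (assumed input).
    actual_language: primary subtag of the target's language, as judged by a
        tester (assumed input).
    """
    has_lang = inherited_lang is not None
    valid = has_lang and is_valid(inherited_lang)
    # Assumption: "matches" compares primary language subtags only.
    matches = valid and primary_subtag(inherited_lang) == actual_language.lower()
    return {"expectation_1": has_lang, "expectation_2": valid, "expectation_3": matches}
```

For example, a target inheriting lang="fr-CA" that a tester judges to be French passes all three expectations, while a target with no lang ancestor fails all three.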

The definitions in this section are not written in plain language for non-technical users. They use precise technical language, primarily for developers of test tools.

ASCII Whitespace

ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE

Assistive Technology

Hardware and/or software that acts as a user agent, or along with a mainstream user agent, to provide functionality to meet the requirements of users with disabilities that go beyond those offered by mainstream user agents.

Attribute Value

The attribute value of a content attribute set on an HTML element is the value that the attribute gets after being parsed and computed according to specifications. It may differ from the value that is actually written in the HTML code because of trimmed whitespace or non-digit characters, default values, or case-insensitivity.

Some notable cases of attribute values, among others:

For enumerated attributes, the attribute value is either the state of the attribute or the keyword that maps to it, even for the default states. Thus <input type="image"> has an attribute value of either Image Button (the state) or image (the keyword mapping to it), both formulations having the same meaning; similarly, "an input element with a type attribute value of Text" can be either <input type="text">, <input> (missing value default), or <input type="invalid"> (invalid value default).
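For boolean attributes, the attribute value is true when the attribute is present and false otherwise. Thus <button disabled>, <button disabled="disabled">, and <button disabled=""> all have a disabled attribute value of true.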
For attributes whose value is used in a case-insensitive context, the attribute value is the lowercase version of the value written in the HTML code.
For attributes that accept numbers, the attribute value is the result of parsing the value written in the HTML code according to the rules for parsing this kind of number.
For attributes that accept sets of tokens, whether space separated or comma separated, the attribute value is the set of tokens obtained after parsing the set and, depending on the case, converting its items to lowercase (if the set is used in a case-insensitive context).
For aria-* attributes, the attribute value is computed as indicated in the WAI-ARIA specification and the HTML Accessibility API Mappings.

This list is not exhaustive, and only serves as an illustration for some of the most common cases.

The attribute value of an IDL attribute is the value returned on getting it. Note that when an IDL attribute reflects a content attribute, they have the same attribute value.
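Two of the cases above, space-separated token sets and boolean attributes, can be sketched in a few lines. This is an illustration under stated assumptions, not a conforming parser: attributes are assumed to already be available as a name-to-value mapping, the function names are hypothetical, comma-separated sets are not covered, and "lowercase" is interpreted as HTML's ASCII lowercasing (A to Z only).

```python
import re
import string
from typing import FrozenSet, Mapping

# Splits on runs of the five ASCII whitespace code points only (not NBSP etc.).
ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")
# Lowercases A-Z only, leaving non-ASCII letters untouched.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def token_set_value(raw: str, case_insensitive: bool = False) -> FrozenSet[str]:
    """Attribute value of a space-separated token set attribute (hypothetical helper)."""
    tokens = [t for t in ASCII_WHITESPACE.split(raw) if t]
    if case_insensitive:
        tokens = [t.translate(ASCII_LOWER) for t in tokens]
    return frozenset(tokens)


def boolean_value(attrs: Mapping[str, str], name: str) -> bool:
    """A boolean attribute is true when present, whatever value is written."""
    return name in attrs
```

Splitting only on ASCII whitespace matters in practice: a no-break space inside a token does not separate it into two tokens.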

Block of Text

Any natural language text, including alternative text, that starts on a new line and ends with a hard line break is a block of text. Each of the following is an example of a block of text: a paragraph, each item in a multi-line list, each line in a block of code, and each cell in a table.

A block of text is any natural language text, including alternative text, that isn’t part of a larger sentence. A block of text starts on a new line and ends with a line break. Examples include paragraphs, large quotations, lists, buttons, and table cells. Quotations inside a paragraph and links inside sentences are not blocks of text. A paragraph broken by a line break (<br>) would be two blocks of text. Each line of a poem would be a block of text. A definition list displayed on one line on a wide screen could be one block of text, while on a narrow screen it would be two blocks of text.

Content Type

Each document has an associated encoding (an encoding), content type (a string), URL (a URL), origin (an origin), type ("xml" or "html"), and mode ("no-quirks", "quirks", or "limited-quirks"). [ENCODING] [URL] [HTML]

Unless stated otherwise, a document’s encoding is the utf-8 encoding, content type is "application/xml", document URL is "about:blank", document origin is an opaque origin, type is "xml", and its document mode is "no-quirks".

A document is said to be an XML document if its document type is "xml"; otherwise an HTML document. Whether a document is an HTML document or an XML document affects the behavior of certain APIs.

A document is said to be in no-quirks mode if its mode is "no-quirks", quirks mode if its document mode is "quirks", and limited-quirks mode if its document mode is "limited-quirks".

Default Page Language

The default language of a web page is the most common language of the text in its top-level browsing context's document, if that language is unique. If this document has no most common language, or several, the page has no default language.
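The "unique most common language" rule can be sketched as follows. The definition does not say how "most common" is measured, so weighting each piece of text by its word count is an assumption of this sketch, and it works poorly for languages that do not separate words with spaces. The function name default_language and its (language, text) input are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def default_language(samples: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the unique most common language, or None if there is none.

    samples: (language, text) pairs for the text of the top-level document.
    Assumption: each sample is weighted by its whitespace-separated word count.
    """
    totals: Counter = Counter()
    for language, text in samples:
        words = len(text.split())
        if words:
            totals[language] += words
    ranked = totals.most_common(2)
    if not ranked:
        return None  # no text, so no most common language
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # several most common languages: no default language
    return ranked[0][0]
```

A tie between the two leading languages yields None, which corresponds to the "several most common languages" case in the definition.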

Document Element

The document element of a document is the element whose parent is that document, if it exists; otherwise null.

Flat Tree

While Selectors operate on the DOM tree as the host language presents it, with separate trees that are unreachable via the standard parent/child relationship, the rest of CSS needs a single unified tree structure to work with. This is called the flattened element tree (or flat tree).

Inclusive Descendant

An inclusive descendant is an object or one of its descendants.

Natural Language

Natural Language (sometimes just language) refers to the spoken, written, or signed communications used by human beings. (From the Internationalization Glossary.)

Node Document

Each node has an associated node document, set upon creation, that is a document.

Text

A sequence of characters that can be programmatically determined, where the sequence is expressing something in natural language.

Text Inheriting its Programmatic Language from an Element

The text inheriting its programmatic language from an element E is composed of all the following texts:

text nodes: the value of any text node that is visible or included in the accessibility tree and is a child of an element inheriting its programmatic language from E;
accessible text: the accessible name and accessible description of any element inheriting its programmatic language from E, and included in the accessibility tree;
page title: the value of the document title, only if E is a document in a top-level browsing context.

An element F is an element inheriting its programmatic language from an element E if at least one of the following conditions is true (recursively):

F is E itself (an element always inherits its programmatic language from itself); or
F does not have a non-empty lang attribute, and is the child in the flat tree of an element inheriting its programmatic language from E; or
F is a fully active document element, has no non-empty lang attribute, and its browsing context container is an element inheriting its programmatic language from E.
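Because every element has at most one flat-tree parent, and a nested document element has at most one browsing context container, the recursive definition above reduces to a walk upward from F that stops at the first non-empty lang attribute. The sketch below uses a hypothetical minimal Element model rather than a real DOM: parent stands for the flat-tree parent, and container is assumed to be set only on the document element of a fully active nested document (for example, one loaded in an iframe).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Element:
    """Minimal stand-in for a DOM element (hypothetical model).

    parent: the element's parent in the flat tree, or None.
    container: for the document element of a fully active nested document,
        its browsing context container (e.g. an iframe); otherwise None.
    """
    tag: str
    lang: Optional[str] = None
    parent: Optional["Element"] = None
    container: Optional["Element"] = None


def inherits_language_from(f: Element, e: Element) -> bool:
    """True if F is an element inheriting its programmatic language from E."""
    while True:
        if f is e:
            return True  # an element always inherits from itself
        if f.lang:
            return False  # a non-empty lang attribute stops inheritance
        # Flat-tree parent first; a document element has none, so use its container.
        nxt = f.parent if f.parent is not None else f.container
        if nxt is None:
            return False
        f = nxt
```

For example, a span inside <p lang="fr"> inherits from that paragraph but not from <html lang="en">, while the document element of an iframe with no lang attribute of its own inherits from the outer <html>.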

Top Level Browsing Context

A browsing context that has no parent browsing context is the top-level browsing context for itself and all of the browsing contexts for which it is an ancestor browsing context.

A top-level browsing context has an associated group (null or a browsing context group). It is initially null.

It is possible to create new browsing contexts that are related to a top-level browsing context while their container is null. Such browsing contexts are called auxiliary browsing contexts. Auxiliary browsing contexts are always top-level browsing contexts.

Valid Language Tag

A language tag is valid if its primary language subtag exists in the language subtag registry with a Type field whose field-body value is language.

A "language tag" is here to be understood as in the first paragraph of the RFC 5646 language tag syntax, i.e. a sequence of subtags separated by hyphens, where a subtag is any sequence of alphanumerical characters. Thus, this definition intentionally differs from the strict RFC 5646 syntax (and ABNF grammar), because user agents and assistive technologies are more lenient in what they accept. The definition is, however, consistent with the behavior of the :lang() pseudo-class as defined by Selectors Level 3. For example, de-hello would be an accepted way to indicate German in current user agents and assistive technologies, despite not being valid according to the RFC 5646 grammar. As a consequence of this definition, however, grandfathered tags are not correctly recognized as valid language subtags.

Subtags, notably the primary language subtag, are case insensitive. Hence comparison with the language subtag registry must be done in a case insensitive way.
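The lenient definition above translates into two steps: collect the subtags that the registry lists with Type "language", then check that a tag is made of hyphen-separated alphanumeric subtags whose first subtag is in that collection, ignoring case. The sketch assumes the registry is available as text in the record-jar format described in RFC 5646 (records separated by "%%" lines, fields written "Name: body"), and it reads "alphanumerical" as ASCII letters and digits. The function names are hypothetical.

```python
import re
from typing import Set

# One subtag: ASCII letters and digits only (assumed reading of "alphanumerical").
SUBTAG = re.compile(r"[A-Za-z0-9]+")


def language_subtags(registry_text: str) -> Set[str]:
    """Collect, lowercased, the Subtag of every record whose Type is "language"."""
    found: Set[str] = set()
    for record in registry_text.split("\n%%"):
        fields = {}
        for line in record.splitlines():
            # Skip blank lines and indented continuation lines.
            if ":" in line and not line[:1].isspace():
                name, _, body = line.partition(":")
                fields[name.strip()] = body.strip()
        if fields.get("Type") == "language" and "Subtag" in fields:
            found.add(fields["Subtag"].lower())
    return found


def is_valid_language_tag(tag: str, registered: Set[str]) -> bool:
    """Lenient check: alphanumeric subtags separated by hyphens, and a primary
    subtag registered with Type "language" (compared case-insensitively)."""
    subtags = tag.split("-")
    if not all(SUBTAG.fullmatch(s) for s in subtags):
        return False
    return subtags[0].lower() in registered
```

With a registry that lists de as a language, is_valid_language_tag("de-hello", ...) returns True, matching the lenient behavior described above, while a grandfathered tag such as i-klingon fails because i is not a language subtag.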
