
W3C

Language Tags and Locale Identifiers for the World Wide Web

W3C Working Draft 19 April 2006

This version:
http://www.w3.org/TR/2006/WD-ltli-20060419/
Latest version:
http://www.w3.org/TR/ltli/
Editor:
Felix Sasaki, W3C

This document is also available in these non-normative formats: XML.


Abstract

This document describes mechanisms for identifying or selecting the language of content or locale preferences used to process information using Web technologies. It describes how document formats, specifications, and implementations should handle language tags, as well as data structures that extend these tags to describe international preferences.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a First Public Working Draft of "Language Tags and Locale Identifiers for the World Wide Web (LTLI)".


This document was developed by the Internationalization Core Working Group, part of the W3C Internationalization Activity. The Working Group expects to advance this Working Draft to Recommendation Status (see W3C document maturity levels).

Send your comments to www-i18n-comments@w3.org. Use "Comment on LTLI WD" in the subject line of your email. The archives for this list are publicly available.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

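Table of Contents

1 Introduction
    1.1 Scope of this Specification
    1.2 Out of Scope
    1.3 Application Scenario: Web Services Internationalization
    1.4 Further Application Scenarios
2 Notation and Terminology
    2.1 Notation and Terminology
    2.2 Language and Locale Values
    2.3 Matching of Language Values
3 Conformance Criteria
4 Guidelines for the Interoperable Implementation of this Specification

Appendices

A Normative References
B References (Non-Normative)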

1 Introduction

This section is informative.

1.1 Scope of this Specification

This document describes mechanisms for identifying or selecting the language of content or locale preferences used to process information using Web technologies. It describes how document formats, specifications, and implementations should handle the language tags described by [BCP 47], as well as data structures that extend these tags to describe international preferences (see sec. 3.1 in [WS i18n Scenarios]).

The identification of language and locale has a broad range of applications within the World Wide Web. Existing standards that make use of language identification include the xml:lang attribute in [XML 1.0], the lang and hreflang attributes in [HTML 4.01], and the language property in [XSL 1.0]. Locale identification is used, for example, within the CLDR project (see [LDML]).

The current practice in many of these standards is to identify language and locale in terms of [RFC 3066], using formulations like "RFC 3066 or its successor". Recently, a successor to [RFC 3066] has been developed, called [RFC 3066bis]. This specification takes [RFC 3066bis] as the basis for language and locale identification.

Currently, this specification refers to [RFC 3066bis] directly. However, [BCP 47] is always the "Best Common Practice" document for the identification of language. Since [RFC 3066bis] is expected to become the new BCP 47 before this working draft becomes a recommendation, a later draft of this specification will refer to BCP 47 directly.

1.2 Out of Scope

This specification does not deal with formats for locale data, or with actual locale data. However, such formats may apply the definitions made in this specification; see, for example, [LDML].

1.3 Application Scenario: Web Services Internationalization

To enable multi-locale operation of Web services and to support locale negotiation, this specification describes a standardized method for identifying locales, as well as locale and language preferences, on the Web, including non-normative guidelines for implementation. This need is called out in Requirement R005 of [WS i18n Req]. The mechanism for language and locale identification defined in this specification will be used in a future version of the Web Services Internationalization specification [WS i18n].

1.4 Further Application Scenarios

Further application scenarios for this specification include, for example, the standards mentioned in Section 1.1: Scope of this Specification. These scenarios can be divided into two areas:

  • Definition of values for language and locale identifiers

  • Definition of matching schemes for language and locale identifiers

Many specifications already define operations that match language values. An example is the language pseudo-class :lang defined in sec. 5.11.4 of [CSS 2.1], which matches elements based on their language. This specification formulates requirements on such operations, based on [RFC 3066bis Matching].

2 Notation and Terminology

This section is normative.

2.1 Notation and Terminology

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

2.2 Language and Locale Values

Language and locale values are defined as values that are compliant with [RFC 3066bis]. Their purpose is

  1. the identification of a [Definition: human language, whether spoken, written, signed, or otherwise signaled, for the purpose of communication.]

  2. the identification of a [Definition: locale, that is, a collection of international preferences, generally related to a geographic region, that a certain category of users requires. Locales are usually identified by a shorthand identifier or token that is passed from the environment to various processes to get culturally affected behavior.]

Note: These definitions are based on [WS i18n Scenarios]. The same holds for the following description of the relation between language and locale.

Language and locale are distinct properties. Language is a core component of locale, but a locale can identify information that is not associated with language, such as a timezone. Thus the terms language and locale should not be used interchangeably, although the two properties are closely related. Syntactically, locale IDs are sometimes distinguished from language IDs by the use of "_" instead of "-", but this syntactic distinction cannot be relied upon. Historically, locale IDs sometimes included charset parameters (as in "en_US.UTF-8"); that usage is strongly discouraged.
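Because the "_" versus "-" distinction and embedded charsets cannot be relied upon, software that receives legacy POSIX-style locale IDs may want to normalize them before treating them as language tags. The following Python sketch uses a hypothetical helper, `legacy_locale_to_tag`, which is not defined by this specification. It only drops the charset and modifier and swaps the separator, so its output is not guaranteed to be a well-formed or valid [RFC 3066bis] tag; IDs such as "C" pass through unchanged.

```python
def legacy_locale_to_tag(locale_id: str) -> str:
    """Hypothetical helper: map a POSIX-style locale ID such as "en_US.UTF-8"
    to a hyphen-separated, language-tag-like string.

    The modifier (after "@") and charset (after ".") are dropped because they
    are not part of a language tag. Dropping the modifier can lose information
    (for example, "sr_RS@latin" names a script), and the result is NOT checked
    for well-formedness.
    """
    base = locale_id.split("@", 1)[0]  # drop modifier, e.g. "@euro"
    base = base.split(".", 1)[0]       # drop charset, e.g. ".UTF-8"
    return base.replace("_", "-")      # "_" separator becomes "-"


print(legacy_locale_to_tag("en_US.UTF-8"))            # en-US
print(legacy_locale_to_tag("de_DE.ISO-8859-1@euro"))  # de-DE
```

IDs that already use "-" are returned unchanged, so the helper can be applied to mixed input.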

Note that information is sometimes inferred heuristically from language or locale identifiers. For example, software might infer from the locale "fr-FR" that the user's preferred currency is EUR. However, that is only a guess, because that locale ID does not specify the preferred currency. The user may actually live in the UK and make most transactions in GBP.

Example 1: The difference between language and locale

Assuming that the language parameter ja-JP (Japanese as used in Japan) means the user's timezone is "Asia/Tokyo" would be a mistake if the requester is in Australia.
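One way to avoid such guesses is to carry other international preferences next to the language tag instead of deriving them from it. The Python sketch below is illustrative only: the `InternationalPreferences` record, its fields, and the `effective_timezone` helper are hypothetical names, not a data structure defined by this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InternationalPreferences:
    """Hypothetical record: a language tag plus separately stated preferences.
    None means "not stated"; nothing is derived from the language tag."""
    language: str                   # e.g. "ja-JP"
    timezone: Optional[str] = None  # e.g. "Australia/Sydney"
    currency: Optional[str] = None  # ISO 4217 code, e.g. "GBP"


def effective_timezone(prefs: InternationalPreferences, fallback: str) -> str:
    """Use the stated timezone; otherwise use a caller-supplied default.
    The language tag is deliberately never consulted."""
    return prefs.timezone if prefs.timezone is not None else fallback


requester = InternationalPreferences(language="ja-JP", timezone="Australia/Sydney")
print(effective_timezone(requester, "UTC"))  # Australia/Sydney
```

The fallback value is the caller's choice; the point of the design is only that "ja-JP" never silently becomes "Asia/Tokyo".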

2.3 Matching of Language Values

[Definition: A language range is a mechanism for identifying sets of language tags that share specific attributes.] This allows users to select or filter language tags based on specific requirements.

[Definition: A basic language range is a language range as described in sec. 2.1 of [RFC 3066bis Matching].]
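To illustrate, the Python sketch below checks the syntax of a basic language range and applies the basic filtering rule: a range matches a tag if it equals the tag, or equals a prefix of the tag followed by "-", compared case-insensitively, with "*" matching every tag. The syntax and rule are modelled on RFC 4647, the document later published from [RFC 3066bis Matching]; the function names are hypothetical, and details of the cited draft may differ.

```python
import re

# Basic language range syntax, modelled on RFC 4647:
#   language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
BASIC_RANGE = re.compile(r"\*|[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_basic_range(language_range: str) -> bool:
    """True if the string has the syntax of a basic language range."""
    return BASIC_RANGE.fullmatch(language_range) is not None


def basic_match(language_range: str, tag: str) -> bool:
    """Basic filtering: equal to the tag, or a prefix of it that ends at a
    subtag boundary ("-"); "*" matches any tag. Case-insensitive."""
    if language_range == "*":
        return True
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def basic_filter(language_range: str, tags: list[str]) -> list[str]:
    """Return the tags matched by the range, in their original order."""
    return [t for t in tags if basic_match(language_range, t)]


print(basic_filter("de-DE", ["de", "de-DE", "de-de-1996", "de-Latn-DE", "de-DEVA"]))
# ['de-DE', 'de-de-1996']
```

Note that "de-DEVA" is not matched: the prefix must end at a "-" boundary. This prefix rule resembles the one used by the CSS :lang pseudo-class.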

[Definition: An extended language range is a language range as described in sec. 2.2 of [RFC 3066bis Matching].]
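Extended language ranges allow "*" as any subtag, for example "de-*-DE" for German as used in Germany, in any script. The Python sketch below follows the extended filtering algorithm as published in RFC 4647, sec. 3.3.2; the March 2006 draft cited here may differ in detail, and `extended_match` is a hypothetical name.

```python
def extended_match(language_range: str, tag: str) -> bool:
    """Extended filtering, following RFC 4647 sec. 3.3.2 (case-insensitive)."""
    rng = language_range.lower().split("-")
    tg = tag.lower().split("-")
    # The first subtags must match, unless the range starts with "*".
    if rng[0] != "*" and rng[0] != tg[0]:
        return False
    i, j = 1, 1  # current positions in the range and in the tag
    while i < len(rng):
        if rng[i] == "*":        # wildcard: skip it in the range
            i += 1
        elif j >= len(tg):       # tag has no subtags left: no match
            return False
        elif rng[i] == tg[j]:    # subtags agree: advance in both lists
            i += 1
            j += 1
        elif len(tg[j]) == 1:    # a singleton (e.g. "x") in the tag stops the search
            return False
        else:                    # skip a non-matching subtag of the tag
            j += 1
    return True


print(extended_match("de-*-DE", "de-Latn-DE"))  # True
print(extended_match("de-*-DE", "de-x-DE"))     # False
```

Because non-matching tag subtags are skipped, "de-DE" behaves like "de-*-DE" under this algorithm.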

[Definition: A language priority list is a prioritized or weighted list of language ranges as described in sec. 2.3 of [RFC 3066bis Matching].]
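A familiar source of weighted language priority lists is the HTTP Accept-Language header, whose weights are q-values. The Python sketch below turns such a header into ranges ordered from most to least preferred. `parse_priority_list` is a hypothetical name, and the function is deliberately lenient rather than a complete HTTP parser; treating a malformed q-value as 0 is an assumption of this sketch.

```python
def parse_priority_list(header: str) -> list[str]:
    """Turn an Accept-Language-style value into language ranges ordered by
    weight (highest first). Ranges with q=0 ("not acceptable") are dropped;
    ranges with equal weight keep their original order."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        language_range = parts[0]
        if not language_range:
            continue  # skip empty items such as a trailing comma
        q = 1.0       # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: an unparsable weight means "not acceptable"
        if q > 0:
            weighted.append((-q, position, language_range))
    weighted.sort()  # by descending q, then by original position
    return [r for _, _, r in weighted]


print(parse_priority_list("da, en-gb;q=0.8, en;q=0.7"))  # ['da', 'en-gb', 'en']
```

The sample header is the Accept-Language example from the HTTP/1.1 specification (RFC 2616).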

3 Conformance Criteria

This section is normative.

Conformance to this specification means the following:

  1. Specifications that make use of language or locale values MUST meet the conformance criteria for "well-formed" processors, as defined in sec. 2.2.9 of [RFC 3066bis].

  2. Specifications that make use of language or locale values MAY validate these values. If they do so, they MUST meet the conformance criteria for "validating" processors, as defined in sec. 2.2.9 of [RFC 3066bis].

  3. Specifications that define operations on language or locale values using matching MUST use either a basic language range or an extended language range.

  4. Specifications that define operations on language or locale values using matching MUST specify whether matching against a language priority list yields a single result (lookup as defined in [RFC 3066bis Matching]) or a possibly empty set of results (filtering as defined in [RFC 3066bis Matching]).

  5. Specifications that describe the identification of locales or aspects thereof MAY use IRIs [RFC 3987] for this purpose, or to point to more detailed locale or preference data.
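Criterion 1 requires well-formedness, which is a purely syntactic property. As a minimal sketch, the Python function below checks a tag against a regular expression modelled on the "langtag" and "privateuse" productions as later published in RFC 4646. It omits grandfathered tags such as "i-klingon" (which it rejects), the cited draft may differ in detail, and it does not perform the registry checks that a validating processor (criterion 2) needs. The name `is_well_formed` is hypothetical.

```python
import re

# Sketch of the RFC 4646 "langtag" / "privateuse" productions.
# Grandfathered tags are deliberately NOT covered by this pattern.
_LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language, optional extlang
    (?:-[a-z]{4})?                                       # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                          # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*             # variants
    (?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*                  # extensions (singleton other than x)
    (?:-x(?:-[a-z0-9]{1,8})+)?                           # private use suffix
    |
    x(?:-[a-z0-9]{1,8})+                                 # tag that is entirely private use
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """Syntax-only check; says nothing about whether subtags are registered."""
    return _LANGTAG.fullmatch(tag) is not None


for tag in ["zh-Hant-TW", "de-CH-1901", "en_US"]:
    print(tag, is_well_formed(tag))
```

A well-formed result only means the tag has the right shape; "xx-YY" passes this check even though neither subtag need be registered.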

Note: Many specifications created before [RFC 3066bis] and [RFC 3066bis Matching] already conform to these criteria. The purpose of the criteria is to provide a stable source of requirements for language and locale identification.
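Criterion 4 distinguishes lookup, which yields at most one result, from filtering, which yields a possibly empty set. The Python sketch below contrasts the two, assuming the truncation scheme published in RFC 4647 (sec. 3.4) for lookup and the basic matching rule for filtering. The function names are hypothetical, and skipping "*" ranges during lookup is a simplification of this sketch.

```python
from typing import Optional


def lookup(priority_list: list[str], available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """Lookup: return at most one available tag. Each range is progressively
    truncated from the end until it equals an available tag."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue  # simplification: wildcard ranges are skipped
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()                     # drop the last subtag ...
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()                 # ... and a singleton left at the end
    return default


def basic_filtering(priority_list: list[str], available: list[str]) -> list[str]:
    """Filtering: return every available tag matched by some range, in
    priority-list order; the result may be empty."""
    result: list[str] = []
    for language_range in priority_list:
        r = language_range.lower()
        for tag in available:
            t = tag.lower()
            if (r == "*" or t == r or t.startswith(r + "-")) and tag not in result:
                result.append(tag)
    return result


available = ["en", "en-GB", "fr-CA", "de"]
print(lookup(["fr-FR", "en-GB-oed"], available))  # en-GB
print(basic_filtering(["fr-FR", "en"], available))  # ['en', 'en-GB']
```

With "fr-FR" first in the list, lookup finds no French tag even after truncating to "fr" and moves on to the next range, while filtering simply contributes nothing for "fr-FR".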

4 Guidelines for the Interoperable Implementation of this Specification

This section is informative.

[Ed. note: This section will be written in a subsequent working draft.]

A Normative References

BCP 47
Tags for the Identification of Languages. IETF Best Common Practice. BCP 47 is currently represented by [RFC 3066].
RFC 2119
S. Bradner. Key Words for use in RFCs to Indicate Requirement Levels. IETF March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
RFC 3066bis
Addison Phillips, Mark Davis. Tags for the Identification of Languages. IETF Internet-Draft, 14 October 2005. See http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt.
RFC 3066bis Matching
Addison Phillips, Mark Davis. Matching of Language Tags. IETF Internet-Draft, 4 March 2006. See http://www.ietf.org/internet-drafts/draft-ietf-ltru-matching-12.txt.
RFC 3987
Martin Dürst, Michael Suignard. Internationalized Resource Identifiers (IRIs). IETF January 2005. Available at http://www.ietf.org/rfc/rfc3987.txt.

B References (Non-Normative)

CSS 2.1
Bert Bos, Tantek Çelik, Ian Hickson, Håkon Wium Lie. Cascading Style Sheets, level 2 revision 1. W3C Working Draft 13 June 2005. Available at http://www.w3.org/TR/2005/WD-CSS21-20050613/. The latest version of CSS 2.1 is available at http://www.w3.org/TR/CSS21/.
HTML 4.01
Dave Raggett, Arnaud Le Hors, Ian Jacobs, eds. HTML 4.01 Specification. W3C Recommendation 24 December 1999. Available at http://www.w3.org/TR/1999/REC-html401-19991224/. The latest version of HTML 4.01 is available at http://www.w3.org/TR/html401/.
LDML
Mark Davis. Locale Data Markup Language (LDML), Unicode Technical Standard #35. Available at http://unicode.org/reports/tr35/tr35-5.html. The latest version of LDML is available at http://unicode.org/reports/tr35/.
RFC 3066
H. Alvestrand, editor. Tags for the Identification of Languages, IETF January 2001. Available at http://www.ietf.org/rfc/rfc3066.txt.
WS i18n
Addison Phillips, Mary Trumble. Web Services Internationalization (WS-I18N). W3C Working Draft 14 September 2005. Available at http://www.w3.org/TR/2005/WD-ws-i18n-20050914/. The latest version of WS i18n is available at http://www.w3.org/TR/ws-i18n/.
WS i18n Req
Addison Phillips. Requirements for the Internationalization of Web Services. W3C Working Group Note 16 November 2004. Available at http://www.w3.org/TR/2004/NOTE-ws-i18n-req-20041116/. The latest version of WS i18n Req is available at http://www.w3.org/TR/ws-i18n-req/.
WS i18n Scenarios
Debasish Banerjee, Martin Dürst, Mike McKenna, Addison Phillips, Takao Suzuki, Tex Texin, Mary Trumble, Andrea Vine, Kentaro Noji. Web Services Internationalization Usage Scenarios. W3C Working Group Note 30 July 2004. Available at http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/. The latest version of WS i18n Scenarios is available at http://www.w3.org/TR/ws-i18n-scenarios/.
XML 1.0
Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, et al., eds. Extensible Markup Language (XML) 1.0 (Third Edition), W3C Recommendation 04 February 2004. Available at http://www.w3.org/TR/2004/REC-xml-20040204/. The latest version of XML 1.0 is available at http://www.w3.org/TR/REC-xml/.
XSL 1.0
Sharon Adler et al., eds. Extensible Stylesheet Language (XSL) Version 1.0. W3C Recommendation 15 October 2001. Available at http://www.w3.org/TR/2001/REC-xsl-20011015/. The latest version of XSL 1.0 is available at http://www.w3.org/TR/xsl/.