Its0503ReqLangLocale



ITS WG Collaborative editing page


Status: Working Draft

Author: Masaki Itagaki

Identifying Language/Locale

Summary

[R006] Every document should declare, at its beginning, a language/locale that applies both to its main content and to external content stored separately. Although the language/locale may be declared for the whole document, any element or text span in a different language/locale from the document-level declaration should be labeled accordingly. The DTD/Schema should therefore allow any element to carry a language/locale attribute. The language/locale declaration should use industry-standard approaches.

Challenges

Identifying languages (such as French and Spanish) and locales (such as Canadian French and Ecuadorian Spanish) is essential for rendering and processing document text and content properly, because they determine language-dependent properties such as hyphenation, text wrapping rules, color usage, fonts, spell checking, quotation marks, and other punctuation.

To simplify parsing by documentation and localization tools, a document should declare a language/locale that applies to the whole document as well as to externalized content. This declaration should be a document-level property. Because a document may also contain content in multiple languages/locales, subsets of the document need their own language/locale attribute. Such a local language/locale specification should be declared on an element or a span of text.
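The interplay between a document-level declaration and local overrides can be illustrated with a minimal sketch. It assumes the xml:lang attribute is the declaration mechanism and that a nested element inherits the language of its nearest ancestor unless it declares its own. The sample document, its element names (doc, p, span), and the helper names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Qualified name under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: a document-level declaration with two local overrides.
SAMPLE = """<doc xml:lang="en-US">
  <p>Hello <span xml:lang="fr-CA">bonjour</span></p>
  <p xml:lang="es-EC">Hola</p>
</doc>"""


def effective_languages(root):
    """Yield (tag, language) per element, inheriting from the nearest ancestor."""
    def walk(elem, inherited):
        # A local xml:lang overrides the inherited value for this subtree.
        lang = elem.get(XML_LANG, inherited)
        yield elem.tag, lang
        for child in elem:
            yield from walk(child, lang)

    yield from walk(root, None)


def langs_for_sample():
    """Resolve the effective language of every element in SAMPLE."""
    return list(effective_languages(ET.fromstring(SAMPLE)))
```

In this sketch a tool needs only one pass over the tree: the first p element picks up en-US from the document level, while the span and the second p carry their own values.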

Notes

Several standards for language/locale specification currently exist, such as RFC 3066 [RFC 3066][1]. XML 1.0 prescribes a language identification attribute, xml:lang ([XML 1.0][2], section 2.12, and [XML 1.0 Errata][3], E01). The values of xml:lang are defined in terms of RFC 3066 or its successor. The successor, RFC 3066bis, is currently under development. Unicode also publishes a technical standard for locale data markup, [LDML][4]. Care is needed regarding the difference between locale identification and language identification; see [5]. RFC 3066bis provides mechanisms to separate the two clearly. To meet this requirement, ITS should carefully review these existing industry standards and clearly define what a language/locale is and what purpose it serves.
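As a rough illustration of how a tag value carries both a language and a regional component, the sketch below splits an RFC 3066-style tag into its primary language subtag and, when the second subtag is two letters, a country code. It is a simplification under stated assumptions: it checks only the general shape of a tag, does not consult the ISO 639, ISO 3166, or IANA registries, and does not implement RFC 3066bis. The function name split_tag is hypothetical.

```python
import re

# General shape assumed here: an alphabetic primary subtag followed by
# optional alphanumeric subtags, each 1 to 8 characters, separated by "-".
_TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def split_tag(tag):
    """Return (language, country) for a tag; country is None if absent."""
    if not _TAG_SHAPE.fullmatch(tag):
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    primary, *rest = tag.split("-")
    # Assumption: a two-letter second subtag is read as a country code.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        return primary.lower(), rest[0].upper()
    return primary.lower(), None
```

For example, split_tag("fr-CA") separates the language fr from the country CA, whereas split_tag("es") returns a language with no country. A full locale, as [LDML][4] describes it, carries more information than this pair.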