From W3C Wiki
This is a summary of current thoughts on LTLI (Language Tags and Locale Identifiers).
We are still hard-pressed to come up with a satisfactory definition for a locale. "Locale" is a fairly old concept coming from the field of software localization in the 1980s. Localization is understood to mean doing whatever it takes to adapt a piece of software to a given group of users; we're talking about large groups here, such as a whole country or all the speakers of a certain language. The "locale", then, is the set of "things" common to this group, from the point of view of the software being localized. The most important part of localization is the translation of all text into the language of the users, so that they can understand it. But there are other aspects:
- Traditionally, translating to another language often meant using another character set, which in turn required adapting the software to deal with that character set. Therefore "charset" was deemed to be part of a locale, e.g. in the POSIX locale model.
- Apart from static text, which simply gets translated, software often generates or interprets text by itself. Even primitive applications were often able to interpret the user-provided answers to Yes/No questions (the answer being either "Y" for Yes or "N" for No). Thus the single letters used for Yes and No in a given language became part of the locale data for that language. The same applies to things such as dates, which software often generates from a binary data value or interprets from user input: the software then needs to know the conventional order of components (year, month, day) and maybe even the names of the months, etc.
- Depending on the particular application, many other things may be subject to adaptation during localization, and may therefore be considered part of the "locale".
- In many systems the notion of locale allows for customization, and thus is not tied to a particular language/country combination. For example, many systems allow customized date or time formats, number formats, choice of measurement system, and so on.
- The concept of locale sometimes has little to do with software localization; it is simply a general bundle of preferences or other information associated with a user, such as the country of residence, the country of citizenship, and so on.
There is general agreement that language is the core part of the locale. Language is always present, which is not the case for any other "aspect" of a locale.
Soon after software localization came internationalization: designing and implementing software so that subsequent localization is easier and more efficient. Text is externalized into resources, which makes translation easier. Some functionality is also generalized so that it can work in multiple locales. An example is a date display subroutine that can display a date according to many different conventions (different order of components, month names in multiple languages, different character sets). Such a function then needs to be told which convention to use in any particular call. This leads us to locale identifiers.
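As a rough illustration of the date display subroutine described above, the following Python sketch takes a locale identifier as a parameter and uses it to select a convention. The `LOCALE_DATA` table, its keys, and the `format_date` function are hypothetical names invented for this example, and the table is hand-written for three locales only; a real system would presumably draw its conventions from a maintained source such as CLDR.

```python
from datetime import date

# Hypothetical, hand-written locale data for illustration only.
# Each entry records the conventional order of components and the separator.
LOCALE_DATA = {
    "en-US": {"order": ("month", "day", "year"), "sep": "/"},
    "de-DE": {"order": ("day", "month", "year"), "sep": "."},
    "ja-JP": {"order": ("year", "month", "day"), "sep": "/"},
}


def format_date(value: date, locale_id: str) -> str:
    """Format a date numerically using the conventions of locale_id.

    Raises KeyError if locale_id is not in the illustrative table.
    """
    conventions = LOCALE_DATA[locale_id]
    parts = {
        "year": f"{value.year:04d}",
        "month": f"{value.month:02d}",
        "day": f"{value.day:02d}",
    }
    # Join the components in the order this locale conventionally uses.
    return conventions["sep"].join(parts[name] for name in conventions["order"])


if __name__ == "__main__":
    sample = date(2024, 3, 5)
    for locale_id in LOCALE_DATA:
        print(locale_id, format_date(sample, locale_id))
```

The same binary date value yields a different string for each identifier, which is why the caller has to pass the identifier on every call (or set it once as ambient state, as the POSIX model does).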
Topics to be covered by LTLI
- Language versus locale: should the information be part of one field or separate fields? Proposal for best practice: Use one field for both, except in cases where the notion of locale encompasses extended information (as above).
- Core of a locale: language (mostly). No need (or ability) for LTLI to define the rest.
- BCP 47 as core of locale identifiers.
- Specifying locales - legacy formats (POSIX example, Java / CLDR example (centered on language)).
- Canonicalization of other identifiers, e.g. "en_us" to "en-US".
- LTLI and the matching part of BCP 47 - no need to address this.
- Section on "How to reference BCP 47".
- Best practice: How to specify locale on the web: for browsers / user agents (user settings), for server - client interaction (e.g. web services, language negotiation), web applications, ...
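The canonicalization topic in the list above can be sketched as follows. This is a minimal example, assuming the only work needed is replacing underscores with hyphens and applying the casing conventions that BCP 47 recommends (language lowercase, two-letter region uppercase, four-letter script titlecase, everything after a singleton lowercase). Tags are case-insensitive, so the casing is a convention only. The function name `canonicalize_tag` is hypothetical. The sketch does not validate subtags against the registry, does not replace deprecated subtags, and does not handle POSIX-style suffixes such as a charset or modifier.

```python
def canonicalize_tag(identifier: str) -> str:
    """Convert an underscore- or hyphen-separated identifier to BCP 47-style casing.

    Assumption: the input already has the shape of a language tag once
    underscores are replaced; no validation is attempted.
    """
    subtags = identifier.replace("_", "-").split("-")
    result = []
    seen_singleton = False
    for index, subtag in enumerate(subtags):
        if index == 0 or seen_singleton or len(subtag) not in (2, 4):
            # Language subtag, anything after a singleton, and all other lengths.
            result.append(subtag.lower())
        elif len(subtag) == 2:
            # Two-letter subtag in region position.
            result.append(subtag.upper())
        else:
            # Four-character subtag in script position (digits are unaffected).
            result.append(subtag.capitalize())
        if len(subtag) == 1:
            # Extension or private-use singleton: the rest stays lowercase.
            seen_singleton = True
    return "-".join(result)


if __name__ == "__main__":
    for raw in ("en_us", "zh_hant_tw", "EN-us-U-CA-gregory"):
        print(raw, "->", canonicalize_tag(raw))
```

A legacy identifier such as "en_US.UTF-8" would need its charset part stripped before being passed to a function like this one, since charset is not something a language tag expresses.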