JavaScript Internationalization

Current Situation – July 2013

The issues list below was prepared in 2009 for discussion with Ecma TC39. Since then, several issues have been addressed in the first edition of the ECMAScript Internationalization API Specification, approved as standard ECMA-402 in December 2012; these issues are marked with "ES-Intl-1" below. Several additional issues have been addressed in the current draft of the sixth edition of the ECMAScript Language Specification; these issues are marked with "ES-6" below. TC39 has also approved a proposal for supplementary character support in regular expressions, but it has not yet been integrated into the ES-6 draft.

Issues with Current Spec Wording

Remove the restriction that prevents toLowerCase/toUpperCase from casing supplementary characters correctly (P1 – ES-6)
15.9.1.8 strongly encourages DST handling to ignore the rules that actually applied in the past. Too strong? (P1 – ES-6)
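
The casing issue can be illustrated with a minimal sketch, assuming an engine that implements the ES-6 behavior (code-point-based case mapping and \u{...} escapes). The Deseret letters used here are just one convenient supplementary cased pair; the variable names are illustrative only.

```javascript
// Assumption: an ES-6 level engine. U+10400 (DESERET CAPITAL LETTER LONG I)
// lies outside the BMP and is stored as a surrogate pair in UTF-16.
const upper = "\u{10400}";

// With code-point-aware casing this maps to U+10428
// (DESERET SMALL LETTER LONG I); under the older wording the
// string would have been returned unchanged.
const lower = upper.toLowerCase();

// Mapping back should restore the original capital letter.
const roundTrip = lower.toUpperCase();

console.log(lower.codePointAt(0).toString(16)); // 10428
console.log(roundTrip === upper);               // true
```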

Locale-related Behavior

(( Some aspects could be permitted without major changes; some require major work. ))

Locale parameters for formatting dates, numbers, lists, toLocaleString(#locale). (P1 – ES-Intl-1)
Locale-sensitive sorting (P1 – ES-Intl-1)
Method for obtaining the default locale (P1 – ES-Intl-1) and for obtaining available locales (P2 – ES-Intl-1)
Method to obtain default time zone. (P1)
MessageFormat (P2)
Date/Time formatting pattern strings (P2 – alternative design in ES-Intl-1)
TimeZone parameter for formatting dates. (P3)

Note: IETF BCP 47 language tags are generally considered the standard for identifying locales and language specific formats.
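
As a rough sketch of the items marked ES-Intl-1 above, the following uses the Intl object from ECMA-402 with BCP 47 language tags. It assumes an engine that ships locale data for "de" and "en-US"; the sample words, numbers and variable names are made up for illustration.

```javascript
// Assumption: the engine implements ECMA-402 and has data for these locales.
const words = ["zebra", "\u00C4pfel", "apple"]; // "Äpfel"

// Locale-sensitive sorting: the German collator treats Ä like A
// with an accent, so it sorts next to "apple".
const germanOrder = [...words].sort(new Intl.Collator("de").compare);

// Default sort compares UTF-16 code units, so Ä (U+00C4) sorts last.
const codeUnitOrder = [...words].sort();

// Locale parameter for number formatting.
const formattedNumber = new Intl.NumberFormat("en-US").format(1234567.891);

// Locale parameter for date formatting. "UTC" is the only time zone
// name that the first edition requires implementations to accept.
const formattedDate = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" })
  .format(new Date(Date.UTC(2012, 11, 20)));

// Default locale and available locales.
const defaultLocale = new Intl.NumberFormat().resolvedOptions().locale;
const available = Intl.Collator.supportedLocalesOf(["de", "en-US"]);

console.log(germanOrder);     // [ 'Äpfel', 'apple', 'zebra' ]
console.log(codeUnitOrder);   // [ 'apple', 'zebra', 'Äpfel' ]
console.log(formattedNumber); // 1,234,567.891
console.log(formattedDate);   // 12/20/2012
console.log(defaultLocale, available);
```

The default locale is read back through resolvedOptions() because the API exposes it only indirectly, as the locale a formatter resolves to when none is requested.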

Regular Expressions

(( Important, but requires major work. ))

Character classes for complete range of Unicode characters (digit, letter, etc.) (P2)
Sets and ranges with supplementary characters (P2)
Grapheme cluster handling (counting, parsing, incrementing) and code point handling. (P2)

See UTS#18 (http://www.unicode.org/reports/tr18/) and Perl regexp for more.
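
To make the first two items concrete, here is a small sketch of what such support can look like. It assumes an engine with the "u" regular expression flag and with \p{...} property escapes, both of which were standardized after this list was written, so it shows one possible outcome rather than anything defined at the time.

```javascript
// Assumption: engine supports the "u" flag and Unicode property escapes.
// U+1D4B3 MATHEMATICAL SCRIPT CAPITAL X is a supplementary letter.
const scriptX = "\u{1D4B3}";

// Without "u", "." matches a single UTF-16 code unit, so it cannot
// match the whole surrogate pair; with "u" it matches one code point.
const dotWithoutU = /^.$/.test(scriptX);
const dotWithU = /^.$/u.test(scriptX);

// Sets and ranges with supplementary characters.
const inMathRange = /^[\u{1D400}-\u{1D7FF}]$/u.test(scriptX);

// Character classes over the full Unicode range: letters and digits.
const letters = ("a\u03B2" + scriptX + "9").match(/\p{L}/gu);
const digits = "1\u0663x".match(/\p{Nd}/gu); // U+0663 ARABIC-INDIC DIGIT THREE

console.log(dotWithoutU, dotWithU, inMathRange); // false true true
console.log(letters.length, digits.length);      // 3 2
```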

Supplementary Character Support and Unicode References

Track the Unicode version, at least at the major version level (currently 6.0). (P1 – ES-6)
Remove references to UCS-2 and require UTF-16 support. Require full character set and remove limits to BMP. (P1 – ES-6, except regular expressions)
Unicode escapes to support supplementary characters directly (e.g. \U######, \u{######}) (P3 – ES-6) (regex???)
Possibly extend fromCharCode() to accept supplementary code points or provide "fromCodePoint()". (P1 – ES-6)
Add "codePointAt()" to complement "charCodeAt()" to support supplementary characters. (P1 – ES-6)
Line Terminators missing some characters. (P1 – rejected: https://bugs.ecmascript.org/show_bug.cgi?id=409)
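
The items marked ES-6 above can be sketched as follows, assuming an engine that implements the ES-6 additions (\u{...} escapes, String.fromCodePoint, codePointAt and code-point-based string iteration). U+1F600 is an arbitrary supplementary character chosen for the example.

```javascript
// Assumption: an ES-6 level engine.
// Unicode escape that names a supplementary character directly.
const face = "\u{1F600}"; // GRINNING FACE

// The string still consists of two UTF-16 code units.
const codeUnits = face.length;

// charCodeAt() sees only the high surrogate; codePointAt() sees the
// whole code point.
const firstUnit = face.charCodeAt(0);
const codePoint = face.codePointAt(0);

// fromCodePoint() accepts the supplementary code point directly,
// while fromCharCode() needs both surrogates spelled out.
const fromPoint = String.fromCodePoint(0x1F600);
const fromUnits = String.fromCharCode(0xD83D, 0xDE00);

// Iterating a string walks code points, not code units.
const codePointCount = [...face].length;

console.log(codeUnits, codePointCount);                   // 2 1
console.log(firstUnit.toString(16));                      // d83d
console.log(codePoint.toString(16));                      // 1f600
console.log(fromPoint === face, fromUnits === face);      // true true
```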

Providing supplementary character support is an important requirement. The changes made to the Java programming language in this regard (adding methods for accessing code points instead of UTF-16 code units) might be an appropriate model. Norbert Lindenberg's article on the choices Sun made is a good reference:

  http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html

Some backup notes and references

Markus's pages:

ECMAScript Language Specification, 5th edition:

http://ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262%205th%20edition%20December%202009.pdf

ECMAScript Internationalization API Specification, 1st edition:

http://ecma-international.org/publications/standards/Ecma-402.htm