From Internationalization

JavaScript Internationalization

Current Situation – July 2013

The list of issues below was prepared in 2009 for discussion with Ecma TC39. Since then, several issues have been addressed in the first edition of the ECMAScript Internationalization API Specification, approved as standard ECMA-402 in December 2012; these issues are marked "ES-Intl-1" below. Several more have been addressed in the current draft of the sixth edition of the ECMAScript Language Specification; these are marked "ES-6" below. TC39 has also approved a proposal for supplementary character support in regular expressions, but it has not yet been integrated into the ES-6 draft.

Issues with Current Spec Wording

  1. Remove the prohibition in toLowerCase/toUpperCase on correctly case-mapping supplementary characters (P1 – ES-6)
  2. Section 15.9.1.8 (Daylight Saving Time Adjustment) strongly encourages implementations to apply the current DST rules to past dates rather than the rules actually in effect at the time. Is this too strong? (P1 – ES-6)
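The first item can be illustrated in any engine that implements the ES-6 behavior. This minimal sketch uses U+10400 DESERET CAPITAL LETTER LONG I and its lowercase form U+10428, chosen only because they lie outside the BMP and are stored as surrogate pairs. The ES5 wording treated each 16-bit code unit as a BMP code point and copied surrogates through unmapped, so the same calls would have returned their input unchanged.

```javascript
// U+10400 DESERET CAPITAL LETTER LONG I and its lowercase form, U+10428.
const upper = "\u{10400}";
const lower = "\u{10428}";

// Both are supplementary characters, so each occupies two UTF-16 code units.
console.log(upper.length); // 2

// ES-6 and later apply case mappings to whole code points, not to surrogates.
console.log(upper.toLowerCase() === lower); // true
console.log(lower.toUpperCase() === upper); // true

// Under the ES5 rule (surrogates copied through unmapped), both calls
// would have returned their input unchanged.
```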

Locale-related Behavior

(Some aspects could be permitted without major changes; others require major work.)

  1. Locale parameters for formatting dates, numbers, and lists, e.g. toLocaleString(locale) (P1 – ES-Intl-1)
  2. Locale-sensitive sorting (P1 – ES-Intl-1)
  3. Method for obtaining the default locale (P1 – ES-Intl-1) and for obtaining available locales (P2 – ES-Intl-1)
  4. Method for obtaining the default time zone (P1)
  5. MessageFormat (P2)
  6. Date/Time formatting pattern strings (P2 – alternative design in ES-Intl-1)
  7. Time zone parameter for formatting dates (P3)
  • Note: IETF BCP 47 language tags are generally considered the standard for identifying locales and language-specific formats.
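The following sketch shows how the Intl API defined by ECMA-402 covers several of the items above, as it behaves in current engines. It assumes an engine with full locale data (for example, a recent Node.js build). The options `hourCycle` and `formatToParts`, and time zone names other than "UTC", come from later editions of ECMA-402; the first edition only required support for "UTC". The sample words, number, and date are arbitrary, and the values printed at the end depend on the host.

```javascript
// Item 2: locale-sensitive sorting. German treats "ä" as a variant of "a";
// Swedish sorts it as a separate letter after "z".
const words = ["z", "ä", "a"];
const german = [...words].sort(new Intl.Collator("de").compare);
const swedish = [...words].sort(new Intl.Collator("sv").compare);
console.log(german);  // [ 'a', 'ä', 'z' ]
console.log(swedish); // [ 'a', 'z', 'ä' ]

// Item 1: locale parameters for number formatting (BCP 47 tags).
const n = 1234567.891;
console.log(new Intl.NumberFormat("en-US").format(n)); // 1,234,567.891
console.log(new Intl.NumberFormat("de-DE").format(n)); // 1.234.567,891
console.log(n.toLocaleString("de-DE"));                // 1.234.567,891

// Items 1 and 6: dates are formatted from an options object instead of a
// pattern string; the locale decides field order, month names, punctuation.
const date = new Date(Date.UTC(2012, 11, 20, 12, 0)); // 20 December 2012, 12:00 UTC
const dayOptions = { timeZone: "UTC", year: "numeric", month: "long", day: "numeric" };
const enDate = new Intl.DateTimeFormat("en-US", dayOptions).format(date);
const deDate = new Intl.DateTimeFormat("de-DE", dayOptions).format(date);
console.log(enDate); // December 20, 2012
console.log(deDate); // 20. Dezember 2012

// Item 7: an explicit time zone. Tokyo is UTC+9, so 12:00 UTC is 21:00 there.
const tokyoHour = new Intl.DateTimeFormat("en-US", {
  timeZone: "Asia/Tokyo",
  hour: "numeric",
  hourCycle: "h23",
})
  .formatToParts(date)
  .find((part) => part.type === "hour").value;
console.log(tokyoHour); // 21

// Items 3 and 4: the host's default locale and time zone, and which of the
// requested locales the collator supports.
const resolved = new Intl.DateTimeFormat().resolvedOptions();
console.log(resolved.locale);   // host-dependent, e.g. "en-US"
console.log(resolved.timeZone); // host-dependent, e.g. "Europe/Berlin"
console.log(Intl.Collator.supportedLocalesOf(["de", "sv", "ja"]));
```

Reporting the host's default time zone through `resolvedOptions().timeZone` is behavior of later ECMA-402 editions; in the first edition this property was undefined when no time zone was requested.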

Regular Expressions

(Important, but requires major work.)

  1. Character classes covering the complete range of Unicode characters (digit, letter, etc.) (P2)
  2. Sets and ranges with supplementary characters (P2)
  3. Grapheme cluster handling (counting, parsing, incrementing) and code point handling. (P2)
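These items were eventually addressed in later editions: the u flag (from the TC39 proposal mentioned above) in ES2015, Unicode property escapes such as \p{Nd} in ES2018, and grapheme segmentation through Intl.Segmenter in ECMA-402 (2022). The sketch below assumes an engine that supports all three and shows the difference each makes; the sample strings are arbitrary. Note that \d remains ASCII-only even with the u flag.

```javascript
// Item 1: property escapes (u flag required) reach beyond ASCII.
const arabicIndicDigits = "\u0661\u0662\u0663"; // ARABIC-INDIC DIGITS ONE, TWO, THREE
console.log(/^\d+$/.test(arabicIndicDigits));      // false: \d means [0-9] only
console.log(/^\p{Nd}+$/u.test(arabicIndicDigits)); // true: any decimal digit
console.log(/^\p{L}$/u.test("\u{10400}"));         // true: a supplementary letter

// Item 2: with the u flag, "." and class ranges operate on code points.
const grinning = "\u{1F600}"; // U+1F600 GRINNING FACE, a surrogate pair
console.log(/^.$/.test(grinning));                      // false: two code units
console.log(/^.$/u.test(grinning));                     // true: one code point
console.log(/^[\u{1F600}-\u{1F64F}]$/u.test(grinning)); // true

// Item 3: code units vs. code points vs. grapheme clusters.
const eAcute = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT
console.log(eAcute.length);      // 2 code units
console.log([...eAcute].length); // 2 code points
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log([...segmenter.segment(eAcute)].length); // 1 grapheme cluster
```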

Supplementary Character Support and Unicode References

  1. Track the Unicode version, at least at the major version level (currently 6.0). (P1 – ES-6)
  2. Remove references to UCS-2 and require UTF-16 support. Require full character set and remove limits to BMP. (P1 – ES-6, except regular expressions)
  3. Unicode escapes that represent supplementary characters directly (e.g. \U######, \u{######}) (P3 – ES-6; whether this also applies to regular expressions is still open)
  4. Possibly extend fromCharCode() to accept supplementary code points or provide "fromCodePoint()". (P1 – ES-6)
  5. Add "codePointAt()" to complement "charCodeAt()" to support supplementary characters. (P1 – ES-6)
  6. The set of line terminators is missing some characters (P1 – rejected: https://bugs.ecmascript.org/show_bug.cgi?id=409)
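Items 3 to 5 correspond to ES-6 features that behave as follows in current engines. The sample string is arbitrary; it contains U+1F600 GRINNING FACE, which UTF-16 stores as the surrogate pair D83D DE00. The last lines also show that ES-6 string iteration proceeds by code point.

```javascript
// Item 3: the \u{...} escape names a supplementary code point directly.
const s = "a\u{1F600}b";
console.log(s === "a\uD83D\uDE00b"); // true: same string as the surrogate pair
console.log(s.length);               // 4 code units
console.log(/\u{1F600}/u.test(s));   // true: the escape also works with the u flag

// Item 4: fromCodePoint accepts supplementary code points; fromCharCode
// truncates its argument to 16 bits (0x1F600 becomes 0xF600).
console.log(String.fromCodePoint(0x1F600) === "\u{1F600}"); // true
console.log(String.fromCharCode(0x1F600) === "\uF600");     // true

// Item 5: charCodeAt returns a code unit, codePointAt a full code point.
console.log(s.charCodeAt(1).toString(16));  // d83d
console.log(s.codePointAt(1).toString(16)); // 1f600
// At the index of a low surrogate, codePointAt returns that surrogate.
console.log(s.codePointAt(2).toString(16)); // de00

// String iteration (spread, for...of) yields whole code points.
console.log([...s].length); // 3
```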


Providing supplementary character support is an important requirement. The changes made to the Java programming language for this purpose (adding methods that operate on code points alongside those that operate on UTF-16 code units) might be an appropriate model. Norbert Lindenberg has written an article on the choices Sun made that is a good reference:

  http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html
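Java 5 added code point methods such as String.codePointAt, String.codePointCount, and String.offsetByCodePoints next to the existing char-based methods. ES-6 adopts codePointAt but has no built-in counterpart for the other two. The sketch below shows how they might look in JavaScript; the function names are borrowed from Java for illustration and are hypothetical helpers, not JavaScript built-ins. As in Java, an unpaired surrogate counts as one code point; unlike Java, this sketch only accepts non-negative offsets.

```javascript
// Hypothetical helpers modeled on Java's String.codePointCount and
// String.offsetByCodePoints; they are not JavaScript built-ins.

function isHighSurrogate(unit) { return unit >= 0xd800 && unit <= 0xdbff; }
function isLowSurrogate(unit) { return unit >= 0xdc00 && unit <= 0xdfff; }

// Number of code points in the code-unit range [begin, end).
// An unpaired surrogate counts as one code point, as in Java.
function codePointCount(s, begin = 0, end = s.length) {
  let count = 0;
  for (let i = begin; i < end; i++) {
    count++;
    if (isHighSurrogate(s.charCodeAt(i)) && i + 1 < end && isLowSurrogate(s.charCodeAt(i + 1))) {
      i++; // skip the trailing half of a surrogate pair
    }
  }
  return count;
}

// Code-unit index reached by moving `offset` code points forward from `index`.
// Only non-negative offsets are handled in this sketch.
function offsetByCodePoints(s, index, offset) {
  let i = index;
  for (let n = 0; n < offset; n++) {
    if (i >= s.length) throw new RangeError("offset moves past the end of the string");
    // codePointAt returns a value above 0xFFFF only at the start of a valid pair.
    i += s.codePointAt(i) > 0xffff ? 2 : 1;
  }
  return i;
}

const sample = "a\u{1F600}b";
console.log(codePointCount(sample));           // 3
console.log(offsetByCodePoints(sample, 0, 2)); // 3, the index of "b"
```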

Some backup notes and references

Markus's pages:

ECMAScript Language Specification, 5ed:

ECMAScript Internationalization API Specification, 1ed:

See also:

ECMAScript Language Specification, draft 6ed:

Proposal for supplementary characters in regular expressions: