Internationalization Project Review 2002-05-23 - slide "Character Model: Normalization"

Section 4 of Character Model: most difficult, most contentious

Basic Problem: How to deal with canonical equivalence in Unicode between precomposed (á) and decomposed (a + ´) forms
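A minimal sketch of the equivalence, using Python's standard unicodedata module (not part of the original slide; the variable names are illustrative only, and the combining accent U+0301 stands in for the spacing ´ shown above):

```python
import unicodedata

precomposed = "\u00e1"   # á as a single code point (U+00E1)
decomposed = "a\u0301"   # a followed by U+0301 COMBINING ACUTE ACCENT

# Compared code point by code point, the two strings differ ...
differ_raw = precomposed != decomposed

# ... but they become identical once both are in the same normalization form.
equal_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
equal_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(differ_raw, equal_nfc, equal_nfd)
```

A naive string comparison treats the two spellings as different text, which is the problem normalization is meant to address.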

First Last Call had "sender makes right", but without enforcement

Second Last Call moved to "check and reject", based mainly on comments from Microsoft, with restrictions on pieces of a grammar (e.g. XML)
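One rough way to picture the difference between the two policies, as a sketch only: the function names below are hypothetical, NFC is assumed as the target form, and the grammar-specific restrictions mentioned above are not modeled.

```python
import unicodedata


def sender_makes_right(text: str) -> str:
    """Hypothetical sender-side step: normalize to NFC before sending."""
    return unicodedata.normalize("NFC", text)


def check_and_reject(text: str) -> str:
    """Hypothetical receiver-side step: accept only text already in NFC."""
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("input is not in NFC")
    return text


# A well-behaved sender normalizes, so the receiver's check passes.
accepted = check_and_reject(sender_makes_right("a\u0301"))

# Without sender-side normalization, the receiver rejects the same text.
try:
    check_and_reject("a\u0301")
    rejected = False
except ValueError:
    rejected = True
```

Under the first policy nothing stops unnormalized text from getting through if a sender skips its step; under the second, the receiver's check is the enforcement.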

By design, data is almost always in NFC (main exceptions: LOC and Vietnamese)

Creates problems: people don't want to do it, because the actual need is quite rare