Character Model: Normalization


Section 4 of Character Model: most difficult, most contentious

Basic Problem: How to deal with canonical equivalence in Unicode between precomposed (á) and decomposed (a + ´) forms
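A minimal sketch of the problem, using Python's standard `unicodedata` module. The variable names are illustrative only, and the decomposed form is assumed to use U+0301 COMBINING ACUTE ACCENT for the accent:

```python
import unicodedata

# Two canonically equivalent spellings of the same character.
precomposed = "\u00E1"   # á as one code point
decomposed = "a\u0301"   # a followed by COMBINING ACUTE ACCENT

# A plain code point comparison treats them as different strings.
naive_equal = precomposed == decomposed

# Normalizing both sides to NFC makes the comparison succeed.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(naive_equal, nfc_equal)
```

The two strings render identically but differ in length (one code point versus two), which is why matching, searching, and identifier comparison can fail without normalization.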

Got Unicode Consortium to define Normalization Form C (NFC) based on Requirements

First Last Call had "sender makes right", but without enforcement

Second Last Call moved to "check and reject", based mainly on comments from Microsoft, with restrictions on pieces of a grammar (e.g. XML)
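The two policies can be contrasted with a rough sketch. The function names `sender_makes_right` and `check_and_reject` are hypothetical, chosen here for illustration. The sketch only tests whether a string is in NFC and does not model the additional restrictions on pieces of a grammar:

```python
import unicodedata


def sender_makes_right(text: str) -> str:
    """Sender side: normalize to NFC before sending."""
    return unicodedata.normalize("NFC", text)


def check_and_reject(text: str) -> str:
    """Receiver side: accept NFC input unchanged, reject anything else."""
    if unicodedata.normalize("NFC", text) != text:
        raise ValueError("input is not in Normalization Form C")
    return text


# Sender normalizes, so the receiver's check passes.
accepted = check_and_reject(sender_makes_right("a\u0301"))

# Unnormalized data sent directly would be rejected.
try:
    check_and_reject("a\u0301")
    rejected = False
except ValueError:
    rejected = True
```

Under the first policy the receiver trusts the sender; under the second the receiver enforces the requirement itself, which is the cost implementers have to pay on every input.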

By design, data is almost always in NFC (main exceptions: LOC and Vietnamese)

Creates problems: people don't want to implement it, since the actual need is quite rare