XML 1.0 Leftover: Character Normalization
- Problem: Unicode has duplicate representations for the same character
(e.g. precomposed ü vs. u + combining diaeresis), a leftover from the
Unicode/ISO 10646 merger
- Normalization Form C (NFC): Prefer precomposed over decomposed
- Early Uniform Normalization: Normalize when text is created, to avoid
problems later
- Avoids unpleasant surprises when matching strings
- NFC is designed to stay close to the most widely used encoding variant
- Normalization Checking can be implemented compactly
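
The points above can be illustrated with a minimal sketch using Python's standard `unicodedata` module. The helper names `to_nfc` and `is_nfc` are made up for this example and are not part of XML 1.0 or any specification. The check here simply normalizes and compares, which is one straightforward way to do it and not necessarily the compact implementation the slide has in mind.

```python
import unicodedata

# Two representations of the same character "ü"
precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # "u" followed by COMBINING DIAERESIS


def to_nfc(text: str) -> str:
    """Normalize early: convert text to Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Normalization check: text is in NFC if normalizing leaves it unchanged."""
    return unicodedata.normalize("NFC", text) == text


# Without normalization, string matching gives an unpleasant surprise
print(precomposed == decomposed)            # False

# After normalization, both forms compare equal
print(to_nfc(decomposed) == precomposed)    # True

# Checking tells us which input still needs normalizing
print(is_nfc(precomposed), is_nfc(decomposed))  # True False
```

Normalizing once, when the text is created, means later comparisons can stay plain string equality.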