Character Encoding
- Originally a huge roadblock to networking
- Part of traditional locale model
- Really important for worldwide localization
- Largely solved due to Unicode (and XML)
- Indication of encoding in protocol headers or formats:
  Content-Type: text/html;charset=iso-8859-6
  <?xml version='1.0' encoding='iso-8859-6'?>
- Unicode as a reference for conversion and processing ("Think Unicode"):
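  - Visible e.g. in numeric character references (such as the one for 覫), which always denote Unicode code points regardless of the document's own encoding
  - Some remaining inaccuracies due to vendor-specific differences in conversion tables (see e.g. XML Japanese Profile)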
- Even better: Unicode-based encodings (e.g. UTF-8) for transfer:
  - UTF-8 (and UTF-16, with a BOM) are defaults for XML
  - XML processors are required to accept UTF-8 (and UTF-16)
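
The points above can be illustrated with a minimal Python sketch: read the declared charset from a Content-Type value, decode to Unicode as the common reference, and re-encode as UTF-8 for transfer. The function names (charset_from_content_type, to_utf8, numeric_reference) are hypothetical and only for illustration, and falling back to UTF-8 when no charset is declared is an assumption of this sketch, not a rule of any particular protocol.

```python
import html
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value.

    The UTF-8 fallback is an assumption made for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def to_utf8(raw, content_type):
    """Decode bytes with the declared charset, then re-encode as UTF-8.

    Unicode (the Python str) is the pivot between the two encodings.
    """
    text = raw.decode(charset_from_content_type(content_type))
    return text.encode("utf-8")


def numeric_reference(ch):
    """Build a decimal numeric character reference for one character.

    The number is the Unicode code point, whatever encoding the
    surrounding document happens to use.
    """
    return f"&#{ord(ch)};"


if __name__ == "__main__":
    header = "text/html;charset=iso-8859-6"
    legacy_bytes = "سلام".encode("iso-8859-6")  # one byte per Arabic letter
    utf8_bytes = to_utf8(legacy_bytes, header)
    print(charset_from_content_type(header), len(legacy_bytes), len(utf8_bytes))

    ref = numeric_reference("覫")
    # html.unescape resolves the reference back to the same character.
    print(ref, html.unescape(ref) == "覫")
```

Decoding to Unicode first and only then choosing an output encoding keeps each legacy charset's conversion table in one place; the vendor-specific table differences mentioned above would show up in the decode step.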