What happens if we just use UTF-8?
- Domain name not found: Failure, but not stability problem
- Software crashes, etc.: Security problem, not i18n problem
- High bits get lost (bit-stripping): Analysis
- For ease of processing, UTF-8 is highly regular and somewhat
redundant (see also "The Properties and Promizes of UTF-8")
- After bit-stripping, only 11 of the 64 possible UTF-8 trailing byte
values map to characters that are valid in a domain name (-, 0...9)
- Example: Fältström => FC$ltstrC6m
- Bit-stripped names will either be invalid (most of the time, and in
particular when they contain no Latin characters) or contain digits in
unusual places
- Wrong hits are not impossible, but very rare
- This needs more research
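
The bit-stripping analysis above can be reproduced with a short Python sketch. It assumes that "valid" means the traditional letters-digits-hyphen repertoire of host names; the names bit_strip, is_ldh and valid_trailing are illustrative only and do not come from any DNS library.

```python
import string

# Assumption: "valid" means the letters-digits-hyphen (LDH) repertoire.
LDH = set(string.ascii_letters + string.digits + "-")


def bit_strip(name: str) -> str:
    """Encode as UTF-8, clear the high bit of every byte, read back as ASCII."""
    return bytes(b & 0x7F for b in name.encode("utf-8")).decode("ascii")


def is_ldh(label: str) -> bool:
    """True if every character of the label is a letter, digit or hyphen."""
    return all(ch in LDH for ch in label)


# UTF-8 trailing bytes are 0x80..0xBF; bit-stripping maps them to 0x00..0x3F.
valid_trailing = [
    chr(b & 0x7F) for b in range(0x80, 0xC0) if chr(b & 0x7F) in LDH
]

if __name__ == "__main__":
    stripped = bit_strip("F\u00e4ltstr\u00f6m")  # precomposed a-umlaut, o-umlaut
    print(stripped)                    # FC$ltstrC6m
    print(is_ldh(stripped))            # False, because of the '$'
    print(len(valid_trailing), "".join(valid_trailing))  # 11 -0123456789
```

Lead bytes, by contrast, mostly land in the ASCII letter range after stripping, so in this sketch it is the trailing bytes that decide whether a stripped name still looks valid.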