Equivalence versus Normalization: The Case of Identifiers

Martin J. Dürst
Keio University/W3C

© 1998 Unicode/W3C/Keio University

Goals

Equivalence vs. Normalization

Equivalence:

1 inch is equivalent to 2.54 centimeters

Normalization:

Let's all use centimeters, so that we understand each other better

Equivalence in Unicode

é is (canonically) equivalent to e + ´
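
A minimal sketch of this equivalence, assuming Python's standard unicodedata module and the Unicode normalization forms NFC and NFD (neither is named on the slides). The accent is written here as U+0301 COMBINING ACUTE ACCENT; the helper name canonically_equivalent is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

def canonically_equivalent(a, b):
    # Two strings are treated as canonically equivalent when their
    # fully decomposed (NFD) forms are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The raw code point sequences differ, although a reader sees the same text.
print(precomposed == decomposed)
print(canonically_equivalent(precomposed, decomposed))
```

Comparing through a normalized form is the "equivalence" approach: both spellings stay in circulation, and the comparison absorbs the difference.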

     

Ambiguities in Unicode

Equivalence Categories

Canonical Equivalence: Reader has no chance to make or see a difference
=> Needs to work, or reader will be highly confused

Compatibility Equivalence: Almost the same, but difference can be identified
=> Important for specific applications (searching, sorting)
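
The two categories can be told apart in a short sketch, again assuming Python's unicodedata module. NFC is used as the canonical form and NFKC as the compatibility form; the ligature U+FB01 is an example chosen for illustration and does not come from the slides.

```python
import unicodedata

word = "\ufb01le"  # "file" spelled with U+FB01 LATIN SMALL LIGATURE FI

# Canonical normalization leaves the ligature alone: it is not
# canonically equivalent to the two letters "f" + "i".
canonical = unicodedata.normalize("NFC", word)

# Compatibility normalization folds the ligature into "fi", which is
# what a search or sort key would typically want.
compatible = unicodedata.normalize("NFKC", word)

print(canonical == word)
print(compatible)
```

A search application might build its keys with the compatibility form while leaving the stored text untouched, since the ligature is a difference a reader can still identify.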

Unicode for Running Text

Identifiers

Equivalence for Identifiers

Do We Need Normalization?

YES:

NO?:

Architecture

Do the right things at the right place

An example: Proxies

Internet Engineering Principle

Be conservative in what you send, be liberal in what you accept
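
Applied to identifiers, the principle could look like the sketch below. It assumes that sender and receiver have agreed on a single form (NFC is picked here purely for illustration; the slides do not settle which form to use), and the function names prepare_for_sending and identifiers_match are hypothetical.

```python
import unicodedata

AGREED_FORM = "NFC"  # assumption: one form agreed on by all parties

def prepare_for_sending(identifier):
    # Conservative: emit identifiers in the agreed form only.
    return unicodedata.normalize(AGREED_FORM, identifier)

def identifiers_match(received, known):
    # Liberal: accept any canonically equivalent spelling by
    # normalizing both sides before comparing.
    return (unicodedata.normalize(AGREED_FORM, received)
            == unicodedata.normalize(AGREED_FORM, known))

print(identifiers_match("re\u0301sume\u0301", "r\u00e9sum\u00e9"))
```

The sketch only works once such an agreement exists; without it, a sender has no single form to be conservative about.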

Problem: No way to be conservative

Problem: Which Way to Normalize

Conclusion