String Identity Matching Choices


  1. Which representations to treat as equivalent (and which not)
  2. Which components of the WWW architecture to make responsible for handling equivalences:
    1. Late Normalization: each individual component that performs a string identity check has to take equivalences into account
    2. Early Normalization: duplicates and ambiguities are removed as close to their source as possible
  3. How to normalize, in case early normalization (2.2) is needed, even if only in some cases
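
The difference between choices 2.1 and 2.2 can be illustrated with a minimal Python sketch. It assumes that the equivalence in question is Unicode canonical equivalence and that NFC is the normalization form chosen; both are illustrative assumptions, not decisions made by the list above. The helper names `late_equal` and `early_normalize` are hypothetical.

```python
import unicodedata


def late_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Late normalization (2.1): the component doing the identity check
    normalizes both operands itself at comparison time.
    The default form "NFC" is an assumption made for this sketch."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def early_normalize(s: str, form: str = "NFC") -> str:
    """Early normalization (2.2): text is normalized once, close to its
    source, so later identity checks can be plain comparisons."""
    return unicodedata.normalize(form, s)


# Two representations of the same text: a precomposed character versus
# a base letter followed by a combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"

# A naive identity check treats the two representations as different.
naive_match = precomposed == decomposed

# Late normalization: the comparing component accounts for the equivalence.
late_match = late_equal(precomposed, decomposed)

# Early normalization: both strings are normalized before being stored,
# so the duplicate disappears and a plain set is enough afterwards.
stored = {early_normalize(precomposed), early_normalize(decomposed)}
```

With late normalization, every component that compares strings pays the normalization cost and must agree on the same rules. With early normalization, that cost is paid once at the source, but every consumer has to be able to rely on the text having been normalized.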