
Proposal regarding TAG issue IRIEverywhere-27

Status of this document

This document was prepared in response to an action item assigned at the 18 November 2002 TAG face-to-face meeting regarding issue IRIEverywhere-27. The TAG has not reviewed this document. It reflects part of the TAG's 18 November 2002 discussion but does not represent the consensus of the TAG or of any other party. Please send comments on this document to the public mailing list www-tag.

Many thanks to Martin Dürst for discussing these topics.


IRIs are defined in an Internet Draft entitled "Internationalized Resource Identifiers"; see the IRI home page for the latest version.

This document proposes:

  1. An answer to the question "To what extent should IRIs be used on the Web today?"
  2. Changes to various specifications.

1. To what extent should IRIs be used on the Web today?

IRIs are based on several years of experience with internationalized identifiers. Some W3C Recommendations include language copied from IRI drafts, and other groups are likely to do the same, possibly introducing interoperability problems. The TAG therefore supports the work on an IRI specification (per 18 Nov 2002 teleconf).

The IRI specification is still a draft, however. Therefore, any party wishing to use IRIs -- software developers, content authors, or specification editors -- should use them with caution.

2. Changes to various specifications in light of this model

The TAG considers the IRI space and the URI space to be different (per 18 Nov 2002 teleconf): IRIs are a way to talk about URIs, much as relative URIs are a way to talk about the absolute URIs they map to. In this light, the TAG recommends that IRIs not be compared for equivalence directly; instead, they should be converted to URIs (possibly after normalization) and the resulting URIs compared. The TAG is working on a separate finding entitled "How to Compare Uniform Resource Identifiers."
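The comparison model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IRI draft's normative algorithm: the function names iri_to_uri and iris_equivalent are hypothetical, the choice of Unicode NFC as the "possible normalization" is an assumption, and the final URI comparison is plain string equality rather than the fuller procedure of "How to Compare Uniform Resource Identifiers." The conversion follows the IRI draft's general approach of encoding each non-ASCII character as UTF-8 and hex-escaping every resulting byte.

```python
import unicodedata


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI (sketch).

    Assumption: the IRI is first normalized to Unicode NFC. Then every
    non-ASCII character is encoded as UTF-8 and each byte is written as a
    %HH escape; ASCII characters pass through unchanged.
    """
    iri = unicodedata.normalize("NFC", iri)
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


def iris_equivalent(a: str, b: str) -> bool:
    """Compare two IRIs by converting both to URIs, then comparing the URIs.

    Simplification: the URIs are compared as plain strings; a fuller version
    would first apply URI normalizations such as hex-digit case folding.
    """
    return iri_to_uri(a) == iri_to_uri(b)
```

For example, iri_to_uri("http://example.org/D\u00fcrst") yields "http://example.org/D%C3%BCrst", so that IRI and the hex-escaped URI compare as equivalent. Comparing only after conversion keeps IRI equivalence defined entirely in terms of URI equivalence, which is the point of treating IRIs as a way to talk about URIs.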

In light of this view of IRIs and URIs (and of other concerns regarding hex-escaping), the authors recommend some changes to the following specifications.

2.1 IRI specification

  1. Section 2.3 ("IRI Equivalence and Normalization") should:
    1. Talk about normalization
    2. Explain that IRI comparison is done by normalizing and converting to URIs (per section 3.1), then comparing the URIs per "How to Compare Uniform Resource Identifiers."
  2. Question: Should the TAG ask the IRI spec editors to convey a model whereby IRIs are a means of "talking about a URI"? Currently, the IRI specification conveys the message that IRIs should replace URIs. Martin observes that if ~, %7e, and %7E are said to be equivalent, then IRIs should be described as a way of writing URIs; otherwise, IRIs and URIs are different.

2.2 RFC2396

  1. RFC2396 should be modified so that hex digits (HEXDIG) are case-insensitive. We note that, at the time of writing, the editor's draft of RFC2396 neither discusses this case-insensitivity nor defines HEXDIG.
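If hex digits are case-insensitive, %7e and %7E denote the same octet, so a comparison can fold the hex digits of every escape to a single case before comparing. The Python sketch below does only that; the function name normalize_hex_case is hypothetical, and choosing upper case as the canonical form is an assumption that follows the TAG's request for "upper case" in "How to Compare URIs."

```python
import re

# One escape triplet: "%" followed by two hex digits of either case.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_hex_case(uri: str) -> str:
    """Rewrite every %hh escape with upper-case hex digits (sketch).

    Only the escape triplets are changed; all other characters of the
    URI are left exactly as they were.
    """
    return _ESCAPE.sub(lambda m: m.group(0).upper(), uri)
```

With this folding, "http://example.org/%7ejoe" and "http://example.org/%7Ejoe" normalize to the same string and therefore compare as equal.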

2.3 How to Compare URIs

  1. Include a reference to the IRI specification, where IRI comparison is discussed.
  2. Per TAG discussion at the 20 Jan 2003 teleconference, ask that "lower case" in section 3.1 be changed to "upper case," even if the desired change to RFC2396 is made.
  3. In section 3.1, step 2, part three, change "character sequence" to "triplet sequence."
  4. For ASCII characters, treat a character and its hex-escaped forms as equivalent; for example, treat ~, %7e, and %7E as equivalent. If "~" were treated differently from %7e and %7E, then two IRIs might be considered equivalent while the URIs they map to would be considered different. This breaks the model of seeing IRIs as a way of writing URIs.
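Taken together, items 2 through 4 suggest a comparison that works on escape triplets: unescape triplets that stand for ASCII characters safe to write literally, and put the remaining triplets into upper case. The Python sketch below is a hypothetical illustration; normalize_escapes and uris_equivalent are invented names, and restricting unescaping to the RFC 2396 "unreserved" characters is an assumption, since unescaping reserved characters such as "/" could change how a URI is parsed.

```python
import re
import string

# Assumption: only RFC 2396 "unreserved" characters (alphanumerics plus these
# marks) are unescaped. Escapes of reserved characters such as "/" or "?" are
# kept, because unescaping them can change how the URI is parsed.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.!~*'()")

# One escape triplet; group 1 captures the two hex digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Unescape unreserved ASCII characters and upper-case all other escapes."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(replace, uri)


def uris_equivalent(a: str, b: str) -> bool:
    """Treat two URIs as equivalent if they match after escape normalization."""
    return normalize_escapes(a) == normalize_escapes(b)
```

Under this sketch "http://example.org/~joe", "http://example.org/%7ejoe", and "http://example.org/%7Ejoe" are all equivalent, while "http://example.org/a/b" and "http://example.org/a%2Fb" remain different because "/" is reserved.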

Then ask that How to Compare URIs be reflected in RFC2396.

2.4 XML Namespaces 1.1

Chris Lilley, Ian Jacobs

Last modified: 2003/01/27 22:58:44 by ijacobs