Diff: draft-duerst-iri-10.txt - draft-duerst-iri.txt

	draft-duerst-iri-10.txt	draft-duerst-iri.txt

	Network Working Group M. Duerst	Network Working Group M. Duerst
	Internet-Draft W3C	Internet-Draft W3C
	Expires: March 28, 2005 M. Suignard	Expires: May 31, 2005 M. Suignard
	Microsoft Corporation	Microsoft Corporation
	September 27, 2004	November 30, 2004

	Internationalized Resource Identifiers (IRIs)	Internationalized Resource Identifiers (IRIs)
	draft-duerst-iri-10	draft-duerst-iri-11

	Status of this Memo	Status of this Memo

	This document is an Internet-Draft and is subject to all provisions	This document is an Internet-Draft and is subject to all provisions
	of section 3 of RFC 3667. By submitting this Internet-Draft, each	of section 3 of RFC 3667. By submitting this Internet-Draft, each
	author represents that any applicable patent or other IPR claims of	author represents that any applicable patent or other IPR claims of
	which he or she is aware have been or will be disclosed, and any of	which he or she is aware have been or will be disclosed, and any of
	which he or she become aware will be disclosed, in accordance with	which he or she become aware will be disclosed, in accordance with
	RFC 3668.	RFC 3668.


	skipping to change at page 1, line 37	skipping to change at page 1, line 37
	and may be updated, replaced, or obsoleted by other documents at any	and may be updated, replaced, or obsoleted by other documents at any
	time. It is inappropriate to use Internet-Drafts as reference	time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."	material or to cite them other than as "work in progress."

	The list of current Internet-Drafts can be accessed at	The list of current Internet-Drafts can be accessed at
	http://www.ietf.org/ietf/1id-abstracts.txt.	http://www.ietf.org/ietf/1id-abstracts.txt.

	The list of Internet-Draft Shadow Directories can be accessed at	The list of Internet-Draft Shadow Directories can be accessed at
	http://www.ietf.org/shadow.html.	http://www.ietf.org/shadow.html.

	This Internet-Draft will expire on March 28, 2005.	This Internet-Draft will expire on May 31, 2005.

	Copyright Notice	Copyright Notice

	Copyright (C) The Internet Society (2004).	Copyright (C) The Internet Society (2004).

	Abstract	Abstract

	This document defines a new protocol element, the Internationalized	This document defines a new protocol element, the Internationalized
	Resource Identifier (IRI), as a complement to the Uniform Resource	Resource Identifier (IRI), as a complement to the Uniform Resource
	Identifier (URI). An IRI is a sequence of characters from the	Identifier (URI). An IRI is a sequence of characters from the

	skipping to change at page 2, line 16	skipping to change at page 2, line 16
	of extending or changing the definition of URIs, to allow a clear	of extending or changing the definition of URIs, to allow a clear
	distinction and to avoid incompatibilities with existing software.	distinction and to avoid incompatibilities with existing software.
	Guidelines for the use and deployment of IRIs in various protocols,	Guidelines for the use and deployment of IRIs in various protocols,
	formats, and software components that now deal with URIs are	formats, and software components that now deal with URIs are
	provided.	provided.

	Table of Contents	Table of Contents

	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
	1.1 Overview and Motivation . . . . . . . . . . . . . . . . . 4	1.1 Overview and Motivation . . . . . . . . . . . . . . . . . 4
	1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . 5	1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . 4
	1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . 5	1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . 5
	1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . 6	1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . 6
	2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 7	2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 7
	2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 7	2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 7
	2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . 8	2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . 8
	3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 11	3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 10
	3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . 11	3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . 11
	3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . 14	3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . 14
	3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . 16	3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . 15
	4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 17	4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 17
	4.1 Logical Storage and Visual Presentation . . . . . . . . . 17	4.1 Logical Storage and Visual Presentation . . . . . . . . . 17
	4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . 19	4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . 18
	4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . 20	4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . 20
	4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 20	4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 20
	5. IRI Equivalence and Comparison . . . . . . . . . . . . . . . . 22	5. Normalization and Comparison . . . . . . . . . . . . . . . . . 22
	5.1 Simple String Comparison . . . . . . . . . . . . . . . . . 22	5.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 22
	5.2 Conversion to URIs . . . . . . . . . . . . . . . . . . . . 23	5.2 Preparation for Comparison . . . . . . . . . . . . . . . . 23
	5.3 Normalization . . . . . . . . . . . . . . . . . . . . . . 23	5.3 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 23
	5.4 Preferred Forms . . . . . . . . . . . . . . . . . . . . . 24	5.3.1 Simple String Comparison . . . . . . . . . . . . . . . 24
	6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 25	5.3.2 Syntax-based Normalization . . . . . . . . . . . . . . 25
	6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 25	5.3.3 Scheme-based Normalization . . . . . . . . . . . . . . 27
	6.2 Software Interfaces and Protocols . . . . . . . . . . . . 25	5.3.4 Protocol-based Normalization . . . . . . . . . . . . . 29
	6.3 Format of URIs and IRIs in Documents and Protocols . . . . 26	6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 29
	6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 26	6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 29
	6.5 Relative IRI References . . . . . . . . . . . . . . . . . 28	6.2 Software Interfaces and Protocols . . . . . . . . . . . . 30
	7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 28	6.3 Format of URIs and IRIs in Documents and Protocols . . . . 30
	7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 28	6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 30
	7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 28	6.5 Relative IRI References . . . . . . . . . . . . . . . . . 32
	7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 29	7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 32
	7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 30	7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 32
	7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 30	7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 33
	7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 31	7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 34
	7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 31	7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 34
	7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 32	7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 35
	8. Security Considerations . . . . . . . . . . . . . . . . . . . 33	7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 35
	9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34	7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 36
	10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 34	7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 36
	11. References . . . . . . . . . . . . . . . . . . . . . . . . . 35	8. Security Considerations . . . . . . . . . . . . . . . . . . . 37
	11.1 Normative References . . . . . . . . . . . . . . . . . . . . 35	9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39
	11.2 Non-normative References . . . . . . . . . . . . . . . . . . 36	10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 39
	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 38	11. References . . . . . . . . . . . . . . . . . . . . . . . . . 39
	A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 39	11.1 Normative References . . . . . . . . . . . . . . . . . . . . 39
	A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 39	11.2 Non-normative References . . . . . . . . . . . . . . . . . . 41
	A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 40	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 43
	A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 40	A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 43
	A.4 Indicating Character Encodings in the URI/IRI . . . . . . 40	A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 43
	Intellectual Property and Copyright Statements . . . . . . . . 41	A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 44
		A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 44
		A.4 Indicating Character Encodings in the URI/IRI . . . . . . 44
		Intellectual Property and Copyright Statements . . . . . . . . 45

	1. Introduction	1. Introduction

	1.1 Overview and Motivation	1.1 Overview and Motivation

	A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a	A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a
	sequence of characters chosen from a limited subset of the repertoire	sequence of characters chosen from a limited subset of the repertoire
	of US-ASCII [ASCII] characters.	of US-ASCII [ASCII] characters.

	The characters in URIs are frequently used for representing words of	The characters in URIs are frequently used for representing words of

	skipping to change at page 4, line 45	skipping to change at page 4, line 45
	[RFCYYYY], such as URI references. The syntax of IRIs is defined in	[RFCYYYY], such as URI references. The syntax of IRIs is defined in
	Section 2, and the relationship between IRIs and URIs in Section 3.	Section 2, and the relationship between IRIs and URIs in Section 3.

	Using characters outside of A-Z in IRIs brings with it some	Using characters outside of A-Z in IRIs brings with it some
	difficulties. Section 4 discusses the special case of bidirectional	difficulties. Section 4 discusses the special case of bidirectional
	IRIs, Section 5 various forms of equivalence between IRIs, and	IRIs, Section 5 various forms of equivalence between IRIs, and
	Section 6 the use of IRIs in different situations. Section 7 gives	Section 6 the use of IRIs in different situations. Section 7 gives
	additional informative guidelines, and Section 8 security	additional informative guidelines, and Section 8 security
	considerations.	considerations.

	For discussion of this document, please use the public-iri@w3.org
	mailing list (publicly archived at
	http://lists.w3.org/Archives/Public/public-iri/). An issues list for
	this document is maintained at
	http://www.w3.org/International/iri-edit#issues. For more
	information on the topic of this document, please also see [W3CIRI]
	and [Duerst01].

	1.2 Applicability	1.2 Applicability

	IRIs are designed to be compatible with recommendations for new URI	IRIs are designed to be compatible with recommendations for new URI
	schemes [RFC2718]. The compatibility is provided by specifying a	schemes [RFC2718]. The compatibility is provided by specifying a
	well defined and deterministic mapping from the IRI character	well defined and deterministic mapping from the IRI character
	sequence to the functionally equivalent URI character sequence.	sequence to the functionally equivalent URI character sequence.
	Practical use of IRIs (or IRI references) in place of URIs (or URI	Practical use of IRIs (or IRI references) in place of URIs (or URI
	references) depends on the following conditions being met:	references) depends on the following conditions being met:

	a) The protocol or format element where IRIs are used should be	a) The protocol or format element where IRIs are used should be

	skipping to change at page 11, line 5	skipping to change at page 10, line 44
	/ "25" %x30-35 ; 250-255	/ "25" %x30-35 ; 250-255

	pct-encoded = "%" HEXDIG HEXDIG	pct-encoded = "%" HEXDIG HEXDIG

	unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"	unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
	reserved = gen-delims / sub-delims	reserved = gen-delims / sub-delims
	gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"	gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
	sub-delims = "!" / "$" / "&" / "'" / "(" / ")"	sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
	/ "*" / "+" / "," / ";" / "="	/ "*" / "+" / "," / ";" / "="

		This syntax does not support IPv6 scoped addressing zone identifiers.

	3. Relationship between IRIs and URIs	3. Relationship between IRIs and URIs

	IRIs are meant to replace URIs in identifying resources for	IRIs are meant to replace URIs in identifying resources for
	protocols, formats and software components which use a UCS-based	protocols, formats and software components which use a UCS-based
	character repertoire. These protocols and components may never need	character repertoire. These protocols and components may never need
	to use URIs directly, especially when the resource identifier is used	to use URIs directly, especially when the resource identifier is used
	simply for identification purposes. However, when the resource	simply for identification purposes. However, when the resource
	identifier is used for resource retrieval, it is in many cases	identifier is used for resource retrieval, it is in many cases
	necessary to determine the associated URI because most retrieval	necessary to determine the associated URI because most retrieval
	mechanisms currently only are defined for URIs. In this case, IRIs	mechanisms currently only are defined for URIs. In this case, IRIs

	skipping to change at page 12, line 12	skipping to change at page 12, line 7
	characters from the UCS normalized according to Normalization	characters from the UCS normalized according to Normalization
	Form C (NFC, [UTR15]).	Form C (NFC, [UTR15]).

	Variant B) If the IRI is in some digital representation (e.g. an	Variant B) If the IRI is in some digital representation (e.g. an
	octet stream) in some known non-Unicode character encoding:	octet stream) in some known non-Unicode character encoding:
	Convert the IRI to a sequence of characters from the UCS	Convert the IRI to a sequence of characters from the UCS
	normalized according to NFC.	normalized according to NFC.

	Variant C) If the IRI is in a Unicode-based character encoding	Variant C) If the IRI is in a Unicode-based character encoding
	(for example UTF-8 or UTF-16): Do not normalize (see Section	(for example UTF-8 or UTF-16): Do not normalize (see Section
	5.3 for details). Apply Step 2 directly to the encoded Unicode	5.3.2.2 for details). Apply Step 2 directly to the encoded
	character sequence.	Unicode character sequence.

	Step 2) For each character in 'ucschar' or 'iprivate', apply Steps	Step 2) For each character in 'ucschar' or 'iprivate', apply Steps
	2.1 through 2.3 below.	2.1 through 2.3 below.

	2.1) Convert the character to a sequence of one or more octets	2.1) Convert the character to a sequence of one or more octets
	using UTF-8 [RFC3629].	using UTF-8 [RFC3629].

	2.2) Convert each octet to %HH, where HH is the hexadecimal	2.2) Convert each octet to %HH, where HH is the hexadecimal
	notation of the octet value. Note that this is identical to	notation of the octet value. Note that this is identical to
	the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To	the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To

	skipping to change at page 22, line 14	skipping to change at page 22, line 8
	Depending on whether the upper-case letters represent Arabic or	Depending on whether the upper-case letters represent Arabic or
	Hebrew, the visual representation is different.	Hebrew, the visual representation is different.

	Example 10 (allowed, but not recommended):	Example 10 (allowed, but not recommended):
	logical representation: http://ab.CDEFGH.123/kl/mn/op.html	logical representation: http://ab.CDEFGH.123/kl/mn/op.html
	visual representation: http://ab.123.HGFEDC/kl/mn/op.html	visual representation: http://ab.123.HGFEDC/kl/mn/op.html
	Components consisting of only numbers are allowed (it would be rather	Components consisting of only numbers are allowed (it would be rather
	difficult to prohibit them), but may interact with adjacent RTL	difficult to prohibit them), but may interact with adjacent RTL
	components in ways that are not easy to predict.	components in ways that are not easy to predict.

	5. IRI Equivalence and Comparison	5. Normalization and Comparison

	This section discusses IRI Equivalence and Comparison similar to	Note: The structure and much of the material for this section is
	Section 6, "Normalization and Comparison", in [RFCYYYY]. This	taken from section 6 of [RFCYYYY]; the differences are due to the
	section focuses on the main issues and on aspects that are different	specifics of IRIs.
	from [RFCYYYY]; Section 6 of [RFCYYYY] is recommended background
	reading.

	There is no general rule or procedure to decide whether two arbitrary	One of the most common operations on IRIs is simple comparison:
	IRIs are equivalent or not (i.e. whether they refer to the same	determining if two IRIs are equivalent without using the IRIs or the
	resource or not). Two IRIs that look almost the same may refer to	mapped URIs to access their respective resource(s). A comparison is
	different resources. Two IRIs that look completely different may	performed every time a response cache is accessed, a browser checks
	refer to the same resource. Each specification or application that	its history to color a link, or an XML parser processes tags within a
	uses IRIs has to decide on the appropriate criterion for IRI	namespace. Extensive normalization prior to comparison of IRIs may
	equivalence.	be used by spiders and indexing engines to prune a search space or
		reduce duplication of request actions and response storage.

	5.1 Simple String Comparison	IRI comparison is performed with respect to some particular purpose,
		and implementations with differing purposes will often be subject to
		differing design trade-offs with regard to how much effort should be
		spent in reducing aliased identifiers. This section describes a
		variety of methods that may be used to compare IRIs, the trade-offs
		between them, and the types of applications that might use them.

	In some scenarios a definite answer to the question of IRI	5.1 Equivalence
	equivalence is needed that is independent of the scheme used and
	always can be calculated quickly and without accessing a network. An
	example of such a case is XML Namespaces ([XMLNamespace]). In such
	cases, two IRIs SHOULD be defined as equivalent if and only if they
	are character-by-character equivalent. This is the same as being
	byte-by-byte equivalent if the character encoding for both IRIs is
	the same. As an example,
	http://example.org/~user, http://example.org/%7euser, and
	http://example.org/%7Euser are not equivalent under this definition.
	When comparing character-by-character, the comparison function MUST
	NOT map IRIs to URIs, because such a mapping would create additional
	spurious equivalences.

	It follows that IRIs SHOULD NOT be modified when being transported if	Since IRIs exist to identify resources, presumably they should be
	there is any chance that this IRI might be used as an identifier in	considered equivalent when they identify the same resource. However,
	the way explained above. When an IRI is used as an identifier in	such a definition of equivalence is not of much practical use, since
	scenarios that depend upon character-by-character equivalence,	there is no way for an implementation to compare two resources that
	creators of IRIs should take additional care to avoid IRIs that only	are not under its own control. For this reason, determination of
	differ in their use of percent-escaping. As an example, using both	equivalence or difference of IRIs is based on string comparison,
	http://example.org/~user and http://example.org/%7Euser to identify	perhaps augmented by reference to additional rules provided by URI
	XML Namespaces is a bad idea.	scheme definitions. We use the terms "different" and "equivalent" to
		describe the possible outcomes of such comparisons, but there are
		many application-dependent versions of equivalence.

	5.2 Conversion to URIs	Even though it is possible to determine that two IRIs are equivalent,
		IRI comparison is not sufficient to determine if two IRIs identify
		different resources. For example, an owner of two different domain
		names could decide to serve the same resource from both, resulting in
		two different IRIs. Therefore, comparison methods are designed to
		minimize false negatives while strictly avoiding false positives.

	For actual resolution, differences in percent-encoding (except for	In testing for equivalence, applications should not directly compare
	the percent-encoding of reserved characters) MUST always result in	relative references; the references should be converted to their
	the same resource. For example, http://example.org/~user,	respective target IRIs before comparison. When IRIs are being
	http://example.org/%7euser and http://example.org/%7Euser must	compared for the purpose of selecting (or avoiding) a network action,
	resolve to the same resource.	such as retrieval of a representation, fragment components (if any)
		should be excluded from the comparison.
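
The two precautions above (resolve relative references first, then drop
fragments when the comparison guards a network action) can be sketched
with the Python standard library. The helper name same_network_target
and the example base IRI are assumptions made for illustration; they do
not come from the draft.

```python
from urllib.parse import urljoin, urldefrag


def same_network_target(base: str, ref_a: str, ref_b: str) -> bool:
    """Compare two (possibly relative) references for retrieval purposes.

    Hypothetical helper: each reference is first resolved against the
    base to obtain its target IRI, then the fragment is removed, and
    only then are the strings compared.
    """
    target_a = urldefrag(urljoin(base, ref_a)).url
    target_b = urldefrag(urljoin(base, ref_b)).url
    return target_a == target_b


base = "http://example.org/a/b"
# Different relative references and fragments, same retrieval target.
print(same_network_target(base, "../c#x", "/c#y"))
# Different targets.
print(same_network_target(base, "c", "/c"))
```

Comparing the unresolved strings "../c#x" and "/c#y" directly would
report a difference even though both lead to the same retrieval.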

	If this kind of equivalence is to be tested, the percent-encoding of	Applications using IRIs as identity tokens with no relationship to a
	both IRIs to be compared has to be aligned, for example by converting	protocol MUST use the Simple String Comparison (see Section 5.3.1).
	both IRIs to URIs (see Section 3.1), eliminating escape differences	All other applications MUST select one of the comparison practices
	in the resulting URIs, and making sure that the case of the	from the Comparison Ladder (see Section 5.3, or, after IRI-to-URI
	hexadecimal characters in the percent-encoding is always the same	conversion, select one of the comparison practices from the URI
	(preferably upper case). If the IRI is to be passed to another	comparison ladder [RFCYYYY], Section 6.2).
	application, or used further in some other way, its original form
	MUST be preserved; the conversion described here should be performed
	only for the purpose of local comparison.

	Additional, similar equivalences are possible based on knowledge	5.2 Preparation for Comparison
	about the generic URI/IRI syntax, such as the fact that the scheme
	part is case-insensitive.

	5.3 Normalization	Any kind of IRI comparison REQUIRES that all escapings or encodings
		in the protocol or format that carries an IRI are resolved. This is
		usually done when parsing the protocol or format. Examples of such
		escapings or encodings are entities and numeric character references
		in [HTML4] and [XML1]. As an example, http://example.org/ros&eacute;
		(in HTML), http://example.org/ros&#233; (in HTML or XML), and
		http://example.org/ros&#xE9; (in HTML or XML) all get resolved into
		what is denoted in this document (see Section 1.4) as
		http://example.org/ros&#xE9; (the "&#xE9;" here standing for the
		actual e-acute character, to compensate for the fact that this
		document cannot contain non-ASCII characters).

		Similar considerations apply to encodings such as Transfer Codings in
		HTTP (see [RFC2616]) and Content Transfer Encodings in MIME [RFC2045],
		although in these cases, the encoding is not based on characters, but
		on octets, and additional care is required to make sure that
		characters, and not just arbitrary octets, are compared (see Section
		5.3.1).

		5.3 Comparison Ladder

		A variety of methods are used in practice to test IRI equivalence.
		These methods fall into a range, distinguished by the amount of
		processing required and the degree to which the probability of false
		negatives is reduced. As noted above, false negatives cannot be
		eliminated. In practice, their probability can be reduced, but this
		reduction requires more processing and is not cost-effective for all
		applications.

		If this range of comparison practices is considered as a ladder, the
		following discussion will climb the ladder, starting with those
		practices that are cheap but have a relatively higher chance of
		producing false negatives, and proceeding to those that have higher
		computational cost and lower risk of false negatives.

		5.3.1 Simple String Comparison

		If two IRIs, considered as character strings, are identical, then it
		is safe to conclude that they are equivalent. This type of
		equivalence test has very low computational cost and is in wide use
		in a variety of applications, particularly in the domain of parsing
		and when a definitive answer to the question of IRI equivalence is
		needed that is independent of the scheme used and can be calculated
		quickly and without accessing a network. An example of such a case
		is XML Namespaces ([XMLNamespace]).

		Testing strings for equivalence requires some basic precautions.
		This procedure is often referred to as "bit-for-bit" or
		"byte-for-byte" comparison, which is potentially misleading. Testing
		of strings for equality is normally based on pairwise comparison of
		the characters that make up the strings, starting from the first and
		proceeding until both strings are exhausted and all characters found
		to be equal, a pair of characters compares unequal, or one of the
		strings is exhausted before the other.

		Such character comparisons require that each pair of characters be
		put in comparable encoding form. For example, should one IRI be
		stored in a byte array in UTF-8 encoding form, and the second be in a
		UTF-16 encoding form, bit-for-bit comparisons applied naively will
		produce errors. It is better to speak of equality on a
		character-for-character rather than byte-for-byte or bit-for-bit
		basis. In practical terms, character-by-character comparisons should
		be done codepoint-by-codepoint after conversion to a common character
		encoding form. When comparing character-by-character, the comparison
		function MUST NOT map IRIs to URIs, because such a mapping would
		create additional spurious equivalences. It follows that IRIs SHOULD
		NOT be modified when being transported if there is any chance that
		this IRI might be used as an identifier.
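
As a minimal sketch of the point about encoding forms, the following
Python fragment compares two IRIs held in byte arrays with different
encodings. The function name iri_equal and its parameters are
hypothetical. Python compares str values code point by code point, which
matches the character-for-character comparison described above, and no
IRI-to-URI mapping is applied.

```python
def iri_equal(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Simple string comparison of two encoded IRIs (hypothetical helper).

    Both byte sequences are decoded to Unicode strings first, so the
    comparison is made on characters rather than on raw octets.
    """
    return a.decode(a_encoding) == b.decode(b_encoding)


iri = "http://example.org/ros\u00e9"
as_utf8 = iri.encode("utf-8")
as_utf16 = iri.encode("utf-16")

# A naive byte-for-byte test reports a difference.
print(as_utf8 == as_utf16)
# The character-for-character test does not.
print(iri_equal(as_utf8, "utf-8", as_utf16, "utf-16"))
# Percent-encoding differences are deliberately NOT reconciled here.
print(iri_equal(b"http://example.org/~user", "utf-8",
                b"http://example.org/%7Euser", "utf-8"))
```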

		False negatives are caused by the production and use of IRI aliases.
		Unnecessary aliases can be reduced, regardless of the comparison
		method, by consistently providing IRI references in an
		already-normalized form (i.e., a form identical to what would be
		produced after normalization is applied, as described below).
		Protocols and data formats often choose to limit some IRI comparisons
		to simple string comparison, based on the theory that people and
		implementations will, in their own best interest, be consistent in
		providing IRI references, or at least consistent enough to negate any
		efficiency that might be obtained from further normalization.

		5.3.2 Syntax-based Normalization

		Implementations may use logic based on the definitions provided by
		this specification to reduce the probability of false negatives.
		Such processing is moderately higher in cost than
		character-for-character string comparison. For example, an
		application using this approach could reasonably consider the
		following two IRIs equivalent:

		example://a/b/c/%7Bfoo%7D/ros&#xE9;
		eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9

		Web user agents, such as browsers, typically apply this type of IRI
		normalization when determining whether a cached response is
		available. Syntax-based normalization includes such techniques as
		case normalization, character normalization, percent-encoding
		normalization, and removal of dot-segments.
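
The following Python sketch illustrates how an application might bring
the two example IRIs above into the same form. It is not the procedure
defined by this document: it assumes an absolute path, an ASCII-only
host without userinfo, and it converts non-ASCII characters to
percent-encoded UTF-8 before comparing (that is, it compares after
IRI-to-URI conversion). All function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def pct_encode_non_ascii(text: str) -> str:
    """Replace each non-ASCII character by its percent-encoded UTF-8 octets."""
    return "".join(
        ch if ord(ch) < 128
        else "".join("%%%02X" % octet for octet in ch.encode("utf-8"))
        for ch in text
    )


def normalize_pct(text: str) -> str:
    """Uppercase hex digits in triplets; decode triplets of unreserved chars."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments (assumption: path starts with "/")."""
    output = []
    for segment in path.split("/"):
        if segment == ".":
            continue
        if segment == "..":
            if len(output) > 1:  # never pop the leading empty segment
                output.pop()
            continue
        output.append(segment)
    if path.endswith(("/.", "/..")):
        output.append("")  # keep the trailing slash
    return "/".join(output)


def normalize_for_comparison(iri: str) -> str:
    """Local comparison key only; the original IRI must be kept as-is."""
    parts = urlsplit(iri)
    clean = lambda s: normalize_pct(pct_encode_non_ascii(s))
    return urlunsplit((
        parts.scheme.lower(),   # scheme is case-insensitive
        parts.netloc.lower(),   # assumption: ASCII host, no userinfo
        remove_dot_segments(clean(parts.path)),
        clean(parts.query),
        clean(parts.fragment),
    ))


a = "example://a/b/c/%7Bfoo%7D/ros\u00e9"
b = "eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9"
print(normalize_for_comparison(a) == normalize_for_comparison(b))
```

The normalized value is meant to be used only as a comparison key; if
the IRI is passed on, its original form should be the one transmitted.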

		5.3.2.1 Case Normalization

		For all IRIs, the hexadecimal digits within a percent-encoding
		triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore
		should be normalized to use uppercase letters for the digits A-F.

		When an IRI uses components of the generic syntax, the component
		syntax equivalence rules always apply; namely, that the scheme and
		US-ASCII only host are case-insensitive and therefore should be
		normalized to lowercase. For example, the URI
		<HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>.
		Case equivalence for non-ASCII characters in IRI components that are
		IDNs is discussed in Section 5.3.3. The other generic syntax
		components are assumed to be case-sensitive unless specifically
		defined otherwise by the scheme.

		Creating schemes that allow case-insensitive syntax components
		containing non-US-ASCII characters should be avoided because such a
		case normalization may be culturally dependent and is always a complex
		operation. The only exception concerns non-ASCII host names for
		which the character normalization includes a mapping step derived
		from case folding.

		5.3.2.2 Character Normalization

	The Unicode Standard [UNIV4] defines various equivalences between	The Unicode Standard [UNIV4] defines various equivalences between
	sequences of characters for various purposes. Unicode Standard Annex	sequences of characters for various purposes. Unicode Standard Annex
	#15 [UTR15] defines various Normalization Forms for these	#15 [UTR15] defines various Normalization Forms for these
	equivalences, in particular Normalization Form C (NFC, Canonical	equivalences, in particular Normalization Form C (NFC, Canonical
	Decomposition, followed by Canonical Composition) and Normalization	Decomposition, followed by Canonical Composition) and Normalization
	Form KC (NFKC, Compatibility Decomposition, followed by Canonical	Form KC (NFKC, Compatibility Decomposition, followed by Canonical
	Composition).	Composition).

	Equivalence of IRIs MUST rely on the assumption that IRIs are	Equivalence of IRIs MUST rely on the assumption that IRIs are
	appropriately pre-normalized, rather than applying normalization when	appropriately pre-character-normalized, rather than applying
	comparing two IRIs. The exceptions are conversion from a non-digital	character normalization when comparing two IRIs. The exceptions are
	form, and conversion from a non-UCS-based character encoding to a	conversion from a non-digital form, and conversion from a
	UCS-based character encoding. In these cases, NFC or a normalizing	non-UCS-based character encoding to a UCS-based character encoding.
	transcoder using NFC MUST be used for interoperability. To avoid	In these cases, NFC or a normalizing transcoder using NFC MUST be
	false negatives and problems with transcoding, IRIs SHOULD be created	used for interoperability. To avoid false negatives and problems
	using NFC. Using NFKC may avoid even more problems, for example by	with transcoding, IRIs SHOULD be created using NFC. Using NFKC may
	choosing half-width Latin letters instead of full-width, and	avoid even more problems, for example by choosing half-width Latin
	full-width Katakana instead of half-width.	letters instead of full-width, and full-width Katakana instead of
		half-width.

	As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML	As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML
	Notation) is in NFC. On the other hand,	Notation) is in NFC. On the other hand,
	http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The	http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The
	former uses precombined e-acute characters, the latter uses 'e'	former uses precombined e-acute characters, the latter uses 'e'
	characters followed by combining acute accents. Both usages are	characters followed by combining acute accents. Both usages are
	defined to be canonically equivalent in [UNIV4].	defined to be canonically equivalent in [UNIV4].
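
The distinction drawn in this example can be checked mechanically. The following is a minimal sketch in Python, assuming the standard unicodedata module; the helper name is_nfc and the two sample strings are illustrative and not part of either draft.

```python
import unicodedata

# Path segment written with precomposed e-acute characters (U+00E9).
precomposed = "r\u00e9sum\u00e9.html"
# The same text written as 'e' followed by a combining acute accent (U+0301).
decomposed = "re\u0301sume\u0301.html"


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by normalization to NFC."""
    return unicodedata.normalize("NFC", text) == text


# A plain code point comparison treats the two spellings as different,
# which is why IRIs are expected to be created in NFC in the first place.
same_code_points = precomposed == decomposed
# Normalizing the decomposed spelling yields the precomposed one.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```

Such a check fits at IRI creation time or in a normalizing transcoder; per the text above, it is not meant to be applied when two existing IRIs are compared.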

	Note: Because it is unknown how a particular field is being treated	Note: Because it is unknown how a particular sequence of characters
	with respect to text normalization, it would be inappropriate to	is being treated with respect to character normalization, it would
	allow third parties to normalize an IRI arbitrarily. This does	be inappropriate to allow third parties to normalize an IRI
	not contradict the recommendation that when a resource is created,	arbitrarily. This does not contradict the recommendation that
	its IRI should be as normalized as possible (i.e. NFC or even	when a resource is created, its IRI should be as
	NFKC). This is similar to the upper-case/lower-case problems in	character-normalized as possible (i.e. NFC or even NFKC). This
	URIs. Some parts of a URI are case-insensitive (domain name).	is similar to the upper-case/lower-case problems in
	For others, it is unclear whether they are case-sensitive or	URIs.
		Some parts of a URI are case-insensitive (domain name). For
		others, it is unclear whether they are case-sensitive or
	case-insensitive, or something in between (e.g. case-sensitive,	case-insensitive, or something in between (e.g. case-sensitive,
	but if the wrong case is used, a multiple choice selection is	but if the wrong case is used, a multiple choice selection is
	provided instead of a direct negative result). The best recipe is	provided instead of a direct negative result). The best recipe is
	that the creator uses a reasonable capitalization, and when	that the creator uses a reasonable capitalization, and when
	transferring the URI, that capitalization is never changed.	transferring the URI, that capitalization is never changed.

	Various IRI schemes may allow the usage of International Domain Names	Various IRI schemes may allow the usage of Internationalized Domain
	(IDN) [RFC3490]. When in use in IRIs, those names SHOULD be	Names (IDN) [RFC3490] either in the ireg-name part or elsewhere.
	validated using the ToASCII operation defined in [RFC3490], with the	Character Normalization also applies to IDNs, as discussed in Section
	flags "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing	5.3.3.
	an invalid IDN cannot successfully be resolved. For legibility
	purposes, IDN components of IRIs SHOULD NOT be converted into ASCII
	Compatible Encoding (ACE).

	5.4 Preferred Forms	5.3.2.3 Percent-Encoding Normalization

	The following are the preferred forms for IRIs when created:	The percent-encoding mechanism (Section 2.1 of [RFCYYYY]) is a
		frequent source of variance among otherwise identical IRIs. In
		addition to the case normalization issue noted above, some IRI
		producers percent-encode octets that do not require percent-encoding,
		resulting in IRIs that are equivalent to their nonencoded
		counterparts. Such IRIs should be normalized by decoding any
		percent-encoded octet sequence that corresponds to an unreserved
		character, as described in Section 2.3 of [RFCYYYY].

	- Always provide the URI scheme in lowercase characters.	For actual resolution, differences in percent-encoding (except for
		the percent-encoding of reserved characters) MUST always result in
		the same resource. For example, http://example.org/~user,
		http://example.org/%7euser and http://example.org/%7Euser must
		resolve to the same resource.

	- Only perform percent-encoding where it is essential.	If this kind of equivalence is to be tested, the percent-encoding of
		both IRIs to be compared has to be aligned, for example by converting
		both IRIs to URIs (see Section 3.1), eliminating escape differences
		in the resulting URIs, and making sure that the case of the
		hexadecimal characters in the percent-encoding is always the same
		(preferably upper case). If the IRI is to be passed to another
		application, or used further in some other way, its original form
		MUST be preserved; the conversion described here should be performed
		only for the purpose of local comparison.
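
As an illustration of aligning percent-encoding for local comparison, the sketch below works on a string that has already been converted to a URI. It decodes escapes that stand for unreserved characters and upper-cases the hexadecimal digits of all remaining escapes. The function name normalize_percent_encoding is hypothetical, and the unreserved set (letters, digits, "-", ".", "_", "~") is an assumption about what Section 2.3 of [RFCYYYY] lists.

```python
import re
import string

# Assumed unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri: str) -> str:
    """Return a comparison copy of uri with aligned percent-encoding."""

    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            # The escape was unnecessary; use the literal character.
            return char
        # Keep the escape, but use upper-case hexadecimal digits.
        return "%" + match.group(1).upper()

    return _PCT.sub(fix, uri)


variants = [
    "http://example.org/~user",
    "http://example.org/%7euser",
    "http://example.org/%7Euser",
]
# All three spellings collapse to a single comparison form.
normalized = {normalize_percent_encoding(v) for v in variants}
```

The result is intended only as a comparison key; the original IRI is kept unchanged for any further use, as required above.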

	- Always use uppercase A-through-F characters when percent-encoding.	5.3.2.4 Path Segment Normalization

	- For those schemes where ireg-name is a domain name, always provide	The complete path segments "." and ".." are intended only for use
	the individual labels, in the form produced when applying nameprep	within relative references (Section 4.1 of [RFCYYYY]) and are removed
	[RFC3491]. This in particular includes using lowercase characters	as part of the reference resolution process (Section 5.2 of
	rather than uppercase characters where applicable. Also, always	[RFCYYYY]). However, some implementations may incorrectly assume
	use US-ASCII '.' as a separator.	that reference resolution is not necessary when the reference is
		already an IRI, and thus fail to remove dot-segments when they occur
		in non-relative paths. IRI normalizers should remove dot-segments by
		applying the remove_dot_segments algorithm to the path, as described
		in Section 5.2.4 of [RFCYYYY].
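
The following Python sketch shows one way the dot-segment removal referred to above might be written. It is a reading of the input-buffer/output-buffer steps of the generic URI syntax specification, not a normative implementation, and it operates on the path component only.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a path (sketch, not normative)."""
    output = []  # completed segments, each including its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x"
            if output:
                output.pop()         # drop the preceding segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any)
            # from the input to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)
```

For example, remove_dot_segments("/a/b/c/./../../g") yields "/a/g", and "mid/content=5/../6" yields "mid/6".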

	- Where possible, provide IRI components in NFKC or NFC.	5.3.3 Scheme-based Normalization

	- Prevent /./ and /../ from appearing in IRI paths.	The syntax and semantics of IRIs vary from scheme to scheme, as
		described by the defining specification for each scheme.
		Implementations may use scheme-specific rules, at further processing
		cost, to reduce the probability of false negatives. For example,
		since the "http" scheme makes use of an authority component, has a
		default port of "80", and defines an empty path to be equivalent to
		"/", the following four IRIs are equivalent:

	- For schemes that define an empty path to be equivalent to a path	http://example.com
	of "/", use "/".	http://example.com/
		http://example.com:/
		http://example.com:80/
		In general, an IRI that uses the generic syntax for authority with an
		empty path should be normalized to a path of "/"; likewise, an
		explicit ":port", where the port is empty or the default for the
		scheme, is equivalent to one where the port and its ":" delimiter are
		elided, and thus should be removed by scheme-based normalization.
		For example, the second IRI above is the normal form for the "http"
		scheme.
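
The two rules just described (empty path becomes "/", empty or default port is dropped) can be sketched in Python as follows. The function name scheme_normalize and the DEFAULT_PORTS table are assumptions made for this illustration; only the "http" default of 80 is taken from the text. The splitting expression is a simplified generic-syntax pattern, not a validating parser.

```python
import re

# Simplified split into scheme, authority, path, query and fragment.
_PARTS = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(\?[^#]*)?(#.*)?$"
)

# Hypothetical table; only the "http" entry comes from the text above.
DEFAULT_PORTS = {"http": "80"}


def scheme_normalize(iri: str) -> str:
    """Apply empty-path and default-port normalization (sketch)."""
    match = _PARTS.match(iri)
    if match is None:
        return iri
    scheme, authority, path, query, fragment = match.groups()
    if scheme is None or authority is None:
        return iri  # no authority component: leave untouched
    host, sep, port = authority.rpartition(":")
    if sep and (port == "" or port == DEFAULT_PORTS.get(scheme.lower())):
        authority = host  # drop ":" together with an empty or default port
    if path == "":
        path = "/"
    # Query and fragment delimiters are preserved even when empty.
    return f"{scheme}://{authority}{path}{query or ''}{fragment or ''}"


equivalent = [
    "http://example.com",
    "http://example.com/",
    "http://example.com:/",
    "http://example.com:80/",
]
normal_forms = {scheme_normalize(i) for i in equivalent}
```

Under these assumptions all four IRIs listed above map to "http://example.com/", while an IRI with a non-default port or an empty query delimiter is left as it is.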

		Another case where normalization varies by scheme is in the handling
		of an empty authority component or empty host subcomponent. For many
		scheme specifications, an empty authority or host is considered an
		error; for others, it is considered equivalent to "localhost" or the
		end-user's host. When a scheme defines a default for authority and
		an IRI reference to that default is desired, the reference should be
		normalized to an empty authority for the sake of uniformity, brevity,
		and internationalization. If, however, either the userinfo or port
		subcomponent is non-empty, then the host should be given explicitly
		even if it matches the default.

		Normalization should not remove delimiters when their associated
		component is empty unless licensed to do so by the scheme
		specification. For example, the IRI "http://example.com/?" cannot be
		assumed to be equivalent to any of the examples above. Likewise, the
		presence or absence of delimiters within a userinfo subcomponent is
		usually significant to its interpretation. The fragment component is
		not subject to any scheme-based normalization; thus, two IRIs that
		differ only by the suffix "#" are considered different regardless of
		the scheme.

		Some IRI schemes may allow the usage of Internationalized Domain
		Names (IDN) [RFC3490] either in their ireg-name part or elsewhere.
		When in use in IRIs, those names SHOULD be validated using the
		ToASCII operation defined in [RFC3490], with the flags
		"UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing an
		invalid IDN cannot successfully be resolved. Validated IDN
		components of IRIs SHOULD be character normalized using the Nameprep
		process [RFC3491]; however, for legibility purposes, they SHOULD NOT
		be converted into ASCII Compatible Encoding (ACE).

		Scheme-based normalization may also consider IDN components and their
		conversions to punycode as equivalent. As an example,
		http://résumé.example.org may be considered equivalent to
		http://xn--rsum-bpad.example.org
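
A comparison of this kind might be sketched in Python with the built-in "idna" codec. That codec is an assumption of convenience: it implements the IDNA conversion of [RFC3490] with Nameprep, but it does not expose the "UseSTD3ASCIIRules" and "AllowUnassigned" flags mentioned above, so it only approximates the validation this section describes. The helper name hosts_equivalent is illustrative.

```python
def hosts_equivalent(host_a: str, host_b: str) -> bool:
    """Compare two host names after conversion to their ACE form.

    The conversion is used for comparison only; the host name that
    appears in the IRI itself is left in its original, legible form.
    """
    return host_a.encode("idna") == host_b.encode("idna")


# Host written with precomposed e-acute characters (U+00E9).
unicode_host = "r\u00e9sum\u00e9.example.org"
ace_host = "xn--rsum-bpad.example.org"
same_host = hosts_equivalent(unicode_host, ace_host)
```

The codec raises UnicodeError for labels it cannot convert, which a caller could treat as an invalid IDN.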

		Other scheme-specific normalizations are possible.

		5.3.4 Protocol-based Normalization

		Web spiders, for which substantial effort to reduce the incidence of
		false negatives is often cost-effective, are observed to implement
		even more aggressive techniques in IRI comparison. For example, if
		they observe that an IRI such as

		http://example.com/data

		redirects to an IRI differing only in the trailing slash

		http://example.com/data/

		they will likely regard the two as equivalent in the future. This
		kind of technique is only appropriate when equivalence is clearly
		indicated by both the result of accessing the resources and the
		common conventions of their scheme's dereference algorithm (in this
		case, use of redirection by HTTP origin servers to avoid problems
		with relative references).

	6. Use of IRIs	6. Use of IRIs

	6.1 Limitations on UCS Characters Allowed in IRIs	6.1 Limitations on UCS Characters Allowed in IRIs

	This section discusses limitations on characters and character	This section discusses limitations on characters and character
	sequences usable for IRIs beyond those given in Section 2.2 and	sequences usable for IRIs beyond those given in Section 2.2 and
	Section 4.1. The considerations in this section are relevant when	Section 4.1. The considerations in this section are relevant when
	creating IRIs and when converting from URIs to IRIs.	creating IRIs and when converting from URIs to IRIs.


	skipping to change at page 35, line 15	skipping to change at page 39, line 31
	The discussion on the issue addressed here has started a long time	The discussion on the issue addressed here has started a long time
	ago. There was a thread in the HTML working group in August 1995	ago. There was a thread in the HTML working group in August 1995
	(under the topic of "Globalizing URIs") and in the www-international	(under the topic of "Globalizing URIs") and in the www-international
	mailing list in July 1996 (under the topic of "Internationalization	mailing list in July 1996 (under the topic of "Internationalization
	and URLs"), and ad-hoc meetings at the Unicode conferences in	and URLs"), and ad-hoc meetings at the Unicode conferences in
	September 1995 and September 1997.	September 1995 and September 1997.

	Many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding,	Many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding,
	Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim	Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim
	Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie	Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie
	Daigle, Ted Hardie, Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex	Daigle, Ted Hardie, Bill Fenner, Margaret Wasserman, Russ Housley,
	Texin, Graham Klyne, Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam	Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex Texin, Graham Klyne,
	Costello, Dan Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy	Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam Costello, Dan
	Badami, Jonathan Rosenne, Asmus Freytag, Simon Josefsson, Carlos	Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy Badami, Jonathan
	Viegas Damasio, Chris Haynes, Walter Underwood, and many others for	Rosenne, Asmus Freytag, Simon Josefsson, Carlos Viegas Damasio, Chris
	help with understanding the issues and possible solutions, and	Haynes, Walter Underwood, and many others for help with understanding
	getting the details right.	the issues and possible solutions, and getting the details right.

	This document is a product of the Internationalization Working Group	This document is a product of the Internationalization Working Group
	(I18N WG) of the World Wide Web Consortium (W3C). Thanks to the	(I18N WG) of the World Wide Web Consortium (W3C). Thanks to the
	members of the W3C I18N Working Group and Interest Group for their	members of the W3C I18N Working Group and Interest Group for their
	contributions and their work on [CharMod]. Thanks also go to the	contributions and their work on [CharMod]. Thanks also go to the
	members of many other W3C Working Groups for adopting IRIs, and to	members of many other W3C Working Groups for adopting IRIs, and to
	the members of the Montreal IAB Workshop on Internationalization and	the members of the Montreal IAB Workshop on Internationalization and
	Localization for their review.	Localization for their review.

	11. References	11. References

	skipping to change at page 36, line 17	skipping to change at page 40, line 34
	Profile for Internationalized Domain Names (IDN)", RFC	Profile for Internationalized Domain Names (IDN)", RFC
	3491, March 2003.	3491, March 2003.

	[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO	[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
	10646", STD 63, RFC 3629, November 2003.	10646", STD 63, RFC 3629, November 2003.

	[RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform	[RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
	Resource Identifier (URI): Generic Syntax (Note to the RFC	Resource Identifier (URI): Generic Syntax (Note to the RFC
	Editor: Please update this reference with the RFC	Editor: Please update this reference with the RFC
	resulting from draft-fielding-uri-rfc2396bis-xx.txt, and	resulting from draft-fielding-uri-rfc2396bis-xx.txt, and
	remove this Note)", draft-fielding-uri-rfc2396bis-07.txt	remove this Note)", draft-fielding-uri-rfc2396bis-07 (work
	(work in progress), April 2004.	in progress), April 2004.

	[UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard	[UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard
	Annex #9, March 2004,	Annex #9, March 2004,
	<http://www.unicode.org/reports/tr9/tr9-13.html>.	<http://www.unicode.org/reports/tr9/tr9-13.html>.

	[UNIV4] The Unicode Consortium, "The Unicode Standard, Version	[UNIV4] The Unicode Consortium, "The Unicode Standard, Version
	4.0.1, defined by: The Unicode Standard, Version 4.0	4.0.1, defined by: The Unicode Standard, Version 4.0
	(Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1),	(Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1),
	as amended by Unicode 4.0.1	as amended by Unicode 4.0.1
	(http://www.unicode.org/versions/Unicode4.0.1/)", March	(http://www.unicode.org/versions/Unicode4.0.1/)", March

	skipping to change at page 36, line 46	skipping to change at page 41, line 15
	11.2 Non-normative References	11.2 Non-normative References

	[BidiEx] "Examples of bidirectional IRIs",	[BidiEx] "Examples of bidirectional IRIs",
	<http://www.w3.org/International/iri-edit/BidiExamples>.	<http://www.w3.org/International/iri-edit/BidiExamples>.

	[CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T.	[CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T.
	Texin, "Character Model for the World Wide Web", World	Texin, "Character Model for the World Wide Web", World
	Wide Web Consortium Working Draft, February 2004,	Wide Web Consortium Working Draft, February 2004,
	<http://www.w3.org/TR/charmod>.	<http://www.w3.org/TR/charmod>.

	[Duerst01]
	Duerst, M., "Internationalized Resource Identifiers: From
	Specification to Testing", Proc. 19th International
	Unicode Conference, San Jose, September 2001,
	<http://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html>.

	[Duerst97]	[Duerst97]
	Duerst, M., "The Properties and Promises of UTF-8", Proc.	Duerst, M., "The Properties and Promises of UTF-8", Proc.
	11th International Unicode Conference, San Jose,	11th International Unicode Conference, San Jose,
	September 1997,	September 1997,
	<http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/	<http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/
	IUC11-UTF-8.pdf>.	IUC11-UTF-8.pdf>.

	[Gettys] Gettys, J., "URI Model Consequences",	[Gettys] Gettys, J., "URI Model Consequences",
	<http://www.w3.org/DesignIssues/ModelConsequences>.	<http://www.w3.org/DesignIssues/ModelConsequences>.

	[HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01	[HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01
	Specification", World Wide Web Consortium Recommendation,	Specification", World Wide Web Consortium Recommendation,
	December 1999,	December 1999,
	<http://www.w3.org/TR/REC-html40/appendix/	<http://www.w3.org/TR/REC-html40/appendix/
	notes.html#h-B.2>.	notes.html#h-B.2>.

		[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
		Extensions (MIME) Part One: Format of Internet Message
		Bodies", RFC 2045, November 1996.

	[RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,	[RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
	Atkinson, R., Crispin, M. and P. Svanberg, "The Report of	Atkinson, R., Crispin, M. and P. Svanberg, "The Report of
	the IAB Character Set Workshop held 29 February - 1 March,	the IAB Character Set Workshop held 29 February - 1 March,
	1996", RFC 2130, April 1997.	1996", RFC 2130, April 1997.

	[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.	[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.

	[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.	[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.

	[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and	[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and

	skipping to change at page 38, line 11	skipping to change at page 42, line 25
	Protocol", RFC 2640, July 1999.	Protocol", RFC 2640, July 1999.

	[RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,	[RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke,
	"Guidelines for new URL Schemes", RFC 2718, November 1999.	"Guidelines for new URL Schemes", RFC 2718, November 1999.

	[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other	[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other
	Markup Languages", Unicode Technical Report #20, World	Markup Languages", Unicode Technical Report #20, World
	Wide Web Consortium Note, February 2002,	Wide Web Consortium Note, February 2002,
	<http://www.w3.org/TR/unicode-xml/>.	<http://www.w3.org/TR/unicode-xml/>.

	[W3CIRI] Duerst, M., "Internationalization - URIs and other
	identifiers", September 2002,
	<http://www.w3.org/International/O-URL-and-ident.html>.

	[XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking	[XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking
	Language (XLink) Version 1.0", World Wide Web Consortium	Language (XLink) Version 1.0", World Wide Web Consortium
	Recommendation, June 2001,	Recommendation, June 2001,
	<http://www.w3.org/TR/xlink/#link-locators>.	<http://www.w3.org/TR/xlink/#link-locators>.

	[XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and	[XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and
	F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third	F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third
	Edition)", World Wide Web Consortium Recommendation,	Edition)", World Wide Web Consortium Recommendation,
	February 2004,	February 2004,
	<http://www.w3.org/TR/REC-xml#sec-external-ent>.	<http://www.w3.org/TR/REC-xml#sec-external-ent>.

End of changes.
This html diff was produced by rfcdiff 1.16, available from http://www.levkowetz.com/ietf/tools/rfcdiff/