draft-duerst-iri-10.txt | draft-duerst-iri.txt | |||
---|---|---|---|---|
Network Working Group M. Duerst | Network Working Group M. Duerst | |||
Internet-Draft W3C | Internet-Draft W3C | |||
Expires: March 28, 2005 M. Suignard | Expires: May 31, 2005 M. Suignard | |||
Microsoft Corporation | Microsoft Corporation | |||
September 27, 2004 | November 30, 2004 | |||
Internationalized Resource Identifiers (IRIs) | Internationalized Resource Identifiers (IRIs) | |||
draft-duerst-iri-10 | draft-duerst-iri-11 | |||
Status of this Memo | Status of this Memo | |||
This document is an Internet-Draft and is subject to all provisions | This document is an Internet-Draft and is subject to all provisions | |||
of section 3 of RFC 3667. By submitting this Internet-Draft, each | of section 3 of RFC 3667. By submitting this Internet-Draft, each | |||
author represents that any applicable patent or other IPR claims of | author represents that any applicable patent or other IPR claims of | |||
which he or she is aware have been or will be disclosed, and any of | which he or she is aware have been or will be disclosed, and any of | |||
which he or she become aware will be disclosed, in accordance with | which he or she become aware will be disclosed, in accordance with | |||
RFC 3668. | RFC 3668. | |||
skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on March 28, 2005. | This Internet-Draft will expire on May 31, 2005. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The Internet Society (2004). | Copyright (C) The Internet Society (2004). | |||
Abstract | Abstract | |||
This document defines a new protocol element, the Internationalized | This document defines a new protocol element, the Internationalized | |||
Resource Identifier (IRI), as a complement to the Uniform Resource | Resource Identifier (IRI), as a complement to the Uniform Resource | |||
Identifier (URI). An IRI is a sequence of characters from the | Identifier (URI). An IRI is a sequence of characters from the | |||
skipping to change at page 2, line 16 | skipping to change at page 2, line 16 | |||
of extending or changing the definition of URIs, to allow a clear | of extending or changing the definition of URIs, to allow a clear | |||
distinction and to avoid incompatibilities with existing software. | distinction and to avoid incompatibilities with existing software. | |||
Guidelines for the use and deployment of IRIs in various protocols, | Guidelines for the use and deployment of IRIs in various protocols, | |||
formats, and software components that now deal with URIs are | formats, and software components that now deal with URIs are | |||
provided. | provided. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1 Overview and Motivation . . . . . . . . . . . . . . . . . 4 | 1.1 Overview and Motivation . . . . . . . . . . . . . . . . . 4 | |||
1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 7 | 2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 7 | |||
2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . 8 | 2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . 8 | |||
3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 11 | 3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 10 | |||
3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . 11 | 3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . 11 | |||
3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . 14 | 3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . 14 | |||
3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 17 | 4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 17 | |||
4.1 Logical Storage and Visual Presentation . . . . . . . . . 17 | 4.1 Logical Storage and Visual Presentation . . . . . . . . . 17 | |||
4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . 19 | 4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . 18 | |||
4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . 20 | 4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . 20 | |||
4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
5. IRI Equivalence and Comparison . . . . . . . . . . . . . . . . 22 | 5. Normalization and Comparison . . . . . . . . . . . . . . . . . 22 | |||
5.1 Simple String Comparison . . . . . . . . . . . . . . . . . 22 | 5.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
5.2 Conversion to URIs . . . . . . . . . . . . . . . . . . . . 23 | 5.2 Preparation for Comparison . . . . . . . . . . . . . . . . 23 | |||
5.3 Normalization . . . . . . . . . . . . . . . . . . . . . . 23 | 5.3 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 23 | |||
5.4 Preferred Forms . . . . . . . . . . . . . . . . . . . . . 24 | 5.3.1 Simple String Comparison . . . . . . . . . . . . . . . 24 | |||
6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 5.3.2 Syntax-based Normalization . . . . . . . . . . . . . . 25 | |||
6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 25 | 5.3.3 Scheme-based Normalization . . . . . . . . . . . . . . 27 | |||
6.2 Software Interfaces and Protocols . . . . . . . . . . . . 25 | 5.3.4 Protocol-based Normalization . . . . . . . . . . . . . 29 | |||
6.3 Format of URIs and IRIs in Documents and Protocols . . . . 26 | 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 26 | 6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 29 | |||
6.5 Relative IRI References . . . . . . . . . . . . . . . . . 28 | 6.2 Software Interfaces and Protocols . . . . . . . . . . . . 30 | |||
7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 28 | 6.3 Format of URIs and IRIs in Documents and Protocols . . . . 30 | |||
7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 28 | 6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 30 | |||
7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 28 | 6.5 Relative IRI References . . . . . . . . . . . . . . . . . 32 | |||
7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 29 | 7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 32 | |||
7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 30 | 7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 32 | |||
7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 30 | 7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 33 | |||
7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 31 | 7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 34 | |||
7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 31 | 7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 34 | |||
7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 32 | 7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 35 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 33 | 7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 35 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 | 7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 36 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 34 | 7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 36 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | |||
11.1 Normative References . . . . . . . . . . . . . . . . . . . . 35 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 | |||
11.2 Non-normative References . . . . . . . . . . . . . . . . . . 36 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 39 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 38 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 39 | 11.1 Normative References . . . . . . . . . . . . . . . . . . . . 39 | |||
A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 39 | 11.2 Non-normative References . . . . . . . . . . . . . . . . . . 41 | |||
A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 40 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 43 | |||
A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 40 | A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 43 | |||
A.4 Indicating Character Encodings in the URI/IRI . . . . . . 40 | A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 43 | |||
Intellectual Property and Copyright Statements . . . . . . . . 41 | A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 44 | |||
A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 44 | ||||
A.4 Indicating Character Encodings in the URI/IRI . . . . . . 44 | ||||
Intellectual Property and Copyright Statements . . . . . . . . 45 | ||||
1. Introduction | 1. Introduction | |||
1.1 Overview and Motivation | 1.1 Overview and Motivation | |||
A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a | A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a | |||
sequence of characters chosen from a limited subset of the repertoire | sequence of characters chosen from a limited subset of the repertoire | |||
of US-ASCII [ASCII] characters. | of US-ASCII [ASCII] characters. | |||
The characters in URIs are frequently used for representing words of | The characters in URIs are frequently used for representing words of | |||
skipping to change at page 4, line 45 | skipping to change at page 4, line 45 | |||
[RFCYYYY], such as URI references. The syntax of IRIs is defined in | [RFCYYYY], such as URI references. The syntax of IRIs is defined in | |||
Section 2, and the relationship between IRIs and URIs in Section 3. | Section 2, and the relationship between IRIs and URIs in Section 3. | |||
Using characters outside of A-Z in IRIs brings with it some | Using characters outside of A-Z in IRIs brings with it some | |||
difficulties. Section 4 discusses the special case of bidirectional | difficulties. Section 4 discusses the special case of bidirectional | |||
IRIs, Section 5 various forms of equivalence between IRIs, and | IRIs, Section 5 various forms of equivalence between IRIs, and | |||
Section 6 the use of IRIs in different situations. Section 7 gives | Section 6 the use of IRIs in different situations. Section 7 gives | |||
additional informative guidelines, and Section 8 security | additional informative guidelines, and Section 8 security | |||
considerations. | considerations. | |||
For discussion of this document, please use the public-iri@w3.org | ||||
mailing list (publicly archived at | ||||
http://lists.w3.org/Archives/Public/public-iri/). An issues list for | ||||
this document is maintained at | ||||
http://www.w3.org/International/iri-edit#issues. For more | ||||
information on the topic of this document, please also see [W3CIRI] | ||||
and [Duerst01]. | ||||
1.2 Applicability | 1.2 Applicability | |||
IRIs are designed to be compatible with recommendations for new URI | IRIs are designed to be compatible with recommendations for new URI | |||
schemes [RFC2718]. The compatibility is provided by specifying a | schemes [RFC2718]. The compatibility is provided by specifying a | |||
well defined and deterministic mapping from the IRI character | well defined and deterministic mapping from the IRI character | |||
sequence to the functionally equivalent URI character sequence. | sequence to the functionally equivalent URI character sequence. | |||
Practical use of IRIs (or IRI references) in place of URIs (or URI | Practical use of IRIs (or IRI references) in place of URIs (or URI | |||
references) depends on the following conditions being met: | references) depends on the following conditions being met: | |||
a) The protocol or format element where IRIs are used should be | a) The protocol or format element where IRIs are used should be | |||
skipping to change at page 11, line 5 | skipping to change at page 10, line 44 | |||
/ "25" %x30-35 ; 250-255 | / "25" %x30-35 ; 250-255 | |||
pct-encoded = "%" HEXDIG HEXDIG | pct-encoded = "%" HEXDIG HEXDIG | |||
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | |||
reserved = gen-delims / sub-delims | reserved = gen-delims / sub-delims | |||
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" | gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" | |||
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" | sub-delims = "!" / "$" / "&" / "'" / "(" / ")" | |||
/ "*" / "+" / "," / ";" / "=" | / "*" / "+" / "," / ";" / "=" | |||
This syntax does not support IPv6 scoped addressing zone identifiers. | ||||
3. Relationship between IRIs and URIs | 3. Relationship between IRIs and URIs | |||
IRIs are meant to replace URIs in identifying resources for | IRIs are meant to replace URIs in identifying resources for | |||
protocols, formats and software components which use a UCS-based | protocols, formats and software components which use a UCS-based | |||
character repertoire. These protocols and components may never need | character repertoire. These protocols and components may never need | |||
to use URIs directly, especially when the resource identifier is used | to use URIs directly, especially when the resource identifier is used | |||
simply for identification purposes. However, when the resource | simply for identification purposes. However, when the resource | |||
identifier is used for resource retrieval, it is in many cases | identifier is used for resource retrieval, it is in many cases | |||
necessary to determine the associated URI because most retrieval | necessary to determine the associated URI because most retrieval | |||
mechanisms are currently defined only for URIs. In this case, IRIs | mechanisms are currently defined only for URIs. In this case, IRIs | |||
skipping to change at page 12, line 12 | skipping to change at page 12, line 7 | |||
characters from the UCS normalized according to Normalization | characters from the UCS normalized according to Normalization | |||
Form C (NFC, [UTR15]). | Form C (NFC, [UTR15]). | |||
Variant B) If the IRI is in some digital representation (e.g. an | Variant B) If the IRI is in some digital representation (e.g. an | |||
octet stream) in some known non-Unicode character encoding: | octet stream) in some known non-Unicode character encoding: | |||
Convert the IRI to a sequence of characters from the UCS | Convert the IRI to a sequence of characters from the UCS | |||
normalized according to NFC. | normalized according to NFC. | |||
Variant C) If the IRI is in a Unicode-based character encoding | Variant C) If the IRI is in a Unicode-based character encoding | |||
(for example UTF-8 or UTF-16): Do not normalize (see Section | (for example UTF-8 or UTF-16): Do not normalize (see Section | |||
5.3 for details). Apply Step 2 directly to the encoded Unicode | 5.3.2.2 for details). Apply Step 2 directly to the encoded | |||
character sequence. | Unicode character sequence. | |||
Step 2) For each character in 'ucschar' or 'iprivate', apply Steps | Step 2) For each character in 'ucschar' or 'iprivate', apply Steps | |||
2.1 through 2.3 below. | 2.1 through 2.3 below. | |||
2.1) Convert the character to a sequence of one or more octets | 2.1) Convert the character to a sequence of one or more octets | |||
using UTF-8 [RFC3629]. | using UTF-8 [RFC3629]. | |||
2.2) Convert each octet to %HH, where HH is the hexadecimal | 2.2) Convert each octet to %HH, where HH is the hexadecimal | |||
notation of the octet value. Note that this is identical to | notation of the octet value. Note that this is identical to | |||
the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To | the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To | |||
skipping to change at page 22, line 14 | skipping to change at page 22, line 8 | |||
Depending on whether the upper-case letters represent Arabic or | Depending on whether the upper-case letters represent Arabic or | |||
Hebrew, the visual representation is different. | Hebrew, the visual representation is different. | |||
Example 10 (allowed, but not recommended): | Example 10 (allowed, but not recommended): | |||
logical representation: http://ab.CDEFGH.123/kl/mn/op.html | logical representation: http://ab.CDEFGH.123/kl/mn/op.html | |||
visual representation: http://ab.123.HGFEDC/kl/mn/op.html | visual representation: http://ab.123.HGFEDC/kl/mn/op.html | |||
Components consisting of only numbers are allowed (it would be rather | Components consisting of only numbers are allowed (it would be rather | |||
difficult to prohibit them), but may interact with adjacent RTL | difficult to prohibit them), but may interact with adjacent RTL | |||
components in ways that are not easy to predict. | components in ways that are not easy to predict. | |||
5. IRI Equivalence and Comparison | 5. Normalization and Comparison | |||
This section discusses IRI Equivalence and Comparison similar to | Note: The structure and much of the material for this section is | |||
Section 6, "Normalization and Comparison", in [RFCYYYY]. This | taken from section 6 of [RFCYYYY]; the differences are due to the | |||
section focuses on the main issues and on aspects that are different | specifics of IRIs. | |||
from [RFCYYYY]; Section 6 of [RFCYYYY] is recommended background | ||||
reading. | ||||
There is no general rule or procedure to decide whether two arbitrary | One of the most common operations on IRIs is simple comparison: | |||
IRIs are equivalent or not (i.e. whether they refer to the same | determining if two IRIs are equivalent without using the IRIs or the | |||
resource or not). Two IRIs that look almost the same may refer to | mapped URIs to access their respective resource(s). A comparison is | |||
different resources. Two IRIs that look completely different may | performed every time a response cache is accessed, a browser checks | |||
refer to the same resource. Each specification or application that | its history to color a link, or an XML parser processes tags within a | |||
uses IRIs has to decide on the appropriate criterion for IRI | namespace. Extensive normalization prior to comparison of IRIs may | |||
equivalence. | be used by spiders and indexing engines to prune a search space or | |||
reduce duplication of request actions and response storage. | ||||
5.1 Simple String Comparison | IRI comparison is performed with respect to some particular purpose, | |||
and implementations with differing purposes will often be subject to | ||||
differing design trade-offs with regard to how much effort should be | ||||
spent in reducing aliased identifiers. This section describes a | ||||
variety of methods that may be used to compare IRIs, the trade-offs | ||||
between them, and the types of applications that might use them. | ||||
In some scenarios a definite answer to the question of IRI | 5.1 Equivalence | |||
equivalence is needed that is independent of the scheme used and | ||||
always can be calculated quickly and without accessing a network. An | ||||
example of such a case is XML Namespaces ([XMLNamespace]). In such | ||||
cases, two IRIs SHOULD be defined as equivalent if and only if they | ||||
are character-by-character equivalent. This is the same as being | ||||
byte-by-byte equivalent if the character encoding for both IRIs is | ||||
the same. As an example, | ||||
http://example.org/~user, http://example.org/%7euser, and | ||||
http://example.org/%7Euser are not equivalent under this definition. | ||||
When comparing character-by-character, the comparison function MUST | ||||
NOT map IRIs to URIs, because such a mapping would create additional | ||||
spurious equivalences. | ||||
It follows that IRIs SHOULD NOT be modified when being transported if | Since IRIs exist to identify resources, presumably they should be | |||
there is any chance that this IRI might be used as an identifier in | considered equivalent when they identify the same resource. However, | |||
the way explained above. When an IRI is used as an identifier in | such a definition of equivalence is not of much practical use, since | |||
scenarios that depend upon character-by-character equivalence, | there is no way for an implementation to compare two resources that | |||
creators of IRIs should take additional care to avoid IRIs that only | are not under its own control. For this reason, determination of | |||
differ in their use of percent-escaping. As an example, using both | equivalence or difference of IRIs is based on string comparison, | |||
http://example.org/~user and http://example.org/%7Euser to identify | perhaps augmented by reference to additional rules provided by URI | |||
XML Namespaces is a bad idea. | scheme definitions. We use the terms "different" and "equivalent" to | |||
describe the possible outcomes of such comparisons, but there are | ||||
many application-dependent versions of equivalence. | ||||
5.2 Conversion to URIs | Even though it is possible to determine that two IRIs are equivalent, | |||
IRI comparison is not sufficient to determine if two IRIs identify | ||||
different resources. For example, an owner of two different domain | ||||
names could decide to serve the same resource from both, resulting in | ||||
two different IRIs. Therefore, comparison methods are designed to | ||||
minimize false negatives while strictly avoiding false positives. | ||||
For actual resolution, differences in percent-encoding (except for | In testing for equivalence, applications should not directly compare | |||
the percent-encoding of reserved characters) MUST always result in | relative references; the references should be converted to their | |||
the same resource. For example, http://example.org/~user, | respective target IRIs before comparison. When IRIs are being | |||
http://example.org/%7euser and http://example.org/%7Euser must | compared for the purpose of selecting (or avoiding) a network action, | |||
resolve to the same resource. | such as retrieval of a representation, fragment components (if any) | |||
should be excluded from the comparison. | ||||
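
The two precautions above (resolve relative references first, and ignore fragments when the comparison decides on a network action) can be sketched as follows. This is a minimal illustration, not part of the draft: the function name comparison_target and the example base IRI are assumptions, and Python's urllib.parse is used only as a convenient generic resolver.

```python
from urllib.parse import urljoin, urldefrag


def comparison_target(reference: str, base: str) -> str:
    """Resolve a possibly relative reference against a base IRI,
    then drop the fragment component (if any)."""
    target = urljoin(base, reference)
    return urldefrag(target).url


# Hypothetical base IRI used only for this illustration.
BASE = "http://example.org/a/b/index.html"

relative = comparison_target("../c/page.html#top", BASE)
absolute = comparison_target("http://example.org/a/c/page.html#end", BASE)
print(relative == absolute)
```

The fragment is dropped only for the purpose of selecting or avoiding a network action; an application that uses the IRI as an identity token would keep it.
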
If this kind of equivalence is to be tested, the percent-encoding of | Applications using IRIs as identity tokens with no relationship to a | |||
both IRIs to be compared has to be aligned, for example by converting | protocol MUST use the Simple String Comparison (see Section 5.3.1). | |||
both IRIs to URIs (see Section 3.1), eliminating escape differences | All other applications MUST select one of the comparison practices | |||
in the resulting URIs, and making sure that the case of the | from the Comparison Ladder (see Section 5.3) or, after IRI-to-URI | |||
hexadecimal characters in the percent-encoding is always the same | conversion, select one of the comparison practices from the URI | |||
(preferably upper case). If the IRI is to be passed to another | comparison ladder [RFCYYYY], Section 6.2. | |||
application, or used further in some other way, its original form | ||||
MUST be preserved; the conversion described here should be performed | ||||
only for the purpose of local comparison. | ||||
Additional, similar equivalences are possible based on knowledge | 5.2 Preparation for Comparison | |||
about the generic URI/IRI syntax, such as the fact that the scheme | ||||
part is case-insensitive. | ||||
5.3 Normalization | Any kind of IRI comparison REQUIRES that all escapings or encodings | |||
in the protocol or format that carries an IRI are resolved. This is | ||||
usually done when parsing the protocol or format. Examples of such | ||||
escapings or encodings are entities and numeric character references | ||||
in [HTML4] and [XML1]. As an example, http://example.org/ros&eacute; | ||||
(in HTML), http://example.org/ros&#233; (in HTML or XML), and | ||||
http://example.org/ros&#xE9; (in HTML or XML) all get resolved into | ||||
what is denoted in this document (see Section 1.4) as | ||||
http://example.org/ros&#xE9; (the "&#xE9;" here standing for the | ||||
actual e-acute character, to compensate for the fact that this | ||||
document cannot contain non-ASCII characters). | ||||
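
As a rough illustration of resolving format-level escapings before comparison, the sketch below uses Python's html.unescape, which handles HTML named entities as well as decimal and hexadecimal numeric character references. The function name resolve_character_references is an assumption made for this example; a real HTML or XML parser would normally perform this step while parsing.

```python
import html


def resolve_character_references(text: str) -> str:
    """Resolve HTML entities and numeric character references."""
    return html.unescape(text)


# Three spellings of the same IRI as they might appear in markup.
spellings = [
    "http://example.org/ros&eacute;",  # named entity (HTML)
    "http://example.org/ros&#233;",    # decimal reference (HTML or XML)
    "http://example.org/ros&#xE9;",    # hexadecimal reference (HTML or XML)
]

resolved = {resolve_character_references(s) for s in spellings}
print(len(resolved))
```

All three spellings resolve to the same character string, ending in the actual e-acute character (U+00E9), so any later comparison sees a single IRI.
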
Similar considerations apply to encodings such as Transfer Codings in | ||||
HTTP (see [RFC2616]) and Content Transfer Encodings in MIME [RFC2045], | ||||
although in these cases, the encoding is not based on characters, but | ||||
on octets, and additional care is required to make sure that | ||||
characters, and not just arbitrary octets, are compared (see Section | ||||
5.3.1). | ||||
5.3 Comparison Ladder | ||||
A variety of methods are used in practice to test IRI equivalence. | ||||
These methods fall into a range, distinguished by the amount of | ||||
processing required and the degree to which the probability of false | ||||
negatives is reduced. As noted above, false negatives cannot be | ||||
eliminated. In practice, their probability can be reduced, but this | ||||
reduction requires more processing and is not cost-effective for all | ||||
applications. | ||||
If this range of comparison practices is considered as a ladder, the | ||||
following discussion will climb the ladder, starting with those | ||||
practices that are cheap but have a relatively higher chance of | ||||
producing false negatives, and proceeding to those that have higher | ||||
computational cost and lower risk of false negatives. | ||||
5.3.1 Simple String Comparison | ||||
If two IRIs, considered as character strings, are identical, then it | ||||
is safe to conclude that they are equivalent. This type of | ||||
equivalence test has very low computational cost and is in wide use | ||||
in a variety of applications, particularly in the domain of parsing | ||||
and when a definitive answer to the question of IRI equivalence is | ||||
needed that is independent of the scheme used and can be calculated | ||||
quickly and without accessing a network. An example of such a case | ||||
is XML Namespaces ([XMLNamespace]). | ||||
Testing strings for equivalence requires some basic precautions. | ||||
This procedure is often referred to as "bit-for-bit" or | ||||
"byte-for-byte" comparison, which is potentially misleading. Testing | ||||
of strings for equality is normally based on pairwise comparison of | ||||
the characters that make up the strings, starting from the first and | ||||
proceeding until both strings are exhausted and all characters found | ||||
to be equal, a pair of characters compares unequal, or one of the | ||||
strings is exhausted before the other. | ||||
Such character comparisons require that each pair of characters be | ||||
put in comparable encoding form. For example, should one IRI be | ||||
stored in a byte array in UTF-8 encoding form, and the second be in a | ||||
UTF-16 encoding form, bit-for-bit comparisons applied naively will | ||||
produce errors. It is better to speak of equality on a | ||||
character-for-character rather than byte-for-byte or bit-for-bit | ||||
basis. In practical terms, character-by-character comparisons should | ||||
be done codepoint-by-codepoint after conversion to a common character | ||||
encoding form. When comparing character-by-character, the comparison | ||||
function MUST NOT map IRIs to URIs, because such a mapping would | ||||
create additional spurious equivalences. It follows that IRIs SHOULD | ||||
NOT be modified when being transported if there is any chance that | ||||
this IRI might be used as an identifier. | ||||
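
A minimal sketch of character-for-character comparison, assuming the encoding of each stored IRI is known: both byte sequences are decoded to Python strings, which compare code point by code point. The function name same_iri is hypothetical. No IRI-to-URI mapping and no percent-decoding is applied.

```python
def same_iri(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Simple String Comparison: decode both IRIs to code points,
    then compare them without any further mapping."""
    return a.decode(a_encoding) == b.decode(b_encoding)


iri = "http://example.org/ros\u00e9"
as_utf8 = iri.encode("utf-8")
as_utf16 = iri.encode("utf-16-le")

# Naive byte-for-byte comparison of differently encoded IRIs fails.
print(as_utf8 == as_utf16)
# Comparison after decoding to a common form succeeds.
print(same_iri(as_utf8, "utf-8", as_utf16, "utf-16-le"))
# Differences in percent-encoding are NOT smoothed over at this rung.
print(same_iri(b"http://example.org/~user", "utf-8",
               b"http://example.org/%7Euser", "utf-8"))
```

The last comparison reports a difference on purpose: at this rung of the ladder, IRIs that differ only in percent-encoding are treated as different.
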
False negatives are caused by the production and use of IRI aliases. | ||||
Unnecessary aliases can be reduced, regardless of the comparison | ||||
method, by consistently providing IRI references in an | ||||
already-normalized form (i.e., a form identical to what would be | ||||
produced after normalization is applied, as described below). | ||||
Protocols and data formats often choose to limit some IRI comparisons | ||||
to simple string comparison, based on the theory that people and | ||||
implementations will, in their own best interest, be consistent in | ||||
providing IRI references, or at least consistent enough to negate any | ||||
efficiency that might be obtained from further normalization. | ||||
5.3.2 Syntax-based Normalization | ||||
Implementations may use logic based on the definitions provided by | ||||
this specification to reduce the probability of false negatives. | ||||
Such processing is moderately higher in cost than | ||||
character-for-character string comparison. For example, an | ||||
application using this approach could reasonably consider the | ||||
following two IRIs equivalent: | ||||
example://a/b/c/%7Bfoo%7D/ros&#xE9; | ||||
eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9 | ||||
Web user agents, such as browsers, typically apply this type of IRI | ||||
normalization when determining whether a cached response is | ||||
available. Syntax-based normalization includes such techniques as | ||||
case normalization, character normalization, percent-encoding | ||||
normalization, and removal of dot-segments. | ||||
5.3.2.1 Case Normalization | ||||
For all IRIs, the hexadecimal digits within a percent-encoding | ||||
triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore | ||||
should be normalized to use uppercase letters for the digits A-F. | ||||
When an IRI uses components of the generic syntax, the component | ||||
syntax equivalence rules always apply; namely, that the scheme and | ||||
US-ASCII-only host are case-insensitive and therefore should be | ||||
normalized to lowercase. For example, the URI | ||||
<HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>. | ||||
Case equivalence for non-ASCII characters in IRI components that are | ||||
IDNs is discussed in Section 5.3.3. The other generic syntax | ||||
components are assumed to be case-sensitive unless specifically | ||||
defined otherwise by the scheme. | ||||
Creating schemes that allow case-insensitive syntax components | ||||
containing non-US-ASCII characters should be avoided because such | ||||
case normalization may be culturally dependent and is always a complex | ||||
operation. The only exception concerns non-ASCII host names for | ||||
which the character normalization includes a mapping step derived | ||||
from case folding. | ||||
5.3.2.2 Character Normalization | ||||
The Unicode Standard [UNIV4] defines various equivalences between | The Unicode Standard [UNIV4] defines various equivalences between | |||
sequences of characters for various purposes. Unicode Standard Annex | sequences of characters for various purposes. Unicode Standard Annex | |||
#15 [UTR15] defines various Normalization Forms for these | #15 [UTR15] defines various Normalization Forms for these | |||
equivalences, in particular Normalization Form C (NFC, Canonical | equivalences, in particular Normalization Form C (NFC, Canonical | |||
Decomposition, followed by Canonical Composition) and Normalization | Decomposition, followed by Canonical Composition) and Normalization | |||
Form KC (NFKC, Compatibility Decomposition, followed by Canonical | Form KC (NFKC, Compatibility Decomposition, followed by Canonical | |||
Composition). | Composition). | |||
Equivalence of IRIs MUST rely on the assumption that IRIs are | Equivalence of IRIs MUST rely on the assumption that IRIs are | |||
appropriately pre-normalized, rather than applying normalization when | appropriately pre-character-normalized, rather than applying | |||
comparing two IRIs. The exceptions are conversion from a non-digital | character normalization when comparing two IRIs. The exceptions are | |||
form, and conversion from a non-UCS-based character encoding to a | conversion from a non-digital form, and conversion from a |||
UCS-based character encoding. In these cases, NFC or a normalizing | non-UCS-based character encoding to a UCS-based character encoding. |||
transcoder using NFC MUST be used for interoperability. To avoid | In these cases, NFC or a normalizing transcoder using NFC MUST be | |||
false negatives and problems with transcoding, IRIs SHOULD be created | used for interoperability. To avoid false negatives and problems | |||
using NFC. Using NFKC may avoid even more problems, for example by | with transcoding, IRIs SHOULD be created using NFC. Using NFKC may | |||
choosing half-width Latin letters instead of full-width, and | avoid even more problems, for example by choosing half-width Latin | |||
full-width Katakana instead of half-width. | letters instead of full-width, and full-width Katakana instead of | |||
half-width. | ||||
As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML | As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML |||
Notation) is in NFC. On the other hand, | Notation) is in NFC. On the other hand, |||
http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The | http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The |||
former uses precombined e-acute characters, the latter uses 'e' | former uses precombined e-acute characters, the latter uses 'e' | |||
characters followed by combining acute accents. Both usages are | characters followed by combining acute accents. Both usages are | |||
defined to be canonically equivalent in [UNIV4]. | defined to be canonically equivalent in [UNIV4]. | |||
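The difference between the two forms can be checked with Python's standard `unicodedata` module. This is a minimal sketch for illustration; the helper name `is_nfc` is hypothetical, and the two strings spell out the example IRIs above with explicit escapes so the precomposed and decomposed forms are unambiguous.

```python
import unicodedata

# U+00E9 is the precomposed e-acute; "e" + U+0301 is the decomposed form.
precomposed = "http://www.example.org/r\u00e9sum\u00e9.html"
decomposed = "http://www.example.org/re\u0301sume\u0301.html"

def is_nfc(s):
    # A string is in NFC if normalizing it to NFC leaves it unchanged.
    return unicodedata.normalize("NFC", s) == s

print(is_nfc(precomposed))   # the first example IRI
print(is_nfc(decomposed))    # the second example IRI
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# NFKC additionally folds compatibility variants, for example
# full-width Latin "A" (U+FF21) and half-width Katakana "KA" (U+FF76).
print(unicodedata.normalize("NFKC", "\uff21\uff76") == "A\u30ab")
```

Per the text above, such normalization belongs at creation time or at conversion from a non-UCS encoding. A comparison routine is expected to assume pre-normalized input rather than normalize on the fly.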
Note: Because it is unknown how a particular field is being treated | Note: Because it is unknown how a particular sequence of characters | |||
with respect to text normalization, it would be inappropriate to | is being treated with respect to character normalization, it would | |||
allow third parties to normalize an IRI arbitrarily. This does | be inappropriate to allow third parties to normalize an IRI | |||
not contradict the recommendation that when a resource is created, | arbitrarily. This does not contradict the recommendation that | |||
its IRI should be as normalized as possible (i.e. NFC or even | when a resource is created, its IRI should be as | |||
NFKC). This is similar to the upper-case/lower-case problems in | character-normalized as possible (i.e. NFC or even NFKC). This | |||
URIs. Some parts of a URI are case-insensitive (domain name). | is similar to the upper-case/lower-case problems in | |||
For others, it is unclear whether they are case-sensitive or | URIs. |||
Some parts of a URI are case-insensitive (domain name). For | ||||
others, it is unclear whether they are case-sensitive or | ||||
case-insensitive, or something in between (e.g. case-sensitive, | case-insensitive, or something in between (e.g. case-sensitive, | |||
but if the wrong case is used, a multiple choice selection is | but if the wrong case is used, a multiple choice selection is | |||
provided instead of a direct negative result). The best recipe is | provided instead of a direct negative result). The best recipe is | |||
that the creator uses a reasonable capitalization, and when | that the creator uses a reasonable capitalization, and when | |||
transferring the URI, that capitalization is never changed. | transferring the URI, that capitalization is never changed. | |||
Various IRI schemes may allow the usage of International Domain Names | Various IRI schemes may allow the usage of Internationalized Domain | |||
(IDN) [RFC3490]. When in use in IRIs, those names SHOULD be | Names (IDN) [RFC3490] either in the ireg-name part or elsewhere. | |||
validated using the ToASCII operation defined in [RFC3490], with the | Character Normalization also applies to IDNs, as discussed in Section | |||
flags "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing | 5.3.3. | |||
an invalid IDN cannot successfully be resolved. For legibility | ||||
purposes, IDN components of IRIs SHOULD NOT be converted into ASCII | ||||
Compatible Encoding (ACE). | ||||
5.4 Preferred Forms | 5.3.2.3 Percent-Encoding Normalization | |||
The following are the preferred forms for IRIs when created: | The percent-encoding mechanism (Section 2.1 of [RFCYYYY]) is a | |||
frequent source of variance among otherwise identical IRIs. In | ||||
addition to the case normalization issue noted above, some IRI | ||||
producers percent-encode octets that do not require percent-encoding, | ||||
resulting in IRIs that are equivalent to their nonencoded | ||||
counterparts. Such IRIs should be normalized by decoding any | ||||
percent-encoded octet sequence that corresponds to an unreserved | ||||
character, as described in Section 2.3 of [RFCYYYY]. | ||||
- Always provide the URI scheme in lowercase characters. | For actual resolution, differences in percent-encoding (except for | |||
the percent-encoding of reserved characters) MUST always result in | ||||
the same resource. For example, http://example.org/~user, | ||||
http://example.org/%7euser and http://example.org/%7Euser must | ||||
resolve to the same resource. | ||||
- Only perform percent-encoding where it is essential. | If this kind of equivalence is to be tested, the percent-encoding of | |||
both IRIs to be compared has to be aligned, for example by converting | ||||
both IRIs to URIs (see Section 3.1), eliminating escape differences | ||||
in the resulting URIs, and making sure that the case of the | ||||
hexadecimal characters in the percent-encoding is always the same | ||||
(preferably upper case). If the IRI is to be passed to another | ||||
application, or used further in some other way, its original form | ||||
MUST be preserved; the conversion described here should be performed | ||||
only for the purpose of local comparison. | ||||
- Always use uppercase A-through-F characters when percent-encoding. | 5.3.2.4 Path Segment Normalization | |||
- For those schemes where ireg-name is a domain name, always provide | The complete path segments "." and ".." are intended only for use | |||
the individual labels, in the form produced when applying nameprep | within relative references (Section 4.1 of [RFCYYYY]) and are removed | |||
[RFC3491]. This in particular includes using lowercase characters | as part of the reference resolution process (Section 5.2 of | |||
rather than uppercase characters where applicable. Also, always | [RFCYYYY]). However, some implementations may incorrectly assume | |||
use US-ASCII '.' as a separator. | that reference resolution is not necessary when the reference is | |||
already an IRI, and thus fail to remove dot-segments when they occur | ||||
in non-relative paths. IRI normalizers should remove dot-segments by | ||||
applying the remove_dot_segments algorithm to the path, as described | ||||
in Section 5.2.4 of [RFCYYYY]. | ||||
- Where possible, provide IRI components in NFKC or NFC. | 5.3.3 Scheme-based Normalization | |||
- Prevent /./ and /../ from appearing in IRI paths. | The syntax and semantics of IRIs vary from scheme to scheme, as | |||
described by the defining specification for each scheme. | ||||
Implementations may use scheme-specific rules, at further processing | ||||
cost, to reduce the probability of false negatives. For example, | ||||
since the "http" scheme makes use of an authority component, has a | ||||
default port of "80", and defines an empty path to be equivalent to | ||||
"/", the following four IRIs are equivalent: | ||||
- For schemes that define an empty path to be equivalent to a path | http://example.com | |||
of "/", use "/". | http://example.com/ | |||
http://example.com:/ | ||||
http://example.com:80/ | ||||
In general, an IRI that uses the generic syntax for authority with an | ||||
empty path should be normalized to a path of "/"; likewise, an | ||||
explicit ":port", where the port is empty or the default for the | ||||
scheme, is equivalent to one where the port and its ":" delimiter are | ||||
elided, and thus should be removed by scheme-based normalization. | ||||
For example, the second IRI above is the normal form for the "http" | ||||
scheme. | ||||
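A possible sketch of this rule for the "http" scheme follows. The function name `scheme_normalize` and the `DEFAULT_PORTS` table are assumptions made for illustration; only the default port of "80" for "http" comes from the text. IRIs using any other scheme are returned unchanged, and no syntax-based normalization beyond lowercasing the scheme is attempted.

```python
import re

# Only "http" and its default port are taken from the text; extend as needed.
DEFAULT_PORTS = {"http": "80"}

def scheme_normalize(iri):
    m = re.match(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)([^?#]*)(.*)$", iri, re.DOTALL)
    if not m:
        return iri
    scheme, authority, path, rest = m.groups()
    scheme = scheme.lower()
    if scheme not in DEFAULT_PORTS:
        return iri
    # Split off an optional trailing ":port" (the port may be empty).
    host, port = re.match(r"^(.*?)(?::(\d*))?$", authority, re.DOTALL).groups()
    if port in (None, "", DEFAULT_PORTS[scheme]):
        authority = host          # drop an empty or default port and its ":"
    if path == "":
        path = "/"                # an empty path is equivalent to "/"
    return f"{scheme}://{authority}{path}{rest}"

for iri in ("http://example.com", "http://example.com/",
            "http://example.com:/", "http://example.com:80/"):
    print(scheme_normalize(iri))
```

All four inputs come out as the second form, `http://example.com/`. A non-default port such as `:8080` is left in place, and `http://example.com/?` keeps its "?" delimiter, so it is not folded into the same normal form.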
Another case where normalization varies by scheme is in the handling | ||||
of an empty authority component or empty host subcomponent. For many | ||||
scheme specifications, an empty authority or host is considered an | ||||
error; for others, it is considered equivalent to "localhost" or the | ||||
end-user's host. When a scheme defines a default for authority and | ||||
an IRI reference to that default is desired, the reference should be | ||||
normalized to an empty authority for the sake of uniformity, brevity, | ||||
and internationalization. If, however, either the userinfo or port | ||||
subcomponent is non-empty, then the host should be given explicitly | ||||
even if it matches the default. | ||||
Normalization should not remove delimiters when their associated | ||||
component is empty unless licensed to do so by the scheme | ||||
specification. For example, the IRI "http://example.com/?" cannot be | ||||
assumed to be equivalent to any of the examples above. Likewise, the | ||||
presence or absence of delimiters within a userinfo subcomponent is | ||||
usually significant to its interpretation. The fragment component is | ||||
not subject to any scheme-based normalization; thus, two IRIs that | ||||
differ only by the suffix "#" are considered different regardless of | ||||
the scheme. | ||||
Some IRI schemes may allow the usage of Internationalized Domain | ||||
Names (IDN) [RFC3490] either in their ireg-name part or elsewhere. | ||||
When in use in IRIs, those names SHOULD be validated using the | ||||
ToASCII operation defined in [RFC3490], with the flags | ||||
"UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing an | ||||
invalid IDN cannot successfully be resolved. Validated IDN | ||||
components of IRIs SHOULD be character normalized using the Nameprep | ||||
process [RFC3491]; however, for legibility purposes, they SHOULD NOT | ||||
be converted into ASCII Compatible Encoding (ACE). | ||||
Scheme-based normalization may also consider IDN components and their | ||||
conversions to punycode as equivalent. As an example, | ||||
http://résumé.example.org may be considered equivalent to | ||||
http://xn--rsum-bpad.example.org | ||||
Other scheme-specific normalizations are possible. | ||||
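The IDN equivalence mentioned above can be illustrated with Python's built-in "idna" codec, which implements the ToASCII and ToUnicode operations of [RFC3490], including Nameprep. This is only a sketch: the helper names `host_to_ace` and `hosts_equivalent` are hypothetical, and the codec does not expose the "UseSTD3ASCIIRules" and "AllowUnassigned" flags, so it should not be read as the validation step recommended in this section.

```python
def host_to_ace(host):
    # Convert each label of a host name to its ASCII Compatible Encoding.
    return host.encode("idna").decode("ascii")

def hosts_equivalent(a, b):
    # Compare two host names in ACE form, for local comparison only.
    return host_to_ace(a) == host_to_ace(b)

idn = "r\u00e9sum\u00e9.example.org"       # the host from the example above
ace = "xn--rsum-bpad.example.org"

print(host_to_ace(idn))
print(hosts_equivalent(idn, ace))
```

The ACE form is used here purely as a comparison key. For legibility, the IRI itself should keep the non-ASCII form of the host, as recommended above.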
5.3.4 Protocol-based Normalization | ||||
Web spiders, for which substantial effort to reduce the incidence of | ||||
false negatives is often cost-effective, are observed to implement | ||||
even more aggressive techniques in IRI comparison. For example, if | ||||
they observe that an IRI such as | ||||
http://example.com/data | ||||
redirects to an IRI differing only in the trailing slash | ||||
http://example.com/data/ | ||||
they will likely regard the two as equivalent in the future. This | ||||
kind of technique is only appropriate when equivalence is clearly | ||||
indicated by both the result of accessing the resources and the | ||||
common conventions of their scheme's dereference algorithm (in this | ||||
case, use of redirection by HTTP origin servers to avoid problems | ||||
with relative references). | ||||
6. Use of IRIs | 6. Use of IRIs | |||
6.1 Limitations on UCS Characters Allowed in IRIs | 6.1 Limitations on UCS Characters Allowed in IRIs | |||
This section discusses limitations on characters and character | This section discusses limitations on characters and character | |||
sequences usable for IRIs beyond those given in Section 2.2 and | sequences usable for IRIs beyond those given in Section 2.2 and | |||
Section 4.1. The considerations in this section are relevant when | Section 4.1. The considerations in this section are relevant when | |||
creating IRIs and when converting from URIs to IRIs. | creating IRIs and when converting from URIs to IRIs. | |||
skipping to change at page 35, line 15 | skipping to change at page 39, line 31 | |||
The discussion on the issue addressed here has started a long time | The discussion on the issue addressed here has started a long time | |||
ago. There was a thread in the HTML working group in August 1995 | ago. There was a thread in the HTML working group in August 1995 | |||
(under the topic of "Globalizing URIs") and in the www-international | (under the topic of "Globalizing URIs") and in the www-international | |||
mailing list in July 1996 (under the topic of "Internationalization | mailing list in July 1996 (under the topic of "Internationalization | |||
and URLs"), and ad-hoc meetings at the Unicode conferences in | and URLs"), and ad-hoc meetings at the Unicode conferences in | |||
September 1995 and September 1997. | September 1995 and September 1997. | |||
Many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding, | Many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding, | |||
Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim | Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim | |||
Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie | Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie | |||
Daigle, Ted Hardie, Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex | Daigle, Ted Hardie, Bill Fenner, Margaret Wasserman, Russ Housley, | |||
Texin, Graham Klyne, Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam | Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex Texin, Graham Klyne, | |||
Costello, Dan Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy | Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam Costello, Dan | |||
Badami, Jonathan Rosenne, Asmus Freytag, Simon Josefsson, Carlos | Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy Badami, Jonathan | |||
Viegas Damasio, Chris Haynes, Walter Underwood, and many others for | Rosenne, Asmus Freytag, Simon Josefsson, Carlos Viegas Damasio, Chris | |||
help with understanding the issues and possible solutions, and | Haynes, Walter Underwood, and many others for help with understanding | |||
getting the details right. | the issues and possible solutions, and getting the details right. | |||
This document is a product of the Internationalization Working Group | This document is a product of the Internationalization Working Group | |||
(I18N WG) of the World Wide Web Consortium (W3C). Thanks to the | (I18N WG) of the World Wide Web Consortium (W3C). Thanks to the | |||
members of the W3C I18N Working Group and Interest Group for their | members of the W3C I18N Working Group and Interest Group for their | |||
contributions and their work on [CharMod]. Thanks also go to the | contributions and their work on [CharMod]. Thanks also go to the | |||
members of many other W3C Working Groups for adopting IRIs, and to | members of many other W3C Working Groups for adopting IRIs, and to | |||
the members of the Montreal IAB Workshop on Internationalization and | the members of the Montreal IAB Workshop on Internationalization and | |||
Localization for their review. | Localization for their review. | |||
11. References | 11. References | |||
skipping to change at page 36, line 17 | skipping to change at page 40, line 34 | |||
Profile for Internationalized Domain Names (IDN)", RFC | Profile for Internationalized Domain Names (IDN)", RFC | |||
3491, March 2003. | 3491, March 2003. | |||
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | |||
10646", STD 63, RFC 3629, November 2003. | 10646", STD 63, RFC 3629, November 2003. | |||
[RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | [RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | |||
Resource Identifier (URI): Generic Syntax (Note to the RFC | Resource Identifier (URI): Generic Syntax (Note to the RFC | |||
Editor: Please update this reference with the RFC | Editor: Please update this reference with the RFC | |||
resulting from draft-fielding-uri-rfc2396bis-xx.txt, and | resulting from draft-fielding-uri-rfc2396bis-xx.txt, and | |||
remove this Note)", draft-fielding-uri-rfc2396bis-07.txt | remove this Note)", draft-fielding-uri-rfc2396bis-07 (work | |||
(work in progress), April 2004. | in progress), April 2004. | |||
[UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | [UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | |||
Annex #9, March 2004, | Annex #9, March 2004, | |||
<http://www.unicode.org/reports/tr9/tr9-13.html>. | <http://www.unicode.org/reports/tr9/tr9-13.html>. | |||
[UNIV4] The Unicode Consortium, "The Unicode Standard, Version | [UNIV4] The Unicode Consortium, "The Unicode Standard, Version | |||
4.0.1, defined by: The Unicode Standard, Version 4.0 | 4.0.1, defined by: The Unicode Standard, Version 4.0 | |||
(Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), | (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), | |||
as amended by Unicode 4.0.1 | as amended by Unicode 4.0.1 | |||
(http://www.unicode.org/versions/Unicode4.0.1/)", March | (http://www.unicode.org/versions/Unicode4.0.1/)", March | |||
skipping to change at page 36, line 46 | skipping to change at page 41, line 15 | |||
11.2 Non-normative References | 11.2 Non-normative References | |||
[BidiEx] "Examples of bidirectional IRIs", | [BidiEx] "Examples of bidirectional IRIs", | |||
<http://www.w3.org/International/iri-edit/BidiExamples>. | <http://www.w3.org/International/iri-edit/BidiExamples>. | |||
[CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T. | [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T. | |||
Texin, "Character Model for the World Wide Web", World | Texin, "Character Model for the World Wide Web", World | |||
Wide Web Consortium Working Draft, February 2004, | Wide Web Consortium Working Draft, February 2004, | |||
<http://www.w3.org/TR/charmod>. | <http://www.w3.org/TR/charmod>. | |||
[Duerst01] | ||||
Duerst, M., "Internationalized Resource Identifiers: From | ||||
Specification to Testing", Proc. 19th International | ||||
Unicode Conference, San Jose , September 2001, | ||||
<http://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html>. | ||||
[Duerst97] | [Duerst97] | |||
Duerst, M., "The Properties and Promises of UTF-8", Proc. | Duerst, M., "The Properties and Promises of UTF-8", Proc. | |||
11th International Unicode Conference, San Jose , | 11th International Unicode Conference, San Jose , | |||
September 1997, | September 1997, | |||
<http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/ | <http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/ | |||
IUC11-UTF-8.pdf>. | IUC11-UTF-8.pdf>. | |||
[Gettys] Gettys, J., "URI Model Consequences", | [Gettys] Gettys, J., "URI Model Consequences", | |||
<http://www.w3.org/DesignIssues/ModelConsequences>. | <http://www.w3.org/DesignIssues/ModelConsequences>. | |||
[HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01 | [HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01 | |||
Specification", World Wide Web Consortium Recommendation, | Specification", World Wide Web Consortium Recommendation, | |||
December 1999, | December 1999, | |||
<http://www.w3.org/TR/REC-html40/appendix/ | <http://www.w3.org/TR/REC-html40/appendix/ | |||
notes.html#h-B.2>. | notes.html#h-B.2>. | |||
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | ||||
Extensions (MIME) Part One: Format of Internet Message | ||||
Bodies", RFC 2045, November 1996. | ||||
[RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H., | [RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H., | |||
Atkinson, R., Crispin, M. and P. Svanberg, "The Report of | Atkinson, R., Crispin, M. and P. Svanberg, "The Report of | |||
the IAB Character Set Workshop held 29 February - 1 March, | the IAB Character Set Workshop held 29 February - 1 March, | |||
1996", RFC 2130, April 1997. | 1996", RFC 2130, April 1997. | |||
[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | |||
[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. | [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. | |||
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | |||
skipping to change at page 38, line 11 | skipping to change at page 42, line 25 | |||
Protocol", RFC 2640, July 1999. | Protocol", RFC 2640, July 1999. | |||
[RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke, | [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke, | |||
"Guidelines for new URL Schemes", RFC 2718, November 1999. | "Guidelines for new URL Schemes", RFC 2718, November 1999. | |||
[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | |||
Markup Languages", Unicode Technical Report #20, World | Markup Languages", Unicode Technical Report #20, World | |||
Wide Web Consortium Note, February 2002, | Wide Web Consortium Note, February 2002, | |||
<http://www.w3.org/TR/unicode-xml/>. | <http://www.w3.org/TR/unicode-xml/>. | |||
[W3CIRI] Duerst, M., "Internationalization - URIs and other | ||||
identifiers", September 2002, | ||||
<http://www.w3.org/International/O-URL-and-ident.html>. | ||||
[XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking | [XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking | |||
Language (XLink) Version 1.0", World Wide Web Consortium | Language (XLink) Version 1.0", World Wide Web Consortium | |||
Recommendation, June 2001, | Recommendation, June 2001, | |||
<http://www.w3.org/TR/xlink/#link-locators>. | <http://www.w3.org/TR/xlink/#link-locators>. | |||
[XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and | [XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and | |||
F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third | F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third | |||
Edition)", World Wide Web Consortium Recommendation, | Edition)", World Wide Web Consortium Recommendation, | |||
February 2004, | February 2004, | |||
<http://www.w3.org/TR/REC-xml#sec-external-ent>. | <http://www.w3.org/TR/REC-xml#sec-external-ent>. | |||