| draft-duerst-iri-10.txt | draft-duerst-iri.txt | |||
|---|---|---|---|---|
| Network Working Group M. Duerst | Network Working Group M. Duerst | |||
| Internet-Draft W3C | Internet-Draft W3C | |||
| Expires: March 28, 2005 M. Suignard | Expires: May 31, 2005 M. Suignard | |||
| Microsoft Corporation | Microsoft Corporation | |||
| September 27, 2004 | November 30, 2004 | |||
| Internationalized Resource Identifiers (IRIs) | Internationalized Resource Identifiers (IRIs) | |||
| draft-duerst-iri-10 | draft-duerst-iri-11 | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is subject to all provisions | This document is an Internet-Draft and is subject to all provisions | |||
| of section 3 of RFC 3667. By submitting this Internet-Draft, each | of section 3 of RFC 3667. By submitting this Internet-Draft, each | |||
| author represents that any applicable patent or other IPR claims of | author represents that any applicable patent or other IPR claims of | |||
| which he or she is aware have been or will be disclosed, and any of | which he or she is aware have been or will be disclosed, and any of | |||
| which he or she become aware will be disclosed, in accordance with | which he or she become aware will be disclosed, in accordance with | |||
| RFC 3668. | RFC 3668. | |||
| skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on March 28, 2005. | This Internet-Draft will expire on May 31, 2005. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2004). | Copyright (C) The Internet Society (2004). | |||
| Abstract | Abstract | |||
| This document defines a new protocol element, the Internationalized | This document defines a new protocol element, the Internationalized | |||
| Resource Identifier (IRI), as a complement to the Uniform Resource | Resource Identifier (IRI), as a complement to the Uniform Resource | |||
| Identifier (URI). An IRI is a sequence of characters from the | Identifier (URI). An IRI is a sequence of characters from the | |||
| skipping to change at page 2, line 16 | skipping to change at page 2, line 16 | |||
| of extending or changing the definition of URIs, to allow a clear | of extending or changing the definition of URIs, to allow a clear | |||
| distinction and to avoid incompatibilities with existing software. | distinction and to avoid incompatibilities with existing software. | |||
| Guidelines for the use and deployment of IRIs in various protocols, | Guidelines for the use and deployment of IRIs in various protocols, | |||
| formats, and software components that now deal with URIs are | formats, and software components that now deal with URIs are | |||
| provided. | provided. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1 Overview and Motivation . . . . . . . . . . . . . . . . . 4 | 1.1 Overview and Motivation . . . . . . . . . . . . . . . . . 4 | |||
| 1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 7 | 2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . 7 | |||
| 2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . 8 | 2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . 8 | |||
| 3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 11 | 3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 10 | |||
| 3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . 11 | 3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . 11 | |||
| 3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . 14 | 3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . 14 | |||
| 3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| 4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 17 | 4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 17 | |||
| 4.1 Logical Storage and Visual Presentation . . . . . . . . . 17 | 4.1 Logical Storage and Visual Presentation . . . . . . . . . 17 | |||
| 4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . 19 | 4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . 18 | |||
| 4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . 20 | 4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 5. IRI Equivalence and Comparison . . . . . . . . . . . . . . . . 22 | 5. Normalization and Comparison . . . . . . . . . . . . . . . . . 22 | |||
| 5.1 Simple String Comparison . . . . . . . . . . . . . . . . . 22 | 5.1 Equivalence . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 5.2 Conversion to URIs . . . . . . . . . . . . . . . . . . . . 23 | 5.2 Preparation for Comparison . . . . . . . . . . . . . . . . 23 | |||
| 5.3 Normalization . . . . . . . . . . . . . . . . . . . . . . 23 | 5.3 Comparison Ladder . . . . . . . . . . . . . . . . . . . . 23 | |||
| 5.4 Preferred Forms . . . . . . . . . . . . . . . . . . . . . 24 | 5.3.1 Simple String Comparison . . . . . . . . . . . . . . . 24 | |||
| 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 5.3.2 Syntax-based Normalization . . . . . . . . . . . . . . 25 | |||
| 6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 25 | 5.3.3 Scheme-based Normalization . . . . . . . . . . . . . . 27 | |||
| 6.2 Software Interfaces and Protocols . . . . . . . . . . . . 25 | 5.3.4 Protocol-based Normalization . . . . . . . . . . . . . 29 | |||
| 6.3 Format of URIs and IRIs in Documents and Protocols . . . . 26 | 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
| 6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 26 | 6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 29 | |||
| 6.5 Relative IRI References . . . . . . . . . . . . . . . . . 28 | 6.2 Software Interfaces and Protocols . . . . . . . . . . . . 30 | |||
| 7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 28 | 6.3 Format of URIs and IRIs in Documents and Protocols . . . . 30 | |||
| 7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 28 | 6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 30 | |||
| 7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 28 | 6.5 Relative IRI References . . . . . . . . . . . . . . . . . 32 | |||
| 7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 29 | 7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 32 | |||
| 7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 30 | 7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 32 | |||
| 7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 30 | 7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 33 | |||
| 7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 31 | 7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 34 | |||
| 7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 31 | 7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 34 | |||
| 7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 32 | 7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 35 | |||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 33 | 7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 35 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 | 7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 36 | |||
| 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 34 | 7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 36 | |||
| 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | |||
| 11.1 Normative References . . . . . . . . . . . . . . . . . . . . 35 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 | |||
| 11.2 Non-normative References . . . . . . . . . . . . . . . . . . 36 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 39 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 38 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
| A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 39 | 11.1 Normative References . . . . . . . . . . . . . . . . . . . . 39 | |||
| A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 39 | 11.2 Non-normative References . . . . . . . . . . . . . . . . . . 41 | |||
| A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 40 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 40 | A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 43 | |||
| A.4 Indicating Character Encodings in the URI/IRI . . . . . . 40 | A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 41 | A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 44 | |||
| A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 44 | ||||
| A.4 Indicating Character Encodings in the URI/IRI . . . . . . 44 | ||||
| Intellectual Property and Copyright Statements . . . . . . . . 45 | ||||
| 1. Introduction | 1. Introduction | |||
| 1.1 Overview and Motivation | 1.1 Overview and Motivation | |||
| A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a | A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a | |||
| sequence of characters chosen from a limited subset of the repertoire | sequence of characters chosen from a limited subset of the repertoire | |||
| of US-ASCII [ASCII] characters. | of US-ASCII [ASCII] characters. | |||
| The characters in URIs are frequently used for representing words of | The characters in URIs are frequently used for representing words of | |||
| skipping to change at page 4, line 45 | skipping to change at page 4, line 45 | |||
| [RFCYYYY], such as URI references. The syntax of IRIs is defined in | [RFCYYYY], such as URI references. The syntax of IRIs is defined in | |||
| Section 2, and the relationship between IRIs and URIs in Section 3. | Section 2, and the relationship between IRIs and URIs in Section 3. | |||
| Using characters outside of A-Z in IRIs brings with it some | Using characters outside of A-Z in IRIs brings with it some | |||
| difficulties. Section 4 discusses the special case of bidirectional | difficulties. Section 4 discusses the special case of bidirectional | |||
| IRIs, Section 5 various forms of equivalence between IRIs, and | IRIs, Section 5 various forms of equivalence between IRIs, and | |||
| Section 6 the use of IRIs in different situations. Section 7 gives | Section 6 the use of IRIs in different situations. Section 7 gives | |||
| additional informative guidelines, and Section 8 security | additional informative guidelines, and Section 8 security | |||
| considerations. | considerations. | |||
| For discussion of this document, please use the public-iri@w3.org | ||||
| mailing list (publicly archived at | ||||
| http://lists.w3.org/Archives/Public/public-iri/). An issues list for | ||||
| this document is maintained at | ||||
| http://www.w3.org/International/iri-edit#issues. For more | ||||
| information on the topic of this document, please also see [W3CIRI] | ||||
| and [Duerst01]. | ||||
| 1.2 Applicability | 1.2 Applicability | |||
| IRIs are designed to be compatible with recommendations for new URI | IRIs are designed to be compatible with recommendations for new URI | |||
| schemes [RFC2718]. The compatibility is provided by specifying a | schemes [RFC2718]. The compatibility is provided by specifying a | |||
| well defined and deterministic mapping from the IRI character | well defined and deterministic mapping from the IRI character | |||
| sequence to the functionally equivalent URI character sequence. | sequence to the functionally equivalent URI character sequence. | |||
| Practical use of IRIs (or IRI references) in place of URIs (or URI | Practical use of IRIs (or IRI references) in place of URIs (or URI | |||
| references) depends on the following conditions being met: | references) depends on the following conditions being met: | |||
| a) The protocol or format element where IRIs are used should be | a) The protocol or format element where IRIs are used should be | |||
| skipping to change at page 11, line 5 | skipping to change at page 10, line 44 | |||
| / "25" %x30-35 ; 250-255 | / "25" %x30-35 ; 250-255 | |||
| pct-encoded = "%" HEXDIG HEXDIG | pct-encoded = "%" HEXDIG HEXDIG | |||
| unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" | |||
| reserved = gen-delims / sub-delims | reserved = gen-delims / sub-delims | |||
| gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" | gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" | |||
| sub-delims = "!" / "$" / "&" / "'" / "(" / ")" | sub-delims = "!" / "$" / "&" / "'" / "(" / ")" | |||
| / "*" / "+" / "," / ";" / "=" | / "*" / "+" / "," / ";" / "=" | |||
| This syntax does not support IPv6 scoped addressing zone identifiers. | ||||
| 3. Relationship between IRIs and URIs | 3. Relationship between IRIs and URIs | |||
| IRIs are meant to replace URIs in identifying resources for | IRIs are meant to replace URIs in identifying resources for | |||
| protocols, formats and software components which use a UCS-based | protocols, formats and software components which use a UCS-based | |||
| character repertoire. These protocols and components may never need | character repertoire. These protocols and components may never need | |||
| to use URIs directly, especially when the resource identifier is used | to use URIs directly, especially when the resource identifier is used | |||
| simply for identification purposes. However, when the resource | simply for identification purposes. However, when the resource | |||
| identifier is used for resource retrieval, it is in many cases | identifier is used for resource retrieval, it is in many cases | |||
| necessary to determine the associated URI because most retrieval | necessary to determine the associated URI because most retrieval | |||
| mechanisms currently only are defined for URIs. In this case, IRIs | mechanisms currently only are defined for URIs. In this case, IRIs | |||
| skipping to change at page 12, line 12 | skipping to change at page 12, line 7 | |||
| characters from the UCS normalized according to Normalization | characters from the UCS normalized according to Normalization | |||
| Form C (NFC, [UTR15]). | Form C (NFC, [UTR15]). | |||
| Variant B) If the IRI is in some digital representation (e.g. an | Variant B) If the IRI is in some digital representation (e.g. an | |||
| octet stream) in some known non-Unicode character encoding: | octet stream) in some known non-Unicode character encoding: | |||
| Convert the IRI to a sequence of characters from the UCS | Convert the IRI to a sequence of characters from the UCS | |||
| normalized according to NFC. | normalized according to NFC. | |||
| Variant C) If the IRI is in a Unicode-based character encoding | Variant C) If the IRI is in a Unicode-based character encoding | |||
| (for example UTF-8 or UTF-16): Do not normalize (see Section | (for example UTF-8 or UTF-16): Do not normalize (see Section | |||
| 5.3 for details). Apply Step 2 directly to the encoded Unicode | 5.3.2.2 for details). Apply Step 2 directly to the encoded | |||
| character sequence. | Unicode character sequence. | |||
| Step 2) For each character in 'ucschar' or 'iprivate', apply Steps | Step 2) For each character in 'ucschar' or 'iprivate', apply Steps | |||
| 2.1 through 2.3 below. | 2.1 through 2.3 below. | |||
| 2.1) Convert the character to a sequence of one or more octets | 2.1) Convert the character to a sequence of one or more octets | |||
| using UTF-8 [RFC3629]. | using UTF-8 [RFC3629]. | |||
| 2.2) Convert each octet to %HH, where HH is the hexadecimal | 2.2) Convert each octet to %HH, where HH is the hexadecimal | |||
| notation of the octet value. Note that this is identical to | notation of the octet value. Note that this is identical to | |||
| the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To | the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To | |||
| skipping to change at page 22, line 14 | skipping to change at page 22, line 8 | |||
| Depending on whether the upper-case letters represent Arabic or | Depending on whether the upper-case letters represent Arabic or | |||
| Hebrew, the visual representation is different. | Hebrew, the visual representation is different. | |||
| Example 10 (allowed, but not recommended): | Example 10 (allowed, but not recommended): | |||
| logical representation: http://ab.CDEFGH.123/kl/mn/op.html | logical representation: http://ab.CDEFGH.123/kl/mn/op.html | |||
| visual representation: http://ab.123.HGFEDC/kl/mn/op.html | visual representation: http://ab.123.HGFEDC/kl/mn/op.html | |||
| Components consisting of only numbers are allowed (it would be rather | Components consisting of only numbers are allowed (it would be rather | |||
| difficult to prohibit them), but may interact with adjacent RTL | difficult to prohibit them), but may interact with adjacent RTL | |||
| components in ways that are not easy to predict. | components in ways that are not easy to predict. | |||
| 5. IRI Equivalence and Comparison | 5. Normalization and Comparison | |||
| This section discusses IRI Equivalence and Comparison similar to | Note: The structure and much of the material for this section is | |||
| Section 6, "Normalization and Comparison", in [RFCYYYY]. This | taken from section 6 of [RFCYYYY]; the differences are due to the | |||
| section focuses on the main issues and on aspects that are different | specifics of IRIs. | |||
| from [RFCYYYY]; Section 6 of [RFCYYYY] is recommended background | ||||
| reading. | ||||
| There is no general rule or procedure to decide whether two arbitrary | One of the most common operations on IRIs is simple comparison: | |||
| IRIs are equivalent or not (i.e. whether they refer to the same | determining if two IRIs are equivalent without using the IRIs or the | |||
| resource or not). Two IRIs that look almost the same may refer to | mapped URIs to access their respective resource(s). A comparison is | |||
| different resources. Two IRIs that look completely different may | performed every time a response cache is accessed, a browser checks | |||
| refer to the same resource. Each specification or application that | its history to color a link, or an XML parser processes tags within a | |||
| uses IRIs has to decide on the appropriate criterion for IRI | namespace. Extensive normalization prior to comparison of IRIs may | |||
| equivalence. | be used by spiders and indexing engines to prune a search space or | |||
| reduce duplication of request actions and response storage. | ||||
| 5.1 Simple String Comparison | IRI comparison is performed in respect to some particular purpose, | |||
| and implementations with differing purposes will often be subject to | ||||
| differing design trade-offs in regard to how much effort should be | ||||
| spent in reducing aliased identifiers. This section describes a | ||||
| variety of methods that may be used to compare IRIs, the trade-offs | ||||
| between them, and the types of applications that might use them. | ||||
| In some scenarios a definite answer to the question of IRI | 5.1 Equivalence | |||
| equivalence is needed that is independent of the scheme used and | ||||
| always can be calculated quickly and without accessing a network. An | ||||
| example of such a case is XML Namespaces ([XMLNamespace]). In such | ||||
| cases, two IRIs SHOULD be defined as equivalent if and only if they | ||||
| are character-by-character equivalent. This is the same as being | ||||
| byte-by-byte equivalent if the character encoding for both IRIs is | ||||
| the same. As an example, | ||||
| http://example.org/~user, http://example.org/%7euser, and | ||||
| http://example.org/%7Euser are not equivalent under this definition. | ||||
| When comparing character-by-character, the comparison function MUST | ||||
| NOT map IRIs to URIs, because such a mapping would create additional | ||||
| spurious equivalences. | ||||
| It follows that IRIs SHOULD NOT be modified when being transported if | Since IRIs exist to identify resources, presumably they should be | |||
| there is any chance that this IRI might be used as an identifier in | considered equivalent when they identify the same resource. However, | |||
| the way explained above. When an IRI is used as an identifier in | such a definition of equivalence is not of much practical use, since | |||
| scenarios that depend upon character-by-character equivalence, | there is no way for an implementation to compare two resources that | |||
| creators of IRIs should take additional care to avoid IRIs that only | are not under its own control. For this reason, determination of | |||
| differ in their use of percent-escaping. As an example, using both | equivalence or difference of IRIs is based on string comparison, | |||
| http://example.org/~user and http://example.org/%7Euser to identify | perhaps augmented by reference to additional rules provided by URI | |||
| XML Namespaces is a bad idea. | scheme definitions. We use the terms "different" and "equivalent" to | |||
| describe the possible outcomes of such comparisons, but there are | ||||
| many application-dependent versions of equivalence. | ||||
| 5.2 Conversion to URIs | Even though it is possible to determine that two IRIs are equivalent, | |||
| IRI comparison is not sufficient to determine if two IRIs identify | ||||
| different resources. For example, an owner of two different domain | ||||
| names could decide to serve the same resource from both, resulting in | ||||
| two different IRIs. Therefore, comparison methods are designed to | ||||
| minimize false negatives while strictly avoiding false positives. | ||||
| For actual resolution, differences in percent-encoding (except for | In testing for equivalence, applications should not directly compare | |||
| the percent-encoding of reserved characters) MUST always result in | relative references; the references should be converted to their | |||
| the same resource. For example, http://example.org/~user, | respective target IRIs before comparison. When IRIs are being | |||
| http://example.org/%7euser and http://example.org/%7Euser must | compared for the purpose of selecting (or avoiding) a network action, | |||
| resolve to the same resource. | such as retrieval of a representation, fragment components (if any) | |||
| should be excluded from the comparison. | ||||
| If this kind of equivalence is to be tested, the percent-encoding of | Applications using IRIs as identity tokens with no relationship to a | |||
| both IRIs to be compared has to be aligned, for example by converting | protocol MUST use the Simple String Comparison (see Section 5.3.1). | |||
| both IRIs to URIs (see Section 3.1), eliminating escape differences | All other applications MUST select one of the comparison practices | |||
| in the resulting URIs, and making sure that the case of the | from the Comparison Ladder (see Section 5.3), or, after IRI-to-URI | |||
| hexadecimal characters in the percent-encoding is always the same | conversion, select one of the comparison practices from the URI | |||
| (preferably upper case). If the IRI is to be passed to another | comparison ladder [RFCYYYY], Section 6.2. | |||
| application, or used further in some other way, its original form | ||||
| MUST be preserved; the conversion described here should be performed | ||||
| only for the purpose of local comparison. | ||||
| Additional, similar equivalences are possible based on knowledge | 5.2 Preparation for Comparison | |||
| about the generic URI/IRI syntax, such as the fact that the scheme | ||||
| part is case-insensitive. | ||||
| 5.3 Normalization | Any kind of IRI comparison REQUIRES that all escapings or encodings | |||
| in the protocol or format that carries an IRI are resolved. This is | ||||
| usually done when parsing the protocol or format. Examples of such | ||||
| escapings or encodings are entities and numeric character references | ||||
| in [HTML4] and [XML1]. As an example, http://example.org/ros&eacute; | ||||
| (in HTML), http://example.org/ros&#233; (in HTML or XML), and | ||||
| http://example.org/ros&#xE9; (in HTML or XML) all get resolved into | ||||
| what is denoted in this document (see Section 1.4) as | ||||
| http://example.org/ros&#xE9; (the "&#xE9;" here standing for the | ||||
| actual e-acute character, to compensate for the fact that this | ||||
| document cannot contain non-ASCII characters). | ||||
| Similar considerations apply to encodings such as Transfer Codings in | ||||
| HTTP (see [RFC2616]) and Content Transfer Encodings in MIME [RFC2045], | ||||
| although in these cases, the encoding is not based on characters, but | ||||
| on octets, and additional care is required to make sure that | ||||
| characters, and not just arbitrary octets, are compared (see Section | ||||
| 5.3.1). | ||||
| 5.3 Comparison Ladder | ||||
| A variety of methods are used in practice to test IRI equivalence. | ||||
| These methods fall into a range, distinguished by the amount of | ||||
| processing required and the degree to which the probability of false | ||||
| negatives is reduced. As noted above, false negatives cannot be | ||||
| eliminated. In practice, their probability can be reduced, but this | ||||
| reduction requires more processing and is not cost-effective for all | ||||
| applications. | ||||
| If this range of comparison practices is considered as a ladder, the | ||||
| following discussion will climb the ladder, starting with those | ||||
| practices that are cheap but have a relatively higher chance of | ||||
| producing false negatives, and proceeding to those that have higher | ||||
| computational cost and lower risk of false negatives. | ||||
| 5.3.1 Simple String Comparison | ||||
| If two IRIs, considered as character strings, are identical, then it | ||||
| is safe to conclude that they are equivalent. This type of | ||||
| equivalence test has very low computational cost and is in wide use | ||||
| in a variety of applications, particularly in the domain of parsing | ||||
| and when a definitive answer to the question of IRI equivalence is | ||||
| needed that is independent of the scheme used and can be calculated | ||||
| quickly and without accessing a network. An example of such a case | ||||
| is XML Namespaces ([XMLNamespace]). | ||||
| Testing strings for equivalence requires some basic precautions. | ||||
| This procedure is often referred to as "bit-for-bit" or | ||||
| "byte-for-byte" comparison, which is potentially misleading. Testing | ||||
| of strings for equality is normally based on pairwise comparison of | ||||
| the characters that make up the strings, starting from the first and | ||||
| proceeding until both strings are exhausted and all characters found | ||||
| to be equal, a pair of characters compares unequal, or one of the | ||||
| strings is exhausted before the other. | ||||
| Such character comparisons require that each pair of characters be | ||||
| put in comparable encoding form. For example, should one IRI be | ||||
| stored in a byte array in UTF-8 encoding form, and the second be in a | ||||
| UTF-16 encoding form, bit-for-bit comparisons applied naively will | ||||
| produce errors. It is better to speak of equality on a | ||||
| character-for-character rather than byte-for-byte or bit-for-bit | ||||
| basis. In practical terms, character-by-character comparisons should | ||||
| be done codepoint-by-codepoint after conversion to a common character | ||||
| encoding form. When comparing character-by-character, the comparison | ||||
| function MUST NOT map IRIs to URIs, because such a mapping would | ||||
| create additional spurious equivalences. It follows that IRIs SHOULD | ||||
| NOT be modified when being transported if there is any chance that | ||||
| this IRI might be used as an identifier. | ||||
| False negatives are caused by the production and use of IRI aliases. | ||||
| Unnecessary aliases can be reduced, regardless of the comparison | ||||
| method, by consistently providing IRI references in an | ||||
| already-normalized form (i.e., a form identical to what would be | ||||
| produced after normalization is applied, as described below). | ||||
| Protocols and data formats often choose to limit some IRI comparisons | ||||
| to simple string comparison, based on the theory that people and | ||||
| implementations will, in their own best interest, be consistent in | ||||
| providing IRI references, or at least consistent enough to negate any | ||||
| efficiency that might be obtained from further normalization. | ||||
| 5.3.2 Syntax-based Normalization | ||||
| Implementations may use logic based on the definitions provided by | ||||
| this specification to reduce the probability of false negatives. | ||||
| Such processing is moderately higher in cost than | ||||
| character-for-character string comparison. For example, an | ||||
| application using this approach could reasonably consider the | ||||
| following two IRIs equivalent: | ||||
| example://a/b/c/%7Bfoo%7D/ros&#xE9; | ||||
| eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9 | ||||
| Web user agents, such as browsers, typically apply this type of IRI | ||||
| normalization when determining whether a cached response is | ||||
| available. Syntax-based normalization includes such techniques as | ||||
| case normalization, character normalization, percent-encoding | ||||
| normalization, and removal of dot-segments. | ||||
| 5.3.2.1 Case Normalization | ||||
| For all IRIs, the hexadecimal digits within a percent-encoding | ||||
| triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore | ||||
| should be normalized to use uppercase letters for the digits A-F. | ||||
| When an IRI uses components of the generic syntax, the component | ||||
| syntax equivalence rules always apply; namely, that the scheme and | ||||
| US-ASCII only host are case-insensitive and therefore should be | ||||
| normalized to lowercase. For example, the URI | ||||
| <HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>. | ||||
| Case equivalence for non-ASCII characters in IRI components that are | ||||
| IDNs is discussed in Section 5.3.3. The other generic syntax | ||||
| components are assumed to be case-sensitive unless specifically | ||||
| defined otherwise by the scheme. | ||||
| Creating schemes that allow case-insensitive syntax components | ||||
| containing non-US-ASCII characters should be avoided because such | ||||
| case normalization may be culturally dependent and is always a complex | ||||
| operation. The only exception concerns non-ASCII host names for | ||||
| which the character normalization includes a mapping step derived | ||||
| from case folding. | ||||
| 5.3.2.2 Character Normalization | ||||
| The Unicode Standard [UNIV4] defines various equivalences between | The Unicode Standard [UNIV4] defines various equivalences between | |||
| sequences of characters for various purposes. Unicode Standard Annex | sequences of characters for various purposes. Unicode Standard Annex | |||
| #15 [UTR15] defines various Normalization Forms for these | #15 [UTR15] defines various Normalization Forms for these | |||
| equivalences, in particular Normalization Form C (NFC, Canonical | equivalences, in particular Normalization Form C (NFC, Canonical | |||
| Decomposition, followed by Canonical Composition) and Normalization | Decomposition, followed by Canonical Composition) and Normalization | |||
| Form KC (NFKC, Compatibility Decomposition, followed by Canonical | Form KC (NFKC, Compatibility Decomposition, followed by Canonical | |||
| Composition). | Composition). | |||
| Equivalence of IRIs MUST rely on the assumption that IRIs are | Equivalence of IRIs MUST rely on the assumption that IRIs are | |||
| appropriately pre-normalized, rather than applying normalization when | appropriately pre-character-normalized, rather than applying | |||
| comparing two IRIs. The exceptions are conversion from a non-digital | character normalization when comparing two IRIs. The exceptions are | |||
| form, and conversion from a non-UCS-based character encoding to an | conversion from a non-digital form, and conversion from a | |||
| UCS-based character encoding. In these cases, NFC or a normalizing | non-UCS-based character encoding to an UCS-based character encoding. | |||
| transcoder using NFC MUST be used for interoperability. To avoid | In these cases, NFC or a normalizing transcoder using NFC MUST be | |||
| false negatives and problems with transcoding, IRIs SHOULD be created | used for interoperability. To avoid false negatives and problems | |||
| using NFC. Using NFKC may avoid even more problems, for example by | with transcoding, IRIs SHOULD be created using NFC. Using NFKC may | |||
| choosing half-width Latin letters instead of full-width, and | avoid even more problems, for example by choosing half-width Latin | |||
| full-width Katakana instead of half-width. | letters instead of full-width, and full-width Katakana instead of | |||
| half-width. | ||||
| As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML | As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML | |||
| Notation) is in NFC. On the other hand, | Notation) is in NFC. On the other hand, | |||
| http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The | http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The | |||
| former uses precombined e-acute characters, the latter uses 'e' | former uses precombined e-acute characters, the latter uses 'e' | |||
| characters followed by combining acute accents. Both usages are | characters followed by combining acute accents. Both usages are | |||
| defined to be canonically equivalent in [UNIV4]. | defined to be canonically equivalent in [UNIV4]. | |||
| Note: Because it is unknown how a particular field is being treated | Note: Because it is unknown how a particular sequence of characters | |||
| with respect to text normalization, it would be inappropriate to | is being treated with respect to character normalization, it would | |||
| allow third parties to normalize an IRI arbitrarily. This does | be inappropriate to allow third parties to normalize an IRI | |||
| not contradict the recommendation that when a resource is created, | arbitrarily. This does not contradict the recommendation that | |||
| its IRI should be as normalized as possible (i.e. NFC or even | when a resource is created, its IRI should be as | |||
| NFKC). This is similar to the upper-case/lower-case problems in | character-normalized as possible (i.e. NFC or even NFKC). This | |||
| URIs. Some parts of a URI are case-insensitive (domain name). | is similar to the upper-case/lower-case problems in | |||
| For others, it is unclear whether they are case-sensitive or | URIs. | |||
| Some parts of a URI are case-insensitive (domain name). For | ||||
| others, it is unclear whether they are case-sensitive or | ||||
| case-insensitive, or something in between (e.g. case-sensitive, | case-insensitive, or something in between (e.g. case-sensitive, | |||
| but if the wrong case is used, a multiple choice selection is | but if the wrong case is used, a multiple choice selection is | |||
| provided instead of a direct negative result). The best recipe is | provided instead of a direct negative result). The best recipe is | |||
| that the creator uses a reasonable capitalization, and when | that the creator uses a reasonable capitalization, and when | |||
| transferring the URI, that capitalization is never changed. | transferring the URI, that capitalization is never changed. | |||
| Various IRI schemes may allow the usage of International Domain Names | Various IRI schemes may allow the usage of Internationalized Domain | |||
| (IDN) [RFC3490]. When in use in IRIs, those names SHOULD be | Names (IDN) [RFC3490] either in the ireg-name part or elsewhere. | |||
| validated using the ToASCII operation defined in [RFC3490], with the | Character Normalization also applies to IDNs, as discussed in Section | |||
| flags "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing | 5.3.3. | |||
| an invalid IDN cannot successfully be resolved. For legibility | ||||
| purposes, IDN components of IRIs SHOULD NOT be converted into ASCII | ||||
| Compatible Encoding (ACE). | ||||
| 5.4 Preferred Forms | 5.3.2.3 Percent-Encoding Normalization | |||
| The following are the preferred forms for IRIs when created: | The percent-encoding mechanism (Section 2.1 of [RFCYYYY]) is a | |||
| frequent source of variance among otherwise identical IRIs. In | ||||
| addition to the case normalization issue noted above, some IRI | ||||
| producers percent-encode octets that do not require percent-encoding, | ||||
| resulting in IRIs that are equivalent to their nonencoded | ||||
| counterparts. Such IRIs should be normalized by decoding any | ||||
| percent-encoded octet sequence that corresponds to an unreserved | ||||
| character, as described in Section 2.3 of [RFCYYYY]. | ||||
| - Always provide the URI scheme in lowercase characters. | For actual resolution, differences in percent-encoding (except for | |||
| the percent-encoding of reserved characters) MUST always result in | ||||
| the same resource. For example, http://example.org/~user, | ||||
| http://example.org/%7euser and http://example.org/%7Euser must | ||||
| resolve to the same resource. | ||||
| - Only perform percent-encoding where it is essential. | If this kind of equivalence is to be tested, the percent-encoding of | |||
| both IRIs to be compared has to be aligned, for example by converting | ||||
| both IRIs to URIs (see Section 3.1), eliminating escape differences | ||||
| in the resulting URIs, and making sure that the case of the | ||||
| hexadecimal characters in the percent-encoding is always the same | ||||
| (preferably upper case). If the IRI is to be passed to another | ||||
| application, or used further in some other way, its original form | ||||
| MUST be preserved; the conversion described here should be performed | ||||
| only for the purpose of local comparison. | ||||
| - Always use uppercase A-through-F characters when percent-encoding. | 5.3.2.4 Path Segment Normalization | |||
| - For those schemes where ireg-name is a domain name, always provide | The complete path segments "." and ".." are intended only for use | |||
| the individual labels, in the form produced when applying nameprep | within relative references (Section 4.1 of [RFCYYYY]) and are removed | |||
| [RFC3491]. This in particular includes using lowercase characters | as part of the reference resolution process (Section 5.2 of | |||
| rather than uppercase characters where applicable. Also, always | [RFCYYYY]). However, some implementations may incorrectly assume | |||
| use US-ASCII '.' as a separator. | that reference resolution is not necessary when the reference is | |||
| already an IRI, and thus fail to remove dot-segments when they occur | ||||
| in non-relative paths. IRI normalizers should remove dot-segments by | ||||
| applying the remove_dot_segments algorithm to the path, as described | ||||
| in Section 5.2.4 of [RFCYYYY]. | ||||
| - Where possible, provide IRI components in NFKC or NFC. | 5.3.3 Scheme-based Normalization | |||
| - Prevent /./ and /../ from appearing in IRI paths. | The syntax and semantics of IRIs vary from scheme to scheme, as | |||
| described by the defining specification for each scheme. | ||||
| Implementations may use scheme-specific rules, at further processing | ||||
| cost, to reduce the probability of false negatives. For example, | ||||
| since the "http" scheme makes use of an authority component, has a | ||||
| default port of "80", and defines an empty path to be equivalent to | ||||
| "/", the following four IRIs are equivalent: | ||||
| - For schemes that define an empty path to be equivalent to a path | http://example.com | |||
| of "/", use "/". | http://example.com/ | |||
| http://example.com:/ | ||||
| http://example.com:80/ | ||||
| In general, an IRI that uses the generic syntax for authority with an | ||||
| empty path should be normalized to a path of "/"; likewise, an | ||||
| explicit ":port", where the port is empty or the default for the | ||||
| scheme, is equivalent to one where the port and its ":" delimiter are | ||||
| elided, and thus should be removed by scheme-based normalization. | ||||
| For example, the second IRI above is the normal form for the "http" | ||||
| scheme. | ||||
| Another case where normalization varies by scheme is in the handling | ||||
| of an empty authority component or empty host subcomponent. For many | ||||
| scheme specifications, an empty authority or host is considered an | ||||
| error; for others, it is considered equivalent to "localhost" or the | ||||
| end-user's host. When a scheme defines a default for authority and | ||||
| an IRI reference to that default is desired, the reference should be | ||||
| normalized to an empty authority for the sake of uniformity, brevity, | ||||
| and internationalization. If, however, either the userinfo or port | ||||
| subcomponent is non-empty, then the host should be given explicitly | ||||
| even if it matches the default. | ||||
| Normalization should not remove delimiters when their associated | ||||
| component is empty unless licensed to do so by the scheme | ||||
| specification. For example, the IRI "http://example.com/?" cannot be | ||||
| assumed to be equivalent to any of the examples above. Likewise, the | ||||
| presence or absence of delimiters within a userinfo subcomponent is | ||||
| usually significant to its interpretation. The fragment component is | ||||
| not subject to any scheme-based normalization; thus, two IRIs that | ||||
| differ only by the suffix "#" are considered different regardless of | ||||
| the scheme. | ||||
| Some IRI schemes may allow the usage of Internationalized Domain | ||||
| Names (IDN) [RFC3490] either in their ireg-name part or elsewhere. | ||||
| When in use in IRIs, those names SHOULD be validated using the | ||||
| ToASCII operation defined in [RFC3490], with the flags | ||||
| "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing an | ||||
| invalid IDN cannot successfully be resolved. Validated IDN | ||||
| components of IRIs SHOULD be character normalized using the Nameprep | ||||
| process [RFC3491]; however, for legibility purposes, they SHOULD NOT | ||||
| be converted into ASCII Compatible Encoding (ACE). | ||||
| Scheme-based normalization may also consider IDN components and their | ||||
| conversions to punycode as equivalent. As an example, | ||||
| http://résumé.example.org may be considered equivalent to | ||||
| http://xn--rsum-bpad.example.org | ||||
| Other scheme-specific normalizations are possible. | ||||
| 5.3.4 Protocol-based Normalization | ||||
| Web spiders, for which substantial effort to reduce the incidence of | ||||
| false negatives is often cost-effective, are observed to implement | ||||
| even more aggressive techniques in IRI comparison. For example, if | ||||
| they observe that an IRI such as | ||||
| http://example.com/data | ||||
| redirects to an IRI differing only in the trailing slash | ||||
| http://example.com/data/ | ||||
| they will likely regard the two as equivalent in the future. This | ||||
| kind of technique is only appropriate when equivalence is clearly | ||||
| indicated by both the result of accessing the resources and the | ||||
| common conventions of their scheme's dereference algorithm (in this | ||||
| case, use of redirection by HTTP origin servers to avoid problems | ||||
| with relative references). | ||||
| 6. Use of IRIs | 6. Use of IRIs | |||
| 6.1 Limitations on UCS Characters Allowed in IRIs | 6.1 Limitations on UCS Characters Allowed in IRIs | |||
| This section discusses limitations on characters and character | This section discusses limitations on characters and character | |||
| sequences usable for IRIs beyond those given in Section 2.2 and | sequences usable for IRIs beyond those given in Section 2.2 and | |||
| Section 4.1. The considerations in this section are relevant when | Section 4.1. The considerations in this section are relevant when | |||
| creating IRIs and when converting from URIs to IRIs. | creating IRIs and when converting from URIs to IRIs. | |||
| skipping to change at page 35, line 15 | skipping to change at page 39, line 31 | |||
| The discussion on the issue addressed here has started a long time | The discussion on the issue addressed here has started a long time | |||
| ago. There was a thread in the HTML working group in August 1995 | ago. There was a thread in the HTML working group in August 1995 | |||
| (under the topic of "Globalizing URIs") and in the www-international | (under the topic of "Globalizing URIs") and in the www-international | |||
| mailing list in July 1996 (under the topic of "Internationalization | mailing list in July 1996 (under the topic of "Internationalization | |||
| and URLs"), and ad-hoc meetings at the Unicode conferences in | and URLs"), and ad-hoc meetings at the Unicode conferences in | |||
| September 1995 and September 1997. | September 1995 and September 1997. | |||
| Many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding, | Many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding, | |||
| Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim | Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim | |||
| Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie | Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie | |||
| Daigle, Ted Hardie, Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex | Daigle, Ted Hardie, Bill Fenner, Margaret Wasserman, Russ Housley, | |||
| Texin, Graham Klyne, Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam | Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex Texin, Graham Klyne, | |||
| Costello, Dan Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy | Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam Costello, Dan | |||
| Badami, Jonathan Rosenne, Asmus Freytag, Simon Josefsson, Carlos | Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy Badami, Jonathan | |||
| Viegas Damasio, Chris Haynes, Walter Underwood, and many others for | Rosenne, Asmus Freytag, Simon Josefsson, Carlos Viegas Damasio, Chris | |||
| help with understanding the issues and possible solutions, and | Haynes, Walter Underwood, and many others for help with understanding | |||
| getting the details right. | the issues and possible solutions, and getting the details right. | |||
| This document is a product of the Internationalization Working Group | This document is a product of the Internationalization Working Group | |||
| (I18N WG) of the World Wide Web Consortium (W3C). Thanks to the | (I18N WG) of the World Wide Web Consortium (W3C). Thanks to the | |||
| members of the W3C I18N Working Group and Interest Group for their | members of the W3C I18N Working Group and Interest Group for their | |||
| contributions and their work on [CharMod]. Thanks also go to the | contributions and their work on [CharMod]. Thanks also go to the | |||
| members of many other W3C Working Groups for adopting IRIs, and to | members of many other W3C Working Groups for adopting IRIs, and to | |||
| the members of the Montreal IAB Workshop on Internationalization and | the members of the Montreal IAB Workshop on Internationalization and | |||
| Localization for their review. | Localization for their review. | |||
| 11. References | 11. References | |||
| skipping to change at page 36, line 17 | skipping to change at page 40, line 34 | |||
| Profile for Internationalized Domain Names (IDN)", RFC | Profile for Internationalized Domain Names (IDN)", RFC | |||
| 3491, March 2003. | 3491, March 2003. | |||
| [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | |||
| 10646", STD 63, RFC 3629, November 2003. | 10646", STD 63, RFC 3629, November 2003. | |||
| [RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | [RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | |||
| Resource Identifier (URI): Generic Syntax (Note to the RFC | Resource Identifier (URI): Generic Syntax (Note to the RFC | |||
| Editor: Please update this reference with the RFC | Editor: Please update this reference with the RFC | |||
| resulting from draft-fielding-uri-rfc2396bis-xx.txt, and | resulting from draft-fielding-uri-rfc2396bis-xx.txt, and | |||
| remove this Note)", draft-fielding-uri-rfc2396bis-07.txt | remove this Note)", draft-fielding-uri-rfc2396bis-07 (work | |||
| (work in progress), April 2004. | in progress), April 2004. | |||
| [UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | [UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | |||
| Annex #9, March 2004, | Annex #9, March 2004, | |||
| <http://www.unicode.org/reports/tr9/tr9-13.html>. | <http://www.unicode.org/reports/tr9/tr9-13.html>. | |||
| [UNIV4] The Unicode Consortium, "The Unicode Standard, Version | [UNIV4] The Unicode Consortium, "The Unicode Standard, Version | |||
| 4.0.1, defined by: The Unicode Standard, Version 4.0 | 4.0.1, defined by: The Unicode Standard, Version 4.0 | |||
| (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), | (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), | |||
| as amended by Unicode 4.0.1 | as amended by Unicode 4.0.1 | |||
| (http://www.unicode.org/versions/Unicode4.0.1/)", March | (http://www.unicode.org/versions/Unicode4.0.1/)", March | |||
| skipping to change at page 36, line 46 | skipping to change at page 41, line 15 | |||
| 11.2 Non-normative References | 11.2 Non-normative References | |||
| [BidiEx] "Examples of bidirectional IRIs", | [BidiEx] "Examples of bidirectional IRIs", | |||
| <http://www.w3.org/International/iri-edit/BidiExamples>. | <http://www.w3.org/International/iri-edit/BidiExamples>. | |||
| [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T. | [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T. | |||
| Texin, "Character Model for the World Wide Web", World | Texin, "Character Model for the World Wide Web", World | |||
| Wide Web Consortium Working Draft, February 2004, | Wide Web Consortium Working Draft, February 2004, | |||
| <http://www.w3.org/TR/charmod>. | <http://www.w3.org/TR/charmod>. | |||
| [Duerst01] | ||||
| Duerst, M., "Internationalized Resource Identifiers: From | ||||
| Specification to Testing", Proc. 19th International | ||||
| Unicode Conference, San Jose, September 2001, | ||||
| <http://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html>. | ||||
| [Duerst97] | [Duerst97] | |||
| Duerst, M., "The Properties and Promises of UTF-8", Proc. | Duerst, M., "The Properties and Promises of UTF-8", Proc. | |||
| 11th International Unicode Conference, San Jose, | 11th International Unicode Conference, San Jose, | |||
| September 1997, | September 1997, | |||
| <http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/ | <http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/ | |||
| IUC11-UTF-8.pdf>. | IUC11-UTF-8.pdf>. | |||
| [Gettys] Gettys, J., "URI Model Consequences", | [Gettys] Gettys, J., "URI Model Consequences", | |||
| <http://www.w3.org/DesignIssues/ModelConsequences>. | <http://www.w3.org/DesignIssues/ModelConsequences>. | |||
| [HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01 | [HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01 | |||
| Specification", World Wide Web Consortium Recommendation, | Specification", World Wide Web Consortium Recommendation, | |||
| December 1999, | December 1999, | |||
| <http://www.w3.org/TR/REC-html40/appendix/ | <http://www.w3.org/TR/REC-html40/appendix/ | |||
| notes.html#h-B.2>. | notes.html#h-B.2>. | |||
| [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | ||||
| Extensions (MIME) Part One: Format of Internet Message | ||||
| Bodies", RFC 2045, November 1996. | ||||
| [RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H., | [RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H., | |||
| Atkinson, R., Crispin, M. and P. Svanberg, "The Report of | Atkinson, R., Crispin, M. and P. Svanberg, "The Report of | |||
| the IAB Character Set Workshop held 29 February - 1 March, | the IAB Character Set Workshop held 29 February - 1 March, | |||
| 1996", RFC 2130, April 1997. | 1996", RFC 2130, April 1997. | |||
| [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | |||
| [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. | [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997. | |||
| [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | |||
| skipping to change at page 38, line 11 | skipping to change at page 42, line 25 | |||
| Protocol", RFC 2640, July 1999. | Protocol", RFC 2640, July 1999. | |||
| [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke, | [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. Petke, | |||
| "Guidelines for new URL Schemes", RFC 2718, November 1999. | "Guidelines for new URL Schemes", RFC 2718, November 1999. | |||
| [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | |||
| Markup Languages", Unicode Technical Report #20, World | Markup Languages", Unicode Technical Report #20, World | |||
| Wide Web Consortium Note, February 2002, | Wide Web Consortium Note, February 2002, | |||
| <http://www.w3.org/TR/unicode-xml/>. | <http://www.w3.org/TR/unicode-xml/>. | |||
| [W3CIRI] Duerst, M., "Internationalization - URIs and other | ||||
| identifiers", September 2002, | ||||
| <http://www.w3.org/International/O-URL-and-ident.html>. | ||||
| [XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking | [XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking | |||
| Language (XLink) Version 1.0", World Wide Web Consortium | Language (XLink) Version 1.0", World Wide Web Consortium | |||
| Recommendation, June 2001, | Recommendation, June 2001, | |||
| <http://www.w3.org/TR/xlink/#link-locators>. | <http://www.w3.org/TR/xlink/#link-locators>. | |||
| [XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and | [XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and | |||
| F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third | F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third | |||
| Edition)", World Wide Web Consortium Recommendation, | Edition)", World Wide Web Consortium Recommendation, | |||
| February 2004, | February 2004, | |||
| <http://www.w3.org/TR/REC-xml#sec-external-ent>. | <http://www.w3.org/TR/REC-xml#sec-external-ent>. | |||
| End of changes. | ||||
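
The comparison steps described in Sections 5.3.2 and 5.3.3 above can be sketched in Python roughly as follows. This is an illustration only and is not part of either draft. The names normalize_for_comparison, to_uri_chars, normalize_percent, and DEFAULT_PORTS are hypothetical. The IRI-to-URI step is simplified: it percent-encodes non-ASCII characters as UTF-8 and does not apply ToASCII or Nameprep to host names. The only default port assumed is 80 for "http", as named in the text. In line with Section 5.3.2.2, no character normalization (NFC or NFKC) is applied at comparison time.

```python
import re

# Generic-syntax splitter taken from Appendix B of the URI specification.
_PARTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$")
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")
_UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
DEFAULT_PORTS = {"http": "80"}  # assumption: only the default named in the text


def to_uri_chars(iri: str) -> str:
    """Simplified IRI-to-URI step: percent-encode non-ASCII characters as UTF-8."""
    return "".join(
        c if ord(c) < 128 else "".join(f"%{b:02X}" for b in c.encode("utf-8"))
        for c in iri)


def normalize_percent(text: str) -> str:
    """Decode percent-encoded unreserved characters; uppercase remaining hex digits."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in _UNRESERVED else "%" + match.group(1).upper()
    return _PCT.sub(fix, text)


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments, following the remove_dot_segments steps."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            cut = path.find("/", 1)
            cut = len(path) if cut == -1 else cut
            out.append(path[:cut])
            path = path[cut:]
    return "".join(out)


def normalize_for_comparison(iri: str) -> str:
    """Return a local comparison key; the original IRI must be kept for any other use."""
    m = _PARTS.match(to_uri_chars(iri))
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    result = ""
    if scheme is not None:
        scheme = scheme.lower()                      # case normalization
        result += scheme + ":"
    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        host, port = hostport, ""
        if ":" in hostport and not hostport.endswith("]"):
            host, _, port = hostport.rpartition(":")
        host = normalize_percent(host.lower())       # host is case-insensitive
        if port and port != DEFAULT_PORTS.get(scheme):
            host += ":" + port                       # keep only non-default ports
        result += "//" + normalize_percent(userinfo) + at + host
    path = normalize_percent(path)
    if scheme is not None:
        path = remove_dot_segments(path)             # not for relative references
    if authority is not None and path == "":
        path = "/"                                   # empty path equals "/"
    result += path
    if query is not None:                            # keep "?" even when empty
        result += "?" + normalize_percent(query)
    if fragment is not None:                         # keep "#" even when empty
        result += "#" + normalize_percent(fragment)
    return result


if __name__ == "__main__":
    for example in ("http://example.com", "http://example.com:80/",
                    "eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9"):
        print(example, "->", normalize_for_comparison(example))
```

The result is intended only as a key for local comparison. As Section 5.3.2.3 requires, the original form of an IRI should be preserved whenever it is passed on or used further. Empty "?" and "#" delimiters are deliberately retained, since Section 5.3.3 states that they cannot be assumed to be insignificant.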