| draft-duerst-iri-09.txt | draft-duerst-iri-10.txt | |||
|---|---|---|---|---|
| Network Working Group M. Duerst | Network Working Group M. Duerst | |||
| Internet-Draft W3C | Internet-Draft W3C | |||
| Expires: January 17, 2005 M. Suignard | Expires: March 28, 2005 M. Suignard | |||
| Microsoft Corporation | Microsoft Corporation | |||
| July 19, 2004 | September 27, 2004 | |||
| Internationalized Resource Identifiers (IRIs) | Internationalized Resource Identifiers (IRIs) | |||
| draft-duerst-iri-09 | draft-duerst-iri-10 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, I certify that any applicable | This document is an Internet-Draft and is subject to all provisions | |||
| patent or other IPR claims of which I am aware have been disclosed, | of section 3 of RFC 3667. By submitting this Internet-Draft, each | |||
| and any of which I become aware will be disclosed, in accordance with | author represents that any applicable patent or other IPR claims of | |||
| | which he or she is aware have been or will be disclosed, and any of | |||
| | which he or she becomes aware will be disclosed, in accordance with | |||
| RFC 3668. | RFC 3668. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as | other groups may also distribute working documents as | |||
| Internet-Drafts. | Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on January 17, 2005. | This Internet-Draft will expire on March 28, 2005. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2004). All Rights Reserved. | Copyright (C) The Internet Society (2004). | |||
| Abstract | Abstract | |||
| This document defines a new protocol element, the Internationalized | This document defines a new protocol element, the Internationalized | |||
| Resource Identifier (IRI), as a complement to the Uniform Resource | Resource Identifier (IRI), as a complement to the Uniform Resource | |||
| Identifier (URI). An IRI is a sequence of characters from the | Identifier (URI). An IRI is a sequence of characters from the | |||
| Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to | Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to | |||
| URIs is defined, which means that IRIs can be used instead of URIs | URIs is defined, which means that IRIs can be used instead of URIs | |||
| where appropriate to identify resources. | where appropriate to identify resources. | |||
| skipping to change at page 2, line 39 | skipping to change at page 2, line 41 | |||
| 5. IRI Equivalence and Comparison . . . . . . . . . . . . . . . . 22 | 5. IRI Equivalence and Comparison . . . . . . . . . . . . . . . . 22 | |||
| 5.1 Simple String Comparison . . . . . . . . . . . . . . . . . 22 | 5.1 Simple String Comparison . . . . . . . . . . . . . . . . . 22 | |||
| 5.2 Conversion to URIs . . . . . . . . . . . . . . . . . . . . 23 | 5.2 Conversion to URIs . . . . . . . . . . . . . . . . . . . . 23 | |||
| 5.3 Normalization . . . . . . . . . . . . . . . . . . . . . . 23 | 5.3 Normalization . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 5.4 Preferred Forms . . . . . . . . . . . . . . . . . . . . . 24 | 5.4 Preferred Forms . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 25 | 6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . 25 | |||
| 6.2 Software Interfaces and Protocols . . . . . . . . . . . . 25 | 6.2 Software Interfaces and Protocols . . . . . . . . . . . . 25 | |||
| 6.3 Format of URIs and IRIs in Documents and Protocols . . . . 26 | 6.3 Format of URIs and IRIs in Documents and Protocols . . . . 26 | |||
| 6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 26 | 6.4 Use of UTF-8 for Encoding Original Characters . . . . . . 26 | |||
| 6.5 Relative IRI References . . . . . . . . . . . . . . . . . 27 | 6.5 Relative IRI References . . . . . . . . . . . . . . . . . 28 | |||
| 7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 27 | 7. URI/IRI Processing Guidelines (informative) . . . . . . . . . 28 | |||
| 7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 28 | 7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . 28 | |||
| 7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 28 | 7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 29 | 7.3 URI/IRI Transfer Between Applications . . . . . . . . . . 29 | |||
| 7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 29 | 7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . 30 | |||
| 7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 30 | 7.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . 30 | |||
| 7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 31 | 7.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . 31 | |||
| 7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 31 | 7.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . 31 | |||
| 7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 32 | 7.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . 32 | |||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 33 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 33 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 34 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
| 11.1 Normative References . . . . . . . . . . . . . . . . . . . . 35 | 11.1 Normative References . . . . . . . . . . . . . . . . . . . . 35 | |||
| 11.2 Non-normative References . . . . . . . . . . . . . . . . . . 36 | 11.2 Non-normative References . . . . . . . . . . . . . . . . . . 36 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 38 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 39 | A. Design Alternatives . . . . . . . . . . . . . . . . . . . . . 39 | |||
| A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 39 | A.1 New Scheme(s) . . . . . . . . . . . . . . . . . . . . . . 39 | |||
| A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 39 | A.2 Other Character Encodings than UTF-8 . . . . . . . . . . . 40 | |||
| A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 39 | A.3 New Encoding Convention . . . . . . . . . . . . . . . . . 40 | |||
| A.4 Indicating Character Encodings in the URI/IRI . . . . . . 40 | A.4 Indicating Character Encodings in the URI/IRI . . . . . . 40 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 41 | Intellectual Property and Copyright Statements . . . . . . . . 41 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1 Overview and Motivation | 1.1 Overview and Motivation | |||
| A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a | A Uniform Resource Identifier (URI) is defined in [RFCYYYY] as a | |||
| sequence of characters chosen from a limited subset of the repertoire | sequence of characters chosen from a limited subset of the repertoire | |||
| of US-ASCII [ASCII] characters. | of US-ASCII [ASCII] characters. | |||
| skipping to change at page 5, line 7 | skipping to change at page 5, line 7 | |||
| For discussion of this document, please use the public-iri@w3.org | For discussion of this document, please use the public-iri@w3.org | |||
| mailing list (publicly archived at | mailing list (publicly archived at | |||
| http://lists.w3.org/Archives/Public/public-iri/). An issues list for | http://lists.w3.org/Archives/Public/public-iri/). An issues list for | |||
| this document is maintained at | this document is maintained at | |||
| http://www.w3.org/International/iri-edit#issues. For more | http://www.w3.org/International/iri-edit#issues. For more | |||
| information on the topic of this document, please also see [W3CIRI] | information on the topic of this document, please also see [W3CIRI] | |||
| and [Duerst01]. | and [Duerst01]. | |||
| 1.2 Applicability | 1.2 Applicability | |||
| IRIs are designed to be compatible with recent recommendations for | IRIs are designed to be compatible with recommendations for new URI | |||
| new URI schemes [RFC2718]. The compatibility is provided by | schemes [RFC2718]. The compatibility is provided by specifying a | |||
| specifying a well defined and deterministic mapping from the IRI | well defined and deterministic mapping from the IRI character | |||
| character sequence to the functionally equivalent URI character | sequence to the functionally equivalent URI character sequence. | |||
| sequence. Practical use of IRIs (or IRI references) in place of URIs | Practical use of IRIs (or IRI references) in place of URIs (or URI | |||
| (or URI references) depends on the following conditions being met: | references) depends on the following conditions being met: | |||
| a) The protocol or format element where IRIs are used should be | a) The protocol or format element where IRIs are used should be | |||
| explicitly designated to be able to carry IRIs. That is, the | explicitly designated to be able to carry IRIs. That is, the | |||
| intent is not to introduce IRIs into contexts that are not defined | intent is not to introduce IRIs into contexts that are not defined | |||
| to accept them. For example, XML schema [XMLSchema] has an | to accept them. For example, XML schema [XMLSchema] has an | |||
| explicit type "anyURI" that includes IRIs and IRI references. | explicit type "anyURI" that includes IRIs and IRI references. | |||
| Therefore, IRIs and IRI references can be in attributes and | Therefore, IRIs and IRI references can be in attributes and | |||
| elements of type "anyURI". On the other hand, in the HTTP | elements of type "anyURI". On the other hand, in the HTTP | |||
| protocol [RFC2616], the Request URI is defined as an URI, which | protocol [RFC2616], the Request URI is defined as an URI, which | |||
| means that direct use of IRIs is not allowed in HTTP requests. | means that direct use of IRIs is not allowed in HTTP requests. | |||
| skipping to change at page 6, line 22 | skipping to change at page 6, line 22 | |||
| charset: The name of a parameter or attribute used to identify a | charset: The name of a parameter or attribute used to identify a | |||
| character encoding. | character encoding. | |||
| UCS: Universal Character Set; the coded character set defined by ISO/ | UCS: Universal Character Set; the coded character set defined by ISO/ | |||
| IEC 10646 [ISO10646] and the Unicode Standard [UNIV4]. | IEC 10646 [ISO10646] and the Unicode Standard [UNIV4]. | |||
| IRI reference: The term "IRI reference" denotes the common usage of | IRI reference: The term "IRI reference" denotes the common usage of | |||
| an Internationalized Resource Identifier. An IRI reference may be | an Internationalized Resource Identifier. An IRI reference may be | |||
| absolute or relative. However, the "IRI" that results from such a | absolute or relative. However, the "IRI" that results from such a | |||
| reference only includes absolute IRIs; any relative IRIs are | reference only includes absolute IRIs; any relative IRI references | |||
| resolved to their absolute form. Note that in [RFC2396], URIs did | are resolved to their absolute form. Note that in [RFC2396], URIs | |||
| not include fragment identifiers, but in [RFCYYYY], fragment | did not include fragment identifiers, but in [RFCYYYY], fragment | |||
| identifiers are part of URIs. | identifiers are part of URIs. | |||
| running text: Human text (paragraphs, sentences, phrases) with syntax | running text: Human text (paragraphs, sentences, phrases) with syntax | |||
| according to orthographic conventions of a natural language, as | according to orthographic conventions of a natural language, as | |||
| opposed to syntax defined for ease of processing by machines | opposed to syntax defined for ease of processing by machines | |||
| (markup, programming languages,...). | (markup, programming languages,...). | |||
| protocol element: Any portion of a message which affects processing | protocol element: Any portion of a message which affects processing | |||
| of that message by the protocol in question. | of that message by the protocol in question. | |||
| presentation element: Presentation form corresponding to a protocol | presentation element: Presentation form corresponding to a protocol | |||
| element, for example using a wider range of characters. | element, for example using a wider range of characters. | |||
| create (an URI or IRI): With respect to URIs and IRIs, the word | create (an URI or IRI): With respect to URIs and IRIs, the word | |||
| 'create' is used for the initial creation. This may be the | 'create' is used for the initial creation. This may be the | |||
| initial creation of a resource with a certain name, or the initial | initial creation of a resource with a certain identifier, or the | |||
| exposition of a resource under a particular name. | initial exposition of a resource under a particular identifier. | |||
| generate (an URI or IRI): With respect to URIs and IRIs, the word | generate (an URI or IRI): With respect to URIs and IRIs, the word | |||
| 'generate' is used when the IRI is generated by derivation from | 'generate' is used when the IRI is generated by derivation from | |||
| other information. | other information. | |||
| 1.4 Notation | 1.4 Notation | |||
| RFCs and Internet Drafts currently do not allow any characters | RFCs and Internet Drafts currently do not allow any characters | |||
| outside the US-ASCII repertoire. Therefore, this document uses | outside the US-ASCII repertoire. Therefore, this document uses | |||
| various special notations to denote such characters in examples. | various special notations to denote such characters in examples. | |||
| skipping to change at page 8, line 8 | skipping to change at page 8, line 8 | |||
| 2.1 Summary of IRI Syntax | 2.1 Summary of IRI Syntax | |||
| IRIs are defined similarly to URIs in [RFCYYYY], but the class of | IRIs are defined similarly to URIs in [RFCYYYY], but the class of | |||
| unreserved characters is extended by adding the characters of the UCS | unreserved characters is extended by adding the characters of the UCS | |||
| (Universal Character Set, [ISO10646]) beyond U+007F, subject to the | (Universal Character Set, [ISO10646]) beyond U+007F, subject to the | |||
| limitations given in the syntax rules below and in Section 6.1. | limitations given in the syntax rules below and in Section 6.1. | |||
| Otherwise, the syntax and use of components and reserved characters | Otherwise, the syntax and use of components and reserved characters | |||
| is the same as that in [RFCYYYY]. All the operations defined in | is the same as that in [RFCYYYY]. All the operations defined in | |||
| [RFCYYYY], such as the resolution of relative URIs, can be applied to | [RFCYYYY], such as the resolution of relative references, can be | |||
| IRIs by IRI-processing software in exactly the same way as this is | applied to IRIs by IRI-processing software in exactly the same way as | |||
| done to URIs by URI-processing software. | this is done to URIs by URI-processing software. | |||
| Characters outside the US-ASCII repertoire are not reserved and | Characters outside the US-ASCII repertoire are not reserved and | |||
| therefore MUST NOT be used for syntactical purposes such as to | therefore MUST NOT be used for syntactical purposes such as to | |||
| delimit components in newly defined schemes. As an example, it is | delimit components in newly defined schemes. As an example, it is | |||
| not allowed to use U+00A2, CENT SIGN, as a delimiter in IRIs, because | not allowed to use U+00A2, CENT SIGN, as a delimiter in IRIs, because | |||
| it is in the 'iunreserved' category, in the same way as it is not | it is in the 'iunreserved' category, in the same way as it is not | |||
| possible to use '-' as a delimiter, because it is in the 'unreserved' | possible to use '-' as a delimiter, because it is in the 'unreserved' | |||
| category in URIs. | category in URIs. | |||
| 2.2 ABNF for IRI References and IRIs | 2.2 ABNF for IRI References and IRIs | |||
| skipping to change at page 8, line 48 | skipping to change at page 8, line 48 | |||
| of the non-terminals have been changed as follows: If the | of the non-terminals have been changed as follows: If the | |||
| non-terminal contains 'URI', this has been changed to 'IRI'. | non-terminal contains 'URI', this has been changed to 'IRI'. | |||
| Otherwise, an 'i' has been prefixed. | Otherwise, an 'i' has been prefixed. | |||
| The following rules are different from [RFCYYYY]: | The following rules are different from [RFCYYYY]: | |||
| IRI = scheme ":" ihier-part [ "?" iquery ] | IRI = scheme ":" ihier-part [ "?" iquery ] | |||
| [ "#" ifragment ] | [ "#" ifragment ] | |||
| ihier-part = "//" iauthority ipath-abempty | ihier-part = "//" iauthority ipath-abempty | |||
| / ipath-abs | / ipath-absolute | |||
| / ipath-rootless | / ipath-rootless | |||
| / ipath-empty | / ipath-empty | |||
| IRI-reference = IRI / relative-IRI | IRI-reference = IRI / irelative-ref | |||
| absolute-IRI = scheme ":" ihier-part [ "?" iquery ] | absolute-IRI = scheme ":" ihier-part [ "?" iquery ] | |||
| relative-IRI = irelative-part [ "?" iquery ] [ "#" ifragment ] | irelative-ref = irelative-part [ "?" iquery ] [ "#" ifragment ] | |||
| irelative-part = "//" iauthority ipath-abempty | irelative-part = "//" iauthority ipath-abempty | |||
| / ipath-abs | / ipath-absolute | |||
| / ipath-noscheme | / ipath-noscheme | |||
| / ipath-empty | / ipath-empty | |||
| iauthority = [ iuserinfo "@" ] ihost [ ":" port ] | iauthority = [ iuserinfo "@" ] ihost [ ":" port ] | |||
| iuserinfo = *( iunreserved / pct-encoded / sub-delims / ":" ) | iuserinfo = *( iunreserved / pct-encoded / sub-delims / ":" ) | |||
| ihost = IP-literal / IPv4address / ireg-name | ihost = IP-literal / IPv4address / ireg-name | |||
| ireg-name = *( iunreserved / pct-encoded / sub-delims ) | ireg-name = *( iunreserved / pct-encoded / sub-delims ) | |||
| ipath = ipath-abempty ; begins with "/" or is empty | ipath = ipath-abempty ; begins with "/" or is empty | |||
| / ipath-abs ; begins with "/" but not "//" | / ipath-absolute ; begins with "/" but not "//" | |||
| / ipath-noscheme ; begins with a non-colon segment | / ipath-noscheme ; begins with a non-colon segment | |||
| / ipath-rootless ; begins with a segment | / ipath-rootless ; begins with a segment | |||
| / ipath-empty ; zero characters | / ipath-empty ; zero characters | |||
| ipath-abempty = *( "/" isegment ) | ipath-abempty = *( "/" isegment ) | |||
| ipath-abs = "/" [ isegment-nz *( "/" isegment ) ] | ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ] | |||
| ipath-noscheme = isegment-nzc *( "/" isegment ) | ipath-noscheme = isegment-nz-nc *( "/" isegment ) | |||
| ipath-rootless = isegment-nz *( "/" isegment ) | ipath-rootless = isegment-nz *( "/" isegment ) | |||
| ipath-empty = 0<ipchar> | ipath-empty = 0<ipchar> | |||
| isegment = *ipchar | isegment = *ipchar | |||
| isegment-nz = 1*ipchar | isegment-nz = 1*ipchar | |||
| isegment-nzc = 1*( iunreserved / pct-encoded / sub-delims | isegment-nz-nc = 1*( iunreserved / pct-encoded / sub-delims | |||
| / "@" ) | / "@" ) | |||
| ; non-zero-length segment without any colon ":" | ||||
| ipchar = iunreserved / pct-encoded / sub-delims / ":" | ipchar = iunreserved / pct-encoded / sub-delims / ":" | |||
| / "@" | / "@" | |||
| iquery = *( ipchar / iprivate / "/" / "?" ) | iquery = *( ipchar / iprivate / "/" / "?" ) | |||
| ifragment = *( ipchar / "/" / "?" ) | ifragment = *( ipchar / "/" / "?" ) | |||
| iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar | iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar | |||
| skipping to change at page 12, line 11 | skipping to change at page 12, line 11 | |||
| of any character encoding: Represent the IRI as a sequence of | of any character encoding: Represent the IRI as a sequence of | |||
| characters from the UCS normalized according to Normalization | characters from the UCS normalized according to Normalization | |||
| Form C (NFC, [UTR15]). | Form C (NFC, [UTR15]). | |||
| Variant B) If the IRI is in some digital representation (e.g. an | Variant B) If the IRI is in some digital representation (e.g. an | |||
| octet stream) in some known non-Unicode character encoding: | octet stream) in some known non-Unicode character encoding: | |||
| Convert the IRI to a sequence of characters from the UCS | Convert the IRI to a sequence of characters from the UCS | |||
| normalized according to NFC. | normalized according to NFC. | |||
| Variant C) If the IRI is in an Unicode-based character encoding | Variant C) If the IRI is in an Unicode-based character encoding | |||
| (for example UTF-8 or UTF-16): Do not normalize. Apply Step 2 | (for example UTF-8 or UTF-16): Do not normalize (see Section | |||
| directly to the encoded Unicode character sequence. | 5.3 for details). Apply Step 2 directly to the encoded Unicode | |||
| | character sequence. | |||
| Step 2) For each character in 'ucschar' or 'iprivate', apply Steps | Step 2) For each character in 'ucschar' or 'iprivate', apply Steps | |||
| 2.1 through 2.3 below. | 2.1 through 2.3 below. | |||
| 2.1) Convert the character to a sequence of one or more octets | 2.1) Convert the character to a sequence of one or more octets | |||
| using UTF-8 [RFC3629]. | using UTF-8 [RFC3629]. | |||
| 2.2) Convert each octet to %HH, where HH is the hexadecimal | 2.2) Convert each octet to %HH, where HH is the hexadecimal | |||
| notation of the octet value. Note that this is identical to | notation of the octet value. Note that this is identical to | |||
| the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To | the percent-encoding mechanism in Section 2.1 of [RFCYYYY]. To | |||
| reduce variability, the hexadecimal notation SHOULD use upper | reduce variability, the hexadecimal notation SHOULD use upper | |||
| case letters. | case letters. | |||
| 2.3) Replace the original character by the resulting character | 2.3) Replace the original character with the resulting character | |||
| sequence (i.e. a sequence of %HH triplets). | sequence (i.e., a sequence of %HH triplets). | |||
| The above mapping from IRIs to URIs produces URIs fully conforming to | The above mapping from IRIs to URIs produces URIs fully conforming to | |||
| [RFCYYYY]. The mapping is also an identity transformation for URIs | [RFCYYYY]. The mapping is also an identity transformation for URIs | |||
| and is idempotent -- applying the mapping a second time will not | and is idempotent -- applying the mapping a second time will not | |||
| change anything. Every URI is by definition an IRI. | change anything. Every URI is by definition an IRI. | |||
| Infrastructure accepting IRIs MAY convert the ireg-name component of | Infrastructure accepting IRIs MAY convert the ireg-name component of | |||
| an IRI as follows (before Step 2 above) for schemes that are known to | an IRI as follows (before Step 2 above) for schemes that are known to | |||
| use domain names in ireg-name, but where the scheme definition does | use domain names in ireg-name, but where the scheme definition does | |||
| not allow percent-encoding for ireg-name: Replace the ireg-name part | not allow percent-encoding for ireg-name: Replace the ireg-name part | |||
| skipping to change at page 13, line 7 | skipping to change at page 13, line 8 | |||
| the IRI | the IRI | |||
| http://résumé.example.org may be converted to | http://résumé.example.org may be converted to | |||
| http://xn--rsum-bpad.example.org instead of | http://xn--rsum-bpad.example.org instead of | |||
| http://r%C3%A9sum%C3%A9.example.org. | http://r%C3%A9sum%C3%A9.example.org. | |||
| An IRI with a scheme that is known to use domain names in ireg-name, | An IRI with a scheme that is known to use domain names in ireg-name, | |||
| but where the scheme definition does not allow percent-encoding for | but where the scheme definition does not allow percent-encoding for | |||
| ireg-name, meets scheme-specific restrictions if either the | ireg-name, meets scheme-specific restrictions if either the | |||
| straightforward conversion or the conversion using the ToASCII | straightforward conversion or the conversion using the ToASCII | |||
| operation on ireg-name result in an URI that meets the | operation on ireg-name result in an URI that meets the | |||
| scheme-specific restrictions. An IRI with a scheme that is known to | scheme-specific restrictions. Such an IRI resolves to the URI | |||
| use domain names in ireg-name, but where the scheme definition does | ||||
| not allow percent-encoding for ireg-name, resolves to the URI | ||||
| obtained after converting the IRI including using the ToASCII | obtained after converting the IRI including using the ToASCII | |||
| operation on ireg-name. Implementations do not need to do this | operation on ireg-name. Implementations do not need to do this | |||
| conversion as long as they produce the same result. | conversion as long as they produce the same result. | |||
| Note: The difference between Variants B and C in Step 1 (Variant B | Note: The difference between Variants B and C in Step 1 (Variant B | |||
| using normalization with NFC while Variant C not using any | using normalization with NFC while Variant C not using any | |||
| normalization) is to account for the fact that in many non-Unicode | normalization) is to account for the fact that in many non-Unicode | |||
| character encodings, some text cannot be represented directly. | character encodings, some text cannot be represented directly. | |||
| For example, Vietnam is natively written "Việt Nam" | For example, Vietnam is natively written "Việt Nam" | |||
| (containing a LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW) | (containing a LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW) | |||
| skipping to change at page 18, line 23 | skipping to change at page 18, line 23 | |||
| LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP | LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP | |||
| DIRECTIONAL FORMATTING (PDF). Setting the embedding direction can | DIRECTIONAL FORMATTING (PDF). Setting the embedding direction can | |||
| also be done in a higher-level protocol (e.g. the dir='ltr' | also be done in a higher-level protocol (e.g. the dir='ltr' | |||
| attribute in HTML). | attribute in HTML). | |||
| There is no requirement to actually use the above embedding if the | There is no requirement to actually use the above embedding if the | |||
| display is still the same without the embedding. For example, a | display is still the same without the embedding. For example, a | |||
| bidirectional IRI in a text with left-to-right base directionality | bidirectional IRI in a text with left-to-right base directionality | |||
| (such as used for English or Cyrillic) that is preceded and followed | (such as used for English or Cyrillic) that is preceded and followed | |||
| by whitespace and strong left-to-right characters does not need an | by whitespace and strong left-to-right characters does not need an | |||
| embedding. Also, a bidirectional relative IRI that only contains | embedding. Also, a bidirectional relative IRI reference that only | |||
| strong right-to-left characters and weak characters and that starts | contains strong right-to-left characters and weak characters and that | |||
| and ends with a strong right-to-left character and appears in a text | starts and ends with a strong right-to-left character and appears in | |||
| with right-to-left base directionality (such as used for Arabic or | a text with right-to-left base directionality (such as used for | |||
| Hebrew) and is preceded and followed by whitespace and strong | Arabic or Hebrew) and is preceded and followed by whitespace and | |||
| characters does not need an embedding. | strong characters does not need an embedding. | |||
| In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM) may be | In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM) may be | |||
| sufficient to force the correct display behavior. However, the | sufficient to force the correct display behavior. However, the | |||
| details of the Unicode Bidirectional algorithm are not always easy to | details of the Unicode Bidirectional algorithm are not always easy to | |||
| understand. Implementers are strongly advised to err on the side of | understand. Implementers are strongly advised to err on the side of | |||
| caution and to use embedding in all cases where they are not | caution and to use embedding in all cases where they are not | |||
| completely sure that the display behavior is unaffected without the | completely sure that the display behavior is unaffected without the | |||
| embedding. | embedding. | |||
| The Unicode Bidirectional Algorithm ([UNI9], Section 4.3) permits | The Unicode Bidirectional Algorithm ([UNI9], Section 4.3) permits | |||
| skipping to change at page 19, line 17 | skipping to change at page 19, line 17 | |||
| The Unicode Bidirectional Algorithm is designed mainly for running | The Unicode Bidirectional Algorithm is designed mainly for running | |||
| text. To make sure that it does not affect the rendering of | text. To make sure that it does not affect the rendering of | |||
| bidirectional IRIs too much, some restrictions on bidirectional IRIs | bidirectional IRIs too much, some restrictions on bidirectional IRIs | |||
| are necessary. These restrictions are given in terms of delimiters | are necessary. These restrictions are given in terms of delimiters | |||
| (structural characters, mostly punctuation such as '@', '.', ':', | (structural characters, mostly punctuation such as '@', '.', ':', | |||
| '/') and components (usually consisting mostly of letters and | '/') and components (usually consisting mostly of letters and | |||
| digits). | digits). | |||
| The following syntax rules from Section 2.2 correspond to components | The following syntax rules from Section 2.2 correspond to components | |||
| for the purpose of Bidi behavior: iuserinfo, ireg-name, isegment, | for the purpose of Bidi behavior: iuserinfo, ireg-name, isegment, | |||
| isegment-nz, isegment-nzc, ireg-name, iquery, and ifragment. | isegment-nz, isegment-nz-nc, ireg-name, iquery, and ifragment. | |||
| Specifications that define the syntax of any of the above components | Specifications that define the syntax of any of the above components | |||
| MAY divide them further and define smaller parts to be components | MAY divide them further and define smaller parts to be components | |||
| according to this document. As an example, the restrictions of | according to this document. As an example, the restrictions of | |||
| [RFC3490] on bidirectional domain names correspond to treating each | [RFC3490] on bidirectional domain names correspond to treating each | |||
| label of a domain name as a component for those schemes where | label of a domain name as a component for those schemes where | |||
| ireg-name is a domain name. Even where the components are not | ireg-name is a domain name. Even where the components are not | |||
| defined formally, it may be helpful to think about some syntax in | defined formally, it may be helpful to think about some syntax in | |||
| terms of components and to apply the relevant restrictions. For | terms of components and to apply the relevant restrictions. For | |||
| example, for the usual name/value syntax in query parts, it is | example, for the usual name/value syntax in query parts, it is | |||
| skipping to change at page 21, line 50 | skipping to change at page 21, line 50 | |||
| logical representation: http://ab.cd.ef/GH1/2IJ/KL.html | logical representation: http://ab.cd.ef/GH1/2IJ/KL.html | |||
| visual representation: http://ab.cd.ef/LK/JI1/2HG.html | visual representation: http://ab.cd.ef/LK/JI1/2HG.html | |||
| The sequence '1/2' is interpreted by the bidi algorithm as a | The sequence '1/2' is interpreted by the bidi algorithm as a | |||
| fraction, fragmenting the components and leading to confusion. There | fraction, fragmenting the components and leading to confusion. There | |||
| are other characters that are interpreted in a special way close to | are other characters that are interpreted in a special way close to | |||
| numbers, in particular '+', '-', '#', '$', '%', ',', '.', and ':'. | numbers, in particular '+', '-', '#', '$', '%', ',', '.', and ':'. | |||
| Example 9 (not allowed): The numbers in the previous example are | Example 9 (not allowed): The numbers in the previous example are | |||
| percent-encoded: | percent-encoded: | |||
| logical representation: http://ab.cd.ef/GH%31/%32IJ/KL.html, | logical representation: http://ab.cd.ef/GH%31/%32IJ/KL.html, | |||
| visual representation (Hebrew): http://ab.cd.ef/LK/JI%32/%31HG.html | visual representation (Hebrew): http://ab.cd.ef/%31HG/LK/JI%32.html | |||
| visual representation (Arabic): http://ab.cd.ef/LK/JI32%/31%HG.html | visual representation (Arabic): http://ab.cd.ef/31%HG/%LK/JI32.html | |||
| Depending on whether the upper-case letters represent Arabic or | Depending on whether the upper-case letters represent Arabic or | |||
| Hebrew, the visual representation is different. | Hebrew, the visual representation is different. | |||
| Example 10 (allowed, but not recommended): | Example 10 (allowed, but not recommended): | |||
| logical representation: http://ab.CDEFGH.123/kl/mn/op.html | logical representation: http://ab.CDEFGH.123/kl/mn/op.html | |||
| visual representation: http://ab.123.HGFEDC/kl/mn/op.html | visual representation: http://ab.123.HGFEDC/kl/mn/op.html | |||
| Components consisting of only numbers are allowed (it would be rather | Components consisting of only numbers are allowed (it would be rather | |||
| difficult to prohibit them), but may interact with adjacent RTL | difficult to prohibit them), but may interact with adjacent RTL | |||
| components in ways that are not easy to predict. | components in ways that are not easy to predict. | |||
| skipping to change at page 23, line 19 | skipping to change at page 23, line 19 | |||
| For actual resolution, differences in percent-encoding (except for | For actual resolution, differences in percent-encoding (except for | |||
| the percent-encoding of reserved characters) MUST always result in | the percent-encoding of reserved characters) MUST always result in | |||
| the same resource. For example, http://example.org/~user, | the same resource. For example, http://example.org/~user, | |||
| http://example.org/%7euser and http://example.org/%7Euser must | http://example.org/%7euser and http://example.org/%7Euser must | |||
| resolve to the same resource. | resolve to the same resource. | |||
| If this kind of equivalence is to be tested, the percent-encoding of | If this kind of equivalence is to be tested, the percent-encoding of | |||
| both IRIs to be compared has to be aligned, for example by converting | both IRIs to be compared has to be aligned, for example by converting | |||
| both IRIs to URIs (see Section 3.1), eliminating escape differences | both IRIs to URIs (see Section 3.1), eliminating escape differences | |||
| in the resulting URIs, and making sure that the case of the | in the resulting URIs, and making sure that the case of the | |||
| hexadecimal characters in the percent-encodeing is always the same | hexadecimal characters in the percent-encoding is always the same | |||
| (preferably upper case). If the IRI is to be passed to another | (preferably upper case). If the IRI is to be passed to another | |||
| application, or used further in some other way, its original form | application, or used further in some other way, its original form | |||
| MUST be preserved; the conversion described here should be performed | MUST be preserved; the conversion described here should be performed | |||
| only for the purpose of local comparison. | only for the purpose of local comparison. | |||
| Additional, similar equivalences are possible based on knowledge | Additional, similar equivalences are possible based on knowledge | |||
| about the generic URI/IRI syntax, such as the fact that the scheme | about the generic URI/IRI syntax, such as the fact that the scheme | |||
| part is case-insensitive. | part is case-insensitive. | |||
| 5.3 Normalization | 5.3 Normalization | |||
| skipping to change at page 24, line 5 | skipping to change at page 24, line 5 | |||
| UCS-based character encoding. In these cases, NFC or a normalizing | UCS-based character encoding. In these cases, NFC or a normalizing | |||
| transcoder using NFC MUST be used for interoperability. To avoid | transcoder using NFC MUST be used for interoperability. To avoid | |||
| false negatives and problems with transcoding, IRIs SHOULD be created | false negatives and problems with transcoding, IRIs SHOULD be created | |||
| using NFC. Using NFKC may avoid even more problems, for example by | using NFC. Using NFKC may avoid even more problems, for example by | |||
| choosing half-width Latin letters instead of full-width, and | choosing half-width Latin letters instead of full-width, and | |||
| full-width Katakana instead of half-width. | full-width Katakana instead of half-width. | |||
| As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML | As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML | |||
| Notation) is in NFC. On the other hand, | Notation) is in NFC. On the other hand, | |||
| http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The | http://www.example.org/re&#x301;sume&#x301;.html is not in NFC. The | |||
| former uses precombined e-acute characters, the later uses 'e' | former uses precombined e-acute characters, the latter uses 'e' | |||
| characters followed by combining acute accents. Both usages are | characters followed by combining acute accents. Both usages are | |||
| defined to be canonically equivalent in [UNIV4]. | defined to be canonically equivalent in [UNIV4]. | |||
| Note: Because it is unknown how a particular field is being treated | Note: Because it is unknown how a particular field is being treated | |||
| with respect to text normalization, it would be inappropriate to | with respect to text normalization, it would be inappropriate to | |||
| allow third parties to normalize an IRI arbitrarily. This does | allow third parties to normalize an IRI arbitrarily. This does | |||
| not contradict the recommendation that when a resource is created, | not contradict the recommendation that when a resource is created, | |||
| its IRI should be as normalized as possible (i.e. NFC or even | its IRI should be as normalized as possible (i.e. NFC or even | |||
| NFKC). This is similar to the upper-case/lower-case problems in | NFKC). This is similar to the upper-case/lower-case problems in | |||
| URIs. Some parts of a URI are case-insensitive (domain name). | URIs. Some parts of a URI are case-insensitive (domain name). | |||
| skipping to change at page 24, line 49 | skipping to change at page 24, line 49 | |||
| - Always use uppercase A-through-F characters when percent-encoding. | - Always use uppercase A-through-F characters when percent-encoding. | |||
| - For those schemes where ireg-name is a domain name, always provide | - For those schemes where ireg-name is a domain name, always provide | |||
| the individual labels, in the form produced when applying nameprep | the individual labels, in the form produced when applying nameprep | |||
| [RFC3491]. This in particular includes using lowercase characters | [RFC3491]. This in particular includes using lowercase characters | |||
| rather than uppercase characters where applicable. Also, always | rather than uppercase characters where applicable. Also, always | |||
| use US-ASCII '.' as a separator. | use US-ASCII '.' as a separator. | |||
| - Where possible, provide IRI components in NFKC or NFC. | - Where possible, provide IRI components in NFKC or NFC. | |||
| - Prevent /./ and /../ from appearing in non-relative URI paths. | - Prevent /./ and /../ from appearing in IRI paths. | |||
| - For schemes that define an empty path to be equivalent to a path | - For schemes that define an empty path to be equivalent to a path | |||
| of "/", use "/". | of "/", use "/". | |||
| 6. Use of IRIs | 6. Use of IRIs | |||
| 6.1 Limitations on UCS Characters Allowed in IRIs | 6.1 Limitations on UCS Characters Allowed in IRIs | |||
| This section discusses limitations on characters and character | This section discusses limitations on characters and character | |||
| sequences usable for IRIs beyond those given in Section 2.2 and | sequences usable for IRIs beyond those given in Section 2.2 and | |||
| skipping to change at page 27, line 9 | skipping to change at page 27, line 9 | |||
| http://www.example.org/r&#xE9;sum&#xE9;.html (&#xE9; stands for the | http://www.example.org/r&#xE9;sum&#xE9;.html (&#xE9; stands for the | |||
| e-acute character, and %C3%A9 is the UTF-8 encoded and | e-acute character, and %C3%A9 is the UTF-8 encoded and | |||
| percent-encoded representation of that character). On the other | percent-encoded representation of that character). On the other | |||
| hand, for a document with a URI of | hand, for a document with a URI of | |||
| http://www.example.org/r%E9sum%E9.html, the percent-encoding octets | http://www.example.org/r%E9sum%E9.html, the percent-encoding octets | |||
| cannot be converted to actual characters in an IRI, because the | cannot be converted to actual characters in an IRI, because the | |||
| percent-encoding is not based on UTF-8. | percent-encoding is not based on UTF-8. | |||
| This means that for most URI schemes, there is no need to upgrade | This means that for most URI schemes, there is no need to upgrade | |||
| their scheme definition in order for them to work with IRIs. The | their scheme definition in order for them to work with IRIs. The | |||
| main case where upgrading a scheme definition may make sense is when | main case where upgrading a scheme definition makes sense is when a | |||
| a scheme definition is limited to the use of US-ASCII characters with | scheme definition, or a particular component of a scheme, is strictly | |||
| no provision to include non-ASCII characters/octets but a desire to | limited to the use of US-ASCII characters with no provision to | |||
| include such characters, or only with provisions that are highly | include non-ASCII characters/octets via percent-encoding, or if a | |||
| scheme-specific. An example of such a scheme might be the mailto: | scheme definition currently uses highly scheme-specific provisions | |||
| scheme [RFC2368]. | for the encoding of non-ASCII characters. An example of such a | |||
| scheme might be the mailto: scheme [RFC2368]. | ||||
| This specification does not upgrade any scheme specifications in any | This specification does not upgrade any scheme specifications in any | |||
| way, this has to be done separately. Also, it should be noted that | way, this has to be done separately. Also, it should be noted that | |||
| there is no such thing as an "IRI scheme"; all IRIs use URI schemes, | there is no such thing as an "IRI scheme"; all IRIs use URI schemes, | |||
| and all URI schemes can be used with IRIs, even though in some cases | and all URI schemes can be used with IRIs, even though in some cases | |||
| only by using URIs directly as IRIs, without any conversion. | only by using URIs directly as IRIs, without any conversion. | |||
| URI schemes can impose restrictions on the syntax of scheme-specific | ||||
| URIs, ie. URIs that are admissable under the generic URI syntax | ||||
| [RFCYYYY] may not be admissable due to narrower syntactic constraints | ||||
| imposed by a URI scheme specification. URI scheme definitions cannot | ||||
| broaden the syntactic restrictions of the generic URI syntax, | ||||
| otherwise it would be possible to generate URIs that satisfied the | ||||
| scheme specific syntactic constraints without satisfying the | ||||
| syntactic constraints of the generic URI syntax. However, additional | ||||
| syntactic constraints imposed by URI scheme specifications are | ||||
| applicable to IRI since the corresponding URI resulting from the | ||||
| mapping defined in Section 3.1 MUST be a valid URI under the | ||||
| syntactic restrictions of generic URI syntax and any narrower | ||||
| restrictions imposed by the corresponding URI scheme specification. | ||||
| The requirement for the use of UTF-8 applies to all parts of a URI | The requirement for the use of UTF-8 applies to all parts of a URI | |||
| (with the potential exception of the ireg-name part, see Section | (with the potential exception of the ireg-name part, see Section | |||
| 3.1). However, it is possible that the capability of IRIs to | 3.1). However, it is possible that the capability of IRIs to | |||
| represent a wide range of characters directly is used just in some | represent a wide range of characters directly is used just in some | |||
| parts of the IRI (or IRI reference). The other parts of the IRI may | parts of the IRI (or IRI reference). The other parts of the IRI may | |||
| only contain US-ASCII characters, or they may not be based on UTF-8. | only contain US-ASCII characters, or they may not be based on UTF-8. | |||
| They may be based on another character encoding, or they may directly | They may be based on another character encoding, or they may directly | |||
| encode raw binary data (see also [RFC2397]). | encode raw binary data (see also [RFC2397]). | |||
| For example, it is possible to have a URI reference of | For example, it is possible to have a URI reference of | |||
| skipping to change at page 27, line 44 | skipping to change at page 28, line 11 | |||
| the fragment identifier is encoded in UTF-8 according to [XPointer]. | the fragment identifier is encoded in UTF-8 according to [XPointer]. | |||
| The IRI corresponding to the above URI would be (in XML notation) | The IRI corresponding to the above URI would be (in XML notation) | |||
| http://www.example.org/r%E9sum%E9.xml#r&#xE9;sum&#xE9;. | http://www.example.org/r%E9sum%E9.xml#r&#xE9;sum&#xE9;. | |||
| Similar considerations apply to query parts. The functionality of | Similar considerations apply to query parts. The functionality of | |||
| IRIs (namely to be able to include non-ASCII characters) can only be | IRIs (namely to be able to include non-ASCII characters) can only be | |||
| used if the query part is encoded in UTF-8. | used if the query part is encoded in UTF-8. | |||
| 6.5 Relative IRI References | 6.5 Relative IRI References | |||
| Processing of relative forms of IRIs against a base is handled | Processing of relative IRI references against a base is handled | |||
| straightforwardly; the algorithms of [RFCYYYY] can be applied | straightforwardly; the algorithms of [RFCYYYY] can be applied | |||
| directly, treating the characters additionally allowed in IRIs in the | directly, treating the characters additionally allowed in IRI | |||
| same way as unreserved characters in URIs. | references in the same way as unreserved characters in URI | |||
| references. | ||||
| 7. URI/IRI Processing Guidelines (informative) | 7. URI/IRI Processing Guidelines (informative) | |||
| This informative section provides guidelines for supporting IRIs in | This informative section provides guidelines for supporting IRIs in | |||
| the same software components and operations that currently process | the same software components and operations that currently process | |||
| URIs: software interfaces that handle URIs, software that allows | URIs: software interfaces that handle URIs, software that allows | |||
| users to enter URIs, software that creates or generates URIs, | users to enter URIs, software that creates or generates URIs, | |||
| software that displays URIs, formats and protocols that transport | software that displays URIs, formats and protocols that transport | |||
| URIs, and software that interprets URIs. These may all require more | URIs, and software that interprets URIs. These may all require more | |||
| or less modification before functioning properly with IRIs. The | or less modification before functioning properly with IRIs. The | |||
| skipping to change at page 35, line 49 | skipping to change at page 36, line 17 | |||
| Profile for Internationalized Domain Names (IDN)", RFC | Profile for Internationalized Domain Names (IDN)", RFC | |||
| 3491, March 2003. | 3491, March 2003. | |||
| [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | |||
| 10646", STD 63, RFC 3629, November 2003. | 10646", STD 63, RFC 3629, November 2003. | |||
| [RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | [RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | |||
| Resource Identifier (URI): Generic Syntax (Note to the RFC | Resource Identifier (URI): Generic Syntax (Note to the RFC | |||
| Editor: Please update this reference with the RFC | Editor: Please update this reference with the RFC | |||
| resulting from draft-fielding-uri-rfc2396bis-xx.txt, and | resulting from draft-fielding-uri-rfc2396bis-xx.txt, and | |||
| remove this Note)", draft-fielding-uri-rfc2396bis-05.txt | remove this Note)", draft-fielding-uri-rfc2396bis-07.txt | |||
| (work in progress), April 2004. | (work in progress), April 2004. | |||
| [UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | [UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard | |||
| Annex #9, March 2004, | Annex #9, March 2004, | |||
| <http://www.unicode.org/reports/tr9/tr9-13.html>. | <http://www.unicode.org/reports/tr9/tr9-13.html>. | |||
| [UNIV4] The Unicode Consortium, "The Unicode Standard, Version | [UNIV4] The Unicode Consortium, "The Unicode Standard, Version | |||
| 4.0.1, defined by: The Unicode Standard, Version 4.0 | 4.0.1, defined by: The Unicode Standard, Version 4.0 | |||
| (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), | (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), | |||
| as amended by Unicode 4.0.1 | as amended by Unicode 4.0.1 | |||
| (http://www.unicode.org/versions/Unicode4.0.1/)", March | (http://www.unicode.org/versions/Unicode4.0.1/)", March | |||
| 2004. | 2004. | |||
| [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | |||
| Unicode Standard Annex #15, April 2003, | Unicode Standard Annex #15, April 2003, | |||
| <http://www.unicode.org/unicode/reports/tr15/tr15-23.html>. | <http://www.unicode.org/unicode/reports/tr15/ | |||
| tr15-23.html>. | ||||
| 11.2 Non-normative References | 11.2 Non-normative References | |||
| [BidiEx] "Examples of bidirectional IRIs", | [BidiEx] "Examples of bidirectional IRIs", | |||
| <http://www.w3.org/International/iri-edit/BidiExamples>. | <http://www.w3.org/International/iri-edit/BidiExamples>. | |||
| [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T. | [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M. and T. | |||
| Texin, "Character Model for the World Wide Web", World | Texin, "Character Model for the World Wide Web", World | |||
| Wide Web Consortium Working Draft, February 2004, <http:// | Wide Web Consortium Working Draft, February 2004, | |||
| www.w3.org/TR/charmod>. | <http://www.w3.org/TR/charmod>. | |||
| [Duerst01] | [Duerst01] | |||
| Duerst, M., "Internationalized Resource Identifiers: From | Duerst, M., "Internationalized Resource Identifiers: From | |||
| Specification to Testing", Proc. 19th International | Specification to Testing", Proc. 19th International | |||
| Unicode Conference, San Jose , September 2001, | Unicode Conference, San Jose , September 2001, | |||
| <http://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html>. | <http://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html>. | |||
| [Duerst97] | [Duerst97] | |||
| Duerst, M., "The Properties and Promises of UTF-8", Proc. | Duerst, M., "The Properties and Promises of UTF-8", Proc. | |||
| 11th International Unicode Conference, San Jose , | 11th International Unicode Conference, San Jose , | |||
| End of changes. | ||||
This html diff was produced by rfcdiff 1.12, available from http://www.levkowetz.com/ietf/tools/rfcdiff/
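
The comparison procedure quoted in the hunk at page 23 (convert both IRIs to URIs, eliminate escape differences, and use one case for the hexadecimal digits) can be illustrated with a minimal Python sketch. It is not taken from either draft. The function names are hypothetical, and `to_uri_chars` is a deliberately simplified stand-in for the Section 3.1 mapping: it assumes the IRI is already a Unicode string and applies no special handling to ireg-name. The `is_nfc` helper only checks the normalization form, as in the e-acute example of the hunk at page 24; it does not normalize anything, in line with the note that third parties should not normalize an IRI arbitrarily.

```python
import re
import unicodedata

# Unreserved characters of the generic URI syntax:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def to_uri_chars(iri: str) -> str:
    """Simplified stand-in for the Section 3.1 mapping (assumption):
    percent-encode the UTF-8 octets of every non-ASCII character."""
    return "".join(
        ch if ord(ch) < 128
        else "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        for ch in iri
    )


def align_percent_encoding(uri: str) -> str:
    """Decode percent-encoded unreserved characters and upper-case the
    hex digits of every other triplet. Encoded reserved characters are
    left encoded, since decoding them could change the meaning."""
    def fix(match: "re.Match[str]") -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "%" + match.group(1).upper()

    return PCT_TRIPLET.sub(fix, uri)


def same_for_local_comparison(iri_a: str, iri_b: str) -> bool:
    """Compare two IRIs after aligning their percent-encoding.
    The inputs themselves are never modified."""
    return (
        align_percent_encoding(to_uri_chars(iri_a))
        == align_percent_encoding(to_uri_chars(iri_b))
    )


def is_nfc(text: str) -> bool:
    """Report whether text is already in NFC (check only)."""
    return unicodedata.normalize("NFC", text) == text


if __name__ == "__main__":
    print(same_for_local_comparison("http://example.org/~user",
                                    "http://example.org/%7euser"))
    print(to_uri_chars("http://www.example.org/r\u00e9sum\u00e9.html"))
    print(is_nfc("r\u00e9sum\u00e9"), is_nfc("re\u0301sume\u0301"))
```

Because the helpers return new strings, the original IRIs stay available for passing on to other applications, which matches the requirement that the aligned form be used only for local comparison.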