draft-duerst-iri-05.txt   draft-duerst-iri-06.txt 
Network Working Group M. Duerst Network Working Group M. Duerst
Internet-Draft W3C Internet-Draft W3C
Expires: April 25, 2004 M. Suignard Expires: August 15, 2004 M. Suignard
Microsoft Corporation Microsoft Corporation
October 26, 2003 February 15, 2004
Internationalized Resource Identifiers (IRIs) Internationalized Resource Identifiers (IRIs)
draft-duerst-iri-05 draft-duerst-iri-06
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 33 skipping to change at page 1, line 33
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at http:// The list of current Internet-Drafts can be accessed at http://
www.ietf.org/ietf/1id-abstracts.txt. www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 25, 2004. This Internet-Draft will expire on August 15, 2004.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved. Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract Abstract
This document defines a new protocol element, the Internationalized This document defines a new protocol element, the Internationalized
Resource Identifier (IRI), as a complement to the URI [RFCYYYY]. An Resource Identifier (IRI), as a complement to the URI [RFCYYYY]. An
IRI is a sequence of characters from the Universal Character Set IRI is a sequence of characters from the Universal Character Set
[ISO10646]. A mapping from IRIs to URIs is defined, which means that [ISO10646]. A mapping from IRIs to URIs is defined, which means that
IRIs can be used instead of URIs where appropriate to identify IRIs can be used instead of URIs where appropriate to identify
resources. resources.
skipping to change at page 2, line 29 skipping to change at page 2, line 29
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1 Overview and Motivation . . . . . . . . . . . . . . . . . . 4 1.1 Overview and Motivation . . . . . . . . . . . . . . . . . . 4
1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . 6 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . . 7 2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . . 7
2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . . 7 2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . . 7
3. Relationship between IRIs and URIs . . . . . . . . . . . . . 10 3. Relationship between IRIs and URIs . . . . . . . . . . . . . 9
3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . . 10 3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . . 10
3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . . 13 3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . . 13
3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4. Bidirectional IRIs for Right-to-left Languages . . . . . . . 16 4. Bidirectional IRIs for Right-to-left Languages . . . . . . . 15
4.1 Logical Storage and Visual Presentation . . . . . . . . . . 16 4.1 Logical Storage and Visual Presentation . . . . . . . . . . 16
4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . 17 4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . 17
4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . 18 4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . 18
4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5. IRI Equivalence and Comparison . . . . . . . . . . . . . . . 20 5. IRI Equivalence and Comparison . . . . . . . . . . . . . . . 20
5.1 Simple String Comparison . . . . . . . . . . . . . . . . . . 20 5.1 Simple String Comparison . . . . . . . . . . . . . . . . . . 20
5.2 Conversion to URIs . . . . . . . . . . . . . . . . . . . . . 21 5.2 Conversion to URIs . . . . . . . . . . . . . . . . . . . . . 21
5.3 Normalization . . . . . . . . . . . . . . . . . . . . . . . 21 5.3 Normalization . . . . . . . . . . . . . . . . . . . . . . . 21
5.4 Preferred Forms . . . . . . . . . . . . . . . . . . . . . . 22 5.4 Preferred Forms . . . . . . . . . . . . . . . . . . . . . . 22
6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . 22 6. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . 23
6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . . 23 6.1 Limitations on UCS Characters Allowed in IRIs . . . . . . . 23
6.2 Software Interfaces and Protocols . . . . . . . . . . . . . 23 6.2 Software Interfaces and Protocols . . . . . . . . . . . . . 23
6.3 Format of URIs and IRIs in Documents and Protocols . . . . . 23 6.3 Format of URIs and IRIs in Documents and Protocols . . . . . 23
6.4 Use of UTF-8 for Encoding Original Characters . . . . . . . 24 6.4 Use of UTF-8 for Encoding Original Characters . . . . . . . 24
6.5 Relative IRI References . . . . . . . . . . . . . . . . . . 25 6.5 Relative IRI References . . . . . . . . . . . . . . . . . . 25
7. URI/IRI Processing Guidelines (informative) . . . . . . . . 25 7. URI/IRI Processing Guidelines (informative) . . . . . . . . 25
7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . . 25 7.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . . 25
7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . . 26 7.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . . 26
7.3 URI/IRI Transfer Between Applications . . . . . . . . . . . 26 7.3 URI/IRI Transfer Between Applications . . . . . . . . . . . 26
7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . . 27 7.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . . 27
skipping to change at page 7, line 50 skipping to change at page 7, line 50
by their transformation to URI references and URIs, they can also be by their transformation to URI references and URIs, they can also be
accepted and processed directly. Therefore, an ABNF definition for accepted and processed directly. Therefore, an ABNF definition for
IRI references (which are the most general concept and the start of IRI references (which are the most general concept and the start of
the grammar) and IRIs is given here. The syntax of this ABNF is the grammar) and IRIs is given here. The syntax of this ABNF is
described in [RFC2234]. Character numbers are taken from the UCS, described in [RFC2234]. Character numbers are taken from the UCS,
without implying any actual binary encoding. Terminals in the ABNF without implying any actual binary encoding. Terminals in the ABNF
are characters, not bytes. are characters, not bytes.
The following rules are different from [RFCYYYY]: The following rules are different from [RFCYYYY]:
IRI = scheme ":" ["//" iauthority] ipath ["?" iquery]
["#" ifragment]
IRI-reference = IRI / relative-IRI IRI-reference = IRI / relative-IRI
IRI = scheme ":" ihier-part [ "?" iquery ] [ "#" ifragment ] relative-IRI = ["//" iauthority] ipath ["?" iquery]
absolute-IRI = scheme ":" ihier-part [ "?" iquery ] ["#" ifragment]
relative-IRI = ihier-part [ "?" iquery ] [ "#" ifragment ]
ihier-part = inet-path / iabs-path / irel-path
inet-path = "//" iauthority [ iabs-path ]
iabs-path = "/" ipath-segments
irel-path = ipath-segments absolute-IRI = scheme ":" ["//" iauthority] ipath ["?" iquery]
iauthority = [ iuserinfo "@" ] ihost [ ":" port ] iauthority = [ iuserinfo "@" ] ihost [ ":" port ]
iuserinfo = *( iunreserved / escaped / ";" / iuserinfo = *( iunreserved / pct-encoded / sub-delims
":" / "&" / "=" / "+" / "$" / "," ) / ":" )
ihost = [ IPv6reference / IPv4address / ihostname ]
ihostname = idomainlabel iqualified
iqualified = *( "." idomainlabel ) [ "." ]
idomainlabel = <<See following production rules>> ihost = IP-literal / IPv4address / ireg-name
ipath-segments = ipath-segment *( "/" ipath-segment ) ireg-name = 0*255( iunreserved / pct-encoded / sub-delims )
ipath-segment = *ipchar ipath = isegment *( "/" isegment )
ipchar = iunreserved / escaped / ";" / isegment = *ipchar
":" / "@" / "&" / "=" / "+" / "$" / ","
iquery = *( ipchar / iprivate / "/" / "?" ) iquery = *( ipchar / iprivate / "/" / "?" )
ifragment = *( ipchar / "/" / "?" ) ifragment = *( ipchar / "/" / "?" )
iric = reserved / iunreserved / escaped ipchar = iunreserved / pct-encoded / sub-delims / ":"
/ "@"
iunreserved = unreserved / ucschar iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
/ %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
/ %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
/ %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
/ %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
/ %xD0000-DFFFD / %xE1000-EFFFD / %xD0000-DFFFD / %xE1000-EFFFD
iprivate = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD iprivate = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD
The 'idomainlabel' production rule is as follows:
The value 'idomainlabel' is defined as a string of 'ucschar' obeying
the following rules:
a) Given a string of 'ucschar' values, the ToASCII operation
[RFC3490] is performed on that string with the flag
UseSTD3ASCIIRules set to TRUE and the flag AllowUnassigned set
to FALSE for creating IRIs and set to TRUE otherwise.
b) ToASCII is successful. (Note: This means that its output Some productions are ambiguous. The "first-match-wins" (a.k.a.
conforms to 'domainlabel' as defined below.) "greedy") algorithm applies. For details, see [RFCYYYY].
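The 'ucschar' and 'iprivate' productions above can be checked mechanically. The following is a minimal sketch in Python, assuming the code point ranges exactly as listed in the grammar; the helper names (is_ucschar, is_iprivate) are illustrative and do not come from the draft.

```python
# Code point ranges transcribed from the 'ucschar' production.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13: %x10000-1FFFD up to %xD0000-DFFFD.
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
# Plane 14 starts at E1000, which leaves out the tag characters.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

# Code point ranges transcribed from the 'iprivate' production.
IPRIVATE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def _in_ranges(code_point, ranges):
    """Return True if code_point falls inside any inclusive (lo, hi) range."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


def is_ucschar(ch):
    """True if the single character ch matches 'ucschar'."""
    return _in_ranges(ord(ch), UCSCHAR_RANGES)


def is_iprivate(ch):
    """True if the single character ch matches 'iprivate' (query only)."""
    return _in_ranges(ord(ch), IPRIVATE_RANGES)
```

Keeping the two tables separate mirrors the grammar, where 'iprivate' is only admitted in 'iquery'.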
The following are the same as [RFCYYYY]: The following are the same as [RFCYYYY]:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
port = *DIGIT port = *DIGIT
domainlabel = alphanum [ 0*61( alphanum / "-" ) alphanum ] IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPvFuture = "v" HEXDIG "." 1*( unreserved / sub-delims / ":" )
alphanum = ALPHA / DIGIT
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet = DIGIT ; 0-9
/ ( %x31-39 DIGIT ) ; 10-99
/ ( "1" 2DIGIT ) ; 100-199
/ ( "2" %x30-34 DIGIT ) ; 200-249
/ ( "25" %x30-35 ) ; 250-255
IPv6reference = "[" IPv6address "]"
IPv6address = 6( h4 ":" ) ls32 IPv6address = 6( h4 ":" ) ls32
/ "::" 5( h4 ":" ) ls32 / "::" 5( h4 ":" ) ls32
/ [ h4 ] "::" 4( h4 ":" ) ls32 / [ h4 ] "::" 4( h4 ":" ) ls32
/ [ *1( h4 ":" ) h4 ] "::" 3( h4 ":" ) ls32 / [ *1( h4 ":" ) h4 ] "::" 3( h4 ":" ) ls32
/ [ *2( h4 ":" ) h4 ] "::" 2( h4 ":" ) ls32 / [ *2( h4 ":" ) h4 ] "::" 2( h4 ":" ) ls32
/ [ *3( h4 ":" ) h4 ] "::" h4 ":" ls32 / [ *3( h4 ":" ) h4 ] "::" h4 ":" ls32
/ [ *4( h4 ":" ) h4 ] "::" ls32 / [ *4( h4 ":" ) h4 ] "::" ls32
/ [ *5( h4 ":" ) h4 ] "::" h4 / [ *5( h4 ":" ) h4 ] "::" h4
/ [ *6( h4 ":" ) h4 ] "::" / [ *6( h4 ":" ) h4 ] "::"
h4 = 1*4HEXDIG h4 = 1*4HEXDIG
ls32 = ( h4 ":" h4 ) / IPv4address ls32 = ( h4 ":" h4 ) / IPv4address
reserved = "/" / "?" / "#" / "[" / "]" / ";" / IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
":" / "@" / "&" / "=" / "+" / "$" / ","
unreserved = ALPHA / DIGIT / mark
mark = "-" / "_" / "." / "!" / "~" / "*" / "'" / dec-octet = DIGIT ; 0-9
"(" / ")" / %x31-39 DIGIT ; 10-99
/ "1" 2DIGIT ; 100-199
/ "2" %x30-34 DIGIT ; 200-249
/ "25" %x30-35 ; 250-255
escaped = "%" HEXDIG HEXDIG pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
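As an illustration of how the shared productions can be turned into a validator, the sketch below expresses 'dec-octet' and 'IPv4address' as a Python regular expression. The names DEC_OCTET, IPV4ADDRESS and is_ipv4address are assumptions made for this example only.

```python
import re

# dec-octet, with the alternatives listed longest first:
#   250-255 / 200-249 / 100-199 / 10-99 / 0-9
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")


def is_ipv4address(host: str) -> bool:
    """True if the whole string matches the IPv4address production."""
    return IPV4ADDRESS.fullmatch(host) is not None
```

The character class [0-9] is used instead of \d because \d would also accept non-ASCII digits, which the grammar does not allow here.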
3. Relationship between IRIs and URIs 3. Relationship between IRIs and URIs
IRIs are meant to replace URIs in identifying resources for IRIs are meant to replace URIs in identifying resources for
protocols, formats and software components which use a UCS-based protocols, formats and software components which use a UCS-based
character repertoire. These protocols and components may never need character repertoire. These protocols and components may never need
to use URIs directly, especially when the resource identifier is used to use URIs directly, especially when the resource identifier is used
simply for identification purposes. However, when the resource simply for identification purposes. However, when the resource
identifier is used for resource retrieval, it is in many cases identifier is used for resource retrieval, it is in many cases
necessary to determine the associated URI because most retrieval necessary to determine the associated URI because most retrieval
skipping to change at page 11, line 22 skipping to change at page 10, line 51
Variant B) If the IRI is in some digital representation Variant B) If the IRI is in some digital representation
(e.g. an octet stream) in some known non-Unicode (e.g. an octet stream) in some known non-Unicode
encoding: Convert the IRI to a sequence of characters encoding: Convert the IRI to a sequence of characters
from the UCS normalized according to NFC. from the UCS normalized according to NFC.
Variant C) If the IRI is in a Unicode-based encoding (for Variant C) If the IRI is in a Unicode-based encoding (for
example UTF-8 or UTF-16): Do not normalize. Move example UTF-8 or UTF-16): Do not normalize. Move
directly to Step 2. directly to Step 2.
Step 2) If the IRI contains an 'ihostname' part, replace this Step 2) For each character that is disallowed in URI references,
'ihostname' part by the part converted using the ToASCII
operation specified in Section 4.1 of [RFC3490], with the flag
UseSTD3ASCIIRules set to TRUE and the flag AllowUnassigned set
to FALSE for creating IRIs and set to TRUE otherwise. The
ToASCII operation may fail, but only if the IRI does not
conform to the rules in Section 2.2.
Step 3) For each character that is disallowed in URI references,
apply steps 1) through 3) below. The disallowed characters apply steps 1) through 3) below. The disallowed characters
consist of all non-ASCII characters allowed in IRIs. consist of all non-ASCII characters allowed in IRIs.
1) Convert the character to a sequence of one or more octets 1) Convert the character to a sequence of one or more octets
using UTF-8 [RFCXXXX]. using UTF-8 [RFC3629].
2) Convert each octet to %HH, where HH is the hexadecimal 2) Convert each octet to %HH, where HH is the hexadecimal
notation of the octet value. Note: This is identical to notation of the octet value. Note: This is identical to
the escaping mechanism in Section 2.4.1 of [RFCYYYY]. To the escaping mechanism in Section 2.4.1 of [RFCYYYY]. To
reduce variability, the hexadecimal notation SHOULD use reduce variability, the hexadecimal notation SHOULD use
upper case letters. upper case letters.
3) Replace the original character by the resulting character 3) Replace the original character by the resulting character
sequence (i.e. a sequence of %HH triplets). sequence (i.e. a sequence of %HH triplets).
The above mapping from IRIs to URIs produces URIs fully conforming to The above mapping from IRIs to URIs produces URIs fully conforming to
[RFCYYYY]. The mapping is also an identity transformation for URIs [RFCYYYY]. The mapping is also an identity transformation for URIs
and is idempotent -- applying the mapping a second time will not and is idempotent -- applying the mapping a second time will not
change anything. Every URI is by definition an IRI. change anything. Every URI is by definition an IRI.
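The character-by-character part of the mapping (UTF-8 encoding followed by %HH escaping) can be sketched as follows. This is a minimal illustration in Python, assuming the input is already a sequence of UCS characters (Step 1 done) and leaving out any domain name conversion; the function name iri_to_uri is an assumption for this example.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of iri via UTF-8.

    ASCII characters, including existing %HH escapes, are copied
    unchanged, so the function is the identity on URIs.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # 1) character -> UTF-8 octets, 2) each octet -> %HH in
            # upper case, 3) the triplets replace the character.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

Because only non-ASCII characters are touched, applying the function to its own output changes nothing, which matches the idempotence property stated above.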
Infrastructure accepting IRIs MAY also deal with 'ihostname' parts Infrastructure accepting IRIs MAY also convert the ireg-name
escaped according to Step 3) rather than Step 2). For example, Step component of an IRI as follows (before step 2 above) if it knows that
2) converts the IRI the scheme in question uses domain names: Replace the ireg-name part
http://r&#xE9;sum&#xE9;.example.org to of the IRI by the part converted using the ToASCII operation
http://xn--rsum-bpad.example.org. For backward compatibility, specified in Section 4.1 of [RFC3490], with the flag
http://r%C3%A9sum%C3%A9.example.org would also be converted to UseSTD3ASCIIRules set to TRUE and the flag AllowUnassigned set to
http://xn--rsum-bpad.example.org. FALSE for creating IRIs and set to TRUE otherwise. The ToASCII
operation may fail, but this would mean that the IRI cannot be
resolved. For example, the IRI
http://r&#xE9;sum&#xE9;.example.org may be converted to
http://xn--rsum-bpad.example.org instead of
http://r%C3%A9sum%C3%A9.example.org.
Infrastructure accepting IRIs MAY also deal with the printable Note: The uniform treatment of the whole IRI in step 2) above is
characters in US-ASCII that are not allowed in URIs, namely "<", ">", important to not make processing dependent on URI scheme. See
'"', Space, "{", "}", "|", "\", "^", and "`", in step 3) above. If [Gettys] for an in-depth discussion.
such characters are found but are not converted, then the conversion
SHOULD fail. Please note that the number sign ("#"), the percent
sign ("%"), and the square bracket characters ("[", "]") are not part
of the above list, and MUST NOT be converted. Protocols and formats
that have used earlier definitions of IRIs including these characters
MAY require unescaping of these characters as a preprocessing step to
extract the actual IRI from a given field. Such preprocessing MAY
also be used by applications allowing the user to enter an IRI.
Internationalized Domain Names may be contained in parts of an Note: In practice, the difference above will not be noticed if
IRI other than the 'ihostname' part. In this case, Step 2) is mapping from IRI to URI and resolution is tightly integrated
not used, but Step 3) is applied. This is important to (e.g. carried out in the same user agent). But conversion
maintain uniform treatment of URIs. See [Gettys] for an in- using [RFC3490] may be able to better deal with backwards
depth discussion. It is the responsibility of scheme-specific compatibility issues in case mapping and resolution are
separated, as in the case of using an HTTP proxy.
Note: Internationalized Domain Names may be contained in parts of
an IRI other than the ireg-name part. It is the responsibility
of scheme-specific implementations (if the Internationalized
Domain Name is part of the scheme syntax) or of server-side
implementations (if the Internationalized Domain Name is part implementations (if the Internationalized Domain Name is part
of the scheme syntax) or of server-side implementations (if the of 'iquery') to apply the necessary conversions at the
Internationalized Domain Name is part of 'iquery') to apply the appropriate point. Example: Trying to validate the Web page at
necessary conversions at the appropriate point. Example:
Trying to validate the Web page at
http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of
http://validator.w3.org/ http://validator.w3.org/
check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;.example.org, which check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;.example.org, which
would convert to a URI of would convert to a URI of
http://validator.w3.org/ http://validator.w3.org/
check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.example.org. The check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.example.org. The
server side implementation would be responsible to do the server side implementation would be responsible to do the
necessary conversions in order to be able to retrieve the Web necessary conversions in order to be able to retrieve the Web
page. page.
In this process (in step 3.3), characters allowed in URI Infrastructure accepting IRIs MAY also deal with the printable
characters in US-ASCII that are not allowed in URIs, namely "<", ">",
'"', Space, "{", "}", "|", "\", "^", and "`", in step 2) above. If
such characters are found but are not converted, then the conversion
SHOULD fail. Please note that the number sign ("#"), the percent
sign ("%"), and the square bracket characters ("[", "]") are not part
of the above list, and MUST NOT be converted. Protocols and formats
that have used earlier definitions of IRIs including these characters
MAY require unescaping of these characters as a preprocessing step to
extract the actual IRI from a given field. Such preprocessing MAY
also be used by applications allowing the user to enter an IRI.
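The optional handling of printable US-ASCII characters that are not allowed in URIs might look like the sketch below. It assumes exactly the ten characters named above; the names EXTRA_ASCII and escape_extra_ascii are illustrative.

```python
# "<", ">", '"', Space, "{", "}", "|", "\", "^", and "`"
EXTRA_ASCII = set('<>" {}|\\^`')


def escape_extra_ascii(text: str) -> str:
    """Escape the listed characters as %HH and copy everything else.

    "#", "%", "[" and "]" are deliberately not in the set and are
    therefore never converted.
    """
    return "".join(
        f"%{ord(ch):02X}" if ch in EXTRA_ASCII else ch for ch in text
    )
```

Leaving "%" untouched is what keeps existing escape sequences from being escaped a second time.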
Note: In this process (in step 2.3), characters allowed in URI
references as well as existing escape sequences are not escaped references as well as existing escape sequences are not escaped
further. (This mapping is similar to, but different from, the further. (This mapping is similar to, but different from, the
escaping applied when including arbitrary content into some escaping applied when including arbitrary content into some
part of a URI.) For example, an IRI of part of a URI.) For example, an IRI of
http://www.example.org/red%09ros&#xE9;#red (in XML notation) is http://www.example.org/red%09ros&#xE9;#red (in XML notation) is
converted to converted to
http://www.example.org/red%09ros%C3%A9#red, not to something http://www.example.org/red%09ros%C3%A9#red, not to something
like like
http%3A%2F%2Fwww.example.org%2Fred%2509ros%C3%A9%23red. http%3A%2F%2Fwww.example.org%2Fred%2509ros%C3%A9%23red.
Some older software transcoding to UTF-8 may produce illegal Note: Some older software transcoding to UTF-8 may produce illegal
output for some input, in particular for characters outside the output for some input, in particular for characters outside the
BMP (Basic Multilingual Plane). As an example, for the BMP (Basic Multilingual Plane). As an example, for the
following IRI with non-BMP characters (in XML Notation): following IRI with non-BMP characters (in XML Notation):
http://example.com/&#x10300;&#x10301;&#x10302; http://example.com/&#x10300;&#x10301;&#x10302;
(the first three letters of the Old Italic alphabet) the (the first three letters of the Old Italic alphabet) the
correct conversion to a URI is: correct conversion to a URI is:
http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82 http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82
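One way to detect illegal output from a transcoder is to unescape the result and decode it with a strict UTF-8 decoder. The sketch below uses Python's standard library for this; the function name is an assumption, and the surrogate-based byte sequence mentioned afterwards is an illustration of one kind of illegal output rather than something taken from the text above.

```python
from urllib.parse import unquote_to_bytes


def escapes_are_strict_utf8(escaped: str) -> bool:
    """True if the %HH escapes in escaped decode as strictly legal UTF-8."""
    try:
        unquote_to_bytes(escaped).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The correct escapes shown above pass this check. A sequence such as %ED%A0%80%ED%BC%80, obtained by encoding the two UTF-16 surrogates of U+10300 separately, is rejected because Python's decoder does not accept surrogate code points in UTF-8.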
3.2 Converting URIs to IRIs 3.2 Converting URIs to IRIs
skipping to change at page 13, line 48 skipping to change at page 13, line 38
discussion, see [Duerst97].) discussion, see [Duerst97].)
c) The conversion may result in a character that is not c) The conversion may result in a character that is not
appropriate in an IRI. See Section 6.1 for further details. appropriate in an IRI. See Section 6.1 for further details.
Conversion from a URI to an IRI is done using the following steps (or Conversion from a URI to an IRI is done using the following steps (or
any other algorithm that produces the same result): any other algorithm that produces the same result):
1) Represent the URI as a sequence of octets in US-ASCII. 1) Represent the URI as a sequence of octets in US-ASCII.
2) Apply the ToUnicode operation to each 'domainlabel' in the 2) Convert all hexadecimal escapes (% followed by two hexadecimal
'hostname' part (if there is one), representing the output as
UTF-8.
3) Convert all hexadecimal escapes (% followed by two hexadecimal
digits) except those corresponding to '%', characters in digits) except those corresponding to '%', characters in
'reserved', and characters in US-ASCII not allowed in URIs, to 'reserved', and characters in US-ASCII not allowed in URIs, to
the corresponding octets. the corresponding octets.
4) Re-escape any octet produced in step 3) that is not part of a 3) Re-escape any octet produced in step 2) that is not part of a
strictly legal UTF-8 octet sequence. strictly legal UTF-8 octet sequence.
5) Re-escape all octets produced in step 3) that in UTF-8 4) Re-escape all octets produced in step 2) that in UTF-8
represent characters that are not appropriate according to represent characters that are not appropriate according to
Section 4.1 and Section 6.1. Section 4.1 and Section 6.1.
6) Interpret the resulting octet sequence as a sequence of 5) Interpret the resulting octet sequence as a sequence of
characters encoded in UTF-8. characters encoded in UTF-8.
This procedure will convert as many escaped non-ASCII characters as This procedure will convert as many escaped non-ASCII characters as
possible to characters in an IRI. Because there are some choices possible to characters in an IRI. Because there are some choices
when applying step 5) (see Section 6.1), results may vary. when applying step 4) (see Section 6.1), results may vary.
Conversions from URIs to IRIs MUST NOT use any other encoding than Conversions from URIs to IRIs MUST NOT use any other encoding than
UTF-8 in steps 2), 4) and 5) above, even if it might be possible from UTF-8 in steps 3) and 4) above, even if it might be possible from
context to guess that another encoding than UTF-8 was used in the context to guess that another encoding than UTF-8 was used in the
URI. As an example, the URI http://www.example.org/r%E9sum%E9.html URI. As an example, the URI http://www.example.org/r%E9sum%E9.html
might with some guessing be interpreted to contain two e-acute might with some guessing be interpreted to contain two e-acute
characters encoded as iso-8859-1. It must not be converted to an IRI characters encoded as iso-8859-1. It must not be converted to an IRI
containing these e-acute characters. Otherwise, the IRI will in the containing these e-acute characters. Otherwise, the IRI will in the
future be mapped to http://www.example.org/r%C3%A9sum%C3%A9.html, future be mapped to http://www.example.org/r%C3%A9sum%C3%A9.html,
which is a different URI from http://www.example.org/r%E9sum%E9.html. which is a different URI than http://www.example.org/r%E9sum%E9.html.
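The conversion steps can be sketched in Python as shown below, following the five-step numbering. This is an illustration under stated assumptions, not a complete implementation: NOT_APPROPRIATE is a small, incomplete stand-in for the restrictions of Section 4.1 and Section 6.1, 'unreserved' is taken as ALPHA / DIGIT / "-" / "." / "_" / "~", and all names are invented for the example. No domain name label is converted.

```python
import re

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
# Assumption: incomplete placeholder for Section 4.1 / Section 6.1.
NOT_APPROPRIATE = {0x200E, 0x200F, 0x202A, 0x202B, 0x202C, 0x202D, 0x202E}
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _convert_run(match):
    """Convert one run of consecutive %HH escapes."""
    text = match.group(0)
    octets = bytes.fromhex(text.replace("%", ""))
    out, i = [], 0
    while i < len(octets):
        if octets[i] < 0x80:
            # ASCII: unescape only 'unreserved'; '%', 'reserved' and
            # disallowed characters keep their original escape.
            ch = chr(octets[i])
            out.append(ch if ch in UNRESERVED else text[3 * i:3 * i + 3])
            i += 1
            continue
        for n in (2, 3, 4):
            try:
                ch = octets[i:i + n].decode("utf-8")  # strict decoder
            except UnicodeDecodeError:
                continue
            if ord(ch) in NOT_APPROPRIATE:
                # Step 4: legal UTF-8, but not appropriate in an IRI.
                out.append("".join(f"%{b:02X}" for b in octets[i:i + n]))
            else:
                out.append(ch)
            i += n
            break
        else:
            # Step 3: octet is not part of a strictly legal sequence.
            out.append(f"%{octets[i]:02X}")
            i += 1
    return "".join(out)


def uri_to_iri(uri: str) -> str:
    """Convert a URI to an IRI, assuming UTF-8 and nothing else."""
    uri.encode("ascii")  # step 1: raises if the input is not US-ASCII
    return ESCAPE_RUN.sub(_convert_run, uri)
```

With this sketch, http://www.example.org/r%E9sum%E9.html is returned unchanged, because <e9> followed by an ASCII letter is not legal UTF-8 and no other encoding is ever guessed.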
3.2.1 Examples 3.2.1 Examples
This section shows various examples of converting URIs to IRIs. The This section shows various examples of converting URIs to IRIs. The
notation <hh> is used to denote octets outside those that can be notation <hh> is used to denote octets outside those that can be
represented in this document. Each example shows the result after represented in this document. Each example shows the result after
applying each of the steps 1) to 6). XML Notation is used for the applying each of the steps 1) to 5). XML Notation is used for the
final result. final result.
The following example contains the sequence '%C3%BC', which is a The following example contains the sequence '%C3%BC', which is a
strictly legal UTF-8 sequence, and which is converted into the actual strictly legal UTF-8 sequence, and which is converted into the actual
character U+00FC LATIN SMALL LETTER U WITH DIAERESIS (also known as character U+00FC LATIN SMALL LETTER U WITH DIAERESIS (also known as
u-umlaut). u-umlaut).
1) http://www.example.org/D%C3%BCrst 1) http://www.example.org/D%C3%BCrst
2) http://www.example.org/D%C3%BCrst 2) http://www.example.org/D<c3><bc>rst
3) http://www.example.org/D<c3><bc>rst 3) http://www.example.org/D<c3><bc>rst
4) http://www.example.org/D<c3><bc>rst
5) http://www.example.org/D<c3><bc>rst 4) http://www.example.org/D<c3><bc>rst
6) http://www.example.org/D&#xFC;rst 5) http://www.example.org/D&#xFC;rst
The following example contains the sequence '%FC', which might The following example contains the sequence '%FC', which might
represent U+00FC LATIN SMALL LETTER U WITH DIAERESIS in the represent U+00FC LATIN SMALL LETTER U WITH DIAERESIS in the
iso-8859-1 encoding. (It might represent other characters in other iso-8859-1 encoding. (It might represent other characters in other
encodings. For example, the octet <fc> in iso-8859-5 represents encodings. For example, the octet <fc> in iso-8859-5 represents
U+045C CYRILLIC SMALL LETTER KJE.) Because <fc> is not part of a U+045C CYRILLIC SMALL LETTER KJE.) Because <fc> is not part of a
strictly legal UTF-8 sequence, it is re-escaped in step 2). strictly legal UTF-8 sequence, it is re-escaped in step 3).
1) http://www.example.org/D%FCrst 1) http://www.example.org/D%FCrst
2) http://www.example.org/D%FCrst 2) http://www.example.org/D<fc>rst
3) http://www.example.org/D%FCrst
3) http://www.example.org/D<fc>rst
4) http://www.example.org/D%FCrst 4) http://www.example.org/D%FCrst
5) http://www.example.org/D%FCrst 5) http://www.example.org/D%FCrst
6) http://www.example.org/D%FCrst
The following example contains '%e2%80%ae', which is the escaped The following example contains '%e2%80%ae', which is the escaped
UTF-8 encoding of U+202E, RIGHT-TO-LEFT OVERRIDE. Section 4.1 UTF-8 encoding of U+202E, RIGHT-TO-LEFT OVERRIDE. Section 4.1
forbids the direct use of this character in an IRI. Therefore, the forbids the direct use of this character in an IRI. Therefore, the
corresponding octets are re-escaped in step 5). This example shows corresponding octets are re-escaped in step 4). This example shows
that the case (upper or lower) of letters used in escapes may not be that the case (upper or lower) of letters used in escapes may not be
preserved. The example also contains a punycode-encoded domain name preserved. The example also contains a punycode-encoded domain name
label (xn--99zt52a), which is converted to the corresponding label (xn--99zt52a), which is not converted.
characters U+7D0D U+8C46 (Japanese Natto).
1) http://xn--99zt52a.example.org/%e2%80%ae 1) http://xn--99zt52a.example.org/%e2%80%ae
2) http://<e7><b4><8d><e8><b1><86>.example.org/%e2%80%ae 2) http://xn--99zt52a.example.org/<e2><80><ae>
3) http://<e7><b4><8d><e8><b1><86>.example.org/<e2><80><ae> 3) http://xn--99zt52a.example.org/<e2><80><ae>
4) http://<e7><b4><8d><e8><b1><86>.example.org/<e2><80><ae> 4) http://xn--99zt52a.example.org/%E2%80%AE
5) http://<e7><b4><8d><e8><b1><86>.example.org/%E2%80%AE 5) http://xn--99zt52a.example.org/%E2%80%AE
6) http://&#x7D0D;&#x8C46;.example.org/%E2%80%AE Implementations with scheme-specific knowledge MAY convert punycode-
encoded domain name labels to the corresponding characters using the
ToUnicode procedure. Thus, for the example above, the label xn--
99zt52a may be converted to U+7D0D U+8C46 (Japanese Natto), leading
to the overall IRI of
http://&#x7D0D;&#x8C46;.example.org/%E2%80%AE
4. Bidirectional IRIs for Right-to-left Languages 4. Bidirectional IRIs for Right-to-left Languages
Some UCS characters, such as those used in the Arabic and Hebrew Some UCS characters, such as those used in the Arabic and Hebrew
script, have an inherent right-to-left (rtl) writing direction. IRIs script, have an inherent right-to-left (rtl) writing direction. IRIs
containing such characters (called bidirectional IRIs or Bidi IRIs) containing such characters (called bidirectional IRIs or Bidi IRIs)
require additional attention because of the non-trivial relation require additional attention because of the non-trivial relation
between logical representation (used for digital representation as between logical representation (used for digital representation as
well as when reading/spelling) and visual representation (used for well as when reading/spelling) and visual representation (used for
display/printing). display/printing).
skipping to change at page 16, line 38 skipping to change at page 16, line 19
4.1 Logical Storage and Visual Presentation 4.1 Logical Storage and Visual Presentation
When stored or transmitted in digital representation, bidirectional When stored or transmitted in digital representation, bidirectional
IRIs MUST be in full logical order, and MUST conform to the IRI IRIs MUST be in full logical order, and MUST conform to the IRI
syntax rules (which include the rules relevant to their scheme). syntax rules (which include the rules relevant to their scheme).
This assures that bidirectional IRIs can be processed in the same way This assures that bidirectional IRIs can be processed in the same way
as other IRIs. as other IRIs.
When rendered, bidirectional IRIs MUST be rendered using the Unicode When rendered, bidirectional IRIs MUST be rendered using the Unicode
Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be
rendered with an overall left-to-right (ltr) direction. rendered in the same way as they would be rendered if they were in a
left-to-right embedding, i.e. as if they were preceded by U+202A,
LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP
DIRECTIONAL FORMATTING (PDF). Setting the embedding direction can
also be done in a higher-order protocol (e.g. the dir='ltr'
attribute in HTML).
In text with a left-to-right base directionality or embedding (such There is no requirement to actually use the above embedding if the
as used for English or Cyrillic), the Unicode Bidirectional Algorithm display is still the same without the embedding. For example, a
will automatically use an overall ltr direction for the IRI. In text bidirectional IRI in a text with left-to-right base directionality
with a rtl base directionality or embedding (such as used for Arabic (such as used for English or Cyrillic) that is preceded and followed
or Hebrew), setting a different embedding direction for the IRI is by whitespace and strong left-to-right characters does not need an
needed. Setting the embedding direction can be done in a higher- embedding. Also, a bidirectional relative IRI that only contains
order protocol (e.g. the dir='ltr' attribute in HTML). If this is strong right-to-left characters and weak characters and that starts
not available (e.g. in plain text), setting the embedding is done and ends with a strong right-to-left character and appears in a text
with Unicode bidi formatting codes, i.e. U+202A, LEFT-TO-RIGHT with right-to-left base directionality (such as used for Arabic or
EMBEDDING (LRE) before the IRI, and U+202C, POP DIRECTIONAL Hebrew) and is preceded and followed by whitespace and strong
FORMATTING (PDF) after the IRI, both not being part of the IRI characters does not need an embedding.
itself.
IRIs MUST NOT contain bidirectional formatting characters (LRM, RLM, In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM) may be
LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of sufficient to force the correct display behavior. However, the
the IRI, but do not themselves appear visually. It would therefore details of the Unicode Bidirectional Algorithm are not always easy to
not be possible to correctly input an IRI with such characters. understand. Implementers are strongly advised to err on the side of
caution and to use embedding in all cases where they are not
completely sure that the display behavior is unaffected without the
embedding.
The Unicode Bidirectional Algorithm ([UNI9], Section 4.3) permits
higher-level protocols to influence bidirectional rendering. Such
changes by higher-level protocols MUST NOT be used if they change the
rendering of IRIs.
The bidirectional formatting characters that may be used before or
after the IRI to assure correct display are themselves not part of
the IRI. IRIs MUST NOT contain bidirectional formatting characters
(LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual
rendering of the IRI, but do not themselves appear visually. It
would therefore not be possible to correctly input an IRI with such
characters.
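A minimal sketch of the two rules in this section, assuming plain text with no higher-order protocol available. The helper names are hypothetical. The sketch always applies the embedding, which follows the cautious approach and is never wrong, merely sometimes unnecessary.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# LRM, RLM, LRE, RLE, PDF, LRO, RLO: not allowed inside an IRI.
BIDI_FORMATTING = frozenset("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")


def contains_bidi_formatting(iri):
    """Return True if the IRI contains a bidi formatting character."""
    return any(ch in BIDI_FORMATTING for ch in iri)


def embed_for_display(iri):
    """Wrap an IRI in LRE ... PDF for display in plain text.

    The two formatting characters surround the IRI and are not part
    of it; they must be stripped again before the IRI is processed.
    """
    if contains_bidi_formatting(iri):
        raise ValueError("IRI contains bidirectional formatting characters")
    return LRE + iri + PDF
```

In HTML, the same effect would normally be achieved with the dir='ltr' attribute rather than with formatting characters.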
4.2 Bidi IRI Structure 4.2 Bidi IRI Structure
The Unicode Bidirectional Algorithm is designed mainly for running The Unicode Bidirectional Algorithm is designed mainly for running
text. To make sure that it does not affect the rendering of text. To make sure that it does not affect the rendering of
bidirectional IRIs too much, some restrictions on bidirectional IRIs bidirectional IRIs too much, some restrictions on bidirectional IRIs
are necessary. These restrictions are given in terms of delimiters are necessary. These restrictions are given in terms of delimiters
(structural characters, mostly punctuation such as '@', '.', ':', (structural characters, mostly punctuation such as '@', '.', ':',
'/') and components (usually consisting mostly of letters and '/') and components (usually consisting mostly of letters and
digits). digits).
The following syntax rules from Section 2.2 correspond to components The following syntax rules from Section 2.2 correspond to components
for the purpose of Bidi behavior: iuserinfo, ipath-segment, for the purpose of Bidi behavior: iuserinfo, isegment, ireg-name,
ihostname, iquery, and ifragment. iquery, and ifragment.
Specifications that define the syntax of any of the above components Specifications that define the syntax of any of the above components
MAY divide them further and define smaller parts to be components MAY divide them further and define smaller parts to be components
according to this document. As an example, the restrictions of according to this document. As an example, the restrictions of
[RFC3490] on bidirectional domain names correspond to treating each [RFC3490] on bidirectional domain names correspond to treating each
label of the domain name as a component. Even where the components label of a domain name as a component for those schemes where ireg-
are not defined formally, it may be helpful to think about some name is a domain name. Even where the components are not defined
syntax in terms of components and to apply the relevant restrictions. formally, it may be helpful to think about some syntax in terms of
For example, for the usual name/value syntax in query parts, it is components and to apply the relevant restrictions. For example, for
convenient to treat each name and each value as a component. As the usual name/value syntax in query parts, it is convenient to treat
another example, the extensions in a resource name can be treated as each name and each value as a component. As another example, the
separate components. extensions in a resource name can be treated as separate components.
For each component, the following restrictions apply: For each component, the following restrictions apply:
1) A component SHOULD NOT use both right-to-left and left-to- 1) A component SHOULD NOT use both right-to-left and left-to-
right characters. right characters.
2) A component using right-to-left characters SHOULD start and end 2) A component using right-to-left characters SHOULD start and end
with right-to-left characters. with right-to-left characters.
The above restrictions are given as shoulds, rather than as musts. The above restrictions are given as shoulds, rather than as musts.
For IRIs that are never presented visually, they are not relevant. For IRIs that are never presented visually, they are not relevant.
However, for IRIs in general, they are very important to ensure However, for IRIs in general, they are very important to ensure
consistent conversion between visual presentation and logical consistent conversion between visual presentation and logical
representation, in both directions. representation, in both directions.
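The two restrictions can be checked per component with the bidirectional character classes from the Unicode Character Database. The sketch below is illustrative only. It assumes that "right-to-left characters" means bidi classes R and AL, that "left-to-right characters" means class L, and that the caller has already split the IRI into components. The function name is hypothetical.

```python
import unicodedata
from typing import List

RTL_CLASSES = {"R", "AL"}  # strong right-to-left (e.g. Hebrew, Arabic)
LTR_CLASSES = {"L"}        # strong left-to-right


def check_bidi_component(component: str) -> List[str]:
    """Return the restrictions (as messages) that a component violates."""
    classes = [unicodedata.bidirectional(ch) for ch in component]
    has_rtl = any(c in RTL_CLASSES for c in classes)
    has_ltr = any(c in LTR_CLASSES for c in classes)

    problems = []
    # Restriction 1: do not mix right-to-left and left-to-right characters.
    if has_rtl and has_ltr:
        problems.append("mixes right-to-left and left-to-right characters")
    # Restriction 2: a right-to-left component starts and ends with
    # right-to-left characters.
    if has_rtl and not (classes[0] in RTL_CLASSES and classes[-1] in RTL_CLASSES):
        problems.append("does not start and end with right-to-left characters")
    return problems
```

Because the restrictions are SHOULDs, a caller would typically report the returned messages as warnings, or fall back to mapping the whole component to URI notation, rather than reject the IRI.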
In some components, the above restrictions may actually be Note: In some components, the above restrictions may actually be
strictly enforced. For example, [RFC3490] requires that these strictly enforced. For example, [RFC3490] requires that these
restrictions apply to the labels of the host name part of an restrictions apply to the labels of a host name for those
IRI. In some other components, for example path components, schemes where ireg-name is a host name. In some other
following these restrictions may not be too difficult. For components, for example path components, following these
other components, such as parts of the query part, it may be restrictions may not be too difficult. For other components,
very difficult to enforce the restrictions, because the values such as parts of the query part, it may be very difficult to
of query parameters may be arbitrary character sequences. enforce the restrictions, because the values of query
parameters may be arbitrary character sequences.
If the above restrictions cannot be satisfied otherwise, the affected If the above restrictions cannot be satisfied otherwise, the affected
component can always be mapped to URI notation as described in component can always be mapped to URI notation as described in
Section 3.1. Please note that the whole component needs to be mapped Section 3.1. Please note that the whole component needs to be mapped
(see also Example 9 below). (see also Example 9 below).
4.3 Input of Bidi IRIs 4.3 Input of Bidi IRIs
Bidi input methods MUST generate Bidi IRIs in logical order while Bidi input methods MUST generate Bidi IRIs in logical order while
rendering them according to Section 4.1. During input, rendering rendering them according to Section 4.1. During input, rendering
skipping to change at page 22, line 24 skipping to change at page 22, line 28
provided instead of a direct negative result). The best recipe provided instead of a direct negative result). The best recipe
is that the generator uses a reasonable capitalization, and is that the generator uses a reasonable capitalization, and
when transferring the URI, that capitalization is never changed. when transferring the URI, that capitalization is never changed.
Various IRI schemes may allow the usage of International Domain Names Various IRI schemes may allow the usage of International Domain Names
(IDN) [RFC3490]. When in use in IRIs, those names SHOULD be (IDN) [RFC3490]. When in use in IRIs, those names SHOULD be
validated using the ToASCII operation defined in [RFC3490], with the validated using the ToASCII operation defined in [RFC3490], with the
flags "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing flags "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing
an invalid IDN cannot successfully be resolved. For legibility an invalid IDN cannot successfully be resolved. For legibility
purposes, IDN components of IRIs SHOULD NOT be converted into ASCII purposes, IDN components of IRIs SHOULD NOT be converted into ASCII
Compatible Encoding (ACE). However, this conversion is applied when Compatible Encoding (ACE).
mapping an IRI into a URI, see Section 3.1.
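As a rough illustration, ToASCII and ToUnicode can be tried out with the "idna" codec in the Python standard library, which implements [RFC3490]. This is an approximation: the codec does not expose the "UseSTD3ASCIIRules" and "AllowUnassigned" flags, so it should not be read as the exact validation this section asks for. The function names are hypothetical.

```python
def host_to_ascii(host):
    """Apply ToASCII to each label of a host name (yields the ACE form)."""
    return host.encode("idna").decode("ascii")


def host_to_unicode(host):
    """Apply ToUnicode to each label of an ASCII-only host name."""
    return host.encode("ascii").decode("idna")


def is_valid_idn(host):
    """Validate a host name by checking that ToASCII succeeds.

    The IRI itself keeps the non-ACE form for legibility; the ACE
    result is only used to decide validity.
    """
    try:
        host_to_ascii(host)
    except UnicodeError:
        return False
    return True
```

For the label used in the examples of Section 3.2, host_to_unicode("xn--99zt52a.example.org") returns the host name whose first label is U+7D0D U+8C46.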
5.4 Preferred Forms 5.4 Preferred Forms
The following are the preferred forms for IRIs when generated: The following are the preferred forms for IRIs when generated:
- Always provide the URI scheme in lowercase characters. - Always provide the URI scheme in lowercase characters.
- Only perform percent-escaping where it is essential. - Only perform percent-escaping where it is essential.
- Always use uppercase A-through-F characters when percent- - Always use uppercase A-through-F characters when percent-
skipping to change at page 22, line 47 skipping to change at page 22, line 50
- Always provide the hostname, if any, in the form produced when - Always provide the hostname, if any, in the form produced when
applying nameprep [RFC3491]. This in particular includes using applying nameprep [RFC3491]. This in particular includes using
lowercase characters rather than uppercase characters where lowercase characters rather than uppercase characters where
applicable. applicable.
- Where possible, provide IRI components in NFKC or NFC. - Where possible, provide IRI components in NFKC or NFC.
- Prevent /./ and /../ from appearing in non-relative URI paths. - Prevent /./ and /../ from appearing in non-relative URI paths.
- For schemes that define an empty path to be equivalent to a
path of "/", use "/".
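Some of the preferred forms are mechanical enough to sketch. The hypothetical helper below covers three items from the list: a lowercase scheme, uppercase hex digits in percent-escapes, and NFC. It does not apply nameprep to the host name, remove dot segments, or add a "/" path, since those steps need scheme-specific parsing.

```python
import re
import unicodedata

# A scheme at the very start of the string, up to and including ":".
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def to_preferred_form(iri):
    """Apply a subset of the preferred forms to an IRI (sketch only)."""
    # Lowercase the scheme, if there is one.
    iri = SCHEME.sub(lambda m: m.group(0).lower(), iri)
    # Use uppercase A through F in percent-escapes.
    iri = ESCAPE.sub(lambda m: m.group(0).upper(), iri)
    # Provide the components in NFC.
    return unicodedata.normalize("NFC", iri)
```

A generator would apply this once when the IRI is created. Intermediaries should leave the result unchanged.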
6. Use of IRIs 6. Use of IRIs
6.1 Limitations on UCS Characters Allowed in IRIs 6.1 Limitations on UCS Characters Allowed in IRIs
This section discusses limitations on characters and character This section discusses limitations on characters and character
sequences usable for IRIs. The considerations in this section are sequences usable for IRIs. The considerations in this section are
relevant when creating IRIs and when converting from URIs to IRIs. relevant when creating IRIs and when converting from URIs to IRIs.
a) The repertoire of characters allowed in each IRI component is a) The repertoire of characters allowed in each IRI component is
limited by the definition of that component. For example, the limited by the definition of that component. For example, the
skipping to change at page 24, line 45 skipping to change at page 24, line 47
For example, for a document with a URI of For example, for a document with a URI of
http://www.example.org/r%C3%A9sum%C3%A9.html, it is possible to http://www.example.org/r%C3%A9sum%C3%A9.html, it is possible to
construct a corresponding IRI (in XML notation, see Section 1.4): construct a corresponding IRI (in XML notation, see Section 1.4):
http://www.example.org/r&#xE9;sum&#xE9;.html (&#xE9; stands for the http://www.example.org/r&#xE9;sum&#xE9;.html (&#xE9; stands for the
e-acute character, and %C3%A9 is the UTF-8 encoded and escaped e-acute character, and %C3%A9 is the UTF-8 encoded and escaped
representation of that character). On the other hand, for a document representation of that character). On the other hand, for a document
with a URI of http://www.example.org/r%E9sum%E9.html, the escaped with a URI of http://www.example.org/r%E9sum%E9.html, the escaped
octets cannot be converted to actual characters in an IRI, because octets cannot be converted to actual characters in an IRI, because
the escaping is not based on UTF-8. the escaping is not based on UTF-8.
The requirement for the use of UTF-8 applies to all parts of a URI, The requirement for the use of UTF-8 applies to all parts of a URI.
with the exception of the ihostname part. However, it is possible However, it is possible that the capability of IRIs to represent a
that the capability of IRIs to represent a wide range of characters wide range of characters directly is used just in some parts of the
directly is used just in some parts of the IRI (or IRI reference). IRI (or IRI reference). The other parts of the IRI may only contain
The other parts of the IRI may only contain ASCII characters, or they ASCII characters, or they may not be based on UTF-8. They may be
may not be based on UTF-8. They may be based on another encoding, or based on another encoding, or they may directly encode raw binary
they may directly encode raw binary data (see also [RFC2397]). data (see also [RFC2397]).
For example, it is possible to have a URI reference of For example, it is possible to have a URI reference of
http://www.example.org/r%E9sum%E9.xml#r%C3%A9sum%C3%A9, where the http://www.example.org/r%E9sum%E9.xml#r%C3%A9sum%C3%A9, where the
document name is encoded in iso-8859-1 based on server settings, but document name is encoded in iso-8859-1 based on server settings, but
the fragment identifier is encoded in UTF-8 according to [XPointer]. the fragment identifier is encoded in UTF-8 according to [XPointer].
The IRI corresponding to the above URI would be (in XML notation) The IRI corresponding to the above URI would be (in XML notation)
http://www.example.org/r%E9sum%E9.xml#r&#xE9;sum&#xE9;. http://www.example.org/r%E9sum%E9.xml#r&#xE9;sum&#xE9;.
Similar considerations apply to query parts. The functionality of Similar considerations apply to query parts. The functionality of
IRIs (namely to be able to include non-ASCII characters) can only be IRIs (namely to be able to include non-ASCII characters) can only be
skipping to change at page 32, line 27 skipping to change at page 32, line 27
[RFC3490] Faltstrom, P., Hoffman, P. and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P. and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003, <http://www.ietf.org/rfc/ RFC 3490, March 2003, <http://www.ietf.org/rfc/
rfc3490.txt>. rfc3490.txt>.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)", RFC Profile for Internationalized Domain Names (IDN)", RFC
3491, March 2003. 3491, March 2003.
[RFCXXXX] Yergeau, F., "UTF-8, a transformation format of ISO [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
10646", draft-yergeau-rfc2279bis-05.txt (work in 10646", STD 63, RFC 3629, November 2003, <http://
progress), June 2003, <http://www.ietf.org/internet- www.ietf.org/rfc/rfc3629.txt>.
drafts/draft-yergeau-rfc2279bis-05.txt>.
[RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform [RFCYYYY] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", draft- Resource Identifier (URI): Generic Syntax", draft-
fielding-uri-rfc2396bis-03.txt (work in progress), June fielding-uri-rfc2396bis-03.txt (work in progress), June
2003. 2003.
[UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", [UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms",
Unicode Standard Annex #15, March 2001, <http:// Unicode Standard Annex #15, March 2001, <http://
www.unicode.org/unicode/reports/tr15/tr15-21.html>. www.unicode.org/unicode/reports/tr15/tr15-21.html>.
skipping to change at page 36, line 7 skipping to change at page 36, line 7
One Microsoft Way One Microsoft Way
Redmond, WA 98052 Redmond, WA 98052
U.S.A. U.S.A.
Phone: +1 425 882-8080 Phone: +1 425 882-8080
EMail: mailto:michelsu@microsoft.com EMail: mailto:michelsu@microsoft.com
URI: http://www.suignard.com URI: http://www.suignard.com
Full Copyright Statement Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved. Copyright (C) The Internet Society (2004). All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of Internet organizations, except as needed for the purpose of
 End of changes. 

This html diff was produced by rfcdiff 1.12, available from http://www.levkowetz.com/ietf/tools/rfcdiff/