| draft-duerst-iri-01.txt | draft-duerst-iri-02.txt | |||
|---|---|---|---|---|
| | ||||
| Network Working Group M. Duerst | Network Working Group M. Duerst | |||
| Internet-Draft W3C/Keio University | Internet-Draft W3C | |||
| Expires: December 30, 2002 M. Suignard | Expires: May 4, 2003 M. Suignard | |||
| Microsoft Corporation | Microsoft Corporation | |||
| July 1, 2002 | November 3, 2002 | |||
| Internationalized Resource Identifiers (IRI) | Internationalized Resource Identifiers (IRIs) | |||
| draft-duerst-iri-01 | draft-duerst-iri-02 | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
| Drafts. | Drafts. | |||
| skipping to change at page 1, line 33 | skipping to change at page 1, line 34 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at http:// | The list of current Internet-Drafts can be accessed at http:// | |||
| www.ietf.org/ietf/1id-abstracts.txt. | www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on December 30, 2002. | This Internet-Draft will expire on May 4, 2003. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2002). All Rights Reserved. | Copyright (C) The Internet Society (2002). All Rights Reserved. | |||
| Abstract | Abstract | |||
| This document defines a new protocol element, the Internationalized | This document defines a new protocol element, the Internationalized | |||
| Resource Identifier (IRI), as a complement to the URI [RFC2396]. An | Resource Identifier (IRI), as a complement to the URI [RFC2396]. An | |||
| IRI is a sequence of characters from the Universal Character Set | IRI is a sequence of characters from the Universal Character Set | |||
| skipping to change at page 2, line 13 | skipping to change at page 2, line 13 | |||
| distinction and to avoid incompatibilities with existing software. | distinction and to avoid incompatibilities with existing software. | |||
| Guidelines for the use and deployment of IRIs in various protocols, | Guidelines for the use and deployment of IRIs in various protocols, | |||
| formats, and software components that now deal with URIs are | formats, and software components that now deal with URIs are | |||
| provided. | provided. | |||
| NOTE | NOTE | |||
| This document is a product of the Internationalization Working Group | This document is a product of the Internationalization Working Group | |||
| (I18N WG) of the World Wide Web Consortium (W3C). For general | (I18N WG) of the World Wide Web Consortium (W3C). For general | |||
| discussion, please use the www-i18n-comments@w3.org mailing list | discussion, please use the www-international@w3.org mailing list | |||
| (publicly archived at http://lists.w3.org/Archives/Public/www-i18n- | (publicly archived at http://lists.w3.org/Archives/Public/www- | |||
| comments/). For more information on the topic of this document, | international/). For more information on the topic of this document, | |||
| please also see [W3CIRI] and [Duer01]. | please also see [W3CIRI] and [Duer01]. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1 Overview and Motivation . . . . . . . . . . . . . . . . . . . 4 | 1.1 Overview and Motivation . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.2 Applicability . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.3 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 1.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | ||||
| 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2. IRI Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . . . 6 | 2.1 Summary of IRI Syntax . . . . . . . . . . . . . . . . . . . . 7 | |||
| 2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . . . 6 | 2.2 ABNF for IRI References and IRIs . . . . . . . . . . . . . . . 7 | |||
| 2.3 IRI Equivalence and Normalization . . . . . . . . . . . . . . 9 | 2.3 IRI Equivalence and Normalization . . . . . . . . . . . . . . 10 | |||
| 3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 10 | 3. Relationship between IRIs and URIs . . . . . . . . . . . . . . 12 | |||
| 3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . . . 11 | 3.1 Mapping of IRIs to URIs . . . . . . . . . . . . . . . . . . . 12 | |||
| 3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . . . 12 | 3.2 Converting URIs to IRIs . . . . . . . . . . . . . . . . . . . 14 | |||
| 4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 13 | 4. Bidirectional IRIs for Right-to-left Languages . . . . . . . . 15 | |||
| 4.1 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . . 14 | 4.1 Logical Storage and Visual Presentation . . . . . . . . . . . 15 | |||
| 4.2 Visual Rendering of Bidi IRIs . . . . . . . . . . . . . . . . 14 | 4.2 Bidi IRI Structure . . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . . 15 | 4.3 Input of Bidi IRIs . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 5. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 5.1 Limitations on UCS Character Allowed in IRI . . . . . . . . . 15 | 5. Use of IRIs . . . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 5.2 Software Interfaces and Protocols . . . . . . . . . . . . . . 16 | 5.1 Limitations on UCS Characters Allowed in IRIs . . . . . . . . 19 | |||
| 5.3 Format of URIs and IRIs in Documents and Protocols . . . . . . 17 | 5.2 Software Interfaces and Protocols . . . . . . . . . . . . . . 20 | |||
| 5.4 Relative IRI References . . . . . . . . . . . . . . . . . . . 17 | 5.3 Format of URIs and IRIs in Documents and Protocols . . . . . . 20 | |||
| 6. URI/IRI Processing Guidelines (informative) . . . . . . . . . 17 | 5.4 Relative IRI References . . . . . . . . . . . . . . . . . . . 21 | |||
| 6.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . . . 18 | 6. URI/IRI Processing Guidelines (informative) . . . . . . . . . 21 | |||
| 6.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . . . 18 | 6.1 URI/IRI Software Interfaces . . . . . . . . . . . . . . . . . 21 | |||
| 6.3 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . . . 19 | 6.2 URI/IRI Entry . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 6.4 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . . . 19 | 6.3 URI/IRI Transfer Between Applications . . . . . . . . . . . . 22 | |||
| 6.5 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . . . 20 | 6.4 URI/IRI Generation . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 6.6 Interpretation of URIs and IRIs . . . . . . . . . . . . . . . 20 | 6.5 URI/IRI Selection . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 6.7 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . . . 21 | 6.6 Display of URIs/IRIs . . . . . . . . . . . . . . . . . . . . . 24 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 21 | 6.7 Interpretation of URIs and IRIs . . . . . . . . . . . . . . . 24 | |||
| 8. Change log . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 6.8 Upgrading Strategy . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 9. Acknowlegdements . . . . . . . . . . . . . . . . . . . . . . . 23 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 26 | |||
| References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | 8. Change log . . . . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 26 | 8.1 Changes from -01 to -02 . . . . . . . . . . . . . . . . . . . 27 | |||
| Full Copyright Statement . . . . . . . . . . . . . . . . . . . 27 | 8.2 Changes from -00 to -01 . . . . . . . . . . . . . . . . . . . 27 | |||
| 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27 | ||||
| Normative References . . . . . . . . . . . . . . . . . . . . . 28 | ||||
| Non-normative References . . . . . . . . . . . . . . . . . . . 29 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 31 | ||||
| Full Copyright Statement . . . . . . . . . . . . . . . . . . . 32 | ||||
| 1. Introduction | 1. Introduction | |||
| 1.1 Overview and Motivation | 1.1 Overview and Motivation | |||
| A URI is defined in [RFC2396] as a sequence of characters chosen from | A URI is defined in [RFC2396] as a sequence of characters chosen from | |||
| a limited subset of the repertoire of US-ASCII characters. | a limited subset of the repertoire of US-ASCII characters. | |||
| The characters in URIs are frequently used for representing words of | The characters in URIs are frequently used for representing words of | |||
| natural languages. Such usage has many advantages: such URIs are | natural languages. Such usage has many advantages: such URIs are | |||
| skipping to change at page 4, line 49 | skipping to change at page 4, line 49 | |||
| 1.2 Applicability | 1.2 Applicability | |||
| IRIs are designed to be compatible with recent recommendations on URI | IRIs are designed to be compatible with recent recommendations on URI | |||
| syntax [RFC2718]. Compatibility is provided by a well-defined | syntax [RFC2718]. Compatibility is provided by a well-defined | |||
| and deterministic mapping from the IRI character sequence to | and deterministic mapping from the IRI character sequence to | |||
| the functionally equivalent URI character sequence. Practical use of | the functionally equivalent URI character sequence. Practical use of | |||
| IRIs (or IRI references) in place of URIs (or URI references) depends | IRIs (or IRI references) in place of URIs (or URI references) depends | |||
| on the following conditions being met: | on the following conditions being met: | |||
| a. The protocol or format element used should be explicitly | a) The protocol or format element used should be explicitly | |||
| designated to carry IRIs. That is, the intent is not to | designated to carry IRIs. That is, the intent is not to | |||
| introduce IRIs into contexts that are not defined to accept | introduce IRIs into contexts that are not defined to accept | |||
| them. For example, XML schema [XMLSchema] has an explicit type | them. For example, XML schema [XMLSchema] has an explicit type | |||
| "anyURI" that designates the use of IRIs. | "anyURI" that designates the use of IRIs. | |||
| b. The protocol or format carrying the IRIs must have a mechanism | b) The protocol or format carrying the IRIs should have a | |||
| to represent the wide range of characters used in IRIs, either | mechanism to represent the wide range of characters used in | |||
| natively or by some protocol- or format-specific escaping | IRIs, either natively or by some protocol- or format-specific | |||
| mechanism (for example numeric character references in [XML1]). | escaping mechanism (for example numeric character references in | |||
| [XML1]). | ||||
| c. Either by definition for all the URIs of a specific URI | c) Either by definition for all the URIs of a specific URI scheme, | |||
| scheme, or at least for some specific URIs, the encoding of | or a specific part of a URI (Reference), such as the fragment | |||
| non-ASCII characters has to be based on UTF-8. For new URI | identifier, or at least for some specific URIs of a given | |||
| schemes, this is recommended in [RFC2718]. This allows IRIs to | scheme, the encoding of non-ASCII characters should be based on | |||
| be used with the URN syntax [RFC2141] as well as recent URL | UTF-8. For new URI schemes, this is recommended in [RFC2718]. | |||
| scheme definitions based on UTF-8, such as IMAP URLs [RFC2192] | This allows IRIs to be used with the URN syntax [RFC2141] as | |||
| and POP URLs [RFC2384]. This condition may also apply to only | well as recent URL scheme definitions based on UTF-8, such as | |||
| a piece of a URI (reference), such as the fragment identifier. | IMAP URLs [RFC2192] and POP URLs [RFC2384]. | |||
| In cases and for pieces where an encoding other than UTF-8 is used, | In cases and for pieces where an encoding other than UTF-8 is used, | |||
| and for raw binary data encoded in URIs (see [RFC2397]), the octets | and for raw binary data encoded in URIs (see [RFC2397]), the octets | |||
| have to be %-escaped. In these situations, the ability of IRIs to | have to be %-escaped. In these situations, the ability of IRIs to | |||
| directly represent a wide character repertoire cannot be used. | directly represent a wide character repertoire cannot be used. | |||
| For example, for a document with a URI of http://www.example.org/ | ||||
| r%C3%A9sum%C3%A9.html, it is possible to construct a corresponding | ||||
| IRI (in XML notation): http://www.example.org/r&#xE9;sum&#xE9;.html | ||||
| (&#xE9; stands for the e-acute character, and %C3%A9 is the UTF-8 encoded | ||||
| and escaped representation of that character). On the other hand, | ||||
| for a document with a URI of http://www.example.org/r%e9sum%e9.html, | ||||
| the escaped octets cannot be converted to actual characters in an | ||||
| IRI, because the escaping is based on iso-8859-1 rather than UTF-8. | ||||
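
The distinction drawn in the example above can be sketched in Python. The helper name `escapes_are_utf8` is hypothetical and not part of the draft. It only checks whether the %-escaped octets of a URI decode as UTF-8, which is a necessary condition for presenting them as characters in an IRI; it cannot prove that the escaping was actually intended as UTF-8.

```python
from urllib.parse import unquote_to_bytes


def escapes_are_utf8(uri: str) -> bool:
    """Return True if the octets obtained by undoing %-escapes form valid UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        # unquote_to_bytes turns each %XX escape into the raw octet it denotes.
        unquote_to_bytes(uri).decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 based escaping: the octets C3 A9 decode to U+00E9.
print(escapes_are_utf8("http://www.example.org/r%C3%A9sum%C3%A9.html"))
# iso-8859-1 based escaping: the lone octet E9 is not valid UTF-8.
print(escapes_are_utf8("http://www.example.org/r%e9sum%e9.html"))
```
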
| 1.3 Definitions | 1.3 Definitions | |||
| The following definitions are used in this document; they follow the | The following definitions are used in this document; they follow the | |||
| terms in [RFC2130], [RFC2277] and [ISO10646]: | terms in [RFC2130], [RFC2277] and [ISO10646]: | |||
| character: A member of a set of elements used for the | character: A member of a set of elements used for the | |||
| organization, control, or representation of data. For example, | organization, control, or representation of data. For example, | |||
| "LATIN CAPITAL LETTER A" names a character. | "LATIN CAPITAL LETTER A" names a character. | |||
| octet: an ordered sequence of eight bits considered as a unit | octet: an ordered sequence of eight bits considered as a unit | |||
| skipping to change at page 6, line 8 | skipping to change at page 6, line 18 | |||
| method of (unambiguously) converting a sequence of octets into | method of (unambiguously) converting a sequence of octets into | |||
| a sequence of characters. | a sequence of characters. | |||
| code point: A placeholder for a character in a character encoding, | code point: A placeholder for a character in a character encoding, | |||
| for example to encode additional characters in future versions | for example to encode additional characters in future versions | |||
| of the character encoding. | of the character encoding. | |||
| charset: The name of a parameter or attribute used to identify a | charset: The name of a parameter or attribute used to identify a | |||
| character encoding. | character encoding. | |||
| UCS: Universal Character Set; the coded character set defined by | ||||
| [ISO10646] and [UNIV3]. | ||||
| IRI reference: The term "IRI reference" denotes the common usage | ||||
| of an internationalized resource identifier. An IRI reference | ||||
| may be absolute or relative, and may have additional | ||||
| information attached in the form of a fragment identifier. | ||||
| However, the "IRI" that results from such a reference only | ||||
| includes the absolute IRI after the fragment identifier (if any) is | ||||
| removed and after any relative IRI is resolved to its absolute | ||||
| form. | ||||
| 1.4 Notation | ||||
| In text, characters outside US-ASCII are sometimes referenced by | ||||
| using a prefix of 'U+', followed by four to six hexadecimal digits. | ||||
| To represent characters outside US-ASCII in examples, this document | ||||
| uses two notations called 'XML Notation' and 'Bidi Notation'. | ||||
| XML Notation uses leading '&#x', trailing ';', and the hexadecimal | ||||
| number of the character in the UCS in between. Example: &#x42F; stands | ||||
| for CYRILLIC CAPITAL LETTER YA. In this notation, an actual '&' is | ||||
| denoted by '&amp;'. | ||||
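
As an illustration of XML Notation, the following Python sketch converts between actual characters and the notation. The function names `to_xml_notation` and `from_xml_notation` are assumptions made for this example; the draft defines only the notation itself, not any API.

```python
import re


def to_xml_notation(text: str) -> str:
    """Write characters outside US-ASCII as &#xHHHH; and a literal '&' as &amp;."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ord(ch) < 0x80:
            out.append(ch)
        else:
            # Hexadecimal number of the character in the UCS.
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)


def from_xml_notation(text: str) -> str:
    """Inverse of to_xml_notation for the two constructs it produces."""
    # Resolve numeric references first, then the escaped ampersand, so that
    # "&amp;#x42F;" comes back as the literal text "&#x42F;".
    text = re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), text)
    return text.replace("&amp;", "&")


print(to_xml_notation("\u042f"))               # CYRILLIC CAPITAL LETTER YA
print(from_xml_notation("r&#xE9;sum&#xE9;"))
```
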
| Bidi Notation is used for bidirectional examples: lower case ASCII | ||||
| letters stand for Latin letters or other letters that are written | ||||
| left-to-right, whereas upper case letters represent Arabic or Hebrew | ||||
| letters that are written right-to-left. | ||||
| 2. IRI Syntax | 2. IRI Syntax | |||
| This section defines the syntax of Internationalized Resource | This section defines the syntax of Internationalized Resource | |||
| Identifiers (IRIs). | Identifiers (IRIs). | |||
| As with URIs, an IRI is defined as a sequence of characters, not as a | As with URIs, an IRI is defined as a sequence of characters, not as a | |||
| sequence of octets. This definition accommodates the fact that IRIs | sequence of octets. This definition accommodates the fact that IRIs | |||
| may be written on paper or read over the radio as well as being | may be written on paper or read over the radio as well as being | |||
| transmitted over the network. The same IRI may be represented as | transmitted over the network. The same IRI may be represented as | |||
| different sequences of octets in different protocols or documents if | different sequences of octets in different protocols or documents if | |||
| these protocols or documents use different character encodings and/or | these protocols or documents use different character encodings (and/ | |||
| transfer encodings. Using the same character encoding as the | or transfer encodings). Using the same character encoding as the | |||
| containing protocol or document assures that the characters in the | containing protocol or document assures that the characters in the | |||
| IRI can be handled (searched, converted, displayed,...) in the same | IRI can be handled (searched, converted, displayed,...) in the same | |||
| way as the rest of the protocol or document. | way as the rest of the protocol or document. | |||
| 2.1 Summary of IRI Syntax | 2.1 Summary of IRI Syntax | |||
| IRIs are defined similarly to URIs in [RFC2396] (as modified by | IRIs are defined similarly to URIs in [RFC2396] (as modified by | |||
| [RFC2732] and [IDNURI]), but the class of unreserved characters is | [RFC2732] and [IDNURI]), but the class of unreserved characters is | |||
| extended by adding all the characters of the UCS (Universal Character | extended by adding the characters of the UCS (Universal Character | |||
| Set, [ISO10646]) beyond U+0080, subject to the limitations given in | Set, [ISO10646]) beyond U+0080, subject to the limitations given in | |||
| Section 5.1. | the syntax rules below and in Section 5.1. | |||
| Otherwise, the syntax and use of components and reserved characters | Otherwise, the syntax and use of components and reserved characters | |||
| is the same as that in [RFC2396]. All the operations defined in | is the same as that in [RFC2396]. All the operations defined in | |||
| [RFC2396], such as the resolution of relative URIs, can be applied to | [RFC2396], such as the resolution of relative URIs, can be applied to | |||
| IRIs by IRI-processing software in exactly the same way as this is | IRIs by IRI-processing software in exactly the same way as this is | |||
| done to URIs by URI-processing software. | done to URIs by URI-processing software. | |||
| Note: [RFC2396]: "Uniform Resource Identifiers (URI): Generic Syntax" | ||||
| is being revised as [RFC2396bis]. The syntax used in this document | ||||
| includes bug fixes from [RFC2396bis]. | ||||
| Characters outside the US-ASCII range MUST NOT be used for | Characters outside the US-ASCII range MUST NOT be used for | |||
| syntactical purposes such as to delimit components in newly defined | syntactical purposes such as to delimit components in newly defined | |||
| schemes. As an example, it is not allowed to use U+00A2, CENT SIGN, | schemes. As an example, it is not allowed to use U+00A2, CENT SIGN, | |||
| as a delimiter, because it is in the 'iunreserved' category, in the | as a delimiter in IRIs, because it is in the 'iunreserved' category, | |||
| same way as it is not possible to use '-' as a delimiter, because it | in the same way as it is not possible to use '-' as a delimiter, | |||
| is in the 'unreserved' category. | because it is in the 'unreserved' category in URIs. | |||
| 2.2 ABNF for IRI References and IRIs | 2.2 ABNF for IRI References and IRIs | |||
| While it might be possible to define IRI references and IRIs merely | While it might be possible to define IRI references and IRIs merely | |||
| by their transformation to URIs, they can also be accepted and | by their transformation to URI references and URIs, they can also be | |||
| processed directly. Therefore, an ABNF definition for IRI references | accepted and processed directly. Therefore, an ABNF definition for | |||
| (which are the most general concept and the start of the grammar) and | IRI references (which are the most general concept and the start of | |||
| IRIs is given here. | the grammar) and IRIs is given here. The syntax of this ABNF is | |||
| described in [RFC2234]. Character numbers are taken from the UCS, | ||||
| without implying any actual binary encoding. | ||||
| The following rules are different from [RFC2396]: | The following rules are different from [RFC2396]: | |||
| IRI-reference = [ absoluteIRI | relativeIRI ] [ "#" ifragment ] | absolute-IRI-reference = absolute-IRI [ "#" ifragment ] | |||
| absoluteIRI = scheme ":" ( ihier_part | iopaque_part ) | ||||
| relativeIRI = ( inet_path | iabs_path | irel_path ) | IRI-reference = [ absolute-IRI / relative-IRI ] | |||
| [ "#" ifragment ] | ||||
| absolute-IRI = scheme ":" ( ihier-part / iopaque-part ) | ||||
| relative-IRI = [ inet-path / iabs-path / irel-path ] | ||||
| [ "?" iquery ] | [ "?" iquery ] | |||
| ihier_part = ( inet_path | iabs_path ) [ "?" iquery ] | ||||
| iopaque_part = iric_no_slash *iric | ihier-part = [ inet-path / iabs-path ] [ "?" iquery ] | |||
| iric_no_slash = iunreserved | escaped | ";" | "?" | ":" | "@" | | iopaque-part = iric-no-slash *iric | |||
| "&" | "=" | "+" | "$" | "," | ||||
| inet_path = "//" iauthority [ iabs_path ] | iric-no-slash = iunreserved / escaped / "[" / "]" / ";" / "?" / | |||
| iabs_path = "/" ipath_segments | ":" / "@" / "&" / "=" / "+" / "$" / "," | |||
| irel_path = irel_segment [ iabs_path ] | ||||
| irel_segment = 1*( iunreserved | escaped | | inet-path = "//" iauthority [ iabs-path ] | |||
| ";" | "@" | "&" | "=" | "+" | "$" | "," ) | iabs-path = "/" ipath-segments | |||
| iauthority = iserver | ireg_name | irel-path = irel-segment [ iabs-path ] | |||
| ireg_name = 1*( iunreserved | escaped | "$" | "," | | ||||
| ";" | ":" | "@" | "&" | "=" | "+" ) | irel-segment = 1*( iunreserved / escaped / ";" / | |||
| iserver = [ [ userinfo "@" ] ihostport ] | "@" / "&" / "=" / "+" / "$" / "," ) | |||
| iuserinfo = *( iunreserved | escaped | | ||||
| ";" | ":" | "&" | "=" | "+" | "$" | "," ) | iauthority = iserver / ireg-name | |||
| ireg-name = 1*( iunreserved / escaped / ";" / | ||||
| ":" / "@" / "&" / "=" / "+" / "$" / "," ) | ||||
| iserver = [ [ iuserinfo "@" ] ihostport ] | ||||
| iuserinfo = *( iunreserved / escaped / ";" / | ||||
| ":" / "&" / "=" / "+" / "$" / "," ) | ||||
| ihostport = ihost [ ":" port ] | ihostport = ihost [ ":" port ] | |||
| ihost = ihostname | IPv4address | IPv6reference | ihost = IPv6reference / IPv4address / ihostname | |||
| ihostname = << as specified by [IDNA] >> | ||||
| ipath_segments = isegment *( "/" isegment ) | ihostname = << as specified by [RFCXXXX] >> | |||
| isegment = *ipchar *( ";" iparam ) | ||||
| iparam = *ipchar | ipath = [ iabs-path / iopaque-part ] | |||
| ipchar = iunreserved | escaped | | ipath-segments = isegment *( "/" isegment ) | |||
| ":" | "@" | "&" | "=" | "+" | "$" | "," | isegment = *ipchar | |||
| iquery = *iric | ||||
| ifragment = *iric | ipchar = iunreserved / escaped / ";" / | |||
| iric = reserved | iunreserved | escaped | ":" / "@" / "&" / "=" / "+" / "$" / "," | |||
| iunreserved = ichar | unreserved | iquery = *( ipchar / "/" / "?" ) | |||
| ichar = << allowed character of the UCS [ISO10646] >> | space | delims | unwise | ifragment = *( ipchar / "/" / "?" ) | |||
| iric = reserved / iunreserved / escaped | ||||
| iunreserved = ichar / unreserved | ||||
| ichar = idelims / ucschar / " " / "{" / "}" / "|" | ||||
| / "\" / "^" / "`" | ||||
| idelims = "<" / ">" / DQUOTE | ||||
| ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF | ||||
| / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD | ||||
| / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD | ||||
| / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD | ||||
| / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD | ||||
| / %xD0000-DFFFD / %xE1000-EFFFD | ||||
| Note that the space character and various delimiters are allowed in | Note that the space character and various delimiters are allowed in | |||
| IRIs and IRI references. This is further discussed in Section 5.1. | IRIs and IRI references. This is further discussed in Section 5.1. | |||
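
The allowed UCS ranges (listed as U+ ranges in -01 and as the 'ucschar' rule in -02) can be turned into a simple membership test. This is a minimal sketch; the names `UCSCHAR_RANGES` and `is_ucschar` are hypothetical, and the test covers only the UCS ranges, not the additional ASCII characters (space, delimiters, and so on) that 'ichar' also admits.

```python
# Inclusive code point ranges, transcribed from the range list in the draft.
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    # Planes 1 through 13: %x10000-1FFFD ... %xD0000-DFFFD
    + [(plane << 16, (plane << 16) | 0xFFFD) for plane in range(0x1, 0xE)]
    # Plane 14 starts at E1000 rather than E0000.
    + [(0xE1000, 0xEFFFD)]
)


def is_ucschar(ch: str) -> bool:
    """Return True if the single character `ch` falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


print(is_ucschar("\u00e9"))   # LATIN SMALL LETTER E WITH ACUTE
print(is_ucschar("\ufdd0"))   # falls in the gap between FDCF and FDF0
```
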
| The following describe the allowed characters of the UCS [ISO10646] | The following are the same as [RFC2396bis]: | |||
| using the UCS-4 encoding notation for these characters: | ||||
| U+00A0-U+D7FF | scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) | |||
| U+F900-U+FDCF | port = *DIGIT | |||
| U+FDF0-U+FFEF | alphanum = ALPHA / DIGIT | |||
| U+10000-U+1FFFD | ||||
| U+20000-U+2FFFD | IPv4address = dec-octet 3( "." dec-octet ) | |||
| U+30000-U+3FFFD | dec-octet = DIGIT / ; 0-9 | |||
| U+40000-U+4FFFD | ( %x31-39 DIGIT ) / ; 10-99 | |||
| U+50000-U+5FFFD | ( "1" 2DIGIT ) / ; 100-199 | |||
| U+60000-U+6FFFD | ( "2" %x30-34 DIGIT ) / ; 200-249 | |||
| U+70000-U+7FFFD | ( "25" %x30-35 ) ; 250-255 | |||
| U+80000-U+8FFFD | ||||
| U+90000-U+9FFFD | ||||
| U+A0000-U+AFFFD | ||||
| U+B0000-U+BFFFD | ||||
| U+C0000-U+CFFFD | ||||
| U+D0000-U+DFFFD | ||||
| U+E1000-U+EFFFD | ||||
| The following are the same as [RFC2396] as modified by [RFC2732]: | ||||
| reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | | ||||
| "$" | "," | "[" | "]" | ||||
| unreserved = alphanum | mark | ||||
| mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | | ||||
| "(" | ")" | ||||
| escaped = "%" hex hex | ||||
| hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | | ||||
| "a" | "b" | "c" | "d" | "e" | "f" | ||||
| IPv6reference = "[" IPv6address "]" | IPv6reference = "[" IPv6address "]" | |||
| IPv6address = hexpart [ ":" IPv4address ] | IPv6address = ( 7( h4 ":" ) h4 ) / | |||
| IPv4address = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT | ( "::" 0*6( h4 ":" ) [ h4 ] ) / | |||
| hexpart = hexseq | hexseq "::" [ hexseq ] | "::" | ( h4 "::" 0*5( h4 ":" ) [ h4 ] ) / | |||
| [ hexseq ] | ( h4 ":" h4 "::" 0*4( h4 ":" ) [ h4 ] ) / | |||
| hexseq = hex4 *( ":" hex4) | ( h4 2( ":" h4 ) "::" 0*3( h4 ":" ) [ h4 ] ) / | |||
| hex4 = 1*4hex | ( h4 3( ":" h4 ) "::" 0*2( h4 ":" ) [ h4 ] ) / | |||
| port = *DIGIT | ( h4 4( ":" h4 ) "::" 0*1( h4 ":" ) [ h4 ] ) / | |||
| scheme = alpha *( alpha | digit | "+" | "-" | "." ) | ( 6( h4 ":" ) IPv4address )/ | |||
| alphanum = alpha | digit | ( "::" 0*5( h4 ":" ) IPv4address )/ | |||
| alpha = lowalpha | upalpha | ( h4 "::" 0*4( h4 ":" ) IPv4address )/ | |||
| lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | | ( h4 ":" h4 "::" 0*3( h4 ":" ) IPv4address )/ | |||
| "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | | ( h4 2( ":" h4 ) "::" 0*2( h4 ":" ) IPv4address )/ | |||
| "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" | ( h4 3( ":" h4 ) "::" 0*1( h4 ":" ) IPv4address ) | |||
| upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | | ||||
| "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | | h4 = 1*4HEXDIG | |||
| "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | reserved = "[" / "]" / ";" / "/" / "?" / | |||
| digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | | ":" / "@" / "&" / "=" / "+" / "$" / "," | |||
| "8" | "9" | unreserved = ALPHA / DIGIT / mark | |||
| space = << US-ASCII coded character 20 hexadecimal >> | mark = "-" / "_" / "." / "!" / "~" / "*" / "'" / | |||
| delims = "<" | ">" | "#" | "%" | <"> | "(" / ")" | |||
| unwise = "{" | "}" | "|" | "\" | "^" | "`" | ||||
| escaped = "%" HEXDIG HEXDIG | ||||
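
The 'dec-octet' and 'IPv4address' rules of the -02 grammar can be approximated with a regular expression. This is a sketch under stated assumptions: the names `DEC_OCTET` and `is_ipv4address` are invented for the example, and the 100-199 alternative is read as exactly two digits after the leading "1", as its comment indicates.

```python
import re

# One alternative per line of the dec-octet rule, longest first so that the
# regex engine does not stop early on a shorter match.
DEC_OCTET = (
    r"(?:25[0-5]"        # 250-255
    r"|2[0-4][0-9]"      # 200-249
    r"|1[0-9]{2}"        # 100-199
    r"|[1-9][0-9]"       # 10-99
    r"|[0-9])"           # 0-9
)

# IPv4address = dec-octet 3( "." dec-octet )
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")


def is_ipv4address(text: str) -> bool:
    """Return True if `text` as a whole matches the IPv4address rule."""
    return IPV4ADDRESS.fullmatch(text) is not None


print(is_ipv4address("192.0.2.1"))
print(is_ipv4address("256.0.0.1"))
```

Unlike the -01 rule (1*3DIGIT per field), this form rejects values above 255 and fields with leading zeros.
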
| 2.3 IRI Equivalence and Normalization | 2.3 IRI Equivalence and Normalization | |||
| There is no general rule or procedure to decide whether two arbitrary | There is no general rule or procedure to decide whether two arbitrary | |||
| IRIs are equivalent or not (i.e. refer to the same resource or not). | IRIs are equivalent or not (i.e. refer to the same resource or not). | |||
| Two IRIs that look almost the same may refer to different resources. | Two IRIs that look almost the same may refer to different resources. | |||
| Two IRIs that look completely different may refer to, and resolve to, | Two IRIs that look completely different may refer to, and resolve to, | |||
| the same resource. | the same resource. | |||
| In some scenarios, such as XML Namespaces ([XMLNamespace]), a | In some scenarios, such as XML Namespaces ([XMLNamespace]), a | |||
| definite answer to the question of IRI equivalence is needed that is | definite answer to the question of IRI equivalence is needed that is | |||
| independent of the scheme used and always can be calculated quickly | independent of the scheme used and always can be calculated quickly | |||
| and without accessing a network. In such cases, two IRIs SHOULD be | and without accessing a network. In such cases, two IRIs SHOULD be | |||
| defined as equivalent if and only if they are character-by-character | defined as equivalent if and only if they are character-by-character | |||
| equivalent (which is the same as byte-by-byte equivalent if the | equivalent. This is the same as being byte-by-byte equivalent if the | |||
| character encoding for both IRIs is the same). In such a case, the | character encoding for both IRIs is the same. As an example, | |||
| comparison function MUST NOT map the IRIs to URIs. | http://example.org/~user, http://example.org/%7euser, and | |||
| http://example.org/%7Euser would not be equivalent. In such a case, | ||||
| the comparison function MUST NOT map the IRIs to URIs. | ||||
| It follows from the above that IRIs SHOULD NOT be modified when being | It follows from the above that IRIs SHOULD NOT be modified when being | |||
| transported. | transported. | |||
| For actual resolution, differences in escaping (except for the | For actual resolution, differences in escaping (except for the | |||
| escaping of reserved characters) MUST always result in the same | escaping of reserved characters) MUST always result in the same | |||
| resource. For example, foo://example.com/XML, foo://example.com/ | resource. For example, http://example.org/~user, | |||
| XM%4C, and foo://example.com/XM%4c must resolve to the same resource. | http://example.org/%7euser and http://example.org/%7Euser must | |||
| If this kind of equivalence is to be tested, the escaping of both | resolve to the same resource. If this kind of equivalence is to be | |||
| IRIs to be compared has to be aligned, for example by converting both | tested, the escaping of both IRIs to be compared has to be aligned, | |||
| IRIs to URIs (see Section 3.1) and making sure that the case of the | for example by converting both IRIs to URIs (see Section 3.1) and | |||
| hexadecimal characters in the %-escape is always the same. Such | making sure that the case of the hexadecimal characters in the %- | |||
| conversions MUST only be done on the fly, without changing the | escape is always the same. Such conversions MUST only be done on the | |||
| original IRI. | fly, without changing the original IRI. | |||
| Specific schemes and resolution mechanisms may define additional | Specific schemes and resolution mechanisms may define additional | |||
| equivalences. For a specific scheme, two IRIs that e.g. differ only | equivalences. For a specific scheme, two IRIs that e.g. differ only | |||
| by case may be equivalent. However, this document does not deal with | by case may be equivalent. However, this document does not deal with | |||
| scheme-specific issues. | scheme-specific issues. | |||
| The Unicode Standard [UNIV3] defines various equivalences between | The Unicode Standard [UNIV3] defines various equivalences between | |||
| sequences of characters for various purposes. Unicode Standard Annex | sequences of characters for various purposes. Unicode Standard Annex | |||
| #15 [UNI15] defines various Normalization Forms for these | #15 [UNI15] defines various Normalization Forms for these | |||
| equivalences. IRIs SHOULD be created using the Normalization Form C | equivalences. IRIs SHOULD be created using Normalization Form C | |||
| (NFC). When an IRI is created in a UCS-based encoding without the | (NFC). Equivalence of IRIs MUST rely on the IRIs being appropriately | |||
| end-user being aware of or interested in Unicode normalization | pre-normalized, rather than applying normalization, except when | |||
| issues, the IRI MUST be created using the normalization form NFC. | ||||
| Equivalence of IRIs MUST rely on the IRIs being appropriately pre- | ||||
| normalized, rather than applying normalization, except when | ||||
| converting from a non-UCS-based encoding to a UCS-based encoding, | converting from a non-UCS-based encoding to a UCS-based encoding, | |||
| where a normalizing transcoder using NFC MUST be used. | where a normalizing transcoder using NFC MUST be used. | |||
| As an example, http://www.example.org/r&#xE9;sum&#xE9;.html (in XML | ||||
| Notation) is in NFC. On the other hand, http://www.example.org/ | ||||
| re&#x301;sume&#x301;.html is not in NFC. The former uses precombined | ||||
| e-acute characters, the latter uses 'e' characters followed by | ||||
| combining acute accents, both are defined as canonically equivalent | ||||
| in [UNIV3]. | ||||
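
The NFC example above can be checked with a short Python sketch. This is an illustration only, not part of either draft: the helper name is_nfc is hypothetical, and the check relies on the standard library's unicodedata module rather than on anything the draft specifies.

```python
import unicodedata

# The two spellings from the example: precombined e-acute (U+00E9) versus
# 'e' followed by a combining acute accent (U+0301).
precomposed = "http://www.example.org/r\u00e9sum\u00e9.html"
decomposed = "http://www.example.org/re\u0301sume\u0301.html"


def is_nfc(iri: str) -> bool:
    """Hypothetical helper: true if the string is already in NFC."""
    return unicodedata.normalize("NFC", iri) == iri


# The two strings are canonically equivalent but not character-by-character
# equal, which is why comparison has to rely on pre-normalized IRIs.
print(is_nfc(precomposed), is_nfc(decomposed), precomposed == decomposed)
```

Normalizing the decomposed form with NFC yields the precomposed form, but a comparison function following the text above would not apply that normalization itself.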
| Various IRI schemes may allow the usage of International Domain Names | Various IRI schemes may allow the usage of International Domain Names | |||
| (IDN) [IDNA]. When in use in IRIs, those names SHOULD be validated | (IDN) [RFCXXXX]. When in use in IRIs, those names SHOULD be | |||
| using the rules defined by [Nameprep]. An IRI containing an invalid | validated using the ToASCII operation defined in [RFCXXXX], with the | |||
| IDN cannot successfully be resolved. For legibility purposes, IDN | flags "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing | |||
| components of IRIs SHOULD NOT be converted into ASCII Compatible | an invalid IDN cannot successfully be resolved. For legibility | |||
| Encoding (ACE). However, this conversion may be applied when mapping | purposes, IDN components of IRIs SHOULD NOT be converted into ASCII | |||
| an IRI into a URI, see Section 3.1. | Compatible Encoding (ACE). However, this conversion may be applied | |||
| when mapping an IRI into a URI, see Section 3.1. | ||||
| 3. Relationship between IRIs and URIs | 3. Relationship between IRIs and URIs | |||
| IRIs are meant to replace URIs in identifying resources for | IRIs are meant to replace URIs in identifying resources for | |||
| protocols, formats and software components which use a UCS-based | protocols, formats and software components which use a UCS-based | |||
| character repertoire. These protocols and components may never need | character repertoire. These protocols and components may never need | |||
| to use URIs directly, especially when the resource identifier is used | to use URIs directly, especially when the resource identifier is used | |||
| simply for identification purposes. However, when the resource | simply for identification purposes. However, when the resource | |||
| identifier is used for resource retrieval, it is in many cases | identifier is used for resource retrieval, it is in many cases | |||
| necessary to determine the associated URI because most retrieval | necessary to determine the associated URI because most retrieval | |||
| skipping to change at page 11, line 27 | skipping to change at page 12, line 38 | |||
| a) Syntactical: Many URI schemes and components define additional | a) Syntactical: Many URI schemes and components define additional | |||
| syntactical restrictions not captured in Section 2.2. Such | syntactical restrictions not captured in Section 2.2. Such | |||
| restrictions can be applied to IRIs by noting that IRIs are | restrictions can be applied to IRIs by noting that IRIs are | |||
| only valid if they map to syntactically valid URIs. This means | only valid if they map to syntactically valid URIs. This means | |||
| that such syntactical restrictions do not have to be defined | that such syntactical restrictions do not have to be defined | |||
| again on the IRI level. | again on the IRI level. | |||
| b) Interpretational: URIs identify resources in various ways. | b) Interpretational: URIs identify resources in various ways. | |||
| IRIs also identify resources. When the IRI is used simply for | IRIs also identify resources. When the IRI is used simply for | |||
| indentification purposes, it is not necessary to map the IRI to | identification purposes, it is not necessary to map the IRI to | |||
| a URI (see Section 2.3). However, when an IRI is used for | a URI (see Section 2.3). However, when an IRI is used for | |||
| resource retrieval, the resource that the IRI locates is the | resource retrieval, the resource that the IRI locates is the | |||
| same as the one located by the URI obtained after converting | same as the one located by the URI obtained after converting | |||
| the IRI according to the procedure defined here. This means | the IRI according to the procedure defined here. This means | |||
| that there is no need to define resolution again on the IRI | that there is no need to define resolution separately on the | |||
| level. | IRI level. | |||
| This mapping is accomplished in two steps. | This mapping is accomplished in two steps. | |||
| Step 1) This step generates a UCS-based encoding from the original | Step 1) This step generates a UCS-based encoding from the original | |||
| IRI format. This step has three variants, depending on the | IRI format. This step has three variants, depending on the | |||
| form of the input. | form of the input. | |||
| Variant A) If the IRI is written on paper or read out loud, | Variant A) If the IRI is written on paper or read out loud, | |||
| or otherwise represented as a sequence of characters | or otherwise represented as a sequence of characters | |||
| independent of any encoding: Represent the IRI as a | independent of any encoding: Represent the IRI as a | |||
| skipping to change at page 12, line 19 | skipping to change at page 13, line 30 | |||
| Step 2) For each character that is disallowed in URI references, | Step 2) For each character that is disallowed in URI references, | |||
| apply steps 1) through 3) below. The disallowed characters | apply steps 1) through 3) below. The disallowed characters | |||
| consist of all non-ASCII characters, plus the excluded | consist of all non-ASCII characters, plus the excluded | |||
| characters listed in Section 2.4 of [RFC2396], except for the | characters listed in Section 2.4 of [RFC2396], except for the | |||
| number sign (#) and percent sign (%) and the square bracket | number sign (#) and percent sign (%) and the square bracket | |||
| characters re-allowed in [RFC2732]. | characters re-allowed in [RFC2732]. | |||
| 1) Convert the character to a sequence of one or more octets | 1) Convert the character to a sequence of one or more octets | |||
| using UTF-8 [RFC2279]. | using UTF-8 [RFC2279]. | |||
| 2) Convert each octet to %HH, where HH is the hexadecimal | 2) Convert each octet to %hh, where hh is the hexadecimal | |||
| notation of the octet value. Note: This is identical to | notation of the octet value. Note: This is identical to | |||
| the escaping mechanism in Section 2.4.1 of [RFC2396]. | the escaping mechanism in Section 2.4.1 of [RFC2396]. | |||
| Note: To reduce variability, the hexadecimal notation | ||||
| SHOULD use lower case letters. | ||||
| 3) Replace the original character by the resulting character | 3) Replace the original character by the resulting character | |||
| sequence. | sequence. | |||
| Note that in this process (in step 2.3), characters allowed in URI | Note that in this process (in step 2.3), characters allowed in URI | |||
| references and existing escape sequences are not escaped further. | references and existing escape sequences are not escaped further. | |||
| (This mapping is similar to, but different from, the escaping applied | (This mapping is similar to, but different from, the escaping applied | |||
| when including arbitrary content into some part of a URI.) | when including arbitrary content into some part of a URI.) For | |||
| example, an IRI of | ||||
| http://www.example.org/red%09ros&#xE9;#<red> (in XML notation) is | ||||
| converted to | ||||
| http://www.example.org/red%09ros%c3%a9#%3cred%3e, not to something | ||||
| like | ||||
| http%3a%2f%2fwww.example.org%2fred%2509ros%c3%a9%23red. | ||||
| Note that some older software transcoding to UTF-8 may produce | ||||
| illegal output for some input, in particular for characters outside | ||||
| the BMP (Basic Multilingual Plane). As an example, for the following | ||||
| IRI with non-BMP characters (in XML Notation): | ||||
| http://example.com/&#x10300;&#x10301;&#x10302; | ||||
| (the first three letters of the Old Italic alphabet) the correct | ||||
| conversion to a URI is: | ||||
| http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82 | ||||
| The above mapping produces a URI fully conforming to [RFC2396] (as | The above mapping produces a URI fully conforming to [RFC2396] (as | |||
| amended by [RFC2732] and [IDNURI]) out of each IRI. The mapping is | amended by [RFC2732] and [IDNURI]) out of each IRI. The mapping is | |||
| also an identity transformation for URIs and is idempotent -- | also an identity transformation for URIs and is idempotent -- | |||
| applying the mapping a second time will not change anything. Every | applying the mapping a second time will not change anything. Every | |||
| URI is therefore by definition an IRI. | URI is therefore by definition an IRI. | |||
| Note: For backwards compatibility with infrastructure that does not | Note: For backwards compatibility with infrastructure that does not | |||
| implement the updates of [IDNURI], converters MAY also convert the | implement the updates of [IDNURI], converters MAY also convert the | |||
| 'ihostname' part of an IRI using the ToASCII operation specified in | 'ihostname' part of an IRI using the ToASCII operation specified in | |||
| Section 4.1 of [IDNA] between Step 1 and Step 2. Note that the | Section 4.1 of [RFCXXXX] between Step 1 and Step 2. Note that the | |||
| ToASCII operation may fail. Note that Internationalized Domain Names | ToASCII operation may fail. Note that Internationalized Domain Names | |||
| may be contained in parts of an IRI other than the 'ihostname' part. | may be contained in parts of an IRI other than the 'ihostname' part. | |||
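
As an illustration only, Step 2 of the IRI-to-URI mapping might be sketched in Python as follows. The names iri_to_uri_step2 and _EXCLUDED_ASCII are hypothetical. The sketch assumes that Step 1 has already produced a Python str, that the excluded ASCII characters are the control, space, delims and unwise sets of RFC 2396 minus '#', '%', '[' and ']', and that lower-case hexadecimal is wanted, as the note in the -02 text suggests. It does not perform the optional ToASCII conversion of the 'ihostname' part.

```python
# ASCII characters that stay disallowed: control characters and space
# (0x00-0x20), DEL, and the RFC 2396 "delims" and "unwise" characters,
# except '#', '%', '[' and ']'.
_EXCLUDED_ASCII = set(map(chr, range(0x00, 0x21))) | {"\x7f"} | set('<>"{}|\\^`')


def iri_to_uri_step2(iri: str) -> str:
    """Escape every disallowed character as %hh over its UTF-8 octets."""
    out = []
    for ch in iri:
        if ord(ch) > 0x7F or ch in _EXCLUDED_ASCII:
            # Steps 1) and 2): UTF-8 octets, each written as %hh.
            out.append("".join("%%%02x" % octet for octet in ch.encode("utf-8")))
        else:
            # Allowed characters, including '%' of existing escapes, are kept.
            out.append(ch)
    return "".join(out)


print(iri_to_uri_step2("http://www.example.org/red%09ros\u00e9#<red>"))
```

Because '%' is never escaped, existing escape sequences pass through unchanged, so applying the function to its own output changes nothing.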
| 3.2 Converting URIs to IRIs | 3.2 Converting URIs to IRIs | |||
| In some situations, it may be desirable to try to convert a URI into | In some situations, it may be desirable to try to convert a URI into | |||
| an equivalent IRI. This section gives a procedure to do such a | an equivalent IRI. This section gives a procedure to do such a | |||
| conversion. The conversion described in this section will always | conversion. The conversion described in this section will always | |||
| give an IRI which maps back to the URI that was used as an input for | result in an IRI which maps back to the URI that was used as an input | |||
| the conversion, but perhaps not exactly the original IRI (if there | for the conversion (except for potential case differences in escape | |||
| ever was one). | sequences). However, the IRI resulting from this conversion may not | |||
| be exactly the same as the original IRI (if there ever was one). | ||||
| URI to IRI conversion removes escape sequences, but not all escaping | URI to IRI conversion removes escape sequences, but not all escaping | |||
| can be eliminated. There are many reasons for this: | can be eliminated. There are several reasons for this: | |||
| a. Some escape sequences are necessary to distinguish escaped and | a) Some escape sequences are necessary to distinguish escaped and | |||
| unescaped uses of reserved characters. | unescaped uses of reserved characters. | |||
| b. Some escape sequences cannot be interpreted as sequences of | b) Some escape sequences cannot be interpreted as sequences of | |||
| UTF-8 octets. | UTF-8 octets. | |||
| (Note: Due to the regularities in the octet patterns of UTF-8, | (Note: Due to the regularities in the octet patterns of UTF-8, | |||
| there is a very high probability, but no guarantee, that escape | there is a very high probability, but no guarantee, that escape | |||
| sequences that can be interpreted as sequences of UTF-8 octets | sequences that can be interpreted as sequences of UTF-8 octets | |||
| actually originated from UTF-8. For a detailed discussion, see | actually originated from UTF-8. For a detailed discussion, see | |||
| [Duer97].) | [Duer97].) | |||
| c. The conversion may result in a character that is not | c) The conversion may result in a character that is not | |||
| appropriate in an IRI. See Section 5.1 for further details. | appropriate in an IRI. See Section 5.1 for further details. | |||
| Conversion from a URI to an IRI is done using the following steps (or | Conversion from a URI to an IRI is done using the following steps (or | |||
| any other algorithm that produces the same result): | any other algorithm that produces the same result): | |||
| 1) Represent the URI as a sequence of octets in US-ASCII. | 1) Represent the URI as a sequence of octets in US-ASCII. | |||
| 2) Convert all hexadecimal escapes (% followed by two hexadecimal | 2) Convert all hexadecimal escapes (% followed by two hexadecimal | |||
| digits) of %80 and higher to the corresponding octets. | digits) except those corresponding to '#' and '%' and | |||
| characters in 'reserved', to the corresponding octets. | ||||
| 3) Re-escape any octets that are not part of a strictly legal UTF- | 3) Re-escape any octets that are not part of a strictly legal UTF- | |||
| 8 octet sequence. | 8 octet sequence. | |||
| 4) Re-escape all octets that in UTF-8 represent characters that | 4) Re-escape all octets that in UTF-8 represent characters that | |||
| are not appropriate according to Section 5.1. | are not appropriate according to Section 5.1. | |||
| 5) Interpret the resulting octet sequence as a sequence of | 5) Interpret the resulting octet sequence as a sequence of | |||
| characters encoded in UTF-8. | characters encoded in UTF-8. | |||
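
The five conversion steps, in their -02 form, can be sketched in Python as below. This is a rough sketch under stated assumptions, not a reference implementation. The names uri_to_iri and is_appropriate are hypothetical, and is_appropriate is only a stand-in for the Section 5.1 rules (here it rejects space and non-printable characters). The reserved set is the one from RFC 2396, and re-escaped octets are written in lower case.

```python
import re

_RESERVED = set(";/?:@&=+$,")           # 'reserved' in RFC 2396, Section 2.2
_KEEP_ESCAPED = _RESERVED | {"#", "%"}  # escapes of these are left alone
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def is_appropriate(ch: str) -> bool:
    """Hypothetical stand-in for Section 5.1: no space, no non-printables."""
    return ch.isprintable() and ch != " "


def _unescape(match):
    octet = bytes([int(match.group(1), 16)])
    if octet < b"\x80" and octet.decode("ascii") in _KEEP_ESCAPED:
        return match.group(0)           # keep the escape as written
    return octet


def uri_to_iri(uri: str) -> str:
    octets = _ESCAPE.sub(_unescape, uri.encode("ascii"))      # steps 1 and 2
    # Octets that are not part of legal UTF-8 become lone surrogates
    # U+DC80..U+DCFF, which makes them easy to find and re-escape.
    text = octets.decode("utf-8", errors="surrogateescape")   # step 5
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("%%%02x" % (cp - 0xDC00))              # step 3
        elif not is_appropriate(ch):
            out.append("".join("%%%02x" % o for o in ch.encode("utf-8")))  # step 4
        else:
            out.append(ch)
    return "".join(out)


print(uri_to_iri("http://www.example.org/red%09ros%c3%a9#red"))
```

The sketch decodes first and re-escapes afterwards, which gives the same result as performing steps 3 and 4 on the octet sequence before step 5.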
| skipping to change at page 14, line 5 | skipping to change at page 15, line 35 | |||
| 4. Bidirectional IRIs for Right-to-left Languages | 4. Bidirectional IRIs for Right-to-left Languages | |||
| Some UCS characters, such as those used in the Arabic and Hebrew | Some UCS characters, such as those used in the Arabic and Hebrew | |||
| script, have an inherent right-to-left writing direction. IRIs | script, have an inherent right-to-left writing direction. IRIs | |||
| containing such characters (called bidirectional IRIs or Bidi IRIs) | containing such characters (called bidirectional IRIs or Bidi IRIs) | |||
| require additional attention because of the non-trivial relation | require additional attention because of the non-trivial relation | |||
| between logical representation (used for digital representation as | between logical representation (used for digital representation as | |||
| well as when reading/spelling) and visual representation (used for | well as when reading/spelling) and visual representation (used for | |||
| display/printing). | display/printing). | |||
| 4.1 Bidi IRI Structure | Because of the complex interaction between the logical | |||
| representation, the visual representation, and the syntax of a Bidi | ||||
| IRIs have an inherent structure that distinguishes structural | IRI, a balance is needed between various requirements. The main | |||
| characters (usually punctuation such as '@', '.', ':', '/', and so | requirements are (1) user-predictable conversion between visual and | |||
| on) called delimiters and payload components (usually consisting | logical representation; (2) the ability to include a wide range of | |||
| mostly of letters and digits). | characters in various parts of the IRI; (3) minimal or no changes | |||
| | or restrictions for implementations. | |||
| ISSUE: Exact definition of components. | 4.1 Logical Storage and Visual Presentation | |||
| In their internal digital representation, i.e. stored or transmitted | In their internal digital representation, i.e. stored or transmitted | |||
| for resolution, bidirectional IRIs MUST be in full logical order both | for resolution, bidirectional IRIs MUST be in full logical order, and | |||
| for the overall structure as well as for the individual components. | MUST conform directly to the IRI syntax rules (which includes the | |||
| They MUST conform directly to the IRI syntax rules (which includes | rules relevant to their scheme). This assures that bidirectional | |||
| the rules relevant to their scheme). This is necessary to make sure | IRIs can be processed in the same way as other IRIs. | |||
| that bidirectional IRIs can be processed in the same way as other | ||||
| IRIs. | ||||
| The components have the following restrictions: | When rendered, bidirectional IRIs MUST be rendered using the Unicode | |||
| Bidirectional Algorithm [UNIV3], [UNI9]. Bidirectional IRIs MUST be | ||||
| rendered with an overall left-to-right direction. | ||||
| 1) A component MUST NOT use both right-to-left and left-to- | In text with a left-to-right base directionality or embedding (e.g. | |||
| right characters. | English, Cyrillic), the Unicode Bidirectional Algorithm will | |||
| automatically use an overall left-to-right direction for the IRI. In | ||||
| text with a right-to-left base directionality or embedding (e.g. | ||||
| Arabic or Hebrew), some kind of embedding is needed. This may be | ||||
| Unicode bidi formatting codes (LRE before the IRI, and PDF after the | ||||
| IRI, both not part of the IRI itself) or equivalent features of a | ||||
| higher-order protocol (e.g. the dir='ltr' attribute in HTML). | ||||
| 2) A component MUST NOT contain bidirectional formatting | IRIs MUST NOT contain bidirectional formatting characters (LRM, RLM, | |||
| characters. | LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of | |||
| | the IRI, but do not themselves appear visually. It would therefore not | |||
| be possible to again correctly input an IRI with such characters. | ||||
| 3) A component using right-to-left characters MUST NOT use any | 4.2 Bidi IRI Structure | |||
| other class of characters (e.g. neutrals or numbers). | ||||
| Note: Restrictions 1) and 2) are not very severe, in that they do not | The Unicode Bidirectional Algorithm is designed mainly for running | |||
| overly restrict useful identifiers. Also, trying to remove it would | text. To make sure that it does not affect the rendering of | |||
| make it impossible for humans to predict the logical sequence of | bidirectional IRIs too much, some restrictions on bidirectional IRIs | |||
| characters inside a single component. On the other hand, it would be | are necessary. These restrictions are given in terms of delimiters | |||
| very desirable to remove or at least soften restriction 3). | (structural characters, mostly punctuation such as | |||
| Otherwise, it is impossible to combine Arabic or Hebrew letters with | '@', '.', ':', '/') and components (usually consisting mostly of | |||
| numbers, or to use a hyphen between two subcomponents of an Arabic | letters and digits). | |||
| component to avoid the cursive connection of the two subcomponents. | ||||
| To a certain extent, softening this restriction should be easily | ||||
| possible by adding additional formatting characters in well defined | ||||
| ways similar to the provisions in Section 4.2. Feedback on this | ||||
| issue is particularly welcome. | ||||
| 4.2 Visual Rendering of Bidi IRIs | The following syntax rules from Section 2.2 correspond to components | |||
| for the purpose of Bidi behavior: iopaquepart, irelsegment, iregname, | ||||
| iuserinfo, isegment, iparam, ihostname, iquery, and ifragment. | ||||
| Bidirectional IRIs MUST be rendered visually by rendering each | Specifications that define the syntax of any of the above components | |||
| component and each structural character from left to right. They | MAY divide them further and define smaller parts to be components | |||
| MUST render each component according to its natural direction (i.e. | according to this document. As an example, the restrictions of | |||
| left-to-right for components with left-to-right characters, right-to- | [RFCXXXX] on bidirectional domain names correspond to treating each | |||
| left for components with right-to-left characters). | label of the domain name as a component. Even where the components | |||
| are not defined formally, it may be helpful to think about some | ||||
| syntax in terms of components and to apply the relevant restrictions. | ||||
| For example, for the usual name/value syntax in query parts, it is | ||||
| convenient to treat each name and each value as a component. As | ||||
| another example, the extensions in a resource name can be treated as | ||||
| separate components. | ||||
| ISSUE: The alternative is to display a series of right-to-left | For each component, the following restrictions apply: | |||
| components in their natural (right-to-left) order. This has the | ||||
| advantage that it will often be easier for native people to read the | ||||
| components in the right order. The restrictions on individual | ||||
| components change. In some cases, the correct visual rendering is | ||||
| automatic (i.e. exactly the same as with the Unicode algorithm), and | ||||
| so in these cases, no bidi formatting characters have to be added. | ||||
| In a textual context, i.e. assuming rendering by the Unicode | 1) A component SHOULD NOT use both right-to-left and left-to- | |||
| bidirectional algorithm, the visual rendering backing store is done | right characters. | |||
| as follows: | ||||
| The visual representation uses some of the following Bidi formatting | 2) A component using right-to-left characters SHOULD start and end | |||
| characters described by using a XML-style entity notation: | with right-to-left characters. | |||
| &lrm; U+200E LEFT-TO-RIGHT MARK | The above restrictions are given as shoulds, rather than as musts. | |||
| &rlm; U+200F RIGHT-TO-LEFT MARK | For IRIs that are never presented visually, they are not relevant. | |||
| &lre; U+202A LEFT-TO-RIGHT EMBEDDING | However, for IRIs in general, they are very important to ensure | |||
| &rle; U+202B RIGHT-TO-LEFT EMBEDDING | consistent conversion between visual presentation and logical | |||
| &pdf; U+202C POP DIRECTIONAL FORMATTING | representation, in both directions. | |||
| &lro; U+202D LEFT-TO-RIGHT OVERRIDE | ||||
| &rlo; U+202E RIGHT-TO-LEFT OVERRIDE | ||||
| Each component with right-to-left characters is preceded and | In some components, the above restrictions may actually be strictly | |||
| followed by an &lrm;. This left-to-right mark provides a left- | enforced. For example, [RFCXXXX] requires that these restrictions | |||
| to-right context to intervening syntactic characters. | apply to the labels of the host name part of an IRI. In some other | |||
| components, for example path components, following these restrictions | ||||
| may not be too difficult. For other components, such as parts of the | ||||
| query part, it may be very difficult to enforce the restrictions, | ||||
| because the values of query parameters may be arbitrary character | ||||
| sequences. | ||||
| If the overall context (base directionality) is right-to-left, | In order to satisfy the above restrictions, the affected component | |||
| the identifier is preceded by an &lre; and followed by a &pdf;. | can be mapped to URI notation as described in Section 3.1. Please | |||
| This makes sure that the components of the identifier are | note that the whole component needs to be mapped (see also Example 9 | |||
| rendered in left-to-right order. This may also be done by | below). | |||
| using the equivalent features of a higher-order protocol (e.g. | ||||
| by using the dir='ltr' attribute in HTML). | ||||
| 4.3 Input of Bidi IRIs | 4.3 Input of Bidi IRIs | |||
| Bidi input methods MUST generate Bidi IRIs in logical order while | Bidi input methods MUST generate Bidi IRIs in logical order while | |||
| rendering them according to Section 4.2. During input, rendering | rendering them according to Section 4.1. During input, rendering | |||
| should be updated after every new character that is input to avoid | should be updated after every new character that is input to avoid | |||
| end user confusion. | end user confusion. | |||
| 4.4 Examples | ||||
| This section gives examples of bidirectional IRIs, in Bidi Notation. | ||||
| It shows legal IRIs with the relationship between logical and visual | ||||
| representation, and explains how certain phenomena in this | ||||
| relationship may look strange to somebody not familiar with | ||||
| bidirectional behavior, but familiar to users of Arabic and Hebrew. | ||||
| It also shows what happens if the restrictions given in Section 4.2 | ||||
| are not followed. The examples below can be seen at [BidiEx], in | ||||
| Arabic, Hebrew, and Bidi Notation variants. | ||||
| Example 1: A single component with right-to-left (rtl) characters is | ||||
| inverted: | ||||
| logical representation: http://ab.CDEFGH.ij/kl/mn/op.html, | ||||
| visual representation: http://ab.HGFEDC.ij/kl/mn/op.html. | ||||
| Components can be read one-by-one, and each component can be read in | ||||
| its natural direction. | ||||
| Example 2: More than one consecutive component with rtl characters is | ||||
| inverted as a whole: | ||||
| logical representation: http://ab.CDE.FGH/ij/kl/mn/op.html, | ||||
| visual representation: http://ab.HGF.EDC/ij/kl/mn/op.html. | ||||
| A sequence of rtl components is read rtl, in the same way as a | ||||
| sequence of rtl words is read rtl in a bidi text. | ||||
| Example 3: All components of an IRI (except for the scheme) are rtl. | ||||
| All rtl components are inverted overall: | ||||
| logical representation: http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV, | ||||
| visual representation: http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA. | ||||
| The whole IRI (except the scheme) is read rtl. Delimiters between | ||||
| rtl components stay between the respective components; delimiters | ||||
| between ltr and rtl components don't move. | ||||
| Example 4: Several sequences of rtl components are each inverted on | ||||
| their own: | ||||
| logical representation: http://AB.CD.ef/gh/IJ/KL.html, | ||||
| visual representation: http://DC.BA.ef/gh/LK/JI.html. | ||||
| Each sequence of rtl components is read rtl, in the same way as each | ||||
| sequence of rtl words in an ltr text is read rtl. | ||||
| Example 5: Example 2, applied to components of different kinds: | ||||
| logical representation: http://ab.cd.EF/GH/ij/kl.html, | ||||
| visual representation: http://ab.cd.HG/FE/ij/kl.html. | ||||
| The inversion of the domain name label and the path component may be | ||||
| unexpected, but is consistent with other bidi behavior. | ||||
| Example 6: Same as example 5, with more rtl components: | ||||
| logical representation: http://ab.CD.EF/GH/IJ/kl.html, | ||||
| visual representation: http://ab.JI/HG/FE.DC/kl.html. | ||||
| The inversion of the domain name labels and the path components may | ||||
| be easier to identify because the delimiters also move. | ||||
| Example 7: A single rtl component with included digits: | ||||
| logical representation: http://ab.CDE123FGH.ij/kl/mn/op.html, | ||||
| visual representation: http://ab.HGF123EDC.ij/kl/mn/op.html. | ||||
| Numbers are written ltr in all cases, but are treated as an | ||||
| additional embedding inside a run of rtl characters. This is | ||||
| completely consistent with usual bidirectional text. | ||||
| Example 8 (not allowed): Numbers at the start or end of a rtl | ||||
| component: | ||||
| logical representation: http://ab.cd.ef/GH1/2IJ/KL.html, | ||||
| visual representation: http://ab.cd.ef/LK/JI1/2HG.html. | ||||
| The sequence '1/2' is interpreted by the bidi algorithm as a | ||||
| fraction, fragmenting the components and leading to confusion. There | ||||
| are other characters that are interpreted in a special way close to | ||||
| numbers, in particular '+', '-', '#', '$', '%', ',', '.', and ':'. | ||||
| Example 9 (not allowed): The numbers in the previous example are | ||||
| escaped: | ||||
| logical representation: http://ab.cd.ef/GH%31/%32IJ/KL.html, | ||||
| visual representation (Hebrew): http://ab.cd.ef/LK/JI%32/%31HG.html, | ||||
| visual representation (Arabic): http://ab.cd.ef/LK/JI32%/31%HG.html. | ||||
| Depending on whether the upper-case letters represent Arabic or | ||||
| Hebrew, the visual representation is different. | ||||
| 5. Use of IRIs | 5. Use of IRIs | |||
| 5.1 Limitations on UCS Character Allowed in IRI | 5.1 Limitations on UCS Characters Allowed in IRIs | |||
| This section discusses the limitations on characters and character | This section discusses the limitations on characters and character | |||
| sequences usable for IRIs. The considerations in this section are | sequences usable for IRIs. The considerations in this section are | |||
| relevant when creating IRIs and when converting from URIs to IRIs. | relevant when creating IRIs and when converting from URIs to IRIs. | |||
| a. The repertoire of characters allowed in each IRI component is | a) The repertoire of characters allowed in each IRI component is | |||
| limited by the definition of that component. For example, the | limited by the definition of that component. For example, the | |||
| definition of the scheme component does not allow characters | definition of the scheme component does not allow characters | |||
| beyond US-ASCII. | beyond US-ASCII. | |||
| (Note: In accordance with URI practice, generic IRI software | (Note: In accordance with URI practice, generic IRI software | |||
| cannot and should not check for such limitations.) | cannot and should not check for such limitations.) | |||
| b. In the URI syntax, characters that are likely to be used to | b) In the URI syntax, characters that are likely to be used to | |||
| delimit URIs in text and print ("space", "delims", and | delimit URIs in text and print ("space", "delims", and | |||
| "unwise") were excluded. They are included in the IRI syntax, | "unwise") were excluded. They are included in the IRI syntax | |||
| for the following reasons: | (with the exception of '%', which cannot be used directly, and | |||
| '#', which is used in IRI references), for the following | ||||
| reasons: | ||||
| 1) The syntax includes many other characters that are not | 1) The syntax includes many other characters that are not | |||
| appropriate in many cases. | appropriate in many cases. | |||
| 2) Some implementation practice already allows them in URI | 2) Some implementation practice already allows them in URI | |||
| references (for example spaces in fragment identifiers). | references (for example spaces in fragment identifiers). | |||
| 3) It is very convenient in some cases, for example for | 3) It is very convenient in some cases, for example for | |||
| XPointers in XML attributes. | XPointers in XML attributes. | |||
| 4) Considering context is already necessary in the case of | 4) Considering context is already necessary in the case of | |||
| URIs, for example for "&" in XML. | URIs, for example for "&" in XML. | |||
| However, these characters should be used carefully. Whenever | However, these characters should be avoided where possible. | |||
| there is a chance that an IRI will be used in a component where | Whenever there is a chance that an IRI will be used in a | |||
| these characters can be harmful, they should be escaped. | component where these characters can be harmful, they should be | |||
| escaped from the start. | ||||
| c. The UCS contains many areas of characters for which there are | c) The UCS contains many areas of characters for which there are | |||
| strong visual look-alikes. Because of the likelihood of | strong visual look-alikes. Because of the likelihood of | |||
| transcription errors, these also should be avoided. This | transcription errors, these also should be avoided. This | |||
| includes the full-width equivalents of ASCII characters, half- | includes the full-width equivalents of ASCII characters, half- | |||
| width Katakana characters for Japanese, and many others. This | width Katakana characters for Japanese, and many others. This | |||
| also includes many look-alikes of "space", "delims", and | also includes many look-alikes of "space", "delims", and | |||
| "unwise", characters excluded in [RFC2396]. | "unwise", characters excluded in [RFC2396]. | |||
| Additional information is available from [UNIXML]. Although [UNIXML] | Additional information is available from [UNIXML]. [UNIXML] is | |||
| is written in a different context, it discusses many of the | written in the context of running text rather than in the context of | |||
| categories of characters and code points not appropriate for IRIs. | identifiers. Nevertheless, it discusses many of the categories of | |||
| characters and code points not appropriate for IRIs. | ||||
| 5.2 Software Interfaces and Protocols | 5.2 Software Interfaces and Protocols | |||
| Although an IRI is defined as a sequence of characters, software | Although an IRI is defined as a sequence of characters, software | |||
| interfaces for URIs typically function on sequences of octets. Thus, | interfaces for URIs typically function on sequences of octets or | |||
| software interfaces and protocols MUST define which character | other kinds of code units. Thus, software interfaces and protocols | |||
| encoding is used. | MUST define which character encoding is used. | |||
| Intermediate software interfaces between IRI-capable components and | Intermediate software interfaces between IRI-capable components and | |||
| URI-only components MUST map the IRIs as per Section 3.1, when | URI-only components MUST map the IRIs as per Section 3.1, when | |||
| transferring from IRI-capable to URI-only components. Such a mapping | transferring from IRI-capable to URI-only components. Such a mapping | |||
| SHOULD be applied as late as possible. It should not be applied | SHOULD be applied as late as possible. It should not be applied | |||
| between components that are known to be able to handle IRIs. | between components that are known to be able to handle IRIs. | |||
| 5.3 Format of URIs and IRIs in Documents and Protocols | 5.3 Format of URIs and IRIs in Documents and Protocols | |||
| Document formats that transport URIs may need to be upgraded to allow | Document formats that transport URIs may need to be upgraded to allow | |||
| skipping to change at page 18, line 41 | skipping to change at page 22, line 13 | |||
| input the IRI. Depending on the script and the input method used, | input the IRI. Depending on the script and the input method used, | |||
| this may be a more or less complicated process. | this may be a more or less complicated process. | |||
| The process of IRI entry must ensure, as far as possible, that the | The process of IRI entry must ensure, as far as possible, that the | |||
| restrictions defined in Section 2.2 are met. This may be done by | restrictions defined in Section 2.2 are met. This may be done by | |||
| choosing appropriate input methods or variants/settings thereof, by | choosing appropriate input methods or variants/settings thereof, by | |||
| appropriately converting the characters being input, by eliminating | appropriately converting the characters being input, by eliminating | |||
| characters that cannot be converted, and/or by issuing a warning or | characters that cannot be converted, and/or by issuing a warning or | |||
| error message to the user. | error message to the user. | |||
| As an example of variant settings, input method editors for East | ||||
| Asian Languages usually allow to input Latin letters and related | ||||
| characters in full-width or half-width versions. For IRI input, the | ||||
| input method editor should be set to half-width input, in order to | ||||
| produce US-ASCII characters where possible. | ||||
| An input field primarily or only used for the input of URIs/IRIs | An input field primarily or only used for the input of URIs/IRIs | |||
| should allow the user to view an IRI as converted to a URI. Places | should allow the user to view an IRI as mapped to a URI. Places | |||
| where the input of IRIs is frequent should provide the possibility | where the input of IRIs is frequent should provide the possibility | |||
| for viewing an IRI as converted to a URI. This will help users when | for viewing an IRI as mapped to a URI. This will help users when | |||
| some of the software they use does not yet accept IRIs. | some of the software they use does not yet accept IRIs. | |||
| An IRI input component that interfaces to components that handle | An IRI input component that interfaces to components that handle | |||
| URIs, but not IRIs, must escape the IRI before passing it to such a | URIs, but not IRIs, must map the IRI to a URI before passing it | |||
| component. | to such a component. | |||
| For the input of IRIs with right-to-left characters, please see | For the input of IRIs with right-to-left characters, please see | |||
| Section 4. | Section 4.3. | |||
| 6.3 URI/IRI Generation | 6.3 URI/IRI Transfer Between Applications | |||
| Many applications, in particular many mail user agents, try to detect | ||||
| URIs appearing in plain text. For this, they use some heuristics | ||||
| based on URI syntax. They then allow the user to click on such URIs | ||||
| and retrieve the corresponding resource in an appropriate (usually | ||||
| scheme-dependent) application. | ||||
| Such applications have to be upgraded to use the IRI syntax rather | ||||
| than the URI syntax as a base for heuristics. In particular, a non- | ||||
| ASCII character should not be taken as the indication of the end of | ||||
| an IRI. Such applications also have to make sure that they correctly | ||||
| convert the detected IRI from the encoding of the document or | ||||
| application where the IRI appears to the encoding used by the system- | ||||
| wide IRI invocation mechanism, or to a URI (according to Section | ||||
| 3.1) if the system-wide invocation mechanism only accepts URIs. | ||||
| The clipboard is another frequently used way to transfer URIs and | ||||
| IRIs from one application to another. On most platforms, the | ||||
| clipboard is able to store and transfer text in many languages and | ||||
| scripts. Correctly used, the clipboard transfers characters, not | ||||
| bytes, which will do the right thing with IRIs. | ||||
| 6.4 URI/IRI Generation | ||||
| Systems that are offering resources through the Internet, where those | Systems that are offering resources through the Internet, where those | |||
| resources have logical names, sometimes automatically generate URIs | resources have logical names, sometimes automatically generate URIs | |||
| for the resources they offer. For example, some HTTP servers can | for the resources they offer. For example, some HTTP servers can | |||
| generate a directory listing for a file directory, and then respond | generate a directory listing for a file directory, and then respond | |||
| to the generated URIs with the files. | to the generated URIs with the files. | |||
| Many legacy character encodings are in use in various file systems. | Many legacy character encodings are in use in various file systems. | |||
| Many currently deployed systems do not transform the local character | Many currently deployed systems do not transform the local character | |||
| representation of the underlying system before generating URIs. | representation of the underlying system before generating URIs. | |||
| skipping to change at page 19, line 29 | skipping to change at page 23, line 31 | |||
| use IRIs converted to URIs in cases where it cannot be expected that | use IRIs converted to URIs in cases where it cannot be expected that | |||
| the recipient is able to handle IRIs. Due to the way most user | the recipient is able to handle IRIs. Due to the way most user | |||
| agents currently work, native IRIs, encoded in UTF-8, may be used if | agents currently work, native IRIs, encoded in UTF-8, may be used if | |||
| the recipient announces that it can interpret UTF-8. This requires | the recipient announces that it can interpret UTF-8. This requires | |||
| that the whole page is sent as UTF-8. If this is not possible, | that the whole page is sent as UTF-8. If this is not possible, | |||
| escaping can always be used. | escaping can always be used. | |||
| This recommendation in particular applies to HTTP servers. For FTP | This recommendation in particular applies to HTTP servers. For FTP | |||
| servers, similar considerations apply, see in particular [RFC2640]. | servers, similar considerations apply, see in particular [RFC2640]. | |||
| 6.4 URI/IRI Selection | 6.5 URI/IRI Selection | |||
| In some cases, resource owners and publishers have control over the | In some cases, resource owners and publishers have control over the | |||
| IRIs used to identify their resources. Such control is mostly | IRIs used to identify their resources. Such control is mostly | |||
| executed by controlling the resource names, such as file names, | executed by controlling the resource names, such as file names, | |||
| directly. | directly. | |||
| In such cases, it is recommended to avoid choosing IRIs that are | In such cases, it is recommended to avoid choosing IRIs that are | |||
| easily confused. For example, for US-ASCII, the lower-case ell "l" | easily confused. For example, for US-ASCII, the lower-case ell "l" | |||
| is easily confused with the digit one "1", and the upper-case oh "O" | is easily confused with the digit one "1", and the upper-case oh "O" | |||
| is easily confused with the digit zero "0". Publishers should avoid | is easily confused with the digit zero "0". Publishers should avoid | |||
| skipping to change at page 20, line 5 | skipping to change at page 24, line 8 | |||
| here. As long as names are limited to characters from a single | here. As long as names are limited to characters from a single | |||
| script, native writers of a given script or language will know best | script, native writers of a given script or language will know best | |||
| when ambiguities can appear, and how they can be avoided. What may | when ambiguities can appear, and how they can be avoided. What may | |||
| look ambiguous to a stranger may be completely obvious to the average | look ambiguous to a stranger may be completely obvious to the average | |||
| native user. On the other hand, in some cases, the UCS contains | native user. On the other hand, in some cases, the UCS contains | |||
| variants for compatibility reasons, for example for typographic | variants for compatibility reasons, for example for typographic | |||
| purposes. These should be avoided wherever possible. Although there | purposes. These should be avoided wherever possible. Although there | |||
| may be exceptions, in general newly created resource names should be | may be exceptions, in general newly created resource names should be | |||
| in NFKC [UNI15] (which means that they are also in NFC). | in NFKC [UNI15] (which means that they are also in NFC). | |||
| In certain cases, there is a chance that letters from different | As an example, the UCS contains a codepoint for the 'fi' ligature. | |||
| Wherever possible, IRIs should use the two letters 'f' and 'i' rather | ||||
| than the 'fi' ligature. An example where the latter may be used is in | ||||
| the query part of an IRI for an explicit search for a word containing | ||||
| the 'fi' ligature. | ||||
| In certain cases, there is a chance that characters from different | ||||
| scripts look the same. The best known example is the Latin 'A', the | scripts look the same. The best known example is the Latin 'A', the | |||
| Greek 'Alpha', and the Cyrillic 'A'. To avoid such cases, only IRIs | Greek 'Alpha', and the Cyrillic 'A'. To avoid such cases, only IRIs | |||
| should be generated where all the letters in a single component are | should be generated where all the characters in a single component | |||
| from the same script. This is similar to the heuristics used to | are used together in a given language. This usually means that all | |||
| distinguish between letters and numbers in the examples above. Also, | these characters will be from the same script, but there are | |||
| for the above three scripts, using lower-case letters results in | languages that mix characters from different scripts (such as | |||
| fewer ambiguities than using upper-case letters. | Japanese). This is similar to the heuristics used to distinguish | |||
| between letters and numbers in the examples above. Also, for Latin, | ||||
| Greek, and Cyrillic, using lower-case letters results in fewer | ||||
| ambiguities than using upper-case letters. | ||||
| 6.5 Display of URIs/IRIs | 6.6 Display of URIs/IRIs | |||
| In situations where the rendering software is not expected to display | In situations where the rendering software is not expected to display | |||
| non-ASCII parts of the IRI correctly using the available layout and | non-ASCII parts of the IRI correctly using the available layout and | |||
| font resources, these parts should be escaped before being displayed. | font resources, these parts should be escaped before being displayed. | |||
| For display of Bidi IRIs, please see Section 4.2. | For display of Bidi IRIs, please see Section 4.1. | |||
| 6.6 Interpretation of URIs and IRIs | 6.7 Interpretation of URIs and IRIs | |||
| Software that interprets IRIs as the names of local resources should | Software that interprets IRIs as the names of local resources should | |||
| accept IRIs in multiple forms, and convert and match them with the | accept IRIs in multiple forms, and convert and match them with the | |||
| appropriate local resource names. | appropriate local resource names. | |||
| First, multiple representations include both IRIs in the native | First, multiple representations include both IRIs in the native | |||
| character encoding of the protocol and also their URI counterparts. | character encoding of the protocol and also their URI counterparts. | |||
| Second, it may include URIs constructed based on other character | Second, it may include URIs constructed based on other character | |||
| encodings than UTF-8. Such URIs may be produced by user agents that | encodings than UTF-8. Such URIs may be produced by user agents that | |||
| skipping to change at page 20, line 44 | skipping to change at page 25, line 8 | |||
| convert non-ASCII characters to URIs. Whether this is necessary, and | convert non-ASCII characters to URIs. Whether this is necessary, and | |||
| what character encodings to cover, depends on a number of factors, | what character encodings to cover, depends on a number of factors, | |||
| such as the legacy character encodings used locally and the | such as the legacy character encodings used locally and the | |||
| distribution of various versions of user agents. For example, | distribution of various versions of user agents. For example, | |||
| software for Japanese may accept URIs in Shift_JIS and/or EUC-JP in | software for Japanese may accept URIs in Shift_JIS and/or EUC-JP in | |||
| addition to UTF-8. | addition to UTF-8. | |||
| Third, it may include additional mappings to be more user-friendly | Third, it may include additional mappings to be more user-friendly | |||
| and robust against transmission errors. These would be similar to | and robust against transmission errors. These would be similar to | |||
| how currently some servers treat URIs as case-insensitive, or perform | how currently some servers treat URIs as case-insensitive, or perform | |||
| additional matchings to account for spelling errors. For characters | additional matching to account for spelling errors. For characters | |||
| beyond the ASCII repertoire, this may for example include ignoring | beyond the ASCII repertoire, this may for example include ignoring | |||
| the accents on received IRIs or resource names where appropriate. | the accents on received IRIs or resource names where appropriate. | |||
| Please note that such mappings, including case mappings, are | Please note that such mappings, including case mappings, are | |||
| language-dependent. | language-dependent. | |||
| It can be difficult to unambiguously identify a resource if too many | It can be difficult to unambiguously identify a resource if too many | |||
| mappings are taken into consideration. However, escaped and non- | mappings are taken into consideration. However, escaped and non- | |||
| escaped parts of IRIs can always clearly be distinguished. Also, the | escaped parts of IRIs can always clearly be distinguished. Also, the | |||
| regularity of UTF-8 (see [Duer97] makes the potential for collisions | regularity of UTF-8 (see [Duer97]) makes the potential for collisions | |||
| lower than it may seem at first sight. | lower than it may seem at first sight. | |||
| 6.7 Upgrading Strategy | 6.8 Upgrading Strategy | |||
| As this recommendation places further constraints on software for | Where this recommendation places further constraints on software for | |||
| which many instances are already deployed, it is important to | which many instances are already deployed, it is important to | |||
| introduce upgrades carefully, and to be aware of the various | introduce upgrades carefully, and to be aware of the various | |||
| interdependencies. | interdependencies. | |||
| If IRIs cannot be interpreted correctly, they should not be generated | If IRIs cannot be interpreted correctly, they should not be generated | |||
| or transported. This suggests that upgrading URI interpreting | or transported. This suggests that upgrading URI interpreting | |||
| software to accept IRIs should have highest priority. | software to accept IRIs should have highest priority. | |||
| On the other hand, a single IRI is interpreted only by a single or | On the other hand, a single IRI is interpreted only by a single or | |||
| very few interpreters that are known in advance, while it may be | very few interpreters that are known in advance, while it may be | |||
| skipping to change at page 22, line 25 | skipping to change at page 26, line 37 | |||
| normalization expectations of a user or actual normalization when | normalization expectations of a user or actual normalization when | |||
| entering an IRI do not match the normalization used on the server | entering an IRI do not match the normalization used on the server | |||
| side. Conceptually, this is no different from the problems | side. Conceptually, this is no different from the problems | |||
| surrounding the use of case-insensitive web servers. For example, a | surrounding the use of case-insensitive web servers. For example, a | |||
| popular web page with a mixed case name (http://big.site/ | popular web page with a mixed case name (http://big.site/ | |||
| PopularPage.html) might be "spoofed" by someone who obtains access to | PopularPage.html) might be "spoofed" by someone who obtains access to | |||
| http://big.site/popularpage.html. However, the introduction of | http://big.site/popularpage.html. However, the introduction of | |||
| character normalization, and of additional mappings for user | character normalization, and of additional mappings for user | |||
| convenience, may increase the chance for spoofing. | convenience, may increase the chance for spoofing. | |||
| Spoofing can occur due to the fact that in the UCS, there are many | Spoofing can occur because in the UCS, there are many characters that | |||
| characters that look very similar. Details are discussed in Section | look very similar. Details are discussed in Section 6.5. Again, | |||
| 6.4. Again, this is very similar to spoofing possibilities on US- | this is very similar to spoofing possibilities on US-ASCII, e.g. | |||
| ASCII, e.g. using 'br0ken' or '1ame' URIs. | using 'br0ken' or '1ame' URIs. | |||
| Spoofing can occur when URIs in various encodings are accepted to | Spoofing can occur when URIs in various encodings are accepted to | |||
| deal with older user agents. In some cases, in particular for Latin- | deal with older user agents. In some cases, in particular for Latin- | |||
| based resource names, this is usually easy to detect because UTF-8- | based resource names, this is usually easy to detect because UTF-8- | |||
| encoded names, when interpreted and viewed as legacy encodings, | encoded names, when interpreted and viewed as legacy encodings, | |||
| produce mostly garbage. In other cases, when concurrently used | produce mostly garbage. In other cases, when concurrently used | |||
| encodings have a similar structure, but there are no characters that | encodings have a similar structure, but there are no characters that | |||
| have exactly the same encoding, detection is more difficult. | have exactly the same encoding, detection is more difficult. | |||
| Spoofing can occur in various IRI components, such as the domain name | Spoofing can occur in various IRI components, such as the domain name | |||
| part or a path part. For considerations specific to the domain name | part or a path part. For considerations specific to the domain name | |||
| part, see [Nameprep]. For the path part, administrators of sites | part, see [Nameprep]. For the path part, administrators of sites | |||
| which allow independent users to create resources in the same subarea | which allow independent users to create resources in the same subarea | |||
| may need to be careful to check for spoofing. | may need to be careful to check for spoofing. | |||
| Spoofing can occur with bidirectional IRIs, if the restrictions in | ||||
| Section 4.2 are not followed. The same visual representation may be | ||||
| interpreted as different logical representations, and vice versa. It | ||||
| is also very important that a correct Unicode bidirectional | ||||
| implementation is used. | ||||
| 8. Change log | 8. Change log | |||
| Changes from -00 to -01 | 8.1 Changes from -01 to -02 | |||
| - New approach for Bidi section, many examples. | ||||
| - Created idelims, removed '%' and '#'. Changed userinfo to | ||||
| iuserinfo in iserver. | ||||
| - Changed to ABNF defined by [RFC2234]. | ||||
| - Included bug fixes from [RFC2396bis]. | ||||
| - Additions to Acknowledgements. | ||||
| 8.2 Changes from -00 to -01 | ||||
| - Re-integrated the section on Bidi, some issues left. | - Re-integrated the section on Bidi, some issues left. | |||
| - Integrated IDN, changed syntax (host, userinfo,....). | - Integrated IDN, changed syntax (host, userinfo,....). | |||
| - Moved some text around, marked some as informational. | - Moved some text around, marked some as informational. | |||
| - Made a clear distinction of IRI use for identification only and | - Made a clear distinction of IRI use for identification only and | |||
| for resource resolution. | for resource resolution. | |||
| - Fixed various details in wording, spelling,... | - Fixed various details in wording, spelling,... | |||
| 9. Acknowlegdements | 9. Acknowledgements | |||
| We would like to thank Larry Masinter for his work as coauthor of | We would like to thank Larry Masinter for his work as coauthor of | |||
| many earlier versions of this document (draft-masinter-url-i18n-xx). | many earlier versions of this document (draft-masinter-url-i18n-xx). | |||
| The issue addressed here has been discussed at numerous times over | The discussion on the issue addressed here started a long time | |||
| the last years; for example, there was a thread in the HTML working | ago. There was a thread in the HTML working group in August 1995 | |||
| group in August 1995 (under the topic of "Globalizing URIs") in the | (under the topic of "Globalizing URIs") and in the www-international | |||
| www-international mailing list in July 1996 (under the topic of | mailing list in July 1996 (under the topic of "Internationalization | |||
| "Internationalization and URLs"), and ad-hoc meetings at the Unicode | and URLs"), and ad-hoc meetings at the Unicode conferences in | |||
| conferences in September 1995 and September 1997. | September 1995 and September 1997. | |||
| Thanks to Francois Yergeau, Chris Wendt, Yaron Goland, Graham Klyne, | Thanks to Francois Yergeau, Matti Allouche, Roy Fielding, Tim | |||
| Roy Fielding, Tim Berners-Lee, M.T. Carrasco Benitez, James Clark, | Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim | |||
| Andrea Vine, Misha Wolf, Leslie Daigle, Makoto MURATA, Tex Texin, | Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie | |||
| Bjoern Hoehrmann, Dan Oscarson, and many others for help with | Daigle, Ted Hardie, Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex | |||
| understanding the issues and possible solutions. Thanks also to the | Texin, Graham Klyne, Bjoern Hoehrmann, Chris Lilly, Dan Oscarson, | |||
| members of the W3C I18N Working Group and Interest Group for their | Elliotte Rusty Harold, Mike J. Brown, Carlos Viegas Damasio, and | |||
| many others for help with understanding the issues and possible | ||||
| solutions, and getting the details right. Thanks also to the members | ||||
| of the W3C I18N Working Group and Interest Group for their | ||||
| contributions and their work on [CharMod], to the members of many | contributions and their work on [CharMod], to the members of many | |||
| other W3C WGs for adopting the ideas, and to the members of the | other W3C WGs for adopting the ideas, and to the members of the | |||
| Montreal IAB Workshop on Internationalization and Localization for | Montreal IAB Workshop on Internationalization and Localization for | |||
| their review. | their review. | |||
| References | Normative References | |||
| [ISO10646] International Organization for Standardization, | ||||
| "Information Technology - Universal Multiple-Octet Coded | ||||
| Character Set (UCS) - Part 1: Architecture and Basic | ||||
| Multilingual Plane - Part 2: Supplementary Planes", ISO | ||||
| Standard 10646, with amendment, July 2002. | ||||
| [RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax | ||||
| Specifications: ABNF", RFC 2234, November 1997. | ||||
| [RFC2279] Yergeau, F., "UTF-8, a transformation format of ISO | ||||
| 10646", RFC 2279, January 1998. | ||||
| [RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform | ||||
| Resource Identifiers (URI): Generic Syntax", RFC 2396, | ||||
| August 1998. | ||||
| [RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format for | ||||
| Literal IPv6 Addresses in URL's", RFC 2732, December | ||||
| 1999. | ||||
| [RFCXXXX] Faltstrom, P., Hoffman, P. and A. Costello, | ||||
| "Internationalizing Domain Names in Applications (IDNA)", | ||||
| draft-ietf-idn-idna-14.txt (work in progress), October | ||||
| 2002, <http://www.ietf.org/internet-drafts/draft-ietf- | ||||
| idn-idna-14.txt>. | ||||
| [UNI15] Davis, M. and M. Duerst, "Unicode Normalization Forms", | ||||
| Unicode Standard Annex #15, March 2001, <http:// | ||||
| www.unicode.org/unicode/reports/tr15/tr15-21.html>. | ||||
| Non-normative References | ||||
| [BidiEx] "Examples of bidirectional IRIs", <http://www.w3.org/ | ||||
| International/iri-edit/BidiExamples>. | ||||
| [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M., | [CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M., | |||
| Freytag, A. and T. Texin, "Character Model for the | Freytag, A. and T. Texin, "Character Model for the | |||
| World Wide Web", World Wide Web Consortium Working | World Wide Web", World Wide Web Consortium Working | |||
| Draft, April 2002, <http://www.w3.org/TR/charmod>. | Draft, April 2002, <http://www.w3.org/TR/charmod>. | |||
| [Duer97] Duerst, M., "The Properties and Promises of UTF-8", | [Duer97] Duerst, M., "The Properties and Promises of UTF-8", | |||
| Proc. 11th International Unicode Conference, San Jose, | Proc. 11th International Unicode Conference, San Jose, | |||
| September 1997, <http://www.ifi.unizh.ch/mml/ | September 1997, <http://www.ifi.unizh.ch/mml/ | |||
| mduerst/papers/PDF/IUC11-UTF-8.pdf>. | mduerst/papers/PDF/IUC11-UTF-8.pdf>. | |||
| skipping to change at page 24, line 11 | skipping to change at page 29, line 33 | |||
| International Unicode Conference, San Jose, | International Unicode Conference, San Jose, | |||
| September 2001, <http://www.w3.org/2001/Talks/0912- | September 2001, <http://www.w3.org/2001/Talks/0912- | |||
| IUC-IRI/paper.html>. | IUC-IRI/paper.html>. | |||
| [HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01 | [HTML4] Raggett, D., Le Hors, A. and I. Jacobs, "HTML 4.01 | |||
| Specification", World Wide Web Consortium | Specification", World Wide Web Consortium | |||
| Recommendation, December 1999, <http://www.w3.org/TR/ | Recommendation, December 1999, <http://www.w3.org/TR/ | |||
| REC-html40/appendix/notes.html#h-B.2>. | REC-html40/appendix/notes.html#h-B.2>. | |||
| [IDNURI] Duerst, M., "Internationalized Domain Names in URIs", | [IDNURI] Duerst, M., "Internationalized Domain Names in URIs", | |||
| draft-ietf-idn-uri-02.txt (work in progress), July | draft-ietf-idn-uri-03.txt (work in progress), July | |||
| 2002, <http://www.ietf.org/internet-drafts/draft- | 2002, <http://www.ietf.org/internet-drafts/draft- | |||
| ietf-idn-uri-02.txt>. | ietf-idn-uri-03.txt>. | |||
| [IDNA] Faltstrom, P., Hoffman, P. and A. Faltstrom, | ||||
| "Internationalizing Domain Names in Applications | ||||
| (IDNA)", draft-ietf-idn-idna-09.txt (work in | ||||
| progress), May 2002, <http://www.ietf.org/internet- | ||||
| drafts/draft-ietf-idn-idna-09.txt>. | ||||
| [ISO10646] International Organization for Standardization, | ||||
| "Information Technology - Universal Multiple-Octet | ||||
| Coded Character Set (UCS) - Part 1: Architecture and | ||||
| Basic Multilingual Plane", ISO Standard 10646-1, with | ||||
| amendments, October 2000. | ||||
| [Nameprep] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | [Nameprep] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | |||
| Profile for Internationalized Domain Names", draft- | Profile for Internationalized Domain Names", draft- | |||
| ietf-idn-nameprep-10.txt (work in progress), May | ietf-idn-nameprep-11.txt (work in progress), June | |||
| 2002, <http://www.ietf.org/internet-drafts/draft- | 2002, <http://www.ietf.org/internet-drafts/draft- | |||
| ietf-idn-nameprep-10.txt>. | ietf-idn-nameprep-11.txt>. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, | [RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, | |||
| H., Atkinson, R., Crispin, M. and P. Svanberg, "The | H., Atkinson, R., Crispin, M. and P. Svanberg, "The | |||
| Report of the IAB Character Set Workshop held 29 | Report of the IAB Character Set Workshop held 29 | |||
| February - 1 March, 1996", RFC 2130, April 1997. | February - 1 March, 1996", RFC 2130, April 1997. | |||
| [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. | |||
| [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September | [RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September | |||
| 1997. | 1997. | |||
| [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | |||
| Languages", BCP 18, RFC 2277, January 1998. | Languages", BCP 18, RFC 2277, January 1998. | |||
| [RFC2279] Yergeau, F., "UTF-8, a transformation format of ISO | ||||
| 10646", RFC 2279, January 1998. | ||||
| [RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998. | [RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998. | |||
| [RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, | [RFC2396bis] Berners-Lee, T., Fielding, R. and L. Masinter, | |||
| "Uniform Resource Identifiers (URI): Generic Syntax", | "Uniform Resource Identifier (URI): Generic Syntax", | |||
| RFC 2396, August 1998. | Internet-Draft (work in progress), October 2002. | |||
| [RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, | [RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, | |||
| August 1998. | August 1998. | |||
| [RFC2616] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., | [RFC2616] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., | |||
| Masinter, L., Leach, P. and T. Berners-Lee, | Masinter, L., Leach, P. and T. Berners-Lee, | |||
| "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, | "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, | |||
| June 1999. | June 1999. | |||
| [RFC2640] Curtin, B., "Internationalization of the File | [RFC2640] Curtin, B., "Internationalization of the File | |||
| Transfer Protocol", RFC 2640, July 1999. | Transfer Protocol", RFC 2640, July 1999. | |||
| [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. | [RFC2718] Masinter, L., Alvestrand, H., Zigmond, D. and R. | |||
| Petke, "Guidelines for new URL Schemes", RFC 2718, | Petke, "Guidelines for new URL Schemes", RFC 2718, | |||
| November 1999. | November 1999. | |||
| [RFC2732] Hinden, R., Carpenter, B. and L. Masinter, "Format | ||||
| for Literal IPv6 Addresses in URL's", RFC 2732, | ||||
| December 1999. | ||||
| [UNIV3] The Unicode Consortium, "The Unicode Standard Version | [UNIV3] The Unicode Consortium, "The Unicode Standard Version | |||
| 3.0", Addison-Wesley, Reading, MA , 2000. | 3.0", Addison-Wesley, Reading, MA , 2000. | |||
| [UNI15] Davis, M. and M. Duerst, "Unicode Normalization | [UNI9] Davis, M., "The Bidirectional Algorithm", Unicode | |||
| Forms", Unicode Standard Annex #15, March 2001, | Standard Annex #9, March 2002, <http:// | |||
| <http://www.unicode.org/unicode/reports/tr15/tr15- | www.unicode.org/unicode/reports/tr9>. | |||
| 21.html>. | ||||
| [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | [UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other | |||
| Markup Languages", Unicode Technical Report #20, | Markup Languages", Unicode Technical Report #20, | |||
| World Wide Web Consortium Note, Februar 2002, <http:/ | World Wide Web Consortium Note, February 2002, | |||
| /www.w3.org/TR/unicode-xml/>. | <http://www.w3.org/TR/unicode-xml/>. | |||
| [W3CIRI] "Internationalization - URIs and other identifiers", | [W3CIRI] Duerst, M., "Internationalization - URIs and other | |||
| <http://www.w3.org/International/O-URL-and- | identifiers", World Wide Web Consortium Note, | |||
| ident.html>. | September 2002, <http://www.w3.org/International/O- | |||
| | URL-and-ident.html>. | |||
| [XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking | [XLink] DeRose, S., Maler, E. and D. Orchard, "XML Linking | |||
| Language (XLink) Version 1.0", World Wide Web | Language (XLink) Version 1.0", World Wide Web | |||
| Consortium Recommendation, June 2001, <http:// | Consortium Recommendation, June 2001, <http:// | |||
| www.w3.org/TR/xlink/#link-locators>. | www.w3.org/TR/xlink/#link-locators>. | |||
| [XML1] Bray, T., Paoli, J., Sperberg-McQueen, C. and E. | [XML1] Bray, T., Paoli, J., Sperberg-McQueen, C. and E. | |||
| Maler, "Extensible Markup Language (XML) 1.0 (Second | Maler, "Extensible Markup Language (XML) 1.0 (Second | |||
| Edition)", World Wide Web Consortium Recommendation, | Edition)", World Wide Web Consortium Recommendation, | |||
| including Erratum 26 at http://www.w3.org/XML/xml- | including Erratum 26 at http://www.w3.org/XML/xml- | |||
| skipping to change at page 26, line 19 | skipping to change at page 31, line 23 | |||
| XML", World Wide Web Consortium Recommendation, | XML", World Wide Web Consortium Recommendation, | |||
| January 1999, <http://www.w3.org/TR/REC-xml#sec- | January 1999, <http://www.w3.org/TR/REC-xml#sec- | |||
| external-ent>. | external-ent>. | |||
| [XMLSchema] Biron, P. and A. Malhotra, "XML Schema Part 2: | [XMLSchema] Biron, P. and A. Malhotra, "XML Schema Part 2: | |||
| Datatypes", World Wide Web Consortium Recommendation, | Datatypes", World Wide Web Consortium Recommendation, | |||
| May 2001, <http://www.w3.org/TR/xmlschema-2/#anyURI>. | May 2001, <http://www.w3.org/TR/xmlschema-2/#anyURI>. | |||
| Authors' Addresses | Authors' Addresses | |||
| Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever | Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever possible, for example as "Dürst" in XML and HTML.) | |||
| possible, for example as "Dürst" in XML and HTML.) | World Wide Web Consortium | |||
| W3C/Keio University | 200 Technology Square | |||
| 5322 Endo | Cambridge, MA 02139 | |||
| Fujisawa 252-8520 | U.S.A. | |||
| Japan | ||||
| Phone: +81 466 49 1170 | Phone: +1 617 253 5509 | |||
| Fax: +81 466 49 1171 | Fax: +1 617 258 5999 | |||
| EMail: duerst@w3.org | EMail: duerst@w3.org | |||
| URI: http://www.w3.org/People/D%C3%BCrst/ | URI: http://www.w3.org/People/D%C3%BCrst/ | |||
| (Note: This is the escaped form of an IRI.) | (Note: This is the escaped form of an IRI.) | |||
| Michel Suignard | Michel Suignard | |||
| Microsoft Corporation | Microsoft Corporation | |||
| One Microsoft Way | One Microsoft Way | |||
| Redmond, WA 98052 | Redmond, WA 98052 | |||
| U.S.A. | U.S.A. | |||
| End of changes. | ||||