<?xml version='1.0'?>
<?xml-stylesheet type="text/xsl" href="xmlspec.xsl"?>
<!--  $Id: Overview.xml,v 1.4 2009/07/09 08:47:29 ht Exp $ -->
<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.2//EN" "../../../2002/xmlspec/dtd/2.10/xmlspec.dtd" [
<!ENTITY doc.type "wgnote">
<!ENTITY doc.status "int-review">
<!ENTITY w3c-document-type "W3C Working Group Note">
<!ENTITY iso.doc.date "20081103">
<!ENTITY draft.day "3">
<!ENTITY draft.month "November">
<!ENTITY draft.year "2008">
<!ATTLIST spec xml:lang CDATA #IMPLIED>
]>
<spec xml:lang="en" w3c-doctype="&doc.type;" status="&doc.status;">
 <header>
  <title>Legacy extended IRIs for XML resource identification</title>
  <w3c-designation>&doc.type;-&iso.doc.date;</w3c-designation>
  <w3c-doctype>&w3c-document-type;</w3c-doctype>
  <pubdate>
   <day>&draft.day;</day>
   <month>&draft.month;</month>
   <year>&draft.year; (BNF comment style corrected in place 2009-07-09)</year>
  </pubdate>
  <publoc>
  <loc href="http://www.w3.org/TR/&draft.year;/NOTE-leiri-&iso.doc.date;/"/>
  </publoc>
  <altlocs><loc href="Overview.xml">XML</loc>
  </altlocs>
  <latestloc>
  <loc href="http://www.w3.org/TR/leiri/"/>
  </latestloc>
  <prevlocs>
  </prevlocs>
  <authlist>
   <author>
    <name>Henry S. Thompson</name>
    <affiliation>University of Edinburgh</affiliation>
    <email href="mailto:ht@inf.ed.ac.uk">ht@inf.ed.ac.uk</email>
   </author>
   <author>
    <name>Richard Tobin</name>
    <affiliation>University of Edinburgh</affiliation>
    <email href="mailto:richard@inf.ed.ac.uk">richard@inf.ed.ac.uk</email>
   </author>
   <author>
    <name>Norman Walsh</name>
    <affiliation>Mark Logic Corporation</affiliation>
    <email href="mailto:norman.walsh@marklogic.com">norman.walsh@marklogic.com</email>
   </author>
  </authlist>
<status>
<p><emph>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the <loc href="http://www.w3.org/TR/">W3C technical reports index</loc>  at http://www.w3.org/TR/.</emph></p>
<p>This document is a W3C Working Group Note.  It has been developed
by the <loc href="http://www.w3.org/XML/Core/">XML Core Working Group</loc>, part of the <loc href="http://www.w3.org/XML/">XML Activity</loc> in the W3C <loc href="http://www.w3.org/UbiWeb/">Ubiquitous Web Domain</loc>.</p>
<p>Publication as a Working Group Note does not imply endorsement by
the W3C Membership. This is a draft document and may be updated,
replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in
progress.</p>
      <p>Please send comments about this document to <loc href="mailto:xml-editor@w3.org">xml-editor@w3.org</loc>
(<loc href="http://lists.w3.org/Archives/Public/xml-editor/">archived</loc>).</p>
 <p>This document is very closely based on material from <bibref ref="iribis"/>, specifically section 2.2, "ABNF for IRI References and IRIs" and
section 7, "Legacy Extended IRIs", included here by permission of its
authors.  It is intended to provide a basis for a single
normative reference from many XML- and/or HTML-related standards in advance of
the final publication of <bibref ref="iribis"/> as an RFC.  When that publication occurs, this specification will be
re-issued to reference it in place of the extracts given below.</p>
<p> This document was produced by a group operating under the <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</loc>. W3C maintains a <loc href="http://www.w3.org/2004/01/pp-impl/18796/status">public list of any patent disclosures</loc> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</loc> must disclose the information in accordance with <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</loc>. </p>
</status>
  <abstract id="abstract">
   <p>For historic reasons, some formats have allowed variants of IRIs that
   are somewhat less restricted in syntax, for example XML system identifiers
and W3C XML Schema anyURIs.  This document provides a
   definition and a name (Legacy Extended IRI or LEIRI) for these variants for
   easy reference.  These variants have to be used with care; they
   require further processing before being fully interchangeable as
   IRIs.  New protocols and formats should not use Legacy Extended IRIs.</p>
  </abstract>
  <pubstmt>
   <p>World-Wide Web Consortium, XML Core
    Working Group, 2008.</p>
  </pubstmt>
  <sourcedesc>
   <p>Created in electronic form using XML.</p>
  </sourcedesc>
  <langusage>
   <language id="EN">English</language>
   <language id="abnf">Augmented Backus-Naur Form (formal grammar)</language>
   <language>Extensible Markup Language (XML)</language> </langusage>
  <revisiondesc>
   <slist>
    <sitem>2008-09-17 Copy wholesale from <loc href="http://tools.ietf.org/html/draft-duerst-iri-bis-03">Martin's
previous draft</loc></sitem>
    <sitem>2008-09-29 Updates from, and reference to, <loc href="http://tools.ietf.org/html/draft-duerst-iri-bis-04">Martin's
current draft</loc></sitem>
    <sitem>2008-10-10 and 2008-10-15 Remove Duerst in favour of Tobin
and Walsh as editors, but address a number of Duerst's concerns</sitem>
   </slist>
  </revisiondesc>
  </header>
  <body>
    <div1 id="intro">
      <head>Introduction</head>
     <p>For historic reasons, some formats have allowed variants of IRIs <bibref ref="iri"/> that
   are somewhat less restricted in syntax, for example XML system identifiers
and W3C XML Schema anyURIs.  This document provides a
   definition and a name (Legacy Extended IRI or LEIRI) for these variants for
   easier reference.  These variants have to be used with care; they
   require further processing before being fully interchangeable as 
   IRIs.  New protocols and formats <rfc2119>should not</rfc2119> use Legacy
Extended IRIs.  The provisions in this
   document also apply to Legacy Extended IRI references.
</p>
</div1>
   <div1 id="notation">
    <head>Notation</head>
    <p>In this document, characters are referenced by
   using a prefix of 'U+' followed by four to six hexadecimal digits.</p>
    <p>In this document, the key words <rfc2119>must</rfc2119>, <rfc2119>must not</rfc2119>, <rfc2119>required</rfc2119>,
   <rfc2119>shall</rfc2119>, <rfc2119>shall not</rfc2119>, <rfc2119>should</rfc2119>, <rfc2119>should not</rfc2119>, <rfc2119>recommended</rfc2119>, <rfc2119>may</rfc2119>,
   and <rfc2119>optional</rfc2119> are to be interpreted as described in <bibref ref="maymust"/>.</p>
   </div1>
     <div1 id="syntax">
      <head>Legacy Extended IRI Syntax</head>
      <p>The syntax of Legacy Extended IRIs (LEIRIs) and LEIRI references is
the same as that for IRIs and IRI references except that
<nt def="ucschar">ucschar</nt> is redefined. The syntax of this
   ABNF is described in <bibref ref="abnf_spec"/>.  Character numbers are taken from the
   UCS, without implying any actual binary encoding.  Terminals in the
   ABNF are characters, not bytes.</p>
      <p>For consistency with <bibref ref="iri"/> for IRIs,
generic LEIRI software <rfc2119>should not</rfc2119> check
LEIRIs for conformance to this syntax.</p>
<p>Some productions are ambiguous.  The "first-match-wins" (a.k.a.
   "greedy") algorithm applies.  For details, see <bibref ref="rfc3986"/>.</p>
      <scrap lang="abnf">
  <head>Productions changed from RFC3986</head>
  <prodgroup>
<prod id="LEIRI"><lhs>LEIRI</lhs><rhs><nt def="scheme">scheme</nt> ":" <nt def="ihier-part">ihier-part</nt> [ "?" <nt def="iquery">iquery</nt> ]
                    [ "#" <nt def="ifragment">ifragment</nt> ]</rhs>
</prod>
<prod id="ihier-part"><lhs>ihier-part</lhs>
 <rhs>"//" <nt def="iauthority">iauthority</nt> <nt def="ipath-abempty">ipath-abempty</nt></rhs>
 <rhs>/ <nt def="ipath-absolute">ipath-absolute</nt></rhs>
 <rhs>/ <nt def="ipath-rootless">ipath-rootless</nt></rhs>
 <rhs>/ <nt def="ipath-empty">ipath-empty</nt></rhs>
</prod>
<prod id="LEIRI-reference"><lhs>LEIRI-reference</lhs><rhs><nt def="LEIRI">LEIRI</nt> / <nt def="irelative-ref">irelative-ref</nt></rhs>
</prod>
<prod id="absolute-LEIRI"><lhs>absolute-LEIRI</lhs><rhs><nt def="scheme">scheme</nt> ":" <nt def="ihier-part">ihier-part</nt> [ "?" <nt def="iquery">iquery</nt> ]</rhs>
</prod>
<prod id="irelative-ref"><lhs>irelative-ref</lhs><rhs><nt def="irelative-part">irelative-part</nt> [ "?" <nt def="iquery">iquery</nt> ] [ "#" <nt def="ifragment">ifragment</nt> ]</rhs>
</prod>
<prod id="irelative-part"><lhs>irelative-part</lhs><rhs>"//" <nt def="iauthority">iauthority</nt> <nt def="ipath-abempty">ipath-abempty</nt></rhs>
<rhs>/ <nt def="ipath-absolute">ipath-absolute</nt></rhs>
<rhs>/ <nt def="ipath-noscheme">ipath-noscheme</nt></rhs>
<rhs>/ <nt def="ipath-empty">ipath-empty</nt></rhs>
</prod>
<prod id="iauthority"><lhs>iauthority</lhs><rhs>[ <nt def="iuserinfo">iuserinfo</nt> "@" ] <nt def="ihost">ihost</nt> [ ":" <nt def="port">port</nt> ]</rhs>
</prod>
<prod id="iuserinfo"><lhs>iuserinfo</lhs><rhs>*( <nt def="iunreserved">iunreserved</nt> / <nt def="pct-encoded">pct-encoded</nt> / <nt def="sub-delims">sub-delims</nt> / ":" )</rhs>
</prod>
<prod id="ihost"><lhs>ihost</lhs><rhs><nt def="IP-literal">IP-literal</nt> / <nt def="IPv4address">IPv4address</nt> / <nt def="ireg-name">ireg-name</nt></rhs>
</prod>
<prod id="ireg-name"><lhs>ireg-name</lhs><rhs>*( <nt def="iunreserved">iunreserved</nt> / <nt def="pct-encoded">pct-encoded</nt> / <nt def="sub-delims">sub-delims</nt> )</rhs>
</prod>
<prod id="ipath"><lhs>ipath</lhs><rhs><nt def="ipath-abempty">ipath-abempty</nt>   <com>begins with "/" or is empty</com></rhs>
<rhs>/ <nt def="ipath-absolute">ipath-absolute</nt>  <com>begins with "/" but not "//"</com></rhs>
<rhs>/ <nt def="ipath-noscheme">ipath-noscheme</nt>  <com>begins with a non-colon segment</com></rhs>
<rhs>/ <nt def="ipath-rootless">ipath-rootless</nt>  <com>begins with a segment</com></rhs>
<rhs>/ <nt def="ipath-empty">ipath-empty</nt>     <com>zero characters</com></rhs>
</prod>
<prod id="ipath-abempty"><lhs>ipath-abempty</lhs><rhs>*( "/" <nt def="isegment">isegment</nt> )</rhs>
</prod>
<prod id="ipath-absolute"><lhs>ipath-absolute</lhs><rhs>"/" [ <nt def="isegment-nz">isegment-nz</nt> *( "/" <nt def="isegment">isegment</nt> ) ]</rhs>
</prod>
<prod id="ipath-noscheme"><lhs>ipath-noscheme</lhs><rhs><nt def="isegment-nz-nc">isegment-nz-nc</nt> *( "/" <nt def="isegment">isegment</nt> )</rhs>
</prod>
<prod id="ipath-rootless"><lhs>ipath-rootless</lhs><rhs><nt def="isegment-nz">isegment-nz</nt> *( "/" <nt def="isegment">isegment</nt> )</rhs>
</prod>
<prod id="ipath-empty"><lhs>ipath-empty</lhs><rhs>0&lt;<nt def="ipchar">ipchar</nt>></rhs>
</prod>
<prod id="isegment"><lhs>isegment</lhs><rhs>*<nt def="ipchar">ipchar</nt></rhs>
</prod>
<prod id="isegment-nz"><lhs>isegment-nz</lhs><rhs>1*<nt def="ipchar">ipchar</nt></rhs>
</prod>
<prod id="isegment-nz-nc"><lhs>isegment-nz-nc</lhs><rhs>1*( <nt def="iunreserved">iunreserved</nt> / <nt def="pct-encoded">pct-encoded</nt> / <nt def="sub-delims">sub-delims</nt> / "@" )</rhs>
                  <rhs><com>non-zero-length segment without any colon ":"</com></rhs>
</prod>
<prod id="ipchar"><lhs>ipchar</lhs><rhs><nt def="iunreserved">iunreserved</nt> / <nt def="pct-encoded">pct-encoded</nt> / <nt def="sub-delims">sub-delims</nt> / ":"</rhs>
<rhs>/ "@"</rhs>
</prod>
<prod id="iquery"><lhs>iquery</lhs><rhs>*( <nt def="ipchar">ipchar</nt> / <nt def="iprivate">iprivate</nt> / "/" / "?" )</rhs>
</prod>
<prod id="ifragment"><lhs>ifragment</lhs><rhs>*( <nt def="ipchar">ipchar</nt> / "/" / "?" )</rhs>
</prod>
<prod id="iunreserved"><lhs>iunreserved</lhs><rhs><xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</xnt> / <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt> / "-"
/ "." / "_" / "~" / <nt def="ucschar">ucschar</nt></rhs>
</prod>
<prod id="iprivate"><lhs>iprivate</lhs><rhs>%xE000-F8FF / %xE0000-E0FFF / %xF0000-FFFFD</rhs>
<rhs>/ %x100000-10FFFD</rhs>
</prod>
</prodgroup>
        </scrap>
	<scrap lang="abnf">
  <head>Productions unchanged from RFC3986</head>
  <prodgroup>
<prod id="scheme"><lhs>scheme</lhs><rhs><xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</xnt> *( <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</xnt> / <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt> / "+" / "-" / "." )</rhs>
</prod>
<prod id="port"><lhs>port</lhs><rhs>*<xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt></rhs>
</prod>
<prod id="IP-literal"><lhs>IP-literal</lhs><rhs>"[" ( <nt def="IPv6address">IPv6address</nt> / <nt def="IPvFuture">IPvFuture</nt>  ) "]"</rhs>
</prod>
<prod id="IPvFuture"><lhs>IPvFuture</lhs><rhs>"v" 1*<xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</xnt> "." 1*( <nt def="unreserved">unreserved</nt> / <nt def="sub-delims">sub-delims</nt> / ":" )</rhs>
</prod>
<prod id="IPv6address"><lhs>IPv6address</lhs><rhs>                           6( <nt def="h16">h16</nt> ":" ) <nt def="ls32">ls32</nt></rhs>
<rhs>/                       "::" 5( <nt def="h16">h16</nt> ":" ) <nt def="ls32">ls32</nt></rhs>
<rhs>/ [               <nt def="h16">h16</nt> ] "::" 4( <nt def="h16">h16</nt> ":" ) <nt def="ls32">ls32</nt></rhs>
<rhs>/ [ *1( <nt def="h16">h16</nt> ":" ) <nt def="h16">h16</nt> ] "::" 3( <nt def="h16">h16</nt> ":" ) <nt def="ls32">ls32</nt></rhs>
<rhs>/ [ *2( <nt def="h16">h16</nt> ":" ) <nt def="h16">h16</nt> ] "::" 2( <nt def="h16">h16</nt> ":" ) <nt def="ls32">ls32</nt></rhs>
<rhs>/ [ *3( <nt def="h16">h16</nt> ":" ) <nt def="h16">h16</nt> ] "::"    <nt def="h16">h16</nt> ":"   <nt def="ls32">ls32</nt></rhs>
<rhs>/ [ *4( <nt def="h16">h16</nt> ":" ) <nt def="h16">h16</nt> ] "::"              <nt def="ls32">ls32</nt></rhs>
<rhs>/ [ *5( <nt def="h16">h16</nt> ":" ) <nt def="h16">h16</nt> ] "::"              <nt def="h16">h16</nt> </rhs>
<rhs>/ [ *6( <nt def="h16">h16</nt> ":" ) <nt def="h16">h16</nt> ] "::"</rhs>
</prod>
<prod id="h16"><lhs>h16</lhs><rhs>1*4<xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</xnt></rhs>
</prod>
<prod id="ls32"><lhs>ls32</lhs><rhs>( <nt def="h16">h16</nt>
":" <nt def="h16">h16</nt> ) / <nt def="IPv4address">IPv4address</nt> </rhs>
</prod>
<prod id="IPv4address"><lhs>IPv4address</lhs><rhs><nt def="dec-octet">dec-octet</nt> "." <nt def="dec-octet">dec-octet</nt> "." <nt def="dec-octet">dec-octet</nt> "." <nt def="dec-octet">dec-octet</nt></rhs>
</prod>
<prod id="dec-octet"><lhs>dec-octet</lhs><rhs><xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt>                 <com>0-9</com></rhs>
<rhs>/ %x31-39 <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt>         <com>10-99</com></rhs>
<rhs>/ "1" 2<xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt>            <com>100-199</com></rhs>
<rhs>/ "2" %x30-34 <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt>     <com>200-249</com></rhs>
<rhs>/ "25" %x30-35          <com>250-255</com></rhs>
</prod>
<prod id="pct-encoded"><lhs>pct-encoded</lhs><rhs>"%" <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</xnt> <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</xnt></rhs>
</prod>
<prod id="unreserved"><lhs>unreserved</lhs><rhs><xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</xnt> / <xnt href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</xnt> / "-" / "." / "_" / "~"</rhs>
</prod>
<prod id="reserved"><lhs>reserved</lhs><rhs><nt def="gen-delims">gen-delims</nt> / <nt def="sub-delims">sub-delims</nt></rhs>
</prod>
<prod id="gen-delims"><lhs>gen-delims</lhs><rhs>":" / "/" / "?" / "#" / "[" / "]" / "@"</rhs>
</prod>
<prod id="sub-delims"><lhs>sub-delims</lhs><rhs>"!" / "$" / "&amp;" / "'" / "(" / ")"</rhs>
<rhs>/ "*" / "+" / "," / ";" / "="</rhs>
</prod>  </prodgroup>
        </scrap>
      <scrap lang="abnf">
       <head>Modified ucschar production</head>
       <prod id="ucschar">
        <lhs>ucschar</lhs>
        <rhs>" " / "&lt;" / ">" / '"' / "{" / "}" / "|"</rhs>
        <rhs>/ "\" / "^" / "`" / %x0-1F / %x7F-D7FF</rhs>
        <rhs>/ %xE000-FFFD / %x10000-10FFFF</rhs>
       </prod>
      </scrap>
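      <p>As an illustration only, the modified <nt def="ucschar">ucschar</nt> production can be transcribed into a small Python predicate.  The name <code>is_leiri_ucschar</code> is hypothetical and is not defined by this document; the sketch tests a single character against the production and does not parse whole LEIRIs.</p>
      <eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch; is_leiri_ucschar is a hypothetical helper name.
# Literal characters named in the first alternatives of the production.
_LITERALS = frozenset(' <>"{}|\\^`')

# Inclusive code point ranges from the remaining alternatives.
_RANGES = (
    (0x0, 0x1F),
    (0x7F, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_leiri_ucschar(ch: str) -> bool:
    """Return True if the single character ch matches the modified ucschar."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch in _LITERALS:
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _RANGES)
```
]]></eg>
      <p>Under these assumptions the predicate accepts a space or U+00E9, and rejects ordinary ASCII letters (which are matched by other productions) as well as surrogate code units.</p>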
      <p>The restriction on bidirectional formatting characters in
<xspecref href="http://tools.ietf.org/html/rfc3987#section-4.1">Section 4.1</xspecref> of <bibref ref="iri"/>
   is lifted.  The <nt def="iprivate">iprivate</nt> production becomes redundant.</p>
      <p>Formats that use Legacy Extended IRIs <rfc2119>may</rfc2119> further restrict the
   characters allowed therein, either implicitly by the fact that the
   format as such does not allow some characters, or explicitly.  An
   example of a character not allowed implicitly may be the NUL
   character (<code>U+0000</code>).  However, all the characters allowed in IRIs <rfc2119>must</rfc2119>
   still be allowed.</p>
</div1>
   <div1 id="conversion">
    <head>Conversion of Legacy Extended IRIs to IRIs</head>
    <p>To convert a Legacy Extended IRI (reference) to an IRI (reference), each character allowed in a Legacy Extended IRI (reference)
   but not allowed in an IRI (reference) (see <specref ref="charStatus"/>) <rfc2119>must</rfc2119> be
   percent-encoded by applying the following steps:
    </p>
    <olist>
      <item><p>Convert the character to a sequence of one or more octets
         using UTF-8 <bibref ref="rfc3629"/>.</p></item>
      <item><p>Convert each octet to <code>%HH</code>, where <code>HH</code> is the hexadecimal notation
of the octet value.  Note that this is identical to the percent-encoding
mechanism in Section 2.1 of <bibref ref="rfc3986"/>.  To reduce variability,
the hexadecimal notation <rfc2119>should</rfc2119> use uppercase letters.</p></item>
      <item><p>Replace the original character with the resulting character
         sequence (that is, a sequence of <code>%HH</code> triplets).</p></item>
     </olist>
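    <p>The three steps above can be sketched in Python as follows.  This is a non-normative illustration: the names <code>LEIRI_ONLY_RANGES</code>, <code>needs_encoding</code> and <code>leiri_to_iri</code> are hypothetical, and the range table is transcribed from the groups listed in <specref ref="charStatus"/>.  As a simplification, the sketch percent-encodes private use code points everywhere, including in the query part, where IRIs permit them.</p>
    <eg xml:space="preserve"><![CDATA[
```python
# Non-normative sketch; all names here are hypothetical.
# Inclusive code point ranges allowed in LEIRIs but not in IRIs,
# transcribed from the groups listed in this document.
LEIRI_ONLY_RANGES = (
    (0x0000, 0x0020),      # C0 controls and space
    (0x0022, 0x0022),      # double quote
    (0x003C, 0x003C),      # less-than sign
    (0x003E, 0x003E),      # greater-than sign
    (0x005C, 0x005C),      # backslash
    (0x005E, 0x005E),      # circumflex
    (0x0060, 0x0060),      # grave accent
    (0x007B, 0x007D),      # left brace, vertical bar, right brace
    (0x007F, 0x009F),      # DEL and C1 controls
    (0x200E, 0x200F),      # bidi marks
    (0x202A, 0x202E),      # bidi embeddings and overrides
    (0xE000, 0xF8FF),      # private use (BMP)
    (0xFDD0, 0xFDEF),      # non-characters
    (0xFFF0, 0xFFFD),      # specials
    (0xE0000, 0xE0FFF),    # tags
    (0xF0000, 0xFFFFD),    # private use (plane 15)
    (0x100000, 0x10FFFD),  # private use (plane 16)
)


def needs_encoding(cp: int) -> bool:
    """Return True if code point cp is in one of the LEIRI-only groups."""
    if any(lo <= cp <= hi for lo, hi in LEIRI_ONLY_RANGES):
        return True
    # Non-characters U+nFFFE and U+nFFFF in planes 1 to 16.
    return cp >= 0x10000 and (cp & 0xFFFF) >= 0xFFFE


def leiri_to_iri(leiri: str) -> str:
    """Percent-encode every LEIRI-only character of the input string."""
    out = []
    for ch in leiri:
        if needs_encoding(ord(ch)):
            # Step 1: UTF-8 octets; step 2: uppercase %HH per octet;
            # step 3: the triplets replace the original character.
            out.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
        else:
            # Characters already allowed in IRIs are left untouched.
            out.append(ch)
    return "".join(out)
```
]]></eg>
    <p>Under these assumptions, <code>leiri_to_iri("http://example.com/a b")</code> yields <code>http://example.com/a%20b</code>, while characters already permitted in IRIs, such as U+00E9, and existing <code>%HH</code> triplets pass through unchanged.</p>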
    <p>Conversion from a LEIRI to an IRI or a URI <rfc2119>must</rfc2119> be performed only when absolutely necessary and as late as possible in a processing chain. In particular, neither the process of converting a relative LEIRI to an absolute one nor the process of passing a LEIRI to a process or software component responsible for dereferencing it <rfc2119>should</rfc2119> trigger percent-encoding.</p>
   </div1>
   <div1 id="charStatus">
    <head>Characters allowed in Legacy Extended IRIs but not in IRIs</head>
    <p>This section provides a list of the groups of characters and code
   points that are allowed in Legacy Extended IRIs but are not allowed
   in IRIs or are allowed in IRIs only in the query part.  For each
   group of characters, advice on the usage of these characters is also
   given, concentrating on the reasons not to use them.</p>
    <glist>
     <gitem>
      <label>Space (U+0020)</label>
      <def>
       <p>Some formats and applications use space as a delimiter, for example, for
items in a list.  Appendix C of <bibref ref="rfc3986"/> also mentions that
white space may have to be added when displaying or printing long URIs; the
same applies to long IRIs.  This means that spaces can disappear or can cause
the Legacy Extended IRI to be interpreted as two or more separate IRIs.</p>
      </def>
     </gitem>
     <gitem>
      <label>Delimiters "&lt;" (U+003C), ">" (U+003E) and '"' (U+0022)</label>
      <def>
       <p>Appendix C of <bibref ref="rfc3986"/> suggests the use of
double-quotes (<code>"http://example.com/"</code>) and angle brackets
(<code>&lt;http://example.com/></code>) as delimiters for URIs in plain text.
These conventions are often used and also apply to IRIs.  Legacy Extended IRIs
using these characters will be cut off at the wrong place.</p>
      </def>
     </gitem>
     <gitem>
      <label>Unwise characters "\" (U+005C), "^" (U+005E), "`" (U+0060), "{"
      (U+007B), "|" (U+007C) and "}" (U+007D)</label>
      <def>
       <p>These characters
      were originally excluded from URIs because the respective
      code points are assigned to different graphic characters in some
      7-bit or 8-bit encodings.  Despite the move to Unicode, some of
      these characters are still occasionally displayed differently on
      some systems, for example, <code>U+005C</code> as a Japanese Yen symbol.  Also, the
      fact that these characters are not used in URIs or IRIs has
      encouraged their use outside URIs or IRIs in contexts that may
      include URIs or IRIs.  In case a Legacy Extended IRI with such a
      character is used in such a context, the Legacy Extended IRI will
      be interpreted piecemeal.</p>
      </def>
     </gitem>
     <gitem>
      <label>The controls (C0 controls, DEL and C1 controls, U+0000-001F, U+007F-009F)</label>
      <def>
       <p>There is no way to transmit these characters reliably
      except potentially in electronic form.  Even when in electronic
      form, some software components might silently filter out some of
      these characters or may stop processing altogether when
      encountering some of them.  These characters may affect text
      display in subtle, unnoticeable ways or in drastic, global and
      irreversible ways depending on the hardware and software involved.
      The use of some of these characters may allow malicious users to
      manipulate the display of a Legacy Extended IRI and its context.</p>
      </def>
     </gitem>
     <gitem>
      <label>Bidi formatting characters (U+200E, U+200F, U+202A-202E)</label>
      <def>
       <p>These
      characters affect the display ordering of characters.  Displayed
      Legacy Extended IRIs containing these characters cannot be
      converted back to electronic form (logical order) unambiguously.
      These characters may allow malicious users to manipulate the
      display of a Legacy Extended IRI and its context.</p>
      </def>
     </gitem>
     <gitem>
      <label>Specials (U+FFF0-FFFD)</label>
      <def>
       <p>These code points provide functionality
       beyond that useful in a Legacy Extended IRI, for example byte
       order identification, annotation and replacements for unknown
       characters and objects.  Their use and interpretation in a Legacy
       Extended IRI serves no purpose and may lead to confusing display
       variations.</p>
      </def>
     </gitem>
     <gitem>
      <label>Private use code points (U+E000-F8FF, U+F0000-FFFFD, U+100000-10FFFD)</label>
      <def>
       <p>Display and interpretation of these code points is by
      definition undefined without private agreement.  Therefore, these
      code points are not suited for use on the Internet.  They are not
      interoperable and may have unpredictable effects.</p>
      </def>
     </gitem>
     <gitem>
      <label>Tags (U+E0000-E0FFF)</label>
      <def>
       <p> These characters provide a way to include language
       tags in Unicode plain text.  They are not appropriate for Legacy
       Extended IRIs because language information in identifiers cannot
       reliably be input, transmitted (for example, on a visual medium such as
       paper), or recognized.</p>
      </def>
     </gitem>
     <gitem>
      <label>Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF, U+2FFFE-2FFFF,
      U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF,
      U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF,
      U+BFFFE-BFFFF, U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF,
      U+FFFFE-FFFFF, U+10FFFE-10FFFF)</label>
      <def>
       <p>These code points are defined as
      non-characters.  Applications may use some of them internally, but
      are not prepared to interchange them.</p>
      </def>
     </gitem>
    </glist>
    <p>For reference, the code points and code units that are not allowed
   even in Legacy Extended IRIs are also listed here:</p>
    <glist>
     <gitem>
      <label>Surrogate code units (U+D800-U+DFFF)</label>
      <def>
       <p>These do not represent Unicode
      code points.</p>
      </def>
     </gitem>
    </glist>
   </div1>
</body>

<back>
<div1 id="refs">
 <head>References</head>
 <blist>
  <bibl id="maymust" key="RFC2119">Bradner, S., <emph>Key words for use in RFCs to Indicate
              Requirement Levels</emph>, BCP 14, RFC 2119, IETF, March 1997.
Available online as <loc href="http://tools.ietf.org/html/bcp14">http://tools.ietf.org/html/bcp14</loc> and <loc href="http://tools.ietf.org/html/rfc2119">http://tools.ietf.org/html/rfc2119</loc>.</bibl>
  <bibl id="abnf_spec" key="RFC5234">Crocker, D. and P. Overell, eds., <emph>Augmented BNF for Syntax
              Specifications: ABNF</emph>, STD 68, RFC 5234, IETF, January 2008.
Available online as <loc href="http://tools.ietf.org/html/rfc5234">http://tools.ietf.org/html/rfc5234</loc>.</bibl>
  <bibl id="rfc3629" key="RFC3629">Yergeau, F., <emph>UTF-8, a transformation format of ISO
              10646</emph>, STD 63, RFC 3629, IETF, November 2003.  Available
online as <loc href="http://tools.ietf.org/html/rfc3629">http://tools.ietf.org/html/rfc3629</loc>.</bibl>
  <bibl id="rfc3986" key="RFC3986">Berners-Lee, T., R. Fielding and L. Masinter, <emph>Uniform
              Resource Identifier (URI): Generic Syntax</emph>, STD 66,
              RFC 3986, IETF, January 2005.  Available online as <loc href="http://tools.ietf.org/html/rfc3986">http://tools.ietf.org/html/rfc3986</loc>.</bibl>
  <bibl id="iri" key="RFC3987">Dürst, M. and M. Suignard, eds., <emph>Internationalized Resource Identifiers
(IRIs)</emph>, RFC 3987, IETF,
2005.  Available online as <loc href="http://tools.ietf.org/html/rfc3987">http://tools.ietf.org/html/rfc3987</loc>.</bibl>
  <bibl id="iribis" key="IRI-bis">Dürst, M. and M. Suignard, eds., <emph>Internationalized Resource Identifiers
(IRIs)</emph>, draft-duerst-iri-bis-04, IETF,
2008.  Available online as <loc href="http://tools.ietf.org/html/draft-duerst-iri-bis-04">http://tools.ietf.org/html/draft-duerst-iri-bis-04</loc>.</bibl>
 </blist>
</div1>
</back>
</spec>
