<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>26936</bug_id>
          
          <creation_ts>2014-09-30 07:32:46 +0000</creation_ts>
          <short_desc>Correct one range in url-code-points</short_desc>
          <delta_ts>2014-09-30 11:51:46 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>URL</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>major</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>mark</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>mike</cc>
          
          <qa_contact>sideshowbarker+urlspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>112445</commentid>
    <comment_count>0</comment_count>
    <who name="">mark</who>
    <bug_when>2014-09-30 07:32:46 +0000</bug_when>
    <thetext>The URL code points are defined in https://url.spec.whatwg.org/#url-code-points

They include the following ranges:

...U+D0000 to U+DFFFD, U+E1000 to U+EFFFD

The range start U+E1000 is incorrect. It needs to be changed to U+E0000, like the other range starts, so that U+E0000 to U+E0FFF are also URL code points.
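
A minimal Python sketch of the change (not the spec's own text) writes out the non-ASCII URL code point ranges with the start of plane 14 as a parameter. It assumes that every supplementary plane otherwise runs from U+n0000 to U+nFFFD, as the other range starts do. The BMP ranges are my reading of the spec's list, and the ASCII alphanumerics and punctuation are left out. The helper names are hypothetical.

```python
# Sketch: the non-ASCII URL code point ranges, before and after the fix.
# ASCII alphanumerics and punctuation are deliberately left out.

BMP_RANGES = [(0x00A0, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD)]


def supplementary_ranges(plane14_start):
    """Inclusive ranges for planes 1-16, each ending at U+nFFFD.

    Every plane starts at U+n0000 except plane 14, whose start is passed in.
    """
    ranges = []
    for plane in range(1, 17):
        start = plane14_start if plane == 14 else plane * 0x10000
        ranges.append((start, plane * 0x10000 + 0xFFFD))
    return ranges


OLD_RANGES = BMP_RANGES + supplementary_ranges(0xE1000)    # as currently defined
FIXED_RANGES = BMP_RANGES + supplementary_ranges(0xE0000)  # as proposed


def is_url_code_point(cp, ranges):
    """True if cp lies in one of the inclusive (first, last) ranges."""
    return any(cp in range(first, last + 1) for first, last in ranges)


# U+E0100 VARIATION SELECTOR-17 lies in the gap U+E0000..U+E0FFF.
print(is_url_code_point(0xE0100, OLD_RANGES))    # False
print(is_url_code_point(0xE0100, FIXED_RANGES))  # True
```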


Background

There are several reasons for this:

1. Compatibility with HTML's preprocessing of the input stream, which excludes U+DFFFE and U+DFFFF but not the range U+E0000 to U+E0FFF.
https://html.spec.whatwg.org/multipage/syntax.html#preprocessing-the-input-stream

2. Compatibility with the XML definition of a character, which does the same.
http://www.w3.org/TR/REC-xml/#dt-character

3. Compatibility with UTS #46, which allows characters in that range (they are "ignored", but allowed):

E0100..E01EF; ignored    # 4.0  VARIATION SELECTOR-17..VARIATION SELECTOR-256
E01F0..EFFFD; disallowed # NA   &lt;reserved-E01F0&gt;..&lt;reserved-EFFFD&gt;
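
The two UTS #46 lines above use the mapping table's line format: a code point range, a semicolon, a status, and a comment after "#". A hypothetical Python helper that reads only that format (with the second comment shortened) shows that VS-17 is classified as ignored rather than disallowed. It is a sketch, not a full parser for the real data file.

```python
# Hypothetical helper for lines in the style of the UTS #46 mapping table:
# "first..last; status # comment" (a single code point has no "..").

TABLE = """\
E0100..E01EF; ignored    # 4.0  VARIATION SELECTOR-17..VARIATION SELECTOR-256
E01F0..EFFFD; disallowed # NA   reserved
"""


def parse_line(line):
    """Return (first, last, status) for a data line, or None if there is no data."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None  # blank or comment-only line
    fields = [f.strip() for f in data.split(";")]
    first, _, last = fields[0].partition("..")
    first_cp = int(first, 16)
    last_cp = int(last, 16) if last else first_cp
    return first_cp, last_cp, fields[1]


def status_of(cp, lines):
    """Status of code point cp, or None if no line covers it."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        first, last, status = parsed
        if cp in range(first, last + 1):
            return status
    return None


ENTRIES = TABLE.splitlines()
print(status_of(0xE0100, ENTRIES))  # ignored (VS-17)
print(status_of(0xE01F0, ENTRIES))  # disallowed
```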

This matters for the path, query, and fragment states (among others), because they check each code point against the URL code points.

https://url.spec.whatwg.org/#relative-path-state
https://url.spec.whatwg.org/#query-state
https://url.spec.whatwg.org/#fragment-state
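
A short Python sketch makes the effect on a path segment concrete. It assumes a segment made of U+845B (a CJK ideograph) followed by U+E0100 VARIATION SELECTOR-17. It lists the code points that fall in the gap U+E0000 to U+E0FFF, then percent-encodes the segment with the standard library's urllib.parse.quote. That function is not the URL Standard's parser, but it produces the same UTF-8 percent-encoding for these code points.

```python
from urllib.parse import quote

# U+845B followed by U+E0100 VARIATION SELECTOR-17: a sequence of the kind
# used to select a specific glyph variant of a CJK ideograph.
segment = "\u845B\U000E0100"

# Code points in the gap U+E0000..U+E0FFF left by the old range start.
in_gap = [hex(ord(c)) for c in segment if ord(c) in range(0xE0000, 0xE1000)]
print(in_gap)  # ['0xe0100']

# quote() percent-encodes the UTF-8 bytes; the selector becomes F3 A0 84 80.
print(quote(segment))  # %E8%91%9B%F3%A0%84%80
```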

Note: VS-17 to VS-256 (U+E0100 to U+E01EF) are used to select particular variants of CJK ideographs, so it is important that they be allowed in paths, queries, fragments, and so on.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112455</commentid>
    <comment_count>1</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-09-30 11:51:46 +0000</bug_when>
    <thetext>https://github.com/whatwg/url/commit/d7010306adf67d6e07d645122c2c27f8a1f8cf31</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>