This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 26936 - Correct one range in url-code-points
Summary: Correct one range in url-code-points
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: All All
Importance: P3 major
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
 
Reported: 2014-09-30 07:32 UTC by mark
Modified: 2014-09-30 11:51 UTC

Description mark 2014-09-30 07:32:46 UTC
The URL code points are defined in https://url.spec.whatwg.org/#url-code-points

They include the following ranges:

...U+D0000 to U+DFFFD, U+E1000 to U+EFFFD

The range start U+E1000 is incorrect and needs to be changed to U+E0000, consistent with the other range starts (for example, U+D0000).
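To illustrate the effect of the one-digit change, here is a minimal sketch that tests a code point against the range as currently written and as proposed. The names `CURRENT_RANGE`, `PROPOSED_RANGE`, and `in_range` are made up for this example and do not come from the spec.

```python
from typing import Tuple

# Inclusive bounds quoted in this report (names are hypothetical).
CURRENT_RANGE = (0xE1000, 0xEFFFD)   # range start as the spec reads now
PROPOSED_RANGE = (0xE0000, 0xEFFFD)  # range start after the proposed fix


def in_range(code_point: int, bounds: Tuple[int, int]) -> bool:
    """Return True if code_point lies within the inclusive bounds."""
    start, end = bounds
    return start <= code_point <= end


VS17 = 0xE0100  # VARIATION SELECTOR-17

print(in_range(VS17, CURRENT_RANGE))   # False
print(in_range(VS17, PROPOSED_RANGE))  # True
```

With the current start, everything from U+E0000 to U+E0FFF falls outside the range; the proposed start brings it back in.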


Background

There are several reasons for this:

1. Compatibility with HTML's input-stream preprocessing, which excludes U+DFFFE and U+DFFFF but not the range U+E0000 to U+E0FFF.
https://html.spec.whatwg.org/multipage/syntax.html#preprocessing-the-input-stream

2. Compatibility with the XML definition of a character, which does the same.
http://www.w3.org/TR/REC-xml/#dt-character

3. Compatibility with UTS46, which allows characters in that range (they are ignored, but allowed):

E0100..E01EF; ignored    # 4.0  VARIATION SELECTOR-17..VARIATION SELECTOR-256
E01F0..EFFFD; disallowed # NA   <reserved-E01F0>..<reserved-EFFFD>
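The two UTS46 rows quoted above can be read mechanically. The following is a rough sketch only: it assumes the `start..end; status # comment` layout seen in those two rows, and the names `MappingRow`, `parse_row`, and `status_of` are invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class MappingRow(NamedTuple):
    start: int
    end: int
    status: str


# Matches "E0100..E01EF; ignored" or a single code point "E0100; ignored".
ROW_PATTERN = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_row(line: str) -> MappingRow:
    """Parse one row into its inclusive code point range and status."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    start = int(match.group(1), 16)
    end = int(match.group(2), 16) if match.group(2) else start
    return MappingRow(start, end, match.group(3))


# Only the two rows quoted in this report, not the full table.
ROWS = [
    parse_row("E0100..E01EF; ignored    # 4.0  VARIATION SELECTOR-17..VARIATION SELECTOR-256"),
    parse_row("E01F0..EFFFD; disallowed # NA   <reserved-E01F0>..<reserved-EFFFD>"),
]


def status_of(code_point: int) -> Optional[str]:
    """Return the status of code_point, or None if no quoted row covers it."""
    for row in ROWS:
        if row.start <= code_point <= row.end:
            return row.status
    return None


print(status_of(0xE0100))  # ignored
print(status_of(0xE01F0))  # disallowed
```

A result of `None` only means the code point is not covered by the two rows quoted here; it says nothing about the rest of the UTS46 table.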

This matters for the definitions of the path, query, and fragment states (among others), because they all rely on the URL code points definition.

https://url.spec.whatwg.org/#relative-path-state
https://url.spec.whatwg.org/#query-state
https://url.spec.whatwg.org/#fragment-state

Note: Variation selectors VS-17 through VS-256 (U+E0100 to U+E01EF) are used to indicate particular variants of CJK characters, and it is important that they be allowed in paths, queries, fragments, etc.
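For reference, the correspondence between selector numbers and code points can be checked against the Unicode character database shipped with Python. This is a small sketch; the helper name `variation_selector_code_point` is hypothetical.

```python
import unicodedata

VS17_CODE_POINT = 0xE0100  # first code point of the VS-17..VS-256 block


def variation_selector_code_point(n: int) -> int:
    """Map a selector number in 17..256 to its code point."""
    if not 17 <= n <= 256:
        raise ValueError("only VS-17 through VS-256 are handled here")
    return VS17_CODE_POINT + (n - 17)


for n in (17, 256):
    cp = variation_selector_code_point(n)
    # unicodedata.name returns the official character name.
    print(f"VS-{n}: U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Both endpoints lie below U+E1000, so none of these selectors are covered by a range that starts at U+E1000.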