This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The URL code points are defined in https://url.spec.whatwg.org/#url-code-points

They include the following ranges:

  ...U+D0000 to U+DFFFD, U+E1000 to U+EFFFD

The range start U+E1000 is incorrect and needs to be changed to U+E0000, consistent with how the other ranges start.

Background

There are several reasons for this:

1. Compatibility with HTML's input stream preprocessing, which removes U+DFFFE and U+DFFFF but not the range up to U+E0FFF.
   https://html.spec.whatwg.org/multipage/syntax.html#preprocessing-the-input-stream

2. Compatibility with XML characters, which do the same.
   http://www.w3.org/TR/REC-xml/#dt-character

3. Compatibility with UTS46, which allows characters in that range (ignored, but allowed):

   E0100..E01EF; ignored # 4.0 VARIATION SELECTOR-17..VARIATION SELECTOR-256
   E01F0..EFFFD; disallowed # NA <reserved-E01F0>..<reserved-EFFFD>

This matters for the definitions of the path, query, and fragment states (among others), because they are written in terms of URL code points.

https://url.spec.whatwg.org/#relative-path-state
https://url.spec.whatwg.org/#query-state
https://url.spec.whatwg.org/#fragment-state

Note: VS-17..256 are used to indicate particular variants of CJK characters, and it is important that they be allowed in paths, queries, fragments, etc.
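To illustrate the effect of the typo, here is a minimal Python sketch. It models only the plane-14 range quoted above, not the full URL code point definition, and the helper name plane_range and the two constants are hypothetical, introduced just for this illustration. It assumes, as the report describes, that the range should run from the start of the plane through U+xFFFD.

```python
# Illustrative sketch only: models the single plane-14 range from the report,
# not the complete set of URL code points. All names here are hypothetical.

def plane_range(plane: int, start_offset: int = 0x0000) -> range:
    """Code points of one plane, from start_offset through xFFFD inclusive."""
    base = plane << 16
    return range(base + start_offset, base + 0xFFFD + 1)

# Range as written in the spec at the time of the report: U+E1000..U+EFFFD.
PLANE_14_AS_WRITTEN = plane_range(0xE, 0x1000)
# Range as the report proposes: U+E0000..U+EFFFD.
PLANE_14_CORRECTED = plane_range(0xE)

VS17 = 0xE0100   # VARIATION SELECTOR-17
VS256 = 0xE01EF  # VARIATION SELECTOR-256

for name, cp in (("VS-17", VS17), ("VS-256", VS256)):
    print(
        f"{name} U+{cp:04X}: "
        f"as written={cp in PLANE_14_AS_WRITTEN}, "
        f"corrected={cp in PLANE_14_CORRECTED}"
    )
```

With the range as written, both variation selectors fall outside the range; with the start moved to U+E0000 they fall inside it, while U+EFFFE and U+EFFFF remain excluded either way.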
https://github.com/whatwg/url/commit/d7010306adf67d6e07d645122c2c27f8a1f8cf31