This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 23958 - Percent-decode on non-ASCII input
Summary: Percent-decode on non-ASCII input
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
Reported: 2013-12-02 20:19 UTC by Simon Sapin
Modified: 2014-01-14 16:11 UTC

Description Simon Sapin 2013-12-02 20:19:38 UTC
http://url.spec.whatwg.org/#host-parsing

[[
Let host be the result of running utf-8's decoder on the percent decoding of input. 
]]

Percent decoding is defined for "a string using code points in the range U+0000 to U+007F", but there is no guarantee that input contains only such code points at that point in host parsing.

What happens to non-ASCII code points?
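
To make the gap concrete, here is a minimal sketch of a percent decode that maps each code point to one byte, as the quoted definition implies for ASCII input. The function name `naivePercentDecode` is hypothetical, and throwing on a code point above U+007F is an assumption made only to expose the undefined case; the quoted spec text does not say what should happen.

```javascript
// Hypothetical helper, not taken from the URL Standard.
// Treats every non-escape code unit as a single byte, which only works for ASCII.
function naivePercentDecode(input) {
  const bytes = [];
  for (let i = 0; i < input.length; i++) {
    const hex = input.slice(i + 1, i + 3);
    if (input[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // "%XX" escape: emit the byte it names and skip the two hex digits.
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      const unit = input.charCodeAt(i);
      if (unit > 0x7f) {
        // Assumption: fail loudly, because the value does not fit the ASCII-only definition.
        const shown = unit.toString(16).toUpperCase().padStart(4, "0");
        throw new RangeError(`U+${shown} is outside U+0000 to U+007F`);
      }
      bytes.push(unit);
    }
  }
  return Uint8Array.from(bytes);
}
```

With this sketch, `naivePercentDecode("a%41")` yields the bytes 0x61 0x41, while `naivePercentDecode("☃")` throws, because U+2603 has no single-byte representation.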
Comment 1 Anne 2013-12-03 13:14:40 UTC
How do you get non-ASCII there?
Comment 2 Simon Sapin 2013-12-03 13:29:18 UTC
new URL('http://☃/')
Comment 3 Anne 2013-12-03 13:33:32 UTC
So one option would be to percent escape during the host name state. But that is not the only entry point to the host parser. Hmm.
Comment 4 Simon Sapin 2013-12-04 19:01:36 UTC
The host parser should UTF-8 percent-encode its input just before it percent-decodes in step 3.
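
The ordering proposed in this comment could be sketched as follows. This is an illustration under stated assumptions, not the spec's algorithm: the names `utf8PercentEncodeNonAscii`, `percentDecode`, and `decodeHost` are hypothetical, and only code points above U+007F are escaped here.

```javascript
// Hypothetical helpers; the names are not taken from the URL Standard.

// Replace every code point above U+007F with the percent-escaped bytes of its UTF-8 encoding.
function utf8PercentEncodeNonAscii(input) {
  const encoder = new TextEncoder(); // always encodes as UTF-8
  let out = "";
  for (const ch of input) { // iterates by code point, not by UTF-16 code unit
    if (ch.codePointAt(0) <= 0x7f) {
      out += ch;
      continue;
    }
    for (const b of encoder.encode(ch)) {
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

// Percent decode for input that is now guaranteed to be ASCII-only.
function percentDecode(asciiInput) {
  const bytes = [];
  for (let i = 0; i < asciiInput.length; i++) {
    const hex = asciiInput.slice(i + 1, i + 3);
    if (asciiInput[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      bytes.push(asciiInput.charCodeAt(i));
    }
  }
  return Uint8Array.from(bytes);
}

// Encode first, then percent-decode, then run the UTF-8 decoder on the bytes.
function decodeHost(input) {
  const ascii = utf8PercentEncodeNonAscii(input);
  return new TextDecoder("utf-8").decode(percentDecode(ascii));
}
```

Under these assumptions, `utf8PercentEncodeNonAscii("☃")` gives `"%E2%98%83"`, and both `decodeHost("☃")` and `decodeHost("%E2%98%83")` give `"☃"`, so the literal and pre-escaped spellings of the host from comment 2 end up as the same string before any later host processing.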