This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 27289 - Host parsing and utf8PercentDecode
Summary: Host parsing and utf8PercentDecode
Status: RESOLVED INVALID
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Sam Ruby
QA Contact: sideshowbarker+urlspec
URL:
Whiteboard:
Keywords:
Depends on: 25946
Blocks:
Reported: 2014-11-10 00:23 UTC by Sam Ruby
Modified: 2014-11-20 19:56 UTC

Description Sam Ruby 2014-11-10 00:23:44 UTC
Consider the following URLs:

1) http://intertwingly.net/projects/pegurl/liveview.html#http://¡/
2) http://intertwingly.net/projects/pegurl/liveview.html#http://¡/
3) http://intertwingly.net/projects/pegurl/liveview.html#http://%C2%A1/

If you UTF-8 encode the third URL, percent decode the result, and then UTF-8 decode it without BOM, you will end up with the second URL.  And indeed browsers treat these two URLs the same.
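
As a rough illustration of that sequence applied to the host portion only, here is a minimal Python sketch. The function name decode_host is hypothetical and is not taken from the spec or from url.js; it assumes the host is already available as a string.

```python
from urllib.parse import unquote_to_bytes


def decode_host(host: str) -> str:
    """Hypothetical helper: UTF-8 encode, percent decode, then UTF-8 decode."""
    raw = host.encode("utf-8")           # UTF-8 encode the input string
    decoded = unquote_to_bytes(raw)      # percent decode, operating on bytes
    # The plain "utf-8" codec does no BOM handling (unlike "utf-8-sig").
    return decoded.decode("utf-8", errors="replace")


# The host of the third URL decodes to the single character U+00A1.
print(decode_host("%C2%A1") == "\u00a1")  # True
```

A host with no percent signs passes through this function unchanged, since encoding and then decoding valid UTF-8 is a round trip.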

Unfortunately, this is also true for the first URL by virtue of the fact that there are no percent signs to decode.  I say unfortunately, as browsers do not treat the first and second URLs the same.

I propose that the spec text in step 3 of https://url.spec.whatwg.org/#host-parsing be changed to reference a utf8PercentDecode function, one that only UTF-8 decodes bytes that were produced by percent decoding.  One possible implementation of such a function can be found in:

http://intertwingly.net/projects/pegurl/url.js
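
For readers without access to that file, the idea can be sketched roughly as follows. This is not the code from url.js; it is a hypothetical Python sketch that assumes "only UTF-8 decodes bytes that were produced by percent decoding" means decoding each run of consecutive percent escapes on its own and leaving every other character untouched. The name utf8_percent_decode and the choice to replace invalid byte sequences with U+FFFD are assumptions.

```python
import re

# A run of one or more consecutive percent escapes, e.g. "%C2%A1".
_PERCENT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def utf8_percent_decode(text: str) -> str:
    """Hypothetical sketch: UTF-8 decode only the percent-decoded bytes."""

    def decode_run(match: re.Match) -> str:
        # Turn "%C2%A1" into the bytes C2 A1, then UTF-8 decode just those.
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        return raw.decode("utf-8", errors="replace")

    # Characters outside percent escapes are never encoded or decoded.
    return _PERCENT_RUN.sub(decode_run, text)


print(utf8_percent_decode("%C2%A1") == "\u00a1")  # True
```

Literal non-ASCII characters are copied through as-is, so only the escaped bytes ever reach the UTF-8 decoder.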
Comment 1 Anne 2014-11-10 09:08:56 UTC
I don't understand why 1) and 2) would be treated the same. Could you elaborate?
Comment 2 Sam Ruby 2014-11-20 19:56:07 UTC
After a more careful reading of the spec, I've come to the conclusion that this bug is invalid.