This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23009 - Unicode normalization can produce / code points in domain names
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
Reported: 2013-08-19 16:11 UTC by Anne
Modified: 2014-01-14 11:19 UTC

Description Anne 2013-08-19 16:11:58 UTC
E.g. "℁" (U+2101) normalizes to "a/s". This means "http://ex℁ample/" becomes "http://exa/sample/", except that the parsed host is "exa/sample" rather than "exa", which is what re-parsing the serialized URL would give...
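
The effect can be reproduced with a minimal sketch, assuming Python's `unicodedata` NFKC normalization as a stand-in for the normalization step of IDNA2003 ToASCII (the real algorithm does more than this). The variable names are illustrative only.

```python
import unicodedata
from urllib.parse import urlsplit

# U+2101 sits in what looks like the host portion of the URL.
original = "http://ex\u2101ample/"

# Assumption: NFKC approximates the normalization IDNA2003 applies to the host.
normalized = unicodedata.normalize("NFKC", original)
print(normalized)

# Re-parsing the normalized string cuts the host at the newly introduced "/".
reparsed_host = urlsplit(normalized).hostname
print(reparsed_host)
```

The second parse sees a shorter host than the one the first parse was processing, which is the mismatch described above.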

We should probably fail host parsing if the output gives any label that contains "/" as a code point. Presumably by further overriding the IDNA2003 ToASCII algorithm. Other code points that would change re-parsing and would need to be added: ":", "\", "?", "#".
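
A rough sketch of the proposed check, again assuming NFKC as an approximation of the IDNA2003 normalization step. The function name `check_normalized_host` and the constant are hypothetical and are not taken from the URL specification.

```python
import unicodedata

# Code points listed in this report as changing how the URL re-parses.
FORBIDDEN_HOST_CODE_POINTS = frozenset("/:\\?#")


def check_normalized_host(host: str) -> str:
    """Return the normalized host, or raise ValueError if any label
    contains a code point that would alter re-parsing."""
    # Assumption: NFKC stands in for the normalization done by ToASCII.
    normalized = unicodedata.normalize("NFKC", host)
    for label in normalized.split("."):
        bad = FORBIDDEN_HOST_CODE_POINTS.intersection(label)
        if bad:
            raise ValueError(
                f"label {label!r} contains forbidden code points {sorted(bad)}"
            )
    return normalized
```

With this sketch, `check_normalized_host("ex\u2101ample")` raises `ValueError`, while a host such as `"my_host.example"` passes through unchanged.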

Source: http://krijnhoetmer.nl/irc-logs/whatwg/20130815#l-327

Comment 1 Peter Occil 2013-08-20 15:44:04 UTC
There is no need to "override" the algorithm; IDNA2003 already includes a flag for that purpose, "UseSTD3ASCIIRules"; see section 4 of RFC 3490.

Comment 2 Anne 2013-08-20 16:28:16 UTC
It does, but that excludes way more code points than implementations do and is not compatible with the web. E.g. _ (U+005F) must not be excluded.
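
To illustrate the difference in strictness, here is a small sketch of the label check that RFC 3490 describes for UseSTD3ASCIIRules: ASCII code points other than letters, digits and hyphen are rejected, as are leading and trailing hyphens. The helper name `passes_std3_ascii_rules` is hypothetical, and this is not a full ToASCII implementation.

```python
import string

# ASCII code points permitted in a label when UseSTD3ASCIIRules is set.
STD3_ALLOWED_ASCII = frozenset(string.ascii_letters + string.digits + "-")


def passes_std3_ascii_rules(label: str) -> bool:
    """Approximate the UseSTD3ASCIIRules check for a single label."""
    # Only ASCII code points are restricted; non-ASCII is left to other steps.
    ascii_ok = all(ch in STD3_ALLOWED_ASCII for ch in label if ord(ch) < 0x80)
    hyphen_ok = not (label.startswith("-") or label.endswith("-"))
    return ascii_ok and hyphen_ok
```

Under this check both `"a/s"` and `"my_host"` fail, so enabling the flag would reject underscore labels along with the problematic "/" case.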