23009 – Unicode normalization can produce / code points in domain names

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+urlspec

Reported:	2013-08-19 16:11 UTC by Anne
Modified:	2014-01-14 11:19 UTC
CC List:	3 users

Description Anne 2013-08-19 16:11:58 UTC

E.g. "℁" (U+2101) normalizes to "a/s". This means "http://ex℁ample℁" becomes "http://exa/samplea/s", except the host is "exa/samplea/s" rather than "exa"...
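
The effect can be reproduced with a minimal sketch. It assumes plain NFKC normalization from Python's standard library as a stand-in for the normalization step of IDNA2003 (Nameprep), which it is not identical to; the helper name `nfkc` is made up for illustration.

```python
import unicodedata
from urllib.parse import urlsplit


def nfkc(text: str) -> str:
    """Apply NFKC normalization (assumed stand-in for the Nameprep step)."""
    return unicodedata.normalize("NFKC", text)


# Host string from the report, with U+2101 written as an escape.
raw_host = "ex\u2101ample\u2101"
normalized_host = nfkc(raw_host)
print(normalized_host)  # exa/samplea/s

# Re-parsing the serialized URL no longer sees the same host:
# everything from the first "/" onward is treated as the path.
reparsed = urlsplit("http://" + normalized_host)
print(reparsed.hostname)  # exa
print(reparsed.path)      # /samplea/s
```

The mismatch between `normalized_host` and `reparsed.hostname` is the re-parsing problem described above.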

We should probably fail host parsing if the output gives any label that contains "/" as a code point, presumably by further overriding the IDNA2003 ToASCII algorithm. Other code points that would change re-parsing and would need to be added: ":", "\", "?", "#".
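
A rough sketch of the proposed check follows. It is not the change that eventually landed in the URL Standard: NFKC is again assumed as a stand-in for ToASCII, and the names `FORBIDDEN_AFTER_NORMALIZATION` and `normalize_host_or_fail` are hypothetical.

```python
import unicodedata
from typing import Optional

# The five code points named in the report: "/", ":", "\", "?", "#".
FORBIDDEN_AFTER_NORMALIZATION = frozenset("/:\\?#")


def normalize_host_or_fail(host: str) -> Optional[str]:
    """Return the normalized host, or None to signal a host parsing failure.

    Assumption: NFKC stands in for the IDNA2003 ToASCII processing.
    """
    normalized = unicodedata.normalize("NFKC", host)
    for label in normalized.split("."):
        # Fail if normalization introduced a code point that would
        # change how the serialized URL is re-parsed.
        if any(ch in FORBIDDEN_AFTER_NORMALIZATION for ch in label):
            return None
    return normalized


print(normalize_host_or_fail("example.com"))       # example.com
print(normalize_host_or_fail("ex\u2101ample"))     # None (U+2101 yields "/")
print(normalize_host_or_fail("exa\uff0fmple"))     # None (U+FF0F yields "/")
```

Checking after normalization rather than before matters here, because the input itself contains none of the forbidden code points.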

Source: http://krijnhoetmer.nl/irc-logs/whatwg/20130815#l-327

Comment 1 Peter Occil 2013-08-20 15:44:04 UTC

There is no need to "override" the algorithm; IDNA2003 already includes a flag for that purpose, "UseSTD3ASCIIRules"; see section 4 of RFC 3490.

Comment 2 Anne 2013-08-20 16:28:16 UTC

It does, but that excludes way more code points than implementations do and is not compatible with the web. E.g. _ (U+005F) must not be excluded.
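
The difference between the two approaches can be sketched as below. The STD3 check follows the non-LDH ASCII ranges and the leading/trailing hyphen rule given in RFC 3490 section 4.1, step 3; the narrow set is the five code points from the description. Function names are made up for illustration, and neither function is taken from an actual implementation.

```python
# Narrow set proposed in the description: "/", ":", "\", "?", "#".
NARROW_FORBIDDEN = frozenset("/:\\?#")


def violates_std3(label: str) -> bool:
    """True if the label fails the UseSTD3ASCIIRules checks.

    Rejects any ASCII code point that is not a letter, digit, or hyphen,
    and any label with a leading or trailing hyphen.
    """
    for ch in label:
        if ord(ch) <= 0x7F and not (ch.isalnum() or ch == "-"):
            return True
    return label.startswith("-") or label.endswith("-")


def violates_narrow(label: str) -> bool:
    """True if the label contains one of the five listed code points."""
    return any(ch in NARROW_FORBIDDEN for ch in label)


for label in ("example", "my_host", "a/s"):
    print(label, violates_std3(label), violates_narrow(label))
# example False False
# my_host True False
# a/s True True
```

The "my_host" row shows the compatibility concern: the STD3 rules reject "_" (U+005F), while the narrow set leaves it alone.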

Comment 3 Anne 2014-01-14 11:19:59 UTC

https://github.com/whatwg/url/commit/81cdd6704ea695e1619e76794227d2c9d10d2aa7
https://github.com/whatwg/url/commit/0eaf28c5ae63b5b0487cce484f3ce201e0d98494