This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
It seems Chrome and Firefox do lowercase hosts on parsing. Should this be the standard parsing behaviour?
That's part of the IDNA 2003 algorithm referenced by the specification, no?
IDNA2003's ToASCII does not lowercase the input. At least, not if it's ASCII-only. It reads: "ToASCII never alters a sequence of code points that are all in the ASCII range to begin with". I couldn't find anything that suggests otherwise.
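For illustration only, here is a minimal Python sketch of that behaviour. It assumes that Python's standard-library `encodings.idna` module (an IDNA 2003 implementation) is a fair stand-in for the ToASCII operation discussed here; the helper name `label_to_ascii` is made up for this example.

```python
from encodings.idna import ToASCII  # stdlib IDNA 2003 ToASCII, operates on one label


def label_to_ascii(label: str) -> str:
    """Run ToASCII on a single label and return the result as a str.

    Hypothetical helper for this illustration, not part of any spec.
    """
    return ToASCII(label).decode("ascii")


# An all-ASCII label passes through unchanged, including its case.
print(label_to_ascii("EXAMPLE"))

# A label with non-ASCII code points goes through Nameprep, which case-folds,
# so upper- and lowercase input yield the same Punycode label.
print(label_to_ascii("BÜCHER") == label_to_ascii("bücher"))
```

So, at least in this implementation, case is only normalized when the label contains non-ASCII code points, which matches the quoted sentence.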
So there are several ways to fix this:
1) Add lowercasing to the host state
2) Add lowercasing to the host parser
3) Add lowercasing to the ToASCII (and ToUnicode?) algorithms
Option 2 seems best, but it does not affect the static methods on URL, as they are defined in terms of ToASCII and ToUnicode. It might make sense to redefine those static methods in terms of the host parser and give the host parser an override for which kind of host should be returned (ASCII or Unicode).
The idea we had was to take approach 2), but have the actual lowercasing be part of the ToASCII operation, maybe combined with what is needed for bug 24257 and bug 24191. The host parser could then have a flag indicating whether ToUnicode needs to be run at the end.
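A rough Python sketch of that shape, under stated assumptions: the function name `parse_host` and the `want_unicode` flag are hypothetical, Python's built-in `idna` codec (IDNA 2003) stands in for ToASCII and ToUnicode, and none of the other host parsing steps (percent-decoding, IPv6, forbidden code points) are modelled. It is not meant to reflect what the linked commit actually does.

```python
def parse_host(host: str, want_unicode: bool = False) -> str:
    """Hypothetical host parser: ToASCII plus lowercasing, optional ToUnicode.

    Only the lowercasing idea from this bug is modelled here.
    """
    # The "idna" codec splits on dots and applies ToASCII to each label.
    ascii_host = host.encode("idna").decode("ascii")

    # ToASCII leaves all-ASCII labels untouched, so lowercase explicitly.
    ascii_host = ascii_host.lower()

    if want_unicode:
        # Flag set: run ToUnicode on the normalized ASCII form at the end.
        return ascii_host.encode("ascii").decode("idna")
    return ascii_host


print(parse_host("EXAMPLE.com"))                        # example.com
print(parse_host("ExAmPlE.Org", want_unicode=True))     # example.org
```

With this shape, the static methods on URL could call the same function with the flag set either way, so they would pick up the lowercasing too.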
https://github.com/whatwg/url/commit/81cdd6704ea695e1619e76794227d2c9d10d2aa7