This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24187 - Lowercase ASCII-only host
Summary: Lowercase ASCII-only host
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL:
Whiteboard:
Keywords:
Depends on: 24191 24257
Blocks:
Reported: 2014-01-02 10:32 UTC by Santiago M. Mola
Modified: 2014-01-13 18:16 UTC (History)

Description Santiago M. Mola 2014-01-02 10:32:38 UTC
It seems Chrome and Firefox do lowercase hosts on parsing. Should this be the standard parsing behaviour?
Comment 1 Anne 2014-01-06 18:13:41 UTC
That's part of the IDNA 2003 algorithm referenced by the specification, no?
Comment 2 Santiago M. Mola 2014-01-06 21:37:32 UTC
IDNA2003's ToASCII does not lower-case the input. At least, not if it's ASCII-only. It reads: "ToASCII never alters a sequence of code points that are all in the ASCII range to begin with". I couldn't find anything that suggests otherwise.
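The behaviour described in this comment can be observed with any IDNA 2003 implementation. As a rough illustration, the sketch below uses Python's built-in "idna" codec, which implements RFC 3490. It stands in for the ToASCII operation the URL specification references and is not the specification's own algorithm. The helper name to_ascii_idna2003 is made up for this example.

```python
def to_ascii_idna2003(host: str) -> str:
    """Apply IDNA 2003 ToASCII to each label via Python's built-in codec.

    Hypothetical helper for illustration; the codec is an assumption
    standing in for the ToASCII operation discussed in this bug.
    """
    return host.encode("idna").decode("ascii")


# An ASCII-only host passes through with its case untouched.
print(to_ascii_idna2003("EXAMPLE.com"))  # prints "EXAMPLE.com"
```

Because the uppercase letters survive, any lowercasing that Chrome and Firefox perform has to come from a step outside ToASCII as IDNA 2003 defines it.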
Comment 3 Anne 2014-01-08 10:10:01 UTC
So there are several ways to fix this.

1) Add lowercasing to host state
2) Add lowercasing to the host parser
3) Add lowercasing to the ToASCII (and ToUnicode?) algorithms

Option 2 seems best, but it does not affect the static methods on URL, as they are defined in terms of ToASCII and ToUnicode. It might make sense to redefine those static methods in terms of the host parser and have an override in the host parser as to what kind of host you want to be returned (ASCII or Unicode).
Comment 4 Anne 2014-01-11 17:28:08 UTC
The idea we had was to do approach 2), but have the actual lowercasing be part of the ToASCII operation. Maybe combined with what is needed for bug 24257 and bug 24191.

The host parser could then have a flag indicating whether ToUnicode needs to be run at the end.
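One possible reading of this proposal, as a minimal sketch: lowercasing lives inside a ToASCII-style operation, and the host parser takes a flag that decides whether ToUnicode runs on the result. The names domain_to_ascii, parse_host and to_unicode are hypothetical, and Python's built-in "idna" codec (IDNA 2003) is assumed as the underlying ToASCII/ToUnicode implementation. The changes tracked in bug 24257 and bug 24191 are not modelled.

```python
def domain_to_ascii(domain: str) -> str:
    """Hypothetical ToASCII operation that also lowercases its result."""
    # The built-in codec runs IDNA 2003 ToASCII per label; labels that
    # are already ASCII come back unchanged, including their case.
    ascii_form = domain.encode("idna").decode("ascii")
    # The result is ASCII-only here, so lower() only touches A-Z.
    return ascii_form.lower()


def parse_host(raw: str, to_unicode: bool = False) -> str:
    """Hypothetical host parser with a flag selecting the output form."""
    ascii_host = domain_to_ascii(raw)
    if to_unicode:
        # Run ToUnicode at the end when the caller wants the Unicode form.
        return ascii_host.encode("ascii").decode("idna")
    return ascii_host


print(parse_host("EXAMPLE.Com"))  # prints "example.com"
```

With this shape, static methods that return an ASCII or a Unicode host could both be thin wrappers around parse_host, differing only in the flag they pass.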