This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
If I interpreted correctly, with the current spec, "http://ho st" will be parsed with no errors at all, resulting in host set to "ho st". This doesn't match WebKit or Gecko behaviour.
WebKit: location.host = 'ho st' will lead to an HTTP request to "http://ho%20st"
Gecko: location.host = 'ho st' will throw NS_ERROR_MALFORMED_URI
Are you sure? I thought the ToASCII algorithm would fail. If it doesn't this would be a bug somewhere.
Right. ToASCII algorithm will fail if the UseSTD3ASCIIRules flag is set. It won't otherwise.
Okay, so browsers do not implement UseSTD3ASCIIRules. They allow some, such as _, but forbid others, such as space. Le sigh. We cannot set that flag because of the underscore. So we need a list of code points, including space, that will make the parser return failure. That seems the easiest. Maybe that could be done in the same step that lowercases certain code points.
Firefox only forbids 0x00 and 0x20 at the moment. However, if we percent-decode we should also forbid "%", "/", "\", "?", "#", and ":" as otherwise you can get re-parsing attacks.
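A minimal sketch of the check proposed above, assuming the host is percent-decoded first and then scanned for the listed code points. The names `FORBIDDEN_HOST_CODE_POINTS` and `checkDecodedHost` are hypothetical, and `decodeURIComponent` is only a stand-in for whatever percent-decoding the parser would actually use.

```javascript
// Hypothetical list taken from the comment above: 0x00 and 0x20 (what
// Firefox rejects) plus the delimiters that could allow re-parsing attacks.
const FORBIDDEN_HOST_CODE_POINTS = new Set([
  0x00, // NUL
  0x20, // space
  0x25, // %
  0x2f, // /
  0x5c, // \
  0x3f, // ?
  0x23, // #
  0x3a, // :
]);

// Percent-decode the raw host, then reject it if any forbidden code point
// is present. Returns { ok: true, host } or { ok: false, ... }.
function checkDecodedHost(rawHost) {
  let decoded;
  try {
    // Stand-in for the parser's percent-decoding step (assumption).
    decoded = decodeURIComponent(rawHost);
  } catch (e) {
    return { ok: false, reason: "malformed percent-encoding" };
  }
  for (const ch of decoded) {
    const cp = ch.codePointAt(0);
    if (FORBIDDEN_HOST_CODE_POINTS.has(cp)) {
      return { ok: false, reason: "forbidden code point", codePoint: cp };
    }
  }
  return { ok: true, host: decoded };
}
```

With this sketch, "ho%20st" is rejected because it decodes to a space, and "a%253Ab" is rejected because it decodes to "a%3Ab", which still contains "%" and would decode to ":" if it were ever parsed again.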
I used this to find failure code points:

<script>
function testURL(url, cp) {
  var a = document.createElement("a")
  a.href = url
  output = cp + ": "
  if(a.host)
    output += "parsed; "
  output += a.host
  w(output)
}
for(var i = 0; i < 0xFF; i++) {
  var url = "http://a" + String.fromCodePoint(i) + "a/"
  testURL(url, i)
}
</script>

in http://software.hixie.ch/utilities/js/live-dom-viewer/
https://github.com/whatwg/url/commit/81cdd6704ea695e1619e76794227d2c9d10d2aa7
I'm attaching results for this test (modified to work in other browsers):

<script>
function testURL(url, cp) {
  var output = "0x" + cp.toString(16) + ": ";
  try {
    var a = document.createElement("a");
    a.href = url;
    if(a.host)
      output += "parsed; ";
    output += a.host;
  } catch (e) {
    output += e;
  }
  w(output);
}
for(var i = 0; i < 0xFF; i++) {
  var url = "http://a" + String.fromCharCode(i) + "a/";
  testURL(url, i);
}
</script>

Firefox is the most permissive here. On Firefox 25, parsing fails (as in !a.host) for: 0x00, 0x20, 0x3A (':').

Chrome 25 only works with 0x20-0x24, 0x26-0x2E, 0x30-0x39, 0x3C-0x3E, 0x40-0x5A, 0x5F-0x7D. Note that:
- 0x09, 0x0A, 0x0D are ignored before parse and shouldn't be expected after ToASCII.
- 0x2F ('/') works with this test, but will fail if used to set a.host.
- 0x3A (':') works both with this test and setting a.host, but it leads to an unexpected result (':0' and 'a:0' respectively) instead of 'a%3Aa' or 'a:a'.

Safari 7 only works with 0x25, 0x2D, 0x2E, 0x30-0x39, 0x41-0x5A, 0x5F, 0x61-0x7A.
- 0x00, 0x23 ('#'), 0x2F ('/'), 0x3F ('?'), 0x40 ('@'), 0x5C ('\') are accepted... but truncate host.

IE 11 and 10 only work with 0x01-0x24, 0x26-0x2E, 0x30-0x39, 0x3B-0x3E, 0x41-0x5B, 0x5D-0x7F.
- 0x00 is accepted but truncates host.
- 0x09, 0x0A, 0x0D are ignored before parse and shouldn't be expected after ToASCII.
- 0x25, 0x30 will throw "Invalid argument".
- I could not test 0x2F, 0x3F, 0x40, 0x5C properly.

For all tests, I've ignored output above 0x7F. Those shouldn't be present after ToASCII (and it should be a failure if they are).
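Range summaries like the ones in the comment above can be derived mechanically from the raw per-code-point output. A minimal sketch, assuming the code points that parsed have already been collected into an array; `toHexRanges` is a hypothetical helper and not part of the original test.

```javascript
// Collapse a list of code points into a compact string of hex ranges,
// e.g. [0x20, 0x21, 0x22, 0x26] becomes "0x20-0x22, 0x26".
function toHexRanges(codePoints) {
  const hex = (n) => "0x" + n.toString(16).toUpperCase().padStart(2, "0");
  // Deduplicate and sort numerically so consecutive values are adjacent.
  const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
  const parts = [];
  let start = null;
  let prev = null;
  const flush = () => {
    if (start === null) return;
    parts.push(start === prev ? hex(start) : hex(start) + "-" + hex(prev));
  };
  for (const cp of sorted) {
    if (start !== null && cp === prev + 1) {
      prev = cp; // extend the current run
      continue;
    }
    flush(); // close the previous run, if any
    start = prev = cp;
  }
  flush();
  return parts.join(", ");
}
```

In the test page, the array could be filled inside `testURL` whenever `a.host` is non-empty and then printed once after the loop.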
Created attachment 1426 TEST-CHROME-31
Created attachment 1427 TEST-FIREFOX-25
Created attachment 1428 TEST-IE-11
Created attachment 1429 TEST-SAFARI-7
Honestly, I can't see any benefit in accepting anything not in the 0-9a-z_ range after ToASCII is run. This is the only range that works consistently across browsers and it's somewhat close to previous standards (including IDNA), with "_" being the only deviation, which is justified by widespread use in the real world.
The fix here was wrong. It needs to happen after ToASCII has run. The idea behind letting through most ASCII code points is to allow for weirdly configured intranet environments. A network error seems preferable to not being able to type in an address.
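One way the ordering can matter, sketched under loud assumptions: this is an illustration, not necessarily the reasoning discussed on IRC. `toAsciiStandIn` is a hypothetical stand-in that only applies NFKC normalization and lowercasing, whereas real ToASCII (UTS #46 mapping plus Punycode) does considerably more. The forbidden list is the one suggested earlier in this thread.

```javascript
// Hypothetical stand-in for ToASCII: NFKC normalization plus lowercasing.
// It is enough to show that mapping can introduce ASCII delimiters.
function toAsciiStandIn(host) {
  return host.normalize("NFKC").toLowerCase();
}

// Forbidden code points as suggested earlier in the thread (assumption).
const FORBIDDEN = new Set(["\u0000", " ", "%", "/", "\\", "?", "#", ":"]);

function hasForbidden(host) {
  return [...host].some((ch) => FORBIDDEN.has(ch));
}

// Check first, map afterwards: misses code points created by the mapping.
function checkBeforeMapping(host) {
  return hasForbidden(host) ? null : toAsciiStandIn(host);
}

// Map first, check afterwards: sees the final ASCII form.
function checkAfterMapping(host) {
  const ascii = toAsciiStandIn(host);
  return hasForbidden(ascii) ? null : ascii;
}

// U+FF0F FULLWIDTH SOLIDUS normalizes to "/" under NFKC.
const tricky = "a\uFF0Fb";
console.log(checkBeforeMapping(tricky)); // "a/b" (slipped through)
console.log(checkAfterMapping(tricky)); // null (rejected)
```

Both variants still let through ordinary ASCII hosts such as "intra_net", which matches the goal of tolerating unusual intranet names.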
See http://krijnhoetmer.nl/irc-logs/whatwg/20140113#l-828 for why the fix was wrong. New fix deployed. https://github.com/whatwg/url/commit/0eaf28c5ae63b5b0487cce484f3ce201e0d98494