This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Specification: http://www.whatwg.org/specs/web-apps/current-work/
Multipage: http://www.whatwg.org/C#dependencies
Complete: http://www.whatwg.org/c#dependencies
Comment: IDNA is obsoleted by IDNA2008, which in turn is patched by http://unicode.org/reports/tr46/, I think. However, it is not entirely clear to me which version browsers implement.
Posted from: 212.238.236.229 by annevk@annevk.nl
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.10 (KHTML, like Gecko) Chrome/23.0.1262.0 Safari/537.10
Any chance I'll be able to defer to the URL spec?
I guess that would be somewhat logical, but I am planning to defer on domain names and IP addresses myself until some implementors have spoken up. (There are plenty of problems to solve with URLs outside the domain name part.)
So I should reassign this to you in the URL component?
Yes, I will attempt to solve host parsing.
*** Bug 15254 has been marked as a duplicate of this bug. ***
*** Bug 19283 has been marked as a duplicate of this bug. ***
http://mathias.html5.org/tests/url/idna2003-separators/ has a test for domain label separators. Also make sure to support underscores and similar characters in domain labels, since they are used in practice.
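For reference, IDNA2003 (RFC 3490, section 3.1) treats four characters as label separators: U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP and U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP. A minimal Python sketch of splitting on them, assuming separators are normalized before any other host processing; `split_labels` and `normalize_separators` are hypothetical names, not taken from any spec:

```python
import re

# Label separators recognized by IDNA2003 (RFC 3490, section 3.1):
# U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP,
# U+FF0E FULLWIDTH FULL STOP, U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP.
IDNA2003_SEPARATORS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(host: str) -> list[str]:
    """Split a host into labels on any IDNA2003 label separator."""
    return IDNA2003_SEPARATORS.split(host)


def normalize_separators(host: str) -> str:
    """Rewrite every separator as U+002E so later steps only see ASCII dots."""
    return ".".join(split_labels(host))


if __name__ == "__main__":
    print(split_labels("www\u3002example\uFF0Ecom"))         # ['www', 'example', 'com']
    print(normalize_separators("my_host\uFF61example.com"))  # my_host.example.com
```

The splitter does nothing about underscores, so a label such as `my_host` passes through unchanged; whether to accept it is left to later validation steps.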
*** Bug 20036 has been marked as a duplicate of this bug. ***
For most all-ASCII domain names, the situation is relatively straightforward. I suggest the following changes to the host parser:
===========
3. If host is empty, parse error, return failure.
4. If host consists of only ASCII characters (characters in the range U+0000 to U+007F), run these substeps:
   1. If host is longer than 255 characters, parse error, return failure.
   2. Split host into labels separated by U+002E FULL STOP.
   3. If any label is empty, or is longer than 63 characters, or begins or ends with U+002D HYPHEN-MINUS, or contains U+002D HYPHEN-MINUS in both the third and fourth positions, parse error, return failure.
   4. If there are exactly four labels, and each label contains only ASCII digits, and each label represents a number from 0 through 255, return host.
   5. [If there are two or more labels and the last label starts with an ASCII digit, parse error, return failure. (Not sure if this is needed.)]
   6. If any label contains a character other than U+002D HYPHEN-MINUS or an ASCII alphanumeric, parse error, return failure.
   7. If any label starts with "xn--", jump to the step labeled IDNA.
   8. Convert host to ASCII lowercase and return host.
5. IDNA. [IDNA hell]
=====================
The following is my opinion. For IDNA, I believe the function should allow domain names that are valid in either IDNA2003 or IDNA2008. (The "deviation characters" under UTS46, such as LATIN SMALL LETTER SHARP S, may be an exception.) But I think that a separate algorithm, such as Mozilla's IDN display algorithm, should decide whether to return each label of the domain name as Punycode or not. Mozilla's algorithm is expected to take effect when Firefox 22 is released (see issue 722299 <https://bugzilla.mozilla.org/show_bug.cgi?id=722299>).
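A rough Python sketch of the proposed all-ASCII steps, in the order first proposed. Substep 5 is left out because the proposal itself marks it as uncertain, and the IDNA step is only stubbed. `parse_ascii_host` and `HostParseError` are hypothetical names, with the exception standing in for "parse error, return failure":

```python
import string

# Characters substep 6 allows in a label besides U+002D HYPHEN-MINUS.
ASCII_ALNUM = frozenset(string.ascii_letters + string.digits)


class HostParseError(ValueError):
    """Hypothetical stand-in for "parse error, return failure"."""


def parse_ascii_host(host: str) -> str:
    """Sketch of the proposed steps 3-4, substeps in the order first proposed."""
    if host == "":  # step 3
        raise HostParseError("empty host")
    if not host.isascii():  # step 4 applies only to all-ASCII hosts
        raise NotImplementedError("non-ASCII host: IDNA step not sketched")
    if len(host) > 255:  # substep 1
        raise HostParseError("host longer than 255 characters")
    labels = host.split(".")  # substep 2
    for label in labels:  # substep 3
        if (
            label == ""
            or len(label) > 63
            or label.startswith("-")
            or label.endswith("-")
            or label[2:4] == "--"  # hyphens in third and fourth positions
        ):
            raise HostParseError(f"invalid label {label!r}")
    # substep 4: four all-digit labels, each 0-255, are returned as an IPv4 host
    if len(labels) == 4 and all(lab.isdigit() and int(lab) <= 255 for lab in labels):
        return host
    # substep 5 is omitted: the proposal marks it as possibly unnecessary
    for label in labels:  # substep 6
        if any(c != "-" and c not in ASCII_ALNUM for c in label):
            raise HostParseError(f"disallowed character in {label!r}")
    if any(label.startswith("xn--") for label in labels):  # substep 7
        raise NotImplementedError("Punycode label: IDNA step not sketched")
    return host.lower()  # substep 8: ASCII lowercase (host is all ASCII here)
```

As written, substep 6 rejects underscores (so `my_host.com` fails), and `256.1.1.1` falls through to substep 8 and is returned unchanged, since only the optional substep 5 would reject it.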
Correction: == 4. If there are exactly four labels, and each label contains only ASCII digits, and each label represents a number from 0 through 255 in the shortest possible form (for example, "25" and not "025"), return host. ==
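The "shortest possible form" condition amounts to requiring that converting a label to an integer and back reproduces it exactly, which rules out leading zeros. A small Python sketch of the corrected substep 4 under that reading; `is_shortest_form_octet` and `is_dotted_quad` are hypothetical names:

```python
def is_shortest_form_octet(label: str) -> bool:
    """True if label is a decimal number 0-255 written without leading zeros."""
    if label == "" or any(c not in "0123456789" for c in label):
        return False
    value = int(label)
    return value <= 255 and str(value) == label  # "025" -> "25" != "025"


def is_dotted_quad(host: str) -> bool:
    """Corrected substep 4: exactly four shortest-form decimal labels."""
    labels = host.split(".")
    return len(labels) == 4 and all(is_shortest_form_octet(lab) for lab in labels)
```

With this check, a host such as `192.168.025.1` is no longer returned by substep 4 and continues to the later substeps instead.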
Correction 2: Substep 7 ("7. If any label starts with "xn--", jump to the step labeled IDNA.") should be moved so that it comes after the step "2. Split host into labels separated by U+002E FULL STOP."
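One way to see why the reordering matters: every Punycode label begins with "xn--", so it has U+002D HYPHEN-MINUS in both the third and fourth positions and is rejected by substep 3 before substep 7 can route it to IDNA. A toy Python sketch contrasting the two orders for a single label (`fails_substep_3`, `route_label` and its return strings are hypothetical):

```python
def fails_substep_3(label: str) -> bool:
    """Substep 3 checks from the proposal: empty, too long, bad hyphens."""
    return (
        label == ""
        or len(label) > 63
        or label.startswith("-")
        or label.endswith("-")
        or label[2:4] == "--"  # hyphens in the third and fourth positions
    )


def route_label(label: str, xn_check_first: bool) -> str:
    """Return 'IDNA', 'failure' or 'ASCII' for one label."""
    if xn_check_first and label.startswith("xn--"):
        return "IDNA"  # corrected order: substep 7 runs right after splitting
    if fails_substep_3(label):
        return "failure"
    if label.startswith("xn--"):
        # original position of substep 7; never reached for xn-- labels,
        # because substep 3 has already rejected them
        return "IDNA"
    return "ASCII"


print(route_label("xn--bcher-kva", xn_check_first=False))  # failure
print(route_label("xn--bcher-kva", xn_check_first=True))   # IDNA
print(route_label("example", xn_check_first=True))         # ASCII
```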
Using IDNA 2003 for now. See bug 23005 for HTML follow-up work.
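For experimenting with IDNA2003 behaviour, Python's standard-library "idna" codec implements RFC 3490 with nameprep, not IDNA2008; the Python documentation points to the third-party idna package for IDNA2008. A short sketch, where `to_ascii_idna2003` is just a hypothetical wrapper name:

```python
def to_ascii_idna2003(host: str) -> str:
    """Convert host to its ASCII form using the stdlib IDNA2003 codec."""
    return host.encode("idna").decode("ascii")


print(to_ascii_idna2003("bücher.example"))        # xn--bcher-kva.example
print(to_ascii_idna2003("straße.de"))             # strasse.de (nameprep maps ß to ss)
print(to_ascii_idna2003("www\u3002example.com"))  # www.example.com (U+3002 is a separator)
```

The ß result illustrates the deviation characters mentioned earlier in this thread: IDNA2003 maps ß to "ss", whereas IDNA2008 keeps ß as a valid character in its own right and encodes it as Punycode.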
*** Bug 22986 has been marked as a duplicate of this bug. ***