This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 18910 - IDNA
Summary: IDNA
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Duplicates: 15254 19283 20036 22986
Depends on:
Blocks:
 
Reported: 2012-09-18 14:08 UTC by contributor
Modified: 2016-02-22 08:27 UTC
CC List: 9 users

Description contributor 2012-09-18 14:08:56 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/
Multipage: http://www.whatwg.org/C#dependencies
Complete: http://www.whatwg.org/c#dependencies

Comment:
IDNA is obsoleted by IDNA2008, which in turn is patched by
http://unicode.org/reports/tr46/, I think. However, it is not entirely
clear to me which version browsers implement.

Posted from: 212.238.236.229 by annevk@annevk.nl
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.10 (KHTML, like Gecko) Chrome/23.0.1262.0 Safari/537.10
Comment 1 Ian 'Hixie' Hickson 2012-09-18 23:57:04 UTC
Any chance I'll be able to defer to the URL spec?
Comment 2 Anne 2012-09-19 07:03:10 UTC
I guess that would be somewhat logical, but I am planning to defer on domain names and IP addresses myself until some implementors have spoken up. (There are plenty of problems to solve with URLs outside the domain name part.)
Comment 3 Ian 'Hixie' Hickson 2012-10-30 00:03:49 UTC
So I should reassign this to you in the URL component?
Comment 4 Anne 2012-10-30 09:18:08 UTC
Yes, I will attempt to solve host parsing.
Comment 5 Anne 2012-12-21 14:30:08 UTC
*** Bug 15254 has been marked as a duplicate of this bug. ***
Comment 6 Anne 2012-12-21 14:30:58 UTC
*** Bug 19283 has been marked as a duplicate of this bug. ***
Comment 7 Anne 2012-12-21 14:31:50 UTC
http://mathias.html5.org/tests/url/idna2003-separators/ has a test for domain label separators.

Also, make sure to support underscores and similar characters in domain labels, since they are used in practice.
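
For illustration, a minimal Python sketch of what "domain label separators" means under IDNA2003: RFC 3490 treats U+002E, U+3002, U+FF0E and U+FF61 as equivalent dots. The function name split_labels is hypothetical, and the sketch does no validation of the labels themselves, so underscores pass through untouched.

```python
import re

# The four label separators recognized by IDNA2003 (RFC 3490, section 3.1):
# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP,
# HALFWIDTH IDEOGRAPHIC FULL STOP.
_SEPARATORS = re.compile("[.\u3002\uFF0E\uFF61]")

def split_labels(host):
    """Split a host into labels on any IDNA2003 separator (hypothetical helper)."""
    return _SEPARATORS.split(host)

print(split_labels("www\u3002example\uFF0Ecom"))
print(split_labels("a_b.example"))
```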
Comment 8 Anne 2012-12-21 14:32:17 UTC
*** Bug 20036 has been marked as a duplicate of this bug. ***
Comment 9 Peter Occil 2013-05-18 20:40:00 UTC
For most all-ASCII domain names, the situation is relatively straightforward.  I suggest
the following changes to the host parser:

===========
3. If host is empty, parse error, return failure.

4. If host consists of only ASCII characters (characters in the range U+0000 to U+007F)
    run these substeps:
    
     1.  If host is longer than 255 characters, parse error, return failure.
    
     2.  Split host into labels separated by U+002E FULL STOP.
        
     3.  If any label is empty, or is longer than 63 characters,
          or begins or ends with U+002D HYPHEN-MINUS, or contains U+002D HYPHEN-MINUS
          in both the third and fourth positions, parse error, return failure.
          
     4.  If there are exactly four labels, and each label contains only ASCII digits, and each
          label represents a number from 0 through 255, return host.
     
     5.  [If there are two or more labels and the last label starts with an ASCII digit, parse error,
          return failure. (Not sure if this is needed.)]
      
     6.  If any label contains a character other than U+002D HYPHEN-MINUS or an
          ASCII alphanumeric, parse error, return failure.

     7.  If any label starts with "xn--", jump to the step labeled IDNA.

     8.  Convert host to ASCII lowercase and return host.

5. IDNA. [IDNA hell]
=====================

The following is my opinion.

For IDNA, I believe the function should allow domain names that are valid in either IDNA2003
or IDNA2008.  (The "deviation characters" under UTS46, such as U+00DF LATIN SMALL LETTER SHARP S, may be
an exception.)  But I think that a separate algorithm should decide whether to return each label
of the domain name as Punycode or not, such as Mozilla's IDN display algorithm, which 
is expected to take effect when Firefox 22 is released (see issue 722299
<https://bugzilla.mozilla.org/show_bug.cgi?id=722299>).
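
As a rough illustration of the deviation-character point, the sketch below uses Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490). Under IDNA2003 the sharp s is mapped to "ss" before encoding, whereas IDNA2008 keeps it as a distinct character and so produces a different Punycode label. The helper name to_ascii_idna2003 is hypothetical, and this is not meant as what any browser does.

```python
def to_ascii_idna2003(host):
    """Convert a host to its ASCII form using Python's IDNA2003 codec."""
    return host.encode("idna").decode("ascii")

# A non-ASCII label is Punycode-encoded with the "xn--" prefix.
print(to_ascii_idna2003("b\u00FCcher.example"))   # xn--bcher-kva.example

# The sharp s (U+00DF) is mapped away by IDNA2003's Nameprep step.
print(to_ascii_idna2003("stra\u00DFe.example"))   # strasse.example
```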
Comment 10 Peter Occil 2013-05-18 20:45:24 UTC
Correction: 
==
4.  If there are exactly four labels, and each label contains only ASCII digits, and each label represents a number from 0 through 255 in the shortest possible form (for example, "25" and not "025"), return host.
==
Comment 11 Peter Occil 2013-05-18 21:05:58 UTC
Correction 2:

Substep 7 ("7.  If any label starts with "xn--", jump to the step labeled IDNA.") should be moved so that it comes after the step "2.  Split host into labels separated by U+002E FULL STOP."
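
A minimal Python sketch of the ASCII substeps proposed in comment 9, with the corrections from comments 10 and 11 applied (shortest-form IPv4 parts, and the "xn--" check moved to just after the split). The names parse_ascii_host, IDNA and FAILURE are hypothetical; the bracketed substep 5, which the proposal itself marks as uncertain, is left out; and the "xn--" prefix is matched literally, as written in the proposal.

```python
import string

IDNA = object()   # hypothetical marker for "jump to the step labeled IDNA"
FAILURE = None    # hypothetical marker for "parse error, return failure"

_ALNUM_HYPHEN = set(string.ascii_letters + string.digits + "-")

def _is_ipv4_part(label):
    # ASCII digits only, value 0 through 255, shortest form ("25", not "025").
    if not label or any(c not in string.digits for c in label):
        return False
    return int(label) <= 255 and str(int(label)) == label

def parse_ascii_host(host):
    if host == "":
        return FAILURE                          # step 3
    if any(ord(c) > 0x7F for c in host):
        return IDNA                             # not all-ASCII: left to the IDNA step
    if len(host) > 255:
        return FAILURE                          # substep 1
    labels = host.split(".")                    # substep 2
    if any(label.startswith("xn--") for label in labels):
        return IDNA                             # substep 7, moved per comment 11
    for label in labels:                        # substep 3
        if label == "" or len(label) > 63:
            return FAILURE
        if label[0] == "-" or label[-1] == "-":
            return FAILURE
        if label[2:4] == "--":                  # hyphens in 3rd and 4th positions
            return FAILURE
    if len(labels) == 4 and all(_is_ipv4_part(label) for label in labels):
        return host                             # substep 4, as corrected in comment 10
    for label in labels:                        # substep 6
        if any(c not in _ALNUM_HYPHEN for c in label):
            return FAILURE
    return host.lower()                         # substep 8

print(parse_ascii_host("Example.COM"))          # example.com
print(parse_ascii_host("a_b.example"))          # None
```

Note that substep 6 as proposed rejects underscores, so a host such as "a_b.example" fails here.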
Comment 12 Anne 2013-08-19 11:06:29 UTC
Using IDNA 2003 for now. See bug 23005 for HTML follow-up work.
Comment 13 Anne 2013-08-19 11:10:57 UTC
*** Bug 22986 has been marked as a duplicate of this bug. ***