<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>18910</bug_id>
          
          <creation_ts>2012-09-18 14:08:56 +0000</creation_ts>
          <short_desc>IDNA</short_desc>
          <delta_ts>2016-02-22 08:27:43 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>URL</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#dependencies</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>annevk</cc>
    
    <cc>budryan23</cc>
    
    <cc>e.mojarro11</cc>
    
    <cc>ian</cc>
    
    <cc>lambda</cc>
    
    <cc>mathias</cc>
    
    <cc>mike</cc>
    
    <cc>poccil</cc>
    
    <cc>xn--mlform-iua</cc>
          
          <qa_contact>sideshowbarker+urlspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>74040</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-09-18 14:08:56 +0000</bug_when>
    <thetext>Specification: http://www.whatwg.org/specs/web-apps/current-work/
Multipage: http://www.whatwg.org/C#dependencies
Complete: http://www.whatwg.org/c#dependencies

Comment:
IDNA is obsoleted by IDNA2008, which in turn is patched by
http://unicode.org/reports/tr46/, I think. However, it is not entirely
clear to me which version browsers implement.

Posted from: 212.238.236.229 by annevk@annevk.nl
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.10 (KHTML, like Gecko) Chrome/23.0.1262.0 Safari/537.10</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74063</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-09-18 23:57:04 +0000</bug_when>
    <thetext>Any chance I&apos;ll be able to defer to the URL spec?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74064</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-09-19 07:03:10 +0000</bug_when>
    <thetext>I guess that would be somewhat logical, but I am planning to defer on domain names and IP addresses myself until some implementors have spoken up. (There are plenty of problems to solve with URLs outside the domain name part.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>77366</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-10-30 00:03:49 +0000</bug_when>
    <thetext>So I should reassign this to you in the URL component?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>77381</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-10-30 09:18:08 +0000</bug_when>
    <thetext>Yes, I will attempt to solve host parsing.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80481</commentid>
    <comment_count>5</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-12-21 14:30:08 +0000</bug_when>
    <thetext>*** Bug 15254 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80484</commentid>
    <comment_count>6</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-12-21 14:30:58 +0000</bug_when>
    <thetext>*** Bug 19283 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80486</commentid>
    <comment_count>7</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-12-21 14:31:50 +0000</bug_when>
    <thetext>http://mathias.html5.org/tests/url/idna2003-separators/ has a test for domain label separators.

Also make sure to support underscores and similar characters in domain labels, as they are used in practice.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80488</commentid>
    <comment_count>8</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-12-21 14:32:17 +0000</bug_when>
    <thetext>*** Bug 20036 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>87880</commentid>
    <comment_count>9</comment_count>
    <who name="Peter Occil">poccil</who>
    <bug_when>2013-05-18 20:40:00 +0000</bug_when>
    <thetext>For most all-ASCII domain names, the situation is relatively straightforward.  I suggest
the following changes to the host parser:

===========
3. If host is empty, parse error, return failure.

4. If host consists of only ASCII characters (characters in the range U+0000 to U+007F)
    run these substeps:
    
     1.  If host is longer than 255 characters, parse error, return failure.
    
     2.  Split host into labels separated by U+002E FULL STOP.
        
     3.  If any label is empty, or is longer than 63 characters,
          or begins or ends with U+002D HYPHEN-MINUS, or contains U+002D HYPHEN-MINUS
          in both the third and fourth positions, parse error, return failure.
          
     4.  If there are exactly four labels, and each label contains only ASCII digits, and each
          label represents a number from 0 through 255, return host.
     
     5.  [If there are two or more labels and the last label starts with an ASCII digit, parse error,
          return failure. (Not sure if this is needed.)]
      
     6.  If any label contains a character other than U+002D HYPHEN-MINUS or an
          ASCII alphanumeric, parse error, return failure.

     7.  If any label starts with &quot;xn--&quot;, jump to the step labeled IDNA.

     8.  Convert host to ASCII lowercase and return host.

5. IDNA. [IDNA hell]
=====================
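
The ASCII substeps above can be sketched in Python roughly as follows. This is only
an illustration, not a reference implementation. The names parse_ascii_host, IDNA and
reject_numeric_tld are made up for this sketch. It assumes the "xn--" check runs
right after the split, because in the listed order substep 3 would already reject
such labels (hyphens in the third and fourth positions). The bracketed substep 5 is
off by default, since it may not be needed.

```python
import string

# Hypothetical marker meaning "hand this host to the step labeled IDNA".
IDNA = "IDNA"

DIGITS = set(string.digits)
LABEL_CHARS = set(string.ascii_letters + string.digits + "-")


def parse_ascii_host(host, reject_numeric_tld=False):
    """Return the lowercased host, IDNA, or None on a parse error."""
    # Step 3: an empty host is a parse error.
    if not host:
        return None
    # Step 4 only applies to all-ASCII hosts; anything else goes to IDNA.
    if not host.isascii():
        return IDNA
    # Substep 1: overall length limit.
    if len(host) > 255:
        return None
    # Substep 2: split into labels on U+002E FULL STOP.
    labels = host.split(".")
    # Substep 7, assumed to run here: "xn--" labels go to IDNA.
    if any(label.startswith("xn--") for label in labels):
        return IDNA
    # Substep 3: per-label length and hyphen placement checks.
    for label in labels:
        if not label or len(label) > 63:
            return None
        if label.startswith("-") or label.endswith("-"):
            return None
        if label[2:4] == "--":
            return None
    # Substep 4: four all-digit labels, each from 0 through 255.
    if len(labels) == 4 and all(
        set(label).issubset(DIGITS) and 255 >= int(label) for label in labels
    ):
        return host
    # Substep 5 (optional): last label must not start with a digit.
    if reject_numeric_tld and len(labels) >= 2 and labels[-1][0] in DIGITS:
        return None
    # Substep 6: only ASCII alphanumerics and HYPHEN-MINUS are allowed.
    if any(not set(label).issubset(LABEL_CHARS) for label in labels):
        return None
    # Substep 8: ASCII-lowercase the host.
    return host.lower()
```

For example, parse_ascii_host("Example.COM") gives "example.com", while
parse_ascii_host("a..b") gives None because of the empty label.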

The following is my opinion.

For IDNA, I believe the function should allow domain names that are valid in either IDNA2003
or IDNA2008.  (The &quot;deviation characters&quot; under UTS46, such as LATIN SMALL LETTER SHARP S, may be
an exception.)  But I think that a separate algorithm, such as Mozilla&apos;s IDN display algorithm
(expected to take effect when Firefox 22 is released; see issue 722299
&lt;https://bugzilla.mozilla.org/show_bug.cgi?id=722299&gt;), should decide whether to return each
label of the domain name as Punycode or not.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>87881</commentid>
    <comment_count>10</comment_count>
    <who name="Peter Occil">poccil</who>
    <bug_when>2013-05-18 20:45:24 +0000</bug_when>
    <thetext>Correction: 
==
4.  If there are exactly four labels, and each label contains only ASCII digits, and each label represents a number from 0 through 255 in the shortest possible form (for example, &quot;25&quot; and not &quot;025&quot;), return host.
==</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>87882</commentid>
    <comment_count>11</comment_count>
    <who name="Peter Occil">poccil</who>
    <bug_when>2013-05-18 21:05:58 +0000</bug_when>
    <thetext>Correction 2:

Substep 7 (&quot;7.  If any label starts with &quot;xn--&quot;, jump to the step labeled IDNA.&quot;) should be moved so that it comes after the step &quot;2.  Split host into labels separated by U+002E FULL STOP.&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92262</commentid>
    <comment_count>12</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-08-19 11:06:29 +0000</bug_when>
    <thetext>Using IDNA 2003 for now. See bug 23005 for HTML follow-up work.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92265</commentid>
    <comment_count>13</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-08-19 11:10:57 +0000</bug_when>
    <thetext>*** Bug 22986 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>