<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>19944</bug_id>
          
          <creation_ts>2012-11-11 23:23:28 +0000</creation_ts>
          <short_desc>Editorial:  Hangul names missing from the Korean index</short_desc>
          <delta_ts>2013-01-21 16:25:24 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>Encoding</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>trivial</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>pub-w3</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>mike</cc>
    
    <cc>VYV03354</cc>
          
          <qa_contact>sideshowbarker+encodingspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>78212</commentid>
    <comment_count>0</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-11-11 23:23:28 +0000</bug_when>
    <thetext>Unlike Han characters, Hangul syllables actually have Unicode names, and the Korean index should probably use those instead of just saying ‘&lt;Hangul Syllable&gt;’.

See Unicode 6.1, Section 3.12 Conjoining Jamo Behavior, Hangul Syllable Name Generation [*].  For instance, U+D4DB is ‘HANGUL SYLLABLE PWILH’.
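
As a rough Python 3 sketch of that algorithm (the function and table names here are illustrative assumptions, not taken from the standard or from any existing script; the short-name tables follow the jamo short names used in the Unicode chapter):

```python
# Short names of the leading consonants, vowels and trailing consonants
# used when generating Hangul syllable names.
LEADS = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "",
         "J", "JJ", "C", "K", "T", "P", "H"]
VOWELS = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
TRAILS = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]

S_BASE = 0xAC00  # first precomposed Hangul syllable
S_COUNT = len(LEADS) * len(VOWELS) * len(TRAILS)  # 19 * 21 * 28 = 11172


def hangul_syllable_name(cp):
    """Return the Unicode name of the Hangul syllable at code point cp."""
    index = cp - S_BASE
    if index not in range(S_COUNT):
        raise ValueError("not a precomposed Hangul syllable: U+%04X" % cp)
    # Split the syllable index into lead, vowel and trail positions.
    lead, rest = divmod(index, len(VOWELS) * len(TRAILS))
    vowel, trail = divmod(rest, len(TRAILS))
    return "HANGUL SYLLABLE " + LEADS[lead] + VOWELS[vowel] + TRAILS[trail]


print(hangul_syllable_name(0xD4DB))  # HANGUL SYLLABLE PWILH
```

The sketch uses divmod so that it relies on integer arithmetic only; code points outside U+AC00..U+D7A3 are rejected rather than given a name.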

[*] &lt;http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf&gt;, pp. 111-112.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78214</commentid>
    <comment_count>1</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-12 08:09:23 +0000</bug_when>
    <thetext>As long as https://raw.github.com/whatwg/encoding/master/UnicodeData.txt (or the equivalent file on Unicode.org) does not contain them, that seems unlikely to happen.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78238</commentid>
    <comment_count>2</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-11-12 21:16:48 +0000</bug_when>
    <thetext>Are you saying that Unicode would have to publish an official list rather than an algorithm, or would it be enough if someone implemented the algorithm, checked the output against lists believed to be correct (e.g., http://www.inames.net/lang/out/out_p1s3_hangul.html) and sent you the data in a suitable format?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78239</commentid>
    <comment_count>3</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-11-12 21:28:24 +0000</bug_when>
    <thetext>There is also &lt;http://www.itscj.ipsj.or.jp/sc2/open/02n4168/HangulSy.txt&gt;, which appears to be part of ISO 10646.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78243</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-12 22:06:12 +0000</bug_when>
    <thetext>Well, if you are willing to do the work, I would not object; it seems like a welcome addition. A patch to https://github.com/whatwg/encoding/blob/master/index.py would be best, but I can do that last bit too.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78333</commentid>
    <comment_count>5</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-11-15 00:40:34 +0000</bug_when>
    <thetext>The easiest solution is probably to add the algorithm to index.py:

@@ -5,6 +5,13 @@ data = json.loads(open(&quot;indexes.json&quot;, &quot;r&quot;).read())
 # Copy from ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
 names = open(&quot;UnicodeData.txt&quot;, &quot;r&quot;).readlines()
 
+jamo = [[&quot;G&quot;,&quot;GG&quot;,&quot;N&quot;,&quot;D&quot;,&quot;DD&quot;,&quot;R&quot;,&quot;M&quot;,&quot;B&quot;,&quot;BB&quot;,&quot;S&quot;,&quot;SS&quot;,&quot;&quot;,&quot;J&quot;,&quot;JJ&quot;,&quot;C&quot;,&quot;K&quot;,
+         &quot;T&quot;,&quot;P&quot;,&quot;H&quot;],
+        [&quot;A&quot;,&quot;AE&quot;,&quot;YA&quot;,&quot;YAE&quot;,&quot;EO&quot;,&quot;E&quot;,&quot;YEO&quot;,&quot;YE&quot;,&quot;O&quot;,&quot;WA&quot;,&quot;WAE&quot;,&quot;OE&quot;,&quot;YO&quot;,&quot;U&quot;,
+         &quot;WEO&quot;,&quot;WE&quot;,&quot;WI&quot;,&quot;YU&quot;,&quot;EU&quot;,&quot;YI&quot;,&quot;I&quot;],
+        [&quot;&quot;,&quot;G&quot;,&quot;GG&quot;,&quot;GS&quot;,&quot;N&quot;,&quot;NJ&quot;,&quot;NH&quot;,&quot;D&quot;,&quot;L&quot;,&quot;LG&quot;,&quot;LM&quot;,&quot;LB&quot;,&quot;LS&quot;,&quot;LT&quot;,&quot;LP&quot;,
+         &quot;LH&quot;,&quot;M&quot;,&quot;B&quot;,&quot;BS&quot;,&quot;S&quot;,&quot;SS&quot;,&quot;NG&quot;,&quot;J&quot;,&quot;C&quot;,&quot;K&quot;,&quot;T&quot;,&quot;P&quot;,&quot;H&quot;]]
+
 def char(cp):
     if cp &gt; 0xFFFF:
         hi, lo = divmod(cp-0x10000, 0x400)
@@ -23,7 +30,9 @@ def get_name(cp):
     elif cp &gt;= 0x4E00 and cp &lt;= 0x9FCB:
         return &quot;&lt;CJK Ideograph&gt;&quot;
     elif cp &gt;= 0xAC00 and cp &lt;= 0xD7A3:
-        return &quot;&lt;Hangul Syllable&gt;&quot;
+        i = cp - 0xAC00
+        s = jamo[0][i/21/28] + jamo[1][i%(21*28)/28] + jamo[2][i%28]
+        return &quot;HANGUL SYLLABLE &quot; + s
     elif cp &gt;= 0xE000 and cp &lt;= 0xF8FF:
         return &quot;&lt;Private Use&gt;&quot;
     elif cp &gt;= 0x20000 and cp &lt;= 0x2A6D6:</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78336</commentid>
    <comment_count>6</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-11-15 01:43:21 +0000</bug_when>
    <thetext>Alternative, slightly simpler formula:
s = jamo[0][i/21/28] + jamo[1][i/28%21] + jamo[2][i%28]</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78402</commentid>
    <comment_count>7</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-16 12:30:50 +0000</bug_when>
    <thetext>https://github.com/whatwg/encoding/commit/9cc353b228d5ec9c6fdbb7a516ef18d8970fdd27

Thanks!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>81599</commentid>
    <comment_count>8</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2013-01-19 15:00:41 +0000</bug_when>
    <thetext>It would make more sense to write the first division as i/28/21.
(This of course makes no difference to the result, since both orders give the floor of i/588, and the code is not part of the specification, so I am leaving the bug closed.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>81888</commentid>
    <comment_count>9</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-01-21 16:25:24 +0000</bug_when>
    <thetext>Done.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>