<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>25396</bug_id>
          
          <creation_ts>2014-04-20 04:56:43 +0000</creation_ts>
          <short_desc>Incorrect mapping in index18030.txt</short_desc>
          <delta_ts>2014-04-28 12:16:57 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>Encoding</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Alexander Shtuchkin">ashtuchkin</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>duerst</cc>
    
    <cc>jshin</cc>
    
    <cc>mike</cc>
    
    <cc>www-international</cc>
          
          <qa_contact>sideshowbarker+encodingspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>104098</commentid>
    <comment_count>0</comment_count>
    <who name="Alexander Shtuchkin">ashtuchkin</who>
    <bug_when>2014-04-20 04:56:43 +0000</bug_when>
    <thetext>The input sequence A3 A0 in GB18030 is decoded as U+E5E5 by iconv and ICU. For example:

&gt; printf &quot;\xA3\xA0&quot; | iconv -f gb18030 -t utf-16le | hexdump
0000000 e5 e5

ICU table: http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

Using the algorithm given in http://encoding.spec.whatwg.org/#gb18030-encoder,
A3 A0 yields pointer 6555, which index18030.txt maps to U+3000 IDEOGRAPHIC SPACE.
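
For reference, the pointer arithmetic can be sketched as below. This is a minimal Python sketch of the two-byte step only, assuming the usual layout of 190 trail bytes per lead byte with offsets 0x40 and 0x41; the function name gb18030_two_byte_pointer is hypothetical and does not come from the spec.

```python
def gb18030_two_byte_pointer(lead, trail):
    """Return the index pointer for a two-byte gb18030 sequence.

    Hypothetical helper for illustration; it covers only the two-byte
    case and ignores four-byte sequences and ranges.
    """
    if lead not in range(0x81, 0xFF):
        raise ValueError("lead byte out of range")
    if trail in range(0x40, 0x7F):
        # Trail bytes 0x40..0x7E sit directly after the row start.
        offset = 0x40
    elif trail in range(0x80, 0xFF):
        # Trail bytes 0x80..0xFE skip the unused 0x7F slot.
        offset = 0x41
    else:
        raise ValueError("trail byte out of range")
    # Each lead byte owns a row of 190 possible trail bytes.
    return (lead - 0x81) * 190 + (trail - offset)


print(gb18030_two_byte_pointer(0xA3, 0xA0))  # 6555
```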

I believe this mapping is incorrect and should be replaced with U+E5E5.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>104099</commentid>
    <comment_count>1</comment_count>
    <who name="Martin Dürst">duerst</who>
    <bug_when>2014-04-21 08:43:01 +0000</bug_when>
    <thetext>For what it&apos;s worth, Ruby also produces U+E5E5:

prompt&gt; ruby -e &apos;p &quot;\xA3\xA0&quot;.encode(&quot;UTF-16BE&quot;, &quot;GB18030&quot;)&apos;
&quot;\uE5E5&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>104132</commentid>
    <comment_count>2</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2014-04-21 22:59:53 +0000</bug_when>
    <thetext>I&apos;m pretty sure I added a comment here earlier (it was on my phone, and I may have forgotten to press the &apos;Save Changes&apos; button).

Anyway, I think we&apos;d better keep the current mapping as it is. Mapping to a PUA code point does not make much sense.

WebKit/Blink actually overrides the ICU mapping and maps &apos;xA3 xA0&apos; to U+3000. See http://goo.gl/ocjnDR</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>104154</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-22 10:17:17 +0000</bug_when>
    <thetext>I should probably add a note about this in http://encoding.spec.whatwg.org/#indexes</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>104538</commentid>
    <comment_count>4</comment_count>
    <who name="Alexander Shtuchkin">ashtuchkin</who>
    <bug_when>2014-04-28 10:35:29 +0000</bug_when>
    <thetext>So I guess it&apos;s just a matter of policy. Choosing WebKit as the authority makes a lot of sense to me. Thank you for the explanation!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>104546</commentid>
    <comment_count>5</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-28 12:16:57 +0000</bug_when>
    <thetext>https://github.com/whatwg/encoding/commit/55accc77339e9618d35149efca85e3f4a9041dd6</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>