This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 25396 - Incorrect mapping in index18030.txt
Summary: Incorrect mapping in index18030.txt
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
Depends on:
Reported: 2014-04-20 04:56 UTC by Alexander Shtuchkin
Modified: 2014-04-28 12:16 UTC
CC List: 4 users

See Also:


Description Alexander Shtuchkin 2014-04-20 04:56:43 UTC
The input sequence A3 A0 in GB18030 is decoded as U+E5E5 by iconv and ICU. For example:

> printf "\xA3\xA0" | iconv -f gb18030 -t utf-16le | hexdump
0000000 e5 e5

ICU table:

Using the algorithm given in, 
A3 A0 results in pointer 6555, which is mapped to U+3000 IDEOGRAPHIC SPACE in index18030.txt.

I believe this mapping is incorrect and should be replaced with U+E5E5.
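
For readers who want to check the pointer value quoted above, here is a minimal sketch of the two-byte pointer arithmetic as I understand it from the Encoding Standard's gb18030 decoder. The function name gb18030_two_byte_pointer is made up for illustration, and the sketch covers only the two-byte case; it does not look anything up in index18030.txt.

```python
def gb18030_two_byte_pointer(lead: int, trail: int) -> int:
    """Return the index pointer for a two-byte GB18030 sequence.

    Hypothetical helper for illustration only. It mirrors the arithmetic
    of the two-byte branch of the decoder and ignores the four-byte and
    single-byte cases entirely.
    """
    if not 0x81 <= lead <= 0xFE:
        raise ValueError(f"invalid lead byte: {lead:#04x}")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        raise ValueError(f"invalid trail byte: {trail:#04x}")
    # 0x7F is not a valid trail byte, so trail bytes above it shift by one.
    offset = 0x40 if trail < 0x7F else 0x41
    # Each lead byte covers 190 trail positions.
    return (lead - 0x81) * 190 + (trail - offset)


if __name__ == "__main__":
    # The sequence discussed in this bug: A3 A0.
    print(gb18030_two_byte_pointer(0xA3, 0xA0))  # 6555
```

With lead byte A3 and trail byte A0 this gives (0xA3 - 0x81) * 190 + (0xA0 - 0x41) = 34 * 190 + 95 = 6555, which matches the pointer cited in the description.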
Comment 1 Martin Dürst 2014-04-21 08:43:01 UTC
For what it's worth, Ruby also produces U+E5E5:

prompt> ruby -e 'p "\xA3\xA0".encode("UTF-16BE", "GB18030")'
Comment 2 Jungshik Shin 2014-04-21 22:59:53 UTC
I'm pretty sure I added a comment here (well, it was on my phone, and I may have forgotten to press the 'save changes' button).

Anyway, I think we'd better keep the current mapping as it is. Mapping to a PUA code point does not make much sense.

WebKit/Blink actually overrides the ICU mapping and maps 'xA3 xA0' to U+3000. See
Comment 3 Anne 2014-04-22 10:17:17 UTC
I should probably add a note about this in
Comment 4 Alexander Shtuchkin 2014-04-28 10:35:29 UTC
So, I guess it's just a matter of policy. Choosing WebKit as the authority makes a lot of sense to me. Thank you for the explanation!