25396 – Incorrect mapping in index18030.txt

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: Incorrect mapping in index18030.txt

Status:	RESOLVED INVALID

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	Encoding
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+encodingspec

Reported:	2014-04-20 04:56 UTC by Alexander Shtuchkin
Modified:	2014-04-28 12:16 UTC
CC List:	4 users

Description Alexander Shtuchkin 2014-04-20 04:56:43 UTC

The input sequence A3 A0 in GB18030 is decoded as U+E5E5 by iconv and ICU. For example:

> printf "\xA3\xA0" | iconv -f gb18030 -t utf-16le | hexdump
0000000 e5 e5

ICU table: http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

Using the algorithm given in http://encoding.spec.whatwg.org/#gb18030-encoder, 
A3 A0 results in pointer 6555, which is mapped to U+3000 IDEOGRAPHIC SPACE in index18030.txt.

I believe this mapping is incorrect and should be replaced with U+E5E5.
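
The pointer value quoted above can be reproduced with a few lines of Python. This is only a sketch of the two-byte step as I read it in the Encoding Standard (pointer = (lead - 0x81) * 190 + (byte - offset), with offset 0x40 for bytes below 0x7F and 0x41 otherwise); the function name `gb18030_two_byte_pointer` is made up for illustration, and the lookup in index18030.txt itself is not reproduced here.

```python
def gb18030_two_byte_pointer(lead: int, trail: int) -> int:
    """Return the index pointer for a two-byte GB18030 sequence (sketch)."""
    # Valid lead bytes are 0x81..0xFE.
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    # Valid second bytes are 0x40..0x7E and 0x80..0xFE (0x7F is skipped).
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        raise ValueError("trail byte out of range")
    # The offset accounts for the gap at 0x7F.
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


pointer = gb18030_two_byte_pointer(0xA3, 0xA0)
print(pointer)  # 6555
```

So the disagreement is not about the arithmetic: both sides arrive at pointer 6555, and the question is only which code point the index should list for it.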

Comment 1 Martin Dürst 2014-04-21 08:43:01 UTC

For what it's worth, Ruby also produces U+E5E5:

prompt> ruby -e 'p "\xA3\xA0".encode("UTF-16BE", "GB18030")'
"\uE5E5"

Comment 2 Jungshik Shin 2014-04-21 22:59:53 UTC

I'm pretty sure I added a comment here (well, it was on my phone, and I may have forgotten to press the 'save changes' button).

Anyway, I think we'd better keep the current mapping as it is. Mapping to a PUA code point does not make much sense.

WebKit/Blink actually overrides the ICU mapping and maps 'xA3 xA0' to U+3000. See http://goo.gl/ocjnDR
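
For readers who want to check the PUA point, a small sketch using Python's standard `unicodedata` module shows how the two candidate code points are classified. It only inspects Unicode properties and does not decode anything; the variable names are illustrative.

```python
import unicodedata

pua_candidate = chr(0xE5E5)    # what iconv, ICU and Ruby produce
space_candidate = chr(0x3000)  # what index18030.txt and WebKit/Blink use

# U+E5E5 is in the BMP Private Use Area (U+E000..U+F8FF), category "Co".
in_bmp_pua = 0xE000 <= ord(pua_candidate) <= 0xF8FF
print(in_bmp_pua)                           # True
print(unicodedata.category(pua_candidate))  # Co

# U+3000 is an assigned space separator with a standard name.
print(unicodedata.category(space_candidate))  # Zs
print(unicodedata.name(space_candidate))      # IDEOGRAPHIC SPACE
```

A private-use code point has no interoperable meaning, which is the practical argument for keeping U+3000.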

Comment 3 Anne 2014-04-22 10:17:17 UTC

I should probably add a note about this in http://encoding.spec.whatwg.org/#indexes

Comment 4 Alexander Shtuchkin 2014-04-28 10:35:29 UTC

So, I guess it's just a matter of policy. Choosing WebKit as an authority makes a lot of sense to me. Thank you for the explanation!

Comment 5 Anne 2014-04-28 12:16:57 UTC

https://github.com/whatwg/encoding/commit/55accc77339e9618d35149efca85e3f4a9041dd6