Input sequence A3 A0 in GB18030 is decoded as U+E5E5 by iconv and ICU. For example:

> printf "\xA3\xA0" | iconv -f gb18030 -t utf-16le | hexdump
0000000 e5 e5

ICU table: http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

Using the algorithm given in http://encoding.spec.whatwg.org/#gb18030-encoder, A3 A0 results in pointer 6555, which is mapped to U+3000 IDEOGRAPHIC SPACE in index-gb18030.txt. I believe this mapping is incorrect and should be replaced with U+E5E5.
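For reference, the pointer 6555 falls out of the spec's two-byte pointer arithmetic. A small Ruby sketch of just that arithmetic (the method name is mine, and this is not a full decoder):

def gb18030_two_byte_pointer(lead, byte)
  # Per the Encoding Standard: lead is in 0x81-0xFE, and the trail
  # byte offset is 0x40 below 0x7F and 0x41 at or above it.
  offset = byte < 0x7F ? 0x40 : 0x41
  (lead - 0x81) * 190 + (byte - offset)
end

p gb18030_two_byte_pointer(0xA3, 0xA0)  # => 6555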
For what it's worth, Ruby also produces U+E5E5:

prompt> ruby -e 'p "\xA3\xA0".encode("UTF-16BE", "GB18030")'
"\uE5E5"
I'm pretty sure I added a comment here earlier (well, I was on my phone and may have forgotten to press the 'Save Changes' button). Anyway, I think we'd better keep the current mapping as it is. Mapping to a PUA code point does not make much sense. WebKit/Blink actually overrides the ICU mapping and maps 0xA3 0xA0 to U+3000. See http://goo.gl/ocjnDR
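Assuming a converter that follows the GB18030-2000 table (as iconv, ICU, and Ruby do above), the WebKit/Blink behavior can be approximated in Ruby by remapping the one affected code point after decoding. This is only an illustration of the override (the helper name is mine); WebKit/Blink actually patch the mapping inside the converter, per the link above:

def decode_gb18030_weblike(bytes)
  # In the GB18030-2000 table, 0xA3 0xA0 is the only sequence that
  # decodes to U+E5E5, so remapping it after decoding is safe here.
  bytes.encode("UTF-8", "GB18030").gsub("\uE5E5", "\u3000")
end

puts format("U+%04X", decode_gb18030_weblike("\xA3\xA0").ord)  # => U+3000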
I should probably add a note about this in http://encoding.spec.whatwg.org/#indexes
So, I guess it's just a matter of policy. Choosing WebKit as an authority makes a lot of sense to me. Thank you for the explanation!
https://github.com/whatwg/encoding/commit/55accc77339e9618d35149efca85e3f4a9041dd6