This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Firefox ended up not following the plan from bug 16862 comment 18. Its gbk decoder is identical to its gb18030 decoder, but its gbk encoder is distinct (per https://bugzilla.mozilla.org/show_bug.cgi?id=951691). So we should probably bring the gbk encoder back. When fixing this we should pay attention to the EURO sign and PUA code points. See https://bugzilla.mozilla.org/show_bug.cgi?id=951691#c16 and https://bugzilla.mozilla.org/show_bug.cgi?id=951691#c19. Having said that, if other browsers have meanwhile converged on not having a distinct gbk encoder, perhaps Firefox should revisit its approach. Input welcome.
Data point: Chromium has NOT aligned with the Encoding standard here. Our tracking bug is http://crbug.com/339862. As usual, Jungshik has a lot more context than I do, but we were definitely hesitant about trying to make this change.
Anticipated changes:
* Partially revert https://github.com/whatwg/encoding/commit/182ad9e607a7c6f0fa51d9dd6c638edaa5ec59fd to restore gb18030 as an independent encoding with a single label, and gbk as an independent encoding with nine labels.
* Map gbk's decoder to gb18030's decoder (no flags).
* Introduce a flag for gb18030's encoder that limits it to what gbk can output. (Still need to look into € and PUA.)
* Use that flag to define gbk's encoder. (Per that commit we apparently historically defined gb18030 in terms of gbk, but that doesn't make much sense. So now we'll define gbk as a subset of gb18030.)
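The plan above can be sketched roughly as follows. This is only an illustration, not the spec text: it assumes Python's built-in "gb18030" codec as a stand-in for the gb18030 index (its mappings are not guaranteed to match the Encoding Standard), the names encode_gb18030, decode_gbk, encode_gbk and gbk_flag are hypothetical, and the € and PUA questions are deliberately left untouched.

```python
def encode_gb18030(text: str, gbk_flag: bool = False) -> bytes:
    """Encode text as gb18030; with gbk_flag set, refuse four-byte output.

    Assumption: Python's "gb18030" codec stands in for the real index.
    The EURO sign and PUA code points get no special handling here.
    """
    out = bytearray()
    for ch in text:
        encoded = ch.encode("gb18030")
        # gbk can only emit one- and two-byte sequences, so a four-byte
        # result is treated as an encoding error when the flag is set.
        if gbk_flag and len(encoded) == 4:
            raise UnicodeEncodeError(
                "gbk", ch, 0, 1, "requires a four-byte gb18030 sequence"
            )
        out += encoded
    return bytes(out)


def decode_gbk(data: bytes) -> str:
    """gbk's decoder is simply gb18030's decoder (no flags)."""
    return data.decode("gb18030")


def encode_gbk(text: str) -> bytes:
    """gbk's encoder is gb18030's encoder limited by the flag."""
    return encode_gb18030(text, gbk_flag=True)
```

With this shape, anything labeled gbk still decodes four-byte sequences, while encoding as gbk fails for code points that have no one- or two-byte form.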
https://github.com/whatwg/encoding/commit/c8838716fc6f575f50506e5b82f12c434b5be6bb (It turns out that gbk supports the same PUA code points as far as I can tell.)
Sorry that I didn't get back here in a timely manner; I was out at internal and external conferences last week. Chromium was hesitant, but I had been considering merging gbk and gb18030, per the spec as it stood before the latest revision. Moreover, the latest revision makes it a bit hard to implement GBK/GB18030 without touching ICU's gb18030 implementation, even though I agree with the approach: (1) decoding is identical for both encodings, and (2) gbk encoding is limited to 'the gbk subset'. I've only just read the latest revision, so this is just my first thought. There might be an easier way. I'll give it more thought.
I filed bug 28156 suggesting that GBK and GB18030 be completely separated even when decoding (toUnicode).