ISSUE-454: Separate GBK and GB18030 even for decoding (toUnicode)

State:
CLOSED
Product:
encoding
Raised by:
Richard Ishida
Opened on:
2015-03-30
Description:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28156

This issue tracks the bug listed above and was created as part of the WG CR process.

---

After bug 27235, GBK and GB18030 are distinct when encoding (fromUnicode).

I guess the rationale for treating GBK and GB18030 identically when decoding
(toUnicode) is that there is a (significant) number of pages that are actually
in GB18030 but are mislabelled as GBK.
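
To make the difference concrete, here is a minimal sketch that uses Python's
built-in `gbk` and `gb18030` codecs. They are only stand-ins for a browser's
decoders, and the byte sequences and the `try_decode` helper are chosen for
illustration. A two-byte sequence decodes the same way under both labels, while
a GB18030 four-byte sequence is rejected by a strict GBK-only decoder.

```python
# Python's built-in codecs stand in for browser decoders (an assumption).
TWO_BYTE = b"\xd6\xd0"            # two-byte sequence shared by GBK and GB18030
FOUR_BYTE = b"\x81\x30\x81\x30"   # four-byte sequence that only GB18030 defines


def try_decode(data: bytes, codec: str):
    """Return the decoded text, or None if the strict decoder rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


shared_gbk = try_decode(TWO_BYTE, "gbk")
shared_gb18030 = try_decode(TWO_BYTE, "gb18030")
four_gbk = try_decode(FOUR_BYTE, "gbk")
four_gb18030 = try_decode(FOUR_BYTE, "gb18030")

print("two-byte :", repr(shared_gbk), repr(shared_gb18030))
print("four-byte:", repr(four_gbk), repr(four_gb18030))
```

A page that is really GB18030 but labelled GBK only breaks under a separate
GBK decoder if it contains such four-byte sequences.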

I wonder if there are any statistics collected for that. I'm curious to know what
percentage of documents labelled as GBK are actually in GB18030. My suspicion
is that it's pretty low, especially compared with 'ISO-8859-1 vs windows-1252',
'EUC-KR vs windows-949' (because it's so prevalent that the spec's EUC-KR is
actually windows-949, which I fully support), 'TIS 620 : ISO-8859-11 :
windows-874', and so forth.

I'm raising this issue because 1) Blink, WebKit, Firefox (and, I guess, IE too)
have treated the two encodings separately, and 2) Blink needs to add extra code
to treat GBK/GB18030 as specified in the current spec.

I believe that it's doable (I thought about how to do that yesterday), but I'm
not convinced that it's worth the effort / extra code.
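
For reference, the arrangement discussed here (one shared decoder for both
labels, separate encoders) could be sketched roughly as follows. The table and
function names are hypothetical, and Python's `gbk` and `gb18030` codecs only
approximate the encoders and decoders defined in the spec.

```python
# Hypothetical label tables: decoding shares one decoder, encoding stays separate.
DECODER_FOR_LABEL = {"gbk": "gb18030", "gb18030": "gb18030"}
ENCODER_FOR_LABEL = {"gbk": "gbk", "gb18030": "gb18030"}


def to_unicode(data: bytes, label: str) -> str:
    """Decode bytes; both labels go through the GB18030 decoder."""
    return data.decode(DECODER_FOR_LABEL[label.lower()])


def from_unicode(text: str, label: str) -> bytes:
    """Encode text; each label keeps its own encoder."""
    return text.encode(ENCODER_FOR_LABEL[label.lower()])


# A four-byte GB18030 sequence decodes even when the page is labelled GBK.
decoded = to_unicode(b"\x81\x30\x81\x30", "GBK")

# A character outside GBK still cannot be encoded under the GBK label.
try:
    from_unicode("\U0001F600", "GBK")
    gbk_encode_failed = False
except UnicodeEncodeError:
    gbk_encode_failed = True

print(repr(decoded), gbk_encode_failed)
```

In this sketch the extra code amounts to the decoder table. In a real engine
the cost depends on how its converters are structured.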
Related Actions Items:
No related actions
Related emails:
  1. I18N-ISSUE-454 (BUG28156): Separate GBK and GB18030 even for decoding (toUnicode) [encoding] (from sysbot+tracker@w3.org on 2015-03-30)

Related notes:

These issues are now tracked at http://www.w3.org/International/docs/encoding/encoding-cr-doc

Richard Ishida, 16 Sep 2015, 11:43:18
