ISSUE-455: treatment of invalid 2-byte sequence is different in different CJK encodings

State:
CLOSED
Product:
encoding
Raised by:
Richard Ishida
Opened on:
2015-03-30
Description:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28141

This issue tracks the bug listed above and was created as part of the WG CR process.

---

Reporter: jshin@chromium.org

Per bug 16691 comment 15, I'm tightening Blink's encoding tables for CJK
encodings to handle unmappable 2-byte sequences in a safe manner.



The current spec has the following provisions, applied after looking up |pointer|.

* EUC-KR decoder
If pointer is null and byte is in the range 0x00 to 0x7F, prepend byte to
stream.

* Big5 decoder
If pointer is null and byte is in the range 0x00 to 0x7F, prepend byte to
stream.

* Shift_JIS decoder
If pointer is null, prepend byte to stream.

* EUC-JP decoder
If byte is not in the range 0xA1 to 0xFE, prepend byte to stream.

* GB18030 decoder
If pointer is null, prepend byte to stream.
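
The five conditions quoted above can be tabulated side by side. The following is only a sketch in Python: the names PREPEND_RULES and encodings_that_prepend are hypothetical, and each rule is transcribed from the text quoted in this report, not from any decoder implementation.

```python
# Prepend conditions as quoted in this report.
# A rule returns True when the decoder would "prepend byte to stream".
# `pointer` is None when the index lookup failed; `byte` is the second byte.
PREPEND_RULES = {
    "EUC-KR": lambda pointer, byte: pointer is None and byte <= 0x7F,
    "Big5": lambda pointer, byte: pointer is None and byte <= 0x7F,
    "Shift_JIS": lambda pointer, byte: pointer is None,
    "EUC-JP": lambda pointer, byte: not (0xA1 <= byte <= 0xFE),
    "GB18030": lambda pointer, byte: pointer is None,
}


def encodings_that_prepend(byte):
    """List the encodings that re-read `byte` after a failed lookup."""
    return [name for name, rule in PREPEND_RULES.items() if rule(None, byte)]


print(encodings_that_prepend(0xE0))  # ['Shift_JIS', 'GB18030']
print(encodings_that_prepend(0x41))  # all five encodings
```

For a non-ASCII second byte such as 0xE0, only the Shift_JIS and GB18030 rules put the byte back; for an ASCII second byte, all five do.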

For now, let's put aside EUC-JP and GB18030.

I don't see a reason to make the SJIS decoder behave differently from the
EUC-KR and Big5 decoders. One possible reason may be that, in SJIS, each byte
in the range 0xA1 to 0xDF is a character by itself.

In SJIS, "0xFC 0xE0" [1] is turned into U+FFFD, but the second byte (0xE0)
becomes the lead byte of what follows.

In EUC-KR, "0xFE 0xE0" is turned into U+FFFD and the next lead byte is taken
from the byte *after* 0xE0.
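
The difference between the two behaviors can be reproduced with a toy decoder. This is a minimal sketch, not the spec algorithm: decode_two_byte, toy_is_lead, and toy_lookup are hypothetical names, the lead-byte range is simplified to 0x81 to 0xFE for both policies, and the lookup treats every pair as unmappable so that only the prepend rule is exercised.

```python
REPLACEMENT = "\ufffd"


def toy_is_lead(byte):
    # Assumption: one simplified lead-byte range shared by both policies.
    return 0x81 <= byte <= 0xFE


def toy_lookup(lead, trail):
    # Assumption: every two-byte pair is unmappable (pointer is null).
    return None


def decode_two_byte(data, prepend_only_ascii,
                    is_lead=toy_is_lead, lookup=toy_lookup):
    """Decode `data`, applying one of the two prepend rules on a failed lookup."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte <= 0x7F:
            out.append(chr(byte))      # ASCII passes through
            continue
        if not is_lead(byte):
            out.append(REPLACEMENT)    # stray non-lead byte
            continue
        if i >= len(data):
            out.append(REPLACEMENT)    # truncated sequence at end of input
            break
        trail = data[i]
        i += 1
        code_point = lookup(byte, trail)
        if code_point is not None:
            out.append(chr(code_point))
            continue
        # Failed lookup: decide whether the second byte is read again.
        if not prepend_only_ascii or trail <= 0x7F:
            i -= 1                     # "prepend byte to stream"
        out.append(REPLACEMENT)
    return "".join(out)


data = bytes([0xFC, 0xE0, 0x41])
# Shift_JIS-style rule: 0xE0 is re-read as a lead byte, giving two errors.
print(ascii(decode_two_byte(data, prepend_only_ascii=False)))  # '\ufffd\ufffdA'
# EUC-KR/Big5-style rule: 0xE0 is consumed with the error, giving one.
print(ascii(decode_two_byte(data, prepend_only_ascii=True)))   # '\ufffdA'
```

Under both rules the trailing "A" survives, because an ASCII second byte is always put back; the two rules differ only in whether a non-ASCII second byte gets a second chance as a lead byte.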

Shouldn't we change the part of the SJIS decoder quoted above to the following?

If pointer is null and byte is in the range 0x00 to 0x7F, prepend byte to
stream.
Related Action Items:
No related actions
Related emails:
  1. I18N-ISSUE-455 (BUG28141): treatment of invalid 2-byte sequence is different in different CJK encodings [encoding] (from sysbot+tracker@w3.org on 2015-03-30)

Related notes:

These issues are now tracked at http://www.w3.org/International/docs/encoding/encoding-cr-doc

Richard Ishida, 16 Sep 2015, 11:41:17
