This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 19941 - Premature shift to two-byte mode in stateful encoders?
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: All Windows 3.1
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
Reported: 2012-11-11 22:58 UTC by pub-w3
Modified: 2013-01-15 10:13 UTC

Description pub-w3 2012-11-11 22:58:29 UTC
The hz-gb-2312 encoder shifts to two-byte mode (i.e., emits the shift sequence ~{ or 7E 7B) whenever a non-ASCII character is seen (and the encoder is not in two-byte mode already), without checking whether the character is actually encodable (part of GB2312).  If it is not, an encoder error will be emitted next, which means that 1) for a terminating encoder, the output will end with a useless shift sequence, and 2) for a non-terminating encoder, the two-byte shift will have to be followed immediately by a one-byte (ASCII) shift (~} or 7E 7D) before the ASCII representation of the unrepresentable character.

It seems better not to output shift sequences with no purpose.

This issue also applies to the encoders for ISO-2022-JP and ISO-2022-KR.
Comment 1 Anne 2013-01-14 11:24:21 UTC
Okay, so for the hz-gb-2312 encoder we could swap steps 7 and 8 and add to the new step 8 the additional condition that pointer is not null.

A similar type of fix works for the other encoders as far as I can tell.
Comment 2 pub-w3 2013-01-14 19:16:23 UTC
Yes, that should work.

Moving step 7 to after step 9 would be slightly simpler and might give the same result.
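
For illustration only, the difference between shifting first and checking first can be sketched in Python. This is not the spec's algorithm or its step numbering: the function name `hz_encode`, the `check_first` flag, the use of Python's built-in `gb2312` codec as the index lookup, the `&#N;` fallback for unencodable characters, and the closing `~}` at end of input are all assumptions made for the sketch.

```python
def hz_encode(text: str, check_first: bool = True) -> bytes:
    """Hypothetical HZ-style encoder sketch (not the spec's algorithm).

    check_first=False mimics the reported behaviour: shift to two-byte
    mode as soon as a non-ASCII character is seen.
    check_first=True shifts only once the character is known to be
    encodable (the "pointer is not null" condition).
    """
    out = bytearray()
    two_byte = False
    for ch in text:
        if ord(ch) < 0x80:
            if two_byte:
                out += b"~}"          # back to one-byte (ASCII) mode
                two_byte = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
            continue
        try:
            gb = ch.encode("gb2312")  # EUC-CN bytes; stands in for the index lookup
        except UnicodeEncodeError:
            gb = None                 # plays the role of a null pointer
        if not check_first and not two_byte:
            out += b"~{"              # premature shift, before knowing gb is usable
            two_byte = True
        if gb is None:
            if two_byte:
                out += b"~}"
                two_byte = False
            out += b"&#%d;" % ord(ch) # assumed fallback for unencodable characters
            continue
        if not two_byte:
            out += b"~{"
            two_byte = True
        out += bytes(b & 0x7F for b in gb)  # strip the high bit of each EUC-CN byte
    if two_byte:
        out += b"~}"                  # assumption: close the two-byte run at end of input
    return bytes(out)
```

With U+1F600, which is not part of GB2312, `hz_encode("a\U0001F600", check_first=False)` yields an empty `~{~}` pair before the fallback, while the default call emits no shift sequence at all.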