This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 19979 - Definition of 'encoding' doesn't work e.g. for iso-2022-jp
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: All, OS: All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
Reported: 2012-11-16 09:28 UTC by Martin Dürst
Modified: 2012-11-16 12:15 UTC

Description Martin Dürst 2012-11-16 09:28:47 UTC
'encoding' is currently defined as follows:

An encoding defines a mapping from a code point to one or more bytes (and vice versa).

This does not work e.g. for iso-2022-jp, because that can only be explained as a mapping from a sequence (one or more) of code points to a sequence (one or more) of bytes (and vice versa).
Comment 1 Anne 2012-11-16 11:50:10 UTC
Why is that? I can see how this is the case for big5 though. How about "An encoding defines a mapping from a code point sequence to a byte sequence (and vice versa)."?
Comment 2 Martin Dürst 2012-11-16 12:02:25 UTC
(In reply to comment #1)
> Why is that? I can see how this is the case for big5 though. How about "An
> encoding defines a mapping from a code point sequence to a byte sequence
> (and vice versa)."?

The fix looks okay! But I don't understand why you think this is needed for Big5, but not for iso-2022-jp. If I have the byte sequence 0x24 0x24 in iso-2022-jp, this can be either "$$" or "い" (Hiragana I). So you need context to know which it is; you can't just convert one code point at a time. For Big5, on the other hand, you can convert one code point at a time, assuming you get "packets" of bytes that each represent a code point.
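
The context dependence described above can be reproduced with a short sketch. It assumes Python's built-in "iso2022_jp" codec, which is not necessarily identical to the decoder defined in the Encoding Standard; the variable names are illustrative only.

```python
# Illustration only: Python's "iso2022_jp" codec stands in for the
# Encoding Standard's decoder, which may differ in details.

raw = b"\x24\x24"  # the two bytes 0x24 0x24

# In the initial (ASCII) state, each 0x24 byte is a dollar sign.
as_ascii = raw.decode("iso2022_jp")

# After the escape sequence ESC $ B (switch to JIS X 0208), the same two
# bytes form a single character. ESC ( B switches back to ASCII afterwards.
as_jis = (b"\x1b$B" + raw + b"\x1b(B").decode("iso2022_jp")

print(as_ascii)
print(as_jis)
```

The same two bytes decode differently depending on the escape sequence that precedes them, which is why a per-code-point mapping cannot describe this encoding.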
Comment 3 Anne 2012-11-16 12:15:20 UTC
https://github.com/whatwg/encoding/commit/088780df57d0b4d567aad175d2be3d46980b7561

Sure, you need multiple bytes for one code point. The definition already covered that... big5, however, can sometimes emit two code points.
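
As a rough sketch of the big5 case: in the Encoding Standard's big5 decoder, four two-byte sequences each map to a base letter followed by a combining mark, so one byte "packet" yields two code points. The helper names below (`big5_pointer`, `decode_special`) are hypothetical, and the table covers only those four special cases, not the full big5 index.

```python
# Hypothetical helpers, not a full big5 decoder: only the four pointers
# that the Encoding Standard maps to two code points are handled here.
BIG5_TWO_CODE_POINT_POINTERS = {
    1133: "\u00CA\u0304",  # bytes 0x88 0x62
    1135: "\u00CA\u030C",  # bytes 0x88 0x64
    1164: "\u00EA\u0304",  # bytes 0x88 0xA3
    1166: "\u00EA\u030C",  # bytes 0x88 0xA5
}


def big5_pointer(lead: int, trail: int) -> int:
    """Compute the index pointer for a lead/trail byte pair."""
    offset = 0x40 if trail < 0x7F else 0x62
    return (lead - 0x81) * 157 + (trail - offset)


def decode_special(lead: int, trail: int):
    """Return the two-code-point string for a special pair, else None."""
    return BIG5_TWO_CODE_POINT_POINTERS.get(big5_pointer(lead, trail))


result = decode_special(0x88, 0x62)
print([hex(ord(c)) for c in result])
```

Here a single two-byte sequence produces two code points, which the original "a code point to one or more bytes" wording did not cover.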