19979 – Definition of 'encoding' doesn't work e.g. for iso-2022-jp

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.


Summary: Definition of 'encoding' doesn't work e.g. for iso-2022-jp

Status:	RESOLVED FIXED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	Encoding
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+encodingspec

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-11-16 09:28 UTC by Martin Dürst
Modified:	2012-11-16 12:15 UTC
CC List:	1 user

See Also:


Description Martin Dürst 2012-11-16 09:28:47 UTC

'encoding' is currently defined as follows:

An encoding defines a mapping from a code point to one or more bytes (and vice versa).

This does not work e.g. for iso-2022-jp, because that can only be explained as a mapping from a sequence (one or more) of code points to a sequence (one or more) of bytes (and vice versa).
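A quick way to see the encoder side of this is Python's built-in `iso2022_jp` codec, used here only as a stand-in (it is not the Encoding Standard's own encoder). The bytes produced for "い" depend on which mode the encoder is already in, so encoding one code point at a time and concatenating gives a different byte sequence than encoding the whole string:

```python
# Illustration with Python's stdlib "iso2022_jp" codec (an assumption: it is a
# stand-in for the spec's encoder, not the WHATWG algorithm itself).
hiragana_i = "\u3044"  # い (Hiragana I)

# Encoded alone: switch to JIS X 0208 (ESC $ B), emit 0x24 0x24, switch back (ESC ( B).
alone = hiragana_i.encode("iso2022_jp")

# Encoded as one sequence: the encoder stays in JIS X 0208 mode between the two.
pair_whole = (hiragana_i * 2).encode("iso2022_jp")

# Encoded one code point at a time: each call starts and ends in ASCII mode.
pair_per_cp = b"".join(cp.encode("iso2022_jp") for cp in hiragana_i * 2)

print(alone)        # b'\x1b$B$$\x1b(B'
print(pair_whole)   # b'\x1b$B$$$$\x1b(B'
print(pair_per_cp)  # b'\x1b$B$$\x1b(B\x1b$B$$\x1b(B'
```

Both byte sequences decode back to "いい", but no fixed code-point-to-bytes table can produce the shorter one, because the output for each code point depends on its neighbours.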

Comment 1 Anne 2012-11-16 11:50:10 UTC

Why is that? I can see how this is the case for big5 though. How about "An encoding defines a mapping from a code point sequence to a byte sequence (and vice versa)."?

Comment 2 Martin Dürst 2012-11-16 12:02:25 UTC

(In reply to comment #1)
> Why is that? I can see how this is the case for big5 though. How about "An
> encoding defines a mapping from a code point sequence to a byte sequence
> (and vice versa)."?

The fix looks okay! But I don't understand why you think this is needed for Big5, but not for iso-2022-jp. If I have the byte sequence 0x24 0x24 in iso-2022-jp, this can be either "$$" or "い" (Hiragana I). So you need context to know which it is; you can't just convert one code point at a time. For Big5, on the other hand, you can convert one code point at a time, assuming you get "packets" of bytes that each represent a code point.
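The 0x24 0x24 example can be made concrete with a toy stateful decoder. This is a minimal sketch, not the spec's decoder: it assumes only two modes (ASCII, entered with ESC ( B, and JIS X 0208, entered with ESC $ B) and uses a hypothetical one-entry stand-in for the JIS X 0208 table. The point is that the same two bytes decode differently depending on the mode set by earlier bytes:

```python
ESC = 0x1B

# Hypothetical, drastically truncated JIS X 0208 table (a real decoder needs the
# full index): two-byte row/cell pair -> code point.
JIS0208_SUBSET = {(0x24, 0x24): "\u3044"}  # い (Hiragana I)


def decode_iso2022jp_sketch(data: bytes) -> str:
    """Toy ISO-2022-JP decoder: the current mode decides how bytes are read."""
    mode = "ascii"
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            # Escape sequences change the decoder state; they produce no output.
            seq = data[i + 1:i + 3]
            if seq == b"(B":
                mode = "ascii"
            elif seq == b"$B":
                mode = "jis0208"
            else:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            i += 3
        elif mode == "ascii":
            if b >= 0x80:
                raise ValueError(f"non-ASCII byte at offset {i}")
            out.append(chr(b))  # one byte -> one code point
            i += 1
        else:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            pair = (b, data[i + 1])
            if pair not in JIS0208_SUBSET:
                raise ValueError(f"pair {pair} not in this toy table")
            out.append(JIS0208_SUBSET[pair])  # two bytes -> one code point
            i += 2
    return "".join(out)
```

For example, `decode_iso2022jp_sketch(b"$$")` returns `"$$"`, while `decode_iso2022jp_sketch(b"\x1b$B$$\x1b(B")` returns `"い"`: the bytes 0x24 0x24 are identical in both inputs, and only the preceding escape sequence differs.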

Comment 3 Anne 2012-11-16 12:15:20 UTC

https://github.com/whatwg/encoding/commit/088780df57d0b4d567aad175d2be3d46980b7561

Sure, you need multiple bytes for one code point. The definition already covered that. big5, however, can sometimes emit two code points for a single byte sequence.
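The Big5 case Anne mentions can be sketched as follows. This assumes the pointer formula and the four special pointers (1133, 1135, 1164, 1166) as they appear in the current WHATWG Encoding Standard's big5 decoder; the 2012 draft may have differed in detail, and the ordinary index lookup for all other pointers is omitted:

```python
from typing import Optional

# Pointers that decode to two code points (base letter + combining mark),
# per the current Encoding Standard's big5 decoder (assumed here).
TWO_CODE_POINT_POINTERS = {
    1133: "\u00CA\u0304",  # Ê + combining macron
    1135: "\u00CA\u030C",  # Ê + combining caron
    1164: "\u00EA\u0304",  # ê + combining macron
    1166: "\u00EA\u030C",  # ê + combining caron
}


def big5_pointer(lead: int, trail: int) -> Optional[int]:
    """Compute the Big5 index pointer for a lead/trail byte pair, or None."""
    offset = 0x40 if trail < 0x7F else 0x62
    if 0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE:
        return (lead - 0x81) * 157 + (trail - offset)
    return None


def decode_big5_pair_sketch(lead: int, trail: int) -> str:
    """Decode one two-byte Big5 sequence; only the special pointers are handled."""
    pointer = big5_pointer(lead, trail)
    if pointer in TWO_CODE_POINT_POINTERS:
        return TWO_CODE_POINT_POINTERS[pointer]
    raise NotImplementedError("regular index lookup omitted in this sketch")
```

Here the byte pair 0x88 0x62 yields pointer (0x88 − 0x81) × 157 + (0x62 − 0x40) = 1133, and `decode_big5_pair_sketch(0x88, 0x62)` returns a two-code-point string. No per-code-point table in the decode direction can express that, which is why the definition has to speak of sequences.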