15332 – Consider adding a description about some "asymmetric" encodings

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	Encoding
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+encodingspec

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	17839

Reported:	2011-12-24 15:40 UTC by Masatoshi Kimura
Modified:	2012-10-30 17:13 UTC
CC List:	5 users

See Also:	https://bugzilla.mozilla.org/show_bug.cgi?id=712876

Description Masatoshi Kimura 2011-12-24 15:40:53 UTC

IE and Firefox use asymmetric mapping tables for some charsets. Mainly, ISO charsets are decoded with the corresponding Windows charsets, while encoding stays strict.
IMO it's desirable to adopt this approach to keep "willful violations" of the IANA registry to a minimum. iso-8859-9, latin5, l5, csISOLatin5, and iso-ir-148 are not aliases of windows-1254.
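
To illustrate the asymmetry described above, here is a minimal Python sketch. It assumes Python's built-in `cp1254` and `iso8859_9` codecs are close enough stand-ins for the browsers' tables; the function names `decode_lenient` and `encode_strict` are hypothetical and do not come from IE, Firefox, or any spec.

```python
# Sketch only: the function names are hypothetical, not from any browser.

def decode_lenient(data: bytes) -> str:
    """Decode bytes labeled iso-8859-9 using the windows-1254 table."""
    # Python's cp1254 codec leaves a few bytes (e.g. 0x81) undefined,
    # so substitute U+FFFD for them instead of raising.
    return data.decode("cp1254", errors="replace")


def encode_strict(text: str) -> bytes:
    """Encode with the real iso-8859-9 table; raise on anything outside it."""
    return text.encode("iso8859_9")


raw = b"\x80 5, \xfd\xfe\xfdk"  # 0x80 is a euro sign only in windows-1254
text = decode_lenient(raw)
print(text)

try:
    encode_strict(text)
except UnicodeEncodeError as exc:
    # The euro sign has no iso-8859-9 byte, so strict encoding refuses it.
    print("cannot encode strictly:", exc.reason)
```

Decoding accepts the Windows-only byte, but the round trip back to iso-8859-9 fails, which is the behavior the proposal wants to keep.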

Comment 1 Masatoshi Kimura 2011-12-26 11:22:11 UTC

See also bug 15340. At least ISO encodings need to be separated from Windows encodings so that conformance checkers can report parse errors.

Comment 2 Anne 2011-12-27 12:28:51 UTC

Since these are legacy encodings, is it really worth caring that much about the IANA registry? It seems better to simplify code and lower the barrier to entry for new players.

Comment 3 Masatoshi Kimura 2011-12-27 12:58:15 UTC

I don't think the barrier is that high, because browsers can ignore parse errors (that is, it's sufficient to just replace the mapping tables). But conformance checkers cannot.

Comment 4 Anne 2011-12-27 13:04:16 UTC

Right, about conformance checkers. I think they should flag everything that is not UTF-8. I don't really think it's worthwhile for them to flag that your usage of iso-8859-1 is actually windows-1252.

Henri, Ian, opinions?

Comment 5 Michael[tm] Smith 2011-12-27 13:33:09 UTC

(In reply to comment #4)
> Right, about conformance checkers. I think they should flag everything that is
> not UTF-8. I don't really think it's worthwhile for them to flag that your
> usage of iso-8859-1 is actually windows-1252.

If you mean requiring conformance checkers to emit warning messages for any document that's not UTF-8, I'm not sure Richard would be too keen on that.

Comment 6 Ian 'Hixie' Hickson 2012-10-02 19:34:03 UTC

I think if a document is labeled as ISO-8859-1 but has characters that are going to be interpreted differently than ISO-8859-1 says they should be, the validator should give an error message.

This is what the HTML spec currently requires for HTML docs.
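
A rough sketch of such a check, in Python: ISO-8859-1 and windows-1252 differ only in bytes 0x80 to 0x9F, so a checker could report any byte in that range in a document labeled ISO-8859-1. The function name `find_c1_bytes` is hypothetical, and this is not the algorithm from the HTML spec, only an illustration of the idea.

```python
from typing import List, Tuple


def find_c1_bytes(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte) pairs for bytes in the range 0x80-0x9F.

    ISO-8859-1 assigns C1 control characters to these bytes, while
    windows-1252 assigns printable characters to most of them, so they
    are the bytes a browser would interpret differently from the label.
    """
    return [(i, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


# Hypothetical document body: 0xE9 is the same in both encodings,
# while 0x93 and 0x94 are curly quotes only in windows-1252.
body = b"caf\xe9 \x93hi\x94"
for offset, byte in find_c1_bytes(body):
    print(f"offset {offset}: byte 0x{byte:02X} is not what ISO-8859-1 says")
```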

Comment 7 Anne 2012-10-22 12:48:15 UTC

1. Per the Encoding Standard there is no difference between iso-8859-1 and windows-1252. I think that is fine, unless there is some compatibility problem with that.

2. I think we should make non-utf-8 usage non-conforming because there are too many traps with URLs, form submission, and other formats that only work well with utf-8.

Per that I'm going to mark this WONTFIX.
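
Point 1 can be illustrated with a small Python sketch of label resolution. The table below is only a short excerpt of the labels the Encoding Standard lists for these two encodings, and `resolve_label` is a hypothetical helper name, not an API from the standard or from any library.

```python
from typing import Optional

# Excerpt only: the Encoding Standard lists more labels per encoding.
LABELS = {
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "l1": "windows-1252",
    "windows-1252": "windows-1252",
    "iso-8859-9": "windows-1254",
    "latin5": "windows-1254",
    "l5": "windows-1254",
    "csisolatin5": "windows-1254",
    "iso-ir-148": "windows-1254",
    "windows-1254": "windows-1254",
}


def resolve_label(label: str) -> Optional[str]:
    """Map a label to an encoding name, ignoring case and outer whitespace."""
    return LABELS.get(label.strip(" \t\n\f\r").lower())


print(resolve_label("ISO-8859-1"))
print(resolve_label("csISOLatin5"))
```

Under this model the ISO labels are plain aliases of the Windows encodings, so there is no separate ISO decoder or encoder left to be asymmetric.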