This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 15332 - Consider adding a description about some "asymmetric" encodings
Summary: Consider adding a description about some "asymmetric" encodings
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 17839
 
Reported: 2011-12-24 15:40 UTC by Masatoshi Kimura
Modified: 2012-10-30 17:13 UTC (History)
CC List: 5 users

Description Masatoshi Kimura 2011-12-24 15:40:53 UTC
IE and Firefox use asymmetric mapping tables for some charsets. Mainly, ISO charsets are decoded using the corresponding Windows charsets, while encoding stays strict.
IMO it's desirable to adopt this approach to keep "willful violations" of the IANA registry to a minimum. iso-8859-9, latin5, l5, csISOLatin5, and iso-ir-148 are not aliases of windows-1254.
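As a rough illustration of the asymmetry described above, here is a minimal Python sketch. It is not taken from IE or Firefox; it assumes Python's built-in `cp1254` and `iso8859_9` codecs are close enough stand-ins for the browsers' mapping tables, and the helper names `decode_lenient` and `encode_strict` are hypothetical.

```python
def decode_lenient(data: bytes) -> str:
    # Assumption: content labeled iso-8859-9 is decoded with the
    # windows-1254 table, so bytes 0x80-0x9F become printable characters.
    return data.decode("cp1254")


def encode_strict(text: str) -> bytes:
    # Encoding uses the real iso-8859-9 table and raises
    # UnicodeEncodeError for characters that only exist in windows-1254.
    return text.encode("iso8859_9")


# Byte 0x80 decodes to the euro sign through the windows-1254 table.
euro = decode_lenient(b"\x80")

# The same character cannot be encoded back as strict iso-8859-9.
try:
    encode_strict(euro)
    round_trips = True
except UnicodeEncodeError:
    round_trips = False
```

Characters shared by both tables, such as the Turkish letters in the upper half, still round-trip; only the windows-1254 additions fail on the encoding side.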
Comment 1 Masatoshi Kimura 2011-12-26 11:22:11 UTC
See also bug 15340. At the very least, the ISO encodings need to be kept separate from the Windows encodings so that conformance checkers can report parse errors.
Comment 2 Anne 2011-12-27 12:28:51 UTC
Since these are legacy encodings, is it really worth caring that much about the IANA registry? It seems better to simplify code and lower the barrier to entry for new players.
Comment 3 Masatoshi Kimura 2011-12-27 12:58:15 UTC
I don't think the barrier is that high, because browsers can ignore parse errors (that is, it's sufficient to just swap the mapping tables). But conformance checkers cannot.
Comment 4 Anne 2011-12-27 13:04:16 UTC
Right, about conformance checkers. I think they should flag everything that is not UTF-8. I don't really think it's worthwhile for them to flag that your usage of iso-8859-1 is actually windows-1252.

Henri, Ian, opinions?
Comment 5 Michael[tm] Smith 2011-12-27 13:33:09 UTC
(In reply to comment #4)
> Right, about conformance checkers. I think they should flag everything that is
> not UTF-8. I don't really think it's worthwhile for them to flag that your
> usage of iso-8859-1 is actually windows-1252.

If you mean requiring conformance checkers to emit warning messages for any document that's not UTF-8, I'm not sure Richard would be too keen on that.
Comment 6 Ian 'Hixie' Hickson 2012-10-02 19:34:03 UTC
I think that if a document is labeled as ISO-8859-1 but has characters that are going to be interpreted differently than ISO-8859-1 says they should be, the validator should give an error message.

This is what the HTML spec currently requires for HTML docs.
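A check of this kind could look roughly like the following Python sketch. It assumes the only bytes where ISO-8859-1 and windows-1252 disagree are 0x80 through 0x9F (C1 controls in the former, mostly printable characters in the latter). The function name `find_c1_bytes` is hypothetical and is not part of any validator.

```python
def find_c1_bytes(data: bytes) -> list:
    """Return the offsets of bytes in the 0x80-0x9F range.

    In a document labeled ISO-8859-1 these bytes are the ones a browser
    would interpret through the windows-1252 table instead.
    """
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


# 0xE9 means the same thing in both encodings; 0x93 and 0x94 are curly
# quotes in windows-1252 but C1 control codes in ISO-8859-1.
sample = b"caf\xe9 \x93quoted\x94"
offsets = find_c1_bytes(sample)
```

A checker following this approach would report one error per offset and stay silent for documents that use only the bytes the two encodings share.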
Comment 7 Anne 2012-10-22 12:48:15 UTC
1. Per the Encoding Standard there is no difference between iso-8859-1 and windows-1252. I think that is fine, unless there is some compatibility problem with that.

2. I think we should make non-utf-8 usage non-conforming because there are too many traps with URLs, form submission, and other formats that only work well with utf-8.

Per that I'm going to mark this WONTFIX.
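For readers unfamiliar with how point 1 plays out, here is a minimal Python sketch of a label lookup in the style of the Encoding Standard. The table is a small hand-picked subset of labels, not the standard's full list, and the names `LABELS` and `get_encoding` are hypothetical.

```python
# Subset of labels only; each ISO label resolves to the Windows encoding.
LABELS = {
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "l1": "windows-1252",
    "windows-1252": "windows-1252",
    "iso-8859-9": "windows-1254",
    "latin5": "windows-1254",
    "l5": "windows-1254",
    "csisolatin5": "windows-1254",
    "iso-ir-148": "windows-1254",
    "windows-1254": "windows-1254",
}


def get_encoding(label: str):
    # Strip surrounding ASCII whitespace and compare case-insensitively.
    # Returns None for labels missing from this illustrative subset.
    return LABELS.get(label.strip(" \t\n\f\r").lower())
```

With a table like this there is a single decoder and a single encoder per entry, which is the simplification argued for in comment 2; the asymmetric approach from the description would need separate decode and encode targets for each ISO label.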