<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>15332</bug_id>
          
          <creation_ts>2011-12-24 15:40:53 +0000</creation_ts>
          <short_desc>Consider adding a description about some &quot;asymmetric&quot; encodings</short_desc>
          <delta_ts>2012-10-30 17:13:11 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>Encoding</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          <see_also>https://bugzilla.mozilla.org/show_bug.cgi?id=712876</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          <blocked>17839</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Masatoshi Kimura">VYV03354</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>gphemsley</cc>
    
    <cc>hsivonen</cc>
    
    <cc>ian</cc>
    
    <cc>ishida</cc>
    
    <cc>mike</cc>
          
          <qa_contact>sideshowbarker+encodingspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>62011</commentid>
    <comment_count>0</comment_count>
    <who name="Masatoshi Kimura">VYV03354</who>
    <bug_when>2011-12-24 15:40:53 +0000</bug_when>
    <thetext>IE and Firefox use asymmetric mapping tables for some charsets. Mainly, ISO charsets are decoded with the corresponding Windows charsets, while encoding stays strict.
IMO it&apos;s desirable to adopt this approach to keep &quot;willful violations&quot; of the IANA registry to a minimum. iso-8859-9, latin5, l5, csISOLatin5, and iso-ir-148 are not aliases of windows-1254.</thetext>
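<!--
Editorial illustration, not part of the original comment: a minimal sketch of the
asymmetric behaviour described in comment 0, assuming Python's built-in "cp1252" and
"latin-1" codecs as rough stand-ins for the browsers' windows-1252 and iso-8859-1
tables. The helper names decode_lenient and encode_strict are hypothetical.

```python
def decode_lenient(data: bytes) -> str:
    """Decode bytes labeled iso-8859-1 using the windows-1252 table."""
    return data.decode("cp1252")


def encode_strict(text: str) -> bytes:
    """Encode to iso-8859-1 proper; characters outside it raise an error."""
    return text.encode("latin-1")


# Bytes 0x93 and 0x94 are C1 controls in iso-8859-1 but curly quotes in windows-1252.
labeled_latin1 = b"\x93quoted\x94"
text = decode_lenient(labeled_latin1)

try:
    encode_strict(text)
    round_trips = True
except UnicodeEncodeError:
    # The curly quotes have no iso-8859-1 representation, so strict encoding fails.
    round_trips = False
```

Text that stays within iso-8859-1, such as "café", passes through both helpers
unchanged, so the asymmetry only shows up for bytes in the 0x80 to 0x9F range.
-->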
  </long_desc><long_desc isprivate="0" >
    <commentid>62030</commentid>
    <comment_count>1</comment_count>
    <who name="Masatoshi Kimura">VYV03354</who>
    <bug_when>2011-12-26 11:22:11 +0000</bug_when>
    <thetext>See also bug 15340. At least ISO encodings need to be separated from Windows encodings so that conformance checkers can report parse errors.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>62035</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2011-12-27 12:28:51 +0000</bug_when>
    <thetext>Since these are legacy encodings, is it really worth caring that much about the IANA registry? It seems better to simplify code and lower the barrier to entry for new players.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>62036</commentid>
    <comment_count>3</comment_count>
    <who name="Masatoshi Kimura">VYV03354</who>
    <bug_when>2011-12-27 12:58:15 +0000</bug_when>
    <thetext>I don&apos;t think the barrier is that high, because browsers can ignore parse errors (that is, it&apos;s sufficient to just replace the mapping tables). But conformance checkers cannot.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>62037</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2011-12-27 13:04:16 +0000</bug_when>
    <thetext>Right, about conformance checkers. I think they should flag everything that is not UTF-8. I don&apos;t really think it&apos;s worthwhile for them to flag that your usage of iso-8859-1 is actually windows-1252.

Henri, Ian, opinions?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>62038</commentid>
    <comment_count>5</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2011-12-27 13:33:09 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; Right, about conformance checkers. I think they should flag everything that is
&gt; not UTF-8. I don&apos;t really think it&apos;s worthwhile for them to flag that your
&gt; usage of iso-8859-1 is actually windows-1252.

If you mean requiring conformance checkers to emit warning messages for any document that&apos;s not UTF-8, I&apos;m not sure Richard would be too keen on that.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>75093</commentid>
    <comment_count>6</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-10-02 19:34:03 +0000</bug_when>
    <thetext>I think that if a document is labeled as ISO-8859-1 but has characters that are going to be interpreted differently than ISO-8859-1 says they should be, the validator should give an error message.

This is what the HTML spec currently requires for HTML docs.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>76902</commentid>
    <comment_count>7</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-10-22 12:48:15 +0000</bug_when>
    <thetext>1. Per the Encoding Standard there is no difference between iso-8859-1 and windows-1252. I think that is fine, unless there is some compatibility problem with that.

2. I think we should make non-utf-8 usage non-conforming because there are too many traps with URLs, form submission, and other formats that only work well with utf-8.

Per that I&apos;m going to mark this WONTFIX.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>