<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>12897</bug_id>
          
          <creation_ts>2011-06-06 20:01:58 +0000</creation_ts>
          <short_desc>In some parsers, UTF-8 BOM trumps the HTTP charset attribute (Encoding sniffing algorithm)</short_desc>
          <delta_ts>2011-12-31 03:34:13 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>LC1 HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc>http://dev.w3.org/html5/spec/parsing#encoding-sniffing-algorithm</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>major</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Leif Halvard Silli">xn--mlform-iua</reporter>
          <assigned_to>contributor</assigned_to>
          <cc>hsivonen</cc>
    
    <cc>ian</cc>
    
    <cc>julian.reschke</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>xn--mlform-iua</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>49225</commentid>
    <comment_count>0</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-06 20:01:58 +0000</bug_when>
    <thetext>PROPOSAL: 
   Specify IE and Webkit&apos;s handling of the Byte Order Mark for the UTF-8 encoding as REQUIRED: whenever the document begins with the UTF-8 Byte Order Mark, ignore the encoding info of the HTTP &quot;Content-Type: text/html; charset=[encodingname]&quot; header, and also ignore any user action to override the document&apos;s encoding.
   Consequently, 
      * when there is a UTF-8 BOM, the encoding info provided by HTTP and the user should be treated as irrelevant
      * the first two steps of the encoding sniffing algorithm must be changed 

CURRENT STATUS: 
   The first two steps of the encoding sniffing algorithm give users and the transport layer (HTTP/MIME) the power to override a document&apos;s character encoding:

   ]] 1. If the user has explicitly instructed the user agent to override the document&apos;s character encoding with a specific encoding, optionally return that encoding with the confidence certain and abort these steps.
      2. If the transport layer specifies an encoding, and it is supported, return that encoding with the confidence certain, and abort these steps.[[

   HOWEVER, the reality is that two major user agents operate with an exception to the above rules: whenever the document includes the UTF-8 Byte Order Mark, Internet Explorer and Webkit
    - do *not* allow users to override the encoding
    - do *not* respect the encoding information in the HTTP server&apos;s Content-Type header
    - do *not* permit their heuristic character detection features to guess any encoding other than UTF-8
   Consequently, in IE and Webkit it is impossible for the user - as well as for an HTTP server - to cause a document with the UTF-8 Byte Order Mark to be interpreted as e.g. KOI8-R encoded or Windows-1252 encoded.

   In contrast, Firefox and Opera
    - *do* obey the HTTP server&apos;s Content-Type header also when there is a UTF-8 BOM
    - *do* allow users to override the encoding also when there is a UTF-8 BOM
    - *do* permit their heuristic character detection features to guess an encoding other than UTF-8 (Opera and Firefox allow their users to tune how their heuristic encoding sniffing works.)
   Consequently, in Firefox and Opera it is *possible* for the user - as well as for an HTTP server - to cause a document with the UTF-8 Byte Order Mark to be interpreted as e.g. KOI8-R encoded or Windows-1252 encoded.
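
   ILLUSTRATION (a sketch only, not taken from any browser source): the two behaviours described above can be modelled as one small Python function. The name sniff_encoding and its bom_wins switch are hypothetical, and the later steps of the sniffing algorithm are not modelled.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def sniff_encoding(body, http_charset=None, user_override=None, bom_wins=True):
    """Return the encoding label a parser would pick for body (bytes).

    bom_wins=True models the IE/Webkit behaviour described above.
    bom_wins=False models the Firefox/Opera behaviour, where the user
    override and the HTTP charset are consulted before the BOM.
    """
    has_bom = body.startswith(UTF8_BOM)
    if bom_wins and has_bom:
        return "utf-8"
    if user_override:
        return user_override
    if http_charset:
        return http_charset
    if has_bom:
        return "utf-8"
    return None  # later sniffing steps are not modelled in this sketch


page = UTF8_BOM + b"test"
print(sniff_encoding(page, http_charset="koi8-r"))
print(sniff_encoding(page, http_charset="koi8-r", bom_wins=False))
```

   With bom_wins=True the HTTP charset and the user override are never consulted for a document that starts with the BOM, which is the change this proposal asks for.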

BENEFITS:
    A. Harmonization with XML 1.0  Appendix F.2, &quot;Priorities in the Presence of External Encoding Information&quot;, which recommends BOM to have higher priority than external encoding information: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info  (Opera/Firefox do not yet implement this XML 1.0 recommendation)
    B. a simple, reliable way to specify the UTF-8 encoding
    C. Firefox/Opera converge with IE/Webkit, making browsers more interoperable
    D. security: chameleon documents (where the document gets another and risky interpretation when read as legacy encoding) become more difficult to create
    E. User experience: less &quot;gibberish&quot; and less &quot;mojibake&quot; for users [*] http://en.wikipedia.org/wiki/Mojibake
    F. Same as A): Promotes a polyglot way to specify the encoding: the BOM works in both HTML and XML. (The Polyglot spec already says that the UTF-8 BOM is the most polyglot encoding method.)

Other justifications:
   - Opera Software: &quot;We have introduced the BOM as requirement for each source file when we have written the build tools as a simple way to verify that all files are utf8 encoded&quot;. (http://stackoverflow.com/questions/4658985/how-to-keep-the-bom-when-editing-files-in-espresso)

NOTES: 

   (1) Browsers tested as part of this bug report: IE8, Safari, Chrome (which shows above described behavior) as well as Opera and Firefox (which do support this behavior). Other browsers, e.g. KHTML, have not been tested.
   (2) BOM in UTF-16: I have not looked into how BOM in UTF-16 is handled by parsers.
   (3) For the record: All browsers, including Firefox and Opera, *do* already ignore the META charset *element* whenever there is a UTF-8 BOM. This bug report says that they should *also* ignore HTTP.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49226</commentid>
    <comment_count>1</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-06 20:15:50 +0000</bug_when>
    <thetext>A minor data correction:

(In reply to comment #0)
   [ snip ]
&gt; NOTES: 
&gt; 
&gt;    (1) Browsers tested as part of this bug report: IE8, Safari, Chrome (which
&gt; shows above described behavior) as well as Opera and Firefox (which do support
&gt; this behavior). Other browsers, e.g. KHTML, have not been tested.

The last parenthesis should be read &quot;(which do *not* support this behaviour)&quot;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49227</commentid>
    <comment_count>2</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-06 20:46:34 +0000</bug_when>
    <thetext>Quirks-mode:  Additionally, the Opera/Firefox behaviour sends those browsers into Quirks-Mode (because they see some illegal characters before the DOCTYPE), whereas Internet Explorer and Webkit remain in no-quirks mode.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49228</commentid>
    <comment_count>3</comment_count>
      <attachid>994</attachid>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-06 21:04:54 +0000</bug_when>
    <thetext>Created attachment 994
Polyglot file with BOM, served as &apos;application/xhtml+xml; charset=koi8-r&apos;

Parsers which follow the recommendation in XML 1.0 should respect the BOM and ignore the encoding information inside the Content-Type header.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49230</commentid>
    <comment_count>4</comment_count>
      <attachid>995</attachid>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-06 21:09:57 +0000</bug_when>
    <thetext>Created attachment 995
Polyglot file with BOM served as &apos;text/html; charset=koi8-r&apos;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49233</commentid>
    <comment_count>5</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-06 23:51:26 +0000</bug_when>
    <thetext>Test file which demonstrates the issues are available here - it is recommended to read the explanation of that page: http://malform.no/testing/html5/bom/

Direct link, XML test file: http://malform.no/testing/html5/bom/xml.html
Direct link, HTML test file: http://malform.no/testing/html5/bom/htm.html
Additionally, Opera has some extra bugs (a bug in the bug): http://malform.no/testing/html5/bom/xml_(ISO-8859-1).html

All the pages have a BOM in combination with erroneous encoding info inside the HTTP Content-Type: header.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49237</commentid>
    <comment_count>6</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-07 02:49:13 +0000</bug_when>
    <thetext>Related Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=662458
 Related Opera bug: DSK-338772    AT-the-server      bugs.opera.com</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49241</commentid>
    <comment_count>7</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2011-06-07 05:14:24 +0000</bug_when>
    <thetext>(In reply to comment #0)
&gt;     A. Harmonization with XML 1.0  Appendix F.2, &quot;Priorities in the Presence of
&gt; External Encoding Information&quot;, which recommends BOM to have higher priority
&gt; than external encoding information:
&gt; http://www.w3.org/TR/xml/#sec-guessing-with-ext-info  (Opera/Firefox do not yet
&gt; implement this XML 1.0 recommendation)

I believe you are misreading the XML 1.0 spec. It says that in the HTTP case, RFC 3023 applies but for anyone specifying a new case, they recommend giving XML itself precedence. However, since the RFC applies in the HTTP case, in the HTTP case, the charset parameter on the HTTP level is authoritative.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49256</commentid>
    <comment_count>8</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-07 11:53:05 +0000</bug_when>
    <thetext>(In reply to comment #7)

&gt; I believe you are misreading the XML 1.0 spec. It says that in the HTTP case,
&gt; RFC 3023 applies but for anyone specifying a new case, they recommend giving
&gt; XML itself precedence. However, since the RFC applies in the HTTP case, in the
&gt; HTTP case, the charset parameter on the HTTP level is authoritative.

(1) It is already great if we agree about the interpretation whenever HTTP is *not* used!

(2) In that regard, HTML5 tends to talk about &quot;the higher protocol&quot; and not specifically about HTTP.

(3) It is in the power of the HTML5 spec to specify how XHTML5 and HTML5 documents should be interpreted. Because: 

  a) the HTML5 effort (including &quot;sister projects&quot;) appears to be redefining/refining the HTTP specs as well as HTML itself. 

  b) XML 1.0 defers it: &quot;the preferred method of handling conflict  should be specified as part of the higher-level protocol used to deliver XML&quot;

  c) XML 1.0 defines a recommended rule (which it probably would like to see in HTTP as well): &quot;If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding.&quot;

But apart from what XML says, we must also look at interoperability - and the effects of Opera and Mozilla&apos;s reading of the specifications.

  I) In Mozilla&apos;s bugzilla there are several reports about how to handle the BOM gibberish letters whenever the BOM is ignored in favor of an external protocol.

  II) Opera has implemented a very strange behaviour where it sometimes eats the BOM gibberish, so that the page does not go into quirks-mode, whereas sometimes it does not eat the BOM gibberish, leading to quirks mode. See my tests: http://malform.no/testing/html5/bom/ 

   Et cetera: Yellow Screen of Death, IE/Webkit, wrong resulting encoding. 

   I don&apos;t know if I misread Julian, but I&apos;ll also quote a message to Adam in 2009: [*]

]]
   &gt; The algorithm tolerates leading white space, but not leading BOMs.

   Is there a particular reason why the BOM is not tolerated, given 
   &lt;http://www.w3.org/TR/REC-xml/#sec-guessing&gt;?
               [ snipping in Julian&apos;s message ]
   Let&apos;s ignore &quot;correctly&quot; for a second -- [ snipping ]
]]

[*] http://lists.w3.org/Archives/Public/public-html/2009Nov/0579</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49272</commentid>
    <comment_count>9</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-07 14:58:25 +0000</bug_when>
    <thetext>XML 1.0 points to RFC3023. But is notable how RFC3023 only speaks about the UTF-16 BOM and not about the UTF-8 BOM: http://tools.ietf.org/html/rfc3023#page-15</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49373</commentid>
    <comment_count>10</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-09 11:27:00 +0000</bug_when>
    <thetext>More data collected - after discussion on www-international@ and implementation tests:

NOTE: Data is needed for IE9&apos;s XML parser. Assumption: behaves as Webkit (because that is how it acts for HTML)

Spec data - XML:
---

* XML 1.0 only says that Content-Type: *can* have priority (depending on what the higher protocol says) over &quot;&lt;?xml version=&quot;1.0&quot; encoding=&quot;value&quot;?&gt;&quot;. Quote:  ]] In the absence of information provided by an external transport protocol  (e.g. HTTP or MIME), it is a fatal error[[ &lt;http://www.w3.org/TR/xml/#charencoding&gt; Thus it depends on the rules of the higher protocol.

Spec data - RFC3023
---

1) RFC3023 &apos;XML Media Types&apos; specifies that HTTP charset parameter does have priority. (Meaning that the xml parser must - legally - ignore the XML encoding declaration.) 

2) But RFC3023 actually only justifies it for &apos;text/xml&apos;, where *transcoding* (leading the doc to have another encoding than the one specified inside the document) and *compatibility with text/plain* are the justifications: &lt;http://tools.ietf.org/html/rfc3023#section-3.1&gt; 

3) For &apos;application/xml&apos;, then RFC3023 has no real justification. The only thing it has is: &quot;it is possible for users to configure web servers&quot; and &quot;the HTTP spec says so&quot;. http://tools.ietf.org/html/rfc3023#section-3.2

4)    Notably, RFC3023 seriously discusses the Appendix F: &quot;Autodetection of Character Encodings (Non-Normative)&quot;. (http://www.w3.org/TR/xml/#sec-guessing)  Which (once again) under the heading &quot;Priorities in the Presence of External Encoding Information&quot; states:  ]] In the interests of interoperability, however, the following rule is recommended. If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding. [[  &lt;http://www.w3.org/TR/xml/#sec-guessing-with-ext-info&gt;

Implementation data - RFC3023:
---

* Parsers implementing RFC3023 (HTTP has priority over document data): Opera, Firefox, Amaya

** Parsers implementing RFC3023 and which *also* emit &apos;fatal error&apos; if HTTP charset and UTF-8 BOM disagree: Opera, Firefox. (Thus: not Amaya.) Note: per XML 1.0 it is required, *if HTTP and RFC3023 require it! (and they do!)*, to ignore the XML encoding declaration in favour of the HTTP charset parameter. But note that it is not permitted, per XML 1.0, to act as if the BOM does not exist, even if the doc is served via HTTP!

* Parsers *not* implementing RFC3023 (thus giving priority to document data instead), and which do not emit fatal errors: Webkit, Xerces C++, XMLMind Editor on Mac (based on Xerces Java), RXP, oXygen

** Parsers *not* implementing RFC3023 and which, in case of conflict and without emitting fatal error, adhere to the BOM and ignore the XML encoding declaration: Webkit, (IE9 must be checked)

** Parsers not implementing RFC3023 and which, in case of conflict and without emitting fatal error, adhere to the XML encoding declaration and ignore the BOM: XMLmind Editor for Mac, Xerces C++, oXygen, RXP
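
Illustration only (a sketch under stated assumptions, not any parser&apos;s actual code): the HTTP-versus-BOM part of the grouping above, modelled in Python with a hypothetical policy switch. The function names and the policy strings are invented for this sketch. The XML encoding declaration is not modelled, and the &quot;http&quot; policy only models the absence of a fatal error, since how such a parser treats the BOM bytes themselves is not recorded above.

```python
import codecs


def same_encoding(label_a, label_b):
    """True if two charset labels resolve to the same Python codec."""
    try:
        return codecs.lookup(label_a).name == codecs.lookup(label_b).name
    except LookupError:
        return False  # unknown label: treat as not matching


def pick_encoding(body, http_charset, policy):
    """Pick an encoding for body (bytes) under a hypothetical policy.

    "http-fatal"  HTTP has priority, a conflict with the BOM is fatal
    "http"        HTTP has priority, no fatal error on conflict
    "bom"         the UTF-8 BOM has priority over the HTTP charset
    """
    has_bom = body.startswith(codecs.BOM_UTF8)
    conflict = (
        has_bom
        and http_charset is not None
        and not same_encoding(http_charset, "utf-8")
    )
    if policy == "bom" and has_bom:
        return "utf-8"
    if policy == "http-fatal" and conflict:
        raise ValueError("fatal error: HTTP charset conflicts with UTF-8 BOM")
    return http_charset


doc = codecs.BOM_UTF8 + b"test"
print(pick_encoding(doc, "koi8-r", "bom"))
print(pick_encoding(doc, "koi8-r", "http"))
```

Comparing codec names rather than raw labels means that &quot;UTF8&quot; and &quot;utf-8&quot; are not reported as a conflict.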

   
Implementation data - non-RFC3023 (file protocol):
---

* Parsers emitting fatal error if UTF-8 BOM conflicts with the XML encoding declaration: Opera.

* Parsers *not* emitting fatal error if UTF-8 BOM conflicts with the XML encoding declaration: Webkit, Firefox, oXygen, XMLmind XML editor for mac (based on Xerces Java), Amaya

** Parsers *not* emitting fatal error if UTF-8 BOM conflicts with the XML encoding declaration and which give priority to the UTF-8 BOM: Webkit, Firefox, oXygen

** Parsers *not* emitting fatal error if UTF-8 BOM conflicts with the XML encoding declaration and which give priority to the XML encoding declaration (and/or to the UTF-8 encoding default, if they completely jump over the UTF-8 BOM): XMLmind XML editor, RXP and (probably) Xerces C++


Implementation data - charset names:
---

* Webkit and some of the editors emit &apos;fatal error&apos; if the charset *name* in the XML encoding declaration is *unknown*. This happens even if they (for example Webkit) *otherwise* do not emit a fatal error whenever the UTF-8 BOM conflicts with the XML encoding declaration.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49374</commentid>
    <comment_count>11</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-09 11:49:24 +0000</bug_when>
    <thetext>* Xerces C++ bug: https://issues.apache.org/jira/browse/XERCESC-1967
* RXP bug has been reported.
* XMLmind Editor Bug has been reported
* oXygen bug has been reported
* The XML test suite has a &apos;Lack of &apos;fatal error&apos; tests for invalid encodings&apos; http://lists.w3.org/Archives/Public/public-xml-testsuite/2011Jun/0000.html
* Thread on www-international:
  http://lists.w3.org/Archives/Public/www-international/2011AprJun/0079.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49377</commentid>
    <comment_count>12</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-09 12:30:19 +0000</bug_when>
    <thetext>libxml2 is a rare exception - for HTTP, it behaves as RFC3023 specifices, thus we can count libxml, Opera and Firefox:

$: xmllint http://malform.no/testing/html5/bom/xml.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49378</commentid>
    <comment_count>13</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-09 12:34:05 +0000</bug_when>
    <thetext>(In reply to comment #12)
&gt; libxml2 is a rare exception - for HTTP, it behaves as RFC3023 specifies, thus
&gt; we can count libxml, Opera and Firefox:
&gt; 
&gt; $: xmllint http://malform.no/testing/html5/bom/xml.html

However, for file:// operations, it does not do what XML 1.0 specifies: It ignores the UTF-8 BOM and obeys the XML encoding declaration.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49382</commentid>
    <comment_count>14</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-09 12:52:46 +0000</bug_when>
    <thetext>(In reply to comment #13)
&gt; (In reply to comment #12)
&gt; &gt; libxml2 is a rare exception - for HTTP, it behaves as RFC3023 specifies, thus
&gt; &gt; we can count libxml, Opera and Firefox:
&gt; &gt; 
&gt; &gt; $: xmllint http://malform.no/testing/html5/bom/xml.html
&gt; 
&gt; However, for file:// operations, then it does not do what XML 1.0 specifies: It
&gt; ignores the UTF-8 BOM. And obeys the XML encoding declaration.

https://bugzilla.gnome.org/show_bug.cgi?id=652185</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>49795</commentid>
    <comment_count>15</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-06-17 18:44:03 +0000</bug_when>
    <thetext>Example site in the wild: http://vertikal.dk

Web site data: 

* Each page typically begins with *several* byte-order marks (sic)
* Many pages mix UTF-8 and Windows-1252 encoding, even in the
   very same *sentence*: http://vertikal.dk/foto/hitlist.htm

Browser handling data:

* Opera and Firefox allow users to override the encoding.  
   However, to *very little* benefit for the user
* Internet Explorer and Webkit do not allow the encoding to be overridden. 
* All browsers appear to be in quirks-mode</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>53728</commentid>
    <comment_count>16</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2011-08-04 05:16:46 +0000</bug_when>
    <thetext>mass-move component to LC1</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>60753</commentid>
    <comment_count>17</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-12-02 00:41:24 +0000</bug_when>
    <thetext>Henri, what do you think the spec should say here? (If you think no change is needed, please close the bug. Thanks!)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>60763</commentid>
    <comment_count>18</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2011-12-02 08:26:26 +0000</bug_when>
    <thetext>(In reply to comment #17)
&gt; Henri, what do you think the spec should say here? (If you think no change is
&gt; needed, please close the bug. Thanks!)

EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the tracker issue; or you may create a tracker issue
yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: The precedence of HTTP encoding declaration over the internal encoding declaration is indeed backwards in the light of Ruby&apos;s Postulate (http://intertwingly.net/slides/2004/devcon/69.html). However, the main issue is HTTP vs. meta. The BOM is mainly a theoretical sideshow. Since, for compatibility, performance, etc., we aren&apos;t changing the precedence of HTTP and meta, it&apos;s not worthwhile to tweak the precedence of the UTF-8 BOM, which in practice is a sideshow (mainly because it makes sense to configure text editors not to emit it in order to make the text editors useful for editing formats that misbehave if the UTF-8 BOM is present). When we aren&apos;t changing the precedence of HTTP and meta to give higher precedence to the value that logically has the higher probability of being right, it makes sense to fully retain the current order which is logical in another way: precedence is given to the outermost encoding indicator.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>62170</commentid>
    <comment_count>19</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-12-31 03:34:13 +0000</bug_when>
    <thetext>Comment #0 does not display its content. For the original text, see public-html:
http://lists.w3.org/Archives/Public/public-html/2011Jun/0084.html

Comment #1 was also lost. Hence I paste it in here from my e-mail copy:

2011-06-06 20:15:50 UTC ---
A minor data correction:

(In reply to comment #0)
   [ snip ]
&gt; NOTES: 

&gt;    (1) Browsers tested as part of this bug report: IE8, Safari, Chrome (which
&gt; shows above described behavior) as well as Opera and Firefox (which do support
&gt; this behavior). Other browsers, e.g. KHTML, have not been tested.

The last parenthesis should be read &quot;(which do *not* support this behaviour)&quot;.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>994</attachid>
            <date>2011-06-06 21:04:54 +0000</date>
            <delta_ts>2011-06-06 21:12:16 +0000</delta_ts>
            <desc>Polyglot file with BOM, served as &apos;application/xhtml+xml; charset=koi8-r&apos;</desc>
            <filename>file.html.koi8-r.xhtml</filename>
            <type>application/xhtml+xml;charset=koi8-r</type>
            <size>3260</size>
            <attacher name="Leif Halvard Silli">xn--mlform-iua</attacher>
            
              <data encoding="base64">77u/PCFET0NUWVBFIGh0bWw+CjxodG1sIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3ho
dG1sIiB4bWw6bGFuZz0iZW4iIGxhbmc9ImVuIj4KIDxoZWFkPgogIDxtZXRhIGNoYXJzZXQ9IktP
STgtciIgLz4KICA8dGl0bGU+VVRGLTggZW5jb2RlZCBkb2N1bWVudCB3aXRoIGVycm9uZW91cyBl
eHRlcm5hbCBlbmNvZGluZzwvdGl0bGU+CjwvaGVhZD48Ym9keT4KPGgxPlRlc3QgZG9jdW1lbnQ6
IFVURi04IGVuY29kZWQgZG9jdW1lbnQgd2l0aCBlcnJvbmVvdXMgZXh0ZXJuYWwgZW5jb2Rpbmc8
L2gxPgo8cD5UaGlzIEhUTUwtY29tcGF0aWJsZSBYSFRNTCBkb2N1bWVudCwgaXMgZW5jb2RlZCB3
aXRoIHRoZSBVVEYtOCBlbmNvZGluZyBhbmQgaXMgYWxzbyBnaXZlbiBhIGNoYXJhY3RlciBlbmNv
ZGluZyBzaWduYXR1cmUgaW4gdGhlIGZvcm0gb2YgYSBCeXRlIE9yZGVyIE1hcmsgKEJPTSkuIEhv
d2V2ZXIsIGluIGNvbnRyYXN0IHRvIHRoaXMsIHRoZSBIVFRQIENvbnRlbnQtVHlwZTogaGVhZGVy
IGNvbWluZyBmcm9tIHRoZSBXZWIgc2VydmVyLCBjbGFpbXMgKHN1Y2ggaXMgYSBsZWFzdCB0aGUg
cGxhbiAuLi4pIHRoYXQgdGhlIGVuY29kaW5nIG9mIHRoaXMgZG9jdW1lbnQgaXMgSVNPLTg4NTkt
MS48L3A+CjxkaXYgc3R5bGU9ImJvcmRlcjpzb2xpZCBicm93bjtib3JkZXItcmFkaXVzOjIwcHg7
cGFkZGluZzoxNXB4OyI+PHA+PHN0cm9uZz5DaGFyYWN0ZXIgZ2liYmVyaXNoOjwvc3Ryb25nPiBI
ZXJlIGFyZSBzb21lIG5vbi1BU0NJSSBsZXR0ZXJzIHdoaWNpaCByZXF1aXJlcyBVVEYtOCBpbnRl
cnByZXRhdGlvbiBpbiBvcmRlciB0byBiZSBkaXNwbGF5ZWQgY29ycmVjdGx5OiA8dmFyPsOmw7jD
pSDDhsOYw4Ugw7bDvMO/IMOUw5vFuCDQkNCR0JIg0LDQsdCyINCv0K7QliDRj9GO0LY8L3Zhcj48
L3A+CjwvZGl2Pgo8cD5Gb3Igc2l0dWF0aW9ucyB3aGVyZSB0d28gbGF5ZXJzIHNwZWNpZmllcyBk
aWZmZXJlbnQgZW5jb2RpbmcsIHRoZW4gPGEgaHJlZj0iaHR0cDovL3d3dy53My5vcmcvVFIveG1s
LyNzZWMtZ3Vlc3Npbmctd2l0aC1leHQtaW5mbyI+WE1MIDEuMCBhcHBlbmRpeCBGLjIgcmVjb21t
ZW5kczwvYT46IDwvcD4KPGJsb2NrcXVvdGU+CjxwPkluIHRoZSBpbnRlcmVzdHMgb2YgaW50ZXJv
cGVyYWJpbGl0eSwgaG93ZXZlciwgdGhlIGZvbGxvd2luZyBydWxlIGlzIHJlY29tbWVuZGVkLjwv
cD4KCjx1bD48bGk+SWYgYW4gWE1MIGVudGl0eSBpcyBpbiBhIGZpbGUsIHRoZSBCeXRlLU9yZGVy
IE1hcmsgYW5kIGVuY29kaW5nIGRlY2xhcmF0aW9uIGFyZSB1c2VkIChpZiBwcmVzZW50KSB0byBk
ZXRlcm1pbmUgdGhlIGNoYXJhY3RlciBlbmNvZGluZy48L2xpPjwvdWw+CjwvYmxvY2txdW90ZT4K
CjxwPkZvciBIVE1MLCB0aGVuIGF0IGxlYXN0IEludGVybmV0IEV4cGxvcmVyIDggYW5kIFdlYmtp
dCAoU2FmYXJpLCBDaHJvbWUpIGJlaGF2ZSBhcyByZWNvbW1lbmRlZCBmb3IgWE1MIDEuMDogVGhl
eSByZXNwZWN0IHRoZSBCT00gbW9yZSB0aGFuIHRoZXkgcmVzcGVjdCB0aGUgSFRUUCBDb250ZW50
LVR5cGU6IGhlYWRlci4gVGhleSBhbHNvIHJlc3BlY3QgdGhlIEJPTSBtb3JlIHRoYW4gYSB1c2Vy
J3MgcG9zc2libGUgYXR0ZW1wdCB0byBvdmVycmlkZSB0aGUgZW5jb2RpbmcsIGFuZCBmb3IgV2Vi
a2l0IHRoaXMgZ29lcyBmb3IgYm90aCBYTUwgYW5kIEhUTUwuIChJIGhhdmUgbm90IHRlc3RlZCBJ
bnRlcm5ldCBFeHBsb3JlciB2ZXJzaW9uIDkuKTwvcD4KCjxwPkZvciBYTUwsIHRoZW4gT3BlcmEg
YW5kIEZpcmVmb3ggZG8gbm90IHJlc3BlY3QgdGhlIEJPTSBhcyBtdWNoIGFzIHRoZSBYTUwgc3Bl
Y2lmaWNhdGlvbiByZWNvbW1lbmRzLiBBcyBhIGNvbnNxdWVuc2UsIGluIGZhY2Ugb2YgYW4gWE1M
IGRvY3VtZW50IHdpdGggZXJyb25lb3VzIGVuY29kaW5nIGluZm8gaW5zaWRlIHRoZSBIVFRQIENv
bnRlbnQtVHlwZTogaGVhZGVyLCB0aGVuIEZpcmVmb3ggYW5kIE9wZXJhIGZpcmVzIGEgZHJhY29u
aWFuIGVycm9yIG1lc3NzYWdlLiBGb3IgaW5zdGFuY2UsIHRoaXMgZG9jdW1lbnQgaGFzIGEgSFRU
UCBDb250ZW50LVR5cGU6IGhlYWRlciB3aGljaCBzYXlzICJJU08tODg1OS0xIiwgd2hpY2ggLSB3
aGVuIHRoaXMgbGFibGUgaXMgcmVzcGVjdGVkLCBsZWFkcyB0aGUgcGFyc2VyIHRvIHNlZSBzb21l
IGlsbGVnYWwgY2hhcmFjdGVycyBiZWZvciB0aGUgRE9DVFlQRS4gSW4gY29udHJhc3QsIFdlYmtp
dCBicm93c2Vycywgd2hpY2ggcmVzcGVjdCB0aGUgWE1MIHJlY29tbWVuZGF0aW9uLCB0aGV5IGRv
IG5vdCBkaXNwbGF5IGFueSBkcmFjb25pYW4gZXJyb3IgbWVzc2FnZS4gPC9wPgoKPHA+Rm9yIEhU
TUwsIGFnYWluLCB0aGUgbWlzLWludGVycHJldGF0aW9uIG9mIE9wZXJhIGFuZCBGaXJlZm94IGxl
YWRzIHRoZW0gdG8gc2VlIDMgaWxsZWdhbCBjaGFyYWN0ZXJzIGJlZm9yZSB0aGUgRE9DVFlQRSwg
d2hpY2ggaW4gdHVybnMgc2VuZHMgdGhlbSBpbnRvIHF1aXJrcyBtb2RlIC0gdGhpcyBpcyBhbiBp
bXBvcnRhbnQgcmVhc29uIGZvciB3aHkgdXNlciBpbnRlcmFjdGlvbiBhbmQgSFRUUCBzaG91bGQg
YmUgaWdub3JlZCB3aGVuZXZlciB0aGVyZSBpcyBhIEJPTS48L3A+IAoKPGRpdiBzdHlsZT0iYm9y
ZGVyOnNvbGlkIGJyb3duO2JvcmRlci1yYWRpdXM6MjBweDtwYWRkaW5nOjE1cHg7Ij48cD48c3Ry
b25nPkNTUyBib3ggbW9kZWwgZXJyb3I6PC9zdHJvbmc+IElmIHRoaXMgZG9jdW1lbnQgaXMgaW50
ZXJwcmV0ZWQgYXMgSFRNTCwgdGhlbiBpbiBGaXJlZm94IGFuZCBPcGVyYSB5b3UgY2FuIHNlZSB0
aGUgZWZmZWN0IG9mIHRoZSBRdWlya3MtTW9kZSBvbiB0aGVzZSB0byBlbGVtZW50czo8L3A+IAog
CjxkaXYgc3R5bGU9InBhZGRpbmc6MTBweDtib3JkZXI6ZG91YmxlIDNweCBncmVlbjt3aWR0aDox
MDBweDttYXJnaW46YXV0bzsiPlJlZmVyZW5jZTogdGhpcyBlbGVtZW50IGlzIGFsd2F5cyAxMDAg
cGl4ZWxzIHdpZGUuPC9kaXY+CjxkaXYgc3R5bGU9InBhZGRpbmc6MTBweDtib3JkZXI6ZG91Ymxl
IDNweCBncmVlbjt3aWR0aDoxMDA7bWFyZ2luOmF1dG87Ij5UaGUgd2lkdGggYXR0cmlidXRlIGZv
ciB0aGlzIGVsZW1lbnQgaXMgbGFja2luZyB1bml0IGluZm9ybWF0aW9uLiBJbiBuby1xdWlya3Mg
bW9kZSwgaXQgd2lsbCB0aHVzIGZpbGwgdGhlIGVudGlyZSB3aWR0aCBvZiB0aGUgc2NyZWVuLiBP
dGhlcndpc2UsIGl0IHdpbGwgYmUgMTAwIHBpeGVscyB3aWRlLiA8L2Rpdj4KPC9kaXY+Cgo8L2Jv
ZHk+PC9odG1sPgo=
</data>

          </attachment>
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>995</attachid>
            <date>2011-06-06 21:09:57 +0000</date>
            <delta_ts>2011-06-06 21:12:53 +0000</delta_ts>
            <desc>Polyglot file with BOM served as &apos;text/html; charset=koi8-r&apos;</desc>
            <filename>file.html.koi8-r.html</filename>
            <type>text/html; charset=koi8-r</type>
            <size>3260</size>
            <attacher name="Leif Halvard Silli">xn--mlform-iua</attacher>
            
              <data encoding="base64">77u/PCFET0NUWVBFIGh0bWw+CjxodG1sIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3ho
dG1sIiB4bWw6bGFuZz0iZW4iIGxhbmc9ImVuIj4KIDxoZWFkPgogIDxtZXRhIGNoYXJzZXQ9IktP
STgtciIgLz4KICA8dGl0bGU+VVRGLTggZW5jb2RlZCBkb2N1bWVudCB3aXRoIGVycm9uZW91cyBl
eHRlcm5hbCBlbmNvZGluZzwvdGl0bGU+CjwvaGVhZD48Ym9keT4KPGgxPlRlc3QgZG9jdW1lbnQ6
IFVURi04IGVuY29kZWQgZG9jdW1lbnQgd2l0aCBlcnJvbmVvdXMgZXh0ZXJuYWwgZW5jb2Rpbmc8
L2gxPgo8cD5UaGlzIEhUTUwtY29tcGF0aWJsZSBYSFRNTCBkb2N1bWVudCwgaXMgZW5jb2RlZCB3
aXRoIHRoZSBVVEYtOCBlbmNvZGluZyBhbmQgaXMgYWxzbyBnaXZlbiBhIGNoYXJhY3RlciBlbmNv
ZGluZyBzaWduYXR1cmUgaW4gdGhlIGZvcm0gb2YgYSBCeXRlIE9yZGVyIE1hcmsgKEJPTSkuIEhv
d2V2ZXIsIGluIGNvbnRyYXN0IHRvIHRoaXMsIHRoZSBIVFRQIENvbnRlbnQtVHlwZTogaGVhZGVy
IGNvbWluZyBmcm9tIHRoZSBXZWIgc2VydmVyLCBjbGFpbXMgKHN1Y2ggaXMgYSBsZWFzdCB0aGUg
cGxhbiAuLi4pIHRoYXQgdGhlIGVuY29kaW5nIG9mIHRoaXMgZG9jdW1lbnQgaXMgSVNPLTg4NTkt
MS48L3A+CjxkaXYgc3R5bGU9ImJvcmRlcjpzb2xpZCBicm93bjtib3JkZXItcmFkaXVzOjIwcHg7
cGFkZGluZzoxNXB4OyI+PHA+PHN0cm9uZz5DaGFyYWN0ZXIgZ2liYmVyaXNoOjwvc3Ryb25nPiBI
ZXJlIGFyZSBzb21lIG5vbi1BU0NJSSBsZXR0ZXJzIHdoaWNpaCByZXF1aXJlcyBVVEYtOCBpbnRl
cnByZXRhdGlvbiBpbiBvcmRlciB0byBiZSBkaXNwbGF5ZWQgY29ycmVjdGx5OiA8dmFyPsOmw7jD
pSDDhsOYw4Ugw7bDvMO/IMOUw5vFuCDQkNCR0JIg0LDQsdCyINCv0K7QliDRj9GO0LY8L3Zhcj48
L3A+CjwvZGl2Pgo8cD5Gb3Igc2l0dWF0aW9ucyB3aGVyZSB0d28gbGF5ZXJzIHNwZWNpZmllcyBk
aWZmZXJlbnQgZW5jb2RpbmcsIHRoZW4gPGEgaHJlZj0iaHR0cDovL3d3dy53My5vcmcvVFIveG1s
LyNzZWMtZ3Vlc3Npbmctd2l0aC1leHQtaW5mbyI+WE1MIDEuMCBhcHBlbmRpeCBGLjIgcmVjb21t
ZW5kczwvYT46IDwvcD4KPGJsb2NrcXVvdGU+CjxwPkluIHRoZSBpbnRlcmVzdHMgb2YgaW50ZXJv
cGVyYWJpbGl0eSwgaG93ZXZlciwgdGhlIGZvbGxvd2luZyBydWxlIGlzIHJlY29tbWVuZGVkLjwv
cD4KCjx1bD48bGk+SWYgYW4gWE1MIGVudGl0eSBpcyBpbiBhIGZpbGUsIHRoZSBCeXRlLU9yZGVy
IE1hcmsgYW5kIGVuY29kaW5nIGRlY2xhcmF0aW9uIGFyZSB1c2VkIChpZiBwcmVzZW50KSB0byBk
ZXRlcm1pbmUgdGhlIGNoYXJhY3RlciBlbmNvZGluZy48L2xpPjwvdWw+CjwvYmxvY2txdW90ZT4K
CjxwPkZvciBIVE1MLCB0aGVuIGF0IGxlYXN0IEludGVybmV0IEV4cGxvcmVyIDggYW5kIFdlYmtp
dCAoU2FmYXJpLCBDaHJvbWUpIGJlaGF2ZSBhcyByZWNvbW1lbmRlZCBmb3IgWE1MIDEuMDogVGhl
eSByZXNwZWN0IHRoZSBCT00gbW9yZSB0aGFuIHRoZXkgcmVzcGVjdCB0aGUgSFRUUCBDb250ZW50
LVR5cGU6IGhlYWRlci4gVGhleSBhbHNvIHJlc3BlY3QgdGhlIEJPTSBtb3JlIHRoYW4gYSB1c2Vy
J3MgcG9zc2libGUgYXR0ZW1wdCB0byBvdmVycmlkZSB0aGUgZW5jb2RpbmcsIGFuZCBmb3IgV2Vi
a2l0IHRoaXMgZ29lcyBmb3IgYm90aCBYTUwgYW5kIEhUTUwuIChJIGhhdmUgbm90IHRlc3RlZCBJ
bnRlcm5ldCBFeHBsb3JlciB2ZXJzaW9uIDkuKTwvcD4KCjxwPkZvciBYTUwsIHRoZW4gT3BlcmEg
YW5kIEZpcmVmb3ggZG8gbm90IHJlc3BlY3QgdGhlIEJPTSBhcyBtdWNoIGFzIHRoZSBYTUwgc3Bl
Y2lmaWNhdGlvbiByZWNvbW1lbmRzLiBBcyBhIGNvbnNxdWVuc2UsIGluIGZhY2Ugb2YgYW4gWE1M
IGRvY3VtZW50IHdpdGggZXJyb25lb3VzIGVuY29kaW5nIGluZm8gaW5zaWRlIHRoZSBIVFRQIENv
bnRlbnQtVHlwZTogaGVhZGVyLCB0aGVuIEZpcmVmb3ggYW5kIE9wZXJhIGZpcmVzIGEgZHJhY29u
aWFuIGVycm9yIG1lc3NzYWdlLiBGb3IgaW5zdGFuY2UsIHRoaXMgZG9jdW1lbnQgaGFzIGEgSFRU
UCBDb250ZW50LVR5cGU6IGhlYWRlciB3aGljaCBzYXlzICJJU08tODg1OS0xIiwgd2hpY2ggLSB3
aGVuIHRoaXMgbGFibGUgaXMgcmVzcGVjdGVkLCBsZWFkcyB0aGUgcGFyc2VyIHRvIHNlZSBzb21l
IGlsbGVnYWwgY2hhcmFjdGVycyBiZWZvciB0aGUgRE9DVFlQRS4gSW4gY29udHJhc3QsIFdlYmtp
dCBicm93c2Vycywgd2hpY2ggcmVzcGVjdCB0aGUgWE1MIHJlY29tbWVuZGF0aW9uLCB0aGV5IGRv
IG5vdCBkaXNwbGF5IGFueSBkcmFjb25pYW4gZXJyb3IgbWVzc2FnZS4gPC9wPgoKPHA+Rm9yIEhU
TUwsIGFnYWluLCB0aGUgbWlzLWludGVycHJldGF0aW9uIG9mIE9wZXJhIGFuZCBGaXJlZm94IGxl
YWRzIHRoZW0gdG8gc2VlIDMgaWxsZWdhbCBjaGFyYWN0ZXJzIGJlZm9yZSB0aGUgRE9DVFlQRSwg
d2hpY2ggaW4gdHVybnMgc2VuZHMgdGhlbSBpbnRvIHF1aXJrcyBtb2RlIC0gdGhpcyBpcyBhbiBp
bXBvcnRhbnQgcmVhc29uIGZvciB3aHkgdXNlciBpbnRlcmFjdGlvbiBhbmQgSFRUUCBzaG91bGQg
YmUgaWdub3JlZCB3aGVuZXZlciB0aGVyZSBpcyBhIEJPTS48L3A+IAoKPGRpdiBzdHlsZT0iYm9y
ZGVyOnNvbGlkIGJyb3duO2JvcmRlci1yYWRpdXM6MjBweDtwYWRkaW5nOjE1cHg7Ij48cD48c3Ry
b25nPkNTUyBib3ggbW9kZWwgZXJyb3I6PC9zdHJvbmc+IElmIHRoaXMgZG9jdW1lbnQgaXMgaW50
ZXJwcmV0ZWQgYXMgSFRNTCwgdGhlbiBpbiBGaXJlZm94IGFuZCBPcGVyYSB5b3UgY2FuIHNlZSB0
aGUgZWZmZWN0IG9mIHRoZSBRdWlya3MtTW9kZSBvbiB0aGVzZSB0byBlbGVtZW50czo8L3A+IAog
CjxkaXYgc3R5bGU9InBhZGRpbmc6MTBweDtib3JkZXI6ZG91YmxlIDNweCBncmVlbjt3aWR0aDox
MDBweDttYXJnaW46YXV0bzsiPlJlZmVyZW5jZTogdGhpcyBlbGVtZW50IGlzIGFsd2F5cyAxMDAg
cGl4ZWxzIHdpZGUuPC9kaXY+CjxkaXYgc3R5bGU9InBhZGRpbmc6MTBweDtib3JkZXI6ZG91Ymxl
IDNweCBncmVlbjt3aWR0aDoxMDA7bWFyZ2luOmF1dG87Ij5UaGUgd2lkdGggYXR0cmlidXRlIGZv
ciB0aGlzIGVsZW1lbnQgaXMgbGFja2luZyB1bml0IGluZm9ybWF0aW9uLiBJbiBuby1xdWlya3Mg
bW9kZSwgaXQgd2lsbCB0aHVzIGZpbGwgdGhlIGVudGlyZSB3aWR0aCBvZiB0aGUgc2NyZWVuLiBP
dGhlcndpc2UsIGl0IHdpbGwgYmUgMTAwIHBpeGVscyB3aWRlLiA8L2Rpdj4KPC9kaXY+Cgo8L2Jv
ZHk+PC9odG1sPgo=
</data>

          </attachment>
      

    </bug>

</bugzilla>