<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>10656</bug_id>
          
          <creation_ts>2010-09-20 03:28:56 +0000</creation_ts>
          <short_desc>The spec says that for scripts the BOM overrides the HTTP charset</short_desc>
          <delta_ts>2010-10-04 13:54:54 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>pre-LC1 HTML5 spec (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#script-processing-encoding</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>bzbarsky</cc>
    
    <cc>ian</cc>
    
    <cc>julian.reschke</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>39106</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2010-09-20 03:28:56 +0000</bug_when>
    <thetext>Section: http://www.whatwg.org/specs/web-apps/current-work/#script-processing-encoding

Comment:
The spec says that @charset overrides the HTTP charset and doesn&apos;t allow
examining the BOM

Posted from: 173.48.34.3</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39107</commentid>
    <comment_count>1</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2010-09-20 03:34:37 +0000</bug_when>
    <thetext>Specifically, section 4.3.1 has this to say in step 5 of &quot;running a script&quot;:

  If the script element has a charset attribute, then let the script block&apos;s
  character encoding for this script element be the encoding given by the
  charset attribute.

  Otherwise, let the script block&apos;s character encoding for this script element
  be the same as the encoding of the document itself.

Then in step 9 we have:

  Once the resource&apos;s Content Type metadata is available, if it ever is, apply
  the algorithm for extracting an encoding from a Content-Type to it. If this
  returns an encoding, and the user agent supports that encoding, then let the
  script block&apos;s character encoding be that encoding.

So far so good.  But step 10 then says to fetch the script and execute the script block, and the &quot;executing a script block&quot; algorithm has an &quot;If the script is from an external file and the script block&apos;s type is a text-based language&quot; section. That section says to examine the BOM and, if one is found, to use the encoding it indicates regardless of what the HTTP headers said.

That&apos;s certainly not what Gecko does.  Does some other browser do this?  Is it what we want to happen here?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39108</commentid>
    <comment_count>2</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2010-09-20 03:35:56 +0000</bug_when>
    <thetext>Specifically, Gecko&apos;s order here is:

1)  HTTP header
2)  @charset
3)  BOM
4)  Linking document</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39426</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-26 01:08:08 +0000</bug_when>
    <thetext>The spec&apos;s order for scripts is:

1) BOM
2) HTTP header
3) charset=&quot;&quot;
4) Linking document
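
(Illustration only, not spec text or browser code: a minimal Python sketch of how such an order behaves as a first-match lookup. The names sniff_bom, pick_encoding and the source labels are made up for this example; GECKO_ORDER is the order listed in comment 2.)

```python
import codecs

# Hypothetical helper: map a leading byte order mark to an encoding name.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)


def sniff_bom(data):
    """Return the encoding implied by a BOM at the start of data, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def pick_encoding(order, candidates):
    """Walk the sources in priority order and return the first encoding found."""
    for source in order:
        encoding = candidates.get(source)
        if encoding:
            return encoding
    return None


# Source labels are assumptions made for this sketch.
SPEC_ORDER = ("bom", "http", "charset_attr", "document")
GECKO_ORDER = ("http", "charset_attr", "bom", "document")

body = codecs.BOM_UTF8 + b"alert(1);"
candidates = {
    "bom": sniff_bom(body),      # utf-8, from the BOM
    "http": "iso-8859-1",        # Content-Type charset
    "charset_attr": None,        # no charset attribute on the script element
    "document": "windows-1252",  # encoding of the linking document
}

print(pick_encoding(SPEC_ORDER, candidates))
print(pick_encoding(GECKO_ORDER, candidates))
```

With a UTF-8 BOM on the script body and an HTTP charset of iso-8859-1, the first order picks utf-8 and the second picks iso-8859-1, which is the disagreement this bug is about.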

For documents, it is:

1) User
2) HTTP
3) BOM
4) &lt;meta&gt;
5) History
6) autodetect
7) default

I could see an argument for having the script order be:

1) HTTP header
2) BOM
3) charset=&quot;&quot;
4) Linking document

...but why would you put charset=&quot;&quot; between the HTTP headers and the BOM?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39505</commentid>
    <comment_count>4</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2010-09-26 19:33:10 +0000</bug_when>
    <thetext>&gt; ...but why would you put charset=&quot;&quot; between the HTTP headers and the BOM?

Because it&apos;s more reliable in the presence of arbitrary encodings?  In particular, two-byte encodings that are not UTF-16 could give false positives for the BOM.

In any case, the main weirdness I see here is the BOM overriding the HTTP header (and the inconsistency of this with documents and stylesheets).

What do non-Gecko UAs do here?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39850</commentid>
    <comment_count>5</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-29 00:34:19 +0000</bug_when>
    <thetext>(For the record, I agree that the spec is wrong to have the BOM override HTTP metadata in this case, since it doesn&apos;t anywhere else.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39852</commentid>
    <comment_count>6</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-29 01:04:04 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Upon further investigation I&apos;ve gone with what Gecko does here.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39853</commentid>
    <comment_count>7</comment_count>
    <who name="">contributor</who>
    <bug_when>2010-09-29 01:05:13 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r5545.
Check-in comment: Match Gecko for character encoding processing for &lt;script&gt;
http://html5.org/tools/web-apps-tracker?from=5544&amp;to=5545</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>