<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>11426</bug_id>
          
          <creation_ts>2010-11-29 13:04:21 +0000</creation_ts>
          <short_desc>Meta prescan should run on the first 1024 bytes</short_desc>
          <delta_ts>2011-08-04 05:11:44 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>LC1 HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P1</priority>
          <bug_severity>critical</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Henri Sivonen">hsivonen</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>ap</cc>
          <cc>ian</cc>
          <cc>jennb</cc>
          <cc>mike</cc>
          <cc>public-html-admin</cc>
          <cc>public-html-wg-issue-tracking</cc>
          <cc>w3c</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>42834</commentid>
    <comment_count>0</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-11-29 13:04:21 +0000</bug_when>
    <thetext>http://www.whatwg.org/specs/web-apps/current-work/#determining-the-character-encoding

The spec says:
&quot;The user agent may wait for more bytes of the resource to be available, either in this step or at any later step in this algorithm. For instance, a user agent might wait 500ms or 512 bytes, whichever came first. In general preparsing the source to find the encoding improves performance, as it reduces the need to throw away the data structures used when parsing upon finding the encoding information. However, if the user agent delays too long to obtain data to determine the encoding, then the cost of the delay could outweigh any performance improvements from the preparse.&quot;

First, the spec should suggest 1024 bytes instead of 512. Second, for predictable results, the spec should probably require the prescan to inspect the first 1024 bytes (stopping early if an internal encoding declaration is found before then).

(It follows that if the server sends 1023 unlabeled bytes and then lets the connection stall, nothing is rendered while the connection stalls.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42835</commentid>
    <comment_count>1</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-11-29 13:08:00 +0000</bug_when>
    <thetext>CCing ap@webkit.org for verification that WebKit really uses 1024 as the special number of bytes. (IIRC, I got the number 1024 from ap.) Gecko as of Firefox 4.0 does.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42922</commentid>
    <comment_count>2</comment_count>
    <who name="Jenn Braithwaite">jennb</who>
    <bug_when>2010-11-30 19:48:38 +0000</bug_when>
    <thetext>I can confirm that WebKit really uses 1024 as the special number of bytes.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>43177</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-12-08 01:29:52 +0000</bug_when>
    <thetext>We don&apos;t want to _require_ that the UA scan, since otherwise you&apos;d never render a document that hung after 1023 bytes without an encoding. There has to be some timeout, and since it has perf implications, it seems like UAs should be allowed to reduce it to zero.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>43178</commentid>
    <comment_count>4</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-12-08 01:32:31 +0000</bug_when>
    <thetext>That WebKit only waits for 1024 bytes conflicts with other information I was given, namely that WebKit _only_ uses a prescan and doesn&apos;t look at &lt;meta&gt; in the parser at all, given that it passes this test:
http://hixie.ch/tests/adhoc/html/parsing/encoding/054.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>43180</commentid>
    <comment_count>5</comment_count>
    <who name="Jenn Braithwaite">jennb</who>
    <bug_when>2010-12-08 01:40:45 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; That WebKit only waits for 1024 bytes conflicts with other information I was
&gt; given, namely that WebKit _only_ uses a prescan and doesn&apos;t look at &lt;meta&gt; in
&gt; the parser at all, given that it passes this test:
&gt; http://hixie.ch/tests/adhoc/html/parsing/encoding/054.html

The condition used by WebKit is: 1024 bytes &amp;&amp; no longer in head section</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>43188</commentid>
    <comment_count>6</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-12-08 10:31:28 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; We don&apos;t want to _require_ that the UA scan, since otherwise you&apos;d never render
&gt; a document that hung after 1023 bytes without an encoding.

That&apos;s the situation in Gecko right now. No one has complained yet.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>45299</commentid>
    <comment_count>7</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-02-09 00:02:02 +0000</bug_when>
    <thetext>Looks like WebKit actually uses 1024 for the number of bytes to the _beginning_ of the &lt;meta&gt;, and never bails if it&apos;s not in the &lt;head&gt;, and only uses the preparse, not the full algorithm, so all in all it doesn&apos;t really match the spec at all:

   http://hixie.ch/tests/adhoc/html/parsing/encoding/134.html
   http://hixie.ch/tests/adhoc/html/parsing/encoding/135.html

Re comment 6: Please consider this your first complaint, then. Blocking until the first 1024 bytes have been seen without a timeout would mean that hanging-GET style iframes would not fire anything until a kilobyte of data has been received, which could result in several events getting eaten up.

EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: see diff given below
Rationale: I don&apos;t mind updating the recommendation to 1024 bytes, since it is closer to what browsers do (though clearly not identical), but forcing browsers to stall seems like it would harm potentially good performance competition.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>45300</commentid>
    <comment_count>8</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-02-09 00:02:20 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r5860.
Check-in comment: Change the limit for where charsets should be given to the first 1024 bytes.
http://html5.org/tools/web-apps-tracker?from=5859&amp;to=5860</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>45482</commentid>
    <comment_count>9</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2011-02-14 14:55:39 +0000</bug_when>
    <thetext>(In reply to comment #7)
&gt; Looks like WebKit actually uses 1024 for the number of bytes to the _beginning_
&gt; of the &lt;meta&gt;, 

Thanks for not following WebKit for beginning versus end of the meta.

&gt; Re comment 6: Please consider this your first complaint, then. Blocking until
&gt; the first 1024 bytes have been seen without a timeout would mean that
&gt; hanging-GET style iframes would not fire anything until a kilobyte of data has
&gt; been received, which could result in several events getting eaten up.

Only if hanging-get authors aren&apos;t competent enough to declare their encoding up front.

&gt; EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are
&gt; satisfied with this response, please change the state of this bug to CLOSED. 

OK.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>53233</commentid>
    <comment_count>10</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2011-08-04 05:11:44 +0000</bug_when>
    <thetext>mass-move component to LC1</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>