This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11426 - Meta prescan should run on the first 1024 bytes
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: All All
Importance: P1 critical
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
Reported: 2010-11-29 13:04 UTC by Henri Sivonen
Modified: 2011-08-04 05:11 UTC

Description Henri Sivonen 2010-11-29 13:04:21 UTC
http://www.whatwg.org/specs/web-apps/current-work/#determining-the-character-encoding

The spec says:
"The user agent may wait for more bytes of the resource to be available, either in this step or at any later step in this algorithm. For instance, a user agent might wait 500ms or 512 bytes, whichever came first. In general preparsing the source to find the encoding improves performance, as it reduces the need to throw away the data structures used when parsing upon finding the encoding information. However, if the user agent delays too long to obtain data to determine the encoding, then the cost of the delay could outweigh any performance improvements from the preparse."

First, the spec should suggest 1024 bytes instead of 512. Second, for predictable results, the spec should probably require the prescan to inspect the first 1024 bytes (stopping earlier if an internal encoding declaration is found earlier).

(It follows that if the server sends 1023 unlabeled bytes and then lets the connection stall, nothing is rendered while the connection stalls.)
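
The proposal above can be illustrated with a minimal sketch. It is not the spec's prescan algorithm: it uses a regular expression where the real prescan tokenizes attributes, skips comments, and handles more cases. The names `PRESCAN_LIMIT` and `prescan` are hypothetical. A declaration only counts here if the whole `<meta>` tag ends inside the window.

```python
import re

PRESCAN_LIMIT = 1024  # byte count proposed in this bug (hypothetical constant name)

# Simplified pattern: a charset=... value inside a <meta ...> tag whose closing
# '>' is also present. This is an assumption-laden stand-in for the real prescan.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)[\"']?[^>]*>",
    re.IGNORECASE,
)


def prescan(data: bytes, limit: int = PRESCAN_LIMIT):
    """Return the encoding label declared within data[:limit], or None."""
    match = _META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

A declaration that starts inside the window but is cut off by the limit is ignored in this sketch, because the truncated slice no longer contains the tag's closing `>`.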
Comment 1 Henri Sivonen 2010-11-29 13:08:00 UTC
CCing ap@webkit.org for verification that WebKit really uses 1024 as the special number of bytes. (IIRC, I got the number 1024 from ap.) Gecko as of Firefox 4.0 does.
Comment 2 Jenn Braithwaite 2010-11-30 19:48:38 UTC
I can confirm that WebKit really uses 1024 as the special number of bytes.
Comment 3 Ian 'Hixie' Hickson 2010-12-08 01:29:52 UTC
We don't want to _require_ that the UA scan, since otherwise you'd never render a document that hung after 1023 bytes without an encoding. There has to be some timeout, and since it has perf implications, it seems like UAs should be allowed to reduce it to zero.
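
The trade-off described here (wait for a byte count or a timeout, whichever comes first) can be modelled with a small sketch. The function name, the `(seconds, chunk)` arrival format, and the default values are assumptions made for illustration; the 0.5 s figure echoes the example quoted from the spec, and 1024 is the number proposed in this bug.

```python
def bytes_available_for_prescan(arrivals, byte_limit=1024, timeout=0.5):
    """Model which bytes a UA has buffered when it stops waiting.

    arrivals: iterable of (seconds_since_start, chunk) pairs in arrival order.
    The UA stops as soon as byte_limit bytes are buffered, the timeout has
    passed, or the stream ends.
    """
    buffered = b""
    for arrived_at, chunk in arrivals:
        if arrived_at > timeout:
            break  # the timeout fired before this chunk arrived
        buffered += chunk
        if len(buffered) >= byte_limit:
            break  # enough data to run the prescan
    return buffered[:byte_limit]
```

With a finite timeout, a response that stalls after 1023 bytes still yields those 1023 bytes for rendering. Passing `timeout=float("inf")` models the mandatory wait, where the second chunk is needed before anything happens.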
Comment 4 Ian 'Hixie' Hickson 2010-12-08 01:32:31 UTC
That WebKit only waits for 1024 bytes conflicts with other information I was given, namely that WebKit _only_ uses a prescan and doesn't look at <meta> in the parser at all, given that it passes this test:
http://hixie.ch/tests/adhoc/html/parsing/encoding/054.html
Comment 5 Jenn Braithwaite 2010-12-08 01:40:45 UTC
(In reply to comment #4)
> That WebKit only waits for 1024 bytes conflicts with other information I was
> given, namely that WebKit _only_ uses a prescan and doesn't look at <meta> in
> the parser at all, given that it passes this test:
> http://hixie.ch/tests/adhoc/html/parsing/encoding/054.html

The condition used by WebKit is: 1024 bytes && no longer in head section
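
Read literally, the condition stated in this comment could be written as the predicate below. This is only a transcription of the sentence above, not WebKit's code; the function and constant names are hypothetical, and comment 7 later reports behaviour that differs from this description.

```python
BYTE_THRESHOLD = 1024  # number given in this comment


def may_stop_looking_for_meta(bytes_seen: int, in_head: bool) -> bool:
    """True once at least 1024 bytes were seen AND parsing has left the head."""
    return bytes_seen >= BYTE_THRESHOLD and not in_head
```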
Comment 6 Henri Sivonen 2010-12-08 10:31:28 UTC
(In reply to comment #3)
> We don't want to _require_ that the UA scan, since otherwise you'd never render
> a document that hung after 1023 bytes without an encoding.

That's the situation in Gecko right now. No one has complained yet.
Comment 7 Ian 'Hixie' Hickson 2011-02-09 00:02:02 UTC
Looks like WebKit actually uses 1024 for the number of bytes to the _beginning_ of the <meta>, and never bails if it's not in the <head>, and only uses the preparse, not the full algorithm, so all in all it doesn't really match the spec at all:

   http://hixie.ch/tests/adhoc/html/parsing/encoding/134.html
   http://hixie.ch/tests/adhoc/html/parsing/encoding/135.html

Re comment 6: Please consider this your first complaint, then. Blocking until the first 1024 bytes have been seen without a timeout would mean that hanging-GET style iframes would not fire anything until a kilobyte of data has been received, which could result in several events getting eaten up.

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: see diff given below
Rationale: I don't mind updating the recommendation to 1024 bytes, since it is closer to what browsers do (though clearly not identical), but forcing browsers to stall seems like it would harm potentially good performance competition.
Comment 8 contributor 2011-02-09 00:02:20 UTC
Checked in as WHATWG revision r5860.
Check-in comment: Change the limit for where charsets should be given to the first 1024 bytes.
http://html5.org/tools/web-apps-tracker?from=5859&to=5860
Comment 9 Henri Sivonen 2011-02-14 14:55:39 UTC
(In reply to comment #7)
> Looks like WebKit actually uses 1024 for the number of bytes to the _beginning_
> of the <meta>, 

Thanks for not following WebKit for beginning versus end of the meta.

> Re comment 6: Please consider this your first complaint, then. Blocking until
> the first 1024 bytes have been seen without a timeout would mean that
> hanging-GET style iframes would not fire anything until a kilobyte of data has
> been received, which could result in several events getting eaten up.

Only if hanging-get authors aren't competent enough to declare their encoding up front.

> EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
> satisfied with this response, please change the state of this bug to CLOSED. 

OK.
Comment 10 Michael[tm] Smith 2011-08-04 05:11:44 UTC
mass-move component to LC1