[Bug 9984] New: [parser] Insertion point for script@onload doesn't match Firefox

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9984

           Summary: [parser] Insertion point for script@onload doesn't
                    match Firefox
           Product: HTML WG
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P1
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: w3c@adambarth.com
         QAContact: public-html-bugzilla@w3.org
                CC: hsivonen@iki.fi, mike@w3.org, public-html@w3.org


In double-checking our work with the HTML5 parser implementation in Minefield,
we noticed that Minefield handles this case slightly differently (likely due to
the spec ambiguity discussed in Bug 9983).  In particular, when the page calls document.write
from the load event of a script tag, Minefield seems to believe there is no
current insertion point and blows away the entire document:

http://trac.webkit.org/export/LATEST/trunk/LayoutTests/fast/tokenizer/write-on-load.html

Assuming we fix Bug 9983, the load event shares the same insertion point record
as the external script itself, resulting in the numerals in that test being
printed in order from 1 to 7.  That behavior appears to match the legacy WebKit
and Firefox behavior and is, therefore, likely compatible with the web.
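The ordering argument above can be made concrete with a toy model. This is a hypothetical sketch, not Gecko's, WebKit's, or the spec's actual algorithm: `ToyDocument`, `Script`, `page()` and `onload_shares_insertion_point` are invented for illustration, and tokenization, the DOM and network buffering are ignored. It treats a document.write() made while the insertion point is undefined as an implicit document.open() that discards the document. It then compares a load handler that reuses the external script's insertion point with one that runs after that insertion point is gone. The page it uses is a made-up four-token example, not the contents of write-on-load.html.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Script:
    """A parser-inserted external <script>: its body runs, then its onload."""
    body: Callable[["ToyDocument"], None]
    onload: Optional[Callable[["ToyDocument"], None]] = None


Token = Union[str, Script]


class ToyDocument:
    """Hypothetical toy model of document.write() insertion points.

    A str token is plain text appended to the output; a Script token models
    the parser reaching </script>. Real tokenization, the DOM and network
    buffering are deliberately ignored.
    """

    def __init__(self, tokens: List[Token], onload_shares_insertion_point: bool) -> None:
        self.pending: List[Token] = list(tokens)   # input not parsed yet
        self.output: List[str] = []                # text already "in the document"
        self.inserted: Optional[List[str]] = None  # None = undefined insertion point
        self.shares = onload_shares_insertion_point

    def write(self, text: str) -> None:
        if self.inserted is None:
            # Undefined insertion point: model the implicit document.open() by
            # discarding the parsed output and the pending network input.
            self.output.clear()
            self.pending.clear()
            self.output.append(text)
        else:
            # Defined insertion point: text goes right after </script>.
            self.inserted.append(text)

    def parse(self) -> str:
        while self.pending:
            token = self.pending.pop(0)
            if isinstance(token, str):
                self.output.append(token)
                continue
            self.inserted = []                 # insertion point at </script>
            token.body(self)
            if token.onload is not None and self.shares:
                token.onload(self)             # same insertion point record
            # Written text is parsed before the rest of the input.
            self.pending[0:0] = self.inserted
            self.inserted = None
            if token.onload is not None and not self.shares:
                token.onload(self)             # no insertion point any more
        return "".join(self.output)


def page() -> List[Token]:
    # Hypothetical page: "1", a script writing "2" whose onload writes "3", "4".
    return ["1", Script(body=lambda d: d.write("2"), onload=lambda d: d.write("3")), "4"]


print(ToyDocument(page(), onload_shares_insertion_point=True).parse())   # 1234
print(ToyDocument(page(), onload_shares_insertion_point=False).parse())  # 3
```

With a shared insertion point, the writes from the script body and from its load handler land in source order. Without one, the handler's write wipes out everything parsed so far, which is the Minefield symptom described above.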

Note: I think this is a bug in Firefox, not in the spec.  I'm filing it here in
case Henri wants to discuss changing the spec to match Firefox and because it's
tied into Bug 9983.

== Response from Henri ==

Is there evidence of sites calling document.write() from the load handler of a
script so often that calling document.write() from a script load handler needs
to work?

The off-the-main-thread parsing implementation in Minefield makes it necessary
to know in advance which points in the network stream are eligible for
document.write(). Since establishing a point eligible for document.write() is
somewhat complex, it is only done at </script> and only for scripts that don't
have defer or async specified in the source. Furthermore, to be able to perform
multiple DOM modifications in a script-unsafe batch, the HTML5 parser limits
script execution of any kind to well-defined points (</script> mainly plus a
couple of other cases that may go away as soon as other parts of Gecko are
changed not to expect that behavior).

For these reasons, I've made the parser forbid document.write() from all event
handlers. When I did this, the event handler I particularly wanted to prevent
from writing was the SVG load event handler. I wasn't thinking of <script
onload>. While <script onload> of parser-inserted scripts is guaranteed to fire
when the parser is at the </script> safe point (*if* the event fires
synchronously!), I'd rather not punch a special hole for that event handler
without a compelling use case or site compat requirement. (Also, it seems
inconsistent to make load on <script> fire synchronously when load events in
general are async.)

== Response from abarth ==

On Tue, Jun 22, 2010 at 2:36 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
> "Adam Barth" <w3c@adambarth.com> wrote:
>> In double-checking our work with the HTML5 parser implementation in
>> Minefield, we noticed that Minefield handles this case slightly
>> differently (likely due to the spec ambiguity above).  In particular,
>> when the page calls document.write from the load event of a script
>> tag, Minefield seems to believe there is no current insertion point
>> and blows away the entire document:
>>
>> http://trac.webkit.org/export/LATEST/trunk/LayoutTests/fast/tokenizer/write-on-load.html
>>
>> Under the above interpretation of the spec, the load event shares the
>> same insertion point record as the external script itself, resulting
>> in the numerals in that test being printed in order from 1 to 7.
>> That behavior appears to match the legacy WebKit and Firefox behavior
>> and is, therefore, likely compatible with the web.
>
> Is there evidence of sites calling document.write() from the load handler of a script so often that calling document.write() from a script load handler needs to work?

I don't have any such evidence at this time.

> The off-the-main-thread parsing implementation in Minefield makes it necessary to know in advance which points in the network stream are eligible for document.write(). Since establishing a point eligible for document.write() is somewhat complex, it is only done at </script> and only for scripts that don't have defer or async specified in the source.

The insertion point, in this case, is in fact such a point.  It's the
same insertion point that we use for the external script itself
(assuming the spec intends to create an insertion point for external
scripts).

> Furthermore, to be able to perform multiple DOM modifications in a script-unsafe batch, the HTML5 parser limits script execution of any kind to well-defined points (</script> mainly plus a couple of other cases that may go away as soon as other parts of Gecko are changed not to expect that behavior).

I didn't understand this statement.  Script can execute at all kinds
of crazy times (especially in light of DOM mutation events and
plug-ins).  I don't understand how you can limit script execution to
the points in time at which you have such well-defined insertion points.

> For these reasons, I've made the parser forbid document.write() from all event handlers.

This is unlikely to be correct.  For example, what if a script
executing as a result of a <script> element (either external or
inline) dispatches a synchronous event, such as a DOM mutation event
or directly via dispatchEvent?  Surely a document.write call in such
an event handler should respect the current insertion point.  That's
certainly what the spec says for inline scripts.
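A similar toy sketch illustrates the dispatchEvent case. A listener invoked synchronously from inside a running script executes on the same stack, so the insertion point established for that script is still defined while it runs. `ToyParser`, `dispatch`, `run_inline_script` and `forbid_write_in_handlers` are hypothetical names. The flag stands in for a parser that refuses document.write() from every event handler, and what such a parser actually does with the rejected write is outside this sketch.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional


class ToyParser:
    """Hypothetical model: do synchronous events see a script's insertion point?"""

    def __init__(self, forbid_write_in_handlers: bool) -> None:
        self.forbid_write_in_handlers = forbid_write_in_handlers
        self.inserted: Optional[List[str]] = None  # None = undefined insertion point
        self.handler_depth = 0                     # > 0 while any listener runs
        self.listeners: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)

    def add_listener(self, event: str, fn: Callable[[], None]) -> None:
        self.listeners[event].append(fn)

    def dispatch(self, event: str) -> None:
        # Like dispatchEvent(): listeners run synchronously on the caller's stack.
        self.handler_depth += 1
        try:
            for fn in self.listeners[event]:
                fn()
        finally:
            self.handler_depth -= 1

    def write(self, text: str) -> None:
        if self.forbid_write_in_handlers and self.handler_depth > 0:
            return  # toy stand-in for "document.write() forbidden in handlers"
        if self.inserted is None:
            return  # would mean an implicit document.open(); not modelled here
        self.inserted.append(text)

    def run_inline_script(self, body: Callable[[], None]) -> List[str]:
        # The insertion point exists only while the script body is running.
        self.inserted = []
        body()
        written, self.inserted = self.inserted, None
        return written


def demo(forbid: bool) -> List[str]:
    parser = ToyParser(forbid_write_in_handlers=forbid)
    parser.add_listener("custom", lambda: parser.write("from handler"))

    def body() -> None:
        parser.write("from script")
        parser.dispatch("custom")  # synchronous, still inside the script

    return parser.run_inline_script(body)


print(demo(forbid=False))  # ['from script', 'from handler']
print(demo(forbid=True))   # ['from script']
```

In this model the handler's write can only be dropped by special-casing "inside an event handler"; the insertion point itself is still there, which is the point being made about inline scripts.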

> When I did this, the event handler I particularly wanted to prevent from writing was the SVG load event handler. I wasn't thinking of <script onload>. While <script onload> of parser-inserted scripts is guaranteed to fire when the parser is at the </script> safe point (*if* the event fires synchronously!),

The spec is explicit about when to fire script@onload synchronously.
The behavior in the spec seems to match previous versions of Firefox
and WebKit.

> I'd rather not punch a special hole for that event handler without a compelling use case or site compat requirement. (Also, it seems inconsistent to make load on <script> fire synchronously when load events in general are async.)

I guess I don't fully understand the implementation constraints you're
operating under.  There are two issues:

1) Should the script load event fire synchronously?
2) Should synchronous events that call document.write use the current
insertion point?

There are a number of benefits to firing the script load event synchronously:

A) The behavior matches all the shipping browsers I've tested.
B) Firing the script load event synchronously is more predictable for
developers and less likely to lead to race conditions.
C) There are unknown compatibility implications for changing when the
event is fired.

There are a number of benefits to having synchronous events use the
current insertion point:

a) The behavior matches all the shipping browsers I've tested.
b) Reusing the current insertion point is more predictable for
developers and less likely to lead to race conditions.
c) Reusing the current insertion point better matches the mental model
for how HTML documents are processed (basically, it abstracts away how
much of the document input stream is buffered in the network layer and
how much is buffered inside the tokenizer).
d) There are unknown compatibility implications for changing how
document.write in synchronous events behaves.

The only "con" I see to (1) and (2) is a new implementation constraint
in Gecko that I don't quite understand given that we're not creating
any new insertion points (these events are just using insertion points
that already exist for the <script> elements themselves).

Received on Tuesday, 22 June 2010 21:36:31 UTC