This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 9984 - [parser] Insertion point for script@onload doesn't match Firefox
Summary: [parser] Insertion point for script@onload doesn't match Firefox
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson) (show other bugs)
Version: unspecified
Hardware: All All
: P1 critical
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-22 21:36 UTC by Adam Barth
Modified: 2010-10-04 14:45 UTC (History)
7 users (show)

See Also:


Attachments

Description Adam Barth 2010-06-22 21:36:28 UTC
In double-checking our work with the HTML5 parser implementation in Minefield, we noticed that Minefield handles this case slightly different (likely due to the spec ambiguity above).  In particular, when the page calls document.write from the load event of a script tag, Minefield seems to believe there is no current insertion point and blows away the entire document:

http://trac.webkit.org/export/LATEST/trunk/LayoutTests/fast/tokenizer/write-on-load.html

Assuming we fix Bug 9983, the load event shares the same insertion point record as the external script itself, resulting in the numerals in that test being printed in order from 1 to 7.  That behavior appears to match the legacy WebKit and Firefox behavior and is, therefore, likely compatible with the web.

Note: I think this is a bug in Firefox, not in the spec.  I'm filing it here in case Henri wants to discuss changing the spec to match Firefox and because it's tied into Bug 9983.

== Response from Henri ==

Is there evidence of sites calling document.write() from the load handler of a script so often that calling document.write() from a script load handler needs to work?

The off-the-main-thread parsing implementation in Minefield makes it necessary to know in advance which points in the network stream are eligible for document.write(). Since establishing a point eligible for document.write() is somewhat complex, it is only done at </script> and only for scripts that don't have defer or async specified in the source. Furthermore, to be able to perform multiple DOM modifications in a script-unsafe batch, the HTML5 parser limits script execution of any kind to well-defined points (</script> mainly plus a couple of other cases that may go away as soon as other parts of Gecko are changed not to expect that behavior).

For these reasons, I've made the parser forbid document.write() from all event handlers. When I did this, the event handler I particularly wanted to prevent from writing was the SVG load event handler. I wasn't thinking of <script onload>. While <script onload> of parser-inserted scripts is guaranteed to fire when the parser is at the </script> safe point (*if* the event fires synchronously!), I'd rather not punch a special hole for that event handler without a compelling use case or site compat requirement. (Also, it seems inconsistent to make load on <script> fire synchronously when load events in general are async.)

== Response from abarth ==

On Tue, Jun 22, 2010 at 2:36 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
> "Adam Barth" <w3c@adambarth.com> wrote:
>> In double-checking our work with the HTML5 parser implementation in
>> Minefield, we noticed that Minefield handles this case slightly
>> different (likely due to the spec ambiguity above).  In particular,
>> when the page calls document.write from the load event of a script
>> tag, Minefield seems to believe there is no current insertion point
>> and blows away the entire document:
>>
>> http://trac.webkit.org/export/LATEST/trunk/LayoutTests/fast/tokenizer/write-on-load.html
>>
>> Under the above interpretation of the spec, the load event shares the
>> same insertion point record as the external script itself, resulting
>> in the numerals in that test being printed in order from 1 to 7.
>> That
>> behavior appears to match the legacy WebKit and Firefox behavior and
>> is, therefore, likely compatible with the web.
>
> Is there evidence of sites calling document.write() from the load handler of a script so often that calling document.write() from a script load handler needs to work?

I don't have any such evidence at this time.

> The off-the-main-thread parsing implementation in Minefield makes it necessary to know in advance which points in the network stream are eligible for document.write(). Since establishing a point eligible for document.write() is somewhat complex, it is only done at </script> and only for scripts that don't have defer or async specified in the source.

The insertion point, in this case, is in fact such a point.  It's the
same insertion point that we use for the external script itself
(assuming the spec intends to create an insertion point for external
scripts).

> Furthermore, to be able to perform multiple DOM modifications in a script-unsafe batch, the HTML5 parser limits script execution of any kind to well-defined points (</script> mainly plus a couple of other cases that may go away as soon as other parts of Gecko are changed not to expect that behavior).

I didn't understand this statement.  Script can execute at all kinds
of crazy times (especially in light of DOM mutation events and
plug-ins).  I don't understand how you can limit script execution to
point in time when you have such well-defined insertion points.

> For these reasons, I've made the parser forbid document.write() from all event handlers.

This is unlikely to be correct.  For example, what if a script
executing as a result of a <script> element (either external or
inline) dispatches a synchronous event, such as a DOM mutation event
or directly via dispatchEvent?  Surely a document.write call in such
an event handler should respect the current insertion point.  That's
certainly what the spec says for inline scripts.

> When I did this, the event handler I particularly wanted to prevent from writing was the SVG load event handler. I wasn't thinking of <script onload>. While <script onload> of parser-inserted scripts is guaranteed to fire when the parser is at the </script> safe point (*if* the event fires synchronously!),

The spec is explicit about when to fire script@onload synchronously.
The behavior in the spec seems to patch previous versions of Firefox
and WebKit.

> I'd rather not punch a special hole for that event handler without a compelling use case or site compat requirement. (Also, it seems inconsistent to make load on <script> fire synchronously when load events in general are async.)

I guess I don't fully understand the implementation constraints you're
operating under.  There are two issues:

1) Should the script load even fire synchronously.
2) Should synchronous events that call document.write use the current
insertion point.

There are a number of benefits to firing the script load event synchronously:

A) The behavior matches all the shipping browsers I've tested.
B) Firing the script load event synchronously is more predictable for
developers and less likely to lead to race conditions.
C) There are unknown compatibility implications for changing when the
event is fired.

There are a number of benefits for having synchronous events use the
current insertion point.

a) The behavior matches all the shipping browsers I've tested.
b) Reusing the current insertion point is more predictable for
developers and less likely to lead to race conditions.
c) Reusing the current insertion point better matches the mental model
for how HTML documents are processed (basically, it abstracts away how
much of the document input stream is buffered in the network layer and
how much is buffered inside the tokenizer).
d) There are unknown compatibility implications for changing how
document.write in synchronous events behaves.

The only "con" I see to (1) and (2) is a new implementation constraint
in Gecko that I don't quite understand given that we're not creating
any new insertion points (these events are just using insertion points
that already exist for the <script> elements themselves).
Comment 1 Ian 'Hixie' Hickson 2010-07-14 18:29:48 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: Unless there are specific compatibility constraints, I'd rather not make onload="" on <script> a special exception to the general rule that document.write() is unsafe in event handlers.
Comment 2 Adam Barth 2010-07-14 18:41:17 UTC
I don't quite understand your response.  My understanding of the current spec is that calling document.write() from script@onload doesn't blow away the document.  If you'd rather not make onload="" on <script> a special exception to the general rule that document.write() is unsafe in event handlers then won't you have to change the spec?  Note that DOM mutation events and events dispatched via a script calling dispatchEvent are already exceptions to this rule.

(It's entirely possible I'm misunderstanding the spec.)
Comment 3 Adam Barth 2010-07-14 22:16:39 UTC
Just chatted with Ian in #whatwg.  The spec fires the load event synchronously while we still have an active insertion point.  I like the spec behavior, so I'm not asking for it to change in the spec.  The behavior is just different in Firefox 4b1, and I want to make sure Henri is ok with the current text.  I'm happy to move this to a bugzilla.mozilla.org bug if that's more appropriate.
Comment 4 Henri Sivonen 2010-07-15 12:16:00 UTC
(In reply to comment #0)
> > I'd rather not punch a special hole for that event handler without a compelling use case or site compat requirement. (Also, it seems inconsistent to make load on <script> fire synchronously when load events in general are async.)
> 
> I guess I don't fully understand the implementation constraints you're
> operating under.  There are two issues:
> 
> 1) Should the script load even fire synchronously.
> 2) Should synchronous events that call document.write use the current
> insertion point.
> 
> There are a number of benefits to firing the script load event synchronously:
> 
> A) The behavior matches all the shipping browsers I've tested.
> B) Firing the script load event synchronously is more predictable for
> developers and less likely to lead to race conditions.
> C) There are unknown compatibility implications for changing when the
> event is fired.
> 
> There are a number of benefits for having synchronous events use the
> current insertion point.
> 
> a) The behavior matches all the shipping browsers I've tested.
> b) Reusing the current insertion point is more predictable for
> developers and less likely to lead to race conditions.
> c) Reusing the current insertion point better matches the mental model
> for how HTML documents are processed (basically, it abstracts away how
> much of the document input stream is buffered in the network layer and
> how much is buffered inside the tokenizer).
> d) There are unknown compatibility implications for changing how
> document.write in synchronous events behaves.
> 
> The only "con" I see to (1) and (2) is a new implementation constraint
> in Gecko that I don't quite understand given that we're not creating
> any new insertion points (these events are just using insertion points
> that already exist for the <script> elements themselves).

I had a look at the code. It's certainly doable to reorder the script load event firing and the insertion point getting undefined.

What I'm not quite sure about is how this will interact with the relative ordering of multiple document.write()s when the script load handler initiates nested document.write()s. It's easier to see in a debugger what happens than to try to read the code... I'll report back. Let's keep this spec bug open for a moment longer. (Sorry about the slow response.)
Comment 5 Ian 'Hixie' Hickson 2010-08-16 21:13:36 UTC
Reassigning to you, Henri. Please reassign it to me if you decide something in the spec needs changing.
Comment 6 Henri Sivonen 2010-08-25 10:45:05 UTC
So, in Gecko, there are the following operations of interest (assuming sicking's patch that adds scriptwillexecute and scriptdidexecute lands in its current form) is as follows:
 1) the parser gets unblocked.
 2) scriptwillexecute fires.
 3) insertion point becomes non-undefined
 4) the document becomes aware of an external script being evaluated (for write-neutralization) if applicable
 5) the insertion point becomes the right point for this script
 6) the script is evaluated!
 7) the insertion point stops being the right point for this script
 8) the document becomes unaware of an external script being evaluated
 9) the insertion point potentially becomes undefined
10) scriptdidexecute fires
11) load (for external scripts) fires

Now, to support document.write() from onload, the steps above would need to be shuffled so that the load event fires before step #7. In that case, onload would have the insertion point behavior equivalent of the statements in the event handler being part of the script itself.

Alternatively, load could fire after step #7 but behave for insertion point purposes as if it were a script immediately after the script proper *but* executing before external scripts written by the script proper get to execute.

Old Gecko behaves as if the insertion point were the insertion point of the parent script. WebKit and Opera behave differently from old Gecko and each other:
http://hsivonen.iki.fi/test/write-on-load.html
In particular, old WebKit doesn't appear to behave the way abarth describes WebKit's new behavior in comment 0.

Making load behave as if it were a distinct script for insertion point purposes *and* letting the external scripts written by the script proper run first would be pain.

In the interest of time, I didn't find out what IE does (since data: URLs don't work in IE). However, given that there's no use case, no evidence of sites breaking and no uniform legacy behavior, I'd prefer not to make it possible to write content to the parser from onload or from the handlers for the new events sicking is adding.

So I think the spec should change to make it so that the insertion point is undefined when the onload event handler runs (or when the handlers for the events sicking is adding run).
Comment 7 Ian 'Hixie' Hickson 2010-08-26 18:48:51 UTC
But presumably you still want the event to allow document.write() when the whole thing is happening under document.open()/document.write() calls, right?

Why is non-parser code setting the insertion point in Gecko? That seems odd.

I don't really know how to do this at a spec level, short of having a special flag that unsets the insertion point temporarily while firing 'load' or something equally silly. Or having the parser fire 'load' if it's the one executing it, or something convoluted like that.

Adam, any views on whether 'load' here should change?
Comment 8 Henri Sivonen 2010-08-31 15:24:20 UTC
(In reply to comment #7)
> But presumably you still want the event to allow document.write() when the
> whole thing is happening under document.open()/document.write() calls, right?

Doh. I forgot about that. The premise of my opinion is wrecked. It seems I have more Gecko script execution hacking ahead. 

> Why is non-parser code setting the insertion point in Gecko? That seems odd.

I reasoned that document.write() is only permitted from parser-inserted scripts, so the easiest way to disallow it otherwise was to make the execution of parser-inserted script allow it.

There are several insertion points in Gecko. The parser doesn't know which one is current. The .write() implementation passes the memory address to the parser so the address can be used as a script-unique identifier for searching the right insertion point.

I didn't rewrite the Gecko script execution code for the HTML5 parser. It seemed more reasonable to keep the old script execution code in place, since replacing the parser was big enough a change. So the script execution code is now the old code but a bit evolved. Today, I've been rewriting notable parts of the script execution code, though, after discovering that it was non-compliant in a Web-breaking way.

(In general, having a document-owner script loader manage when parser-inserted scripts block is sadness when a crazy script moves a node to another doc during the parser so that the parser inserts nodes to another doc while assuming the scripts get executed in the context of the script loader of the doc the parser is associated with. Well, more to the point, moving nodes like that is sadness. I expect things to be quite broken if someone does that...)
Comment 9 Henri Sivonen 2010-09-01 13:34:43 UTC
OK. I'll patch Gecko to do what Chromium nightlies do.

Where can I find the copyright&license info for http://trac.webkit.org/export/LATEST/trunk/LayoutTests/fast/tokenizer/write-on-load.html (in order to import it to mozilla-central)?
Comment 10 Adam Barth 2010-09-01 22:16:27 UTC
> Where can I find the copyright&license info for
> http://trac.webkit.org/export/LATEST/trunk/LayoutTests/fast/tokenizer/write-on-load.html
> (in order to import it to mozilla-central)?

http://trac.webkit.org/browser/trunk/WebKit/LICENSE