This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 9843 - Specced behavior for document.write("<link rel=stylesheet href=...><script>...</script>...") matches none of the top 4 engines
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC Linux
Importance: P1 critical
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://hsivonen.iki.fi/test/moz/sheet...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-03 13:07 UTC by Henri Sivonen
Modified: 2010-10-12 09:38 UTC
CC List: 11 users

Description Henri Sivonen 2010-06-03 13:07:52 UTC
Consider:
document.write("<link rel=stylesheet href=...><script>...</script>...")

According to the spec, the style sheet becomes a style sheet blocking scripts.
http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#a-style-sheet-blocking-scripts

Then, according to http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#parsing-main-incdata , case An end tag whose tag name is "script", subcase "Otherwise", step 3, the UA must "Spin the event loop until there is no style sheet blocking scripts and the script's "ready to be parser-executed" flag is set."

As far as I can tell, this is not what happens in reality. What seems to happen is:

In Gecko (old and new parser), the style sheet becomes a style sheet blocking scripts, the parser blocks at the written </script> and document.write returns early before the written content after </script> has been tokenized.

In IE8, the style sheet becomes a style sheet blocking scripts, but document.written internal scripts are immune to style sheets blocking scripts, so the document.write tokenizes to completion and the next inline script from the network blocks on the style sheet. (Sorry, the demo doesn't prove the part about the next network-originating script blocking without editing the demo and experimenting.)

In Chrome beta channel, the style sheet doesn't block scripts but blocks painting. The whole page is parsed to completion but isn't painted until the style sheet has been loaded.

In Opera, the style sheet doesn't block anything. The page FOUCs.
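
A rough way to tell these behaviors apart is to check, right after document.write() returns, whether the inline script has run and whether the content written after </script> is already in the DOM. The sketch below is only an illustration: buildProbeMarkup, probeWrite, the inlineRan flag and the after-script id are made-up names, not part of the linked demo, and doc/win are passed in as parameters so the function can also be exercised outside a browser.

```javascript
// Hypothetical probe, not the demo linked from this bug.
function buildProbeMarkup(sheetHref) {
  // "<\/script>" keeps an enclosing HTML <script> block from ending early.
  return '<link rel=stylesheet href="' + sheetHref + '">' +
         '<script>window.inlineRan = true;<\/script>' +
         '<p id="after-script">tail</p>';
}

function probeWrite(doc, win, sheetHref) {
  win.inlineRan = false;
  doc.write(buildProbeMarkup(sheetHref));
  // Sampled synchronously, i.e. before the event loop gets to run again.
  return {
    inlineScriptRan: win.inlineRan === true,
    tailInDom: doc.getElementById('after-script') !== null
  };
}
```

In a page this would be called as probeWrite(document, window, 'slow.css') from a script that runs while the page is still being parsed, with slow.css standing in for any sheet that takes a while to load. Going by the descriptions above, Gecko would presumably report false for both values and the other three engines true for both; the probe does not separate IE8, Chrome and Opera from each other.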

I think *any* of these four behaviors is preferable over creating a nested event loop from within document.write(), so I request the spec be changed to one of the pre-existing behaviors on this point.

If Gecko's behavior is chosen for the spec, it would be necessary to specify that the style sheet fetch completes asynchronously (even if the style sheet were available immediately) in order to avoid making the document.write return behavior sensitive to caching or to the URL scheme (data: URLs).
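
To illustrate what "completes asynchronously even if available immediately" would mean, here is a minimal sketch of a loader that always reports completion from a later task. loadSheet, the cache Map and the 50 ms stand-in for a network fetch are assumptions made for this example, not anything Gecko or the spec defines.

```javascript
// Illustrative loader: completion is always reported from a later task,
// even when the sheet text is already available in `cache`.
function loadSheet(url, cache, onLoaded) {
  if (cache.has(url)) {
    const text = cache.get(url);
    setTimeout(function () { onLoaded(text); }, 0); // never synchronous
    return;
  }
  // Stand-in for a real network fetch (hypothetical delay, empty body).
  setTimeout(function () { onLoaded(''); }, 50);
}
```

With this shape, code that runs right after loadSheet() returns sees the same "not loaded yet" state whether the sheet came from the cache, a data: URL or the network.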

I don't have data to be able to argue the choice among the four behaviors based on compatibility data. IE's behavior is the hardest to distinguish from what the spec says now.

Note that it's not necessary to change the behavior for the case where the <link> and the internal script come from the network stream, since that can be handled per current spec (and Gecko does) without creating a nested event loop.
Comment 1 Ian 'Hixie' Hickson 2010-07-14 21:30:17 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale:

Surely the script nesting level in this case (when you parse the document.written "</script>") is non-zero, and so you don't go down the path that blocks, you just go down the path that pauses the parser and resumes later when you're not nested. Or am I misunderstanding?
Comment 2 Henri Sivonen 2010-08-31 08:46:36 UTC
(In reply to comment #1)
> Surely the script nesting level in this case (when you parse the
> document.written "</script>") is non-zero, and so you don't go down the path
> that blocks, you just go down the path that pauses the parser and resumes later
> when you're not nested. Or am I misunderstanding?

The spec says "yielding control back to the caller" in that case, which is what Gecko does, but AFAICT, the spec yields the control back to the caller depending on whether an external style sheet has loaded. This seems bad.
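
For reference, the branch being discussed can be modelled roughly as follows. This is a simplified reading of the thread, not spec text; the function name, the state fields and the returned labels are all invented for the sketch.

```javascript
// Toy model of what happens at the end tag of a parser-inserted inline script.
// `state` is hypothetical: { sheetBlockingScripts: boolean, scriptNestingLevel: number }
function onInlineScriptEndTag(state) {
  if (!state.sheetBlockingScripts) {
    return 'execute-now'; // nothing to wait for
  }
  if (state.scriptNestingLevel > 0) {
    // Reached from inside document.write(): the parser pauses and
    // document.write() returns before the rest is tokenized.
    return 'pause-parser-and-yield-to-caller';
  }
  // Network stream, not nested: the parser waits for the sheet.
  return 'wait-for-sheet-then-execute';
}
```

With scriptNestingLevel fixed at 1 (the document.write case), the result flips between 'execute-now' and 'pause-parser-and-yield-to-caller' purely on sheetBlockingScripts, which is the dependency on style sheet loading state complained about above.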

Am I misunderstanding something?

I'm inclined to make document.written inline scripts not check if there's a "style sheet blocking scripts". (I realize that this requires a concept that the spec doesn't currently have: the script element knowing if it was document.written. In Gecko, an element is considered to be document.written, if the '>' character of the start tag was document.written.)
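
The "'>' character of the start tag" rule can be sketched like this. It is a toy scanner over a list of input segments, not Gecko code: isDocumentWritten, the segment shape and the 'write'/'network' origin labels are assumptions for illustration, and the scan is naive (lowercase tag names only, no handling of '>' inside quoted attribute values).

```javascript
// segments: [{ text: string, origin: 'network' | 'write' }, ...] (hypothetical shape)
function isDocumentWritten(segments, tagName) {
  let text = '';
  const origins = []; // origins[i] is where UTF-16 code unit i came from
  for (const seg of segments) {
    for (let i = 0; i < seg.text.length; i++) {
      text += seg.text[i];
      origins.push(seg.origin);
    }
  }
  const lt = text.indexOf('<' + tagName);
  if (lt === -1) return null; // no such start tag
  const gt = text.indexOf('>', lt);
  if (gt === -1) return null; // start tag not finished yet
  return origins[gt] === 'write';
}
```

For example, if the network supplies "<script" and a document.write() call supplies ">...</script>", the element counts as document.written under this rule even though its tag name came from the network.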
Comment 3 Henri Sivonen 2010-09-01 08:00:03 UTC
First, http://hsivonen.iki.fi/test/moz/sheet-blocking-script-baseline.php shows that WebKit (incl. Chromium nightly) and Opera don't block even non-document.written parser-inserted scripts until style sheets have loaded.

This leaves only IE and old Gecko as existing behaviors to consider. Gecko's old behavior is not OK, because it makes document.write() return or not return early depending on the loading state of style sheets:
http://hsivonen.iki.fi/test/moz/sheet-blocking-script.html

http://hsivonen.iki.fi/test/moz/sheet-blocking-script3.html shows that a script-written sheet can block scripts in IE.

http://hsivonen.iki.fi/test/moz/sheet-blocking-script2.html shows that document.written scripts don't wait for style sheets in IE even when another script has already caused document.write to return early. However, these tests also show that IE and Opera put written scripts in the DOM right away, which is *really* weird and not something I want to clone and not something I'd want the spec to say.

My WIP patch for Gecko makes Gecko match Chromium nightlies on the number of script elements in the DOM at a given point but makes Gecko match IE on the availability of computed style. (I.e. the patch looks at the document.writtenness of parser-inserted inline scripts instead of checking if document.write is still on the call stack.)

I could be persuaded that I should check whether document.write is on the call stack instead, but this is the story I'm going to go with for now.
Comment 4 Ian 'Hixie' Hickson 2010-09-25 19:11:27 UTC
(In reply to comment #2)
> The spec says "yielding control back to the caller" in that case, which is what
> Gecko does, but AFAICT, the spec yields the control back to the caller
> depending on whether an external style sheet has loaded. This seems bad.

Well it's not great, sure, but in practice what's the worst that could happen? It's not like the style sheet is going to completely load before the event loop spins anyway, right? I could make this explicit if you like.

Making things depend on where the "<" came from or some such seems like adding yet more hacks to an already particularly large pile of hacks. Given that there's no interop here, I'd really rather just do the minimum required to be compatible with legacy content and have the design be somewhat sane.

Also note that there are no nested event loops here. The parser just blocks and returns control to the event loop until the style sheet is ready, it doesn't nest the event loop (unless you implement it that way, but that would be bad, as you point out).
Comment 5 Henri Sivonen 2010-09-29 08:27:25 UTC
(In reply to comment #4)
> Well it's not great, sure, but in practice what's the worst that could happen?

When the document's parser is not script-created, the DOM after document.write("<link rel=stylesheet href=...><script>...</script>..."); would be different in HTML5-compliant browsers and in existing Presto, WebKit and Trident (and Gecko 2.0 if Gecko 2.0 ships in its current state).

When the document's parser is script-created and the script doing the writing pauses between writing a <link> and writing an inline <script>, the DOM after the second write would be network timing-dependent.

> It's not like the style sheet is going to completely load before the event loop
> spins anyway, right? I could make this explicit if you like.

That's a good point for the case where the document's parser is not script-created. It's not such a good point when the document's parser is script-created, but I should test that case in IE before saying more.

You are assuming that data: URLs and cached http: URLs don't go to a loaded state synchronously, though.

> Making things depend on where the "<" came from or some such seems like adding
> yet more hacks to an already particularly large pile of hacks.

ITYM ">". :-)

In Gecko, there's already infrastructure for this. Also, it's already needed in order to ignore document.written charset metas. Presumably, IE has infrastructure, too, since IE varies the blocking behavior for document.write vs. not.

> Given that
> there's no interop here, I'd really rather just do the minimum required to be
> compatible with legacy content and have the design be somewhat sane.

Actually, currently none of the top 4 engines blocks document.written scripts on sheets, so in that sense, there is interop (although Gecko and Trident special-case document.written scripts while WebKit and Presto never block anyway).
Comment 6 Henri Sivonen 2010-09-29 08:35:21 UTC
Answering Hixie's question from IRC: Gecko defines document.written script as a script whose start tag's ending '>' character came from a document.write argument string. I don't know how IE defines document.written script; I tried to make the behavior equivalent in the case where the entire script is document.written and exact definition is moot.

If you test nightlies, please be aware that there's been a pattern of backing out the relevant patch on Mondays and relanding it on Wednesdays.
Comment 7 Ian 'Hixie' Hickson 2010-09-30 02:19:07 UTC
I *really* don't want to require that browsers keep track of the provenance of characters in the input stream. That's pretty weird even for the Web.

Do you have some real world sites I could study to understand this problem better?


> Also, it's already needed in order to ignore document.written charset metas.

Why is that required? Are there pages that break if you don't do that?
Comment 8 Jonas Sicking (Not reading bugmail) 2010-09-30 04:03:57 UTC
FWIW, I think I agree with Hixie. It'd be great if we don't have to keep track of what comes from document.write and what comes from the network.
Comment 9 Jonas Sicking (Not reading bugmail) 2010-09-30 04:33:37 UTC
Hrm.. though now I remember that it does have the side effect that you don't really know how much content has been inserted into the DOM if you document.write an inline script.

I.e. the DOM you see after the insert depends on if all existing stylesheets have loaded or not. That is also not very nice.
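
A toy model of that side effect, assuming the blocking behavior described for the spec and old Gecko: simulateWrite and its return shape are invented here, and the "parser" is just a search for the first </script> in the written string.

```javascript
// Returns how much of `markup` has been handled by the time the
// hypothetical document.write() call returns.
function simulateWrite(markup, sheetPending) {
  const end = markup.search(/<\/script>/i);
  if (!sheetPending || end === -1) {
    return { parsed: markup, deferred: '' }; // everything is in the DOM on return
  }
  const cut = end + '</script>'.length;
  // The parser blocks at the written </script>; the rest is left for later.
  return { parsed: markup.slice(0, cut), deferred: markup.slice(cut) };
}
```

The same call with the same string yields a different DOM on return depending only on sheetPending, i.e. on how far style sheet loading happens to have progressed.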
Comment 10 Henri Sivonen 2010-09-30 09:09:00 UTC
In Gecko, the provenance tracking comes "for free" from off-the-main-thread parsing already keeping document.written source and network-originating source very, very separate from each other.
Comment 11 Jonas Sicking (Not reading bugmail) 2010-10-06 00:42:49 UTC
Note that this might not remain true as we probably will want to start doing speculative parsing on document.written contents. Apparently other browsers do.
Comment 12 Ian 'Hixie' Hickson 2010-10-06 18:48:27 UTC
FWIW, I discussed this with Hyatt and Hyatt agrees that blocking on any script that might refer to layout dimensions that might depend on script is the right solution for correctness (that's what the spec does).

I agree that this leads to a situation where document.write() can return early based on network latency.

The decision here seems to be between making the input stream track where data comes from, and making how much document.write() has parsed before it returns depend on network latency.

Further input (especially from browser vendors) is definitely welcome.
Comment 13 Jonas Sicking (Not reading bugmail) 2010-10-06 21:03:21 UTC
I don't have a strong preference. One thing to remember is that this is only a problem for document.write, which is a horrible API anyway.

I can generally live with either behavior here. But I definitely agree that telling written tags apart from network-originated tags potentially introduces a fair amount of complexity into the parser. (Including in gecko if/when we start doing speculative parsing for doc.written stuff)
Comment 14 Henri Sivonen 2010-10-07 12:43:24 UTC
(In reply to comment #13)
> (Including in gecko if/when we
> start doing speculative parsing for doc.written stuff)

I think it doesn't make sense to fully speculatively parse document.written stuff in Gecko. When document.write blocks, it would make sense to prescan the blocked part for interesting URLs. I'm planning on doing that sometime after reimplementing the sanitizer. I think this will have no effect on the provenance tracking machinery already in place in Gecko.
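
As a rough idea of what prescanning the blocked part could look like, the sketch below pulls href and src values out of a pending string with a regular expression. This is not how Gecko does or would do it: prescanUrls is a made-up name, treating href/src as the "interesting" attributes is an assumption, and a regex is not a tokenizer (it will, for instance, also match inside comments or script text).

```javascript
// Collect href/src attribute values from markup that has not been parsed yet.
function prescanUrls(pendingMarkup) {
  const attr = /\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const urls = [];
  let m;
  while ((m = attr.exec(pendingMarkup)) !== null) {
    // Exactly one of the three capture groups is set, depending on quoting.
    urls.push(m[1] !== undefined ? m[1] : m[2] !== undefined ? m[2] : m[3]);
  }
  return urls;
}
```

The returned URLs could then be handed to a preloader while the parser itself stays blocked.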
Comment 15 Ian 'Hixie' Hickson 2010-10-12 09:38:03 UTC

Status: Rejected
Change Description: no spec change
Rationale: I'm tentatively rejecting this request, on the grounds that everything sucks but at least the behaviour in the spec isn't too magical, and that the only obvious alternative breaks what is currently an implementation invariant (that the input stream is source-agnostic).

If there are pages that depend on particular behaviour here, that would be good to know. Please don't hesitate to reopen this bug if there is such data.