This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 10117 - Tag name state algorithm has mis-ordered step
Summary: Tag name state algorithm has mis-ordered step
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P1 critical
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/spec/Overview...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-09 00:15 UTC by Adrian Bateman [MSFT]
Modified: 2011-11-11 00:57 UTC

Description Adrian Bateman [MSFT] 2010-07-09 00:15:07 UTC
Change

  U+003E GREATER-THAN SIGN (>)
  Emit the current tag token. Switch to the data state.

to

  U+003E GREATER-THAN SIGN (>)
  Switch to the data state. Emit the current tag token.

----------------
Details of issue:

Section 8.2.4.10 (Tag name state) says

  U+003E GREATER-THAN SIGN (>)
  Emit the current tag token. Switch to the data state.

The "Emit the current tag token" step is defined in section 8.2.4 as:

  When a token is emitted, it must immediately be handled by the
  tree construction stage. The tree construction stage can affect
  the state of the tokenization stage, and can insert additional
  characters into the stream.

So let us consider the following HTML:

  <html>
  <head>
  <script><!-- window.alert(); --></script>
  </head>
  <body></body>
  </html>

At the closing '>' of '<script>', the tokenizer is in the tag name state. It emits the current tag token, which is a 'script' start tag.

The tree construction stage, in section 8.2.5.7 ("in head" insertion mode), specifies:

  A start tag whose tag name is "script"
  Run these steps:
  ...
  5. Switch the tokenizer to the script data state.

The tree construction stage therefore resets the tokenizer state immediately.

After completing, the tree construction stage returns to the tokenizer.  *And at that point, the tokenizer is specified to reset to the data state!*  This state update overwrites the state update from the tree construction stage, and the script is not parsed as script.

The identical bug exists in all the other states that can emit start tags which can contain content (8.2.4.34 through 8.2.4.37, and 8.2.4.42).

The fix is to reverse the order of the state update and the token emission:

  U+003E GREATER-THAN SIGN (>)
  Switch to the data state. Emit the current tag token.
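
The ordering problem described above can be reproduced with a toy model. The following is a minimal sketch, not the spec's algorithm or any browser's implementation: it models only the '>' step of the tag name state, and every name in it (State, Tokenizer, tree_construction, state_after_script_start_tag, emit_first) is hypothetical and introduced purely for illustration.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_NAME = auto()
    SCRIPT_DATA = auto()


def tree_construction(tokenizer, tag_name):
    # Stand-in for the "in head" rule for a "script" start tag:
    # switch the tokenizer to the script data state.
    if tag_name == "script":
        tokenizer.state = State.SCRIPT_DATA


class Tokenizer:
    """Toy model of only the '>' step of the tag name state."""

    def __init__(self, emit_first):
        self.state = State.TAG_NAME
        # True models the original wording (emit, then switch);
        # False models the proposed wording (switch, then emit).
        self.emit_first = emit_first

    def emit(self, tag_name):
        # An emitted token is handled immediately by tree construction,
        # which may change the tokenizer state.
        tree_construction(self, tag_name)

    def on_greater_than(self, tag_name):
        if self.emit_first:
            self.emit(tag_name)
            self.state = State.DATA  # overwrites tree construction's switch
        else:
            self.state = State.DATA
            self.emit(tag_name)  # tree construction's switch is applied last


def state_after_script_start_tag(emit_first):
    tokenizer = Tokenizer(emit_first)
    tokenizer.on_greater_than("script")
    return tokenizer.state


print(state_after_script_start_tag(emit_first=True))   # State.DATA
print(state_after_script_start_tag(emit_first=False))  # State.SCRIPT_DATA
```

With the original order the tokenizer ends up in the data state, so the script content would be tokenized as markup. With the reversed order it ends up in the script data state, as the tree construction stage requested.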
Comment 1 Adam Barth 2010-07-09 00:59:31 UTC
Interesting.  We missed that bug in the WebKit implementation because we don't *immediately* hand the token off to the tree builder.  Instead, we adjust the state as Adrian suggests and then pass the token from the tokenizer into the tree builder.
Comment 2 Ian 'Hixie' Hickson 2010-07-14 21:10:40 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter's comments.
Comment 3 contributor 2010-07-14 21:11:06 UTC
Checked in as WHATWG revision r5164.
Check-in comment: Make 'emit' always come after 'switch', and remove any mention of 'stay' in the tokeniser.
http://html5.org/tools/web-apps-tracker?from=5163&to=5164
Comment 4 Ian 'Hixie' Hickson 2011-11-11 00:57:30 UTC
Looks like I missed some. See bug 14698.