<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>10117</bug_id>
          
          <creation_ts>2010-07-09 00:15:07 +0000</creation_ts>
          <short_desc>Tag name state algorithm has mis-ordered step</short_desc>
          <delta_ts>2011-11-11 00:57:30 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>pre-LC1 HTML5 spec (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://dev.w3.org/html5/spec/Overview.html#tag-name-state</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P1</priority>
          <bug_severity>critical</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Adrian Bateman [MSFT]">adrianba</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>w3c</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>36704</commentid>
    <comment_count>0</comment_count>
    <who name="Adrian Bateman [MSFT]">adrianba</who>
    <bug_when>2010-07-09 00:15:07 +0000</bug_when>
    <thetext>Change

  U+003E GREATER-THAN SIGN (&gt;)
  Emit the current tag token. Switch to the data state.

to

  U+003E GREATER-THAN SIGN (&gt;)
  Switch to the data state. Emit the current tag token.

----------------
Details of issue:

Section 8.2.4.10 (Tag name state) says

  U+003E GREATER-THAN SIGN (&gt;)
  Emit the current tag token. Switch to the data state.

The &quot;Emit the current tag token&quot; step is defined in section 8.2.4 as:

  When a token is emitted, it must immediately be handled by the
  tree construction stage. The tree construction stage can affect
  the state of the tokenization stage, and can insert additional
  characters into the stream.

So let us consider the following HTML:

  &lt;html&gt;
  &lt;head&gt;
  &lt;script&gt;&lt;!-- window.alert(); --&gt;&lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;&lt;/body&gt;
  &lt;/html&gt;

At the closing &apos;&gt;&apos; of &apos;&lt;script&gt;&apos;, the tokenizer is in the tag name state. It emits the current tag token, which is a &apos;script&apos; start tag.

The tree construction stage, in section 8.2.5.7 (&quot;in head&quot; insertion mode), specifies:

  A start tag whose tag name is &quot;script&quot;
  Run these steps:
  ...
  5. Switch the tokenizer to the script data state.

The tree construction stage therefore resets the tokenizer state immediately.

Once it completes, the tree construction stage returns control to the tokenizer. *And at that point, the tokenizer is specified to switch to the data state!* This state update overwrites the one made by the tree construction stage, so the script content is not tokenized as script data.
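
The overwrite can be reproduced in a small model. This is only a sketch: the Tokenizer class, the emit_first flag, and the method names are hypothetical and do not come from the spec, and the tree construction stage is reduced to the single "script" rule quoted above. The switch-first order is included for comparison.

```python
# Minimal model of the tokenizer / tree construction hand-off.
# All names are illustrative assumptions, not spec terminology.

class Tokenizer:
    def __init__(self, emit_first):
        self.state = "tag name"
        self.emit_first = emit_first

    def emit(self, tag_name):
        # The tree construction stage handles the token immediately
        # and may switch the tokenizer state while doing so.
        if tag_name == "script":
            self.state = "script data"

    def on_greater_than(self, tag_name):
        if self.emit_first:
            # Emit, then switch: the switch below overwrites
            # whatever state the tree construction stage chose.
            self.emit(tag_name)
            self.state = "data"
        else:
            # Switch, then emit: the tree construction stage
            # gets the last word on the tokenizer state.
            self.state = "data"
            self.emit(tag_name)
        return self.state


print(Tokenizer(emit_first=True).on_greater_than("script"))   # data
print(Tokenizer(emit_first=False).on_greater_than("script"))  # script data
```

With emit_first=True the tokenizer ends up in the data state even though the "script" rule asked for the script data state. With emit_first=False the requested state survives.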

The identical bug exists in all the other states that can emit start tags which can contain content (8.2.4.34 through 8.2.4.37, and 8.2.4.42).

The fix is to reverse the order of the state update and the token emission:

  U+003E GREATER-THAN SIGN (&gt;)
  Switch to the data state. Emit the current tag token.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>36705</commentid>
    <comment_count>1</comment_count>
    <who name="Adam Barth">w3c</who>
    <bug_when>2010-07-09 00:59:31 +0000</bug_when>
    <thetext>Interesting.  We missed that bug in the WebKit implementation because we don&apos;t *immediately* hand the token off to the tree builder.  Instead, we adjust the state as Adrian suggests and then pass the token from the tokenizer into the tree builder.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>36868</commentid>
    <comment_count>2</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-07-14 21:10:40 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter&apos;s comments.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>36869</commentid>
    <comment_count>3</comment_count>
    <who name="">contributor</who>
    <bug_when>2010-07-14 21:11:06 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r5164.
Check-in comment: Make &apos;emit&apos; always come after &apos;switch&apos;, and remove any mention of &apos;stay&apos; in the tokeniser.
http://html5.org/tools/web-apps-tracker?from=5163&amp;to=5164</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>59889</commentid>
    <comment_count>4</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-11-11 00:57:30 +0000</bug_when>
    <thetext>Looks like I missed some. See bug 14698.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>