This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 21308 - HTML5 parser does not inspect the namespace of elements on the stack of open elements
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard:
Keywords:
Depends on: 22322
Blocks:
Reported: 2013-03-16 00:40 UTC by Rafael Weinstein
Modified: 2013-07-03 14:20 UTC (History)
CC List: 6 users

Description Rafael Weinstein 2013-03-16 00:40:04 UTC
Given the ability to embed foreign content, this seems like a generic problem. Here's one instance:

<body><table><tr><td><svg><td><foreignObject></td>Foo<foo>

When <svg> is encountered, the insertion mode is left as "in cell", as it is when the </td> is encountered.

The issue here is the </td>. The "in cell" rules for </td> say:

"Pop elements from the stack of open elements stack until an element with the same tag name as the token has been popped from the stack."

Maybe "tag name" implies localName and namespaceURI, but that's not how it's implemented anywhere.
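
To make the difference concrete, here is a minimal sketch of the two readings of "same tag name". It is not taken from the spec or from any engine: the `Element` tuple and both function names are hypothetical, and the stack of open elements is modelled as a plain list with the current node last.

```python
from typing import NamedTuple

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


class Element(NamedTuple):
    # Hypothetical stand-in for an entry on the stack of open elements.
    namespace: str
    local_name: str


def pop_until_local_name(stack, tag_name):
    """Pop until an element with this local name has been popped (namespace ignored)."""
    while stack:
        if stack.pop().local_name == tag_name:
            return


def pop_until_html_element(stack, tag_name):
    """Pop until an HTML-namespace element with this local name has been popped."""
    while stack:
        popped = stack.pop()
        if popped.namespace == HTML_NS and popped.local_name == tag_name:
            return


def open_elements():
    # Stack of open elements when the </td> in the example markup is seen.
    html = [Element(HTML_NS, n) for n in ("html", "body", "table", "tbody", "tr", "td")]
    svg = [Element(SVG_NS, n) for n in ("svg", "td", "foreignObject")]
    return html + svg


loose = open_elements()
pop_until_local_name(loose, "td")     # stops at <svg td>; current node is <svg svg>

strict = open_elements()
pop_until_html_element(strict, "td")  # stops at the HTML <td>; current node is <tr>
```

With local-name matching the current node ends up as `<svg svg>`; with the namespace check it ends up as the HTML `<tr>`.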

In WebKit and Gecko, the parser pops back to the <svg td> element and then continues parsing from there. This produces:

| <html>
|   <head>
|   <body>
|     <table>
|       <tbody>
|         <tr>
|           <td>
|             <svg svg>
|               <svg td>
|                 <svg foreignObject>
|               "Foo"
|               <svg foo>

I'm actually not sure what the "right" output is here, but the above doesn't seem like a candidate. I can imagine:

| <html>
|   <head>
|   <body>
|     "Foo"
|     <foo>
|     <table>
|       <tbody>
|         <tr>
|           <td>
|             <svg svg>
|               <svg td>
|                 <svg foreignObject>

Where the </td> clears back past the first (HTML) <td>, and the "Foo" and <foo> get lifted out of the table.

I can also imagine:

| <html>
|   <head>
|   <body>
|     <table>
|       <tbody>
|         <tr>
|           <td>
|             <svg svg>
|               <svg td>
|                 <svg foreignObject>
|                   "Foo"
|                   <foo>

Where the </td> is ignored (perhaps because we define "in scope" to not search past the <svg foreignObject>).
Comment 1 Rafael Weinstein 2013-03-16 00:41:49 UTC
Created bug with WHATWG product at Hixie's request.
Comment 2 Adam Klein 2013-04-01 21:30:08 UTC
Uploaded a work-in-progress patch for WebKit at https://bugs.webkit.org/show_bug.cgi?id=113723. The short description of that change is that, in nearly all places, we now check that the namespace of the current stack item is HTML before comparing it against the particular local name we're searching for (e.g., when checking if there's a <td> in table scope). This passes all the html5lib tests, fwiw.
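
As a rough illustration of the kind of check described here (not the WebKit patch itself), the sketch below models a "has an element in table scope" test with and without the namespace comparison. The function name, the `check_namespace` flag and the tuple representation are assumptions made for this example; the scope boundaries used are the `html`, `table` and `template` HTML elements.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# HTML elements that terminate a table-scope search.
TABLE_SCOPE_BOUNDARIES = {"html", "table", "template"}


def has_in_table_scope(stack, target, check_namespace):
    """Walk the stack from the current node downwards looking for `target`.

    `stack` is a list of (namespace, local_name) tuples, current node last.
    With check_namespace=False only the local name is compared.
    """
    for namespace, local_name in reversed(stack):
        if local_name == target and (not check_namespace or namespace == HTML_NS):
            return True
        if namespace == HTML_NS and local_name in TABLE_SCOPE_BOUNDARIES:
            return False
    return False


# A hypothetical stack in which the only "td" is an SVG element.
stack = [
    (HTML_NS, "html"),
    (HTML_NS, "body"),
    (SVG_NS, "svg"),
    (SVG_NS, "td"),
    (SVG_NS, "foreignObject"),
]

found_by_name_only = has_in_table_scope(stack, "td", check_namespace=False)
found_with_namespace = has_in_table_scope(stack, "td", check_namespace=True)
```

Comparing local names only reports a `td` in table scope even though no HTML `<td>` is open; adding the namespace check does not.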
Comment 3 Ian 'Hixie' Hickson 2013-05-29 20:17:36 UTC
Clearly html5lib is missing some important tests. :-)

I don't really understand why the current behaviour is wrong. The tags in the markup have no namespace. The unmatched end tag here is invalid; how can we know what namespace the author intended? I don't see what's wrong with just matching based on the local name alone.
Comment 4 Rafael Weinstein 2013-05-29 20:35:45 UTC
I think you're choosing to look at it that way.

The parser picks a namespace for each tag as it is encountered. The fact that the namespace is implicit in the structure of the input is irrelevant. All elements' namespaces are determined as they are encountered.

It should not be possible for *any* end tag in the HTML namespace to mistake its opening element for an element in another namespace which happens to have the same local name.

WebKit and Blink have already made this change. I wouldn't be surprised if Gecko has as well. William?
Comment 5 Ian 'Hixie' Hickson 2013-06-12 17:55:56 UTC
I think to fix this I first need to fix bug 22322.
Comment 6 Peter Occil 2013-06-12 22:24:38 UTC
(In reply to comment #5)
> I think to fix this I first need to fix bug 22322.

I don't see how that issue will help fix this one; that issue deals with "acting as if a --tag token-- had been seen", and by the time a token is generated (as opposed to the time an element is created), it has not been assigned to any namespace yet.
Comment 7 Ian 'Hixie' Hickson 2013-06-19 21:38:14 UTC
Yeah, the point is that it should be assigned a namespace, essentially. If you see a <p> token, and you're in an HTML block with an open <p>, and you process it per rules that try to imply the closing </p>, you don't want the implicit closing </p> to be handled as matching some foreign namespace <p> element. By fixing that bug, we can make sure that can't happen, which will make reasoning about this bug much easier.
Comment 8 Ian 'Hixie' Hickson 2013-07-01 22:19:24 UTC
(In reply to comment #0)
> Given the ability to embed foreign content, this seems like a generic
> problem. Here's one instance:
> 
> <body><table><tr><td><svg><td><foreignObject></td>Foo<foo>
> 
> When <svg> is encountered, the insertion mode is left as "in cell", as it is
> when the </td> is encountered.
> 
> This issue here is the </td>. The rules for "in cell" for </td> [...]

Those rules don't get invoked, do they?

The tree construction dispatcher:

   http://whatwg.org/html#tree-construction-dispatcher

...defers to "in foreign content". That then falls through to the default "any other end tag" rules. That walks up the tree looking either for a match for </td>, or for an HTML element. Before it hits an HTML element, though, it hits the <svg:td> element, and closes that.
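
A simplified sketch of that walk, assuming the stack is a list of (namespace, local name) tuples with the current node last and that the current node is a foreign element. The function name and the returned strings are hypothetical, and parse-error reporting is left out.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def foreign_any_other_end_tag(stack, tag_name):
    """Model of the "any other end tag" walk in foreign content.

    Returns "closed" if a matching foreign element was found and popped,
    or "html-rules" if an HTML element was reached first, in which case the
    token would be handled by the current insertion mode instead.
    """
    index = len(stack) - 1
    while index > 0:
        namespace, local_name = stack[index]
        if namespace == HTML_NS:
            return "html-rules"
        if local_name.lower() == tag_name:
            del stack[index:]  # pop up to and including the matching element
            return "closed"
        index -= 1
    return "html-rules"


def example_stack():
    html = [(HTML_NS, n) for n in ("html", "body", "table", "tbody", "tr", "td")]
    svg = [(SVG_NS, n) for n in ("svg", "td", "foreignObject")]
    return html + svg


td_stack = example_stack()
td_result = foreign_any_other_end_tag(td_stack, "td")  # closes <svg td>

tr_stack = example_stack()
tr_result = foreign_any_other_end_tag(tr_stack, "tr")  # reaches the HTML <td> first
```

For `</td>` the walk closes `<svg td>` and leaves `<svg svg>` as the current node; for `</tr>` it reaches the HTML `<td>` without a match and the stack is left untouched.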

That seems right to me.

I don't think the bug as filed is actually valid.

Having said that, there are lots of similar cases that _are_ bogus. I've tried to fix all those now. For example, this case:

   <body><table><tr><td><svg><td><foreignObject></tr>

As the spec stood before, the result was completely bogus. The </tr> implied a </td> that closed the <svg:td>, and the parser ended up in a nonsensical state.

That's fixed now, I think (see patch below).
Comment 9 contributor 2013-07-01 22:19:35 UTC
Checked in as WHATWG revision r8003.
Check-in comment: Make all occurrences of 'same tag name' in the parser explicitly refer to HTML elements.
http://html5.org/tools/web-apps-tracker?from=8002&to=8003
Comment 10 Ian 'Hixie' Hickson 2013-07-01 23:32:15 UTC
Oops, I need to go through and check instances of "whose tag name", too.
Comment 11 contributor 2013-07-01 23:43:37 UTC
Checked in as WHATWG revision r8007.
Check-in comment: Fix more 'same tag name' issues (HTML parser).
http://html5.org/tools/web-apps-tracker?from=8006&to=8007
Comment 12 Ian 'Hixie' Hickson 2013-07-03 14:20:14 UTC
Let me know if I missed anything!