This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11393 - End tag fails to break out of foreign content
Summary: End tag fails to break out of foreign content
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec (show other bugs)
Version: unspecified
Hardware: PC Linux
: P1 critical
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-23 17:34 UTC by James Graham
Modified: 2011-08-04 05:05 UTC (History)
8 users (show)

See Also:


Attachments

Description James Graham 2010-11-23 17:34:31 UTC
Given the input:

<!DOCTYPE html><body><table><caption><svg><g>foo</g><g>bar</g>baz</table><p>quux

the current spec requires that the </table> be ignored, resulting in:

#document
|  <!DOCTYPE html>
|  <html>
|    <head>
|    <body>
|      <table>
|        <caption>
|          <svg svg>
|            <svg g>
|              "foo"
|            <svg g>
|              "bar"
|            "baz"
|          <p>
|            "quux"

Previously (before the recent foreign content changes) it required:

#document
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <table>
|       <caption>
|         <svg svg>
|           <svg g>
|             "foo"
|           <svg g>
|             "bar"
|           "baz"
|     <p>
|       "quux"

The old behaviour makes more sense to me.
Comment 1 Ian 'Hixie' Hickson 2011-01-01 06:07:43 UTC
Why?
Comment 2 Ian 'Hixie' Hickson 2011-02-03 20:36:29 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale: I don't understand why either behaviour is better or worse than the other.
Comment 3 Simon Pieters 2011-02-03 20:58:41 UTC
The </table> tag breaks out of the <svg>, but it doesn't close the table. That makes no sense. Either the </table> tag should be ignored altogether, or it should break out properly and close the table.

The current behavior is like <foo><bar></foo> closing the bar but not the foo.
Comment 4 Simon Pieters 2011-02-03 21:04:08 UTC
Hmm. Maybe I was misreading the issue. I guess the </table> tag is ignored altogether, and then <p> breaks out. If so ignore my previous comment.

However, end tags usually break out of foreign content: <div><svg></div> closes the svg. <table><caption><span></table> closes the table. Taken together, one would expect <table><caption><svg>...</table> to close the table.
Comment 5 James Graham 2011-02-04 09:33:39 UTC
(In reply to comment #4)

> However, end tags usually break out of foreign content: <div><svg></div> closes
> the svg. <table><caption><span></table> closes the table. Taken together, one
> would expect <table><caption><svg>...</table> to close the table.

Yes, this was the point I was trying to make. Sorry for being unclear.
Comment 6 Ian 'Hixie' Hickson 2011-02-09 01:01:55 UTC
I guess this is why we had a secondary insertion mode, back in the day. I knew there had to be a reason!

As far as I can tell, this is actually a place where Henri's approach of checking the namespace of the current node before each token, and switching to foreign content mode that way, rather than using the insertion mode, results in different behaviour. Indeed, WebKit and Gecko differ for <table><caption><svg></table>, with WebKit following the spec, and Gecko doing what James suggests.

I'm very reluctant to perform such deep surgery on the parser at this extremely late stage, but since we don't have interop here, and since there's a (rather minor) reason to prefer the non-spec behaviour here (see comment 4 paragraph 2), maybe it's worth it?

Adam, you're the one who'd be most impacted by this, do you have an opinion? The proposal would be to add a branch before the insertion mode switch, which checks the namespace of the current node, rather than having foreign lands be an insertion mode, and to have what used to be the foreign content mode just handle what happens when you get a token in foreign content (i.e. dropping the first <dt> of the current definition).
Comment 7 Adam Barth 2011-02-09 20:49:29 UTC
> Adam, you're the one who'd be most impacted by this, do you have an opinion?

I'd prefer not to make large changes to the algorithm at this stage, if we can avoid it.  For example, WebKit's implementation has already gone through a detailed security review.  We could certainly make the change, but it would burn a bunch of resources we'd prefer to allocate elsewhere.
Comment 8 Henri Sivonen 2011-02-09 21:25:33 UTC
(In reply to comment #6)
> Indeed, WebKit and Gecko differ for
> <table><caption><svg></table>, with WebKit following the spec, and Gecko doing
> what James suggests.

I feel vindicated for my approach precognizantly fixing a bug. (FWIW, I'm unwilling to change code to make less sense just because the spec failed to match its intended conceptual model for this long.)
Comment 9 Ian 'Hixie' Hickson 2011-02-09 21:27:29 UTC
I guess it falls to Opera or IE to cast the tie-breaking vote? First one to ship something that either matches the spec/WebKit or matches the proposed behaviour/Gecko gets to decide what happens? I don't really know how else to proceed here. On the one hand this is a pretty minor issue, and it would unduly burden WebKit to change it, especially given that they followed the spec. On the other hand, it's a consistency issue that we could fix now which would likely not have any compat implications, and Henri did tell me a long time ago to do this (though we didn't realise it was anything other than editorial at the time).
Comment 10 Henri Sivonen 2011-02-10 10:18:35 UTC
(In reply to comment #9)
> I guess it falls to Opera or IE to cast the tie-breaking vote? First one to
> ship something that either matches the spec/WebKit or matches the proposed
> behaviour/Gecko gets to decide what happens?

That's a lousy way to deal with this. Basically, you are leaving a spec bug in and seeing if implementors take the trouble to realize that the spec is wrong. Should I counter this by checking in test cases to html5lib in order to make the implementors notice?

> I don't really know how else to proceed here.

I suggest the following procedure:

 1) Admit that it was a bad idea to prematurely spec optimizations instead of speccing the intended conceptual model.

 2) Edit the spec to describe the intended conceptual model (which is already implemented in Gecko in order to avoid having to tweak the code every time we find yet another failure of your optimization to match the conceptual model). Basically, "in foreign content" shouldn't be an insertion mode but a check of the namespace of the current element on the stack in order to decide whether to handle a start tag token in the foreign lands or according to the insertion mode.

 3) Leave possible further optimizations in this area to implementations if they find evidence that optimization is necessary. (I don't have evidence showing a need to optimize Gecko on this particular point.)

I haven't examined the WebKit implementation closely, but at least for Gecko, migrating from the spec's flawed optimization to expressing the conceptual model is code was a relatively straight-forward copy and paste job. However, going in the other direction would be harder, which is why I don't want to deliberately break the conceptual model in order to emulate spec bugs.

Note that WebKit's impl isn't up to date for foreign lands changes anyway, so they need to refresh the code in any case.
Comment 11 James Graham 2011-02-10 12:08:50 UTC
(In reply to comment #9)
> I guess it falls to Opera or IE to cast the tie-breaking vote? First one to
> ship something that either matches the spec/WebKit or matches the proposed
> behaviour/Gecko gets to decide what happens? 

I agree with Henri that this seems like a silly way to proceed. However, talking to our parser guys, we like the conceptual model Henri presents and, irrespective of the conceptual model used in the spec, would prefer to implement the behaviour that you would get from fixing this bug.
Comment 12 Adam Barth 2011-02-11 06:59:40 UTC
You're using pretty strong words Henri.

I'd generally prefer if we stopped tinkering with the parser.  I'm not particularly interested in continuing to chase a moving target.  There are plenty of other things I'd prefer to invest in.
Comment 13 Henri Sivonen 2011-02-14 15:16:35 UTC
(In reply to comment #12)
> I'm not
> particularly interested in continuing to chase a moving target.  There are
> plenty of other things I'd prefer to invest in.

Me, too. That's why I fixed bugs precognizantly when it became abundantly clear that the approach in the spec caused always one more deviation from the intended conceptual model.

As for deliberately not fixing the spec now, I'm not interested in investing in implementing behaviors that are known to be spec bugs.
Comment 14 Henri Sivonen 2011-02-24 10:49:07 UTC
Another case to consider:

#data
<table><tr><td><svg><desc><td></desc><circle>
#errors
7: Start tag seen without seeing a doctype first. Expected <!DOCTYPE html>.
30: A table cell was implicitly closed, but there were open elements.
26: Unclosed element desc.
20: Unclosed element svg.
37: Stray end tag desc.
45: End of file seen and there were open elements.
45: Unclosed element circle.
7: Unclosed element table.
#document
| <html>
|   <head>
|   <body>
|     <table>
|       <tbody>
|         <tr>
|           <td>
|             <svg svg>
|               <svg desc>
|           <td>
|             <circle>

The above is the expectation with this spec bug fixed. The current spec text doesn't make <td> break out all the way.
Comment 15 Ian 'Hixie' Hickson 2011-02-28 23:05:57 UTC
(In reply to comment #10)
> 
> That's a lousy way to deal with this. Basically, you are leaving a spec bug in
> and seeing if implementors take the trouble to realize that the spec is wrong.

No, I'm trying to minimise spec churn by avoiding making non-critical changes to a stable part of the spec.


>  1) Admit that it was a bad idea to prematurely spec optimizations instead of
> speccing the intended conceptual model.

I agree that it turns out that the spec model is suboptimal, though before this bug nobody, including you, had shown this to be the case.

I do not think the current model is a premature optimisation (or indeed an optimisation at all). It's just a different way of approaching a particularly thorny problem.


>  2) Edit the spec to describe the intended conceptual model (which is already
> implemented in Gecko in order to avoid having to tweak the code every time we
> find yet another failure of your optimization to match the conceptual model).

With all due respect, what Gecko does _isn't_ the intended conceptual model. The intended conceptual modal was the (now known to be suboptimal) model in the spec.


(In reply to comment #11)
> 
> talking to our parser guys, we like the conceptual model Henri presents and,
> irrespective of the conceptual model used in the spec, would prefer to
> implement the behaviour that you would get from fixing this bug.

Thanks, that's good information.
Comment 16 Ian 'Hixie' Hickson 2011-02-28 23:56:18 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: More browsers are planning on implementing the proposal than the spec, so the spec must be changed.

Please review this diff very carefully, as it is a pretty major and risky change to very subtle requirements. I will try to prioritise fixing any new problems this change introduces.
Comment 17 contributor 2011-02-28 23:56:41 UTC
Checked in as WHATWG revision r5920.
Check-in comment: Change how MathML and SVG are parsed in text/html: use a three-level tree constructor design instead of the two-level design we had before.
http://html5.org/tools/web-apps-tracker?from=5919&to=5920
Comment 18 Henri Sivonen 2011-03-01 15:21:17 UTC
(In reply to comment #16)
> Status: Accepted

Thanks.

> Please review this diff very carefully, as it is a pretty major and risky
> change to very subtle requirements. I will try to prioritise fixing any new
> problems this change introduces.

Did you intend to change the handling of character tokens when the current node is an HTML integration point? (Why?)
Comment 19 Henri Sivonen 2011-03-21 12:03:34 UTC
(In reply to comment #18)
> Did you intend to change the handling of character tokens when the current node
> is an HTML integration point? (Why?)

Hmm. Looks like the handling of character tokens in Gecko was non-compliant before this change and there wasn't a change to spec requirements.
Comment 20 Michael[tm] Smith 2011-08-04 05:05:11 UTC
mass-moved component to LC1