22999 – Rules for omitting </p> don't match the parser

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: Rules for omitting </p> don't match the parser

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:

Duplicates (2):	23000 23001
Depends on:
Blocks:

Reported:	2013-08-18 20:09 UTC by contributor
Modified:	2013-11-25 18:43 UTC
CC List:	4 users

See Also:

Description contributor 2013-08-18 20:09:29 UTC

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html
Multipage: http://www.whatwg.org/C#optional-tags
Complete: http://www.whatwg.org/c#optional-tags
Referrer: http://www.whatwg.org/specs/web-apps/current-work/multipage/

Comment:
Rules for omitting </p> don't match the parser

Posted from: 90.230.218.37
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.49 Safari/537.36 OPR/16.0.1196.45 (Edition Next)

Comment 1 Simon Pieters 2013-08-18 20:21:39 UTC

[[
A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul, element,
]]

The parser's cases that "close a p element":

A start tag whose tag name is one of: "address", "article", "aside", "blockquote", "center", "details", "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure", "footer", "header", "hgroup", "main", "menu", "nav", "ol", "p", "section", "summary", "ul"
A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
A start tag whose tag name is one of: "pre", "listing"
A start tag whose tag name is "form"
A start tag whose tag name is "li"
A start tag whose tag name is one of: "dd", "dt"
A start tag whose tag name is "plaintext"
A start tag whose tag name is "table"
A start tag whose tag name is "hr"
A start tag whose tag name is "xmp"
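To make the mismatch concrete, here is a small sketch that transcribes the two lists above into Python sets and takes their differences. The variable names are illustrative only; the element names are copied from the quoted spec text and the parser's list.

```python
# Elements after which the quoted spec text lets you omit </p>.
SPEC_OMIT_P = {
    "address", "article", "aside", "blockquote", "dir", "div", "dl",
    "fieldset", "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6",
    "header", "hgroup", "hr", "main", "menu", "nav", "ol", "p", "pre",
    "section", "table", "ul",
}

# Start tags that make the parser "close a p element", per the list above.
PARSER_CLOSES_P = {
    "address", "article", "aside", "blockquote", "center", "details",
    "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure",
    "footer", "header", "hgroup", "main", "menu", "nav", "ol", "p",
    "section", "summary", "ul",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "pre", "listing", "form", "li", "dd", "dt",
    "plaintext", "table", "hr", "xmp",
}

# Tags that close a p in the parser but are absent from the omission rule.
missing_from_spec = sorted(PARSER_CLOSES_P - SPEC_OMIT_P)
# Tags in the omission rule that the parser does not treat as closing a p.
extra_in_spec = sorted(SPEC_OMIT_P - PARSER_CLOSES_P)

print(missing_from_spec)
# ['center', 'dd', 'details', 'dialog', 'dt', 'figcaption', 'figure',
#  'li', 'listing', 'plaintext', 'summary', 'xmp']
print(extra_in_spec)  # []
```

The difference runs in one direction only: every element in the omission rule also closes a p in the parser, but twelve start tags that close a p are not in the omission rule.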

That includes obsolete elements; the current omission list leaves them out except for <dir>, which is also obsolete.

Someone writing a serializer that omits tags might want to know about the obsolete elements, and maybe also li/dd/dt, even though those cases don't occur in conforming content.
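As an illustration of what such a serializer could do, here is a minimal sketch of a decision helper driven by the parser's list rather than the spec's. The function name `may_omit_p_end_tag` is hypothetical, and it covers only the "immediately followed by an element" half of the </p> rule; the "no more content in the parent element" half is deliberately not modelled.

```python
from typing import Optional

# Start tags that make the parser "close a p element" (the list quoted above).
CLOSES_P = frozenset(
    """address article aside blockquote center details dialog dir div dl
    fieldset figcaption figure footer header hgroup main menu nav ol p
    section summary ul h1 h2 h3 h4 h5 h6 pre listing form li dd dt
    plaintext table hr xmp""".split()
)


def may_omit_p_end_tag(next_sibling_tag: Optional[str]) -> bool:
    """Hypothetical serializer helper.

    next_sibling_tag is the tag name of the p element's immediate next
    sibling when that sibling is an element, otherwise None. Returns True
    when that start tag would make the parser close the open p anyway, so
    dropping </p> should not change the parsed DOM.
    """
    if next_sibling_tag is None:
        # Text, a comment, or nothing follows: the other half of the rule
        # (end of parent) would apply, which this sketch does not model.
        return False
    return next_sibling_tag.lower() in CLOSES_P


print(may_omit_p_end_tag("listing"))  # True, although <listing> is obsolete
print(may_omit_p_end_tag("span"))     # False
```

Using the parser's list means the helper also fires for obsolete and non-conforming follow-ups such as <listing> or <li>, which is exactly the behaviour being asked about here.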

Comment 2 Ian 'Hixie' Hickson 2013-08-19 04:15:57 UTC

Yeah, that's fair enough. Should probably include all the obsolete elements too.

Comment 3 Ian 'Hixie' Hickson 2013-08-22 20:25:08 UTC

See also bug 23000 and bug 23001.

Comment 4 Ian 'Hixie' Hickson 2013-10-30 23:33:18 UTC

*** Bug 23000 has been marked as a duplicate of this bug. ***

Comment 5 Ian 'Hixie' Hickson 2013-10-30 23:33:20 UTC

*** Bug 23001 has been marked as a duplicate of this bug. ***

Comment 6 Ian 'Hixie' Hickson 2013-10-30 23:50:40 UTC

OK, the things these three bugs are suggesting are:

- add the non-conforming elements to the list of places you could omit </p>.
- add the non-conforming combinations of thead/tfoot/tbody to the list of places
  you can omit those tags
- add the non-conforming <head> elements to the list of elements before which
  you cannot omit <body>
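For the third suggestion, a sketch of the <body> start-tag check might look like the following. The set `HEAD_CAPTURED` is an assumption: it is my reading of the start tags that the parser's "after head" insertion mode sends back into <head>, including non-conforming ones such as bgsound and basefont, and it is not text from this bug. The function name and its `kind`/`value` parameters are likewise hypothetical.

```python
from typing import Optional

# ASSUMPTION: start tags that the "after head" insertion mode inserts into
# <head> instead of opening <body>. Verify against the parsing spec.
HEAD_CAPTURED = frozenset(
    "base basefont bgsound link meta noframes script style template title".split()
)
ASCII_WS = " \t\n\f\r"


def may_omit_body_start_tag(kind: Optional[str], value: str = "") -> bool:
    """Hypothetical check for omitting <body>.

    kind is None for an empty body, otherwise "element", "text" or
    "comment" describing the body's first child; value is the tag name
    (for elements) or the text data (for text).
    """
    if kind is None:
        return True  # empty body: the parser creates it implicitly
    if kind == "comment":
        return False  # a comment here would end up outside <body>
    if kind == "text":
        # Leading whitespace would not start <body>, so it must not come first.
        return not (value and value[0] in ASCII_WS)
    return value.lower() not in HEAD_CAPTURED


print(may_omit_body_start_tag("element", "bgsound"))  # False
print(may_omit_body_start_tag("element", "div"))      # True
```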

The theory is that a conforming serialiser might omit the wrong tag if exposed to non-conforming input.

I think that makes the most sense for the third case. For the first two, it doesn't let you omit the tag in the non-conforming cases, but that's ok, right?

I think if we add this we should be explicit that these are non-conforming cases.

(in http://html5.org/r/8248 I made the conforming cases work)

Comment 7 Simon Pieters 2013-10-31 08:19:46 UTC

(In reply to Ian 'Hixie' Hickson from comment #6)
> I think that makes the most sense for the third case. For the first two, it
> doesn't let you omit the tag in the non-conforming cases, but that's ok,
> right?

I guess it's ok from the point of view that it gets parsed correctly. But I still think it's unexpected to serialize the tag if it can be omitted and the user asked for tags to be omitted.

> I think if we add this we should be explicit that these are non-conforming
> cases.

Sure.

Comment 8 Ian 'Hixie' Hickson 2013-10-31 17:33:27 UTC

Well, it's unexpected to be serialising a non-conforming output in the first place. My concern is that if we say "You may omit the </p> if the element after a paragraph is a <listing> element", people will read that as "you may use the <listing> element".

The more I think about this the more I feel like we shouldn't mention the non-conforming cases at all. I don't really understand the value here. We've already told people that they cannot use <bgsound> in <body>. Why would we remind them that they shouldn't omit <body> if they start with <bgsound>? They're not allowed to do that, since they're not allowed to include <bgsound> in the first place. I mean, if the concern is just that using <bgsound> is going to result in a non-round-tripped DOM, shouldn't we also say that they should never use <isindex> and <image> tags? If we're happy saying that the current text — which does indeed say that you can't use <isindex> and <image> — is enough to avoid those problems, why isn't the same text enough to avoid the problems with <bgsound>? After all, the same text in fact makes <bgsound> non-conforming in the exact same way.

Comment 9 Simon Pieters 2013-11-01 09:02:54 UTC

(In reply to Ian 'Hixie' Hickson from comment #8)
> Well, it's unexpected to be serialising a non-conforming output in the first
> place.

If the DOM is non-conforming, it seems quite expected that the serializer outputs something non-conforming, too.

> My concern is that if we say "You may omit the </p> if the element
> after a paragraph is a <listing> element", people will read that as "you may
> use the <listing> element".

So don't say that. We already agreed to be explicit about it being non-conforming.

> The more I think about this the more I feel like we shouldn't mention the
> non-conforming cases at all. I don't really understand the value here. We've
> already told people that they cannot use <bgsound> in <body>. Why would we
> remind them that they shouldn't omit <body> if they start with <bgsound>?

The value is that people can configure their serializer to omit tags and still have the result be parsed the same as if they didn't omit tags, even for non-conforming DOMs.

> They're not allowed to do that, since they're not allowed to include
> <bgsound> in the first place. I mean, if the concern is just that using
> <bgsound> is going to result in a non-round-tripped DOM, shouldn't we also
> say that they should never use <isindex> and <image> tags? If we're happy
> saying that the current text — which does indeed say that you can't use
> <isindex> and <image> — is enough to avoid those problems, why isn't the
> same text enough to avoid the problems with <bgsound>? After all, the same
> text in fact makes <bgsound> non-conforming in the exact same way.

<isindex> and <image> roundtrip the parse->serialize->parse fine. The DOM will be the same.

Comment 10 Ian 'Hixie' Hickson 2013-11-01 23:01:27 UTC

(In reply to Simon Pieters from comment #9)
> <isindex> and <image> roundtrip the parse->serialize->parse fine. The DOM
> will be the same.

But they won't survive serialise->parse->serialise.
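The asymmetry between the two round trips can be shown with a toy model. This is not a real HTML parser: the hypothetical `parse` and `serialize` functions below treat a document as a flat list of element names and model only one real parser quirk, namely that an <image> start tag produces an img element.

```python
import re


def parse(markup: str) -> list[str]:
    """Toy parser: element names in order; models only the image -> img rename."""
    names = re.findall(r"<([a-z]+)>", markup)
    return ["img" if name == "image" else name for name in names]


def serialize(dom: list[str]) -> str:
    """Toy serializer: writes each element's own name as a start tag."""
    return "".join(f"<{name}>" for name in dom)


# parse -> serialize -> parse: the DOM is stable.
dom1 = parse("<image>")            # ['img']
dom2 = parse(serialize(dom1))      # ['img']
print(dom1 == dom2)                # True

# serialise -> parse -> serialise: a DOM that really contains an element
# named "image" (built through DOM APIs, say) does not survive.
markup1 = serialize(["image"])         # '<image>'
markup2 = serialize(parse(markup1))    # '<img>'
print(markup1 == markup2)              # False
```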

Comment 11 Ian 'Hixie' Hickson 2013-11-19 22:12:33 UTC

(In reply to Simon Pieters from comment #9)
> (In reply to Ian 'Hixie' Hickson from comment #8)
> > Well, it's unexpected to be serialising a non-conforming output in the first
> > place.
> 
> If the DOM is non-conforming, it seems quite expected that the
> serializer outputs something non-conforming, too.

It's not expected that the DOM be non-conforming in software that is outputting HTML. Indeed, it's non-conforming for the DOM to be non-conforming. :-)


> > The more I think about this the more I feel like we shouldn't mention the
> > non-conforming cases at all. I don't really understand the value here. We've
> > already told people that they cannot use <bgsound> in <body>. Why would we
> > remind them that they shouldn't omit <body> if they start with <bgsound>?
> 
> The value is that people can configure their serializer to omit tags and
> still have the result be parsed the same as if they didn't omit tags, even
> for non-conforming DOMs.

There's no way you can guarantee a round-trippable DOM if you start with a non-conforming DOM. If your DOM starts, for example, with a comment that contains "-->", or with an <hr> which has child elements, or with a <div> element before the <head>, or any number of other weird cases, you're not going to round-trip.

I just don't see the value here.

Comment 12 Simon Pieters 2013-11-19 22:48:27 UTC

(In reply to Ian 'Hixie' Hickson from comment #11)
> There's no way you can guarantee a round-trippable DOM if you start with a
> non-conforming DOM.

Right.

> If your DOM starts, for example, with a comment that
> contains "-->", or with an <hr> which has child elements, or with a <div>
> element before the <head>, or any number of other weird cases, you're not
> going to round-trip.

But the parsed result in those cases will be the same whether you omit optional tags or not.

Comment 13 Ian 'Hixie' Hickson 2013-11-22 18:21:58 UTC

No it won't, not necessarily.

As an extreme example, take this DOM:

   #document
      |
      +-- #comment: "--><plaintext>"
      |
      +-- <html>
            |
            +-- <head>
            |
            +-- <body>
                  |
                  +-- <div>

If you omit tags, the result of parsing will be this DOM:

   #document
      |
      +-- #comment: ""
      |
      +-- <plaintext>
            |
            +-- #text: "<div></div>"

If you don't omit tags, it'll be:

   #document
      |
      +-- #comment: ""
      |
      +-- <plaintext>
            |
            +-- #text: "<html><head></head><body><div></div></body></html>"
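The two parses differ because the comment is serialized verbatim, so its "-->" closes it early and the following <plaintext> turns the rest of the document into text. The sketch below builds both serializations of the example DOM and applies a deliberately simplified tokenizer model: the comment ends at the first "-->", and everything after a <plaintext> start tag is raw text. The helper names are hypothetical, and the model ignores the other comment-ending forms the real tokenizer accepts.

```python
def serialize_comment(data: str) -> str:
    # HTML serialization writes comment data as-is between "<!--" and "-->".
    return "<!--" + data + "-->"


def comment_and_plaintext(markup: str) -> tuple[str, str]:
    """Simplified tokenizer model for this example only."""
    start = len("<!--")
    end = markup.index("-->", start)       # comment closes at the first "-->"
    comment_data = markup[start:end]
    text = markup.split("<plaintext>", 1)[1]  # rest of the input is text
    return comment_data, text


data = "--><plaintext>"
omitted = serialize_comment(data) + "<div></div>"
full = serialize_comment(data) + "<html><head></head><body><div></div></body></html>"

for markup in (omitted, full):
    print(comment_and_plaintext(markup))
# ('', '--><div></div>')
# ('', '--><html><head></head><body><div></div></body></html>')
```

In both cases the comment comes back empty and the text content after <plaintext> (which here also includes the leftover "-->") depends on whether the optional tags were written, which is the point being made.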

Comment 14 Simon Pieters 2013-11-23 10:47:04 UTC

Hmm, yeah OK. Do as you wish.

Comment 15 Ian 'Hixie' Hickson 2013-11-25 18:43:37 UTC

Ok. In that case, I'm closing this, since I think the issues with conforming markup were fixed already.