<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>22999</bug_id>
          
          <creation_ts>2013-08-18 20:09:29 +0000</creation_ts>
          <short_desc>Rules for omitting &lt;/p&gt; don&apos;t match the parser</short_desc>
          <delta_ts>2013-11-25 18:43:37 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>HTML</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WORKSFORME</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#optional-tags</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>ian</cc>
    
    <cc>mathias</cc>
    
    <cc>mike</cc>
    
    <cc>zcorpan</cc>
          
          <qa_contact>contributor</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>92238</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2013-08-18 20:09:29 +0000</bug_when>
    <thetext>Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html
Multipage: http://www.whatwg.org/C#optional-tags
Complete: http://www.whatwg.org/c#optional-tags
Referrer: http://www.whatwg.org/specs/web-apps/current-work/multipage/

Comment:
Rules for omitting &lt;/p&gt; don&apos;t match the parser</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92239</commentid>
    <comment_count>1</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2013-08-18 20:21:39 +0000</bug_when>
    <thetext>[[
A p element&apos;s end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul, element,
]]

The parser&apos;s cases that &quot;close a p element&quot;:

A start tag whose tag name is one of: &quot;address&quot;, &quot;article&quot;, &quot;aside&quot;, &quot;blockquote&quot;, &quot;center&quot;, &quot;details&quot;, &quot;dialog&quot;, &quot;dir&quot;, &quot;div&quot;, &quot;dl&quot;, &quot;fieldset&quot;, &quot;figcaption&quot;, &quot;figure&quot;, &quot;footer&quot;, &quot;header&quot;, &quot;hgroup&quot;, &quot;main&quot;, &quot;menu&quot;, &quot;nav&quot;, &quot;ol&quot;, &quot;p&quot;, &quot;section&quot;, &quot;summary&quot;, &quot;ul&quot;
A start tag whose tag name is one of: &quot;h1&quot;, &quot;h2&quot;, &quot;h3&quot;, &quot;h4&quot;, &quot;h5&quot;, &quot;h6&quot;
A start tag whose tag name is one of: &quot;pre&quot;, &quot;listing&quot;
A start tag whose tag name is &quot;form&quot;
A start tag whose tag name is &quot;li&quot;
A start tag whose tag name is one of: &quot;dd&quot;, &quot;dt&quot;
A start tag whose tag name is &quot;plaintext&quot;
A start tag whose tag name is &quot;table&quot;
A start tag whose tag name is &quot;hr&quot;
A start tag whose tag name is &quot;xmp&quot;

The parser&apos;s list includes obsolete elements, but the current list in the spec already has &lt;dir&gt;, which is obsolete.
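
A minimal sketch of the comparison above, in Python, with both lists transcribed from this comment. The names SPEC_OMIT_BEFORE, PARSER_CLOSES_P, parser_only, spec_only and closes_p are made up for illustration and are not from the spec or any library.

```python
# Elements that, per the quoted spec text, may immediately follow a p
# element whose end tag is omitted.
SPEC_OMIT_BEFORE = frozenset("""
    address article aside blockquote dir div dl fieldset footer form
    h1 h2 h3 h4 h5 h6 header hgroup hr main menu nav ol p pre section
    table ul
""".split())

# Start tags for which the parser will "close a p element", per the
# list in this comment.
PARSER_CLOSES_P = frozenset("""
    address article aside blockquote center details dialog dir div dl
    fieldset figcaption figure footer header hgroup main menu nav ol p
    section summary ul
    h1 h2 h3 h4 h5 h6
    pre listing
    form li dd dt plaintext table hr xmp
""".split())


def parser_only():
    """Tags that close a p element in the parser but are not in the spec list."""
    return sorted(PARSER_CLOSES_P - SPEC_OMIT_BEFORE)


def spec_only():
    """Tags in the spec list that the parser list does not cover."""
    return sorted(SPEC_OMIT_BEFORE - PARSER_CLOSES_P)


def closes_p(next_tag):
    """Hypothetical serializer check: would this start tag close an open p?"""
    return next_tag.lower() in PARSER_CLOSES_P


if __name__ == "__main__":
    print("parser only:", parser_only())
    print("spec only:", spec_only())
```

parser_only() lists the start tags that close a p element in the parser but are missing from the quoted spec text. spec_only() should come back empty if the spec list is a subset of the parser list.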

Someone writing a serializer that omits tags might want to know about the obsolete elements and maybe also li/dd/dt even though that doesn&apos;t happen in conforming content.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92250</commentid>
    <comment_count>2</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-08-19 04:15:57 +0000</bug_when>
    <thetext>Yeah, that&apos;s fair enough. Should probably include all the obsolete elements too.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92475</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-08-22 20:25:08 +0000</bug_when>
    <thetext>See also bug 23000 and bug 23001.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95590</commentid>
    <comment_count>4</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-10-30 23:33:18 +0000</bug_when>
    <thetext>*** Bug 23000 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95592</commentid>
    <comment_count>5</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-10-30 23:33:20 +0000</bug_when>
    <thetext>*** Bug 23001 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95594</commentid>
    <comment_count>6</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-10-30 23:50:40 +0000</bug_when>
    <thetext>OK, the things that these three bugs are suggesting are:

- add the non-conforming elements to the list of places you could omit &lt;/p&gt;.
- add the non-conforming combinations of thead/tfoot/tbody to the list of places
  you can omit those tags
- add the non-conforming &lt;head&gt; elements to the list of elements before which
  you cannot omit &lt;body&gt;

The theory is that a conforming serialiser might omit the wrong tag if exposed to non-conforming input.

I think that makes the most sense for the third case. For the first two, it doesn&apos;t let you omit the tag in the non-conforming cases, but that&apos;s ok, right?

I think if we add this we should be explicit that these are non-conforming cases.

(in http://html5.org/r/8248 I made the conforming cases work)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95600</commentid>
    <comment_count>7</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2013-10-31 08:19:46 +0000</bug_when>
    <thetext>(In reply to Ian &apos;Hixie&apos; Hickson from comment #6)
&gt; I think that makes the most sense for the third case. For the first two, it
&gt; doesn&apos;t let you omit the tag in the non-conforming cases, but that&apos;s ok,
&gt; right?

I guess it&apos;s ok from the point of view that it gets parsed correctly. But I still think it&apos;s unexpected to serialize the tag if it can be omitted and the user asked for tags to be omitted.

&gt; I think if we add this we should be explicit that these are non-conforming
&gt; cases.

Sure.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95618</commentid>
    <comment_count>8</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-10-31 17:33:27 +0000</bug_when>
    <thetext>Well, it&apos;s unexpected to be serialising a non-conforming output in the first place. My concern is that if we say &quot;You may omit the &lt;/p&gt; if the element after a paragraph is a &lt;listing&gt; element&quot;, people will read that as &quot;you may use the &lt;listing&gt; element&quot;.

The more I think about this the more I feel like we shouldn&apos;t mention the non-conforming cases at all. I don&apos;t really understand the value here. We&apos;ve already told people that they cannot use &lt;bgsound&gt; in &lt;body&gt;. Why would we remind them that they shouldn&apos;t omit &lt;body&gt; if they start with &lt;bgsound&gt;? They&apos;re not allowed to do that, since they&apos;re not allowed to include &lt;bgsound&gt; in the first place. I mean, if the concern is just that using &lt;bgsound&gt; is going to result in a non-round-tripped DOM, shouldn&apos;t we also say that they should never use &lt;isindex&gt; and &lt;image&gt; tags? If we&apos;re happy saying that the current text — which does indeed say that you can&apos;t use &lt;isindex&gt; and &lt;image&gt; — is enough to avoid those problems, why isn&apos;t the same text enough to avoid the problems with &lt;bgsound&gt;? After all, the same text in fact makes &lt;bgsound&gt; non-conforming in the exact same way.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95661</commentid>
    <comment_count>9</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2013-11-01 09:02:54 +0000</bug_when>
    <thetext>(In reply to Ian &apos;Hixie&apos; Hickson from comment #8)
&gt; Well, it&apos;s unexpected to be serialising a non-conforming output in the first
&gt; place.

If the DOM is non-conforming, it seems quite expected that the serializer outputs something non-conforming, too.

&gt; My concern is that if we say &quot;You may omit the &lt;/p&gt; if the element
&gt; after a paragraph is a &lt;listing&gt; element&quot;, people will read that as &quot;you may
&gt; use the &lt;listing&gt; element&quot;.

So don&apos;t say that. We already agreed to be explicit about it being non-conforming.

&gt; The more I think about this the more I feel like we shouldn&apos;t mention the
&gt; non-conforming cases at all. I don&apos;t really understand the value here. We&apos;ve
&gt; already told people that they cannot use &lt;bgsound&gt; in &lt;body&gt;. Why would we
&gt; remind them that they shouldn&apos;t omit &lt;body&gt; if they start with &lt;bgsound&gt;?

The value is that people can configure their serializer to omit tags and still have the result be parsed the same as if they didn&apos;t omit tags, even for non-conforming DOMs.

&gt; They&apos;re not allowed to do that, since they&apos;re not allowed to include
&gt; &lt;bgsound&gt; in the first place. I mean, if the concern is just that using
&gt; &lt;bgsound&gt; is going to result in a non-round-tripped DOM, shouldn&apos;t we also
&gt; say that they should never use &lt;isindex&gt; and &lt;image&gt; tags? If we&apos;re happy
&gt; saying that the current text — which does indeed say that you can&apos;t use
&gt; &lt;isindex&gt; and &lt;image&gt; — is enough to avoid those problems, why isn&apos;t the
&gt; same text enough to avoid the problems with &lt;bgsound&gt;? After all, the same
&gt; text in fact makes &lt;bgsound&gt; non-conforming in the exact same way.

&lt;isindex&gt; and &lt;image&gt; roundtrip the parse-&gt;serialize-&gt;parse fine. The DOM will be the same.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95717</commentid>
    <comment_count>10</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-11-01 23:01:27 +0000</bug_when>
    <thetext>(In reply to Simon Pieters from comment #9)
&gt; &lt;isindex&gt; and &lt;image&gt; roundtrip the parse-&gt;serialize-&gt;parse fine. The DOM
&gt; will be the same.

But they won&apos;t survive serialise-&gt;parse-&gt;serialise.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>96553</commentid>
    <comment_count>11</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-11-19 22:12:33 +0000</bug_when>
    <thetext>(In reply to Simon Pieters from comment #9)
&gt; (In reply to Ian &apos;Hixie&apos; Hickson from comment #8)
&gt; &gt; Well, it&apos;s unexpected to be serialising a non-conforming output in the first
&gt; &gt; place.
&gt; 
&gt; If the DOM is non-conforming, it seems quite expected that the
&gt; serializer outputs something non-conforming, too.

It&apos;s not expected that the DOM be non-conforming in software that is outputting HTML. Indeed, it&apos;s non-conforming for the DOM to be non-conforming. :-)


&gt; &gt; The more I think about this the more I feel like we shouldn&apos;t mention the
&gt; &gt; non-conforming cases at all. I don&apos;t really understand the value here. We&apos;ve
&gt; &gt; already told people that they cannot use &lt;bgsound&gt; in &lt;body&gt;. Why would we
&gt; &gt; remind them that they shouldn&apos;t omit &lt;body&gt; if they start with &lt;bgsound&gt;?
&gt; 
&gt; The value is that people can configure their serializer to omit tags and
&gt; still have the result be parsed the same as if they didn&apos;t omit tags, even
&gt; for non-conforming DOMs.

There&apos;s no way you can guarantee a round-trippable DOM if you start with a non-conforming DOM. If your DOM starts, for example, with a comment that contains &quot;--&gt;&quot;, or with an &lt;hr&gt; which has child elements, or with a &lt;div&gt; element before the &lt;head&gt;, or any number of other weird cases, you&apos;re not going to round-trip.

I just don&apos;t see the value here.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>96562</commentid>
    <comment_count>12</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2013-11-19 22:48:27 +0000</bug_when>
    <thetext>(In reply to Ian &apos;Hixie&apos; Hickson from comment #11)
&gt; There&apos;s no way you can guarantee a round-trippable DOM if you start with a
&gt; non-conforming DOM.

Right.

&gt; If your DOM starts, for example, with a comment that
&gt; contains &quot;--&gt;&quot;, or with an &lt;hr&gt; which has child elements, or with a &lt;div&gt;
&gt; element before the &lt;head&gt;, or any number of other weird cases, you&apos;re not
&gt; going to round-trip.

But the parsed result in those cases will be the same whether you omit optional tags or not.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>96708</commentid>
    <comment_count>13</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-11-22 18:21:58 +0000</bug_when>
    <thetext>No, it won&apos;t, not necessarily.

As an extreme example, take this DOM:

   #document
      |
      +-- #comment: &quot;--&gt;&lt;plaintext&gt;&quot;
      |
      +-- &lt;html&gt;
            |
            +-- &lt;head&gt;
            |
            +-- &lt;body&gt;
                  |
                  +-- &lt;div&gt;

If you omit tags, the result of parsing will be this DOM:

   #document
      |
      +-- #comment: &quot;&quot;
      |
      +-- &lt;plaintext&gt;
            |
            +-- #text: &quot;&lt;div&gt;&lt;/div&gt;&quot;

If you don&apos;t omit tags, it&apos;ll be:

   #document
      |
      +-- #comment: &quot;&quot;
      |
      +-- &lt;plaintext&gt;
            |
            +-- #text: &quot;&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>96744</commentid>
    <comment_count>14</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2013-11-23 10:47:04 +0000</bug_when>
    <thetext>Hmm, yeah OK. Do as you wish.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>96799</commentid>
    <comment_count>15</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-11-25 18:43:37 +0000</bug_when>
    <thetext>Ok. In that case, I&apos;m closing this since I think the issues with conforming markup were fixed already.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>