<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>5753</bug_id>
          
          <creation_ts>2008-06-14 09:02:20 +0000</creation_ts>
          <short_desc>parsing issues with legacy UAs</short_desc>
          <delta_ts>2010-10-04 14:49:40 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>pre-LC1 HTML5 spec (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>VERIFIED</bug_status>
          <resolution>WORKSFORME</resolution>
          
          
          <bug_file_loc>http://esw.w3.org/topic/HTML/InterimLegacyBridgingMarkup</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>NoReply</keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>FPWD</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Rob Burns">rob</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>20473</commentid>
    <comment_count>0</comment_count>
    <who name="Rob Burns">rob</who>
    <bug_when>2008-06-14 09:02:20 +0000</bug_when>
    <thetext>For the text/html serialization only:

  many key implementations do not use DTDs or any similar mechanism so they cannot correctly parse unknown HTML elements
  authors want to use the new semantic elements provided by HTML5, but cannot do so if targeted UAs do not properly parse those elements
  routine DOM states cannot be serialized to text/html without loss of data

This interim markup has two separate but related issues:

  content models not supported by the p (paragraph) element
  incorrect parsing for newly introduced elements (parsed either as void, paragraph-terminating or non-paragraph-terminating)

(see http://esw.w3.org/topic/HTML/InterimLegacyBridgingMarkup for evolving solution proposals)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20474</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2008-06-14 09:10:01 +0000</bug_when>
    <thetext>I don&apos;t understand, could you clarify what exactly the problem is? Possibly give an example?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20477</commentid>
    <comment_count>2</comment_count>
    <who name="Rob Burns">rob</who>
    <bug_when>2008-06-14 09:46:30 +0000</bug_when>
    <thetext>Because of the disparate ways UAs currently handle parsing of unknown elements, the tree is constructed in a variety of ways. Also, the text/html serialization does not support the full HTML5 content model.

Imagine an editing UA with the tree

p
 #textnode
 ul
   li
   li
 #textnode

A user wants this serialized to text/html without loss of data so that it can be pasted into an email application and sent to a recipient whose email UA only supports text/html processing. Right now the data is simply lost. That&apos;s just one example, but the problem/issue has wider implications.

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20484</commentid>
    <comment_count>3</comment_count>
    <who name="Lachlan Hunt">lachlan.hunt</who>
    <bug_when>2008-06-14 11:06:01 +0000</bug_when>
    <thetext>(In reply to comment #2)
&gt; Because of the disparate ways UAs currently handle parsing of unknown elements,
&gt; the tree is constructed in a variety of ways.

The spec already defines how the text/html serialisation needs to be parsed into a tree and how to reserialise it.  Unless there is a specific bug with the spec you are wanting to get fixed, simply discussing the way legacy browsers do it today is largely irrelevant.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20487</commentid>
    <comment_count>4</comment_count>
    <who name="Rob Burns">rob</who>
    <bug_when>2008-06-14 11:27:57 +0000</bug_when>
    <thetext>(In reply to comment #3)

&gt; The spec already defines how the text/html serialisation needs to be parsed
&gt; into a tree and how to reserialise it.  Unless there is a specific bug with the
&gt; spec you are wanting to get fixed, simply discussing the way legacy browsers do
&gt; it today is largely irrelevant.

So regardless of legacy UAs and just focussing on HTML5 UAs: 
  How would a UA serialize the DOM tree I gave in the above comment #2 example in a way that could be parsed into a HTML5 text/html processor without loss of data?
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20490</commentid>
    <comment_count>5</comment_count>
    <who name="Lachlan Hunt">lachlan.hunt</who>
    <bug_when>2008-06-14 12:31:27 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; So regardless of legacy UAs and just focussing on HTML5 UAs: 
&gt;   How would a UA serialize the DOM tree I gave in the above comment #2 example
&gt; in a way that could be parsed into a HTML5 text/html processor without loss of
&gt; data?

That is one of the well known differences between HTML and XHTML, and we are very much constrained by our backwards compatibility design principle.  It is not possible to represent all possible documents in each of the three representations: HTML, XHTML and DOM. This is even mentioned in the spec.

http://www.whatwg.org/specs/web-apps/current-work/#html-vs

Unfortunately, we just have to accept that this is not something we have the luxury of being able to fix in all cases.

There is also a section discussing the content model restrictions that apply to the HTML syntax.

http://www.whatwg.org/specs/web-apps/current-work/#element-restrictions

Note that although the specific example of UL inside P that you gave isn&apos;t mentioned in that section, it probably should be and that appears to be a bug in the spec.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20491</commentid>
    <comment_count>6</comment_count>
    <who name="Lachlan Hunt">lachlan.hunt</who>
    <bug_when>2008-06-14 15:10:30 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; Note that although the specific example of UL inside P that you gave isn&apos;t
&gt; mentioned in that section, it probably should be and that appears to be a bug
&gt; in the spec.

Disregard that comment. I somehow misread the P element&apos;s content model.  UL isn&apos;t even allowed inside P.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20498</commentid>
    <comment_count>7</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2008-06-14 18:45:16 +0000</bug_when>
    <thetext>I still don&apos;t understand the problem. A conforming editor couldn&apos;t create that DOM.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20504</commentid>
    <comment_count>8</comment_count>
    <who name="Rob Burns">rob</who>
    <bug_when>2008-06-14 19:05:30 +0000</bug_when>
    <thetext>As the discussion between Lachy and me shows, Henri[1] announced a change to the draft back in December without any decision from the WG. Such a major change to content models should be considered by the entire WG. This bug report suggests a way to fix it that doesn&apos;t require breaking the content models.

[1]: &lt;http://lists.w3.org/Archives/Public/public-html/2007Dec/0231.html&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20505</commentid>
    <comment_count>9</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2008-06-14 19:09:43 +0000</bug_when>
    <thetext>I really have no idea what you&apos;re proposing or what problem you&apos;re trying to solve.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20550</commentid>
    <comment_count>10</comment_count>
    <who name="Rob Burns">rob</who>
    <bug_when>2008-06-16 12:02:51 +0000</bug_when>
    <thetext>The intention here is to address the issue of using new HTML5 semantics in legacy UAs in a way that still parses in legacy UAs to the same hierarchical tree structure (even if the element types are not the same name but instead synonymous names).

It is a better way to address the issue that caused the regress of the draft that removed richer paragraph content models (allowing tables and lists within paragraphs).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>20559</commentid>
    <comment_count>11</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2008-06-16 20:23:07 +0000</bug_when>
    <thetext>(In reply to comment #10)
&gt; The intention here is to address the issue of using new HTML5 semantics in
&gt; legacy UAs in a way that still parses in legacy UAs to the same hierarchical
&gt; tree structure (even if the element types are not the same name but instead
&gt; synonymous names).

If you&apos;re ok with using different element names, then just use &lt;div&gt;. Problem solved.


&gt; It is a better way to address the issue that caused the regress of the draft
&gt; that removed richer paragraph content models (allowing tables and lists within
&gt; paragraphs).

The content models that allowed nested elements were there mostly as an experimental idea, and hadn&apos;t really gotten much thought. They were removed along with a bunch of other things I had been experimenting with when the spec started settling down. The basic reasoning was that there wasn&apos;t much point in allowing it, that authors would likely not greatly appreciate it, and that it would therefore be simpler to continue with HTML4&apos;s content models.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>32946</commentid>
    <comment_count>12</comment_count>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2010-03-14 13:14:11 +0000</bug_when>
    <thetext>This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>35130</commentid>
    <comment_count>13</comment_count>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2010-04-19 09:31:21 +0000</bug_when>
    <thetext>No longer waiting for a reply on this bug.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>