5753 2008-06-14 09:02:20 +0000 parsing issues with legacy UAs 2010-10-04 14:49:40 +0000 1 1 1 Unclassified HTML WG pre-LC1 HTML5 spec (editor: Ian Hickson) unspecified All All VERIFIED WORKSFORME http://esw.w3.org/topic/HTML/InterimLegacyBridgingMarkup NoReply P2 normal FPWD 1 rob ian mike public-html-admin public-html-wg-issue-tracking public-html-bugzilla oldest_to_newest

20473 0 rob 2008-06-14 09:02:20 +0000
For the text/html serialization only:
  many key implementations do not use DTDs or any similar mechanism, so they cannot correctly parse unknown HTML elements
  authors want to use the new semantic elements provided in HTML5, but cannot do so if targeted UAs do not properly parse those elements
  routine DOM states cannot be serialized to text/html without loss of data

This interim markup has two separate but related issues:
  content models not supported by the p (paragraph) element
  incorrect parsing for newly introduced elements (parsed either as void, paragraph-terminating or non-paragraph-terminating)

(see http://esw.w3.org/topic/HTML/InterimLegacyBridgingMarkup for evolving solution proposals)

20474 1 ian 2008-06-14 09:10:01 +0000
I don't understand, could you clarify what exactly the problem is? Possibly give an example?

20477 2 rob 2008-06-14 09:46:30 +0000
Because of the disparate ways UAs currently handle parsing of unknown elements, the tree is constructed in a variety of ways. Also, the text/html serialization does not support the full HTML5 content model.

Imagine an editing UA with the tree

p #textnode ul li li #textnode

A user wants this serialized to text/html without loss of data so that it can be pasted into an email application and sent to a recipient whose email UA only supports text/html processing. Right now the data is simply lost.

That's just one example, but the problem/issue has wider implications.
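The data loss described in comment 2 can be illustrated with a minimal sketch. It is not the HTML5 tree construction algorithm: it models only one rule, that certain start tags implicitly close an open p element. The nesting chosen for TREE is one possible reading of the flattened tree in the comment, and the names TREE, CLOSES_P, SketchTreeBuilder, serialize and parse are hypothetical, introduced here for illustration.

```python
from html.parser import HTMLParser

# Nodes are (tag, children) tuples; text nodes are plain strings.
# ASSUMPTION: one possible reading of "p #textnode ul li li #textnode".
TREE = ("p", ["intro", ("ul", [("li", []), ("li", ["item"])])])

# ASSUMPTION: a small illustrative subset of the elements whose start tag
# implicitly closes an open <p> in text/html parsing.
CLOSES_P = {"ul", "ol", "table", "div", "p"}


def serialize(node):
    """Naively serialize a node to text/html markup (no escaping)."""
    if isinstance(node, str):
        return node
    tag, children = node
    return f"<{tag}>" + "".join(serialize(c) for c in children) + f"</{tag}>"


class SketchTreeBuilder(HTMLParser):
    """Toy tree builder that models only the implicit closing of <p>."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # The one rule under study: a block-level start tag closes an open <p>.
        if tag in CLOSES_P and self.stack[-1][0] == "p":
            self.stack.pop()
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, never the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return
        # Stray end tags are ignored in this sketch. A real HTML5 parser
        # handles a stray </p> differently (it inserts an empty p element).

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def parse(markup):
    builder = SketchTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root[1]


print(serialize(TREE))
# <p>intro<ul><li></li><li>item</li></ul></p>
print(parse(serialize(TREE)))
# [('p', ['intro']), ('ul', [('li', []), ('li', ['item'])])]
```

In this model the ul comes back as a sibling of the p rather than its child, so the original nesting is not recoverable from the markup alone.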

20484 3 lachlan.hunt 2008-06-14 11:06:01 +0000
(In reply to comment #2)
> Because of the disparate ways UAs currently handle parsing of unknown elements,
> the tree is constructed in a variety of ways.

The spec already defines how the text/html serialisation needs to be parsed into a tree and how to reserialise it. Unless there is a specific bug with the spec you are wanting to get fixed, simply discussing the way legacy browsers do it today is largely irrelevant.

20487 4 rob 2008-06-14 11:27:57 +0000
(In reply to comment #3)
> The spec already defines how the text/html serialisation needs to be parsed
> into a tree and how to reserialise it. Unless there is a specific bug with the
> spec you are wanting to get fixed, simply discussing the way legacy browsers do
> it today is largely irrelevant.

So regardless of legacy UAs and just focussing on HTML5 UAs: How would a UA serialize the DOM tree I gave in the above comment #2 example in a way that could be parsed into a HTML5 text/html processor without loss of data?

20490 5 lachlan.hunt 2008-06-14 12:31:27 +0000
(In reply to comment #4)
> So regardless of legacy UAs and just focussing on HTML5 UAs:
> How would a UA serialize the DOM tree I gave in the above comment #2 example
> in a way that could be parsed into a HTML5 text/html processor without loss of
> data?

That is one of the well known differences between HTML and XHTML, and we are very much constrained by our backwards compatibility design principle. It is not possible to represent all possible documents in each of the three representations: HTML, XHTML and DOM. This is even mentioned in the spec.
http://www.whatwg.org/specs/web-apps/current-work/#html-vs

Unfortunately, we just have to accept that this is not something we have the luxury of being able to fix in all cases.

There is also a section discussing the content model restrictions that apply to the HTML syntax.
http://www.whatwg.org/specs/web-apps/current-work/#element-restrictions

Note that although the specific example of UL inside P that you gave isn't mentioned in that section, it probably should be and that appears to be a bug in the spec.

20491 6 lachlan.hunt 2008-06-14 15:10:30 +0000
(In reply to comment #5)
> Note that although the specific example of UL inside P that you gave isn't
> mentioned in that section, it probably should be and that appears to be a bug
> in the spec.

Disregard that comment. I somehow misread the P element's content model. UL isn't even allowed inside P.

20498 7 ian 2008-06-14 18:45:16 +0000
I still don't understand the problem. A conforming editor couldn't create that DOM.

20504 8 rob 2008-06-14 19:05:30 +0000
As the discussion between Lachy and me shows, Henri[1] announced a change to the draft back in December without any decision from the WG. Such a major change to content models should be considered by the entire WG. This bug report suggests a way to fix it that doesn't require breaking the content models.

[1]: <http://lists.w3.org/Archives/Public/public-html/2007Dec/0231.html>

20505 9 ian 2008-06-14 19:09:43 +0000
I really have no idea what you're proposing or what problem you're trying to solve.

20550 10 rob 2008-06-16 12:02:51 +0000
The intention here is to address the issue of using new HTML5 semantics in legacy UAs in a way that still parses in legacy UAs to the same hierarchical tree structure (even if the element types are not the same name but instead synonymous names).

It is a better way to address the issue that caused the regress of the draft that removed richer paragraph content models (allowing tables and lists within paragraphs).

20559 11 ian 2008-06-16 20:23:07 +0000
(In reply to comment #10)
> The intention here is to address the issue of using new HTML5 semantics in
> legacy UAs in a way that still parses in legacy UAs to the same hierarchical
> tree structure (even if the element types are not the same name but instead
> synonymous names).

If you're ok with using different element names, then just use <div>. Problem solved.

> It is a better way to address the issue that caused the regress of the draft
> that removed richer paragraph content models (allowing tables and lists within
> paragraphs).

The content models that allowed nested elements were there mostly as an experimental idea, and hadn't really gotten much thought. They were removed along with a bunch of other things I had been experimenting with when the spec started settling down. The basic reasoning was that there wasn't much point allowing it and that authors would likely not greatly appreciate it and that it would therefore be simpler to continue with HTML4's content models.

32946 12 mjs 2010-03-14 13:14:11 +0000
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.

35130 13 mjs 2010-04-19 09:31:21 +0000
No longer waiting for a reply on this bug.
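
The fallback suggested in comment 11 (use <div> when a different element name is acceptable) could be sketched as a rewrite applied to the tree before serialization. This is only an illustration under stated assumptions: BLOCK_LEVEL is a small hypothetical subset of the elements that cannot be children of p, only direct children are inspected, and the function names are invented for this example.

```python
# Nodes are (tag, children) tuples; text nodes are plain strings.

# ASSUMPTION: a small illustrative subset of elements not allowed inside <p>.
BLOCK_LEVEL = {"ul", "ol", "table", "div", "p"}


def to_div_if_needed(node):
    """Return a copy of node in which any <p> that has a block-level
    direct child is renamed to <div>."""
    if isinstance(node, str):
        return node
    tag, children = node
    new_children = [to_div_if_needed(child) for child in children]
    has_block_child = any(
        not isinstance(child, str) and child[0] in BLOCK_LEVEL
        for child in new_children
    )
    if tag == "p" and has_block_child:
        tag = "div"
    return (tag, new_children)


def serialize(node):
    """Naively serialize a node to text/html markup (no escaping)."""
    if isinstance(node, str):
        return node
    tag, children = node
    return f"<{tag}>" + "".join(serialize(c) for c in children) + f"</{tag}>"


tree = ("p", ["intro", ("ul", [("li", []), ("li", ["item"])])])
print(serialize(to_div_if_needed(tree)))
# <div>intro<ul><li></li><li>item</li></ul></div>
```

The rewritten markup keeps the nesting, but at the cost of the paragraph semantics: the container is now a generic div, which is the trade-off comment 10 was trying to avoid.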