Bugzilla – Bug 6723
[Ser] No rule about empty <p> elements in HTML serialization
Last modified: 2010-06-29 13:50:43 UTC
This is from a post today by Matthias Branter on the x-query.com talk list.
There appears to be no rule in the HTML serialization spec that says empty <div>, <p>, or <script> elements should be serialized as <div></div>, <p></p>, etc., rather than <div/> or <p/>.
There isn't even a fallback rule, as there was in XSLT 1.0, that says "do what the HTML specification says."
We should tackle this either with a specific rule about serializing these empty (but not EMPTY) elements, or with a general rule that says elements in no namespace should be serialized in accordance with the rules of the HTML specification whenever possible.
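To make the difference concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not the XSLT/XQuery serializer this bug is about; it is only a convenient stand-in that happens to offer both an XML and an HTML output method. The helper name serialize is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def serialize(tag, method):
    """Serialize a childless element using the given ElementTree output method."""
    return ET.tostring(ET.Element(tag), method=method).decode("ascii")


# XML-style output collapses a childless element into an empty-element tag.
xml_p = serialize("p", "xml")

# HTML-style output keeps the start/end tag pair for an element such as <p>,
# and writes a lone start tag only for an element declared EMPTY, such as <br>.
html_p = serialize("p", "html")
html_br = serialize("br", "html")

print(xml_p, html_p, html_br)
```

The first result is the form a browser parsing the output as HTML is likely to misread; the second and third are the forms the HTML specification expects.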
A few general comments: I agree that the Serialization recommendation needs to be at least slightly more prescriptive about the requirements it places on a serializer with respect to the HTML output method. We took care to ensure that a serializer was not required to serialize valid HTML if presented with a result tree that could not be serialized as valid HTML - but in doing so, we forgot to say that if it was possible to serialize the result as valid HTML, the serializer should do so.
Having said that, I want to point out that the second item of section 4 of the Serialization recommendation states that the markup generation phase of serialization produces start and end element tags for the XML, HTML and XHTML output methods, but it only produces empty element tags for the XML and XHTML output methods. That should make it clear that a P element or a DIV element with empty content should be serialized as <P></P> and <DIV></DIV> respectively (or <P> and <DIV>, if the context allows the end tag to be omitted), and never as <P/> or <DIV/>.
However, it's probably a bit hard to spot there. I would like to propose the following changes:
At the end of the third paragraph of section 7 of the Serialization recommendation, add the following sentence: "If the result tree is valid HTML, the serializer MUST serialize the result in a way that conforms with the version HTML specified by the version serialization parameter."
After the second paragraph of section 7.1, add the following note: "Note: The markup generation phase of serialization only creates start tags and end tags for the HTML output method, never XML-style empty element tags. As such, a serializer MUST serialize an HTML element that has no children, but whose content model is not empty, using a pair of adjacent start and end element tags, or as a solitary start tag if the permitted by the context."
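The rule in the proposed note can be sketched as follows. This is a rough illustration under stated assumptions, not text from the recommendation: the tree is modelled as (name, children) tuples, attributes are left out, the function name serialize_html is hypothetical, and the set of EMPTY elements is taken from the HTML 4.01 DTD. The sketch always writes the end tag for non-EMPTY elements and does not attempt the optional end-tag omission that the note permits.

```python
from html import escape

# Elements whose content model is EMPTY in HTML 4.01 (assumed target version).
EMPTY_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def serialize_html(node):
    """Serialize a text string or a (name, children) tuple as HTML markup."""
    if isinstance(node, str):
        return escape(node, quote=False)
    name, children = node
    start = f"<{name}>"
    if name.lower() in EMPTY_ELEMENTS:
        # EMPTY content model: a solitary start tag, never <br/> or <br></br>.
        # Any children supplied for such an element are ignored in this sketch.
        return start
    # Every other element gets a start/end tag pair, even with no children,
    # so an empty <p> becomes <p></p> rather than <p/>.
    body = "".join(serialize_html(child) for child in children)
    return f"{start}{body}</{name}>"


doc = ("div", [("p", []), ("br", []), ("p", ["a < b"])])
result = serialize_html(doc)
print(result)
```

Keeping the EMPTY check as a simple set lookup means the empty-element-tag form is never produced at all, which matches the note's point that the HTML output method only ever creates start tags and end tags.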
At their joint teleconference of 2009-08-25, the XQuery and XSL working groups agreed to make the changes proposed by comment #1, with editorial correction of typographical errors in the proposal. As there were few members of the XSL WG present for the call, I will raise this issue separately in an XSL WG call, to ensure there are no objections to this decision.
At its teleconference of 2009-11-12, the XSL WG ratified the decision described in comment #2. This will be Serialization erratum SE.E13.
Michael, you were present at both calls where this was decided, so I assume the decision is acceptable to you. However, I will leave it to you to decide whether to close the bug or check with Matthias Branter that the resolution is acceptable to him.
As it's been seven months since the bug was resolved, I am assuming that Matthias accepts the resolution.