Bugzilla – Bug 6129
[Ser30] Extend HTML output method to support HTML5
Last modified: 2012-09-10 10:06:47 UTC
The HTML5 working draft introduces a few changes to HTML4 syntax that make it impossible to emit valid HTML5 output using the HTML output method.
1) New !DOCTYPE doesn't use PUBLIC/SYSTEM.
HTML5 documents should start with <!DOCTYPE HTML>
However, such output can be produced only by resorting to d-o-e (disable-output-escaping) in XSLT. Output of the !DOCTYPE is triggered by the presence of the doctype-public/doctype-system serialization parameters, but their presence leads to a spurious PUBLIC/SYSTEM part inside the <!DOCTYPE HTML> declaration.
The latest HTML5 WG draft removes this problem by allowing a special !DOCTYPE for content produced by XSLT, in the form
<!DOCTYPE HTML PUBLIC "XSLT-compat">
Getting this into the spec wasn't easy and there is still some pushback. Anyway, this makes it possible to use XSLT 1.0 and 2.0 to output a subset of HTML5. The following issues will require more complex changes to the HTML output method.
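The DOCTYPE behaviour described in item 1 can be modelled roughly as follows. This is only a sketch in Python, not the code of any real serializer: the function name doctype_declaration and its parameter names are hypothetical, and the logic just mirrors what the report describes (a declaration is emitted only when doctype-public or doctype-system is present).

```python
from typing import Optional


def doctype_declaration(doctype_public: Optional[str] = None,
                        doctype_system: Optional[str] = None) -> str:
    """Hypothetical model of how the HTML output method builds the DOCTYPE.

    Nothing is emitted unless at least one of the two parameters is set,
    so a bare <!DOCTYPE HTML> can never be produced this way.
    """
    if doctype_public is None and doctype_system is None:
        return ""  # no parameter present, so no declaration is triggered
    parts = ["<!DOCTYPE HTML"]
    if doctype_public is not None:
        parts.append(f'PUBLIC "{doctype_public}"')
        if doctype_system is not None:
            parts.append(f'"{doctype_system}"')
    else:
        parts.append(f'SYSTEM "{doctype_system}"')
    return " ".join(parts) + ">"


# With the compatibility string, the output matches the form allowed for XSLT.
print(doctype_declaration(doctype_public="XSLT-compat"))
# <!DOCTYPE HTML PUBLIC "XSLT-compat">
```

Under this model there is no combination of arguments that yields <!DOCTYPE HTML> on its own, which is why the compatibility form is needed.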
2) New empty elements have been introduced, but the list is probably not final yet.
The list of empty elements should thus be updated, or alternatively a new serialization parameter such as empty-elements should be created to give the user control over this serialization property.
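To illustrate what a user-controllable list might look like, here is a small Python sketch. The empty_elements argument stands in for the suggested empty-elements parameter and is purely hypothetical, as is the function name; the default set is the HTML 4.01 list of empty elements, and "source" is used only as an illustrative HTML5 addition since the final list was not settled.

```python
# Empty elements defined by HTML 4.01.
HTML4_EMPTY = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def serialize_childless(name: str, empty_elements=HTML4_EMPTY) -> str:
    """Serialize an element that has no children and no attributes.

    Elements named in `empty_elements` get a start tag only; all others
    get an explicit end tag, as the HTML output method requires.
    """
    if name.lower() in empty_elements:
        return f"<{name}>"
    return f"<{name}></{name}>"


print(serialize_childless("br"))                              # <br>
print(serialize_childless("source"))                          # <source></source>
# A user-extended list (the hypothetical empty-elements parameter):
print(serialize_childless("source", HTML4_EMPTY | {"source"}))  # <source>
```

Keeping the list as data rather than hard-coding it is what would let a stylesheet author track changes in the draft without waiting for a new serializer.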
3) All HTML5 elements are in the http://www.w3.org/1999/xhtml namespace in order to get consistent DOM behaviour for HTML and XHTML content.
The current HTML output method gives special treatment to elements with a "non-null namespace URI". This has to be changed to exclude elements in the XHTML namespace; such elements should be output as if they were in no namespace (no prefix and no default namespace other than XHTML).
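A rough Python sketch of the proposed naming rule, using the standard library's ElementTree, which stores names as "{namespace-uri}local-name". The helper html_output_name and the urn:example:other namespace are made up for the illustration; returning None simply marks "not treated as an HTML element" and is not meant to describe what a serializer would actually emit in that case.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


def html_output_name(tag: str) -> Optional[str]:
    """Return the name to emit for an element treated as HTML, else None.

    `tag` uses ElementTree's "{namespace-uri}local-name" notation.
    """
    if not tag.startswith("{"):
        return tag                    # no namespace: already an HTML element
    uri, local = tag[1:].split("}", 1)
    if uri == XHTML_NS:
        return local                  # proposed change: emit the local name only
    return None                       # other namespaces keep their special treatment


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:x="urn:example:other"><body><p/><x:note/></body></html>'
)
names = [html_output_name(el.tag) for el in doc.iter()]
print(names)  # ['html', 'body', 'p', None]
```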
4) SVG and MathML content needs special handling.
It is not settled yet, but HTML5 will provide a mechanism for including SVG and MathML fragments. Because HTML5 does not use XML namespaces, some special treatment will be necessary when serializing those languages (such as outputting only local names for elements from the SVG and MathML namespaces).
There are probably a few more issues, but they are not that important now.
I think the cleanest approach to accommodating an HTML5 output method is to say that if version >= 5 then the additional serialization rules (outlined above) have to be applied.
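As a sketch of such a version switch in Python: the function name html5_rules_apply is hypothetical, and the threshold is read here as "version 5 or later", which is an assumption about the intent, since version="5" itself must select the new rules.

```python
from decimal import Decimal


def html5_rules_apply(version: str) -> bool:
    """Decide whether the additional HTML5 serialization rules are used.

    Assumes the version parameter is a plain decimal string such as
    "4.0", "4.01" or "5"; anything else raises decimal.InvalidOperation.
    """
    return Decimal(version) >= 5


for v in ("4.0", "4.01", "5", "5.0"):
    print(v, html5_rules_apply(v))
# 4.0 False
# 4.01 False
# 5 True
# 5.0 True
```

Decimal is used instead of float so that values like "4.01" compare exactly as written.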
Another problem is that HTML5 is just a draft, not a recommendation. How should this be handled? Should the HTML5 output method be left out of the serialization spec and put into a working draft or note, to give implementors some baseline? After HTML5 reaches recommendation status, this note could be incorporated directly into the next version of the serialization spec.
Interestingly, the XSLT 1.0 specification seems to be much more accommodating here than XSLT 2.0/Serialization 1.0.
In XSLT 1.0 you can say <xsl:output method="html" version="5"/> and the spec says pretty clearly what it's intended to mean, and leaves it to implementors to make it work. Everything XSLT 1.0 says about serialization is a "should".
In Serialization 1.0 there are potentially conflicting requirements: if you use method="html" version="5" (and the implementation supports it) then the output MUST conform to HTML5, but it also MUST meet lots of prescriptive rules for the format of the output that may well conflict with HTML5.
So I think there's scope for an erratum to Serialization 1.0 to clarify that in the case of future versions of XML, XHTML, or HTML not yet standardized, anything in the spec for the target language overrides any requirements of the serialization spec in the case of conflict.
But I think the real issue with the DOCTYPE declaration is not the XSLT 1.0 spec, it's the legacy XSLT 1.0 processors. The spec allows them to support HTML5, but many of the implementations of XSLT 1.0 aren't going to change in a hurry.
As for Serialization 1.1, it's difficult to build in support for HTML5 in its current state. Perhaps one could go so far as a non-normative appendix to say this is what HTML5 support might look like if the HTML5 spec doesn't change. But that would be going well beyond what specs normally do.
See also bug 6732, which proposes changes to Serialization 1.0 that would make the behavior of a serializer that supports HTML 5.0 or any other version of HTML or XML that does not currently exist entirely implementation-defined.
This will be addressed in the next public working draft of Serialization 3.0