This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 5803 - HTML5 serialization is not compatible with XSLT
Summary: HTML5 serialization is not compatible with XSLT
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Michael[tm] Smith
QA Contact: HTML WG Bugzilla archive list
Reported: 2008-06-25 15:00 UTC by Jirka Kosek
Modified: 2010-10-04 14:57 UTC

Description Jirka Kosek 2008-06-25 15:00:45 UTC
HTML5 requires !DOCTYPE declaration in the following form:

<!DOCTYPE HTML>

i.e., without any public or system identifier. The trouble is that such a !DOCTYPE can't be generated with XSLT. XSLT can either generate no !DOCTYPE (the default behaviour), or you have to specify a public/system identifier, which is appended to the !DOCTYPE and which is what actually triggers generation of the !DOCTYPE.

In order to make it possible to use the widely deployed XSLT language to generate HTML5 content, the HTML5 spec should allow an optional public identifier in the !DOCTYPE. It should be allowed to start an HTML5 document with either:

<!DOCTYPE HTML>

or by something like

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML5//EN">

The latter !DOCTYPE can be generated in XSLT by:

<xsl:output method="html" doctype-public="-//W3C//DTD HTML5//EN"/>

Older discussion about this topic can be found at
http://lists.w3.org/Archives/Public/public-html/2007JanMar/0432.html
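
For illustration, the xsl:output declaration from the description above can be placed in a complete stylesheet. This is only a minimal sketch: the public identifier is the one proposed in this report, the template and page content are hypothetical, and the exact spelling of the emitted DOCTYPE (for example the case of "html") may vary between processors.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Specifying doctype-public is what makes the serializer emit a DOCTYPE.
       The identifier below is the one proposed in this report. -->
  <xsl:output method="html" doctype-public="-//W3C//DTD HTML5//EN"/>

  <!-- Hypothetical template: wraps the text of the input in a minimal page. -->
  <xsl:template match="/">
    <html>
      <head><title>Example</title></head>
      <body><p><xsl:value-of select="."/></p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Without the doctype-public (or doctype-system) attribute, the same stylesheet would produce output with no DOCTYPE at all.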
Comment 1 Ian 'Hixie' Hickson 2008-06-25 19:51:44 UTC
Can't XSLT output arbitrary text, too?
Comment 2 Jirka Kosek 2008-06-25 20:07:19 UTC
(In reply to comment #1)
> Can't XSLT output arbitrary text, too?
> 

No. Although it is possible to use <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE ...&gt;</xsl:text>, the disable-output-escaping feature is optional in XSLT, it is not supported by all processors, and it works only when the output of the XSLT transformation is serialized directly, which is not always the case.
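
A minimal sketch of the disable-output-escaping approach mentioned above, assuming a processor that supports this optional feature and serializes the result directly. The template and page content are hypothetical. On a processor that ignores the attribute, or when the result is passed on as a tree instead of being serialized, the DOCTYPE would come out as escaped text or be lost.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No doctype-public/doctype-system, so the serializer emits no DOCTYPE itself. -->
  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Emit the short DOCTYPE as raw text; relies on the optional
         disable-output-escaping feature being honoured. -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;&#10;</xsl:text>
    <html>
      <head><title>Example</title></head>
      <body><p><xsl:value-of select="."/></p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```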
Comment 3 Ian 'Hixie' Hickson 2008-06-25 20:46:03 UTC
Ah. Well, that's unfortunate. I guess XSLT will have to be updated. In the meantime you can do:

   <xsl:output method="html" doctype-public=""/>

...which will only generate one (minor) parse error. Or you can use something other than XSLT, which would be my advice.
Comment 4 Jirka Kosek 2008-06-25 20:59:08 UTC
(In reply to comment #3)
> Ah. Well, that's unfortunate. I guess XSLT will have to be updated. 

Why should XSLT be updated? Given how much effort is spent on making HTML5 compatible with already deployed technology (web browsers), I don't understand why allowing an *optional* string in the !DOCTYPE is not possible.

> In the
> meantime you can do:
> 
>    <xsl:output method="html" doctype-public=""/>
> 
> ...which will only generate one (minor) parse error. 

I would like to hear what's wrong with allowing 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML5//EN">

With such a provision there would be no parse errors, not even minor ones, in XSLT-generated content.

> Or you can use something
> other than XSLT, which would be my advice.

Sorry Ian, but if you are acting as the HTML5 spec editor here, you should try to improve the spec and its interoperability with other technologies. If you don't like XSLT, simply don't use it. But there are many people who successfully rely on XSLT processing. It looks a little bit cocky to cut those people off from the ability to generate HTML5 output only because you seem to have some sort of XSLT allergy.

I bet that without the possibility of generating valid HTML5 with XSLT 1.0 and 2.0 as currently defined, HTML5 will not pass Last Call at W3C.
Comment 5 Ian 'Hixie' Hickson 2008-06-25 21:16:18 UTC
You can already use XSLT with HTML5 if you really want to, using either:

   <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE ...&gt;</xsl:text>

...or (suffering excess output and a parse error):

   <xsl:output method="html" doctype-public=""/>

...or by just outputting XHTML instead of HTML, or by outputting to a DOM instead of text/html.

This is a limitation of XSLT. There's no reason why we should be bending over backwards here when XSLT could much more easily be changed to support outputting short DOCTYPEs. (Even more so since those are valid XML too.)

Also, note that there are much harder problems to deal with in the HTML syntax than just the DOCTYPE. For instance, the complications around comments in CDATA blocks. If XSLT can deal with those, it can certainly deal with outputting shorter DOCTYPEs.
Comment 6 Jirka Kosek 2008-06-26 06:57:49 UTC
(In reply to comment #5)
> You can already use XSLT with HTML5 if you really want to, using either:
> 
>    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE ...&gt;</xsl:text>

I have explained earlier why this is not a generally working solution.

> 
> ...or (suffering excess output and a parse error):
> 
>    <xsl:output method="html" doctype-public=""/>

I don't see generating invalid markup as a good solution.

> 
> ...or by just outputting XHTML instead of HTML, 

Why should users in this particular case be told whether to use the XML or the HTML serialization? By the same logic we could, for example, allow SVG and MathML only in the XML serialization, and get rid of many problems related to using SVG/MathML in the HTML serialization.

> or by outputting to a DOM
> instead of text/html.

Today's tools (web servers and web browsers) can transfer text/html over HTTP, but not a DOM. So using a DOM is not an option here.

> This is a limitation of XSLT. There's no reason why we should be bending over
> backwards here when XSLT could much more easily be changed to support
> outputting short DOCTYPEs. (Even more so since those are valid XML too.)

No, HTML5 can be changed more easily. The HTML5 specification is a work in progress, while both XSLT 1.0 and XSLT 2.0 are finished specs which are widely deployed. So the cost of changing them is an order of magnitude higher than that of allowing one optional string in the HTML5 spec.

Moreover, it is HTML5 which is introducing changes to HTML syntax.

> Also, note that there are much harder problems to deal with in the HTML syntax
> than just the DOCTYPE. For instance, the complications around comments in CDATA
> blocks. If XSLT can deal with those, it can certainly deal with outputting
> shorter DOCTYPEs.

CDATA is a completely different thing. There is no reason to output CDATA from an XSLT transformation, as all problematic characters are automatically escaped during serialization. So this is no showstopper for XSLT. But without the minor change that I propose, it will not be possible to emit HTML5 with XSLT.

Please accept my proposal or clearly state that HTML5 is intentionally designed not to be compatible with XSLT.

Comment 7 Henri Sivonen 2008-06-26 07:36:06 UTC
I think the HTML output mode of XSLT has a far bigger problem than the doctype: It is designed to work with XSLT programs that output elements in no namespace. However, especially with MathML and SVG support, it makes sense to write the XSLT programs to output elements in the (X)HTML, MathML and SVG namespaces so that the decision whether to serialize to XML 1.0 or HTML5 doesn't leak inside the XSLT program and is isolated to the serializer.

I think the right way to proceed is to avoid using the built-in serializer of an XSLT processor (until XSLT processors are updated to support HTML5) and to take SAX events out of the XSLT processor and stick a SAX-to-HTML5 serializer between the XSLT processor and the output stream.
Comment 8 Julian Reschke 2008-06-26 07:44:07 UTC
1. I do agree that it should be possible to generate valid HTML5 with any compliant XSLT processor.

2. No, the advice given (use something other than XSLT, rely on optional features, generate XHTML) does not work in practice.

3. One way to get there is to update XSLT 1.0 and XSLT 2.0 to include an HTML5 output method.

4. Another one would be to fix HTML5.

That being said, this seems to be something to be tracked in the WG's issue tracker.

Comment 9 Henri Sivonen 2008-06-26 07:45:40 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Can't XSLT output arbitrary text, too?
> > 
> 
> No. Although it is possible to use <xsl:text
> disable-output-escaping="yes">&lt;!DOCTYPE ...&gt;</xsl:text>,
> the disable-output-escaping feature is optional in XSLT, it is not
> supported by all processors, and it works only when the output of the XSLT
> transformation is serialized directly, which is not always the case.

If you aren't directly serializing the output of the XSLT program, surely XSLT is the wrong stage to worry about the doctype. You wouldn't micromanage the XML declaration, either, at that point.
Comment 10 Henri Sivonen 2008-06-26 07:54:14 UTC
(In reply to comment #3)
> In the meantime you can do:
> 
>    <xsl:output method="html" doctype-public=""/>
> 
> ...which will only generate one (minor) parse error. 

You aren't really addressing the problem if your solution is outputting non-conforming HTML5 (compare with ATAG and alt).

Still, I consider the "html" output mode of XSLT fundamentally broken for a non-doctype reason, so my suggested solution is using a non-built-in serializer until XSLT processors get a built-in HTML5 serializer.
Comment 11 Ian 'Hixie' Hickson 2008-06-26 08:08:05 UTC
My understanding is that XSLT's "HTML" support is intended for HTML4, i.e. SGML-based HTML. HTML5 has a whole new syntax with all kinds of weird things, like support for MathML and the special needs of void tags in such foreign content contexts, the weird behaviour of "<!--" in <textarea> and <script> blocks, and so on.

I agree that outputting invalid HTML5 is not a real solution, I was just suggesting it as a workaround until XSLT is updated to support HTML5.

HTML5 is not intentionally designed not to be compatible with XSLT, but compatibility with XSLT is definitely not a goal.
Comment 12 Julian Reschke 2008-06-26 08:12:54 UTC
(In reply to comment #11)
> HTML5 is not intentionally designed not to be compatible with XSLT, but
> compatibility with XSLT is definitely not a goal.

Apparently it isn't your goal.

But maybe it should be the WG's goal.

Comment 13 Ian 'Hixie' Hickson 2008-06-26 08:15:51 UTC
It's not a chartered goal, which is what I'm basing my determination on.
Comment 14 Michael[tm] Smith 2008-06-26 08:20:58 UTC
(In reply to comment #8)
> 3. One way to get there is to update XSLT 1.0 and XSLT 2.0 to include an HTML5
> output method.
> 
> 4. Another one would be to fix HTML5.
> 
> That being said, this seems to be something to be tracked in the WG's issue
> tracker.

I agree that this seems to be an issue that merits being raised in the group's
Tracker (because it concerns a possible conflict with a specification produced
by another WG). Julian, if you care to raise it, I'd appreciate it. Otherwise,
I can do it myself.

Personally speaking as an XSLT user, I really hope we can arrive at a
solution that will allow XSLT tools to produce conformant HTML5 output that is
gotcha-free/not-broken. But I also personally agree that the HTML5
specification should not be constrained to limiting itself to only the kind of
serialized HTML output that current XSLT tools are capable of producing.
Comment 15 Julian Reschke 2008-06-26 08:34:04 UTC
(In reply to comment #14)
> I agree that this seems to be an issue that merits being raised in the group's
> Tracker (because it concerns a possible conflict with a specification produced
> by another WG). Julian, if you care to raise it, I'd appreciate it. Otherwise,
> I can do it myself.

Raised as <http://www.w3.org/html/wg/tracker/issues/54>.
Comment 16 Jirka Kosek 2008-06-26 10:13:47 UTC
(In reply to comment #7)
> I think the HTML output mode of XSLT has a far bigger problem than the doctype:
> It is designed to work with XSLT programs that output elements in no namespace.
> However, especially with MathML and SVG support, it makes sense to write the
> XSLT programs to output elements in the (X)HTML, MathML and SVG namespaces so
> that the decision whether to serialize to XML 1.0 or HTML5 doesn't leak inside
> the XSLT program and is isolated to the serializer.

But how many users are going to use MathML and SVG? I don't think it has to be possible to generate every possible valid HTML5 document with XSLT, but it should be possible to generate at least one valid HTML5 document with current XSLT.

> I think the right way to proceed is to avoid using the built-in serializer of
> an XSLT processor (until XSLT processors are updated to support HTML5) and to
> take SAX events out of the XSLT processor and stick a SAX-to-HTML5 serializer
> between the XSLT processor and the output stream.

But this doesn't work with every plain vanilla XSLT processor. Of course, many advanced XSLT processors allow you to specify your own output method and a corresponding serializer, but this solution is not portable.
Comment 17 Ian 'Hixie' Hickson 2008-06-26 10:25:18 UTC
Well, I've explained that compatibility with XSLT is not a goal, and there are a number of workarounds one can use if one insists on using today's XSLT with HTML5. Just saying that you disagree isn't going to suddenly make it a goal. :-)

Reassigning to Mike for arbitration.
Comment 18 Ian 'Hixie' Hickson 2008-06-26 19:52:37 UTC
Henri did a better job at explaining why this isn't as simple as the bug implies:
   http://www.w3.org/mid/5BDB45CE-CE1D-4845-8494-91972EF2FF73@iki.fi
Comment 19 Ian 'Hixie' Hickson 2008-09-20 22:54:51 UTC
This should probably be marked fixed since I added an XSLT-specific DOCTYPE.
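
As a sketch of how such an XSLT-specific DOCTYPE is used: the HTML specification as eventually published allows a "DOCTYPE legacy string" of the form <!DOCTYPE html SYSTEM "about:legacy-compat">, which an unmodified XSLT serializer can produce from a doctype-system attribute. The identifier in the draft at the time of this comment may have differed (see the naming discussion in comment 20), and the template and page content below are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- doctype-system triggers DOCTYPE generation; about:legacy-compat is the
       system identifier permitted by the published HTML specification. -->
  <xsl:output method="html" doctype-system="about:legacy-compat"/>

  <!-- Hypothetical template: wraps the text of the input in a minimal page. -->
  <xsl:template match="/">
    <html>
      <head><title>Example</title></head>
      <body><p><xsl:value-of select="."/></p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```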
Comment 20 Julian Reschke 2008-09-21 09:57:10 UTC
There's an open controversy about the actual name for the doc type. Lots of mails have been sent, and I don't see a consensus. Maybe the WG should vote on it.
Comment 21 Ian 'Hixie' Hickson 2008-09-21 10:07:46 UTC
Voting on a DOCTYPE name would be quite the extreme example of spec design by committee!
Comment 22 Jirka Kosek 2008-09-21 12:54:47 UTC
(In reply to comment #21)
> Voting on a DOCTYPE name would be quite the extreme example of spec design by
> committee!

Although I would personally prefer a different identifier in the DOCTYPE, I agree with Ian that a vote on this is overkill. After all, such a vote could lead to an even less reasonable identifier than the one currently proposed in the spec.

This issue was raised because it was not always possible to generate HTML5 with XSLT reasonably easily. This is now possible, so I agree with Ian that the issue is fixed.
Comment 23 Maciej Stachowiak 2010-03-14 13:14:31 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.