This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11057 - doctype about:legacy-compat, t
Summary: doctype about:legacy-compat, t
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Eliot Graff
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-14 23:36 UTC by David Carlisle
Modified: 2011-08-04 05:07 UTC

Description David Carlisle 2010-10-14 23:36:06 UTC
section 4 of

http://dev.w3.org/html5/html-xhtml-author-guide/WD-html-polyglot-20101019.htm

allows the use of about:legacy-compat. It is not clear that this leads to well-formed XML, since I don't think about:legacy-compat resolves to anything (even an empty string would be OK as a minimal DTD subset). Of course one could argue that the XML parser need not fetch external subsets, or that a catalog may substitute a different file, but if that argument were used it would apply equally to the HTML 3.2 and HTML 4 doctype strings, and the text here says that they are incompatible with XML.
Comment 1 Aryeh Gregor 2010-10-15 15:52:41 UTC
The HTML5 specification defines about:legacy-compat as reserved but unresolvable:

http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#about:legacy-compat
Comment 2 David Carlisle 2010-10-15 16:17:50 UTC
(In reply to comment #1)
> The HTML5 specification defines about:legacy-compat as reserved but
> unresolvable:
> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#about:legacy-compat

Thanks. That leaves the question of whether an XML document referencing it is well formed in one of the more notorious dark corners of the XML spec.
It is probably safer to tell people not to do it if they want compatible behaviour from an HTML parser and an XML parser.
Comment 3 Eliot Graff 2010-10-29 19:58:40 UTC
The purpose of the about:legacy-compat value is to allow documents to be created using XSLT. It's highly likely that developers will want to create polyglot docs using XSLT, and so guidance about this is necessary.

Thank you very much for the feedback.

Eliot
Comment 4 David Carlisle 2010-10-29 21:18:56 UTC
(In reply to comment #3)
> The purpose of the about:legacy-compat value is to allow documents to be
> created using XSLT. It's highly likely that developers will want to create
> polyglot docs using XSLT, and so guidance about this is necessary.
> 
> Thank you very much for the feedback.
> 
> Eliot


This comment doesn't address the issue.

If you use xsl:output to add a doctype, then XSLT gives absolutely no assurance that the resulting document is well formed, and the subject of this bug is my belief that if you specify about:legacy-compat, the resulting document is not in fact well formed.

I agree that guidance is necessary, but question whether it is advisable to recommend that non-well-formed documents be passed to an XML parser.
Comment 5 Julian Reschke 2010-10-30 06:48:37 UTC
(In reply to comment #4)
> This comment doesn't address the issue.
> 
> If you use xsl:output to add a doctype then XSLT gives absolutely no assurance
> that the resulting document is well formed, and the subject of this bug is my
> belief that if you specify about:legacy-compat, the resulting document is
> not in fact well formed.
> 
> I agree that guidance is necessary but question whether it is advisable to
> recommend that non well formed documents be passed to an XML parser.

If we have doubts about that, a new bug should be raised against the HTML5 spec.
Comment 6 Tony Ross [MSFT] 2010-11-04 16:26:18 UTC
I agree with Julian. If this is incompatible with XML, we should raise a bug against the HTML5 spec.

Of course it was also mentioned that this is arguably not a problem, but is inconsistent with the text for HTML3.2 and HTML4 doctypes. Would expanding the definition of allowed doctypes to include those as well be a reasonable resolution to this bug?
Comment 7 David Carlisle 2010-11-04 17:53:09 UTC
(In reply to comment #6)
> I agree with Julian. If this is incompatible with XML,

Well, it's not incompatible with XML; it just needs to be used with care.

Specifically, I think that if the XML parser tries to dereference it, the document will fail with a fatal error as not well formed.

However, an XML parser may choose not to fetch external entities (and the ones in browsers typically don't), and if it doesn't fetch them it does not have to report errors in what it has not seen.

So it depends why the user is choosing to use the xml syntax:

If the file is just the end result of an XML pipeline, then using about:legacy-compat is OK, but so is more or less any doctype which produces standards mode in HTML (even if it's an SGML DTD rather than an XML one).

If on the other hand the user is using the XML syntax because they want (someone) to be able to use the file as -input- to an XML pipeline, then probably some words of advice ought to be given, as most XML parsers (rxp, xerces, msxml?) would fail to parse such a file out of the box and would need to be configured (e.g. with a catalog) to do something safe with the DTD, or not to resolve it.
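
As a minimal sketch of the non-fetching case described above, assuming Python's standard xml.etree.ElementTree (an expat-based parser which, with its default settings, does not read the external DTD subset), the following parses a document carrying the about:legacy-compat doctype without ever dereferencing the URI. The sample document and the name parse_without_fetching are illustrative and are not taken from the guide or from this thread.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative polyglot-style document using the about:legacy-compat DOCTYPE.
DOC = (
    '<!DOCTYPE html SYSTEM "about:legacy-compat">\n'
    '<html xmlns="' + XHTML_NS + '">'
    "<head><title>t</title></head>"
    "<body><p>hello</p></body></html>"
)


def parse_without_fetching(text):
    """Parse with ElementTree's default parser.

    The default parser sees the DOCTYPE declaration but does not try to
    load the external subset, so the unresolvable system identifier is
    never dereferenced.
    """
    return ET.fromstring(text)


root = parse_without_fetching(DOC)
paragraph = root.find(".//{%s}p" % XHTML_NS)
print(root.tag, paragraph.text)
```

A parser that does try to load the external subset would need one of the safeguards mentioned above (a catalog entry, or a setting that disables the fetch) before it could accept the same input.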

> we should raise a bug
> against the HTML5 spec.

The HTML5 spec merely says that the about: URI is there to make it easier for XSLT to generate the doctype, and for that limited use it's OK. By implication it is also saying that XML parsers within browsers will not dereference this SYSTEM id.

> 
> Of course it was also mentioned that this is arguably not a problem, but is
> inconsistent with the text for HTML3.2 and HTML4 doctypes. Would expanding the
> definition of allowed doctypes to include those as well be a reasonable
> resolution to this bug?

there is a kind of logic to that which appeals to me as a mathematician, but practically speaking I don't really think that we/you should be advising people to do that.


possibly just add, in the note at the end of section 4 something like...

Also note that when using an XML parser to parse a document using the about:legacy-compat doctype, the parser must be configured not to dereference this URI (as that will fail and cause a parse error). The XML parsers used by web browsers are usually configured this way by default, but other XML processing pipelines may not be.


Except that wording (which I just made up now) is probably too long, and also not technically accurate. For example, in Java your parser may dereference the URI if you have installed a URIresolver that special-cases this and returns something safe (e.g. an empty string rather than an error). Basically you need _something_ to avoid trying to fetch about:legacy-compat, but there are various layers where that redirection can occur. I think any kind of note that hints that about: URIs need to be used with care in XML processing pipelines would be sufficient. It's probably best to avoid the details of exactly what care is needed, since that is rather dependent on the processing framework being used.
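
To make the "redirection layer" idea concrete, here is a rough sketch in Python rather than Java. Every name in it (SAFE_SUBSTITUTES, UnresolvableSystemId, resolve_system_id) is hypothetical and does not correspond to any real parser API. It only illustrates the shape of a lookup that special-cases about:legacy-compat and hands back an empty external subset instead of attempting a fetch.

```python
# Hypothetical resolver layer; none of these names come from a real library.

# System identifiers that are answered locally instead of being fetched.
SAFE_SUBSTITUTES = {
    "about:legacy-compat": b"",  # an empty external subset is enough
}


class UnresolvableSystemId(Exception):
    """Raised when a system identifier cannot be dereferenced."""


def resolve_system_id(system_id, substitutes=SAFE_SUBSTITUTES):
    """Return replacement bytes for a DOCTYPE system identifier.

    Known identifiers are served from the substitution table. An about:
    URI with no substitute is reported as unresolvable, which stands in
    for the parse failure discussed in this comment.
    """
    if system_id in substitutes:
        return substitutes[system_id]
    if system_id.startswith("about:"):
        raise UnresolvableSystemId(system_id)
    # Real network or file access is deliberately left out of this sketch.
    raise NotImplementedError("fetching is out of scope for this sketch")


print(resolve_system_id("about:legacy-compat"))
```

In a real pipeline the equivalent hook might live in the parser, in a catalog, or in the surrounding framework, which is why the exact advice is hard to state generically.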
Comment 8 Eliot Graff 2010-12-09 19:54:05 UTC
The 9 December Editor's Draft now contains the following content in section 4, The DOCTYPE:

Note that using about:legacy-compat in XML may yield unpredictable parsing results, depending on the XML processing pipeline. 

Thanks for your help and patience.

Eliot
Comment 9 Michael[tm] Smith 2011-08-04 05:06:53 UTC
mass-move component to LC1
Comment 10 Michael[tm] Smith 2011-08-04 05:07:18 UTC
mass-move component to LC1