Re: Updated DOCTYPE versioning change proposal (ISSUE-4) from Maciej Stachowiak on 2010-01-04 (public-html@w3.org from January 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 03 Jan 2010 19:45:13 -0800
To: Larry Masinter <masinter@adobe.com>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-id: <7E10ADFE-1EC4-410F-967C-32D34DA9A9D5@apple.com>

On Jan 2, 2010, at 4:21 PM, Larry Masinter wrote:

> The proposal was updated significantly, based on comments. I’ve  
> tried to address the “compound” issue as well.

Here are some purely personal comments on parts of your change proposal:

>
> In particular, the working group intends to support “polyglot”  
> documents which are both valid XML and XHTML and also valid as HTML  
> text/html; since XML workflows often require a !DOCTYPE with a  
> PublicIdentifier and a SystemIdentifier, this increases the  
> footprint of “polyglot” documents.

Do we have a demonstrated case of an XML workflow that requires both a  
PublicIdentifier and a SystemIdentifier?


> Many of the arguments made in previous discussions about versions  
> and doctypes were not careful to distinguish between “version of  
> specification” and “version of implementation”. It should be noted  
> that many *want* a version indicator to note “version of  
> implementation”, i.e., as an indicator of “best viewed by Firefox
> 4.0 or later” or some such.  However, this change proposal is very  
> clearly providing for a version of a “specification”, and, in  
> particular, of the HTML specification, with the possibility of “mix”  
> specifications added.

Part of the reason this confusion arose is that at least one
prominent implementor said they wanted to use a "version of
specification" indicator to trigger versioning of implementation
behavior (in addition to implementation-specific triggers).

> Many of the arguments in previous discussions were arguing against  
> version-specific browser behavior. But this change proposal  
> specifically does NOT allow for (any additional) version-specific  
> behavior, and in fact explicitly disallows it.

> It allows but does not require some validators to perform additional  
> validation, in that there may be additional validation based on the  
> PublicIdentifier or SystemIdentifier.   As behavior does not depend  
> on the DOCTYPE, validating the DOCTYPE is not required.

Indeed, it looks like the only MUST-level requirements are the  
following:

>  For these reasons, the DOCTYPE header is REQUIRED for HTML content  
> served as text/html (and optional for content served as an XML media  
> type), but supplying an explicit version indicator is NOT  
> RECOMMENDED except in limited circumstances.
>
> The syntax of the DOCTYPE element is:
>
> <!DOCTYPE html>
> <!DOCTYPE html PUBLIC "PublicIdentifier" "SystemIdentifier">
> <!DOCTYPE html SYSTEM "about:legacy-compat">

[...]

> HTML documents not served as an XML media type MUST include a
> DOCTYPE header, since many browsers, in the absence of a DOCTYPE  
> header, will trigger a “quirks” mode of rendering.

[...]

> However, HTML documents MUST NOT use “-//W3C//NONSGML HTML 5.0//EN”  
> until the edition of this specification referenced is actually  
> approved and published as a W3C Recommendation.

A consequence of this is that, under your Change Proposal, documents
whose DOCTYPE triggers quirks mode would be conforming, since the only
PublicIdentifier ruled out is the HTML 5.0 one. Is that an intended
consequence? I think it is a desirable and intended feature of the
current spec that quirks mode documents are nonconforming.
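
To illustrate, here is a minimal Python sketch comparing the two checks.
It assumes the proposal's only MUST-level rules for text/html are the ones
quoted above, and it models quirks-mode detection with a small,
deliberately incomplete subset of the public identifier prefixes that the
HTML parser treats as quirky. The Doctype class and both function names
are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doctype:
    name: str
    public_id: Optional[str] = None
    system_id: Optional[str] = None


# The identifier the proposal reserves until HTML5 is a Recommendation.
RESERVED_PUBLIC_ID = "-//W3C//NONSGML HTML 5.0//EN"

# A small, non-exhaustive subset of the public identifier prefixes that the
# HTML parser treats as triggering quirks mode (matched case-insensitively).
QUIRKS_PUBLIC_PREFIXES = (
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 3.2//",
    "-//w3c//dtd html 4.0 transitional//",
)


def conforms_under_proposal(doctype: Optional[Doctype]) -> bool:
    """Hypothetical check covering only the proposal's MUSTs for text/html."""
    if doctype is None:
        return False  # text/html documents MUST include a DOCTYPE
    if doctype.name.lower() != "html":
        return False  # every proposed form starts with <!DOCTYPE html
    return doctype.public_id != RESERVED_PUBLIC_ID


def triggers_quirks(doctype: Optional[Doctype]) -> bool:
    """Simplified quirks-mode determination (a subset of the real rules)."""
    if doctype is None or doctype.name.lower() != "html":
        return True
    public_id = (doctype.public_id or "").lower()
    return public_id.startswith(QUIRKS_PUBLIC_PREFIXES)


legacy = Doctype("html", "-//W3C//DTD HTML 3.2 Final//EN")
print(conforms_under_proposal(legacy), triggers_quirks(legacy))  # True True
```

A legacy HTML 3.2 DOCTYPE passes the proposal's rules yet still switches
the parser into quirks mode, whereas <!DOCTYPE html> passes the first
check without triggering quirks.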


Also, minor nitpick: DOCTYPE is not an element.


The following two bullet points seem contradictory:

> Except for explicitly defined behavior (used to trigger “quirks  
> mode”, see section [#parse-behavior], [#quirks-mode] and [hsvonin]),  
> implementations which consume HTML MUST NOT use the DOCTYPE element  
> to trigger different processing behavior.


> Documents served as an XML media type MAY include a DOCTYPE header,  
> either to allow compatible content (so-called “polyglot” documents  
> which are both valid HTML and also valid XHTML) or to support  
> version-specific XML processing. While the DOCTYPE header is not  
> required, including it may help in XHTML/HTML crossover.

Implementations MUST NOT use the DOCTYPE to trigger different  
processing, but documents MAY use it to support version-specific  
processing. Why would documents have a need to support version- 
specific processing if version-specific processing is not allowed?


> 9.1.1.2 PublicIdentifier for compound specifications
>
> Note that a PublicIdentifier only identifies a single specification,  
> not a complete implementation, a suite of specifications, or a  
> combination of vocabularies from multiple specifications.
> Constructing a PublicIdentifier for such a combination requires
> publication of an actual specification which describes that
> combination.
>
> Groups wishing to support the combination of HTML and other  
> specifications may supply short specifications showing how  
> additional vocabularies may be used with HTML; for example, a short  
> document “how to use RDFa with HTML” might be published. (This  
> document would reference RDFa and HTML but not include either  
> specification). In such a case, the “+” format might be used:
>
> “-//W3C RDFAWG//NONSGML HTML+RDFa 20100401//EN” might reference the  
> HTML+RDFA document published by the RDFA working group.
>
> The W3C Hypertext coordination group is encouraged to coordinate  
> assignment of public identifiers.
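
For reference, the identifier in the proposal's example follows the usual
formal public identifier layout: an owner, a public text class and
description, and a language, separated by "//". The sketch below splits
one apart. It only handles "+//" or "-//" owners, and reading "+" as a
list of combined vocabularies, along with the helper names, is an
assumption for illustration rather than something the proposal defines.

```python
from typing import NamedTuple


class PublicIdentifier(NamedTuple):
    registered: bool  # "+//" marks a registered owner, "-//" an unregistered one
    owner: str
    text_class: str
    description: str
    language: str


def parse_fpi(fpi: str) -> PublicIdentifier:
    """Split an identifier of the form -//Owner//CLASS Description//LANG."""
    parts = fpi.split("//")
    if len(parts) != 4 or parts[0] not in ("+", "-"):
        raise ValueError(f"unexpected public identifier: {fpi!r}")
    prefix, owner, text, language = parts
    # The public text class is the first word; the rest is the description.
    text_class, _, description = text.partition(" ")
    return PublicIdentifier(prefix == "+", owner, text_class, description, language)


def vocabularies(description: str) -> list[str]:
    """Hypothetical reading of the '+' format: 'HTML+RDFa 20100401' -> ['HTML', 'RDFa']."""
    name = description.split(" ")[0]
    return name.split("+")


pid = parse_fpi("-//W3C RDFAWG//NONSGML HTML+RDFa 20100401//EN")
print(pid.owner, pid.text_class, vocabularies(pid.description))
# W3C RDFAWG NONSGML ['HTML', 'RDFa']
```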

This does not, in my opinion, address the compound document use case  
adequately. One of the original examples for this was syndication. RSS  
or Atom feeds often pull content from multiple sources which are not  
under the control of the syndicator. Thus, if versioning is to serve  
any purpose in such a scenario, it must be possible to label each  
separate HTML fragment with its own version.

Now, one could argue that versioning is of such limited usefulness
that it's not important to serve syndication use cases; after all,
it's only intended for controlled environments. But this goes
completely against the future-proofing argument. A DOCTYPE-based  
version is not a sound way to future-proof HTML in syndication feeds  
against incompatible HTML changes, and a great deal of the HTML on the  
Web is republished in one or more feeds. If HTML did change  
incompatibly in the future, then we would certainly need a version  
indicator that can be applied separately to individual fragments of a  
document, such as a version attribute. This would make the DOCTYPE- 
based versioning redundant and merely a potential source of  
conflicting version indicators in the future.

Note also that if multiple languages may all be combined and each is  
versioned, then trying to represent this in a single DOCTYPE will  
result in a combinatorial explosion. Already we have the potential to  
combine HTML, MathML, SVG and RDFa. If you imagine we add 4 more
languages (perhaps GRDDL, X3D, XForms, XSL-FO), for 8 in total, and
that each language has at least two versions, then we need standards
specifying 2^8 = 256 different DOCTYPE strings. With 10 languages
having 3 versions each, we'd need 3^10 = 59049 different DOCTYPE
strings, each with its own
specification. Clearly, this approach is not scalable, compared to  
identifying each language version independently.
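
The figures above follow from treating each combined DOCTYPE as pinning
one version of every language, so the count is versions^languages, while
labelling each language version independently needs only
languages * versions labels. A quick check of both numbers; the function
names are illustrative, and the model ignores documents that omit some
languages, which would only make the combined count larger.

```python
def combined_doctypes(languages: int, versions: int) -> int:
    # One DOCTYPE string per combination: each language pinned to one version.
    return versions ** languages


def independent_labels(languages: int, versions: int) -> int:
    # One label per (language, version) pair, applied to each fragment separately.
    return languages * versions


for n, v in [(8, 2), (10, 3)]:
    print(n, v, combined_doctypes(n, v), independent_labels(n, v))
# 8 2 256 16
# 10 3 59049 30
```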

For these reasons, I think your approach to versioning in compound  
documents is not viable.

Regards,
Maciej
Received on Monday, 4 January 2010 03:45:47 UTC