This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 9958 - The DOCTYPE paragraph must explain and define the DOCTYPE rules better and more generally
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML/XHTML Compat. Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: FPWD
Assignee: Eliot Graff
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/html-xhtml-au...
 
Reported: 2010-06-20 04:16 UTC by Leif Halvard Silli
Modified: 2010-10-05 13:07 UTC

Description Leif Halvard Silli 2010-06-20 04:16:14 UTC
Current definition:

]]
A polyglot document uses the <!DOCTYPE html> doctype. Note that for a polyglot document the string, html, must be lower case. For a pure HTML document, the string is defined as case-insensitive.
[[

New, proposed replacement text (justification follows below):

]]
In polyglot markup, a doctype that ensures that the browser makes a best-effort attempt at following the relevant specifications, is REQUIRED for HTML-compatibility. The doctype MUST also be XML compatible, which means that it has to follow XML’s casing rules. Thus — in contrast to pure HTML documents — for an HTML-compatible XHTML document, it is REQUIRED:

* that the string <code>DOCTYPE</code> is in uppercase;
* that the string <code>html</code> is in lowercase (because it represents the root element);
* that the string <code>SYSTEM</code> — if present — is in uppercase;
* that the string <code>PUBLIC</code> — if present — is in uppercase;
* that an FPU — if present — is a case-sensitive match of the registered FPU that is meant;

In addition, a URI, if present in the doctype, must point to the intended resource. Altering the case of the URI could make it point to a resource other than the intended one. The requirement that the URI be correct is the same in HTML and XML, even if the effect of an incorrect URI on parsing may differ between the two:

* if the URI is the string <code>about:legacy-compat</code>, the string MUST be in lowercase, as required by HTML5.
* if the URI is an http URL, the URI must point to the correct resource.

So if an HTML polyglot contains the HTML5 doctype, then it must appear in the form <!DOCTYPE html>, case-sensitively. If an HTML polyglot contains the alternative HTML5 <code>about:legacy-compat</code> doctype, then it must be <!DOCTYPE html SYSTEM "about:legacy-compat"> or <!DOCTYPE html SYSTEM 'about:legacy-compat'>, case-sensitively.

If an HTML polyglot contains one of the XHTML doctypes that HTML5 describes as obsolete but still HTML5 compatible (currently XHTML 1.0 Strict and XHTML 1.1), then it MUST be used in an XML-compatible way, as described above. An HTML polyglot may use any other XHTML doctype with a referenced DTD, if it has the same best-effort effect on HTML5 parsers as <!DOCTYPE html> has (in particular, it must trigger strict mode). Note, however, that a document using a DOCTYPE which references a DTD is subject to the rules of that DTD, and that those rules may or may not be compatible with HTML5-based polyglot markup.

Note that doctypes for HTML4, HTML3 or HTML2 are forbidden in HTML-compatible XHTML documents, regardless of whether they contain a URI and regardless of their effect in HTML5 parsers, as they are not XHTML compatible.
[[
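As a rough illustration of the contrast drawn in the proposed text, the two HTML5 doctype forms can be matched case-sensitively (polyglot) or with case-insensitive keywords (pure HTML). This is only a sketch: the names POLYGLOT_DOCTYPE, HTML_DOCTYPE and the two helper functions are hypothetical, the whitespace handling is an approximation, and the obsolete-but-permitted XHTML doctypes are deliberately left out.

```python
import re

# Approximation of the whitespace allowed between doctype tokens.
WS = r"[ \t\r\n]"
# about:legacy-compat stays lowercase in both syntaxes; either quote style is fine.
LEGACY = r"""(?:"about:legacy-compat"|'about:legacy-compat')"""

# Polyglot (XML-compatible) form: every keyword is matched case-sensitively.
POLYGLOT_DOCTYPE = re.compile(
    rf"<!DOCTYPE{WS}+html(?:{WS}+SYSTEM{WS}+{LEGACY})?{WS}*>"
)

# Pure HTML form: DOCTYPE, html and SYSTEM are matched case-insensitively.
HTML_DOCTYPE = re.compile(
    rf"(?i:<!DOCTYPE){WS}+(?i:html)(?:{WS}+(?i:SYSTEM){WS}+{LEGACY})?{WS}*>"
)


def is_polyglot_doctype(text: str) -> bool:
    """True if text is one of the two HTML5 doctypes in XML-compatible casing."""
    return POLYGLOT_DOCTYPE.fullmatch(text.strip()) is not None


def is_html_doctype(text: str) -> bool:
    """True if text is one of the two HTML5 doctypes in any keyword casing."""
    return HTML_DOCTYPE.fullmatch(text.strip()) is not None
```

With these helpers, <!doctype HTML> passes is_html_doctype but fails is_polyglot_doctype, which is the case-sensitivity difference the proposal describes.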

The suggested replacement text solves the following problems:

1) HTML5 actually operates with *two* doctypes: <!DOCTYPE html> and <!DOCTYPE html SYSTEM "about:legacy-compat"> – whereas current text in the polyglot draft appears to say that only <!DOCTYPE html> is valid.
2) The polyglot spec should define more general rules, as HTML5 itself does (within its limits). That way, one can also allow more doctypes than HTML5 mentions, as the new text does.
3) The old text does not describe all the requirements of the DOCTYPE. E.g. it omits that the string 'DOCTYPE' must be uppercase - and so on. And it doesn't explain *why* the 'html' string must be lowercase.
4) The last sentence, "For a pure HTML document ...", feels unnecessary. Also, it would be just as natural to mention that pure XHTML does not need a doctype. The polyglot spec in fact defines an HTML-compatible *XHTML format*, so it is more natural to explain why there must be a doctype. The new text explains this, as briefly as possible.
5) The effect of HTML4 doctypes once came up in the HTMLWG, and since HTML5 says that some of them are compatible, the polyglot spec should say that they are not compatible with polyglot markup.


Note that the first sentence of the new text includes a direct quote from HTML5: "ensures that the browser makes a best-effort attempt at following the relevant specifications"
http://dev.w3.org/html5/spec/syntax.html#the-doctype
Comment 1 Leif Halvard Silli 2010-06-20 15:24:33 UTC
Sorry, instead of FPU I meant FPI (Formal Public Identifier).
Comment 2 Eliot Graff 2010-07-10 00:06:26 UTC
Updated the DOCTYPE section per these requests.

Thanks so much for the feedback!
Comment 3 Franklin Tse 2010-07-10 04:23:54 UTC
Sorry to reopen the report, but I would like to have some additional changes related to DOCTYPE.

1)
The spec says:
Polyglot markup should use the <!DOCTYPE html> document type declaration. Polyglot markup conforms to the following rules for this document type declaration:

It should say:
Polyglot markup MUST have a document type declaration (DOCTYPE) specified by section 8.1.1 of HTML5. In addition, the DOCTYPE MUST conform to the following rules:

It's because the DOCTYPE is required in the HTML syntax of HTML5. This MUST-level requirement should be preserved in the polyglot markup.

2)
The part of "Other document type declarations can also be used... as these document type declarations are not compatible with XHTML." can be removed.

Since the HTML syntax of HTML5 only permits certain DOCTYPEs, DOCTYPEs that are not permitted by HTML5 should not be used. (Already reflected by change 1, if approved.)

3)
The spec says:
"Note that polyglot markup cannot use document type declarations for HTML4, HTML3, or HTML2, regardless of whether they contain a URI or not and regardless of their effect in HTML5 parsers, as these document type declarations are not compatible with XHTML."

It should say:
Although NOT RECOMMENDED, polyglot markup MAY use an obsolete permitted DOCTYPE. However, DOCTYPEs defined for HTML4 MUST NOT be used, as they are not compatible with XHTML.

This emphasizes that obsolete permitted DOCTYPEs should not be used and does not allow the use of DOCTYPEs defined for HTML4.
Comment 4 Franklin Tse 2010-07-12 04:17:55 UTC
On second thought, there seems to be no need to allow obsolete permitted DOCTYPEs in polyglot markup, because only the XML built-in named entity references are permitted.

What do you think?
Comment 5 Eliot Graff 2010-09-04 00:41:41 UTC
DOCTYPE section was updated per fixes outlined in response to bug 9958:

4. The DOCTYPE

Polyglot markup must have a document type declaration (DOCTYPE) specified by section 8.1.1 of [HTML5]. In addition, the DOCTYPE must conform to the following rules:

    * The string DOCTYPE is in uppercase letters.
    * The string html is in lowercase letters.
    * The string SYSTEM, if present, is in uppercase letters.
    * The string PUBLIC, if present, is in uppercase letters.
    * A Formal Public Identifier (FPI), if present, is a case-sensitive match of the registered FPI to which it points.
    * A URI, if present in the document type declaration, is a case-sensitive match of the URI to which it points.
          o If the URI is the string about:legacy-compat, the string must be in lowercase, as required by HTML5.
          o If the URI is an http URL, the URI must point to the correct resource, using case-sensitive letters.

Note that polyglot markup cannot use document type declarations for HTML4, HTML3, or HTML2, regardless of whether they contain a URI or not and regardless of their effect in HTML5 parsers, as these document type declarations are not compatible with XHTML.
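
The rules in the updated section can be approximated by a small checker that reports which rule a given declaration breaks. This is a hedged sketch, not part of the spec: the function polyglot_doctype_problems, its messages, the loose tokenizing pattern and the short FPI table are all assumptions made for illustration, and the table is not an exhaustive registry.

```python
import re
from typing import List

LEGACY_COMPAT = "about:legacy-compat"

# Illustrative, non-exhaustive table of XHTML FPIs in their registered casing.
_KNOWN_XHTML_FPIS = (
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.1//EN",
)
# Uppercased prefixes shared by the HTML 2, HTML 3.2 and HTML 4 FPIs.
_HTML_FPI_PREFIXES = ("-//W3C//DTD HTML ", "-//IETF//DTD HTML")

# Loose, case-insensitive tokenizer so that casing can be reported afterwards.
_DOCTYPE = re.compile(
    r"""<!(?P<kw>doctype)\s+(?P<root>[A-Za-z][\w.-]*)
        (?:\s+(?P<ext>public|system)
           (?:\s+(?P<q1>["'])(?P<id1>.*?)(?P=q1))
           (?:\s+(?P<q2>["'])(?P<id2>.*?)(?P=q2))?
        )?\s*>""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def polyglot_doctype_problems(decl: str) -> List[str]:
    """Return the rule violations found in decl (an empty list means none)."""
    m = _DOCTYPE.fullmatch(decl.strip())
    if m is None:
        return ["not a recognizable document type declaration"]
    problems = []
    if m["kw"] != "DOCTYPE":
        problems.append("the string DOCTYPE must be in uppercase")
    if m["root"] != "html":
        problems.append("the root name must be the lowercase string html")
    ext = m["ext"]
    if ext is None:
        return problems
    if ext != ext.upper():
        problems.append(f"the string {ext.upper()} must be in uppercase")
    if ext.upper() == "PUBLIC":
        fpi, uri = m["id1"], m["id2"]
    else:
        fpi, uri = None, m["id1"]
        if m["id2"] is not None:
            problems.append("SYSTEM takes a single quoted URI")
    if fpi is not None:
        if fpi.upper().startswith(_HTML_FPI_PREFIXES):
            problems.append(
                "HTML4, HTML3 and HTML2 doctypes are not compatible with XHTML"
            )
        else:
            for known in _KNOWN_XHTML_FPIS:
                # Same letters, different casing: not a case-sensitive match.
                if fpi != known and fpi.lower() == known.lower():
                    problems.append(f"FPI must match {known!r} case-sensitively")
    if uri is not None and uri != LEGACY_COMPAT and uri.lower() == LEGACY_COMPAT:
        problems.append("about:legacy-compat must be in lowercase")
    return problems
```

Whether an http URI points to the correct resource cannot be decided from the string alone, so the sketch leaves that rule unchecked.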