Bug 20993 - To facilitate migration, <!DOCTYPE html> should be recommended for the XHTML5 syntax.
Status: REOPENED
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: PC, OS: All
Importance: P2 normal
Target Milestone: ---
Assigned To: Robin Berjon
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/html/wg/drafts/html...
Reported: 2013-02-14 09:53 UTC by Leif Halvard Silli
Modified: 2013-03-22 04:44 UTC
CC List: 7 users

Description Leif Halvard Silli 2013-02-14 09:53:07 UTC
The XHTML5 section says this about the DOCTYPE declaration:

]] XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification. This specification does not define a public or system identifier, nor provide a formal DTD. [[

PROBLEM: While this spec does not define identifiers, XML in fact says that the name section of the DOCTYPE declaration (<!DOCTYPE name>) should match the root element type (that is: the element that is defined in the syntax rules as the root element). 

         The word 'DOCTYPE' can cover both DOCTYPE declarations (such as <!DOCTYPE html> or <!DOCTYPE html SYSTEM "URL">) on one side, and DTDs (document type definitions, which are referenced via the public or system identifier inside a DOCTYPE declaration) on the other side. Hence it is possible to read the above to say that e.g. <!DOCTYPE IAmCool> (which is a DOCTYPE) is fully conforming XHTML5.

         And in fact, the NU validator currently blesses <!DOCTYPE IAmCool> when it is used in XHTML.

Therefore, in addition to the above, the spec should add that *if* a DOCTYPE declaration is used in an XHTML document (that is, in an XML document whose root is the html element in the XHTML namespace), authors are required to make sure that the name inside the DOCTYPE declaration matches the name of the root element. In other words, authors must verify that the DOCTYPE begins '<!DOCTYPE html' if that is (literally) how the root element begins; or, if the root element is prefixed with myprefix, the DOCTYPE must match that (which means that there must be a DTD somewhere which declares the xmlns:myprefix "attribute"):
<!DOCTYPE myprefix:html [<!ATTLIST myprefix:html xmlns:myprefix CDATA "http://www.w3.org/1999/xhtml"><!--Yes, the xmlns:myprefix must be declared-->]>.

A consequence of this rule is that XHTML5 validators must check that the DOCTYPE declaration is <!DOCTYPE html>.
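Because the name match is a validity constraint rather than a well-formedness constraint, a non-validating parser accepts <!DOCTYPE IAmCool> without complaint, so a checker has to compare the two names itself. The following is a minimal sketch of the check proposed above, using Python's standard-library expat binding. The function name doctype_matches_root is hypothetical, and this is not how the NU validator or any other existing validator is implemented.

```python
import xml.parsers.expat


def doctype_matches_root(xml_text):
    """Sketch of the rule proposed in this report.

    Returns True when there is no DOCTYPE (it stays optional) or when the
    DOCTYPE name equals the root element's qualified name, prefix included.
    """
    seen = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name

    def on_start(name, attrs):
        if seen["root"] is None:  # the first start tag is the root element
            seen["root"] = name

    # expat is non-validating; without namespace processing it reports raw
    # qualified names such as "myprefix:html".
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # raises ExpatError only for well-formedness errors

    if seen["doctype"] is None:
        return True
    return seen["doctype"] == seen["root"]  # exact, case-sensitive, as in XML 1.0
```

With this sketch, <!DOCTYPE IAmCool><html xmlns="http://www.w3.org/1999/xhtml"/> parses without an ExpatError but returns False, while a document with no DOCTYPE at all returns True, since the report does not propose making the DOCTYPE mandatory.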

Some XHTML5 validators already behave this way. For instance, the XHTML5 validator built into the oXygen XML editor (which, FWIW, is based on Xerces) complains if the <!DOCTYPE html> name and the root element aren't in sync.


JUSTIFICATIONS:

(1) That the root name of the DOCTYPE has to match the root element (including the namespace prefix, if there is one) is something that follows more or less literally from XML 1.0 - it is implied when using a DTD. As such, this is in line with the preceding paragraph of HTML5, which says:

]] This specification does not define any syntax-level
   requirements beyond those defined for XML proper.[[

(2) By adding this, we prevent authors from writing <!DOCTYPE ILoveXHTML> and other pointless "demonstrations/distractions" with the DOCTYPE.

(3) We send a positive signal with regard to compatibility with the text/html serialization, since *if* a DOCTYPE is used, it will be HTML-compatible. (This is not 100% true if the DOCTYPE triggers quirks mode. However, none of the XHTML doctypes seem to trigger quirks mode; almost standards mode is the furthest we deviate from no-quirks mode.)

(4) Yes, DTD-less DOCTYPE declarations are not subject to the XML 1.0 concept of DTD validity. However, since well-formed documents can also be checked via XML schemas etc., it makes sense to restrict DTD-less DOCTYPEs to the purpose XML 1.0 gives them, namely declaring the root element. (Note that XML 1.0 also has a few rules that fall under neither validity nor well-formedness.)

(5) There are already many validity-type checks that the NU validator performs when checking XHTML5. It checks that the root element is <html> (yes, it could be <h:html xmlns:h="http://www.w3.org/1999/xhtml">, but in a XHTML document, the root has to be the 'html' element!). That the root must be <html> is a validity concept, not a well-formedness concept, and there are many, many XHTML5 conformance checks that are validity issues rather than well-formedness issues. Thus, since the validity concept is involved in this (and other) aspects of _XHTML5_ conformance checking, and since many of those rules exist to assure interoperability between HTML5 and XHTML5, it seems logical to also include DOCTYPE validity checking in XHTML5 conformance checking.

(6) These rules make it difficult to fake and difficult to be "advanced", but keep it simple to be simple, that is, to use simple DOCTYPE declarations.

NOTE 1: This bug does not propose any change to how XHTML is parsed; invalid DOCTYPE declarations will continue to bother no one except DTD-validating processors (such as XML editors).
NOTE 2: This bug does not propose to *require* the use of the DOCTYPE declaration in XHTML - it only defines how it should be used when or if it is used.
Comment 1 Robin Berjon 2013-02-18 13:51:08 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: none
Rationale:

Here's what the XML specification has to say about this:

"""
Validity constraint: Root Element Type

The Name in the document type declaration MUST match the element type of the root element.
"""

This is a *validity constraint*. Which is to say:

"""
[Definition: A rule which applies to all valid XML documents. Violations of validity constraints are errors; they MUST, at user option, be reported by validating XML processors.] 
"""

That only makes sense if you see what a "valid" XML document is:

"""
An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.
"""

Since XHTML5 has no DTD to define constraints on it, an XHTML5 document can never be made to pass validity constraints. What you are requesting be checked should in fact never be checked (unless you invent an XHTML5 DTD of your own, but then you'd get the behaviour you want automatically, as per the XML spec).

An XML processor that enforces the Name constraint while there is no DTD is in violation of the XML specification, as per my reading. It is no business of the HTML specification to modify XML (unless strictly required by Web compatibility).
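The reading above can be modelled as a three-valued check: XML 1.0's Root Element Type constraint either does not apply (there is no DTD to be valid against) or it applies and passes or fails. The Python sketch below uses only the standard-library expat binding; the function name is hypothetical, and treating "any external identifier or internal subset" as "has a DTD" is an assumption made for illustration, not something stated in this comment.

```python
import xml.parsers.expat


def root_element_type_vc(xml_text):
    """Apply the "Root Element Type" validity constraint only when there is a DTD.

    Returns None when the constraint does not apply (no DOCTYPE, or a DOCTYPE
    with neither an external ID nor an internal subset); otherwise returns
    whether the DOCTYPE name equals the root element type.
    """
    seen = {"name": None, "has_dtd": False, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["name"] = name
        # Assumption: any external ID or internal subset counts as "a DTD".
        seen["has_dtd"] = system_id is not None or bool(has_internal_subset)

    def on_start(name, attrs):
        if seen["root"] is None:  # the first start tag is the root element
            seen["root"] = name

    parser = xml.parsers.expat.ParserCreate()  # non-validating, no namespace processing
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    if not seen["has_dtd"]:
        return None  # nothing to be valid against
    return seen["name"] == seen["root"]
```

Under this model, <!DOCTYPE IAmCool> in front of an html root yields None (not applicable) rather than an error, which is the behaviour this comment argues an XML processor should have.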

So concerning your justifications:

(1) is not supported by the XML specification.
(2) I don't think that this is a justification. Authors should be empowered to cry out their love of web technology wherever they see fit.
(3) is a problem for polyglot to address.
(4) I don't really understand what you're trying to get at there. I don't think that encouraging validation using XML Schema is a good practice in any situation.
(5) seems to be a comment about NU.
(6) I don't understand, sorry.

Concerning your notes:

NOTE 1: if your XML editor enforces DTD validation, you are using a broken XML editor.
NOTE 2: but that is already covered by the XML specification.
Comment 2 Leif Halvard Silli 2013-02-18 18:36:44 UTC
(In reply to comment #1)
Regarding XML 1.0 validity: XHTML5 doesn’t come with a DTD, so we base XHTML5’s conformance rules on well-formedness plus the rules we define in the HTML5 spec. For instance, HTML5 forbids, even in XHTML5, placing block elements inside the <p> element. The lack of a DTD does not prevent us from having this rule. And there is no fundamental difference between defining a rule for <p> and defining a rule for the DOCTYPE. Thus I propose that we define the rules we want for the DOCTYPE declaration too.

> (1) is not supported by the XML specification.
 
That the 'name' part of the DOCTYPE declaration should reflect the root element is definitely supported by the XML spec. That the document is well-formed even if it doesn’t reflect the root doesn’t change that. Also, in reply to my question to xmlschema-dev@ about whether an XSD schema could hypothetically support validation of DOCTYPE declarations, C. M. Sperberg-McQueen replied:

]] You are quite right that there would be no logical contradiction in a schema language spec that allowed the kind of check you have in mind.[[
http://lists.w3.org/Archives/Public/xmlschema-dev/2013Feb/0016.html

In other words: We are free to define conformance rules for the DOCTYPE declaration even if we don’t operate with a DTD.

> (2) I don't think that this is a justification. Authors
>     should be empowered to cry out their love of web 
>     technology wherever they see fit.

I’ll take you seriously once you, within the quirks-mode constraints, propose the same rule for text/html.

> (3) is a problem for polyglot to address.

The HTML5 specification’s text about DOCTYPE in XHTML5 is needlessly cocky. The polyglot spec would prefer to get more help from the mother spec in this case. This wish is congruent with how the mother spec, and not the polyglot spec, *permits* <meta charset="UTF-8"/> in the XHTML5 syntax even though it is useless inside XML documents.

> (4) I don't really understand what you're trying to get at
>     there. I don't think that encouraging validation using 
>     XML Schema is a good practice in any situation.

The NU validator and most XML editors are schema-based; that was simply an observation.

> (5) seems to be a comment about NU.

Even the old validator requires XHTML documents to be complete, with the root element and all.

> (6) I don't understand, sorry.

The current text, in my view, gives the impression that everything relating to DOCTYPE in XHTML5 is up in the air. This isn’t helpful for someone looking for guidance.

> Concerning your notes:
> 
> NOTE 1: if your XML editor enforces DTD validation, 
> you are using a broken XML editor.

NOTE 1 was a comment about parsers - it was not about editors. 

> NOTE 2: but that is already covered by the XML specification.

You are right that *if* someone writes a DTD for XHTML5, then the rules for how to write DTDs would in fact require the 'html' name part of the DOCTYPE declaration to be lowercase.

But other than that, XML 1.0 does not specify how to write XHTML5.  XHTML5 might be the only XML dialect where authors would like to use a DTD-less DOCTYPE. Therefore HTML5 should give some recommendation for this specific usage.

MODIFIED PROPOSAL:

When a DOCTYPE is used, then, in order "to facilitate migration to and from XHTML",[1] it is RECOMMENDED that the 'name' part match the root element, even when the declaration (such as <!DOCTYPE html>) doesn’t include a DTD.

And if you like, it is OK with me (and in fact better) if the spec says that the match should be a case-*in*sensitive match of 'html'. In fact, by making it a SHOULD/RECOMMENDED rule, we have already allowed deviation from lowercase.

[1] http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-charset
Comment 3 Leif Halvard Silli 2013-02-21 03:28:11 UTC
Robin, here is a concrete text proposal:

<INS>
DOCTYPE declarations used with documents conforming to this specification MUST be shaped according to the following rules:

1. The name part MUST be an ASCII case-insensitive match for the
   string 'html'.
     
2. It is OPTIONAL whether there is a DTD associated with the
   DOCTYPE.
   
3. When there is no associated DTD, the DOCTYPE MUST match
   the DOCTYPE of the HTML syntax.

NOTE: A DOCTYPE declaration without an associated DTD does not
      have any effect on XML parsers and is only allowed in order
      to facilitate migration to and from HTML.
      
4. The associated DTD, if any, MUST define this specification’s
   list of named character references. Other character references
   MUST NOT be defined.

NOTE: Thus there is no requirement that the DTD is a complete <a
      href="http://www.w3.org/TR/xml/#dt-markupdecl">markup 
      declaration</a> that defines other aspects of the language
      than the character references.
</INS>
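Rules 1 and 3 of this proposal are mechanical enough to sketch. The Python below uses only the standard-library expat binding; proposal_violations is a hypothetical name, rule 4 (the character-reference DTD) is not attempted, and treating the legacy system identifier "about:legacy-compat" (the legacy DOCTYPE form mentioned later in this thread) as "no associated DTD" is an assumption about how rule 3 would be read.

```python
import xml.parsers.expat

LEGACY_COMPAT = "about:legacy-compat"
_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")


def ascii_lower(text):
    """Lowercase A-Z only, which is what "ASCII case-insensitive" requires."""
    return text.translate(_ASCII_LOWER)


def proposal_violations(xml_text):
    """Return which of rules 1 and 3 the document's DOCTYPE breaks (rule 4 is not checked)."""
    seen = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen.update(name=name, system_id=system_id, public_id=public_id,
                    internal_subset=bool(has_internal_subset))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    if not seen:
        return []  # no DOCTYPE: the proposal does not make one mandatory

    violations = []
    name_ok = ascii_lower(seen["name"]) == "html"
    if not name_ok:
        violations.append(1)  # rule 1: name must match "html", ASCII case-insensitively

    # Assumption: only an internal subset or an external ID other than
    # about:legacy-compat counts as "an associated DTD".
    has_dtd = seen["internal_subset"] or seen["system_id"] not in (None, LEGACY_COMPAT)
    if not has_dtd and (seen["public_id"] is not None or not name_ok):
        # rule 3: a DTD-less DOCTYPE must be <!DOCTYPE html> or
        # <!DOCTYPE html SYSTEM "about:legacy-compat">
        violations.append(3)
    return violations
```

For example, <!DOCTYPE IAmCool> breaks both rules 1 and 3, while <!DOCTYPE html SYSTEM "my.dtd"> passes this sketch because its DTD would be judged under rule 4, which is left out.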
Comment 4 David Carlisle 2013-02-21 10:06:40 UTC
(In reply to comment #3)
> Robin, here is a concrete text proposal:
> 
> <INS>
> DOCTYPE declarations used with documents conforming to this specification
> MUST be shaped according to the following rules:
> 
> 1. The name part MUST be an ASCII case-insensitive match for the
>    string 'html'.

There is a pre-existing requirement from XML that the name here must match
exactly the top level document element name, so any extra advice here
(and I'm not sure there should be any) should be phrased and positioned
as describing the document element, not the doctype declaration.
I'm not sure why you want to say case insensitive anyway as the element names have to be lowercase in the xhtml syntax.

>      
> 2. It is OPTIONAL whether there is a DTD associated with the
>    DOCTYPE.

OK although saying that or not saying that adds nothing.
>    
> 3. When there is no associated DTD, the DOCTYPE MUST match
>    the DOCTYPE of the HTML syntax.

I think here you mean <!DOCTYPE html> but as for (1) this would be better phrased by saying that the document element MUST be html (but I'm not sure we want to say that, it isn't a strictly necessary requirement)
> 
> NOTE: A DOCTYPE declaration without an associated DTD does not
>       have any effect on XML parsers and is only allowed in order
>       to facilitate migration to and from HTML.

It could have some effect; in particular, if the specified name is missing
or malformed, it is a fatal parse error.

>       
> 4. The associated DTD, if any, MUST define this specification’s
>    list of named character references. Other character references
>    MUST NOT be defined.

That requirement is incompatible with all the DTDs currently listed in the
HTML spec.
> 
> NOTE: Thus there is no requirement that the DTD is a complete <a
>       href="http://www.w3.org/TR/xml/#dt-markupdecl">markup 
>       declaration</a> that defines other aspects of the language
>       than the character references.
> </INS>


In your original comment you say

(yes, it could be <h:html xmlns:h="http://www.w3.org/1999/xhtml">, but in a XHTML document, the root has to be the 'html' element!


But I don't see why that should be the case
<!DOCTYPE h:html>
<h:html xmlns:h=....

is valid xhtml 1.1 and works in common browsers; why say the root has to be html?

The xhtml 1 dtd goes to extraordinary lengths to parametrise every element name precisely so that this prefixed usage is allowed.
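The prefixed example above can be checked mechanically: the root h:html is still the html element in the XHTML namespace, and the DOCTYPE name h:html matches its qualified name exactly. The sketch below (Python, standard-library expat only; root_info is a hypothetical helper) resolves the root's namespace by hand, which is sufficient for the root element because the only namespace declarations in scope there are its own xmlns attributes (the reserved xml prefix is ignored here).

```python
import xml.parsers.expat

XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_info(xml_text):
    """Return (DOCTYPE name, root qname, whether the root is XHTML's html element)."""
    seen = {"doctype": None, "root": None, "attrs": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name

    def on_start(name, attrs):
        if seen["root"] is None:  # the first start tag is the root element
            seen["root"], seen["attrs"] = name, attrs

    parser = xml.parsers.expat.ParserCreate()  # no namespace processing: raw qnames
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    # Split "h:html" into prefix "h" and local name "html"; an unprefixed name
    # gives an empty prefix and uses the default namespace declaration.
    prefix, _, local = seen["root"].rpartition(":")
    ns_attr = "xmlns:" + prefix if prefix else "xmlns"
    namespace = seen["attrs"].get(ns_attr)
    return seen["doctype"], seen["root"], (namespace == XHTML_NS and local == "html")
```

For the example above, this returns ('h:html', 'h:html', True): the DOCTYPE name matches the prefixed root, and the root is the XHTML html element even though its name is not literally "html".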
Comment 5 Leif Halvard Silli 2013-02-21 13:20:25 UTC
In reply to comment #4 from David Carlisle:

>> 1. The name part MUST be an ASCII case-insensitive match for the
>>    string 'html'.
> 
> There is a pre-existing requirement from XML 

As I think you said elsewhere, you are right that this bug first insisted on a DTD-valid DOCTYPE, and that I have now switched to what we could call an HTML5-valid approach. The reason can be summed up like so: while the old validator performs a DTD-validity check of the DOCTYPE declaration (see http://tinyurl.com/oldvalidator ), the NU validator explicitly skips the DTD, and thus also “blesses” whatever DOCTYPE name you might pick (see http://tinyurl.com/newvalidator ). If you can convince the NU devs to perform a DTD check, then fine.

>> 2. It is OPTIONAL whether there is a DTD associated with the
>>    DOCTYPE.
> 
> OK although saying that or not saying that adds nothing.

The spec says that XHTML5 documents are *permitted* to use <meta charset="UTF-8"/>. The spec does not have to permit something that is useless. But the DOCTYPE rules that I propose here can be defended by the same motivation for which the spec allows <meta charset="UTF-8"/>.

>> 3. When there is no associated DTD, the DOCTYPE MUST match
>>    the DOCTYPE of the HTML syntax.
> 
> I think here you mean <!DOCTYPE html>

Yes, or the legacy variant: <!DOCTYPE html SYSTEM "about:legacy-compat">. (Perhaps my text is unclear?)

>  but as for (1) this would be better
> phrased by saying that the document element MUST be html (but I'm not 
> sure we
> want to say that, it isn't a strictly necessary requirement)

I don't want to say anything about prefixed XHTML; the NU validator allows it, and that is fine. But it is true that these rules would not permit a DTD for prefixed XHTML. That said, it would be OK with me if the validator ignored the DOCTYPE declaration for prefixed XHTML.
 
>> NOTE: A DOCTYPE declaration without an associated DTD does not
>>       have any effect on XML parsers and is only allowed in order
>>       to facilitate migration to and from HTML.
> 
> It could have some effect, in particular if the specified name is missing
> or malformed it is a fatal parse error.

A DOCTYPE declaration without a name part does not, I think, qualify as a DOCTYPE declaration per XML 1.0.

>> 4. The associated DTD, if any, MUST define this specification’s
>>    list of named character references. Other character references
>>    MUST NOT be defined.
> 
> That requirement is incompatible with all the DTDs currently listed in the
> HTML spec.

I specifically chose the wording 'associated' so that one can associate a correct DTD with it. (As you know, the currently listed DTDs will be parsed by a conforming XHTML5 parser as if they did define the correct character references.) If you have any idea about how this could be said better, then feel free.

>> NOTE: Thus there is no requirement that the DTD is a complete <a
>>       href="http://www.w3.org/TR/xml/#dt-markupdecl">markup 
>>       declaration</a> that defines other aspects of the language
>>       than the character references.
>> </INS>
> 
> 
> In your original comment you say
> 
> (yes, it could be <h:html xmlns:h="http://www.w3.org/1999/xhtml">, but in a
> XHTML document, the root has to be the 'html' element!
> 
> 
> But I don't see why that should be the case
> <!DOCTYPE h:html>
> <h:html xmlns:h=....
> 
> is valid xhtml 1.1 and works in common browsers; why say the root has to be
> html?
>
> The xhtml 1 dtd goes to extraordinary lengths to parametrise every 
> element name precisely so that this prefixed usage is allowed.

In the first text proposal I drafted (but did not send to this bug), I wrote up rules which would have allowed a prefixed DOCTYPE. As said above, we could limit the above rules to unprefixed XHTML. If that would make you more in favor of what I propose in this bug, then I can add it.
Comment 6 Robin Berjon 2013-03-13 11:36:47 UTC
Sorry but I am now completely and utterly lost as to what you are expecting to get out of this bug.
Comment 7 Leif Halvard Silli 2013-03-13 12:08:32 UTC
(In reply to comment #6)
> Sorry but I am now completely and utterly lost as to what you are expecting
> to get out of this bug.

Do you understand why <meta charset="UTF-8"/> is _"permitted"_ in XHTML5?

Do you understand that by "permitting" <meta charset="UTF-8"/>, the spec (or dare I say Ian) actually *forbade* all other incarnations of <meta charset="*"/>?

A very clever trick, if I may say so. He did not find any justifiable use case for, for instance, <meta charset="ISO-8859-1"/> in XHTML5. To express that, the spec had to make it an error!
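The meta charset rule being described can be stated as a small check. This Python sketch assumes the rule is that, in XML documents, a charset attribute on meta is only allowed when its value is an ASCII case-insensitive match for "UTF-8" (our reading of the HTML5 text of the time); the helper name disallowed_meta_charsets is hypothetical, and ElementTree is used only for convenience.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")


def disallowed_meta_charsets(xhtml_text):
    """Return every meta charset value that is not (ASCII case-insensitively) "utf-8"."""
    root = ET.fromstring(xhtml_text)
    bad = []
    for meta in root.iter("{%s}meta" % XHTML_NS):  # only XHTML-namespace meta elements
        value = meta.get("charset")
        if value is not None and value.translate(_ASCII_LOWER) != "utf-8":
            bad.append(value)
    return bad
```

As with the DOCTYPE proposal, an empty result does not mean a meta charset is required: a document without one passes, and only values other than UTF-8 are flagged.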

So I want to apply Ian’s trick to <!DOCTYPE html>. Do you have any use cases for why <!DOCTYPE ROBIN-BERJON> should be permitted in XHTML5? Currently, the XHTML5 validator issues no error if you do such a thing.

I’m sorry, but to me this is really very obvious. Ian’s idea behind permitting <meta charset="UTF-8"/> really is to promote switching from XHTML to HTML and vice versa; that's what the spec says, quite literally. However, as we know, XHTML5 is not _required_ to use <meta charset="UTF-8"/>. Thus, it is fully permitted to produce XHTML5 which, if consumed as HTML, fails to be parsed as UTF-8. Likewise, I don't propose that <!DOCTYPE html> should be _required_ in XHTML5.

I only want a rule which promotes an HTML5-friendly DOCTYPE if and when a DOCTYPE is used. The HTML5 spec is the logical place to say this. It cannot be expressed in a DTD, since a DTD does not govern what DTD-less DOCTYPE declarations look like.

If you are still lost, then please ask some questions, critical or otherwise, so that I can try to understand where I lose you.
Comment 8 Leif Halvard Silli 2013-03-22 04:40:58 UTC
Btw, XHTML5 is not the only XML format that uses a DTD-free DOCTYPE. Two other formats are the 'Qt Linguist Phrase Book' (<!DOCTYPE TS>) and 'Qt translations sources' (<!DOCTYPE TS>). The Qt Linguist application will handle e.g. <!DOCTYPE ROBIN>. However, it will rectify it to e.g. <!DOCTYPE TS>.
Comment 9 Leif Halvard Silli 2013-03-22 04:44:43 UTC
(In reply to comment #8)

> the 'Qt Linguist Phrase Book' (<!DOCTYPE TS>)

Sorry. Should have been: <!DOCTYPE QPH>