This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 16949 - XHTML syntax description is lacking or misphrased
Summary: XHTML syntax description is lacking or misphrased
Status: RESOLVED NEEDSINFO
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
Reported: 2012-05-07 04:07 UTC by Roger Olsson
Modified: 2012-11-20 07:59 UTC
CC List: 2 users

Description Roger Olsson 2012-05-07 04:07:27 UTC
13.1 Writing XHTML documents

'The syntax for using HTML with XML, whether in XHTML documents or embedded in other XML documents, is defined in the XML and Namespaces in XML specifications.'

I assume that "HTML" here refers to the "abstract language" (also called HTML?) rather than to the HTML syntax. I guess the reader has to figure out how the abstract language maps to XML. The XML specifications don't give that information, contrary to what this sentence suggests. At the very least, XHTML documents are not valid XML, since there is no DTD.

'This specification does not define any syntax-level requirements beyond those defined for XML proper.'

What does "defined for" mean here? The abstract language clearly sets syntax-level requirements (part of the syntax), since any well-formed XML document could otherwise be called XHTML.
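
As an illustration of the distinction raised here (not part of the original report): a generic XML parser only checks well-formedness, so it accepts documents that have nothing to do with HTML. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper names and sample documents are made up for this example, and treating "root element is `html` in the XHTML namespace" as the test is an assumption for demonstration, not a rule taken from the spec.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(text):
    """Return True if a generic XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def root_in_html_namespace(text):
    """Return True if the root element is <html> in the XHTML namespace."""
    root = ET.fromstring(text)
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    return root.tag == "{%s}html" % XHTML_NS


# Hypothetical sample documents, for illustration only.
xhtml_doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body><p>Hi</p></body></html>"
)
other_doc = '<inventory><item id="1"/></inventory>'
broken_doc = '<html xmlns="http://www.w3.org/1999/xhtml"><p>unclosed</html>'

for name, doc in [("xhtml", xhtml_doc), ("other", other_doc), ("broken", broken_doc)]:
    ok = is_well_formed(doc)
    print(name, "well-formed:", ok, "html root:", ok and root_in_html_namespace(doc))
```

Both `xhtml_doc` and `other_doc` pass the XML-level check; only the namespace and element name, which come from the HTML vocabulary rather than from XML itself, tell them apart.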
Comment 1 contributor 2012-07-18 17:25:17 UTC
This bug was cloned to create bug 18141 as part of operation convergence.
Comment 2 Ian 'Hixie' Hickson 2012-07-23 05:16:33 UTC
Can you elaborate on what kinds of requirements or definitions you're looking for here? Maybe it would help if you could point to the equivalent text for, say, MathML, SVG, or earlier versions of XHTML.
Comment 3 Michael[tm] Smith 2012-07-30 15:16:53 UTC
Here's an attempt at getting something everybody could agree on:

"The syntax-level constraints for content written in the HTML language and served with an XML MIME type -- whether in XHTML documents or embedded in other XML documents -- are defined in the XML and Namespaces in XML specifications (just as the syntax-level constraints for MathML and SVG content served with an XML MIME type are defined in the XML and Namespaces in XML specifications, not in the MathML and SVG specifications)."

Maybe the parenthetical part about MathML and SVG is overkill, but I think the comparison helps in getting readers to understand why we're not (re)defining those syntax-level constraints in the HTML spec itself.
Comment 4 Roger Olsson 2012-09-06 23:39:56 UTC
Yes, the XHTML syntax is the abstract language's syntax (element and attribute names, content models etc.) combined with XML's syntax (the lexical grammar, all additional XML markup and constraints). This is the idea that the XHTML syntax definition should convey. You can't define XHTML by saying that it's defined in the XML specs. It's just a logical glitch in the current sentence.

This is from section 12.1.2: 'The exact allowed contents of each individual element depend on the content model of that element, as described earlier in this specification. Elements must not contain content that their content model disallows.'

I don't see how that wouldn't apply to XHTML too, exactly as it reads.

Also, if you say that the HTML syntax is 'known simply as "HTML"', it's misleading to state here that you can 'use HTML with XML', as if embedding HTML syntax inside XML.

For comparison, the SVG specification describes its relation to XML by defining itself as an application of XML and the construct 'SVG document fragment' as an XML document (sub-)tree.
Comment 5 Michael[tm] Smith 2012-09-07 02:22:08 UTC
(In reply to comment #4)
> Yes, the XHTML syntax is the abstract language's syntax (element and attribute
> names, content models etc.)

Content models are not syntax.

> combined with XML's syntax (the lexical grammar,
> all additional XML markup and constraints). This is the idea that the XHTML
> syntax definition should convey. You can't define XHTML by saying that it's
> defined in the XML specs. It's just a logical glitch in the current sentence.

The HTML spec doesn't "define XHTML by saying that it's defined in the XML specs". Instead it defines XHTML. Defining the syntax is just one part of defining what XHTML is. And for defining the syntax, it references the XML and Namespaces in XML specs.

> This is from section 12.1.2: 'The exact allowed contents of each individual
> element depend on the content model of that element, as described earlier in
> this specification. Elements must not contain content that their content model
> disallows.'
> 
> I don't see how that wouldn't apply to XHTML too, exactly as it reads.

It applies to XHTML because, like the vast majority of other statements in the spec, it applies to any document that's in the HTML namespace -- regardless of the syntax. Content models are not syntax.

That said, there are some differences between allowed content models in the HTML syntax and the XHTML syntax -- for example, the noscript element is not allowed in XHTML documents -- but those exceptions are all clearly identified in the spec. And regardless, those have nothing to do with defining the syntax (which is what your bug is asking for). 
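
To make the "content models are not syntax" point concrete (an illustration, not part of the original comment): the sketch below parses a fragment that is well-formed XML in the HTML namespace, then applies a separate content-model check to the resulting tree. The `ALLOWED_CHILDREN` table is a hypothetical, heavily simplified stand-in for the spec's real content models (which, for instance, also permit script-supporting elements inside `ul`), and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Toy content-model table (assumption): parent local name -> allowed children.
ALLOWED_CHILDREN = {"ul": {"li"}, "ol": {"li"}}


def local_name(tag):
    """Strip the "{namespace}" prefix ElementTree puts on element names."""
    return tag.rsplit("}", 1)[-1]


def content_model_violations(text):
    """Parse the text as XML, then list (parent, child) pairs the toy table rejects."""
    root = ET.fromstring(text)  # syntax layer: raises ParseError if not well-formed
    problems = []
    for parent in root.iter():
        if not parent.tag.startswith("{%s}" % XHTML_NS):
            continue  # only elements in the HTML namespace are checked
        allowed = ALLOWED_CHILDREN.get(local_name(parent.tag))
        if allowed is None:
            continue  # the toy table has no rule for this element
        for child in parent:
            if local_name(child.tag) not in allowed:
                problems.append((local_name(parent.tag), local_name(child.tag)))
    return problems


# Well-formed XML, but <p> is not an allowed child of <ul> in the toy table.
fragment = (
    '<ul xmlns="http://www.w3.org/1999/xhtml">'
    "<li>ok</li><p>not allowed here</p></ul>"
)
print(content_model_violations(fragment))
```

The XML parser raises no error for `fragment`; the violation only shows up in the second pass, which works on the parsed tree and would work the same way on a tree produced by an HTML parser.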

> Also, if you say that the HTML syntax is 'known simply as "HTML"', it's
> misleading to state here that you can 'use HTML with XML', as if embedding HTML
> syntax inside XML.
> 
> For comparison, the SVG specification describes its relation to XML by defining
> itself as an application of XML and the construct 'SVG document fragment' as an
> XML document (sub-)tree.

I guess that's one way to do it, but it's not the only possible way.

Can you please respond about whether the text I proposed in comment #3 is acceptable to you or not? Here it is again:

"The syntax-level constraints for content written in the HTML language and
served with an XML MIME type -- whether in XHTML documents or embedded in other
XML documents -- are defined in the XML and Namespaces in XML specifications."
Comment 6 Ian 'Hixie' Hickson 2012-11-20 07:59:32 UTC
As far as I can tell, words to that effect are already in the spec:
http://www.whatwg.org/specs/web-apps/current-work/#writing-xhtml-documents

Those are the words quoted in comment 0.

I don't understand the problem here.

> I guess the reader has to figure out how the abstract language maps to XML. The 
> XML specifications don't give that information, contrary to what this sentence 
> suggests.

Could you give an example of another spec that does this to your satisfaction? For example, where in SVG is this mapping given? Where is the mapping defined for going from XML to DOM and DOM to XML?
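
For readers unfamiliar with what an "XML to DOM and DOM to XML" mapping looks like in practice, here is a minimal sketch (an illustration, not part of the original comment) using Python's standard `xml.dom.minidom`. It shows only what one generic, namespace-aware XML parser does; it makes no claim about where, or whether, any specification defines that mapping. The function names and the sample markup are assumptions made for this example.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def describe_root(source):
    """Parse XML text into a DOM and report a few properties of the root element."""
    doc = minidom.parseString(source)
    root = doc.documentElement
    return {
        "namespaceURI": root.namespaceURI,
        "localName": root.localName,
        "class": root.getAttribute("class"),
        "text": root.firstChild.data if root.firstChild is not None else None,
    }


def round_trip(source):
    """XML text -> DOM -> XML text, using minidom's own serializer."""
    return minidom.parseString(source).documentElement.toxml()


# Hypothetical sample markup.
source = '<p xmlns="http://www.w3.org/1999/xhtml" class="note">Hello</p>'

print(describe_root(source))
print(round_trip(source))
```

The serialized text need not match the input byte for byte, but parsing it again yields an element with the same namespace, local name, attribute value, and text.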