Re: ISSUE-54: doctype-legacy-compat

Ian Hickson wrote:
> On Sun, 25 Jan 2009, Sam Ruby wrote:
>> Again, it is worth repeating that Venus produces a file.  Whether that 
>> file is later served as text/html or as application/xhtml+xml is 
>> something the person who uses Venus decides.
> 
> XML and text/html have differences that go far beyond mere syntax. When 
> you produce XML or text/html, you need to know which it is so that you can 
> output the right markup. The way nodes are exposed in the DOM, CSS rules 
> around the <body> and <tbody> elements, features like <noscript>, all 
> depend whether the document is XML or text/html.
> 
> It's possible to output a polyglot document that is valid both as XHTML5 
> in XML and HTML5 in text/html, but it requires care and discipline. (If 
> anything, this should be considered a third language and API set, stricter 
> than either of the other two.) One of the rules for making polyglot 
> documents is that one must output <!DOCTYPE HTML>, which is allowed in 
> both. (Other rules include being careful about using the /> form, being 
> careful about namespace declarations, being careful about xml:lang/lang, 
> being careful with script and CSS, etc.)

EXACTLY!(*)  THANK YOU!

While there are BIG PROBLEMS in theory and in general, when you limit the 
scope to things that (a) pass through a sanitizer, and (b) are the 
subset of things that one would reasonably expect to appear within an 
<article>, the problems are considerably more manageable.

I would like to stress that the use case is an application like Venus 
which produces files which are to be served later.  By the definition of 
HTML 5 (note the space), these files are neither XHTML5 nor HTML5; such 
a distinction would depend on how these files are served over HTTP.

And I'd like to repeat the point I made earlier: the one remaining thing 
that would make this use case less difficult to implement is permitting 
<meta charset> to appear in XHTML5, making it clear that user agents are 
to ignore it, and that it is non-conforming to specify a charset that 
differs from the one that an XML processor would associate with this 
document.  I too often get bug reports that there are occasionally 
'funny characters' in the output, which is the result of people not 
setting their content-type correctly.
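
As a rough illustration of the consistency rule proposed above, here is a 
minimal Python sketch.  The function names (xml_encoding, meta_charset, 
charset_conflict) are hypothetical and are not part of Venus or of any 
spec.  The encoding detection is deliberately simplified (BOM, then XML 
declaration, then the UTF-8 default), and the regular expressions stand in 
for a real parser.

```python
import re

# Encoding pseudo-attribute of a leading XML declaration, if any.
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
# <meta charset="..."> with a quoted or unquoted value.
META_CHARSET = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.I)


def xml_encoding(data: bytes) -> str:
    """Simplified guess at the encoding an XML processor would assume."""
    if data.startswith((b'\xff\xfe', b'\xfe\xff')):
        return 'utf-16'            # UTF-16 byte order mark
    if data.startswith(b'\xef\xbb\xbf'):
        data = data[3:]            # skip a UTF-8 byte order mark
    m = XML_DECL.match(data)
    # With no declaration, XML defaults to UTF-8.
    return m.group(1).decode('ascii').lower() if m else 'utf-8'


def meta_charset(data: bytes):
    """Return the charset named by <meta charset>, or None if absent."""
    m = META_CHARSET.search(data)
    return m.group(1).decode('ascii').lower() if m else None


def charset_conflict(data: bytes) -> bool:
    """True when <meta charset> disagrees with the XML-level encoding."""
    declared = meta_charset(data)
    return declared is not None and declared != xml_encoding(data)
```

A document with no XML declaration and <meta charset="utf-8"/> passes this 
check; one that declares encoding="iso-8859-1" in the XML declaration while 
carrying <meta charset="utf-8"/> would be flagged.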

- Sam Ruby

(*) OK, not exactly.  I would argue for a lowercase 'html'.  Given that 
this is likely to be a point of confusion, I prefer the way the WHATWG 
FAQ explains this over the way the current editor's draft does: namely, 
the FAQ's example itself uses a lowercase html.  People *do* tend to 
copy/paste examples, often without reading the surrounding text 
adequately, or even at all.

Received on Monday, 26 January 2009 10:27:54 UTC