This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 15831 - validator prevents XHTML5 from containing XML declaration
Summary: validator prevents XHTML5 from containing XML declaration
Status: RESOLVED FIXED
Alias: None
Product: HTML Checker
Classification: Unclassified
Component: General
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael[tm] Smith
QA Contact: qa-dev tracking
Reported: 2012-02-01 16:50 UTC by Garret Wilson
Modified: 2015-08-23 07:07 UTC
CC List: 1 user

Attachments
Start of an essay I wrote years ago, illustrating this XHTML5 validation issue. (3.47 KB, text/html)
2012-02-01 16:50 UTC, Garret Wilson

Description Garret Wilson 2012-02-01 16:50:02 UTC
Created attachment 1074
Start of an essay I wrote years ago, illustrating this XHTML5 validation issue.

I have a file reflection.html that is an XHTML5 file. Accordingly, I have an XML header:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
...

The validator screams:

Line 1, Column 2: Saw <?. Probable cause: Attempt to use an XML processing instruction in HTML. (XML processing instructions are not supported in HTML.)

As I understand it, HTML5 allows representation in a true application/xhtml+xml file. The http://validator.nu/ validator validates this file just fine (if I select "XHTML5").

(The W3C validator doesn't have an "XHTML5" option. What's crazy about this situation is that the whole point of an XML declaration is to indicate that the file is XML. Therefore, if I choose "HTML5" and the validator sees an XML declaration, is it really a leap to think that maybe I'm validating an XHTML5 file? Or maybe if it sees an XML declaration and then a "<!DOCTYPE html>", it would just know I'm validating an XHTML5 file? Isn't that the whole reason we have an XML declaration and a DOCTYPE declaration?)
Comment 1 Michael[tm] Smith 2012-02-03 00:40:27 UTC
(In reply to comment #0)
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html>
> <html xmlns="http://www.w3.org/1999/xhtml">
> ...
> Line 1, Column 2: Saw <?. Probable cause: Attempt to use an XML processing
> instruction in HTML. (XML processing instructions are not supported in HTML.)
> 
> As I understand, HTML5 allows representation in a true application/xhtml+xml
> file. The http://validator.nu/ validator validates this file just fine (if I
> select "XHTML5".

Yeah, that's expected because it's actually parsing the document as XML instead of as text/html.
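
To make the difference concrete, here is a minimal Python sketch that feeds the same prolog to an XML parser and to an HTML-style parser. It is only an illustration: the sample document body, the PICollector class, and the two helper functions are made up for this example, and Python's html.parser is not an HTML5-conforming parser, so it does not reproduce the validator's error message. It does show that only the XML parser treats the `<?xml ...?>` line as a declaration; the HTML-style parser just reports a stray processing instruction.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Sample XHTML5 document modelled on the report; the head/body content is invented.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>reflection</title></head>
<body><p>Hello</p></body>
</html>"""


class PICollector(HTMLParser):
    """Hypothetical helper: records anything the HTML-style parser sees as '<?...>'."""

    def __init__(self):
        super().__init__()
        self.pis = []

    def handle_pi(self, data):
        self.pis.append(data)


def parse_as_xml(doc: bytes) -> str:
    # An XML parser consumes the XML declaration and returns the root element.
    return ET.fromstring(doc).tag


def parse_as_html(doc: bytes) -> list:
    # An HTML-style parser has no notion of an XML declaration; it surfaces
    # the '<?xml ...?>' line as an unexpected processing instruction.
    collector = PICollector()
    collector.feed(doc.decode("utf-8"))
    collector.close()
    return collector.pis


if __name__ == "__main__":
    print(parse_as_xml(DOC))
    print(parse_as_html(DOC))
```
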

> (The W3C validator doesn't have an "XHTML5" option.

Yep. To address that and other problems, we've been working on setting up a separate standalone instance at W3C of a validator based on the validator.nu backend. It will expose the same options that are exposed at the http://validator.nu site. It's taken a little longer than anticipated to get it launched, but it should be live within the next two weeks.
Comment 2 Garret Wilson 2012-02-03 15:59:21 UTC
What I don't get is that my document is arguably more compliant with W3C specifications than any of the other documents that validate just fine. If someone were to ask me, "if I were to follow W3C best practices as much as possible," what would I tell them---wouldn't it be to make your document HTML5 compliant *and* XML compliant?

Why is it, then, that the documents that most closely follow W3C recommendations are the last ones to validate correctly on the W3C validator? And I still don't understand why it's so hard to validate---XML is not a new technology by any stretch of the imagination.

Shouldn't documents that most closely follow W3C recommendations be the first ones to validate properly? Isn't HTML5 with XML compliance better than HTML5 without XML compliance?
Comment 3 Michael[tm] Smith 2012-02-04 07:17:14 UTC
(In reply to comment #2)
> What I don't get is that my document is arguably more compliant with W3C
> specifications than any of the other documents that validate just fine. If
> someone were to ask me, "if I were to follow W3C best practices as much as
> possible," what would I tell them---wouldn't it be to make your document HTML5
> compliant *and* XML compliant?

No, not necessarily. While it may be the case that the W3C organizationally took the position in the past that it was a best practice to make your documents XML-compliant, I don't think the W3C takes that position now. I don't at least. And the HTML5 spec does not take any position on it either. Documents can be fully valid and "good" according to the HTML5 spec without also needing to be XML compliant.

In fact because most documents on the Web are served with a text/html MIME type and not with an XML MIME type, a more realistic best practice to encourage authors in general to follow is to make sure that their documents are valid text/html documents. But making a document that is both a valid text/html document and also XML compliant can actually be difficult. You may already be familiar with the guide we have published on how to do that:

http://dev.w3.org/html5/html-xhtml-author-guide/

If you've read through that document you know there are a lot of "gotchas" that can cause problems in how your documents are processed when you author them as well-formed XML but serve them as text/html.

The case of authoring documents as XML and also serving them with an XML MIME type is of course a lot less error-prone. But the reality of the Web is that far fewer people actually do that.

So at the time when the HTML5-checking feature was added to the current validator, I guess it made more sense to have that option be for HTML5 and not for XHTML5. But I don't know because I was not involved in that decision and in fact I'm not really involved at all with work on the current validator. I only work on it indirectly, by maintaining the part of it that provides the HTML5-checking feature.

> Why is it, then, that the documents that most closely follow W3C
> recommendations are the last ones to validate correctly on the W3C validator?
> And I still don't understand why it's so hard to validate---XML is not a new
> technology by any stretch of the imagination.
> 
> Shouldn't documents that most closely follow W3C recommendations be the first
> ones to validate properly? Isn't HTML5 with XML compliance better than HTML5
> without XML compliance?

No, it's not better. It's not worse either. But it's also not what most people are doing. That is, most documents on the Web are not well-formed XML documents. Many documents on the Web that claim to be XHTML documents are in fact not well-formed XML documents. The only reason they work correctly in browsers is that they're being served with a text/html MIME type. Given that, it makes some sense to focus on providing text/html checking as the first choice.
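
As a rough illustration of the well-formedness point, the following Python sketch checks whether a piece of markup parses as XML at all. The is_well_formed helper and the two sample strings are hypothetical, made up for this example; an unclosed `<br>`, which a text/html parser tolerates, is enough to make an XML parser reject the whole document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: bytes) -> bool:
    """Return True if the markup parses as XML, False on any XML parse error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Claims to be XHTML, but the unclosed <br> is not well-formed XML.
TAG_SOUP = (b'<html xmlns="http://www.w3.org/1999/xhtml">'
            b'<body><p>one<br>two</p></body></html>')

# Same content with the empty element written in XML syntax.
CLEAN = (b'<html xmlns="http://www.w3.org/1999/xhtml">'
         b'<body><p>one<br/>two</p></body></html>')

if __name__ == "__main__":
    print(is_well_formed(TAG_SOUP))
    print(is_well_formed(CLEAN))
```

This only tests XML well-formedness; it says nothing about whether the document is valid XHTML5.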

But anyway, we really don't need for the service to take sides either way, and the current validator mostly does not. What I mean is, the current validator does actually already do the right thing for XHTML5 documents if, instead of using the "Validate by direct input" option, you just give it the URL of an XHTML5 document that's being served with an XML MIME type. That is, it correctly recognizes your document as XHTML5. So the support is already there; the only thing that's missing is it doesn't expose that option for the "Validate by direct input" case.
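
The URL case can pick a parser because the HTTP response carries a Content-Type, which direct input lacks. The Python sketch below shows one way such a dispatch could look. It is an assumption made for illustration, not the validator's actual logic, and the parser_mode function and its return values are invented names.

```python
def parser_mode(content_type: str) -> str:
    """Map a Content-Type header value to a parsing mode (hypothetical helper)."""
    # Drop parameters such as '; charset=utf-8' and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # application/xhtml+xml and other XML media types go to the XML parser.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "unknown"


if __name__ == "__main__":
    for header in ("text/html; charset=UTF-8",
                   "application/xhtml+xml",
                   "image/png"):
        print(header, "->", parser_mode(header))
```

With direct input there is no header to inspect, so the mode has to come from a user-selected option instead.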

The history behind the HTML5-checking feature in the current validator is that it was kind of just bolted on to the existing service as a way to make HTML5 checking available through the same user interface in the same place as the current validator. And it has served that purpose OK. And while they could also have bolted on XHTML5 checking for direct input at the time when HTML5 checking was added, they didn't, and here we are now. We could now also bolt on XHTML5 checking for direct input but I don't think that's the right way forward. The better way is to provide an additional service that exposes all the right options in the right way. And that is what I have been working on and what we will be launching very soon. So please wait for the announcement about that.

In the meantime, we have a pre-production version of that service available here:

http://www.w3.org/html/check

That gives you all the same options as the validator.nu UI does. In fact the core part of it is exactly the same UI as validator.nu -- just with some W3C branding wrapped around it.
Comment 4 Michael[tm] Smith 2012-02-08 12:51:39 UTC
The Nu Markup Validation Service provides full XHTML5 checking:
http://validator.w3.org/nu/

That is now the preferred service for checking XHTML5 and HTML5 documents.