<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>15831</bug_id>
          
          <creation_ts>2012-02-01 16:50:02 +0000</creation_ts>
          <short_desc>validator prevents XHTML5 from containing XML declaration</short_desc>
          <delta_ts>2015-08-23 07:07:49 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML Checker</product>
          <component>General</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Garret Wilson">garret</reporter>
          <assigned_to name="Michael[tm] Smith">mike+validator</assigned_to>
          <cc>mike</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>63536</commentid>
    <comment_count>0</comment_count>
      <attachid>1074</attachid>
    <who name="Garret Wilson">garret</who>
    <bug_when>2012-02-01 16:50:02 +0000</bug_when>
    <thetext>Created attachment 1074
Start of an essay I wrote years ago, illustrating this XHTML5 validation issue.

I have a file reflection.html that is an XHTML5 file. Accordingly, I have an XML header:

&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;!DOCTYPE html&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
...

The validator screams:

Line 1, Column 2: Saw &lt;?. Probable cause: Attempt to use an XML processing instruction in HTML. (XML processing instructions are not supported in HTML.)

As I understand, HTML5 allows representation in a true application/xhtml+xml file. The http://validator.nu/ validator validates this file just fine (if I select &quot;XHTML5&quot;).

(The W3C validator doesn&apos;t have an &quot;XHTML5&quot; option. What&apos;s crazy about this situation is that the whole point of an XML declaration is to indicate that the file is XML. Therefore, if I choose &quot;HTML5&quot; and the validator sees an XML declaration, is it really a leap to think that maybe I&apos;m validating an XHTML5 file? Or maybe if it sees an XML declaration and then a &quot;&lt;!DOCTYPE html&gt;&quot;, it would just know I&apos;m validating an XHTML5 file? Isn&apos;t that the whole reason we have an XML declaration and a DOCTYPE declaration?)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>63617</commentid>
    <comment_count>1</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2012-02-03 00:40:27 +0000</bug_when>
    <thetext>(In reply to comment #0)
&gt; &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&gt; &lt;!DOCTYPE html&gt;
&gt; &lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
&gt; ...
&gt; Line 1, Column 2: Saw &lt;?. Probable cause: Attempt to use an XML processing
&gt; instruction in HTML. (XML processing instructions are not supported in HTML.)
&gt; 
&gt; As I understand, HTML5 allows representation in a true application/xhtml+xml
&gt; file. The http://validator.nu/ validator validates this file just fine (if I
&gt; select &quot;XHTML5&quot;.

Yeah, that&apos;s expected because it&apos;s actually parsing the document as XML instead of as text/html.

&gt; (The W3C validator doesn&apos;t have an &quot;XHTML5&quot; option.

Yep. To address that and other problems, we&apos;ve been working on setting up a separate standalone instance at W3C of a validator based on the validator.nu backend. It will expose the same options that are exposed at the http://validator.nu site. It&apos;s taken a little longer than anticipated to get it launched, but it should be live within the next two weeks.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>63644</commentid>
    <comment_count>2</comment_count>
    <who name="Garret Wilson">garret</who>
    <bug_when>2012-02-03 15:59:21 +0000</bug_when>
    <thetext>What I don&apos;t get is that my document is arguably more compliant with W3C specifications than any of the other documents that validate just fine. If someone were to ask me, &quot;if I were to follow W3C best practices as much as possible,&quot; what would I tell them---wouldn&apos;t it be to make your document HTML5 compliant *and* XML compliant?

Why is it, then, that the documents that most closely follow W3C recommendations are the last ones to validate correctly on the W3C validator? And I still don&apos;t understand why it&apos;s so hard to validate---XML is not a new technology by any stretch of the imagination.

Shouldn&apos;t documents that most closely follow W3C recommendations be the first ones to validate properly? Isn&apos;t HTML5 with XML compliance better than HTML5 without XML compliance?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>63664</commentid>
    <comment_count>3</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2012-02-04 07:17:14 +0000</bug_when>
    <thetext>(In reply to comment #2)
&gt; What I don&apos;t get is that my document is arguably more compliant with W3C
&gt; specifications than any of the other documents that validate just fine. If
&gt; someone were to ask me, &quot;if I were to follow W3C best practices as much as
&gt; possible,&quot; what would I tell them---wouldn&apos;t it be to make your document HTML5
&gt; compliant *and* XML compliant?

No, not necessarily. While it may be the case that the W3C organizationally took the position in the past that it was a best practice to make your documents XML-compliant, I don&apos;t think the W3C takes that position now. I don&apos;t at least. And the HTML5 spec does not take any position on it either. Documents can be fully valid and &quot;good&quot; according to the HTML5 spec without also needing to be XML compliant.

In fact because most documents on the Web are served with a text/html MIME type and not with an XML MIME type, a more realistic best practice to encourage authors in general to follow is to make sure that their documents are valid text/html documents. But making a document that is both a valid text/html document and also XML compliant can actually be difficult. You may already be familiar with the guide we have published on how to do that:

http://dev.w3.org/html5/html-xhtml-author-guide/

If you&apos;ve read through that document you know there are a lot of &quot;gotchas&quot; that can cause problems in how your documents are processed when you author them as well-formed XML but serve them as text/html.

The case of authoring documents as XML and also serving them with an XML MIME type is of course a lot less error-prone. But the reality of the Web is that far fewer people actually do that.

So at the time when the HTML5-checking feature was added to the current validator, I guess it made more sense to have that option be for HTML5 and not for XHTML5. But I don&apos;t know because I was not involved in that decision and in fact I&apos;m not really involved at all with work on the current validator. I only work on it indirectly, by maintaining the part of it that provides the HTML5-checking feature.

&gt; Why is it, then, that the documents that most closely follow W3C
&gt; recommendations are the last ones to validate correctly on the W3C validator?
&gt; And I still don&apos;t understand why it&apos;s so hard to validate---XML is not a new
&gt; technology by any stretch of the imagination.
&gt; 
&gt; Shouldn&apos;t documents that most closely follow W3C recommendations be the first
&gt; ones to validate properly? Isn&apos;t HTML5 with XML compliance better than HTML5
&gt; without XML compliance?

No, it&apos;s not better. It&apos;s not worse either. But it&apos;s also not what most people are doing. That is, most documents on the Web are not well-formed XML documents. Many documents on the Web that claim to be XHTML documents are in fact not well-formed XML documents. The only reason they work correctly in browsers is that they&apos;re being served with a text/html MIME type. Given that, it makes some sense to focus on providing text/html checking as the first choice.

But anyway, we really don&apos;t need for the service to take sides either way, and the current validator mostly does not. What I mean is, the current validator does actually already do the right thing for XHTML5 documents if, instead of using the &quot;Validate by direct input&quot; option, you just give it the URL of an XHTML5 document that&apos;s being served with an XML MIME type. That is, it correctly recognizes your document as XHTML5. So the support is already there; the only thing that&apos;s missing is it doesn&apos;t expose that option for the &quot;Validate by direct input&quot; case.
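
As a rough illustration of the MIME-type point above, here is a minimal Python sketch of picking a parser from the Content-Type header. The function name choose_parser and its return values are hypothetical, made up for this example; this is not the validator&apos;s actual code.

```python
def choose_parser(content_type):
    """Pick a parser mode from a Content-Type header value (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # Treat the common XML types, and any type with a "+xml" suffix, as XML.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "unknown"


print(choose_parser("application/xhtml+xml; charset=utf-8"))
print(choose_parser("text/html"))
```

Under these assumptions, a document served as application/xhtml+xml would be routed to the XML parser, where an XML declaration is fine, while a text/html document would go to the HTML parser. Direct input carries no Content-Type header, which is why that case needs an explicit option.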

The history behind the HTML5-checking feature in the current validator is that it was kind of just bolted on to the existing service as a way to make HTML5 checking available through the same user interface in the same place as the current validator. And it has served that purpose OK. And while they could also have bolted on XHTML5 checking for direct input at the time when HTML5 checking was added, they didn&apos;t, and here we are now. We could now also bolt on XHTML5 checking for direct input but I don&apos;t think that&apos;s the right way forward. The better way is to provide an additional service that exposes all the right options in the right way. And that is what I have been working on and what we will be launching very soon. So please wait for the announcement about that.

In the meantime, we have a pre-production version of that service available here:

http://www.w3.org/html/check

That gives you all the same options as the validator.nu UI does. In fact, the core part of it is exactly the same UI as validator.nu -- just with some W3C branding wrapped around it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>63887</commentid>
    <comment_count>4</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2012-02-08 12:51:39 +0000</bug_when>
    <thetext>The Nu Markup Validation Service provides full XHTML5 checking
http://validator.w3.org/nu/

That is now the preferred service for checking XHTML5 and HTML5 documents.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>1074</attachid>
            <date>2012-02-01 16:50:02 +0000</date>
            <delta_ts>2012-02-01 16:50:02 +0000</delta_ts>
            <desc>Start of an essay I wrote years ago, illustrating this XHTML5 validation issue.</desc>
            <filename>reflection.html</filename>
            <type>text/html</type>
            <size>3549</size>
            <attacher name="Garret Wilson">garret</attacher>
            
              <data encoding="base64">PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4NCjwhRE9DVFlQRSBodG1sPg0K
PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPg0KPGhlYWQ+DQoJPHRp
dGxlPkphdmEgUmVmbGVjdGlvbjwvdGl0bGU+DQo8L2hlYWQ+DQo8Ym9keT4NCg0KPGJsb2NrcXVv
dGU+VGhlcmUgb25jZSB3YXMgYSBndXkgd2l0aCBhIGJlYW48YnIvPg0KQXMgZm9yIHVzZSwgaXQg
cmVtYWluZWQgdG8gYmUgc2Vlbjxici8+DQpXaXRoIGEgc2hvdXQsIHRoaXMgZ3V5IGNyaWVkLDxi
ci8+DQomcXVvdDtJIGNhbiBsb29rIGRvd24gaW5zaWRlPGJyLz4NCmFuZCBzZWUgcHJvdGVpbnMg
YW5kIG90aGVyIGNvb2wgdGhpbmdzISZxdW90Ozxici8+DQooPGVtPlVtbS4uLiBBbm9ueW1vdXM8
L2VtPik8L2Jsb2NrcXVvdGU+DQoNCjxoMj5JbnRyb2R1Y3Rpb248L2gyPg0KDQo8cD5XaGF0ZXZl
ciBhbm9ueW1vdXMgZm9vbCB3cm90ZSB0aG9zZSBsaW5lcyB3YXMgdHJ5aW5nIHRvIGdldCBhY3Jv
c3MgdGhlIHBvaW50IHRoYXQgb25lIGNhbm5vdCBhY3R1YWxseSBsb29rIGluc2lkZSBhIGJlYW4g
YW5kIHNlZSBhbnl0aGluZyB0aGF0IGxvb2tzIHJlbW90ZWx5IGxpa2UgYW55dGhpbmcgZXhjZXB0
IGEgYmVhbi4gSnVzdCBhIGJlYW4uIFN1cmUsIHdlIGtub3cgdGhhdCB0aGVyZSBhcmUgcHJvdGVp
bnMgc29tZXdoZXJlIGluIHRoZXJlLCBtYXliZSBzb21lIG90aGVyIHN0dWZmIHRoYXQgeW91IGNh
biByZWFkIGFib3V0IGluIGFueSBoaWdoIHNjaG9vbCBiaW9sb2d5IHRleHRib29rLCBidXQgZm9y
IHRoZSBtb3N0IHBhcnQgeW91IGNhbid0IHNlZSBhbnl0aGluZyBpbnNpZGUgYSBiZWFuLiBVbmxl
c3MgeW91J3JlIGEgc2NpZW50aXN0IHdpdGggc3BlY2lhbCBlcXVpcG1lbnQsIG9mIGNvdXJzZS48
L3A+DQoNCjxwPlRvbyBiYWQgdGhlIGJlYW4gY291bGRuJ3QganVzdCB0ZWxsIHVzIHdoYXQgaXQg
aGFkIGluc2lkZSBpdC4gQmV0dGVyIHlldCwgaWYgYWxsIGZvb2QgY291bGQganVzdCB0YWxrLCB3
ZSB3b3VsZG4ndCBuZWVkIHRvIGhhdmUgbnV0cml0aW9uYWwgbGFiZWxzLiBKdXN0IGFzayB0aGUg
Zm9vZCBpdHNlbGYuPC9wPg0KDQo8cD5TbyBsZXQncyBnZXQgYmFjayB0byBKYXZhLiBUaGUgSmF2
YSAxLjEgc3BlY2lmaWNhdGlvbiBpbnRyb2R1Y2VkIG1hbnkgbmljZSBuZXcgZmVhdHVyZXMsIG1h
bnkgd2l0aCBuaWNlIG5ldyBiaWcgd29yZHMgdGhhdCBhcmUgc29tZXRpbWVzIHRocm93biBhcm91
bmQgYW5kIHJlbGVnYXRlZCB0byBzcGVjaWFsIGNoYXB0ZXJzIGluIHJldmlzaW9ucyBvZiBiZXN0
LXNlbGxpbmcgSmF2YSBib29rcy4gT25lIG9mIHRob3NlIG5ldyBmZWF0dXJlcyBpcyBvbmUgd2Un
bGwgYmUgZGlzY3Vzc2luZyBoZXJlLCBjYWxsZWQgPGVtPnJlZmxlY3Rpb248L2VtPi4gUmVmbGVj
dGlvbiBpcyBub3QgZGlmZmljdWx0IHRvIHVuZGVyc3RhbmQsIGFuZCBxdWl0ZSB1c2VmdWwuIFlv
dSBtYXkgc2VsZG9tIHVzZSBpdCBkaXJlY3RseSwgYnV0IGl0IGJyaW5ncyBhIGhvc3Qgb2YgbmV3
IGNhcGFiaWxpdGllcyB0aGF0LCBieSB0aGUgZW5kIG9mIHRoaXMgbGVzc29uLCB5b3UnbGwgYWdy
ZWUgYXJlIGludmFsdWFibGUgZXZlbiBpbiBkYXktdG8tZGF5IHByb2dyYW1taW5nLjwvcD4NCg0K
PHA+SGVyZSdzIHdoYXQncyBleHBlY3RlZCBvZiB5b3UgYmVmb3JlIHlvdSBiZWdpbiB0aGlzIGxl
c3NvbjogZmlyc3QsIHlvdSBzaG91bGQga25vdyBzb21ldGhpbmcgYWJvdXQgWE1MLCBiZWNhdXNl
IHdlJ3JlIGdvaW5nIHRvIHVzZSByZWZsZWN0aW9uIHRvIGNyZWF0ZSBhIHZlcnkgdXNlZnVsIGNs
YXNzIHRoYXQgd3JpdGVzIEphdmEgb2JqZWN0cyB0byBYTUwgYXV0b21hdGljYWxseS4gQWx0aG91
Z2ggWE1MIGlzIHJlbGF0aXZlbHkgbmV3IGFzIG9mIE1heSAxOSwgMTk5OCwgaW4gYSBmZXcgc2hv
cnQgbW9udGhzIGl0IHdpbGwgYmUgbW9yZSB3aWRlbHkga25vd24gYW5kIHVuZGVyc3Rvb2QgdGhh
biBIVE1MIChyZW1lbWJlciwgeW91IGhlYXJkIGl0IGhlcmUgZmlyc3QuKSBOZXZlcnRoZWxlc3Ms
IEknbGwgcHJvdmlkZSBhIGZldyBzaG9ydCBwYXJhZ3JhcGhzIG9uIFhNTCBsYXRlciBvbi48L3A+
DQoNCjxwPlNlY29uZGx5LCB5b3Ugc2hvdWxkIGJlIGF0IGxlYXN0IHJlbGF0aXZlbHkgZmFtaWxp
YXIgd2l0aCB0aGUgSmF2YSBsYW5ndWFnZSBpdHNlbGYuIEV4ZXBlcmllbmNlIHdpdGggc2ltaWxh
ciBwcm9ncmFtbWluZyBsYW5ndWFnZXMsIHN1Y2ggYXMgQysrLCB3b3VsZCBhbHNvIGJlIGJlbmVm
aWNpYWwuPC9wPg0KDQo8aDI+V2hhdCBpcyBSZWZsZWN0aW9uPzwvaDI+DQoNCjxwPlJlZmxlY3Rp
b24gaXMgSmF2YSdzIG5ldyAodmVyc2lvbiAxLjEpIGFiaWxpdHkgdG8gbG9vayBpbnNpZGUgYSBK
YXZhIG9iamVjdCA8ZW0+YXQgcnVudGltZTwvZW0+IGFuZCBzZWUgd2hhdCB2YXJpYWJsZXMgaXQg
Y29udGFpbnMsIHdoYXQgbWV0aG9kcyBpdCBzdXBwb3J0cywgd2hhdCBpbnRlcmZhY2VzIGl0IGlt
cGxlbWVudHMsIHdoYXQgY2xhc3NlcyBpdCBleHRlbmRzIC0tIGJhc2ljYWxseSBldmVyeXRoaW5n
IGFib3V0IHRoZSBvYmplY3QgdGhhdCB5b3Ugd291bGQga25vdyBhdCBjb21waWxlIHRpbWUuPC9w
Pg0KDQo8cD5Ob3csIHNpbmNlIEMrKyB3YXMgaGVyZSBmaXJzdCwgbGV0J3MgdXNlIGEgcXVpY2sg
ZXhhbXBsZSBpbiB0aGF0IGxhbmd1YWdlIHRvIHNob3cgeW91IHRoZSBwcm9ibGVtIGFuZCBob3cg
SmF2YSBoYW5kbGVzIGl0LiBMZXQncyBzdXBwb3NlIHdlIGhhdmUgYSBzaW1wbGUgQysrIGNsYXNz
IGxpa2UgdGhlIGZvbGxvd2luZzo8L3A+DQoNCjxibG9ja3F1b3RlPjxwcmU+PGNvZGU+DQpjbGFz
cyBDTXlDbGFzcw0Kew0KcHJpdmF0ZToNCiAgaW50IHg7DQpwdWJsaWM6DQogIHZvaWQgc2V0WChp
bnQgbmV3WCkge3g9bmV3WDt9DQogIGludCBnZXRYKCkge3JldHVybiB4O30NCn0NCjwvY29kZT48
L3ByZT48L2Jsb2NrcXVvdGU+DQoNCjxwPk9LLCBzbyB0aGF0J3Mgbm90IHNvIGRpZmZlcmVudCBm
cm9tIGEgSmF2YSBjbGFzcy4gSW4gZmFjdCwgaXQncyBhbG1vc3QgZXhhY3RseSB0aGUgc2FtZS4g
QnV0IGxldCdzIGFzc3VtZSBmb3IgYSBtb21lbnQgdGhhdCBpdCdzIGEgQysrIGNsYXNz4oCUaXQn
cyBwcmV0dHkgc2ltcGxlIHRvIHNlZSB3aGF0IHZhcmlhYmxlcyBpdCBoYXMgYW5kIHdoYXQgbWV0
aG9kcyBpdCBwcm92aWRlcy48L3A+DQoNCjxwPldoYXQgaGFwcGVucyB3aGVuIGl0J3MgY29tcGls
ZWQsIGhvd2V2ZXI/IFNpbmNlIHdlIHdhbnQgdG8gcmV1c2UgdGhpcyBjbGFzcyAoYnkgdGhlIHdh
eSwgd2Ugd2FudCB0byByZXVzZSB0aGlzIGNsYXNzKSwgd2UgcHV0IGl0IGluIGFuIG9iamVjdCBm
aWxlIChteWNsYXNzLm9iaiksIG9yIGJldHRlciB5ZXQsIGEgbGlicmFyeSBmaWxlIChteWNsYXNz
LmxpYikuIEl0J3Mgbm90IGFzIGVhc3kgdG8gdGVsbCB3aGF0IOKApjwvcD4NCg0KPHAgY2xhc3M9
ImNvcHlyaWdodCI+Q29weXJpZ2h0IMKpIDE5OTgtMjAwMyBHYXJyZXQgV2lsc29uPC9wPg0KDQo8
L2JvZHk+DQo8L2h0bWw+
</data>

          </attachment>
      

    </bug>

</bugzilla>