This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
http://validator.w3.org/check?uri=http%3A%2F%2Fcsszengarden.com%2F&charset=%28detect+automatically%29&doctype=HTML5&group=0

csszengarden.com is text/html, hence should use the HTML5 parser -- not the XML parser.
Ack. Thanks for the report, Simon.

Using debug mode shows the validator choosing HTML5+XML based on the "doctype" decision factor, which is a little odd:
http://validator.w3.org/check?uri=http%3A%2F%2Fcsszengarden.com%2F&doctype=HTML5&debug=1

Nevertheless, this is likely to be a bit of a pain, since the criteria to determine "is it XML" are different between html5 (media type) and the rest of the HTML family (media type, doctype, xml declaration...).
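To make the difference concrete, here is a rough sketch of the two decision rules being contrasted (hypothetical function names and simplified legacy heuristics, not the validator's actual code): under HTML5, the media type alone decides; for the older HTML family, the doctype and XML declaration also factor in.

```python
# Illustrative sketch only -- not the W3C validator's implementation.
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

def is_xml_html5(media_type):
    # HTML5 rule: the media type alone decides XML vs HTML parsing.
    return media_type in XML_MEDIA_TYPES

def is_xml_legacy(media_type, doctype, has_xml_decl):
    # Legacy HTML-family heuristic (simplified): media type,
    # XML declaration, and doctype all weigh in.
    if media_type in XML_MEDIA_TYPES:
        return True
    if has_xml_decl:
        return True
    return doctype.lower().startswith("xhtml")
```

Under the HTML5 rule, csszengarden.com served as text/html would never reach the XML parser, which is the behaviour the report asks for.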
I thought the rest of the HTML family also decided exclusively on media type -- at least that's what Opera, Firefox and WebKit have implemented. It would be nice if the validator said which parser it used and which MIME type it got, and further it would be nice to have a parser override. (But maybe these things should be in separate reports.)
(In reply to comment #2)
> I thought the rest of the HTML family also decided exclusively on media type --
> at least that's what Opera, Firefox and WebKit have implemented.

Basically, browsers and a validator are different classes of products. While it may be true that browsers can simply decide to "parse as html" when receiving text/html, for a validator there is no such thing as "parse as html" (or at least there wasn't before html5). Why? Because browsers don't use DTDs or any kind of schema. On the other hand, that's precisely what validators do.

For anything before html5, a validator had a choice between SGML (for HTML 4.01 and below) and XML (for XHTML 1.0 and up). The XHTML DTDs are XML DTDs, and XHTML documents MUST be parsed with an XML DTD validator. Try validating XHTML with an SGML validator and it will:

1) probably crash (or at least puke), because XML DTDs are different from SGML DTDs;
2) complain about all the XML-ish constructs such as <br /> (because, in SGML, such constructs have a completely different meaning than in XML);
3) completely ignore issues such as missing closing tags (because, in SGML/HTML, omitting closing tags is OK, whereas it is NOT in XML);
etc.

I somehow wish the HTML working group of old had been clearer about this: there has been much confusion and frustration, especially since Steven's infamous message on the topic: http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html

Anyway, here's hoping that HTML5 can be/remain clearer on that matter.

> It would be nice if the validator said which parser it used and which MIME type
> it got, and further it would be nice to have a parser override. (But maybe
> these things should be in separate reports.)

The &debug=1 parameter does just that. It's not shown by default, for the sake of keeping the UI not-too-complicated.
The validator *could* say that "well, this isn't XML, so I'm going to validate as HTML 4.01 instead" (or refuse to validate). But I digress. &debug=1 is good to know... However, my request still stands. There's no parser override. I believe it's quite possible to provide parser and MIME information while keeping the UI not-too-complicated.
Filed bug 6298 and bug 6299.
(In reply to comment #4)
> The validator *could* say that "well, this isn't XML, so I'm going to validate
> as HTML 4.01 instead" (or refuse to validate). But I digress.

I think it would be counterproductive to do that. If I am declaring XHTML 1.0 and authoring valid XHTML 1.0, but serving it as text/html because I can't change my server config (the case for most people) or because I want the pages to show in IE6, I wouldn't want the validator to tell me "you suck! this is not valid HTML 4.01". A bad way to alienate people, IMHO.

> &debug=1 is good to know... However, my request still stands. There's no parser
> override. I believe it's quite possible to provide parser and MIME information
> while keeping the UI not-too-complicated.

Why not. I was about to ask for separate bugzilla items, but see you did that already. Thanks! Patches and/or UI suggestions welcome, too.
A slightly different scenario but probably the same bug: text/html HTML5 with xmlns declaration is misvalidated as XHTML5. http://validator.w3.org/check?uri=http%3A%2F%2Fwww.aneventapart.com%2F&debug=1
> Nevertheless, this is likely to be a bit of a pain, since the criteria to
> determine "is it XML" are different between html5 (media type) and the rest of
> the HTML family (media type, doctype, xml declaration...)

I don't see how that is relevant. Any web page using the new HTML doctype (aka the HTML5 doctype, "<!DOCTYPE html>") should be passed over to the Validator.nu side of the W3C's validator. It's Validator.nu that should be "deciding" whether the document is HTML5 or XHTML5, *not* the W3C's validator. Hundreds of hours have been put into programming Validator.nu, and it works perfectly on http://validator.nu. There's no need for the W3C's validator to mimic all those algorithms. Is that what's being suggested, or am I missing something here?

Here is all the W3C's validator needs to do:

  if document is normal HTML4/XHTML1 etc.
      do the normal validator.w3.org stuff
  if document is using the HTML5 doctype
      send it over to the validator.nu part of the validator to sort out

Then the validator.nu part of the W3C's validator can determine whether it is HTML5 or XHTML5 based on the MIME type, file extension, etc. I was under the impression that this was the way it was set up already.

Of course, there's the issue of XHTML5 without a doctype, but I'll comment on that on the other bug report. :)
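The routing described in that comment can be sketched as follows (hypothetical helper, purely illustrative of the proposal; the real validator's dispatch is more involved):

```python
def route_document(doctype, media_type):
    """Sketch of the proposed routing: legacy doctypes stay on the
    classic DTD-based path, while the HTML5 doctype is handed to the
    Validator.nu backend, which then picks HTML5 vs XHTML5 from the
    media type. Illustrative only, not the validator's real code."""
    if doctype.strip().lower() == "<!doctype html>":
        # Validator.nu decides the flavour itself, from the media type.
        if media_type == "application/xhtml+xml":
            return ("validator.nu", "XHTML5")
        return ("validator.nu", "HTML5")
    # Everything else: the normal validator.w3.org SGML/XML machinery.
    return ("validator.w3.org", doctype)
```

In this sketch, the csszengarden.com case (HTML5 doctype, text/html) lands on the Validator.nu side as HTML5, never the XML parser.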
(In reply to comment #8)
> I don't see how that is relevant.

If it is so obvious, you could provide an obvious patch?

> Any web page using the new HTML doctype (aka the HTML5 doctype
> "<!DOCTYPE html>") should be passed over to the Validator.nu side of the
> W3C's validator.

Correct.
(In reply to comment #9)
> (In reply to comment #8)
> > I don't see how that is relevant.
>
> If it is so obvious, you could provide an obvious patch?

Forgot a " ;) " there. Working on a patch, FWIW...
Working on dev now. http://qa-dev.w3.org/wmvs/HEAD/check?uri=http%3A%2F%2Fwww.aneventapart.com%2F
(In reply to comment #9)
> (In reply to comment #8)
> > I don't see how that is relevant.
>
> If it is so obvious,

Sorry, I wasn't suggesting it was obvious, just trying to come up with an algorithm. I was just a bit concerned over how validator.w3.org and Validator.nu work together, that's all. :-) But it looks like it's coming together well.

> you could provide an obvious patch?

Well, if you can show me how to set up the validator on Ubuntu or Windows XP (I've tried before, but with mixed results), I'll have a go at writing a patch for the next bug I find. :)

Keep up the good work!
The implementation in CVS used the "textarea input" mode for validator.nu without passing the "parser" parameter, which is against a "should" in the upstream docs. I changed it to use the "HTTP entity body" mode, which solves this problem and also allows us, in non-doctype-override, non-charset-override scenarios, to pass the original document and its content-type and charset intact to validator.nu. As a side effect, this also lets us send gzipped requests to validator.nu with libwww-perl, which is recommended by the upstream docs. http://www.w3.org/mid/E1LJZgk-0007br-NR%40lionel-hutz.w3.org
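For readers unfamiliar with the "HTTP entity body" mode: the idea is that the original document bytes are sent as the POST body, with their Content-Type (including charset) forwarded intact, and the body optionally gzip-compressed. A rough sketch in Python rather than the validator's actual Perl/libwww-perl code; the service URL and out=json parameter are how the public Validator.nu web service is commonly invoked, not the W3C validator's internal wiring:

```python
import gzip
import urllib.request

def build_entity_body_request(doc_bytes, content_type,
                              service="https://validator.nu/?out=json"):
    """Build (without sending) a POST in the style of the "HTTP entity
    body" mode: original bytes as the body, Content-Type preserved,
    request body gzip-compressed. Illustrative sketch only."""
    body = gzip.compress(doc_bytes)
    return urllib.request.Request(
        service,
        data=body,
        headers={
            "Content-Type": content_type,   # passed through intact, charset and all
            "Content-Encoding": "gzip",     # compressed request, as now in CVS
        },
        method="POST",
    )
```

This contrasts with the "textarea input" mode, where the document is re-encoded into a form field and the original content-type information is lost unless a "parser" parameter is supplied explicitly.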
(In reply to comment #13)
> The implementation in CVS used the "textarea input" mode for validator.nu
> without passing the "parser" parameter, which is against a "should" in upstream
> docs. I changed it to use the "HTTP entity body" mode

Good catch!
Thanks. One correction, just for the record:

(In reply to comment #13)
> gzipped requests to validator.nu with libwww-perl which is recommended by
> upstream

I seem to have confused this with the upstream recommendation to take advantage of compressed _responses_; the docs are more neutral about request compression. Well, we have both in CVS now anyway. :)