This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The validator fails to handle comments that contain meta tags, like this:

    <!DOCTYPE html>
    <title></title>
    <!-- <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"> -->

After validation, the text is changed to:

    <!DOCTYPE html>
    <title></title>
    <!-- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><!-- <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"> --> -->

The error message is the following:

    Consecutive hyphens did not terminate a comment. -- is not permitted inside a comment, but e.g. - - is.
Thanks for reporting this. Since you're using http://validator.w3.org/ to check your document, this seems to be a bug in the legacy validator code. I recommend that you use http://validator.w3.org/nu/ instead.

The code for the legacy validator at http://validator.w3.org/ is not code I work on, but as far as I can see it does some preprocessing on the input before sending it to the HTML5 backend. The way to avoid that broken preprocessing is to use the UI at http://validator.w3.org/nu/, which sends the input to the same HTML5 backend without any preprocessing.

I'll try to get this bug in the legacy validator fixed, but I think it's likely I'll do that simply by having the request redirected to http://validator.w3.org/nu/ so that the preprocessing gets bypassed completely.
Ville, I can reproduce this bug with the doctype set to HTML4, so this isn't a bug in the HTML5 backend or in the REST API for the HTML5 validator. Rather, it seems to be a bug in some preprocessing step in the validator's Perl code.
That's right, I took a brief look at it too. The culprit is override_charset() which does a simple text replacement without any "intelligence" whatsoever and thus has no idea about the context it is working in. I'm afraid it'll take more than a few lines of code to fix this.
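To illustrate the kind of context-blind replacement described above, here is a minimal Python sketch. It is not the validator's Perl code: the function names, the regular expressions, and the "insert a new meta and comment out the old one" behavior are assumptions inferred only from the output shown in the report. The second function shows one possible mitigation (skipping text inside existing comments), which is still a regex approximation and not a real HTML parser.

```python
import re

# Hypothetical pattern for a charset-declaring <meta> tag
# (an assumption, not the validator's actual regex).
META_RE = re.compile(r'<meta\b[^>]*\bcharset=[^>]*>', re.IGNORECASE)
# Rough approximation of an HTML comment.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)


def naive_override(html, charset):
    """Context-blind replacement: rewrites every matching <meta>,
    including ones that already sit inside a comment."""
    def repl(match):
        new_meta = ('<meta http-equiv="Content-Type" '
                    'content="text/html; charset=%s">' % charset)
        # Keep the old tag around, commented out.
        return new_meta + '<!-- ' + match.group(0) + ' -->'
    return META_RE.sub(repl, html)


def comment_aware_override(html, charset):
    """Apply the same replacement only to text outside existing comments."""
    out = []
    pos = 0
    for comment in COMMENT_RE.finditer(html):
        out.append(naive_override(html[pos:comment.start()], charset))
        out.append(comment.group(0))  # leave comments untouched
        pos = comment.end()
    out.append(naive_override(html[pos:], charset))
    return ''.join(out)


doc = ('<!DOCTYPE html> <title></title> '
       '<!-- <meta http-equiv="Content-Type" '
       'content="text/html; charset=Shift_JIS"> -->')

print(naive_override(doc, 'UTF-8'))          # nested comment, as in the report
print(comment_aware_override(doc, 'UTF-8'))  # document left unchanged
```

With the reported input, the naive version produces the nested "<!-- ... <!-- ... --> -->" text that triggers the "Consecutive hyphens" error, while the comment-aware version leaves the commented-out meta alone. A robust fix would still need to account for other contexts (for example script content), which is consistent with the remark that it would take more than a few lines of code.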