This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5992 - Validator ignores HTML5 encoding declaration
Summary: Validator ignores HTML5 encoding declaration
Status: RESOLVED WORKSFORME
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: HEAD
Hardware: All All
Importance: P2 normal
Target Milestone: 0.8.6
Assignee: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
URL: http://htmlex.met.cz/
Whiteboard:
Keywords:
Duplicates: 7135 8678
Depends on:
Blocks:
 
Reported: 2008-08-26 06:32 UTC by Martin Hassman
Modified: 2013-04-21 02:44 UTC
CC List: 10 users

See Also:


Attachments

Description Martin Hassman 2008-08-26 06:32:00 UTC
It seems the validator ignores the short form of the encoding declaration:
<meta charset="utf-8">

Validating the page http://htmlex.met.cz/ gives me one warning: "No Character Encoding Found! Falling back to UTF-8." Validating with the http://html5.validator.nu/ tool gives no warning.

The problem seems to occur only with "Validate by URI" and "Validate by File Upload"; "Validate by Direct Input" produces no warning.
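To make the symptom concrete: a detector that only understands the older http-equiv form of the declaration finds nothing in a page that uses only <meta charset="utf-8">, and would then fall back with the warning quoted above. The Python sketch below is illustrative only. It is not the validator's Perl code or the HTML::Encoding module, the names MetaCharsetScanner and declared_charset are hypothetical, and it uses the standard library's html.parser rather than the HTML5 prescan algorithm (it only borrows the prescan's idea of looking at the first 1024 bytes).

```python
from html.parser import HTMLParser


class MetaCharsetScanner(HTMLParser):
    """Collects the first charset declared in a <meta> tag (sketch only)."""

    def __init__(self, accept_short_form: bool) -> None:
        super().__init__()
        self.accept_short_form = accept_short_form
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # html.parser lowercases attribute names; valueless attributes give None.
        a = {name: (value or "") for name, value in attrs}
        # HTML5 short form: <meta charset="utf-8">
        if self.accept_short_form and a.get("charset"):
            self.charset = a["charset"].strip().lower()
            return
        # Long form that older tools understand:
        # <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        if a.get("http-equiv", "").lower() == "content-type":
            for part in a.get("content", "").split(";"):
                key, _, value = part.partition("=")
                if key.strip().lower() == "charset" and value.strip():
                    self.charset = value.strip().strip("\"'").lower()
                    return


def declared_charset(raw: bytes, accept_short_form: bool = True):
    # Latin-1 maps every byte to exactly one code point, so ASCII markup
    # survives the decode whatever the document's real encoding is.
    scanner = MetaCharsetScanner(accept_short_form)
    scanner.feed(raw[:1024].decode("latin-1"))
    scanner.close()
    return scanner.charset


if __name__ == "__main__":
    page = b'<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>t</title>'
    print(declared_charset(page, accept_short_form=False))  # None: short form missed
    print(declared_charset(page))  # utf-8
```

The accept_short_form switch exists only to contrast the two behaviours side by side; a real detector would always accept both forms.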
Comment 1 Patrick Bielen 2009-03-19 13:55:12 UTC
(In reply to comment #0)
> It seems the validator ignores the short form of the encoding declaration:
> <meta charset="utf-8">

Indeed, agreed: something is not right in the validator.
I get the same problem.

Best Regards,

Patrick
Comment 2 Ville Skyttä 2009-03-19 23:07:31 UTC
The problem is in the HTML::Encoding perl module used by the validator.  There's a bug report open about it at https://rt.cpan.org/Ticket/Display.html?id=42497
Comment 3 Dean Edridge 2009-03-20 12:53:43 UTC
(In reply to comment #2)
> The problem is in the HTML::Encoding perl module used by the validator. 
> There's a bug report open about it at
> https://rt.cpan.org/Ticket/Display.html?id=42497
> 

I can't see how that can be the problem. There may well be a problem with the HTML::Encoding module, but that shouldn't affect (X)HTML5 validation. AFAICT, the W3C's part of the markup validator shouldn't even see the meta charset (<meta charset="utf-8">) part of the webpage: as soon as the validator sees the new HTML doctype (introduced in HTML5: <!DOCTYPE html>), it should pass the whole document over to the validator.nu part of the validator for validation, and validator.nu, not the main W3C validator, should then decide whether the charset is correct.
Comment 4 Olivier Thereaux 2009-03-20 14:33:42 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > The problem is in the HTML::Encoding perl module used by the validator. 

> I can't see how that can be the problem. 
[snip]
> as soon as the validator sees the
> new HTML doctype (introduced in HTML5 (<!DOCTYPE html>)) it should pass the
> whole document over to the validator.nu 

The validator 1) needs to know the encoding before it can preparse the document and detect that doctype and 2) needs to know and decode the bytes before it can pass the document to the validator.nu engine. It is not “just” a redirection. 
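A rough sketch of why this is more than a redirection: the bytes must be assigned an encoding and decoded before the doctype can even be read, let alone handed to another engine. The order below loosely follows HTML5's encoding sniffing (byte order mark, then the transport-layer charset, then an in-document declaration, then a default). The function name prepare_for_backend, the single simplified regex and the fallback handling are assumptions for illustration, not the validator's actual implementation.

```python
import codecs
import re
from typing import Optional, Tuple

# Simplified: finds "charset=" inside the first <meta> tag that has one,
# covering both <meta charset="..."> and the http-equiv content form.
# The real HTML5 prescan is stricter about comments, quoting and attributes.
META_CHARSET = re.compile(rb"<meta\s[^>]*?charset\s*=\s*[\"']?\s*([\w.:-]+)", re.I)
HTML5_DOCTYPE = re.compile(r"\s*<!doctype\s+html\s*>", re.I)

BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def prepare_for_backend(
    raw: bytes, http_charset: Optional[str] = None
) -> Tuple[str, str, Optional[str], bool]:
    """Return (encoding, decoded text, warning or None, looks like HTML5)."""
    warning = None
    for bom, name in BOMS:
        # 1. A byte order mark takes precedence over everything else.
        if raw.startswith(bom):
            encoding, raw = name, raw[len(bom):]
            break
    else:
        # 2. Charset from the transport layer (e.g. the HTTP Content-Type header).
        encoding = http_charset
        # 3. Otherwise look for a declaration in the first 1024 bytes.
        if encoding is None:
            match = META_CHARSET.search(raw[:1024])
            if match:
                encoding = match.group(1).decode("ascii")
        # 4. Nothing found: fall back to UTF-8 and report it.
        if encoding is None:
            encoding = "utf-8"
            warning = "No Character Encoding Found! Falling back to UTF-8."
    # Decoding must happen before the doctype check or any hand-off to an
    # HTML5 back end; an unknown encoding name raises LookupError here.
    text = raw.decode(encoding, errors="replace")
    is_html5 = HTML5_DOCTYPE.match(text) is not None
    return encoding, text, warning, is_html5
```

In this sketch, a missed meta charset in step 3 is exactly what produces the spurious fallback warning, even though the later doctype check would happily recognize the document as HTML5.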
Comment 5 Dean Edridge 2009-03-22 09:35:24 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > The problem is in the HTML::Encoding perl module used by the validator. 
> 
> > I can't see how that can be the problem. 
> [snip]
> > as soon as the validator sees the
> > new HTML doctype (introduced in HTML5 (<!DOCTYPE html>)) it should pass the
> > whole document over to the validator.nu 
> 
> The validator 1) needs to know the encoding before it can preparse the document
> and detect that doctype and 2) needs to know and decode the bytes before it can
> pass the document to the validator.nu engine. It is not “just” a
> redirection. 
> 

I think problems like this are going to be never-ending, so I think the W3C should use validator.nu as the "front end" of its validation service. Has this been considered before?
Comment 6 Olivier Thereaux 2009-03-22 22:15:21 UTC
(In reply to comment #5)
> I think problems like this are going to be never-ending, so I think the
> W3C should use validator.nu as the "front end" of its validation
> service. Has this been considered before?

This is getting a little OT and would probably be best on the validator list, but yes, this has been considered. 

The validator.nu engine is a wonderful piece of software, in many ways superior to the other engines that validator.w3.org uses. However, IMHO validator.nu is not stable enough (see e.g. http://lists.w3.org/Archives/Public/www-validator/2009Mar/0037.html ), flexible enough (limited number of profiles, no DTD support for legacy HTML, etc.), or usable enough (bare-bones UI and limited message explanations, no file upload, no direct input, etc.) to simply "be" the sole, front-end engine on validator.w3.org.

I am quite certain that at this point, having validator.w3.org be a frontend for multiple engines, including OpenSP for DTD and validator.nu for html5 and other applications, is the most desirable architecture.
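The multi-engine architecture described above boils down to a routing step once the document has been decoded. A minimal sketch, assuming a hypothetical choose_engine function and simplified doctype matching; the engine labels are plain strings here, and a real front end would also have to honor user overrides (such as a forced doctype) and non-HTML formats.

```python
import re
from typing import Optional

# Matches the start of a doctype and, if present, its public identifier.
DOCTYPE_RE = re.compile(
    r'<!doctype\s+(\w+)(?:\s+public\s+["\']([^"\']*)["\'])?', re.IGNORECASE
)


def choose_engine(text: str) -> Optional[str]:
    """Pick a back end for already-decoded markup (hypothetical labels)."""
    m = DOCTYPE_RE.search(text)
    if m is None:
        return None  # no doctype: leave the choice to the user
    root, public_id = m.group(1).lower(), m.group(2)
    if root == "html" and public_id is None:
        return "validator.nu"  # <!DOCTYPE html>: HTML5 checker
    if public_id is not None:
        return "opensp"  # DTD-based validation (HTML 4.01, XHTML 1.x, ...)
    return None
```

Keeping the routing separate from decoding is what lets the front end add or swap engines without touching encoding detection.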
Comment 7 Oli Studholme 2009-05-07 03:13:21 UTC
For what it's worth, I wrote up a description of this issue, with some linked reductions:
    http://oli-studio.com/bugs/validator/html5-charset/

It was mainly intended to explain the situation to content creators, and show what combination of character set declaration methods generated no errors.
Comment 8 Ville Skyttä 2009-09-21 18:56:01 UTC
*** Bug 7135 has been marked as a duplicate of this bug. ***
Comment 9 Jill Ramonsky 2009-10-16 08:22:38 UTC
This one is biting me too. Nothing to add, except I'd like to see it fixed soon.
Comment 10 Thomas Traub 2009-12-05 23:55:59 UTC
I encountered the same issue for http://usesthis.com/
Comment 11 Michael[tm] Smith 2009-12-08 09:16:48 UTC
Ville has a new Validator release queued up to deploy, and I think it may contain a fix for this issue. I'll check with him and see.
Comment 12 Ville Skyttä 2009-12-10 19:01:40 UTC
There is no fix for this issue yet.  I have some local prototype level code for this which I'll revisit soon, but it has some showstopper problems (for example it might in some cases affect validation of non-HTML5 HTML documents).  Due to how the validator works at the moment, the fix is not trivial.
Comment 13 Ville Skyttä 2009-12-11 19:13:06 UTC
A fix is now in CVS and available for testing at http://qa-dev.w3.org/wmvs/HEAD/ .

Something weird happens when that instance of the validator (and my local one) tries to access the HTML5 validator installed locally at http://qa-dev.w3.org:8888/html5/ while validating http://htmlex.met.cz/ . The error is "Insecure dependency in connect while running with -T switch", and what makes it strange is that interfacing with the very same HTML5 validator works just fine when checking some other documents (such as the ones from comment 7 and comment 10), as it does when the validator is configured to use http://validator.nu/ as its HTML5 validator. I have no idea how the document being validated could cause this (it has already been fetched locally and is about to be POSTed to the same HTML5 instance that works fine for other documents), but I'll try to find out.
Comment 14 Ville Skyttä 2009-12-12 12:51:48 UTC
(In reply to comment #13)
> Something weird happens when that (and my local instance) of validator tries to
> access the HTML5 validator installed locally on
> http://qa-dev.w3.org:8888/html5/ when validating http://htmlex.met.cz/ .

Found and applied a workaround (though not the underlying cause); more details at http://rt.cpan.org/Public/Bug/Display.html?id=52707
Comment 15 Ville Skyttä 2010-01-08 21:42:52 UTC
*** Bug 8678 has been marked as a duplicate of this bug. ***
Comment 16 Thomas Traub 2010-01-08 22:01:05 UTC
(In reply to comment #13)
> A fix is now in CVS and available for testing at
> http://qa-dev.w3.org/wmvs/HEAD/ .
> 
This fix works for me, thanks
Comment 17 Ville Skyttä 2010-03-02 19:52:25 UTC
Code fixes are included in 0.8.6 but unfortunately the required HTML::HeadParser >= 3.60 module is not installed on the production validator.w3.org boxes yet.
Comment 18 Ted Guild 2010-03-03 04:03:24 UTC
(In reply to comment #17)
> Code fixes are included in 0.8.6 but unfortunately the required
> HTML::HeadParser >= 3.60 module is not installed on the production
> validator.w3.org boxes yet.

Installed now, sorry for the inconvenience.
Comment 19 Ville Skyttä 2010-03-03 17:17:17 UTC
Thanks, closing.
Comment 20 Sasha Vodnik 2010-06-03 23:46:58 UTC
I just ran into this bug on the production site:
http://validator.w3.org/#validate_by_upload
The validator didn't see my file's <!DOCTYPE html>.
I verified that my code validates at 
http://qa-dev.w3.org/wmvs/HEAD/#validate_by_upload
Is it possible that this bug is fixed for the URI case, but not for uploads?

(In reply to comment #18)
> (In reply to comment #17)
> > Code fixes are included in 0.8.6 but unfortunately the required
> > HTML::HeadParser >= 3.60 module is not installed on the production
> > validator.w3.org boxes yet.
> 
> Installed now, sorry for the inconvenience.
Comment 21 Michael[tm] Smith 2010-06-14 06:51:26 UTC
I changed the category on this because this is not a bug in the validator.nu HTML5-checking backend; instead, it relates to the Perl code.
Comment 22 Michael[tm] Smith 2013-04-21 02:44:06 UTC
Just use http://validator.w3.org/nu/ directly.