<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>5992</bug_id>
          
          <creation_ts>2008-08-26 06:32:00 +0000</creation_ts>
          <short_desc>Validator ignores HTML5 encoding declaration</short_desc>
          <delta_ts>2013-04-21 02:44:06 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Validator</product>
          <component>check</component>
          <version>HEAD</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WORKSFORME</resolution>
          
          
          <bug_file_loc>http://htmlex.met.cz/</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>0.8.6</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Martin Hassman">bugzilla</reporter>
          <assigned_to name="This bug has no owner yet - up for the taking">dave.null</assigned_to>
          <cc>dean</cc>
    
    <cc>jaka</cc>
    
    <cc>jill.ramonsky</cc>
    
    <cc>mike</cc>
    
    <cc>ot</cc>
    
    <cc>pbielen</cc>
    
    <cc>ted</cc>
    
    <cc>thomastraub2000</cc>
    
    <cc>w3.org</cc>
    
    <cc>w3c</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>21651</commentid>
    <comment_count>0</comment_count>
    <who name="Martin Hassman">bugzilla</who>
    <bug_when>2008-08-26 06:32:00 +0000</bug_when>
    <thetext>The validator seems to ignore the short form of the encoding declaration:
&lt;meta charset=&quot;utf-8&quot;&gt;

Validating the page http://htmlex.met.cz/ gives me one warning, &quot;No Character Encoding Found! Falling back to UTF-8.&quot; Validating it with the http://html5.validator.nu/ tool gives no warning.

The problem seems to occur only with &quot;Validate by URI&quot; and &quot;Validate by File Upload&quot;; &quot;Validate by Direct Input&quot; produces no warning.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24327</commentid>
    <comment_count>1</comment_count>
    <who name="Patrick Bielen">pbielen</who>
    <bug_when>2009-03-19 13:55:12 +0000</bug_when>
    <thetext>(In reply to comment #0)
&gt; The validator seems to ignore the short form of the encoding declaration:
&gt; &lt;meta charset=&quot;utf-8&quot;&gt;

Indeed, I agree: something is not right in the validator.
I get the same problem.

Best Regards,

Patrick</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24332</commentid>
    <comment_count>2</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2009-03-19 23:07:31 +0000</bug_when>
    <thetext>The problem is in the HTML::Encoding perl module used by the validator.  There&apos;s a bug report open about it at https://rt.cpan.org/Ticket/Display.html?id=42497</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24344</commentid>
    <comment_count>3</comment_count>
    <who name="Dean Edridge">dean</who>
    <bug_when>2009-03-20 12:53:43 +0000</bug_when>
    <thetext>(In reply to comment #2)
&gt; The problem is in the HTML::Encoding perl module used by the validator. 
&gt; There&apos;s a bug report open about it at
&gt; https://rt.cpan.org/Ticket/Display.html?id=42497
&gt; 

I can&apos;t see how that can be the problem. There may well be a problem with the HTML::Encoding module, but that shouldn&apos;t affect (X)HTML5 validation. AFAICT the W3C&apos;s part of the markup validator shouldn&apos;t even see the meta charset (&lt;meta charset=&quot;utf-8&quot;&gt;) part of the webpage: as soon as the validator sees the new HTML doctype (introduced in HTML5 (&lt;!DOCTYPE html&gt;)) it should pass the whole document over to the validator.nu part of the validator for validation, and then validator.nu, not the main W3C validator, should decide whether the charset is correct.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24348</commentid>
    <comment_count>4</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-03-20 14:33:42 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; (In reply to comment #2)
&gt; &gt; The problem is in the HTML::Encoding perl module used by the validator. 

&gt; I can&apos;t see how that can be the problem. 
[snip]
&gt; as soon as the validator sees the
&gt; new HTML doctype (introduced in HTML5 (&lt;!DOCTYPE html&gt;)) it should pass the
&gt; whole document over to the validator.nu 

The validator 1) needs to know the encoding before it can preparse the document and detect that doctype and 2) needs to know and decode the bytes before it can pass the document to the validator.nu engine. It is not “just” a redirection. </thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24370</commentid>
    <comment_count>5</comment_count>
    <who name="Dean Edridge">dean</who>
    <bug_when>2009-03-22 09:35:24 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; (In reply to comment #3)
&gt; &gt; (In reply to comment #2)
&gt; &gt; &gt; The problem is in the HTML::Encoding perl module used by the validator. 
&gt; 
&gt; &gt; I can&apos;t see how that can be the problem. 
&gt; [snip]
&gt; &gt; as soon as the validator sees the
&gt; &gt; new HTML doctype (introduced in HTML5 (&lt;!DOCTYPE html&gt;)) it should pass the
&gt; &gt; whole document over to the validator.nu 
&gt; 
&gt; The validator 1) needs to know the encoding before it can preparse the document
&gt; and detect that doctype and 2) needs to know and decode the bytes before it can
&gt; pass the document to the validator.nu engine. It is not “just” a
&gt; redirection. 
&gt; 

I think problems like this are going to be never-ending, so I think the W3C should use validator.nu as the &quot;front end&quot; of its validation service. Has this been considered before?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24373</commentid>
    <comment_count>6</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-03-22 22:15:21 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; I think problems like this are going to be never-ending, so I think the
&gt; W3C should use validator.nu as the &quot;front end&quot; of its validation
&gt; service. Has this been considered before?

This is getting a little off-topic and would probably be best discussed on the validator mailing list, but yes, this has been considered. 

The validator.nu engine is a wonderful piece of software, in many ways superior to the other engines that validator.w3.org uses. However, IMHO validator.nu is neither stable enough (see e.g. http://lists.w3.org/Archives/Public/www-validator/2009Mar/0037.html ) nor flexible enough (limited number of profiles, no DTD support for legacy HTML, etc.) nor usable enough (bare-bones UI and limited message explanations, no file upload, no direct input, etc.) to simply &quot;be&quot; the sole, front-end engine on validator.w3.org. 

I am quite certain that at this point, having validator.w3.org be a frontend for multiple engines, including OpenSP for DTD and validator.nu for html5 and other applications, is the most desirable architecture.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>25018</commentid>
    <comment_count>7</comment_count>
    <who name="Oli Studholme">w3.org</who>
    <bug_when>2009-05-07 03:13:21 +0000</bug_when>
    <thetext>For what itfs worth, I wrote up a description of this issue, with some linked reductions:
    http://oli-studio.com/bugs/validator/html5-charset/

It was mainly intended to explain the situation to content creators and to show which combinations of character set declaration methods generated no errors.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>27422</commentid>
    <comment_count>8</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2009-09-21 18:56:01 +0000</bug_when>
    <thetext>*** Bug 7135 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>28390</commentid>
    <comment_count>9</comment_count>
    <who name="Jill Ramonsky">jill.ramonsky</who>
    <bug_when>2009-10-16 08:22:38 +0000</bug_when>
    <thetext>This one is biting me too. Nothing to add, except I&apos;d like to see it fixed soon.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>29937</commentid>
    <comment_count>10</comment_count>
    <who name="Thomas Traub">thomastraub2000</who>
    <bug_when>2009-12-05 23:55:59 +0000</bug_when>
    <thetext>I encountered the same issue for http://usesthis.com/</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>29982</commentid>
    <comment_count>11</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2009-12-08 09:16:48 +0000</bug_when>
    <thetext>Ville has a new Validator release queued up to deploy, and I think it may contain a fix for this issue. I&apos;ll check with him and see.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>30064</commentid>
    <comment_count>12</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2009-12-10 19:01:40 +0000</bug_when>
    <thetext>There is no fix for this issue yet.  I have some local prototype-level code for this which I&apos;ll revisit soon, but it has some showstopper problems (for example, it might in some cases affect validation of non-HTML5 HTML documents).  Due to how the validator currently works, the fix is not trivial.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>30089</commentid>
    <comment_count>13</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2009-12-11 19:13:06 +0000</bug_when>
    <thetext>A fix is now in CVS and available for testing at http://qa-dev.w3.org/wmvs/HEAD/ .

Something weird happens when that (and my local instance) of validator tries to access the HTML5 validator installed locally on http://qa-dev.w3.org:8888/html5/ when validating http://htmlex.met.cz/ .  The error is &quot;Insecure dependency in connect while running with -T switch&quot;, and what makes it strange is that interfacing with the very same HTML5 validator when checking some other documents (such as the ones from comment 7 and comment 10) works just fine, as does validation when the validator is configured to use http://validator.nu/ as its HTML5 validator.  I have no idea how the document to be validated could cause this (it has already been fetched locally and is about to be POSTed to the same HTML5 instance, which works fine for other docs), but I&apos;ll try to find out.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>30100</commentid>
    <comment_count>14</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2009-12-12 12:51:48 +0000</bug_when>
    <thetext>(In reply to comment #13)
&gt; Something weird happens when that (and my local instance) of validator tries to
&gt; access the HTML5 validator installed locally on
&gt; http://qa-dev.w3.org:8888/html5/ when validating http://htmlex.met.cz/ .

A workaround has been found and applied (though the cause is still unknown); more details at http://rt.cpan.org/Public/Bug/Display.html?id=52707</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>30790</commentid>
    <comment_count>15</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2010-01-08 21:42:52 +0000</bug_when>
    <thetext>*** Bug 8678 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>30792</commentid>
    <comment_count>16</comment_count>
    <who name="Thomas Traub">thomastraub2000</who>
    <bug_when>2010-01-08 22:01:05 +0000</bug_when>
    <thetext>(In reply to comment #13)
&gt; A fix is now in CVS and available for testing at
&gt; http://qa-dev.w3.org/wmvs/HEAD/ .
&gt; 
This fix works for me, thanks</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>32779</commentid>
    <comment_count>17</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2010-03-02 19:52:25 +0000</bug_when>
    <thetext>Code fixes are included in 0.8.6 but unfortunately the required HTML::HeadParser &gt;= 3.60 module is not installed on the production validator.w3.org boxes yet.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>32787</commentid>
    <comment_count>18</comment_count>
    <who name="Ted Guild">ted</who>
    <bug_when>2010-03-03 04:03:24 +0000</bug_when>
    <thetext>(In reply to comment #17)
&gt; Code fixes are included in 0.8.6 but unfortunately the required
&gt; HTML::HeadParser &gt;= 3.60 module is not installed on the production
&gt; validator.w3.org boxes yet.

Installed now, sorry for the inconvenience.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>32797</commentid>
    <comment_count>19</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2010-03-03 17:17:17 +0000</bug_when>
    <thetext>Thanks, closing.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>35957</commentid>
    <comment_count>20</comment_count>
    <who name="Sasha Vodnik">w3c</who>
    <bug_when>2010-06-03 23:46:58 +0000</bug_when>
    <thetext>I just ran into this bug on the production site:
http://validator.w3.org/#validate_by_upload
The validator didn&apos;t see my file&apos;s &lt;!DOCTYPE html&gt;.
I verified that my code validates at 
http://qa-dev.w3.org/wmvs/HEAD/#validate_by_upload
Is it possible that this bug is fixed for the URI case, but not for uploads?

(In reply to comment #18)
&gt; (In reply to comment #17)
&gt; &gt; Code fixes are included in 0.8.6 but unfortunately the required
&gt; &gt; HTML::HeadParser &gt;= 3.60 module is not installed on the production
&gt; &gt; validator.w3.org boxes yet.
&gt; 
&gt; Installed now, sorry for the inconvenience.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>36139</commentid>
    <comment_count>21</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2010-06-14 06:51:26 +0000</bug_when>
    <thetext>I changed the category on this bug because it is not a bug in the validator.nu HTML5-checking backend but instead relates to the Perl code.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>86442</commentid>
    <comment_count>22</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2013-04-21 02:44:06 +0000</bug_when>
    <thetext>Just use http://validator.w3.org/nu/ directly.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>