<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>6329</bug_id>
          
          <creation_ts>2008-12-22 14:14:40 +0000</creation_ts>
          <short_desc>Implement XML::LibXML Structured Errors</short_desc>
          <delta_ts>2009-03-13 14:59:08 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Validator</product>
          <component>check</component>
          <version>HEAD</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://deeden.co.uk/misc/quantum.html</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Steve Rushe">srushe</reporter>
          <assigned_to name="Olivier Thereaux">ot</assigned_to>
          <cc>ot</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>22822</commentid>
    <comment_count>0</comment_count>
    <who name="Steve Rushe">srushe</who>
    <bug_when>2008-12-22 14:14:40 +0000</bug_when>
    <thetext>I&apos;ve noticed that some pages are reporting valid at one point in time and invalid later despite nothing in the page having changed.

This occurs with 3 or 4 pages I&apos;m testing, all of which are presumably invalid with a common error (no space before a class=). I&apos;ve reduced this down to a test case which reproduces the bug I&apos;m seeing (http://deeden.co.uk/misc/quantum.html). There is no space before class=&quot;hello&quot; on line 10 and this is what the validator reports on when it views the page as invalid.  

As I write this the page is being reported as invalid; however, in a while (whether minutes or hours) it will be reported as valid. If I retry the page a few times it will continue to be reported as valid until it eventually starts being reported as invalid, again consistently.

I&apos;ve checked that the headers being sent for the page are the same during both valid and invalid periods and they are, so it&apos;s not something to do with that.

The behaviour I see is happening both through the web interface and the WebService::Validator::HTML::W3C perl module.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22823</commentid>
    <comment_count>1</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2008-12-22 14:26:39 +0000</bug_when>
    <thetext>(In reply to comment #0)
&gt; I&apos;ve noticed that some pages are reporting valid at one point in time and
&gt; invalid later despite nothing in the page having changed.

I find this surprising. One way to debug this would be to check show source in the validator and see if the validator is consistently served the same markup, or not.

If the validator consistently received the same markup and reports different results, then it&apos;s a bug with the validator.

If the validator receives different markup at different times, it is more likely to be an issue with your server, some proxy/cache at your ISP level, etc.

I have loaded the validation results and will keep reloading to see if the results change. So far, results have been consistent (invalid).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22824</commentid>
    <comment_count>2</comment_count>
    <who name="Steve Rushe">srushe</who>
    <bug_when>2008-12-22 14:54:01 +0000</bug_when>
    <thetext>
(In reply to comment #1)
&gt; I find this surprising. One way to debug this would be to check show source
&gt; in the validator and see if the validator is consistently served the same
&gt; markup, or not.

I knew there was one other thing I meant to check systematically. I did do a quick check of this yesterday. Both the valid and invalid versions showed the same code in the source view.
 
&gt; I have loaded the validation results and will keep reloading to see if the
&gt; results change. So far, results have been consistent (invalid).

It was the same here until just now when I checked. At the moment, it&apos;s reporting as valid for me.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22825</commentid>
    <comment_count>3</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2008-12-22 15:08:37 +0000</bug_when>
    <thetext>(In reply to comment #2)
&gt; It was the same here until just now when I checked. At the moment, it&apos;s
&gt; reporting as valid for me.

Ack. It&apos;s not one validator being inconsistent, it&apos;s two servers acting differently:
http://128.30.52.13/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&amp;debug

http://128.30.52.49/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&amp;debug

It looks like these two servers are using different versions of the XML libraries, but the differences in results are disturbing. Will look into that.
 

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22830</commentid>
    <comment_count>4</comment_count>
    <who name="Steve Rushe">srushe</who>
    <bug_when>2008-12-22 15:38:16 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; 
&gt; Ack. It&apos;s not one validator being inconsistent, it&apos;s two servers acting
&gt; differently:
&gt; http://128.30.52.13/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&amp;debug
&gt; 
&gt; http://128.30.52.49/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&amp;debug
&gt; 
&gt; It looks like these two servers are using different versions of the XML
&gt; libraries, but the differences in results are disturbing. Will look into that.

Cheers for that Olivier. I&apos;m relieved that it&apos;s not something I did.
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22834</commentid>
    <comment_count>5</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2008-12-22 16:39:43 +0000</bug_when>
    <thetext>I am finding incompatibilities between libxml2 and XML::LibXML, two lower-level libraries used by the validators, but only for certain versions. This is very puzzling, to say the least.

Below is the script I used to test on various machines, followed by a number of results. The results that include &quot;attributes construct error&quot; are the proper, expected ones.

I will try to contact the maintainer(s) of XML::LibXML, in the hope that they can be of help.


#!/usr/bin/perl
use 5.008; use strict; use warnings; use utf8; use XML::LibXML qw();
my $dotted=XML::LibXML::LIBXML_DOTTED_VERSION;
print &quot;XML::LibXML Version: $XML::LibXML::VERSION\nlibxml2 Version: $dotted\n\n&quot;;
XML::LibXML-&gt;new()-&gt;parse_string(&apos;&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;&apos;);



XML::LibXML Version: 1.66
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Couldn&apos;t find end of Start Tag foo line 1
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Extra content at the end of the document
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^ at testlibxml.pl line 5


**************************************************************************

XML::LibXML Version: 1.68
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Couldn&apos;t find end of Start Tag foo line 1
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Extra content at the end of the document
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^ at testlibxml.pl line 5



**************************************************************************

XML::LibXML Version: 1.69
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Couldn&apos;t find end of Start Tag foo line 1
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Extra content at the end of the document
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^ at testlibxml.pl line 5



**************************************************************************

XML::LibXML Version: 1.66
libxml2 Version: 2.6.32

:1: parser error : attributes construct error
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Couldn&apos;t find end of Start Tag foo line 1
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Extra content at the end of the document
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^ at testlibxml.pl line 5


**************************************************************************



XML::LibXML Version: 1.69
libxml2 Version: 2.6.32

:1: parser error : Extra content at the end of the document


**************************************************************************

XML::LibXML Version: 1.63
libxml2 Version: 2.6.29

:1: parser error : attributes construct error
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Couldn&apos;t find end of Start Tag foo line 1
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Extra content at the end of the document
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^ at testlibxml.pl line 5


**************************************************************************

XML::LibXML Version: 1.68
libxml2 Version: 2.6.27

:1: parser error : Extra content at the end of the document



**************************************************************************

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22835</commentid>
    <comment_count>6</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2008-12-22 16:40:14 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; Cheers for that Olivier. I&apos;m relieved that it&apos;s not something I did.

No problem Steve, and many thanks for the report.


</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22836</commentid>
    <comment_count>7</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2008-12-22 16:52:08 +0000</bug_when>
    <thetext>For the time being I have downgraded the version of the XML::LibXML library (now using 1.66 which seems to work better) and the two validator.w3.org servers are producing the proper (and consistent) output.

Will keep this bug open until we have a satisfactory resolution of the library problem, not just this workaround.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22838</commentid>
    <comment_count>8</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2008-12-22 19:37:30 +0000</bug_when>
    <thetext>I think I found the culprit in the Changelog for XML::LibXML. In recent versions, there is a new module to use the Structured Errors API (great!) but it&apos;s not quite backward compatible.

Will have to add code to handle the Structured Errors. 
http://search.cpan.org/~pajas/XML-LibXML-1.69/lib/XML/LibXML/Error.pod</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22871</commentid>
    <comment_count>9</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2008-12-31 20:36:49 +0000</bug_when>
    <thetext>(In reply to comment #8)
&gt; I think I found the culprit in the Changelog for XML::LibXML. In recent
&gt; versions, there is a new module to use the Structured Errors API (great!) but
&gt; it&apos;s not quite backward compatible.
&gt; 
&gt; Will have to add code to handle the Structured Errors. 
&gt; http://search.cpan.org/~pajas/XML-LibXML-1.69/lib/XML/LibXML/Error.pod

The frustrating part so far is that the new structured errors code only gives you the last parsing error (when, if anything, I would be happy enough showing only the first!)

Using this code, from the perl module documentation:

use 5.008; use strict; use warnings; use utf8; use XML::LibXML qw();
my $dotted=XML::LibXML::LIBXML_DOTTED_VERSION;
print &quot;XML::LibXML Version: $XML::LibXML::VERSION\nlibxml2 Version: $dotted\n\n&quot;;

eval {XML::LibXML-&gt;new()-&gt;parse_string(&apos;&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;&apos;)};
if (ref($@)) {
  # handle a structured error (XML::LibXML::Error object)
  print $@-&gt;dump();
} elsif ($@) {
  # error, but not an XML::LibXML::Error object
  print $@;
} else {
  # no error
}


ot@qa:~$ perl testlibxml.pl 
XML::LibXML Version: 1.66
libxml2 Version: 2.6.32

:1: parser error : attributes construct error
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Couldn&apos;t find end of Start Tag foo line 1
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^
:1: parser error : Extra content at the end of the document
&lt;foo attr1=&quot;value1&quot;attr2=&quot;value2&quot; /&gt;
                   ^ at testlibxml.pl line 6


ot@qa:~$ perl testlibxml.pl 
XML::LibXML Version: 1.69
libxml2 Version: 2.6.32

$error = bless( {
                  &apos;num1&apos; =&gt; 0,
                  &apos;file&apos; =&gt; &apos;&apos;,
                  &apos;message&apos; =&gt; &apos;Extra content at the end of the document
&apos;,
                  &apos;domain&apos; =&gt; 1,
                  &apos;level&apos; =&gt; 3,
                  &apos;str2&apos; =&gt; undef,
                  &apos;_prev&apos; =&gt; undef,
                  &apos;str1&apos; =&gt; undef,
                  &apos;str3&apos; =&gt; undef,
                  &apos;num2&apos; =&gt; 11,
                  &apos;code&apos; =&gt; 5,
                  &apos;line&apos; =&gt; 1
                }, &apos;XML::LibXML::Error&apos; );

Maybe I&apos;m doing it wrong? The documentation is sparse, so I might need to contact the developer(s) to get some clearer answers.

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>22980</commentid>
    <comment_count>10</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-01-07 22:06:45 +0000</bug_when>
    <thetext>Followup archived here:
http://lists.w3.org/Archives/Public/public-qa-dev/2009Jan/0003.html

Hoping Petr will have time to respond.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>23225</commentid>
    <comment_count>11</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-01-22 14:34:57 +0000</bug_when>
    <thetext>Got a prompt reply from Petr:
http://lists.w3.org/Archives/Public/public-qa-dev/2009Jan/0004.html
and now waiting for the release of the version 1.70 of XML::LibXML
http://search.cpan.org/dist/XML-LibXML/

Things are going to be tricky then, because any system with a version of XML::LibXML between 1.67 and 1.69 (inclusive) will have slightly wrong error reporting for xml-wf issues. I&apos;m wondering whether this would be acceptable, or whether to require &gt;= 1.70, which may then be a burden.

To be continued...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>23400</commentid>
    <comment_count>12</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-02-02 23:20:42 +0000</bug_when>
    <thetext>*** Bug 4420 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>23479</commentid>
    <comment_count>13</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-02-05 17:02:47 +0000</bug_when>
    <thetext>Ah-ha! 

XML-LibXML-1.69_1 developer release. Should be enough to start work on implementing the structured errors in the validator.
http://cpansearch.perl.org/src/PAJAS/XML-LibXML-1.69_1/Changes</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>23713</commentid>
    <comment_count>14</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-02-13 15:05:39 +0000</bug_when>
    <thetext>Implementation done, ready for next release:
http://lists.w3.org/Archives/Public/www-validator-cvs/2009Feb/0092.html
and
http://lists.w3.org/Archives/Public/www-validator-cvs/2009Feb/0136.html

Note that the code above does rely on a developer version of XML::LibXML - might be a bit of trouble for people with their own instance. But with time, we&apos;ll be fine.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24231</commentid>
    <comment_count>15</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-03-13 14:59:08 +0000</bug_when>
    <thetext>Here is a one-liner to test which version of XML::LibXML your system has:

perl -MXML::LibXML -e &apos;print &quot; XML::LibXML Version: $XML::LibXML::VERSION\n&quot;;&apos; 


If you have a version &gt; 1.66 and &lt; 1.70, I suggest heading to CPAN and installing the latest version:
http://search.cpan.org/dist/XML-LibXML/</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>