This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6329 - Implement XML::LibXML Structured Errors
Summary: Implement XML::LibXML Structured Errors
Status: RESOLVED FIXED
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: HEAD
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Olivier Thereaux
QA Contact: qa-dev tracking
URL: http://deeden.co.uk/misc/quantum.html
Whiteboard:
Keywords:
Duplicates: 4420
Depends on:
Blocks:
 
Reported: 2008-12-22 14:14 UTC by Steve Rushe
Modified: 2009-03-13 14:59 UTC

Description Steve Rushe 2008-12-22 14:14:40 UTC
I've noticed that some pages are reporting valid at one point in time and invalid later despite nothing in the page having changed.

This occurs with 3 or 4 pages I'm testing, all of which are presumably invalid because of a common error (no space before a class= attribute). I've reduced this to a test case which reproduces the bug I'm seeing (http://deeden.co.uk/misc/quantum.html). There is no space before class="hello" on line 10, and this is what the validator reports when it treats the page as invalid.

As I write this the page is being reported as invalid; however, in a while (whether minutes or hours) it will be reported as valid. If I retry the page a few times it will continue to be reported as valid until it eventually starts being reported as invalid, again consistently.

I've checked that the headers being sent for the page are the same during both valid and invalid periods and they are, so it's not something to do with that.

The behaviour I see is happening both through the web interface and the WebService::Validator::HTML::W3C perl module.
Comment 1 Olivier Thereaux 2008-12-22 14:26:39 UTC
(In reply to comment #0)
> I've noticed that some pages are reporting valid at one point in time and
> invalid later despite nothing in the page having changed.

I find this surprising. One way to debug this would be to check the "show source" option in the validator and see whether the validator is consistently served the same markup or not.

If the validator consistently received the same markup and reports different results, then it's a bug with the validator.

If the validator receives different markup at different times, it is more likely to be an issue with your server, some proxy/cache at your ISP level, etc.

I have loaded the validation results and will keep reloading to see if the results change. So far, results have been consistent (invalid).
Comment 2 Steve Rushe 2008-12-22 14:54:01 UTC
(In reply to comment #1)
> I find this surprising. One way to debug this would be to check show source
> in the validator and see if the validator is consistently served the same
> markup, or not.

I knew there was one other thing I meant to check systematically. I did do a quick check yesterday. Both the valid and invalid versions showed the same code in the source view.
 
> I have loaded the validation results and will keep reloading to see if the
> results change. So far, results have been consistent (invalid).

It was the same here until just now when I checked. At the moment, it's reporting as valid for me.
Comment 3 Olivier Thereaux 2008-12-22 15:08:37 UTC
(In reply to comment #2)
> It was the same here until just now when I checked. At the moment, it's
> reporting as valid for me.

Ack. It's not one validator being inconsistent, it's two servers acting differently:
http://128.30.52.13/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug

http://128.30.52.49/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug

It looks like these two servers are using different versions of the XML libraries, but the difference in results is disturbing. Will look into that.
 

Comment 4 Steve Rushe 2008-12-22 15:38:16 UTC
(In reply to comment #3)
> 
> Ack. It's not one validator being inconsistent, it's two servers acting
> differently:
> http://128.30.52.13/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug
> 
> http://128.30.52.49/check?uri=http%3A%2F%2Fdeeden.co.uk%2Fmisc%2Fquantum.html&debug
> 
> It looks like these two servers are using different versions of the XML
> libraries, but the difference in result are disturbing. Will look into that.

Cheers for that Olivier. I'm relieved that it's not something I did.
Comment 5 Olivier Thereaux 2008-12-22 16:39:43 UTC
I am finding incompatibilities between libxml2 and XML::LibXML, two lower-level libraries used by the validators, but only for certain versions. This is very puzzling, to say the least.

Below is the script I used to test on various machines, followed by a number of results. The results saying "attributes construct error" are the proper, expected ones.

I will try and contact the maintainer(s) for XML::LibXML, in hope that they can be of help.


#!/usr/bin/perl
use 5.008; use strict; use warnings; use utf8; use XML::LibXML qw();
my $dotted=XML::LibXML::LIBXML_DOTTED_VERSION;
print "XML::LibXML Version: $XML::LibXML::VERSION\nlibxml2 Version: $dotted\n\n";
XML::LibXML->new()->parse_string('<foo attr1="value1"attr2="value2" />');



XML::LibXML Version: 1.66
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5


**************************************************************************

XML::LibXML Version: 1.68
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5



**************************************************************************

XML::LibXML Version: 1.69
libxml2 Version: 2.6.16

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5



**************************************************************************

XML::LibXML Version: 1.66
libxml2 Version: 2.6.32

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5


**************************************************************************



XML::LibXML Version: 1.69
libxml2 Version: 2.6.32

:1: parser error : Extra content at the end of the document


**************************************************************************

XML::LibXML Version: 1.63
libxml2 Version: 2.6.29

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 5


**************************************************************************

XML::LibXML Version: 1.68
libxml2 Version: 2.6.27

:1: parser error : Extra content at the end of the document



**************************************************************************
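
The results above can be told apart mechanically. What follows is only a minimal sketch, not code from the validator: the helper name reports_expected_error is hypothetical, and the two sample strings are the error messages copied from the outputs listed above (without the source excerpts and caret lines).

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper (not part of the validator): returns 1 when the
# captured error text contains the first, expected well-formedness error,
# and 0 otherwise.
sub reports_expected_error {
    my ($error_text) = @_;
    return 0 unless defined $error_text;
    return $error_text =~ /attributes construct error/ ? 1 : 0;
}

# Error messages as printed by the version combinations listed above.
my $full_report      = ":1: parser error : attributes construct error\n"
                     . ":1: parser error : Couldn't find end of Start Tag foo line 1\n"
                     . ":1: parser error : Extra content at the end of the document\n";
my $truncated_report = ":1: parser error : Extra content at the end of the document\n";

# Classify each captured report.
for my $report ($full_report, $truncated_report) {
    print reports_expected_error($report) ? "expected\n" : "truncated\n";
}
```

In a real test run the text would presumably come from $@ after wrapping the parse_string call in eval, rather than from hard-coded strings.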

Comment 6 Olivier Thereaux 2008-12-22 16:40:14 UTC
(In reply to comment #4)
> Cheers for that Olivier. I'm relieved that it's not something I did.

No problem Steve, and many thanks for the report.


Comment 7 Olivier Thereaux 2008-12-22 16:52:08 UTC
For the time being I have downgraded the version of the XML::LibXML library (now using 1.66 which seems to work better) and the two validator.w3.org servers are producing the proper (and consistent) output.

Will keep this bug open until we have a satisfying resolution of the library problem, not just this workaround.
Comment 8 Olivier Thereaux 2008-12-22 19:37:30 UTC
I think I found the culprit in the Changelog for XML::LibXML. In recent versions, there is a new module to use the Structured Errors API (great!) but it's not quite backward compatible.

Will have to add code to handle the Structured Errors. 
http://search.cpan.org/~pajas/XML-LibXML-1.69/lib/XML/LibXML/Error.pod
Comment 9 Olivier Thereaux 2008-12-31 20:36:49 UTC
(In reply to comment #8)
> I think I found the culprit in the Changelog for XML::LibXML. In recent
> versions, there is a new module to use the Structured Errors API (great!) but
> it's not quite backward compatible.
> 
> Will have to add code to handle the Structured Errors. 
> http://search.cpan.org/~pajas/XML-LibXML-1.69/lib/XML/LibXML/Error.pod

The frustrating part so far is that the new structured errors code only gives you the last parsing error (when, if anything, I would be happy enough showing only the first!)

Using this code, from the perl module documentation:

use 5.008; use strict; use warnings; use utf8; use XML::LibXML qw();
my $dotted=XML::LibXML::LIBXML_DOTTED_VERSION;
print "XML::LibXML Version: $XML::LibXML::VERSION\nlibxml2 Version: $dotted\n\n";

eval {XML::LibXML->new()->parse_string('<foo attr1="value1"attr2="value2" />')};
if (ref($@)) {
  # handle a structured error (XML::LibXML::Error object)
  print $@->dump();
} elsif ($@) {
  # error, but not an XML::LibXML::Error object
  print $@;
} else {
  # no error
}


ot@qa:~$ perl testlibxml.pl 
XML::LibXML Version: 1.66
libxml2 Version: 2.6.32

:1: parser error : attributes construct error
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Couldn't find end of Start Tag foo line 1
<foo attr1="value1"attr2="value2" />
                   ^
:1: parser error : Extra content at the end of the document
<foo attr1="value1"attr2="value2" />
                   ^ at testlibxml.pl line 6


ot@qa:~$ perl testlibxml.pl 
XML::LibXML Version: 1.69
libxml2 Version: 2.6.32

$error = bless( {
                  'num1' => 0,
                  'file' => '',
                  'message' => 'Extra content at the end of the document
',
                  'domain' => 1,
                  'level' => 3,
                  'str2' => undef,
                  '_prev' => undef,
                  'str1' => undef,
                  'str3' => undef,
                  'num2' => 11,
                  'code' => 5,
                  'line' => 1
                }, 'XML::LibXML::Error' );

Maybe I'm doing it wrong? The documentation is scarce; I might need to contact the developer(s) to get some clearer answers.
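
The dump above has a '_prev' field, which is undef here. If a fixed release of XML::LibXML fills that field with the previous error (an assumption, not something shown in this output), the first error could be recovered by walking the chain backwards. The sketch below uses plain hashes with the same keys as the dump as mock data; the helper name errors_in_order is hypothetical.

```perl
use strict; use warnings;

# Hypothetical helper: follow the '_prev' links from the last error back to
# the first, and return the errors in the order they occurred.
sub errors_in_order {
    my ($last_error) = @_;
    my @errors;
    for (my $e = $last_error; defined $e; $e = $e->{_prev}) {
        unshift @errors, $e;    # earlier errors end up at the front
    }
    return @errors;
}

# Mock data only: plain hashes mimicking the keys of the dump above.
# The chaining through '_prev' is an assumption about a fixed release.
my $first  = { line => 1, message => "attributes construct error\n",
               _prev => undef };
my $second = { line => 1, message => "Couldn't find end of Start Tag foo line 1\n",
               _prev => $first };
my $last   = { line => 1, message => "Extra content at the end of the document\n",
               _prev => $second };

# Print the errors in the style of the older, unstructured output.
for my $e (errors_in_order($last)) {
    (my $message = $e->{message}) =~ s/\s+\z//;   # drop the trailing newline
    print ":$e->{line}: parser error : $message\n";
}
```

With the 1.69 object dumped above, where '_prev' is undef, the same loop would return only the last error, which matches the behaviour described in this comment. With real XML::LibXML::Error objects, the module's own accessors are probably preferable to reading the hash directly.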

Comment 10 Olivier Thereaux 2009-01-07 22:06:45 UTC
Followup archived here:
http://lists.w3.org/Archives/Public/public-qa-dev/2009Jan/0003.html

Hoping Petr will have time to respond.
Comment 11 Olivier Thereaux 2009-01-22 14:34:57 UTC
Got a prompt reply from Petr:
http://lists.w3.org/Archives/Public/public-qa-dev/2009Jan/0004.html
and now waiting for the release of the version 1.70 of XML::LibXML
http://search.cpan.org/dist/XML-LibXML/

Things are going to be tricky then, because any system with a version of XML::LibXML between 1.67 and 1.69 (inclusive) will have slightly wrong error reporting for xml-wf issues. I'm wondering whether this would be acceptable, or whether to require >= 1.70, which may then be a burden.

To be continued...
Comment 12 Olivier Thereaux 2009-02-02 23:20:42 UTC
*** Bug 4420 has been marked as a duplicate of this bug. ***
Comment 13 Olivier Thereaux 2009-02-05 17:02:47 UTC
Ah-ha! 

XML-LibXML-1.69_1 developer release. Should be enough to start work on implementing the structured errors in the validator.
http://cpansearch.perl.org/src/PAJAS/XML-LibXML-1.69_1/Changes
Comment 14 Olivier Thereaux 2009-02-13 15:05:39 UTC
Implementation done, ready for next release:
http://lists.w3.org/Archives/Public/www-validator-cvs/2009Feb/0092.html
and
http://lists.w3.org/Archives/Public/www-validator-cvs/2009Feb/0136.html

Note that the code above does rely on a developer version of XML::LibXML - might be a bit of trouble for people with their own instance. But with time, we'll be fine.
Comment 15 Olivier Thereaux 2009-03-13 14:59:08 UTC
Here is a one-liner to test which version of XML::LibXML your system has:

perl -MXML::LibXML -e 'print " XML::LibXML Version: $XML::LibXML::VERSION\n";' 


If you have a version > 1.66 and < 1.70, I suggest heading to CPAN and installing the latest version:
http://search.cpan.org/dist/XML-LibXML/
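
The same range check can be scripted. This is a rough sketch only: the helper name in_suspect_range is hypothetical, it handles plain decimal version strings and not developer releases such as 1.69_1, and the results in comment 5 suggest the behaviour also depends on the libxml2 version, so the range is a heuristic rather than a guarantee.

```perl
use strict; use warnings;

# Hypothetical helper: returns 1 when a plain decimal version string such as
# "1.69" lies strictly between 1.66 and 1.70, the range named above.
# Anything that is not of the form digits.digits is treated as out of range.
sub in_suspect_range {
    my ($version) = @_;
    return 0 unless defined $version && $version =~ /\A\d+\.\d+\z/;
    return ($version > 1.66 && $version < 1.70) ? 1 : 0;
}

# Sample versions taken from this thread.
for my $version (qw(1.66 1.68 1.69 1.70)) {
    printf "%s: %s\n", $version,
        in_suspect_range($version) ? "upgrade suggested" : "ok";
}
```

On a real system the argument would be $XML::LibXML::VERSION after loading the module, as in the one-liner above.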