This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4623 - XML parser cannot dereference entities, complains about them as validation errors
Summary: XML parser cannot dereference entities, complains about them as validation errors
Status: RESOLVED FIXED
Alias: None
Product: Validator
Classification: Unclassified
Component: Parser
Version: 0.8.0b2
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Terje Bless
QA Contact: qa-dev tracking
URL: http://www.riastudio.fr/w3c/testcases...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-06-11 08:54 UTC by riaStudio
Modified: 2007-06-12 11:57 UTC
CC List: 0 users

See Also:


Attachments
List of HTML4 entities, XHTML1.0 Transitional (43.41 KB, text/html)
2007-06-11 08:55 UTC, riaStudio
Validator results (204.68 KB, text/html)
2007-06-11 17:50 UTC, riaStudio

Description riaStudio 2007-06-11 08:54:26 UTC
A bunch of HTML4 entities do not validate anymore when using the XHTML 1.0 Transitional doctype (see testcase; list of entities taken from http://www.cookwood.com/html/extras/entities.html).
Comment 1 riaStudio 2007-06-11 08:55:54 UTC
Created attachment 478
List of HTML4 entities, XHTML1.0 Transitional

Testcase
Comment 2 Olivier Thereaux 2007-06-11 15:13:02 UTC
Hello

(In reply to comment #0)
> A bunch of HTML4 entities do not validate anymore when using the XHTML 1.0
> Transitional doctype (see testcase, list of entities taken from
> http://www.cookwood.com/html/extras/entities.html).

I tested the test case you provided with both production and test versions of the validator, and both happily parsed the entities. Do you have an online example of a document that fails, or was it a transient issue?

Comment 3 riaStudio 2007-06-11 17:50:06 UTC
Created attachment 479
Validator results
Comment 4 riaStudio 2007-06-11 17:51:30 UTC
Using http://validator-test.w3.org/ (v0.8.0-beta2), both testcases still fail to validate for me (248 errors):

1. When validating by URI the following testcase:
http://www.riastudio.fr/w3c/testcases/entity-testcase.html

2. When validating by file upload, using Attachment #479.

This does not seem to be platform- or browser-specific (I've tried on both Linux and Windows, with Konqueror, Firefox and IE).

See Attachment #479 for the result I get.
Comment 5 Olivier Thereaux 2007-06-11 20:25:12 UTC
I'm starting to suspect the issue is not really related to entities (the validator running on e.g. qa-dev.w3.org gives different results from validator-test.w3.org), but rather to a protection mechanism of www.w3.org that stops the validator's XML parser from GETting the entities file.

Looking into it.
Comment 6 Olivier Thereaux 2007-06-11 20:41:10 UTC
I can confirm my hunch: this is definitely www.w3.org's protection mechanism at play.

validator-test:~# wget http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
HTTP request sent, awaiting response... 503 Service Unavailable due to Abuse from requesting IP
20:39:03 ERROR 503: Service Unavailable due to Abuse from requesting IP.

I hope I can set up XML::LibXML to be a good web citizen and cache such files... In the meantime I'll whitelist the host and we should be safe (enough).
Comment 7 Olivier Thereaux 2007-06-12 02:16:19 UTC
(In reply to comment #6)
> I hope I can set up XML::LibXML to be a good web citizen and cache such
> files... In the meantime I'll whitelist the host and we should be safe
> (enough).

This seems to be doing the job, for now.
Comment 8 riaStudio 2007-06-12 07:00:23 UTC
Maybe this can help you (quoted from http://xmlsoft.org/catalog.html) :

"In a normal environment libxml2 will by default check the presence of a catalog in /etc/xml/catalog, and assuming it has been correctly populated, the processing is completely transparent to the document user."

"If your system is correctly configured all the authoring phase and processing should use only local files, even if your document stays portable because it uses the canonical public and system ID, referencing the remote document."

Comment 9 Olivier Thereaux 2007-06-12 11:57:25 UTC
(In reply to comment #8)
> Maybe this can help you (quoted from http://xmlsoft.org/catalog.html) :

Excellent, it helped indeed. I followed that lead and found that the Perl binding we use has an option to load a catalog, so we can use that:
http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod#load_catalog

Thanks!
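
For readers who want to reproduce the catalog approach from comments 8 and 9, here is a minimal shell sketch rather than the validator's actual configuration. It writes an OASIS XML catalog that rewrites every system identifier under http://www.w3.org/TR/xhtml1/DTD/ to a local directory, then points libxml2 at it through the XML_CATALOG_FILES environment variable. The paths DTD_DIR and CATALOG are hypothetical, and the sketch assumes local copies of the DTD and entity files (for example xhtml-lat1.ent) have already been placed in DTD_DIR; nothing is downloaded here.

```shell
#!/bin/sh
# Sketch: build a local XML catalog so libxml2 resolves the XHTML 1.0
# DTD and entity files from disk instead of fetching them from www.w3.org.
# DTD_DIR and CATALOG are hypothetical paths chosen for this example.
set -eu

DTD_DIR="${DTD_DIR:-$PWD/xhtml1-dtd}"
CATALOG="${CATALOG:-$PWD/catalog.xml}"

# The directory is expected to hold local copies of the DTD and .ent files.
mkdir -p "$DTD_DIR"

# Rewrite any system identifier under the W3C DTD directory to the local copy.
cat > "$CATALOG" <<EOF
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
                 rewritePrefix="file://$DTD_DIR/"/>
</catalog>
EOF

# libxml2 consults this variable instead of the default /etc/xml/catalog.
export XML_CATALOG_FILES="$CATALOG"
echo "catalog written to $CATALOG"
```

A program using the Perl binding could presumably hand the same catalog file to the load_catalog option linked in comment 9 instead of relying on the environment variable; that part is not shown here.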