This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 889 - checklink reports bogus broken fragments
Summary: checklink reports bogus broken fragments
Status: RESOLVED FIXED
Alias: None
Product: LinkChecker
Classification: Unclassified
Component: checklink
Version: 4.0
Hardware: All
OS: All
Importance: P2 normal
Target Milestone: ---
Assignee: Ville Skyttä
QA Contact: qa-dev tracking
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-23 04:33 UTC by Henryk Pl
Modified: 2004-10-10 21:10 UTC
CC List: 0 users

See Also:


Attachments

Description Henryk Pl 2004-09-23 04:33:59 UTC
Hi,

I'm currently checking links for a new version of http://de.selfhtml.org (this
is not the most current version, but it demonstrates the problem, too) and
checklink seems to report _all_ links to fragments on some pages as broken
although they are perfectly fine.

The destination pages this happens most often with are
http://de.selfhtml.org/html/referenz/elemente.htm,
http://de.selfhtml.org/css/formate/einbinden.htm and
http://de.selfhtml.org/html/attribute/eventhandler.htm

Reproducible: Always
Steps to reproduce: 
 1. Check links on http://de.selfhtml.org/html/tabellen/aufbau.htm

Actual results: All links to fragments on
http://de.selfhtml.org/html/referenz/elemente.htm are reported as broken.
Expected results: All links on that page are valid so there should be no error
report.

Note: In a recursive run on my own machine, checklink even said:
-----
Processing      http://...../selfhtml81/html/referenz/elemente.htm


Anchors

Found 0 anchors.
-----
which absolutely is not true.

And yes, as far as I can tell all involved pages are valid.
Comment 1 Ville Skyttä 2004-09-23 16:31:18 UTC
Thanks for the report.  This is indeed a bug in the link checker.  I have
created a minimal reproducer here:

  http://qa-dev.w3.org/~ville/style-anchor-bug/
  http://qa-dev.w3.org/~ville/style-anchor-bug/target.html

I haven't examined why it happens yet, but "&lt;style&gt;" in the <meta> element
of target.html (as well as your elemente.htm) causes the link checker to fail to
find any anchors in that document.  It's only the specific "&lt;style&gt;"
string that causes problems, others seem to be fine.

Side note, completely off-topic: the "&lt;sub &lt;" and "&lt;tfoot&gt;, &lt;" in
your keyword list look inconsistent with the others in it... ;)
Comment 2 Ville Skyttä 2004-10-10 21:10:17 UTC
Ok, got it.  Due to a stupid bug, we were pretty often passing the entire HTTP
response (including HTTP headers) to HTML::Parser.  The parser module is
apparently pretty lax about this, so it only rarely caused problems.

What triggered the failure in this case is HTML::HeadParser's behaviour of
expanding <meta> tags into fake HTTP headers, which in my testcase from comment
1 resulted in "X-Meta-Keywords: <style>", and the parser choked on it.
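
For readers who want to see the failure mode in isolation, here is a minimal
sketch in Python. It is an analogue only, not the link checker's Perl code:
it uses the standard library's html.parser instead of HTML::Parser, and the
names AnchorCollector, find_anchors, split_body, BODY and RAW are made up for
the illustration. It also assumes that the parser "chokes" because a stray
<style> start tag puts it into raw-text mode, which is how Python's parser
behaves; the report above does not state the exact cause inside HTML::Parser.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect fragment targets: <a name="..."> and any id="..." attribute."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value and (key == "id" or (tag == "a" and key == "name")):
                self.anchors.append(value)


def find_anchors(text):
    """Parse text as HTML and return the anchors found, in document order."""
    parser = AnchorCollector()
    parser.feed(text)
    parser.close()
    return parser.anchors


def split_body(raw_response):
    """Return only the part after the blank line that ends the headers."""
    _headers, _sep, body = raw_response.partition("\r\n\r\n")
    return body


# Hypothetical document with two fragment targets and no <style> element.
BODY = (
    "<html><head><title>t</title></head><body>"
    '<a name="intro">Intro</a><h2 id="details">Details</h2>'
    "</body></html>"
)

# Hypothetical raw response, including a fake header of the kind described
# above ("X-Meta-Keywords: <style>").
RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "X-Meta-Keywords: <style>\r\n"
    "\r\n" + BODY
)

if __name__ == "__main__":
    # Headers included: the stray <style> swallows the rest of the document.
    print("whole response:", find_anchors(RAW))
    # Headers stripped first: both anchors are found.
    print("body only:     ", find_anchors(split_body(RAW)))
```

With the headers left in, the unclosed <style> makes everything after it look
like stylesheet text, so no anchors are reported. That matches the "Found 0
anchors" symptom. Stripping the headers before parsing avoids it.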

This should be fixed in CVS, and the latest development version (which will
pretty soon be released as 4.1) is running at http://qa-dev.w3.org/wlc/checklink
for testing.