<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>4985</bug_id>
          
          <creation_ts>2007-08-25 04:10:25 +0000</creation_ts>
          <short_desc>Link checker dies on links to particular url</short_desc>
          <delta_ts>2009-12-10 19:30:56 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>LinkChecker</product>
          <component>checklink</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://validator.w3.org/checklink?uri=http%3A%2F%2Ffastcounter.bcentral.com%2Ffc-join&amp;hide_type=all&amp;depth=&amp;check=Check</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>4.5</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>robertgibsonx</reporter>
          <assigned_to name="Ville Skyttä">ville.skytta</assigned_to>
          <cc>azerger</cc>
    
    <cc>gonzo1lee</cc>
    
    <cc>ot</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>16269</commentid>
    <comment_count>0</comment_count>
    <who name="">robertgibsonx</who>
    <bug_when>2007-08-25 04:10:25 +0000</bug_when>
<thetext>There seems to be something that the link checker does not like about the following URL.

http://fastcounter.bcentral.com/fc-join

Internet Explorer times out on it, but the link checker either does not time out (if the site being checked links to it) or gives a 500 error (if trying to check links from it).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>16344</commentid>
    <comment_count>1</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2007-09-02 08:49:49 +0000</bug_when>
    <thetext>I can reproduce this locally - the error I get in the Apache error log is:

[Sun Sep 02 10:56:13 2007] [warn] [client 127.0.0.1] Timeout waiting for output from CGI script /home/scop/cvs/w3c/LinkChecker/bin/checklink, referer: http://localhost/checklink

However, I&apos;m not sure what to do about this. It is Apache that gives up waiting for output from checklink, which in my opinion is not strictly checklink&apos;s fault or under its control.

I&apos;ve moved the output of the initial HTTP headers so that they are written before the first document is fetched. But if the timeout happens and Apache gives up on our CGI, we&apos;ll still get missing results (or incomplete results if it kicks in later during the check), no sane error message, and invalid markup.

One thing worth looking into would be to decrease link checker&apos;s timeout to something smaller - Apache defaults to 300 seconds (but the default in my Fedora setup&apos;s httpd.conf is 120 seconds) and the link checker uses 60 seconds by default.  That doesn&apos;t explain why I see the timeout, but for example on qa-dev.w3.org the link checker reports a better error message which means Apache didn&apos;t kill it:

http://qa-dev.w3.org/wlc/checklink?uri=http%3A%2F%2Ffastcounter.bcentral.com%2Ffc-join&amp;hide_type=all&amp;depth=&amp;check=Check
Error: 500 Can&apos;t connect to fastcounter.bcentral.com:80 (connect: timeout)

Olivier, could you check what the httpd timeout is set to on validator.w3.org Apaches?  Other ideas?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>16349</commentid>
    <comment_count>2</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2007-09-03 01:22:39 +0000</bug_when>
    <thetext>(In reply to comment #1)

&gt; One thing worth looking into would be to decrease link checker&apos;s timeout to
&gt; something smaller


I&apos;m not sure that Apache is giving up (on the link checker) before the link checker gives up (on the checked link). As I just checked, Apache&apos;s timeout value is 120 on our servers, which is way above the link checker&apos;s. I suspect that the actor abandoning the game here is actually the browser, which I think typically has a 30-second timeout.

Therefore, I agree we could lower the link checker&apos;s default timeout value to, say, 20 or 30 seconds.

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24044</commentid>
    <comment_count>3</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-03-04 14:04:09 +0000</bug_when>
    <thetext>(In reply to comment #2)

&gt; therefore, I agree we could lower the link checker&apos;s default timeout value to,
&gt; say, 20 or 30 seconds.

In my test instance of the link checker, I have the default timeout set to 20 seconds. I just found a strange (not documented AFAICT) behavior of LWP. See this output:

Checking link http://jtc1sc36.org/doc/36N1141.pdf
HEAD http://jtc1sc36.org/doc/36N1141.pdf  fetched in 41.01s

Checking link http://jtc1sc36.org/doc/36N1142.pdf
HEAD http://jtc1sc36.org/doc/36N1142.pdf  fetched in 21.00s

Both resources are later reported as timing out. Note that for the first resource, LWP waited twice the configured time before it timed out. 

This doubling would actually explain why Apache (timeout 120) currently times out before the link checker (timeout currently 60 in the production version) does. Browsers typically time out at 300 seconds, so that is not a problem.

I suggest we switch the default timeout value to 30s in the upcoming 4.5 release. Users checking very slow servers on the command line can override that setting.
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24052</commentid>
    <comment_count>4</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2009-03-04 17:26:07 +0000</bug_when>
    <thetext>I guess the reason for the first one taking double time is that under the hood, when accessing a host for the first time, link checker (LWP::RobotUA) tries to fetch /robots.txt which times out after 20 sec, then accessing the actual resource takes another 20.  Subsequent resources on the same host during the same check no longer result in /robots.txt access.

I agree with lowering the default timeout; it&apos;s 30 seconds in CVS now.
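
As a rough sketch of the arithmetic above (not checklink or LWP code; the function and parameter names are hypothetical, and it assumes a single /robots.txt attempt per host that takes the full timeout before the resource itself is tried):

```python
def worst_case_wait(timeout_s, first_access_to_host):
    """Estimate the longest wait for one unreachable link, in seconds.

    Assumption: on first access to a host, a /robots.txt fetch is attempted
    and can time out before the actual resource is requested.
    """
    # One timeout for the resource itself, plus one for /robots.txt
    # when the host has not been seen earlier in the same check.
    attempts = 2 if first_access_to_host else 1
    return attempts * timeout_s


# Timeouts mentioned in this bug: 20 s (test instance), 60 s (old default),
# 30 s (new default).
for timeout in (20, 60, 30):
    print(timeout, worst_case_wait(timeout, True), worst_case_wait(timeout, False))
```

Under these assumptions the old 60 second default can take 120 seconds for the first link on an unreachable host, which is the same as the 120 second Apache timeout reported for validator.w3.org.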

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24129</commentid>
    <comment_count>5</comment_count>
    <who name="Olivier Thereaux">ot</who>
    <bug_when>2009-03-10 12:41:57 +0000</bug_when>
    <thetext>*** Bug 6417 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>30066</commentid>
    <comment_count>6</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2009-12-10 19:30:56 +0000</bug_when>
    <thetext>It&apos;s 30 seconds in 4.5.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>