<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>104</bug_id>
          
          <creation_ts>2002-12-05 18:30:42 +0000</creation_ts>
          <short_desc>Restrict recursive check to a given domain</short_desc>
          <delta_ts>2013-11-03 07:35:23 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>LinkChecker</product>
          <component>checklink</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>ASSIGNED</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>enhancement</bug_severity>
          <target_milestone>---</target_milestone>
          <dependson>115</dependson>
          
          <everconfirmed>1</everconfirmed>
          <reporter>Unknown_kev_cat</reporter>
          <assigned_to name="This bug has no owner yet - up for the taking">dave.null</assigned_to>
          <cc>gonzo1lee</cc>
          <cc>sporosbe</cc>
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>266</commentid>
    <comment_count>0</comment_count>
    <who name="">Unknown_kev_cat</who>
    <bug_when>2002-12-05 18:30:42 +0000</bug_when>
    <thetext>I often want to use the recursive link checker to check the links on my
pages, but the link checker goes off checking other people&apos;s links. There
should be an option to scan only the pages in a given domain. With this option,
it would still check that pages linked to outside the domain exist, but
it would not scan the links on those pages.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>278</commentid>
    <comment_count>1</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2002-12-06 07:33:35 +0000</bug_when>
    <thetext>Could you provide an example URI that reproduces this?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>280</commentid>
    <comment_count>2</comment_count>
    <who name="">Unknown_kev_cat</who>
    <bug_when>2002-12-07 13:06:04 +0000</bug_when>
    <thetext>http://validator.w3.org/checklink?uri=tacvek.tripod.com&amp;summary=on&amp;hide_type=all&amp;recursive=on&amp;depth=3&amp;check=Check

The above URI scans my web page. The 3rd page scanned for bad links is one that
is not on my site. This is not what I want. I want to recursively check the pages
on my site only. The feature I am suggesting is to restrict the recursive scanning
to the original domain/subdomain/folder within the domain.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>284</commentid>
    <comment_count>3</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2002-12-07 13:37:24 +0000</bug_when>
    <thetext>Yes, I get your point.  At the moment, checklink is supposed to restrict
the recursion scope not just to the same domain or host but to the same *base URI*.
Base URI means that if you&apos;re checking links for a resource at
&lt;http://foo.bar.com/quux/something.html&gt;, checklink *should* check only
resources whose URI starts with &quot;http://foo.bar.com/quux/&quot;.  This is probably
what you mean by a given &quot;folder&quot;.

Your example URI reveals a reproducible bug in checklink.  I have logged it as
bug 115 and have already found a workaround.  Check it out, and thanks for the
sample URI.
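
To illustrate the base-URI rule described above, here is a minimal Python sketch. It is not checklink code (checklink is written in Perl); the function names base_uri and in_scope are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlsplit


def base_uri(uri):
    """Return the URI up to and including the last slash of its path."""
    parts = urlsplit(uri)
    # Keep the directory part of the path; an empty path counts as "/".
    directory = parts.path[: parts.path.rfind("/") + 1] or "/"
    return parts.scheme + "://" + parts.netloc + directory


def in_scope(start_uri, candidate):
    """True if candidate would be followed when recursing from start_uri."""
    return candidate.startswith(base_uri(start_uri))


start = "http://foo.bar.com/quux/something.html"
print(base_uri(start))                                        # http://foo.bar.com/quux/
print(in_scope(start, "http://foo.bar.com/quux/sub/a.html"))  # True
print(in_scope(start, "http://other.example.org/page.html"))  # False
```

Under this rule, a link that falls outside the scope would still be checked for existence, but the page behind it would not be scanned for further links.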
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>290</commentid>
    <comment_count>4</comment_count>
    <who name="">Unknown_kev_cat</who>
    <bug_when>2002-12-08 13:06:31 +0000</bug_when>
    <thetext>The bug described in 115 appears to be the same bug as this one. That is true 
at least if Comment 4 is correct. However the link still does not appear to 
work properly.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>291</commentid>
    <comment_count>5</comment_count>
    <who name="">Unknown_kev_cat</who>
    <bug_when>2002-12-08 13:08:01 +0000</bug_when>
    <thetext>Whoops... I meant comment 3</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>292</commentid>
    <comment_count>6</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2002-12-08 13:40:26 +0000</bug_when>
    <thetext>I think bug 115 is not a duplicate; it is about the current recursion
restrictions being broken.  If I understand correctly from your initial comment
and the bug summary, this one is about a new feature, which would actually
/broaden/ the scope of recursion rather than restrict it to the same base URI.

I apologize if I misunderstood, but the domain/host &quot;restriction&quot; feature has
been asked for before, so I&apos;m leaving this one open.  OTOH, it might be that
bug 115 has caused the feature requests... :)

The version at validator.w3.org/checklink hasn&apos;t been updated yet; it&apos;s still
the old one with the recursion bug.  It&apos;ll be updated shortly.  The fix is only
in CVS for now; get version 3.6.2.2 if you want to try it locally.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>293</commentid>
    <comment_count>7</comment_count>
    <who name="">Unknown_kev_cat</who>
    <bug_when>2002-12-08 15:12:16 +0000</bug_when>
    <thetext>Well, actually that would be even better, so leave this bug open. I think it
should be kept as a possibility and perhaps put on a todo list in
an &apos;optional&apos; section. I have no real way to try a CVS version, as my site will
probably not allow it to be run (Tripod has a very picky CGI policy) and I
don&apos;t have a personal web server.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>294</commentid>
    <comment_count>8</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2002-12-08 15:17:29 +0000</bug_when>
    <thetext>Yes, it&apos;s already kinda on the todo list; logged as an enhancement in Bugzilla :)

Anyway, I think the public service will be updated soon (I have no direct
control over it).  And in case you didn&apos;t know, checklink can also be run on the
command line...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1647</commentid>
    <comment_count>9</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2004-04-04 12:32:20 +0000</bug_when>
    <thetext>Just a quick followup: the current version running at
http://validator.w3.org/checklink should no longer have the recursion bug.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>