This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 104 - Restrict recursive check to a given domain
Summary: Restrict recursive check to a given domain
Status: ASSIGNED
Alias: None
Product: LinkChecker
Classification: Unclassified
Component: checklink
Version: unspecified
Hardware: Other other
Importance: P2 enhancement
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
URL:
Whiteboard:
Keywords:
Depends on: 115
Blocks:
Reported: 2002-12-05 18:30 UTC by Unknown_kev_cat
Modified: 2013-11-03 07:35 UTC (History)

See Also:


Attachments

Description Unknown_kev_cat 2002-12-05 18:30:42 UTC
Oftentimes I want to use the recursive link checker to check the links on my
pages, but the link checker goes off checking other people's links. There
should be an option to scan only the pages in a given domain. With this option,
it would still check that pages linked to outside the domain exist, but
it would not scan the links on those pages.
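
As an illustration of the behavior requested above, here is a minimal Python sketch. It is not checklink's code (checklink is a Perl script); the function name `crawl_plan`, the `links_of` mapping that stands in for fetching and parsing pages, and the example.org URLs are all hypothetical. Every link found is marked for an existence check, but only links on the starting host are followed.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl_plan(start_url, links_of, max_depth=3):
    """Return (checked, scanned) for a host-restricted recursive check.

    checked: every URL whose existence would be verified.
    scanned: pages whose own links would be extracted and followed.
    links_of: hypothetical mapping of page URL -> list of link URLs,
              standing in for real fetching and parsing.
    max_depth: maximum number of link hops from the start page.
    """
    start_host = urlsplit(start_url).hostname
    checked = {start_url}
    scanned = set()
    queue = deque([(start_url, 0)])

    while queue:
        url, depth = queue.popleft()
        if url in scanned:
            continue
        scanned.add(url)
        for link in links_of.get(url, []):
            # Every link is checked for existence, inside the host or not.
            checked.add(link)
            # Only links on the starting host are recursed into.
            on_start_host = urlsplit(link).hostname == start_host
            if on_start_host and depth < max_depth and link not in scanned:
                queue.append((link, depth + 1))

    return checked, scanned


if __name__ == "__main__":
    links = {
        "http://example.org/": [
            "http://example.org/a.html",
            "http://other.example.net/x.html",
        ],
        "http://example.org/a.html": ["http://example.org/"],
        "http://other.example.net/x.html": ["http://other.example.net/y.html"],
    }
    checked, scanned = crawl_plan("http://example.org/", links)
    print(sorted(checked))
    print(sorted(scanned))
```

In this sketch the external page `x.html` is checked for existence, but its own links (such as `y.html`) are never visited.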
Comment 1 Ville Skyttä 2002-12-06 07:33:35 UTC
Could you provide an example URI that reproduces this?
Comment 2 Unknown_kev_cat 2002-12-07 13:06:04 UTC
http://validator.w3.org/checklink?uri=tacvek.tripod.com&summary=on&hide_type=all&recursive=on&depth=3&check=Check

The above URI scans my web page. The third page scanned for bad links is one that
is not on my site. This is not what I want. I want to recursively check the pages
on my site only. The feature I am suggesting is to restrict the recursive scanning
to the original domain/subdomain/folder within the domain.
Comment 3 Ville Skyttä 2002-12-07 13:37:24 UTC
Yes, I get your point.  The way checklink should work at the moment is to restrict
the recursion scope not only to the same domain or host but to the same *base URI*.
Base URI means that if you're checking links for a resource at
<http://foo.bar.com/quux/something.html>, checklink *should* check only
resources whose URI starts with "http://foo.bar.com/quux/".  This is probably
what you mean by a given "folder".
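
The base URI rule described above can be sketched in Python as follows. This is only an illustration of the prefix test, not checklink's actual implementation (which is written in Perl); the helper names `base_uri` and `in_scope` are hypothetical, and the sketch assumes the base URI is the start URI with everything after the last "/" of the path removed.

```python
from urllib.parse import urlsplit, urlunsplit


def base_uri(uri):
    """Return the URI up to and including the last '/' of its path.

    Assumption for this sketch: query strings and fragments are dropped.
    """
    parts = urlsplit(uri)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def in_scope(start_uri, candidate):
    """True if candidate starts with the base URI of start_uri."""
    return candidate.startswith(base_uri(start_uri))


if __name__ == "__main__":
    start = "http://foo.bar.com/quux/something.html"
    print(base_uri(start))
    print(in_scope(start, "http://foo.bar.com/quux/sub/page.html"))
    print(in_scope(start, "http://foo.bar.com/other.html"))
```

A plain string-prefix comparison like this treats URIs that differ only in case or escaping as different, so a real checker would probably normalize both sides first.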

Your example URI evidently reveals a reproducible bug in checklink.  I have
logged this as bug 115 and have already found a workaround.  Check it out, and
thanks for the sample URI.
Comment 4 Unknown_kev_cat 2002-12-08 13:06:31 UTC
The bug described in 115 appears to be the same bug as this one. That is true 
at least if Comment 4 is correct. However the link still does not appear to 
work properly.
Comment 5 Unknown_kev_cat 2002-12-08 13:08:01 UTC
Whoops... I meant comment 3
Comment 6 Ville Skyttä 2002-12-08 13:40:26 UTC
I think bug 115 is not a duplicate; it is about the current recursion
restrictions being broken.  If I understand correctly from your initial comment
and the bug summary, this one is about a new feature, which actually would
/broaden/ the scope of recursion, not restricting it to the same base URI.

I apologize if I misunderstood, but the domain/host "restriction" feature has
been asked for before, so I'm leaving this one open.  OTOH, it might be that
bug 115 has caused the feature requests... :)

The version at validator.w3.org/checklink hasn't been updated yet; it's still
the old one with the recursion bug.  It'll be updated shortly.  The fix is only
in CVS for now; get version 3.6.2.2 if you want to try it locally.
Comment 7 Unknown_kev_cat 2002-12-08 15:12:16 UTC
Well, actually that would be even better, so leave this bug open. I think this
bug should be left open as a possibility and perhaps put on a todo list in
an 'optional' section. I have no real way to try a CVS version, as my site will
probably not allow it to be run (Tripod has a very picky CGI policy) and I
don't have a personal web server.
Comment 8 Ville Skyttä 2002-12-08 15:17:29 UTC
Yes, it's already kinda on the todo list; logged as an enhancement in Bugzilla :)

Anyway, I think the public service will be updated soon (I have no direct
control over it).  And in case you didn't know, checklink can also be run on the
command line...
Comment 9 Ville Skyttä 2004-04-04 12:32:20 UTC
Just a quick followup: the current version running at
http://validator.w3.org/checklink should no longer have the recursion bug.