This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 1182 - add an exclude-links option analog to exclude-docs
Summary: add an exclude-links option analog to exclude-docs
Status: RESOLVED DUPLICATE of bug 689
Alias: None
Product: LinkChecker
Classification: Unclassified
Component: checklink
Version: 4.0
Hardware: Other Linux
Importance: P2 enhancement
Target Milestone: ---
Assignee: Ville Skyttä
QA Contact: qa-dev tracking
Reported: 2005-03-28 10:00 UTC by Stefan Ruppert
Modified: 2006-10-19 20:59 UTC

Description Stefan Ruppert 2005-03-28 10:00:23 UTC
Add an option --exclude-links, analogous to --exclude-docs, that excludes matching
links from being checked instead of excluding documents from being parsed. For example,
when running checklink locally, a regexp of "^http:" would skip all remote link checking.

Add the second line below directly after the first, which already exists:

next if ($u =~ m/^mailto:/);

next if (defined $Opts{Exclude_Links} && $u =~ $Opts{Exclude_Links});
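
For illustration only, the proposed filtering can be sketched outside checklink. This is a Python sketch rather than checklink's Perl code; `links_to_check` and its `exclude_links` parameter are hypothetical names standing in for the proposed --exclude-links option, not anything that exists in checklink.

```python
import re
from typing import Iterable, List, Optional


def links_to_check(links: Iterable[str],
                   exclude_links: Optional[str] = None) -> List[str]:
    """Return the links that would still be checked.

    exclude_links is a hypothetical stand-in for the proposed
    --exclude-links regexp; None means the option was not given.
    """
    pattern = re.compile(exclude_links) if exclude_links else None
    kept = []
    for url in links:
        # Behaviour described in the report: mailto: links are skipped.
        if url.startswith("mailto:"):
            continue
        # Proposed behaviour: skip any link matching the exclusion regexp.
        if pattern is not None and pattern.search(url):
            continue
        kept.append(url)
    return kept
```

With "^http:" as the regexp, only non-HTTP links (for example relative local ones) remain in the list; without the option, only the mailto: links are dropped.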
Comment 1 Bruce Altmann 2006-06-13 06:00:53 UTC
Hello Ville,

Did you have a chance to add this?
I was thinking of adding a --staywithin option.
(I see this as a way to say: stay on our web servers. Links pointing elsewhere are still checked, but not followed during recursion.)

e.g. --staywithin *.amd.com
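
A rough sketch of how such a host wildcard could be evaluated, written in Python for illustration and not taken from checklink. The --staywithin option does not exist, and `follow_in_recursion` is a hypothetical helper name; the sketch only decides whether a link would be followed during recursion, on the assumption that every link is still checked regardless.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def follow_in_recursion(url: str, stay_within: str) -> bool:
    """Decide whether a link's target would be recursed into.

    stay_within is a shell-style host pattern such as "*.amd.com"
    (hypothetical --staywithin value).
    """
    # hostname is lower-cased by urlsplit, and is None for URLs
    # without a network location (e.g. mailto:).
    host = urlsplit(url).hostname
    if host is None:
        return False
    return fnmatchcase(host, stay_within.lower())
```

Note that under this matching a pattern of "*.amd.com" covers hosts such as www.amd.com but not the bare host amd.com, so a real option would need to define that case explicitly.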

For the 4.2.1 code, I assume this goes in sub in_recursion_scope().
Is that correct?

(As far as I can see, the mailto filter in 4.2.1 is applied when it builds the list of broken links.)

-Bruce
Comment 2 Ville Skyttä 2006-06-14 17:48:28 UTC
No, this has not been implemented yet, I'll look into it.

Regarding --staywithin, the recursion scope is already limited by default to the base URI of the initial document and everything below it, and can be controlled using the --location option. I'm considering improving that by making it possible to specify multiple recursion bases by specifying --location more than once.
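
As a simplified illustration of the scoping idea, here is a Python sketch that treats "base URI and below" as a plain string-prefix test and accepts several bases. It is an assumption-laden model, not checklink's actual implementation: `in_scope` is a hypothetical name, and support for multiple bases was only under consideration at the time.

```python
from typing import Iterable


def in_scope(url: str, bases: Iterable[str]) -> bool:
    """Return True if url lies at or below any of the given base URIs.

    Simplification: "below" is modelled as a string-prefix match.
    """
    return any(url.startswith(base) for base in bases)
```

With a single base of http://www.amd.com/us-en/, a link to http://enterprise.amd.com/ falls outside the scope; adding that host as a second base would bring it in.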
Comment 3 Bruce Altmann 2006-06-15 10:00:25 UTC
Yes, I ran into this base and location issue while testing http://www.amd.com/us-en/

This main page points to other key AMD URIs:
enterprise.amd.com
amdlive.amd.com
search.amd.com

I would like the recursive check to follow any link found under amd.com/us-en/
while staying within *.amd.com

Is there a way with the current --location option (4.2.1) to say - 
check all links (as a page) that have *.amd.com in them?
(so it does not get stuck under http://www.amd.com/us-en/)

-Bruce
Comment 4 Ville Skyttä 2006-06-15 17:30:20 UTC
(In reply to comment #3)
> Is there a way with the current --location option (4.2.1) to say - 
> check all links (as a page) that have *.amd.com in them?

I'm afraid there isn't.

This is getting off topic for this particular bug/RFE, and Bugzilla is not a good tool to facilitate discussion in the first place.  So please use the www-validator mailing list for discussions, and open new bugs for new issues, thanks in advance.
Comment 5 Ville Skyttä 2006-10-19 20:59:25 UTC
Bug 689 is actually the same as this one - marking this one as a duplicate because the other has some votes on it already.

*** This bug has been marked as a duplicate of bug 689 ***