This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 27 - checklink doesn't support the robots exclusion standard
Summary: checklink doesn't support the robots exclusion standard
Status: RESOLVED FIXED
Alias: None
Product: LinkChecker
Classification: Unclassified
Component: checklink
Version: unspecified
Hardware: All
OS: All
Importance: P2 enhancement
Target Milestone: 4.0
Assignee: Ville Skyttä
QA Contact:
URL: http://www.robotstxt.org/wc/exclusion...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-10-25 22:34 UTC by Ville Skyttä
Modified: 2004-04-19 20:13 UTC
CC List: 0 users

See Also:


Attachments

Description Ville Skyttä 2002-10-25 22:34:12 UTC
This one could be pretty trivial to implement just by changing W3C::UserAgent to
inherit from LWP::RobotUA instead of LWP::UserAgent; a rough sketch of the change
follows the list below.

What's needed:
- Check whether the superclass change has any side effects.
- Provide an option to behave badly, i.e. to not respect robots.txt.
- Make sure the user understands whenever this causes 403s.
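
For illustration, a minimal sketch of what the subclass change might look like.
This is not checklink's actual code: the ignore_robots option and the default
agent/from values are made up for the example.

    package W3C::UserAgent;

    use strict;
    use warnings;

    use LWP::RobotUA ();
    our @ISA = qw(LWP::RobotUA);    # was: our @ISA = qw(LWP::UserAgent);

    sub new
    {
        my ($class, %args) = @_;

        # LWP::RobotUA requires both an agent name and a contact address.
        my $self = $class->SUPER::new(
            agent => $args{agent} || 'W3C-checklink',
            from  => $args{from}  || 'webmaster@example.org',
        );
        $self->delay(0);    # no courtesy pause between requests to one host

        # Hypothetical "behave badly" switch, see the list above.
        $self->{IgnoreRobots} = $args{ignore_robots} ? 1 : 0;
        return $self;
    }

    # Escape hatch: LWP::RobotUA does its robots.txt check in
    # simple_request(), so bypass it by calling LWP::UserAgent's
    # simple_request() directly when the user asked to ignore robots.txt.
    sub simple_request
    {
        my $self = shift;
        return $self->LWP::UserAgent::simple_request(@_)
            if $self->{IgnoreRobots};
        return $self->SUPER::simple_request(@_);
    }

    1;

With LWP::RobotUA in place, requests disallowed by robots.txt come back as
403 responses ("Forbidden by robots.txt"), which is where the user-visible
403s mentioned above would come from.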
Comment 1 Ville Skyttä 2002-12-25 05:33:35 UTC
See also the thread at
<http://lists.w3.org/Archives/Public/www-validator/2002Dec/0211.html> for
additional discussion and resources.
Comment 2 Ville Skyttä 2004-04-19 16:13:55 UTC
This is now implemented in CVS and will be included in the next version (3.9.3).