This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29291 - robots.txt on 1 site supposedly blocking some URLs in other sites
Summary: robots.txt on 1 site supposedly blocking some URLs in other sites
Status: NEW
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: HEAD
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
URL: http://cold32.com
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-15 02:52 UTC by Nick Levinson
Modified: 2015-11-15 02:52 UTC
CC List: 2 users

Description Nick Levinson 2015-11-15 02:52:10 UTC
In the W3C Link Checker, I entered <http://cold32.com>, allowed 10 levels of recursion (more than needed), and set it to send the Referer header. It didn't check a small percentage of links because they were supposedly blocked by robots.txt, but I saw no such block in http://cold32.com/robots.txt and don't know why any other site's robots.txt file would control these links:

from <http://cold32.com/4/clothing-and-hair/2/where-to-buy-coats.htm>: http://www.gutenberg.org/cache/epub/7213/pg7213.txt

from <http://cold32.com/5/action/6/showers-but-not-heaters.htm>: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2094925/

from probably most *.htm and *.html pages: http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js

For this bug report, I guessed the component and the version; the version is actually 4.81.
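
A possible explanation, offered as an assumption and not a confirmed diagnosis of this report: robots.txt rules apply per host, so a checker that honors them would consult the robots.txt of each linked host (for example www.gutenberg.org) and not the one at cold32.com. The minimal Python sketch below illustrates that per-host lookup with the standard `urllib.robotparser` module. The robots.txt bodies, the `ROBOTS_BY_HOST` table, the `is_allowed` helper, and the `W3C-checklink` user-agent token are all hypothetical and chosen for illustration; they are not the real files served by these sites, and this is not the Link Checker's own code.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by host. These are made up for
# illustration and are NOT the real files served by these sites.
ROBOTS_BY_HOST = {
    "cold32.com": "User-agent: *\nDisallow:\n",
    "www.gutenberg.org": "User-agent: *\nDisallow: /cache/\n",
}


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(url: str, user_agent: str = "W3C-checklink") -> bool:
    """Check a URL against the robots.txt of the host that serves it.

    The rules consulted belong to the target URL's host, not to the
    page that contains the link.
    """
    host = urlsplit(url).hostname
    robots_txt = ROBOTS_BY_HOST.get(host)
    if robots_txt is None:
        # This sketch has no rules for the host, so it treats it as open.
        return True
    return build_parser(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    for link in (
        "http://cold32.com/4/clothing-and-hair/2/where-to-buy-coats.htm",
        "http://www.gutenberg.org/cache/epub/7213/pg7213.txt",
    ):
        print(link, "allowed" if is_allowed(link) else "blocked by robots.txt")
```

Under these assumed rules, the page on cold32.com is allowed while the link it contains to www.gutenberg.org is reported as blocked, even though cold32.com's own robots.txt blocks nothing.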