128.30.52.13 - - [27/Sep/2004:11:45:30 -0400] "GET /robots.txt HTTP/1.1" 200 26

Why did your site probe the robots.txt file on my server ublib.buffalo.edu 120 times this morning between 8:55 and 11:45? It has been doing this since last week, and it keeps probing regardless of whether the robots.txt file exists.
Most likely this is the Link Checker doing this, not the CSS validator. Our site is not "probing" yours. Someone (possibly someone local to you, or someone with a site linking to yours) is checking links to your site, and the link checker is following the robots exclusion protocol and doing your server a favor by doing so. That said, a possible enhancement would be for the link checker to cache the existence (or absence) of robots.txt for a given site instead of querying for it again and again. Reassigning to the proper product and owner.
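For what it's worth, the robots exclusion protocol also gives the site a way to opt out entirely: a robots.txt along these lines (the user-agent token is shown for illustration) would tell the link checker to stay away from the whole site.

    # Illustrative robots.txt: disallow the W3C link checker site-wide
    User-agent: W3C-checklink
    Disallow: /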
Right, the /robots.txt fetches should be cached, and as far as the low-level implementation (LWP::RobotUA) is concerned, they _are_ cached. But the current link checker codebase instantiates several W3C::UserAgent (a subclass of LWP::RobotUA) objects per link checker run, and the /robots.txt information cache is not shared between these instances by default; instead, every one of them maintains its own small cache, resulting in very little caching in practice, if any :(

The real fix would be to instantiate exactly one W3C::UserAgent per link checker run and use that for fetching all links (unless we want to do parallel fetching sometime), but that is a very intrusive change and will most likely have to wait until the next major link checker version. However, I believe it is possible to come up with an interim solution by managing a "global" WWW::RobotRules object ourselves and passing it to all instantiated UserAgents. I'll look into it.
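A minimal sketch of that interim workaround, assuming stock libwww-perl (LWP::RobotUA accepts a pre-built rules object at construction time; the agent string and contact address below are placeholders):

    use LWP::RobotUA;
    use WWW::RobotRules;

    # One shared in-core rules object; it caches parsed /robots.txt
    # files keyed by host, so each host is queried only once per run.
    my $rules = WWW::RobotRules->new('W3C-checklink/4.0');

    # Every UserAgent instantiated during the run gets the same object.
    my $ua = LWP::RobotUA->new(
        agent => 'W3C-checklink/4.0',      # illustrative agent string
        from  => 'checklink@example.org',  # placeholder contact address
        rules => $rules,
    );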
It turns out that the most trivial of the workarounds is not possible due to a bug in upstream WWW::RobotRules. A fix for that has already been sent to the libwww-perl mailing list, with no comments yet; I'll think about other workaround alternatives in the meantime.
Fixed in CVS by using the same W3C::UserAgent instance for all retrievals. It ain't pretty, but it works...
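For the record, the shape of the fix is roughly this (a hypothetical sketch, not the actual CVS code; the shared() accessor is invented here for illustration):

    package W3C::UserAgent;
    use base 'LWP::RobotUA';   # W3C::UserAgent subclasses LWP::RobotUA

    my $instance;

    # Hypothetical accessor: create the UserAgent on first use and hand
    # the same object back for every subsequent retrieval, so its
    # internal robots.txt cache persists across the whole run.
    sub shared {
        my ($class, @args) = @_;
        $instance ||= $class->new(@args);
        return $instance;
    }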