This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4998 - Validator sometimes uses cached content (cache-control header on requests would be useful)
Status: RESOLVED FIXED
Alias: None
Product: Validator
Classification: Unclassified
Component: check
Version: 0.8.1
Hardware: All All
Importance: P2 normal
Target Milestone: 0.8.2
Assignee: Olivier Thereaux
QA Contact: qa-dev tracking
Reported: 2007-08-31 10:06 UTC by Tim Jackson
Modified: 2008-12-01 03:03 UTC

Description Tim Jackson 2007-08-31 10:06:51 UTC
The Validator does not explicitly request non-cached content when it fetches a page for validation. Although this is not actually wrong per se, it produces counter-intuitive and confusing results when there happens to be a caching proxy between the Validator and the target site. It makes the Validator difficult to use for development purposes: as one fixes validation errors, the results do not change, because the Validator keeps seeing the older cached version until the cache entry expires.

This could easily be fixed by adding a Cache-Control header to the HTTP request that the Validator makes; for example, "Cache-control: max-age=0" would probably serve the purpose. If the developers were uncomfortable adding this to every request, it could be made a UI option (e.g. "Request uncached version?") and/or a config file option. To be honest, though, I can't see the downside of simply adding it to every request; it's rare that someone actually wants to see the results of validating a possibly out-of-date cached copy.

I have verified that the Validator does not do this on v0.8.1; a typical HTTP request from the Validator looks like this:

GET / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: example.com
User-Agent: W3C_Validator/1.555
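
As a rough illustration of the proposed change (a sketch only, not the Validator's own code), the snippet below builds a request that carries the suggested header. The helper name build_uncached_request is hypothetical, and Python's urllib is used purely for demonstration; nothing is sent over the network.

```python
import urllib.request


def build_uncached_request(url, directive="max-age=0"):
    """Build a GET request that asks intermediate caches for a fresh copy.

    Hypothetical helper for illustration; `directive` is the value of the
    Cache-Control request header ("max-age=0" or "no-cache").
    """
    req = urllib.request.Request(url, method="GET")
    req.add_header("Cache-Control", directive)
    return req


req = build_uncached_request("http://example.com/")
# urllib normalises stored header names with str.capitalize(),
# so the lookup key is "Cache-control".
print(req.get_header("Cache-control"))
```

Passing directive="no-cache" would produce the stricter variant of the same request.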
Comment 1 Olivier Thereaux 2007-09-03 01:23:37 UTC
Could you give a practical test case where the current behavior of the validator is problematic?

As far as I can tell:

* the validator does not use a cache for the resources it checks

* I am assuming that the problem you describe comes from having a caching proxy between the client (the validator) and the server, and that the cache, when receiving the request from the validator, does not validate the freshness of its cached version, which seems rather wrong. Requests with "Cache-Control: max-age=0" might help per section 13.1.6 of RFC 2616, but if the cache is misbehaving, it's not even a certainty...
Comment 2 Tim Jackson 2007-09-03 12:42:13 UTC
This can be easily reproduced by:

* placing any caching proxy (e.g. Squid) in reverse proxy mode in front of a website
* requesting validation of a page on that site
* changing the content of the page which was validated
* re-requesting validation within the time period for which the page is cached (according to HTTP rules) by the intermediate proxy

The cache is not misbehaving by returning cached content (as long as it does so in accordance with HTTP rules, for example based on Expires, Cache-Control or other headers in the response) when the client (the Validator) has not explicitly asked for a freshness check; it's doing exactly what it's supposed to do. Neither is the Validator actually misbehaving. The point is that with the current HTTP request headers from the Validator, it's acceptable (under some circumstances) for an intermediate cache to return a cached representation. However, the user will generally not want to validate cached content, so adding the Cache-Control header to the request would make the Validator work more intuitively. (On reflection it should probably be "Cache-control: no-cache", although the practical effect should be identical.)
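
The reproduction steps and the effect of the request header can be sketched with a toy model of an intermediate cache. This is a deliberate simplification under stated assumptions: the ToyCache class is hypothetical, freshness is a single fixed lifetime, and a forced check simply refetches from the origin rather than issuing a conditional request as a real proxy such as Squid might.

```python
class ToyCache:
    """Toy shared cache: one stored body per URL with a fixed freshness lifetime."""

    def __init__(self, origin, lifetime=60.0):
        self.origin = origin      # callable: url -> current body at the origin
        self.lifetime = lifetime  # seconds a stored body counts as fresh
        self.store = {}           # url -> (body, time stored)

    def get(self, url, request_headers=None, now=0.0):
        value = (request_headers or {}).get("Cache-Control", "")
        directives = {d.strip().lower() for d in value.split(",") if d.strip()}
        # Either request directive makes this toy cache go back to the origin.
        force = "no-cache" in directives or "max-age=0" in directives
        entry = self.store.get(url)
        if entry is not None and not force and now - entry[1] < self.lifetime:
            return entry[0]       # fresh enough: serve the stored copy
        body = self.origin(url)
        self.store[url] = (body, now)
        return body


page = {"body": "<p>old"}
cache = ToyCache(lambda url: page["body"], lifetime=60.0)

first = cache.get("/", now=0.0)       # first validation populates the cache
page["body"] = "<p>fixed</p>"         # the author fixes the page
stale = cache.get("/", now=10.0)      # re-validation still sees the old copy
fresh = cache.get("/", {"Cache-Control": "no-cache"}, now=10.0)
print(first, stale, fresh)
```

Without the header, the second request is answered from the cache even though the page has changed; with it, the cache goes back to the origin.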
Comment 3 Olivier Thereaux 2007-09-11 05:04:12 UTC
(In reply to comment #2)

> The point
> is that with the current HTTP request headers from the Validator, it's
> acceptable (under some circumstances) for an intermediate cache to return a
> cached representation. 

Understood, thanks for the excellent explanation. 
Comment 4 Olivier Thereaux 2007-09-11 05:07:15 UTC
The fix is in CVS and will be in the next release:
http://lists.w3.org/Archives/Public/www-validator-cvs/2007Sep/0058.html
Comment 5 Tim Jackson 2007-09-18 08:55:59 UTC
Thanks very much for fixing this.
Comment 6 Ville Skyttä 2007-09-18 16:24:35 UTC
Also done for the link checker in CVS.