FYI Cache-control deployment

Hi All

This is just for interest's sake.  As part of our load testing, we hammer our
proxy with a whole bunch of crawlers out onto the 'net.  In the last run
we were testing our new cache.  After about a million hits crawling 
sites, I was wondering why we only had about 200,000 files in cache.  We 
cache anything with a cache validator (ETag, Last-Modified), freshness 
info (Expires), or appropriate Cache-Control response directives 
(max-age, s-maxage, public, must-revalidate etc).  It seemed to me the 
cacheability of the net was not great, which limits cache effectiveness.
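
A minimal sketch of the caching rule described above, in Python rather than anything WinGate actually runs. The function names, the dict-of-headers input, and the choice to let no-store/private override a validator are all assumptions made for illustration.

```python
# Illustrative only: names and precedence rules here are assumptions,
# not the proxy's real implementation.
CACHE_ENABLING = {"max-age", "s-maxage", "public", "must-revalidate"}
CACHE_PREVENTING = {"no-store", "private"}


def parse_cache_control(value):
    """Split a Cache-Control value into a dict of lowercase directive names.

    Simplification: splits on every comma, so quoted arguments that
    themselves contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def is_cacheable(headers):
    """Return True if a response with these headers would be stored.

    `headers` is a plain dict of header name -> value (hypothetical shape).
    """
    h = {k.lower(): v for k, v in headers.items()}
    cc = parse_cache_control(h.get("cache-control", ""))
    # Assumption: a cache-preventing directive wins over everything else.
    if cc.keys() & CACHE_PREVENTING:
        return False
    # Cache validator present (ETag or Last-Modified).
    if "etag" in h or "last-modified" in h:
        return True
    # Freshness information present (Expires).
    if "expires" in h:
        return True
    # Otherwise, require a Cache-Control directive that permits caching.
    return bool(cc.keys() & CACHE_ENABLING)
```

Under this sketch, a response carrying none of these headers is simply not stored, which is how a large share of a crawl can end up outside the cache.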

So I turned on counting of each different Cache-Control header 
combination we received.  The results were quite interesting.
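
For anyone wanting to repeat this, the counting can be sketched roughly as below. This is an assumed reconstruction, not our actual code: the helper names and the "(none)" bucket are hypothetical, and directive arguments (e.g. the 60 in max-age=60) are dropped so that only the combination of directive names is counted.

```python
from collections import Counter


def combination_key(header_value):
    """Normalise a Cache-Control value into an order-independent key.

    `None` means the response carried no Cache-Control header at all.
    """
    if header_value is None:
        return "(none)"
    names = {
        part.split("=", 1)[0].strip().lower()
        for part in header_value.split(",")
        if part.strip()
    }
    return ", ".join(sorted(names))


def count_combinations(header_values):
    """Tally how often each directive combination appears."""
    return Counter(combination_key(v) for v in header_values)
```

Feeding it one value per response (or None when the header is absent) gives a table of combinations that can be sorted with `most_common()`.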

* About 70% of responses didn't include a Cache-Control header at all.
* Of the remaining 30%, about 80% used the Cache-Control header to 
prevent caching (no-store, private).

So only about 7% of responses seem to be using Cache-Control to actually 
specify how to cache something (e.g. specify freshness and revalidation 
information).  This is quite disappointing.

There were quite a few sites that sent conflicting directives. 
The private directive is odd, since there was no authentication going on.
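
As a rough illustration of what flagging conflicting directives could look like: the pairs listed below are only examples I am assuming count as conflicts, not the combinations actually observed in the test, and the function name is hypothetical.

```python
# Assumed examples of directive pairs that contradict each other;
# not taken from the measured data.
CONFLICTING_PAIRS = [
    ("public", "private"),
    ("no-store", "max-age"),
    ("no-store", "public"),
]


def find_conflicts(header_value):
    """Return the assumed-conflicting pairs present in a Cache-Control value."""
    names = {
        part.split("=", 1)[0].strip().lower()
        for part in header_value.split(",")
        if part.strip()
    }
    return [pair for pair in CONFLICTING_PAIRS
            if pair[0] in names and pair[1] in names]
```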

The numbers above are only approximate.  If anyone is interested, I can 
post better / more rigorous results after our next test.

On the face of it, this does seem to show that:

a) Cache-Control isn't well supported in the wild.
b) There's a lot of confusion about Cache-Control directives (based on 
the combinations people choose).

Cheers

Adrien

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com

Received on Wednesday, 25 November 2009 21:24:25 UTC