W3C httpd manual

Proxy Caching

When W3C httpd runs as a proxy, it can cache the documents it retrieves from remote hosts to make further requests faster.

Turning Caching On and Off

Caching is normally turned on implicitly by specifying the cache root directory, but it can also be turned on and off explicitly with the Caching directive:
        Caching On

Setting Cache Directory

Caching is enabled on a server running as a gateway (proxy) by the CacheRoot directive, which sets the absolute path of the cache directory:
        CacheRoot /absolute/cache/directory

Cache Size

The CacheSize directive sets the maximum cache size in megabytes. The default value is 5MB, but a considerably larger cache, such as 50-100MB, gives the best results. The cache may, however, temporarily grow a few megabytes bigger than specified.


        CacheSize 20 M
sets cache size to 20 megabytes.


URLs matching a template given by a NoCaching directive will never be cached.
From version 3.0 on, templates can have any number of wildcard characters (*).


Only the URLs matching templates given by CacheOnly directives will be cached.
From version 3.0 on, templates can have any number of wildcard characters (*).
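
The template rules for NoCaching and CacheOnly can be illustrated with a small Python sketch. This is not the server's implementation: the function names are hypothetical, matching is assumed to be case-sensitive, and the precedence of NoCaching over CacheOnly when both apply is an assumption that this manual does not state.

```python
import re


def template_matches(template: str, url: str) -> bool:
    """Return True if url matches template, where '*' matches any run of characters."""
    # Escape the literal parts and join them with '.*' in place of each '*'.
    pattern = ".*".join(re.escape(part) for part in template.split("*"))
    return re.fullmatch(pattern, url) is not None


def should_cache(url: str, no_caching=(), cache_only=()) -> bool:
    """Decide whether a URL may be cached under the given template lists."""
    # Assumption: a NoCaching match always wins.
    if any(template_matches(t, url) for t in no_caching):
        return False
    # If any CacheOnly templates are configured, only matching URLs are cached.
    if cache_only:
        return any(template_matches(t, url) for t in cache_only)
    return True
```

For instance, with the hypothetical template http://news.example.org/* in no_caching, no URL on that host would be cached, while all other URLs would remain cacheable.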

Maximum Time to Keep Cache Files

Cached documents that match a given template and are older than the time specified by the CacheClean directive will be removed. This value overrides the expiry date: no file can be stored longer than this value specifies, regardless of its expiry date. Templates can have any number of wildcard characters (asterisks).


        CacheClean http://*.edu/*       1 month
        CacheClean http:*               3 weeks
        CacheClean ftp:*                14 days
        CacheClean gopher:*             5 days 12 hours
NOTE: The first matching CacheClean directive will be applied [this changed in version 3.0pre4].
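
Time values such as "5 days 12 hours" combine several number/unit pairs. The following Python sketch shows one way such a value could be converted to seconds. It is an illustration only: the function name is hypothetical, the set of accepted units is inferred from the examples in this manual, and the length of a month (30 days) is an assumption.

```python
import re

# Seconds per unit. Only units that appear in this manual's examples are listed;
# treating a month as 30 days is an assumption, not documented behaviour.
UNIT_SECONDS = {
    "min": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
}


def parse_time_spec(spec: str) -> int:
    """Convert a value like '5 days 12 hours' or '2 mins' to seconds."""
    total = 0
    for number, unit in re.findall(r"(\d+)\s*([A-Za-z]+)", spec):
        unit = unit.lower()
        # Accept singular, plural and abbreviated forms by prefix ('mins', 'minutes').
        key = next((k for k in UNIT_SECONDS if unit.startswith(k)), None)
        if key is None:
            raise ValueError(f"unknown time unit: {unit!r}")
        total += int(number) * UNIT_SECONDS[key]
    return total
```

With these assumptions, parse_time_spec("5 days 12 hours") yields 475200 seconds.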

Maximum Time to Keep Unused Files

Cache files that match a template and have been unused longer than the time specified by the CacheUnused directive will be removed. Templates can have any number of wildcard characters (asterisks).


        CacheUnused http://www.w3.org/*   7 days
        CacheUnused ftp://some.server/*   14 days
        CacheUnused *                     4 days 12 hours
Note that the first matching specification will be applied; therefore HTTP files from w3.org will be kept 7 days, and not 4.5 days. [The order was the reverse before version 3.0pre4.]
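
The first-match rule can be sketched in Python as a lookup over the directives in configuration file order. The function names and the rule table are hypothetical and only mirror the CacheUnused example above, with times written in hours for brevity.

```python
import re


def template_matches(template: str, url: str) -> bool:
    """'*' in the template matches any run of characters."""
    pattern = ".*".join(re.escape(part) for part in template.split("*"))
    return re.fullmatch(pattern, url) is not None


def first_match(rules, url):
    """Return the value of the first rule whose template matches url, else None."""
    for template, value in rules:
        if template_matches(template, url):
            return value
    return None


# The CacheUnused example, in file order, with values in hours.
CACHE_UNUSED = [
    ("http://www.w3.org/*", 7 * 24),
    ("ftp://some.server/*", 14 * 24),
    ("*", 4 * 24 + 12),
]
```

Because the catch-all template * comes last, it applies only to URLs that none of the more specific templates matched. Placing it first would hide every directive after it.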

Default Expiry Time

Files for which the server gave neither an Expires: nor a Last-Modified: header will be kept at most the time specified by the CacheDefaultExpiry directive. The default value is zero for all documents. For HTTP the default should be kept at zero (script replies shouldn't be cached); for FTP and Gopher you might consider an expiry time such as 6 hours or 1 day. Without this setting FTP and Gopher documents never get cached, because the FTP and Gopher protocols have no notion of an expiry or last modification time.


        CacheDefaultExpiry ftp://ftp.w3.org/*   2 days
        CacheDefaultExpiry ftp:*                1 day
        CacheDefaultExpiry gopher:*             6 hours 30 minutes
IMPORTANT: If there are several CacheDefaultExpiry directives, the first one that matches will be used [the order was the opposite before version 3.0pre4].

WARNING: A default expiry for HTTP will almost always cause problems, because there are currently many scripts that don't give an expiry date, yet their output expires immediately. Therefore, it is better to keep the default value for http: at zero.


Currently HTTP servers usually give only the Last-Modified time, but not the Expires time. Last-Modified can often be used successfully to approximate the expiry date. CacheLastModifiedFactor specifies the fraction of the time since last modification for which a document is still considered up to date.

The default value is 0.1, which means that, for example, a file modified 20 days ago will expire in 2 days.


        CacheLastModifiedFactor  0.2
would cause files modified 5 months ago to expire after one month.

This feature can be turned off by specifying:

        CacheLastModifiedFactor  Off
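
The arithmetic behind the LM factor can be written out as a short Python sketch. The function name is hypothetical, and it assumes that the age of a document is measured at the moment the proxy retrieves it, which this manual implies but does not state outright.

```python
from datetime import datetime


def lm_expiry(last_modified: datetime, retrieved: datetime, factor=0.1):
    """Approximate an expiry time from the Last-Modified time.

    Returns None when factor is None, which models CacheLastModifiedFactor Off.
    """
    if factor is None:
        return None
    age = retrieved - last_modified   # time since last modification
    return retrieved + age * factor   # remaining lifetime is a fraction of the age


# A file modified 20 days before retrieval, with the default factor of 0.1.
example = lm_expiry(datetime(1995, 7, 1), datetime(1995, 7, 21))
```

Here example is 23 July 1995, two days after retrieval, in line with the 20 days and 2 days figures given above.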

New in Version 3.0pre5

Since version 3.0pre5 you can specify a different LM factor for different URL patterns, e.g.:
        CacheLastModifiedFactor  http://www.w3.org/httpd/*  0.10
        CacheLastModifiedFactor  http://www.w3.org/*        0.15
        CacheLastModifiedFactor  *                          0.20


Normally the garbage collector removes all expired files immediately; this saves disk space, but decreases the efficiency of conditional GET (If-Modified-Since) requests. With the KeepExpired directive turned On, the entire cache space is put to use, and files are removed only when space is needed for new cache files:
        KeepExpired  On


Sometimes it is vital to always have up-to-date information from a certain site, regardless of the expiry times specified by the remote server or calculated by the proxy. The CacheRefreshInterval directive can be used to specify a cache refresh interval for URLs matching a given pattern. This causes httpd to check that the file is still up to date if more than the maximum allowed time has passed since the last check, even if the file would still seem to be up to date according to its expiry date.

Note that the cache refresh happens only if and when the document is requested, so if you have a refresh interval of 2 hours it doesn't mean that all the files in cache are fetched every two hours.

As a special case, setting the refresh interval to zero causes every cache access to be checked against the remote server. This is ideal for users who always need the absolutely most up-to-date version, but still want faster response times and savings in network costs. It is still cheaper than fetching the document every time, because all the checks are performed using the conditional GET request (with the If-Modified-Since header), which sends the document only if it has changed, and otherwise tells the proxy to use the cache.


        CacheRefreshInterval  http://www.w3.org/httpd/*  1 day
        CacheRefreshInterval  http://www.w3.org/*        5 days
        CacheRefreshInterval  http://weather.machine/*   2 hours
        CacheRefreshInterval  *                          1 week
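
The refresh decision described above can be summarised in a small Python sketch. The function and parameter names are hypothetical; the sketch only restates the rules given in this section and says nothing about how httpd stores the time of the last check.

```python
from datetime import datetime, timedelta


def needs_refresh_check(last_checked: datetime, now: datetime, interval) -> bool:
    """Decide whether a requested cached file should be revalidated.

    interval is the matching CacheRefreshInterval as a timedelta, or None
    when no CacheRefreshInterval directive matches the URL.
    """
    if interval is None:
        return False              # only the normal expiry rules apply
    if interval == timedelta(0):
        return True               # special case: check on every cache access
    # Check only if more than the allowed time has passed since the last check.
    return now - last_checked > interval
```

The function is consulted only when a document is actually requested, which reflects the note above that files are not re-fetched on a timer.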


Sometimes inaccurate clocks on other hosts cause confusion in caching. It also often makes sense not to cache documents that will expire in a couple of minutes anyway. CacheTimeMargin defines this time margin; the default is:
        CacheTimeMargin  2 mins
No document expiring in less than two minutes will be written to disk.
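
As a minimal Python sketch of this rule (the function name is hypothetical, and treating a document that expires in exactly the margin as cacheable is an assumption drawn from the wording "less than two minutes"):

```python
from datetime import datetime, timedelta


def worth_caching(expires: datetime, now: datetime,
                  margin: timedelta = timedelta(minutes=2)) -> bool:
    """Return False for documents that expire in less than the time margin."""
    return expires - now >= margin
```

A document whose expiry time already lies in the past gives a negative remaining lifetime and is therefore rejected as well.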


The CacheNoConnect directive puts the proxy into standalone cache mode, i.e. only the documents found in the cache are returned, and requests for documents not in the cache return an error rather than causing a connection to the outside world. This is useful for demo purposes and in other cases where there is no network connection:
        CacheNoConnect On
Default setting is naturally Off.

This directive is typically used with expiry checking also turned Off.


If (for demo purposes, for example) it is desired that the proxy always return documents from the cache, even if they have expired, CacheExpiryCheck can be turned off:
        CacheExpiryCheck  Off
The default setting is On, meaning that the proxy never returns an expired document.

This is usually used in standalone cache mode (CacheNoConnect directive turned On).

Garbage Collection

When caching is enabled, garbage collection is also activated by default. It can be explicitly turned off with the Gc directive:
        Gc  Off

Daily Garbage Collection

Garbage collection is launched right away when the cache size limit is reached. However, to keep the cache smaller it might be desirable to remove expired files even if there is still cache space remaining. It is possible to launch garbage collection at a certain time, usually outside the busy hours:
        GcDailyGc      time

GcDailyGc specifies the time to do daily garbage collection, normally during the night. Default value is 3:00. Daily garbage collection can be disabled by specifying Off.


Default value would be specified as:
        GcDailyGc       3:00
Another example: turning daily gc off:
        GcDailyGc       Off
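
To make the daily schedule concrete, the following Python sketch computes when the next daily run would fall for a given GcDailyGc time. It is an illustration under assumptions: the function name is hypothetical, and this manual does not describe how httpd actually schedules the run.

```python
from datetime import datetime, time, timedelta


def next_daily_gc(now: datetime, gc_time=time(3, 0)):
    """Return the next moment daily gc is due, or None if GcDailyGc is Off.

    gc_time=None models 'GcDailyGc Off'; the default models 'GcDailyGc 3:00'.
    """
    if gc_time is None:
        return None
    candidate = datetime.combine(now.date(), gc_time)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For a server consulted at noon, the next run with the default setting falls at 3:00 the following night.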

Memory Usage of Garbage Collector

The garbage collector performs its job best if it can read information about the whole cache into memory at once. This is not possible if the machine doesn't have enough main memory.

The GcMemUsage directive advises the garbage collector about how much memory to use. You can think of this as the number of kilobytes to use for gc data, but actual usage may vary greatly with dynamic factors, such as the directory structure of the cached files.

The default is 500; if gc fails because memory runs out, make this smaller. If your machine has so much memory that it just can't run out, make this very big.


        GcMemUsage 100
if you have very little memory.

Cache File Sizes

Two limits control the size factor of a file when its value is being calculated. CacheLimit_1 sets the lower limit; below it, all files have an equal size factor. CacheLimit_2 sets the upper limit; files bigger than this get an extremely bad size factor (meaning they get removed right away because they are too big).

Sizes are specified in kilobytes, and the default values are 200K and 4MB, respectively.


        CacheLimit_1 200 K
        CacheLimit_2 4000 K
would set the same values as the defaults, 200K and 4MB.

Cache Lock Timeout

During retrieval, cache files are locked. If something goes wrong, a lock file may be left hanging. The CacheLockTimeOut directive sets the amount of time after which a lock can be broken. The time is specified like all the other times in the configuration file, and the default value is 20 minutes, the same as the default OutputTimeOut. CacheLockTimeOut should never be less than OutputTimeOut!


        CacheLockTimeOut  30 mins
would set lock timeout to half an hour.
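
The two rules in this section, when a lock may be broken and how the two timeouts relate, can be sketched in Python as follows. The function names are hypothetical and times are given in seconds; how httpd records the age of a lock file is not covered here.

```python
DEFAULT_TIMEOUT_SECS = 20 * 60  # 20 minutes, the documented default for both timeouts


def lock_can_be_broken(lock_age_secs: int,
                       lock_timeout_secs: int = DEFAULT_TIMEOUT_SECS) -> bool:
    """A lock may be broken once it is older than CacheLockTimeOut."""
    return lock_age_secs > lock_timeout_secs


def check_timeouts(lock_timeout_secs: int, output_timeout_secs: int) -> None:
    """Reject configurations where CacheLockTimeOut is less than OutputTimeOut."""
    if lock_timeout_secs < output_timeout_secs:
        raise ValueError("CacheLockTimeOut must not be less than OutputTimeOut")
```

The reasoning behind the second check is that a retrieval may legitimately take up to OutputTimeOut, so a shorter lock timeout could break the lock of a transfer that is still in progress.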

httpd@w3.org, July 1995