Proxy Caching

When cern_httpd is run as a proxy, it can cache the documents retrieved from remote hosts to make further requests faster.


Turning Caching On and Off

Caching is normally turned on implicitly by specifying the cache root directory (see CacheRoot), but it can also be turned on and off explicitly with the Caching directive:
        Caching On

Setting Cache Directory

Caching is enabled on a server running as a gateway (proxy) by the CacheRoot directive, which sets the absolute path of the cache directory:
        CacheRoot /absolute/cache/directory

Cache Size

The CacheSize directive sets the maximum cache size in megabytes. The default value is 5MB, but it is preferable to have a considerably larger cache, such as 50-100MB, to get the best results. The cache may, however, temporarily grow a few megabytes bigger than specified.

Example

        CacheSize 20
sets the cache size to 20 megabytes.


NoCaching

URLs matching a template given by the NoCaching directive will never be cached, e.g.:
        NoCaching http://really.useless.site/*
Currently a template can contain only a single wildcard character, *.
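
The single-wildcard matching described above can be sketched in a few lines of Python. This is only an illustration, not the server's code: the function name matches_template is hypothetical, and the sketch assumes that * matches any run of characters, including an empty one.

```python
def matches_template(template: str, url: str) -> bool:
    """Match url against a template holding at most one '*' wildcard."""
    if template.count("*") > 1:
        raise ValueError("only a single wildcard is supported")
    if "*" not in template:
        # No wildcard: the template must equal the URL exactly.
        return template == url
    prefix, suffix = template.split("*")
    # The URL must be long enough that prefix and suffix do not overlap.
    return (len(url) >= len(prefix) + len(suffix)
            and url.startswith(prefix)
            and url.endswith(suffix))
```

Under these assumptions, matches_template("http://really.useless.site/*", "http://really.useless.site/a.html") is true, while a URL on any other host does not match.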

Maximum Time to Keep Cache Files

All cached documents that match the specified template and are older than the time given by the CacheClean directive will be removed. This value overrides the expiry date: no file is stored longer than this value specifies, regardless of its expiry date.

Examples

        CacheClean http:*     1 month
        CacheClean ftp:*     14 days
        CacheClean gopher:*   5 days 12 hours
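
How the CacheClean limit caps a document's own expiry date can be sketched as below. This is a hedged illustration rather than the server's logic: the function name removal_time is hypothetical, and the sketch assumes that a file's age is counted from the moment it was stored in the cache.

```python
from datetime import datetime, timedelta


def removal_time(stored_at, expires, max_age):
    """Return the earlier of the document's expiry and the CacheClean limit.

    stored_at -- when the file entered the cache (assumed age reference)
    expires   -- the document's own expiry time, or None if unknown
    max_age   -- the CacheClean value for the matching template
    """
    limit = stored_at + max_age
    if expires is None:
        return limit
    return min(expires, limit)
```

With a 14 day limit, a file whose own expiry date lies a year ahead would still be removed after 14 days, while a file expiring tomorrow keeps its earlier expiry.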

Maximum Time to Keep Unused Files

Cache files that match a template and have been unused longer than the time given by the CacheUnused directive will be removed.

Examples

        CacheUnused *                      4 days 12 hours
        CacheUnused http://www.w3.org/*  7 days
        CacheUnused ftp://some.server/*   14 days
Note that the last matching specification takes precedence; therefore HTTP files from www.w3.org will be kept 7 days, and not 4.5 days.
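
The last-match rule can be illustrated with a short Python sketch built on the three example rules above. It is not taken from the server source: the helper names, the table of time units and the single-wildcard matcher are all assumptions made for illustration.

```python
from datetime import timedelta

# Rules in configuration-file order, copied from the examples above.
RULES = [
    ("*", "4 days 12 hours"),
    ("http://www.w3.org/*", "7 days"),
    ("ftp://some.server/*", "14 days"),
]

# Assumed unit names; only the ones needed by these rules.
UNITS = {
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "min": timedelta(minutes=1),
}


def parse_time(spec):
    """Parse a spec such as '4 days 12 hours' into a timedelta."""
    words = spec.split()
    if len(words) % 2:
        raise ValueError("expected '<number> <unit>' pairs: %r" % spec)
    total = timedelta()
    for count, unit in zip(words[0::2], words[1::2]):
        total += int(count) * UNITS[unit.rstrip("s")]
    return total


def matches(template, url):
    """Match a template containing at most one '*' wildcard."""
    if "*" not in template:
        return template == url
    prefix, suffix = template.split("*", 1)
    return (len(url) >= len(prefix) + len(suffix)
            and url.startswith(prefix) and url.endswith(suffix))


def unused_limit(url):
    """Return the limit of the LAST matching rule, or None if none match."""
    limit = None
    for template, spec in RULES:
        if matches(template, url):
            limit = parse_time(spec)  # a later match overrides earlier ones
    return limit
```

Here unused_limit("http://www.w3.org/Daemon/") gives 7 days because the second rule is the last one that matches, whereas a URL on any other HTTP host falls back to the catch-all 4 days 12 hours.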


Default Expiry Time

Files for which the remote server gave neither an Expires: nor a Last-Modified: header will be kept at most for the time specified by the CacheDefaultExpiry directive. The default values are zero for HTTP (script replies shouldn't be cached) and 1 day for FTP and Gopher.

Example

        CacheDefaultExpiry ftp:*     1 month
        CacheDefaultExpiry gopher:*  10 days
WARNING: A non-zero default expiry for HTTP will almost always cause problems, because there are currently many scripts that don't give an expiry date, yet their output expires immediately. Therefore, it is better to keep the default value for http: at zero.


CacheLastModifiedFactor

Currently HTTP servers usually give only the Last-Modified time, but not the Expires time. Last-Modified can often be used successfully to approximate the expiry date. CacheLastModifiedFactor is the fraction of the time elapsed since the last modification that the document is still assumed to remain up-to-date.

The default value is 0.1, which means that, for example, a file modified 20 days ago will expire in 2 days.

Examples

        CacheLastModifiedFactor  0.2
would cause files modified 5 months ago to expire after one month.

This feature can be turned off by specifying:

        CacheLastModifiedFactor  Off
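
The arithmetic behind this approximation can be written out as a small Python sketch. It is an illustration of the rule as described here, not the server's implementation; the function name remaining_lifetime is hypothetical, and None is used as a stand-in for the Off setting.

```python
from datetime import datetime, timedelta


def remaining_lifetime(now, last_modified, factor=0.1):
    """Approximate how long a document stays up-to-date.

    The estimate is factor * (time elapsed since the last modification).
    A factor of None stands in for 'CacheLastModifiedFactor Off'.
    """
    if factor is None:
        return None  # feature disabled: no estimate is made
    return (now - last_modified) * factor
```

For a file modified 20 days ago the default factor 0.1 gives 2 days, and a factor of 0.2 applied to a file modified 150 days ago gives 30 days.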

CacheTimeMargin

Sometimes inaccurate clocks on other hosts cause confusion in caching. It also often makes sense not to cache documents that will expire in a couple of minutes anyway. CacheTimeMargin defines this time margin; the default is:
        CacheTimeMargin  2 mins
No document expiring in less than two minutes will be written to disk.
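
The margin test amounts to a single comparison, sketched below in Python. The function name worth_caching is hypothetical, and treating a document that expires exactly at the margin as cacheable is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def worth_caching(expires, now, margin=timedelta(minutes=2)):
    """Return True if the document outlives the margin and may be stored."""
    return expires - now >= margin
```

With the default margin, a document expiring in 90 seconds is not written to disk, while one expiring in 10 minutes is.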


CacheNoConnect

This directive puts the proxy into standalone cache mode: only documents found in the cache are returned, and requests for documents not in the cache return an error instead of causing a connection to the outside world. This is useful for demos and in other cases where there is no network connection:
        CacheNoConnect On
Default setting is naturally Off.

This directive is typically used with expiry checking also turned Off.


CacheExpiryCheck

If, for demos or similar purposes, the proxy should always return documents from the cache even if they have expired, CacheExpiryCheck can be turned off:
        CacheExpiryCheck  Off
The default setting is On, meaning that the proxy never returns an expired document.

This is usually used in standalone cache mode (CacheNoConnect directive turned On).


Garbage Collection

When caching is enabled, garbage collection is also activated by default. It can be turned off explicitly with the Gc directive:
        Gc  Off

When to Do Garbage Collection

Garbage collection is launched right away when the cache size limit is reached. However, to keep the cache smaller it may be desirable to remove expired files even when there is still cache space remaining. Three directives control garbage collection scheduling:
        GcDailyGc      time
        GcTimeInterval time
        GcReqInterval  requests

GcDailyGc specifies the time of day at which to do the daily garbage collection, normally during the night. The default value is 3:00. Daily garbage collection can be disabled by specifying Off.

GcTimeInterval specifies the time interval after which garbage collection is done again. It can be turned off by specifying Off instead of the time. The default value is Off.

GcReqInterval specifies the maximum number of requests between successive garbage collections. It can also be turned off, which is the default.

Examples

Default values would be specified as:
        GcDailyGc       3:00
        GcTimeInterval  Off
        GcReqInterval   Off
Another example:
        GcDailyGc       Off
        GcTimeInterval  4 hours 30 mins
        GcReqInterval   1000

Memory Usage of Garbage Collector

The garbage collector performs its job best if it can read information about the whole cache into memory at once. This is not possible if the machine doesn't have enough main memory.

The GcMemUsage directive advises the garbage collector about how much memory to use. The value can be thought of as roughly the number of kilobytes to use for gc data, but the actual usage may vary greatly depending on dynamic factors such as the directory structure of the cached files.

The default is 500; if gc fails because memory runs out, make this smaller. If your machine has so much memory that it just can't run out, make this very big.

Example

        GcMemUsage 100
if you have very little memory.


Cache File Sizes

Two limits control the size factor of a file when its value is being calculated. CacheLimit_1 sets the lower limit; all files smaller than this have an equal size factor. CacheLimit_2 sets the upper limit; files bigger than this get an extremely bad size factor (meaning they are removed right away because they are too big).

Sizes are specified in kilobytes, and the default values are 200K and 4MB, respectively.

Examples

        CacheLimit_1 200
        CacheLimit_2 4000
would set the same values as the defaults, 200K and 4MB.


Cache Lock Timeout

During retrieval, cache files are locked. If something goes wrong, a lock file may be left hanging. The CacheLockTimeOut directive sets the amount of time after which a lock can be broken. The time is specified like all the other times in the configuration file, and the default value is 20 minutes, the same as the default OutputTimeOut. CacheLockTimeOut should never be less than OutputTimeOut!

Example

        CacheLockTimeOut  30 mins
would set lock timeout to half an hour.
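
The two rules in this section, when a lock may be broken and how the two timeouts must relate, can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that a lock's age is measured from its creation time.

```python
from datetime import datetime, timedelta

DEFAULT_TIMEOUT = timedelta(minutes=20)  # default for both directives


def lock_can_be_broken(lock_created, now, lock_timeout=DEFAULT_TIMEOUT):
    """Return True once the lock is older than CacheLockTimeOut."""
    return now - lock_created > lock_timeout


def check_timeouts(lock_timeout, output_timeout=DEFAULT_TIMEOUT):
    """Reject settings where CacheLockTimeOut is below OutputTimeOut."""
    if lock_timeout < output_timeout:
        raise ValueError("CacheLockTimeOut must not be less than OutputTimeOut")
```

The second check reflects the warning above: if locks could be broken sooner than OutputTimeOut, a lock might be broken while a slow but still valid transfer is in progress.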


CacheAccessLog

Cache accesses can be logged to separate log files instead of the normal access log. The CacheAccessLog directive has two forms. The first takes an absolute log file pathname:
        CacheAccessLog  /absolute/path/file.log
This causes all cache accesses to be logged to a single log file.

In the second form only a filename is specified:

        CacheAccessLog  file.log
This causes an individual log file to be created for each remote host that has cached files. The log files are created in the subdirectories named after the remote machines under CacheRoot/{http,ftp,gopher}.
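
How the two forms might map a request to a log file is sketched below in Python. The directory layout CacheRoot/protocol/host/file.log is inferred from the description above and is an assumption, as is the function name cache_log_path; port numbers and any other details of the real cache layout are ignored.

```python
import posixpath
from urllib.parse import urlsplit


def cache_log_path(cache_root, log_spec, url):
    """Pick the log file for a cache access to url.

    An absolute log_spec names one shared log; a bare filename is placed
    in the per-host directory (assumed layout: root/protocol/host/).
    """
    if posixpath.isabs(log_spec):
        return log_spec
    parts = urlsplit(url)
    return posixpath.join(cache_root, parts.scheme, parts.hostname, log_spec)
```

With CacheRoot set to /absolute/cache/directory and the second form shown above, an access to ftp://some.server/pub/README would be logged to /absolute/cache/directory/ftp/some.server/file.log under these assumptions.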


httpd@info.cern.ch