Configuration File of CERN httpd

In This Section...

Example configuration files
/etc/httpd.conf - Default configuration file
# - Comment sign
Map - Map URLs to actual files
Pass - Accept a request
Fail - Fail a request
Redirect - Redirect a request
Protect - Set up protection
DefProt - Default protection setup
Exec - Executable server scripts
Search - Index search facility
AddType - Filename suffix mappings to MIME Content-Types
AddEncoding - Filename suffix mappings to MIME Content-Transfer-Encodings
AddLanguage - Filename suffix mappings to different Content-Languages, multilanguage support
UserDir - User-supported directories, URLs starting /~username
MetaDir - Directory name for meta-information files
MetaSuffix - Suffix for meta-information files
NoLog - No log entries for listed hosts/domains
Disable - Disable methods that you don't need/want
Enable - Enable a desired method

New In 2.16

DirAccess - Enable/Selective/Disable directory listings
DirReadme - Configure/disable README-feature
AccessLog - Set access log file name
ErrorLog - Set error log file name
LogFormat - Set access log file format
LogTime - Set time zone for log files
SuffixCaseSense - Set suffix case sensitivity
InputTimeOut - Timeout for request
OutputTimeOut - Timeout for response
UserId - Default user to run as (instead of nobody)
GroupId - Default group to run as (instead of nogroup)

Firewall Gateway Caching and Other Directives

CacheRoot - Set cache root directory for a proxy server
CacheSize - Specify cache size (in megabytes)
CacheClean - Remove everything older than this (in days)
CacheUnused - Remove if has been unused this long (in days)
CacheDefaultExpiry - Default expiry time if not given by remote server (in days)
GcTimeInterval - Interval to do cache garbage collection (in hours)
GcReqInterval - Number of requests between garbage collections
GcMemUsage - Garbage collector memory usage directive
CacheLimit_1 - First cache file size limit (kilobytes)
CacheLimit_2 - Second cache file size limit (kilobytes)
CacheLockTimeOut - Break cache locks after this amount of seconds.
http_proxy
ftp_proxy
gopher_proxy
wais_proxy - Make firewall gateway (proxy) connect to another gateway

The configuration file (often referred to as the rule file) defines how httpd will translate a request into a document name. It allows one to provide an extra level of name mapping above that given by symbolic links in the file system. It allows, for example, out-of-date names to be mapped onto their more recent counterparts.

The configuration file also allows access to be restricted. This is essential, to prevent, for example, unauthorized access to your private documents.

Note: The configuration file is not essential if you just want to export one directory tree, but then you must remember to specify the exported directory on the command line:

        httpd -p 80 /your/exported/directory

The server guesses the data type of a file from its filename suffix. A configuration file is necessary to specify any data types which are not in the default set of suffixes. However, the default set is quite extensive.

Default Configuration File

By default, the configuration file /etc/httpd.conf is loaded, unless specified otherwise with the -r command line option:

        httpd -p 80 -r /your/own/httpd.conf

Comments in Configuration File

Each line consists of an operation code and one or two parameters, referred to as the template and the result. Lines starting with a hash sign # are ignored, as are empty lines.

Mapping, Passing and Failing

There are three main rules: Map, Pass and Fail. The server uses the top rule first, then each successive rule unless told otherwise by a Pass or a Fail rule.

Map template result: If the address matches the template, use the result string from now on for future rules.
Pass template: If the address matches the template, use it as it is, processing no further rules.
Pass template result: If the address matches the template, use the result string as it is, processing no further rules.
Fail template: If the address matches the template, prohibit access, processing no further rules.

The template string may contain one wildcard asterisk *. The result string may have the wildcard only if the template has one.

When matching,

Rules are scanned from the top of the file to the bottom.
If a request matches a Map template exactly, the result string is used instead of the original string and applied to successive rules.
If the request matches a Map template with a wildcard, then the text of the request which matches the wildcard is inserted in place of the wildcard in the result string to form the translated request. If the result string has no wildcard, it is used as it is.
When a Map substitution takes place, the rule scan continues with the next rule using the new string in place of the request. This is not the case if a Pass or Fail is matched: they terminate the rule scan.
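
As a sketch of how these rules combine, consider the following rule file fragment. The paths are hypothetical and only illustrate the scanning order described above:

        # Old names are mapped onto their newer counterparts
        Map   /old/*   /docs/*
        # Serve documents from the exported tree and stop scanning
        Pass  /docs/*  /usr/local/www/docs/*
        # Everything else is refused
        Fail  *

With these rules, a request for /old/intro.html is first rewritten to /docs/intro.html by the Map rule. The scan continues with the new string, which matches the Pass rule, so the file /usr/local/www/docs/intro.html is served and no further rules are processed. A request for /tmp/x matches neither of the first two rules and is rejected by the final Fail rule.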

Redirecting Requests Elsewhere

When documents, or entire trees of documents, are moved from one server to another, you can use the Redirect rule to tell httpd to redirect the request to another server. If the client program is smart enough, the user won't even notice that the document is retrieved from a different server.

Redirect template result: Document matching template is redirected to result, which must be a full URL (i.e. containing http: and the host name).

Example

        Redirect  /hypertext/WWW/*  http://www.cern.ch/WebDocs/*

This redirects everything starting with /hypertext/WWW to host www.cern.ch into virtual directory /WebDocs. For example, /hypertext/WWW/ would be redirected to http://www.cern.ch/WebDocs/.

Setting Up User Authentication and Document Protection

Documents are protected by Protect and DefProt rules. Their syntax is the following:

DefProt template setup-file [uid.gid]: Any document matching the template is associated with protection setup-file. The documents are not yet taken to be protected, but they may become protected by an existing access control list file in the same directory as the requested file, or by later matching a Protect rule. If that Protect rule doesn't specify setup-file, the one from the latest DefProt rule is used.
Protect [template setup-file [uid.gid]]: Any document matching template is protected. The type of protection is defined in finer detail in setup-file.
If setup-file is not specified, the one from the previously matched DefProt rule is used. If no DefProt rule has matched, access to the file is forbidden.

setup-file is always the full pathname of the protection setup file, which specifies the actual protection parameters.

The setup file can be omitted from a Protect rule, but it is obligatory in a DefProt rule. If the setup file is omitted, the uid.gid part cannot be given either.

uid.gid are the Unix user id and group id (either by name or by number, separated by a dot) to which the server should change when serving the request. These are only meaningful when the server is running as root. If they are missing they default to nobody.nogroup.

Note: Uid and gid are inherited from DefProt rule to Protect rule only when the setup-file is also inherited. If setup-file is specified for Protect rule but uid.gid is not, they default to nobody.nogroup regardless of the previous DefProt rule.

This is to avoid accidentally running the server under the wrong user id with the wrong setup file. This information would logically belong in the protection setup file, but for safety reasons that is not allowed, because an untrustworthy collaboration could set it to root. This way only the main webmaster can control user and group ids.
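
The inheritance rules above can be illustrated with a short sketch. The paths, the setup file names, and the user and group names are all hypothetical:

        # Associate a setup file and uid.gid with the tree; nothing is protected yet
        DefProt  /private/*          /etc/httpd/private.setup  www.staff
        # Protected; the setup file and www.staff are both inherited from DefProt
        Protect  /private/reports/*
        # Protected with its own setup file; uid.gid is not inherited here
        Protect  /private/other/*    /etc/httpd/other.setup

Under the rules described above, requests under /private/reports/ would be served as www.staff, while requests under /private/other/ would be served as nobody.nogroup, because that Protect rule names its own setup file without giving a uid.gid.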

Executable Server Scripts

Document address is mapped into a script call by Exec rule:

        Exec template script

In both template and script there must be a * wildcard, that matches everything starting from the script filename. This is to enable httpd to know what is the script name and what is the extra path information to be passed to the script.

Example

You want to map everything starting with /your/url/doit to execute the script /usr/etc/www/htbin/doit. You do this by saying:

        Exec  /your/url/*  /usr/etc/www/htbin/*

Here the asterisk matches the script name doit (and everything else that follows it). Usually people use some fixed keyword in front of the pathname in the URL to point out that the document is actually a script call. Often this keyword is /htbin. That is, usually your Exec rule looks like this:

        Exec  /htbin/*  /usr/etc/www/htbin/*

and all the URLs pointing to the scripts start with /htbin, for example /htbin/doit in the previous example.

Historical Note (HTBin Rule)

CERN httpd versions 2.13 and 2.14 had a hard-coded handling of URL pathnames starting /htbin that mapped them to scripts in a directory specified via HTBin rule:

        HTBin /your/htbin/directory

This is still handled automatically by httpd, by translating it to its equivalent Exec form:

        Exec /htbin/*  /your/htbin/directory/*

Always use Exec instead -- it is more general.

Index Search Facility

The server automatically calls a script to perform a search if the absolute pathname of the search script is supplied by a Search directive in the rule file:

        Search /search/script/pathname

This script is called with the URL pathname of the document from which the query was issued in the PATH_INFO environment variable, and the absolute (translated) document pathname in the PATH_TRANSLATED environment variable. The keyword part of the URL is passed undecoded in the QUERY_STRING environment variable, and also decoded as command line parameters, one in each of argv[1], argv[2], ...

Search script must conform to CGI/1.0 rules, that is, it has to output either a Location: field, or start its output with:

        Content-Type: text/html

followed by a blank line. (The Content-Type can, of course, also be other than text/html -- this was just an example.)

Binding Suffixes to MIME Content-Types

In addition to mapping rules, the rule file may be used to define the data types of files with particular suffixes. CERN httpd has an extensive set of predefined suffixes, so usually you don't need to specify any.

The syntax is:

        AddType .suffix representation encoding [quality]

The parameters are as follows:

suffix: The last part of the filename. There are two special cases. *.* matches to all files which have not been matched by any explicit suffixes but do contain a dot. * by itself matches to any file which does not match any other suffix.
representation: A MIME Content-Type style description of the representation actually used in the file. See the HTTP spec. This need not be a real MIME type - it will only be used if it matches a type given by a client.
encoding: A MIME content transfer encoding type. Much more limited in variety than representations, basically whether the file is ASCII (7bit or 8bit) or binary. A few other encodings are allowed, and maybe extension to compression.
quality: Optional. A floating point number between 0.0 and 1.0 which determines the relative merits of files xxx.* which differ in their suffix only, when a link to xxx.multi is being resolved. Defaults to 1.0.

Examples

        AddType .html text/html              8bit     1.0
        AddType .text text/plain             7bit     0.9
        AddType .ps   application/postscript 8bit     1.0
        AddType *.*   application/binary     binary   0.1
        AddType *     text/plain             7bit

Historical Note (Suffix Directive)

AddType was previously called Suffix. The old name is still understood, but may be misleading since suffixes are also used to determine Content-Transfer-Encoding and language. Always use AddType instead.

Binding Suffixes to MIME Content-Transfer-Encodings

Suffixes are also used to determine the Content-Transfer-Encoding of a file (.Z suffix for x-compressed, for example). Syntax is:

        AddEncoding .suffix  encoding

Example

        AddEncoding .Z  x-compress

Multilanguage Support

Multilanguage support is also built on using suffixes to determine the language of a document. A suffix is bound to a language by the AddLanguage rule (.en suffix for English, for example). Syntax is:

        AddLanguage .suffix  language

Examples

        AddLanguage .en  en
        AddLanguage .uk  en_UK

User-Supported Directories

User-supported directories, URLs of form /~username, are enabled by UserDir directive:

        UserDir dir-name

The dir-name argument is the directory in each user's home directory to be exported, for example WWW or Web.
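
For example, assuming users keep their exported documents in a directory named WWW (one of the names suggested above):

        UserDir  WWW

With this setting, a request for /~alice/index.html would be expected to refer to index.html in the WWW directory under the home directory of user alice. The user name alice is hypothetical.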

Meta-Information

It is possible to tell httpd to add meta-information to response. Meta-information is stored in a directory specified by MetaDir directive, under the same directory as the file being retrieved:

        MetaDir  dir-name

Meta-information is stored in a file with the same name as the actual document, but appended with a suffix specified via MetaSuffix directive:

        MetaSuffix  .suffix

Meta-information files contain RFC822-style headers.
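
A minimal sketch, using the hypothetical names .web and .meta for the directory and the suffix:

        MetaDir     .web
        MetaSuffix  .meta

With these settings, the meta-information for a document /some/dir/file.html would be looked up in /some/dir/.web/file.html.meta. Such a file might contain a single header line; the header and its value here are only an illustration:

        Expires: Tue, 01 Nov 1994 12:00:00 GMT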

Suppressing Log Entries For Certain Hosts/Domains

It's not always necessary to collect log information of accesses made by local hosts. The NoLog directive can be used to prevent log entry being made for hosts matching a given IP number or host name template:

        NoLog  template

Examples

        NoLog 128.141.*.*
        NoLog *.cern.ch
        NoLog *.ch  *.fr  *.it

Enabling and Disabling HTTP Methods

You can enable/disable methods that you do/don't want your server to accept:

        Enable  method
        Disable method

By default GET, HEAD and POST are enabled, and the rest are disabled.

Examples

        Enable POST
        Disable DELETE

New In 2.16

Controlling Directory Browsing

DirAccess on: Enable directory browsing in all directories (which are not forbidden by rules). Synonym with -dy command line option. Default.
DirAccess off: Disable directory browsing. Synonym with -dn command line option.
DirAccess selective: Enable selective directory browsing - only directories containing the file .www_browsable are allowed. Synonym with -ds command line option.

README Feature

DirReadme top: For any browsable directory containing a README file, include the text at the top of the directory listing. Synonym with -dt command line option. Default.
DirReadme bottom: Same as previous, but contents of README appear on the bottom. Synonym with -db command line option.
DirReadme off: Disables the README inclusion feature. Synonym with -dr command line option.
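
For example, to list only directories that have been explicitly marked as browsable, and to show the README text below the listing, the two directives can be combined:

        DirAccess  selective
        DirReadme  bottom

With this configuration, a directory is listed only if it contains the file .www_browsable.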

Access Log File

The access log file contains a log of all the requests. The name of the log file is specified either by the -l logfile command line option, or with the AccessLog directive:

        AccessLog /absolute/path/logfile

Error Log File

Error log contains a log of errors that might prove useful when figuring out if something doesn't work. Error log file name is set by ErrorLog directive:

        ErrorLog /absolute/path/errorlog

If error log file is not specified, it defaults to access log file name with .error extension. If the filename extension already exists, .error will replace it.
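
For example, with the hypothetical log file paths below and no ErrorLog directive, the default error log name follows from the access log name. The two AccessLog lines are alternatives, not meant to be used together:

        # Extension .log is replaced: errors would go to /var/log/httpd/access.error
        AccessLog  /var/log/httpd/access.log

        # No extension to replace: errors would go to /var/log/httpd/httpd-log.error
        AccessLog  /var/log/httpd/httpd-log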

Log File Format

Previously every server used to have its own logfile format which made it difficult to write general statistics collectors. Therefore there is now a common logfile format (which will eventually become the default). Currently it is enabled by

        LogFormat  common

The old CERN httpd format can be used by

        LogFormat  old

Log Time Format

Times in the log file are by default local time. That can be changed to be GMT time by LogTime directive:

        LogTime  gmt

Default is:

        LogTime  localtime

Suffix Case Sensitivity

Suffix case sensitivity is by default off. You can make suffixes case sensitive with SuffixCaseSense directive:

        SuffixCaseSense On

Timeouts

You can specify timeouts for how long the server will wait for a request from the client, and for how long it may take to send the response. Timeouts are all specified in seconds. The defaults are:

        InputTimeOut 120
        OutputTimeOut 1200

That is, 2 minutes and 20 minutes, respectively.

Default User Id

UserId directive sets the default user to run as instead of nobody. This directive is only meaningful when running server as root.

        UserId whoever

Default Group Id

GroupId directive sets the default group to run under instead of nogroup. This directive is only meaningful when running server as root.

        GroupId whichever

Caching

Setting Cache Directory

Caching is enabled on a server running as a gateway (proxy) by CacheRoot directive, which is used to set the absolute path of the cache directory:

        CacheRoot /absolute/cache/directory

Cache Size

CacheSize directive sets the maximum cache size in megabytes. Default value is 5MB, but it's preferable to have a considerably larger cache, like 50-100MB, to get best results. Cache may, however, temporarily grow a few megabytes bigger than specified.

Example

        CacheSize 20

sets cache size to 20 megabytes.

Maximum Time to Keep Cache Files

All cache files older than specified by CacheClean directive will be removed. This value overrides expiry date in that no file can be stored longer than this value specifies, regardless of expiry date. Default value is 21 days.

Example

        CacheClean 14

would cause everything older than two weeks to be removed.

Maximum Time to Keep Unused Files

All cache files that have been unused longer than specified by CacheUnused directive will be removed. Default value is 14 days.

Example

        CacheUnused 7

would set this to one week.

Default Expiry Time

Files for which the server gave neither Expires: nor Last-Modified: header will be kept at most the number of days specified by CacheDefaultExpiry directive. Default value is 7 days.

Example

        CacheDefaultExpiry 1

would set this to one day.

How Often to Do Garbage Collection

Garbage collection is launched right away when the cache size limit is reached. However, expired files should be removed even if there is still cache space remaining. There are two directives controlling garbage collection scheduling:

        GcTimeInterval hours
        GcReqInterval requests

GcTimeInterval specifies the number of hours between successive garbage collections. Default value is 24 hours.

GcReqInterval specifies the maximum number of requests between successive garbage collections. Default value is 10000 requests.

Memory Usage of Garbage Collector

Garbage collector performs its job best if it can read information about the whole cache into memory at once. This is not possible if the machine doesn't have enough main memory.

GcMemUsage directive advises the garbage collector about how much memory to use. You may think of this as the number of kilobytes to use for gc data, but actual usage may vary greatly depending on dynamic factors, such as the directory structure of the cached files.

Default is 500; if gc fails because memory runs out make this smaller. If your machine has so much memory that it just can't run out, make this very big.

Example

        GcMemUsage 100

if you have very little memory.

Cache File Sizes

There are two limits controlling the size factor of a file when its value is being calculated. CacheLimit_1 sets the lower limit; under this all the files have equal size factor. CacheLimit_2 sets the upper limit; files bigger than this get an extremely bad size factor (meaning they get removed right away because they are too big).

Sizes are specified in kilobytes, and default values are 200K and 4MB, respectively.

Examples

        CacheLimit_1 200
        CacheLimit_2 4000

would set the same values as the defaults, 200K and 4MB.

Cache Lock Timeout

During retrieval cache files are locked. If something goes wrong a lock file may be left hanging. CacheLockTimeOut directive sets the amount of time after which a lock can be broken. Time is specified in seconds; the default value is 1200 seconds (20 minutes), the same as the default OutputTimeOut. CacheLockTimeOut should never be less than OutputTimeOut!

Example

        CacheLockTimeOut 1800

would set lock timeout to half an hour.
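
Putting the caching directives together, a proxy server's rule file might contain a fragment like the following. The cache directory and all values are illustrative assumptions, not recommendations:

        CacheRoot           /var/cache/httpd
        # Megabytes
        CacheSize           100
        # Days
        CacheClean          14
        CacheUnused         7
        CacheDefaultExpiry  1
        # Hours, and number of requests, between garbage collections
        GcTimeInterval      4
        GcReqInterval       2000
        # Kilobytes
        CacheLimit_1        200
        CacheLimit_2        4000
        # Seconds; the lock timeout is kept above OutputTimeOut
        OutputTimeOut       1200
        CacheLockTimeOut    1800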

Going Through Many Gateways

If there is a need to make an (inner) proxy server connect to the outside world via another (outer) proxy server, you can use the same environment variables as are used to redirect clients to the proxy to make inner proxy use the outer one:

http_proxy
ftp_proxy
gopher_proxy
wais_proxy

E.g. your (inner) proxy server's startup script could look like this:

        #!/bin/sh
        http_proxy=http://outer.proxy.server:8082/
        export http_proxy
        /usr/etc/httpd -r /etc/inner-proxy.conf -p 8081

This is a little ugly, so there are also the following directives for the configuration file:

http_proxy http://outer.proxy.server/
ftp_proxy http://outer.proxy.server/
gopher_proxy http://outer.proxy.server/
wais_proxy http://outer.proxy.server/

httpd@info.cern.ch