CERN httpd 3.0 Guide for Prereleases

CERN WWW Server [httpd, HyperText Transfer Protocol Daemon] is a generic, full featured server for serving files using the HTTP protocol. This is a TCP/IP based protocol running by convention on port 80.

Files can be real or synthesized, produced by scripts generating virtual documents. It handles clickable images, fill-out forms, searches, etc.

CERN httpd can also be run as a proxy server to allow people behind firewalls to use the Web as if the firewall were not present. A powerful feature is the caching performed by the proxy, which makes cern_httpd attractive as a proxy even to those who are not behind a firewall.


In This Guide...

Installation
The steps necessary to install the CERN server.
Administration
How to set up document protection, index search, clickable images, server-side scripts, ...

Ari Luotonen - May 1994 - httpd@info.cern.ch

About documents generated from hypertext

Paper manuals generated from hypertext are made for convenience, for example for reading when one has no computer to turn to. We have tried to make the hypertext into fairly conventional paper documents, but they may seem a little strange in some ways.

All the links have been removed. Therefore, it is worth looking at the table of contents to see what there is in the manual. Something which is not explained in place may be explained in detail elsewhere.

We have tried to keep related matter together, but sometimes you may still have to check the table of contents to find it.

Please remember that these are for the most part "living documents". That is, they are constantly changing to reflect current knowledge. If you see a statement such as "Product xxx does not support this feature", remember that it was the case when the document was generated, and may not be the same now. So if in doubt, check the online version. Of course, the living document may be out of date too, in which case it is helpful to mail its author.

Tim BL
Ari Luotonen, CERN, May 1994

Installing CERN Server

VMS note: There are special instructions if you are installing under VMS.


Getting the Program

The CERN server distribution is available from the info.cern.ch anonymous FTP account. Often you don't need to compile the server yourself; precompiled binaries are available for many Unix platforms. If there is no precompiled version for your platform, or if it doesn't work (e.g. the name resolution doesn't work), you should get the source code and compile it yourself.

Configuration File

If you have all your documents in a single directory tree, say /Public/Web, the easiest way to make them available to the world is to specify the following rule in your configuration file:
        Pass	/*	/Public/Web/*
This maps all requests into the directory /Public/Web and accepts them.

The default welcome document (what you get with a URL of the form http://your.host/) is now Welcome.html in the directory /Public/Web.
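A minimal configuration file along these lines might look roughly like the following sketch (the Pass line is the essential part; the Port directive, described later in this guide, is shown only for completeness):

        # /etc/httpd.conf -- minimal example
        Port    80
        Pass    /*      /Public/Web/*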


First Trying It Out In Verbose Mode

It is easy to make mistakes in the configuration file, which can make configuring httpd feel tedious - it doesn't have to be so. In the beginning start httpd by hand in verbose mode, listening to some port, and look at what happens when you make a request to that port with your browser.

Typically test servers are run on a non-privileged port above 1024 (you don't have to be root to bind to them), often 8001, 8080, or similar. The official HTTP port is 80.

The server port is defined in the configuration file with the Port directive, but you can override it with the -p command line option while testing; e.g.

        httpd -v -r /home/you/httpd.conf -p 8080
This will start httpd in verbose mode, use configuration file httpd.conf in your home directory, and accept connections to port 8080.

You can now try to request a document from your server using a URL of the form:

        http://your.host:8080/document.html
where document.html is relative to the directory that you have exported in your configuration file.

If you get an error message back see the verbose output to find out what is going wrong - it is usually self-explanatory.

And remember, you should always feel free to ask advice from httpd@info.cern.ch.


The Actual Installation of httpd

In Unix you can run the server either as stand-alone, or from Internet Daemon (inetd). A stand-alone server is typically started once at system-boot time. It waits for incoming connections, and forks itself to serve a request. This is much faster than letting inetd spawn httpd every time a request comes. We therefore recommend that you run CERN httpd in stand-alone mode.

Stand-alone Installation

A stand-alone server is started from the bootstrap command file (for example /etc/rc.local) so that it runs continuously like the sendmail daemon, for example.

This method has the advantage over using the inetd that the response time is reduced.

Add a line starting httpd to your system startup file (usually /etc/rc.local or /etc/rc). If you have the configuration file in the default place, /etc/httpd.conf, and if it specifies the port to listen to via the Port directive, you don't need any command line options:

    /usr/etc/httpd &
httpd will automatically go into the background, so there is really no need for the ampersand at the end (as long as your configuration file /etc/httpd.conf really exists).

Or a little more safely in case httpd is removed:

    if [ -f /usr/etc/httpd ]; then
        (/usr/etc/httpd && (echo -n ' httpd')) >/dev/console &
    fi
Naturally you can use any of the command line options, if necessary.


Registering Your Server

Once you have your httpd up and running, and you have documents to show the world, announce your server so that others can find it.


If It Doesn't Work...

...first run it in verbose mode with the -v option and try to figure out what goes wrong. See also the debugging chart and the FAQ. If you can't figure out what's going wrong, feel free to send mail to httpd@info.cern.ch


httpd@info.cern.ch

Installing httpd Under inetd

This is how to set up inetd to run httpd whenever a request comes in. (These steps are the same for any daemon under Unix: you will probably find a similar thing has been done for the FTP daemon, ftpd, for example.)


Step 1: Install httpd Binary

Copy httpd into a suitable directory such as /usr/etc. Make it owned by root, and make it writable only by root, for example by saying:
        chown root httpd
        chmod 755 httpd

Step 2: Add http Service to /etc/services

Put "http" in the /etc/services file, or use the name of a specific service of your own if you want to use a special port number. Standard port number for HTTP is 80.
        http    80/tcp           # WWW server
Exceptions: see the sections below on using NIS (Yellow Pages) and on adding a service on the NeXT.

Step 3: Add a Line to /etc/inetd.conf

Put a line in the internet daemon configuration file, /etc/inetd.conf.
    http  stream  tcp  nowait  root  /usr/etc/httpd  httpd
The first word is the same as the service name in the /etc/services file.

If you want to pass command line options or parameters to httpd, they are listed at the end of the line, for example to set the rule file to something other than the default /etc/httpd.conf:

    http  stream  tcp  nowait  root  /usr/etc/httpd  httpd -r /my/own/rules
Note: For httpd version 2.15 and later we recommend that it is run as user root. Running httpd as root is safe, since it automatically resets its user-id to nobody. However, if you decide to use access authorization features, and you need to serve protected files, httpd will have to be able to set its user-id to some other uid as well. In any case, httpd always sets its user-id to something other than root before serving the file to the client.

Note: /etc/inetd.conf syntax varies from system to system; for example, not all systems have the field specifying the user name, in which case the default is root. If in doubt, copy the format of other lines in your existing inetd.conf.

Note: There seems to be a limit of 4 arguments passed across by inetd, at least on the NeXT.


Step 4: Send HUP Signal to inetd

When you have updated inetd.conf, find out the process number of inetd, and send a "HUP" signal to it.

For example, on BSD Unix do this:

        > ps -aux | grep inetd | grep -v grep
        root    85   0.0  0.9 1.24M  304K ?  S  0:01 /usr/etc/inetd
        > kill -HUP 85
For System V, use ps -el instead of ps -aux. Be aware that on some systems your local file /etc/services may not be consulted (see the notes on debugging).


Test It!


httpd@info.cern.ch

Using NIS (Yellow Pages)

If your machine is running Sun's "Network Information Service", originally known as "yellow pages", read this section.

You must rebuild the NIS maps on the NIS master server after changing /etc/services; this will load the /etc/services file into the NIS information system.
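On a typical SunOS NIS master server the maps are rebuilt roughly like this (the exact commands and makefile targets may differ between systems):

        # on the NIS master server, as root
        cd /var/yp
        make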

Some people have found that they needed to reboot the system afterwards for the change to take effect.


httpd@info.cern.ch

Adding a Service on the NeXT

The NeXT uses the "netinfo" database instead of the /etc/services file. It is managed with the /NextAdmin/NetInfoManager application; the http service must be added there rather than to /etc/services.
httpd@info.cern.ch

Privileged Ports

The TCP/IP port numbers below 1024 are special in that normal users are not allowed to run servers on them. This is a security feature, in that if you connect to a service on one of these ports you can be fairly sure that you have the real thing, and not a fake which some hacker has put up for you.

The normal port number for W3 servers is port 80. This number has been assigned to WWW by the Internet Assigned Numbers Authority, IANA.

When you run a server as a test from a non-privileged account, you will normally test it on other ports, such as 2784, 5000, 8001 or 8080.


Under Unix

The Internet Daemon inetd (running as root) can listen for incoming connections on port 80 and pass them down to a process with a safer uid for the server itself. However, httpd versions 2.14 and later can safely be run as root, since they automatically change their user-id to nobody or some other user-id depending on the server setup.


Under VMS

Under UCX, the process running as a server needs BYPASS privilege to listen to ports below 1024. This might mean you have to install the server. With other TCP/IP packages, privilege of some sort is similarly required.


httpd@info.cern.ch

Debugging httpd

Suppose you think you have installed httpd but it doesn't work. Here we assume you have used port 80. If you have a situation not handled by this problem-solving guide, please mail httpd@info.cern.ch.


Type
        www http://myhost.domain/
What happens?

Connection Refused

The browser tries to connect to the daemon but gets this status in the trace.

This means that nobody was listening on that port number. Check the port numbers match between server and client. Make sure you specify the port number explicitly in the document address for www.

If you are running the daemon standalone (as you should be), check that it is actually running by taking a list of processes, and that it is listening to the correct port (specified with -p port option), or try running it from the terminal with -v option as well. The trace for the server should say "socket, bind and listen all ok". If it does, and you still get "connection refused", then you must be talking to the wrong host (or, conceivably, different ethernet adapters on the same host).

If you are running with the inet daemon, check both the services file (/etc/services), or the database (yellow pages, netinfo) if your system uses one, and the /etc/inetd.conf file. Check that the service name matches between these two (e.g. http).

Did you remember to kill -HUP the inetd when you changed the inetd.conf file?

Be aware that on some systems your local file /etc/services will not be consulted. E.g. when ypbind is running on Suns, you should type

        ypwhich -m services
and ask the administrator of the machine named to change its own /etc/services.

Try running the daemon from a shell window to see more clearly what happens.


Cannot Connect To Information Server

The usual cause of this is that the server is not running, or it's running on a different port.

There is more information you can get. Use the "verbose" option on the LineMode browser to find out what went wrong:

        www -v http://myhost.domain:80/

What do you get? A load of trace messages. There are several cases.

Unable To Access Document

The typical cause of this is that the configuration file is incorrect, or the files are not readable by the user-id under which the server runs. When you run the server as root, it will automatically switch its user-id to nobody just before serving the document. This can be changed with the UserId configuration directive.


An Empty Document Is Displayed

The document sent back is empty, but there is no error message.

The inetd has started a process to run your server but it immediately failed. Possibilities include:


Document Address Invalid Or Access Not Authorized...

...or some similar kind of error message. This means either that the document address really is wrong, or that the server's rules or protection setup forbid access to it. If you are the server administrator, and you can't understand why the daemon refuses to deliver the file, run it in verbose mode and examine the trace (see Debugging using Trace below).

Bad Output

A document is displayed, but not the one you wanted.

These are some ideas:


httpd@info.cern.ch

Running Under Shell

You don't have to run the daemon under inetd if that doesn't work (and we recommend running it standalone anyway). You can run it from a shell session.

Run httpd from your terminal with a different port number, like 8080:

        httpd -p 8080
Note: You must be root (under VMS, have some privilege) to run with a port number below 1024. If you select a port above 1024, you can run as a normal user. This way, anyone can publish files on the net. However, it isn't very reliable, as your server will not automatically come back up if the machine is rebooted. In the long term it is best to install it to be started from the system startup file /etc/rc or /etc/rc.local.

You may not be able to use a port number which has been used by a daemon process recently (port may still be bound), so you may have to switch port number if you ^C and restart httpd. When it is running like this, you can also read the debugging messages (when running with -v option), and use a debugger on it if necessary. (See also: telnetting to the server).


Debugging using Trace

If you can't understand why the server refuses to give back a document, run it with the -v option to turn on debugging messages. Use -v as the very first command line option (this way debugging is turned on right away). You will see the daemon setting up the rules for translating requests, and you will see its attempt to access the file (assuming you map requests onto files).
        httpd -v -p 8080
Try to access the document from a client using another terminal window. Look at the debugging output. It will probably explain what is happening. If you still can't figure out the problem, mail your local guru help desk or if desperate httpd@info.cern.ch enclosing a copy of debugging output.


Even simpler

For testing a daemon very simply, without using a client, you can make the terminal be the client. With httpd, try just running it from the terminal and typing GET /document/url into its input:
        httpd -v
        GET /document/url

httpd@info.cern.ch

Telnetting to httpd

Most implementations of telnet allow you to specify a port number. Under unix this is often just a second parameter, under VMS a /PORT option.

The HTTP protocol is a plain-text protocol, so you can simulate a client just by typing things in. This will help you to see exactly what the server is sending back, and it will confirm whether it really is the server, and not the browser, which has the problem.

Here is a simple example (keyboard input is in boldface):

	> telnet myhost.domain 80
	Connected to myhost.domain on port 80
	Escape is ^[
	GET /document/url
	...document or error message...

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Command Line of CERN httpd

The command line syntax for httpd allows a number of options and an optional directory argument:
        httpd  [-opt -opt -opt ...] [directory]
The directory argument, if present, indicates the directory to be exported. If not present, either a rule file is used to export combinations of directories, or else the default is to export the /Public directory tree.


Options

-r rulefile
Use rulefile as configuration file. This is the only necessary command line option if you don't have the default configuration file, /etc/httpd.conf. All the other options can be given as directives in the configuration file.
-p port
Listen to port port. Without this argument httpd assumes that it has been run by inetd, and uses stdin and stdout as its communication channel. Note that port numbers under 1024 are privileged.
-l logfile
Use logfile to log the requests.
-restart
Restart an already running httpd. httpd finds out the process number of the running server from the PidFile and sends it the HUP signal (HangUP). This will cause httpd to reload its configuration files and reopen its log files. Important: To find out the PidFile, httpd has to read the same configuration file as the running httpd, so you have to specify the same -r option on the command line as for the actual httpd.
-gc_only
[only for proxies] Do only garbage collection and then exit. This can be used to run httpd periodically by cron to do garbage collection on a cache that is used by httpd run from the inetd daemon rather than standalone. When httpd is not running standalone it cannot monitor the cache, nor perform automatic garbage collection.
-v
Verbose, turn on debugging messages.
-vv
Very Verbose, turn on even more verbose debugging messages.
-version
Print version number of httpd and libwww (the WWW Common Library).

Directory Browsing

You can set these also with the DirAccess configuration directive.
-dy
Enable directory browsing. Directories are returned as hypertext documents. See browsing directories. Default.
-dn
Disable directory browsing. An attempt to access a directory will generate an error response.
-ds
Selective directory browsing; enabled only for directories containing a file named .www_browsable

README Feature

It is common practice to put a file named README into a directory containing instructions or notices to be read by anyone new to the directory. httpd will by default embed any README file in the hypertext version of a directory.

You can set these also with the DirReadme configuration directive.

-dt
For any browsable directory which contains a README file, include the text of the README file at the top of the document before the listing. Default.
-db
As -dt but put the README at the bottom, after the listing. The -db and -dt options may be combined with -dy as -dyb, -dty etc.
-dr
Disables the README inclusion feature.
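For instance, combining these with the testing options shown earlier, one might start a test server with browsing enabled and README files at the bottom of listings (the port number and configuration file path are just as in the earlier example):

        httpd -v -r /home/you/httpd.conf -p 8080 -dyb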

Examples

        httpd -r /usr/etc/httpd.conf -p 80
This is a standalone server running on port 80. Configuration file is /usr/etc/httpd.conf instead of the default, /etc/httpd.conf.

Note that if the Port directive is given in the configuration file the -p option is not necessary (it can be used to override the value set in the configuration file).

        httpd
httpd uses its default configuration file /etc/httpd.conf. If that file doesn't exist, httpd exports the /Public directory tree. This tree may contain soft links to other directory trees.

If the configuration file /etc/httpd.conf doesn't define the port number to listen to, httpd reads its stdin and writes to its stdout, i.e. it assumes it is being run by inetd.

        httpd -r /usr/local/lib/httpd.conf
The same as before, but uses /usr/local/lib/httpd.conf as a rule file instead of the default /etc/httpd.conf.


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Configuration File of CERN httpd

The configuration file (often referred to as the rule file) defines how httpd will translate a request into a document name. The directives controlling httpd features are also put into the configuration file, as well as protection configuration. This is essential to prevent unauthorized access to your private documents.

Default Configuration File

By default, the configuration file /etc/httpd.conf is loaded, unless specified otherwise with the -r command line option:
        httpd -p 80 -r /your/own/httpd.conf
See also example configuration files.

Comments in Configuration File

Each line consists of an operation code and one or two parameters, referred to as the template and the result. Lines starting with a hash sign # are ignored, as are empty lines.


Restarting the Server

When you are running the server in standalone mode (not from inetd), and modify the configuration file, send the HUP signal to httpd to make it re-read the configuration file. You can find out the process number from the pid file written by httpd, e.g.
        > cat /server_root/httpd-pid
        2846
        > kill -HUP 2846
        >
Important: You must specify the configuration file as an absolute pathname for the -r option, because when the server is started in standalone mode it changes its current directory to /, so after startup it cannot reload configuration files that were specified with relative filenames.

To make restarting easier httpd has a -restart option, which will automatically send the HUP signal to the other httpd process. Important: To find out the PidFile, httpd has to read the same configuration file as the running httpd, so you have to specify the same -r option on the command line as for the actual httpd, e.g.

        > httpd -r /usr/etc/httpd.conf -restart
        Restarting.. httpd
        Sending..... HUP signal to process 21379
        >

Exhaustive List of Configuration Directives


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

General CERN httpd Configuration Directives


ServerRoot

Server's "home" diretory is specified via ServerRoot directive. If server root is specified, but no AddIcon directive has been used in configuration file to set up icons, the default icon directory is under server root icons. The default icons that should be present are: If these defaults don't please you you can define all from the scratch. As an example of AddIcon directive, the defaults would be specified as follows:
    Pass  /httpd-internal-icons/*  /server_root/icons/*

    AddBlankIcon   /httpd-internal-icons/blank.xbm
    AddDirIcon     /httpd-internal-icons/directory.xbm  DIR
    AddParentIcon  /httpd-internal-icons/back.xbm       UP
    AddUnknownIcon /httpd-internal-icons/unknown.xbm
    AddIcon        /httpd-internal-icons/binary.xbm     BIN  binary
    AddIcon        /httpd-internal-icons/text.xbm       TXT  text/*
    AddIcon        /httpd-internal-icons/image.xbm      IMG  image/*
    AddIcon        /httpd-internal-icons/movie.xbm      MOV  video/*
    AddIcon        /httpd-internal-icons/sound.xbm      AU   audio/*
    AddIcon        /httpd-internal-icons/tar.xbm        TAR  multipart/*tar
    AddIcon        /httpd-internal-icons/compressed.xbm CMP  x-compress x-gzip

On Proxy Server

On a proxy server the icon URLs must be full URLs, because otherwise clients would translate them relative to the remote host. This means that in the above example all the AddIcon* directives have to read:
    AddIcon  http://your.server/httpd-internal-icons/...
and you also have to Pass the full icon URL:
    Pass  http://your.server/httpd-internal-icons/*  /server_root/icons/*
Since future smart browsers might notice that the icon server is the same as the proxy server, it may be best in this case to also Pass the partial URL as above:
    Pass  /httpd-internal-icons/*  /server_root/icons/*

HostName

On some hosts the hostname lookup fails, producing only the name without the domain part. The full hostname is necessary when httpd generates references to itself (e.g. redirection responses to clients). If necessary, provide the full server hostname with the HostName directive:
        HostName  full.server.host.name
You may want to use this also when the real host name is different from what you want the clients to see (you have a DNS alias for the host).


Default Port Setting

For a standalone server (one running continuously, listening to a certain port, and forking a child to handle each request) the port to listen to can be defined via the Port configuration directive instead of the -p port command line option. Normally:
        Port 80
The -p port command line option still overrides this setting.


PidFile

httpd re-reads its configuration file when it receives a HUP signal [HANGUP], the signal number 1. To make it easy to find out the parent httpd process id, it writes it to a file.

By default, if ServerRoot is specified, this is the file httpd-pid under server root; if not, it defaults to /tmp/httpd-pid.

The PidFile directive can be used to set the process id file name; it can be either an absolute path, or a relative one. Relative path is relative to ServerRoot, or if not defined, relative to /tmp.

Example

        ServerRoot  /Web/serverroot
        PidFile     logs/httpd-pid
would cause the process id to be written to /Web/serverroot/logs/httpd-pid.


Default User Id

The UserId directive sets the default user to run as instead of nobody. This directive is only meaningful when the server is run as root.
        UserId whoever

Default Group Id

The GroupId directive sets the default group to run under instead of nogroup. This directive is only meaningful when the server is run as root.
        GroupId whichever

Enabling and Disabling HTTP Methods

You can enable/disable methods that you do/don't want your server to accept:
        Enable  METHOD
        Disable METHOD
By default GET, HEAD and POST are enabled, and the rest are disabled.

Examples

        Enable POST
        Disable DELETE

IdentityCheck

If the IdentityCheck configuration directive is turned On, httpd will connect to the ident daemon (RFC 931) of the remote host and find out the remote login name of the owner of the client socket. This information is written to the access log file, and put into the REMOTE_IDENT CGI environment variable.

Default setting is Off:

        IdentityCheck Off
and if you don't need this information you will save resources by keeping it off. Furthermore, this information does not provide any real security and should not be trusted for access control, but used only for informational purposes, such as logging.

WARNING

On some systems there is a kernel bug that causes all connections to the remote node to be broken if the remote ident request is not answered (for example, if the ident daemon is not running). This has been reported for at least SunOS 4.1.1, NeXT 2.0a, ISC 3.0 with TCP 1.3, and AIX 3.2.2; later versions of these are OK. Sony News/OS 4.51, HP-UX 8-?? and Ultrix 4.3 still have this bug. A fix for Ultrix is available (CSO-8919).

[Thanks to Per-Steinar Iversen from Norway for pointing this out!]

If the operating system on your server host has this bug, do not use IdentityCheck!


Welcome

The Welcome directive specifies the default file name to use when only a directory name is given in the URL. There may be many Welcome directives giving alternative welcome page names. The one defined earlier has precedence.

Default values are Welcome.html, welcome.html and index.html. index.html is there only for compatibility with NCSA server; the word "Welcome" is more descriptive, and has precedence.

All default values will be overridden if the Welcome directive is used.

Default values could be defined as:

        Welcome Welcome.html
        Welcome welcome.html
        Welcome index.html

AlwaysWelcome

By default there is no difference between directory names with and without a trailing slash when it comes to welcome pages. The one without a trailing slash will cause an automatic redirection to the one with a trailing slash, which then gets mapped to the welcome page.

If it is desirable to have plain directory names produce a directory listing, and only the ones with a trailing slash return the welcome page, set the AlwaysWelcome directive to off:

        AlwaysWelcome Off
Default value is On.


User-Supported Directories

User-supported directories, i.e. URLs of the form /~username, are enabled by the UserDir directive:
        UserDir dir-name
The dir-name argument is the directory in each user's home directory to be exported, for example WWW:
        UserDir WWW
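With this setting, a request for a URL like the following (the user name and home directory are purely illustrative) is served from the WWW subdirectory of that user's home directory:

        http://your.host/~alice/project.html   ->   /home/alice/WWW/project.html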

Meta-Information

It is possible to tell httpd to add meta-information to the response. Meta-information is stored in a directory specified by the MetaDir directive, under the same directory as the file being retrieved:
        MetaDir  dir-name
Meta-information is stored in a file with the same name as the actual document, but appended with a suffix specified via MetaSuffix directive:
        MetaSuffix  .suffix
Meta-information files contain RFC822-style headers.

Default settings are:

        MetaDir    .web
        MetaSuffix .meta
meaning that meta-information files are located in the .web subdirectory and end in the .meta suffix, i.e. the metafile for the file:
        /Web/Demo/file.html
would be:
        /Web/Demo/.web/file.html.meta
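Such a metafile simply contains extra RFC822-style header lines to be added to the response for that document; a hypothetical example would be a single Expires header:

        Expires: Tue, 01 Nov 1994 08:00:00 GMT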

MaxContentLengthBuffer

httpd normally gives a Content-Length header line for every document it returns. When it's running as a proxy it buffers the document received from the remote server before sending it to the client. This directive can be used to set the size of this buffer; if the size is exceeded the document will be returned without a Content-Length header field.

Default setting is 50 kilobytes:

        MaxContentLengthBuffer 50 K

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Rules In The Configuration File

Rules define the mapping between virtual URLs and physical file names. Currently the following rules are understood:

Mapping, Passing and Failing

There are three main rules: Map, Pass and Fail. The server uses the top rule first, then each successive rule unless told otherwise by a Pass or a Fail rule.

Map template result
If the address matches the template, use the result string from now on for future rules.
Pass template
If the address matches the template, use it as it is, processing no further rules.
Pass template result
If the address matches the template, use the result string as it is, processing no further rules.
Fail template
If the address matches the template, prohibit access, processing no further rules.
The template string may contain wildcards (asterisks) *. (Versions earlier than 3.0 support only a single wildcard.) The result string may have wildcards only if the template has them. In this case they expand to matched strings in respective order.

Whitespace, literal asterisks and backslashes are allowed in templates if they are preceded by a backslash.

The tilde character (see user-supported directories) just after a slash (in other words at the beginning of a directory name) has to be matched explicitly, i.e. a wildcard does not match it.

When matching, templates are compared against the request address as already transformed by any earlier Map rules.
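
As an illustration, a fragment of a rule file using all three rule types might look like this (the directory names are purely illustrative):

        Map   /old-docs/*   /docs/*
        Pass  /docs/*       /Public/Web/docs/*
        Fail  /*

A request for /old-docs/intro.html is first mapped to /docs/intro.html, then passed as the file /Public/Web/docs/intro.html; anything not matching the Pass rule is refused by the final Fail.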


Redirecting Requests Elsewhere

When documents, or entire trees of documents, are moved from one server to another, you can use the Redirect rule to tell httpd to redirect the request to another server. If the client program is smart enough, the user won't even notice that the document is retrieved from a different server.
Redirect template result
Document matching template is redirected to result, which must be a full URL (i.e. containing http: and the host name).

Example

        Redirect  /hypertext/WWW/*  http://www.cern.ch/WebDocs/*
This redirects everything starting with /hypertext/WWW to the host www.cern.ch, into the virtual directory /WebDocs. For example, /hypertext/WWW/TheProject.html would be redirected to http://www.cern.ch/WebDocs/TheProject.html.


Setting Up User Authentication and Document Protection

Documents are protected by Protect and DefProt rules. Their syntax is the following:
DefProt template setup-file [uid.gid]
Any document matching the template is associated with protection setup-file. The documents are not yet taken to be protected, but they may become protected by an existing access control list file in the same directory as the requested file, or by later matching a Protect rule. If that Protect rule doesn't specify setup-file, the one from the latest DefProt rule is used.

Protect [template setup-file [uid.gid]]
Any document matching template is protected. The type of protection is defined in finer detail in setup-file.

If setup-file is not specified, the one from the previously matched DefProt rule is used. If none has matched, access to the file is forbidden.

setup-file is always a full pathname for the protection setup file, which specifies the actual protection parameters.

The setup file can be omitted from a Protect rule, but it is obligatory in a DefProt rule. If the setup file is omitted, the uid.gid part cannot be given either.

uid.gid are the Unix user id and group id (either by name or by number, separated by a dot) to which the server should change when serving the request. These are only meaningful when the server is running as root. If they are missing they default to nobody.nogroup.

Note: Uid and gid are inherited from a DefProt rule by a Protect rule only when the setup-file is also inherited. If a setup-file is specified for the Protect rule but uid.gid is not, they default to nobody.nogroup regardless of the previous DefProt rule.

This is to avoid accidentally running the server under the wrong user id with the wrong setup file. This information should logically go into the protection setup file, but for safety reasons it cannot, because a non-trustworthy collaborator could specify it to be root. This way only the main webmaster can control user and group ids.
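
As a sketch (the pathnames, uid.gid and protected subtree here are purely hypothetical), the two rules are typically used together like this:

        DefProt  /private/*          /usr/etc/www/private.prot  www.www
        Protect  /private/secret/*   /usr/etc/www/secret.prot

where private.prot and secret.prot are protection setup files specifying the actual access restrictions.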


Executable Server Scripts

Document address is mapped into a script call by Exec rule:
        Exec template script
VERY IMPORTANT: In both template and script there must be a * wildcard that matches everything starting from the script filename. This enables httpd to tell which part is the script name and which part is the extra path information to be passed to the script.

Example

You want to map everything starting with /your/url/doit to execute the script /usr/etc/www/htbin/doit. You do this by saying:
        Exec  /your/url/*  /usr/etc/www/htbin/*
Here the asterisk matches the script name doit (and everything else that follows it). Usually people use some fixed keyword in front of the pathname in the URL to indicate that the document is actually a script call. Often this keyword is /htbin. That is, usually your Exec rule looks like this:
        Exec  /htbin/*  /usr/etc/www/htbin/*
and all the URLs pointing to the scripts start with /htbin, for example /htbin/doit in the previous example.


Historical Note (HTBin Rule)

CERN httpd versions 2.13 and 2.14 had a hard-coded handling of URL pathnames starting /htbin that mapped them to scripts in a directory specified via HTBin rule:
        HTBin /your/htbin/directory
This is still handled automatically by httpd, by translating it to its equivalent Exec form:
        Exec /htbin/*  /your/htbin/directory/*
Always use Exec instead -- it is more general.


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Suffix Definitions for CERN httpd

cern_httpd uses suffixes to discover the content-type, content-encoding and content-language of a file. Default values are so extensive that httpd knows the usual file types. The following configuration directives can be used to add new suffix bindings and override existing defaults:

Binding Suffixes to MIME Content-Types

As well as containing mapping lines, the rule file may be used to define the data types of files with particular suffixes. CERN httpd has an extensive set of predefined suffixes, so usually you don't need to specify any.

The syntax is:

        AddType .suffix representation encoding [quality]
The parameters are as follows:
suffix
The last part of the filename. There are two special cases: *.* matches all files which have not been matched by any explicit suffix but do contain a dot, and * by itself matches any file which does not match any other suffix.

representation
A MIME Content-Type style description of the representation actually in use in the file. See the HTTP spec. This need not be a real MIME type; it will only be used if it matches a type given by a client.

encoding
A MIME content transfer encoding type. Much more limited in variety than representations: basically whether the file is ASCII (7bit or 8bit) or binary. A few other encodings are allowed, for example for compression.

quality
Optional. A floating point number between 0.0 and 1.0 which determines the relative merits of files xxx.* which differ in their suffix only, when a link to xxx.multi is being resolved. Defaults to 1.0.

Examples

        AddType .html text/html              8bit     1.0
        AddType .text text/plain             7bit     0.9
        AddType .ps   application/postscript 8bit     1.0
        AddType *.*   application/binary     binary   0.1
        AddType *     text/plain             7bit

Historical Note (Suffix Directive)

AddType was previously called Suffix. The old name is still understood, but may be misleading since suffixes are also used to determine Content-Encoding and language. Always use AddType instead.


Binding Suffixes to MIME Content-Encodings

Suffixes are also used to determine the Content-Encoding of a file (.Z suffix for x-compressed, for example). Syntax is:
        AddEncoding .suffix  encoding

Example

        AddEncoding .Z  x-compress

Multilanguage Support

Multilanguage support is also built on using suffixes, here to determine the language of a document. A suffix is bound to a language by the AddLanguage rule (the .en suffix for English, for example). The syntax is:
        AddLanguage .suffix  language

Examples

        AddLanguage .en  en
        AddLanguage .uk  en_UK

Suffix Case Sensitivity

Suffix case sensitivity is off by default. You can make suffixes case sensitive with the SuffixCaseSense directive:
        SuffixCaseSense On

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Accessory Scripts

In addition to having a fully configurable CGI script interface to handle form requests, CERN httpd has a few special directives to handle certain tasks always via CGI scripts:

Keyword Search Facility

The server automatically calls a script to perform a search if the absolute pathname of the search script is supplied by a Search directive in the configuration file:
        Search /search/script/pathname
This script is called with the vital information in the following CGI environment variables:
PATH_INFO
contains the virtual URL of the file from which the query was issued.

PATH_TRANSLATED
contains the physical filename of the document corresponding to the virtual URL in PATH_INFO.

QUERY_STRING
contains the (URL-encoded) keywords, which are also available decoded as command line parameters, one in each of argv[1], argv[2], ...

The search script must conform to the CGI/1.1 rules, that is, it has to start its output with a MIME header followed by a blank line, after which comes the actual document. The MIME header must contain either a Location: field or a Content-Type: field, typically:
        Content-Type: text/html
if the document is an HTML document.
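
As an illustration, here is a minimal sketch of such a search script written as a Bourne shell script (purely hypothetical; a real search script would normally consult a pre-built index). It simply greps the keywords out of the document from which the query was issued:

        #!/bin/sh
        #
        # Minimal example search script (CGI/1.1 style output).
        # The decoded keywords arrive as command line arguments, and
        # PATH_TRANSLATED names the document the query was issued from.
        echo "Content-Type: text/html"
        echo ""
        echo "<H1>Lines matching: $*</H1>"
        echo "<PRE>"
        for keyword in "$@"
        do
            grep -i "$keyword" "$PATH_TRANSLATED"
        done
        echo "</PRE>"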


General POST Method Handler Script

POST requests are handled by calling the script defined by the POST-Script directive:
        POST-Script  /absolute/path/post-handler
POST handler script is called in the normal CGI manner, and its output must be CGI compliant.

Note: The POST handler is called only for POST requests that haven't already matched an Exec rule (which would cause the specified script to be called instead).


General PUT Method Handler Script

PUT requests are handled by calling the script defined by PUT-Script configuration directive:
        PUT-Script  /absolute/path/put-handler
PUT handler script is called in the normal CGI manner, and its output must be CGI compliant.

Note: By default the PUT method is disabled; you must explicitly enable it in the configuration file:

        Enable PUT
This is to enhance security.

IMPORTANT: Since PUT can be a very dangerous method (it allows files to be written back to the server), it is not possible to use PUT without the access authorization module being activated. This means that you have to have at least a DefProt rule specifying a default protection setup, which in turn defines the PutMask containing the list of users and hosts allowed to perform the PUT operation.
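
Put together, a configuration fragment allowing PUT might look roughly like this (all pathnames here are hypothetical, and the setup file put.prot must define the PutMask):

        Enable      PUT
        PUT-Script  /usr/etc/www/cgi-bin/put-handler
        DefProt     /*   /usr/etc/www/put.prot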


General DELETE Method Handler Script

DELETE requests are handled by calling the script defined by DELETE-Script configuration directive:
        DELETE-Script  /absolute/path/delete-handler
DELETE handler script is called in the normal CGI manner, and its output must be CGI compliant.

Note: By default the DELETE method is disabled; you must explicitly enable it in the configuration file:

        Enable DELETE
This is to enhance security.

IMPORTANT: Since DELETE can be a very dangerous method (it allows files to be deleted from the server), it is not possible to use DELETE without the access authorization module being activated. This means that you have to have at least a DefProt rule specifying a default protection setup, which in turn defines the DeleteMask containing the list of users and hosts allowed to perform the DELETE operation.


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Directory Browsing

By default, references to directories which don't contain a welcome page cause httpd to generate a hypertext view of the directory. There are numerous configuration directives controlling this feature:

Controlling Directory Browsing

DirAccess on
Enable directory browsing in all directories (which are not forbidden by rules). Synonym with -dy command line option. Default.

DirAccess off
Disable directory browsing. Synonym with -dn command line option.

DirAccess selective
Enable selective directory browsing - only directories containing the file .www_browsable are allowed. Synonym with -ds command line option.


README Feature

DirReadme top
For any browsable directory containing a README file, include the text at the top of the directory listing. Synonym with -dt command line option. Default.

DirReadme bottom
Same as previous, but the contents of README appear at the bottom. Synonym with -db command line option.

DirReadme off
Disables the README inclusion feature. Synonym with -dr command line option.


Controlling The Look of Directory Listings

The following On/Off directives control what the directory listings look like. The default is to show icons, use brackets around the ALTernative text, show the last-modified date, size and description, allow the filename field width to vary between 15 and 25 characters, and reserve 25 characters for the description.

DirShowIcons
Generate inlined image calls in front of each line. Icons visualize the content-type of the file, and they are defined by AddIcon configuration directive. Default.

DirShowDate
Show last modification date. Default.

DirShowSize
Show the size of files. Default.

DirShowBytes
By default files smaller than 1K are shown as just 1K. Setting this directive to On will cause the exact byte count to appear.

DirShowDescription
Show description if available. Default.

At the time of the 2.17 release there was no consensus about where the descriptions come from, and the mechanism is currently undocumented. For HTML files the description is the TITLE element; for other files the description field is left empty.

DirShowMaxDescrLength
The maximum number of characters to show in the description field.

DirShowBrackets
Use brackets around ALTernative text used by browsers not capable of displaying images. Default.

DirShowHidden
Show hidden Unix files (the ones starting with a dot).

DirShowOwner
Show the owner of the file.

DirShowGroup
Show the group of the file.

DirShowMode
Show the permissions of files.

DirShowCase
Sort entries in a case-sensitive manner, i.e. all capital letters before lower-case letters.


Filename Length

There is a minimum and maximum width for the filename field. Entries longer than the maximum value will be truncated. Default values are 15 and 25, and they can be changed with these directives:
DirShowMinLength num
At least this many characters are always reserved for filenames. If the longest filename in the directory is longer than num, the field will be extended, but no more than the maximum limit (see next directive).

DirShowMaxLength num
Filenames longer than num will be truncated to fit in length.

Example

The default values would be set by saying:
        DirShowMinLength  15
        DirShowMaxLength  25

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Icons In The Directory Listings

cern_httpd directory icons are used, if enabled, both for regular directory listings and for FTP listings (when running as a proxy).

These directives are specified in the configuration file.


AddIcon Directive

The AddIcon directive binds an icon to a MIME Content-Type or Content-Encoding:
        AddIcon  icon-url ALT-text template
icon-url
is the URL of the icon.

ALT-text
is the alternative text to use on character terminal browsers.

template
is either a Content-Type template or a Content-Encoding template. A Content-Type template always contains a slash, whereas a Content-Encoding template never does.

The following important remarks serve also as examples.

VERY IMPORTANT: CERN httpd as a Normal HTTP Server

Understand that the icon-url is a virtual URL - one that will be translated through the rules. Therefore you must make sure that your configuration rules allow the icon URLs to be passed, e.g.:
    AddIcon  /icons/UNKNOWN.gif  ???  */*
    AddIcon  /icons/TEXT.gif     TXT  text/*
    AddIcon  /icons/IMAGE.gif    IMG  image/*
    AddIcon  /icons/SOUND.gif    AU   audio/*
    AddIcon  /icons/MOVIE.gif    MOV  video/*
    AddIcon  /icons/PS.gif       PS   application/postscript
    Pass /icons/*  /absolute/icon/dir/*
    ...other rules...

VERY IMPORTANT: CERN httpd as a Proxy

When using httpd as a proxy the icon URL must be an absolute URL pointing to your server; otherwise clients would translate it relative to the remote host.

Furthermore, you must have a mapping from this absolute URL to your local file system, e.g.:

    AddIcon  http://your.server/icons/UNKNOWN.gif  ???  */*
    AddIcon  http://your.server/icons/TEXT.gif     TXT  text/*
    AddIcon  http://your.server/icons/IMAGE.gif    IMG  image/*
    AddIcon  http://your.server/icons/SOUND.gif    AU   audio/*
    AddIcon  http://your.server/icons/MOVIE.gif    MOV  video/*
    AddIcon  http://your.server/icons/PS.gif       PS   application/postscript

    Pass http://your.server/icons/*  /absolute/icon/dir/*
    Pass /icons/*                    /absolute/icon/dir/*
    Pass http:*
    Pass ftp:*
    Pass gopher:*
NOTE: Both the full and partial icon URLs are Pass'ed because smart clients may be configured to connect to local servers directly, instead of through the proxy; in that case the proxy server (which is then just a normal HTTP server from the client's point of view) will be asked for /icons/... instead of http://your.server/icons/.... The proxy server has no way of knowing in advance which will happen.


Icons in Gopher Listings

There are special internal (to httpd) MIME content types that can be bound to icons for gopher listings (the names should be self-explanatory):

Special Icons

httpd needs some special icons:
AddBlankIcon
Icon URL used in the heading of the listing to align it. This is typically a blank icon, but may contain some nice image that you wish to have on top of all your listings. The only criterion is that it must be the same size as the other icons.

AddUnknownIcon
Icon URL used for unknown file types, i.e. files for which no other icon binding applies. If you have an exhaustive set of AddIcon directives this need not be used.

AddDirIcon
Icon URL for directories.

AddParentIcon
Icon URL for parent directory.

Example For a Regular HTTP Server

IMPORTANT: Remember to Pass the icon URLs!

        AddBlankIcon    /icons/BLANK.gif
        AddUnknownIcon  /icons/UNKNOWN.gif  ???
        AddDirIcon      /icons/DIR.gif      DIR
        AddParentIcon   /icons/PARENT.gif   UP

	Pass  /icons/*  /absolute/icon/dir/*
        ...other rules...

Example For a Proxy Server

IMPORTANT: Icon URLs must be absolute URLs, and you must have a mapping from the absolute form to local form, and remember to Pass them:
        AddBlankIcon    http://your.server/icons/BLANK.gif
        AddUnknownIcon  http://your.server/icons/UNKNOWN.gif  ???
        AddDirIcon      http://your.server/icons/DIR.gif      DIR
        AddParentIcon   http://your.server/icons/PARENT.gif   UP

        Pass http://your.server/icons/*  /absolute/icon/dir/*
        Pass /icons/*                    /absolute/icon/dir/*
        Pass  http:*
        Pass  ftp:*
        Pass  gopher:*

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Logging Control In CERN httpd

cern_httpd logs all the incoming requests to an access log file. It also has an error log where internal server errors are logged.

Access Log File

The access log file contains a log of all the requests. The name of the log file is specified either by the -l logfile command line option, or with the AccessLog directive:
        AccessLog /absolute/path/logfile

Error Log File

The error log contains a log of errors that might prove useful when figuring out why something doesn't work. The error log file name is set by the ErrorLog directive:
        ErrorLog /absolute/path/errorlog
If the error log file is not specified, it defaults to the access log file name with a .error extension. If the access log file name already has an extension, .error will replace it.


Log File Format

Previously every server had its own log file format, which made it difficult to write general statistics collectors. Therefore there is now a common log file format (which will eventually become the default). Currently it is enabled by
        LogFormat  Common
The old CERN httpd format can be used by
        LogFormat  Old
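For reference, a line in the common format looks roughly like this (the host name, document, status, size and timestamp are purely illustrative):

        client.host.domain - - [10/May/1994:12:34:56] "GET /Welcome.html HTTP/1.0" 200 1579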

Log Time Format

Times in the log file are local time by default. This can be changed to GMT with the LogTime directive:
        LogTime  GMT
Default is:
        LogTime  LocalTime

Suppressing Log Entries For Certain Hosts/Domains

It's not always necessary to collect log information about accesses made by local hosts. The NoLog directive can be used to prevent a log entry from being made for hosts matching a given IP number or host name template:
        NoLog  template

Examples

        NoLog 128.141.*.*
        NoLog *.cern.ch
        NoLog *.ch  *.fr  *.it

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Timeout Settings

Something may go wrong with the connection to the client, causing httpd to hang indefinitely doing nothing. This can be avoided by setting timeouts on the different tasks that the server performs. All of these timeouts have reasonable default values and usually don't need to be changed.

All the times for these directives are of the form:

        45 secs
        10 mins
        2 mins 30 secs
        1 hour

InputTimeOut

The InputTimeOut directive specifies the time to wait for the client to send the request (the MIME-header part of it, not the message body). The default value is:
        InputTimeOut  2 mins

OutputTimeOut

The OutputTimeOut directive specifies the time to allow for sending the response. The default value is:
        OutputTimeOut  20 mins
If you are serving huge files to clients behind slow connections, you may want to increase this value if you hear of connections being cut in the middle of a transfer.


ScriptTimeOut

The ScriptTimeOut directive specifies the time to allow server scripts to finish. If a script doesn't return in the time specified, httpd will send it the TERM and then the KILL signal (with 5 seconds in between to let the script do cleanup upon exit). The default value is:
        ScriptTimeOut  5 mins

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Proxy Caching

When cern_httpd is run as a proxy it can cache the documents retrieved from remote hosts to make further requests faster.


Turning Caching On and Off

Caching is normally turned on implicitly by specifying the cache root directory, but it can be explicitly turned on and off with the Caching directive:
        Caching On

Setting Cache Directory

Caching is enabled on a server running as a gateway (proxy) by the CacheRoot directive, which sets the absolute path of the cache directory:
        CacheRoot /absolute/cache/directory
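
As a sketch, a minimal caching proxy configuration might contain roughly the following (the cache directory and size are just illustrative; the Pass rules that let proxied requests through are the same ones shown in the icon examples elsewhere in this guide):

        Pass       http:*
        Pass       ftp:*
        Pass       gopher:*
        CacheRoot  /var/spool/httpd-cache
        CacheSize  50 M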

Cache Size

The CacheSize directive sets the maximum cache size in megabytes. The default value is 5 MB, but it's preferable to have several tens of megabytes of cache, like 50-100 MB, to get the best results. The cache may, however, temporarily grow a few megabytes bigger than specified.

Example

        CacheSize 20 M
sets cache size to 20 megabytes.


NoCaching

URLs matching a template given by the NoCaching directive will never be cached, e.g.:
        NoCaching  http://really.useless.site/*
From version 3.0 on templates can have any number of wildcard characters *.


CacheOnly

Only the URLs matching templates given by CacheOnly directives will be cached, e.g.:
        CacheOnly  http://really.important.site/*
From version 3.0 on templates can have any number of wildcard characters *.


Maximum Time to Keep Cache Files

All cached documents matching a specified template that are older than the time specified by the CacheClean directive will be removed. This value overrides the expiry date in that no file is stored longer than this value specifies, regardless of its expiry date.

Examples

        CacheClean http:*     1 month
        CacheClean ftp:*     14 days
        CacheClean gopher:*   5 days 12 hours

Maximum Time to Keep Unused Files

Cache files matching a template that have been unused for longer than the time specified by the CacheUnused directive will be removed.

Examples

        CacheUnused *                      4 days 12 hours
        CacheUnused http://www.w3.org/*  7 days
        CacheUnused ftp://some.server/*   14 days
Note that the last matching specification has precedence; therefore HTTP files from www.w3.org will be kept 7 days, and not 4 days 12 hours.


Default Expiry Time

Files for which the remote server gave neither an Expires: nor a Last-Modified: header will be kept at most the time specified by the CacheDefaultExpiry directive. The default values are zero for HTTP (script replies shouldn't be cached), and 1 day for FTP and Gopher.

Example

        CacheDefaultExpiry ftp:*     1 month
        CacheDefaultExpiry gopher:*  10 days
WARNING: A non-zero default expiry for HTTP will almost always cause problems, because there are currently many scripts that don't give an expiry date, yet their output expires immediately. Therefore, it is better to keep the default value for http: at zero.


CacheLastModifiedFactor

Currently HTTP servers usually give only the Last-Modified time, not an Expires time. Last-Modified can often be used successfully to approximate an expiry date. CacheLastModifiedFactor gives the fraction of the time since the last modification that the document will be considered up to date.

The default value is 0.1, which means that e.g. a file modified 20 days ago will expire in 2 days.

Examples

        CacheLastModifiedFactor  0.2
would cause files modified 5 months ago to expire after one month.

This feature can be turned off by specifying:

        CacheLastModifiedFactor  Off

CacheTimeMargin

Sometimes inaccurate clocks on other hosts cause confusion in caching. It also often makes sense not to cache documents that will expire in a couple of minutes anyway. CacheTimeMargin defines this time margin; the default is:
        CacheTimeMargin  2 mins
No document expiring in less than two minutes will be written to disk.


CacheNoConnect

This directive puts the proxy into standalone cache mode, i.e. only documents found in the cache are returned, and ones not in the cache produce an error rather than a connection to the outside world. This is useful for demo purposes and in other situations without a network connection:
        CacheNoConnect On
Default setting is naturally Off.

This directive is typically used with expiry checking also turned Off.


CacheExpiryCheck

If it's desired (for demo reasons, etc.) that the proxy always returns documents from the cache, even if they have expired, CacheExpiryCheck can be turned off:
        CacheExpiryCheck  Off
Default setting is On, meaning that proxy never returns an expired document.

This is usually used in standalone cache mode (with the CacheNoConnect directive turned On).


Garbage Collection

When caching is enabled garbage collection is also activated by default. This can be explicitly turned off with Gc directive:
        Gc  Off

When to Do Garbage Collection

Garbage collection is launched right away when the cache size limit is reached. However, to keep the cache smaller it might be desirable to remove expired files even if there is still cache space remaining. It is possible to launch garbage collection at a certain time, usually outside the busy hours:
        GcDailyGc      time

GcDailyGc specifies the time to do daily garbage collection, normally during the night. Default value is 3:00. Daily garbage collection can be disabled by specifying Off.

Example

Default value would be specified as:
        GcDailyGc       3:00
Another example: turning daily gc off:
        GcDailyGc       Off

Memory Usage of Garbage Collector

The garbage collector performs its job best if it can read information about the whole cache into memory at once. This is not possible if the machine doesn't have enough main memory.

The GcMemUsage directive advises the garbage collector about how much memory to use. You may think of this as the number of kilobytes to use for gc data, but actual usage may vary greatly according to dynamic factors, such as the directory structure of the cached files.

The default is 500; if gc fails because it runs out of memory, make this smaller. If your machine has so much memory that it just can't run out, make this very big.

Example

        GcMemUsage 100
if you have very little memory.


Cache File Sizes

There are two limits controlling the size factor of a file when its value is being calculated. CacheLimit_1 sets the lower limit; below this all files have an equal size factor. CacheLimit_2 sets the upper limit; files bigger than this get an extremely bad size factor (meaning they get removed right away because they are too big).

Sizes are specified in kilobytes, and the default values are 200 K and 4 MB, respectively.

Examples

        CacheLimit_1 200 K
        CacheLimit_2 4000 K
would set the same values as the defaults, 200K and 4MB.


Cache Lock Timeout

During retrieval cache files are locked. If something goes wrong, a lock file may be left hanging. The CacheLockTimeOut directive sets the amount of time after which a lock can be broken. The time is specified like all the other times in the configuration file, and the default value is 20 minutes, the same as the default OutputTimeOut. CacheLockTimeOut should never be less than OutputTimeOut!

Example

        CacheLockTimeOut  30 mins
would set lock timeout to half an hour.


CacheAccessLog

Cache accesses can be logged to a different log file instead of the normal access log. The CacheAccessLog directive takes an absolute pathname of the cache access log file:
        CacheAccessLog  /absolute/path/file.log

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Configuring Proxy To Connect To Another Proxy

If there is a need to make an (inner) proxy cern_httpd connect to the outside world via another (outer) proxy server, you can use the same environment variables that are used to redirect clients to a proxy to make the inner proxy use the outer one. For example, your (inner) proxy server's startup script could look like this:
        #!/bin/sh
        http_proxy=http://outer.proxy.server:8082/
        export http_proxy
        /usr/etc/httpd -r /etc/inner-proxy.conf -p 8081
This is a little ugly, so there are also the following directives in the configuration file:

no_proxy

In the same way that clients can specify a set of domains for which the proxy should not be consulted, httpd has a no_proxy configuration directive to tell it that it should not connect to another proxy for certain URLs:
        no_proxy  cern.ch,ncsa.uiuc.edu,some.host:8080
WARNING: The argument string is a comma-separated list and should not contain spaces!
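If your httpd release also accepts the corresponding xxx_proxy directives in the configuration file (check the directive list of your version; this is an assumption, not a confirmed feature of every release), the startup-script example above could be expressed directly in the inner proxy's configuration file, roughly as:
        http_proxy    http://outer.proxy.server:8082/
        ftp_proxy     http://outer.proxy.server:8082/
        gopher_proxy  http://outer.proxy.server:8082/
        no_proxy      cern.ch,ncsa.uiuc.edu,some.host:8080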


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Configuration File Examples

httpd.conf
sample configuration file for running as a normal HTTP server.
prot.conf
sample configuration file for running as a normal HTTP server with access control.
proxy.conf
sample configuration file for running as a proxy without caching.
caching.conf
sample configuration file for running as a proxy with caching.


httpd@info.cern.ch

Normal HTTP Server Configuration

#
#	Sample configuration file for cern_httpd for running it
#	as a normal HTTP server.
#
# See:
#	
#
# for more information.
#
# Written by:
#	Ari Luotonen  April 1994  
#

#
#	Set this to point to the directory where you unpacked this
#	distribution, or wherever you want httpd to have its "home"
#
ServerRoot	/where/ever/server_root

#
#	The default port for HTTP is 80; if you are not root you have
#	to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port	80

#
#	General setup; on some systems, like HP, nobody is defined so
#	that setuid() fails; in those cases use a different user id.
#
UserId	nobody
GroupId	nogroup

#
#	Logging; if you want logging uncomment these lines and specify
#	locations for your access and error logs
#
# AccessLog	/where/ever/httpd-log
# ErrorLog	/where/ever/httpd-errors
LogFormat	Common
LogTime		LocalTime

#
#	User-supported directories under ~/public_html
#
UserDir	public_html

#
#	Scripts; URLs starting with /cgi-bin/ will be understood as
#	script calls in the directory /your/script/directory
#
Exec	/cgi-bin/*	/your/script/directory/*

#
#	URL translation rules; If your documents are under /local/Web
#	then this single rule does the job:
#
Pass	/*	/local/Web/*

Normal HTTP Server With Access Control

#
#	Sample configuration file for cern_httpd for running it
#	as a normal HTTP server WITH access control.
#
# See:
#	
#
# for more information.
#
# Written by:
#	Ari Luotonen  April 1994  
#

#
#	Set this to point to the directory where you unpacked this
#	distribution, or wherever you want httpd to have its "home"
#
ServerRoot	/where/ever/server_root

#
#	The default port for HTTP is 80; if you are not root you have
#	to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port	80

#
#	General setup; on some systems, like HP, nobody is defined so
#	that setuid() fails; in those cases use a different user id.
#
UserId	nobody
GroupId	nogroup

#
#	Logging; if you want logging uncomment these lines and specify
#	locations for your access and error logs
#
# AccessLog	/where/ever/httpd-log
# ErrorLog	/where/ever/httpd-errors
LogFormat	Common
LogTime		LocalTime

#
#	User-supported directories under ~/public_html
#
UserDir	public_html

#
#	Protection setup by usernames; specify groups in the group
#	file [if you need groups]; create and maintain password file
#	with the htadm program
#
Protection PROT-SETUP-USERS {
	UserId		nobody
	GroupId		nogroup
	ServerId	YourServersFancyName
	AuthType	Basic
	PasswdFile	/where/ever/passwd
	GroupFile	/where/ever/group
	GET-Mask	user, user, group, group, user
}

#
#	Protection setup by hosts; you can use both domain name
#	templates and IP number templates
#
Protection PROT-SETUP-HOSTS {
	UserId		nobody
	GroupId		nogroup
	ServerId	YourServersFancyName
	AuthType	Basic
	PasswdFile	/where/ever/passwd
	GroupFile	/where/ever/group
	GET-Mask	@(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
}

Protect	/very/secret/URL/*  	PROT-SETUP-USERS
Protect	/another/secret/URL/*	PROT-SETUP-HOSTS

#
#	Scripts; URLs starting with /cgi-bin/ will be understood as
#	script calls in the directory /your/script/directory
#
Exec	/cgi-bin/*	/your/script/directory/*

#
#	URL translation rules; If your documents are under /local/Web
#	then this single rule does the job:
#
Pass	/*	/local/Web/*


Proxy Configuration With Caching

The configuration without caching is otherwise the same; just leave out all the directives starting with "Cache" or "Gc".
#
#	Sample configuration file for cern_httpd for running it
#	as a proxy server WITH caching.
#
# See:
#	
#
# for more information.
#
# Written by:
#	Ari Luotonen  April 1994  
#

#
#	Set this to point to the directory where you unpacked this
#	distribution, or wherever you want httpd to have its "home"
#
ServerRoot	/where/ever/server_root

#
#	Set the port for proxy to listen to
#
Port	8080

#
#	General setup; on some systems, like HP, nobody is defined so
#	that setuid() fails; in those cases use a different user id.
#
UserId	nobody
GroupId	nogroup

#
#	Logging; if you want logging uncomment these lines and specify
#	locations for your access and error logs
#
# AccessLog	/where/ever/proxy-log
# ErrorLog	/where/ever/proxy-errors
LogFormat	Common
LogTime		LocalTime

#
#	Proxy protections; if you want only certain domains to use
#	your proxy, uncomment these lines and specify the Mask
#	with hostname templates or IP number templates:
#
# Protection PROXY-PROT {
# 	ServerId	YourProxyName
# 	Mask		@(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
# }
# Protect  *  PROXY-PROT

#
#	Pass the URLs that this proxy is willing to forward.
#
Pass	http:*
Pass	ftp:*
Pass	gopher:*
Pass	wais:*

#
#	Enable caching, specify cache root directory, and cache size
#	in megabytes
#
Caching		On
CacheRoot	/your/cache/root/dir
CacheSize	5

#
#	Specify absolute maximum for caching time
#
CacheClean	*	2 months

#
#	Specify the maximum time to be unused
#
CacheUnused	http:*		2 weeks
CacheUnused	ftp:*		1 week
CacheUnused	gopher:*	1 week

#
#	Specify default expiry times for ftp and gopher;
#	NEVER specify it for HTTP, otherwise documents generated by
#	scripts get cached which is usually a bad thing.
#
CacheDefaultExpiry	ftp:*		10 days
CacheDefaultExpiry	gopher:*	2 days

#
#	Garbage collection controls; daily garbage collection at 3am;
#
Gc		On
GcDailyGc	3:00


Ari Luotonen, CERN, 1994

CERN Server CGI/1.1 Script Support

Server scripts are used to handle searches, clickable images and forms, and to produce synthesized documents on the fly. See calendar and finger gateway for examples.


In This Section...


Important Note!

CERN httpd versions 2.15 and newer have two script interfaces. One is the official CGI, the Common Gateway Interface, which enables scripts to be shared between different server implementations (NCSA server, Plexus, etc). The other is the original, very easy-to-use interface that was introduced in version 2.13.

Use of CGI instead of the old interface is strongly encouraged.

IMPORTANT: If you have, or wish to write, scripts that use the old interface, the script name has to end in the .pp suffix (from "Pre-Parsed"). URLs referring to these scripts should not contain this suffix. This makes it easier to upgrade to CGI scripts later: you only need to change the script name in the file system, not the documents pointing to it. If you absolutely want to use the old interface (which is handy for quick hacks that don't need to be portable), see the doc.


Setting Up httpd To Call Scripts

The server knows that a request is actually a script request by looking at the beginning of the URL pathname. You can specify these special strings in the configuration file (/etc/httpd.conf) by Exec rules:
        Exec /url-prefix/*  /physical-path/*
Where /url-prefix/ is the special string that signifies a script request, and /physical-path/ is the absolute filesystem pathname of the directory that contains your scripts.

Example

        Exec  /htbin/*  /usr/etc/cgi-bin/*
causes URL paths starting with /htbin to be mapped to scripts in the directory /usr/etc/cgi-bin. That is, requesting
        /htbin/myscript
causes a call to script
        /usr/etc/cgi-bin/myscript

Historical Note

In httpd versions before 2.15 there was an HTBin directive:
        HTBin  /physical-path
which is now obsolete, but still understood by the server to mean
        Exec  /htbin/*  /physical-path/*
Use of Exec rule instead is recommended for its generality.


Information Passed to CGI Scripts

CGI scripts get their input mainly from environment variables and from standard input (when the POST method is used). Search scripts also get the keywords as command line arguments.

Most important environment variables are:

QUERY_STRING
The query part of the URL, that is, everything that follows the question mark. This string is URL-encoded, meaning that special characters like spaces and newlines are encoded in hex notation (%xx), and characters like + = & have a special meaning. The contents of this variable can easily be parsed using the cgiparse program.

PATH_INFO
Extra path information given after the script name, for example with Exec rule:
        Exec  /htbin/*  /usr/etc/cgi-bin/*
a URL with path
        /htbin/myscript/extra/pathinfo
will execute the script /usr/etc/cgi-bin/myscript with the PATH_INFO environment variable set to /extra/pathinfo.

PATH_TRANSLATED
Extra pathinfo translated through the rule system. (This doesn't always make sense.)
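To see these variables in practice you might install a small test script in your script directory; the following sketch (the name showvars is just an example) simply echoes the values back as a plain text document:
        #!/bin/sh
        #  showvars -- minimal CGI test script: echo the most
        #  interesting CGI environment variables back to the client.
        echo "Content-Type: text/plain"
        echo ""
        echo "QUERY_STRING:    $QUERY_STRING"
        echo "PATH_INFO:       $PATH_INFO"
        echo "PATH_TRANSLATED: $PATH_TRANSLATED"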

See also NCSA's primer to writing CGI scripts.


Results From Scripts

Scripts return their results either by outputting a document to their standard output, or by outputting the location of the result document (either a full URL or a local virtual path).

Outputting a Document

Script result must begin with a Content-Type: line giving the document content type, followed by an empty line. The actual document follows the empty line. Example:
        Content-Type: text/html

        <HEAD>
        <TITLE>Script test</TITLE>
        </HEAD>
        <BODY>
        <H1>My First Virtual Document</H1>
        ....
        </BODY>

Giving Document Location

If the script wants to return an existing document (local or remote), it can give a Location: header followed by an empty line. Example:
        Location: http://www.w3.org/hypertext/WWW/TheProject.html

This causes the server to send a redirection to the client, which then retrieves that document. If the Location starts with a slash (i.e. is not a full URL), it is taken to be a virtual path for a document on the same machine; the server passes this string right away through the rule system and serves that document as if it had been requested in the first place. In this case the client doesn't do the redirection, but the server does it "on the fly".

Example:

        Location: /hypertext/WWW/TheProject.html
Note that this is a virtual path, so after translation it might be, for example, /Public/Web/TheProject.html.

Important: Only a full URL in the Location field can contain the #label part of a URL, because that part is meant only for the client side, and the server cannot handle it in any way.
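As an illustration, a complete script that does nothing but redirect the client can be as small as the following sketch (the target URL is the one from the example above):
        #!/bin/sh
        #  Redirect the client to an existing document.
        echo "Location: http://www.w3.org/hypertext/WWW/TheProject.html"
        echo ""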


NPH-Scripts (No-Parse-Headers)

A script wishing to output the entire HTTP reply (including the status line and all response headers) should have a name beginning with the nph- prefix. This makes httpd connect the script's output stream directly to the requesting client, avoiding the overhead of the server needlessly parsing the response headers.

Example Of NPH-Script Output

        HTTP/1.0 200 Script results follow
        Server: MyScript/1.0 via CERN/3.0
        Content-Type: text/html

        <HEAD>
        <TITLE>Just testing...</TITLE>
        </HEAD>
        <BODY>
        <H1>Output From NPH-Script</H1>
        Yep, seems to work.
        </BODY>
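A Bourne shell script producing output of that shape could look like the following sketch (remember that the script's file name in the file system must begin with nph-):
        #!/bin/sh
        #  nph- script: output the complete HTTP reply ourselves;
        #  httpd passes this through to the client untouched.
        echo "HTTP/1.0 200 Script results follow"
        echo "Server: MyScript/1.0 via CERN/3.0"
        echo "Content-Type: text/html"
        echo ""
        echo "<HEAD><TITLE>Just testing...</TITLE></HEAD>"
        echo "<BODY><H1>Output From NPH-Script</H1>"
        echo "Yep, seems to work."
        echo "</BODY>"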

Setting Up A Search Script

There is a special Search directive in the configuration file giving the absolute pathname of the script performing the search:
        Search /absolute/path/search
Every time a document is searched, this script is called with
Command line
containing the search keywords decoded, one in each of argv[1], argv[2], ...
QUERY_STRING
containing the query string encoded, as it came in the URL after the question mark.
PATH_INFO
Virtual path of the document that the search was issued from.
PATH_TRANSLATED
Absolute filesystem path of the document.
Search results are output in the usual way:
        Content-Type: text/html

        ...generated document...
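As an illustration, a trivial search script might just echo the decoded keywords back; the following sketch does no real index lookup:
        #!/bin/sh
        #  Trivial search script: httpd passes the decoded keywords
        #  as command line arguments.
        echo "Content-Type: text/html"
        echo ""
        echo "<HEAD><TITLE>Search results</TITLE></HEAD>"
        echo "<BODY><H1>You searched for</H1>"
        echo "<UL>"
        for keyword in "$@"
        do
            echo "<LI>$keyword"
        done
        echo "</UL>"
        echo "</BODY>"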

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

cgiparse Manual

cgiparse handles QUERY_STRING environment variable parsing for CGI scripts. It comes with CERN server distributions 2.15 and newer.

If the QUERY_STRING environment variable is not set, it reads CONTENT_LENGTH characters from its standard input.


Command Line Options

Main Options

cgiparse -keywords
Parse QUERY_STRING as search keywords. Keywords are decoded and written to standard output, one per line.

cgiparse -form
Parse QUERY_STRING as a form request. Outputs a string which, when eval'ed by the Bourne shell, sets shell variables named FORM_ followed by the field name; the value of each variable is the corresponding field value.

cgiparse -value fieldname
Parse QUERY_STRING as form request. Prints only the value of field fieldname.

cgiparse -read
Just read CONTENT_LENGTH characters from stdin and write them to stdout.

cgiparse -init
If QUERY_STRING is not defined, read stdin and output a string which, when eval'd by the Bourne shell, sets QUERY_STRING to its correct value. This can be used when the same script is used with both the GET and POST methods. Typical use at the beginning of a Bourne shell script:
        eval `cgiparse -init`
After this command the QUERY_STRING environment variable will be set regardless of whether the GET or POST method was used. cgiparse may then be called multiple times in the same script (otherwise with POST it could be called only once, because after that stdin would already have been read and the next cgiparse would hang).
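Putting this together, a form-handling script that works with both GET and POST might begin like the following sketch (name1 is just an assumed field name, as in the examples later in this section):
        #!/bin/sh
        #  Works with both GET and POST: make sure QUERY_STRING is set.
        eval `cgiparse -init`

        #  Turn the form fields into FORM_* shell variables.
        eval `cgiparse -form`

        echo "Content-Type: text/plain"
        echo ""
        echo "Field name1 had value: $FORM_name1"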


Modifier Options

-sep separator
Specify the string used to separate multiple values. With

-prefix prefix
-count
With

-number , e.g. -2
With

-quiet
Suppress all error messages. (Non-zero exit status still indicates error.)

All options have one-character equivalents: -k -f -v -r -i -s -p -c -q


Exit Statuses


Examples

Note: In real life, of course, QUERY_STRING is already set by the server.

Here $ is the Bourne shell prompt.


Keyword Search

    $ QUERY_STRING="is+2%2B2+really+four%3F"
    $ export QUERY_STRING
    $ cgiparse -keywords
    is
    2+2
    really
    four?
    $

Parsing All Form Fields

    $ QUERY_STRING="name1=value1&name2=Second+value%3F+That%27s right%21"
    $ export QUERY_STRING
    $ cgiparse -form

    FORM_name1='value1'; FORM_name2='Second value? That'\''s right!'

    $ eval `cgiparse -form`
    $ set
    ...
    FORM_name1=value1
    FORM_name2=Second value? That's right!
    ...
    $

Extracting Only One Field Value

    QUERY_STRING as in previous example.
    $ cgiparse -value name1
    value1
    $ cgiparse -value name2
    Second value? That's right!
    $

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

cgiutils Manual

The cgiutils program is provided to make it easier for NPH [No-Parse-Headers] scripts to produce a full HTTP1 response header. It can also be used just to calculate the Expires: header, given the time to live in a human-friendly form, like
        1 year 3 months 2 weeks 4 days 12 hours 30 mins 15 secs

Command Line Options

cgiutils -version
print the version information.

-nodate
don't produce the Date: header.

-noel
don't print the empty line after headers [in case you want to output other MIME headers yourself after the initial header lines].

-status nnn
give full HTTP1 response, instead of just a set of HTTP headers, with HTTP status code nnn.

-reason explanation
specify the reason line for the HTTP1 response [can only be used with the -status nnn option].

-ct type/subtype
specify the MIME content-type.

-ce encoding
specify the content-encoding [e.g. x-compress, x-gzip].

-dl language-code
specify the content-language code.

-length nnn
specify the MIME content-length value.

-expires time-spec
specify the time to live, like "2 days 12 hours", and cgiutils will compute the Expires: field value [which is the actual expiry date and time in GMT and in format specified by HTTP spec].

-expires now
means immediate expiry. Often this is exactly what the scripts should output.

-uri URI
specify the URI for the returned document.

-extra xxx: yyy
specify an extra header which cannot otherwise be specified for cgiutils.

IMPORTANT: Make sure that you quote the option arguments that are more than one word:
        cgiutils -expires "2 days 12 hours 30 mins"

Examples

        cgiutils -status 200 -reason "Virtual doc follows" -expires now
  ==>
        HTTP/1.0 200 Virtual doc follows
        MIME-Version: 1.0
        Server: CERN/2.17beta
        Date: Tuesday, 05-Apr-94 03:43:46 GMT
        Expires: Tuesday, 05-Apr-94 03:43:46 GMT

Note: There is an empty line after the output to mark the end of the MIME header section; if you don't want this [you want to output some more headers yourself], specify the -noel (NO-Empty-Line) option.

Note also that cgiutils automatically gives the Server: header because it is available in the CGI environment. The Date: field is also generated automatically unless the -nodate option is specified.

To get only the Expires: field, don't specify the -status option. If you don't want the empty line after the header line, also use the -noel option:

        cgiutils -noel -expires "2 days"
  ==>
        Expires: Thursday, 07-Apr-94 03:44:02 GMT

httpd@info.cern.ch
Ari Luotonen, CERN, 1994

CERN Server Clickable Image Support

CERN Server versions 2.14 and newer have an htimage program in the distribution, which is an /htbin program handling clicks on sensitive images. In versions 2.15 and newer it is a CGI program (it uses the Common Gateway Interface to communicate with httpd). See demo.


In This Section...


Installing htimage Binary

After compiling htimage you should move the executable binary to the same directory where your other server scripts are, and remember to set up an Exec rule. For example, if your scripts are in /usr/etc/cgi-bin, you could have an Exec rule like this:
        Exec  /htbin/*  /usr/etc/cgi-bin/*
htimage is often one of the most frequently used scripts, so it is nice to refer to it with as short a name as possible, like /img; to do this you can put a Map rule just before the Exec:
        Map   /img/*    /htbin/htimage/*
        Exec  /htbin/*  /usr/etc/cgi-bin/*

Writing a Document With Clickable Images

Each clickable image in your HTML document has to be described to htimage via an image configuration file. These files are referred to by the extra path information in the URL causing the call to htimage:
        <A HREF="/htbin/htimage/image/config/file">
        <IMG SRC="Image.gif" ISMAP></A>
The image configuration file can be given either as a virtual path or as an absolute filesystem path: htimage will look for both of these (after all, it gets both the PATH_INFO and PATH_TRANSLATED environment variables from httpd anyway).

You can even do some very smart mappings in the rule file to allow very short references to htimage and picture configuration files. Let's suppose all your image configuration files are in directory /usr/etc/images. Then you can use the following two rules in your server's configuration file (by default /etc/httpd.conf):

        Map   /img/*    /htbin/htimage/usr/etc/images/*
        Exec  /htbin/*  /usr/etc/cgi-bin/*
In this case you can refer to your image mapper very easily; if you have an image configuration file Dragons.conf in /usr/etc/images directory, all you need to say in the anchor is this:
        <A HREF="/img/Dragons.conf">
        <IMG SRC="Image.gif" ISMAP></A>

Image Configuration File

There are four keywords:
default URL
The URL used if the click falls in none of the given shapes. This should always be set!

circle (x,y) r URL
Circle with center point (x,y) and radius r.

rectangle (x1,y1) (x2,y2) URL
Rectangle with (any) two opposite corners having coordinates (x1,y1) and (x2,y2).

polygon (x1,y1) (x2,y2) ... (xn,yn) URL
Polygon having adjacent vertices (xi,yi). If the given path is not closed (i.e. the first and last coordinate pairs aren't the same), htimage connects the first and last points, effectively adding the first point also as the last one.

These can be abbreviated as def, circ, rect, poly.

Shapes are checked in the order they appear in config file, and the URL corresponding to the first match is returned. If none match, the default URL is returned.

The URLs can be either full URLs or local virtual paths (see Output Produced by htimage below).
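For instance, the Dragons.conf file used in the example above might look like the following sketch (the coordinates and target documents are made up):

        default /dragons/Overview.html
        circle (100,100) 40 /dragons/Head.html
        rect (0,200) (150,300) /dragons/Tail.html
        poly (200,10) (250,60) (180,90) /dragons/Wing.html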


Output Produced by htimage

htimage prints a single Location: field to its stdout, or an error message preceded by Content-Type: text/html. In fact htimage behaves exactly like any other CGI/1.0 program (script), and is not handled specially by the server in any way. Therefore, you can rename htimage to whatever you prefer, as we referred to it by /img in the above example.

The server understands this Location: field, and either sends that file directly to the client (for a non-full URL), or sends a redirection to the client causing it to fetch the document, maybe even from another machine.

Note that URLs returned by htimage may well be other script requests - there is no reason for being limited to just regular documents.


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Protected CERN Server Setup

Access can be restricted according to user name, Internet address, or both. Access control can be tree-level, file-level, or both.


In This Section...


Password File

If per-user access control is used, there has to be a password file listing all the users and their encrypted passwords. The password file can be maintained with the htadm program, which is part of the CERN httpd distribution.

Note: Unix password files are understood by CERN daemon (but not vice versa). However, Unix users are in no way connected to the WWW access authorization.


Group File

The group file contains declarations of groups containing users and other groups, possibly with an IP address template. Group declarations viewed from the top level look like this:
        groupname: item, item, item
The list of items is called a group definition. Each item can be a username, an already-defined groupname, or a comma-separated list of user and group names in parentheses. Any of these can be followed by an at sign @ followed by either a single IP address template, or a comma-separated list of IP address templates in parentheses. The following are valid group declarations:
        authors: john, james
        trusted: authors, jim
        cern_people: @128.141.*.*
        hackers: marca@141.142.*.*, sanders@153.39.*.*,
                 (luotonen, timbl, hallam)@128.141.*.*,
                 cailliau@(128.141.201.162, 128.141.248.119)
        cern_hackers: hackers@128.141.*.*
If an item contains only the IP address template part, all users from those addresses are accepted (e.g. cern_people above). Note the last two declarations: the cern_hackers group is made up of the hackers group, restricted further according to IP address.

A group definition can be continued on the next line after any comma in the definition. Forward references in the group file are illegal (i.e. using a group name before it is defined).

Group definition syntax is valid not only in the group file, but also in the GetMask directive of protection setup files and in the last field of ACL file entries (see below).


Server Configuration File

Typically you protect a tree of documents with a Protect rule in the rule file, and specify the authorized persons and IP addresses in a protection setup file or an access control list file:
        Protect /very/secret/*  /WWW/httpd.setup
If Unix file system protections are set up so that there is no world read permission, the daemon naturally has to run as the owner, or as a member of the group, of those files.

However, if there are protected trees owned by different people this doesn't work. In that case the daemon has to run as root, and the user and group ids have to be specified in the protect rule, e.g.:

        Protect /kevin/secret/*	  /WWW/httpd.setup1  kevin.www
        Protect /marcus/secret/*  /WWW/httpd.setup2  marcus.nogroup

Protection Setup File

Each protect rule has an associated protection setup file. It specifies valid authentication schemes, password and group files, and password server-id:
        AuthType      Basic
        ServerId      OurCollaboration
        PasswordFile  /WWW/Admin/passwd
        GroupFile     /WWW/Admin/group
The password server-id need not be a real machine name. Its only purpose is to inform the browser which password file is in use (different protection setups on the same machine can use different password files, and that would otherwise confuse pseudo-intelligent clients trying to figure out which password to send).

Note: Same server-ids on different machines are considered different by clients (otherwise this would be a security hole).


Protecting Entire Tree As One Entity

If you want to control access only to entire trees of documents and don't care to restrict access differently to individual files, it suffices to give a GetMask in setup file (and you don't need any ACL files):
        GetMask    group, user, group@address, ...
Group definition has the same syntax as in group file.


Protecting Individual Files Differently

When each individual file needs to be protected separately you should use an ACL (access control list) file in the same directory as the protected files. After that no file in that directory can be accessed unless there is a specific entry in ACL allowing it.

In this case you don't need the GetMask in setup file.


Restricting Access Even Further

There may be both GetMask and an ACL, in which case both conditions must be met. This is typically used so that GetMask defines a general group of people allowed to access the tree, and ACLs restrict access even further.


Protection Setup Embedded in the Configuration File

Often it is not necessary to have the protection information in a different file; as a new feature cern_httpd allows protection setup to be "embedded" inside the configuration file itself.

Instead of writing the setup in a different file and referring to it by the filename, you can use the Protection directive to define the protection setup and bind it to a name, and later refer to this setup via that name.

The previous example could be written into the main configuration as follows:

    Protection  PROT-NAME  {
        UserId        marcus
        GroupId       nogroup
        AuthType      Basic
        ServerId      OurCollaboration
        PasswordFile  /WWW/Admin/passwd
        GroupFile     /WWW/Admin/group
        GetMask       group, user, group@address, ...
    }
    Protect  /private/URL/*      PROT-NAME
    Protect  /another/private/*  PROT-NAME
Note that since the protection setup is in the same file as the other configuration directives, it is also possible to specify the UserId and GroupId for the server to run as, without this being a security hole. With an external protection setup file this is not allowed for security reasons; that is why there is an extra field after the protection setup filename specifying the user and group ids in that case:
        Protect /kevin/secret/*	  /WWW/httpd.setup1  kevin.www
        Protect /marcus/secret/*  /WWW/httpd.setup2  marcus.nogroup
If you need a given protection setup only once, there is no need to first bind it to a name and then refer to it by that name; you can simply combine the two:
    Protect  /private/URL/*  {
        UserId        marcus
        GroupId       nogroup
        AuthType      Basic
        ServerId      OurCollaboration
        PasswordFile  /WWW/Admin/passwd
        GroupFile     /WWW/Admin/group
        GetMask       group, user, group@address, ...
    }
IMPORTANT: httpd is not very robust in parsing this particular directive; make sure you have a space between the URL template and the curly brace, and that the ending curly brace is alone on that line. Also, comments are not allowed inside the protection setup definition.


Access Control List File

An ACL file is a file named .www_acl, located in the same directory as the files whose access it controls. It typically looks something like this:
        secret*.html : GET,POST : trusted_people
        minutes*.html: GET,POST : secretaries
        *.html : GET : willy,kenny
It is worth noting that all the templates are matched against (unlike in the rule file, where rule translation stops at Pass and Fail rules). So in the previous example all the HTML files are accessible to willy and kenny, even those matching the two previous templates.

The last field is just a list of users and groups (possibly restricted to required IP addresses), and in fact this field uses the same syntax as the group file.

When the PUT method is implemented, it can appear in the middle field, separated from GET by a comma:

        *.html : GET,PUT : authors


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Manual Page For htadm

The CERN httpd password file can be maintained with the htadm program, which is part of the CERN httpd distribution.


Command Line Options and Parameters

htadm -adduser passwordfile [username [password [realname]]]
adds a user into the password file (fails if there is already a user by that name).

htadm -deluser passwordfile [username]
deletes a user from the password file (fails if there is no user by that name).

htadm -passwd passwordfile [username [password]]
changes user's password (fails if there is no such user).

htadm -check passwordfile [username [password]]
checks user's password (fails if there is no such user). Writes either Correct or Incorrect to standard output. Also indicates password correctness by a zero return value.

htadm -create passwordfile
creates an empty password file.

If the password or even the username is missing in any of the previous cases, it is prompted for interactively. The passwordfile must always be specified. A missing real name is also prompted for when adding a new user.
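For example, creating a new password file and adding a user might look like this (the file name and username are examples only; htadm prompts for anything left off the command line):

    $ htadm -create /WWW/Admin/passwd
    $ htadm -adduser /WWW/Admin/passwd luotonen
      (htadm prompts for the password and real name)
    $ htadm -check /WWW/Admin/passwd luotonen
      (htadm prompts for the password and prints Correct or Incorrect)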


WARNING: Do NOT use htadm to add new users to the actual Unix password file /etc/passwd; entries written by htadm are missing some fields that Unix requires.

Note: Passwords should not be longer than 8 characters (this is a restriction coming from linemode clients that use the C library function getpass() to read the password -- there is no other cause for this restriction; the hardcoded maximum password size is actually much larger, and if you use only GUI or other clients that are able to read longer passwords, feel free to use them).

Note: htadm erases the password from its command line as soon as possible, so it is very unlikely that anyone will see somebody's password by looking at the process listing on the machine (with ps, for example).


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

Proxies

A proxy is an HTTP server, typically running on a firewall machine, providing access to the outside world for people inside the firewall. cern_httpd can be configured to run as a proxy. Furthermore, it is able to perform caching of documents, resulting in faster response times.

I (Ari Luotonen, CERN) and Kevin Altis from Intel have written a joint paper about proxies which will be presented in the WWW94 Conference.


In This Section...


Setting Up cern_httpd To Run as a Proxy

cern_httpd runs as a proxy if its configuration file allows URLs starting with the corresponding access method to be passed. A typical proxy configuration file reads:
    pass http:*
    pass ftp:*
    pass gopher:*
    pass wais:*
Note that cern_httpd is capable of running as a regular HTTP server at the same time; just add your normal rules after those ones.

WARNING: The xxx_proxy environment variables (http_proxy, ftp_proxy, etc.) that are used to redirect clients to use a proxy also affect the proxy server itself. If this is not your intention, make sure that those variables are not set in httpd's environment.


Proxy Protection

cern_httpd 2.17 and newer provide a mechanism to protect the proxy against unauthorized use (in fact, the machinery behind this is the same that is used to set up document protection when running as a regular HTTP server).

Enabling and Disabling HTTP Methods

By default only HEAD, GET and POST methods are allowed to go through the proxy. You can enable more methods using the Enable directive in the configuration file:
    Enable PUT
    Enable DELETE
The Disable directive disables methods:
    Disable POST

Defining Allowed Hosts

A certain protection setup is defined to the proxy as a single entity that is given a name. Later, when protecting certain URLs this name is used to refer to the protection setup. (The name can also be the absolute pathname of the file that defines the protection, if one wishes to store protection information in a different file.)

Protection is defined as follows:

    Protection  protname  {
        Mask @(*.cern.ch, *.desy.de)
    }
This defines a protection that allows all request methods from domains cern.ch and desy.de, and none from elsewhere. This protection can be referred to by protname.

You can also use IP number templates:

    Protection  protname  {
        Mask  @(128.141.*.*, 131.169.*.*)
    }
Note that IP number templates always have four parts separated by dots.

If allowed methods are different according to domain, e.g. GET should be allowed from both of these domains, but POST and PUT only from cern.ch, you can use GetMask, PostMask, PutMask and DeleteMask directives instead:

    Protection  protname  {
        GetMask  @(*.cern.ch, *.desy.de)
        PostMask @*.cern.ch
        PutMask  @*.cern.ch
    }
Note that parentheses are necessary only if there is more than one domain name template.

Actual Protection

The Protect rule actually associates protection with a URL. In case of proxy protection you would typically say:
    Protect  http:*   protname
    Protect  ftp:*    protname
    Protect  gopher:* protname
    Protect  news:*   protname
    Protect  wais:*   protname
which would restrict all proxy use to the allowed hosts defined previously in the protection setup protname. Note that protname must be defined before it is referenced!


Caching

cern_httpd running as a proxy can also perform caching of files retrieved from remote hosts. See the configuration directives controlling this feature.


httpd@info.cern.ch
Ari Luotonen, CERN, 1994

CERN Server FAQ

If you have problems, first make sure you're using the newest version. You'll find that out by peeking into ftp://ftp.w3.org/pub/www/src.

When something goes wrong you should run the server in verbose mode (the -v flag) to see exactly what the problem is. If you usually run it from the inet daemon, start it now standalone on some other port (with the -p port flag), with otherwise the same parameters as in /etc/inetd.conf.


My Scripts Get Served As Text Files...

...or are completely inaccessible.

It's important to understand that rules in the configuration file (Map, Pass, Exec, Fail, Protect, DefProt and Redirect) are translated from top to bottom, and the first matching Pass, Exec or Fail will terminate rule translation.

So, make sure that your Exec rule comes before any general mappings.
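For example, with the rules in this order a request for /cgi-bin/myscript matches the Pass rule first and rule translation stops, so the script is never executed (a sketch using the directories from the sample configuration files):
        Pass    /*              /local/Web/*
        Exec    /cgi-bin/*      /your/script/directory/*
Swapping the two rules, so that Exec comes first, makes script requests execute as intended:
        Exec    /cgi-bin/*      /your/script/directory/*
        Pass    /*              /local/Web/*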


How do I...


Zombies

There used to be one zombie when running cern_httpd standalone; this was fixed in version 2.17beta. If you still see zombies (more than two that don't go away in a few minutes) it is a bug.


Inet daemon complains about looping...

...and terminates WWW service. :-(

This is a hard-coded inetd limitation on at least SunOS-4.1.* and NeXT, which limits maximum allowed connections from a given host to 40 per minute. This can be exceeded by scripts doing Web-roaming, or documents having masses of small inlined images.

There is a patch for SunOS inetd at least (100178-08), and in Solaris this is already fixed. You can also run httpd standalone (preferably with the -fork command line option).

Most importantly, you should stop running httpd from inetd and run it standalone instead, because running from inetd is inefficient.
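For example, a standalone server can be started from your system startup files with roughly the following command (a sketch; substitute your own configuration file path):
        /usr/etc/httpd -r /etc/httpd.conf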


Server looks at funny directories and finds nothing

From version 2.0 until 2.15, you needed to have an explicit mapping to the file system in your rule file, e.g.:
        Map    /*    file:/*
but 2.15 doesn't have this limitation anymore.


But the document says rule file is no longer needed

True, but it also says you must remember to give your Web directory as a parameter to httpd, e.g.
        httpd  /home/me/MyGloriousWeb

httpd@info.cern.ch

CERN httpd 2.15 Release Notes

There is one single thing that needs to be done when changing over from httpd 2.14 to 2.15:
        Rename your old /htbin scripts to end in .pp suffix!

General Notes

CGI/1.0, Common Gateway Interface

Firewall Gateway Modifications

Other New Features

Enhancements, Fixes


Ari Luotonen - httpd@info.cern.ch - February 1994

CERN httpd 2.16beta Release Notes

Firewall Gateway (Proxy) Additions, Fixes

Firewall Gateway (Proxy) Caching

Other New Features

Enhancements, Fixes


Ari Luotonen - httpd@info.cern.ch - February 1994

CERN httpd 2.17beta Release Notes

General New Features

Access Authorization Enhancements / Proxy Protections

Enhancements, Fixes

Proxy Additions, Fixes

Proxy Caching

cgiutils

A new product cgiutils for producing HTTP1 replies from CGI scripts, and for easily generating the Expires: header given the time to live, e.g. "2 weeks 4 hours 30 mins".


Ari Luotonen - Henrik Frystyk - httpd@info.cern.ch - April 1994

CERN httpd 2.18beta Release Notes

New Features

Fixes


Ari Luotonen - Henrik Frystyk - httpd@info.cern.ch - April 1994
Ari Luotonen, Henrik Frystyk, CERN, May 1994

CERN httpd 3.0 PreRelease Notes

3.0 Prerelease 3

3.0 Prerelease 2

3.0 Prerelease 1


Ari Luotonen - Henrik Frystyk - httpd@info.cern.ch - April 1994