W3C httpd proxies

Proxies

A proxy is an HTTP server, typically running on a firewall machine, that provides access to the outside world for people inside the firewall. W3C httpd can be configured to run as a proxy. It can also cache documents, which gives faster response times for documents that are already in the cache.

Ari Luotonen and Kevin Altis have written a joint paper about proxies, which will be presented at the WWW94 Conference.



Setting Up W3C httpd To Run as a Proxy

W3C httpd runs as a proxy if its configuration file contains Pass rules for URLs that start with the corresponding access method. A typical proxy configuration file reads:
    Pass http:*
    Pass ftp:*
    Pass gopher:*
    Pass wais:*
Note that W3C httpd can run as a regular HTTP server at the same time; just add your normal rules after these.
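
As a minimal sketch of such a combined setup, the proxy rules come first and an ordinary document rule follows. The directory /hypothetical/docroot is an assumption made for illustration only; substitute your own document tree.

    # Proxy rules: let these access methods through
    Pass http:*
    Pass ftp:*
    Pass gopher:*
    Pass wais:*
    # Regular server rule (hypothetical document directory)
    Pass /*  /hypothetical/docroot/*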

WARNING: The xxx_proxy environment variables (http_proxy, ftp_proxy and so on) that redirect clients to a proxy also affect the proxy server itself. If this is not your intention, make sure that those variables are not set in httpd's environment.


file: URLs

A file: URL is often used where an ftp: URL is meant. If the file_proxy environment variable is set for the client (and the client actually honors it), W3C httpd can be made to map all file: URLs onto ftp: URLs by placing this Map rule in front of the Pass rules:
    Map  file:*  ftp:*
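
Put together with the typical proxy rules shown earlier, the ordering might look like the following sketch. The Map rule comes first, so a rewritten file: URL then matches the Pass rule for ftp: URLs.

    # Rewrite file: URLs before any Pass rule sees them
    Map  file:*  ftp:*
    Pass http:*
    Pass ftp:*
    Pass gopher:*
    Pass wais:*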

Proxy Protection

cern_httpd 2.17 and newer provide a mechanism to protect the proxy against unauthorized use. The machinery behind it is the same as that used to set up document protection when running as a regular HTTP server.

Enabling and Disabling HTTP Methods

By default only HEAD, GET and POST methods are allowed to go through the proxy. You can enable more methods using the Enable directive in the configuration file:
    Enable PUT
    Enable DELETE
The Disable directive disables methods:
    Disable POST

Defining Allowed Hosts

A protection setup is defined as a single named entity. Later, when URLs are protected, this name is used to refer to the protection setup. (The name can also be the absolute pathname of the file that defines the protection, if you want to store protection information in a separate file.)

Protection is defined as follows:

    Protection  protname  {
        Mask @(*.cern.ch, *.desy.de)
    }
This defines a protection that allows all request methods from hosts in the cern.ch and desy.de domains, and none from elsewhere. This protection can be referred to as protname.

You can also use IP number templates:

    Protection  protname  {
        Mask  @(128.141.*.*, 131.169.*.*)
    }
Note that IP number templates always have four parts separated by dots.

If the allowed methods differ by domain (for example, GET should be allowed from both of these domains, but POST and PUT only from cern.ch), use the GetMask, PostMask, PutMask and DeleteMask directives instead:

    Protection  protname  {
        GetMask  @(*.cern.ch, *.desy.de)
        PostMask @*.cern.ch
        PutMask  @*.cern.ch
    }
Note that parentheses are necessary only if there is more than one domain name template.
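
DeleteMask is written the same way as the other per-method masks. The following is a sketch that assumes PUT and DELETE should be reserved for cern.ch. Since only HEAD, GET and POST go through the proxy by default, the two extra methods are also switched on here with the Enable directive.

    # PUT and DELETE are off by default, so enable them explicitly
    Enable PUT
    Enable DELETE

    Protection  protname  {
        GetMask    @(*.cern.ch, *.desy.de)
        PostMask   @*.cern.ch
        PutMask    @*.cern.ch
        DeleteMask @*.cern.ch
    }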

WARNING: Don't use password protection on the proxy. WWW access control was not designed with proxies in mind, and it is not safe to use passwords with the proxy. Fixing this requires an addition to the HTTP protocol.

Actual Protection

The Protect rule is what actually associates a protection setup with a URL. For proxy protection you would typically say:
    Protect  http:*   protname
    Protect  ftp:*    protname
    Protect  gopher:* protname
    Protect  news:*   protname
    Protect  wais:*   protname
which would restrict all proxy use to the allowed hosts defined previously in the protection setup protname. Note that protname must be defined before it is referenced!
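
The pieces above can be combined into one configuration file. The following is a sketch, not a tested configuration. It reuses protname and the domain templates from the earlier examples, and it assumes that Protect rules, like the Map rule, belong in front of the Pass rules for the same URLs.

    # 1. Define the protection setup before anything refers to it
    Protection  protname  {
        Mask  @(*.cern.ch, *.desy.de)
    }

    # 2. Attach the protection to the proxied access methods
    Protect  http:*   protname
    Protect  ftp:*    protname
    Protect  gopher:* protname
    Protect  wais:*   protname

    # 3. Let the proxied access methods through
    Pass http:*
    Pass ftp:*
    Pass gopher:*
    Pass wais:*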


Caching

W3C httpd running as a proxy can also cache files retrieved from remote hosts. See the configuration directives controlling this feature.


httpd@w3.org, July 1995