Application-Specific Proxy Servers as HTTP Stream Transducers

Charles Brooks,
Murray S. Mazer,
Scott Meeks, and
Jim Miller

Abstract: If one wishes to execute specialized processing on the HTTP requests and responses that flow between WWW clients and servers, one can add the processing in the clients, in the servers, or between them. We describe a novel approach to the last of these: we generalize the notion of proxy servers to construct application-specific proxies that act as transducers on the HTTP stream. We have built a sample set of transducers to demonstrate the idea, and an initial toolkit to ease the task of constructing these transducers and attaching them to the HTTP stream.
Keywords: proxy servers, stream transducers, HTTP

Introduction

Typically, a WWW client (usually a browser) sends an HTTP request directly to the target WWW server, and that server returns the response directly to the client. WWW proxy servers [7] were designed to provide gateway access to the Web for people on closed subnets who could only access the Internet through a firewall machine. In the proxy technique, the WWW client sends the full URL (including protocol and server portions) to the designated proxy, which then connects to the desired server via the desired protocol, issues the request, and forwards the result back to the initial client. The key observation is that the client uses HTTP to communicate with the proxy, regardless of the protocol specified in the URL. Proxy servers were later extended to cache retrieved documents, which improves access latency [7].

WWW clients and servers normally expect that the network will transport messages between them with the content unchanged; this is true whether a proxy is in the stream or not. That is, the correspondents expect the client's request to arrive at the server with exactly the content transmitted by the client, and they expect the server's response to reach the client intact. The underlying assumption is that the network is simply a mechanism for transporting messages between the communicating entities but is not a mechanism for applying application-specific processing to the message contents. A caching proxy, which may elect to serve the response itself instead of forwarding the request to the designated server, starts to challenge that assumption. Similarly, a load-balancing proxy, which might select a mirror site for a given request instead of the designated server, in order to reflect current network behavior, would further challenge that assumption. In both of these cases, the request may not reach the designated server, but the response likely reaches the client intact.

We challenge this assumption even further: we suggest that, for some classes of client/server applications and their network transactions, substantial value may arise from inserting, into the communication stream, application-specific transducers that may view and potentially alter the message contents. We are testing this hypothesis in the context of the World Wide Web, by building a sample set of proxy-based transducers that are bound to the HTTP request/response stream. This approach extends the ``standard'' WWW architecture in a way we believe is both novel and useful.

The rest of this paper proceeds as follows. First, we describe the motivation for introducing transducers into the HTTP stream. Then we review our approach to building and using this enhanced architectural component. We describe some examples, followed by a discussion of our publicly available toolkit [8], its implementation, and its performance. Finally, we consider some future work.

Motivation

We are developing this technology in the context of a larger research program on approaches to building computerized browsing assistants [12] that aid the human user in accessing relevant information on the Web. We hypothesize that some of the browsing assistance is best achieved by monitoring and processing the HTTP stream between the user's browser and the Web. As an independent example, the idea of using an application-specific proxy was recently advocated for non-intrusive community content control [2]. Further, group-related assistance may arise from processing the stream between a workgroup's browsers and the Web; we are investigating a number of issues in this area.

We view the transducers as having four classes of functionality:

Filtering individual HTTP requests and responses.
Characterizing sets of messages.
Transforming message contents.
Additional processing indicated by the messages.

Here are some illustrative examples of each class in the WWW context:

Filtering: checking the validity of request headers (eliding improper ones or returning an error response to the sender); redirecting requests according to dynamic models of current network behavior; and eliding images from responses being sent to bandwidth-limited clients.
Characterizing: building a dynamic model of current network behavior based on request/response latency; and constructing a full-text index of all responses received by a workgroup (so that individuals may leverage the browsing behavior of the group).
Transforming: converting response formats into formats better suited for the client (e.g., downgrading video quality for a portable client); adding value to the content based on additional data sources (e.g., recognizing zip codes inside responses and converting them into anchors that link to census bureau data on the appropriate regions); and interpreting location-specific references inside requests to determine the appropriate destination server.
Additional processing: prefetching links embedded in each response and applying appropriate transforms, such as constructing a tree showing all document titles lying within two links of the current document.
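
As a hedged illustration of the transforming class, the zip-code example above might be sketched in Python as follows. The census URL pattern (census.example) and the function name are hypothetical, and the regular expression is deliberately naive: it treats any standalone five-digit run as a zip code and does not skip digits that already sit inside HTML tags or attributes.

```python
import re

# Hypothetical target for the generated anchors; not a real census service.
CENSUS_URL = "http://census.example/zip/{zip}"

# A standalone run of exactly five digits (naive zip-code detector).
ZIP_RE = re.compile(r"\b(\d{5})\b")


def link_zip_codes(text: str) -> str:
    """Wrap each five-digit zip code in an anchor pointing at CENSUS_URL."""
    def to_anchor(match: re.Match) -> str:
        code = match.group(1)
        return '<a href="{}">{}</a>'.format(CENSUS_URL.format(zip=code), code)

    return ZIP_RE.sub(to_anchor, text)
```

A production transducer would more likely parse the HTML and rewrite only text nodes; the sketch shows just the substitution step.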

These transducers are essentially specialized processing modules, whose inputs include at least the HTTP stream. They may also take input from other sources, may communicate with other processing modules, and may create output of any kind appropriate to their function. For example, a full-text indexer could produce an index searchable via a Web form.

These transducers may serve at least four classes of users:

Individuals: individuals may support specific interests and preferences through selection of transducers.
Groups: group transducers may effect group policy or provide benefit to individuals based on the actions of their peers.
Enterprises: as with groups, but on a larger scale.
Public: innovators, information providers, and entrepreneurs may offer transducers to the public.

Approach

Our approach clearly derives from the proxy server model: as illustrated below in Figure 1 for WWW browsers and servers, one can use the mechanism to insert other kinds of processing entities into the stream.

Figure 1. An OreO as part of the HTTP stream

We refer to our HTTP transducers as OreOs (with appropriate apologies to the cookie makers), because the transducer is structured with one `wafer' to handle browser-side communication, another `wafer' to handle server-side communication, and a functional `filling' in the middle. As illustrated below in Figure 2, the OreOs take advantage of the HTTP proxy mechanism, essentially appearing as a server to the client side and as a client to the server side. Because the full URL is delivered intact to the OreO, the filling can use the scheme, server, server-relative URL, or request data in its processing.

Figure 2. OreOs can use the HTTP proxy mechanism

In some cases, the OreO will forward the request to another proxy (using HTTP proxy, as illustrated above); in other cases, the OreO may forward the request directly to the designated WWW server.
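
To make concrete what the filling can see in a proxy-style request, here is a minimal Python sketch that splits a request line into the scheme, server, and server-relative URL. The ProxyTarget type and the function name are illustrative assumptions, not part of the OreO toolkit.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ProxyTarget(NamedTuple):
    method: str
    scheme: str
    server: str
    relative_url: str


def parse_proxy_request_line(line: str) -> ProxyTarget:
    """Split a proxy request line such as 'GET http://host/path HTTP/1.0'."""
    method, url, _version = line.split()
    parts = urlsplit(url)
    # The server-relative URL is the path plus any query string.
    relative = parts.path or "/"
    if parts.query:
        relative += "?" + parts.query
    return ProxyTarget(method, parts.scheme, parts.netloc, relative)
```

An OreO that forwards directly to the designated server would send only relative_url to that server, whereas one that forwards to another proxy would pass the full URL along unchanged.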

Figure 3 shows various possible compositions of OreOs to serve the needs of individuals, groups, enterprises, and the public.

Figure 3. Personal, group, enterprise, and public OreO servers

We advocate constructing highly specialized transducers that can be composed to produce more sophisticated aggregate behavior; this contrasts with building monolithic components that are hard to reuse. We propose that these composed functional chains may be configured dynamically as well as statically, on a granularity as fine as per-request or as coarse as per-session or across sessions. Currently, we have experience only with static configurations.

Our approach reflects the "stream model" of signal processing (later adopted and extended by the functional programming community [1]). Viewing the connection between the client and the server as a stream of data, one can build a standard box which accepts one or more such streams as input and produces one or more such streams as output. Each box can execute specialized processing on the stream(s). All of the boxes have the same basic interface (stream-in/stream-out), so they can be connected together, producing sophisticated processing from the composition of simple processing elements.
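
The stream-in/stream-out idea can be sketched with Python generators. This is an illustration of the composition principle only, not the toolkit's interface; the two sample boxes and the compose helper are hypothetical names.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Stream = Iterable[bytes]
Transducer = Callable[[Stream], Iterator[bytes]]


def drop_empty(stream: Stream) -> Iterator[bytes]:
    """A filtering box: discard empty chunks."""
    for chunk in stream:
        if chunk:
            yield chunk


def upper_case(stream: Stream) -> Iterator[bytes]:
    """A transforming box: upper-case every chunk."""
    for chunk in stream:
        yield chunk.upper()


def compose(*boxes: Transducer) -> Transducer:
    """Connect boxes left to right; the result is itself a box."""
    def pipeline(stream: Stream) -> Iterator[bytes]:
        return iter(reduce(lambda s, box: box(s), boxes, stream))
    return pipeline
```

Because every box has the same signature, compose(drop_empty, upper_case) can be used anywhere a single box can, which is the property that makes chains of small transducers practical.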

We expect that the selection of which transducers to use in which situations will in some cases be preconfigured, in others be under administrative control, in others be under user control, and in still others, be under OreO control. For applications which offer the user some direct control over the selection and invocation of network transactions, users will be able to select the functional modules to be used.

Examples

We have built several example HTTP transducers; these are meant to be evocative, rather than definitive. The first was constructed by deconstructing a CERN httpd and inserting the desired processing (written in C). The remainder were constructed in various languages using the toolkit discussed below. In addition, some of the functionality is enhanced by use of our internally-developed experimental browser, Ariadne [9]. Here are some examples; others are under development:

The first OreO performed several different functions:
- URL validation, to identify malformed URLs and report errors directly back to the browser, short-circuiting the request. In a sense, the OreO is ``serving'' part of the ``errorful URL space.''
- Measuring data transfer rates from the various servers, and reporting these to the user via a TCP backchannel (a separate communication channel) to the Ariadne browser.
- Creating a group history, based on the HTTP requests of the set of browsers passing requests through the OreO, and reporting the group history in one of two ways:
  - Through a dynamic HTML document, created and updated by the OreO, and accessed at a well-known URL.
  - Through the backchannel to Ariadne, and presented by Ariadne as a multi-colored tree in a special window.
A full-text indexing OreO, which creates a full-text index of all of the non-trivial terms appearing in the responses flowing through it. The OreO stores the index in a file system accessible by a CGI script sitting on a WWW server, and users may query the index via the usual forms interface. This OreO could serve an individual as well as a workgroup; the goal is to ease the problem of recalling the documents in which certain terms have appeared.
A ``protocol monitoring'' OreO that is used to display the request/response traffic and to diagnose malfunctioning client/server interactions.
A specialized ``wafer'' that acts as a gateway between unmodified browsers and WWW servers protected by DCE security mechanisms [10]. Communication between the wafer and the browser is via proxy HTTP, and communication between the wafer and the server is via DCE RPC [11].
A ``group annotation'' OreO that supports the Stanford annotation approach [14] but allows that functionality to be available to users of a variety of browsers, not just the Stanford one.
A ``rewriting'' OreO that encapsulates each anchor inside the Netscape Blink extension, making anchors easier to spot on monochrome displays.
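
The URL-validation function of the first OreO can be illustrated with a short Python sketch (the original was written in C inside a modified CERN httpd). The set of accepted schemes, the 400 status code, and the function name are assumptions made for illustration.

```python
from html import escape
from typing import Optional
from urllib.parse import urlsplit

# Assumed list of schemes the OreO is willing to forward.
ACCEPTED_SCHEMES = {"http", "ftp", "gopher", "wais"}


def error_response_for(url: str) -> Optional[bytes]:
    """Return None for a well-formed URL, else a complete error response."""
    try:
        parts = urlsplit(url)
    except ValueError:
        parts = None
    if parts is not None and parts.scheme in ACCEPTED_SCHEMES and parts.netloc:
        return None  # well formed: let the request continue downstream
    body = "<html><body>Malformed URL: {}</body></html>".format(escape(url))
    payload = body.encode("latin-1", "replace")
    header = (
        "HTTP/1.0 400 Bad Request\r\n"
        "Content-Type: text/html\r\n"
        "Content-Length: {}\r\n\r\n".format(len(payload))
    )
    return header.encode("latin-1") + payload
```

When the function returns a response, the OreO writes it straight back to the browser and never contacts a server, which is the short-circuit described above.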

As becomes clear through these examples, an OreO will often create some sort of information space (e.g., full-text index or group history) that may be made available to the user through a variety of mechanisms, such as

Query forms (e.g., the OreO writes data to a file that a CGI script can use to answer queries on the data)
A well-known URL (e.g., the OreO writes an HTML page to a location that a server can access to respond to requests for the information)
Interprocess communication (e.g., the OreO can send information updates or commands to a browser via a control channel to get the browser to display the relevant information to the user)
URLs that the OreO serves itself (e.g., the OreO traps requests to a particular part of URL space and returns the corresponding HTML document, which it might create dynamically)

Toolkit

To ease the task of building these transducers, we have produced a toolkit [8] which allows the filling developer to focus on the application-specific aspects of the transducer. In the initial version of the toolkit, we provide a `shell' that implements the `wafers' between which the filling is placed. The developer may use any program development system to create the filling, which is simply executed by the shell after it performs appropriate setup functions. The shell ensures that the filling is connected into the request/response stream, so that the developer can simply operate on the contents and ignore the network-specific issues.

Processing Modes

The shell supports four processing approaches:

At most one request/response pair extant at any time, each pair flowing through a newly instantiated filling.
At most one request/response pair extant at any time, flowing through a single, persistent filling.
Multiple outstanding request/response pairs flowing through a single, persistent filling.
Multiple outstanding request/response pairs, each one flowing through its own filling instantiation.

Further, the filling developer may configure the shell so that the filling handles HTTP requests only, HTTP responses only, or both requests and responses; the shell handles the requests or responses (as a pass-through) if the filling does not.

Implementation

The OreO shell is implemented as a single Unix process that runs as a Unix daemon; this is due largely to the original example OreO's origin as the main() for the CERN (now W3C) HTTP daemon process (httpd). The shell accepts a series of command-line arguments that indicate the TCP/IP port on which it should accept connections, the program that it should execute to process the HTTP request and/or response, and whether the program should be executed in parallel or sequentially.

For each new connection, the shell can either open a connection to the downstream process (which can be specified either as an environment variable OREO_PROXY or via a command-line argument), or simply allow the filling code to determine the destination for the request. The latter mode allows filling code to support new protocol implementations (such as RPC-based protocols).
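
A minimal sketch of this choice in Python follows. Only the OREO_PROXY variable name comes from the toolkit; the host:port value format, the precedence of the command-line argument over the environment, and the function name are assumptions.

```python
import os
from typing import Mapping, Optional, Tuple


def downstream_address(
    cli_value: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, int]]:
    """Return (host, port) of the downstream process, or None.

    None means no downstream was configured, so the filling itself must
    decide where the request goes.
    """
    value = cli_value or environ.get("OREO_PROXY")
    if not value:
        return None
    # Assumed format: "host:port"; split on the last colon.
    host, _, port = value.rpartition(":")
    return host, int(port)
```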

A Single, Persistent Filling

If a single, persistent filling has been specified, the shell fork()s and exec()s a copy of the filling. This filling is expected to follow the rules for a co-process: it must retrieve a pair of Unix file descriptors, corresponding to the client and server sockets, from a private IPC channel created between the shell and the co-process. Once the co-process has retrieved these sockets, it must send an acknowledgement back to the shell (these steps are encapsulated in an API provided as part of the toolkit). At that point, the co-process is responsible for the connections: it must read from and write to the appropriate sockets, and close them as necessary.
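
The descriptor hand-off can be approximated in Python on a Unix system with socket.send_fds and socket.recv_fds (Python 3.9 or later) over a Unix-domain socket. This is a sketch of the general technique, not the toolkit's API: the function names, the "fds" marker message, and the "ok" acknowledgement are all hypothetical.

```python
import socket
from typing import Tuple

ACK = b"ok"  # hypothetical acknowledgement message


def send_descriptors(channel: socket.socket,
                     client: socket.socket,
                     server: socket.socket) -> None:
    """Shell side: pass the client and server sockets over the IPC channel."""
    socket.send_fds(channel, [b"fds"], [client.fileno(), server.fileno()])


def receive_descriptors(channel: socket.socket) -> Tuple[socket.socket, socket.socket]:
    """Co-process side: retrieve both sockets, then acknowledge."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 16, 2)
    if len(fds) != 2:
        raise RuntimeError("expected exactly two descriptors")
    client = socket.socket(fileno=fds[0])
    server = socket.socket(fileno=fds[1])
    channel.sendall(ACK)
    return client, server


def wait_for_ack(channel: socket.socket) -> bool:
    """Shell side: block until the co-process confirms it owns the sockets."""
    return channel.recv(len(ACK)) == ACK
```

After the acknowledgement, the shell can close its own copies of the two sockets; the co-process's copies stay open because each received descriptor is an independent reference to the same connection.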

The issue of multiple versus singular request/response pairs is thus the responsibility of the co-process. Normally, the processing will be singular, since the co-process will effectively execute the following loop:

do { 
  get a connection ;
  process the connection ;
} while(1);

Newly instantiated fillings

If a single persistent filling has not been requested, then the shell determines what kind of processing has been specified. The options are as follows:

process the HTTP request, the HTTP response, or both (but provide a different filling for the request and response streams).
process both the request and response streams, but do so in a single filling.

In the first case, if the filling chooses not to handle the request or response, the shell simply reads and writes data from one socket to the other. Each filling is set up appropriately: a request filling has its standard input set to the client socket and its output set to the server socket; the process is reversed for a response filling. This model is useful if there is no need to maintain state in the filling between the HTTP request and response streams, and is most similar to the notion of Unix filters (one way transforms).
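
A filter-style filling can be sketched in Python as a loop that copies one stream to another through a transform. The function name and the transform parameter are illustrative. The loop runs until end-of-stream, which suits a response filling when the server closes the connection after replying; a request filling would instead have to stop at the end of the request.

```python
import io
from typing import Callable


def run_filter(
    source: io.BufferedIOBase,
    sink: io.BufferedIOBase,
    transform: Callable[[bytes], bytes] = lambda chunk: chunk,
    chunk_size: int = 4096,
) -> int:
    """Copy source to sink through transform; return the bytes written."""
    written = 0
    while True:
        # read1 returns whatever is available rather than waiting
        # for a full chunk, so data is forwarded as it arrives.
        chunk = source.read1(chunk_size)
        if not chunk:
            break  # end of stream
        out = transform(chunk)
        sink.write(out)
        written += len(out)
    sink.flush()
    return written
```

Run as a filling, the call would be run_filter(sys.stdin.buffer, sys.stdout.buffer), since the shell has already bound standard input and output to the two sockets. With the default identity transform this is a pass-through.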

In the second case outlined above, the shell sets the filling's standard input to the client socket, and the standard output to the server socket; a supplied API hides this implementation detail from the filling writer. This mode allows the filling writer to maintain internal state across the HTTP requests and responses.

Future versions of the toolkit should add new levels of abstraction for the developer; for example, we have specified a higher-level API that can present the requests and responses as pre-parsed HTML entities, so that the developer does not have to replicate that effort. Further, a toolkit might provide appropriate interfaces to the underlying WWW protocol support.

Performance

To gauge the effects of OreOs on HTTP request/response latency, we measured the additional delay introduced to a series of HTTP transactions by a Perl-based pass-through OreO (one that merely forwards requests and responses without change). Our results are encouraging: the added delay was between 3% and 6%, depending on the mode of the OreO. We also performed simple tests to determine whether humans could perceive the additional delay. Four identical browsers were configured to point through various configurations of OreOs and proxies (the ``worst'' being a chain of four pass-throughs), and test subjects were asked to identify each of the four configurations; the subjects could identify only the browser proxied through four pass-throughs. This suggests that users are already conditioned to the variability experienced in network transactions, and the addition of a small extra delay does not stand out. The OreO shell appears to add no perceptible delay; the filling, however, can be arbitrarily complex and therefore add arbitrary delay.

Futures

We have already begun thinking about extensions to our basic model. These extensions fall into three general categories: implementing the OreO toolkit on multiple platforms, extending the notion of filtering agents themselves, and increasing browser/OreO interaction.

Other Platforms

At present, the OreO shell has been compiled and tested under HP-UX and under OSF/1 running on Intel platforms. Time and resources allowing, we plan to re-implement the OreO shell under Microsoft Windows NT. This will require minor modifications to the networking code (from vanilla BSD Unix sockets to WinSock-compliant code), as well as modifications to the process creation and execution model. The latter should not prove difficult, as the details of this mechanism have been encapsulated in a single routine. Other possibilities include re-implementing the OreO shell to run as a Windows NT service.

Extensions

We have several ideas on how to extend the notion of filtering agents, as illustrated by the OreO. These ideas break down into three categories:

Improvements to the OreO shell,
Improvements to the OreO/Browser interaction model, and
Improvements to the actual processing of content.

Improvements to the OreO shell

We believe that the notion of the OreO shell is a good one: a layer of code that isolates the actual filling from the details of obtaining and processing network connections. The inspiration for this model is the standard input and output model for the Unix operating system [6]: programs can read from the standard input unit and write to the standard output unit without regard for whether these are files, terminals, network connections, etc.

Our initial thought was that the filling developer would encapsulate the invocation of the OreO shell and the appropriate filling code inside a shell script. While this has been true, we have imagined other models that would provide enhanced functionality in packaging and exporting transducing services to a user community. At present, if one wished to support several long-running OreOs, one would have to arrange explicitly for each script to be run when the machine is initially started. The following suggestions offer improvements to this situation.

OreO shell as inetd

In this model, the OreO shell is implemented similarly to the standard Unix inetd. Transducer fillings are configured similarly to the specification in inetd.conf (perhaps specifying service name, protocol, port, program to run, etc.). As in the current inetd, the OreO shell would accept connections on the various TCP/IP ports listed in its configuration file. New connections would result in a new process being created to run the filling code. Stdin, stdout, and stderr for the filling would be connected to the upstream (client) program by the shell; the filling would be responsible for connecting to the downstream (proxy) server by reading either a command-line argument or an environment variable.
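
Since this model is only proposed, no configuration format exists; the following Python sketch assumes a hypothetical one-line-per-filling layout of service name, protocol, port, and program with arguments, loosely modeled on inetd.conf.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class FillingEntry:
    service: str
    protocol: str
    port: int
    command: List[str]


def parse_config(text: str) -> List[FillingEntry]:
    """Parse the hypothetical format: service protocol port program [args...]."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        service, protocol, port, *command = shlex.split(line)
        entries.append(FillingEntry(service, protocol, int(port), command))
    return entries
```

An inetd-style shell would then listen on each entry's port and start the entry's command for every new connection.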

OreO shell as portmapper

In this model (inspired by the RPC daemons of both ONC and DCE [11]), the OreO shell functions as a registrar for the above transducers. Transducers register their services with the shell, at which time the shell listens on the port specified by the transducer. Clients connect to that port, and the shell redirects the connection to the appropriate IPC channel established between the transducer and the shell. Such a technique would permit dynamic registration of transducer code: the interface for clients would remain the same (proxy HTTP), whereas the interface between the shell and the transducer code would become more complex.

OreO shell as request broker

In this model, the OreO shell functions as a location-independent agent registrar, incorporating aspects of both the DCE rpcd and the DCE CDS (Cell Directory Services) servers. In addition to the functionality represented by the portmapper approach above, the shell would also be responsible for updating and maintaining a distributed database of agent functionality, such that an individual shell would be able to redirect a request for service to an appropriate location.

Generalized Agent Factory

In this model, the OreO shell becomes a "generalized agent factory" similar to the Softbot [13] or Sodabot [3] environments. Individual transforms are coded as functions that transform the HTTP stream as it is passed from transform to transform. The user may choose to implement these transforms in a programming language of some kind that provides its own specific GUI and other functionality.

OreO/Browser Interaction

We also intend to explore transducer/browser interaction. At present, our Ariadne browser sends an additional HTTP header (X-BackChannel:) that indicates a host, protocol, and port number on which the browser is willing to accept connections. This header is recognized by our OreO agents; servers simply ignore it. One can imagine extensions to this mechanism that are similar to the current Accept: headers, except that these headers indicate languages that the browser is willing to process: Safe-TCL, Python, Java, etc.
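
How an OreO might pick up the backchannel header is sketched below in Python. The header name comes from the text, but its value syntax is not specified here; the sketch assumes a hypothetical protocol://host:port form, and the function name is likewise illustrative.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit


def parse_backchannel(header_lines: Iterable[str]) -> Optional[Tuple[str, str, int]]:
    """Return (host, protocol, port) from an X-BackChannel header, if present.

    Assumes the value looks like 'tcp://browser.example:7000'.
    """
    for line in header_lines:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive.
        if sep and name.strip().lower() == "x-backchannel":
            parts = urlsplit(value.strip())
            if parts.hostname and parts.port:
                return parts.hostname, parts.scheme, parts.port
    return None
```

A request without the header yields None, so the OreO can fall back to another reporting mechanism such as a well-known URL.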

Finally, provided that the browser's network point-of-presence is known (host, protocol, port), one can imagine using the NCSA Common Client Interface (CCI), the Spyglass Software Development Interface (SDI), or the Netscape API to communicate between a browser and an OreO. At this time, however, the mechanisms for establishing that point-of-presence are loosely specified, so it is not clear how well the above interfaces will work when the transducer is running on a separate machine.

Improvements to processing information content

At present, the OreO shell presents a byte-stream interface to its client ``fillings.'' While this is a very general and flexible model, it is too low-level to provide the kind of productivity improvements that we had initially hoped for when designing the OreO shell. Achieving the next level of productivity will require a higher level of abstraction: we have been evaluating the W3C's Library of Common Code (libWWW) [5] as a basis for that higher level of abstraction. We believe that, at a minimum, the next level will provide an abstraction of an HTTP request/response object: a series of HTTP headers optionally followed by an opaque content body. Parsing of the input byte stream, then, becomes part of the process of constructing this object; destructing this object might include converting it to a byte stream and directing it to the downstream sink. This functionality would be provided by the API and not directly by the filling code itself.
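
One possible shape for such a request/response object is sketched below in Python. It does not use libWWW and is not the API under evaluation; the class and method names are hypothetical, and the parser ignores details such as folded header lines.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HttpMessage:
    start_line: str                      # request line or status line
    headers: List[Tuple[str, str]] = field(default_factory=list)
    body: bytes = b""                    # opaque content body

    @classmethod
    def from_bytes(cls, raw: bytes) -> "HttpMessage":
        """Construct the object by parsing the input byte stream."""
        head, _, body = raw.partition(b"\r\n\r\n")
        lines = head.decode("latin-1").split("\r\n")
        headers = []
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
        return cls(lines[0], headers, body)

    def to_bytes(self) -> bytes:
        """Convert back to a byte stream for the downstream sink."""
        head = [self.start_line] + ["{}: {}".format(n, v) for n, v in self.headers]
        return "\r\n".join(head).encode("latin-1") + b"\r\n\r\n" + self.body
```

With an object like this, filling code can inspect or change headers and body directly, leaving parsing and serialization to the API layer.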

Once the request/response object has been constructed, we then need to determine how to transfer this object between various OreO transducers. At present, this process is quite inefficient, in that each transducer must construct the request/response object, transform it, and then re-convert it to a byte-stream representation in order to pass it on to the next transducer (if any). Transducers may desire to see only the HTTP headers or the content body, or both. Modifications to content may require modifications to the headers (e.g., conversion of GIF images to JPEG would require modification to the Content-Type header). The above model then leads to a view of transducers as functions in a programming language, or as "functors" (functions as instantiable objects) [4]: the HTTP protocol object is passed from one such functor to another based upon some user-specified sequence.
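
The functor view, together with the Content-Type example, might look like the following Python sketch. The Message and ConvertContent classes and the apply_chain helper are hypothetical, and the actual image conversion is left to a caller-supplied function rather than implemented here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class Message:
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


class ConvertContent:
    """Functor: converts bodies of one content type into another."""

    def __init__(self, source_type: str, target_type: str,
                 convert: Callable[[bytes], bytes]) -> None:
        self.source_type = source_type
        self.target_type = target_type
        self.convert = convert  # caller-supplied body converter

    def __call__(self, message: Message) -> Message:
        if message.headers.get("Content-Type") != self.source_type:
            return message  # not our content type: pass through untouched
        body = self.convert(message.body)
        # Changing the content means the headers must change with it.
        headers = {**message.headers,
                   "Content-Type": self.target_type,
                   "Content-Length": str(len(body))}
        return Message(headers, body)


def apply_chain(message: Message,
                functors: Iterable[Callable[[Message], Message]]) -> Message:
    """Pass the object from functor to functor in the given sequence."""
    for functor in functors:
        message = functor(message)
    return message
```

Because the object is handed from functor to functor in memory, it is parsed once and serialized once, however long the chain is.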

Finally, we may wish to expand on the notion of the transducer as HTTP server. As suggested in the introduction, the CERN HTTP proxy server with caching enabled [7] is one kind of transducer. Another kind is an OreO that manages its own virtual Web-based namespace that has no mapping to an underlying file store. In this model, the OreO simply caches various information in memory: browsers send proxy requests to "service.machine.org", which fulfills these requests from memory. The CGI interface to this OreO would invoke internal functions with the appropriate arguments; each function returns an object of type HTML.
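
A virtual namespace of this kind can be sketched in Python as a table that maps paths to internal functions, with query arguments passed as keyword arguments. The class name, the decorator-based registration, and the use of plain strings for HTML are assumptions made for illustration.

```python
from typing import Callable, Dict
from urllib.parse import parse_qsl, urlsplit


class VirtualNamespace:
    """Serves URLs from internal functions instead of a file store."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., str]] = {}

    def register(self, path: str) -> Callable:
        """Decorator that binds a path to a function returning HTML."""
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._functions[path] = func
            return func
        return decorator

    def serve(self, url: str) -> str:
        """Answer a proxy request for url entirely from memory."""
        parts = urlsplit(url)
        func = self._functions.get(parts.path)
        if func is None:
            return "HTTP/1.0 404 Not Found\r\n\r\n"
        html = func(**dict(parse_qsl(parts.query)))
        return "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n" + html
```

A function registered under a path such as /history would then answer requests for http://service.machine.org/history without any document existing on disk.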

We are beginning to work on how OreOs might communicate with each other for control purposes and to share information. At this time, we are pursuing the notion of implementing a network blackboard via HTTP (the blackboard server would be implemented as a stand-alone OreO) using the techniques described above (the OreO manages its own internal namespace, etc.).

Conclusions

We have presented a novel approach to introducing specialized processing on the HTTP requests and responses that flow between WWW clients and servers. We generalized the notion of WWW proxy servers to that of application-specific proxies that act as transducers on the HTTP stream. Our prototype OreOs demonstrate the utility of the concept, and our toolkit aims to ease the task of building such OreOs, without adding undue performance penalties. We encourage experimentation to determine the kinds of application settings to which such an architectural component is especially suited and ways in which to extend the WWW architecture further.

Availability of Software

Source and binaries for HP-UX (and possibly other platforms) are available by anonymous FTP from riftp.osf.org in /pub/web/OreO. Please read the copyright notice.

References

1. Abelson, H. et al., Structure and Interpretation of Computer Programs, MIT Press, Cambridge, MA 1986.

2. Behlendorf, B., A Proposal for Non-Intrusive Community Content Control Using Proxy Servers, http://www.organic.com/Staff/brian/community-filters.html

3. Cohen, M., The SodaBot Home Page, http://www.ai.mit.edu/people/sodabot/sodabot.html.

4. Coplien, J., Advanced C++: Programming Styles and Idioms, Addison-Wesley, Reading, MA, 1992.

5. Frystyk, H., Library of Common Code, http://www.w3.org/hypertext/WWW/Library/.

6. Kernighan, B., and Pike, R., The Unix Programming Environment, Prentice-Hall, Englewood Cliffs, NJ, 1984, "Filters", pp. 101-132.

7. Luotonen, A., and Altis, K., World-Wide Web Proxies, http://www.w3.org/hypertext/WWW/Proxies/.

8. OSF RI World-Wide Web Agent Toolkit (OreO), http://www.osf.org/ri/announcements/OreO_Datasheet.html.

9. Ariadne, http://www.osf.org/ri/announcements/Ariadne_Datasheet.html.

10. DCE-Web Home Page, http://www.osf.org:8001/www/dceweb/DCE-Web-Home-Page.html

11. OSF Distributed Computing Environment, http://www.osf.org:8001/dce/index.html

12. Wide-Area Browsing Assistance for the World Wide Web, http://www.osf.org/www/waiba/.

13. Perkowitz, M., Internet Softbot, http://www.cs.washington.edu/research/projects/softbots/www/softbots.html.

14. Roscheisen, M. and Mogensen, C., ComMentor: Scalable Architecture for Shared WWW Annotations as a Platform for Value-Added Providers, http://www-pcd.stanford.edu/COMMENTOR.

About the Authors

Charles Brooks
OSF Research Institute
11 Cambridge Center
Cambridge, MA 02142

Murray S. Mazer
OSF Research Institute
11 Cambridge Center
Cambridge, MA 02142

Scott Meeks
OSF Research Institute
11 Cambridge Center
Cambridge, MA 02142

Jim Miller
World Wide Web Consortium
Massachusetts Institute of Technology
Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02142
jmiller@mit.edu