WD-http-ng-goals-19980327

Short- and Long-Term Goals for the HTTP-NG Project

W3C Working Draft 27-March-1998

This version:: http://www.w3.org/TR/1998/WD-http-ng-goals-19980327
Previous version:: None
Latest version:: http://www.w3.org/TR/1998/WD-http-ng-goals
Editors:: Michael Spreitzer; Henrik Frystyk Nielsen

Status of This Document

This draft specification is a work in progress representing the current consensus of the W3C HTTP-NG Protocol Design Group. This is a W3C Working Draft for review by W3C members and other interested parties. Publication as a working draft does not imply endorsement by the W3C membership.

This document describes the short- and long-term goals for the work of the HTTP-NG Protocol Design Working Group (Members Only). The document does not describe an architecture, rather it describes the goals and requirements of a potential new generation of the HTTP protocol and how we intend to evaluate these goals.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Working Drafts as other than "work in progress".

Note: As working drafts are subject to frequent change, you are advised to reference the above URL for "Latest version" rather than the URLs for specific working draft versions themselves. The latest version URL will always point to the most current version of this draft.

Comments on this specification may be sent to <www-http-ng-comments@w3.org>.

Part I Long Term Goals

Introduction

The goal of the HTTP-NG project is to test the hypothesis that the current HTTP/1.X approach to web protocol design can be replaced with one in which the Web is expressed as a particular set of interfaces on top of a generic distributed object system designed with Internet constraints in mind. We expect that a layered approach would bring many benefits to the Web, including easier evolution of the protocol standard, interface technology that would facilitate Web automation, easier application building, and so on. We also intend to do other pressing HTTP work, notably including further performance improvements, in the context of HTTP-NG. Even if we find that the main hypothesis is false, we hope, expect, and intend that much of the experience we gain will be useful to other design approaches.

HTTP-NG has to support (at least) the same architectural features of the Web that HTTP/1.X does. As opposed to the matters of degree below, these are cast herein as yes/no issues: does the protocol do a required thing? These yes/no issues are listed in the second section of this document.

To prove the main point, HTTP-NG has to actually go out there and replace HTTP/1.X. There has to be a way to deploy HTTP-NG into the existing web that is using HTTP/1.X (in addition to all the non-HTTP protocols), such that HTTP-NG can co-exist, interoperate, capture a growing fraction of the traffic, and eventually supplant HTTP/1.X.

For HTTP-NG to take over from HTTP/1.X, it does not suffice for us to make it technically possible (although that is certainly an important goal); in addition, people have to want to switch. For people to want to switch to it, it has to be better than HTTP/1.X. There are many dimensions in which protocols can be compared. There are people in many different roles, and they have different utility functions. In addition to the yes/no issues mentioned above, web protocols can be compared in several more-is-better dimensions. In a more-is-better dimension, the question is a matter of degree. The more-is-better dimensions are further broken down into "critical" and "non-critical". We expect HTTP-NG will have to be significantly better than HTTP/1.X in each of the critical dimensions, and at least as good as HTTP/1.X in most of the non-critical dimensions. We look forward to enriching future versions of this document with better statements about the absolute and relative importance of these axes with knowledge delivered by the Web Characterization Group. We also hope they can give us many cases where there is a definable "good enough".

The Web, in the fullness of its actual use, is expanding to include more general distributed applications, which are being layered on top of HTTP, often with unnecessary performance costs, functionality warts, and lack of generality resulting. By moving the Web to a generic distributed object system basis, we hope to enable these applications to use the generic object system directly. In particular, we would like the generic distributed object system to be simple, yet rich enough to meet the semantic and performance requirements of CORBA, DCOM, and Java RMI. This doesn't mean we intend to unify the object models of CORBA, DCOM, and Java RMI (which are somewhat diverse and warty). It means that we'd like to move HTTP onto a generic distributed object system of analogous strength, so that programmers (and middleware-owning organizations) willing to switch could get roughly the same jobs done (at least). Thus, the axes for evaluation below include the issues that would have to be done well to make a good generic distributed object system. Furthermore, deployment of HTTP-NG will be much easier if existing middleware products (CORBA, DCOM, and Java RMI) can be made useful for HTTP-NG with little effort or lossage.

Gotta-Have-It Axes for Evaluating HTTP*

HTTP transports request-response interactions between a client and a server.
The interaction is with an object called, in the context of the Web, a "resource". A server animates multiple resources.
The client and server in any given transaction are not necessarily the "ultimate" client or server; each may be acting as a proxy for another agent. By "proxy" we mean any program that acts as a server to some client(s), and, on their behalf, as one or more clients to other server(s). A proxy services requests internally or by passing them on (possibly transformed). When message traffic is secured, a proxy is trusted with the plaintext content of the messages.
We use the term "origin server" for a server that is not a proxy.
An origin server may be a gateway into another system.
Proxies may or may not be semantically transparent to other clients and server. The level of transparency is controlled through the Web interfaces.
Tunnels are intermediates that do not in any way take part in the HTTP communication but merely act as transport relays.
The semantic content of certain request-response interactions may be cached to improve both the latency and traffic load presented by equivalent requests in the future. There is a way to determine when a cache entry is no longer valid (in the current web this is not very accurate). Document retrieval interactions are cachable, subject to the server's assent. Caching can be semantically transparent when required by all parties. It must be possible for clients to detect any potential relaxation of semantic transparency. In particular, semantic transparency can only be relaxed when explicitly requested by client or origin server. Furthermore, when semantic transparency is relaxed by a cache or client, an explicit warning is given to the end user [what's an "end user"?].
A reference to a resource is called a Uniform Resource Identifier (URI). A URI conveys the information needed to contact a particular resource and conduct an interaction with it. Many different schemes for resource access are possible in the Web, and URIs are structured in a way that reflects this. A URI is a character string (wrt unspecified character set) that consists of a short, centrally-administered scheme name (such as "http"), a colon (":"), and finally a scheme-specific following the URI syntax rules. A URI may be either name-like or address-like. HTTP has the option to support any URI scheme - either directly via origin servers or via gateways into other access schemes. "http" is one of the access schemes available in the Web. The character set and encoding issues are specified for http URIs, in a way that is at least as expressive as specifying the use of US-ASCII. An http URI is address-like: it contains a DNS name, an optional TCP port number, and a server-relative resource identifier. Some URIs --- in particular, http ones for the introductory resources for English-speaking individuals, organizations, products, projects, and so on --- are user-friendly (short, unambiguous, meaningful) enough to be printed on billboards, business cards, airplanes, and so on.
There is no central administration of all the servers nor all the clients in the web. Anybody can put up a server (of any kind), and anybody can run a client (of any kind). There is no global coordination of administration and/or security domains (beyond the minimal structure given us by the Internet). There is no expectation that every party will be using the same security systems. There is no expectation that every party will be running the same version of the relevant software, nor even speaking the same version(s) of the relevant protocols. Programs and protocols are developed independently by many parties and deployed into the existing web in such a way that they can interoperate to the degree that it makes sense.
Message traffic can be secured via SSL (among, perhaps, other means).
Before granting access to a resource, a server can require a client to authenticate itself. (This is at a higher layer than is secured by SSL, which is often used in a way that authenticates the server to the client but not vice versa). Two schemes are available for use here, and the server chooses which one(s) are acceptable. In one scheme, the client simply supplies a name and password for an account maintained by (or at least accessed by) the server. The other scheme avoids sending passwords by using a secure hash of them (and other data) instead.
A resource can migrate to a different server without breaking all the URIs printed in the world, in a way that requires the resource's former server to retain and report only a small amount of information about that resource (such as a new URI, as opposed to something like a full copy of the resource).
Even though an http URI includes a DNS name, a single machine can be the origin server for URIs that use many different DNS names.
An HTTP message can contain a MIME-typed value.
A common interaction is GET, which fetches a MIME-typed value.
The MIME type, natural language (if applicable), and other aspects of the particular value returned in a response are functions of a feature called "content negotiation", which is about returning the "best" representation from among several available. The selection may be performed by either client or server. To help the server make a good selection, the client can supply, in a request, information on its preferences for MIME type, character set, natural language, and other aspects. When the server makes a selection, the response indicates which information from the request affected the choice, so that a cache can apply the cached interaction as broadly as possible. To enable the client to make the selection, the server can return a list of the available alternatives and their characteristics.
Content negotiation and especially its interactions with caching must be fully understood
A common interaction is POST, in which the request conveys some name-value pairs of character strings, and the response returns a MIME-typed value (subject to content negotiation).

More-is-Better Axes for Evaluating HTTP*

(critical) Networking performance. Specifically, latency and bandwidth. (Common sense follows.) When optimizing the common cases at the expense of the less common ones, we can't let the less common ones get too bad --- and not even very bad at all if they're still fairly common. An example of the first caution is that when using cache techniques, things should not be very bad when there is a cache miss; an example of the second is that when cache misses are still fairly frequent (e.g., when a client caches information about servers recently used, a large fraction of requests (1/4?) are to new servers and thus miss in that cache), things should work pretty darned well even when there's a cache miss.
Local processing cost. That is, demands on CPU, memory, disk, etc of clients and servers. Sadly, this is sometimes antagonistic with minimizing network bandwidth requirements.
(critical) Scalability. That is, the applicability to a web that truly is world-wide, and still growing quickly. This involves not only the general efficiency issues of the previous two items, but also requirements to handle (a) large numbers of servers and clients, and (b) the expected patterns of interaction among them.
(critical) Modularity. This is a very important kind of simplicity, which the current HTTP lacks. This includes both layering and other kinds of modularity.
(critical) Evolvability. The most used protocol on the Internet (HTTP) is being asked to serve an ever-growing variety of purposes. It has already absorbed several evolutions, and more are coming. This axis is about how easy it is to make a protocol evolution "in isolation", and how easy it is to make a correct evolution in the full context of the web (i.e., other evolutions made, often independently, in the past, concurrent present, and future).
Suitability for extension by WebDAV. This axis is not about extensibility in the abstract, but readiness for a particular extension that is of great importance. This may be a non-trivial issue because WebDAV changes the nature of the web, from one where clients mainly read/use resources to one where clients also author resources.
(critical) Expressiveness. The RPC layer needs to enable expression of interfaces equivalent to current HTTP, and of the extensions we see coming down the pike. Easily identified extensions include the work of the webdav, acap, ipp, and mmusic WGs of the IETF. This is not a matter of whether the interface can be expressed at all (it always can, although sometimes with the loss of a small or a large amount of detail), but how useful the expression is.
(critical) Security. This includes the maximum strengths available, the variety of choices available, and the flexibility of the negotiation of these choices. When looking at strength, don't ignore denial-of-service attacks. Should support both public key and secret key mechanisms. Must respect the fact that the web spans multiple, very independent security domains. It is desirable to support the full variety of policy making and enforcing structures of various organizations. This includes, and is certainly not limited to, facilitating the job of a firewall (and its administrator).
Support of liberty. The HTTP design should place as few limits as possible on what a client or server does and who he/she/it requires/trusts to do things for him/her/it.
Transport flexibility. It is desirable to be able to run the Web over transport protocols other than TCP. More different even than Transactional TCP, state-sharing TCP, and MUXing TCP. At least because there are people who want to use HTTP* over something other than the TCP-like protocols of the Internet. Perhaps also because we might find we can't get ideal TCP-like services from the Internet initially, and we'll have to support a gradual shift (e.g., from TCP to T/TCP).
Support for resource migration: ability to redirect requests, to update pointers, to garbage collect unreferenced resources and/or forwarding pointers. Supporting migration is fairly important, but garbage collection is less important.
Support for resource replication. This naturally includes support for maintaining an appropriate consistency between a cache and its sources. This might include ways for a client to pick a good replica. This might include fine-grained delegation of rights to caches. This may interact with the WebDAV extension.
Support for privacy.
Limited trust exposures. That is, how little trust has to be extended to other parties for a given party to get its job done.
Support for nested and recursive RPCs. It would be a shame if the RPC mechanism prohibited, or introduced deadlocks in, these cases.
Support for small clients and/or small servers. For embedded applications, where a client or server can't call on much CPU, memory, disk, etc.
Support for internationalization.
Support for Quality Of Service negotiations. As in ATM and IP (i.e., RSVP). This is relatively unimportant.
Support for application robustness. An application should be able to control its usage of various resources. This may involve limiting or shedding load at several levels of abstraction. It may, in general, be useful to let the peer know the reason when an interaction is rejected or aborted due to resource limits.
Support for intellectual property rights management, including payment.
Support for disconnection operation.

Part II Short Term Goals

By the end June 1998 we intend to produce a design and prototype that demonstrate the following things.

The three-layer structure - transport, RPC, and web interfaces - described in the HTTP-NG project charter. We do not expect that any of the details involved (e.g., MUX design, choice/design of RPC system, particular web interfaces) will necessarily appear in the final HTTP-NG. The point here is simply to demonstrate feasibility with this kind of technology.
The performance and efficiency gains of the MUX and W3NG protocols. These are not the only performance gains that could be addressed for HTTP-NG in general - but the limited time frame permits examination of only a few of the identifiable opportunities. Indeed, since these are new protocols, it's not at all clear that we'll have time to implement and tune them to the point where they show actual improvements. Furthermore, there has been some debate whether we can expect to measure the improvements in our testbed, due to both the issues of creating circumstances under which these protocols give improvements and the magnitude of the improvements. Finally, since we do not intend to focus on local processing costs, there is the issue of how to make convincing measurements of the aspects on which we *would* focus.
The scalability of the existing web (except for local processing cost considerations). That is, we'll follow the existing web's completely decentralized architecture, and we won't make the networking costs noticeably worse. We are ignoring local processing costs only as a matter of expediency - we can only work on so much in the given time.
The extensibility of existing distributed object technology. That is, we won't try to add any new features corresponding to HTTP+(PEP|mandatory)'s extensibility by optional/required headers (which is not to say that existing technology can't model them in any way). We will show ways in which existing distributed object technology can handle extensibility concerns.

The prototype will be tested/demonstrated in a testbed. The testbed includes modified versions of existing servers (and clients, if we have the time), where an ILU implementation of the NG protocol substack is available instead of or in addition to the server's original HTTP substack. The testbed will also include robotic clients, to produce controlled loads. The WCG will deliver characterizations of the loads for us to apply.

Michael Spreitzer, Henrik Frystyk Nielsen
@(#) $id: Overview.html,v 1.12 1997/11/11 02:27:32 frystyk Exp $