A Formalism for Internet Information References

Daniel W. Connolly
$Id: formalism.html,v 1.6 1995/02/08 08:19:38 connolly Exp $

Abstract

This is a mathematical model of computation for the resolution of references between typical information objects in distributed information applications in the internet community; specifically, the model covers the URI concept as used in HTTP and the World Wide Web, and the external body mechanism from MIME.

It provides a foundation for a formal definition of the semantics of so-called meta-information in HTTP and URCs, so that distributed computing issues such as reliability, scalability, and security can be explored.

See Resource Discovery and Reliable Links for a discussion of terminology, related issues, and related discussion resources.

Introduction

From Integration of Internet Information Resources (iiir), Mon Apr 18 22:00:54 CDT 1994:

The Integration of Internet Information Resources Working Group (IIIR) is chartered to facilitate interoperability between Internet Information Services, and to develop, specify, and align protocols designed to integrate the plethora of Internet information services (WAIS, ARCHIE, Prospero, etc.) into a single ``virtually unified information service'' (VUIS).

The body of internet information resources is available through a number of widely deployed technologies (FTP, Gopher, HTTP, WAIS), and there are several successfully deployed applications that combine these technologies to provide information consumers with a consistent model of information regardless of the underlying technology.

But this consistent model breaks down due to a variety of faults, and the user is often left to wonder where the heart of the problem lies. With a comprehensive model of computation, it will at least be possible to define the correct behaviour and a set of fault detection and fault tolerance mechanisms.

On the other hand, this user model has enabled a much larger audience to access internet information resources. The result is a noticeable increase in network traffic. The client-server model where all N information clients make round trips to all M information servers, creating traffic on the order of NxM, is slowly giving way to resource migration and load-balancing techniques (e.g. caching and mirroring). But these techniques are being deployed in an ad-hoc fashion, and it is not clear that, for example, proxy servers do not introduce complications to the underlying protocols.

Security, privacy, and intellectual property issues are only beginning to be addressed. (For example, proxy servers completely punt on the issue of caching access-controlled documents.) Ad-hoc techniques are not acceptable strategies for addressing these issues.

The technology to support the growing base of internet information resources will only get more sophisticated as we attack the problem of large-scale data reduction (resource discovery and navigation) and as we employ the machine more and more to augment learning and the maintenance of information. Formal techniques are necessary to reduce the complexity of such technology.

Foundations

This formalism is based on the first-order, many-sorted logic of Larch, in the hope that it can be integrated into the development of software that implements the formalism.

The formalism comprises the following Larch traits:

Reliable Caching and Mirroring

An Example Scenario

This section is somewhat out of date.

As an example, consider successive accesses of http://S/path via an HTTP proxy P:

  1. Client C1 sends a request req1 for r=http://S/path to P.
  2. P contacts S, makes a GET request for /path, and receives a response resp1:
    	Date: Mon Dec 12 19:30:39 CST 1994
    	Last-Modified: Fri Dec 9 19:30:39 CST 1994
    	Expires: Fri Dec 16 19:30:39 CST 1994
    	Content-Type: text/plain
    	
    	blah blah blah...
    

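    Let t0 be the time of the response (the Date: header), t1 the Last-Modified: time, t2 the Expires: time, and e0 the entity body returned in resp1.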

    Then at this point, the proxy knows:

    	e0 = HTTPGet(req1, t0), i.e.
    		e0 in Represent(r, t0)
    		and e0 minimizes AcceptPenalty(req1, Represent(r, t0))
    			    over Represent(r, t0)
    	Last-Modified(r, t0) = t1
    	Expires(r, t0) = t2
    

    Since there are no URI: headers in the response, the proxy also knows:

    	Represent(r, t0) = {e0}
    
  3. The proxy passes resp1, containing e0, on to the client C1. It updates its cache so that Pcache[r] = resp1.
  4. Client C2 makes a request req2 via the proxy P. req2 has the same URI as req1.
  5. P examines its cache and finds Pcache[r] = resp1.
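The scenario stops at the point where P finds a cached response; it does not say what P does next. The following is a minimal sketch of one plausible continuation, not part of the formalism itself. It assumes that P may answer from Pcache[r] only while the request time is strictly earlier than the Expires time, and otherwise contacts S again. The names Response, Proxy, origin, and origin_hits are hypothetical and introduced only for illustration; times are naive datetimes standing in for the CST values in the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict


@dataclass
class Response:
    """A response as in the scenario: the entity plus its header times."""
    entity: str               # e0
    date: datetime            # t0, the Date: header
    last_modified: datetime   # t1, the Last-Modified: header
    expires: datetime         # t2, the Expires: header


class Proxy:
    """Hypothetical proxy P with a cache keyed by URI (Pcache in the text)."""

    def __init__(self, fetch: Callable[[str], Response]) -> None:
        self.fetch = fetch                     # stands in for "P contacts S"
        self.cache: Dict[str, Response] = {}   # Pcache
        self.origin_hits = 0                   # number of round trips to S

    def get(self, uri: str, now: datetime) -> Response:
        cached = self.cache.get(uri)
        # Assumption (not stated in the source): a cached response may be
        # reused only while the current time is strictly before Expires.
        if cached is not None and now < cached.expires:
            return cached
        self.origin_hits += 1
        resp = self.fetch(uri)
        self.cache[uri] = resp                 # Pcache[r] = resp
        return resp


def origin(uri: str) -> Response:
    """Stand-in for server S, returning the response shown in step 2."""
    return Response(
        entity="blah blah blah...",
        date=datetime(1994, 12, 12, 19, 30, 39),
        last_modified=datetime(1994, 12, 9, 19, 30, 39),
        expires=datetime(1994, 12, 16, 19, 30, 39),
    )


proxy = Proxy(origin)
r = "http://S/path"
resp1 = proxy.get(r, datetime(1994, 12, 12, 19, 30, 39))  # request from C1
resp2 = proxy.get(r, datetime(1994, 12, 13, 8, 0, 0))     # C2, before t2
resp3 = proxy.get(r, datetime(1994, 12, 17, 8, 0, 0))     # after t2
```

Under the stated assumption, the request from C2 is served from the cache without contacting S, while the request made after t2 causes a second round trip. Whether reuse before Expires is actually correct is one of the questions the formalism is meant to answer, so the freshness test here should be read as a placeholder.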

Future Discussion

Other attributes of references

There are certain attributes common to many information resources. For the resources to which some of these attributes apply, we should develop models and mechanisms that make it possible to compute the attributes reliably:

Other Topics