A DPE Backbone for the World Wide Web

Mika Liljeberg, Heikki Helin, and Kimmo Raatikainen
University of Helsinki, Department of Computer Science
EC/ACTS-project DOLMEN (AC036): Service Machine Development
for an Open Long-term Mobile and Fixed Network Environment


The EC/ACTS project DOLMEN uses Information Browsing as the application in a trial. An international trial involving both fixed and mobile networks (configuration outline) on a pan-European scale will take place in May 1996. In this context we have implemented a subset of the functionality specified in HTTP-NG [W3C2]. We call our subset the HTTP/Session Protocol (HTTP/SP); it is implemented on top of the Inter-Language Unification (ILU) system [PARC]. The essential features of HTTP/SP include:

  1. compatibility with the current HTTP,
  2. separation of the control and management planes from the user plane,
  3. permanent session connection between a client and a proxy, semi-permanent session connections between a proxy and servers,
  4. replicated caches maintained by proxies,
  5. automatic prefetching of inline images.
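Feature 5 can be illustrated with a minimal sketch. It assumes that a proxy scans a retrieved HTML document for img tags and collects their resolved URLs as prefetch candidates. The names InlineImageCollector and inline_image_urls are hypothetical and are not part of HTTP/SP; how the actual implementation selects images is not described here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class InlineImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def inline_image_urls(base_url, html_text):
    """Return absolute inline-image URLs in document order, without duplicates."""
    collector = InlineImageCollector()
    collector.feed(html_text)
    seen = set()
    urls = []
    for src in collector.sources:
        absolute = urljoin(base_url, src)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls
```

A proxy following this scheme would request each returned URL in the background after delivering the document itself, so that the images are already cached when the client asks for them.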

The role of a Distributed Processing Environment (DPE) is to provide the control and management planes of communication as defined in the ISDN Reference Model and to provide some of the transparencies (location and replication, in particular) as defined in the Reference Model for Open Distributed Processing [ODP].

Perspective

The current information browsing technology (i.e. the World-Wide Web) in the Internet is characterized by a number of fundamental problems. Performance is poor due to obsolete protocols and due to the lack of suitable caching and propagation methods for frequently accessed documents; see [ArW96,CrB96,PaM95] for some excellent analyses of WWW traffic patterns. Finding information is difficult since WWW has no built-in directory service. Publishing documents is equally difficult because there is no coherent way of making a new document known to the public. The availability and reliability of servers are often compromised due to network problems, server problems, and faulty software. There is no integrated support for replication to overcome these problems. The consistency of information in the Web is questionable. For instance, one often finds links that point to removed documents or old versions of the document.

A coherent new information browsing infrastructure is needed to solve the problems in the current technology. Consequently, the World-Wide Web Consortium (W3C) has embarked on an active program that aims to define a new information browsing architecture and achieve its implementation on a global scale. The W3C activities [W3C1, W3C2, W3C3] address some of these problems.

The role of information browsing in the DOLMEN project is to be the application that tests the validity and functionality of a telecommunications infrastructure based on the DOLMEN Service Architecture, called OSAM, and provided by the DOLMEN Service Machine. We contend that information browsing technology as it is today is insufficient for challenging, advanced telecommunication architectures. Therefore, we should stay abreast of current research in the World-Wide Web community and develop new solutions. Here we propose new solutions and their implementations on top of the DOLMEN Service Machine, using a DPE as one of the enabling technologies.

Figure 1: A WWW Backbone Network

Figure 1 illustrates the authors' view of how an efficient information browsing system for non-real-time multimedia information might be structured in the future. The Web in the figure consists of interlinked WWW Cache Servers (WWW Proxy Servers), which function as the backbone for all WWW traffic. The cache servers can be structured hierarchically, each of them made responsible for a well-defined geographical area or administration domain. The proxies can also act as gateways to outside networks, such as the Internet. In the case of the Internet the gateway functionality is needed to maintain interoperability with the current WWW software. This structure is reminiscent of how global naming services are organised.

A WWW browser always connects to a local WWW server for locally maintained information, or to a local cache server for outside information. Cache servers buffer retrieved documents so that frequently referred information tends to move closer to the users. Both WWW servers and caches can be replicated in order to improve availability and performance. Replication is necessary at least in the cache servers forming the backbone network. The new WWW servers and cache servers communicate using an enhanced hypertext transfer protocol that is designed for the efficient transfer of hypermedia information. The protocol might, for instance, support multicasting cache invalidation messages and updates of frequently referred documents to the backbone cache servers.
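The caching and invalidation behaviour described above can be sketched as follows. This is a toy model under stated assumptions: the class CacheServer, the function multicast_invalidation, and the fetch_from_origin callback are hypothetical names, and the multicast is modelled as a plain loop over the backbone caches rather than a network-level multicast.

```python
class CacheServer:
    """A toy cache server; fetch_from_origin stands in for an upstream request."""

    def __init__(self, name, fetch_from_origin):
        self.name = name
        self._fetch = fetch_from_origin
        self._documents = {}      # url -> cached document body
        self.origin_fetches = 0   # how often we had to go upstream

    def get(self, url):
        # Serve from the cache when possible; otherwise fetch and remember.
        if url not in self._documents:
            self._documents[url] = self._fetch(url)
            self.origin_fetches += 1
        return self._documents[url]

    def invalidate(self, url):
        # Drop a stale copy; the next get() fetches a fresh one.
        self._documents.pop(url, None)


def multicast_invalidation(caches, url):
    """Deliver an invalidation to every backbone cache (modelled as a loop)."""
    for cache in caches:
        cache.invalidate(url)
```

Without the invalidation message a cache keeps serving its old copy after the origin document changes, which is the consistency problem the enhanced protocol is meant to address.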

Real-time multimedia information requires guaranteed-bitrate end-to-end connections between two or more communicating end-user clients and real-time multimedia application servers. This is very different from the scheme for non-real-time services, which do not require end-to-end connections and can cope with available bitrate. While an infrastructure based on caching proxies does not make sense in this context, some level of integration between the two worlds can still be envisioned. For instance, a person's home page might contain a hypertext link that, when clicked, tries to establish a video call to that person's video phone. In this way, real-time services could be invoked from an information browsing service. This has an implication for naming: all services accessible in this manner should be mapped into a common namespace with information resources.

Distributed Processing in Future Information Systems

The main functionality of a distributed processing environment is to provide distribution transparencies for applications. These transparencies include access, location, failure and replication transparency. Thus, a DPE can be used to increase the performance and dependability of distributed applications. Here we consider how the advanced information browsing architecture outlined above might be realized with a DPE as the enabling technology.

Figure 2: WWW Server Clusters

We have already introduced the notion of interlinked, replicated, caching WWW Proxies forming the backbone of an information browsing network. Figure 2 illustrates a possible realization of the backbone network in a DPE. CORBA is used as the sample environment. The figure presents two types of clusters: a WWW Cache Cluster and a WWW Server Cluster. Each cluster is a part of a local distributed processing environment and has access to the services of an Object Request Broker (ORB) that also provides interoperability between clusters.

In the example given in Figure 2 the Cache Cluster has two instances of Caching WWW Proxy Servers that share a common cache in a local distributed file system. The proxies bind to other proxies and servers through the ORB and communicate with the enhanced HTTP protocol, which is implemented on top of CORBA's Internet Inter-ORB Protocol (IIOP) or one of the other interoperability protocols. The Server Cluster in the example has a non-replicated WWW server and a Search Engine that provides search services for information maintained in the cluster. All the servers and proxies register themselves with the local ORB in order to join the global environment. The ORB provides access, location and replication transparency to all clients. Replicated proxies and servers provide improved failure transparency.
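The registration and transparency properties described above can be illustrated with a small in-process sketch. ToyBroker is a hypothetical stand-in for the ORB and does not use or imitate any CORBA API; replicas are modelled as plain callables, and an unavailable replica is assumed to raise ConnectionError.

```python
class ToyBroker:
    """In-process stand-in for an ORB binding service (not a CORBA API)."""

    def __init__(self):
        self._replicas = {}   # service name -> list of registered replicas

    def register(self, service_name, replica):
        # Servers and proxies register themselves under a service name.
        self._replicas.setdefault(service_name, []).append(replica)

    def invoke(self, service_name, request):
        """Try each replica in turn; the caller never learns which one answered."""
        last_error = None
        for replica in self._replicas.get(service_name, []):
            try:
                return replica(request)
            except ConnectionError as error:
                last_error = error   # this replica is down, try the next one
        raise LookupError(f"no available replica for {service_name!r}") from last_error
```

The client names only the service, not a host (location transparency), and a failed replica is masked as long as another one is registered (replication and failure transparency).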

Each cluster is in charge of a specific domain. Domains have names and the servers within a domain derive their names from the domain name. Domains are structured hierarchically, with Server Clusters forming the lowest level domains and Cache Clusters joining them as higher level domains until a global domain is attained. Information resource names are formed by concatenating domain names starting from the global domain and working down towards the lowest level domain. Information resources are not directly registered with CORBA because of the huge amount of information. Instead, only servers and databases are registered. There is a school of thought that wants to make information resource names completely opaque (i.e. independent of location). We think this is an unrealistic approach because it would involve registering all the tens or hundreds of millions of information resources in a global name space. We would settle for replacing the currently employed server name with the more flexible domain name.
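The naming rule can be sketched as follows. The Domain class, the resource_name function, the "/" separator, and the example domain names are all assumptions made for illustration; the text does not fix a concrete name syntax.

```python
class Domain:
    """A node in the domain hierarchy; parent is None for the global domain."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def path(self):
        """Domain names from the global domain down to this domain."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


def resource_name(domain, local_name, separator="/"):
    """Concatenate the domain path and the resource's local name."""
    return separator.join(domain.path() + [local_name])


# Example hierarchy (hypothetical names): global -> fi -> helsinki
world = Domain("global")
fi = Domain("fi", world)
helsinki = Domain("helsinki", fi)
print(resource_name(helsinki, "report.html"))
```

Only the three domains would need to be registered in this example; the resource itself is located through the lowest level domain in its name.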

Clients send requests to the proxy at their local domain. The proxy is then responsible for finding the actual information. If the information is not stored or cached in the local domain, the proxy bumps the request up into a higher level domain. The process continues until either the information has been found in a cache or the proper server has been located. Neighbouring domains might also define shortcuts (i.e. direct routes) to each other so that requests between them can be forwarded directly without involving higher level domains.
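A minimal sketch of this lookup, under simplifying assumptions: each domain is represented by a single DomainProxy object, its store dictionary holds whatever is stored or cached in that domain, and the step of descending from a higher level domain to the proper server is omitted. All names are hypothetical.

```python
class DomainProxy:
    """Toy proxy for one domain in the hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent    # proxy of the next higher level domain, if any
        self.store = {}         # resource name -> document stored or cached here
        self.shortcuts = []     # neighbouring proxies reachable directly

    def resolve(self, resource):
        """Return (document, route); route lists the domains consulted in order."""
        route = []
        node = self
        while node is not None:
            route.append(node.name)
            if resource in node.store:
                return node.store[resource], route
            # Try direct routes to neighbours before escalating.
            for neighbour in node.shortcuts:
                if resource in neighbour.store:
                    route.append(neighbour.name)
                    return neighbour.store[resource], route
            node = node.parent   # bump the request up one level
        raise KeyError(resource)
```

With a shortcut from domain a to a neighbouring domain b, a request from a for a document held in b is answered over the route a, b and never reaches their common parent.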

In the DOLMEN Service Architecture, Server Clusters are application servers. The Cache Clusters are value-added services that run on service nodes hosting the DOLMEN Service Machine. Cache Clusters may also act as application gateways to legacy information systems, such as old-style WWW, Gopher, and WAIS. Naming services provided by the DOLMEN Service Machine would be used to integrate information resources and real-time services under a coherent name space. Connection management services provided by the service machine would be used to manage the connections between Cache Clusters as well as between clients and servers. In the future the DOLMEN Service Machine may provide a component supporting management of distributed information that could be utilised to implement a distributed cache within a cluster.

An implementation based on the Inter-Language Unification (ILU) System [PARC] is briefly summarized in an appendix. A public DOLMEN deliverable entitled "Evaluation of Current Communications Technology in Hypermedia Information Browsing" is scheduled to be published in July 1996.

References

[ArW96] M. Arlitt, C. Williamson. Web Server Workload Characterization: The Search for Invariants. ACM SIGMETRICS 96-5, Philadelphia, PA, USA, 1996.

[CrB96] M. Crovella, A. Bestavros. Explaining World Wide Web Traffic Self-Similarity. ACM SIGMETRICS 96-5, Philadelphia, PA, USA, 1996.

[ODP] ISO/ITU. Reference Model for Open Distributed Processing.

[PaM95] V. Padmanabhan, J. Mogul. Improving HTTP Latency. Computer Networks and ISDN Systems, 28 (1 and 2), December 1995, pp. 25-35.

[PARC] Xerox PARC. Inter-Language Unification - ILU.

[W3C1] W3 Consortium. Propagation, Replication and Caching.

[W3C2] W3 Consortium. W3C Activity: HTTP-NG, the Next Generation.

[W3C3] W3 Consortium. WWW and OOP.


May 13, 1996

Mika.Liljeberg@cs.Helsinki.FI
Heikki.Helin@cs.Helsinki.FI
Kimmo.Raatikainen@cs.Helsinki.FI