Shared Information Space:
An Interactive, Collaborative System Enablement Perspective

Sankar Virdhagriswaran, Crystaliz Inc.
Mike Webb, Crystaliz Inc.
Jeff Mallatt, Crystaliz Inc.

Enabling interactive, concurrent, task oriented, collaborative business systems that integrate information from heterogeneous information sources is a crucial need that is yet to be realized over the WWW. At Crystaliz Inc., we have been developing LogicWare(TM) - a framework that addresses this goal. LogicWare enhances the current unidirectional, synchronous, file oriented, hyper-text architecture of WWW to a bi-directional, asynchronous, database oriented, compound document architecture. In addition, LogicWare provides a rationalized, extensible framework which can be used to add a number of services such as concurrency control, transactions, etc., in a modular fashion. Based on the experience of developing LogicWare, we will present recommendations on enhancing current HTTP/HTML infrastructure to support shared information spaces.

Requirements and LogicWare Responses

In order to develop collaborative business systems that utilize data from heterogeneous databases and use heterogeneous editors to support interactive, concurrent editing, a number of extensions need to be made to the current HTTP/HTML architecture. These extensions should support development of business systems that support structured, semi-structured, and unstructured tasks; facilitate interaction with multiple data sources; and manage the concurrency control aspects of multiple individuals interacting on the same information space. The requirements for change can be broadly classified into three areas:

  1. Content Model
  2. Data transfer
  3. Architecture

In the following sections, we describe the requirements and LogicWare's approach to addressing the requirements.

Content Model

The current content model of the WWW, HTML, is not extensible and limits clients to browsing hypertext information. In developing interactive, collaborative information systems, application developers need access to various content models that are suited for editing. Examples are text (a subset of hypertext), 2D graphics, 3D graphics, spreadsheets, databases, etc. These extensions to the content language need to be made in a rationalized way instead of mutating HTML into something it was not intended to be.

LogicWare uses a meta-class oriented object extension to MIT Scheme to implement a rationalized set of content models to interact with different editors (e.g., spreadsheet programs, word processors, etc.).

Data Transfer

HTTP is a navigation oriented, single-file-per-connection protocol. This is insufficient for dealing with object oriented file systems that do not make a clean distinction between files and directories (e.g. OLE storage, OpenDoc Bento, or Object Management Architecture Common Object Services storage services) and database systems that support the notion of queries and cursoring. The HTTP protocol needs to be extended to transfer multiple segments over multiple connections while still presenting the appropriate interaction model to the users.

LogicWare uses an asynchronous, out-of-band, back channel approach to solve the above problem. In order to save space, we do not present the details of the approach here.
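Since the details are omitted here, the following is only a generic sketch of the requirement stated above (multiple segments arriving over multiple connections, yet presented to the user as one object). It is not LogicWare's back channel mechanism. The `Segment` and `Reassembler` names and the field layout are assumptions made for illustration, and the code is Python rather than MIT Scheme.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """One piece of a larger object (hypothetical wire unit, not part of HTTP)."""
    object_id: str
    index: int      # position of this segment within the object
    total: int      # number of segments that make up the object
    payload: bytes


@dataclass
class Reassembler:
    """Collects segments that may arrive out of order on different connections."""
    pending: dict = field(default_factory=dict)

    def accept(self, seg: Segment):
        """Store a segment; return the whole object once every segment is in."""
        parts = self.pending.setdefault(seg.object_id, {})
        parts[seg.index] = seg.payload
        if len(parts) < seg.total:
            return None          # still waiting for more segments
        del self.pending[seg.object_id]
        return b"".join(parts[i] for i in range(seg.total))
```

A client would call `accept` for each segment as it arrives, on whichever connection delivers it, and hand the object to the editor only when a non-`None` value comes back.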

Architecture

The current architecture of HTTP/HTML clients and servers is weak in four areas:

  1. Interaction
  2. Information space integration
  3. Extensibility
  4. Customizability

Interaction

The current architecture of HTTP is based on a synchronous, connection oriented interaction model. This needs to change to an asynchronous, connectionless model of interaction if interactive applications such as asynchronous conferencing (which involves asynchronous multi-cast to clients) or workflow (which involves sending asynchronous notifications to clients) are to be developed.

LogicWare implements an asynchronous protocol based on the Knowledge Query and Manipulation Language (KQML). Protocol elements of KQML such as broadcast, subscribe, etc., are used to address the above requirements.
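To make the subscribe and broadcast interaction concrete, here is a minimal in-process sketch. It borrows three KQML performative names (`subscribe`, `tell`, `broadcast`) but greatly simplifies their semantics. The `Router` class, the dictionary message layout, and the `topic` field are assumptions for illustration, not KQML syntax and not LogicWare's implementation.

```python
from collections import defaultdict


class Router:
    """Toy in-process facilitator; real KQML agents exchange messages over a network."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of client names
        self.inboxes = defaultdict(list)       # client name -> queued messages

    def handle(self, msg):
        """Dispatch one message (a dict) on its KQML-style performative."""
        kind = msg["performative"]
        if kind == "subscribe":
            # Remember who wants future updates about this topic.
            self.subscribers[msg["topic"]].append(msg["sender"])
        elif kind == "tell":
            # Queue a notification for each subscriber; nobody blocks waiting.
            for client in self.subscribers[msg["topic"]]:
                self.inboxes[client].append(msg["content"])
        elif kind == "broadcast":
            # Queue the content for every client that has subscribed to anything.
            known = {c for names in self.subscribers.values() for c in names}
            for client in sorted(known):
                self.inboxes[client].append(msg["content"])
        else:
            raise ValueError(f"unsupported performative: {kind}")
```

The point of the sketch is the direction of flow: the server side pushes into client inboxes when something changes, instead of each client polling with a fresh synchronous request.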

Information Space Integration

Due to the architecture of CGI, gateways to foreign information spaces (e.g., relational databases) are limited to merging snapshots of information from the foreign information space into the hyper-text information space. This is not sufficient for developing shared information spaces.

First, fundamentally different types of gateways need to be developed to integrate with various information sources. Relational databases can be integrated with a synchronous gateway. On the other hand, transaction processing systems are usually integrated using an asynchronous, messaging gateway. Editors and object oriented databases are integrated using an asynchronous, event oriented gateway. These gateways need to support sending queries to the information source, extracting the results, exporting and importing catalog (schema) information, letting a WWW client cursor through the extracted information (useful in the case of large snapshots extracted from databases), and synchronizing the extracted snapshot with the data source that owns it. Furthermore, application developers may want to use a federated approach, in which a global schema that manages the interactions with the local schemas of the foreign data sources is implemented within the compound document information space. In that case, the extensions need to support a way of modeling the global schema and its interactions with the local schemas.

LogicWare implements different types of adapters that provide the gateway services presented above. Cursoring is supported by using MIT Scheme's collections processing capabilities (e.g. list processing, array processing, etc.). Federation is supported by providing an object oriented language which can be used to model the global schema and by using the content models of the gateways to link up the global schema with the local schema.
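As an illustration of cursoring through a large snapshot, the sketch below pages through query results from a synchronous relational gateway. It uses Python's standard `sqlite3` module as a stand-in data source instead of MIT Scheme collections. The helper names (`open_snapshot`, `next_page`, `demo_catalog`) and the `parts` table are assumptions for this example only.

```python
import sqlite3


def open_snapshot(conn, query, params=()):
    """Run a query and return a cursor the client can page through."""
    return conn.execute(query, params)


def next_page(cursor, size):
    """Fetch up to `size` more rows; an empty list means the snapshot is exhausted."""
    return cursor.fetchmany(size)


def demo_catalog():
    """Build a small in-memory table to stand in for a foreign data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO parts (name) VALUES (?)",
                     [("bolt",), ("nut",), ("washer",)])
    return conn
```

A client that asks for two rows at a time would call `next_page(cursor, 2)` repeatedly until it receives an empty list, so the full result set never has to cross the wire in one response.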

Second, in order to support usage of structured, semi-structured, or unstructured compound document information spaces, extensions need to be placed in the HTTP stream. For structured compound document information spaces, which can be organized a priori (e.g., for a product catalog application), the key concept of collections needs to be added. Semi-structured information spaces need to be supported with a query mechanism that operates over distributed indices that are pre-set. Additionally, for interactions that are of short duration and are performance intensive, the extension needs to support a way of running the queries as stored procedures on the compound document information space. On the other hand, for interactions that are not time critical but need to be dynamic, dynamic query execution also needs to be supported. Unstructured information spaces, where a priori decisions on organization cannot be made, need to be supported with a dynamic index generation extension which creates indices based on usage and/or access patterns.

LogicWare uses the collection data structures in MIT Scheme to provide support for collections. Querying is supported through a forward and backward chaining rule based interpreter. Stored procedures are supported through dynamic loading of libraries. Finally, index generation is supported by the use of agents written in MIT Scheme which intercept the HTTP/HTML stream to create dynamic indices. A distributed directory system can be built on top of such indices using the distributed KQML protocols supported by LogicWare (e.g. publish/subscribe, advertise, etc.).
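For readers unfamiliar with rule based querying, the following is a minimal sketch of the forward chaining half only; backward chaining is not shown. It is a textbook propositional version written in Python, not LogicWare's interpreter. The `forward_chain` function and the sample fact names used with it are assumptions for illustration.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts appear.

    facts: iterable of hashable facts (plain strings here).
    rules: list of (premises, conclusion) pairs; a rule fires when all
           of its premises are already known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known
```

For example, given the rules "in-catalog and in-stock imply orderable" and "orderable and approved imply shippable", starting from the three base facts derives both conclusions, while starting from "in-catalog" alone derives nothing new.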

Third, transaction management also needs to be supported by the extensions in order to manage updates. Different transaction models on the compound document information space including distributed transactions and long transactions need to be supported by the extensions.

Our plan is to integrate an existing TP monitor product such as Novell's Tuxedo or Transarc's Encina to support transaction processing for short duration transactions. We are exploring the use of version management systems or object oriented databases with version management capabilities to support long duration transactions.
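One way version management can stand in for locks during a long duration transaction is an optimistic check-out/check-in scheme, sketched below. This is a generic illustration of the idea and not a description of any product named above or of a decision made for LogicWare. `VersionedStore`, `VersionConflict`, and the method names are hypothetical.

```python
class VersionConflict(Exception):
    """Raised when a check-in is based on a stale version."""


class VersionedStore:
    """Keeps every committed version of each document (toy, in memory)."""

    def __init__(self):
        self.history = {}   # document name -> list of committed contents

    def check_out(self, name):
        """Return (version number, content); version 0 means 'not yet created'."""
        versions = self.history.get(name, [])
        return len(versions), (versions[-1] if versions else "")

    def check_in(self, name, base_version, content):
        """Commit only if nobody else committed since `base_version`."""
        versions = self.history.setdefault(name, [])
        if base_version != len(versions):
            raise VersionConflict(
                f"{name}: based on v{base_version}, now at v{len(versions)}")
        versions.append(content)
        return len(versions)
```

Two authors can hold the same document for hours without blocking each other; the second one to check in receives a `VersionConflict` and must merge against the newer version before retrying.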

Extensibility

Extensibility is limited for two reasons: a) a content model focused on hyper-text browsing and b) a lack of standard approaches for extending clients and servers. The content model limitation was presented above. In addition, the current approaches to extending clients and servers are limited. Currently, at least three popular ways of extending the clients exist. The first is the SDI from SpyGlass and Netscape, which provides an API for extending the clients on Microsoft platforms. However, this approach does not transfer over to Unix/X platforms or Macintosh platforms. The second is Java[TM] and the HotJava[TM] browser from Sun Microsystems. The third comprises the VRML clients. All of these choices provide different functionality and point to different ways of extending the clients. This becomes an even more vexing problem when additional content models that support editing of content using locally accessible clients have to be integrated into the clients.

LogicWare uses the underlying compound document architecture supported by the platforms (MS-OLE, Apple OpenDoc, etc.) to implement a container that can be used by multiple applets interacting to produce a document.

Server extensions are more easily made through the CGI interface, but the extensions that can be made are limited by the functionality currently provided by CGI. CGI does not provide the support for connection management, program loading, and in-memory context passing that is needed to deploy performant extensions to servers.

LogicWare uses OreO from OSF/Research Institute to address the connection management and program loading problem. However, it does suffer a performance penalty by using the context passing mechanism currently supported by CGI.

Customizability

In addition to the need to extend clients, there is a need to develop task oriented applets that combine the services of different applications. These customizations usually involve going across client-server pair (e.g. HTTP, NNTP, etc.) boundaries.

LogicWare uses the compound document automation architecture of the underlying platform (e.g. MS OLE automation, Apple Open Scripting Architecture) to support the above requirement.

Conclusion

Our approach with LogicWare and other comparable approaches point out the limitations of the current HTTP/HTML architecture in supporting interactive, concurrent, collaborative system development. HTTP is a unidirectional, file oriented, synchronous, hyper-text protocol. What is needed for building interactive, concurrent, collaborative applications is a bi-directional, object oriented, asynchronous, compound document protocol. Furthermore, HTML is a hyper-text display oriented content model. Interactive editing requires a rationalized set of content models that can be used to support editing of a compound document with many sub-elements of various types.