Maintaining the Dependencies between Web Pages

Status

Writing up some thoughts. Started with some discussion with Ted Guild during HK trip.

Introduction

The Problem

The same information appears on the web in multiple places, often in very different forms, with no guarantee that a change made in one place will be properly reflected in the others and no general mechanism for propagating such changes. A particularly tricky form of this problem occurs when one organization provides the source information and updates it unpredictably, while another organization tries to provide accurate derived information on other pages.

A General Solution

If people provide formal declarations of the intended relationships between web pages, then software can (a) check whether such relationships are current, and (b) in some cases propagate the changes automatically.

These formal declarations can be made using an RDF logic, along with ontologies for web operations and (for content transformation) web services. The declarations can be presented for user editing via a web GUI or directly in an RDF language.

Forward and Backward Chaining

A general implementation of this system would be an HTTP/1.1 spider agent that does GETs, checks declarations, invokes services, and does PUTs to propagate changes where appropriate. This might be useful, but it would have annoying propagation delays. @@@ is relying on PUT a problem?
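
One pass of such a spider might be sketched as follows. This is only an illustration under stated assumptions: the Declaration record, the propagate_once function, and the injected http_get and http_put callables are hypothetical names, and the declaration is reduced to "the target should equal a transformation of the source" rather than being read from RDF.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Declaration:
    """Hypothetical record: target_url should hold transform(source content)."""
    source_url: str
    target_url: str
    transform: Callable[[str], str]


def propagate_once(
    declarations: Iterable[Declaration],
    http_get: Callable[[str], str],
    http_put: Callable[[str, str], None],
) -> List[str]:
    """Run one spider pass and return the target URLs that were rewritten.

    http_get and http_put are supplied by the caller (assumption), so the
    sketch does not depend on any particular HTTP library.
    """
    updated = []
    for decl in declarations:
        # (a) check whether the declared relationship is current
        expected = decl.transform(http_get(decl.source_url))
        if http_get(decl.target_url) != expected:
            # (b) propagate the change to the dependent page
            http_put(decl.target_url, expected)
            updated.append(decl.target_url)
    return updated
```

Passing the GET and PUT operations in as functions keeps the open question about relying on PUT outside the sketch: any other write mechanism could be substituted.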

A variation, "backward chaining," works when the publisher of a dependent page installs special software: it can check at GET-time whether any dependencies are out of date and force a rebuild before providing content. (This can be throttled, of course, checking no more than once every N seconds and guaranteeing only synchronization within N seconds.)
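
The throttled GET-time check could look roughly like this. It is a minimal sketch, not the actual software: ThrottledRebuilder, is_stale, rebuild, and min_interval are hypothetical names, and the staleness test and the rebuild step are left to the publisher.

```python
import time


class ThrottledRebuilder:
    """Serve a dependent page, checking its dependencies at most once
    every min_interval seconds (the N in the text above)."""

    def __init__(self, is_stale, rebuild, min_interval, clock=time.monotonic):
        self.is_stale = is_stale          # callable: are any sources newer?
        self.rebuild = rebuild            # callable: regenerate the content
        self.min_interval = min_interval  # seconds between dependency checks
        self.clock = clock                # injectable for testing
        self.last_check = None
        self.content = None

    def get(self):
        """Called at GET time; returns content at most min_interval stale
        relative to the last dependency check."""
        now = self.clock()
        due = self.last_check is None or now - self.last_check >= self.min_interval
        if due:
            self.last_check = now
            if self.content is None or self.is_stale():
                self.content = self.rebuild()
        return self.content
```

Between checks the cached content is returned without contacting the sources, which is what bounds the load at the cost of synchronization only to within N seconds.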

Another variation, "forward chaining," works when the publisher of the source page maintains the page in a system with change-notification hooks, like CVS. In this case, dependent pages can simply be updated whenever the relevant source pages change.
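
As a rough sketch of the forward-chaining side, assume the change-notification hook calls a function with the URL and new content of the changed source. The DependencyRegistry class and its method names are hypothetical, and how a real hook (for example in CVS) would be wired to it is not shown.

```python
from collections import defaultdict


class DependencyRegistry:
    """Maps each source page to the dependent pages declared against it."""

    def __init__(self):
        self._dependents = defaultdict(list)

    def declare(self, source_url, dependent_url, rebuild):
        """Record that dependent_url is derived from source_url via rebuild."""
        self._dependents[source_url].append((dependent_url, rebuild))

    def source_changed(self, source_url, new_content):
        """Entry point for the change-notification hook (assumption).

        Returns a mapping of dependent URL to its regenerated content;
        publishing that content is left to the caller.
        """
        updated = {}
        for dependent_url, rebuild in self._dependents.get(source_url, []):
            updated[dependent_url] = rebuild(new_content)
        return updated
```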

@@@@ a diagram could help a lot

Bidirectional Propagation

Separately from the direction of chaining (which concerns how the system learns that propagation is needed), we can also talk about the direction of propagation. Imagine page A and page B have been declared to have identical content. Ideally, we can allow either to be changed, and the system can propagate the change to the other. Only in the case of simultaneous changes would an automatic system have difficulties.
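
For the identical-content case, one way to decide the direction of propagation is to compare both pages against the content recorded at the last successful synchronization. The sketch below assumes such a record is kept; synchronize and ConflictError are hypothetical names.

```python
class ConflictError(Exception):
    """Both pages changed since the last synchronization."""


def synchronize(last_synced, a, b):
    """Return the content both pages should hold after synchronization.

    last_synced is the content the two pages shared when they were last
    known to be identical (keeping this record is an assumption).
    """
    if a == b:
        return a          # already identical, nothing to propagate
    if a == last_synced:
        return b          # only B changed: propagate B to A
    if b == last_synced:
        return a          # only A changed: propagate A to B
    # Simultaneous, differing changes: no automatic resolution.
    raise ConflictError("pages A and B were both changed")
```

The final branch is the simultaneous-change case, which the sketch reports rather than attempts to resolve.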

This principle can be extended to cases more complicated than simple equivalence of content. Some transformations of content are easily reversible. For many transformations, however, the reverse is either programmatically or computationally intractable.

The Software

@@@@ Example configuration and usage

@@@@ Internals

Sandro Hawke
$Date: 2001/05/08 16:51:40 $