There are three ways to maintain an index:
This group explored ways to improve index maintenance capabilities by enhancing the interactions between web servers and indexing technologies. We began by exploring the relationship between notification and bulk content transfer, which had been grouped together during the plenary discussion because of the need to transfer several resources at once at the server's instigation. We ended up not discussing bulk content transfer after deciding that it is an orthogonal and simpler problem.
This group addressed several problems, some of which are not apparently related until we begin looking at the solutions:
We determined that the following work items need to be done, in order of complexity and precedence:
Several issues that work together to provide an overall capability often tend to be bound to one another more tightly than necessary. This is the case with notification and bulk content transfer. Instead of a notification always involving a bulk transfer of resources, we pulled these two apart and explored the general categories of each. Notification is one mode of communication while bulk transfer is one kind of package for delivery of data.
We identified the following modes of communication:
Note that there are two major modes, synchronous and asynchronous, and within each major mode there are two parts to a communication. (The "Pull" and "Push" modes may not be named or characterized correctly; perhaps they should simply be "Request" and "Response". We are more interested in the second pair, Register and Notify.)
When a client registers with a server, it is requesting that the server notify it when some event of interest to the client occurs. The event might be as simple as 'a resource has changed', or as complex as 'a resource about biking has passed the final review stage'. After the registration is completed, the synchronous connection is dropped.
Some time later, when the event occurs, the server notifies the client. At this point the roles are reversed: the server acts as a client by initiating a connection to the original client, which must in turn have a server actively listening for the notification. To avoid confusion, we refer to the two parties throughout by their original roles of client and server.
The server can notify the client in one of several ways. A synchronous connection, much like the first connection from client to server, could be attempted. Alternatively, SMTP-based email delivers messages by store-and-forward, which works even if both parties are not available at the same time for a synchronous communication. For a large number of clients interested in the same event, it may be more effective to use a flooding propagation of notifications via something like NNTP.
The message being transmitted in a notification should probably be very small, especially with a large number of registrants, so instead of sending a new or changed resource directly, the server should send just a reference to the resource. The client could later fetch the actual resource with a Pull/Push.
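The Register and Notify pair described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the names `NotifyingServer`, `Notification`, `register`, and `notify` are hypothetical, the example URI is made up, and a plain callback stands in for whatever transport (a synchronous connection, email, or news) actually carries the notification. The point it shows is that the notification holds only a reference to the resource, never the resource itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Notification:
    """A small message: the event name plus a reference to the resource."""
    event: str
    resource_uri: str  # reference only; the resource body is not sent


class NotifyingServer:
    """Hypothetical server side of Register/Notify (in-memory sketch)."""

    def __init__(self) -> None:
        # Maps an event name to the delivery callbacks of its registrants.
        self._registrants: DefaultDict[str, List[Callable[[Notification], None]]] = defaultdict(list)

    def register(self, event: str, deliver: Callable[[Notification], None]) -> None:
        # Register: the client asks to be told when `event` occurs.
        self._registrants[event].append(deliver)

    def notify(self, event: str, resource_uri: str) -> int:
        # Notify: send the same small message to every registrant of `event`.
        note = Notification(event, resource_uri)
        targets = self._registrants.get(event, [])
        for deliver in targets:
            deliver(note)
        return len(targets)  # number of registrants notified


# Usage: the client's "listener" here is just a list that collects messages.
server = NotifyingServer()
inbox: List[Notification] = []
server.register("changed", inbox.append)
delivered = server.notify("changed", "http://example.org/docs/biking.html")
```

After the notification arrives, the client would fetch the referenced resource with an ordinary request whenever it chooses.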
The entity transferred in any message may be one of several kinds of things all concerning a single resource:
Which one of these things is transferred in a message must be known to the receiver. Either it is known implicitly to the client by the kind of request it made, or it is declared explicitly by the server if it would otherwise be unknown to the client.
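One way to picture the implicit and explicit cases is the following Python sketch. The names `EntityKind`, `Message`, and `resolve_kind` are hypothetical, and the three kinds shown (resource, metadata, difference) are an assumption taken from the kinds mentioned elsewhere in this report rather than a definitive enumeration. Letting an explicit declaration take precedence over the request is also just one possible design choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntityKind(Enum):
    # Assumed set of kinds, for illustration only.
    RESOURCE = "resource"
    METADATA = "metadata"
    DIFFERENCE = "difference"


@dataclass
class Message:
    resource_uri: str
    body: str
    declared_kind: Optional[EntityKind] = None  # set when the server declares it


def resolve_kind(message: Message, requested_kind: Optional[EntityKind] = None) -> EntityKind:
    """Decide what kind of entity a received message carries."""
    if message.declared_kind is not None:
        # Explicit: the server stated the kind in the message.
        return message.declared_kind
    if requested_kind is not None:
        # Implicit: the kind follows from the request the client made.
        return requested_kind
    raise ValueError("entity kind is neither declared nor implied by the request")


# A reply to a metadata request needs no declaration; an unsolicited
# notification does, because the client made no request that implies a kind.
reply = Message("http://example.org/a.html", "title: A")
pushed = Message("http://example.org/a.html", "+ new paragraph", EntityKind.DIFFERENCE)
```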
These entities may be considered resources in their own right, especially if they are given URIs to identify them.
One kind of resource that is particularly important for this proposal is a collection. A collection is a set of other resources that together act as one resource. This abstraction allows us to send messages about collections as easily as we send them about simple resources.
Some other combinations to keep in mind: Rather than a collection of actual resources, we may also have collections of metadata or differences. We may also have metadata about a collection itself or differences of a collection relative to a previous version of the collection (i.e. new, changed, or deleted elements).
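The differences of a collection relative to a previous version can be illustrated with a short Python sketch. It assumes, purely for illustration, that each version of a collection is represented as a mapping from member URI to some version tag (a timestamp or checksum, say); the function name `diff_collection` and the example URIs are hypothetical.

```python
from typing import Dict, List


def diff_collection(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the new, changed, and deleted member URIs between two versions.

    Each argument maps a member URI to a version tag (an assumed representation).
    """
    shared = previous.keys() & current.keys()
    return {
        "new": sorted(current.keys() - previous.keys()),
        "changed": sorted(uri for uri in shared if previous[uri] != current[uri]),
        "deleted": sorted(previous.keys() - current.keys()),
    }


old_version = {
    "http://example.org/docs/a.html": "v1",
    "http://example.org/docs/b.html": "v1",
    "http://example.org/docs/c.html": "v1",
}
new_version = {
    "http://example.org/docs/a.html": "v1",  # unchanged
    "http://example.org/docs/b.html": "v2",  # changed
    "http://example.org/docs/d.html": "v1",  # new
}
delta = diff_collection(old_version, new_version)
```

A server could send such a difference in place of the whole collection, and an indexer would then need to refetch only the members listed as new or changed.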
Collections should be identified by URIs. There is a natural use of http and ftp URLs to identify collections. A request for an http URL of a directory, i.e. one ending in "/", is handled by servers either by generating an HTML document that lists and describes the accessible elements of the directory or by returning a default "index.html" file instead. But a client might ask the server to return a representation of the collection in another form by specifying which forms it accepts. The Accept header could include "text/SOIF", for example.
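As a sketch of such a request, the following Python builds (but does not send) an HTTP request for a collection URL that names the preferred form in its Accept header. The function name `build_collection_request` and the example URL are hypothetical, "text/SOIF" is simply the media type named above, and the lower-preference HTML fallback is an added assumption.

```python
import urllib.request


def build_collection_request(collection_url: str, preferred_form: str = "text/SOIF") -> urllib.request.Request:
    """Build a GET request asking for a collection in a preferred form.

    The q-value marks HTML as an acceptable but less preferred fallback,
    so a server that cannot produce the preferred form can still answer.
    """
    accept = f"{preferred_form}, text/html;q=0.5"
    return urllib.request.Request(collection_url, headers={"Accept": accept})


request = build_collection_request("http://example.org/docs/")
accept_header = request.get_header("Accept")
```

Passing the request to `urllib.request.urlopen` would actually send it; whether a given server honors the preference is up to that server.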
Many standards are involved in the framework we are proposing, including: