W3C Jigsaw

Jigsaw Resource Factory

The Jigsaw resource factory is a modular piece of software that runs behind the scenes and creates HTTPResource instances out of existing data. The factory currently knows about files and directories of the underlying file system, but you can extend it to handle other kinds of objects.

This document describes when the factory is called and how it maps any existing data source to HTTP exportable resources.

When is the factory invoked

Each running server has a resource factory attached to it (which it might share with other servers, but this is not relevant here). Any resource can call its server's factory in order to create a resource out of an existing object. Currently, the only resource that does so is w3c.jigsaw.resources.StoreContainer, which is the base class for most resource containers (such as the one exporting directories).

When queried for a URL component at lookup time, the directory resource first checks its children resource store for a matching resource. If such a resource is found, it is returned as the target of the lookup. Otherwise, if the directory is flagged as extensible, the directory resource derives a file name from the resource's identifier and asks the resource factory for a wrapping resource instance. If the factory builds such a resource successfully, the directory resource installs it as one of its children resources and manages its persistence.

Let's walk through this algorithm with an example. Suppose there is a directory resource User which wraps an underlying file-system directory named User. This directory resource will usually be created empty (with no children resources). At some point, a client will ask for, say, User/Overview.html. The lookup process starts, and after some iterations comes to the point where it looks for Overview.html in the directory resource User. The directory resource looks into its children resources to find it. As none is found, it goes to the resource factory and asks it to construct a resource for the file Overview.html. If a resource is returned (which depends on the factory configuration), the directory resource plugs the newly created resource into its resource store and returns it as the target of the lookup.
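The lookup steps above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Jigsaw's actual code: the names LookupSketchResource, LookupSketchFactory and LookupSketchDirectory are hypothetical, the children store is a plain in-memory map rather than a persistent store, and the URL component is used directly as the file name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an HTTP-exportable resource (not the Jigsaw class). */
class LookupSketchResource {
    final String identifier;

    LookupSketchResource(String identifier) {
        this.identifier = identifier;
    }
}

/** Hypothetical factory: returns null when it cannot index the named file. */
interface LookupSketchFactory {
    LookupSketchResource createResource(String directory, String fileName);
}

/** Hypothetical directory resource backed by an in-memory children store. */
class LookupSketchDirectory {
    private final String directory;
    private final boolean extensible;
    private final LookupSketchFactory factory;
    private final Map<String, LookupSketchResource> children = new HashMap<>();

    LookupSketchDirectory(String directory, boolean extensible,
                          LookupSketchFactory factory) {
        this.directory = directory;
        this.extensible = extensible;
        this.factory = factory;
    }

    /** Resolves one URL component, indexing the file on first access. */
    LookupSketchResource lookup(String component) {
        // 1. An already indexed child wins.
        LookupSketchResource child = children.get(component);
        if (child != null) {
            return child;
        }
        // 2. Only extensible directories may create new children.
        if (!extensible) {
            return null;
        }
        // 3. Assumption: the component is used as the file name as-is.
        LookupSketchResource created = factory.createResource(directory, component);
        // 4. Install the new resource so later lookups skip the factory.
        if (created != null) {
            children.put(component, created);
        }
        return created;
    }
}
```

With a factory that only accepts .html files, the first lookup of Overview.html in User calls the factory, and every later lookup returns the stored child without calling it again.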

One important note here: as resources are persistent objects (they persist across Jigsaw invocations), resources that wrap existing objects are created only once in the whole lifetime of the server. This means that changing the factory configuration after a resource has been indexed has no effect on the resources that have already been created. This is one of the features that make the server fast: indexing an existing object into a resource might be a costly process (it involves querying multiple databases, such as the extensions and directory templates databases). Caching the result of this operation allows the server to concentrate on its real work, which is to serve data back to clients. You may still, however, want to change the resource factory configuration and re-index part of your information space with the new options. Currently the only way to do that is to delete the resources to be re-indexed and have them recreated through the normal mechanism.
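The effect of indexing only once can be illustrated with a small sketch. It is not Jigsaw code: IndexOnceSketch is a hypothetical class, the persistent resource store is replaced by an in-memory map, and a resource is reduced to the content type string the indexer chose for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of "index once, then reuse" (in-memory, not persistent). */
final class IndexOnceSketch {
    private final Map<String, String> store = new HashMap<>();
    private Function<String, String> indexer;

    IndexOnceSketch(Function<String, String> indexer) {
        this.indexer = indexer;
    }

    /** Changes the factory configuration; existing entries are left untouched. */
    void reconfigure(Function<String, String> newIndexer) {
        this.indexer = newIndexer;
    }

    /** Returns the stored result, running the indexer only on first access. */
    String lookup(String fileName) {
        return store.computeIfAbsent(fileName, indexer);
    }

    /** Deleting an entry is the only way to have it re-indexed. */
    void delete(String fileName) {
        store.remove(fileName);
    }
}
```

If paper.ps was first indexed as text/plain, reconfiguring the indexer does not change what lookup returns for it. Only after delete("paper.ps") does the next lookup pick up the new configuration.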

How the factory creates resources

The factory is defined in terms of a set of indexers. Each container resource may specify the indexer used to index its content through its indexer attribute, which should hold the name of a registered indexer. You could implement, for example, a MailMessageIndexer that would create resources out of a Berkeley-style mailbox file, and have a MailResource use it to export the mailbox.
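As a rough sketch of how an indexer attribute could resolve to a registered indexer, consider the registry below. Everything in it is hypothetical (NamedIndexerSketch, IndexerRegistrySketch, and the choice to throw when the name is unknown); it only illustrates the name-to-indexer mapping described above and is not the Jigsaw API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical indexer: turns an object name into a description of a resource. */
interface NamedIndexerSketch {
    String index(String objectName);
}

/** Hypothetical registry mapping indexer names to indexer instances. */
final class IndexerRegistrySketch {
    private final Map<String, NamedIndexerSketch> indexers = new HashMap<>();

    void register(String name, NamedIndexerSketch indexer) {
        indexers.put(name, indexer);
    }

    /** Resolves the value of a container's indexer attribute. */
    NamedIndexerSketch resolve(String indexerAttribute) {
        NamedIndexerSketch indexer = indexers.get(indexerAttribute);
        if (indexer == null) {
            // Assumption: an unknown name is treated as a configuration error.
            throw new IllegalArgumentException(
                "no registered indexer named " + indexerAttribute);
        }
        return indexer;
    }
}
```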

The default indexer class in the current Jigsaw release is w3c.jigsaw.indexer.SampleResourceIndexer, which knows only about files and directories. It creates resources by maintaining two databases: the extension database is used to index files, while the directory templates database is used to index directories.

The extension database

When the sample resource indexer is called to index a normal file, the first thing it does is split the file name into its raw name plus its set of extensions. So, for example, if the file to be indexed is foo.en.html.gz, the raw name will be foo, and the set of extensions will be {en, html, gz}.
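The split can be sketched as follows. The class FileNameSplitSketch is hypothetical, and the sketch assumes that everything before the first dot is the raw name and that empty segments (as in "foo..gz") are ignored; the real indexer may handle such edge cases differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical result of splitting a file name into raw name and extensions. */
final class FileNameSplitSketch {
    final String rawName;
    final List<String> extensions;

    private FileNameSplitSketch(String rawName, List<String> extensions) {
        this.rawName = rawName;
        this.extensions = Collections.unmodifiableList(extensions);
    }

    /** Splits at dots: the first segment is the raw name, the rest are extensions. */
    static FileNameSplitSketch split(String fileName) {
        List<String> extensions = new ArrayList<>();
        int dot = fileName.indexOf('.');
        if (dot < 0) {
            // No dot at all: the whole name is the raw name.
            return new FileNameSplitSketch(fileName, extensions);
        }
        for (String part : fileName.substring(dot + 1).split("\\.")) {
            // Assumption: empty segments are skipped.
            if (!part.isEmpty()) {
                extensions.add(part);
            }
        }
        return new FileNameSplitSketch(fileName.substring(0, dot), extensions);
    }
}
```

Calling FileNameSplitSketch.split("foo.en.html.gz") yields the raw name foo and the extensions en, html and gz, in that order.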

It then takes each extension description record and checks whether it defines a resource class. In a typical setting, only the html extension will have an associated resource class, which is likely to be the FileResource class. This gives the indexer the class of the resource to build for the given file, so the indexer carries on by creating an empty instance of this class. It then creates a set of default attribute values, first by defining the following pre-defined set of attributes:

Then, for each of the file extensions, it looks into the associated database record and fills in the remaining attributes. The html extension record, for example, might define the default value for content-type as text/html. The en extension record will probably define the content-language default value as en, and finally the gz extension record will probably state that the resource's content-encoding default value should be x-gzip. Once the set of default attribute values is constructed, the resource is initialized and returned.
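The way extension records contribute a resource class and default attribute values can be sketched like this. The classes ExtensionRecordSketch and ExtensionDatabaseSketch are hypothetical, each record holds a single default attribute for brevity, and two behaviours are assumptions rather than documented facts: the first extension (in file-name order) that defines a class wins, and later extensions override earlier ones if they set the same attribute.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extension record: optional class name plus default attributes. */
final class ExtensionRecordSketch {
    final String resourceClass; // null when the record defines no class
    final Map<String, String> defaults;

    ExtensionRecordSketch(String resourceClass, Map<String, String> defaults) {
        this.resourceClass = resourceClass;
        this.defaults = defaults;
    }
}

/** Hypothetical in-memory extension database. */
final class ExtensionDatabaseSketch {
    private final Map<String, ExtensionRecordSketch> records = new LinkedHashMap<>();

    /** Defines a record with one default attribute (kept small for the sketch). */
    void define(String extension, String resourceClass, String attribute, String value) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put(attribute, value);
        records.put(extension, new ExtensionRecordSketch(resourceClass, defaults));
    }

    /** Assumption: the first extension that defines a class decides it. */
    String resourceClassFor(List<String> extensions) {
        for (String extension : extensions) {
            ExtensionRecordSketch record = records.get(extension);
            if (record != null && record.resourceClass != null) {
                return record.resourceClass;
            }
        }
        return null; // no record defines a class: the file cannot be indexed
    }

    /** Merges defaults of all known extensions; later ones override earlier ones. */
    Map<String, String> defaultsFor(List<String> extensions) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String extension : extensions) {
            ExtensionRecordSketch record = records.get(extension);
            if (record != null) {
                merged.putAll(record.defaults);
            }
        }
        return merged;
    }
}
```

With records for html, en and gz defined as in the text, the extensions of foo.en.html.gz resolve to the FileResource class and to the defaults content-language en, content-type text/html and content-encoding x-gzip.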

The directory templates database

When the factory is called to index a directory, it examines its directory templates database. This database allows the web admin to map directory names to specific sub-classes of resources.

For each directory template, the web admin first specifies an appropriate resource class. A typical setting might specify, for example, that all directories named Putable should be exported by an instance of the PutableDirectory class.

The class attached to a directory template need not be a sub-class of DirectoryResource. You can specify, for example, that directories named CVS should be exported through a CvsDirectoryResource, which will provide you with a form-based interface to CVS.
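A directory templates database can be pictured as a map from directory names to resource class names. The sketch below is hypothetical (DirectoryTemplatesSketch is not a Jigsaw class), and it assumes that a directory without a matching template falls back to a default class chosen by the caller; the text above does not say what the real indexer does in that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical directory templates database: directory name to class name. */
final class DirectoryTemplatesSketch {
    private final Map<String, String> templates = new HashMap<>();
    private final String fallbackClass;

    /** Assumption: unmatched directories use the given fallback class. */
    DirectoryTemplatesSketch(String fallbackClass) {
        this.fallbackClass = fallbackClass;
    }

    void define(String directoryName, String resourceClass) {
        templates.put(directoryName, resourceClass);
    }

    /** Returns the class name to instantiate for the given directory name. */
    String classFor(String directoryName) {
        return templates.getOrDefault(directoryName, fallbackClass);
    }
}
```

After defining templates for Putable and CVS, classFor("CVS") returns CvsDirectoryResource, while a directory named Docs resolves to the fallback class.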

Configuring the factory

Configuring the Jigsaw factory consists of editing the set of indexers and, for each indexer, editing the extensions and directory templates databases. This can be done entirely through the administration application. This section describes how this works; you might also want to check the configuration tutorial.

When you connect to the Jigsaw admin server through the JigAdm application, you'll see that each opened server has a node named indexers. At installation time, this will only display the default indexer, which knows about the usual MIME types.

Open the default indexer node, and its extension database. This displays the sorted list of currently defined extensions. To remove an extension record, select it by clicking on its name, and press the Delete Resource button (bottom of the right panel): the extension record is deleted from the database. To edit a particular extension record, select it. At the top of the right panel you can see a number of buttons; click on the Attributes button. This will bring up a form containing all the default attribute values for the extension. This form changes depending on the class that you have attached to the extension (extensions with no class apply to all resources, hence they allow you to edit the HTTPResource attribute values). You can change any of these values, which will be provided as default attribute values for resources wrapping a file that matches this particular extension.

To define new extensions, select the extensions node. This will pop up a form asking you for the extension name (the identifier field at the top) and the class. Let's say you want to define the extension ps for exporting application/postscript files. Type in the name of the extension (here ps), attach the w3c.jigsaw.resources.FileResource class to it, then click on the Add Resource button. Select the newly created extension and click on the upper Attributes button. This will pop up the attribute editor. Set the default value for content-type to application/postscript, and press the Commit button. You are done: all files having the ps extension will be exported through a FileResource whose default value for the content-type attribute will be application/postscript.

Now, let's create some directory templates. Open the directories node. This will display the sorted list of currently defined templates. To remove a directory template, just select it and press the Delete Resource button (at the bottom of the right panel). To edit the attributes of a directory template, click on its name, and select the Attributes sheet. This will display the set of attributes for the directory template itself.

Jigsaw Team
$Id: indexer.html,v 1.9 1997/07/31 08:21:08 ylafon Exp $