Delegating control over multiple search engines on a site

The simplest way to configure a search engine for a web site is to compute one large full-text index of all the files at the site, and then provide a single search interface for visitors. However, there are situations where the simplest approach is not the best.

For example, consider a server which has three text databases --- two separate databases containing end-user documentation on different releases of the same product, and a third on, say, fly-fishing. If we used the same index files to handle searches on all three databases, users trying to search the product documentation would get information on both releases (which would surely be confusing); to make matters worse, they might get a few fish stories as well.

So, this site would need multiple indexes to serve all its clients well. If the search engine is integrated with the web server itself (e.g. by means of a server API, which one might well want to do for reasons of efficiency), this means that the server has to provide the search engine with information about all of the different indexes on the site --- somehow.

The need for delegation

The simplest way to design the search engine for this sort of site is to have a single configuration file, which points to index files and other support material for each of the indexes. However, once again, the simplest way to build the system is not necessarily the best.

The reason has to do with maintainability. Maintainers of each of the text bases will presumably want to alter their configurations from time to time. In the single-configuration-file scenario, this means that each of them would have to edit the single configuration file. By Murphy's law, it is inevitable that sooner or later, someone is going to slip up, and alter the configuration of a database which is not their own.

Indeed, some of the text-database providers may not be trusted to alter the configuration of databases which they don't own. Ideally, the search engine would provide for this --- that is, it would allow a webmaster to delegate the authority to set up a new search engine for a text database to the maintainers of that database, as specific individuals, and to them only.
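To make the idea concrete, here is a minimal sketch of delegated configuration. It assumes a webmaster-controlled table that names the maintainers of each text database, and it rejects any configuration change made by someone outside that list. The class names, database names, user names and the index_path setting are all hypothetical, chosen only for illustration; nothing here describes an existing search engine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Delegation:
    """Webmaster-controlled record: who may configure one text database."""
    database: str
    maintainers: frozenset


@dataclass
class SearchSite:
    """Hypothetical site-wide registry of delegations and per-database settings."""
    delegations: dict = field(default_factory=dict)  # database name -> Delegation
    configs: dict = field(default_factory=dict)      # database name -> settings dict

    def delegate(self, database, maintainers):
        # Only the webmaster is assumed to call this method.
        self.delegations[database] = Delegation(database, frozenset(maintainers))
        self.configs.setdefault(database, {})

    def update_config(self, user, database, **settings):
        # Refuse changes from anyone not named in the delegation.
        grant = self.delegations.get(database)
        if grant is None or user not in grant.maintainers:
            raise PermissionError(f"{user} may not configure {database}")
        self.configs[database].update(settings)


site = SearchSite()
site.delegate("docs-v1", {"alice"})
site.delegate("fly-fishing", {"bob"})
site.update_config("alice", "docs-v1", index_path="/indexes/docs-v1.idx")
```

With this arrangement, an attempt by bob to change the docs-v1 settings raises PermissionError instead of silently altering a database he does not own.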

Delegation in context

Search engines are not the only context in which delegation is important. Indeed, many web servers (of which Apache is one) allow the behavior of most server features (directory indexing, server-side includes, error handling, CGI scripts, etc.) to be controlled on a per-directory basis by a file located in the directory in question. The mechanisms that Apache uses for this purpose, which I have described elsewhere, could also be used by search engines implemented as Apache server extensions.

Apache supports this flexibility because its users have found that centralized control, and the attendant possibilities of cross-project interference, can be an insuperable administrative headache. Designers of search engine software may find that their own users have similar concerns.
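A search engine could apply the same per-directory idea to its own settings. The sketch below is not Apache's implementation. It assumes a hypothetical file named .searchconf holding simple key = value lines, and it computes the settings in effect for a directory by reading every such file from the site root down to that directory, with deeper files overriding shallower ones.

```python
from pathlib import Path

CONF_NAME = ".searchconf"  # hypothetical per-directory file name


def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(site_root, directory):
    """Merge per-directory files from the site root down to `directory`."""
    root = Path(site_root).resolve()
    target = Path(directory).resolve()
    relative = target.relative_to(root)  # ValueError if outside the site

    # Build the chain of directories: root, root/a, root/a/b, ...
    chain = [root]
    for part in relative.parts:
        chain.append(chain[-1] / part)

    settings = {}
    for folder in chain:
        conf = folder / CONF_NAME
        if conf.is_file():
            # Deeper directories are read later, so their values win.
            settings.update(parse_conf(conf.read_text()))
    return settings
```

Because each maintainer edits only the file in their own directory, ordinary file permissions can enforce who is allowed to change which database's settings.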


Robert S. Thau

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.