Distributed searching tools have proliferated in the absence of any standards for query syntax or resource discovery. As a result, users must become familiar with both the contents of, and the query rules for, each of a variety of search engines. While the general adoption of HTTP/HTML for managing the interaction with the user provides some commonality, this does not extend to the user interface itself.
URNs provide some of the framework for handling the resource identification aspects of this problem, but no architecture now exists to implement URNs across a broad spectrum of networked servers. A few projects have attempted to address the problem of mapping a standard query syntax into multiple information servers, but generally only within a single protocol (for example, Willow and Z39.50). Work is under way at Berkeley, as part of the digital library initiative there, on mapping queries generically into servers, but it is still in its infancy.
An architecture incorporating client query proxies can address many of the problems inherent in a distributed network of search engines. The query proxy can use HTTP/HTML to communicate with the user. Each user registers with the proxy server and provides information on their query syntax preferences. The preferences are stored on the query proxy and discarded after a preset period of inactivity. The user then writes each query in that preferred syntax, and the query submission also names the server to search. The proxy launches a process that contacts the server, downloads (and optionally caches locally) that server's query syntax, maps the query from the user's syntax to the server's syntax, and issues the query. Query results are returned to the proxy and passed back to the user. A block diagram of the architecture is available separately.
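The proxy's bookkeeping and mapping step can be sketched in a few lines. This is only an illustration under strong assumptions: a "syntax" is reduced to the tokens used for three Boolean operators, the class name QueryProxy and its methods are hypothetical, and the fetch argument stands in for whatever mechanism actually downloads a server's syntax. Issuing the query and returning results are left out.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumption: a syntax is just operator name -> token, e.g. {"and": "&"}.
Syntax = Dict[str, str]


class QueryProxy:
    """Hypothetical proxy: stores user preferences and maps query syntax."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl_seconds = ttl_seconds  # assumed inactivity timeout
        self._prefs: Dict[str, Tuple[Syntax, float]] = {}
        self._server_cache: Dict[str, Syntax] = {}

    def register(self, user: str, syntax: Syntax,
                 now: Optional[float] = None) -> None:
        """Record a user's preferred operator tokens and the time of registration."""
        self._prefs[user] = (syntax, time.time() if now is None else now)

    def _user_syntax(self, user: str, now: float) -> Syntax:
        syntax, last_seen = self._prefs[user]
        if now - last_seen > self.ttl_seconds:
            del self._prefs[user]  # discard after the inactivity period
            raise KeyError(f"preferences for {user!r} have expired")
        self._prefs[user] = (syntax, now)  # any use counts as activity
        return syntax

    def _server_syntax(self, server: str,
                       fetch: Callable[[str], Syntax]) -> Syntax:
        # Download the server's syntax once, then reuse the cached copy.
        if server not in self._server_cache:
            self._server_cache[server] = fetch(server)
        return self._server_cache[server]

    def translate(self, user: str, server: str, query: str,
                  fetch: Callable[[str], Syntax],
                  now: Optional[float] = None) -> str:
        """Rewrite a whitespace-separated query from user to server syntax."""
        now = time.time() if now is None else now
        user_syntax = self._user_syntax(user, now)
        server_syntax = self._server_syntax(server, fetch)
        token_to_op = {token: op for op, token in user_syntax.items()}
        words = []
        for word in query.split():
            op = token_to_op.get(word)
            words.append(server_syntax[op] if op is not None else word)
        return " ".join(words)
```

With a user registered as {"and": "&", "or": "|", "not": "!"} and a server that reports AND, OR and NOT, the query "cats & dogs" would be rewritten as "cats AND dogs" before being issued. Real query languages also differ in field names, quoting and grouping, which a token-for-token mapping like this does not cover.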
This approach requires the definition of a protocol for proxy-to-server communication, and standards for the definition of preferred query syntax. Ideally this would be handled similarly to Whois++, where the server can be queried for its templates and help files. However, in the short term it should be possible to agree on a port that will dump the necessary configuration information in a predefined format in response to a telnet connection from the query proxy.
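The short-term option, a port that dumps configuration on connect, might look roughly like the sketch below. The dump format (one "name: value" pair per line), the names SERVER_SYNTAX, DumpHandler, parse_dump and fetch_syntax, and the particular syntax entries are all assumptions made for illustration; no such format has been agreed. The demo binds to an arbitrary free port on the local machine rather than to any agreed port number.

```python
import socket
import socketserver
import threading

# Assumed configuration a search server would publish about its query syntax.
SERVER_SYNTAX = {"and": "AND", "or": "OR", "not": "NOT", "truncation": "*"}


class DumpHandler(socketserver.StreamRequestHandler):
    """On each connection, write the configuration and let the socket close."""

    def handle(self):
        for name, value in SERVER_SYNTAX.items():
            self.wfile.write(f"{name}: {value}\r\n".encode("ascii"))


def parse_dump(text):
    """Parse 'name: value' lines into a dict, skipping malformed lines."""
    syntax = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            syntax[name.strip()] = value.strip()
    return syntax


def fetch_syntax(host, port, timeout=5.0):
    """Proxy side: connect, read until the server closes, parse the dump."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_dump(b"".join(chunks).decode("ascii"))


def demo():
    """Run a throwaway dump server on a free local port and query it once."""
    with socketserver.TCPServer(("127.0.0.1", 0), DumpHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        try:
            return fetch_syntax(host, port)
        finally:
            server.shutdown()
```

Because the server sends its dump and closes the connection without waiting for input, the same exchange can be observed by hand with a telnet client, which is the property the short-term proposal relies on.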
To avoid hand-coding the query capabilities of each target server into the client query proxy, search servers could be given a function that describes their search and query capabilities in a common vocabulary, so that machines can collect this information automatically. This dovetails nicely with the ideas described in our second position paper on indexing proxies.
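As a rough illustration of what a machine-readable capability description could look like, the sketch below has a server publish its capabilities using terms from a shared vocabulary, and lets a proxy check whether a query's requirements can be met. The vocabulary terms, the JSON encoding, and the function names describe_capabilities and missing_capabilities are hypothetical choices, not part of any existing standard.

```python
import json

# Hypothetical shared vocabulary of capability names.
VOCABULARY = {
    "boolean_and", "boolean_or", "boolean_not",
    "truncation", "phrase", "proximity",
}


def describe_capabilities(supported, fields):
    """Server side: serialize capabilities, rejecting terms outside the vocabulary."""
    unknown = set(supported) - VOCABULARY
    if unknown:
        raise ValueError(f"not in the common vocabulary: {sorted(unknown)}")
    return json.dumps({
        "capabilities": sorted(supported),
        "fields": sorted(fields),
    })


def missing_capabilities(description, required):
    """Proxy side: list the required capabilities the server does not offer."""
    offered = set(json.loads(description)["capabilities"])
    return sorted(set(required) - offered)
```

A proxy could call missing_capabilities before mapping a query and, if the result is non-empty, either simplify the query or report to the user that the chosen server cannot evaluate it.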