What Is a Web Query Language For?
Denis Lynch
SilverPlatter Information Ltd.
A widely supported query facility will be a great advance in Web technology.
Before advancing a proposal for the features of such a facility, we need to
agree on what it will be used for. Developments in this area seem to cluster
into two main categories:
- Information discovery, analogous to Directory services.
- Application development tools, analogous to SQL.
Both areas are suitable for development and agreement, with a high probability
of quick and useful results. But it is important to realize that they are
very different. Pursuing the analogies: SQL is not typically encountered
by end users, nor are SQL applications broadly interoperable. Rather,
SQL is a powerful tool for building specific applications that are tied
to local database structures. Two SQL work-tracking applications are unlikely
to share enough common database structure to allow them to interoperate.
On the other side, a directory user (or application) relies not only on the
protocol and query language to support interworking, but even more on a
data model at the interface.
SilverPlatter's main interest is in the information discovery aspect,
which will be the subject of the remainder of this paper.
Current Web Information Discovery
There are two basic kinds of information discovery available on the Web
today:
- The "search engines", based on free-text search of the pages found by their crawlers.
- Site-specific navigation: browsable links, local search facilities, etc.
The simple access provided by the search engines offers very little precision,
which makes it increasingly difficult to find useful information. Site-specific
mechanisms can avoid this problem, but at the expense of restricted
reach.
Dublin Core and RDF allow Web resources to include structured descriptive
information that will be a critical part of more functional discovery systems,
along with appropriate query facilities.
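As a minimal sketch of what such structured description looks like, Dublin Core elements can be embedded in an HTML page using the DC-in-HTML meta-tag convention (the element values below are illustrative):

```html
<head>
  <title>What Is a Web Query Language For?</title>
  <!-- Dublin Core descriptive elements -->
  <meta name="DC.title"   content="What Is a Web Query Language For?">
  <meta name="DC.creator" content="Denis Lynch">
  <meta name="DC.subject" content="Web query languages; information discovery">
</head>
```

A query facility could then search specifically on DC.creator or DC.subject, rather than treating the whole page as undifferentiated free text.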
Current Library Practice
Information discovery is a central focus of librarianship, as evidenced by
catalogues and by Abstract and Index publications and databases. Some key
elements of this practice are:
- Specialized tools (the catalogues and indices) based on descriptive information produced according to community standards.
- Standard data formats, including physical ones like catalogue cards and Cataloguing in Publication data, and digital ones like MARC.
- Classification schemes with controlled vocabularies, like the Library of Congress Subject Headings, the Medical Subject Headings, and other thesauri.
These techniques are now being tentatively applied to Web resources,
e.g. by OCLC's Web cataloguing project. For information of high value and
general interest this will probably be viable, but the manpower needed
for manual cataloguing won't be available for the vast quantities of more
ephemeral information.
Where To Go From Here
A high-functionality Web search facility can combine the best of ad hoc
Web access and library-quality accessibility. The main requirements of
such a facility are:
- Search on specific descriptive attributes (Title, Subject, Creator, etc.)
- Flexible semantic qualification of attributes (Author vs. Editor)
- Search based on linking relationships
- Use of controlled vocabularies, including browsing and searching for relevant terms
- Browsing indexes
- Dynamic discovery of available search attributes and semantic qualifiers
- Negotiation of result format (record format, character set, detail level)
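To make the first two requirements concrete: Z39.50 searches qualify each term with attributes from a registered attribute set. In the Prefix Query Format (PQF) notation used by the YAZ toolkit, a Bib-1 "use" attribute selects the access point; the search terms below are purely illustrative:

```
@attr 1=4 "query language"
@and @attr 1=1003 lynch @attr 1=21 metadata
```

The first query searches the Title access point (Bib-1 use attribute 4); the second combines an Author search (use attribute 1003) with a Subject-heading search (use attribute 21) under a Boolean AND.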
These requirements are all covered by the Z39.50 Search and Retrieval protocol.
Because Z39.50 is a widely implemented international standard, a Web facility
built on Z39.50 can be defined and deployed quickly - probably more quickly
than any alternative approach. Even if additional requirements arise that
Z39.50 does not meet, it should still form the foundation of whatever
is developed.