What Is a Web Query Language For?
Denis Lynch
SilverPlatter Information Ltd.
A widely supported query facility will be a great advance in Web technology.
Before advancing a proposal for the features of such a facility, we need to
agree on what it will be used for. Developments in this area seem to cluster
into two main categories:
- Information discovery, analogous to Directory services.
- Application development tools, analogous to SQL.
Both areas are suitable for development and agreement, with a high probability
of quick and useful results. But it is important to realize that they are
very different. Pursuing the analogies: SQL is not typically encountered
by end users, nor are SQL applications broadly interoperable. Rather,
SQL is a powerful tool for building specific applications that are tied
to local database structures. Two SQL work-tracking applications are unlikely
to share enough common database structure to allow them to interoperate.
On the other side, a directory user (or application) relies not only on the
protocol and query language to support interworking, but even more on a
data model at the interface.
SilverPlatter's main interest is in the information discovery aspect,
which will be the subject of the remainder of this paper.
Current Web Information Discovery
There are two basic kinds of information discovery available on the Web
today:
- The "search engines", based on free-text search of the pages found by their crawlers.
- Site-specific navigation: browsable links, local search facilities, etc.
The simple access provided by the search engines offers very little precision,
which makes it increasingly difficult to find useful information. Site-specific
mechanisms can avoid this problem, but at the expense of restricted
reach.
Dublin Core and RDF allow Web resources to include structured descriptive
information that will be a critical part of more functional discovery systems,
along with appropriate query facilities.
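As a minimal sketch of what such structured description looks like, Dublin Core elements can be embedded in an HTML page using the DC-in-HTML meta-tag convention (the element values below are illustrative):

```html
<head>
  <title>What Is a Web Query Language For?</title>
  <!-- Dublin Core descriptive elements -->
  <meta name="DC.title"   content="What Is a Web Query Language For?">
  <meta name="DC.creator" content="Denis Lynch">
  <meta name="DC.subject" content="Web query languages; information discovery">
</head>
```

A query facility could then search specifically on DC.creator or DC.subject, rather than treating the whole page as undifferentiated free text.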
Current Library Practice
Information discovery is a central focus of librarianship, as evidenced by
catalogues and by Abstract and Index publications and databases. Some key
elements of this practice are:
- Specialized tools (the catalogues and indices) based on descriptive information produced according to community standards.
- Standard data formats, including physical ones like catalogue cards and Cataloguing in Publication data, and digital ones like MARC.
- Classification schemes with controlled vocabularies, like the Library of Congress Subject Headings, the Medical Subject Headings, and other thesauri.
These techniques are now being tentatively applied to Web resources,
e.g. by OCLC's Web cataloguing project. For information of high value and
general interest this will probably be viable, but the manpower needed
for manual cataloguing won't be available for the vast quantities of more
ephemeral information.
Where To Go From Here
A high-functionality Web search facility can combine the best of ad hoc
Web access and library-quality accessibility. The main requirements of
such a facility are:
- Search on specific descriptive attributes (Title, Subject, Creator, etc.)
- Flexible semantic qualification of attributes (Author vs. Editor)
- Search based on linking relationships
- Use of controlled vocabularies, including browsing and searching for relevant terms
- Browsing indexes
- Dynamic discovery of available search attributes and semantic qualifiers
- Negotiation of result format (record format, character set, detail level)
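To make the first two requirements concrete: Z39.50 searches qualify each term with attributes from a registered attribute set. In the Prefix Query Format (PQF) notation used by the YAZ toolkit, a Bib-1 "use" attribute selects the access point; the search terms below are purely illustrative:

```
@attr 1=4 "query language"
@and @attr 1=1003 lynch @attr 1=21 metadata
```

The first query searches the Title access point (Bib-1 use attribute 4); the second combines an Author search (use attribute 1003) with a Subject-heading search (use attribute 21) under a Boolean AND.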
These requirements are all covered by the Z39.50 Search and Retrieval protocol.
Because Z39.50 is a widely implemented international standard, a Web facility
built on Z39.50 can be defined and deployed quickly - probably more quickly
than any alternative approach. Even if additional requirements arise that
Z39.50 does not meet, it should still form the foundation of whatever
is developed.