Towards High-Quality Searching on the Web
Mike Berry and
Computer Science Department, University of Tennessee
School of Information Science, University of Tennessee
Position paper for
WWW Consortium Distributed Indexing/Searching Workshop
In order to achieve higher quality searching on the Web, there needs
to be a shift from the operational goal of
"get every file that contains one or more of the keywords I entered,
ranked by where and how often they occur" to
"retrieve the resources that best satisfy my information need,
with the most relevant and highest quality ranked the highest".
From the theoretical information science point of view, one would
ideally like to have comprehensive high-quality topical indexes
and be able to route an information need to the most appropriate
of these to search. The problem is that such indexes would be
prohibitively expensive or impossible to construct by entirely
manual methods, given the size and diversity of the Web.
Thus there is a need for semi-automated methods that build on
already developed Web technologies and assist domain experts
in constructing and maintaining high-quality topical indexes.
The overall Web search engines would then become interfaces to the
distributed collection of these specialized topical indexes.
To enable interoperation between search services, it will be necessary
to standardize descriptions of query types and search capabilities,
and to standardize the syntax for standard query types, so that:
- Search services can advertise what types and capabilities they support,
- Clients (be they browsers, applets, agents, or other search engines)
can formulate queries in a standard format, and
- Search services can translate from the standard to an internal format.
It will also be necessary to characterize both
the content and the quality of different search engines and their
databases. We believe that clustering techniques based on semantic analysis
will provide the most effective characterization of content.
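The advertise, formulate, and translate steps listed above can be illustrated with a minimal sketch. Everything named here is hypothetical: the query type labels, the `StandardQuery` and `SearchService` classes, and the internal syntax are assumptions made for illustration, not part of any existing or proposed standard.

```python
from dataclasses import dataclass

# Hypothetical labels for standard query types (assumed, not from any standard).
BOOLEAN_AND = "boolean-and"
RANKED = "ranked"


@dataclass
class StandardQuery:
    """A query expressed in an assumed service-independent format."""
    query_type: str
    terms: list
    max_results: int = 10


@dataclass
class SearchService:
    """A search service that supports a subset of the standard query types."""
    name: str
    supported_types: set

    def advertise(self):
        # Capability description that a client can inspect before querying.
        return {"service": self.name,
                "query_types": sorted(self.supported_types)}

    def translate(self, query):
        # Map the standard query to this service's (hypothetical) internal syntax.
        if query.query_type not in self.supported_types:
            raise ValueError(f"{self.name} does not support {query.query_type}")
        if query.query_type == BOOLEAN_AND:
            return " AND ".join(query.terms)
        return " ".join(query.terms)


service = SearchService("example-index", {BOOLEAN_AND})
query = StandardQuery(BOOLEAN_AND, ["latent", "semantic"])
internal = service.translate(query)
```

A client would first read `advertise()` to choose a query type the service supports, then submit a `StandardQuery`; only the service needs to know its own internal syntax.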
Quality of a search service should be determined by evaluation based on
standard performance measures and criteria.
Currently used measures consist mainly of the number of items in
the database and the speed with which search results are returned,
with no evaluation of the relevance of the results to the expressed
information need. Measures that approximate recall and precision
and that evaluate the accuracy of the ranking of search results
need to be developed.
These measures could
perhaps be based on relevance judgments solicited from users, and on
comparison of query results across multiple search services.
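As a rough sketch of what such measures might compute, the following assumes that a set of relevant items has already been approximated, for example by pooling user relevance judgments. The function names and the use of average precision as the ranking measure are illustrative choices, not measures prescribed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against a judged relevant set."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Reward rankings that place relevant items early.

    Assumes `ranked` contains no duplicate items.
    """
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical judgments: d5 is relevant but was not retrieved.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
precision, recall = precision_recall(ranked, relevant)
ap = average_precision(ranked, relevant)
```

On the Web the full relevant set is unknown, so recall computed this way is only an approximation relative to the pooled judgments.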
Comparative ratings of database quality will provide a way to
combine ranked results from different search services.
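One simple way such ratings could be used is sketched below. It assumes that each service returns relevance scores normalized to the range 0 to 1 and that each service has a quality rating in the same range; both assumptions, the weighting scheme, and the `merge_results` name are hypothetical.

```python
def merge_results(results_by_service, quality):
    """Merge ranked lists, weighting each score by its service's quality rating.

    results_by_service: {service: [(item, score), ...]} with scores in [0, 1]
    quality: {service: rating in [0, 1]}; unrated services get weight 0.
    """
    best = {}
    for service, results in results_by_service.items():
        weight = quality.get(service, 0.0)
        for item, score in results:
            weighted = weight * score
            # Keep the best weighted score when several services return an item.
            if weighted > best.get(item, float("-inf")):
                best[item] = weighted
    # Highest weighted score first; ties broken by item name for determinism.
    return sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))


results = {
    "A": [("x", 0.9), ("y", 0.5)],
    "B": [("y", 0.8), ("z", 0.6)],
}
ratings = {"A": 1.0, "B": 0.5}
merged = merge_results(results, ratings)
```

Other combination rules, such as summing weighted scores across services, are equally plausible; the point is only that a comparable quality rating makes scores from different services commensurable.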
Standard performance measures will also help in evaluating new indexing
and retrieval methods. We believe that a new generation of
statistically based semantic retrieval methods, such as
Latent Semantic Indexing (LSI),
will provide better performance than the current generation of lexical
methods.
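For readers unfamiliar with LSI, its core computation can be summarized as follows. This is the commonly used formulation, given as a sketch; the symbols $A$, $k$, $q$, and $d_j$ are introduced here for illustration, and some variants scale the vectors differently.

```latex
% A is the m x n term-by-document matrix; k is the chosen number of factors.
% Rank-k approximation from the truncated singular value decomposition:
A \approx A_k = U_k \Sigma_k V_k^{T}

% A query, written as an m-dimensional term vector q, is projected
% into the same k-dimensional space:
\hat{q} = \Sigma_k^{-1} U_k^{T} q

% Documents are ranked by cosine similarity between \hat{q} and the
% document vectors d_j (the rows of V_k):
\operatorname{sim}(\hat{q}, d_j) = \frac{\hat{q} \cdot d_j}{\lVert \hat{q} \rVert \, \lVert d_j \rVert}
```

Because matching takes place in the reduced space rather than on literal terms, a document can be retrieved even when it shares no keywords with the query.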
This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.