Towards High-Quality Searching on the Web

Mike Berry and Shirley Browne
Computer Science Department, University of Tennessee

Murray Browne
School of Information Science, University of Tennessee

Position paper for WWW Consortium Distributed Indexing/Searching Workshop

In order to achieve higher quality searching on the Web, there needs to be a shift from the operational goal of "get every file that contains one or more of the keywords I entered, ranked by where and how often they occur" to "retrieve the resources that best satisfy my information need, with the most relevant and highest quality ranked the highest". From the theoretical information science point of view, one would ideally like to have comprehensive high-quality topical indexes and be able to route an information need to the most appropriate of these to search. The problem is that such indexes would be prohibitively expensive or impossible to construct by entirely manual methods, given the size and diversity of the Web. Thus there is a need for semi-automated methods that build on already developed Web technologies and assist domain experts in constructing and maintaining high quality topical indexes. The overall Web search engines would then become interfaces to the distributed collection of these specialized topical indexes.

To enable interoperation between search services, it will be necessary to standardize descriptions of query types and search capabilities, and to standardize the syntax for standard query types, so that:

It will also be necessary to characterize both the content and the quality of different search engines and their underlying databases. We believe that clustering techniques based on semantic analysis will provide the most effective characterization of content.
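As a rough illustration of how a content characterization could be used to route an information need to the most appropriate index, the sketch below summarizes each database as a bag of term counts and ranks the databases by cosine similarity to the query. This is a deliberately simple lexical stand-in, not the semantic clustering proposed above; the database names, sample texts, and the functions `term_vector`, `cosine`, and `route` are all hypothetical.

```python
import math
from collections import Counter

def term_vector(texts):
    """Sum lowercase word counts over a list of sample documents."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(query, profiles):
    """Return database names ordered from best to worst match for the query."""
    q = term_vector([query])
    return sorted(profiles, key=lambda name: cosine(q, profiles[name]), reverse=True)

# Hypothetical content profiles built from a few sample documents per database.
profiles = {
    "numerical-software": term_vector(
        ["sparse matrix eigenvalue solver", "linear algebra library"]
    ),
    "library-science": term_vector(
        ["cataloging and classification", "reference services library"]
    ),
}

best = route("eigenvalue solver for sparse matrix", profiles)
print(best[0])
```

A semantic method would replace the raw term counts with vectors in a reduced concept space, so that a query could match a database even when the two share no literal terms.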

Quality of a search service should be determined by evaluation based on standard performance measures and criteria. Currently used measures consist mainly of the number of items in the database and the speed with which search results are returned, with no evaluation of the relevance of the results to the expressed information need. Measures that approximate recall and precision and that evaluate the accuracy of the ranking of search results need to be developed. These measures could perhaps be based on relevance judgments solicited from users, and on comparison of query results across multiple search services. Comparative ratings of database quality will provide a way to combine ranked results from different search services. Standard performance measures will also help in evaluating new indexing and retrieval methods. We believe that a new generation of statistically based semantic retrieval methods, such as Latent Semantic Indexing (LSI), will provide better performance than the current generation of lexical matching methods.
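To make the proposed measures concrete, the following minimal sketch computes precision and recall at a cutoff, along with average precision as one possible measure of ranking accuracy, from a ranked result list and a set of relevance judgments. The document identifiers and judgments are invented for illustration, and the function names are assumptions rather than part of any existing evaluation standard.

```python
def precision_recall(ranked, relevant, k):
    """Precision and recall over the top k results of a ranked list."""
    top = ranked[:k]
    hits = sum(1 for doc in top if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears.

    Relevant items that are never retrieved contribute zero, so a ranking
    that places relevant items early scores higher.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical results from one search service and user relevance judgments.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

p, r = precision_recall(ranked, relevant, k=2)
ap = average_precision(ranked, relevant)
print(p, r, ap)
```

On the Web the full set of relevant resources is rarely known, so recall can only be approximated, for example by pooling the judged results of several search services.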

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.