POSITION PAPER: Z39.50 & Ranked Searching
Co-Authors: Dr. Chris Buckley, Chief Scientist, Sabir Research; Peter Ryall, Senior Architect, LEXIS-NEXIS
In the current universe of relevancy-based search & retrieval systems, there is a wide diversity of search methodologies, ranging from simple term occurrence/proximity algorithms, to modal, LSI, & connectionist logic, to full natural language processing. Across this spectrum there are many variations in query syntax, & in the degree of control given to the user and/or client over the exactness of the interpretation of search terms, as well as over the precision & comprehensiveness of the results selected from the target collection(s). Similarly, within the WWW community, a range of syntaxes exists for input of search query terms & criteria (various flavors of structured forms, fields allowing free-form query text, etc.).
The `Type 102 Ranked Query' currently under development for use within the Z39.50 Search & Retrieval protocol has been specifically designed to accommodate the ranked search technologies used by the majority of large-scale commercial information providers and Information Retrieval (IR) software vendors. The set of features specified within the standardized syntax of the Ranked Query is estimated to encompass the functionality supported by 80-90% of mainstream commercial ranked search technologies (including those in wide use across the WWW).
How the Z39.50 Ranked Query Facilitates Distributed Searching
Using the standardized Ranked Query, a consistent query & search term syntax can be used to send searches to multiple search systems, based on the following key elements of the Query:
- A standardized methodology & an absolute scale for ranking results from a single search server, or from multiple servers (which may use a wide range of different search technologies), ensure that:
  - Ranked results are more consistent & predictable from one system to another;
- A standardized syntax for allowing the user/client to specify search criteria within the query allows:
  - Multiple systems to be searched consistently & concurrently;
- Standardized methods are supported for combining ranked results from disparate search systems, making the Ranked Query very powerful in a distributed searching environment.
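
The benefit of an absolute ranking scale for combining results can be illustrated with a minimal Python sketch. It assumes that every server reports scores on the same 0.0 to 1.0 scale and returns its hits already sorted by descending score; the `Hit` type, its field names, and `merge_ranked` are hypothetical names chosen for illustration and are not taken from the Ranked Query definition.

```python
import heapq
import itertools
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    server: str   # which search server returned the document (illustrative)
    doc_id: str   # server-local document identifier (illustrative)
    score: float  # assumed to lie on a shared absolute scale, 0.0 to 1.0


def merge_ranked(result_lists: Iterable[List[Hit]], limit: int = 10) -> List[Hit]:
    """Merge per-server hit lists into one ranked list.

    Each input list must already be sorted by descending score. Because the
    scores are assumed to be comparable across servers, no per-server
    renormalization is needed before merging.
    """
    merged = heapq.merge(*result_lists, key=lambda h: h.score, reverse=True)
    return list(itertools.islice(merged, limit))


server_a = [Hit("A", "a1", 0.9), Hit("A", "a2", 0.4)]
server_b = [Hit("B", "b1", 0.7), Hit("B", "b2", 0.5)]
top_three = merge_ranked([server_a, server_b], limit=3)
```

Without a shared scale, the client would first have to map each server's scores onto a common range, which is the step a standardized ranking methodology is intended to make unnecessary.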
Client/Server Interaction using the Z39.50 Ranked Query
When a client submits a Z39.50 Ranked Query, it has the option to instruct the server to reformulate the query to better describe the user's information need. The server modifies the query based on its knowledge of the collections it is searching, the vocabularies native to those collections, general linguistics, & the most effective expansions of the query terms as related to the desired precision & comprehensiveness specified by the client. If the client has so requested, processing can stop here, & the reformulated query is shipped back to the client for further modification by client and user.
The session-oriented `state-ful' nature of the Z39.50 protocol facilitates the following types of client-server interactions using the Ranked Query:
- The ability to refine a query through a series of server reformulations & client modifications & (re)submissions;
- The ability to use selected results from previous searches as relevance feedback:
  - Entire documents or portions of documents may be referenced by the client in subsequent queries.
Z39.50 Ranked Query increases Client Control over Query Processing
The Z39.50 Ranked Query gives the client more control over processing & evaluation of the query:
- The client is able to restrict the set of documents (collection) to be searched by including Boolean search restrictions such as date, author, subject, etc.;
- The user/client may provide a number of 'hints' (suggestions) about the importance of particular query components, for instance: weighting of terms & operators, use of a variety of special ranking operators, & query reformulation options (e.g., term expansion, linguistic relationship);
- Many search systems support a 'tuning' mechanism to adjust the relative importance of precision vs. recall. The Z39.50 Ranked Query allows the user/client to adjust this weighting using a standardized weighting factor.
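
The three kinds of client control listed above can be pictured as fields of a single query object. The following Python sketch is only an illustration under stated assumptions: the class names, field names, the 0.0 to 1.0 range of `recall_bias`, and the exact-match restriction check are all hypothetical and do not reproduce the actual Type 102 syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WeightedTerm:
    text: str
    weight: float = 1.0   # hint: relative importance of this term (assumed scale)
    expand: bool = False  # hint: the server may apply term expansion


@dataclass
class RankedQuery:
    terms: List[WeightedTerm]
    # Boolean restrictions on the collection, e.g. {"author": "..."} (assumed form)
    restrictions: Dict[str, str] = field(default_factory=dict)
    # Assumed tuning factor: 0.0 favors precision, 1.0 favors recall
    recall_bias: float = 0.5

    def __post_init__(self) -> None:
        if not 0.0 <= self.recall_bias <= 1.0:
            raise ValueError("recall_bias must be between 0.0 and 1.0")


def matches_restrictions(query: RankedQuery, doc_fields: Dict[str, str]) -> bool:
    """Return True if a document satisfies every Boolean restriction.

    Restrictions filter the candidate set; they do not affect ranking.
    """
    return all(doc_fields.get(name) == value
               for name, value in query.restrictions.items())


query = RankedQuery(
    terms=[WeightedTerm("ranking", weight=2.0, expand=True),
           WeightedTerm("distributed")],
    restrictions={"author": "Buckley"},
    recall_bias=0.8,
)
```

Keeping the restrictions separate from the weighted terms mirrors the distinction drawn above: restrictions decide which documents are eligible, while the hints only influence how the eligible documents are ranked.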
Z39.50 Ranked Query allows the Server to Return Postings Information
Because of the less predictable & deterministic nature of relevance-based searching (as discussed above), a search server may perform query modifications or complex processing that is unrelated to what was specified in the user query. Although a client has more control over Z39.50 Ranked Query processing, the whys & wherefores of server query reformulation are still quite difficult for the user/client to understand.
Thus, an important feature of the Ranked Query is the ability for the server to return search result demographic meta-data (often referred to in the IR industry as `postings' data). The format & content of this data are also standardized within the definition of the Ranked Query, making it easier to interpret `postings' data from many different types of search systems.
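
As a rough illustration of how a client might use such data to explain a reformulated query to the user, the Python sketch below models one postings record per query term. The record layout is an assumption made for this example (per-term document counts, occurrence counts, and a flag for terms added by the server); the actual format is whatever the Ranked Query definition specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TermPosting:
    term: str
    doc_count: int                 # documents containing the term (assumed field)
    occurrences: int               # total occurrences in the collection (assumed field)
    added_by_server: bool = False  # True if introduced by server reformulation (assumed field)


def explain(postings: List[TermPosting]) -> List[str]:
    """Produce one human-readable line per term, rarest term first."""
    lines = []
    for p in sorted(postings, key=lambda p: p.doc_count):
        origin = "server-added" if p.added_by_server else "user-supplied"
        lines.append(
            f"{p.term}: {p.doc_count} docs, {p.occurrences} occurrences ({origin})"
        )
    return lines


report = explain([
    TermPosting("ranking", doc_count=120, occurrences=300),
    TermPosting("relevancy", doc_count=15, occurrences=22, added_by_server=True),
])
```

Because the same record layout would be returned by every conforming server, a client could apply one routine like `explain` to results from many different search systems.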
This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.