Report of Session II, Breakout A

Chair (and report editor): Nick Arnett (Verity)

Conclusions

Define a new model for units of search and retrieval.
Define "push" and/or notification for resource discovery and maintenance.
Separate the URLs from the retrieval keys by use of a well-known META tag.
Reach consensus on a long-term view of information retrieval on the Web.
Give serious consideration to using Z39.50 or a subset ("Z39.50 lite") to accomplish many of these goals.
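The first conclusion calls for a new model for units of search and retrieval, but the report does not specify one. The sketch below is one hypothetical shape for it. A logical document, such as a book, owns several searchable units. Those units may be parts of a single file or separate files, and hits are reported against the document's title page. The class names, the `part` field, and the `url#part` labeling are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Unit:
    """Hypothetical unit of search: a whole file, or a named part of one."""

    url: str
    text: str
    part: str = ""  # e.g. "ch2" for a chapter inside a file; "" means the whole file


@dataclass
class LogicalDocument:
    """A document as the reader thinks of it, independent of how it is stored."""

    title_url: str  # where a hit on any of its units should send the reader
    units: List[Unit] = field(default_factory=list)


def search(documents: List[LogicalDocument], term: str) -> Dict[str, List[str]]:
    """Match the term inside individual units, then group hits by logical document."""
    term = term.lower()
    results: Dict[str, List[str]] = {}
    for doc in documents:
        for unit in doc.units:
            if term in unit.text.lower():
                # Illustrative label only: a fragment-style name for sub-file units.
                label = f"{unit.url}#{unit.part}" if unit.part else unit.url
                results.setdefault(doc.title_url, []).append(label)
    return results
```

Both storage layouts give the same kind of answer. A book kept in one file with chapters as parts, and a book whose chapters are separate files, each report their matching chapters under the book's own title URL.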
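The third conclusion proposes separating URLs from retrieval keys by means of a well-known META tag. The report names no tag, so the sketch below assumes a hypothetical `<meta name="retrieval-key" content="...">`. It shows how an indexer might choose the URL to list in search results, falling back to the page's own URL when the tag is absent. It relies only on Python's standard `html.parser` and `urllib.parse` modules.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical tag name: the report calls for "a well-known META tag" but names none.
RETRIEVAL_KEY_NAME = "retrieval-key"


class RetrievalKeyParser(HTMLParser):
    """Remembers the content of the first <meta name="retrieval-key"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.key: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta" or self.key is not None:
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        if name == RETRIEVAL_KEY_NAME and content:
            self.key = content


def result_url(source_url: str, html: str) -> str:
    """Return the URL a search results list should show for this page.

    Falls back to the page's own URL when no retrieval key is declared, and
    resolves a relative key against that URL.
    """
    parser = RetrievalKeyParser()
    parser.feed(html)
    parser.close()
    if parser.key is None:
        return source_url
    return urljoin(source_url, parser.key)
```

For example, a chapter at `http://example.com/book/ch3.html` that declares `content="title.html"` would be listed in results as `http://example.com/book/title.html`, the book's title page. A framed page could point to its frameset in the same way.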

Issues discussed

Our group began by discussing the possible need for a hierarchical extension to SOIF via Z39.50. Questions were raised about the difficulty of doing so and about the use of such a protocol through firewalls. A public domain Z39.50 toolkit was recommended. (The URL originally given for this toolkit does not appear to be correct.)

What is the unit of search and retrieval? Today, one file is generally treated as one document. There is a need to search and retrieve against objects that may be subsets of files (e.g., searching within the chapters of a book that is stored as one file) and to treat multiple files as if they were one document (e.g., searching and retrieving books whose chapters are stored in separate files).
Many documents, especially binary objects, have no meaningful meta-information associated with them, and file names are generally not informative enough to make up for it.
Proposed standards should not close the door to search and retrieval of novel types of information such as image search.
The publishers of information on the Web often do not have administrative rights on the servers through which they publish, and thus cannot take advantage of standards that require such rights. For example, the typical Internet Service Provider does not give users access to the "robots.txt" file in the Web root directory.
One of this group's goals should be to enable more timely maintenance of indexes (including discovery of new documents) by supporting a means of asking Web servers "What is new or changed since [date and time]?" An event-based "what's new" method would improve efficiency by allowing servers to notify user-agents, including indexing robots. The group agreed that the solution should be "pull" with notification.
We should assume that the Web will support multiple business models, requiring both "pull" models (such as an exploring robot) and "push" models (servers that notify user-agents, such as robots, of changes).
Retrieval keys (typically a URL on the Web) are not always the key that should be used to retrieve the document from a search results list. For example, a search "hit" found in a chapter of a book might properly point to the book's title page. For certain kinds of Web pages, such as multi-part framed documents, retrieving the source URL would result in client errors because the frame document would be out of context.
Search brokering services will be needed to deal with the increasing number of search agents.
Standards for meta-information and query language should be considered.
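The "what is new or changed since [date and time]" request, combined with the group's preference for "pull" with notification, can be sketched in-process as follows. This is a minimal illustration under assumed names (`ChangeLog`, `IndexingRobot`). The notification carries only a timestamp, and the robot then pulls the list of changed URLs itself. The report defines no wire protocol, and a real server would deliver both the notification and the query over the network.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class ChangeLog:
    """Hypothetical server-side record of when each URL was last added or modified."""

    modified: Dict[str, datetime] = field(default_factory=dict)
    # Subscribers (e.g. indexing robots) are told *that* something changed, not what.
    subscribers: List[Callable[[datetime], None]] = field(default_factory=list)

    def record(self, url: str, when: Optional[datetime] = None) -> None:
        when = when or datetime.now(timezone.utc)
        self.modified[url] = when
        for notify in self.subscribers:
            notify(when)  # notification only; the subscriber pulls the details

    def changed_since(self, since: datetime) -> List[str]:
        """Answer "What is new or changed since [date and time]?"."""
        return sorted(url for url, when in self.modified.items() if when > since)


class IndexingRobot:
    """A user-agent that pulls the change list whenever it is notified."""

    def __init__(self, server: ChangeLog, last_visit: datetime) -> None:
        self.server = server
        self.last_visit = last_visit
        self.to_fetch: List[str] = []
        server.subscribers.append(self.on_notify)

    def on_notify(self, when: datetime) -> None:
        # Pull: ask only for what changed since the last visit, then advance the mark.
        for url in self.server.changed_since(self.last_visit):
            if url not in self.to_fetch:
                self.to_fetch.append(url)
        self.last_visit = max(self.last_visit, when)
```

Keeping the notification this thin means a server never has to know what a given robot already holds. Each robot tracks its own last visit and asks only for the difference.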
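A search brokering service of the kind anticipated above would send one query to several search agents and merge their answers. The sketch below is one hypothetical approach, not a design from the report. Each agent is modeled as a local function returning `(url, score)` pairs. Scores are rescaled relative to each agent's own top hit, on the assumption that raw scores from different engines are not comparable. A URL returned by several agents keeps its best score.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical model of a search agent: query string in, (url, score) hits out.
# Real agents would be remote services reached over some query protocol.
Hit = Tuple[str, float]
SearchAgent = Callable[[str], List[Hit]]


def broker_search(query: str, agents: Iterable[SearchAgent], limit: int = 10) -> List[Hit]:
    """Send one query to every agent and merge the answers into one ranked list."""
    best: Dict[str, float] = {}
    for agent in agents:
        try:
            hits = agent(query)
        except Exception:
            continue  # one unavailable agent should not sink the whole query
        if not hits:
            continue
        # Rescale each agent's scores relative to its own top hit before merging.
        top = max(score for _, score in hits)
        if top <= 0:
            continue
        for url, score in hits:
            normalized = score / top
            # The same URL may come back from several agents; keep its best score.
            if normalized > best.get(url, 0.0):
                best[url] = normalized
    # Highest score first; ties broken alphabetically so the order is stable.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Max-score rescaling is only one simple choice. A broker could instead merge by rank position, which avoids relying on each agent's score scale at all.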

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.