Report of Session II, Breakout A
Chair (and report editor): Nick Arnett (Verity)
Conclusions
- Define a new model for units of search and retrieval.
- Define "push" and/or notification for resource discovery
and maintenance.
- Separate the URLs from the retrieval keys by use of a well-known META
tag.
- Reach consensus on a long-term view of information retrieval on the
Web.
- Give serious consideration to using Z39.50 or a subset ("Z39.50
lite") to accomplish many of these goals.
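The META-tag conclusion above can be illustrated with a minimal sketch. It assumes a hypothetical tag name, `retrieval-key`, since the group did not settle on one. An indexer that fetched a page by its source URL reads the tag and, when it is present, lists its content in search results instead of the source URL.

```python
from html.parser import HTMLParser

# Hypothetical META name, chosen for illustration only;
# the breakout group did not specify one.
RETRIEVAL_KEY_NAME = "retrieval-key"


class RetrievalKeyParser(HTMLParser):
    """Collects the content of the first matching META tag, if any."""

    def __init__(self):
        super().__init__()
        self.retrieval_key = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.retrieval_key is not None:
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name == RETRIEVAL_KEY_NAME and attr_map.get("content"):
            self.retrieval_key = attr_map["content"]


def result_url(source_url, html_text):
    """Return the URL a results list should link to for this page."""
    parser = RetrievalKeyParser()
    parser.feed(html_text)
    # Fall back to the URL the page was fetched from when no key is declared.
    return parser.retrieval_key or source_url
```

For a chapter page that declares the book's title page as its retrieval key, `result_url` returns the title-page URL; for a page with no such tag it returns the source URL unchanged.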
Issues discussed
Our group began by discussing the possible need for a hierarchical extension
to SOIF via Z39.50. Questions were raised about the difficulty of doing
so and about the use of such a protocol through firewalls. A Z39.50
public domain toolkit was recommended. (This URL does not appear to
be correct.)
- What is the unit of search and retrieval? Today, one file is generally
treated as one document. There is a need to search and retrieve against
objects that may be subsets of files (e.g., searching within chapters of
a book that is stored as one file) and to treat multiple files as if they
were one document (e.g., searching and retrieving books that are made up of
chapters in separate files).
- Many documents, especially binary objects, have no meaningful meta-information
associated with them. File names are generally not sufficiently informative.
- Proposed standards should not close the door to search and retrieval
of novel types of information, such as images.
- The publishers of information on the Web often do not have administrative
rights on the server with which they publish and thus cannot take advantage
of standards that require such rights. For example, the typical Internet Service
Provider does not give users access to the "robots.txt" file
in the Web root directory.
- One of this group's goals should be to enable more timely maintenance
of indexes (including discovery of new documents) by supporting a means
of asking Web servers, "What is new or changed since [date and time]?"
An event-based "what's new" method would improve efficiency by
allowing servers to notify user-agents, including indexing robots. The
group agreed that the nature of the solution should be "pull"
with notification.
- We should assume that the Web will support multiple business models
that will require both "pull" (such as an exploring robot) and
"push" (servers that notify user-agents, such as robots, of changes)
models.
- The key under which a document is found (typically a URL on the Web) is not
always the key that should be used to retrieve it from a search results list.
For example, a search "hit" found in a chapter of a book might
properly point to the book title page. In certain kinds of Web pages, such
as multi-part framed documents, retrieving on the source URL would result
in client errors because the frame document would be out of context.
- Search brokering services will be needed to deal with the increasing
number of search agents.
- Standards for meta-information and query language should be considered.
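The "what is new or changed since [date and time]" query discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: the `catalog` mapping of URLs to modification times and the function name `changed_since` are hypothetical, and the group did not define a wire format or protocol for the query.

```python
from datetime import datetime, timezone


def changed_since(catalog, since):
    """Return the URLs modified strictly after `since`, oldest change first.

    `catalog` is a hypothetical server-side mapping of URL -> last-modified
    datetime; a real server would derive it from its file system or database.
    """
    hits = [(modified, url) for url, modified in catalog.items() if modified > since]
    return [url for modified, url in sorted(hits)]


# Example: an indexing robot asks only for what changed since its last visit.
catalog = {
    "/a.html": datetime(1996, 6, 1, tzinfo=timezone.utc),
    "/b.html": datetime(1996, 6, 15, tzinfo=timezone.utc),
    "/c.html": datetime(1996, 6, 10, tzinfo=timezone.utc),
}
last_visit = datetime(1996, 6, 5, tzinfo=timezone.utc)
to_reindex = changed_since(catalog, last_visit)
```

In a "pull with notification" arrangement, a server's notification would only tell the robot that something changed; the robot would then pull a list like `to_reindex` and fetch just those documents instead of re-crawling the whole site.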
This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.