This workshop brought together a cross section of people involved with information server technologies, search technologies, and directory and online services, to discuss where repository interface standards could support better approaches to distributed indexing and searching. The goal of the workshop was not to produce standards, but rather to uncover and discuss areas of mutual concern where standards might gain momentum.
There was a great deal of interest in this workshop. To keep the group to a workable size while maximizing the breadth of attendee backgrounds, the workshop co-chairs limited attendance to one person per position paper and, furthermore, to one person per institution. (One extra attendee from each of @Home Network and Transarc attended as "scribes" for the plenary sessions of the workshop.) There were quite a few cases where people expressed legitimate desires to bring multiple representatives (e.g., from two different parts of a large company or government agency, or from two different research labs at a university), but we felt it was necessary to restrict attendance to keep the workshop to a reasonable size.
The workshop spanned two days. The first day's goal was to identify areas for potential standardization through several directed discussion sessions, while the second day's goal was to filter the list of issues and identify those most likely to lead to useful standards. Each technical session during the first day began with two 15-minute talks expressing opposing views on the session topic, followed by a breakout session in three parallel tracks, during which participants were asked to examine what might be standardized over 3-month, 12-month, and longer time periods, and then to report back with a summary slide at the plenary session. At first we tried to have a brief question-and-answer session after each talk and a plenary discussion after the breakout session, but after the first technical session we decided to cut out questions and plenary discussions to maximize the time available for the breakout sessions.
We selected three areas for the above sessions, based on the position papers submitted.
The first area, Distributed Data Collection, addressed issues associated with the collection of data across the network. We led with questions such as "Is robots.txt adequate for future needs?" and "What is the value of protocol- and programmatic-based solutions?"
The second area concerned Data Transfer Formats, suggesting a discussion about the relative merits of early deployment (for creating a dominant standard) vs. format negotiation (allowing interoperable access to multiple standards).
The final area examined the need for architectures that
distribute search across several repositories. We observed that the most
popular indexes are currently constructed as centralized repositories in
the mainframe model, while more recently meta-search engines have
gained popularity. We asked workshop participants to consider the role
of global vs. topic-specific indexes, repository access protocols like
Z39.50, and other mesh-like models appearing on the Web. We also asked
whether distributed searching is a realistic paradigm for administratively
To keep the workshop reporting process manageable, only session chairs were given a chance to revise drafts of this report before it was published. Other workshop attendees will have two weeks after the report is published to submit a web page, to be linked into the report, containing any comments they want to add.