Distributed Searching Session
-----------------------------
Session IV, Track A Breakout Session
Chaired and Reported by Ray Denenberg (Library of Congress)

We set some ground rules up front: we would not debate whether distributed
searching is "good" or not. Although there are clearly strong positions on
both sides, that question is out of scope; for purposes of this session we
assume the existence of distributed search. We also limited the scope to
exclude discussion of indexing and negotiation, since these were the
subjects of the other two sessions. Another scope limitation: we tried to
focus on issues related specifically to "distributed" searching, as opposed
to searching in general.

Issues
------

In the first part of the session we developed a list of issues pertaining
to distributed searching. We made some attempt to rank these issues, but
got only as far as determining the top two in order of importance. They
are:

(1) Model for merging and ranking, including de-duping.
(2) Semantic interoperability.

Other issues that we identified, in no particular order:

- Level of aggregation.
- Meta engines.
- Query syntax.
- Representation of quality/characteristics of an index.
- User control and transparency.
- Architecture.
- Discipline-specific domains.
- Collection model; "clusters".
- Granularity; what constitutes a "hit"?
- Algorithmic modelling; e.g. looping, maximum length, time-to-live.
- Multi-national issues.
- "Advertisement" model.
- Search vs. browse.
- Navigation.
- Schema negotiation.

Short/Medium/Long Term Areas for Standardization
------------------------------------------------

In the short term it seems reasonable to expect that there can be agreement
on query syntax.

In analyzing the issue of ranking and merging, we see three levels of
agreement. In the short term, "add value" and "known-native" ranking/merging
are possible. For the latter, the intermediary merges results based on the
native ranking performed by the servers, and is able to reflect the ranking
methodology (perhaps by an object identifier). "Add value" ranking/merging
means that the intermediary adds value to the native ranking of the servers.
In the medium term, "metadata"-based merging, and in the long term,
"homogeneous" ranking/merging, are the objectives. "Metadata"-based merging
means that the intermediary ranks and merges results based on metadata
provided by the servers. Homogeneous ranking means that the servers provide
ranked results with consistent ranking, and the intermediary simply merges
the results. (A rough sketch of the short-term merging approach is given
below, before the summary.)

On the subject of semantic interoperability, we defined three levels: (1)
basic ad hoc, (2) static, and (3) dynamic, for the short, medium, and long
terms respectively. Static vs. dynamic are understood in terms of the Z39.50
model, where static semantic interoperability is achieved via out-of-band
exchange (i.e. not via Z39.50) of the necessary definitions (attribute sets,
schemas, etc.), and dynamic semantic interoperability is achieved via the
use of the Z39.50 Explain facility. (This distinction is also sketched
below.)

For the collection model, we defined two levels, as medium- and long-term
objectives (no short-term objective seems realistic). In the medium term it
is possible to define metadata at the "service" level as well as "basic"
collection-level metadata. In the long term, we can define "detailed"
collection-level metadata.
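To make the short-term ranking/merging levels concrete, here is a minimal
sketch of "known-native" merging with an optional "add value" step. The
names used (SearchResult, merge_known_native, server_weights) are
illustrative only, and the sketch assumes each server returns a normalized
score and an object identifier for its ranking method; nothing here is
prescribed by the session.

    from dataclasses import dataclass
    from typing import Dict, List, Optional, Tuple


    @dataclass
    class SearchResult:
        server: str            # server that returned the hit
        record_id: str         # identifier used for de-duping across servers
        native_rank: float     # server-assigned score, assumed normalized to 0..1
        rank_method_oid: str   # object identifier naming the server's ranking method


    def merge_known_native(result_sets: List[List[SearchResult]],
                           server_weights: Optional[Dict[str, float]] = None
                           ) -> List[SearchResult]:
        # Keep the best-scoring occurrence of each record (de-duping),
        # optionally re-weighting each server's native scores ("add value").
        best: Dict[str, Tuple[float, SearchResult]] = {}
        for results in result_sets:
            for r in results:
                weight = (server_weights or {}).get(r.server, 1.0)
                score = r.native_rank * weight
                if r.record_id not in best or score > best[r.record_id][0]:
                    best[r.record_id] = (score, r)
        # Merge into a single list ordered by the (possibly re-weighted) score.
        return [r for _, r in sorted(best.values(), key=lambda p: p[0], reverse=True)]

An intermediary that simply trusts the native scores passes no
server_weights; supplying weights is one simple way of "adding value" to
the servers' native ranking.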
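The static vs. dynamic distinction for semantic interoperability can
likewise be illustrated with a small sketch: a statically configured,
out-of-band table maps a common search attribute onto each server's native
index. Under the "dynamic" level the same table would instead be populated
at run time from the server's Z39.50 Explain database. The attribute names,
server names, and query form below are hypothetical.

    from typing import Dict

    # Out-of-band mapping: common attribute -> {server: native index name}.
    # In the "dynamic" case this table would be built from Explain at run time.
    STATIC_ATTRIBUTE_MAP: Dict[str, Dict[str, str]] = {
        "title":  {"server-a": "ti", "server-b": "dc.title"},
        "author": {"server-a": "au", "server-b": "dc.creator"},
    }


    def translate_query(attribute: str, term: str, server: str) -> str:
        # Rewrite an (attribute, term) pair into one server's native query form.
        try:
            native_index = STATIC_ATTRIBUTE_MAP[attribute][server]
        except KeyError:
            raise ValueError(f"no mapping for {attribute!r} on {server!r}")
        return f"{native_index}={term}"


    # For example: translate_query("title", "metadata", "server-b") -> "dc.title=metadata"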
In summary:

Short Term
----------
- Query syntax
- "Add value" and "known-native" ranking/merging
- Basic ad hoc semantic interoperability

Medium Term
-----------
- "Metadata"-based ranking/merging
- "Service"-level metadata
- "Basic" collection-level metadata
- "Static" semantic interoperability

Long Term
---------
- "Homogeneous" ranking/merging
- "Detailed" collection-level metadata
- "Dynamic" semantic interoperability