Position Paper
Distributed Indexing/Searching Workshop

Clifford Lynch
University of California, Office of the President
clifford.lynch@ucop.edu

While the architectural model developed by the Harvest system provides a very valuable context for developing interface standards between web-crawlers and site-based gatherers, other recent work suggests that this architectural model needs significant extension. That work includes the Warwick framework, which is under development as a result of the recent OCLC/NCSA Metadata workshop, and the CNI White Paper on Networked Information Discovery and Retrieval. Key areas include:

1. Modeling of site-based processes (algorithmic and/or intellectual) that might move from author-provided in-document metadata to more comprehensive external metadata "containers". These containers both enrich and refine the author-provided descriptive information and may also include (possibly through inheritance models) site policies regarding usage rights and other properties of objects. These external metadata containers might then be collected by gatherers for export or directly by web-crawlers. This has implications for the HTML extension standards that might be used to include or attach metadata to objects, as well as for the site (repository) interfaces.

2. The extension of descriptive information that is exported from a site (including what is part of the base SOIF element set) along the lines indicated by the Dublin Core work and including the interoperability insights obtained at the Warwick meeting. This would include, for example, the use of controlled descriptive vocabularies.

3. There is a need to improve support for collections of information at sites that cannot be directly indexed (for example, Z39.50 databases that are accompanied by EXPLAIN-based metadata) or that are not accessible to web-crawlers (or perhaps even to indexing algorithms embedded in Gatherers) except under highly controlled circumstances, because the information provider wishes to retain control over what is exported (commercial intellectual property, for example). There is a need for a concept of trusted rendezvous sites where local information can interact with external indexing and abstracting algorithms, but where the information owners can be assured of some controls over the amount of extracted information that is being exported. The definition of these controls represents a key research problem.
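
As an illustration of item 1, the external metadata "containers" and the inheritance of site policies might be sketched as follows. This is only a minimal sketch under stated assumptions, not a format defined by Harvest, SOIF, or the Warwick framework: the class MetadataContainer, the function enrich, the field names, and the "policy." prefix are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MetadataContainer:
    """Hypothetical external metadata container for a site or a single object."""
    name: str
    descriptive: Dict[str, str] = field(default_factory=dict)  # descriptive fields
    policies: Dict[str, str] = field(default_factory=dict)     # e.g. usage rights
    parent: Optional["MetadataContainer"] = None               # container to inherit from

    def effective_policies(self) -> Dict[str, str]:
        """Merge policies up the parent chain; the nearest container wins."""
        inherited = self.parent.effective_policies() if self.parent else {}
        return {**inherited, **self.policies}

    def export_record(self) -> Dict[str, str]:
        """Flatten descriptive fields and effective policies for a gatherer."""
        record = dict(self.descriptive)
        for key, value in self.effective_policies().items():
            record[f"policy.{key}"] = value
        return record


def enrich(author_metadata: Dict[str, str], additions: Dict[str, str],
           site: MetadataContainer, name: str) -> MetadataContainer:
    """Build an object-level container from author-provided metadata.

    Site-supplied additions refine (override) the author's fields, and the
    new container inherits policies from the site-level container.
    """
    descriptive = {**author_metadata, **additions}
    return MetadataContainer(name=name, descriptive=descriptive, parent=site)


if __name__ == "__main__":
    # Assumed example values; none of these come from a real site.
    site = MetadataContainer(
        "site", policies={"usage": "non-commercial", "export": "abstract-only"}
    )
    doc = enrich(
        {"title": "Report", "subject": "indexing"},
        {"subject": "Distributed indexing"},
        site,
        "doc1",
    )
    doc.policies["export"] = "full-text"  # object-level override of a site policy
    print(doc.export_record())
```

In this sketch a gatherer or web-crawler would collect the output of export_record rather than parse the in-document metadata directly. Letting the nearest container override its parent is one possible inheritance rule among several, chosen here only for simplicity.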