Session III - Format Negotiation / Standardization
Track A - Dana conference room
Chair (and report editor) - Gary Adams, Sun Microsystems

We started the session by giving each person a chance to introduce themselves and to offer an initial comment about the two talks. Unlike the first breakout session, where the initial controversy was exposed during the clarification questions at the plenary session, we started with a clean slate. The initial comments provided enough momentum to keep the dialog going strong for the remainder of the hour. It was clear from the introductions that we had a good cross section of "search engine vendors", "search service providers", academic researchers, and professional library science users.

Most of the debate that took place was more generic than the session title, Format Negotiation / Standardization, implied. From the opening statements it was clear that neither of the plenary talks actually introduced much controversy into the session; e.g., the VSL talk was essentially a plea, on behalf of a large software brokering service provider, for more descriptive information, and the informal standards talk was too vague about the lightweight agreement and the extended negotiated features to stir up much controversy.

Most of the breakout session dealt with a number of issues surrounding metadata. A number of arguments were presented for using a small common subset, such as the Dublin/Warwick metadata workshop recommendations, along with several objections defending the need for sufficient extensibility for future needs or simply for special domain needs. A strong plea was presented by the "search service providers" to make it easy to bulk load as much textual information as possible and to have a small number of common metadata elements that could be used uniformly across any spidered document.

A short discussion ensued about the "spider problem". (A great quote: "There are a handful of 'inhalers' out there, and all of a sudden the 'network bandwidth conservation society' springs up overnight.") Several large sites confirmed that current spiders are lost in the noise compared to end user browsing of their pages. Some smaller sites reported that spiders account for a higher percentage of the hits at their sites and that bandwidth should be conserved if possible. One side point was noted that "personal agents" will become far more prevalent in the future and that they also need to be mindful of the robots.txt convention (which they typically are not).

The counter argument came from the groups with serious material published on the net. A wider range of metadata is required for structured, highly authored information. In many cases the best cataloguers are not the document authors, but highly trained librarians; e.g., metadata may be embedded in the document by the original author, or it may be added externally to the document by a third party. The third-party aspect of metadata led to some discussion of the PICS model of additional associated metadata. A quality-of-service style of discussion sprang out of the desire to 'sign' metadata or portions of metadata. Objective, "refereed" types of information may be recorded as metadata, as well as subjective evaluations, such as "my boss rated this page highly".
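To make the third-party metadata idea concrete, the sketch below shows one way an externally associated, signable metadata record might be represented, in the spirit of the PICS-style model mentioned above. The field names, the rating values, the checksum standing in for a real signature, and the use of Python are illustrative assumptions on my part, not anything the group agreed upon.

    # Hypothetical sketch: a third-party metadata "label" kept outside the
    # document it describes. Field names and the integrity check are
    # illustrative assumptions, not a proposed standard.
    import hashlib

    label = {
        "about":   "http://www.example.org/report.html",  # the described document
        "labeler": "Reference Desk, Example University Library",
        "elements": {                                      # small common element set
            "title":   "Annual Widget Survey",
            "creator": "J. Smith",
            "subject": "widgets; manufacturing statistics",
            "date":    "1996-05-01",
        },
        "review": "refereed",        # objective, refereed assessment
        "rating": "recommended",     # subjective evaluation by the labeler
    }

    # A simple digest stands in for a real digital signature, so a search
    # service could at least verify the label was not altered in transit.
    digest = hashlib.sha256(
        repr(sorted(label["elements"].items())).encode()
    ).hexdigest()
    print("label for", label["about"], "checksum", digest)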
A key part of the discussion took place when a distinction was made between "passive publication" (the current spider model, in which anything online is assumed to be intended for general consumption) and "active publication" (traditional targeted communication, with value-added editorial polishing). In the active publishing model the quality of metadata and the mechanism for publication are distinctly different. In many cases a push model is far more appropriate for guaranteeing the timeliness of updates within search service databases (at a minimum it should be possible to provide notification of new information and have it pulled within 24 hours).

A short discussion took place after a miscommunication about "standards" and "economic models". The original commenter intended to say that any "proposed standards" must be viable for the current "search businesses" in order to remain a viable alternative. The message was received as a call for "developing standard acceptable economic models" for the current search services.

Once the metadata/publishing discussions settled down, some additional ideas were bounced around about result lists and the ability of search services to expand over more search vendors' collections. The notion of an "exportable open inverted index format" was mentioned as a possible common denominator. No firm commitments were made, and there was no real consensus that this would solve the root problems of interoperability.

------

Here are a few concrete mandates that I have gleaned from the conversations, although the group did not spend time assessing specific priorities or timeframes during the breakout (i.e. they represent some of my listening biases):

Short term:
At a minimum, some key agreement needs to be reached on the "format of metadata". The proliferation of different metadata mechanisms/structures just makes things more chaotic for search service providers. Many authors/cataloguers will take the time to supply additional meta information; they just want to be told which form the information should take. Mechanisms for defining "collections of information" are desperately needed, both for presenting intelligent entry points into repositories of generated materials and for labeling purposes. There are strong desires for more efficient/effective bulk data transfer for the spiders operated by the search service providers.

Medium term:
There needs to be a conduit for more specific extensions to metadata. The ability to define standard contents for specific metadata fields (controlled vocabularies) will make them more useful to a wider range of users. The current Z39.50 standards should be evaluated to see if they fulfill the requirements that have been voiced for an open, extensible interface.

Long term:
Some form of open standard for indexing information is needed so that provider-generated indexes can be shared without openly publishing the source text materials. This effort could take the form of open index data formats or of open interfaces for index access. Additional standards will be needed to allow some interoperability with other media types. In particular, image retrieval is beginning to be deployed out of the research community into commercial products and would benefit greatly from extensible index and metadata interfaces.
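As a rough illustration of the long-term idea above, the sketch below builds a tiny inverted index and exports only term-to-document postings, so that an index could be shared without republishing the source text itself. The record layout, the format label, and the choice of Python are assumptions for illustration only; no particular format was agreed on in the session.

    # Minimal sketch of an "exportable" inverted index: only terms and
    # document identifiers are written out, never the source text.
    # The JSON layout below is hypothetical, not a proposed standard.
    import json
    from collections import defaultdict

    documents = {
        "http://www.example.org/a.html": "metadata formats for search services",
        "http://www.example.org/b.html": "image retrieval and metadata interfaces",
    }

    postings = defaultdict(set)
    for url, text in documents.items():
        for term in text.lower().split():
            postings[term].add(url)

    export = {
        "format": "example-open-index/0.1",  # hypothetical format identifier
        "postings": {term: sorted(urls) for term, urls in postings.items()},
    }
    print(json.dumps(export, indent=2))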