Session III - Format Negotiation / Standardization
Track A - Dana conference room
Chair (and report editor) - Gary Adams, Sun Microsystems

We started the session by giving each person a chance to introduce themselves and to offer an initial comment about the two talks. Unlike the first breakout session, where the initial controversy was exposed during the clarification questions at the plenary session, we started with a clean slate. The initial comments provided enough momentum to keep the dialog going strong for the remainder of the hour. It was clear from the introductions that we had a good cross section of "search engine vendors", "search service providers", academic researchers, and professional library science users.

Most of the debate that took place was more generic than the session title, Format Negotiation / Standardization, implied. From the opening statements it was clear that neither of the plenary talks actually introduced much controversy into the session; e.g., the VSL talk was essentially a plea, on behalf of a large software brokering service provider, for more descriptive information, and the informal standards talk was too vague about the lightweight agreement and the extended negotiated features to stir up much controversy.

Most of the breakout session dealt with a number of issues surrounding metadata. A number of arguments were presented for using a small common subset, such as the Dublin/Warwick metadata workshop recommendations, along with several objections defending the need for sufficient extensibility for future needs or simply for special domain needs. A strong plea was presented by the "search service providers" to make it easy to bulk load as much textual information as possible and to have a small number of common metadata elements that could be used uniformly across any spidered document.

A short discussion ensued about the "spider problem". (A great quote: "There are a handful of 'inhalers' out there, and all of a sudden the 'network bandwidth conservation society' springs up overnight.") Several large sites confirmed that current spiders are lost in the noise compared to end user browsing of their pages. Some smaller sites reported that spiders account for a higher percentage of the hits at their sites and that bandwidth should be conserved if possible. One side point was noted that "personal agents" will become far more prevalent in the future and that they also need to be mindful of the robots.txt convention (which they typically are not).

The counter argument came from the groups with serious material published on the net. A wider range of metadata is required for structured, highly authored information. In many cases the best cataloguers are not the document authors, but highly trained librarians; e.g., metadata may be embedded in the document by the original author, or it may be added externally to the document by a third party. The third-party aspect of metadata led to some discussion of the PICS model of additional associated metadata. A quality-of-service style of discussion sprang out of the desire to 'sign' metadata or portions of metadata. Objective, "refereed" types of information may be recorded as metadata, as well as subjective evaluations, such as "my boss rated this page highly".
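To make the third-party metadata idea concrete, the sketch below shows one way an externally associated, signable metadata record might be represented, in the spirit of the PICS-style model mentioned above. The field names, the rating values, the checksum standing in for a real signature, and the use of Python are illustrative assumptions on my part, not anything the group agreed upon.

    # Hypothetical sketch: a third-party metadata "label" kept outside the
    # document it describes. Field names and the integrity check are
    # illustrative assumptions, not a proposed standard.
    import hashlib

    label = {
        "about":   "http://www.example.org/report.html",  # the described document
        "labeler": "Reference Desk, Example University Library",
        "elements": {                                      # small common element set
            "title":   "Annual Widget Survey",
            "creator": "J. Smith",
            "subject": "widgets; manufacturing statistics",
            "date":    "1996-05-01",
        },
        "review": "refereed",        # objective, refereed assessment
        "rating": "recommended",     # subjective evaluation by the labeler
    }

    # A simple digest stands in for a real digital signature, so a search
    # service could at least verify the label was not altered in transit.
    digest = hashlib.sha256(
        repr(sorted(label["elements"].items())).encode()
    ).hexdigest()
    print("label for", label["about"], "checksum", digest)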
A key part of the discussion took place when a distinction was made between "passive publication" (the current spider model, in which anything online is assumed to be intended for general consumption) and "active publication" (traditional targeted communication, with value-added editorial polishing). In the active publishing model the quality of metadata and the mechanism for publication are distinctly different. In many cases a push model is far more appropriate for guaranteeing the timeliness of updates within search service databases (at a minimum it should be possible to provide notification of new information and have it pulled within 24 hours).

A short discussion took place after a miscommunication about "standards" and "economic models". The original commenter intended to say that any "proposed standards" must be viable for the current "search businesses" in order to remain a viable alternative. The message was received as a call for "developing standard acceptable economic models" for the current search services.

Once the metadata/publishing discussions settled down, some additional ideas were bounced around about result lists and the ability of search services to expand over more search vendors' collections. The notion of an "exportable open inverted index format" was mentioned as a possible common denominator. No firm commitments were made, and there was no real consensus that this would solve the root problems of interoperability.

------

Here are a few concrete mandates that I have gleaned from the conversations, although the group did not spend time assessing specific priorities or timeframes during the breakout (i.e. they represent some of my listening biases):

Short term:
At a minimum, some key agreement needs to be reached on the "format of metadata". The proliferation of different metadata mechanisms/structures just makes things more chaotic for search service providers. Many authors/cataloguers will take the time to supply additional meta information; they just want to be told which form the information should take. Mechanisms for defining "collections of information" are desperately needed, both for presenting intelligent entry points into repositories of generated materials and for labeling purposes. There are strong desires for more efficient/effective bulk data transfer for the spiders operated by the search service providers.

Medium term:
There needs to be a conduit for more specific extensions to metadata. The ability to define standard contents for specific metadata fields (controlled vocabularies) will make them more useful to a wider range of users. The current Z39.50 standards should be evaluated to see if they fulfill the requirements that have been voiced for an open, extensible interface.

Long term:
Some form of open standard for indexing information is needed so that provider-generated indexes can be shared without openly publishing the source text materials. This effort could take the form of open index data formats or of open interfaces for index access. Additional standards will be needed to allow some interoperability with other media types. In particular, image retrieval is beginning to be deployed out of the research community into commercial products and would benefit greatly from extensible index and metadata interfaces.
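As a rough illustration of the long-term idea above, the sketch below builds a tiny inverted index and exports only term-to-document postings, so that an index could be shared without republishing the source text itself. The record layout, the format label, and the choice of Python are assumptions for illustration only; no particular format was agreed on in the session.

    # Minimal sketch of an "exportable" inverted index: only terms and
    # document identifiers are written out, never the source text.
    # The JSON layout below is hypothetical, not a proposed standard.
    import json
    from collections import defaultdict

    documents = {
        "http://www.example.org/a.html": "metadata formats for search services",
        "http://www.example.org/b.html": "image retrieval and metadata interfaces",
    }

    postings = defaultdict(set)
    for url, text in documents.items():
        for term in text.lower().split():
            postings[term].add(url)

    export = {
        "format": "example-open-index/0.1",  # hypothetical format identifier
        "postings": {term: sorted(urls) for term, urls in postings.items()},
    }
    print(json.dumps(export, indent=2))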