Session IV Track C - Central vs. Distributed Searching/Indexing

Chair (and Notes Editor): Ken Weiss (UC Davis)

This session opened with consideration of some questions: Will central indices survive? Are distributed indices feasible? Are central and distributed architectures mutually exclusive?

Central indices were defined as the model offered by AltaVista, InfoSeek, Excite, Lycos, and others: large indices maintained at a single site, which attempt to collect all (or at least a very significant percentage) of the information available in their chosen domain (WWW, USENET news, and so on). Distributed indices are information systems in which a large number of servers are interconnected, each holding only a part of the full database.

After brief discussion the group concluded that, for the near future, central and distributed services will coexist. There were some concerns about the scaling of central services, particularly in light of the differing growth curves of the quantity of content to be indexed and the capacity of hardware and networks to process and transfer that content. However, the representatives of the central indexing services (most notably Mike Frumkin of Excite) felt that, with improvements in crawling algorithms and the addition of some information to guide spiders, the central services can scale for the next several years. Other issues raised on the viability of central indices included the coming problem of indexing large binary object data (multimedia), and the need to let content providers determine what gets indexed in a richer way than robots.txt can handle.

Distributed indices are still considered research projects; current testbeds have not demonstrated scalability to multi-million-record data collections. However, distributed indices are a promising technology for the creation of virtual communities, in which smaller content providers push their indexing information into a distributed, subject-specific mesh. This approach is more akin to Harvest and Whois++ than to the crawler-based central services. Another possible method for grouping related information would be to post metadata in a form that lets a central index construct a virtual community. In a distributed model there may be a need to publish the source of information along with the data itself, to provide some means of assessing the quality and trustworthiness of the material.

Central indices may evolve into a distributed architecture in response to the problems of scaling. As for-profit servers are aggregated into a distributed mesh, a new business model will have to develop to support the exchange of value-added content by unrelated parties. These models could be as simple as an agreement to retain advertisements when search results are redirected through a metacrawler, or as ambitious as the IBM InfoMarket project.

Once agreement was reached on the definition of the problem, discussion turned to standards areas that will facilitate the scaling of central indices and the development of distributed indices. The following areas were identified as promising for standards work, sorted by time frame:

0-12 months:

Enhancements to robots.txt (a hypothetical sketch follows this list)
* How often should content be indexed (volatility indicator)?
* Last-modified listing
* Explicit request for indexing
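None of these extensions exist in the robots exclusion convention, which defines only User-agent and Disallow; the directive names below (Visit-interval, Last-modified, Please-index) are invented purely to illustrate what a richer robots.txt along these lines might look like:

    # Hypothetical robots.txt extensions -- the directive names are
    # invented for illustration; only User-agent and Disallow are
    # part of the existing robots exclusion convention.
    User-agent: *
    Disallow: /private/

    # Volatility indicator: suggested revisit interval per subtree
    Visit-interval: /news/ 1d
    Visit-interval: /archive/ 30d

    # Last-modified listing: lets a spider skip unchanged subtrees
    Last-modified: /archive/ 1996-05-01

    # Explicit request to index content a crawler would not discover
    Please-index: /catalog/full-listing.html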
Cooperation between search services and WWW server vendors
* Out-of-band data exchange (keep it off HTTP)
* Bulk transfer of content from server to indexer

13-24 months:

Extended server/indexer cooperation
* Negotiated push model for transfer of metadata (RDM? If not, what's missing?)

Infrastructure for topical or semantic virtual communities (a sketch of the push model appears at the end of these notes)
* Common Indexing Protocol (ftp://ds.internic.net/internet-drafts/draft-ietf-find-cip-01.txt)
* Multipoint registration/notification for receipt of index PUSHes (Whois++ model)
* Standard semantic profiles of metadata
* Simple tools to create and distribute indexing/metadata information

25-36 months:
* Multicast or net news model for distribution of metadata
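To make the multipoint-PUSH idea concrete, the following is a minimal sketch in the Whois++/CIP spirit, assuming an invented summary-record format and an invented MeshIndexer interface (make_centroid, receive_push, route); the actual Common Indexing Protocol draft cited above defines its own record formats and transport. A small content provider reduces its documents to a compact word list and pushes it to every subject mesh it has registered with; the mesh servers then answer queries by referral rather than by returning documents.

    # Sketch of multipoint index PUSH into a subject-specific mesh.
    # The record format and MeshIndexer API are hypothetical, not CIP.
    import re

    def make_centroid(server_name, documents):
        """Reduce a provider's documents to a compact word list
        (a 'centroid') that mesh indexers can use for routing."""
        words = set()
        for text in documents:
            words.update(re.findall(r"[a-z]+", text.lower()))
        return {"source": server_name, "terms": sorted(words)}

    class MeshIndexer:
        """A subject-specific index server holding centroids from
        many small content providers."""
        def __init__(self, topic):
            self.topic = topic
            self.centroids = []

        def receive_push(self, centroid):
            # Multipoint notification: every registered indexer
            # receives the same centroid from the provider.
            self.centroids.append(centroid)

        def route(self, term):
            # Referral, not retrieval: answer "which providers might
            # hold this term", as a Whois++ mesh would.
            return [c["source"] for c in self.centroids
                    if term in c["terms"]]

    # A small provider pushes one centroid to two meshes at once.
    docs = ["Grape phylloxera in Davis vineyards",
            "Irrigation schedules for new rootstock"]
    centroid = make_centroid("viticulture.ucdavis.edu", docs)

    indexers = [MeshIndexer("agriculture"), MeshIndexer("california")]
    for ix in indexers:
        ix.receive_push(centroid)

    print(indexers[0].route("phylloxera"))
    # -> ['viticulture.ucdavis.edu']

The point the sketch illustrates is that the mesh holds only compact summaries, never full content, so an indexer stays small even as the number of providers behind it grows; full retrieval happens by referral back to the source, which also preserves the provenance information discussed above.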