URCs as a substrate for distributed searching
Ron Daniel Jr.
Advanced Computing Lab
MS B287
Los Alamos National Laboratory
Los Alamos, NM, USA 87545
rdaniel@lanl.gov
The increasing number of resources on the web makes centralized indices
less and less satisfactory. Some form of distributed cataloging and
indexing effort seems necessary. But exactly what form? What sort of
cataloging information should be collected? Who will create the
descriptions and how will they be managed over the lifetime of the
resource and beyond? What protocols will be used to transfer the
descriptions? How will queries be encoded and what query facilities will
be provided? What sort of forward knowledge must be propagated to
allow reasonable query forwarding?
These questions cannot be answered once and for all. We must have a
system that can adapt to change, that can allow many different
experimental solutions to co-exist, while preserving to the greatest
degree possible the intellectual investment that has been made in
describing network resources. The library community has already shown
us the power of shared cataloging.
Uniform Resource Characteristics (URCs) were proposed by the IETF's
Uniform Resource Identifier working group as a structure for
containing information on networked resources.
The draft documents for URC standards specify an abstract service that
can have many different concrete realizations, and specify how those
different realizations can interoperate. The key ideas behind URCs are:
- Allow for a variety of attribute sets, known as "URC subtypes".
  An attribute set that is appropriate for describing HTML pages is not
  likely to describe scientific datasets adequately. Any reasonable indexing
  system must allow different descriptive schemas to be used, and must address
  namespace conflicts between those schemas.
- Standardize the meaning of a very few elements.
  Having a variety of descriptive schemas means that systems will frequently
  encounter descriptions in unknown schemas. However, elements such as
  URL, URN, URC, and Content-type have a rigorous definition and are so
  pervasive that standardizing them will allow a great deal of useful work
  to be performed even when the full descriptive schema is not known.
- Don't specify one syntax; instead, specify a canonical representation
  that can be mapped into and out of a variety of syntaxes.
  Specifying one syntax is a recipe for disaster over the long haul, and
  leads to religious battles in the short term. PICS, IAFA, SGML, and MARC
  all have adherents, for good reasons. An appropriate canonical
  representation should accommodate all of these.
- Standardize the basic operations to manipulate the canonical representation,
  and let different query and transformation languages be developed to
  use those operations in novel ways.
  Services will want to compete on the simplicity or power of their search
  capabilities. A mechanism that allows such competition will also allow
  search capabilities to evolve gracefully.
- Don't specify one protocol; instead, specify how the canonical
  representation and the operations on it are encoded in particular
  protocols.
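A small sketch may make these ideas more concrete. The Python below is not taken from the URC drafts: the class and function names (URC, qualified, find, to_attribute_lines, from_attribute_lines, to_sgml_like), the "dataset" subtype, the dot-prefix convention for keeping subtype attributes from colliding, and both concrete syntaxes are hypothetical. It shows one canonical form (an ordered list of attribute/value pairs tagged with a subtype), the few core elements any system can read without knowing the subtype, basic operations on that form, a toy query built from those operations, and two syntaxes that map into and out of it.

```python
"""Hypothetical sketch of a URC-like canonical representation.

Nothing here is taken from the URC drafts: the names, the dot-prefix
namespacing, and both concrete syntaxes are illustrative assumptions.
"""
from dataclasses import dataclass, field
from typing import List, Tuple

# Elements with a standardized meaning across every subtype (from the text).
CORE_ELEMENTS = {"URL", "URN", "URC", "Content-type"}


@dataclass
class URC:
    # Name of the attribute set ("URC subtype"); hypothetical values below.
    subtype: str
    # Canonical form: ordered (attribute, value) pairs; repeats are allowed.
    pairs: List[Tuple[str, str]] = field(default_factory=list)

    # --- basic operations on the canonical representation ---
    def add(self, attr: str, value: str) -> None:
        self.pairs.append((attr, value))

    def get(self, attr: str) -> List[str]:
        return [v for a, v in self.pairs if a == attr]

    def core(self) -> List[Tuple[str, str]]:
        """Pairs usable by any system, even one that does not know the subtype."""
        return [(a, v) for a, v in self.pairs if a in CORE_ELEMENTS]

    def qualified(self, attr: str) -> str:
        """Prefix subtype-specific names so that different schemas cannot collide."""
        return attr if attr in CORE_ELEMENTS else f"{self.subtype}.{attr}"


def find(urcs: List[URC], attr: str, value: str) -> List[URC]:
    """A toy query built only from the basic get() operation."""
    return [u for u in urcs if value in u.get(attr)]


# --- concrete syntax 1: "Name: value" lines, loosely IAFA-template style ---
def to_attribute_lines(urc: URC) -> str:
    return "\n".join(f"{urc.qualified(a)}: {v}" for a, v in urc.pairs)


def from_attribute_lines(subtype: str, text: str) -> URC:
    urc = URC(subtype)
    prefix = subtype + "."
    for line in text.splitlines():
        name, _, value = line.partition(": ")
        if name.startswith(prefix):
            name = name[len(prefix):]  # strip the namespace prefix again
        urc.add(name, value)
    return urc


# --- concrete syntax 2: SGML-like tags (no escaping; illustration only) ---
def to_sgml_like(urc: URC) -> str:
    body = "".join(
        f"<{urc.qualified(a)}>{v}</{urc.qualified(a)}>" for a, v in urc.pairs
    )
    return f'<URC subtype="{urc.subtype}">{body}</URC>'


if __name__ == "__main__":
    d = URC("dataset")  # hypothetical subtype for scientific datasets
    d.add("URL", "http://example.org/run42.dat")
    d.add("Content-type", "application/octet-stream")
    d.add("Instrument", "spectrometer")
    text = to_attribute_lines(d)
    print(text)
    print(to_sgml_like(d))
    # Round trip: the attribute-line syntax maps back to the same canonical form.
    assert from_attribute_lines("dataset", text) == d
```

Because both renderings are generated from the same pairs, a description written once can travel in either syntax. A system that does not recognize the "dataset" subtype can still call core() and act on the URL and Content-type, and a richer query language could be layered on get() in the same way find() is.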
At the W3C Workshop on Distributed Indexing/Searching, I would like to
present a summary of the current state of the URC effort and give some
examples of URC use for distributed resource discovery.
For more information, see http://www.acl.lanl.gov/URC/
Last Modified: 14 May, 1996
This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.