Efficient and Authenticated Sharing
and Indexing of Internet Resources

Shirley Browne and Keith Moore, University of Tennessee
Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

Position paper for WWW Consortium Distributed Indexing/Searching Workshop

Internet communities need to share indexing information across domains and organizations in order to facilitate interdisciplinary and inter-organizational resource sharing. They need a way to share resources without each organization running a Web crawler that accesses every other organization's Web sites, because with n organizations such an approach requires on the order of n^2 crawls and does not scale. Resource providers need an easy-to-use mechanism for publishing metadata in a place where users and indexing services can access it easily and efficiently.
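The scaling argument can be made concrete with a small count. The sketch below assumes n organizations and, for comparison, a single shared catalog to which each organization submits metadata once and from which each gathers metadata once. The function names and the single-catalog model are illustrative assumptions, not figures from any deployed system.

```python
def pairwise_crawls(n: int) -> int:
    """Crawl relationships when every organization crawls every other one."""
    return n * (n - 1)


def shared_catalog_transfers(n: int) -> int:
    """Transfers when each organization submits once and gathers once
    (assumed model with one shared catalog)."""
    return 2 * n


for n in (10, 100, 1000):
    print(n, pairwise_crawls(n), shared_catalog_transfers(n))
# prints:
# 10 90 20
# 100 9900 200
# 1000 999000 2000
```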

Scalable, efficient access to popular resources requires widespread replication, or mirroring, of these resources. With current mirroring schemes, a different name (i.e., URL) is given to each copy of a replicated file. Web crawlers must access all the mirrored copies and deduce which ones are duplicates. A user who accesses a mirrored copy, perhaps after being given a list of alternative mirror sites by an overloaded server, has no way of verifying that the retrieved mirror copy is identical to the original. Thus, there is a need for a single location-independent name for all copies of a file, so that metadata can be attached to this name rather than to the individual copies. This metadata should include a digitally signed file fingerprint so that a user can verify the integrity of a retrieved file copy. There is also a need for users to be able to verify the authenticity and integrity of metadata that comes from different sources.
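The integrity check described above might look like the following minimal sketch. It assumes SHA-256 as the fingerprint function, which is a choice made here for illustration and not one the paper specifies, and it covers only the comparison step: verifying the digital signature on the fingerprint itself is left out. The function names and sample contents are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return a hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def matches_fingerprint(data: bytes, expected_hex: str) -> bool:
    """Check a retrieved copy against the fingerprint taken from the metadata."""
    return hmac.compare_digest(fingerprint(data), expected_hex)


original = b"contents of the original file\n"
published = fingerprint(original)  # would be carried in the signed metadata

good_mirror = b"contents of the original file\n"
bad_mirror = b"contents of the 0riginal file\n"  # one byte differs

print(matches_fingerprint(good_mirror, published))  # True
print(matches_fingerprint(bad_mirror, published))   # False
```

Because the fingerprint is attached to the location-independent name, the same check applies to a copy retrieved from any mirror.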

The Resource Cataloging and Distribution System (RCDS), under development at the University of Tennessee, addresses the above needs. The system components include catalog servers, location servers, and file servers. Resource providers assign location-independent names to resources and submit metadata to an RCDS catalog server. An authorized file server that mirrors a copy of a file registers its name-to-location binding with an RCDS location server. An RCDS catalog server provides a centralized location from which Web crawlers can gather metadata. For clients such as Web browsers, an RCDS catalog server resolves a name to associated metadata, which may include names for individual files. An RCDS location server resolves a name to a list of locations. The RCDS catalog server design provides for attaching a digital signature to an assertion or to a set of assertions, where an assertion consists of an attribute-value pair.
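The two resolution steps can be illustrated with an in-memory model. This is a sketch of the roles described above, not the RCDS protocol or its actual interfaces: the class names, method names, URNs, and URLs are all hypothetical, and authorization of file servers and signing of assertions are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogServer:
    """Maps a location-independent name to attribute-value assertions."""
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def submit(self, name: str, assertions: dict[str, str]) -> None:
        # A resource provider submits metadata for a name.
        self.records.setdefault(name, {}).update(assertions)

    def resolve(self, name: str) -> dict[str, str]:
        # A client resolves a name to its associated metadata.
        return dict(self.records.get(name, {}))


@dataclass
class LocationServer:
    """Maps a name to the locations registered by file servers."""
    bindings: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, location: str) -> None:
        # A mirroring file server registers a name-to-location binding.
        locations = self.bindings.setdefault(name, [])
        if location not in locations:
            locations.append(location)

    def resolve(self, name: str) -> list[str]:
        # A client resolves a name to a list of locations.
        return list(self.bindings.get(name, []))


catalog = CatalogServer()
locator = LocationServer()

# Hypothetical names and locations, for illustration only.
catalog.submit("urn:example:report-1",
               {"title": "Example report", "file": "urn:example:report-1.ps"})
locator.register("urn:example:report-1.ps", "http://mirror-a.example/report-1.ps")
locator.register("urn:example:report-1.ps", "http://mirror-b.example/report-1.ps")

metadata = catalog.resolve("urn:example:report-1")
print(locator.resolve(metadata["file"]))
# ['http://mirror-a.example/report-1.ps', 'http://mirror-b.example/report-1.ps']
```

Keeping the two mappings separate means that adding or removing a mirror changes only the location bindings and leaves the metadata attached to the name untouched.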

Ideally, RCDS should use a standard format for assertion metadata, so that it presents a standard interface to clients such as Web browsers and Web crawlers, but no suitable standard currently exists. Text representations of metadata are problematic because changes introduced by editing and other processing invalidate a digital signature computed over the byte contents. The Harvest SOIF format is in practice a text-based format, although it allows arbitrary content for the value of an attribute. A digital signature could conceivably be attached to an entire SOIF record, if the record could be guaranteed not to change during transfer and processing, although this would not allow for selective signing of subsets of assertions. To be suitable for use with RCDS, SOIF would also need to allow a URN, in addition to a URL, as the identifier of an object.
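Both problems, fragile text signatures and the lack of selective signing, can be seen in a small sketch. It makes two loud assumptions: HMAC-SHA256 with a shared demo key stands in for a real public-key digital signature, and the length-prefixed byte encoding is a hypothetical canonical form, not a format defined by RCDS or SOIF. The attribute names and values are invented for illustration.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical; HMAC is only a stand-in for a digital signature


def canonical_bytes(assertions: dict[str, bytes]) -> bytes:
    """Length-prefix each attribute and value, in sorted attribute order."""
    out = bytearray()
    for attr in sorted(assertions):
        for part in (attr.encode("utf-8"), assertions[attr]):
            out += len(part).to_bytes(4, "big") + part
    return bytes(out)


def sign(data: bytes) -> str:
    """Return a keyed digest over the exact bytes given."""
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


def sign_subset(assertions: dict[str, bytes], attrs: list[str]) -> str:
    """Sign only the named attributes of a record."""
    return sign(canonical_bytes({a: assertions[a] for a in attrs}))


# A signature over a text rendering breaks when line endings are rewritten.
text = b"title: Example report\nauthor: A. Author\n"
text_sig = sign(text)
print(sign(text.replace(b"\n", b"\r\n")) == text_sig)  # False

# A signature over a subset of assertions survives edits to unsigned ones.
record = {"title": b"Example report", "author": b"A. Author",
          "abstract": b"Line one\nLine two"}
subset_sig = sign_subset(record, ["title", "author"])
record["abstract"] = b"Edited abstract"
print(sign_subset(record, ["title", "author"]) == subset_sig)  # True
```

Length prefixes avoid any dependence on delimiters or line endings, which is one way to allow arbitrary byte content in attribute values while keeping the signed bytes unambiguous.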

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.