Fifth International World Wide Web Conference

May 6-10, 1996, Paris, France

Panel 5: Uniform Resource Characteristics: Metadata for the Masses

Wednesday 8 May, 1996 - 9:00-10:30

Panel Chair

Mark Madsen
APM/ANSA Limited
Poseidon House, Castle Park
Cambridge CB3 0RD, UK
Telephone: +44-1223-568934
Fax: +44-1223-359779
msm@ansa.co.uk

Panel Members

Leslie Daigle, Vice President, Research, Bunyip Information Systems. leslie@bunyip.com
Renato Iannella, DSTC Pty Ltd, University of Queensland. renato@dstc.edu.au
Dan LaLiberte, National Center for Supercomputing Applications. liberte@ncsa.uiuc.edu
Stu Weibel, Senior Research Scientist, OCLC Office of Research. weibel@oclc.org

Panel Description

Uniform Resource Characteristics (URCs) are descriptions of web resources, such as bibliographic or configuration control records. They are intended to be used in the web for a range of purposes, including, but not limited to, the following examples.

Cataloguing and indexing web resources: when those resources are not static documents, but live services, the degree of abstraction provided by description in terms of URCs is essential.

Resolution of services: a URC can point to many instances of a resource (such as a set of mirrors of heavily demanded pages) and contain sufficient information for the appropriate instance to be selected on the fly.
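
As a rough illustration of the resolution case, the sketch below models a URC as a record that lists several instances of one resource and picks one on the fly. The field names (region, load), the example URLs, and the selection rule are all assumptions made for this illustration; none of them comes from a URC scheme discussed by the panel.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """One retrievable copy of a resource (hypothetical fields)."""
    url: str
    region: str   # assumed hint about where the mirror is hosted
    load: float   # assumed current load, 0.0 (idle) to 1.0 (saturated)


@dataclass
class URC:
    """A description that points to many instances of one resource."""
    title: str
    instances: List[Instance] = field(default_factory=list)


def select_instance(urc: URC, client_region: str) -> Instance:
    """Prefer mirrors in the client's region, then take the least loaded."""
    if not urc.instances:
        raise ValueError("URC lists no instances")
    local = [i for i in urc.instances if i.region == client_region]
    candidates = local or urc.instances  # fall back to every mirror
    return min(candidates, key=lambda i: i.load)


urc = URC(
    title="Conference programme",
    instances=[
        Instance("http://mirror-eu.example.org/programme", "eu", 0.7),
        Instance("http://mirror-us.example.org/programme", "us", 0.2),
        Instance("http://mirror-eu2.example.org/programme", "eu", 0.4),
    ],
)
chosen = select_instance(urc, client_region="eu")
print(chosen.url)
```

A real deployment would presumably weigh other factors (cost, freshness, protocol support); the point here is only that the description itself carries enough information for the choice to be automated.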

Web applications that make use of URCs display a need for at least three forms of interoperability.

The capability for separately-developed applications to exchange descriptions that conform to a commonly-agreed description scheme.
The possibility to translate descriptions from one scheme to another with minimum loss of information.
The ability for two applications that use the same descriptive scheme but different transfer protocols to exchange descriptions with the aid of a gateway.
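
The second form of interoperability, translation between schemes with minimum loss of information, can be sketched as follows. The two schemes and all element names here are hypothetical; the sketch only shows one possible way to translate what can be mapped and to report what cannot.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a bibliographic scheme to a catalogue scheme.
BIBLIO_TO_CATALOGUE: Dict[str, str] = {
    "title": "name",
    "author": "creator",
    "url": "location",
}


def translate(
    record: Dict[str, str], mapping: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Translate a description and list the elements that had no mapping."""
    translated: Dict[str, str] = {}
    lost: List[str] = []
    for element, value in record.items():
        target = mapping.get(element)
        if target is None:
            lost.append(element)  # report the loss instead of hiding it
        else:
            translated[target] = value
    return translated, lost


source = {
    "title": "Metadata for the Masses",
    "author": "Panel 5",
    "edition": "draft",
}
result, dropped = translate(source, BIBLIO_TO_CATALOGUE)
print(result)
print(dropped)
```

Returning the untranslated elements alongside the result lets the caller judge how much information the translation actually lost.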

To provide these forms of interoperability, the required framework must specify:

The structure of the datatypes and a minimal query capability, as well as the rules for mapping the datatypes and queries to and from at least two record formats.
The conveyance of those record formats in well-established Internet transfer protocols.
At least two syntaxes for defining new datatypes.
The semantics of a very small number of descriptive elements, such as URL and Content-type, that are needed for retrieving resources over the network, are appropriate for all URCs to utilize, and are within the domain of IETF standards.
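
One way to picture these requirements is sketched below: a single in-memory datatype, a minimal equality query, and mappings to and from two record formats. The choice of JSON and of a "Name: value" line format, and every function name, are assumptions made for illustration only; URL and Content-type are the descriptive elements named above.

```python
import json
from typing import Dict, List

# A URC is modelled here as a flat mapping of element name to value.
Record = Dict[str, str]


def to_lines(record: Record) -> str:
    """Serialize to a 'Name: value' line format (assumed format 1)."""
    return "\n".join(f"{name}: {value}" for name, value in record.items())


def from_lines(text: str) -> Record:
    """Parse the line format; only the first colon separates name and value."""
    record: Record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        record[name.strip()] = value.strip()
    return record


def to_json(record: Record) -> str:
    """Serialize to JSON (assumed format 2)."""
    return json.dumps(record, sort_keys=True)


def from_json(text: str) -> Record:
    """Parse the JSON format back into a record."""
    return dict(json.loads(text))


def query(records: List[Record], criteria: Record) -> List[Record]:
    """Minimal query: keep records whose elements equal all criteria."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in criteria.items())
    ]


records = [
    {"URL": "http://www.example.org/a.html", "Content-type": "text/html"},
    {"URL": "http://www.example.org/b.ps", "Content-type": "application/postscript"},
]
html_only = query(records, {"Content-type": "text/html"})
print(to_lines(html_only[0]))
```

Because both formats map to the same datatype, a gateway between two transfer protocols could, under these assumptions, decode a record from one format and re-encode it in the other without touching the query logic.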

In order to demonstrate all the forms of interoperation, the aim is also to standardize a small number of particular URC subtypes, and a small number of realizations of the URC framework in well-established Internet transfer protocols. The panel will discuss the issues and obstacles that arise in developing URC schemes and relate the progress that has been made to date.

Position Statements and Biographical Information

Resources and Services with Self-Knowledge - Mark Madsen

The dynamic aspects of the Web take on more importance with the introduction of new technologies for handling mobile code (such as Java and Spinergy) and integrating with object systems such as CORBA through gateways as in the ANSAweb system or using dynamic protocol installation as in PostModern Computing's Black Widow. With the decay of the metaphor that everything on the Web is a static document whose characteristics can be deduced from the minimal metadata contained in the HTTP content-type headers, it becomes important to be able to describe characteristics of resources of varied (and unpredictable) kinds. These resources may be documents of new kinds, streams of information created on the fly such as are encountered in live video, or agents that can migrate between Internet nodes.

In each of the cases mentioned, the object itself and the service it provides need to be described in a form which allows clients or other services to understand and be understood when they specify requirements for service quality or search for appropriate responses to context-sensitive queries. The URC framework therefore needs to be developed to allow arbitrary objects to be referenced and encoded, and to have a self-knowledge so that they can make or participate in decisions about how they are to fulfil their purposes or those of their creators. In addition, all these processes need to be capable of being reliably automated.

Mark Madsen is a System Architect at APM/ANSA in Cambridge. He has held research positions at the Universities of Cape Town, Sussex, Lancaster, Portsmouth and Leicester. Since 1980 he has been involved in projects as diverse as housing demand prediction, physical system simulations, biomedical treatment analyses and evaluations, distributed real-time multimedia systems, and distributed information systems design. Since joining APM in 1995, he has contributed to the ANSAweb architecture, and is presently continuing ANSA research on the WWW as well as working on Internet security issues, in his role as Project Director of the European Commission's E2S Project. Mark is designated co-chair of the planned URC working group of the Internet Engineering Task Force.

Metadata for a Better Information World - Leslie Daigle

A basic model of Internet information transactions rests on three components: an information need, an information processing task or activity, and information resources. That is, a client's information need can be satisfied by some processing task which will draw on specific resources. The basic Web interaction model distributes the processing between the client and resource ends of the transaction (the server manages a collection of resources and transmits them on request; the client renders the resource in ways appropriate to the local system). However, an increasing number of services are lodging a significant component of the transaction computation at the server side (e.g., complex search services with basic forms interfaces), leading to an imbalance of computational commitment to any transaction.

To yield more sophisticated interactions with Internet resources, clients need to be able to reclaim some of the processing task supporting an interaction. This requires the client to have the ability to represent its needs to services, but it also requires that services have the ability to provide descriptions of their resources. While the content of these resource descriptions may vary from application to application (e.g., those for music retrieval will differ significantly from those for geospatial data representation), the structure of the resource description's representation must be standard across the Internet, permissive of the creation of detailed application-specific resource descriptions, and yet independent of any one or collection of potential applications. These resource descriptions are key data objects in the implementation of information transactions on a balanced basis between sophisticated clients and services.

Leslie Daigle has worked at Bunyip Information Systems Inc. since 1993, originally as a part-time employee while she worked on her PhD in Computer Science at McGill University, but gradually migrating to full-time involvement with Bunyip. As the Project Manager for Bunyip's Desktop Internet Resource Discovery Client, Silk, Leslie was the principal researcher developing the Uniform Resource Agent (URA) technology. She is active in the Internet Engineering Task Force, particularly in the areas of UR* and Directory Services development. Now Vice President, Research at Bunyip, Leslie is ever hopeful of pursuing her PhD at McGill in the area of multimedia retrieval systems, and never misses an opportunity to argue the shortcomings of text-only retrieval technology.

Silverlining for URNs - Renato Iannella

Metadata can solve one of the largest problems in the Internet and WWW today - excessive low-quality retrieval of information. The work of the URI working groups in developing the technologies for URCs to be rapidly deployed will address these types of issues.

I argue that URCs and URNs have a strong relationship if we are to try to provide the WWW with scalable resource discovery services. To assist in the process, we all need to fast-track this development to exploit the benefits that high-quality indexing services can bring to the WWW community.

There will be a need to support various URC schemes which can describe documents, geospatial data, dynamic services, and many other types of resources. Some flexible and extensible standards need to be set to enable the interchange and semantics of these URCs. Clearly, identifying the URCs with URNs is mandatory.

Renato Iannella has been a Senior Research Scientist at the Distributed Systems Technology Centre (DSTC) based in Brisbane, Australia, since 1994. At the DSTC, he leads the Resource Discovery Unit which investigates technologies used in the seamless discovery and retrieval of Internet resources on globally distributed networks. Renato is an active contributor to Internet Engineering Task Force working groups on Uniform Resource Identifiers and has authored many papers on Internet resource discovery issues. He also has expertise in Directory Services, Electronic Messaging, Object-Oriented Technologies, and Human-Computer Interaction. Renato has previously been a Senior Research Fellow at Bond University where he worked as a lecturer in Object-Oriented Programming and completed his PhD on Graphical User Interface Reference Models for Messaging and Directory Systems in 1994. Prior to starting work at Bond University in 1989, Renato worked at the University of South Australia as a Computer Systems Officer for a number of years, and as a programmer for a small banking systems software company.

URC Simplification - Dan LaLiberte

Uniform Resource Characteristics have been in the design stage for several years. Part of the reason for the delay in designing, implementing, and deploying them is over-complexification. But ironically, expanding the scope of the URC problem can actually simplify it at a higher level of generality that is well understood.

I argue that it is pointless to try to agree on what distinguishes objects themselves from descriptions of or metadata about objects. Instead, we should just allow that a distinction can be made, and that metadata is any kind of data, typically highly structured. The context of the use of any data determines whether it is to be used as metadata. References to URCs or resources should all use URIs, not specifically URLs or URNs.

With this general foundation, we still need agreement on what types and formats of metadata are needed, but this should be delegated to various communities of users and proven in practice before trying to agree as a whole. Very little applies to everything.

Daniel LaLiberte joined the National Center for Supercomputing Applications in 1988 where he has worked on scientific visualization and collaboration tools. He is currently investigating world wide web architecture issues including searching, URIs, and annotation capabilities. He received his B.S. degree in Computer Science from the University of Minnesota in 1978 and is currently a CS graduate student at the University of Illinois, Urbana-Champaign studying software engineering, programming languages, and the evolution of information organization. Dan has co-authored Internet-Drafts on Uniform Resource Name schemes for the Uniform Resource Identifiers working group of the Internet Engineering Task Force.

Specification of Semantic Content - Stu Weibel

Uniform Resource Characteristics, or URCs, remain an abstraction in the minds of many disparate groups of people. At the highest level, URCs may be thought of as resource descriptions of any conceivable type that might be searched, retrieved, or exchanged among clients and servers of unknown and changing functionality.

The Net brings many communities into the same operational environment, requiring the exchange of formal, structured metadata originating from diverse description models. People often think they are talking about the same problem, but the unique perspective of their respective communities colors their vision and leads them in different directions. Solutions that fail to accommodate the diversity of the needs of the many stakeholders will not prevail.

The Warwick Metadata Workshop resulted in a proposal for an abstract architecture of discrete, modular packages of metadata including (but not limited to) resource description, terms and conditions, content ratings, and intellectual property rights. These metadata schemes might be defined and maintained by different communities, but could be made to work in concert to support the needs of many stakeholders.

The semantic contents of URCs must be specified by the content experts that have standing in their respective communities. Registries of metadata (similar to the notion of MIME registration) will provide the structure for formal maintenance and explication of the many metadata packages that will emerge.
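
The idea of modular metadata packages backed by a registry might be pictured along the following lines. This is a minimal sketch under stated assumptions: the class names, scheme names, maintainers, and the rule that a package must name a registered scheme are all hypothetical and are not taken from the Warwick proposal or from any registration procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Package:
    """One self-contained block of metadata under a named scheme."""
    scheme: str
    elements: Dict[str, str]


class Registry:
    """Records which community maintains each metadata scheme."""

    def __init__(self) -> None:
        self._maintainers: Dict[str, str] = {}

    def register(self, scheme: str, maintainer: str) -> None:
        if scheme in self._maintainers:
            raise ValueError(f"scheme already registered: {scheme}")
        self._maintainers[scheme] = maintainer

    def is_registered(self, scheme: str) -> bool:
        return scheme in self._maintainers


@dataclass
class Container:
    """Groups independent packages that describe the same resource."""
    resource: str
    packages: List[Package] = field(default_factory=list)

    def add(self, package: Package, registry: Registry) -> None:
        # Assumed rule: only packages of registered schemes are accepted.
        if not registry.is_registered(package.scheme):
            raise KeyError(f"unregistered scheme: {package.scheme}")
        self.packages.append(package)

    def by_scheme(self, scheme: str) -> List[Package]:
        return [p for p in self.packages if p.scheme == scheme]


registry = Registry()
registry.register("resource-description", "library community")
registry.register("content-ratings", "ratings community")

container = Container(resource="http://www.example.org/report.html")
container.add(Package("resource-description", {"title": "Annual report"}), registry)
container.add(Package("content-ratings", {"audience": "general"}), registry)
print([p.scheme for p in container.packages])
```

The container never interprets the elements inside a package, which mirrors the point that the exchange mechanism should not constrain the syntax and semantics each community chooses.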

The challenge for URC developers is to provide a sufficiently flexible and robust mechanism for exchange of resource description without constraining the independent syntax and semantics of the metadata packages that will emerge.

Stuart Weibel has worked in the Office of Research at OCLC since 1985. During this time he has managed projects in the areas of automated cataloging, document capture and structure analysis, and electronic publishing. He currently coordinates networked information research projects in the Office of Research, including applications of World Wide Web technology and Internet protocol standardization efforts. Dr Weibel is active in the Internet Engineering Task Force (IETF) working groups on Hypertext Markup Language and Uniform Resource Identifiers. He is also a founding member of the International World Wide Web Conference Committee (IW3C2). Other areas of service include participation on the Task Force on the Preservation of Digital Information and the ALCTS task force on Bibliographic Access to Electronic Resources.

Created: 9 April 1996
Last updated: 26 April 1996