Extensible Domain-Specific Metadata Standards

Shirley Browne, Kay Hohn, and Tim Niesen
Reuse Library Interoperability Group

Position paper for WWW Consortium Distributed Indexing/Searching Workshop

The Call for Participation asks whether the Dublin Core metadata specification should be added to HTML. We argue that a universal metadata specification should not be standardized as part of HTML, because such a standard will not be universally applicable. Instead, we propose that domain-specific standards bodies develop metadata standards for their domains and bind those standards to existing Web standards such as HTML, SGML, and Z39.50. As an example of how this can be done, we offer the work of the Reuse Library Interoperability Group (RIG) on the Basic Interoperability Data Model (BIDM) for the software reuse domain and on Web bindings for the BIDM.

The BIDM, which is an IEEE standard, is a minimal set of metadata that a reuse library should provide about its reusable assets in order to interoperate with other reuse libraries. The BIDM is expressed as an extended entity-relationship data model that defines classes for assets (the reusable entities), the individual elements making up assets (i.e., files), libraries that provide assets, and organizations that develop and manage libraries and assets. The model was derived by representatives of existing academic, government, and commercial reuse libraries, through careful study and negotiation of the commonalities among those libraries. Reuse libraries need not adopt the BIDM internally, although many have. They can continue to use internal search and classification mechanisms suited to their particular missions while using the BIDM as a uniform external interface. The current work on Web bindings aims to map the abstract data model to concrete syntax specifications that can be used to interchange asset metadata via the World Wide Web. Two Web bindings have been defined and are currently being implemented and tested: one maps the BIDM to an SGML Document Type Definition (DTD), and the other maps it to META and LINK tags in the header of an HTML document. Several participants are using the SGML processing capabilities of the Harvest Gatherer to collect and interpret the metadata, although other SGML tools may be used as well.
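To make the HTML binding more concrete, the following Python sketch models the four BIDM classes named above and renders one asset's metadata as META and LINK tags. It is an illustration under stated assumptions, not the RIG binding itself: the attribute sets and the tag naming scheme (`BIDM.Asset.Name`, `REL="BIDM.Element"`, and so on) are hypothetical placeholders, since the actual binding's names are not reproduced here. Simple attribute values become META tags, while relationships to other network resources (the providing library, the asset's elements) become LINK tags.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Organization:
    # Develops and/or manages libraries and assets.
    name: str


@dataclass
class Element:
    # One constituent of an asset, typically a file reachable by URL.
    name: str
    url: str


@dataclass
class Library:
    # A reuse library that provides assets; managed by an organization.
    name: str
    url: str
    managed_by: Organization


@dataclass
class Asset:
    # A reusable entity, its developer, its providing library, and its elements.
    name: str
    abstract: str
    developed_by: Organization
    provided_by: Library
    elements: List[Element] = field(default_factory=list)


def attr(value: str) -> str:
    # Escape &, <, > and quotes so values are safe inside attribute strings.
    return escape(value, quote=True)


def to_html_head(asset: Asset) -> str:
    """Render asset metadata as META/LINK tags (hypothetical naming scheme)."""
    metas = [
        ("BIDM.Asset.Name", asset.name),
        ("BIDM.Asset.Abstract", asset.abstract),
        ("BIDM.Asset.Developer", asset.developed_by.name),
        ("BIDM.Library.Name", asset.provided_by.name),
    ]
    lines = ["<HEAD>"]
    for name, value in metas:
        lines.append(f'<META NAME="{name}" CONTENT="{attr(value)}">')
    # Relationships point at other Web resources, so they use LINK tags.
    lines.append(f'<LINK REL="BIDM.Library" HREF="{attr(asset.provided_by.url)}">')
    for el in asset.elements:
        lines.append(
            f'<LINK REL="BIDM.Element" HREF="{attr(el.url)}" TITLE="{attr(el.name)}">'
        )
    lines.append("</HEAD>")
    return "\n".join(lines)


if __name__ == "__main__":
    org = Organization("Example Org")
    lib = Library("Example Library", "http://lib.example/", org)
    asset = Asset("sort-utils", "Sorting routines", org, lib,
                  [Element("sort.c", "http://lib.example/sort.c")])
    print(to_html_head(asset))
```

Because a gatherer such as Harvest only sees the emitted tags, the same in-memory model could equally be serialized against an SGML DTD; only the rendering function would change.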

Of course, interpreting and processing the metadata correctly requires knowledge of the semantics of the data model, and this knowledge can be obtained by reading the BIDM document. However, it would be advantageous to be able to transmit this meta-model information as well, so that it could drive the interpretation of asset metadata automatically. Furthermore, one extension to the BIDM has already been defined (the Asset Certification Framework), and another is underway (the Intellectual Property Rights Framework). Individual libraries may also hold additional metadata, beyond that specified in the BIDM, that they would like to make available, and may wish to extend the BIDM for this purpose. Thus, work is underway on a formal meta-model for describing the basic model and extensions to it.
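The idea of transmitting the meta-model so that it drives interpretation can be illustrated with a small Python sketch. This is not the formal meta-model under development; it assumes a simple scheme in which a meta-model lists classes and their typed attributes, an extension adds attributes without redefining existing ones, and a consumer uses the merged meta-model to type-convert and check incoming attribute values. The class names, the `CertificationLevel` attribute, and the two-type system (`string`, `integer`) are hypothetical stand-ins, not the actual content of the Asset Certification Framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical type system: each declared type maps to a converter.
CONVERTERS: Dict[str, Callable[[str], object]] = {"string": str, "integer": int}


@dataclass
class AttributeDef:
    name: str
    type_name: str  # a key of CONVERTERS
    required: bool = False


@dataclass
class ClassDef:
    name: str
    attributes: Dict[str, AttributeDef] = field(default_factory=dict)


@dataclass
class MetaModel:
    classes: Dict[str, ClassDef] = field(default_factory=dict)

    def extend(self, extension: "MetaModel") -> "MetaModel":
        """Return a new meta-model with the extension merged in (self is unchanged)."""
        merged = {n: ClassDef(n, dict(c.attributes)) for n, c in self.classes.items()}
        for name, cls in extension.classes.items():
            target = merged.setdefault(name, ClassDef(name))
            for attr_name, attr in cls.attributes.items():
                # Extensions may add attributes but not redefine existing ones.
                if attr_name in target.attributes:
                    raise ValueError(f"{name}.{attr_name} is already defined")
                target.attributes[attr_name] = attr
        return MetaModel(merged)

    def interpret(self, class_name: str, raw: Dict[str, str]) -> Dict[str, object]:
        """Type-convert raw values using the meta-model; undeclared names are ignored."""
        result: Dict[str, object] = {}
        for attr in self.classes[class_name].attributes.values():
            if attr.name not in raw:
                if attr.required:
                    raise ValueError(f"missing required {class_name}.{attr.name}")
                continue
            result[attr.name] = CONVERTERS[attr.type_name](raw[attr.name])
        return result


if __name__ == "__main__":
    base = MetaModel({"Asset": ClassDef("Asset", {
        "Name": AttributeDef("Name", "string", required=True),
        "Abstract": AttributeDef("Abstract", "string"),
    })})
    certification = MetaModel({"Asset": ClassDef("Asset", {
        "CertificationLevel": AttributeDef("CertificationLevel", "integer"),
    })})
    model = base.extend(certification)
    print(model.interpret("Asset", {"Name": "sort-utils",
                                    "CertificationLevel": "2",
                                    "LocalRating": "5"}))
    # prints {'Name': 'sort-utils', 'CertificationLevel': 2}
```

In this sketch a consumer that knows only the base meta-model still processes the base attributes and silently skips extension attributes it cannot interpret, which is one way a uniform external interface could tolerate library-specific extensions.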

We hope that groups in other domains will benefit from our experience in developing and implementing an extensible data model for the software reuse community. We believe that the extended entity-relationship data modeling technique is a powerful way of capturing and describing metadata about network-accessible resources. We also believe that the RIG has struck the proper balance between domain-specific and domain-independent standardization, by developing an abstract, domain-specific semantic data model and mapping it to concrete, domain-independent representations such as SGML and HTML.

