A Unified Element Vocabulary for Metadata

John A. Kunze, jak@ckm.ucsf.edu
Center for Knowledge Management
University of California, San Francisco

15 May 1996

It is generally agreed that a metadata record for an information resource contains descriptive elements that are suitable for use by automatic indexing programs (such as Web crawlers) and by citation display software (such as increasingly metadata-aware Web browsers). Each element has a name, whether a particular record syntax identifies it explicitly (e.g., "Author" in Author=Plato) or implicitly (e.g., the second unnamed element might by convention contain a "Title").
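
The distinction between explicit and implicit naming can be made concrete with a minimal sketch. It assumes a hypothetical record syntax in which elements are separated by semicolons; the function names, the positional convention, and the sample title "Republic" are illustrative only and are not drawn from any actual metadata standard.

```python
# Hypothetical positional convention: the order of unnamed elements
# implies their names (first is "Author", second is "Title").
POSITIONAL_NAMES = ["Author", "Title"]


def parse_explicit(record):
    """Parse a record whose elements are named, e.g. 'Author=Plato'."""
    elements = {}
    for field in record.split(";"):
        name, _, value = field.partition("=")
        elements[name.strip()] = value.strip()
    return elements


def parse_implicit(record):
    """Parse a record of unnamed values; names are taken from position."""
    values = [value.strip() for value in record.split(";")]
    return dict(zip(POSITIONAL_NAMES, values))


explicit = parse_explicit("Author=Plato; Title=Republic")
implicit = parse_implicit("Plato; Republic")

# Both syntaxes yield the same named elements.
assert explicit == implicit
```

Either way, software ends up with the same pairs of element name and value, which is why the design of the name space matters independently of the record syntax.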

Problems arise in designing element name spaces (vocabularies) that fit current and projected metadata needs across many uses and fields of knowledge. One need is to support an ongoing process of enrolling new names into element vocabularies (extending them) while minimizing conflict with natural language connotations and existing element definitions. An appealingly simple approach is to partition a top-level name space into subspaces (that are perhaps further sub-partitioned) for which enrollment and conflict management are distributed to various interested communities. While delegating name space management in this way has clear strengths, some pitfalls bear highlighting.

One problem with partitioned element vocabularies is that the divisions tend to relax over time. For example, because boundaries between intellectual domains such as social sciences, humanities, and biology blur, communities pick up elements from each other and introduce inter-community dependencies and inconsistencies. As fields of knowledge advance at different rates, outdated divisions become more onerous and community interoperability drops. Moreover, experience with Z39.50 shows a steady inclination in communities not just to pick and choose elements from each other, but to import external vocabularies whole into their own vocabularies. Another problem is that while the hierarchical naming systems implied by partitioned vocabularies work well in software environments, they fit poorly in written and spoken communication, where they will be used often.

A Unified Element Vocabulary

A design worth exploring is a single vocabulary for all elements. Different definitions for the same element would appear together as alternates, just as in a natural language dictionary. Alternates would be tagged with the domain of origin (e.g., biomed). Such a list of elements could scale, if need be, to a comprehensive dictionary containing one element per word of a natural language. To test whether an element vocabulary needs that much room to grow, it is sufficient to reflect on how hard it is to think of a natural language word that could not conceivably name an element.
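
One way to picture such a vocabulary is as a dictionary keyed by element name, with each entry holding a list of domain-tagged alternates. The following is only a sketch under stated assumptions: the class and method names are hypothetical, and the sample element "culture" with its two definitions is invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    domain: str   # community of origin, e.g. "biomed"
    meaning: str  # text of the definition


class UnifiedVocabulary:
    """A single name space; same-named elements coexist as alternates."""

    def __init__(self):
        self._entries = defaultdict(list)

    def enroll(self, element, domain, meaning):
        """Add a definition without displacing existing alternates."""
        self._entries[element.lower()].append(Definition(domain, meaning))

    def alternates(self, element):
        """Return every definition recorded for an element name."""
        return list(self._entries.get(element.lower(), []))

    def lookup(self, element, domain):
        """Return the alternate tagged with the given domain, or None."""
        for definition in self.alternates(element):
            if definition.domain == domain:
                return definition
        return None


vocab = UnifiedVocabulary()
vocab.enroll("culture", "biomed", "organisms grown in a prepared medium")
vocab.enroll("culture", "humanities", "customs and arts of a society")

for definition in vocab.alternates("culture"):
    print(f"culture [{definition.domain}]: {definition.meaning}")
```

In this sketch a naming conflict between communities does not block enrollment; it simply adds another tagged alternate under the same name, much as a dictionary lists several senses of one word.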

This vocabulary would consist of a stable base of approved elements augmented by an informally evolving set of commonly used elements. The vocabulary approval process would be lightweight and adaptable; an interesting functional model to borrow from is natural language, which has approved vocabularies (e.g., the Oxford English Dictionary) augmented by a set of commonly used terms. An element vocabulary might also be amenable to categorization using concepts from the Warwick Framework, which partitions elements along functional lines.

This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.