SWAD-Europe and CEN/ISS MMI-DC Workshop on Metadata for a Multilingual World
Informal position paper on Dublin Core in Multiple Languages
Thomas Baker
2004-07-05

This position paper is a brief and informal discussion about identifying and declaring metadata elements in the context of the Dublin Core Metadata Initiative. Helping to establish good practice for declaring vocabularies in a cross-referencable way -- along with providing a model for the institutional etiquette for doing so -- is perhaps the most important contribution DCMI can make, potentially more lasting than the fifteen elements per se. (Note that some of the terminology I use here goes a bit beyond the limits of DCMI's official specifications.)

For the sake of argument: a Term Concept is a Term in an abstract sense -- the idea behind a term. I believe this is what DCMI is identifying with its Namespace Policy [1]. For example, the Dublin Core element "Subject" is identified with the URI http://purl.org/dc/elements/1.1/subject. This is the identifier for "Subject" that metadata implementors are supposed to use in their metadata (i.e., if they use URIs at all).

A Term is described with a Term Description -- a cluster of (mostly) human-readable attributes such as Name, Label, Definition, Comment, Date, and Status. The Term Description for "Subject" is maintained by DCMI and published in various forms -- in a Web document, an RDF schema, and an XML schema, each with its own URI. When the Term Description is represented in a machine-processable schema language, it is referred to as a Term Declaration.

A Term Description, however, can evolve over time -- the status of an element can change, a comment can be reworded for clarity, a bibliographical reference can be updated. Each successive historical state of a Term Description can be seen as a Term Version. DCMI currently identifies these successive Term Versions with URIs, though those URIs are not yet supported by official DCMI policy [2].
For example, the URI http://dublincore.org/usage/terms/history/#subject-002 denotes a specific historical version of the term "Subject". This method in effect treats Term Versions analogously to how W3C (and DCMI) treat documents -- e.g., with a timeless "Latest Version" (http://www.w3.org/TR/rdf-primer/) which at any given time corresponds to a "This Version" (http://www.w3.org/TR/2004/REC-rdf-primer-20040210/) and may point back to a "Previous Version" (http://www.w3.org/TR/2003/PR-rdf-primer-20031215/).

The limits within which a term may evolve and still refer to the same Term Concept are described in the DCMI Namespace Policy [1]. Basically, if a term evolves in ways that are semantically incompatible with the Term Concept, it must be considered a new Term and given a new URI.

A Translation is "about" a Term Concept, but it does not translate that Term Concept directly. Rather, it translates a Term Description. Specifically, it translates a particular Term Version -- a Term Description at a given point in time. In other words, a given Japanese translation of the element "Subject" may be about the Term Concept "Subject" (http://purl.org/dc/elements/1.1/subject), but it actually "translates" a specific Term Version (http://dublincore.org/usage/terms/history/#subject-002). Both assertions ("about" and "translates") seem necessary to fully express what is intended.

Another type of assertion "about" another term is being developed in the context of discussion about Dublin Core Application Profiles (DCAPs): the Term Usage, or (more narrowly) Property Usage. A Property Usage is an assertion that a given application or set of metadata "uses" a property identified by its URI. A Property Usage may optionally be annotated with various sorts of usage notes -- context-specific clarifications of definition, local cataloging rules, constraints on cardinality and the like.
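
As a rough illustration of the two Translation assertions described above ("about" and "translates"), the following Python sketch models them as plain subject-predicate-object triples. It is not DCMI practice: the http://example.org/ namespace, the property URIs for "about" and "translates", the translation's own URI, and the helper functions are all hypothetical names chosen for this example. Only the two "Subject" URIs come from the text.

```python
# Sketch only. The EX namespace, the ABOUT/TRANSLATES property URIs, and the
# translation URI are hypothetical; they are not defined by DCMI.
EX = "http://example.org/terms#"
ABOUT = EX + "about"            # hypothetical property: translation -> Term Concept
TRANSLATES = EX + "translates"  # hypothetical property: translation -> Term Version

# URIs taken from the paper.
SUBJECT_CONCEPT = "http://purl.org/dc/elements/1.1/subject"
SUBJECT_VERSION = "http://dublincore.org/usage/terms/history/#subject-002"

# Hypothetical identifier for a Japanese translation of "Subject".
JA_TRANSLATION = "http://example.org/translations/ja/subject-002"

# Both assertions are recorded; neither one alone says what is intended.
triples = {
    (JA_TRANSLATION, ABOUT, SUBJECT_CONCEPT),
    (JA_TRANSLATION, TRANSLATES, SUBJECT_VERSION),
}


def translations_about(concept, graph):
    """Return every translation asserted to be 'about' the given Term Concept."""
    return {s for (s, p, o) in graph if p == ABOUT and o == concept}


def is_translation_of(translation, version, graph):
    """Return True if the translation 'translates' exactly this Term Version."""
    return (translation, TRANSLATES, version) in graph
```

Keeping the two assertions separate lets a consumer find all translations of a Term Concept through "about", while "translates" records which Term Version each one was made from. If a newer Term Version were published, `is_translation_of(JA_TRANSLATION, newer_uri, triples)` would return False, which could flag the translation for review even though it is still "about" the same Term Concept.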

Yet another type of assertion "about" another term is that of a semantic relationship between one term and another -- in particular, a term maintained by someone else. For example, Library of Congress maintains a vocabulary of MARC Relators such as Translator and Editor, which may be seen as "roles". The DCMI Usage Board is working with Library of Congress on the process of making and maintaining a set of assertions whereby Library of Congress declares each appropriate MARC Relator to be a subPropertyOf (i.e., semantically narrower than) the Dublin Core property Contributor, and DCMI then acknowledges and endorses those assertions. These assertions will be made not just "in print", but in the form of harvestable assertions in RDF. The hope is that this process might provide an example of good practice for "assertion etiquette" between vocabulary maintainers generally.

Underpinning these various sorts of assertions -- Term Declarations, Translations, Property Usages, and the like -- is DCMI's model of process [3]. The process for adding new terms, for example, is that of proposal development in community-based working groups leading to approval and assignment of official status by the DCMI Usage Board. In principle, the Usage Board is also interested in reviewing and assigning status to Application Profiles (which are in effect sets of Property Usages), but the full framework within which this can be done in a principled manner -- the DCMI Abstract Model [4] -- has not yet attained official status. The process by which a systematic review of things such as Translations might be organized is even less clear given the language barrier.

If this is a reasonable approach to declaring, versioning, and translating metadata terms in multiple languages, we should recognize that the thing we call a Term is actually being identified using multiple URIs, and that those different URIs serve different purposes:

-- A term is identified with one URI for its Term Concept.
This is what all of the Term Versions and Term Translations are "about". It is the common reference point that holds all of the versions and translations together and promotes the interoperability of descriptive metadata in open, loosely-coupled, distributed systems.

-- A term is also identified in the form of a Term Version. Since languages inevitably evolve, policies for the identification of Term Concepts must allow for this. The pragmatic solution is to allow for change that is "semantically compatible" with the Term Concept.

-- Web pages and formal schemas documenting these terms are of course themselves identified by URI.

For the purposes of interoperability, it is desirable that the number of URIs used in the metadata of the world be kept down, lest we overload the Semantic Web with assertions of the sort that "term http://foo.org/1.1/bar is compatible with http://foo.org/1.2/bar". At the same time, other purposes require precision with regard to historical version.

It would be helpful for all concerned if a broader consensus could be reached on some of the general issues outlined above, such as the contrast between a Term Concept and a Term Version (whatever we might want to call them) and on the allowability of "semantically compatible" evolution within a Term Concept. A Task Force on Vocabulary Management in the W3C Semantic Web Best Practices and Deployment Working Group hopes to make some progress in this direction [5].

[1] http://dublincore.org/documents/dcmi-namespace/
[2] http://dublincore.org/usage/terms/history/
[3] http://www.kc.tsukuba.ac.jp/dlkc/e-proceedings/papers/dlkc04pp112.pdf
[4] http://www.ukoln.ac.uk/metadata/dcmi/abstract-model/
[5] http://lists.w3.org/Archives/Public/public-swbp-wg/2004Jun/0136.html