Workshop on Metadata for a Multilingual World

Informal position paper on Dublin Core in Multiple Languages

Thomas Baker

This position paper is a brief and informal discussion about
identifying and declaring metadata elements in the context of
the Dublin Core Metadata Initiative.  Helping to establish good
practice for declaring vocabularies in a cross-referenceable way
-- along with providing a model for the institutional etiquette
for doing so -- is perhaps the most important contribution DCMI
can make, potentially more lasting than the fifteen elements
per se.  (Note that some of the terminology I use here goes
a bit beyond the limits of DCMI's official specifications.)

For the sake of argument: a Term Concept is a Term in an
abstract sense -- the idea behind a term.  I believe this
is what DCMI is identifying with its Namespace Policy [1].
For example, the Dublin Core element "Subject" is identified
with the URI http://purl.org/dc/elements/1.1/subject.  This is
the identifier for "Subject" that metadata implementors are
supposed to use in their metadata (i.e., if they use URIs
at all).

A Term is described with a Term Description -- a cluster
of (mostly) human-readable attributes such as Name, Label,
Definition, Comment, Date, and Status.  The Term Description
for "Subject" is maintained by DCMI and published in various
forms -- in a Web document, an RDF schema, and an XML
schema, each with its own URI.  When the Term Description
is represented in a machine-processable schema language,
it is referred to as a Term Declaration.

A Term Description, however, can evolve over time -- the
status of an element can change, a comment can be reworded
for clarity, a bibliographical reference can be updated.
Each successive historical state of a Term Description can
be seen as a Term Version.  DCMI currently identifies these
successive Term Versions with URIs, though those URIs are
not yet supported by official DCMI policy [2].  For example,
the URI http://dublincore.org/usage/terms/history/#subject-002
denotes a specific historical version of the term "Subject".

This method in effect treats Term Versions analogously to
how W3C (and DCMI) treat documents -- e.g., with a timeless
"Latest Version" (http://www.w3.org/TR/rdf-primer/)
which at any given time corresponds to a "This Version"
and may point back to a "Previous Version".
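
The relationship between a Term Concept and its chain of Term
Versions can be sketched in a few lines of Python.  This is only
an illustration under stated assumptions, not DCMI's data model:
the class and field names are invented here, the version URI
ending in "#subject-001" is assumed for the sake of the example,
and the definitions are placeholders rather than DCMI wording.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermVersion:
    uri: str          # identifies one historical state of the description
    label: str
    definition: str
    previous: Optional["TermVersion"] = None   # the "Previous Version", if any


@dataclass
class TermConcept:
    uri: str          # stable identifier meant for use in metadata records
    versions: List[TermVersion] = field(default_factory=list)

    def publish(self, uri: str, label: str, definition: str) -> TermVersion:
        """Record a new Term Version, chained to the one before it."""
        previous = self.versions[-1] if self.versions else None
        version = TermVersion(uri, label, definition, previous)
        self.versions.append(version)
        return version

    def latest(self) -> TermVersion:
        """The version the timeless concept URI currently corresponds to."""
        return self.versions[-1]


subject = TermConcept("http://purl.org/dc/elements/1.1/subject")
# Assumption: "#subject-001" and both definitions are placeholders.
subject.publish("http://dublincore.org/usage/terms/history/#subject-001",
                "Subject", "placeholder definition, first wording")
subject.publish("http://dublincore.org/usage/terms/history/#subject-002",
                "Subject", "placeholder definition, reworded")
```

Here the concept URI plays the role of the "Latest Version"
address, while each TermVersion keeps its own URI and a pointer
back to its predecessor.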

The limits within which a term may evolve and still refer to
the same Term Concept are described in the DCMI Namespace
Policy [1].  Basically, if a term evolves in ways that are
semantically incompatible with the Term Concept, it must be
considered a new Term and given a new URI.

A Translation is "about" a Term Concept, but it does
not translate that Term Concept directly.  Rather, it
translates a Term Description.  Specifically, it translates
a particular Term Version -- a Term Description at a given
point in time.  In other words, a given Japanese translation
of the element "Subject" may be about the Term Concept
"Subject" (http://purl.org/dc/elements/1.1/subject),
but it actually "translates" a specific Term Version.
Both assertions ("about" and "translates") seem necessary to
fully express what is intended.
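
As a rough sketch of how the two assertions might sit side by
side, the Python below stores them as plain subject-predicate-
object triples.  The predicate URIs and the URI of the Japanese
translation are hypothetical (the example.org names are invented
here), and the choice of "#subject-002" as the translated version
is made purely for illustration.

```python
ABOUT = "http://example.org/vocab/about"            # hypothetical predicate
TRANSLATES = "http://example.org/vocab/translates"  # hypothetical predicate

SUBJECT = "http://purl.org/dc/elements/1.1/subject"
SUBJECT_V2 = "http://dublincore.org/usage/terms/history/#subject-002"
SUBJECT_JA = "http://example.org/translations/ja/subject"  # hypothetical

triples = [
    (SUBJECT_JA, ABOUT, SUBJECT),          # what the translation is "about"
    (SUBJECT_JA, TRANSLATES, SUBJECT_V2),  # which Term Version it renders
]


def translations_of(concept_uri, triples):
    """Map each translation about `concept_uri` to the version it translates."""
    about = [s for s, p, o in triples if p == ABOUT and o == concept_uri]
    translated = {s: o for s, p, o in triples if p == TRANSLATES}
    # A translation with no "translates" assertion maps to None.
    return {t: translated.get(t) for t in about}
```

With only the "about" assertion one could find the translation
but not tell which wording it reflects; with only "translates"
one would have to infer the concept from the version.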

Another type of assertion "about" another term is being
developed in the context of discussion about Dublin
Core Application Profiles (DCAPs): the Term Usage, or
(more narrowly) Property Usage.  A Property Usage is an
assertion that a given application or set of metadata "uses"
a property identified by its URI.  A Property Usage may
optionally be annotated with various sorts of usage notes --
context-specific clarifications of definition, local cataloging
rules, constraints on cardinality and the like.
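
A Property Usage of this kind could be modelled roughly as
follows.  This is a minimal sketch, not a DCAP specification:
the field names (usage_notes, min_occurs, max_occurs), the
check_record helper, and the sample usage note are all
assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PropertyUsage:
    property_uri: str                    # the property being "used"
    usage_notes: List[str] = field(default_factory=list)
    min_occurs: int = 0                  # hypothetical cardinality fields
    max_occurs: Optional[int] = None     # None means unbounded


def check_record(record: Dict[str, List[str]],
                 usages: List[PropertyUsage]) -> List[str]:
    """Return cardinality problems; an empty list means the record conforms."""
    problems = []
    for usage in usages:
        count = len(record.get(usage.property_uri, []))
        if count < usage.min_occurs:
            problems.append(
                f"{usage.property_uri}: expected at least "
                f"{usage.min_occurs}, found {count}")
        if usage.max_occurs is not None and count > usage.max_occurs:
            problems.append(
                f"{usage.property_uri}: expected at most "
                f"{usage.max_occurs}, found {count}")
    return problems


profile = [
    PropertyUsage("http://purl.org/dc/elements/1.1/subject",
                  usage_notes=["Hypothetical local rule: use a controlled list."],
                  min_occurs=1, max_occurs=3),
]
record = {"http://purl.org/dc/elements/1.1/subject":
          ["metadata", "multilingualism"]}
```

Calling check_record(record, profile) returns an empty list for
this record, and one problem for a record with no Subject at all.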

Yet another type of assertion "about" another term is that
of a semantic relationship between one term and another -- in
particular, a term maintained by someone else.  For example,
the Library of Congress maintains a vocabulary of MARC Relators
such as Translator and Editor, which may be seen as "roles".
The DCMI Usage Board is working with the Library of Congress on
the process of making and maintaining a set of assertions whereby
the Library of Congress declares each appropriate MARC Relator
to be a subPropertyOf (i.e., semantically narrower than) the
Dublin Core property Contributor, and DCMI then acknowledges
and endorses those assertions.  These assertions will be
made not just "in print", but in the form of harvestable
assertions in RDF.  The hope is that this process might
provide an example of good practice for "assertion etiquette"
between vocabulary maintainers generally.
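
To make the idea of harvestable assertions concrete, the sketch
below writes two subPropertyOf statements as N-Triples and shows
how a harvester might use them to restate a narrower property
as Contributor.  The relator URIs and the sample resource are
hypothetical stand-ins (example.org names invented here), and
the helper functions are illustrative rather than part of any
DCMI or Library of Congress tooling.

```python
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"
# Hypothetical URIs standing in for MARC Relator terms.
TRANSLATOR = "http://example.org/relators/translator"
EDITOR = "http://example.org/relators/editor"

assertions = [
    (TRANSLATOR, SUB_PROPERTY_OF, CONTRIBUTOR),
    (EDITOR, SUB_PROPERTY_OF, CONTRIBUTOR),
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def broader_properties(prop, triples):
    """Follow subPropertyOf links upward and collect every broader property."""
    found = set()
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUB_PROPERTY_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def generalize(statement, triples):
    """Restate a (resource, property, value) triple with each broader property."""
    resource, prop, value = statement
    return [(resource, broader, value)
            for broader in sorted(broader_properties(prop, triples))]
```

A consumer that understands only Dublin Core could then treat a
statement made with the narrower relator as a plain Contributor
statement, without either maintainer having to change its own
vocabulary.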

Underpinning these various sorts of assertions -- Term
Declarations, Translations, Property Usages, and the like --
is DCMI's process model [3].  The process for adding new terms,
for example, is that of proposal development in community-based
working groups leading to approval and assignment of official
status by the DCMI Usage Board.  In principle, the Usage
Board is also interested in reviewing and assigning status to
Application Profiles (which are in effect sets of Property
Usages), but the full framework within which this can be
done in a principled manner -- the DCMI Abstract Model [4] --
has not yet attained official status.  The process by which
a systematic review of things such as Translations might be
organized is even less clear given the language barrier.

If this is a reasonable approach to declaring, versioning,
and translating metadata terms in multiple languages, we
should recognize that the thing we call a Term is actually
being identified using multiple URIs, and that those different
URIs serve different purposes:

-- A term is identified with one URI for its Term Concept.
   This is what all of the Term Versions and Term Translations
   are "about".  It is the common reference point that holds
   all of the versions and translations together and promotes
   the interoperability of descriptive metadata in open,
   loosely-coupled, distributed systems.

-- A term is also identified in the form of a Term Version.
   Since languages inevitably evolve, policies for the
   identification of Term Concepts must allow for this.
   The pragmatic solution is to allow for change that is
   "semantically compatible" with the Term Concept.

-- Web pages and formal schemas documenting these terms are
   of course themselves identified by URI.

For the purposes of interoperability, it is desirable that the
number of URIs used in the metadata of the world be kept down,
lest we overburden the Semantic Web with assertions of the
sort that "term http://foo.org/1.1/bar is compatible with
http://foo.org/1.2/bar".  At the same time, other purposes
require precision with regard to historical version.
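
A small piece of arithmetic suggests why this matters.  The
sketch below assumes the worst case, in which metadata cites
version URIs directly and compatibility is asserted explicitly
for every pair of versions with no inference; it compares that
with one "about" assertion per version pointing at a single
Term Concept URI.  The function names are invented for
illustration.

```python
def pairwise_compatibility_assertions(n_versions: int) -> int:
    """Assertions needed if every pair of version URIs is declared compatible."""
    return n_versions * (n_versions - 1) // 2


def about_assertions(n_versions: int) -> int:
    """Assertions needed if each version points at one Term Concept URI."""
    return n_versions


for n in (2, 5, 10):
    print(n, pairwise_compatibility_assertions(n), about_assertions(n))
```

The pairwise count grows quadratically with the number of
versions, while the count of "about" assertions grows only
linearly.  A chain of compatibility statements plus transitive
inference would need fewer than the pairwise case, but would
push that work onto every consumer.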

It would be helpful for all concerned if a broader consensus
could be reached on some of the general issues outlined above,
such as the contrast between a Term Concept and a Term Version
(whatever we might want to call them) and the allowability
of "semantically compatible" evolution within a Term Concept.
A Task Force on Vocabulary Management in the W3C Semantic Web
Best Practice and Deployment Working Group hopes to make some
progress in this direction [5].

[1] http://dublincore.org/documents/dcmi-namespace/
[2] http://dublincore.org/usage/terms/history/
[3] http://www.kc.tsukuba.ac.jp/dlkc/e-proceedings/papers/dlkc04pp112.pdf
[4] http://www.ukoln.ac.uk/metadata/dcmi/abstract-model/
[5] http://lists.w3.org/Archives/Public/public-swbp-wg/2004Jun/0136.html