Principles of Good Practice for Managing RDF Vocabularies and OWL Ontologies

W3C Editor's Draft 30 October 2007

This version: http://www.w3.org/2006/07/Vocab/principles-20071030/
Latest version: http://www.w3.org/2006/07/Vocab/principles
Previous version: This is the first published version.
Editors:: Elisa Kendall, Sandpiper Software, Inc.; Vit Novacek, DERI Galway

Metadata element sets, taxonomies, subject headings, thesauri, and ontologies are examples of vocabularies which are increasingly deployed in Semantic Web settings. Managing vocabularies for use by Semantic Web applications means identifying, documenting, and publishing vocabulary terms in ways that facilitate their citation and re-use in a wide range of applications. This document articulates some basic principles of good practice for managing an RDF vocabulary. Following these principles makes an RDF vocabulary "usable": new users can learn quickly how to use the vocabulary, and a relationship of trust is built between the user community and the vocabulary developers/maintainers. This promotes growth of a user community, which generates more feedback for the developers/maintainers, leading to further improvements in quality and usability.

This document focuses primarily on those principles of good practice where a clear recommendation can be made. Other related issues remain research topics, and therefore are outside the scope of this document. Further, there are a number of ways to address most, if not all, of the topics highlighted below. While this document does not attempt to provide an exhaustive survey of those methodologies/approaches, it is intended to provide pointers to approaches that have worked well for seasoned practitioners.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was prepared by the Semantic Web Deployment Working Group (SWD) as part of the W3C Semantic Web Activity. It attempts to respond to a number of questions directed to the Semantic Web Best Practices and Deployment Working Group, and its successor, with regard to strategies for publishing and managing vocabularies over time. The recommendations reflect the experience of members of the working group in developing and managing individual vocabularies, such as Dublin Core and the Simple Knowledge Organisation System (SKOS), as well as in managing repositories of vocabularies, such as the BioPortal, a web application providing access to the Open Biomedical Ontologies (OBO) library. The principles we describe represent only the "tip of the iceberg" in terms of what may be needed to support ontology evolution over time, but cover a minimal common set of practices required to support an active user community.

This document is a W3C Editor's Draft published to solicit comments from interested parties. All comments are welcome and may be sent to public-swd-wg@w3.org; please include the text "comment" in the subject line. All messages received at this address are viewable in a public archive.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Introduction

An RDF vocabulary is a set of resources denoted by URIs. Informally, these resources are known as the "terms" of the vocabulary. These resources will usually (but not necessarily) be of type rdf:Property, rdfs:Class, owl:Class, or skos:Concept.

An RDF vocabulary is created and maintained for the use of a community of people as a set of building blocks for creating RDF descriptions of things in their domain of interest. An RDF vocabulary usually implies a shared conceptualisation, and thus the notion of an "RDF vocabulary" is similar to the notion of a "web ontology" (see OWL Web Ontology Language Use Cases and Requirements [webont-req]).

Many controlled vocabularies have been encoded in RDF, OWL, and other knowledge representation languages, and a growing number are available in the public domain. Only a fraction of these, however, appear to have fostered significant reuse to date (see the discussion thread starting with Semantic Web Ontology Map from earlier this year). While many issues can limit reuse opportunities, a significant contributor is the lack of well-specified policies for vocabulary management, for metadata, and, depending on the application, for provenance. Several of the most prominent RDF vocabularies currently in use (e.g., OWL, FOAF, Dublin Core, SKOS Core) have emerged from a close collaboration between a relatively small community of developers and a larger community of users. The prominence of these vocabularies may be attributed to their utility, but also to the commitment made by those responsible for developing and maintaining them to forming, accommodating, serving, and working with a community of users.

In addition to these individual vocabularies, a number of portals are emerging as "collection points" for vocabularies that serve users in specific domains, such as the BioPortal from the National Center for Biomedical Ontology (NCBO), or specific communities, such as the Object Management Group's Business Vocabulary and Ontology Portal (coming soon). Such portals are useful not only because they help users find the vocabularies they host, but also because of the extensive metadata they provide about those vocabularies. Increasingly, the metadata describing a particular vocabulary is becoming as important as the vocabulary itself for documentation and reuse purposes.

The goal of implementing the principles outlined in this document is to make an RDF vocabulary "usable"; in other words, to manage the vocabulary in such a way that users can easily understand and deploy it.

*** paragraph on digital preservation, per email from Dan Brickley re: social responsibility, etc. ***

*** review paragraph 1.2 in http://esw.w3.org/topic/VocabManagementNote to see if we want to incorporate any other thinking from the original note in the introduction - definitions of vocabularies, intro to the remainder of the document, etc.***

Principles of Good Practice

In this section of the document we present a number of principles reflecting collective experience in publishing and managing RDF vocabularies. These are topics for which there is general consensus among practitioners, and for which clear recommendations can be made. They include:

1. Use URIs For Naming

An RDF vocabulary consists of a set of URIs. "Naming" refers to the act of allocating URIs to resources (see RDF Semantics [RDFSem]).

The developers/maintainers of an RDF vocabulary should inform the potential user of the following:

These practices are among the most critical for ensuring that potential users can trust that a particular RDF vocabulary is stable, will persist for some length of time, and can be either referenced or used in applications they are building.

For example, developers of applications that use OWL-S ontologies have at times found that some of the ontologies on which OWL-S depends, such as those representing the names of cities and states in the US, return 404 errors. In other words, those ontologies are unavailable, and the corresponding applications may fail unless they have access to a locally cached copy. This appears to be due to temporary outages, but it is an issue for developers nonetheless. Additionally, some of the standards encoded by these dependent ontologies, such as language and country codes, are constantly evolving, while the ontologies themselves appear to be static and aging. The end result is that users are less confident of the availability or applicability of these dependent vocabularies, and thus of OWL-S itself.
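The fallback to a locally cached copy mentioned above can be sketched as follows. This is a minimal Python sketch, not part of any OWL-S toolkit: it assumes vocabularies are fetched over HTTP, and the function names (http_fetch, load_vocabulary) and the cache layout (one file per URI, named by the SHA-256 hash of the URI) are hypothetical choices made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable

def http_fetch(uri: str) -> bytes:
    """Fetch a vocabulary document over HTTP, asking for RDF/XML."""
    request = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()

def cache_path(cache_dir: Path, uri: str) -> Path:
    # Hash the URI so that any URI maps to a safe, stable file name.
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.rdf"

def load_vocabulary(uri: str, cache_dir: Path,
                    fetch: Callable[[str], bytes] = http_fetch) -> bytes:
    """Return the vocabulary at uri, falling back to a cached copy on failure."""
    path = cache_path(cache_dir, uri)
    try:
        data = fetch(uri)
    except OSError:
        # urllib's URLError/HTTPError (e.g. a 404) and timeouts are all OSErrors.
        if path.exists():
            return path.read_bytes()
        raise
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)  # refresh the cache after every successful fetch
    return data
```

Passing a different fetch function makes the fallback behaviour easy to exercise without network access; a failure with no cached copy is still reported to the caller rather than silently ignored.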

Where access to such "utility" vocabularies is critical for many applications, discussions are underway with authoritative organizations regarding the development and management of key vocabularies. For example, the Library of Congress is the registration authority for parts of ISO 639 (language codes) and ISO 3166 (country codes), and thus the obvious choice to manage and maintain the corresponding RDF vocabularies. Until "all of the vocabularies we might need" become available from an authoritative source, and for anyone considering publishing vocabularies designed specifically for reuse, the minimal set of naming conventions identified above should be the starting point.

Guidelines for choosing URI namespaces, including considerations and examples to assist in the process, are provided in Best Practice Recipes for Publishing RDF Vocabularies [Recipes]. Additional suggestions are detailed in Cool URIs for the Semantic Web [CoolURIs]. A number of organizations are grappling with decisions regarding general URI schemes, such as the date-based scheme generally used by the W3C. Some communities, such as the Object Management Group (OMG), have found that as the number of documents and communities within the broader organization grows, dates alone may not be sufficient. To help potential users find the various artifacts on the OMG site, recent proposals suggest including the higher-level specification name, date-based version information, artifact type, and so forth in the subordinate URI scheme. The use of a simple RDF vocabulary to support this scheme, once adopted, and to assist in navigation is also being discussed.
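A URI scheme along the lines of the OMG proposal might be sketched as below. The layout <base>/<spec>/<YYYYMMDD>/<artifact>, the function names and the example.org base used in testing are hypothetical; the proposal described above does not fix a concrete syntax.

```python
from datetime import date

def _segment(part: str) -> str:
    # Allow only ASCII letters, digits and hyphens, so no percent-encoding is needed.
    if not part or not part.isascii() or not part.replace("-", "").isalnum():
        raise ValueError(f"unsuitable URI path segment: {part!r}")
    return part

def artifact_uri(base: str, spec: str, released: date, artifact: str) -> str:
    """Hypothetical scheme: <base>/<spec>/<YYYYMMDD>/<artifact>."""
    return f"{base.rstrip('/')}/{_segment(spec)}/{released:%Y%m%d}/{_segment(artifact)}"

def latest_uri(base: str, spec: str, artifact: str) -> str:
    """Undated alias intended to point at the most recent release."""
    return f"{base.rstrip('/')}/{_segment(spec)}/{_segment(artifact)}"
```

Keeping an undated alias alongside each dated URI mirrors the W3C practice, visible in this document's own "This version" and "Latest version" links, of pairing a fixed snapshot with a moving pointer.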

*** note in wiki from F2F-01-2007: Good place to mention the domain registration problem???***

2. Provide Readable Documentation

RDF vocabulary publishers should provide natural-language (i.e., human-readable) documentation about the vocabulary and its proper use. The principal aim of this documentation is to help potential users *learn* how to apply the vocabulary, and therefore to promote *consistency* in the way that the vocabulary is applied. Inconsistent usage reduces the value of a vocabulary, because the meaning associated with the vocabulary becomes ambiguous in practice.

At a bare minimum, a list of the terms should be published, along with their text definitions. Ideally, this list should be supplemented by detailed prose describing proper usage patterns and scenarios, with clear examples. Relevant metadata should include a description of the use case(s) that formed the basis for the original vocabulary development, its intended audience and target domain, references and authoritative sources used, development and validation methodology, and other domain-dependent content that may be useful for reuse purposes.

In practice, we recommend publishing both human-readable and machine-readable documentation, with liberal use of rdfs:seeAlso. For vocabularies that draw their terminology from a single reference, such as ISO language and country codes, linking back to the original source documents and registration authority may seem trivial. In many cases, however, vocabulary terms are drawn from a number of sources, and documentation is critical to reusability. Examples of good documentation, as well as of metadata terms that can be used in documenting vocabularies, include the DCMI Metadata Terms vocabulary and the SKOS Core Vocabulary Specification.
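Machine-readable, term-level documentation of this kind can be generated alongside the vocabulary itself. The following minimal Python sketch emits a Turtle description of one term using rdfs:label, rdfs:comment, rdfs:seeAlso and rdfs:isDefinedBy. The Term class, the helper names and any namespace passed in are illustrative assumptions; a real project would more likely use an RDF library than string formatting.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Term:
    local_name: str
    rdf_type: str          # a prefixed name such as "rdfs:Class" or "rdf:Property"
    label: str
    definition: str
    see_also: List[str] = field(default_factory=list)

# Prepend these to the output of describe() to obtain a complete Turtle document.
PREFIXES = (
    "@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

def literal(text: str, lang: str = "en") -> str:
    # Escape backslashes, quotes and newlines as Turtle string escapes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'

def describe(namespace: str, term: Term) -> str:
    lines = [
        f"<{namespace}{term.local_name}> a {term.rdf_type} ;",
        f"    rdfs:label {literal(term.label)} ;",
        f"    rdfs:comment {literal(term.definition)} ;",
    ]
    # One rdfs:seeAlso per external source the term was drawn from.
    lines += [f"    rdfs:seeAlso <{uri}> ;" for uri in term.see_also]
    # Point back at the vocabulary document (namespace minus any trailing '#').
    lines.append(f"    rdfs:isDefinedBy <{namespace.rstrip('#')}> .")
    return "\n".join(lines)
```

Because the label, definition and source links live in the same RDF graph as the term, the human-readable and machine-readable documentation cannot drift apart as easily as separately maintained pages.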

One recent EU activity that documents and manages both the metadata and the content of particular vocabularies is the University of Karlsruhe's Ontology Metadata Vocabulary (OMV) project. The OMV team has developed an extensive ontology and a related repository for collecting and managing OWL ontologies, including extensions for mappings across ontologies, multilinguality, and so forth. Practitioners may find the OMV core ontology particularly useful as a basis for in situ vocabulary documentation. Another example of a well-documented resource is WordNet. A number of extensions, including a plugin for Protege, an RDF version, an ontology and related semantic database, a Lucene index, and tools that calculate semantic similarity, are available from the WordNet site.

*** Vit: paragraph on the importance and different means of documenting "vocabulary change", in addition to documenting the vocabulary itself, supported by discussion on each of two reference implementations. Outline:

Further discussion on vocabulary documentation can be found in [Recipes], under Requirements.

3. Articulate Maintenance Policies

An RDF vocabulary may be developed in private by a closed community and published without the need for consideration of future change. An RDF vocabulary may, on the other hand, be developed in a more public setting, potentially by an open community, with the content of the vocabulary being allowed to evolve indefinitely. Regardless, potential users need to know under what circumstances the vocabulary (or parts of it) may change, and the kinds of changes that may be expected.

The key concept here is "stability". When a potential user chooses a vocabulary, they are making an investment of time/money/effort that depends to a certain extent upon the stability of that vocabulary. Therefore, users need to know exactly how stable a vocabulary is in order to determine how much to invest. If a vocabulary is less than perfectly stable, the user needs to know exactly what may change, how it may change, and of course to be informed of changes when they do occur.
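One way to tell users exactly what has changed is to compare successive versions of a vocabulary and publish the differences. The sketch below assumes each version is available as a mapping from term URI to definition text; the function names and the plain-text change-note format are hypothetical.

```python
from typing import Dict, List, NamedTuple

class VocabularyDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    changed: List[str]

def diff_vocabularies(old: Dict[str, str], new: Dict[str, str]) -> VocabularyDiff:
    """Compare two versions, each given as a mapping from term URI to definition."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(uri for uri in old.keys() & new.keys() if old[uri] != new[uri])
    return VocabularyDiff(added, removed, changed)

def change_note(diff: VocabularyDiff) -> str:
    """Render the differences as a plain-text note for announcing a release."""
    lines = []
    for heading, uris in (("Added", diff.added),
                          ("Removed", diff.removed),
                          ("Definition changed", diff.changed)):
        lines.extend(f"{heading}: {uri}" for uri in uris)
    return "\n".join(lines) if lines else "No changes."
```

Comparing definitions as plain strings is deliberately crude: it flags any textual edit, including cosmetic ones, which errs on the side of informing users too often rather than too rarely.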

With that as background, it is essential that RDF vocabulary publishers provide maintenance policies for every vocabulary. These policies should articulate whether or not change is allowed and the manner in which change is managed. The publisher should also provide some facility whereby users can be informed of changes as and when they are made, and provide feedback if possible. Examples of different types of vocabularies and other artifacts that have a published maintenance policy include:

*** paragraph on OWL WG effort, including tracker for issues, working group timeline, revision/publication planning (Sandro?, Ian?)

All SKOS Core terms use a www.w3.org namespace, and so inherit the commitments to URI persistence made by the W3C. The SKOS Core vocabulary maintenance policy implements the following principles:

As the semantics of a SKOS term are refined in response to deployment and testing, the term passes through the following stages: "unstable", "testing", "stable". "Unstable" is roughly analogous to an "alpha" release in software development, and "testing" to a "beta" release. Once a term is stable, it may be relied upon not to change further. A stable term may be deprecated, in which case a description of the deprecated term will be maintained indefinitely.
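The stages above can be read as a small state machine, sketched below in Python. The sketch encodes a strict reading of them: terms move forward one stage at a time, and only stable terms are deprecated; whether other transitions are permitted is an assumption the text does not settle. To record a term's status in RDF it uses the term_status property from the W3C vocabulary-status namespace (http://www.w3.org/2003/06/sw-vocab-status/ns#), which FOAF uses for the same purpose; the "deprecated" value is this sketch's own label.

```python
from enum import Enum

class Status(Enum):
    UNSTABLE = "unstable"      # roughly "alpha"
    TESTING = "testing"        # roughly "beta"
    STABLE = "stable"          # may be relied upon not to change further
    DEPRECATED = "deprecated"  # description kept indefinitely

# Strict reading of the policy: one step forward at a time; only stable terms are deprecated.
ALLOWED = {
    Status.UNSTABLE: {Status.TESTING},
    Status.TESTING: {Status.STABLE},
    Status.STABLE: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}

VS_TERM_STATUS = "http://www.w3.org/2003/06/sw-vocab-status/ns#term_status"

def advance(current: Status, target: Status) -> Status:
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move a term from {current.value} to {target.value}")
    return target

def may_refine_semantics(status: Status) -> bool:
    # Semantics are refined only before a term reaches "stable".
    return status in (Status.UNSTABLE, Status.TESTING)

def status_triple(term_uri: str, status: Status) -> str:
    """N-Triples statement recording a term's current status."""
    return f'<{term_uri}> <{VS_TERM_STATUS}> "{status.value}" .'
```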

*** ask Alistair to check the above and revise as appropriate given latest work, context ***

*** Vit: paragraph on policies implemented or explicitly identified by respondents of the ontology versioning survey (brief report available [http://smile.deri.ie/resources/2007/09/ovs-eval.pdf here])***

4. Identify Versions

Where a vocabulary is allowed to change, users developing systems based on that vocabulary may prefer to work to a stationary, rather than moving, target. To support these users, the developers/maintainers of a vocabulary should:

Where the resources that are the members of a vocabulary may evolve independently, or be at differing levels of stability, the developers/maintainers may also wish to allocate URIs to historical versions of a particular resource.
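A date-based approach to version identification is sketched below in Python. It follows the pattern visible in this document's own URIs (the latest-version URI plus a -YYYYMMDD suffix for each snapshot) and annotates each snapshot with owl:versionInfo and owl:priorVersion from the OWL vocabulary. Deriving URIs for historical versions of individual terms by appending the local name to the snapshot URI is a hypothetical convention, as are the function names.

```python
from datetime import date
from typing import List, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"

def snapshot_uri(latest: str, released: date) -> str:
    """Dated URI for a frozen snapshot: <latest>-YYYYMMDD/."""
    return f"{latest.rstrip('/')}-{released:%Y%m%d}/"

def term_version_uri(snapshot: str, local_name: str) -> str:
    # Hypothetical convention: a term's historical version lives under the snapshot URI.
    return snapshot + local_name

def version_header(snapshot: str, label: str, prior: Optional[str] = None) -> List[str]:
    """N-Triples annotating a snapshot with owl:versionInfo and owl:priorVersion."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    triples = [
        f"<{snapshot}> <{RDF_TYPE}> <{OWL}Ontology> .",
        f'<{snapshot}> <{OWL}versionInfo> "{escaped}" .',
    ]
    if prior is not None:
        triples.append(f"<{snapshot}> <{OWL}priorVersion> <{prior}> .")
    return triples
```

Users who want a stationary target can then depend on the dated snapshot URI, while others follow the undated latest-version URI.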

*** F2F-01-2007: Discuss and display various methods of versioning and maintenance policies with reference implementations

Note that version management for RDF vocabularies and, in particular, for OWL ontologies, is an ongoing research problem, and there is no single approach that may be appropriate in all situations. (Provide background on configuration management for ontologies, e.g., Jeff Heflin's work cited above, Protege approach, issues raised for DARPA/REAL, etc. Also need examples to include ontology versioning with dependencies / broken reasoning to make the point. Vit/Siggi point out that reasoning with change management in mind may include additional metadata to be encoded in the vocabulary as well as documentation indicating how such metadata supports doing so - a decent discussion of this with pointers to reference implementations might be useful.) *** may move this discussion to the research section, with a pointer here only ***

From Vit/Siggi, we should also discuss issues related to "human-readable" configuration / version management, as for some vocabularies, human usability is as important as machine-readability.

Vit: paragraph on version-annotation metadata elements agreed on by respondents of the ontology versioning survey (brief report available [http://smile.deri.ie/resources/2007/09/ovs-eval.pdf here]). Several other paragraphs presenting basic features of the [http://semweb4j.org/site/semversion/ SemVersion] reference implementation. Outline:

5. Publish a Formal Schema

While this may seem obvious to the reader, it is important that an RDF/OWL description (document) be published at the definitive namespace URI for the vocabulary. If more than one RDF description of a vocabulary is available, potential users should be clearly informed as to which one is "authoritative". Where the resources that are the members of an RDF vocabulary are denoted by HTTP URIs, an HTTP GET request against such a URI with the header "Accept: application/rdf+xml" should return an RDF/XML serialisation of an RDF graph that includes a description of the denoted resource.
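On the server side, honouring the Accept header is a matter of content negotiation. The following Python sketch chooses between an RDF/XML and an HTML representation of a vocabulary, using q-values and the usual precedence of an exact type over type/* over */*. The list of available formats, the preference for RDF/XML when the client expresses none, and the function names are all assumptions.

```python
from typing import List, Optional, Tuple

# Representations this (hypothetical) server can produce, in order of preference.
AVAILABLE = ["application/rdf+xml", "text/html"]

def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media, q))
    return ranges

def quality(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """q-value of the most specific range matching media_type (0.0 if none)."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media, q in ranges:
        if media == media_type:
            specificity = 2
        elif media == main_type + "/*":
            specificity = 1
        elif media == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q

def choose_representation(accept: Optional[str]) -> Optional[str]:
    """Pick a media type to serve, or None for 406 Not Acceptable."""
    ranges = parse_accept(accept) if accept else [("*/*", 1.0)]
    scored = [(quality(t, ranges), t) for t in AVAILABLE]
    best_q = max(q for q, _ in scored)
    if best_q <= 0.0:
        return None
    # Ties go to the earlier entry in AVAILABLE.
    return next(t for q, t in scored if q == best_q)
```

With this logic, a browser that lists text/html first receives HTML, while an RDF client sending "Accept: application/rdf+xml" receives RDF/XML from the same URI. A server doing this should also send a "Vary: Accept" header so that caches keep the two representations apart.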

Detailed instructions on how to go about publishing a vocabulary can be found in [Recipes].

Research Topics

Additional Reading

Acknowledgements

The editors would like to thank the following people, who contributed to this note either initially, when an early draft was developed by the Semantic Web Best Practices and Deployment Working Group, or more recently: Tom Baker (Goettingen State and University Library), Dan Brickley (W3C), Libby Miller (Asemantics), Alistair Miles (CCL), Ralph Swick (W3C)...
