Basic Principles for Managing an RDF Vocabulary

Based on starting point: http://www.w3.org/2001/sw/BestPractices/VM/principles/20050705

Abstract

This document articulates some basic principles of good practice for managing an RDF vocabulary. Following these principles makes an RDF vocabulary "usable": new users learn quickly how to use the vocabulary, and a relationship of trust is built between the user community and the vocabulary developers/maintainers. This promotes growth of a user community, which generates more feedback for the developers/maintainers, leading to further improvements in quality and usability.

This document focuses on those principles of good practice where a clear recommendation can be made. A number of issues related to the management of RDF vocabularies have yet to be resolved, but these are outside the scope of this document. Further, there are a number of ways to address most, if not all, of the topics highlighted below. While this document does not attempt to provide an exhaustive survey of those methodologies/approaches, it is intended to provide pointers to approaches that have worked well for seasoned practitioners.

Introduction

An RDF vocabulary is a set of resources denoted by URIs. Informally, these resources are known as the "terms" of the vocabulary. The resources will usually (but not necessarily) be of type rdf:Property, rdfs:Class, owl:Class, or skos:Concept.

An RDF vocabulary is created and maintained for the use of a community of people (the 'user community') as a set of building blocks for creating RDF descriptions of things in their domain of interest. An RDF vocabulary usually implies a shared conceptualisation, and thus the notion of an 'RDF vocabulary' is almost identical to the notion of a 'web ontology' [ref???]. (efk: consider taking this last sentence out ...)

Many controlled vocabularies have been encoded in RDF, OWL, and other knowledge representation languages, and a growing number of these are available in the public domain. However, only a fraction of these appear to have fostered significant reuse to date [ref. recent discussion thread on mapping the Semantic Web]. While many issues can limit opportunities for reuse, a significant contributor is the lack of well-specified policies, appropriate to the application, for vocabulary management, metadata, and provenance specification. Several of the most prominent RDF vocabularies currently in use (e.g., OWL, FOAF, Dublin Core, SKOS Core) have emerged from close collaboration between a relatively small community of developers and a larger community of users. The prominence of these vocabularies may be attributed to their utility, but also to the commitment made by their developers/maintainers to forming, accommodating, serving, and working with a community of users.

In addition to these individual vocabularies, a number of portals are emerging as 'collection points' for vocabularies designed to support users in specific domains, such as the BioPortal from NCBO (National Center for Biomedical Ontology - http://www.bioontology.org/), or in specific communities, such as the OMG's Ontology Portal (coming soon to http://ontology.omg.org/). Such portals are useful not only because they help users find the vocabularies they host, but also because of the substantial metadata they provide about those ontologies. Increasingly, the metadata describing a particular vocabulary is becoming as important as the vocabulary itself for documentation and reuse purposes.

The goal of implementing the principles outlined in this document is to make an RDF vocabulary "usable", that is, to manage it in such a way that users can easily understand and deploy it.

(paragraph on digital preservation, per email from Dan Brickley re: social responsibility, etc.)

.... [some other stuff ???]

Principles of Good Practice

1. Use URIs For Naming

An RDF vocabulary consists of a set of URIs. 'Naming' refers to the act of allocating URIs to resources [ref. RDF Semantics, http://www.w3.org/TR/rdf-mt/#urisandlit].

The developers/maintainers of an RDF vocabulary should inform the potential user of the following:

The URI space from which resource names are drawn.
The ownership of this URI space.
Any commitments made by the owner(s) of the URI space to the persistence of URIs in that space.
Policies by which the owner(s) of that space delegate to vocabulary developers/maintainers the responsibility for allocating URIs within it.
Rules used by the developers/maintainers for constructing URIs to be used as resource names.

These practices are among the most critical for ensuring that potential users can trust that a particular RDF vocabulary is stable, will persist for some length of time, and can be either referenced or used in applications they are building.

For example, in developing applications that use the OWL-S vocabularies (http://www.w3.org/Submission/OWL-S/), there have been times when certain vocabularies on which OWL-S depends, such as vocabularies representing the names of cities and states in the US, returned 404 errors (i.e., they were unavailable, so the corresponding applications may fail unless they have a locally cached copy). This appears to reflect temporary outages, but it is an issue for developers nonetheless. Additionally, the standards described by certain dependent ontologies, for example language and country codes, are constantly evolving, while the referenced ontologies themselves appear to be static and aging. The end result is that users are less confident of the availability or applicability of these dependent vocabularies, and thus of OWL-S itself.

In cases where access to such "utility" vocabularies is critical for many applications, there are discussions underway with authoritative organizations regarding development and management of key vocabularies. For example, the Library of Congress is the registration authority for parts of ISO 639 (language codes) and ISO 3166 (country codes), and thus the obvious choice to manage and maintain the corresponding RDF vocabularies (http://www.loc.gov/standards/). Until such time as "all of the vocabularies we might need" become available from an authoritative source, or for any of us considering publishing vocabularies designed specifically for reuse, the minimal set of naming conventions identified above should be the starting point.
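The last item in the list above, rules for constructing URIs, is easiest for users to rely on when it is stated precisely enough to be checked mechanically. The Python sketch below assumes a hypothetical hash namespace (http://example.org/vocab#) and a hypothetical rule (class names in UpperCamelCase, property names in lowerCamelCase); neither the namespace nor the rule is prescribed by this document.

```python
import re

# Hypothetical namespace; a real vocabulary publishes its own.
NAMESPACE = "http://example.org/vocab#"

# Hypothetical construction rules: classes in UpperCamelCase,
# properties in lowerCamelCase, ASCII letters and digits only.
RULES = {
    "class": re.compile(r"[A-Z][A-Za-z0-9]*"),
    "property": re.compile(r"[a-z][A-Za-z0-9]*"),
}


def mint_term_uri(local_name: str, kind: str, namespace: str = NAMESPACE) -> str:
    """Return the URI for a new term, rejecting names that break the published rules."""
    if not namespace.endswith(("#", "/")):
        raise ValueError("namespace URI must end with '#' or '/'")
    rule = RULES.get(kind)
    if rule is None:
        raise ValueError(f"unknown kind of term: {kind!r}")
    if not rule.fullmatch(local_name):
        raise ValueError(f"{local_name!r} does not follow the rule for a {kind}")
    return namespace + local_name
```

For instance, `mint_term_uri("Person", "class")` yields http://example.org/vocab#Person, while `mint_term_uri("person", "class")` raises `ValueError`. Publishing the rule alongside such a check lets users predict term URIs instead of guessing them.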

Guidelines for choosing URI namespaces, including considerations and examples to assist in the process, are provided in [ref. Recipes]. In addition, a number of organizations are grappling with decisions regarding general URI schemes, such as the date-based scheme generally used by the W3C. Some communities, such as the OMG, have found that as the number of documents and communities within the broader organization grows, dates alone may not be sufficient. To help potential users find the various artifacts on the OMG site, recent proposals suggest including the higher-level specification name, date-based version information, artifact type, and so forth as part of the subordinate URI scheme. Use of a simple RDF vocabulary to support this scheme and assist in navigation, once adopted, is also being discussed.
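A subordinate URI scheme of the kind proposed for the OMG site might be sketched as follows. The base URI, the order of the path segments, and the yyyymmdd date format are all assumptions made for illustration; they do not describe an adopted OMG scheme. The parser shows how the same structure can support navigation back from a URI to its components.

```python
from datetime import date, datetime

# Hypothetical base URI; not an adopted OMG scheme.
BASE = "http://www.example.org/spec/"


def artifact_uri(spec: str, version: date, artifact_type: str, name: str) -> str:
    """Build BASE + <spec>/<yyyymmdd>/<artifact type>/<name>."""
    segments = [spec, version.strftime("%Y%m%d"), artifact_type, name]
    for segment in segments:
        if not segment or "/" in segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    return BASE + "/".join(segments)


def parse_artifact_uri(uri: str) -> dict:
    """Recover the components of a URI built by artifact_uri(), e.g. for navigation."""
    if not uri.startswith(BASE):
        raise ValueError("URI is outside the specification URI space")
    segments = uri[len(BASE):].split("/")
    if len(segments) != 4:
        raise ValueError("expected <spec>/<yyyymmdd>/<artifact type>/<name>")
    spec, stamp, artifact_type, name = segments
    return {
        "spec": spec,
        "version": datetime.strptime(stamp, "%Y%m%d").date(),
        "artifact_type": artifact_type,
        "name": name,
    }
```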

Egs. ....

F2F-01-2007:  
This should be a hard recommendation. 
Naming conventions could be linked to here from the cookbook
Good place to mention the domain registration problem

Paragraph on current practice for Dublin Core (Tom?)
Paragraph on current best practices identified through the BioPortal (Daniel / Natasha?)

2. Provide Readable Documentation

The developers/maintainers of an RDF vocabulary should provide natural-language (i.e. human-readable) documentation about the vocabulary and its proper use. The principal aim of this documentation is to help potential users *learn* how to apply the vocabulary, and therefore to promote *consistency* in the way that the vocabulary is applied. Inconsistent usage reduces the value of a vocabulary, because the meaning associated with the vocabulary becomes ambiguous in practice.

As a bare minimum, a list of the terms should be published, with text definitions. It is also recommended that detailed prose describing proper usage patterns and scenarios, with examples, be published. Metadata should include a description of the use case(s) that were the basis for the original vocabulary development, the intended audience and target domain, the references and authoritative sources used, the development and validation methodology, and so forth.

Egs.
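The minimal term list can be generated from the same source that holds the term definitions, which helps keep documentation and vocabulary in step. The Python sketch below renders a hypothetical term table as an HTML definition list; the namespace, terms, and definitions are invented for illustration. Each entry uses the local name as its `id`, so that, if the page is served as the namespace document of a hash namespace, a browser following a term URI lands on that term's entry.

```python
from html import escape

# Hypothetical namespace and terms: local name -> (label, definition).
NAMESPACE = "http://example.org/vocab#"
TERMS = {
    "Person": ("Person", "A human being, living or dead."),
    "knows": ("knows", "Relates a person to another person whom they know."),
}


def term_list_html(namespace: str, terms: dict) -> str:
    """Render each term's label and definition as an HTML definition list."""
    entries = []
    for local_name in sorted(terms, key=str.lower):
        label, definition = terms[local_name]
        entries.append(
            f'<dt id="{escape(local_name)}">'
            f'<a href="{escape(namespace + local_name)}">{escape(label)}</a></dt>\n'
            f"<dd>{escape(definition)}</dd>"
        )
    return "<dl>\n" + "\n".join(entries) + "\n</dl>"
```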

F2F-01-2007:  
Both human readable and machine readable (e.g. via rdfs:seeAlso)
Could be linked to the cookbook
show examples -- dc-terms, skos vocabulary, owl wordnet

3. Articulate Maintenance Policies

An RDF vocabulary may be developed in private by a closed community, and then published with no possibility for future change. An RDF vocabulary may, on the other hand, be developed in public by an open community, with the content of the vocabulary being allowed to evolve indefinitely. In any case, a potential user needs to know under what circumstances the vocabulary (or parts of it) may change, and what kinds of change may be expected.

The key concept here is 'stability'. When a potential user chooses a vocabulary, they are making an investment of time/money/effort that depends to a certain extent upon the stability of that vocabulary. Therefore a potential user needs to know exactly how stable a vocabulary is, in order to judge how much to invest. If a vocabulary is less than perfectly stable, the user needs to know exactly what may change, how it may change, and of course to be informed of changes when they do occur.

Therefore, the developers/maintainers of an RDF vocabulary should publish a maintenance policy for that vocabulary. The maintenance policy should articulate whether or not change is allowed, and the way that change is managed.

Egs.
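One way to complement a written maintenance policy is to state the stability of each term within the vocabulary itself. FOAF, for example, annotates its terms with vs:term_status from the status vocabulary at http://www.w3.org/2003/06/sw-vocab-status/ns#, using the values "unstable", "testing", "stable" and "archaic". The sketch below emits such annotations as Turtle for a hypothetical namespace; it illustrates the idea rather than a required format.

```python
# Namespace of the vocabulary-status terms that FOAF uses for vs:term_status.
VS = "http://www.w3.org/2003/06/sw-vocab-status/ns#"
ALLOWED = ("unstable", "testing", "stable", "archaic")


def status_turtle(namespace: str, statuses: dict) -> str:
    """Emit Turtle stating the stability status of each term (local name -> status)."""
    lines = [f"@prefix vs: <{VS}> ."]
    for local_name, status in sorted(statuses.items()):
        if status not in ALLOWED:
            raise ValueError(f"unknown status {status!r} for term {local_name!r}")
        lines.append(f'<{namespace}{local_name}> vs:term_status "{status}" .')
    return "\n".join(lines)
```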

The developers/maintainers should also provide some facility whereby users can be informed of changes as and when they are made.

Egs.
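A simple change-notification facility can be driven by comparing successive snapshots of the vocabulary. The sketch below assumes each snapshot is available as a mapping from local name to definition (a hypothetical representation), classifies terms as added, removed, or changed, and formats a plain-text notice that could be posted to a mailing list or news feed.

```python
def vocabulary_changes(old: dict, new: dict) -> dict:
    """Compare two snapshots (local name -> definition) and classify the differences."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(t for t in old.keys() & new.keys() if old[t] != new[t])
    return {"added": added, "removed": removed, "changed": changed}


def change_notice(version: str, changes: dict) -> str:
    """Format the differences as a plain-text announcement."""
    lines = [f"Changes in version {version}:"]
    for kind in ("added", "removed", "changed"):
        for term in changes[kind]:
            lines.append(f"  {kind}: {term}")
    return "\n".join(lines)
```

Comparing definitions as plain strings only catches textual edits; a fuller implementation would also compare the RDF statements about each term.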

F2F-01-2007:  
Point to some examples of different types of vocabularies 
and their maintenance and persistence policies 

Paragraph discussion (Elisa / Evan / Jishnu / Pete) on metamodel management per OMG experiences
Paragraph discussion (Alistair?) per SKOS experience

For instance... 
 http://www.w3.org/1999/10/nsuri "URIs for W3C Namespaces" 
 discusses namespace maintenance

4. Identify Versions

Where a vocabulary is allowed to change, users developing systems based on that vocabulary may prefer to work to a stationary, rather than moving, target. To support these users, the developers/maintainers of a vocabulary should:

Publish versions of the vocabulary, where a 'version' is a 'snapshot' of the vocabulary at a particular point in time.
Allocate URIs to vocabulary versions, so that they may be referred to.
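A minimal sketch of the two steps above, assuming a hypothetical scheme in which the current vocabulary lives at http://example.org/vocab and each dated snapshot is given its own URI beneath versions/. The Turtle header it emits uses owl:versionInfo and owl:priorVersion from OWL, and dcterms:isVersionOf from the Dublin Core terms, to tie each snapshot to the vocabulary and to its predecessor.

```python
from datetime import date
from typing import Optional

# Hypothetical URI for the current ("latest") version of the vocabulary.
VOCAB = "http://example.org/vocab"


def version_uri(snapshot: date) -> str:
    """Allocate a date-based URI to a snapshot (hypothetical pattern)."""
    return f"{VOCAB}/versions/{snapshot.isoformat()}"


def snapshot_header(snapshot: date, previous: Optional[date] = None) -> str:
    """Emit a Turtle header describing one snapshot of the vocabulary."""
    statements = [
        "a owl:Ontology",
        f'owl:versionInfo "{snapshot.isoformat()}"',
        f"dcterms:isVersionOf <{VOCAB}>",
    ]
    if previous is not None:
        statements.append(f"owl:priorVersion <{version_uri(previous)}>")
    return "\n".join([
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{version_uri(snapshot)}>",
        "    " + " ;\n    ".join(statements) + " .",
    ])
```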

Where the resources that are the members of a vocabulary may evolve independently, or be at differing levels of stability, the developers/maintainers may also wish to allocate URIs to historical versions of a particular resource.
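Per-resource history might follow a similar pattern. The sketch below assumes a hypothetical slash namespace and a /vN suffix for the N-th version of a term, and links each version to the term with dcterms:isVersionOf and to its predecessor with dcterms:replaces.

```python
# Hypothetical slash namespace, so that each term has its own path segment.
NAMESPACE = "http://example.org/vocab/"


def term_version_uri(local_name: str, n: int) -> str:
    """URI for the n-th historical version of a term (hypothetical pattern)."""
    return f"{NAMESPACE}{local_name}/v{n}"


def term_history_turtle(local_name: str, versions: int) -> str:
    """Link each historical version of a term to the term and to its predecessor."""
    lines = ["@prefix dcterms: <http://purl.org/dc/terms/> ."]
    for n in range(1, versions + 1):
        uri = term_version_uri(local_name, n)
        lines.append(f"<{uri}> dcterms:isVersionOf <{NAMESPACE}{local_name}> .")
        if n > 1:
            lines.append(
                f"<{uri}> dcterms:replaces <{term_version_uri(local_name, n - 1)}> ."
            )
    return "\n".join(lines)
```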

Egs.

F2F-01-2007:  
Discuss and display various methods of versioning 
and maintenance policies with reference implementations

Tim and Alistair have suggested micro-ontologies (Alistair: <http://purl.org/net/d4>) that might be used to publish the relationships between versions using OWL.

For instance... 
http://www.w3.org/2006/07/SWD/wiki/BestPracticeRecipesIssues/ServingSnapshots
http://dublincore.org/usage/terms/history/
Knowledge Web
http://ontology.buffalo.edu/bfo/Versioning.pdf
http://www3.lehigh.edu/images/userImages/jgs2/Page_3813/LU-CSE-06-026.pdf

Note that version management for RDF vocabularies, and in particular for OWL ontologies, is an ongoing research problem, and no single approach is likely to be appropriate in all situations. (Provide background on configuration management for ontologies, e.g., Jeff Heflin's work cited above, the Protege approach, issues raised for DARPA/REAL, etc. Also need examples covering ontology versioning with dependencies / broken reasoning to make the point. Vit/Siggi point out that reasoning with change management in mind may involve additional metadata encoded in the vocabulary, as well as documentation indicating how such metadata supports doing so; a decent discussion of this with pointers to reference implementations might be useful.)

Might also provide an example using named graphs -- check with Jeremy for decent examples

From Vit/Siggi, we should also discuss issues related to "human-readable" configuration / version management, as for some vocabularies, human usability is as important as machine-readability.

5. Publish a Formal Schema

An RDF description of an RDF vocabulary should be published. Potential users should be clearly informed as to which is the 'authoritative' RDF description of an RDF vocabulary.

Where the resources that are the members of an RDF vocabulary are denoted by HTTP URIs, an HTTP GET request against such a URI with the header 'Accept: application/rdf+xml' should return an RDF/XML serialisation of an RDF graph that includes a description of the denoted resource.
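On the server side, meeting this requirement is a matter of content negotiation. The Python sketch below chooses between an RDF/XML and an HTML representation from the Accept header, using the q-value of the most specific matching media range. The two offered media types and the tie-breaking order are assumptions, and a production server would normally rely on its web framework's own negotiation support.

```python
from typing import Optional

# Representations the server can offer; earlier entries win ties (an assumption).
AVAILABLE = ("application/rdf+xml", "text/html")


def parse_accept(header: str) -> list:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_range:
            ranges.append((media_range, q))
    return ranges


def quality(offered: str, ranges: list) -> float:
    """q-value for `offered`, taken from the most specific matching media range."""
    main_type = offered.split("/")[0]
    best_specificity, q = -1, 0.0
    for media_range, range_q in ranges:
        if media_range == offered:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, q = specificity, range_q
    return q


def choose_representation(accept_header: Optional[str]) -> Optional[str]:
    """Return the media type to serve, or None if nothing offered is acceptable."""
    ranges = parse_accept(accept_header or "*/*")
    best, best_q = None, 0.0
    for offered in AVAILABLE:
        q = quality(offered, ranges)
        if q > best_q:
            best, best_q = offered, q
    return best
```

Listing application/rdf+xml first means a client that sends only `*/*` (or no Accept header) receives RDF/XML, while browsers, which normally list text/html explicitly with a higher q-value, still receive HTML. A `None` result would typically be mapped to a 406 Not Acceptable response.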

F2F-01-2007:  
we should say that it is best practice to publish an RDF/OWL document at the namespace URI. 
This may be obvious to us but evidently it's not obvious to everyone. 
Point back to Recipes document for "... and here's how"

See also: