W3C's TR Space Described Using ADMS

As part of its work with the European Commission's ISA Programme, W3C has published (and will maintain) an RDF description of its Technical Reports using the ADMS data model. The RDF is available at http://www.w3.org/2012/06/tr2adms/adms.

The aim is to describe documents published under the formal W3C process as 'Semantic Assets,' defined by ADMS as:

highly reusable metadata (e.g. xml schemata, generic data models) and reference data (e.g. code lists, taxonomies, dictionaries, vocabularies) which are used for eGovernment system development.

The full ADMS data model is shown in the detailed diagram below. The basic Repository-Asset-Distribution structure is common to several vocabularies of this general type; the RADion vocabulary encapsulates it and provides a substrate for ADMS. Note that ADMS places cardinality constraints on several properties, although many properties are optional.

UML Diagram of the ADMS data model

To aid discussion of how we have implemented this, we'll refer to the following example (written in Turtle):

1  @prefix : <http://www.w3.org/ns/adms#> .
2  @prefix cat: <http://www.w3.org/2012/05/cat#> .
3  @prefix data: <http://www.w3.org/data#> .
4  @prefix radion: <http://www.w3.org/ns/radion#> .
5  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
6  @prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .
7  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
8  @prefix dcterms: <http://purl.org/dc/terms/> .

9  <http://www.w3.org/TR/2009/REC-skos-reference-20090818/>  a :SemanticAsset, :SemanticAssetDistribution;
10   dcterms:description """This document defines the Simple Knowledge Organization System (SKOS), a common data model for sharing and linking knowledge organization systems via the Web.""";
11   dcterms:format <http://mediatypes.appspot.com/text/html>;
12   dcterms:issued "2009-08-18";
13   dcterms:license cat:DocLicense;
14   dcterms:publisher data:W3C;
15   dcterms:title "SKOS Simple Knowledge Organization System Reference";
16   dcterms:type <http://purl.org/adms/assettype/InformationExchangePackageDescription>;
17   :accessURL <http://www.w3.org/TR/2009/REC-skos-reference-20090818/>;
18   :status <http://purl.org/adms/status/Completed>;
19   xhv:last <http://www.w3.org/TR/skos-reference>;
20   xhv:previous <http://www.w3.org/TR/2009/PR-skos-reference-20090615/>;
21   rdfs:label "SKOS Simple Knowledge Organization System Reference";
22   radion:distribution <http://www.w3.org/TR/2009/REC-skos-reference-20090818/> .

According to ADMS, each W3C document is a Semantic Asset (and therefore also a radion:Asset) and the data model makes clear that the following properties are required:

name
The RDF schema for ADMS uses rdfs:label to provide an asset's name although, in the case of documents, the dcterms:title property feels more natural, so we provide both (lines 15 and 21).
description
The description of the asset can be considered as the abstract of the W3C document and that is what is provided where it's available. Not every document in TR space has an abstract, but many do.
publisher
In line 14 we link the asset to a long-published piece of data that describes W3C.
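As a rough illustration of the three required properties just listed, the following Python sketch checks a description for them. The dictionary layout, the function name missing_asset_properties and the abbreviated values are assumptions made for this example; they are not part of ADMS or of the published data.

```python
# Required asset properties as listed in the text above. This is an
# assumption drawn from this post, not a full encoding of the ADMS
# cardinality constraints.
REQUIRED_ASSET_PROPERTIES = {
    "rdfs:label": "name",
    "dcterms:description": "description",
    "dcterms:publisher": "publisher",
}


def missing_asset_properties(description):
    """Return the required properties absent from a property -> values mapping."""
    return sorted(
        prop for prop in REQUIRED_ASSET_PROPERTIES
        if not description.get(prop)
    )


# Hypothetical in-memory form of the SKOS example (values abbreviated).
skos = {
    "rdfs:label": ["SKOS Simple Knowledge Organization System Reference"],
    "dcterms:title": ["SKOS Simple Knowledge Organization System Reference"],
    "dcterms:description": ["This document defines the Simple Knowledge Organization System (SKOS), ..."],
    "dcterms:publisher": ["data:W3C"],
}

print(missing_asset_properties(skos))  # []

# A document without an abstract would be flagged for its description.
no_abstract = {k: v for k, v in skos.items() if k != "dcterms:description"}
print(missing_asset_properties(no_abstract))  # ['dcterms:description']
```

A check like this would only report gaps; what to do about a document with no abstract remains a publishing decision.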

All our Assets are also Distributions, so for each document we assert two types in line 9 (adms:SemanticAsset and adms:SemanticAssetDistribution, written with the default prefix in the example). Although not strictly necessary, we also assert the RADion triple that links an asset to its distribution (line 22). Queries run against the data are likely to look for the radion:distribution property, and omitting it could lead to false negatives.
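To make the false-negative risk concrete, here is a minimal Python sketch that treats the data as a plain set of (subject, predicate, object) tuples. The helper distributions_of is hypothetical and stands in for whatever query a consumer might run; only the URIs come from the example above.

```python
RADION = "http://www.w3.org/ns/radion#"
DOC = "http://www.w3.org/TR/2009/REC-skos-reference-20090818/"


def distributions_of(triples, asset):
    """Return the objects of radion:distribution triples whose subject is asset."""
    return [o for s, p, o in triples if s == asset and p == RADION + "distribution"]


# With the explicit triple (line 22 of the example), the lookup succeeds.
with_link = {(DOC, RADION + "distribution", DOC)}

# Without it, the same lookup finds nothing, even though the document
# is typed as a distribution elsewhere in the data.
without_link = set()

print(distributions_of(with_link, DOC))     # ['http://www.w3.org/TR/2009/REC-skos-reference-20090818/']
print(distributions_of(without_link, DOC))  # []
```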

The cardinality constraints on a SemanticAssetDistribution mean that we need to add further triples.

access URL
Some catalogues make a distinction between the identifier for an asset and the URL from where it can be obtained. Although this does not apply to W3C, where all identifiers are URIs from which the asset can be accessed directly, we need to include the adms:accessURL property for conformance with the ADMS model (line 17).
format
The format of the Asset must be given, which for all documents in TR space is text/html. ADMS recommends using Ed Summers' work at http://mediatypes.appspot.com for this (line 11). We do not need to use the adms:representationTechnique property, which is useful for providing a finer-grained description of a document format than is possible through MIME types (such as "Word 6.0" cf. "Word 97", both of which have the MIME type application/msword).
license
Distributions are required to declare a license. For this we refer to the W3C document license in line 13. This hadn't been described in RDF before, so we created a description quickly (one that no doubt could be improved). It simply says:
    cat:DocLicense a dcterms:LicenseDocument;
         dcterms:type <http://purl.org/adms/licencetype/NoDerivativeWork>;
         rdfs:label "W3C Document License"@en;
         foaf:isPrimaryTopicOf <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231> .
This creates a resource for the license, typed as dcterms:LicenseDocument and given the label "W3C Document License" (in English), and declares that it is the primary topic of the document at http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231 (which is in fact the license itself). We also had to choose a license type from the list provided in the ADMS spec; No Derivative Work is the appropriate one for W3C documents.
asset type
On line 16 we can see that ADMS uses dcterms:type to point to the asset type and provides a controlled vocabulary of possible values. Of these, "Information Exchange Package Description" is less than ideal but is the nearest acceptable value. Several of the controlled vocabularies defined within ADMS are encoded as SKOS concept schemes that have been given purl.org URLs. These currently dereference to a file at https://joinup.ec.europa.eu/svn/adms/ADMS_v1.00/ADMS_SKOS_v1.00.rdf which is seen as a temporary location (the PURLs themselves will remain stable).
status
In a similar vein to the asset type, ADMS provides a controlled vocabulary for the status of assets. The values are:
  • Completed
  • Under development
  • Deprecated
  • Withdrawn

The W3C process is such that documents fall into one of the first three of these; Withdrawn is not used. The simplest mappings are Recommendations, which map to Completed, and Superseded and Retired documents, which both map to Deprecated. The remainder must be classed as Under development. This is easy to understand for Working Drafts but we also have 'Notes.' These are documents produced by Working Groups, Interest Groups etc. that have no formal standing. They can be made obsolete at any time and so must count as 'Under development' even if the relevant working group was disbanded long ago. For this reason, and because the data is generated by algorithm, it is possible for a single document to be both Under development and Deprecated. The example shows a Recommendation, which is linked to the Completed status on line 18.
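The mapping just described can be sketched in a few lines of Python. This is an illustration only, not the algorithm that generates the published data: the idea that a document carries a set of status labels, the function name adms_statuses, and the status URIs other than Completed (which appears on line 18) are all assumptions.

```python
ADMS_STATUS = "http://purl.org/adms/status/"

# Only the Completed URI appears in the example; the other two are
# assumed to follow the same pattern.
COMPLETED = ADMS_STATUS + "Completed"
DEPRECATED = ADMS_STATUS + "Deprecated"
UNDER_DEVELOPMENT = ADMS_STATUS + "UnderDevelopment"


def adms_statuses(labels):
    """Map a document's W3C status labels to a set of ADMS status URIs."""
    statuses = set()
    for label in labels:
        if label == "Recommendation":
            statuses.add(COMPLETED)
        elif label in ("Superseded", "Retired"):
            statuses.add(DEPRECATED)
        else:
            # Working Drafts, Notes and everything else.
            statuses.add(UNDER_DEVELOPMENT)
    return statuses


print(adms_statuses({"Recommendation"}) == {COMPLETED})                        # True
print(adms_statuses({"Note", "Retired"}) == {UNDER_DEVELOPMENT, DEPRECATED})   # True
```

Because each label is mapped independently, a document that carries two labels (a hypothetical retired Note, say) ends up with two statuses, which is the overlap mentioned above.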

Finally, we add in some optional properties for which we have data readily available.

previous version & latest version
ADMS uses the XHTML vocabulary to record an Asset's previous, next and latest (last) version. Every document in TR space points to its immediate predecessor (if there is one) and to a latest version. The latter is the short URI that always resolves to the most recent version, which also has a stable dated URI of its own. In the example, we can see that the version immediately prior to http://www.w3.org/TR/2009/REC-skos-reference-20090818/ was http://www.w3.org/TR/2009/PR-skos-reference-20090615/ (line 20) and that the latest version can be accessed at http://www.w3.org/TR/skos-reference (line 19). There is no way to tell from this data whether or not the version being described is the latest version (that's an artefact of the way W3C organizes its documents) but we can be certain that dereferencing http://www.w3.org/TR/skos-reference will give us the latest version.
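The relationship between a dated URI and its short "latest version" URI can be illustrated with a small Python sketch. The URI pattern below is inferred from the two dated URIs in the example and is an assumption, as is the function name parse_tr_uri; the published data records these links explicitly rather than deriving them this way.

```python
import re

# Assumed shape of a dated TR URI: /TR/<year>/<STATUS>-<shortname>-<YYYYMMDD>/
DATED_TR_URI = re.compile(
    r"^http://www\.w3\.org/TR/(?P<year>\d{4})/"
    r"(?P<status>[A-Za-z]+)-(?P<shortname>.+)-(?P<date>\d{8})/?$"
)


def parse_tr_uri(uri):
    """Split a dated TR URI into its parts and derive the short URI."""
    match = DATED_TR_URI.match(uri)
    if match is None:
        raise ValueError("not a dated TR URI: " + uri)
    date = match.group("date")
    return {
        "status": match.group("status"),
        "shortname": match.group("shortname"),
        "issued": f"{date[:4]}-{date[4:6]}-{date[6:]}",
        "latest": "http://www.w3.org/TR/" + match.group("shortname"),
    }


rec = parse_tr_uri("http://www.w3.org/TR/2009/REC-skos-reference-20090818/")
print(rec["latest"])  # http://www.w3.org/TR/skos-reference
print(rec["issued"])  # 2009-08-18
```

Note that nothing in the dated URI says whether it is the most recent version; only dereferencing the short URI answers that.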

And that's it, for now. We publish other data on w3.org, notably about our translations, and there is data available that should allow us to link the majority of TR space documents to information about the working groups that produced them and their subject matter (such as HTML, CSS, SKOS etc.). Those are details we hope to add in the near future but for now, the data at http://www.w3.org/2012/06/tr2adms/adms is, we feel, a good start.