W3C

Implementing ADMS and the Core Vocabularies

Since last September, W3C has been working with PricewaterhouseCoopers on a contract from the European Commission’s ISA Programme to create a number of vocabularies for use in eGovernment. I described these in a post on this blog back in March, when initial work on the vocabularies was nearing completion. Theory is all well and good, but it’s implementations that count, which is why taking any standard to W3C Recommendation status requires at least two interoperable, independent implementations. So what’s the implementation status of ADMS and the Person, Business and Location Core Vocabularies?

All four are being taken as inputs to the W3C’s Government Linked Data Working Group as they fit that group’s charter. The Person Core Vocabulary will be taken as input to the Terms for Describing People. A lot of work has been going on in recent years concerning geospatial information in linked data. This has led to the creation of GeoSPARQL by the Open Geospatial Consortium and the NeoGeo vocabulary that came out of a series of ‘geovocab’ events. There are differences between these two approaches and both have their supporters. Into this mix comes the ISA Programme’s Location Core Vocabulary, developed with substantial help from the European Commission’s experts on INSPIRE. At the time of writing, discussions are under way to find the best way forward: one that ensures community buy-in and conformance with INSPIRE, and avoids pointless duplication of effort.

The Business Core Vocabulary is effectively a specialization of the Organisation Ontology. It provides a vocabulary for the very specific purpose of publishing company register information in an interoperable manner. An early implementation of this can be seen online now at Open Corporates (Chris Taggart was a member of the working group that developed the vocabulary). See, for example, this description of UK wiro-binding company Apple Binding, the data for which is available in multiple representations including RDF published using the Business Core Vocabulary.

ADMS – the Asset Description Metadata Schema – is designed to be used to describe code lists, taxonomies, specifications and standards. So it seems only right that we should attempt to describe W3C’s own standards using it. All the documents developed by W3C Working Groups, including drafts, Notes and, of course, Recommendations, are published under the Technical Reports section of w3.org. We provide two data services that describe TR space: an Atom feed and an RDF description. The latter has been published and maintained since January 2002, which was very much early days in the RDF world. Thus the vocabularies used to describe TR space are perhaps showing signs of age. It’s notable, for example, that the feed uses the old Dublin Core ‘elements’ namespace rather than the more modern dcterms namespace.

So, let’s see if we can use ADMS to provide a more modern description of TR space. The systems are already in place to create the basic RDF description. We use N3 Rules, processed through Cwm, to generate the existing file, based on a number of static files. Older source files were generated by hand but the process of extracting the relevant information from newly published documents is more or less fully automated now.

Taking http://www.w3.org/2002/01/tr-automation/tr.rdf as a starting point, we need to map the various publication types (Working Draft, Candidate Recommendation etc.) to the short list of statuses that ADMS requires us to use: Completed, Under development, Deprecated, Withdrawn. Further, we want to declare that each one is a Semantic Asset (within the definition provided by ADMS). To this we add things like the date of creation (given in tr.rdf as dc:date and re-cast in the ADMS description as dcterms:created) and the previous and latest versions of documents (for which ADMS uses the previous and last properties from the XHTML vocabulary). Then we add in a set of triples that apply to all documents in TR space – the license document, publisher, format etc.
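To make the mapping step a little more concrete, here is a minimal sketch in Python of how a record taken from tr.rdf might be re-cast. It is not the N3 rules we actually run through Cwm. The particular assignment of publication types to ADMS statuses, the plain dictionary used to stand in for a tr.rdf entry, and the helper names `to_adms_status` and `recast_record` are all assumptions made for illustration; only the four status values and the dc:date to dcterms:created re-casting come from the description above.

```python
# The four statuses ADMS allows, as listed in the text.
ADMS_STATUSES = {"Completed", "Under development", "Deprecated", "Withdrawn"}

# Hypothetical mapping from W3C publication types to ADMS statuses.
# These assignments are illustrative guesses, not the rules used for TR space.
ADMS_STATUS_BY_PUBLICATION_TYPE = {
    "Working Draft": "Under development",
    "Candidate Recommendation": "Under development",
    "Proposed Recommendation": "Under development",
    "Recommendation": "Completed",
}


def to_adms_status(publication_type: str) -> str:
    """Return the ADMS status for a publication type, or raise ValueError."""
    try:
        return ADMS_STATUS_BY_PUBLICATION_TYPE[publication_type]
    except KeyError:
        raise ValueError(f"No ADMS status mapped for {publication_type!r}") from None


def recast_record(record: dict) -> dict:
    """Re-cast a simplified tr.rdf-style record as ADMS-style properties.

    `record` is a stand-in for one document description, with a "type" key
    holding the publication type and a "dc:date" key holding the date.
    """
    return {
        # Class and property names are written as prefixed strings for
        # readability; they are assumed here, not checked against the schema.
        "rdf:type": "adms:SemanticAsset",
        "adms:status": to_adms_status(record["type"]),
        # dc:date in tr.rdf is re-cast as dcterms:created.
        "dcterms:created": record["dc:date"],
    }


# Every mapped value must be one of the four permitted statuses.
assert set(ADMS_STATUS_BY_PUBLICATION_TYPE.values()) <= ADMS_STATUSES
```

Calling `recast_record({"type": "Recommendation", "dc:date": "2004-02-10"})` with this made-up record would yield a dictionary whose status is Completed and whose creation date is carried over unchanged. An unmapped publication type raises an error rather than silently defaulting, which seems the safer choice when the target vocabulary only permits a fixed list of values.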

One piece of data required by ADMS but not provided in tr.rdf is a description of the asset. To fulfil that requirement, we take data from another source in the W3C system to obtain abstracts for each document (where present) and provide that as a description. The end result is a static file that describes everything in TR space using ADMS, available in both RDF/XML and Turtle serializations. All that’s left for us to do now is to add the generation of those files to our publication process to ensure that they are always up to date. I’m also hoping to add in data about the translations of each document and more information about which domain each document is from, the working group that created it and so on. That shouldn’t be too hard – take a look at the source code of our translations page and you’ll see that we already publish a lot of data about TR space!

It’s worth highlighting that for W3C this was an easy thing to do since we had most of the pieces in place already in the form of auto-generated static files. Thomas Roessler and I just had to look at what was there and put the pieces together. That’s the beauty of building a system with long-term maintenance in mind.

We’re not alone in generating ADMS-conformant data feeds. Several of our fellow standards bodies and others with ‘Semantic Assets’ are doing the same, as will become apparent in the near future through the European Commission’s Joinup platform. The good news is that anyone can consume this data, and other data feeds like it, to build a portal or service that provides information on code lists, taxonomies, standards and more, making it easier for publishers to find and use the most appropriate resources to help build a more interoperable datasphere.

For more information on how we interpreted the ADMS model to describe TR space, see W3C’s TR Space Described Using ADMS.