Only a few years ago, governments began to open their data. Portals like data.gov and its Dutch equivalent, data.overheid.nl, are just two examples of the many similar services that have been established in jurisdictions around the world to make public sector data available directly on the Web. Often this is in the form of CSV files, but at least some is available as linked data. As strong advocates of the publishing of data in this format, W3C set up the Government Linked Data Working Group in 2011. Its charter is divided into two parts.
First, it is defining a set of Best Practices in this area: guidelines designed to make it easier for the public sector to make more of its data available as 5-star linked data. The document is evolving and, like most W3C standards, the editor’s draft is publicly visible.
Secondly, the GLD WG is chartered to standardize or, where necessary, create vocabularies that help to increase interoperability of data sets. The choice of vocabulary terms can make all the difference between data sets being interoperable and impenetrable. If you are describing a creative work, the chances are you’ll use Dublin Core’s terms for things like title and creator. But where are the equivalent stable vocabularies for public sector data?
Several exist and are in use, but governments generally require a level of reassurance on the quality and stability of vocabularies that only standards bodies can provide. It is for this reason that the Organization Ontology and Data Cube Vocabulary — both in use by the government that commissioned Epimorphics to create them (UK) — are being put through the W3C Recommendations Track. Joining these two is DCAT, the Data Catalog vocabulary, in widespread use amongst existing data portals (it’s used in CKAN for example).
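To make the idea of shared vocabulary terms concrete, here is a minimal sketch of how a portal might describe one data set and its CSV download using DCAT and Dublin Core terms, serialized as N-Triples. It uses only the Python standard library. The `example.org` IRIs, the `/csv` naming pattern and the helper function names are assumptions made for illustration, and since DCAT is still on the Recommendations Track the terms shown may change.

```python
# Namespace IRIs for the two vocabularies used below.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_dataset(dataset_iri, title, download_url):
    """Return N-Triples lines for one data set with one distribution."""
    # Hypothetical naming pattern for the distribution's IRI.
    dist_iri = dataset_iri + "/csv"
    triples = [
        (iri(dataset_iri), iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (iri(dataset_iri), iri(DCT + "title"), literal(title)),
        (iri(dataset_iri), iri(DCAT + "distribution"), iri(dist_iri)),
        (iri(dist_iri), iri(RDF_TYPE), iri(DCAT + "Distribution")),
        (iri(dist_iri), iri(DCAT + "downloadURL"), iri(download_url)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Illustrative values only; none of these resources exist.
lines = describe_dataset(
    "http://example.org/dataset/budget-2012",
    "Budget 2012",
    "http://example.org/files/budget-2012.csv",
)
print("\n".join(lines))
```

Because the title is expressed with the same Dublin Core term that other publishers use, a consumer can look for titles across catalogs without knowing each portal’s internal field names.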
Alongside this work, in September last year W3C began working with PricewaterhouseCoopers on the development of a number of vocabularies on behalf of the European Commission under its ISA Programme (see promotional video). Work on three ‘Core Vocabularies’ is now close to completion:
- people — a vocabulary for describing a natural person (things like name, family name, date and place of birth etc.);
- business — a vocabulary for describing legal entities as found in company registers;
- location — geographical locations and addresses, noting the importance of interoperability with the EU’s INSPIRE Directive on spatial information.
These vocabularies are being developed to a good degree of maturity by working groups comprising EU Member State representatives and others.
But in this field, as in so many others, we quickly run into the problem of discoverability. Not just discovering what vocabularies are available, but which ones are being used by others. What code lists are available and being used? What standards should I choose? To help solve this, a further important vocabulary is being developed. Called the Asset Description Metadata Standard (ADMS), it provides terms for describing so-called Semantic Assets: the infrastructure elements, such as vocabularies and code lists, needed to make data interoperable. It’s designed for use in portals that serve public sector data publishers, such as Denmark’s Digitaliser, vocab.data.gov and the EU’s own Semantic Asset portal, Joinup, which launched recently.
Following agreement between the W3C and the European Commission, and the chairs and members of the working group itself, ADMS and the three ISA Programme Core Vocabularies will join DCAT, the Organization Ontology and Data Cube Vocabulary as Government Linked Data WG work items. Furthermore, we expect to enrich our own RDF description of W3C TR space using ADMS in the near future.
This does not mean that W3C is rubber-stamping any of this work. On the contrary, the “completed” vocabularies will be published by the GLD WG as First Public Working Drafts and, like any other draft, they may change through working group agreement, public comment and/or implementation experience. Furthermore, all four of these ISA Programme outputs are within the scope of the existing WG’s charter. In other words, the working group can achieve more of what W3C has already asked it to do by taking the ISA Programme’s work as input to its own.
What really makes this such a positive development in my mind, though, is that the combined efforts of a variety of public administrations look as if they’re coming together to create stable, well-tested infrastructure elements that will maximize the interoperability of open data sets. It’s no more than another step along the road but, I hope, an important one.
A final thought. With so much effort going into publishing data and making it interoperable, what are people doing with it? That’s the subject of a workshop we’re running as part of another project, Crossover, in June: Using Open Data: policy modeling, citizen empowerment, data journalism.