Conformance for Vocabularies?

Yesterday's Government Linked Data WG meeting was dedicated to one of the vocabularies we're working on: DCAT. We were joined on the call by Rufus Pollock of the Open Knowledge Foundation, whose CKAN platform is a critical use case for the vocabulary.

As DCAT is on the Recommendation Track, it should include a conformance statement. W3C's Quality Assurance Framework has more detail on this but the basic idea is simple enough: it should be possible to verify that an implementation adheres to the specification. For specs like XML, HTML and CSS, that's a straightforward concept, and Working Groups routinely produce test suites against which implementations can be tested: if the input is A the output should be B or the behavior should be C.

But vocabulary terms don't fit this paradigm.

Let's stick with DCAT as the example. The aim is to provide a set of terms for describing datasets published in a catalog, but — and here's the killer — the use of all terms is optional. The vocabulary includes terms like creation date and the date of last modification along with things like title, publisher and license, because the working group believes them to be useful information about a dataset. OK — but what if a particular set of metadata omits, say, the creation date? Does that make the description non-conformant? No. At least, not in terms of the spec.

Individual implementations, like CKAN, are free to state that provision of a particular metadata term is mandatory for datasets on their platform, and/or that values for properties must be selected from pre-defined lists, but that's application-specific. What we really mean by conformance to DCAT is: "if you provide the creation date for a dataset, this is the property to use; if you provide the title of the dataset, this is the property to use" and so on. It also means "don't invent your own terms for anything listed here." Hmm… that's rather more woolly than "given input A the output should be B."
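To make that a little more concrete, here is a rough Python sketch of what such a check might look like. Everything in it is an assumption made for illustration: the `DCAT_TERMS` table is a small hand-picked subset rather than a copy of the spec, the `ex:` properties are made up, and `custom_term_concepts` is a hypothetical input in which someone has already decided, by hand, which concept each non-DCAT term expresses.

```python
# Illustrative subset only: concept -> the property assumed to cover it.
# This table is an assumption for the sketch, not a copy of the DCAT spec.
DCAT_TERMS = {
    "title": "dcterms:title",
    "creation date": "dcterms:issued",
    "last modified": "dcterms:modified",
    "publisher": "dcterms:publisher",
    "spatial coverage": "dcterms:spatial",
}


def conformance_problems(description, custom_term_concepts):
    """Return one message per invented term that duplicates a listed term.

    description: dict of property name -> value for a single dataset.
    custom_term_concepts: dict of non-DCAT property name -> the concept it
        expresses (hypothetical; a person has to make this judgement).
    """
    problems = []
    for prop in description:
        concept = custom_term_concepts.get(prop)
        # Only flag terms whose concept is already covered by the vocabulary.
        if concept in DCAT_TERMS:
            problems.append(
                f"{prop} duplicates {DCAT_TERMS[concept]} ({concept})"
            )
    return problems


# Omitting optional terms is never a problem: this sparse description passes.
sparse = {"dcterms:title": "Bus stops"}
print(conformance_problems(sparse, {}))

# An invented term for the title is flagged; ex:budgetCode covers a concept
# the vocabulary does not list, so it is left alone.
invented = {"ex:name": "Bus stops", "ex:budgetCode": "T-17"}
print(conformance_problems(invented, {"ex:name": "title"}))
```

Note that the function never complains about a missing property, only about a reinvented one. The hard part is hidden in `custom_term_concepts`: deciding which concept a custom term expresses is a human judgement, which is exactly why this is so much woollier than an input/output test.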

One of the terms in DCAT is dcterms:spatial. Where a dataset applies to a specific area, this is the term to use to provide that information. But suppose the dataset publisher is a public administration in France. Is it wrong to define a new term, ex:commune, and use that instead of dcterms:spatial? After all, the concept of a commune in France is well established and has a specific meaning.

Vocabularies like Dublin Core and FOAF do not include conformance statements at all. Should we even try to include one for DCAT and the others we're working on in the GLD WG? The gut instinct is that yes, we should. The conformance statement should encourage the use of terms within the vocabulary for properties and classes that are exactly or approximately covered. But I admit that wording that conformance statement is going to be tricky and that providing a test suite seems, well, unlikely.
