Why we’re launching the Dataset Exchange WG

I’m delighted that the Dataset Exchange Working Group is beginning its work this week to pursue two distinct strands:

  • updating the Data Catalog vocabulary (DCAT);
  • providing a precise definition of an application profile and setting out how clients and servers can use them in content negotiation.

Within that context, the WG is also likely to look at how to create and share linksets (a.k.a. mappings) between vocabulary terms.

The DCAT vocabulary has been widely adopted but it’s clear from multiple instances and related work that DCAT lacks important features that need to be formally added. There’s also a question about exactly what its scope is. Metadata serves three primary functions:

  1. Discovery
  2. Assessment (am I allowed to use this data? Is it of sufficient quality for my purpose? What is its provenance?)
  3. Structure

These broadly map to the research data world’s FAIR Principles (Findable, Accessible, Interoperable, Reusable). How far down that list should a general-purpose dataset description vocabulary go? What’s the appropriate use of schema.org compared with Dublin Core?

The concept of profiles is not new, nor is the idea that a client might use an HTTP Accept header to express a preference for, say, Turtle over JSON. What is not yet fully supported is a more fine-grained request for, say, JSON according to schema X, or Turtle following profile Y. The new W3C WG will track the closely related work at the IETF on defining a new header, and will document how that new method should be used where possible and what the fallback options are.
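To make that a little more concrete, here is a rough sketch of how a server might pick a representation when a client states both a media type and a profile preference. It is only an illustration under assumptions: the header name Accept-Profile is taken to be what the IETF work will define (the final name and syntax may differ), the example.org profile URIs and the AVAILABLE catalogue are hypothetical, and the parsing is deliberately simplified (no wildcards, and no commas or semicolons inside URIs).

```python
# Hypothetical catalogue of representations the server can produce:
# (media type, profile URI or None for "no particular profile").
AVAILABLE = [
    ("text/turtle", None),
    ("text/turtle", "http://example.org/profile/Y"),
    ("application/json", "http://example.org/schema/X"),
]


def parse_weighted(header):
    """Parse 'a;q=0.8, b' into [(value, q), ...], highest q first.

    Simplified: ignores parameters other than q and does not handle wildcards.
    """
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        value, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Profile URIs are assumed to be written as <uri>; strip the brackets.
        items.append((value.strip("<>"), q))
    # sorted() is stable, so equal weights keep the order the client gave.
    return sorted(items, key=lambda item: -item[1])


def choose(accept, accept_profile=None, available=AVAILABLE):
    """Return the (media type, profile) to serve, or None if nothing matches."""
    media_prefs = [(m, q) for m, q in parse_weighted(accept) if q > 0]
    profile_prefs = []
    if accept_profile:
        profile_prefs = [(p, q) for p, q in parse_weighted(accept_profile) if q > 0]

    # First choice: a representation matching both profile and media type.
    for profile, _ in profile_prefs:
        for media, _ in media_prefs:
            if (media, profile) in available:
                return media, profile

    # Fallback: plain media type negotiation, first match in catalogue order.
    for media, _ in media_prefs:
        for candidate in available:
            if candidate[0] == media:
                return candidate

    return None
```

A client asking for Turtle following profile Y would call something like choose("text/turtle, application/json;q=0.5", "<http://example.org/profile/Y>"). If the requested profile is not available in any acceptable media type, the sketch falls back to ordinary Accept handling rather than refusing the request, which is one possible fallback policy among several.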

The WG has been formed in response to the SDSVoc workshop held at the end of last year. Supported by the EU-funded VRE4EIC project, both the workshop and the WG successfully bring together two distinct communities: government data and scientific research data. This is reflected in the WG’s chairs. Caroline Burle from NIC.br is a key figure in open government data throughout South America, and Karen Coyle of the Dublin Core Metadata Initiative has unparalleled experience in library metadata, including application profiles. Taken together with related work on general and spatial data on the Web best practices, ODRL, tabular metadata and more, the WG’s completed work should bring us nearer the time when data can be found, assessed and reused with minimal human intervention.