Dataset Exchange Working Group update
Presenter: Peter Winstanley
Duration: 6 min
This video describes the background and ongoing work of the Dataset Exchange Working Group (DXWG), developing the 3rd version of the Data Catalog Vocabulary (DCAT), a pillar of publishing datasets on the Web.
This presentation is made by the Dataset Exchange Working Group, a W3C group chartered to maintain and develop the Data Catalog Vocabulary, more popularly known by its initials “DCAT”, and also to develop a recommendation on content negotiation by profile.
This presentation will be about the current activities of the working group concerning DCAT and will indicate how you can find out more and get involved.
Whenever we have complex tasks we organize our materials such that we know what we have, and where we can find it.
DCAT is fundamentally about this aspect of work.
Whilst currently focusing on data assets, DCAT in its current form provides a generalizable way to create an asset catalogue.
It is meant to be an interoperability standard.
People and organisations can use DCAT as the foundation of their catalogues to ensure that the catalogue contents can be published on the web in a re-usable form.
The history of DCAT starts in the mid-noughties and pre-dates the involvement of W3C and this working group.
There was always a close association between DCAT and the specific task of providing a standard way to share catalogs of open data on the web.
It is in the same lineage as the FAIR Principles that are being used in many domains, including academe and government, to drive interoperability and information re-use.
DCAT focuses on the “F”, making information resources “Findable”, and the “R”, making resources “Reusable”.
The first version of DCAT, published by W3C in 2014, has at its foundation the following core concepts – firstly a catalogue.
This is a collection of asset identifiers published under a governance mechanism by a single organization.
In the first version, the only asset type considered was that of a dataset.
From the beginning, DCAT was designed recognizing that datasets are, to some extent, abstract notions and that for most practical purposes people interacted with datasets through distributions.
The same dataset could lead to a variety of published forms including machine-readable forms as well as only human-readable ones such as PDFs.
As with library record cards of old, sometimes we need to make jottings about the event of recording a dataset or distribution in our catalogue, and the catalogue record class is included for this purpose.
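As a rough illustration of how these four core concepts relate, here is a minimal sketch in plain Python. The class names follow the DCAT classes (Catalog, Dataset, Distribution, CatalogRecord) and the predicate names are DCAT, Dublin Core and FOAF terms, but the `to_triples` helper, the `example.org` identifiers and the use of plain strings for media types are assumptions made for illustration only, not part of the standard or of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    """One concrete, accessible form of a dataset (e.g. a CSV or a PDF)."""
    iri: str
    media_type: str      # simplified to a plain string for this sketch
    download_url: str


@dataclass
class Dataset:
    """The abstract dataset; people reach it through its distributions."""
    iri: str
    title: str
    distributions: list = field(default_factory=list)


@dataclass
class CatalogRecord:
    """A note about the act of listing a dataset in the catalogue."""
    iri: str
    primary_topic: Dataset
    issued: str          # date the entry was added, as an ISO 8601 string


@dataclass
class Catalog:
    """A collection of dataset identifiers published by one organization."""
    iri: str
    publisher: str
    datasets: list = field(default_factory=list)
    records: list = field(default_factory=list)


def to_triples(catalog):
    """Hypothetical helper: flatten a catalogue into (subject, predicate, object) tuples."""
    triples = [
        (catalog.iri, "rdf:type", "dcat:Catalog"),
        (catalog.iri, "dct:publisher", catalog.publisher),
    ]
    for ds in catalog.datasets:
        triples.append((catalog.iri, "dcat:dataset", ds.iri))
        triples.append((ds.iri, "rdf:type", "dcat:Dataset"))
        triples.append((ds.iri, "dct:title", ds.title))
        for dist in ds.distributions:
            triples.append((ds.iri, "dcat:distribution", dist.iri))
            triples.append((dist.iri, "rdf:type", "dcat:Distribution"))
            triples.append((dist.iri, "dcat:mediaType", dist.media_type))
            triples.append((dist.iri, "dcat:downloadURL", dist.download_url))
    for rec in catalog.records:
        triples.append((catalog.iri, "dcat:record", rec.iri))
        triples.append((rec.iri, "rdf:type", "dcat:CatalogRecord"))
        triples.append((rec.iri, "foaf:primaryTopic", rec.primary_topic.iri))
        triples.append((rec.iri, "dct:issued", rec.issued))
    return triples


# Example: one dataset published both as machine-readable CSV and as a PDF.
dataset = Dataset(iri="https://example.org/dataset/1", title="Air quality readings")
dataset.distributions.append(
    Distribution("https://example.org/dataset/1/csv", "text/csv",
                 "https://example.org/files/air.csv"))
dataset.distributions.append(
    Distribution("https://example.org/dataset/1/pdf", "application/pdf",
                 "https://example.org/files/air.pdf"))

catalog = Catalog(iri="https://example.org/catalog",
                  publisher="https://example.org/org/agency")
catalog.datasets.append(dataset)
catalog.records.append(
    CatalogRecord("https://example.org/record/1", dataset, "2014-01-16"))

triples = to_triples(catalog)
for s, p, o in triples:
    print(s, p, o)
```

In practice a catalogue would normally be serialized with an RDF toolkit rather than hand-built tuples; the sketch only shows which concept points at which.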
Over the years many organisations adopted DCAT version 1 for the publication of their data catalogues, but they didn’t do this with the unadorned DCAT data model.
They added to it to create application profiles that were tailored to the specific use cases such as national data catalogues, or other themes such as statistical or geospatial datasets.
When it came time to review the standard the lessons learned from these application profiles were brought into the thinking of the Dataset Exchange Working Group, which had now been chartered to bring DCAT up to date.
One of the core ideas the working group addressed was the need to be able to catalogue data services, and this prompted the group to make the model truly extendable by adding an abstract class “Resource” for any type of catalogable entity type.
This extension point is now used as a superclass for “Dataset” and also for “Data Service”.
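The effect of this extension point can be sketched with ordinary class inheritance. This is only an analogy in plain Python, assuming hypothetical field names and `example.org` identifiers; the type labels are the DCAT class names, and `catalogue_entries` is a made-up helper rather than anything defined by the standard.

```python
from dataclasses import dataclass, field
from typing import ClassVar


@dataclass
class Resource:
    """Anything that can be listed in a catalogue."""
    rdf_type: ClassVar[str] = "dcat:Resource"
    iri: str
    title: str


@dataclass
class Dataset(Resource):
    """A dataset, reached through zero or more distributions."""
    rdf_type: ClassVar[str] = "dcat:Dataset"
    distribution_iris: list = field(default_factory=list)


@dataclass
class DataService(Resource):
    """A service (such as an API) that gives access to data."""
    rdf_type: ClassVar[str] = "dcat:DataService"
    endpoint_url: str = ""
    serves_dataset_iris: list = field(default_factory=list)


def catalogue_entries(resources):
    """Hypothetical helper: list any mix of resources as (IRI, type) pairs."""
    return [(r.iri, r.rdf_type) for r in resources]


dataset = Dataset(iri="https://example.org/dataset/1", title="Air quality readings")
service = DataService(
    iri="https://example.org/service/1",
    title="Air quality API",
    endpoint_url="https://example.org/api",
    serves_dataset_iris=[dataset.iri],
)

entries = catalogue_entries([dataset, service])
print(entries)
```

Because the catalogue code only relies on what `Resource` provides, a further kind of catalogable entity could be added as another subclass without changing it, which is the sense in which the model becomes extendable.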
Many other aspects of the application profiles were brought into DCAT through the adoption of predicates into the model to cover things like spatial or temporal resolution, user rights and obligations, and many more.
DCAT version 2 was published in 2020.
But even at that point it was clear that the user base for DCAT was getting larger.
The additions of series and versioning that are scheduled for version 3 were already apparent as requirements when version 2 was being published.
There was clearly an ongoing need for maintenance and more development work from the group.
DCAT forms a central pillar of the semantic standards used within the European Commission and member states of the European Union, and it is also used in the USA to form the basis of government open data catalogues.
The working group is moving in line with the W3C mode to shorter development cycles and is making strong progress with the development of version 3.
User communities are also well on their way to adopting version 2, and they are open with their application profiles, SHACL constraints, and so on.
Adoption of the DCAT interoperability standard as part of an organisation’s implementation of the FAIR Principles is now way easier than in the early days of DCAT, and so we expect use to ramp up in the coming years through all sectors of society, not just academe and government.
You can contribute to the DCAT journey – visit the working group’s GitHub page and see what issues are currently under consideration.