Today the W3C Dataset Exchange Working Group (DXWG) published version 2 of the Data Catalog Vocabulary (DCAT) as a W3C “Recommendation”. DCAT gives people and machines a specific, domain-independent way to create catalogs that express the core elements of a dataset description in a standardized form suitable for publication on the Web. It enables cross-domain interoperability because it can be used either on its own or as a complement to other data catalog standards. As a result, DCAT facilitates effective search and retrieval, and lets the query process scale up easily, either through “frictionless” aggregation of dataset descriptions and catalog records from many different sources and domains, or by applying the same query across multiple catalogs and aggregating the results. These patterns can also be varied slightly to give communities tailored approaches to dataset catalogs that respect the nuances of a particular type of data.
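The two scaling patterns mentioned above (aggregate first and then query, or query each catalog and then merge the results) can be sketched in a few lines of Python. This is only an illustration, not part of the Recommendation: the catalogs are hypothetical JSON-LD-style dictionaries, the `example.org` identifiers are made up, and the helper names `aggregate`, `find_by_keyword` and `federated_query` are assumptions introduced here. Only the namespace IRIs and the terms `dcat:Catalog`, `dcat:Dataset`, `dcat:dataset`, `dcat:keyword` and `dct:title` come from the vocabularies themselves.

```python
# Namespace IRIs of DCAT and Dublin Core Terms.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def make_dataset(iri, title, keywords):
    """Build a minimal, hypothetical dataset description."""
    return {
        "@id": iri,
        "@type": DCAT + "Dataset",
        DCT + "title": title,
        DCAT + "keyword": list(keywords),
    }


def make_catalog(iri, datasets):
    """Build a minimal, hypothetical catalog that lists its datasets."""
    return {"@id": iri, "@type": DCAT + "Catalog", DCAT + "dataset": list(datasets)}


def dedupe(datasets):
    """Keep the first description seen for each dataset IRI."""
    seen = {}
    for dataset in datasets:
        seen.setdefault(dataset["@id"], dataset)
    return list(seen.values())


def aggregate(catalogs):
    """Pattern 1: pool the dataset descriptions of many catalogs."""
    pooled = []
    for catalog in catalogs:
        pooled.extend(catalog.get(DCAT + "dataset", []))
    return dedupe(pooled)


def find_by_keyword(datasets, keyword):
    """A toy query: select datasets tagged with the given keyword."""
    return [d for d in datasets if keyword in d.get(DCAT + "keyword", [])]


def federated_query(catalogs, keyword):
    """Pattern 2: run the same query per catalog, then merge the results."""
    results = []
    for catalog in catalogs:
        results.extend(find_by_keyword(catalog.get(DCAT + "dataset", []), keyword))
    return dedupe(results)


rainfall = make_dataset("https://example.org/a/rainfall", "Daily rainfall", ["climate", "rainfall"])
bus_stops = make_dataset("https://example.org/a/bus-stops", "Bus stops", ["transport"])
temperature = make_dataset("https://example.org/b/temperature", "Air temperature", ["climate"])

catalog_a = make_catalog("https://example.org/catalog-a", [rainfall, bus_stops])
# Catalog B has harvested the rainfall record as well, so it appears twice overall.
catalog_b = make_catalog("https://example.org/catalog-b", [temperature, rainfall])

aggregated_hits = find_by_keyword(aggregate([catalog_a, catalog_b]), "climate")
federated_hits = federated_query([catalog_a, catalog_b], "climate")

for hit in aggregated_hits:
    print(hit["@id"], hit[DCT + "title"])
```

Because both catalogs use the same terms for the same things, the two routes select the same set of datasets here, and the duplicate rainfall record collapses onto a single identifier. A real deployment would more likely load the descriptions into an RDF store and query them with SPARQL; the plain dictionaries are used only to keep the sketch self-contained.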
Version 2 builds on the initial work published in 2014 by providing, among other things, classes that can be used to describe data services, and a wider set of relationships characterizing datasets and their temporal and spatial aspects. It also removes constraints that the original version placed on the use of some of its relationship terms (properties), which makes their usage more flexible.
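As a rough illustration of these additions, the following Python sketch builds a small JSON-LD document in which a data service is linked to the dataset it serves, and the dataset carries a temporal extent. The terms `dcat:DataService`, `dcat:endpointURL`, `dcat:servesDataset`, `dcat:startDate` and `dcat:endDate` are taken from DCAT version 2; the `example.org` identifiers, the titles, the dates, and the helper functions `describe_dataset` and `describe_service` are assumptions made up for this example.

```python
import json

# Prefixes for the vocabularies used below.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def describe_dataset(dataset_iri, title, start, end):
    """Describe a dataset together with the period of time it covers."""
    return {
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:temporal": {
            "@type": "dct:PeriodOfTime",
            "dcat:startDate": {"@value": start, "@type": "xsd:date"},
            "dcat:endDate": {"@value": end, "@type": "xsd:date"},
        },
    }


def describe_service(service_iri, title, endpoint_url, dataset_iris):
    """Describe a data service and the datasets it gives access to."""
    return {
        "@id": service_iri,
        "@type": "dcat:DataService",
        "dct:title": title,
        "dcat:endpointURL": {"@id": endpoint_url},
        "dcat:servesDataset": [{"@id": iri} for iri in dataset_iris],
    }


# Hypothetical identifiers and values, for illustration only.
dataset = describe_dataset(
    "https://example.org/dataset/rainfall",
    "Daily rainfall",
    "2019-01-01",
    "2019-12-31",
)
service = describe_service(
    "https://example.org/service/rainfall-api",
    "Rainfall query API",
    "https://example.org/api/rainfall",
    [dataset["@id"]],
)

document = {"@context": CONTEXT, "@graph": [dataset, service]}
print(json.dumps(document, indent=2))
```

The output is ordinary JSON, so it can be embedded in a web page or handed to any JSON-LD processor; spatial coverage could be added in the same style with `dct:spatial`.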
Dataset publishers are expected to want to revise their existing catalogs, as part of their general curation and update activities, to make use of the additional features available in version 2. Even so, compatibility between the new version and the earlier version of the DCAT vocabulary has been preserved.
The WG has also made an effort to (i) provide multilingual descriptions of the different terms and properties, facilitating their application across the world; and (ii) explain the alignment with the Schema.org vocabulary, which is the metadata set most widely used by search engines to optimize the indexing of Web content, and which is now increasingly being adopted in data catalogs as well.
Within just a few years of its first release in 2014, DCAT has become recognized as a key interoperability standard for data catalogs in many countries and organizations. Search engine providers are using it to identify data assets to catalog, and publishers are using it to make their materials more findable. Going forward, the WG expects that incorporating classes to describe data services into the model will make DCAT an increasingly useful tool in data science and provide a well-trodden path for those implementing the FAIR Principles.
The DXWG appreciates hearing about any implementations of catalogs using DCAT v2. We would also like to know about any errors that you find or problems that you experience, so that these can be fed into the ongoing management of version 2 and potentially influence changes to be made in version 3, work on which has just started. You can report errors or difficulties with DCAT v2 to the WG either by email to email@example.com or through the dedicated errata page. For new use cases and other issues, please contact us by email or submit an issue in the dedicated GitHub repository. We hope that you find this standard a useful addition to your data publications.