Dataset Exchange Working Group Charter
The mission of the Dataset Exchange Working Group is to:
- Maintain and revise the Data Catalog Vocabulary, DCAT, taking into account feature requests from the DCAT user community.
- Define and publish guidance on the specification and use of application profiles when requesting and serving data on the Web.
|Charter Status||See the group status page and detailed change history.|
|Start date||28 June 2022|
|End date||30 June 2024|
|Chairs||Peter Winstanley, Invited Expert; Caroline Burle, Invited Expert|
|Team Contacts||Pierre-Antoine Champin (0.15 FTE)|
Teleconferences: 1-hour calls will be held every other week
Face-to-face: face-to-face meetings may be scheduled by consent of the participants, usually no more than 1 per year.
Sharing data among researchers, governments and citizens, whether openly or not, requires the provision of metadata. Different communities use different metadata standards to describe their datasets, some of which are highly specialized. At a general level W3C’s Data Catalog Vocabulary, DCAT, is in widespread use, but so too are CKAN’s native schema, schema.org's dataset description vocabulary, ISO 19115, DDI, SDMX, CERIF, VoID, INSPIRE and, in the healthcare and life sciences domain, the Dataset Description vocabulary and DATS, among others. This variety is a clear indication that no single vocabulary offers a complete and universally accepted solution.
DCAT has known gaps in coverage, for example around time series and versions. DCAT has been successful and is in wide use, but these gaps must be addressed if usage is to continue to grow across different communities and the variety of metadata schemas is to reduce.
Maximizing interoperability between services such as data catalogs, e-Infrastructures and virtual research environments requires not just the use of standard vocabularies but also of application profiles. These define how a vocabulary is used, for example by providing cardinality constraints and/or enumerated lists of allowed values, so that data can be validated. The development of several application profiles based on DCAT, such as the European Commission's DCAT-AP, is particularly noteworthy in this regard.
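As an illustration of what such a profile adds on top of a vocabulary, the following minimal sketch validates a metadata record against cardinality constraints and an enumerated list of allowed values. The `PropertyRule` class, the `EXAMPLE_PROFILE` contents and the allowed values are hypothetical and are not taken from DCAT-AP or any other published profile; real profiles are typically expressed in dedicated constraint languages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyRule:
    """Constraint on one property in a hypothetical application profile."""
    min_count: int = 0
    max_count: Optional[int] = None            # None means unbounded
    allowed_values: Optional[frozenset] = None  # None means any value is accepted


# Hypothetical profile: the constraints below are illustrative assumptions.
EXAMPLE_PROFILE = {
    "dct:title": PropertyRule(min_count=1),
    "dct:accrualPeriodicity": PropertyRule(
        max_count=1,
        allowed_values=frozenset({"daily", "monthly", "annual"}),
    ),
}


def validate(record: dict, profile: dict) -> list:
    """Return a list of violations; an empty list means the record conforms.

    `record` maps each property name to a list of values.
    """
    errors = []
    for prop, rule in profile.items():
        values = record.get(prop, [])
        if len(values) < rule.min_count:
            errors.append(
                f"{prop}: expected at least {rule.min_count} value(s), found {len(values)}"
            )
        if rule.max_count is not None and len(values) > rule.max_count:
            errors.append(
                f"{prop}: expected at most {rule.max_count} value(s), found {len(values)}"
            )
        if rule.allowed_values is not None:
            for value in values:
                if value not in rule.allowed_values:
                    errors.append(f"{prop}: value {value!r} is not in the allowed list")
    return errors
```

A record with a title and a single permitted periodicity yields no violations, while a record that omits the title or uses a value outside the list is reported with one message per problem.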
Rather than limit the number of metadata standards and application profiles in use, systems should be able to expose and ingest (meta)data according to multiple standards through transparent and sustainable interfaces. We thus need a mechanism for servers to indicate the available standards and application profiles, and for clients to choose an appropriate one. This leads to the concept of content negotiation by application profile, which is orthogonal to the content negotiation by data format and language that is already part of HTTP. A new Internet Draft on profile negotiation, currently under development at IETF with input from the Dataset Exchange Working Group, is based on the draft presented at the SDSVoc workshop. Combined with this external work, DXWG's definition of "application profile" and its view of how clients and servers may interact on the basis of these profiles will provide a powerful means to exchange data in any format (JSON, RDF, XML etc.) according to declared structures against which the data can be validated.
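The server-side half of this mechanism can be sketched as follows: the client sends a weighted list of profile URIs and the server picks the most preferred one it can serve. The header syntax assumed here (URIs in angle brackets with optional `q` weights, in the style of an `Accept-Profile` request header) is an assumption for illustration, and the profile URIs are hypothetical; the normative syntax is whatever the specifications under development define.

```python
import re

# One list item: <profile-uri> optionally followed by ;q=<weight between 0 and 1>.
_ITEM = re.compile(r"\s*<([^>]+)>\s*(?:;\s*q\s*=\s*([01](?:\.\d{0,3})?))?\s*")


def parse_profile_preferences(header: str) -> list:
    """Parse the assumed header syntax into (uri, q) pairs, most preferred first.

    Simplification: items are split on commas, so URIs containing commas
    are not supported. Malformed items are skipped.
    """
    prefs = []
    for position, item in enumerate(header.split(",")):
        match = _ITEM.fullmatch(item)
        if match is None:
            continue
        uri, q = match.group(1), match.group(2)
        prefs.append((float(q) if q is not None else 1.0, position, uri))
    # Higher weight first; ties keep the order in which the client listed them.
    prefs.sort(key=lambda pref: (-pref[0], pref[1]))
    return [(uri, q) for q, _, uri in prefs]


def choose_profile(header: str, available: set):
    """Return the client's most preferred profile that the server offers, or None."""
    for uri, q in parse_profile_preferences(header):
        if q > 0 and uri in available:
            return uri
    return None
```

Returning `None` leaves the fallback behaviour (for example, serving a default profile or reporting that nothing acceptable is available) to the caller, since that policy is a design decision outside this sketch.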
The goals of the working group are to maintain version 2 of DCAT and extend the standard to version 3, in line with work done to date and the ongoing work on dataset exchange being undertaken by communities more generally, and to develop to a Recommendation the work undertaken in the 2017-2019 charter period on content negotiation by profile.
DCAT is formulated as an RDF vocabulary and is expected to remain so. However, as with all earlier work, the Working Group is agnostic about data formats. Methods for expressing DCAT in other (existing) formats are in scope.
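To give a sense of what a DCAT description looks like outside a dedicated RDF toolchain, here is a minimal sketch that builds a dataset description as a JSON-LD-style dictionary. The namespace URIs and the terms `dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:downloadURL`, `dcat:keyword` and `dct:title` come from the published vocabularies; the function name and the example URLs are hypothetical, and the selection of properties is illustrative rather than a recommended minimum.

```python
import json

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def describe_dataset(dataset_uri: str, title: str, download_url: str, keywords: list) -> dict:
    """Build a small JSON-LD-style description of one dataset with one distribution."""
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                # downloadURL points at a resource, so it is given as an IRI node.
                "dcat:downloadURL": {"@id": download_url},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical example values.
    description = describe_dataset(
        "https://example.org/dataset/rainfall",
        "Monthly rainfall",
        "https://example.org/files/rainfall.csv",
        ["rain", "climate"],
    )
    print(json.dumps(description, indent=2))
```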
Government data, scientific research data, industry/enterprise and cultural heritage data, whether shared openly or not, are all explicitly in scope. The working group will primarily look at cross-domain requirements.
The following documents SHOULD be considered by the Working Group as direct inputs to the specifications to be developed.
Existing Project Materials
In the course of the first charter period of the Dataset Exchange Working Group, material was developed that was not used in the documents published as Recommendations and Notes. This includes various items on the DXWG GitHub and the project wiki, in particular the GitHub issues marked as Future-work.
- DCAT version 2 and the HCLS Community profile
- The Data Quality and Dataset Usage vocabularies
- The Smart Data & Smarter Descriptions (SDSVoc) workshop report, in particular the section on content negotiation by application profile.
- Data on the Web Best Practices
DCAT must take account of current practice in many different communities. The following list is therefore not exhaustive.
- DCAT-AP and related work, such as DCAT-AP-NO, DCAT-AP-IT, GeoDCAT-AP, DCAT-AP.de etc.
- schema.org's dataset description vocabulary
- Other related vocabularies such as CERIF (the Common European Research Information Format), DBpedia DataID, DDI (the Data Documentation Initiative), DataCite, Hypercat etc.
- The FAIR Principles (a community-based movement to make data assets findable, accessible, interoperable and reusable).
- Description Set Profiles (constraint language for Dublin Core Application Profiles).
Out of Scope
The following features are out of scope, and will not be addressed by this Working Group.
The Dataset Exchange Working Group will not create application profiles or metadata standards that only apply to very specific domains (such as particle physics, accountancy, oncology etc.).
Updated document status is available on the group publication status page.
Draft state indicates the state of the deliverable at the time of the charter approval. Expected completion indicates when the deliverable is projected to become a Recommendation.
The Working Group will deliver the following W3C normative specifications:
- Data Catalog Vocabulary (DCAT) - Version 3
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.
An update and expansion of the current DCAT Recommendation. The new version may deprecate features, but MUST maintain backwards compatibility.
Draft state: Working Draft
Expected completion: Q1 2023
Adopted Draft: Data Catalog Vocabulary (DCAT) - Version 3, W3C Working Draft 11 January 2022
Exclusion Draft: Data Catalog Vocabulary (DCAT) - Version 3, W3C Working Draft 17 December 2020. Associated Call for Exclusion on 2020-12-17 ended on 2021-05-16.
Exclusion Draft Charter: https://www.w3.org/2020/02/dx-wg-charter.html
- Content Negotiation by Profile
This document describes how Internet clients may negotiate for content provided by servers based on data profiles to which the content conforms. This is distinct from negotiating by Media Type or Language: a profile may specify the content of information returned, which may be a subset of the information the responding server has about the requested resource, and may be structured in a specific way to meet interoperability requirements of a community of practice.
Draft state: Working Draft
Expected completion: Not specified
Adopted Draft: Content Negotiation by Profile, W3C Working Draft 26 November 2019
Exclusion Draft: Content Negotiation by Profile, W3C First Public Working Draft 18 December 2018. Associated Call for Exclusion on 2018-12-18 ended on 2019-05-17.
Exclusion Draft Charter: https://www.w3.org/2017/dxwg/charter
Other non-normative documents may be created such as:
- A use case and requirements document
- A test suite for content negotiation by application profile
- A primer on the uses of DCAT
- Guidance on publishing application profiles of vocabularies.
- Subject to its capacity, the working group may choose to develop additional relevant vocabularies in response to community demand.
- Q2 2022: Last Working Draft for DCAT 3, start wide review
- September 2022: Candidate Recommendation for DCAT 3
- Q4 2022: Proposed Recommendation for DCAT 3
- Q1 2023: Recommendation for DCAT 3
In order to advance to Proposed Recommendation, the WG should show that each term in the revised version of DCAT is used in multiple catalogs and related systems. As a minimum, evidence will be adduced that each term has been published and consumed independently at least once, although a higher number is expected for the majority of terms.
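The minimum evidence criterion can be checked mechanically once implementation reports are collected. The sketch below lists the terms that still lack evidence. It assumes one reading of "independently", namely that at least one publisher and one consumer of the term must be different implementations; the function names and the report structure are hypothetical.

```python
def has_independent_evidence(publishers: set, consumers: set) -> bool:
    """True if some publisher and some consumer are different implementations."""
    return any(p != c for p in publishers for c in consumers)


def terms_lacking_evidence(terms: list, published_by: dict, consumed_by: dict) -> list:
    """Return, in sorted order, the terms that do not yet meet the minimum.

    `published_by` and `consumed_by` map a term to the set of implementation
    names reported as publishing or consuming it.
    """
    return sorted(
        term
        for term in terms
        if not has_independent_evidence(
            published_by.get(term, set()), consumed_by.get(term, set())
        )
    )
```

A term published and consumed only by the same system is reported as lacking evidence under this reading, as is a term with no publisher or no consumer at all.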
For the content negotiation by application profile specification, each fallback mechanism identified by the Working Group is expected to have two independent implementations. The DXWG is not responsible for proving implementations of the RFC defined at IETF.
Each specification should contain separate sections detailing all known security and privacy implications for implementers, Web authors, and end users.
Each specification should have a testing plan, that is, a guide to help implementers know whether they have followed the specification correctly.
For all specifications, this Working Group will seek horizontal review for accessibility, internationalization, performance, privacy, and security with the relevant Working and Interest Groups, and with the TAG. Invitation for review must be issued during each major standards-track document transition, including FPWD. The Working Group is encouraged to engage collaboratively with the horizontal review groups throughout development of each specification. The Working Group is advised to seek a review at least 3 months before first entering CR and is encouraged to proactively notify the horizontal review groups when major changes occur in a specification following a review.
Additional technical coordination with the following Groups will be made, per the W3C Process Document:
- Shape Expressions Community Group
The work of this CG is of direct relevance to the concept of application profiles.
- SHACL Community Group
This CG is continuing work on the W3C Shapes Constraint Language Recommendation. Efforts should be made to liaise with its community.
- schema.org for datasets Community Group
This CG is clearly of high relevance to the DXWG.
- Open Digital Rights Language (ODRL) Community Group
Ensure that the mechanisms of the W3C ODRL Recommendation being maintained by the ODRL Community Group for machine readable permissions, obligations, licenses, rights etc. are given due consideration.
- European Commission's ISA Programme
This is the body responsible for interoperability across the EU and whose outputs include various application profiles of DCAT such as DCAT-AP, GeoDCAT-AP and StatDCAT-AP.
- Research Data Alliance
Many of the issues raised in the DXWG are of direct relevance to work at the RDA around metadata, citation and more. It is important to align and not duplicate effort.
- GO-FAIR, FORCE11, CODATA (the International Science Council's Committee on Data) and FAIRsharing.org
Findability of data assets is potentially made much easier if there are adequate and interoperable catalogs. As this is a key goal of the FAIR Principles [FAIR] we need to coordinate with communities such as FORCE11, CODATA, GO-FAIR and FAIRsharing.org.
- EPOS, the European Plate Observing System
EPOS use DCAT as the basis for their dataset catalog application profile. Their datasets bring together information on geo-hazards and those geodynamic phenomena (including geo-resources) relevant to the environment and human welfare. Datasets come from a wide, international community and so are an important route for the working group to gather input and feedback on its deliverables.
To be successful, this Working Group is expected to have 6 or more active participants for its duration, including representatives from the key implementers of this specification, and active Editors and Test Leads for each specification. The Chairs, specification Editors, and Test Leads are expected to contribute half of a working day per week towards the Working Group. There is no minimum requirement for other Participants.
The group encourages questions, comments and issues on its public mailing lists and document repositories, as described in Communication.
The group also welcomes non-Members to contribute technical submissions for consideration upon their agreement to the terms of the W3C Patent Policy.
Participants in the group are required (by the W3C Process) to follow the W3C Code of Ethics and Professional Conduct.
Technical discussions for this Working Group are conducted in public: the meeting minutes from teleconference and face-to-face meetings will be archived for public review, and technical discussions and issue tracking will be conducted in a manner that can be both read and written to by the general public. Working Drafts and Editor's Drafts of specifications will be developed in public repositories and may permit direct public contribution requests. The meetings themselves are not open to public participation, however.
Information about the group (including details about deliverables, issues, actions, status, participants, and meetings) will be available from the Dataset Exchange Working Group home page.
Most Dataset Exchange Working Group teleconferences will focus on discussion of particular specifications, and will be conducted on an as-needed basis.
This group primarily conducts its technical work on the public mailing list email@example.com (archive). The public is invited to review, discuss and contribute to this work.
The group may use a Member-confidential mailing list for administrative purposes and, at the discretion of the Chairs and members of the group, for member-only discussions in special cases when a participant requests such a discussion.
This group will seek to make decisions through consensus and due process, per the W3C Process Document (section 5.2.1, Consensus). Typically, an editor or other participant makes an initial proposal, which is then refined in discussion with members of the group and other reviewers, and consensus emerges with little formal voting being required.
However, if a decision is necessary for timely progress and consensus is not achieved after careful consideration of the range of views presented, the Chairs may call for a group vote and record a decision along with any objections.
To afford asynchronous decisions and organizational deliberation, any resolution (including publication decisions) taken in a face-to-face meeting or teleconference will be considered provisional. A call for consensus (CfC) will be issued for all resolutions (for example, via email, GitHub issue or web-based survey), with a response period from one week to 10 working days, depending on the chair's evaluation of the group consensus on the issue. If no objections are raised by the end of the response period, the resolution will be considered to have consensus as a resolution of the Working Group.
All decisions made by the group should be considered resolved unless and until new information becomes available or unless reopened at the discretion of the Chairs or the Director.
This charter is written in accordance with the W3C Process Document (Section 5.2.3, Deciding by Vote) and includes no voting procedures beyond what the Process Document requires.
This Working Group operates under the W3C Patent Policy (Version of 15 September 2020). To promote the widest adoption of Web standards, W3C seeks to issue Web specifications that can be implemented, according to this policy, on a Royalty-Free basis. For more information about disclosure obligations for this group, please see the licensing information.
This Working Group will use the W3C Software and Document license for all its deliverables.
About this Charter
This charter has been created according to section 3.4 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.
The following table lists details of all changes from the initial charter, per the W3C Process Document (section 4.3, Advisory Committee Review of a Charter):
|Charter Period||Start Date||End Date||Changes|
DCAT 3, new document license
|Rechartered||2022-06-28||2024-06-30||Use Patent Policy 2020. Change in Team FTE (+0.1%)|
Changes to this document are documented in this section.