Dataset Exchange Working Group Charter
The mission of the Dataset Exchange WG is to:
- Revise the Data Catalog Vocabulary, DCAT, taking account of related vocabularies and the extensive work done in developing a number of its application profiles.
- Define and publish guidance on the use of application profiles when requesting and serving data on the Web.
|Start date||04 May 2017|
|End date||30 June 2019|
|Chairs||Caroline Burle, NIC.br; Karen Coyle, Dublin Core Metadata Initiative|
|Team Contacts||Phil Archer (0.2 FTE), supported by the VRE4EIC project|
|Meeting Schedule||Teleconferences: 1-hour calls will be held weekly. Face-to-face: twice per year, expected to include the W3C's annual Technical Plenary week.|
Sharing data among researchers, governments and citizens, whether openly or not, requires the provision of metadata. Different communities use different metadata standards to describe their datasets, some of which are highly specialized. At a general level, W3C’s Data Catalog Vocabulary, DCAT, is in widespread use, but so too are CKAN’s native schema, schema.org's dataset description vocabulary, ISO 19115, DDI, SDMX, CERIF, VoID, INSPIRE and, in the healthcare and life sciences domain, the Dataset Description vocabulary and DATS (ref) among others.
This variety is a clear indication that no single vocabulary offers a complete and universally accepted solution. For example, catalogs increasingly provide APIs to the datasets they contain, yet the current version of DCAT lacks a way to describe these APIs as it only fully supports the discovery of static datasets. By providing a sufficiently rich description of a data API to allow the programmatic conversion to something else, the data is made more available and interoperable, and therefore more reusable. There are further gaps in DCAT that can be closed, for example around time series and versions. DCAT has been successful and is in wide use, but these gaps must be addressed if usage is to continue to grow across different communities and the variety of metadata schemas is to reduce.
Maximizing interoperability between services such as data catalogs, e-Infrastructures and virtual research environments requires not just the use of standard vocabularies but also of application profiles. These define how a vocabulary is used, for example by providing cardinality constraints and/or enumerated lists of allowed values such that data can be validated. The development of several application profiles based on DCAT, such as the European Commission's DCAT-AP, is particularly noteworthy in this regard.
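To make the idea of cardinality constraints and enumerated value lists concrete, the following is a minimal sketch in Python. It assumes a metadata record is a mapping from property name to a list of values. The `PropertyRule` class, the `EXAMPLE_PROFILE` contents and the plain-string frequency values are all hypothetical illustrations; they are not taken from DCAT-AP or from any published profile.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyRule:
    """Constraint on one property (illustrative, not a standard structure)."""
    min_count: int = 0
    max_count: Optional[int] = None        # None means no upper bound
    allowed: Optional[frozenset] = None    # None means any value is accepted


# Hypothetical profile: the rules below are invented for this example.
EXAMPLE_PROFILE = {
    "dct:title": PropertyRule(min_count=1),
    "dct:license": PropertyRule(min_count=1, max_count=1),
    "dct:accrualPeriodicity": PropertyRule(
        max_count=1, allowed=frozenset({"daily", "monthly", "annual"})
    ),
}


def validate(record: dict, profile: dict) -> list:
    """Return a list of violation messages; an empty list means valid."""
    problems = []
    for prop, rule in profile.items():
        values = record.get(prop, [])
        # Cardinality checks
        if len(values) < rule.min_count:
            problems.append(
                f"{prop}: expected at least {rule.min_count} value(s), "
                f"found {len(values)}"
            )
        if rule.max_count is not None and len(values) > rule.max_count:
            problems.append(
                f"{prop}: expected at most {rule.max_count} value(s), "
                f"found {len(values)}"
            )
        # Enumerated-list check
        if rule.allowed is not None:
            for value in values:
                if value not in rule.allowed:
                    problems.append(
                        f"{prop}: value {value!r} is not in the allowed list"
                    )
    return problems
```

A record that satisfies every rule yields an empty list, so a catalog could accept or reject incoming metadata by checking whether `validate(record, EXAMPLE_PROFILE)` is empty. Real profiles would more likely be expressed in a dedicated constraint language than in ad hoc code like this.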
Rather than limit the number of metadata standards and application profiles in use, systems should be able to expose and ingest (meta)data according to multiple standards through a transparent and sustainable interface. We thus need a mechanism for servers to indicate the available standards and application profiles, and for clients to choose an appropriate one. This leads to the concept of content negotiation by application profile, which is orthogonal to the content negotiation by data format and language that is already part of HTTP. It is expected that a new RFC on this topic will be developed at IETF and published in parallel with the work of the Dataset Exchange Working Group, based on the existing draft presented at the SDSVoc workshop. Taken together, this external work and the DXWG's definition of what is meant by an application profile, and of how clients and servers may interact in different ways based on it, will provide a powerful means to exchange data in any format (JSON, RDF, XML etc.) according to declared structures against which the data can be validated.
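The server-side half of such a negotiation could look roughly like the sketch below. It assumes, purely for illustration, that a client sends a comma-separated list of profile URIs with optional `q` weights, by analogy with the existing HTTP `Accept` header. The actual header name and syntax are left to the expected RFC, so the function names, the input format and the example URIs here are all hypothetical.

```python
from typing import Optional


def parse_preferences(header_value: str) -> list:
    """Parse 'uri1;q=0.8, uri2' into (uri, q) pairs, highest q first.

    Simplification: assumes profile URIs contain no ',' or ';'.
    """
    prefs = []
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = [p.strip() for p in part.split(";")]
        uri = pieces[0]
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        prefs.append((uri, q))
    # sorted() is stable, so equal weights keep the client's original order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_profile(
    header_value: str, available: set, default: Optional[str] = None
) -> Optional[str]:
    """Pick the client's most preferred profile that the server can serve."""
    for uri, q in parse_preferences(header_value):
        if q > 0 and uri in available:
            return uri
    return default  # fallback when nothing the client asked for is available
```

Returning a `default` when no requested profile matches corresponds to the fallback behaviour a server needs when a client's preferences cannot be met. Profile selection runs independently of format and language negotiation, so the same resource could be served as, say, JSON or XML under whichever profile is chosen.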
The goal of the working group is to extend the existing DCAT standard in line with wider practice but also to recognize and support diverse approaches to data description and Dataset Exchange more generally.
DCAT is formulated as an RDF vocabulary and is expected to remain so; however, the Working Group is agnostic about data formats. Methods for expressing DCAT in other (existing) formats are in scope.
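As one illustration of expressing a DCAT description in another existing format, the sketch below builds a JSON-LD-style document in Python using terms from the current DCAT Recommendation and Dublin Core. The function name and its parameters are hypothetical, and this is only one possible serialization, not a format endorsed by the Working Group.

```python
import json


def dataset_as_jsonld(title, keywords, download_url, media_type):
    """Build a small DCAT dataset description as a JSON-serializable dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                # The download location is a resource, so it is given as an @id
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": media_type,
            }
        ],
    }


if __name__ == "__main__":
    doc = dataset_as_jsonld(
        "Example air quality readings",          # made-up sample values
        ["air quality", "sensors"],
        "https://example.org/data/air.csv",
        "text/csv",
    )
    print(json.dumps(doc, indent=2))
```

Because the result is plain JSON, it can be produced and consumed by systems that have no RDF tooling, while the `@context` keeps the terms tied to the DCAT and Dublin Core namespaces.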
Government data, scientific research data, industry/enterprise and cultural heritage data, whether shared openly or not, are all explicitly in scope.
The following documents SHOULD be considered by the Working Group as direct inputs to the specifications to be developed.
- DCAT and the HCLS Community profile
- The Data Quality and Dataset Usage vocabularies
- The Smart Data & Smarter Descriptions (SDSVoc) workshop report, in particular the section on content negotiation by application profile.
- Data on the Web Best Practices
DCAT must take account of current practice in many different communities. The following list is therefore not exhaustive.
- DCAT-AP and related work, such as DCAT-AP-NO, DCAT-AP-IT, GeoDCAT-AP, DCAT-AP.de etc.
- schema.org's dataset description vocabulary;
- Other related vocabularies such as CERIF, DBpedia DataID, DDI, DataCite, Hypercat etc.
- FORCE 11’s FAIR Principles.
- Description Set Profiles: A constraint language for Dublin Core Application Profiles
Out of Scope
The Dataset Exchange Working Group will not create application profiles or metadata standards that only apply to very specific domains (such as particle physics, accountancy or oncology).
In order to advance to Proposed Recommendation, the WG should show that each term in the revised version of DCAT is used in multiple catalogs and related systems. As a minimum, evidence will be adduced that each term has been published and consumed independently at least once, although a higher number is expected for the majority of terms.
For the content negotiation by application profile specification, each fallback mechanism identified by the Working Group is expected to have two independent implementations. The DXWG is not responsible for proving implementations of the RFC defined at IETF.
Each specification should contain a section detailing any known security or privacy implications for implementers, Web authors, and end users.
There should be testing plans for each specification, starting from the earliest drafts.
The Working Group will deliver the following W3C normative specifications (titles of the documents are provisional; some documents listed below may be grouped into one document or split into several, constituent documents):
- DCAT 1.1
An update and expansion of the current DCAT Recommendation. The new version may deprecate, but MUST NOT delete, any existing terms.
- Guidance on publishing application profiles of vocabularies.
A definition of what is meant by an application profile and an explanation of one or more methods for publishing and sharing them.
- Content Negotiation by Application Profile
An explanation of how to implement the expected RFC and suitable fallback mechanisms as discussed at the SDSVoc workshop.
Other non-normative documents may be created such as:
- A use cases and requirements document
- A test suite for content negotiation by application profile
- A primer (subject to the WG’s capacity)
- Subject to its capacity, the working group may choose to develop additional relevant vocabularies in response to community demand.
- Use Cases and Requirements FPWD Q3-4 2017
- FPWD for DCAT 1.1 Q3-4 2017
- FPWD for Conneg by application profile Q1 2018
- CR for all Rec Track documents Q4 2018
For all specifications, this Working Group will seek horizontal review for accessibility, internationalization, performance, privacy, and security with the relevant Working and Interest Groups, and with the TAG. Invitation for review must be issued during each major standards-track document transition, including FPWD and CR, and should be issued when major changes occur in a specification.
Additional technical coordination with the following Groups will be made, per the W3C Process Document:
- Internationalization Activity
Ensure that multilinguality concerns continue to be properly reflected in DCAT revision.
- Privacy Interest Group
Ensure that privacy concerns are addressed, for example, if a dataset includes personally identifiable information.
- Web Application Security Working Group
In particular concerning the conneg by application profile spec, ensuring that no security vulnerabilities are introduced.
- Shape Expressions Community Group
The work of this CG is of direct relevance to the concept of application profiles.
- The RDF Data Shapes Working Group
This WG is expected to have completed its work shortly after the DXWG is formed; however, efforts should be made to liaise with its community.
- schema.org for datasets Community Group
This CG is clearly of high relevance to the DXWG.
- Permissions & Obligations Expression WG
Ensure that the mechanisms being standardized by the POE WG for machine readable permissions, obligations, licenses, rights etc. are given due consideration for reference from DCAT 1.1.
- European Commission's ISA Programme
This is the body responsible for interoperability across the EU and whose outputs include various application profiles of DCAT such as DCAT-AP, GeoDCAT-AP and StatDCAT-AP.
- Research Data Alliance
Many of the issues raised in the DXWG are of direct relevance to work at the RDA around metadata, citation and more. It is important to align and not duplicate effort.
- bioCADDIE WG 3
This is the working group responsible for the DATS work, part of the NIH-funded bioCADDIE project.
To be successful, this Working Group is expected to have 6 or more active participants for its duration, including representatives from key implementors and users (e.g., governments and research data managers) of its specifications, and active Editors. The Chairs, specification Editors, and Test Leads are expected to contribute half a day per week towards the Working Group. There is no minimum requirement for other Participants.
The group encourages questions, comments and issues on its public mailing lists and document repositories, as described in Communication.
The group also welcomes non-Members to contribute technical submissions for consideration upon their agreement to the terms of the W3C Patent Policy.
Technical discussions for this Working Group are conducted in public: the meeting minutes from teleconference and face-to-face meetings will be archived for public review, and technical discussions and issue tracking will be conducted in a manner that can be both read and written to by the general public. Working Drafts and Editor's Drafts of specifications will be developed on a public repository, and may permit direct public contribution requests. The meetings themselves are not open to public participation, however.
Information about the group (including details about deliverables, issues, actions, status, participants, and meetings) will be available from the Dataset Exchange Working Group home page.
This group primarily conducts its technical work on the public mailing list firstname.lastname@example.org (archive). The public is invited to review, discuss and contribute to this work.
The group may use a Member-confidential mailing list for administrative purposes and, at the discretion of the Chairs and members of the group, for member-only discussions in special cases when a participant requests such a discussion.
This group will seek to make decisions through consensus and due process, per the W3C Process Document (section 3.3). Typically, an editor or other participant makes an initial proposal, which is then refined in discussion with members of the group and other reviewers, and consensus emerges with little formal voting being required.
However, if a decision is necessary for timely progress, but consensus is not achieved after careful consideration of the range of views presented, the Chairs may call for a group vote, and record a decision along with any objections.
To afford asynchronous decisions and organizational deliberation, any resolution (including publication decisions) taken in a face-to-face meeting or teleconference will be considered provisional. A call for consensus (CfC) will be issued for all resolutions (for example, via email and/or web-based survey), with a response period from one week to 10 working days, depending on the chair's evaluation of the group consensus on the issue. If no objections are raised on the mailing list by the end of the response period, the resolution will be considered to have consensus as a resolution of the Working Group.
All decisions made by the group should be considered resolved unless and until new information becomes available, or unless reopened at the discretion of the Chairs or the Director.
This charter is written in accordance with the W3C Process Document (Section 3.4, Votes), and includes no voting procedures beyond what the Process Document requires.
This Working Group operates under the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis. For more information about disclosure obligations for this group, please see the W3C Patent Policy Implementation.
This Working Group will use the W3C Document license for all its deliverables.
About this Charter
This charter has been created according to section 5.2 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.
The following table lists details of all changes from the initial charter, per the W3C Process Document (section 5.2.3):
|Charter Period||Start Date||End Date||Changes|
|Initial Charter||04 May 2017||30 June 2019||
Since AC review: