RDF Dataset Canonicalization and Hash Working Group Charter

The mission of the RDF Dataset Canonicalization and Hash Working Group is to define a standard to uniquely and deterministically calculate a hash of RDF Datasets for use cases such as detecting changes in Datasets. The work will include defining RDF Dataset Canonicalization algorithms.

Join the RDF Dataset Canonicalization and Hash Working Group

Start date: 21 July 2022
End date: 20 July 2024
Charter extension: See Charter History.
Chairs: Phil Archer, GS1; Markus Sabadello, Danube Tech
Team Contacts: Pierre-Antoine Champin (0.15 FTE)
Meeting Schedule:
Teleconferences: 1 hour calls to be held weekly; extra topic-specific calls may also be held.
Face-to-face: we will meet during the W3C's annual Technical Plenary week; additional face-to-face meetings may be scheduled by consent of the participants, usually no more than 3 per year.

Scope

There are a variety of use cases that depend on the ability to calculate a unique and deterministic hash value of RDF Datasets, such as Verifiable Credentials, the publication of biological and pharmaceutical data, or the consumption of mission-critical RDF vocabularies that depend on the ability to verify the authenticity and integrity of the data being consumed. See the use cases for more examples. These use cases require a standard way to process the underlying graphs contained in RDF Datasets that is independent of the serialization itself.

The scope of this Working Group is to define standards to canonicalize and cryptographically hash an RDF Dataset. This includes the definition of a standard canonicalization algorithm. (See the separate explainer document for more detailed technical background and for the terminology used in this context.)

See also some publications for RDF Dataset Canonicalization that will serve as inputs for the specification work:

  1. RDF Dataset Canonicalization, Rachel Arnold, Dave Longley, Report submitted to the W3C Credentials Community Group mailing list, 2020.
  2. Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes, Aidan Hogan, ACM Trans. Web, vol. 11, no. 4, p. 22:1-22:62, 2017.
  3. A Framework for Iterative Signing of Graph Data on the Web, Andreas Kasten, Ansgar Scherp, Peter Schauß, European Semantic Web Conference — ESWC 2014, Springer Verlag, pp. 146-160, 2014.
  4. Signing RDF Graphs, Jeremy J. Carroll, International Semantic Web Conference — ISWC 2003, Springer Verlag, pp. 369-384, 2003.

Out of Scope

The following items are out of scope, and will not be addressed by this Working Group:

  • Definition of new cryptographic hashing algorithms. This Working Group will only define the usage of algorithms such as BLAKE3 or SHA-3.
  • Definition of higher level protocols, like signature schemes, which is left to other groups.

Deliverables

More detailed milestones and updated publication schedules are available on the group publication status page.

Expected completion indicates when the deliverable is projected to become a Recommendation, or otherwise reach a stable state.

Note that the titles of the documents are tentative and not final. The Working Group may also decide to either split some of those documents or, conversely, merge them.

Normative Specifications

The Working Group will deliver the following W3C normative specifications:

RDF Dataset Canonicalization (RDC)

This specification defines an algorithm that implements an RDF Dataset Canonicalization function.

Expected completion: WG-START + 24 months.

RDF Dataset Hash (RDH)

This specification defines how to apply a hash function to an arbitrary RDF Dataset. The steps include the generation of a canonical form of the RDF Dataset using the algorithm specified in the “RDF Dataset Canonicalization” deliverable, and the application of a hash function over a serialization (such as N-Quads) that has been ordered via an algorithm specified by the WG.
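As a non-normative illustration of the steps described above, the following minimal Python sketch hashes a dataset that is already available as N-Quads statements. It rests on several assumptions that this charter does not make: the function name `hash_quads` is hypothetical, the input contains no blank nodes (so the canonical relabelling step, which is the hard part of canonicalization, is not shown), each statement is already written in a single normalized N-Quads form, the ordering is plain code point order, and SHA-256 stands in for whichever hash function the Working Group selects.

```python
import hashlib


def hash_quads(nquads_lines, algorithm="sha256"):
    """Return a hex digest for a dataset given as N-Quads statements.

    Assumption: the statements contain no blank nodes and are already
    in one normalized textual form, so no relabelling is needed.
    """
    # A dataset is a set of quads: drop empty lines and duplicates,
    # then order the statements by Unicode code point.
    statements = sorted({line.strip() for line in nquads_lines if line.strip()})

    # Serialize as one statement per line, each terminated by a newline.
    document = "".join(statement + "\n" for statement in statements)

    # Apply the chosen hash function to the UTF-8 encoded serialization.
    return hashlib.new(algorithm, document.encode("utf-8")).hexdigest()


quads = [
    '<http://example.org/s> <http://example.org/p> "2" .',
    '<http://example.org/s> <http://example.org/p> "1" .',
]

# The same statements in a different order yield the same digest.
same = hash_quads(quads) == hash_quads(list(reversed(quads)))
print(same)
```

Because the statements are deduplicated and sorted before hashing, the digest depends only on the set of statements and not on the order in which a particular serialization lists them. Datasets containing blank nodes would first need the canonical labelling defined by the “RDF Dataset Canonicalization” deliverable.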

Expected completion: WG-START + 24 months.

Other Deliverables

Other non-normative documents may be created, such as:

  • Test suite and implementation report for each specification.

Timeline

  • WG-START +1 month: First teleconference
  • WG-START +3 months: FPWD for RDC and RDH
  • WG-START +15 months: CR for RDC and RDH
  • WG-START +24 months: REC for all standards-track documents

Success Criteria

In order to advance to Proposed Recommendation, each normative specification is expected to have at least two independent implementations of every feature defined in the specification.

Each specification should contain separate sections detailing all known security and privacy implications for implementers, Web authors, and end users.

There should be testing plans for each specification, starting from the earliest drafts.

To promote interoperability, all changes made to specifications should have tests.

Coordination

For all specifications, this Working Group will seek horizontal review for accessibility, internationalization, performance, privacy, and security with the relevant Working and Interest Groups, and with the TAG. Invitation for review must be issued during each major standards-track document transition, including FPWD. The Working Group is encouraged to engage collaboratively with the horizontal review groups throughout development of each specification. The Working Group is advised to seek a review at least 3 months before first entering CR and is encouraged to proactively notify the horizontal review groups when major changes occur in a specification following a review.

Additional technical coordination with the following Groups will be made, per the W3C Process Document:

W3C Groups

Verifiable Credentials Working Group
To synchronize the definition and usage of the RDF Dataset Canonicalization and Hash, both needed for the ability to provide proofs for Verifiable Credentials.
Dataset Exchange Working Group
To synchronize on the needs and requirements of dataset publications and exchange regarding canonicalization.
Web Application Security Working Group
To ensure that the canonicalization and hashing mechanisms defined in this group have similar security properties to the rest of the Web, and to take advantage of lessons learned while designing other canonicalization systems.
Web of Things Working Group
To synchronize on the needs and requirements of the WoT community, in particular on the subject of WoT Thing Descriptions, regarding canonicalization.
Credentials Community Group
Coordination on other specifications incubated and maintained by the Credentials Community Group at W3C.
RDF-DEV Community Group
To synchronize on the further evolution of the RDF Standard, such as canonicalization and hash functions for Generalized or RDF-star Graphs and Datasets.

External Organizations

Internet Engineering Task Force Crypto Forum Research Group
To perform broad horizontal reviews on the output of the Working Group and to ensure that new pairing-based and post-quantum cryptographic algorithms and parameters can be integrated into the RDF Dataset Hash ecosystem.
Hyperledger Aries
To coordinate on broad horizontal reviews and implementations related to the specifications developed by the Working Group.
Decentralized Identity Foundation Interoperability Working Group
To coordinate on broad horizontal review and integration of the specifications developed by the Working Group into the Decentralized Identity Foundation's ecosystem.

Participation

To be successful, this Working Group is expected to have 6 or more active participants for its duration, including representatives from the key implementors of this specification, and active Editors and Test Leads for each specification. The Chairs, specification Editors, and Test Leads are expected to contribute half of a working day per week towards the Working Group. There is no minimum requirement for other Participants.

The group encourages questions, comments and issues on its public mailing lists and document repositories, as described in Communication.

The group also welcomes non-Members to contribute technical submissions for consideration upon their agreement to the terms of the W3C Patent Policy.

Participants in the group are required (by the W3C Process) to follow the W3C Code of Ethics and Professional Conduct.

Communication

Technical discussions for this Working Group are conducted in public: the meeting minutes from teleconference and face-to-face meetings will be archived for public review, and technical discussions and issue tracking will be conducted in a manner that can be both read and written to by the general public. Working Drafts and Editor's Drafts of specifications will be developed in public repositories and may permit direct public contribution requests. The meetings themselves are not open to public participation, however.

Information about the group (including details about deliverables, issues, actions, status, participants, and meetings) will be available from the RDF Dataset Canonicalization and Hash Working Group home page.

Most RDF Dataset Canonicalization and Hash Working Group teleconferences will focus on discussion of particular specifications, and will be conducted on an as-needed basis.

This group primarily conducts its technical work on GitHub issues. The public is invited to review, discuss and contribute to this work.

The group may use a Member-confidential mailing list for administrative purposes and, at the discretion of the Chairs and members of the group, for member-only discussions in special cases when a participant requests such a discussion.

Decision Policy

This group will seek to make decisions through consensus and due process, per the W3C Process Document (section 3.3). Typically, an editor or other participant makes an initial proposal, which is then refined in discussion with members of the group and other reviewers, and consensus emerges with little formal voting being required.

However, if a decision is necessary for timely progress and consensus is not achieved after careful consideration of the range of views presented, the Chairs may call for a group vote and record a decision along with any objections.

To afford asynchronous decisions and organizational deliberation, any resolution (including publication decisions) taken in a face-to-face meeting or teleconference will be considered provisional. A call for consensus (CfC) will be issued for all resolutions (for example, via email and/or web-based survey), with a response period from one week to 10 working days, depending on the Chairs' evaluation of the group consensus on the issue. If no objections are raised on the mailing list by the end of the response period, the resolution will be considered to have consensus as a resolution of the Working Group.

All decisions made by the group should be considered resolved unless and until new information becomes available or unless reopened at the discretion of the Chairs or the Director.

This charter is written in accordance with the W3C Process Document (Section 3.4, Votes) and includes no voting procedures beyond what the Process Document requires.

Patent Policy

This Working Group operates under the W3C Patent Policy (Version of 15 September 2020). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis. For more information about disclosure obligations for this group, please see the W3C Patent Policy Implementation.

Licensing

This Working Group will use the W3C Software and Document license for all its deliverables.

About this Charter

This charter has been created according to section 5.2 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.

Charter History

The following table lists details of all changes from the initial charter, per the W3C Process Document (section 5.2.3):

Charter Period     Start Date      End Date        Changes
Initial Charter    21 July 2022    20 July 2024    none