What is new in the Fourth Working Draft of the PROV provenance model?

The Provenance Working Group has released the fourth public working draft of its data model. The purpose of this blog post is to summarize the changes since the third working draft.

From an editorial perspective, three significant changes have taken place since the last release.

  1. The document has been reorganized into three separate documents. The data model document focuses on defining the vocabulary, in terms of its types and relations. A second document lists the constraints that should be checked to determine whether provenance descriptions are valid. Finally, a third document presents the details of the PROV notation, aimed at human consumption.
  2. Each concept is defined with a simple English definition. A few starting points of the data model are presented early and used to illustrate it with an example.
  3. The types and relations of the data model are structured into a set of six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) agents bearing responsibility for entities that were generated and activities that happened; (3) derivations of entities from entities; (4) properties to link entities that refer to the same thing; (5) collections forming a logical structure for their members; (6) a simple annotation mechanism. (A minimal sketch of the first three components follows this list.)
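
To make the first three components more concrete, here is a minimal Python sketch of how their core types and relations might be captured as simple records. The class and field names, identifiers (ex:report, ex:compile, ex:alice), and times are hypothetical illustrations, not the normative PROV vocabulary.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    """Component (1): a thing whose provenance is described."""
    id: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Activity:
    """Component (1): something that occurs over a period of time."""
    id: str
    started: Optional[str] = None
    ended: Optional[str] = None

@dataclass
class Agent:
    """Component (2): something that bears responsibility."""
    id: str

# Relations kept as simple (subject, object) pairs; identifiers are made up.
used = [("ex:compile", "ex:draft")]               # an activity used an entity
was_generated_by = [("ex:report", "ex:compile")]  # an entity was generated by an activity
was_attributed_to = [("ex:report", "ex:alice")]   # component (2): responsibility
was_derived_from = [("ex:report", "ex:draft")]    # component (3): derivation

report = Entity("ex:report", {"ex:version": "4"})
compile_step = Activity("ex:compile", "2012-05-01T10:00:00", "2012-05-01T10:05:00")
alice = Agent("ex:alice")
print(report, compile_step, alice, sep="\n")
```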

As far as the data model is concerned, our effort to simplify its various concepts has paid off, resulting in a data model that is more mature and stable. Key highlights include:

  • We simplified the notion of derivation (and its subtypes);
  • We clarified what identifiers denote;
  • We introduced a notion of entity invalidation, an event that marks the end of an entity's lifetime (see the sketch after this list);
  • We dropped the idea that accounts can be nested.
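
One way to read the new invalidation concept is that generation and invalidation are the two events bounding an entity's lifetime. The Python sketch below illustrates this reading; the identifiers and times are hypothetical and the check is an illustration, not a rule quoted from the draft.

```python
from datetime import datetime
from typing import Optional

class EntityLifetime:
    def __init__(self, entity_id: str, generated_at: datetime,
                 invalidated_at: Optional[datetime] = None):
        self.entity_id = entity_id
        self.generated_at = generated_at        # generation event: lifetime starts
        self.invalidated_at = invalidated_at    # invalidation event: lifetime ends

    def alive_at(self, t: datetime) -> bool:
        """True if the entity exists at time t (generated, not yet invalidated)."""
        if t < self.generated_at:
            return False
        return self.invalidated_at is None or t <= self.invalidated_at

# Hypothetical example: a ticket that stops being usable after one hour.
ticket = EntityLifetime("ex:ticket",
                        generated_at=datetime(2012, 5, 1, 9, 0),
                        invalidated_at=datetime(2012, 5, 1, 10, 0))
print(ticket.alive_at(datetime(2012, 5, 1, 9, 30)))   # True
print(ticket.alive_at(datetime(2012, 5, 1, 11, 0)))   # False
```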

The fourth working draft includes these changes, and we feel that the data model, expressed according to various technologies, e.g. RDF, XML, JSON, is now usable. Examples of provenance can be expressed concisely for simple use cases, but the model is also expressive enough to tackle sophisticated ones. Tools are now being developed to manipulate PROV representations. Ultimately, the data model offers a vocabulary consisting of some 22 terms. The use of this vocabulary is essentially unconstrained. To help developers, a notion of valid provenance has been defined; a set of constraints has to be satisfied for provenance assertions to be valid.
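
As a rough flavour of what a JSON-based expression of provenance could look like, here is a small Python sketch that builds and prints one. The key layout and identifiers below are illustrative assumptions only, not the Working Group's official serialization.

```python
import json

# A tiny, hypothetical provenance record: ex:report was generated by
# ex:compile, which used ex:draft; ex:report is derived from ex:draft.
provenance = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {
        "ex:draft": {},
        "ex:report": {"ex:version": "4"},
    },
    "activity": {
        "ex:compile": {"startTime": "2012-05-01T10:00:00"},
    },
    "used": {
        "_:u1": {"activity": "ex:compile", "entity": "ex:draft"},
    },
    "wasGeneratedBy": {
        "_:g1": {"entity": "ex:report", "activity": "ex:compile"},
    },
    "wasDerivedFrom": {
        "_:d1": {"generatedEntity": "ex:report", "usedEntity": "ex:draft"},
    },
}

print(json.dumps(provenance, indent=2))
```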

The PROV Working Group has decided to produce a synchronized release of most of its documents, including a PROV primer and a PROV ontology. See Paul's blog for an overview of these documents.

Work on the next working draft has already begun. Our aim for the data model is to address the remaining technical issues related to the provenance of provenance, and to simplify the data model further. When complete, I will blog again about it.
