The Provenance Working Group began its activities with a charter naming some 17
concepts relevant to provenance, such as resource, process
execution, use, derivation, and version.
In the first three months leading up to our
first face-to-face meeting, we debated definitions for these
concepts. Importantly, for the social cohesion of the group, we
developed a common vocabulary that members could use to communicate.
Following the first face-to-face meeting, editors were tasked with
producing a concrete document, against which the group could formally
raise issues and make concrete proposals. In October, this document
was released as a first public working draft. We were aware of its
limitations, but it served an important purpose: it set the
direction and scope of the model we were proposing to
standardize.
Since then, the group has worked hard to rationalise the
concepts of the PROV data model. Key highlights include:
- introduction of the notion of responsibility, which may be
assigned to agents, for the activities they participated in
- a better characterisation of derivation, which represents, for
example, the transformation of a raw data set into linked data
- ability for the model to track how collections of data
evolved
- a relation which expresses that two different descriptions
relate in some way to the same thing in the world
- definition of a set of constraints, which allow humans and
reasoners to determine whether a set of provenance assertions makes
sense
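As a rough illustration of the items above, derivation and responsibility can be pictured as simple records, with one sanity check run over a set of assertions. This is only a sketch: the class names, field names, and the particular check (nothing should be derived, directly or indirectly, from itself) are hypothetical choices made for this example, not the notation or the constraints defined by the working group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record: `derived` was obtained by transforming `source`."""
    derived: str
    source: str


@dataclass(frozen=True)
class Responsibility:
    """Hypothetical record: `agent` bears responsibility for `activity`."""
    agent: str
    activity: str


def has_derivation_cycle(derivations):
    """Return True if some identifier is (transitively) derived from itself.

    This is an illustrative constraint chosen for the example only.
    """
    graph = {}
    for d in derivations:
        graph.setdefault(d.derived, set()).add(d.source)

    def reaches(start, target, seen):
        # Depth-first walk along "was derived from" edges.
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return any(reaches(node, node, set()) for node in graph)


# A raw data set transformed into linked data, by an activity someone is responsible for.
assertions = [Derivation(derived="linked_data", source="raw_data")]
publisher = Responsibility(agent="alice", activity="publish_linked_data")
print(has_derivation_cycle(assertions))
```

Adding a second assertion that `raw_data` was derived from `linked_data` would make the check report a cycle, which is the kind of situation a reasoner could flag as not making sense.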
The third working draft includes these changes, and we feel that the data model
has reached some level of stability, and that from now on any
release should be synchronised with the PROV ontology definition and the
PROV primer.
At our second face-to-face meeting, we debated intensively what
identifiers of the model denote. A challenge one faces with
provenance (as well as any form of metadata) is that provenance may
no longer be valid if the subject of provenance changes. To make
provenance assertions robust, a partial state of the subject has to
be characterised in terms of time and attributes, and its
provenance expressed.
However, a lot of current practice simply identifies the subject
of provenance with a URI, where nothing is said about the state of the
identified resource. Thus, the prov-wg has decided to present
the data model in a way that supports this common usage. In a separate
document, an upgrade path will be proposed: to produce a more
robust form of provenance, extra assertions can make explicit the
extent to which provenance assertions keep an interpretation when
changes in subjects occur.
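The difference between the two styles can be sketched in a few lines. The example below is an illustration under stated assumptions, not the group's design: `SubjectDescription`, its fields, and the `still_describes` check are hypothetical names invented here. A description with no attributes and no time bounds stands for the common "URI only" usage, while a description that pins attributes and a validity interval stands for the more robust form.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class SubjectDescription:
    """Hypothetical description of the subject of some provenance."""
    uri: str
    # Attributes that characterise the partial state; empty means
    # nothing is said about the state of the resource.
    attributes: dict = field(default_factory=dict)
    valid_from: Optional[datetime] = None
    valid_until: Optional[datetime] = None

    def still_describes(self, current_attributes, at):
        """Check whether this description matches the subject's state at time `at`."""
        if self.valid_from is not None and at < self.valid_from:
            return False
        if self.valid_until is not None and at > self.valid_until:
            return False
        return all(
            current_attributes.get(key) == value
            for key, value in self.attributes.items()
        )


# Common usage: only a URI, so a change in the subject goes unnoticed.
bare = SubjectDescription("http://example.org/report")

# More robust form: the state is pinned by an attribute and a time interval.
pinned = SubjectDescription(
    "http://example.org/report",
    attributes={"version": "1"},
    valid_from=datetime(2011, 1, 1),
    valid_until=datetime(2011, 12, 31),
)

later = datetime(2012, 2, 1)
print(bare.still_describes({"version": "2"}, later))
print(pinned.still_describes({"version": "2"}, later))
```

With the bare description, provenance keeps being attached to the URI even after the resource has changed; with the pinned description, the mismatch is detectable, which is the extra robustness the upgrade path is meant to provide.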
Work on the fourth working draft has already begun; when
complete, I will blog again about it.