The PROV ontology – an update

The W3C Provenance working group has released a new set of working drafts for the PROV standard. In this post we present a brief overview of the PROV ontology (PROV-O) using an example from the PROV primer.
The core classes of PROV include prov:Entity, prov:Activity and prov:Agent. Using these, one can describe the provenance of a resource in a brief, step-by-step manner. The example below, as explained in the primer, shows how the chart ex:chart1 in a fictional news article about crime figures was made by ex:derek, who also composed the data items ex:dataSet1 and ex:regionList into ex:composition.

[Figure: example provenance graph]

An entity in PROV is a physical, digital, conceptual, or other kind of thing, real or imaginary. An entity must have some fixed aspects so that provenance information can be stated about it. An activity is something that occurred over a period of time, and an agent is something or someone that was responsible for, or otherwise associated with, what happened in an activity. Entities can be attributed to the agents responsible for their generation.
For the chart ex:chart1 we can start by stating who created the chart (prov:wasAttributedTo) and how it was created (prov:wasGeneratedBy). We can then say more by pointing out which data sources were used to create the chart, and then describe the provenance of those data sources in the same way as we did for the chart.

@prefix prov: <> .
@prefix foaf: <> .
@prefix ex: <> .
@prefix rdfs: <> .

ex:chart1      a                      prov:Entity ;
               prov:wasGeneratedBy    ex:illustrate ;
               prov:wasAttributedTo   ex:derek .

ex:derek       a                      prov:Person, prov:Agent ;
               foaf:givenName         "Derek" ;
               foaf:mbox              <> ;
               prov:actedOnBehalfOf   ex:chartgen .

ex:chartgen    a                      prov:Organization, prov:Agent ;
               foaf:name              "Chart Generators Inc" .

ex:illustrate  a                      prov:Activity ;
               prov:used              ex:composition ;
               prov:wasAssociatedWith ex:derek .

ex:composition a                      prov:Entity ;
               prov:wasGeneratedBy    ex:compose .

ex:compose     a                      prov:Activity ;
               prov:used              ex:dataSet1, ex:regionList ;
               prov:wasAssociatedWith ex:derek .

ex:dataSet1    a                      prov:Entity .
ex:regionList  a                      prov:Entity .
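
The step-by-step structure above lends itself to simple graph traversal. The following is a minimal sketch in plain Python, not an RDF toolkit: the triples are copied by hand from the listing (type and FOAF statements omitted), and the helper names `objects` and `upstream_entities` are hypothetical, introduced only for illustration.

```python
# Triples copied by hand from the Turtle example (types and FOAF details omitted).
TRIPLES = [
    ("ex:chart1", "prov:wasGeneratedBy", "ex:illustrate"),
    ("ex:chart1", "prov:wasAttributedTo", "ex:derek"),
    ("ex:illustrate", "prov:used", "ex:composition"),
    ("ex:illustrate", "prov:wasAssociatedWith", "ex:derek"),
    ("ex:composition", "prov:wasGeneratedBy", "ex:compose"),
    ("ex:compose", "prov:used", "ex:dataSet1"),
    ("ex:compose", "prov:used", "ex:regionList"),
    ("ex:compose", "prov:wasAssociatedWith", "ex:derek"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o for which (subject, predicate, o) is stated."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def upstream_entities(entity, triples=TRIPLES):
    """Collect the entities that `entity` depends on, directly or indirectly.

    Repeatedly follows entity -> prov:wasGeneratedBy -> activity -> prov:used.
    """
    found = []
    stack = [entity]
    while stack:
        current = stack.pop()
        for activity in objects(current, "prov:wasGeneratedBy", triples):
            for used in objects(activity, "prov:used", triples):
                if used not in found:  # also guards against cycles
                    found.append(used)
                    stack.append(used)
    return found


if __name__ == "__main__":
    print(upstream_entities("ex:chart1"))
```

Starting from ex:chart1, the walk reaches ex:composition first and then the two data items that the compose activity used. An entity with no recorded generation, such as ex:dataSet1, yields an empty list.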

PROV-O is based on an activity-driven model. It is generic and can describe any provenance where the individual steps and agents are known. If more specifics are needed, PROV-O can be used as an extension or bridging point for defining or aligning domain-specific subclasses:

@prefix ext:   <>.
ext:DataSet    rdfs:subClassOf prov:Entity .
ext:Illustrate rdfs:subClassOf prov:Activity .
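
One consequence of such subclass statements is that a generic PROV consumer can still treat a resource typed as ext:DataSet as a prov:Entity. The sketch below illustrates that reasoning in plain Python under stated assumptions: the subclass pairs are copied from the listing above, and `superclasses` and `is_instance_of` are hypothetical helper names, not part of any PROV tooling.

```python
# (subclass, superclass) pairs copied from the rdfs:subClassOf statements above.
SUBCLASS_OF = [
    ("ext:DataSet", "prov:Entity"),
    ("ext:Illustrate", "prov:Activity"),
]


def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return `cls` together with every class reachable via rdfs:subClassOf."""
    seen = {cls}
    queue = [cls]
    while queue:
        current = queue.pop(0)
        for child, parent in subclass_of:
            if child == current and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_instance_of(declared_types, target, subclass_of=SUBCLASS_OF):
    """True if any declared type is `target` or a (transitive) subclass of it."""
    return any(target in superclasses(t, subclass_of) for t in declared_types)


if __name__ == "__main__":
    # A resource declared only as ext:DataSet still counts as a prov:Entity.
    print(is_instance_of(["ext:DataSet"], "prov:Entity"))
```

A real deployment would normally leave this inference to an RDFS or OWL reasoner; the loop only shows what that inference amounts to for the two statements given here.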

PROV-O allows binary relations like prov:used and prov:wasAssociatedWith to be qualified using their corresponding involvement classes, such as prov:Use and prov:Association, in order to specify additional attributes about these relations, like role, time, location or other domain-specific attributes:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:chart1 prov:wasGeneratedBy ex:illustrate ;
    prov:qualifiedGeneration [
       a prov:Generation ;
       prov:activity ex:illustrate ; # object of qualified wasGeneratedBy
       prov:atTime "2011-07-16T01:52:02Z"^^xsd:dateTime ;
       prov:atLocation <> ;
       ext:colours ext:red, ext:blue ;
       ext:tool <>
] .
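
To make the idea of a qualified relation concrete, the sketch below models it in plain Python: the binary relation says only that the chart was generated by an activity, while the qualified record carries the extra attributes. The `Generation` class and `generated_at` helper are hypothetical names chosen for illustration, and the location and tool values are left out because they are not given in the listing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Generation:
    """Illustrative stand-in for a prov:Generation node and its attributes."""
    activity: str
    at_time: datetime
    extra: dict = field(default_factory=dict)  # domain-specific attributes


# Qualified generations keyed by the generated entity.
QUALIFIED = {
    "ex:chart1": [
        Generation(
            activity="ex:illustrate",
            at_time=datetime.strptime(
                "2011-07-16T01:52:02Z", "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc),
            extra={"ext:colours": ["ext:red", "ext:blue"]},
        )
    ]
}


def generated_at(entity, activity, qualified=QUALIFIED):
    """Return when `activity` generated `entity`, or None if not recorded."""
    for generation in qualified.get(entity, []):
        if generation.activity == activity:
            return generation.at_time
    return None


if __name__ == "__main__":
    print(generated_at("ex:chart1", "ex:illustrate"))
```

The lookup is keyed on both the entity and the activity because the qualified node repeats the object of the binary relation, mirroring the prov:activity line in the listing.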

Sometimes not enough detail is known to describe complete activity-agent-entity interactions, or doing so would be too verbose. PROV-O therefore also provides direct entity-entity and agent-agent relations. These are important in their own right for understanding the history of a resource, and they can also be regarded as shortcuts for the activity-driven statements above. PROV includes a predefined set of such relations for common use cases, such as derivation, attribution, quotation, responsibility, specialization and dictionaries.
The example below captures the core information from our earlier example, but does not show details such as how the region list was combined with the dataset.

ex:chart1 prov:wasDerivedFrom  ex:dataSet1 ;
          prov:tracedTo        ex:regionList ;
          prov:wasAttributedTo ex:derek .

For the purpose of tracking provenance, an entity in PROV has some fixed aspects as well as some changeable aspects. For instance, a new crime chart created from an updated data set could be regarded as a new entity, ex:chart2, with the following provenance information:

ex:chart2   prov:wasDerivedFrom ex:dataSet2 ;
            prov:wasRevisionOf  ex:chart1 .
ex:dataSet2 prov:wasRevisionOf  ex:dataSet1 .
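
Revision statements like these form a chain that can be followed back to the earliest known version. The following is a minimal sketch in plain Python, assuming each entity revises at most one earlier entity (a simplification made for this example); `revision_history` is a hypothetical helper name.

```python
# entity -> the entity it is a revision of, copied from the statements above.
REVISION_OF = {
    "ex:chart2": "ex:chart1",
    "ex:dataSet2": "ex:dataSet1",
}


def revision_history(entity, revision_of=REVISION_OF):
    """Return `entity` followed by its earlier versions, newest first."""
    history = [entity]
    while True:
        previous = revision_of.get(history[-1])
        if previous is None or previous in history:  # end of chain or a cycle
            return history
        history.append(previous)


if __name__ == "__main__":
    print(revision_history("ex:chart2"))
```

For ex:chart2 the chain has two entries, the new chart followed by ex:chart1; an entity with no recorded revision, such as ex:chart1, is returned on its own.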

These kinds of structures in PROV allow asserters to move from a high-level overview of an entity’s history to a granular provenance trace.
We hope that you will explore the PROV models and consider adopting the future standards in your products. For an in-depth introduction to PROV, see the PROV primer; for ontology details and the OWL file, see PROV-O; and for the underlying data model, see the PROV Data Model.
The W3C Provenance working group is seeking feedback from the wider community on the PROV working drafts. Please send any comments to (subscribe, archives) or use the Twitter hashtag #provwg.
