Linking Across Provenance Bundles

W3C Working Group Note

Previous version: http://www.w3.org/TR/2013/WD-prov-links-20130312/
Luc Moreau, University of Southampton
Timothy Lebo, Rensselaer Polytechnic Institute


Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. Bundles, defined in [PROV-DM] as sets of provenance descriptions, were introduced in PROV as the mechanism by which provenance of provenance can be expressed. Because the validity of each bundle is established independently of the others [PROV-CONSTRAINTS], bundles act as islands of provenance descriptions.

In applications where provenance is created by multiple parties over time, it is useful for provenance descriptions created by one party to link to provenance descriptions created by another party. Such a mechanism would allow the "stitching" of provenance descriptions together. Given that provenance descriptions are expected to be contained in bundles, this would require a capability to link entity descriptions across bundles. To address this requirement, this document introduces a relation Mention allowing an entity description to be linked to another entity description occurring in another bundle.

The PROV Document Overview describes the overall state of PROV, and should be read before other PROV documents.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

PROV Family of Documents

This document is part of the PROV family of documents, a set of documents defining various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. These documents are listed below. Please consult the [PROV-OVERVIEW] for a guide to reading these documents.

Implementations Encouraged

The Provenance Working Group encourages implementation of the material defined in this document. Although work on this document by the Provenance Working Group is complete, errors may be recorded in the errata, and these may be addressed in future revisions.

Please Send Comments

This document was published by the Provenance Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-prov-comments@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

The PROV data model [PROV-DM] defines provenance as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. The specifications [PROV-O], [PROV-DM], [PROV-N], and [PROV-XML] have respectively defined the PROV ontology, the PROV conceptual model, the PROV notation, and the PROV XML schema, allowing provenance descriptions to be expressed, represented in various ways, and interchanged between systems across the Web.

The provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. To support this, provenance itself should be trusted, and therefore, provenance of provenance is itself an important aspect of establishing trust in an information infrastructure such as the Web. To this end, PROV introduces the concept of Bundle, defined as a named set of provenance descriptions; it is a mechanism by which provenance of provenance can be expressed (see also Bundle [PROV-O], Bundle [PROV-N] and Bundle [PROV-XML]). With bundles, sets of provenance descriptions can be given names and can themselves be regarded as entities, whose provenance can in turn be described using PROV. These sets of provenance descriptions stand independently of each other, as formalized by [PROV-CONSTRAINTS], which determines their validity by examining them in isolation from each other.

In a distributed environment, it is common to encounter applications that involve multiple parties: some party (a producer) creates some data and its provenance, whereas another party (a consumer) consumes the data and its provenance. In such a situation, the consumer, when it in turn generates provenance, often wants to augment the descriptions of entities generated by the producer. It is not suitable for the consumer to repeat the provenance created by the producer and augment the copy according to their needs. Instead, a consumer wants to refer to the description as created by the producer in situ and specialize it, allowing the consumer to add their own view on this entity. (The notion of specialization is defined in [PROV-DM].) Such a capability would allow parties to "stitch together" provenance descriptions that would otherwise be disconnected. For this to work, this specification assumes that provenance created by the producer is contained in a bundle, so that others, such as the consumer, can refer to it by means of the bundle identifier.

While URIs are the Web mechanism by which entities can be assigned identities, URIs alone are not sufficient for our purpose. Indeed, the entity produced by the producer is given a URI, but the same entity, with the same URI, could also be described in other bundles, by this producer or third parties. It is the capability of referring to the description of the entity, as created by the producer in a specific bundle, that is of interest to us in this specification.

This specification introduces a new concept Mention allowing an entity to be described as the specialization of another entity, itself described in another bundle. This specification provides not only a conceptual definition of Mention, but also the corresponding ontological, schema, and notational definitions, for the various representations of PROV. It also includes constraints that apply to this construct specifically. It is our aim to promote interoperability by defining Mention conceptually and in the representations of PROV.

The concept Mention is experimental, and for this reason was not defined in PROV recommendation-track documents. The Provenance Working Group is seeking feedback from the community on its usefulness in practical scenarios.

1.1 PROV Namespace

The PROV namespace URI is http://www.w3.org/ns/prov# and prefix prov.

All the concepts, reserved names, classes, properties, attributes introduced in this specification belong to the PROV namespace.

1.2 Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Conceptual Definition of Mention

An entity e1 may be mentioned in a bundle b, which contains some descriptions about this entity e1: how e1 was generated and used, which activities e1 is involved with, the agents e1 is attributed to, etc. Other bundles may contain other descriptions about the same entity e1.

Some applications may want to augment the descriptions of entity e1 found in a bundle b with other information. They cannot add these descriptions to bundle b since this would result in a different bundle. Alternatively, they may create a new bundle with descriptions from bundle b and novel descriptions, but this results in an undesirable copy of the bundle.

To this end, PROV allows a new entity e2 to be created and defined as a specialization of the preceding entity e1, presenting at least one additional aspect: the bundle b containing some descriptions of e1. With this relation, applications that process e2 can know that the attributes of e2 may have been computed according to the descriptions of e1 in b. (The term 'aspect' should be understood informally as "a particular part or feature of something"; the term is used in [PROV-DM]'s definitions of entity (Section 5.1.1), specialization (Section 5.5.1), alternate (Section 5.5.2), and in Section 2.1 of [PROV-CONSTRAINTS].)

Figure 1 depicts the relation MentionOf (concept Mention) as a ternary relation.

Figure 1: UML Diagram for Mention

Thus, a mention relates two entities with regard to a bundle. It is a special case of specialization.

The mention of an entity in a bundle (containing a description of this entity) is another entity that is a specialization of the former and that presents at least the bundle as a further additional aspect.

An entity is interpreted with respect to a bundle's description in a domain specific manner. The mention of this entity with respect to this bundle offers the opportunity to specialize it according to some domain-specific interpretation.

A mention of an entity in a bundle results in a specialization of this entity with extra fixed aspects, including the bundle that it is described in.

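A mention relation, written prov:mentionOf(infra, supra, b) in PROV-N, has:

specificEntity: an identifier (infra) of the entity that is a mention of the general entity (supra);
generalEntity: an identifier (supra) of the entity that is being mentioned;
bundle: an identifier (b) of a bundle that contains a description of supra and further constitutes one additional aspect presented by infra.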

Like specialization, a mention is not, as defined here, an influence, and therefore does not have an id and attributes. Its grammar, in the provenance notation, is written as follows.

    mentionExpression    ::=    "prov:mentionOf" "(" eIdentifier "," eIdentifier "," bIdentifier ")"
    bIdentifier    ::=    identifier
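As a rough illustration of the grammar above, the following Python sketch recognizes a single mentionExpression and extracts its three identifiers. It is not a PROV-N parser: the identifier pattern is a simplified stand-in for the PROV-N identifier production, and the function name and group names (specific, general, bundle) are chosen here for illustration only.

```python
import re

# Simplified qualified name such as ex:Bob or tool:Bob-2011-11-16.
# Assumption: this is narrower than the real PROV-N identifier production.
_QNAME = r"[A-Za-z_][\w.\-]*:[\w.\-]+"

# "prov:mentionOf" "(" eIdentifier "," eIdentifier "," bIdentifier ")"
_MENTION = re.compile(
    r"^\s*prov:mentionOf\s*\(\s*"
    r"(?P<specific>" + _QNAME + r")\s*,\s*"
    r"(?P<general>" + _QNAME + r")\s*,\s*"
    r"(?P<bundle>" + _QNAME + r")\s*\)\s*$"
)


def parse_mention(expression):
    """Return (specific, general, bundle), or None if the text does not match."""
    match = _MENTION.match(expression)
    if match is None:
        return None
    return match.group("specific"), match.group("general"), match.group("bundle")


if __name__ == "__main__":
    print(parse_mention("prov:mentionOf(tool:Bob-2011-11-16, ex:Bob, ex:run1)"))
```

An expression with only two identifiers is rejected, which mirrors the fact that the bundle identifier is mandatory in the grammar.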

The following table summarizes how each constituent of a Mention maps to a syntax element, in the provenance notation.

Mention           Non-Terminal
specificEntity    eIdentifier
generalEntity     eIdentifier
bundle            bIdentifier

We note that prov:mentionOf cannot be inferred from Specialization. Indeed, let us consider a bundle and the expression specializationOf(e2,e1) occurring in this bundle. The entity e1 may be described in multiple other bundles bi. From specializationOf(e2,e1), one cannot infer prov:mentionOf(e2,e1,b) for a given b, since it is unknown which bi's descriptions were used to compute additional aspects of e2. Hence, prov:mentionOf has to be asserted.

We note that the concept Mention is an extension of the PROV data model [PROV-DM]; therefore, its textual notation has to be prefixed, prov:mentionOf, according to the extensibility rules of the provenance notation [PROV-N].

Section 5 presents constraints applicable to Mention, and in particular, the fact that an entity can be the specific entity of a Mention at most once.

Example 1

This example is concerned with a performance rating tool that reads and processes provenance to determine the performance of agents. To keep the example simple, an agent's performance is determined by the duration of the activities it is associated with.

As an illustration, we consider two bundles ex:run1 and ex:run2 that refer to an agent ex:Bob that controlled two activities ex:a1 and ex:a2.

bundle ex:run1
    activity(ex:a1, 2011-11-16T16:00:00, 2011-11-16T17:00:00)  // duration: 1 hour
    wasAssociatedWith(ex:a1, ex:Bob, [prov:role='ex:controller'])
endBundle

bundle ex:run2
    activity(ex:a2, 2011-11-17T10:00:00, 2011-11-17T17:00:00)  // duration: 7 hours
    wasAssociatedWith(ex:a2, ex:Bob, [prov:role='ex:controller'])
endBundle

The performance rating tool reads these bundles, and rates the performance of the agent described in these bundles. The performance rating tool creates a new bundle tool:analysis01 containing the following. A new agent tool:Bob-2011-11-16 is declared as a mention of ex:Bob as described in bundle ex:run1, and likewise for tool:Bob-2011-11-17 with respect to ex:run2. The tool adds a domain-specific performance attribute to each of these specialized entities as follows: the performance of the agent in the first bundle is judged to be good since the duration of ex:a1 is one hour, whereas it is judged to be bad in the second bundle since ex:a2's duration is seven hours. The attribute perf:rating is an example of additional attribute of the specialized agents tool:Bob-2011-11-16 and tool:Bob-2011-11-17.

bundle tool:analysis01
    agent(tool:Bob-2011-11-16, [perf:rating="good"])
    prov:mentionOf(tool:Bob-2011-11-16, ex:Bob, ex:run1)

    agent(tool:Bob-2011-11-17, [perf:rating="bad"])
    prov:mentionOf(tool:Bob-2011-11-17, ex:Bob, ex:run2)
endBundle
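The behaviour of such a rating tool can be sketched in Python as follows. This is only an illustration under stated assumptions: the contents of ex:run1 and ex:run2 are transcribed by hand into a dictionary rather than parsed, the cut-off GOOD_MAX_HOURS is hypothetical (the example only states that one hour is good and seven hours is bad), and the function names are invented for this sketch.

```python
from datetime import datetime

# Activity times and associations transcribed by hand from ex:run1 and ex:run2.
BUNDLES = {
    "ex:run1": {"activity": "ex:a1", "agent": "ex:Bob",
                "start": "2011-11-16T16:00:00", "end": "2011-11-16T17:00:00"},
    "ex:run2": {"activity": "ex:a2", "agent": "ex:Bob",
                "start": "2011-11-17T10:00:00", "end": "2011-11-17T17:00:00"},
}

# Hypothetical threshold: the example does not say where "good" ends.
GOOD_MAX_HOURS = 2.0


def rate(bundle_id, record):
    """Return the two PROV-N statements describing the agent as seen in one bundle."""
    start = datetime.fromisoformat(record["start"])
    end = datetime.fromisoformat(record["end"])
    hours = (end - start).total_seconds() / 3600
    rating = "good" if hours <= GOOD_MAX_HOURS else "bad"
    # Mint a new identifier for the specialized agent, e.g. tool:Bob-2011-11-16.
    local_name = record["agent"].split(":", 1)[1]
    mention_id = "tool:{}-{}".format(local_name, start.date().isoformat())
    return [
        'agent({}, [perf:rating="{}"])'.format(mention_id, rating),
        "prov:mentionOf({}, {}, {})".format(mention_id, record["agent"], bundle_id),
    ]


def analyse(bundles):
    """Assemble the statements for all bundles into one PROV-N bundle."""
    lines = ["bundle tool:analysis01"]
    for bundle_id, record in bundles.items():
        lines.extend("    " + statement for statement in rate(bundle_id, record))
    lines.append("endBundle")
    return "\n".join(lines)


if __name__ == "__main__":
    print(analyse(BUNDLES))
```

Each rating is attached to a freshly named agent rather than to ex:Bob itself, so the judgement stays tied to the bundle it was derived from.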

Example 2

Consider the following bundle of descriptions. It describes how ex:report2 was derived from ex:report1.

bundle obs:bundle1
  entity(ex:report1, [ prov:type="report", ex:version=1 ])
  wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
  entity(ex:report2, [ prov:type="report", ex:version=2 ])
  wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
  wasDerivedFrom(ex:report2, ex:report1)
endBundle

Bundle obs:bundle1 was attributed to agent ex:observer01, as described by the following:

entity(obs:bundle1, [ prov:type='prov:Bundle' ])
wasAttributedTo(obs:bundle1, ex:observer01)

Let us assume that bundle obs:bundle1 is rendered by a visualization tool. It may be useful for the visualization layout of this bundle to be shared along with the provenance descriptions, so that other users can render provenance as it was originally rendered. The original bundle obviously cannot be changed. However, one can create a new bundle, as follows.

bundle tool:bundle2
  entity(tool:bundle2, [ prov:type='viz:Configuration', prov:type='prov:Bundle' ])
  wasAttributedTo(tool:bundle2, viz:Visualizer)

  entity(tool:report1, [ viz:color="orange" ])
  prov:mentionOf(tool:report1, ex:report1, obs:bundle1)

  entity(tool:report2, [ viz:color="blue" ])
  prov:mentionOf(tool:report2, ex:report2, obs:bundle1)
endBundle

In bundle tool:bundle2, the prefix viz is used for naming visualization-specific attributes, types or values.

This example is typical of a common situation in distributed environments, where the consumer and producer of provenance are different.

Bundle tool:bundle2 is given type viz:Configuration to indicate that it consists of descriptions that pertain to the configuration of the visualization tool. This type attribute can be used for searching bundles containing visualization-related descriptions.

The visualization tool created new identifiers tool:report1 and tool:report2. They denote entities which are specializations of ex:report1 and ex:report2, described in bundle obs:bundle1, with visualization attribute for the color to be used when rendering these entities.
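How another user's renderer might reuse these descriptions can be sketched in Python as follows. The sketch assumes that the statements of tool:bundle2 have already been loaded into plain Python structures (transcribed by hand here); the function name colors_for_bundle and the data layout are illustrative and not part of PROV.

```python
# Mentions asserted in tool:bundle2, as (specific, general, bundle) tuples.
MENTIONS = [
    ("tool:report1", "ex:report1", "obs:bundle1"),
    ("tool:report2", "ex:report2", "obs:bundle1"),
]

# Attributes carried by the specialized entities in tool:bundle2.
ATTRIBUTES = {
    "tool:report1": {"viz:color": "orange"},
    "tool:report2": {"viz:color": "blue"},
}


def colors_for_bundle(bundle_id, mentions, attributes):
    """Map each entity described in bundle_id to the color recorded for it."""
    colors = {}
    for specific, general, bundle in mentions:
        if bundle != bundle_id:
            continue  # this mention refers to a description in another bundle
        color = attributes.get(specific, {}).get("viz:color")
        if color is not None:
            colors[general] = color
    return colors


if __name__ == "__main__":
    print(colors_for_bundle("obs:bundle1", MENTIONS, ATTRIBUTES))
```

The lookup is keyed on the bundle identifier as well as the entity, so colors recorded for a description of ex:report1 in some other bundle would not be applied when rendering obs:bundle1.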

3. Ontological Definition of Mention

The properties defined in this document are included in the default namespace of PROV. Users of the ontology have the option of importing <http://www.w3.org/ns/prov#>, which includes all extensions, including PROV-Links. Additionally, the [OWL file for PROV-Links] is available for download.

The ternary relation Mention is encoded as two properties: prov:mentionOf and prov:asInBundle, defined as follows.

Property: prov:mentionOf (object property)

IRI: http://www.w3.org/ns/prov#mentionOf

prov:mentionOf is used to specialize an entity as described in another bundle. It is to be used in conjunction with prov:asInBundle.

prov:asInBundle is used to cite the Bundle in which the generalization was mentioned.

has super-properties: prov:specializationOf
has domain: prov:Entity
has range: prov:Entity

Property: prov:asInBundle (object property)

IRI: http://www.w3.org/ns/prov#asInBundle

prov:asInBundle is used to specify the bundle in which the general entity of a prov:mentionOf property is described.

When :x prov:mentionOf :y and :y is described in Bundle :b, the triple :x prov:asInBundle :b is also asserted to cite the Bundle in which :y was described.

has domain: prov:Entity
has range: prov:Bundle

Example 3

We revisit Example 1, encoding in RDF the rating of Bob in the context of the second activity. For this, we use the TriG notation to express bundles :run2 and tool:analysis01.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix tool: <http://example.com/tool/> .
@prefix perf: <http://example.com/performance/> .
@prefix :     <http://example.com/> .

:run2 {
    :a2
      a prov:Activity;
      prov:startedAtTime "2011-11-17T10:00:00"^^xsd:dateTime;
      prov:endedAtTime   "2011-11-17T17:00:00"^^xsd:dateTime;
      prov:wasAssociatedWith :bob .
}

tool:analysis01 {
    tool:bob-2011-11-17
      a prov:Agent;
      prov:mentionOf  :bob;
      prov:asInBundle :run2;
      perf:rating     perf:very-slow .

    # This is inferred from prov:mentionOf
    tool:bob-2011-11-17 prov:specializationOf :bob .
}
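The encoding of the ternary relation as two properties can be sketched in Python as follows. This is an illustration only: triples are modelled as plain tuples of IRI strings rather than with an RDF library, and the two function names are invented here. Rebuilding mentions from triples relies on each subject carrying at most one prov:mentionOf and one prov:asInBundle, in line with the constraints of Section 5.

```python
PROV = "http://www.w3.org/ns/prov#"


def mention_to_triples(specific, general, bundle):
    """Encode one ternary mention as the two RDF statements used in this section."""
    return [
        (specific, PROV + "mentionOf", general),
        (specific, PROV + "asInBundle", bundle),
    ]


def triples_to_mentions(triples):
    """Rebuild (specific, general, bundle) tuples from a collection of triples."""
    general_of = {}
    bundle_of = {}
    for subject, predicate, obj in triples:
        if predicate == PROV + "mentionOf":
            general_of[subject] = obj
        elif predicate == PROV + "asInBundle":
            bundle_of[subject] = obj
    # A subject needs both properties to stand for a complete mention.
    return [(s, general_of[s], bundle_of[s]) for s in general_of if s in bundle_of]


if __name__ == "__main__":
    triples = mention_to_triples(
        "http://example.com/tool/bob-2011-11-17",
        "http://example.com/bob",
        "http://example.com/run2",
    )
    for triple in triples:
        print(triple)
    print(triples_to_mentions(triples))
```

A subject that has prov:mentionOf without prov:asInBundle is skipped by triples_to_mentions, since the bundle cannot be recovered from the remaining triple.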

4. XML Schema for Mention

The [XML schema for PROV-Links] is available for download, and includes prov-core.xsd, the core schema of PROV. Alternatively, the default schema, prov.xsd, can be used; it imports prov-core.xsd and all extension schemas developed by the Working Group.

Type definition in XML Schema:

<xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="Mention">
  <xs:sequence>
    <xs:element name="specificEntity" type="prov:IDRef"/>
    <xs:element name="generalEntity" type="prov:IDRef"/>
    <xs:element name="bundle" type="prov:IDRef"/>
  </xs:sequence>
</xs:complexType>

Usage in XML:

<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" name="mentionOf" type="prov:Mention" substitutionGroup="prov:internalElement"/>

Example 4

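  <prov:bundleContent prov:id="ex:run1">
    <prov:activity prov:id="ex:a1">
      <prov:startTime>2011-11-16T16:00:00</prov:startTime>
      <prov:endTime>2011-11-16T17:00:00</prov:endTime>
    </prov:activity>
    <prov:wasAssociatedWith>
      <prov:activity prov:ref="ex:a1" />
      <prov:agent prov:ref="ex:Bob" />
      <prov:role xsi:type="xsd:QName">ex:controller</prov:role>
    </prov:wasAssociatedWith>
  </prov:bundleContent>

  <prov:bundleContent prov:id="ex:run2">
    <prov:activity prov:id="ex:a2">
      <prov:startTime>2011-11-17T10:00:00</prov:startTime>
      <prov:endTime>2011-11-17T17:00:00</prov:endTime>
    </prov:activity>
    <prov:wasAssociatedWith>
      <prov:activity prov:ref="ex:a2" />
      <prov:agent prov:ref="ex:Bob" />
      <prov:role xsi:type="xsd:QName">ex:controller</prov:role>
    </prov:wasAssociatedWith>
  </prov:bundleContent>

  <prov:bundleContent prov:id="tool:analysis01">
    <prov:agent prov:id="tool:Bob-2011-11-16">
      <perf:rating>good</perf:rating>
    </prov:agent>
    <prov:mentionOf>
      <prov:specificEntity prov:ref="tool:Bob-2011-11-16" />
      <prov:generalEntity prov:ref="ex:Bob" />
      <prov:bundle prov:ref="ex:run1" />
    </prov:mentionOf>

    <prov:agent prov:id="tool:Bob-2011-11-17">
      <perf:rating>bad</perf:rating>
    </prov:agent>
    <prov:mentionOf>
      <prov:specificEntity prov:ref="tool:Bob-2011-11-17" />
      <prov:generalEntity prov:ref="ex:Bob" />
      <prov:bundle prov:ref="ex:run2" />
    </prov:mentionOf>
  </prov:bundleContent>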
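A consumer of this serialization might extract mentions as in the following Python sketch, which uses the standard library's xml.etree.ElementTree. Several things here are assumptions made so that the sample parses on its own: the enclosing prov:document element, the namespace IRIs bound to ex and tool (borrowed from Example 3), and the function name mentions. Identifiers are returned as the qualified names found in the prov:ref attributes, without expanding prefixes.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"

# A small self-contained document; the wrapper and the ex/tool namespace
# IRIs are assumptions added for this sketch.
SAMPLE = """
<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
               xmlns:ex="http://example.com/"
               xmlns:tool="http://example.com/tool/">
  <prov:bundleContent prov:id="tool:analysis01">
    <prov:mentionOf>
      <prov:specificEntity prov:ref="tool:Bob-2011-11-16" />
      <prov:generalEntity prov:ref="ex:Bob" />
      <prov:bundle prov:ref="ex:run1" />
    </prov:mentionOf>
  </prov:bundleContent>
</prov:document>
"""


def mentions(xml_text):
    """Return (specificEntity, generalEntity, bundle) for each prov:mentionOf."""
    root = ET.fromstring(xml_text.strip())
    ref_attribute = "{%s}ref" % PROV_NS
    found = []
    for element in root.iter("{%s}mentionOf" % PROV_NS):
        parts = []
        for child_name in ("specificEntity", "generalEntity", "bundle"):
            child = element.find("{%s}%s" % (PROV_NS, child_name))
            if child is None:
                break  # incomplete mention: skip it rather than guess
            parts.append(child.get(ref_attribute))
        else:
            found.append(tuple(parts))
    return found


if __name__ == "__main__":
    print(mentions(SAMPLE))
```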


5. Constraints associated with Mention

If one entity is a mention of another in a bundle, then the former is also a specialization of the latter:

IF mentionOf(e2,e1,b) THEN specializationOf(e2,e1).

An entity can be the subject of at most one mention relation.

IF mentionOf(e, e1, b1) and mentionOf(e, e2, b2), THEN e1=e2 and b1=b2.
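The two constraints can be sketched in Python as follows, under the assumption that mentions are available as (specific, general, bundle) tuples. The function names are illustrative; the sketch covers only the two rules stated in this section and is not a PROV-CONSTRAINTS validator.

```python
def infer_specializations(mentions):
    """IF mentionOf(e2,e1,b) THEN specializationOf(e2,e1)."""
    return {(specific, general) for specific, general, _bundle in mentions}


def uniqueness_violations(mentions):
    """Report entities that are the specific entity of two different mentions."""
    first_seen = {}
    violations = []
    for specific, general, bundle in mentions:
        previous = first_seen.setdefault(specific, (general, bundle))
        if previous != (general, bundle):
            # Same specific entity, but a different general entity or bundle.
            violations.append((specific, previous, (general, bundle)))
    return violations


if __name__ == "__main__":
    example1 = [
        ("tool:Bob-2011-11-16", "ex:Bob", "ex:run1"),
        ("tool:Bob-2011-11-17", "ex:Bob", "ex:run2"),
    ]
    print(infer_specializations(example1))
    print(uniqueness_violations(example1))
    # Reusing tool:Bob-2011-11-16 for a second bundle breaks the uniqueness rule.
    print(uniqueness_violations(example1 + [("tool:Bob-2011-11-16", "ex:Bob", "ex:run2")]))
```

The inference runs in one direction only: a specialization found in the data gives no grounds for constructing a mention, since the bundle cannot be recovered from it.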

A. Change Log

A.1 Change Log Since Working Draft 12 March 2013

A.2 Change Log Since First Public Working Draft

B. Acknowledgements

This document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole. The editors extend special thanks to Ivan Herman (W3C/ERCIM).

Thanks to Khalid Belhajjame, Tom De Nies, Graham Klyne, and Simon Miles for their reviews of the document.

Members of the PROV Working Group at the time of publication of this document were: Ilkay Altintas (Invited expert), Reza B'Far (Oracle Corporation), Khalid Belhajjame (University of Manchester), James Cheney (University of Edinburgh, School of Informatics), Sam Coppens (iMinds - Ghent University), David Corsar (University of Aberdeen, Computing Science), Stephen Cresswell (The National Archives), Tom De Nies (iMinds - Ghent University), Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland), Simon Dobson (Invited expert), Martin Doerr (Foundation for Research and Technology - Hellas(FORTH)), Kai Eckert (Invited expert), Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER), James Frew (Invited expert), Irini Fundulaki (Foundation for Research and Technology - Hellas(FORTH)), Daniel Garijo (Universidad Politécnica de Madrid), Yolanda Gil (Invited expert), Ryan Golden (Oracle Corporation), Paul Groth (Vrije Universiteit), Olaf Hartig (Invited expert), David Hau (National Cancer Institute, NCI), Sandro Hawke (W3C/MIT), Jörn Hees (German Research Center for Artificial Intelligence (DFKI) Gmbh), Ivan Herman (W3C/ERCIM), Ralph Hodgson (TopQuadrant), Hook Hua (Invited expert), Trung Dong Huynh (University of Southampton), Graham Klyne (University of Oxford), Michael Lang (Revelytix, Inc.), Timothy Lebo (Rensselaer Polytechnic Institute), James McCusker (Rensselaer Polytechnic Institute), Deborah McGuinness (Rensselaer Polytechnic Institute), Simon Miles (Invited expert), Paolo Missier (School of Computing Science, Newcastle University), Luc Moreau (University of Southampton), James Myers (Rensselaer Polytechnic Institute), Vinh Nguyen (Wright State University), Edoardo Pignotti (University of Aberdeen, Computing Science), Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute), Carl Reed (Open Geospatial Consortium), Adam Retter (Invited Expert), Christine Runnegar (Invited expert), Satya Sahoo (Invited expert), David Schaengold (Revelytix, Inc.), Daniel Schutzer (FSTC, Financial Services Technology Consortium), Yogesh Simmhan (Invited expert), Stian Soiland-Reyes (University of Manchester), Eric Stephan (Pacific Northwest National Laboratory), Linda Stewart (The National Archives), Ed Summers (Library of Congress), Maria Theodoridou (Foundation for Research and Technology - Hellas(FORTH)), Ted Thibodeau (OpenLink Software Inc.), Curt Tilmes (National Aeronautics and Space Administration), Craig Trim (IBM Corporation), Stephan Zednik (Rensselaer Polytechnic Institute), Jun Zhao (University of Oxford), Yuting Zhao (University of Aberdeen, Computing Science).

C. References

C.1 Informative references

[PROV-AQ] Graham Klyne; Paul Groth; eds. Provenance Access and Query. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-aq-20130430/
[PROV-CONSTRAINTS] James Cheney; Paolo Missier; Luc Moreau; eds. Constraints of the PROV Data Model. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-constraints-20130430/
[PROV-DC] Daniel Garijo; Kai Eckert; eds. Dublin Core to PROV Mapping. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-dc-20130430/
[PROV-DICTIONARY] Tom De Nies; Sam Coppens; eds. PROV Dictionary: Modeling Provenance for Dictionary Data Structures. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-dictionary-20130430/
[PROV-DM] Luc Moreau; Paolo Missier; eds. PROV-DM: The PROV Data Model. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-dm-20130430/
[PROV-N] Luc Moreau; Paolo Missier; eds. PROV-N: The Provenance Notation. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-n-20130430/
[PROV-O] Timothy Lebo; Satya Sahoo; Deborah McGuinness; eds. PROV-O: The PROV Ontology. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-o-20130430/
[PROV-OVERVIEW] Paul Groth; Luc Moreau; eds. PROV-OVERVIEW: An Overview of the PROV Family of Documents. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-overview-20130430/
[PROV-PRIMER] Yolanda Gil; Simon Miles; eds. PROV Model Primer. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/
[PROV-SEM] James Cheney; ed. Semantics of the PROV Data Model. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-sem-20130430/
[PROV-XML] Hook Hua; Curt Tilmes; Stephan Zednik; eds. PROV-XML: The PROV XML Schema. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/
[RFC2119] S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt