Copyright © 2011-2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
PROV-DM is a data model for provenance for building representations of the entities, people and activities involved in producing a piece of data or thing in the world. PROV-DM is domain-agnostic, but is equipped with extensibility points allowing further domain-specific and application-specific extensions to be defined. PROV-DM is accompanied by PROV-ASN, a technology-independent abstract syntax notation, which allows serializations of PROV-DM instances to be created for human consumption, which facilitates its mapping to concrete syntax, and which is used as the basis for a formal semantics.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was published by the Provenance Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-prov-wg@w3.org (subscribe, archives). All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
For the purpose of this specification, provenance is defined as a record that describes the people, institutions, entities, and activities, involved in producing, influencing, or delivering a piece of data or a thing in the world. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, users find information that is often contradictory or questionable: provenance can help those users to make trust judgments.
The idea that a single way of representing and collecting provenance could be adopted internally by all systems does not seem to be realistic today. Instead, a pragmatic approach is to consider a core data model for provenance that allows domain and application specific representations of provenance to be translated into such a data model and exchanged between systems. Heterogeneous systems can then export their provenance into such a core data model, and applications that need to make sense of provenance in heterogeneous systems can then import it, process it, and reason over it.
Thus, the vision is that different provenance-aware systems natively adopt their own model for representing their provenance, but a core provenance data model can be readily adopted as a provenance interchange model across such systems.
A set of specifications, referred to as the PROV family of specifications, defines the various aspects that are necessary to achieve this vision in an inter-operable way, the first of which is this document:
The PROV-DM data model for provenance consists of a set of core concepts, and a few common relations, based on these core concepts. PROV-DM is a domain-agnostic model, but with clear extensibility points allowing further domain-specific and application-specific extensions to be defined.
This specification also introduces PROV-ASN, an abstract syntax that is primarily aimed at human consumption. PROV-ASN allows serializations of PROV-DM instances to be written in a technology independent manner, it facilitates its mapping to concrete syntax, and it is used as the basis for a formal semantics. This specification uses instances of provenance written in PROV-ASN to illustrate the data model.
In section 2, a set of preliminaries are introduced, including concepts that underpin PROV-DM and motivations for the PROV-ASN notation.
Section 3 provides an overview of PROV-DM listing its core types and their relations.
In section 4, PROV-DM is applied to a short scenario, encoded in PROV-ASN, and illustrated graphically.
Section 5 provides the normative definition of PROV-DM and the notation PROV-ASN.
Section 6 introduces further relations offered by PROV-DM, including relations for data collections and domain-independent common relations.
Section 7 provides an interpretation of PROV-DM in terms of ordering constraints between events, and also presents a set of structural constraints to be satisfied by PROV-DM.
Section 8 summarizes PROV-DM extensibility points.
Section 9 discusses how PROV-DM can be applied to the notion of resource.
The PROV-DM namespace is http://www.w3.org/ns/prov-dm/ (TBC).
All the elements, relations, reserved names and attributes introduced in this specification belong to the PROV-DM namespace.
The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in [RFC2119].
This specification is based on a conceptualization of the world that is described in this section. In the world (whether real or not), there are things, which can be physical, digital, conceptual, or otherwise, and activities involving things.
When we talk about things in the world in natural language and even when we assign identifiers, we are often imprecise in ways that make it difficult to clearly and unambiguously report provenance: a resource with a URL may be understood as referring to a report available at that URL, the version of the report available there today, the report independent of where it is hosted over time, etc.
Hence, to accommodate different perspectives on things and their situation in the world as perceived by us, we introduce the idea of a characterized thing, which refers to a thing and its situation in the world, as characterized by someone. We then define an entity as an identifiable characterized thing. An entity fixes some aspects of a thing and its situation in the world, so that it becomes possible to express its provenance, and what causes these specific aspects to be as such. An alternative entity may fix other aspects, and its provenance may be different.
We do not assume that any characterization is more important than any other, and in fact, it is possible to describe the processing that occurred for the report to be commissioned, for individual versions to be created, for those versions to be published at the given URL, etc., each via a different entity that characterizes the report appropriately.
In the world, activities involve entities in multiple ways: consuming them, processing them, transforming them, modifying them, changing them, relocating them, using them, generating them, being associated with them, etc.
An agent is a type of entity that takes an active role in an activity such that it can be assigned some degree of responsibility for the activity taking place. This definition intentionally stays away from using concepts such as enabling, causing, initiating, affecting, etc., because entities in general also enable, cause, initiate, and affect activities in some way. So the notion of having some degree of responsibility is really what makes an agent.
Even software agents can be assigned some responsibility for the effects they have in the world, so for example if one is using a Text Editor and one's laptop crashes, then one would say that the Text Editor was responsible for crashing the laptop. If one invokes a service to buy a book, that service can be considered responsible for drawing funds from one's bank to make the purchase (the company that runs the service and the web site would also be responsible, but the point here is that we assign some measure of responsibility to software as well). So when someone models software as an agent for an activity in our model, they mean the agent has some responsibility for that activity.
PROV-DM considers agents as a type of entity so that the model can be used to represent the provenance of the agents themselves. For example, a grammar checker may be an agent of a document preparation activity, yet it can itself have a provenance record that states who its vendor is.
In this specification, the qualifier 'identifiable' is implicit whenever a reference is made to an activity, agent, or an entity.
Time is critical in the context of provenance, since it can help corroborate provenance claims. For instance, if an entity is claimed to be obtained by transforming another, then the latter must have existed before the former. If it is not the case, then there is something wrong with such a provenance claim.
Although time is critical, we should also recognize that provenance can be used in many different contexts: in a single system, across the Web, or in spatial data management, to name a few. Hence, it is a design objective of PROV-DM to minimize the assumptions about time, so that PROV-DM can be used in varied contexts.
Furthermore, consider two activities that started at the same time instant. Just by referring to that instant, we cannot distinguish which activity start we refer to. This is particularly relevant if we try to explain that the start of these activities had different reasons. We need to be able to refer to the start of an activity as a first class concept, so that we can talk about it and about its relation with respect to other similar starts.
Hence, in our conceptualization of the world, an instantaneous event, or event for short, happens in the world and marks a change in the world, in its activities and in its entities. The term "event" is commonly used in process algebra with a similar meaning. For instance, in CSP [CSP], events represent communications or interactions; they are assumed to be atomic and instantaneous.
Four kinds of instantaneous events underpin the PROV-DM data model. The activity start and activity end events demarcate the beginning and the end of activities, respectively. The entity generation and entity usage events demarcate the characterization interval for entities. More specifically:
An entity generation event is the instantaneous event that marks the final instant of an entity's creation timespan, after which it becomes available for use.
An entity usage event is the instantaneous event that marks the first instant of an entity's consumption timespan by an activity.
An activity start event is the instantaneous event that marks the instant an activity starts.
An activity end event is the instantaneous event that marks the instant an activity ends.
To allow for minimalistic clock assumptions, like Lamport [CLOCK], PROV-DM relies on a notion of relative ordering of instantaneous events, without using physical clocks. This specification assumes that a partial order exists between instantaneous events.
Specifically, follows is a partial order between instantaneous events, indicating that an instantaneous event occurs at the same time as or after another. For symmetry, precedes is defined as the inverse of follows. (Hence, these relations are reflexive and transitive.)
How such partial order is realized in practice is beyond the scope of this specification. This specification only assumes that each instantaneous event can be mapped to an instant in some form of timeline. The actual mapping is not in scope of this specification. Likewise, whether this timeline is formed of a single global timeline or whether it consists of multiple Lamport's style clocks is also beyond this specification. It is anticipated that follows and precedes correspond to some ordering over this timeline.
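The properties of follows and precedes can be illustrated with a small non-normative Python sketch. It starts from a handful of directly asserted orderings between hypothetical events (evt_a, evt_b, evt_c, evt_x, none of which appear in this specification) and computes their reflexive and transitive closure. This is only one possible realization, chosen for brevity; the specification deliberately leaves the realization of the ordering open.

```python
from itertools import product


def reflexive_transitive_closure(events, direct_follows):
    """Return all (later, earlier) pairs implied by the direct orderings."""
    closure = {(event, event) for event in events}  # reflexivity
    closure |= set(direct_follows)
    changed = True
    while changed:  # naive fixed-point iteration, fine for small examples
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))  # transitivity
                changed = True
    return closure


# Hypothetical events: evt_c follows evt_b, which follows evt_a; evt_x is unrelated.
EVENTS = {"evt_a", "evt_b", "evt_c", "evt_x"}
DIRECT = {("evt_b", "evt_a"), ("evt_c", "evt_b")}
FOLLOWS = reflexive_transitive_closure(EVENTS, DIRECT)


def follows(later, earlier):
    """True if `later` occurs at the same time as or after `earlier`."""
    return (later, earlier) in FOLLOWS


def precedes(earlier, later):
    """Inverse of follows."""
    return follows(later, earlier)
```

Because the order is partial, evt_x is comparable only with itself: neither follows("evt_x", "evt_a") nor follows("evt_a", "evt_x") holds.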
This specification introduces a set of "temporal interpretation" rules allowing the derivation of instantaneous event ordering constraints from provenance records. According to such temporal interpretation, provenance records must satisfy such constraints. We note that the actual verification of such ordering constraints is outside the scope of this specification.
PROV-DM also allows for time observations to be inserted in specific provenance records, for each recognized instantaneous event introduced in this specification. The presence of a time observation for a given instantaneous event fixes the mapping of this instantaneous event to the timeline. It can also help with the verification of associated ordering constraints (though, again, this verification is outside the scope of this specification).
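Although verification is out of scope, the following non-normative Python sketch shows how time observations might help an application check one ordering constraint mentioned earlier: an entity from which another is derived has to exist before the derived one. The entity names, the timestamps, and the function derivation_is_plausible are hypothetical, and comparing generation times is merely one simple check an application could choose to perform.

```python
from datetime import datetime

# Hypothetical time observations for generation events (not from this specification).
generated_at = {
    "src": datetime(2011, 11, 16, 16, 0),
    "out": datetime(2011, 11, 16, 16, 5),
}


def derivation_is_plausible(derived, source, times):
    """Check that `source` was generated no later than `derived`.

    Returns None when a time observation is missing, since nothing can
    then be concluded from time alone.
    """
    if derived not in times or source not in times:
        return None
    return times[source] <= times[derived]


plausible = derivation_is_plausible("out", "src", generated_at)
```

A result of False would signal that something is wrong with the provenance claim, while None leaves the claim unverified and does not refute it.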
This specification defines PROV-DM, a data model for provenance, consisting of records describing how people, entities, and activities, were involved in producing, influencing, or delivering a piece of data or a thing in the world.
This specification also relies on a language, PROV-ASN, the Provenance Abstract Syntax Notation, to express instances of that data model. For each construct of PROV-DM, a corresponding ASN expression is introduced, by way of a production in the ASN grammar.
PROV-ASN is an abstract syntax, whose goals are:
This specification provides a grammar for PROV-ASN. Each record of the PROV-DM data model is explained in terms of the production of this grammar.
The formal semantics of PROV-DM is defined at [PROV-SEM] and its encoding in the OWL2 Web Ontology Language at [PROV-O].
PROV-DM is a provenance data model designed to express representations of the world. Such representations are structured according to a set of records.
These records are relative to an asserter, and in that sense constitute assertions stating properties of the world, as represented by an asserter. Different asserters will normally contribute different records expressing different representations of the world. This specification does not define a notion of consistency between different sets of records (whether by the same asserter or different asserters). The data model provides the means to associate attribution to assertions.
The data model is designed to capture activities that happened in the past, as opposed to activities that may or will happen. However, this distinction is not formally enforced. Therefore, all PROV-DM records should be interpreted as a description of what has happened, as opposed to what may or will happen.
This specification does not prescribe the means by which an asserter arrives at records; for example, records can be composed on the basis of observations, reasoning, or any other means.
Sometimes, inferences about the world can be made from records conformant to the PROV-DM data model. When this is the case, this specification defines such inferences, allowing new provenance records to be inferred from existing ones. Hence, representations of the world can result either from direct assertions of records by asserters or from inference of new records by application of inference rules defined by this specification.
This specification includes a grammar for PROV-ASN expressed using the Extended Backus-Naur Form (EBNF) notation.
Each production rule (or production, for short) in the grammar defines one non-terminal symbol, in the form:
E ::= expression
Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:
The following ER diagram provides a high-level overview of the structure of PROV-DM records. Examples of provenance assertions that conform to this schema are provided in the next section.
The model includes the following elements:
A set of attribute-value pairs can be associated with elements and relations of the PROV model in order to further characterize their nature.
The alternateOf and specializationOf relationships are used to denote that two entities represent two alternative characterizations of the same thing, and that one of the two is a more precise characterization than the other, respectively.
The attributes role and type are pre-defined.
In addition to the kinds of record introduced in the overview figure, PROV-DM also features a notion of Account Record that allows attribution of provenance records to be expressed.
The set of relations presented here forms a core, which is further extended with additional relations, defined in Section Common Relations.
The model includes a further additional element: notes. These are also structured as sets of attribute-value pairs. Notes are used to provide additional, "free-form" information regarding any identifiable construct of the model, with no prescribed meaning. Notes are described in detail here.
Attributes and notes are the main extensibility points in the model: individual interest groups are expected to extend PROV-DM by introducing new attributes and notes as needed to address applications-specific provenance modelling requirements.
This section is non-normative.
To illustrate PROV-DM, this section presents an example encoded according to PROV-ASN. For more detailed explanations of how PROV-DM should be used, and for more examples, we refer the reader to the Provenance Primer [PROV-PRIMER].
This scenario is concerned with the evolution of a crime statistics file (referred to as e0), which is stored on a shared file system and which journalists Alice, Bob, Charles, David, and Edith can share and edit. The evolution of file e0 can be mapped to an event line, in which we consider various events; the events listed below follow each other, unless otherwise specified.
Event evt1: Alice creates (activity: a0) an empty file in /share/crime.txt. We denote this file e1.
Event evt2: Bob appends (activity: a1) the following line to /share/crime.txt:
There was a lot of crime in London last month.
We denote the revised file e2.
Event evt3: Charles emails (activity: a2) the contents of /share/crime.txt, as an attachment, which we refer to as e4. (We specifically refer to a copy of the file that is uploaded on the mail server.)
Event evt4: David edits (activity: a3) file /share/crime.txt as follows.
There was a lot of crime in London and New York last month.
We denote the revised file e3.
Event evt5: Edith emails (activity: a4) the contents of /share/crime.txt as an attachment, referred to as e5.
Event evt6: between events evt4 and evt5, someone (unspecified) runs a grammar checker (activity: a5) on the file /share/crime.txt, using a set of grammatical rules (referred to as gr1). The file after grammatical checking is referred to as e6.
Entity Records (described in Section Entity). The file in its various forms and its copies are modelled as entity records, corresponding to multiple characterizations, as per scenario. The entity records are identified by e0, ..., e6.
entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
entity(e1, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="" ])
entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London last month."])
entity(e3, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London and New York last month."])
entity(e4)
entity(e5)
entity(e6, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London and New York last month.", ex:grammarchecked="yes"])
These entity records list attributes that have been given values during intervals delimited by events; such intervals are referred to as characterization intervals. The following table lists all entity identifiers and their corresponding characterization intervals. When the end of the characterization interval is not delimited by an event described in this scenario, it is marked by "...".
Entity    Characterization Interval
e0        evt1 - ...
e1        evt1 - evt2
e2        evt2 - evt4
e3        evt4 - ...
e4        evt3 - ...
e5        evt5 - ...
e6        evt6 - ...
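The table above can be queried mechanically. The following non-normative Python sketch encodes the intervals and lists the entities whose characterization interval contains a given event. Two points are assumptions of the sketch rather than statements of this specification: the events are placed on a single line in the scenario's order (with evt6 between evt4 and evt5), and intervals are treated as including their start event and excluding their end event. The names EVENT_ORDER, INTERVALS, and characterized_at are hypothetical.

```python
# Event order from the scenario: evt6 occurs between evt4 and evt5.
EVENT_ORDER = ["evt1", "evt2", "evt3", "evt4", "evt6", "evt5"]
POSITION = {evt: index for index, evt in enumerate(EVENT_ORDER)}

# (start, end) per entity; None stands for the "..." open end in the table.
INTERVALS = {
    "e0": ("evt1", None),
    "e1": ("evt1", "evt2"),
    "e2": ("evt2", "evt4"),
    "e3": ("evt4", None),
    "e4": ("evt3", None),
    "e5": ("evt5", None),
    "e6": ("evt6", None),
}


def characterized_at(event):
    """Return the sorted entities whose interval contains `event`.

    Assumption of this sketch: start is inclusive, end is exclusive.
    """
    pos = POSITION[event]
    result = []
    for entity, (start, end) in INTERVALS.items():
        starts_in_time = POSITION[start] <= pos
        still_open = end is None or pos < POSITION[end]
        if starts_in_time and still_open:
            result.append(entity)
    return sorted(result)


at_evt3 = characterized_at("evt3")
```

Under these assumptions, e1 is no longer characterized at evt3 because its interval ended at evt2, whereas e0 remains characterized throughout.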
Activity Records (described in Section Activity) represent activities in the scenario. Each activity record contains the activity identifier, a start time and a type attribute characterizing the nature of the activity.
activity(a0, 2011-11-16T16:00:00,,[prov:type="createFile"])
activity(a1, 2011-11-16T16:05:00,,[prov:type="edit"])
activity(a2, 2011-11-16T17:00:00,,[prov:type="email"])
activity(a3, 2011-11-17T09:00:00,,[prov:type="edit"])
activity(a4, 2011-11-17T09:50:00,,[prov:type="email"])
activity(a5, 2011-11-17T09:30:00,,[prov:type="grammarcheck"])
Generation Records (described in Section Generation) represent the event at which a file is created in a specific form. Attributes are used to describe the modalities according to which a given entity is generated by a given activity. The interpretation of attributes is application specific. Illustrations of such attributes for the scenario are: no attribute is provided for e0; e2 was generated by the editor's save function; e4 can be found on the smtp port, in the attachment section of the mail message; e6 was produced on the standard output of a5. Sometimes, it is necessary to refer to generation records in other records. For those cases, we introduce identifiers such as g1 and g2 to identify the generation records; these identifiers are used in derivations introduced below to reference those specific records.
wasGeneratedBy(e0, a0)
wasGeneratedBy(e1, a0, [ex:fct="create"])
wasGeneratedBy(e2, a1, [ex:fct="save"])
wasGeneratedBy(e3, a3, [ex:fct="save"])
wasGeneratedBy(g1, e4, a2, [ex:port="smtp", ex:section="attachment"])
wasGeneratedBy(g2, e5, a4, [ex:port="smtp", ex:section="attachment"])
wasGeneratedBy(e6, a5, [ex:file="stdout"])
Usage Records (described in Section Usage) represent the event by which a file is read by an activity. Likewise, attributes describe the modalities according to which the various entities are used by activities. Illustrations of such attributes are: e1 is used in the context of a1's load functionality; e2 is used by a2 in the context of its attach functionality; e3 is used on the standard input by a5. Sometimes, it is also necessary to refer to usage records in other records. To this end, for these usage records, identifiers such as u1 and u2 are introduced to identify them; these identifiers are used later in derivations introduced below to refer to these specific Usage records.
used(a1,e1,[ex:fct="load"])
used(a3,e2,[ex:fct="load"])
used(u1,a2,e2,[ex:fct="attach"])
used(u2,a4,e3,[ex:fct="attach"])
used(a5,e3,[ex:file="stdin"])
Derivation Records (described in Section Derivation Relation) express that an entity is derived from another. The first two are expressed in their compact version, whereas the following two are expressed in their full version, including the activity underpinning the derivation, and associated usage (u1, u2) and generation (g1, g2) records.
wasDerivedFrom(e2,e1)
wasDerivedFrom(e3,e2)
wasDerivedFrom(e4,e2,a2,g1,u1)
wasDerivedFrom(e5,e3,a4,g2,u2)
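As a non-normative illustration, the four derivation records above can be traversed to list every entity that a given entity is, directly or indirectly, asserted to derive from. The Python sketch below simply follows the asserted edges; whether such a chain licenses an inferred derivation record is not stated in this section, so the result should be read as a traversal of assertions and not as an inference sanctioned by the model. The names DERIVED_FROM and derivation_sources are hypothetical.

```python
# Direct derivation edges from the scenario: derived entity -> source entities.
DERIVED_FROM = {
    "e2": ["e1"],
    "e3": ["e2"],
    "e4": ["e2"],
    "e5": ["e3"],
}


def derivation_sources(entity):
    """Collect all entities reachable by following asserted derivation edges."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for source in DERIVED_FROM.get(current, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


sources_of_e5 = derivation_sources("e5")
```

For the attachment e5, the traversal reaches e3, e2, and e1, which matches the editing history described in the scenario.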
specializationOf: (this relation is described in Section alternate and specialization records). The crime statistics file (e0) has various contents over its existence (e1, e2, e3); the entity records identified by e1, e2, e3 specialize e0 with an attribute content. Likewise, the one denoted by e6 specializes the record denoted by e3 with an attribute grammarchecked.
specializationOf(e1,e0)
specializationOf(e2,e0)
specializationOf(e3,e0)
specializationOf(e6,e3)
Agent Records (described at Section Agent): the various users are represented as agents, themselves being a type of entity. Furthermore, a sixth agent is defined to be a software agent (the grammar checker).
agent(ag1, [ prov:type="prov:Person" %% xsd:QName, ex:name="Alice" ])
agent(ag2, [ prov:type="prov:Person" %% xsd:QName, ex:name="Bob" ])
agent(ag3, [ prov:type="prov:Person" %% xsd:QName, ex:name="Charles" ])
agent(ag4, [ prov:type="prov:Person" %% xsd:QName, ex:name="David" ])
agent(ag5, [ prov:type="prov:Person" %% xsd:QName, ex:name="Edith" ])
agent(ag6, [ prov:type="prov:SoftwareAgent" %% xsd:QName, ex:name="GoodEnglish" ])
Activity Association Records (described in Section Activity Association): the association of an agent with an activity is expressed with wasAssociatedWith, and the nature of this association is described by attributes. Illustrations of such attributes include the role of the participating agent, as creator, author, communicator, and checker (role is a reserved attribute in PROV-DM).
wasAssociatedWith(a0, ag1, [prov:role="creator"])
wasAssociatedWith(a1, ag2, [prov:role="author"])
wasAssociatedWith(a2, ag3, [prov:role="communicator"])
wasAssociatedWith(a3, ag4, [prov:role="author"])
wasAssociatedWith(a4, ag5, [prov:role="communicator"])
In addition, activity a5 is associated with the grammar checker, which relied on a set of grammatical rules to perform the grammar checking. Generally, rules like these are referred to as a plan, a specific type of entity.
entity(gr1,[prov:type="prov:Plan"%% xsd:QName, ex:url="http://example.org/grammarRules.html" %% xsd:anyURI])
wasAssociatedWith(a5, ag6, gr1, [prov:role="checker"])
Finally, the software agent ag6 did not act autonomously, but was operating on behalf of a user. This chain of responsibility is captured with the Responsibility Record (described in Section Responsibility Record).
actedOnBehalfOf(ag6, ag4, a5, [prov:type="delegation"])
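The chain of responsibility can be followed programmatically. The non-normative Python sketch below combines the activity association records of the scenario with the responsibility record above to list the agents that bear some responsibility for an activity, starting from the directly associated agents and then adding those on whose behalf they acted. The data structures and the function name responsible_agents are hypothetical, and attributes such as roles are omitted for brevity.

```python
# Agents directly associated with each activity (from the wasAssociatedWith records).
ASSOCIATED = {
    "a0": ["ag1"],
    "a1": ["ag2"],
    "a2": ["ag3"],
    "a3": ["ag4"],
    "a4": ["ag5"],
    "a5": ["ag6"],
}

# (delegate, responsible, activity) triples from the actedOnBehalfOf records.
ON_BEHALF_OF = [("ag6", "ag4", "a5")]


def responsible_agents(activity):
    """List associated agents first, then agents they acted on behalf of."""
    agents = list(ASSOCIATED.get(activity, []))
    index = 0
    while index < len(agents):
        for delegate, responsible, act in ON_BEHALF_OF:
            same_link = delegate == agents[index] and act == activity
            if same_link and responsible not in agents:
                agents.append(responsible)
        index += 1
    return agents


chain_for_a5 = responsible_agents("a5")
```

For activity a5, the sketch yields the software agent ag6 followed by ag4, the agent on whose behalf it operated.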
Provenance records can be illustrated graphically. The illustration is not intended to represent all the details of the model, but it is intended to show the essence of a set of provenance records. Therefore, it should not be seen as an alternate notation for expressing provenance.
The graphical illustration takes the form of a graph. Entities, activities and agents are represented as nodes, with oval, rectangular, and pentagonal shapes, respectively. Usage, Generation, Derivation, Activity Association, and Specialization are represented as directed edges.
Entities are laid out according to the ordering of their generation event. We endeavor to show time progressing from left to right. This means that edges for Usage, Generation and Derivation typically point from right to left.
This section contains the normative specification of PROV-DM core, the core of the PROV data model.
This specification introduced provenance as a set of records describing the people, institutions, entities, and activities, involved in producing, influencing, or delivering a piece of data or a thing in the world. PROV-DM is a data model defining the structure and meaning of such records.
Concretely, PROV-DM consists of a set of constructs to formulate representations of the world and constraints that must be satisfied by them. A PROV-DM record is a body of information about something which is of interest from a provenance viewpoint. PROV-DM records may be asserted directly or may be inferred from others.
PROV-DM records are typed and can be among the following types, introduced one by one in this section: entity record, activity record, agent record, note record, generation record, usage record, derivation record, activity association record, responsibility record, start record, end record, alternate record, specialization record, annotation record, and account record.
Furthermore, PROV-DM includes a "house-keeping construct", a record container, used to wrap PROV-DM records and facilitate their interchange. Hence, by creating a set of PROV-DM records and packaging them into a record container, one forms a provenance record.
In PROV-ASN, such representations of the world must be conformant with the toplevel production record of the grammar. These records are grouped in three categories: elementRecord (see section Element), relationRecord (see section Relation), and accountRecord (see section Account).
In PROV-ASN, a record container is compliant with the production recordContainer (see section Record Container).
This section describes all the PROV-DM records referred to as element records. (In PROV-ASN, such records are conformant to the elementRecord production of the grammar.)
In PROV-DM, an entity record is a representation of an entity.
Examples of entities include a car on a road, a linked data set, a sparse matrix of floating-point numbers, a document in a directory, the same document published on the Web, and meta-data embedded in a document.
An entity record, noted entity(id, [ attr1=val1, ...]) in PROV-ASN, contains:
The assertion of an entity record states, from a given asserter's viewpoint, the existence of an entity, whose situation in the world is represented by the attribute-value pairs, which remain unchanged during a characterization interval, i.e. a continuous interval between two instantaneous events in the world.
In PROV-ASN, an entity record's text matches the entityRecord production of the grammar defined in this specification document.
The following entity record,
entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
states the existence of an entity, denoted by identifier e0, with type File and path /shared/crime.txt in the file system, and creator Alice. The attributes path and creator are application specific, whereas the attribute type is reserved in the PROV-DM namespace.
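To make the shape of an entity record concrete, the following non-normative Python sketch renders an identifier and a set of attribute-value pairs in the textual form used by the example above. It is a simplification under stated assumptions: only string values are handled, no escaping is performed, and typed literals (such as values annotated with %% xsd:QName) are not supported. The function name format_entity is hypothetical and is not part of PROV-ASN.

```python
def format_entity(identifier, attributes=None):
    """Render an entity record as text, in the style of the examples.

    Assumptions of this sketch: string values only, no escaping,
    attribute order follows the insertion order of the mapping.
    """
    if not attributes:
        return f"entity({identifier})"
    pairs = ", ".join(f'{name}="{value}"' for name, value in attributes.items())
    return f"entity({identifier}, [ {pairs} ])"


e0_text = format_entity(
    "e0",
    {"prov:type": "File", "ex:path": "/shared/crime.txt", "ex:creator": "Alice"},
)
```

An entity with no attributes, such as e4 in the scenario, is rendered in the short form entity(e4).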
Furthermore, section Activity Association Record introduces the idea of plans being associated with activities:
In PROV-DM, an activity record is a representation of an identifiable activity, which performs a piece of work.
An activity, represented by an activity record, is delimited by its start and its end events; hence, it occurs over an interval delimited by two instantaneous events. However, an activity record need not mention time information, nor duration, because they may not be known.
If start and end times are known, they are expressed as attributes of an activity, where the interpretation of attribute in the context of an activity record is the same as the interpretation of attribute for entity record: an activity record's attribute remains constant for the duration of the activity it represents. Further characteristics of the activity in the world can be represented by other attribute-value pairs, which must also remain unchanged during the activity duration.
Examples of activities include driving a car from Boston to Cambridge, assembling a data set based on a set of measurements, performing a statistical analysis over a data set, sorting news items according to some criteria, running a sparql query over a triple store, editing a file, and publishing a web page.
An activity record, written activity(id, st, et, [ attr1=val1, ...]) in PROV-ASN, contains:
In PROV-ASN, an activity record's text matches the activityRecord production of the grammar defined in this specification document.
The following activity record
activity(a1,2011-11-16T16:05:00,2011-11-16T16:06:00, [ex:host="server.example.org",prov:type="ex:edit" %% xsd:QName])
states the existence of an activity identified by a1, with start time 2011-11-16T16:05:00 and end time 2011-11-16T16:06:00, running on host server.example.org, and of type edit (declared in some namespace with prefix ex). The attribute host is application specific, but must hold for the duration of the activity. The attribute type is a reserved attribute of PROV-DM, allowing for subtyping to be expressed.
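As a non-normative illustration, the start and end times of the record above can be read with Python's standard datetime module to obtain the duration of the activity. Since an activity record need not mention time information, the sketch returns None when either time is absent. The function name activity_duration is hypothetical.

```python
from datetime import datetime


def activity_duration(start=None, end=None):
    """Return end - start as a timedelta, or None when either time is absent."""
    if start is None or end is None:
        return None
    return datetime.fromisoformat(end) - datetime.fromisoformat(start)


# Times taken from the example record for activity a1.
a1_duration = activity_duration("2011-11-16T16:05:00", "2011-11-16T16:06:00")
```

For a1, the duration is one minute; for a record that gives only a start time, as in the scenario of the previous section, no duration can be computed.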
Further considerations:
An agent record is a representation of an agent, which is an entity that can be assigned some degree of responsibility for an activity taking place.
Many agents can have an association with a given activity. An agent may do the ordering of the activity, another agent may do its design, another agent may push the button to start it, another agent may run it, etc. As many agents as one wishes to mention can occur in the provenance record, if it is important to indicate that they were associated with the activity.
From an inter-operability perspective, it is useful to define some basic categories of agents since it will improve the use of provenance records by applications. There should be very few of these basic categories to keep the model simple and accessible. There are three types of agents in the model since they are common across most anticipated domains of use:
These types are mutually exclusive, though they do not cover all kinds of agent.
An agent record, noted agent(id, [ attr1=val1, ...]) in PROV-ASN, contains:
In PROV-ASN, an agent record's text matches the agentRecord production of the grammar defined in this specification document.
With the following assertions,
agent(e1, [ex:employee="1234", ex:name="Alice", prov:type="prov:Person" %% xsd:QName])
entity(e2) and wasStartedBy(a1,e2,[prov:role="author"])
entity(e3) and wasAssociatedWith(a1,e3,[prov:role="sponsor"])
the agent record represents an explicit agent identified by e1 that holds irrespective of activities it may be associated with. On the other hand, from the entity records identified by e2 and e3, one can infer agent records, as per the following inference.
One can assert an agent record or alternatively, one can infer an agent record by its association with an activity.
As provenance records are exchanged between systems, it may be useful to add extra-information about such records. For instance, a "trust service" may add value-judgements about the trustworthiness of some of the assertions made. Likewise, an interactive visualization component may want to enrich a set of provenance records with information helping reproduce their visual representation. To help with inter-operability, PROV-DM introduces a simple annotation mechanism allowing any identifiable record to be associated with notes.
A note record is a set of attribute-value pairs, whose meaning is application specific. It may or may not be a representation of something in the world.
In PROV-ASN, a note record's text matches the noteRecord production of the grammar defined in this specification document.
A separate PROV-DM record is used to associate a note with an identifiable record (see Section on annotation). A given note may be associated with multiple records.
The following note record consists of a set of application-specific attribute-value pairs, intended to help the rendering of the record it is associated with, by specifying its color and its position on the screen.
note(ann1,[ex:color="blue", ex:screenX=20, ex:screenY=30])
hasAnnotation(g1,ann1)
The note record is associated with a record g1 previously introduced (hasAnnotation is discussed in Section Annotation Record). In this example, the attribute-value pairs do not constitute a representation of something in the world; they are just used to help render provenance.
Attribute-value pairs occurring in notes differ from attribute-value pairs occurring in entity records and activity records. In entity and activity records, attribute-value pairs must be a representation of something in the world, which remains constant for the duration of the characterization interval (for entity records) or the activity duration (for activity records). A note record linked with an entity record consists of attribute-value pairs which may or may not represent the entity's situation in the world. If a note record's attribute-value pair represents an entity's situation in the world, no requirement is made on this situation to be unchanged for the entity's characterization interval.
This section describes all the PROV-DM records representing relations between the elements introduced in Section Element. While these relations are not binary, they all involve two primary elements. They can be summarized as follows.
Subject (row) / Object (column) | Entity | Activity | Agent | Note |
Entity | wasDerivedFrom alternateOf specializationOf | wasGeneratedBy | — | hasAnnotation |
Activity | used | — | wasStartedBy wasEndedBy wasAssociatedWith | hasAnnotation |
Agent | — | — | actedOnBehalfOf | hasAnnotation |
Note | — | — | — | hasAnnotation |
In PROV-ASN, all these relation records are conformant to the relationRecord production of the grammar.
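As an informal illustration, the summary table can be read as a lookup from a pair of element kinds to the relations that connect them. The sketch below assumes that rows give the first primary element and columns the second; the names RELATIONS and relations_between are hypothetical and not part of PROV-DM.

```python
# Relation names from the summary table, keyed by
# (kind of first primary element, kind of second primary element).
# Assumption: table rows are the first element, columns the second.
RELATIONS = {
    ("entity", "entity"): ["wasDerivedFrom", "alternateOf", "specializationOf"],
    ("entity", "activity"): ["wasGeneratedBy"],
    ("entity", "note"): ["hasAnnotation"],
    ("activity", "entity"): ["used"],
    ("activity", "agent"): ["wasStartedBy", "wasEndedBy", "wasAssociatedWith"],
    ("activity", "note"): ["hasAnnotation"],
    ("agent", "agent"): ["actedOnBehalfOf"],
    ("agent", "note"): ["hasAnnotation"],
    ("note", "note"): ["hasAnnotation"],
}


def relations_between(first_kind, second_kind):
    """Return the relation names listed for this pair (empty list if none)."""
    return RELATIONS.get((first_kind, second_kind), [])


print(relations_between("activity", "entity"))
print(relations_between("agent", "entity"))
```

Pairs shown with a dash in the table are simply absent from the dictionary, so the lookup returns an empty list for them.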
In PROV-DM, a generation record is a representation of an instantaneous world event, the completed creation of a new entity by an activity. This entity becomes available for usage after this instantaneous event. This entity did not exist before creation (though another with a different characterization may have existed before). The representation of this instantaneous event encompasses a description of the modalities of generation of this entity by this activity.
A generation event may be, for example, the completed creation of a file by a program, the completed creation of a linked data set, the completed publication of a new version of a document, or the completed sending of a value on a communication channel. The point at which creation is actually complete is application specific: generation of a file may complete when a lock is released by its creator, whereas actual publication of a document may be after the embargo period that was defined for it.
A generation record, written wasGeneratedBy(id,e,a,t,attrs) in PROV-ASN, has the following components:
In PROV-ASN, a generation record's text matches the generationRecord production of the grammar defined in this specification document.
A generation record's id is optional. It must be used when annotating generation records (see Section Annotation Record) or when defining precise-1 derivations (see Derivation Record).
The following generation assertions
wasGeneratedBy(e1,a1, 2001-10-26T21:32:52, [ex:port="p1", ex:order=1])
wasGeneratedBy(e2,a1, 2001-10-26T10:00:00, [ex:port="p1", ex:order=2])
state the existence of two events in the world (with respective times 2001-10-26T21:32:52 and 2001-10-26T10:00:00), at which new entities, represented by entity records identified by e1 and e2, are created by an activity, itself represented by an activity record identified by a1. The first one is available as the first value on port p1, whereas the other is the second value on port p1. The semantics of port and order in these records are application specific.
In some cases, we may want to record the time at which an entity was generated without having to specify the activity that generated it. To support this requirement, the activity component in generation records is optional. Hence, the following record indicates the time at which an entity is generated, without giving the activity that did it.
wasGeneratedBy(e,,2001-10-26T21:32:52)
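The optional activity component can be illustrated with a small serializer. This is only a sketch under stated assumptions: the functions format_attr and format_generation are hypothetical, the optional record identifier is left out, and the output approximates PROV-ASN without being checked against the generationRecord production.

```python
def format_attr(name, value):
    # Assumption: string values are quoted, other values are written as-is.
    if isinstance(value, str):
        return f'{name}="{value}"'
    return f"{name}={value}"


def format_generation(entity, activity=None, time=None, attrs=None):
    """Render a generation record in a PROV-ASN-like form (illustrative only)."""
    # A missing activity leaves its position empty, as in wasGeneratedBy(e,,t).
    fields = [entity, activity or ""]
    if time is not None:
        fields.append(time)
    if attrs:
        pairs = ", ".join(format_attr(k, v) for k, v in attrs.items())
        fields.append("[" + pairs + "]")
    return "wasGeneratedBy(" + ",".join(fields) + ")"


print(format_generation("e", time="2001-10-26T21:32:52"))
```

Called with only an entity and a time, the sketch reproduces the shape of the record above, with the activity position left empty.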
In PROV-DM, a usage record is a representation of an instantaneous world event: an activity beginning to consume an entity. Before this event, the activity had not begun to consume or use this entity. The representation includes a description of the modalities of usage of this entity by this activity.
A usage event may be a procedure beginning to consume a parameter, a service starting to read a value on a port, a program beginning to read a configuration file, or the point at which an ingredient, such as eggs, is being added in a baking activity. Usage may entirely consume an entity (e.g. eggs are no longer available after being added to the mix), or leave it as such, ready for further uses (e.g. a file on a file system can be read indefinitely).
A usage record, written used(id,a,e,t,attrs) in PROV-ASN, has the following constituents:
In PROV-ASN, a usage record's text matches the usageRecord production of the grammar defined in this specification document.
A usage record's id is optional, but comes in handy when annotating usage records (see Section Annotation Record) or when defining derivations.
The following usage records
used(a1,e1,2011-11-16T16:00:00,[ex:parameter="p1"])
used(a1,e2,2011-11-16T16:00:01,[ex:parameter="p2"])
state that the activity, represented by the activity record identified by a1, consumed two entities, represented by entity records identified by e1 and e2, at times 2011-11-16T16:00:00 and 2011-11-16T16:00:01, respectively; the first one was found as the value of parameter p1, whereas the second was found as value of parameter p2. The semantics of parameter in these records is application specific.
A usage record's id is optional. It must be present when annotating usage records (see Section Annotation Record) or when defining precise-1 derivations (see Derivation Record).
A reference to a given entity record may appear in multiple usage records that share a given activity record identifier.
The key purpose of agents in PROV-DM is to assign responsibility for activities. It is important to reflect that there are degrees of responsibility among agents, and that is a major reason for distinguishing among all the agents that have some association with an activity and determining which ones are really the originators of the entity. For example, a programmer and a researcher could both be associated with running a workflow, but it may not matter which programmer clicked the button to start the workflow, while it would matter a lot which researcher told the programmer to do so. Another example: a student publishing a web page describing an academic department could result in both the student and the department being agents associated with the activity, and it may not matter which student published the web page, but it matters a lot that the department told the student to put up the web page. So there is some notion of responsibility that needs to be captured.
Examples of activity association include designing, participation, initiation and termination, timetabling or sponsoring.
Provenance reflects activities that have occurred. In some cases, those activities reflect the execution of a plan that was designed in advance to guide the execution. PROV-DM allows attaching a plan to an activity record, which represents what was intended to happen. The plan can be useful for various tasks, for example to validate the execution as represented in the provenance record, to manage expectation failures, or to provide explanations.
In the context of PROV-DM, a plan should be understood as the description of a set of actions or steps intended by one or more agents to achieve some goal. PROV-DM is not prescriptive about the nature of plans, their representation, the actions and steps they consist of, and their intended goals. Hence, for the purpose of this specification, a plan can be a workflow for a scientific experiment, a recipe for a cooking activity, or a list of instructions for a micro-processor execution. While PROV-DM does not specify the representations of plans, it allows for activities to be associated with plans. Furthermore, since plans may evolve over time, it may become necessary to track their provenance, and hence, plans are entities. An activity may be associated with multiple plans. This allows for descriptions of activities initially associated with a plan, which was changed, on the fly, as the activity progresses. Plans can be successfully executed or they can fail. We expect applications to exploit PROV-DM extensibility mechanisms to capture the rich nature of plans and associations between activities and plans.
Thus, PROV-DM offers two kinds of records. The first, introduced in this section, represents an association between an agent, a plan, and an activity; the second, introduced in Section Responsibility record, represents the fact that an agent was acting on behalf of another, in the context of an activity.
An activity association record, written wasAssociatedWith(id,a,ag2,pl,attrs) in PROV-ASN, has the following constituents:
In PROV-ASN, an activity association record's text matches the activityAssociationRecord production of the grammar defined in this specification document.
activity(ex:a,[prov:type="workflow execution"])
agent(ex:ag1,[prov:type="operator"])
agent(ex:ag2,[prov:type="designer"])
wasAssociatedWith(ex:a,ex:ag1,[prov:role="loggedInUser", ex:how="webapp"])
wasAssociatedWith(ex:a,ex:ag2,ex:wf,[prov:role="designer", ex:context="project1"])
entity(ex:wf,[prov:type="prov:Plan"%% xsd:QName, ex:label="Workflow 1", ex:url="http://example.org/workflow1.bpel" %% xsd:anyURI])
Since the workflow ex:wf is itself an entity, its provenance can also be expressed in PROV-DM: it can be generated by some activity and derived from other entities, for instance.
A start record is a representation of an agent starting an activity. An end record is a representation of an agent ending an activity. Both relations are specialized forms of wasAssociatedWith. They contain attributes describing the modalities of starting/ending activities.
A start record, written wasStartedBy(id,a,ag,attrs) in PROV-ASN, contains:
An end record, written wasEndedBy(id,a,ag,attrs) in PROV-ASN, contains:
In PROV-ASN, start and end record's texts match the startRecord and endRecord productions of the grammar defined in this specification document.
The following assertions
wasStartedBy(a,ag,[ex:mode="manual"])
wasEndedBy(a,ag,[ex:mode="manual"])
state that the activity, represented by the activity record denoted by a, was started and ended by an agent, represented by the record denoted by ag, in "manual" mode, an application-specific characterization of these relations.
To promote take-up, PROV-DM offers a mild version of responsibility in the form of a relation to represent when an agent acted on another agent's behalf. So in the example of someone running a mail program, the program is an agent of that activity and the person is also an agent of the activity, but we would also add that the mail software agent is running on the person's behalf. In the other example, the student acted on behalf of his supervisor, who acted on behalf of the department chair, who acts on behalf of the university, and all those agents are responsible in some way for the activity to take place but we don't say explicitly who bears responsibility and to what degree.
We could also say that an agent can act on behalf of several other agents (a group of agents). This would also make it possible to indirectly reflect chains of responsibility. This also indirectly reflects control without requiring that control is explicitly indicated. In some contexts there will be a need to represent responsibility explicitly, for example to indicate legal responsibility, and that could be added as an extension to this core model. The same holds for control, since in particular contexts there might be a need to define specific aspects of control that various agents exert over a given activity.
Given an activity association record wasAssociatedWith(a,ag2,attrs), a responsibility record, written actedOnBehalfOf(id,ag2,ag1,a,attrs) in PROV-ASN, has the following constituents:
activity(a,[prov:type="workflow"])
agent(ag1,[prov:type="programmer"])
agent(ag2,[prov:type="researcher"])
agent(ag3,[prov:type="funder"])
wasAssociatedWith(a,ag1,[prov:role="loggedInUser"])
wasAssociatedWith(a,ag2)
actedOnBehalfOf(ag1,ag2,a,[prov:type="delegation"])
actedOnBehalfOf(ag2,ag3,a,[prov:type="contract"])
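The chain of responsibility in this example can be followed mechanically. The sketch below is illustrative only: it reduces each responsibility record to a hypothetical (subordinate, responsible, activity) tuple, drops attributes, and assumes each agent acts on behalf of at most one other agent per activity, which PROV-DM does not require.

```python
# actedOnBehalfOf records from the example, reduced to
# (subordinate agent, responsible agent, activity) tuples.
ACTED_ON_BEHALF_OF = [
    ("ag1", "ag2", "a"),
    ("ag2", "ag3", "a"),
]


def responsibility_chain(agent, activity, records):
    """Follow actedOnBehalfOf links for one activity, starting from `agent`."""
    chain = [agent]
    seen = {agent}
    current = agent
    while True:
        candidates = [
            responsible
            for (subordinate, responsible, act) in records
            if subordinate == current and act == activity and responsible not in seen
        ]
        if not candidates:
            return chain
        # Simplifying assumption: only the first link found is followed.
        current = candidates[0]
        chain.append(current)
        seen.add(current)


print(responsibility_chain("ag1", "a", ACTED_ON_BEHALF_OF))
```

The seen set guards against cycles in the records; handling an agent that acts on behalf of a group of agents would need a graph traversal rather than a single chain.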
In PROV-DM, a derivation record is a representation that some entity is transformed from, created from, or affected by another entity in the world.
Examples of derivation include the transformation of a canvas into a painting, the transportation of a person from London to New York, the transformation of a relational table into a linked data set, and the melting of ice into water.
According to Section Conceptualization, for an entity to be transformed from, created from, or affected by another in some way, there must be some underpinning activities performing the necessary actions resulting in such a derivation. However, asserters may not assert or have knowledge of these activities and associated details: they may not assert or know their number, they may not assert or know their identity, they may not assert or know the attributes characterizing how the relevant entities are used or generated. To accommodate the varying circumstances of the various asserters, PROV-DM allows more or less precise records of derivation to be asserted. Hence, PROV-DM uses the terms precise and imprecise to characterize the different kinds of derivation record. We note that the derivation itself is exact (i.e., deterministic, non-probabilistic), but it is its description, expressed in a derivation record, that may be imprecise.
The lack of precision may come from two sources:
Hence, we can consider two axes. An activity number axis has values single, multiple, and unknown, respectively representing the case where one activity is known to have occurred, more than one activity is known to have occurred, or an unknown number of activities have occurred. Likewise, we can consider another axis to cover other details (identities, generation and usage records, attributes), with values asserted and not asserted. We can then form a matrix of possible derivations. Out of the six possibilities, PROV-DM offers three forms of derivation records to cater for five of them, while the remaining one is not meaningful. The following table summarises names for the three kinds of derivation, which we then explain.
activity number axis | other details axis: asserted | other details axis: not asserted |
single | precise-1 derivation record | imprecise-1 derivation record |
multiple | imprecise-n derivation record | imprecise-n derivation record |
unknown | — | imprecise-n derivation record |
We note that the remaining theoretical case (an unknown number of activities with asserted details) cannot occur, since asserting the details of an unknown number of activities is a contradiction.
In order to represent the number of activities in a derivation, we introduce a PROV-DM attribute steps, which can take two possible values: single and any. When prov:steps="single", derivation is due to one activity; when prov:steps="any", the number of activities is multiple or not known.
The three kinds of derivation records are successively introduced. Making use of the attribute steps, we can distinguish the various derivation types.
A precise-1 derivation record, written wasDerivedFrom(id, e2, e1, a, g2, u1, attrs) in PROV-ASN, contains:
It is optional to include the attribute prov:steps in a precise-1 derivation since the record already refers to the one and only one activity underpinning the derivation.
An imprecise-1 derivation record, written wasDerivedFrom(id, e2,e1, t, attrs) in PROV-ASN, contains:
An imprecise-1 derivation must include the attribute prov:steps, since it is the only means to distinguish this record from an imprecise-n derivation record.
An imprecise-n derivation record, written wasDerivedFrom(id, e2, e1, t, attrs) in PROV-ASN, contains:
It is optional to include the attribute prov:steps in an imprecise-n derivation record. It defaults to prov:steps="any".
None of the three kinds of derivation is defined to be transitive. Domain-specific specializations of these derivations may be defined in such a way that the transitivity property holds.
In PROV-ASN, a derivation record's text matches the derivationRecord production of the grammar defined in this specification document.
The first clause of the alternative, where the activity, generation and usage record identifiers are present, formalizes a precise-1 derivation record. The second clause of the alternative, with optional time, formalizes imprecise records. The distinction between imprecise-1 and imprecise-n is made by the attribute prov:steps.
The following assertions state the existence of derivations.
wasDerivedFrom(e5,e3,a4,g2,u2)
wasDerivedFrom(e5,e3,a4,g2,u2,[prov:steps="single"])
wasDerivedFrom(e3,e2,[prov:steps="single"])
wasDerivedFrom(e2,e1,[])
wasDerivedFrom(e2,e1,[prov:steps="any"])
wasDerivedFrom(e2,e1,2012-01-18T16:00:00, [prov:steps="any"])
The first two are precise-1 derivation records expressing that the activity represented by the activity record a4, by using the entity denoted by e3 according to usage record u2, derived the entity denoted by e5 and generated it according to generation record g2. The third record is an imprecise-1 derivation, which is similar for e3 and e2, but it leaves the activity record and associated attributes implicit. The fourth and fifth records are imprecise-n derivation records between e2 and e1, but no information is provided as to the number and identity of activities underpinning the derivation. The sixth derivation record extends the fifth with the derivation time of e2.
A precise-1 derivation record is richer than an imprecise-1 derivation record, which is itself more informative than an imprecise-n derivation record. Hence, the following implications hold.
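The way the three kinds of derivation record are told apart can be sketched as a small classifier. The function classify_derivation and its parameters are hypothetical; the sketch only applies the rules stated above (identifiers present means precise-1, otherwise prov:steps decides, defaulting to "any") and does not validate inconsistent combinations.

```python
def classify_derivation(activity=None, generation=None, usage=None, attrs=None):
    """Name the kind of derivation record, following the rules in the text."""
    attrs = attrs or {}
    # Activity, generation and usage identifiers all present: precise-1.
    if activity is not None and generation is not None and usage is not None:
        return "precise-1"
    # Otherwise prov:steps is the only distinguishing feature.
    if attrs.get("prov:steps") == "single":
        return "imprecise-1"
    # prov:steps="any", or absent (defaults to "any").
    return "imprecise-n"


print(classify_derivation("a4", "g2", "u2"))
print(classify_derivation(attrs={"prov:steps": "single"}))
print(classify_derivation(attrs={}))
```

The three calls correspond to the first, third and fourth example records.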
The imprecise-1 derivation has the same meaning as the precise-1 derivation, except that an activity is known to exist, though it does not need to be asserted. This is formalized by the following inference rule, referred to as activity introduction:
If wasDerivedFrom(e2,e1,[prov:steps="single"]) holds, then there exist records
activity(a,aAttrs)
wasGeneratedBy(g,e2,a,gAttrs)
used(u,a,e1,uAttrs)
for sets of attribute-value pairs gAttrs, uAttrs, and aAttrs.
Note that inferring derivation from usage and generation does not hold in general. Indeed, when a generation wasGeneratedBy(g, e2, a, attrs2) precedes used(u, a, e1, attrs1), for some e1, e2, attrs1, attrs2, and a, one cannot infer derivation wasDerivedFrom(e2, e1, a, g, u) or wasDerivedFrom(e2,e1), since e2 cannot possibly be derived from e1, given that the creation of e2 precedes the use of e1.
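The temporal argument can be stated as a simple check. This is a minimal sketch, assuming both events carry timestamps in the format used by the examples; derivation_ruled_out is a hypothetical helper. It only detects the case where derivation is impossible; a False result does not establish that a derivation holds.

```python
from datetime import datetime


def derivation_ruled_out(generation_time, usage_time):
    """True when e2 was generated strictly before e1 was used by the same
    activity, in which case e2 cannot have been derived from e1."""
    generated = datetime.fromisoformat(generation_time)
    used = datetime.fromisoformat(usage_time)
    return generated < used


print(derivation_ruled_out("2011-11-16T16:00:00", "2011-11-16T16:00:01"))
```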
In PROV-DM, the effective placeholder for an entity generation time is the generation record. The presence of time information in imprecise derivation records is merely a convenience notation for a timeless derivation record and a generation record with this generation time information.
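The convenience notation can be unfolded as follows. The sketch represents records as plain tuples, which is a hypothetical layout chosen for illustration, and assumes the resulting generation record leaves its activity unspecified, as permitted for generation records.

```python
def expand_timed_derivation(e2, e1, time, attrs=None):
    """Split wasDerivedFrom(e2,e1,t,attrs) into a timeless derivation record
    and a generation record carrying the generation time of e2."""
    derivation = ("wasDerivedFrom", e2, e1, dict(attrs or {}))
    # Assumption: the activity is left unspecified (None) in the generation record.
    generation = ("wasGeneratedBy", e2, None, time)
    return derivation, generation


derivation, generation = expand_timed_derivation(
    "e2", "e1", "2012-01-18T16:00:00", {"prov:steps": "any"}
)
print(derivation)
print(generation)
```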
An alternate record, written alternateOf(alt1, alt2, attrs) in PROV-ASN, has the following constituents:
A specialization record, written specializationOf(sub, super, attrs) in PROV-ASN, has the following constituents:
An entity record identifier can optionally be accompanied by an account identifier. When this is the case, it becomes possible to use the alternateOf relation to link two entity record identifiers that appear in different accounts. (In particular, the entity identifiers in two different accounts are allowed to be the same.) When account identifiers are not available, the linking of entity records through alternateOf can only take place within the scope of a single account.
In PROV-ASN, an alternate record's text matches the alternateRecord production of the grammar defined in this specification document.
In PROV-ASN, a specialization record's text matches the specializationRecord production of the grammar defined in this specification document.
An annotation record establishes a link between an identifiable PROV-DM record and a note record referred to by its identifier. Multiple note records can be associated with a given PROV-DM record; symmetrically, multiple PROV-DM records can be associated with a given note record. Since note records have identifiers, they can also be annotated. The annotation mechanism (with note record and the annotation record) forms a key aspect of the extensibility mechanism of PROV-DM (see extensibility section).
An annotation record, written hasAnnotation(r,n,attrs) in PROV-ASN, has the following constituents:
In PROV-ASN, a note record's text matches the noteRecord production of the grammar defined in this specification document.
The interpretation of notes is application-specific. See Section Note for a discussion of the difference between note attributes and other records attributes. We also note the present tense in this term to indicate that it may not denote something in the past.
The following records
entity(e1,[prov:type="document"])
entity(e2,[prov:type="document"])
activity(a,t1,t2)
used(u1,a,e1,[ex:file="stdin"])
wasGeneratedBy(e2, a, [ex:file="stdout"])
note(n1,[ex:icon="doc.png"])
hasAnnotation(e1,n1)
hasAnnotation(e2,n1)
note(n2,[ex:style="dotted"])
hasAnnotation(u1,n2)
assert the existence of two documents in the world (attribute-value pair: prov:type="document") identified by e1 and e2, and annotate these records with a note indicating that the icon (an application specific way of rendering provenance) is doc.png. It also asserts an activity, its usage of the first entity, and its generation of the second entity. The usage record is annotated with a style (an application specific way of rendering this edge graphically). To be able to express this annotation, the usage record was provided with an identifier u1, which was then referred to in hasAnnotation(u1,n2).
In this section, two constructs are introduced to group PROV-DM records. The first one, the account record, is itself a record, whereas the second one, the record container, is not.
It is common for multiple provenance records to co-exist. For instance, when emailing a file, there could be a provenance record kept by the mail client, and another by the mail server. Such provenance records may provide different explanations about something happening in the world, because they are created by different parties or observed by different witnesses. A given party could also create multiple provenance records about an execution, to capture different levels of details, targeted at different end-users: the programmer of an experiment may be interested in a detailed log of execution, while the scientists may focus more on the scientific-level description. Given that multiple provenance records can co-exist, it is important to know who asserted these records.
In PROV-DM, an account record is a wrapper of records with the following purposes:
An account record, written account(id, assertIRI, recs, attrs) in PROV-ASN, contains:
In PROV-ASN, an account record's text matches the accountRecord production of the grammar defined in this specification document.
The following account record
account(ex:acc0,
  http://example.org/asserter,
  entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
  ...
  wasDerivedFrom(e2,e1)
  ...
  activity(a0,t,,[prov:type="createFile"])
  ...
  wasGeneratedBy(e0,a0)
  ...
  wasAssociatedWith(a4, ag5, [prov:role="communicator"]) )
contains the set of provenance records of section example-prov-asn-encoding, is asserted by agent http://example.org/asserter, and is identified by identifier ex:acc0.
An identifier in a record within the scope of an account is intended to denote a single record. However, nothing prevents an asserter from asserting an account containing, for example, multiple entity records with the same identifier but different attribute-value pairs. In that case, they should be understood as a single entity record with this identifier and the union of all attribute-value pairs, as formalized in identifiable-record-in-account.
Given an entity record identifier e, two sets of attribute-values denoted by av1 and av2, two entity records entity(e,av1) and entity(e,av2) occurring in an account are equivalent to the entity record entity(e,av) where av is the set of attribute-value pairs formed by the union of av1 and av2.
This constraint similarly applies to all other types of records. As a result, the identifier that occurs in a record is unique and acts as a local identifier for that record in that account.
Whilst constraint identifiable-record-in-account specifies how to understand multiple entity records with the same identifier within a given account, it does not guarantee that the entity record formed with the union of all attribute-value pairs satisfies the attribute occurrence validity property, as illustrated by the following example.
In the following account record, we find two entity records with the same identifier e.
account(ex:acc1,
  http://example.org/id,
  entity(e,[prov:type="person", ex:age=20])
  entity(e,[prov:type="person", ex:weight=50, ex:age=30])
  ...)
Application of identifiable-record-in-account results in an entity record containing the attribute-value pairs age=20, weight=50, and age=30. The namespace referred to by prefix ex declares the number of occurrences that are permitted for each attribute. The resulting entity record may or may not satisfy the attribute occurrence validity, depending on this namespace. For instance, if the namespace referred to by ex declares that age must have at most one occurrence, then the resulting entity record does not satisfy the attribute occurrence validity property. This document does not specify how to handle such an entity record.
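The merge and the subsequent validity check can be sketched together. Both function names and the max_occurrences mapping are hypothetical; in particular, PROV-DM leaves it to each namespace to declare permitted occurrences, so the limit of one for ex:age is an assumption made only for this illustration.

```python
from collections import Counter


def merge_entity_records(records):
    """records: iterable of (identifier, [(attribute, value), ...]).
    Entity records sharing an identifier are merged into the union
    of their attribute-value pairs."""
    merged = {}
    for identifier, pairs in records:
        merged.setdefault(identifier, set()).update(pairs)
    return merged


def occurrence_violations(pairs, max_occurrences):
    """Return the attributes that occur more often than their declared limit."""
    counts = Counter(attribute for attribute, _ in pairs)
    return sorted(
        attribute
        for attribute, count in counts.items()
        if attribute in max_occurrences and count > max_occurrences[attribute]
    )


merged = merge_entity_records([
    ("e", [("prov:type", "person"), ("ex:age", 20)]),
    ("e", [("prov:type", "person"), ("ex:weight", 50), ("ex:age", 30)]),
])
# Assumed declaration: ex:age may occur at most once.
print(occurrence_violations(merged["e"], {"ex:age": 1}))
```

The duplicate prov:type="person" pair collapses under set union, while the two distinct ex:age values both survive and are reported against the assumed limit.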
Account records can be nested since an account record can occur among the records being wrapped by another account.
Account records constitute a scope for identifier uniqueness. Since accounts can be nested, scopes can also be nested; thus, the requirement on uniqueness of identifiers should be understood in the context of such nested scopes. When a record with an identifier occurs directly within an account, then its identifier denotes this record in the scope of this account, except in sub-accounts where records with the same identifier occur.
The following account record is inspired by section example-prov-asn-encoding. This account, identified by ex:acc3, declares an entity record with identifier e0, which is referred to in the nested account ex:acc4. Identifier e0 uniquely identifies a record in account ex:acc3, including subaccount ex:acc4.
account(ex:acc3,
  http://example.org/asserter1,
  entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
  activity(a0,t,,[prov:type="createFile"])
  wasGeneratedBy(e0,a0,[])
  account(ex:acc4,
    http://example.org/asserter2,
    entity(e1, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="" ])
    activity(a0,t,,[prov:type="copyFile"])
    wasGeneratedBy(e1,a0,[ex:fct="create"])
    specializationOf(e1,e0)))
In contrast, an activity record identified by a0 occurs in each of the two accounts. Each of these activity records is therefore asserted in a separate scope, and the two may represent different activities in the world.
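The nested scoping of identifiers resembles lexical scoping in programming languages and can be sketched accordingly. The Account class below is a hypothetical illustration, not a PROV-DM construct: records are stored as plain strings and lookup walks from an account outwards through its enclosing accounts.

```python
class Account:
    """Toy model of an account as a scope for record identifiers."""

    def __init__(self, identifier, parent=None):
        self.identifier = identifier
        self.parent = parent      # enclosing account, if any
        self.records = {}         # record identifier -> record

    def declare(self, record_id, record):
        self.records[record_id] = record

    def resolve(self, record_id):
        """Return (account identifier, record) for the innermost declaration."""
        scope = self
        while scope is not None:
            if record_id in scope.records:
                return scope.identifier, scope.records[record_id]
            scope = scope.parent
        raise KeyError(record_id)


acc3 = Account("ex:acc3")
acc3.declare("e0", "entity e0")
acc3.declare("a0", "activity createFile")

acc4 = Account("ex:acc4", parent=acc3)
acc4.declare("e1", "entity e1")
acc4.declare("a0", "activity copyFile")

print(acc4.resolve("e0"))   # found in the enclosing account
print(acc4.resolve("a0"))   # the inner declaration takes precedence
```

Resolving a0 from ex:acc3 still yields the createFile activity, which mirrors the statement that the two a0 records live in separate scopes.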
The identifier of an account record is expected to be globally unique, whereas identifiers for other records are expected to be unique within the scope of the account in which their record occurs.
The account record is the hook by which further meta information can be expressed about provenance, such as asserter, time of creation, signatures. The annotation mechanism can be used for this purpose, but how general meta-information is expressed is beyond the scope of this specification, except for asserters.
A record container is a house-keeping construct of PROV-DM, also capable of bundling PROV-DM records. A record container is the root of a provenance record and can be exploited to package up PROV-DM records in response to a request for the provenance of something ([PROV-AQ]). Given that a record container is the root of a provenance record, it is not defined as a PROV-DM record (production record), since otherwise it could appear arbitrarily nested inside accounts.
A record container, written container decls recs endContainer in PROV-ASN, contains:
All the records in recs are implicitly wrapped in a default account, scoping all the record identifiers they declare directly, and constituting a top-level account in the hierarchy of accounts. Consequently, every provenance record is always expressed in the context of an account, either explicitly in an asserted account, or implicitly in a container's default account.
In PROV-ASN, a record container's text matches the recordContainer production of the grammar defined in this specification document.
The following container contains records related to the provenance of entity e2.
container
  prefix ex: http://example.org/,
  entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London last month."])
  activity(a1, 2011-11-16T16:05:00,,[prov:type="edit"])
  wasGeneratedBy(e2, a1, [ex:fct="save"])
  wasAssociatedWith(a1, ag2, [prov:role="author"])
  agent(ag2, [ prov:type="prov:Person" %% xsd:QName, ex:name="Bob" ])
endContainer
This container could for instance be returned by querying a provenance store for the provenance of entity e2 [PROV-AQ]. All these assertions are implicitly wrapped in a default account. In the absence of an explicit account, such provenance records remain unattributed.
The following container
container prefix ex: http://example.org/, account(ex:acc1,http://example.org/asserter1,...) account(ex:acc2,http://example.org/asserter1,...) endContainer
illustrates how two accounts with identifiers ex:acc1 and ex:acc2 can be returned in a PROV-ASN serialization of the provenance of something.
An attribute is a qualified name. A qualified name is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local name. The namespace is denoted by an IRI [IRI].
PROV-DM stipulates that a qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part (see detailed rule in [RDF-SPARQL-QUERY], Section 4.1.1).
A qualified name's prefix is optional. If a prefix occurs in a qualified name, it refers to a namespace declared in the record container. In the absence of prefix, the qualified name refers to the default namespace declared in the container.
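The mapping from a qualified name to an IRI can be sketched as follows. This is a minimal illustration only, not part of PROV-DM: the function name expand_qualified_name and the dictionary-based representation of namespace declarations are assumptions made for this example, and the detailed rule remains the one in [RDF-SPARQL-QUERY], Section 4.1.1.

```python
from typing import Dict, Optional


def expand_qualified_name(
    qname: str,
    prefixes: Dict[str, str],
    default_ns: Optional[str] = None,
) -> str:
    """Map a qualified name to an IRI by concatenating the namespace IRI
    and the local part (hypothetical helper, simplified rule)."""
    if ":" in qname:
        # Prefixed name: the prefix must be declared in the container.
        prefix, local = qname.split(":", 1)
        if prefix not in prefixes:
            raise KeyError(f"undeclared prefix: {prefix}")
        return prefixes[prefix] + local
    # Unprefixed name: fall back to the container's default namespace.
    if default_ns is None:
        raise KeyError("no default namespace declared")
    return default_ns + qname


# Namespace declaration taken from the container examples in this document.
declared = {"ex": "http://example.org/"}
print(expand_qualified_name("ex:acc1", declared))
# prints http://example.org/acc1
```

An unprefixed name such as e2 would be resolved against the default namespace passed as default_ns; the sketch raises an error when no such declaration is available.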
In PROV-ASN, an attribute's text matches the attribute production of the grammar defined in this specification document.
For each attribute in a record, its namespace also declares the number of occurrences it may have in a list of attributes. The property attribute occurrence validity holds for a record if the actual number of occurrences of each attribute in this record is compatible with this attribute's declaration in its namespace. How to handle records that do not satisfy the attribute occurrence validity property is beyond the scope of this specification.
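A check of attribute occurrence validity might look like the following sketch. It assumes, purely for illustration, that a namespace's declaration can be summarized as a maximum number of occurrences per attribute; the limits shown are hypothetical and are not taken from the PROV-DM namespace.

```python
from collections import Counter
from typing import Dict, List, Tuple


def occurrence_valid(
    attrs: List[Tuple[str, str]],
    max_occurrences: Dict[str, int],
) -> bool:
    """Return True if no attribute occurs more often than its declared
    maximum (hypothetical reading of 'compatible with the declaration')."""
    counts = Counter(name for name, _ in attrs)
    return all(counts[name] <= limit for name, limit in max_occurrences.items())


# Hypothetical declaration: at most one role per attribute list.
limits = {"prov:role": 1}

ok = occurrence_valid([("prov:role", "author")], limits)
bad = occurrence_valid([("prov:role", "author"), ("prov:role", "editor")], limits)
```

What an application does when the check fails is left open, in line with the statement above that handling invalid records is out of scope.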
From this specification's viewpoint, the interpretation of an attribute declared in a namespace other than prov-dm is out of scope.
The PROV data model introduces a fixed set of attributes in the PROV-DM namespace:
The following start record describes the role of the agent identified by ag in this start relation with activity a.
wasStartedBy(a,ag, [prov:role="program-operator"])
The following record declares an agent of type software agent
agent(ag, [prov:type="prov:SoftwareAgent" %% xsd:QName])
The following record declares an imprecise-1 derivation, which is known to involve one activity, though its identity, usage details of ex:e1, and generation details of ex:e2 are not asserted.
wasDerivedFrom(ex:e2, ex:e1, [prov:steps="single"])
An identifier is a qualified name. A qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part (see detailed rule in [RDF-SPARQL-QUERY], Section 4.1.1).
A PROV-DM Literal represents a data value such as a particular string or number. A PROV-DM Literal represents a value whose interpretation is outside the scope of PROV-DM.
In PROV-ASN, a Literal's text matches the Literal production of the grammar defined in this specification document.
The non-terminals stringLiteral and intLiteral are syntactic sugar for quoted strings with datatype xsd:string and xsd:int, respectively.
In particular, a PROV-DM Literal may be an IRI-typed string (with datatype xsd:anyURI); such IRI has no specific interpretation in the context of PROV-DM.
The following examples are, respectively, the string "abc" (expressed using the convenience notation), the string "abc", the integer number 1, the integer number 1 (expressed using the convenience notation) and the IRI "http://example.org/foo".
"abc"
"abc" %% xsd:string
"1" %% xsd:int
1
"http://example.org/foo" %% xsd:anyURI
The following example shows a literal of type xsd:QName (see QName [XMLSCHEMA-2]). The prefix ex must be bound to a namespace declared in the record container.
"ex:value" %% xsd:QName
Time instants are defined according to xsd:dateTime [XMLSCHEMA-2].
It is optional to assert time in usage, generation, and activity records.
An asserter is a creator of PROV-DM records. An asserter is denoted by an IRI. Such IRI has no specific interpretation in the context of PROV-DM.
A PROV-DM namespace is identified by an IRI reference [IRI]. In PROV-DM, attributes, identifiers, and literals with datatype xsd:QName can be placed in a namespace using the mechanisms described in this specification.
A namespace declaration consists of a binding between a prefix and a namespace. Every qualified name with this prefix in the scope of this declaration refers to this namespace. A default namespace declaration consists of a namespace. Every unprefixed qualified name in the scope of this default namespace declaration refers to this namespace.
Location is an identifiable geographic place (ISO 19112). As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, row, column, and so forth. This document does not specify how to concretely express locations, but instead provides a mechanism to introduce locations in assertions.
Location is an optional attribute of entity records and activity records. The value associated with an attribute location must be a Literal, expected to denote a location.
This section contains the normative specification of common relations of PROV-DM.
The following figure summarizes the additional relations described in subsections 6.1 to 6.7.
It is common that we may want to know who or what may have some influence, whether direct or indirect, on a given entity, or who may, directly or not, have some responsibility for a given outcome. Hence, we may want to infer such a notion from an existing set of PROV-DM records. Vice-versa, we may have knowledge of this influence and responsibility, but without knowing its actual details. Thus, we may also want to assert such a notion.
A traceability record states the existence of a "dependency path" between two entities, indicating that one entity can be shown to be in the lineage of another, and may have influenced it, or may bear some responsibility for it, in some way. The traceability relation subsumes derivation, activity association, and responsibility, and is defined to be transitive.
A traceability record, written tracedTo(id,e2,e1,attrs) in PROV-ASN, contains the following components:
In PROV-ASN, a traceability record's text matches the traceabilityRecord production of the grammar defined in this specification document.
A traceability record can be inferred from existing records, or can be asserted stating that such a dependency path exists without the asserter knowing its individual steps, as expressed by the following inference and constraint, respectively.
We note that the inference rule traceability-inference does not allow us to infer attributes, which are record and application specific.
We note that the previous constraint is not really an inference rule, since there is nothing that we can actually infer. Instead, this constraint should simply be seen as part of the definition of the traceability record.
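Since the traceability relation is defined to be transitive, one simple way to compute it over a set of records is a transitive closure. The sketch below is an assumption-laden illustration: it only considers pairs of entities linked by derivation or by an asserted tracedTo record, ignores attributes (which cannot be inferred), and the function name traced_to is hypothetical.

```python
from typing import Iterable, Set, Tuple

# (e2, e1) reads "e2 is traced to e1", e.g. from wasDerivedFrom(e2,e1)
# or from an asserted tracedTo(e2,e1).
Edge = Tuple[str, str]


def traced_to(edges: Iterable[Edge]) -> Set[Edge]:
    """Transitive closure of the given dependency edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


# e3 derived from e2, e2 derived from e1 (hypothetical identifiers).
paths = traced_to([("e3", "e2"), ("e2", "e1")])
```

With these inputs the closure contains the pair (e3, e1) in addition to the two original pairs, corresponding to an inferred tracedTo(e3,e1).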
PROV-DM allows dependencies amongst activities to be expressed. An information flow ordering record is a representation that an entity was generated by an activity, before it was used by another activity. A control ordering record is a representation that an activity was initiated by another activity.
In PROV-ASN, an activity ordering record's text matches the activityOrderingRecord production of the grammar defined in this specification document.
An information flow ordering record, written as wasInformedBy(id,a2,a1,attrs) in PROV-ASN, contains:
An information flow ordering record is formally defined as follows.
The relationship wasInformedBy is not transitive. Indeed, consider the following records.
wasInformedBy(a2,a1) wasInformedBy(a3,a2)
We cannot infer wasInformedBy(a3,a1) from them. Indeed, from wasInformedBy(a2,a1), we know that there exists e1 such that e1 was generated by a1 and used by a2. Likewise, from wasInformedBy(a3,a2), we know that there exists e2 such that e2 was generated by a2 and used by a3. The following illustration shows a case for which transitivity cannot hold. The horizontal axis represents the event line. We see that e1 was generated after e2 was used. Furthermore, the illustration also shows that a3 completes before a1. So it is impossible for a3 to have used an entity generated by a1.
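The counterexample can be made concrete with numeric event times. The values below are hypothetical and chosen only to reproduce the situation described above: e1 is generated after e2 is used, and a3 completes before a1 starts.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    start: int
    end: int


def within(t: int, act: Activity) -> bool:
    """True if event time t falls inside the activity's interval."""
    return act.start <= t <= act.end


# Hypothetical timeline.
a1 = Activity(start=5, end=8)
a2 = Activity(start=1, end=9)
a3 = Activity(start=2, end=4)

gen_e1, use_e1 = 6, 7  # e1 generated by a1, then used by a2
gen_e2, use_e2 = 2, 3  # e2 generated by a2, then used by a3

# wasInformedBy(a2,a1) is witnessed by e1, wasInformedBy(a3,a2) by e2.
informed_a2_a1 = within(gen_e1, a1) and within(use_e1, a2) and gen_e1 <= use_e1
informed_a3_a2 = within(gen_e2, a2) and within(use_e2, a3) and gen_e2 <= use_e2

# Any entity generated by a1 appears at or after a1.start; any usage by a3
# happens at or before a3.end. A witness for wasInformedBy(a3,a1) would
# therefore need a1.start <= a3.end.
a3_could_use_from_a1 = a1.start <= a3.end
```

Both asserted relations hold on this timeline, yet a3_could_use_from_a1 is false, so no entity can witness wasInformedBy(a3,a1).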
A control ordering record, written as wasStartedBy(a2,a1, attrs) in PROV-ASN, contains:
Such a record states control ordering between a2 and a1, specified as follows.
We note that a start record associates an activity with an agent, and is denoted by the name wasStartedBy. A control ordering record associates an activity with another activity, also denoted by the name wasStartedBy. Effectively, by considering both record types, the relation wasStartedBy has a range formed by the union of agents and activities.
In the following assertions, we find two activity records, identified by a1 and a2, representing two activities, which took place on two separate hosts. The third record indicates that the latter activity was started by the former.
activity(a1,t1,t2,[ex:host="server1.example.org",prov:type="workflow"]) activity(a2,t3,t4,[ex:host="server2.example.org",prov:type="subworkflow"]) wasStartedBy(a2,a1)
Alternatively, we could have asserted the existence of an entity, representing a request to create a sub-workflow. This request, issued by a1, triggered the start of a2.
entity(e,[prov:type="creation-request"]) wasGeneratedBy(e,a1) wasStartedBy(a2,e)
A revision record is a representation of the creation of an entity considered to be a variant of another. Deciding whether something is made available as a revision of something else usually involves an agent who represents someone in the world who takes responsibility for approving that the former is a due variant of the latter.
A revision record, written wasRevisionOf(e2,e1,ag,attrs) in PROV-ASN, contains:
In PROV-ASN, a revision record's text matches the revisionRecord production of the grammar defined in this specification document.
A revision record needs to satisfy the following constraint, linking the two entity records by a derivation, and stating them to be a complement of a third entity record.
wasRevisionOf is a strict sub-relation of wasDerivedFrom since two entities e2 and e1 may satisfy wasDerivedFrom(e2,e1) without being a variant of each other.
The following revision assertion
agent(ag,[prov:type="QualityController"]) entity(e1,[prov:type="document"]) entity(e2,[prov:type="document"]) wasRevisionOf(e2,e1,ag)
states that the document represented by entity record identified by e2 is a revision of document represented by entity record identified by e1; agent denoted by ag is responsible for this new versioning of the document.
An attribution record represents that an entity is ascribed to an agent.
An attribution record, written wasAttributedTo(e,ag,attr) in PROV-ASN, contains the following components:
Attribution models the notion of an activity generating an entity identified by e being associated with an agent ag, which takes responsibility for generating e. Formally, this is expressed as the following necessary condition.
activity(a,t1,t2,attr1) wasGeneratedBy(e,a) wasAssociatedWith(a,ag,attr2)
for some sets of attribute-value pairs attr1 and attr2, and times t1 and t2.
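The necessary condition for attribution can be checked mechanically over a set of records. The following is a minimal sketch, assuming records are reduced to pairs of identifiers (attributes and times are ignored); the function name attribution_supported is hypothetical.

```python
from typing import Set, Tuple


def attribution_supported(
    e: str,
    ag: str,
    generated_by: Set[Tuple[str, str]],     # (entity, activity) pairs
    associated_with: Set[Tuple[str, str]],  # (activity, agent) pairs
) -> bool:
    """True if some activity generated e and was associated with ag."""
    return any(
        (a, ag) in associated_with
        for (entity, a) in generated_by
        if entity == e
    )


generated_by = {("e", "a")}
associated_with = {("a", "ag")}

supported = attribution_supported("e", "ag", generated_by, associated_with)
unsupported = attribution_supported("e", "ag2", generated_by, associated_with)
```

Because the condition is only necessary, a negative result does not by itself invalidate an attribution record: the activity may exist without having been asserted.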
In PROV-ASN, an attribution record's text matches the attributionRecord production of the grammar.
A quotation record is a representation of the repeating or copying of some part of an entity.
A quotation record, written wasQuotedFrom(e2,e1,ag2,ag1,attrs) in PROV-ASN, contains:
In PROV-ASN, a quotation record's text matches the quotationRecord production of the grammar.
wasDerivedFrom(e2,e1) wasAttributedTo(e2,ag2) wasAttributedTo(e1,ag1)
A summary record represents that an entity (expected to be a document) is a synopsis or abbreviation of another entity (also expected to be a document).
A summary record, written wasSummaryOf(e2,e1,attrs) in PROV-ASN, contains:
wasSummaryOf is a strict sub-relation of wasDerivedFrom.
In PROV-ASN, a summary record's text matches the summaryRecord production of the grammar.
An original source record represents an entity in which another entity first appeared.
An assertion hadOriginalSource, written hadOriginalSource(e2,e1,attrs), contains:
hadOriginalSource is a strict sub-relation of wasDerivedFrom.
In PROV-ASN, an original source record's text matches the originalSourceRecord production of the grammar.
We adopt a very generic form of collection for this purpose, namely an abstract data type consisting of a set of key-value pairs, often referred to as a map. This provides a generic indexing structure that can be used to model commonly used data structures, including associative lists (also known as "dictionaries" or maps in some programming languages), relational tables, ordered lists, and more (the specification of such specialized structures in terms of key-value pairs is out of the scope of this document).
Keys and values used in collections are literals. This allows expressing nested collections, that is, collections whose values include entities of type collection.
The following relations and corresponding record types are introduced to model (a) insertion of a new key-value pair into a collection and (b) removal of a key-value pair from a collection.
Because these relations state the derivation of a collection from another, formally they are specializations of the precise-1 wasDerivedFrom relation.
The following entity types are introduced:
The intent of these relations and entity types is to capture the history of changes that occurred to a collection.
The following examples illustrate how these assertions are expected to be used in practice.
entity(c, [prov:type="EmptyCollection"]) // c is an empty collection
entity(k1)
entity(v1)
entity(k2)
entity(v2)
entity(c1, [prov:type="Collection"])
entity(c2, [prov:type="Collection"])
CollectionAfterInsertion(c1, c, k1, v1) // c1 = { (k1,v1) }
CollectionAfterInsertion(c2, c1, k2, v2) // c2 = { (k1,v1), (k2,v2) }
CollectionAfterRemoval(c3, c2, k1) // c3 = { (k2,v2) }
This representation of a collection's evolution makes no assumption regarding the underlying data structure used to store and manage collections. In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates. In fact, the state of a collection (i.e., the set of key-value pairs it contains) at a given point in a sequence of operations is never stated explicitly. Rather, it can be obtained by querying the chain of derivation assertions involving insertions and removals. Entity type prov:type="emptyCollection" can be used in this context as it marks the start of a sequence of collection operations.
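Querying the chain of derivation assertions to recover a collection's state can be sketched as follows. This is an illustration under stated assumptions: insertion and removal records are represented as tuples keyed by the resulting collection's identifier, the set of empty collections is given explicitly, and the function name collection_state is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def collection_state(
    coll: str,
    records: Dict[str, Tuple],
    empty: Set[str],
) -> Optional[Dict[str, str]]:
    """Replay insertions and removals from an empty collection up to coll.

    records maps collAfter to ("insert", collBefore, key, value) or
    ("remove", collBefore, key). Returns None when the chain does not
    reach an empty collection, i.e. the state is not fully known.
    """
    ops = []
    seen = set()
    current = coll
    # Walk backwards until an empty collection is reached.
    while current not in empty:
        if current in seen or current not in records:
            return None
        seen.add(current)
        op = records[current]
        ops.append(op)
        current = op[1]
    # Replay the operations, oldest first.
    state: Dict[str, str] = {}
    for op in reversed(ops):
        if op[0] == "insert":
            _, _, key, value = op
            state[key] = value
        else:
            _, _, key = op
            state.pop(key, None)
    return state


# Records mirroring the first collection example in this section.
records = {
    "c1": ("insert", "c", "k1", "v1"),
    "c2": ("insert", "c1", "k2", "v2"),
    "c3": ("remove", "c2", "k1"),
}
print(collection_state("c3", records, {"c"}))
# prints {'k2': 'v2'}
```

When a step in the chain is only known through a plain wasDerivedFrom record, or the starting collection is not known to be empty, the sketch returns None rather than guessing the contents.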
Observations:
entity(c, [prov:type="EmptyCollection"]) // c is an empty collection
entity(k1)
entity(v1)
entity(k2)
entity(v2)
entity(k3)
entity(v3)
entity(c1, [prov:type="Collection"])
entity(c2, [prov:type="Collection"])
entity(c3, [prov:type="Collection"])
CollectionAfterInsertion(c1, c, k1, v1) // c1 = { (k1,v1) }
CollectionAfterInsertion(c2, c, k2, v2) // c2 = { (k2,v2) }
CollectionAfterInsertion(c3, c1, k3,v3) // c3 = { (k1,v1), (k3,v3) }
entity(c, [prov:type="collection"]) // c is a collection, possibly not empty
entity(k1)
entity(v1)
entity(k2)
entity(v2, [prov:type="collection"]) // v2 is a collection
CollectionAfterInsertion(c1, c, k1, v1) // c1 includes { (k1,v1) } but may contain additional unknown pairs
CollectionAfterInsertion(c2, c1, k2, v2) // c2 includes { (k1,v1), (k2,v2) } where v2 is a collection with unknown state
entity(c, [prov:type="emptyCollection"]) // c is an empty collection
entity(k1)
entity(v1)
entity(k2)
entity(v2)
CollectionAfterInsertion(c1, c, k1, v1) // c1 = { (k1,v1) }
wasDerivedFrom(c2, c1) // the asserter knows that c2 is somehow derived from c1, but cannot assert the precise sequence of updates
CollectionAfterInsertion(c3, c2, k2, v2)
An assertion CollectionAfterInsertion, written CollectionAfterInsertion(collAfter, collBefore, key, value), contains:
An assertion CollectionAfterDeletion, written CollectionAfterDeletion(collAfter, collBefore, key), contains:
In PROV-ASN, a collection record's text matches the collectionRecord production of the grammar:
The previous two sections have introduced a data model for provenance, without introducing any constraint that this data model has to satisfy. In this section, we explore the constraints that this data model has to satisfy.
Section section-time-event introduces a notion of instantaneous event marking changes in the world, in its activities and entities. PROV-DM identifies four kinds of instantaneous events, namely entity generation event, entity usage event, activity start event and activity end event. PROV-DM adopts Lamport's clock assumptions [CLOCK] in the form of a reflexive, transitive partial order follows (and its inverse precedes) between instantaneous events. Furthermore, PROV-DM assumes the existence of a mapping from instantaneous events to time clocks, though the actual mapping is not in scope of this specification.
Given that provenance records offer a description of past entities and activities, to be meaningful provenance records must satisfy instantaneous event ordering constraints, which we introduce in this section. For instance, an entity can only be used after it was generated; hence, we say that an entity's generation event precedes any of this entity's usage events. Should this ordering constraint be proven invalid, the associated generation and usage records could not be credible. The rest of this section defines the temporal interpretation of provenance records as the set of instantaneous event ordering constraints associated with provenance records.
PROV-DM also allows for time observations to be inserted in specific provenance records, for each of the four kinds of instantaneous events introduced in this specification. The presence of a time observation for a given instantaneous event fixes the mapping of this instantaneous event to the timeline. The presence of time information in a provenance record instantiates the ordering constraint with that time information. It is expected that such instantiated constraint can help corroborate provenance information. We anticipate that verification algorithms could be developed though this verification is outside the scope of this specification.
The following figure summarizes the ordering constraints in a graphical manner. For each subfigure, an event time line points to the right. Activities are represented by rectangles, whereas entities are represented by circles. Usage, generation and derivation records are represented by the corresponding edges between entities and activities. The four kinds of instantaneous events are represented by vertical dotted lines (adjacent to the vertical sides of an activity's rectangle, or intersecting usage and generation edges). The ordering constraints are represented by triangles: an occurrence of a triangle between two instantaneous event vertical dotted lines represents that the event denoted by the left line precedes the event denoted by the right line.
The mere existence of an activity assertion entails some event ordering in the world, since an activity start event always precedes the corresponding activity end event. This is illustrated by Subfigure constraint-summary (a) and expressed by constraint start-precedes-end.
Assertion of a usage record and a generation record for a given entity implies ordering of events in the world, since the generation event had to precede the usage event. This is illustrated by Subfigure constraint-summary (b) and expressed by constraint generation-precedes-usage.
The assertion of a usage record implies ordering of events in the world, since the corresponding event had to occur during the associated activity. This is illustrated by Subfigure constraint-summary (c) and expressed by constraint usage-within-activity.
The assertion of a generation record implies ordering of events in the world, since the corresponding event had to occur during the associated activity. This is illustrated by Subfigure constraint-summary (d) and expressed by constraint generation-within-activity.
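The four constraints described so far (start-precedes-end, generation-precedes-usage, usage-within-activity and generation-within-activity) lend themselves to a simple verification procedure when time observations are available. The sketch below is not a normative algorithm: it assumes that every event carries a numeric time, whereas PROV-DM makes time optional, and the function name and record layout are hypothetical.

```python
from typing import Dict, List, Tuple


def check_event_ordering(
    activities: Dict[str, Tuple[float, float]],   # id -> (start, end)
    generations: List[Tuple[str, str, float]],    # (entity, activity, time)
    usages: List[Tuple[str, str, float]],         # (activity, entity, time)
) -> List[Tuple[str, str]]:
    """Return (constraint name, identifier) pairs for each violation."""
    violations = []
    for a, (start, end) in activities.items():
        if start > end:
            violations.append(("start-precedes-end", a))
    for e, a, t in generations:
        start, end = activities[a]
        if not (start <= t <= end):
            violations.append(("generation-within-activity", e))
    gen_time = {e: t for e, _, t in generations}
    for a, e, t in usages:
        start, end = activities[a]
        if not (start <= t <= end):
            violations.append(("usage-within-activity", e))
        if e in gen_time and gen_time[e] > t:
            violations.append(("generation-precedes-usage", e))
    return violations


# Hypothetical records: e is generated by a1 and later used by a2.
activities = {"a1": (0, 10), "a2": (5, 20)}
generations = [("e", "a1", 8)]

consistent = check_event_ordering(activities, generations, [("a2", "e", 12)])
inconsistent = check_event_ordering(activities, generations, [("a2", "e", 6)])
```

The first call reports no violation; the second reports generation-precedes-usage for e, since the usage at time 6 would precede the generation at time 8.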
If a derivation record holds for e2 and e1, then this means that the entity e1 had some form of influence on the entity e2; for this to be possible, some event ordering must be satisfied. First, we consider one-activity derivations. In that case, the usage of e1 has to precede the generation of e2. This is illustrated by Subfigure constraint-summary (e) and expressed by constraint derivation-usage-generation-ordering.
For imprecise-n derivations, a similar constraint exists, but in this case, no usage record can be inferred for e1. Instead, the constraint refers to its generation event, as illustrated by Subfigure constraint-summary (f) and expressed by constraint derivation-generation-generation-ordering.
Note that event ordering is between generations of e1 and e2, as opposed to precise-1 derivation, which implies ordering between the usage of e1 and generation of e2. Indeed, in the case of imprecise-n derivation, nothing is known about the usage of e1, since there is no associated activity.
The assertion of an information flow ordering record between two activities a1 and a2 also implies ordering of events in the world, since some entity must have been generated by the former and used by the latter, which implies that the start event of a1 cannot follow the end event of a2. This is illustrated by Subfigure constraint-summary (g) and expressed by constraint wasInformedBy-ordering.
The assertion of a control flow ordering record between two activities of a1 and a2 also implies ordering of events in the world, since a1 must have been active before a2 started. This is illustrated by Subfigure constraint-summary (h) and expressed by constraint wasStartedBy-ordering.
Further constraints appear in Figure constraint-summary2 and are discussed below.
An agent that started an activity must exist when the activity starts. This is illustrated by Subfigure constraint-summary2 (a) and expressed by constraint wasStartedByAgent-ordering.
An activity that was associated with an agent must have some overlap with the agent. The agent may be generated, or may only become associated with the activity, after its start: so, the agent is required to exist before the activity end. Likewise, the agent may be destructed, or may terminate its association with the activity, before the activity end: hence, the agent destruction is required to happen after the activity start. This is illustrated by Subfigure constraint-summary2 (b) and expressed by constraint wasAssociatedWith-ordering.
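The overlap requirement between an agent and its associated activity amounts to an interval intersection test. The sketch below assumes that numeric times are known for the agent's generation and destruction and for the activity's start and end; the function and parameter names are hypothetical.

```python
def association_overlaps(
    agent_generated: float,
    agent_destroyed: float,
    activity_start: float,
    activity_end: float,
) -> bool:
    """True if the agent exists before the activity ends and is
    destroyed after the activity starts."""
    return agent_generated <= activity_end and activity_start <= agent_destroyed


# The agent appears during the activity and outlives it.
overlapping = association_overlaps(3, 30, 1, 10)
# The agent is destroyed before the activity even starts.
disjoint = association_overlaps(0, 2, 5, 10)
```

Non-strict comparisons are used because the precedes relation between instantaneous events is reflexive.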
Sections 5 and 6 define a data model for provenance, which, for the most part, is unconstrained. Section 7.1 defines an interpretation of this data model, in terms of event ordering constraints. This section introduces further constraints on the structure of PROV-DM records. Records that satisfy these constraints are said to be structurally well-formed. A benefit of structurally well-formed provenance records is that further inferences can be made, because records are more precise, and therefore, richer.
According to the definition of a generation record, an entity becomes available after this entity's generation event, and does not exist before this event. From this definition, we conclude that PROV-DM does not allow for an entity to have two generation records occurring at two different instants. The rationale for this constraint is as follows. Two distinct generation events (by a same activity or by two distinct activities), occurring one after the other, necessarily create two distinct entities; otherwise, the second generation event would have resulted in an entity that existed before its creation, which contradicts the definition of generation record.
So, PROV-DM allows for two distinct generation records g1 and g2 referencing a same entity record provided they occur simultaneously. In practice, for such a simultaneous generation to occur, the generation event has to be unique and caused by a single world activity, though the provenance records may contain different activity records providing alternative descriptions of that same world activity.
In the following assertions, a workflow execution a0 consists of two sub-workflow executions a1 and a2. Sub-workflow execution a2 generates entity e, so does a0.
activity(a0,,,[prov:type="workflow execution"])
activity(a1,,,[prov:type="workflow execution"])
activity(a2,,,[prov:type="workflow execution"])
wasInformedBy(a2,a1)
wasGeneratedBy(e,a0)
wasGeneratedBy(e,a2)
This example is permitted in PROV-DM if the two activity records a0 and a2 provide alternate descriptions of what happens in the world with respect to this generation event.
While this example is permitted in PROV-DM, it does not expose the hierarchical organization of executions and it mixes records providing two descriptions of a same execution. This issue is highlighted by two different generation records for entity e, which makes reasoning about this kind of provenance records unnecessarily difficult. Such assertions are said not to be structurally well-formed.
Structurally well-formed provenance records can be obtained by partitioning the generation records into different accounts. This makes it clear that these records provide alternative descriptions of the same real-world generation event, rather than describing two generation events for the same entity. When accounts are used, the example can be encoded as follows.
The same example is now revisited, with the following assertions that are structurally well-formed. Two accounts are introduced, and there is a single generation record for entity e per account.
account(ex:summary, http://example.org/asserter,
  activity(a0,t1,t2,[prov:type="workflow execution"])
  wasGeneratedBy(e,a0))
account(ex:detailed, http://example.org/asserter,
  activity(a1,t1,t3,[prov:type="workflow execution"])
  activity(a2,t3,t2,[prov:type="workflow execution"])
  wasInformedBy(a2,a1)
  wasGeneratedBy(e,a2))
Structurally well-formed records satisfy some constraints, which force the structure of descriptions to be exposed by means of accounts. With these constraints satisfied, further inferences can be made about structurally well-formed records. The uniqueness of generation records in accounts is formulated as follows.
A further inference is permitted from the imprecise-1 derivation record:
Given an activity record identified by a, entity records identified by e1 and e2, and set of attribute-value pairs attrs2, if wasDerivedFrom(e2,e1, [prov:steps="single"]) and wasGeneratedBy(e2,a,attrs2) hold, then used(a,e1,attrs1) also holds for some set of attribute-value pairs attrs1.
This inference is justified by the fact that the entity represented by entity record identified by e2 is generated by at most one activity in a given account (see generation-uniqueness). Hence, this activity record is also the one referred to in the usage record of e1.
We note that the converse inference does not hold. From wasDerivedFrom(e2,e1) and used(a,e1), one cannot derive wasGeneratedBy(e2,a,attrs2), because identifier e1 may occur in usage records referring to many activity records, which may not be referred to in generation records containing identifier e2.
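The derivation-use inference can be sketched as a small function over the records of a single structurally well-formed account. The representation is an assumption made for illustration: derivations are triples carrying the value of prov:steps (or None when absent), generations are pairs, attributes are ignored, and the function name infer_usages is hypothetical.

```python
from typing import List, Optional, Set, Tuple


def infer_usages(
    derivations: List[Tuple[str, str, Optional[str]]],  # (e2, e1, prov:steps)
    generations: List[Tuple[str, str]],                 # (entity, activity)
) -> Set[Tuple[str, str]]:
    """Infer used(a,e1) from wasDerivedFrom(e2,e1,[prov:steps="single"])
    and wasGeneratedBy(e2,a)."""
    # Relies on generation-uniqueness: one generating activity per entity.
    generated_by = dict(generations)
    inferred = set()
    for e2, e1, steps in derivations:
        if steps == "single" and e2 in generated_by:
            inferred.add((generated_by[e2], e1))
    return inferred


single_step = infer_usages([("e2", "e1", "single")], [("e2", "a")])
unspecified = infer_usages([("e2", "e1", None)], [("e2", "a")])
```

Only the derivation marked with prov:steps="single" yields the pair (a, e1), standing for used(a,e1); the sketch makes no attempt at the converse direction, which does not hold.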
An account is said to be structurally well-formed if it satisfies the constraint generation-uniqueness. If an account is structurally well-formed, it supports the inference derivation-use.
The union of two accounts is another account, containing the unions of their respective records, where records with a same identifier should be understood according to constraint identifiable-record-in-account. Structurally well-formed accounts are not closed under union because the constraint generation-uniqueness may no longer be satisfied in the resulting union.
Indeed, let us reconsider example account-example-1, and let us define another account record as follows.
account(ex:acc2, http://example.org/asserter2, entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ]) ... activity(a1,t1,,[prov:type="createFile"]) ... wasGeneratedBy(e0,a1,[ex:fct="create"]) ... )
with identifier ex:acc2, containing assertions by asserter http://example.org/asserter2 stating that the entity represented by entity record identified by e0 was generated by an activity represented by activity record identified by a1 instead of a0 in the previous account ex:acc0. If accounts ex:acc0 and ex:acc2 are merged together, the resulting set of records violates generation-uniqueness if the two activities a0 and a1 are distinct.
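The loss of well-formedness under union can be illustrated with a small check of generation-uniqueness. The sketch reduces each account to its generation records, represented as (entity, activity) pairs, and assumes that distinct activity identifiers denote distinct activities; the function name generation_unique is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def generation_unique(generations: Iterable[Tuple[str, str]]) -> bool:
    """True if every entity is generated by at most one activity."""
    by_entity = defaultdict(set)
    for entity, activity in generations:
        by_entity[entity].add(activity)
    return all(len(acts) == 1 for acts in by_entity.values())


# Generation records of the two accounts discussed above.
acc0 = {("e0", "a0")}
acc2 = {("e0", "a1")}

each_well_formed = generation_unique(acc0) and generation_unique(acc2)
union_well_formed = generation_unique(acc0 | acc2)
```

Each account satisfies the constraint on its own, while their union does not, because e0 would then be generated by both a0 and a1.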
The PROV data model provides several extensibility points that allow designers to specialize it to specific applications or domains. We summarize these extensibility points here:
The PROV-DM namespace declares a set of reserved attributes catering for extensibility: type, location.
To this end, the PROV-DM namespace declares a reserved attribute: role.
The PROV data model is designed to be application and technology independent, but specializations of PROV-DM are welcome and encouraged. To ensure inter-operability, specializations of the PROV data model that exploit the extensibility points summarized in this section must preserve the semantics specified in this document. For instance, a qualified attribute on a domain specific entity record must represent an aspect of an entity and this aspect must remain unchanged during the characterization's interval of this entity record.
This specification introduces the notion of an identifiable entity in the world. In PROV-DM, an entity record is a representation of such an identifiable entity. An entity record includes an identifier identifying this entity. Identifiers are qualified names, which can be mapped to IRIs.
The term 'resource' is used in a general sense for whatever might be identified by a URI [RFC3986]. On the Web, a URI denotes a resource, without any expectation that the resource is accessed.
The purpose of this section is to clarify the relationship between resource and the notions of entity and entity record.
In the context of PROV-DM, a resource is just a thing in the world. One may take multiple perspectives on such a thing and its situation in the world, fixing some of its aspects.
We refer to the example of section 2.1 for a resource (at some URL) and three different perspectives, referred to as entities. Three different entity records can be expressed for this report, which in the PROV-ASN sample below, are expressed within a same account.
container
  prefix app http://example.org/app/
  prefix cr http://example.org/crime/
  account(acc1, http://example.org/asserter1,
    entity(app:0, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
    entity(app:1, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
    entity(app:2, [ prov:type="Document", cr:author="John" ])
    ...)
endContainer
Each entity record contains an identifier that is unique in account acc1, and therefore locally identifies the entity record it is contained in. In this example, three identifiers were minted.
Given that the report is a resource denoted by the URI http://example.org/crime.txt, we could simply use this URI as the identifier of an entity. This would avoid us minting new URIs. Hence, the report URI would play a double role: as a URI it denotes a resource accessible at that URI, and as an identifier in a PROV-DM record, it helps identify a specific characterization of this report. A given identifier occurring in an entity record must be unique within the scope of an account. Hence, below, all entity records have been given the same identifier but appear in the scope of different accounts, so as to satisfy identifiable-record-in-account.
container
  prefix app http://example.org/
  prefix cr http://example.org/crime/
  account(acc2, http://example.org/asserter1,
    entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
    ...)
  account(acc3, http://example.org/asserter1,
    entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
    ...)
  account(acc4, http://example.org/asserter1,
    entity(app:crime.txt, [ prov:type="Document", cr:author="John" ])
    ...)
endContainer
In this case, the qualified name app:crime.txt, which maps to the URI http://example.org/crime.txt, still denotes the same resource; however, the perspectives we take on that resource are expressed by multiple entity records, which all happen to contain the same identifier but in different accounts.
Alternatively, if we need to assert the existence of two different perspectives on the report within the same account, then alternate identifiers must be used, one of them being allowed to be the resource URI.
container
  prefix app http://example.org/
  prefix app2 http://example.org/app/
  prefix cr http://example.org/crime/
  account(acc5, http://example.org/asserter1,
    entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
    entity(app2:1, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
    ...)
endContainer
WG membership to be listed here.