Last Call: Constraints of the Provenance Data Model

On Sept. 11, 2012, the Provenance Working Group announced Last Call on a new document, PROV-CONSTRAINTS, part of the suite that defines the core of the PROV family of specifications.

This follows the recent Last Call announcement for three other documents: PROV-DM, PROV-O, and PROV-N. The meaning of Last Call is clarified in the earlier announcement. Essentially, the specification document is open to public comment for a set period of time, at the end of which the editors commit to producing the final version of the document, with all comments accounted for following internal group discussions.

The PROV-CONSTRAINTS document complements the first three, and is focused on the notion of valid provenance. The intent of provenance validation is to ensure that a set of PROV statements represents a history of objects and their interactions which is consistent, and thus safe to use for the purpose of logical reasoning and other kinds of analysis.

Thus, the document can be used to design a validator that checks the consistency of a set of PROV statements.

What is in the CONSTRAINTS document?

Three types of constraints are defined.

  • Uniqueness constraints. These include key constraints, stating for instance that identifier e is a key for statement entity(e,attrs), but also constraints that state the uniqueness of events such as the generation of an entity by an activity. Constraint 25, for example, states that only one generation event can be associated with a generated entity and a generating activity: IF wasGeneratedBy(gen1; e,a,_t1,_attrs1) and wasGeneratedBy(gen2; e,a,_t2,_attrs2), THEN gen1 = gen2.
  • Event ordering constraints. These specify the possible orderings of events (generation, usage, invalidation of entities, start and end of activities) that correspond to a sensible history. For example, an entity should not be used before it is generated (Constraint 39): IF wasGeneratedBy(gen; e,_a1,_t1,_attrs1) and used(use; _a2,e,_t2,_attrs2) THEN gen precedes use.
  • Impossibility constraints. These are used to state, for example, that the same identifier cannot be used in two different relation types (i.e. entity(foo) and activity(foo) is an illegal combination), but also to state properties of relations, for example "specialization is irreflexive" (Constraint 54): IF specializationOf(e,e) THEN INVALID, and "the sets of entities and activities are disjoint" (Constraint 57): IF 'entity' ∈ typeOf(id) AND 'activity' ∈ typeOf(id) THEN INVALID.
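
As a rough illustration of how a validator might check rules of this kind, the Python sketch below tests Constraint 25 and the two impossibility rules quoted above. The tuple encoding of statements and the function names are assumptions made for this example; they are not part of PROV-CONSTRAINTS or of any existing library. Times and attributes are omitted, and the uniqueness check simply reports pairs of distinct identifiers that the rule would require to be equal.

```python
from itertools import combinations

# Hypothetical encoding: each statement is a tuple whose first item is the
# relation name, e.g. ("wasGeneratedBy", gen_id, entity, activity).

def generation_clashes(statements):
    """Pairs of distinct generation ids that Constraint 25 requires to be equal."""
    gens = [s for s in statements if s[0] == "wasGeneratedBy"]
    clashes = []
    for (_, g1, e1, a1), (_, g2, e2, a2) in combinations(gens, 2):
        # Same entity and same activity, but two different generation ids.
        if (e1, a1) == (e2, a2) and g1 != g2:
            clashes.append((g1, g2))
    return clashes

def impossibility_violations(statements):
    """Violations of the two impossibility rules (Constraints 54 and 57)."""
    entities = {s[1] for s in statements if s[0] == "entity"}
    activities = {s[1] for s in statements if s[0] == "activity"}
    problems = [f"{i} is both an entity and an activity"
                for i in sorted(entities & activities)]
    problems += [f"specializationOf({s[1]},{s[2]}) is reflexive"
                 for s in statements
                 if s[0] == "specializationOf" and s[1] == s[2]]
    return problems

statements = [
    ("entity", "foo"),
    ("activity", "foo"),
    ("entity", "e"),
    ("activity", "a"),
    ("wasGeneratedBy", "gen1", "e", "a"),
    ("wasGeneratedBy", "gen2", "e", "a"),
    ("specializationOf", "e", "e"),
]

print(generation_clashes(statements))
print(impossibility_violations(statements))
```

A real validator would also have to handle optional arguments, attributes, and the remaining constraints; this sketch only shows the shape of the checks.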

Example

We now show an inference process involving ordering constraints, which leads to the conclusion that the events involved cannot be consistently ordered. This indicates that some of the statements disrupt the consistency of the entire history. The example involves a case of mutual derivation between two entities. Consider the following statements:

entity(e1)
entity(e2)
activity(a1)
activity(a2)
wasGeneratedBy(gen2; e2,a2,t2)
wasGeneratedBy(gen1; e1,a1,t1)
wasDerivedFrom(d1; e2,e1,-,-,-)

That is, e2 was derived from e1, and e2 and e1 were generated by activities a2 and a1 at times t2 and t1, respectively, as illustrated by the following figure.

Constraint 44 states that, in a derivation, the generation of the original entity (here e1) precedes the generation of the derived entity (here e2):

IF wasDerivedFrom(d; e2,e1,a,g,u,attrs) and wasGeneratedBy(gen1; e1,a1,t1,attrs1) and wasGeneratedBy(gen2; e2,a2,t2,attrs2) THEN gen1 strictly precedes gen2.

Intuitively, e1 must be generated prior to generating e2:

gen1 strictly precedes gen2.
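
This inference step can be sketched in Python as follows. The data layout (a dictionary of generation events and a list of derivation pairs) and the function name are hypothetical, chosen only for illustration; times and attributes are left out because the rule quoted above does not depend on them.

```python
# Hypothetical encoding of the example statements.
# generations maps each entity to the ids of its generation events.
generations = {"e1": ["gen1"], "e2": ["gen2"]}
# Each derivation is (derived_entity, source_entity),
# mirroring the argument order of wasDerivedFrom(d1; e2,e1,...).
derivations = [("e2", "e1")]

def strict_precedences(derivations, generations):
    """Apply Constraint 44: the source's generation strictly precedes
    the generation of the derived entity."""
    pairs = set()
    for derived, source in derivations:
        for gen_source in generations.get(source, []):
            for gen_derived in generations.get(derived, []):
                pairs.add((gen_source, gen_derived))
    return pairs

for before, after in sorted(strict_precedences(derivations, generations)):
    print(f"{before} strictly precedes {after}")
```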

Suppose we add the following statement to our set of statements:

wasDerivedFrom(d2; e1,e2,-,-,-)

This would form the following overall PROV graph.

Adding this new statement, however, creates a circular derivation between e1 and e2, which is an invalid situation. We therefore expect our constraint system to detect it. Indeed, by application of the same Constraint 44, this new statement entails:

gen2 strictly precedes gen1.

Hence, we obtain that gen2 strictly precedes gen1, which in turn strictly precedes gen2. This is impossible.
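
One way a validator could detect this kind of contradiction automatically is to treat each "strictly precedes" fact as a directed edge and search for a cycle. Since every edge here is strict, any cycle means that no consistent ordering exists. The Python sketch below does this with a depth-first search; the function name and the pair-based input are assumptions made for this example, and the specification does not prescribe this particular algorithm.

```python
def find_cycle(pairs):
    """Return a list of events forming a cycle of strict precedence, or None."""
    graph = {}
    for before, after in pairs:
        graph.setdefault(before, []).append(after)
        graph.setdefault(after, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Strict precedences entailed by the two derivations in the example.
precedences = [("gen1", "gen2"), ("gen2", "gen1")]
cycle = find_cycle(precedences)
print(" strictly precedes ".join(cycle) if cycle else "no cycle found")
```

With only the first derivation, the input would be the single pair ("gen1", "gen2") and no cycle would be reported.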

Conclusion

This example was simple and may not have required an automated validator to detect the inconsistency. However, when graph patterns become more complex, an automated validator becomes an essential component for provenance users, whether they intend to publish provenance or to consume it. The PROV-CONSTRAINTS document defines a set of constraints that validators are expected to implement.

Several people are already working on validators, and we encourage developers to implement these constraints as well.
