Proposed Collections Relations

From Provenance WG Wiki

Motivation

The OPM is agnostic to any typing of Artifacts. One can specialize the Artifact node type, but the model makes no specific provision for relationships amongst elements of data structures.

Yet processes are commonly aware of the types of the data that they operate on, and it is often possible to establish specific, precise dependencies between elements of an input data structure and those of an output data structure as part of a computation.

Ordered trees, in particular, are commonly used data structures that subsume nested ordered lists, used for example in the Taverna workflow model, and XML documents, used in the Kepler COMAD model. In both these systems, tree manipulation is supported by a well-defined semantics that one can use to derive element-level dependencies between trees.

These informal considerations justify the introduction into PIL of new relationships, designed to express assertions about the following:

  1. current containment: based on the observation of list processing operators, an element may be asserted to be contained in a tree at a certain position.
  2. former containment: it may be possible to assert that an element was previously a member of a tree, from which it has been removed.
  3. new containment: it may be possible to assert that an element has been added to a tree.
  4. element equality: it may be possible to assert that an element of a tree produced by a computation is the same as an element of a tree used as input to the computation. For example, inserting an element x into a tree T at a certain position and then selecting from T at that same position (with no intervening operations on T) yields that same element x by definition.

Relations amongst data structure elements

We therefore propose to introduce the following relations into PIL. Each relation is decorated with a Role that qualifies the meaning of the data in the context of that relation. In particular, a Role can be the position of an element in the tree. A position p is a path into the ordered tree, i.e., a vector of the form [p1..pn] where each pi denotes the index of the sub-tree to descend into at depth i.
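To make the notion of a position concrete, here is a minimal sketch that resolves a path against a nested ordered list. It assumes trees are represented as nested Python lists and that indices are 0-based; the proposal itself does not fix an index base, and the function name select is hypothetical.

```python
from typing import Any, Sequence


def select(tree: Any, path: Sequence[int]) -> Any:
    """Return the element (atomic value or sub-tree) found at `path`.

    Assumption: 0-based indices, one index per level of depth.
    An empty path denotes the tree itself.
    """
    node = tree
    for index in path:
        # Each step descends one level into the index-th child.
        node = node[index]
    return node


# A nested ordered list standing in for an ordered tree.
T = [["a", "b"], ["c", ["d", "e"]]]

print(select(T, [1, 1, 0]))  # prints d
print(select(T, [0]))        # prints ['a', 'b']
```

A path of length n therefore reaches an element at depth n, which may be either an atomic value or a sub-tree.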

The proposed relations and associated Roles are as follows:

  1. X Contained(element) T asserts that X is an element of T. X can be an atomic value or a sub-tree.
  2. T' wasSelectedFrom(p) T asserts that T' was the element at position p in T.
  3. T' wasRemovedFrom(p) T asserts that T' was the element that was removed from T at position p.
  4. X wasSameAs X' asserts that elements X and X' (each within some unspecified tree) are the same.
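As an illustration only, the relations above could be recorded as simple triples carrying an optional Role. The sketch below replays the insert-then-select example from the Motivation section and derives a wasSameAs assertion. The names Assertion, insert_at and select_at are hypothetical and not part of PIL; trees are assumed to be nested Python lists with 0-based, non-empty positions.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Assertion:
    """One relation instance: subject --relation(role)--> obj (hypothetical type)."""
    subject: str
    relation: str
    obj: str
    role: Any = None  # e.g. "element", or a position such as (0, 1)


def insert_at(tree: List[Any], p: Tuple[int, ...], value: Any) -> None:
    """Insert `value` into `tree` at non-empty position `p` (0-based, assumed)."""
    parent = tree
    for i in p[:-1]:
        parent = parent[i]
    parent.insert(p[-1], value)


def select_at(tree: Any, p: Tuple[int, ...]) -> Any:
    """Return the element of `tree` found at position `p`."""
    node = tree
    for i in p:
        node = node[i]
    return node


T = [["a", "b"], ["c"]]
x = "new"
p = (0, 1)

insert_at(T, p, x)    # T becomes [["a", "new", "b"], ["c"]]
y = select_at(T, p)   # no intervening operation on T

assertions = [
    Assertion("x", "Contained", "T", role="element"),
    Assertion("y", "wasSelectedFrom", "T", role=p),
]
if y is x:
    # Selecting at the position just inserted into yields the same element.
    assertions.append(Assertion("y", "wasSameAs", "x"))

for a in assertions:
    print(a)
```

Storing the Role as a separate field keeps the relation name fixed while letting the position vary per assertion; other encodings are equally plausible.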