From Provenance WG Wiki

Definition for Concept 'Collections'



The Provenance WG charter identifies the concept 'Collections' as a core concept of the provenance interchange language to be standardized (see

  • What term do we adopt for the concept 'Collections'?
  • How do we define the concept 'Collections'?
  • Where does concept 'Collections' appear in ProvenanceExample?
  • Which provenance query requires the concept 'Collections'?

Proposed Definitions for the Concept 'Collections'

Definition by Creed(?)

This is input to the discussion only. The geospatial/GIS community has a set of definitions that may be of use, including "feature", "feature collection", "container", and so forth. These definitions are captured in joint OGC and ISO standards documents. While not developed specifically for provenance, they may be useful here. Specifically: a feature is a digital representation of a real-world entity or an abstraction of the real world (ISO 19107 and others). Examples of features include almost anything that can be placed in time and space: desks, buildings, cities, trees, ecosystems, delivery vehicles, newspaper articles, and so on. Features are usually managed in groups as feature collections. A feature collection is a set of feature instances and related metadata (ISO 19136). For example, the set of articles in a newspaper is a feature collection, as is the set of furniture in a building. In a sense, a feature is equivalent to a concept, where the concept represents an agreement within a community.

Definition by Satya

(The Provenance XG identified this concept to represent part of/contains relations)

Two distinct properties are needed to represent the partonomy (partOf/hasPart) and containment (containedIn/contains) relations between entities. The entities may be agents (a temperature sensor in a satellite), resources (an image in a news article), or processes (the editorial review process is part of the publication process for a news article).

In the ProvenanceExample,

  • image (img1) within a document (art1)
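As a rough illustration of keeping the two properties distinct, the following minimal sketch stores statements as plain triples and derives the inverse direction of each property. The triple layout, the `INVERSE` table, and the `with_inverses` helper are assumptions made for this example only. Treating img1/art1 as containment rather than partonomy is also an assumption, since the example above does not say which of the two applies.

```python
from typing import List, Tuple

# A statement is a (subject, property, object) triple. This layout is an
# assumption for illustration, not part of any proposed model.
Triple = Tuple[str, str, str]

# Each property is paired with its inverse; partonomy and containment
# are deliberately kept in separate pairs.
INVERSE = {"partOf": "hasPart", "containedIn": "contains"}


def with_inverses(triples: List[Triple]) -> List[Triple]:
    """Return the given triples plus the inverse of each known property."""
    result = list(triples)
    for subject, prop, obj in triples:
        if prop in INVERSE:
            result.append((obj, INVERSE[prop], subject))
    return result


statements = [
    # Assumed reading: the image is contained in the article.
    ("img1", "containedIn", "art1"),
    # From the definition above: editorial review is part of publication.
    ("editorial_review", "partOf", "publication_process"),
]

expanded = with_inverses(statements)
```

Because the two properties never share an inverse, a query for hasPart on art1 returns nothing here, while a query for contains returns img1.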

Definition by Paolo M.

The following relations can be used to represent dependencies amongst collections (containers) and their elements. A full formalisation will follow later.

  • Contained(R): ex. L′ Contained(p) x

to denote that element x was inserted into L at position p. Here p can be an index or, more generally, an access path into a data structure.

  • wasSelectedFrom(R): ex. L′ wasSelectedFrom(p) L

to denote that element L′ at position p was selected from L (L′ can itself be a collection, i.e., when collections are nested).

  • wasRemovedFrom(R): ex. L′ wasRemovedFrom(p) L

to denote that element L′ at position p was deleted from L.
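Pending the full formalisation, the three relations can be pictured as records that carry a position alongside the two related items. The sketch below is illustrative only: the `CollectionRelation` class, the `relate` helper, and the identifiers `L1`, `x`, and `y` are hypothetical names, and the only thing it fixes is that a position may be either an index or an access path.

```python
from dataclasses import dataclass
from typing import Hashable

# Relation names taken from the list above.
ALLOWED = {"Contained", "wasSelectedFrom", "wasRemovedFrom"}


@dataclass(frozen=True)
class CollectionRelation:
    """One relation instance, written 'subject relation(position) obj'."""
    subject: str
    relation: str
    position: Hashable  # an index, or a tuple used as an access path
    obj: str

    def __str__(self) -> str:
        return f"{self.subject} {self.relation}({self.position!r}) {self.obj}"


def relate(subject: str, relation: str, position: Hashable, obj: str) -> CollectionRelation:
    """Build a relation record, rejecting relation names not listed above."""
    if relation not in ALLOWED:
        raise ValueError(f"unknown relation: {relation}")
    return CollectionRelation(subject, relation, position, obj)


# Position given as a plain index.
by_index = relate("L1", "Contained", 0, "x")

# Position given as an access path into a nested structure (assumed encoding).
by_path = relate("y", "wasSelectedFrom", ("sections", 2, "images"), "L1")
```

Encoding an access path as a tuple keeps the record hashable, so relation instances can be stored in sets or used as dictionary keys.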

In addition, one can sometimes infer that two elements of a collection are the same, namely when there is evidence that one was inserted at position p, the other was retrieved from position p, and no intervening operation changed the collection. So we need a way to specify resource equality (by reference, not by content), something like:

  • L wasSameAs x
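The inference described above can be sketched over a simple ordered trace of operations. This is a minimal sketch under stated assumptions: the `Event` record and `infer_same` function are hypothetical, each collection is identified by a single name rather than by successive versions (L, L′), and any insert or remove on a collection is conservatively treated as an intervening change to it.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple


@dataclass(frozen=True)
class Event:
    """One observed operation on a collection (assumed trace format)."""
    op: str              # "insert", "select", or "remove"
    collection: str
    position: Hashable
    element: str


def infer_same(events: List[Event]) -> List[Tuple[str, str]]:
    """Return (selected, inserted) pairs that can be inferred to be the same.

    Only the most recent insert per collection is remembered, because any
    later insert or remove counts as an intervening change.
    """
    last_insert: Dict[str, Tuple[Hashable, str]] = {}
    same: List[Tuple[str, str]] = []
    for event in events:
        if event.op == "select":
            hit = last_insert.get(event.collection)
            if hit is not None and hit[0] == event.position:
                same.append((event.element, hit[1]))
        elif event.op == "insert":
            last_insert[event.collection] = (event.position, event.element)
        elif event.op == "remove":
            # The collection changed, so earlier evidence no longer applies.
            last_insert.pop(event.collection, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return same


clean_trace = [
    Event("insert", "L", 2, "x"),
    Event("select", "L", 2, "y"),
]
interrupted_trace = [
    Event("insert", "L", 2, "x"),
    Event("remove", "L", 0, "z"),
    Event("select", "L", 2, "y"),
]
```

With `clean_trace` the function yields the pair (y, x), which corresponds to asserting y wasSameAs x. With `interrupted_trace` it yields nothing, because the removal may have shifted positions.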

Also, some of these relations should be decorated with additional roles, for example:

  • P Used(list) L, P Used(position) p: in the context of element selection, these distinguish the collection L from the access path p used by a selector (accessor operation)
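One possible way to picture role-decorated usage is to attach the role to each Used record and look inputs up by role. The `Used` class and `inputs_by_role` helper below are assumptions for illustration, as is the choice of the integer 3 as the access path.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List


@dataclass(frozen=True)
class Used:
    """A 'P Used(role) entity' statement (assumed record layout)."""
    process: str
    role: str
    entity: Hashable


def inputs_by_role(process: str, usages: List[Used]) -> Dict[str, Hashable]:
    """Map each role to the entity the given process used in that role."""
    return {u.role: u.entity for u in usages if u.process == process}


usages = [
    Used("P", "list", "L"),      # the collection being accessed
    Used("P", "position", 3),    # the access path used by the selector
]

selector_inputs = inputs_by_role("P", usages)
```

Without the role, the two statements would both read "P Used something", and the collection could not be told apart from the access path.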