ConceptDerivation

From Provenance WG Wiki

Definition for Concept 'Derivation'

ISSUE: http://www.w3.org/2011/prov/track/issues/7

Introduction

The Provenance WG charter identifies the concept 'Derivation' as a core concept of the provenance interchange language to be standardized (see http://www.w3.org/2011/01/prov-wg-charter).

  • What term do we adopt for the concept 'Derivation'?
  • How do we define the concept 'Derivation'?
  • Where does the concept 'Derivation' appear in the ProvenanceExample?
  • Which provenance queries require the concept 'Derivation'?

Proposed Definitions for the Concept 'Derivation'

Definition by Jun

Rather than being a concept in its own right, derivation may be better suited to expressing the dependency relationship between things that are in an immutable state and for which we provide provenance descriptions.

Examples of derivation from the ProvenanceExample:

  • RDF (f1), converted by gov, is derived from data (d1)
  • chart (c1) is derived from the turtle (lcp1) and the statistical assumptions (stats1)
  • document (art1) is derived from the incidence map (map1), chart (c1) and the image (img1)
  • the updated data (d2) is derived from data (d1)
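
To make these examples concrete, the sketch below records them as (derived, source) pairs in Python and answers the kind of query the introduction asks about: what was a given thing derived from? The pair encoding and the function names are illustrative assumptions, not part of any proposal on this page. The transitive `all_sources` query is a further assumption, since none of the definitions here states that derivation is transitive.

```python
from collections import defaultdict

# Each pair (derived, source) records "derived is derived from source",
# using the identifiers from the ProvenanceExample list above.
DERIVATIONS = [
    ("f1", "d1"),
    ("c1", "lcp1"),
    ("c1", "stats1"),
    ("art1", "map1"),
    ("art1", "c1"),
    ("art1", "img1"),
    ("d2", "d1"),
]

def build_index(pairs):
    """Map each derived thing to the set of things it was directly derived from."""
    index = defaultdict(set)
    for derived, source in pairs:
        index[derived].add(source)
    return index

def direct_sources(index, thing):
    """Things that `thing` was directly derived from."""
    return set(index.get(thing, set()))

def all_sources(index, thing):
    """Follow derivation edges transitively (an assumption, see the lead-in)."""
    seen = set()
    stack = [thing]
    while stack:
        current = stack.pop()
        for source in index.get(current, set()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen

index = build_index(DERIVATIONS)
print(sorted(direct_sources(index, "art1")))  # ['c1', 'img1', 'map1']
print(sorted(all_sources(index, "art1")))     # ['c1', 'img1', 'lcp1', 'map1', 'stats1']
```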

But...

  • Is the turtle serialization (lcp1) derived from the resource (r1)?
  • Is the turtle serialization (lcp2) derived from the resource (r2)?

If the derivation relationship describes dependencies between things, as defined above, then these two examples should also reflect a derivation relationship.

revised by Jun on June 14, 2011

Given the proposal to use IVPT as the current replacement for resources, I revise my original definition.

Derivation expresses the dependency relationship between two IVPTs, which may be of the same thing or of different things. The existence of the derived IVPT must depend on the existence of the IVPT from which it is derived.

Given this new definition, I would say the last two examples do not reflect a derivation relationship, because the example does not clearly define what r1 and r2 are, in particular whether they are IVPTs of a thing.

Definition by PM

Derivation is a binary relation between two resource states RS (in line with my distinction between resources, their states, and their representations).

Note that although the relation is binary, one can assert that an RS A is derived from a set B = {B1, ..., Bn} of RS. This means that each Bi is necessary in order for A to exist.
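
A minimal sketch of this note, assuming resource states are identified by plain strings: an assertion against a set B is expanded into one binary derivation per Bi, and the necessity condition is read as applying to the derived state A, so that A can exist only if every Bi does. The helper names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# A binary derivation (derived_state, source_state) between resource states (RS).
Derivation = Tuple[str, str]

def expand_set_derivation(a: str, b: FrozenSet[str]) -> Set[Derivation]:
    """Rewrite 'A is derived from B = {B1..Bn}' as the n binary derivations (A, Bi)."""
    if not b:
        raise ValueError("a derivation needs at least one source state")
    return {(a, bi) for bi in b}

def sources_of(a: str, derivations: Set[Derivation]) -> Set[str]:
    """All states that A was directly derived from."""
    return {src for derived, src in derivations if derived == a}

def can_exist(a: str, available: Set[str], derivations: Set[Derivation]) -> bool:
    """Assumed reading of necessity: A can exist only if all of its sources exist."""
    return sources_of(a, derivations) <= available

rels = expand_set_derivation("A", frozenset({"B1", "B2", "B3"}))
print(can_exist("A", {"B1", "B2", "B3"}, rels))  # True
print(can_exist("A", {"B1", "B3"}, rels))        # False
```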

The semantics of this (and all other) relations remains informal, however, because we have no formal grounding of the core concepts. Describing derivation in terms of influence, for example, simply shifts the problem without solving it. What is influence?

Specialisations

Derivation is a top-level relation that can and should be specialised in a principled way. I do not yet have a grasp of how this principled process works. Someone suggested that Version is a type of Derivation. Good. Can we do better than digging these out of the examples?

NOTE: the terminology used here needs realigning with IVPT, introduced after this was written

Definition by Khalid Belhajjame

Derivation is a relationship that associates an IVPT with another (different) IVPT, specifying that the former contributed to the existence of the latter. The two IVPTs may be views of the same thing or of different things. Which of these holds can be captured by specializing the derivation relationship, or by associating it with a property that specifies the relationship between the two IVPTs.
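
The two modelling options in this definition can be sketched with Python dataclasses. Everything here is hypothetical: the class names, the `kind` property, and the use of d2/d1 from the ProvenanceExample as a version pair are illustrative choices only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Derivation:
    derived: str  # the IVPT whose existence was contributed to
    source: str   # the IVPT that contributed to it

    def __post_init__(self):
        # The definition requires two different IVPTs.
        if self.derived == self.source:
            raise ValueError("derivation must relate two different IVPTs")

# Option 1: specialise the relationship with a subtype (e.g. the suggested Version).
@dataclass(frozen=True)
class Version(Derivation):
    pass

# Option 2: keep a single relationship and qualify it with a property.
@dataclass(frozen=True)
class QualifiedDerivation(Derivation):
    kind: Optional[str] = None  # e.g. "version"; None means unqualified

v = Version(derived="d2", source="d1")
q = QualifiedDerivation(derived="d2", source="d1", kind="version")
print(isinstance(v, Derivation), q.kind)  # True version
```

With option 1, a query for `Derivation` also matches every `Version` through subtyping; option 2 keeps one relation and leaves the set of kinds open, which may suit the expectation that further kinds of derivation will be identified.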

As the discussion on derivation continues, I expect that we will identify other kinds of derivations that we want to capture.

Definition by Luc

I would like to move away from terminology such as "influence", which we will struggle to define.

I think derivation is "linking" two Invariant Views or Perspectives on a Thing (IVPT).

Derivation expresses a transfer of information from an IVPT to another IVPT.

Alternatively, we could use 'information flow'.

Credit: this is inspired by some of the ideas in http://star.tau.ac.il/~eshel/Physics2006/Causation%20and%20Information.pdf

I think this applies nicely to digital systems, and it also works well for conceptual and biological systems. Does it work well for all physical things? Hmm. Well, information is also held in a specific state of a physical system (e.g. the cup is full, the cup is empty, the water is cold, the water is hot).

Definition adapted by Graham

I prefer the OPM approach to all the above:

  • Definition 8 (Artifact Derived from Artifact). An edge "was derived from" from artifact A2 to artifact A1 is a causal relationship that indicates that artifact A1 needs to have been generated for A2 to be generated. The piece of state associated with A2 is dependent on the presence of A1 or on the piece of state associated with A1.

-- http://eprints.ecs.soton.ac.uk/21449/1/opm.pdf

But I think the bit about state is redundant, and the wording needs adjusting for our context, e.g.

  • "Artifact A2 was derived from artifact A1" is a causal relationship indicating that artifact A1 needs to have existed for A2 to be created.
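
This adapted wording implies at least a temporal constraint. The sketch below, assuming each artifact carries a single creation timestamp (a hypothetical attribute; the dates are made up), checks only that A1 existed before A2 was created. Such a check can rule a claimed derivation out but cannot confirm it, since it says nothing about whether A1 was actually needed.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Artifact:
    name: str
    created: datetime  # hypothetical attribute: when the artifact came into existence

def derivation_is_plausible(a2: Artifact, a1: Artifact) -> bool:
    """Necessary (not sufficient) check for 'A2 was derived from A1':
    A1 must already have existed when A2 was created."""
    return a1.created < a2.created

d1 = Artifact("d1", datetime(2011, 6, 1))   # made-up timestamp
d2 = Artifact("d2", datetime(2011, 6, 14))  # made-up timestamp
print(derivation_is_plausible(d2, d1))  # True
print(derivation_is_plausible(d1, d2))  # False
```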

Comment by James

Terms like "causal", "influence", "derivation", or "needs to have existed" are all somewhat subjective, or at least philosophers have struggled for hundreds of years to come up with satisfying objective definitions. There is some recent work on mathematical formalisms for causal models (see Pearl's "Causality" (2000) or Woodward's "Making Things Happen"), and there is also a lot of work on information flow in security, but all of this work makes some modeling assumptions about what it means for information to "flow".

I (and more recently some other DB people) have also tried to adapt these ideas to provenance, but I'm not totally convinced it is the right way to go: http://arxiv.org/abs/1006.1429v1

I'd suggest not trying to define derivation per se, but naming it as a relationship that can hold between IVPTs as judged by some observer/perspective/account, whose criteria for judging this may be objective (e.g. using a mathematical theory of causality), legalistic (using defeasible reasoning / argumentation) or completely subjective (I know it when I see it).

Comment by Paolo

I fully agree with James's comments. In addition, I would like to caution against using causation in this context; it may take us too far afield. For reference, Dan Suciu presented a recent tutorial on "Provenance and Causality" at TAPP'11 in June 2011.

Definition by Luc (in terms of properties)

Things represent stuff in the real world.

Definition of Derivation. A derivation represents how stuff is transformed, or how pieces of stuff affect each other, in the real world.

A thing B is derived from a thing A if:

  • A was used (and therefore created) before B was created
  • The values of some invariant properties of B are partially determined by the values of some invariant properties of A
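
These two conditions can be written as a single predicate. This is a sketch under several assumptions: times are plain numbers, invariant properties are a dictionary with made-up values, and the "partially determined by" link between property values is supplied as an explicit, hypothetical set of dependency facts rather than inferred.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

@dataclass
class Thing:
    name: str
    created_at: float                # hypothetical logical timestamp
    used_at: Optional[float] = None  # first use, if the thing was ever used
    props: Dict[str, object] = field(default_factory=dict)  # invariant properties

# ((b_name, b_prop), (a_name, a_prop)): b_prop of B is partly determined by a_prop of A.
PropDependency = Tuple[Tuple[str, str], Tuple[str, str]]

def is_derived_from(b: Thing, a: Thing, deps: Set[PropDependency]) -> bool:
    # Condition 1: A was used (and therefore created) before B was created.
    temporal = a.used_at is not None and a.created_at <= a.used_at < b.created_at
    # Condition 2: some invariant property of B is partly determined by one of A.
    determined = any(
        bn == b.name and an == a.name and bp in b.props and ap in a.props
        for (bn, bp), (an, ap) in deps
    )
    return temporal and determined

d1 = Thing("d1", created_at=0, used_at=1, props={"rows": 100})
f1 = Thing("f1", created_at=2, props={"triples": 300})
deps = {(("f1", "triples"), ("d1", "rows"))}
print(is_derived_from(f1, d1, deps))   # True
print(is_derived_from(d1, f1, deps))   # False
print(is_derived_from(f1, d1, set()))  # False
```

With an empty dependency set the predicate returns False even when the temporal condition holds, which is the kind of case Khalid raises in his comment on Simon's revision.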

--Luc Moreau 13:42, 17 June 2011 (UTC)

Comments

--Luc Moreau 22:42, 20 June 2011 (UTC)

Definition by Simon, a revision of Luc's above

Derivation represents how stuff is transformed from or affected by other stuff. A thing B is derived from a thing A if the values of some invariant properties of B are at least partially determined by the values of some invariant properties of A.

--Simon Miles 15:20, 21 June 2011 (UTC)

Comment by Khalid

The first sentence is fine, but I have some concerns about the second sentence of the definition. I think there are cases in which a thing B may be derived from a thing A without the values of any invariant properties of B being determined by the values of invariant properties of A. In particular, I am thinking of a case in which a process that takes A as input decides to output B simply because A exists, without using the values of any invariant properties of A to generate the values of invariant properties of B. Is this example outside the scope of what we want derivation to capture?

Comment by Satya

I broadly agree with Luc and Simon's definition, except that I would replace "affected" with "created from", since a thing X may be affected by a thing Y without X being derived from Y. For example, cold temperature affects plant X, but plant X is not derived from cold temperature.

Modified definition: "Derivation represents how stuff is transformed from or created from other stuff."

I would also like to point to both the "derived from" and "transformation of" properties defined by the OBO Foundry Relation Ontology, which is widely used in biomedical ontologies.