Re: PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity? from Paulo Pinheiro da Silva on 2011-08-05 (public-prov-wg@w3.org from August 2011)

From: Paulo Pinheiro da Silva <paulo@utep.edu>
Date: Thu, 4 Aug 2011 23:30:13 -0600
To: <public-prov-wg@w3.org>
Message-ID: <4E3B7FE5.5060603@utep.edu>
Hi Luc,

Please see my comments in-line below:
> - I assume you mean can we infer that c was derived by the process
> execution
>
>      Yes, this is explained in the document, and further refine in the
> soon-to-be-released new version.
>       Only one pe can generate c (in one account).
>       And from a derivation from c to a, one can infer the existence of a
> pe which generated c and  used a.

Yes, this explains a lot!

I understand that the model must be able to represent that a derivation 
from 'a' to 'c' occurred through a process execution and that the 
process execution was indeed the one called 'pe'. The fact that the 
document explains the inference above appears to support the need for 
such description.

From your message, I see that one cannot derive that 'pe' was the 
process execution that derived 'c' without the use of accounts -- and I 
do not recall any group discussion of what an account is. So, this 
suggests that we are not following the proper concept dependencies to 
discuss these provenance concepts in a logical way -- can you see my point?

I further understand that the model not only relies on accounts but 
also relies on the restriction that "an entity can only be 
generated by one process execution" to be able to infer in our example 
that 'pe' was the process execution that derived 'c'. I would strongly 
favor the adoption of constructs that are explicitly capable of stating 
relationships between data derivations and process executions.

Going back to the example (I numbered the statements to facilitate the 
conversation):

1. uses(pe, a, r_a)
2. uses(pe, b, r_b)
3. isGeneratedBy(c,pe,r_c)
4. isDerivedFrom(c,a)


I understand that most of this conversation is in support of the need to 
represent that 'pe' has an input parameter 'b' that is not used to 
derive 'c' (and I am using the closed-world assumption to infer that 'c' 
was not derived from 'b' -- is this correct?). Do we really need all 
this added complexity in every single derivation encoding just to say 
that 'pe' has an additional parameter that does not affect the final 
product of the process execution? I would further claim that most 
process execution inputs and outputs in real life would not include 
entities that are not involved in derivations. There are many things 
that we can do to simplify this model:

Option A: To formalize a 'derive' role that can be used both in 'uses' 
and 'isGeneratedBy' and to drop (4)

uses (pe, a, derive)
uses (pe, b, r_b)
isGeneratedBy(c, pe, derive)
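
As a rough illustration of Option A, the sketch below recovers derivations purely from the role attached to each statement. The tuple layout, the DERIVE constant and the function name derivations_from_roles are hypothetical and chosen only for this example.

```python
# Hypothetical encoding of Option A: statement (4) is dropped and the
# shared role "derive" marks the inputs and outputs that take part in
# a derivation.
DERIVE = "derive"

option_a = [
    ("uses", "pe", "a", DERIVE),
    ("uses", "pe", "b", "r_b"),
    ("isGeneratedBy", "c", "pe", DERIVE),
]


def derivations_from_roles(stmts):
    """Return (output, input, pe) triples for every pair of statements
    that share a process execution and both carry the 'derive' role."""
    inputs = {}   # process execution -> entities used with role 'derive'
    outputs = {}  # process execution -> entities generated with role 'derive'
    for kind, first, second, role in stmts:
        if role != DERIVE:
            continue
        if kind == "uses":            # uses(pe, entity, role)
            inputs.setdefault(first, set()).add(second)
        elif kind == "isGeneratedBy":  # isGeneratedBy(entity, pe, role)
            outputs.setdefault(second, set()).add(first)
    return {
        (out, inp, pe)
        for pe, outs in outputs.items()
        for out in outs
        for inp in inputs.get(pe, ())
    }


print(derivations_from_roles(option_a))
```

Here 'b' is still recorded as a parameter of 'pe', but it does not appear in any derivation because its role is not 'derive'.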

Option B: To assume that 'uses' and 'isGeneratedBy' imply derivation 
and to add a new relationship to explicitly annotate processes, including 
the use of roles

uses (pe, a)
annotates (pe, b, r_b)
isGeneratedBy(c, pe)

In this case, we could swap the positions of 'pe' and b in case 'b' was 
an output of 'pe'.
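
Option B can be sketched in the same style. Again, the tuple layout and the function names (implied_derivations, extra_parameters) are assumptions for illustration. Because 'annotates' is read by argument position, the sketch needs to be told which names denote process executions in order to decide whether an annotated entity is an input or an output.

```python
# Hypothetical encoding of Option B: 'uses' and 'isGeneratedBy' imply
# derivation, and 'annotates' records parameters that play no part in it.
option_b = [
    ("uses", "pe", "a"),
    ("annotates", "pe", "b", "r_b"),
    ("isGeneratedBy", "c", "pe"),
]


def implied_derivations(stmts):
    """Return (output, input) pairs: every entity generated by a process
    execution is taken to be derived from every entity it used."""
    used = {}       # process execution -> entities used
    generated = {}  # process execution -> entities generated
    for s in stmts:
        if s[0] == "uses":             # uses(pe, entity)
            used.setdefault(s[1], set()).add(s[2])
        elif s[0] == "isGeneratedBy":  # isGeneratedBy(entity, pe)
            generated.setdefault(s[2], set()).add(s[1])
    return {
        (out, inp)
        for pe, outs in generated.items()
        for out in outs
        for inp in used.get(pe, ())
    }


def extra_parameters(stmts, executions):
    """Return (pe, entity, role, direction) for each 'annotates' statement.

    annotates(pe, entity, role) is read as an input parameter and
    annotates(entity, pe, role) as an output parameter.
    """
    params = set()
    for s in stmts:
        if s[0] != "annotates":
            continue
        _, first, second, role = s
        if first in executions:
            params.add((first, second, role, "input"))
        else:
            params.add((second, first, role, "output"))
    return params


print(implied_derivations(option_b))
print(extra_parameters(option_b, {"pe"}))
```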

Both options would significantly reduce most of the diagrams we have 
built so far, which means less work for the specification of provenance, 
without losing a single bit of information. Moreover, on top of this, 
our definitions of 'uses' and 'isGeneratedBy' would stand on their own 
without the need of accounts or the enforcement of restrictions such as 
that 'c' can only be generated by 'pe' (I also have lots of things to 
discuss in terms of this restriction in case we decide to keep the 
current approach).

I am not saying that we only have options A and B (or even that options 
A and B are correct). We may have other options and I am just proposing 
A and B to demonstrate that there are other ways of representing 
provenance that may be more beneficial than the current approach.

Many thanks,
Paulo.

> I hope it helps,
> Cheers,
> Luc
>
> On 07/07/11 15:50, Provenance Working Group Issue Tracker wrote:
> >  PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity?
> >
> >  http://www.w3.org/2011/prov/track/issues/26
> >
> >  Raised by: Paulo Pinheiro da Silva
> >  On product:
> >
> >  Context:
> >  1. P uses A
> >  2. P uses B
> >  3. P generates C
> >  4. C derived from A
> >
> >  If the provenance of C is the concern of a user of C (as opposed to the provenance of a process that generates C), one may have the following questions:
> >
> >  1) What the “uses” and “generates” relationships are adding to one’s understanding of C if something is wrong with C?
> >  2) Can we infer that A was derived by the execution of process P? How?
> >
Received on Friday, 5 August 2011 05:31:17 UTC