Re: PROV-ISSUE-447: subactivity relation [prov-dm] from Luc Moreau on 2012-09-10 (public-prov-wg@w3.org from September 2012)

From: Luc Moreau <l.moreau@ecs.soton.ac.uk>
Date: Mon, 10 Sep 2012 12:12:39 +0100
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
CC: Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, public-prov-wg@w3.org
Message-ID: <EMEW3|fe0c41fa776ece34e95ffe1d5bd9759fo89CCg08l.moreau|ecs.soton.ac.uk|504DCB27>
Hi Stian,

Response interleaved.

On 09/06/2012 11:11 AM, Stian Soiland-Reyes wrote:
>> I don't think this example makes much sense:
>>
>> activity(a1,2011-11-16T00:00:00,2011-11-17T00:00:00) // in 2011
>> activity(a2,2012-11-16T00:00:00,2012-11-17T00:00:00)  // in 2012
>> wasSubactivity(a1,a2)
> I agree this would look stupid, but we have said before that the exact
> timestamps don't have any meaning in PROV-Constraints.
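For reference, here is a minimal sketch of the "fully contained" reading of subactivity applied to the timestamp example. That reading is an assumption, not something PROV defines. `is_contained` and its `skew` tolerance for unsynchronized clocks are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical check: an activity is modelled as a (start, end) pair.
# `skew` tolerates clocks that are slightly out of sync (an assumption).
def is_contained(sub, sup, skew=timedelta(0)):
    """True if interval `sub` lies within interval `sup`, allowing `skew`."""
    sub_start, sub_end = sub
    sup_start, sup_end = sup
    return sup_start - skew <= sub_start and sub_end <= sup_end + skew

a1 = (datetime(2011, 11, 16), datetime(2011, 11, 17))  # in 2011
a2 = (datetime(2012, 11, 16), datetime(2012, 11, 17))  # in 2012

print(is_contained(a1, a2))  # False: a1 lies about a year before a2
```

A non-zero `skew` would accept a subactivity whose recorded start is a second or two early, while still rejecting the one-year gap above.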

My point was to illustrate an issue with subactivities and event ordering.
An equivalent situation can be constructed without explicit reference
to time:

wasGeneratedBy(gen1; e1,a2,-)
wasDerivedFrom(der; e2,e1,-)
wasGeneratedBy(gen2; e2,-,-)
wasStartedBy(start1; a1,e2,-,-)
wasStartedBy(start2; a2,-,-,-)

Constraints:

start2 <= gen1 < gen2 <= start1

subActivityOf(a2,a1)

start1 <= start2

Together, these give start2 < start1 <= start2, which is unsatisfiable.
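One way to see the clash mechanically is to treat each ordering as an edge in a precedence graph. A set of <= and < constraints is unsatisfiable exactly when some strict edge lies on a cycle. The following is a minimal sketch assuming the four orderings above, with the last one standing in for a hypothetical subactivity rule. `strict_cycle` and `reachable` are illustrative names, not part of any PROV tooling.

```python
from collections import defaultdict

# (u, v, strict): "u precedes v" (<=) or, if strict, "u strictly precedes v" (<).
constraints = [
    ("start2", "gen1", False),    # a2 starts before it generates e1
    ("gen1", "gen2", True),       # e2 derived from e1: e1 generated strictly earlier
    ("gen2", "start1", False),    # e2 is generated before it starts a1
    ("start1", "start2", False),  # hypothetical rule from subActivityOf(a2,a1)
]

def reachable(graph, src, dst):
    """Depth-first search: is dst reachable from src?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False

def strict_cycle(constraints):
    """Return a strict edge (u, v) lying on a cycle, or None if satisfiable."""
    graph = defaultdict(list)
    for u, v, _ in constraints:
        graph[u].append(v)
    for u, v, strict in constraints:
        # u < v is on a cycle iff u can be reached again from v.
        if strict and reachable(graph, v, u):
            return (u, v)
    return None

print(strict_cycle(constraints))  # ('gen1', 'gen2')
```

Dropping the last (subactivity) constraint makes `strict_cycle` return `None`, so the conflict comes entirely from the proposed subactivity ordering.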


>
> In particular for subactivities, it could very well happen that the
> times are recorded by different mechanisms. Perhaps a difference of a
> year is a glaring error, but being off by, say, a few seconds might be
> acceptable. (For instance, a shell script that does an SSH to a server
> that then does a wget to a web service: three different timestamps, not
> quite synchronized.) Obviously this can easily be isolated using
> different accounts/bundles, but in discussions of workflow provenance
> we often came to the conclusion that we don't want to split every
> subactivity into a new bundle, as that would mean hundreds of
> standalone bundles, which would be harder to do any kind of reasoning
> over.
>
>
>> As indicated previously, it's a whole complete new design that
>> we have to undertake, for which we don't have enough experience.
> It seems that a wasSubActivity should have many of the characteristics
> of specializationOf, but it raises lots of discussion points for
> inferences:
>
> * the subactivity must be fully contained within the duration of the
> superactivity (This is the easy one!)
> * wasAssociatedWith(subAct, ag), then wasAssociatedWith(act, ag)? Vice versa?
> * wasGeneratedBy(e, subAct), then wasGeneratedBy(e, act)? Vice versa?
> * used(subAct, e), then used(act, e)? Vice versa?
> * Must subactivities be 'isolated', or are they allowed to communicate
> with activities which also communicate with the superactivity?
> (Imposes a theory of execution!)
> * Can the superactivity communicate with the subactivity? Does it always?
>
> So I agree it is a big can of worms. This was difficult enough to
> settle for entities, now we would not only have to think about
> activity-to-activity, but the implications on the other relations.

This is my point. We cannot start a design on this because
there is no precedent for it in the provenance community.

>
> However the arguments we used for adding prov:specializationOf and
> prov:alternateOf would very easily also apply to activities:
> * Equivalent activities can be expressed at different granularities
> (prov:wasSubActivityOf?)
> * Equivalent activities can be expressed using alternate
> interpretations (prov:alternateActivity?)
>
> So given this, why do we allow nesting and alternatives for entities,
> and not for activities?

That's a good question. I was personally opposed to memberOf being
part of PROV ... but there is no need to revisit that decision.

The difference, however, is that I don't see these event
constraints as necessary for memberOf. So, it's OK.

>
>
> I strongly recognize the need for expressing subactivities, but I am
> very afraid of all of these questions, and our model is already
> getting complex enough.
>
> I would prefer to simply introduce it as a dcterms:hasPart (please,
> don't use dc!) kind of notion with no particular interpretation
> attached: it is simply a guide to the reader, like prov:alternateOf.
> Perhaps prov:partOfActivity, to avoid the implications of "sub"? (I.e.
> are you allowed to be part of multiple activities? I think we should
> not restrict that.)
>

That's where we have to be careful about what we say.

My view is that in our formal response to this issue (and potentially in our
FAQ, if we create one), the working group can *suggest* the use of
dcterms:hasPart.

This should NOT BE NORMATIVE. In fact, it shouldn't appear in our
recommendation documents. We can also flag the potential problem of event
ordering and subactivity.


Regards,
Luc

>
> It still raises the question about entities generated by both
> activities and the generation-uniqueness constraint.
>
> One way around it, as I've approached it for Taverna's workflow PROV,
> is to use prov:alternateOf between two entities, one per
> generation/invalidation. You can picture these entities as
> representing "the value at output gate X" and "the value at output
> gate Y", almost like the old prov:EntityInRole. By the same
> reasoning, a washed car coming out of the last-stage
> activity(polishing), and thereby completing the activity(carWashing),
> can be seen as generated twice, once as "polishedCar" and once as
> "washedCar", even though nothing happens between the two
> activities and the two entities are equivalent.
>
> If this is the recommended approach, then it would be good to have a
> property to clarify this is not just any odd alternate; say
> prov:alternateInSubActivity. (as a property on the prov:Entity or a
> subproperty of prov:alternateOf). Otherwise it gets tricky to query
> the provenance: we don't want to follow every odd alternate up and
> down the trace. The strange thing here is that you don't *need* to
> do the prov:alternateOf wrapping for usage or association. The
> question then becomes to what extent subactivities should always
> twin the entities.
>
> I don't particularly like that "work around" approach for
> subactivities, as it ends up making a verbose "twin world" with
> alternate identifiers (which you have to mint) - effectively making an
> inline bundle without clear boundaries.
>
>
>
> The second way, much simpler and my preference, is to allow multiple
> generation, but only as long as one activity is subactivity of the
> other. I guess we can't infer which one is the sub and which one is
> the super - so it would be a constraint rather than an inference, but
> this gets tricky with the open world assumption and the use of OR/NOT.
>
> (This can be solved by adding a prov:alternateActivityFor as a
> symmetric superproperty of prov:wasSubActivityOf; then, instead of the
> constraint, we can simply infer prov:alternateActivityFor from multiple
> generations. The semantics of prov:alternateActivityFor would be
> particularly weak, similar to prov:alternateOf.)
>
>
> This is indeed the approach we have taken for Wf4Ever's 'simplified'
> workflow provenance model wfprov -
> http://wf4ever.github.com/ro/#wfprov
>
> Here wfprov:wasPartOfWorkflowRun is the workflow equivalent of
> wasSubActivityOf, and the same artifact (i.e. entity) is allowed to
> be wfprov:wasOutputFrom both runs. Because of this we currently
> can't make wfprov:wasOutputFrom a subproperty of prov:wasGeneratedBy
> without violating PROV-Constraints. As we don't want an overly
> verbose model, we are trying to avoid adding the equivalent of the
> prov:alternateOf workaround I sketched above.
>
>
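The second approach in the quoted message relaxes generation uniqueness so that an entity may have several generations only when the generating activities are nested through subactivity. Here is a minimal sketch of that relaxed check. It is not part of PROV. The input shapes and `check_generations` are hypothetical, and the sketch ignores timing and the open-world issues raised above.

```python
from collections import defaultdict
from itertools import combinations

def ancestors(parent_of, act):
    """All activities that `act` is (transitively) a subactivity of."""
    seen, stack = set(), list(parent_of[act])
    while stack:
        a = stack.pop()
        if a not in seen:
            seen.add(a)
            stack.extend(parent_of[a])
    return seen

def check_generations(generations, subactivities):
    """generations: (entity, activity) pairs; subactivities: (sub, super) pairs.
    Returns (entity, act1, act2) for each pair of generating activities
    of the same entity that are NOT nested via subactivity."""
    parent_of = defaultdict(set)
    for sub, sup in subactivities:
        parent_of[sub].add(sup)
    by_entity = defaultdict(set)
    for entity, activity in generations:
        by_entity[entity].add(activity)
    violations = []
    for entity, acts in by_entity.items():
        for x, y in combinations(sorted(acts), 2):
            if y not in ancestors(parent_of, x) and x not in ancestors(parent_of, y):
                violations.append((entity, x, y))
    return violations

# Hypothetical data based on the car-washing example.
generations = [("car", "polishing"), ("car", "carWashing"),
               ("invoice", "polishing"), ("invoice", "billing")]
subactivities = [("polishing", "carWashing")]
print(check_generations(generations, subactivities))
# [('invoice', 'billing', 'polishing')]
```

The car may be generated by both polishing and carWashing because one is nested in the other. The invoice's two generations are flagged because billing and polishing are unrelated.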

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Monday, 10 September 2012 11:13:18 UTC