Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]

Hi,

OK, the contributes attribute approach was a strawman in any case.

In a sense, the obstacle here is the focus in PROV-N on relations that
we think of as binary but that in reality have an identity and
additional parameters/attributes.  In PROV-O, we would not have this
problem: we could just link multiple activities to the event id.

Your alternative essentially would (in database constraint terms) weaken
the key constraint to a functional dependency that says that the event
id determines only the time. This is fine and would be straightforward
if we were working with normal flat relations, BUT because of attribute
lists (and the alignment we have in mind with RDF) it is not quite so
easy.
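
To make the difference concrete, here is a minimal Python sketch,
assuming generation statements are modelled as flat tuples with no
attributes and no optional placeholders.  The Wgb type and both
function names are hypothetical, not part of any PROV tooling.  It
compares the current key constraint with the weakened dependency
"id determines time":

```python
from typing import NamedTuple, Optional

class Wgb(NamedTuple):
    """Hypothetical flat model of wasGeneratedBy(id; entity, activity, time)."""
    id: str
    entity: str
    activity: Optional[str]
    time: Optional[str]

def satisfies_key(stmts):
    """Key constraint: the id determines entity, activity and time."""
    seen = {}
    for s in stmts:
        fields = (s.entity, s.activity, s.time)
        # setdefault stores the first value seen for this id and returns it
        if seen.setdefault(s.id, fields) != fields:
            return False
    return True

def satisfies_fd_time(stmts):
    """Weakened constraint (assumed reading): the id determines only the time."""
    seen = {}
    for s in stmts:
        if seen.setdefault(s.id, s.time) != s.time:
            return False
    return True

stmts = [Wgb("id", "e", "a1", "t"), Wgb("id", "e", "a2", "t")]
print(satisfies_key(stmts))      # False
print(satisfies_fd_time(stmts))  # True
```

On flat tuples the weaker check is trivial; the difficulty only shows
up once attribute lists are attached to the same id.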

So concretely suppose we say:

wgb(id;e,a1,t,[k1=v1])
wgb(id;e,a2,t,[k2=v2])

This is currently invalid.  If we adopt your suggestion below, then it
would be valid.  But it's not clear to me what its normal form should
be.  Should the attributes of the first statement be merged into those
of the second and vice versa?

In RDF terms, we have been mapping the attribute-value pairs to
properties hanging off the id.  So it seems to me that if we have
attributes hanging off the same id in different places, they should be
merged.  In other words, it seems wrong to me to use the same id to
describe two interactions, one between e and a1 and one between e and
a2.
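
A rough Python sketch of the problem, assuming attributes are stored
as properties keyed only by the statement id, in the spirit of the RDF
mapping (the tuple layout and merge_by_id are hypothetical
illustrations, not an existing API):

```python
from collections import defaultdict

def merge_by_id(stmts):
    """Collect activities and attribute pairs per id, as if they were
    properties hanging off a single node named by that id."""
    merged = defaultdict(lambda: {"activities": set(), "attrs": set()})
    for sid, entity, activity, time, attrs in stmts:
        node = merged[sid]
        node["activities"].add(activity)
        node["attrs"].update(attrs)  # attrs is a list of (key, value) pairs
    return dict(merged)

stmts = [
    ("id", "e", "a1", "t", [("k1", "v1")]),
    ("id", "e", "a2", "t", [("k2", "v2")]),
]
result = merge_by_id(stmts)
print(sorted(result["id"]["activities"]))  # ['a1', 'a2']
print(sorted(result["id"]["attrs"]))       # [('k1', 'v1'), ('k2', 'v2')]
```

After merging, nothing records that k1=v1 was stated alongside a1
rather than a2, which is the information lost by reusing one id for
two interactions.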

So I think the right thing to do is somehow accommodate the fact that
a generation event could involve multiple equal participants.  If we had
some lightweight way of collecting activities, so that we could in
effect write 

wgb(id; e,[a1,a2], t, attrs)

would that work?
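
For illustration only, a Python sketch of what such a normal form
might look like, assuming statements that share an id must agree on
entity and time and that their attribute pairs are simply unioned.
The collect function and the tuple layout are hypothetical, not a
proposal for concrete PROV-N syntax:

```python
def collect(stmts):
    """Group statements by id into (entity, activities, time, attrs).

    Raises ValueError if two statements with the same id disagree
    on the entity or the time.
    """
    out = {}
    for sid, entity, activity, time, attrs in stmts:
        if sid not in out:
            out[sid] = (entity, [activity], time, set(attrs))
            continue
        e, acts, t, merged = out[sid]
        if (e, t) != (entity, time):
            raise ValueError(f"conflicting statements for {sid}")
        if activity not in acts:
            acts.append(activity)  # keep first-seen order of activities
        merged.update(attrs)       # union of (key, value) pairs
    return out

result = collect([
    ("id", "e", "a1", "t", [("k1", "v1")]),
    ("id", "e", "a2", "t", [("k2", "v2")]),
])
entity, activities, time, attrs = result["id"]
print(activities)     # ['a1', 'a2']
print(sorted(attrs))  # [('k1', 'v1'), ('k2', 'v2')]
```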


--James

On Thu, 9 Aug 2012 19:45:36 +0100
"Miles, Simon" <simon.miles@kcl.ac.uk> wrote:

> Hello James,
> 
> Agreed that simply removing the key constraint may allow too much. We
> want to keep the defining aspects of an event fixed across all
> descriptions to be valid. For an event in general, I think that just
> means time of occurrence, correct? If so, can't we express this as
> constraints, i.e. relation R with identifier i1 and time t1 cannot be
> merged with R with identifier i2 and time t2 if R describes an event
> (e.g. wasGeneratedBy) and i1=i2 but t1/=t2? For a generation event,
> perhaps it also means the entity?
> 
> For your use cases, I agree there is probably little difference in
> practice. It's case 2 I was thinking about. I'm not sure if the
> textual definitions in the DM preclude case 1, but it's interesting,
> e.g. music is generated at the instant that each of the individual
> instruments in a band is being played.
> 
> I'd find "primary" activities and "contributed" properties hard to
> explain and justify. I can't see why an activity at one level of
> abstraction should be any more primary than one at another. I'm
> unclear how to define contribution so that it works in both
> directions (sub- to super-activity and vice-versa). Also, shouldn't
> we be allowing for the merging of statements from multiple sources
> when this produces a valid instance? If so, then we should allow for
> two parties to declare a different generating event as primary, and
> I struggle to see why this means their statements should be
> unmergeable. In conclusion, I'm not yet convinced by the idea.
> 
> Thanks,
> Simon
> 
> Dr Simon Miles
> Senior Lecturer, Department of Informatics
> Kings College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
> 
> Evolutionary Testing of Autonomous Software Agents:
> http://eprints.dcs.kcl.ac.uk/1370/
> ________________________________________
> From: James Cheney [jcheney@inf.ed.ac.uk]
> Sent: 09 August 2012 18:22
> To: Miles, Simon
> Cc: Provenance Working Group
> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
> events  and activities [prov-dm-constraints]
> 
> OK.  The problem with removing the key constraint is that it takes
> away a lot more than we probably want, e.g. now we can say:
> 
> wasGeneratedBy(evt; widget, worker1, Monday)
> wasGeneratedBy(evt; widget, worker2, Tuesday)
> wasGeneratedBy(evt; widget, factory, Friday)
> 
> because (only) the key constraint says that all of the other fields
> have to match (except attributes, which can be merged).
> 
> That seems strange to me - the whole point of event identifiers (I
> thought) is to identify the events.  Most of what we have done
> assumes events that take place between exactly two things (or at most
> a small number), rather than arbitrarily many.  So I would say that
> at least the times should match, otherwise the thing gets generated
> at two different times.
> 
> It seems that there are two main use cases:
> 
> 1.  separate activities participating simultaneously in generating
> the same entity:
> 
> wasGeneratedBy(evt1;widget,worker1,t1)
> wasGeneratedBy(evt2;widget,worker2,t1)
> 
> 2.  super- and sub-activities generating the same entity via events
> describing different abstraction levels.
> 
> wasGeneratedBy(evt1;widget, factory,t1)
> wasGeneratedBy(evt2;widget, worker,t1)
> (some non-PROV statement that a1 is part of a2)
> 
> From the point of view of PROV, there is no real difference, since
> we don't have a way of saying an activity is a sub-activity of
> another... Does this sound right?
> 
> As a strawman, why wouldn't it work to require a specific "primary"
> activity (which could be a new activity invented solely for this
> event), and have an attribute such as prov:contributedTo that
> names other activities that contributed to a generation event
> (perhaps indirectly, such as a super-activity)?
> 
> Hence:
> 
> wasGeneratedBy(evt1;e,workers12,t1,[prov:contributed = worker1,
> prov:contributed = worker2])
> 
> wasGeneratedBy(evt1;e,worker,t1,[prov:contributed = factory])
> 
> --James
> 
> On Aug 9, 2012, at 5:53 PM, Miles, Simon wrote:
> 
> > Hello James,
> >
> > I'm not clear what the invalidity point would actually look like or
> > entail, so would prefer to reserve comment.
> >
> > Yes, happy to provide suggestions, examples, arguments etc. if you
> > say what you need. I didn't have a particular solution in mind in
> > the issue raised below, but agree with your suggestion in the
> > telecon that it implies the removal of the key constraint on
> > wasGeneratedBy.
> >
> > thanks,
> > Simon
> >
> > Dr Simon Miles
> > Senior Lecturer, Department of Informatics
> > Kings College London, WC2R 2LS, UK
> > +44 (0)20 7848 1166
> >
> > Evolutionary Testing of Autonomous Software Agents:
> > http://eprints.dcs.kcl.ac.uk/1370/
> > ________________________________________
> > From: James Cheney [jcheney@inf.ed.ac.uk]
> > Sent: 09 August 2012 17:23
> > To: Provenance Working Group
> > Subject: Re: PROV-ISSUE-473 (generating-activity): Unique
> > generation events and activities [prov-dm-constraints]
> >
> > The consensus was that this needs work, either by dropping some
> > inferences (provided we understand the implications) or finding a
> > way to accommodate multiple levels of abstraction.
> >
> > If we can find a way to allow the inference to be used to determine
> > *invalidity* if implementations agree with it, while not requiring
> > everyone use it, will that be OK?
> >
> > I will be pestering Simon, Daniel and Stian to offer suggestions
> > and/or examples.
> >
> > --James
> >
> >
> > On Aug 9, 2012, at 3:35 PM, Provenance Working Group Issue Tracker
> > wrote:
> >
> >> PROV-ISSUE-473 (generating-activity): Unique generation events and
> >> activities [prov-dm-constraints]
> >>
> >> http://www.w3.org/2011/prov/track/issues/473
> >>
> >> Raised by: Simon Miles
> >> On product: prov-dm-constraints
> >>
> >> As requested, I'm submitting an issue where I feel a
> >> PROV-Constraints review comment of mine is not completely answered.
> >>
> >> My original comment:
> >>> Unique generations
> >>> -----------
> >>> C. Immediately following Inference 12, the text says "the entity
> >>> denoted by e2 is generated by at most one activity (see Constraint
> >>> 27)". The Remark below repeats this, "at most one activity could
> >>> generate the entity e2."
> >>>
> >>> This seems wrong. Constraint 27 says that e2 is generated by only
> >>> one generation event, not by only one activity. The distinction
> >>> between these is important. In the primer's example, there is an
> >>> activity ex:compile which is decomposed into steps ex:compose and
> >>> ex:illustrate. While there is only one (implicit) generation
> >>> event for entity ex:chart1, both ex:compile and ex:illustrate can
> >>> be asserted to have generated the entity.
> >>
> >> Response from editors:
> >>> Constraint 27 indeed says that there is a single generation event
> >>> and constraint 26 says that the id is a key for a wasGeneratedBy
> >>> which implies that there is a single activity.
> >>>
> >>> In the primer, you assert:
> >>> wasGeneratedBy(ex:chart1, ex:compile,  2012-03-02T10:30:00)
> >>> wasGeneratedBy(ex:chart1, ex:illustrate,  2012-03-02T10:30:00)
> >>>
> >>> This is invalid.
> >>>
> >>> One way to address this is to maintain two levels of abstraction
> >>> for both activities and entities.
> >>>
> >>> wasGeneratedBy(ex:chart1_abstract, ex:illustrate,
> >>> 2012-03-02T10:30:00)
> >>> specializationOf(ex:chart1,ex:chart1_abstract)  // or similar.
> >>
> >> This response explains why the current constraints do not allow
> >> what I described, but not why they are meaningful. The questions
> >> below hopefully articulate my concerns.
> >>
> >> 1. The response suggests that the invalidity of the primer example
> >> is due to it describing multiple levels of abstraction for a
> >> single entity. Why should this be invalid? Why has validity got
> >> anything to do with levels of abstraction? As far as I can see,
> >> this is not stated or explained in PROV-Constraints.
> >>
> >> 2. As ex:chart1_abstract and ex:chart1 are exactly the same entity
> >> with exactly the same attributes and generated at the same
> >> instant, then why would we want statements implying one was more
> >> abstract than the other? Isn't this at least misleading?
> >>
> >> I also have one related follow-on question:
> >>
> >> 3. Even if we do use the specialization approach to get around the
> >> constraints as suggested, there can only be one entity per
> >> generation event. If something is described at multiple levels of
> >> abstraction, then does that necessitate a unique generation event
> >> for each level (each entity)? If so (as appears), why? When I
> >> create the first version of a document, in the same instant I
> >> create both "doc" and "docV1". How do I describe that the event
> >> creating one is the "same" event that created the other? It is
> >> surely the "same" event in some strong, objective sense, even if
> >> we prefer to describe it using a different identifier for each
> >> entity.
> >>
> >> Thanks,
> >> Simon
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> >
> 
> 



Received on Thursday, 9 August 2012 21:23:07 UTC