Re: ISSUE-385: hasProvenanceIn: finding a solution from Timothy Lebo on 2012-06-04 (public-prov-wg@w3.org from June 2012)

From: Timothy Lebo <lebot@rpi.edu>
Date: Sun, 3 Jun 2012 22:17:28 -0400
To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Cc: Paul Groth <p.t.groth@vu.nl>, Jun Zhao <jun.zhao@zoo.ox.ac.uk>, Luc Moreau <L.Moreau@ecs.soton.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-Id: <F4A74A14-A27C-452C-8654-905BE1EFF934@rpi.edu>
On Jun 3, 2012, at 2:29 PM, Graham Klyne wrote:

> Paul,
> 
> I somewhat agree with you.
> 
> But, to play devil's advocate here, I could also argue that it's not the job of a data model to help implementers *organize* their data.
> 
> I will claim that, for the purposes of provenance data modelling, the hasProvenanceIn is unnecessary.  If one has a number of bundles, one could load them all

That's where it falls apart. We don't have the entire web on our laptops. And data web publishers need to provide a means to offer consumers suggestions for what they should obtain.
(again, I'm PAQ-naive here)

> into a single bundle (creating a new bundle that is the union of the given bundles), then look for information about particular entities in the merged bundle.  On that score, the data model's job is done.
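
A minimal sketch of the merge-then-query approach described here, assuming the bundles are already available locally and that each one is just a named set of statements. The tuple representation, the helper names, and the bundle name ex:run2 are illustrative assumptions, not part of PROV:

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical representation: a statement is (relation, arg1, arg2, ...).
Statement = Tuple[str, ...]


def merge_bundles(bundles: Dict[str, Set[Statement]]) -> FrozenSet[Statement]:
    """Return the union of all statements, discarding the bundle names."""
    merged: Set[Statement] = set()
    for statements in bundles.values():
        merged |= statements
    return frozenset(merged)


def statements_about(entity: str, merged: FrozenSet[Statement]) -> Set[Statement]:
    """Return every statement that mentions the entity as an argument."""
    return {s for s in merged if entity in s[1:]}


# Illustrative data only; the bundle contents are assumed for this sketch.
bundles = {
    "ex:run1": {("activity", "ex:a1"), ("wasAssociatedWith", "ex:a1", "ex:Bob")},
    "ex:run2": {("activity", "ex:a2"), ("wasAssociatedWith", "ex:a2", "ex:Bob")},
}
merged = merge_bundles(bundles)
about_bob = statements_about("ex:Bob", merged)
```

The sketch only covers bundles that are already in hand; it says nothing about how a consumer discovers which bundles to fetch in the first place.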
> 
> What we *do* need, and the reason that we have bundles, is a way to label a particular bundle of provenance so that we can assert provenance *about* that bundle.

+1

>  We don't need hasProvenanceIn for that.

Sure.

> 
> So, to repeat my earlier claim, we don't *need* hasProvenanceIn to support the functionality that was intended (or agreed) to be provided by provenance bundles.  In this respect, hasProvenanceIn is scope creep.  So, if it's proving hard to agree what it means, I think it should be dropped.

Or, minimally included at a broad level. :-)
And let its development continue beyond our Rec.

-Tim


> 
> #g
> --
> 
> 
> On 03/06/2012 17:40, Paul Groth wrote:
>> Hi Graham,
>> 
>> I would argue that being able to refer to a bundle in which the
>> provenance of an entity is contained is an important piece of
>> functionality to allow people to easily organize their provenance
>> information.
>> 
>> I can see the point about trying to reuse the relation between the PAQ
>> and the dm.
>> 
>> cheers
>> Paul
>> 
>> 
>> On Sun, Jun 3, 2012 at 9:48 AM, Graham Klyne <GK@ninebynine.org> wrote:
>>> (I'm replying arbitrarily to Jun's email to maintain the thread, but my comment
>>> is to the issue in general.  As it happens, my point about semantics is
>>> underscored by Jun's comment about time constraint - I think it's a non-issue
>>> here, but not obviously so.)
>>> 
>>> I think the problem we're running into is that we agreed at the last F2F to
>>> remove all the additional semantics associated with account.  Thus, to
>>> paraphrase Simon's excellent summary, a bundle is just a named set of provenance
>>> statements without any further semantics.  But it appears that Luc's example
>>> needs more semantics than just a named set of provenance statements - and that's
>>> where I think we are running into problems, because we are not clear about
>>> exactly what those additional semantics should be.
>>> 
>>> Therefore I suggest that, according to prior WG agreement, Luc's example is out
>>> of scope for us to fully resolve.  Paul's suggestion to provide the attributes
>>> as an extensibility hook is one possible approach.
>>> 
>>> Another possible and more radical approach, prompted by Tim's earlier suggestion
>>> to take a local name from DC, is to drop hasProvenanceIn entirely from the prov
>>> specification, and (in the usage guidelines) document use of the DC term for this
>>> purpose.  This will leave the field clear for subsequent work to define a
>>> suitable cross-bundle primitive when we have a clearer common understanding of
>>> the actual requirements.
>>> 
>>> In summary, options that work for me would be (in order of preference):
>>> (1) drop hasProvenanceIn entirely and move on.  Use existing terms from other
>>> vocabularies to express this idea. (**)
>>> (2) adopt Paul's suggestion of an extensible 2-place relation (*)
>>> 
>>> (*) noting the importance of monotonicity here: extension attributes must not be
>>> able to change semantics of the underlying property.  If the underlying property
>>> has no (formal) semantics, this is easy.  If the underlying property does have
>>> built-in semantics, then the utility of the extension may be limited (or worse,
>>> careless extensions may break the underlying semantic model associated with the
>>> core provenance model).
>>> 
>>> (**) the slight inconsistency here would be that PROV-AQ still requires a
>>> prov:hasProvenance relation.  I'm OK with this because PROV-AQ is intended to
>>> address operational concerns where the model is not.  But this does create a
>>> reasonably compelling argument for having a corresponding relation in the model
>>> - if the semantics are minimal then the same relation can work at both levels.
>>> 
>>> #g
>>> --
>>> 
>>> 
>>> On 02/06/2012 22:36, Jun Zhao wrote:
>>>> Paul,
>>>> 
>>>> At first sight, I loved your proposal. But after reading into it, I got less sure.
>>>> 
>>>> This property is to allow locating the bundle in which the provenance of an entity is described. To qualify this, would it mean that, e.g, there is a time period during which you can find provenance of that entity in the bundle and after that you can't?
>>>> 
>>>> Although the pattern you propose makes sense, I can't see when people need to qualify this relation. If you have a more concrete example in mind, I am ready to be convinced!
>>>> 
>>>> Cheers,
>>>> 
>>>> Jun
>>>> 
>>>> Sent from my iPad (sorry for the brevity)
>>>> 
>>>> On 1 Jun 2012, at 17:03, Paul Groth <p.t.groth@vu.nl> wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> It seems that one approach would be to define an extensible version
>>>>> of hasProvenanceIn and leave it at that.
>>>>> 
>>>>> hasProvenanceIn(id, entity, bundle, attrs).
>>>>> 
>>>>> Like all our extensible relations, we would also have the straight
>>>>> binary version
>>>>> 
>>>>> hasProvenanceIn(entity,bundle)
>>>>> 
>>>>> This would allow for the extensibility to cater for Luc's use case but
>>>>> also for other use cases where extension is nice. For example, I can
>>>>> imagine a system wanting to put a time constraint on the applicability
>>>>> of provenance in a bundle to an entity.
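
A minimal sketch of how the binary and extensible forms proposed above could share one structure, assuming the optional identifier and attributes simply default to empty. The class name, the field names, the identifier ex:link1, and the attribute ex:validUntil are hypothetical and not defined by PROV:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HasProvenanceIn:
    """Hypothetical container for hasProvenanceIn(id, entity, bundle, attrs)."""

    entity: str
    bundle: str
    rel_id: Optional[str] = None  # optional identifier of the relation itself
    attrs: Dict[str, str] = field(default_factory=dict)  # extension attributes

    def is_binary(self) -> bool:
        """True when only the entity and the bundle are given."""
        return self.rel_id is None and not self.attrs


# Binary form: hasProvenanceIn(entity, bundle)
binary = HasProvenanceIn("tool:Bob1", "ex:run1")

# Extended form with an assumed time-constraint attribute.
qualified = HasProvenanceIn(
    "tool:Bob1",
    "ex:run1",
    rel_id="ex:link1",
    attrs={"ex:validUntil": "2012-12-31T00:00:00"},
)
```

In this sketch the attributes only add information; whether they may alter the meaning of the underlying relation is left open.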
>>>>> 
>>>>> This would leave it up to people to define specialization, alternate
>>>>> and derivation relations between entities as they want.
>>>>> 
>>>>> Would this be acceptable to the group?
>>>>> 
>>>>> Thanks
>>>>> Paul
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jun 1, 2012 at 5:33 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>>> Hi Simon,
>>>>>> 
>>>>>> Thanks for your message. I feel you don't directly respond to the points
>>>>>> that I raised,
>>>>>> and therefore all my comments stand.
>>>>>> 
>>>>>> I respond to your points below.
>>>>>> 
>>>>>> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>>>>>>> Hi Luc,
>>>>>>> 
>>>>>>> I will try to articulate the points which I think back up the binary relations proposal.
>>>>>>> 
>>>>>>> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> A close notion to bundle in prior provenance art is opm:Account, and
>>>>>> there is plenty of evidence
>>>>>> that merging accounts may lead to contradictions.  PROV, rightly so,
>>>>>> does not define a union operator
>>>>>> over bundles, and is silent about whether bundles may be merged.
>>>>>> 
>>>>>> Therefore,  there is nothing in PROV that backs this statement "which
>>>>>> bundle a description is in is
>>>>>> irrelevant and the bundling can be ignored".
>>>>>> 
>>>>>> You are suggesting that an extension of PROV may add semantics to
>>>>>> bundles: that's exactly what you
>>>>>> have done, by implying they are mergeable.
>>>>>> 
>>>>>>> Taking the statements from the three bundles below, a querier would end up with:
>>>>>>> 
>>>>>>>   activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>>>>>>   wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>>>>>>   activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>>>>>>   wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>>>>>>   agent(tool:Bob1, [perf:rating="good"])
>>>>>>>   agent(tool:Bob2, [perf:rating="bad"])
>>>>>>> 
>>>>>>> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> PROV does not specify whether they mean something different or not.
>>>>>> 
>>>>>>> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf, the appropriate one must be asserted separately.
>>>>>>> 
>>>>>> 
>>>>>> I agree that being able to assert subtypes for hasProvenanceIn is
>>>>>> important: that's why I am
>>>>>> in favour of having hasProvenanceIn be an n-ary relation that includes
>>>>>> attributes so that prov:type can be
>>>>>> used for what you suggest.
>>>>>>> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
>>>>>>> 
>>>>>>> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
>>>>>>> 
>>>>>>> Separating concerns, I'd argue it is preferable to say:
>>>>>>>   hasProvenanceIn(tool:Bob1, ex:run1)
>>>>>>>   specializationOf(tool:Bob1, ex:Bob)
>>>>>>>   specializationOf(ex:Bob, ex:GeneralBob)
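
A minimal sketch of the lookup described in point 3, assuming the specializationOf statements are already known to the querier and are followed transitively. The function name and the dictionary representation are illustrative assumptions; the chain used is the one given in the prose of that point (tool:Bob1, ex:Bob, ex:GeneralBob):

```python
from typing import Dict, Set


def identifiers_to_search(entity: str, specialization_of: Dict[str, Set[str]]) -> Set[str]:
    """Collect the entity plus every entity it is, directly or indirectly,
    a specialization of. These are the identifiers to look for in a bundle."""
    seen = {entity}
    stack = [entity]
    while stack:
        current = stack.pop()
        for general in specialization_of.get(current, set()):
            if general not in seen:  # guards against cycles in the input
                seen.add(general)
                stack.append(general)
    return seen


# specializationOf(tool:Bob1, ex:Bob) and specializationOf(ex:Bob, ex:GeneralBob)
specialization_of = {
    "tool:Bob1": {"ex:Bob"},
    "ex:Bob": {"ex:GeneralBob"},
}
ids = identifiers_to_search("tool:Bob1", specialization_of)
```

The sketch assumes the specialization statements can be read before the bundle is searched, which is exactly the point under discussion here.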
>>>>>>> 
>>>>>> But this latter statement would belong to the ex:run1 bundle I assume.
>>>>>> It is not going to be known to be relevant to me until I have correctly
>>>>>> been able to link tool:Bob1 to ex:Bob in run1.
>>>>>> 
>>>>>> 
>>>>>>> and let the que
>>>> 
>>>> 
>> 
> 
>
Received on Monday, 4 June 2012 02:18:25 UTC