Re: ISSUE-385: hasProvenanceIn: finding a solution from Paul Groth on 2012-06-03 (public-prov-wg@w3.org from June 2012)

From: Paul Groth <p.t.groth@vu.nl>
Date: Sun, 3 Jun 2012 19:40:20 +0300
To: Graham Klyne <GK@ninebynine.org>
Cc: Jun Zhao <jun.zhao@zoo.ox.ac.uk>, Luc Moreau <L.Moreau@ecs.soton.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <CAJCyKRqsHJ9JTUn-VWmagYc9p7FD5680YH6VwoPOZU4dp0UYkQ@mail.gmail.com>
Hi Graham,

I would argue that being able to refer to a bundle in which the
provenance of an entity is contained is an important piece of
functionality to allow people to easily organize their provenance
information.

I can see the point about trying to reuse the relation between the PAQ
and the dm.

cheers
Paul


On Sun, Jun 3, 2012 at 9:48 AM, Graham Klyne <GK@ninebynine.org> wrote:
> (I'm replying arbitrarily to Jun's email to maintain the thread, but my comment
> is to the issue in general.  As it happens, my point about semantics is
> underscored by Jun's comment about time constraint - I think it's a non-issue
> here, but not obviously so.)
>
> I think the problem we're running into is that we agreed at the last F2F to
> remove all the additional semantics associated with account.  Thus, to
> paraphrase Simon's excellent summary, a bundle is just a named set of provenance
> statements without any further semantics.  But it appears that Luc's example
> needs more semantics than just a named set of provenance statements - and that's
> where I think we are running into problems, because we are not clear about
> exactly what those additional semantics should be.
>
> Therefore I suggest that, according to prior WG agreement, Luc's example is out
> of scope for us to fully resolve.  Paul's suggestion to provide the attributes
> as an extensibility hook is one possible approach.
>
> Another possible and more radical approach, prompted by Tim's earlier suggestion
> to take a local name from DC, is to drop hasProvenanceIn entirely from the prov
> specification, and (in the usage guidelines) document use of the DC term for this
> purpose.  This will leave the field clear for subsequent work to define a
> suitable cross-bundle primitive when we have a clearer common understanding of
> the actual requirements.
>
> In summary, options that work for me would be (in order of preference):
> (1) drop hasProvenanceIn entirely and move on.  Use existing terms from other
> vocabularies to express this idea. (**)
> (2) adopt Paul's suggestion of an extensible 2-place relation (*)
>
> (*) noting the importance of monotonicity here: extension attributes must not be
> able to change semantics of the underlying property.  If the underlying property
> has no (formal) semantics, this is easy.  If the underlying property does have
> built-in semantics, then the utility of the extension may be limited (or worse,
> careless extensions may break the underlying semantic model associated with the
> core provenance model).
>
> (**) the slight inconsistency here would be that PROV-AQ still requires a
> prov:hasProvenance relation.  I'm OK with this because PROV-AQ is intended to
> address operational concerns where the model is not.  But this does create a
> reasonably compelling argument for having a corresponding relation in the model
> - if the semantics are minimal then the same relation can work at both levels.
>
> #g
> --
>
>
> On 02/06/2012 22:36, Jun Zhao wrote:
>> Paul,
>>
>> At first sight, I loved your proposal. But after reading into it, I got less sure.
>>
>> This property is to allow locating the bundle in which the provenance of an entity is described. To qualify this, would it mean that, e.g., there is a time period during which you can find provenance of that entity in the bundle and after that you can't?
>>
>> Although the pattern you propose makes sense, I can't see when people need to qualify this relation. If you have a more concrete example in mind, I am ready to be convinced!
>>
>> Cheers,
>>
>> Jun
>>
>> Sent from my iPad (sorry for the brevity)
>>
>> On 1 Jun 2012, at 17:03, Paul Groth<p.t.groth@vu.nl>  wrote:
>>
>>> Hi All,
>>>
>>> It seems that one approach would be to define an extensible version
>>> of hasProvenanceIn and leave it at that.
>>>
>>> hasProvenanceIn(id, entity, bundle, attrs).
>>>
>>> Like all our extensible relations, we would also have the straight
>>> binary version
>>>
>>> hasProvenanceIn(entity,bundle)
>>>
>>> This would allow for the extensibility to cater for Luc's use case but
>>> also for other use cases where extension is nice. For example, I can
>>> imagine a system wanting to put a time constraint on the applicability
>>> of provenance in a bundle to an entity.
>>>
>>> This would leave it up to people to define specialization, alternate
>>> and derivation relations between entities as they want.
>>>
>>> Would this be acceptable to the group?
>>>
>>> Thanks
>>> Paul
>>>
>>>
>>>
>>> On Fri, Jun 1, 2012 at 5:33 PM, Luc Moreau<L.Moreau@ecs.soton.ac.uk>  wrote:
>>>> Hi Simon,
>>>>
>>>> Thanks for your message. I feel you don't directly respond to the points
>>>> that I raised,
>>>> and therefore all my comments stand.
>>>>
>>>> I respond to your points below.
>>>>
>>>> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>>>>> Hi Luc,
>>>>>
>>>>> I will try to articulate the points which I think back up the binary relations proposal.
>>>>>
>>>>> 1. As I understood, there is currently no semantics to a bundle. A querier can choose to consider the descriptions in the bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.
>>>>>
>>>>>
>>>>
>>>> A close notion to bundle in prior provenance art is opm:Account, and
>>>> there is plenty of evidence
>>>> that merging accounts may lead to contradictions.  PROV, rightly so,
>>>> does not define a union operator
>>>> over bundles, and is silent about whether bundles may be merged.
>>>>
>>>> Therefore,  there is nothing in PROV that backs this statement "which
>>>> bundle a description is in is
>>>> irrelevant and the bundling can be ignored".
>>>>
>>>> You are suggesting that an extension of PROV may add semantics to
>>>> bundles: that's exactly what you
>>>> have done, by implying they are mergeable.
>>>>
>>>>> Taking the statements from the three bundles below, a querier would end up with:
>>>>>
>>>>>   activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>>>>   wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>>>>   activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>>>>   wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>>>>   agent(tool:Bob1, [perf:rating="good"])
>>>>>   agent(tool:Bob2, [perf:rating="bad"])
>>>>>
>>>>> I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
>>>>>
>>>>>
>>>>
>>>> PROV does not specify whether they mean something different or not.
>>>>
>>>>> 2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf, the appropriate one must be asserted separately.
>>>>>
>>>>
>>>> I agree that being able to assert subtypes for hasProvenanceIn is
>>>> important: that's why I am
>>>> in favour of making hasProvenanceIn an n-ary relation that includes
>>>> attributes so that prov:type can be
>>>> used for what you suggest.
>>>>
>>>>> 3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.
>>>>>
>>>>> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.
>>>>>
>>>>> Separating concerns, I'd argue it is preferable to say:
>>>>>   hasProvenanceIn(tool:Bob1, ex:run1)
>>>>>   specializationOf(tool:Bob1, ex:Bob)
>>>>>   specializationOf(ex:Bob, ex:GeneralBob)
>>>>>
>>>> But this latter statement would belong to the ex:run1 bundle I assume.
>>>> It is not going to be known to be relevant to me until I have correctly
>>>> been able to link tool:Bob1 to ex:Bob in run1.
>>>>
>>>>
>>>>> and let the que
>>
>>
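
The two forms of hasProvenanceIn proposed in the thread, and the lookup
Simon describes in his point 3, can be sketched roughly as follows. This is
only an illustration, not anything defined by PROV: the class
HasProvenanceIn, the helper generalisations, and the attribute
ex:validUntil are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HasProvenanceIn:
    """Hypothetical record for the relation discussed in the thread."""
    entity: str
    bundle: str
    id: Optional[str] = None                 # used only by the extensible form
    attrs: Tuple[Tuple[str, str], ...] = ()  # extension attributes, if any

    def binary(self):
        """Drop id and attributes, leaving hasProvenanceIn(entity, bundle)."""
        return (self.entity, self.bundle)


def generalisations(entity, specialization_of):
    """Return the entity plus everything it transitively specializes."""
    seen = [entity]
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for specific, general in specialization_of:
            if specific == current and general not in seen:
                seen.append(general)
                frontier.append(general)
    return seen


# specializationOf statements taken from Simon's example.
specialization_of = [
    ("tool:Bob1", "ex:Bob"),
    ("ex:Bob", "ex:GeneralBob"),
]

# Extensible form; the time-constraint attribute is an invented placeholder.
link = HasProvenanceIn(
    entity="tool:Bob1",
    bundle="ex:run1",
    attrs=(("ex:validUntil", "2012-12-31"),),
)

# Identifiers a querier would look for once it has opened the linked bundle.
identifiers = generalisations(link.entity, specialization_of)
```

Note that the sketch assumes the querier already holds every
specializationOf statement. Luc's objection is that some of them may only
appear inside the bundle being located.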
Received on Sunday, 3 June 2012 16:40:50 UTC