RE: ISSUE-385: hasProvenanceIn: finding a solution from Miles, Simon on 2012-06-01 (public-prov-wg@w3.org from June 2012)

From: Miles, Simon <simon.miles@kcl.ac.uk>
Date: Fri, 1 Jun 2012 15:39:42 +0100
To: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <830EEE5C741ED54EAB28EBACFFC77984EE856F566F@KCL-MAIL04.kclad.ds.kcl.ac.uk>
Hi Luc,

I will try to articulate the points which I think back up the binary relations proposal.

1. As I understand it, a bundle currently has no semantics. A querier can choose to consider the descriptions in a bundle or not (based on the bundle's provenance), but whether there are one or many bundles, the querier just has a set of PROV descriptions. The bundles need to be found and known to be relevant, which is why hasProvenanceIn (or isTopicOf) is needed. After that, which bundle a description is in is irrelevant and the bundling can be ignored. A specific extension of PROV may change this by adding semantics to bundles, but this is not in the current specification.

Taking the statements from the three bundles below, a querier would end up with:

  activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
  wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
  activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
  wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
  agent(tool:Bob1, [perf:rating="good"])
  agent(tool:Bob2, [perf:rating="bad"])

I can see nothing in the current specification to suggest this means anything different to when these descriptions are separated into multiple bundles. Do you agree?
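
To make point 1 concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each bundle is modelled as a set of PROV-N statement strings; the names `bundles` and `flatten` are hypothetical and do not come from any PROV specification or library. As in the list above, the hasProvenanceIn statements themselves are left out.

```python
# Hypothetical model: each bundle is a set of PROV-N statement strings.
bundles = {
    "ex:run1": {
        "activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)",
        'wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])',
    },
    "ex:run2": {
        "activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)",
        'wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])',
    },
    "tool:analysis01": {
        'agent(tool:Bob1, [perf:rating="good"])',
        'agent(tool:Bob2, [perf:rating="bad"])',
    },
}


def flatten(bundles):
    """Return the union of all descriptions, discarding bundle membership."""
    merged = set()
    for statements in bundles.values():
        merged |= statements
    return merged


descriptions = flatten(bundles)
print(len(descriptions))  # 6
```

Under this reading, nothing in the merged set records which bundle a description came from, which is the point being made: the bundling carries no extra meaning.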

2. If there are two entity identifiers relating to the same thing/entity, we need to say how they are connected: either alternateOf, specializationOf, or possibly some external relation such as owl:sameAs. While the example below happens to imply a specialisation relation between tool:Bob1 and ex:Bob, there is no reason to believe this is true in all cases: alternateOf is just as possible. So, hasProvenanceIn cannot imply or be a sub-type of either specializationOf or alternateOf; the appropriate one must be asserted separately.

3. The same thing described from different perspectives has multiple identifiers regardless of bundles, i.e. at least one for each entity. When a bundle is newly read by a querier interested in the provenance of entity E, they should consider every entity that E is a specialisation of, and look for those identifiers as well. If they don't, they will miss information about the provenance of E described at a coarser granularity.

For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle ex:run1 might describe something about ex:GeneralBob's provenance. This makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it is not only ex:Bob that is relevant to look for in ex:run1.

Separating concerns, I'd argue it is preferable to say:
  hasProvenanceIn(tool:Bob1, ex:run1)
  specializationOf(tool:Bob1, ex:Bob)
  specializationOf(ex:Bob, ex:GeneralBob)
and let the querier search ex:run1 for all identifiers relevant to the entity. It seems irrelevant that the identifier tool:Bob1 is itself absent from bundle ex:run1, as it is only one of many identifiers for the entity/thing anyway.

Paraphrasing Paul from the telecon, hasProvenanceIn(tool:Bob1, ex:run1) can just mean "look in ex:run1 for more stuff relevant to tool:Bob1". If you know that tool:Bob1 is a specialisation of ex:Bob, then you should also look for ex:Bob.
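
The lookup described here can be sketched in Python. This is only an illustration under stated assumptions: the relations are held in plain dictionaries, ex:Bob is taken to be a specialisation of ex:GeneralBob as in the example above, and the identifiers mentioned in ex:run1 are assumed to have been extracted already. The names `relevant_identifiers` and `identifiers_to_search` are hypothetical.

```python
# Hypothetical in-memory model of the binary-relation proposal.
specialization_of = {
    "tool:Bob1": {"ex:Bob"},
    "ex:Bob": {"ex:GeneralBob"},
}
has_provenance_in = {"tool:Bob1": {"ex:run1"}}

# Identifiers mentioned in each bundle (assumed to be already extracted).
bundle_identifiers = {"ex:run1": {"ex:a1", "ex:Bob"}}


def relevant_identifiers(entity):
    """Return the entity plus everything it is transitively a specialisation of."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(specialization_of.get(current, ()))
    return seen


def identifiers_to_search(entity):
    """For each bundle linked from the entity, list the relevant identifiers found in it."""
    wanted = relevant_identifiers(entity)
    return {
        bundle: wanted & bundle_identifiers.get(bundle, set())
        for bundle in has_provenance_in.get(entity, ())
    }


print(identifiers_to_search("tool:Bob1"))  # {'ex:run1': {'ex:Bob'}}
```

The identifier tool:Bob1 never appears in ex:run1, yet the binary hasProvenanceIn still leads the querier to ex:Bob there, because the specialisation links are followed separately.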

Thanks,
Simon

Dr Simon Miles
Senior Lecturer, Department of Informatics
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166

accounting for the reasons behind contractual violations:
http://eprints.dcs.kcl.ac.uk/1283/
________________________________________
From: Luc Moreau [L.Moreau@ecs.soton.ac.uk]
Sent: 31 May 2012 22:54
To: Provenance Working Group WG
Subject: ISSUE-385: hasProvenanceIn: finding a solution

All,

To try and converge towards a solution, I am
circulating an example using a ternary hasProvenanceIn.
I would like to understand if and how we can make it work with
a simpler relation.


Two bundles ex:run1 and ex:run2 describe Bob's role as a controller
of two activities.  Same Bob, two different bundles.

     bundle ex:run1
      activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)  // duration: 1 hour
      wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
     endBundle

     bundle ex:run2
      activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)  // duration: 7 hours
      wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
     endBundle


A performance analysis tool rates the performance of agents (this could be used
to dispatch further work to performant agents, or congratulate them, etc).


     bundle tool:analysis01

       agent(tool:Bob1, [perf:rating="good"])
       hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)  // Bob's performance in ex:run1 is good

       agent(tool:Bob2, [perf:rating="bad"])
       hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)  // Bob's performance in ex:run2 is bad

     endBundle

The performance analysis tool has to rate two involvements of ex:Bob in
two separate activities.
Two specialized versions of ex:Bob are defined: tool:Bob1 and tool:Bob2,
with ratings good and bad respectively.

tool:Bob1 is linked to ex:Bob in run1, and tool:Bob2 is linked to ex:Bob
in run2, with the following:

       hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)
       hasProvenanceIn(tool:Bob2, ex:run2, ex:Bob)

Nothing is expressed about ex:Bob in bundle tool:analysis01 (except that
this is an alias for tool:Bob1 and tool:Bob2).

It is suggested that the ternary relation could be replaced by
isTopicIn(tool:Bob1, ex:run1)
and
specialization(tool:Bob1, ex:Bob).

I don't understand the point of
   isTopicIn(tool:Bob1, ex:run1)
since tool:Bob1 is not a topic in ex:run1.

Also, we now seem to have made ex:Bob a topic of tool:analysis01, because
of the following expression:
specialization(tool:Bob1, ex:Bob).

From tool:analysis01, where do I find provenance about ex:Bob?
It looks like this has become a dead end in this graph.

Do I need to introduce:
   isTopicIn(ex:Bob, ex:run1)
   isTopicIn(ex:Bob, ex:run2)?


So now we would have:
isTopicIn(tool:Bob1, ex:run1)
specialization(tool:Bob1, ex:Bob)
isTopicIn(tool:Bob2, ex:run2)
specialization(tool:Bob2, ex:Bob)
isTopicIn(ex:Bob, ex:run1)
isTopicIn(ex:Bob, ex:run2)

Which means that:

specialization(tool:Bob1, ex:Bob)
isTopicIn(ex:Bob, ex:run2)

... would lead us to believe that the good rating is due to slow performance.
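
The unwanted inference can be reproduced with a small Python sketch. It assumes a hypothetical naive querier that follows specialization links and then every isTopicIn link of the identifiers it reaches; the dictionaries and the function name `bundles_for` are illustrative only and not part of any PROV specification.

```python
# The six binary statements listed above, held in plain dictionaries.
specialization = {
    "tool:Bob1": {"ex:Bob"},
    "tool:Bob2": {"ex:Bob"},
}
is_topic_in = {
    "tool:Bob1": {"ex:run1"},
    "tool:Bob2": {"ex:run2"},
    "ex:Bob": {"ex:run1", "ex:run2"},
}


def bundles_for(entity):
    """Naively collect bundles for the entity and for everything it specialises."""
    found = set(is_topic_in.get(entity, ()))
    for general in specialization.get(entity, ()):
        found |= bundles_for(general)
    return found


print(sorted(bundles_for("tool:Bob1")))  # ['ex:run1', 'ex:run2']
```

Under these assumptions the well-rated tool:Bob1 ends up associated with ex:run2, the 7-hour run, as well as ex:run1.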

Can the proposer of the separate binary relations explain how this
example can work?

Thanks,
Luc
Received on Friday, 1 June 2012 14:40:58 UTC