RE: simon:entity (or Identifiable) from Myers, Jim on 2011-07-15 (public-prov-wg@w3.org from July 2011)

From: Myers, Jim <MYERSJ4@rpi.edu>
Date: Fri, 15 Jul 2011 19:50:46 -0400
To: <reza.bfar@oracle.com>, <public-prov-wg@w3.org>
Message-ID: <B7376F3FB29F7E42A510EB5026D99EF2040413F4@troy-be-ex2.win.rpi.edu>
Is this going in the direction of a hierarchy of 'states' of an identifier? If so, I don't think we have a hierarchy. If not, I'm not sure what the DAG represents.

I remember Graham commenting at one point about writing a page that talks more about the purpose of the model (as he wrote for access); I wonder if that would help. Here's my attempt, in that style, to describe the requirements and where we agree and disagree. (My take could be wrong, but perhaps we would make progress by identifying whether we disagree on requirements, or whether some are debating something that others consider resolved. If so, perhaps revising the text below would help before we dive back into specific points.)

In the following, I intend only the English meanings of words unless otherwise noted.

There's a set of things we've agreed on (or ignored) for a while related to the basic 'inputs - process execution - outputs' model, where the purpose of the model is to describe the history in cases where inputs and outputs are clear and the effects of a process execution are captured by the set of inputs and outputs (i.e. the process execution can't just change an input).
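As a rough illustration of the base model just described, here is a minimal Python sketch. The names Artifact, ProcessExecution, used, generated and inputs_of, and the ex: URIs, are hypothetical and not terms agreed by the group. The only point it makes is that an execution's effects are fully captured by its input and output sets; frozen dataclasses stand in for "the process execution can't just change an input".

```python
import dataclasses
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Artifact:
    """Hypothetical input/output; frozen, so its fields cannot be reassigned."""
    uri: str
    content: str

@dataclass(frozen=True)
class ProcessExecution:
    """Hypothetical: its effects are exactly the artifacts it used and generated."""
    uri: str
    used: FrozenSet[Artifact]
    generated: FrozenSet[Artifact]

def inputs_of(execution: ProcessExecution, output: Artifact) -> FrozenSet[Artifact]:
    """History of an output in the base model: the inputs of its generating execution."""
    if output not in execution.generated:
        raise ValueError(f"{output.uri} was not generated by {execution.uri}")
    return execution.used

raw = Artifact("ex:raw", "1,2,3")
summary = Artifact("ex:summary", "sum=6")
run = ProcessExecution("ex:run1", frozenset({raw}), frozenset({summary}))
print(inputs_of(run, summary) == frozenset({raw}))  # True

try:
    raw.content = "changed"  # an execution cannot modify an input in place
except dataclasses.FrozenInstanceError:
    print("inputs are immutable")
```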

We also want to be able to model cases where the process execution does change something rather than just using inputs and generating outputs. A document with versions is one example. In that case we're choosing to model both the document and its versions, and we add a relationship (IVPof) between them to signify that the object we consider to be changing could instead be thought of as distinct objects (document with content1 and document with content2) that can be handled by the base input - process execution - output model.
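A minimal Python sketch of that modeling choice, assuming IVPof can be recorded as a plain (version, document) pair. The names Thing, ProvStore, record_edit and versions_of, and the ex: URIs, are hypothetical. An in-place edit of the document is recorded as an ordinary input-to-output step between two version objects, each linked to the document by IVPof; the sketch takes no position on which definition of IVPof is right.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class Thing:
    """Hypothetical: a document or one of its versions (content fixed per object)."""
    uri: str
    content: str = ""

@dataclass
class ProvStore:
    """Hypothetical store; IVPof is assumed to be a plain (version, document) pair."""
    ivp_of: Set[Tuple[str, str]] = field(default_factory=set)
    # execution uri -> (used uris, generated uris)
    executions: Dict[str, Tuple[List[str], List[str]]] = field(default_factory=dict)

    def record_edit(self, execution: str, before: Thing, after: Thing, document: Thing) -> None:
        """Record an in-place edit of `document` as: before -> execution -> after."""
        self.ivp_of.add((before.uri, document.uri))
        self.ivp_of.add((after.uri, document.uri))
        self.executions[execution] = ([before.uri], [after.uri])

    def versions_of(self, document: Thing) -> List[str]:
        """All objects recorded as IVPof the given document, sorted by URI."""
        return sorted(v for v, d in self.ivp_of if d == document.uri)

doc = Thing("ex:report")
v1 = Thing("ex:report-v1", "content1")
v2 = Thing("ex:report-v2", "content2")
store = ProvStore()
store.record_edit("ex:edit1", v1, v2, doc)
print(store.versions_of(doc))  # ['ex:report-v1', 'ex:report-v2']
```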

We're debating:
 how to define this relationship 
 whether the document and its versions are the same type/class in the model
 
We also expect the model to cover a third case, where we have two different things (e.g. a document and a file) that may each have provenance but at some point have a correspondence: the file's bytes represent the document's content. This case causes problems for IVPof definitions that involve hierarchy, since one can't really consider either the document or the file to be a more stateful version of the other.

This again leads to debate about the definition of IVPof. So far, this concept has been formulated in terms of properties and dimensions as well as in terms of 'perspective relative to processes'. Some of the debate has arisen when these definitions start to include hierarchy (and thus don't fit the third use case), but it may be possible to formulate all three in ways that don't require hierarchy.

This last use case also makes it harder to see a difference between the types of things like document and version. In particular, if we can imagine more than one level for the second use case (e.g. document-version-encodedVersion), or think about the third case with no hierarchy, a two-class system of thing and thing-state does not appear workable.
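One way to see this argument, as a sketch under an illustrative assumption: suppose a two-class system means that every IVPof edge must point from a thing-state to a thing. Then any node that appears on both ends of IVPof edges would need both classes. The function name two_class_typing is hypothetical, and recording the document/file correspondence as IVPof in both directions is only an assumption about how the non-hierarchical case might be encoded.

```python
from typing import Dict, Iterable, Optional, Tuple

def two_class_typing(ivp_edges: Iterable[Tuple[str, str]]) -> Optional[Dict[str, str]]:
    """Try to type every node as 'thing' or 'thing-state', assuming each
    (a, b) edge means 'a IVPof b' and requires a: thing-state, b: thing.
    Returns the assignment, or None if some node would need both classes."""
    assignment: Dict[str, str] = {}
    for state, thing in ivp_edges:
        for node, cls in ((state, "thing-state"), (thing, "thing")):
            # setdefault returns the existing class if the node was already typed
            if assignment.setdefault(node, cls) != cls:
                return None
    return assignment

one_level = [("version1", "document"), ("version2", "document")]
two_levels = [("version1", "document"), ("encodedVersion1", "version1")]
# Assumption: a non-hierarchical correspondence recorded in both directions.
doc_and_file = [("document", "file"), ("file", "document")]

print(two_class_typing(one_level))
# {'version1': 'thing-state', 'document': 'thing', 'version2': 'thing-state'}
print(two_class_typing(two_levels))    # None
print(two_class_typing(doc_and_file))  # None
```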

Another issue that has arisen in the discussions is how to refer to things outside the model. We have several reasons for wanting to do this:
  to allow discovery of things with provenance using descriptive metadata/behavior/other context outside the model
  to aid in the definition of IVPof, where multiple hierarchies à la TBL and the third, non-hierarchical use case make it hard not to talk about something 'real' that both things involved in an IVPof relationship are describing/representing.

Throughout, we have trouble with nomenclature (thing/entity/stuff/etc., describe/represent/view of/etc.), which helps obscure when we do and don't agree.

We (I, anyway) may be confusing what the model contains with how the model will be implemented (in RDF or in the other languages we think in).

I don't know that this is complete, but perhaps I can stop here and ask: is this already controversial, or does it capture some of the nature of our debates?

 Jim

-----Original Message-----
From: public-prov-wg-request@w3.org on behalf of Reza B'Far
Sent: Fri 7/15/2011 2:22 PM
To: public-prov-wg@w3.org
Subject: Re: simon:entity (or Identifiable)
 
Folks -

I realize that the "R" word has been banned and am fine with that. Here is a
suggestion for reconciling the proposals and suggestions by Ryan, Jim(s), and Luc:

 1. That we specify that Identifier is some "base-line" temporally identified as
    the zero point (there exists no entity to be identified before this point).
 2. That we have a new concept that encapsulates a single "state" (sorry, I know
    that's another dangerous word) of an identifier from that point on. I don't
    want to give it a name, so I'll call it set S{}.
 3. That an Identifier can have a DAG (Directed Acyclic Graph) of S{} nodes where
    the DAG has a single root node and that root node is equivalent to the
    identifier itself.

Just trying to reconcile at this point.
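Reza's three points can be sketched as a small Python data structure, assuming each S{} node is named by a string and records its parent states. StateDAG, add_state, ancestors and the ex:doc identifier are hypothetical names for illustration. Requiring every new state to name already-existing parents keeps the graph acyclic and leaves the identifier as the only root; allowing more than one parent is what makes it a DAG rather than a tree, which bears on Jim's question of whether the proposal implies a hierarchy.

```python
from typing import Dict, Set

class StateDAG:
    """Hypothetical sketch: an Identifier anchors a DAG of state nodes (S{})
    whose single root is the Identifier itself."""

    def __init__(self, identifier: str) -> None:
        self.root = identifier  # point 1: the "zero point"; nothing precedes it
        self.parents: Dict[str, Set[str]] = {identifier: set()}

    def add_state(self, state: str, *parents: str) -> None:
        """Point 2: add a state derived from one or more existing states."""
        if state in self.parents:
            raise ValueError(f"{state} already exists")
        if not parents:
            raise ValueError("only the identifier itself may be a root")
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise ValueError(f"unknown parent(s): {missing}")
        # Parents must already exist, so no new edge can close a cycle (point 3).
        self.parents[state] = set(parents)

    def ancestors(self, state: str) -> Set[str]:
        """All states a given state descends from (always includes the root)."""
        seen: Set[str] = set()
        stack = list(self.parents[state])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen

dag = StateDAG("ex:doc")
dag.add_state("s1", "ex:doc")
dag.add_state("s2", "ex:doc")
dag.add_state("s3", "s1", "s2")  # a merge: this is a DAG, not a tree
print(sorted(dag.ancestors("s3")))  # ['ex:doc', 's1', 's2']
```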


On 7/15/11 10:46 AM, Jim McCusker wrote:
> On Fri, Jul 15, 2011 at 12:06 PM, Myers, Jim <MYERSJ4@rpi.edu> wrote:
>>> Being able to describe what the entity "looks like" at the time the
>>> provenance was recorded.
>>>
>>> My understanding was that a BOB was something like a named graph,
>>> graph literal (http://webr3.org/blog/semantic-web/rdf-named-graphs-vs-graph-literals/),
>>> or information artifact similar to iao:Dataset. The Bob would then have
>>> content that described, in some way, the entity in question.
>>> Hence the Bob being a description of an entity's state.
>> Do you distinguish 'description of an entity' from 'description of an
>> entity's state'? I get the sense that you are not using state in the
>> same sense as 'a more stateful view of' that is driving the discussion
>> of entity versus entity-state in the IVPof debates.
> Any description of an entity will occur with an entity in a particular
> state, and so the two are the same.
>
>>> If it is possible to know, there should be assertions on the BOB itself that say
>>> which entity the BOB is describing. Ideally, this is a URI of something that's
>>> referenced within the BOB.
>> I'm hoping someone will chime in on this - I agree we need to connect
>> the idea of a bob with the entity, but I could see implementing that as
>> a link (as you say) or by saying that my entity's class is a subtype of
>> Bob (hence there's only one URL for the Bob and the entity).
> But that's clearly wrong, since Bobs only describe the state of an
> entity at one point/span of time and context. If the same entity is
> observed again, and a new Bob is created that describes the state
> differently, then there's nothing to tie it down. I'm guessing that by
> saying there is no referable entity outside of the Bob, you can
> just make Bobs all the way down. But there would be no grounding to
> non-provenance resources in this case.
>
> The Bob is the description of something based on its state, the Entity
> is that something. A description of a thing is not the thing itself.
> Within the context of information systems, one can say that
> http://tw.rpi.edu/instances/JamesMcCusker is me. If you were to
> download the RDF from that URL that would contain a description of me
> within the context of RPI. The graph literal behind
> http://tw.rpi.edu/instances/JamesMcCusker is one description (that can
> change over time), and can be given an identifier using a graph digest
> [1], guaranteeing that we always talk about the same graph. But that
> graph is not me, even though the URI that returns it stands in for me
> in the semantic web.
>
> [1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.2187&rep=rep1&type=pdf
>
> Jim
> --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> james.mccusker@yale.edu | (203) 785-6330
> http://krauthammerlab.med.yale.edu
>
> PhD Student
> Tetherless World Constellation
> Rensselaer Polytechnic Institute
> mccusj@cs.rpi.edu
> http://tw.rpi.edu
>
Received on Friday, 15 July 2011 23:51:45 UTC