Re: playing with pil ontology from Luc Moreau on 2011-08-25 (public-prov-wg@w3.org from August 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Thu, 25 Aug 2011 10:33:51 +0100
To: public-prov-wg@w3.org
Message-ID: <EMEW3|d7187c92185504ba838714d0a54e2cc4n7OAZA08L.Moreau|ecs.soton.ac.uk|4E5616FF>

Hi Simon,

You need to assert (2), which then allows you to express (1).
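
As an illustration only (not part of the PIL ontology or of any existing
tool), the ordering of (2) before (1) could be sketched in Python as below.
The ProvenanceStore class and its method names are hypothetical, triples are
plain tuples of strings, and the rule that only the subject must be typed as
pil:Entity is an assumption made for the example.

```python
RDF_TYPE = "rdf:type"
PIL_ENTITY = "pil:Entity"
PIL_DERIVED_FROM = "pil:isDerivedFrom"


class ProvenanceStore:
    """Hypothetical in-memory triple store used only for this illustration."""

    def __init__(self):
        self.triples = set()

    def is_entity(self, uri):
        # True once the statement "uri rdf:type pil:Entity" has been asserted.
        return (uri, RDF_TYPE, PIL_ENTITY) in self.triples

    def add(self, subject, predicate, obj):
        # Assumed rule: a derivation may only be stated about a subject
        # that has already been asserted to be an entity.
        if predicate == PIL_DERIVED_FROM and not self.is_entity(subject):
            raise ValueError(f"{subject} must be asserted as {PIL_ENTITY} first")
        self.triples.add((subject, predicate, obj))


store = ProvenanceStore()
store.add("http://one", RDF_TYPE, PIL_ENTITY)            # assertion (2)
store.add("http://one", PIL_DERIVED_FROM, "http://two")  # assertion (1)
```

Reversing the two add calls raises ValueError in this sketch, which mirrors
the point that (1) is only expressible once (2) has been asserted.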

I tried to introduce a container in the file example. See [1].
In the provenance container found at [1], I have embedded some provenance.
(It could have been kept separate, using the PAQ mechanism.)

The notion of scope is becoming crucial, and we need to address it (see 
ISSUE-81).

Cheers,
Luc

[1] http://dvcs.w3.org/hg/prov/raw-file/default/model/container-example.pasn

On 24/08/11 09:56, Simon Miles wrote:
> Hi Luc,
>
> I'm trying to understand the implication of your distinction below.
>
> Does this mean that if I have a document containing (only) provenance
> assertions which I identify with URI http://one, then I cannot simply
> assert the following?
>
> (1) http://one pil:isDerivedFrom http://two
>
> What does "provenance [container] can be asserted to be an entity"
> entail?  Is it enough for me to say the following:
>
> (2) http://one rdf:type pil:Entity
>
> and this would then allow me to also say (1)? Or is there something
> more required to transform http://one into a PIL entity?
>
> Thanks,
> Simon
>
> On 23 August 2011 23:20, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>
>> Hi Jim and all,
>>
>> I am picking on this specific message, since it seems to represent an
>> idea that is evolving in the WG.
>>
>> As earlier today, I don't agree with:
>>   >  provenance is 'just another entity'
>>
>> Adopting the phrasing that Jim used in a previous response, I would
>> say that:
>>
>>   provenance [container] can be asserted to be an entity
>>
>> Luc
>>
>> On 15/08/11 19:43, Myers, Jim wrote:
>>      
>>> I agree provenance is 'just another entity' at some level, so perhaps a subtype of entity (versus a separate concept). The two types of things that seem different to me from HTML and XML docs are:
>>>
>>> I would like a PIL interpreter to do something with accounts it gets, i.e. read them and make the contents accessible. I might also want to keep the link between a statement and its source account, allow streaming accounts (e.g. from a live sensor), allow multiple accounts in one document, etc. I suspect that simply standardizing how one indicates that a resource is of type 'provenance doc' gives a partial solution, but I see an analogy to the reasoning leading to NamedGraphs as well (why not just have RDF docs? Do their reasons (related to signing, etc.) apply for us as well? (Such as - if we want to sign provenance, wouldn't it be nice to put the signature in the same doc as the provenance statements it refers to? But that changes a cryptographic signature unless there's a mechanism to specify which parts of the doc are the account...)).
>>>
>>> The other type of thing we explored in OPM was in relating accounts - being able to state that one account was consistent/inconsistent with another one. If we want that for PIL, we'd need to define a relationship(s) between accounts (which could still be subtypes of entities).
>>>
>>> In any case - probably more discussion required - I just didn't want 'collection' as an aggregate entity concept to get lumped together with the account/provenance doc/prov container and cause additional confusion.
>>>
>>>    Jim
>>>
>>>
>>>
>>>        
>>>> -----Original Message-----
>>>> From: Graham Klyne [mailto:GK@ninebynine.org]
>>>> Sent: Monday, August 15, 2011 12:10 PM
>>>> To: Myers, Jim
>>>> Cc: Satya Sahoo; Deus, Helena; Khalid Belhajjame; public-prov-wg@w3.org
>>>> Subject: Re: playing with pil ontology
>>>>
>>>> Jim,
>>>>
>>>> FWIW, in PAQ we talk about "provenance information" as just another resource
>>>> that includes provenance assertions.  To my mind, its primary representation
>>>> would be as an RDF document.
>>>>
>>>> The terminology here is subject to review and harmonization with the model,
>>>> but I'm not convinced that we need a new concept in the model for this, and
>>>> I'm not keen on a name involving "container", as in my mind that sets up
>>>> expectations of a distinct layer of encapsulation.  We don't talk about
>>>> "containers" for HTML or XML elements, we just talk about HTML and XML
>>>> documents.  Same for provenance, IMO.
>>>>
>>>> I suppose that suggests "Provenance Document", or similar.
>>>>
>>>> #g
>>>> --
>>>>
>>>> Myers, Jim wrote:
>>>>
>>>>          
>>>>> A couple quick comments: I don't think we've distinguished provenance
>>>>> container and account at this point - they are an entity which
>>>>> contains provenance statements and are used to enable you to talk
>>>>> about how the provenance was created (what processes and inputs caused
>>>>> those statements to be), but collection has been discussed as a
>>>>> general aggregate entity/container - a bag of marbles is an entity and
>>>>> saying a process execution used it is shorthand for talking about the
>>>>> individual marbles. A file is a collection of bytes and a process
>>>>> execution may only use some of the bytes, etc.
>>>>>
>>>>>
>>>>>
>>>>> Re: roles - I would argue that you should use something quite specific
>>>>> for the role of your temperature parameter, e.g.
>>>>> "processingtemperaturesetpoint" rather than a generic "input" or
>>>>> "inputParameter" role (parameter might still be a supertype of
>>>>> processingtemperaturesetpoint). This would be necessary if, for
>>>>> example, your process execution had a reaction temperature and a
>>>>> storage temperature as inputs - now you have two numbers/two
>>>>> temperatures and you have to use each in the correct role for the
>>>>> provenance to be correct. In many cases, you could potentially
>>>>> describe the type of the entity itself well enough to make the
>>>>> provenance clear, but putting the information into the entity typing
>>>>> rather than into the role it has relative to the process execution
>>>>> causes trouble if you use the entity in multiple processes (if I make
>>>>> an entity that is of type "processingtemperaturesetpoint" and I have a
>>>>> second process that displays a "printablenumber" that uses it as
>>>>> input, the same entity can't also be of type "printablenumber" -
>>>>> better to make the entity have type number and play a
>>>>> "processingtemperaturesetpoint" role in one process and the
>>>>> "printablenumber" role in the other.)
>>>>>
>>>>>
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>> *From:* public-prov-wg-request@w3.org
>>>>> [mailto:public-prov-wg-request@w3.org] *On Behalf Of *Satya Sahoo
>>>>> *Sent:* Monday, August 15, 2011 11:02 AM
>>>>> *To:* Deus, Helena
>>>>> *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
>>>>> *Subject:* Re: playing with pil ontology
>>>>>
>>>>>
>>>>>
>>>>> Hi Lena,
>>>>>
>>>>> Thanks again for trying to use the ontology for the microarray use case!
>>>>>
>>>>>
>>>>>
>>>>> My comments are inline:
>>>>>
>>>>>
>>>>>
>>>>> > I am not questioning whether agent should be mapped to agents
>>>>> > defined elsewhere, which seems to be obvious - only wondering whether
>>>>> > agent "label" and "description" are things we want to standardize in
>>>>> > our model or not. We can "suggest" rdfs:label and rdfs:comment without
>>>>> > enforcing it as such - having those included in the model will likely
>>>>> > result in much less heterogeneity when it comes to reporting
>>>>> > provenance (particularly since we are defining it necessarily "open"
>>>>> > and highly granular to fit any particular domain).
>>>>>
>>>>>
>>>>>
>>>>> I am not sure I understand your point. The rdfs:label and rdfs:comment
>>>>> are two of the nine annotation properties that are part of the OWL2
>>>>> syntax. So, the provenance ontology encoded in OWL includes them by
>>>>> default.
>>>>>
>>>>> > What was its intended purpose/role in the description of provenance?
>>>>>
>>>>>
>>>>>
>>>>> Provenance container, account, and collection are related concepts for
>>>>> modeling a collection of provenance assertions. E.g. provenance of an
>>>>> Affymetrix gene chip will be a collection of provenance assertions
>>>>> (date of manufacture, location of manufacturer, production series
>>>>> etc.) that can be stored in a single file and the file will be a
>>>>> provenance container.
>>>>>
>>>>>
>>>>>
>>>>> > Example: a list of height measurements is an "untransformed" entity (a
>>>>> > dataset); the average of that list is the "transformed" entity
>>>>> > (another dataset, although a very simple one).
>>>>>
>>>>>
>>>>> > I am dealing with much more complex workflows (e.g. files containing
>>>>> > the outcome of a microarray experiment as the untransformed dataset
>>>>> > and a list of differentially expressed genes as the transformed
>>>>> > dataset), so please take the example above as just illustrative.
>>>>>
>>>>>
>>>>>
>>>>> I am not sure I see the granularity/expressivity issue in the above
>>>>> example (from your first mail). Both the "untransformed" and
>>>>> "transformed" entities map to input and output data of a process
>>>>> execution - we can create subclasses of Entity for this purpose.
>>>>>
>>>>>
>>>>>
>>>>> > An investigator (agent) performs an experiment. That experiment has
>>>>> > several input parameters, some of which are entities (e.g. samples),
>>>>> > others are not (e.g. temperature). Resulting from the experiment are
>>>>> > several output parameters (entities).
>>>>>
>>>>> I am confused by the above scenario. Why is temperature not an entity?
>>>>> Both inputs (sample and temperature) are special types (subclasses)
>>>>> of entities - (a) InputData and (b) InputParameter etc.
>>>>>
>>>>>
>>>>>
>>>>> > So if I understand what you are saying correctly, "temperature" would
>>>>> > be an entity of type "input", which in turn would be a subclass of
>>>>> > "role". An instance of "input" could then have a certain value (e.g.
>>>>> > 15C) in one of its properties?
>>>>> >
>>>>> > In that case, does it make sense to include "input" and "output"
>>>>> > classes in the model as subclasses of "role"? Or is this something
>>>>> > that me and Stephan exemplify in the primer document under "usage of
>>>>> > agent" (or something of the sort)?
>>>>>
>>>>>
>>>>>
>>>>> I agree with Khalid's example where Role allows us to model more
>>>>> complex scenarios. For example, X is an instance of class HumanBeing
>>>>> (perhaps as subclass of entity) and X has multiple roles - researcher,
>>>>> parent, soccer player etc. To model these "functions" we will use the
>>>>> Role class. I believe in the microarray scenario (in your first mail)
>>>>> Roles are not needed.
>>>>>
>>>>>
>>>>>
>>>>> > In that case, does it make sense to include "input" and "output"
>>>>> > classes in the model as subclasses of "role"? Or is this something
>>>>> > that me and Stephan exemplify in the primer document under "usage of
>>>>> > agent" (or something of the sort)?
>>>>>
>>>>>
>>>>>
>>>>> Sorry I did not understand this. Role can be used by any entity, why
>>>>> only "usage of agent"?
>>>>>
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Satya
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena <helena.deus@deri.org> wrote:
>>>>>
>>>>> Hi Khalid,
>>>>>
>>>>> Please see comments inline
>>>>>
>>>>>
>>>>>
>>>>> *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk]
>>>>> *Sent:* 12 August 2011 10:22
>>>>> *To:* Deus, Helena
>>>>> *Cc:* public-prov-wg@w3.org
>>>>> *Subject:* Re: playing with pil ontology
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi Helena,
>>>>>
>>>>> Thanks for this, I think that this is a good exercise and some of the
>>>>> points you mentioned relate to the conceptual model, not only the
>>>>> formal model.
>>>>>
>>>>> On 11/08/2011 18:52, Deus, Helena wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Reiterating a bit on what was addressed today in the telco, I
>>>>> downloaded the ontology from mercurial and tried to use it with my use
>>>>> case.
>>>>>
>>>>> I am using the use cases published in [1] and demoed with SPARQL at
>>>>> http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
>>>>>
>>>>>
>>>>>
>>>>> Here is my input so far:
>>>>>
>>>>>
>>>>>
>>>>> Agent could have dataProperty "label" and "description"; it would help
>>>>> the implementer describe what type of agent he/she intends to
>>>>> describe. Is the ontology here being confused with the query model?
>>>>>
>>>>> I think that there was previously a long thread discussion on agent
>>>>> and agent types, and whether the model should be prescriptive in this
>>>>> respect. One of the solutions that I think many people were happy with
>>>>> is to let users choose their favorite model (ontology) for agent,
>>>>> which means that the agent class defined in the ontology acts as a
>>>>> place holder that can be specialized to include description, types,
>>>>> and whatever the application needs.
>>>>>
>>>>>
>>>>>
>>>>> I am not questioning whether agent should be mapped to agents defined
>>>>> elsewhere, which seems to be obvious - only wondering whether agent
>>>>> "label" and "description" are things we want to standardize in our
>>>>> model or not. We can "suggest" rdfs:label and rdfs:comment without
>>>>> enforcing it as such - having those included in the model will likely
>>>>> result in much less heterogeneity when it comes to reporting
>>>>> provenance (particularly since we are defining it necessarily "open"
>>>>> and highly granular to fit any particular domain).
>>>>>
>>>>>
>>>>>
>>>>> ProvenanceContainer is not useful, or its description is not clear;
>>>>> what should be an instance of provenanceContainer?
>>>>>
>>>>>
>>>>> At this stage, the description of this concept is not yet stable in
>>>>> the conceptual model as far as I know.
>>>>>
>>>>>
>>>>>
>>>>> What was its intended purpose/role in the description of provenance?
>>>>>
>>>>>
>>>>>
>>>>> I want to create an instance of an "untransformed" entity (in my case,
>>>>> a dataset) and a "transformed" entity. Is the model going to give me
>>>>> that granularity/expressivity or do we expect each implementer to come
>>>>> up with their own way of defining these?
>>>>>
>>>>> Could you please clarify what you mean by transformed and
>>>>> untransformed entity?
>>>>>
>>>>> Example: a list of height measurements is an "untransformed" entity (a
>>>>> dataset); the average of that list is the "transformed" entity
>>>>> (another dataset, although a very simple one).
>>>>>
>>>>>
>>>>>
>>>>> I am dealing with much more complex workflows (e.g. files containing
>>>>> the outcome of a microarray experiment as the untransformed dataset
>>>>> and a list of differentially expressed genes as the transformed
>>>>> dataset), so please take the example above as just illustrative.
>>>>>
>>>>>
>>>>>
>>>>> ProcessExecution needs more expressivity, I think. Not sure how to
>>>>> solve this in a domain independent way, but here's my problem:
>>>>>
>>>>> An investigator (agent) performs an experiment
>>>>>
>>>>> That experiment has several input parameters, some of which are
>>>>> entities (e.g. samples), others are not (e.g. temperature).
>>>>>
>>>>> Resulting from the experiment are several output parameters (entities)
>>>>>
>>>>>
>>>>> I think that the current model caters for the above need. If you are
>>>>> specifically trying to differentiate between different kinds of inputs
>>>>> (samples as opposed to temperature), then the notion of role can be
>>>>> helpful in this respect.
>>>>>
>>>>>
>>>>>
>>>>> So if I understand what you are saying correctly, "temperature" would
>>>>> be an entity of type "input", which in turn would be a subclass of
>>>>> "role". An instance of "input" could then have a certain value (e.g.
>>>>> 15C) in one of its properties?
>>>>>
>>>>> In that case, does it make sense to include "input" and "output"
>>>>> classes in the model as subclasses of "role"? Or is this something
>>>>> that me and Stephan exemplify in the primer document under "usage of
>>>>> agent" (or something of the sort)?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks, khalid
>>>>>
>>>>>
>>>>>
>>>>> Have not completed my "experiment" yet, but will provide more feedback
>>>>> soon :)
>>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Helena F. Deus
>>>>>
>>>>> Post-doctoral Researcher
>>>>> Digital Enterprise Research Institute
>>>>>
>>>>> National University of Ireland, Galway
>>>>>
>>>>> http://lenadeus.info
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>            
>>>
>>>        
>>
>>      
>
>
>
Received on Thursday, 25 August 2011 09:35:47 UTC