Re: actions related to collections

Luc,

On Apr 18, 2012, at 4:19 PM, Luc Moreau wrote:

> Dear all,
> 
> I just wanted to throw a few ideas/questions to defend collections as they currently are.
> 
> 1. prov:Collection is similar to rdfs:Container [1] :
> the properties rdf:_1, rdf:_2, ...[2]  map naturally to keys in prov:Collection.

I don't see how these map. 
In prov:Collection, keys have values chosen by the user -- rdfs:Container imposes the rdf:_N "value" for the "key".
rdfs:Container doesn't support keys.
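
A minimal Python sketch of that difference, using plain dictionaries as stand-ins (this is not an RDF or PROV API; the function name `container_membership` and the `ex:` identifiers are hypothetical and chosen only for illustration):

```python
# Illustrative only: plain Python stand-ins, not an RDF or PROV library.

def container_membership(members):
    """Model an rdfs:Container: the 'keys' are imposed as rdf:_1, rdf:_2, ..."""
    return {f"rdf:_{i}": m for i, m in enumerate(members, start=1)}

# With a container, the asserter has no say in what the "keys" are.
container = container_membership(["ex:a", "ex:b"])

# With the current prov:Collection, the asserter chooses the keys.
collection = {"ex:inputFile": "ex:a", "ex:threshold": "ex:b"}
```

The container can only say "first member, second member"; the key-value structure can say what role each member plays.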

I think there is consensus that prov:Collection as it stands is _more_ than set membership.
I argue that this more expressive construct is incredibly useful but misleadingly named.

> 
> 2. RDF collections [3] can also be described by prov:Collection, using rdf:first and rdf:rest
>     as keys for a collection of two elements, and allowing nesting of collections.

Although it's true that one can reproduce an rdf:List using the current definition of prov:Collection, 
I'm not sure this provides "nesting" in any useful form.
It also shows how prov:Collection is a more general construct than rdf:List.
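
A rough sketch of that encoding, assuming nested Python dictionaries stand in for two-element collections keyed by rdf:first and rdf:rest (the helper names `as_first_rest` and `from_first_rest` are hypothetical):

```python
# Illustrative only: nested dictionaries standing in for two-key collections.
RDF_NIL = "rdf:nil"

def as_first_rest(items):
    """Encode a sequence as nested two-key dictionaries (rdf:first / rdf:rest)."""
    node = RDF_NIL
    for item in reversed(items):
        node = {"rdf:first": item, "rdf:rest": node}
    return node

def from_first_rest(node):
    """Walk the nested dictionaries back into a flat Python list."""
    items = []
    while node != RDF_NIL:
        items.append(node["rdf:first"])
        node = node["rdf:rest"]
    return items

encoded = as_first_rest(["ex:a", "ex:b"])
```

Every list cell is just a dictionary that happens to use two fixed keys, which is why the key-value construct is the more general of the two.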


> 
> So a few questions:
> 
> 1. Is it being suggested that rdfs:Container and rdf:List are not appropriate, and we
>     should look at other forms of "collections"?


I'm suggesting we rename "collection" to "dictionary". The confusion arises when people read the prov:Collection definitions as if they describe set membership, which the construct is not optimized for.
The capabilities that it _is_ optimized for are very useful, should stay, and will be used heavily, but the construct should be renamed to something less misleading.


> 
> 2. Has the prov-o ontology encoded prov-dm collections in a way that is lightweight enough?
>     Could we for instance restrict the keys to be mapped to  properties such as rdf:_1, rdf:_2?

I'm not sure why we want to contort the eloquence of the Dictionary into something that is less expressive (rdfs:Container), and which has been disregarded for practical uses during the decade that it has been available.


> 
> 

> I however acknowledge that prov:Collection is not a "natural" way to model a set.

prov:Dictionary!


> I suppose that
> like  "rdf:Bag class is used conventionally to indicate to a human reader that the container is intended to be unordered",
> we would need a similar notion for expressing sets with prov:Collection.

We should leave modeling sets to SIOC and RDFS and focus on giving the community something that it doesn't have -- a construct that lets us encode the provenance of function calls with multiple inputs and multiple outputs.

We don't have a set membership construct and we shouldn't encourage people to misuse a dictionary to model a set.
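
To illustrate both points, here is a small sketch under stated assumptions: plain Python dictionaries and sets stand in for the constructs, and the names `call_record`, `ex:align`, and the other `ex:` identifiers are hypothetical examples, not part of PROV-DM or PROV-O.

```python
# Illustrative only: not a PROV API.

def call_record(function_name, inputs, outputs):
    """Record one function call: argument names are keys, bound entities are values."""
    return {
        "function": function_name,
        "inputs": dict(inputs),
        "outputs": dict(outputs),
    }

# Multiple named inputs and multiple named outputs fit a dictionary naturally.
record = call_record(
    "ex:align",
    inputs={"reference": "ex:genome_v1", "reads": "ex:sample_42"},
    outputs={"alignment": "ex:result_bam", "log": "ex:align_log"},
)

# A set has members but no keys; forcing it into a dictionary means
# inventing keys that carry no meaning.
cohort = {"ex:patient1", "ex:patient2"}
cohort_as_dictionary = {f"key{i}": p for i, p in enumerate(sorted(cohort))}
```

The keys in `record` say something about each entity's role in the call; the keys in `cohort_as_dictionary` say nothing, which is the misuse to avoid.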


-Tim


> 
> Cheers,
> Luc
> 
> [1] http://www.w3.org/TR/rdf-schema/#ch_container
> [2] http://www.w3.org/TR/rdf-schema/#ch_containermembershipproperty
> [3] http://www.w3.org/TR/rdf-schema/#ch_collectionvocab
> 
> 
> On 18/04/12 19:39, Stephan Zednik wrote:
>> 
>> 
>> On Apr 18, 2012, at 12:24 PM, Timothy Lebo wrote:
>> 
>>> I've had similar concerns that the definitions for collections are "too heavyweight" to manage the membership of sets.
>>> 
>>> But ignoring its name and looking at the modeling construct it provides, it's clear that this construct will be very useful in many real provenance problems (for example, the ubiquitous need for provenance of function calls with their argument names and bindings).
>>> 
>>> Perhaps we can avoid the "too heavyweight for set membership" concerns raised by Satya and Jun by renaming what we have (prov:Collection) to something more appropriate, like prov:Dictionary?
>> 
>> +1
>> 
>> Jim is right that you can model collections with enumerated classes, but I am not sure about stating the provenance of a collection defined by an enumerated class.
>> 
>> We could also define a much simpler prov:Collection class that does not force map/dictionary conventions to go along with prov:Dictionary.
>> 
>> --Stephan
>> 
>>> 
>>> -Tim
>>> 
>>> On Apr 18, 2012, at 2:12 PM, Jim McCusker wrote:
>>> 
>>>> I think a set of key-value pairs is what's known as a map or dictionary. A collection is a set of things with a defined membership. In OWL it would probably be represented as an enumerated class.
>>>> 
>>>> Jim
>>>> 
>>>> On Wed, Apr 18, 2012 at 1:20 PM, Jun Zhao <jun.zhao@zoo.ox.ac.uk> wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> I concur with what Satya wrote. The example I had in mind is collection-type entities in the blogosphere.
>>>> 
>>>> As we all know SIOC is a widely used vocabulary to describe entities in the online community sites, like blogs, wikis, etc. It has the concept of sioc:Container, which is defined as "a high-level concept used to group content Items together". The relationships between a sioc:Container and the sioc:Items or sioc:Posts that belong to it are described using sioc:container_of and sioc:has_container properties.
>>>> 
>>>> The provenance of a sioc:Container could be who is/are responsible for the container, who created this container, and when.
>>>> 
>>>> The provenance of a sioc:Post could include when the post was published, when it was modified, by whom, and based on which other posts, documents, or data.
>>>> 
>>>> As you see, I am struggling to see what role a key-value pair structure could play in the simple scenario above. But please correct me if I am wrong.
>>>> 
>>>> HTH,
>>>> 
>>>> Jun
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 18/04/2012 18:35, Satya Sahoo wrote:
>>>> Hi all,
>>>> The issue I had raised last week is that collection is an important
>>>> provenance construct, but the assumption of only key-value pair based
>>>> collection is too narrow, and the relations derivedByInsertionFrom and
>>>> derivedByRemovalFrom are over-specifications that are not required.
>>>> 
>>>> I have collected the following examples for collection, which only require
>>>> the definition of the collection in DM5 (collection of entities) and they
>>>> don't have (a) a key-value structure, and (b) derivedByInsertionFrom,
>>>> derivedByRemovalFrom relations are not needed:
>>>> 1. Cell line is a collection of cells used in many biomedical experiments.
>>>> The provenance of the cell line (as a collection) includes: who submitted
>>>> the cell line, what method was used to authenticate the cell line, and when
>>>> the given cell line was contaminated. The provenance of the cells in a cell
>>>> line includes the source of the cells (e.g. organism).
>>>> 
>>>> 2. A patient cohort is a collection of patients satisfying some constraints
>>>> for a research study. The provenance of the cohort includes what
>>>> eligibility criteria were used to identify the cohort and when the cohort
>>>> was identified. The provenance of the patients in a cohort may include their
>>>> health provider, etc.
>>>> 
>>>> Hope this helps our discussion.
>>>> 
>>>> Thanks.
>>>> 
>>>> Best,
>>>> Satya
>>>> 
>>>> 
>>>> On Thu, Apr 12, 2012 at 5:06 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>> 
>>>> 
>>>> Hi Jun and Satya,
>>>> 
>>>> Following today's call, ACTION-76 [1] and ACTION-77 [2] were raised
>>>> against you, as we agreed.
>>>> 
>>>> Cheers,
>>>> Luc
>>>> 
>>>> [1] https://www.w3.org/2011/prov/track/actions/76
>>>> [2] https://www.w3.org/2011/prov/track/actions/77
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jim McCusker
>>>> Programmer Analyst
>>>> Krauthammer Lab, Pathology Informatics
>>>> Yale School of Medicine
>>>> james.mccusker@yale.edu | (203) 785-6330
>>>> http://krauthammerlab.med.yale.edu
>>>> 
>>>> PhD Student
>>>> Tetherless World Constellation
>>>> Rensselaer Polytechnic Institute
>>>> mccusj@cs.rpi.edu
>>>> http://tw.rpi.edu
>>> 
>> 

Received on Wednesday, 18 April 2012 22:09:10 UTC