Re: PROV-ISSUE-198: Section 6.1 (PROV-DM as on Dec 5) [prov-dm]

Hi Satya,

This is one of the issues I had overlooked in my backlog. It is relevant now because I am revising the Collections section.

I am strongly in favour of introducing constructs that provide a way to track the provenance of a data structure. There is ample
evidence that tracking the evolution of data /along with its data container/ is useful. Indeed, the initial proposal was more
ambitious and tried to capture operations on ordered trees. It was then revised "down" to a simple data structure.
   I will argue that this is not at all domain-specific: the notion of a "data container" or data structure is not a domain; rather,
it is an integral part of what data provenance is about. I felt that support for expressing the connection between data elements and
the data structures that contain them was missing from the OPM, and at the time we devised extensions to deal with it.
  So to me the question is not whether there is a place in PROV for constructs that track the provenance of a data structure, but rather
what is the most general data structure whose provenance we can capture in a simple way. In the end, we settled on a minimal
structure, namely sets of key-value pairs, which is what the current proposal is about. But it's important to clarify whether there
is agreement on going forward with it.
If so, we are going to edit the current version based on the other issues that have been raised (135-139).
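
To make the "sets of key-value pairs" idea concrete, here is a minimal Python sketch of the three operations quoted below (insertion, removal, selection). It is only an illustration, not PROV-DM syntax: the names Collection, insert, remove and select, and the shape of the record tuples, are assumptions made up for this example. The point it tries to show is that each update yields a /new/ collection entity and leaves the old one untouched, so the link between the two can be recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical record shape: (operation, ...identifiers). Not PROV-DM notation.
Record = Tuple[str, ...]


@dataclass(frozen=True)
class Collection:
    """A collection entity: an identifier plus a set of key-value pairs."""
    ident: str
    pairs: Dict[str, str] = field(default_factory=dict)  # key -> entity identifier


def insert(c: Collection, new_ident: str, key: str, entity: str) -> Tuple[Collection, Record]:
    """Return a new collection obtained from c by adding entity under key."""
    pairs = dict(c.pairs)  # copy, so c itself is never modified
    pairs[key] = entity
    return Collection(new_ident, pairs), ("insertion", new_ident, c.ident, key, entity)


def remove(c: Collection, new_ident: str, key: str) -> Tuple[Collection, Record]:
    """Return a new collection obtained from c by removing the entity under key."""
    pairs = dict(c.pairs)
    entity = pairs.pop(key)  # raises KeyError if key is not in c
    return Collection(new_ident, pairs), ("removal", new_ident, c.ident, key, entity)


def select(c: Collection, key: str) -> Tuple[str, Record]:
    """Return the entity selected from c using key, plus a record of the selection."""
    entity = c.pairs[key]
    return entity, ("selection", entity, c.ident, key)


c1 = Collection("c1")
c2, r_insert = insert(c1, "c2", "k1", "e1")
c3, r_remove = remove(c2, "c3", "k1")
e, r_select = select(c2, "k1")
```

Here c1, c2 and c3 are three distinct collection entities, and each record names both the new collection and the one it was obtained from, together with the key and entity involved.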

--Paolo



On 12/7/11 2:19 AM, Provenance Working Group Issue Tracker wrote:
> PROV-ISSUE-198: Section 6.1 (PROV-DM as on Dec 5) [prov-dm]
>
> http://www.w3.org/2011/prov/track/issues/198
>
> Raised by: Satya Sahoo
> On product: prov-dm
>
> Hi,
> The following are my comments for Section 6.1 of the PROV-DM (as on Dec 5):
>
> Section 6.1
> 1. "The relations introduced here are all specializations of the wasDerivedFrom relation, specifically precise-1 or imprecise-1. They are designed to model:
> * insertion: a collection entity c' is obtained from collection entity c, by adding entity e having key k to c;
> * removal: a collection entity c' is obtained from collection entity c, by removing entity e having key k from c;
> * selection: an entity e was selected from collection c using key k."
>
> Comment: The relevance of the Collection and these related properties in PROV-DM is not clear. I am not sure why indexing structures should be part of the Data Model. In addition, the above list has highly domain-specific methods and should be either removed completely or removed to Best Practices document if needed. For example, one can make the case for modeling wasAddedTo_Agent, wasRemovedFrom_Entity, wasModifiedIn_Entity etc.
>
> 2. "Record: wasAddedTo_Coll(c2,c1) (resp. wasRemovedFrom_Coll(c2,c1)) denotes that collection c2 is an updated version of collection c1, following an insertion (resp. deletion) operation."
>
> Comment: Why can't this be expressed using "wasDerivedFrom" or revision?
>
> Thanks.
>
> Best,
> Satya
>
>
>


-- 
-----------  ~oo~  --------------
Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
School of Computing Science, Newcastle University,  UK
http://www.cs.ncl.ac.uk/people/Paolo.Missier

Received on Thursday, 22 December 2011 10:20:03 UTC