Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]

We discussed this in the teleconference and it sounded like it would be appropriate to find better terminology for the following three things, which are currently not clearly distinguished:

- "the whole PROV instance, including the set of toplevel statements and any bundles"
- "a particular set of statements, either the toplevel one or one within a bundle"
- bundle = "a named set of provenance statements"

My initial proposal is "PROV dataset", "PROV instance", and "bundle", respectively. I believe "PROV dataset" is roughly analogous to what people call a "dataset" in the context of SPARQL; if anyone knows otherwise (or has objections or better suggestions), let me know.
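
As a rough illustration of the proposed three-way distinction, the sketch below models it in Python. All names here (Instance, Dataset, check_dataset, is_valid) are hypothetical and not taken from any PROV document, and statements are reduced to plain strings. It only shows the shape of the idea: a dataset holds one toplevel instance plus named instances (bundles), and each instance is checked on its own, as the section 6.1 text quoted below describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

# Hypothetical model: a statement is just a string in this sketch.
# "PROV instance": one particular set of statements.
Instance = FrozenSet[str]


@dataclass
class Dataset:
    """'PROV dataset': the toplevel instance plus any named bundles."""
    toplevel: Instance = frozenset()
    # "bundle": a named instance, keyed here by its identifier.
    bundles: Dict[str, Instance] = field(default_factory=dict)

    def instances(self) -> List[Instance]:
        """Return every instance in the dataset, toplevel first."""
        return [self.toplevel, *self.bundles.values()]


def check_dataset(ds: Dataset, is_valid: Callable[[Instance], bool]) -> bool:
    """Apply a caller-supplied validity check to each instance independently."""
    return all(is_valid(inst) for inst in ds.instances())


ds = Dataset(
    toplevel=frozenset({"entity(e1)"}),
    bundles={"b1": frozenset({"activity(a1)"})},
)
```

The validity check is passed in as a parameter because the sketch makes no attempt to implement the actual PROV constraints; it only shows that no instance sees the statements of another.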

I'll send another message on this when this is ready for review.

--James

On Aug 9, 2012, at 3:45 PM, Provenance Working Group Issue Tracker wrote:

> PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
> 
> http://www.w3.org/2011/prov/track/issues/474
> 
> Raised by: Simon Miles
> On product: prov-dm-constraints
> 
> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
> 
> My original comment:
>> Bundles
>> -------
>> F. Section 6.1 seems a bit out of the blue. "The definitions
>> [etc.]... assume a PROV instance with exactly one bundle", and then
>> multiple bundles are handled as exactly the same number of
>> instances. Why? Why is there a connection between number of instances
>> and number of bundles? Why would a bundle be considered to be only one
>> instance? I thought a bundle was an identified set of statements,
>> allowing for provenance of provenance, which seems a distinct matter
>> from whether a set of statements are valid. It seems fine for a user
>> to treat one bundle as one instance if they want to, but there's no
>> reason given why this is the general case. 
> 
> Response from editors:
>> I am not sure I understand this comment.  However, I have rewritten
>> slightly the intro of section 6.1.
>> 
>> "The definitions, inferences, and constraints, and the resulting notions of normalization, validity and equivalence, assume a PROV instance that consists of exactly one bundle, the toplevel bundle, containing all PROV statements in the top level of the bundle (that is, not enclosed in a named bundle). In this section, we describe how to deal with PROV instances consisting of multiple named bundles. Briefly, each bundle is handled independently; there is no interaction between bundles from the perspective of applying definitions, inferences, or constraints, computing normal forms, or checking validity or equivalence."
> 
> I agree this is clearer, but I don't feel it answers the key questions in my comment. To put my comment another way: you have explained checking validity where an instance consists of one bundle and of multiple bundles. The two other possibilities I see are:
> (a) A bundle containing multiple instances;
> (b) An instance that is a collection of PROV descriptions with no identifier and so is not a bundle, e.g. a provenance service query result.
> 
> How do we deal with each of these cases? Or, if they cannot occur, why not?
> 
> Thanks,
> Simon


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Received on Thursday, 9 August 2012 16:21:45 UTC