Re: PROV-ISSUE-46 (where-is-D-in-provenance): Where do I find document D in provenance [Accessing and Querying Provenance]

Paul Groth wrote:
> Hi Graham,
> 
> I think you identified the crux of the matter in this paragraph:
> 
>> The specific point raised about "where do I find a BOB assertion" is, I
>> think, a matter for the model.  If one has located and obtained the
>> provenance associated with a web resource, it's not the job of the access
>> mechanism to figure out what it actually describes.  Personally, I think
>> the whole sideshow about BOBs and suchlike is just a big unnecessary
>> distraction, but the point about being clear about what is described by
>> provenance remains important.
> 
> As far as I understand, you mean that the PAQ just tells you where to find 
> some provenance information associated with a particular URL, where the 
> association is rather loose, correct? Then you have to use the model to 
> tease apart what the provenance information is "actually" about.

Yes.  But I should temper that by noting that this is not inevitable; it is a 
design choice.  In the web context, with its focus on a RESTful approach to 
application design, I think it is the appropriate choice.

> For example, suppose several versions of the same document appear at the 
> same URL (e.g. the New York Times homepage gets updated). To handle that, 
> I would just say that the provenance associated with this URL is over 
> here. Then I would look at the provenance information and, knowing the 
> model, work out what the current version is and its provenance... or if 
> it's really provenance about the ads on the page, I would inspect the 
> provenance information some more and be able to figure that out.
> 
> Is that correct?

Broadly, that is a consistent approach, and I think it's a reasonable design 
choice.  Alternatively, if there is a "permalink" for the NYT homepage for 
today, then use that as the resource key for accessing provenance.  But, in any 
case, I think it would be a good design choice if the provenance itself 
(assuming RDF here) uses the more specific URI, and maybe also includes a 
statement to the effect that the URI used refers to a constrained form (an 
"IVP") of the resource identified by the "today" URI.
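
As a rough illustration of the permalink case, here is a minimal sketch that 
states provenance about the more specific URI and adds one statement linking it 
back to the "today" URI.  Everything named here is an assumption made for the 
example: the example.org URIs, the "editedBy" and "constrainedFormOf" 
properties, and the helper functions are hypothetical and not taken from any 
agreed vocabulary.

```python
# Sketch only: all URIs and the vocabulary below are hypothetical placeholders.
EX = "http://example.org/vocab#"
TODAY = "http://example.org/news/"                 # the "today" URI
PERMALINK = "http://example.org/news/2011-07-28"   # date-specific permalink


def provenance_triples(specific, general, editor):
    """Return (subject, predicate, object) triples describing `specific`."""
    return [
        # Provenance is stated about the more specific URI...
        (specific, EX + "editedBy", editor),
        # ...plus one statement tying it back to the general resource.
        (specific, EX + "constrainedFormOf", general),
    ]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


triples = provenance_triples(
    PERMALINK, TODAY, "http://example.org/staff/editor1"
)
print(to_ntriples(triples))
```

A consumer that fetched this via the "today" URI can still tell, from the 
statements themselves, which specific resource is being described.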

If there's no "permalink", then in the provenance RDF one might introduce a 
blank node (or constructed URI) for the date-specific homepage and add the 
additional information through appropriate additional RDF statements.  (This is 
part of the power we gain by using RDF rather than a provenance-specific 
metadata format.)
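
The no-permalink case might be sketched as follows, with a blank node standing 
in for the date-specific homepage.  Again this is only an illustration under 
assumptions: the "constrainedFormOf" and "validOn" properties, the example.org 
URIs, and the helper names are all hypothetical.

```python
import itertools

# Sketch only: the vocabulary and URIs below are hypothetical placeholders.
EX = "http://example.org/vocab#"
TODAY = "http://example.org/news/"   # the "today" URI; no permalink exists

_counter = itertools.count(1)


def new_bnode():
    """Mint a fresh blank node label such as '_:b1'."""
    return "_:b%d" % next(_counter)


def fmt(term):
    """Format a term for N-Triples: blank node, URI, or plain literal."""
    if term.startswith("_:"):
        return term
    if term.startswith(("http://", "https://")):
        return "<%s>" % term
    return '"%s"' % term.replace("\\", "\\\\").replace('"', '\\"')


def describe_snapshot(general, date):
    """Describe a date-specific form of `general` using a blank node."""
    node = new_bnode()
    return node, [
        (node, EX + "constrainedFormOf", general),
        (node, EX + "validOn", date),
    ]


snapshot, triples = describe_snapshot(TODAY, "2011-07-28")
for s, p, o in triples:
    print(fmt(s), fmt(p), fmt(o), ".")
```

Further provenance statements would then use the blank node as their subject, 
just as they would use a permalink if one existed.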

If it's about the ads on the page, the same broadly applies, but it would 
probably be better (IMO) if each ad already had its own URI.  In any case, we 
need to deal with real resources, so I think these design choices should be 
left open.

#g

Received on Thursday, 28 July 2011 11:35:46 UTC