Re: PROV-ISSUE-46 (where-is-D-in-provenance): Where do I find document D in provenance [Accessing and Querying Provenance]

Paul Groth wrote:
> Thanks for explaining. I understand the design approach now.
> 
> Another design choice could be to use a specific "resource key" for 
> accessing provenance. That is that there is a tight binding between the 
> url and the provenance. Essentially, you can only connect provenance to 
> a url that refers to a BOB. (e.g. provenance can only be associated with 
> a permalink). This means that you always know what the provenance is 
> referring to.
> 
> This makes the semantics of the connection between provenance and a url 
> clearer. But I guess this has some downsides as well...

Allowing that it's possible for a resource to be its own "BOB" (to the extent
that it has some invariant features), that seems a natural way to proceed.
I see it less as an option than as the only meaningful way to apply provenance
(recalling that I consider "BOB" as a class to be an irrelevance).

But I think it doesn't address the problem I thought we were discussing.  If 
provenance assertions refer to some resource, they are only meaningful to the 
extent that the resource is invariant with respect to those assertions.

Some interesting/useful entailments should fall out of this:

   B isIVPOf A .
   A hasProvenance P .
|=
   B hasProvenance P .

(If one insists on BOB as a distinct class, this doesn't work.)
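
As a minimal sketch of how that entailment could be applied mechanically,
here is some plain Python over a set of (subject, predicate, object) triples.
The property names simply mirror the notation above, and the resource names
in the example are made up for illustration; none of this is part of any
agreed vocabulary.

```python
# Property names mirror the notation used in this message; they are
# illustrative only and not taken from any agreed vocabulary.
IS_IVP_OF = "isIVPOf"
HAS_PROVENANCE = "hasProvenance"


def propagate_provenance(triples):
    """Close a set of triples under the rule:

        B isIVPOf A .  A hasProvenance P .  |=  B hasProvenance P .
    """
    closed = set(triples)
    while True:
        inferred = {
            (b, HAS_PROVENANCE, p)
            for (b, rel, a) in closed
            if rel == IS_IVP_OF
            for (a2, rel2, p) in closed
            if rel2 == HAS_PROVENANCE and a2 == a
        }
        if inferred <= closed:
            # Nothing new was derived, so a fixed point has been reached.
            return closed
        closed |= inferred


# Hypothetical resource names, loosely based on the NYT homepage example.
graph = {
    ("nyt-home-2011-07-28", IS_IVP_OF, "nyt-home"),
    ("nyt-home", HAS_PROVENANCE, "prov-doc"),
}
closed = propagate_provenance(graph)
assert ("nyt-home-2011-07-28", HAS_PROVENANCE, "prov-doc") in closed
```

The loop repeats until no new triples appear, so a chain of IVP relations
(C isIVPOf B, B isIVPOf A) also passes the provenance along to C.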

#g
--

> Graham Klyne wrote:
>> Paul Groth wrote:
>>> Hi Graham,
>>>
>>> I think you identified the crux of the matter in this paragraph:
>>>
>>>> The specific point raised about "where do I find a BOB assertion" is,
>>>> I think, a matter for the model.  If one has located and obtained the
>>>> provenance associated with a web resource, it's not the job of the
>>>> access mechanism to figure out what it actually describes.  Personally,
>>>> I think the whole sideshow about BOBs and suchlike is just a big
>>>> unnecessary distraction, but the point about being clear about what is
>>>> described by provenance remains important.
>>> As far as I understand, you mean the PAQ just tells you where to find
>>> some provenance information associated with a particular URL, where the
>>> association is rather loose, correct? Then you have to use the model to
>>> tease apart what the provenance information "actually" is about.
>>
>> Yes.  But I should temper that by noting that this is not inevitable,
>> but it does represent a design choice.  In the web context, with its
>> focus on a RESTful approach to application design, I think it is the
>> appropriate choice.
>>
>>> For example, suppose several versions of the same document appear at the
>>> same URL (e.g. the New York Times homepage gets updated). Then I would
>>> just say that the provenance associated with this URL is over here. I
>>> would then look at the provenance information and, by knowing the model,
>>> know what the current version is and its provenance... or if it's really
>>> provenance about the ads on the page, I would inspect the provenance
>>> information some more and be able to figure that out.
>>>
>>> Is that correct?
>>
>> Broadly, that is a consistent approach, and I think it's a reasonable
>> design choice.  Alternatively, if there is a "permalink" for the NYT
>> homepage for today, then use that as the resource key for accessing
>> provenance.  But, in any case, I think it would be a good design choice
>> if the provenance itself (assuming RDF here) uses the more specific URI,
>> and maybe also includes a statement to the effect that the URI used
>> refers to a constrained form (an "IVP") of the resource identified by
>> the "today" URI.
>>
>> If there's no "permalink", then in the provenance RDF one might
>> introduce a blank node (or constructed URI) for the date-specific
>> homepage and add the additional information through appropriate
>> additional RDF.  (This is part of the power we gain by using RDF rather
>> than a provenance-specific metadata format.)
>>
>> If it's about the ads on the page, the same broadly applies, but it
>> would probably be better (IMO) if each ad already has its own URI.  In
>> any case, we need to deal with real resources, so I think these design
>> choices should be left open.
>>
>> #g
>>
>>
> 

Received on Thursday, 28 July 2011 13:35:55 UTC