Re: PROV-ISSUE-1 (define-resource): Definition for concept 'Resource' [Provenance Terminology]

I have a problem with resource-as-container.  I think it's too constraining.  My 
zebra example wouldn't comply.

As for the distinction between f1 and r1 per your example, I think this is 
rather broadening the discussion - which I'm not sure is necessary or helpful.

I would say that in this case, r1 is a service resource.  And as such, I don't 
think it makes sense to download a service.  E.g. what do you receive if you do 
a simple HTTP GET on a SPARQL endpoint URI?  I think it's typically some kind of 
intro page that explains how to use the service (e.g. 
http://data.clarosnet.org/sparql/).  The URIs that may be used to download 
*content* from the triple store are different (e.g. URI-encoded SPARQL queries, 
or constructed LDAPI URIs).
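
To make the distinction concrete, here is a minimal sketch in Python (nothing 
is actually fetched).  It assumes the endpoint accepts queries through the 
standard SPARQL Protocol "query" parameter; the CONSTRUCT query and the helper 
name query_uri are only illustrative, not anything the portal is known to 
support.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint URI from the message above; this sketch only builds URIs.
ENDPOINT = "http://data.clarosnet.org/sparql/"

# Illustrative query (an assumption, not taken from the scenario).
QUERY = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10"


def query_uri(endpoint: str, sparql: str) -> str:
    """Build a GET URI that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


# The service resource: a GET here typically yields an intro page.
service_uri = ENDPOINT

# A content URI: a GET here would ask the store for actual triples.
content_uri = query_uri(ENDPOINT, QUERY)

# Decoding the URI again recovers the query that was encoded into it.
decoded_query = parse_qs(urlsplit(content_uri).query)["query"][0]
```

The point is only that service_uri and content_uri are different URIs, so 
"downloading r1" needs to say which of them is being dereferenced.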

So, for the purposes of this example, we need to be clearer about what we mean 
when saying "analyst (alice) downloads a turtle serialization (lcp1) of the 
resource (r1) from government portal" - in this context, I don't think it makes 
sense as it stands.

I also note that once you introduce a triple store into the mix, while we can 
expect it to contain information that has been loaded into it, when retrieving 
information, we have no a priori way to claim that the information subsequently 
retrieved has to do with the original resource.  The best we can say is that if 
the entire *content* of the resource "r1" is downloaded, then that content 
should contain as a subset the RDF that was loaded.  But even this isn't 
clear-cut - if the triple store supports named graphs (which most do), then 
there's no way to represent its entire content in a single Turtle download.
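
As a rough illustration of the subset claim, and of what a single Turtle 
download loses, here is a toy model in Python using plain sets.  The ex: names, 
the graph names and the flatten helper are all made up for the example; a real 
triple store is of course not a Python set.

```python
# A triple is (subject, predicate, object); a quad adds a graph name.
# All identifiers below are hypothetical.
loaded = {("ex:d1", "ex:reports", "ex:incident42")}

# Assumed store content: the loaded triple plus data from another source,
# each held in its own named graph.
store = {
    ("ex:d1", "ex:reports", "ex:incident42", "ex:graphA"),
    ("ex:d2", "ex:reports", "ex:incident43", "ex:graphB"),
}


def flatten(quads):
    """Drop graph names, as a single Turtle document would have to."""
    return {(s, p, o) for (s, p, o, _graph) in quads}


downloaded = flatten(store)

# The best we can say: what was loaded is a subset of what comes back.
loaded_is_subset = loaded <= downloaded

# The download contains more than was loaded ...
extra = downloaded - loaded

# ... and the graph names are not recoverable from the flattened download.
graph_names = {graph for (_s, _p, _o, graph) in store}
```

Nothing in downloaded says which triples came from f1, which is why I don't 
think the download can be tied back to the original resource a priori.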

In summary, I think the introduction of containers and triple stores is mixing 
mechanism with essential provenance concepts here, and I think we need to get 
the latter straight before we can explain what happens when more complex 
mechanisms are introduced.  The scenario as described could play out perfectly 
well without mention of a triple store.

#g
--


Luc Moreau wrote:
> Hi Paul,
> Yesterday, I also began drafting some definition. We need 
> representations in here too. I am not sure about
> your illustrations.  Here is my take on it:
> 
> 
> 
> 
>  From a provenance viewpoint, we seem to discuss several concepts
> related to resources.  Some terminology is required to disambiguate
> concepts.  It is inspired by terminology developed by the rdf working
> group (thanks to Sandro for drafting the original email!)
> 
> 
> 1. A "resource" is a container, whose contents may vary over time.
>    Its content may be structured in many different ways (hierarchical
>    XML tree, RDF arcs, etc).
> 
> 2. A "r-snapshot" is a state of a resource, or a snapshot of that
>    resource at a specific instant.  A r-snapshot is immutable. From a
>    resource that changes over time, one can obtain multiple
>    r-snapshots.
> 
> 3. A "r-text" is a particular sequence of characters or bytes which
>    conveys a particular r-snapshot in some language.  If you can parse
>    a r-text, you know what is in the r-snapshot it conveys.  You can
>    tell someone exactly what is in a particular resource at some
>    instant by sending them a r-text.  (You send them the r-text which
>    conveys the r-snapshot which is the current state of that resource.)
> 
> 
> 
> In some cases, some resources do not vary over time, which means that
> there is a single r-snapshot for them, and some may even have a single
> r-text (no content negotiation).  In such a specific case (static
> resources on the web), the three concepts conflate into a single one.
> 
> The challenge is to deal with dynamic contents.
> 
> 
> 
> Illustration inspired by the example.
> 
> - government (gov) converts data (d1) to RDF file (f1) at time (t1) 
> using XSLT transform
> - government (gov) uploads RDF data (f1) into a triple store, exposed 
> as  Web resource (r1)
> - analyst (alice) downloads a turtle serialization (lcp1) of the 
> resource (r1) from government portal
> 
> Illustrations:
> - r1: is a resource: it's the triple store, it's a container, its content 
> can vary over time
> - lcp1: is a r-text (turtle serialization) of a given snapshot (created 
> by, or available at the time of, download)
> - f1 is a local file: it can be seen as a stateless anonymous resource, 
> with a single r-text.
> 
> If in addition:
> - analyst (alice) downloads a rdf/xml serialization (lcp2) of the 
> resource (r1)
> 
> If the content of r1 has not changed, then lcp2 and lcp1 are both 
> r-texts of a same r-snapshot.
> 
> Note that this is not limited to RDF (as Graham mentioned)
> 
> - newspaper (news), uses a CMS to publish the incidence map (map1), 
> chart (c1) and
>   the image (img1) within a document (art1) written by (joe) using
>   license (li2)
> - newspaper (news), updates art1, adding a correction following a 
> complaint from a reader
> 
> Illustrations:
> - art1 is also a resource, with two r-snapshots (before and after 
> correction)
> - with content negotiation, an http client can download html and xhtml 
> representations (i.e., r-texts) of the article
> 
> 
> 
> What do you think?
> Cheers,
> Luc
> 
> 
> On 05/25/2011 06:49 AM, Paul Groth wrote:
>> Hi,
>>
>> To throw out some, perhaps simpler, definitions into the mix that I 
>> think follow along the lines of what's being discussed.
>>
>> Resource - something that can be identified
>>
>> Snapshot - the state of a resource at particular point in time
>>
>> In the Data Journalism Scenario: a 'resource' would be the web page. a 
>> 'snapshot' would be the web page before publication.
>>
>> cheers,
>> Paul
>>
>> Note: Similar concepts are found within many provenance models that I 
>> know of....if it's helpful I can list those out
>>
> 

Received on Wednesday, 25 May 2011 10:54:49 UTC