Re: PROV-ISSUE-1 (define-resource): Definition for concept 'Resource' [Provenance Terminology]

Nothing in the example is restricted to RDF or triple stores.
It also applies to a table in a relational database (and its XML
serialization), or an Excel spreadsheet (and a CSV representation).

The relational database/table and the spreadsheet can be seen as
containers, since they can be updated.

The reason why this is important is that we need to consider stateful
resources (well, I think so, don't you?).

An alternative way of looking at it, adopting some old programming language
terminology, is this:

a resource is like an l-value
a snapshot is like an r-value
an r-text is like a representation of an r-value
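
To make the analogy a bit more concrete, here is a minimal Python sketch.
It is only an illustration under assumptions: the names Resource, snapshot,
to_csv and to_json, and the sample rows, are made up here and are not part
of any proposed terminology.

```python
import csv
import io
import json


class Resource:
    """Hypothetical stateful resource (the 'l-value'): an updatable container."""

    def __init__(self, rows):
        self._rows = [tuple(r) for r in rows]

    def update(self, row):
        # Updating the container does not affect snapshots taken earlier.
        self._rows.append(tuple(row))

    def snapshot(self):
        # The 'r-value': an immutable copy of the current state.
        return tuple(self._rows)


def to_csv(snap):
    """One r-text: a CSV serialization of a snapshot."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(snap)
    return buf.getvalue()


def to_json(snap):
    """Another r-text of the same snapshot, in a different language."""
    return json.dumps([list(row) for row in snap])


# Made-up sample data standing in for a spreadsheet.
sheet = Resource([("city", "cases"), ("Leeds", 12)])
s1 = sheet.snapshot()       # state before the update
sheet.update(("York", 7))   # the resource changes...
s2 = sheet.snapshot()       # ...and yields a second, different snapshot

print(to_csv(s1))
print(to_json(s1))
print(len(s1), len(s2))
```

Here to_csv(s1) and to_json(s1) are two r-texts of the same snapshot, while
s1 and s2 are two snapshots of the same resource.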

Luc

On 05/25/2011 09:53 AM, Graham Klyne wrote:
> I have a problem with resource-as-container.  I think it's too 
> constraining.  My zebra example wouldn't comply.
>
> As for the distinction between f1 and r1 per your example, I think 
> this is rather broadening the discussion - which I'm not sure is 
> necessary or helpful.
>
> I would say that in this case, r1 is a service resource.  And as such, 
> I don't think it makes sense to download a service.  E.g. what do you 
> receive if you do a simple HTTP GET on a SPARQL endpoint URI?  I think 
> it's typically some kind of intro page that explains how to use the 
> service (e.g. http://data.clarosnet.org/sparql/).  The URIs that may 
> be used to download *content* from the triple store are different 
> (e.g. URI-encoded SPARQL queries, or constructed LDAPI URIs).
>
> So, for the purposes of this example, we need to be clearer about what 
> we mean when saying "analyst (alice) downloads a turtle serialization 
> (lcp1) of the resource (r1) from government portal" - in this context, 
> I don't think it makes sense as it stands.
>
> I also note that once you introduce a triple store into the mix, while 
> we can expect it to contain information that has been loaded into it, 
> when retrieving information, we have no a priori way to claim that the 
> information subsequently retrieved has to do with the original 
> resource.  The best we can say is that if the entire *content* of the 
> resource "r1" is downloaded, then that content should contain as a 
> subset the RDF that was loaded.  But even this isn't clear-cut - if 
> the triple store supports named graphs (which most do), then there's 
> no way to represent its entire content in a single Turtle download.
>
> In summary, I think the introduction of containers and triple stores 
> is mixing mechanism with essential provenance concepts here, and I 
> think we need to get the former straight before we can explain what 
> happens when more complex mechanisms are introduced.  The scenario as 
> described could play perfectly well without mention of a triple store.
>
> #g
> -- 
>
>
> Luc Moreau wrote:
>> Hi Paul,
>> Yesterday, I also began drafting some definitions. We need
>> representations in here too. I am not sure about your illustrations.
>> Here is my take on it:
>>
>>
>>
>>
>> From a provenance viewpoint, we seem to discuss several concepts
>> related to resources.  Some terminology is required to disambiguate
>> concepts.  It is inspired by terminology developed by the RDF Working
>> Group (thanks to Sandro for drafting the original email!)
>>
>>
>> 1. A "resource" is a container, whose contents may vary over time.
>>    Its content may be structured in many different ways (hierarchical
>>    XML tree, RDF arcs, etc).
>>
>> 2. An "r-snapshot" is a state of a resource, or a snapshot of that
>>    resource at a specific instant.  An r-snapshot is immutable. From a
>>    resource that changes over time, one can obtain multiple
>>    r-snapshots.
>>
>> 3. An "r-text" is a particular sequence of characters or bytes which
>>    conveys a particular r-snapshot in some language.  If you can parse
>>    an r-text, you know what is in the r-snapshot it conveys.  You can
>>    tell someone exactly what is in a particular resource at some
>>    instant by sending them an r-text.  (You send them the r-text which
>>    conveys the r-snapshot which is the current state of that resource.)
>>
>>
>>
>> In some cases, some resources do not vary over time, which means that
>> there is a single r-snapshot for them, and some may even have a single
>> r-text (no content negotiation).  In such a specific case (static
>> resources on the web), the three concepts collapse into a single one.
>>
>> The challenge is to deal with dynamic contents.
>>
>>
>>
>> Illustration inspired by the example.
>>
>> - government (gov) converts data (d1) to RDF file (f1) at time (t1)
>>   using an XSLT transform
>> - government (gov) uploads RDF data (f1) into a triple store, exposed
>>   as Web resource (r1)
>> - analyst (alice) downloads a turtle serialization (lcp1) of the
>>   resource (r1) from government portal
>>
>> Illustrations:
>> - r1: is a resource: it's the triple store, it's a container, its
>>   content can vary over time
>> - lcp1: is an r-text (turtle serialization) of a given snapshot
>>   (created by, or available at the time of, download)
>> - f1 is a local file: it can be seen as a stateless anonymous
>>   resource, with a single r-text.
>>
>> If in addition:
>> - analyst (alice) downloads an RDF/XML serialization (lcp2) of the
>>   resource (r1)
>>
>> If the content of r1 has not changed, then lcp2 and lcp1 are both
>> r-texts of the same r-snapshot.
>>
>> Note that this is not limited to RDF (as Graham mentioned)
>>
>> - newspaper (news), uses a CMS to publish the incidence map (map1),
>>   chart (c1) and the image (img1) within a document (art1) written by
>>   (joe) using license (li2)
>> - newspaper (news), updates art1, adding a correction following a
>>   complaint from a reader
>>
>> Illustrations:
>> - art1 is also a resource, with two r-snapshots (before and after
>>   correction)
>> - with content negotiation, an HTTP client can download HTML and
>>   XHTML representations (i.e., r-texts) of the article
>>
>>
>>
>> What do you think?
>> Cheers,
>> Luc
>>
>>
>> On 05/25/2011 06:49 AM, Paul Groth wrote:
>>> Hi,
>>>
>>> To throw out some, perhaps simpler, definitions into the mix that I 
>>> think follow along the lines of what's being discussed.
>>>
>>> Resource - something that can be identified
>>>
>>> Snapshot - the state of a resource at a particular point in time
>>>
>>> In the Data Journalism Scenario: a 'resource' would be the web page. 
>>> A 'snapshot' would be the web page before publication.
>>>
>>> cheers,
>>> Paul
>>>
>>> Note: Similar concepts are found within many provenance models that 
>>> I know of... if it's helpful, I can list those out
>>>
>>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Wednesday, 25 May 2011 12:17:57 UTC