Re: PROV-ISSUE-1 (define-resource): Definition for concept 'Resource' [Provenance Terminology]

Hi Simon,

Simon Miles wrote:
> Hello,
> 
> With regard to the points raised on resources, in brief I suggest:
>  - For our purposes, a resource is anything which can be referred to
> and has a provenance.

Works for me.  (Or "may have a provenance"?)

>  - This is equivalent to "anything that might be identified by a URI"
> anyway, so it seems sensible to use that existing definition.

For practical purposes, I'd agree, though there may be edge cases that don't 
exactly match up.  E.g. http://example.org/interestingnumbers#pi, used to denote 
the number Pi (I'm not sure Pi has a provenance).

>  - When we talk about the provenance of a resource, we mean the
> provenance of its state on asking the question.

Sometimes maybe, but I'm not sure I agree this *always* applies.  Consider my 
previous example of the ancestry of a Zebra.

>  - When we talk about the provenance of a resource state
> representation, we mean the provenance of its state plus how it came
> to be in that representation.

I think this needs unpicking:  how does "how it came to be" differ from 
"provenance" here?

>  - We would expect implementers of the recommendation to provide
> access to the provenance of a web resource state representation, but
> by the suggestions above this would anyway be the provenance of the
> resource state (just by ignoring the portion specifically relating to
> representation), and that state's provenance is equivalent to the
> resource's provenance.

This isn't clear to me ... I've yet to read your longer explanation...

> In less brief, the reasons for the suggestions above:
> 
> It seems intuitive to me that what a user, or a client on their
> behalf, would ask for or expect is the provenance of a resource (in
> the web architecture sense, (a) in Luc's list). As this might be
> mutable, and so does not have one history over time, it makes sense to
> me to specify that the provenance of a resource is the provenance of
> its state on asking the question.
> 
> I agree with Jun that it would be good to include non-web resources,
> but then agree with Paul that the web architecture definition captures
> all we would want, just expressed in a way which is unusual for
> non-web settings. If we accept the above suggestion that a "resource"
> is what we'd ask for the provenance of, then surely all we mean by
> resource is something which can be referred to and which has a
> provenance? If so, then I think "might be identified with a URI" is
> one way of describing this - else, what could be referred to but could
> not be identified with URI? and what could be identified but does not
> have a provenance?

Yes.  In previous discussions about the nature of resources (esp. w.r.t. RDF 
formal semantics), Pat Hayes has pointed out that a resource is pretty much 
anything you want it to be.  So I think all of the above applies.  I guess the 
trick for us is to figure out a form of words that sets up the right 
expectations for our readers?

Actually, I find your take on the Web Arch definition useful, as it allows a 
small separation between the object of provenance in general and provenance 
applied in the Web.  I think Luc's objection might be overcome by making the 
reference to provenance not so much part of the definition, but more of an 
explanation; e.g.


   "A resource is anything which can be referred to.
    In principle, provenance information can be associated with any resource."

> With regards to (a) resource, (b) state and (c) representation, I
> think it makes sense to talk about the provenance of any of the three.
> Taking Graham's example, if (a) is the zebra's health, (b) is the
> zebra's health at some point in time, and (c) is a medical record
> about the zebra's health, I can envisage a meaningful response to
> asking the history of the zebra's health (a), how its health came to
> be as it is now (b) which is effectively the same as (a), or why the
> record contains what it does (c). For the purposes of provenance, it
> seems that (c) is just (b) with a bit of extra information (details of
> the particular representation) and so the provenance of (c) is just
> the provenance of (b) plus some extra (ignorable) information on how
> it came to be represented as it is.
> 
> Graham - I don't understand your argument for why a web resource
> state's ((b)'s) provenance would not be meaningful. The provenance of
> the government data at the time it was first published, for example,
> would refer to the studies which produced it, while the provenance of
> its Turtle representation would be the same plus information about
> serialisation in Turtle.

That comment was made specifically with respect to the notion of (my 
understanding of) resource state as it appears in the description of Web 
Architecture and REST architectural style.  The resource is the thing that is 
identified (has a URI).  The state representation is the thing that is actually 
transferred on the Web.  The resource state is a notion that is used to link 
these ideas without, as far as I can tell, having any actual direct exposure to 
users or agents in the web.

But, if we talk about resource state as you have done, without reference to web 
architecture (which I'm coming to think may be the right thing to do for our 
initial definitions), then it makes sense.  I.e. if we *first* start with 
definitions of what we want to describe, and *second* figure out how they map 
into web architecture, then we may be able to achieve greater clarity. (**)

I recognize I may be partly responsible for initially pushing the discussion in 
terms of web architecture.  I guess that, in part, I thought the basic notions 
were fairly well understood from prior work, and we were already moving toward 
the second step.  But I do see a danger that if we stray too far from web 
architecture in our discussions, then we may spend a lot of time discussing 
things that aren't really relevant to the goals of this WG.  It's a tricky balance!

(**) separately from this message, I've been contemplating a (more technically 
oriented) suggestion that may help to bridge the conceptual gap - I aim to 
post that in another message.


> In a mail to this list which I think got lost, I said that in the
> government example I didn't understand the difference between f1 being
> "published" and r1 being "made available as a web resource", so I'm
> not clear enough on the difference between f1 and r1 to use to
> illustrate the suggestions above.

Without checking in detail, I think I had a similar problem.  To me, they seemed 
like the same thing.


> On 24 May 2011 21:13, Graham Klyne <GK@ninebynine.org> wrote:
>> Hi Luc,
>>
>> Trimming the message this time!
>>
>> Luc Moreau wrote:
>>  >(I wrote):
>>>> I don't think there's a need or purpose to invoke that terminology here.
>>>>
>>>> Just consider, for the sake of discussion, a slight revision of the
>>>> example:
>>>>
>>>> government (gov) converts data (d1) to XML (f1) at time (t1)
>>>> government (gov) generates provenance information (prov) regarding XML
>>>> (f1)
>>>> government (gov) publishes XML data (f1) along with its provenance
>>>> (prov) on a portal with a license (li1); the XML data is now available
>>>> as a Web resource (r1)
>>>>  :
>>>>
>>>> I think the example makes just as much sense with RDF replaced by XML,
>>>> but the RDF terminology does not apply to XML data.  And, by the way,
>>>> I think this revised example also represents a use-case that we MUST
>>>> be able to support (except that instead of talking about Turtle and
>>>> RDF/XML serializations, we might talk about text/XML vs EXI
>>>> (http://www.w3.org/TR/2011/REC-exi-20110310/) serializations).
>>> I agree that it could be xml.  But the problem is still the same.
>>> The web architecture distinguishes
>>> - resource
>>> - resource state
>>> - resource state representation
>>>
>>> The rdf WG has introduced terminology for rdf corresponding to these
>>> concepts.
>>>
>>> If we want to explain how provenance fits into the web architecture, we
>>> need to be able to refer to these notions.
>> OK, I see two discussion points here:
>>
>> (a) the relevance of the RDF g-box, g-snap, g-text terminology, and
>>
>> (b) the need to express provenance about resources/resource state/resource state
>> representation
>>
>> Regarding (a), I think the (resources/resource state/resource state
>> representation) terminology is perfectly adequate for our current purposes, and
>> that avoids getting drawn into RDF-specific issues of RDF graph evolution.
>> Later, when we (maybe) discuss more specifically management of provenance
>> expressed using RDF, I can imagine the g-box/... terminology might be helpful.
>>
>> Regarding (b), I've offered a viewpoint, but I remain open to persuasion.  But I
>> don't think focusing on the g-box/g-snap/g-text is going to help us here,
>> because the Web Architecture concepts are so much broader (i.e. not just RDF).
>> More important, IMO, is to identify a specific scenario that isn't adequately or
>> so easily handled by the provenance-of-resource case.
>>
>> #g
>> --
>>
>>

Received on Wednesday, 25 May 2011 07:34:21 UTC