Re: Resources and state (was: PROV-ISSUE-7 (define-derivation): Definition for Concept 'Derivation' ) from James Cheney on 2011-06-02 (public-prov-wg@w3.org from June 2011)

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Thu, 2 Jun 2011 12:42:29 +0100
To: Graham Klyne <GK@ninebynine.org>
Cc: Luc Moreau <L.Moreau@ecs.soton.ac.uk>, W3C provenance WG <public-prov-wg@w3.org>
Message-Id: <0B3DB509-F470-479C-B303-68C815613AFA@inf.ed.ac.uk>
Hi all,

Catching up, I think I agree with the current state of the discussion  
of resources and derivation.

I'm not sure I agree that most Web resources are static in the sense  
that they don't change at all, or at least, I'd be curious to see  
numbers backing that up.  But resources often change monotonically  
(nondestructively) and (as Paolo suggested) their provenance generally  
should be monotonic too.  We should handle this common case.

Web resources are sometimes aggregations of other resources (e.g.  
databases, records) and dynamic/destructive changes do happen, and  
their provenance is important to capture.  Scientific data on the Web  
illustrates the importance of both dynamics and collections.   
Databases evolve gradually over time and it is very important to be  
able to understand how things have changed, who has changed what, and  
so on.  Databases also behave more like "physical objects" - they  
typically do have a consistent linear sequence of versions because of  
transactions, large ones are hard to copy, etc.

*However*, these are important research issues for provenance, and
there is little, if any, standardization on how to do this, in contrast
to the various provenance models (OPM, PML, Provenir, ...) we have
been discussing, which seem perfectly adequate for static,
non-aggregate resources.

I recall that at the end of the WG, issues such as containers,  
versioning, recipe links and so on were raised and included among the  
concepts, but that there seemed not to be a strong consensus that they  
were essential or that technology for these has converged enough to  
justify standardization.  I don't think the goal of the WG is to try  
to solve (what I view as) research issues.

So I would be inclined to agree that we should avoid mission creep
concerning resource state, and maybe go further: while recognizing the  
importance of dynamic state, versioning, containers, etc. for  
provenance, we should scope the model as narrowly as possible, and ask  
for each concept whether it is really something that needs to be  
standardized in order to provide minimal , and whether it is well- 
understood enough not to be a research problem.  For some of these, we  
might consider extensibility mechanisms (e.g. OPM-style "profiles") to
accommodate experimentation without presuming to standardize something  
prematurely.

</soapbox>

--James

On Jun 1, 2011, at 8:20 AM, Graham Klyne wrote:

> Luc Moreau wrote:
>> Hi Graham,
>> Isn't it that you used the duri scheme to name the two resource states
>> that exist in this scenario?
>
> That is a possible way of describing it, but the essence of what I suggest
> is that the "states" (or "snapshots") are themselves resources.
>
>> In your view of the web, is there a notion of stateful resource?  
>> Does it apply here?
>
> Resource state is an architectural concept in the web.  Indeed, it's so
> fundamental that the notion is used throughout http://www.w3.org/TR/webarch/
> without actually being defined, as far as I can tell.  (Other than indirectly
> by reference to Fielding's thesis.)
>
> But not all resources have dynamic state.  Indeed, probably most resources
> on the web are static.
>
> What I'm trying to do is avoid a layer of modelling complexity that I don't
> believe is needed:  I think we can say all we need to say by just talking
> about resources.  And in some cases, I think that talking about
> non-resources can lead to inconsistencies or awkwardness (sorry, no example
> to hand.)
>
> To some extent, this pushes the static/dynamic discussion to a different
> place, because in some cases the type of a resource may be significant.
> But I see that as a useful extension point in any case, when discussing
> possibly conflicting models of provenance and associated inference.  What
> I'd really like to do is simplify the *core* model until there's nothing
> there for the different provenance models to disagree about.
>
> ...
>
> Tangentially related, I just read through Yolanda's slides
> (http://www.w3.org/2005/Incubator/prov/wiki/images/0/02/Provenance-XG-Overview.pdf),
> and followed up on some of the associated reports, and my feeling
> is that there's serious potential here for scope creep.
>
> One point in which I am in very strong agreement, indeed, I think  
> it's probably the most important thing for us as a working group, is:
>
> Slide 36:
> "The exchange language should have a low entry point to facilitate  
> widespread adoption, therefore it should be easy to do simple things"
>
> I think this is crucial to the eventual success of this group's
> work (by which I don't mean successfully advancing to REC status,  
> but creating specifications that will actually be used in the web at  
> large).  The provenance problem is too important to mess it up by
> making it too complicated for developers to get started.
>
> So, I think that at a basic level of provenance on the web, we want  
> to avoid talking about states, and snapshots, and other constructs  
> that are not directly relevant to a web developer creating an  
> application to record or use provenance information.  I think the  
> notion of "resource", interpreted broadly per AWWW, etc., allows us
> to do that.  Building upon a very simple core model, the subtleties  
> can be added through refinement of the core concepts.  But if the
> core concepts are not so simple that developers can easily generate  
> provenance information, this group's work could end up like a  
> magnificently engineered car with no roads to drive on.
>
> #g
> --
>
>> On 31/05/11 23:57, Graham Klyne wrote:
>>> Luc Moreau wrote:
>>>> Graham,
>>>>
>>>> In my example, I really mean for the two versions of the chart to  
>>>> be available at
>>>> the same URI. (So, definitely, an uncool URI!)
>>>>
>>>> In that case, there is a *single* resource, but it is stateful.  
>>>> Hence, there
>>>> are two *resource states*, one generated using (stats2), and the  
>>>> other using (stats3).
>>>
>>> Luc,
>>>
>>> I had interpreted your scenario as using a common URI as you  
>>> explain.
>>>
>>> There are still several resources here, but they are not all
>>> exposed on the web or assigned URIs.  I'm appealing here to  
>>> anything that *might* be identified as opposed to things that  
>>> actually are assigned URIs.   (For example, the proposed duri:  
>>> scheme might be used - http://tools.ietf.org/id/draft-masinter-dated-uri-07.html)
>>>
>>> (And the URI is perfectly "cool" if it is specifically intended to  
>>> denote a dynamic resource.  A URI used to access the current  
>>> weather in London can be stable if properly managed.)
>>>
>>> (I think this is all entirely consistent with my earlier stated  
>>> positions.)
>>>
>>> #g
>>> -- 
>>>
>>>> Of course, if blogger had used cool uris, then, c2s2 and c2s3  
>>>> would be different resources.
>>>>
>>>> Luc
>>>>
>>>> On 05/31/2011 02:25 PM, Graham Klyne wrote:
>>>>> I see (at least) two resources associated with (c2):  one  
>>>>> generated using (stats2), and the other using (stats3).  We might
>>>>> call these (c2s2) and (c2s3).
>>>>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Received on Thursday, 2 June 2011 11:42:58 UTC