ConceptResource

From Provenance WG Wiki
Revision as of 15:50, 6 July 2011 by Tlebo (Talk | contribs)

Historical note

Resource was renamed to ConceptThing in June 2011 after several weeks of discussion on email and telecons.


Definition for Concept 'Resource'

ISSUE: http://www.w3.org/2011/prov/track/issues/1

Introduction

The Provenance WG charter identifies the concept 'Resource' as a core concept of the provenance interchange language to be standardized (see http://www.w3.org/2011/01/prov-wg-charter).

  • What term do we adopt for the concept 'Resource'?
  • How do we define the concept 'Resource'?
  • Where does concept 'Resource' appear in ProvenanceExample?
  • Which provenance query requires the concept 'Resource'?

Proposed Definitions for the Concept 'Resource'

Definition from Architecture of the World Wide Web

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI.

-- http://www.w3.org/TR/webarch/#id-resources

The text goes on to introduce the notion of an "information resource". I think there may be an argument that provenance data constitutes such an "information resource".

See also: http://www.w3.org/TR/webarch/#def-resource

I think any definition we may adopt for "Resource" should be broadly compatible with this view, or if not we should maybe find a term other than "Resource" for what we are talking about.

Definition from RFC3986

Resource

This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).

-- http://tools.ietf.org/html/rfc3986#section-1.1

Definition (according to Khalid Belhajjame)

"resource" is a generic term that refers to "anything" that can be referenced, and the provenance of which needs to be traced. A resource can be tangible, e.g., a painting, or intangible, e.g., a bank account. A resource can be mutable, in which case its state may change over time. Moreover, the same resource (resp. resource state) may be encoded using multiple representations.

Definition by Paolo M.

  • a resource is "anything which can be referred to" (SM)

Would an Information Object have the same status as a resource that a physical object has? To me, the objects that matter are primarily data structures, documents, and assertions, and I think what we are saying does apply to those.

I also agree with SM, GK, etc. that when we talk about the provenance of a resource, we mean the provenance of its state at the time of asking the question.

So I also agree that there is an implicit notion of resource state:

  • resource state -> r-snapshot (LM)

also:

  • "any notion of provenance refers to a specific state of a resource". Naturally here we mean "observable state". I have not seen the notion of observer introduced in the discussion but it seems natural that provenance be relative to an observer. The fact that the Web architecture defines its foundational concepts similarly should be viewed as a convenience which will help ground the concepts, rather than a set of constraints that we are bound to.

Monotonicity, manifestations/representations:


  • provenance is "monotonic" wrt the state evolution of the resource it refers to. This is desirable (for computational purposes) and seems to follow naturally from associating provenance to a state: let r_s be a resource r in state s. Its provenance prov(r_s) is a subset of prov(r_{s'}) for any s' that temporally follows s.

  • Given a resource r in a state s: r_s, one can create one or more representations ("manifestations") repr(r_s) of r_s. These are all r-snapshots of r_s.
  • Jun writes: "If f1 is a file, then it is a representation of a resource, not a resource any more, right?"

I would argue that repr(r_s) should be a resource itself, for any resource r and (visible) state s. Indeed, it has an initial state (the time it is created from the underlying resource state), and its provenance at that state is simply the provenance of r_s, plus the action of creating repr(r_s). It can then evolve independently (but monotonically) as that new representation is acted upon. By monotonicity, the provenance of any further state is prefixed by the provenance just mentioned.
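The monotonicity property and the idea that a representation is a resource in its own right can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the discussion above: provenance is modelled as an append-only tuple of action labels, and the names `Resource`, `act`, `represent`, and `is_prefix` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource whose provenance only grows as its state evolves (hypothetical model)."""
    name: str
    # One provenance record per state: history[i] plays the role of prov(r_s) for state i.
    history: list = field(default_factory=lambda: [()])

    def prov(self, state=-1):
        """Provenance of the resource at the given state index (latest by default)."""
        return self.history[state]

    def act(self, action):
        """Move to a new state whose provenance extends the previous one."""
        self.history.append(self.prov() + (action,))

    def represent(self, name):
        """Create repr(r_s) as a new resource.

        Its initial provenance is prov(r_s) plus the action of creating it.
        """
        initial = self.prov() + (f"represent {self.name} as {name}",)
        return Resource(name, [initial])


def is_prefix(earlier, later):
    """True if the earlier provenance is a prefix of the later one."""
    return later[:len(earlier)] == earlier


doc = Resource("report")
doc.act("draft")
doc.act("edit")
pdf = doc.represent("report.pdf")  # snapshot of the state after "edit"
pdf.act("sign")                    # the representation evolves on its own
doc.act("publish")                 # and so does the original resource
```

In this sketch, `is_prefix(doc.prov(1), doc.prov(2))` holds for successive states of `doc`, and the provenance of `pdf` starts with the provenance of `doc` at the state it was created from; later actions on `doc` do not appear in `pdf` and vice versa.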

Containers:

I do have a problem with "containers" as a separate notion from resource, however. Isn't a database both a container and a resource? (It does have a state, which is the set of all its elements, and for a given state I can certainly exhibit the provenance of each data item it contains.)

So I am not sure the notion of container is useful here, or even well-founded: you end up with issues of granularity, because containers may be nested. But then anything non-atomic, like a tuple, is a container, which however does have a provenance, as we know.


Definition (rant?) by Jim M.

A resource is anything represented by an identifier and associated descriptive metadata. Resources are commonly used to represent physical and digital objects at various levels of abstraction and aggregation as well as to represent conceptual/logical constructs. Resources can be involved in events (instances of a process) as inputs, outputs, or participants. Events can change the state of a resource or create/destroy them. Resources may be related to other resources via provenance as well as through abstraction and aggregation relations.

Abstraction relations denote when two resources refer to the same underlying thing/object at a given time but differ in the set of state/descriptive metadata that is assumed to be integral to identity. Implied by this definition is the sense that there are processes that would 'merely' change the state of a resource at one level of abstraction while creating/destroying related resources at another level of abstraction. Resources representing the same thing at different levels of abstraction may thus be considered to be mutable/immutable with respect to such processes. E.g. a paper and a version of a paper are two resources that, at a given time, refer to the same underlying thing (e.g. a string/sequence of bytes) with the version considered immutable w.r.t. editing and the paper considered mutable. Vocabularies related to versioning, mutability, FRBR, etc. label resources in an attempt to represent abstraction relationships relevant to common processes (e.g. editing, expressing and manifesting a work).
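The paper/version example can be made concrete with a small Python sketch. It assumes, purely for illustration, that the underlying thing is a dictionary of attributes and that each level of abstraction is characterised by the set of attributes treated as integral to identity; `View`, `identity`, and `same_resource` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A resource seen at one level of abstraction (hypothetical model)."""
    label: str
    identity_keys: frozenset  # attributes assumed integral to identity

    def identity(self, thing):
        """Project the underlying thing onto the attributes integral to identity."""
        return tuple(sorted((k, thing[k]) for k in self.identity_keys))

    def same_resource(self, before, after):
        """True if a process merely changed state; False if it replaced the resource."""
        return self.identity(before) == self.identity(after)


paper = View("paper", frozenset({"title"}))
version = View("version", frozenset({"title", "content"}))

before = {"title": "Provenance", "content": "draft text"}
after = {**before, "content": "revised text"}  # the effect of an editing process
```

Under these assumptions, `paper.same_resource(before, after)` is true (the paper is mutable with respect to editing), while `version.same_resource(before, after)` is false (editing destroys one version and creates another).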

Aggregation relations denote when one resource is considered to be part of another at a given time. (It is also possible to consider an aggregate resource at two levels of abstraction: one in which the current set of parts (state) is integral to identity and one in which it is not.) Aggregation implies that there are processes that can assemble and disassemble aggregate resources and change their composition. Vocabularies related to collections attempt to represent these relationships with respect to common processes (e.g. it is useful to label a list of names as a collection if we are interested in adding/removing names, but we would not consider a rock a collection of atoms unless we were interested in processes that add/remove atoms from it).

Resources may 'participate' in process instances/events/executions - with 'participation' implying that the given type of process only affects aspects of the resource's state that are not integral to its identity (i.e. the resource is considered mutable/the given process type is part of its lifecycle). Resources may also have derivation relationships to other resources, implying a connection via some process instance/event/execution, with derivation implying that the given process type changes aspects of the resources integral to identity (i.e. the resources are immutable w.r.t. the given process type and are used/produced by it).
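The distinction between participation and derivation can be read as a simple test on which attributes a process type changes. The following Python sketch is one possible reading, not a definition adopted by the group; `classify` and the attribute names are hypothetical.

```python
def classify(identity_keys, changed_keys):
    """Label a resource's relation to a process instance (hypothetical helper).

    identity_keys: attributes treated as integral to the resource's identity.
    changed_keys: attributes that the process type modifies.
    """
    if set(identity_keys) & set(changed_keys):
        # Identity-integral state changes: one resource is used, another is produced.
        return "derivation"
    # Only non-integral state changes: the same resource persists through the process.
    return "participation"


editing_changes = {"content"}
author = classify({"name"}, editing_changes)
version = classify({"title", "content"}, editing_changes)
```

Here the author participates in the editing process (nothing integral to the author's identity changes), whereas a version, whose content is integral to its identity, is related to its successor by derivation.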

All of these aspects of resources are critical to addressing the use cases within the prov-wg. Examples exist where things are considered at different levels of abstraction (a report and its unpublished/published versions), where aggregate resources are assembled and disassembled, and where resources participate in events (the government publishing a report without considering any change of state of the government) as well as being used and produced. Specifically, while provenance could be narrowly defined as being about tracking state changes, real world use cases involve only partial knowledge of state, and changes in level of abstraction and aggregation are critical to assembling sufficiently rich provenance information to be useful. Conversely, in my opinion, this framework is also a minimal model in the sense that attempts to simplify by, for example, labeling resources as immutable/mutable, FRBR work/expression/manifestations, versionable resources/versions, opm:agents, pml:sources, etc., significantly limit the scope of processes that can be discussed in an integrated fashion. (Considering such labels as useful extensions that can be mapped to core concepts of abstraction/aggregation is a different matter, along the lines of seeing a dc:creator label as implying a creator's participation in a creation event for that resource.)