Warning:
This wiki has been archived and is now read-only.

ProvDM Proposal for restructuring

From Provenance WG Wiki

The master copy of this document is currently maintained offline - please do not edit the document here (for now)

Section references are to http://www.w3.org/TR/2012/WD-prov-dm-20120503/, indicating the intended source of material.

Rationale

This proposal attempts to facilitate take-up of the provenance model and vocabulary by addressing the following goals:

1. Separate core provenance patterns from specific applications

At heart, the provenance data model and ontology provide a framework and pattern for constructing a provenance trace; i.e. to enable traceability of artifacts to their ultimate sources, by whatever means they may have been produced. The proposed core patterns aim to capture the essence of this pattern, separately from the specific applications for which it may be used.

It has been my experience working with other complex ontologies that understanding the central patterns is the key to being able to work with the ontology as a whole. An example of this is FRBR (http://archive.ifla.org/VII/s13/frbr/frbr_current3.htm): if one goes to the original FRBR specification, it is bloated with detailed bibliographic record types, many of which are outdated (e.g. it has terms for the groove pitch of a gramophone record, but no term for the bit rate of an MP3). But at the heart of FRBR is a very simple structure which is timeless and is frequently referenced in discussions about information systems and catalogues (i.e. work, expression, manifestation, item and supporting relationships). That core structure of FRBR is not easy to discover unless one already understands it.

2. Maximize interoperability with other systems and mechanisms

The proposed core patterns substantially satisfy the "Test of independent invention" (cf. http://www.w3.org/DesignIssues/Principles.html). There are many proposals for provenance information, all of which differ in their details and emphasis, but all of them use something like the core pattern described here. There are also ontologies that exhibit a similar structure that are not presented as provenance.

By describing the central patterns separately from the application refinements, it becomes easier to locate and exploit the correspondences between the provenance ontology and other provenance-like structures that may be encountered on the web.

As a standards group, we do our most useful work by identifying and documenting the intersection of alternative provenance representation systems, rather than collecting and documenting their union. In my experience, a good standard is one that a reader can pick up and almost dismiss as being blindingly obvious, rather than one that demands detailed attention to a plethora of explicit detail; getting users "on the same page" is the first step to meaningful interoperability.

Simplicity of essential concepts is a key for getting "ordinary developers" to accept and use a specification, without which there is no meaningful interoperability.

3. Minimize ontological commitment required of users

The core patterns embody very little actual knowledge of the artifacts and processes involved; i.e. they require very little ontological commitment to a specific application view of the world. Less ontological commitment means that there is less for different people or communities to disagree about, and hence provides a basis for wider uptake of the core ideas.

Provenance is one of those topics that many people agree is important, but very few actually agree on what it must capture. It's a bit like the story of the blind men and the elephant - everyone sees it from their own perspective. By digging down and exposing the core structures in very simple terms, we can make it easier for different communities to relate their views of provenance.

A symptom of ontological commitment can be seen within the working group: there is a large amount of discussion taking place which is concerned with the minutiae of exactly what term X or term Y means, and exactly what information needs to be represented. In practice, I submit, applications will record the information they have available, and for maximum fidelity will use whatever terms are "native" to the application. To the extent that the PROV application extension terms represent commonly understood operations, they are useful and may be used, but we should not expect them to displace more precise application ontologies. The challenge then becomes how to recognize that the application ontologies have a relationship with (or are specializations of) these common terms. The core structural patterns, through their lack of ontological commitment, are the natural highest common factor for such alignment and, as such, deserve to be presented very clearly as the central concepts around which other terms are assembled.


Core Patterns

Abstract

Provenance describes the entities, people and activities involved in producing a piece of data or thing. From this description, the ancestry of an artifact can be discovered in the form of a /provenance trace/, together with information about the agencies involved in the various steps leading to that entity's existence. The core provenance data model describes elements that comprise the essential structural core of provenance information. This core structure may be enhanced with additional elements, some of which are described in a companion extensions specification, that represent more detailed information about the specific nature of that ancestry.

Introduction

Structure of this Document

Notational Conventions

Preliminaries

Use of PROV-N

Identifiers, namespaces and qualified names

(from 4.7.1, 4.7.2, 4.7.3)

Attributes and values

concept, not specifics

(part from 4.7.4, 4.7.5)

Extension

(refinement)

(from 5)

Validity

(selected material from 6, mainly a reference to the "constraints" document)

Structure of provenance and provenance traces

(material from 2.5, etc.)

Introduce the core triumvirate, with diagram

Diagram/example of a provenance trace (new, simplified)

Provenance core concepts

Entity

(From 4.1.1)

   entity(id, [attr1=val1, ...])

Activity

(from 4.1.2)

   activity(id, st, et, [attr1=val1, ...])

Agent

(from 4.2.1)

   agent(id, [attr1=val1, ...])
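
As a rough illustration of the three element signatures above, the following Python sketch models them as plain records and renders them in the same shape. It is not part of PROV-DM: the class names, the to_provn helper, the quoting of attribute values and the use of "-" for an absent start or end time are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def format_attrs(attrs: Dict[str, str]) -> str:
    """Render attribute-value pairs as [k1="v1", k2="v2"] (illustrative quoting)."""
    pairs = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f"[{pairs}]"


@dataclass
class Entity:
    id: str
    attrs: Dict[str, str] = field(default_factory=dict)

    def to_provn(self) -> str:
        return f"entity({self.id}, {format_attrs(self.attrs)})"


@dataclass
class Activity:
    id: str
    start: Optional[str] = None  # "st" in the signature above
    end: Optional[str] = None    # "et" in the signature above
    attrs: Dict[str, str] = field(default_factory=dict)

    def to_provn(self) -> str:
        # Assumption: "-" stands in for a time that was not recorded.
        st = self.start or "-"
        et = self.end or "-"
        return f"activity({self.id}, {st}, {et}, {format_attrs(self.attrs)})"


@dataclass
class Agent:
    id: str
    attrs: Dict[str, str] = field(default_factory=dict)

    def to_provn(self) -> str:
        return f"agent({self.id}, {format_attrs(self.attrs)})"


# Hypothetical identifiers in an "ex:" namespace.
print(Entity("ex:report", {"prov:type": "document"}).to_provn())
print(Activity("ex:edit", start="2012-05-03T10:00:00").to_provn())
print(Agent("ex:alice").to_provn())
```

The point of the sketch is only that all three elements share one shape: an identifier plus an open-ended attribute list, with activities adding optional times.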

Derivation

(from 4.3.1)

   wasDerivedFrom(id, e2, e1, [a], [g2], [u1], attrs)

Use

(from 4.1.4)

   used(id,a,e,t,attrs)

Generation

(from 4.1.3)

   wasGeneratedBy(id,e,a,t,attrs)
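
To show how Use and Generation together yield a provenance trace, here is a minimal Python sketch that walks from an entity back through the activity that generated it to the entities that activity used. The function name, the tuple layout and the ex: identifiers are hypothetical. Note the simplification: every entity used by a generating activity is treated as a potential ancestor, which may over-approximate true derivation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ancestors(
    entity: str,
    generated_by: Iterable[Tuple[str, str]],  # (entity, activity) pairs
    used: Iterable[Tuple[str, str]],          # (activity, entity) pairs
) -> Set[str]:
    """Collect entities reachable by alternating generation and use edges."""
    gen: Dict[str, List[str]] = defaultdict(list)
    for e, a in generated_by:
        gen[e].append(a)
    use: Dict[str, List[str]] = defaultdict(list)
    for a, e in used:
        use[a].append(e)

    seen: Set[str] = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for activity in gen.get(current, []):
            for source in use.get(activity, []):
                if source not in seen:  # also guards against cyclic records
                    seen.add(source)
                    stack.append(source)
    return seen


# Hypothetical trace: raw data -> aggregate -> table -> plot -> chart.
generated_by = [("ex:chart", "ex:plot"), ("ex:table", "ex:aggregate")]
used = [("ex:plot", "ex:table"), ("ex:aggregate", "ex:raw-data")]
print(sorted(ancestors("ex:chart", generated_by, used)))
# prints ['ex:raw-data', 'ex:table']
```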

[Invalidation]

(from 4.1.7)

   wasInvalidatedBy(id,e,a,t,attrs)

@@I've tentatively included this here as it's a natural counterpart to Generation.

Attribution

(from 4.2.2)

   wasAttributedTo(id,e,ag,attrs)

Association

(from 4.2.3)

   wasAssociatedWith(id,a,ag,[pl],attrs)

("Direction"? - "wasDirectedBy"?)

Responsibility

(from 4.2.4)

   actedOnBehalfOf(id,ag2,ag1,[a],attrs)

("Delegation"? - "wasDelegatedBy")
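
Attribution and Responsibility can be combined to ask who ultimately stands behind an entity. The Python sketch below follows actedOnBehalfOf links (ag2 acting on behalf of ag1) starting from the agent an entity is attributed to. The helper name and identifiers are hypothetical, and it assumes each agent acts on behalf of at most one other agent, which real provenance records need not satisfy.

```python
from typing import Dict, List


def responsibility_chain(agent: str, on_behalf_of: Dict[str, str]) -> List[str]:
    """Follow actedOnBehalfOf links from an agent towards the agent it ultimately acts for."""
    chain = [agent]
    while chain[-1] in on_behalf_of:
        delegator = on_behalf_of[chain[-1]]
        if delegator in chain:  # stop on cyclic delegation records
            break
        chain.append(delegator)
    return chain


# wasAttributedTo: entity -> agent; actedOnBehalfOf: ag2 -> ag1 (hypothetical data).
attributed_to = {"ex:report": "ex:bob"}
on_behalf_of = {"ex:bob": "ex:alice", "ex:alice": "ex:acme"}
print(responsibility_chain(attributed_to["ex:report"], on_behalf_of))
# prints ['ex:bob', 'ex:alice', 'ex:acme']
```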

[Communication]

(from 4.1.8)

   wasInformedBy(id,a2,a1,attrs)

@@Two interactions here I think should have a common structural ancestor ("influencedBy"?)

[Start by Activity]

(from 4.1.9)

   wasStartedByActivity(id, a2, a1, attrs)

@@Two interactions here I think should have a common structural ancestor ("influencedBy"?)

Attributes

(from 4.7.4, et seq)

prov:label

prov:location

prov:role

prov:type

prov:value

Application Extensions

Abstract

The core provenance data model constructs describe the essential structures of a provenance trace, but do not provide specific information about the relationships between entities, activities and agents. Provenance extension constructs may be used to augment or refine the core constructs to provide such additional information about relationships that are commonly encountered in information and data processing activities.
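
One way to picture refinement is as a lookup from an extension or application term to the core relation it specializes. The Python sketch below is illustrative only: the REFINES table is an assumption about how such an alignment might be recorded, the relation names wasRevisionOf and wasQuotedFrom are assumed from the referenced draft, and ex:wasCopyEditedFrom is a made-up application term.

```python
from typing import Dict, Optional

# Core relations as listed in the core patterns outline.
CORE_RELATIONS = {
    "wasDerivedFrom", "used", "wasGeneratedBy", "wasInvalidatedBy",
    "wasAttributedTo", "wasAssociatedWith", "actedOnBehalfOf",
    "wasInformedBy", "wasStartedByActivity",
}

# Assumed refinement links: each term maps to the term it specializes.
REFINES: Dict[str, str] = {
    "wasRevisionOf": "wasDerivedFrom",
    "wasQuotedFrom": "wasDerivedFrom",
    "ex:wasCopyEditedFrom": "wasRevisionOf",  # hypothetical application term
}


def core_relation(term: str) -> Optional[str]:
    """Walk refinement links until a core relation is reached, else return None."""
    visited = set()
    while term not in CORE_RELATIONS:
        if term in visited or term not in REFINES:
            return None
        visited.add(term)
        term = REFINES[term]
    return term


print(core_relation("ex:wasCopyEditedFrom"))  # prints wasDerivedFrom
print(core_relation("ex:unknownTerm"))        # prints None
```

A consumer that understands only the core relations could use such a table to interpret application-specific records without knowing the application vocabulary.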

Introduction

Structure of this Document

Notational Conventions

Illustration of PROV-DM by an Example

(From 3, et seq.)

(This could be the current example material, with a simplified example used in the core spec)

The Authors View

The Process View

Attribution of Provenance

Types and Relations

Entities

Specialization

(from 4.4.1)

Alternate

(from 4.4.2)

Activities

Start

(from 4.1.5)

End

(from 4.1.6)

[Communication]

(from 4.1.8, if not in core)

[Start by Activity]

(from 4.1.9, if not in core)

Derivations

Revision

(from 4.3.2)

Quotation

(from 4.3.3)

Original Source

(from 4.3.4)

Trace

(from 4.3.5)


Collections

(From 4.5 et seq)

Collection

Dictionary

Insertion

Removal

Membership

Annotations

(from 4.6 et seq)

Note

(from 4.6.1)

Annotation

(from 4.6.2)