ProvDM ConsensusProposal

From Provenance WG Wiki

PROV-DM Consensus Proposal

There have been several suggestions for further reorganizing prov-dm to make it more accessible. In particular, Graham has proposed a reorganization of prov-dm. The discussion in the last telecon made clear that there was no overall consensus on that reorganization. To reflect both this proposal and the opinions previously expressed by the working group, we suggest the following consensus proposal.

Core & Advanced Patterns

An important point of Graham's proposal is that we should identify provenance core patterns, i.e., the common structures found in provenance descriptions. This can be done by adopting a core and advanced pattern approach, which in particular would involve revising Section 2 of prov-dm.

Prov-dm Sections 2.1, 2.2 and 2.3 already list the provenance core patterns. They are summarized in Table 2 and pictured in Figure 3. Prov-dm should do a better job of explaining why they are core. Section 2.6 should be presented not as a simplified overview but as the overview of the core. Sections 2.4 and 2.5 are misplaced, since they are not about core patterns. Overall, some restructuring of Section 2 is therefore required to make the core patterns clear. Section 3.1 should use only core patterns, and Section 3.2, which uses non-core features, should be moved to the end.

Thus, a revised Section 2 would be structured as follows:

2. PROV Overview
2.1 Core patterns
2.1.1 Entity and Activity
2.1.2 Generation, Usage, Derivation
2.1.3 Agents, Attribution, Association, and Responsibility
2.1.4 Core Patterns Diagram
2.2 Advanced patterns
   An explanation of the principles underpinning advanced patterns
2.2.1 Subtyping
   e.g. plan, SoftwareAgent, collection
2.2.2 Expanded relations
   e.g. full derivation, full association
2.2.3 Further relations
2.2.4 Provenance of Provenance

As a group, we should discuss which patterns are core and which are advanced.

Continue with one document

Outside users will not know whether a concept is core or not, and therefore will not know which document to search. For them, splitting the material into separate documents is impractical.

Furthermore, wasDerivedFrom(e2,e1) is core, whereas the more expressive wasDerivedFrom(id;e2,e1,a,g,u) is advanced. In other words, concepts such as Derivation have an essential pattern as well as more advanced ones.

So, with two documents we would have to repeat such concepts in both the core and the advanced document.
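To illustrate why Derivation is a single concept with two patterns, the sketch below models it in Python: the core pattern is simply the advanced pattern with its optional arguments omitted. The class `Derivation`, its field names, and the use of "-" for an unspecified argument are illustrative assumptions, not part of prov-dm or prov-o.

```python
from dataclasses import dataclass
from typing import Optional


def _arg(value: Optional[str]) -> str:
    # "-" stands for an argument left unspecified (assumed convention).
    return value if value is not None else "-"


@dataclass(frozen=True)
class Derivation:
    """Hypothetical model of one concept, Derivation, with two patterns.

    Core:     wasDerivedFrom(e2,e1)
    Advanced: wasDerivedFrom(id;e2,e1,a,g,u)
    """

    generated: str                     # e2, the derived entity
    used: str                          # e1, the entity it is derived from
    ident: Optional[str] = None        # id (advanced only)
    activity: Optional[str] = None     # a  (advanced only)
    generation: Optional[str] = None   # g  (advanced only)
    usage: Optional[str] = None        # u  (advanced only)

    def is_core(self) -> bool:
        # Core pattern: every optional (advanced) argument is omitted.
        return all(
            v is None
            for v in (self.ident, self.activity, self.generation, self.usage)
        )

    def render(self) -> str:
        # Print the derivation in the notation used on this page.
        if self.is_core():
            return f"wasDerivedFrom({self.generated},{self.used})"
        return (
            f"wasDerivedFrom({_arg(self.ident)};{self.generated},{self.used},"
            f"{_arg(self.activity)},{_arg(self.generation)},{_arg(self.usage)})"
        )


if __name__ == "__main__":
    core = Derivation("e2", "e1")
    full = Derivation("e2", "e1", ident="id", activity="a", generation="g", usage="u")
    print(core.render())  # wasDerivedFrom(e2,e1)
    print(full.render())  # wasDerivedFrom(id;e2,e1,a,g,u)
```

Because both forms are instances of the same concept, describing them side by side in one document avoids repeating the concept in two places.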


Graham's proposal highlighted design principles; we should also consider modular design, which is mentioned in the same document. The components in prov-dm expose dependencies between concepts and, as Curt said, offer users different approaches to provenance: process-oriented with component 1, responsibility-oriented with component 2, and data-flow-oriented with component 3.

Thus, keeping the components and framing them in terms of core and advanced patterns will support this modularization.

Overall recommendation

  1. Better expose core patterns in Section 2
  2. Keep the component-based structure and the uniform presentation of concepts, marking core patterns throughout the document.
  3. WG to agree on core and advanced patterns, and expose them in both prov-o and prov-dm.
  4. WG to agree on terminology:
  • is it core/starting point/primitive/essential/...?
  • is it advanced/extended/...?