Recommendations for scenarios
We first analyze the three flagship scenarios for provenance that we have developed, and articulate recommendations based on those scenarios. Then, we decide on priorities for those recommendations.
The three flagship scenarios suggest the following recommendations. The first scenario leads to the most basic recommendations, the second to a broader set, and the third to the most extensive set.
Recommendations based on the News Aggregator Scenario
Recommendation #1: There should be a standard way to represent, at a minimum, three basic provenance entities:
- a handle (URI) to refer to an object (resource)
- a person/entity that the object is attributed to
- a processing step done by a person/entity to an object to create a new object
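As an illustration only, the three entities listed above could be sketched as the following data structures. The class names, field names, and example URIs are assumptions made for this sketch; they are not part of any recommendation or existing vocabulary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Resource:
    """An object, referred to by its handle (URI)."""
    uri: str


@dataclass(frozen=True)
class Agent:
    """A person or entity that an object can be attributed to."""
    name: str


@dataclass(frozen=True)
class ProcessingStep:
    """A step performed by an agent on input objects to create a new object."""
    label: str
    performed_by: Agent
    inputs: Tuple[Resource, ...]
    output: Resource


def attributed_to(resource: Resource, steps: List[ProcessingStep]) -> List[Agent]:
    """Return the agents whose processing steps produced the given resource."""
    return [step.performed_by for step in steps if step.output == resource]


# Hypothetical example: a news aggregator summarizes an article.
article = Resource("http://example.org/article/1")
summary = Resource("http://example.org/summary/1")
aggregator = Agent("Example Aggregator")
summarize = ProcessingStep("summarize", aggregator, (article,), summary)

producers_of_summary = attributed_to(summary, [summarize])
producers_of_article = attributed_to(article, [summarize])
```

In this sketch the summary is attributed to the aggregator, while the article has no recorded producer because no step in the list created it.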
Recommendation #2: A provenance framework should include a mechanism to access provenance-related information addressed by other standards, such as:
- licensing information of the object
- digital signature for the object
- digital signature for provenance records
Recommendation #3: A provenance framework should include a standard way for sites to make provenance information about their content available to other parties in a selective manner, and for others to access that provenance information.
Recommendations based on the Disease Outbreak Scenario
Additional issues raised by this scenario over those in the previous one:
Recommendation #4: A provenance framework should include a standard way to express the provenance of provenance assertions, as there can be several accounts of provenance that differ in granularity and may conflict with one another.
Recommendation #5: A provenance framework should include a representation of provenance that is detailed enough to allow the process to be reapplied and its result reproduced.
Recommendation #6: A provenance framework should allow referring to versions of objects as they evolve over time, or to temporal statements about when an object was created, modified, or accessed. In particular, it should provide for a representation of how one version (or parts thereof) was derived from another version (or parts thereof).
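A minimal sketch of what the versioning recommendation asks for, assuming a simple table that maps each version's URI to the version it was derived from and to its creation time. The URIs, dates, and function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

V1 = "http://example.org/report/v1"
V2 = "http://example.org/report/v2"
V3 = "http://example.org/report/v3"

# version URI -> (URI of the version it was derived from, creation time)
versions = {
    V1: (None, datetime(2010, 1, 5, tzinfo=timezone.utc)),
    V2: (V1, datetime(2010, 2, 1, tzinfo=timezone.utc)),
    V3: (V2, datetime(2010, 3, 10, tzinfo=timezone.utc)),
}


def lineage(uri, table):
    """Follow derivation links from a version back to the original version."""
    chain = []
    seen = set()  # guards against accidental cycles in the table
    while uri is not None and uri not in seen:
        seen.add(uri)
        chain.append(uri)
        uri = table[uri][0]
    return chain


def version_at(moment, table):
    """Return the latest version created at or before `moment`, or None."""
    candidates = [
        (created, uri) for uri, (_, created) in table.items() if created <= moment
    ]
    return max(candidates)[1] if candidates else None


history_of_v3 = lineage(V3, versions)
current_in_mid_february = version_at(
    datetime(2010, 2, 15, tzinfo=timezone.utc), versions
)
```

Here `history_of_v3` lists the third version followed by the versions it was derived from, and `current_in_mid_february` is the second version. The sketch tracks whole versions only; derivation between parts of versions would need a finer-grained table.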
Recommendations based on the Business Contract Scenario
Recommendation #7: A provenance framework should include a standard way to represent a procedure which has been enacted (in the scenario, this is needed to compare that procedure with what was required to be done).
Recommendation #8: A provenance framework should include a way to determine commonality of derivation in two resources (in the scenario, this is needed to judge the independence or otherwise of two reports).
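One way to picture commonality of derivation is as a graph search: collect everything each resource was derived from, directly or indirectly, and intersect the two sets. The sketch below assumes derivation is recorded as a plain mapping from a resource to its direct sources; the resource names and function names are hypothetical.

```python
def derivation_closure(resource, derived_from):
    """Return the resource together with everything it was derived from."""
    closure = {resource}
    stack = [resource]
    while stack:
        current = stack.pop()
        for source in derived_from.get(current, ()):
            if source not in closure:
                closure.add(source)
                stack.append(source)
    return closure


def common_sources(a, b, derived_from):
    """Return the resources that appear in the derivation of both a and b."""
    return derivation_closure(a, derived_from) & derivation_closure(b, derived_from)


# Hypothetical derivation records: resource -> direct sources
derived_from = {
    "report-A": ["survey-1", "dataset-X"],
    "report-B": ["dataset-X", "interview-2"],
    "report-C": ["interview-3"],
    "dataset-X": ["raw-measurements"],
}

shared_ab = common_sources("report-A", "report-B", derived_from)
shared_ac = common_sources("report-A", "report-C", derived_from)
```

In this example `shared_ab` contains `dataset-X` and `raw-measurements`, so reports A and B are not independent, while `shared_ac` is empty. An empty intersection only shows independence with respect to the derivations that were actually recorded.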
The group agreed that #1, #2, and #3 were the highest-priority recommendations and that they should be carried out by a provenance standardization effort.
Although the priorities of the recommendations depend on the context, these three are the most common and represent the core set of issues to be addressed.
Relationship with technical requirements
The priorities above can be elaborated in terms of the broad technical requirements that the group identified for provenance.
Looking at the short-term recommendations #1 to #3, the minimal technical requirements should result in a provenance model that describes processes which are applied to objects, are initiated by a person or entity, and lead to a newly created object. For this, we will need handles for referring to the objects subjected to those processes. There should also be at least one way of representing the persons or entities that initiated the process. The provenance information captured by this model must be accessible in a standardized and selective way, and mechanisms are needed to incorporate information described by other standards, e.g., licensing information and digital signatures.
The technical requirements specific to the content of the provenance information are addressed by the key dimensions attribution and process. More specifically, the technical requirements C-Attr-TR 1.1 and C-Attr-TR 2.1 to C-Attr-TR 2.3 will make it possible to attribute a process to objects or to persons or entities. The requirements C-Proc-TR1.1, C-Proc-TR2.1, C-Proc-TR3.1 and C-Proc-TR3.2 will make it possible to describe the processes clearly enough to reason over them.
Concerning the management of provenance information, the technical requirements grouped under the key dimensions publication, access, and dissemination are essential. The requirements M-Pub-TR 1.1 to M-Pub-TR 1.4 tackle the publication of the provenance information. Requirements M-Acc-TR 1.1, M-Acc-TR 1.3, and M-Acc-TR 1.4 provide access to the published provenance information, and requirements M-Diss-TR 2.1 and M-Diss-TR 2.2 make the provenance information available in a selective manner.
The requirements needed for the usability of the provenance information are those grouped under the understanding and interoperability dimensions. In particular, U-Under-TR 1.1 to U-Under-TR 3.2 make it possible to describe provenance using domain-specific vocabularies and to query the information using these vocabularies. The requirements U-Inter-TR 2.1.1, U-Inter-TR 3.1.1, and U-Inter-TR 5.1 to U-Inter-TR 5.3 make it possible to relate the provenance information to information addressed by other standards, e.g., digital signatures.
These technical requirements are the ones a standardization effort should focus on in the short term. They will already enable a great many of the use cases that rely on provenance information.
The other technical requirements, listed on the requirements page, play a crucial role in the long-term recommendations #4 to #8 and focus on more sophisticated problems such as versioning, comparison of provenance records, trust, and accountability.