Requirements

From XG Provenance Wiki

Introduction

This document describes the requirements that the Provenance-XG has extracted from the use cases it has collected. We organize these requirements according to the identified provenance dimensions. Each section has a series of User Requirements extracted from the dimension's associated exemplar use cases. Each User Requirement places requirements on technology; these Technical Requirements are organized under the User Requirement they support. Note that technical requirements may be duplicated, at least at the beginning, so that overlap between user requirements can be detected.

All requirements begin with the prefix for their dimension. Each User Requirement identifier continues with "UR" and the number of that requirement. Similarly, each Technical Requirement identifier continues with "TR", the number of the user requirement it falls under, and its own technical requirement number. Each User Requirement should name the use case from which it was extracted, in parentheses after the requirement. An example requirement identifier would be C-Attr-UR1.

Dimension Requirement Prefixes

To make it easier to reference various requirements for each dimension, we've established the following prefixes for each TR and UR under a dimension.

Dimension            Requirement Prefix
Content
  Attribution        C-Attr
  Process            C-Proc
  Versioning         C-Vers
  Justification      C-Just
  Entailment         C-Entail
Management
  Publication        M-Pub
  Access             M-Acc
  Dissemination      M-Diss
  Scale              M-Scale
Use
  Understanding      U-Under
  Interoperability   U-Inter
  Comparison         U-Comp
  Accountability     U-Acct
  Trust              U-Tru
  Imperfections      U-Imper
  Debugging          U-Debug
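
As an illustration of the naming convention, the following Python sketch splits a requirement identifier into its dimension prefix, its kind (UR or TR) and its number. The regular expression and the names `RequirementId` and `parse_requirement_id` are assumptions made for this example, not part of the XG's work. Identifiers on this page vary in case, spacing and separators (for example C-Attr-UR 1, C-Proc-TR1.1 and C-PROC-UR-1), so the pattern is deliberately tolerant; suffixed numbers such as "1b" are not handled.

```python
import re
from typing import NamedTuple, Optional

# Prefixes copied from the table of dimension prefixes; matching ignores case.
PREFIXES = [
    "C-Attr", "C-Proc", "C-Vers", "C-Just", "C-Entail",
    "M-Pub", "M-Acc", "M-Diss", "M-Scale",
    "U-Under", "U-Inter", "U-Comp", "U-Acct", "U-Tru", "U-Imper", "U-Debug",
]
_CANONICAL = {prefix.lower(): prefix for prefix in PREFIXES}

# <prefix>-<UR|TR><optional space or hyphen><number, possibly dotted>
_ID_PATTERN = re.compile(r"^([A-Za-z]-[A-Za-z]+)-(UR|TR)[\s-]*(\d+(?:\.\d+)*)$")


class RequirementId(NamedTuple):
    prefix: str   # canonical dimension prefix, e.g. "C-Attr"
    kind: str     # "UR" or "TR"
    number: str   # e.g. "1" or "3.2"


def parse_requirement_id(text: str) -> Optional[RequirementId]:
    """Return the parsed identifier, or None if it does not follow the convention."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        return None
    prefix = _CANONICAL.get(match.group(1).lower())
    if prefix is None:
        return None  # unknown dimension prefix
    return RequirementId(prefix, match.group(2), match.group(3))


print(parse_requirement_id("C-Attr-UR1"))
print(parse_requirement_id("C-PROC-UR-1"))
```

Normalizing the prefix to the spelling used in the table makes cross-references such as C-PROC-UR-1 and C-Proc-UR 1 compare equal.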

Requirements Guidelines

  • Requirements should aim for parsimony.
  • Requirements should be as specific to the dimension as possible.
  • Requirements may not be illustrated in the exemplar use cases. If this is the case the requirement should note this.
  • Technical requirements may be repeated across user requirements. These will be collated in the end.
  • If a requirement impacts on another area please refer to that area.

Content

Attribution

Exemplar Use Cases: collections, versions, disambiguation

  • C-Attr-UR 0: Be able to uniquely identify the source and the entity that are referred to by provenance.
  • C-Attr-UR 1: Determine who contributed to a document (disambiguation).
    • C-Attr-TR 1.1: The notion of contributor ID should be fixed so that data provider identities can be disambiguated.
  • C-Attr-UR 2: To be able to follow a process which will ensure correct management of provenance of data creators, including understanding of which content was created by which person (versions).
    • C-Attr-TR 2.1: There should be a notion of "creator of data" (defining in which cases a user can be considered a creator of a piece of data and in which cases only a collaborator).
    • C-Attr-TR 2.2: There should be a schema which leads the user to fulfill the requirements for correct management of provenance metadata (proper insertion/modification of data creators and collaborators).
    • C-Attr-TR 2.3: A system should track which portions of a document were produced by an entity.

Process

Exemplar Use Cases: biospecimens

  • C-Proc-UR 1: Be able to determine the influence of an agent on a given (sub)process.
    • C-Proc-TR1.1: The provenance metamodel used should be expressive enough to support the required reasoning capabilities.
  • C-Proc-UR 2: It should be possible to reason about the outcome of a given process, assuming changes in its preconditions. For example: "Does reducing the amount of H2O in a solution of 10g of NaCl influence its crystallization process?".
    • C-Proc-TR 2.1: In combination with the above mentioned reasoning capabilities, qualitative reasoning might be required to reason on the influence of the change of quantities in the entities participating in a given process.
  • C-Proc-UR 3: A process should be reproducible from its provenance graph.
    • C-Proc-TR3.1: The provenance metamodel should be rich enough in terms of process description to identify the key features of a process.
    • C-Proc-TR3.2: It should be possible to revert provenance graphs into enactable process models.
  • C-Proc-UR 4: It should be possible to compare processes with each other, group them into categories or clusters, and retrieve them on the basis of their provenance graphs. Additionally, it should be possible to establish analogies between different processes, even from different domains. This should facilitate process reuse.
    • C-Proc-TR4.1: Graph matching algorithms that allow two or more process provenance graphs to be compared should be applied.
    • C-Proc-TR4.2: It should be possible to abstract process provenance graphs from their domains, so that inter-domain analogies can be established.
  • C-Proc-UR 5: It should be possible to analyze the provenance of a process at different levels of granularity.
    • C-Proc-TR 5.1: Libraries of domain-independent overlays could be (re)used that describe processes at different levels of detail so that provenance graphs can be matched against them.
    • C-Proc-TR 5.2: It should be possible to group the entities present on a provenance graph according to different criteria like proximity, sequence of subprocess, etc.
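
One way to read C-Proc-TR3.2 is sketched below in Python: if a provenance graph records, for each step, the steps it depended on, a topological sort yields an order in which the steps could be enacted again. The step names and the `depends_on` structure are hypothetical, chosen only to echo the biospecimens use case; a real provenance graph would also carry parameters, agents and data, which this sketch ignores.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical provenance graph: each step maps to the steps it depended on.
depends_on = {
    "collect_specimen": set(),
    "freeze_specimen": {"collect_specimen"},
    "extract_dna": {"freeze_specimen"},
    "sequence_dna": {"extract_dna"},
}


def enactment_order(graph):
    """Return the steps in an order that respects every recorded dependency.

    Raises graphlib.CycleError if the graph contains a cycle, which a
    well-formed account of a past process should never do.
    """
    return list(TopologicalSorter(graph).static_order())


print(enactment_order(depends_on))
```

When several steps are independent of each other, more than one valid order exists and the sorter returns one of them.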

Versioning

Exemplar Use Cases: tweets, IQ assessment for linked data, timeliness

  • C-Vers-UR 1: Determine how content changes (i.e. its version) across the web and who is responsible for those changes (tweets)
    • C-Vers-TR 1.1: A system should keep track of the different versions of a published document, or piece of information.
    • C-Vers-TR 1.2: A system should keep track of what entity makes changes to the document
  • C-Vers-UR 1b: Determine and record how content has changed (tweets, IQ assessment for linked data, timeliness)
    • C-Vers-TR 1.1: A system should identify and record changes that are made to a document or data
    • C-Vers-TR 1.2: A system should define what constitutes a different version of a document or data
  • C-Vers-UR 2b: Determine and record when content was changed (tweets, IQ assessment for linked data, timeliness)
    • C-Vers-TR 2.1: A system should identify and record when content was changed
  • C-Vers-UR 3b: Determine and record who and/or what is responsible for the changes (tweets, IQ assessment for linked data, timeliness)
    • C-Vers-TR 3.1: A system should identify and record who and/or what changed the version, document or data.
  • C-Vers-UR 4b: Record why changes were made so that they may be retrieved later (IQ assessment for linked data)
    • C-Vers-TR 4.1: A system should allow an explanation for a change to be provided and recorded
  • C-Vers-UR 5b: View and extract some or all of the provenance records as required (tweets, IQ assessment for linked data, timeliness)
    • C-Vers-UR 5.1: Permit anonymization of the records before they are viewed or extracted (add to use cases?)
    • C-Vers-UR 5.2: Extract some or all of those records (either anonymized or not) (add to use cases?)
    • C-Vers-UR 5.3: Permit a user to access those records anonymously (optional) (add to use cases?)
      • C-Vers-TR 5.1: A system should allow those records (or parts of those records) to be viewed or extracted as required.
      • C-Vers-TR 5.2: A system should be capable of anonymizing the personal data held in those records before viewing or extraction.
      • C-Vers-TR 5.3: A system should be capable of authenticating a user but allowing him/her to access the provenance records anonymously.
  • C-Vers-UR 6b: Use across different platforms
    • C-Vers-TR 6b: A system should be interoperable
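
The versioning requirements above (what changed, when, by whom, why, and optional anonymization on extraction) can be pictured with a small Python sketch. The classes `Change` and `VersionHistory` are hypothetical, and replacing the author with the constant "anonymous" is only a placeholder for a real anonymization scheme.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    version: int
    content: str                  # what the content became (C-Vers-TR 1.1)
    changed_by: str               # who or what made the change (C-Vers-TR 3.1)
    changed_at: datetime          # when the change was made (C-Vers-TR 2.1)
    reason: Optional[str] = None  # why, if an explanation was given (C-Vers-TR 4.1)


class VersionHistory:
    def __init__(self) -> None:
        self._changes: List[Change] = []

    def record(self, content, changed_by, reason=None, changed_at=None) -> Change:
        """Append a new version; every recorded change counts as a new version here."""
        change = Change(
            version=len(self._changes) + 1,
            content=content,
            changed_by=changed_by,
            changed_at=changed_at or datetime.now(timezone.utc),
            reason=reason,
        )
        self._changes.append(change)
        return change

    def extract(self, anonymize: bool = False) -> List[Change]:
        """Return a copy of the records, optionally hiding who made each change."""
        if not anonymize:
            return list(self._changes)
        return [replace(change, changed_by="anonymous") for change in self._changes]


history = VersionHistory()
history.record("first draft", "alice")
history.record("second draft", "bob", reason="fixed a typo")
for change in history.extract(anonymize=True):
    print(change.version, change.changed_by, change.reason)
```

What counts as a different version (C-Vers-TR 1.2) is left open by the requirements; the sketch simply treats each recorded change as one.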

Justification

Exemplar Use Cases: policy, engineering

  • C-JUST-UR1: The end results of engineering processes or scientific studies need to be justified by linking to source and intermediate data.
    • C-JUST-UR1.1: The justification should distinguish between source and derived data, and show how intermediate results were obtained (Cf. C-PROC-UR-1, C-ENTAIL-UR-1, U-UNDER-UR-1, U-UNDER-UR-2)
    • C-JUST-UR1.2: Links need to relate both public and confidential materials stored both online and offline (cf. C-ENTAIL-UR-6, M-DISS-UR-2)
  • C-JUST-UR2: The justification should facilitate informed discussion and decisions about the results (Cf. U-TRU-*)
    • C-JUST-UR2.1: Non-experts should be able to make basic judgments about the validity of the end results based on the supporting information, possibly using automatically constructed "executive summaries". (Cf. C-PROC-UR-3, C-PROC-UR-5, C-ENTAIL-UR-3, M-ACC-UR-2, M-ACC-UR-3, U-UNDER-UR4, U-IMPER-2)
    • C-JUST-UR2.2: Experts should be able to test the robustness of the conclusions by considering alternative steps or modeling choices, or identifying likely "points of failure" (Cf. M-DISS-UR-1, U-UNDER-UR-3, U-DEBUG-UR-1)
    • C-JUST-UR2.3: Automatic tools (used by both experts and non-experts) should be able to perform basic checks of the authenticity, integrity and validity of the justification, to verify that a study or engineering process followed regulations, or to decide among different supported conclusions making competing claims (Cf. M-DISS-UR-1, U-INTER-TR1.3, U-ACCT-UR-1, U-COMP-UR1)
  • C-JUST-UR3: The justification should be preserved so that the actual long-term behavior of a product, or effects of a policy can be compared with predictions
    • C-JUST-TR3.1: The data and supporting information should be archived securely in a stable long-term preservation format
    • C-JUST-TR3.2: Changes to the source data or justification over time need to be recorded to keep the records consistent (C-VERS-*, M-DISS-UR4)
  • C-JUST-UR4: As much as possible of the above processes should be automated, to reduce effort and ensure compliance with regulations.
    • C-JUST-TR4.1: The "correct" behavior of unsupervised justification (for individual or networked systems) needs to be specified and agreed with regulatory agencies
    • C-JUST-TR4.2: There should be a clear description of the strengths and limitations of the system (i.e., what it can and should not be expected to do automatically)
    • C-JUST-TR4.3: Systems must be proved correct, or their actual behavior must be verifiable by users
    • C-JUST-TR4.4: Standards for exchanging justified, or provenance-carrying data should support collaboration among many users or systems (CF U-INTER-UR-2, U-INTER-UR-4, U-INTER-UR-5)

SEE ALSO: Trust, attribution, process.

Entailment

Exemplar Use Cases: environment, axioms

  • C-Entail-UR 1: Decide the trustworthiness of a reasoning process or a materialized view.
    • C-Entail-TR 1.1: Record the provenance of data at both coarse and fine granularity levels. In the context of RDF, recording provenance at a fine granularity level amounts to recording the provenance of a single triple, whereas recording provenance at a coarse granularity level amounts to recording the provenance of a collection of RDF triples.
    • C-Entail-TR 1.2: Provide a provenance model for the SPARQL language that determines the provenance of the result of a SPARQL query.
  • C-Entail-UR 2: Maintain a materialized view of the data at the lowest possible cost (when insertions and deletions are considered)
    • C-Entail-TR 2.1: Record the provenance of data at both coarse and fine granularity levels. In the context of RDF, recording provenance at a fine granularity level amounts to recording the provenance of a single triple, whereas recording provenance at a coarse granularity level amounts to recording the provenance of a collection of RDF triples.
    • C-Entail-TR 2.2: Provide a provenance model for the SPARQL language that determines the provenance of the result of a SPARQL query.
  • C-Entail-UR 3: Be aware of the sources that contributed to the materialized view, at both coarse and fine granularity levels. More specifically, we would like to know the database, table, or tuple thereof that was used in the computation of the view.
    • C-Entail-TR 3.1: Record the provenance of data at both coarse and fine granularity levels. In the context of RDF, recording provenance at a fine granularity level amounts to recording the provenance of a single triple, whereas recording provenance at a coarse granularity level amounts to recording the provenance of a collection of RDF triples.
    • C-Entail-TR 3.2: Provide a provenance model for the SPARQL language that determines the provenance of the result of a SPARQL query.
  • C-Entail-UR 4 (Entailment Specific): Be aware of the sources that contributed to an implied triple using the RDFS reasoning mechanisms.
    • C-Entail-TR 4.1: Record the provenance of data at both coarse and fine granularity levels. In the context of RDF, recording provenance at a fine granularity level amounts to recording the provenance of a single triple, whereas recording provenance at a coarse granularity level amounts to recording the provenance of a collection of RDF triples.
    • C-Entail-TR 4.2: Extend the RDFS inference rules to capture the provenance of the implied/inferred triples.
  • C-Entail-TR 5: Be able to differentiate between asserted and derived information. Derivation should be considered broadly varying from automated reasoning to human reasoning to steps of scientific processes to query processing (e.g., SQL, SPARQL).
  • C-Entail-UR 6: Identify agents (e.g., humans and software components) responsible for conclusion derivation
  • C-Entail-UR 7: Identify the transformation pattern used to derive conclusions:
    • C-Entail-TR 7.1: For a conclusion that is data, identify the algorithm or heuristic used
    • C-Entail-TR 7.2: For a conclusion that is a logic formula, identify the inference rule used
  • C-Entail-UR 8: Identify the date and time of the derivation
    • C-Entail-TR 8.1: Annotate the date and time of the derivation
  • C-Entail-UR 9: Identify any input information directly and indirectly used to derive conclusions
    • C-Entail-TR 9.1: Identify Parameters directly and indirectly used to derive conclusions
    • C-Entail-TR 9.2: Identify Assumptions directly and indirectly used to derive conclusions
    • C-Entail-TR 9.3: Identify Hypotheses directly and indirectly used to derive conclusions
    • C-Entail-TR 9.4: Identify Other conclusions directly and indirectly used to derive conclusions
  • C-Entail-UR 10: Be able to veto the use of conclusions derived according to any combination of C-Entail-UR 6, C-Entail-UR 7, C-Entail-UR 8 and C-Entail-UR 9
  • C-Entail-UR 11: Be able to prefer the use of conclusions derived according to any combination of C-Entail-UR 6, C-Entail-UR 7, C-Entail-UR 8 and C-Entail-UR 9
  • C-Entail-UR 12: Record multiple justifications for the derivation of a given conclusion:
    • C-Entail-TR 12.1: Through multiple derivation processes
    • C-Entail-TR 12.2: Through a combination of derivation processes and assertions
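
To make C-Entail-TR 4.2 more concrete, here is a minimal Python sketch that applies a single RDFS rule (rdfs9: if C is a subclass of D and x has type C, then x has type D) and attaches to each inferred triple the union of the sources of its premises. The triple representation, the source names and the function `infer_with_provenance` are assumptions for illustration. The sketch keeps only the first derivation it finds, so it does not cover multiple justifications (C-Entail-UR 12), and it ignores all other RDFS rules.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_with_provenance(
    asserted: Dict[Triple, FrozenSet[str]]
) -> Dict[Triple, FrozenSet[str]]:
    """Apply rdfs9 until nothing new is derived.

    `asserted` maps each asserted triple to the sources that stated it.
    The result maps each inferred triple to the sources of its premises.
    """
    known = dict(asserted)
    inferred: Dict[Triple, FrozenSet[str]] = {}
    changed = True
    while changed:
        changed = False
        for (sub, p1, sup), sub_sources in list(known.items()):
            if p1 != SUBCLASS:
                continue
            for (instance, p2, cls), type_sources in list(known.items()):
                if p2 != TYPE or cls != sub:
                    continue
                conclusion = (instance, TYPE, sup)
                if conclusion not in known:
                    sources = sub_sources | type_sources
                    known[conclusion] = sources
                    inferred[conclusion] = sources
                    changed = True
    return inferred


asserted = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"): frozenset({"graphA"}),
    ("ex:Mammal", SUBCLASS, "ex:Animal"): frozenset({"graphB"}),
    ("ex:rex", TYPE, "ex:Dog"): frozenset({"graphC"}),
}
for triple, sources in infer_with_provenance(asserted).items():
    print(triple, sorted(sources))
```

Keeping asserted and inferred triples in separate maps also gives a simple way to tell asserted from derived information.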

Management

Publication

Exemplar Use Cases: timeliness

  • M-Pub-UR 1: Publish provenance information associated with data on the Web, including the "Web of Data" (timeliness)
    • M-Pub-TR 1.1: Tools should be available allowing data publishers to publish the associated provenance information.
    • M-Pub-TR 1.2: Tools should enable tracking provenance for the process of accessing data on the Web.
    • M-Pub-TR 1.3: Tools to publish provenance information at different levels of granularity and being able to extract the provenance information at different levels of granularity
    • M-Pub-TR 1.4: Tools to publish provenance information that facilitates interoperability of provenance information
  • M-Pub-UR2: Publish provenance in a way that makes it easy to access and query
  • M-Pub-UR3: Choose a representation format to publish provenance information
  • M-Pub-UR4: Users need to identify who published the provenance information

Access

Exemplar Use Cases: experiments

  • M-Acc-UR 1: Given an entity, the user must be able to query a single source or federation of sources for provenance information directly applicable to that entity (experiments)
    • M-Acc-TR 1.1: Applications should support specialized provenance query infrastructure (e.g. query operators).
    • M-Acc-TR 1.2: Applications should support computation of the transitive closure of the derivation of an entity and the transitive closure of entities that have been derived from that entity, both within and across sets of provenance information.
    • M-Acc-TR 1.3: Provide consistent identifiers to allow sameAs-style identification of entities between sets of provenance information. (experiments)
    • M-Acc-TR 1.4: Applications should provide efficient access mechanisms over large provenance datasets and be scalable.
  • M-Acc-UR 2: Given a set of provenance information, the user must be able to determine the source and authority of the provenance author. (experiments)
    • M-Acc-TR 1.1: Applications should be able to access source information for provenance authors.
  • M-Acc-UR 3: Given a set of provenance information, the user must be able to query a single source or federation of sources to find partial matches of that provenance information in other sets of provenance information, usually to corroborate the history of any of the entities in the set. (experiments)
  • M-Acc-UR 4: Provide a way for stable provenance information to survive deidentification processes without endangering privacy.
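
M-Acc-TR 1.2 asks for transitive closure in both directions and across sets of provenance information. The Python sketch below shows one simple way to do this with a breadth-first traversal. The store layout (entity mapped to the entities it was derived from), the entity names and the helper names are hypothetical, and the sketch assumes that the same entity carries the same identifier in every store, which is exactly what M-Acc-TR 1.3 is meant to ensure.

```python
from collections import deque
from typing import Dict, Set

Store = Dict[str, Set[str]]  # entity -> entities it was directly derived from


def merge(*stores: Store) -> Store:
    """Combine several provenance stores, assuming shared identifiers."""
    merged: Store = {}
    for store in stores:
        for entity, parents in store.items():
            merged.setdefault(entity, set()).update(parents)
    return merged


def invert(store: Store) -> Store:
    """Turn 'derived from' edges into 'was used to derive' edges."""
    inverted: Store = {}
    for entity, parents in store.items():
        for parent in parents:
            inverted.setdefault(parent, set()).add(entity)
    return inverted


def transitive_closure(start: str, store: Store) -> Set[str]:
    """All entities reachable from `start` by following edges in `store`."""
    seen: Set[str] = set()
    queue = deque(store.get(start, ()))
    while queue:
        entity = queue.popleft()
        if entity in seen:
            continue
        seen.add(entity)
        queue.extend(store.get(entity, ()))
    return seen


lab_a = {"plot": {"table"}, "table": {"raw"}}
lab_b = {"raw": {"sample"}, "paper": {"plot"}}
combined = merge(lab_a, lab_b)
print(sorted(transitive_closure("plot", combined)))         # ancestors of plot
print(sorted(transitive_closure("raw", invert(combined))))  # entities derived from raw
```

For large provenance datasets (M-Acc-TR 1.4) an in-memory traversal like this would not be enough; it only illustrates the query being asked for.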

Dissemination

Exemplar Use Cases: privacy, engineering

  • M-Diss-UR 1: Verify that data, disseminated to some entity for processing, was processed for a purpose which was valid under some generally applied rules of validity, or as stated by the entity upon requesting the data. (privacy)
    • M-Diss-TR 1.1: Represent purposes of using data in a way which can be compared against the provenance of its usage.
    • M-Diss-TR 1.2: Represent the provenance of disseminated data in a way which allows its usage to be checked against pre-stated purposes.
    • M-Diss-TR 1.3: Provide mechanisms to examine data's provenance to check for correct usage according to pre-stated purpose.
    • M-Diss-TR 1.4: Make the provenance representation non-forgeable (non-repudiation, no man-in-the-middle attack)
  • M-Diss-UR 2: Verify that data, disseminated to some entity for processing, was processed only by that entity. (privacy)
    • M-Diss-TR 2.1: Ensure that any data's provenance information includes the verifiable identity of the entities by which it has been received or has been processed.
    • M-Diss-TR 2.2: Provide mechanisms to verify that data was processed by a particular entity only, by examining the provenance of that data.
  • M-Diss-UR 3: Verify that all of a set of data, disseminated to some entity for processing, was used in that processing. (privacy)
    • M-Diss-TR 3.1: Provide mechanisms to parse provenance information and determine whether all of a set of data was used in a well-defined process.
  • M-Diss-UR 4: Check which uses of some data are affected by a change in that data, including those of remote, independent users who copied the data long before the change. (engineering)
    • M-Diss-TR 4.1: Link data and physical artefacts (e.g. design prototypes) to the data from which they are derived.
  • M-Diss-UR 5: Prove or disprove that a resource produced by one user is actually derived from a design produced by another user (engineering)
    • M-Diss-TR 5.1: Represent the provenance of a resource such that it is apparent from what it is not derived, as well as from what it is derived.
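
The checks described in M-Diss-UR 1 and M-Diss-UR 2 can be sketched in Python as a comparison between recorded usage and what was agreed when the data was disseminated. `UsageRecord` and `check_usage` are hypothetical names, and purposes are represented as plain strings, which is a simplification. The sketch trusts the records it is given and therefore does nothing towards M-Diss-TR 1.4.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class UsageRecord:
    data_id: str       # which data was used
    processed_by: str  # the entity that processed it
    purpose: str       # the purpose recorded for that processing


def check_usage(
    records: Iterable[UsageRecord],
    data_id: str,
    recipient: str,
    allowed_purposes: Set[str],
) -> List[str]:
    """Return a description of every recorded use that breaks the agreement."""
    violations = []
    for record in records:
        if record.data_id != data_id:
            continue
        if record.processed_by != recipient:
            violations.append(
                f"{data_id} was processed by unexpected entity {record.processed_by}"
            )
        if record.purpose not in allowed_purposes:
            violations.append(
                f"{data_id} was used for undeclared purpose {record.purpose}"
            )
    return violations


records = [
    UsageRecord("record-17", "clinic", "treatment"),
    UsageRecord("record-17", "advertiser", "marketing"),
]
for problem in check_usage(records, "record-17", "clinic", {"treatment"}):
    print(problem)
```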

Scale

Exemplar Use Cases: blogs

  • M-Scale-UR 1: Allow provenance tracking for large scale data such as blog posts and other Web content (blogs)
    • M-Scale-TR 1.1: Tools to allow provenance tracking at fine level of granularity that scales with increasing size of data.
    • M-Scale-TR 1.2: Allow provenance tracking for aggregated content on the Web (for example, aggregating and republishing of blog posts).
    • M-Scale-TR 1.3: Tools to correctly attribute Web content using provenance information without violating privacy and privacy laws.

Use

Understanding

Exemplar Use Cases: domain, granularity

  • U-Under-UR 1: Allow users to search or browse the provenance of a collection of artifacts, using domain-specific vocabulary (domain)
    • U-Under-TR 1.1: It should be possible to complement a provenance graph with annotations that may include terms from a domain-specific vocabulary
    • U-Under-TR 1.2: It should be possible to query a provenance graph using a combination of domain-independent, common terms and properties as well as terms from a domain-specific vocabulary.
  • U-Under-UR 2: Enable users to search or browse the provenance of an artifact based on a combination of domain-specific metadata and descriptions about the sources that provide such metadata (granularity)
    • U-Under-TR 2.1: The metadata associated with an artifact should be annotated with its provenance
    • U-Under-TR 2.2: Searches through a collection of artifacts, which predicate on the artifacts' metadata, should also support predicates on the metadata's provenance
  • U-Under-UR 3: Be able to provide Subject Matter Experts (SMEs) with explanations of the rationale behind a certain outcome (domain)
    • U-Under-TR 3.1: It should be possible to match a provenance graph against pre-existing high-level, domain-independent representations of candidate reasoning types.
    • U-Under-TR3.2: Such templates, or overlays, should be rich enough and abundant enough to cover the majority of the possible reasoning behaviours for a given domain and task.
  • U-Under-UR4: Enable users to approach the provenance graph at different levels of detail, e.g. enabling understanding for users of different levels of expertise, ranging from novice to expert (granularity).
    • U-Under-TR4.1: The reasoning structures used to provide the explanations mentioned in UR 3 should be represented at different levels of granularity
    • U-Under-TR4.2: It should be possible to move between the different levels of granularity of the overlays, with such transitions being semantically described.
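
U-Under-TR 2.1 and 2.2 describe searches that predicate both on an artifact's metadata and on where that metadata came from. A minimal Python sketch, with hypothetical names (`MetadataValue`, `search`) and made-up artifacts, could look as follows.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class MetadataValue:
    value: str
    source: str  # provenance of the metadata statement itself


Artifact = Dict[str, MetadataValue]  # field name -> annotated value


def search(
    artifacts: Dict[str, Artifact],
    field: str,
    value: str,
    accepted_sources: Set[str],
) -> List[str]:
    """Names of artifacts whose `field` equals `value` according to an accepted source."""
    return sorted(
        name
        for name, metadata in artifacts.items()
        if field in metadata
        and metadata[field].value == value
        and metadata[field].source in accepted_sources
    )


artifacts = {
    "img1": {"species": MetadataValue("oak", "expert_survey")},
    "img2": {"species": MetadataValue("oak", "crowd_tag")},
    "img3": {"species": MetadataValue("elm", "expert_survey")},
}
print(search(artifacts, "species", "oak", {"expert_survey"}))
```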

Interoperability

Exemplar Use Cases: merging (in the rdf context, "entity" to be understood as resource or statement!)


  • U-Inter-UR 1: (Merging) Enable users/systems to merge metadata about a same "entity" according to its attribution/provenance
    • U-Inter-TR 1.1: Provide the means to query provenance of an "entity" (in the rdf context, "entity" to be understood as resource or statement!)
      • U-Inter-TR 1.1.1: A query should be able to identify specific "aspects" of provenance (such as attribution, source characteristics, etc.); hence some form of "filtering" is required.
      • U-Inter-TR 1.1.2: A query should be able to scope provenance (how far back do we go into the history). Note this was not specified in the use case but was discussed in Miles' requirements paper (http://eprints.ecs.soton.ac.uk/11189/)
    • U-Inter-TR 1.2: Given a document, image, file, etc, provide the means to obtain its provenance
      • U-Inter-TR 1.2.1: Find an authoritative provenance service to retrieve its provenance
    • U-Inter-TR 1.3: Given a resource and some provenance, provide the means to verify that the provenance is the one of that resource.

This user requirement is inspired by the Open Provenance Vision Scenario of http://eprints.ecs.soton.ac.uk/18176/

  • U-Inter-UR2: (Chained provenance) Enable users/systems to trace back the origin of an "entity" whose ancestors have been produced/generated by different systems.
    • U-Inter-TR 2.1: Provide a common representation of provenance offering a technology-independent description of how an entity was derived
      • U-Inter-TR 2.1.1: Provide a computer parseable notation for provenance
      • U-Inter-TR 2.1.2: Provide a user-oriented notation for provenance, e.g. a graphical notation
    • U-Inter-TR 2.2: Provide a mechanism to integrate part of a history in system x and part of history in system y.
      • U-Inter-TR 2.2.1: Provide the means to automatically propagate provenance information as data is exchanged between systems


This user requirement is the counterpart to the two previous ones: for provenance to be queryable, it needs to have been asserted.

  • U-Inter-UR3: (Record) Enable users/systems to express the provenance of an "entity" and make it persistent.
    • U-Inter-TR 3.1: Provide the means to express the provenance of an "entity" (in the rdf context, "entity" to be understood as resource or statement!)
      • U-Inter-TR 3.1.1: Provide a language/notation to express provenance
    • U-Inter-TR 3.2: Provide the means to record the provenance of an "entity"
      • U-Inter-TR 3.2.1: Provide the means for multiple components to each assert part of the provenance of an "entity"
    • U-Inter-TR 3.3: Allow for multiple provenance repositories/stores to be used


  • U-Inter-UR4: (Interoperability) When users access metadata merged from several sources, users expect to be able to trace the source of the metadata information, so that they can trust the merged information and explain any conflicting information.
    • U-Inter-TR 4.1: Applications should be able to record the source of metadata at different levels of granularity. For example, if the metadata information is published as RDF, the applications should be able to track the provenance of each RDF statement or a collection of RDF statements, depending on the requirements of the system.
    • U-Inter-TR 4.2: Applications should be able to track the steps taken to merge metadata from different sources.
    • U-Inter-TR 4.3: Applications should be able to find the source of each piece of merged metadata information and present them to the users.


  • U-Inter-UR 5: Facilitate data sharing through interoperability of the associated provenance information
    • U-Inter-TR 5.1: Provenance models to ensure conceptual clarity of provenance terms (for example, Provenir ontology and OPM)
    • U-Inter-TR 5.2: Ensure consistent use of provenance terms to reduce terminological heterogeneity (for example, naming conflict and data unit conflicts)
    • U-Inter-TR 5.3: Allow representation of domain-specific provenance details while ensuring provenance interoperability
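
The merging scenario of U-Inter-UR4 can be illustrated with a short Python sketch that merges metadata from several sources while remembering which source supplied each value, so that conflicting values can be traced and explained. The function names and the sample values are made up for this example; the sketch tracks sources per field value, which is only one of the granularity levels mentioned in U-Inter-TR 4.1.

```python
from collections import defaultdict
from typing import Dict, List

Metadata = Dict[str, str]              # field -> value
Merged = Dict[str, Dict[str, List[str]]]  # field -> value -> sources that gave it


def merge_metadata(sources: Dict[str, Metadata]) -> Merged:
    """Merge metadata from named sources, keeping the source of every value."""
    merged = defaultdict(lambda: defaultdict(list))
    for source_name, metadata in sources.items():
        for field, value in metadata.items():
            merged[field][value].append(source_name)
    return {field: dict(values) for field, values in merged.items()}


def conflicts(merged: Merged) -> Merged:
    """Fields for which the sources disagree."""
    return {field: values for field, values in merged.items() if len(values) > 1}


sources = {
    "catalogue_a": {"title": "Example Title", "year": "1603"},
    "catalogue_b": {"title": "Example Title", "year": "1602"},
}
merged = merge_metadata(sources)
print(merged["title"])
print(conflicts(merged))
```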

Comparison

Exemplar Use Cases: differences

  • U-Comp-UR 1: Enable users to determine the similarities and differences between past processes or events
    • U-Comp-TR 1.1: Allow records of past processes to be represented in such a way as to be able to compare their parts
    • U-Comp-TR 1.2: Provide a means for two processes or parts of processes to be treated as comparable (about the same thing), either by manual assertion or automatic deduction
    • U-Comp-TR 1.3: Provide a mechanism to take two comparable processes and produce a helpful explanation of their similarities and differences
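
As a rough illustration of U-Comp-TR 1.1 to 1.3, the Python sketch below compares two process records step by step. It assumes that steps with the same name have already been declared comparable (the "manual assertion" option of U-Comp-TR 1.2) and that each step is described by a flat dictionary of attributes; both are simplifications, and `compare_processes` is a hypothetical name.

```python
from typing import Any, Dict

Process = Dict[str, Dict[str, Any]]  # step name -> attributes of that step


def compare_processes(first: Process, second: Process) -> Dict[str, Any]:
    """Summarize which steps are shared, which differ, and which are unique."""
    shared = first.keys() & second.keys()
    changed = {}
    for step in sorted(shared):
        if first[step] == second[step]:
            continue
        attributes = sorted(first[step].keys() | second[step].keys())
        # Record (value in first, value in second) for each differing attribute.
        changed[step] = {
            name: (first[step].get(name), second[step].get(name))
            for name in attributes
            if first[step].get(name) != second[step].get(name)
        }
    return {
        "only_in_first": sorted(first.keys() - second.keys()),
        "only_in_second": sorted(second.keys() - first.keys()),
        "same": sorted(step for step in shared if first[step] == second[step]),
        "changed": changed,
    }


run_1 = {"heat": {"temp": 80}, "stir": {"rpm": 200}, "filter": {}}
run_2 = {"heat": {"temp": 95}, "stir": {"rpm": 200}, "dry": {}}
print(compare_processes(run_1, run_2))
```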

Accountability

Exemplar Use Cases: contracts, compliance

  • U-Acct-UR 1: Allow users to verify that the work performed meets the contract decided upon earlier (contracts)
    • U-Acct-TR 1.1: contracts should have a machine-understandable representation in terms of data to be used, methods to be performed, time constraints, constraints on agent, and other obligations
    • U-Acct-TR 1.2: provenance should be expressed in terms of workflow such that it can be matched against the contract
    • U-Acct-TR 1.3: identity handling to map local identifiers into global identifiers (such as URIs ?)
    • U-Acct-TR 1.4: should be possible to suppress certain details in provenance such as certain processes or identity of samples
    • U-Acct-TR 1.5: should be possible to use (signed) statements from trusted third parties in place of portions (either secret or lengthy) of provenance
    • U-Acct-TR 1.6: requires mechanisms for establishing trustworthiness between parties and verifying the integrity of signed statements
  • U-Acct-UR 2: Enable users to determine that the compiled document is compliant with the licenses of all its parts (compliance)
    • U-Acct-TR 2.1: provenance information should identify images, media and other works used along with their licenses
    • U-Acct-TR 2.2: licenses should be machine-understandable
    • U-Acct-TR 2.3: it should be possible to extract licenses of all work used in the document and reason over them to ensure that (i) each work is used in compliance with its license, (ii) the different licenses are compliant with each other and no conflicts occur
  • U-Acct-UR 3: Enable users to find an appropriate license for the document (compliance)
    • U-Acct-TR 3.1: provenance information should identify images, media and other works used along with their licenses
    • U-Acct-TR 3.2: it should be possible to infer compatible license for resultant document given the licenses of all works it uses
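
U-Acct-TR 1.1 and 1.2 call for a machine-understandable contract that provenance can be matched against. The Python sketch below shows one very reduced form of such a match, covering only required methods, permitted agents and a deadline. The `Contract` and `Step` classes and their fields are assumptions made for this illustration, not a proposed contract language.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Contract:
    required_methods: FrozenSet[str]  # methods that must be performed
    allowed_agents: FrozenSet[str]    # constraint on who may perform them
    deadline: datetime                # time constraint


@dataclass(frozen=True)
class Step:
    method: str
    agent: str
    finished_at: datetime


def contract_violations(contract: Contract, steps: Sequence[Step]) -> List[str]:
    """Compare the recorded steps with the contract and list every mismatch."""
    violations = []
    performed = {step.method for step in steps}
    for method in sorted(contract.required_methods - performed):
        violations.append(f"required method not performed: {method}")
    for step in steps:
        if step.agent not in contract.allowed_agents:
            violations.append(f"{step.method} performed by unauthorised agent {step.agent}")
        if step.finished_at > contract.deadline:
            violations.append(f"{step.method} finished after the deadline")
    return violations


contract = Contract(
    required_methods=frozenset({"sequence", "report"}),
    allowed_agents=frozenset({"lab_a"}),
    deadline=datetime(2010, 6, 1),
)
steps = [
    Step("sequence", "lab_a", datetime(2010, 5, 1)),
    Step("report", "lab_b", datetime(2010, 6, 2)),
]
for problem in contract_violations(contract, steps):
    print(problem)
```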

Did not have time to think about how to integrate:

  • Acct-UR1: Allow users to trust the evidence for the compliance of a project or a document by enabling assertions of provenance with the normal guarantees required for digital records
    • Acct-TR1.1: The ability to attach digital signatures proving the identity of the witness asserting a provenance account (it should be nearly impossible to change signed provenance without detection)
    • Acct-TR1.2: The ability to prove the time at which a signed provenance account was signed/created (it should be nearly impossible to provide such timestamps that are for times before or after the current time)
    • Acct-TR1.3: Non-repudiation: It should be nearly impossible to deny having asserted a provenance account once a signature and timestamp have been distributed.
    • Acct-TR1.4: Canonical form: To support use of cryptographic signatures, there should be a canonical form defined for provenance accounts that allows different provenance engines to agree on the byte-level form of a provenance record.
    • Acct-TR1.5: Representation of such signatures and timestamps should have standardized form documenting all processing required to verify them, as XMLSignature does for XML fragments.
    • Acct-TR1.6: In addition to standard signature and timestamp verification, it should be possible to test whether a timestamp is valid by comparing the timestamp time with artifact creation times/process run times within the provenance account being timestamped (one can’t validly claim to have witnessed things before they are alleged to have happened)
  • Acct-UR2: Users should be able to sign provenance accounts created at different levels of detail
    • Acct-TR2.1: It should be possible to define the level of detail being signed in standard ways. Minimally by specifying the exact set of statements being signed, but potentially by constraining the types of artifacts and processes being signed (only permanent files, only write operations) or the set of attributes (just Dublin Core) being included, etc.
  • Acct-UR3: Users should be able to decide whether a more detailed provenance description is fully consistent with a less detailed one (the less detailed one can be validly inferred from the detailed one without requiring additional provenance statements)
    • Acct-TR3.1: A provenance standard should include the definition of rules of inference or assertions that, when applied, preserve causal relations (e.g. transitivity: if a<--b<--c, then a<--c). Note: if a, b, c are non-atomic, e.g. represent document files, transitivity is not guaranteed (if a only contains chapter 1 and b has an added chapter 2 with c containing only chapter 2, c is not causally related to a in terms of file operations on bytes). This issue has been discussed extensively in the creation of OPM and has been seen to severely limit valid inferences.
    • Acct-TR3.2: It should be possible to specify when a provenance record is claimed to be complete w.r.t. artifacts and processes of given types (relevant for Acct-UR5 as well)
  • Acct-UR4: Users should be able to provide notarization and signed statements about provenance graphs derived from more detailed accounts or ensembles of accounts, without exposing the more detailed information
    • Acct-TR4.1: It should be possible to define the set of operations performed on provenance accounts and provenance signatures and timestamps (the provenance of provenance) in a standardized way to support signing of such metaprovenance, i.e. being able to sign a statement that says “after verifying the signatures and timestamps on accounts A and B, I claim that if you trust the signers of A and B and their clocks, the following account C is true” (in other words: although you can’t view the contents of a set of accounts validly signed and/or dated as documented here, I claim that the account I’ve also signed here is implied by them).
  • Acct-UR5: Users should be able to make non-quantitative (relative or qualitative) assertions comparing provenance accounts with each other or with workflow templates (contracts, recipes), e.g. “fewer than three eggs were harmed in the making of this particular cake”, “text A was created before text B”, “account A is consistent with workflow B ([ab, ac] does not contradict [bc] unless b was created after c)”
    • Acct-TR5.1: It should be possible to use assertions about causality in combination with metadata about artifacts and processes and the semantics of that metadata to develop rules that would identify provenance accounts as inconsistent in cases that could not be identified solely based on causality (related to discussions of the definition of what an OPM Profile should be).
    • Acct-TR5.2: Provenance should minimally standardize semantics related to time constraints and aliases (ID mapping)
    • Acct-TR5.3: It should be possible to specify the effect of processes on attribute values (physical processes conserve mass, exothermic processes cause temperature increases, etc.) to support analysis of whether provenance accounts are complete
  • Acct-UR6: Users should be able to compare provenance with derivation rules in content (artifact) licenses (i.e. copyright) to assess consistency
    • Acct-TR6.1: It should be possible to annotate provenance artifacts as copyrighted/licensed entities
    • Acct-TR6.2: The semantics of copyrights should be standardized w.r.t. provenance (i.e. it should be possible to decide whether a provenance account is consistent with or implies a copyright derivation relationship, constitutes fair use, etc.)
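
Two of the accountability checks above lend themselves to a small illustration: the timestamp plausibility test in Acct-TR1.6 and the "less detailed account is implied by the more detailed one" test in Acct-UR3. The following is only a minimal sketch under stated assumptions, not part of any provenance standard. The function names are hypothetical, a pair (a, b) stands for the relation written a<--b in Acct-TR3.1, and all artifacts are assumed to be atomic so that transitivity holds (the note on Acct-TR3.1 explains why this fails for non-atomic artifacts).

```python
from datetime import datetime


def timestamp_is_plausible(witness_time, event_times):
    """Acct-TR1.6 sketch: a witness timestamp is only plausible if it is
    not earlier than any creation/run time recorded in the account."""
    return all(witness_time >= t for t in event_times)


def transitive_closure(edges):
    """Return all pairs derivable from `edges` by transitivity.

    Each pair (a, b) stands for 'a<--b'. Assumes atomic artifacts.
    """
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_implied_by(summary_edges, detailed_edges):
    """Acct-UR3 sketch: every statement in the less detailed account must
    be derivable from the detailed account without extra statements."""
    return set(summary_edges) <= transitive_closure(detailed_edges)


# Hypothetical example data.
detailed = {("a", "b"), ("b", "c")}   # a<--b<--c
summary = {("a", "c")}                # a<--c
print(is_implied_by(summary, detailed))

signed_at = datetime(2010, 5, 2)
recorded = [datetime(2010, 5, 1), datetime(2010, 4, 30)]
print(timestamp_is_plausible(signed_at, recorded))
```

A real check would also have to handle clock skew, time zones, and the completeness claims of Acct-TR3.2, none of which this sketch attempts.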

Trust

Exemplar Use Cases: associations, assessment

  • U-Tru-UR 1: Enable users to assess the trustworthiness of Web data. (assessment)
    • U-Tru-TR 1.1: Allow applications to associate source information with aggregated data.
    • U-Tru-TR 1.2: Applications should be able to use source information to compute trust associated with data.
    • U-Tru-TR 1.3: Allow versions of trust to reflect different versions of source information (used to compute trust).
    • U-Tru-TR 1.4: Applications should be able to deal with missing provenance information (see also Imperfections)
  • U-Tru-UR 2: It should be possible for users to assess trust in Web data based on its attribution metadata (associations)
    • U-Tru-TR 2.1: Web data should be annotated with attribution metadata
    • U-Tru-TR 2.2: Attribution metadata should be expressed in a formal and machine-processable language
  • U-Tru-UR 3: Allow users to interpret the evaluation of the trustworthiness of Web data
    • U-Tru-TR 3.1: Applications should enable users to understand the process used to compute trust.
    • U-Tru-TR 3.2: Applications should enable users to understand the measurement value of trust.
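
To make the trust requirements above more concrete, here is a minimal sketch of how an application might compute trust from source information (U-Tru-TR 1.2), cope with missing provenance (U-Tru-TR 1.4), and explain the result (U-Tru-TR 3.1). Everything in it is an assumption made for illustration: the function name, the per-source trust table, the default score for unknown sources, and the choice of a plain average. The requirements themselves do not prescribe any particular trust metric.

```python
def compute_trust(items, source_trust, default=0.5):
    """Average per-source trust over aggregated data items.

    items:        list of dicts, each with an optional 'source' key
    source_trust: dict mapping a source name to a score in [0, 1]
    default:      score used when provenance is missing or the source
                  is unknown (an arbitrary, hypothetical choice)

    Returns (score, explanation), where explanation lists how each
    item contributed, so that the computation can be shown to users.
    """
    if not items:
        return default, ["no items; used default"]

    scores = []
    explanation = []
    for index, item in enumerate(items):
        source = item.get("source")
        if source is None:
            score, reason = default, "no source recorded; used default"
        elif source not in source_trust:
            score, reason = default, f"unknown source {source!r}; used default"
        else:
            score, reason = source_trust[source], f"trust of source {source!r}"
        scores.append(score)
        explanation.append(f"item {index}: {score} ({reason})")

    return sum(scores) / len(scores), explanation


# Hypothetical example data.
items = [{"value": 1, "source": "agencyA"}, {"value": 2}]
score, why = compute_trust(items, {"agencyA": 0.75}, default=0.25)
print(score)
for line in why:
    print(line)
```

Returning the explanation alongside the score is one simple way to let users see both the measurement value and the process behind it.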

Imperfections

Exemplar Use Cases: emergency


Note: the following user requirements were not directly exposed by the emergency use case.

  • U-Imper-UR 1: Allow users to access provenance information even if it cannot be directly observed.
    • U-Imper-TR 1.1: Allow applications to assert provenance with a degree of uncertainty
    • U-Imper-TR 1.2: Allow applications to infer dependence/causation with a degree of uncertainty
    • U-Imper-TR 1.3: Require asserter to be identified, and the nature of assertion to be specified (guess, inference, ...)


  • U-Imper-UR 2: Allow users to access summarized provenance (UR inspired by Re and Suciu's paper, http://www.cs.washington.edu/homes/suciu/approx_lineage.pdf)
    • U-Imper-TR 2.1: Allow applications to approximate provenance so that its storage requirement is reduced, while still capturing the "most important derivations"
    • U-Imper-TR 2.2: Require summarizer to be identified, and the nature of summary to be specified
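
The imperfection requirements above can be pictured as a small data structure: each provenance assertion carries a degree of uncertainty, an identified asserter, and the nature of the assertion, and each summary records who produced it and how. The sketch below is hypothetical throughout (class names, field names, and the [0, 1] confidence scale are assumptions). Its summarizer simply keeps the highest-confidence assertions; it is not the approximate lineage technique from the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainAssertion:
    effect: str        # the thing whose provenance is described
    cause: str         # what it is asserted to depend on
    confidence: float  # degree of uncertainty, assumed to lie in [0, 1]
    asserter: str      # who made the assertion (U-Imper-TR 1.3)
    nature: str        # e.g. "observation", "guess", "inference"

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.asserter or not self.nature:
            raise ValueError("asserter and nature must be specified")


@dataclass(frozen=True)
class Summary:
    kept: tuple         # the assertions that survived summarization
    dropped_count: int  # how many assertions were left out
    summarizer: str     # who produced the summary (U-Imper-TR 2.2)
    method: str         # the nature of the summary


def summarize(assertions, keep, summarizer):
    """Keep the `keep` highest-confidence assertions (a stand-in for
    'most important derivations'; importance is an assumption here)."""
    ranked = sorted(assertions, key=lambda a: a.confidence, reverse=True)
    return Summary(
        kept=tuple(ranked[:keep]),
        dropped_count=max(0, len(ranked) - keep),
        summarizer=summarizer,
        method=f"top-{keep} by confidence",
    )


# Hypothetical example data.
assertions = [
    UncertainAssertion("report", "sensor-log", 0.9, "alice", "observation"),
    UncertainAssertion("report", "phone-call", 0.4, "bob", "guess"),
    UncertainAssertion("report", "news-feed", 0.7, "alice", "inference"),
]
summary = summarize(assertions, keep=2, summarizer="archive-service")
print([a.cause for a in summary.kept], summary.dropped_count)
```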

Debugging

Exemplar Use Cases: crosswalk, bug

  • U-Debug-UR 1: Allow users to detect where there is a single point of failure (source of potentially faulty information) somewhere in the process by which we came to have some derived information (bug)
    • U-Debug-TR 1.1: Represent the full record of how each actor in a chain of sources depended on others
    • U-Debug-TR 1.2: Provide a mechanism to analyse a record of how some information was produced to detect where there is reliance on single sources for information critical to producing that information
    • U-Debug-TR 1.3: Provide a mechanism to analyse across multiple instances of the use of a source to determine how frequently that source was faulty
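
The analysis mechanisms in U-Debug-TR 1.2 and 1.3 could be sketched roughly as follows. This is only an illustration under assumptions: the dependency record is taken to be a dict mapping each item to the items it was derived from, dependencies are treated as redundant alternatives (losing one is survivable if another route to an original source remains), and the function names and example data are hypothetical.

```python
def reachable(deps, start, blocked=None):
    """Return every node reachable from `start`, never passing through `blocked`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen or node == blocked:
            continue
        seen.add(node)
        stack.extend(deps.get(node, []))
    return seen


def single_points_of_failure(deps, target):
    """U-Debug-TR 1.2 sketch: find nodes whose removal cuts `target` off
    from every original source (a node with no recorded dependencies)."""
    all_nodes = reachable(deps, target)
    roots = {n for n in all_nodes if not deps.get(n)}
    spofs = set()
    for node in all_nodes - {target}:
        still_reachable = reachable(deps, target, blocked=node)
        if not (still_reachable & roots):
            spofs.add(node)
    return spofs


def fault_rate(history, source):
    """U-Debug-TR 1.3 sketch: fraction of recorded uses of `source` that
    were faulty. `history` is a list of (source, was_faulty) pairs.
    Returns None if the source was never used."""
    outcomes = [faulty for (name, faulty) in history if name == source]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Hypothetical example: two blogs that both rely on the same wire story.
deps = {
    "report": ["blogA", "blogB"],
    "blogA": ["wire"],
    "blogB": ["wire"],
    "wire": [],
}
print(single_points_of_failure(deps, "report"))

history = [("wire", True), ("wire", False), ("blogA", False), ("wire", False)]
print(fault_rate(history, "wire"))
```

Although the report appears to have two independent sources, both trace back to the same wire story, which is the kind of hidden reliance the requirement asks tools to expose.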