Provenance Dimensions

From XG Provenance Wiki

This is a list of important dimensions in provenance that the group identified in order to guide the collection of use cases. We have grouped these dimensions into three major categories: the content of provenance information, the management of provenance as it exists on the web, and the use of provenance.

Note: These dimensions are not mutually exclusive. A use case may include more than one dimension. It is helpful in a use case to identify the primary dimension that the use case is trying to illustrate and then list other secondary dimensions.

The group is currently soliciting comments for these dimensions. Please put comments and suggestions in the discussion section.


Content

  • Object - what the provenance is about
  • Attribution - provenance as the sources or entities that were used to create a new result
    • Responsibility - knowing who endorses a particular piece of information or result
    • Origin - recorded vs reconstructed, verified vs non-verified (e.g. with digital signatures), asserted vs inferred
  • Process - provenance as the process that yielded an artifact
    • Reproducibility (e.g. workflows, mashups, text extraction)
    • Data Access (e.g. access time, accessed server, party responsible for accessed server)
  • Evolution and versioning
    • Republishing (e.g. retweeting, reblogging, republishing)
    • Updates (e.g. a document that assembles content from various sources and that changes over time)
  • Justification for Decisions - capturing why and how a particular decision is made
    • Argumentation - what was considered and debated (e.g. pros and cons) before reaching a solution
    • Hypothesis management (e.g. in the HCLS scientific discourse task, when complementary or contrary evidence is provided by different sources)
    • Why-not questions - capturing why a particular choice was not made
  • Entailment - given the results of a query in a reasoning system or database, capturing how the system produced them, i.e. which of the axioms or tuples it contained led to those results
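
To make the Entailment dimension concrete, here is a minimal sketch of recording which tuples led to each query answer. The relations, tuple identifiers, and the `authors_using` function are hypothetical, chosen only for illustration; real systems capture this information in their own ways.

```python
from itertools import product

# Hypothetical toy relations. Each tuple has an identifier so that
# it can be cited as provenance for a query answer.
authors = {"a1": ("doc1", "alice"), "a2": ("doc2", "bob")}
sources = {"s1": ("doc1", "census"), "s2": ("doc1", "survey"), "s3": ("doc2", "census")}


def authors_using(source_name):
    """Join authors with sources on the document id.

    Returns {author: set of witnesses}, where each witness is the
    frozenset of tuple ids that together produced that answer.
    """
    results = {}
    for (aid, (adoc, author)), (sid, (sdoc, src)) in product(
        authors.items(), sources.items()
    ):
        if adoc == sdoc and src == source_name:
            results.setdefault(author, set()).add(frozenset({aid, sid}))
    return results


answers = authors_using("census")
for author, witnesses in sorted(answers.items()):
    print(author, [sorted(w) for w in witnesses])
```

Each answer carries the identifiers of the input tuples that justify it, so a user can ask why "alice" appears in the result and be pointed at tuples a1 and s1.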


Management

  • Publication - making provenance information available on the web (how do you expose it, how do you distribute it)
  • Access - finding and querying provenance information
    • Finding the provenance information, perhaps through an authoritative service
    • Query formulation and execution mechanisms
  • Dissemination control - using provenance to track the policies, as specified by the creator of an entity, for when and how that entity can be used
    • Access control - applying access control policies to provenance information
    • Licensing - stating what rights the object creators and users have based on provenance
    • Law enforcement (e.g. enforcing privacy policies on the use of personal information)
  • Scale - how to operate with large amounts of provenance information


Use

  • Understanding - end user consumption of provenance
    • Abstraction, multiple levels of description, summary
    • Presentation, visualization
  • Interoperability - combining provenance produced by multiple different systems
  • Comparison - finding what's in common in the provenance of two or more entities (e.g. two experimental results)
  • Accountability - the ability to check the provenance of an object with respect to some expectation
    • Verification - of a set of requirements
    • Compliance - with a set of policies
  • Trust - making trust judgements based on provenance
    • Information quality - choosing among competing evidence from diverse sources (e.g. linked data use cases)
    • Incorporating reputation and reliability ratings with attribution information
  • Imperfections - reasoning about provenance information that is not complete or correct
    • Incomplete provenance
    • Uncertain/probabilistic provenance
    • Erroneous provenance
    • Fraudulent provenance
  • Debugging
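
As an illustration of the Comparison dimension listed above, the following sketch finds what two entities have in common in their provenance. It assumes provenance is available as a simple "derived from" mapping; the entity names and the `derived_from`, `ancestors`, and `common_provenance` names are hypothetical.

```python
# Hypothetical provenance records: each entity maps to the entities
# it was directly derived from.
derived_from = {
    "result_a": ["cleaned_data", "model_v1"],
    "result_b": ["cleaned_data", "model_v2"],
    "cleaned_data": ["raw_data"],
    "model_v1": [],
    "model_v2": [],
    "raw_data": [],
}


def ancestors(entity):
    """Return every entity that `entity` was directly or indirectly derived from."""
    seen = set()
    stack = list(derived_from.get(entity, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(derived_from.get(node, []))
    return seen


def common_provenance(first, second):
    """Return the entities that appear in the provenance of both arguments."""
    return ancestors(first) & ancestors(second)


shared = common_provenance("result_a", "result_b")
print(sorted(shared))
```

Here the two experimental results share their cleaned and raw input data but not the model that produced them, which is the kind of difference a comparison of provenance is meant to surface.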