James Cheney

(Curator: Simon Miles)

Provenance Dimensions

Primary: Justification for Decisions: hypothesis management (Content)

Secondary: Understanding (Use)

Background and Current Practice

Government standards (e.g. in the UK, the "green book") mandate how the source and intermediate data are to be linked to the final reports and conclusions produced by the study. In particular, the study needs to fully explain how the primary data were collected, how secondary data were analyzed and interpreted, and which analytical or interpretive processes were used to draw conclusions. The report may also need to link to associated publications or dissemination materials.

These standards are meant to make it possible for other experts to fully understand the quality of the study, and for decision makers (usually non-experts) to make qualitative judgments about the strength of the evidence for the conclusions.


The conclusions of studies bearing on public policy must be linked to their supporting data in order to meet standards imposed by funding organizations.

Provenance technology can greatly decrease the effort involved in producing acceptable linked data, and doing this in a standard or automatic way may be more reliable or useful than current practice.

Use Case Scenario

A researcher performs a study involving multiple steps: data gathering, analysis, and the drawing of summary conclusions. The conclusions are passed to decision makers who lack the researcher's expertise. To use the conclusions, the decision makers require the data that informed them to be analysed again using different methods or under different hypotheses. Later, the study data, analysis and conclusions are compared with the actual effects of the decision. This general scenario is derived from the following more specific case.

A social scientist is studying the relationship between education and poverty with support from a government grant.

The study involves recruiting participants, distributing and collecting surveys, and performing telephone interviews. The data collected through this process are initially recorded on paper and then transcribed into an electronic form. The paper records are confidential but need to be retained for a set period of time.

Once the data are collected and transcribed, the scientist processes and interprets the results and writes a report summarizing the conclusions. The conclusions may then be incorporated into policy briefing documents used by civil servants or experts on behalf of the government to identify possible policy decisions that will be made by (non-expert) decision makers. This process may involve considering hypotheticals or reevaluating the primary data using different methodologies than those applied by the scientist originally. The report and its linked supporting data may also be provided online for reuse by others or archived permanently in order to make it possible to compare the conclusions of the study with actual effects of decisions.
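The chain from paper records to policy briefing described above can be sketched as a small derivation graph. This is only an illustrative sketch, not a proposed provenance model: the `Artifact` class, the `lineage` function, and all artifact and process names are hypothetical and were chosen to mirror the steps of the scenario.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One item in the study (hypothetical record layout)."""
    name: str
    produced_by: str = ""                              # process that generated it
    derived_from: list = field(default_factory=list)   # names of input artifacts
    confidential: bool = False                         # e.g. paper records


def lineage(records, name):
    """Return every artifact that `name` depends on, nearest first."""
    seen = []
    queue = list(records[name].derived_from)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(records[current].derived_from)
    return seen


# Assumed artifacts, following the steps of the scenario.
records = {a.name: a for a in [
    Artifact("paper_surveys", "survey collection", confidential=True),
    Artifact("interview_notes", "telephone interviews", confidential=True),
    Artifact("electronic_dataset", "transcription",
             ["paper_surveys", "interview_notes"]),
    Artifact("analysis_results", "analysis", ["electronic_dataset"]),
    Artifact("report", "report writing", ["analysis_results"]),
    Artifact("policy_briefing", "briefing preparation", ["report"]),
]}

# Everything the briefing ultimately rests on.
sources = lineage(records, "policy_briefing")

# Supporting items that must stay confidential when the rest is published.
restricted = [n for n in sources if records[n].confidential]
```

With links recorded this way, a decision maker or later reviewer could ask which data a conclusion rests on, and which of those items cannot be released publicly, without the scientist maintaining the links by hand.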

Problems and Limitations

This use case is hard to achieve using current technology because it requires a great deal of additional effort from scientists to manage the supplementary data and links manually.

Provenance technology that meets standards such as the "green book" could dramatically decrease this effort.

There are additional challenges, such as maintaining links between (confidential) data stored on paper and (usually public) intermediate data and final reports, that may not be solvable solely by provenance technology in computer systems. However, the availability of this technology may make it more attractive to carry out some kinds of studies using Web-based surveys or IP telephony for which provenance technology could provide support.

Unanticipated Uses

Longitudinal studies comparing different methods of analyzing and collecting data.

Comparisons between the predicted and actual effects of policy decisions.

Existing Work

