Use Case Handling Scientific Measurement Anomaly

From XG Provenance Wiki


Provenance in support of discovering and handling the consequences of a scientific measurement anomaly.


Jitin Arora, Paulo Pinheiro da Silva

Provenance Dimensions

  • Primary:
    • Use: Imperfections
  • Secondary:
    • Content: Justification for decision


In many eScience scenarios, data is collected over a period of time and passed through complex processing pipelines to produce artifacts. At a later date, it may turn out that the equipment used to capture a particular dataset was not operating as expected, so that the dataset must be treated differently or excluded altogether. It is then necessary to discover all the artifacts that were generated from that dataset and possibly to retract or modify any claims made on the basis of those artifacts. During processing, additional input may also have come from sources that are hard to capture, such as GUI or keyboard input by a scientist, and this input may affect how the anomaly or error propagates. The following scenario illustrates the need to capture provenance dynamically, in a form that can be traversed forwards as well as backwards.
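A minimal sketch of a trace that supports both directions of traversal might look like the following. It assumes a graph whose edges are loosely modeled on the `used` and `wasGeneratedBy` relations of the W3C PROV data model. The class `ProvenanceGraph`, its methods, and all artifact and run identifiers are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical minimal provenance store linking artifacts and process runs.

    Each process run must have a unique id, so that two executions of the
    same program are not merged into one node.
    """

    def __init__(self):
        self.used_by = defaultdict(set)    # artifact -> runs that consumed it
        self.produced = defaultdict(set)   # run -> artifacts it generated
        self.inputs_of = defaultdict(set)  # run -> artifacts it consumed
        self.generated_by = {}             # artifact -> run that generated it

    def record(self, run_id, inputs, outputs):
        """Record one run: it 'used' each input and each output 'wasGeneratedBy' it."""
        for artifact in inputs:
            self.used_by[artifact].add(run_id)
            self.inputs_of[run_id].add(artifact)
        for artifact in outputs:
            self.produced[run_id].add(artifact)
            self.generated_by[artifact] = run_id

    def descendants(self, artifact):
        """Forward traversal: all artifacts derived, directly or indirectly, from `artifact`."""
        seen, queue = set(), deque([artifact])
        while queue:
            current = queue.popleft()
            for run_id in self.used_by.get(current, ()):
                for out in self.produced.get(run_id, ()):
                    if out not in seen:
                        seen.add(out)
                        queue.append(out)
        return seen

    def ancestors(self, artifact):
        """Backward traversal: all artifacts that `artifact` was derived from."""
        seen, queue = set(), deque([artifact])
        while queue:
            current = queue.popleft()
            run_id = self.generated_by.get(current)
            if run_id is None:
                continue  # a primary input, e.g. raw telescope data
            for inp in self.inputs_of.get(run_id, ()):
                if inp not in seen:
                    seen.add(inp)
                    queue.append(inp)
        return seen


# Illustrative trace; all identifiers are made up.
g = ProvenanceGraph()
g.record("calibrate-001", ["raw_series_42"], ["calibrated_42"])
g.record("render-001", ["calibrated_42"], ["sun_image_7"])
g.record("render-002", ["raw_series_43"], ["sun_image_8"])
g.record("publish-001", ["sun_image_7", "sun_image_8"], ["claim_A"])

print(sorted(g.descendants("raw_series_42")))  # ['calibrated_42', 'claim_A', 'sun_image_7']
```

Here `claim_A` is flagged because it depends on `sun_image_7`, even though it also drew on the unaffected `sun_image_8`. A backward query such as `g.ancestors("claim_A")` answers the complementary question of which datasets a claim rests on.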


To enable scientists to trace the claims they have made on the basis of artifacts generated using a given dataset.

Current Practice Scenario

Currently, only a very limited amount of provenance can be captured, by choosing suitable file names or organizing files into folders, and this approach quickly becomes unmanageable.

Use Case Scenario

Astrophysicist Joe generates images of the sun from image data captured by a telescope. Joe later suspects that, in a particular series of experiments, the telescope was not properly calibrated before use. Joe must determine all the images that were generated and published on the basis of this dataset. It is also useful to be able to estimate the extent to which any measurement errors propagated, which in turn requires knowledge of the parameters used in each execution of the process. This will enable Joe to determine whether a previously proposed claim must be substantially modified.
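As a hedged illustration of why the execution parameters matter, consider standard first-order (linear) error propagation. Assume, for illustration only, that each processing step k maps a scalar quantity y_{k-1} to y_k = f_k(y_{k-1}; θ_k) with recorded parameters θ_k, and that the calibration error δy_0 in the raw data is small. These symbols are not part of the original use case.

```latex
\delta y_n \approx \left( \prod_{k=1}^{n} \left. \frac{\partial f_k}{\partial y_{k-1}} \right|_{y_{k-1},\,\theta_k} \right) \delta y_0
```

For example, if a single step applies y_1 = g y_0 + c with a recorded gain g, a calibration offset δy_0 in the raw data appears as δy_1 = g δy_0 in the output image. Without g in the trace, the size of the propagated error cannot be estimated.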

Problems and Limitations

To answer these questions, a complete trace of the process execution is needed, including all the generated artifacts and the input parameters, whether they were provided as keyboard input, as shell environment variables, or by other means. The trace must also capture which alternatives were chosen (e.g., human decisions) and must support traversal in the forward direction to discover the derived artifacts.
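One possible shape for such a trace record is sketched below, assuming each run stores its explicit parameters, a snapshot of selected environment variables, interactive input, and human decisions with their justification. The names `ProcessRun`, `Decision`, `snapshot_env`, `runs_consuming`, the variable `CALIB_PROFILE`, and all values are hypothetical.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Decision:
    """A human choice among alternatives, with the justification given for it."""
    question: str
    alternatives: list
    chosen: str
    justification: str


@dataclass
class ProcessRun:
    """Hypothetical trace record for one execution of one processing step."""
    run_id: str
    inputs: list
    outputs: list
    parameters: dict = field(default_factory=dict)   # e.g. command-line arguments
    environment: dict = field(default_factory=dict)  # snapshot of selected env vars
    user_inputs: list = field(default_factory=list)  # keyboard/GUI input, in order
    decisions: list = field(default_factory=list)    # Decision records


def snapshot_env(names):
    """Capture the named environment variables at run time (None if unset)."""
    return {name: os.environ.get(name) for name in names}


def runs_consuming(trace, artifact):
    """One forward step: the runs in `trace` that used `artifact` as an input."""
    return [run for run in trace if artifact in run.inputs]


# Illustrative values only; set here so the example is deterministic.
os.environ["CALIB_PROFILE"] = "default"
trace = [
    ProcessRun(
        run_id="render-001",
        inputs=["raw_series_42"],
        outputs=["sun_image_7"],
        parameters={"exposure_scale": 1.2},
        environment=snapshot_env(["CALIB_PROFILE"]),
        user_inputs=["y"],
        decisions=[Decision("Apply flat-field correction?", ["yes", "no"],
                            "yes", "visible vignetting in raw frames")],
    ),
]

for run in runs_consuming(trace, "raw_series_42"):
    print(run.run_id, run.parameters, run.environment, [d.chosen for d in run.decisions])
```

Snapshotting environment variables at run time, rather than reading them later, matters because the shell environment may have changed by the time the anomaly is discovered.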