Jim McCusker


Experimental workflows span numerous laboratories, systems, and information models. For example, specimens are managed in a biospecimen management software package, the details of a particular experiment are encoded in MAGE object models, and the final analytic workflow is executed in statistical systems or workflow management tools such as Taverna or GenePattern. A consistent provenance data model across all of these tools would give researchers a complete picture of how the data was produced and what it would take to reproduce it.


Gain a complete understanding of the experiment and its artifacts.

Use Case Scenario

A biologist is evaluating the results of an experiment and wants to ensure that no confounding classes were left uncontrolled. A number of samples were used in the experiment, and the biologist needs to discover the full history of those samples, including how they were prepared and the (deidentified) clinical history of their sources. To do this, the biologist must pull together the transitive closure of information about the artifacts used in the experiment.

Problems and Limitations

Currently, information about experiments and their samples can only be assembled by querying multiple database systems, each with a different information model. A unified model for this history would allow users to issue a single federated query (possibly repeated many times to follow the transitive closure) over one information model.
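
To illustrate what "following the transitive closure" could look like once a single model is in place, here is a minimal Python sketch. It is not part of any existing system: the artifact names, the DERIVED_FROM table, and the direct_sources function are all hypothetical stand-ins for one federated query that returns the immediate sources of an artifact.

```python
from collections import deque

# Hypothetical provenance records (illustrative only): each artifact maps
# to the artifacts it was directly derived from.
DERIVED_FROM = {
    "analysis_result": ["expression_dataset"],
    "expression_dataset": ["rna_extract"],
    "rna_extract": ["tissue_sample"],
    "tissue_sample": ["donor_record"],
    "donor_record": [],
}


def direct_sources(artifact):
    """Stand-in for one federated query: return the immediate sources."""
    return DERIVED_FROM.get(artifact, [])


def provenance_closure(artifact):
    """Collect every artifact reachable by repeatedly asking for sources."""
    seen = set()
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        for source in direct_sources(current):
            if source not in seen:  # skip artifacts already visited
                seen.add(source)
                queue.append(source)
    return seen


if __name__ == "__main__":
    for name in sorted(provenance_closure("analysis_result")):
        print(name)
```

The same query is repeated against one model at every step, so the traversal does not need to know which underlying system holds a given artifact. Without a unified model, each step would instead need system-specific query logic.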