Use Case Computer Assisted Research

From XG Provenance Wiki


Computer Assisted Research


Jim Myers

Provenance Dimensions

Primary: Comparison
Secondary: Debugging, Process, Justification for Decisions

Background and Current Practice

Researchers 'stand on the shoulders of giants', benefiting from the results of prior and complementary research efforts. Today, most of the work to discover related work is manual - a search, potentially automated, followed by manual reading of papers, attendance at talks, and personal correspondence. The rich records possible through capture of metadata and provenance raise the potential for much more integration of reference and peer research into ongoing research activities.


The goal is to enhance productivity in the research process by automating the discovery of related work.

Use Case Scenario

A user pursues their work goals within a system that captures information about their plans and activities. Using this information the system interacts with community and reference data and literature systems to provide just-in-time information that aids the user in refining their project plans, debugging problems that occur as they work, and analyzing their results.

Alice has decided to extend her research through the use of a shared instrument facility - using an unfamiliar technique to better characterize a molecule she has synthesized. Bob, her graduate student, enters the general plan into his electronic notebook and, after a moment, is presented with several experimental protocols from published work that give him a good sense of best practices in terms of calibrating the instrument, performing interleaved control experiments, etc. Bob selects and modifies one and begins his work. While running the instrument, Bob has trouble using a feature in the instrument control software. With the click of a button, Bob pulls up several cases where other users have seen the same error and he adopts a work-around described by a colleague in the facility's shared database. With the data in hand, Bob meets with Alice and they ponder the existence of several unexpected peaks in their spectra. Suspecting an impurity, they query the literature services provided through their library to get a list of impurities that have been seen in similar chemical syntheses. A few more clicks and Bob and Alice have pulled recent reference spectra of these compounds into their analysis software, but none fit. They then check the instrument's online log and discover that the compound studied by a recent user is a match. Mystery solved, and one more experiment is run before the allotted instrument time is used up. They also alert the facility staff, who then notify two additional groups who may be impacted.

Problems and Limitations

The 'point' of this use case is to emphasize that provenance of related processes can be used in 'real-time' to steer work. The value from such a capability could be a significant factor in encouraging users to record metadata and provenance - they see immediate feedback in terms of useful reference information and recommendations.

The types of queries required to support the use case above are just the same as those envisioned for historical/after-the-fact uses. They thus share the problems of those use cases, i.e. finding 'relevant' information depends on significant domain-specific metadata in addition to provenance. Real-time use does add a constraint that relevant information would need to be programmatically accessible, efficiently indexed, and aggregated from the multiple sources available to a given user.
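As a rough illustration of that constraint, the following is a minimal sketch, not a proposed design. It assumes each source can export its provenance records with domain-specific metadata as key/value pairs. The names ProvenanceRecord and ProvenanceIndex, the field names, and the sample metadata keys are all hypothetical. Records from several sources are merged into one in-memory inverted index so that a query touches only the records sharing the requested metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical record shape; every field name is an assumption.
    record_id: str
    source: str            # e.g. "facility-log" or "literature"
    activity: str          # what was done, e.g. "spectrum-acquisition"
    metadata: tuple = ()   # domain-specific (key, value) pairs


class ProvenanceIndex:
    """Aggregates records from several sources behind one metadata index."""

    def __init__(self):
        self._by_term = defaultdict(set)   # (key, value) -> record ids
        self._records = {}                 # record id -> record

    def add_source(self, records):
        # Index every metadata pair so lookups avoid scanning all records.
        for rec in records:
            self._records[rec.record_id] = rec
            for term in rec.metadata:
                self._by_term[term].add(rec.record_id)

    def find(self, **criteria):
        # Return records matching ALL criteria, regardless of source.
        terms = list(criteria.items())
        if not terms:
            return []
        ids = set.intersection(*(self._by_term.get(t, set()) for t in terms))
        return sorted((self._records[i] for i in ids),
                      key=lambda r: r.record_id)
```

In the scenario above, a lookup such as find(instrument=..., error=...) would stand in for Bob's "click of a button" search for other users who saw the same error. A real system would also need the domain-specific notion of relevance discussed above, which exact-match lookup does not capture.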

Unanticipated Uses (optional)

The specific use case focuses fairly closely on data, but one can envision uses that overlap with social networking and general recommender systems, e.g. "People like you (who have used this data and that workflow) have read this paper and 60% attend that conference" ...

One might also imagine that information now collected by surveys, such as information on the popularity of different software tools in a community, would be automatically available (perhaps only as statistical aggregates to protect privacy) and dynamically updated. Such information would give an increased sense of presence within a community and allow self-analysis within the community about their practices.
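One way to picture such aggregates is the sketch below, which assumes usage can be reduced to (user, tool) pairs taken from provenance records. The function name tool_popularity and the min_users threshold are hypothetical. Suppressing small groups is only a simple heuristic and not a formal privacy guarantee.

```python
def tool_popularity(usage_events, min_users=5):
    """Share of community members using each tool, small groups suppressed.

    usage_events: list of (user, tool) pairs (a hypothetical input shape).
    min_users: tools with fewer distinct users are omitted from the result.
    """
    users_by_tool = {}
    for user, tool in usage_events:
        users_by_tool.setdefault(tool, set()).add(user)

    total_users = len({user for user, _ in usage_events})

    # Report only aggregates; individual users never appear in the output.
    return {
        tool: round(len(users) / total_users, 2)
        for tool, users in users_by_tool.items()
        if len(users) >= min_users
    }
```

Recomputing this whenever new provenance arrives would give the dynamically updated view described above without a separate survey.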

"Research" could also be replaced with "work" in general to create a similar use case in a business setting.

Existing Work (optional)