Locating Biospecimens With Sufficient Quality

From XG Provenance Wiki




Joshua Phillips


In order to identify specimens of sufficient quality, researchers need information about how a specimen has been stored, when it was thawed, what procedure was used to collect it, and so on. The Cancer Biomedical Informatics Grid (caBIG) [1] program has implemented caTissue [2], a biorepository tool for biospecimen inventory management, tracking, and annotation. The caTissue information model includes information about the collection, storage, quality assurance, and distribution of specimens. This information could be represented as, for example, an OPM (Open Provenance Model) graph. Several provenance-related queries are described in [3].

[1] https://cabig.nci.nih.gov/
[2] https://cabig.nci.nih.gov/tools/catissuesuite
[3] https://cabig-kc.nci.nih.gov/Biospecimen/KC/index.php/CaTissue_1.1_Deployment_Guide_Chapter_6:_Deploying_caTissue_caGrid_Data_Service#Running_the_caGrid_test_queries
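As a rough illustration of the OPM representation, the Python sketch below records one specimen's history as artifacts (the specimen at each stage), processes (collection, fixation, embedding), and an agent. These are linked by OPM's `used`, `wasGeneratedBy`, and `wasControlledBy` relations, with process parameters kept as annotations. The class, its method names, the specimen identifiers, and the property names are all hypothetical. They are not taken from caTissue or from any OPM toolkit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OPMGraph:
    """Minimal OPM-style provenance graph; a sketch, not a complete OPM implementation.

    Causal edges point from effect to cause, following the OPM convention.
    """
    artifacts: set[str] = field(default_factory=set)
    processes: set[str] = field(default_factory=set)
    agents: set[str] = field(default_factory=set)
    used: list[tuple[str, str, str]] = field(default_factory=list)              # (process, artifact, role)
    was_generated_by: list[tuple[str, str, str]] = field(default_factory=list)  # (artifact, process, role)
    was_controlled_by: list[tuple[str, str]] = field(default_factory=list)      # (process, agent)
    annotations: dict[str, dict[str, object]] = field(default_factory=dict)     # node -> properties

    def add_step(self, process: str, inputs: dict[str, str], outputs: dict[str, str],
                 agent: str | None = None, **props: object) -> None:
        """Record one process, its role-labelled inputs and outputs, and optional properties."""
        self.processes.add(process)
        self.annotations[process] = dict(props)
        for role, artifact in inputs.items():
            self.artifacts.add(artifact)
            self.used.append((process, artifact, role))
        for role, artifact in outputs.items():
            self.artifacts.add(artifact)
            self.was_generated_by.append((artifact, process, role))
        if agent is not None:
            self.agents.add(agent)
            self.was_controlled_by.append((process, agent))

    def generator_of(self, artifact: str) -> str | None:
        """Return the process that generated the artifact, if one was recorded."""
        for art, process, _role in self.was_generated_by:
            if art == artifact:
                return process
        return None


# Hypothetical history of one specimen (identifiers and values are illustrative only).
g = OPMGraph()
g.add_step("collection", {}, {"specimen": "tissue-001"}, agent="surgeon-1", procedure="XYZ")
g.add_step("fixation", {"specimen": "tissue-001"}, {"fixed": "fixed-001"},
           fixative="formalin", minutes=25)
g.add_step("embedding", {"fixed": "fixed-001"}, {"block": "block-001"},
           medium="low melting point paraffin")
print(g.generator_of("block-001"))  # embedding
```

Storing process parameters (fixative, duration, embedding medium) as annotations on process nodes keeps quality-relevant details attached to the step that produced them. A quality query can then read those details directly from the graph.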


Describing caTissue data in a common provenance model would enable:

  1. Answering more sophisticated provenance-related queries than are now possible using the caBIG Query Language (CQL).
  2. A more flexible approach to describing provenance than is possible using the current caTissue information model.
  3. Combining this information with provenance information from other tissue banking repositories to enable high-level, federated query.
  4. Combining this information with other provenance information (e.g. microarray experiment provenance) to support more complete analyses.
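A transitive lineage query illustrates the first item. Starting from a derived specimen such as a slide, the query walks `wasGeneratedBy` and `used` edges back to every process in its history. This kind of recursive traversal is natural over a provenance graph. The Python sketch below assumes the OPM-style edges are already available as two in-memory dictionaries. The process names and specimen identifiers are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Hypothetical OPM-style edges for one specimen's history (illustrative data only).
# USED: process -> artifacts it used; GENERATED_BY: artifact -> process that generated it.
USED = {
    "collection": [],
    "fixation": ["tissue-001"],
    "embedding": ["fixed-001"],
    "sectioning": ["block-001"],
}
GENERATED_BY = {
    "tissue-001": "collection",
    "fixed-001": "fixation",
    "block-001": "embedding",
    "slide-001": "sectioning",
}


def lineage(artifact: str) -> list[str]:
    """Return every process in the artifact's causal history, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        process = GENERATED_BY.get(current)
        if process is None or process in seen:
            continue  # no recorded generator, or process already visited
        seen.add(process)
        order.append(process)
        queue.extend(USED.get(process, []))  # continue back through the process's inputs
    return order


print(lineage("slide-001"))  # ['sectioning', 'embedding', 'fixation', 'collection']
print("fixation" in lineage("slide-001"))  # True
```

The membership test on the last line answers a question such as "was this slide derived from tissue that went through fixation?" The question is answered without knowing in advance how many steps separate the slide from the fixation event.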

Use Case Scenario

A researcher executes a query across multiple tissue repositories for all specimens that

  • were collected by procedure XYZ
  • have a clinical diagnosis of ABC
  • were fixed in formalin for 30 minutes or less and were embedded in low-melting-point paraffin

The researcher receives candidate specimens from each system.
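The scenario's selection criteria can be sketched as a single predicate applied to each repository in turn. The Python below is a minimal illustration under stated assumptions. Each repository is represented by an in-memory list of flattened `Specimen` records, and the field names and repository names are hypothetical rather than drawn from caTissue or caGrid. A real federated query would be sent to each remote data service instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    # Hypothetical flattened record; field names are illustrative, not caTissue's model.
    specimen_id: str
    collection_procedure: str
    diagnosis: str
    fixative: str | None
    fixation_minutes: float | None
    embedding_medium: str | None


def matches(s: Specimen) -> bool:
    """Use-case criteria: procedure XYZ, diagnosis ABC,
    formalin fixation of 30 minutes or less, low melting point paraffin."""
    return (
        s.collection_procedure == "XYZ"
        and s.diagnosis == "ABC"
        and s.fixative == "formalin"
        and s.fixation_minutes is not None
        and s.fixation_minutes <= 30
        and s.embedding_medium == "low melting point paraffin"
    )


def federated_query(repositories: dict[str, list[Specimen]]) -> dict[str, list[str]]:
    """Apply the same filter to every repository and group candidate IDs by source."""
    return {
        name: [s.specimen_id for s in specimens if matches(s)]
        for name, specimens in repositories.items()
    }


repos = {
    "site-A": [
        Specimen("A-1", "XYZ", "ABC", "formalin", 25, "low melting point paraffin"),
        Specimen("A-2", "XYZ", "ABC", "formalin", 45, "low melting point paraffin"),
    ],
    "site-B": [
        Specimen("B-1", "XYZ", "ABC", "formalin", 30, "low melting point paraffin"),
        Specimen("B-2", "XYZ", "DEF", None, None, None),
    ],
}
print(federated_query(repos))  # {'site-A': ['A-1'], 'site-B': ['B-1']}
```

All repositories are judged by the single `matches` function, so the comparison only works if every repository's records have first been translated into the same fields and units. That translation is what a common provenance model would provide.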

Problems and Limitations

  1. caBIG data sources are currently not exposed through SPARQL endpoints.
  2. The caTissue model is expressed in UML, which would need to be mapped to an appropriate ontology.
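To make the second limitation concrete, the sketch below shows one way a UML-style class could be mapped onto ontology terms. A hand-written table pairs attribute names with property IRIs, and each object is emitted as (subject, predicate, object) triples that an RDF store, for example one behind a SPARQL endpoint, could ingest. The `FixedSpecimen` class, the `example.org` namespaces, and the property names are assumptions for illustration. Only the `rdf:type` IRI is a real, standard term. Choosing the actual target ontology remains the open problem.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any

# Hypothetical namespaces; neither is an official caTissue or OPM vocabulary.
EX = "http://example.org/specimen/"
ONT = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hand-written mapping from UML attribute names to ontology property IRIs (assumed).
PROPERTY_MAP = {
    "fixative": ONT + "fixedWith",
    "fixation_minutes": ONT + "fixationDurationMinutes",
    "embedding_medium": ONT + "embeddedIn",
}


@dataclass
class FixedSpecimen:
    # Illustrative UML-style class; not the actual caTissue model.
    specimen_id: str
    fixative: str
    fixation_minutes: int
    embedding_medium: str


def to_triples(obj: Any) -> list[tuple[str, str, str]]:
    """Emit an rdf:type triple plus one triple per mapped attribute.

    Attribute values are kept as plain strings; a fuller mapping would
    distinguish typed literals from IRIs.
    """
    subject = EX + obj.specimen_id
    triples = [(subject, RDF_TYPE, ONT + type(obj).__name__)]
    for f in fields(obj):
        prop = PROPERTY_MAP.get(f.name)
        if prop is not None:  # unmapped attributes (e.g. the identifier) are skipped
            triples.append((subject, prop, str(getattr(obj, f.name))))
    return triples


for triple in to_triples(FixedSpecimen("A-1", "formalin", 25, "low melting point paraffin")):
    print(triple)
```

Keeping the mapping in a separate table, rather than in the class itself, means that changing the target ontology leaves the UML-derived classes untouched.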