HCLS/ISWC2007/BOF/Dolby


Logic based querying of Integrated Life Sciences Data

Julian Dolby¹, Achille Fokoue¹, Aditya Kalyanpur¹, Li Ma¹, Chintan Patel², Alan Ruttenberg³, Edith Schonberg¹, Kavitha Srinivas¹

¹ IBM Research; ² Columbia University Medical School; ³ Science Commons

In life sciences research, there is a pressing need to understand biological phenomena that span the ‘vertical scale’ of biology, namely the genetic, molecular, tissue, and organ levels. One step towards addressing this need is to provide an integrated view of the data, as the HCLS group has achieved in its dataset combining PubMed, Gene Ontology Annotations, and Allen Brain Atlas (ABA) data. Another step is to allow logic-based querying of such data, which provides better recall for user-specified queries.

In this work, we present a pragmatic approach to logic-based querying of the HCLS dataset using the Foundational Model of Anatomy (FMA) ontology and the Gene Ontology (GO). To develop an anatomy-centric view of these resources, we created mappings from MeSH to FMA and from ABA anatomical concepts to FMA. This integration enables us to perform interesting biological queries based on anatomical concepts (and their sub-parts) and on Gene Ontology concepts (and their sub-processes). A sample query of the form we support is: "Find the genes known to be involved in Alzheimer’s disease in the Hippocampal region that have a role in dendrite development." Here "Hippocampal region" and "dendrite development" are concepts from FMA and GO respectively, and both require inferencing.
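The idea of matching a query concept against its sub-parts and sub-processes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the part-of and sub-process edges, the gene records, and the function names (`descendants`, `find_genes`) are toy stand-ins, not FMA or GO content and not the interface of the system described above.

```python
from collections import defaultdict

# Toy hierarchies as child -> parent edges (illustrative names only).
PART_OF = {
    "dentate gyrus": "hippocampal region",
    "CA1 field": "hippocampal region",
    "hippocampal region": "brain",
}
SUBPROCESS_OF = {
    "dendrite morphogenesis": "dendrite development",
    "dendritic spine development": "dendrite development",
}

# Hypothetical gene records: (gene, disease, anatomical site, process).
RECORDS = [
    ("GENE_A", "Alzheimer's disease", "dentate gyrus", "dendrite morphogenesis"),
    ("GENE_B", "Alzheimer's disease", "CA1 field", "synaptic transmission"),
    ("GENE_C", "Alzheimer's disease", "cerebellum", "dendrite development"),
    ("GENE_D", "Parkinson's disease", "hippocampal region", "dendrite development"),
    ("GENE_E", "Alzheimer's disease", "hippocampal region", "dendritic spine development"),
]


def descendants(concept, child_to_parent):
    """Return the concept plus everything below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in child_to_parent.items():
        children[parent].add(child)
    seen, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def find_genes(disease, region, process):
    """Match records whose site and process fall under the queried concepts."""
    regions = descendants(region, PART_OF)
    processes = descendants(process, SUBPROCESS_OF)
    return sorted(
        gene
        for gene, rec_disease, site, proc in RECORDS
        if rec_disease == disease and site in regions and proc in processes
    )


print(find_genes("Alzheimer's disease", "hippocampal region", "dendrite development"))
```

In this toy data, GENE_A is annotated only with a sub-part and a sub-process of the queried concepts, so an exact-match lookup would miss it; expanding the query over the hierarchies is what recovers it.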

There are several technical hurdles to implementing a logic-based querying solution for these ontologies:

  • Reasoning over FMA is well known to be difficult for reasoners, because FMA represents a deep part hierarchy in which both the part-of relation and its inverse, has-Part, are represented. To overcome this, we reasoned only over the part-of relations in FMA.
  • Web search over articles, genes, or images needs to be relatively fast, whereas reasoning tends to be a compute-intensive process. To address this, we incorporated into our OWL reasoner (SHER) a fast, optimized reasoning algorithm for EL++ (a subset of OWL). This procedure provides both precision and recall for the current integrated dataset, but it will not provide complete recall if the dataset includes negation (e.g., when a gene record states the absence of an association between a gene and an anatomical part or gene process). However, because the EL++ reasoning algorithm is incorporated within the context of the SHER algorithm, we can still support complete search with negation, although not within the time frame of a web search.
  • Explanations and traceability are important for validating logic-based querying. Our reasoner provides the key axioms from the two ontologies that were used to derive the results for a specific query.
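To make the traceability point concrete, the sketch below shows one simple way a result could be justified by listing the asserted part-of axioms that connect a matched structure to the queried one. It is a toy illustration under stated assumptions: the axioms and the `explain_part_of` helper are hypothetical, each part is assumed to have a single direct whole, and this is neither the SHER reasoner's explanation mechanism nor an EL++ algorithm.

```python
# Toy part-of axioms as (part, whole) pairs; names are illustrative only.
PART_OF_AXIOMS = [
    ("granule cell layer", "dentate gyrus"),
    ("dentate gyrus", "hippocampal region"),
    ("hippocampal region", "brain"),
]


def explain_part_of(part, whole, axioms=PART_OF_AXIOMS):
    """Return the chain of axioms linking part to whole, or None if there is none.

    Toy assumption: each part has exactly one direct whole.
    """
    direct_whole = dict(axioms)
    chain, current, visited = [], part, {part}
    while current != whole:
        nxt = direct_whole.get(current)
        if nxt is None or nxt in visited:
            return None  # no upward path, or a cycle in the data
        chain.append((current, "part-of", nxt))
        visited.add(nxt)
        current = nxt
    return chain


for axiom in explain_part_of("granule cell layer", "hippocampal region"):
    print(axiom)
```

Each printed triple is an axiom taken directly from the input, so a biologist can check every step of the inference rather than having to trust the final answer.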

We have also developed a simple user interface for biologists to perform logic-based querying of the integrated data.