Helping Knowledge Discovery Through Hypothesis-Based Epistemic Markup

N.B. This is an outdated use case, now replaced by “A linked open data store of scientific information that updates and elaborates on medication safety statements present in drug product labels”. - Anita de Waard, November 20 2011

Objective

Speed up drug discovery or disease treatment using a hypothesis-based knowledge base

Stakeholders

Pharmaceutical industry, academics, clinical researchers.

Value Proposition

Pharmacological research

The goal of industrial pharmacologists is to design and identify therapeutic drugs for (mostly) human diseases. Their work tends to focus on conducting feasibility studies for new avenues of drug discovery and devising assay systems for screening compounds. In today’s knowledge environment, researchers have access to aggregated data garnered from large-scale text-mining experiments run over the biomedical literature. However, these text-mining systems cannot distinguish between a casual or non-essential mention of a drug or therapeutic area and the proposal of a truly new therapy, so the amount of data researchers need to weed through to find anything new is overwhelming. To find out whether there is a real connection between a given target and the therapeutic area of interest, and to identify whether a clinical effect was a side effect or a therapeutic effect, they still have to read through vast quantities of literature.

Exploration of new treatments for Spinal Muscular Atrophy

SMA is a..

Epistemic Markup

To help speed up the knowledge discovery process for each of these user communities, what is needed is a new level of markup, which we are calling ‘Epistemic Markup’, that gives the user access to the knowledge claims, linked to the experimental evidence that forms the argumentational backbone of the article. Once texts are augmented with such markup, users can distinguish statements that contain key research questions that have been experimentally verified from statements that are simple mentions of an entity or disease and have not been experimentally tested.

The use case focuses on adding such epistemic markup. Once this markup is easy to add and trivial to query, internal and external documents can be connected and browsed with ease, and relations between current and past hypotheses can be visualized directly. Since hypotheses that are not of interest can be excluded easily, the number of data sources under scrutiny at any given time is drastically reduced, and precious research time is recovered.
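As a rough illustration of what querying such markup might look like, the sketch below models each statement with an epistemic status and links to evidence, then filters out bare mentions. Everything in it is an assumption made for illustration: the Status values, the Statement fields, the claims_about helper, and the placeholder names "compound A" and "disease B" are hypothetical and are not taken from SWAN, ScholOnto, or any existing system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical epistemic categories; a real ontology may differ."""
    MENTION = "mention"        # entity is only named in passing
    HYPOTHESIS = "hypothesis"  # claim is proposed but not tested
    VERIFIED = "verified"      # claim is linked to experimental evidence


@dataclass
class Statement:
    text: str
    entities: frozenset                            # entities the statement refers to
    status: Status
    evidence: list = field(default_factory=list)   # ids of linked experiments


def claims_about(statements, entities, wanted=(Status.VERIFIED,)):
    """Return statements that refer to all `entities` and whose status is in `wanted`."""
    needed = frozenset(entities)
    return [s for s in statements
            if needed <= s.entities and s.status in wanted]


# Placeholder corpus; the sentences and identifiers are invented for the example.
pair = frozenset({"compound A", "disease B"})
corpus = [
    Statement("Compound A has been discussed in the context of disease B.",
              pair, Status.MENTION),
    Statement("We propose that compound A could treat disease B.",
              pair, Status.HYPOTHESIS),
    Statement("Compound A reduced the disease B phenotype in our assay.",
              pair, Status.VERIFIED, evidence=["experiment-1"]),
]

for s in claims_about(corpus, {"compound A", "disease B"}):
    print(s.status.value, "|", s.text, "|", s.evidence)
```

Passing a different `wanted` tuple, for example `(Status.HYPOTHESIS,)`, would list untested proposals instead, which is one way the exclusion of uninteresting hypotheses described above could be expressed.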

Methods

Data integration
Text mining
Fact extraction
Argumentative ontologies

Components

Use case in detail (listing hypotheses, content sources, etc).
Corpus - content sources
Argumentation Ontology
Linked Data repository
“Epistemic markup” - linking knowledge claims to experimental evidence

Deliverables

Establish a pharmaceutical domain of interest, develop a collection of research hypotheses
Define a (large enough) collection of content in this domain that offers adequate mining capabilities
Make this content accessible in a form that allows efficient Natural Language Processing
Decide on an argumentation ontology (possibly SWAN or ScholOnto?)
Run NLP algorithms on the content sources
Get list of hypotheses for analysis
Expert users sample NLP output for validity and answer “does the corpus help speed up assessment of the hypotheses?” (probably a few iterations here)
Get the OK from all partners on:
1. Domain
2. Content
3. Argumentation ontology
4. NLP work
Then perform large-scale implementation
Wide-scale user testing
Communication to world at large

Possible Partners

Possible pharma industry partners

Vijay Bulusu, Pfizer
Susie Stephens, Johnson and Johnson
Ted Slater, Merck
Thérèse Vachon, Novartis

Possible SMA partners

Maryann Martone, SMA Foundation & UCSD

Content providers

Technical support

Anita de Waard, Elsevier Labs
Pieder Caduff, Reaxys
Linked Data repository crew, Elsevier

Content sources

Elsevier journals and databases, e.g. Reaxys - https://www.reaxys.com/info/

Others?

Text mining collaborators

Ágnes Sándor, Xerox Research Europe
Ed Hovy, ISI
Maria Liakata, EBI

Argumentation ontology specialists

Jodi Schneider, DERI
Paolo Ciccarese, Harvard/MGH

Success Criteria

Ability to confirm or uncover new disease-gene relationships
Measurable increase in speed of identifying/assessing hypotheses
Ease of replication to another domain.

Other desirable outcomes

Improved state-of-the-art in claim identification (text mining). Improved state-of-the-art in argumentative modeling.

Hypothesis-Based Knowledge Bases