Hypothesis-Based Knowledge Bases

From W3C Wiki

Helping Knowledge Discovery Through Hypothesis-Based Epistemic Markup

N.B. This is an outdated use case, now replaced by this one: A linked open data store of scientific information that updates and elaborates on medication safety statements present in drug product labels. - Anita de Waard, November 20 2011


Speed up drug discovery or disease treatment using a hypothesis-based knowledge base


Pharmaceutical industry, academics, clinical researchers.

Value Proposition

Pharmacological research

The goal of industrial pharmacologists is to design and identify therapeutic drugs for (mostly) human diseases. Their work tends to focus on conducting feasibility studies for new avenues of drug discovery and devising assay systems for screening compounds. In today’s knowledge environment, researchers have access to aggregated data garnered from large-scale text-mining experiments run over the biomedical literature. However, since these text-mining systems cannot distinguish between a casual or non-essential mention of a drug or therapeutic area and the proposal of a truly new therapy, the amount of data researchers need to weed through to find anything new is overwhelming. To find out whether there is a real connection between a given target and the therapeutic area of interest, and to identify whether a clinical effect was a side effect or a therapeutic effect, they still have to read through vast quantities of literature after all.

Exploration of new treatments for Spinal Muscular Atrophy

SMA is a..

Epistemic Markup

To help speed up the knowledge discovery process for each of these user communities, what is needed is a new level of markup, which we are calling ‘Epistemic Markup’, that gives the user access to the knowledge claims in an article, linked to the experimental evidence that forms its argumentative backbone. Once texts are augmented with such markup, users can distinguish statements that contain key research questions that have been experimentally verified from statements that are simple mentions of an entity or disease and have not been experimentally tested.

The use case focuses on adding such epistemic markup. Once this markup is easy to add and trivial to query, internal and external documents can be connected and browsed with ease, and relations between current and past hypotheses can be visualized directly. Since hypotheses that are not of interest can be excluded easily, the number of data sources under scrutiny at any given time is drastically reduced, and precious research time is recovered.
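As a rough illustration of what querying such markup could look like, the sketch below models knowledge claims linked to experimental evidence and filters out simple mentions. It is only a sketch under stated assumptions: the class names, the three status values, and the placeholder entities (GENE_X, DISEASE_Y, DRUG_Z) are hypothetical and do not come from SWAN, ScholOnto, or any other ontology named in this use case.

```python
from dataclasses import dataclass, field
from enum import Enum


class EpistemicStatus(Enum):
    # Hypothetical status values, chosen for illustration only.
    HYPOTHESIS = "hypothesis"            # proposed but not yet tested
    EXPERIMENTALLY_VERIFIED = "verified" # backed by linked evidence
    MENTION = "mention"                  # entity named in passing


@dataclass
class Evidence:
    document_id: str   # identifier of the source document (placeholder)
    description: str   # short note on the experiment or result


@dataclass
class Claim:
    text: str
    entities: set[str]
    status: EpistemicStatus
    evidence: list[Evidence] = field(default_factory=list)


def verified_claims_about(claims, entity):
    """Keep claims that involve `entity`, are marked as experimentally
    verified, and carry at least one linked piece of evidence."""
    return [
        c for c in claims
        if entity in c.entities
        and c.status is EpistemicStatus.EXPERIMENTALLY_VERIFIED
        and c.evidence
    ]


# Placeholder data, not real findings.
example_claims = [
    Claim(
        text="GENE_X expression is reduced in DISEASE_Y.",
        entities={"GENE_X", "DISEASE_Y"},
        status=EpistemicStatus.EXPERIMENTALLY_VERIFIED,
        evidence=[Evidence("doc-001", "placeholder assay result")],
    ),
    Claim(
        text="GENE_X has also been discussed alongside DRUG_Z.",
        entities={"GENE_X", "DRUG_Z"},
        status=EpistemicStatus.MENTION,
    ),
]

for claim in verified_claims_about(example_claims, "GENE_X"):
    print(claim.text, [e.document_id for e in claim.evidence])
```

Filtering on the status and on the presence of evidence is what would let a researcher skip casual mentions; a real implementation would presumably store these links in the Linked Data repository rather than in memory.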


  • Data integration
  • Text mining
  • Fact extraction
  • Argumentative ontologies


  • Use case in detail (listing hypotheses, content sources, etc).
  • Corpus - content sources
  • Argumentation Ontology
  • Linked Data repository
  • “Epistemic markup” - linking knowledge claims to experimental evidence


  1. Establish a pharmaceutical domain of interest, develop a collection of research hypotheses
  2. Define a (large enough) collection of content in this domain that offers adequate mining capabilities
  3. Make this content accessible in a form that allows efficient Natural Language Processing
  4. Decide on an argumentation ontology (possibly SWAN or ScholOnto?)
  5. Run NLP algorithms on the content sources
  6. Get list of hypotheses for analysis
  7. Expert users sample NLP output for validity and answer “does the corpus help speed up assessment of the hypotheses?” (probably a few iterations here)
  8. Get the ok from all partners on
    1. Domain
    2. Content
    3. Argumentation ontology
    4. NLP work
  9. Then perform large-scale implementation
  10. Wide-scale user testing
  11. Communication to world at large
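
The expert sampling in step 7 could be sketched roughly as follows. This is a minimal illustration, assuming the NLP output is a list of extracted claims and that each expert judgement is a simple valid/invalid flag; the function names, the sample size, and the fixed seed are hypothetical choices, not part of the plan above.

```python
import random


def sample_for_review(nlp_claims, sample_size, seed=0):
    """Draw a reproducible random sample of extracted claims for experts."""
    rng = random.Random(seed)            # fixed seed so the sample can be re-drawn
    k = min(sample_size, len(nlp_claims))
    return rng.sample(nlp_claims, k)


def estimated_precision(judgements):
    """Fraction of sampled claims the experts judged valid.

    `judgements` is a list of booleans, one per reviewed claim.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    return sum(judgements) / len(judgements)


# Placeholder claim identifiers standing in for real NLP output.
claims = [f"claim-{i}" for i in range(100)]
to_review = sample_for_review(claims, sample_size=10)

# Hypothetical expert verdicts for four reviewed claims.
print(estimated_precision([True, True, False, True]))
```

Tracking this fraction across iterations would give one simple signal of whether the NLP output is good enough to move on to sign-off in step 8.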

Possible Partners

Possible pharma industry partners

  • Vijay Bulusu, Pfizer
  • Susie Stephens, Johnson and Johnson
  • Ted Slater, Merck
  • Thérèse Vachon, Novartis

Possible SMA partners

  • Maryann Martone, SMA Foundation & UCSD

Content providers

Technical support

  • Anita de Waard, Elsevier Labs
  • Pieder Caduff, Reaxys
  • Linked Data repository crew, Elsevier

Content sources


Text mining collaborators

  • Ágnes Sándor, Xerox Research Europe
  • Ed Hovy, ISI
  • Maria Liakata, EBI

Argumentation ontology specialists

  • Jodi Schneider, DERI
  • Paolo Ciccarese, Harvard/MGH

Success Criteria

  1. Ability to confirm or uncover new disease-gene relationships
  2. Measurable increase in speed of identifying/assessing hypotheses
  3. Ease of replication to another domain.

Other desirable outcomes

Improved state-of-the-art in claim identification (text mining). Improved state-of-the-art in argumentative modeling.

See also

previous use case this draws from