Semantic Web Use Cases in Health Care and Life Sciences

Periodic draft

This document will be used to capture final forms of use cases. Please see the more volatile wiki version for use case development.

The Semantic Web lends itself to a seamless integration of multidisciplinary data. Parties considering an investment in Semantic Web technologies must examine the use cases in their own domain. This document illustrates a sampling of use cases related to health care and life sciences.

name | clinical data | personal health care | pharmaceutical development | biology | chemistry | scientific publication
Drug Discovery
Electronic Lab Notebooks
Comparator Arm Data
Patient Data Ownership
Biotech Acquisition
Supply Chain Automation
Web Integration

Drug Discovery

A synthesis of omics, pathway and systems biology data brings us to the next level of proactive drug discovery. Semantic Web architecture enables uncoordinated collaboration and connects to a growing fabric of highly connected data and text.

Biologists, chemists and health care informaticians have modeled vast swaths of biological function, chemical compound and disease data into public Linked Data repositories. Complementing these with proprietary stores of biomarkers, pathways and more evolved biological models knits the sources together, so that investigation and discovery queries can be answered more effectively.
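As a rough illustration of how public and proprietary statements might be combined, the following sketch models each source as a set of subject/predicate/object triples and answers one discovery question over their union. It uses plain Python sets rather than an RDF library or SPARQL, and every identifier in it (the "pub:" and "corp:" prefixes, the property names, the compound, protein, pathway, disease and biomarker) is a hypothetical placeholder, not a term from any real vocabulary or repository.

```python
# Minimal sketch: triples are (subject, predicate, object) tuples.
# All identifiers below are hypothetical and exist only for illustration.

public_triples = {
    ("pub:compoundX", "pub:inhibits", "pub:proteinP"),
    ("pub:proteinP", "pub:participatesIn", "pub:pathwayW"),
    ("pub:pathwayW", "pub:implicatedIn", "pub:diseaseD"),
}

proprietary_triples = {
    ("corp:biomarkerB", "corp:indicatesActivityOf", "pub:proteinP"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def biomarkers_for_disease(graph, disease):
    """Follow disease -> pathway -> protein -> biomarker links."""
    results = set()
    for pathway, _, _ in match(graph, p="pub:implicatedIn", o=disease):
        for protein, _, _ in match(graph, p="pub:participatesIn", o=pathway):
            for biomarker, _, _ in match(
                graph, p="corp:indicatesActivityOf", o=protein
            ):
                results.add((biomarker, protein, pathway))
    return sorted(results)


# Because both sources use the same identifier for the protein,
# merging them is a plain set union with no schema mapping step.
merged = public_triples | proprietary_triples

print(biomarkers_for_disease(merged, "pub:diseaseD"))
```

The query finds a result only in the merged set. The public data alone has no biomarker statements, and the proprietary data alone has no path from the protein to a disease.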

Structured Experimental Results

The panacea of complete, recallable coding of all institutional knowledge is, of course, out of reach. A representation that encourages and enables unambiguous recall, combined with a strategic assessment of assets, provides a linear value proposition for evolving towards a more reusable information infrastructure. For example, one can initially simply record that an experiment demonstrated an upregulation of a protein in the presence of a compound, and, as needs or sequencing output evolve, qualify that upregulation with factors and necessary conditions. This coded knowledge can travel from lab notebooks to publications to institutional archives.
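The incremental approach described above can be sketched with a small set of triples. The example first records only the bare finding, then adds qualifying statements later, and checks that a query written against the first version still returns the same answer. The "ex:" prefix, the property names and the experiment, protein, compound and condition identifiers are all hypothetical and chosen only for illustration.

```python
# Minimal sketch: an experiment record as (subject, predicate, object) triples.
# Every identifier here is hypothetical.

def upregulated_in_presence_of(graph, compound):
    """Proteins reported as upregulated in experiments using `compound`."""
    experiments = {
        s for (s, p, o) in graph
        if p == "ex:inPresenceOf" and o == compound
    }
    return sorted(
        o for (s, p, o) in graph
        if p == "ex:demonstratesUpregulationOf" and s in experiments
    )


def necessary_conditions(graph, experiment):
    """Conditions recorded as necessary for the experiment's finding."""
    return sorted(
        o for (s, p, o) in graph
        if s == experiment and p == "ex:necessaryCondition"
    )


# Step 1: the initial, coarse record of the finding.
notebook = {
    ("ex:experiment1", "ex:demonstratesUpregulationOf", "ex:proteinP"),
    ("ex:experiment1", "ex:inPresenceOf", "ex:compoundX"),
}
before = upregulated_in_presence_of(notebook, "ex:compoundX")

# Step 2: later, qualify the finding by adding statements.
# Nothing recorded earlier is rewritten or migrated.
notebook |= {
    ("ex:experiment1", "ex:necessaryCondition", "ex:conditionC"),
    ("ex:experiment1", "ex:observedInCellLine", "ex:cellLineL"),
}
after = upregulated_in_presence_of(notebook, "ex:compoundX")

print(before == after)
print(necessary_conditions(notebook, "ex:experiment1"))
```

The point of the design is that qualification is additive. Consumers that only understand the coarse statements keep working, while newer consumers can ask about the conditions.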

Increasingly, experimental protocols and findings are traded, shared in limited partnerships, or acquired. The receiving parties must interpret them without having had time to develop a shared culture with the originators. A coarse capture of experimental metadata and results can show immediate value, and precision can be added incrementally as use cases demand.

Comparator Arm

There is a move to have the FDA release "comparator arm" data from clinical trials. This is data gathered as part of a new drug investigation that isn't related to the new drug itself: it is comparison data about either on-market drugs or the state of the disease. Either way, it is not competitively sensitive for the pharma that collected it. Once this data starts to move out onto the web, there will be a strong pull to integrate with it. Getting the standards right for formatting, querying, and handling it will matter a great deal, and the companies that are at the table will be the best equipped to use the data first.

Patient Data Ownership

Patients are increasingly capturing their own data, from classic health record information (V.A. "blue button") to modern versions of health info (Patients Like Me, Cure Together) to genomic information (23andMe). There are no good standards for federating data from across this chain into a cohesive, usable and integrated system. If pharma doesn't create such a standard, the odds are good that a private standard à la Facebook will emerge, and its owners will extract high rents from those who need access to large populations for clinical molecular profiling (i.e., pharma).

Biotech Acquisition

Web standards, especially RDF/OWL/SPARQL, can ease the process of due diligence in a complex scientific data business. Pharmas constantly acquire projects, products, biotech companies, and each other. They also constantly form contractual alliances with each other, and the vast majority of the underlying information would be far easier to integrate if it were stored in web formats. This work is much less exploratory than the current HCLS work, as these data formats are already well worked out, and there is a strong economic imperative to shorten the time it takes either to make an acquisition or to complete it.

Supply Chain Automation

As the complexity of process input and product interaction grows, business modelers will need more autonomy in describing their domains. While this is true in many domains, it's particularly salient to workflows which consume and produce unstable goods such as biologics. Exploitation of RDF's distributed extensibility model will enable less centralized, more functional modeling of business processes, output from new instruments, and evolving understanding of chemical/biological processes. This effectively moves the information from the Electronic Lab Notebook into the operations domain, using scientific knowledge to identify costs and bottlenecks and drive improved accounting.

Patent/Literature Mining

Collaborative R&D



Clinical, municipal and nongovernmental agencies negotiate their own reporting contracts with monitoring agents. Likewise, regulatory bodies must respond flexibly to different reporting formats and information spaces. The Semantic Web offers consistent access to data regardless of whether the structure of that data has been standardized by a sanctioned organization or it has not yet gone through the standards process. This allows organizations to focus their standardization efforts on components key to early use cases and let other standards be developed or discovered by concerned parties.

Pharma Co-vigilance


@@ Pharmas are increasingly partnering with other pharmas and/or biotechs to develop new therapies.

@@Web Integration

The grounding of the Semantic Web promotes the integration of the assertional framework with conventional web architecture and tools. The ubiquity of tools supporting Web standards like HTML and XPath enables one to associate structured knowledge with regions of documents, communicating simultaneously in languages tailored to humans and machines. Web architecture has permeated our society; scientists and technicians need little or no training to use systems which are based on the familiar web browsing experience. The extension of this architecture to support not just resource linking and annotations but a fabric of structured knowledge will be the easiest transition into deep knowledge capture.
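One way to picture structured knowledge attached to regions of a document is markup in which the human-readable text carries machine-readable attributes, which are then selected with path expressions. The sketch below uses Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The attribute names about, property and resource are borrowed loosely from RDFa, but this is not a conforming RDFa processor, and the page content and all "ex:" identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical, well-formed page fragment. The sentence reads normally in a
# browser while its attributes carry the same claim for machines.
PAGE = """<html><body>
<p about="ex:experiment1">Exposure to
<span property="ex:inPresenceOf" resource="ex:compoundX">compound X</span>
led to upregulation of
<span property="ex:demonstratesUpregulationOf" resource="ex:proteinP">protein P</span>.</p>
</body></html>"""


def extract_statements(xhtml):
    """Return (subject, property, object, visible text) for annotated regions."""
    root = ET.fromstring(xhtml)
    statements = []
    # ".//*[@about]" selects every descendant element with an "about" attribute.
    for block in root.findall(".//*[@about]"):
        subject = block.get("about")
        # Within that region, select the elements that carry a property.
        for node in block.findall(".//*[@property]"):
            visible_text = "".join(node.itertext()).strip()
            statements.append(
                (subject, node.get("property"), node.get("resource"), visible_text)
            )
    return statements


for statement in extract_statements(PAGE):
    print(statement)
```

Each extracted tuple keeps the visible text alongside the identifiers, so a tool can point a reader back to the exact region of the page that makes the claim.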

Eric Prud'hommeaux

$Date: 2011/09/08 18:01:09 $