HCLSIG/CDS/Datasets and ontologies

Datasets and ontologies relevant for the CDS task force


Drug datasets


DrugBank

DrugBank (drugbank.ca) provides drug data (chemical, pharmacological and pharmaceutical) together with comprehensive drug target information (sequence, structure and pathway). It also includes information on drug-drug and drug-food interactions.


RxNorm

A Linked Data version of RxNorm, the NIH National Library of Medicine (NLM) database that connects prescription drugs, ingredients and National Drug Codes (NDCs) through a concept unique identifier, the RXCUI. RxNorm currently interlinks 12 drug vocabularies around this identifier. Because of licensing restrictions, only six of these vocabularies are available as part of the Linking Open Drug Data (LODD) cloud: Medical Subject Headings, Metathesaurus FDA National Drug Code Directory, Metathesaurus FDA Structured Product Labels, National Drug File, RxNorm Vocabulary, and Veterans Health Administration National Drug File.
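To illustrate how a shared concept identifier links otherwise separate vocabularies, here is a minimal Python sketch that resolves an NDC to its ingredients by joining through the RXCUI, and does the reverse lookup. All identifiers and both lookup tables are hypothetical placeholders, not real RxNorm or NDC data; real RxNorm relates drug concepts to ingredients through typed relationships rather than a flat list, so this only shows the join idea.

```python
# Hypothetical, illustrative records: identifiers are placeholders, not real RxNorm/NDC values.
ndc_to_rxcui: dict[str, str] = {
    "NDC-0001": "RXCUI-100",
    "NDC-0002": "RXCUI-100",
    "NDC-0003": "RXCUI-200",
}
rxcui_to_ingredients: dict[str, list[str]] = {
    "RXCUI-100": ["ingredient-x"],
    "RXCUI-200": ["ingredient-x", "ingredient-y"],
}


def ingredients_for_ndc(ndc: str) -> list[str]:
    """Resolve an NDC to its ingredients by joining through the shared RXCUI."""
    rxcui = ndc_to_rxcui.get(ndc)
    if rxcui is None:
        return []
    # Return a copy so callers cannot mutate the lookup table.
    return list(rxcui_to_ingredients.get(rxcui, []))


def ndcs_for_ingredient(ingredient: str) -> list[str]:
    """Reverse lookup: every NDC whose RXCUI concept includes the given ingredient."""
    return [
        ndc
        for ndc, rxcui in sorted(ndc_to_rxcui.items())
        if ingredient in rxcui_to_ingredients.get(rxcui, [])
    ]
```

For example, `ingredients_for_ndc("NDC-0003")` returns both placeholder ingredients because that NDC maps to `RXCUI-200`, while an unknown NDC yields an empty list.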

National Drug File Reference Terminology (NDF-RT)

NDF-RT is the terminology used by the FDA and the FedMed collaboration to code three essential pharmacologic properties of medications: mechanism of action, physiologic effect, and structural class.

Drug Interaction Knowledge Base (DIKB)

Known and predicted drug-drug interactions caused by metabolic inhibition, with links to and summaries of the supporting evidence.
HTML rendering: http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
D2R Server and SPARQL endpoint: http://dbmi-icode-01.dbmi.pitt.edu:2020/
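A SPARQL endpoint like this one can be queried over plain HTTP using the SPARQL Protocol's query-via-GET form. The Python sketch below builds such a request and asks for JSON results. It assumes the endpoint lives under `/sparql` (a common D2R Server convention, not stated above), that the server still exists and returns JSON, and it uses a generic triple-pattern query rather than any DIKB-specific vocabulary.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: D2R Server conventionally exposes SPARQL under /sparql; this
# service's actual path is not stated above and it may no longer be online.
ENDPOINT = "http://dbmi-icode-01.dbmi.pitt.edu:2020/sparql"

# Generic query: the first ten triples, independent of the dataset's vocabulary.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol query-via-GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the SPARQL JSON results (requires network access)."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the endpoint responds, `run_query(ENDPOINT, QUERY)["results"]["bindings"]` would hold one dictionary per result row; separating request construction from sending makes the request easy to inspect without network access.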

Datasets containing associations between genetic variation, associated phenotypes and genetic tests

Pharmacogenomics Knowledgebase / PharmGKB

A large database of curated knowledge and raw data about associations between genes, genetic variants, drug response and disease.

GWAS Central (formerly called HGVbaseG2P)

A database of genome-wide association studies that also provides summaries of study results.


A wiki-based platform containing information on phenotypes associated with SNP variants, population prevalence of genetic variants and SNP microarrays.

GET-Evidence (evidence.personalgenomes.org)

A large database of automatically annotated and then manually curated information about the impact of genetic variations. Example: http://evidence.personalgenomes.org/MYL2-A13T
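The example URL suggests that GET-Evidence names variant pages as a gene symbol followed by a one-letter amino-acid change (here MYL2, alanine to threonine at residue 13). A minimal Python sketch for parsing and rebuilding such identifiers, assuming that pattern holds in general; it is inferred from this single example, not from GET-Evidence documentation, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: pages are named <GENE>-<ref><position><alt>, inferred from the
# single example URL; one-letter amino-acid codes, with "*" allowed for a stop.
BASE_URL = "http://evidence.personalgenomes.org/"
_VARIANT_RE = re.compile(r"^(?P<gene>.+)-(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z*])$")


class ProteinChange(NamedTuple):
    gene: str
    ref: str
    position: int
    alt: str


def parse_variant(slug: str) -> ProteinChange:
    """Split an identifier such as 'MYL2-A13T' into gene, reference, position, alternate."""
    m = _VARIANT_RE.match(slug)
    if m is None:
        raise ValueError(f"unrecognised variant identifier: {slug!r}")
    return ProteinChange(m["gene"], m["ref"], int(m["pos"]), m["alt"])


def variant_url(change: ProteinChange) -> str:
    """Rebuild the (assumed) page URL for a parsed protein change."""
    return f"{BASE_URL}{change.gene}-{change.ref}{change.position}{change.alt}"
```

Parsing `"MYL2-A13T"` and passing the result to `variant_url` reproduces the example URL above.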

Online Mendelian Inheritance in Man (OMIM)

Information about diseases with Mendelian inheritance, including references to the implicated genes.


Results of studies that have investigated the interaction of genotype and phenotype.

HuGE Navigator

Information on the population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and the evaluation of genetic tests.

Genetic Association Database (GAD)

Diseases associated with genetic variants.


Aggregated gene-disease relationship data containing an integrated view over other datasets.

NCBI GeneTests
Genetics Home Reference

Genome databases with general data about genetic variation and human genomes

Locus Reference Genomic / LRG

An internationally recognized reference database that provides stable genomic DNA sequences and identifiers for regions of the human genome.


Large-scale genetic structural variation data (e.g., insertions, deletions).


Collections of personal genetic data

1000 Genomes Project

Genome sequences of over 1000 volunteers

Database of the Estonian Genome Center, University of Tartu

A collection of genetic data, linked to health and lifestyle data, for more than 50,000 people.

Personal Genome Project

Whole-genome data donated by volunteers.

Vanderbilt Biobank

See http://www.nature.com/clpt/journal/v84/n3/full/clpt200889a.html

Relevant ontologies and taxonomies

Suggested Ontology for Pharmacogenomics (SO-PHARM)

A complex ontology covering the representation of genetic variation and pharmacogenomics.

Pharmacogenomics Ontology (PO)

An ontology for measures and outcomes, used to represent PharmGKB data.

Pharmacogenomics Relationship Ontology (PHARE)

Defines concepts and roles for representing relationships of pharmacogenomic interest; used to represent findings extracted from text.

Sequence Ontology (SO)

Contains terms commonly used to annotate sequences and sequence features, including detailed descriptions of different types of sequence variation.

Disease Ontology

An ontology of human diseases.

Human Phenotype Ontology (HPO)
Mammalian Phenotype Ontology
Phenotypic Quality Ontology (PATO)

An ontology of types of phenotypic properties.

Logical Observation Identifiers Names and Codes (LOINC)

An established coding system for clinical laboratory results. It contains many identifiers for genetic test results.

Formats and schemas


A simple XML schema for the representation of SNPs [1]. Maintained by the Object Management Group (OMG).

Genomic Sequence Variation Markup Language (GSVML), ISO 25720:2009

An XML schema geared towa [2]. Maintained by the International Organization for Standardization (ISO).

HL7 Clinical Document Architecture (CDA) Genetic Testing Report (GTR)
Last modified on 5 October 2011, at 14:17