|Name||Topic||Short Description||Size||Status / Activity||Example Instances||SPARQL Endpoint|
|DrugBank||Drugs||Drugbank.ca provides drug (i.e., chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e., sequence, structure, and pathway) information (doi:10.1093/nar/gkj067)||766,920 triples; 4,800 drugs, 2,500 protein sequences||updated regularly||Varenicline via Marbles, via OpenLink Data Explorer||http://www4.wiwiss.fu-berlin.de/drugbank/sparql|
|LinkedCT||Clinical Trials||Linked data source of trials from ClinicalTrials.gov||7 million triples; 62,000 trials||preview release||Influenza (Intervention), A Trial, AIDS (condition), A reference, A location||http://data.linkedct.org/sparql|
|DailyMed||Drugs||dailymed.nlm.nih.gov provides information about approved prescription drugs, includes FDA approved labels (package inserts)||164,276 triples; 4,039 drugs||updated regularly||"Sterile Water (Irrigant)" via Marbles, via OpenLink Data Explorer||http://www4.wiwiss.fu-berlin.de/dailymed/sparql|
|DBpedia||Drugs / Diseases / Proteins||RDF data about 2.49 million things that have been extracted from Wikipedia||218 million RDF triples; 2,300 drugs, 2,200 proteins||updated every 3 months||Aspirin, HIV||http://dbpedia.org/sparql|
|Diseasome||Diseases / Genes||Diseasome describes characteristics of disorders and disease genes linked by known disorder–gene associations||91,182 triples; 2,600 genes||updated 2006||Alzheimer's via Marbles, via OpenLink Data Explorer||http://www4.wiwiss.fu-berlin.de/diseasome/sparql|
|RDF-TCM||Genes / Diseases / Medicine / Ingredients||Traditional Chinese medicine, gene and disease association dataset and a linkset mapping TCM gene symbols to Entrez Gene IDs created by Neurocommons||117,643||updated August 2009 (stable)||Ginkgo biloba||http://hcls.deri.org/sparql; graph name: http://hcls.deri.org/resource/graph/tcm|
|RxNorm||Drugs||A linked version of the NLM's RxNorm database that connects prescription drugs, ingredients, and NDCs through the RXCUI, a concept unique identifier. RxNorm is a product developed by NIH's National Library of Medicine. It currently interlinks 12 different drug vocabularies around a unique concept identifier. Due to licensing, only six of the drug vocabularies are made available as part of the LODD cloud: Medical Subject Headings, Metathesaurus FDA National Drug Code Directory, Metathesaurus FDA Structured Product Labels, National Drug File, RxNorm Vocabulary, and Veterans Health Administration National Drug File. Links are provided connecting RxNorm to DrugBank and to the UMLS.||over 7.7 million triples; 165,806 RXCUI (Concept Unique Identifiers) unique drugs and ingredients; 332,754 RXAUI (Atomic Unique Identifiers) sourced terms||Based on 3/2010 RxNorm release; last updated 5/2010||Singulair from the Metathesaurus FDA Structured Product Labels||http://link.informatics.stonybrook.edu/sparql/|
|SIDER||Diseases / Side Effects||SIDER contains information on marketed drugs and their adverse effects (doi:10.1038/msb.2009.98)||192,515 triples; 1,737 genes||updated 2009||Confusion via Marbles||http://www4.wiwiss.fu-berlin.de/sider/sparql|
|STITCH||Chemicals / Proteins||STITCH contains information on chemicals, proteins, and their interactions (doi:10.1093/nar/gkm795)||7,500,000 chemicals; 500,000 proteins; 370 organisms||updated July 2009||Lactose via Marbles||http://www4.wiwiss.fu-berlin.de/stitch/sparql|
|ChEMBL||Chemical / Assays (Proteins, Organisms) / Papers||ChEMBL contains information on trial drugs, including their activity against targets such as (but not limited to) proteins. All data is backed by and linked to the literature. Includes links to Bio2RDF for ChEBI and UniProt. License: CC-BY-SA.||~24M triples||Updated 2010-01||An IC50 activity||http://rdf.farmbio.uu.se/chembl/sparql|
|WHO Global Health Observatory||Infectious Diseases / Demography / Socioeconomic Conditions / Environmental Factors||Data and statistics for infectious diseases at country, regional, and global levels||354300||Updated 2010-09||xxx||http://aksw.org/Projects/GHO2SCOVO?v=wmb|
This figure shows the incorporation of LinkedCT, DailyMed, DrugBank, Diseasome, RDF-TCM, and SIDER into the Linked Data cloud. These data sets are represented in dark gray, while light gray represents other Linked Data from the life sciences, and white indicates interlinked datasets covering geographic, person-related and conceptual data. More on the interlinking methodology and statistics can be found on the Interlinking page.
Most of the LODD datasets have also been integrated into the SPARQL endpoint of the HCLS Knowledge Base; see the wiki page of the HCLS KB for further information.
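The endpoints listed in the table can in principle be queried over the standard SPARQL Protocol. The following is a minimal sketch in Python, not an official LODD client: the helper names (`build_query_url`, `parse_bindings`, `run_select`) are hypothetical, the endpoint URLs are copied from the table and may no longer be online, and it assumes the endpoint URL carries no query string of its own and can return SPARQL JSON results.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint URL copied from the DrugBank row of the table; it may be offline.
DRUGBANK_ENDPOINT = "http://www4.wiwiss.fu-berlin.de/drugbank/sparql"


def build_query_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET URL (hypothetical helper).

    `default_graph` maps to the protocol's `default-graph-uri` parameter,
    which is useful for rows that name a graph, such as RDF-TCM.
    """
    params = [("query", query)]
    if default_graph:
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


def parse_bindings(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_select(endpoint, query, default_graph=None, timeout=30):
    """Send a SELECT query and return its bindings (requires network access)."""
    request = Request(
        build_query_url(endpoint, query, default_graph),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))


if __name__ == "__main__":
    # A generic query that makes no assumption about the dataset's vocabulary.
    sample_query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
    print(build_query_url(DRUGBANK_ENDPOINT, sample_query))
```

For the RDF-TCM row, the graph name from the table would be passed as `default_graph`, for example `run_select("http://hcls.deri.org/sparql", query, "http://hcls.deri.org/resource/graph/tcm")`.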
Bio2RDF Data Sets
The Bio2RDF project has published 40 biology-, gene- and medical-related datasets (2.3 billion triples altogether). The datasets are available via SPARQL endpoints and as Linked Data. The recommended setup is to use the Bio2RDF Java Servlet and, optionally, to download the databases for efficient local use. Running your own instance of the OpenLink Virtuoso AMI for EC2 is also an option. Basic URI resolution on such an instance does not require the Java Servlet, but for advanced queries you should still download the servlet and configure it to query your EC2 SPARQL endpoint.
- Bio2RDF SPARQL endpoint list; SPARQL endpoint list in RDF
- Identification of an autoimmune enteropathy-related 75-kilodalton antigen, via an OpenLink hosted edition of Bio2RDF
- Structure of the gene encoding the human cyclin-dependent kinase inhibitor p18 and mutational analysis in breast cancer, via an OpenLink hosted edition of Bio2RDF
- PubMed article viewed using the Marbles Linked Data browser.
- PubMed author viewed using the Marbles Linked Data browser.
- OMIM Killer Cell Lectin-Like Receptor viewed using the Marbles Linked Data browser.
- Falcons Search for KILLER CELL. The Bio2RDF data has been crawled by the Falcons Semantic Web search engine. This is an example of how humans access the data through the search engine. Falcons also offers an API that applications can use to access the data.
- Information about the chem2bio2rdf data sets
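Because the Bio2RDF datasets are also published as Linked Data, a record can be fetched by dereferencing its URI with HTTP content negotiation. The sketch below is illustrative only: it assumes the commonly used `http://bio2rdf.org/<namespace>:<identifier>` URI pattern, the helper names are hypothetical, and the identifier in the usage line is a placeholder rather than a real record.

```python
from urllib.request import Request, urlopen

# Assumed base for Bio2RDF resource URIs; verify against the live service.
BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace, identifier):
    """Compose a resource URI following the assumed namespace:identifier pattern."""
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def rdf_request(uri, media_type="application/rdf+xml"):
    """Prepare a GET request asking for an RDF serialization instead of HTML."""
    return Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri, timeout=30):
    """Dereference the URI and return the raw RDF document (requires network access)."""
    with urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "pubmed" and "12345" are placeholders chosen for illustration.
    print(bio2rdf_uri("pubmed", "12345"))
```

The same request shape should work against a self-hosted instance by changing `BIO2RDF_BASE` to the address of that instance.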
Data Sets for the LODD Task
To complement the drug-related Web of Data built by the LODD effort, the following data sets could/should also be published as Linked Data.
The LODD effort is currently gathering more information about relevant datasets. See also Evaluation of LODD Data Sets for current evaluation results.
- Adis R&D Insight
- Citeline TrialTrove
- Drug Bank
- Drug Ontology
- Investigational Drug Database - Proprietary
- KEGG Drug
- National Drug Code
- Orange Book
- Pharmaprojects - Proprietary
- VA NDF-RT
- Other data sources could include blogs, discussion boards, wikis, etc.
Alternative Herbal Medicine use case
Identifier-Based Linkage Points
Data Set Attributes
- Data Format