HCLSIG/LODD/Data/DataSetEvaluation


This page gives an overview of the characteristics of datasets that could be relevant to the HCLSIG LODD effort.

LODD-related datasets that are already available as Linked Data

| Project Home Page | Topic | Short Description | Size | License | Data dumps | Status/Activity | Possible Linking to Other Datasets | Example Instances |
| ChEBI | Chemical Compounds | dictionary of molecular entities focused on small chemical compounds | 15,548 annotated entities | free | structured text files | updated monthly | CAS, KEGG | |
| DailyMed | Drugs | information about approved prescription drugs, includes FDA approved labels (package inserts) | 96,000 triples; 4,039 drugs | | XML, SPL | updated regularly (RSS) | RxNorm, NDC | "Sterile Water (Irrigant)" via Marbles, via OpenLink Data Explorer |
| DBpedia | Drugs / Diseases / Proteins | RDF data about 2.49 million things, extracted from Wikipedia | 218 million RDF triples; 2,300 drugs, 2,200 proteins | free | RDF | updated every 3 months | ATC, CAS, DrugBank, EntrezGene, HGNCid, OMIM, PubChem, ChemSpider | Aspirin, HIV |
| Diseasome | Diseases / Genes | characteristics of disorders and disease genes linked by known disorder–gene associations | 87,000 triples; 2,600 genes | free | structured text files | updated 2006 | OMIM, Entrez Gene | Alzheimer's via Marbles, via OpenLink Data Explorer |
| Drug Bank | Drugs | drug (i.e., chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e., sequence, structure, and pathway) information | 1.1 million triples; 4,800 drugs, 2,500 protein sequences | permission of authors needed | structured text files (FASTA, SDF, DrugCard) | updated August 2008, irregular | PubChem, FDA/NDA, ChEBI, IUPAC, InChI, CAS, KEGG | Varenicline via Marbles, via OpenLink Data Explorer |
| LinkedCT | Clinical Trials | Linked Data source of trials from ClinicalTrials.gov | 7 million triples; 62,000 trials | free - with Terms and Conditions | available upon request | preview release | DBpedia, GeoNames, Bio2RDF | Influenza (Intervention), A Trial, AIDS (condition), A reference, A location |
| OMIM | Genes | compendium of human genes and genetic phenotypes | 12,000 genes | license from the Johns Hopkins University needed | structured text files | updated daily | | CHONDROSARCOMA via Marbles, via OpenLink Data Explorer; TUMOR PROTEIN p53 via Marbles, via OpenLink Data Explorer |

LODD-related datasets that are already RDFized but not served as Linked Data on the Web

LODD-related datasets that are not available as Linked Data yet

| Project Home Page | Topic | Short Description | Size | License | Data dumps | Status/Activity | Possible Linking to Other Datasets | Example Instances |
| Adis R&D Insight | Drugs | comprehensively reports on the latest developments of drugs in active research and development internationally | 19,000 drugs | written permission of Adis Data Information BV needed | | updated weekly | CAS | |
| ChemBlast | Atoms | information on all the ligands (HIV-related) and their scaffolds | | | Molecule pictures, (MDL, Excel)? | updated April 2008 | IUPAC, PubChem | |
| ChemSpider | Chemical Compounds | database of organic molecules containing more than 20 million compounds from many different providers | >20,000,000 chemical compounds | | HTML, no downloads | updated regularly | ChEBI, DailyMed, KEGG, PubChem, Wikipedia, DrugBank, InChI, MESH | |
| ClinicalTrials.gov | Trials | federally and privately supported clinical trials conducted worldwide; information about a trial's purpose, who may participate, locations, and phone numbers for more details | 62,693 trials | accompanied by origin and date of data, and modifications made | HTML | | ChemIDplus | |
| Citeline TrialTrove | Trials | information about ongoing clinical trials | | proprietary | | | | |
| DrugDB | Drugs | (offline) | | | | | | |
| Drug Ontology | Drugs | ontology including concepts such as indications, interactions, formulary, etc. | | | OWL schema only | updated 2005 | | |
| DrugDigest | Drugs | usage advice for drugs, vitamins, and herbs | 1,500 drugs | permission needed | HTML | updated daily | | |
| DrugInfo | Drugs | covers drugs in clinical trials, in approval processes, and on the market; information mostly collected from other NLM services | 15,000 drugs | permission needed | HTML | updated daily | CAS, CT.gov, DailyMed, Medline, PubChem | |
| Investigational Drug Database | Drugs | investigational drug development, from first patent to eventual launch or discontinuation | 107,000 therapeutic patents; 23,000 drugs; 80,000 chemical structures | proprietary | | | | |
| IMS | Drugs | information about development, efficacy, and status of pharmaceuticals from early clinical testing through to launch | 16,800 drug summaries | proprietary | | updated weekly | | |
| KEGG Drug | Drugs | chemical-structure-based information resource for all approved drugs in Japan and the U.S.A.; each drug is identified by its D number and is associated with generic names, trade names, efficacy, target information, etc. | | academic usage | structured text files | updated July 2008 | DailyMed, PubChem | |
| LillyTrials | Trials | clinical trials sponsored by Eli Lilly and Company | | | HTML, PDF | updated regularly | NDA | |
| MedMaster | Drugs | information on drugs and their interactions, herbs and supplements | >1,000 drugs | permission needed | HTML | updated daily | NDA | |
| National Drug Code | Drugs | prescription drugs and insulin products that have been manufactured, prepared, propagated, compounded, or processed by registered establishments for commercial distribution | | free | structured text files | updated regularly | NDA | |
| Orange Book | Drugs | generic product ANDA (Abbreviated New Drug Application) approvals | | free | structured text files | updated daily | NDA, FDA | |
| Pharmaprojects | Drugs | drugs and their bindings to proteins | | proprietary | | | Entrez Gene | |
| PubChem | Chemical Compounds | chemical structures of small organic molecules and information on their biological activities | | free | ASN.1, XML, SDF | updated daily | IUPAC, InChI | |
| RxNorm | Drugs | standard names for clinical drugs | | login required | | | National Drug Code | |

LODD-related papers

| Project Home Page | Topic |
| Drugome | Drugs |
| VA NDF-RT | Drugs |

Common Identifiers

CAS ChEBI ChemSpider CT.gov DailyMed DBpedia DrugBank Entrez NDA HGNC InChI IUPAC KEGG Medline MESH OMIM PubChem RxNorm
Adis R&D x
ChEBI x x x
ChemBlast x x x
ChemSpider x x x x x x x
CTrialTrove
CT.gov x x x
DailyMed x x x
DBpedia x x x x x x x x x x x
Diseasome x x
Drug Bank x x x x x x x x
Drug Ontology
DrugDB
DrugDigest
DrugInfo x x x x x
IMS
IDD
KEGG Drug x x x x
LillyTrials x
LinkedCT x
MedMaster x
NDC x
OMIM x
Orange Book x
Pharmaprojects x
PubChem x x x
RxNorm x x
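
As a rough illustration of how common identifiers make data sets linkable, the sketch below takes the "Possible Linking to Other Datasets" entries of five data sets from the first table and lists every pair that shares at least one identifier scheme. The names `LINKS` and `shared_identifiers` are made up for this example, and the identifier spellings have been normalised by hand (e.g. "RX Norm" to "RxNorm", "INCHI" to "InChI", "HGNCid" to "HGNC"). Sharing a scheme only suggests a possible link; it does not guarantee that the two data sets contain matching records.

```python
from itertools import combinations

# Identifier schemes per data set, copied from the "Possible Linking to
# Other Datasets" column of the first table. Spellings are normalised by
# hand, which is an assumption of this sketch rather than part of the data.
LINKS = {
    "ChEBI": {"CAS", "KEGG"},
    "DailyMed": {"RxNorm", "NDC"},
    "DBpedia": {"ATC", "CAS", "DrugBank", "Entrez Gene", "HGNC",
                "OMIM", "PubChem", "ChemSpider"},
    "Diseasome": {"OMIM", "Entrez Gene"},
    "DrugBank": {"PubChem", "FDA/NDA", "ChEBI", "IUPAC", "InChI",
                 "CAS", "KEGG"},
}


def shared_identifiers(links):
    """Map each pair of data sets to the identifier schemes they share.

    Pairs with no scheme in common are left out of the result.
    """
    result = {}
    for a, b in combinations(sorted(links), 2):
        common = links[a] & links[b]
        if common:
            result[(a, b)] = common
    return result


pairs = shared_identifiers(LINKS)
for (a, b), common in pairs.items():
    print(f"{a} <-> {b}: {', '.join(sorted(common))}")
```

Extending `LINKS` with the remaining rows of both tables would give a fuller picture. Only the schemes listed in the tables are considered here, so direct references from one data set to another (such as DBpedia listing DrugBank) are not treated as links.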

The figure below shows the data sets that can be interlinked through common identifiers as of October 2008.