HCLSIG/LODD/Data/DataSetEvaluation
This page provides an overview of the characteristics of datasets that could be relevant for the HCLSIG LODD effort.
Project Home Page | Topic | Short Description | Size | License | Data dumps | Status/Activity | Possible Linking to Other Datasets | Example Instances |
ChEBI | Chemical Compounds | dictionary of molecular entities focused on small chemical compounds | 15,548 annotated entities | free | structured text files | updated monthly | CAS, KEGG | |
DailyMed | Drugs | information about approved prescription drugs, including FDA-approved labels (package inserts) | 96,000 triples; 4,039 drugs | | XML, SPL | updated regularly (RSS) | RxNorm, NDC | "Sterile Water (Irrigant)" via Marbles, via OpenLink Data Explorer |
DBpedia | Drugs/ Diseases/ Proteins | RDF data about 2.49 million things that has been extracted from Wikipedia | 218 million RDF triples; 2,300 drugs, 2,200 proteins | free | RDF | updated every 3 months | ATC, CAS, DrugBank, EntrezGene, HGNCid, OMIM, PubChem, ChemSpider | Aspirin, HIV |
Diseasome | Diseases / Genes | characteristics of disorders and disease genes linked by known disorder–gene associations | 87,000 triples; 2,600 genes | free | structured text files | updated 2006 | OMIM, Entrez Gene | Alzheimer's via Marbles, via OpenLink Data Explorer |
Drug Bank | Drugs | drug (i.e., chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e., sequence, structure, and pathway) information | 1.1 million triples; 4,800 drugs, 2,500 protein sequences | permission of authors needed | structured text files (FASTA, SDF, DrugCard) | updated August 2008, irregular | PubChem, FDA/NDA, ChEBI, IUPAC, InChI, CAS, KEGG | Varenicline via Marbles, via OpenLink Data Explorer |
LinkedCT | Clinical Trials | Linked data source of trials from ClinicalTrials.gov | 7 million triples; 62,000 trials | free - with Terms and Conditions | Available upon request | preview release | DBpedia, GeoNames, Bio2RDF | Influenza (Intervention), A Trial, AIDS (condition), A reference, A location |
OMIM | Genes | compendium of human genes and genetic phenotypes | 12,000 genes | license from the Johns Hopkins University needed | structured text files | updated daily | | CHONDROSARCOMA via Marbles, via OpenLink Data Explorer; TUMOR PROTEIN p53 via Marbles, via OpenLink Data Explorer |
- Datasets RDFized by the Linked Life Data project (UniProt, Entrez Gene, Gene Ontology, Reactome, BioCyc, ...)
- Datasets RDFized by the BioRDF project (GO, MeSH, Reactome, UniProt, ...)
Project Home Page | Topic | Short Description | Size | License | Data dumps | Status/Activity | Possible Linking to Other Datasets | Example Instances |
Adis R&D Insight | Drugs | comprehensively reports on the latest developments of drugs in active research and development internationally | 19,000 drugs | written permission of Adis Data Information BV needed | | updated weekly | CAS | |
ChemBlast | Atoms | information on all the ligands (HIV-related) and their scaffolds | | | Molecule pictures, (MDL, Excel)? | updated April 2008 | IUPAC, PubChem | |
ChemSpider | Chemical Compounds | database of organic molecules containing more than 20 million compounds from many different providers | >20,000,000 chemical compounds | | HTML, no downloads | updated regularly | ChEBI, DailyMed, KEGG, PubChem, Wikipedia, DrugBank, InChI, MeSH | |
ClinicalTrials.gov | Trials | federally and privately supported clinical trials conducted worldwide; information about a trial's purpose, who may participate, locations, and phone numbers for more details | 62,693 trials | accompanied by origin and date of data, and modifications made | HTML | | ChemIDplus | |
Citeline TrialTrove | Trials | information about ongoing clinical trials | | proprietary | | | | |
DrugDB | Drugs | (offline) | | | | | | |
Drug Ontology | Drugs | ontology including concepts such as indications, interactions, formulary, etc. | | | OWL schema only | updated 2005 | | |
DrugDigest | Drugs | usage advice for drugs, vitamins, and herbs | 1,500 drugs | permission needed | HTML | updated daily | | |
DrugInfo | Drugs | covers drugs in clinical trials, approval processes and on the market; information collected mostly from other NLM services | 15,000 drugs | permission needed | HTML | updated daily | CAS, CT.gov, DailyMed, Medline, PubChem | |
Investigational Drug Database | Drugs | investigational drug development, from first patent to eventual launch or discontinuation | 107,000 therapeutic patents; 23,000 drugs; 80,000 chemical structures | proprietary | | | | |
IMS | Drugs | information about development, efficacy, and status of pharmaceuticals from early clinical testing through to launch | 16,800 drug summaries | proprietary | | updated weekly | | |
KEGG Drug | Drugs | chemical structure based information resource for all approved drugs in Japan and the U.S.A.; each is identified by the D number, and is associated with generic names, trade names, efficacy, target information, etc. | | academic usage | structured text files | updated July 2008 | DailyMed, PubChem | |
LillyTrials | Trials | clinical trials sponsored by Eli Lilly and Company | | | HTML, PDF | updated regularly | NDA | |
MedMaster | Drugs | information on drugs and their interactions, herbs and supplements | >1,000 drugs | permission needed | HTML | updated daily | NDA | |
National Drug Code | Drugs | prescription drugs and insulin products that have been manufactured, prepared, propagated, compounded, or processed by registered establishments for commercial distribution | | free | structured text files | updated regularly | NDA | |
Orange Book | Drugs | Generic product ANDA (Abbreviated New Drug Application) approvals | | free | structured text files | updated daily | NDA, FDA | |
Pharmaprojects | Drugs | drugs and their bindings to proteins | | proprietary | | | Entrez Gene | |
PubChem | Chemical Compounds | chemical structures of small organic molecules and information on their biological activities | | free | ASN.1, XML, SDF | updated daily | IUPAC, InChI | |
RxNorm | Drugs | standard names for clinical drugs | | login required | | | National Drug Code | |
Project Home Page | Topic |
Drugome | Drugs |
VA NDF-RT | Drugs |
Common Identifiers
CAS | ChEBI | ChemSpider | CT.gov | DailyMed | DBpedia | DrugBank | Entrez | NDA | HGNC | InChI | IUPAC | KEGG | Medline | MESH | OMIM | PubChem | RxNorm | |
Adis R&D | x | |||||||||||||||||
ChEBI | x | x | x | |||||||||||||||
ChemBlast | x | x | x | |||||||||||||||
ChemSpider | x | x | x | x | x | x | x | |||||||||||
CTrialTrove | ||||||||||||||||||
CT.gov | x | x | x | |||||||||||||||
DailyMed | x | x | x | |||||||||||||||
DBpedia | x | x | x | x | x | x | x | x | x | x | x | |||||||
Diseasome | x | x | ||||||||||||||||
Drug Bank | x | x | x | x | x | x | x | x | ||||||||||
Drug Ontology | ||||||||||||||||||
DrugDB | ||||||||||||||||||
DrugDigest | ||||||||||||||||||
DrugInfo | x | x | x | x | x | |||||||||||||
IMS | ||||||||||||||||||
IDD | ||||||||||||||||||
KEGG Drug | x | x | x | x | ||||||||||||||
LillyTrials | x | |||||||||||||||||
LinkedCT | x | |||||||||||||||||
MedMaster | x | |||||||||||||||||
NDC | x | |||||||||||||||||
OMIM | x | |||||||||||||||||
Orange Book | x | |||||||||||||||||
Pharmaprojects | x | |||||||||||||||||
PubChem | x | x | x | |||||||||||||||
RxNorm | x | x |
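The identifier overlaps recorded in the "Possible Linking to Other Datasets" column can also be explored programmatically. The following is a minimal sketch, not part of the LODD tooling: it transcribes the linking entries of five rows from the first table into a dictionary and lists the dataset pairs that share at least one identifier scheme. The names `LINKS` and `shared_identifiers` are hypothetical, and the spellings "Entrez Gene", "HGNC", "InChI" and "RxNorm" are normalized here by assumption so that variants such as "EntrezGene" or "HGNCid" compare equal.

```python
from itertools import combinations

# Identifier schemes per dataset, transcribed from the
# "Possible Linking to Other Datasets" column of the first table.
# Spellings are normalized (an assumption of this sketch).
LINKS = {
    "ChEBI": {"CAS", "KEGG"},
    "DailyMed": {"RxNorm", "NDC"},
    "DBpedia": {"ATC", "CAS", "DrugBank", "Entrez Gene", "HGNC",
                "OMIM", "PubChem", "ChemSpider"},
    "Diseasome": {"OMIM", "Entrez Gene"},
    "DrugBank": {"PubChem", "FDA/NDA", "ChEBI", "IUPAC", "InChI",
                 "CAS", "KEGG"},
}


def shared_identifiers(links):
    """Map each dataset pair to the sorted identifier schemes both list.

    Pairs with no scheme in common are omitted.
    """
    result = {}
    for a, b in combinations(sorted(links), 2):
        common = links[a] & links[b]
        if common:
            result[(a, b)] = sorted(common)
    return result


if __name__ == "__main__":
    for (a, b), ids in shared_identifiers(LINKS).items():
        print(f"{a} <-> {b}: {', '.join(ids)}")
```

With the entries above, ChEBI and DrugBank share CAS and KEGG, while DailyMed shares no scheme with any of the other four. A shared scheme only indicates that linking is possible; it says nothing about how many records actually match.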
The figure below shows the datasets that can be interlinked through common identifiers as of October 2008.