Potential Data Sources for HCLS demonstrations

Types of data that could be useful in a demo - a short explanation

Real data is essential to developing a convincing HCLS demonstration. Where real data cannot be found or made available, pseudo data with the same attributes and structure will be used. However, even creating convincing pseudo data requires a thorough understanding of how the real data is used. Ideally, therefore, actual data (anonymized or otherwise protected by transformation) will be supplied by a data or problem owner, i.e. someone who wants something from the data. Government regulations that protect patient privacy, and the caution they engender, make it extremely difficult to acquire patient data suitable for the demonstrations that HCLS would like to develop. This page provides an overview of possible data sources.

We are interested in demonstrating how linked data can benefit patients and the clinical practitioners who diagnose and treat them. To demonstrate translational medicine, we would link out from attributes of a patient's record, such as 'disease', to everything from the patient's eligibility for clinical trials to information about new compounds that could be helpful during treatment. We would also like to demonstrate that supplying the clinician with the relevant information at each step lets us automatically catch errors and discrepancies between related procedures, which in turn reduces the associated healthcare costs. Data that has been annotated with a medical terminology or ontology would be *very* useful.
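The "linking out" idea can be illustrated with a minimal sketch. It assumes the patient record, trial registry, and compound data have already been expressed as (subject, predicate, object) triples; every identifier and predicate name below (patient:001, hasDiagnosis, recruitsFor, investigatedFor) is invented for illustration and does not come from any of the sources listed on this page.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all identifiers are invented.
TRIPLES = [
    ("patient:001", "hasDiagnosis", "disease:D1"),
    ("trial:T1", "recruitsFor", "disease:D1"),
    ("trial:T2", "recruitsFor", "disease:D2"),
    ("compound:C1", "investigatedFor", "disease:D1"),
]


def build_index(triples):
    """Index subjects by (predicate, object) so lookups by disease are cheap."""
    by_predicate_object = defaultdict(set)
    for subject, predicate, obj in triples:
        by_predicate_object[(predicate, obj)].add(subject)
    return by_predicate_object


def link_out(patient, triples):
    """Follow the patient's diagnoses to trials and compounds that share the disease."""
    index = build_index(triples)
    diseases = sorted(
        obj for subject, predicate, obj in triples
        if subject == patient and predicate == "hasDiagnosis"
    )
    return {
        disease: {
            "trials": sorted(index[("recruitsFor", disease)]),
            "compounds": sorted(index[("investigatedFor", disease)]),
        }
        for disease in diseases
    }


if __name__ == "__main__":
    print(link_out("patient:001", TRIPLES))
```

A real demonstration would presumably use an RDF store and shared ontology identifiers rather than in-memory tuples; the sketch only shows why a common 'disease' identifier is what makes the join possible.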

PatientsLikeMe (http://www.patientslikeme.com/)

PatientsLikeMe describes itself as follows: "Our goal is to enable people to share information that can improve the lives of patients diagnosed with life-changing diseases. To make this happen, we've created a platform for collecting and sharing real world, outcome-based patient data (patientslikeme.com) and are establishing data-sharing partnerships with doctors, pharmaceutical and medical device companies, research organizations, and non-profits. Contact us if you're interested in working together to achieve our goals."

caTIES (http://caties.cabig.upmc.edu)

A system for establishing large-volume repositories of concept-coded, de-identified pathology reports that supports sharing of de-identified information across institutions. Contact: Rebecca Crowley

NCBO Biomedical Resource Index (http://bioportal.bioontology.org/resources)

NCBO is building a system for automated ontology-based annotation and indexing of biomedical data. We process the textual metadata of diverse elements of biomedical resources such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed abstracts to annotate and index them with terms from appropriate ontologies.
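To make the idea of ontology-based annotation and indexing concrete, here is a minimal sketch using exact, case-insensitive label matching. It is not NCBO's implementation: the lexicon, the concept identifiers (EX:0001 and so on), and the function names are all hypothetical, and a real annotator would draw its labels and synonyms from actual ontologies.

```python
import re

# Hypothetical mini-lexicon: lower-cased label -> invented concept identifier.
LEXICON = {
    "melanoma": "EX:0001",
    "gene expression": "EX:0002",
    "radiology": "EX:0003",
}


def annotate(text, lexicon=LEXICON):
    """Return sorted (start, end, label, concept) tuples for whole-word label matches."""
    annotations = []
    lowered = text.lower()
    for label, concept in lexicon.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, lowered):
            annotations.append((match.start(), match.end(), label, concept))
    return sorted(annotations)


def build_index(resources, lexicon=LEXICON):
    """Map each concept to the set of resource ids whose metadata mentions it."""
    index = {}
    for resource_id, text in resources.items():
        for _start, _end, _label, concept in annotate(text, lexicon):
            index.setdefault(concept, set()).add(resource_id)
    return index


if __name__ == "__main__":
    resources = {
        "r1": "Melanoma study",
        "r2": "Radiology images of melanoma",
    }
    print(annotate("Gene expression in melanoma"))
    print(build_index(resources))
```

The resulting concept-to-resource index is the piece that lets a demo retrieve every data set tagged with a given disease term, regardless of which resource it came from.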

CollabRx (http://collabrx.com/)

CollabRx is developing a system that enables teams of experts to discuss and apply state-of-the-art knowledge to the treatment of specific patients, as part of a personalized medicine approach. CollabRx has used Semantic Web technologies, Lisp, and Tolven (http://www.tolven.org/) to create a knowledge base and a patient record system that links to it. Both Scott Marshall and Eric Prud'hommeaux have met with CollabRx.

Cleveland Clinic

Cleveland Clinic has a clinical report system that has been in use for several years and makes use of XML and RDF to process and store clinical reports for cardiology surgery.

AdisInsight (http://www.adisinsight.com/)

A database of clinical trials (including information from various trial registries, conferences, media releases, etc.): http://www.adisinsight.com/adis_cti.htm
A database of pharmaceutical products in research and development: http://www.adisinsight.com/adis_rdi.htm
A database of adverse events: http://www.adisinsight.com/adis_rp.htm

Contact: Diana Faulds, who responded to Scott's blog post requesting data and joined a TMO telcon to tell us about the data. She can arrange for others to supply more detail and an extract of the data.