From W3C Wiki
This page describes a joint effort between the BioRDF and LODD task forces of the HCLSIG to connect knowledge about alternative medicines and western drugs. The aim is to help patients search for information on alternative medicines and to support biomedical researchers in drug discovery.
The goal of this exercise is to:
- test and evaluate a novel approach to creating links between RDF datasets at large scale
- demonstrate how Linked Data can connect TCM and western medicine so that the two types of medicine can be explored together. We demonstrate:
- how we can inform patients about possible side effects of an herb by discovering the side-effect information reported in clinical trials of western drugs that share ingredients with it
- how we can inform researchers about possible targets of an alternative medicine by discovering the possible targets of western drugs that share ingredients with it
- how we can verify that genes reported by TCM researchers as being associated with alternative medicines used for Alzheimer's Disease (AD) are indeed AD genes, using studies of these genes in the context of western drugs
- RDF-TCM: see http://code.google.com/p/junsbriefcase/wiki/RDFTCMData
- Drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/
- SIDER: http://www4.wiwiss.fu-berlin.de/sider/
- Dailymed: http://www4.wiwiss.fu-berlin.de/dailymed/
- Diseasome: http://www4.wiwiss.fu-berlin.de/diseasome/
- LinkedCT: http://www.linkedct.org/
- aTags: http://hcls.deri.org/atag/data/tcm_atags.html
These source datasets are transformed into RDF format by the following two approaches:
- For DrugBank, DailyMed, Diseasome, SIDER and STITCH, the source datasets, available as tab-delimited or XML files, are imported into a relational database, and a D2R server is then set up over each relational database.
- Customized Python scripts transform the tab-delimited TCMGeneDIT data. The scripts can be found at: http://code.google.com/p/junsbriefcase/source/browse/#svn/trunk/biordf2009_query_federation_case/tcm-data
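The second approach can be illustrated with a minimal sketch. It is not the project's actual script: the two-column layout (herb name, gene symbol), the `http://example.org/tcm/` namespace and the `associatedGene` predicate are all assumptions made for illustration, and the real RDF-TCM URIs may differ.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and predicate, chosen only for this example.
BASE = "http://example.org/tcm/"
ASSOC = BASE + "vocab/associatedGene"


def rows_to_ntriples(tsv_text):
    """Convert 'herb<TAB>gene' rows into a list of N-Triples lines."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        # Skip blank lines and rows missing either column.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        # Percent-encode the names so they are safe inside a URI.
        herb = BASE + "herb/" + quote(row[0].strip(), safe="")
        gene = BASE + "gene/" + quote(row[1].strip(), safe="")
        triples.append(f"<{herb}> <{ASSOC}> <{gene}> .")
    return triples


sample = "Ginkgo biloba\tAPP\nPanax ginseng\tBDNF\n"
for line in rows_to_ntriples(sample):
    print(line)
```

Emitting N-Triples keeps the sketch free of third-party dependencies; a production script would more likely use an RDF library to handle serialization and literal escaping.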
Interlinking of datasets
We used two approaches to create links between datasets at large scale:
- Silk: http://www4.wiwiss.fu-berlin.de/bizer/silk/
- Customized scripts to create the links between RDF-TCM and Entrez Gene, which is hosted at http://hcls.deri.org/sparql:
- First, search the SPARQL endpoint for matching Entrez genes, using exact gene-name matches as filters
- Then manually correct many-to-one gene mappings using the Entrez and TCM database web pages
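The two steps of the customized-script approach might look roughly like the sketch below. It makes several assumptions: that gene names are exposed through `rdfs:label`, that "many to one" means several TCM gene names resolving to the same Entrez gene, and that the function names `build_gene_query` and `many_to_one` are illustrative only. The sketch builds the query text and flags candidates for manual review; it does not contact the endpoint.

```python
from collections import defaultdict


def build_gene_query(gene_name):
    """Build a SPARQL query that matches a gene label exactly."""
    # Escape backslashes and double quotes so the name is a valid string literal.
    escaped = gene_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?gene WHERE {\n"
        "  ?gene rdfs:label ?label .\n"  # assumed labelling predicate
        f'  FILTER (str(?label) = "{escaped}")\n'
        "}"
    )


def many_to_one(pairs):
    """Return {entrez_uri: sorted TCM names} for Entrez genes hit by more than one name."""
    by_target = defaultdict(set)
    for tcm_name, entrez_uri in pairs:
        by_target[entrez_uri].add(tcm_name)
    return {uri: sorted(names) for uri, names in by_target.items() if len(names) > 1}


print(build_gene_query("APP"))

# Hypothetical match results: (TCM gene name, Entrez gene URI).
matches = [
    ("APP", "http://example.org/entrez/351"),
    ("amyloid precursor", "http://example.org/entrez/351"),
    ("BDNF", "http://example.org/entrez/627"),
]
print(many_to_one(matches))
```

Only the mappings returned by `many_to_one` would then need to be checked by hand against the Entrez and TCM database web pages.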
The figure below shows the datasets that have been published so far and their interlinking paths.
Representation of Interlinks
- For the set of links created between any two datasets: we define it as a voiD:LinkSet and an oddlinker:linkage_run
- For each link: we represent it as an oddlinker:interlink in order to provide additional metadata about the link, such as which data items are linked and how confident we are in the link.
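As a rough illustration of this representation, the sketch below produces Turtle for one link set and one link. The class names `oddlinker:linkage_run` and `oddlinker:interlink` come from the text above; everything else about the oddlinker vocabulary is assumed, including the placeholder namespace and the property names `part_of_run`, `link_source`, `link_target` and `confidence`. Note that the voiD vocabulary itself spells the class `void:Linkset`.

```python
PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "# Placeholder namespace: the real oddlinker vocabulary URI is not given here.\n"
    "@prefix oddlinker: <http://example.org/oddlinker#> .\n"
    "@prefix ex: <http://example.org/links/> .\n"
)


def describe_linkset(run_id):
    """Describe the set of links between two datasets."""
    return (
        f"ex:{run_id} a void:Linkset, oddlinker:linkage_run ;\n"
        "    void:linkPredicate owl:sameAs .\n"
    )


def describe_link(link_id, run_id, source, target, confidence):
    """Describe a single link with its endpoints and a confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # All four property names below are hypothetical.
    return (
        f"ex:{link_id} a oddlinker:interlink ;\n"
        f"    oddlinker:part_of_run ex:{run_id} ;\n"
        f"    oddlinker:link_source <{source}> ;\n"
        f"    oddlinker:link_target <{target}> ;\n"
        f"    oddlinker:confidence {confidence:.2f} .\n"
    )


print(PREFIXES)
print(describe_linkset("run1"))
print(describe_link("link1", "run1",
                    "http://example.org/tcm/gene/APP",
                    "http://example.org/entrez/351", 0.95))
```

Reifying each link as its own resource is what makes it possible to attach per-link metadata such as a confidence score, which a bare `owl:sameAs` triple cannot carry.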
The applications that support the motivating use cases are currently deployed at http://www.open-biomed.org.uk/admed/admedapps/searchInfoAboutTCM/.
- Incorporate additional data sources, e.g., herbal and/or TCM-related sources as well as genomic, clinical and drug data sources
- Explore multi-lingual interlinking
- Develop new use cases and user-facing applications
- Provide automatic notification of interlink updates between datasets