Semantic Web Japan, Personalized Medicine, and Knowledge Explorer

I recently had the good fortune of being invited to give a keynote at the Japanese Semantic Web conference, held at Keio University in the beautiful city of Tokyo, Japan. The speaker before me, Tetsuro Toyoda, described how the Semantic Web is being used at RIKEN (The Institute of Physical and Chemical Research). Many of these interesting activities, such as Semantic-JSON and Semantic Table, can be found at the RIKEN database. There was a healthy showing of commercial interest at the conference, as well as a demonstration of a (Japanese) medical terminology system that makes use of OWL and can provide English translations. I wish our Japanese colleagues a quick recovery from the recent earthquake and tsunami. Although Tokyo was not hit as hard as other areas, it has still been severely affected, with power blackouts delaying the return to normality.

After Tokyo, I traveled to the beautiful city of Vancouver to speak at the Best Practices in Personalized Medicine Workshop, which was held as part of the Heart + Lung Research & Education FEST. The Semantic Web session where I spoke also had nicely complementary talks from Xavier Lopez (Oracle) and Mark Wilkinson (UBC). At B2PM, Leroy Hood opened with a keynote that expanded on P4, his description of a new approach to medicine that is “powerfully predictive, personalized, preventative, meaning we’ll shift the focus to wellness, and participatory”, and described recent progress. B2PM attendees presented and discussed how to achieve the goals of personalized medicine. Many participants have expressed interest in making use of the established network to further the cause of personalized medicine.

I have been learning about the Sentient Knowledge Explorer from IO Informatics. Knowledge Explorer has some very useful features, including the generation of SPARQL queries from user-selected sub-networks, so that users do not need to know SPARQL. A full description can be found in the W3C use case description “Case Study: Applied Semantic Knowledgebase for Detection of Patients at Risk of Organ Failure through Immune Rejection”.
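To make the idea of generating a query from a selected sub-network concrete, here is a minimal sketch in Python. It is not how Knowledge Explorer works internally (I have not seen its implementation). The function names, the example.org IRIs, and the rule that any node that is not an IRI becomes a variable are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# One selected edge of the sub-network: (subject, predicate IRI, object).
Edge = Tuple[str, str, str]


def term(value: str, variables: Dict[str, str]) -> str:
    """Render a node: IRIs stay constants, any other label becomes a variable."""
    if value.startswith("http://") or value.startswith("https://"):
        return f"<{value}>"
    if value not in variables:
        variables[value] = f"?v{len(variables)}"
    return variables[value]


def subnetwork_to_sparql(edges: Iterable[Edge]) -> str:
    """Turn the selected edges into a SELECT query with one triple pattern per edge."""
    variables: Dict[str, str] = {}
    patterns = []
    for s, p, o in edges:
        patterns.append(f"  {term(s, variables)} <{p}> {term(o, variables)} .")
    select = " ".join(variables.values()) or "*"
    return "SELECT " + select + " WHERE {\n" + "\n".join(patterns) + "\n}"


# Hypothetical selection: two edges picked by a user in a graph view.
selection = [
    ("patient", "http://example.org/hasBiomarker", "marker"),
    ("marker", "http://example.org/indicates", "http://example.org/OrganRejection"),
]
query = subnetwork_to_sparql(selection)
print(query)
```

The user only picks nodes and edges; the triple patterns and variable names are derived from that selection, which is why no knowledge of SPARQL is needed.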

C-SHALS 2011

The Conference on Semantics in Health Care and Life Sciences (C-SHALS) is being held in Cambridge, MA, on 23-25 February 2011. The C-SHALS conference provides a forum for advancing the application of Semantic Web technologies in the life sciences. Representatives from W3C’s Health Care and Life Sciences Interest Group will give both tutorials and presentations, alongside an impressive lineup of speakers from the HCLS and pharma communities. Chris Bouton’s TripleMap will be presented there too, having opened up a Public Alpha version in time for C-SHALS. I expect this year’s C-SHALS to be very interesting!

2010 – A good year.

2010 was a good year for the Semantic Web, which has especially gained momentum in health care and life sciences. I will just mention a few of the reasons it was a good year. Semantic Web and Linked Data have become a clear choice for projects that have a special interest in interoperability and data sharing across enterprises and between project partners. This includes the European Innovative Medicines Initiative (IMI), in which a number of projects will want to share results and information across several domains, including drug discovery, electronic patient records, clinical trials, and tissue banking. The Pistoia Alliance SESL project has also moved things forward, showing how to add value to data by linking microarray data with claims in the literature and other public domain resources. Great news: with another 5 years of NIH funding, NCBO is well-positioned to contribute to collaborative science and translational research.

Biobanking provides the Semantic Web with an ideal resource-sharing application, one that makes the added value of linked data more concrete for bench scientists, who understand the immediate need to share tissue libraries in order to move certain types of research forward. A keynote speaker at BBMRI meetings in the Netherlands and the Life Sciences Momentum 2010 conference, David Cox (Pfizer) provides a business case for biobanking, couched in a solid research strategy that is mutually beneficial to pharmaceutical companies and their academic partners. In the strategy outlined by Cox, a transparent policy on public domain knowledge and intellectual property is key to making biobanking and drug development work well for academic and pharmaceutical partners. Although David Cox does not focus on data sharing in his presentation, it plays an implicit central role in biobanking and in matching biobanking resources with drug development goals. Please see the interview with David Cox, “Collaborate in order to understand the biology”.

In the HCLS Interest Group, we have progressed on several fronts. We developed an approach to creating semantic views on distributed data sources, including both relational databases and triplestores (see the SWObjects federated query tutorial at SWAT4LS). We published how we applied the Translational Medicine Ontology to patient records in Indivo format and were able to pose cross-domain queries on the results. We demonstrated and published a federated approach to microarray study results in RDF at the Provenance Workshop at ISWC. We further defined several ontologies for describing discourse and annotations. We are also in the process of documenting best practices for linked data and checking the correspondence of radiology and pathology reports for breast cancer, with more publications, demonstrations, and W3C notes on the way. Look for a Best Practices document from the Linked Open Drug Data task force and a specification of microarray RDF practices from the BioRDF / Scientific Discourse task forces in the coming months.
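For readers new to query federation, the core step is combining partial answers from separate sources on the variables they share. The sketch below shows only that join step in plain Python. It is not the SWObjects implementation, and the function name and sample rows are hypothetical stand-ins for results that would really come from a relational database and a triplestore.

```python
from typing import Dict, List

# One solution row: variable name -> value.
Binding = Dict[str, str]


def join_bindings(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Join two result sets on the variables they share (simple nested-loop join)."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            # Rows are compatible when every shared variable has the same value.
            if all(l_row[v] == r_row[v] for v in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical rows, as if returned by two independent sources.
relational_rows = [
    {"gene": "BRCA1", "patient": "p1"},
    {"gene": "TP53", "patient": "p2"},
]
triplestore_rows = [
    {"gene": "BRCA1", "disease": "breast cancer"},
]

result = join_bindings(relational_rows, triplestore_rows)
print(result)
```

A real federation layer also has to decide which part of a query goes to which source and translate it into that source's query language; the join above is only the final assembly of the answers.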

Did I mention being impressed by TripleMap? It makes it possible to create maps of interest using Linked Open Drug Data in a very nice user interface geared toward essential concerns such as compounds, disease, genes, etc.

NCBO, PharmaIT, Pistoia, Biobanking and publishing opportunities

I recently returned home to the Netherlands after visiting NCBO for 7 months. It was inspiring to see such a well-coordinated group and a great opportunity to meet and work with the NCBO team as well as Protégé and others in Biomedical Informatics Research at Stanford. Special thanks to Mark Musen. My wife and I are very grateful that our respective Dutch employers (LUMC and UMC) supported and sponsored our visit (to NCBO and Genentech respectively).

Last week, I gave a presentation about the HCLS approach to linked data federation at the Pharmaceutical Technology IT Summit in London. Ian Harrow of Pfizer presented the Pistoia Alliance project “SESL”, which aims to apply text mining and a federation of triplestores to enhance sharing of precompetitive data. The HCLS BioRDF task force recently submitted a federated approach to microarray data to the Workshop on the role of Semantic Web in Provenance Management (SWPM-2010); that approach should be directly applicable in the SESL project.

This week, I have been attending the “BBMRI – Biobanking for Science” Conference. We have discussed biobanking as a demo application for the Semantic Web, and attending this conference has convinced me that biobanking is the perfect storm for the Semantic Web. Another interpretation could be that biobanking has similar requirements to translational medicine and shares the inherent value of data stewardship that seems intrinsically tied to data sharing.

The Conference on Semantics in Healthcare and Life Sciences (CSHALS) and Semantic Web Applications for the Life Sciences (SWAT4LS) are both open for submissions, and I hear that the Journal of Biomedical Semantics could use a few more articles in its pipeline.

RDF Symposium at ACS Meeting and Data Sharing hits the papers

Next week, Eric Prud’hommeaux will present an “Overview of the linking open drug data task” at the RDF Symposium of the American Chemical Society on August 22 and 23 in Boston. The RDF Symposium is co-organized by HCLS member Egon Willighagen, who will also present “Linking the Resource Description Framework to Cheminformatics and Proteochemometrics”.

A recent article about data sharing appeared in the New York Times! The article describes a project that led to biomarker discovery for Alzheimer’s Disease; see “Sharing of Data Leads to Progress on Alzheimer’s”. There has been some discussion about it on the HCLS mailing list.

At the Semantic Technology conference in San Francisco in June, in a session organized by Christine Golbreich and M. Scott Marshall, “Applications of Biomedical Ontologies and Resources” was presented by the trio HCLS/NCBO/NIF. M. Scott Marshall introduced ontologies from both an ontology building perspective (including the Foundational Model of Anatomy and the Translational Medicine Ontology) and an application perspective, while Mark Musen presented on the National Center for Biomedical Ontology (NCBO) and Jeffrey Grethe presented on the Neuroscience Information Framework (NIF). Together, the presentations provided an overview of the range of general to specific applications of ontologies and the Semantic Web. Note also that both NCBO and NIF provide a SPARQL endpoint to their resources.

HCLS trends and NCBO’s SPARQL endpoint!

Our recent HCLS gatherings went very well in both Boston and Raleigh, with both dinners and discussions. In Boston, we had a fun chat, with a group of people somewhat diminished in size due to the volcano effect. We talked about opportunities presented by the ARRA and increased attention to Electronic Health Records. EHRs have remained a topic of discussion and HCLS is now looking to create a demo that employs EHR scenarios. We are now actively seeking clinical data that could be used to build a Semantic Web demo that is realistic to clinical practitioners. Offers of data or help of any kind will be welcomed.

Also, especially since C-SHALS, provenance has become a key theme of research. The BioRDF task force has been working on the best way to model the origins of microarray data (gene lists, experimental conditions, etc.). The Scientific Discourse task force has been looking at how to model the origins of both statements made in discussion groups and computational experiments such as those stored at and Research Objects. BioRDF and SciDisc have recently begun a joint teleconference to discuss provenance overlaps. The Translational Medicine Ontology task force has also begun looking at provenance issues.

Great news! NCBO has made it possible for linked data projects to link directly to NCBO-hosted ontologies via an *experimental* SPARQL endpoint. Point your browsers at and your programs at . This is great news for HCLS because we can build demonstration applications that make direct use of these knowledge resources (including all of OBO and more)! Although feature requests will be collected, please don’t expect too much of this young but valuable resource at this early stage. It’s a prototype!
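For those wondering what pointing a program at a SPARQL endpoint involves, here is a minimal sketch using only the Python standard library. The endpoint URL below is a placeholder on example.org, not the NCBO address, and the query and helper name are illustrative assumptions. The sketch only builds the HTTP request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption): substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: list a few OWL classes.
QUERY = "SELECT ?s WHERE { ?s a <http://www.w3.org/2002/07/owl#Class> } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request with the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. Whether a given endpoint honors the JSON media type or any particular query is something to check against that endpoint, especially a prototype.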

HCLS Gatherings

The HCLS IG will be having two informal gatherings in April. The first gathering will be held at the Stata Center, MIT, on April 22, from 5pm. It is expected that many of the people who are attending Bio-IT World in Boston will be able to come along to this gathering.
The second gathering will be held in Raleigh, NC, on April 26, from 6pm. It is expected that this meeting will largely be attended by participants at WWW2010. Please add your name to the wiki if you are able to attend either event. I’m looking forward to being able to see many of you in person soon.

WWW2010 Workshop: The Future of the Web for Collaborative Science

The agenda for the WWW2010 workshop on The Future of the Web for Collaborative Science is now available.
Due to the large number of high-quality submissions, some difficult decisions had to be made. In total, 9 papers were accepted, covering diverse topics including image annotation, a dashboard for systems chemical biology, a linked data framework, policy mediation, and exchange and reuse of research objects.
We look forward to seeing you in April in Raleigh, NC.

CSHALS Tutorial

W3C’s HCLS group will give a tutorial at the upcoming Conference on Semantics in Healthcare and Life Sciences (CSHALS), which is being held on February 24-26 in Cambridge, MA. Lee Feigenbaum (Cambridge Semantics) will start off the tutorial with background information on all of the key Semantic Web standards. Scott Marshall (University of Amsterdam) will then give an introduction to the HCLS IG. This will be followed by Elgar Pichler (AstraZeneca), Vipul Kashyap (Cigna), and Tim Clark (Harvard) describing applications of the Semantic Web that they have developed within pharma, healthcare, and scientific publishing respectively. The tutorial will conclude with John Madden (Duke) and Kei Cheung (Yale) describing some of the most cutting-edge uses of the technology within terminology and federated query. I’m looking forward to catching up with many people at the conference.

Looking back at 2009 and forward to 2010

The HCLS IG had a good year in 2009, with lots of interaction, demonstration, and outreach. HCLS was involved in: C-SHALS Tutorial, Shared Names, Bio-IT World (in Boston and Hannover), Concept Web Alliance (CWA), PRISM Forum SIG, AMIA conferences, 7th Annual Pharma Technology IT Summit, the Data Integration for the Life Sciences (DILS) Workshop, and SWAT4LS.

We held two Face2Face meetings (Boston, MA and Santa Clara, CA), organized the Workshop on Scientific Discourse at ISWC2009, and helped organize the Semantic Web Applications and Tools for the Life Sciences (SWAT4LS) workshop in Amsterdam, all well received. HCLS produced some deliverables such as a Translational Medicine Ontology, articles in journals and proceedings, lots of Linked Open Drug Data, three W3C Interest Group notes, and an approach to query federation, and presented a new version of the Clinical Observations Interoperability demo. We also helped initiate Shared Names, participated in the development of the Concept Web Alliance demo, and won the Triplification Challenge of 2009. In 2009, several of our members joined together to write grant proposals in both Europe and the U.S. which could provide more support for some HCLS activities. BTW, another way to support those activities is to join us (see top of page for instructions)!

A development that I am excited about is the collaborative effort involving HCLS, CWA, and the Swiss Institute of Bioinformatics (SIB) to create a SPARQL endpoint for UniProt. Such an endpoint could make it possible to perform essential bioinformatics information retrieval without ever leaving the comfort of your SPARQL query interface.

I’m writing to you from Stanford, California, where I am visiting the Musen Lab, home of the National Center for Biomedical Ontology (NCBO) and the creators of BioPortal and Protégé. NCBO just put in for a renewal grant, so keep your fingers crossed! In HCLS, we hope to see continued support for the very valuable knowledge resources that have been made available to biomedical researchers by NCBO. HCLS is looking at how to incorporate BioPortal services into our demonstrations.