HCLSIG BioRDF Subgroup/Meetings/2006-04-10 Conference Call

Conference Details

Date of Call: Monday April 10, 2006
Time of Call: 11:00am Eastern Time
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Participant Access Code: 246733 ("BIORDF")
IRC Channel: irc.w3.org port 6665 channel #BioRDF (see W3C IRC page for details)
Duration: ~1 hour
Convener: Susie Stephens

Attendees: Susie Stephens, Roger Cutler, Davide Zaccagnini, Brian Osborne, Don Doherty, John Barkley, Kei Cheung, Daniel Rubin, Scott Marshall, Nikesh Kotecha, Kyle Bruck, Tim Clark, Alan Ruttenberg, Karen Skinner, Elizabeth Wu, June Kinoshita, Gwen Wong, Yong Gao, Gabriele Fariello, Paolo Ciccarese

Apologies: Ora Lassila, Olivier Bodenreider, Joanne Luciano

Scribe: Susie Stephens

Agenda

- Brian Osborne - Overview of task on using semantic web technologies to find small molecules that bind to proteins

- Kei Cheung - Overview of SenseLab task

- Nikesh Kotecha - Discussion of Pathway Knowledge Base

- Daniel Rubin - Overview of Imaging Ontology Workshop

- F2F - When and where

1. Brian Osborne - Semantic web technologies to find small molecules that bind to proteins

Small molecules are entities of interest to most biologists, as they can be drug candidates or reagents that enable biological pathways to be dissected.

Unless people pay tens of thousands of dollars for Accelrys or MDL, it's hard to gain access to small molecule data.

A few ways to gain access to small molecules and the proteins that they bind to have been described in the task on the BioRDF wiki.

UniProt and Entrez Gene are possible sources of gene and protein data. The task will start with a focus on UniProt.

SMID from BIND is a source of molecular data that has been generated from 3D structures. It's primarily qualitative data. All of PDB is contained within SMID, which is tens of thousands of structures, although there is less small molecule and protein interaction data.

The use of identifiers is more complex for small molecules.

Interested in using D2RQ to map to MySQL.

Brian Gilman is also working on the task.

Brian Osborne is going to start on his own by converting SMID to RDF, and aims to get a first version completed within a week.

Kei Cheung said that he thinks that D2RQ is a good technology to use, based upon his past experience. However, there is a performance overhead at query time with this approach.
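As a rough illustration of the table-to-RDF idea discussed above, the sketch below dumps relational rows as N-Triples. It is not D2RQ and does not use its mapping language: it uses Python's built-in sqlite3 as a stand-in for MySQL and materializes the triples up front rather than rewriting queries at query time. The base URI, table name, and column names are hypothetical and are not taken from SMID.

```python
import sqlite3

# Hypothetical base URI, chosen only for illustration.
BASE = "http://example.org/smid/"


def rows_to_ntriples(conn, table, key, columns):
    """Yield one N-Triples line per non-null cell of `table`.

    Each row becomes a subject URI built from the primary key, and each
    column becomes a predicate URI. Values are emitted as plain literals.
    """
    query = f"SELECT {key}, {', '.join(columns)} FROM {table} ORDER BY {key}"
    for row in conn.execute(query):
        subject = f"<{BASE}{table}/{row[0]}>"
        for col, value in zip(columns, row[1:]):
            if value is None:
                continue  # no triple for a missing value
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            yield f'{subject} <{BASE}vocab/{col}> "{literal}" .'


def demo():
    """Build a tiny in-memory table (made-up data) and return its triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interaction (id INTEGER PRIMARY KEY, protein TEXT, ligand TEXT)"
    )
    conn.execute("INSERT INTO interaction VALUES (1, 'P12345', 'ATP')")
    conn.execute("INSERT INTO interaction VALUES (2, 'P67890', NULL)")
    return list(rows_to_ntriples(conn, "interaction", "id", ["protein", "ligand"]))


if __name__ == "__main__":
    for line in demo():
        print(line)
```

A dump like this avoids the query-time overhead Kei mentioned, at the cost of the RDF going stale whenever the underlying database changes.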

2. Kei Cheung - Overview of SenseLab Task

Slides are at: http://twiki.med.yale.edu/kei_web/sw_group/Semantic_Web_SenseLab_Task2.ppt

There is a group of 7 databases in SenseLab.

Will use D2RQ to map directly to the RDF structure.

Interested in exploring mapping XML structure into RDF structure, possibly using Xalan or GRDDL.

The ultimate goal would be to link the SenseLab data to other gene, protein, or pathway data sets, and also to Semantic Web aware tools.

Kei has worked to define a use case by having a number of discussions with Gordon Shepherd.

Making the data accessible as RDF makes it easier for someone to gather data from multiple data sources. Additional benefits include further semantics, which may be hidden within a relational implementation. It also becomes possible to undertake reasoning, inference, and knowledge discovery.
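A minimal sketch of why a shared triple format makes cross-source gathering easier: once two sources are expressed as (subject, predicate, object) triples with a common identifier, they can be merged by simple set union and queried together. All identifiers and predicates below are made up for illustration and do not come from SenseLab or any real data set.

```python
# Triples are (subject, predicate, object) tuples; every name here is hypothetical.
neuron_db = {
    ("ex:neuron1", "ex:expresses", "ex:geneA"),
    ("ex:neuron2", "ex:expresses", "ex:geneB"),
}
pathway_db = {
    ("ex:geneA", "ex:participatesIn", "ex:pathwayX"),
}


def merge(*graphs):
    """Merge any number of triple sets into one graph by set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def neurons_in_pathway(graph, pathway):
    """Return, sorted, the neurons that express a gene participating in `pathway`."""
    genes = {s for s, p, o in graph if p == "ex:participatesIn" and o == pathway}
    return sorted(s for s, p, o in graph if p == "ex:expresses" and o in genes)


if __name__ == "__main__":
    combined = merge(neuron_db, pathway_db)
    print(neurons_in_pathway(combined, "ex:pathwayX"))
```

Neither source alone can answer the question; the answer only appears after the merge, because the shared gene identifier links the two graphs.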

June has also spoken with Gordon Shepherd, and has a strong understanding as to how Alzheimer's researchers would undertake such research. She is thinking about leveraging the semantic web to help knowledge transfer between databases.

Tim - In SWAN, the key things are ontology digital resources (publications, presentations) and domain ontologies. This could be done in terms of web services. It's important to focus on how researchers think about questions, not just implementing an impressive technology demo.

Davide - Has been working with the drug discovery part of a pharma company, and has got impressive results from semantic technologies. Can get Alzheimer queries closer to the way scientists envision the queries, i.e. in natural language. Querying an ontology is closer to the natural way of answering questions and performing complex inferencing. It is easy to demonstrate the benefits.

Roger - Uncertain about extracting semantics from relational databases.

Susie - Relational databases model data, whereas the Semantic Web can more easily map reality. It is important to identify the important knowledge contained within relational databases, to surface that data, and to hide the information that is an artifact of data modeling.

3. Nikesh Kotecha - Discussion of Pathway Knowledge Base

Scott - How can I get more hits?

Nikesh - Elongation on metabolism will give lots of results.

Scott - What is the best way to visualize the pathway structure?

Nikesh - Like Reactome, KEGG and Biocyc for visualization.

Tim - If you want to do something productive with pathway visualization, then look at the APP pathway, as it's relevant for pathways. There are examples in KEGG.

Nikesh - We had an RDF/OWL structure that we could use for mapping the pathways data. BioPAX was really helpful. He thinks it is very relevant to have a well-defined RDF structure. It's important to define the integration point.

Kei - EDSP was used to capture relationships in SenseLab automatically.

4. Daniel Rubin - Overview of Imaging Ontology Workshop

There were many participants at the workshop who have a focus on biomedical imaging and neuroscience. The agenda can be seen at: http://smi.stanford.edu/projects/cbio/mwiki-internal/index.php/Workshop_on_Ontology_of_Images