HCLSIG BioRDF Subgroup/Meetings/2006-10-16 Conference Call

Conference Details

Date of Call: Monday October 16, 2006
Time of Call: 11:00am Eastern Time
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Participant Access Code: 246733 ("BIORDF")
IRC Channel: irc.w3.org port 6665 channel #BioRDF (see W3C IRC page for details, or see Web IRC)
Duration: ~1 hour
Convener: Susie Stephens
Scribe: Susie Stephens

Participants: Olivier Bodenreider, Matthias Samwald, Kei Cheung, John Barkley, Daniel Rubin, Alan Ruttenberg, Scott Marshall, Vipal Kashyap, Susie Stephens

Agenda

Review of the F2F meeting

Susie Stephens [SS] gave an overview of the F2F meeting.

BioRDF has been doing a good job of making data sets available in RDF. However, this work should be made more visible, for example by writing more W3C notes about our progress and by making the data easier to find on the BioRDF wiki. Increased visibility will make it easier to justify extending HCLS beyond the initial two-year period. Kei Cheung mentioned that the article opportunity in BMC Bioinformatics would be a good way to raise the visibility of the group.

Alan Ruttenberg gave a good presentation on URIs at the meeting, which he had worked to prepare with Matthias Samwald. This work will be written up as a note.

There was a general discussion about URIs at the meeting. There appears to be broad consensus on using HTTP-based URIs. The W3C Technical Architecture Group (TAG) is updating a document on URIs, and the BioRDF group will have the opportunity to review it. No further documents on URIs were considered necessary.

Real work was achieved during the F2F. For example, Alan and Matthias worked to integrate Cocodat and KiDB.

Much progress was also made on the demo during the meeting. For example, a list of data sets that are now available in RDF was compiled, and a scientific query that takes advantage of a number of these data sets was determined. The query is “Show me the location of receptors that bind to a ligand which is a therapeutic agent in {Parkinson’s, Huntington’s} disease in each of the dopaminergic neurons in the {pars compacta, pars reticularis, substantia nigra}”.
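As a rough illustration of how a query of this shape chains several data sets together, the following Python sketch joins a handful of toy triples. Every identifier in it (predicates such as `binds` and `locatedIn`, entities such as `receptor_A`) is a hypothetical placeholder and not content from any of the RDF data sets. The sketch also omits the restriction to dopaminergic neurons. A real implementation would more likely run a SPARQL query against the actual data.

```python
# Toy triples; all names are hypothetical placeholders, not real data.
TRIPLES = {
    ("ligand_X", "therapeuticAgentFor", "disease_1"),
    ("ligand_Y", "therapeuticAgentFor", "disease_2"),
    ("receptor_A", "binds", "ligand_X"),
    ("receptor_B", "binds", "ligand_Y"),
    ("receptor_A", "locatedIn", "neuron_1"),
    ("receptor_B", "locatedIn", "neuron_2"),
    ("neuron_1", "partOf", "region_1"),
    ("neuron_2", "partOf", "region_2"),
}


def subjects(predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is a triple."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is a triple."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def receptor_locations(diseases, regions):
    """Return sorted (receptor, neuron, region) tuples for receptors that
    bind a ligand used as a therapeutic agent in one of `diseases`,
    restricted to neurons that are part of one of `regions`."""
    results = set()
    for disease in diseases:
        for ligand in subjects("therapeuticAgentFor", disease):
            for receptor in subjects("binds", ligand):
                for neuron in objects(receptor, "locatedIn"):
                    for region in objects(neuron, "partOf"):
                        if region in regions:
                            results.add((receptor, neuron, region))
    return sorted(results)
```

Each nested loop corresponds to one hop in the query (disease to ligand, ligand to receptor, receptor to neuron, neuron to region), which is why the query needs several data sets to be linked by shared identifiers.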

Next steps for building the demo

Alan Ruttenberg [AR] highlighted the fact that the version of Entrez Gene that is in RDF is 42 GB, and is therefore very difficult to use. He asked Scott Marshall whether he could load the RDF version of Entrez Gene into the Oracle RDF Data Model.

Scott Marshall [SM] said that it should be possible, as he thinks that there is 100 GB of unused space. He might need to secure a guarantee that he can use the space.

AR said that once the data is loaded it would become possible to query for a subset of genes, and to then incorporate that data into the demo. He also asked Olivier Bodenreider if it would be possible for the NCBI to host the Entrez Gene data set, as they already host data in a number of other formats.
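Independently of the Oracle approach discussed here, a subset of a very large RDF file can also be extracted by streaming over it. The Python sketch below assumes the data is serialized as N-Triples (one triple per line); the minutes do not state which serialization is used. The URIs and the function name `filter_subjects` are hypothetical.

```python
import io


def filter_subjects(lines, wanted_subjects):
    """Yield only the N-Triples lines whose subject is in `wanted_subjects`.

    Reads one line at a time, so memory use does not grow with file size.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        subject = stripped.split(None, 1)[0]  # first whitespace-delimited token
        if subject in wanted_subjects:
            yield stripped


# Hypothetical example data standing in for a much larger file.
SAMPLE = io.StringIO(
    '<http://example.org/gene/1> <http://example.org/symbol> "GENE1" .\n'
    '<http://example.org/gene/2> <http://example.org/symbol> "GENE2" .\n'
    '<http://example.org/gene/1> <http://example.org/note> "example" .\n'
)

subset = list(filter_subjects(SAMPLE, {"<http://example.org/gene/1>"}))
```

In practice `lines` would be an open file handle rather than an in-memory string, and the resulting subset would be small enough to load into the demo.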

Olivier Bodenreider [OB] said that he’s part of NLM, rather than NCBI, so he can’t force this to happen. There are concerns about making the data available 24 x 7 when it is hosted on the Web site. However, they have been looking into using XSLT to do query translations, which would enable data to be sent back as RDF.

Vipal Kashyap [VK] mentioned that this is exactly the sort of task that GRDDL was designed for.

AR said that it wouldn’t take much to make it GRDDL, and that they could just simulate it for now.

Kei Cheung asked if it would be possible to look at the schema of the Entrez Gene data.

OB said that they were more focused on the translation than on designing a schema, so he isn’t sure that he has something that would be easy to share.
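The XML-to-RDF translation discussed above could be simulated along the following lines. This Python sketch uses the standard library's XML parser rather than XSLT, and the record layout, the `http://example.org/` namespace, and the function names are all hypothetical. It is not the real Entrez Gene XML schema and not the NLM translation.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; this is NOT the real Entrez Gene XML schema.
RECORD = """
<gene id="1">
  <symbol>GENE1</symbol>
  <description>example gene</description>
</gene>
"""

BASE = "http://example.org/"  # placeholder namespace


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def record_to_ntriples(xml_text):
    """Map each child element of <gene> to one triple about that gene."""
    root = ET.fromstring(xml_text)
    subject = f"<{BASE}gene/{root.get('id')}>"
    triples = []
    for child in root:
        predicate = f"<{BASE}{child.tag}>"
        value = escape_literal((child.text or "").strip())
        triples.append(f'{subject} {predicate} "{value}" .')
    return triples
```

With GRDDL, the source document would instead point to a transformation (typically an XSLT stylesheet) that a GRDDL-aware agent applies to obtain the RDF.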

AR said that we should ask Ivan Herman to find a server for us to host the demo on.

SS said that it might be a good idea to have some more scientific questions that can be answered by the demo, as otherwise the demo would be rather short and inflexible. It would be good if we could try to extend the number of questions that the demo can support, while initially not needing to commit more than PubChem to RDF. Ultimately, more data sets will need to be converted into RDF and incorporated into the demo in order to fulfill the bench to bedside goal.

There was general agreement that it would make sense to have a series of questions, and that we should initially work with the data sets that have been identified.

VK said that he'd designed a spreadsheet to help us to see which data sets would be required for different queries.
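The idea of a matrix relating queries to the data sets they require can be sketched in a few lines of Python. The query and data set names below are hypothetical placeholders and do not reproduce the contents of the spreadsheet.

```python
# Hypothetical matrix: which data sets each candidate query would need.
REQUIRED = {
    "query_1": {"dataset_A", "dataset_B"},
    "query_2": {"dataset_B", "dataset_C"},
    "query_3": {"dataset_A"},
}


def answerable(available):
    """Return the queries whose required data sets are all available."""
    return sorted(q for q, needed in REQUIRED.items() if needed <= available)


def missing(available):
    """Return, for each unanswerable query, the data sets still missing."""
    return {
        q: sorted(needed - available)
        for q, needed in REQUIRED.items()
        if not needed <= available
    }
```

For example, with only `dataset_A` and `dataset_B` converted to RDF, `answerable` would list `query_1` and `query_3`, and `missing` would show that `query_2` still needs `dataset_C`.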

SM added that an interesting extension would be to incorporate image data, as that would allow us to perform an additional query relating to the location of the receptors. Image data may also be a nice way to link the demo from the bench to the bedside. Bill Bug may also be able to help us to find some image data.

Kei Cheung [KC] recommended that we also add disease data to the demo, for example OMIM, as that would help us to reach the bench to bedside goal.

AR pointed out that OMIM has a lot of free text, which would make it difficult to incorporate into the demo, although it does contain MeSH terms that could be helpful. He also highlighted that bedside data is usually patient specific, rather than the generic disease information that is stored within databanks such as OMIM. He also mentioned that Davide Zaccagnini has worked on NLP techniques with a case presentation report, so that could be another approach.

VK said that we will need to incorporate clinical data sets too.

SS said that there are a number of semantic interoperability projects within the government, and perhaps we could explore whether working with them would help us to cover the bedside aspect of the demo too. Government has a lot of regulations that we should be aware of when we incorporate the clinical data.

AR said that it would be interesting to hear from government. Regulation is a separate can of worms, which we should try to avoid as much as possible.

SS mentioned that Oracle has applications for the clinical space, and that we might have some demo data too.

SM said that we should not force a fit between the bench and the bedside where it doesn’t really exist.

SS gave an overview of a letter that the clinical genomics section of HL7 has recently sent to Mike Leavitt of Health and Human Services, in response to his request for data standards for the bench to the bedside.

VK said that much of the work within the clinical genomics group of HL7 is focused on incorporating family history data, information from genetic labs, and providing a framework for genetic decision support.

SS highlighted that another important component to consider in the demo is the UI.

AR said that he’d pinged Ora Lassila for an update on OINK as that may be very useful.

SS proposed that we have joint BIONT-BioRDF calls to further the demo on the Mondays between regular BioRDF calls.

VK said that he'd propose this during the next BIONT call.

SS noted that there was no time to talk about databases, but that this can easily be done another day.

SS said that the next call is on Oct. 30.