HCLSIG BioRDF Subgroup/Meetings/2006-05-08 Conference Call

Conference Details

Date of Call: Monday May 8, 2006
Time of Call: 11:00am Eastern Time
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Participant Access Code: 246733 ("BIORDF")
IRC Channel: irc.w3.org port 6665 channel #BioRDF (see W3C IRC page for details, or see Web IRC)
Duration: ~1 hour
Convener: Susie Stephens
Scribe: Kei Cheung

Agenda

Review our progress in terms of the charter
Streamlining the Wiki and documenting our work
Content of future calls
Thoughts regarding the formal incorporation of the NLP group into BioRDF
Disease focus for selecting data sub-sets
Vocabulary/ontology needs & interaction with BIONT
Task reviews if time
AOB

Attendees

Susie Stephens, Olivier Bodenreider, Davide Zaccagnini, Kei Cheung, John Barkley, Ora Lassila, Karen Skinner, Scott Marshall, Alan Ruttenberg

Susie: One purpose of this teleconference is housekeeping: reviewing the BioRDF charter and our progress against the 3-, 6-, and 12-month time frames. For the first 3-month time frame, we have successfully identified a number of initial datasets.

Olivier: People should also be aware of the datasets provided by NCBI. Currently they seem to be insufficiently explored by the group.

Davide: Is anybody doing XML transformation into RDF?

Kei: The SenseLab task explores how to convert EDSP (which is a custom XML format) into RDF using technologies such as XSLT and XQuery.
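
As an illustration of the kind of XML-to-RDF conversion discussed here, the following is a minimal sketch in Python rather than XSLT or XQuery. The minutes do not describe the EDSP schema, so the sample XML, the element names, and the `http://example.org/senselab/` namespace are all assumptions made for the example, not the SenseLab format.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the real EDSP schema is not described in these minutes.
SAMPLE_XML = """
<records>
  <neuron id="n1">
    <name>Purkinje cell</name>
    <region>cerebellum</region>
  </neuron>
</records>
"""

BASE = "http://example.org/senselab/"  # placeholder namespace (assumption)


def xml_to_ntriples(xml_text, base=BASE):
    """Emit one N-Triples statement per child element of each record."""
    root = ET.fromstring(xml_text)
    statements = []
    for record in root:
        # The record's tag and id attribute together form the subject URI.
        subject = f"<{base}{record.tag}/{record.get('id')}>"
        for field in record:
            predicate = f"<{base}{field.tag}>"
            # Escape backslashes and quotes so the literal stays valid.
            value = (field.text or "").strip()
            value = value.replace("\\", "\\\\").replace('"', '\\"')
            statements.append(f'{subject} {predicate} "{value}" .')
    return statements


ntriples = xml_to_ntriples(SAMPLE_XML)
print("\n".join(ntriples))
```

Each XML field becomes a plain literal here; mapping field values to ontology terms is a separate step.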

Alan: The key issues involve identifying good, high-quality data sources and how to map them to a standard ontology, for example sources that have to do with antibodies and immunology.

Kei: Mapping non-RDF data to a well-structured ontology is not as easy as one might think. In SenseLab, we tie local terms to existing standard vocabularies like UMLS.

Susie: We seem to need to identify a domain and the corresponding vocabularies.

Davide: It is important to identify what types of vocabularies should be used.

Susie: We should perhaps focus on the neuroscience domain with a couple of neurological diseases such as Huntington’s Disease and Alzheimer’s Disease.

Alan: A lot of issues need to be addressed regarding the disease use cases. Six months are not enough.

Kei: We should probably focus on the pilot level working with small datasets instead of a full-scale exploration.

Karen: There are a number of neuroscience resources including NIF (Neuroscience Information Framework), the Neuroscience Database Gateway, Gene Ontology (Jackson Lab), BIRN (UMLS browser), etc.

Olivier: A license agreement needs to be signed in order to use UMLS. UMLS has a large number of concepts and terms (5 million). It doesn’t do very well with gene identification.

Alan: NLP suffers from a lack of standard ontologies.

Davide: His NLP method works without reference ontologies.

Scott: We need to focus on a demo that involves the use of standard UMLS ontologies in neuroscience (measurement data for example). Also, exploring how to bridge ontologies is important.

Davide: The level of granularity relevant to researchers is also very important. He discussed this in the context of a clinical trial example (diabetes).

Alan: Have we identified any data sources related to bench-bedside?

Davide: He is willing to take on the task of identifying data sources related to bench-bedside.

Alan: What is the relationship between NLP and RDF?

Davide: His NLP program generates output with XML tags, which serve as a good starting point for conversion into RDF. But right now, the program doesn’t generate an RDF ontology.

Alan: We should identify a set of tags (e.g., disease name, gene names, etc) that are useful for individual projects.

Davide: It is important to come up with semantic queries. For example, what compounds affect a certain anatomical structure? What compounds affect the gene expression? …
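
A query such as "what compounds affect a certain anatomical structure?" can be read as a pattern over subject-predicate-object statements. The sketch below shows that idea with plain Python tuples; the compound names, the predicate names, and the `match` helper are hypothetical and stand in for real RDF data and a real query language.

```python
# Hypothetical statements; none of these names come from a real dataset.
TRIPLES = [
    ("compoundA", "affects", "hippocampus"),
    ("compoundB", "affects", "cerebellum"),
    ("compoundC", "affects", "hippocampus"),
    ("compoundA", "altersExpressionOf", "geneX"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What compounds affect the hippocampus?"
compounds = [s for s, _, _ in match(TRIPLES, p="affects", o="hippocampus")]
print(compounds)  # ['compoundA', 'compoundC']
```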

Susie: It might be a good idea to have Olivier give a demo on UMLS.

Scott: What is the 6 month goal?

Susie: The 6-month goal is to transfer data into RDF, a Semantic Web requirement (screen scraping approach?). The 12-month goal is a demo using RDF/OWL for data integration. The goals are not cast in stone, but we should consider any deviations.

Scott: I like the idea of pilot projects.

Susie: We need to find out what bits of information we need, and then explore where we can access ontologies that contain such information (e.g., in UMLS).

Karen: There seem to be datasets available on Alzheimer’s Disease through BIRN (Bill Bug is affiliated with BIRN).

Alan: We should catalog data sources, indicating what types of information are needed for each source. This would allow us to survey the vocabulary/ontology needs of our group.

Scott: How do we relate column names to RDF terms (concepts)?
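
One simple way to approach this question is an explicit lookup table from column names to term URIs, applied row by row. The sketch below assumes a small CSV file; the column names, the `http://example.org/` URIs, and the sample rows are illustrative placeholders, not terms from UMLS or any vocabulary named in the call.

```python
import csv
import io

# Hypothetical column-to-term mapping; the URIs are placeholders.
COLUMN_TO_TERM = {
    "gene": "http://example.org/vocab/geneSymbol",
    "disease": "http://example.org/vocab/associatedDisease",
}

# Illustrative rows only.
SAMPLE_CSV = (
    "id,gene,disease\n"
    "r1,APP,Alzheimer's Disease\n"
    "r2,HTT,Huntington's Disease\n"
)


def rows_to_triples(csv_text, mapping, key_column="id",
                    base="http://example.org/row/"):
    """Turn each mapped, non-empty cell into a (subject, term, value) triple."""
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base + row[key_column]
        for column, term in mapping.items():
            if row.get(column):
                result.append((subject, term, row[column]))
    return result


triples = rows_to_triples(SAMPLE_CSV, COLUMN_TO_TERM)
for triple in triples:
    print(triple)
```

Keeping the mapping in one table makes it easy to review which columns still lack an agreed term.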

Susie: We need to stop because of the time. Davide and Brian will describe their work with NLP during next week’s call.