HCLSIG BioRDF Subgroup/Meetings/2006-05-08 Conference Call
- Date of Call: Monday May 8, 2006
- Time of Call: 11:00am Eastern Time
- Dial-In #: +1.617.761.6200 (Cambridge, MA)
- Participant Access Code: 246733 ("BIORDF")
- IRC Channel: irc.w3.org port 6665 channel #BioRDF (see W3C IRC page for details, or see Web IRC)
- Duration: ~1 hour
- Convener: Susie Stephens
- Scribe: Kei Cheung
- Review our progress in terms of the charter
- Streamlining the Wiki and documenting our work
- Content of future calls
- Thoughts regarding the formal incorporation of the NLP group into BioRDF
- Disease focus for selecting data sub-sets
- Vocabulary/ontology needs & interaction with BIONT
- Task reviews if time
Susie Stephens, Olivier Bodenreider, Davide Zaccagnini, Kei Cheung, John Barkley, Ora Lassila, Karen Skinner, Scott Marshall, Alan Ruttenberg
Susie: One purpose of this teleconference is housekeeping: reviewing the BioRDF charter and our progress against the 3-, 6-, and 12-month time frames. For the first 3-month time frame, we have successfully identified a number of initial datasets.
Olivier: People should also be aware of the datasets provided by NCBI. Currently they seem to be insufficiently explored by the group.
Davide: Is anybody doing XML transformation into RDF?
Kei: The SenseLab task explores how to convert EDSP (which is a custom XML format) into RDF using technologies such as XSLT and XQuery.
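The XML-to-RDF conversion Kei mentions could be sketched roughly as follows. This is only an illustration: it uses Python's standard `xml.etree.ElementTree` rather than XSLT or XQuery, the sample XML is a made-up record and not the actual EDSP format, and the `http://example.org/senselab/` namespace and the `xml_to_ntriples` helper are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record for illustration; NOT the actual EDSP format.
SAMPLE_XML = """<neurons>
  <neuron id="n1">
    <name>Purkinje cell</name>
    <region>cerebellum</region>
  </neuron>
</neurons>"""

# Placeholder namespace (assumption); a real project would pick its own URIs.
BASE = "http://example.org/senselab/"


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def xml_to_ntriples(xml_text, base=BASE):
    """Turn each child record into a subject and each sub-element into a triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root:
        # Subject URI is built from the element name and its id attribute.
        subject = f"<{base}{record.tag}/{record.get('id')}>"
        for field in record:
            # Predicate URI is built from the sub-element name.
            predicate = f"<{base}vocab#{field.tag}>"
            value = escape_literal((field.text or "").strip())
            triples.append(f'{subject} {predicate} "{value}" .')
    return triples


if __name__ == "__main__":
    for line in xml_to_ntriples(SAMPLE_XML):
        print(line)
```

Each XML record becomes one subject and each sub-element one plain-literal statement in N-Triples syntax. Mapping those ad hoc predicates onto a standard ontology is the harder step raised in the discussion that follows.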
Alan: The key issues involve identifying good and high quality data sources and how to map them to a standard ontology. For example, those sources that have to do with antibodies and immunology.
Kei: Mapping non-RDF data to a well-structured ontology is not as easy as one might think. In SenseLab, we tie local terms to existing standard vocabularies like UMLS.
Susie: We seem to need to identify a domain and the corresponding vocabularies.
Davide: It is important to identify what types of vocabularies should be used.
Susie: We should perhaps focus on the neuroscience domain with a couple of neurological diseases such as Huntington's Disease and Alzheimer's Disease.
Alan: A lot of issues need to be addressed regarding the disease use cases. Six months are not enough.
Kei: We should probably focus on the pilot level working with small datasets instead of a full-scale exploration.
Karen: There are a number of neuroscience resources, including NIF (Neuroscience Information Framework), the Neuroscience Database Gateway, Gene Ontology (Jackson Lab), BIRN (UMLS browser), etc.
Olivier: A license agreement needs to be signed in order to use UMLS. UMLS has a large number of concepts and terms (5 million). It doesn't do very well with gene identification.
Alan: NLP suffers from a lack of standard ontologies.
Davide: His NLP method works without reference ontologies.
Scott: We need to focus on a demo that involves the use of standard UMLS ontologies in neuroscience (measurement data for example). Also, exploring how to bridge ontologies is important.
Davide: The level of granularity relevant to researchers is also very important. He discussed this in the context of a clinical trial example (diabetes).
Alan: Have we identified any data sources related to bench-bedside?
Davide: He is willing to take on the task of identifying data sources related to bench-bedside.
Alan: What is the relationship between NLP and RDF?
Davide: His NLP program generates output with XML tags, which serve as a good starting point for conversion into RDF. But right now, the program doesn’t generate an RDF ontology.
Alan: We should identify a set of tags (e.g., disease names, gene names, etc.) that are useful for individual projects.
Davide: It is important to come up with semantic queries. For example, what compounds affect a certain anatomical structure? What compounds affect the gene expression? …
Susie: It might be a good idea to have Olivier give a demo on UMLS.
Scott: What is the 6-month goal?
Susie: The 6-month goal is to transfer data into RDF, semantic web requirement (screen scraping approach?). 12-month goal: demo using RDF/OWL for data integration. The goals are not cast in stone, but we should consider any deviations.
Scott: I like the idea of pilot projects.
Susie: We need to find out what bits of information we need, and then explore where we can access ontologies that contain such information (e.g., in UMLS).
Karen: There seem to be datasets available on Alzheimer's Disease through BIRN (Bill Bug is affiliated with BIRN).
Alan: We should catalog data sources, indicating what types of information are needed for each source. This would allow us to survey the vocabulary/ontology needs of our group.
Scott: How do we relate column names to RDF terms (concepts)?
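One simple way to approach Scott's question is an explicit lookup table from column names to term URIs. The sketch below is a minimal illustration under stated assumptions: the column names, the `http://example.org/` URIs, the sample row, and the `rows_to_triples` helper are all hypothetical and do not come from any dataset or vocabulary discussed on the call.

```python
import csv
import io

# Hypothetical mapping from tabular column names to RDF term URIs (assumption).
COLUMN_TO_TERM = {
    "gene_symbol": "http://example.org/vocab#geneSymbol",
    "disease": "http://example.org/vocab#associatedDisease",
}

# Illustrative table with a single made-up row.
SAMPLE_CSV = "id,gene_symbol,disease\nr1,APP,Alzheimer's Disease\n"


def rows_to_triples(csv_text, mapping=COLUMN_TO_TERM,
                    base="http://example.org/row/"):
    """Emit one N-Triples statement per mapped, non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The "id" column supplies the subject; it is not mapped to a predicate.
        subject = f"<{base}{row['id']}>"
        for column, term in mapping.items():
            value = row.get(column)
            if value:
                # Escape backslashes and quotes for an N-Triples literal.
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                triples.append(f'{subject} <{term}> "{escaped}" .')
    return triples


if __name__ == "__main__":
    for line in rows_to_triples(SAMPLE_CSV):
        print(line)
```

Keeping the mapping in a plain table separates the vocabulary decision from the conversion code, so the same converter can be reused once the group settles on terms from a standard ontology.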
Susie: We need to stop because of the time. Davide and Brian will describe their work with NLP during next week’s call.