HCLSIG BioRDF Subgroup/Meetings/2009/08-31 Conference Call
From W3C Wiki
- Date of Call: Monday August 31, 2009
- Time of Call: 11:00 am Eastern Time
- Dial-In #: +1.617.761.6200 (Cambridge, MA)
- Dial-In #: +33.4.89.06.34.99 (Nice, France)
- Dial-In #: +44.117.370.6152 (Bristol, UK)
- Participant Access Code: 4257 ("HCLS")
- IRC Channel: irc.w3.org port 6665 channel #hcls (see W3C IRC page for details, or see Web IRC)
- Duration: ~1 hour
- Frequency: bi-weekly
- Convener: Kei Cheung
- Scribe: Lena Deus
- Attendees: Kei Cheung, Eric Prud'hommeaux, Lena Deus, Satya Sahoo, Scott Marshall, Rob Frost, TN Bhat, Helen Parkinson, Sudeshna Das
- Roll call (Kei)
- HCLS KB update (Matthias, Adrian)
- Neuroscience microarray data curation (Helen)
- Presentation: Semantic Annotation of Genomic Experiments (Sudeshna)
- Discussion (All)
<kei> helen: EBI, interested in the microarray use case, some manpower to work on it, involved in MGED
<kei> sudeshna: Harvard, involved in a stem cell microarray database project involving different labs, working on semantic annotation of microarray experiments
<mscottm> scribenick: LenaDeus
<LenaDeus> HCLS-KB status skipped
<LenaDeus> kei: discussing paper 1 of microarray use case -- comparing gene expression between neurons with neurofibrillary tangles (NFT) and those without NFT
<LenaDeus> kei: second paper establishes a pattern for gene expression in the 6 brain regions
<LenaDeus> kei: one interesting question is how can these 2 datasets be compared and integrated;
<LenaDeus> kei: also this brings the question of what kind of standards can be used to help facilitate semantic integration
<LenaDeus> kei: the idea is to achieve data integration through query federation
<ericP> q+ to ask the execution model for these query federations
<LenaDeus> Helen: run QC on data before integrating into the Gene Expression Atlas
<LenaDeus> Helen: we typically annotate the data
<LenaDeus> Kei: started collaboration with neuroscience domain experts at the NIF community
<LenaDeus> kei: neurolex - a standard lexicon in the neuroscience domain
<mscottm> NIFSTD - standard ontology from http://nif.nih.gov/
<LenaDeus> Helen: identified experimental submissions from GEO to be curated
<Helen> We pull in data from GEO and recurate, run QC and add to the atlas
<Scott> is that warehousing?
<Helen> the atlas is, but we also pull this in as we need it locally to do QC
<LenaDeus> Helen: EFO - a mashup of other bits of ontologies and extensions when needed
<mscottm> EFO - Experimental Factor Ontology
<LenaDeus> Helen: EFO is an application ontology
<LenaDeus> Helen: it is now possible to return various NIF terms thanks to the mapping to EFO
<LenaDeus> kei: how can the data be available in RDF format?
<mscottm> Hoping to have a neuroscience release.
<LenaDeus> Helen: exploring existing tools to go from database to RDF
<LenaDeus> Eric: in query transformation, the main cost will be the configuration
<LenaDeus> Eric: If an application to do it is easily configurable, what might be needed is a SPARQL interface to expose the data to the Semantic Web
<LenaDeus> Kei: Eric will interact with Helen's students in order to establish a mapping to the EBI database
<LenaDeus> Kei: we will also need to identify a structure to query the data
<LenaDeus> developing a science collaboration framework
<LenaDeus> Sudeshna: the primary tissues define what kind of cells are being extracted
<LenaDeus> Sudeshna: the experimental factors being tested can be one of many, from the biomaterial of the cells, different organism parts, etc.
<LenaDeus> Sudeshna: most experiments are from liver or bone marrow
<LenaDeus> Sudeshna: we'd like to connect to OBO or GO
<LenaDeus> Sudeshna: using Drupal
<LenaDeus> slide 5: the goal is to export the map as RDF
<LenaDeus> Sudeshna: working on the data processing
<LenaDeus> Sudeshna: we capture the signature of the cell lines
<LenaDeus> Kei: in order to convert data to RDF, are they using some sort of MAGE-ML conversion?
<LenaDeus> Sudeshna: every piece of information is annotated at the level of the sample; we use an extension of MAGE-ML
<LenaDeus> Scott: what do you have in mind for the data conversion?
<LenaDeus> Sudeshna: since data is in Drupal, it is possible to expose all RDF data as a link
<LenaDeus> Scott: there is a plug-in for Drupal that allows taking any Drupal content and exporting it as RDF
<LenaDeus> Kei: next step is to determine specific questions to pose to the dataset
<LenaDeus> Kei: based on the annotation, what questions will the dataset be able to answer?
<LenaDeus> Kei: what normalization algorithms are being used to compare data across experiments?
<mscottm> thanks for the links
Notes from Helen
Listed several example neuroscience datasets from the Neurosci consortium; the experiments are well annotated with disease phenotype and experimental factors. Added a couple of papers associated with one of the examples, and abstracts. The first compares gene expression profiles between AD patients and normal subjects, looking at specific neurons. The second paper takes a different approach: it looks at brain regions in normal subjects to build reference gene expression datasets.
Interesting question: how can these datasets be compared and integrated, and what standards would help with that? There are already standards, e.g. the MAGE-ML XML format, which has been used for data exchange. We'd like to explore how to use the Semantic Web to do query federation and integration. This coincides with the query federation use case @biordf. At EBI, the Gene Expression Atlas has an interest in tailoring to a more neuroscience context, which fits well with the use case. There's also the Gene Expression Omnibus (GEO).
HP: We pull in data from GEO and recurate, run QC and add to the atlas
Scott: is that warehousing?
HP: the atlas is, but we also pull this in as we need it locally to do QC
Kei: we also have a collaboration with NIF and their NIFSTD OWL ontologies for the neuroscience domain. The use case could serve as an example project to involve the microarray, NIF etc. communities. Helen has communicated with Maryann and her colleagues for annotation discussion.
Eric: having written one of the tools that does the query transformation, the main cost is configuring it: the engine has to be taught how the tables look when we think of them in RDF. If we had one that was easily configurable, would that make it easier? On the new one, Oracle.
Oracle 11g has some capacity for SPARQL queries, but you'll need to use D2R. What's the benefit of getting the SPARQL interface vs. the legacy one?
Eric: the SW object stuff is easier to configure than D2R. If you can get 5 lines from the tables we care about and tell me the foreign keys, then I can give you an interface to it.
AI: HP will put Talaai in contact with Eric re the mapping between the relational database system at EBI and SW objects.
Kei: another issue is that we need some RDF graph structure to access these data.
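Eric's point that a few sample rows plus the foreign keys are enough to build an interface can be illustrated with a minimal sketch of a direct row-to-triple mapping. This is not the SW objects tool or D2R; the table names, column names, and base URI are hypothetical, chosen only to show why the foreign keys are the key piece of configuration.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical base URI, for illustration only.
BASE = "http://example.org/db/"


def row_uri(table: str, key: object) -> str:
    """Mint a URI for one row from its table name and primary-key value."""
    return f"<{BASE}{table}/{key}>"


def rows_to_triples(
    table: str,
    primary_key: str,
    rows: Iterable[Dict[str, object]],
    foreign_keys: Dict[str, str],
) -> List[Triple]:
    """Map relational rows to RDF-style triples.

    foreign_keys maps a column name to the table it references, so the
    column value becomes a link to another row instead of a literal.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = row_uri(table, row[primary_key])
        for column, value in row.items():
            if column == primary_key or value is None:
                continue  # the key is already in the subject URI; skip NULLs
            predicate = f"<{BASE}{table}#{column}>"
            if column in foreign_keys:
                obj = row_uri(foreign_keys[column], value)
            else:
                obj = f'"{value}"'  # plain literal, no escaping (sketch only)
            triples.append((subject, predicate, obj))
    return triples


# Hypothetical sample rows from a "sample" table that references "experiment".
sample_rows = [
    {"id": 1, "tissue": "hippocampus", "experiment_id": 7},
    {"id": 2, "tissue": "entorhinal cortex", "experiment_id": 7},
]
triples = rows_to_triples(
    "sample", "id", sample_rows, {"experiment_id": "experiment"}
)
for s, p, o in triples:
    print(s, p, o, ".")
```

Without the foreign-key declaration, `experiment_id` would come out as the literal `"7"` and the two tables would not be linked in the graph.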
Harvard Stem Cell Institute: 5 different labs.
Data model based on the MGED Ontology; the data model really looks like MAGE-TAB.
Made their own database; the idea is that it will be installed and adapted to their own needs.
Took the MGED Ontology and extended it, added own classes and added groups (a combination of the factors and the groups). Markers define the cell types. Repository is based on Drupal. Would like to export RDF graphs; have a person joining who can do this. Data will be comparable with the GXA. Working on a data processing pipeline as well. Want to have a list of genes associated with each paper, or reported by us; these are the signatures for the factors etc. Wanted to extend the repository to qPCR and UHTS sequencing. The designs and metadata fit the data model well.
AI: contact Sudeshna re her data model based on MGED Ontology
Code for mapping text to ontologies: Perl code, e.g. to ChEBI
AI: semi-automated Perl mapping code - Helen to send out the link
AI: Sudeshna to post the URL to the repository in her ppt
AI: biological use cases - Kei and others could look at the papers and come up with some biological use cases. Compare the papers' gene lists.
Plugin for Acrobat - have an Acrobat reader - select an image and parse gene lists.
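The semi-automated mapping of free text to ontology terms mentioned in the notes above can be sketched roughly as follows. This is not Helen's Perl code; it is a minimal Python illustration assuming a small hypothetical lexicon with placeholder identifiers (not real ontology IDs). Exact matches are accepted, near matches are flagged for a curator, and everything else is left unmapped.

```python
import difflib
from typing import Dict, Optional, Tuple

# Hypothetical lexicon: label -> placeholder identifier (not real ontology IDs).
LEXICON: Dict[str, str] = {
    "bone marrow": "EX:0001",
    "liver": "EX:0002",
    "hippocampus": "EX:0003",
}


def map_term(
    text: str, lexicon: Dict[str, str], cutoff: float = 0.8
) -> Tuple[Optional[str], str]:
    """Return (identifier, status) for one free-text annotation.

    status is "exact", "review" (a fuzzy match a curator should confirm),
    or "unmapped".
    """
    # Normalize case and whitespace before comparing.
    normalized = " ".join(text.lower().split())
    if normalized in lexicon:
        return lexicon[normalized], "exact"
    close = difflib.get_close_matches(normalized, list(lexicon), n=1, cutoff=cutoff)
    if close:
        return lexicon[close[0]], "review"
    return None, "unmapped"


for annotation in ["Bone  Marrow", "hipocampus", "qPCR"]:
    print(annotation, "->", map_term(annotation, LEXICON))
```

The "review" status is what makes the approach semi-automated: only the fuzzy matches need a curator's attention.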