SV_MEETING_TITLE -- 09 Aug 2010

RDF genelist representation

<jun> I can scribe

<kei> scribenick lena

<kei> io-informatics.com

sorry... my internet connection was lost

IO Informatics is moving towards integration of biomedical data

mscottm: data sharing in bioinformatics

(sorry... who is talking?)

<mscottm> That was Chuck Raffi (sp?) from IO Informatics

kei: has been working on data integration using rdf/owl to standardize data in a machine-understandable way
... provenance, data used to describe experiments; how to link to existing ontologies

<matthias_samwald> lena, you are causing noise while typing.

(thanks, just muted myself :-) )

kei: could recommend best practices to facilitate data integration
... could be applied to microarray data but also to other datasets
... capture the relationships to support semantic queries
... finalizing the genelist as soon as possible because the submission deadline is approaching

<jun> I think I should talk after Lena

kei: introduction almost complete
... possible link to other ontologies (provenir?) that our structure could be related to

jun: provenance information is quite rich
... looking at what was used to generate the samples

kei: capture the version of software

satya: have not modified the ontology

jun: doap (?) has been used to describe software
... makes sense to start with the queries, decide what the rdf representation of the data should look like, and then represent it in the ontology

<jun> http://usefulinc.com/ns/doap#
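
[Editor's sketch] One way the software version kei mentions could be recorded with the doap vocabulary jun links above. This is a minimal illustration, not the group's agreed representation: the function name, the example.org IRI, and the tool name are hypothetical, and the choice of doap:Project / doap:release / doap:revision is an assumption about how the group might model it.

```python
# Namespace taken from the link posted in the minutes.
DOAP_NS = "http://usefulinc.com/ns/doap#"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def software_to_turtle(software_iri: str, name: str, revision: str) -> str:
    """Return a Turtle snippet describing one software tool and its version.

    software_iri, name and revision are illustrative inputs; the minutes
    do not say which tools or versions were actually used.
    """
    lines = [
        f"@prefix doap: <{DOAP_NS}> .",
        "",
        f"<{software_iri}> a doap:Project ;",
        f'    doap:name "{escape_literal(name)}" ;',
        "    doap:release [",
        "        a doap:Version ;",
        f'        doap:revision "{escape_literal(revision)}"',
        "    ] .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical tool, used only to show the shape of the output.
    print(software_to_turtle(
        "http://example.org/software/example-normalizer",
        "ExampleNormalizer",
        "1.2.0",
    ))
```

The resulting triples could then be linked from the genelist description, so that a provenance query can ask which software version produced a given list.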

kei: our work should be in line with other existing efforts
... examples are ArrayExpress or the MGED group

ssahoo2 (breaking up): ncbo and obi ontologies

ssahoo2: making sure we are not re-inventing the wheel
... nci thesaurus and other ontologies, need to be careful to avoid re-creating the ontologies

mscottm: in contact with EBI who have said that they have not done this yet
... software ontologies have been used

kei: look at the rdf structure and see how well the queries can be answered
... decide what unique things we have contributed and how we can link to other groups' work

<mscottm> http://bioportal.bioontology.org/ontologies/42036

<mscottm> http://www.ebi.ac.uk/efo/swo

kei: other potential datasets that we can integrate with (pathway/protein/diseasome)
... scott mentions uniprot dataset

mscottm: uniprot has its own rdf representation - not in hcls kb because it is very large
... we can integrate relevant parts of the uniprot datasets

kei: interesting to integrate genomics and proteomics

mscottm: putting uniprot datasets behind a sparql endpoint has been done (but with their own flavor of rdf structure)
... sticky point is coordination with others in the community
... is there a way to coordinate with bio2rdf?
... while we are at it, why not integrate shared vocabularies?
... use some of the information in the protein records

<mscottm> http://www.stanford.edu/~coulet/material/ontology/phare.owl

mscottm: another source of rdf is a pharma ontology about genes, drugs and diseases
... put together at ncbo by Adrien Coulet
... the source of information is nlp techniques

<mscottm> http://sparql.bioontology.org/webui

mscottm: (not a text miner)

<mscottm> http://www.stanford.edu/~coulet/material/sparql_queries

mscottm: already behind a sparql endpoint

scott agreed to coordinate with other sparql endpoints

matthias_samwald: annotating some of the text associated with the microarray studies
... need to know which kinds of studies we chose

mscottm: use VoID
... a VoID statement can be inserted into the graph itself
... can also put the statement in a second graph
... use VoID to record who created a particular statement
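
[Editor's sketch] A minimal illustration of the kind of dataset description mscottm suggests: a void:Dataset with a dcterms:creator, serialized as Turtle so it can be loaded into the data graph itself or into a separate graph. The function name and all example.org IRIs are hypothetical, and using dcterms:creator for "who created it" is an assumption; the group did not settle on specific properties in this call.

```python
VOID_NS = "http://rdfs.org/ns/void#"
DCTERMS_NS = "http://purl.org/dc/terms/"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def void_description(dataset_iri: str, title: str, creator_iri: str) -> str:
    """Return a Turtle snippet describing a dataset and who created it."""
    lines = [
        f"@prefix void: <{VOID_NS}> .",
        f"@prefix dcterms: <{DCTERMS_NS}> .",
        "",
        f"<{dataset_iri}> a void:Dataset ;",
        f'    dcterms:title "{escape_literal(title)}" ;',
        f"    dcterms:creator <{creator_iri}> .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset and creator IRIs, for illustration only.
    print(void_description(
        "http://example.org/dataset/genelist-1",
        "Example genelist",
        "http://example.org/people/curator-a",
    ))
```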

<ssahoo2> to follow up on Scott's idea - should we treat Lena's RDF file and Jun's RDF file as two separate sources?

satya - they are not separate representations, they are follow ups

the name at the end of each file is not its owner but the latest person who modified it

mscottm: idea is coming up with interesting provenance information

<ssahoo2> right - we add distinct named graph ids with each of the ttl files and issue a federated SPARQL query directed to each named graph

<mscottm> yes

<ssahoo2> ok
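
[Editor's sketch] A minimal illustration of ssahoo2's proposal, assuming each ttl file is loaded into its own named graph on a single endpoint. The query uses FROM NAMED with GRAPH ?g, so each result row reports which graph (and hence which file) it came from. The function name and graph IRIs are hypothetical. Querying graphs on different endpoints would need a federation mechanism such as the SPARQL 1.1 SERVICE keyword, which is not shown here.

```python
def build_named_graph_query(graph_iris, pattern="?s ?p ?o",
                            variables=("?s", "?p", "?o")):
    """Build a SPARQL SELECT that runs one pattern over several named graphs.

    ?g is bound to the IRI of the named graph in which each match was found.
    """
    if not graph_iris:
        raise ValueError("at least one named graph IRI is required")
    lines = ["SELECT ?g " + " ".join(variables)]
    lines += [f"FROM NAMED <{iri}>" for iri in graph_iris]
    lines += ["WHERE {", f"  GRAPH ?g {{ {pattern} }}", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical graph IRIs, one per ttl file.
    print(build_named_graph_query([
        "http://example.org/graph/genelist-a",
        "http://example.org/graph/genelist-b",
    ]))
```

Keeping one graph per file also leaves room for a description of each graph (creator, last modifier) that provenance queries can join against.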

http://ibl.mdanderson.org/~mhdeus/sparql_federation/endpoint2.php

<jun> yes, sure

<ssahoo2> yes

matthias_samwald: if we focus on some example it is easier to connect to other sources

kei: think about some example queries that will give broader integration with other types of data

mscottm: will look into how NIF is doing in its work with microarrays
... problem is that our focus queries are directed too much at the neurosciences and not enough at provenance

kei: jun is adding the provenance dimension to the paper

mscottm: if we know where the genelist came from and the microarray experiment, we can come up with the experiments as results from previous provenance queries
... we can build a query that precedes the selection of the 3 datasets

kei: all the examples use affy platforms, but different statistical approaches
... main tasks - get the examples working!

<mscottm> mscottmarshall@gmail.com
