BioRDF -- 30 Aug 2010

<kei> take up next agendum

<scribe> scribenick: matthias_samwald

kei: over the last days i spent time on editing the discussion section
... it is now smoother and more readable. i did not delete the old text, so others can have a look and possibly add to it. i highlighted it in yellow.
... we made a lot of good progress
... how far are we with the examples?

lena: all federated queries are now also on the demo

<Lena> http://ibl.mdanderson.org/~mhdeus/BioRDF/microarray/sparql_endpoint.html

lena: on that page, you can find the six demo queries

kei: is this the full gene list?

lena: this is a small subset

<Lena> http://mibupload.com/u0PSbD.xml

<Lena> (never mind this link)

kei: the first queries are querying the gene lists themselves
... what type of brain regions, disease etc.
... also something about the data themselves (same software package etc.)
... looks pretty good to me
... the last queries focus more on query federation
... these queries also make use of the origins of the datasets

lena: see Q4
... as scott suggested, i used the VoID vocabulary.
... if you click on the query, the SPARQL query is automatically entered.
... for the federated queries -- it does not work on some browsers

michael: would it be possible to have a query that returns all the genes in the final gene list?
... i.e., the simple final gene list for a certain experiment.
... most of these queries do not return much information, it would be nice to know what the basic information available is

lena: for the first gene list there are 162
... i can use VoID for that kind of information

egon: the vocabulary predicate you are using on diseasome, what are you doing with that?

lena: what we are doing for all datasets that have been annotated with that vocabulary.
... gives us certainty that we are finding the things we are looking for

scott: but are you referring to diseasome as a dataset? it is not.

<mscottm> I am talking about this part of a query: OPTIONAL { [ rdf:type void:Dataset ; void:sparqlEndpoint ?srvc2 ; void:vocabulary <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/> ; dct:issued ?issued2 ] . FILTER (?issued2 > ?issued)

kei: is it a language or a vocabulary? how do you use VoID so that a machine knows that this dataset has to do with this certain set of diseases?

(sorry, audio quality is quite bad, hard to discern people)

<Lena> void guide: http://semanticweb.org/wiki/VoiD

kei: the concern is: when we are using VoID to describe data origins, how is the data provenance actually captured by VoiD?

lena: you can describe subjects (e.g. "gene").

<kei> I can't hear what Lena said

<Lena> +351 21 4469852

<mscottm> Thank you Eric!

<ericP> the part of Lily Tomlin's "Operator" will be played by ericP today

lena: to find all datasets that have to do with genes, i would have to figure out the URI -- it is easier with the VoiD vocabulary.

scott: diseasome is not a vocabulary, but a dataset

lena: you still have to say what is a disease by indicating a full URL
... we need a dataset of genes AND diseases

scott: VoID just gives you means to talk about a graph

kei: would it be a lot of work to switch to that use of subject?
... to make the semantics a bit more understandable

lena: okay. this is easy to change.
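
(editor's sketch, not from the call: a minimal illustration of the point above, that void:vocabulary names a vocabulary a dataset uses while dcterms:subject says what the dataset is about. It assumes VoID descriptions are available as plain triples; the ex: dataset names, the example.org endpoints and the choice of DBpedia URIs as subjects are all hypothetical.)

```python
# Namespaces as published for VoID, Dublin Core terms and RDF.
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative subject URIs (an assumption; any agreed URIs would do).
GENE = "http://dbpedia.org/resource/Gene"
DISEASE = "http://dbpedia.org/resource/Disease"

# Hypothetical VoID descriptions as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:microarrayA", RDF_TYPE, VOID + "Dataset"),
    ("ex:microarrayA", VOID + "sparqlEndpoint", "http://example.org/a/sparql"),
    ("ex:microarrayA", DCT + "subject", GENE),
    ("ex:microarrayA", DCT + "subject", DISEASE),
    ("ex:geneOnlyB", RDF_TYPE, VOID + "Dataset"),
    ("ex:geneOnlyB", VOID + "sparqlEndpoint", "http://example.org/b/sparql"),
    ("ex:geneOnlyB", DCT + "subject", GENE),
]


def endpoints_about(triples, required_subjects):
    """Return SPARQL endpoints of void:Dataset resources tagged with ALL required subjects."""
    required = set(required_subjects)
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == VOID + "Dataset"}
    endpoints = []
    for ds in sorted(datasets):
        subjects = {o for s, p, o in triples if s == ds and p == DCT + "subject"}
        if required <= subjects:
            endpoints.extend(
                o for s, p, o in triples if s == ds and p == VOID + "sparqlEndpoint"
            )
    return endpoints


if __name__ == "__main__":
    # Datasets about genes AND diseases, as asked for in the discussion.
    print(endpoints_about(TRIPLES, [GENE, DISEASE]))
```

(the same selection could be written as a SPARQL pattern over the VoID descriptions; the Python form only makes the matching rule explicit.)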

<ericP> uname -a

(lena and eric talk about issues with 32 vs. 64 bit version of federation software)

eric: but this is not critical for the paper

kei: for the paper the review period is quite short (in the next 3 weeks)

eric: the second query we are working on, i got too many resolutions from one endpoint, we need to figure out how to solve this.

kei: the query that makes use of PharmGKB is a good example. we do not need to get into biological details for this paper, though.

lena: there would not be enough pages, also the reviewers would not understand.

scott: about the NCBO sparql endpoint: i don't know if there is a way with this microarray scenario. we would need an appropriate vocabulary (e.g. for diseases). but this is a level of provenance that is not fully formalized on the NCBO sparql endpoint.
... e.g., a query that finds all datasets about neurodegenerative diseases -- that would be possible
... another example: if you have a list of neurodegenerative diseases (based on an ontology), then you can find data from other neurodegenerative diseases

lena: we could trim the list of diseases in Q4 to only neurodegenerative diseases
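
(editor's sketch, not from the call: one way the ontology-based trimming could work, assuming the disease ontology is available as simple child-to-parent "is a" links. The hierarchy and the input list below are made up for illustration; a real run would take both from the ontology and from Q4.)

```python
# Hypothetical is-a edges (child -> parent), standing in for a disease ontology.
IS_A = {
    "Alzheimer's disease": "neurodegenerative disease",
    "Parkinson's disease": "neurodegenerative disease",
    "neurodegenerative disease": "nervous system disease",
    "asthma": "respiratory disease",
}


def is_under(term, ancestor, is_a):
    """True if `term` equals `ancestor` or reaches it by following is-a links upward."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)  # guards against accidental cycles in the input
        term = is_a.get(term)
    return False


def trim_to(diseases, ancestor, is_a):
    """Keep only the diseases that fall under `ancestor`, preserving input order."""
    return [d for d in diseases if is_under(d, ancestor, is_a)]


if __name__ == "__main__":
    q4_diseases = ["asthma", "Alzheimer's disease", "Parkinson's disease"]
    print(trim_to(q4_diseases, "neurodegenerative disease", IS_A))
```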

kei: in terms of the paper, how do we go about finalizing it?

lena: we have to calculate ~1 page for abstract, 1 page for references
... most of the results can be deferred to links to web pages

kei: lena, you are the person to do the first cut
... still, it is interesting to talk about the data model in the paper and give some examples

lena: i would say keep the diagram, lose the triples. i will make these changes.

kei: the deadline is friday, at some point we need to convert it to the IEEE format. when do we make that switch?

lena: my suggestion is to switch to IEEE on wednesday and have everyone read it.
... on thursday we can still have a conference call and see if we all agree

scott: i would like to have some slides about this work that i can present at Oxford Global Pharma conference in october

<mscottm> Uh oh - on hcls2 Zakim, I get "This passcode is not valid."

<mscottm> Can you help, Eric?

<scribe> Scribe: Matthias Samwald
