HCLSIG/LODD/Meetings/2009-05-27 Conference Call

Conference Details

Date of Call: Wednesday May 27, 2009
Time of Call: 11:00am Eastern Daylight Time (EDT), 16:00 British Summer Time (BST), 17:00 Central European Summer Time (CEST)
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Dial-In #: +33.4.89.06.34.99 (Nice, France)
Dial-In #: +44.117.370.6152 (Bristol, UK)
Participant Access Code: 4257 ("HCLS").
IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
Duration: ~1h
Convener: Susie

Agenda

Progress on converting data sets - All
Progress on use case - All
Triplification Challenge - All
AOB

Minutes

<matthias_samwald> scribe: matthias_samwald

<matthias_samwald> attendees: Anja, Vassil, Bosse, ericP, Susie, Jun, Matthias

<matthias_samwald> apologies: Kei

<matthias_samwald> TOPIC --- Datasets

<matthias_samwald> jun: we created a mapping between TCM and Entrez gene

<matthias_samwald> ... i did some manual correction of the mapping

<matthias_samwald> ... anja also sent me some links today

<matthias_samwald> ... she also linked to SIDER, drugbank

<matthias_samwald> ... dataset quite well interlinked

<matthias_samwald> ... anja encountered performance problems with running SILK over remote SPARQL endpoints

<matthias_samwald> ... we have a rough idea of an interesting use-case in this area

<matthias_samwald> ... we managed to submit an abstract to the DILS poster/demo session, we will get feedback at the end of this month

<matthias_samwald> ... we investigate aTags to enrich the knowledge base with new statements

<matthias_samwald> anja: i did not put use-case on wiki, proposed another small use-case related to TCM today via mail, will put it on the wiki later

<matthias_samwald> ... for SILK, i had to put data into local SPARQL endpoint

<matthias_samwald> ... performance not really an issue, datasets are not updated that often, letting SILK run for some hours is not that bad in this case.

<matthias_samwald> ... had to use local SPARQL endpoint because public endpoints do not allow 7 million queries in a row...

<matthias_samwald> anja: egon willighagen was excited about this new dataset

<matthias_samwald> ... matthias and peter ansell were also working on SIDER in parallel.

<matthias_samwald> ... i might also look into linking to LinkedCT

<matthias_samwald> susie: LinkedCT contains new drugs, will there be much side effects data?

<matthias_samwald> anja: there are also marketed drugs in LinkedCT

<matthias_samwald> matthias: i am almost finished with converting SIDER to aTags, i am re-using the DBpedia and OBO URIs directly, should be complementary to the conversion of Anja

<matthias_samwald> anja: I looked at the conversion script from peter ansell, don't know about the results

<matthias_samwald> susie: a next step will be to think about additional datasets

<matthias_samwald> bosse: i have not made progress with working on a query and identifying necessary datasets

<matthias_samwald> susie: we should be able to pose new interesting questions based on the LOD we created

<matthias_samwald> ... we are still lacking questions / necessary steps to utilize the data

<matthias_samwald> jun: in the wiki page we are not giving enough demonstrations. we are just giving a description of how things are done by browsing different websites -- but we want to show how linked data can be used!

<matthias_samwald> susie: bosse and i put together some top questions.

<Susie> http://esw.w3.org/topic/HCLSIG/LODD/Questions

<matthias_samwald> susie: (going through questions)

<matthias_samwald> anja: we can give a summarisation of all active ingredients a company is working on / marketing

<matthias_samwald> susie: can we link to pathways?

<matthias_samwald> ... there is pathway information in some Bio2RDF datasets

<matthias_samwald> matthias: the "Linked Life Data" datasets (LarKC / AstraZeneca) should also have a lot of pathway data

<matthias_samwald> susie: genes and proteins are often linked interchangeably, this could be an entry point.

<matthias_samwald> vassil: hi, i am from ontotext, we are working with Bosse on Linked Life Data

<matthias_samwald> ... we can work on interlinking / sharing identifiers

<matthias_samwald> ... we can also look at user interfaces

<matthias_samwald> ... just querying with SPARQL does not help

<matthias_samwald> susie: i agree

<matthias_samwald> vassil: we are planning to work on user interfaces

<matthias_samwald> ... we should start from the other side -- developing interfaces for concrete tasks by researchers

<matthias_samwald> matthias: could we use Ontotext LifeSKIM?

<matthias_samwald> vassil: this is a bit more specialized, you cannot put arbitrary RDF into it

<matthias_samwald> susie: it's a chicken and egg problem. has anyone else thought about the right order?

<matthias_samwald> vassil: we are looking at Exhibit

<matthias_samwald> vassil: the idea is to run SPARQL queries, get back JSON, and render on screen

<matthias_samwald> ... we hope to see results in 1 month
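
The pipeline vassil describes (run a SPARQL query, get JSON back, render it on screen) can be sketched roughly as follows. This is a minimal illustration, not Ontotext's implementation: it assumes the endpoint returns the standard W3C SPARQL Query Results JSON Format, and the sample result, the example.org URIs, and the function names bindings_to_rows and render_table are all hypothetical.

```python
import json

# Hypothetical sample result in the W3C SPARQL Query Results JSON Format.
# The URIs and labels are invented for illustration only.
SAMPLE = """
{
  "head": {"vars": ["drug", "label"]},
  "results": {"bindings": [
    {"drug": {"type": "uri", "value": "http://example.org/drug/1"},
     "label": {"type": "literal", "value": "Example drug A"}},
    {"drug": {"type": "uri", "value": "http://example.org/drug/2"}}
  ]}
}
"""


def bindings_to_rows(doc):
    """Flatten a parsed SPARQL JSON result into (columns, rows).

    Variables left unbound in a solution become empty strings.
    """
    columns = doc["head"]["vars"]
    rows = [
        [binding.get(var, {}).get("value", "") for var in columns]
        for binding in doc["results"]["bindings"]
    ]
    return columns, rows


def render_table(columns, rows):
    """Render the rows as a fixed-width plain-text table."""
    widths = [
        max([len(col)] + [len(row[i]) for row in rows])
        for i, col in enumerate(columns)
    ]

    def fmt(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(columns), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    cols, data = bindings_to_rows(json.loads(SAMPLE))
    print(render_table(cols, data))
```

A browser front end such as Exhibit would replace render_table; the flattening step is the part that carries over.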

<matthias_samwald> susie: (describes query 2 on the wiki page)

<matthias_samwald> ... it seems to be a broad question... aggregate everything around a compound

<matthias_samwald> ... question 2 (How is this therapy/compound different from existing therapies/compounds) is too broad actually, i will move it down in the list

<matthias_samwald> ... question 3 (What are our patients saying about our drugs) has a strong text mining component

<matthias_samwald> ... another question: Who are the key opinion leaders for the therapeutic area?

<matthias_samwald> vassil: we have pubmed, but not citation information

<matthias_samwald> matthias: we could ask anita de waard (elsevier)

<matthias_samwald> susie: i will ask anita

<matthias_samwald> ... question: Of drugs from either the same or different company for the same indication, are they approved in the same target region of interest – US or Global

<matthias_samwald> susie: is this possible in SPARQL?

<matthias_samwald> eric: not possible in a single SPARQL query

<matthias_samwald> susie: this query seems to be quite tricky

<matthias_samwald> ... especially the part about geography

<matthias_samwald> bosse: the information is proprietary, but i am not totally sure

<matthias_samwald> ACTION: bosse have a look whether WHO has useful data about geography of approved drugs

RRSAgent records action 1

<matthias_samwald> susie: in two weeks we will see how successful we were in looking into these questions

<matthias_samwald> susie: question "are there natural alternatives to this drug?" -- can it be answered with the TCM dataset?

<matthias_samwald> jun: i am a bit worried about the user interface for presenting query results, that might cost me more time

<matthias_samwald> ... we might need to add some new information (via aTags) to make it useful

<matthias_samwald> anja: you can use associations between genes and diseases, if a drug and a TCM drug are working on the same genes, this might be a hint

<matthias_samwald> anja: we can define similarity, but we need to have a critical look at the validity
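
One way to make anja's idea concrete (a drug and a TCM drug acting on the same genes as a hint of similarity) is a set-overlap score. The sketch below uses the Jaccard index as an assumed similarity measure; the minutes do not say which measure the group has in mind. The gene identifiers, herb names, and function names are hypothetical placeholders, not data from the TCM or Entrez Gene datasets.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_shared_genes(drug_genes, herb_genes):
    """Rank herbs by gene overlap with a drug.

    Returns (herb, score, shared_genes) tuples, highest score first.
    Herbs sharing no gene with the drug are dropped.
    """
    ranked = []
    for herb, genes in herb_genes.items():
        shared = set(drug_genes) & set(genes)
        if shared:
            ranked.append((herb, jaccard(drug_genes, genes), sorted(shared)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical gene associations, for illustration only.
DRUG_GENES = {"GENE_A", "GENE_B", "GENE_C"}
HERB_GENES = {
    "herb_1": {"GENE_A", "GENE_B"},
    "herb_2": {"GENE_C", "GENE_D"},
    "herb_3": {"GENE_E"},
}

if __name__ == "__main__":
    for herb, score, shared in rank_by_shared_genes(DRUG_GENES, HERB_GENES):
        print(f"{herb}: {score:.2f} (shared: {', '.join(shared)})")
```

A high score here is only a candidate hint, not evidence of a therapeutic alternative; the validity of any such ranking still needs the critical look anja asks for.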

<matthias_samwald> jun: the TCM people also used pathway information to validate

<matthias_samwald> matthias: i can help in judging the validity

<matthias_samwald> TOPIC --- triplification challenge

<matthias_samwald> susie: maybe TCM?

<matthias_samwald> anja: we need use-cases