HCLSIG/LODD/Meetings/2009-05-27 Conference Call
Conference Details
- Date of Call: Wednesday May 27, 2009
- Time of Call: 11:00am Eastern Daylight Time (EDT), 16:00 British Summer Time (BST), 17:00 Central European Summer Time (CEST)
- Dial-In #: +1.617.761.6200 (Cambridge, MA)
- Dial-In #: +33.4.89.06.34.99 (Nice, France)
- Dial-In #: +44.117.370.6152 (Bristol, UK)
- Participant Access Code: 4257 ("HCLS").
- IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
- Duration: ~1h
- Convener: Susie
Agenda
- Progress on converting data sets - All
- Progress on use case - All
- Triplification Challenge - All
- AOB
Minutes
<matthias_samwald> scribe: matthias_samwald
<matthias_samwald> attendees: Anja, Vassil, Bosse, ericP, Susie, Jun, Matthias
<matthias_samwald> apologies: Kei
<matthias_samwald> TOPIC --- Datasets
<matthias_samwald> jun: we created a mapping between TCM and Entrez Gene
<matthias_samwald> ... i did some manual correction of the mapping
<matthias_samwald> ... anja also sent me some links today
<matthias_samwald> ... she also linked to SIDER and DrugBank
<matthias_samwald> ... the dataset is quite well interlinked
<matthias_samwald> ... anja encountered performance problems with running SILK over remote SPARQL endpoints
<matthias_samwald> ... we have a rough idea of an interesting use-case in this area
<matthias_samwald> ... we managed to submit an abstract to the DILS poster/demo session, we will get feedback at the end of this month
<matthias_samwald> ... we are investigating aTags to enrich the knowledge base with new statements
<matthias_samwald> anja: i have not put the use case on the wiki yet; today i proposed another small use case related to TCM via mail and will put it on the wiki later
<matthias_samwald> ... for SILK, i had to put data into local SPARQL endpoint
<matthias_samwald> ... performance not really an issue, datasets are not updated that often, letting SILK run for some hours is not that bad in this case.
<matthias_samwald> ... had to use local SPARQL endpoint because public endpoints do not allow 7 million queries in a row...
<matthias_samwald> anja: egon willighagen was excited about this new dataset
<matthias_samwald> ... matthias and peter ansell were also working on SIDER in parallel.
<matthias_samwald> ... i might also look into linking to LinkedCT
<matthias_samwald> susie: LinkedCT contains new drugs; will there be much side-effect data for them?
<matthias_samwald> anja: there are also marketed drugs in LinkedCT
<matthias_samwald> matthias: i am almost finished with converting SIDER to aTags, i am re-using the DBpedia and OBO URIs directly, should be complementary to the conversion of Anja
<matthias_samwald> anja: I looked at the conversion script from peter ansell, don't know about the results
<matthias_samwald> susie: a next step will be to think about additional datasets
<matthias_samwald> bosse: i have not made progress with working on a query and identifying necessary datasets
<matthias_samwald> susie: we should be able to pose new interesting questions based on the LOD we created
<matthias_samwald> ... we are still lacking questions / necessary steps to utilize the data
<matthias_samwald> jun: on the wiki page we are not giving enough demonstrations. we are just describing how things are done by browsing different websites -- but we want to show how linked data can be used!
<matthias_samwald> susie: bosse and i put together some top questions.
<Susie> http://esw.w3.org/topic/HCLSIG/LODD/Questions
<matthias_samwald> susie: (going through questions)
<matthias_samwald> anja: we can give a summarisation of all active ingredients a company is working on / marketing
<matthias_samwald> susie: can we link to pathways?
<matthias_samwald> ... there is pathway information in some Bio2RDF datasets
<matthias_samwald> matthias: the "Linked Life Data" datasets (LarKC / AstraZeneca) should also have a lot of pathway data
<matthias_samwald> susie: genes and proteins are often linked interchangeably, this could be an entry point.
<matthias_samwald> vassil: hi, i am from ontotext, we are working with Bosse on Linked Life Data
<matthias_samwald> ... we can work on interlinking / sharing identifiers
<matthias_samwald> ... we can also look at user interfaces
<matthias_samwald> ... just querying with SPARQL does not help
<matthias_samwald> susie: i agree
<matthias_samwald> vassil: we are planning to work on user interfaces
<matthias_samwald> ... we should start from the other side -- developing interfaces for concrete tasks by researchers
<matthias_samwald> matthias: could we use Ontotext LifeSKIM?
<matthias_samwald> vassil: this is a bit more specialized, you cannot put arbitrary RDF into it
<matthias_samwald> susie: it's a chicken and egg problem. has anyone else thought about the right order?
<matthias_samwald> vassil: we are looking at Exhibit
<matthias_samwald> vassil: the idea is to run SPARQL queries, get back JSON, and render on screen
<matthias_samwald> ... we hope to see results in 1 month
<matthias_samwald> susie: (describes query 2 on the wiki page)
<matthias_samwald> ... it seems to be a broad question... aggregate everything around a compound
<matthias_samwald> ... question 2 (How is this therapy/compound different from existing therapies/compounds?) is actually too broad, i will move it down in the list
<matthias_samwald> ... question 3 (What are our patients saying about our drugs) has a strong text mining component
<matthias_samwald> ... another question: Who are the key opinion leaders for the therapeutic area?
<matthias_samwald> vassil: we have pubmed, but not citation information
<matthias_samwald> matthias: we could ask anita de waard (elsevier)
<matthias_samwald> susie: i will ask anita
<matthias_samwald> ... question: Of drugs from either the same or different company for the same indication, are they approved in the same target region of interest – US or Global
<matthias_samwald> susie: is this possible in SPARQL?
<matthias_samwald> eric: not possible in a single SPARQL query
<matthias_samwald> susie: this query seems to be quite tricky
<matthias_samwald> ... especially the part about geography
<matthias_samwald> bosse: the information is proprietary, but i am not totally sure
<matthias_samwald> ACTION: bosse have a look whether WHO has useful data about geography of approved drugs
- RRSAgent records action 1
<matthias_samwald> susie: in two weeks we will see how successful we were in looking into these questions
<matthias_samwald> susie: question "are there natural alternatives to this drug?" -- can it be answered with the TCM dataset?
<matthias_samwald> jun: i am a bit worried about the user interface for presenting query results, that might cost me more time
<matthias_samwald> ... we might need to add some new information (via aTags) to make it useful
<matthias_samwald> anja: you can use associations between genes and diseases; if a drug and a TCM drug act on the same genes, this might be a hint
<matthias_samwald> anja: we can define similarity, but we need to have a critical look at the validity
<matthias_samwald> jun: the TCM people also used pathway information to validate
<matthias_samwald> matthias: i can help in judging the validity
<matthias_samwald> TOPIC --- triplification challenge
<matthias_samwald> susie: maybe TCM?
<matthias_samwald> anja: we need use-cases