HCLSIG/LODD/Meetings/2009-04-15 Conference Call

From W3C Wiki

Conference Details

  • Date of Call: Wednesday April 15, 2009
  • Time of Call: 11:00am Eastern Daylight Time (EDT), 16:00 British Summer Time (BST), 17:00 Central European Summer Time (CEST)
  • Dial-In #: +1.617.761.6200 (Cambridge, MA)
  • Dial-In #: +33.4.89.06.34.99 (Nice, France)
  • Dial-In #: +44.117.370.6152 (Bristol, UK)
  • Participant Access Code: 4257 ("HCLS").
  • IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
  • Duration: ~1h
  • Convener: Susie

Agenda

  • Overview of Lilly's internal Linked Data project - William
  • Prioritized Questions for identifying data sources - Bosse & Susie
  • Linking TCM into LODD Cloud - Jun & Kei
  • Updating the HCLS KB with the LODD data sets - Matthias
  • Triplification Challenge - All
  • AOB

Minutes

(also available in scribe-bot HTML)

Attendees: William, Bosse, Harold, Matthias, Anja, Oktie, Kei, Jun, Scott, EricP, Dave, Susie

Apologies: Chris

<matthias_samwald> topic: Overview of Lilly's internal Linked Data project

<matthias_samwald> William Sanchez: we have several projects making use of linked data at Lilly.

<matthias_samwald> ... currently we are analysing internal as well as external data

<matthias_samwald> ... PubMed and other datasources

<matthias_samwald> ... normally this is a very manual process

<matthias_samwald> ... we are extending public datasources with proprietary datasources

<matthias_samwald> ... we are testing browsers such as Marbles, Longwell and VisiNav

<matthias_samwald> ... we are testing ontologies and vocabularies such as UMBEL and VoiD

<matthias_samwald> William: we are doing another project in conjunction with SDD

<matthias_samwald> ... a document management system

<matthias_samwald> ... the product has some in-built metadata capabilities

<matthias_samwald> ... there are descriptions for individual datasets, but no inter-dataset relationships

<matthias_samwald> ... we want to build a layer on top to integrate them

<matthias_samwald> ... we use a JDBC interface to access the data

<matthias_samwald> ... we were able to access the datasets with D2R

<matthias_samwald> ... the next thing we need to do is create mapping files. There are thousands of datasets

<matthias_samwald> ... another thing we did: we extended Spotfire to provide a SPARQL endpoint

<matthias_samwald> ... any user can submit any query to any datasource

<matthias_samwald> ... we are in contact with the developers of Spotfire to get this into the product

<matthias_samwald> Anja: how are you going to develop the mapping files?

<matthias_samwald> William: if a user finds a specific dataset that he wants to query against, we can respond to that

<matthias_samwald> Susie: would you be looking for the group at FU Berlin to make changes to D2R based on your needs?

<matthias_samwald> William: this is quite specific to the issues with JDBC, but we can discuss further

<matthias_samwald> Susie: do you plan to present your work? papers?

<matthias_samwald> William: sure. No particular target at the moment.

<matthias_samwald> Kei: how are the data files related to the databases?

<matthias_samwald> William: it is possible to upload the SAS files and make them part of the database

<matthias_samwald> Kei: are analysis results stored back to the relational database?

<matthias_samwald> William: no, not at the moment

<matthias_samwald> Susie: you recently mentioned scalability limits with one of the UI tools

<matthias_samwald> William: Longwell.

<matthias_samwald> ... it puts everything in memory

<matthias_samwald> Susie: how do you link to PubMed?

<matthias_samwald> William: still working on that

<matthias_samwald> Susie: do you think that users would perform a lot of analysis over the linked data, or would it be rather a navigation between datasets?

<matthias_samwald> William: one of the issues is to identify the datasets that a specific project requires.

<matthias_samwald> ... after datasets are identified, SAS tools can be used for analysis.

<matthias_samwald> ... e.g., "give me all datasets about cardiovascular disease"

<matthias_samwald> topic: Prioritized Questions for identifying data sources

<Susie> http://esw.w3.org/topic/HCLSIG/LODD/Questions

<matthias_samwald> Susie: Bosse and I had a call where we put together questions. We have 15 in total now. See URL.

<matthias_samwald> ... Please have a look at the list and make suggestions

<matthias_samwald> ... we should also evaluate what we can already answer with the current LODD datasets, and where we need to add more datasets

<matthias_samwald> ... I volunteer to work on the first 3 questions

<matthias_samwald> ... by "work on" I mean that we identify which datasets we should add, and how we fare with the current data

<matthias_samwald> kei: I would be interested in working on "Are there natural alternatives to this drug?" together with Jun

<matthias_samwald> susie: others, please have a look at the wiki and add your name

<matthias_samwald> susie: we can also make some progress on these questions during the F2F

<matthias_samwald> anja: if someone has "questions about the questions", feel free to ask me about datasets etc.

<matthias_samwald> ... maybe put sub-questions on the wiki

<matthias_samwald> topic: Linking TCM into LODD Cloud

<matthias_samwald> kei: we focused on the BioRDF paper recently, so there has not been much further progress on my side

<matthias_samwald> ... matthias has loaded the LODD datasets into the HCLS KB at DERI

<matthias_samwald> ... need to think about linkage

<matthias_samwald> ... the datasets only contain gene symbols, not IDs, and a gene symbol is not a unique ID.

<matthias_samwald> ... we might contact the database curator.

<matthias_samwald> anja: bio2rdf has that online as a dump

<matthias_samwald> jun: between gene symbols and ids?

<matthias_samwald> anja: yes

<matthias_samwald> kei: it is not straightforward, you have to take species into account

<matthias_samwald> susie: if the mapping file is too naive we might explore other sources. william hayes recently spoke about creating a thesaurus for such purposes. not sure if it fits and if they want to share, though.

<matthias_samwald> jun: mapping to MeSH

<matthias_samwald> anja: we could use the Silk framework

<matthias_samwald> jun: MeSH provides anchors for matching disease IDs

<matthias_samwald> ... what has the terminology work focused on?

<ericP> SNOMED, mostly

<ericP> there are a bunch of others available, e.g. UMLS, LOINC

<matthias_samwald> matthias: OMIM only covers a small subset of diseases

<matthias_samwald> susie: we can discuss that at the F2F as well

<matthias_samwald> ... there seems to be some overlap between LODD and Pharma Ontology

<ericP> COI is using the Stanford Drug Ontology a little as well

<ericP> might want to have helen geeking with y'all

<matthias_samwald> ... would people that participate remotely be interested in dialing in for the F2F?

<matthias_samwald> ... during the breakouts?

<matthias_samwald> kei: licensing is an issue.

<matthias_samwald> ... are all current LODD datasets open and free to distribute?

<matthias_samwald> susie: we selected datasets based on free licenses, but we need to be careful.

<egonw> there is also the problem of license incompatibility

<egonw> http://www.sennoma.net/main/archives/2009/04/an_open_question_about_open_li.php is informative

<egonw> goes into what is and is not allowed in remixing data

<matthias_samwald> susie: some of the datasets had some limitations (e.g. non-commercial)

<egonw> is there an overview of licenses of the current LODD data sets?

<ericP> topic: HCLS KB updated with LODD datasets

<ericP> matthias_samwald: if you query the HCLS KB and find problems, please report them

<ericP> topic: next meeting

<ericP> Susie: next call in four weeks

<ericP> ... haven't heard whether folks want to dial into pharma ontology task during the F2F

<ericP> AnjaJentzsch: would like to call in, and expects Chris to dial in as well

<Susie> http://esw.w3.org/topic/HCLSIG/Meetings/2009-04-30_F2F

<ericP> AnjaJentzsch: given that there is no call for 4 weeks, we should identify the ontologies we want to map to

<ericP> ... I could interlink at least drugs, ... and genes

<Susie> http://esw.w3.org/topic/HCLSIG/PharmaOntology/Roles

<ericP> Susie: the Pharma Ontology task has created a list of questions that people would ask based on their role, along with the key entities and ontologies that they may map to

<ericP> topic: AOB

<Susie> http://www.i-semantics.tugraz.at/triplification_challenge

<ericP> Susie: deadline for the Triplification Challenge: 30 May