HCLSIG/LODD/Meetings/2009-04-15 Conference Call
Conference Details
- Date of Call: Wednesday April 15, 2009
- Time of Call: 11:00am Eastern Daylight Time (EDT), 16:00 British Summer Time (BST), 17:00 Central European Summer Time (CEST)
- Dial-In #: +1.617.761.6200 (Cambridge, MA)
- Dial-In #: +33.4.89.06.34.99 (Nice, France)
- Dial-In #: +44.117.370.6152 (Bristol, UK)
- Participant Access Code: 4257 ("HCLS").
- IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
- Duration: ~1h
- Convener: Susie
Agenda
- Overview of Lilly's internal Linked Data project - William
- Prioritized Questions for identifying data sources - Bosse & Susie
- Linking TCM into LODD Cloud - Jun & Kei
- Updating the HCLS KB with the LODD data sets - Matthias
- Triplification Challenge - All
- AOB
Minutes
(also available in scribe-bot HTML)
Attendees: William, Bosse, Harold, Matthias, Anja, Oktie, Kei, Jun, Scott, EricP, Dave, Susie
Apologies: Chris
<matthias_samwald> topic: Overview of Lilly's internal Linked Data project
<matthias_samwald> William Sanchez: we have several projects making use of linked data at Lilly.
<matthias_samwald> ... currently we are analysing internal as well as external data
<matthias_samwald> ... PubMed and other datasources
<matthias_samwald> ... normally this is a very manual process
<matthias_samwald> ... we are extending public datasources with proprietary datasources
<matthias_samwald> ... we are testing browsers such as Marbles, Longwell, and VisiNav
<matthias_samwald> ... we are testing ontologies such as UMBEL and VoiD
<matthias_samwald> William: we are doing another project in conjunction with SDD
<matthias_samwald> ... a document management system
<matthias_samwald> ... the product has some in-built metadata capabilities
<matthias_samwald> ... there are descriptions for individual datasets, but no inter-dataset relationships
<matthias_samwald> ... we want to build a layer on top to integrate them
<matthias_samwald> ... we use a JDBC interface to access them
<matthias_samwald> ... we were able to access the datasets with D2R
<matthias_samwald> ... the next thing we need to do is to create mapping files. There are thousands of datasets
<matthias_samwald> ... another thing we did: we extended Spotfire to create a SPARQL endpoint
<matthias_samwald> ... any user can submit any query to any datasource
<matthias_samwald> ... we are in contact with the developers of Spotfire to get this into the product
<matthias_samwald> Anja: how are you going to develop the mapping files?
<matthias_samwald> William: if a user finds a specific dataset that they want to query, we can respond to that request
<matthias_samwald> Susie: would you be looking for the group at FU Berlin to make changes to D2R based on your needs?
<matthias_samwald> William: this is quite specific to the issues with JDBC, but we can discuss further
<matthias_samwald> Susie: do you plan to present your work? papers?
<matthias_samwald> William: sure. No particular target at the moment.
<matthias_samwald> Kei: how are the data files related to the databases?
<matthias_samwald> William: it is possible to upload the SAS files and make them part of the database
<matthias_samwald> Kei: are analysis results stored back in the relational database?
<matthias_samwald> William: no, not at the moment
<matthias_samwald> Susie: you recently mentioned scalability limits with one of the UI tools
<matthias_samwald> William: Longwell.
<matthias_samwald> ... it puts everything in memory
<matthias_samwald> Susie: how do you link to PubMed?
<matthias_samwald> William: still working on that
<matthias_samwald> Susie: do you think that users would perform a lot of analysis over the linked data, or would it be rather a navigation between datasets?
<matthias_samwald> William: one of the issues is to identify the datasets that a specific project requires.
<matthias_samwald> ... after datasets are identified, SAS tools can be used for analysis.
<matthias_samwald> ... e.g., "give me all datasets about cardiovascular disease"
<matthias_samwald> topic: Prioritized Questions for identifying data sources
<Susie> http://esw.w3.org/topic/HCLSIG/LODD/Questions
<matthias_samwald> Susie: Bosse and I had a call where we put together questions. We have 15 in total now. See URL.
<matthias_samwald> ... Please have a look at the list and make suggestions
<matthias_samwald> ... we should also evaluate what we can already answer with the current LODD datasets, and where we need to add more datasets
<matthias_samwald> ... I volunteer to work on the first 3 questions
<matthias_samwald> ... by "work on" I mean that we identify which datasets we should add, and how we fare with the current data
<matthias_samwald> kei: I would be interested in working on "Are there natural alternatives to this drug?" together with Jun
<matthias_samwald> susie: others, please have a look at the wiki and add your name
<matthias_samwald> susie: we can also make some progress on these questions during the F2F
<matthias_samwald> anja: if someone has "questions about the questions", feel free to ask me about datasets etc.
<matthias_samwald> ... maybe put sub-questions on the wiki
<matthias_samwald> topic: Linking TCM into LODD Cloud
<matthias_samwald> kei: we focused on the BioRDF paper recently, so there has not been much further progress on my side
<matthias_samwald> ... matthias has loaded the LODD datasets into the HCLS KB at DERI
<matthias_samwald> ... need to think about linkage
<matthias_samwald> ... the datasets only contain gene symbols, not IDs, and a gene symbol is not a unique ID.
<matthias_samwald> ... we might contact the database curator.
<matthias_samwald> anja: Bio2RDF has that mapping online as a dump
<matthias_samwald> jun: between gene symbols and ids?
<matthias_samwald> anja: yes
<matthias_samwald> kei: it is not straightforward; you have to take species into account
<matthias_samwald> susie: if the mapping file is too naive, we might explore other sources. William Hayes recently spoke about creating a thesaurus for such purposes. Not sure if it fits and if they want to share, though.
<matthias_samwald> jun: mapping to MeSH
<matthias_samwald> anja: we could use the SILK framework
<matthias_samwald> jun: MeSH provides anchors for matching disease IDs
<matthias_samwald> what has the terminology work focused on?
<ericP> SNOMED, mostly
<ericP> there are a bunch of others available, e.g. UMLS, LOINC
<matthias_samwald> matthias: OMIM only covers a small subset of diseases
<matthias_samwald> susie: we can discuss that at the F2F as well
<matthias_samwald> ... there seems to be some overlap between LODD and Pharma Ontology
<ericP> COI is using the Stanford Drug Ontology a little as well
<ericP> might want to have helen geeking with y'all
<matthias_samwald> ... would people that participate remotely be interested in dialing in for the F2F?
<matthias_samwald> ... during the breakouts?
<matthias_samwald> kei: licensing is an issue.
<matthias_samwald> ... are all current LODD datasets open and free to distribute?
<matthias_samwald> susie: we selected datasets based on free licenses, but we need to be careful.
<egonw> there is also the problem of license incompatibility
<egonw> http://www.sennoma.net/main/archives/2009/04/an_open_question_about_open_li.php is informative
<egonw> goes into what is and is not allowed in remixing data
<matthias_samwald> susie: some of the datasets had some limitations (e.g. non-commercial)
<egonw> is there an overview of licenses of the current LODD data sets?
<ericP> topic: HCLS KB updated with LODD datasets
<ericP> matthias_samwald: if you query the HCLS KB and find problems, please report them
<ericP> topic: next meeting
<ericP> Susie: next call in four weeks
<ericP> ... haven't heard whether folks want to dial into the Pharma Ontology task during the F2F
<ericP> AnjaJentzsch: I would like to call in, and expect Chris to dial in as well
<Susie> http://esw.w3.org/topic/HCLSIG/Meetings/2009-04-30_F2F
<ericP> AnjaJentzsch: given there is no call for 4 weeks, we should identify the ontologies which we want to map to
<ericP> ... i could interlink at least drugs, .. and genes
<Susie> http://esw.w3.org/topic/HCLSIG/PharmaOntology/Roles
<ericP> Susie: the Pharma Ontology task has created a list of questions that people would ask based on their role, and lists key entities and ontologies that they may map to
<ericP> topic: AOB
<Susie> http://www.i-semantics.tugraz.at/triplification_challenge
<ericP> Susie: deadline for the Triplification Challenge: 30 May