HCLSIG/LODD/Meetings/2009-06-24 Conference Call
Conference Details
- Date of Call: Wednesday June 24, 2009
- Time of Call: 11:00am Eastern Daylight Time (EDT), 16:00 British Summer Time (BST), 17:00 Central European Summer Time (CEST)
- Dial-In #: +1.617.761.6200 (Cambridge, MA)
- Dial-In #: +33.4.89.06.34.99 (Nice, France)
- Dial-In #: +44.117.370.6152 (Bristol, UK)
- Participant Access Code: 4257 ("HCLS").
- IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
- Duration: ~1h
- Convener: Susie
Agenda
- Progress on TCM & Slidder - Anja, Jun
- Progress on STITCH - Matthias, Anja
- Pharma use case - Bosse, Susie
- DILS Poster - Jun
- iTriplification Challenge - Anja
- AOB
Minutes
Attendees: Anja, Jun, Kei, Matthias, Susie, Bosse
Apologies: Oktie, Elof
<Susie> Anja: Have been generating links from TCM to other data sets
<Susie> Anja: Had some D2R limitations
<Susie> Anja: Was sending pre-generated links to Jun
<Susie> Anja: D2R 3.3 release today or tomorrow
<Susie> Anja: Anja, Kei, Jun found some missing links for Anja to generate
<Susie> Jun: Noticed D2R doesn't give complete results
<Susie> Jun: Tried to verify results from D2R manually
<Susie> Jun: Have made important progress
<Susie> Jun: Links are much improved
<Susie> Jun: In telcon, discovered missing links
<Susie> Jun: Now data sets are well linked together
<Susie> Jun: Sometimes need to use indirect links
<Susie> Jun: Want to find out if indirect linking is sufficient for the use case
<Susie> Jun: If not, we will need to work out how to overcome the problem
<Susie> Jun: Kei suggested analyzing the links in terms of numbers and quality
<Susie> Kei: Anja and Jun provided good updates
<Susie> Kei: Could you describe examples of direct and indirect links?
<Susie> Kei: Are links generated manually or automatically?
<Susie> Anja: Covers indirect links
<Susie> Anja: DrugBank and Dailymed link to DBPedia
<Susie> Anja: Did do some sameAs links between Dailymed and DrugBank
<Susie> Anja: Could put in links, but it would be redundant
<matthias_samwald> (just had to dial in again)
<Susie> Jun: No direct link between Dailymed and Slidder about drugs yet
<Susie> Jun: There are links through diseases
<Susie> Anja: Think drugs may be linked too
<Susie> Jun: Have to go through other data sets to connect between DailyMed and Slidder on drugs
<Susie> Kei: For poster, nice to have figure that shows the nature of the links
<Susie> Susie: Nice to explore the nature of the links more
<Susie> Anja: Several data sets covering drug, diseases, target
<Susie> Anja: Sometimes have to go through existing link sets to get new connections
<Susie> Anja: What's most important is identifying which data set would be most important for the entity of interest
<Susie> Anja: Will look into this after the iTriplification paper submission
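The indirect linking discussed above (reaching one data set from another by going through existing link sets) can be sketched as a shortest-path search over the link sets. This is only an illustration: the LINK_SETS table and the find_link_path helper are hypothetical, and the table loosely follows the connections mentioned in this call rather than any published link inventory.

```python
from collections import deque

# Hypothetical, illustrative table of which data sets have link sets to which.
LINK_SETS = {
    "DailyMed": {"DBpedia", "DrugBank"},
    "DrugBank": {"DBpedia", "DailyMed"},
    "DBpedia": {"DailyMed", "DrugBank", "TCM"},
    "TCM": {"DBpedia"},
}


def find_link_path(source, target, link_sets=LINK_SETS):
    """Return the shortest chain of data sets from source to target, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Sort neighbours so the search order is deterministic.
        for neighbour in sorted(link_sets.get(path[-1], ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = find_link_path("DailyMed", "TCM")
print(" -> ".join(path))
```

A path of length two is a direct link; anything longer is an indirect link, and the intermediate data sets are the ones whose link quality would need checking for the use case.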
<Susie> Kei: Difficult to identify easy links
<Susie> Kei: Often have to do sub-string match
<Susie> Kei: Not same identifiers, so proliferation of URIs, which can have performance issues
<Susie> Kei: We are working on best practices for defining links
<Susie> Kei: Write up either for iTriplification challenge, or an IG Note
<Susie> Matthias: iTriplification is mainly looking for new data to be made available
<Susie> Anja: Need to be innovative too
<Susie> Matthias: Need some description as to how this was achieved
<Susie> Kei: What I hear is that best practices and data generation would work
<Susie> Jun: Think best practices should be written simply and would be very valuable
<Susie> Susie: Best Practices could be well suited to an IG note
<Susie> Bosse: Do you use ATC codes for linking data about drugs?
<Susie> Anja: Sometimes ATC, sometimes name in combination with other identifiers
<Susie> Anja: Prioritizing data sets helps
<Susie> Bosse: ATC codes are quite standardized
<Susie> Bosse: ATC codes are from WHO
<Susie> Bosse: They are also international, so overcome problem of different drug names in different countries
<Susie> Bosse: Most drugs currently aren't in DBPedia
<Susie> Anja: Looked at DBPedia because no ATC codes in Dailymed or LinkedCT
<Susie> Bosse: Will try to provide some input
<Susie> Bosse: Had same challenges in LLD
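The linking strategy discussed above (ATC code where available, otherwise the drug name) could look roughly like the sketch below. The record layout, the identifiers and the link_records helper are assumptions made for illustration; the two toy records are not taken from DailyMed, DrugBank or any other data set, and the name fallback here is a plain normalised comparison rather than the name-plus-other-identifiers matching Anja describes.

```python
def normalise(name):
    """Lower-case a drug name and collapse whitespace for comparison."""
    return " ".join(name.lower().split())


def link_records(left, right):
    """Pair records that share an ATC code, else fall back to the normalised name."""
    by_atc = {r["atc"]: r for r in right if r.get("atc")}
    by_name = {normalise(r["name"]): r for r in right}
    links = []
    for rec in left:
        match = by_atc.get(rec["atc"]) if rec.get("atc") else None
        basis = "atc"
        if match is None:
            match = by_name.get(normalise(rec["name"]))
            basis = "name"
        if match is not None:
            links.append((rec["id"], match["id"], basis))
    return links


# Toy records with hypothetical identifiers; N02BA01 is the ATC code
# for acetylsalicylic acid.
source_a = [
    {"id": "a:1", "name": "Aspirin", "atc": "N02BA01"},
    {"id": "a:2", "name": "Example Drug", "atc": None},
]
source_b = [
    {"id": "b:9", "name": "Acetylsalicylic acid", "atc": "N02BA01"},
    {"id": "b:7", "name": "example  drug"},
]

links = link_records(source_a, source_b)
print(links)
```

Recording the basis of each link ("atc" or "name") keeps the weaker name-based matches separable when link quality is analysed later.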
<matthias_samwald> Deadline for triplification challenge is August 9
<Susie> Kei: What's the deadline for iTriplification?
<Susie> Susie: August 9
<Susie> Anja: Update on STITCH
<AnjaJentzsch> http://www4.wiwiss.fu-berlin.de/stitch/
<Susie> Anja: Have been creating a schema for STITCH and have uploaded the data
<Susie> Anja: STITCH covers chemicals and proteins, and their interactions
<Susie> Anja: STITCH also covers many species
<Susie> Anja: Whole data set is about 4GB
<Susie> Anja: So have just done one drug so far
<Susie> Anja: STITCH site is missing protein names
<Susie> Anja: Everything else in STITCH is covered well
<Susie> Anja: Don't know how to talk about the score
<AnjaJentzsch> http://stitch.embl.de/cgi/show_network_section.pl?identifier=-3386&network_depth=1&required_score=400&...
<Susie> Anja: Score is value for how chemicals interact
<Susie> Anja: Worried about posting it with D2R as performance is slow
<Susie> Anja: for the SPARQL endpoint
<Susie> Anja: Might therefore post it into Virtuoso
<Susie> Anja: Matthias could load it into Virtuoso
<Susie> Anja: Chris isn't convinced that aTag are ideal for D2R
<Susie> Anja: I don't completely understand what the problem is
<Susie> Anja: You could send him an email
<Susie> Matthias: Not sure it's necessary to create D2R extension to do that
<Susie> Matthias: It would just need a mapping file
<Susie> Matthias: Should work with standard D2R?
<Susie> Anja: Extension would be needed as they don't render the page
<Susie> Anja: Don't think it's much work, but need to convince Chris of the value
<Susie> Anja: Next steps?
<Susie> Matthias: Nice to get RDF out of STITCH in aTag format
<Susie> Matthias: Might not be necessary to create RDFa, just RDF should work
<Susie> Anja: Looking at possible links from STITCH to other data sets
<Susie> Anja: Not sure if proteins in STITCH can be linked to targets, etc
<Susie> Susie: How are protein names covered?
<Susie> Anja: Only SwissProt or Ensembl identifiers
<Susie> Anja: I don't know the quality
<Susie> Anja: Most Bio2RDF endpoints are down most of the time
<Susie> Susie: Maybe we should provide SwissProt
<Susie> Matthias: It's part of the main release
<Susie> Matthias: But no SPARQL endpoint
<Susie> Susie: We could ask EBI if they could provide a SPARQL endpoint
<Susie> Bosse: Could we use UniProt from LinkedLife Data?
<Susie> Bosse: Couldn't we include their SPARQL endpoint?
<Susie> Matthias: Is it included?
<matthias_samwald> by the way, here is an example of a RDF file about a protein in Uniprot: http://www.uniprot.org/uniprot/Q62165.rdf
<matthias_samwald> http://purl.uniprot.org/uniprot/Q62165
<Susie> Matthias: It has a file per protein
<Susie> Matthias: I don't know if it responds to content negotiation
<AnjaJentzsch> Bosse H. Andersson: UniProt is in LLD if it helps: http://www.linkedlifedata.com/workbench/repository/overview.view?id=owlim
<Susie> Matthias: We should try that
<Susie> Matthias: Just tried it, and it does respond to content negotiation
<Susie> Matthias: Can get URI for each protein record
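The content negotiation check Matthias describes can be reproduced with a request that asks for RDF through the Accept header. The sketch below uses only the Python standard library; the helper names build_rdf_request and fetch_rdf are hypothetical, and application/rdf+xml is assumed as the requested media type. The fetch itself is not run here because it needs network access and depends on the server's current behaviour.

```python
import urllib.request


def build_rdf_request(uri, media_type="application/rdf+xml"):
    """Build a GET request that asks the server for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri):
    """Send the request and return (content type, body). Needs network access."""
    with urllib.request.urlopen(build_rdf_request(uri)) as response:
        return response.headers.get("Content-Type"), response.read()


# Protein URI quoted earlier in the minutes.
request = build_rdf_request("http://purl.uniprot.org/uniprot/Q62165")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

If the server honours content negotiation, calling fetch_rdf on the same URI should come back with an RDF content type rather than HTML.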
<Susie> Anja: Can begin to look at connecting STITCH to other data
<Susie> Anja: And get feedback from others
<Susie> Anja: I'll send out a wiki link afterwards
<Susie> Bosse: Look up service for drug codes
<Susie> Bosse: Will have to look into it more
<matthias_samwald> I just confirmed that the Linked Life Data SPARQL endpoint contains triples about http://purl.uniprot.org/uniprot/Q62165
<matthias_samwald> so the endpoint gives us access to the official Uniprot RDF via SPARQL
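A check like the one above can be expressed as a SPARQL ASK query sent over HTTP GET. In this sketch the endpoint address is a placeholder, since the minutes do not give the SPARQL endpoint URL, and the helper names ask_query and query_url are hypothetical. Only the query and the request URL are built; nothing is sent.

```python
from urllib.parse import urlencode

# Placeholder: the actual endpoint address is not given in these minutes.
ENDPOINT = "http://example.org/sparql"


def ask_query(uri):
    """Return an ASK query that is true if any triple has uri as its subject."""
    return "ASK { <%s> ?p ?o }" % uri


def query_url(endpoint, query):
    """Encode the query as a GET request URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, ask_query("http://purl.uniprot.org/uniprot/Q62165"))
print(url)
```

Replacing ASK with a SELECT over ?p and ?o would list the triples themselves instead of only confirming that some exist.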
<Susie> Bosse: Doesn't include the country where they were available
<Susie> Susie: Let's have a push on the use case
<Susie> Susie: Will schedule a meeting with Bosse for next week
<Susie> Susie: DILS Poster
<Susie> Jun: Talked about this a bit on Monday
<Susie> Jun: Connect TCM to LODD
<Susie> Jun: Need to move TCM to somewhere that is more performant
<Susie> Jun: We'll do a poster and demo
<Susie> Jun: So performance is important
<Susie> Jun: Given the time frame, I'm not going to be too ambitious about the content
<Susie> Jun: Describe complementary data that we can use that is accessible through SPARQL endpoints
<Susie> Jun: Interesting use cases would be good
<Susie> Jun: Have results from one disease, but it'd be good to get data from more diseases
<Susie> Jun: Show that an application can be built quickly, and show how to get complementary data
<Susie> Jun: See how much analysis data I can get on interlinking
<Susie> Jun: Have been in touch with Oktie, who has sent useful literature
<Susie> Jun: Need to finalize the application
<Susie> Kei: Have you tried loading TCM into HCLS KB for performance testing?
<Susie> Jun: Haven't done formal release of TCM yet
<Susie> Jun: Just need to send data to Matthias
<Susie> Jun: Links to TCM will happen with DBpedia
<Susie> Jun: Performance should then be much better
<Susie> Anja: We could ask Adrian also to put the data into his store
<Susie> Anja: For performance comparison
<Susie> Matthias: Also in touch with Neurocommons
<Susie> Jun: May also try Talis platform, as I have an account, and can get free storage if data is open
<Susie> Jun: I don't want to do too much for the DILS poster, as we also have the iTriplification paper
<Susie> Jun: Will send out draft during the 2nd week of July
<Susie> Jun: Will send to printers the week after that
<Susie> Kei: Audience tends to have a computer science background