HCLSIG/LODD/Meetings/2009-06-24 Conference Call

Conference Details

Date of Call: Wednesday June 24, 2009
Time of Call: 11:00am Eastern Daylight Time (EDT), 16:00 British Summer Time (BST), 17:00 Central European Summer Time (CEST)
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Dial-In #: +33.4.89.06.34.99 (Nice, France)
Dial-In #: +44.117.370.6152 (Bristol, UK)
Participant Access Code: 4257 ("HCLS").
IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
Duration: ~1h
Convener: Susie

Agenda

Progress on TCM & Slidder - Anja, Jun
Progress on STITCH - Matthias, Anja
Pharma use case - Bosse, Susie
DILS Poster - Jun
iTriplification Challenge - Anja
AOB

Minutes

Attendees: Anja, Jun, Kei, Matthias, Susie, Bosse

Apologies: Oktie, Elof

<Susie> Anja: Have been generating links from TCM to other data sets

<Susie> Anja: Had some D2R limitations

<Susie> Anja: Was sending pre-generated links to Jun

<Susie> Anja: D2R 3.3 release today or tomorrow

<Susie> Anja: Anja, Kei, Jun found some missing links for Anja to generate

<Susie> Jun: Noticed D2R doesn't give complete results

<Susie> Jun: Tried to verify results from D2R manually

<Susie> Jun: Have made important progress

<Susie> Jun: Links are much improved

<Susie> Jun: In telcon, discovered missing links

<Susie> Jun: Now data sets are well linked together

<Susie> Jun: Sometimes need to use indirect links

<Susie> Jun: Want to find out if indirect linking is sufficient for the use case

<Susie> Jun: If not, we will need to work out how to overcome the problem

<Susie> Jun: Kei suggested analyzing the links in terms of number and quality

<Susie> Kei: Anja and Jun provided good updates

<Susie> Kei: Could you describe examples of direct and indirect links

<Susie> Kei: Are links generated manually or automatically

<Susie> Anja: Covers indirect links

<Susie> Anja: DrugBank and Dailymed link to DBPedia

<Susie> Anja: Did do some sameAs links between Dailymed and DrugBank

<Susie> Anja: Could put in links, but it would be redundant
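The indirect linking Anja describes, where two data sets both link to a shared hub such as DBPedia, can be sketched as a join on the hub identifier. This is only an illustration, not the group's actual link generation: the function name `indirect_links` and every identifier below are hypothetical.

```python
def indirect_links(a_to_hub, b_to_hub):
    """Pair up records from two data sets that link to the same hub resource."""
    by_hub = {}
    for b, hub in b_to_hub:
        by_hub.setdefault(hub, set()).add(b)
    # Every (a, b) pair sharing a hub target is an indirect link.
    return {(a, b) for a, hub in a_to_hub for b in by_hub.get(hub, ())}


# Hypothetical link sets, for illustration only.
dailymed_to_dbpedia = [
    ("dailymed:label-1", "dbpedia:Drug_X"),
    ("dailymed:label-2", "dbpedia:Drug_Y"),
]
drugbank_to_dbpedia = [
    ("drugbank:entry-1", "dbpedia:Drug_X"),
    ("drugbank:entry-3", "dbpedia:Drug_Z"),
]

pairs = indirect_links(dailymed_to_dbpedia, drugbank_to_dbpedia)
print(sorted(pairs))
```

Materializing such pairs as direct links would be redundant, as noted above, since they can always be recomputed from the two existing link sets.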

<matthias_samwald> (just had to dial in again)

<Susie> Jun: No direct link between Dailymed and Slidder about drugs yet

<Susie> Jun: There are links through diseases

<Susie> Anja: Think drugs may be linked too

<Susie> Jun: Have to go through other data sets to connect between DailyMed and Slidder on drugs

<Susie> Kei: For poster, nice to have figure that shows the nature of the links

<Susie> Susie: Nice to explore the nature of the links more

<Susie> Anja: Several data sets covering drug, diseases, target

<Susie> Anja: Sometimes have to go through existing link sets to get new connections

<Susie> Anja: What's most important is identifying which data set would be most important for the entity of interest

<Susie> Anja: Will look into this after the iTriplification paper submission

<Susie> Kei: Difficult to identify easy links

<Susie> Kei: Often have to do sub-string match

<Susie> Kei: Identifiers are not the same, so URIs proliferate, which can cause performance issues
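The sub-string matching Kei mentions might look roughly like the sketch below. It assumes names are compared after lower-casing and collapsing whitespace; the helper names and the sample names are hypothetical and do not come from the TCM data.

```python
def normalize(name):
    """Lower-case and collapse whitespace so trivial spelling differences vanish."""
    return " ".join(name.lower().split())


def substring_candidates(names_a, names_b):
    """Return (a, b) pairs where one normalized name contains the other."""
    out = []
    for a in names_a:
        na = normalize(a)
        for b in names_b:
            nb = normalize(b)
            if na and nb and (na in nb or nb in na):
                out.append((a, b))
    return out


# Hypothetical names, for illustration only.
herbs = ["Ginkgo biloba", "Panax ginseng"]
labels = ["ginkgo  biloba leaf extract", "Green tea"]

candidates = substring_candidates(herbs, labels)
print(candidates)
```

Matches found this way are only candidates: short names can occur inside unrelated longer ones, so the results would still need checking before being published as links.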

<Susie> Kei: We are working on best practices for defining links

<Susie> Kei: Write up either for iTriplification challenge, or an IG Note

<Susie> Matthias: iTriplification is mainly looking for new data to be made available

<Susie> Anja: Need to be innovative too

<Susie> Matthias: Need some description as to how this was achieved

<Susie> Kei: What I hear is that best practices and data generation would work

<Susie> Jun: Think best practices should be written simply and would be very valuable

<Susie> Susie: Best Practices could be well suited to an IG note

<Susie> Bosse: Do you use ATC codes for linking data about drugs

<Susie> Anja: Sometimes ATC, sometimes name in combination with other identifiers

<Susie> Anja: Prioritizing data sets helps

<Susie> Bosse: ATC codes are quite standardized

<Susie> Bosse: ATC codes are from WHO

<Susie> Bosse: They are also international, so overcome problem of different drug names in different countries

<Susie> Bosse: Most drugs currently aren't in DBPedia

<Susie> Anja: Looked at DBPedia because no ATC codes in Dailymed or LinkedCT
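The approach discussed here, linking on ATC codes where both sides carry them and falling back to names otherwise, can be sketched as follows. The record layout (`id`, `name`, `atc`) and the function `link_records` are assumptions made for illustration, and the records are invented; only the ATC code N02BE01 for paracetamol is a real value.

```python
def link_records(records_a, records_b):
    """Link on ATC code when available, else on an exact case-insensitive name."""
    by_atc, by_name = {}, {}
    for rec in records_b:
        if rec.get("atc"):
            by_atc.setdefault(rec["atc"], []).append(rec)
        by_name.setdefault(rec["name"].lower(), []).append(rec)

    links = []
    for rec in records_a:
        if rec.get("atc") and rec["atc"] in by_atc:
            matches, basis = by_atc[rec["atc"]], "atc"
        else:
            matches, basis = by_name.get(rec["name"].lower(), []), "name"
        for match in matches:
            # Record which key produced the link so it can be reviewed later.
            links.append((rec["id"], match["id"], basis))
    return links


# Invented records; the same drug appears under two national names.
set_a = [
    {"id": "a:1", "name": "Paracetamol", "atc": "N02BE01"},
    {"id": "a:2", "name": "Ibuprofen", "atc": None},
]
set_b = [
    {"id": "b:7", "name": "Acetaminophen", "atc": "N02BE01"},
    {"id": "b:8", "name": "ibuprofen", "atc": None},
]

links = link_records(set_a, set_b)
print(links)
```

The first pair shows Bosse's point: the names differ between countries, but the shared ATC code still connects the two records.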

<Susie> Bosse: Will try to provide some input

<Susie> Bosse: Had same challenges in LLD

<matthias_samwald> Deadline for triplification challenge is August 9

<Susie> Kei: What's the deadline for iTriplification

<Susie> Susie: August 9

<Susie> Anja: Update on STITCH

<AnjaJentzsch> http://www4.wiwiss.fu-berlin.de/stitch/

<Susie> Anja: Have been creating a schema for STITCH and have uploaded the data

<Susie> Anja: STITCH covers chemicals and proteins, and their interactions

<Susie> Anja: STITCH also covers many species

<Susie> Anja: Whole data set is about 4GB

<Susie> Anja: So have just done one drug so far

<Susie> Anja: STITCH site is missing protein names

<Susie> Anja: Everything else in STITCH is covered well

<Susie> Anja: Don't know how to talk about the score

<AnjaJentzsch> http://stitch.embl.de/cgi/show_network_section.pl?identifier=-3386&network_depth=1&required_score=400&...

<Susie> Anja: Score is value for how chemicals interact

<Susie> Anja: Worried about posting it with D2R as performance is slow

<Susie> Anja: for the SPARQL endpoint

<Susie> Anja: Might therefore post it into Virtuoso

<Susie> Anja: Matthias could load it into Virtuoso

<Susie> Anja: Chris isn't convinced that aTags are ideal for D2R

<Susie> Anja: I don't completely understand what the problem is

<Susie> Anja: You could send him an email

<Susie> Matthias: Not sure it's necessary to create D2R extension to do that

<Susie> Matthias: It would just need a mapping file

<Susie> Matthias: Should work with standard D2R?

<Susie> Anja: Extension would be needed as they don't render the page

<Susie> Anja: Don't think it's much work, but need to convince Chris of the value

<Susie> Anja: Next steps?

<Susie> Matthias: Nice to get RDF out of STITCH in aTag format

<Susie> Matthias: Might not be necessary to create RDFa, just RDF should work

<Susie> Anja: Looking at possible links from STITCH to other data sets

<Susie> Anja: Not sure if proteins in STITCH can be linked to targets, etc

<Susie> Susie: How are protein names covered?

<Susie> Anja: Only SwissProt or Ensembl identifiers

<Susie> Anja: I don't know the quality

<Susie> Anja: Most Bio2RDF endpoints are down most of the time

<Susie> Susie: Maybe we should provide SwissProt

<Susie> Matthias: It's part of the main release

<Susie> Matthias: But no SPARQL endpoint

<Susie> Susie: We could ask EBI if they could provide a SPARQL endpoint

<Susie> Bosse: Could we use UniProt from LinkedLife Data

<Susie> Bosse: Couldn't we include their SPARQL endpoint?

<Susie> Matthias: Is it included?

<matthias_samwald> by the way, here is an example of an RDF file about a protein in Uniprot: http://www.uniprot.org/uniprot/Q62165.rdf

<matthias_samwald> http://purl.uniprot.org/uniprot/Q62165

<Susie> Matthias: It has a file per protein

<Susie> Matthias: I don't know if it responds to content negotiation

<AnjaJentzsch> Bosse H. Andersson: UniProt is in LLD if it helps: http://www.linkedlifedata.com/workbench/repository/overview.view?id=owlim

<Susie> Matthias: We should try that

<Susie> Matthias: Just tried it, and it does respond to content negotiation

<Susie> Matthias: Can get URI for each protein record
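A content negotiation check of the kind Matthias ran could be sketched like this, using only the Python standard library. It assumes the server honours an `Accept: application/rdf+xml` header; the helper names are hypothetical, and nothing here reflects how the check was actually performed on the call.

```python
import urllib.request


def rdf_request(uri):
    """Build a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri, timeout=10):
    """Follow redirects and return (final URL, Content-Type, body bytes).

    This performs a network call, so it is defined but not run here.
    """
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type"), resp.read()


req = rdf_request("http://purl.uniprot.org/uniprot/Q62165")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch_rdf` on the same URI and inspecting the returned Content-Type would show whether the server answered with RDF rather than an HTML page.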

<Susie> Anja: Can begin to look at connecting STITCH to other data

<Susie> Anja: And get feedback from others

<Susie> Anja: I'll send out a wiki link afterwards

<Susie> Bosse: Look up service for drug codes

<Susie> Bosse: Will have to look into it more

<matthias_samwald> I just confirmed that the Linked Life Data SPARQL endpoint contains triples about http://purl.uniprot.org/uniprot/Q62165

<matthias_samwald> so the endpoint gives us access to the official Uniprot RDF via SPARQL
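One way to check whether an endpoint holds triples about a given URI is a SPARQL ASK query sent as a GET request. The sketch below only builds the request URL. The endpoint address is a placeholder, not the real Linked Life Data address, and the function names are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholder endpoint; substitute the real SPARQL endpoint address.
ENDPOINT = "http://example.org/sparql"


def ask_about(uri):
    """ASK query that is true if the URI appears as the subject of any triple."""
    return "ASK { <%s> ?p ?o }" % uri


def query_url(endpoint, query):
    """Encode the query in the 'query' parameter, as the SPARQL protocol expects."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, ask_about("http://purl.uniprot.org/uniprot/Q62165"))
print(url)
```

The query only tests for the URI in subject position; checking object position as well would need a second pattern joined with UNION.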

<Susie> Bosse: Doesn't include the country where they were available

<Susie> Susie: Let's have a push on the use case

<Susie> Susie: Will schedule a meeting with Bosse for next week

<Susie> Susie: DILS Poster

<Susie> Jun: Talked about this a bit on Monday

<Susie> Jun: Connect TCM to LODD

<Susie> Jun: Need to move TCM to somewhere that is more performant

<Susie> Jun: We'll do a poster and demo

<Susie> Jun: So performance is important

<Susie> Jun: Given the time frame, I'm not going to be too ambitious about the content

<Susie> Jun: Describe complementary data that we can use that is accessible through SPARQL endpoints

<Susie> Jun: Interesting use cases would be good

<Susie> Jun: Have results from one disease, but it'd be good to get data from more diseases

<Susie> Jun: Show how to build an application quickly and get complementary data

<Susie> Jun: See how much analysis data I can get on interlinking

<Susie> Jun: Have been in touch with Oktie, who has sent useful literature

<Susie> Jun: Need to finalize the application

<Susie> Kei: Have you tried loading TCM into HCLS KB for performance testing

<Susie> Jun: Haven't done formal release of TCM yet

<Susie> Jun: Just need to send data to Matthias

<Susie> Jun: Links to TCM will happen with DBpedia

<Susie> Jun: Performance should then improve a lot

<Susie> Anja: We could ask Adrian also to put the data into his store

<Susie> Anja: For performance comparison

<Susie> Matthias: Also in touch with Neurocommons

<Susie> Jun: May also try Talis platform, as I have an account, and can get free storage if data is open

<Susie> Jun: I don't want to do too much for the DILS poster, as we also have the iTriplification paper

<Susie> Jun: Will send out draft during the 2nd week of July

<Susie> Jun: Will send to printers the week after that

<Susie> Kei: Audience tends to have a computer science background