HCLSIG/LODD/Meetings/2010-02-17 Conference Call

Conference Details

Date of Call: Wednesday February 17, 2010
Time of Call: 11:00am Eastern Standard Time (EST), 16:00 Greenwich Mean Time (GMT), 17:00 Central European Time (CET)
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Dial-In #: +33.4.89.06.34.99 (Nice, France)
Dial-In #: +44.117.370.6152 (Bristol, UK)
Participant Access Code: 4257 ("HCLS").
IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
Duration: ~1h
Convener: Susie

Agenda

LinkedLifeData - Vassil
Data updates - All
FDA terms - Oktie & colleague
Mapping experimental data - Susie
Outreach (TCM, Bio-Ontologies SIG, ACS) - All
AOB

Minutes

Attendees: Anja, Susie, Kei, Jun, Vassil, Bosse

<Susie> Vassil presents on LinkedLifeData

<Susie> http://esw.w3.org/topic/HCLSIG/LODD/Meetings/2010-02-17_Conference_Call

<Susie> LLD is a semantic data integration platform

<Susie> Aim is to aggregate data across data sources

<Susie> It's an EU project, with participation from AZ, and Ontotext

<Susie> LLD is a warehouse

<Susie> Need to clean up linked data from public files

<Susie> Uses NLP to bridge gap between text and linked data

<Susie> Follow LD principles

<Susie> Currently includes 4 billion statements

<Susie> Was 5 million statements, but the data wasn't all useful

<Susie> Includes a representation of Entrez Gene

<Susie> Also includes proteins, and interactions

<Susie> Use the LODD data sources

<Anja> Hi everyone, I have to head off to an appointment. There are no data updates on my side. But I will have to look into STITCH since Matthias mentioned some problems with the protein URIs.

<Susie> Very happy that LODD is publishing information

<Susie> OK. Thanks for the heads up

<Anja> If there were any other problems which occurred during the Tokyo Hackathon let me know

<Susie> Also includes documents, including PubMed and clinical trials data

<Susie> Can perform fast queries

<Susie> Data integration is a complex process with lots of ETL scripts

<Susie> There can be licensing restrictions

<Susie> Also worked to transform OBO ontologies to SKOS

<Susie> Preserve original RDF structure where possible

<Susie> Want URIs to be resolvable

<Susie> Interested in tracing back the provenance

<jun> what kind of provenance do you want to track?

<Susie> Provenance focuses on identifying the release from which each statement comes

<Susie> Capture graph name, number of statements

<Susie> Want to use a standard vocab for the provenance such as voiD

<Susie> Jun: You may also want to look at the provenance ontology

<Susie> Most interested in the software engineering process

<Susie> Would be happy to collaborate on using standard vocabularies

<Susie> Use an extended RDF model

<Susie> Possible to associate additional URIs with statements

<Susie> Allows addition of information relating to release numbers

<Susie> This isn't part of the RDF standard
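
Note: the extended model described above (a statement plus an additional URI that carries release information) can be sketched as follows. This is a minimal illustration only, not LinkedLifeData's actual data model; the `Quad` type, the helper functions, and all `ex:` URIs are hypothetical.

```python
from collections import Counter, namedtuple

# A triple plus a fourth element naming the graph (release) it came from.
Quad = namedtuple("Quad", "subject predicate obj graph")

# Hypothetical statements and graph names, for illustration only.
quads = [
    Quad("ex:gene1", "ex:encodes", "ex:protein1", "ex:graph/source-a/release-1"),
    Quad("ex:gene2", "ex:encodes", "ex:protein2", "ex:graph/source-a/release-1"),
    Quad("ex:protein1", "ex:interactsWith", "ex:protein2", "ex:graph/source-b/release-7"),
]

def statements_per_graph(quads):
    """Count how many statements each named graph contributes."""
    return Counter(q.graph for q in quads)

def provenance_of(quads, subject, predicate, obj):
    """Return the sorted graph names that assert the given statement."""
    return sorted(
        {q.graph for q in quads
         if (q.subject, q.predicate, q.obj) == (subject, predicate, obj)}
    )

counts = statements_per_graph(quads)
source = provenance_of(quads, "ex:protein1", "ex:interactsWith", "ex:protein2")
```

Here `counts` holds the per-graph statement counts (the "graph name, number of statements" mentioned above) and `source` traces one statement back to the release graph that asserted it.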

<Susie> Expected that if all information is put together then it will be linked and easy to query

<Susie> Found this isn't the case with 20 data sources

<jun> http://purl.org/net/provenance/ to find out more about the Provenance Vocabulary

<Susie> Many data sources aren't properly linked

<Susie> Shows example where BioPAX and UniProt aren't linked

<Susie> Either need to use complex substring functions to query the information

<Susie> Or need a smarter way to connect data and create predicates
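
Note: the second option above (creating explicit predicates instead of repeating substring matching in every query) might look roughly like this. It is a sketch under stated assumptions: the BioPAX-side node names, the shape of the cross-reference data, and the `ex:sameProteinAs` predicate are all hypothetical, and the presentation did not specify how the links are actually generated.

```python
def accession(identifier):
    """Return the trailing token after the last '/', '#' or ':'."""
    for sep in "/#:":
        identifier = identifier.rsplit(sep, 1)[-1]
    return identifier

def generate_links(xrefs, target_uris, predicate="ex:sameProteinAs"):
    """Turn matching accessions into explicit link statements.

    xrefs maps a source node to the accession string it mentions;
    target_uris are URIs whose trailing token is an accession.
    """
    by_accession = {accession(uri): uri for uri in target_uris}
    return [
        (node, predicate, by_accession[acc])
        for node, acc in sorted(xrefs.items())
        if acc in by_accession
    ]

# Hypothetical input data, for illustration only.
biopax_xrefs = {"ex:pathway/protein42": "P12345", "ex:pathway/protein43": "Q99999"}
uniprot_uris = ["http://purl.uniprot.org/uniprot/P12345"]

links = generate_links(biopax_xrefs, uniprot_uris)
```

The string handling runs once, at link-generation time; queries can then follow the generated predicate directly.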

<Susie> One part of the work was generating the links

<Susie> Identified 6 linked data integration patterns

<Susie> define meta-rules to connect resources with various predicates

<Susie> This is research work

<Susie> The work is manually controlled

<Susie> GO is a common data resource

<Susie> Everyone using their own GO URIs

<Susie> Need to say when links refer to the same thing
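
Note: one way to state that differently minted GO URIs refer to the same term is to group them by the embedded GO identifier and emit equivalence links. The sketch below assumes the identifier appears in each URI as `GO:` or `GO_` followed by seven digits; the example URIs are illustrative, and the choice of `owl:sameAs` as the linking predicate is an assumption rather than something stated in the presentation.

```python
import re
from collections import defaultdict

# Assumed pattern: "GO:" or "GO_" (any case) followed by a 7-digit identifier.
GO_ID = re.compile(r"GO[:_](\d{7})", re.IGNORECASE)

def group_by_go_term(uris):
    """Group URIs by the GO identifier embedded in them."""
    groups = defaultdict(list)
    for uri in uris:
        match = GO_ID.search(uri)
        if match:
            groups["GO:" + match.group(1)].append(uri)
    return dict(groups)

def same_as_links(groups):
    """Link every URI in a group to the first URI seen for that term."""
    links = []
    for uris in groups.values():
        canonical = uris[0]
        links.extend((other, "owl:sameAs", canonical) for other in uris[1:])
    return links

# Illustrative URIs only.
uris = [
    "http://purl.org/obo/owl/GO#GO_0006915",
    "http://bio2rdf.org/go:0006915",
    "http://example.org/terms/GO_0008150",
]

groups = group_by_go_term(uris)
links = same_as_links(groups)
```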

<Susie> Try to generate mappings between Entrez Gene, etc. to enable queries

<Susie> Show query relating to asthma

<Susie> And example relating to genes with known molecular interactions which are analysed with transfection

<Susie> And select all participating human genes which are drug targets and analysed with transfection

<Susie> Shows results in table and with faceted navigation

<Susie> Free and publicly available

<Susie> Worked on UI

<Susie> Believe it can scale to 20 billion statements

<Susie> Limitation is the availability of data

<Susie> OWLIM is proprietary to Ontotext

<Susie> Data model extended with named graphs, provenance, etc.

<Susie> But also need mechanisms to enable data updates

<Susie> Have transformed PubMed, UMLS

<Susie> Happy to provide tools

<Susie> Not certain about licensing implications, so don't focus on converting data

<Susie> Where we have converted data we gained permission first

<Susie> Happy to distribute the links that we create

<Susie> Lots of possibilities as to where to go next, but are still exploring options

<Susie> Especially interested in integration of textual data

<Susie> Want to improve the user interface

<ericP> Susie, MCF just set up a page for the LOD track at WWW2010

<ericP> http://esw.w3.org/topic/LODCampW3CTrack would like from LODD, 1/ discussion topics, 2/ entertaining LTs, 3/ demos, 4/ who would attend the LOD track

<ericP> do you want to add to the agenda? (no urgency, just mentioning it)

<Susie> Clear synergy between LD and LODD

<Susie> LD can focus on provision of standard tools

<Susie> Need solid interface for the warehouse if it is to become a service

<Susie> Ontotext provides tools and warehouse as a service

<Susie> Bosse: lots of overlaps between groups

<Susie> Bosse: Licensing is a challenge for both

<Susie> Bosse: UI is a challenge for both

<Susie> Bosse: Data quality is also an issue in both