HCLSIG/LODD/Meetings/2010-02-17 Conference Call
Conference Details
- Date of Call: Wednesday February 17, 2010
- Time of Call: 11:00am Eastern Standard Time (EST), 16:00 Greenwich Mean Time (GMT), 17:00 Central European Time (CET)
- Dial-In #: +1.617.761.6200 (Cambridge, MA)
- Dial-In #: +33.4.89.06.34.99 (Nice, France)
- Dial-In #: +44.117.370.6152 (Bristol, UK)
- Participant Access Code: 4257 ("HCLS").
- IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for details, or see Web IRC)
- Duration: ~1h
- Convener: Susie
Agenda
- LinkedLifeData - Vassil
- Data updates - All
- FDA terms - Oktie & colleague
- Mapping experimental data - Susie
- Outreach (TCM, Bio-Ontologies SIG, ACS) - All
- AOB
Minutes
Attendees: Anja, Susie, Kei, Jun, Vassil, Bosse
<Susie> Vassil present on LinkedLifeData
<Susie> http://esw.w3.org/topic/HCLSIG/LODD/Meetings/2010-02-17_Conference_Call
<Susie> LLD is a semantic data integration platform
<Susie> Aim is to aggregate data across data sources
<Susie> It's an EU project, with participation from AZ, and Ontotext
<Susie> LLD is a warehouse
<Susie> Need to clean up linked data from public files
<Susie> Uses NLP to bridge gap between text and linked data
<Susie> Follow LD principles
<Susie> Currently includes 4 billion statements
<Susie> Was 5 million statements, but the data wasn't all useful
<Susie> Includes representation of Entrez Gene
<Susie> Also includes proteins, and interactions
<Susie> Use the LODD data sources
<Anja> Hi everyone, I have to head off to an appointment. There are no data updates on my side. But I will have to look into STITCH since Matthias mentioned some problems with the protein URIs.
<Susie> Very happy that LODD is publishing information
<Susie> OK. Thanks for the heads up
<Anja> If there were any other problems which occurred during the Tokyo Hackathon let me know
<Susie> Also includes documents, including PubMed and clinical trials data
<Susie> Can perform fast queries
<Susie> Data integration is a complex process with lots of ETL scripts
<Susie> Can be licensing restrictions
<Susie> Also worked to transform OBO ontologies to SKOS
<Susie> Preserve original RDF structure where possible
<Susie> Want URIs to be resolvable
<Susie> Interested in tracing back the provenance
<jun> what kind of provenance do you want to track?
<Susie> Provenance focuses on identifying the release from which each statement comes
<Susie> Capture graph name, number of statements
<Susie> Want to use a standard vocab for the provenance such as voiD
<Susie> Jun: You may also want to look at the provenance ontology
<Susie> Most interested in the software engineering process
<Susie> Would be happy to collaborate on using standard vocabularies
<Susie> Use an extended RDF model
<Susie> Possible to associate additional URIs with statements
<Susie> Allows addition of information relating to release numbers
<Susie> This isn't part of the RDF standard
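Editor's sketch of the extended statement model described above, in which each statement carries an extra URI (here a named graph standing for a source release) and a per-graph statement count is reported in voiD terms. This is only an illustration under stated assumptions, not LinkedLifeData's actual implementation: the `Quad` type, the `void_descriptions` helper, and all `ex:` and `example.org` identifiers are hypothetical. Only `void:Dataset` and `void:triples` are taken from the voiD vocabulary.

```python
from collections import Counter
from typing import NamedTuple


class Quad(NamedTuple):
    """A statement extended with a fourth element naming its graph."""
    subject: str
    predicate: str
    obj: str
    graph: str  # hypothetical URI identifying the source release


def void_descriptions(quads):
    """Return Turtle-style lines giving the statement count per named graph."""
    counts = Counter(q.graph for q in quads)
    return [
        f"<{graph}> a void:Dataset ; void:triples {n} ."
        for graph, n in sorted(counts.items())
    ]


# Hypothetical data: two releases from two made-up sources.
quads = [
    Quad("ex:gene1", "ex:encodes", "ex:protein1",
         "http://example.org/graph/sourceA-release1"),
    Quad("ex:protein1", "ex:interactsWith", "ex:protein2",
         "http://example.org/graph/sourceA-release1"),
    Quad("ex:drug1", "ex:targets", "ex:protein1",
         "http://example.org/graph/sourceB-release7"),
]

for line in void_descriptions(quads):
    print(line)
```

Keeping the release as the fourth element of every statement means "which release did this come from?" can be answered without any extra bookkeeping on the triples themselves.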
<Susie> Expected that putting all the information together would make it linked and easy to query
<Susie> Found this isn't the case with 20 data sources
<jun> http://purl.org/net/provenance/ to find out more about the Provenance Vocabulary
<Susie> Many data sources aren't properly linked
<Susie> Shows example where BioPAX and UniProt aren't linked
<Susie> Either need to use complex substring functions to query the information
<Susie> Or need a smarter way to connect data and create predicates
<Susie> One part of work was to work on generating the links
<Susie> Identified 6 linked data integration patterns
<Susie> define meta-rules to connect resources with various predicates
<Susie> This is research work
<Susie> The work is manually controlled
<Susie> GO is a common data resource
<Susie> Everyone using their own GO URIs
<Susie> Need to say when links refer to the same thing
<Susie> Try to generate mappings between Entrez Gene, etc. to enable querying
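Editor's sketch of the GO URI problem noted above: when each source mints its own URI for the same GO term, one option is to pull out the shared accession and emit explicit "same thing" links, so queries need not rely on substring functions. This is a minimal illustration, not one of the six integration patterns presented in the call. All URIs are hypothetical, the helper names are invented, and the choice of `skos:exactMatch` as the linking predicate is an assumption.

```python
import re

# GO accessions are "GO" followed by seven digits; sources vary in the separator.
GO_ID = re.compile(r"GO[:_](\d{7})")


def go_accession(uri):
    """Return the GO accession embedded in a URI, or None if there is none."""
    match = GO_ID.search(uri)
    return f"GO:{match.group(1)}" if match else None


def same_term_links(uris, canonical_base="http://example.org/go/"):
    """Link each source-specific GO URI to one (hypothetical) canonical URI."""
    links = []
    for uri in uris:
        accession = go_accession(uri)
        if accession is None:
            continue  # not a GO URI, so nothing to link
        canonical = canonical_base + accession.replace(":", "_")
        if uri != canonical:
            links.append((uri, "skos:exactMatch", canonical))
    return links


# Hypothetical URIs from three made-up sources.
uris = [
    "http://source-a.example.org/go/GO:0006954",
    "http://source-b.example.org/term?id=GO_0006954",
    "http://source-c.example.org/gene/1234",
]

for subject, predicate, obj in same_term_links(uris):
    print(f"<{subject}> {predicate} <{obj}> .")
```

Once such links are materialised, a query can join on the canonical URI instead of parsing identifiers at query time.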
<Susie> Show query relating to asthma
<Susie> And example relating to genes with known molecular interactions which are analysed with transfection
<Susie> And select all participating human genes which are drug targets and analysed with transfection
<Susie> Shows results in table and with faceted navigation
<Susie> Free and publicly available
<Susie> Worked on UI
<Susie> Believe it can scale to 20 billion statements
<Susie> Limitation is the availability of data
<Susie> OWLIM is proprietary to Ontotext
<Susie> Data model extended with named graphs, provenance, etc.
<Susie> But also need mechanisms to enable data updates
<Susie> Have transformed PubMed, UMLS
<Susie> Happy to provide tools
<Susie> Not certain about licensing implications, so don't focus on converting data
<Susie> Where we have converted data we gained permission first
<Susie> Happy to distribute the links that we create
<Susie> Lots of possibilities as to where to go next, but are still exploring options
<Susie> Especially interested in integration of textual data
<Susie> Want to improve the user interface
<ericP> Susie, MCF just set up a page for the LOD track at WWW2010
<ericP> http://esw.w3.org/topic/LODCampW3CTrack would like from LODD, 1/ discussion topics, 2/ entertaining LTs, 3/ demos, 4/ who would attend the LOD track
<ericP> do you want to add to the agenda? (no urgency, just mentioning it)
<Susie> Clear synergy between LD and LODD
<Susie> LD can focus on provision of standard tools
<Susie> Need solid interface for the warehouse if it is to become a service
<Susie> Ontotext provides tools and warehouse as a service
<Susie> Bosse: lots of overlaps between groups
<Susie> Bosse: Licensing is a challenge for both
<Susie> Bosse: UI is a challenge for both
<Susie> Bosse: Data quality is also an issue in both