HCLSIG BioRDF Subgroup/Meetings/2009-06-08 Conference Call

Conference Details

  • Date of Call: Monday June 8, 2009
  • Time of Call: 11:00 am Eastern Time
  • Dial-In #: +1.617.761.6200 (Cambridge, MA)
  • Dial-In #: +33.4.89.06.34.99 (Nice, France)
  • Dial-In #: +44.117.370.6152 (Bristol, UK)
  • Participant Access Code: 4257 ("HCLS")
  • IRC Channel: irc.w3.org port 6665 channel #hcls (see W3C IRC page for details, or see Web IRC)
  • Duration: ~1 hour
  • Frequency: bi-weekly
  • Convener: Kei Cheung
  • Scribe: Lena Deus and Eric Prud'hommeaux

Attendees

Satya Sahoo, Olivier Bodenreider, Scott Marshall, Lena Deus, Jun Zhao, Kei Cheung, Eric Prud'hommeaux, Rob Frost

Regrets

Matthias Samwald

Agenda

  • Introduction and Roll Call (Kei)
  • Provenance/workflow presentation (Satya); slides: MS PowerPoint slideshow, PDF version
  • Image data (Rob)
  • SPARQL control access (Eric, Lena)
  • Shared name -- pathway use case (Eric, Scott, Lena)
  • AIDA (Scott)
  • TCM data (Jun)
  • Atags (Matthias)

Minutes

<kei> introduction and going through the agenda

<kei> start with Satya's presentation on provenance

<kei> update on HCLS KB's (Matthias emailed an update to the group)

<kei> Rob will report on looking for image data in KB's

<kei> incorporating control access in sparql

<LenaDeus> Kei: security is a concern on the semantic web

<kei> Jun will give a brief update on tcm data

<kei> Scott will give an update on AIDA

<kei> Matthias emailed the group an update on aTags

<LenaDeus> Satya presents his slides: http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-06-08_Conference_Call

<LenaDeus> Topic: Provenance research

<LenaDeus> How can provenance be queried efficiently for different applications?

<LenaDeus> Example use case: Try to understand the tasks that mediate a gene and a cloned sample

<LenaDeus> provenance can be used to assess whether data is trustworthy

<LenaDeus> a provenance ontology is used to represent provenance

<LenaDeus> Provenir ontology - the goal was to establish the minimum set of classes to describe provenance but also to enable its extension

<LenaDeus> agent and data are not connected to data directly - they are instead connected through a "process" (see slides)

<LenaDeus> The provenir ontology has 2 main classes: data and parameter

<LenaDeus> There is no differentiation between dataset and provenance information in the data store

<LenaDeus> As such, both provenance and data can be queried using the same mechanism
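
A minimal sketch of the point above, assuming a toy in-memory triple store rather than the Oracle-based engine discussed on the call. It shows an agent and data linked only through a process, and the same lookup function answering both a data query and a provenance query. All identifiers (the ex: names, hadInput, hadOutput, performedBy) are hypothetical illustrations and are not taken from the Provenir ontology.

```python
# Toy triple store: data and provenance live in the same set of triples.
# Every identifier here is a hypothetical illustration, not a Provenir term.
TRIPLES = {
    # ordinary data
    ("ex:sample42", "ex:hasSequence", "ACGT"),
    # provenance: the agent and the data are linked only through a process
    ("ex:cloningRun7", "rdf:type", "ex:Process"),
    ("ex:cloningRun7", "ex:hadInput", "ex:geneX"),
    ("ex:cloningRun7", "ex:hadOutput", "ex:sample42"),
    ("ex:cloningRun7", "ex:performedBy", "ex:labTechAlice"),
}


def match(pattern, triples=TRIPLES):
    """Return sorted triples matching an (s, p, o) pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


def processes_between(source, product, triples=TRIPLES):
    """Find processes that took `source` as input and produced `product`."""
    took = {s for s, _, _ in match((None, "ex:hadInput", source), triples)}
    made = {s for s, _, _ in match((None, "ex:hadOutput", product), triples)}
    return sorted(took & made)


# The same match() call serves a data query and a provenance query.
data_rows = match(("ex:sample42", "ex:hasSequence", None))
mediating = processes_between("ex:geneX", "ex:sample42")
agents = [o for p in mediating for _, _, o in match((p, "ex:performedBy", None))]
```

Here processes_between mirrors the example use case of finding the tasks that mediate a gene and a cloned sample; a real deployment would presumably express the same lookup as a query against the RDF store.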

<LenaDeus> Provenance is classified into 3 categories

<LenaDeus> (slide 9)

<LenaDeus> Provenance Metadata; Specific Dataset; Operations in the Provenance Metadata

<LenaDeus> If the data has some set of characteristic attributes, the queries can be oriented by those attributes

<LenaDeus> 4 query operators were defined (see slide 10)

<LenaDeus> A query engine has been implemented on Oracle 10g

<LenaDeus> the query engine was developed as a plugin

<LenaDeus> Query optimization was found necessary - the query was taking 5-6 days to be completed

<LenaDeus> Provenance information is, by definition, historic information - it can therefore be used for optimization of queries

<LenaDeus> Using this model, the query time was reduced to 5-6 seconds

<mscottm> I think that the answer to Kei's question is 5 or 6 days...?

<LenaDeus> Conclusion (slide 15): 1) A common model of provenance that can be re-used within collaborations;

<LenaDeus> 2) Decision making support by use of standard reasoning rules

<LenaDeus> 3) A provenance query engine

<LenaDeus> 4) Verification and validation of data via provenance

<kei> Lena: is provenance info represented in RDF?

<kei> Satya: yes in RDF.

<kei> Kei: Is named graph used?

<Satya> no.

<LenaDeus> Kei: can provenance information be integrated into the query federation scenario?

<@ericP> q+ to talk about proof languages

Zakim sees ericP on the speaker queue

<mscottm> http://twiki.ipaw.info/bin/view/Challenge/ThirdProvenanceChallenge

<LenaDeus> (I have to leave in 5 min: can anyone take over scribing, please :) )

<@ericP> satya: we published a workflow system in IEEE

<@ericP> ... many workflow-based systems miss provenance info

<@ericP> ack me

<Zakim> ericP, you wanted to talk about proof languages

Zakim sees no one on the speaker queue

<ssahoo2> Semantic Provenance workshop at ISWC 2009: http://wiki.knoesis.org/index.php/SWPM-2009

<kei> ericP: a proof chain needs to be shown when federating provenance data

<@ericP> mscottm: for federation, you can use provenance to inform the query choreography

<@ericP> satya: named graph can help me direct my queries

<ssahoo2> I agree, there was a paper in WWW2005 by Jeremy Carroll: http://www4.wiwiss.fu-berlin.de/bizer/pub/Carroll_etall-WWW2005.pdf

<ssahoo2> discussing provenance, named graph and trust

<@ericP> satya: your provenance may be data for me

<@ericP> ... provenance info varies by query and domain requirements

<@ericP> mscottm: example to help evaluate

<@ericP> ... we have a workflow which produces textmined protein interactions

<@ericP> ... at the VoID level, you could say "this has a list of protein pairs"

<@ericP> ... then the provenance info would tell you where that data came from

<@ericP> kei: need to exchange now and the next call

<@ericP> topic: image datasets

<@ericP> rob: was looking at Allen Brain image data

<rfrost> http://www.w3.org/TR/hcls-kb/#aba

<rfrost> http://neurocommons.org/page/Bundles/aba

<@ericP> ... there is a bundle from 2007 incorporated in the neurocommons db

<rfrost> graph <http://sw.neurocommons.org/2007/aba-20070226> { ?aba_gene_record aba:refersToSameGeneAs ?mouse_gene. ?aba_mouse_expression_record aba:measuresGeneIdentifiedWith ?aba_gene_record . ?aba_mouse_expression_record aba:hasSectionSeries ?section_series . ?section_series aba:hasSection ?section. ?section aba:hasImagePyramids

<@ericP> ABA properties:

<@ericP> aba:refersToSameGeneAs

<@ericP> aba:measuresGeneIdentifiedWith

<@ericP> aba:hasSectionSeries

<@ericP> aba:hasSection

<@ericP> aba:hasImagePyramids
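
The property chain listed above can be sketched as a walk over a small in-memory graph. This is only an illustration under stated assumptions: the five property names come from the minutes, but every record identifier below (geneRec1, exprRec1, pyramidA, and so on) is invented toy data, and the direction of each hop follows the graph pattern Rob pasted.

```python
# Hypothetical toy records; only the five property names come from the minutes.
EDGES = [
    ("geneRec1", "refersToSameGeneAs", "mouseGeneA"),
    ("exprRec1", "measuresGeneIdentifiedWith", "geneRec1"),
    ("exprRec1", "hasSectionSeries", "series1"),
    ("series1", "hasSection", "section1"),
    ("series1", "hasSection", "section2"),
    ("section1", "hasImagePyramids", "pyramidA"),
    ("section2", "hasImagePyramids", "pyramidB"),
]

forward = {}   # (subject, property) -> set of objects
backward = {}  # (object, property) -> set of subjects
for s, p, o in EDGES:
    forward.setdefault((s, p), set()).add(o)
    backward.setdefault((o, p), set()).add(s)


def step(nodes, prop, index):
    """Follow one property from every node in `nodes` using the given index."""
    return {n2 for n in nodes for n2 in index.get((n, prop), ())}


def image_pyramids_for(mouse_gene):
    """Walk gene -> gene record -> expression record -> series -> section -> pyramids."""
    gene_records = step({mouse_gene}, "refersToSameGeneAs", backward)
    expr_records = step(gene_records, "measuresGeneIdentifiedWith", backward)
    series = step(expr_records, "hasSectionSeries", forward)
    sections = step(series, "hasSection", forward)
    return sorted(step(sections, "hasImagePyramids", forward))


result = image_pyramids_for("mouseGeneA")
```

The first two hops run backward because, in the pasted pattern, the ABA gene record and the expression record are the subjects of those triples.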

<rfrost> http://neurocommons.org/page/RDF_library/All_relations

<@ericP> this corpus is now old

<rfrost> http://developingmouse.brain-map.org/docs/ReferenceAtlas.pdf

<@ericP> rfrost: with this, we can model development

<@ericP> kei: would image data serve as a good use case for query federation scenarios?

<@ericP> ... do these contain provenance info?

<rfrost> not certain

<@ericP> ... for instance, go from region to sequences and vice versa

<@ericP> rfrost: ABA offers web apis accessing image data by region or by gene

<@ericP> ericP: i think the neurocommons data added some value (image processing) beyond what's offered in the ABA web api

<@ericP> topic: DILS 09 abstract

<@ericP> junzhao: i've been logging @@1 into our SPARQL endpoint

<@ericP> ... grabbing herbs, clinical trials,

<@ericP> s/paper/poster/

<@ericP> junzhao: need input from matthias -- some "related" data is positive, other negative

<@ericP> ... would like 1 week before 22 july

<@ericP> ... so finish implementation work by end of june and start poster production beginning of july