W3C

HCLS IG

26 Jun 2008

See also: IRC log

Attendees

Present
+0186528aaaa, +1.703.740.aabb, Scott_Marshall, +1.781.273.aacc, Kerstin_Forsberg, +0186528aadd, Kingsley_Idehen, +0186528aagg, marco, mscottm, +1.212.937.aaii, Chimezie_Ogbuji, +7.233.aakk, cthompso, Kei_Cheung, +0493516aall, mt, andreasplendiani, Don_Doherty, AdrianP, Huajun, Vipul_Kashyap, ericP, mbevil, zenhack, kidehen
Regrets
Chair
Scott Marshall
Scribe
chimezie

Contents


bootup

<andreasplendiani> +29928aaff is andreasplendiani

<tslater> ted slater is at the 314-274-xxxx number

<andreasplendiani> Hi, sorry I've skipped the "voice presentation", I'm both on call and irc

<matthiassamwald> Hi everyone, I'm sitting in a meeting, so I CAN ONLY LISTEN, not talk.

<tslater> correct spelling of david's name is Nwokeabia

<kidehen> kidehen is "Kingsley_Idehen"

<jzhao> jun zhao

<mscottm> like this: Zakim, number is ircname

<mbevil> ZAKIM 215.652.2134 is mbevil

<kidehen> zakim: +1 617 273 0900 is kidehen

<jzhao> Zakim +0186528aagg is jzhao

<kidehen> zakim: + 781 273 0900 is kidehen

<kidehen> zakim +1 781 273 0900 is kidehen

<AdrianP> Can you please paste the URL to the slides into the IRC

HCLS presentation

ericP: On slide 1: Drug discovery

ericP: The first question is "Find me genes involved in signal transduction that are related to pyramidal neurons."

scribe: this question demonstrates the value of using semantic web representation

<tslater> ok, Ted Slater is on the phone at the +1.314.274.aabb number, on irc as tslater, but i don't see me showing up in the phone list

scribe: if you attempt to ask the question via google, you get several irrelevant results
... on slide 4 (we ask this question of pubmed)

ericP: the problem is that the question is still not structured. Now what if we want to ask a structured query against a database with the relevant data?

scribe: the goal is to integrate the databases (go to slide 6)
... on slide 7: the goal is to integrate the data sources such that the query consults the 4 relevant databases (journal articles, p. neurons, signal transduction and their components, ...)

ericP: on slide 8: we reduce the information to triples and store them in a DB as triples
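The slide 8 idea of reducing records to triples and querying them together can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the rows, the identifiers (gene:G1, proc:signal_transduction, cell:pyramidal_neuron) and the ex: predicates are hypothetical and are not taken from the HCLS knowledge base.

```python
# Hypothetical rows standing in for two separate source databases.
gene_rows = [
    {"id": "gene:G1", "label": "example gene 1", "process": "proc:signal_transduction"},
    {"id": "gene:G2", "label": "example gene 2", "process": "proc:metabolism"},
]
neuron_rows = [
    {"gene": "gene:G1", "cell_type": "cell:pyramidal_neuron"},
    {"gene": "gene:G2", "cell_type": "cell:pyramidal_neuron"},
]


def rows_to_triples(rows, subject_key, predicate_map):
    """Flatten each row into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        subject = row[subject_key]
        for column, predicate in predicate_map.items():
            triples.add((subject, predicate, row[column]))
    return triples


# One store holds the triples from both sources.
store = set()
store |= rows_to_triples(gene_rows, "id", {"label": "rdfs:label", "process": "ex:involvedIn"})
store |= rows_to_triples(neuron_rows, "gene", {"cell_type": "ex:expressedIn"})


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Genes involved in signal transduction AND related to pyramidal neurons.
in_signalling = {s for s, _, _ in match(store, p="ex:involvedIn", o="proc:signal_transduction")}
in_pyramidal = {s for s, _, _ in match(store, p="ex:expressedIn", o="cell:pyramidal_neuron")}
answer = sorted(in_signalling & in_pyramidal)
print(answer)
```

Once both sources share one triple representation, the cross-database question becomes an intersection of two pattern matches. A real deployment would use an RDF store and SPARQL rather than Python sets.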

on slide 9: we see examples of results (that were vetted)

ericP: see slide 10: We began with a 'good-enough model': simple, relationships and classifications

ericP: unification of key terms was crucial for answering the questions

scribe: there were 2 W3C notes that were the result of this work

-> http://www.w3.org/2001/sw/hcls/notes/kb/ A Prototype Knowledge Base for the Life Sciences

-> http://www.w3.org/TR/hcls-senselab/ Experiences with the conversion of SenseLab databases to RDF/OWL

kei: quick modeling was done using RDF, how does this differ from using OWL?

ericP: The data is written in RDF and some of it is described in OWL; however, the OWL modeling was not done upfront

ericP: the Gene Ontology is an example of where OWL was used to describe classes of protein

COI presentation

<ericP> http://tinyurl.com/55ozfp

vipul: the methodology has been to function in a consensus manner

scribe: there was a significant social engineering component
... the driving force is consensus not technology

vipul: the first decision was which use case to develop for the prototype

scribe: various use cases were discussed and the decision was made to focus on patient recruitment

vipul: a main assumption was that the data was in an EMR in order to demonstrate interoperability

vipul: how do we reuse clinical data for research?

scribe: HL7 RIM can be viewed as a format for clinical practice data
... CDISC can be viewed as a format for clinical research questions

vipul: suppose we had a mapping module, how can it be made flexible enough to incorporate changes in these formats?

vipul: there was a lot of debate with no answers

scribe: we used GALEN, CPR, Helen's eligibility criteria ontology, SDTM, HL7 RIM, etc.
... the conclusion was that no one ontology or information model was likely to fit the bill
... so the goal was to align with existing terminologies but address inadequacies in these source terminologies

vipul: the goal is to provide constructive feedback to the originating standards bodies

vipul: the goal of extensibility is still a target

vipul: we want to demonstrate re-use

scribe: patient recruitment against EMR is a demonstration of re-use of clinical data for research questions
... we want to show how, even though the data is coded via SNOMED-CT, queries expressed in NCI Thesaurus terms can be invoked on the source data to answer questions
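A minimal sketch of the terminology point above: the query is phrased in one vocabulary while the data is coded in another, and a mapping table bridges the two. All codes here are made-up placeholders, not real SNOMED-CT or NCI Thesaurus codes, and the mapping and patient data are hypothetical.

```python
# Placeholder codes only; these are NOT real SNOMED-CT or NCI Thesaurus codes.
NCI_TO_SNOMED = {
    "nci:EX_DIABETES": {"snomed:EX_1001", "snomed:EX_1002"},
    "nci:EX_HYPERTENSION": {"snomed:EX_2001"},
}

# Hypothetical EMR observations, coded in the source terminology.
emr_observations = [
    ("patient:1", "snomed:EX_1001"),
    ("patient:2", "snomed:EX_2001"),
    ("patient:3", "snomed:EX_1002"),
    ("patient:3", "snomed:EX_2001"),
]


def patients_matching(nci_term, observations, mapping):
    """Expand the query term into source codes, then filter the source data."""
    source_codes = mapping.get(nci_term, set())
    return sorted({patient for patient, code in observations if code in source_codes})


diabetes_patients = patients_matching("nci:EX_DIABETES", emr_observations, NCI_TO_SNOMED)
print(diabetes_patients)
```

The source records are never recoded; only the query term is expanded. An unmapped term returns an empty result, which is one way gaps in the mapping would surface.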

vipul: two of the seed models: SDTM & HL7 RIM

vipul: we are mostly interested in re-using the mapping for other scenarios.

scribe: for example: consider hl7:substanceAdminstration and sdtm:concommitantMedication
... we would like to reuse these terms across various scenarios

vipul: decision 3: this was a key decision - We implemented the POC using a real world dataset instead of a synthesized dataset

scribe: what would be an appropriate seed information model for the data we use?
... We are very thankful to Parsai for providing the source data

vipul: the information model mapping is the core of the effort

kei: is the seed data publicly available?

vipul: in the POC, the data is anonymized

scribe: there is some demographic data, however
... in the real world, we will need to address privacy and encryption issues
... we are transforming the queries and not the source data formats

<ericP> we are pushing the query all the way into the database
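A minimal sketch of "transforming the queries and not the source data": a model-level query is rewritten into SQL and executed inside the database, so the relational data stays where it is. The predicate names, the table layout and the mapping are hypothetical, and this is not the POC's actual translation component.

```python
import sqlite3

# Hypothetical mapping: model-level predicate -> (table, subject column, value column).
PREDICATE_TO_COLUMN = {
    "ex:takesMedication": ("medication", "patient_id", "drug_name"),
    "ex:hasDiagnosis": ("diagnosis", "patient_id", "dx_code"),
}


def translate(patterns):
    """Translate (predicate, value) patterns about one patient into one SQL query."""
    froms, wheres, params = [], [], []
    for i, (predicate, value) in enumerate(patterns):
        table, subject_col, value_col = PREDICATE_TO_COLUMN[predicate]
        alias = f"t{i}"
        froms.append(f"{table} AS {alias}")
        wheres.append(f"{alias}.{value_col} = ?")
        params.append(value)
        if i > 0:
            # Every pattern talks about the same patient, so join on the subject column.
            wheres.append(f"{alias}.{subject_col} = t0.{subject_col}")
    first_subject = PREDICATE_TO_COLUMN[patterns[0][0]][1]
    sql = (
        f"SELECT DISTINCT t0.{first_subject} FROM " + ", ".join(froms)
        + " WHERE " + " AND ".join(wheres)
        + f" ORDER BY t0.{first_subject}"
    )
    return sql, params


# Hypothetical source database; its schema and rows are left untouched by the translation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medication (patient_id INTEGER, drug_name TEXT);
    CREATE TABLE diagnosis (patient_id INTEGER, dx_code TEXT);
    INSERT INTO medication VALUES (1, 'drug_a'), (2, 'drug_a'), (3, 'drug_b');
    INSERT INTO diagnosis VALUES (1, 'dx_x'), (2, 'dx_y'), (3, 'dx_x');
""")

sql, params = translate([("ex:takesMedication", "drug_a"), ("ex:hasDiagnosis", "dx_x")])
eligible = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(eligible)
```

Because the join and the filtering run inside the database engine, no patient rows have to be exported or converted before the eligibility question is answered.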

vipul: the W3C is clearly interested in working with external standards organizations

scribe: there is an implication about W3C being content neutral
... this doesn't mean that gaps/inadequacies will be represented as is
... the goal is to address these inadequacies and give feedback to the standards bodies

vipul: Bron Kisler (from CDISC) is interested in such a collaboration given our content neutrality

vipul: (the architecture) the mapping module transforms queries (in two places) in order to align the queries with the source data

vipul: ericP is experimenting with the SPASQL query engine for purposes of the POC implementation

vipul: we made an assumption that the terminologies used in clinical trials and in clinical practice are different

scribe: the technology choice is: SPARQL and N3 rules (we prefer N3 rules)

vipul: there are issues with the mapping between concepts related to medication and medication classifications

scribe: in the use case, the criteria are specified, translated to HL7 RIM, and then translated to the underlying SQL-like language; component 6 then identifies patients relevant to the query
... there is an open question about the expressivity of SPARQL vs. N3 rules

<AdrianP> declarative rules vs. SPARQL, ... e.g. no real rule chaining in SPARQL
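One way to read the rule chaining remark is that a rule's conclusions can feed other rules until nothing new is derived, whereas a single query is evaluated once against the data as it stands. The sketch below is a naive forward-chaining loop over hypothetical triples and two hypothetical rules; it is not N3 and not how any particular rule engine is implemented.

```python
# Hypothetical facts: a drug belongs to a class, and classes form a hierarchy.
facts = {
    ("drug:a", "ex:memberOf", "class:x"),
    ("class:x", "ex:subClassOf", "class:y"),
    ("class:y", "ex:subClassOf", "class:z"),
}


def apply_rules(facts):
    """Apply both rules once and return only the newly derived triples."""
    derived = set()
    for (a, p1, b) in facts:
        for (c, p2, d) in facts:
            if b != c:
                continue
            # Rule 1: subClassOf is transitive.
            if p1 == "ex:subClassOf" and p2 == "ex:subClassOf":
                derived.add((a, "ex:subClassOf", d))
            # Rule 2: membership propagates up the class hierarchy.
            if p1 == "ex:memberOf" and p2 == "ex:subClassOf":
                derived.add((a, "ex:memberOf", d))
    return derived - facts


def forward_chain(facts):
    """Repeat rule application until a fixpoint; count the productive rounds."""
    closure = set(facts)
    rounds = 0
    while True:
        new = apply_rules(closure)
        if not new:
            return closure, rounds
        closure |= new
        rounds += 1


closure, rounds = forward_chain(facts)
print(rounds)
print(("drug:a", "ex:memberOf", "class:z") in closure)
```

The membership of drug:a in class:z only appears in the second round, because it depends on triples derived in the first round. That dependency of one derivation on another is the chaining behaviour under discussion.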

<vipul> http://esw.w3.org/topic/HCLS/ClinicalObservationsInteroperability/SPARQL8.html

ericP: we have some work to do yet on the query translation component

vipul: a couple of months remain on this task

<vipul> http://esw.w3.org/topic/HCLS/ClinicalObservationsInteroperability/InterOntologyMapping.html

Translational Medicine

ericP: Translational medicine is important for determining which medications are relevant at the point of care based on the evidence of controlled research

scribe: the task is rather similar to computer-assisted diagnoses

<matthiassamwald> what you describe sounds more like "personalized medicine".

ericP: the interest of pharmaceutical companies is in how to improve the effectiveness of their medications based on the research done

mscottm: okay

mscottm: translational medicine is more 'bench-to-bedside' where we are attempting to connect two disparate areas: clinical practice / data and biological research

scribe: personalized medicine is one application within this larger realm

HCLS Projects

mscottm: ericP was interested in using the wiki to solicit feedback on other projects to consider

scribe: esp. now we have new members with new datasets and new problems relevant to this interest group
... is there anyone out there working on a particular project relevant to translation to RDF/OWL?

<AdrianP> http://esw.w3.org/topic/HCLSIG/Project_Ideas

mscottm: there was a recent project suggestion we can use as a strawman

<ericP> http://www.w3.org/mid/486217F4.6020205@inf.unibz.it

<mscottm> http://www.w3.org/mid/486217F4.6020205@inf.unibz.it

<andreasplendiani> nope

<andreasplendiani> access restricted

mscottm: (reads out the idea sent to member-restricted archive)

scribe: Reading through the meeting minutes, KB description, and proposals for the KB enhancements, I was wondering if you think there may be interest in extending the technology-part with "ontology-based data access"

ericP: adding the project to the Project_Ideas page

mscottm: This would be a project under the KB enhancement section

ack, vipul

vipul: if we suggest technology extensions we should include a description of the ROI

<mscottm> vipul: when suggesting a particular technology for a project, one should explain the value of that particular tech to the user

<ericP> http://esw.w3.org/topic/HCLSIG/Project_Ideas

<ericP> KB++ OWL Ontology queries over RDBs

-> http://esw.w3.org/topic/HCLSIG/Project_Ideas/OWL-RDB "OWL-RDB"

This latter link is where we are filling out the project idea

kei: perhaps we need categories of projects?

scribe: other categories: terminological frameworks, use-case driven (specific to domains), drug discovery, diseases, traditional chinese medicine

<marco> I am afraid I have to leave you. Thank you for the interesting meeting. I will disconnect silently to not disturb too much. Goodbye!

<mscottm> bye marco

marco: take care

<ericP> http://esw.w3.org/topic/HCLSIG/Project_Ideas/OWL-RDB

One value proposition / ROI of this project idea could be the ability to reuse organized, existing ontologies as a way to query more idiosyncratic data sources

scribe: this would allow researchers to query the KB using consensus terminology

kei: perhaps we need footnotes, etc. to better explain the projects

AdrianP: are we moving to Semantic Media Wiki?

ericP: yes we are

<AdrianP> great

<kidehen> ericP: do ping me re. Semantic MediaWiki etc. in relation to Virtuoso hosting etc.. when you have a moment

kei: one proposal: Extending the senselab project to add more content to existing data

scribe: considering a translational medicine component
... linking this to the clinical side

<zenhack> Marc Hadfield / nic zenhack / 212-217-9956

<mbevil> Zackim, +1.215.652.aaee is me

<zenhack> zakim +1.212.217.aamm isme

<kidehen> zakim +1 781 273 0900 is me

<kidehen> zakim +1781273.aahh is me

<mscottm> http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.countrycallingcodes.com%2FReverse-Lookup.php%3Fcalling-code%3D86&ei=nMNjSKOZFIqewgH1rNHIDw&usg=AFQjCNH7WjQca7vwPs0Djt_KjWUBg76lrA&sig2=zI4sJHmlmt8677wzHBP2aA

<AdrianP> bye

<kidehen> Zakim who is here

<andreasplendiani> bye

<chimezie> I unfortunately have a hard cut-off at 12:30 PM EST and thus will have to sign off

<mscottm> bye and thanks chimezie

np, take care :)

<mscottm> ericP: it is useful to use the membership list to manage the mailing list BUT it remains open for public comment. So, non-members can still request subscription by sending mail to Eric Prud'hommeaux <eric@w3.org>

<mscottm> Also, we are open to Invited Expert applications

<mscottm> you can apply for Invited Expert status here: http://www.w3.org/2002/09/wbs/1/ieapp/

<mscottm> bye all

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.133 (CVS log)
$Date: 2008/06/27 16:01:49 $