W3C

- DRAFT -

HCLSIG BioRDF Subgroup Conference Call

29 Nov 2010

See also: IRC log

Attendees

Present
adriencoulet, dietrich, ericP, matthias_samwald, michael, mscottm
Regrets
Chair
SV_MEETING_CHAIR
Scribe
matthias_samwald

Contents


indeed, eric.

(Hi Scott, I will only be listening today)

yes

<adriencoulet> I am +33.3.54.95

P29 is someone else

<mscottm> http://esw.w3.org/File:101129AdrienCouletBioRDF.PDF

<mscottm> http://esw.w3.org/HCLSIG_BioRDF_Subgroup/Meetings/2010/11-29_Conference_Call#Agenda

<ericP> http://esw.w3.org/images/d/dc/101129AdrienCouletBioRDF.PDF

<scribe> scribenick: matthias_samwald

adrien: slide 2
... this is joint work between NCBO and PharmGKB
... the goal was to improve the content of PharmGKB by improving its relationships, etc.
... three main relationships in PharmGKB: Gene – Drug, Gene – Disease, Drug – Disease
... but people at PharmGKB are concerned because relations in reality are not that simple
... slide 3
... six human curators are looking at literature.
... slide 4 -- we propose to have more detailed relationships, e.g. "BAK1 gene polymorphism affects doxorubicin resistance", "Resistance to Doxorubicin is influenced by BAK1 variants", "Doxorubicin induces BAK1 activity".
... we created an ontology, asked curators for help during ontology creation
... slide 7
... co-occurrence detection has some limitations
... e.g., false positives
... we don't want to have only co-occurrence; we want to know exactly which entities were involved and what relationships.
... slide 10 -- example of the parsing we are doing
... slide 11: two superficially very different sentences can contain exactly the same content (because of synonyms)
... however, there was no dedicated ontology for pharmacogenomics, so we decided to create one.
... slide 13: we created the ontology semiautomatically based on the relations we extracted
... the ontology was created bottom-up, based on the most frequent words used in the categories "relationship", "gene", "drug", "phenotype".
... slide 21: we have raw entities that were mapped to normalized entities and relations
... e.g., "influences" becomes "affects"
... normalized relationships were encoded in RDF.
... slide 23: entities related to VKORC1 shown as a graph. (thickness of edges refers to the number of statements)
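[Scribe sketch: a minimal illustration, in Python with rdflib, of how a normalized relation such as "Doxorubicin induces BAK1 activity" (with "induces" mapped to "affects") could be encoded in RDF; the pgx: namespace and the Relationship/hasSubject/hasPredicate/hasObject names are placeholders, not necessarily the vocabulary used by the project.]

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    # Placeholder namespace; the project's actual URIs may differ.
    PGX = Namespace("http://example.org/pgx/")

    g = Graph()
    g.bind("pgx", PGX)

    # One extracted sentence, reified as a relation node so that the
    # normalized predicate ("induces" mapped to "affects") can be attached.
    rel = PGX["relation/1"]
    g.add((rel, RDF.type, PGX.Relationship))
    g.add((rel, PGX.hasSubject, PGX.doxorubicin))
    g.add((rel, PGX.hasPredicate, PGX.affects))
    g.add((rel, PGX.hasObject, PGX.BAK1_activity))
    g.add((PGX.affects, RDFS.label, Literal("affects")))

    print(g.serialize(format="turtle"))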

dietrich: some of the entities and relationships are quite complex
... i.e., do you first define these concepts and then later on try to find evidence for them?

adrien: 75% of the entities are in the initial ontology, 25% of the entities were created directly from raw text
... slide 27: the resulting knowledge base is useful for curation and knowledge summarization at PharmGKB
... Yael Garten (PhD student at Stanford) is also using it for knowledge discovery
... the SPARQL endpoint of the KB is at http://sparql.bioontology.org/webui/
... example queries are found on http://www.loria.fr/~coulet/material/sparql_queries
... (examples about relationships between Parkinson's and the UCH-L1 gene)
... connection to the linked data cloud: IDs from Entrez Gene, DrugBank, MeSH
... not connected to the linked data URIs at the moment, but that would be interesting future work
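[Scribe sketch: one way such a knowledge base could be queried programmatically, using Python and SPARQLWrapper; the endpoint path, prefix and property names below are assumptions, and the project's real example queries are in the file Adrien linked above.]

    from SPARQLWrapper import SPARQLWrapper, JSON

    # The address given above is the web UI; the machine-readable endpoint
    # path used here is an assumption.
    endpoint = SPARQLWrapper("http://sparql.bioontology.org/sparql")

    # Hypothetical query shape: relations linking the UCH-L1 gene to
    # Parkinson's disease (vocabulary names are placeholders).
    endpoint.setQuery("""
        PREFIX pgx: <http://example.org/pgx/>
        SELECT ?relation ?predicate WHERE {
            ?relation pgx:hasSubject pgx:UCHL1 ;
                      pgx:hasPredicate ?predicate ;
                      pgx:hasObject pgx:parkinsons_disease .
        }
    """)
    endpoint.setReturnFormat(JSON)
    results = endpoint.query().convert()
    for row in results["results"]["bindings"]:
        print(row["relation"]["value"], row["predicate"]["value"])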

scott: is the SPARQL endpoint you gave already serving the results for Alzheimer's disease?

adrien: no, at the moment only Parkinson's
... but I can upload the one for Alzheimer's

scott: how will provenance be represented?

(discussion about provenance and named graphs)

scott: the ConceptWiki people have the notion of cardinal assertion, i.e., assertions that are formulated differently but have the same meaning

adrien: do they already have a set of triples with provenance?

scott: they have some mappings to RDF now. most of the assertions were created by people visiting the wiki and adding data about their favourite gene. but they also include text mining results.

dietrich: there are concepts, but there are also things that don't have a name / that are not concepts
... regarding provenance, there are two things: database and authorship provenance
... what we would like to see is provenance info that points to first evidence.
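[Scribe sketch: one possible way to carry the provenance discussed above is to put the triples extracted from each article into a named graph and attach source information to the graph URI, e.g. with rdflib's Dataset; the vocabulary and identifiers below are illustrative only.]

    from rdflib import Dataset, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    PGX = Namespace("http://example.org/pgx/")  # placeholder vocabulary
    ds = Dataset()

    # All triples extracted from one article go into one named graph ...
    src = URIRef("http://example.org/graph/pmid-0000000")  # placeholder article id
    g = ds.graph(src)
    g.add((PGX["relation/1"], RDF.type, PGX.Relationship))

    # ... and provenance (source article, who or what produced the triples)
    # is attached to the graph URI in the default graph.
    ds.add((src, DCTERMS.source, URIRef("http://www.ncbi.nlm.nih.gov/pubmed/0000000")))
    ds.add((src, DCTERMS.creator, Literal("text-mining pipeline")))

    print(ds.serialize(format="trig"))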

michael: adrien, what kind of evaluation have you done?

adrien: we asked what the content of the raw relations is before normalisation, and evaluated the quality of the normalisation.
... using WordNet, I created another ontology
... (that normalisation was not very good, because WordNet was too general)

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2010/11/29 17:05:46 $
