HCLS -- 27 Aug 2007

<scribe> ACTION: Susie to ask Johnathan what he needs for TAG access [recorded in http://www.w3.org/2007/08/27-BioRDF-minutes.html#action01]

ericP: not sure what more direct access there could be

Documentation of SenseLab conversion - Kei

Kei: we have created a wiki page documenting the SenseLab conversion

<matthiassamwald> http://esw.w3.org/topic/HCLS/Senselab_Conversion

Kei: we converted NeuronDB to RDF and later to OWL
... in the process, we learned about ontology design features
... the initial work was focused on the NeuronDB database structure
... matthiassamwald joined the team and helped convert to a more generic OWL structure
... that data was used in the Banff demo
... contents of SenseLab's native DB change from time to time
... working out how to reflect those changes in RDF
... considering two-step approach:
... - syntactic conversion to RDF
... ... automated
... - semantic conversion to OWL
... ... needs human intervention
... .
... had a meeting with some HCLS folks about conversion process
... want to make sure we follow best practices and that we track the demo ontology changes
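
A minimal sketch of the automated, syntactic step of the two-step approach Kei describes, assuming a generic table-to-triples mapping. The base URI, the table and column names, and the sample rows are hypothetical and are not taken from SenseLab or NeuronDB. The second, semantic step (to OWL) is not shown, since Kei notes it needs human intervention.

```python
from urllib.parse import quote

# Hypothetical base URI; not a real SenseLab namespace.
BASE = "http://example.org/neurondb/"


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_ntriples(table, key_column, row):
    """Emit one N-Triples line per non-key, non-null column of a row."""
    subject = f"<{BASE}{quote(table)}/{quote(str(row[key_column]))}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key names the subject; NULLs produce no triple
        predicate = f"<{BASE}{quote(table)}#{quote(column)}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Hypothetical sample rows standing in for a database table.
rows = [
    {"id": 1, "name": "Purkinje cell", "region": "cerebellum"},
    {"id": 2, "name": "Mitral cell", "region": None},
]
ntriples = [t for row in rows for t in row_to_ntriples("neuron", "id", row)]
print("\n".join(ntriples))
```

Because a step like this is mechanical, it could be re-run whenever the native DB changes; the hand-curated OWL layer would then be maintained separately.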

Susie: what's the goal, W3C note?

Kei: not sure. this is just an initial version
... want feedback from more people
... suggestions?

Susie: given that you are grappling with modeling/data/ontology changes, can you call it finished in say, a month?
... and then release future [coherent] versions

Kei: seems reasonable

ericP: let me know if you'd like to publish it as a SPARQL interface to your database

Kei: may be easier with a complete RDF dump
... but some folks may wish to access the DB directly [with queries]

Susie: senselab is in Oracle?
... could publish in MySQL and use mapping stuff we're working on [SPASQL]

Kei: yes, interested in working on mapping

Demo extensions - Alan

Leeet is a Semantic Web application that allows rapid and intuitive creation, editing and querying of Semantic Web content and annotations.
— http://neuroscientific.net/leeet/
http://kaukoluwiki.opendfki.de/wiki/TripleStore

Don: not on extension yet; still working on installation
... ran into glitch on loading DBs
... we have Virtuoso installed
... running into problem with perl script
... AlanR is helping

Susie: any progress on the poster?

Don: will be ready to talk about poster in about two weeks
... want to dive into matthiassamwald's poster and target to a neuroscience audience

matthiassamwald: will upload my demo to a server and send a pointer to public-semweb-lifesci

Susie: AlanR is working with DERI to install DB

alanr: DERI has machine back up
... not sure if they've installed Virtuoso
... considering hiring someone to write install scripts
... my schedule should be more calm now
... expect progress in next couple weeks

Susie: hosted at MIT in the interim?

alanr: yes. dunno if we will always host it

Susie: EricN is working on UI
... we considered working with UI experts, but seems we don't want to do that now

alanr: I have an idea for a UI; am looking for an implementor
... idea: wiki page with queries
... ... fill in a form to tailor the queries on the wiki
... ... and an interface to add specific predicates and structures for say, MESH
... .

ScottM: very interested, but don't want to commit to time

matthiassamwald: I will be starting at DERI in October/November

alanr: would like an auto-completer like the one for Google search in Firefox

[ discussion of related libraries, including leeet and a Sesame-tailored completion engine ]

Susie: noting everyone is on vacation, any progress on data conversion?

alanr: nope

<matthiassamwald> [Leeet] features an autocomplete mechanism based on Sparql queries.
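
A rough sketch of how an autocomplete lookup could be phrased as a SPARQL query. This is not Leeet's implementation: the function name, the use of rdfs:label, and the SPARQL 1.1 STRSTARTS filter (which postdates this call) are all assumptions made for illustration.

```python
def completion_query(prefix, limit=10):
    """Build a SPARQL query for labels that start with the typed prefix."""
    # Lowercase for a case-insensitive match, then escape characters that
    # would otherwise break out of the SPARQL string literal.
    safe = (prefix.lower()
                  .replace("\\", "\\\\")
                  .replace('"', '\\"')
                  .replace("\n", "\\n"))
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?label WHERE {\n"
        "  ?resource rdfs:label ?label .\n"
        f'  FILTER STRSTARTS(LCASE(STR(?label)), "{safe}")\n'
        "}\n"
        "ORDER BY ?label\n"
        f"LIMIT {int(limit)}"
    )


# The query text would be sent to whatever SPARQL endpoint backs the UI.
print(completion_query("Purk", limit=5))
```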

alanr: talked to a fellow from EBI who is interested in expression data
... Marco Brandizi

Susie: would be interesting to work on DrugDB. will prod people who volunteered
... may be some folks at Lilly who will want to contribute

<alanr> 1) Representing the information about the samples, experiment, protocols leading to the hybridization, technical aspects of the hybridization, etc.

<alanr> 2) Representing the computed intensity of the spots on an array, as well as how those were computed (e.g. MAS5, RMA, dChip, etc.)

<alanr> 3) Representing which genes are thought to be relatively highly expressed by interpreting the intensity of the spots as amount of expression of certain genes.

Use cases - All

DanielR: was interested in a use case involving images
... want to work with extending images with semantic annotations

Susie: we are working on Use Cases in SWEO

DanielR: there was a discussion of a mammography use case

ScottM: I have been working on a mammography study in the Netherlands

DanielR: expect an NCI-backed standard for annotating medical images on the web
... controlled terminologies where possible (SNOMED, ..., something for regions)

[scribe distracted -- missed stuff]

alanr: some work on annotations on the Allen Brain Atlas
... there are existing region taxonomies
... another connection: Bijan Parsia said he'd be working on spatial reasoning
... (above, near...)

text mining

<mscottm> http://www.biosemantics.org/index.php?page=anni-2-0

ScottM: looked at Anni
... nice handling of synonyms for, say, proteins
... once you pick a URI system, you will have biologists who use their own names for, say, a protein or gene
... you need a tool to manage the mapping

... this is done internally in text mining systems
... perhaps we can re-use unique concept identifier techniques from text mining systems
... my group provides web services for text mining packages; it does not try text mining packages itself
... Dietrich Rebholz-Schuhmann has a nice overview of different systems
... UIMA framework came up
... migrated from IBM to Apache
... makes text mining systems more interoperable
... noticed a corpus for Huntington's

alanr: working on extraction of named entities (diseases, phenotypes, ... whatever) and interactions
... some results from GeneWays
... still pretty noisy
... all in PDF -- coding to convert to HTML

ScottM: Lucene uses PDF format

alanr: Lucene treats the document as a bag of words -- scrambling the order won't change the results
... believe HTML is the easiest to work with
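
A small illustration of the bag-of-words idea alanr refers to, using only the Python standard library. It is a sketch of the general technique, not of Lucene's API (Lucene itself is a Java library); the tokenizer and the sample sentence are made up for the example.

```python
import random
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


sentence = "protein A binds protein B and protein B inhibits protein C"

# Scramble the word order with a fixed seed so the run is repeatable.
words = sentence.split()
random.Random(0).shuffle(words)
scrambled = " ".join(words)

# Word order is discarded, so both versions yield the same counts.
same_bag = bag_of_words(sentence) == bag_of_words(scrambled)
print(same_bag)  # True
```

Since a bag of words keeps only token counts, a representation like this cannot by itself say which entity acts on which, which is one reason interaction extraction needs more than word counts.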

<matthiassamwald> Dietrich Rebholz-Schuhmann

Susie: can you share Rebholz's tutorial?

ScottM: sure -- it's on-line

alanr: matthiassamwald wrote some related code

matthiassamwald: you can give it a pubmed identifier or query and you get back a list of annotated abstracts

<matthiassamwald> http://whatizit.neurocommons.org

<alanr> http://svn.neurocommons.org/svn/trunk/nlp/soc_textmining/

ScottM: advantage of using web services is that you can point at a service as the provenance of a piece of extracted data
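
One way to picture ScottM's point is to store the service URL alongside each extracted item. The record layout, field names, service URL, and document identifier below are all hypothetical placeholders, not the format of any service mentioned on the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    source_document: str  # identifier of the abstract the mention came from
    text: str             # the extracted mention
    entity_type: str      # e.g. "protein", "disease"
    extracted_by: str     # URL of the web service that produced the annotation


def annotate(document_id, mentions, service_url):
    """Attach the producing service's URL to every extracted mention."""
    return [Annotation(document_id, text, entity_type, service_url)
            for text, entity_type in mentions]


# Hypothetical service URL and placeholder document identifier.
SERVICE = "http://example.org/textmining/service"
annotations = annotate("PMID:0000000", [("huntingtin", "protein")], SERVICE)
print(annotations[0].extracted_by)
```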
