W3C HCLS – Clinical Pharmacogenomics

06 Mar 2013

See also: IRC log




<egombocz> 510-705 is me, egombocz

<michel> janos: started rxnorm project as part of lodd in 2010

<michel> ... rxnorm is an NLM project, need to sign UMLS license agreement to download and work with it

<michel> ... contains proprietary sources, but ways to filter out

<michel> ... and get only the public domain content

<michel> ... Susie Stephens had contacted Olivier Bodenreider - anything at level 0 is public

<michel> ... never released the entire dump of files - part of the license is that you can't make the whole thing available at once

<michel> ... created a front end to it

<michel> ... last update in 2011

<michel> ... NLM has released a subset of RxNorm - called current prescribable content

<jhajagos> http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html

<jhajagos> http://www.w3.org/wiki/HCLSIG/LODD/Data

<jhajagos> http://link.informatics.stonybrook.edu/rxnorm/

<michel> janos: no license agreement needs to be signed for this subset - easy to generate RDF

<michel> ... wants to update the toolchain

<jhajagos> http://link.informatics.stonybrook.edu/rxnorm/RXAUI/2994963

<michel> michel: can we explore the data a bit

<jhajagos> http://link.informatics.stonybrook.edu/rxnorm/RXCUI/153892

michel: URIs could be replaced with labels where available
... i also notice that some of the links do not have labels at all

janos: right now it is quite close to the original data

michel: RxNorm has formulation, ingredients -- are there other relationships?

janos: ingredients, product labels, trade names, a hierarchy of types

michel: any information regarding the function?

janos: they link to the VA drug file, and this has information on indications and contraindications
... that data might not be in the public domain

michel: i think the therapeutic application areas are what we would be most interested in.

<michel> http://bio2rdf.org/ndc:54569-4605-0

janos: for some of the drugs with links to DrugBank, that could give such information

michel: in the latest Bio2RDF release we included NDCs, this could be used for linking RxNorm.

(NDC = National Drug Code)
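As an illustration of the linking idea, the Bio2RDF link pasted above follows the pattern `http://bio2rdf.org/<namespace>:<identifier>`. A minimal Python sketch of minting such a URI from an NDC is shown below. The function name `bio2rdf_uri` is hypothetical, and the pattern is inferred only from the single link in this log, so the Bio2RDF rules should be checked before relying on it.

```python
def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Mint a Bio2RDF-style URI (pattern inferred from the link in the log)."""
    # Assumption: the namespace prefix is lowercase and joined to the
    # identifier with a colon, as in http://bio2rdf.org/ndc:54569-4605-0
    return f"http://bio2rdf.org/{namespace.lower()}:{identifier.strip()}"


if __name__ == "__main__":
    print(bio2rdf_uri("ndc", "54569-4605-0"))
```

An RxNorm record that carries an NDC could then point at the resulting URI, which is where the Bio2RDF NDC data would join up.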

<jhajagos> http://link.informatics.stonybrook.edu/sparql/

<jhajagos> http://nlm.nih.gov/research/umls/rxnorm/

janos: first link is the SPARQL endpoint, second link is the name of the graph in the endpoint containing RxNorm data.

<michel> types and labels : http://sparqlbin.com/#ed8e6610c703076a944e3bf3ace1c712

<michel> y / label:
<michel> http://link.informatics.stonybrook.edu/rxnorm/RXCUI - RXCUI: is a unique concept identifier
<michel> http://link.informatics.stonybrook.edu/rxnorm/SAB - SAB is the source vocabulary
<michel> http://link.informatics.stonybrook.edu/rxnorm/TTY - TTY (term type)
<michel> http://link.informatics.stonybrook.edu/rxnorm/RXAUI - RXAUI: identifies a string down to its source
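The sparqlbin query itself is not captured in the log. As a rough sketch of how the endpoint and graph named above could be queried, the following Python builds a SPARQL Protocol GET URL for a query that lists distinct predicates and their labels. The query text, the function names and the default limit are assumptions, not the query michel ran, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Both values are taken from the links janos pasted in the log.
ENDPOINT = "http://link.informatics.stonybrook.edu/sparql/"
GRAPH = "http://nlm.nih.gov/research/umls/rxnorm/"


def predicate_label_query(graph: str, limit: int = 100) -> str:
    """Return a SPARQL query listing predicates used in `graph` with labels."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?y ?label\n"
        f"FROM <{graph}>\n"
        "WHERE {\n"
        "  ?s ?y ?o .\n"
        "  OPTIONAL { ?y rdfs:label ?label }\n"
        "}\n"
        f"LIMIT {limit}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as the standard `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(request_url(ENDPOINT, predicate_label_query(GRAPH, limit=10)))
```

The resulting URL can be fetched with any HTTP client. Which result formats the endpoint supports is not stated in the log.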

michel: i would have expected types (classes, property types)

janos: they are typed, in the UMLS way -- typing is done in a specific vocabulary

michel: i would like to have some feedback on this approach to RDFizing
... we are also facing this decision in Bio2RDF a lot -- either reflecting source data as unaltered as possible, or re-interpreting the data to make it more useful to the RDF world (e.g., restructuring, adding classes)

janos: i would be afraid of misinterpreting something.

<michel> matthias: the problem of reinterpreting the data is that you could get things wrong, and it could be more difficult to keep those data updated - mappings might need to be revisited

<michel> ... maybe some middle ground; janos could maintain this generic representation, and perhaps add some more content with some simple rules

<michel> ... rdfs:label, dc:title, simple class typing

<michel> ... opt for both options to some degree
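The middle ground matthias describes could look roughly like the sketch below: the generic triples are left untouched, and a few simple rules derive extra `rdfs:label` and `rdf:type` triples that can be stored separately. The `STR` property URI, the `TTY/<value>` class URIs and the sample values are assumptions made for illustration. Only the `RXCUI`, `SAB`, `TTY` and `RXAUI` property URIs appear in the log.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://link.informatics.stonybrook.edu/rxnorm/"


def enrich(triples):
    """Derive extra triples from generic ones without modifying the input.

    Rule 1: a STR value (assumed property) is repeated as rdfs:label.
    Rule 2: a TTY value becomes a simple class, typed via rdf:type.
    """
    extra = []
    for subject, predicate, obj in triples:
        if predicate == BASE + "STR":
            extra.append((subject, RDFS_LABEL, obj))
        elif predicate == BASE + "TTY":
            # Hypothetical class URI scheme: <BASE>TTY/<term type>
            extra.append((subject, RDF_TYPE, BASE + "TTY/" + obj))
    return extra


if __name__ == "__main__":
    atom = BASE + "RXAUI/2994963"  # atom URI from the log
    generic = [
        (atom, BASE + "STR", "example name"),  # made-up value
        (atom, BASE + "TTY", "IN"),            # made-up value
    ]
    for triple in enrich(generic):
        print(triple)
```

Because the rules only read the generic triples, they can be re-run after each RxNorm update, which addresses the concern about keeping reinterpreted data current.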

<michel> janos: maybe maintain each in different graphs?

michel: i think it is an opportunity for us -- not multiple values -- it is possible to translate one graph to another graph

(back on the call)

<michel> michel: sparql construct of rxnorm into "linked data + rdfs reasoning" friendly

<michel> ... dealing with semantic relationships in UMLS

michel: had conversation with BioPortal team, who have RxNorm -- they refuse to add any semantics, i think you cannot even navigate the tree well
... RxNorm is part of a family of datasets, representation-wise
... seeing how to better reflect that in RDF is interesting

<egombocz> Agree with this; BioPortal's version is also from 3/7/2011 so it's old too. We really should get this into a more useful representation

erich: BioPortal version is a pain indeed. It is also out of date.
... that is not just an issue with RxNorm, we have interconnectivity issues in lots of UMLS-based environments.

alistair: janos, do you plan on using VoID for dataset metadata? i suggest using it.

janos: yes, i will look into it.

alistair: i also suggest the PAV vocabulary for provenance

<agray> PAV: provenance versioning and authoring ontology http://purl.org/pav/
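A dataset description combining VoID and PAV might be generated along these lines. This is only a sketch: the dataset URI, title, version string and creation date are placeholders, and the choice of `pav:importedFrom`, `pav:version` and `pav:createdOn` is one plausible selection of PAV terms, not something agreed on the call.

```python
from datetime import datetime


def void_description(dataset_uri, title, endpoint, source, version, created):
    """Return a Turtle description of a dataset using VoID, DC Terms and PAV.

    Assumes `title` and `version` contain no characters that need escaping.
    """
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix pav: <http://purl.org/pav/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    pav:importedFrom <{source}> ;",
        f'    pav:version "{version}" ;',
        f'    pav:createdOn "{created.isoformat()}"^^xsd:dateTime .',
    ])


if __name__ == "__main__":
    print(void_description(
        dataset_uri="http://example.org/dataset/rxnorm",  # placeholder URI
        title="RxNorm as RDF",                            # placeholder title
        endpoint="http://link.informatics.stonybrook.edu/sparql/",
        source="http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html",
        version="example-version",                        # placeholder
        created=datetime(2013, 3, 6),                     # placeholder date
    ))
```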

michel: okay, we need a workplan

janos: i can make N3 files available
... about representation and publishing, we need to discuss as a group. in which namespace should the result be published?
... Bio2RDF namespace?

michel: we use Bio2RDF namespace for the datasets we convert.
... generally, URIs are generated based on specific rules we defined

<michel> http://www.slideshare.net/micheldumontier/bio2rdf-release-2-improved-coverage-interoperability-and-provenance-of-linked-data-for-the-life-sciences

michel: details are shown in these slides

janos: original files are in RRF format. then the data is imported into an SQL database, then scripts do transformations based on the SQL database
... i plan to rewrite scripts so they run without SQL server
... scripts are in Python
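A conversion that skips the SQL server could read the pipe-delimited RRF rows directly and stream out triples. The sketch below is not janos's toolchain. It assumes the column positions of `RXNCONSO.RRF` given in the NLM file documentation linked earlier (these should be verified against the release being converted), and it reuses the URI scheme visible in the links from this log. The `STR` property and the demo row are made up.

```python
BASE = "http://link.informatics.stonybrook.edu/rxnorm/"

# Assumed 0-based positions of selected RXNCONSO.RRF columns.
COLUMNS = {"RXCUI": 0, "RXAUI": 7, "SAB": 11, "TTY": 12, "STR": 14}


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rrf_to_ntriples(lines):
    """Yield one N-Triples line per non-empty selected field of each row."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("|")  # RRF rows end with a trailing pipe
        subject = f"<{BASE}RXAUI/{fields[COLUMNS['RXAUI']]}>"
        for name, index in COLUMNS.items():
            if name == "RXAUI":
                continue  # already used to build the subject URI
            value = fields[index]
            if value:
                yield f'{subject} <{BASE}{name}> "{escape_literal(value)}" .'


if __name__ == "__main__":
    # Made-up row with 18 fields; the identifiers are not real RxNorm data.
    row = [""] * 18
    row[0], row[7], row[11], row[12], row[14] = "1", "2", "RXNORM", "IN", "example name"
    for triple in rrf_to_ntriples(["|".join(row) + "|\n"]):
        print(triple)
```

Since the function is a generator, an open file handle can be passed in and the output written line by line without loading the whole file. N-Triples output is also valid N3.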

michel: ideally the scripts for Bio2RDF would be in PHP, but other languages are also okay if they are well-commented

janos: i plan to have this updated by April 1.

michel: we will consolidate our provenance models (see bio2rdf wiki)
... in April we will know better about which provenance representation would be best.

<agray> It is the Open PHACTS and Bio2RDF provenance models which are being consolidated

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.137 (CVS log)
$Date: 2013-03-06 16:13:33 $
