W3C HCLS – Pharmacogenomics

21 Jun 2012

Michel Dumontier


<michel> michel asks for a summary of the last meeting

last CDS call

<michel> bobF: matthias talked about his medical code project

-> http://www.w3.org/2012/06/14-hcls-minutes minutes

-> http://safety-code.org/ safety code interface

<michel> http://bio2rdf.semanticscience.org:8006/describe/?url=http%3A%2F%2Fbio2rdf.org%2Fpharmgkb_resource%3Aassociation_PA451906_PA27829&sid=100


michel: DBSNP contains >12M SNPs. We're focusing on those in PharmGKB (about 4400)
... that's more than the 400 matthias is using, but only a subset are interesting
... i will receive certain info from DBSNP

<michel> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=2032582

michel: the issue is which info to keep...
... info includes the species and the build from which it was derived

BobF: overlap with previous DBSNP extract?

michel: i don't know how to keep that resource up to date
... wanted to have a lightweight approach to focus on SNPs of interest
... parsing all of DBSNP has been problematic

<BobF> eutils from NCBI

michel: using EUtils to retrieve records from NCBI

BobF: that project was a snapshot to do prototype dev
... i made a high-level pass through the relevant attributes in DBSNP
... worth reviewing for re-application here
... attributes of interest:

<michel> rsid, chromosome file, snp class, validated?, alleles, symbol, locus_id, mrna_acc, fxn-class, allele, prot_acc, frame, residue

<michel> 4 tables - rs, val, snp, loc

val - current validation of a SNP

michel: what attributes do we prioritize?

BobF: not the volatile attributes, just IDs and higher-level annotations
... base location on a genomic contig or all SSIDs for a SNP are in the realm of DBSNP
... those won't have many motivating use cases

<michel> eutils url - http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=2032582&retmode=xml
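To pull all ~4400 PharmGKB SNPs rather than one, the single-record efetch URL above can be generalized. The following is a minimal sketch, assuming the numeric part of each rsID is passed as `id` exactly as in the example URL and that several IDs can be joined with commas. The helper names and the batch size of 200 are our own assumptions, chosen only to keep GET URLs short; they are not taken from the NCBI documentation.

```python
from urllib.parse import urlencode

# Base EFetch endpoint, as used in the example URL in these minutes.
EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunk(ids, size):
    """Split a list of IDs into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_efetch_urls(rs_ids, batch_size=200):
    """Build one EFetch URL per batch of numeric rsIDs (no 'rs' prefix).

    batch_size=200 is an assumption (hypothetical), not an NCBI-mandated value.
    """
    urls = []
    for batch in chunk([str(i) for i in rs_ids], batch_size):
        params = {"db": "snp", "id": ",".join(batch), "retmode": "xml"}
        # safe="," keeps the commas between IDs unescaped in the query string
        urls.append(EFETCH_BASE + "?" + urlencode(params, safe=","))
    return urls
```

Calling `build_efetch_urls([2032582])` reproduces the example URL; a list of 450 IDs yields three batched URLs.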

<michel> on examining the dbsnp record, bobF suggests: rsid, build, validation status, variation class, alleles, chromosome #, associated genes (geneid + symbol), mRNA + proteins + their (functional) changes, linked ncbi resources (unigene, omim, pdb)
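One way to pin down bobF's suggested field list is as a small record type. The sketch below is illustrative only: the class and field names are hypothetical, and the grouping of mRNA/protein changes under each associated gene is our assumption about how the attributes relate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneAnnotation:
    """A gene associated with the SNP, plus its transcript/protein effect."""
    gene_id: str                      # NCBI Gene ID
    symbol: str                       # gene symbol
    mrna_acc: Optional[str] = None    # mRNA accession
    prot_acc: Optional[str] = None    # protein accession
    fxn_class: Optional[str] = None   # functional change on the transcript/protein


@dataclass
class SnpRecord:
    """Fields bobF suggested keeping from a dbSNP record (hypothetical names)."""
    rsid: str
    build: str
    validation_status: str
    variation_class: str
    alleles: List[str]
    chromosome: str
    genes: List[GeneAnnotation] = field(default_factory=list)
    # IDs in linked NCBI resources such as UniGene, OMIM, PDB
    linked_resources: List[str] = field(default_factory=list)
```

Keeping only stable identifiers and higher-level annotations here, and leaving out contig positions and per-submission SS IDs, follows bobF's advice to skip the volatile attributes.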

<michel> we examined the population genetics - an initial summary of the allelic frequency is discounted due to the highly variable populations from which they were drawn

<michel> bobF suggests that we examine and limit population studies to those that are well regarded (hapmap, pharmgkb, others) with good numbers and good descriptions of the populations
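The criterion bobF proposes (well-regarded sources, adequate numbers, described populations) can be expressed as a simple filter. This is a sketch under stated assumptions: the record shape, the allowlist contents and the minimum sample size of 50 are all hypothetical placeholders, since the group did not fix these values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PopulationFrequency:
    source: str        # study or project name, e.g. "HapMap"
    population: str    # free-text description of the sampled population
    sample_size: int   # number of individuals sampled
    frequency: float   # allele frequency reported for this population


# Hypothetical allowlist and threshold; not agreed values.
TRUSTED_SOURCES = {"HapMap", "PharmGKB"}
MIN_SAMPLE_SIZE = 50


def select_reliable(freqs: Iterable[PopulationFrequency],
                    trusted=TRUSTED_SOURCES,
                    min_n=MIN_SAMPLE_SIZE) -> List[PopulationFrequency]:
    """Keep frequencies from trusted sources with enough, described samples."""
    return [f for f in freqs
            if f.source in trusted
            and f.sample_size >= min_n
            and f.population.strip() != ""]
```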

<michel> bobF notes (and michel confirms) that population information is not directly linked to the dbsnp entry; will have to get at it another way - perhaps there is another eutils service for it

<bavya> how can we get those 4400 snps present in pharmgkb?

<michel> michel to write a parser to obtain selected fields from the eutils service for the 4400 annotated snps in pharmgkb
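A starting point for that parser could look like the sketch below. It is namespace-agnostic and uses only the standard library, but the element and attribute names it reads (`Rs` with `rsId`/`snpClass`, `FxnSet` with `geneId`, `symbol`, `mrnaAcc`, `protAcc`, `fxnClass`) are ASSUMED from our recollection of the dbSNP XML schema and must be checked against a real efetch response before use. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def local(tag: str) -> str:
    """Strip an XML namespace: '{ns}Rs' -> 'Rs'."""
    return tag.rsplit("}", 1)[-1]


def parse_snp_xml(xml_text: str) -> List[Dict]:
    """Extract selected fields from each <Rs> element of an efetch response.

    Element/attribute names are ASSUMED; verify against a real record.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rs in root.iter():
        if local(rs.tag) != "Rs":
            continue
        rec = {
            "rsid": "rs" + rs.get("rsId", ""),
            "snp_class": rs.get("snpClass"),
            "genes": [],
        }
        # Collect gene/transcript/protein effects nested anywhere under <Rs>
        for el in rs.iter():
            if local(el.tag) == "FxnSet":
                rec["genes"].append({
                    "gene_id": el.get("geneId"),
                    "symbol": el.get("symbol"),
                    "mrna_acc": el.get("mrnaAcc"),
                    "prot_acc": el.get("protAcc"),
                    "fxn_class": el.get("fxnClass"),
                })
        records.append(rec)
    return records
```

Missing attributes come back as `None`, so a partially annotated record still parses.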

<michel> that list is available here: http://bio2rdf.semanticscience.org:8006/sparql/?default-graph-uri=&query=SELECT+distinct+%3Fv%0D%0AWHERE+%7B+%0D%0A+%3Fx+a+%3Chttp%3A%2F%2Fbio2rdf.org%2Fpharmgkb_vocabulary%3AAssociation%3E+.%0D%0A+%3Fx+%3Chttp%3A%2F%2Fbio2rdf.org%2Fpharmgkb_vocabulary%3Avariant%3E+%3Fv%0D%0A%7D%0D%0A&format=text%2Fhtml&timeout=0&debug=on

<bavya> can i know which fields were used to select these SNPs?

<michel> this is a sparql query on our endpoint containing pharmgkb snp annotations

<michel> http://bio2rdf.semanticscience.org:8006/sparql/

<michel> SELECT distinct ?v WHERE { ?x a <http://bio2rdf.org/pharmgkb_vocabulary:Association> . ?x <http://bio2rdf.org/pharmgkb_vocabulary:variant> ?v }
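To feed that list into the eutils step, the query can be run programmatically and the variant URIs reduced to numeric rsIDs. A minimal sketch, assuming the endpoint accepts the same `query` and `format` GET parameters seen in the link above and can return standard SPARQL JSON results when asked for `application/sparql-results+json`. It also assumes the variant URIs embed the rsID as `rs<digits>`; that URI form is a hypothetical we have not checked.

```python
import json
import re
from typing import List
from urllib.parse import urlencode

# Endpoint named in these minutes.
ENDPOINT = "http://bio2rdf.semanticscience.org:8006/sparql/"

QUERY = """SELECT distinct ?v WHERE {
  ?x a <http://bio2rdf.org/pharmgkb_vocabulary:Association> .
  ?x <http://bio2rdf.org/pharmgkb_vocabulary:variant> ?v }"""


def query_url(query: str, fmt: str = "application/sparql-results+json") -> str:
    """Build a GET URL mirroring the 'query' and 'format' parameters above."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": fmt})


def variant_uris(results_json: str) -> List[str]:
    """Read ?v bindings from a SPARQL 1.1 JSON results document."""
    data = json.loads(results_json)
    return [b["v"]["value"] for b in data["results"]["bindings"] if "v" in b]


def numeric_rsids(uris: List[str]) -> List[str]:
    """Extract the digits of 'rs<digits>' from each URI (assumed URI form)."""
    ids = []
    for uri in uris:
        m = re.search(r"rs(\d+)", uri)
        if m:
            ids.append(m.group(1))
    return ids
```

Fetching `query_url(QUERY)` with any HTTP client and passing the body to `variant_uris` and then `numeric_rsids` yields IDs in the form the efetch `id` parameter takes; URIs without an `rs` number are skipped.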

<bavya> I got it. Thank you

<michel> the eutils service, with one example snp: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=2032582&retmode=xml

