W3C HCLS – Clinical Decision Support -- 11 Oct 2012

<michel> scribenick: Michel

AMIA submission

Matthias: Put down ideas in a Google Doc

<bobP> hey, i can scribe

https://docs.google.com/file/d/0ByGT-vnkGcoLSkhzVDRYd2x6bkU/edit

<bobP> scribenick: bobP

Matthias: Suggestion by Bob Freimuth to look into immunogenetics
... AMIA submission is limited to one page; tried to squeeze everything in
... feedback +1 from Michel
... (reading... :)
... some PGx-related developments in G-group
... we should move these discussions to HCLS

Michel: +1
... also +1 on one-pager for AMIA
... "very well done"

Michel and Matthias discussing commenting on the doc in Google Drive

BobF: "This is well written" and plenty of material
... do we have a comprehensive dataset that can be mined?
... can we confirm that we have an example, without covering everything?

Matthias: We have converted clinically relevant subsets
... for PGx, focus on (?) for genes
... plus PharmGKB clinically relevant SNPs
... plus Michel has a selection for conversion to RDF

Michel: SNPs have been converted.

BobF: Focus is large :)

Michel: We can provide data for SNPs and phenotypic datatypes
... would like to get reasoning going over these data

BobF: Given this scope, wording is +1

Matthias: Will do minor edits w feedback and submit

BobF: Thanks!

Matthias: Agenda: Potential use case of immunogenetics
... IGx might indeed be an additional use case alongside PGx
... IGx involves a small number of genes (3-10), the human leukocyte antigen (HLA) genes
... few, but high variety of sequences
... almost every nucleotide can vary
... BobF, can we infer alleles?
... still looking; would be interesting to see how our approach could work here

BobF: IGx came to me from NMDP (National Marrow Donor Program)
... central for bone marrow and HLA typing, 1M patients/year
... the technical challenge with HLA is interesting: fewer than a dozen genes, plus pseudogenes etc., and they are designed to be highly variable
... tremendous genetic variability
... throw this on everyone's radar
... post-translational modification, regulation, how they dimerize on the cell surface
... a transplant scenario has to account for all of this
... NMDP came to Mayo for data modeling, with flexibility and management over time
... one of the issues is that sequencing tech is improving, has moved into whole genome, next-gen
... the impact on the definition of alleles is now quite complicated
... no longer talking about one or two wrt ref seq
... but dozens of variants
... the star (*) allele nomenclature will not scale to this challenge
... now faced with data integration for patients with missing or old data
... so the open world assumption may help fill this gap
... algorithms for matches may need this
... IGx suited in many ways for semweb. How to proceed?

Scott: Harkening back, part of a project in IGx: a French group doing XML
... group had lots of interesting IGx XML
... data are there, but how do we harvest them as semweb?
... should we start a KB from this legacy?

BobF: Novel alleles are being discovered going forward
... challenges in enumerating alleles: can have indel events and other re-arrangements that make things *really* complicated
... canonical approaches will not cover the space
... emphasize that challenge comes from next-gen seq, novel variants at extremely large scale
... how to store + go back and interpret in a phenotypically relevant way
... NMDP has invited me down to a workshop
... dedicated to developing data standards for next-gen
... lots of vendors; there will be an ongoing effort by the group

<matthias_samwald> example of a full HLA sequence http://www.ebi.ac.uk/cgi-bin/imgt/hla/get_allele.cgi?B*15:15

BobF: scope of standards, from metadata, to more intricate governance over data changing monthly
... lots of the right people and stakeholders, gravitating to traditional approaches :{

Michel: An opportunity :)

BobF: We could have a dramatic impact on how this community functions.

bobP: (asking the right question is the most important step :)

<BobF> http://www.ebi.ac.uk/imgt/hla/ambig.html

Matthias: The sequence above is 1089 nucleotides for a single allele

BobF: This is an underestimate, since that's for coding sequence only
... there is intronic plus flanking sequence
... average is ~5K nucleotides of sequence per gene, accounting for this other content

<BobF> intronic + flanking sequence

Matthias: Think of it as a sequence of fixed length?
... should look into insertions and deletions also
... looking at them, all of the sequences have the same length

BobF: Yes, generally true for class 1, but class 2 genes have a different structure
... true generally for open reading frames, but class 1 and class 2 differ in all this, different proteins
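Matthias's question about treating each allele as a fixed-length sequence can be made concrete with a small sketch. It assumes the allele sequences are already aligned to a common length, which BobF notes holds generally for class 1 open reading frames but not for class 2. The allele names and sequences are hypothetical toy data, not IMGT/HLA records; the function reports the 0-based positions where the sequences disagree.

```python
from typing import Dict, List


def variable_positions(sequences: Dict[str, str]) -> List[int]:
    """Return 0-based positions where aligned, equal-length sequences differ."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        # Unequal lengths imply insertions/deletions; a real alignment is needed.
        raise ValueError("sequences differ in length; align them first")
    length = lengths.pop()
    # A position is variable if more than one distinct base occurs there.
    return [i for i in range(length)
            if len({seq[i] for seq in sequences.values()}) > 1]


# Hypothetical toy alleles, six nucleotides each.
toy = {"allele_1": "ATGCGT", "allele_2": "ATGAGT", "allele_3": "TTGCGT"}
print(variable_positions(toy))  # [0, 3]
```

Fixed-length comparison is the simplest case; as Matthias notes, insertions and deletions would break the position-by-position view.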

Scott: Given the huge amount of variance, esp. class 2, can we use SNPs here?
... have a sense that HLA has so much variation, would be hard to establish a reference population for SNP analysis

BobF: (not an expert, but) There is a relatively finite number (not necessarily small) of alleles that are inherited through generations
... other sites are novel variants that occur sporadically, can be inherited
... they are designed to be highly variable
... mutated quite rapidly

Matthias: But there is dbSNP data for these HLA genes

<BobF> http://hla.alleles.org/alleles/index.html

BobF: Yes, but dbSNP not authoritative for HLA
... > 8100 alleles in these databases, but these are not always haplotypes, so the combinations will explode rapidly
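BobF's point that the combinations explode can be checked with simple arithmetic. With k alleles at one locus there are k(k+1)/2 unordered genotypes (k homozygous plus k(k-1)/2 heterozygous). The sketch assumes loci combine independently, so counts multiply, and it ignores phase across loci; the allele counts used below are illustrative, not figures from the HLA databases.

```python
from math import prod
from typing import Iterable


def genotype_count(num_alleles: int) -> int:
    """Unordered allele pairs at one locus: homozygous (a, a) plus heterozygous (a, b)."""
    return num_alleles * (num_alleles + 1) // 2


def multilocus_genotype_count(alleles_per_locus: Iterable[int]) -> int:
    """Assumes independent loci and ignores cross-locus phase: counts multiply."""
    return prod(genotype_count(k) for k in alleles_per_locus)


# Illustrative allele counts only.
print(genotype_count(100))                     # 5050
print(multilocus_genotype_count([100, 100]))   # 25502500
```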

<BobF> http://www.ebi.ac.uk/imgt/hla/ambig.html

BobF: page for ambiguous calls

<BobF> B*07:02:01G+B*07:05:03 vs B*07:02:11+B*07:05:01G

BobF: the string indicates a given patient genotype that could be either of two different haplotype combinations
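The string BobF posted packs two alternative genotypes into one line. Here is a minimal sketch of reading it. It assumes the informal notation typed in the chat: " vs " separates the alternatives, and "+" joins the two alleles of one genotype, one per chromosome copy. This is not a formal exchange format, and the function name is hypothetical.

```python
from typing import List, Tuple


def parse_ambiguous_genotype(text: str) -> List[Tuple[str, str]]:
    """Split an informal ambiguous genotype string into candidate allele pairs.

    Assumes ' vs ' separates alternative genotypes and '+' joins the two
    alleles within one genotype.
    """
    candidates = []
    for alternative in text.split(" vs "):
        # Exactly one '+' is expected; unpacking raises ValueError otherwise.
        first, second = alternative.strip().split("+")
        candidates.append((first.strip(), second.strip()))
    return candidates


genotypes = parse_ambiguous_genotype("B*07:02:01G+B*07:05:03 vs B*07:02:11+B*07:05:01G")
for a, b in genotypes:
    print(f"{a} + {b}")
```

Both candidates describe the same observed genotype data, which is the ambiguity BobF points out next.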

<BobF> B*07:67N

BobF: but no way to tell the difference, so nomenclature is fudged as above
... trying to adopt nomenclature that will account for this, but running out of ways to lexically describe all this
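The lexical structure BobF describes can be sketched as a parser. Following the HLA naming page linked in this discussion, a name like B*07:02:01G has a gene, up to four colon-separated fields (the first is the allele group, the second the specific protein), and an optional suffix letter. Examples are N for a null (non-expressed) allele, as in B*07:67N, or G for an allele group. The regex and the HlaAllele type are hypothetical, written only for illustration. They are not an official validator and will accept some strings the nomenclature does not allow.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: optional "HLA-" prefix, gene name, '*', one to four colon-separated
# numeric fields, then at most one suffix letter (expression suffixes N, L, S, C,
# A, Q or group suffixes G, P).
ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<gene>[A-Z0-9]+)\*(?P<fields>\d+(?::\d+){0,3})(?P<suffix>[NLSCAQGP]?)$"
)


@dataclass
class HlaAllele:
    gene: str
    fields: Tuple[str, ...]
    suffix: Optional[str]


def parse_allele(name: str) -> HlaAllele:
    match = ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"not an allele name this sketch understands: {name!r}")
    return HlaAllele(
        gene=match.group("gene"),
        fields=tuple(match.group("fields").split(":")),
        suffix=match.group("suffix") or None,  # empty suffix becomes None
    )


print(parse_allele("B*07:67N"))
# HlaAllele(gene='B', fields=('07', '67'), suffix='N')
```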

Michel: Why are haplotypes ambiguous?

<matthias_samwald> link describing the naming: http://hla.alleles.org/nomenclature/naming.html

BobF: Gets back to phasing. Take the first allele as defined on one strand, the other allele on the other strand
... ambiguous to tell from only genotype. Has implications for what gets expressed on the cell surface
... we cannot completely disambiguate the alleles yet

(Michel and BobF going deep)

BobF: The classical phasing problem is whether a variant is on one strand or the other
... with one, two, three polymorphisms, you can work out the phasing
... but with large numbers it becomes very difficult to disambiguate
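The phasing blow-up can be made concrete. An unphased genotype gives the two bases seen at each variant site with no record of which strand carries which, so every assignment of bases to the two strands is a candidate haplotype pair. With n heterozygous sites there are 2^n assignments, but swapping the strands gives the same pair. That leaves 2^(n-1) distinct phasings: 1 for one site, 2 for two, 4 for three, and 512 for ten. The sketch below enumerates them by brute force; the site data are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def phasings(genotype: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Enumerate haplotype pairs consistent with an unphased genotype.

    genotype: one (base_1, base_2) pair per variant site, strand unknown.
    Returns unordered pairs; a pair and its strand swap are listed once.
    """
    seen = set()
    pairs = []
    for choice in product((0, 1), repeat=len(genotype)):
        # choice[i] picks which base of site i goes on the first strand.
        hap_a = "".join(site[c] for site, c in zip(genotype, choice))
        hap_b = "".join(site[1 - c] for site, c in zip(genotype, choice))
        key = tuple(sorted((hap_a, hap_b)))
        if key not in seen:
            seen.add(key)
            pairs.append(key)
    return pairs


# Three heterozygous sites give 2**(3 - 1) = 4 possible phasings.
example = [("A", "G"), ("C", "T"), ("G", "T")]
for hap_a, hap_b in phasings(example):
    print(hap_a, hap_b)
```

Brute force costs 2^n steps, which is exactly why large numbers of polymorphisms make disambiguation hard without additional data such as family or population information.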

Michel: kind of getting it :)

(looking up the 1000-word figure)

Matthias: Timeline?

BobF: Group agrees that they need a data standard
... have expressed that semweb, inferencing tech for allele calls might be a way to start
... month or so to frame, then feed them interim solutions over a few months
... pilot solutions 1Q13

Matthias: New use case before PGx is done?
... scalability of OWL reasoners. IGx seems to be an order of magnitude larger than PGx

BobF: Solving this would solve PGx too. HLA is the king of all genetic loci in this regard.
