<michel> scribenick: Michel
AMIA submission
matthias: put down ideas on a google doc
<bobP> hey, i can scribe
https://docs.google.com/file/d/0ByGT-vnkGcoLSkhzVDRYd2x6bkU/edit
<bobP> scribenick: bobP
Matthias: Suggestion by Bob
Freimuth to look into immunogenetics
... AMIA page is limited to one page, tried to squeeze
... feedback +1 from Michel
... (reading... :)
... some PGx related developments in G-group
... we should move these discussions to hcls
Michel: +1
... also +1 on one-pager for AMIA
... "very well done"
Michel, Matthias discussing .doc commenting on G-drive
BobF: "This is well written" and
plenty of material
... do we have a comprehensive dataset that can be mined
... confirm, that we have an example, w/o converting
everything?
Matthias: We have converted
clinically relevant subsets
... for PGx, focus on (?) for genes
... plus PharmGKB clinically relevant SNPs
... plus Michel has a selection for conversion to RDF
Michel: SNPs have been converted.
BobF: Focus is large :)
Michel: We can provide data for
SNPs and phenotypic datatypes
... would like to get reasoning going over these data
BobF: Given this scope, wording is +1
Matthias: Will do minor edits w feedback and submit
BobF: Thanks!
Matthias: Agenda: Potential use
case of immunogenetics
... IGx might indeed be an additional use case w PGx
... IGx small number of genes 3-10 from human leukocyte antigen
genes
... few, but high variety of sequences
... almost every nucleotide can vary
... BobF can we infer alleles?
... still looking; would be interesting to see how our approach
could work here
BobF: IGx came to me from NMDP
Nat'l Marrow Donor Program
... central for bone marrow and HLA typing, 1M
patients/year
... tech challenge w HLA is interesting, less than a dozen,
pseudogenes etc, are designed to be highly variable
... tremendous genetic variability
... throw this on everyone's radar
... post-translational mod, regulation, how they dimerize on
cell surface
... transplant scenario has to account for all of this
... NMDP came to Mayo for data modeling, w flexibility and
manage thru time
... one of issues is that seq tech is improving, moved into
whole genome, next-gen
... impact on def of alleles becomes now quite
complicated
... no longer talking about one or two wrt ref seq
... but dozens of variants
... the *allele nomenclature will not scale to this
challenge
... now faced w data integration w patients w missing, old
data
... so open world assumption may fill this
... algorithms for matches may need this
... IGx suited in many ways for semweb. How to proceed?
Scott: Harkening back, part of
project in IGx, French group doing xml
... group had lots of interesting IGx xml
... data are there, but how to harvest as semweb
... should we start a KB from this legacy?
BobF: Novel alleles are being
discovered going forward
... challenges in enumerating alleles: can have indel events
and other re-arrangements that make things *really*
complicated
... canonical approaches will not cover the space
... emphasize that challenge comes from next-gen seq, novel
variants at extremely large scale
... how to store + go back and interpret in a phenotypically
relevant way
... NMDP has invited me down to workshop
... dedicated to developing data standards for next-gen
... lots of vendors, there will be an ongoing effort by the
group
<matthias_samwald> example of a full HLA sequence http://www.ebi.ac.uk/cgi-bin/imgt/hla/get_allele.cgi?B*15:15
BobF: scope of standards, from
metadata, to more intricate governance over data changing
monthly
... lots of the right people and stakeholders, gravitating to
traditional approaches :{
Michel: An opportunity :)
BobF: We could have a dramatic impact on how this community functions.
bobP: (asking the right question is the most important step :)
<BobF> http://www.ebi.ac.uk/imgt/hla/ambig.html
Matthias: Seq above 1089 nucleotides for a single allele
BobF: This is an underestimate,
since that's for coding sequence only
... there is intronic plus flanking sequence
... avg is 5K seq per gene, accounting for this other
content
<BobF> intronic + flanking sequence
Matthias: Think of as a seq of
fixed length?
... should look into insertions and deletions also
... looking, all of seqs have same length
BobF: Yes, generally true for
class 1, but class 2 genes have a different structure
... true generally for open reading frames, but class 1 and
class 2 differ in all this, different proteins
Scott: Given huge amount of
variance, esp. class 2, can we use SNPs here??
... have a sense that HLA has so much variation, would be hard
to establish a reference population for SNP analysis
BobF: (not an expert, but) There
are relatively finite number (not nec small) of alleles that
are inherited thru generations
... other sites are novel variants that occur sporadically, can
be inherited
... they are designed to be highly variable
... mutated quite rapidly
Matthias: But there is dbSNP data for these HLA genes
<BobF> http://hla.alleles.org/alleles/index.html
BobF: Yes, but dbSNP not
authoritative for HLA
... > 8100 alleles in these databases, but not always
haplotypes so the combinations will explode rapidly
<BobF> http://www.ebi.ac.uk/imgt/hla/ambig.html
BobF: page for ambiguous calls
<BobF> B*07:02:01G+B*07:05:03 vs B*07:02:11+B*07:05:01G
BobF: string indicates a given patient genotype, two diff haplotype combinations
<BobF> B*07:67N
BobF: but no way to tell the
difference, so nomenclature is fudged as above
... trying to adopt nomenclature that will account for this,
but running out of ways to lexically describe all this
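[Illustration, not from the call] A minimal sketch of how the genotype strings pasted above could be split into structured parts, assuming the colon-separated field scheme described on the hla.alleles.org naming page. The function names and the regular expression are hypothetical, and the pattern covers only the short gene*fields form seen in these minutes (no "HLA-" prefix).

```python
import re

# Hypothetical pattern: gene, "*", two to four colon-separated numeric
# fields, then an optional one-letter suffix (e.g. N, or the G/P group codes).
ALLELE_RE = re.compile(
    r"^(?P<gene>[A-Z0-9]+)\*"
    r"(?P<fields>\d{2,3}(?::\d{2,3}){0,3})"
    r"(?P<suffix>[NLSCAQGP]?)$"
)


def parse_allele(name):
    """Split one allele name such as 'B*07:02:01G' into its parts."""
    m = ALLELE_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised allele name: {name!r}")
    return {
        "gene": m.group("gene"),
        "fields": m.group("fields").split(":"),
        "suffix": m.group("suffix"),
    }


def parse_genotype(text):
    """Split a '+'-joined genotype string into a list of parsed alleles."""
    return [parse_allele(part) for part in text.split("+")]


if __name__ == "__main__":
    for genotype in ("B*07:02:01G+B*07:05:03", "B*07:02:11+B*07:05:01G"):
        print(parse_genotype(genotype))
```

The two genotype strings from the call parse into different allele pairs, which is the point being made: the lexical form encodes alternatives that the typing data alone cannot tell apart.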
Michel: Why are haplotypes ambiguous?
<matthias_samwald> link describing the naming: http://hla.alleles.org/nomenclature/naming.html
BobF: Gets back to phases. Take
first set of allele defined on one strand, other allele on
other strand
... ambiguous to tell from only genotype. Has implications for
what gets expressed on the cell surface
... we cannot completely disambiguate the alleles yet
(Michel and BobF going deep)
BobF: Classical phasing problem
is whether variant is on one strand or the other
... one, two, three polymorphs, can work out the phasing
... but with large numbers it becomes very difficult to
disambiguate
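[Illustration, not from the call] A minimal sketch of why phasing gets hard as the number of variant sites grows, assuming each site is heterozygous with two observed bases and no phase information is available. The function name and the toy input are hypothetical. With n such sites there are 2^(n-1) distinct haplotype pairs consistent with the same unphased genotype.

```python
from itertools import product


def phasings(het_sites):
    """Enumerate the haplotype pairs consistent with unphased het sites.

    het_sites is a list of (base_a, base_b) tuples, one per heterozygous
    position. Returns a list of (haplotype_1, haplotype_2) string pairs.
    """
    if not het_sites:
        return [("", "")]
    first, rest = het_sites[0], het_sites[1:]
    pairs = []
    # Fix the orientation of the first site so that a pair and its mirror
    # image (the two strands swapped) are counted only once.
    for flips in product((False, True), repeat=len(rest)):
        hap1 = [first[0]]
        hap2 = [first[1]]
        for (a, b), flip in zip(rest, flips):
            hap1.append(b if flip else a)
            hap2.append(a if flip else b)
        pairs.append(("".join(hap1), "".join(hap2)))
    return pairs


if __name__ == "__main__":
    print(phasings([("A", "G"), ("C", "T")]))
    for n in (1, 2, 3, 10, 20):
        print(n, "sites:", len(phasings([("A", "G")] * n)), "possible pairs")
```

One to three sites give 1, 2 and 4 candidate pairs, which can be worked through by hand; the count doubles with every further site.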
Michel: kind of getting it
(looking up the 1000-word figure)
Matthias: Timeline?
BobF: Group agrees that they need
data standard
... have expressed that semweb, inferencing tech for allele
calls might be a way to start
... month or so to frame, then feed them interim solutions over
a few months
... pilot solutions 1Q13
Matthias: New use case before PGx
is done?
... scalability of owl reasoners. IGx seems to be an order of
magnitude greater than PGx
BobF: Solving this would solve PGx too. HLA is the king of all genetic loci in this regard.
:)
<ericP> cheers all
Scribes: Michel, bobP
Possibly present: BobF, Bob_Powers, GVoice, IPcaller, MacTed, Michel, Scott, Tony, achille_z, bobP, ericP, harryh, matthias, matthias_samwald, mscottm
Date: 11 Oct 2012
Minutes: http://www.w3.org/2012/10/11-hcls-minutes.html