HCLS/OntologyTaskForce/OboPhenotypeSyntaxExperiment

From W3C Wiki

Representing experimental evidence and assertions about results from research articles using the OBO Phenotype Syntax

BB: 2007-02-13

Please Note: As this experiment is being worked on in free time outside of normal daily commitments, additional input, guidance, and editorial advice are always welcome. Please pass it on to either Bill Bug (William.Bug AT SPAMFREE drexelmed DOT edu) or Vipul Kashyap (VKASHYAP1 AT SPAMFREE partners DOT org).

GOAL

  1. Focus on one of the microdomains related to one of the HCLS Use Cases (AD and/or PD).
  2. Distill a few formal assertions from each article in the relevant literature, employing the formal OBO Phenotype Syntax, the OBO Ontology of Phenotypic Qualities, and the growing collection of OBO ontologies covering the various required semantic domains (e.g., neuroanatomy, cell types, molecular entities, pathways, etc.).
  3. Use these as concentrating semantic "seeds" - along with relevant models (e.g., LTP, LDP, etc.) described using shared distilled knowledge sources of varying semantic expressivity (from terminologies to formal ontologies expressed in a common, community formalism) - to catalyze semantic correlation across BioRDF triples and enrich the expressiveness of the resulting graphs, so that SPARQL queries can yield much more immediately relevant correlations.
  4. Test the utility of this approach systematically in the following way:
    • a) Identify ~100 citations relevant to a specific, focussed topic (e.g., alpha-Synuclein relation to second messenger enzymes associated with molecular & cellular memory mechanisms in the hippocampus)
    • b) create formal OBO Phenotype Assertions according to combinations of the following conditions:
      1. 1, 3, 5, 10, or unlimited number of formal OBO phenotype statements for each article
      2. 2%, 5%, 15%, 25% of that literature corpus
  5. Determine whether there is an 80:20-like break-even point, where minimal coverage of this literature corpus with these detailed, nuanced semantic assertions provides maximal value toward the seeding goal stated above.
  6. If this technique appears to be useful for one of the Use Cases (e.g., PD), repeat the approach for the other Use Case (e.g., AD), specifically choosing a microdomain where the underlying entities (e.g., molecules, pathways, brain regions and other structural entities [e.g., Lewy Bodies]) overlap, so as to provide a specific semantic scaffold around which BioRDF data sources can be connected to better illuminate how the pathoetiological causes and manifestations of these two neurodegenerative diseases relate to one another.
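The factorial design in step 4 can be made concrete with a short calculation. The sketch below is a minimal Python illustration, not part of the original plan: it assumes a corpus of exactly 100 citations (the goal says "~100") and rounds the annotated-article count down to whole articles. The names `design_grid`, `PER_ARTICLE`, and `COVERAGE_PERCENT` are hypothetical. It enumerates the 5 × 4 = 20 conditions and the maximum number of phenotype assertions each would require.

```python
from itertools import product

# Hypothetical encoding of the conditions listed in step 4b.
CORPUS_SIZE = 100                   # assumption: "~100 citations" taken as exactly 100
PER_ARTICLE = [1, 3, 5, 10, None]   # None stands for "unlimited" statements per article
COVERAGE_PERCENT = [2, 5, 15, 25]   # share of the corpus that gets annotated


def design_grid(corpus_size=CORPUS_SIZE):
    """Return one dict per (statements-per-article, coverage) condition."""
    grid = []
    for per_article, pct in product(PER_ARTICLE, COVERAGE_PERCENT):
        # Integer arithmetic avoids float surprises such as 0.15 * 100 != 15;
        # at least one article is annotated in every condition.
        n_articles = max(1, corpus_size * pct // 100)
        # With no per-article cap, the total number of assertions is open-ended.
        max_assertions = None if per_article is None else per_article * n_articles
        grid.append({
            "per_article": per_article,
            "coverage_percent": pct,
            "articles": n_articles,
            "max_assertions": max_assertions,
        })
    return grid


if __name__ == "__main__":
    for condition in design_grid():
        print(condition)
```

Under these assumptions the largest capped condition (10 statements for 25% of the corpus) tops out at 250 assertions, which gives a rough sense of the manual effort at the high end of the grid.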

Value of pursuing this technique both to the HCLS efforts and to the neuroinformatics and more general bioinformatics communities at large

  1. MOST IMPORTANTLY: This technique potentially provides a critical link between the BioRDF-converted triple stores and the existing formal, shared semantic framework being developed and used by the bioinformatics community at large, and specifically sponsored both by the NIH-funded NCBC focused on formal semantic frameworks for biomedical informatics (The National Center for Biomedical Ontology) and by the associated OBO Sourceforge and emerging OBO Foundry efforts.
  2. The SPARQL queries run against the BioRDF data sets will essentially have two types of inter-relatedness they can exploit:
    • a) RDF versions of the data models will inter-relate entities as they are connected via the underlying data model in which they were previously stored. For example, if a database has information linking gene names to a particular location in the brain, this connection will be present in the triple store, though the predicate may or may not be fully typed (i.e., the relation may not be semantically defined).
    • b) URI identity can also be used to draw links across data sets.
  By providing these Phenotype Assertions extracted from research articles, we will specify fully typed inter-relations between disparate entities. These typed semantic relations will not be easily derivable by any other means, including enhanced string matching against the same research articles. Having them in place will greatly enhance the correlative mining we can do on the BioRDF data sources using SPARQL.
  3. Throughout 2006, the Use Case work that Don, Helen, June, Gwen, Elizabeth, and others have been involved in has sought to identify both the data repositories that need to be converted to RDF and the "Distilled Knowledge Resources" (terminologies ---> fully specified ontologies) that cover those domains. Since December, we have tried to focus on microdomains within the use case(s) that we know require data already converted by the BioRDF group. In this project, there has been an increasing focus on details extracted from the literature and abstracted to the level of general statements, e.g., "alpha-synuclein is associated with aggresome formation in post-mortem histology derived from PD-diagnosed patients". The problem is that there is no way:
    • a) to reason or run SPARQL queries on those free-text assertions
    • b) to know exactly which pieces of experimental evidence they are based upon.
This experiment potentially provides a means to create a more formal, grounded semantic framework for the Use Case efforts, hosting that work in a manner that is algorithmically accessible to a SPARQL query engine.
  1. It uses a community-shared formal semantic framework - OBO-hosted ontologies - which are gradually being ported to the NCBO BioPortal. Many of these community resources come with significant lexical enhancements (e.g., synonyms) which can greatly extend their utility when trying to link URI terms across triplets.
  2. It employs a formalism for expressing experimental observations about biological reality (the OBO Phenotype Syntax) that is shared by the community, disseminated by the NCBO, and accumulated in their Open Biomedical Data (OBD) repository.
  3. Via OBD, the HCLS demo product (including the BioRDF converted data) can provide a foundation on which others in the neuroinformatics community can also build.
  4. Should this approach prove effective, it indicates the potential value OBO Phenotype Syntax can provide to the larger task of more broadly capturing - in a formal, semantically-parsable structure - experimental assertions from across the entire biomedical literature corpus.

The following NCBO pages provide additional info on their OBD efforts - what it is, how it fits into other community efforts, and what tools they are providing to make productive use of this repository:

Note that the last two URLs refer to the use of SPARQL and D2RQ to create a GO annotation RDF endpoint. There is also a reference to the inspiration BioPAX-related research has provided to this GO-sponsored effort:

Real-World entities fitting into subsumptive taxonomic hierarchies

It is relatively straightforward to map phenotypes via the formal OBO Phenotype Syntax, using the OBO Ontology of Phenotypic Qualities and the growing collection of OBO ontologies covering the various required semantic domains (e.g., neuroanatomy, cell types, molecular entities, pathways, etc.).

The core ontologies and related Distilled Knowledge Resources (DKR) have initially been specified primarily according to a subsumptive is_a hierarchy (with some also organized according to basic mereological principles [i.e., is_part_of or parthood relations], e.g., GO). The goal is to get this fundamental semantic organizing framework laid out correctly before delving into greater relational complexity. The intention is for the more complex relations to be assembled later, some internal to these community domain ontologies, and others outside these core ontologies to express application-specific inter-relatedness.

The more complex interactions will ultimately consist of higher order graphs such as pathways and mereotopological graphs specifying adjacency, contiguity, containment, and long-range connectivity, the latter being particularly important in constructing a neuroanatomical ontology.

Note that the current OBO Phenotype Syntax can easily be expressed in N3 (see below), RDF/XML, or OWL. It can be a little easier to construct assertions with this syntax, as it's possible to express rather complex relations without having to resort to complete, defined OWL ObjectProperties, though these assertions can certainly be expressed that way. The intention here is to start with the less complex OBO Phenotype Syntax and move on to representing it in OWL as additional expressivity and reasoning capability are required.

Another important point is that tools exist both to create and to reason upon these Phenotype Syntax-based assertions. Tools also exist to translate them into OWL, should that be necessary. The same is true for OBO-formatted ontologies themselves. The details of these tools and how to use them effectively need to be investigated more fully in cooperation with NCBO researchers. Some of the elements discussed here are certainly still in the development stages, but they are far enough along to be practical to use for this experiment.

The following is a list of the core ontologies used to characterize the relevant biomedical entities. Those in OWL or OBO format are immediately usable in the OBO Phenotype Syntax. Those not yet available in either form are a bit problematic, but nonetheless necessary. Note that the OBO format does provide a means to link to other DKRs, and this may prove useful for making use of these other entities.

  • OWL & OBO ontologies:
Semantic Domain
Fundamental Chemical
Proteins
Biopolymeric Sequence
Cellular Components, Molecular Function, Biological Process
Cell
Human Disease
Experiments & Procedures
Inherent Phenotypic Qualities
Relations
  • DKR not currently available in OWL & OBO:
Semantic Domain
Enzyme
Neuroanatomy

Test Case: A set of research articles making statements about these PD/PS associated entities

  1. Reduced ubiquitin C-terminal hydrolase-1 expression levels in dementia with Lewy bodies. M. Barrachina et al., 2006.
  2. Aggresome-related biogenesis of Lewy bodies.
  3. Aggresomes formed by alpha-synuclein and synphilin-1 are cytoprotective.
  4. Alpha-synuclein-enhanced green fluorescent protein fusion proteins form proteasome sensitive inclusions in primary neurons.
  5. Clinicogenetic study of mutations in LRRK2 exon 41 in Parkinson's disease patients from 18 countries.
  6. The LRRK2 I2012T, G2019S and I2020T mutations are not common in patients with essential tremor.
  7. Glucocerebrosidase mutations are not found in association with LRRK2 G2019S in subjects with parkinsonism.

The OBO Way to represent the relevant biomaterial entities

As mentioned above, the core ontology and DKR entities are represented primarily according to a subsumptive is_a hierarchy. There is, however, an OBO syntax for representing more complex relations when they are required to fully express an experimental assertion derived from a research article. As shown below, this syntax has an equivalent RDF representation, shown here in N3 notation.

||||||<:> N3 || ||<:>OBO format||
|| :purkinje_cell || :is_a || :cns_neuron || == || CL:0000121^OBO_REL:is_a(CL:0000117) ||
|| :purkinje_cell || :located_in || :cerebellar_cortex || == || CL:0000121^OBO_REL:located_in(NN:0000644) ||
|| :purkinje_cell || :is_efferent_of || :cerebellar_cortex || == || CL:0000121^HCLS_REL:is_efferent_of(NN:0000644) ||
||||||<:> N3 || ||<:>OBO format||
|| :dopaminergic_neuron || :is_a || :neuron || == || CL:0000700^OBO_REL:is_a(CL:0000540) ||
|| :dopaminergic_neuron || :contains || :tyrosine_hydroxylase || == || CL:0000700^OBO_REL:contains(EC:1.14.16.2) ||
|| :l-dopa || :is_a_product_of || :tyrosine_hydroxylase || == || CHEBI:15765^HCLS_REL:is_a_product_of(EC:1.14.16.2) ||
|| :l-dopa || :is_a_substrate_of || :aromatic_L_amino_acid_decarboxylase || == || CHEBI:15765^HCLS_REL:is_a_substrate_of(EC:4.1.1.28) ||
|| :dopamine || :is_a_product_of || :aromatic_L_amino_acid_decarboxylase || == || CHEBI:18243^HCLS_REL:is_a_product_of(EC:4.1.1.28) ||
|| :dopaminergic_neuron || :releases_neurotransmitter || :dopamine || == || CL:0000700^HCLS_REL:releases_neurotransmitter(CHEBI:18243) ||
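Each entry in the OBO format column above follows a genus^relation(filler) pattern. As a minimal sketch, assuming each expression has one genus followed by one or more ^relation(filler) differentiae with no nested parentheses, the Python below splits such an expression into the subject, predicate, object triples shown in the N3 column. The helpers `parse_postcomposition`, `to_triples`, and `to_n3` are hypothetical and not part of any OBO tool. A stricter OWL reading would treat the expression as a class intersection (genus and some relation to the filler) rather than a single triple; the sketch only mirrors the simpler reading used in the table.

```python
import re
from typing import List, Tuple

# One differentia: ^relation(filler), e.g. ^OBO_REL:located_in(NN:0000644)
_DIFFERENTIA = re.compile(r"\^([^\^()]+)\(([^()]+)\)")
# Whole expression: a genus followed by at least one differentia.
_EXPRESSION = re.compile(r"^([^\^()]+)((?:\^[^\^()]+\([^()]+\))+)$")


def parse_postcomposition(expr: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'GENUS^REL(FILLER)[^REL(FILLER)...]' into (genus, [(rel, filler), ...])."""
    match = _EXPRESSION.match(expr.strip())
    if match is None:
        raise ValueError(f"not a genus^relation(filler) expression: {expr!r}")
    genus, tail = match.groups()
    return genus, _DIFFERENTIA.findall(tail)


def to_triples(expr: str) -> List[Tuple[str, str, str]]:
    """Read each differentia as a (genus, relation, filler) triple, as the N3 column does."""
    genus, differentiae = parse_postcomposition(expr)
    return [(genus, rel, filler) for rel, filler in differentiae]


def to_n3(triples: List[Tuple[str, str, str]]) -> str:
    """Render triples as N3 statements; assumes the CL:, NN:, OBO_REL:, etc. prefixes are declared elsewhere."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Prints: CL:0000121 OBO_REL:located_in NN:0000644 .
    print(to_n3(to_triples("CL:0000121^OBO_REL:located_in(NN:0000644)")))
```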

NOTES:

  1. the OBO Relation Ontology does not yet have is_efferent_of, is_a_product_of, is_a_substrate_of, or releases_neurotransmitter relations, so these would need to be defined in the HCLS ontology.
  2. I assume the substrate and product relations would somehow draw on the BioPAX a:LEFT and a:RIGHT properties respectively, if BioPAX were used to represent these biochemical reactions.
  3. There are emerging ontologies including relations meant to capture nervous system-specific relations such as is_efferent_of & releases_neurotransmitter though they are not currently open. This is likely to change soon, but not in time for the HCLS demo. We should also check these against the SenseLab work Kei and colleagues have done.
  4. There is another pathway graph traversal leading to dopamine (Tyrosine --[aromatic_L_amino_acid_decarboxylase]--> Tyramine --[catechol-forming enzyme]--> Dopamine), but it is not the prevalent pathway thought to be critical to neurotransmission from dopaminergic terminals. I do believe this pathway is related to the dietary restrictions that accompany MAO inhibitors prescribed for chronic depression. It's my understanding that, through mass action, eating foods high in tyramine (cheeses & aged meats) can cause more DA & NE to build up when MAO enzymatic activity is low, leading to various positive inotropic cardiovascular events associated with beta-adrenergic stimulation.
  5. It would not be appropriate to use UniProt to reference tyrosine hydroxylase here, as UniProt would need to specify a species, e.g., human tyrosine hydroxylase or mouse tyrosine hydroxylase. The assertions here are about the universal, non-species-specific entity, such as is represented in the IUBMB Enzyme Classification nomenclature. You will note those two species-specific records in UniProt both reference this same EC number. This is one of the ways in which small molecules such as dopamine or nicotinamide differ from the more complex polymers encoded by the genome. Such small molecules can generally be considered of the same type regardless of species, so long as they don't have a stereochemically specific variant across species that is biologically relevant, e.g., D-glucose (dextrose) vs. L-glucose.
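Note 2 suggests the substrate and product relations would map onto BioPAX LEFT and RIGHT. As a rough sketch of why that pairing matters, the Python below joins the is_a_substrate_of and is_a_product_of assertions from the table above on their shared enzyme, recovering reaction steps in the X --[enzyme]--> Y form used in note 4. It assumes the proposed HCLS_REL relation names from note 1; `reaction_steps` is a hypothetical helper. Only the three triples from the table are included, so the tyrosine hydroxylase step is not recovered: no substrate is asserted for that enzyme.

```python
from collections import defaultdict

SUBSTRATE_OF = "HCLS_REL:is_a_substrate_of"  # proposed relation, not yet in the OBO Relation Ontology
PRODUCT_OF = "HCLS_REL:is_a_product_of"      # proposed relation, not yet in the OBO Relation Ontology

# (subject, predicate, object) triples taken from the OBO format column above.
TRIPLES = [
    ("CHEBI:15765", PRODUCT_OF, "EC:1.14.16.2"),   # L-DOPA is a product of tyrosine hydroxylase
    ("CHEBI:15765", SUBSTRATE_OF, "EC:4.1.1.28"),  # L-DOPA is a substrate of AADC
    ("CHEBI:18243", PRODUCT_OF, "EC:4.1.1.28"),    # dopamine is a product of AADC
]


def reaction_steps(triples):
    """Return (substrate, enzyme, product) steps for enzymes with both sides asserted."""
    substrates = defaultdict(set)  # enzyme -> substrates (BioPAX LEFT side)
    products = defaultdict(set)    # enzyme -> products (BioPAX RIGHT side)
    for subject, predicate, enzyme in triples:
        if predicate == SUBSTRATE_OF:
            substrates[enzyme].add(subject)
        elif predicate == PRODUCT_OF:
            products[enzyme].add(subject)
    steps = []
    # Only enzymes that appear on both sides yield a complete step.
    for enzyme in sorted(substrates.keys() & products.keys()):
        for left in sorted(substrates[enzyme]):
            for right in sorted(products[enzyme]):
                steps.append((left, enzyme, right))
    return steps


if __name__ == "__main__":
    # Prints: CHEBI:15765 --[EC:4.1.1.28]--> CHEBI:18243
    for left, enzyme, right in reaction_steps(TRIPLES):
        print(f"{left} --[{enzyme}]--> {right}")
```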

The OBO Phenotype Syntax + PATO Quality way to represent experimental observations/research statements/claims

BB: 2007-02-13

Below I take the research statements from some of the above cited articles and express them using the OBO Phenotype Syntax and using the PATO Ontology of Phenotypic Qualities. See the OBO Phenotype Syntax normal form listed below and the following OBO Phenotype links for more info on how to construct and interpret such OBO Phenotype Assertions:

I. M. Barrachina et al. Reduced ubiquitin C-terminal hydrolase-1 expression levels in dementia with Lewy bodies. 2006
  • A) Abstract:
  • B) Experiment Assertions:
    1. EA1:
      • a) description: "PCR assays, and Western blots demonstrated down-regulation of UCHL-1 mRNA and UCHL-1 protein in the cerebral cortex in DLB (either in pure forms, not associated with Alzheimer disease: AD, and in common forms, with accompanying AD changes), but not in PD, when compared with age-matched controls."
        1. sub-description: "UCHL-1 mRNA measured by PCR assay is decreased in the cerebral cortex in LBD patients, when compared with age-matched controls."
        2. sub-description: "UCHL-1 mRNA measured by PCR assay is decreased in the cerebral cortex in AD patients, when compared with age-matched controls."
        3. sub-description: "UCHL-1 mRNA measured by PCR assay is unchanged in the cerebral cortex in PD patients, when compared with age-matched controls."
        4. sub-description: "UCHL-1 protein measured by Western blots is decreased in the cerebral cortex in LBD patients, when compared with age-matched controls."
        5. sub-description: "UCHL-1 protein measured by Western blots is decreased in the cerebral cortex in AD patients, when compared with age-matched controls."
        6. sub-description: "UCHL-1 protein measured by Western blots is unchanged in the cerebral cortex in PD patients, when compared with age-matched controls."
    2. Phenotype Syntax Assertions:
      1. AD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype (HCLS:000030) [see the phenotype table below for a more detailed formal description of these phenotype assertions; what appears here can be considered simply the URI for a given assertion.]
      2. PD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype (HCLS:000031)
      3. LBD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype (HCLS:000032)
      4. AD_cerebral_cortex_UCHL-1_protein_expression_phenotype (HCLS:000033)
      5. PD_cerebral_cortex_UCHL-1_protein_expression_phenotype (HCLS:000034)
      6. LBD_cerebral_cortex_UCHL-1_protein_expression_phenotype (HCLS:000035)
  • C) EA2:
    • a) description: "UCHL-1 mRNA and protein expressions were reduced in the medulla oblongata in the same PD cases."
      1. sub-description: "UCHL-1 mRNA measured by PCR assay is decreased in the medulla oblongata in PD patients, when compared with age-matched controls."
      2. sub-description: "UCHL-1 protein measured by Western blots is decreased in the medulla oblongata in PD patients, when compared with age-matched controls."
  • D) Phenotype Syntax:
    1. PD_medulla_oblongata_UCHL-1_mRNA_expression_phenotype (HCLS:000048)
    2. PD_medulla_oblongata_UCHL-1_protein_expression_phenotype (HCLS:000049)
  • E) EA3:
    • a) description: "UCHL-1 protein was decreased in the substantia nigra in cases with Lewy body pathology."
      1. sub-description: "UCHL-1 protein measured by Western blots is decreased in the substantia nigra in LBD patients, when compared with age-matched controls." (reading the article indicates this finding was for both LBD & PD)
  • F) Phenotype Syntax:
    1. PD_substantia_nigra_UCHL-1_protein_expression_phenotype (HCLS:000050)
    2. LBD_substantia_nigra_UCHL-1_protein_expression_phenotype (HCLS:000051)
  • G) EA4:
    • a) description: "UCHL-1 down-regulation was not associated with reduced protein levels of several proteasomal subunits, including 20SX, 20SY, 19S and 11Salpha."
      1. sub-description: "20SX subunit protein measured by Western blots is unchanged in the tissue specimens from AD patients, when compared with age-matched controls."
      2. sub-description: "20SX subunit protein measured by Western blots is unchanged in the tissue specimens from PD patients, when compared with age-matched controls."
      3. sub-description: "20SX subunit protein measured by Western blots is unchanged in the tissue specimens from LBD patients, when compared with age-matched controls."
      4. sub-description: "20SY subunit protein measured by Western blots is unchanged in the tissue specimens from AD patients, when compared with age-matched controls."
      5. sub-description: "20SY subunit protein measured by Western blots is unchanged in the tissue specimens from PD patients, when compared with age-matched controls."
      6. sub-description: "20SY subunit protein measured by Western blots is unchanged in the tissue specimens from LBD patients, when compared with age-matched controls."
      7. sub-description: "19S subunit protein measured by Western blots is unchanged in the tissue specimens from AD patients, when compared with age-matched controls."
      8. sub-description: "19S subunit protein measured by Western blots is unchanged in the tissue specimens from PD patients, when compared with age-matched controls."
      9. sub-description: "19S subunit protein measured by Western blots is unchanged in the tissue specimens from LBD patients, when compared with age-matched controls."
      10. sub-description: "11Salpha subunit protein measured by Western blots is unchanged in the tissue specimens from AD patients, when compared with age-matched controls."
      11. sub-description: "11Salpha subunit protein measured by Western blots is unchanged in the tissue specimens from PD patients, when compared with age-matched controls."
      12. sub-description: "11Salpha subunit protein measured by Western blots is unchanged in the tissue specimens from LBD patients, when compared with age-matched controls."
  • H) Phenotype Syntax:
    • expressing this in OBO Phenotype Syntax is problematic - please see note below.
  • I) EA5:
    • a) description: "UCHL-3 expression was reduced in the cerebral cortex of PD and DLB patients."
      1. sub-description: "UCHL-3 protein measured by Western blots is decreased in the cerebral cortex in PD patients, when compared with age-matched controls."
      2. sub-description: "UCHL-3 protein measured by Western blots is decreased in the cerebral cortex in AD patients, when compared with age-matched controls."
      3. sub-description: "UCHL-3 protein measured by Western blots is decreased in the cerebral cortex in LBD patients, when compared with age-matched controls."
  • J) Phenotype Syntax:
    1. AD_cerebral_cortex_UCHL-3_protein_expression_phenotype (HCLS:000055)
    2. PD_cerebral_cortex_UCHL-3_protein_expression_phenotype (HCLS:000056)
    3. LBD_cerebral_cortex_UCHL-3_protein_expression_phenotype (HCLS:000057)
  • K) CONCLUSION: "These observations show reduced UCHL-1 expression as a contributory factor in the abnormal protein aggregation in DLB, and points UCHL-1 as a putative therapeutic target in the treatment of DLB."
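The sub-descriptions above follow a regular template (gene, molecule, region, population, direction of change), and the phenotype labels follow a matching naming pattern. A minimal sketch, assuming the quality mapping used in the phenotype table later on this page ("decreased" to PATO:decrease, "unchanged" to PATO:normal), shows how a structured sub-description could yield its phenotype label and quality. `SubDescription` and its methods are hypothetical, and the HCLS identifiers would still have to be assigned separately.

```python
from dataclasses import dataclass

# Mapping from the wording of a sub-description to the PATO quality used on this page.
QUALITY = {"decreased": "PATO:decrease", "unchanged": "PATO:normal"}


@dataclass(frozen=True)
class SubDescription:
    gene: str        # e.g. "UCHL-1"
    molecule: str    # "mRNA" (PCR assay) or "protein" (Western blot)
    region: str      # e.g. "cerebral cortex"
    population: str  # "AD", "PD" or "LBD"
    direction: str   # "decreased" or "unchanged"

    def phenotype_label(self) -> str:
        """Build a label such as PD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype."""
        region = self.region.replace(" ", "_")
        return f"{self.population}_{region}_{self.gene}_{self.molecule}_expression_phenotype"

    def quality(self) -> str:
        """Look up the PATO quality; raises KeyError for unmapped wording."""
        return QUALITY[self.direction]


if __name__ == "__main__":
    sd = SubDescription("UCHL-1", "mRNA", "cerebral cortex", "PD", "unchanged")
    # Prints: PD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype PATO:normal
    print(sd.phenotype_label(), sd.quality())
```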

HOW-TO for building Phenotype Syntax phrases:

NOTE: It should be stressed that this initial example was created largely through manual editing of this Wiki page due to time constraints. This is obviously not a viable approach to carrying out this experiment. The ontological artifacts described below will be assembled in either Protege or SWOOP, so this work can proceed in a more expeditious and useful manner.

  • download the OBO-Edit program from the Gene Ontology tools page.
  • download the quality.obo file from the OBO Quality of Phenotypes Ontology Sourceforge page.
  • open the quality.obo file in OBO-Edit. There you will see the quality classes and the relations from that file. Do not use the classes in the Obsolete branch, as they are only there for backward compatibility.
  • extract the research statements/claims/phenotype observations from the abstract of a research article (the description quotes above).
  • guided by Chris Mungall's OBO Phenotype-Syntax examples, use the formalism listed below to represent those phenotype descriptions.
    • you will need to refer to the other available OBO ontologies for the Entity representations. You can find these at the OBO Sourceforge site, though many are also browsable using the EBI Ontology Lookup tool.
    • for OBO-formatted ontologies, use OBO-Edit to browse the ontology. For OWL ontologies, use Protege. Often I simply browse/search the ontology online, since you are mainly hunting for isolated entities and can generally do this more quickly online. When building PATO phrases, you will want to have the PATO ontology open in OBO-Edit.
  • The ontologies chosen were the most convenient for me to search and browse, mostly on the OBO site. Some of the choices may need vetting. For instance, we should probably use one of the more common pathway databases, such as those assembled by the BioPAX or BioCyc projects.
  • some entities will need to be post-composed using the OBO syntax listed in the previous section. See some of the complex entities below.
  • the following are some terms derived from these articles and relevant entities - both simple and complex, post-composed:
|| term || abbrev || entity || ontology ||
|| Ubiquitin C-terminal hydrolase-1 protein || UCHL-1 protein || ubiquitin_carboxyl_terminal_hydrolase_isozyme_L1 || UniProt Knowledgebase ||
|| || UCHL-3 protein || ubiquitin_carboxyl_terminal_hydrolase_isozyme_L3 || UniProt Knowledgebase ||
|| mRNA || || mRNA || Sequence Ontology ||
|| PCR assays (with implied products) || || PCR_product || Sequence Ontology ||
|| Western Blot (with implied antigenic peptide) || || Western_blot_antigenic_peptide || HCLS Ontology (should be in a proteomics ontology, but I can't find it) ||
|| ubiquitin proteasome system || UPS || PW:ubiquitin/proteasome degradation pathway || Pathway Ontology ||
|| cerebral cortex || || cerebral_cortex || NeuroNames ||
|| medulla oblongata || || medulla_oblongata || NeuroNames ||
|| substantia nigra || || substantia_nigra || NeuroNames ||
|| || || sample_population || Ontology of Biomedical Investigation ||
|| || || patient_population || HCLS Ontology ||
|| || || human_patient_population || HCLS Ontology ||
|| || || pool_of_specimens || Ontology of Biomedical Investigation ||
|| dementia with Lewy bodies - pure || DLBp || Lewy_Body_Disease || Disease Ontology ||
|| dementia with Lewy bodies - common || DLBc || Alzheimers_Disease || Disease Ontology ||
|| Parkinson disease || PD || Parkinson_Disease || Disease Ontology ||
|| down-regulation || || decreased || Phenotypic Quality Ontology ||
|| unchanged || || normal || Phenotypic Quality Ontology ||
|| || || only_has_member || HCLS Ontology ||
|| || || is_diagnosed_with || HCLS Ontology ||
|| || || is_control_for || HCLS Ontology ||
|| || || is_aged_matched_control_for || HCLS Ontology ||
|| || || translates_to || HCLS Ontology ||
|| || || is_translated_from || HCLS Ontology ||
|| Entity || ID || N3 ||
|| UCHL-1_mRNA || HCLS:000007 || :SO:mRNA :translates_to :UP:ubiquitin_carboxyl_terminal_hydrolase_isozyme_L1 ||
|| UCHL-1_PCR_product || HCLS:000008 || :SO:PCR_product :derives_from :HCLS:UCHL-1_mRNA ||
|| UCHL-1_western_blot_antigenic_peptide || HCLS:000029 || :HCLS:Western_blot_antigenic_peptide :derives_from :UP:ubiquitin_carboxyl_terminal_hydrolase_isozyme_L1 ||
|| UCHL-3_western_blot_antigenic_peptide || HCLS:000054 || :HCLS:Western_blot_antigenic_peptide :derives_from :UP:ubiquitin_carboxyl_terminal_hydrolase_isozyme_L3 ||
|| defined_human_patient_population || HCLS:000009 || :HCLS:human_patient_population :only_has_member :NCBITaxon:homo_sapiens ||
|| AD_patient_population || HCLS:000010 || :HCLS:defined_human_patient_population :HCLS:is_diagnosed_with :DO:Alzheimers_Disease ||
|| LBD_patient_population || HCLS:000011 || :HCLS:defined_human_patient_population :HCLS:is_diagnosed_with :DO:Lewy_Body_Disease ||
|| PD_patient_population || HCLS:000012 || :HCLS:defined_human_patient_population :HCLS:is_diagnosed_with :DO:Parkinson_Disease ||
|| AD_patient_age_matched_control_population || HCLS:000013 || :HCLS:defined_human_patient_population :HCLS:is_aged_matched_control_for :HCLS:AD_patient_population ||
|| LBD_patient_age_matched_control_population || HCLS:000014 || :HCLS:defined_human_patient_population :HCLS:is_aged_matched_control_for :HCLS:LBD_patient_population ||
|| PD_patient_age_matched_control_population || HCLS:000015 || :HCLS:defined_human_patient_population :HCLS:is_aged_matched_control_for :HCLS:PD_patient_population ||
|| AD_tissue_sample || HCLS:000016 || :OBO:pool_of_specimens :derives_from :HCLS:AD_patient_population ||
|| AD_cerebral_cortical_tissue_sample || HCLS:000017 || :HCLS:AD_tissue_sample :derives_from :NN:cerebral_cortex ||
|| AD_medulla_oblongata_tissue_sample || HCLS:000036 || :HCLS:AD_tissue_sample :derives_from :NN:medulla_oblongata ||
|| AD_substantia_nigra_tissue_sample || HCLS:000037 || :HCLS:AD_tissue_sample :derives_from :NN:substantia_nigra ||
|| LBD_tissue_sample || HCLS:000018 || :OBO:pool_of_specimens :derives_from :HCLS:LBD_patient_population ||
|| LBD_cerebral_cortical_tissue_sample || HCLS:000019 || :HCLS:LBD_tissue_sample :derives_from :NN:cerebral_cortex ||
|| LBD_medulla_oblongata_tissue_sample || HCLS:000038 || :HCLS:LBD_tissue_sample :derives_from :NN:medulla_oblongata ||
|| LBD_substantia_nigra_tissue_sample || HCLS:000039 || :HCLS:LBD_tissue_sample :derives_from :NN:substantia_nigra ||
|| PD_tissue_sample || HCLS:000020 || :OBO:pool_of_specimens :derives_from :HCLS:PD_patient_population ||
|| PD_cerebral_cortical_tissue_sample || HCLS:000021 || :HCLS:PD_tissue_sample :derives_from :NN:cerebral_cortex ||
|| PD_medulla_oblongata_tissue_sample || HCLS:000040 || :HCLS:PD_tissue_sample :derives_from :NN:medulla_oblongata ||
|| PD_substantia_nigra_tissue_sample || HCLS:000041 || :HCLS:PD_tissue_sample :derives_from :NN:substantia_nigra ||
|| AD_age_matched_control_tissue_sample || HCLS:000022 || :OBO:pool_of_specimens :derives_from :HCLS:AD_patient_age_matched_control_population ||
|| AD_cerebral_cortical_age_matched_control_tissue_sample || HCLS:000023 || :HCLS:AD_age_matched_control_tissue_sample :derives_from :NN:cerebral_cortex ||
|| AD_medulla_oblongata_age_matched_control_tissue_sample || HCLS:000042 || :HCLS:AD_age_matched_control_tissue_sample :derives_from :NN:medulla_oblongata ||
|| AD_substantia_nigra_age_matched_control_tissue_sample || HCLS:000043 || :HCLS:AD_age_matched_control_tissue_sample :derives_from :NN:substantia_nigra ||
|| LBD_age_matched_control_tissue_sample || HCLS:000024 || :OBO:pool_of_specimens :derives_from :HCLS:LBD_patient_age_matched_control_population ||
|| LBD_cerebral_cortical_age_matched_control_tissue_sample || HCLS:000025 || :HCLS:LBD_age_matched_control_tissue_sample :derives_from :NN:cerebral_cortex ||
|| LBD_medulla_oblongata_age_matched_control_tissue_sample || HCLS:000044 || :HCLS:LBD_age_matched_control_tissue_sample :derives_from :NN:medulla_oblongata ||
|| LBD_substantia_nigra_age_matched_control_tissue_sample || HCLS:000045 || :HCLS:LBD_age_matched_control_tissue_sample :derives_from :NN:substantia_nigra ||
|| PD_age_matched_control_tissue_sample || HCLS:000026 || :OBO:pool_of_specimens :derives_from :HCLS:PD_patient_age_matched_control_population ||
|| PD_cerebral_cortical_age_matched_control_tissue_sample || HCLS:000027 || :HCLS:PD_age_matched_control_tissue_sample :derives_from :NN:cerebral_cortex ||
|| PD_medulla_oblongata_age_matched_control_tissue_sample || HCLS:000046 || :HCLS:PD_age_matched_control_tissue_sample :derives_from :NN:medulla_oblongata ||
|| PD_substantia_nigra_age_matched_control_tissue_sample || HCLS:000047 || :HCLS:PD_age_matched_control_tissue_sample :derives_from :NN:substantia_nigra ||
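The region-specific sample classes above follow a regular disease × region × case/control pattern, so rather than typing them by hand (the page notes this example was built by manual editing), they could be generated. The Python below is a hypothetical sketch that reproduces the post-composition strings in the N3 column for those 18 classes. It does not assign HCLS identifiers, and it hard-codes this page's convention of naming cerebral cortex samples "cerebral_cortical"; `tissue_sample_classes`, `DISEASES`, and `REGIONS` are illustrative names only.

```python
from itertools import product

DISEASES = ["AD", "LBD", "PD"]
# NeuroNames region -> the adjective used in the sample class names on this page.
REGIONS = {
    "cerebral_cortex": "cerebral_cortical",
    "medulla_oblongata": "medulla_oblongata",
    "substantia_nigra": "substantia_nigra",
}


def tissue_sample_classes():
    """Yield (class_name, n3_postcomposition) for every disease/region/case-control combination."""
    for disease, (region, adjective), control in product(DISEASES, REGIONS.items(), (False, True)):
        # Case samples derive from <D>_tissue_sample; controls from <D>_age_matched_control_tissue_sample.
        suffix = "age_matched_control_tissue_sample" if control else "tissue_sample"
        name = f"{disease}_{adjective}_{suffix}"
        genus = f"{disease}_{suffix}"
        yield name, f":HCLS:{genus} :derives_from :NN:{region}"


if __name__ == "__main__":
    for name, n3 in tissue_sample_classes():
        print(name, n3)
```

Generating the classes this way would also make it easier to revisit the control-population question raised in the caveats, since only the genus for the control rows would need to change.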

Pheno ID Entity Quality
AD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype (HCLS:000030) HCLS:UCHL-1_PCR_product^derives_from(AD_cerebral_cortical_tissue_sample) PATO:decrease
PD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype (HCLS:000031) HCLS:UCHL-1_PCR_product^derives_from(PD_cerebral_cortical_tissue_sample) PATO:normal
LBD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype (HCLS:000032) HCLS:UCHL-1_PCR_product^derives_from(LBD_cerebral_cortical_tissue_sample) PATO:decrease
PD_medulla_oblongata_UCHL-1_mRNA_expression_phenotype (HCLS:000048) HCLS:UCHL-1_PCR_product^derives_from(PD_medulla_oblongata_tissue_sample) PATO:decrease
AD_cerebral_cortex_UCHL-1_protein_expression_phenotype (HCLS:000033) HCLS:UCHL-1_western_blot_antigenic_peptide^derives_from(AD_cerebral_cortical_tissue_sample) PATO:decrease
PD_cerebral_cortex_UCHL-1_protein_expression_phenotype (HCLS:000034) HCLS:UCHL-1_western_blot_antigenic_peptide^derives_from(PD_cerebral_cortical_tissue_sample) PATO:normal
LBD_cerebral_cortex_UCHL-1_protein_expression_phenotype (HCLS:000035) HCLS:UCHL-1_western_blot_antigenic_peptide^derives_from(LBD_cerebral_cortical_tissue_sample) PATO:decrease
PD_medulla_oblongata_UCHL-1_protein_expression_phenotype (HCLS:000049) HCLS:UCHL-1_western_blot_antigenic_peptide^derives_from(PD_medulla_oblongata_tissue_sample) PATO:decrease
PD_substantia_nigra_UCHL-1_protein_expression_phenotype (HCLS:000050) HCLS:UCHL-1_western_blot_antigenic_peptide^derives_from(PD_substantia_nigra_tissue_sample) PATO:decrease
LBD_substantia_nigra_UCHL-1_protein_expression_phenotype (HCLS:000051) HCLS:UCHL-1_western_blot_antigenic_peptide^derives_from(LBD_substantia_nigra_tissue_sample) PATO:decrease
AD_cerebral_cortex_UCHL-3_protein_expression_phenotype (HCLS:000055) HCLS:UCHL-3_western_blot_antigenic_peptide^derives_from(AD_cerebral_cortical_tissue_sample) PATO:decrease
PD_cerebral_cortex_UCHL-3_protein_expression_phenotype (HCLS:000056) HCLS:UCHL-3_western_blot_antigenic_peptide^derives_from(PD_cerebral_cortical_tissue_sample) PATO:normal
LBD_cerebral_cortex_UCHL-3_protein_expression_phenotype (HCLS:000057) HCLS:UCHL-3_western_blot_antigenic_peptide^derives_from(LBD_cerebral_cortical_tissue_sample) PATO:decrease
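The Pheno ID labels in this table encode the disease population, brain region, gene product and assay. Pivoting the qualities by disease therefore puts the AD, PD and LBD observations for the same measurement side by side.

The Python sketch below does this by parsing the labels, assuming every label follows the naming pattern used above. `split_label` and `pivot` are hypothetical helpers. A fuller treatment would query the Entity expressions themselves rather than rely on label conventions.

```python
from collections import defaultdict

# (Pheno ID label, Quality) pairs transcribed from the phenotype table above.
ROWS = [
    ("AD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype", "PATO:decrease"),
    ("PD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype", "PATO:normal"),
    ("LBD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype", "PATO:decrease"),
    ("PD_medulla_oblongata_UCHL-1_mRNA_expression_phenotype", "PATO:decrease"),
    ("AD_cerebral_cortex_UCHL-1_protein_expression_phenotype", "PATO:decrease"),
    ("PD_cerebral_cortex_UCHL-1_protein_expression_phenotype", "PATO:normal"),
    ("LBD_cerebral_cortex_UCHL-1_protein_expression_phenotype", "PATO:decrease"),
    ("PD_medulla_oblongata_UCHL-1_protein_expression_phenotype", "PATO:decrease"),
    ("PD_substantia_nigra_UCHL-1_protein_expression_phenotype", "PATO:decrease"),
    ("LBD_substantia_nigra_UCHL-1_protein_expression_phenotype", "PATO:decrease"),
    ("AD_cerebral_cortex_UCHL-3_protein_expression_phenotype", "PATO:decrease"),
    ("PD_cerebral_cortex_UCHL-3_protein_expression_phenotype", "PATO:normal"),
    ("LBD_cerebral_cortex_UCHL-3_protein_expression_phenotype", "PATO:decrease"),
]

DISEASES = ("AD", "PD", "LBD")
REGIONS = ("cerebral_cortex", "medulla_oblongata", "substantia_nigra")


def split_label(label):
    """'AD_cerebral_cortex_UCHL-1_mRNA_expression_phenotype'
    -> ('AD', 'cerebral_cortex', 'UCHL-1', 'mRNA')"""
    disease, rest = label.split("_", 1)
    for region in REGIONS:
        if rest.startswith(region + "_"):
            rest = rest[len(region) + 1:]
            break
    else:
        raise ValueError(f"unknown region in {label!r}")
    marker, assay, _ = rest.split("_", 2)
    return disease, region, marker, assay


def pivot(rows):
    """Map (region, marker, assay) -> {disease: quality}."""
    table = defaultdict(dict)
    for label, quality in rows:
        disease, region, marker, assay = split_label(label)
        table[(region, marker, assay)][disease] = quality
    return table


if __name__ == "__main__":
    for key, by_disease in sorted(pivot(ROWS).items()):
        cells = [by_disease.get(d, "-") for d in DISEASES]
        print(*key, *cells, sep="\t")
```

A "-" in the printed output marks a measurement the table does not report for that population. It is not a negative result.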

Caveats

  • These assertions have been reviewed by Chris Mungall, though I have not yet adopted all of his suggestions. He has not yet seen the complete picture, so I will ask him for more feedback on the content of this page.
  • There are some problematic issues here, but I think this works as a first pass on this application. Please feel free to supply feedback if you believe my use of either the syntax or the semantics doesn't conform with your understanding of the relevant standard.
  • I have done this manually, so there may be typos (please fix any you see or pass them on to me [BB]).
  • There are tools for constructing OBO Phenotype Assertions (e.g., Phenote - see above); however, they are not currently designed to support what I am doing here, as they are much more strictly focused on linking genotype to phenotype. My understanding is that what I'm trying to capture here is within the scope of what PATO & PhenoSyntax are designed to support.
  • There appears to be redundancy among some of the control group classes (e.g., couldn't the same control group be used to derive control tissue specimens from all the relevant brain regions, for comparison against the specimens from each diagnosed disease population?). Answering that requires going back to the paper and reading the methods section. In many cases, a distinct control population will be defined for each distinct disease population. If the control population is in fact re-used, appropriate equivalentClass relations can be established, or the control population can simply be re-used.
  • Some of these entities and relations can be given more formal expressivity if they are moved into OWL-DL (e.g., declaring translates_to and is_translated_from as inverse ObjectProperties via owl:inverseOf, etc.). I don't think that is necessary to build these OBO Phenotype Assertions, but it certainly can and should be done.
  • Problematic Issues
    • A PCR_product is really a bfo:aggregateObject, but hopefully its members form a relatively uniform population, so the shorthand of referring to it as a simple biomaterial entity will suffice.
    • A Western_blot_antigenic_peptide is also a bfo:aggregateObject, but it is much more problematic to treat as a simple biomaterial entity. The level of heterogeneity can be difficult to determine, though generally, if only a single band is derived from the gel, it is probably safe to use this shorthand for now. In reality, especially with polyclonal sera that have not been affinity purified, you can see multiple bands and smeared bands, at which point collapsing this aggregateObject into a single entity conflicts with the observation.
    • Time - oh, don't get me started about time. In defining a certain patient population, there is an implicit temporal boundary that should be declared (e.g., at time T, this group of patients was within age range X-Y). There is also the issue of WHEN the diagnosis of PD, AD, or LBD was made. I believe in this study, PD & AD were diagnosed in living patients, whereas LBD was a post-mortem diagnosis (I believe that's true - must check the paper again). There is also the issue of when the specimen was derived. In this case they were all post-mortem (typical for neurodegenerative disease), but in other diseases, especially cancer, biopsies are very common. Finally, there is the issue of when the actual mRNA & protein extracts were created relative to the harvesting of the tissue. Such temporal details can have significant ramifications for the experimental outcome, and in some studies this temporal dependence will have been tested or controlled for.
    • OBO_RO:derives_from is used quite liberally here. I think that is very problematic because, just as with the mereological part_of relation, derives_from can refer to very different relations in reality depending on the context. In other words, I find the semantic definition of derives_from in need of more specification, a fact which may derive from my own potential misuse or under-specification.
    • Proteasomal subunits are tough to characterize ontologically. The antibodies are raised and purified against specific enzymatic degradation products of isolated proteasomes. The particular products are hard to find, either in a distilled knowledge source such as the IUBMB Enzyme Classification (EC), which contains the category proteasomal endopeptidases, or in data repositories such as GenBank, where you find a litany of specific subunits, few of which are human. It's hard to be certain exactly what entity is being bound by the antibody, so saying anything more specific than 20S proteasomal endopeptidase probably does not truly reflect what is being observed here. The 20SbetaX, 20SbetaY, 19Sbeta, and 11Salpha binding can't easily be mapped back to an entry in UniProt or PDB.
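The OWL-DL caveat above mentions declaring translates_to and is_translated_from as inverse ObjectProperties. In OWL this is an owl:inverseOf axiom, and a reasoner would then infer the reversed assertion whenever one direction is stated.

The Python sketch below imitates only that one inference over plain triples; it is not an OWL toolkit. The `ex:` subjects and objects are hypothetical placeholders, and `with_inverses` is an illustrative helper name.

```python
# Declared inverse pairs (relation names from the caveat above).
INVERSES = {"translates_to": "is_translated_from"}


def with_inverses(triples, inverses):
    """Return the triples plus the reversed triple for every relation that
    has a declared inverse, in either direction (mirroring owl:inverseOf)."""
    both_ways = dict(inverses)
    both_ways.update({inv: rel for rel, inv in inverses.items()})
    closed = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            closed.add((o, both_ways[p], s))
    return closed


if __name__ == "__main__":
    # Hypothetical placeholder individuals, not terms from any ontology.
    asserted = {("ex:UCHL-1_mRNA", "translates_to", "ex:UCHL-1_protein")}
    for triple in sorted(with_inverses(asserted, INVERSES)):
        print(triple)
```

A single pass suffices. The inverse of an added triple is always the triple it came from, so no further passes can add anything new.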

OBO Phenotype Syntax formalism

The following is the BNF definition of the currently proposed OBO Phenotype Syntax formalism (2006-09-05).


phenotype ::= phenotype_typeref? phenotype_character*

phenotype_character ::= description? expressivity? bearer quality*
expressivity ::= 'Expressivity=' float '%'
bearer ::= 'E=' typeref
quality ::= 'Q=' typeref count? related_entity* on_condition* in_comparison_to* measurement* temporal_qualifier* modifier*
count ::= 'C=' integer
related_entity ::= 'E2=' typeref
on_condition ::= 'On=' typeref
in_comparison_to ::= 'Compar=' typeref comparison_target?
measurement ::= 'M=' float unit
temporal_qualifier ::= 'T=' qualifier
modifier ::= 'Tag=' id
phenotype_typeref ::= 'P=' typeref

comparison_target ::= '{' relation? typeref '}'

description ::= 'Desc=' '"' text '"'

relation ::= typeref
typeref ::= id conjunction*
conjunction ::= '^' qualifier
qualifier ::= relation '(' typeref ')'

id ::= prefix ':' local_id
prefix ::= word
local_id ::= word
unit ::= word
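The sketch below is a small recursive-descent parser for part of this grammar. It covers `typeref ::= id conjunction*`, `conjunction ::= '^' qualifier` and `qualifier ::= relation '(' typeref ')'`, plus a `phenotype_character` made of one `E=` bearer and any number of `Q=` qualities.

It simplifies in three ways:
- The other quality modifiers (C=, E2=, On=, Compar=, M=, T=, Tag=) are not handled.
- Desc= and Expressivity= are not handled.
- `relation` is parsed as a bare id, although the grammar allows a full typeref.

Because `id ::= prefix ':' local_id`, the parser rejects unprefixed ids such as the bare `derives_from(...)` shorthand in the phenotype table. `Parser`, `TypeRef` and `parse_character` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# id ::= prefix ':' local_id  (word characters, '.' and '-' allowed in each part)
ID = re.compile(r"[\w.\-]+:[\w.\-]+")


@dataclass
class TypeRef:
    id: str
    # Each entry is (relation id, filler TypeRef) from '^' relation '(' typeref ')'.
    conjunctions: list = field(default_factory=list)


class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def _skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def _expect(self, literal):
        if not self.text.startswith(literal, self.pos):
            raise ValueError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def _id(self):
        m = ID.match(self.text, self.pos)
        if not m:
            raise ValueError(f"expected prefix:local_id at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def typeref(self):
        ref = TypeRef(self._id())
        while self.text.startswith("^", self.pos):
            self.pos += 1
            relation = self._id()  # simplification: relation is a bare id
            self._expect("(")
            filler = self.typeref()  # recursion allows nested differentia
            self._expect(")")
            ref.conjunctions.append((relation, filler))
        return ref

    def phenotype_character(self):
        self._skip_ws()
        self._expect("E=")
        bearer = self.typeref()
        qualities = []
        self._skip_ws()
        while self.text.startswith("Q=", self.pos):
            self.pos += 2
            qualities.append(self.typeref())
            self._skip_ws()
        return bearer, qualities


def parse_character(text):
    """Parse 'E=<typeref> Q=<typeref> ...' and reject trailing input."""
    parser = Parser(text)
    result = parser.phenotype_character()
    if parser.pos != len(text):
        raise ValueError(f"unparsed input at offset {parser.pos}")
    return result


if __name__ == "__main__":
    bearer, qualities = parse_character(
        "E=HCLS:UCHL-1_PCR_product"
        "^OBO_RO:derives_from(HCLS:AD_cerebral_cortical_tissue_sample)"
        " Q=PATO:decrease"
    )
    print(bearer)
    print(qualities)
```

In the usage example, the prefixes `HCLS:` and `OBO_RO:` are added by hand to the first row of the phenotype table so that it conforms to the grammar.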

