HCLSIG BioRDF Subgroup/QueryFederation2

From W3C Wiki

Microarray Use Case

Editors: KeiCheung MScottMarshall


  • There are benefits to making metadata about microarray experiments available: reproducibility, cross-comparison, and integration
  • Making such metadata accessible to machines in a standard format (Semantic Web) better enables computers to help researchers discover, correlate, and integrate data generated from different but related experiments
  • Including biologically significant gene lists and their context as part of the standard representation format would increase the benefit

This use case explores the federation of microarray data and related data using the Semantic Web. Some areas of interest include the following:

Currently, we are identifying the types of metadata (contextual information) that are needed to describe the experiments, samples, and gene lists in a way that is useful to domain scientists. In doing this, we are also identifying ontologies that contain the relevant terms and relationships. This use case may provide an opportunity to facilitate interaction and collaboration among different communities (e.g., Semantic Web, ontology, neuroscience, and microarray).


Below are the citations and abstracts of four microarray experiments found in the NIH Neuroscience Microarray Consortium.

Dunckley T, Beach TG, Ramsey KE, Grover A, Mastroeni D, Walker DG, Lafleur BJ, Coon KD, Brown KM, Caselli R, Kukull W, Higdon R, McKeel D, Morris JC, Hulette C, Schmechel D, Reiman EM, Rogers J, Stephan DA., Gene expression correlates of neurofibrillary tangles in Alzheimer's disease; Neurobiol Aging, 2005, 27(10):1359-71. (pubmed link; Gene List)

Neurofibrillary tangles (NFT) constitute one of the cardinal histopathological features of Alzheimer's disease (AD). To explore in vivo molecular processes involved in the development of NFTs, we compared gene expression profiles of NFT-bearing entorhinal cortex neurons from 19 AD patients, adjacent non-NFT-bearing entorhinal cortex neurons from the same patients, and non-NFT-bearing entorhinal cortex neurons from 14 non-demented, histopathologically normal controls (ND). Of the differentially expressed genes, 225 showed progressively increased expression (AD NFT neurons > AD non-NFT neurons > ND non-NFT neurons) or progressively decreased expression (AD NFT neurons < AD non-NFT neurons < ND non-NFT neurons), raising the possibility that they may be related to the early stages of NFT formation. Immunohistochemical studies confirmed that many of the implicated proteins are dysregulated and preferentially localized to NFTs, including apolipoprotein J, interleukin-1 receptor-associated kinase 1, tissue inhibitor of metalloproteinase 3, and casein kinase 2, beta. Functional validation studies are underway to determine which candidate genes may be causally related to NFT neuropathology, thus providing therapeutic targets for the treatment of AD.

Liang WS, Dunckley T, Beach TG, Grover A, Mastroeni D, Walker DG, Caselli RJ, Kukull WA, McKeel D, Morris JC, Hulette C, Schmechel D, Alexander GE, Reiman EM, Rogers J, Stephan DA., Gene expression profiles in anatomically and functionally distinct regions of the normal aged brain; Physiological Genomics, 2007, 3; 28:311-322. (pubmed link)

In this article, we have characterized and compared gene expression profiles from laser capture microdissected neurons in six functionally and anatomically distinct regions from clinically and histopathologically normal aged human brains. These regions, which are also known to be differentially vulnerable to the histopathological and metabolic features of Alzheimer's disease (AD), include the entorhinal cortex and hippocampus (limbic and paralimbic areas vulnerable to early neurofibrillary tangle pathology in AD), posterior cingulate cortex (a paralimbic area vulnerable to early metabolic abnormalities in AD), temporal and prefrontal cortex (unimodal and heteromodal sensory association areas vulnerable to early neuritic plaque pathology in AD), and primary visual cortex (a primary sensory area relatively spared in early AD). These neuronal profiles will provide valuable reference information for future studies of the brain, in normal aging, AD and other neurological and psychiatric disorders.

Liang WS, Reiman EM, Valla J, Dunckley T, Beach TG, Grover A, Niedzielko TL, Schneider LE, Mastroeni D, Caselli R, Kukull W, Morris JC, Hulette CM, Schmechel D, Rogers J, Stephan DA., Alzheimer's disease is associated with reduced expression of energy metabolism genes in posterior cingulate neurons; Proc Natl Acad Sci U S A., 2008, 11; 105:4441-6. (pubmed link)

Alzheimer's disease (AD) is associated with regional reductions in fluorodeoxyglucose positron emission tomography (FDG PET) measurements of the cerebral metabolic rate for glucose, which may begin long before the onset of histopathological or clinical features, especially in carriers of a common AD susceptibility gene. Molecular evaluation of cells from metabolically affected brain regions could provide new information about the pathogenesis of AD and new targets at which to aim disease-slowing and prevention therapies. Data from a genome-wide transcriptomic study were used to compare the expression of 80 metabolically relevant nuclear genes from laser-capture microdissected non-tangle-bearing neurons from autopsy brains of AD cases and normal controls in posterior cingulate cortex, which is metabolically affected in the earliest stages; other brain regions metabolically affected in PET studies of AD or normal aging; and visual cortex, which is relatively spared. Compared with controls, AD cases had significantly lower expression of 70% of the nuclear genes encoding subunits of the mitochondrial electron transport chain in posterior cingulate cortex, 65% of those in the middle temporal gyrus, 61% of those in hippocampal CA1, 23% of those in entorhinal cortex, 16% of those in visual cortex, and 5% of those in the superior frontal gyrus. Western blots confirmed underexpression of those complex I-V subunits assessed at the protein level. Cerebral metabolic rate for glucose abnormalities in FDG PET studies of AD may be associated with reduced neuronal expression of nuclear genes encoding subunits of the mitochondrial electron transport chain.

Greene JG, Dingledine R, and Greenamyre JT, Gene expression profiling of rat midbrain dopamine neurons: implications for selective vulnerability in parkinsonism; Neurobiol Dis, 2005, February:18(1): 19-31. (pubmed link)

To elucidate factors related to selective dopamine neuron degeneration in Parkinson's disease (PD), we have defined gene expression profiles of discrete dopamine neuron subpopulations in the rat using immunofluorescent laser capture microscopy and microarray analysis. Although profiles were remarkably similar, there are concerted categorical differences in gene expression between dopamine neurons that might explain their differential susceptibility. As a group, energy metabolism transcripts are more highly expressed in substantia nigra (SN) dopamine neurons, an intriguing result considering previous evidence for a mitochondrial defect in idiopathic PD and the greater susceptibility of SN dopamine neurons to damage by mitochondrial poisons. Examination of putative transcription factor binding sites suggests that these concerted differences may be related to differential activity of specific transcription factors. These results provide the first large scale description of gene expression profiles of dopamine neurons and suggest several avenues for investigation into dopaminergic neuroprotective therapy for PD.


Below are some representative concepts/terms that are found in the above examples.

  • Disease (e.g., AD, PD)
  • Neuron (e.g., dopamine neuron)
  • Brain region (e.g., hippocampus, posterior cingulate cortex, visual cortex)
  • Brain function (e.g., unimodal and heteromodal sensory association)
  • Species (e.g., human)
  • Experimental factor (e.g., normal vs. AD)
  • Sample extraction method (e.g., laser-capture microdissection)
  • Proteins (e.g., NFT)
  • Genes and their annotation (e.g., gene lists)
  • Biological process (e.g., energy metabolism)
  • Cellular component (e.g., mitochondrial electron transport chain)

These concepts/terms are defined in different ontologies/vocabularies. Using these concepts/terms and their relationships, one can discover semantically (biologically) related experiments so that integrative analyses of the associated datasets can be performed. Researchers may also want to know which genes (or which types of genes) have been found to be significantly expressed under certain experimental conditions. Below are several example queries:

  • Find experiments involving the same disease/phenotype, brain region, and species
  • Find experiments that involve AD patients in certain disease states (e.g., mild vs. severe AD patients)
  • Find particular types of genes that are expressed in a given brain region
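The first query above can be sketched in plain Python, assuming each experiment has already been annotated with its disease, brain regions, and species. The metadata below was transcribed by hand from the four abstracts; the dictionary keys (such as `Dunckley2005`) and the function name `find_experiments` are hypothetical labels chosen for illustration, not identifiers used by the consortium.

```python
# Hand-transcribed metadata from the abstracts above. The experiment keys
# are hypothetical labels, not real accession numbers.
EXPERIMENTS = {
    "Dunckley2005": {
        "diseases": {"AD"},
        "species": "human",
        "brain_regions": {"entorhinal cortex"},
    },
    "Liang2007": {
        "diseases": set(),  # clinically and histopathologically normal aged brains
        "species": "human",
        "brain_regions": {
            "entorhinal cortex", "hippocampus", "posterior cingulate cortex",
            "temporal cortex", "prefrontal cortex", "primary visual cortex",
        },
    },
    "Liang2008": {
        "diseases": {"AD"},
        "species": "human",
        "brain_regions": {
            "posterior cingulate cortex", "middle temporal gyrus",
            "hippocampal CA1", "entorhinal cortex", "visual cortex",
            "superior frontal gyrus",
        },
    },
    "Greene2005": {
        "diseases": {"PD"},
        "species": "rat",
        "brain_regions": {"midbrain", "substantia nigra"},
    },
}


def find_experiments(experiments, disease, brain_region, species):
    """Return sorted names of experiments matching all three criteria."""
    return sorted(
        name
        for name, meta in experiments.items()
        if disease in meta["diseases"]
        and brain_region in meta["brain_regions"]
        and meta["species"] == species
    )


matches = find_experiments(EXPERIMENTS, "AD", "entorhinal cortex", "human")
print(matches)
```

Exact string matching is the simplest possible comparison; a real federation would compare ontology terms rather than free-text labels.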

Additional example queries (2010-April-4) include:

  • Find gene lists that involve AD, pyramidal neuron, mouse, and hippocampus
  • Find genes that are differentially expressed between the hippocampus region (HIP) and the entorhinal cortex region (EC) (e.g., average signal in HIP/average signal in EC > X)
  • Extend the previous query by focusing on genes that are involved in certain pathways, functions, etc.
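The ratio criterion in the second query (average signal in HIP divided by average signal in EC greater than X) can be sketched as follows. This is a minimal illustration only: the function name `differentially_expressed`, the data layout, and the signal values are all assumptions made up for the example, and a real analysis would normally add normalization and a statistical test.

```python
from statistics import mean


def differentially_expressed(signals, region_a, region_b, threshold):
    """Return {gene: ratio} where mean(region_a) / mean(region_b) > threshold.

    `signals` maps a gene name to a dict of region -> list of signal values.
    Genes lacking data for either region, or with a non-positive mean in
    region_b, are skipped.
    """
    hits = {}
    for gene, by_region in signals.items():
        values_a = by_region.get(region_a)
        values_b = by_region.get(region_b)
        if not values_a or not values_b:
            continue
        mean_b = mean(values_b)
        if mean_b <= 0:
            continue
        ratio = mean(values_a) / mean_b
        if ratio > threshold:
            hits[gene] = ratio
    return hits


# Made-up signal values, for illustration only.
SIGNALS = {
    "geneX": {"HIP": [200.0, 220.0], "EC": [100.0, 110.0]},
    "geneY": {"HIP": [50.0, 50.0], "EC": [100.0, 100.0]},
    "geneZ": {"HIP": [10.0]},  # no EC measurement, so it is skipped
}

hits = differentially_expressed(SIGNALS, "HIP", "EC", threshold=1.5)
print(hits)
```

The third query would then restrict `hits` to genes annotated with a given pathway or function.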


Example Queries

  • Find over-expressed gene lists from experiments that involve memory-related brain regions from AD-related patients/samples
  • Find differentially expressed genes between hippocampus and entorhinal cortex for AD or aged subjects
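A query phrased in terms of a broad concept such as "memory-related brain regions" first has to be expanded into the specific regions annotated on each experiment. One way to sketch that expansion is a transitive walk over subclass edges. The hierarchy below is hypothetical and exists only to make the example run; a real ontology may use different terms and may relate some of these regions by part-of rather than subclass.

```python
def descendants(sub_class_of, root):
    """Return `root` plus every term reachable from it via subclass edges.

    `sub_class_of` is an iterable of (child, parent) pairs.
    """
    children = {}
    for child, parent in sub_class_of:
        children.setdefault(parent, set()).add(child)

    seen = {root}
    stack = [root]
    while stack:
        term = stack.pop()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical hierarchy, not taken from any published ontology.
SUB_CLASS_OF = [
    ("hippocampus", "memory-related brain region"),
    ("entorhinal cortex", "memory-related brain region"),
    ("hippocampal CA1", "hippocampus"),
    ("primary visual cortex", "sensory brain region"),
]

regions = descendants(SUB_CLASS_OF, "memory-related brain region")
print(sorted(regions))
```

The expanded set can then be matched against the brain-region annotations of each experiment or gene list.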

RDF Structure

We aim to have a simple ontology to describe the example gene lists published at:

  • A tentative RDF graph by Jun on April 4, 2010. [2]
  • A more recent RDF graph by Lena on June 7, 2010. rdf-0607

This is a tentative first draft of a candidate gene list template as represented in RDF.

# Eric's example - a doggy with bad breath
# The namespace IRIs below are placeholders (assumptions); the draft does not declare its prefixes.

@prefix :         <http://example.org/terms#> .
@prefix biordf:   <http://example.org/biordf#> .
@prefix provenir: <http://example.org/provenir#> .
@prefix mged:     <http://example.org/mged#> .
@prefix affy:     <http://example.org/affy#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .

# Apostrophes in local names must be escaped in Turtle
<experiment1> provenir:has_parameter <sample1> .
<sample1> provenir:derivedFrom <NiceDoggy> ; biordf:disease :Alzheimer\'s ; biordf:diseaseStage :earlyAlzheimer\'s .

# As was agreed, disease should be a subClassOf healthState; also, biordf:disease may have stages, for example a cancer stage
biordf:disease rdfs:subClassOf biordf:healthState .

# The differential gene list should derive from the raw data files generated as a result of experiment1:

<experiment1> provenir:data_collection <RawArrayFilesExp1> .
<differentialGeneList1> biordf:computed_from <RawArrayFilesExp1> .
# "members" is unprefixed in the draft; the biordf: prefix is assumed here
<differentialGeneList1> biordf:members (<geneX> <geneY>) .

# Annotation of RawArrayFilesExp1; relevant when the array platform used in each raw data file is not the same within the same experiment

<RawArrayFilesExp1> biordf:members (<ABC1.CEL> <ABC2.CEL>) .  # ... (further files elided in the draft)
<ABC1.CEL> mged:ArrayGroup affy:U133A .
<ABC2.CEL> mged:ArrayGroup affy:U133plus2.0 .

# Annotation of individual genes in the gene list; since this is in the context of an experiment, indicating whether they were over- or under-expressed is important

<geneX> biordf:geneLabel "UBC" ; biordf:gene_annotation_link <http://www.ncbi.nlm.nih.gov/gene/7316> ;
                                 biordf:expressedHumanProtein <http://www.uniprot.org/uniprot/P63279>, <http://www.uniprot.org/uniprot/P61081> ;
                                 biordf:gene_expression_value_context :overexpressed ;
                                 biordf:associatedDisease <http://trustworthy_known_disease_database/Alzheimer> .

<geneY> biordf:geneLabel "VDAC3" ;  biordf:gene_annotation_link <http://www.ncbi.nlm.nih.gov/gene/7419> .
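To show how the draft structure could answer a question such as "which genes were over-expressed in experiment1?", the sketch below copies a subset of the statements into plain Python tuples and follows the chain from experiment to raw data to gene list to gene. It is an illustration under stated assumptions, not a SPARQL implementation: the helper names are hypothetical, predicate names are kept as plain strings, and the membership predicate is written as `biordf:members` even though the draft gives it without a prefix.

```python
# A subset of the draft's statements as (subject, predicate, object) tuples.
# The draft writes the membership predicate without a prefix;
# "biordf:members" is an assumption made for this sketch.
TRIPLES = {
    ("experiment1", "provenir:data_collection", "RawArrayFilesExp1"),
    ("differentialGeneList1", "biordf:computed_from", "RawArrayFilesExp1"),
    ("differentialGeneList1", "biordf:members", "geneX"),
    ("differentialGeneList1", "biordf:members", "geneY"),
    ("geneX", "biordf:geneLabel", "UBC"),
    ("geneX", "biordf:gene_expression_value_context", ":overexpressed"),
    ("geneY", "biordf:geneLabel", "VDAC3"),
}


def objects(triples, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """Return all subjects of triples with the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def overexpressed_gene_labels(triples, experiment):
    """Follow experiment -> raw data -> gene list -> gene, keeping only
    genes marked as over-expressed, and return their labels sorted."""
    labels = set()
    for raw in objects(triples, experiment, "provenir:data_collection"):
        for gene_list in subjects(triples, "biordf:computed_from", raw):
            for gene in objects(triples, gene_list, "biordf:members"):
                context = objects(
                    triples, gene, "biordf:gene_expression_value_context"
                )
                if ":overexpressed" in context:
                    labels |= objects(triples, gene, "biordf:geneLabel")
    return sorted(labels)


labels = overexpressed_gene_labels(TRIPLES, "experiment1")
print(labels)
```

In the draft only geneX carries an expression context, so geneY is not returned even though it belongs to the same gene list.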

TODO Items

  • prototype genelists in MAGE-TAB
  • genelist RDF from GeneAtlas (in JSON)?
  • compare RDF to microarray RDF in Pistoia SESL project
  • consider how to merge RDF representation approach with that used by Sudeshna Das in Scientific Discourse
  • From Jun at ISWC:
    • Have we considered the IS-A model? Yes, we did, but I can't remember why we didn't use it for our institutional-level provenance in the end.
    • Have we considered providing more information about the patients (from Bosse)? We only describe the kind of disease or the brain region of the patients, but nothing more detailed, such as age. Bosse is interested in collaborating on this, and Pfizer also has some interest in it.
    • Generally, people are very interested in seeing whether we can map our model to existing core provenance models, such as OPM, and in identifying any gaps in the current provenance models as feedback to the provenance community.

Related Links

Old version of this use case