From W3C Wiki

HCLS Demo SPARQL Example

The following is a step-by-step explanation of the Banff 2007 Demo's SPARQL query, which asks: “Can we find candidate genes known to be involved in signal transduction and active in pyramidal neurons?”

This is a work in progress. Please feel free to add your comments. Donald Doherty


PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX mesh: <http://purl.org/commons/record/mesh/>
PREFIX sc: <http://purl.org/science/owl/sciencecommons/>
PREFIX ro: <http://www.obofoundry.org/ro/ro.owl#>

SELECT DISTINCT ?genename ?processname
WHERE
{ GRAPH <http://purl.org/commons/hcls/pubmesh>
  { ?paper ?p mesh:D017966 .
    ?article sc:identified_by_pmid ?paper.
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
  }
  GRAPH <http://purl.org/commons/hcls/goa>
  { ?protein rdfs:subClassOf ?res.
    ?res owl:onProperty ro:has_function.
    ?res owl:someValuesFrom ?res2.
    ?res2 owl:onProperty ro:realized_as.
    ?res2 owl:someValuesFrom ?process.
    GRAPH <http://purl.org/commons/hcls/20070416/classrelations>
    { { ?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166 }
      UNION
      { ?process rdfs:subClassOf go:GO_0007166 }
    }
    ?protein rdfs:subClassOf ?parent.
    ?parent owl:equivalentClass ?res3.
    ?res3 owl:hasValue ?gene.
  }
  GRAPH <http://purl.org/commons/hcls/gene>
  { ?gene rdfs:label ?genename }
  GRAPH <http://purl.org/commons/hcls/20070416>
  { ?process rdfs:label ?processname }
}

The SPARQL query begins by declaring prefixes for the go (Gene Ontology), rdfs (RDF Schema), owl (Web Ontology Language), mesh (MeSH Ontology), sc (Science Commons Ontology), and ro (OBO Relation Ontology) namespaces, so that terms from these namespaces can be written in abbreviated form.

PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX mesh: <http://purl.org/commons/record/mesh/>
PREFIX sc: <http://purl.org/science/owl/sciencecommons/>
PREFIX ro: <http://www.obofoundry.org/ro/ro.owl#>
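
A prefixed name is shorthand for a full IRI: the prefix is replaced by its namespace IRI and the local part is appended. As a minimal sketch, the two patterns below say exactly the same thing, once in prefixed form and once written out in full. The variable name ?child is only illustrative and is not part of the demo query, and the query as written is matched against the default graph, which may be empty in a store that keeps its data in named graphs.

```sparql
PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?child
WHERE
{
  # Prefixed form, as used in the demo query
  ?child rdfs:subClassOf go:GO_0007166 .
  # The same pattern with both prefixed names expanded to full IRIs
  ?child <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.org/obo/owl/GO#GO_0007166> .
}
```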

The SELECT clause names the two variables, genename and processname, whose bindings are returned as the query results. It includes the DISTINCT modifier, which removes duplicate rows so that each combination of gene name and process name is returned only once.

SELECT DISTINCT ?genename ?processname

The WHERE clause contains four GRAPH clauses at its top level (pubmesh, goa, gene, and 20070416). A fifth GRAPH clause, classrelations, is nested inside the goa clause. Each GRAPH clause names the graph against which the patterns inside its braces are matched.

WHERE
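
GRAPH also accepts a variable in place of a graph IRI. As a sketch for checking which named graphs an endpoint holds before running the demo query, the following lists them. Whether a given store still exposes the HCLS graphs is an assumption, the variable names ?g, ?s, ?p, and ?o are illustrative, and the query can be slow on a large store.

```sparql
SELECT DISTINCT ?g
WHERE
{ # ?g is bound to the IRI of every named graph that contains at least one triple
  GRAPH ?g { ?s ?p ?o }
}
```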

The first GRAPH clause says that the following matches will be performed on the http://purl.org/commons/hcls/pubmesh graph. The HCLS PubMeSH graph contains the 2007 MeSH-to-Medline associations, Medline titles and dates, and gene-to-paper links.

{ GRAPH <http://purl.org/commons/hcls/pubmesh>

All triples with objects that match mesh:D017966 (MeSH Unique ID for Pyramidal Cells) are selected and have their subjects assigned to the paper variable and their properties assigned to the p variable. (Select all papers that have something to do with Pyramidal Cells.)

   { ?paper ?p mesh:D017966 .
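
Because the property is left as the variable p, this pattern matches any kind of link to the MeSH term. A minimal sketch for inspecting which properties those are, assuming the pubmesh graph is loaded at the endpoint being queried:

```sparql
PREFIX mesh: <http://purl.org/commons/record/mesh/>

SELECT DISTINCT ?p
WHERE
{ GRAPH <http://purl.org/commons/hcls/pubmesh>
  { # Same pattern as in the demo query; only the property is reported
    ?paper ?p mesh:D017966 .
  }
}
```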

Next a match is carried out on triples associated with the RDF terms assigned to the paper variable. This time the RDF terms assigned to paper are treated as objects. Only those triples including an RDF term assigned to paper in the object position and an sc:identified_by_pmid in the property position are selected and their RDF terms in the subject position are assigned to the article variable. (Of those papers dealing with Pyramidal Cells, only select those associated with a published paper listed on PubMed.)

      ?article sc:identified_by_pmid ?paper.

A match is carried out on the triples associated with the RDF terms assigned to the article variable. The RDF terms assigned to article are treated as objects. Only those triples including an RDF term assigned to article in the object position and an sc:describes_gene_or_gene_product_mentioned_by in the property position are selected and their RDF terms in the subject position are assigned to the gene variable. (Find all genes and gene products that are associated with Pyramidal Cells and are published in papers found on PubMed.)

      ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
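
Taken together, the three patterns in the pubmesh graph can be run on their own to inspect the intermediate list of genes. This is a sketch rather than part of the demo: the LIMIT of 10 is an arbitrary choice to keep the output short.

```sparql
PREFIX mesh: <http://purl.org/commons/record/mesh/>
PREFIX sc: <http://purl.org/science/owl/sciencecommons/>

SELECT DISTINCT ?gene
WHERE
{ GRAPH <http://purl.org/commons/hcls/pubmesh>
  { ?paper ?p mesh:D017966 .                                         # linked to Pyramidal Cells
    ?article sc:identified_by_pmid ?paper .                          # article identified by that record
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article .  # gene mentioned by the article
  }
}
LIMIT 10
```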

The second GRAPH clause says that the following matches will be performed on the http://purl.org/commons/hcls/goa graph. The HCLS GOA graph contains the Gene Ontology annotations related to Entrez gene identifiers.

    GRAPH <http://purl.org/commons/hcls/goa>

All triples with properties that match rdfs:subClassOf are selected and their subjects are assigned to the protein variable and their objects are assigned to the res variable. (Select all proteins that are a subclass of a resource.)

    { ?protein rdfs:subClassOf ?res.

A match is carried out on triples associated with the RDF terms assigned to the res variable. The RDF terms assigned to res are treated as subjects. Only those triples including an RDF term assigned to res in the subject position, owl:onProperty in the property position, and ro:has_function in the object position are selected. (Of those RDF terms that are subclasses of their objects, only select those instances with stated functions.)

Note that owl:onProperty is part of an OWL property restriction: it names the property, here ro:has_function, that the restriction class assigned to res constrains.

       ?res owl:onProperty ro:has_function.

Again, a match is carried out on triples associated with the RDF terms assigned to the res variable. The RDF terms assigned to res are treated as subjects. Only those triples including an RDF term assigned to res in the subject position and owl:someValuesFrom in the property position are selected. The objects of the selected triples are assigned to res2. (Of those RDF terms that are subclasses of their objects and have stated functions, only select those for which at least one value of the function property belongs to the class assigned to res2.)

       ?res owl:someValuesFrom ?res2.
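
The two patterns on res together describe an OWL existential restriction: a class whose members have at least one ro:has_function value belonging to the class bound to res2. A sketch that isolates this step, assuming the goa graph is available at the endpoint; the LIMIT is arbitrary:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ro: <http://www.obofoundry.org/ro/ro.owl#>

SELECT ?protein ?res2
WHERE
{ GRAPH <http://purl.org/commons/hcls/goa>
  { ?protein rdfs:subClassOf ?res .        # protein class is a subclass of ?res
    ?res owl:onProperty ro:has_function .  # ?res is a restriction on ro:has_function
    ?res owl:someValuesFrom ?res2 .        # at least one value comes from ?res2
  }
}
LIMIT 10
```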

This time a match is carried out on triples associated with the RDF terms assigned to the res2 variable. The RDF terms assigned to res2 are treated as subjects. Only those triples including an RDF term assigned to res2 in the subject position, owl:onProperty in the property position, and ro:realized_as in the object position are selected. (Of those RDF terms that are subclasses of their objects, have stated functions, and at least one instance of the property value, only select those whose function is itself a restriction on the ro:realized_as property.)

       ?res2 owl:onProperty ro:realized_as.

Again, a match is carried out on triples associated with the RDF terms assigned to the res2 variable. The RDF terms assigned to res2 are treated as subjects. Only those triples including an RDF term assigned to res2 in the subject position and owl:someValuesFrom in the property position are selected. The objects of the selected triples are assigned to process. (Select the processes in which those functions are realized: at least one realization of the function belongs to the class assigned to process.)

       ?res2 owl:someValuesFrom ?process.
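
The five patterns from the goa graph covered so far form a chain of two nested restrictions. When the intermediate nodes are not needed elsewhere, SPARQL's bracket syntax for blank nodes can express the same chain without naming res and res2. The following is a sketch of that alternative spelling, not the form used in the demo, and the LIMIT is arbitrary.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ro: <http://www.obofoundry.org/ro/ro.owl#>

SELECT DISTINCT ?protein ?process
WHERE
{ GRAPH <http://purl.org/commons/hcls/goa>
  { # Outer brackets stand in for ?res, inner brackets for ?res2
    ?protein rdfs:subClassOf
      [ owl:onProperty ro:has_function ;
        owl:someValuesFrom
          [ owl:onProperty ro:realized_as ;
            owl:someValuesFrom ?process ] ] .
  }
}
LIMIT 10
```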


TO BE CONTINUED!