slanted W3C logo
Cover page images (keys)

SPARQL Access to SQL Databases

Eric Prud'hommeaux
Sanitation Engineer, W3C
SPARQL, HCLS, RDB2RDF Working Groups

Lee Feigenbaum
VP Technology & Standards, Cambridge Semantics
Co-Chair, W3C SPARQL Working Group


Valid XHTML + RDFa

Introductions/agenda bashing

What's in it for us?

most of the world's data
is in documents

machine-useful data
is in RDBs

today's questions
link RDBs

books

Value proposition for SQL as RDF

Mashups become queries.

SERVICE <http://uu3.org:8888/7tm_receptors>
  {
    ?iuphar  iface:family      ?family .
    ?iuphar  iface:code	       ?code .
    ?iuphar  iface:iupharName  ?iupharNm .
    ?human   iface:iuphar      ?iuphar .
    ?human   iface:geneName    "GABBR1" .
    ?human   iface:entrezGene  ?humanEntrez .
  }
SERVICE <http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&should-sponge=&>

  {
     _:gene  dbp:entrezgene    ?humanEntrez ;
             rdfs:label        ?label ;
    FILTER (lang(?label) = "en")
  }
SERVICE <http://hcls.deri.org/atag/data/gabab_example.html>
  {
     ?topic  rdfs:label        ?label .
     ?post   sioc:topic        ?topic
  }

Data Browsers/Integrators

RDBs as RDF

Goal: put a relational database on the semantic web.

RDF Data Model

Database to RDF

[unrendered SVG image of HL7 data in RDF]

The Direct Mapping represents a relational database as an RDF graph.

[unrendered SVG image of HL7 data in RDF]

Database to RDF

[unrendered SVG image of HL7 data in RDF]

[unrendered SVG image of HL7 data in RDF]

Database to RDF

[unrendered SVG image of HL7 data in RDF]

[unrendered SVG image of HL7 data in RDF]

Database to RDF

[unrendered SVG image of HL7 data in RDF]

[unrendered SVG image of HL7 data in RDF]

Database to RDF

[unrendered SVG image of HL7 data in RDF]

[unrendered SVG image of HL7 data in RDF]

Scripting

spaghetti

Direct Mapping

Employee
IDfnameaddr
7Bob18
8SueNULL
Address
IDcitystate
18CambridgeMA

Input:

Output:

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
      

Direct Graph

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
      

Brief intro to SPARQL

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
	

Get the name and city for
each person living in MA:

SELECT ?name ?city
 WHERE { ?who <Employee#fname> ?name .
         ?who <Employee#addr>  ?adr .
         ?adr <Address#city>   ?city .
         ?adr <Address#state>  "MA" }

Stepping through a query match

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
	

Get the name and city for
each person living in MA:

SELECT ?name ?city
 WHERE { ?who <Employee#fname> ?name .
         ?who <Employee#addr>  ?adr .
         ?adr <Address#city>   ?city .
         ?adr <Address#state>  "MA" }

Stepping through a query match

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
	

Get the name and city for
each person living in MA:

SELECT ?name ?city
 WHERE { ?who <Employee#fname> ?name .
         ?who <Employee#addr>  ?adr .
         ?adr <Address#city>   ?city .
         ?adr <Address#state>  "MA" }

first triple produces two solutions:

Employee
?who?name
<Employee/ID-7>"Bob"
<Employee/ID-8>"Sue"

Stepping through a query match

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
	

Get the name and city for
each person living in MA:

SELECT ?name ?city
 WHERE { ?who <Employee#fname> ?name .
         ?who <Employee#addr>  ?adr .
         ?adr <Address#city>   ?city .
         ?adr <Address#state>  "MA" }

second triple eliminates a solution:

Employee
?who?name?adr
<Employee/ID-7>"Bob"<Address/ID-18>
<Employee/ID-8>"Sue"

Stepping through a query match

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
	

Get the name and city for
each person living in MA:

SELECT ?name ?city
 WHERE { ?who <Employee#fname> ?name .
         ?who <Employee#addr>  ?adr .
         ?adr <Address#city>   ?city .
         ?adr <Address#state>  "MA" }

third triple binds ?city:

Employee
?who?name?adr?city
<Employee/ID-7>"Bob"<Address/ID-18>"Cambridge"

Stepping through a query match

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
	

Get the name and city for
each person living in MA:

SELECT ?name ?city
 WHERE { ?who <Employee#fname> ?name .
         ?who <Employee#addr>  ?adr .
         ?adr <Address#city>   ?city ;
         ?adr <Address#state>  "MA" }

fourth triple tests a constant:

Employee
?who?name?adr?city
<Employee/ID-7>"Bob"<Address/ID-18>"Cambridge"

Stepping through a query match

<Employee/ID-7> rdf:type <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> rdf:type <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .

<Address/ID-18> rdf:type <Address> .
<Address/ID-18> <Address#ID> 18 .
<Address/ID-18> <Address#city> "Cambridge" .
<Address/ID-18> <Address#state> "MA" .
	

Get the name and city for
each person living in MA:

SELECT ?name ?city
 WHERE { ?who <Employee#fname> ?name .
         ?who <Employee#addr>  ?adr .
         ?adr <Address#city>   ?city .
         ?adr <Address#state>  "MA" }

project requested variables:

Employee
?name?city
"Bob""Cambridge"

Web of Data - optimistic view

http://myco.example/allDBs/

<Employee/ID-7> a          <Employee> .
<Employee/ID-7> <Employee#ID> 7 .
<Employee/ID-7> <Employee#fname> "Bob" .
<Employee/ID-7> <Employee#addr> 18 .
<Employee/ID-7> <Employee#ref-addr> <Address/ID-18> .
<Employee/ID-8> a          <Employee> .
<Employee/ID-8> <Employee#ID> 8 .
<Employee/ID-8> <Employee#fname> "Sue" .
	

http://myco.example/allDBs/

<Expense/ID-3> a                  <Expense> .
<Expense/ID-3> <Expense#ID>       3 .
<Expense/ID-3> <Expense#amount>   30.01 .
<Expense/ID-3> <Expense#payee>    7 .
<Expense/ID-3> <Expense#ref-payee> <Employee/ID-7> .

<Expense/ID-4> a                  <Expense> .
<Expense/ID-4> <Expense#ID>       4 .
<Expense/ID-4> <Expense#amount>   3030.23 .
<Expense/ID-4> <Expense#payee>    8 .
<Expense/ID-4> <Expense#ref-payee> <Employee/ID-8> .
	

Web of Data - best case

http://myco.example/HR/

<Employee/ID-7> <Employee#fname>          "Bob" .
<Employee/ID-7> <Employee#empno>          12817 .
<Employee/ID-7> <Employee#ref-department> <Department/ID-2> .

<Department/ID-2> <Department#ID>         2 .
<Department/ID-2> <Department#name>       "Engineering" .
<Department/ID-2> <Department#ref-manager> <Employee/ID-8> .
	

http://money.myco.example/

<expense/expno-6078> <expense#expno>     3 .
<expense/expno-6078> <expense#amount>    30.01 .
<expense/expno-6078> <expense#ref-payee> <emp/empno-2112> .

<emp/empno-2112> <emp#empno>    2112 .
<emp/empno-2112> <emp#fname>    "Robert" .
<emp/empno-2112> <emp#empno>    12817 .
<emp/empno-2112> <emp#ref-addr> <Address/ID-18> .
	

Web of Data - common case

UniProt gene

acc val
P04637TP53
P04637P53

Gene Ontology association

id symboldbxref_idspecies_idtype_idfull_name
17440p53 76533 135847 25790p53
3431471Tp53 3551209 370755 25790tumor protein p53
3586773tp53 3778590 54765 25790tumor protein p53

Gene Ontology term

id name term_type acc
2164transcription factor activity molecular_functionGO:0003700
3814nucleus cellular_componentGO:0005634
3814nucleus cellular_componentGO:0005634
3906cytoplasm cellular_componentGO:0005737
5005apoptosis biological_processGO:0006915
18231positive regulation of apoptosisbiological_processGO:0043065

Web of Data - common case

http://ucsc.example/uniProt/

_:a <gene#acc> "P04637"; <gene#val> "TP53".
_:b <gene#acc> "P04637"; <gene#val> "P53".

http://ucsc.example/go/

@prefix goassoc:   <http://ucsc.example/go/assoc#>.
@prefix goassocid: <http://ucsc.example/go/assoc/id->.
goassocid:17440   go:symbol "p53" ; go:dbxref_id   76533; go:species_id 135847; go:type_id 25790; go:full_name "p53".
goassocid:3431471 go:symbol "Tp53"; go:dbxref_id 3551209; go:species_id 370755; go:type_id 25790; go:full_name "tumor protein p53".
goassocid:3586773 go:symbol "tp53"; go:dbxref_id 3778590; go:species_id  54765; go:type_id 25790; go:full_name "tumor protein p53".

@prefix goterm:   <http://ucsc.example/go/term#>.
@prefix gotermid: <http://ucsc.example/go/term/id->.
gotermid:2164  goterm:name "transcription factor activity"   ; goterm:term_type "molecular_function"; goterm:acc "GO:0003700".
gotermid:3814  goterm:name "nucleus"                         ; goterm:term_type "cellular_component"; goterm:acc "GO:0005634".
gotermid:3814  goterm:name "nucleus"                         ; goterm:term_type "cellular_component"; goterm:acc "GO:0005634".
gotermid:3906  goterm:name "cytoplasm"                       ; goterm:term_type "cellular_component"; goterm:acc "GO:0005737".
gotermid:5005  goterm:name "apoptosis"                       ; goterm:term_type "biological_process"; goterm:acc "GO:0006915".
gotermid:18231 goterm:name "positive regulation of apoptosis"; goterm:term_type "biological_process"; goterm:acc "GO:0043065".

1st evaluation of value proposition

SPAAAACE Data

SPAAAACE LaunchSite

idlaunchsite country place label
1 capecanaveralUnited StatesCape CanaveralCape Canaveral, United States

SPAAAACE Launch

idname descaltNameinatDesignlaunchmass name
1 1968-089A... 03486 1968-089A 1 14781.0Apollo 7

SPAAAACE Spacecraft

idlaunch date sitevehicle craft
1 1968-0891968-10-111 Saturn 1B1

ER Diagram

MissionRole +mission +actor +role +n Mission +id +title Role +id +label Crew +id +name SpacecraftDiscipline +craft +discipline Discipline +id +label LaunchSite +id +city Launch +id +date +site +vehicle Spacecraft +id +launch +mission +name +altname +designator +agency +mass PlaceName +id +label CountryName +id +label AltCraftName +craft +name

Mapping to common schema

Mapping to common labels

SemWeb rules

R2RML Subject Map

Crew
idcrewword name
4 harrisonhaganschmittHarrison Hagan Schmitt

+

space:CrewMapping
    rr:tableName "Crew";
    rr:subjectMap [ rr:template "http://nasa.dataincubator.org/person/{crewword}";
                    rr:class space:Person ]



.

=

crew:harrisonhaganschmitt a foaf:Person .

R2RML Predicate Map

Crew
idcrewword name
4 harrisonhaganschmittHarrison Hagan Schmitt

+

space:CrewMapping
    rr:tableName "Crew";
    rr:subjectMap [ rr:template "http://nasa.dataincubator.org/person/{crewword}";
                    rr:class space:Person ];
    rr:predicateObjectMap [ rr:predicateMap [ rr:predicate foaf:name ];
                            rr:objectMap [ rr:column "name";]
                          ]
.

=

crew:harrisonhaganschmitt a foaf:Person ;
                          foaf:name "Harrison Hagan Schmitt" .

R2RML View

SELECT ... LaunchSite.id=Placename.site
LABELWORDLABEL
canaryislandsCanary Islands, Spain
capecanaveralCape Canaveral, United States
capecanaveralCape Kennedy

+

space:PlaceNameMapping
    a rr:TriplesMapClass;
    rr:sqlQuery """
        SELECT LaunchSite.labelword LABELWORD, PlaceName.label LABEL
          FROM LaunchSite, PlaceName
         WHERE LaunchSite.id=PlaceName.site""" ;
    rr:subjectMap [ rr:template "http://nasa.dataincubator.org/launchsite/{LABELWORD}" ];
    rr:predicateObjectMap [ rr:predicateMap [ rr:predicate rdfs:label ];
                            rr:objectMap [ rr:column "LABEL" ]
                          ].

=

site:canaryislands rdfs:label "Canary Islands, Spain" .
site:capecanaveral rdfs:label "Cape Canaveral, United States" .
site:capecanaveral rdfs:label "Cape Kennedy" .

R2RML Join

space:LaunchSiteMapping
    rr:tableName "LaunchSite";
    rr:subjectMap [ rr:template "http://nasa.dataincubator.org/launchsite/{LABELWORD}" ];
    rr:predicateObjectMap [
        rr:predicate rdfs:label;
        rr:objectMap [
            rr:parentTriplesMap space:PlaceNameMapping;
            rr:joinCondition [
                rr:child "id";
                rr:parent "site" ] ] ] .
site:canaryislands rdfs:label "Canary Islands, Spain" .
site:capecanaveral rdfs:label "Cape Canaveral, United States" .
site:capecanaveral rdfs:label "Cape Kennedy" .

Mapping SPARQL queries to SQL

Other R2RML Features

2nd evaluation of value proposition

Exercise #1: Give me all the properties about Apollo 7

Given:

Solution #1: Give me all the properties about Apollo 7

SELECT ?p ?o
{ 
  <http://nasa.dataincubator.org/spacecraft/1968-089A> ?p ?o
}

(query)

Query #2: Multiple triple patterns: property retrieval

Find me all the people in Tim Berners-Lee's FOAF file that have names and email addresses. Return each person's URI, name, and email address.
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE {
    ?person foaf:name ?name .
    ?person foaf:mbox ?email .
}
        
  • We can use multiple triple patterns to retrieve multiple properties about a particular resource
  • Shortcut: SELECT * selects all variables mentioned in the query
  • Dataset: http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf

Try it with ARQ, OpenLink's Virtuoso, or Redland's Rasqal. (Expected results.)

Exercise #2: URLs for Apollo 7

What URL does this database use for Apollo 7?
What is the (NASA) homepage for the mission?

Given the Talis endpoint:

Solution #2: URLs for Apollo 7

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?craft ?homepage
{
  ?craft foaf:name "Apollo 7" .
  ?craft foaf:homepage ?homepage
}

(query)

Query #3: Multiple triple patterns: traversing a graph

Find me the homepage of anyone known by Tim Berners-Lee.
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX card: <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf#>
SELECT ?homepage
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
    card:i foaf:knows ?known .
    ?known foaf:homepage ?homepage .
}
        
  • The FROM keyword lets us specify the target graph in the query itself.
  • By using ?known as an object of one triple and the subject of another, we traverse multiple links in the graph.
    query diagram

Try it with ARQ, OpenLink's Virtuoso, or Redland's Rasqal. (Expected results.)

Exercise #3: What was the point of Apollo 7?

Given, the Talis endpoint:

Solution #3: What was the point of Apollo 7?

PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?disc ?label
{
  <http://nasa.dataincubator.org/spacecraft/1968-089A> space:discipline ?disc .
  ?disc rdfs:label ?label
}

(query)

Dataset: DBPedia

Query #4: Exploring DBPedia

Find me 50 example concepts in the DBPedia dataset.
SELECT DISTINCT ?concept
WHERE {
    ?s a ?concept .
} LIMIT 50
        
  • LIMIT is a solution modifier that limits the number of rows returned from a query. SPARQL has two other solution modifiers:
    • ORDER BY for sorting query solutions on the value of one or more variables
    • OFFSET, used in conjunction with LIMIT and ORDER BY to take a slice of a sorted solution set (e.g. for paging)
  • The SPARQL keyword a is a shortcut for the common predicate rdf:type, giving the class of a resource.
  • The DISTINCT modifier eliminates duplicate rows from the query results.

Try it with a DBPedia-specific SPARQL endpoint. (Expected results.)

Exercise #4: Find 50 Spacecraft

Given:

Solution #4: Find 50 Spacecraft

PREFIX space: <http://purl.org/net/schemas/space/>
SELECT ?craft
{
  ?craft a space:Spacecraft
}
LIMIT 50

(query)

Query #5: Basic SPARQL filters

Find me all landlocked countries with a population greater than 15 million.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>        
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?population
WHERE {
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?population .
    FILTER (?population > 15000000) .
}
        
  • FILTER constraints use boolean conditions to filter out unwanted query results.
  • Shortcut: a semicolon (;) can be used to separate two triple patterns that share the same subject. (?country is the shared subject above.)
  • rdfs:label is a common predicate for giving a human-friendly label to a resource.
  • Note all the translated duplicates in the results. How can we deal with that?

Try it with one of DBPedia's SPARQL endpoint. (Expected results.)

Exercise #5: Find launches in October 1968

Given, the Talis endpoint:

Solution #5: Find launches in October 1968

PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT *
{ ?launch space:launched ?date
  FILTER (
    ?date > "1968-10-1"^^xsd:date &&
    ?date < "1968-10-30"^^xsd:date
  )
}

(query)

SPARQL built-in filter functions

Query #6: Filters for picking among translations

Find me all landlocked countries with a population greater than 15 million (revisited), with the highest population country first.
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?population
WHERE {
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?population .
    FILTER (?population > 15000000 &&
            langMatches(lang(?country_name), "EN")) .
} ORDER BY DESC(?population)
        
  • lang extracts a literal's language tag, if any
  • langMatches matches a language tag against a language range

Try it with a DBPedia-specific SPARQL endpoint. (Expected results.)

Dataset: Jamendo

Query #7a: Finding artists' info - the wrong way

Find all Jamendo artists along with their image, home page, and the location they're near.
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?name ?img ?hp ?loc
WHERE {
  ?a a mo:MusicArtist ;
     foaf:name ?name ;
     foaf:img ?img ;
     foaf:homepage ?hp ;
     foaf:based_near ?loc .
}
        
  • Jamendo has information on about 3,500 artists.
  • Trying the query, though, we only get 2,667 results. What's wrong?

Try it with DBTune.org's Jamendo-specific SPARQL endpoint. Be sure to choose SPARQL rather than SeRQL. (Expected results.)

Query #7b: Finding artists' info - the right way

Find all Jamendo artists along with their image, home page, and the location they're near, if any.
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?name ?img ?hp ?loc
WHERE {
  ?a a mo:MusicArtist ;
     foaf:name ?name .
  OPTIONAL { ?a foaf:img ?img }
  OPTIONAL { ?a foaf:homepage ?hp }
  OPTIONAL { ?a foaf:based_near ?loc }
}
        
  • Not every artist has an image, homepage, or location!
  • OPTIONAL tries to match a graph pattern, but doesn't fail the whole query if the optional match fails.
  • If an OPTIONAL pattern fails to match for a particular solution, any variables in that pattern remain unbound (no value) for that solution.

Try it with DBTune.org's Jamendo-specific SPARQL endpoint. Be sure to choose SPARQL rather than SeRQL. (Expected results.)

Dataset: HCLS Knowledge Base at DERI Galway

Query #8: Querying alternatives

Find me the cellular processes that are either integral to or a refinement of signal transduction.
PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://www.obofoundry.org/ro/ro.owl#>
SELECT DISTINCT ?label ?process
WHERE {
  { ?process obo:part_of go:GO_0007165 } # integral to
UNION
  { ?process rdfs:subClassOf go:GO_0007165 } # refinement of
  ?process rdfs:label ?label
}
  • The UNION keyword forms a disjunction of two graph patterns. Solutions to both sides of the UNION are included in the results.
  • The URI go:GO_0007165 is the identifier for signal transduction in the Gene Ontology
  • N.B. Cell-surface-receptor-linked signal transduction is a refinement (subclass) of signal transduction

Try it with the neurocommons SPARQL endpoint. (Expected results.)

RDF Datasets

RDF Datasets

image of multiple interacting RDF graphs

Dataset: semanticweb.org

Query #9: Querying named graphs

Find me people who have been involved with at least three ISWC or ESWC conference events.
SELECT DISTINCT ?name
WHERE {
    ?person foaf:name ?name .
    GRAPH ?g1 { ?person a foaf:Person }
    GRAPH ?g2 { ?person a foaf:Person }
    GRAPH ?g3 { ?person a foaf:Person }
    FILTER(?g1 != ?g2 && ?g1 != ?g3 && ?g2 != ?g3) .
}     
        
  • The GRAPH ?g construct allows a pattern to match against one of the named graphs in the RDF dataset. The URI of the matching graph is bound to ?g (or whatever variable was actually used).
  • N.B. The FILTER assures that we're finding a person who occurs in three distinct graphs.
  • N.B. The Web interface we use for this SPARQL query defines the foaf: prefix, which is why we omit it here.

Try it with the data.semanticweb.org SPARQL endpoint. (Expected results.)

Query #10: Tranforming between vocabularies

Convert FOAF data to VCard data.
PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { 
  ?X vCard:FN ?name .
  ?X vCard:URL ?url .
  ?X vCard:TITLE ?title .
}FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE { 
  OPTIONAL { ?X foaf:name ?name . FILTER isLiteral(?name) . }
  OPTIONAL { ?X foaf:homepage ?url . FILTER isURI(?url) . }
  OPTIONAL { ?X foaf:title ?title . FILTER isLiteral(?title) . }
}
        
  • CONSTRUCT is an alternative SPARQL result clause to SELECT. Instead of returning a table of result values, CONSTRUCT returns an RDF graph.
  • The result RDF graph is created by taking the results of the equivalent SELECT query and filling in the values of variables that occur in the CONSTRUCT template.
  • Triples are not created in the result graph for template patterns that involve an unbound variable.

Try it with ARQ or OpenLink's Virtuoso. (Expected results.)

Query #11: ASKing a question

Is the Amazon river longer than the Nile River?
PREFIX prop: <http://dbpedia.org/property/>
ASK
{
  <http://dbpedia.org/resource/Amazon_River> prop:length ?amazon .
  <http://dbpedia.org/resource/Nile> prop:length ?nile .
  FILTER(?amazon > ?nile) .
}       
        
  • The ASK result clause simply returns true or false depending on whether or not the query pattern has any matches in the dataset.
  • As with SELECT queries, the boolean result is (by default) encoded in an SPARQL Results Format XML document.
  • Shortcut: the WHERE keyword is optional--not only in ASK queries but in all SPARQL queries.

Try it with the Virtuoso DBPedia SPARQL endpoint. (Expected results. - or are they??)

Dataset: EDGAR Corporate Ownership Data

Query #12: Learning about a resource

Tell me whatever you'd like to tell me about the Ford Motor Company.
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
DESCRIBE ?ford WHERE {
  ?ford foaf:name "FORD MOTOR CO" .
}
        
  • The DESCRIBE query result clause allows the server to return whatever RDF it wants that describes the given resource(s).
  • Because the server is free to interpret DESCRIBE as it sees fit, DESCRIBE queries are not interoperable.
  • Common implementations include concise-bounded descriptions, named graphs, minimum self-contained graphs, and more.

Note: The EDGAR-specific SPARQL endpoint is not currently up. (Expected results.)

What's new in SPARQL 1.1 Query?

A new SPARQL WG was chartered in March 2009 to extend the SPARQL language and landscape. SPARQL 1.1 Query includes these extensions:

Query #13: Projected Expressions

How many neutrons does the most common isotope of each element have?
PREFIX :   <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?element ?protons
(ROUND(?weight) - ?protons AS ?neutrons)
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE {
  [] a :Element ;
    :atomicNumber ?protons ;
    :atomicWeight ?weight ;
    :name ?element .
} ORDER BY ?protons
  • Projected expressions allows for arbitrary expressions to be used for columns in a query's result set.
  • Projected expressions must be in parentheses and must be given an alias using the AS keyword.
  • Note: [] in a query acts as an unnamed variable.

Try it with sparql.org. (Expected results.)

Dataset: UK Government Data

Query #14: Aggregates

How many roads of each classification are there in the UK?
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX roads: <http://transport.data.gov.uk/def/traffic/>
SELECT ?cat
(COUNT(DISTINCT ?thing) AS ?roads)
WHERE {
   ?thing  a roads:Road ; roads:countPointRoadCategory ?cat .
}
GROUP BY ?cat
  • Aggregate queries post-process query results by dividing the solutions into groups, and then performing summary calculations on those groups.
  • As in SQL, the GROUP BY clause specifies the key variable(s) to use to partition the solutions into groups.
  • SPARQL 1.1 defines these aggregate functions: COUNT, MIN, MAX, SUM, AVG, GROUP_CONCAT, SAMPLE
  • SPARQL 1.1 also includes a HAVING clause to filter the results of the query after applying aggregates.

Try it with the data.gov.uk endpoint. Make sure to choose the Transport dataset. (Expected results.)

Query #15a: Limit Per Resource Without Subqueries

Retrieve the second page of names and emails of people in Tim Berners-Lee's FOAF file, given that each page has 10 people.
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
    ?person foaf:name ?name .
    OPTIONAL { ?person foaf:mbox ?email }
} ORDER BY ?name LIMIT 10 OFFSET 10

Query #15b: Limit Per Resource With Subqueries

Retrieve the second page of names and emails of people in Tim Berners-Lee's FOAF file, given that each page has 10 people.
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
    {      SELECT DISTINCT ?person ?name WHERE { 
        ?person foaf:name ?name 
      } ORDER BY ?name LIMIT 10 OFFSET 10    }
    OPTIONAL { ?person foaf:mbox ?email }
}

Query #16a: Negation In SPARQL 1.0

Find the person entries in Tim Berners-Lee's FOAF file that do not contain a URL for the person's FOAF file.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
  ?person a foaf:Person ; foaf:name ?name .
 
OPTIONAL { ?person rdfs:seeAlso ?url }
  FILTER(!bound(?url))
}

Query #16b: Negation In SPARQL 1.1 (Part 1)

Find the person entries in Tim Berners-Lee's FOAF file that do not contain a URL for the person's FOAF file.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
  ?person a foaf:Person ; foaf:name ?name .
 
MINUS { ?person rdfs:seeAlso ?url }
}

Query #16c: Negation In SPARQL 1.1 (Part 2)

Find the person entries in Tim Berners-Lee's FOAF file that do not contain a URL for the person's FOAF file.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
  ?person a foaf:Person ; foaf:name ?name .
 
FILTER(NOT EXISTS { ?person rdfs:seeAlso ?url })
}

Query #17a: Finding Beers

Find all of the beers included in the beer ontology.
PREFIX beer: <http://www.w3.org/2011/12/beer.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?beer
FROM <http://www.w3.org/2011/12/beer.owl>
WHERE {
   ?beer rdf:type beer:Beer .
}

Query #17b: Finding Beers, Revisited

Find all of the beers included in the beer ontology.
PREFIX beer: <http://www.w3.org/2011/12/beer.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?beer
FROM <http://www.w3.org/2011/12/beer.owl>
WHERE {
   ?beer
rdf:type/rdfs:subClassOf* beer:Beer .
}

Query #18: Federate Data From Two Endpoints

Find the birth dates of all of the actors in Star Trek: The Motion Picture
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?actor_name ?birth_date
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> # placeholder graph
WHERE {
 
SERVICE <http://data.linkedmdb.org/sparql> {
    <http://data.linkedmdb.org/resource/film/675> movie:actor ?actor .
    ?actor movie:actor_name ?actor_name
  }
  SERVICE <http://dbpedia.org/sparql> {
    ?actor2 a dbpedia:Actor ; foaf:name ?actor_name_en ; dbpedia:birthDate ?birth_date .
    FILTER(STR(?actor_name_en) = ?actor_name)
  }
}

What else is new in SPARQL 1.1?

The new SPARQL WG is also extending the SPARQL landscape with:

SPARQL 1.1 Update

SPARQL 1.1 Update is a language for managing and updating RDF graphs.

SPARQL 1.1 Graph Store HTTP Protocol

The SPARQL 1.1 Graph Store HTTP Protocol defines how to use RESTful HTTP requests to affect an RDF graph store. Some examples:

SPARQL 1.1 Service Description

The SPARQL 1.1 Service Description defines a discovery mechanism and vocabulary for describing the capabilities and data available at a SPARQL endpoint.

What's missing from SPARQL?

Even with the ongoing SPARQL 1.1 work, there are several other pieces of the SPARQL landscape that are not yet standardized, including:
The W3C ESW wiki lists more SPARQL extensions.

More philosophizing