SPARQL Support in MySQL

Eric Prud'hommeaux, RDF Data Access Working Group team contact

http://www.w3.org/2007/Talks/0710-spasql/

What is SPASQL?

SPARQL queries over conventional relational data.
Implicit relational mapping to RDF.
Not a triple store:
triples

subjects predicates object

order7 product product23

order7 shipDate 20050508

product23 product-name "Mystifier"
Not a table-per-predicate mapping:
shipDates

subjects object

order7 20050508

product-names

subjects object

product23 "Mystifier"

triples
subjects	predicates	object
order7	product	product23
order7	shipDate	20050508
product23	product-name	"Mystifier"

shipDates
subjects	object
order7	20050508

product-names
subjects	object
product23	"Mystifier"

SPASQL Example

Example use case: OrderTracking.

Orders
id	customer	product	orderDate	shippingAddress
2186	518	7385	2006-05-01 01:08:30	NULL
2186	208	2922	2006-05-01 18:34:38	NULL
2187	518	7439	2006-05-01 01:05:30	NULL

Comparing SPARQL to SQL — references

SQL graph shape comes from table aliases:

... WHERE o.orderDate < "20060501"
      AND o.shipDate = "20060508"

SPARQL graph shape (mostly) comes from variable.

... WHERE { ?o <orderDate> "20060501" .
            ?o <shipDate> "20060508" }

Comparing SPARQL to SQL — links

SQL restricts cross joins on attribute values.

  ...
  FROM Orders AS o
       JOIN Products AS p ON o.product=p.id

SPARQL has links in the data model that it queries.

 ...
 WHERE { ?o <Orders.products> ?p }

Data Mapping Details

Tuples correspond to RDF nodes.
Attribute names correspond to RDF predicates.
Foreign keys point to other nodes.
Other attribute values correspond to RDF literals (the boxes).
NULL attributess do NOT correspond to anything.

Data Mapping Details

Tuples correspond to RDF nodes.
Attribute names correspond to RDF predicates.
Foreign keys point to other nodes.
Other attribute values correspond to RDF literals (the boxes).
NULL attributess do NOT correspond to anything.

Data Mapping Details

Tuples correspond to RDF nodes.
Attribute names correspond to RDF predicates.
Foreign keys point to other nodes.
Other attribute values correspond to RDF literals (the boxes).
NULL attributess do NOT correspond to anything.

{ ?o <Orders.product> ?p }

Data Mapping Details

Tuples correspond to RDF nodes.
Attribute names correspond to RDF predicates.
Foreign keys point to other nodes.
Other attribute values correspond to RDF literals (the boxes).
NULL attributess do NOT correspond to anything.

{ ?o <Orders.product> ?p
  ?o <Orders.shipDate> ?ship }

Data Mapping — NULLs

Tuples correspond to RDF nodes.
Attribute names correspond to RDF predicates.
Foreign keys point to other nodes.
Other attribute values correspond to RDF literals (the boxes).
NULL attributess do NOT correspond to anything.

{ ?t <Orders.product> ?p
  OPTIONAL { ?t <Orders.shipDate> ?ship } }

Data Mapping — UNION

SPARQL UNION corresponds to an SQL subselected UNION

SPARQL

WHERE { { ?x <TableA.field> ?y }
        UNION
        { ?x <TableB.field> ?y } }

SQL

JOIN ( SELECT TableA.field AS foo FROM ... WHERE ...
       UNION
       SELECT TableB.field [AS foo] FROM ... WHERE ...  ) AS U0 ON U0.foo=...

Costs of SPASQL

Parsing — extra hash lookups — minor cost
Execution — same compile tree modulo primary key lookup
Familiarity — a LOT of people know a little SQL.
SPASQL is relatively new.
Expressivity — many fewer standard functions at present.

UNION Example — SPARQL

SELECT ?part ?descr ?given ?family ?posn ?rma ?billCity ?billState ?shipCity ?shipState
 WHERE { { ?order <Orders.product> ?prod .
           ?order <Orders.customer> ?cust .
           OPTIONAL { ?order <Orders.shippingAddress> ?shipAddr .
                      ?shipAddr <Addresses.city> ?shipCity .
                      ?shipAddr <Addresses.state> ?shipState }
         } UNION {
           ?order <Returns.product> ?prod .
           ?order <Returns.customer> ?cust .
           ?order <Returns.rma> ?rma
         }
         ?prod <Products.partNo> ?part .
         ?prod <Products.description> ?descr .
         { ?cust <Customers.id> ?num .
           ?cust <Customers.givenName> ?given .
           ?cust <Customers.familyName> ?family .
           ?cust <Customers.billingAddress> ?addr
         } UNION {
           ?cust <Employees.id> ?num .
           ?cust <Employees.givenName> ?given .
           ?cust <Employees.familyName> ?family .
           ?cust <Employees.position> ?posn
         }
         ?cust <Employees.homeAddress> ?addr .
         ?addr <Addresses.city> ?billCity .
         ?addr <Addresses.state> ?billState }

UNION Example — corresponding SQL

This SQL approximates the previous SPARQL query:

SELECT Products_0.partNo AS part,
       Products_0.description AS descr,
       U1.given,
       U1.family,
       U1.posn,U0.rma,
       Addresses_1.city AS billCity,
       Addresses_1.state AS billState,
       U0.shipCity,
       U0.shipState
  FROM (
              SELECT Orders_0.customer AS cust,
                     Orders_0.product AS prod,
                     NULL AS rma,
                     IF(Addresses_0.city IS NULL OR Addresses_0.state IS NULL, NULL, Addresses_0.city) AS shipCity,
                     IF(Addresses_0.city IS NULL OR Addresses_0.state IS NULL, NULL, Addresses_0.state) AS shipState
                FROM Orders AS Orders_0
                     LEFT OUTER JOIN Addresses AS Addresses_0 ON Orders_0.shippingAddress=Addresses_0.id
               WHERE Orders_0.customer IS NOT NULL AND Orders_0.product IS NOT NULL
        UNION ALL
              SELECT Returns_0.customer AS cust,
                     Returns_0.product AS prod,
                     Returns_0.RMAnumber AS rma,
                     NULL AS shipCity,
                     NULL AS shipState
                FROM Returns AS Returns_0
               WHERE Returns_0.customer IS NOT NULL AND Returns_0.product IS NOT NULL AND Returns_0.RMAnumber IS NOT NULL
    ) AS U0
    INNER JOIN Products AS Products_0 ON U0.prod=Products_0.id
    INNER JOIN (
              SELECT Customers_0.id AS num,
                     Customers_0.givenName AS given,
                     Customers_0.familyName AS family,
                     NULL AS posn,
                     Customers_0.billingAddress AS addr
                FROM Customers AS Customers_0
               WHERE Customers_0.id IS NOT NULL AND Customers_0.givenName IS NOT NULL AND Customers_0.familyName IS NOT NULL AND Customers_0.billingAddress IS NOT NULL
        UNION ALL
              SELECT Employees_0.id AS num,
                     Employees_0.givenName AS given,
                     Employees_0.familyName AS family,
                     Employees_0.position AS posn,
                     Employees_0.homeAddress AS addr
                FROM Employees AS Employees_0
               WHERE Employees_0.id IS NOT NULL AND Employees_0.givenName IS NOT NULL AND Employees_0.familyName IS NOT NULL AND Employees_0.position IS NOT NULL AND Employees_0.homeAddress IS NOT NULL
    ) AS U1 ON U1.num=U0.cust
    INNER JOIN Addresses AS Addresses_1 ON U1.addr=Addresses_1.id
 WHERE Products_0.partNo IS NOT NULL AND Products_0.description IS NOT NULL AND Addresses_1.city IS NOT NULL AND Addresses_1.state IS NOT NULL;

Benefits

XQuery argument: one language to access all domains.
Variables may make structure more visible.

Dual access — arbirary migration.

$query1 = "SELECT id FROM Orders WHERE Orders.shipDate = '20060501'";
mysql_query($query1);
...
$query2 = "SPARQL: SELECT ?o WHERE { ?o <Orders.product> '20060501' }";
mysql_query($query2);

Possible performance windfalls.

Implementation Status

SELECT, DISTINCT, LIMIT, ORDER BY, OFFSET
conjunction, FILTER, OPTIONAL
|| && + - * / < <= > >= BOUND

operator mapping in XTech talk:
http://www.w3.org/2005/05/22-SPARQL-MySQL/XTech#features

SPASQL in MySQL project page
http://www.w3.org/2005/05/22-SPARQL-MySQL/

SPARQLfed?

# Assign convenient prefixes to common for later syntactic shorthand.
PREFIX db: <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#SqlDB?properties=..%2Ftest%2F>
PREFIX ma: <http://med.example/ma#>
PREFIX cs: <http://med.example/cs#>

PREFIX up: <http://med.example/up#>
PREFIX sa: <http://med.example/sa#>
PREFIX mt: <http://med.example/mt#>

SELECT ?name ?chemical ?motif ?saProt ?kd50 ?like ?ld


FROM NAMED db:MicroArray.prop
FROM NAMED db:Uniprot.rdf
FROM NAMED db:ScreeningAssay.prop
FROM NAMED db:ChemStructure.prop

FROM NAMED db:MouseToxicity.prop

WHERE
{
# Get a name and a chemical from the (SQL) MicroArray database.
GRAPH db:MicroArray.prop {
         ?g	ma:name		?name .
	 ?g	ma:expression	"up" .
	 ?g	ma:experiment	?kinase .
	 ?kinase ma:against	?agin .
	 ?agin	cs:chemical	?chemical }


# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?name .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }


# Use the (SQL) Kinase databaes to limit to the interesting chemicals.
GRAPH db:ScreeningAssay.prop {
         ?a	sa:name		"KinaseAssay" .
	 ?a	cs:chemical	?chemical .	# bound to ?ma.cs:chemical
	 ?a	sa:upname	?saProt .
	 ?a	ma:kd50		?kd50

         FILTER (?kd50 >= .7 || ?kd50 < .2) }

# This (SQL) chemical database indexes like sidechains.
GRAPH db:ChemStructure.prop {
         ?c	cs:chemical	?chemical .	# bound to ?ma.cs:chemical, ?sa.cs:chemical

	 ?c	cs:structure	"asdfasdf" .
	 ?c	cs:sidechain	?side .
	 ?c2	cs:sidechain	?side .
	 ?c2	cs:chemical	?like }

# Limit by toxicity in the (SQL) MouseToxicity experiments.
GRAPH db:MouseToxicity.prop {
         ?t	cs:chemical	?like .		# bound to ?cs.cs:sidechain

	 ?t	mt:toxicity	?ld
         FILTER (?ld < .35) }
}

Use Case: Joining Pharma Databases

# Get a name and a chemical from the (SQL) MicroArray database.
GRAPH db:MicroArray.prop {
         ?g	ma:name		?name .
	 ?g	ma:expression	"up" .
	 ?g	ma:experiment	?kinase .
	 ?kinase ma:against	?agin .
	 ?agin	cs:chemical	?chemical }

}

gives me:

g	name	kinase	agin	chemical
g1	name1	kinase1	agin1	chemical1
g2	name2	kinase2	agin2	chemical2
g3	name3	kinase2	agin2	chemical3

Federation

g	name	kinase	agin	chemical
g1	name1	kinase1	agin1	chemical1
g2	name2	kinase2	agin2	chemical2
g3	name3	kinase2	agin2	chemical3

The next GRAPH query

# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?name .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }

is constrained by the variable name

How can we execute query?

dispatch the next query unconstrained by the current results.

# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?span .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }

issue the query three times, with each of (name1, name2, name3) substituted for ?name

GRAPH db:Uniprot.rdf {
         ?p	ma:name		"name1" .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }

GRAPH db:Uniprot.rdf {
         ?p	ma:name		"name2" .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }

GRAPH db:Uniprot.rdf {
         ?p	ma:name		"name3" .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }

How can we execute query?

Send current (relevent) bindings with the query:

# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?name .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }
 BINDINGS ?name { 
  ("name1")
  ("name2")
  ("name3") }

Streaming

BINDINGS section at bottom of query.

Server may return responses as client sends bindings.

# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?name .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }
 BINDINGS ?name { 
  ("name1") # ⊨ rows 1-3
  ("name2")  # ⊨ no new rows
  ("name3") } # ⊨ row 4

name	p	motif
name1	<http://...protein/352354#p>	motifA
name1	<http://...protein/856876#p>	motifB
name1	<http://...protein/833993#p>	motifC
name3	<http://...protein/685685#p>	motifX

Benefits

One connection.
One round trip.
Streamable, if you're very clever.

Costs

Language Extension.
Slightly harder to implement than any other query relay.

Deployment Plan

Depends mostly on demand.
MySQL 5 x86 binary available from the project page.
Binaries for other viable debian platforms trivially built.
Perhaps MySQL SPASQL availability will inspire other DB vendors.

SPARQL Support in MySQL

What is SPASQL?

SPASQL Example

Comparing SPARQL to SQL — references

Comparing SPARQL to SQL — links

Data Mapping Details

Data Mapping Details

Data Mapping Details

Data Mapping Details

Data Mapping — NULLs

Data Mapping — UNION

SPARQL

SQL

Costs of SPASQL

UNION Example — SPARQL

UNION Example — corresponding SQL

Benefits

Implementation Status

Next

SPARQLfed?

Use Case: Joining Pharma Databases

Federation

How can we execute query?

How can we execute query?

Streaming

Benefits

Costs

Deployment Plan