W3C logo
slanted W3C logo
 Cover page images (keys)

SPARQL Support in MySQL

Eric Prud'hommeaux, RDF Data Access Working Group team contact

http://www.w3.org/2007/Talks/0710-spasql/

What is SPASQL?

SPASQL Example

Example use case: OrderTracking.

Orders
idcustomerproductorderDateshippingAddress
218651873852006-05-01 01:08:30NULL
218620829222006-05-01 18:34:38NULL
218751874392006-05-01 01:05:30NULL

Comparing SPARQL to SQL — references

SQL graph shape comes from table aliases:

... WHERE o.orderDate < "20060501"
      AND o.shipDate = "20060508"

SPARQL graph shape (mostly) comes from variable.

... WHERE { ?o <orderDate> "20060501" .
            ?o <shipDate> "20060508" }

Comparing SPARQL to SQL — links

SQL restricts cross joins on attribute values.

  ...
  FROM Orders AS o
       JOIN Products AS p ON o.product=p.id

SPARQL has links in the data model that it queries.

 ...
 WHERE { ?o <Orders.products> ?p }

Data Mapping Details

... ...

Data Mapping Details

... ...

Data Mapping Details

... ...
{ ?o <Orders.product> ?p }

Data Mapping Details

... ...
{ ?o <Orders.product> ?p
  ?o <Orders.shipDate> ?ship }

Data Mapping — NULLs

... ...
{ ?t <Orders.product> ?p
  OPTIONAL { ?t <Orders.shipDate> ?ship } }

Data Mapping — UNION

SPARQL UNION corresponds to an SQL subselected UNION

SPARQL

WHERE { { ?x <TableA.field> ?y }
        UNION
        { ?x <TableB.field> ?y } }

SQL

JOIN ( SELECT TableA.field AS foo FROM ... WHERE ...
       UNION
       SELECT TableB.field [AS foo] FROM ... WHERE ...  ) AS U0 ON U0.foo=...

Costs of SPASQL

UNION Example — SPARQL

SELECT ?part ?descr ?given ?family ?posn ?rma ?billCity ?billState ?shipCity ?shipState
 WHERE { { ?order <Orders.product> ?prod .
           ?order <Orders.customer> ?cust .
           OPTIONAL { ?order <Orders.shippingAddress> ?shipAddr .
                      ?shipAddr <Addresses.city> ?shipCity .
                      ?shipAddr <Addresses.state> ?shipState }
         } UNION {
           ?order <Returns.product> ?prod .
           ?order <Returns.customer> ?cust .
           ?order <Returns.rma> ?rma
         }
         ?prod <Products.partNo> ?part .
         ?prod <Products.description> ?descr .
         { ?cust <Customers.id> ?num .
           ?cust <Customers.givenName> ?given .
           ?cust <Customers.familyName> ?family .
           ?cust <Customers.billingAddress> ?addr
         } UNION {
           ?cust <Employees.id> ?num .
           ?cust <Employees.givenName> ?given .
           ?cust <Employees.familyName> ?family .
           ?cust <Employees.position> ?posn
         }
         ?cust <Employees.homeAddress> ?addr .
         ?addr <Addresses.city> ?billCity .
         ?addr <Addresses.state> ?billState }

UNION Example — corresponding SQL

This SQL approximates the previous SPARQL query:

SELECT Products_0.partNo AS part,
       Products_0.description AS descr,
       U1.given,
       U1.family,
       U1.posn,U0.rma,
       Addresses_1.city AS billCity,
       Addresses_1.state AS billState,
       U0.shipCity,
       U0.shipState
  FROM (
              SELECT Orders_0.customer AS cust,
                     Orders_0.product AS prod,
                     NULL AS rma,
                     IF(Addresses_0.city IS NULL OR Addresses_0.state IS NULL, NULL, Addresses_0.city) AS shipCity,
                     IF(Addresses_0.city IS NULL OR Addresses_0.state IS NULL, NULL, Addresses_0.state) AS shipState
                FROM Orders AS Orders_0
                     LEFT OUTER JOIN Addresses AS Addresses_0 ON Orders_0.shippingAddress=Addresses_0.id
               WHERE Orders_0.customer IS NOT NULL AND Orders_0.product IS NOT NULL
        UNION ALL
              SELECT Returns_0.customer AS cust,
                     Returns_0.product AS prod,
                     Returns_0.RMAnumber AS rma,
                     NULL AS shipCity,
                     NULL AS shipState
                FROM Returns AS Returns_0
               WHERE Returns_0.customer IS NOT NULL AND Returns_0.product IS NOT NULL AND Returns_0.RMAnumber IS NOT NULL
    ) AS U0
    INNER JOIN Products AS Products_0 ON U0.prod=Products_0.id
    INNER JOIN (
              SELECT Customers_0.id AS num,
                     Customers_0.givenName AS given,
                     Customers_0.familyName AS family,
                     NULL AS posn,
                     Customers_0.billingAddress AS addr
                FROM Customers AS Customers_0
               WHERE Customers_0.id IS NOT NULL AND Customers_0.givenName IS NOT NULL AND Customers_0.familyName IS NOT NULL AND Customers_0.billingAddress IS NOT NULL
        UNION ALL
              SELECT Employees_0.id AS num,
                     Employees_0.givenName AS given,
                     Employees_0.familyName AS family,
                     Employees_0.position AS posn,
                     Employees_0.homeAddress AS addr
                FROM Employees AS Employees_0
               WHERE Employees_0.id IS NOT NULL AND Employees_0.givenName IS NOT NULL AND Employees_0.familyName IS NOT NULL AND Employees_0.position IS NOT NULL AND Employees_0.homeAddress IS NOT NULL
    ) AS U1 ON U1.num=U0.cust
    INNER JOIN Addresses AS Addresses_1 ON U1.addr=Addresses_1.id
 WHERE Products_0.partNo IS NOT NULL AND Products_0.description IS NOT NULL AND Addresses_1.city IS NOT NULL AND Addresses_1.state IS NOT NULL;

Benefits

Implementation Status

operator mapping in XTech talk:
http://www.w3.org/2005/05/22-SPARQL-MySQL/XTech#features

SPASQL in MySQL project page
http://www.w3.org/2005/05/22-SPARQL-MySQL/

Next

SPARQLfed?

# Assign convenient prefixes to common for later syntactic shorthand.
PREFIX db: <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#SqlDB?properties=..%2Ftest%2F>
PREFIX ma: <http://med.example/ma#>
PREFIX cs: <http://med.example/cs#>

PREFIX up: <http://med.example/up#>
PREFIX sa: <http://med.example/sa#>
PREFIX mt: <http://med.example/mt#>

SELECT ?name ?chemical ?motif ?saProt ?kd50 ?like ?ld


FROM NAMED db:MicroArray.prop
FROM NAMED db:Uniprot.rdf
FROM NAMED db:ScreeningAssay.prop
FROM NAMED db:ChemStructure.prop

FROM NAMED db:MouseToxicity.prop

WHERE
{
# Get a name and a chemical from the (SQL) MicroArray database.
GRAPH db:MicroArray.prop {
         ?g	ma:name		?name .
	 ?g	ma:expression	"up" .
	 ?g	ma:experiment	?kinase .
	 ?kinase ma:against	?agin .
	 ?agin	cs:chemical	?chemical }


# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?name .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }


# Use the (SQL) Kinase databaes to limit to the interesting chemicals.
GRAPH db:ScreeningAssay.prop {
         ?a	sa:name		"KinaseAssay" .
	 ?a	cs:chemical	?chemical .	# bound to ?ma.cs:chemical
	 ?a	sa:upname	?saProt .
	 ?a	ma:kd50		?kd50

         FILTER (?kd50 >= .7 || ?kd50 < .2) }

# This (SQL) chemical database indexes like sidechains.
GRAPH db:ChemStructure.prop {
         ?c	cs:chemical	?chemical .	# bound to ?ma.cs:chemical, ?sa.cs:chemical

	 ?c	cs:structure	"asdfasdf" .
	 ?c	cs:sidechain	?side .
	 ?c2	cs:sidechain	?side .
	 ?c2	cs:chemical	?like }

# Limit by toxicity in the (SQL) MouseToxicity experiments.
GRAPH db:MouseToxicity.prop {
         ?t	cs:chemical	?like .		# bound to ?cs.cs:sidechain

	 ?t	mt:toxicity	?ld
         FILTER (?ld < .35) }
}

Use Case: Joining Pharma Databases

# Get a name and a chemical from the (SQL) MicroArray database.
GRAPH db:MicroArray.prop {
         ?g	ma:name		?name .
	 ?g	ma:expression	"up" .
	 ?g	ma:experiment	?kinase .
	 ?kinase ma:against	?agin .
	 ?agin	cs:chemical	?chemical }

}

gives me:

gnamekinaseaginchemical
g1name1kinase1agin1chemical1
g2name2kinase2agin2chemical2
g3name3kinase2agin2chemical3

Federation

gnamekinaseaginchemical
g1name1kinase1agin1chemical1
g2name2kinase2agin2chemical2
g3name3kinase2agin2chemical3

The next GRAPH query

# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?name .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }

is constrained by the variable name

How can we execute query?

  1. dispatch the next query unconstrained by the current results.
    # The uniprot data (in RDF) has motif and pathway information.
    GRAPH db:Uniprot.rdf {
             ?p	ma:name		?span .		# bound to ?ma.ma:name
    	 ?p	up:motif	?motif .
    	 ?p	up:pathway	"apoptosis" }
    
  2. issue the query three times, with each of (name1, name2, name3) substituted for ?name
    GRAPH db:Uniprot.rdf {
             ?p	ma:name		"name1" .		# bound to ?ma.ma:name
    	 ?p	up:motif	?motif .
    	 ?p	up:pathway	"apoptosis" }
    
    GRAPH db:Uniprot.rdf {
             ?p	ma:name		"name2" .		# bound to ?ma.ma:name
    	 ?p	up:motif	?motif .
    	 ?p	up:pathway	"apoptosis" }
    
    GRAPH db:Uniprot.rdf {
             ?p	ma:name		"name3" .		# bound to ?ma.ma:name
    	 ?p	up:motif	?motif .
    	 ?p	up:pathway	"apoptosis" }
    

How can we execute query?

Send current (relevent) bindings with the query:

# The uniprot data (in RDF) has motif and pathway information.
GRAPH db:Uniprot.rdf {
         ?p	ma:name		?name .		# bound to ?ma.ma:name
	 ?p	up:motif	?motif .
	 ?p	up:pathway	"apoptosis" }
 BINDINGS ?name { 
  ("name1")
  ("name2")
  ("name3") }

Streaming

Benefits

Costs

Deployment Plan