Heterogeneous RDF Databases

Querying generic triple stores is inefficient because they do not gather all the properties of an entity together in one row. This forces a self join for each additional attribute involved in the query. Migrating to a conventional relation provides the efficiency we are used to, along with the brittleness we are used to enduring. The goal of heterogeneous RDF databases is to provide an optimizable compromise between the two.
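
The cost of the self join can be seen in a small sketch. It uses Python's sqlite3 module with an in-memory database; the table name triples and the short predicate names are hypothetical, chosen only for illustration, and the customer data is borrowed from the example further down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical generic triple store: one row per property.
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    INSERT INTO triples VALUES
        ('customer1', 'givenName',  'Biff'),
        ('customer1', 'familyName', 'Thompson');

    -- Conventional relation: one row per entity.
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, givenName TEXT, familyName TEXT);
    INSERT INTO Customers VALUES (1, 'Biff', 'Thompson');
""")

# Triple store: each additional attribute needs another alias of the
# same table, joined on the shared subject.
triple_rows = conn.execute("""
    SELECT g.object, f.object
    FROM triples AS g
         INNER JOIN triples AS f ON f.subject = g.subject
    WHERE g.predicate = 'givenName' AND f.predicate = 'familyName'
""").fetchall()

# Conventional relation: both attributes sit in the same row, so no join.
relation_rows = conn.execute(
    "SELECT givenName, familyName FROM Customers"
).fetchall()
```

Both queries return the same single row; the difference is that the triple store version grows by one join per attribute requested.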

Algorithm

The SQL query generator parses an RDF query and examines the predicates to determine whether they will come from a conventional relation or, if the data in question isn't in any of those schemas, from a generic triple store. This allows the custodian of the data to migrate data between the triple store and entity-specific relations (henceforth ESRs) to optimize for efficiency or clarity. The data in ESRs can be accessed by conventional non-RDF tools. One likely consumer of this technology is the digital libraries community.
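
A minimal sketch of the routing decision follows. It is not the Algae implementation. It assumes that ESR predicates are named Table_column inside the database's namespace (a convention inferred from the example query below), and the names ESR_NAMESPACE, ESR_COLUMNS and route_predicate are hypothetical.

```python
from typing import Optional, Set, Tuple

# Assumed namespace for predicates that map onto ESR columns.
ESR_NAMESPACE = "http://localhost/OrderTracking#"

# Assumed description of the deployed schema: (table, column) pairs.
ESR_COLUMNS: Set[Tuple[str, str]] = {
    ("Orders", "customer"),
    ("Orders", "product"),
    ("Products", "name"),
    ("Customers", "givenName"),
    ("Customers", "familyName"),
}


def route_predicate(predicate_uri: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return ('esr', table, column) when the predicate names a known ESR
    column, otherwise ('statements', None, None) for the triple store."""
    if predicate_uri.startswith(ESR_NAMESPACE):
        local_name = predicate_uri[len(ESR_NAMESPACE):]
        table, separator, column = local_name.partition("_")
        if separator and (table, column) in ESR_COLUMNS:
            return ("esr", table, column)
    # Anything not covered by the schema falls back to the triple store.
    return ("statements", None, None)
```

Under these assumptions, moving a property from the triple store into an ESR only requires adding its (table, column) pair to the schema description; queries that mention the predicate do not change.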

Order Tracking Example

An example of RDF access to an OrderTracking database shows how an RDF query can be translated into a SQL query. The heterogeneous DB introduces a side table, _Statements_, to the database with the additional triples:

_Statements_
subject	predicate	object
customer 1	marketingProfile	"suburban dad"
order 3183	dontTell	address 1

The sales department has used the _Statements_ table to add information without changing the deployed schema for either Customers or Orders. Further, this data is not just in notes, but is actually linked to the other elements in the database by more than coincident-sounding strings (see Actual _Statements_ Implementation below for how). They can then query for optional additional marketing information when selling pool supplies with

ns testDB=<http://localhost/OrderTracking#>
attach <http://www.w3.org/1999/02/26-modules/algae#dynamic> testDB:test1 (
                    class="W3C::Rdf::SqlDB"
                    properties="../test/OrderTracking.prop")
ask testDB:test1 (
       ?o	testDB:Orders_customer	?c .
       ?o	testDB:Orders_product	?p .
       ?p	testDB:Products_name	"pool" .
       ?c	testDB:Customers_givenName	?first .
       ?c	testDB:Customers_familyName	?last .
       ?c       <http://example.com/sales#marketingProfile> ?profile)
collect (?first ?last ?profile)

resulting in a query like

SELECT Orders_0.id AS o_id,
       Customers_0.id AS c_id,
       Products_0.id AS p_id,
       Customers_0.givenName AS first_givenName,
       Customers_0.familyName AS last_familyName,
       RdfIds_1.id AS profile_rdfid_id,
       RdfIds_1.type AS profile_rdfid_type
FROM Orders AS Orders_0
     INNER JOIN Customers AS Customers_0 ON Orders_0.customer=Customers_0.id
     INNER JOIN Products AS Products_0 ON Orders_0.product=Products_0.id
     INNER JOIN RdfIds AS RdfIds_0 ON RdfIds_0.tableName="Customers" AND RdfIds_0.tableRowId=concat("id=", Customers_0.id)
     INNER JOIN Statements AS Statements_0 ON Statements_0.subject=RdfIds_0.id
     INNER JOIN RdfIds AS RdfIds_1 ON Statements_0.object=RdfIds_1.id
WHERE Products_0.name="pool"

o_id	c_id	p_id	first_givenName	last_familyName	profile_rdfid_id	profile_rdfid_type
2185	1	1004	Biff	Thompson	3	String

leaving a little work to handle the data that's been normalized out of the _Statements_ table:

SELECT Strings.string
FROM RdfIds
     INNER JOIN Strings ON RdfIds.string=Strings.id
WHERE RdfIds.id=3

string
suburban dad

This provides flexibility to users who do not have the power to change the tables they use, or for whom the data is too sparse to merit the deployment effort of a schema change.
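
The two lookups above can be tried in one step with Python's sqlite3 module. This is a sketch under stated assumptions, not output from the query generator: SQLite's || operator stands in for concat(), the join to Strings is folded into the main query, the Statements row is an assumed one chosen so that it describes customer 1 as in the simplified table, and the short table aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, givenName TEXT, familyName TEXT);
    CREATE TABLE Strings (id INTEGER PRIMARY KEY, string TEXT);
    CREATE TABLE RdfIds (id INTEGER PRIMARY KEY, type TEXT, genId INTEGER,
                         uri INTEGER, string INTEGER,
                         tableName TEXT, tableRowId TEXT);
    CREATE TABLE Statements (id INTEGER PRIMARY KEY, subject INTEGER,
                             predicate INTEGER, object INTEGER);

    INSERT INTO Customers VALUES (1, 'Biff', 'Thompson');
    INSERT INTO Strings VALUES (1, 'suburban dad');
    INSERT INTO RdfIds VALUES (1, 'Ref',    0, 1, 0, NULL, NULL);
    INSERT INTO RdfIds VALUES (3, 'String', 0, 0, 1, NULL, NULL);
    INSERT INTO RdfIds VALUES (6, 'Table',  0, 0, 0, 'Customers', 'id=1');
    -- Assumed row: subject = customer 1 (RdfIds 6), object = the string (RdfIds 3).
    INSERT INTO Statements VALUES (1, 6, 1, 3);
""")

# Bridge from the ESR row to its RdfIds entry via tableName/tableRowId,
# follow the statement to its object, then resolve the normalized string.
row = conn.execute("""
    SELECT c.givenName, c.familyName, s.string
    FROM Customers AS c
         INNER JOIN RdfIds AS subj ON subj.tableName = 'Customers'
                                  AND subj.tableRowId = 'id=' || c.id
         INNER JOIN Statements AS st ON st.subject = subj.id
         INNER JOIN RdfIds AS obj ON st.object = obj.id
         INNER JOIN Strings AS s ON obj.string = s.id
""").fetchone()
```

The join on tableName and tableRowId is what ties a row of a conventional table to the triples that describe it, which is why no change to the Customers schema is needed.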

Actual _Statements_ Implementation

In fact, the _Statements_ table is more complex (and more normalized) than described above. Each subject, predicate, and object in _Statements_ is a reference to a row in _RdfIds_, which has the form:

_RdfIds_
id	type	genId	uri	string	tableName	tableRowId
1	Ref	0	1	0	NULL	NULL
2	Table	0	0	0	Orders	id=2185
3	String	0	0	1	NULL	NULL
4	Table	0	0	0	Addresses	id=1
5	Ref	0	2	0	NULL	NULL
6	Table	0	0	0	Customers	id=1

and _Statements_ holds keys into that table:

_Statements_
id	subject	predicate	object
1	2	1	3
2	2	5	4
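
Reading a statement back means following each key through _RdfIds_. The sketch below does this in Python over the rows shown above. It assumes that type Table points at a row of a conventional table, String points into the Strings table, and Ref points into a table of URIs that is not shown here; the names RdfId, describe and decode are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class RdfId(NamedTuple):
    type: str
    genId: int
    uri: int
    string: int
    tableName: Optional[str]
    tableRowId: Optional[str]


# Rows copied from the _RdfIds_ and _Statements_ tables above.
RDF_IDS: Dict[int, RdfId] = {
    1: RdfId("Ref", 0, 1, 0, None, None),
    2: RdfId("Table", 0, 0, 0, "Orders", "id=2185"),
    3: RdfId("String", 0, 0, 1, None, None),
    4: RdfId("Table", 0, 0, 0, "Addresses", "id=1"),
    5: RdfId("Ref", 0, 2, 0, None, None),
    6: RdfId("Table", 0, 0, 0, "Customers", "id=1"),
}
STATEMENTS: Dict[int, Tuple[int, int, int]] = {1: (2, 1, 3), 2: (2, 5, 4)}
STRINGS: Dict[int, str] = {1: "suburban dad"}


def describe(rdf_id: int) -> str:
    """Render one RdfIds entry in a readable form."""
    row = RDF_IDS[rdf_id]
    if row.type == "Table":
        return f"{row.tableName}[{row.tableRowId}]"
    if row.type == "String":
        return f'"{STRINGS[row.string]}"'
    if row.type == "Ref":
        # The URI table is not shown in this document, so only the key is printed.
        return f"uri#{row.uri}"
    raise ValueError(f"unknown RdfIds type: {row.type}")


def decode(statement_id: int) -> Tuple[str, str, str]:
    """Resolve the subject, predicate and object keys of one statement."""
    subject, predicate, obj = STATEMENTS[statement_id]
    return (describe(subject), describe(predicate), describe(obj))
```

For instance, decode(2) resolves the second statement to a link from the Orders row with id=2185 to the Addresses row with id=1, so both ends are real rows rather than matching strings.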

Experience

I've started an implementation in Algae. See the implementation notes.

Eric Prud'hommeaux <eric+www@w3.org>

$Id: Overview.html,v 1.6 2003/10/14 01:36:07 eric Exp $