Heterogeneous RDF Databases

Querying a generic triple store is inefficient because it does not gather all the properties of an entity together, which forces a self-join for each additional attribute involved in the query. Migrating the data to a conventional relation provides the efficiency we are used to, along with the brittleness we are used to enduring. The goal of heterogeneous RDF databases is to provide an optimizable compromise between the two.
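To make the self-join cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a toy `triples` table and a `Customers` relation with made-up data; neither comes from the original. Fetching three properties of one entity from the triple store takes two self-joins, while the relation answers the same question from a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Generic triple store (hypothetical): one row per (subject, predicate, object).
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    INSERT INTO triples VALUES
        ('customer1', 'givenName',  'Pat'),
        ('customer1', 'familyName', 'Smith'),
        ('customer1', 'city',       'Springfield');

    -- Conventional relation (hypothetical): all of an entity's properties in one row.
    CREATE TABLE Customers (id TEXT PRIMARY KEY, givenName TEXT,
                            familyName TEXT, city TEXT);
    INSERT INTO Customers VALUES ('customer1', 'Pat', 'Smith', 'Springfield');
""")

# Three attributes from the triple store: one self-join per extra attribute.
triple_rows = conn.execute("""
    SELECT t1.object, t2.object, t3.object
    FROM triples AS t1
         INNER JOIN triples AS t2 ON t2.subject = t1.subject
         INNER JOIN triples AS t3 ON t3.subject = t1.subject
    WHERE t1.predicate = 'givenName'
      AND t2.predicate = 'familyName'
      AND t3.predicate = 'city'
""").fetchall()

# The same question against the relation reads a single table.
relation_rows = conn.execute(
    "SELECT givenName, familyName, city FROM Customers").fetchall()

print(triple_rows == relation_rows)  # True
```

Each further attribute requested from the triple store adds another `INNER JOIN triples`, whereas the relation only grows its column list.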


The SQL query generator parses an RDF query and examines its predicates to determine whether each one will be answered from a conventional relation or, if the data in question isn't in any of those schemas, from a generic triple store. This allows the custodian of the data to migrate data between the triple store and entity-specific relations (henceforth ESRs) to optimize for efficiency or clarity. The data in ESRs can still be accessed by conventional non-RDF tools. One likely consumer of this technology is the digital libraries community.
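The predicate-routing decision can be sketched as a lookup against a catalog of ESR columns. This is a minimal illustration, not Algae's actual code: the `ESR_COLUMNS` catalog, the `Route` type, and the `route_predicate` function are hypothetical, and the `<base>#Table_column` naming convention is assumed from predicate names such as `testDB:Customers_givenName` in the OrderTracking example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ESR catalog for the OrderTracking example: table -> columns.
BASE = "http://localhost/OrderTracking#"
ESR_COLUMNS = {
    "Orders": {"customer", "product"},
    "Products": {"name"},
    "Customers": {"givenName", "familyName"},
}

@dataclass(frozen=True)
class Route:
    store: str                   # "ESR" or "Statements"
    table: Optional[str] = None  # set only for ESR routes
    column: Optional[str] = None

def route_predicate(uri: str) -> Route:
    """Send a predicate to an ESR column if one exists, else to the triple store."""
    if uri.startswith(BASE):
        # Assumed naming convention: <BASE>Table_column.
        table, sep, column = uri[len(BASE):].partition("_")
        if sep and column in ESR_COLUMNS.get(table, set()):
            return Route("ESR", table, column)
    # Anything no ESR covers falls back to the generic triple table.
    return Route("Statements")

for p in (BASE + "Customers_givenName",
          "http://example.com/sales#marketingProfile"):
    print(p, "->", route_predicate(p))
```

Under this scheme, moving a predicate from the triple store into an ESR is just a catalog change; queries that use it are routed to the new column without being rewritten.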

Order Tracking Example

An example of RDF access to an OrderTracking database shows how an RDF query can be translated into a SQL query. The heterogeneous DB introduces a side table, _Statements_, to the database with the additional triples:

subject       predicate          object
customer 1    marketingProfile   "suburban dad"
order 3183    dontTell           address 1

The sales department has used the _Statements_ table to add information without changing the deployed schema for either Customers or Orders. Further, this data is not just kept in notes; it is actually linked to the other elements in the database by more than coincidentally similar strings (see the actual _Statements_ implementation below for how). They can then query for optional additional marketing information when selling pool supplies with

ns testDB=<http://localhost/OrderTracking#>
attach <http://www.w3.org/1999/02/26-modules/algae#dynamic> testDB:test1 (
ask testDB:test1 (
       ?o	testDB:Orders_customer	?c .
       ?o	testDB:Orders_product	?p .
       ?p	testDB:Products_name	"pool" .
       ?c	testDB:Customers_givenName	?first .
       ?c	testDB:Customers_familyName	?last .
       ?c       <http://example.com/sales#marketingProfile> ?profile)
collect (?first ?last ?profile)

resulting in a query like

SELECT Orders_0.id AS o_id,
       Customers_0.id AS c_id,
       Products_0.id AS p_id,
       Customers_0.givenName AS first_givenName,
       Customers_0.familyName AS last_familyName,
       RdfIds_1.id AS profile_rdfid_id,
       RdfIds_1.type AS profile_rdfid_type
FROM Orders AS Orders_0
     INNER JOIN Customers AS Customers_0 ON Orders_0.customer=Customers_0.id
     INNER JOIN Products AS Products_0 ON Orders_0.product=Products_0.id
     INNER JOIN RdfIds AS RdfIds_0 ON RdfIds_0.tableName="Customers" AND RdfIds_0.tableRowId=concat("id=", Customers_0.id)
     INNER JOIN Statements AS Statements_0 ON Statements_0.subject=RdfIds_0.id
     INNER JOIN RdfIds AS RdfIds_1 ON Statements_0.object=RdfIds_1.id
WHERE Products_0.name="pool"

leaving a little work to handle the data that's been normalized out of the _Statements_ table:

SELECT Strings.string
FROM RdfIds
     INNER JOIN Strings ON RdfIds.string=Strings.id
WHERE RdfIds.id=3

suburban dad

This gives flexibility to users of tables who lack the authority to change them, or whose data is too sparse to justify the deployment effort of a schema change.
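The generated query and the follow-up string lookup can be exercised end to end with Python's sqlite3 module. This is a sketch under assumptions: the table layouts are guessed from the column names the generated SQL uses, the type tags ('row', 'uri', 'string') and all data values except order 3183 are invented, and the MySQL-style `concat("id=", ...)` and double-quoted strings are rewritten as SQLite's `'id=' || ...` and single-quoted literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ESRs: hypothetical minimal versions of the OrderTracking tables.
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, givenName TEXT, familyName TEXT);
    CREATE TABLE Products  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders    (id INTEGER PRIMARY KEY, customer INTEGER, product INTEGER);

    -- Side tables: column names from the generated SQL, types and tags assumed.
    CREATE TABLE Strings    (id INTEGER PRIMARY KEY, string TEXT);
    CREATE TABLE RdfIds     (id INTEGER PRIMARY KEY, type TEXT,
                             tableName TEXT, tableRowId TEXT, string INTEGER);
    CREATE TABLE Statements (subject INTEGER, predicate INTEGER, object INTEGER);

    INSERT INTO Customers VALUES (1, 'Pat', 'Smith');
    INSERT INTO Products  VALUES (7, 'pool');
    INSERT INTO Orders    VALUES (3183, 1, 7);

    INSERT INTO Strings VALUES
        (1, 'suburban dad'),
        (2, 'http://example.com/sales#marketingProfile');
    INSERT INTO RdfIds VALUES
        (1, 'row',    'Customers', 'id=1', NULL),  -- customer 1
        (2, 'uri',    NULL,        NULL,   2),     -- marketingProfile predicate
        (3, 'string', NULL,        NULL,   1);     -- "suburban dad"
    INSERT INTO Statements VALUES (1, 2, 3);
""")

# Same join structure as the generated query, in SQLite syntax.
row = conn.execute("""
    SELECT Orders_0.id, Customers_0.id, Products_0.id,
           Customers_0.givenName, Customers_0.familyName,
           RdfIds_1.id, RdfIds_1.type
    FROM Orders AS Orders_0
         INNER JOIN Customers AS Customers_0 ON Orders_0.customer = Customers_0.id
         INNER JOIN Products AS Products_0 ON Orders_0.product = Products_0.id
         INNER JOIN RdfIds AS RdfIds_0 ON RdfIds_0.tableName = 'Customers'
                    AND RdfIds_0.tableRowId = 'id=' || Customers_0.id
         INNER JOIN Statements AS Statements_0 ON Statements_0.subject = RdfIds_0.id
         INNER JOIN RdfIds AS RdfIds_1 ON Statements_0.object = RdfIds_1.id
    WHERE Products_0.name = 'pool'
""").fetchone()

# Follow-up lookup: resolve the object's RdfIds entry to its string.
(profile,) = conn.execute("""
    SELECT Strings.string
    FROM RdfIds INNER JOIN Strings ON RdfIds.string = Strings.id
    WHERE RdfIds.id = ?
""", (row[5],)).fetchone()

print(row[3], row[4], profile)
```

Passing `row[5]` (the `profile_rdfid_id` column) as a parameter is what the hard-coded `RdfIds.id=3` in the follow-up query stands in for.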

Actual _Statements_ Implementation

In fact, the _Statements_ table is more complex (and more normalized) than described above. Each subject, predicate, and object in _Statements_ is a reference to _RdfIds_, which is of the form:


and _Statements_ consists of keys into that table:



I've started an implementation in Algae. See the implementation notes.
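The normalized layout, in which every subject, predicate, and object is a key into _RdfIds_, can be sketched with a few hypothetical helpers (`intern_row`, `intern_text`, `add_statement`) over an in-memory SQLite database. Column names follow the generated SQL earlier in this note; the `predicate` column, the type tags, and the deduplication policy are assumptions, not a description of the Algae implementation. Interning means a repeated literal such as "suburban dad" is stored once in _Strings_ and referenced through a single _RdfIds_ entry.

```python
import sqlite3

# Hypothetical schema; 'predicate' column and 'row'/'uri'/'string' tags are assumed.
SCHEMA = """
CREATE TABLE Strings    (id INTEGER PRIMARY KEY, string TEXT UNIQUE);
CREATE TABLE RdfIds     (id INTEGER PRIMARY KEY, type TEXT,
                         tableName TEXT, tableRowId TEXT, string INTEGER);
CREATE TABLE Statements (subject INTEGER, predicate INTEGER, object INTEGER);
"""

def _string_id(conn, text):
    """Return the Strings.id for text, inserting it only the first time."""
    conn.execute("INSERT OR IGNORE INTO Strings (string) VALUES (?)", (text,))
    return conn.execute("SELECT id FROM Strings WHERE string = ?", (text,)).fetchone()[0]

def intern_row(conn, table, row_id):
    """RdfIds entry pointing at an ESR row, keyed like the generated SQL ('id=N')."""
    key = f"id={row_id}"
    found = conn.execute(
        "SELECT id FROM RdfIds WHERE type = 'row' AND tableName = ? AND tableRowId = ?",
        (table, key)).fetchone()
    if found:
        return found[0]
    return conn.execute(
        "INSERT INTO RdfIds (type, tableName, tableRowId) VALUES ('row', ?, ?)",
        (table, key)).lastrowid

def intern_text(conn, kind, text):
    """RdfIds entry for a URI or literal whose text lives in Strings."""
    sid = _string_id(conn, text)
    found = conn.execute(
        "SELECT id FROM RdfIds WHERE type = ? AND string = ?", (kind, sid)).fetchone()
    if found:
        return found[0]
    return conn.execute(
        "INSERT INTO RdfIds (type, string) VALUES (?, ?)", (kind, sid)).lastrowid

def add_statement(conn, s, p, o):
    """Store one triple as three keys into RdfIds."""
    conn.execute("INSERT INTO Statements VALUES (?, ?, ?)", (s, p, o))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
profile = intern_text(conn, "uri", "http://example.com/sales#marketingProfile")
dad = intern_text(conn, "string", "suburban dad")
add_statement(conn, intern_row(conn, "Customers", 1), profile, dad)
# A second customer with the same profile reuses the existing entries.
add_statement(conn, intern_row(conn, "Customers", 2), profile,
              intern_text(conn, "string", "suburban dad"))
```

After both statements, _Strings_ holds two rows and _RdfIds_ four, because the second "suburban dad" resolves to the entry created for the first.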

Eric Prud'hommeaux <eric+www@w3.org>
$Id: Overview.html,v 1.6 2003/10/14 01:36:07 eric Exp $