Abstract

W3C's Resource Description Framework, RDF [1], has been of great interest for networked data expression due in part to the popularity of the World Wide Web, and in part to its foundations in more traditional knowledge representation and reasoning systems. Among the attractive features of this language is the ability to distribute ontology development by basing terms in dereferencable, globally unambiguous identifiers. For this, RDF uses URIs whose scalability has been proven by the success of the web. This paper introduces a relational database/RDF gateway called Federate.

SQL has worked well for intranets and homogeneous trust environments. When relational data is made available to wider audiences, custodians must create custom gateways. This is often done via web sites which present content generated out of the database; RDF provides a similar path from the database to the world, but without the loss of semantic precision which occurs in using HTML. RDF's universally grounded entities and relationships provide a mechanism to unify RDBs with other RDBs, as well as static data and other sources. The resulting network of universally interpretable data can be used as a medium for future data-oriented applications.

Data encoding languages such as XDR and ASN.1 concern themselves with expressing data structures but leave the semantics up to context hidden in the communication protocol endpoints and in contained data available to higher level protocols. This limits the re-use of this data to applications which are given special knowledge of this context. Fully self-described documents allow transport layer protocols like SMTP, FTP, NNTP and HTTP to provide caching, authentication and other generic infrastructure functions.

Introduction
Overview of RDF Model
"Intranet" Use Case
"Internet" Use Case
Implementations
Syntax and Semantics of n3
Syntax and Semantics of algae
Example 1 - OrderTracking Billing Information (JOIN)
Example 2 - OrderTracking Shipping Information (OUTER JOIN)
Algorithm
Migration Between Holds and Normalized
Language Extents
Modeling n-ary Relations
Query Distribution
View Update
Alternative Approaches
Example Tables
Glossary
Bibliography

Introduction

RDF is a language for expressing Directed Labeled Graphs (DLGs) using URIs (a superset of URLs) as node and arc identifiers. It can be expressed in XML or in non-XML syntaxes like n3 or ntriples. The three statements:

testDB:Orders_id_2185 has an testDB:Orders-customer of testDB:Customers_id_1
testDB:Orders_id_2185 has an testDB:Orders-product of testDB:Products_id_1004
testDB:Orders_id_2185 has an testDB:Orders-orderDate of "Jan 1 2001 13:24:55"

can be easily visuallized as a graph.

As DLGs clearly express relationships between objects, RDF is an excellent language for expressing relational data. For instance, the above statements came from a row in an example OrderTracking database:

Orders

Customers

id	customer	product	orderDate	shippingAddress
2185	1	1004	20010101132455	NULL
2186	2	2001	20020902132455	NULL
2187	2	2004	20020902132455	NULL
3183	2	2005	20020902132455	2

id	givenName	familyName	billingAddress
1	Biff	Thompson	1
2	Chip	Thompson	1
3	Eustis	Walker	2
4	Elie	Tweak	3

Enthusiasm for XML and semi-structured data has prompted some shift of highly structured relational data into XML documents and databases. Despite this, storage of structured data is still highly concentrated in relational databases, and, for performance and integrity checking reasons, likely to stay that way. Tools like Federate can be used to provide a consistent interface to the vast data stores in commercial infrastructure. This consistent interface simplifies the job of defining views of the data that will be available to different actors. These views may be joined with views from other publishers, creating a much more useful fabric of data that doesn't require human intuition (and luck) to interpret.

Storing RDF data in a relational database is not novel, however, Federate provides RDF agents with the ability to query a relational database with an application-specific schema. Querying RDF stored as Statements incurs the cost of an additional join per restriction condition (WHERE or JOIN ON clause in SQL). In general, properly normalized relational databases do not need a join for unique properties, ie, properties for which there is one and only one slot. Federate provides RDF data with the speed, compactness, and integrety constraints of conventional relational databases.

Overview of RDF Model

RDF is a language for modeling data in binary relations. Relations are expressed in statements composed of a subject, object and a predicate (relationship) between them. These statements combine to form a directed, labeled graph (DLG).

The data expressed may be anything to which one may assign relationships. Data and meta-data are expressed in the same language and may appear in the same document. Those familiar with UML may note that RDF imposes no distinction between the data, model and meta-model (...). Statements are unordered as assertions in predicate logic are unordered. Ordering of lists is handled withing the model with specific predicates.

"Intranet" Use Case

Database federation needs arise typically from independently developed and administered data. Relational schemes impose strict understanding of the relationships of the modeled data. However, joining two databases can be tricky as these relationships may be less clear. This can arrise from incomplete understanding, but more often will arrise when the custodians of the data have different needs. While the elements in the microworld described in two databases may coincide, it is not guaranteed they will correspond exactly to entities in both database. For instance, a vender database that maintains a part number for each tracked item may not correspond to a client's database which has sales names for these items, or house-defined names for groups of the items, or part numers from that vender interspersed with part numbers from other venders. Sometimes the correlation of two entities can only be asserted for a particular domain of query, eg, for the purposes of this query, butter melting frobnosticators can be considered the same as heated mystifiers. This use case considers the case where such an impedance mismatch occurs for data that is managed by different departments in a large organization.

Custom code can be written to handle joins across two independently maintained databases within the same organization, but it will only be useful for those two databases, and serviced queries will have to have the same attitude about what items can be merged. It is frequently impractical to provide a meta language to describe what items can be conditionally merged, and when. Development of such languages leads to many years of theoretical exploration.

Mapping databases to RDF has proven quite simple. Slightly more challenging is the use of ontology languages to join disparate databases. RDFS and OWL can be used to relate these databases, either in a universal it-is-always-true-that sense, or in a premise, for-the-purposes-of-this-query, sense. This allows one to code the expertise embedded in custom database gateways in a language that has already been developed to globally unambiguously model data.

Instead of simply defining mappings between two databases, it is also possible to define mappings to a "house standard" of entity references. This allows the mapping problem to be addressed once for each database, rather than once for each pair of database.

"Internet" Use Case

Because the "intranet" use case did not rely on coordination of disparate databases within an organization, the same approach can be used to magage relationships between databases published by different organizations, for instance, a vendor and a product rating service.

A product database may have some public visibility in the form of a marketing portal. This is currently done by publishing product lists in HTML, frequently with forms for purchasing the advertised products. Consumer desire to use specialized tools to browse and submit orders may motivate the vendor to publish the data in a predictable, machine-readbale language. This eliminates tedious screen scraping from the more necessary task of semantic interpretation of the data.

A consumer-oriented rating service, ala Consumer Reports, can now publish their product reviews reusing the vendor's entity identifiers. Users of this data now have a trivial job of joining the two sources as they have the equivilent of an external key. This is one ideal of the semantic web.

In the common scenario that publishers do not agree on common identifers for items, the same mechanisms used in the "intranet" use case can be used to establish the useful relationships.

Implementations

At present, Federate works with two reasoning agents, algae [1], and cwm [2]. Most of the experimentation has been done on the W3C access control database and a sample OrderTracking database.

Syntax and Semantics of n3

A subset of n3 is used this paper. It can be thought of as a series of statements, each terminated by a '.'. Each statement consists of a subject, relation, and object. Each of the nodes may be a URI enclosed in <> or a qname as described in XML namespaces. Statements may be grouped in a formula. This has the effect of removing them from the assertions of that document. These formulae are used in, amoung other things, rules. The standard format of a rule is:

<> log:forAll ','-sepparated var list .
{ statements with vars } log:implies { statements with vars } .

In fact, n3 is much more than the portion used for examples in this document. Unlike Algae, it uses triples to declare rules (and implicitly, queries) and to declare the relationship between documents and the data in those documents. For more information, see A Rough Guide to N3.

Syntax and Semantics of algae

Algae is a simple, pragmatic query and rules language. The functions that are relevent to this paper are:

ns prefix=uri: declare the prefix prefix for the URI uri.
attach type dbName (parameter=value...): attach to database of type type and call it dbName.
ask dbName (triple ...): select a graph described by the triples from the database dbName. Each triple consists of a subject, relation, and object in the same syntax as n3.
collect (variable ...): select variables from the results of an ask

The connectives are described below in Language Extents.

Example 1 - OrderTracking Billing Information (JOIN)

The following algae query of the example OrderTracking database demonstrates access to a simple, but practical JOIN on a set of database relations (tables). See the complete tables from which this data was taken, the algae script and the resulting graph.

ns testDB=<http://localhost/OrderTracking#>
attach <http://www.w3.org/1999/02/26-modules/algae#dynamic> testDB:test1 (
                    class="W3C::Rdf::SqlDB"
                    properties="../test/OrderTracking.prop")
ask testDB:test1 (
       ?o	testDB:Orders_id		?orderId .
       ?o	testDB:Orders_customer		?c .
       ?o	testDB:Orders_orderDate		20020907 .
       ?o	testDB:Orders_product		?p .
       ?p	testDB:Products_name		?productName .
       ?c	testDB:Customers_givenName	?first .
       ?c	testDB:Customers_familyName	?last .
       ?c	testDB:Customers_billingAddress	?billAddr .
       ?billAddr testDB:Addresses_street	?billStreet .
       ?billAddr testDB:Addresses_city		?billCity .
       ?billAddr testDB:Addresses_state		?billState
      )
collect (?orderId ?productName ?first ?last ?billStreet ?billCity ?billState)

Following is a breakdown of the same query in n3 of the OrderTracking database.

n3 distinguishes variables be explicitly listing existentials and universals. Following "log:forAll" is a list of universal quantifiers.

this log:forAll :o, :d, :p, :productName, :c, :first, :last, :billAddr, :billStreet, :billCity, :billState .

<sql://rdftest@swada.w3.org/OrderTracking/> is log:authoritativeService of
	Orders:id, Orders:customer, Orders:product, Orders:orderDate, 
	Products:id, Products:name, 
	Customers:id, Customers:familyName, Customers:givenName, Customers:billingAddress, 
	Addresses:id, Addresses:street, Addresses:city, Addresses:state.

This example query doesn't rely on any external schema definition; the external keys are associated with their table/primary key pairs with "log:pointsAt".

Orders:product log:pointsAt Products:id .
Orders:customer log:pointsAt Customers:id .
Customers:billingAddress log:pointsAt Addresses:id .

 :o Orders:id :orderId .
 :o Orders:customer :c .
 :o Orders:orderDate 20020907 .
 :o Orders:product :p .

 :p Products:name :productName .

 :c Customers:givenName :last .
 :c Customers:familyName :first .
 :c Customers:billingAddress :billAddr .   

 :billAddr Addresses:street :billStreet .
 :billAddr Addresses:city :billCity .
 :billAddr Addresses:state :billState .

For each order :o, find the order date, product name and the customer's name and billing address.

SELECT Orders_0.id AS o_id,
       Customers_0.id AS c_id,
       Products_0.id AS p_id,
       Products_0.name AS productName_name,
Customers_0.givenName AS first_givenName,
Customers_0.familyName AS last_familyName,
Addresses_0.id AS billAddr_id,
       Addresses_0.street AS billStreet_street,
Addresses_0.city AS billCity_city,
Addresses_0.state AS billState_state
FROM Orders AS Orders_0
     INNER JOIN Customers AS Customers_0 ON Orders_0.customer=Customers_0.id
     INNER JOIN Products AS Products_0 ON Orders_0.product=Products_0.id
     INNER JOIN Addresses AS Addresses_0 ON Customers_0.billingAddress=Addresses_0.id
WHERE Orders_0.orderDate="20020907"

orderId	productName	first	last	billStreet	billCity	billState
"2185"	"pool"	"Biff"	"Thompson"	"123 Elm Street"	"EdgeCity"	"AV"
"2187"	"nose ring"	"Chip"	"Thompson"	"123 Elm Street"	"EdgeCity"	"AV"
"3183"	"other ring"	"Chip"	"Thompson"	"123 Elm Street"	"EdgeCity"	"AV"

Example 2 - OrderTracking Shipping Information (OUTER JOIN)

This example extends the OrderTracking JOIN example above by adding optional arcs for shipping address and contact (person to sign for the shipment). This demonstrates the algorithm's ability to manage multiple joins to the same table. See the complete tables from which this data was taken, the algae script and the resulting graph.

ns testDB=<http://localhost/OrderTracking#>
attach <http://www.w3.org/1999/02/26-modules/algae#dynamic> testDB:test1 (
                    class="W3C::Rdf::SqlDB"
                    properties="../test/OrderTracking.prop")
ask testDB:test1 (
       ?o	testDB:Orders_id		?orderId .
       ?o	testDB:Orders_customer		?c .
       ?o	testDB:Orders_orderDate		20020907 .
       ?c	testDB:Customers_givenName	?first .
       ?c	testDB:Customers_familyName	?last .
       ?c	testDB:Customers_billingAddress	?billAddr .
       ?billAddr testDB:Addresses_street	?billStreet .
       ?billAddr testDB:Addresses_city		?billCity .
       ?billAddr testDB:Addresses_state		?billState .
       ~?o	testDB:Orders_shippingAddress	?shipAddr .
       ~?shipAddr testDB:Addresses_street	?shipStreet .
       ~?shipAddr testDB:Addresses_city		?shipCity .
       ~?shipAddr testDB:Addresses_state	?shipState .
       ~?shipAddr testDB:Addresses_contact	?signer .
       ~?signer	testDB:Customers_givenName	?sFirst .
       ~?signer	testDB:Customers_familyName	?sLast .
      )
collect (?orderId ?first ?last ?billStreet ?billCity ?billState ?sFirst ?sLast ?shipStreet ?shipCity ?shipState)

specifies that the Addresses table JOINed on the relvar Orders.shippingAddress, as well as the Customers table then joined on Addresses.contact, are JOINed with an OUTER join:

yielding:

orderId	first	last	billStreet	billCity	billState	sFirst	sLast	shipStreet	shipCity	shipState
"2185"	"Biff"	"Thompson"	"123 Elm Street"	"EdgeCity"	"AV"	NULL	NULL	NULL	NULL	NULL
"2187"	"Chip"	"Thompson"	"123 Elm Street"	"EdgeCity"	"AV"	NULL	NULL	NULL	NULL	NULL
"3183"	"Chip"	"Thompson"	"123 Elm Street"	"EdgeCity"	"AV"	"Eustis"	"Walker"	"245 King Street"	"EdgeCity"	"AV"

Algorithm

The SQL query generator parses an RDF query and examines the predicates to determine from which relation they will come. The algorithm is roughly as follows:

0. compile table structure into Classes with Property objects

For each tableName {
    c = new Class(tableName)
    for each fieldName {
        c.addProperty(new Property(c, fieldName)
    }
}

1. find relation for predicate

Each relation has a set of attribute identifiers (column headings). The predicates served by the database are a function of the relation and the attribute within that relation. This can be a strait string transformation (as in the above example) or a lookup table to allow for more flexible property names.

property = getProperty(predicate uri)
property.processTerm(triple, db)

2. identify entity alias

Entities that are the subject of a triple will either be named explicitly by an URI or be identified by a variable. Each time a new variable is introduced to identify entities in the same relation, a new table alias is generated. This allows the Addresses relation to be used as both the home address and the shipping address above.

db.getAlias(subject)
    if (ret = symbol2table{symbol}) return ret
    ret = symbol2table{symbol} = tableAlias{table}++ # Orders_0

Migration Between Holds and Normalized Data Stores

Suppose Chip places an order for a nose ring and we wish to store this in a generic triple store. This would be composed of the triples:

[ customer [ givenName "Chip" ] .
  product  [ name "nose ring" ] ] .

which implies the following triples, using _:<n> to represent an unnamed node in the graph:

_:1 customer _:2 .
_:2 givenName "Chip" .
_:1 product _:3 .
_:3 name "nose ring" .

If we wanted a normalized database to optimize storage and retrieval of this information, we could create an OrderTracking database:

Orders

Customers

Products

id	customer	product
2187	2	2004

id	givenName
2	Chip

id	name
2004	nose ring

In doing this, we must invent some integers (2187, 2, 2004) to use as identifiers for the rows in the tables. These enable use to store the relationships conveyed by _:1, _:2 and _:3 above. The id attribute for Orders, Customers and Products is not an attribute of the row, but an artifact of the mechanism for storing internal structure.

For consistency between the two expressions (holds and normalized), it is necessary to treat all external keys (Orders.customer and Orders.product above) not as integers, but as references to the entity represented by the indicated row in some table. A consequence of this contraint is that a benighted relational database that does not know the external keys, but instead treats all joins at being constrained on coincident attributes will report the object of would-be foreign key arcs as simply the integer representing this primary key of the referenced table/row.

_:1 customer 2 .

This would be analogous to a generic triple store reporting that the object of an arc some memory address instead of the node in the graph that was the true object of that arc. When later informed of these external keys, the object will no longer be an integer, but instead an entity reference.

_:1 customer <http://...Customers#id.204> .

These two statements are non-monotonic. This is appropriate as this system is indeed inconsistent and reflects a change in the state of the microuniverse.

Frequently, larger distribution organizations assign identifiers to orders and products. These are seldom small integers that should be used as primary keys, but instead published alphanumeric strings appearing in catalogs or returned to the customer who has place an order.

Orders

Customers

Products

id	customer	product	orderID
2187	2	2004	200303120001a

id	givenName
2	Chip

id	orderID	name
2004	nr21a.5	nose ring

This would add two triples to the above list.

_:1 customer _:2 .
_:2 givenName "Chip" .
_:1 product _:3 .
_:3 name "nose ring" .
_:1 orderID "200303120001a" .
_:3 prodID "nr21a.5" .

It is possible that a custodian of the relational store would choose to publish the external keys as order or product identifiers. In such a case, the primary key would be a published attribute of the entity represented by the table row. External keys would remain as opaque references to the entity.

_:1 customer _:2 .
_:2 givenName "Chip" .
_:1 product _:3 .
_:3 name "nose ring" .
_:1 orderID "2187" .
_:3 prodID "2004" .

The current implementations of Federate assign ids to rows in tables based on the primary key. This may not be good practice, but is temptingly convenient and useful.

Language Extents

Federate can interpret most common logical connectives and express them in SQL to the federated sources. The cwm port can, at this time, express only conjunction, but the development platform, algae, can express and execute the standard set:

connective	algae	cwm
conjunction	(?a ?b ?c)(?d ?c ?e)	:a :b :c. :d :c :e.
disjunction	(\|\| (?a ?b ?c)(?a ?b ?d))	not implemented
negation	!(?a ?b ?c)	not implemented
outer join	(?a ?b ?c)~(?d ?c ?e)	not implemented
domain constraints
regular expression	~http://example.com/.*	not implemented

@@@For instance, the somewhat complicated looking query:@@@

Modeling n-ary Relations

A common criticism of RDF is the consequence of the decision to model relations as only binary. This requires additional enitties to be created when modeling n-ary relations. For example, the common supplier schenaro:

requires that a Supplies entity be created though it does not model a phenomena and not an object in the microuniverse. (It could be argued that the Supplies entity models a contract or a percieved acquisition agreement.) This turns out to not be a problem for mapping to the relational world as RDBs are similarly constrained. A relation may not have duplicate tuples, implying a primary key (with potentially more than one attribute). The entity represented by this tuple can be represented by a node in an RDF graph where the node can be named by a function of the attributes in the primary key. Alternatively, the node may be unnamed but distingueshed by arcs describing the attributes in the primary key.

Query Distribution

Many relational databases provide a mechanism to distribute database tuples and attributes in a manner that allows a query planner to predict which sites will have which data. The same processes can be used by an SQL query engine handling Federate requests. This could also be done by the Federate engines cwm and algae as both have unifiers to merge data from SQL with data from other sources like HTTP or rules. At present, both implementations decide which SQL store to query based on the predicate. Distributions based on the subject or object would be analogous to relation databases with tuples distributed at different sites. This should not be significant more difficult than the attribute distribution which is already implemented.

View Update

Inserting or replacing arcs derivative of a normalized relational store is analogous to the classic view update problem in relational databases. Briefly, a view is created by defining a query involving one or more base tables and assigning the resulting table a name. This name becomes available for constructing future queries. Whether the view is populated with tuples which must be updated when the base tables are updates, or simply invokes the defining query each time it is referenced, is an implementation detail that the custodian should not have to consider. The view update problem occurs when one inserts, updates, or deletes the triples in a view. Such updates need to reflect back to updates in the base tables in such a way that the base tables and the derivative view remain consistent.

C J Date outlines one approach to view update. At this time, this approach has not been implemented in Federate. Given a set up triple assertions which reflect changes to the microuniverse, it seems feasible to use this approach to create a set of SQL updates resulting in a consistent system.

Heterogeneous RDF Databases

A benefit of semi-structured databases is the flexibility to store data outside of a predefined schema. RDF is designed so that adding statements to a graph does not change the meaning of the existing data. This enables agents to interpret the portion of the database that they understand and safely ignore the rest. In a sales context, RDF can be used to store data in a schema that evolves rapidly to suit client needs. As mentioned above, storing arbitrary RDF in a relational database is commonly implemented. This data may be stored in the same database (though different tables) as normalized data. The agent storing data can examine the data and see if it is appropriate for a normalized table. If not, it can be stored in the arbitrary RDF store. A query agent can examine the terms in a query and separate those that are in the normalized database from those stored in the generic triple store. It is possible to shift all the unificatoin to the database by constructing joins between the arbitrary RDF store tables and the normalized tables. For example, the following query searches for extra terms introduced ad hoc by the sales department:

@@@

The constructed SQL query is large, but effective:

@@@

Alternative Approaches

As stated above, the emergence of XML tools has enabled relational data to be expressed in XML semi-structured databases. For instance, the example OrderTracking database could trivially be stored in an XML tree structure.

Colloquial XML

<OrderTracking xmlns="http://example.com/colloquial-ns#"
  <Orders>
    <Order id="2185">
      <customer>1</customer>
      <product>1004</product>
      <orderDate>20020902132455</orderDate>
    </Order>
    <Order id="2186">
      <customer>2</customer>
      <product>2001</product>
      <orderDate>20020902132457</orderDate>
    </Order>
  </Orders>
  <Customers>
    <Customer id="1">
      <givenName>Biff</givenName>
      <familyName>Thompson</familyName>
      <billingAddress>1</billingAddress>
    </Customer>
    <Customer id="2">
      <givenName>Chip</givenName>
      <familyName>Thompson</familyName>
      <billingAddress>1</billingAddress>
    </Customer>
  </Customers>
</OrderTracking>

taken from a larger example colloquial-embedded.xml.

@@@ ... Leads us to consider the difference between a database communication protocol and a more application specific protocol for placing orders ala XMLP.

There are a few advantages to encoding this data in XML. The principle one is human readability/editability which comes at a cost of verbosity. Another is the use of standard XML constraint languages like schemas and dtds, though these are usually available directly in a relational database and more tailored to the semantics of database attributes. A third is the use of XML transformation tools, like xinclude and XSTL, to manipulate the structure of the document.

The most useful tool for such an XML database is XQuery. This language is designed to provide similar functionality to SQL but work on XML databases. The advantage of SQL of XQuery is that SQL is tied more closely to the semantics of the relations. Where a query for billing address for the pending orders would look like

SELECT Orders.id, familyName, givenName 
FROM Orders 
INNER JOIN Customers ON customer=Customers.id

or the familiar form

SELECT Orders.id, familyName, givenName 
FROM Orders, Customers
WHERE customer=Customers.id

in XQuery, it would be more verbose:

FOR $o in doc("colloquial.xml")/OrderTracking/Orders/Order,
    $c in doc("colloquial.xml")/OrderTracking/Customers/Customer,
WHERE $o/customer/text() = $r/attributes("id")
RETURN <answer>
         {$r/attributes("id")}
         {$c/familyName}
         {$c/givenName}
       </answer>

Contrasting SQL and XQuery

function	SQL	XQuery
table selection	FROM Orders, Customers	FOR $o in doc("colloquial.xml")/OrderTracking/Orders/Order, $c in doc("colloquial.xml")/OrderTracking/Customers/Customer,
constraints	WHERE customer=Customers.id	WHERE $o/customer/text() = $r/attributes("id")
output selection	SELECT Orders.id, familyName, givenName	RETURN <answer> {$r/attributes("id")} {$c/familyName} {$c/givenName} </answer>

The overhead in processing "/OrderTracking/Orders/Order" is much higher than selecting the Orders table in a relational store. The XQuery processor must check the root node to see if it matches "OrderTracking". If it does, it must check each of the immediate children in the XML tree to see if it matches "Orders" and for each match, repeat the process for "Order". An XML database could assume such a layout of the data and store it in relations, but then it would be unable to process arbitary XQuery (XPath, actually) paths. Such an implementation would actually be a relational database with an alternative query syntax.

On the generation side, it will again be inefficient to create an tree stucture capable of storing arbitrary tree data when a simple derived relation would do. I believe, however, that the output is guaranteed to be an XML tree so the benefits of closure (the same operations may be performed on the output as on the input), such as nested expressions, would be available to XML.

Relational Model With Embedded Data Typing

<DB:Database xmlns:DB="http://example.com/relationalModel-ns#">
  <DB:Relation DB:name="Orders">
    <DB:Tuple>
      <DB:Attribute DB:AttrName="id" DB:Domain="DB:Integer" DB:PrimaryKeyOrdinal="0">2185</DB:Attribute>
      <DB:Attribute DB:AttrName="customer" DB:Domain="DB:Integer" DB:ExternalKey="Customers:id">1</DB:Attribute>
      <DB:Attribute DB:AttrName="product" DB:Domain="DB:Integer" DB:ExternalKey="Products:id">1004</DB:Attribute>
      <DB:Attribute DB:AttrName="orderDate" DB:Domain="DB:Date">20020902132455</DB:Attribute>
    </DB:Tuple>
    <DB:Tuple>
      <DB:Attribute DB:AttrName="id" DB:Domain="DB:Integer" DB:PrimaryKeyOrdinal="0">2186</DB:Attribute>
      <DB:Attribute DB:AttrName="customer" DB:Domain="DB:Integer" DB:ExternalKey="Customers:id">2</DB:Attribute>
      <DB:Attribute DB:AttrName="product" DB:Domain="DB:Integer" DB:ExternalKey="Products:id">2001</DB:Attribute>
      <DB:Attribute DB:AttrName="orderDate" DB:Domain="DB:Date">20020902132457</DB:Attribute>
    </DB:Tuple>
  </DB:Relation>
  <DB:Relation DB:name="Customers">
    <DB:Tuple>
      <DB:Attribute DB:AttrName="id" DB:Domain="DB:Integer" DB:PrimaryKeyOrdinal="0">1</DB:Attribute>
      <DB:Attribute DB:AttrName="givenName" DB:Domain="DB:PersonName">Biff</DB:Attribute>
      <DB:Attribute DB:AttrName="familyName" DB:Domain="DB:PersonName">Thompson</DB:Attribute>
      <DB:Attribute DB:AttrName="billingAddress" DB:Domain="DB:Integer" DB:ExternalKey="Addresses:id">1</DB:Attribute>
    </DB:Tuple>
    <DB:Tuple>
      <DB:Attribute DB:AttrName="id" DB:Domain="DB:Integer" DB:PrimaryKeyOrdinal="0">2</DB:Attribute>
      <DB:Attribute DB:AttrName="givenName" DB:Domain="DB:PersonName">Chip</DB:Attribute>
      <DB:Attribute DB:AttrName="familyName" DB:Domain="DB:PersonName">Thompson</DB:Attribute>
      <DB:Attribute DB:AttrName="billingAddress" DB:Domain="DB:Integer" DB:ExternalKey="Addresses:id">1</DB:Attribute>
    </DB:Tuple>
  </DB:Relation>
</DB:Database>

taken from a larger example RM-embedded.xml.

This schema is a way to represent any relational database. The same tools could be used to query/manipulate the OrderTracking database as any other database.

Native RDF Database

One strong motivation for gateways between RDF and relational data is the ability of relational databases to take advantage of data modeling constraints to increase efficiency. For instance, if a set of entities have a unique attributes, they can be indexed with a B-Tree yielding log(n) access time. The holds database is a generic triple store and therefor may not be indexed so efficiently. Further, a query for a tuple constrained by n properties is quite efficient in a normalized store, but must be resolved with n-1 self-joins in the holds database. All the operations described above to store appropriate RDF data in a relational store impose schema constraints on the data and take advantage of them to increase efficiency.

The constraint over multiple attributes on a tuple takes advantage of a database's locality. In general, relational stores are designed so that retrieving any attributes from a tuple retrieves the whole tuple. Extending an object database to support RDF natively, we could provide for that same locality to represent a node and all the properties going to or coming from that node. Further, the database can take advantage of schema constraints like OWL's InverseFunctionalProperty to create unique indexes to map these property values back to the respective nodes. The query planner can look for these properties and optimize database queries much as they are done with conventional relational databases. DLG showing localization of properties to the nodes that touch

As many arcs touch two nodes, and as those nodes need to be retrievable when acquired via either node, duplication may be necessary. This improves the efficiency to read, say, the Product 1004 or Order 2185, but increases the burden for updating the database (need two writes). This is efficient in scenarios where reading happens more frequently than update.

Example Tables

Four tables were used in the examples above:

Orders
id	customer	product	orderDate	shippingAddress
2185	1	1004	20020907	NULL
2186	2	2001	20020908	NULL
2187	2	2004	20020907	NULL
3183	2	2005	20020907	2

Customers
id	givenName	familyName	billingAddress
1	Biff	Thompson	1
2	Chip	Thompson	1
3	Eustis	Walker	2
4	Elie	Tweak	3

Products
id	name
1001	white house
1002	picket fence
1003	sport utility vehicle
1004	pool
1005	grill
2001	skateboard
2002	rebelious music
2003	earring
2004	nose ring
2005	other ring

Addresses
id	apt	street	city	state	contact
1	NULL	123 Elm Street	EdgeCity	AV	1
2	1	245 King Street	EdgeCity	AV	3
3	18b	Ally 17	BigCity	AV	3

These are also available in a text file.

Glossary

tuple
RDB relation: A relation is an abstraction of what we commonly call a table. In this abstraction, there is no order to the rows or columns and no two rows may have the same attributes (values). Attributes must be referenced by name and rows must be referenced by distinguishing attributues.
RDF relation
ER relation
domain (data type)
proposition
microuniverse: fraction of the real world which is modeled in a databaes; also called the domain of discourse

Bibliography

- C. J. Date