A Direct Mapping of Relational Data to RDF

1 Introduction

Relational databases proliferate both because of their efficiency and their precise definitions, allowing for tools like SQL [SQLFN] to manipulate and examine the contents predictably and efficiently. Resource Description Framework (RDF) [RDF] is a data format based on a web-scalable architecture for identification and interpretation of terms. This document defines a mapping from a relational representation to an RDF representation.

Strategies for mapping relational data to RDF abound. The direct mapping defines a simple transformation, providing a basis for defining and comparing more intricate transformations. This document includes an informal and a formal description of the transformation.

The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.

2 Direct Mapping Description (Informative)

The direct mapping defines an RDF Graph [RDF] representation of the data in any relational database. The direct mapping takes as input a relational database (data and schema), and generates an RDF graph that is called the direct graph. This graph is composed of relative IRIs that may be resolved against a base IRI per [RFC3987]. Foreign keys in relational databases establish a named reference from any row in a table to exactly one row in a (potentially different) table. The direct graph conveys these references, as well as each value in the rows.

2.1 Direct Mapping Example

The concepts of the direct mapping can be introduced with an example RDF graph produced from a relational database. The following SQL creates a simple example with two tables with single-column primary keys and one foreign key reference between them, and populates them with data:

CREATE TABLE Addresses (
	ID INT,
	city CHAR(10),
	state CHAR(2),
	PRIMARY KEY(ID)
);

CREATE TABLE People (
	ID INT,
	fname CHAR(10),
	addr INT,
	PRIMARY KEY(ID),
	FOREIGN KEY(addr) REFERENCES Addresses(ID)
);

INSERT INTO Addresses (ID, city, state) VALUES (18, 'Cambridge', 'MA');
INSERT INTO People (ID, fname, addr) VALUES (7, 'Bob', 18);
-- Sue has no address, so addr is NULL.
INSERT INTO People (ID, fname, addr) VALUES (8, 'Sue', NULL);

Tables are used in this document to convey SQL tables. Primary key columns are marked PK to convey an SQL primary key such as ID in CREATE TABLE Addresses (ID INT, ... PRIMARY KEY(ID)). Foreign keys are illustrated with a notation like "→ Addresses(ID)" to convey an SQL foreign key such as CREATE TABLE People (... addr INT, FOREIGN KEY(addr) REFERENCES Addresses(ID)).

People
PK		→ Addresses(ID)
ID	fname	addr
7	Bob	18
8	Sue	NULL

Addresses
PK
ID	city	state
18	Cambridge	MA

Given a base IRI http://foo.example/DB/, the direct mapping of this database produces a direct graph:

@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<People/ID=7#_> rdf:type <People> .
<People/ID=7#_> <People#ID> 7 .
<People/ID=7#_> <People#fname> "Bob" .
<People/ID=7#_> <People#addr> <Addresses/ID=18#_> .
<People/ID=8#_> rdf:type <People> .
<People/ID=8#_> <People#ID> 8 .
<People/ID=8#_> <People#fname> "Sue" .

<Addresses/ID=18#_> rdf:type <Addresses> .
<Addresses/ID=18#_> <Addresses#ID> 18 .
<Addresses/ID=18#_> <Addresses#city> "Cambridge" .
<Addresses/ID=18#_> <Addresses#state> "MA" .

In this expression, each row, e.g. (7, "Bob", 18), produces a set of triples with a common subject. The subject is an IRI formed from the concatenation of the base IRI, table name (People), primary key column name (ID) and primary key value (7). The predicate for each column is an IRI formed from the concatenation of the base IRI, table name and the column name. The values are either RDF literals formed from the lexical form of the column value, or, in the case of foreign keys, row identifiers (<Addresses/ID=18#_>). Note that these reference row identifiers must coincide with the subject used for the triples generated from the referenced row.

2.2 Preliminaries: Generating IRIs

In the process of translating relational data into RDF, the direct mapping must create IRIs for identifying tables, the columns in a table, and each row in a table. In this section, we assume that http://foo.example/DB is the base IRI. All the examples in this section contain relative IRIs, which are to be understood as relative to this base IRI. The following are the IRIs that need to be generated:

Table IRI: The IRI that identifies a table is created by concatenating the base IRI with the table name. Specifically, if base_IRI is the base IRI and table_name is the table name, then base_IRI/table_name is the Table IRI for the table.
Column IRI:
- Single-column IRI: The IRI that identifies a column of a table is created by concatenating the base IRI with the table name and the column name. Specifically, if base_IRI is the base IRI, table_name is the table name and column_name is the column name, then base_IRI/table_name#column_name is the Column IRI for the column.
- Multi-column IRI: The IRI that identifies a sequence of two or more columns of a table is created by concatenating the base IRI with the table name and the column names. Specifically, if base_IRI is the base IRI, table_name is the table name and column_name_1, column_name_2, ..., column_name_k is a sequence of k columns (k > 1), then base_IRI/table_name#column_name_1,column_name_2,...,column_name_k is the Column IRI for the columns.
Row RDF Node:
- Row RDF Node for a row with a single-column primary key: The IRI that identifies a row is created by concatenating the base IRI with the table name, the column name of the primary key and the value of the row in that column. Specifically, if base_IRI is the base IRI, table_name is the table name, column_name is the column name of the primary key and value is the value of the row in that column, then base_IRI/table_name/column_name=value#_ is the Row RDF Node (or Row IRI) for the row.
- Row RDF Node for a row with a multi-column primary key: The IRI that identifies a row is created by concatenating the base IRI with the table name, the names of the columns that constitute the primary key and the values of the row in those columns. Specifically, if base_IRI is the base IRI, table_name is the table name, column_name_1, column_name_2, ..., column_name_k is the sequence of k columns (k > 1) that constitute the primary key, and value_1, value_2, ..., value_k is the sequence of values of the columns that constitute the primary key of the row, then base_IRI/table_name/column_name_1=value_1,column_name_2=value_2,...,column_name_k=value_k#_ is the Row RDF Node (or Row IRI) for the row.
- Row RDF Node for a row without a primary key: A fresh Blank Node is created, which is used as the Row RDF Node for the row.
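The IRI rules above amount to string concatenation. The following Scala sketch (Scala being the language of the formal definitions in Section 3) illustrates them under simplifying assumptions: IRIs are plain strings, the base IRI has no trailing slash, values are already in their lexical form, and no percent-encoding is applied. The object and function names are hypothetical. Rows without a primary key are not covered, since they receive blank nodes rather than IRIs.

```scala
// Sketch of the Table IRI, Column IRI and Row IRI rules (hypothetical names).
// Assumes: base has no trailing slash, values are lexical forms, no percent-encoding.
object IriRules {
  // Table IRI: base_IRI/table_name
  def tableIRI(base: String, table: String): String =
    s"$base/$table"

  // Column IRI: base_IRI/table_name#column_name_1,...,column_name_k  (k >= 1)
  def columnIRI(base: String, table: String, cols: List[String]): String =
    s"$base/$table#" + cols.mkString(",")

  // Row IRI: base_IRI/table_name/column_name_1=value_1,...,column_name_k=value_k#_
  // where the (column, value) pairs are the row's primary key, in key order.
  def rowIRI(base: String, table: String, pk: List[(String, String)]): String =
    s"$base/$table/" + pk.map { case (c, v) => s"$c=$v" }.mkString(",") + "#_"
}
```

For instance, IriRules.rowIRI("http://foo.example/DB", "People", List("ID" -> "7")) yields http://foo.example/DB/People/ID=7#_, the Row IRI used for Bob in the example.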

Issue (hash-vs-slash):

The direct graph may be offered as Linked Open Data, raising the issue of distinguishing row identifiers from the information resources which describe them. This edition of this document presumes hash identifiers, allowing a GET on a row identifier to retrieve a small resource (i.e. not all rows from the same table) and distinguish between the retrieved resource People/ID=7 and the row People/ID=7#_. The "slash" alternative would offer a direct graph with identifiers like People/ID=7 but would demand the server respond to GET /People/ID=7 with a 303 redirect to some other resource.

Resolution:

None recorded.

2.2.1 IRIs generated for the initial example

Given the base IRI http://foo.example/DB/, the following are some of the IRIs that are used when translating into RDF the relational data given in the initial example:

For the table People, the following IRIs are considered in the translation process:

Table IRI:

<People>

Column IRIs:

<People#ID> 
<People#fname> 
<People#addr>

Row IRIs:

<People/ID=7#_> 
<People/ID=8#_>

For the table Addresses, the following IRIs are considered in the translation process:

Table IRI:

<Addresses>

Column IRIs:

<Addresses#ID> 
<Addresses#city>  
<Addresses#state>

Row IRI:

<Addresses/ID=18#_>

2.3 Mapping Rules

Each row in the database produces a set of RDF triples with a subject, predicate, and object composed as follows:

Shared Subject: A Row RDF Node, which may be an IRI or a Blank Node, is generated for each row.
Table Triples: The row generates a triple with the following:
- Predicate: the rdf:type property
- Object: the Table IRI for the table
Literal Triples: Each column with a non-null value, including the column(s) that constitute the primary key, and that either is not the only constituent of a foreign key or is the only constituent of a foreign key that references a candidate key, generates a triple with the following:
- Predicate: the Column IRI for the column
- Object: an RDF Literal with an XML Schema datatype corresponding to the SQL datatype of that value. Per XML Datatypes for SQL Datatypes, string datatypes are expressed as RDF plain literals.
Reference Triples: Columns that constitute a foreign key and with non-null values in the row generate triples with the following:
- Predicate: the Column IRI for the columns that constitute the foreign key
- Object: the Row RDF Node for the corresponding referenced row (according to the foreign key)
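To make the three kinds of triples concrete, here is a minimal Scala sketch that maps one row to its Table, Literal and Reference Triples, rendered as N-Triples-style strings with relative IRIs. It is an illustration under assumptions, not the normative definition: the row representation (column name to Option value, with None standing for NULL), the caller-supplied function that turns foreign key values into the referenced Row IRI, and the names RowMapping, literal and mapRow are all hypothetical, and only part of the SQL-to-XSD datatype table is covered. Following the example in Section 2.1 and the scalars function in Section 3, a column that is on its own a foreign key yields only a Reference Triple.

```scala
// Sketch of the Section 2.3 rules for a single row (hypothetical helper names).
object RowMapping {
  private val xsd = "http://www.w3.org/2001/XMLSchema#"
  private val rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

  // RDF literal for a non-NULL cell; string types become plain literals.
  def literal(value: String, sqlType: String): String = {
    val quoted = "\"" + value + "\""
    sqlType match {
      case "INT"       => quoted + "^^<" + xsd + "integer>"
      case "FLOAT"     => quoted + "^^<" + xsd + "float>"
      case "DATE"      => quoted + "^^<" + xsd + "date>"
      case "TIME"      => quoted + "^^<" + xsd + "time>"
      case "TIMESTAMP" => quoted + "^^<" + xsd + "dateTime>"
      case _           => quoted // CHAR, VARCHAR, STRING: plain literal
    }
  }

  def mapRow(
      table: String,
      subject: String,
      row: Map[String, Option[String]],              // None stands for SQL NULL
      types: Map[String, String],                    // column -> SQL datatype name
      fks: Map[List[String], List[String] => String] // FK columns -> Row IRI of referenced row
  ): List[String] = {
    // Table Triple
    val tableTriple = s"<$subject> <$rdfType> <$table> ."

    // Columns that are, on their own, a foreign key get a Reference Triple instead.
    val singleFkCols = fks.keySet.filter(_.size == 1).map(_.head)

    // Literal Triples: non-NULL columns, sorted by name for a stable order.
    val literalTriples = row.toList.sortBy(_._1).collect {
      case (col, Some(v)) if !singleFkCols.contains(col) =>
        s"<$subject> <$table#$col> ${literal(v, types(col))} ."
    }

    // Reference Triples: only when every column of the foreign key is non-NULL.
    val referenceTriples = fks.toList.flatMap { case (cols, target) =>
      val values = cols.flatMap(c => row.getOrElse(c, None))
      if (values.size == cols.size)
        List(s"<$subject> <$table#${cols.mkString(",")}> <${target(values)}> .")
      else Nil
    }

    tableTriple :: literalTriples ++ referenceTriples
  }
}
```

Applied to Bob's row with the foreign key addr, this yields four triples; Sue's row yields three, because her NULL addr produces neither a literal nor a reference.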

Issue (primary-is-candidate-key):

Should the following exception be included in the definition of the direct mapping?

Primary-is-Candidate-Key Exception: If the primary key is also a foreign key K referencing table R:

The shared subject is the subject of the referenced row in R.
The foreign key K generates no reference triple.
Even if K is a single-column foreign key, it generates a literal triple.

Resolution:

None recorded.

2.3.1 Triples generated for the example in Section Direct Mapping Example

Next we show how the 11 triples in the example of Section Direct Mapping Example are classified into the above categories:

Triples generated from table People:

Table Triples:

<People/ID=7#_> rdf:type <People> .
<People/ID=8#_> rdf:type <People> .

Literal Triples:

<People/ID=7#_> <People#ID> 7 .
<People/ID=7#_> <People#fname> "Bob" .
<People/ID=8#_> <People#ID> 8 .
<People/ID=8#_> <People#fname> "Sue" .

Reference Triple:

<People/ID=7#_> <People#addr> <Addresses/ID=18#_> .

Triples generated from table Addresses:

Table Triple:

<Addresses/ID=18#_> rdf:type <Addresses> .

Literal Triples:

<Addresses/ID=18#_> <Addresses#ID> 18 .
<Addresses/ID=18#_> <Addresses#city> "Cambridge" .
<Addresses/ID=18#_> <Addresses#state> "MA" .

2.4 Additional Examples and Corner Cases

2.4.1 Foreign keys referencing candidate keys

More complex schemas include composite foreign keys and foreign keys that reference candidate keys other than the primary key. In this example, the columns deptName and deptCity in the People table reference the candidate key (name, city) in the Department table. The following is the schema of the augmented database:

CREATE TABLE Addresses (
	ID INT,
	city CHAR(10),
	state CHAR(2),
	PRIMARY KEY(ID)
);

CREATE TABLE Department (
	ID INT,
	name CHAR(10),
	city CHAR(10),
	manager INT,
	PRIMARY KEY(ID),
	UNIQUE (name, city)
);

CREATE TABLE People (
	ID INT,
	fname CHAR(10),
	addr INT,
	deptName CHAR(10),
	deptCity CHAR(10),
	PRIMARY KEY(ID),
	FOREIGN KEY(addr) REFERENCES Addresses(ID),
	FOREIGN KEY(deptName, deptCity) REFERENCES Department(name, city)
);

-- Department and People reference each other, so this constraint is added once both tables exist.
ALTER TABLE Department ADD FOREIGN KEY(manager) REFERENCES People(ID);

The following is an instance of the augmented relational schema:

People
PK		→ Addresses(ID)	→ Department(name, city)
ID	fname	addr	deptName	deptCity
7	Bob	18	accounting	Cambridge
8	Sue	NULL	NULL	NULL

Addresses
PK
ID	city	state
18	Cambridge	MA

Department
PK	Unique Key		→ People(ID)
ID	name	city	manager
23	accounting	Cambridge	8

Per the People table's compound foreign key to Department:

The row in People with deptName="accounting" and deptCity="Cambridge" references a row in Department with a primary key of ID=23.
The predicate for this key is formed from "deptName,deptCity", reflecting the order of the column names in the foreign key.
The referent identifier (the object of the above predicate) is formed from the base IRI, the table name Department and "ID=23", i.e. from the primary key of the referenced row rather than from the candidate key (name, city).
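The lookup described in the list above can be sketched in Scala as follows. This is an illustration under assumptions rather than the normative definition: rows are modeled as maps from column name to lexical value (NULLs are not handled), identifiers are relative to the base IRI, and the object and parameter names are hypothetical. It shows that the referencing values are matched against the referenced candidate key, while the resulting identifier is built from the referenced row's primary key.

```scala
// Resolve a foreign key that references a candidate key to the Row IRI of the
// referenced row. Rows are maps from column name to lexical value (hypothetical model).
object CandidateKeyLookup {
  def resolve(
      fkValues: List[String],        // values of the foreign key columns, in key order
      targetTable: String,           // referenced table, e.g. "Department"
      targetKey: List[String],       // referenced candidate key, e.g. List("name", "city")
      targetPk: List[String],        // primary key of the referenced table, e.g. List("ID")
      targetRows: List[Map[String, String]]
  ): Option[String] = {
    // Match on the candidate key, but build the identifier from the primary key.
    targetRows
      .find(row => targetKey.map(row) == fkValues)
      .map(row => targetTable + "/" + targetPk.map(c => c + "=" + row(c)).mkString(",") + "#_")
  }
}
```

With the single Department row above, resolving ("accounting", "Cambridge") through (name, city) gives Department/ID=23#_.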

In this example, the direct mapping generates the following triples:

@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<People/ID=7#_> rdf:type <People> .
<People/ID=7#_> <People#ID> 7 .
<People/ID=7#_> <People#fname> "Bob" .
<People/ID=7#_> <People#addr> <Addresses/ID=18#_> .
<People/ID=7#_> <People#deptName> "accounting" .
<People/ID=7#_> <People#deptCity> "Cambridge" .
<People/ID=7#_> <People#deptName,deptCity> <Department/ID=23#_> .
<People/ID=8#_> rdf:type <People> .
<People/ID=8#_> <People#ID> 8 .
<People/ID=8#_> <People#fname> "Sue" .

<Addresses/ID=18#_> rdf:type <Addresses> .
<Addresses/ID=18#_> <Addresses#ID> 18 .
<Addresses/ID=18#_> <Addresses#city> "Cambridge" .
<Addresses/ID=18#_> <Addresses#state> "MA" .

<Department/ID=23#_> rdf:type <Department> .
<Department/ID=23#_> <Department#ID> 23 .
<Department/ID=23#_> <Department#name> "accounting" .
<Department/ID=23#_> <Department#city> "Cambridge" .
<Department/ID=23#_> <Department#manager> <People/ID=8#_> .

The triples above that involve deptName, deptCity and the Department table are generated by considering the new elements in the augmented database. Note that:

Although deptName is an attribute of table People that is part of a foreign key, the Literal Triple <People/ID=7#_> <People#deptName> "accounting" is generated by the direct mapping because deptName is not the sole column of a foreign key of table People.
The Reference Triple <People/ID=7#_> <People#deptName,deptCity> <Department/ID=23#_> is generated by considering a foreign key referencing a candidate key (instead of the primary key): (deptName, deptCity) is a multi-column foreign key in the table People which references the multi-column candidate key (name, city) in the table Department.

2.4.2 Multi-column keys

Primary keys may also be composite. For example, if the primary key for Department were (name, city) instead of ID in the example in Section Foreign keys referencing candidate keys, then the identifier for the only row in this table would be <Department/name=accounting,city=Cambridge#_>, and the direct mapping would generate the following triples for that row:

<Department/name=accounting,city=Cambridge#_> rdf:type <Department> .
<Department/name=accounting,city=Cambridge#_> <Department#ID> 23 .
<Department/name=accounting,city=Cambridge#_> <Department#name> "accounting" .
<Department/name=accounting,city=Cambridge#_> <Department#city> "Cambridge" .
<Department/name=accounting,city=Cambridge#_> <Department#manager> <People/ID=8#_> .

2.4.3 Empty (non-existent) primary keys

Even if there is no primary key, rows generate a set of triples with a shared subject, but that subject is a blank node. For instance, assume that the following table is added to the schema of the example in Section Foreign keys referencing candidate keys (for keeping track of tweets in Twitter):

CREATE TABLE Tweets (
	tweeter INT,
	-- WHEN is a reserved word in SQL, so the column name is quoted.
	"when" TIMESTAMP,
	text CHAR(140),
	FOREIGN KEY(tweeter) REFERENCES People(ID)
);

The following is an instance of table Tweets:

Tweets
→ People(ID)
tweeter	when	text
7	2010-08-30T01:33	I really like lolcats.
7	2010-08-30T09:01	I take it back.

Given that table Tweets does not have a primary key, each row in this table is identified by a Blank Node. When translating the above table, the direct mapping generates the following triples:

@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:a rdf:type <Tweets> .
_:a <Tweets#tweeter> <People/ID=7#_> .
_:a <Tweets#when> "2010-08-30T01:33:00"^^xsd:dateTime .
_:a <Tweets#text> "I really like lolcats." .

_:b rdf:type <Tweets> .
_:b <Tweets#tweeter> <People/ID=7#_> .
_:b <Tweets#when> "2010-08-30T09:01:00"^^xsd:dateTime .
_:b <Tweets#text> "I take it back." .

Blank nodes ("_:a" and "_:b" above) cannot be dereferenced, but they can still be queried or updated via SPARQL.

2.4.4 Referencing tables with empty primary keys

Rows in tables with no primary key may still be referenced by foreign keys. (Relational database theory tells us that these rows must be unique as foreign keys reference candidate keys and candidate keys are unique across all the rows in a table.) References to rows in tables with no primary key are expressed as RDF triples with blank nodes for objects, where that blank node is the same node used for the subject in the referenced row.
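One way to guarantee that a reference lands on the same blank node as the referenced row's subject is to allocate one blank node per row and then index those nodes by every candidate key. The Scala sketch below assumes rows are maps from column name to lexical value without NULLs, and labels blank nodes _:b0, _:b1, and so on; the names and the labeling scheme are hypothetical (the triples later in this section use _:c and _:d, and any fresh labels would do).

```scala
// Allocate one blank node per row of a table without a primary key, and index
// those nodes by candidate-key values so that foreign keys can reuse them.
object BlankNodeRows {
  type Row = Map[String, String] // column name -> lexical value (no NULLs)

  // Subject node of each row, in row order (hypothetical labeling scheme).
  def subjects(rows: List[Row]): List[String] =
    rows.indices.map(i => "_:b" + i).toList

  // Candidate-key values -> the same node that serves as that row's subject.
  def index(rows: List[Row], key: List[String]): Map[List[String], String] =
    rows.zip(subjects(rows)).map { case (row, node) => key.map(row) -> node }.toMap
}
```

For the Projects rows, indexing by (name, deptName, deptCity) and by (lead, name) both lead back to the subject nodes of the corresponding rows, so a TaskAssignments reference through either key reuses the same blank node.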

This example includes several foreign keys that share column names. For clarity, here is the DDL for these keys:

CREATE TABLE Projects (
	lead INT,
	FOREIGN KEY (lead) REFERENCES People(ID),
	name VARCHAR(50),
	UNIQUE (lead, name),
	deptName VARCHAR(50),
	deptCity VARCHAR(50),
	UNIQUE (name, deptName, deptCity),
	FOREIGN KEY (deptName, deptCity) REFERENCES Department(name, city)
);

CREATE TABLE TaskAssignments (
	worker INT,
	FOREIGN KEY (worker) REFERENCES People(ID),
	project VARCHAR(50),
	PRIMARY KEY (worker, project),
	deptName VARCHAR(50),
	deptCity VARCHAR(50),
	FOREIGN KEY (project, deptName, deptCity) REFERENCES Projects(name, deptName, deptCity),
	FOREIGN KEY (deptName, deptCity) REFERENCES Department(name, city)
);

The following is an instance of the preceding schema:

Projects
Unique key
	Unique key
→ People(ID)		→ Department(name, city)
lead	name	deptName	deptCity
8	pencil survey	accounting	Cambridge
8	eraser survey	accounting	Cambridge

TaskAssignments
PK
	→ Projects(name, deptName, deptCity)
→ People(ID)		→ Department(name, city)
worker	project	deptName	deptCity
7	pencil survey	accounting	Cambridge

In this case, the direct mapping generates the following triples from the preceding tables:

@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pencil: <http://foo.example/DB/TaskAssignments/worker=7,project=pencil+survey#> .

_:c rdf:type <Projects> .
_:c <Projects#lead> <People/ID=8#_> .
_:c <Projects#name> "pencil survey" .
_:c <Projects#deptName> "accounting" .
_:c <Projects#deptCity> "Cambridge" .
_:c <Projects#deptName,deptCity> <Department/ID=23#_> .

_:d rdf:type <Projects> .
_:d <Projects#lead> <People/ID=8#_> .
_:d <Projects#name> "eraser survey" .
_:d <Projects#deptName> "accounting" .
_:d <Projects#deptCity> "Cambridge" .
_:d <Projects#deptName,deptCity> <Department/ID=23#_> .

pencil:_ rdf:type <TaskAssignments> .
pencil:_ <TaskAssignments#worker> <People/ID=7#_> .
pencil:_ <TaskAssignments#project> "pencil survey" .
pencil:_ <TaskAssignments#deptName> "accounting" .
pencil:_ <TaskAssignments#deptCity> "Cambridge" .
pencil:_ <TaskAssignments#deptName,deptCity> <Department/ID=23#_> .
pencil:_ <TaskAssignments#project,deptName,deptCity> _:c .

The absence of a primary key forces the generation of blank nodes, but does not change the structure of the direct graph or names of the predicates in that graph.

2.5 Hierarchical Tables

It is common to express specializations of some concept as multiple tables sharing a common primary key. In such cases, the primary keys of the specialized tables are in turn foreign keys to the table from which they derive.

Addresses
PK
ID	city	state
18	Cambridge	MA

Offices
PK
→ Addresses(ID)
ID	building	ofcNumber
18	32	G528

ExecutiveOffices
PK
→ Offices(ID)
ID	desk
18	oak

In this example, Offices are a specialization of Addresses and ExecutiveOffices are a specialization of Offices. The subjects for the triples implied by rows in Offices or ExecutiveOffices are the same as those for the corresponding row in Addresses.
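Finding the shared subject for a row in a specialized table means following the chain of primary keys that are also foreign keys up to its root. The Scala sketch below assumes single-column primary keys, an acyclic chain, and a chain given as a map from table name to (parent table, referenced key column); these names and the representation are hypothetical. The primary key value is carried unchanged up the chain, since each primary key is a foreign key to its parent's key.

```scala
// Follow primary keys that are also foreign keys up to the root table and
// return the Row IRI (relative to the base IRI) of the ultimate referent.
object Hierarchy {
  // table -> (parent table, referenced key column); assumed acyclic
  type Parents = Map[String, (String, String)]

  @annotation.tailrec
  def ultimateReferent(table: String, pkColumn: String, pkValue: String, parents: Parents): String =
    parents.get(table) match {
      case Some((parent, parentColumn)) => ultimateReferent(parent, parentColumn, pkValue, parents)
      case None                         => table + "/" + pkColumn + "=" + pkValue + "#_"
    }
}
```

With Offices pointing to Addresses and ExecutiveOffices pointing to Offices, the row with ID 18 in any of the three tables resolves to Addresses/ID=18#_.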

@base <http://foo.example/DB/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<Addresses/ID=18#_> <Addresses#ID> 18 .
<Addresses/ID=18#_> <Addresses#city> "Cambridge" .
<Addresses/ID=18#_> <Addresses#state> "MA" .

<Addresses/ID=18#_> <Offices#ID> 18 .
<Addresses/ID=18#_> <Offices#building> 32 .
<Addresses/ID=18#_> <Offices#ofcNumber> "G528" .

<Addresses/ID=18#_> <ExecutiveOffices#ID> 18 .
<Addresses/ID=18#_> <ExecutiveOffices#desk> "oak" .

The Primary-is-Candidate-Key Exception allows the generation of a triple with an RDF literal for the ID column in the Offices and ExecutiveOffices tables (Offices.ID=18 and ExecutiveOffices.ID=18).

Issue (hier-table-at-risk):

This feature attempts to intricately model some existing modeling practice but adds significant complexity. This feature is at risk.

Resolution:

None recorded.

Issue (fk-pk-order):

What if a foreign key is a rearrangement of the primary key? E.g., what if TaskAssignments, with a primary key (project, worker), had a foreign key (worker, project)?

Resolution:

None recorded.

Issue (many-to-many-as-repeated-properties):

The direct graph is arguably more faithful to the conceptual model if it reflects e.g. a person with multiple addresses (some many-to-many Person2Address table) as repeated properties. It is difficult to detect which tables with exactly two foreign keys and no other attributes are many-to-many. As a counterexample, a Wedding table may have exactly two spouses, but it is still not a many-to-many relation in most places.

Resolution:

None recorded.

Issue (formalism-model):

The RDB2RDF working group has not decided on a formalism for representing the direct mapping. We would appreciate feedback from the community in helping us choose between Section 5. Direct Mapping Definition and Section 6. Direct Mapping as Rules.

Resolution:

None recorded.

3 Direct Mapping Definition (Normative)

3.1 Notation for this Direct Mapping

3.1.1 Notation for Types

A : a type

A? : an optional argument of type A

A ⊔ B : disjoint union of A and B

( A, B ) : tuple (Cartesian product) of types A and B

[ A ] : list of elements of type A

{ A } : set of elements of type A

{ A→B } : finite map of elements of type A to elements of type B

3.1.2 Notation for Injectors

a : an instance of an A

( a1, b1 ) : a tuple with elements a1 and b1

[ a1, a2 ] : list with elements a1 and a2

{ a1, a2 } : set with elements a1 and a2

{ a1→b1, a2→b2 } : map in which key a1 is mapped to b1 and key a2 is mapped to b2

3.1.3 Accessor Functions

AB[a] : in a map of A to B, the instance of B for a given A

3.2 Data Model Definition (Normative)


3.2.1 Reference Database Model Definition (Normative)

There are many models for databases in SQL literature; because the Direct Mapping does not rely on column position, we use a model which assumes a 1:1 correspondence between attribute (column name) and value, i.e. a map. Starting with a traditional model of a relational database, we define a Relation (a table) which has a name, a Header, a Body and primary/foreign key details. The Body contains maps from attribute names to values and the Header provides the datatypes to interpret those values.

Relational Definition

[1]	`Database`	≝	`{ TableName → Table }`
	A relational database is a mapping of relation name to relation.
	case class Database( m:Map[TableName, Table] )
[2]	`Table`	≝	`( Header, [CandidateKey], CandidateKey?, ForeignKeys, Body ) where the 2nd slot is a list of candidate keys that apply to the table, and the 3rd is an optional candidate key used as the primary key`
	A relation has a header, a list of candidate keys, a primary key (of type candidate key), a mapping of foreign keys, and a body.
	case class Table (header:Header, body:Body, candidates:List[CandidateKey], pk:Option[CandidateKey], fks:ForeignKeys)
[3]	`Header`	≝	`{ AttrName → SQLDatatype }`
	A header is a mapping from attribute name to SQL datatype.
	case class Header (types:Map[AttrName, SQLDatatype])
[4]	`CandidateKey`	≝	`[ AttrName ]`
	A candidate key is a list of attribute names.
	type CandidateKey = List[AttrName]
[5]	`ForeignKeys`	≝	`{ [AttrName] → ( Table, [AttrName] ) }`
	ForeignKeys is a mapping from a list of attribute names to a relation and a list of attribute names.
	type ForeignKeys = Map[List[AttrName], Target]
	case class Target (rel:TableName, attrs:CandidateKey)
[6]	`SQLDatatype`	≝	`{ INT \| FLOAT \| DATE \| TIME \| TIMESTAMP \| CHAR \| VARCHAR \| STRING }`
	An SQL datatype is an INT, FLOAT, DATE, TIME, TIMESTAMP, CHAR, VARCHAR or STRING as defined in the SQL specification.
	sealed abstract class SQLDatatype
	case class SQLInt () extends SQLDatatype
	case class SQLFloat () extends SQLDatatype
	…
	case class SQLString () extends SQLDatatype
[7]	`Body`	≝	`[ Tuple ]`
	A body is a list of (potentially duplicate) tuples.
	type Body = List[Tuple]
[8]	`Tuple`	≝	`{ AttrName → CellValue }`
	A tuple is a mapping from attribute name to cell value.
	case class Tuple (m:Map[AttrName, CellValue])
[9]	`CellValue`	≝	`value \| Null`
	A cell value is a scalar value in some SQL datatype, or SQL NULL.
	abstract class CellValue
	case class LexicalValue (s:String) extends CellValue
	case class ␀ () extends CellValue

3.2.2 RDF Model Definition (Non-normative)

Per RDF Concepts and Abstract Syntax, an RDF graph is a set of triples of a subject, predicate and object. The subject may be an IRI or a blank node, the predicate must be an IRI and the object may be an IRI, blank node, or an RDF literal.

For convenience, this section recapitulates the formal definition of RDF.

RDF Definition

[10]	`Graph`	≝	`{ Triple }`
	An RDF graph is a set of RDF triples.
	type RDFGraph = Set[Triple]
[11]	`Triple`	≝	`( Subject, Predicate, Object )`
	An RDF triple contains a subject, predicate and object.
	case class Triple (s:Subject, p:IRI, o:Object)
[12]	`Subject`	≝	`IRI \| BlankNode`
	A subject is an IRI or a blank node.
	sealed abstract class Node // factor out IRIs and BNodes
	case class NodeIRI(i:IRI) extends Node
	case class NodeBNode(b:BNode) extends Node
	sealed abstract class Subject
	case class SubjectNode(n:Node) extends Subject
[13]	`Predicate`	≝	`IRI`
	A predicate is an IRI.
	sealed abstract class Predicate
	case class PredicateIRI(i:IRI) extends Predicate
[14]	`Object`	≝	`IRI \| BlankNode \| Literal`
	An object is an IRI, a blank node, or a literal.
	sealed abstract class Object
	case class ObjectNode(n:Node) extends Object
	case class ObjectLiteral (n:Literal) extends Object
[15]	`IRI`	≝	`RDF URI-reference as subsequently restricted by SPARQL`
	An IRI is an RDF URI reference as subsequently restricted by SPARQL.
	case class IRI(iri:String)
[16]	`BlankNode`	≝	`RDF blank node`
	A blank node is an arbitrary term used only to establish graph connectivity.
	case class BNode(label:String)
[17]	`Literal`	≝	`PlainLiteral \| TypedLiteral`
	A literal is either a plain literal or a typed literal.
	sealed abstract class Literal
	case class LiteralTyped(i:TypedLiteral) extends Literal
	case class LiteralPlain(b:PlainLiteral) extends Literal
[18]	`PlainLiteral`	≝	`(lexicalForm) \| (lexicalForm, languageTag).`
	A plain literal has a lexical form and an optional language tag.
	case class PlainLiteral(value:String, langtag:Option[String])
[19]	`TypedLiteral`	≝	`(lexicalForm, IRI).`
	A typed literal has a lexical form and a datatype IRI.
	case class TypedLiteral(value:String, datatype:IRI)

The direct mapping is a formula for creating an RDF graph from the tuples in a relation. A base IRI defines a web space for the labels in this graph; all labels are generated by appending to the base. The functions scalars and references extract the non-Null scalar and reference attributes respectively.

[20]	`references(T, R)`	≝	`{ K ∣ ∄(T(A) = Null ∣ A ∈ K) ∧ K ≠ R.PrimaryKey ∣ K ∈ R.ForeignKeys }`
	The references function returns the foreign keys of a relation whose attributes are all non-null in the tuple, excluding a foreign key that is also the relation's primary key.
	def references (t:Tuple, r:Table):Set[List[AttrName]] = {
	  val allFKs:Set[List[AttrName]] = r.fks.keySet
	  val nulllist:Set[AttrName] = t.nullAttributes(r.header)
	  // Foreign keys with at least one NULL attribute in this tuple generate no reference.
	  val nullFKs:Set[List[AttrName]] = allFKs.filter(fk => (nulllist & fk.toSet).nonEmpty)
	  /** Check to see if r's primary key is a hierarchical key.
	   * http://www.w3.org/2001/sw/rdb2rdf/directMapping/#rule3 */
	  if (r.pk.isDefined && r.fks.contains(r.pk.get))
	    allFKs -- nullFKs - r.pk.get
	  else
	    allFKs -- nullFKs
	}
[21]	`scalars(T, R)`	≝	`{ A ∈ T ∣ T(A) ≠ Null ∧ [A] ∉ references(T, R) }`
	The scalars function returns the non-null attributes of a tuple, excluding any attribute that on its own constitutes one of the foreign keys returned by references.
	def scalars (t:Tuple, r:Table):Set[AttrName] = {
	  val allAttrs:Set[AttrName] = r.header.keySet
	  val nulllist:Set[AttrName] = t.nullAttributes(r.header)
	  // Single-attribute references yield a reference triple instead of a literal.
	  val refs = references(t, r).filter(a => a.length == 1).map(a => a(0))
	  allAttrs -- refs -- nulllist
	}
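The two functions can be tried out on a simplified model, sketched in Scala below: a tuple is a map from attribute name to Option value (None for NULL), and the relation's foreign keys and primary key are passed explicitly instead of through Table. These simplifications and the object name RefsAndScalars are assumptions for illustration; the logic follows definitions [20] and [21], including the exclusion of a foreign key that coincides with the primary key.

```scala
// Simplified references/scalars (hypothetical model, not the normative types).
object RefsAndScalars {
  type Attr = String
  type Tup = Map[Attr, Option[String]] // None stands for SQL NULL

  private def nonNull(t: Tup, a: Attr): Boolean = t.get(a).exists(_.isDefined)

  // [20]: foreign keys whose attributes are all non-NULL, other than the primary key itself
  def references(t: Tup, fks: Set[List[Attr]], pk: Option[List[Attr]]): Set[List[Attr]] =
    fks.filter(k => k.forall(a => nonNull(t, a)) && !pk.contains(k))

  // [21]: non-NULL attributes, except those that alone form a referencing foreign key
  def scalars(t: Tup, fks: Set[List[Attr]], pk: Option[List[Attr]]): Set[Attr] = {
    val singleRefs = references(t, fks, pk).filter(_.size == 1).map(_.head)
    t.keySet.filter(a => nonNull(t, a) && !singleRefs.contains(a))
  }
}
```

For Bob's row in the augmented People table, references returns both foreign keys and scalars returns ID, fname, deptName and deptCity; for the Offices row, the foreign key (ID) is the primary key, so ID stays a scalar, as in the hierarchical example.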

Each tuple in a relation with some candidate key can be uniquely identified by values of that key. A KeyMap(R) maps the candidate keys in a relation to a map of key values to the subject nodes assigned to each tuple.

[22]	`KeyMap`	≝	`{ CandidateKey → { [CellValue] → RDF Node } }`
	A KeyMap is a map from candidate key to a map from list of cell values to RDF nodes.
	type KeyMap = Map[CandidateKey, Map[List[CellValue], Node]]
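A KeyMap can be built by indexing a relation's body once per candidate key. The Scala sketch below simplifies the types of definition [22]: cell values are plain strings (NULLs are not handled), RDF nodes are represented by their string form, and the function computing each tuple's node is supplied by the caller. The names KeyMaps, Row and keyMap are hypothetical.

```scala
// Build a simplified KeyMap: candidate key -> (key values -> node of the tuple).
object KeyMaps {
  type Row = Map[String, String] // attribute name -> lexical value
  type KeyMap = Map[List[String], Map[List[String], String]]

  // For each candidate key, map that key's values in every row to the row's node.
  def keyMap(body: List[Row], candidates: List[List[String]], node: Row => String): KeyMap =
    candidates.map(k => k -> body.map(row => k.map(row) -> node(row)).toMap).toMap
}
```

For Department, with candidate keys (ID) and (name, city), both lookups reach the same node, which is what lets a foreign key to (name, city) produce the object <Department/ID=23#_>.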

The function directDB(DB) computes a RowIRI M for each relation with one or more candidate keys. The function directR(R, M) maps the Tuples in a Table R to an RDF graph. The following definitions assume the existence of some BaseIRI U and Database DB.

[23]	`directDB(DB)`	≝	`{ directR(R, M) ∣ R ∈ DB }`
	The directDB of a database DB is a set of RDF triples (RDF graph) created by calling directR on each relation in DB.
	def directDB (u:BaseIRI, db:Database) : RDFGraph = {
	  // Only relations with at least one candidate key can be indexed by key values.
	  val idxables = db.keySet filter { rn => !db(rn).candidates.isEmpty }
	  val rowIRI = idxables map {rn => rn -> relation2KeyMap(u, db(rn))}
	  db.keySet.flatMap(rn => directR(u, db(rn), rowIRI, db))
	}
[24]	`directR(R, M)`	≝	`{ directT(T, R, M) ∣ T ∈ R.Body }`
	The directR of a relation is a set of RDF triples created by calling directT on each tuple in the body of the relation.
	def directR (u:BaseIRI, r:Table, nodes:RowIRI, db:Database) : RDFGraph = body(r).flatMap(t => directT(u, t, r, nodes, db))
[25]	`directT(T, R, M)`	≝	`{ directS(S, T, R, M) ∣ S = subject(T, R, M) }`
	The directT of a tuple in a relation is a set of RDF triples created by calling directS with an S created by the function `subject`.
	def directT (u:BaseIRI, t:Tuple, r:Table, nodes:RowIRI, db:Database) : Set[Triple] = {
	  val s = subject(t, r, nodes, db)
	  directS(u, s, t, r, nodes, db)
	}
[26]	`subject(T, R, M)`	≝	`if (pk(R) = ∅) then new blank node else rowIRI(R, T[pk(R)]) # references the ultimate referent of hierarchical key`
	The subject identifier for a tuple in a relation is a fresh blank node, if there is no primary key, or the IRI returned from rowIRI of that primary key's attribute values in that tuple.
	def subject (t:Tuple, r:Table, nodes:RowIRI, db:Database):Node =
	  if (r.candidates.size > 0) {
	    // Known to have at least one key, so take the first one.
	    val k = r.candidates(0)
	    val vs = t.lexvaluesNoNulls(k)
	    nodes.ultimateReferent(r.name, k, vs, db)
	  } else
	    /** Table has no candidate keys. */
	    freshbnode()
[27]	`directS(S, T, R, M)`	≝	`{ directL(S, R, A) ∣ A ∈ scalars(T, R) } ∪ { directN(S, As, T, M) ∣ As ∈ references(T, R) }`
	The directS of a subject, tuple and relation is the set of RDF triples created by calling directL on each scalar attribute in T and calling directN on each foreign key in T.
	def directS (u:BaseIRI, s:Node, t:Tuple, r:Table, nodes:RowIRI, db:Database) : Set[Triple] = {
	  references(t, r).map(as => directN(u, s, as, r, t, nodes)) ++
	    scalars(t, r).map(a => directL(u, r.name, s, a, r.header, t))
	}
[28]	`directL(S, R, A)`	≝	`triple(S, propertyIRI(R, [A]), literalmap(A))`
	The directL of a subject, relation and attribute is the RDF triple with that subject, the predicate returned from propertyIRI, and the object returned from literalmap.
	def directL (u:BaseIRI, rn:TableName, s:Node, a:AttrName, h:Header, t:Tuple) : Triple = {
	  val p = propertyIRI (u, rn, List(a))
	  val l = t.lexvalue(a).get
	  val o = literalmap(l, h.sqlDatatype(a))
	  Triple(s, p, o)
	}
[29]	`directN(S, R, As)`	≝	`triple(S, propertyIRI(R, As), rowIRI(R, As))`
	The directN of a subject, relation and list of attributes is the RDF triple with that subject, a predicate returned from propertyIRI, and the object returned by rowIRI of the list of attributes.
	def directN (u:BaseIRI, s:Node, as:List[AttrName], r:Table, t:Tuple, nodes:RowIRI) : Triple = {
		val p = propertyIRI(u, r.name, as)
		val ls:List[LexicalValue] = t.lexvaluesNoNulls(as)
		val target = r.fks(as)
		val o:Object = nodes(target.rel)(target.attrs)(ls)
		Triple(s, p, o)
	}
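The following Scala sketch wires directS, directL and directN together for a single row. It rests on stated assumptions: the types `Term`, `Iri`, `Lit`, `Triple`, `FK` and `Table` are hypothetical stand-ins for the model used in this section; a column counts as scalar unless it is the sole column of a foreign key; a NULL is modelled as an absent map entry; and only INT is given a datatype (xsd:integer, written with the XML Schema namespace IRI), every other column producing a plain literal. As in the Scala directN above, the reference object is the row IRI of the referenced table, built from the target columns and the referencing values.

```scala
import java.net.URLEncoder

// Hypothetical minimal RDF terms for this sketch.
sealed trait Term
final case class Iri(value: String) extends Term
final case class Lit(lexical: String, datatype: Option[String]) extends Term
final case class Triple(s: Term, p: Iri, o: Term)

// Hypothetical foreign key: referencing columns -> target table and target columns.
final case class FK(cols: List[String], target: String, targetCols: List[String])

// Hypothetical table: name, SQL type per column, and foreign keys.
final case class Table(name: String, types: Map[String, String], fks: List[FK])

object DirectSSketch {
  private val Xsd = "http://www.w3.org/2001/XMLSchema#"
  private def ue(s: String): String = URLEncoder.encode(s, "UTF-8")

  def propertyIRI(u: String, rn: String, cols: List[String]): Iri =
    Iri(u + "/" + ue(rn) + "#" + cols.map(ue).mkString("_"))

  def rowIRI(u: String, rn: String, cols: List[String], vals: List[String]): Iri =
    Iri(u + "/" + ue(rn) + "/" +
      cols.zip(vals).map { case (c, v) => ue(c) + "=" + ue(v) }.mkString("_") + "#_")

  // directL: a literal triple; only INT gets a datatype in this sketch.
  def directL(u: String, r: Table, s: Term, a: String, v: String): Triple = {
    val dt = if (r.types(a) == "INT") Some(Xsd + "integer") else None
    Triple(s, propertyIRI(u, r.name, List(a)), Lit(v, dt))
  }

  // directN: a reference triple whose object is the referenced row's IRI,
  // built from the target columns and the referencing values.
  def directN(u: String, r: Table, s: Term, fk: FK, row: Map[String, String]): Triple =
    Triple(s, propertyIRI(u, r.name, fk.cols),
      rowIRI(u, fk.target, fk.targetCols, fk.cols.map(row)))

  // directS: literal triples for scalar columns plus reference triples for
  // foreign keys whose columns are all non-NULL (absent from `row` = NULL).
  def directS(u: String, r: Table, s: Term, row: Map[String, String]): Set[Triple] = {
    val soleFkCols = r.fks.collect { case fk if fk.cols.size == 1 => fk.cols.head }.toSet
    val scalars = r.types.keys.filter(a => row.contains(a) && !soleFkCols(a))
    val refs = r.fks.filter(fk => fk.cols.forall(row.contains))
    scalars.map(a => directL(u, r, s, a, row(a))).toSet ++
      refs.map(fk => directN(u, r, s, fk, row))
  }
}
```

Applying `directS` to every row of a table and taking the union of the results gives the analogue of directR in this simplified model.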

rowIRI generates a row IRI. propertyIRI generates a property IRI.

[31]	`rowIRI(R, As)`	≝	`IRI(UE(R.name) + "/" + (join(',', UE(A.name) + "=" + UE(A.value)) ∣ A ∈ As ) + "#_")`
	A rowIRI is a concatenation, with punctuation as separators, of a base IRI, the URL-encoded relation name, and the URL-encoded attribute name/value pairs in the list of attributes.
	def rowIRI (u:BaseIRI, rn:TableName, as:List[AttrName], ls:List[LexicalValue]) : IRI = {
		val pairs:List[String] = as.zip(ls).map(x => UE(x._1) + "=" + UE(x._2.s))
		u + ("/" + UE(rn) + "/" + pairs.mkString("_") + "#_")
	}
[32]	`propertyIRI(R, As)`	≝	`IRI(UE(R.name) + "#" + (join(',', UE(A.name)) ∣ A ∈ As ))`
	A propertyIRI is a concatenation, with punctuation as separators, of a base IRI, the URL-encoded relation name, and the names of the attributes in the list of attributes.
	def propertyIRI (u:BaseIRI, rn:TableName, as:List[AttrName]) : IRI = u + ("/" + UE(rn) + "#" + as.mkString("_"))
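As a runnable sketch of the two Scala definitions above, the following object computes row and property IRIs for a given base IRI. It follows the Scala code in joining multiple columns with "_" (the formulas [31] and [32] write the separator as ','), it URL-encodes column names as formula [32] does, and it assumes that UE is HTML form encoding over UTF-8 as provided by `java.net.URLEncoder`. `IriSketch` and the base IRI `http://foo.example/DB` are illustrative.

```scala
import java.net.URLEncoder

object IriSketch {
  // UE: HTML form URL encoding over UTF-8 (an assumption for this sketch).
  private def ue(s: String): String = URLEncoder.encode(s, "UTF-8")

  // rowIRI: base "/" table "/" name=value pairs joined by "_" then "#_".
  def rowIRI(u: String, rn: String, cols: List[String], vals: List[String]): String = {
    require(cols.nonEmpty && cols.size == vals.size, "one value per key column")
    val pairs = cols.zip(vals).map { case (c, v) => ue(c) + "=" + ue(v) }
    u + "/" + ue(rn) + "/" + pairs.mkString("_") + "#_"
  }

  // propertyIRI: base "/" table "#" column names joined by "_".
  def propertyIRI(u: String, rn: String, cols: List[String]): String =
    u + "/" + ue(rn) + "#" + cols.map(ue).mkString("_")
}
```

For example, `IriSketch.rowIRI("http://foo.example/DB", "People", List("ID"), List("7"))` yields `http://foo.example/DB/People/ID=7#_`.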

literalmap produces an RDF literal whose XSD datatype is given by the type mapping SQL2XSD:

[40] literalmap(A) ≝ Literal(A[V], SQL2XSD[A]) ∣ SQL2XSD is the mapping from SQL datatypes to XML Schema datatypes below:

XML Datatypes for SQL Datatypes
SQL datatype	XSD datatype (typed literal), or "plain literal"
INT	http://www.w3.org/TR/xmlschema-2/#integer
FLOAT	http://www.w3.org/TR/xmlschema-2/#float
DATE	http://www.w3.org/TR/xmlschema-2/#date
TIME	http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP	http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR	plain literal
VARCHAR	plain literal
STRING	plain literal
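literalmap can be sketched as a lookup in the table above. This Scala version is an illustration under assumptions: it uses the XML Schema namespace IRIs (`http://www.w3.org/2001/XMLSchema#integer` and so on) as datatype IRIs for the types the table links to, represents a plain literal as a `Literal` with no datatype, strips a length suffix such as `CHAR(10)` before the lookup, and treats any type not in the table as a plain literal. `LiteralMapSketch` is a hypothetical name.

```scala
object LiteralMapSketch {
  private val Xsd = "http://www.w3.org/2001/XMLSchema#"

  // SQL2XSD from the table; CHAR, VARCHAR and STRING map to plain literals (None).
  private val sql2xsd: Map[String, Option[String]] = Map(
    "INT" -> Some(Xsd + "integer"),
    "FLOAT" -> Some(Xsd + "float"),
    "DATE" -> Some(Xsd + "date"),
    "TIME" -> Some(Xsd + "time"),
    "TIMESTAMP" -> Some(Xsd + "dateTime"),
    "CHAR" -> None,
    "VARCHAR" -> None,
    "STRING" -> None
  )

  // A literal: lexical form plus an optional datatype IRI (None = plain literal).
  final case class Literal(lexical: String, datatype: Option[String])

  // Strip a length/precision suffix such as CHAR(10) -> CHAR, normalize case,
  // and fall back to a plain literal for unlisted types (both assumptions).
  def literalmap(value: String, sqlType: String): Literal = {
    val base = sqlType.takeWhile(_ != '(').trim.toUpperCase
    Literal(value, sql2xsd.getOrElse(base, None))
  }
}
```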

UE (URL-encode) is the conventional URL encoding used, for example, by HTML CGI forms:

[41] UE(T) ≝ url-encode T per WSDL urlEncoded.
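A sketch of UE in Scala, assuming that WSDL urlEncoded corresponds to the application/x-www-form-urlencoded rules implemented by `java.net.URLEncoder` over UTF-8 bytes: letters, digits and `.`, `-`, `*`, `_` pass through unchanged, a space becomes `+`, and every other byte becomes `%XX`. `UESketch` is an illustrative name.

```scala
import java.net.URLEncoder

object UESketch {
  // UE(T): form-encode T over its UTF-8 bytes.
  def ue(t: String): String = URLEncoder.encode(t, "UTF-8")
}
```

Under this encoding a key value such as `Bob Smith` appears in a row IRI as `Bob+Smith`, and a `/` or `=` inside a value is escaped, so it cannot be confused with those separators; `_`, however, passes through unescaped.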

4 Direct Mapping as Rules (Normative)

In this section, we formally present the Direct Mapping as rules in Datalog syntax. The left-hand side of each rule is the RDF triple that the rule produces. The right-hand side of each rule consists of a sequence of predicates from the relational database and built-in predicates. The built-in predicates are:

generateTableIRI(x, y): Given a table name x, it generates the Table IRI y.
generateColumnIRI(x, y, z): Given a table name x and a non-empty list of columns y, it generates the Column IRI z.
generateRowIRI(x, y₁, y₂, z): Given a table name x, a non-empty list y₁ of columns and a non-empty list y₂ of values (for the columns in y₁), it generates the Row RDF Node (or Row IRI) z.
generateRowBlankNode(x, y, z): Given a table name x without a primary key and the list y of values for a row of table x, it generates the Row RDF Node z for the row (which is a Blank Node in this case).

Consider again the example from Section Direct Mapping Example. In the rules presented in this section, a formula of the form Addresses(x, y, z) indicates that the variables x, y and z store the values of a row in the three columns of the table Addresses, in the order specified in the table's schema; that is, x, y and z store the values of ID, city and state, respectively. Moreover, double quotes are used in the rules to refer to the string naming a table or a column. For example, the formula generateRowIRI("Addresses", ["ID"], [x], p) generates the Row RDF Node (or Row IRI) p for the row of table "Addresses" whose value in the primary key "ID" is the value stored in the variable x.
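The four built-in predicates can be read as functions from their inputs to their last argument. The Scala sketch below is one possible realization under these assumptions: the base IRI `http://foo.example/DB` is illustrative; a Table IRI is the base followed by the URL-encoded table name; Column and Row IRIs follow the layouts of the Scala propertyIRI and rowIRI definitions given earlier; and `generateRowBlankNode` derives its label deterministically from the table name and row values, so that every rule applied to the same row yields the same blank node (rows with identical values would then share a node). The object name `BuiltIns` is hypothetical.

```scala
import java.net.URLEncoder

object BuiltIns {
  // Assumed base IRI for this sketch.
  val Base = "http://foo.example/DB"
  private def ue(s: String): String = URLEncoder.encode(s, "UTF-8")

  // generateTableIRI(x, y): table name -> Table IRI (layout assumed).
  def generateTableIRI(x: String): String = Base + "/" + ue(x)

  // generateColumnIRI(x, y, z): table name and non-empty column list -> Column IRI.
  def generateColumnIRI(x: String, y: List[String]): String = {
    require(y.nonEmpty, "column list must be non-empty")
    generateTableIRI(x) + "#" + y.map(ue).mkString("_")
  }

  // generateRowIRI(x, y1, y2, z): key columns and their values -> Row IRI.
  def generateRowIRI(x: String, y1: List[String], y2: List[String]): String = {
    require(y1.nonEmpty && y1.size == y2.size, "one value per key column")
    val pairs = y1.zip(y2).map { case (c, v) => ue(c) + "=" + ue(v) }
    generateTableIRI(x) + "/" + pairs.mkString("_") + "#_"
  }

  // generateRowBlankNode(x, y, z): deterministic label per (table, row values),
  // so that all rules firing on the same row agree on the node.
  def generateRowBlankNode(x: String, y: List[String]): String =
    "_:" + ue(x) + "_" + Integer.toHexString((x :: y).hashCode)
}
```

With x bound to an illustrative value 18, `BuiltIns.generateRowIRI("Addresses", List("ID"), List("18"))` returns `http://foo.example/DB/Addresses/ID=18#_`, playing the role of p in the example formula.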

4.1 Generating Table Triples

4.1.1 Table has a single-column primary key

Assume that r(a, b₁, ..., b_n) is a table with columns a, b₁, ..., b_n and such that [a] is the primary key of r. Then the following is the direct mapping rule to generate Table Triples from r:

Triple(s, "rdf:type", o) ← r(x, y₁, ..., y_n), generateRowIRI("r", ["a"], [x], s), generateTableIRI("r", o)

4.1.2 Table has a multi-column primary key

Assume that r(a₁, ..., a_m, b₁, ..., b_n) is a table with columns a₁, ..., a_m, b₁, ..., b_n and such that [a₁, ..., a_m] is the primary key of r (m > 1). Then the following is the direct mapping rule to generate Table Triples from r:

Triple(s, "rdf:type", o) ← r(x₁, ..., x_m, y₁, ..., y_n), generateRowIRI("r", ["a₁", ..., "a_m"], [x₁, ..., x_m], s), generateTableIRI("r", o)

4.1.3 Table does not have a primary key

Assume that r(b₁, ..., b_n) is a table with columns b₁, ..., b_n and such that r does not have a primary key. Then the following is the direct mapping rule to generate Table Triples from r:

Triple(s, "rdf:type", o) ← r(y₁, ..., y_n), generateRowBlankNode("r", [y₁, ..., y_n], s), generateTableIRI("r", o)
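The three Table Triple rules differ only in how the subject s is produced. The Scala sketch below evaluates them over an in-memory table. It assumes a hypothetical `Relation` type that lists the columns in schema order, the primary-key columns (empty when there is none) and the rows as positional value lists, reuses the IRI layouts of the Scala rowIRI definition given earlier with an illustrative base IRI, keeps the predicate as the string `"rdf:type"` exactly as the rules write it, and derives blank-node labels deterministically from the row (an assumption, so that the same row always gets the same node).

```scala
import java.net.URLEncoder

// Hypothetical table model: columns in schema order, primary-key columns
// (empty when there is none), and rows as positional value lists.
final case class Relation(name: String, columns: List[String], pk: List[String], rows: List[List[String]])

object TableTriples {
  val Base = "http://foo.example/DB" // assumed base IRI
  private def ue(s: String): String = URLEncoder.encode(s, "UTF-8")

  def tableIRI(rn: String): String = Base + "/" + ue(rn)

  def rowIRI(rn: String, cols: List[String], vals: List[String]): String =
    tableIRI(rn) + "/" + cols.zip(vals).map { case (c, v) => ue(c) + "=" + ue(v) }.mkString("_") + "#_"

  // Assumed deterministic per (table, row values) so all rules for one row agree.
  def rowBlankNode(rn: String, vals: List[String]): String =
    "_:" + ue(rn) + "_" + Integer.toHexString((rn :: vals).hashCode)

  // Rules 4.1.1 to 4.1.3: one (s, rdf:type, o) triple per row of r.
  def tableTriples(r: Relation): List[(String, String, String)] =
    r.rows.map { row =>
      val s =
        if (r.pk.isEmpty) rowBlankNode(r.name, row)
        else rowIRI(r.name, r.pk, r.pk.map(c => row(r.columns.indexOf(c))))
      (s, "rdf:type", tableIRI(r.name))
    }
}
```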

4.2 Generating Literal Triples

4.2.1 Table has a single-column primary key

Assume that r(a, b₁, ..., b_n) is a table with columns a, b₁, ..., b_n and such that [a] is the primary key of r. Then if a is not the only constituent of a foreign key of r or is the only constituent of a foreign key of r that references a candidate key, the direct mapping includes the following rule for r and a to generate Literal Triples:

Triple(s, p, x) ← r(x, y₁, ..., y_n), generateRowIRI("r", ["a"], [x], s), generateColumnIRI("r", ["a"], p)

Moreover, for every b_j (1 ≤ j ≤ n) that is not the only constituent of a foreign key of r or is the only constituent of a foreign key of r that references a candidate key, the direct mapping includes the following rule for r and b_j to generate Literal Triples:

Triple(s, p, y_j) ← r(x, y₁, ..., y_n), generateRowIRI("r", ["a"], [x], s), generateColumnIRI("r", ["b_j"], p)

4.2.2 Table has a multi-column primary key

Assume that r(a₁, ..., a_m, b₁, ..., b_n) is a table with columns a₁, ..., a_m, b₁, ..., b_n and such that [a₁, ..., a_m] is the primary key of r (m > 1). Then for every a_j (1 ≤ j ≤ m) that is not the only constituent of a foreign key of r or is the only constituent of a foreign key of r that references a candidate key, the direct mapping includes the following rule for r and a_j to generate Literal Triples:

Triple(s, p, x_j) ← r(x₁, ..., x_m, y₁, ..., y_n), generateRowIRI("r", ["a₁", ..., "a_m"], [x₁, ..., x_m], s), generateColumnIRI("r", ["a_j"], p)

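Moreover, for every b_j (1 ≤ j ≤ n) that is not the only constituent of a foreign key of r or is the only constituent of a foreign key of r that references a candidate key, the direct mapping includes the following rule for r and b_j to generate Literal Triples:
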
Triple(s, p, y_j) ← r(x₁, ..., x_m, y₁, ..., y_n), generateRowIRI("r", ["a₁", ..., "a_m"], [x₁, ..., x_m], s), generateColumnIRI("r", ["b_j"], p)

4.2.3 Table does not have a primary key

Assume that r(b₁, ..., b_n) is a table with columns b₁, ..., b_n and such that r does not have a primary key. Then for every b_j (1 ≤ j ≤ n) that is not the only constituent of a foreign key of r or is the only constituent of a foreign key of r that references a candidate key, the direct mapping includes the following rule for r and b_j to generate Literal Triples:

Triple(s, p, y_j) ← r(y₁, ..., y_n), generateRowBlankNode("r", [y₁, ..., y_n], s), generateColumnIRI("r", ["b_j"], p)
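The Literal Triple rules share their subject computation with the Table Triple rules and add a per-column condition. This Scala sketch evaluates them under assumptions: the `ForeignKey` and `Relation` types are hypothetical; a NULL is an absent map entry and yields no triple (the rules above do not discuss NULLs); blank-node labels are derived deterministically from the row; and the base IRI is illustrative. `yieldsLiteral` transcribes the condition stated in 4.2.1 to 4.2.3 literally.

```scala
import java.net.URLEncoder

// Hypothetical foreign key: referencing columns, and whether the key it
// references is a candidate key (true) rather than the primary key (false).
final case class ForeignKey(cols: List[String], referencesCandidateKey: Boolean)

// Hypothetical table: columns in schema order, primary key (possibly empty),
// foreign keys, and rows as maps from column to value (NULL = absent).
final case class Relation(name: String, columns: List[String], pk: List[String],
                          fks: List[ForeignKey], rows: List[Map[String, String]])

object LiteralTriples {
  val Base = "http://foo.example/DB" // assumed base IRI
  private def ue(s: String): String = URLEncoder.encode(s, "UTF-8")

  // Row IRI when r has a primary key, otherwise a deterministic blank-node label.
  private def subject(r: Relation, row: Map[String, String]): String =
    if (r.pk.isEmpty)
      "_:" + ue(r.name) + "_" + Integer.toHexString((r.name :: r.columns.map(c => row.getOrElse(c, ""))).hashCode)
    else
      Base + "/" + ue(r.name) + "/" + r.pk.map(c => ue(c) + "=" + ue(row(c))).mkString("_") + "#_"

  // c is not the only constituent of a foreign key of r, or is the only
  // constituent of a foreign key of r that references a candidate key.
  def yieldsLiteral(r: Relation, c: String): Boolean = {
    val sole = r.fks.filter(_.cols == List(c))
    sole.isEmpty || sole.exists(_.referencesCandidateKey)
  }

  // One (s, column IRI, value) triple per qualifying non-NULL value.
  def literalTriples(r: Relation): List[(String, String, String)] =
    for {
      row <- r.rows
      s = subject(r, row)
      c <- r.columns
      if yieldsLiteral(r, c)
      v <- row.get(c).toList
    } yield (s, Base + "/" + ue(r.name) + "#" + ue(c), v)
}
```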

4.3 Generating Reference Triples

This section defines the rules that generate Reference Triples. The cases covered are a foreign key that references a single-column or multi-column primary key of another table, and a foreign key that references a single-column or multi-column candidate key of another table, which may or may not have a primary key.

W3C Working Draft 18 November 2010