Default Mapping

From RDB2RDF
Revision as of 13:51, 14 September 2010 by Jsequeda (Talk | contribs)


Introduction

This document presents the default mapping of a relational database to RDF. The default mapping generates an RDF graph, which we refer to as the direct RDF graph, and this document defines the transformation that produces it from a relational database.

Running Example

Our running example uses the following relational database schema and instance. The student table consists of the attribute s_id, which is the primary key, and the attribute name. The department table consists of the attribute d_id, which is the primary key, and the attribute title. The course table consists of the attribute c_id, which is the primary key, the attribute title, and the attribute d_id, which is a foreign key that references the attribute d_id of the department table. The enrolled table is a many-to-many relation with the attribute s_id, a foreign key that references the attribute s_id of the student table, and the attribute c_id, a foreign key that references the attribute c_id of the course table. Together, s_id and c_id form the primary key of the enrolled table.

In our running example, we consider the following database instance:

Student

s_id  name
1     Bob
2     Alice

Department

d_id  title
3     CS
4     EE

Course

c_id  title            d_id
5     Data Structures  3
6     Circuit Design   4
7     Algorithms       3

Enrolled

s_id  c_id
1     5
2     6
1     6
2     7

In order to be more concise when making reference to the relational schema and the instance, we will use the following notation throughout the document to describe the relational schema:

student(s_id, name)
course(c_id, title, d_id) 
enrolled(s_id, c_id)
department(d_id, title)

and we will use the following notation for the instance in our running example:

student(1, Bob)
student(2, Alice)
department(3, CS)
department(4, EE)
course(5, Data Structures, 3)
course(6, Circuit Design, 4)
course(7, Algorithms, 3)
enrolled(1, 5)
enrolled(2, 6)
enrolled(1, 6)
enrolled(2, 7)

Generating URIs

The default mapping that produces an RDF graph from a relational database needs to create URIs that identify each relation, each attribute, and each tuple. The starting point is a stem URI, such as http://www.example.com/DB. From it, the default mapping generates the following URIs:


  • Relation URIs: The URI that identifies a relation is created by concatenating the stem URI with the relation name.
  • Attribute URIs: The URI that identifies a sequence of attributes of a relation is created by concatenating the stem URI with the relation name and the attribute names.
  • Tuple URIs: The URI that identifies a tuple is created by concatenating the stem URI with the relation name and the value of the attribute that is the primary key. If the relation has a composite primary key (more than one attribute in the primary key), then the URI is created by concatenating the stem URI with the relation name and the values of all the attributes of the primary key. We assume at this point that every relation has a primary key (which can be formed by all the attributes of the relation).


For example, the URI which identifies the tuple student(1, Bob) is http://www.example.com/DB/student/1#, while the URI which identifies the attribute name of the relation student is http://www.example.com/DB/student/name#.

Formally, URIs are generated in the mapping language by using the following built-in predicates:


  • generateRelationURI: Assuming that the stem URI is s and r is the name of a relation, we have that generateRelationURI(r, x) holds if x is equal to s/r#.
  • generateAttributeURI: For every i ≥ 1, there exists a predicate generateAttributeURI with (i+2) arguments. Assuming that s is the stem URI, r is a relation name and a1, a2, ..., ai is a sequence of names of attributes of r, we have that generateAttributeURI(r, a1, a2, ..., ai, x) holds if x is equal to s/r/a1_a2_..._ai#.
  • generateTupleURI: For every i ≥ 1, there exists a predicate generateTupleURI with (i+2) arguments. Assuming that s is the stem URI, r is a relation name and (v1, v2, ..., vi) is an instantiation of the primary key of r, we have that generateTupleURI(r, v1, v2, ..., vi, x) holds if x is equal to s/r/v1_v2_..._vi#.


For example, the following are URIs generated by using the above built-in predicates:

generateRelationURI("student", http://www.example.com/DB/student#) 
generateAttributeURI("student", "name", http://www.example.com/DB/student/name#)
generateTupleURI("student", 1, http://www.example.com/DB/student/1#)
generateTupleURI("enrolled", 1, 5, http://www.example.com/DB/enrolled/1_5#)
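
The three built-in predicates can be sketched as plain Python functions that compute the last argument x from the others. This is only an illustration: the function names are hypothetical, the stem URI is the one from the running example, and the sketch assumes that names and values need no escaping before being placed in a URI, which this document does not address.

```python
STEM = "http://www.example.com/DB"  # stem URI from the running example


def generate_relation_uri(relation, stem=STEM):
    """Return s/r# for relation name r."""
    return f"{stem}/{relation}#"


def generate_attribute_uri(relation, *attributes, stem=STEM):
    """Return s/r/a1_a2_..._ai# for a non-empty sequence of attribute names."""
    if not attributes:
        raise ValueError("at least one attribute name is required")
    return f"{stem}/{relation}/{'_'.join(attributes)}#"


def generate_tuple_uri(relation, *key_values, stem=STEM):
    """Return s/r/v1_v2_..._vi# for the primary-key values of one tuple."""
    if not key_values:
        raise ValueError("at least one primary-key value is required")
    # Assumption: values are rendered with str() and joined without escaping.
    return f"{stem}/{relation}/{'_'.join(str(v) for v in key_values)}#"


print(generate_relation_uri("student"))
print(generate_attribute_uri("student", "name"))
print(generate_tuple_uri("student", 1))
print(generate_tuple_uri("enrolled", 1, 5))
```

The four calls correspond to the four example facts listed above; a composite key such as that of enrolled is simply passed as several values.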

Default Mapping from a Relational Database to RDF

We now describe the default mapping from a relational database to RDF. We divide the whole transformation into four sections, each containing a particular type of rule. More specifically, in each section we show the template for a type of rule, we then apply it to the running example, and we finally present the output that an RDB2RDF system should give in the example.


Generate class type RDF triples

Assume that r(a1, ..., am, b1, ..., bn) is a relation with attributes a1, ..., am, b1, ..., bn, and such that (a1, ..., am) is the primary key of r. Then the default mapping includes the following rule for r:

Triple(s, "rdf:type", o) ← r(x1, ..., xm, y1, ..., yn), generateTupleURI("r", x1, ..., xm, s), generateRelationURI("r", o)

In this rule, x1, ..., xm, y1, ..., yn are variables, which are used to store the values of the attributes of relation r for each tuple in this relation (for every i ∈ {1, ..., m}, variable xi stores the value of attribute ai, and for every j ∈ {1, ..., n}, variable yj stores the value of attribute bj), and s, o are variables that are used to store the URI of a tuple of r and the URI of relation r, respectively. In particular, (x1, ..., xm) stores the value of the primary key of relation r for each tuple in it. We refer the reader to http://www.w3.org/2001/sw/rdb2rdf/wiki/Semantics_of_R2RML for a formal definition of the semantics of the rules used in this document.


Example default mapping:

Triple(s, "rdf:type", o) ← student(x, y), generateTupleURI("student", x, s), generateRelationURI("student", o) 
Triple(s, "rdf:type", o) ← course(x, y1, y2), generateTupleURI("course", x, s), generateRelationURI("course", o)
Triple(s, "rdf:type", o) ← department(x, y), generateTupleURI("department", x, s), generateRelationURI("department", o)


Output:

Triple(http://www.example.com/DB/student/1#, rdf:type, http://www.example.com/DB/student#)
Triple(http://www.example.com/DB/student/2#, rdf:type, http://www.example.com/DB/student#)
Triple(http://www.example.com/DB/department/3#, rdf:type, http://www.example.com/DB/department#)
Triple(http://www.example.com/DB/department/4#, rdf:type, http://www.example.com/DB/department#)
Triple(http://www.example.com/DB/course/5#, rdf:type, http://www.example.com/DB/course#)
Triple(http://www.example.com/DB/course/6#, rdf:type, http://www.example.com/DB/course#)
Triple(http://www.example.com/DB/course/7#, rdf:type, http://www.example.com/DB/course#)
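
As a rough illustration of how the class type rule can be evaluated, the sketch below applies it to the student, department and course relations of the running example. The in-memory encoding of the database and all function names are assumptions made for this sketch. The relation URI is built in the s/r# form given in the definition of generateRelationURI.

```python
STEM = "http://www.example.com/DB"  # stem URI from the running example

# Hypothetical in-memory encoding of the running example:
# relation name -> (number of leading primary-key columns, list of tuples).
DATABASE = {
    "student": (1, [(1, "Bob"), (2, "Alice")]),
    "department": (1, [(3, "CS"), (4, "EE")]),
    "course": (1, [(5, "Data Structures", 3),
                   (6, "Circuit Design", 4),
                   (7, "Algorithms", 3)]),
}


def relation_uri(relation):
    """s/r#, as in generateRelationURI."""
    return f"{STEM}/{relation}#"


def tuple_uri(relation, key_values):
    """s/r/v1_..._vi#, as in generateTupleURI."""
    return f"{STEM}/{relation}/{'_'.join(str(v) for v in key_values)}#"


def class_type_triples(database):
    """Emit (s, "rdf:type", o) for every tuple of every relation."""
    triples = []
    for relation, (key_width, rows) in database.items():
        o = relation_uri(relation)
        for row in rows:
            # The first key_width values of the tuple are its primary key.
            s = tuple_uri(relation, row[:key_width])
            triples.append((s, "rdf:type", o))
    return triples


for triple in class_type_triples(DATABASE):
    print(triple)
```

The enrolled relation is left out of DATABASE here because the example mapping above lists rules only for student, course and department.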


Generate RDF triples for each non-foreign key attribute in a relation

Assume that r(a1, ..., am, b1, ..., bn) is a relation with attributes a1, ..., am, b1, ..., bn, and such that (a1, ..., am) is the primary key of r. Then for every bj (1 ≤ j ≤ n) that is not part of a foreign key of r, the default mapping includes the following rule for r and bj:

Triple(s, p, yj) ← r(x1, ..., xm, y1, ..., yn), generateTupleURI("r", x1, ..., xm, s), generateAttributeURI("r", "bj", p)  

It is important to notice that variable yj in the preceding rule is used to store the value of attribute bj in each tuple of relation r, while "bj" is a string with the name of the attribute.


Example default mapping:

Triple(s, p, y) ← student(x, y), generateTupleURI("student", x, s), generateAttributeURI("student", "name", p) 
Triple(s, p, y1) ← course(x, y1, y2), generateTupleURI("course", x, s), generateAttributeURI("course", "title", p) 
Triple(s, p, y) ← department(x, y), generateTupleURI("department", x, s), generateAttributeURI("department", "title", p) 


Output:

Triple(http://www.example.com/DB/student/1#, http://www.example.com/DB/student/name#, "Bob")
Triple(http://www.example.com/DB/student/2#, http://www.example.com/DB/student/name#, "Alice")
Triple(http://www.example.com/DB/course/5#, http://www.example.com/DB/course/title#, "Data Structures")
Triple(http://www.example.com/DB/course/6#, http://www.example.com/DB/course/title#, "Circuit Design")
Triple(http://www.example.com/DB/course/7#, http://www.example.com/DB/course/title#, "Algorithms")
Triple(http://www.example.com/DB/department/3#, http://www.example.com/DB/department/title#, "CS")
Triple(http://www.example.com/DB/department/4#, http://www.example.com/DB/department/title#, "EE")
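
The rule for non-foreign key attributes can be sketched in the same way. The function below is a hypothetical helper, not part of any RDB2RDF system: it takes the column names, the primary key and the set of foreign key columns as explicit arguments, and it keeps attribute values as plain Python values, since literal datatypes are not discussed in this document.

```python
STEM = "http://www.example.com/DB"  # stem URI from the running example


def attribute_uri(relation, attributes):
    """s/r/a1_..._ai#, as in generateAttributeURI."""
    return f"{STEM}/{relation}/{'_'.join(attributes)}#"


def tuple_uri(relation, key_values):
    """s/r/v1_..._vi#, as in generateTupleURI."""
    return f"{STEM}/{relation}/{'_'.join(str(v) for v in key_values)}#"


def literal_triples(relation, columns, key, foreign_key_columns, rows):
    """One triple per tuple and per attribute that is neither in the
    primary key nor part of a foreign key."""
    triples = []
    for row in rows:
        record = dict(zip(columns, row))
        s = tuple_uri(relation, [record[k] for k in key])
        for column in columns:
            if column in key or column in foreign_key_columns:
                continue  # handled by other rules, or not mapped here
            p = attribute_uri(relation, [column])
            triples.append((s, p, record[column]))
    return triples


course_rows = [(5, "Data Structures", 3),
               (6, "Circuit Design", 4),
               (7, "Algorithms", 3)]
for triple in literal_triples("course", ["c_id", "title", "d_id"],
                              ["c_id"], {"d_id"}, course_rows):
    print(triple)
```

For course, only title produces triples: c_id is the primary key and d_id is a foreign key, so both are skipped by this rule.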


Generate RDF triples from a foreign key relationship between two relations

Assume that r(a1, ..., am, b1, ..., bn, c1, ..., cp) is a relation with attributes a1, ..., am, b1, ..., bn, c1, ..., cp, such that (a1, ..., am) is the primary key of r and (b1, ..., bn) is a foreign key of r that references a relation r1. Then the default mapping includes the following rule for r:

Triple(s, p, o) ← r(x1, ..., xm, y1, ..., yn, z1, ..., zp), generateTupleURI("r", x1, ..., xm, s), 
                   generateAttributeURI("r", "b1", ..., "bn", p), generateTupleURI("r1", y1, ..., yn, o)  

As in the previous cases, variables x1, ..., xm are used to store the values of the primary key of relation r for each tuple in it. Moreover, variables y1, ..., yn are used to store the values of the foreign key of r mentioned above, while "b1", ..., "bn" are strings with the names of the attributes of this foreign key.


Example default mapping:

Triple(s, p, o) ← course(x, z, y), generateTupleURI("course", x, s), generateAttributeURI("course", "d_id", p), 
                   generateTupleURI("department", y, o)


Output:

Triple(http://www.example.com/DB/course/5#, http://www.example.com/DB/course/d_id#, http://www.example.com/DB/department/3#)
Triple(http://www.example.com/DB/course/6#, http://www.example.com/DB/course/d_id#, http://www.example.com/DB/department/4#)
Triple(http://www.example.com/DB/course/7#, http://www.example.com/DB/course/d_id#, http://www.example.com/DB/department/3#)
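
A minimal sketch of the foreign key rule follows. It assumes, as the rule template does, that the foreign key values can be passed directly to generateTupleURI for the referenced relation, that is, that they match the referenced primary key in the same order. Missing (NULL) foreign key values are not handled. All names are hypothetical.

```python
STEM = "http://www.example.com/DB"  # stem URI from the running example


def attribute_uri(relation, attributes):
    """s/r/a1_..._ai#, as in generateAttributeURI."""
    return f"{STEM}/{relation}/{'_'.join(attributes)}#"


def tuple_uri(relation, key_values):
    """s/r/v1_..._vi#, as in generateTupleURI."""
    return f"{STEM}/{relation}/{'_'.join(str(v) for v in key_values)}#"


def foreign_key_triples(relation, columns, key, fk_columns, target, rows):
    """Link each tuple of `relation` to the tuple of `target` it references."""
    # The predicate is built from all attribute names of the foreign key.
    p = attribute_uri(relation, fk_columns)
    triples = []
    for row in rows:
        record = dict(zip(columns, row))
        s = tuple_uri(relation, [record[k] for k in key])
        o = tuple_uri(target, [record[c] for c in fk_columns])
        triples.append((s, p, o))
    return triples


course_rows = [(5, "Data Structures", 3),
               (6, "Circuit Design", 4),
               (7, "Algorithms", 3)]
for triple in foreign_key_triples("course", ["c_id", "title", "d_id"],
                                  ["c_id"], ["d_id"], "department",
                                  course_rows):
    print(triple)
```

Unlike the previous rule, the object of each triple is a tuple URI rather than a literal value.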


Generate RDF triples from a many-to-many relation

Assume that r(a1, ..., am, b1, ..., bn) is a relation with attributes a1, ..., am, b1, ..., bn, such that (a1, ..., am) and (b1, ..., bn) are foreign keys of r that reference relations r1 and r2, respectively. Then the default mapping includes the following rule for r:

Triple(s, p, o) ← r(x1, ..., xm, y1, ..., yn), generateTupleURI("r1", x1, ..., xm, s), generateRelationURI("r", p), 
                   generateTupleURI("r2", y1, ..., yn, o)  


Example default mapping:

Triple(s, p, o) ← enrolled(x, y), generateTupleURI("student", x, s), generateRelationURI("enrolled", p),
                   generateTupleURI("course", y, o) 


Output:

Triple(http://www.example.com/DB/student/1#, http://www.example.com/DB/enrolled#, http://www.example.com/DB/course/5#)
Triple(http://www.example.com/DB/student/2#, http://www.example.com/DB/enrolled#, http://www.example.com/DB/course/6#)
Triple(http://www.example.com/DB/student/1#, http://www.example.com/DB/enrolled#, http://www.example.com/DB/course/6#)
Triple(http://www.example.com/DB/student/2#, http://www.example.com/DB/enrolled#, http://www.example.com/DB/course/7#)
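
The many-to-many rule can be sketched as follows. The helper is hypothetical and assumes that the first foreign key occupies the leading columns of each tuple and the second foreign key the remaining ones, as in the rule template.

```python
STEM = "http://www.example.com/DB"  # stem URI from the running example


def relation_uri(relation):
    """s/r#, as in generateRelationURI."""
    return f"{STEM}/{relation}#"


def tuple_uri(relation, key_values):
    """s/r/v1_..._vi#, as in generateTupleURI."""
    return f"{STEM}/{relation}/{'_'.join(str(v) for v in key_values)}#"


def many_to_many_triples(relation, left_target, left_width, right_target, rows):
    """Turn each tuple of a many-to-many relation into one triple that
    links the two referenced tuples, using the relation URI as predicate."""
    p = relation_uri(relation)
    return [
        (tuple_uri(left_target, row[:left_width]),
         p,
         tuple_uri(right_target, row[left_width:]))
        for row in rows
    ]


enrolled_rows = [(1, 5), (2, 6), (1, 6), (2, 7)]
for triple in many_to_many_triples("enrolled", "student", 1, "course",
                                   enrolled_rows):
    print(triple)
```

Note that no URI is generated for the enrolled tuples themselves in this rule: each tuple becomes a single edge between a student and a course.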

Default Mapping in R2RML

Following our running example, the default mapping is represented as follows in the R2RML syntax:

TO-DO: (this is just a sketch)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

<#classMap1>
    a rr:ClassMap;
    rr:class <http://www.example.com/DB/student#>;
    rr:sqlDefString """
       SELECT * FROM student""";
    rr:instanceIdMap [ a rr:IRIMap; rr:column SOMETHING ];
    rr:propertyMap [ rr:property <http://www.example.com/DB/student/name#>; rr:column "name" ];
    .