Database-Instance-Only and Database-Instances-and-Schema Mapping

From RDB2RDF
Revision as of 19:14, 28 July 2010 by Marenas (Talk | contribs)


We present two mapping languages. The first mapping language, "Database-Instance-Only", is a simple language that takes into account only the instances of the database. The schema is not considered, and the output consists only of RDF triples.

The second mapping language, "Database-Instances-and-Schema", is a more expressive language that considers the schema and instances of the database. The output is an ontology in RDFS/OWL and RDF.


Preliminaries

Relational schemas and instances

Let U be an infinite set of constants and V an infinite set of variables. We assume that U and V are disjoint sets.

A relational schema R, or just schema, is a finite set {R1, ..., Rk} of relation symbols, with each relation symbol Ri having a fixed arity ni > 0. An instance D of R assigns to each relation symbol Ri a finite ni-ary relation Ri^D of elements from U (that is, Ri^D is a finite subset of U^ni).


Let us consider the following running example:

Relational schema R

student(s_id, name)
course(c_id, title, d_id) 
enrolled(s_id, c_id)
department(d_id, title)

Database instance D

student(1, John)
course(2, CS101, 3) 
enrolled(1, 2) 
department(3, CS)

Then we have that R = {student, course, enrolled, department}, where the arity of relation symbols student, enrolled and department is 2, and the arity of relation symbol course is 3. Moreover, we have that:

student^D = { (1, John) }
course^D = { (2, CS101, 3) }
enrolled^D = { (1, 2) }
department^D = { (3, CS) }
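
As an illustration only, the running example can be written down in Python by mapping each relation symbol to a finite set of tuples. The names schema, instance and is_instance_of are choices made for this sketch and are not part of the formal definitions.

```python
# Arity of every relation symbol in the running schema R.
schema = {"student": 2, "course": 3, "enrolled": 2, "department": 2}

# The instance D: each relation symbol is assigned a finite set of tuples.
instance = {
    "student": {(1, "John")},
    "course": {(2, "CS101", 3)},
    "enrolled": {(1, 2)},
    "department": {(3, "CS")},
}


def is_instance_of(instance, schema):
    """Check that the instance covers the schema and respects every arity."""
    if set(instance) != set(schema):
        return False
    return all(
        len(row) == schema[name]
        for name, rows in instance.items()
        for row in rows
    )


print(is_instance_of(instance, schema))
```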


Datalog

Syntax of Datalog rules

A Datalog rule is a rule of the form:

P(x) ← P1(x1), ..., Pk(xk), not Q1(y1), ..., not Qm(ym), u1 ≠ v1, ..., un ≠ vn

where:

  • k > 0, m ≥ 0 and n ≥ 0
  • P, P1, ..., Pk, Q1, ..., Qm are (not necessarily distinct) relation symbols
  • each of x, x1, ..., xk, y1, ..., ym is a tuple of variables (elements from V) and constants (elements from U)
  • every ui (1 ≤ i ≤ n), and every vi, is either a variable or a constant
  • the following safety conditions are satisfied:
    • every variable in x is mentioned in some tuple xj (1 ≤ j ≤ k)
    • every variable in yi (1 ≤ i ≤ m) is mentioned in some tuple xj (1 ≤ j ≤ k)
    • if ui is a variable (1 ≤ i ≤ n), then ui is mentioned in some tuple xj (1 ≤ j ≤ k)
    • if vi is a variable (1 ≤ i ≤ n), then vi is mentioned in some tuple xj (1 ≤ j ≤ k)


It should be noticed that if m = 0 in the previous Datalog rule, then the rule does not include any element of the form "not Q(x)". Similarly, if n = 0 in the previous Datalog rule, then it does not include any inequalities.
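
The safety conditions can be checked mechanically. The following is a minimal sketch under an encoding chosen only for this illustration: an atom is a pair (symbol, terms), variables are strings that start with "?", and every other value is a constant. The function name is_safe is hypothetical.

```python
def is_var(term):
    """Variables are encoded as strings starting with '?' (an assumption of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def is_safe(head, pos, neg=(), neq=()):
    """Check k > 0 and the four safety conditions of a Datalog rule.

    head: (symbol, terms); pos and neg: lists of (symbol, terms);
    neq: list of (u, v) pairs standing for the inequalities u ≠ v.
    """
    # Variables mentioned in some positive tuple xj.
    bound = {t for _, terms in pos for t in terms if is_var(t)}
    # Variables used in the head, in negated atoms and in inequalities.
    used = {t for t in head[1] if is_var(t)}
    used |= {t for _, terms in neg for t in terms if is_var(t)}
    used |= {t for pair in neq for t in pair if is_var(t)}
    return len(pos) > 0 and used <= bound


# A rule with head triple(x, "name", y) and body student(x, y) is safe.
print(is_safe(("triple", ("?x", "name", "?y")), [("student", ("?x", "?y"))]))
```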

The following is a Datalog rule for our running example:

triple(x, "name", y) ← student(x, y) 

In this rule, x and y are variables, and "name" is a constant (an element from U). From now on, we use double quotes in Datalog rules to denote constants.

As a second example, consider the following Datalog rules for our running example:

A(x) ← enrolled(x, y), enrolled(x, z), y ≠ z 
B(x) ← enrolled(x, y1), enrolled(x, y2), course(y1, w1, z1), course(y2, w2, z2), y1 ≠ y2, z1 ≠ "CS", z2 ≠ "CS"

Intuitively, the first rule retrieves all the students that are taking at least two distinct courses, while the second rule retrieves all the students that are taking at least two distinct courses that are not given in the "CS" department. Finally, consider the following Datalog rule that uses the relation symbol A just defined:

C(x) ← enrolled(x, y), not A(x)

Intuitively, this rule retrieves all the students that are taking exactly one course: enrolled(x, y) indicates that x is taking at least one course, while not A(x) indicates that x is not in the table A, that is, x is not taking at least two distinct courses.
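
To make the intuition concrete, the rules for A and C can be mimicked with Python set comprehensions. This is only a sketch: the function names and the sample enrolled relation (student 1 takes one course, student 7 takes two) are invented for the illustration.

```python
def at_least_two(enrolled):
    """A(x) ← enrolled(x, y), enrolled(x, z), y ≠ z"""
    return {x for (x, y) in enrolled for (x2, z) in enrolled if x == x2 and y != z}


def exactly_one(enrolled):
    """C(x) ← enrolled(x, y), not A(x)"""
    a = at_least_two(enrolled)
    return {x for (x, _) in enrolled if x not in a}


# Hypothetical data, not the running instance.
sample = {(1, 2), (7, 2), (7, 5)}
print(at_least_two(sample))  # students with two distinct courses
print(exactly_one(sample))   # students with exactly one course
```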

We conclude this section by mentioning that in the Datalog rule shown at the beginning, P(x) is the head of the rule and P1(x1), ..., Pk(xk), not Q1(y1), ..., not Qm(ym), u1 ≠ v1, ..., un ≠ vn is the body of the rule.

Non-recursive Datalog programs

A Datalog program Π is a finite set of Datalog rules. In Π, a relation symbol P is intensional if it is mentioned in the head of some rule of Π, and it is extensional otherwise. Then a Datalog program Π is said to be defined over a schema R if the set of extensional relation symbols of Π is a subset of R. For example, the following is a Datalog program over the schema of our running example:

A(x) ← enrolled(x, y), enrolled(x, z), y ≠ z 
C(x) ← enrolled(x, y), not A(x)

To see why this is the case, it should be noticed that {A, C} is the set of intensional relation symbols of this program, while {enrolled} is the set of extensional relation symbols of this program.

A Datalog program Π is said to be non-recursive if there exists a function f that assigns a positive number to each relation symbol in Π in such a way that for every rule in Π, if P is the relation symbol in the head of the rule and Q is a relation symbol in the body of the rule, then f(P) > f(Q). Thus, for example, the Datalog program:

A(x) ← enrolled(x, y), enrolled(x, z), y ≠ z 
C(x) ← enrolled(x, y), not A(x)

is non-recursive as the function f defined as f(C) = 3, f(A) = 2, f(enrolled) = 1 satisfies the aforementioned condition. On the other hand, the following Datalog program:

A(x) ← E(x,y), A(y)

as well as:

A(x) ← E(x,y)
A(x) ← B(x)
B(x) ← C(x)
C(x) ← A(x)

are recursive Datalog programs.
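
Whether such a function f exists can be decided by looking for cycles in the dependencies between relation symbols. The sketch below is one possible way to do this, not a prescribed algorithm: a program is abstracted as a list of (head symbol, body symbols) pairs, and the hypothetical function level_function returns a suitable f, or None when the program is recursive.

```python
def level_function(rules):
    """Return f with f(head) > f(body symbol) for every rule, or None if recursive.

    rules: list of (head_symbol, [body_symbols]) pairs.
    """
    deps = {}
    for head, body in rules:
        deps.setdefault(head, set()).update(body)
        for q in body:
            deps.setdefault(q, set())

    f, in_progress = {}, set()

    def visit(p):
        if p in f:
            return True
        if p in in_progress:  # p is defined in terms of itself
            return False
        in_progress.add(p)
        for q in deps[p]:
            if not visit(q):
                return False
        in_progress.discard(p)
        # Extensional symbols get 1; a head gets one more than its largest dependency.
        f[p] = 1 + max((f[q] for q in deps[p]), default=0)
        return True

    for p in deps:
        if not visit(p):
            return None
    return f


program = [("A", ["enrolled", "enrolled"]), ("C", ["enrolled", "A"])]
print(level_function(program))
print(level_function([("A", ["E", "A"])]))
```

For the first program the sketch yields f(enrolled) = 1, f(A) = 2 and f(C) = 3, the same function used in the text; for the recursive programs it returns None.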


Intuitively, a Datalog program is recursive if one of its intensional predicates is defined in terms of itself. The following is a typical example of a recursive Datalog program:

flight(x,y) ← non_stop_flight(x,y)
flight(x,y) ← non_stop_flight(x,z), flight(z,y)

This program retrieves the pairs of cities (x,y) such that there is a way to fly from x to y that may include an arbitrary number of stopovers.


Important: In what follows, we consider only non-recursive Datalog programs.


Semantics of Datalog programs

To define the semantics of Datalog programs, we need to introduce some terminology. A substitution σ is a function from V ∪ U to U that is the identity on the constants (that is, σ(c) = c for every c ∈ U). Given a tuple t = (t1, ..., tk) of constants and variables and a substitution σ, the tuple σ(t) is defined as (σ(t1), ..., σ(tk)). For example, if t = (x, "name", y) and σ is a substitution such that σ(x) = "1" and σ(y) = "John", then σ(t) = ("1", "name", "John").

Let R be a schema, D an instance of R, and Π a Datalog program over R, and assume that f is a function that assigns a positive number to each relation symbol in Π in such a way that for every rule in Π, if P is the relation symbol in the head of the rule and Q is a relation symbol in the body of the rule, then f(P) > f(Q) (such a function exists since Π is assumed to be non-recursive). The evaluation of Π over D assigns to every predicate symbol R mentioned in Π a relation R^D of the corresponding arity. Formally, this evaluation is recursively defined as follows.

(1) For every extensional relation symbol R mentioned in Π, relation R^D is just the relation assigned to R by D.

(2) Assume that P is an intensional relation symbol in Π and that for every relation symbol Q such that f(P) > f(Q), relation Q^D has already been computed. Then a tuple c of constants is in P^D if and only if there exist a rule in Π:

P(x) ← P1(x1), ..., Pk(xk), not Q1(y1), ..., not Qm(ym), u1 ≠ v1, ..., un ≠ vn

and a substitution σ such that:

  • σ(x) = c
  • for every i ∈ {1, ..., k}: σ(xi) is a tuple in Pi^D
  • for every i ∈ {1, ..., m}: σ(yi) is not a tuple in Qi^D
  • for every i ∈ {1, ..., n}: σ(ui) ≠ σ(vi)


It is important to notice that in the previous definition, a function f is used to define the evaluation of Π over D. Thus, it is natural to ask what would happen if one replaces this function by another function g satisfying the condition mentioned in the definition. Given that f is only used to determine the order in which the rules of Π have to be evaluated, it can be formally proved that if one uses g instead of f when evaluating Π over D, then the result is the same.


We now show an example of the evaluation process. Let Π be the following Datalog program over the schema of our running example:

A(x) ← enrolled(x, y), enrolled(x, z), y ≠ z 
C(x) ← enrolled(x, y), not A(x)

Moreover, let f be a function defined as f(C) = 3, f(A) = 2 and f(enrolled) = 1, which actually shows that Π is a non-recursive Datalog program. To evaluate Π over our running database instance D, we first notice that:

enrolled^D = { (1,2) }

Then we consider intensional predicate A, as we have that f(A) > f(enrolled), enrolled^D has already been computed and enrolled is the only relation symbol mentioned in Π whose image under f is smaller than f(A). In this case, we have that:

A^D = ∅

as there is no substitution σ such that (σ(x), σ(y)) is in enrolled^D, (σ(x), σ(z)) is in enrolled^D and σ(y) ≠ σ(z). Finally, we consider intensional predicate C, for which we have that:

C^D = { 1 }

since for the substitution σ such that σ(x) = 1 and σ(y) = 2, we have that (σ(x), σ(y)) is in enrolled^D and σ(x) is not in A^D. Notice that this result corresponds with our intuition about the definition of relation symbol C, as C^D contains the set of students in D that are taking exactly one course.
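
The evaluation process can be sketched as a small interpreter. Everything about the encoding is an assumption of this sketch rather than part of the definition: variables are strings starting with "?", a rule is a dictionary with the keys "head", "pos", "neg" and "neq", relations are sets of tuples, and the function f is passed in explicitly. Unary results therefore appear as one-element tuples such as (1,).

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def subst(sigma, terms):
    """Apply the substitution sigma to a tuple of variables and constants."""
    return tuple(sigma[t] if is_var(t) else t for t in terms)


def match(terms, fact, sigma):
    """Extend sigma so that it maps terms to fact, or return None if impossible."""
    sigma = dict(sigma)
    for t, c in zip(terms, fact):
        if is_var(t):
            if sigma.setdefault(t, c) != c:
                return None
        elif t != c:
            return None
    return sigma


def evaluate(rules, f, instance):
    """Evaluate a non-recursive program; intensional symbols are handled in the order given by f."""
    rel = {name: set(rows) for name, rows in instance.items()}
    for p in sorted({r["head"][0] for r in rules}, key=lambda s: f[s]):
        rel[p] = set()
        for r in (r for r in rules if r["head"][0] == p):
            sigmas = [{}]
            for name, terms in r["pos"]:  # join the positive atoms
                sigmas = [s2 for s in sigmas for fact in rel.get(name, ())
                          if (s2 := match(terms, fact, s)) is not None]
            for s in sigmas:
                if any(subst(s, t) in rel.get(n, ()) for n, t in r.get("neg", [])):
                    continue  # a negated atom holds
                if any(subst(s, (u,)) == subst(s, (v,)) for u, v in r.get("neq", [])):
                    continue  # an inequality fails
                rel[p].add(subst(s, r["head"][1]))
    return rel


rules = [
    {"head": ("A", ("?x",)),
     "pos": [("enrolled", ("?x", "?y")), ("enrolled", ("?x", "?z"))],
     "neq": [("?y", "?z")]},
    {"head": ("C", ("?x",)),
     "pos": [("enrolled", ("?x", "?y"))],
     "neg": [("A", ("?x",))]},
]
result = evaluate(rules, {"enrolled": 1, "A": 2, "C": 3}, {"enrolled": {(1, 2)}})
print(result["A"], result["C"])
```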


Built-in predicates

Fix a relational schema R, and assume that B is a set of relation symbols that are not mentioned in R. Moreover, assume that for every relation symbol R in B of arity n, there exists a (not necessarily finite) n-ary relation I of elements from U such that for every instance D of R, it holds that R^D = I. That is, the interpretation of each relation symbol in B is fixed (it does not depend on the database instances) and may be infinite.

Each relation symbol in B is called a built-in predicate. For example, if U is the set of natural numbers, then < is a built-in predicate whose interpretation in each database instance D is:

<^D = { (n,m) | n and m are natural numbers and n is smaller than m }


Built-in predicates can be included in Datalog rules. More precisely, a Datalog rule with built-in predicates is a rule of the form:

P(x) ← P1(x1), ..., Pk(xk), not Q1(y1), ..., not Qm(ym), u1 ≠ v1, ..., un ≠ vn, R1(w1), ..., Rs(ws)

where:

  • k > 0, m ≥ 0, n ≥ 0 and s ≥ 0
  • P, P1, ..., Pk, Q1, ..., Qm are (not necessarily distinct) relation symbols not mentioned in B
  • R1, ..., Rs are (not necessarily distinct) built-in predicate symbols (relation symbols from B)
  • each of x, x1, ..., xk, y1, ..., ym, w1, ..., ws is a tuple of variables and constants
  • every ui (1 ≤ i ≤ n), and every vi, is either a variable or a constant
  • the following safety conditions are satisfied:
    • every variable in x is mentioned in some tuple xj (1 ≤ j ≤ k)
    • every variable in yi (1 ≤ i ≤ m) is mentioned in some tuple xj (1 ≤ j ≤ k)
    • if ui is a variable (1 ≤ i ≤ n), then ui is mentioned in some tuple xj (1 ≤ j ≤ k)
    • if vi is a variable (1 ≤ i ≤ n), then vi is mentioned in some tuple xj (1 ≤ j ≤ k)
    • every variable in wi (1 ≤ i ≤ s) is mentioned in some tuple xj (1 ≤ j ≤ k)


It is important to notice that the last safety condition is imposed to avoid rules like the following:

A(x) ← B(y), y < x

which may have an infinite number of solutions if U is the set of natural numbers and < is the usual order on these numbers.


A Datalog program Π with built-in predicates is a finite set of Datalog rules with built-in predicates. A Datalog program Π is said to be defined over relational schema R with built-in predicates B if the set of extensional relation symbols of Π is a subset of R ∪ B.

The semantics of Datalog programs with built-in predicates is a simple extension of the semantics defined above. Let D be an instance of schema R and Π a Datalog program over R with built-in predicates B, and assume that f is a function that assigns a positive number to each relation symbol in Π in such a way that for every rule in Π, if P is the relation symbol in the head of the rule and Q is a relation symbol in the body of the rule, then f(P) > f(Q) (such a function exists since Π is assumed to be non-recursive). The evaluation of Π over D assigns to every predicate symbol R mentioned in Π a relation R^D of the corresponding arity. Formally, this evaluation is recursively defined as follows.

(1) For every extensional relation symbol R mentioned in Π, relation R^D is just the relation assigned to R by D (in particular, for every built-in predicate R mentioned in Π, relation R^D is just the fixed interpretation assigned to R by D).

(2) Assume that P is an intensional relation symbol in Π and that for every relation symbol Q such that f(P) > f(Q), relation Q^D has already been computed. Then a tuple c of constants is in P^D if and only if there exist a rule in Π:

P(x) ← P1(x1), ..., Pk(xk), not Q1(y1), ..., not Qm(ym), u1 ≠ v1, ..., un ≠ vn, R1(w1), ..., Rs(ws)

and a substitution σ such that:

  • σ(x) = c
  • for every i ∈ {1, ..., k}: σ(xi) is a tuple in Pi^D
  • for every i ∈ {1, ..., m}: σ(yi) is not a tuple in Qi^D
  • for every i ∈ {1, ..., n}: σ(ui) ≠ σ(vi)
  • for every i ∈ {1, ..., s}: σ(wi) is a tuple in Ri^D

Assumptions

As we mentioned before, we only consider non-recursive Datalog programs in this document. Moreover, for the sake of readability we only consider unary keys and unary foreign keys in this document. In addition, we assume that the following built-in predicate is given:

generateURI(pk, s) is a built-in predicate which holds if s is a URI generated with pk


Database-Instance-Only Mapping

This mapping makes the following assumptions:

  • The user only cares about the database instances
  • There is no need to take into account the schema of the database or to create an ontology (RDFS/OWL)
  • The output of the mapping consists only of RDF triples
  • The predicate in the RDF triple is always of type rdf:Property (even though this is not explicit)
  • A relation can be either an existing table in the database or a user generated SQL query
  • The mapping is written in Datalog. However, this can be syntactically translated to W3C's RIF


Formal definition of the language

Given a relational schema R, a database-instance-only mapping over R is a Datalog program over R with built-in predicates { generateURI }.

In what follows, we show two alternative approaches to write database-instance-only mappings.


First approach: Database-instance-only mapping as a default mapping

The rules are generated automatically in this case, so that the user does not need to know any of the rules or have any ontology in mind to translate his/her relational data into RDF.

To present the mapping language, for each type of rule we show its template, apply it to the running example, and present the output that an RDB2RDF system should give for the example.


Case 1: Generate triples for each attribute in a relation

Mapping template:

Triple(s, "p", p) ← r(..., pk, ..., p, ...), generateURI(pk, s)

where pk is the primary key of relation r and "p" is the attribute label of p.

Example mapping:

Triple(s, "name", name) ← student(s_id, name), generateURI(s_id, s) 
Triple(s, "title", title) ← course(c_id, title, _), generateURI(c_id, s) 
Triple(s, "title", title) ← department(d_id, title), generateURI(d_id, s)

Output:

Triple(http://..../student#1, name, John)
Triple(http://..../course#2, title, CS101)
Triple(http://..../department#3, title, CS)


Case 2: Generate triples from a foreign key relationship between two relations

Mapping template:

Triple(s, "fk", o) ← r(..., pk, ..., fk, ...), generateURI(pk, s), generateURI(fk, o) 

where pk is the primary key of relation r, fk is a foreign key in relation r and "fk" is the attribute label of fk.

Example mapping:

Triple(s, "d_id", o) ← course(c_id, _, d_id), generateURI(c_id, s), generateURI(d_id, o) 

Output:

Triple(http://..../course#2, d_id, http://..../department#3)


Case 3: Generate triples from a many-to-many relation (binary relation)

Mapping Template:

Triple(s, "p", o) ← r(fk1, fk2), generateURI(fk1, s), generateURI(fk2, o) 

where r is a binary many-to-many relation (fk1 and fk2 are foreign keys in relation r) and "p" is the concatenation of the attribute label of fk1 with the symbol _ and the attribute label of fk2.

Example mapping:

Triple(s, "s_id_c_id", o) ← enrolled(fk1, fk2), generateURI(fk1, s), generateURI(fk2, o) 

Output:

Triple(http://..../student#1, s_id_c_id, http://..../course#2)
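
The three cases of the default mapping can be sketched as a single generator over the running example. Several things here are assumptions of the sketch and not given by the text: the base URI http://example.org/ (the source elides the real one), the URI scheme base + relation + "#" + key, the layout of SCHEMA, and the choice to skip the primary key column itself, which follows the example outputs rather than the template.

```python
BASE = "http://example.org/"  # hypothetical; the source elides the real base URI


def generate_uri(relation, key):
    """Assumed URI scheme: <base><relation>#<key>."""
    return f"{BASE}{relation}#{key}"


# relation -> (attribute labels, primary key or None, {foreign key: referenced relation})
SCHEMA = {
    "student": (("s_id", "name"), "s_id", {}),
    "course": (("c_id", "title", "d_id"), "c_id", {"d_id": "department"}),
    "enrolled": (("s_id", "c_id"), None, {"s_id": "student", "c_id": "course"}),
    "department": (("d_id", "title"), "d_id", {}),
}
INSTANCE = {
    "student": {(1, "John")},
    "course": {(2, "CS101", 3)},
    "enrolled": {(1, 2)},
    "department": {(3, "CS")},
}


def default_mapping(schema, instance):
    triples = set()
    for rel, (attrs, pk, fks) in schema.items():
        for values in instance[rel]:
            row = dict(zip(attrs, values))
            if pk is None:
                # Case 3: binary many-to-many relation made of two foreign keys.
                fk1, fk2 = attrs
                triples.add((generate_uri(fks[fk1], row[fk1]), f"{fk1}_{fk2}",
                             generate_uri(fks[fk2], row[fk2])))
                continue
            subject = generate_uri(rel, row[pk])
            for attr in attrs:
                if attr == pk:
                    continue
                if attr in fks:  # Case 2: foreign key, the object is a URI
                    triples.add((subject, attr, generate_uri(fks[attr], row[attr])))
                else:            # Case 1: plain attribute, the object is the value
                    triples.add((subject, attr, row[attr]))
    return triples


for triple in sorted(default_mapping(SCHEMA, INSTANCE), key=str):
    print(triple)
```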


Second approach: User can also create his/her own database-instance-only mapping rules

The rules are created by the user in this case, although he/she could use some rules that are generated automatically (like in the first approach).

To present the mapping language, we show how the rules in the first approach could be modified to consider an existing ontology (for example, name is replaced by foaf:name), and also we show some extra rules generated by a user.


Case 1: Generate triples for each attribute in a relation

Mapping template:

Triple(s, "q", p) ← r(..., pk, ..., p, ...), generateURI(pk, s)

where pk is the primary key of relation r and "q" is either the attribute label of p or a user generated property label.

Example mapping:

Triple(s, "foaf:name", name) ← student(s_id, name), generateURI(s_id, s) 
Triple(s, "dc:title", title) ← course(c_id, title, _), generateURI(c_id, s) 
Triple(s, "dc:title", title) ← department(d_id, title), generateURI(d_id, s)

Output:

Triple(http://..../student#1, foaf:name, John)
Triple(http://..../course#2, dc:title, CS101)
Triple(http://..../department#3, dc:title, CS)


Case 2: Generate triples from a foreign key relationship between two relations

Mapping template:

Triple(s, "q", o) ← r(..., pk, ..., fk, ...), generateURI(pk, s), generateURI(fk, o) 

where pk is the primary key of relation r, fk is a foreign key in relation r and "q" is the attribute label of fk or a user generated property label.

Example mapping:

Triple(s, "ex:given_at", o) ← course(c_id, _, d_id), generateURI(c_id, s), generateURI(d_id, o) 

Output:

Triple(http://..../course#2, ex:given_at, http://..../department#3)


Case 3: Generate triples from a many-to-many relation (binary relation)

Mapping Template:

Triple(s, "q", o) ← r(fk1, fk2), generateURI(fk1, s), generateURI(fk2, o) 

where r is a binary many-to-many relation (fk1 and fk2 are foreign keys in relation r) and "q" is some user generated property label.

Example mapping:

Triple(s, "ex:enrolled", o) ← enrolled(fk1, fk2), generateURI(fk1, s), generateURI(fk2, o) 

Output:

Triple(http://..../student#1, ex:enrolled, http://..../course#2)


Case 4: Rules generated by the user

The database-instance-only mapping language could be used to map some additional knowledge that the user has about the information in the relational database.

For example, assume that the user knows that every student that takes a "CS" course must belong to the "CS" department. Then he/she can add this additional knowledge with the following rule:

Triple(s, "ex:student_at", o) ← enrolled(s_id, c_id), course(c_id, _, d_id), department(d_id, "CS"), generateURI(s_id, s), 
                                 generateURI(d_id, o)

Output:

Triple(http://..../student#1, ex:student_at, http://..../department#3)
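
Read as a query, the user rule is a three-way join of enrolled, course and department. A minimal sketch follows; the base URI and the URI scheme are assumptions (the source elides the base), and student_at is a hypothetical name.

```python
BASE = "http://example.org/"  # hypothetical; the source elides the real base URI


def generate_uri(relation, key):
    """Assumed URI scheme: <base><relation>#<key>."""
    return f"{BASE}{relation}#{key}"


def student_at(enrolled, course, department):
    """Join enrolled with course on c_id and course with department on d_id, keeping "CS"."""
    return {
        (generate_uri("student", s_id), "ex:student_at", generate_uri("department", d_id))
        for (s_id, c_id) in enrolled
        for (c_id2, _title, d_id) in course
        if c_id2 == c_id
        for (d_id2, title) in department
        if d_id2 == d_id and title == "CS"
    }


print(student_at({(1, 2)}, {(2, "CS101", 3)}, {(3, "CS")}))
```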


Database-Instances-and-Schema Mapping

This mapping makes the following assumptions:

  • The user is interested in having the database instances and the schema
  • The schema of the database gets mapped to an ontology (RDFS/OWL)
  • The output of the mapping is in RDF and OWL
  • A relation can be either an existing table in the database or a user generated SQL query
  • The mapping is written in Datalog. However, this can be syntactically translated to W3C's RIF

Characteristics

  • More expressive mapping language


Representing relational schemas and instances

The following predicate symbols are used to represent the schema and the instances of a relational database. These predicates can be automatically computed, and they are necessary for the database-instances-and-schema mapping, as this language takes into consideration both the instances and the schema of a relational database.

RDB-schema predicates

Rel(r) = r is a relation 
Attr(x, r, t) = x is an attribute in relation r of type t 
PK(x, r) = attribute x is the primary key of relation r
FK(x, r, y, s) = attribute x is a foreign key in relation r that references attribute y in relation s

RDB-instances predicates

Value(r, tuple_id, a, v): tuple_id is the identifier of a tuple in the relation r, which has value v in attribute a

Continuing our running example, the following RDB-schema predicates are used to represent the schema of the database:

Rel(student)
Rel(course) 
Rel(enrolled)
Rel(department)
Attr(s_id, student, int)
Attr(name, student, string)
Attr(c_id, course, int)
Attr(title, course, string)
Attr(d_id, course, int)
Attr(s_id, enrolled, int)
Attr(c_id, enrolled, int)
Attr(d_id, department, int)
Attr(title, department, string)
PK(s_id, student)
PK(c_id, course)
PK(d_id, department)
FK(d_id, course, d_id, department)
FK(s_id, enrolled, s_id, student)
FK(c_id, enrolled, c_id, course)

and the following RDB-instances predicates are used to represent the given instance:

Value(student, t1, s_id, 1) 
Value(student, t1, name, John)
Value(course, t2, c_id, 2)
Value(course, t2, title, CS101)
Value(course, t2, d_id, 3) 
Value(enrolled, t3, s_id, 1)
Value(enrolled, t3, c_id, 2)
Value(department, t4, d_id, 3)
Value(department, t4, title, CS)
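
Since these predicates can be computed automatically, the following sketch shows one way to derive them from a schema description and an instance. The dictionary layout and the tuple identifiers t1, t2, ... (numbered in the order in which rows are visited) are assumptions of the sketch.

```python
from itertools import count

SCHEMA = {
    "student": {"attrs": [("s_id", "int"), ("name", "string")], "pk": "s_id", "fks": {}},
    "course": {"attrs": [("c_id", "int"), ("title", "string"), ("d_id", "int")],
               "pk": "c_id", "fks": {"d_id": ("d_id", "department")}},
    "enrolled": {"attrs": [("s_id", "int"), ("c_id", "int")], "pk": None,
                 "fks": {"s_id": ("s_id", "student"), "c_id": ("c_id", "course")}},
    "department": {"attrs": [("d_id", "int"), ("title", "string")], "pk": "d_id", "fks": {}},
}
INSTANCE = {
    "student": [(1, "John")],
    "course": [(2, "CS101", 3)],
    "enrolled": [(1, 2)],
    "department": [(3, "CS")],
}


def to_facts(schema, instance):
    """Build the Rel, Attr, PK, FK and Value facts as sets of tuples."""
    facts = {"Rel": set(), "Attr": set(), "PK": set(), "FK": set(), "Value": set()}
    ids = count(1)
    for r, info in schema.items():
        facts["Rel"].add((r,))
        for attr, typ in info["attrs"]:
            facts["Attr"].add((attr, r, typ))
        if info["pk"] is not None:
            facts["PK"].add((info["pk"], r))
        for attr, (ref_attr, ref_rel) in info["fks"].items():
            facts["FK"].add((attr, r, ref_attr, ref_rel))
        for row in instance[r]:
            tuple_id = f"t{next(ids)}"  # one identifier per tuple
            for (attr, _), value in zip(info["attrs"], row):
                facts["Value"].add((r, tuple_id, attr, value))
    return facts


facts = to_facts(SCHEMA, INSTANCE)
print(sorted(facts["FK"]))
```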


Formal definition of the language

A database-instances-and-schema mapping is a Datalog program over { Rel, Attr, PK, FK, Value } with built-in predicates { generateURI }.

In what follows, we show how to translate relational data into RDF triples through database-instances-and-schema mappings.


Representing ontologies (RDFS/OWL)

In the database-instances-and-schema mapping language, the user can define any number of additional predicates by writing Datalog programs over the RDB-schema and RDB-instances predicates. In this section, we show some predicates that can be generated in this way. The predicates and the rules defining them are useful when mapping relational data into RDFS/OWL, and they could be generated automatically as they are generic.


Auxiliary rules: these are useful when defining the ontology predicates

ExistsFK(x, r) ← FK(x, r, _, _)
NonFK(x, r) ← Attr(x, r, _), not ExistsFK(x, r)

Notice that ExistsFK(x, r) holds if attribute x is a foreign key in relation r, while NonFK(x, r) holds if x is an attribute of relation r that is not a foreign key in r.


Rule 1: Identify Binary Relations

BinRel(r, s, t) ← Rel(r), FK(p, r, _, s), FK(q, r, _, t), p ≠ q, not ExistThreeFK(r), not ExistsNonFKAtt(r)
ExistThreeFK(r) ← FK(p1, r, _, _), FK(p2, r, _, _), FK(p3, r, _, _),  p1 ≠ p2, p1 ≠ p3, p2 ≠ p3
ExistsNonFKAtt(r) ← NonFK(x, r)

Notice that ExistThreeFK(r) holds if relation r has at least three distinct foreign keys, and ExistsNonFKAtt(r) holds if relation r has at least one attribute that is not a foreign key of r.


Rule 2: Identify Ontology Classes (relations that are not binary relations)

Class(r) ← Rel(r), not IsBinRel(r)
IsBinRel(r) ← BinRel(r, _, _)

Notice that Class(r) holds if relation r represents a class, and IsBinRel(r) holds if relation r is a binary relation.


Rule 3: Identify Object Properties through binary relations

ObjP(r, s, t) ← BinRel(r, s, t), not IsBinRel(s), not IsBinRel(t)

Notice that ObjP(r, s, t) holds if relation r represents an object property with domain s and range t.


Rule 4: Identify Object Properties through a foreign key relationship

ObjP(x, s, t) ← FK(x, s, y, t), not IsBinRel(s), not IsBinRel(t)


Rule 5: Identify Datatype Properties

DTP(x, r, t)  ← NonFK(x, r), Attr(x, r, t)

Notice that DTP(x, r, t) holds if attribute x represents a data type property with domain r and range t.


The following is the result of applying the previous rules to our running example:

Auxiliary rules:

ExistsFK(d_id, course)
ExistsFK(s_id, enrolled)
ExistsFK(c_id, enrolled)
NonFK(s_id, student)
NonFK(name, student)
NonFK(c_id, course)
NonFK(title, course)
NonFK(d_id, department)
NonFK(title, department)

Rule 1:

BinRel(enrolled, student, course)
ExistsNonFKAtt(student)
ExistsNonFKAtt(course)
ExistsNonFKAtt(department)

Rule 2:

Class(student)
Class(course)
Class(department)
IsBinRel(enrolled)

Rule 3:

ObjP(enrolled, student, course)

Rule 4:

ObjP(d_id, course, department)

Rule 5:

DTP(name, student, string)
DTP(title, course, string)
DTP(title, department, string)
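
The rules above can be replayed in Python with set comprehensions over the schema facts of the running example. This is a sketch under the assumption that facts are stored as plain tuples; the function name ontology is hypothetical. Evaluated literally, the rules also derive the symmetric fact BinRel(enrolled, course, student), because p and q can be swapped in Rule 1, and DTP facts of type int for the primary key attributes, because those attributes are in NonFK.

```python
REL = {"student", "course", "enrolled", "department"}
ATTR = {
    ("s_id", "student", "int"), ("name", "student", "string"),
    ("c_id", "course", "int"), ("title", "course", "string"), ("d_id", "course", "int"),
    ("s_id", "enrolled", "int"), ("c_id", "enrolled", "int"),
    ("d_id", "department", "int"), ("title", "department", "string"),
}
FK = {
    ("d_id", "course", "d_id", "department"),
    ("s_id", "enrolled", "s_id", "student"),
    ("c_id", "enrolled", "c_id", "course"),
}


def ontology(rel, attr, fk):
    # Auxiliary rules
    exists_fk = {(x, r) for (x, r, _, _) in fk}
    non_fk = {(x, r) for (x, r, _) in attr if (x, r) not in exists_fk}
    # Rule 1: binary relations
    three_fk = {r for (p1, r, _, _) in fk for (p2, r2, _, _) in fk for (p3, r3, _, _) in fk
                if r == r2 == r3 and len({p1, p2, p3}) == 3}
    has_non_fk = {r for (_, r) in non_fk}
    bin_rel = {(r, s, t) for (p, r, _, s) in fk for (q, r2, _, t) in fk
               if r == r2 and r in rel and p != q
               and r not in three_fk and r not in has_non_fk}
    # Rule 2: classes
    is_bin_rel = {r for (r, _, _) in bin_rel}
    classes = rel - is_bin_rel
    # Rules 3 and 4: object properties
    obj_p = {(r, s, t) for (r, s, t) in bin_rel
             if s not in is_bin_rel and t not in is_bin_rel}
    obj_p |= {(x, s, t) for (x, s, _, t) in fk
              if s not in is_bin_rel and t not in is_bin_rel}
    # Rule 5: datatype properties
    dtp = {(x, r, t) for (x, r, t) in attr if (x, r) in non_fk}
    return {"NonFK": non_fk, "BinRel": bin_rel, "Class": classes, "ObjP": obj_p, "DTP": dtp}


result = ontology(REL, ATTR, FK)
print(sorted(result["Class"]))
```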


Translating relational data into RDF triples

In this section, we present an example of the rules that the user can write in the database-instances-and-schema mapping language. It should be noticed that the mapping is written using the ontology predicates, and it can be considered as a default mapping (that can be generated automatically) as it is generic. Moreover, we present the output that an RDB2RDF system should give for each of the example rules.


Case 1: Generate rdf:type triple instances

Example mapping:

Triple(s, "rdf:type", m) ← Class(m),  PK(p, m), Value(m, _, p, v),  generateURI(v, s) 

Output:

Triple(http://..../student#1, rdf:type, student)

Notice that this triple is generated by considering the facts: Class(student), PK(s_id, student), Value(student, t1, s_id, 1), generateURI(1, http://..../student#1).


Case 2: Generate owl:DatatypeProperty triple instances

Example mapping:

Triple(s, p, o) ← DTP(p, d, r), Value(d, t, p, o),  PK(pk, d), Value(d, t, pk, v), generateURI(v, s)

Output:

Triple(http://..../student#1, name, John)

Notice that this triple is generated by considering the facts: DTP(name, student, string), Value(student, t1, name, John), PK(s_id, student), Value(student, t1, s_id, 1), generateURI(1, http://..../student#1).


Case 3a: Generate owl:ObjectProperty triple instances from a binary relation

Example mapping:

Triple(s, p, o) ← BinRel(p, d, r), PK(pk1, d), Value(d, _, pk1, v1), PK(pk2, r), Value(r, _, pk2, v2), 
                   Value(p, t, pk1, v1), Value(p, t, pk2, v2), generateURI(v1, s), generateURI(v2, o) 

Output:

Triple(http://..../student#1, enrolled, http://..../course#2)

Notice that this triple is generated by considering the facts: BinRel(enrolled, student, course), PK(s_id, student), Value(student, t1, s_id, 1), PK(c_id, course), Value(course, t2, c_id, 2), Value(enrolled, t3, s_id, 1), Value(enrolled, t3, c_id, 2), generateURI(1, http://..../student#1), generateURI(2, http://..../course#2)


Case 3b: Generate owl:ObjectProperty triple instances from a foreign key relationship

Example mapping:

Triple(s, p, o) ← ObjP(p, d, r), PK(pk, d), Value(d, t, pk, v1), FK(fk, d, _, r), Value(d, t, fk, v2), 
                   generateURI(v1, s), generateURI(v2, o) 

Output:

Triple(http://..../course#2, d_id,  http://..../department#3)

Notice that this triple is generated by considering the facts: ObjP(d_id, course, department), PK(c_id, course), Value(course, t2, c_id, 2), FK(d_id, course, d_id, department), Value(course, t2, d_id, 3), generateURI(2, http://..../course#2), generateURI(3, http://..../department#3)
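
The four cases can be replayed over the facts listed earlier for the running example. The sketch below makes several assumptions: the base URI is hypothetical (the source elides it), generateURI is modelled as a function that also receives the relation name so that it can reproduce URIs of the shape shown in the outputs, and the ontology facts are copied from the listing in the text rather than recomputed.

```python
BASE = "http://example.org/"  # hypothetical; the source elides the real base URI


def generate_uri(relation, key):
    """Assumed URI scheme: <base><relation>#<key>."""
    return f"{BASE}{relation}#{key}"


PK = {("s_id", "student"), ("c_id", "course"), ("d_id", "department")}
FK = {("d_id", "course", "d_id", "department"),
      ("s_id", "enrolled", "s_id", "student"),
      ("c_id", "enrolled", "c_id", "course")}
VALUE = {("student", "t1", "s_id", 1), ("student", "t1", "name", "John"),
         ("course", "t2", "c_id", 2), ("course", "t2", "title", "CS101"),
         ("course", "t2", "d_id", 3),
         ("enrolled", "t3", "s_id", 1), ("enrolled", "t3", "c_id", 2),
         ("department", "t4", "d_id", 3), ("department", "t4", "title", "CS")}
CLASS = {"student", "course", "department"}
DTP = {("name", "student", "string"), ("title", "course", "string"),
       ("title", "department", "string")}
BIN_REL = {("enrolled", "student", "course")}
OBJ_P = {("enrolled", "student", "course"), ("d_id", "course", "department")}


def translate():
    rows = {}  # (relation, tuple_id) -> {attribute: value}
    for r, t, a, v in VALUE:
        rows.setdefault((r, t), {})[a] = v
    pk_of = {r: p for p, r in PK}
    keys = {(r, row[pk_of[r]]) for (r, _), row in rows.items() if r in pk_of}
    triples = set()
    for (rel, _), row in rows.items():
        if rel in CLASS:  # Case 1: rdf:type
            triples.add((generate_uri(rel, row[pk_of[rel]]), "rdf:type", rel))
        for p, d, _rng in DTP:  # Case 2: datatype properties
            if d == rel:
                triples.add((generate_uri(d, row[pk_of[d]]), p, row[p]))
        for p, d, r in BIN_REL:  # Case 3a: binary relation
            if p == rel:
                v1, v2 = row[pk_of[d]], row[pk_of[r]]
                if (d, v1) in keys and (r, v2) in keys:
                    triples.add((generate_uri(d, v1), p, generate_uri(r, v2)))
        for p, d, r in OBJ_P:  # Case 3b: foreign key relationship
            for fk, d2, _y, r2 in FK:
                if d == rel and d2 == d and r2 == r:
                    triples.add((generate_uri(d, row[pk_of[d]]), p,
                                 generate_uri(r, row[fk])))
    return triples


for triple in sorted(translate(), key=str):
    print(triple)
```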


Rules generated by the user

As for the case of the database-instance-only mapping language, the database-instances-and-schema mapping language could be used to map some additional knowledge that the user has about the information in the relational database.


Example 1. To use an existing ontology, the user can use his/her own rules:

Triple(s, "ex:given_at", o) ← Value("course", t, "c_id", x), Value("course", t, "d_id", y), 
                               generateURI(x, s), generateURI(y, o) 

Output:

Triple(http://..../course#2, ex:given_at, http://..../department#3)


Example 2. Assume that the user knows that every student that takes a "CS" course must belong to the "CS" department. Then he/she can add this additional knowledge with the following rule:

Triple(s, "ex:student_at", o) ← Value("enrolled", t1, "s_id", x), Value("enrolled", t1, "c_id", z), 
                                 Value("course", t2, "c_id", z), Value("course", t2, "d_id", y), 
                                 Value("department", t3, "d_id", y), Value("department", t3, "title", "CS"), 
                                 generateURI(x, s), generateURI(y, o)

Output:

Triple(http://..../student#1, ex:student_at, http://..../department#3)