R2RML in a custom syntax

This page presents a syntax proposal for the R2RML language. It is based on neither RDF nor XML; instead, it is a custom syntax inspired by SPARQL.

Overview

This syntax combines elements from various previous languages and proposals:

  • It uses full SQL queries for most mapping tasks, like the various syntax proposals for the SQL-based approach.
  • It uses RDF templates similar to SPARQL CONSTRUCT templates, like Virtuoso RDF Views.
  • It uses a two-phase SQL query and triple instantiation approach, similar to Virtuoso, D2RQ or the Revelytix mapping language.

In its simplest form, a mapping file consists of one CONSTRUCT-like template for each mapped table. Each template is instantiated once for every row of its table. Instead of plain tables, one can also use arbitrary SELECT queries. Further features include reusable templates for IRI generation, support for named graphs, and the expression of uniqueness constraints and foreign key constraints over the views.


Prefixes and base URI

Prefixes are established with the PREFIX keyword, identical to SPARQL:

PREFIX biz: <http://biz.example.org/biz-ontology#>
PREFIX ex: <http://example.org/>

All relative URIs are resolved against a base URI, which can be supplied by the processing environment, or explicitly established using BASE:

BASE <http://example.org/>


Table maps

A table map consists of two parts: a table name, and a triple template. The template will be instantiated for each record in the table. Variables in the template are replaced with column values from the record.

TABLE emp
TEMPLATE {
    []
        a biz:Employee;
        biz:employeeNumber ?empno;
        biz:name ?ename .
}

Instead of a blank node, one can use an IRI template to identify the resource:

TABLE emp
TEMPLATE {
    <http://example.com/employee/{?empno}>
        a biz:Employee;
        biz:employeeNumber ?empno;
        biz:name ?ename;
        biz:department <http://example.com/department/{?deptno}>.
}
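
To make the instantiation step concrete, the following Python sketch applies the table map above to a list of records. It is only an illustration under assumptions: the helper names `fill_iri_template` and `instantiate`, the tuple representation of triples, and the sample row are hypothetical, and the sketch does no IRI escaping or NULL handling, since the proposal does not specify either.

```python
import re

def fill_iri_template(template, row):
    """Replace each {?column} placeholder with that column's value in the row."""
    return re.sub(r"\{\?(\w+)\}", lambda m: str(row[m.group(1)]), template)

def instantiate(rows):
    """Yield (subject, predicate, object) triples, one template copy per record."""
    for row in rows:
        subject = fill_iri_template("http://example.com/employee/{?empno}", row)
        department = fill_iri_template("http://example.com/department/{?deptno}", row)
        yield (subject, "rdf:type", "biz:Employee")
        yield (subject, "biz:employeeNumber", row["empno"])
        yield (subject, "biz:name", row["ename"])
        yield (subject, "biz:department", department)

# Hypothetical sample record from the emp table.
rows = [{"empno": 7369, "ename": "SMITH", "deptno": 20}]
triples = list(instantiate(rows))
```

Each record yields one copy of the template, so a table with n rows produces 4n triples in this sketch.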

IRI templates can be defined once and then re-used multiple times. A defined IRI template works like a function: It is invoked with arguments, and inside the template definition, the arguments are available as {1}, {2} etc:

IRIPATTERN empIRI AS <http://example.com/employee/{1}>
IRIPATTERN deptIRI AS <http://example.com/department/{1}>
TABLE emp
TEMPLATE {
    empIRI(?empno)
        a biz:Employee;
        biz:employeeNumber ?empno;
        biz:name ?ename;
        biz:department deptIRI(?deptno).
}
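
The function-like behaviour of a defined IRI template can be sketched in Python as a closure over the pattern. This is a hypothetical illustration, not part of the proposal: `define_iri_template` is an invented name, arguments are substituted as plain strings without escaping, and the two-argument pattern is made up to show how {2} would be filled.

```python
import re

def define_iri_template(pattern):
    """Return a callable that fills {1}, {2}, ... with its positional arguments."""
    def invoke(*args):
        # {n} is 1-based, so it maps to args[n - 1].
        return re.sub(r"\{(\d+)\}", lambda m: str(args[int(m.group(1)) - 1]), pattern)
    return invoke

emp_iri = define_iri_template("http://example.com/employee/{1}")
dept_iri = define_iri_template("http://example.com/department/{1}")

# Hypothetical pattern with two arguments.
pair_iri = define_iri_template("http://example.com/{1}/{2}")

employee = emp_iri(7369)
department = dept_iri(20)
pair = pair_iri("dept", 20)
```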

The GRAPH keyword can be used to assign the produced triples to a specific graph. Its value can be an IRI, IRI template, IRI template function, QName, or variable.

GRAPH <http://example.com/data/employees>
GRAPH <http://example.com/data/department/{?deptno}>

If different triples within the template should be placed in different graphs, then the GRAPH keyword can be used within the template, similar to the GRAPH keyword in the graph patterns of a SPARQL WHERE clause:

TEMPLATE {
    GRAPH <data/public> {
        empIRI(?empno)
            a biz:Employee;
            biz:employeeNumber ?empno;
            biz:name ?ename;
            biz:department deptIRI(?deptno).
    }
    GRAPH <data/private> {
        empIRI(?empno) biz:salary ?salary .
    }
}
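
One way to picture the result of the template above is as a list of quads, where each triple carries the IRI of its graph. The Python sketch below assumes a base IRI of http://example.com/ for resolving the relative graph IRIs, uses a hypothetical `emp_quads` helper and sample row, and represents a quad as a (graph, subject, predicate, object) tuple.

```python
from urllib.parse import urljoin

BASE = "http://example.com/"  # assumed base IRI for resolving relative IRIs

def emp_quads(row):
    """Return (graph, subject, predicate, object) quads for one emp record."""
    public = urljoin(BASE, "data/public")
    private = urljoin(BASE, "data/private")
    emp = f"http://example.com/employee/{row['empno']}"
    dept = f"http://example.com/department/{row['deptno']}"
    return [
        (public, emp, "rdf:type", "biz:Employee"),
        (public, emp, "biz:employeeNumber", row["empno"]),
        (public, emp, "biz:name", row["ename"]),
        (public, emp, "biz:department", dept),
        # The salary triple is kept apart in the private graph.
        (private, emp, "biz:salary", row["salary"]),
    ]

# Hypothetical sample record.
quads = emp_quads({"empno": 7369, "ename": "SMITH", "deptno": 20, "salary": 800})
private_quads = [q for q in quads if q[0].endswith("/data/private")]
```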


View maps

Instead of a TABLE, one can reference an arbitrary SQL query.

SQL {
    SELECT empno, ename FROM emp WHERE emp.status != -1
}
TEMPLATE {
    []
        a biz:Employee;
        biz:employeeNumber ?empno;
        biz:name ?ename .
}
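
The two phases of a view map (run the SQL query, then instantiate the template for each result row) can be sketched with Python's built-in sqlite3 module. Everything beyond the query and template above is an assumption made for illustration: the in-memory table definition, the sample rows, the `map_view` helper, and the `_:bN` labels used to stand in for a fresh blank node per row.

```python
import sqlite3

# Phase 0 (setup only): a hypothetical emp table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, status INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "Ann", 0), (2, "Bob", -1)])

def map_view(conn):
    """Run the view's SQL query, then instantiate the template per result row."""
    query = "SELECT empno, ename FROM emp WHERE emp.status != -1"
    triples = []
    for i, row in enumerate(conn.execute(query)):
        node = f"_:b{i}"  # one fresh blank node per result row
        triples.append((node, "rdf:type", "biz:Employee"))
        triples.append((node, "biz:employeeNumber", row["empno"]))
        triples.append((node, "biz:name", row["ename"]))
    return triples

triples = map_view(conn)
```

Only the row that passes the WHERE condition reaches the template, so the filtered-out employee produces no triples.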

In this case, one SHOULD declare UNIQUE KEY and FOREIGN KEY constraints on the query result. A UNIQUE KEY states that an attribute in the SQL query result does not contain any duplicate values; this is helpful for optimization. Here, the empno column from the SQL query contains no duplicates:

UNIQUE KEY ?empno

A FOREIGN KEY constraint states that all the values of an attribute in the SQL query result are either NULL or are values that exist in some other attribute. Normal SQL foreign keys are constraints on base tables; this is the same concept, but on view results. The following constraint states that the deptno attribute of the SQL query result contains values from the deptno column of the dept table:

FOREIGN KEY ?deptno REFERENCES TABLE dept(?deptno)

One can also reference attributes of other view maps. In order to be referenced, a view map needs an identifier:

ID :dept

Then it can be referenced like this:

FOREIGN KEY ?deptno REFERENCES ID :dept(?deptno)
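
The meaning of the two constraint kinds can be made concrete with a small Python sketch that tests them against query results held as lists of dictionaries. The proposal presents the constraints as declarations that help optimization and does not say a processor must verify them, so the checker functions, their names, and the sample rows below are purely hypothetical.

```python
def satisfies_unique_key(rows, column):
    """True if the column contains no duplicate values across the result rows."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))

def satisfies_foreign_key(rows, column, referenced_rows, referenced_column):
    """True if every value is NULL (None) or occurs in the referenced column."""
    allowed = {row[referenced_column] for row in referenced_rows}
    return all(row[column] is None or row[column] in allowed for row in rows)

# Hypothetical query results.
dept_rows = [{"deptno": 10}, {"deptno": 20}]
emp_rows = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": None},  # NULL is permitted by the constraint
    {"empno": 3, "deptno": 20},
]

unique_ok = satisfies_unique_key(emp_rows, "empno")
fk_ok = satisfies_foreign_key(emp_rows, "deptno", dept_rows, "deptno")
fk_broken = satisfies_foreign_key([{"empno": 4, "deptno": 99}], "deptno", dept_rows, "deptno")
```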

Simple example

This is a simple example based on the Department/Employee example used for the XML- and Turtle-based syntax proposals.

BASE <http://example.com/>
PREFIX biz: <http://biz.example.org/biz-ontology#>

IRIPATTERN deptIRI AS <dept/{1}>
IRIPATTERN empIRI AS <emp/{1}>

TABLE dept
TEMPLATE {
    deptIRI(?deptno) a biz:Department;
        biz:name ?dname;
        biz:location ?loc .
}

TABLE emp
TEMPLATE {
    empIRI(?empno) a biz:Employee;
        a <jobs/{?job}>;
        a <etypes/{?etype}>;
        biz:employeeNumber ?empno;
        biz:name ?ename;
        biz:department deptIRI(?deptno) .
}


Complex example

This is equivalent to the previous example, but uses view maps instead of table maps, and additionally places the mapped information into a number of different graphs.

BASE <http://example.com/>
PREFIX biz: <http://biz.example.org/biz-ontology#>

IRIPATTERN deptIRI AS <dept/{1}>
IRIPATTERN empIRI AS <emp/{1}>

ID :deptMap
SQL {
    SELECT deptno, dname, loc FROM dept
}
UNIQUE KEY ?deptno
GRAPH <data/departments>
TEMPLATE {
    deptIRI(?deptno) a biz:Department;
        biz:name ?dname;
        biz:location ?loc .
}

ID :empMap
SQL {
    SELECT empno, job, etype, ename, deptno FROM emp
}
UNIQUE KEY ?empno
FOREIGN KEY ?deptno REFERENCES ID :deptMap(?deptno)
GRAPH <data/{?job}/{?etype}>
TEMPLATE {
    empIRI(?empno) a biz:Employee;
        a <jobs/{?job}>;
        a <etypes/{?etype}>;
        biz:employeeNumber ?empno;
        biz:name ?ename;
        biz:department deptIRI(?deptno) .
}


Background: Class maps vs. view maps

This approach differs conceptually from the SQL-based approach as originally proposed by Souri.

The original approach is based on Class maps: A class map defines a SQL query that populates a class with instances. Loosely speaking, each row of the SQL result corresponds to an instance of the class, and each column to a property of the class.

The approach presented here is based on View maps. View maps are a generalized form of class maps. No assumption is made that the SQL query populates a single class, or that a single class is populated by one SQL query. Instead, the mapping includes a template that defines how each result row is to be turned into triples.

The class map approach is OWL-centric (the data model is one of classes, properties and individuals). The view map approach is RDF-centric (the data model is a graph; classes are just another kind of resource; instances are linked to their classes through rdf:type, which is just another property).