
Identifier re-use

From RDB2RDF

As per our charter "the mapping language MUST allow for a mechanism to create identifiers for database entities. The generation of identifiers should be designed to support the implementation of the linked data principles. Where possible, the language will encourage the reuse of public identifiers for long-lived entities such as persons, corporations and geo-locations."

This page discusses possible options in R2RML for disambiguating the identifiers of well-known entities. It thus tackles one particular aspect of integrating R2RML endpoints smoothly into the Linked Data Web.


Vocabulary alignment

Alignment of the vocabulary part of an R2RML mapping is ensured by the ability to re-use existing vocabulary terms wherever properties or classes are generated.

Direct representation of entity references in database columns

This option would allow one to define namespace prefixes globally, to store CURIEs directly in a database column, and to resolve those CURIEs to full URIs using the defined prefixes whenever a particular SPARQL result set or RDF serialization requires it.

A database table that stores rdf:type information directly in a particular column could then, for example, look as follows:

s_id  name   type
1     Bob    foaf:Person
2     Alice  foaf:Agent
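A minimal sketch of the resolution step described above, assuming a single global prefix map. The names `PREFIXES`, `expand_curie` and `compact_uri` are hypothetical; only the FOAF namespace URI is real. The reverse function matters for query translation: a full URI appearing in a SPARQL query has to be turned back into the CURIE stored in the column.

```python
# Hypothetical global prefix map; only the FOAF namespace URI is real.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.com/",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Resolve a CURIE such as 'foaf:Person' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in CURIE: {curie!r}")
    return prefixes[prefix] + local


def compact_uri(uri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Reverse direction: turn a full URI back into the CURIE stored in the column."""
    # Try the longest namespace first so that nested namespaces win.
    for prefix, ns in sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(ns):
            return f"{prefix}:{uri[len(ns):]}"
    raise ValueError(f"no prefix covers URI: {uri!r}")


rows = [(1, "Bob", "foaf:Person"), (2, "Alice", "foaf:Agent")]
for s_id, name, curie in rows:
    print(s_id, name, expand_curie(curie))
```

Because the prefix map is global, both directions are pure string operations and need no extra table.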

Entity alignment

In order to support the direct translation of SPARQL queries into SQL queries adhering to the underlying relational database schema, a mapping of database entities to Linked Data entities must be reversible. Different options include:

Computing URIs from a database column

To some extent, we can already do this via SQL expressions:

[] a rr:TriplesMap; rr:query "SELECT 'http://example.com/people/' || PEOPLE.ID, ...";

And via templates:

[] a rr:ObjectMap; rr:template "http://example.com/people/{ID}";
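To illustrate why templates suit the reversibility requirement, here is a small sketch (not R2RML's normative algorithm) that fills a `{COLUMN}` template from a row and, in the other direction, recovers the column values from a URI. The functions `expand_template` and `invert_template` are hypothetical names, and no escaping of unsafe characters is done.

```python
import re
from typing import Optional

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand_template(template: str, row: dict[str, object]) -> str:
    """Fill each {COLUMN} placeholder with the row's value (no escaping)."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def invert_template(template: str, uri: str) -> Optional[dict[str, str]]:
    """Recover column values from a URI generated by the template.

    Returns None when the URI cannot have come from this template.
    Assumes each column name appears at most once in the template.
    """
    pattern, pos = "", 0
    for m in PLACEHOLDER.finditer(template):
        pattern += re.escape(template[pos:m.start()])
        # A placeholder matches the shortest run up to the next literal part.
        pattern += f"(?P<{m.group(1)}>.+?)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    match = re.fullmatch(pattern, uri)
    return match.groupdict() if match else None


TEMPLATE = "http://example.com/people/{ID}"
print(expand_template(TEMPLATE, {"ID": 42}))  # http://example.com/people/42
print(invert_template(TEMPLATE, "http://example.com/people/42"))  # {'ID': '42'}
```

A query translator could use the inverted values to turn a URI constant in a SPARQL pattern into a condition on `PEOPLE.ID`, and skip the table entirely when `invert_template` returns `None`.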

A problem is that one might want to generate URIs from strings that contain characters not allowed in URIs, for example creating a URI from the string "John Smith", which contains a space.

In D2RQ

D2RQ further supports this by providing the “urlencode” and “urlify” functions in templates:

[] a d2rq:PropertyBridge; d2rq:uriPattern "http://example.com/people/@@PEOPLE.NAME|urlencode@@";

For a PEOPLE.NAME value of "John Smith", this produces "http://example.com/people/John%20Smith".

[] a d2rq:PropertyBridge; d2rq:uriPattern "http://example.com/people/@@PEOPLE.NAME|urlify@@";

This produces "http://example.com/people/John_Smith".

See the D2RQ manual for details.
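The two functions differ in an important way for the reversibility requirement stated above. The sketch below approximates them with Python's `urllib.parse.quote`; the exact character sets D2RQ encodes may differ, and `urlencode`/`urlify` here are local stand-ins, not D2RQ's implementation.

```python
from urllib.parse import quote, unquote


def urlencode(value: str) -> str:
    """Percent-encode everything except unreserved characters (space -> %20)."""
    return quote(value, safe="")


def urlify(value: str) -> str:
    """Approximation: spaces become underscores, the rest is percent-encoded."""
    return quote(value.replace(" ", "_"), safe="")


name = "John Smith"
print("http://example.com/people/" + urlencode(name))  # http://example.com/people/John%20Smith
print("http://example.com/people/" + urlify(name))     # http://example.com/people/John_Smith

# Percent-encoding can be undone exactly ...
print(unquote(urlencode(name)) == name)                 # True
# ... but urlify maps two different names to the same URI segment.
print(urlify("John Smith") == urlify("John_Smith"))     # True
```

So `urlencode` keeps the mapping invertible, while `urlify` trades invertibility for more readable URIs.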

User-defined entity mapping functions

This option would allow the mapping engineer to define a function r2rml_entity_mapping(entity), which translates between internal and external entity identifiers. It has to be ensured that the function is a bijection, i.e. that it has an inverse mapping every external identifier back to exactly one internal identifier: applying that inverse to r2rml_entity_mapping(entity) must yield entity again.
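A minimal sketch of such a function pair, assuming a hypothetical internal key scheme `emp-<n>` and external URIs under `http://example.com/people/`. The inverse's name and both prefixes are illustrative only. A round-trip check on sample data can catch mistakes, but it does not prove bijectivity over the whole domain.

```python
# Hypothetical identifier schemes, for illustration only.
INTERNAL_PREFIX = "emp-"
EXTERNAL_PREFIX = "http://example.com/people/"


def r2rml_entity_mapping(entity: str) -> str:
    """Forward direction: internal key such as 'emp-42' -> external URI."""
    if not entity.startswith(INTERNAL_PREFIX):
        raise ValueError(f"not an internal identifier: {entity!r}")
    return EXTERNAL_PREFIX + entity[len(INTERNAL_PREFIX):]


def r2rml_entity_mapping_inverse(uri: str) -> str:
    """Hypothetical inverse: external URI -> internal key."""
    if not uri.startswith(EXTERNAL_PREFIX):
        raise ValueError(f"not an external identifier: {uri!r}")
    return INTERNAL_PREFIX + uri[len(EXTERNAL_PREFIX):]


def round_trips(entities: list[str]) -> bool:
    """Spot check on samples: inverse(forward(e)) must give back e."""
    return all(r2rml_entity_mapping_inverse(r2rml_entity_mapping(e)) == e for e in entities)


print(r2rml_entity_mapping("emp-42"))  # http://example.com/people/42
print(round_trips(["emp-1", "emp-42"]))  # True
```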

Problem: It might be difficult to use this option when SPARQL queries contain filter clauses (e.g. sameTerm) that compare entity identifiers having syntactically different representations.

User-defined mapping functions in D2RQ

D2RQ supports this via its translation table mechanism. A translation table associates each of a set of database values with exactly one RDF term. It can be thought of as a two-column table, with the database value in one column and the RDF term in the other. Both columns have to be unique, so the mapping has to be a bijection. This allows mapping in both directions.
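The two-column view translates directly into a pair of dictionaries. This is a sketch of the concept only, not D2RQ's API; the class name `TranslationTable` and the example color URIs are hypothetical. Rejecting duplicates in either column is exactly what guarantees the bijection.

```python
class TranslationTable:
    """Two-column table: database value <-> RDF term, both columns unique."""

    def __init__(self, pairs: list[tuple[str, str]]):
        self._to_rdf: dict[str, str] = {}
        self._to_db: dict[str, str] = {}
        for db_value, rdf_term in pairs:
            # Uniqueness in each column makes the table a bijection.
            if db_value in self._to_rdf:
                raise ValueError(f"duplicate database value: {db_value!r}")
            if rdf_term in self._to_db:
                raise ValueError(f"duplicate RDF term: {rdf_term!r}")
            self._to_rdf[db_value] = rdf_term
            self._to_db[rdf_term] = db_value

    def to_rdf(self, db_value: str) -> str:
        return self._to_rdf[db_value]

    def to_db(self, rdf_term: str) -> str:
        return self._to_db[rdf_term]


colors = TranslationTable([
    ("R", "http://example.com/colors#red"),
    ("G", "http://example.com/colors#green"),
    ("B", "http://example.com/colors#blue"),
])
print(colors.to_rdf("G"))                               # http://example.com/colors#green
print(colors.to_db("http://example.com/colors#blue"))   # B
```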

A translation table can be defined via a user-defined Java class.

An example in Turtle follows. D2RQ's d2rq:PropertyBridge concept roughly corresponds to R2RML's rr:ObjectMap.

:color a d2rq:PropertyBridge;
    d2rq:property ex:color;
    d2rq:uriColumn "OBJECT.COLOR";
    d2rq:translateWith [ a d2rq:TranslationTable;
        d2rq:javaClass "com.example.myproject.ColorTranslator";
    ].

The Java class has to implement the Translator interface.

In practice, the requirement for the mapping to be bijective is often a snag. For example, if the 43 different color codes in the database are to be translated to only 9 different colors in the domain ontology, the mapping is many-to-one, and this mechanism in D2RQ cannot express it.
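The following sketch shows why a many-to-one mapping breaks two-way translation: inverting it yields a list of database values per RDF term rather than a single value. Turning that list into a SQL `IN` condition is one conceivable workaround, offered here only as an illustration; it is not something D2RQ's translation tables do. All codes, URIs and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical many-to-one mapping: several database codes share one color.
CODE_TO_COLOR = {
    "R1": "http://example.com/colors#red",
    "R2": "http://example.com/colors#red",
    "G1": "http://example.com/colors#green",
}


def invert_many_to_one(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Each RDF term maps back to a list of database values, not a single one."""
    inverse: dict[str, list[str]] = defaultdict(list)
    for db_value, rdf_term in mapping.items():
        inverse[rdf_term].append(db_value)
    return dict(inverse)


def sql_condition(column: str, rdf_term: str,
                  inverse: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Turn an RDF constant into a parameterized SQL IN condition."""
    values = inverse.get(rdf_term, [])
    if not values:
        return "1 = 0", []  # the term is never produced by this mapping
    placeholders = ", ".join("?" for _ in values)
    return f"{column} IN ({placeholders})", values


inverse = invert_many_to_one(CODE_TO_COLOR)
print(sql_condition("OBJECT.COLOR", "http://example.com/colors#red", inverse))
# ('OBJECT.COLOR IN (?, ?)', ['R1', 'R2'])
```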

Defining an entity mapping table within the database

This option is similar to providing an entity mapping function; however, the function is now explicitly represented as a database table.

This is already possible with R2RML: it simply requires an additional join to look up the value in the mapping table.
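The SQL behind such a mapping might look like the following, sketched with Python's built-in `sqlite3` module so it runs without a database server. The tables `object` and `color_map` and their columns are hypothetical. The same join serves both directions: producing URIs from codes, and restricting rows by a URI constant taken from a SPARQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY, color TEXT);
    -- Hypothetical mapping table; UNIQUE on both columns keeps it a bijection.
    CREATE TABLE color_map (db_value TEXT PRIMARY KEY, rdf_uri TEXT UNIQUE);
    INSERT INTO object VALUES (1, 'R'), (2, 'B');
    INSERT INTO color_map VALUES
        ('R', 'http://example.com/colors#red'),
        ('B', 'http://example.com/colors#blue');
""")

# Forward direction: the extra join replaces each code with its URI.
forward = conn.execute("""
    SELECT o.id, m.rdf_uri
    FROM object o JOIN color_map m ON o.color = m.db_value
    ORDER BY o.id
""").fetchall()
print(forward)

# Reverse direction: a URI constant becomes a condition on the joined table.
reverse = conn.execute("""
    SELECT o.id
    FROM object o JOIN color_map m ON o.color = m.db_value
    WHERE m.rdf_uri = ?
""", ("http://example.com/colors#blue",)).fetchall()
print(reverse)  # [(2,)]
```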

Defining an entity mapping as part of the mapping file

This is again similar to providing an entity mapping function, with the function explicitly represented as a table. Here, however, the table is not a database table but part of the mapping file or of some other external file.

Mapping tables in D2RQ

Again, D2RQ supports this via its translation table mechanism (see above).

An example in Turtle follows. D2RQ's d2rq:PropertyBridge concept roughly corresponds to R2RML's rr:ObjectMap.

:color a d2rq:PropertyBridge;
    d2rq:property ex:color;
    d2rq:uriColumn "OBJECT.COLOR";
    d2rq:translateWith [ a d2rq:TranslationTable;
        d2rq:translation [ d2rq:databaseValue "R"; d2rq:rdfValue :red; ];
        d2rq:translation [ d2rq:databaseValue "G"; d2rq:rdfValue :green; ];
        # ... more translations omitted ...
        d2rq:translation [ d2rq:databaseValue "B"; d2rq:rdfValue :blue; ];
    ].

One can also put the table into an external two-column CSV file:

:color a d2rq:PropertyBridge;
    d2rq:property ex:color;
    d2rq:uriColumn "OBJECT.COLOR";
    d2rq:translateWith [ a d2rq:TranslationTable;
        d2rq:href <http://example.com/stuff/colorcodes.csv>;
    ].
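For comparison, a loader for such a file could be as small as the sketch below. It assumes, as the text says, a plain two-column CSV with the database value first and the RDF value second, and additionally assumes there is no header row. It parses an inline string so it runs offline; in practice the content would be fetched from the d2rq:href URL. The function name `load_translation_csv` is hypothetical.

```python
import csv
import io


def load_translation_csv(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Parse 'databaseValue,rdfValue' rows into forward and reverse lookups."""
    to_rdf: dict[str, str] = {}
    to_db: dict[str, str] = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        db_value, rdf_value = (cell.strip() for cell in row)
        # Rejecting repeats in either column keeps the table a bijection.
        if db_value in to_rdf or rdf_value in to_db:
            raise ValueError(f"line {line_no}: mapping is not a bijection")
        to_rdf[db_value] = rdf_value
        to_db[rdf_value] = db_value
    return to_rdf, to_db


sample = """R,http://example.com/colors#red
G,http://example.com/colors#green
B,http://example.com/colors#blue
"""
to_rdf, to_db = load_translation_csv(sample)
print(to_rdf["G"])                              # http://example.com/colors#green
print(to_db["http://example.com/colors#blue"])  # B
```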

The same bijectivity restriction applies here: a many-to-one translation, such as mapping 43 database color codes onto only 9 colors in the domain ontology, cannot be expressed with a D2RQ translation table, whether it is defined inline or in a CSV file.