Linked Data Aspects of R2RML
This page is to discuss the Linked Data aspects of the RDB2RDF Mapping Language (R2RML) based on our charter.
Regarding Linked Data, the RDB2RDF WG charter says:
The mapping language MUST allow for a mechanism to create identifiers for database entities. The generation of identifiers should be designed to support the implementation of the linked data principles. Where possible, the language will encourage the reuse of public identifiers for long-lived entities such as persons, corporations and geo-locations.
In the following we detail what this means, taking into account Tim Berners-Lee's design note on Relational Databases on the Semantic Web.
Linked Data Principles
R2RML MUST support identifiers which conform to Linked Data principles:
- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- It is vital to have HTTP URIs to enable the retrieval of the data; see also http://www.w3.org/TR/cooluris/
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
- Include links to other URIs so that they can discover more things.
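As a minimal sketch of the first two principles, an HTTP identifier for a database entity could be built from the table name and the primary-key values of a row. The base URI `http://example.org/data/` and the function name `mint_uri` are assumptions made for illustration only; they are not part of R2RML.

```python
from urllib.parse import quote

# Hypothetical base URI; a real deployment would use a domain it controls,
# so that the minted URIs can actually be looked up over HTTP.
BASE = "http://example.org/data/"


def mint_uri(table, key_values, base=BASE):
    """Build an HTTP URI from a table name and the primary-key values of a row."""
    if not key_values:
        raise ValueError("at least one primary-key value is required")
    # Percent-encode every path segment so that keys containing '/' or
    # spaces cannot change the structure of the URI.
    segments = [quote(str(table), safe="")]
    segments += [quote(str(value), safe="") for value in key_values]
    return base + "/".join(segments)


print(mint_uri("employee", [42]))
print(mint_uri("order line", ["A/7", 3]))
```

Deriving the identifier only from the key keeps it stable when non-key columns of the row change.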
As we have described in an RDB2RDF XG note regarding reusable identifiers, it is beneficial to reuse URIs for well-known entities such as people, places, etc.
We have to address two issues concerning this aspect:
- What are well-known entities, really? We certainly cannot define them completely, but we need to define their characteristics. For example, one may consider a certain product to be well-known (e.g. the iPhone); in other cases, however, it might not be so obvious (e.g. because the entity is known only within a limited scope).
- How are the URIs for well-known entities generated? Example: use DBpedia URIs, use a centralised service such as Okkam, etc.
There seem to be three different technical solutions to that problem:
- Having a column in a table containing strings, which can be used to generate well-known URIs.
- Having a column in a table already containing complete URIs or abbreviated URIs (the first option can be considered a special case).
- Attaching a user-defined function to properties, which takes a database cell as a parameter, calls a web service, and returns a URI as a result.
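The three options above can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposal for R2RML syntax: the function names, the prefix table, and the in-memory `fake_service` (which stands in for a real web service) are all hypothetical, and the first function assumes that the string in the column matches a DBpedia resource name.

```python
from urllib.parse import quote

DBPEDIA = "http://dbpedia.org/resource/"
# Hypothetical prefix table used to expand abbreviated URIs.
PREFIXES = {"dbpedia": DBPEDIA}


def uri_from_label(label):
    """Option 1: generate a URI from a string column via a fixed template."""
    # Assumption: the label matches the DBpedia resource name once spaces
    # are replaced by underscores.
    return DBPEDIA + quote(label.strip().replace(" ", "_"), safe="")


def uri_from_cell(value, prefixes=PREFIXES):
    """Option 2: the cell already holds a complete or abbreviated URI."""
    if value.startswith(("http://", "https://")):
        return value
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    raise ValueError(f"cannot expand {value!r} to a URI")


def uri_from_function(value, lookup):
    """Option 3: delegate to a user-defined function (e.g. a web-service call)."""
    uri = lookup(value)
    if uri is None:
        raise LookupError(f"no URI found for {value!r}")
    return uri


# Stand-in for a remote lookup service; a real one would issue an HTTP request.
fake_service = {"Berlin, Germany": DBPEDIA + "Berlin"}.get

print(uri_from_label("New York City"))
print(uri_from_cell("dbpedia:Berlin"))
print(uri_from_function("Berlin, Germany", fake_service))
```

Only the third option can disambiguate values that do not map to a URI by a fixed pattern, at the cost of depending on an external service at mapping time.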
A detailed discussion can be found at the following page: Entity disambiguation
Provenance is an important issue: provenance information is critical for trust, quality assessment, data mash-ups and so on. It is not explicitly in scope for the RDB2RDF WG. Nevertheless, we should explore whether it is possible to allow R2RML to provide that kind of information.
Scenarios: Two scenarios appear most important inside the WG's scope.
- Provenance information on the database level: “This RDF was obtained from that database using the following mapping.”
- There might be provenance-related information stored in the DB itself, e.g., the author and publishing time of a document that is itself stored in the DB; one might want to translate that into some provenance-related vocabulary.
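The second scenario could look roughly like the sketch below, which turns an author column and a publishing-time column into N-Triples statements. The choice of the Dublin Core terms `dcterms:creator` and `dcterms:issued` is an assumption made for illustration; the column names `author` and `published`, the document URI, and the function names are hypothetical as well.

```python
from datetime import datetime

DCT = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def provenance_triples(doc_uri, row):
    """Translate provenance-related columns of one row into N-Triples lines."""
    triples = []
    author = row.get("author")
    if author:
        triples.append(
            f'<{doc_uri}> <{DCT}creator> "{escape_literal(author)}" .')
    published = row.get("published")
    if published is not None:
        triples.append(
            f'<{doc_uri}> <{DCT}issued> '
            f'"{published.isoformat()}"^^<{XSD_DATETIME}> .')
    return triples


row = {"author": 'J. "Jo" Doe', "published": datetime(2010, 3, 1, 12, 0, 0)}
for line in provenance_triples("http://example.org/doc/7", row):
    print(line)
```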
Linked Data: Linked data solves a part of the problem, because one knows the DNS domain where the data is published; this is important provenance information and information consumers can make trust decisions based on the domain.
Provenance in the SPARQL data model: SPARQL supports the Named Graphs data model: RDF information can be partitioned into several graphs, and meta-information can be attached to each of the graphs (see Section 8 of the SPARQL spec).
It should be possible to map a database to not just a single RDF graph but to a set of named graphs. This could be done in a static way (e.g., each view goes into a separate named graph), or in a dynamic way (e.g., mapped data is placed into a graph based on some column in the DB, e.g., an author column).
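The difference between the static and the dynamic variant can be sketched as follows. Rows are plain dictionaries here, and the base URI, the graph naming scheme, and all function names are assumptions made for illustration; the sketch only shows how the choice of graph could depend either on the view or on a column value.

```python
from collections import defaultdict
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base URI


def static_graph(view_name, row):
    """Static variant: every row of a view goes into the same named graph."""
    return f"{BASE}graph/view/{quote(view_name, safe='')}"


def dynamic_graph(view_name, row, column="author"):
    """Dynamic variant: the named graph depends on a column value of the row."""
    return f"{BASE}graph/{column}/{quote(str(row[column]), safe='')}"


def map_rows(view_name, rows, graph_for):
    """Map rows to (subject, predicate, value) triples grouped by named graph."""
    dataset = defaultdict(list)
    for row in rows:
        subject = f"{BASE}{view_name}/{row['id']}"
        graph = graph_for(view_name, row)
        for column, value in row.items():
            if column == "id":
                continue  # the key is already part of the subject URI
            predicate = f"{BASE}vocab/{view_name}#{column}"
            dataset[graph].append((subject, predicate, value))
    return dict(dataset)


rows = [
    {"id": 1, "title": "A", "author": "alice"},
    {"id": 2, "title": "B", "author": "bob"},
    {"id": 3, "title": "C", "author": "alice"},
]
print(sorted(map_rows("document", rows, static_graph)))
print(sorted(map_rows("document", rows, dynamic_graph)))
```

With the sample rows, the static variant yields a single graph for the view, while the dynamic variant yields one graph per distinct author, so meta-information can then be attached per author graph.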
Provenance XG: The WG should liaise with the W3C Provenance Incubator Group.