SQL in R2RML mappings

We have implemented the core of R2RML (as defined in the 2010/12/07 Editor's
Draft) in our relational-to-RDF product and we are working on converting
several mappings to use R2RML. In the course of this work we have
encountered the following issues/questions related to the SQL statements
that are embedded in R2RML mappings. A theme of this list is a desire to
avoid duplicating SQL in the mappings. It is possible that we are not
interpreting the specification properly, in which case we would appreciate
clarification. Also, I am happy to provide more details about any of these
issues and to talk about how we are working around them.

Thank you.
-David McNeil

====

1) Reuse a SQL query string in multiple TriplesMaps.

For example, consider the case where several tables must be joined in order
to identify the rows to map. If these rows are to be used to generate
multiple subjects, then the entire query must be copied.

A possible solution would be to allow a SQL query to be represented as a
resource (rather than as a string) and used by multiple TriplesMaps.
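
As a rough illustration of the idea (this is not R2RML syntax), the Python sketch below models a query as a named object that two maps reference instead of copying. The class names LogicalTable and TriplesMap, the table and column names, and the example.com templates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalTable:
    """A SQL query treated as a named, shareable resource (hypothetical)."""
    name: str
    sql: str


@dataclass(frozen=True)
class TriplesMap:
    """Hypothetical stand-in for an R2RML TriplesMap."""
    subject_template: str
    logical_table: LogicalTable  # a reference to the query, not a copied string


# The multi-table join is written once...
emp_dept = LogicalTable(
    name="emp_dept",
    sql=(
        "SELECT e.id AS emp_id, d.id AS dept_id "
        "FROM emp e JOIN dept d ON e.dept_id = d.id"
    ),
)

# ...and shared by two maps that generate different subjects from its rows.
employee_map = TriplesMap("http://example.com/emp/{emp_id}", emp_dept)
department_map = TriplesMap("http://example.com/dept/{dept_id}", emp_dept)
```

Both maps point at the same query object, so a change to the join is made in one place.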

2) Reuse a SQL query as a sub-query under multiple TriplesMaps.

For example, there may be a repeated sub-query that is used by several
queries. Rather than copy-and-paste the query multiple times, it is
desirable to write it once and join it into several other queries. In
particular, consider the case where the mapping process is not allowed to
make changes to the database (e.g. by adding a view to the database).

A possible solution is to allow a logical table (i.e. query) to be built in
the mapping from sub-logical-tables.
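
One way a mapping tool might do this, sketched in Python against an in-memory SQLite database, is to inline a named sub-query as a derived table in each outer query. The helper with_sub, the placeholder convention, and the emp/dept schema are assumptions made for the example; the tables are created here only as sample data, and the composition itself adds no view.

```python
import sqlite3

# Hypothetical sub-logical-table, written once in the mapping.
ACTIVE_EMP = "SELECT id, name, dept_id FROM emp WHERE active = 1"


def with_sub(outer_sql: str, **subs: str) -> str:
    """Inline each named sub-query into outer_sql as a derived table."""
    return outer_sql.format(**{k: f"({v}) AS {k}" for k, v in subs.items()})


# Two different outer queries reuse the same sub-query text.
names_sql = with_sub("SELECT name FROM {a} ORDER BY name", a=ACTIVE_EMP)
depts_sql = with_sub(
    "SELECT DISTINCT d.name FROM {a} JOIN dept d ON a.dept_id = d.id",
    a=ACTIVE_EMP,
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, active INTEGER);
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO emp VALUES (1, 'Ann', 10, 1), (2, 'Bob', 20, 0), (3, 'Cy', 10, 1);
INSERT INTO dept VALUES (10, 'Eng'), (20, 'Ops');
""")

active_names = conn.execute(names_sql).fetchall()
active_depts = conn.execute(depts_sql).fetchall()
```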

3) Allow a predicate/object map to join in another table.

RefPredicateObjectMaps allow TriplesMaps to be "joined" in very specific
ways. But they do not allow the expressions used to compute the predicate or
object to reference both queries that are joined. Consider a case where the
type of the relationship (and therefore the specific predicate) is
determined by columns of the object query. For example, a predicate of either
"mother" or "father" might be needed depending on the gender of the object.
This type of mapping is not supported by RefPredicateObjectMaps because the
predicate cannot refer to any of the columns used to produce the object.
Similarly the object produced cannot use columns from the subject's query.
This can be worked around by simply creating a new SQL query and basing the
subject, predicate, and object on it. However, this requires undesirable
copying of the SQL.

This could be addressed by allowing a predicate object map to explicitly
define an additional query as a logical query.
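
To make the mother/father case concrete, here is a small Python sketch of the workaround described above: a single query joins the subject and object rows so that the predicate can be computed from the object's gender column. The person and parent_of tables, the 'F'/'M' encoding, and the ex: prefix are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE parent_of (child_id INTEGER, parent_id INTEGER);
INSERT INTO person VALUES (1, 'F'), (2, 'M'), (3, 'F');
INSERT INTO parent_of VALUES (3, 1), (3, 2);
""")

# The predicate depends on a column of the joined (object) row,
# which is what a RefPredicateObjectMap cannot express.
rows = conn.execute("""
SELECT c.id,
       CASE p.gender WHEN 'F' THEN 'mother' ELSE 'father' END,
       p.id
FROM parent_of po
JOIN person c ON c.id = po.child_id
JOIN person p ON p.id = po.parent_id
ORDER BY p.id
""").fetchall()

triples = [
    (f"ex:person/{s}", f"ex:{pred}", f"ex:person/{o}") for s, pred, o in rows
]
```

The cost is that the join against person is repeated in every mapping that needs a similar computed predicate.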

4) Allow two queries to be joined via either an inner or an outer join.

I don't see any means to specify inner/outer joins as part of a
joinCondition. This would be useful in cases where null values in the joined
table are used to generate triples in the output.
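
The difference matters as soon as a row has no match. The Python sketch below runs the same join both ways against an in-memory SQLite database; the emp/dept schema is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, dept_id INTEGER);
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO emp VALUES (1, 10), (2, NULL);
INSERT INTO dept VALUES (10, 'Eng');
""")

# Same join condition, two join types.
QUERY = (
    "SELECT e.id, d.name FROM emp e {join} dept d "
    "ON e.dept_id = d.id ORDER BY e.id"
)

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
outer_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()

# Rows with a NULL on the joined side exist only in the outer join result.
unmatched = [emp_id for emp_id, dept_name in outer_rows if dept_name is None]
```

With an inner join, employee 2 never reaches the mapping, so no triple can be generated from the fact that it has no department.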

5) Support for database vendor-specific SQL statements.

For example, this is needed if a mapping needs to use an Oracle-specific
statement that cannot be parsed as standard SQL.

A possible solution is to allow SQL statements to be flagged as "opaque".
This would indicate that the statements are not to be parsed by the mapping
tool, but simply passed down to the underlying database.
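
A minimal Python sketch of how a mapping tool could honor such a flag follows. The SqlStatement class, the opaque field, and toy_parse (a stand-in for a real standard-SQL parser that merely rejects one Oracle construct) are all hypothetical; the sample statement uses Oracle's hierarchical query syntax.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SqlStatement:
    text: str
    opaque: bool = False  # hypothetical flag: do not parse, pass through


def toy_parse(sql: str) -> str:
    """Stand-in for a standard-SQL parser; rejects one Oracle-only construct."""
    if "CONNECT BY" in sql.upper():
        raise ValueError("not standard SQL")
    return sql.strip()


def prepare(stmt: SqlStatement, parse: Callable[[str], str] = toy_parse) -> str:
    """Return the SQL text to send to the underlying database."""
    return stmt.text if stmt.opaque else parse(stmt.text)


ORACLE_SQL = (
    "SELECT id FROM emp START WITH mgr_id IS NULL "
    "CONNECT BY PRIOR id = mgr_id"
)

# Flagged as opaque, the statement bypasses the parser unchanged.
passed_through = prepare(SqlStatement(ORACLE_SQL, opaque=True))
```

Without the flag, the same statement would be rejected by the parser before it ever reached the database.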

Received on Tuesday, 25 January 2011 14:50:20 UTC