Last Call Changes to R2RML

From RDB2RDF
Revision as of 13:21, 31 January 2012 by Rcygania2

This page summarizes changes that were made to the R2RML spec after Last Call.

Diff

A diff of all changes since Last Call is available.

Significant changes to normative sections

Here we list significant changes to normative sections. Changes to normative sections potentially affect conformance.

Characters in base IRIs

Section: Definition: base IRI

At LC, the spec said that a base IRI SHOULD end in a slash (“/”) character. Now it additionally says that it SHOULD NOT contain question mark (“?”) or hash (“#”) characters. This is a normative change, but it affects neither the behaviour of R2RML processors nor R2RML documents; it is merely a new constraint on the parameters with which an R2RML processor is invoked. Also, it is only a “SHOULD”.
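Because the constraint only applies to the base IRI a processor is invoked with, a processor could check it once at startup and warn rather than fail, since it is only a SHOULD. A minimal sketch in Python; the function name `base_iri_warnings` is hypothetical and not part of any R2RML API.

```python
# Hypothetical helper, not part of any R2RML API.
def base_iri_warnings(base_iri: str) -> list[str]:
    """Report violations of the SHOULD-level rules for an R2RML base IRI."""
    warnings = []
    if not base_iri.endswith("/"):
        warnings.append("base IRI should end in '/'")
    if "?" in base_iri:
        warnings.append("base IRI should not contain '?'")
    if "#" in base_iri:
        warnings.append("base IRI should not contain '#'")
    return warnings


# A well-formed base IRI yields no warnings.
print(base_iri_warnings("http://example.com/db/"))   # []
print(base_iri_warnings("http://example.com/db?x=1"))
```

Returning a list instead of raising keeps the check advisory, which matches the SHOULD strength of the rule.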

New classes for R2RML views and SQL Base Tables or Views

Section: Definition: R2RML classes

Two classes, rr:R2RMLView and rr:BaseTableOrView, have been added to represent mapping components that previously didn't have a corresponding class. This is a change to normative text, but the spec notes: “Explicit typing of the resources in a mapping graph with R2RML classes is optional. Their presence in a graph has no effect on the behaviour of an R2RML processor.”

Clarification of the behaviour of R2RML classes

Section: Definition: R2RML classes

At LC, this section stated: “Using these classes is optional in a mapping graph. The applicable class of a resource can always be inferred from its properties.” This has been clarified to read: “Explicit typing of the resources in a mapping graph with R2RML classes is optional. Their presence in a graph has no effect on the behaviour of an R2RML processor. The mapping component represented by any given resource in a mapping graph is defined by the presence or absence of certain properties, as defined throughout this specification.” This is only a clarification, but the previous text might be misread as saying that R2RML classes, even though optional, have some effect when present. Correcting such a misunderstanding in an implementation should not be hard.
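To illustrate what “defined by the presence or absence of certain properties” means for an implementation, here is a sketch that classifies resources without ever looking at rdf:type. It assumes a hypothetical representation of each resource as a dict of prefixed property names; the function names are likewise hypothetical.

```python
# Hypothetical helpers; a resource is modelled as a dict of "rr:..." properties.
def logical_table_kind(props: dict) -> str:
    """Classify a logical table by its properties; rdf:type is never consulted."""
    has_query = "rr:sqlQuery" in props
    has_table = "rr:tableName" in props
    if has_query == has_table:
        raise ValueError("a logical table needs exactly one of rr:sqlQuery, rr:tableName")
    return "R2RML view" if has_query else "SQL base table or view"


def term_map_kind(props: dict) -> str:
    """Classify a term map as constant-, column- or template-valued."""
    kinds = {"rr:constant": "constant-valued",
             "rr:column": "column-valued",
             "rr:template": "template-valued"}
    present = [kind for key, kind in kinds.items() if key in props]
    if len(present) != 1:
        raise ValueError("a term map needs exactly one of rr:constant, rr:column, rr:template")
    return present[0]


# Adding an explicit rr:R2RMLView type does not change the outcome.
untyped = {"rr:sqlQuery": "SELECT ID FROM EMP"}
typed = {"rdf:type": "rr:R2RMLView", "rr:sqlQuery": "SELECT ID FROM EMP"}
print(logical_table_kind(untyped), "/", logical_table_kind(typed))
```

An implementation that branches on rdf:type instead of on these properties would exhibit exactly the misunderstanding the clarified text rules out.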

Stricter definition of “quoted and escaped data value”

Section: Definition: quoted and escaped data value

This term was underspecified. The definition has been augmented with a reference to the SQL grammar. (The previous definition was weak from a formal point of view, but left little room for misunderstanding.)
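For character strings, the SQL grammar's `<character string literal>` production amounts to enclosing the value in single quotes and doubling any embedded single quote. A minimal sketch of that case only, assuming a hypothetical helper name `quoted_and_escaped`; other SQL types have their own literal syntax (e.g. `DATE '2011-11-11'`) that this sketch does not cover.

```python
# Hypothetical helper covering only the character-string case.
def quoted_and_escaped(value: str) -> str:
    """Render a string as an SQL character string literal.

    Wraps the value in single quotes and doubles every embedded single quote.
    """
    return "'" + value.replace("'", "''") + "'"


print(quoted_and_escaped("O'Brien"))   # 'O''Brien'
```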

Allowing additional datasets beside the output dataset

Section: Definition: output dataset

After stating that the output dataset MUST NOT contain any additional triples or named graphs beyond those defined in the spec, a new sentence is added: “However, R2RML processors MAY provide access to datasets that contain additional triples or graphs beyond those in the output dataset, such as inferred triples or provenance information.” This is not really a change: such additional datasets are outside the scope of the spec, so they were never constrained in the first place.

Always apply to-XSD transformations before booleans, binaries and datetimes are used

Section: Various sections that affect the generation of strings from logical table columns

In the LC design, when non-string columns were used in templates or to construct IRIs or blank node identifiers, the column value was simply converted to a string and used as is. For example, a boolean used in a template might yield TRUE. In the new design, the transformations in the natural literal table are applied first. The transformation for boolean is “lowercase”, so the value in the template instantiation would be true. This affects BOOLEAN, BINARY and friends, and the datetime types. The motivation is mainly that it makes the use of dates and timestamps in IRI templates work better: we now get, for example, http://example.com/date/2011-11-11 instead of http://example.com/date/DATE%20'2011-11-11'. This is a rather obscure change that is not expected to have a major impact in practice. We have to make sure that it is covered by test cases.
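The new ordering can be sketched as: transform first, then percent-encode and substitute into the template. The sketch below assumes SQL values arrive as Python `bool`, `bytes`, `datetime` and similar objects; the function names are hypothetical. It simplifies R2RML's IRI-safe encoding by using `urllib.parse.quote`, which also percent-encodes non-ASCII characters that R2RML would keep, and it ignores escaped braces in templates.

```python
import datetime
import re
from urllib.parse import quote


# Hypothetical helper: to-XSD transformation applied before string use.
def natural_lexical_form(value) -> str:
    if isinstance(value, bool):                   # check before int: bool subclasses int
        return "true" if value else "false"       # TRUE -> true
    if isinstance(value, (bytes, bytearray)):
        return value.hex().upper()                # upper-case hex, as in xsd:hexBinary
    if isinstance(value, (datetime.date, datetime.time)):
        return value.isoformat()                  # 2011-11-11, 2011-11-11T10:30:00
    return str(value)


# Hypothetical helper: fill {column} placeholders of an IRI template.
def instantiate_iri_template(template: str, row: dict) -> str:
    def substitute(match):
        text = natural_lexical_form(row[match.group(1)])
        return quote(text, safe="")               # percent-encode all but unreserved chars
    return re.sub(r"\{([^{}]+)\}", substitute, template)


print(instantiate_iri_template("http://example.com/date/{d}",
                               {"d": datetime.date(2011, 11, 11)}))
```

With the LC behaviour, the SQL literal form of the date would have been encoded into the IRI instead of the XSD form.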

Changes to canonicalization behaviour for non-string columns

Section: 10.2 Natural Mapping of SQL Values and 10.4 Non-String Columns in String Contexts

By defining the conversion of SQL data values to RDF literals and strings in terms of SQL CAST(xxx AS VARCHAR) expressions, R2RML was effectively mandating the production of canonical string forms of all non-string SQL datatypes, using the definition of CAST(…) in the SQL spec as a canonicalization function. This design had issues and was replaced with a more nuanced one. When literals are produced, canonicalization is no longer necessary: any XSD lexical form of the target XSD type that denotes the right value can be used. When strings are produced (as part of IRIs, blank node identifiers, etc.), canonicalization SHOULD be done, but according to the XSD canonical form.

As part of this, references to XSD 1.0 were changed to XSD 1.1, which has slightly different (and better) canonicalization rules.

The change for literals relaxes a constraint to ease implementation; the previous conforming behaviour is still conforming. The change for strings replaces a MUST constraint with a different SHOULD constraint; the previous conforming behaviour does not meet the SHOULD in some cases, but is still conforming.
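As an illustration of the literal/string split and of one XSD 1.0 vs 1.1 difference, consider xsd:decimal: a literal may keep a lexical form such as "2.50", while a string used in an IRI should use the canonical form. As we read the two XSD versions, XSD 1.0 requires a decimal point in the canonical form ("2.0"), whereas XSD 1.1 writes integer-valued decimals without one ("2"). A sketch under those assumptions; the function names are hypothetical.

```python
from decimal import Decimal


# Hypothetical helper: XSD 1.1 canonical form of an xsd:decimal value.
def xsd11_decimal_canonical(value: Decimal) -> str:
    if value == 0:
        return "0"                               # also covers -0 and 0.000
    text = format(value, "f")                    # plain notation, never an exponent
    if "." in text:
        text = text.rstrip("0").rstrip(".")      # drop trailing zeros and a bare point
    return text


# Hypothetical helper: XSD 1.0 canonical form, where the point is mandatory.
def xsd10_decimal_canonical(value: Decimal) -> str:
    text = xsd11_decimal_canonical(value)
    return text if "." in text else text + ".0"


for lexical in ["2.50", "2.00", "-0.50", "1E+2"]:
    d = Decimal(lexical)
    print(lexical, xsd11_decimal_canonical(d), xsd10_decimal_canonical(d))
```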

Changed default rr:termType rules

Section: 7.4 IRIs, Literal, Blank Nodes (rr:termType)

At LC, object maps using rr:template always defaulted to producing IRIs if no rr:termType was specified. This was changed so that they produce literals if rr:language or rr:datatype is present. Previously, the presence of these properties without rr:termType was an error. Implementations have to change their default and validation rules.
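The new default rule can be sketched as follows. Besides the change described here, the sketch includes the R2RML rule (not stated on this page) that column-valued object maps default to literals. Term maps are modelled as hypothetical dicts of prefixed properties, the function name is hypothetical, and constant-valued term maps are left out because their term type follows from the constant.

```python
# Hypothetical helper; term maps are modelled as dicts of "rr:..." properties.
def effective_term_type(position: str, props: dict) -> str:
    """Term type of a column- or template-valued term map.

    position is "subject", "predicate", "object" or "graph".
    """
    if "rr:termType" in props:
        return props["rr:termType"]
    if position == "object" and any(
        key in props for key in ("rr:column", "rr:language", "rr:datatype")
    ):
        return "rr:Literal"
    return "rr:IRI"


# A language-tagged template object map now defaults to a literal.
print(effective_term_type("object", {"rr:template": "{FIRST} {LAST}",
                                     "rr:language": "en"}))
```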

Removed syntactic sugar for simple logical tables

Section: 6 Mapping Logical Tables to RDF with Triples Maps

We used to allow using rr:tableName (and, sort of unintentionally, rr:sqlQuery and rr:sqlVersion) directly on a triples map without an intermediate logical table resource. This shortcut, a minor syntactic convenience, was removed. Implementations may have to remove support for it, and mapping files may have to be adapted; both should be easy.
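Adapting old mapping files is mechanical: move the logical-table properties onto a new node and link it with rr:logicalTable. A sketch that assumes a hypothetical representation of the mapping graph as a set of (subject, predicate, object) string tuples with prefixed names; the function name and the generated blank node labels are hypothetical, and the labels are assumed not to clash with existing ones.

```python
# Hypothetical helper; triples are (subject, predicate, object) string tuples.
LOGICAL_TABLE_PROPS = {"rr:tableName", "rr:sqlQuery", "rr:sqlVersion"}


def migrate_logical_table_shortcut(triples: set) -> set:
    """Move logical-table properties from triples maps onto an rr:logicalTable node."""
    result = set(triples)
    triples_maps = sorted({s for s, p, _ in triples
                           if p in ("rr:subjectMap", "rr:subject")})
    for n, tm in enumerate(triples_maps):
        moved = {(s, p, o) for s, p, o in triples
                 if s == tm and p in LOGICAL_TABLE_PROPS}
        if not moved:
            continue                              # this triples map uses no shortcut
        table_node = f"_:logicalTable{n}"         # assumes the label is unused
        result -= moved
        result |= {(table_node, p, o) for _, p, o in moved}
        result.add((tm, "rr:logicalTable", table_node))
    return result


old = {("<#TriplesMap1>", "rr:tableName", '"EMP"'),
       ("<#TriplesMap1>", "rr:subjectMap", "_:sm")}
for triple in sorted(migrate_logical_table_shortcut(old)):
    print(triple)
```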

Allow multiple predicate maps and object maps on a single predicate-object map

Section: 6.3 Creating Properties and Values with Predicate-Object Maps

At LC, the spec said that a predicate-object map must have exactly one predicate map, and exactly one object map or referencing object map. Now it allows one or more of either kind. This doesn't invalidate any R2RML mappings, but removes a syntactic constraint and thereby might cause some additional cost for implementers.
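With several predicate maps and object maps, a predicate-object map yields one triple per (predicate, object) combination for each row, which is where the additional implementation cost comes from. A sketch, assuming the maps have already been evaluated to terms for the current row; the function name is hypothetical.

```python
from itertools import product


# Hypothetical helper operating on already-generated RDF terms.
def triples_for_predicate_object_map(subject, predicates, objects):
    """Yield a triple for every (predicate, object) combination."""
    for predicate, obj in product(predicates, objects):
        yield (subject, predicate, obj)


triples = list(triples_for_predicate_object_map(
    "<http://example.com/emp/7369>",
    ["ex:name", "foaf:name"],
    ['"SMITH"'],
))
print(len(triples))   # 2
```

With exactly one predicate and one object, the loop degenerates to the single triple of the LC design.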

Significant changes to non-normative sections

Here we list significant changes to non-normative sections, such as notes and examples. As non-normative sections don't influence conformance, the impact of these changes is very low. Nevertheless, these changes might alter the understanding that a reader gets from reading the spec.

Behaviour of base IRIs

Section: Definition: base IRI

The note about resolution of relative IRIs has been extended significantly to describe exactly the situations when R2RML's base IRI resolution behaves differently from RFC 3986 base IRI resolution.
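The difference is easiest to see side by side. As we read R2RML 1.0, a relative IRI is resolved by simply prepending the base IRI, whereas Python's `urllib.parse.urljoin` follows RFC 3986 (merging paths and removing dot segments). A sketch; `r2rml_resolve` is a hypothetical name.

```python
from urllib.parse import urljoin


# Hypothetical helper: R2RML-style resolution by plain concatenation.
def r2rml_resolve(base_iri: str, relative_iri: str) -> str:
    return base_iri + relative_iri


for base, rel in [("http://example.com/db/", "emp/7"),
                  ("http://example.com/db/", "../x"),
                  ("http://example.com/db", "emp/7")]:
    print(r2rml_resolve(base, rel), "vs RFC 3986:", urljoin(base, rel))
```

The first pair agrees; the second differs because RFC 3986 removes the "../" segment; the third differs because the base IRI lacks a trailing slash, which is one reason the spec recommends ending base IRIs in “/”.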

Note on SQL identifier syntax

Section: Note on SQL identifiers

The second half of this note contained outdated explanations and examples that disagreed with the rest of the note, with all other examples, and with the normative text. This has been fixed.
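For reference, the two kinds of SQL identifier behave differently: a delimited identifier is written in double quotes with embedded double quotes doubled, and is case-sensitive, while a regular (undelimited) identifier is case-folded, to upper case in the SQL standard. Some databases deviate; PostgreSQL, for example, folds to lower case. A sketch with hypothetical function names.

```python
# Hypothetical helpers illustrating SQL-standard identifier rules.
def delimited_identifier(name: str) -> str:
    """Write a name as a <delimited identifier>: double quotes, embedded ones doubled."""
    return '"' + name.replace('"', '""') + '"'


def standard_case_fold(regular_identifier: str) -> str:
    """SQL-standard folding of an undelimited identifier (to upper case)."""
    return regular_identifier.upper()


# empno, Empno and EMPNO name the same column; "Empno" (delimited) does not.
print(standard_case_fold("empno"), delimited_identifier("Empno"))
```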

Example for producing literals using rr:template

Section: 7.3 From a template

The section's last example is supposed to produce RDF literals, but lacked the required rr:termType specification. The default, if none is specified, is to produce IRIs. The example has been fixed by adding an rr:termType.

Example using rr:refObjectMap fixed

Section: 8 Foreign Key Relationships among Logical Tables

The first example in the section used a property rr:refObjectMap that no longer exists. This has been fixed.

New example for a referencing object map without rr:joinCondition

Section: 8 Foreign Key Relationships among Logical Tables

An extended second example was added to the end of the section. It is new content and therefore has not yet received the same degree of peer review as the rest of the text.

New example for implementing a translation table using an R2RML view

Section: 2.7 Example: Translating database type codes to IRIs

Since R2RML-native translation tables are not a feature of R2RML 1.0, an example was added that shows how to implement them using R2RML views and SQL's CASE expression.
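Such a view query can also be generated from a code-to-IRI dictionary. The sketch below is an assumption-laden illustration: the table `EMP`, column `JOB`, alias `TYPE_IRI` and the role IRIs are hypothetical, and table, column and alias are inserted verbatim, so they must be trusted identifiers. Codes missing from the dictionary fall through the CASE to NULL, for which R2RML generates no term.

```python
# Hypothetical helper: quote a value as an SQL character string literal.
def sql_string(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"


# Hypothetical helper: build an rr:sqlQuery translating codes to IRIs.
def translation_view_query(table: str, code_column: str,
                           codes_to_iris: dict[str, str],
                           alias: str = "TYPE_IRI") -> str:
    whens = "\n".join(
        f"    WHEN {code_column} = {sql_string(code)} THEN {sql_string(iri)}"
        for code, iri in codes_to_iris.items()
    )
    return (f"SELECT {table}.*,\n  CASE\n{whens}\n  END AS {alias}\n"
            f"FROM {table}")


print(translation_view_query("EMP", "JOB", {
    "CLERK": "http://example.com/roles/general-office",
    "NIGHTGUARD": "http://example.com/roles/security",
}))
```

A triples map over this view could then use the alias column with rr:termType rr:IRI.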

Example using rr:graph fixed

Section: 9 Assigning Triples to Named Graphs

The first example in the section used rr:graph where it should have used rr:constant. This was fixed.
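Background for this fix: in R2RML 1.0, rr:subject, rr:predicate, rr:object and rr:graph are constant shortcut properties, so `x rr:graph G` stands for `x rr:graphMap [ rr:constant G ]`. Inside a graph map itself, the constant therefore has to be given with rr:constant. A sketch of the expansion over a hypothetical set of (subject, predicate, object) string tuples; the function name and generated blank node labels are hypothetical.

```python
# Hypothetical helper; triples are (subject, predicate, object) string tuples.
SHORTCUTS = {
    "rr:subject": "rr:subjectMap",
    "rr:predicate": "rr:predicateMap",
    "rr:object": "rr:objectMap",
    "rr:graph": "rr:graphMap",
}


def expand_constant_shortcuts(triples: set) -> set:
    """Rewrite (x rr:graph G) as (x rr:graphMap _:b) plus (_:b rr:constant G), etc."""
    result = set()
    for n, (s, p, o) in enumerate(sorted(triples)):
        if p in SHORTCUTS:
            node = f"_:const{n}"                  # assumes the label is unused
            result.add((s, SHORTCUTS[p], node))
            result.add((node, "rr:constant", o))
        else:
            result.add((s, p, o))
    return result


print(sorted(expand_constant_shortcuts({("_:pom", "rr:graph", "ex:G")})))
```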

Bugfixes in canonical SQL literal patterns

Section: 10.2 Canonical SQL Literals

The table in the Note in this section contained four errors: (1) Pattern for floats was missing the exponent; (2) Pattern for DATE was missing the initial “DATE” string; (3) same for TIME; (4) same for TIMESTAMP.

Update: This section is now gone as part of the Section 10 editorial rewrite.

Bugfixes in canonical SQL literal examples

Section: 10.2 Canonical SQL Literals

(1) DATE, TIME and TIMESTAMP examples lacked single quotes around the value; (2) there was a time example with 00.000 seconds, which is not canonical because trailing fractional zeroes are removed in the canonical form; the example now uses 34.885 seconds.

Update: This section is now gone as part of the Section 10 editorial rewrite.

Bugfixes in Appendix B.2 property table

Section: B.2 Properties

The cardinalities for rr:subject, rr:subjectMap, rr:predicate, rr:predicateMap, rr:object, rr:objectMap, rr:graph and rr:graphMap were incorrectly stated as 1; they were corrected to 0…1 in all cases.

Major editorial changes

Here we list changes that affect large portions of text, but do not alter the conformance criteria.

Datatype mapping re-write

Section: 10 Datatype Conversions, with smaller changes to 7.6 Typed Literals and 11.2 The Generated RDF Terms of a Term Map

The whole datatype mapping mechanism was presented in a convoluted and poorly factored way. Sections 7.6, 10 and 11.2 were interwoven in an overly complicated way. This made it hard to understand the details of the mapping, and discouraged its re-use for the Direct Mapping spec. The re-write is hopefully clearer in its presentation. It uses different terminology: natural RDF literal, datatype-override RDF literal, etc. It introduces significant amounts of new non-normative material (discussion of portability issues arising from the use of non-string columns in string contexts; a summary of the lexical spaces of XSD types). The non-editorial changes resulting from this are discussed separately above.

Section 11 cleanup

Section: 11 The Output Dataset

Various cleanups regarding the cardinalities of terms and where “loops” are done. This section is fairly intricate and would benefit from another round of careful review.

Minor editorial changes

Many minor editorial changes were made, including typo and grammar fixes and clarifications of wording. These are not listed here individually, but can be studied by diffing the latest Editor's Draft in CVS against r1.160.