comment on SPARQL 1.1 CSV result format: datatyped literals

Dear WG,

I am currently looking into Sesame's conformance testing framework wrt
the tests at http://www.w3.org/2009/sparql/docs/tests/data-sparql11/ and
am hitting a problem with the tests involving CSV result formats. This
problem has to do with the CSV format's lack of support for recording
datatypes.

I am aware that the introduction describes this lack of support as being
"by design". However, I want to urge the WG to reconsider this design
choice.

The problem is demonstrated by test case csv-tsv-res/csv03. In this test
case, the input data contains several typed literals. To pick one
example: "a7"^^xsd:hexBinary. The test query is expected to return a
result row containing this literal. However, the CSV result format
records this expected result row as:

 http://example.org/s7,http://example.org/p7,a7

As you can see, there is nothing in the format that tells the parser
what datatype 'a7' should have. As a consequence, the test currently
fails in Sesame, because the framework cannot reconcile this result with
the (typed) result from the query engine.
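
To make the loss concrete, here is a minimal Python sketch. It is not
Sesame's code; the helper names csv_cell and write_row are hypothetical,
and csv_cell only mimics the "lexical form only" behaviour described
above. It shows that the typed literal and a plain literal with the same
lexical form end up as the same CSV row:

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"


def csv_cell(lexical, datatype=None):
    """Mimic the CSV result format: only the lexical form is written."""
    # The datatype argument is deliberately ignored; that is the point.
    return lexical


def write_row(cells):
    """Serialize one result row as a CSV line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\r\n").writerow(cells)
    return buf.getvalue()


subject = "http://example.org/s7"
predicate = "http://example.org/p7"

# "a7"^^xsd:hexBinary versus the plain literal "a7"
typed_row = write_row([subject, predicate, csv_cell("a7", XSD + "hexBinary")])
plain_row = write_row([subject, predicate, csv_cell("a7")])

# Both rows are identical, so a parser cannot tell the two apart.
print(typed_row == plain_row)
```

A comparison against the typed result from the query engine has nothing
to work with once the row has been written this way.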

IMO, this is a major flaw in the specification of the CSV format. Not
just because Sesame's testing framework happens to fail on this, but
more generally because it is inappropriate for CSV to throw away a
significant part of the information it is supposed to record. I simply
don't think that it is up to a recording format to make that kind of
decision.

I appreciate that the CSV format should be simple and that it is
primarily aimed at importing results into spreadsheet tools (and not at
driving testing frameworks :)). However, like any data recording format,
it should at a minimum be expressive enough to make round-trips possible
for any valid input.

Like TSV, CSV should have provisions for recording datatyped literals in
a way that a parser can reconstruct the exact value being recorded.

If some client application requires only plain literals, then this can
be expressed in the SPARQL query that produces the result (using STR()
in the SELECT). That is the appropriate delegation of responsibility IMO.

I suggest a simple modification to CSV to fix this: amend the section on
serializing RDF terms
(http://www.w3.org/2009/sparql/docs/csv-tsv-results/results-csv-tsv.html#csv-terms)
to state that terms (or at the very least, datatyped literals) are
recorded in Turtle format. This would also be more in line with how TSV
behaves.
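
As a rough illustration of what I have in mind, the Python sketch below
writes a literal into a CSV cell in Turtle syntax and reads it back. The
function names (to_turtle, from_turtle, round_trip) are hypothetical, and
the sketch makes simplifying assumptions: it handles only literals, uses
full datatype IRIs rather than prefixed names, escapes only backslashes
and double quotes, and ignores language tags.

```python
import csv
import io
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# A quoted lexical form, optionally followed by ^^<datatype IRI>.
_LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:\^\^<([^>]*)>)?$')


def to_turtle(lexical, datatype=None):
    """Write a literal in (simplified) Turtle syntax."""
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    if datatype is None:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{datatype}>'


def from_turtle(cell):
    """Parse a cell written by to_turtle back into (lexical, datatype)."""
    match = _LITERAL.match(cell)
    if match is None:
        raise ValueError(f"not a Turtle literal: {cell!r}")
    lexical = re.sub(r"\\(.)", r"\1", match.group(1))
    return lexical, match.group(2)


def round_trip(lexical, datatype=None):
    """Serialize a literal into a CSV cell and parse it back out."""
    buf = io.StringIO()
    # CSV-level quoting of the embedded double quotes is left to the
    # csv module; it is independent of the Turtle-level escaping.
    csv.writer(buf).writerow([to_turtle(lexical, datatype)])
    buf.seek(0)
    (cell,) = next(csv.reader(buf))
    return from_turtle(cell)


# The datatype of "a7"^^xsd:hexBinary survives the round-trip.
print(round_trip("a7", XSD + "hexBinary"))
```

With terms recorded this way, a typed literal and a plain literal with
the same lexical form no longer collapse into the same cell.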


Regards,

Jeen Broekstra

Received on Thursday, 16 February 2012 20:52:31 UTC