SPARQL and string literal matching woes - spec inconclusive - try 2

***

I apologize: I pressed the send key too soon by accident. Please ignore my earlier mail on this subject.

***

Hello,

I am having some trouble matching literals with SPARQL. It seems that
just about every implementation I tried manages to give me a differing
set of answers for a very simple query. I have tried to verify this
against the specification, but I haven't been able to find a
conclusive answer there.

I believe this could be an important interoperability blocker for
several applications as the problem is easily triggered by the
simplest of graph patterns.

The issue is very simple to describe, but so that it can be discussed
precisely, I will write it out here somewhat verbosely:

Assume the following RDF graph:

***
@prefix dt:   <http://example.org/datatype#> .
@prefix ns:   <http://example.org/ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ns:a ns:p "value" .
ns:b ns:p "value"^^xsd:string .
ns:c ns:p "value"^^dt:datatype .
ns:d ns:p "value"@en .
***

Now, if this graph is queried with the simple SPARQL query:

  SELECT ?x where { ?x ns:p "value" }

What should the result set be? It seems that some implementations
return just ns:a here, whereas other implementations return ns:a and
ns:b.

The second case is where the SPARQL query is:

  SELECT ?x where { ?x ns:p "value"^^xsd:string }

In this case too, we get implementations that return just ns:b and
implementations that return ns:a and ns:b.

There does not seem to be a clear ruling on this by the specification,
even though the feel of the specification in general (in my opinion)
seems to indicate that returning both ns:a and ns:b to these queries
would not be the right solution.
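
To make the stricter reading concrete, here is a minimal Python sketch of
matching by RDF term equality, assuming the RDF Concepts (2004) rule that
two literals are equal only if lexical form, datatype and language tag all
agree. The names Literal, GRAPH and match_object are made up for this
illustration and come from no real library; this is one interpretation,
not a ruling on what the specification requires.

```python
from typing import List, NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
DT_DATATYPE = "http://example.org/datatype#datatype"


class Literal(NamedTuple):
    """Hypothetical model of an RDF literal (not a real library type)."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


# Objects of the four ns:p triples in the graph, keyed by subject local name.
GRAPH = {
    "a": Literal("value"),
    "b": Literal("value", datatype=XSD_STRING),
    "c": Literal("value", datatype=DT_DATATYPE),
    "d": Literal("value", lang="en"),
}


def match_object(pattern: Literal) -> List[str]:
    """Return subjects whose ns:p object is the same RDF term as `pattern`."""
    # Tuple equality compares lexical form, datatype and language tag together.
    # (RDF compares language tags case-insensitively; all tags here are lowercase.)
    return sorted(s for s, o in GRAPH.items() if o == pattern)


print(match_object(Literal("value")))                       # case 1
print(match_object(Literal("value", datatype=XSD_STRING)))  # case 2
```

Under this assumption the first query yields only ns:a and the second only
ns:b. An implementation that returns both is in effect treating "value" and
"value"^^xsd:string as the same term.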

The third case is a bit more complex SPARQL query:

  SELECT ?x where { ?x ns:p ?y FILTER (?y = "value") }

This is somewhat trickier. The SPARQL specification defines that
operator fn:compare is used to match between plain literal pairs and
also between xsd:string pairs. I am not sure if the specification
defines how a string literal in a filter clause should be interpreted
- that is, if "value" is actually a plain literal or just some
ephemeral string type. If the specification defines that "value" is a
plain literal, then when comparing ns:b, we have a comparison between
xsd:string and a plain literal - which is not defined by SPARQL. The
fallback comparison operator is RDFterm-equal, which states that for
literals, the comparison is done by RDF Concepts literal equality,
which clearly defines that "value" and "value"^^xsd:string should not
compare equal. But it also defines that comparisons between two
literals can result in type errors. I am at a loss here as to what the
specification actually signifies in the case of a plain literal
against a typed literal.

But again in this case, implementations differ in whether they return
ns:a or both ns:a and ns:b.
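
One possible reading of the "=" operator for cases like this can be
sketched in Python. It assumes that "value" in the FILTER is a plain
literal without a language tag, that pairs of such literals and pairs of
xsd:string literals are compared by lexical form, that every other pair
falls through to RDFterm-equal (which raises a type error for two literals
that are not the same term), and that an error inside a FILTER drops the
solution. Only literals are modelled; the numeric, boolean and dateTime
rows are left out. All names below are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
DT_DATATYPE = "http://example.org/datatype#datatype"


class Literal(NamedTuple):
    """Hypothetical model of an RDF literal (not a real library type)."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


class SparqlTypeError(Exception):
    """Stands in for a type error raised while evaluating a FILTER."""


GRAPH: Dict[str, Literal] = {
    "a": Literal("value"),
    "b": Literal("value", datatype=XSD_STRING),
    "c": Literal("value", datatype=DT_DATATYPE),
    "d": Literal("value", lang="en"),
}


def is_simple(lit: Literal) -> bool:
    return lit.datatype is None and lit.lang is None


def is_xsd_string(lit: Literal) -> bool:
    return lit.datatype == XSD_STRING and lit.lang is None


def sparql_equals(a: Literal, b: Literal) -> bool:
    """Assumed dispatch for A = B, restricted to literal operands."""
    # Same-kind string pairs: compare the lexical forms.
    if (is_simple(a) and is_simple(b)) or (is_xsd_string(a) and is_xsd_string(b)):
        return a.lexical == b.lexical
    # Fallback (RDFterm-equal): identical terms are equal,
    # two different literals produce a type error.
    if a == b:
        return True
    raise SparqlTypeError(f"cannot compare {a!r} with {b!r}")


def filter_equals(constant: Literal) -> List[str]:
    """Subjects kept by FILTER (?y = constant) over GRAPH."""
    kept = []
    for subject, obj in GRAPH.items():
        try:
            if sparql_equals(obj, constant):
                kept.append(subject)
        except SparqlTypeError:
            pass  # an error in a FILTER eliminates the solution
    return sorted(kept)


print(filter_equals(Literal("value")))
```

Under these assumptions the query returns only ns:a: the comparison against
ns:b raises a type error instead of evaluating to true. Whether that is what
the specification intends is exactly the open question.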

The fourth case is again similar to the one before:

  SELECT ?x where { ?x ns:p ?y FILTER (?y = "value"^^xsd:string) }

And likewise, implementations differ in whether they return only ns:b
or both ns:a and ns:b.

The fifth case uses yet another operator:

  SELECT ?x where { ?x ns:p ?y FILTER (sameTerm(?y, "value")) }

This case seems to be a clear-cut decision in my opinion. The
specification clearly defines sameTerm to use the RDF Concepts
comparison, which compares "value" and "value"^^xsd:string as not
equal. The only possible confusion is whether "value" is to be
interpreted exactly as a plain literal, or whether it could be just a
string argument to a function without being an RDF term at all.

However, even in this case, I found some implementations which return
both ns:a and ns:b instead of just ns:a. These I would personally
classify as non-conforming implementations.

The sixth case is again a variation of the one before:

  SELECT ?x where { ?x ns:p ?y FILTER (sameTerm(?y, "value"^^xsd:string)) } 

In this case, there should be no question as to whether
"value"^^xsd:string is a typed literal or not.

Even so, some implementations return both ns:a and ns:b instead of
just ns:b in this case.

Summary:

I have no clue as to how the specification wants string literal
matching to be done in SPARQL implementations - and it seems that
neither do many of the implementors. Hopefully some clarity can be
brought to this matter.

Similar issues remain for matching other datatypes as well - but
those issues are more easily discussed once this issue has been dealt
with.

Thank you for your time,
-- Naked

Received on Saturday, 5 July 2008 15:06:47 UTC