semantics and complexity of sparql

Dear All,

  Motivated by database considerations and the discussion in this list 
--which we have follow closely-- we developed a thorough theoretical
study  of the core part of SPARQL, namely its pattern matching facility,
and submitted the study to review elsewhere. As the deadline of
June 6, 2006 is approaching, we thought it was convenient to make
public this study with the hope that it can be of some help in the 
development process of SPARQL.

  We present in this message general observations, and suggest the
interested persons to get the full document with proofs in
http://arxiv.org/abs/cs.DB/0605124 
Our core suggestion is the need of a formal syntax and semantics
for the language SPARQL. The purpose of the following points
is to highlight this need:

1) The evaluation process of a language should be directed by a
   clear semantics and not by the language grammar 
(http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Apr/0023.html).

2) Ideally, a query language should have a compositional semantics.
As D. Maier pointed out for XML: "The meaning of an expression should be
the same wherever it appears. Furthermore, expressions with equal result
types should be allowed to appear in the same contexts."
   In SPARQL, the problems brought by a non-compositional semantics are
more evident in the nested optional construct.
We show in the paper a natural and simple compositional semantics.

3) A precise fromal syntax would be very welcomed by developers.
     For ex. current SPARQL specification lacks explicit precedence and
         association rules (they are implicit in the grammar).
(http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006May/0003.html).

The above lack of formal specification can be exemplified by
several features, among which the most visible are:

i) Non idempotency of sets of graph patterns in the
    presence of UNION, for ex. implying that for some graph
   patterns P it is not always the case that {P . P} gives the same
   result as { P }. We show that UNION is the center of the problem
   and present solutions.

ii) Unrestricted variables in FILTER conditions lead to undetermined results

(http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006May/0009.html) 
(http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Apr/0025.html)
   There are also other curiosities arising from this lack of restriction
that we show in the paper.

iii) ARQ, a relevant implementation of SPARQL has the following problems:
  - non compositional (nor conjunctions, nor optionals have compositional
    properties in ARQ).
  - the conjunction is not commutative.
  - FILTER with strange behavior (because of the unrestricted use of
    variables in conditions).


best regards,

Jorge Perez,
Marcelo Arenas,
Claudio Gutierrez.

Received on Thursday, 1 June 2006 22:22:56 UTC