Formal Objections to the SPARQL Query Language for RDF

DAWG

This document tracks outstanding dissent (formal objections) to the SPARQL Query Language for RDF, SPARQL Protocol for RDF, and SPARQL Query Results XML Format, along with the Working Group's responses to those objections.

The WG RESOLVED on 2004-07-15 to adopt BRQL v1.11 as its strawman query language design, over the objection of RobS and JeffP of Network Inference:
...XQuery, with minor extensions, would be the best overall foundation on which to enable query-based access to the family of Semantic Web languages. ...

This view did not meet with a critical mass of support in Working Group discussions, though it continued to be explored in the community. One of the most thorough explorations of the relationship of SPARQL to XQuery and SQL concludes:

We have, somewhat reluctantly, concluded that the design goals of SQL and SPARQL are sufficiently different that there is adequate justification for the creation of a special-purpose language for querying RDF collections. We are comforted by the belief that it is possible to translate SPARQL expressions into SQL expressions, allowing users to store their RDF collections in relational databases if they wish to do so, and to write their queries in either SQL or in SPARQL, as they see fit. While predicting that it will be similarly possible to serialize RDF collections into XML documents and transform SPARQL expressions into XQuery expressions, we do not believe that most users would take that direction.
SQL, XQuery, and SPARQL: What's Wrong With This Picture? by Jim Melton, Oracle Corporation; in proceedings of XML 2005

Requirement 3.6 Optional Match was accepted 2004-07-15 over the objection of RobS of Network Inference.
Note that the objection concludes with:

...Network Inference certainly sees value in both features, and supports both as objectives for this working group. If the potential problems related to these requirements can be overcome, then our objection to the classification of these features as "requirements" should not prevent the group from regaining consensus on a final recommendation.

And while the theoretical issues with OPTIONAL have been expensive to work out, the feature now seems to be specified to the satisfaction of the community. Further, the number of use cases where this feature is critical suggests that SPARQL would not succeed without it (for example, see MacGregor 24 Mar 2005).

Objective 4.2 Data Integration and Aggregation was accepted 2004-09-16 over the objection of Network Inference/Rob Shearer:

The only technology that I think we all really agree on is RDF and the RDF data model. It strikes me as blatantly wrong to attempt a query standard based on some other data model, and "RDF+some meta information" is some other data model. If the meta information can be exposed in RDF, then our query language should support it by default. If it can't be exposed in RDF, then why are we considering native support in an RDF query language?

A comment from outside the WG also says:

I think these should be removed from the basic SPARQL core, since I feel they add a fair deal of implementation complexity and an application can achieve the same result by submitting multiple queries, possibly to different query processors.

I also feel it would be premature to standardize an approach to multi-graph querying ahead of there being a consensus/standard for something like RDF named graphs.

Klyne 08 Apr 2005

The FROM NAMED and GRAPH features seem to be specified to the satisfaction of a critical mass of the community, are supported in several implementations, and are required by a number of use cases and applications.

It seems unlikely that further deliberation of this issue would result in substantially more consensus.

On 5 March 2006, Elliotte Harold asked that we not use both ? and $ as variable markers ("Pick one"). He was not satisfied by our attempts to justify our decision as part of the punctuationSyntax issue:

> >> A number of design considerations were laid out in:
> >> Draft: open issues around '?' use.
> >> 
> 
> I think this makes some good arguments for using a $ instead of a ?. 
> However it doesn't convince me that using both is a good idea. Why are 
> two characters considered necessary here? Why not just pick the $ and be 
> done with it?

The use of ?var syntax in SPARQL goes back all the way to the 1st
WD in October 2004
 
The number of reviewers, users, and implementors that we would need
to collaborate with in order to take ?var out is considerable, and
it's not clear that we have an argument that is sufficient to convince
them. True, allowing both adds various costs, but this is largely
sunk cost. The details of the specification are worked out; we have
test cases and multiple implementations. A growing number of users
have learned the ?var syntax, and those that need to use ODBC-style
systems seem to know about and be happy with $var.
It seems unlikely that we would get consensus around a change
to take out ?var or $var in a reasonable amount of time, and the
number of parties that are interested to see SPARQL advance to
Candidate Rec soon is considerable.

Again, please let us know whether you find this response satisfactory.

On 22 May 2007, Peter F. Patel-Schneider sent comments on the third Last Call SPARQL Query Language draft:

Because of problems described in 8/ below, I do not believe that the
document is adequate to progress to the next stage of the W3C process,
even without my fundamental disagreement with the treatment of the
meaning of RDF graphs in SPARQL (3/ and 9/ below).

...

3/ Matching literals

I was very surprised to see that the exact literal form of an RDF
literal is significant (Section 2.3.3).  Imagine what would happen if an
SQL query depended on the exact literal form in which numbers were
entered into a database!

...

8/ Basic Definition of SPARQL

...

The definition of BGP Matching is not specified in the document.  The
definition in Section 12.3.1 defines a "solution" reasonably, although
presumably mu is *the* "restriction of P to the query variables in BGP.
However, the last bit of the definition doesn't make sense?  What is
omega there?  What is mu there?  What is theta?  What is mu(theta)?
Where then is the definition of the match of a BGP against an RDF graph?

Section 12.5 does not provide the missing glue, as it just defers to
Section 12.3.1.  Section 12.5 doesn't even get to a BGP and an RDF
graph.

9/ A Fundamental Disagreement on SPARQL

I still object to the fact that SPARQL can produce different results for
equivalent RDF graphs, as described in Section 12.3.2.

The Working Group sent separate responses to the editorial content on 28 May 2007 and to the substantive content on 29 May 2007:

SPARQL is defined for simple entailment as noted above. As per 6.5.1 of
RDF concepts:

http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal
[[
6.5.1 Literal Equality

Two literals are equal if and only if all of the following hold:

     * The strings of the two lexical forms compare equal, character by
character.
     * Either both or neither have language tags.
     * The language tags, if any, compare equal.
     * Either both or neither have datatype URIs.
     * The two datatype URIs, if any, compare equal, character by
character.
]]

SPARQL does accomodate value matching with FILTER expressions, which are
defined by XPath Functions and Operators.

RDF semantics talks about datatyped interpretations and D-entailment:
http://www.w3.org/TR/rdf-mt/#dtype_interp

Based on current implementation practice, the working group decided to
leave D-entailment as a research problem. This includes richer semantics
for graph patterns such as:

  [] rdf:type xsd:nonNegativeInteger .

The Working Group has approved tests in this space. See, for example:

  http://www.w3.org/2001/sw/DataAccess/tests/#open-eq-01

(But note that that's in the old test space and will be moving in the next
few weeks.)

...

> The definition of BGP Matching is not specified in the document.  The
> definition in Section 12.3.1 defines a "solution" reasonably, although
> presumably mu is *the* "restriction of P to the query variables in BGP.
> However, the last bit of the definition doesn't make sense?  What is
> omega there?  What is mu there?  What is theta?  What is mu(theta)?
> Where then is the definition of the match of a BGP against an RDF graph?

The last part of the definition says that the cardinality of a solution 
(mu) in a multiset of solution mappings (omega) is the number of distinct 
RDF instance mappings (sigma) that compose with mu to give a pattern 
instance mapping (P) that, when applied to the BGP, produces a subset of 
G.

...

We have recorded your previous objection to this as part of our 
rdfSemantics issue:

  http://www.w3.org/2001/sw/DataAccess/issues#rdfSemantics


The Working Group has considered alternatives; Bijan Parsia described
forms of leanness with respect to the data graph and the subgraph
matching a query pattern.
  http://www.w3.org/mid/DF0BA59F-7E8C-4CDB-BF2C-C391D05CEB4D@cs.man.ac.uk

The results of a SPARQL query with respect to a given data graph are
defined, and specifically do not include leaning the matching
subgraph. SPARQL neither prohibits nor requires the reduction of
equivalent graphs to the minimal entailment

Most current implementations do not lean the input graph. The WG
consensus was not to impose this on implementations, noting
significant efficiency and scaling costs. Coupled with extension to
other entailment systems, the fundamental problem is outside current
practice. Implementations are free to process data before exposing it
(e.g. apply leaning).
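
The difference between the literal equality rule quoted from RDF Concepts above and the value matching that the response attributes to FILTER can be illustrated with a minimal Python sketch. The `Literal` class and the `term_equal` and `value_equal` helpers are hypothetical names introduced here for illustration; `value_equal` handles only `xsd:integer` and is not an implementation of XPath Functions and Operators. Language tags are compared as given, with no case normalization.

```python
from dataclasses import dataclass
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Illustrative RDF literal: lexical form plus optional tag or datatype."""
    lexical: str
    language: Optional[str] = None
    datatype: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    """Literal equality as quoted from RDF Concepts 6.5.1.

    Comparing the optional fields directly covers both the
    "either both or neither" conditions and the "compare equal" conditions.
    """
    return (
        a.lexical == b.lexical
        and a.language == b.language
        and a.datatype == b.datatype
    )


def value_equal(a: Literal, b: Literal) -> bool:
    """Assumed stand-in for FILTER-style value comparison (xsd:integer only)."""
    if a.datatype == XSD_INTEGER and b.datatype == XSD_INTEGER:
        return int(a.lexical) == int(b.lexical)
    return term_equal(a, b)


one = Literal("1", datatype=XSD_INTEGER)
padded_one = Literal("01", datatype=XSD_INTEGER)

# Graph pattern matching sees two different terms ...
terms_match = term_equal(one, padded_one)
# ... while a value comparison sees the same integer.
values_match = value_equal(one, padded_one)
```

Under these assumptions, `terms_match` is false and `values_match` is true, which is the behavior the commenter objects to and the response defends.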

In a 29 May 2007 reply, Peter F. Patel-Schneider confirmed his objections:

> Most current implementations do not lean the input graph. The WG
> consensus was not to impose this on implementations, noting
> significant efficiency and scaling costs. Coupled with extension to
> other entailment systems, the fundamental problem is outside current
> practice. Implementations are free to process data before exposing it
> (e.g. apply leaning).


I still object to this decision.

Further, if the definition of SPARQL admits different results for graphs
that are simple-equivalent, then this should be so stated more
prominently, particularly as the document talks about simple entailment
so prominently.  It is misleading to talk about SPARQL being
based on simple entailment when it is not and such references should be
removed from the document.
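
The objection can be made concrete with a small Python sketch, assuming a simplified matcher for a single triple pattern. The function `match_pattern`, the graphs, and the term names are hypothetical and are not taken from the specification or from any implementation. The two graphs below are simple-equivalent, because the second only adds a triple that is an instance of the first with a different blank node, yet the same pattern yields a different number of solutions.

```python
def match_pattern(pattern, graph):
    """Return one solution mapping per triple in `graph` matching `pattern`.

    Terms are plain strings; a term starting with '?' is a variable.
    No leaning is applied to the graph or to the matched subgraph.
    """
    solutions = []
    for triple in graph:
        mu = {}
        matched = True
        for pattern_term, graph_term in zip(pattern, triple):
            if pattern_term.startswith("?"):
                # A repeated variable must bind to the same term each time.
                if mu.setdefault(pattern_term, graph_term) != graph_term:
                    matched = False
                    break
            elif pattern_term != graph_term:
                matched = False
                break
        if matched:
            solutions.append(mu)
    return solutions


# A lean graph and a non-lean graph that each simple-entail the other.
lean_graph = {(":a", ":p", "_:b")}
non_lean_graph = {(":a", ":p", "_:b"), (":a", ":p", "_:c")}

pattern = (":a", ":p", "?y")

lean_count = len(match_pattern(pattern, lean_graph))
non_lean_count = len(match_pattern(pattern, non_lean_graph))
```

In this sketch `lean_count` is 1 and `non_lean_count` is 2. An implementation that leaned its data before exposing it, as the Working Group response permits, would return one solution for both graphs.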

On 24 May 2007, Bob MacGregor sent comments about declarative vs. algebraic semantics, named graphs vs. quads, and the negation of the bound operator. Subsequent discussion resolved the second issue as out of scope for SPARQL, but the other two objections remained:
```
My argument is against the choice of an algebraic semantics instead  of a
declarative semantics.  Unless I am mistaken, OWL has a declarative semantics,
and I would assume that SWIRL and RuleML have or will each have a  declarative
semantics.  Suppose X would like to implement rules from one of these languages
using SPARQL to evaluate the rule bodies.  If the semantics of SPARQL  aligns
with the rule language, or perhaps with a subset of it, then X can  comfortably
use SPARQL for this task.  However, comparing a declarative (rule)  semantics
with an algebraic (SPARQL) semantics is an apples and oranges comparison.  To
be sure that SPARQL properly implements the rules, X would have to produce the
declarative semantics on her own.

A declarative semantics forms a bedrock on which to build a logic pyramid.  An
algebraic semantics is essentially a dead-end.

...

I wasn't recommending eliminating UNBOUND from the language; I was
recommending relegating it to secondary status within the language, i.e.,
making it a computed predicate and not according it a reserved word.  Its
easily the most egregious hack in the language.
```
The group has unsuccessfully attempted to ascertain whether the declarative semantics that Bob prefers would result in a different language than the current draft. We believe that the current semantics satisfy a critical mass of the community, and that the implementation report to be produced as an exit criterion from CR will demonstrate that multiple SPARQL engines exist that implement the current semantics in an interoperable fashion.

The bound operator has been part of the SPARQL specification since the February 2005 working draft (with a TODO mention in late 2004). In conjunction with the logical-not operator, !, the group closed the unsaid issue in the Jan 2005 Helsinki face-to-face meeting, noting that the expressivity of !bound covers common use cases of unsaid and fulfills the non-existent triples design objective. Current SPARQL implementations support bound and it is widely used in SPARQL queries (e.g., see the sample query in the SPARQL FAQ).
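
How !bound covers unsaid-style use cases can be sketched in Python, assuming solution mappings are modeled as dictionaries. The helpers `compatible`, `left_join`, and `bound`, and the sample data, are hypothetical illustrations and not the specification's algebra. The sketch mirrors a query of the form `SELECT ?name WHERE { ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox } FILTER (!bound(?mbox)) }`, which selects people for whom no mailbox triple exists.

```python
def compatible(mu, nu):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(mu[var] == nu[var] for var in mu.keys() & nu.keys())


def left_join(left, right):
    """OPTIONAL-style join: extend each left solution where possible.

    A left solution with no compatible right solution is kept unchanged,
    so the optional variables stay unbound in it.
    """
    result = []
    for mu in left:
        extended = [{**mu, **nu} for nu in right if compatible(mu, nu)]
        result.extend(extended or [mu])
    return result


def bound(mu, var):
    """True if the variable has a value in this solution mapping."""
    return var in mu


# Solutions for { ?x foaf:name ?name } and { ?x foaf:mbox ?mbox }.
names = [
    {"x": ":alice", "name": "Alice"},
    {"x": ":bob", "name": "Bob"},
]
mboxes = [{"x": ":alice", "mbox": "mailto:alice@example.org"}]

joined = left_join(names, mboxes)

# FILTER (!bound(?mbox)) keeps only solutions where the optional part failed.
without_mbox = [mu["name"] for mu in joined if not bound(mu, "mbox")]
```

With this data `without_mbox` contains only "Bob", since Alice's solution was extended with a mailbox binding.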

On 26 October 2007, Frank McCabe sent comments about declarative semantics and the OPTIONAL feature. The Chair responded, noting that there did not appear to be new information in the comment to cause the Working Group to reconsider its design at this point in the process and also noting two existing objections relevant to the commenter (#optional and algebraicsemantics). In an off-list reply to the Chair, the commenter asked that his comment be added in support of the objection against the inclusion of OPTIONAL in SPARQL; the commenter asked that his comment not be considered as support for the objection to the use of an algebraic semantics. The Chair has asked for a public confirmation of this objection.

On 1 November 2007, Andrew Newman objected to the use of result sets to define SPARQL, preferring an RDF query language based on manipulations of graphs. Subsequent messages from the commenter explain more of the preferred design.
```
The basis of my objection is founded on SPARQL being an RDF query
language and that it should use an RDF data model throughout.  This is
one property that represents what is considered good design for query
languages (for RDF query languages see [1] but it has been covered
elsewhere in criticism of other query languages such as SQL).

[1] J. Bailey et al, "Web and Semantic Web Query Languages: A Survey,"
LNCS 3564, 2005, Norbert Eisinger, Jan Maluszynski (editor(s)),
```
The Working Group notes that the SPARQL language has been based on sets of solution tuples for quite some time, and the design seems to satisfy a critical mass of the user and implementor communities. The Working Group is unfamiliar with any existing query languages that meet the commenter's design goals.

On 14 November 2007, Sean B. Palmer objected to the inclusion of the unregistered text/rdf+n3 MIME type in an example in the SPARQL Protocol for RDF specification, asking that the MIME type be registered or that its use be removed from the SPARQL protocol specification.

The objection is pending as the Working Group pursues an acceptable solution to the issue.

Lee Feigenbaum, RDF Data Access Working Group chair
$Revision: 1.5 $ of $Date: 2007/12/21 16:50:07 $

Change Log

$Log: obj108.html,v $
Revision 1.5  2007/12/21 16:50:07  lfeigenb
added SBP objection

Revision 1.4  2007/11/07 06:57:48  lfeigenb
updated pointers and response to #bindingBased

Revision 1.3  2007/11/02 02:51:22  eric
+ bindingBased

Revision 1.2  2007/10/26 06:54:56  lfeigenb
algabraic -> algebraic

Revision 1.1  2007/10/26 06:52:49  lfeigenb
new document for tracking SPARQL objections