Transition Request to advance SPARQL to Candidate Recommendation

The RDF Data Access Working Group decided (21 Mar meeting minutes, pending successful outcome of ballot) to request that you advance this specification to W3C Candidate Recommendation and call for implementation.

Status of these documents (proposed)

Tweaks to be made in the publication process are marked PUBFIX.

This section describes the status of this document at the time of its publication[PUBFIX which hasn't happened yet]. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This 29 Mar 2006 [PUBFIX confirm] draft, along with the other working drafts for SPARQL, is a Candidate Recommendation; it has been widely reviewed and satisfies the requirements documented in RDF Data Access Use Cases and Requirements; W3C publishes a Candidate Recommendation to gather implementation experience.

The first release of this document was 12 Oct 2004[PUBFIX tune to each part] and the RDF Data Access Working Group has made its best effort to address comments received since then, releasing several drafts and resolving a list of issues meanwhile. The design has stabilized and the Working Group intends to advance this specification to Proposed Recommendation once the exit criteria below are met:

A test suite gives reasonable coverage of the features of the query language and protocol.
Note that the working group maintains a collection of query tests and a collection of protocol tests. Only a portion of the tests in these collections are approved at this time.

Each identified SPARQL feature has at least two implementations.

At least two conformant SPARQL services are available. [PUBFIX update link to /TR/ space]

Relevant media types are registered:

The SPARQL specifications introduce two new Internet Media Types. Review has been requested, but the types are not yet registered:

application/sparql-query: review request of 24 Nov 2005

application/sparql-results+xml: review request of 24 Nov 2005

The SPARQL protocol specification uses the text/rdf+n3 media type, which is unregistered, in an example.

Normative dependencies have been advanced to Proposed Recommendation status:

XQuery 1.0 and XPath 2.0 Functions and Operators

Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language

Web Services Description Language (WSDL) Version 2.0 Part 2: Adjuncts

This specification will remain a Candidate Recommendation until at least 30 May 2006[PUBFIX if 29 March slips, so does this]. An implementation report is in progress.

Comments on this document should be sent to public-rdf-dawg-comments@w3.org, a mailing list with a public archive.

Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Summary of Review

The first public working draft of the SPARQL specification was released in Oct 2004, following a June 2004 Use Cases and Requirements release. The November 2004 Last Call milestone from our charter was delayed due to difficulties reaching consensus on an initial design and requirements; see outstanding dissent below. We adopted a WSDL requirement and a sorting objective in early 2005, accepting another schedule slip. Our requirements have been stable since the March 2005 draft. In a number of cases, we have considered features that go beyond these requirements, but ultimately postponed them due to lack of implementation and design experience. For example, features for querying lists/collections have been frequently requested, but the requestors seem to be satisfied with our decision to postpone the issue.

About 75 people participated in the comments mailing list, including editors and WG members. Tutorial articles include:

Search RDF data with SPARQL by Philip McCarthy 10 May 2005 on IBM developerWorks
Introducing SPARQL: Querying the Semantic Web by Leigh Dodds November 16, 2005 on XML.com

A community-maintained list of SPARQL software includes SPARQL engines in progress in PHP, Java, Perl, Python, C, and Common Lisp, as well as client-side utilities and parsers. The companion list of services and applications includes interactive forms that allow developers and users to evaluate the language over the web and a few medium to large scale, though experimental, services. We have not evaluated the completeness of these services and software, though this level of support clearly indicates significant investment in and satisfaction with the SPARQL specifications and justifies continued investment in finishing the test materials.

Dependencies were discharged as follows:

The XML Query WG and XSL WG sent review comments in Sep 2005. We sent a response that addressed them in Nov 2005. A 20 Feb 2006 communication in the W3C Semantic Web Coordination Group (member-confidential) suggests that the XSL and XQuery WGs are satisfied; we have not heard further from them.
We requested review from the Semantic Web Best Practices and Deployment (SWBPD) Working Group in general and consulted members of that WG in particular on the SOURCE and UNSAID issues. This has resulted in various individual comments but no comments from the SWBPD WG as a whole.
We exchanged comments with the WSD WG on a number of details related to specifying the SPARQL protocol using WSDL 2.0. While our September 2005 protocol draft conflicted with the then-current WSDL 2.0 specification, our 25 Jan 2006 protocol draft is in sync with the latest information we have gotten from the WSD WG and the Woden validator (see 21 March "wsdl fun" thread in DAWG, notice to WSD WG 21 March).
IETF review of SPARQL-related media types (application/sparql-query, application/sparql-results+xml) began with review requests (query review request, results review request) on 24 Nov 2005. We have not received any comments as a result. We accept registration of these media types as a CR exit criterion.
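As a rough illustration of where the results media type shows up in practice, the following sketch builds (but does not send) an HTTP GET request that asks a SPARQL service for XML query results. The endpoint URL is hypothetical, the helper name build_query_request is invented for this example, and the use of a "query" URL parameter reflects our reading of the protocol's HTTP binding rather than anything stated in this request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media type for SPARQL query results in XML, as named in the list above.
RESULTS_XML = "application/sparql-results+xml"


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a GET request carrying the query text in a 'query' parameter.

    Assumption: the service accepts the query as a URL parameter named
    'query' and honours the Accept header for content negotiation.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": RESULTS_XML})


# Hypothetical endpoint; nothing is sent over the network here.
request = build_query_request(
    "http://example.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o }",
)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

The other media type, application/sparql-query, labels a document containing the query text itself; it does not appear in this sketch because the query travels as a URL parameter.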

In July 2005 and September 2005, we released last call working drafts of the query language and protocol (respectively), since we had closed all outstanding issues and met all our requirements. Since then, there has been a sustained tension between a growing user and implementor community that is ready for the specification to advance despite any remaining flaws and a diligent review community that is insisting on a high level of rigor.

We tracked the status of comments since July 2005, including 55 cases of comments that the WG addressed to the documented satisfaction of the commentors. Due to a number of small technical changes and an increasing number of cases where the WG addressed a comment but did not get a clear indication of satisfaction or otherwise from the commentor, we issued a second last call of the SPARQL protocol on 25 Jan 2006 and of the SPARQL query language on 20 February 2006. Comments were due 13 March 2006; our comment status report shows 9 threads where the WG and the commentor reached consensus, and one case where we "Corrected along the lines of your suggestion" and asked if it was satisfactory but have not seen a response. The remaining two threads are discussed under outstanding dissent below.

Changes since last call have been editorial changes and clarifications only.

Outstanding dissent (formal objections)

The WG RESOLVED 2004-07-15 to adopt BRQL v1.11 as its strawman query language design, over the objection of RobS and JeffP of Network Inference:
...XQuery, with minor extensions, would be the best overall foundation on which to enable query-based access to the family of Semantic Web languages. ...

This view did not meet with a critical mass of support in Working Group discussions, though it continued to be explored in the community. One of the most thorough explorations of the relationship of SPARQL to XQuery and SQL concludes:

We have, somewhat reluctantly, concluded that the design goals of SQL and SPARQL are sufficiently different that there is adequate justification for the creation of a special-purpose language for querying RDF collections. We are comforted by the belief that it is possible to translate SPARQL expressions into SQL expressions, allowing users to store their RDF collections in relational databases if they wish to do so, and to write their queries in either SQL or in SPARQL, as they see fit. While predicting that it will be similarly possible to serialize RDF collections into XML documents and transform SPARQL expressions into XQuery expressions, we do not believe that most users would take that direction.
SQL, XQuery, and SPARQL: What's Wrong With This Picture? by Jim Melton, Oracle Corporation; in proceedings of XML 2005
Requirement 3.6 Optional Match was accepted 2004-07-15 over the objection of RobS of Network Inference
Note that the objection concludes with:

...Network Inference certainly sees value in both features, and supports both as objectives for this working group. If the potential problems related to these requirements can be overcome, then our objection to the classification of these features as "requirements" should not prevent the group from regaining consensus on a final recommendation.

And while the theoretical issues with OPTIONAL have been expensive to work out, they seem to be specified to the satisfaction of the community. Further, the number of use cases where this feature is critical suggests that SPARQL would not succeed without it (for example, see MacGregor 24 Mar 2005).
The DESCRIBE issue was resolved over the objection of Dan Connolly:
expectations around DESCRIBE are very different from CONSTRUCT and SELECT, and hence it should be specified in a separate query language

This objection was supported by a number of public comments; at least one reviewer wrote to explicitly support this feature, meanwhile. The feature seems to be specified to the satisfaction of a critical mass of the community, supported in several implementations, and used in a number of applications.
Objective 4.2 Data Integration and Aggregation was accepted 2004-09-16 over the objection of Network Inference/Rob Shearer:

The only technology that I think we all really agree on is RDF and the RDF data model. It strikes me as blatantly wrong to attempt a query standard based on some other data model, and "RDF+some meta information" is some other data model. If the meta information can be exposed in RDF, then our query language should support it by default. If it can't be exposed in RDF, then why are we considering native support in an RDF query language?

A comment from outside the WG also says:

I think these should be removed from the basic SPARQL core, since I feel they add a fair deal of implementation complexity and an application can achieve the same result by submitting multiple queries, possibly to different query processors.

I also feel it would be premature to standardize an approach to multi-graph querying ahead of there being a consensus/standard for something like RDF named graphs.

Klyne 08 Apr 2005

The FROM NAMED and GRAPH features seem to be specified to the satisfaction of a critical mass of the community, supported in several implementations, and required by a number of use cases and applications.

The fromUnionQuery issue was resolved in our 2005-06-07 meeting over the objection of Steve Harris. This was a design issue where the group had a lot of difficulty finding consensus, and the chair chose to act in the interest of schedule concerns:

DanC summarized by observing 3 designs that seemed to be coherent
and had been developed and advocated sufficiently that we might
be able to finish them in a timely manner:

OPTIONS:
  (a) without FROM/FROM_NAMED, dataset is unconstrained; with
   FROM/FROM_NAMED, dataset is bounded from below by given references.
  (b) like (a) but FROM/FROM named completely specify the dataset
  (c) datasets have "aggregate graph" rather than background/default
   graph, and it always contains the merge of the named graphs

By "bounded from below," DanC clarified that he meant D1 >= D2 iff
	D1's background/aggregate graph has everything that D2's has,
		i.e. D1's bg graph rdf-simply-entails D2's
	and D1 has all the named graphs that D2 has; i.e.
	for every named graph (U, G) in D2, (U, G) is also in D1's named
	graphs.

KC observed that this is basically a web-social question of
constraining what publishers do.

DC observed that constraining publishers might be responsive
to comments on this part of our spec, in the interest of
interoperability at the expense of flexibility.

Polling showed significant opposition to (b); after that option
was removed, the WG was split nearly 50-50 between (a) and (c).
In the interest of time, the chair chose one of the proposals
and we

RESOLVED: to go option (a) without FROM/FROM_NAMED, dataset is
unconstrained; with FROM/FROM_NAMED, dataset is bounded from below
by given references.
SH objects. abstaining: EricP, DaveB

The feature seems to be specified to the satisfaction of a critical mass of the community, and it seems unlikely that further deliberation of this issue would result in substantially more consensus.
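The "bounded from below" ordering quoted in the minutes above can be sketched as a small check. This is only an illustration under simplifying assumptions that are not part of the minutes: graphs are modelled as sets of ground triples (no blank nodes), so that simple entailment reduces to a subset test and graph identity reduces to set equality. The names Dataset and bounded_below_by are invented for this example.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class Dataset(NamedTuple):
    background: Graph        # the background (default) graph
    named: Dict[str, Graph]  # named graphs, keyed by URI


def bounded_below_by(d1: Dataset, d2: Dataset) -> bool:
    """Return True when D1 >= D2 in the sense quoted in the minutes.

    Assumption: all graphs are ground, so "D1's background graph
    simply entails D2's" is checked as a subset test.
    """
    background_ok = d2.background <= d1.background
    # Every named graph (U, G) of D2 must also be a named graph of D1.
    named_ok = all(
        uri in d1.named and d1.named[uri] == graph
        for uri, graph in d2.named.items()
    )
    return background_ok and named_ok


# What the query's FROM / FROM NAMED clauses reference.
requested = Dataset(
    background=frozenset({("ex:a", "ex:p", "ex:b")}),
    named={"ex:g1": frozenset({("ex:c", "ex:p", "ex:d")})},
)
# What a service actually queries: everything requested, plus more.
served = Dataset(
    background=frozenset({("ex:a", "ex:p", "ex:b"), ("ex:e", "ex:p", "ex:f")}),
    named={
        "ex:g1": frozenset({("ex:c", "ex:p", "ex:d")}),
        "ex:g2": frozenset({("ex:x", "ex:p", "ex:y")}),
    },
)
print(bounded_below_by(served, requested))
print(bounded_below_by(requested, served))
```

Under option (a), a service may answer over any dataset that is bounded below by the referenced graphs in this sense; under option (b), the served dataset would have to equal the requested one.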

The rdfSemantics issue was closed in our 2006-01-26 meeting over the objection of Pat Hayes, which was that the definitions are overly complex.
This issue arose from comments on the specification of matching in the July 2005 SPARQL draft with respect to the definition of RDF simple entailment. After discussing a number of use cases and design alternatives, the WG chose a design that is phrased in terms of entailment, so that it should extend to OWL more straightforwardly, but that is not substantively different from the July 2005 draft. After discussing the details of the definitions for some months, the chair observed a critical mass around a set of definitions and put the question despite outstanding dissent.
On 22 February, Peter F. Patel-Schneider sent comments on Section 1 and Section 2 of SPARQL Query Language for RDF:
In general I found the first two sections of the document very hard to understand. The mixing of definitions, explanation, information, etc. confused me over and over again. I strongly suggest an organization something like:
- Introduction (informative)
- Formal development (normative)
  - Underlying notions (normative)
  - Patterns and matching (normative)
- SPARQL syntax (normative)
- Informal narrative (informative)
- Examples (informative)
I also found that things that didn't need to be explained were explained, and things that did need to be explained were not explained. A major example of the latter is the role of the scoping graph. Examples showing why E-matching is defined the way it is would be particularly useful.

Because of the problems I see in Section 2, I do not feel that I can adequately understand the remainder of the document.

Because of these problems I do not feel that this document should be advanced to the next stage in the W3C recommendation process without going through another last-call stage.
Our response of 22 March is:

After perhaps overly brief consideration of your comments, we are somewhat sympathetic to your concerns about organization and clarity; however, we also have schedule considerations and the investment in other reviewers. Re-organizing the document at this stage would delay things considerably; it's not even clear that we could get a sufficient number of reviewers to take another look before CR.

The specific examples you give below are very valuable; I am marking this thread [needstest], which allows us to find it more easily during CR and integrate the examples you give into our test suite. We have also discussed the possibility of significant organizational changes after CR, such as moving the formal definitions to the back of the document.

As far as I can tell, all of the examples you give are useful clarification questions, but they do not demonstrate design errors. If they do, in fact, demonstrate design errors, I'm reasonably confident we will discover that as we integrate them into our test suite during CR.

Are you, by chance, satisfied by this response, which does not involve making the changes you request at this time, but includes an offer to give them due consideration after we request CR? If not, there's no need to reply; I'm marking this comment down as outstanding dissent unless I hear otherwise.

On 5 March, Elliotte Harold asked that we "don't use ? and $. Pick one." He was not satisfied by our attempts to justify our decision as part of the punctuationSyntax issue:

> >> A number of design considerations were laid out in:
> >> Draft: open issues around '?' use.
> >> 
> 
> I think this makes some good arguments for using a $ instead of a ?. 
> However it doesn't convince me that using both is a good idea. Why are 
> two characters considered necessary here? Why not just pick the $ and be 
> done with it?

The use of ?var syntax in SPARQL goes back all the way to the 1st
WD in October 2004
 
The number of reviewers, users, and implementors that we would need
to collaborate with in order to take ?var out is considerable, and
it's not clear that we have an argument that is sufficient to convince
them. True, allowing both adds various costs, but this is largely
sunk cost. The details of the specification are worked out; we have
test cases and multiple implementations. A growing number of users
have learned the ?var syntax, and those that need to use ODBC-style
systems seem to know about and be happy with $var.
It seems unlikely that we would get consensus around a change
to take out ?var or $var in a reasonable amount of time, and the
number of parties that are interested to see SPARQL advance to
Candidate Rec soon is considerable.

Again, please let us know whether you find this response satisfactory.
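To make the point of the exchange above concrete: the ? and $ markers both introduce a variable, and the marker is not part of the variable's name, so ?name and $name denote the same variable. The sketch below extracts variable names from a query string with a deliberately naive regular expression. It is an assumption-laden illustration, not a SPARQL parser: it ignores string literals and IRIs that happen to contain ? or $, and it accepts only ASCII names. The function name variable_names is invented for this example.

```python
import re

# Naive pattern: a '?' or '$' marker followed by an ASCII identifier.
# The real grammar allows a wider range of characters in names.
VARIABLE = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")


def variable_names(query: str) -> set:
    """Return the variable names used in a query, without their markers."""
    return set(VARIABLE.findall(query))


question_mark_form = "SELECT ?name WHERE { ?person foaf:name ?name }"
dollar_form = "SELECT $name WHERE { $person foaf:name $name }"

# Both spellings use the same two variables.
print(sorted(variable_names(question_mark_form)))
print(variable_names(question_mark_form) == variable_names(dollar_form))
```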

Amendments