
CommentResponse:JBolleman-1

From SPARQL Working Group

Hi Jerven,

This response aims to address the comments:

 http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011May/0016.html
 http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011May/0017.html

Dear workgroup,

I was recently made aware that, without adding an ORDER BY clause, there is no easy way to
get pagination that is guaranteed to work, e.g.:

     QUERY OFFSET 0 LIMIT 5     page 1
     QUERY OFFSET 5 LIMIT 5     page 2
     QUERY OFFSET 10 LIMIT 5    page 3

Adding any kind of ORDER BY clause would be enough to ensure that pagination works. I would
therefore like to see an ORDER BY * or ORDER BY ANY option, which would ensure that the
results come back in some implementation-specific order that can be used to page through
all possible results.

Trying ORDER BY * on a few current public SPARQL implementations showed that it is not
implemented. However, pagination with OFFSET and LIMIT but without an ORDER BY clause seems
to work as a naive user (e.g. me) would expect. This means that, for current SPARQL
implementers, supporting it would require no work beyond handling a slightly different
SPARQL grammar.

Pagination that is guaranteed to succeed would then look like this, e.g.:

     QUERY ORDER BY ANY OFFSET 0 LIMIT 5     page 1
     QUERY ORDER BY ANY OFFSET 5 LIMIT 5     page 2
     QUERY ORDER BY ANY OFFSET 10 LIMIT 5    page 3

The other option is to expand the description of the OFFSET clause; for example, using the
OFFSET clause should guarantee that query results come back in a consistent order.

I hope this concern makes sense.

Regards,
Jerven

You are right that the specification does not require pagination via OFFSET and LIMIT to be reproducible unless it is combined with a total order over the results.

First of all, note that a shortcut like "ORDER BY *", as you suggest, would not guarantee a predictable total order of results, because a) ORDER BY does not guarantee a total order (cf. http://www.w3.org/TR/sparql11-query/#modOrderBy), and b) when blank nodes are returned, for instance, two separate calls are not guaranteed to return the same blank node identifiers.
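To illustrate point a), here is a small Python simulation. It is only a sketch under stated assumptions: the data, the `execute` and `fetch_all_pages` helpers, and the idea that a second execution happens to see tied rows in reversed order are all hypothetical and do not describe any particular SPARQL engine. When the sort key has ties, each separately executed page request may break the ties differently, so rows can be repeated or skipped; adding a unique tiebreaker makes the order total and the pages consistent.

```python
# Sketch: ORDER BY on a non-unique key does not make OFFSET/LIMIT paging
# reproducible. Each page comes from a separate (hypothetical) execution;
# run 0 sees the rows in stored order, run 1 sees them reversed.

Row = tuple[str, int]  # (name, age): illustrative data only

ROWS: list[Row] = [("alice", 30), ("bob", 30), ("carol", 30), ("dave", 40)]


def execute(rows: list[Row], sort_key, run: int) -> list[Row]:
    """Simulated query run. sorted() is stable, so rows that compare equal
    under sort_key keep whatever order this run produced them in."""
    produced = rows if run % 2 == 0 else list(reversed(rows))
    return sorted(produced, key=sort_key)


def fetch_all_pages(rows: list[Row], sort_key, limit: int) -> list[Row]:
    """Emulate 'ORDER BY ... OFFSET n LIMIT limit', one execution per page."""
    result: list[Row] = []
    for run, offset in enumerate(range(0, len(rows), limit)):
        result.extend(execute(rows, sort_key, run)[offset:offset + limit])
    return result


def by_age(row: Row):
    return row[1]  # partial order: several rows tie on age 30


def by_age_then_name(row: Row):
    return (row[1], row[0])  # total order: name breaks every tie


partial = fetch_all_pages(ROWS, by_age, limit=2)
total = fetch_all_pages(ROWS, by_age_then_name, limit=2)
print([name for name, _ in partial])  # ['alice', 'bob', 'alice', 'dave']
print([name for name, _ in total])    # ['alice', 'bob', 'carol', 'dave']
```

With the partial order, "alice" is returned twice and "carol" never appears. Note that this only models point a); even a total order does not address point b), because blank node identifiers may differ between calls.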

Further, the proposed feature is beyond the scope of our current charter [1]. When we discussed which features to address in this round of SPARQL, a related proposal was on the table [2] but did not find a majority within the group. (Our selection of the additional features and extensions to be addressed in SPARQL 1.1 was based on support and available resources within the group.)

Note that another reason we do not support this as a query-language feature is that it should rather be considered a protocol issue: existing HTTP mechanisms are applicable, such as ETags for consistency and ranges for slicing. Client-side paging over a stream of results is also a candidate mechanism.
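As an illustration of the client-side option, here is a minimal Python sketch. It is not defined by any SPARQL specification: the `pages` function and the `fake_result_stream` generator are hypothetical stand-ins for rows parsed incrementally from a single query response. The query runs once, and the client splits that one result stream into pages, so every page comes from the same execution.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def pages(results: Iterable[T], size: int) -> Iterator[list[T]]:
    """Split ONE result stream into pages of at most `size` items.
    Paging happens on the client, so no row is repeated or skipped
    between pages of the same stream."""
    if size < 1:
        raise ValueError("page size must be positive")
    it = iter(results)
    while True:
        page = list(islice(it, size))
        if not page:
            return
        yield page


def fake_result_stream():
    # Hypothetical stand-in for solutions read from a single streamed response.
    for i in range(7):
        yield {"s": f"ex:item{i}"}


for n, page in enumerate(pages(fake_result_stream(), 3), start=1):
    print(n, [row["s"] for row in page])
# 1 ['ex:item0', 'ex:item1', 'ex:item2']
# 2 ['ex:item3', 'ex:item4', 'ex:item5']
# 3 ['ex:item6']
```

The trade-off is that the client (or an intermediary) has to keep the stream open, or buffer it, while the user moves between pages. That is flow control over one result, not a change to what the query means.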

Cursors, paging, transactions, etc. are about controlling the flow of results across multiple requests, not about defining the results.

This clear-cut separation is also important in the context of SPARQL 1.1 Update: interactions with updates, system restarts, or anything else that causes the server to lose state mean that simply re-executing the query could produce different answers, even with a deterministic query processor.
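The following Python sketch illustrates this point under explicit assumptions. `Store`, `fetch_all`, `StaleResults`, and the integer `version` are all hypothetical. The `version` plays the role an HTTP ETag could play, and is not an API of any SPARQL protocol. The sketch shows that an update between two page requests makes simple re-execution return an inconsistent result, even though each individual query is deterministic. It also shows how a per-response version tag lets the client detect the change.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory store; `version` changes on every update,
    standing in for an ETag attached to each response."""
    items: list[str] = field(default_factory=list)
    version: int = 0

    def insert(self, item: str) -> None:
        self.items.append(item)
        self.version += 1

    def query_page(self, offset: int, limit: int) -> tuple[int, list[str]]:
        """Re-execute an ordered query with OFFSET/LIMIT; return (version, rows)."""
        return self.version, sorted(self.items)[offset:offset + limit]


class StaleResults(Exception):
    pass


def fetch_all(store, limit, between_pages=None, check_version=True):
    """Fetch pages by re-executing the query; optionally abort if the version changes."""
    rows, first_version, offset = [], None, 0
    while True:
        version, page = store.query_page(offset, limit)
        if first_version is None:
            first_version = version
        elif check_version and version != first_version:
            raise StaleResults(f"data changed: version {first_version} -> {version}")
        if not page:
            return rows
        rows.extend(page)
        offset += limit
        if between_pages:
            between_pages(store)  # e.g. a concurrent update between requests


def make_one_time_insert(item):
    done = False

    def hook(store):
        nonlocal done
        if not done:
            store.insert(item)
            done = True

    return hook


store = Store(items=["b", "c", "d"])
naive = fetch_all(store, 2, make_one_time_insert("a"), check_version=False)
print(naive)  # ['b', 'c', 'c', 'd']: 'c' repeated, 'a' never returned

store = Store(items=["b", "c", "d"])
try:
    fetch_all(store, 2, make_one_time_insert("a"))
except StaleResults as e:
    print(e)  # data changed: version 0 -> 1
```

Detecting the change does not recover a consistent answer; the client still has to restart. That is why the group treats this as a protocol concern rather than part of the query language.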

We are therefore afraid that, at this stage and with the remaining resources of the working group, we will not be able to address this suggestion.

With best regards,
Axel, on behalf of the SPARQL WG

[1] http://www.w3.org/2009/05/sparql-phase-II-charter.html

[2] http://www.w3.org/2009/sparql/wiki/Feature:Cursors