Feature:LimitPerResource
Feature: LIMIT per resource
Feature description
The current LIMIT keyword can only limit a result to a fixed number of solutions; it cannot limit by the number of distinct resources in "object-oriented queries". We suggest a simple extension of the LIMIT solution modifier to remedy this. This is probably closely related to grouping and aggregates.
Example
If we write
select ?post ?title ?author ?date where { ?post a sioc:Post ; dc:title ?title ; sioc:has_creator ?author ; dc:created ?date } LIMIT 10
in current SPARQL, this retrieves 10 different posts from SIOC data only if there is exactly one author, title, and date per post. If, for instance, one post in the store has 3 different authors, I will retrieve only 8 distinct ?post values, although still 10 different (post, title, author, date) solution tuples.
Could there be something like
select ?post ?title ?author ?date where { ?post a sioc:Post ; dc:title ?title ; sioc:has_creator ?author ; dc:created ?date } LIMIT ?post 10
that will in that case indeed retrieve 10 posts (and 12 quadruples)?
Alternative syntaxes could be
LIMIT 10 ON ?post
or, using grouping,
LIMIT 10 GROUP BY ?post
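To make the difference in this example concrete, here is a minimal Python sketch that models solutions as dictionaries. The function names (plain_limit, limit_per_resource), the sample data, and the reading of the proposed LIMIT ?post 10 as "keep every row for the first 10 distinct bindings of ?post" are assumptions made for illustration; they are not taken from any SPARQL specification or implementation.

```python
from itertools import islice


def plain_limit(solutions, n):
    """Current LIMIT behaviour: keep the first n solution rows."""
    return list(islice(solutions, n))


def limit_per_resource(solutions, var, n):
    """Hypothetical 'LIMIT ?var n': keep every row whose binding for
    var is among the first n distinct values encountered."""
    seen = set()
    kept = []
    for row in solutions:
        value = row[var]
        if value not in seen:
            if len(seen) >= n:
                continue  # row belongs to an (n+1)-th resource: skip it
            seen.add(value)
        kept.append(row)
    return kept


def sample_solutions(num_posts=15):
    """Made-up data: post0 has three authors, every other post has one."""
    rows = []
    for i in range(num_posts):
        authors = ["alice", "bob", "carol"] if i == 0 else ["alice"]
        for author in authors:
            rows.append({"post": f"post{i}", "title": f"Title {i}",
                         "author": author, "date": "2009-01-01"})
    return rows


rows = sample_solutions()
plain = plain_limit(rows, 10)
per_post = limit_per_resource(rows, "post", 10)
print(len(plain), len({r["post"] for r in plain}))        # rows, distinct posts
print(len(per_post), len({r["post"] for r in per_post}))  # rows, distinct posts
```

With this sample data, the plain limit yields 10 rows covering 8 distinct posts, while the per-resource limit yields 12 rows covering 10 distinct posts, matching the numbers in the example.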
Existing Implementation(s)
We are not aware of any existing implementations, but the feature seems easy enough to implement, especially if a grouping mechanism were added to the language.
Existing Specification / Documentation
N/A
Compatibility
An extension allowing LIMIT over groups/resources should be fully upward compatible.
Links to postponed Issues
The feature duplicates the functionality of nested queries. The semantics of aggregates and DISTINCT need to be worked out in detail. LIMIT usually assumes ORDER BY. (If someone has more hints, please add them.)
Related Use Cases/Extensions
GROUP BY and aggregates are probably related.
On IRC, Andy points out that this can be accomplished with SubSelects:
SELECT ?post ?title ?author ?date {
  { SELECT ?post { ?post a sioc:Post } LIMIT 10 }
  ?post dc:title ?title ;
        sioc:has_creator ?author ;
        dc:created ?date .
}
The approach of using sub-selects applies to CONSTRUCT and DESCRIBE queries as well.
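The sub-select approach can be pictured as a two-step evaluation: pick the resources first, then join them with the remaining pattern. The Python sketch below models that under stated assumptions: the helper name subselect_limit and the data are made up, and list order stands in for whatever order the inner SELECT happens to return. It also shows one consequence worth keeping in mind: a post chosen by the inner LIMIT that has no match for the outer pattern contributes no solutions, so fewer than n posts may come back.

```python
def subselect_limit(typed_posts, outer_rows, n):
    """Mimic { SELECT ?post { ?post a sioc:Post } LIMIT n } joined
    with the outer pattern on ?post."""
    chosen = set(typed_posts[:n])  # inner SELECT ... LIMIT n
    return [row for row in outer_rows if row["post"] in chosen]  # join


# Made-up data: p2 is typed as a post but has no author,
# so the outer pattern produces no row for it.
TYPED_POSTS = ["p1", "p2", "p3"]
OUTER_ROWS = [
    {"post": "p1", "author": "alice"},
    {"post": "p1", "author": "bob"},
    {"post": "p3", "author": "carol"},
]

result = subselect_limit(TYPED_POSTS, OUTER_ROWS, 2)
print(result)
```

Here the inner step selects p1 and p2, but only the two rows for p1 survive the join.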
Champions
- Alexandre Passant / DERI
- Kjetil Kjernsmo / Computas AS
Use Cases
- Retrieving a certain number of posts along with potentially multi-valued ancillary information about them (see above examples)
Computas feels this is a very important feature, as the lack of it has caused some very time-consuming problems. The core problem has two aspects. First, applications often have a concept of "number of hits" or "retrieve a certain number of records". Second, the RDF model is most useful for heterogeneous data. Thus, there is no predictable number of properties linked to a "record", and it is therefore impossible to predict the total number of solutions relative to the number of "records".
More concretely, we have queries of the type (we don't have exactly this query, but this exemplifies the problem well):
CONSTRUCT {
  ?resource a sioc:Item ;
            dct:title ?title ;
            dct:subject ?class .
  ?class a skos:Concept ;
         skos:prefLabel ?label .
}
WHERE {
  ?resource a ?class ;
            dct:title ?title ;
            dct:language <http://www.lingvoj.org/lang/no> .
  ?class rdfs:label ?label .
}
The idea here is that OWL is used to classify items in the backend, but exposed to the frontend as SKOS-modelled concepts, and also each item is exposed as a sioc:Item, irrespective of the type it had in the backend.
There can be thousands of matching items, so retrieving them all takes a long time and is a general waste of resources. A paging solution is therefore needed; in SQL you would implement this easily with LIMIT and OFFSET. However, as each item can have any number of classes, this is not possible in the simple example above unless we can say LIMIT ?resource 10.
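The paging problem can be sketched in Python as follows. The function page_of_resources and the sample data are hypothetical, and counting the offset in resources rather than in rows is an assumption about how paging would combine with the proposed LIMIT ?resource form; the proposal itself does not define it.

```python
def page_of_resources(solutions, var, limit, offset=0):
    """Return all rows for one page of distinct values of var.

    Models the hypothetical 'LIMIT ?var limit' with an offset counted
    in resources rather than rows (an assumption of this sketch).
    """
    # Distinct bindings of var, in order of first appearance.
    distinct = list(dict.fromkeys(row[var] for row in solutions))
    page = set(distinct[offset:offset + limit])
    return [row for row in solutions if row[var] in page]


# Made-up solutions of the WHERE clause: one row per (resource, class)
# pair, with a different number of classes per resource.
CLASSES_PER_RESOURCE = {"r0": 1, "r1": 3, "r2": 2, "r3": 1, "r4": 2}
SOLUTIONS = [
    {"resource": res, "class": f"class{i}"}
    for res, count in CLASSES_PER_RESOURCE.items()
    for i in range(count)
]

for page_number in range(3):
    rows = page_of_resources(SOLUTIONS, "resource",
                             limit=2, offset=2 * page_number)
    print(page_number, sorted({r["resource"] for r in rows}), len(rows))
```

With a page size of 2 resources, the three pages contain 4, 3, and 2 rows respectively, which is why a row-based LIMIT and OFFSET cannot produce stable pages of "records".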
With DESCRIBEs, there is another concern. DESCRIBEs can return a number of triples that is impossible to foresee before running the query. Typically, you'd like all properties and objects of a certain number of subjects, so for example given the data:
</dahut> a foaf:Image ;
    dc:title "A picture of a Dahut"@en ;
    dc:description "Look at the legs!"@en .
</foo> a foaf:Image ;
    dc:title "There, behind the clouds"@en .
</bar> a foaf:Image ;
    dc:title "Standing at the bar"@en .
and this query
DESCRIBE ?resource WHERE { ?resource dc:title ?title . } ORDER BY ?title LIMIT ?resource 2
would return
</dahut> a foaf:Image ;
    dc:title "A picture of a Dahut"@en ;
    dc:description "Look at the legs!"@en .
</bar> a foaf:Image ;
    dc:title "Standing at the bar"@en .
since there are 3 subjects that can be bound to ?resource and the first two are taken according to the ORDER BY clause.
This has the same motivation as the CONSTRUCT example above: it enables you to think about each foaf:Image as a record, rather than a solution.
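The DESCRIBE example above can be replayed with a small Python sketch. Several things here are assumptions: triples are plain tuples with language tags dropped, the function name describe_limited is made up, and DESCRIBE is modelled as "all triples with the chosen subject", which follows the wording above but is only one possible behaviour, since what DESCRIBE returns is left to the implementation.

```python
TRIPLES = [
    ("/dahut", "rdf:type", "foaf:Image"),
    ("/dahut", "dc:title", "A picture of a Dahut"),
    ("/dahut", "dc:description", "Look at the legs!"),
    ("/foo", "rdf:type", "foaf:Image"),
    ("/foo", "dc:title", "There, behind the clouds"),
    ("/bar", "rdf:type", "foaf:Image"),
    ("/bar", "dc:title", "Standing at the bar"),
]


def describe_limited(triples, n):
    """Sketch of: DESCRIBE ?resource WHERE { ?resource dc:title ?title }
    ordered by ?title, limited to n distinct ?resource bindings."""
    # WHERE + ORDER BY ?title: (title, subject) pairs sorted by title.
    matches = sorted((o, s) for s, p, o in triples if p == "dc:title")
    # Hypothetical LIMIT ?resource n: first n distinct subjects in that order.
    subjects = list(dict.fromkeys(s for _, s in matches))[:n]
    # DESCRIBE modelled as every triple whose subject was selected.
    return [t for t in triples if t[0] in subjects]


for triple in describe_limited(TRIPLES, 2):
    print(triple)
```

Sorting by title puts /dahut first and /bar second, so the sketch returns the five triples about those two subjects and nothing about /foo.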