Feature:LimitPerResource

From SPARQL Working Group
Revision as of 17:56, 17 April 2009 by Aseaborne



Feature: LIMIT per resource

Feature description

The current LIMIT keyword is restricted in that it can only limit the total number of solutions, not the number of distinct resources returned by "object-oriented" queries. We suggest a simple extension of the LIMIT solution modifier to remedy this. This is probably closely related to grouping and aggregates.

Example

If we write

PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dc:   <http://purl.org/dc/terms/>
select ?post ?title ?author ?date
where {
?post a sioc:Post ;
  dc:title ?title ;
  sioc:has_creator ?author ;
  dc:created ?date
} LIMIT 10

in current SPARQL, this retrieves 10 different posts from SIOC data only if each post has exactly one author, title and date. If, for instance, one post in the store has 3 different authors and all three of its solutions fall within the limit, I will retrieve only 8 distinct values of ?post, although there are indeed 10 different (post, title, author, date) solution tuples.
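To make the counting concrete, here is a minimal Python sketch (an illustration, not code from any SPARQL engine) that models each solution as a dict from variable names to values, with plain strings standing in for RDF terms. The data is hypothetical: post1 has three authors, post2 to post12 have one each, and post1's three solutions are assumed to come first.

```python
from typing import Dict, List

# Assumption: a solution is a dict from variable name to value, with plain
# strings standing in for RDF terms.
Solution = Dict[str, str]


def limit(solutions: List[Solution], n: int) -> List[Solution]:
    """Standard LIMIT semantics: keep the first n solutions, whatever they bind."""
    return solutions[:n]


def sample_solutions() -> List[Solution]:
    """Hypothetical data: post1 has three authors, post2..post12 have one each."""
    sols: List[Solution] = []
    for author in ("alice", "bob", "carol"):
        sols.append({"post": "post1", "title": "T1", "author": author, "date": "D1"})
    for i in range(2, 13):
        sols.append({"post": f"post{i}", "title": f"T{i}", "author": "dave", "date": f"D{i}"})
    return sols


if __name__ == "__main__":
    page = limit(sample_solutions(), 10)
    print(len(page))                       # 10 solutions...
    print(len({s["post"] for s in page}))  # ...but only 8 distinct posts
```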

Could there be something like

select ?post ?title ?author ?date
where {
?post a sioc:Post ;
  dc:title ?title ;
  sioc:has_creator ?author ;
  dc:created ?date
} LIMIT ?post 10

that would in that case indeed retrieve 10 posts (and 12 solution tuples)?
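One possible reading of `LIMIT ?post 10` is sketched below in Python, under the same modelling assumptions (solutions as dicts, strings for RDF terms) plus one more: the limited variable is bound in every solution. It keeps every solution whose ?post value is among the first 10 distinct values met in solution order. The function name and sample data are hypothetical, not a proposed API.

```python
from typing import Dict, List, Set

Solution = Dict[str, str]


def limit_per_resource(solutions: List[Solution], var: str, n: int) -> List[Solution]:
    """Keep all solutions whose binding for `var` is one of the first n
    distinct values encountered; assumes `var` is bound in every solution."""
    kept: Set[str] = set()
    result: List[Solution] = []
    for sol in solutions:
        value = sol[var]
        if value not in kept:
            if len(kept) == n:
                continue  # an (n+1)-th distinct value: drop this solution
            kept.add(value)
        result.append(sol)
    return result


def sample_solutions() -> List[Solution]:
    """Hypothetical data: post1 has three authors, post2..post12 have one each."""
    sols: List[Solution] = []
    for author in ("alice", "bob", "carol"):
        sols.append({"post": "post1", "title": "T1", "author": author, "date": "D1"})
    for i in range(2, 13):
        sols.append({"post": f"post{i}", "title": f"T{i}", "author": "dave", "date": f"D{i}"})
    return sols


if __name__ == "__main__":
    rows = limit_per_resource(sample_solutions(), "post", 10)
    print(len(rows))                       # 12 solutions
    print(len({s["post"] for s in rows}))  # 10 distinct posts
```

In this sketch, solutions for an already-kept post are retained even if they appear late in the sequence; that is a design choice of the sketch, not something the proposal specifies.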

Alternative syntaxes could be

LIMIT 10 ON ?post

or use grouping

LIMIT 10 GROUP BY ?post

Existing Implementation(s)

We are not aware of existing implementations, but it seems easy enough to implement, especially if a grouping mechanism were added to the language.

Existing Specification / Documentation

N/A

Compatibility

An extension allowing LIMIT over groups/resources should be fully upward-compatible.

Links to postponed Issues

The feature duplicates the functionality of nested queries. Its semantics with respect to aggregates and DISTINCT need to be specified in detail. LIMIT usually assumes ORDER BY. (If you have more hints, please add them.)

Related Use Cases/Extensions

GROUP BY and aggregates are probably related.

On IRC, Andy points out that this can be accomplished with sub-selects:

 PREFIX sioc: <http://rdfs.org/sioc/ns#>
 PREFIX dc:   <http://purl.org/dc/terms/>
 SELECT ?post ?title ?author ?date
 {
   { SELECT ?post { ?post a sioc:Post } LIMIT 10 }
   ?post dc:title ?title ; sioc:has_creator ?author ; dc:created ?date .
 }

The approach of using sub-selects applies to CONSTRUCT and DESCRIBE queries as well.
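The sub-select version can be traced with a small Python sketch that evaluates the inner query first (the first n posts) and then joins the outer triple patterns. The triples are hypothetical, prefixed names are plain strings, and "first" here means order of appearance in the list; in SPARQL, without an ORDER BY, the inner LIMIT picks an arbitrary set of posts.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

# Hypothetical data: p1 has two authors, p3 has no dc:created value.
TRIPLES: List[Triple] = [
    ("p1", "rdf:type", "sioc:Post"), ("p1", "dc:title", "T1"),
    ("p1", "sioc:has_creator", "alice"), ("p1", "sioc:has_creator", "bob"),
    ("p1", "dc:created", "D1"),
    ("p2", "rdf:type", "sioc:Post"), ("p2", "dc:title", "T2"),
    ("p2", "sioc:has_creator", "carol"), ("p2", "dc:created", "D2"),
    ("p3", "rdf:type", "sioc:Post"), ("p3", "dc:title", "T3"),
    ("p3", "sioc:has_creator", "dave"),
    ("p4", "rdf:type", "sioc:Post"), ("p4", "dc:title", "T4"),
    ("p4", "sioc:has_creator", "erin"), ("p4", "dc:created", "D4"),
]


def objects(triples: List[Triple], s: str, p: str) -> List[str]:
    """All objects o such that (s, p, o) is in the data."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]


def inner_select(triples: List[Triple], n: int) -> List[Solution]:
    """{ SELECT ?post { ?post a sioc:Post } LIMIT n }"""
    posts = [s for (s, p, o) in triples if p == "rdf:type" and o == "sioc:Post"]
    return [{"post": post} for post in posts[:n]]


def outer_join(triples: List[Triple], inner: List[Solution]) -> List[Solution]:
    """Join each inner row with the title, creator and created patterns."""
    results: List[Solution] = []
    for row in inner:
        post = row["post"]
        for title in objects(triples, post, "dc:title"):
            for author in objects(triples, post, "sioc:has_creator"):
                for date in objects(triples, post, "dc:created"):
                    results.append({"post": post, "title": title,
                                    "author": author, "date": date})
    return results


if __name__ == "__main__":
    rows = outer_join(TRIPLES, inner_select(TRIPLES, 3))
    print(len(rows))                          # 3 rows: p1 twice, p2 once
    print(sorted({r["post"] for r in rows}))  # ['p1', 'p2']
```

All rows for the chosen posts come back, but a chosen post lacking one of the outer properties (p3 here) drops out of the join, so fewer than n posts may be returned. Adding those patterns to the inner query (with SELECT DISTINCT ?post), or making them OPTIONAL in the outer query, avoids that.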

Champions

Use Cases

  • Retrieving a certain number of posts along with potentially multi-valued ancillary information about them (see above examples)


Computas feels this is a very important feature, as its absence has caused some very time-consuming problems. The core problem has two aspects. First, applications often have a concept of "number of hits" or "retrieve a certain number of records". Second, the RDF model is most useful for heterogeneous data, so there is no predictable number of properties linked to a "record", and it is therefore impossible to predict the total number of solutions relative to the number of "records".

More concretely, we have queries of the following type (not exactly this query, but it exemplifies the problem well):

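PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
?resource a sioc:Item ;
         dct:title ?title ;
         dct:subject ?class .
?class    a skos:Concept ;
         skos:prefLabel ?label .
}
WHERE {
?resource a ?class ;
         dct:title ?title ;
         dct:language <http://www.lingvoj.org/lang/no> .
?class    rdfs:label ?label .
}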

The idea here is that OWL is used to classify items in the backend, while the classes are exposed to the frontend as SKOS-modelled concepts; in addition, each item is exposed as a sioc:Item, irrespective of the type it had in the backend.

There can be thousands of matching items, so retrieving them all takes a long time and wastes resources. A paging solution is therefore needed, which in SQL you would implement easily with LIMIT and OFFSET. However, since a resource can have any number of classes, and each class yields a separate solution, this is not possible in the simple example above unless we can write LIMIT ?resource 10.
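Paging would also need an offset counted in resources. Assuming a hypothetical `OFFSET ?resource k` companion to `LIMIT ?resource n` (the proposal itself only mentions the LIMIT form), a page could be computed as in this Python sketch over the WHERE-clause solutions. The item and class names are invented for illustration.

```python
from typing import Dict, List, Set

Solution = Dict[str, str]


def page_by_resource(solutions: List[Solution], var: str,
                     limit: int, offset: int) -> List[Solution]:
    """Hypothetical OFFSET ?var offset LIMIT ?var limit: keep the solutions for
    distinct `var` values number offset+1 .. offset+limit, counted in order
    of first appearance."""
    seen: Set[str] = set()
    distinct: List[str] = []
    for sol in solutions:
        if sol[var] not in seen:
            seen.add(sol[var])
            distinct.append(sol[var])
    wanted = set(distinct[offset:offset + limit])
    return [sol for sol in solutions if sol[var] in wanted]


# Hypothetical WHERE-clause solutions: items carry 1, 3, 2 and 1 classes.
SOLUTIONS: List[Solution] = [
    {"resource": "item1", "class": "ex:Report"},
    {"resource": "item2", "class": "ex:Report"},
    {"resource": "item2", "class": "ex:Memo"},
    {"resource": "item2", "class": "ex:Draft"},
    {"resource": "item3", "class": "ex:Memo"},
    {"resource": "item3", "class": "ex:Letter"},
    {"resource": "item4", "class": "ex:Letter"},
]

if __name__ == "__main__":
    for page in range(2):
        rows = page_by_resource(SOLUTIONS, "resource", limit=2, offset=2 * page)
        print(page, len(rows), sorted({r["resource"] for r in rows}))
    # 0 4 ['item1', 'item2']
    # 1 3 ['item3', 'item4']
```

By contrast, plain LIMIT 2 OFFSET 0 over the same solutions returns item1 and only one of item2's three rows, so item2 would be split across pages.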

With DESCRIBE queries there is another concern: they can return a number of triples that is impossible to foresee before running the query. Typically, you want all properties and objects of a certain number of subjects. For example, given the data:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc:   <http://purl.org/dc/terms/> .
</dahut> a foaf:Image ;
       dc:title "A picture of a Dahut"@en ;
       dc:description "Look at the legs!"@en .
</foo> a foaf:Image ;
       dc:title "There, behind the clouds"@en .
</bar> a foaf:Image ;
       dc:title "Standing at the bar"@en .

and this query

DESCRIBE ?resource WHERE {
  ?resource dc:title ?title .
}
ORDER BY ?title
LIMIT ?resource 2

would return

</dahut> a foaf:Image ;
       dc:title "A picture of a Dahut"@en ;
       dc:description "Look at the legs!"@en .
</bar> a foaf:Image ;
       dc:title "Standing at the bar"@en .

since three subjects can be bound to ?resource and the first two are taken according to the ORDER BY clause.
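The expected result can be checked with a short Python sketch that evaluates the WHERE clause over the example data, sorts by ?title, keeps the first two distinct ?resource values and then describes them. SPARQL leaves the exact output of DESCRIBE to the implementation; the sketch assumes it returns every triple whose subject is the resource, writes IRIs and prefixed names as plain strings, and drops the @en language tags.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# The example data, with language tags dropped for simplicity.
DATA: List[Triple] = [
    ("<dahut>", "rdf:type", "foaf:Image"),
    ("<dahut>", "dc:title", "A picture of a Dahut"),
    ("<dahut>", "dc:description", "Look at the legs!"),
    ("<foo>", "rdf:type", "foaf:Image"),
    ("<foo>", "dc:title", "There, behind the clouds"),
    ("<bar>", "rdf:type", "foaf:Image"),
    ("<bar>", "dc:title", "Standing at the bar"),
]


def describe_limited(data: List[Triple], n: int) -> List[Triple]:
    """DESCRIBE ?resource WHERE { ?resource dc:title ?title }
    ORDER BY ?title LIMIT ?resource n  (proposed syntax)."""
    # Evaluate the WHERE clause: one (resource, title) solution per title triple.
    solutions = [(s, o) for (s, p, o) in data if p == "dc:title"]
    # ORDER BY ?title (plain string order stands in for SPARQL ordering).
    solutions.sort(key=lambda sol: sol[1])
    # LIMIT ?resource n: the first n distinct resources in solution order.
    chosen: List[str] = []
    for resource, _title in solutions:
        if resource in chosen:
            continue
        if len(chosen) == n:
            break
        chosen.append(resource)
    # Assumed DESCRIBE behaviour: all triples with a chosen resource as subject.
    return [t for r in chosen for t in data if t[0] == r]


if __name__ == "__main__":
    for triple in describe_limited(DATA, 2):
        print(triple)
```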

This has the same motivation as the CONSTRUCT example above: it enables you to think of each foaf:Image as a record rather than as a solution.

References