Design:SubSelect

From SPARQL Working Group
Revision as of 12:49, 8 March 2011 by Aseaborne (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search


Definition and scope of feature

A SubSelect is the ability to nest a SELECT query inside SPARQL query pattern. The sub-select is evaluated and the results (a table) form part of the enclosing group, to be combined with other elements of the group.

Status

Lots of open questions, especially around various scoping issues. Not clear how much this section should reference Project Expressions or Aggregates.

  • ISSUE-5 : Can ASK queries go in FILTERs?
  • ISSUE-6 : Can SELECT queries go in FILTERs?
  • ISSUE-7 : Can CONSTRUCT and DESCRIBE queries go in FROM/FROM NAMED?
  • ISSUE-8 : Currently, tending to no separate dataset for sub-Select.
  • ISSUE-10 : Can ASK queries go in graph patterns?
  • ISSUE-13 : Can HAVING clauses have the same subqueries as FILTER clauses?

Syntax

The inlcuded sub-select is the same as a top-level SELECT except it excludes the prologue (BASE, PREFIX) and dataset description clause (FROM, FROM NAMED)

(This example includes aggregation and project expressions)

SubSelect ::= 'SELECT' ( 'DISTINCT' | 'REDUCED' )? 
              ( ( Var | AggExpression | BuiltInCall | FunctionCall | ( '(' Expression ( 'AS' Var )? ')' ) )+ | '*' )
              WhereClause
              SolutionModifier
GroupGraphPattern ::=  '{' ( SubSelect | GroupGraphPatternSub ) '}']
GroupGraphPatternSub ::=  ... as before for original GroupGraphPattern ...

The grammar as it stands does not allow for unambiguous parsing of projected expressions, so some (back compatible) extension to the grammar is required. Some choices for the syntax are:

The syntax used in the above grammar.

SELECT (?x+3 AS ?xp3) ?y WHERE ...

A form which depends on adding commas to expression lists, which was included in the list of "syntax enhancements" - a "time permitting" feature.

SELECT ?x+3 AS ?xp3, ?y WHERE ...

Equally there are choices for the bracketing syntax of the SubSelects, but the most popular choice seems to be a pair of { ... }, for example:

...
WHERE {
  ...
  {
    SELECT [Vars] WHERE {
      [SubSelect]
    }
  }
}

A.8 Grammar in SPARQL/Query 1.0.

Algebra operator

SPARQL/Query 1.0 works on a multiset of bindings during graph pattern matching and a sequence for the solution modfiers. Multisets don't preserve ordering whereas sequences can do.

The algebra changes to combine these two concepts: a table is a possibly ordered sequence of bindings. Graph pattern operations do not.

Alternative approach: There is already a toList operation that changes a multiset into a sequence of indeterminate order. Introduce a toMultiSet operator that takes the sequence output of sub-select and yields a multiset with the same cardinalities of members but loosing any ordering.

The alternative approach above my produce surprising results, eg in:

SELECT ?y ?name
WHERE {
  ?x foaf:knows ?y .
  {
    SELECT ?y SAMPLE(?name)
    WHERE {
      ?x foaf:name ?name . 
    }
    ORDER BY ?x
    GROUP BY ?name
  }
}

Will the user expect the ORDER BY to be preserved outside of the SubSelect {}s?

Mapping from AST to algebra

The mapping depends largely on the Operator chosen, and the used to toList etc.

If toList etc. is used then a possible mapping is

[ based on 12.2.1 of SPARQL/Query 1.0 ]

 Let FS := the empty set
 Let G := the empty pattern, Z, a basic graph pattern which is the empty set.

 For each element E in the GroupGraphPattern
    If E is of the form FILTER(expr)
        FS := FS set-union {expr}
    If E is of the form OPTIONAL{P}
    Then
        Let A := Transform(P)
        If A is of the form Filter(F, A2)
            G := LeftJoin(G, A2, F)
        else 
            G := LeftJoin(G, A, true)
+   If E is of the form SubSelect(S)
+      G := Join(G, toList(S)
    If E is any other form:
       Let A := Transform(E)
       G := Join(G, A)
   
  
 If FS is not empty:
   Let X := Conjunction of expressions in FS
   G := Filter(X, G)

 The result is G.

Also need to add algebra for GROUP BY and the aggregate functions we want to support - unless we have a separate aggregate section?

Test Cases

Data

people-a.ttl (default graph):

@prefix : <http://people.example/> .

:alice :name "Alice", "Alice Foo", "A. Foo" .
:alice :knows :bob, :carol . 
:bob :name "Bob", "Bob Bar", "B. Bar" .
:carol :name "Carol", "Carol Baz", "C. Baz" .

SubSelect with LIMIT

Return a name (the one with the lowest sort order) from all the people that know Alice and have a name.

PREFIX : <http://people.example/>
SELECT ?y ?name WHERE {
  :alice :knows ?y .
  {
    SELECT ?y ?name WHERE {
      ?y :name ?name
    }
    ORDER BY ?name
    LIMIT 1
  }
}

results:

?y          ?name
:bob        "B. Bar"
:carol      "C. Baz"

Note: the projection of ?y may be required if we want the result multiset from the SubSelect to be compatible with the outer graph pattern. This is related to a decision around what we want the scope of variables to be.

Addendum:

The query above does not address the use case: after the design of aggregate and sub-queries,

PREFIX  :     <http://people.example/>

SELECT  ?y ?name
WHERE
  { :alice :knows ?y
    { SELECT  ?y (MIN(?n) AS ?name)
      WHERE
        { ?y :name ?n }
      GROUP BY ?y
    }
  }

=== SubSelect with COUNT aggregate ===

Return the URIs of all the people Alice knows, and how many names they have (requires aggregates):

<pre>
PREFIX : <http://people.example/>
SELECT ?y ?names WHERE {
  :alice :knows ?y .
  {
    SELECT ?y (COUNT(?name) AS ?names) WHERE {
      ?y :name ?name
    }
    GROUP BY ?y
  }
}

results:

?y       ?names
:bob     3
:carol   3

SubSelect with GRAPH

A test of the expectations around the meaning of GRAPH in combination with SubSelects:

SELECT * WHERE {
  GRAPH <some-other.ttl> {
    {
      SELECT ?x WHERE {
        ?x ?p [] .
      }
    }
  }
}

results:

?x

Note: do we have/want to require syntax like GRAPH ?x {{ [SubSelect] }} or is GRAPH ?x { [SubSelect] } sufficient?