W3C

The SPARQL Algebra

This is not an editors working draft
$Id: rq24-algebra.html,v 1.15 2007/01/03 15:14:36 aseaborne Exp $
Editor:
Andy Seaborne, Hewlett-Packard Laboratories, Bristol

Status

This document has no status as a Data Access Working Group document whatsoever. It is merely workspace while developing the text for rq24.


Background

 

X SPARQL The SPARQL Algebra

The SPARQL Algebra defines the semantics of a SPARQL query execution. The algebraic expression is formed from the query string by first parsing, and then applying transformations as described below. The result of a query can then be calculated by the evaluation rules applied to the algebraic expression. This gives the correct results of  query -- it does not imply the actual execution of a query must be performed in this manner.

  1. Mapping a query string to a SPARQL abstract query
  2. SPARQL Algebra
  3. Evaluation Semantics

X.1 Mapping a SPARQL Query to a SPARQL Abstract Query

X.1.1 Mapping Graph Patterns

Parsing a SPARQL query string produces an syntax tree comprised of:

The result is a SPARQL abstract query that uses the symbols:

This is not the grammar in rq24. Because of the explicit procedure to map groups to Join/LeftJoin expressions, this more direct form can be used and it more accurately relates the the conversion procedure.

[20]   GroupGraphPattern   ::=   '{' GraphPatternElement '}'
[21]   GraphPatternElement   ::=   BasicGraphPattern? ( GraphPatternNotTriples '.'? GraphPatternElement )?
[22]   BasicGraphPattern   ::=   TriplesSameSubject ( '.' BasicGraphPattern? )?
[23]   GraphPatternNotTriples   ::=   OptionalGraphPattern | GroupOrUnionGraphPattern | GraphGraphPattern | Constraint
[24]   OptionalGraphPattern   ::=   'OPTIONAL' GroupGraphPattern
[25]   GraphGraphPattern   ::=   'GRAPH' VarOrIRIref GroupGraphPattern
[26]   GroupOrUnionGraphPattern   ::=   GroupGraphPattern ( 'UNION' GroupGraphPattern )*
[27]   Constraint   ::=   'FILTER' ( BrackettedExpression | BuiltInCall | FunctionCall )

Define "GraphPattern" as one of GroupGraphPattern, GroupOrUnionGraphPattern, GraphGraphPattern, OptionalGraphPattern and BasicGraphPattern.

Step 1 : BasicGraphPatterns

Replace all BasicGraphPattern elements by BGP(list of triple patterns)

Step 2 : GroupOrUnionGraphPattern

Replace any GroupOrUnionGraphPattern elements:

Step 3 : GraphGraphPattern

Map GRAPH IRI GroupGraphPattern to Graph(IRI, GroupGraphPattern)
Map GRAPH Var GroupGraphPattern to Graph(var, GroupGraphPattern)

Step 4 : GroupGraphPattern

We introduce the following symbols:

After parsing, a group pattern consists of a sequence of graph patterns and filters (Constraint).

A group pattern is mapped into the SPARQL algebra as follows:

This is the left-associative operator assignment, mapping optional to LeftJoin, adjacency to Join, and noting that LeftJoin can take a filter from its right (fixed) subelement.

Write: "A" for an algebra expression

Map all graph patterns contained in this group to give a list, SP, of algebra expressions
Let F := the conjunction of all filters appearing as elements of the GROUP expressions
Let G := the empty pattern, {}

for i := 0 ; i < length(SP); i++
   If SP[i] is an OPTIONAL,
      If SP[i] is of the form OPTIONAL(Filter(F, A))
           G := LeftJoin(G, A, F)
      else
           G := LeftJoin(G , A, true)
   Otherwise for expression SP[i], G := Join(G, SP[i])

If F is not empty:
  If G = empty pattern then G := Filter(F, empty pattern)
  If G = LeftJoin(A1, A2, true) then G := LeftJoin(A1, A2, F)
  If G = Join(A1, A2) then G := Filter(F, Join(A1, A2)
  If G = Union(A1, A2) then G := Filter(F, Union(A1, A2))
  If G = Graph(x, A) then G := Filter(F, Graph(x, A)) where x is a variable or IRI.
The result is G

Step 5 : Simplification

Groups of one graph pattern (not a filter) become join({}, A) and can be replaced by A

Replace join({}, A) by A

@@Join(A, {}) can also be removed.

X.1.2 Examples

The second form of a rewrite example is the first with empty group joins removed by step 5.

{ ?s ?p ?o }
BGP(?s ?p ?o)
{ ?s :p1 ?v1 ; :p2 ?v2 }
BGP( ?s :p1 ?v1 .?s :p2 ?v2 )
{ { ?s :p1 ?v1 } UNION {?s :p2 ?v2 } }
Union(Join({}, BGP(?s :p1 ?v1)), Join({}, BGP(?s :p2 ?v2)) )
Union( BGP(?s :p1 ?v1) , BGP(?s :p2 ?v2) )
{ { ?s :p1 ?v1 } UNION {?s :p2 ?v2 } UNION {?s :p3 ?v3 } }
Union(Join({}, BGP(?s :p1 ?v1)), Join({}, BGP(?s :p2 ?v2)) , Join({}, BGP(?s :p3 ?v3)) )
Union( Union( BGP(?s :p1 ?v1) , BGP(?s :p2 ?v2), BGP(?s :p3 ?v3)) ,
{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 } }
LeftJoin(Join({}, BGP(?s :p1 ?v1)), Join({}, BGP(?s :p1 ?v1)) ), true)
LeftJoin(BGP(?s :p1 ?v1), BGP(?s :p2 ?v2), true)
{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 } OPTIONAL { ?s :p3 ?v3 } }
LeftJoin(
     LeftJoin(
        BGP(?s :p1 ?v1),
        BGP(?s :p2 ?v2),
        true) ,
     BGP(?s :p3 ?v3),
     true)
{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 FILTER(?v1<3) } }
LeftJoin(Join({}, BGP(?s :p1 ?v1)), Join({}, BGP(?s :p1 ?v1)) ), (?v1<3) )
LeftJoin( BGP(?s :p1 ?v1) , BGP(?s :p2 ?v2) ), (?v1<3) )
{ {?s :p1 ?v1} UNION {?s :p2 ?v2} OPTIONAL {?s :p3 ?v3} }
LeftJoin(
  Union(BGP(?s :p1 ?v1), BGP(?s :p2 ?v2)) ,
  BGP(?s :p3 ?v3) ,
  true )
{ ?s :p1 ?v1} FILTER (?v1 < 3 ) OPTIONAL {?s :p3 ?v3} }
Filter( ?v1 < 3 ,
        LeftJoin( BGP(?s :p1 ?v1), BGP(?s :p2 ?v2), true) ,
       )

X.1.3 Solution Modifiers

Symbols after parsing:

ToList turns a multiset into a sequence with the same elements and cardinality. There is no implied ordering to the sequence; duplicates need not be adjacent.

Slice is the combination of Offset and Limit

where Mod is any of the modifiers above.

 

Step 1 : ToList

Set M = ToList(Pattern)

Step 2 : ORDER BY

If the query string has any ORDER BY clauses

M =  OrderBy(M, list of of order conditions)

Step 3 : Projection

M = Project(M, vars)

where vars is the set of variables mentioned in the SELECT clause or all named variables in the query if SELECT * used.

Step 4 : Distinct

If the query contains DISTINCT,

M = Distinct(M)

Step 5 : OFFSET and LIMIT

If the query mentions "OFFSET start" or "LIMIT length"

M = Slice(M, start, length)

start defaults to 0

length defaults to the (size(M)-start)

 

The overall abstract query is M.

X.2 SPARQL Algebra Operators

@@A note about multisets: set and a cardinality function

X.2.1 Algebra operators

These operators are not those of the SPARQL abstract query form. The SPARQL algebra operators of the same name are used to evaluate SPARQL abstract query nodes as described in the section "Evaluation Semantics"

Solution is a mapping.

Defn: Substitution function

A mapping μ from V to T is a partial function μ : V -> T.

The domain of μ, dom(μ) is the subset of V where μ is defined.

Write μ0 for the mapping such that dom(μ0) is the empty set.

Write Ω0 for the multiset consisting of exactly the empty mapping μ0, with cardinality 1.

Write μ(?x->t) for the substitution function mapping variable x to RDF term t : { (x, t) }

Write Ω(?x->t) for multiset consisting of exactly μ(?x->t), that is, { { (x, t) } } with cardinality 1.

Defn: Compatible Mappings

Two mappings μ1 and μ2 are compatible if, for every variable v in dom(μ1) and in dom(μ2), μ1(v) = μ2(v).

If μ1 and μ2 are compatible then μ1 set-union μ2 is also a mapping.

Write merge(μ1, μ2) for μ1 set-union μ2

Write card[Ω](μ) for the cardinality of μ in a multiset of mappings Ω.

@@Cardinality function is not defined for μ not in Ω.

@@ Define expr(μ)

Defn: Filter

Let Ω be a multiset of mappings and expr be an expression. We define:

Filter(expr, Ω) = { μ | μ in Ω and expr(μ) is an expression that has a boolean effective value of true }

card[Filter(expr, Ω)](μ) = card[Ω](μ)

Defn: Join

Let Ω1 and Ω2 be multisets of mappings. We define:

Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1and μ2 in Ω2, and μ1 and μ2 are compatible }

card[Join(Ω1, Ω2)](μ) = sum over μ in (Ω1 set-union Ω2), card[Ω1](μ1)*card[Ω2](μ2)

Defn: Diff

Let Ω1 and Ω2 be multisets of mappings. We define:

Diff(Ω1, Ω2, expr) =
        { μ | μ in Ω1 such that for all μ′ in Ω2, μ and μ′ are not compatible }
            set-union
        { μ | μ in Ω1 such that for all μ′ in Ω2, μ and μ' are compatible and expr(merge(μ, μ')) is false }

card[Diff(Ω1, Ω2, expr)](μ) = card[Ω1](μ)

Defn: LeftJoin

Let Ω1 and Ω2 be multisets of mappings and F a filter. We define:

LeftJoin(Ω1, Ω2, expr) = filter(expr, Join(Ω1, Ω2)) set-union diff(Ω1, Ω2, expr)

card[LeftJoin(Ω1, Ω2, expr)](μ) = card[filter(expr, Join(Ω1, Ω2))](μ) + card[diff(Ω1, Ω2, expr)](μ)

Written in full that is:

LeftJoin(Ω1, Ω2, expr) =
        { merge(μ1, μ2) | μ1 in Ω1and μ2 in Ω2, and μ1 and μ2 are compatible and expr(merge(μ1, μ2)) is true }
            set-union
        { μ1 | μ1 in Ω1and μ2 in Ω2, and μ1 and μ2 are not compatible }
            set-union
        { μ1 | μ1 in Ω1and μ2 in Ω2, and μ1 and μ2 are compatible and expr(merge(μ1, μ2)) is false }

As these are distinct, the cardinality of LeftJoin is cardinality of these individual components of the definition.

Defn: Union

Let Ω1 and Ω2 be multisets of mappings. We define:

Union(Ω1, Ω2) = { μ | μ in Ω1 or μ in Ω2 }

card[Union(Ω1, Ω2)](μ) = card[Ω1](μ) + card[Ω2](μ)

Write [x | C] for a sequence of elements satisfying some condition C

Write card[L](x) to be the cardinality of x in L

Defn: ToList

Let Ω be a multiset of mappings.  We define:

ToList(Ω) = a sequence of mappings μ in Ω in any order, with card[Ω](μ) occurrences of μ

ToList(Ω) = [ μ | μ in Ω] ; card[ToList(Ω)](μ) = card[Ω](μ)

Defn: OrderBy

Let Ψ be a sequence of mappings.  We define:

OrderBy(Ψ, condition) = [ μ | μ in Ψ and the sequence satisfies the ordering condition]

card[OrderBy(Ψ, condition)](μ) = card[Ψ](μ)

Defn: Distinct

Let Ψ be a sequence of mappings.  We define:

Distinct(Ψ) = [ μ | μ in Ψ] ; card[Distinct(Ψ)](μ) = 1 ;

The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

Defn: Project

Let Ψ be a sequence of mappings and V a set of variables.

For mapping μ, write P(μ, V) to be the restriction of μ to variables in V.

Project(Ψ, V)[i]  = P(Ψ[i], V)

Defn: Slice

Let Ψ be a sequence of mappings.  We define:

Slice(Ψ, start, length)[i] = Ψ[start+i] for i = 0 to (length-1)

X.3 Evaluation Semantics

We define eval(D(G), graph pattern) as the evaluation of a graph pattern with respect to a dataset D having active graph G. The active graph is initially the default graph. The active graph is used to match the pattern unless otherwise stated.

@@Note the change from Join(pattern, pattern) (introduced parse symbols) to Join(mapping multiset, mapping multiset) etc.

D : a dataset
D(G) : D a dataset with active graph G (the one patterns match against)
D[i] : The graph with IRI i in dataset D
D[DFT] : the default graph of D
Defn: Evaluation of Join(P1, P2, F)
eval(D(G), Join(P1, P2)) = Join(eval(D(G), P1), eval(D(G), P2))
eval(D(G), Join(P1, P2), F) = Filter(F, Join( eval(D(G), P1), eval(D(G), P2) ) )
Defn: Evaluation of LeftJoin(P1, P2)
eval(D(G), LeftJoin(P1, P2, F)) = LeftJoin(eval(D(G), P1), eval(D(G), P2), F)
Defn: Evaluation of a Basic Graph Pattern

Section 2.5

Defn: Evaluation of a Union Pattern
eval(D(G), Union(P1,P2)) = Union(eval(D(G), P1), eval(D(G), P2))
Defn: Evaluation of a Graph Patten
eval(D(G), Graph(IRI,P)) = eval(D(D[i]), P)
eval(D(G), Graph(var,P)) =
    multiset-union over IRI i in D : join( eval(D(D[i]), P) , Ω(?v->i) )
Defn: Evaluation of ToList(L)
eval(List(P), Graph(IRI,P)) = ToList(eval(D[DFT], Graph(IRI,P))
Defn: Evaluation of Distinct(L)
eval(Distict(L)) = Distinct(eval(L))
Defn: Evaluation of Project(L, vars)
eval(Project(L, vars)) = Project(eval(L), vars)
Defn: Evaluation of OrderBy(L, expression)
eval(OrderBy(L, condition)) = OrderBy(eval(L), condition)
Defn: Evaluation of Slice(L, start, length)
eval(Slice(L, start, length)) = Slice(eval(L), start, length)

$Log: rq24-algebra.html,v $
Revision 1.15  2007/01/03 15:14:36  aseaborne
Corrected from Fred Zemke (28 Dec 2006)

Revision 1.14  2006/12/15 18:09:07  aseaborne
Changes in response to

2006OctDec/0206
as noted in
2006OctDec/0208