I just received a question about SPARQL parsing and algebra and thought I'd share the answer with the world. Given a query:
SELECT DISTINCT ?title WHERE { ?s1 <p1> ?o1 . ?s1 <p2> ?o2 . OPTIONAL { ?s1 <p3> ?o3 . OPTIONAL { ?s1 <p4> ?o4 . } FILTER (!bound(?o4)) } FILTER (!bound(?o3)) }
, the ARQ SPARQL Validator gives:
1 (base <http://example/base/> 2 (distinct 3 (project (?title) 4 (filter (! (bound ?o3)) 5 (leftjoin 6 (bgp 7 (triple ?s1 <p1> ?o1) 8 (triple ?s1 <p2> ?o2) 9 ) 10 (leftjoin 11 (bgp (triple ?s1 <p3> ?o3)) 12 (bgp (triple ?s1 <p4> ?o4))) 13 (! (bound ?o4)))))))
The question was: why is that "(! (bound ?o4) )
" down on line 13 instead of between line 9 and line 10?
The short answer is that it's a parameter of the outer OPTIONAL
.
Were the outer OPTIONAL
not optional, "(! (bound ?o4) )
" would indeed be on line 10.
The longer answer follows:
Substituting Basic Graph Patterns with "{Pn}
" for readability, we have:
{ {P₁} OPTIONAL { {P₂} OPTIONAL { {P₃} } FILTER (F₁) } FILTER (F₂) }
By the grammar, a list of graph patterns { {A} {B} {C} }
parses into a left tree like:
⋅₂ / \ ⋅₁ C / \ A B
(where '⋅' represents a GraphGraphPattern and doesn't imply any op like ⋈ or ⋉).
Looking at the outer pattern, { {P₁} OPTIONAL {…} FILTER (F₂) }
, we get:
⋅ / \ ⋅ FILTER (F₁) / \ P₁ OPTIONAL(…)
The next one in, { {P₂} OPTIONAL {…} FILTER(F₁) }
, gives:
⋅ / \ ⋅ FILTER(F₂) / \ P₂ OPTIONAL(…)
In total, our parse tree looks like:
⋅₄ / \ ⋅₃ FILTER(F₁) / \ P₁ OPTIONAL(⋅₂) / \ ⋅₁ FILTER(F₂) / \ P₂ OPTIONAL(P₃)
Now we translate (AKA Transform) each of the '⋅'s to SPARQL Algebra per the GraphGraphPattern rule.
Naturally, this is a recursive definition, so the Transform(⋅₄)
needs the Transform(⋅₃)
, which needs Transform(⋅₂)
, etc.
Starting from the deepest recursion, we want to Transform(⋅₁)
.
Transform(⋅₁)
Per 18.2.2.5 Translate Graph Patterns, we start with some initial state:
Let FS := the empty set Let G := the empty pattern, Z, a basic graph pattern which is the empty set.
and examine each element in the pattern (i.e. both elements).
For each element E in the GroupGraphPattern
The left side of our pattern has E = P₂
, and P₂
doesn't match any of the
specially-handled patterns (FILTER, OPTIONAL, MINUS, BIND) in 18.2.2.5 :
If E is any other form Let A := Transform(E) G := Join(G, A)
E
is P₂
and per the initial state, G
is Z
. Per 18.2.2.3 Translate Basic Graph
Patterns
adjacent triple patterns are collected together to form a basic graph pattern BGP(triples)
so Transform(P₂)
is BGP(P₂)
. Per the 18.2.2.6 Simplification step, Join(Z, P₂)=P₂
so G = P₂
. Now for the next element E
in the GroupGraphPattern ⋅₁:
If E is of the form OPTIONAL{P} Let A := Transform(P) If A is of the form Filter(F, A2) … Else G := LeftJoin(G, A, true)
so Transform(⋅₁)
is (leftjoin P₂ P₃ true)
.
Transform (⋅₂)
The left side of ⋅₂
leaves with G = Transform(⋅₁) = (leftjoin P₂ P₃ true)
.
The right side gives us (filter (F₂) Transform(⋅₁))
by applying
If E is of the form FILTER(expr) FS := FS ∪ {expr}
and
If FS is not empty Let X := Conjunction of expressions in FS G := Filter(X, G)
so Transform (⋅₂)
is (filter (F₂) (leftjoin P₂ P₃ true))
.
Transform (⋅₃)
The left side leaves G = BGP(P₁)
.
The is an optional which matches the form Filter(F, A2)
. The right side
If E is of the form OPTIONAL{P} Let A := Transform(P) If A is of the form Filter(F, A2) G := LeftJoin(G, A2, F) Else …
gives us LeftJoin(G, (leftjoin P₂ P₃ true), F₂)
so Transform (⋅₃)
is LeftJoin(BGP(P₁), (leftjoin P₂ P₃ true), F₂)
.
Transform (⋅₄)
From the left side, G = LeftJoin(BGP(P₁), (leftjoin P₂ P₃ true), F₂). The right is a filter so we apply
If E is of the form FILTER(expr) FS := FS ∪ {expr}
and
If FS is not empty Let X := Conjunction of expressions in FS G := Filter(X, G)
so Transform (⋅₄)
is (filter F₁ LeftJoin(BGP(P₁), (leftjoin P₂ P₃ true), F₂))
.
Evaluation
Note that the evaluation of LeftJoin(G, A2, F) is a bit unusual in that an expression F is part of the join. In 18.4, the definition of LeftJoin
LeftJoin(Ω1, Ω2, expr) = Filter(expr, Join(Ω1, Ω2)) ∪ Diff(Ω1, Ω2, expr) Diff(Ω1, Ω2, expr) = { μ | μ in Ω1 such that ∀ μ′ in Ω2, either μ and μ′ are not compatible or μ and μ' are compatible and expr(merge(μ, μ')) has an effective boolean value of false }
effectively says that
for each row μ1 in Ω1, for each row μ2 in Ω2, if μ1 is compatible with μ2 AND F(μ1 μ2)==true add a row with the total bindings μ1 μ2 if no rows matched μ1, keep it anyways, otherwise, remove it.
There are other ways to implement the don't-delete-incompatible-rows semantics, but this illustrates the process.
This also allows you to have one join function which handles Join, LeftJoin and Minus (which requires one to delete only the rows that match) and filtered joins ((filter (join Ω1 Ω2))
).