Parsing and Simplifying OPTIONALs

I just received a question about SPARQL parsing and algebra and thought I'd share the answer with the world. Given a query:

SELECT DISTINCT ?title
WHERE {
  ?s1 <p1> ?o1 .
  ?s1 <p2> ?o2 .
  OPTIONAL {
    ?s1 <p3> ?o3 .
    OPTIONAL {
      ?s1 <p4> ?o4 .
    } FILTER (!bound(?o4))
  } FILTER (!bound(?o3))
}

, the ARQ SPARQL Validator gives:

  1 (base <http://example/base/>
  2   (distinct
  3     (project (?title)
  4       (filter (! (bound ?o3))
  5         (leftjoin
  6           (bgp
  7             (triple ?s1 <p1> ?o1)
  8             (triple ?s1 <p2> ?o2)
  9           )
 10           (leftjoin
 11             (bgp (triple ?s1 <p3> ?o3))
 12             (bgp (triple ?s1 <p4> ?o4)))
 13           (! (bound ?o4)))))))

The question was: why is that "(! (bound ?o4) )" down on line 13 instead of between line 9 and line 10? The short answer is that it's a parameter of the outer OPTIONAL. Were the outer OPTIONAL not optional, "(! (bound ?o4) )" would indeed be on line 10. The longer answer follows:

Substituting Basic Graph Patterns with "{Pn}" for readability, we have:

  {
    {P₁}
    OPTIONAL {
      {P₂}
      OPTIONAL {
        {P₃}
      } FILTER (F₁)
    } FILTER (F₂)
  }

By the grammar, a list of graph patterns { {A} {B} {C} } parses into a left tree like:

      ⋅₂
     / \
    ⋅₁  C
   / \
  A   B

(where '⋅' represents a GraphGraphPattern and doesn't imply any op like ⋈ or ⋉).
Looking at the outer pattern, { {P₁} OPTIONAL {…} FILTER (F₂) }, we get:

      ⋅
     / \
    ⋅   FILTER (F₁)
   / \
  P₁  OPTIONAL(…)

The next one in, { {P₂} OPTIONAL {…} FILTER(F₁) }, gives:

          ⋅
         / \
        ⋅   FILTER(F₂)
       / \
     P₂  OPTIONAL(…)

In total, our parse tree looks like:

      ⋅₄
     / \
    ⋅₃  FILTER(F₁)
   / \
  P₁  OPTIONAL(⋅₂)
              / \
             ⋅₁  FILTER(F₂)
            / \
          P₂  OPTIONAL(P₃)

Now we translate (AKA Transform) each of the '⋅'s to SPARQL Algebra per the GraphGraphPattern rule.

Naturally, this is a recursive definition, so the Transform(⋅₄) needs the Transform(⋅₃), which needs Transform(⋅₂), etc. Starting from the deepest recursion, we want to Transform(⋅₁).

Transform(⋅₁)

Per 18.2.2.5 Translate Graph Patterns, we start with some initial state:

Let FS := the empty set
Let G := the empty pattern, Z, a basic graph pattern which is the empty set.

and examine each element in the pattern (i.e. both elements).

For each element E in the GroupGraphPattern

The left side of our pattern has E = P₂, and P₂ doesn't match any of the specially-handled patterns (FILTER, OPTIONAL, MINUS, BIND) in 18.2.2.5 :

    If E is any other form 
        Let A := Transform(E)
        G := Join(G, A)

E is P₂ and per the initial state, G is Z. Per 18.2.2.3 Translate Basic Graph Patterns

      adjacent triple patterns are collected together to form a basic graph pattern BGP(triples)

so Transform(P₂) is BGP(P₂). Per the 18.2.2.6 Simplification step, Join(Z, P₂)=P₂ so G = P₂. Now for the next element E in the GroupGraphPattern ⋅₁:

    If E is of the form OPTIONAL{P} 
        Let A := Transform(P)
        If A is of the form Filter(F, A2)
            …
        Else 
            G := LeftJoin(G, A, true)

so Transform(⋅₁) is (leftjoin P₂ P₃ true).

Transform (⋅₂)

The left side of ⋅₂ leaves with G = Transform(⋅₁) = (leftjoin P₂ P₃ true). The right side gives us (filter (F₂) Transform(⋅₁)) by applying

    If E is of the form FILTER(expr)
        FS := FS ∪ {expr}

and

If FS is not empty
    Let X := Conjunction of expressions in FS
    G := Filter(X, G)

so Transform (⋅₂) is (filter (F₂) (leftjoin P₂ P₃ true)).

Transform (⋅₃)

The left side leaves G = BGP(P₁). The is an optional which matches the form Filter(F, A2). The right side

    If E is of the form OPTIONAL{P} 
        Let A := Transform(P)
        If A is of the form Filter(F, A2)
            G := LeftJoin(G, A2, F)
        Else 
            …

gives us LeftJoin(G, (leftjoin P₂ P₃ true), F₂) so Transform (⋅₃) is LeftJoin(BGP(P₁), (leftjoin P₂ P₃ true), F₂).

Transform (⋅₄)

From the left side, G = LeftJoin(BGP(P₁), (leftjoin P₂ P₃ true), F₂). The right is a filter so we apply

    If E is of the form FILTER(expr)
        FS := FS ∪ {expr}

and

If FS is not empty
    Let X := Conjunction of expressions in FS
    G := Filter(X, G)

so Transform (⋅₄) is (filter F₁ LeftJoin(BGP(P₁), (leftjoin P₂ P₃ true), F₂)).

Evaluation

Note that the evaluation of LeftJoin(G, A2, F) is a bit unusual in that an expression F is part of the join. In 18.4, the definition of LeftJoin

LeftJoin(Ω1, Ω2, expr) = Filter(expr, Join(Ω1, Ω2)) ∪ Diff(Ω1, Ω2, expr)
Diff(Ω1, Ω2, expr) = { μ | μ in Ω1 such that ∀ μ′ in Ω2, either μ and μ′
are not compatible or μ and μ' are compatible and expr(merge(μ, μ')) has
an effective boolean value of false }

effectively says that

for each row μ1 in Ω1,
  for each row μ2 in Ω2,
    if μ1 is compatible with μ2 AND F(μ1 μ2)==true
      add a row with the total bindings μ1 μ2 
  if no rows matched μ1, keep it anyways,
  otherwise, remove it.

There are other ways to implement the don't-delete-incompatible-rows semantics, but this illustrates the process. This also allows you to have one join function which handles Join, LeftJoin and Minus (which requires one to delete only the rows that match) and filtered joins ((filter (join Ω1 Ω2))).