ARQ's SPARQL Algebra

In the process of defining a semantics for SPARQL, the SPARQL Algebra defines a not-so-abstract syntax for SPARQL. (Compare the abstract syntax of RDF Concepts, which says that "an RDF triple contains three components: the subject, which is an RDF URI reference or a blank node…" to the concrete syntax of SPARQL's algebra which makes references to e.g. "triple(s, p, o)".) This concrete syntax provides an alternative, machine-oriented language for SPARQL which:

The last step is an "almost" because the SPARQL algebra defines graph pattern forms, expressions and solution modifiers, but elides query forms (SELECT, CONSTRUCT, ASK, DESCRIBE) and specific syntaxes for terminals like IRIs, literals, blank nodes, projection positions ("Ψ"), and the separator parameter to GroupConcat. Algebraic forms of solutions, i.e. Ω (a solution set) and μ (a solution) need are not needed for a concrete syntax as they are only needed for evaluation.

The ARQ SPARQL Query Validator maps SPARQL queries to SPARQL Algebra, providing a complete concrete syntax which is a superset of the algebra used to define of SPARQL. This document inspects the ARQ SPARQL Algebra and proposes some modifications which would further enable this syntax as a query exchange format.

Query Forms

While neither the SPARQL nor the ARQ Algebra explicitly capture query forms, different query forms have somewhat differentiated ARQ Algebra representations:

SELECT <list>

SPARQL:
SELECT ?s ?o
 WHERE { ?s ?p ?o }
ARQ Algebra:
(base <http://example/base/>
  (project (?s ?o)
    (bgp (triple ?s ?p ?o))))

The algebra is expressed in s-expressions, leveraging ()s to convey scoping. The outer scoping of the ARQ Algebra provides a default (example) base IRI (which would likely be different from service to service). This base can be overridden with the BASE SPARQL directive.

The SELECT manifests as a projection of s and o from the results bound by the bgp.

SELECT *

SPARQL:
SELECT *
 WHERE { ?s ?p ?o }
ARQ Algebra:
(base <http://example/base/>
  (bgp (triple ?s ?p ?o)))

SELECT * effectively eliminates the projection, leaving the result of the nested WHERE clause as the results of the query. Note that this means that a subquery with no projection, grouping, or HAVING clause isn't treated as a subquery:

SPARQL:
SELECT ?s ?o
 WHERE {
  { SELECT *
     WHERE { ?s ?p ?o }
  }
 }
ARQ Algebra:
(base <http://example/base/>
  (project (?s ?o)
    (bgp (triple ?s ?p ?o))))

Of course, we cannot reconstruct the original query, but IRI prefix expansion, white-space normalization and case insensitivity already preclude that. This also has an impact on what can serve as a graph pattern (see Graph Patterns below).

CONSTRUCT

SPARQL:
CONSTRUCT { ?s <x> ?o }
   WHERE { ?s ?p ?o }
ARQ Algebra:
(base <http://example/base/>
  (bgp (triple ?s ?p ?o)))

CONSTRUCT has no syntactic distinction from SELECT *. A form like:

(construct
  (bgp (triple ?s ?p ?o))
  (bgp (triple ?s <x> ?o)))

would move the algebra towards an expressive closure, capturing not only the components of the query language which provide solution sets, but also what to do with that solution set. Upside: this would encourage the modularization of the parser. Downside: this makes the algebra modal, complicating composability (we can no longer nest just any algebra form inside another, e.g. (join (bgp …) (construct (bgp …)))).

DESCRIBE

The other form which returns RDF instead of a solution set is DESCRIBE. DESCRIBE may take a term list and a pattern as does SELECT:

SPARQL:
DESCRIBE ?s ?o
   WHERE { ?s ?p ?o }
ARQ Algebra:
(base <http://example/base/>
  (project (?s ?o)
    (bgp (triple ?s ?p ?o))))

This form of DESCRIBE has no syntactic distinction from SELECT <list> { … }. A form like:

(describe-all
  (project (?s ?o)
    (bgp (triple ?s <x> ?o))))

would leverage the existing project semantics not increase the algebra modality more than would CONSTRUCT.

The other form of DESCRIBE has no where clause:

SPARQL:
DESCRIBE <X> <Y>
ARQ Algebra:
(base <http://example/base/>
  (null))

This form of DESCRIBE has basically no footprint at all (the "null" the probably a consequence of a Java toString function). A form like:

(describe (<X> <Y>))

would provide a list of resources to describe and would be distinct from the describe-all form proposed above.

ASK

SPARQL:
ASK WHERE { ?s ?p ?o }
ARQ Algebra:
(base <http://example/base/>
  (bgp (triple ?s ?p ?o)))

ASK, like CONSTRUCT, has no syntactic distinction from SELECT *. A form like:

(ask
  (bgp (triple ?s ?p ?o)))

could produce a solution sequence with either one solution with no bindings (AKA (table unit) in the ARQ Algebra) or no solutions.

Graph Patterns

The SPARQL grammar has a set of graph patterns which are composed to form query patterns. These manifest in the SPARQL Algebra as BGPs, Joins, LeftJoins, Unions, Minuss, Binds and Filters. 18.2.4.4 SELECT Expressions transforms SubSelects to either the nested WHERE pattern or a projects of the WHERE pattern. Modeling this with a subset of the language, we can create a stratified grammar to parse these nested graph patterns:

Productions:

[1]    SliceOpt    ::=    Slice
| ProjectOpt
[2]    ProjectOpt    ::=    Project
| GroupGraphPatternNoSub
[3]    GroupGraphPatternNoSub    ::=    BGP
| Join
[4]    GroupGraphPatternSub    ::=    BGP
| Join
| SubSelect
[5]    SubSelect    ::=    Slice
| Project
[6]    Slice    ::=    "(" "slice" ProjectOpt ")"
[7]    Project    ::=    "(" "project" "(" (TERM)+ ")" GroupGraphPatternNoSub ")"
[8]    BGP    ::=    "(" "bgp" (Triple)+ ")"
[9]    Triple    ::=    "(" "triple" TERM TERM TERM ")"
[10]    Join    ::=    "(" "join" GroupGraphPatternSub GroupGraphPatternSub ")"
[11]    <TERM>    ::=    "?" ([a-z])+
| "_" ([a-z])+
| "<" ([^>])* ">"
| '"' ([^\"])* '"'
| "'" ([^'])* "'"
[12]    PASSED TOKENS    ::=    ([ \t\r\n])+

Note that eliminating the stratification introduces an ambiguity as (bgp ?s ?p ?o) could be parsed as either a BGP as a GroupGraphPattern, or as a BGP as a SubSelect in a GroupGraphPattern, or ... Stratification constraints the parsing to the former option.

Groups

The first argument to Group is a list of expressions supplied in a GROUP BY directive. The second argument is the list of grouped expressions appearing in the SELECT. The ARQ Algebra identifies the elements of that list with position identifiers of the form "?.n". This position identifier is referenced by the nesting project required with any GROUP BY:

SPARQL:
SELECT ?p (GROUP_CONCAT(?o ; SEPARATOR=",") AS ?con)
 WHERE { ?s ?p ?o }
 GROUP BY (?s) (?p)
   HAVING (?s) (?o)
ARQ Algebra:
(base <http://example/base/>
  (project (?p ?con)
    (filter (exprlist ?s ?o)
      (extend ((?con ?.0))
        (group (?s ?p) ((?.0 (group_concat (separator ',') ?o)))
          (bgp (triple ?s ?p ?o)))))))

SPARQL variables are not permitted to have a leading '.' so this positional identifier doesn't conflict with any other terminal in the ARQ Algebra.

Proposed Changes


Change Log:

$Id