In the process of defining a semantics for SPARQL, the SPARQL Algebra defines a not-so-abstract syntax for SPARQL. (Compare the abstract syntax of RDF Concepts, which says that "an RDF triple contains three components: the subject, which is an RDF URI reference or a blank node…" to the concrete syntax of SPARQL's algebra which makes references to e.g. "triple(s, p, o)".) This concrete syntax provides an alternative, machine-oriented language for SPARQL which:
- Provides a reference for testing SPARQL parsing and simplification.
- Simplifies discussions about specific query constructions.
- Almost modularizes the parsing and evaluation components of a SPARQL engine.
The last step is an "almost" because the SPARQL algebra defines graph pattern forms, expressions and solution modifiers, but elides query forms (SELECT
, CONSTRUCT
, ASK
, DESCRIBE
) and specific syntaxes for terminals like IRI
s, literals
, blank nodes
, projection positions ("Ψ"), and the separator
parameter to GroupConcat
.
Algebraic forms of solutions, i.e. Ω
(a solution set) and μ
(a solution) need are not needed for a concrete syntax as they are only needed for evaluation.
The ARQ SPARQL Query Validator maps SPARQL queries to SPARQL Algebra, providing a complete concrete syntax which is a superset of the algebra used to define of SPARQL. This document inspects the ARQ SPARQL Algebra and proposes some modifications which would further enable this syntax as a query exchange format.
Query Forms
While neither the SPARQL nor the ARQ Algebra explicitly capture query forms, different query forms have somewhat differentiated ARQ Algebra representations:
SELECT <list>
SELECT ?s ?o WHERE { ?s ?p ?o }
(base <http://example/base/> (project (?s ?o) (bgp (triple ?s ?p ?o))))
The algebra is expressed in s-expressions, leveraging ()
s to convey scoping.
The outer scoping of the ARQ Algebra provides a default (example) base IRI (which would likely be different from service to service).
This base can be overridden with the BASE
SPARQL directive.
The SELECT
manifests as a projection of s and o from the results bound by the bgp
.
SELECT *
SELECT * WHERE { ?s ?p ?o }
(base <http://example/base/> (bgp (triple ?s ?p ?o)))
SELECT *
effectively eliminates the projection, leaving the result of the nested WHERE clause as the results of the query.
Note that this means that a subquery with no projection, grouping, or HAVING
clause isn't treated as a subquery:
SELECT ?s ?o WHERE { { SELECT * WHERE { ?s ?p ?o } } }
(base <http://example/base/> (project (?s ?o) (bgp (triple ?s ?p ?o))))
Of course, we cannot reconstruct the original query, but IRI prefix expansion, white-space normalization and case insensitivity already preclude that. This also has an impact on what can serve as a graph pattern (see Graph Patterns below).
CONSTRUCT
CONSTRUCT { ?s <x> ?o } WHERE { ?s ?p ?o }
(base <http://example/base/> (bgp (triple ?s ?p ?o)))
CONSTRUCT
has no syntactic distinction from SELECT *
.
A form like:
(construct (bgp (triple ?s ?p ?o)) (bgp (triple ?s <x> ?o)))
would move the algebra towards an expressive closure, capturing not only the components of the query language which provide solution sets, but also what to do with that solution set.
Upside: this would encourage the modularization of the parser.
Downside: this makes the algebra modal, complicating composability (we can no longer nest just any algebra form inside another, e.g. (join (bgp …) (construct (bgp …)))
).
DESCRIBE
The other form which returns RDF instead of a solution set is DESCRIBE
.
DESCRIBE
may take a term list and a pattern as does SELECT
:
DESCRIBE ?s ?o WHERE { ?s ?p ?o }
(base <http://example/base/> (project (?s ?o) (bgp (triple ?s ?p ?o))))
This form of DESCRIBE
has no syntactic distinction from SELECT <list> { … }
.
A form like:
(describe-all (project (?s ?o) (bgp (triple ?s <x> ?o))))
would leverage the existing project
semantics not increase the algebra modality more than would CONSTRUCT
.
The other form of DESCRIBE
has no where clause:
DESCRIBE <X> <Y>
(base <http://example/base/> (null))
This form of DESCRIBE
has basically no footprint at all (the "null
" the probably a consequence of a Java toString function).
A form like:
(describe (<X> <Y>))
would provide a list of resources to describe and would be distinct from the describe-all
form proposed above.
ASK
ASK WHERE { ?s ?p ?o }
(base <http://example/base/> (bgp (triple ?s ?p ?o)))
ASK
, like CONSTRUCT
, has no syntactic distinction from SELECT *
.
A form like:
(ask (bgp (triple ?s ?p ?o)))
could produce a solution sequence with either one solution with no bindings (AKA (table unit)
in the ARQ Algebra) or no solutions.
Graph Patterns
The SPARQL grammar has a set of graph patterns which are composed to form query patterns.
These manifest in the SPARQL Algebra as BGP
s, Join
s, LeftJoin
s, Union
s, Minus
s, Bind
s and Filter
s.
18.2.4.4 SELECT Expressions transforms SubSelects to either the nested WHERE pattern or a projects of the WHERE pattern.
Modeling this with a subset of the language, we can create a stratified grammar to parse these nested graph patterns:
Productions: | |||
[1] | SliceOpt |
::= | Slice |
[2] | ProjectOpt |
::= | Project |
[3] | GroupGraphPatternNoSub |
::= | BGP |
[4] | GroupGraphPatternSub |
::= | BGP |
[5] | SubSelect |
::= | Slice |
[6] | Slice |
::= | "(" "slice" ProjectOpt ")" |
[7] | Project |
::= | "(" "project" "(" (TERM)+ ")" GroupGraphPatternNoSub ")" |
[8] | BGP |
::= | "(" "bgp" (Triple)+ ")" |
[9] | Triple |
::= | "(" "triple" TERM TERM TERM ")" |
[10] | Join |
::= | "(" "join" GroupGraphPatternSub GroupGraphPatternSub ")" |
[11] | <TERM > |
::= | "?" ([a-z])+ |
[12] | PASSED TOKENS |
::= | ([ \t\r\n])+ |
Note that eliminating the stratification introduces an ambiguity as (bgp ?s ?p ?o)
could be parsed as either a BGP
as a GroupGraphPattern
, or as a BGP
as a SubSelect
in a GroupGraphPattern
, or ...
Stratification constraints the parsing to the former option.
Groups
The first argument to Group is a list of expressions supplied in a GROUP BY
directive.
The second argument is the list of grouped expressions appearing in the SELECT
.
The ARQ Algebra identifies the elements of that list with position identifiers of the form "?.n".
This position identifier is referenced by the nesting project
required with any GROUP BY
:
SELECT ?p (GROUP_CONCAT(?o ; SEPARATOR=",") AS ?con) WHERE { ?s ?p ?o } GROUP BY (?s) (?p) HAVING (?s) (?o)
(base <http://example/base/> (project (?p ?con) (filter (exprlist ?s ?o) (extend ((?con ?.0)) (group (?s ?p) ((?.0 (group_concat (separator ',') ?o))) (bgp (triple ?s ?p ?o)))))))
SPARQL variables are not permitted to have a leading '.' so this positional identifier doesn't conflict with any other terminal in the ARQ Algebra.
Proposed Changes
-
Only provide the
base
context when the parsed SPARQL query provided a base.
Services or tools which leverage a generic parser would have their own default base contexts and would have to explicitly detect and strip out any default provided by a parser. -
Add syntactic forms for
construct
,describe
andask
.