W3C

SPARQL 1.1 Query Language

W3C Working Draft 1 June 2010

This version:
http://www.w3.org/TR/2010/WD-sparql11-query-20100601/
Latest version:
http://www.w3.org/TR/sparql11-query/
Previous version:
http://www.w3.org/TR/2010/WD-sparql11-query-20100126/
Editors:
Steve Harris, Garlik
Andy Seaborne, Talis Systems Limited <andy.seaborne@talis.com>
Previous Editor:
Eric Prud'hommeaux, W3C <eric@w3.org>

Please refer to the errata for this document, which may include some normative corrections.

The previous errata for this document are also available.

See also translations.


Abstract

RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports aggregation, subqueries, creating values by complex expressions, extensible value testing, and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The documents produced by this Working Group are:

This publication integrates the new features of SPARQL 1.1 into the main SPARQL Query specification. The structure of this document will change to fully integrate the new features. In this publication, new content is gathered together for ease of review of these new features.

The new features are:

No incompatibilities with existing valid SPARQL queries, in either syntax or results, will be introduced by these extensions to the language.

The design of the features presented here is work-in-progress and does not represent the final decisions of the working group. Implementers and application writers should not assume that the designs in this document will not change.

Comments on this document should be sent to public-rdf-dawg-comments@w3.org, a mailing list with a public archive. Questions and comments about SPARQL that are not related to this specification, including extensions and features, can be discussed on the mailing list public-sparql-dev@w3.org (public archive).

This document was produced by the SPARQL Working Group, which is part of the W3C Semantic Web Activity.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 Document Outline
    1.2 Document Conventions
        1.2.1 Namespaces
        1.2.2 Data Descriptions
        1.2.3 Result Descriptions
        1.2.4 Terminology
2 Making Simple Queries (Informative)
    2.1 Writing a Simple Query
    2.2 Multiple Matches
    2.3 Matching RDF Literals
        2.3.1 Matching Literals with Numeric Types
        2.3.2 Matching Literals with Arbitrary Datatypes
    2.4 Blank Node Labels in Query Results
    2.5 Creating Values with Expressions
    2.6 Building RDF Graphs
3 RDF Term Constraints (Informative)
    3.1 Restricting the Value of Strings
    3.2 Restricting Numeric Values
    3.3 Other Term Constraints
4 SPARQL Syntax
    4.1 RDF Term Syntax
        4.1.1 Syntax for IRIs
            4.1.1.1 Prefixed names
            4.1.1.2 Relative IRIs
        4.1.2 Syntax for Literals
        4.1.3 Syntax for Query Variables
        4.1.4 Syntax for Blank Nodes
    4.2 Syntax for Triple Patterns
        4.2.1 Predicate-Object Lists
        4.2.2 Object Lists
        4.2.3 RDF Collections
        4.2.4 rdf:type
5 Graph Patterns
    5.1 Basic Graph Patterns
        5.1.1 Blank Node Labels
        5.1.2 Extending Basic Graph Pattern Matching
    5.2 Group Graph Patterns
        5.2.1 Empty Group Pattern
        5.2.2 Scope of Filters
        5.2.3 Group Graph Pattern Examples
6 Including Optional Values
    6.1 Optional Pattern Matching
    6.2 Constraints in Optional Pattern Matching
    6.3 Multiple Optional Graph Patterns
7 Matching Alternatives
8 Negation
    8.1 Filtering Using Graph Patterns
        8.1.1 Testing For the Absence of a Pattern
        8.1.2 Testing For the Presence of a Pattern
    8.2 Removing bindings
    8.3 Relationship and difference between NOT EXISTS and MINUS
    8.4 Algebra Operators
        8.4.1 Algebra: EXISTS
        8.4.2 Algebra: MINUS
9 Property Paths
10 Aggregates
    10.1 Aggregate Example
    10.2 Algebra Operators
        10.2.1 Set Functions
        10.2.2 Mapping from Abstract Syntax to Algebra
11 Subqueries
12 RDF Dataset
    12.1 Examples of RDF Datasets
    12.2 Specifying RDF Datasets
        12.2.1 Specifying the Default Graph
        12.2.2 Specifying Named Graphs
        12.2.3 Combining FROM and FROM NAMED
    12.3 Querying the Dataset
        12.3.1 Named and Default Graphs
13 Solution Sequences and Modifiers
    13.1 ORDER BY
    13.2 Projection
    13.3 Duplicate Solutions
    13.4 OFFSET
    13.5 LIMIT
14 Query Forms
    14.1 SELECT
        14.1.1 Projection
        14.1.2 SELECT expressions
    14.2 CONSTRUCT
        14.2.1 Templates with Blank Nodes
        14.2.2 Accessing Graphs in the RDF Dataset
        14.2.3 Solution Modifiers and CONSTRUCT
    14.3 ASK
    14.4 DESCRIBE (Informative)
        14.4.1 Explicit IRIs
        14.4.2 Identifying Resources
        14.4.3 Descriptions of Resources
15 Testing Values
    15.1 Operand Data Types
    15.2 Filter Evaluation
        15.2.1 Invocation
        15.2.2 Effective Boolean Value (EBV)
    15.3 Operator Mapping
        15.3.1 Operator Extensibility
    15.4 Operators Definitions
        15.4.1 bound
        15.4.2 isIRI
        15.4.3 isBlank
        15.4.4 isLiteral
        15.4.5 str
        15.4.6 lang
        15.4.7 datatype
        15.4.8 logical-or
        15.4.9 logical-and
        15.4.10 RDFterm-equal
        15.4.11 sameTerm
        15.4.12 langMatches
        15.4.13 regex
        15.4.14 COALESCE
        15.4.15 IF
        15.4.16 IN
        15.4.17 NOT IN
        15.4.18 IRI
        15.4.19 BNODE
        15.4.20 STRDT
        15.4.21 STRLANG
    15.5 Constructor Functions
    15.6 Extensible Value Testing
16 Definition of SPARQL
    16.1 Initial Definitions
        16.1.1 RDF Terms
        16.1.2 RDF Dataset
        16.1.3 Query Variables
        16.1.4 Triple Patterns
        16.1.5 Basic Graph Patterns
        16.1.6 Solution Mapping
        16.1.7 Solution Sequence Modifiers
    16.2 SPARQL Query
        16.2.1 Converting Graph Patterns
        16.2.2 Examples of Mapped Graph Patterns
        16.2.3 Converting Solution Modifiers
    16.3 Basic Graph Patterns
        16.3.1 SPARQL Basic Graph Pattern Matching
        16.3.2 Treatment of Blank Nodes
    16.4 SPARQL Algebra
    16.5 Evaluation Semantics
    16.6 Extending SPARQL Basic Graph Matching
        16.6.1 Notes
17 SPARQL Grammar
18 Conformance
19 Security Considerations (Informative)
20 Internet Media Type, File Extension and Macintosh File Type

Appendices

A References
    A.1 Normative References
    A.2 Other References
B CVS History


1 Introduction

RDF is a directed, labeled graph data format for representing information in the Web. RDF is often used to represent, among other things, personal information, social networks, metadata about digital artifacts, as well as to provide a means of integration over disparate sources of information. This specification defines the syntax and semantics of the SPARQL query language for RDF.

The SPARQL query language for RDF is designed to meet the use cases and requirements identified by the RDF Data Access Working Group in RDF Data Access Use Cases and Requirements [UCNR].

The SPARQL query language is closely related to the following specifications:

1.1 Document Outline

Unless otherwise noted in the section heading, all sections and appendices in this document are normative.

@@Revise when structure stable

This section of the document, section 1, introduces the SPARQL query language specification. It presents the organization of this specification document and the conventions used throughout the specification.

Section 2 of the specification introduces the SPARQL query language itself via a series of example queries and query results. Section 3 continues the introduction of the SPARQL query language with more examples that demonstrate SPARQL's ability to express constraints on the RDF terms that appear in a query's results.

Section 4 presents details of the SPARQL query language's syntax. It is a companion to the full grammar of the language and defines how grammatical constructs represent IRIs, blank nodes, literals, and variables. Section 4 also defines the meaning of several grammatical constructs that serve as syntactic sugar for more verbose expressions.

Section 5 introduces basic graph patterns and group graph patterns, the building blocks from which more complex SPARQL query patterns are constructed. Sections 6, 7, and 8 present constructs that combine SPARQL graph patterns into larger graph patterns. In particular, Section 6 introduces the ability to make portions of a query optional; Section 7 introduces the ability to express the disjunction of alternative graph patterns; and Section 8 introduces negation. Sections 9, 10, and 11 cover property paths, aggregates, and subqueries. Section 12 introduces the ability to constrain portions of a query to particular source graphs and presents SPARQL's mechanism for defining the source graphs for a query.

Section 13 defines the constructs that affect the solutions of a query by ordering, slicing, projecting, limiting, and removing duplicates from a sequence of solutions.

Section 14 defines the four types of SPARQL queries that produce results in different forms.

Section 15 defines SPARQL's extensible value testing framework. It also presents the functions and operators that can be used to constrain the values that appear in a query's results.

Section 16 is a formal definition of the evaluation of SPARQL graph patterns and solution modifiers.

Section 17 contains the normative definition of the SPARQL query language's syntax, as given by a grammar expressed in EBNF notation.

1.2 Document Conventions

1.2.1 Namespaces

In this document, examples assume the following namespace prefix bindings unless otherwise stated:

Prefix  IRI
rdf:    http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs:   http://www.w3.org/2000/01/rdf-schema#
xsd:    http://www.w3.org/2001/XMLSchema#
fn:     http://www.w3.org/2005/xpath-functions#

1.2.2 Data Descriptions

This document uses the Turtle [TURTLE] data format to show each triple explicitly. Turtle allows IRIs to be abbreviated with prefixes:

@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/book/> .
:book1  dc:title  "SPARQL Tutorial" .

1.2.3 Result Descriptions

Result sets are illustrated in tabular form.

x        y                    z
"Alice"  <http://example/a>

A 'binding' is a pair (variable, RDF term). In this result set, there are three variables: x, y and z (shown as column headers). Each solution is shown as one row in the body of the table.  Here, there is a single solution, in which variable x is bound to "Alice", variable y is bound to <http://example/a>, and variable z is not bound to an RDF term. Variables are not required to be bound in a solution.

1.2.4 Terminology

The SPARQL language includes IRIs, a subset of RDF URI References that omits spaces. Note that all IRIs in SPARQL queries are absolute; they may or may not include a fragment identifier [RFC3987, section 3.1]. IRIs include URIs [RFC3986] and URLs. The abbreviated forms (relative IRIs and prefixed names) in the SPARQL syntax are resolved to produce absolute IRIs.

The following terms are defined in RDF Concepts and Abstract Syntax [CONCEPTS] and used in SPARQL:

2 Making Simple Queries (Informative)

Most forms of SPARQL query contain a set of triple patterns called a basic graph pattern. Triple patterns are like RDF triples except that each of the subject, predicate and object may be a variable. A basic graph pattern matches a subgraph of the RDF data when RDF terms from that subgraph may be substituted for the variables and the result is an RDF graph that is equivalent to the subgraph.

2.2 Multiple Matches

The result of a query is a solution sequence, corresponding to the ways in which the query's graph pattern matches the data. There may be zero, one or multiple solutions to a query.

Data:

@prefix foaf:  <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name   "Johnny Lee Outlaw" .
_:a  foaf:mbox   <mailto:jlow@example.com> .
_:b  foaf:name   "Peter Goodguy" .
_:b  foaf:mbox   <mailto:peter@example.org> .
_:c  foaf:mbox   <mailto:carol@example.org> .

Query:

PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
  { ?x foaf:name ?name .
    ?x foaf:mbox ?mbox }

Query Result:

name                 mbox
"Johnny Lee Outlaw"  <mailto:jlow@example.com>
"Peter Goodguy"      <mailto:peter@example.org>

Each solution gives one way in which the selected variables can be bound to RDF terms so that the query pattern matches the data. The result set gives all the possible solutions. In the above example, the following two subsets of the data provided the two matches.

 _:a foaf:name  "Johnny Lee Outlaw" .
 _:a foaf:mbox  <mailto:jlow@example.com> .
 _:b foaf:name  "Peter Goodguy" .
 _:b foaf:mbox  <mailto:peter@example.org> .

This is a basic graph pattern match; all the variables used in the query pattern must be bound in every solution.
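The matching just described can be sketched as a small backtracking join. This is a minimal illustration, not an implementation from this specification: it assumes triples are Python tuples of strings, that query variables are written with a leading "?", and that blank nodes in the data are treated as opaque terms. The names match_triple and match_bgp are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

def is_var(term: str) -> bool:
    # In this sketch, query variables start with '?'.
    return term.startswith("?")

def match_triple(pattern: Triple, triple: Triple, sol: Solution) -> Optional[Solution]:
    """Extend sol so that pattern matches triple, or return None."""
    new = dict(sol)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different term
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new

def match_bgp(patterns: List[Triple], data: List[Triple]) -> List[Solution]:
    """Return every solution of the basic graph pattern over data."""
    solutions: List[Solution] = [{}]
    for pattern in patterns:
        extended: List[Solution] = []
        for sol in solutions:
            for triple in data:
                ext = match_triple(pattern, triple, sol)
                if ext is not None:
                    extended.append(ext)
        solutions = extended
    return solutions

data = [
    ("_:a", "foaf:name", '"Johnny Lee Outlaw"'),
    ("_:a", "foaf:mbox", "<mailto:jlow@example.com>"),
    ("_:b", "foaf:name", '"Peter Goodguy"'),
    ("_:b", "foaf:mbox", "<mailto:peter@example.org>"),
    ("_:c", "foaf:mbox", "<mailto:carol@example.org>"),
]
query = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]

for s in match_bgp(query, data):
    print(s["?name"], s["?mbox"])
```

Because every triple pattern must match, each returned solution binds all of ?x, ?name and ?mbox, and _:c (which has no foaf:name) produces no solution.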

2.3 Matching RDF Literals

The data below contains three RDF literals:

2.6 Building RDF Graphs

SPARQL has several query forms. The SELECT query form returns variable bindings. The CONSTRUCT query form returns an RDF graph. The graph is built from a template, which is used to generate RDF triples from the results of matching the graph pattern of the query.

Data:

@prefix org:    <http://example.com/ns#> .

_:a  org:employeeName   "Alice" .
_:a  org:employeeId     12345 .

_:b  org:employeeName   "Bob" .
_:b  org:employeeId     67890 .

Query:

PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
PREFIX org:    <http://example.com/ns#>

CONSTRUCT { ?x foaf:name ?name }
WHERE  { ?x org:employeeName ?name }

Results:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
      
_:x foaf:name "Alice" .
_:y foaf:name "Bob" .

which can be serialized in RDF/XML as:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    >
  <rdf:Description>
    <foaf:name>Alice</foaf:name>
  </rdf:Description>
  <rdf:Description>
    <foaf:name>Bob</foaf:name>
  </rdf:Description>
</rdf:RDF>
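Template instantiation can be sketched as follows. This is an illustrative sketch under stated assumptions, not a normative algorithm: solutions are Python dicts, terms are strings, and the function name construct is hypothetical. It substitutes each solution into the template, gives blank nodes written in the template fresh labels per solution, and drops triples that would contain an unbound variable.

```python
import itertools
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

def construct(template: List[Triple], solutions: Iterable[Dict[str, str]]) -> List[Triple]:
    """Instantiate a CONSTRUCT template once per solution (sketch)."""
    fresh = itertools.count()
    graph: List[Triple] = []
    for sol in solutions:
        # Blank nodes written in the template are scoped to one solution,
        # so each solution gets its own fresh labels.
        bnode_map: Dict[str, str] = {}
        for triple in template:
            out = []
            for term in triple:
                if term.startswith("?"):
                    out.append(sol.get(term))  # None if the variable is unbound
                elif term.startswith("_:"):
                    if term not in bnode_map:
                        bnode_map[term] = f"_:gen{next(fresh)}"
                    out.append(bnode_map[term])
                else:
                    out.append(term)
            # Triples containing an unbound variable are not produced.
            if None not in out:
                t = tuple(out)
                if t not in graph:  # an RDF graph is a set of triples
                    graph.append(t)
    return graph

solutions = [
    {"?x": "_:a", "?name": '"Alice"'},
    {"?x": "_:b", "?name": '"Bob"'},
]
for s, p, o in construct([("?x", "foaf:name", "?name")], solutions):
    print(s, p, o, ".")
```

Blank node labels in the output are only local to the result graph, so _:a and _:b printed here correspond to _:x and _:y in the results shown above.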

3 RDF Term Constraints (Informative)

Graph pattern matching produces a solution sequence, where each solution has a set of bindings of variables to RDF terms. SPARQL FILTERs restrict solutions to those for which the filter expression evaluates to TRUE.

This section provides an informal introduction to SPARQL FILTERs; their semantics are defined in Section 15, Testing Values. The examples in this section share one input graph:

Data:
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/book/> .
@prefix ns:   <http://example.org/ns#> .

:book1  dc:title  "SPARQL Tutorial" .
:book1  ns:price  42 .
:book2  dc:title  "The Semantic Web" .
:book2  ns:price  23 .

3.1 Restricting the Value of Strings

SPARQL FILTER functions like regex can test RDF literals. regex matches only plain literals with no language tag. regex can be used to match the lexical forms of other literals by using the str function.

Query:

PREFIX  dc:  <http://purl.org/dc/elements/1.1/>
SELECT  ?title
WHERE   { ?x dc:title ?title
          FILTER regex(?title, "^SPARQL") 
        }

Query Result:

title
"SPARQL Tutorial"

Regular expression matches may be made case-insensitive with the "i" flag.

Query:

The regular expression language is defined by XQuery 1.0 and XPath 2.0 Functions and Operators and is based on XML Schema Regular Expressions.

By constraining the price variable, only :book2 matches the query because only :book2 has a price less than 30.5, as the filter condition requires.
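Both kinds of constraint can be sketched as a filter over solutions of the shared data. This is an illustration only: it assumes the pattern { ?x dc:title ?title . ?x ns:price ?price } has already been matched, with literals converted to Python values, and it uses Python's re module as a stand-in for the XPath regular expression language (the two are similar but not identical). The helper filter_solutions is hypothetical.

```python
import re
from typing import Callable, Dict, List

Solution = Dict[str, object]

# Solutions of { ?x dc:title ?title . ?x ns:price ?price } over the shared data.
solutions: List[Solution] = [
    {"?x": ":book1", "?title": "SPARQL Tutorial", "?price": 42},
    {"?x": ":book2", "?title": "The Semantic Web", "?price": 23},
]

def filter_solutions(sols: List[Solution], expr: Callable[[Solution], bool]) -> List[Solution]:
    """Keep only the solutions for which the filter expression is true."""
    kept = []
    for sol in sols:
        try:
            if expr(sol):
                kept.append(sol)
        except (KeyError, TypeError):
            # An evaluation error (e.g. an unbound variable) eliminates the solution.
            pass
    return kept

# FILTER regex(?title, "^SPARQL")
by_title = filter_solutions(solutions, lambda s: re.search("^SPARQL", s["?title"]) is not None)
# A case-insensitive match, like the "i" flag: FILTER regex(?title, "web", "i")
case_insensitive = filter_solutions(
    solutions, lambda s: re.search("web", s["?title"], re.IGNORECASE) is not None)
# FILTER (?price < 30.5)
by_price = filter_solutions(solutions, lambda s: s["?price"] < 30.5)

print([s["?x"] for s in by_title])   # [':book1']
print([s["?x"] for s in by_price])   # [':book2']
```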

3.3 Other Term Constraints

In addition to numeric types, SPARQL supports types xsd:string, xsd:boolean and xsd:dateTime (see 15.1 Operand Data Types). 15.3 Operator Mapping lists a set of test functions, including BOUND, isLITERAL and langMATCHES, and accessors, including STR, LANG and DATATYPE. 15.5 Constructor Functions lists a set of XML Schema constructor functions that SPARQL provides to cast values from one type to another.

4 SPARQL Syntax

This section covers the syntax used by SPARQL for RDF terms and triple patterns. The full grammar is given in section 17, SPARQL Grammar.

4.1 RDF Term Syntax

4.1.1 Syntax for IRIs

The IRIref production designates the set of IRIs [RFC3987]; IRIs are a generalization of URIs [RFC3986] and are fully compatible with URIs and URLs. The PrefixedName production designates a prefixed name. The mapping from a prefixed name to an IRI is described below. IRI references (relative or absolute IRIs) are designated by the IRI_REF production, where the '<' and '>' delimiters do not form part of the IRI reference. Relative IRIs match the irelative-ref reference in section 2.2 ABNF for IRI References and IRIs in [RFC3987] and are resolved to IRIs as described below.

The set of RDF terms defined in RDF Concepts and Abstract Syntax includes RDF URI references while SPARQL terms include IRIs. RDF URI references containing "<", ">", '"' (double quote), space, "{", "}", "|", "\", "^", and "`" are not IRIs. The behavior of a SPARQL query against RDF statements composed of such RDF URI references is not defined.

4.1.1.2 Relative IRIs

Relative IRIs are combined with base IRIs as per Uniform Resource Identifier (URI): Generic Syntax [RFC3986] using only the basic algorithm in Section 5.2. Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) is performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of Internationalized Resource Identifiers (IRIs) [RFC3987].

The BASE keyword defines the Base IRI used to resolve relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded in Content". Section 5.1.2, "Base URI from the Encapsulating Entity", defines how the Base IRI may come from an encapsulating document, such as a SOAP envelope with an xml:base directive or a mime multipart document with a Content-Location header. The "Retrieval URI" identified in 5.1.3, "Base URI from the Retrieval URI", is the URL from which a particular SPARQL query was retrieved. If none of the above specifies the Base URI, the default Base URI (section 5.1.4, "Default Base URI") is used.

The following fragments are some of the different ways to write the same IRI:

<http://example.org/book/book1>
BASE <http://example.org/book/>
<book1>
PREFIX book: <http://example.org/book/>
book:book1
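The equivalence of these three forms can be checked with a short sketch. It assumes Python 3.5 or later, where urllib.parse.urljoin follows the RFC 3986 reference resolution algorithm; urljoin is URI-oriented, so treating it as an IRI resolver is an approximation. The helpers resolve_relative and expand_prefixed are hypothetical names; prefixed-name expansion is modelled as simple concatenation of the prefix IRI and the local part.

```python
from urllib.parse import urljoin

def resolve_relative(base: str, ref: str) -> str:
    """Resolve a relative IRI against a BASE IRI (RFC 3986, section 5.2)."""
    return urljoin(base, ref)

def expand_prefixed(prefixes: dict, pname: str) -> str:
    """Expand a prefixed name by concatenating the prefix IRI and the local part."""
    prefix, _, local = pname.partition(":")
    return prefixes[prefix] + local

absolute = "http://example.org/book/book1"
via_base = resolve_relative("http://example.org/book/", "book1")
via_prefix = expand_prefixed({"book": "http://example.org/book/"}, "book:book1")

print(absolute == via_base == via_prefix)  # True
```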

4.1.2 Syntax for Literals

The general syntax for literals is a string (enclosed in either double quotes, "...", or single quotes, '...'), with either an optional language tag (introduced by @) or an optional datatype IRI or prefixed name (introduced by ^^).

As a convenience, integers can be written directly (without quotation marks and an explicit datatype IRI) and are interpreted as typed literals of datatype xsd:integer; decimal numbers for which there is '.' in the number but no exponent are interpreted as xsd:decimal; and numbers with exponents are interpreted as xsd:double. Values of type xsd:boolean can also be written as true or false.

To facilitate writing literal values which themselves contain quotation marks or which are long and contain newline characters, SPARQL provides an additional quoting construct in which literals are enclosed in three single- or double-quotation marks.

Examples of literal syntax in SPARQL include:

Tokens matching the productions INTEGER, DECIMAL, DOUBLE and BooleanLiteral are equivalent to a typed literal with the lexical value of the token and the corresponding datatype (xsd:integer, xsd:decimal, xsd:double, xsd:boolean).
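The mapping from bare numeric and boolean tokens to typed literals can be sketched as below. The regular expressions are assumed, simplified versions of the INTEGER, DECIMAL, DOUBLE and BooleanLiteral productions (with an optional sign folded in for convenience); the grammar remains normative. The function name literal_for_token is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified token patterns (an assumption; see the grammar for the normative ones).
# The patterns are mutually exclusive under fullmatch, so their order does not matter.
TOKEN_TYPES = [
    (re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+"), "double"),
    (re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+)"), "decimal"),
    (re.compile(r"[+-]?[0-9]+"), "integer"),
    (re.compile(r"true|false"), "boolean"),
]

def literal_for_token(token: str) -> str:
    """Return the equivalent typed literal written as "lexical"^^<datatype>."""
    for pattern, local in TOKEN_TYPES:
        if pattern.fullmatch(token):
            return f'"{token}"^^<{XSD}{local}>'
    raise ValueError(f"not a numeric or boolean token: {token!r}")

for tok in ["1", "1.3", "1.0e6", "true"]:
    print(tok, "->", literal_for_token(tok))
```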

4.1.4 Syntax for Blank Nodes

Blank nodes in graph patterns act as non-distinguished variables, not as references to specific blank nodes in the data being queried.

Blank nodes are indicated by either the label form, such as "_:abc", or the abbreviated form "[]". A blank node that is used in only one place in the query syntax can be indicated with []. A unique blank node will be used to form the triple pattern. Blank node labels are written as "_:abc" for a blank node with label "abc". The same blank node label cannot be used in two different basic graph patterns in the same query.

The [:p :v] construct can be used in triple patterns. It creates a blank node label which is used as the subject of all contained predicate-object pairs. The created blank node can also be used in further triple patterns in the subject and object positions.

The following two forms

[ :p "v" ] .
[] :p "v" .

allocate a unique blank node label (here "b57") and are equivalent to writing:

_:b57 :p "v" .

This allocated blank node label can be used as the subject or object of further triple patterns. For example, as a subject:

[ :p "v" ] :q "w" .

which is equivalent to the two triples:

_:b57 :p "v" .
_:b57 :q "w" .

and as an object:

:x :q [ :p "v" ] .

which is equivalent to the two triples:

:x  :q _:b57 .
_:b57 :p "v" .

Abbreviated blank node syntax can be combined with other abbreviations for common subjects and common predicates.

  [ foaf:name  ?name ;
    foaf:mbox  <mailto:alice@example.org> ]

This is the same as writing the following basic graph pattern for some uniquely allocated blank node label, "b18":

  _:b18  foaf:name  ?name .
  _:b18  foaf:mbox  <mailto:alice@example.org> .

4.2 Syntax for Triple Patterns

Triple Patterns are written as a whitespace-separated list of a subject, predicate and object; there are abbreviated ways of writing some common triple pattern constructs.

The following examples express the same query:

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
SELECT  ?title
WHERE   { <http://example.org/book/book1> dc:title ?title }

PREFIX  dc: <http://purl.org/dc/elements/1.1/>
PREFIX  : <http://example.org/book/>

SELECT  $title
WHERE   { :book1  dc:title  $title }

BASE    <http://example.org/book/>
PREFIX  dc: <http://purl.org/dc/elements/1.1/>

SELECT  $title
WHERE   { <book1>  dc:title  ?title }

4.2.3 RDF Collections

RDF collections can be written in triple patterns using the syntax "(element1 element2 ...)". The form "()" is an alternative for the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#nil. When used with collection elements, such as (1 ?x 3 4), triple patterns with blank nodes are allocated for the collection. The blank node at the head of the collection can be used as a subject or object in other triple patterns. The blank nodes allocated by the collection syntax do not occur elsewhere in the query.

(1 ?x 3 4) :p "w" .

is syntactic sugar for (noting that b0, b1, b2 and b3 do not occur anywhere else in the query):

    _:b0  rdf:first  1 ;
          rdf:rest   _:b1 .
    _:b1  rdf:first  ?x ;
          rdf:rest   _:b2 .
    _:b2  rdf:first  3 ;
          rdf:rest   _:b3 .
    _:b3  rdf:first  4 ;
          rdf:rest   rdf:nil .
    _:b0  :p         "w" . 

RDF collections can be nested and can involve other syntactic forms:

(1 [:p :q] ( 2 ) ) .

is syntactic sugar for:

    _:b0  rdf:first  1 ;
          rdf:rest   _:b1 .
    _:b1  rdf:first  _:b2 .
    _:b2  :p         :q .
    _:b1  rdf:rest   _:b3 .
    _:b3  rdf:first  _:b4 .
    _:b4  rdf:first  2 ;
          rdf:rest   rdf:nil .
    _:b3  rdf:rest   rdf:nil .
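The expansion of collection syntax can be sketched as a recursive function. This sketch only covers collections whose elements are plain terms or nested collections (it does not handle [ ... ] property lists); nested Python lists stand for nested collections, and the function name expand_collection is hypothetical. Labels are allocated in the same order as in the examples above.

```python
import itertools
from typing import Iterator, List, Tuple, Union

Item = Union[str, list]  # a term, or a nested Python list for a nested collection
Triple = Tuple[str, str, str]

def expand_collection(items: List[Item], triples: List[Triple], fresh: Iterator[int]) -> str:
    """Append rdf:first/rdf:rest triples for items and return the head node."""
    if not items:
        return "rdf:nil"  # () is rdf:nil
    head = f"_:b{next(fresh)}"
    node = head
    for i, item in enumerate(items):
        # A nested collection is expanded first; its head node becomes the element.
        value = expand_collection(item, triples, fresh) if isinstance(item, list) else item
        triples.append((node, "rdf:first", value))
        nxt = f"_:b{next(fresh)}" if i + 1 < len(items) else "rdf:nil"
        triples.append((node, "rdf:rest", nxt))
        node = nxt
    return head

# (1 ?x 3 4) :p "w" .
triples: List[Triple] = []
head = expand_collection(["1", "?x", "3", "4"], triples, itertools.count())
triples.append((head, ":p", '"w"'))
for t in triples:
    print(*t)
```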

4.2.4 rdf:type

The keyword "a" can be used as a predicate in a triple pattern and is an alternative for the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type. This keyword is case-sensitive.

  ?x  a  :Class1 .
  [ a :appClass ] :p "v" .

is syntactic sugar for:

  ?x    rdf:type  :Class1 .
  _:b0  rdf:type  :appClass .
  _:b0  :p        "v" .

5 Graph Patterns

SPARQL is based around graph pattern matching. More complex graph patterns can be formed by combining smaller patterns in various ways:

In this section we describe the two forms that combine patterns by conjunction: basic graph patterns, which combine triple patterns, and group graph patterns, which combine all other graph patterns.

The outermost graph pattern in a query is called the query pattern. It is grammatically identified by GroupGraphPattern in the WhereClause production:

[13]  WhereClause  ::=  'WHERE'? GroupGraphPattern

5.2 Group Graph Patterns

In a SPARQL query string, a group graph pattern is delimited with braces: {}. For example, this query's query pattern is a group graph pattern of one basic graph pattern.

6 Including Optional Values

Basic graph patterns allow applications to make queries where the entire query pattern must match for there to be a solution. For every solution of a query containing only group graph patterns with at least one basic graph pattern, every variable is bound to an RDF Term in a solution. However, regular, complete structures cannot be assumed in all RDF graphs. It is useful to be able to have queries that allow information to be added to the solution where the information is available, but do not reject the solution because some part of the query pattern does not match. Optional matching provides this facility: if the optional part does not match, it creates no bindings but does not eliminate the solution.

6.1 Optional Pattern Matching

Optional parts of the graph pattern may be specified syntactically with the OPTIONAL keyword applied to a graph pattern:

pattern OPTIONAL { pattern }

There is no value of mbox in the solution where the name is "Bob".

This query finds the names of people in the data. If there is a triple with predicate mbox and the same subject, a solution will contain the object of that triple as well. In this example, only a single triple pattern is given in the optional match part of the query but, in general, the optional part may be any graph pattern. The entire optional graph pattern must match for the optional graph pattern to affect the query solution.
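The effect of OPTIONAL can be sketched as a left join over solution mappings: each solution of the required part is extended by every compatible solution of the optional part, and is kept unchanged when there is none. This is an illustrative sketch; the data (Alice with a mailbox, Bob without) is hypothetical, and the names compatible and left_join are assumptions made for the example.

```python
from typing import Dict, List

Solution = Dict[str, str]

def compatible(m1: Solution, m2: Solution) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def left_join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """OPTIONAL: extend each left solution where possible, otherwise keep it."""
    result: List[Solution] = []
    for m1 in left:
        matches = [m2 for m2 in right if compatible(m1, m2)]
        if matches:
            result.extend({**m1, **m2} for m2 in matches)
        else:
            result.append(m1)  # optional part did not match: no new bindings
    return result

# Hypothetical solutions of { ?x foaf:name ?name } and { ?x foaf:mbox ?mbox }.
names = [{"?x": "_:a", "?name": '"Alice"'}, {"?x": "_:b", "?name": '"Bob"'}]
mboxes = [{"?x": "_:a", "?mbox": "<mailto:alice@example.com>"}]

for sol in left_join(names, mboxes):
    print(sol["?name"], sol.get("?mbox", ""))
```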

7 Matching Alternatives

@@Additional SPARQL 1.1 syntax - can omit the {} for the LHS to bring into line with OPTIONAL and MINUS.

SPARQL provides a means of combining graph patterns so that one of several alternative graph patterns may match. If more than one of the alternatives matches, all the possible pattern solutions are found.

Pattern alternatives are syntactically specified with the UNION keyword.

Data:
@prefix dc10:  <http://purl.org/dc/elements/1.0/> .
@prefix dc11:  <http://purl.org/dc/elements/1.1/> .

_:a  dc10:title     "SPARQL Query Language Tutorial" .
_:a  dc10:creator   "Alice" .

_:b  dc11:title     "SPARQL Protocol Tutorial" .
_:b  dc11:creator   "Bob" .

_:c  dc10:title     "SPARQL" .
_:c  dc11:title     "SPARQL (updated)" .

This query finds titles of the books in the data, whether the title is recorded using Dublin Core properties from version 1.0 or version 1.1. To determine exactly how the information was recorded, a query could use different variables for the two alternatives:

PREFIX dc10:  <http://purl.org/dc/elements/1.0/>
PREFIX dc11:  <http://purl.org/dc/elements/1.1/>

SELECT ?x ?y
WHERE  { { ?book dc10:title ?x } UNION { ?book dc11:title  ?y } }

This will return results with the variable x bound for solutions from the left branch of the UNION, and y bound for the solutions from the right branch. If neither part of the UNION pattern matched, then the graph pattern would not match.
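The behaviour of this query over the data above can be sketched as a multiset union of the solutions of each branch. This sketch assumes each branch is a single triple pattern evaluated by a hypothetical helper, match_predicate; it is not a general UNION evaluator.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

data: List[Triple] = [
    ("_:a", "dc10:title", '"SPARQL Query Language Tutorial"'),
    ("_:a", "dc10:creator", '"Alice"'),
    ("_:b", "dc11:title", '"SPARQL Protocol Tutorial"'),
    ("_:b", "dc11:creator", '"Bob"'),
    ("_:c", "dc10:title", '"SPARQL"'),
    ("_:c", "dc11:title", '"SPARQL (updated)"'),
]

def match_predicate(pred: str, subj_var: str, obj_var: str) -> List[Solution]:
    """Solutions of the single triple pattern { subj_var pred obj_var }."""
    return [{subj_var: s, obj_var: o} for s, p, o in data if p == pred]

def union(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """UNION: every solution from either branch (a multiset union)."""
    return left + right

# { ?book dc10:title ?x } UNION { ?book dc11:title ?y }
solutions = union(
    match_predicate("dc10:title", "?book", "?x"),
    match_predicate("dc11:title", "?book", "?y"),
)
for sol in solutions:
    print(sol.get("?x", ""), sol.get("?y", ""))
```

Each solution binds only the variable of the branch it came from, so _:c contributes two solutions, one with ?x bound and one with ?y bound.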

The UNION pattern combines graph patterns; each alternative possibility can contain more than one triple pattern:

This query will only match a book if it has both a title and creator predicate from the same version of Dublin Core.

8 Negation

The SPARQL query language incorporates two styles of negation: one based on filtering results depending on whether a graph pattern does or does not match in the context of the query solution being filtered, and one based on removing solutions related to another pattern.

8.4 Algebra Operators

@@Content here will migrate to the formal definition section.

8.4.1 Algebra: EXISTS

There is a filter operator "exists" that takes a graph pattern. exists returns true/false depending on whether the pattern matches. No additional binding of variables occurs. The NOT EXISTS form translates into fn:not(exists(...)).

 xsd:boolean   EXISTS {pattern pat}

Returns true if pattern pat matches the dataset. Returns false otherwise.

@@active graph

Variables in the pattern pat that are bound in the current solution mapping take the value they have from the solution mapping. Variables in the pattern pat that are not bound in the current solution mapping take part in pattern matching.

To facilitate this, we introduce an algebra operation for the evaluation of the pattern in an algebra EXISTS operation:

Definition: Substitute

Let μ be a solution mapping.

substitute(pattern, μ) = the pattern formed by replacing every occurrence of a variable in pattern by its value in μ.

We define an expression function "exists" using "substitute":

Definition: Exists

Let μ be a solution mapping:

exists(pattern, μ) = true if and only if eval(substitute(pattern, μ), D[g]) has any solutions.
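The two definitions can be sketched directly, with the pattern restricted to a basic graph pattern and the active graph given as a list of triples. The graph contents and the helper has_match are hypothetical; the sketch only shows that substitute fixes the variables bound in μ, that the remaining variables take part in matching, and that no new bindings escape.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

def substitute(pattern: List[Triple], mu: Solution) -> List[Triple]:
    """Replace each variable bound in mu by its value; other variables are kept."""
    return [tuple(mu.get(t, t) for t in tp) for tp in pattern]

def has_match(pattern: List[Triple], graph: List[Triple], sol: Optional[Solution] = None) -> bool:
    """True if the basic graph pattern has at least one solution over graph."""
    sol = {} if sol is None else sol
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(sol)
        ok = True
        for p, t in zip(first, triple):
            if p.startswith("?"):
                if new.setdefault(p, t) != t:  # conflicting binding
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok and has_match(rest, graph, new):
            return True
    return False

def exists(pattern: List[Triple], mu: Solution, graph: List[Triple]) -> bool:
    """exists(pattern, mu): does substitute(pattern, mu) have any solutions?"""
    return has_match(substitute(pattern, mu), graph)

graph = [
    (":alice", "rdf:type", "foaf:Person"),
    (":alice", "foaf:name", '"Alice"'),
    (":bob", "rdf:type", "foaf:Person"),
]
pattern = [("?person", "foaf:name", "?name")]
for mu in [{"?person": ":alice"}, {"?person": ":bob"}]:
    # NOT EXISTS corresponds to the negation of exists.
    print(mu["?person"], exists(pattern, mu, graph), not exists(pattern, mu, graph))
```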

8.4.2 Algebra: MINUS

Definition: Minus

Minus(Ω1, Ω2) = { μ | μ in Ω1 such that for all μ' in Ω2, either μ and μ' are not compatible or dom(μ) and dom(μ') are disjoint }

The additional restriction on dom(μ) and dom(μ') is needed because solution mappings with disjoint domains are always compatible: without it, any solution mapping of Ω2 that has no variables in common with the solution mappings of Ω1 would make Minus(Ω1, Ω2) empty, regardless of the rest of Ω2. In particular, the empty solution mapping is compatible with every other solution mapping, so P MINUS {} would otherwise be empty for any pattern P.
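The definition of Minus translates almost directly into code. This is a sketch over solution mappings represented as Python dicts; the function names are hypothetical, and minus_without_domain_check exists only to show what happens if the disjoint-domain condition is dropped.

```python
from typing import Dict, List

Solution = Dict[str, str]

def compatible(m1: Solution, m2: Solution) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def minus(omega1: List[Solution], omega2: List[Solution]) -> List[Solution]:
    """Keep mu unless some mu' in omega2 is compatible with it AND shares a variable."""
    return [
        mu for mu in omega1
        if all(not compatible(mu, mu2) or not (mu.keys() & mu2.keys()) for mu2 in omega2)
    ]

def minus_without_domain_check(omega1: List[Solution], omega2: List[Solution]) -> List[Solution]:
    # For comparison only: drops the disjoint-domain condition.
    return [mu for mu in omega1 if all(not compatible(mu, mu2) for mu2 in omega2)]

o1 = [{"?x": ":a", "?y": "1"}, {"?x": ":b", "?y": "2"}]
print(minus(o1, [{"?x": ":a"}]))           # only the ?x = :b solution remains
print(minus(o1, [{}]) == o1)               # True: P MINUS {} keeps everything
print(minus_without_domain_check(o1, [{}]))  # []
```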

9 Property Paths

@@See Property Paths Doc.

10 Aggregates

Aggregates apply expressions over groups of solutions. By default a solution set consists of a single group, containing all solutions.

Grouping may be specified using the GROUP BY syntax.

Aggregates defined in version 1.1 of SPARQL/Query are COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE.

In aggregate queries and subqueries, only expressions which have been used as GROUP BY expressions, or aggregated expressions (i.e. expressions where all variables appear inside an aggregate), can be projected. In order to project arbitrary expressions the SAMPLE aggregate may be used.

@@ note: perhaps it would be simpler to require that all variables be passed to some aggregate; SAMPLE can be used on GROUP BY expressions, and the result would be equivalent to the text above. This would reduce the complexity of implementations, which would not have to determine whether the projected expression and the group expression are equivalent.

10.2 Algebra Operators

ListEval is a function which is used to evaluate a list of expressions against a solution and return a list of the resulting values.

Group is a function which groups a solution sequence into multiple solution sequences, based on some attribute of the solutions.

Definition: Group

Group evaluates a list of expressions against a solution sequence, producing a set of partial functions from keys to solution sequences.

The behaviour of Group is different when ExprList is empty.

Group((), Ω) = { 1 -> Ω }

Group(ExprList, Ω) = { ListEval(ExprList, μ) -> { μ' | μ' in Ω, ListEval(ExprList, μ) = ListEval(ExprList, μ') } | μ in Ω }

For example, given a solution sequence S, ( {?x→2, ?y→3}, {?x→2, ?y→5}, {?x→6, ?y→7} ),
Group((?x), S) = {
  (2) → ( {?x→2, ?y→3}, {?x→2, ?y→5} ),
  (6) → ( {?x→6, ?y→7} )
}
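A minimal Python sketch of Group, assuming solutions are dicts, expressions are callables over a solution, and var is a hypothetical helper that reads one variable. Keys are tuples, matching the (2) and (6) notation above; expression errors are not modelled.

```python
from typing import Callable, Dict, List

Mapping = Dict[str, object]
Expr = Callable[[Mapping], object]


def var(name: str) -> Expr:
    """Hypothetical helper: an expression that reads one variable."""
    return lambda mu: mu.get(name)


def list_eval(exprs: List[Expr], mu: Mapping) -> tuple:
    return tuple(e(mu) for e in exprs)


def group(exprs: List[Expr], omega: List[Mapping]) -> Dict[object, List[Mapping]]:
    if not exprs:
        return {1: list(omega)}            # Group((), Ω) = { 1 -> Ω }
    groups: Dict[object, List[Mapping]] = {}
    for mu in omega:                        # each key collects every mu' with equal ListEval
        groups.setdefault(list_eval(exprs, mu), []).append(mu)
    return groups


S = [{"?x": 2, "?y": 3}, {"?x": 2, "?y": 5}, {"?x": 6, "?y": 7}]
# {(2,): [{'?x': 2, '?y': 3}, {'?x': 2, '?y': 5}], (6,): [{'?x': 6, '?y': 7}]}
print(group([var("?x")], S))
```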

Aggregation is a function which calculates a scalar value as the output of an aggregate expression in the SELECT clause.

Definition: Aggregation

Aggregation applies a set function “func” to a multiset of lists of expressions and a grouped solution sequence, G as produced by the Group function. It produces a single value for each key and partition for that key (key, X).

Aggregation(ExprList, func, scalar, G) = { dom(g) → F | g in G }

Where
   M = ListEvalE(ExprList, range(g))
   F = func(M, card[range(g)] - card[M], scalar), for non-DISTINCT
   F = func(Distinct(M), card[range(g)] - card[M], scalar), for DISTINCT

Special case: when COUNT is used with the expression *, the value of F will be the cardinality of the group solution sequence, card[range(g)], or card[Distinct(range(g))] if the DISTINCT keyword is present.

@@ should "scalar" be a set of partial functions instead of a value?

All aggregates may have the DISTINCT keyword as the first token in their argument list. If this keyword is present, then the first argument to func is Distinct(M).

Example

Given a solution multiset (Ω) with the following values:

?x  ?y  ?z
1   2   3
1   3   4
2   5   6

And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY ?x.

We produce G = Group((?x), Ω) = { (1) → ( {?x=1, ?y=2, ?z=3}, {?x=1, ?y=3, ?z=4} ), (2) → ( {?x=2, ?y=5, ?z=6} ) }

And so Aggregation((?y, ?z), ex:agg, 0, G) =
{ (1) → ex:agg({(2, 3), (3, 4)}, 0, 0), (2) → ex:agg({(5, 6)}, 0, 0) }.
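A minimal Python sketch of Aggregation follows. It assumes solutions are dicts, expressions are Python callables, and G is a dict from group key to solutions, as a Group-style function would produce. It reads ListEvalE as "ListEval, leaving out any list whose evaluation raised an error" (here an unbound variable, modelled as KeyError), which is an interpretation rather than something this draft defines. ex_agg is a hypothetical stand-in that simply returns its arguments, so the output mirrors the worked example.

```python
from typing import Callable, Dict

Mapping = Dict[str, object]


def var(name: str) -> Callable[[Mapping], object]:
    """Hypothetical helper: reading an unbound variable raises KeyError (an 'error')."""
    return lambda mu: mu[name]


def aggregation(exprs, func, scalar, G, distinct=False):
    """Sketch of Aggregation(ExprList, func, scalar, G): one value per group key."""
    result = {}
    for key, solutions in G.items():
        M = []
        for mu in solutions:
            try:
                M.append(tuple(e(mu) for e in exprs))
            except KeyError:
                pass                      # assumption: erroring lists are left out of M
        err = len(solutions) - len(M)     # card[range(g)] - card[M]
        if distinct:
            M = list(dict.fromkeys(M))    # Distinct(M), keeping first occurrences
        result[key] = func(M, err, scalar)
    return result


def ex_agg(M, err, scalar):
    """Hypothetical stand-in for ex:agg that just returns its arguments."""
    return (M, err, scalar)


G = {
    (1,): [{"?x": 1, "?y": 2, "?z": 3}, {"?x": 1, "?y": 3, "?z": 4}],
    (2,): [{"?x": 2, "?y": 5, "?z": 6}],
}
print(aggregation([var("?y"), var("?z")], ex_agg, 0, G))
# {(1,): ([(2, 3), (3, 4)], 0, 0), (2,): ([(5, 6)], 0, 0)}
```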

@@ need to define HAVING as a form of FILTER, c.f. ISSUE 12.

10.2.1 Set Functions

The set functions which underlie SPARQL aggregates all have a common signature: SetFunc(M, err), or SetFunc(M, err, scalar, ...), where M is a multiset of lists, err is a value indicating whether the evaluation of any of the expressions evaluated with respect to Ω returned an error, and scalar is one or more scalar values that are passed to the set function indirectly via the ( ... ; key=value ) syntax for aggregates in the SPARQL grammar.

Flatten is a function which is used to collapse multisets of lists into a multiset, so for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 }.

Count is a SPARQL set function which counts the number of times a given expression has a bound, non-error value within the aggregate group.

Sum is a SPARQL set function that returns the numeric value obtained by summing the values within the aggregate group. Type promotion happens as per the op:numeric-add function, applied transitively (see definition below), so the value of SUM(?x), in an aggregate group where ?x has values 1 (integer), 2.0e0 (float), and 3.0 (decimal), will be 6.0 (float).

The Avg set function calculates the average value for an expression over a group. It is defined in terms of Sum and Count.

Min and Max are SPARQL set functions that return the minimum and maximum value from a group respectively.

They make use of the SPARQL ORDER BY ordering definition, to allow ordering over arbitrarily typed expressions.

GroupConcat is a set function which performs a string concatenation across the values of an expression within a group. The order of the strings is not specified. The separator character used in the concatenation may be given with the scalar argument SEPARATOR.

Sample is a set function which returns an arbitrary value from the multiset passed to it.
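The set functions can be sketched in Python over M, represented as a list of tuples (one tuple per non-error evaluation of the argument list). This is illustrative only: Python's own numeric promotion and ordering stand in for op:numeric-add and the ORDER BY ordering, min_ and max_ do not handle empty groups, and both the avg result for an empty group and the default separator of group_concat are assumptions of the sketch. The trailing underscores only avoid shadowing Python built-ins.

```python
from typing import Any, List, Tuple

MultisetOfLists = List[Tuple[Any, ...]]


def flatten(M: MultisetOfLists) -> List[Any]:
    """Collapse a multiset of lists into a multiset: [(1, 2), (3, 4)] -> [1, 2, 3, 4]."""
    return [v for lst in M for v in lst]


def count(M, err):
    return len(flatten(M))                 # M already excludes unbound/error values


def sum_(M, err):
    return sum(flatten(M))                 # Python promotion stands in for op:numeric-add


def avg(M, err):
    n = count(M, err)
    return sum_(M, err) / n if n else 0    # 0 for an empty group is an assumption


def min_(M, err):
    return min(flatten(M))                 # Python ordering, not SPARQL ORDER BY ordering


def max_(M, err):
    return max(flatten(M))


def group_concat(M, err, separator=" "):   # default separator is an assumption
    return separator.join(str(v) for v in flatten(M))


def sample(M, err):
    values = flatten(M)
    return values[0] if values else None   # any element would be a valid choice


M = [(1,), (2.0,), (3,)]
print(count(M, 0), sum_(M, 0), avg(M, 0), min_(M, 0), max_(M, 0))  # 3 6.0 2.0 1 3
print(group_concat([("a",), ("b",)], 0, separator=", "))            # a, b
```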

10.2.2 Mapping from Abstract Syntax to Algebra

Example:

SELECT (SUM(?val) AS ?sum)
WHERE {
  ?a rdf:value ?val .
} GROUP BY ?a

The SUM expression becomes Aggregation((?a), (?val), Sum, (), BGP(?a rdf:value ?val)).

In general the aggregate expression

AGG(exprlist ; scalarvals) ... GROUP BY grouplist

becomes Aggregation(grouplist, exprlist, Agg, scalarvals, BGP).

Joining Aggregate Values

In order to project values from (sub-)queries using aggregate values, a Solution Multiset is constructed where each solution comprises the result of the Aggregate functions which share a key.

Definition: AggregateJoin

Given a list of aggregations, A = (A1, A2, ...) we produce a solution sequence using the AggregateJoin function:

AggregateJoin(A) = { { aggi → range(Ai) } | dom(Ai) = k, k in set-union(dom(A)) }

For example, if we have two aggregations:

A1 = { (1,3) → 5, (7,9) → 11 }
A2 = { (1,3) → 6, (7,9) → 12 }

AggregateJoin(A) = {
   { agg1 → 5, agg2 → 6 },
   { agg1 → 11, agg2 → 12 }
}
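AggregateJoin over the two aggregations in the example can be sketched in Python, with each aggregation as a dict from group key to value and the output variables named agg1, agg2, ... as in the example. Leaving a variable unbound when an aggregation has no value for a key is an assumption of the sketch; the definition does not address that case.

```python
from typing import Dict, Hashable, List


def aggregate_join(aggregations: List[Dict[Hashable, object]]) -> List[Dict[str, object]]:
    """One solution per group key, binding agg1, agg2, ... to each aggregation's value."""
    keys: List[Hashable] = []
    for A in aggregations:                 # set-union of the keys, in first-seen order
        for k in A:
            if k not in keys:
                keys.append(k)
    return [
        {f"agg{i}": A[k] for i, A in enumerate(aggregations, start=1) if k in A}
        for k in keys
    ]


A1 = {(1, 3): 5, (7, 9): 11}
A2 = {(1, 3): 6, (7, 9): 12}
print(aggregate_join([A1, A2]))
# [{'agg1': 5, 'agg2': 6}, {'agg1': 11, 'agg2': 12}]
```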

11 Subqueries

Example

Data:

Results:

y | name
:bob | "B. Bar"
:carol | "C. Baz"

Algebra Operator

Subqueries require one additional algebra operator, ToMultiset, which takes a solution sequence (a list) and returns a multiset.

Mapping from Abstract Syntax to Algebra

In general, GroupGraphPatternSub is evaluated and then the resulting multiset is projected with the Project function, and handled as per the Converting Solution Modifiers section. The resulting sequence is converted back to a multiset with ToMultiset.

As a consequence the ordering from any ORDER BY expressions is not propagated outside the subquery.

@@ this section might be clearer if Converting Solution Modifiers was encapsulated as a function.

Example:

{
  SELECT ?z WHERE {
   ?x ?y ?z .
  }
}

Becomes ToMultiset(Project(BGP(?x ?y ?z), {?z})).

Only variables projected by the Project function are visible to operations outside the ToMultiset call. It is an error to reuse variable names both inside and outside a subquery when the variable is not projected from the subquery.
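ToMultiset(Project(...)) can be sketched in Python, assuming solution sequences are lists of dicts. A collections.Counter of frozensets plays the role of the multiset: it keeps duplicate solutions but no order, which is why ordering inside a subquery does not survive. The data below is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

Mapping = Dict[str, str]


def project(seq: List[Mapping], variables: Iterable[str]) -> List[Mapping]:
    """Restrict every solution to the projected variables (order is preserved)."""
    keep = set(variables)
    return [{v: t for v, t in mu.items() if v in keep} for mu in seq]


def to_multiset(seq: List[Mapping]) -> Counter:
    """Forget the order of a solution sequence but keep duplicate counts."""
    return Counter(frozenset(mu.items()) for mu in seq)


# Solutions of BGP(?x ?y ?z) over some hypothetical data
seq = [
    {"?x": ":a", "?y": ":p", "?z": "1"},
    {"?x": ":b", "?y": ":p", "?z": "1"},
    {"?x": ":c", "?y": ":q", "?z": "2"},
]
inner = to_multiset(project(seq, ["?z"]))
print(inner[frozenset({("?z", "1")})])   # 2: duplicates survive, ?x and ?y do not
```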

12 RDF Dataset

The RDF data model expresses information as graphs consisting of triples with subject, predicate and object. Many RDF data stores hold multiple RDF graphs and record information about each graph, allowing an application to make queries that involve information from more than one graph.

A SPARQL query is executed against an RDF Dataset which represents a collection of graphs. An RDF Dataset comprises one graph, the default graph, which does not have a name, and zero or more named graphs, where each named graph is identified by an IRI. A SPARQL query can match different parts of the query pattern against different graphs as described in section 8.3 Querying the Dataset.

An RDF Dataset may contain zero named graphs; an RDF Dataset always contains one default graph. A query does not need to involve matching the default graph; the query can just involve matching named graphs.

The graph that is used for matching a basic graph pattern is the active graph. In the previous sections, all queries have been shown executed against a single graph, the default graph of an RDF dataset as the active graph. The GRAPH keyword is used to make one of the named graphs in the dataset the active graph for part of the query.

12.1 Examples of RDF Datasets

The definition of RDF Dataset does not restrict the relationships of named and default graphs. Information can be repeated in different graphs; relationships between graphs can be exposed. Two useful arrangements are:

In this example, the default graph contains the names of the publishers of two named graphs. The triples in the named graphs are not visible in the default graph in this example.

Example 2:

RDF data can be combined by the RDF merge [RDF-MT] of graphs. One possible arrangement of graphs in an RDF Dataset is to have the default graph be the RDF merge of some or all of the information in the named graphs.

In this next example, the named graphs contain the same triples as before. The RDF dataset includes an RDF merge of the named graphs in the default graph, re-labeling blank nodes to keep them distinct.

# Default graph
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:x foaf:name "Bob" .
_:x foaf:mbox <mailto:bob@oldcorp.example.org> .

_:y foaf:name "Alice" .
_:y foaf:mbox <mailto:alice@work.example.org> .

# Named graph: http://example.org/bob
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "Bob" .
_:a foaf:mbox <mailto:bob@oldcorp.example.org> .

# Named graph: http://example.org/alice
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "Alice" .
_:a foaf:mbox <mailto:alice@work.example.org> .

In an RDF merge, blank nodes in the merged graph are not shared with blank nodes from the graphs being merged.

12.2 Specifying RDF Datasets

A SPARQL query may specify the dataset to be used for matching by using the FROM clause and the FROM NAMED clause to describe the RDF dataset. If a query provides such a dataset description, then it is used in place of any dataset that the query service would use if no dataset description is provided in a query. The RDF dataset may also be specified in a SPARQL protocol request, in which case the protocol description overrides any description in the query itself. A query service may refuse a query request if the dataset description is not acceptable to the service.

The FROM and FROM NAMED keywords allow a query to specify an RDF dataset by reference; they indicate that the dataset should include graphs that are obtained from representations of the resources identified by the given IRIs (i.e. the absolute form of the given IRI references). The dataset resulting from a number of FROM and FROM NAMED clauses is:

  • a default graph consisting of the RDF merge of the graphs referred to in the FROM clauses, and
  • a set of (IRI, graph) pairs, one from each FROM NAMED clause.

If there is no FROM clause, but there are one or more FROM NAMED clauses, then the dataset includes an empty graph for the default graph.
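The dataset construction described above can be sketched in Python, assuming graphs are sets of string triples with blank nodes written "_:label". load is a hypothetical callable standing in for retrieving and parsing the representation at an IRI; standardize_apart gives each input graph's blank nodes fresh labels so that the merge keeps them distinct, as an RDF merge requires.

```python
import itertools
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
_fresh = itertools.count()


def standardize_apart(graph: Graph) -> Graph:
    """Relabel blank nodes (terms starting with '_:') with globally fresh labels."""
    relabel: Dict[str, str] = {}

    def r(t: str) -> str:
        if t.startswith("_:"):
            if t not in relabel:
                relabel[t] = f"_:b{next(_fresh)}"
            return relabel[t]
        return t

    return {(r(s), r(p), r(o)) for s, p, o in graph}


def rdf_merge(graphs: Iterable[Graph]) -> Graph:
    """Union of the graphs after keeping their blank nodes apart."""
    merged: Graph = set()
    for g in graphs:
        merged |= standardize_apart(g)
    return merged


def build_dataset(from_iris: List[str], from_named_iris: List[str],
                  load: Callable[[str], Graph]) -> Tuple[Graph, Dict[str, Graph]]:
    default = rdf_merge(load(iri) for iri in from_iris)   # empty graph if no FROM
    named = {iri: load(iri) for iri in from_named_iris}   # one (IRI, graph) pair each
    return default, named


docs = {
    "http://example.org/bob": {("_:a", "foaf:name", '"Bob"')},
    "http://example.org/alice": {("_:a", "foaf:name", '"Alice"')},
}
default, named = build_dataset(list(docs), list(docs), docs.__getitem__)
print(len({s for s, _, _ in default}))   # 2: the two _:a nodes stay distinct
```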

12.2.3 Combining FROM and FROM NAMED

The FROM clause and FROM NAMED clause can be used in the same query.

The RDF Dataset for this query contains a default graph and two named graphs. The GRAPH keyword is described below.

The actions required to construct the dataset are not determined by the dataset description alone. If an IRI is given twice in a dataset description, either by using two FROM clauses, or a FROM clause and a FROM NAMED clause, then the description does not imply that exactly one or exactly two attempts are made to obtain an RDF graph associated with the IRI. Therefore, no assumptions can be made about blank node identity in triples obtained from the two occurrences in the dataset description. In general, no assumptions can be made about the equivalence of the graphs.

12.3 Querying the Dataset

When querying a collection of graphs, the GRAPH keyword is used to match patterns against named graphs. GRAPH can provide an IRI to select one graph or use a variable which will range over the IRIs of all the named graphs in the query's RDF dataset.

The use of GRAPH changes the active graph for matching basic graph patterns within part of the query. Outside the use of GRAPH, the default graph is matched by basic graph patterns.

The following two graphs will be used in examples:

# Named graph: http://example.org/foaf/aliceFoaf
@prefix  foaf:     <http://xmlns.com/foaf/0.1/> .
@prefix  rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix  rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .

_:a  foaf:name     "Alice" .
_:a  foaf:mbox     <mailto:alice@work.example> .
_:a  foaf:knows    _:b .

_:b  foaf:name     "Bob" .
_:b  foaf:mbox     <mailto:bob@work.example> .
_:b  foaf:nick     "Bobby" .
_:b  rdfs:seeAlso  <http://example.org/foaf/bobFoaf> .

<http://example.org/foaf/bobFoaf>
     rdf:type      foaf:PersonalProfileDocument .

# Named graph: http://example.org/foaf/bobFoaf
@prefix  foaf:     <http://xmlns.com/foaf/0.1/> .
@prefix  rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix  rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .

_:z  foaf:mbox     <mailto:bob@work.example> .
_:z  rdfs:seeAlso  <http://example.org/foaf/bobFoaf> .
_:z  foaf:nick     "Robert" .

<http://example.org/foaf/bobFoaf>
     rdf:type      foaf:PersonalProfileDocument .

Any triple in Alice's FOAF file giving Bob's nick is not used to provide a nick for Bob because the pattern involving variable nick is restricted by ppd to a particular Personal Profile Document.

12.3.1 Named and Default Graphs

Query patterns can involve both the default graph and the named graphs. In this example, an aggregator has read in a Web resource on two different occasions. Each time a graph is read into the aggregator, it is given an IRI by the local system. The graphs are nearly the same but the email address for "Bob" has changed.

In this example, the default graph is being used to record the provenance information and the RDF data actually read is kept in two separate graphs, each of which is given a different IRI by the system. The RDF dataset consists of two named graphs and the information about them.

RDF Dataset:

The IRI for the date datatype has been abbreviated in the results for clarity.

13 Solution Sequences and Modifiers

Query patterns generate an unordered collection of solutions, each solution being a partial function from variables to RDF terms. These solutions are then treated as a sequence (a solution sequence), initially in no specific order; any sequence modifiers are then applied to create another sequence. Finally, this latter sequence is used to generate one of the results of a SPARQL query form.

A solution sequence modifier is one of:

Modifiers are applied in the order given by the list above.

13.1 ORDER BY

The ORDER BY clause establishes the order of a solution sequence.

Following the ORDER BY clause is a sequence of order comparators, composed of an expression and an optional order modifier (either ASC() or DESC()). Each ordering comparator is either ascending (indicated by the ASC() modifier or by no modifier) or descending (indicated by the DESC() modifier).

The "<" operator (see the Operator Mapping and 11.3.1 Operator Extensibility) defines the relative order of pairs of numerics, simple literals, xsd:strings, xsd:booleans and xsd:dateTimes. Pairs of IRIs are ordered by comparing them as simple literals.

SPARQL also fixes an order between some kinds of RDF terms that would not otherwise be ordered:

  1. (Lowest) no value assigned to the variable or expression in this solution.
  2. Blank nodes
  3. IRIs
  4. RDF literals

A plain literal is lower than an RDF literal with type xsd:string of the same lexical form.

SPARQL does not define a total ordering of all possible RDF terms. Here are a few examples of pairs of terms for which the relative order is undefined:

  • "a" and "a"@en-gb (a simple literal and a literal with a language tag)
  • "a"@en-gb and "b"@en-gb (two literals with language tags)
  • "a" and "1"^^xsd:integer (a simple literal and a literal with a supported data type)
  • "1"^^my:integer and "2"^^my:integer (two unsupported data types)
  • "1"^^xsd:integer and "2"^^my:integer (a supported data type and an unsupported data type)

This list of variable bindings is in ascending order:

RDF Term | Reason
(no binding) | Unbound results sort earliest.
_:z | Blank nodes follow unbound.
_:a | There is no relative ordering of blank nodes.
<http://script.example/Latin> | IRIs follow blank nodes.
<http://script.example/Кириллица> | The character in the 23rd position, "К", has a Unicode codepoint 0x41A, which is higher than 0x4C ("L").
<http://script.example/漢字> | The character in the 23rd position, "漢", has a Unicode codepoint 0x6F22, which is higher than 0x41A ("К").
"http://script.example/Latin" | Simple literals follow IRIs.
"http://script.example/Latin"^^xsd:string | xsd:strings follow simple literals.

The ascending order of two solutions with respect to an ordering comparator is established by substituting the solution bindings into the expressions and comparing them with the "<" operator. The descending order is the reverse of the ascending order.

The relative order of two solutions is the relative order of the two solutions with respect to the first ordering comparator in the sequence. For solutions where the substitutions of the solution bindings produce the same RDF term, the order is the relative order of the two solutions with respect to the next ordering comparator. The relative order of two solutions is undefined if no order expression evaluated for the two solutions produces distinct RDF terms.
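The comparator rules can be sketched in Python, assuming terms are modelled as (kind, value) tuples and an unbound value as None. Only the kind ranking and plain value comparison are modelled: simple literals versus xsd:string, language tags and datatype promotion are left out, and pairs whose order is undefined are treated as ties, so Python's sort may place them arbitrarily.

```python
from functools import cmp_to_key
from typing import Callable, Dict, List, Optional, Tuple

Term = Optional[Tuple[str, object]]      # ("bnode"|"iri"|"literal", value); None = unbound
RANK = {"bnode": 1, "iri": 2, "literal": 3}


def term_cmp(a: Term, b: Term) -> int:
    ra = 0 if a is None else RANK[a[0]]
    rb = 0 if b is None else RANK[b[0]]
    if ra != rb:                          # unbound < blank nodes < IRIs < literals
        return -1 if ra < rb else 1
    if ra <= 1:                           # two unbound, or two blank nodes: no relative order
        return 0
    x, y = a[1], b[1]
    try:
        return (x > y) - (x < y)          # strings compare by Unicode code point
    except TypeError:                     # e.g. number vs string: undefined, treat as tie
        return 0


def order_by(solutions: List[Dict[str, Term]],
             comparators: List[Tuple[Callable[[Dict[str, Term]], Term], bool]]):
    """comparators: (expression, descending) pairs, applied left to right."""
    def cmp(m1, m2) -> int:
        for expr, descending in comparators:
            c = term_cmp(expr(m1), expr(m2))
            if c:
                return -c if descending else c
        return 0                          # relative order undefined
    return sorted(solutions, key=cmp_to_key(cmp))


def v(mu):
    return mu.get("?v")


sols = [{"?v": ("literal", "http://script.example/Latin")},
        {"?v": ("iri", "http://script.example/漢字")},
        {},
        {"?v": ("iri", "http://script.example/Кириллица")},
        {"?v": ("bnode", "z")},
        {"?v": ("iri", "http://script.example/Latin")}]
# unbound first, then the blank node, the three IRIs by code point, then the literal
print([v(mu) for mu in order_by(sols, [(v, False)])])
```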

Ordering a sequence of solutions always results in a sequence with the same number of solutions in it.

Using ORDER BY on a solution sequence for a CONSTRUCT or DESCRIBE query has no direct effect because only SELECT returns a sequence of results. Used in combination with LIMIT and OFFSET, ORDER BY can be used to return results generated from a different slice of the solution sequence. An ASK query does not include ORDER BY, LIMIT or OFFSET.

Grammar rules:
[16]  OrderClause     ::=  'ORDER' 'BY' OrderCondition+
[17]  OrderCondition  ::=  ( ( 'ASC' | 'DESC' ) BrackettedExpression )
                           | ( Constraint | Var )
[18]  LimitClause     ::=  'LIMIT' INTEGER
[19]  OffsetClause    ::=  'OFFSET' INTEGER

13.3 Duplicate Solutions

A solution sequence with no DISTINCT or REDUCED query modifier will preserve duplicate solutions.

14 Query Forms

SPARQL has four query forms. These query forms use the solutions from pattern matching to form result sets or RDF graphs. The query forms are:

SELECT
Returns all, or a subset of, the variables bound in a query pattern match.
CONSTRUCT
Returns an RDF graph constructed by substituting variables in a set of triple templates.
ASK
Returns a boolean indicating whether a query pattern matches or not.
DESCRIBE
Returns an RDF graph that describes the resources found.

The SPARQL Variable Binding Results XML Format can be used to serialize the result set from a SELECT query or the boolean result of an ASK query.

14.1 SELECT

The SELECT form of results returns variables and their bindings directly. It combines the operations of projecting the required variables with introducing new variable bindings into a query solution.

@@Grammar refers to SPARQL 1.0 only

14.2 CONSTRUCT

The CONSTRUCT query form returns a single RDF graph specified by a graph template. The result is an RDF graph formed by taking each query solution in the solution sequence, substituting for the variables in the graph template, and combining the triples into a single RDF graph by set union.

If any such instantiation produces a triple containing an unbound variable or an illegal RDF construct, such as a literal in subject or predicate position, then that triple is not included in the output RDF graph. The graph template can contain triples with no variables (known as ground or explicit triples), and these also appear in the output RDF graph returned by the CONSTRUCT query form.
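Template instantiation can be sketched in Python, assuming triples are string tuples, variables start with "?", literals start with a double quote and blank nodes with "_:". Besides literals in subject or predicate position, the sketch also drops triples whose predicate is a blank node, which RDF does not allow either. Blank nodes in the template, which a full implementation must relabel for each solution, are not handled, so the example template contains none.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_var(t: str) -> bool:
    return t.startswith("?")


def is_literal(t: str) -> bool:
    return t.startswith('"')


def construct(template: List[Triple], solutions: Iterable[Dict[str, str]]) -> Set[Triple]:
    """Instantiate the template once per solution and union the legal triples."""
    graph: Set[Triple] = set()
    for mu in solutions:
        for pattern in template:
            s, p, o = (mu.get(t) if is_var(t) else t for t in pattern)
            if s is None or p is None or o is None:
                continue                  # unbound variable: triple is dropped
            if is_literal(s) or is_literal(p) or p.startswith("_:"):
                continue                  # not legal RDF: triple is dropped
            graph.add((s, p, o))          # ground triples are added for every solution
    return graph


template = [("?x", "foaf:name", "?name"),
            ("?x", "ex:nick", "?nick"),
            ("?name", "ex:of", "?x"),             # literal subject once ?name is bound
            ("ex:doc", "rdf:type", "ex:Report")]  # ground triple
solutions = [{"?x": "ex:alice", "?name": '"Alice"'},
             {"?x": "ex:bob", "?name": '"Bob"', "?nick": '"Bobby"'}]
print(len(construct(template, solutions)))  # 4
```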

14.4 DESCRIBE (Informative)

The DESCRIBE form returns a single result RDF graph containing RDF data about resources. This data is not prescribed by a SPARQL query, where the query client would need to know the structure of the RDF in the data source, but, instead, is determined by the SPARQL query processor. The query pattern is used to create a result set. The DESCRIBE form takes each of the resources identified in a solution, together with any resources directly named by IRI, and assembles a single RDF graph by taking a "description" which can come from any information available including the target RDF Dataset. The description is determined by the query service. The syntax DESCRIBE * is an abbreviation that describes all of the variables in a query.

14.4.3 Descriptions of Resources

The RDF returned is determined by the information publisher. It is the useful information the service has about a resource. It may include information about other resources: for example, the RDF data for a book may also include details about the author.

A simple query such as

which includes the blank node closure for the vcard vocabulary vcard:N. Other possible mechanisms for deciding what information to return include Concise Bounded Descriptions [CBD].

For a vocabulary such as FOAF, where the resources are typically blank nodes, returning sufficient information to identify a node such as the InverseFunctionalProperty foaf:mbox_sha1sum as well as information like name and other details recorded would be appropriate. In the example, the match to the WHERE clause was returned, but this is not required.

15 Testing Values

@@ To add to function/operator table: IRI, BNODE, STRDT, STRLANG, IF, COALESCE, IN, NOT IN

SPARQL FILTERs restrict the solutions of a graph pattern match according to a given expression. Specifically, FILTERs eliminate any solutions that, when substituted into the expression, either result in an effective boolean value of false or produce an error. Effective boolean values are defined in section 11.2.2 Effective Boolean Value and errors are defined in XQuery 1.0: An XML Query Language [XQUERY] section 2.3.1, Kinds of Errors. These errors have no effect outside of FILTER evaluation.

RDF literals may have a datatype IRI:

@prefix a:          <http://www.w3.org/2000/10/annotation-ns#> .
@prefix dc:         <http://purl.org/dc/elements/1.1/> .

_:a   a:annotates   <http://www.w3.org/TR/rdf-sparql-query/> .
_:a   dc:date       "2004-12-31T19:00:00-05:00" .

_:b   a:annotates   <http://www.w3.org/TR/rdf-sparql-query/> .
_:b   dc:date       "2004-12-31T19:01:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

The object of the first dc:date triple has no type information. The second has the datatype xsd:dateTime.

SPARQL expressions are constructed according to the grammar and provide access to functions (named by IRI) and operator functions (invoked by keywords and symbols in the SPARQL grammar). SPARQL operators can be used to compare the values of typed literals:

PREFIX a:      <http://www.w3.org/2000/10/annotation-ns#>
PREFIX dc:     <http://purl.org/dc/elements/1.1/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>

SELECT ?annot
WHERE { ?annot  a:annotates  <http://www.w3.org/TR/rdf-sparql-query/> .
        ?annot  dc:date      ?date .
        FILTER ( ?date > "2005-01-01T00:00:00Z"^^xsd:dateTime ) }

The SPARQL operators are listed in section 11.3 and are associated with their productions in the grammar.

In addition, SPARQL provides the ability to invoke arbitrary functions, including a subset of the XPath casting functions, listed in section 11.5. These functions are invoked by name (an IRI) within a SPARQL query. For example:

... FILTER ( xsd:dateTime(?date) < xsd:dateTime("2005-01-01T00:00:00Z") ) ...

The following typographical conventions are used in this section:

15.1 Operand Data Types

SPARQL functions and operators operate on RDF terms and SPARQL variables. A subset of these functions and operators are taken from the XQuery 1.0 and XPath 2.0 Functions and Operators [FUNCOP] and have XML Schema typed value arguments and return types. RDF typed literals passed as arguments to these functions and operators are mapped to XML Schema typed values with a string value of the lexical form and an atomic datatype corresponding to the datatype IRI. The returned typed values are mapped back to RDF typed literals the same way.

SPARQL has additional operators which operate on specific subsets of RDF terms. When referring to a type, the following terms denote a typed literal with the corresponding XML Schema [XSDT] datatype IRI:

The following terms identify additional types used in SPARQL value tests:

  • numeric denotes typed literals with datatypes xsd:integer, xsd:decimal, xsd:float, and xsd:double.
  • simple literal denotes a plain literal with no language tag.
  • RDF term denotes the types IRI, literal, and blank node.
  • variable denotes a SPARQL variable.

The following types are derived from numeric types and are valid arguments to functions and operators taking numeric arguments:

SPARQL language extensions may treat additional types as being derived from XML schema data types.

15.2 Filter Evaluation

SPARQL provides a subset of the functions and operators defined by XQuery Operator Mapping. XQuery 1.0 section 2.2.3 Expression Processing describes the invocation of XPath functions. The following rules accommodate the differences in the data and execution models between XQuery and SPARQL:

  • Unlike XPath/XQuery, SPARQL functions do not process node sequences. When interpreting the semantics of XPath functions, assume that each argument is a sequence of a single node.
  • Functions invoked with an argument of the wrong type will produce a type error. Effective boolean value arguments (labeled "xsd:boolean (EBV)" in the operator mapping table below) are coerced to xsd:boolean using the EBV rules in section 11.2.2.
  • Apart from BOUND, all functions and operators operate on RDF Terms and will produce a type error if any arguments are unbound.
  • Any expression other than logical-or (||) or logical-and (&&) that encounters an error will produce that error.
  • A logical-or that encounters an error on only one branch will return TRUE if the other branch is TRUE and an error if the other branch is FALSE.
  • A logical-and that encounters an error on only one branch will return an error if the other branch is TRUE and FALSE if the other branch is FALSE.
  • A logical-or or logical-and that encounters errors on both branches will produce either of the errors.

The logical-and and logical-or truth table for true (T), false (F), and error (E) is as follows:

A  B  A || B  A && B
T  T  T       T
T  F  T       F
F  T  T       F
F  F  F       F
T  E  T       E
E  T  T       E
F  E  E       F
E  F  E       F
E  E  E       E
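The truth table can be checked against a small Python sketch of the two connectives, where a hypothetical sentinel E stands in for any evaluation error. When both branches are errors the sketch simply returns E, which is one of the permitted outcomes.

```python
from typing import Union


class EvalError:
    """Sentinel standing in for an expression evaluation error."""

    def __repr__(self) -> str:
        return "E"


E = EvalError()
Value = Union[bool, EvalError]


def logical_or(a: Value, b: Value) -> Value:
    if a is True or b is True:            # TRUE on either branch wins, even over an error
        return True
    if a is E or b is E:                  # no TRUE branch, at least one error
        return E
    return False


def logical_and(a: Value, b: Value) -> Value:
    if a is False or b is False:          # FALSE on either branch wins, even over an error
        return False
    if a is E or b is E:                  # no FALSE branch, at least one error
        return E
    return True


for a, b in [(True, E), (False, E)]:
    print(a, b, logical_or(a, b), logical_and(a, b))
# True E True E
# False E E False
```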

15.2.1 Invocation

SPARQL defines a syntax for invoking functions and operators on a list of arguments. These are invoked as follows:

  • Argument expressions are evaluated, producing argument values. The order of argument evaluation is not defined.
  • Numeric arguments are promoted as necessary to fit the expected types for that function or operator.
  • The function or operator is invoked on the argument values.

If any of these steps fails, the invocation generates an error. The effects of errors are defined in Filter Evaluation.

15.2.2 Effective Boolean Value (EBV)

Effective boolean value is used to calculate the arguments to the logical functions logical-and, logical-or, and fn:not, as well as evaluate the result of a FILTER expression.

The XQuery Effective Boolean Value rules rely on the definition of XPath's fn:boolean. The following rules reflect the rules for fn:boolean applied to the argument types present in SPARQL Queries:

  • The EBV of any literal whose type is xsd:boolean or numeric is false if the lexical form is not valid for that datatype (e.g. "abc"^^xsd:integer).
  • If the argument is a typed literal with a datatype of xsd:boolean, the EBV is the value of that argument.
  • If the argument is a plain literal or a typed literal with a datatype of xsd:string, the EBV is false if the operand value has zero length; otherwise the EBV is true.
  • If the argument is a numeric type or a typed literal with a datatype derived from a numeric type, the EBV is false if the operand value is NaN or is numerically equal to zero; otherwise the EBV is true.
  • All other arguments, including unbound arguments, produce a type error.

An EBV of true is represented as a typed literal with a datatype of xsd:boolean and a lexical value of "true"; an EBV of false is represented as a typed literal with a datatype of xsd:boolean and a lexical value of "false".
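The EBV rules can be sketched in Python, assuming literals are (lexical form, datatype IRI) tuples with None as the datatype of a plain literal, IRIs are plain strings and an unbound argument is None. Lexical validity is approximated with Python's int() and float(), which accept some forms XML Schema rejects (for example "inf"); types derived from numeric types and language tags are not modelled, and the result is a Python bool rather than an xsd:boolean literal.

```python
import math
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC = {XSD + "integer", XSD + "decimal", XSD + "float", XSD + "double"}


class SparqlTypeError(Exception):
    """Raised where the EBV rules produce a type error."""


def ebv(term: Optional[object]) -> bool:
    """term: None (unbound), an IRI string, or a (lexical form, datatype) tuple."""
    if not isinstance(term, tuple):
        raise SparqlTypeError(f"no EBV for {term!r}")
    lex, datatype = term
    if datatype == XSD + "boolean":
        return lex in ("true", "1")       # "false", "0" and invalid forms give false
    if datatype is None or datatype == XSD + "string":
        return len(lex) > 0               # false only for the empty string
    if datatype in NUMERIC:
        try:
            value = int(lex) if datatype == XSD + "integer" else float(lex)
        except ValueError:
            return False                  # invalid lexical form, e.g. "abc"^^xsd:integer
        return not (value == 0 or math.isnan(value))
    raise SparqlTypeError(f"no EBV for datatype {datatype}")


print(ebv(("abc", XSD + "integer")))      # False
```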

15.3 Operator Mapping

The SPARQL grammar identifies a set of operators (for instance, &&, *, isIRI) used to construct constraints. The following table associates each of these grammatical productions with the appropriate operands and an operator function defined by either XQuery 1.0 and XPath 2.0 Functions and Operators [FUNCOP] or the SPARQL operators specified in section 11.4. When selecting the operator definition for a given set of parameters, the definition with the most specific parameters applies. For instance, when evaluating xsd:integer = xsd:signedInt, the definition for = with two numeric parameters applies, rather than the one with two RDF terms. The table is arranged so that the upper-most viable candidate is the most specific. Operators invoked without appropriate operands result in a type error.

SPARQL follows XPath's scheme for numeric type promotions and subtype substitution for arguments to numeric operators. The XPath Operator Mapping rules for numeric operands (xsd:integer, xsd:decimal, xsd:float, xsd:double, and types derived from a numeric type) apply to SPARQL operators as well (see XML Path Language (XPath) 2.0 [XPATH20] for definitions of numeric type promotions and subtype substitution). Some of the operators are associated with nested function expressions, e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath definitions, fn:not and op:numeric-equal produce an error if their argument is an error.

The collation for fn:compare is defined by XPath and identified by http://www.w3.org/2005/xpath-functions/collation/codepoint. This collation allows for string comparison based on code point values. Codepoint string equivalence can be tested with RDF term equivalence.
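Because the codepoint collation compares strings code point by code point, Python's built-in string comparison can stand in for it in a short sketch of fn:compare and of three simple-literal comparison rows of the operator mapping table. The function names are hypothetical.

```python
def fn_compare(a: str, b: str) -> int:
    """fn:compare under the codepoint collation: -1, 0 or 1."""
    return (a > b) - (a < b)              # Python compares str by Unicode code point


def simple_eq(a: str, b: str) -> bool:
    return fn_compare(a, b) == 0          # op:numeric-equal(fn:compare(A, B), 0)


def simple_lt(a: str, b: str) -> bool:
    return fn_compare(a, b) == -1         # op:numeric-equal(fn:compare(A, B), -1)


def simple_le(a: str, b: str) -> bool:
    return fn_compare(a, b) != 1          # fn:not(op:numeric-equal(fn:compare(A, B), 1))


print(simple_lt("B", "a"))                # True: "B" is 0x42, "a" is 0x61
```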

SPARQL Unary Operators
Operator | Type(A) | Function | Result type
XQuery Unary Operators
! A | xsd:boolean (EBV) | fn:not(A) | xsd:boolean
+ A | numeric | op:numeric-unary-plus(A) | numeric
- A | numeric | op:numeric-unary-minus(A) | numeric
SPARQL Tests, defined in section 11.4
BOUND(A) | variable | bound(A) | xsd:boolean
isIRI(A), isURI(A) | RDF term | isIRI(A) | xsd:boolean
isBLANK(A) | RDF term | isBlank(A) | xsd:boolean
isLITERAL(A) | RDF term | isLiteral(A) | xsd:boolean
SPARQL Accessors, defined in section 11.4
STR(A) | literal | str(A) | simple literal
STR(A) | IRI | str(A) | simple literal
LANG(A) | literal | lang(A) | simple literal
DATATYPE(A) | typed literal | datatype(A) | IRI
DATATYPE(A) | simple literal | datatype(A) | IRI

SPARQL Binary Operators
Operator | Type(A) | Type(B) | Function | Result type
Logical Connectives, defined in section 11.4
A || B | xsd:boolean (EBV) | xsd:boolean (EBV) | logical-or(A, B) | xsd:boolean
A && B | xsd:boolean (EBV) | xsd:boolean (EBV) | logical-and(A, B) | xsd:boolean
XPath Tests
A = B | numeric | numeric | op:numeric-equal(A, B) | xsd:boolean
A = B | simple literal | simple literal | op:numeric-equal(fn:compare(A, B), 0) | xsd:boolean
A = B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), 0) | xsd:boolean
A = B | xsd:boolean | xsd:boolean | op:boolean-equal(A, B) | xsd:boolean
A = B | xsd:dateTime | xsd:dateTime | op:dateTime-equal(A, B) | xsd:boolean
A != B | numeric | numeric | fn:not(op:numeric-equal(A, B)) | xsd:boolean
A != B | simple literal | simple literal | fn:not(op:numeric-equal(fn:compare(A, B), 0)) | xsd:boolean
A != B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 0)) | xsd:boolean
A != B | xsd:boolean | xsd:boolean | fn:not(op:boolean-equal(A, B)) | xsd:boolean
A != B | xsd:dateTime | xsd:dateTime | fn:not(op:dateTime-equal(A, B)) | xsd:boolean
A < B | numeric | numeric | op:numeric-less-than(A, B) | xsd:boolean
A < B | simple literal | simple literal | op:numeric-equal(fn:compare(A, B), -1) | xsd:boolean
A < B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), -1) | xsd:boolean
A < B | xsd:boolean | xsd:boolean | op:boolean-less-than(A, B) | xsd:boolean
A < B | xsd:dateTime | xsd:dateTime | op:dateTime-less-than(A, B) | xsd:boolean
A > B | numeric | numeric | op:numeric-greater-than(A, B) | xsd:boolean
A > B | simple literal | simple literal | op:numeric-equal(fn:compare(A, B), 1) | xsd:boolean
A > B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), 1) | xsd:boolean
A > B | xsd:boolean | xsd:boolean | op:boolean-greater-than(A, B) | xsd:boolean
A > B | xsd:dateTime | xsd:dateTime | op:dateTime-greater-than(A, B) | xsd:boolean
A <= B | numeric | numeric | logical-or(op:numeric-less-than(A, B), op:numeric-equal(A, B)) | xsd:boolean
A <= B | simple literal | simple literal | fn:not(op:numeric-equal(fn:compare(A, B), 1)) | xsd:boolean
A <= B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 1)) | xsd:boolean
A <= B | xsd:boolean | xsd:boolean | fn:not(op:boolean-greater-than(A, B)) | xsd:boolean
A <= B | xsd:dateTime | xsd:dateTime | fn:not(op:dateTime-greater-than(A, B)) | xsd:boolean
A >= B | numeric | numeric | logical-or(op:numeric-greater-than(A, B), op:numeric-equal(A, B)) | xsd:boolean
A >= B | simple literal | simple literal | fn:not(op:numeric-equal(fn:compare(A, B), -1)) | xsd:boolean
A >= B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), -1)) | xsd:boolean
A >= B | xsd:boolean | xsd:boolean | fn:not(op:boolean-less-than(A, B)) | xsd:boolean
A >= B | xsd:dateTime | xsd:dateTime | fn:not(op:dateTime-less-than(A, B)) | xsd:boolean
XPath Arithmetic
A * B | numeric | numeric | op:numeric-multiply(A, B) | numeric
A / B | numeric | numeric | op:numeric-divide(A, B) | numeric; but xsd:decimal if both operands are xsd:integer
A + B | numeric | numeric | op:numeric-add(A, B) | numeric
A - B | numeric | numeric | op:numeric-subtract(A, B) | numeric
SPARQL Tests, defined in section 11.4
A = B | RDF term | RDF term | RDFterm-equal(A, B) | xsd:boolean
A != B | RDF term | RDF term | fn:not(RDFterm-equal(A, B)) | xsd:boolean
sameTERM(A, B) | RDF term | RDF term | sameTerm(A, B) | xsd:boolean
langMATCHES(A, B) | simple literal | simple literal | langMatches(A, B) | xsd:boolean
REGEX(STRING, PATTERN) | simple literal | simple literal | fn:matches(STRING, PATTERN) | xsd:boolean

SPARQL Trinary Operators
Operator | Type(A) | Type(B) | Type(C) | Function | Result type
SPARQL Tests, defined in section 11.4
REGEX(STRING, PATTERN, FLAGS) | simple literal | simple literal | simple literal | fn:matches(STRING, PATTERN, FLAGS) | xsd:boolean

xsd:boolean function arguments marked with "(EBV)" are coerced to xsd:boolean by evaluating the effective boolean value of that argument.

15.4 Operators Definitions

This section defines the operators introduced by the SPARQL Query language. The examples show the behavior of the operators as invoked by the appropriate grammatical constructs.

15.4.4 isLiteral

 xsd:boolean   isLiteral (RDF term term)

Returns true if term is a literal. Returns false otherwise.

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name       "Alice".
_:a  foaf:mbox       <mailto:alice@work.example> .

_:b  foaf:name       "Bob" .
_:b  foaf:mbox       "bob@work.example" .

This query is similar to the one in section 15.4.2 except that it matches the people with a name and an mbox which is a literal. This could be used to look for erroneous data (foaf:mbox should only have an IRI as its object).

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
 WHERE { ?x foaf:name  ?name ;
           foaf:mbox  ?mbox .
         FILTER isLiteral(?mbox) }

Query result:

name | mbox
"Bob" | "bob@work.example"

15.4.10 RDFterm-equal

 xsd:boolean   RDF term term1 = RDF term term2

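Returns TRUE if term1 and term2 are the same RDF term as defined in Resource Description Framework (RDF): Concepts and Abstract Syntax [CONCEPTS]; produces a type error if the arguments are both literal but are not the same RDF term *; returns FALSE otherwise. term1 and term2 are the same if any of the following is true:

  • term1 and term2 are equivalent IRIs as defined in 6.4 RDF URI References of [CONCEPTS].
  • term1 and term2 are equivalent literals as defined in 6.5.1 Literal Equality of [CONCEPTS].
  • term1 and term2 are the same blank node as described in 6.6 Blank Nodes of [CONCEPTS].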

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name       "Alice".
_:a  foaf:mbox       <mailto:alice@work.example> .

_:b  foaf:name       "Ms A.".
_:b  foaf:mbox       <mailto:alice@work.example> .

This query finds the people who have multiple foaf:name triples:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name1 ?name2
 WHERE { ?x foaf:name  ?name1 ;
            foaf:mbox  ?mbox1 .
         ?y foaf:name  ?name2 ;
            foaf:mbox  ?mbox2 .
         FILTER (?mbox1 = ?mbox2 && ?name1 != ?name2)
       }

Query result:

name1 | name2
"Alice" | "Ms A."
"Ms A." | "Alice"

In this query for documents that were annotated on New Year's Day (2004 or 2005), the RDF terms are not the same, but have equivalent values:

@prefix a:          <http://www.w3.org/2000/10/annotation-ns#> .
@prefix dc:         <http://purl.org/dc/elements/1.1/> .

_:b   a:annotates   <http://www.w3.org/TR/rdf-sparql-query/> .
_:b   dc:date       "2004-12-31T19:00:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

PREFIX a:      <http://www.w3.org/2000/10/annotation-ns#>
PREFIX dc:     <http://purl.org/dc/elements/1.1/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>

SELECT ?annotates
WHERE { ?annot  a:annotates  ?annotates .
        ?annot  dc:date      ?date .
        FILTER ( ?date = xsd:dateTime("2005-01-01T00:00:00Z") ) }

Query result:

annotates
<http://www.w3.org/TR/rdf-sparql-query/>

* Invoking RDFterm-equal on two typed literals tests for equivalent values. An extended implementation may have support for additional datatypes. An implementation processing a query that tests for equivalence on unsupported datatypes (and non-identical lexical form and datatype IRI) returns an error, indicating that it was unable to determine whether or not the values are equivalent. For example, an unextended implementation will produce an error when testing either "iiii"^^my:romanNumeral = "iv"^^my:romanNumeral or "iiii"^^my:romanNumeral != "iv"^^my:romanNumeral.
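The footnote's behavior can be illustrated with a small Python sketch. It assumes a toy term model that is not part of SPARQL: IRIs are plain strings, literals are a hypothetical `Literal` tuple of lexical form and datatype IRI, and the implementation is assumed to understand only xsd:integer. Literals of an unsupported datatype with different lexical forms raise an error rather than returning false, which is what causes a FILTER to drop the solution.

```python
from typing import NamedTuple, Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    """Hypothetical term model for illustration: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


Term = Union[str, Literal]  # a plain str stands for an IRI in this sketch


class SparqlTypeError(Exception):
    """Stands in for the SPARQL type error that makes a FILTER reject a solution."""


# Assumption: the only datatype this toy implementation understands.
SUPPORTED = {XSD_INTEGER: int}


def rdfterm_equal(t1: Term, t2: Term) -> bool:
    # Identical terms (same IRI, or same lexical form and datatype IRI) are equal.
    if t1 == t2:
        return True
    if isinstance(t1, Literal) and isinstance(t2, Literal):
        if t1.datatype == t2.datatype and t1.datatype in SUPPORTED:
            # Same supported datatype: compare values, not lexical forms.
            to_value = SUPPORTED[t1.datatype]
            return to_value(t1.lexical) == to_value(t2.lexical)
        # Both literals, but equivalence of their values cannot be decided.
        raise SparqlTypeError(f"cannot compare {t1} and {t2}")
    # At least one IRI (or a literal against an IRI): not the same term.
    return False
```

Under these assumptions `!=` would be `not rdfterm_equal(...)` and would raise on exactly the same inputs, as in the my:romanNumeral example.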

15.4.11 sameTerm

 xsd:boolean   sameTerm (RDF term term1, RDF term term2)

Returns TRUE if term1 and term2 are the same RDF term as defined in Resource Description Framework (RDF): Concepts and Abstract Syntax [CONCEPTS]; returns FALSE otherwise.

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name       "Alice".
_:a  foaf:mbox       <mailto:alice@work.example> .

_:b  foaf:name       "Ms A.".
_:b  foaf:mbox       <mailto:alice@work.example> .

This query finds the people who have multiple foaf:name triples:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name1 ?name2
 WHERE { ?x foaf:name  ?name1 ;
            foaf:mbox  ?mbox1 .
         ?y foaf:name  ?name2 ;
            foaf:mbox  ?mbox2 .
         FILTER (sameTerm(?mbox1, ?mbox2) && !sameTerm(?name1, ?name2))
       }

Query result:

name1 | name2
"Alice" | "Ms A."
"Ms A." | "Alice"

Unlike RDFterm-equal, sameTerm can be used to test for non-equivalent typed literals with unsupported data types:

@prefix :          <http://example.org/WMterms#> .
@prefix t:         <http://example.org/types#> .

_:c1  :label        "Container 1" .
_:c1  :weight       "100"^^t:kilos .
_:c1  :displacement  "100"^^t:liters .

_:c2  :label        "Container 2" .
_:c2  :weight       "100"^^t:kilos .
_:c2  :displacement  "85"^^t:liters .

_:c3  :label        "Container 3" .
_:c3  :weight       "85"^^t:kilos .
_:c3  :displacement  "85"^^t:liters .

PREFIX  :      <http://example.org/WMterms#>
PREFIX  t:     <http://example.org/types#>

SELECT ?aLabel ?bLabel
WHERE { ?a  :label        ?aLabel .
        ?a  :weight       ?aWeight .
        ?a  :displacement ?aDisp .

        ?b  :label        ?bLabel .
        ?b  :weight       ?bWeight .
        ?b  :displacement ?bDisp .

        FILTER ( sameTerm(?aWeight, ?bWeight) && !sameTerm(?aDisp, ?bDisp) ) }

Query result:

aLabel | bLabel
"Container 1" | "Container 2"
"Container 2" | "Container 1"

The test for boxes with the same weight may also be done with the '=' operator (RDFterm-equal), because the test "100"^^t:kilos = "85"^^t:kilos results in an error, which eliminates that potential solution.
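To make the sameTerm semantics concrete, the following Python sketch evaluates the container query over the example data. It assumes each typed literal is modeled as a (lexical form, datatype IRI) tuple, so sameTerm reduces to plain tuple equality, which never raises even though t:kilos and t:liters are unknown datatypes. The data layout and function names are illustrative only.

```python
# Literals as (lexical form, datatype IRI); each container maps label -> (weight, displacement).
KILOS = "http://example.org/types#kilos"
LITERS = "http://example.org/types#liters"

containers = {
    "Container 1": (("100", KILOS), ("100", LITERS)),
    "Container 2": (("100", KILOS), ("85", LITERS)),
    "Container 3": (("85", KILOS), ("85", LITERS)),
}


def same_term(t1, t2):
    # Same RDF term: identical lexical form and identical datatype IRI.
    return t1 == t2


def same_weight_different_displacement(data):
    results = []
    for a_label, (a_weight, a_disp) in data.items():
        for b_label, (b_weight, b_disp) in data.items():
            # FILTER ( sameTerm(?aWeight, ?bWeight) && !sameTerm(?aDisp, ?bDisp) )
            if same_term(a_weight, b_weight) and not same_term(a_disp, b_disp):
                results.append((a_label, b_label))
    return results


print(same_weight_different_displacement(containers))
# [('Container 1', 'Container 2'), ('Container 2', 'Container 1')]
```

The two pairs match the query result above; Container 3 has no partner because no other container shares its weight term.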

15.4.12 langMatches

 xsd:boolean   langMatches (simple literal language-tag, simple literal language-range)

Returns true if language-tag (first argument) matches language-range (second argument) per the basic filtering scheme defined in [RFC4647] section 3.3.1. language-range is a basic language range per Matching of Language Tags [RFC4647] section 2.1. A language-range of "*" matches any non-empty language-tag string.

@prefix dc:       <http://purl.org/dc/elements/1.1/> .

_:a  dc:title         "That Seventies Show"@en .
_:a  dc:title         "Cette Série des Années Soixante-dix"@fr .
_:a  dc:title         "Cette Série des Années Septante"@fr-BE .
_:b  dc:title         "Il Buono, il Bruto, il Cattivo" .

This query uses langMatches and lang (described in section 11.2.3.8) to find the French titles for the show known in English as "That Seventies Show":

PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
 WHERE { ?x dc:title  "That Seventies Show"@en ;
            dc:title  ?title .
         FILTER langMatches( lang(?title), "FR" ) }

Query result:

title
"Cette Série des Années Soixante-dix"@fr
"Cette Série des Années Septante"@fr-BE

The idiom langMatches( lang( ?v ), "*" ) will not match literals without a language tag, because lang( ?v ) returns an empty string for them, so

PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
 WHERE { ?x dc:title  ?title .
         FILTER langMatches( lang(?title), "*" ) }

will report all of the titles with a language tag:

title
"That Seventies Show"@en
"Cette Série des Années Soixante-dix"@fr
"Cette Série des Années Septante"@fr-BE
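The basic filtering rule can be sketched in Python as follows. This illustrates RFC 4647 basic filtering as summarized above: a case-insensitive comparison in which the range must equal the tag or be a prefix of it ending at a "-" boundary, and "*" matches any non-empty tag. The function name `lang_matches` is hypothetical.

```python
def lang_matches(language_tag: str, language_range: str) -> bool:
    """Basic filtering (RFC 4647 section 3.3.1), compared case-insensitively."""
    tag = language_tag.lower()
    rng = language_range.lower()
    if rng == "*":
        # "*" matches any non-empty tag, so untagged literals (lang = "") never match.
        return tag != ""
    # Exact match, or the range is a prefix of the tag followed by "-".
    return tag == rng or tag.startswith(rng + "-")
```

Applied to the four dc:title literals above, with an empty string standing for the untagged title, `lang_matches(tag, "FR")` accepts only the fr and fr-BE titles, and `lang_matches(tag, "*")` rejects only the untagged one.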

15.4.13 regex

 xsd:boolean   regex (simple literal text, simple literal pattern)
 xsd:boolean   regex (simple literal text, simple literal pattern, simple literal flags)

Invokes the XPath fn:matches function to match text against a regular expression pattern. The regular expression language is defined in XQuery 1.0 and XPath 2.0 Functions and Operators section 7.6.1 Regular Expression Syntax [FUNCOP].

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name       "Alice".
_:b  foaf:name       "Bob" .

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
 WHERE { ?x foaf:name  ?name
         FILTER regex(?name, "^ali", "i") }

Query result:

name
"Alice"

15.5 Constructor Functions

SPARQL imports a subset of the XPath constructor functions defined in XQuery 1.0 and XPath 2.0 Functions and Operators [FUNCOP] in section 17.1 Casting from primitive types to primitive types. SPARQL constructors include all of the XPath constructors for the SPARQL operand data types plus the additional datatypes imposed by the RDF data model. Casting in SPARQL is performed by calling a constructor function for the target type on an operand of the source type.

XPath defines only the casts from one XML Schema datatype to another. The remaining casts are defined as follows:

  • Casting an IRI to an xsd:string produces a typed literal with a lexical value of the codepoints comprising the IRI, and a datatype of xsd:string.
  • Casting a simple literal to any XML Schema datatype is defined as the product of casting an xsd:string with the string value equal to the lexical value of the literal to the target datatype.

The table below summarizes the casting operations that are always allowed (Y), never allowed (N) and dependent on the lexical value (M). For example, a casting operation from an xsd:string (the first row) to an xsd:float (the second column) is dependent on the lexical value (M).

bool = xsd:boolean
dbl = xsd:double
flt = xsd:float
dec = xsd:decimal
int = xsd:integer
dT = xsd:dateTime
str = xsd:string
IRI = IRI
ltrl = simple literal

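From \ To | str | flt | dbl | dec | int | dT | bool
str       | Y   | M   | M   | M   | M   | M  | M
flt       | Y   | Y   | Y   | M   | M   | N  | Y
dbl       | Y   | Y   | Y   | M   | M   | N  | Y
dec       | Y   | Y   | Y   | Y   | Y   | N  | Y
int       | Y   | Y   | Y   | Y   | Y   | N  | Y
dT        | Y   | N   | N   | N   | N   | Y  | N
bool      | Y   | Y   | Y   | Y   | Y   | N  | Y
IRI       | Y   | N   | N   | N   | N   | N  | N
ltrl      | Y   | M   | M   | M   | M   | M  | M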
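For tooling, the table can be encoded directly as a lookup. The Python sketch below copies the Y/N/M entries verbatim using the table's own abbreviations; `cast_status` is a hypothetical helper name, and an "M" answer still requires checking the lexical value at run time.

```python
# Column order of the casting table (target types).
TARGETS = ["str", "flt", "dbl", "dec", "int", "dT", "bool"]

# One string per source row, one character per target column, copied from the table.
CAST_TABLE = {
    "str":  "YMMMMMM",
    "flt":  "YYYMMNY",
    "dbl":  "YYYMMNY",
    "dec":  "YYYYYNY",
    "int":  "YYYYYNY",
    "dT":   "YNNNNYN",
    "bool": "YYYYYNY",
    "IRI":  "YNNNNNN",
    "ltrl": "YMMMMMM",
}


def cast_status(source: str, target: str) -> str:
    """Return 'Y' (always allowed), 'N' (never allowed) or 'M' (depends on the lexical value)."""
    return CAST_TABLE[source][TARGETS.index(target)]
```

For example, `cast_status("str", "flt")` gives "M", the case the paragraph above walks through.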

15.6 Extensible Value Testing

A PrimaryExpression grammar rule can be a call to an extension function named by an IRI. An extension function takes some number of RDF terms as arguments and returns an RDF term. The semantics of these functions are identified by the IRI that identifies the function.

SPARQL queries using extension functions are likely to have limited interoperability.

As an example, consider a function called func:even:

 xsd:boolean   func:even (numeric value)

This function would be invoked in a FILTER as such:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX func: <http://example.org/functions#>
SELECT ?name ?id
WHERE { ?x foaf:name  ?name ;
           func:empId   ?id .
        FILTER (func:even(?id)) }
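A query engine has to map each extension function IRI to an implementation. The sketch below assumes a simple in-process registry in Python keyed by full IRI; the decorator, `call_extension`, and the choice to raise an error on an unknown IRI are assumptions made for illustration, not behavior defined by this document.

```python
from typing import Callable, Dict

# Hypothetical registry: extension functions are looked up by their full IRI.
EXTENSIONS: Dict[str, Callable[..., object]] = {}


def extension(iri: str):
    """Decorator registering a Python callable as the extension function named by iri."""
    def register(fn):
        EXTENSIONS[iri] = fn
        return fn
    return register


@extension("http://example.org/functions#even")
def func_even(value: int) -> bool:
    return value % 2 == 0


def call_extension(iri: str, *args):
    # Assumption of this sketch: an unknown IRI is an evaluation error,
    # so a FILTER that uses it would reject the solution.
    if iri not in EXTENSIONS:
        raise LookupError(f"unsupported extension function <{iri}>")
    return EXTENSIONS[iri](*args)
```

Keying on the full IRI rather than the prefixed name mirrors the text: the IRI, not the prefix chosen in a particular query, identifies the function.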

For a second example, consider a function aGeo:distance that calculates the distance between two points, which is used here to find the places near Grenoble:

 xsd:double   aGeo:distance (numeric x1, numeric y1, numeric x2, numeric y2)

PREFIX aGeo: <http://example.org/geo#>

SELECT ?neighbor
WHERE { ?a aGeo:placeName "Grenoble" .
        ?a aGeo:location ?axLoc .
        ?a aGeo:location ?ayLoc .

        ?b aGeo:placeName ?neighbor .
        ?b aGeo:location ?bxLoc .
        ?b aGeo:location ?byLoc .

        FILTER ( aGeo:distance(?axLoc, ?ayLoc, ?bxLoc, ?byLoc) < 10 ) .
      }

An extension function might be used to test some application datatype not supported by the core SPARQL specification, or it might be a transformation between datatype formats, for example from another date format into an XSD dateTime RDF term.

16 Definition of SPARQL

This section defines the correct behavior for evaluation of graph patterns and solution modifiers, given a query string and an RDF dataset. It does not imply a SPARQL implementation must use the process defined here.

The outcome of executing a SPARQL query is defined by a series of steps, starting from the SPARQL query as a string, turning that string into an abstract syntax form, then turning the abstract syntax into a SPARQL abstract query comprising operators from the SPARQL algebra. This abstract query is then evaluated on an RDF dataset.

16.1 Initial Definitions

16.1.1 RDF Terms

SPARQL is defined in terms of IRIs [RFC3987]. IRIs are a subset of RDF URI References that omits spaces.

Definition: RDF Term

Let I be the set of all IRIs.
Let RDF-L be the set of all RDF Literals
Let RDF-B be the set of all blank nodes in RDF graphs

The set of RDF Terms, RDF-T, is I union RDF-L union RDF-B.

This definition of RDF Term collects together several basic notions from the RDF data model, but updated to refer to IRIs rather than RDF URI references.

16.1.2 RDF Dataset

Definition: RDF Dataset

An RDF dataset is a set:
{ G, (<u1>, G1), (<u2>, G2), . . . (<un>, Gn) }
where G and each Gi are graphs, and each <ui> is an IRI. Each <ui> is distinct.

G is called the default graph. (<ui>, Gi) are called named graphs.

Definition: Active Graph

The active graph is the graph from the dataset used for basic graph pattern matching.

16.1.3 Query Variables

Definition: Query Variable

A query variable is a member of the set V where V is infinite and disjoint from RDF-T.

16.1.4 Triple Patterns

Definition: Triple Pattern

A triple pattern is a member of the set:
(RDF-T union V) x (I union V) x (RDF-T union V)

This definition of Triple Pattern includes literal subjects. This has been noted by RDF-core.

"[The RDF core Working Group] noted that it is aware of no reason why literals should not
  be subjects and a future WG with a less restrictive charter may
  extend the syntaxes to allow literals as the subjects of statements."

Because RDF graphs may not contain literal subjects, any SPARQL triple pattern with a literal as subject will fail to match on any RDF graph.

16.1.5 Basic Graph Patterns

Definition: Basic Graph Pattern

A Basic Graph Pattern is a set of Triple Patterns.

The empty graph pattern is a basic graph pattern which is the empty set.

16.1.6 Solution Mapping

A solution mapping is a mapping from a set of variables to a set of RDF terms. We use the term 'solution' where it is clear.

Definition: Solution Mapping

A solution mapping, μ, is a partial function μ : V -> RDF-T.

The domain of μ, dom(μ), is the subset of V where μ is defined.

Definition: Solution Sequence

A solution sequence is a list of solutions, possibly unordered.

16.1.7 Solution Sequence Modifiers

Definition: Solution Sequence Modifier

A solution sequence modifier is one of:

  • Group By
  • Having @@
  • Select Expressions@@
  • Order By modifier: put the solutions in order
  • Projection modifier: choose certain variables
  • Distinct modifier: ensure solutions in the sequence are unique
  • Reduced modifier: permit any non-distinct solutions to be eliminated
  • Offset modifier: control where the solutions start from in the overall sequence of solutions
  • Limit modifier: restrict the number of solutions

16.2 SPARQL Query

This section defines the process of converting graph patterns and solution modifiers in a SPARQL query string into a SPARQL algebra expression. The process described here converts one level of query nesting, as formed by subqueries using the nested SELECT syntax, and is applied recursively to subqueries. Each level consists of graph pattern matching and filtering, followed by the application of solution modifiers.

@@Better description of level? Suggestions?

After parsing a SPARQL query string, and applying the abbreviations for IRIs and triple patterns given in section 4, there is an abstract syntax tree composed of:

Patterns | Modifiers | Query Forms
RDF terms | DISTINCT | SELECT
triple patterns | REDUCED | CONSTRUCT
Basic graph patterns | PROJECT | DESCRIBE
Groups | ORDER BY | ASK
OPTIONAL | LIMIT |
UNION | OFFSET |
GRAPH | GROUP BY |
NOT EXISTS | HAVING |
EXISTS | Select expressions |
MINUS | |
FILTER | |
FILTER  

The result of converting such an abstract syntax tree is a SPARQL query that uses the following symbols in the SPARQL algebra:

Graph Pattern | Solution Modifiers
BGP | ToList
Join | OrderBy
LeftJoin | Project
Filter | Distinct
Union | Reduced
Graph | Slice
Extends | GroupAggregate@@
Minus |

@@Check

Slice is the combination of OFFSET and LIMIT.

ToList is used where conversion from the results of graph pattern matching to sequences occurs.

Definition: SPARQL Query

A SPARQL Abstract Query is a tuple (E, DS, R) where:

  • E is a SPARQL algebra expression
  • DS is an RDF Dataset
  • R is a query form

16.2.1 Converting Graph Patterns

This section describes the process for translating a SPARQL graph pattern into a SPARQL algebra expression. After translating syntactic abbreviations for IRIs and triple patterns, it recursively processes syntactic forms into algebra expressions:

First, expand abbreviations for IRIs and triple patterns given in section 4.

The WhereClause consists of a GroupGraphPattern which is comprised of the following forms:

@@Group/aggregate

Each is translated by the following procedure:

Transform(syntax form)

If the form is TriplesBlock

@@Extend for Property Paths

The result is BGP(list of triple patterns)

If the form is GroupOrUnionGraphPattern

Let A := undefined

For each element G in the GroupOrUnionGraphPattern
    If A is undefined
        A := Transform(G)
    Else
        A := Union(A, Transform(G))

The result is A

If the form is GraphGraphPattern

If the form is GRAPH IRI GroupGraphPattern
    The result is Graph(IRI, Transform(GroupGraphPattern))
If the form is GRAPH Var GroupGraphPattern
    The result is Graph(Var, Transform(GroupGraphPattern))

If the form is GroupGraphPattern

We introduce the following symbols:

  • Join(Pattern, Pattern)
  • LeftJoin(Pattern, Pattern, expression)
  • Filter(expression, Pattern)
Let FS := the empty set
Let G := the empty pattern, Z, a basic graph pattern which is the empty set.

For each element E in the GroupGraphPattern
   If E is of the form FILTER(expr)
       FS := FS set-union {expr}
   If E is of the form OPTIONAL{P}
   Then
       Let A := Transform(P)
       If A is of the form Filter(F, A2)
           G := LeftJoin(G, A2, F)
       else 
           G := LeftJoin(G, A, true)

   @@All binary operators that have open LHS: new UNION, MINUS, (NOT)EXISTS
   @@SubSELECT??

   If E is any other form:
      Let A := Transform(E)
      G := Join(G, A)
   
  
If FS is not empty:
  Let X := Conjunction of expressions in FS
  G := Filter(X, G)
The result is G.

Simplification step:

Groups of one graph pattern (not a filter) become join(Z, A) and can be replaced by A. The empty graph pattern Z is the identity for join:

Replace join(Z, A) by A
Replace join(A, Z) by A
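The Transform procedure and the simplification step can be sketched in Python as below. The tuple encoding of syntax forms and algebra operators is hypothetical; only the forms handled by the procedure above are covered (the @@ cases such as MINUS, EXISTS and sub-SELECT are not), and the filter conjunction is built with a placeholder "&&" node.

```python
# Hypothetical syntax-tree encoding (not part of SPARQL):
#   ("triples", [triple, ...])       TriplesBlock
#   ("union", [group, group, ...])   GroupOrUnionGraphPattern
#   ("graph", iri_or_var, group)     GraphGraphPattern
#   ("optional", group)              OPTIONAL { ... }
#   ("filter", expr)                 FILTER(expr)
#   ("group", [element, ...])        GroupGraphPattern
Z = ("BGP", ())  # the empty basic graph pattern, identity for Join


def transform(form):
    kind = form[0]
    if kind == "triples":
        return ("BGP", tuple(form[1]))
    if kind == "union":
        a = None
        for g in form[1]:
            a = transform(g) if a is None else ("Union", a, transform(g))
        return a
    if kind == "graph":
        return ("Graph", form[1], transform(form[2]))
    if kind == "group":
        fs = []  # FS, kept as a list so the conjunction order is stable
        g = Z
        for e in form[1]:
            if e[0] == "filter":
                fs.append(e[1])
            elif e[0] == "optional":
                a = transform(e[1])
                if a[0] == "Filter":  # Filter(F, A2) -> LeftJoin(G, A2, F)
                    g = ("LeftJoin", g, a[2], a[1])
                else:
                    g = ("LeftJoin", g, a, True)
            else:
                g = ("Join", g, transform(e))
        if fs:
            x = fs[0]
            for f in fs[1:]:
                x = ("&&", x, f)  # conjunction of the collected filter expressions
            g = ("Filter", x, g)
        return g
    raise ValueError(f"unsupported syntax form: {kind!r}")


def simplify(op):
    """Apply the simplification step bottom-up: Join(Z, A) -> A and Join(A, Z) -> A."""
    if not isinstance(op, tuple) or not op or op[0] == "BGP":
        return op
    op = (op[0],) + tuple(simplify(arg) for arg in op[1:])
    if op[0] == "Join":
        if op[1] == Z:
            return op[2]
        if op[2] == Z:
            return op[1]
    return op
```

For example, `{ ?s <p> ?o OPTIONAL { ?s <q> ?x FILTER(?x > 1) } }` becomes LeftJoin(BGP(?s <p> ?o), BGP(?s <q> ?x), ?x > 1) after simplification.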

16.2.2 Examples of Mapped Graph Patterns

The second form of a rewrite example is the first with empty group joins removed by the simplification step.

Example: group with a basic graph pattern consisting of a single triple pattern:

Example: group with a basic graph pattern consisting of two triple patterns:

Example: group consisting of a union of two basic graph patterns:

Example: group consisting of a union of a union and a basic graph pattern:

Example: group consisting of a basic graph pattern and an optional graph pattern:

Example: group consisting of a basic graph pattern and two optional graph patterns:

Example: group consisting of a basic graph pattern and an optional graph pattern with a filter:

Example: group consisting of a union graph pattern and an optional graph pattern:

Example: group consisting of a basic graph pattern, a filter and an optional graph pattern:

16.2.3 Converting Solution Modifiers

Solution modifiers apply to the processing of a SPARQL query after pattern matching. The solution modifiers are applied to a query in the following order:

@@First sort out SELECT into extend and project and collect all uses of aggregates at this level.

Step 1 : ToList

ToList turns a multiset into a sequence with the same elements and cardinality. There is no implied ordering to the sequence; duplicates need not be adjacent.

Let M := ToList(Pattern)

Step 2: GROUP BY

If the GROUP BY keyword is used, or there is implicit grouping due to the use of aggregates in the projection, then grouping is performed by the Group function. It divides the solution into one or more solutions, with the same overall cardinality.

If there is no grouping required by the query, then steps 2 and 3 are omitted.

Let G := Group(M)

Step 3: Aggregates

The Aggregation function applies the aggregates to the group, and the results are joined to produce the new solution using the AggregateJoin function.

Let A1 := Aggregation(G)
...

Let M := AggregateJoin(A)

Step 4: Select expressions

@@This draft text sorts out extends/project and will also modify the translation step for projection below as well

@@Define "visible variable" and pull out of text here.

We have two forms of the abstract syntax to consider:

SELECT selItem ... { pattern }
SELECT * { pattern }

 

Let X := algebra from earlier steps
Let VS := list of all variables visible in the pattern,
          so restricted by sub-SELECT projected variables and GROUP BY variables.
    Not visible: only in filter, exists/not exists, masked by a subselect, non-projected GROUP variables.

Let P := [], a list of variable names
Let E := [], a list of pairs of the form (expression, variable)
  
IF  "SELECT *" THEN P := VS

IF  "SELECT selItem ..." THEN
  for each selItem:
    IF selItem is a variable THEN
      P := P append variable
    FI
    IF selItem is (expr AS variable) THEN
      variable must not appear in VS; if it does then generate a syntax error and stop
      P := P append variable
      E := E append (expr, variable)
    FI

for each pair (expr, var) in E:
  X := extend(X, var, expr)
  
X := project(X, P)
 
Result is X  

The syntax error arises for use of a variable as the named target of AS (e.g. ... AS ?x) when the variable is used inside the WHERE clause of the SELECT.
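Step 4 can be sketched in Python as follows. The sketch assumes the list of visible variables VS has already been computed (it does not derive it), and it uses hypothetical tuple nodes for extend and project rather than any real query engine's API.

```python
from typing import List, Optional, Tuple, Union

# A select item is either a variable name or an (expression, variable) pair for (expr AS ?var).
SelItem = Union[str, Tuple[object, str]]


class SparqlSyntaxError(Exception):
    pass


def translate_select(x, visible: List[str], sel_items: Optional[List[SelItem]] = None):
    """Wrap algebra expression x in extend/project; sel_items=None stands for SELECT *."""
    p: List[str] = []                 # P: projected variable names
    e: List[Tuple[object, str]] = []  # E: (expression, variable) pairs
    if sel_items is None:
        p = list(visible)
    else:
        for item in sel_items:
            if isinstance(item, str):
                p.append(item)
            else:
                expr, var = item
                if var in visible:
                    raise SparqlSyntaxError(f"{var} is already visible in the pattern")
                p.append(var)
                e.append((expr, var))
    for expr, var in e:
        x = ("extend", x, var, expr)
    return ("project", x, tuple(p))
```

Raising during translation, rather than at evaluation time, matches the text: reusing a visible variable as the target of AS is a syntax error.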

Step 5: HAVING

@@

Step 6 : ORDER BY

If the query string has an ORDER BY clause

M := OrderBy(M, list of order comparators)

Step 7 : Projection

M := Project(M, vars)

where vars is the set of variables mentioned in the SELECT clause, or all named variables in the query if SELECT * is used.

Step 8 : DISTINCT

If the query contains DISTINCT,

M := Distinct(M)

Step 9 : REDUCED

If the query contains REDUCED,

M := Reduced(M)

Step 10 : OFFSET and LIMIT

If the query contains "OFFSET start" or "LIMIT length"

M := Slice(M, start, length)

start defaults to 0

length defaults to (size(M)-start).

The overall abstract query is M.

16.3 Basic Graph Patterns

When matching graph patterns, the possible solutions form a multiset [multiset], also known as a bag. A multiset is an unordered collection of elements in which each element may appear more than once. It is described by a set of elements and a cardinality function giving the number of occurrences of each element from the set in the multiset.

Write μ for solution mappings and

Write μ0 for the mapping such that dom(μ0) is the empty set.

Write Ω0 for the multiset consisting of exactly the empty mapping μ0, with cardinality 1. This is the join identity.

Write μ(?x->t) for the solution mapping variable x to RDF term t : { (x, t) }

Write Ω(?x->t) for the multiset consisting of exactly μ(?x->t), that is, { { (x, t) } } with cardinality 1.

Definition: Compatible Mappings

Two solution mappings μ1 and μ2 are compatible if, for every variable v in dom(μ1) and in dom(μ2), μ1(v) = μ2(v).

If μ1 and μ2 are compatible then μ1 set-union μ2 is also a mapping. Write merge(μ1, μ2) for μ1 set-union μ2

Write card[Ω](μ) for the cardinality of solution mapping μ in a multiset of mappings Ω.
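Compatibility and merge translate directly into code. In this Python sketch, assumed for illustration, a solution mapping is a dict from variable name to RDF term, with terms reduced to strings so that term identity is string equality.

```python
from typing import Dict

Mapping = Dict[str, str]  # variable name -> RDF term (terms as strings in this sketch)


def compatible(mu1: Mapping, mu2: Mapping) -> bool:
    # Every variable bound in both mappings must be bound to the same term.
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def merge(mu1: Mapping, mu2: Mapping) -> Mapping:
    if not compatible(mu1, mu2):
        raise ValueError("merge is only defined for compatible mappings")
    return {**mu1, **mu2}
```

Note that the empty mapping μ0 is compatible with every mapping, which is why Ω0 acts as the join identity.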

16.3.1 SPARQL Basic Graph Pattern Matching

Basic graph patterns form the basis of SPARQL pattern matching. A basic graph pattern is matched against the active graph for that part of the query. Basic graph patterns can be instantiated by replacing both variables and blank nodes by terms, giving two notions of instance. Blank nodes are replaced using an RDF instance mapping,  σ, from blank nodes to RDF terms; variables are replaced by a solution mapping from query variables to RDF terms.

Definition: Pattern Instance Mapping

A Pattern Instance Mapping, P, is the combination of an RDF instance mapping, σ, and solution mapping, μ. P(x) = μ(σ(x))

For a BGP 'x', P(x) denotes the result of replacing blank nodes b in x for which σ is defined with σ(b) and all variables v in x for which μ is defined with μ(v).

Any pattern instance mapping defines a unique solution mapping and a unique RDF instance mapping obtained by restricting it to query variables and blank nodes respectively.

Definition: Basic Graph Pattern Matching

Let BGP be a basic graph pattern and let G be an RDF graph.

μ is a solution for BGP from G when there is a pattern instance mapping P such that P(BGP) is a subgraph of G and μ is the restriction of P to the query variables in BGP.

card[Ω](μ) = the number of distinct RDF instance mappings, σ, such that P = μ(σ) is a pattern instance mapping and P(BGP) is a subgraph of G.

If a basic graph pattern is the empty set, then the solution is Ω0.
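The subgraph-match definition, including its cardinality rule, can be sketched as a naive backtracking matcher in Python. The encoding is an assumption of the sketch: terms are strings, query variables start with "?", and blank nodes in the pattern start with "_:" (graph blank nodes are assumed distinct from them, as with the scoping graph). Each enumerated assignment is one pattern instance mapping, so counting assignments per restriction to variables gives the number of distinct σ. The search is exponential in the worst case.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def _instances(pattern: List[Triple], graph: List[Triple],
               binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield each assignment of the pattern's variables and blank nodes that maps
    every triple pattern onto a triple of the graph."""
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        b = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term) or is_bnode(p_term):
                # Bind on first use; later uses must agree with the earlier binding.
                if b.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from _instances(rest, graph, b)


def match_bgp(bgp: Iterable[Triple], graph: Iterable[Triple]) -> Counter:
    """Multiset of solutions mu; card(mu) is the number of distinct blank-node mappings sigma."""
    solutions: Counter = Counter()
    for binding in _instances(list(bgp), list(graph), {}):
        mu = frozenset((k, v) for k, v in binding.items() if is_var(k))
        solutions[mu] += 1
    return solutions
```

An empty pattern yields exactly one empty solution, that is Ω0, and a pattern blank node that can map to two different subjects doubles the cardinality of the corresponding solution.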

16.3.2 Treatment of Blank Nodes

This definition allows the solution mapping to bind a variable in a basic graph pattern, BGP, to a blank node in G. Since SPARQL treats blank node identifiers in a SPARQL Query Results XML Format document as scoped to the document, they cannot be understood as identifying nodes in the active graph of the dataset. If DS is the dataset of a query, pattern solutions are therefore understood to be not from the active graph of DS itself, but from an RDF graph, called the scoping graph, which is graph-equivalent to the active graph of DS but shares no blank nodes with DS or with BGP. The same scoping graph is used for all solutions to a single query. The scoping graph is purely a theoretical construct; in practice, the effect is obtained simply by the document scope conventions for blank node identifiers.

Since RDF blank nodes allow infinitely many redundant solutions for many patterns, there can be infinitely many pattern solutions (obtained by replacing blank nodes by different blank nodes). It is necessary, therefore, to somehow delimit the solutions for a basic graph pattern. SPARQL uses the subgraph match criterion to determine the solutions of a basic graph pattern. There is one solution for each distinct pattern instance mapping from the basic graph pattern to a subset of the active graph.

This is optimized for ease of computation rather than redundancy elimination. It allows query results to contain redundancies even when the active graph of the dataset is lean, and it allows logically equivalent datasets to yield different query results.

16.4 SPARQL Algebra

For each symbol in a SPARQL abstract query, we define an operator for evaluation. The SPARQL algebra operators of the same name are used to evaluate SPARQL abstract query nodes as described in the section "Evaluation Semantics".

Definition: Filter

Let Ω be a multiset of solution mappings and expr be an expression. We define:

Filter(expr, Ω) = { μ | μ in Ω and expr(μ) is an expression that has an effective boolean value of true }

card[Filter(expr, Ω)](μ) = card[Ω](μ)

Definition: Join

Let Ω1 and Ω2 be multisets of solution mappings. We define:

Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are compatible }

card[Join(Ω1, Ω2)](μ) =
    for each merge(μ1, μ2), μ1 in Ω1 and μ2 in Ω2 such that μ = merge(μ1, μ2),
        sum over (μ1, μ2), card[Ω1](μ1)*card[Ω2](μ2)

A solution mapping μ in a Join can arise from several different pairs of solution mappings, μ1 and μ2, in the multisets being joined. The cardinality of μ is the sum, over all such pairs, of card[Ω1](μ1)*card[Ω2](μ2).
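The Join definition, including the summed cardinality, can be sketched in Python using `collections.Counter` as the multiset. The representation is an assumption of the sketch: each solution mapping is a frozenset of (variable, term) pairs, so the merge of two compatible mappings is their set union.

```python
from collections import Counter
from itertools import product


def compatible(m1: frozenset, m2: frozenset) -> bool:
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(omega1: Counter, omega2: Counter) -> Counter:
    result: Counter = Counter()
    for (m1, c1), (m2, c2) in product(omega1.items(), omega2.items()):
        if compatible(m1, m2):
            # The same merged mapping may arise from several pairs; cardinalities add up.
            result[m1 | m2] += c1 * c2
    return result
```

For example, joining {?x->a} and {?x->a, ?y->b} with {?y->b} produces {?x->a, ?y->b} twice, once from each left-hand mapping.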

Definition: Diff

Let Ω1 and Ω2 be multisets of solution mappings and expr be an expression. We define:

Diff(Ω1, Ω2, expr) = { μ | μ in Ω1 such that for all μ′ in Ω2, either μ and μ′ are not compatible or μ and μ' are compatible and expr(merge(μ, μ')) has an effective boolean value of false }

card[Diff(Ω1, Ω2, expr)](μ) = card[Ω1](μ)

Diff is used internally for the definition of LeftJoin.

Definition: LeftJoin

Let Ω1 and Ω2 be multisets of solution mappings and expr be an expression. We define:

LeftJoin(Ω1, Ω2, expr) = Filter(expr, Join(Ω1, Ω2)) set-union Diff(Ω1, Ω2, expr)

card[LeftJoin(Ω1, Ω2, expr)](μ) = card[Filter(expr, Join(Ω1, Ω2))](μ) + card[Diff(Ω1, Ω2, expr)](μ)

Written in full that is:

LeftJoin(Ω1, Ω2, expr) =
    { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are compatible and expr(merge(μ1, μ2)) is true }
set-union
    { μ1 | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are not compatible, or Ω2 is empty }
set-union
    { μ1 | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are compatible and expr(merge(μ1, μ2)) is false, or Ω2 is empty }

As these are distinct, the cardinality of LeftJoin is the sum of the cardinalities of these individual components of the definition.
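Filter, Diff and LeftJoin can be sketched together in Python, again with `Counter` as the multiset and frozensets of (variable, term) pairs as solution mappings. As an assumption of this sketch, `expr` is a Python callable returning a boolean; expression evaluation errors are not modeled.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict

Expr = Callable[[Dict[str, str]], bool]  # stand-in for a SPARQL expression


def compatible(m1: frozenset, m2: frozenset) -> bool:
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def filter_(expr: Expr, omega: Counter) -> Counter:
    # Keep each mapping (with its cardinality) whose expression value is true.
    return Counter({m: c for m, c in omega.items() if expr(dict(m))})


def join(o1: Counter, o2: Counter) -> Counter:
    out: Counter = Counter()
    for (m1, c1), (m2, c2) in product(o1.items(), o2.items()):
        if compatible(m1, m2):
            out[m1 | m2] += c1 * c2
    return out


def diff(o1: Counter, o2: Counter, expr: Expr) -> Counter:
    # Keep mu1 when no compatible mu2 makes expr true on the merged mapping.
    return Counter({
        m1: c1 for m1, c1 in o1.items()
        if all(not compatible(m1, m2) or not expr(dict(m1 | m2)) for m2 in o2)
    })


def left_join(o1: Counter, o2: Counter, expr: Expr) -> Counter:
    # Counter addition sums the cardinalities of the two components.
    return filter_(expr, join(o1, o2)) + diff(o1, o2, expr)
```

With an always-true expression this is the OPTIONAL behavior: left-hand solutions are extended where possible and kept unchanged otherwise.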

Definition: Union

Let Ω1 and Ω2 be multisets of solution mappings. We define:

Union(Ω1, Ω2) = { μ | μ in Ω1 or μ in Ω2 }

card[Union(Ω1, Ω2)](μ) = card[Ω1](μ) + card[Ω2](μ)

Definition: Minus

@@

Definition: Extend

Let μ be a solution mapping, Ω a multiset of solution mappings, var a variable and expr be an expression [@@link], then we define:

extend(μ, var, expr) = μ set-union { (var,value) | var not in dom(μ) and value = eval(expr) }

extend(μ, var, expr) = μ if var not in dom(μ) and eval(expr) is an error

extend is undefined when var in dom(μ).

extend(Ω, var, expr) = { extend(μ, var, expr) | μ in Ω }

@@ Define the case for var in dom(μ) (does not arise in SELECT expressions)
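The Extend definition can be sketched in Python as follows, assuming for illustration that a solution mapping is a dict, a multiset is a plain list (so duplicates survive), and an expression is a Python callable whose raised exception stands for an evaluation error. The undefined case, var already in dom(μ), is rejected explicitly.

```python
from typing import Callable, Dict, List

Mapping = Dict[str, str]


def extend(mu: Mapping, var: str, expr: Callable[[Mapping], str]) -> Mapping:
    if var in mu:
        # Left undefined by the definition above; SELECT expressions never produce it.
        raise ValueError(f"{var} is already bound")
    try:
        value = expr(mu)
    except Exception:
        # Models "eval(expr) is an error": the solution is kept with var unbound.
        return dict(mu)
    return {**mu, var: value}


def extend_all(omega: List[Mapping], var: str,
               expr: Callable[[Mapping], str]) -> List[Mapping]:
    return [extend(mu, var, expr) for mu in omega]
```

Keeping the solution on error, rather than dropping it, is the difference between extend and Filter.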

Write [x | C] for a sequence of elements where C(x) is true.

Write card[L](x) to be the cardinality of x in L.

Definition: ToList

Let Ω be a multiset of solution mappings. We define:

ToList(Ω) = a sequence of mappings μ in Ω in any order, with card[Ω](μ) occurrences of μ

card[ToList(Ω)](μ) = card[Ω](μ)

Definition: OrderBy

Let Ψ be a sequence of solution mappings. We define:

OrderBy(Ψ, condition) = [ μ | μ in Ψ and the sequence satisfies the ordering condition]

card[OrderBy(Ψ, condition)](μ) = card[Ψ](μ)

Definition: Project

Let Ψ be a sequence of solution mappings and PV a set of variables.

For mapping μ, write Proj(μ, PV) to be the restriction of μ to variables in PV.

Project(Ψ, PV) = [ Proj(Ψ[μ], PV) | μ in Ψ ]

card[Project(Ψ, PV)](μ) = card[Ψ](μ)

The order of Project(Ψ, PV) must preserve any ordering given by OrderBy.

Definition: Distinct

Let Ψ be a sequence of solution mappings. We define:

Distinct(Ψ) = [ μ | μ in Ψ ]

card[Distinct(Ψ)](μ) = 1

The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

Definition: Reduced

Let Ψ be a sequence of solution mappings. We define:

Reduced(Ψ) = [ μ | μ in Ψ ]

card[Reduced(Ψ)](μ) is between 1 and card[Ψ](μ)

The order of Reduced(Ψ) must preserve any ordering given by OrderBy.

The Reduced solution sequence modifier does not guarantee a defined cardinality.
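Project, Distinct and Reduced all have to preserve the order produced by OrderBy. A Python sketch over lists of dict mappings follows; the function names are hypothetical, and the `reduced` shown is only one permitted choice (dropping adjacent duplicates) among the many the definition allows.

```python
from typing import Dict, FrozenSet, Iterable, List

Mapping = Dict[str, str]


def project(seq: List[Mapping], pv: Iterable[str]) -> List[Mapping]:
    keep = set(pv)
    # Restrict each mapping to the projected variables, keeping sequence order.
    return [{v: t for v, t in mu.items() if v in keep} for mu in seq]


def _key(mu: Mapping) -> FrozenSet:
    return frozenset(mu.items())


def distinct(seq: List[Mapping]) -> List[Mapping]:
    seen = set()
    out = []
    for mu in seq:  # first occurrence wins, so the existing order survives
        if _key(mu) not in seen:
            seen.add(_key(mu))
            out.append(mu)
    return out


def reduced(seq: List[Mapping]) -> List[Mapping]:
    # One permitted choice: drop only adjacent duplicates. Any ordered result keeping
    # between 1 and card(mu) copies of each mu would also satisfy the definition.
    out: List[Mapping] = []
    for mu in seq:
        if not out or out[-1] != mu:
            out.append(mu)
    return out
```

Projecting before Distinct matters: duplicates often appear only after variables have been removed.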

Definition: Slice

Let Ψ be a sequence of solution mappings. We define:

Slice(Ψ, start, length)[i] = Ψ[start+i] for i = 0 to (length-1)
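Slice, together with the OFFSET and LIMIT defaults given in Step 10, can be sketched in Python as below. The function name is hypothetical, and the sketch stops at the end of the sequence when start + length exceeds it, a case the formal definition leaves implicit.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def slice_(seq: List[T], start: int = 0, length: Optional[int] = None) -> List[T]:
    """OFFSET start LIMIT length; start defaults to 0 and length to len(seq) - start."""
    if length is None:
        length = len(seq) - start
    # Slice(seq, start, length)[i] = seq[start + i] for i in 0 .. length-1,
    # skipping indices that fall past the end of the sequence.
    return [seq[start + i] for i in range(length) if start + i < len(seq)]
```

For example, OFFSET 1 LIMIT 2 over five solutions keeps the second and third.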

16.5 Evaluation Semantics

We define eval(D(G), graph pattern) as the evaluation of a graph pattern with respect to a dataset D having active graph G. The active graph is initially the default graph.

D : a dataset
D(G) : D a dataset with active graph G (the one patterns match against)
D[i] : The graph with IRI i in dataset D
D[DFT] : the default graph of D
P, P1, P2 : graph patterns
L : a solution sequence
Definition: Evaluation of Filter(F, P)
eval(D(G), Filter(F, P)) = Filter(F, eval(D(G),P))
Definition: Evaluation of Join(P1, P2)
eval(D(G), Join(P1, P2)) = Join(eval(D(G), P1), eval(D(G), P2))
Definition: Evaluation of LeftJoin(P1, P2, F)
eval(D(G), LeftJoin(P1, P2, F)) = LeftJoin(eval(D(G), P1), eval(D(G), P2), F)
Definition: Evaluation of a Basic Graph Pattern
eval(D(G), BGP) = multiset of solution mappings

See section 12.3 Basic Graph Patterns

Definition: Evaluation of a Union Pattern
eval(D(G), Union(P1,P2)) = Union(eval(D(G), P1), eval(D(G), P2))
Definition: Evaluation of a Graph Pattern
if IRI is a graph name in D
eval(D(G), Graph(IRI,P)) = eval(D(D[IRI]), P)
if IRI is not a graph name in D
eval(D(G), Graph(IRI,P)) = the empty multiset
eval(D(G), Graph(var,P)) =
     Let R be the empty multiset
     foreach IRI i in D
        R := Union(R, Join(eval(D(D[i]), P), Ω(?var->i)))
     the result is R

The evaluation of Graph(var, P) uses the SPARQL algebra union operator. The cardinality of a solution mapping is the sum of the cardinalities of that solution mapping in each join operation.
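The loop for Graph(var, P) can be sketched in Python under several assumptions. The named graphs of D are a dict from IRI to a set of triples. Mappings are dicts, and a multiset is a list, so the union operator becomes list concatenation. The hypothetical callable `eval_p` stands in for eval(D(D[i]), P). Ω(?var->i) is the multiset holding the single mapping {var: i}. If P already binds var, the join discards incompatible mappings.

```python
from typing import Callable, Dict, List, Set, Tuple

Mapping = Dict[str, str]              # hypothetical encoding: variable name -> term
Graph = Set[Tuple[str, str, str]]     # hypothetical encoding: a set of (s, p, o) triples


def compatible(m1: Mapping, m2: Mapping) -> bool:
    # two mappings are compatible if they agree on every shared variable
    return all(m2[v] == t for v, t in m1.items() if v in m2)


def join(omega1: List[Mapping], omega2: List[Mapping]) -> List[Mapping]:
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def eval_graph_var(named: Dict[str, Graph], var: str,
                   eval_p: Callable[[Graph], List[Mapping]]) -> List[Mapping]:
    r: List[Mapping] = []                         # Let R be the empty multiset
    for iri, g in named.items():                  # foreach IRI i in D
        # R := Union(R, Join(eval(D(D[i]), P), Ω(?var->i))); multiset union is concatenation
        r = r + join(eval_p(g), [{var: iri}])
    return r
```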

Definition: Evaluation of Extend
eval(D(G), extend(var, expr, P)) = extend(var, expr , eval(D(G), P))

@@Only defined where var is not already bound

Definition: Evaluation of ToList
eval(D, ToList(P)) = ToList(eval(D(D[DFT]), P))
Definition: Evaluation of Distinct
eval(D, Distinct(L)) = Distinct(eval(D, L))
Definition: Evaluation of Reduced
eval(D, Reduced(L)) = Reduced(eval(D, L))
Definition: Evaluation of Project
eval(D, Project(L, vars)) = Project(eval(D, L), vars)
Definition: Evaluation of OrderBy
eval(D, OrderBy(L, condition)) = OrderBy(eval(D, L), condition)
Definition: Evaluation of Slice
eval(D, Slice(L, start, length)) = Slice(eval(D, L), start, length)

16.6 Extending SPARQL Basic Graph Matching

The overall SPARQL design can be used for queries which assume a more elaborate form of entailment than simple entailment, by re-writing the matching conditions for basic graph patterns. Since it is an open research problem to state such conditions in a single general form which applies to all forms of entailment and optimally eliminates needless or inappropriate redundancy, this document only gives necessary conditions which any such solution should satisfy. These will need to be extended to full definitions for each particular case.

Basic graph patterns stand in the same relation to triple patterns that RDF graphs do to RDF triples, and much of the same terminology can be applied to them. In particular, two basic graph patterns are said to be equivalent if there is a bijection M between the terms of the triple patterns that maps blank nodes to blank nodes and maps variables, literals and IRIs to themselves, such that a triple ( s, p, o ) is in the first pattern if and only if the triple ( M(s), M(p), M(o) ) is in the second. This definition extends that for RDF graph equivalence to basic graph patterns by preserving variable names across equivalent patterns.

An entailment regime specifies

  1. a subset of RDF graphs called well-formed for the regime
  2. an entailment relation between subsets of well-formed graphs and well-formed graphs.

Examples of entailment regimes include simple entailment [RDF-MT], RDF entailment [RDF-MT], RDFS entailment [RDF-MT], D-entailment [RDF-MT] and OWL Direct and RDF-Based Semantics entailment [Ref: OWL2 semantics]. Of these, only OWL Direct Semantics (OWL-DL) entailment restricts the set of well-formed graphs. If E is an entailment regime, then we will refer to E-entailment, E-consistency, etc., following this naming convention.

Some entailment regimes can categorize some RDF graphs as inconsistent. For example, the RDF graph:

_:x rdf:type xsd:string .
_:x rdf:type xsd:decimal .

is D-inconsistent when D contains the XSD datatypes. The effect of a query on an inconsistent graph is not covered by this specification, but must be specified by the particular SPARQL extension.

A SPARQL extension to E-entailment must satisfy the following conditions.

1 -- The scoping graph, SG, corresponding to any consistent active graph AG is uniquely specified up to RDF graph equivalence and is E-equivalent to AG.

2 -- For any basic graph pattern BGP and pattern instance mapping P, P(BGP) is well-formed for E

3 -- For any scoping graph SG and answer set {P1 ... Pn} for a basic graph pattern BGP, and where {BGP1 ... BGPn} is a set of basic graph patterns all equivalent to BGP, none of which share any blank nodes with any other or with SG

SG E-entails (SG union P1(BGP1) union ... union Pn(BGPn))

These conditions do not fully determine the set of possible answers, since RDF allows unlimited amounts of redundancy. In addition, therefore, the following must hold.

4 -- Each SPARQL extension MUST provide conditions on answer sets which guarantee that the set of triples obtained by instantiating BGP with each solution μ is uniquely specified up to RDF graph equivalence, and SHOULD provide further conditions to prevent trivial infinite answers as appropriate to the regime.

16.6.1 Notes

(a) SG will often be graph equivalent to AG, but restricting this to E-equivalence allows some forms of normalization, for example elimination of semantic redundancies, to be applied to the source documents before querying.

(b) The construction in condition 3 ensures that any blank nodes introduced by the solution mapping are used in a way which is internally consistent with the way that blank nodes occur in SG. This ensures that blank node identifiers occur in more than one answer in an answer set only when the blank nodes so identified are indeed identical in SG. If the extension does not allow answer bindings to blank nodes, then this condition can be simplified to the condition:

SG E-entails P(BGP) for each pattern solution P.

(c) These conditions do not impose the SPARQL requirement that SG shares no blank nodes with AG or BGP. In particular, it allows SG to actually be AG. This allows query protocols in which blank node identifiers retain their meaning between the query and the source document, or across multiple queries. Such protocols are not supported by the current SPARQL protocol specification, however.

(d) Since conditions 1 to 3 are only necessary conditions on answers, condition 4 allows cases where the set of legal answers can be restricted in various ways. For example, the current state of the art in OWL-DL querying focusses on the case where answer bindings to blank nodes are prohibited. We note that these conditions even allow the pathological 'mute' case where every query has an empty answer set.

(e) None of these conditions refer explicitly to instance mappings on blank nodes in BGP. For some entailment regimes, the existential interpretation of blank nodes cannot be fully captured by the existence of a single instance mapping. These conditions allow such regimes to give blank nodes in query patterns a 'fully existential' reading.

It is straightforward to show that SPARQL satisfies these conditions for the case where E is simple entailment, given that the SPARQL condition on SG is that it is graph-equivalent to AG but shares no blank nodes with AG or BGP (which satisfies the first condition). The only condition which is nontrivial is (3).

Every answer Pi is the solution mapping restriction of a SPARQL instance Mi such that Mi(BGPi) is a subgraph of SG. Since BGPi and SG have no blank nodes in common, the range of Mi contains no blank nodes from BGPi; therefore, the solution mapping Pi and RDF instance mapping Ii components of Mi commute, so Mi(BGPi) = Ii(Pi(BGPi)). So

M1(BGP1) union ... union Mn(BGPn)
= I1(P1(BGP1)) union ... union In(Pn(BGPn))
= [ I1 + ... + In]( P1(BGP1) union ... union Pn(BGPn) )

since the domains of the Ii instance mappings are all mutually exclusive. Since they are also exclusive from SG,

SG union [ I1 + ... + In]( P1(BGP1) union ... union Pn(BGPn) )
= [ I1 + ... + In](SG union P1(BGP1) union ... union Pn(BGPn) )

i.e.

SG union P1(BGP1) union ... union Pn(BGPn)

has an instance which is a subgraph of SG, so is simply entailed by SG by the RDF interpolation lemma [RDF-MT].
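The argument above ends with the RDF interpolation lemma. For finite graphs, that lemma states that G simply entails H exactly when some instance of H is a subgraph of G. The Python sketch below checks this condition by brute force, and it is for illustration only. Blank nodes are assumed to be strings beginning with "_:". It is enough to map blank nodes to terms that occur in G, because any subgraph of G only contains those terms. The search is exponential in the number of blank nodes.

```python
from itertools import product
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(t: str) -> bool:
    return t.startswith("_:")  # hypothetical encoding of blank nodes


def simply_entails(g: Set[Triple], h: Set[Triple]) -> bool:
    """True if some instance of h (blank nodes replaced by terms of g) is a subgraph of g."""
    blanks = sorted({t for tr in h for t in tr if is_blank(t)})
    terms = sorted({t for tr in g for t in tr})
    for choice in product(terms, repeat=len(blanks)):
        inst = dict(zip(blanks, choice))          # candidate instance mapping
        mapped = {tuple(inst.get(t, t) for t in tr) for tr in h}
        if mapped <= g:                           # the instance is a subgraph of g
            return True
    return False
```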

17 SPARQL Grammar

17.1 SPARQL Query String

A SPARQL query string is a Unicode character string (cf. section 6.1 String concepts of [CHARMOD]) in the language defined by the following grammar, starting with the Query production. For compatibility with future versions of Unicode, the characters in this string may include Unicode codepoints that are unassigned as of the date of this publication (see Identifier and Pattern Syntax [UNIID] section 4 Pattern Syntax). For productions with excluded character classes (for example [^<>'{}|^`]), the characters are excluded from the range #x0 - #x10FFFF.

17.2 Codepoint Escape Sequences

A SPARQL Query String is processed for codepoint escape sequences before parsing by the grammar defined in EBNF below. The codepoint escape sequences for a SPARQL query string are:

Escape                                Unicode code point
'\u' HEX HEX HEX HEX                  A Unicode code point in the range U+0 to U+FFFF inclusive corresponding to the encoded hexadecimal value.
'\U' HEX HEX HEX HEX HEX HEX HEX HEX  A Unicode code point in the range U+0 to U+10FFFF inclusive corresponding to the encoded hexadecimal value.

where HEX is a hexadecimal character

HEX ::= [0-9] | [A-F] | [a-f]

Examples:

<ab\u00E9xy>        # Codepoint 00E9 is Latin small e with acute - é
\u03B1:a            # Codepoint x03B1 is Greek small alpha - α
a\u003Ab            # a:b -- codepoint x3A is colon

Codepoint escape sequences can appear anywhere in the query string. They are processed before parsing based on the grammar rules and so may be replaced by codepoints with significance in the grammar, such as ":" marking a prefixed name.

These escape sequences are not included in the grammar below. Only escape sequences for characters that would be legal at that point in the grammar may be given. For example, the variable "?x\u0020y" is not legal (\u0020 is a space and is not permitted in a variable name).
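The pre-parse step can be sketched in Python as a single regular-expression substitution over the whole query string. It replaces each \u followed by four hex digits, and each \U followed by eight hex digits, with the code point those digits encode. The function name is hypothetical. The sketch only performs the substitution; it does not check whether the resulting character is legal at that point in the grammar, which the parser must still do.

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits
_CODEPOINT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_codepoints(query: str) -> str:
    """Replace codepoint escape sequences before the query is tokenized."""
    def repl(m: "re.Match[str]") -> str:
        hex_digits = m.group(1) or m.group(2)
        cp = int(hex_digits, 16)
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range: {hex_digits}")
        return chr(cp)
    return _CODEPOINT_ESCAPE.sub(repl, query)
```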

17.3 White Space

White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a SPARQL parser. White space is significant in strings.

For example:

?a<?b&&?c>?d

is the token sequence variable '?a', an IRI '<?b&&?c>', and variable '?d', not an expression involving the operator '&&' connecting two expressions using '<' (less than) and '>' (greater than).
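The example above follows from the longest-match rule. Starting at '<', the IRI_REF terminal matches more characters than the '<' operator does. The Python sketch below demonstrates this with a toy tokenizer. Its three terminal patterns are simplified assumptions and not the full grammar: VAR1 is reduced to ASCII names, and only a few operators are included.

```python
import re

# A few SPARQL terminals, simplified for illustration (not the full grammar).
TERMINALS = [
    ("VAR1", re.compile(r"\?[A-Za-z0-9_]+")),
    ("IRI_REF", re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')),
    ("OP", re.compile(r"&&|\|\||<=|>=|!=|[<>=!]")),
]


def tokenize(text: str):
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():          # white space only separates terminals
            pos += 1
            continue
        # try every terminal at this position and keep the longest match
        best = None
        for name, pattern in TERMINALS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"no token at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

Adding white space, as in "?a < ?b && ?c > ?d", prevents the IRI_REF match, because IRI_REF excludes the space character. The same text then tokenizes as a comparison expression.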

17.5 IRI References

Text matched by the IRI_REF production and PrefixedName (after prefix expansion) production, after escape processing, must conform to the generic syntax of IRI references in section 2.2 of RFC 3987 "ABNF for IRI References and IRIs" [RFC3987]. For example, the IRI_REF <abc#def> may occur in a SPARQL query string, but the IRI_REF <abc##def> must not.

Base IRIs declared with the BASE keyword must be absolute IRIs. A prefix declared with the PREFIX keyword may not be re-declared in the same query. See section 2.1.1, Syntax of IRI Terms, for a description of BASE and PREFIX.

17.7 Escape sequences in strings

In addition to the codepoint escape sequences, the following escape sequences are allowed in any string production (e.g. STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1, STRING_LITERAL_LONG2):

Escape  Unicode code point
'\t'    U+0009 (tab)
'\n'    U+000A (line feed)
'\r'    U+000D (carriage return)
'\b'    U+0008 (backspace)
'\f'    U+000C (form feed)
'\"'    U+0022 (quotation mark, double quote mark)
"\'"    U+0027 (apostrophe-quote, single quote mark)
'\\'    U+005C (backslash)

Examples:

"abc\n"
"xy\rz"
'xy\tz'
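The Python sketch below shows how these escapes could be expanded in the body of a string literal after the quotes have been removed. It follows the ECHAR production, '\' [tbnrf\"'], and rejects any other character after a backslash. Matches are found left to right without overlap, so an escaped backslash followed by n yields a backslash and a literal n, not a line feed. The function name is hypothetical.

```python
import re

# ECHAR ::= '\' [tbnrf\"']
_ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
          '"': '"', "'": "'", "\\": "\\"}
_ECHAR_RE = re.compile(r"\\(.)", re.DOTALL)


def unescape_string_body(body: str) -> str:
    """Expand string escape sequences in the text between the quotes."""
    def repl(m: "re.Match[str]") -> str:
        c = m.group(1)
        if c not in _ECHAR:
            raise ValueError(f"illegal escape sequence: \\{c}")
        return _ECHAR[c]
    return _ECHAR_RE.sub(repl, body)
```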

17.8 Grammar

The EBNF notation used in the grammar is defined in Extensible Markup Language (XML) 1.1 [XML11] section 6 Notation.

Keywords are matched in a case-insensitive manner with the exception of the keyword 'a' which, in line with Turtle and N3, is used in place of the IRI rdf:type (in full, http://www.w3.org/1999/02/22-rdf-syntax-ns#type).

Keywords: @@Update for SPARQL 1.1

BASE, SELECT, ORDER BY, FROM, GRAPH, STR, isURI,
PREFIX, CONSTRUCT, LIMIT, FROM NAMED, OPTIONAL, LANG, isIRI,
DESCRIBE, OFFSET, WHERE, UNION, LANGMATCHES, isLITERAL,
ASK, DISTINCT, FILTER, DATATYPE, REGEX,
REDUCED, a, BOUND, true,
sameTERM, false,
isBLANK

Escape sequences are case sensitive.

When choosing a rule to match, the longest match is chosen.

  • @@Context sensitive: aggregates
  • @@Context sensitive: update templates (DATA and vars)
[1]  Top  ::=  Prologue ( Query | Update )
[2]  QueryUnit  ::=  Prologue Query
[3]  Query  ::=  SelectQuery | ConstructQuery | DescribeQuery | AskQuery
[4]  Prologue  ::=  BaseDecl? PrefixDecl*
[5]  BaseDecl  ::=  'BASE' IRI_REF
[6]  PrefixDecl  ::=  'PREFIX' PNAME_NS IRI_REF
[7]  SelectQuery  ::=  SelectClause DatasetClause* WhereClause SolutionModifier BindingsClause
[8]  SubSelect  ::=  SelectClause WhereClause SolutionModifier
[9]  SelectClause  ::=  'SELECT' ( 'DISTINCT' | 'REDUCED' )? ( ( Var | ( '(' Expression 'AS' Var ')' ) )+ | '*' )
[10]  ConstructQuery  ::=  'CONSTRUCT' ConstructTemplate DatasetClause* WhereClause SolutionModifier
[11]  DescribeQuery  ::=  'DESCRIBE' ( VarOrIRIref+ | '*' ) DatasetClause* WhereClause? SolutionModifier
[12]  AskQuery  ::=  'ASK' DatasetClause* WhereClause
[13]  DatasetClause  ::=  'FROM' ( DefaultGraphClause | NamedGraphClause )
[14]  DefaultGraphClause  ::=  SourceSelector
[15]  NamedGraphClause  ::=  'NAMED' SourceSelector
[16]  SourceSelector  ::=  IRIref
[17]  WhereClause  ::=  'WHERE'? GroupGraphPattern
[18]  SolutionModifier  ::=  GroupClause? HavingClause? OrderClause? LimitOffsetClauses?
[19]  GroupClause  ::=  'GROUP' 'BY' GroupCondition+
[20]  GroupCondition  ::=  ( BuiltInCall | FunctionCall | '(' Expression ( 'AS' Var )? ')' | Var )
[21]  HavingClause  ::=  'HAVING' HavingCondition+
[22]  HavingCondition  ::=  Constraint
[23]  OrderClause  ::=  'ORDER' 'BY' OrderCondition+
[24]  OrderCondition  ::=   ( ( 'ASC' | 'DESC' ) BrackettedExpression )
| ( Constraint | Var )
[25]  LimitOffsetClauses  ::=  ( LimitClause OffsetClause? | OffsetClause LimitClause? )
[26]  LimitClause  ::=  'LIMIT' INTEGER
[27]  OffsetClause  ::=  'OFFSET' INTEGER
[28]  BindingsClause  ::=  ( 'BINDINGS' Var+ '{' ( '(' BindingValue+ ')' )* '}' )?
[29]  BindingValue  ::=  IRIref | RDFLiteral | NumericLiteral | BooleanLiteral | 'UNDEF'
[30]  UpdateUnit  ::=  Prologue Update
[31]  Update  ::=  Update1+
[32]  Update1  ::=  ( Modify | Load | Clear | Drop | Create ) ';'?
[33]  Modify  ::=  ( 'WITH' IRIref )? ( Insert | Delete )
[34]  Insert  ::=  'INSERT' ( 'DATA' QuadData | QuadTemplate 'WHERE' GroupGraphPattern )
[35]  Delete  ::=  'DELETE' ( 'DATA' QuadData | 'WHERE' QuadTemplate | QuadTemplate ( 'INSERT' QuadTemplate )? 'WHERE' GroupGraphPattern )
[36]  Clear  ::=  'CLEAR' GraphRef
[37]  Load  ::=  'LOAD' IRIref+ ( 'INTO' GraphRef )?
[38]  Drop  ::=  'DROP' 'SILENT'? IRIref
[39]  Create  ::=  'CREATE' 'SILENT'? IRIref
[40]  GraphRef  ::=  'DEFAULT' | IRIref
[41]  QuadTemplate  ::=  '{' Quads '}'
[42]  Quads  ::=  TriplesTemplate? ( QuadsNotTriples '.'? TriplesTemplate? )*
[43]  QuadsNotTriples  ::=  'GRAPH' VarOrIRIref '{' TriplesTemplate '}'
[44]  TriplesTemplate  ::=  TriplesSameSubject ( '.' TriplesTemplate? )?
[45]  QuadData  ::=  '{' Quads '}'
[46]  GroupGraphPattern  ::=  '{' ( SubSelect | GroupGraphPatternSub ) '}'
[47]  GroupGraphPatternSub  ::=  TriplesBlock? ( GraphPatternNotTriples '.'? TriplesBlock? )*
[48]  TriplesBlock  ::=  TriplesSameSubjectPath ( '.' TriplesBlock? )?
[49]  GraphPatternNotTriples  ::=  GroupGraphPattern | OptionalGraphPattern | UnionGraphPattern | MinusGraphPattern | GraphGraphPattern | ServiceGraphPattern | Filter
[50]  OptionalGraphPattern  ::=  'OPTIONAL' GroupGraphPattern
[51]  GraphGraphPattern  ::=  'GRAPH' VarOrIRIref GroupGraphPattern
[52]  ServiceGraphPattern  ::=  'SERVICE' VarOrIRIref GroupGraphPattern
[53]  MinusGraphPattern  ::=  MINUS_P GroupGraphPattern
[54]  UnionGraphPattern  ::=  'UNION' GroupGraphPattern
[55]  Filter  ::=  'FILTER' Constraint
[56]  Constraint  ::=  BrackettedExpression | BuiltInCall | FunctionCall
[57]  FunctionCall  ::=  IRIref ArgList
[58]  ArgList  ::=  ( NIL | '(' 'DISTINCT'? Expression ( ',' Expression )* ')' )
[59]  ExprAggArg  ::=  '(' 'DISTINCT'? Expression ')'
[60]  ExpressionList  ::=  ( NIL | '(' Expression ( ',' Expression )* ')' )
[61]  ConstructTemplate  ::=  '{' ConstructTriples? '}'
[62]  ConstructTriples  ::=  TriplesSameSubject ( '.' ConstructTriples? )?
[63]  TriplesSameSubject  ::=  VarOrTerm PropertyListNotEmpty | TriplesNode PropertyList
[64]  PropertyListNotEmpty  ::=  Verb ObjectList ( ';' ( Verb ObjectList )? )*
[65]  PropertyList  ::=  PropertyListNotEmpty?
[66]  ObjectList  ::=  Object ( ',' Object )*
[67]  Object  ::=  GraphNode
[68]  Verb  ::=  VarOrIRIref | 'a'
[69]  TriplesSameSubjectPath  ::=  VarOrTerm PropertyListNotEmptyPath | TriplesNode PropertyListPath
[70]  PropertyListNotEmptyPath  ::=  ( VerbPath | VerbSimple ) ObjectList ( ';' ( ( VerbPath | VerbSimple ) ObjectList )? )*
[71]  PropertyListPath  ::=  PropertyListNotEmpty?
[72]  VerbPath  ::=  Path
[73]  VerbSimple  ::=  Var
[74]  Path  ::=  PathAlternative
[75]  PathAlternative  ::=  PathSequence ( '|' PathSequence )*
[76]  PathSequence  ::=  PathEltOrInverse ( '/' PathEltOrInverse | '^' PathElt )*
[77]  PathElt  ::=  PathPrimary PathMod?
[78]  PathEltOrInverse  ::=  PathElt | '^' PathElt
[79]  PathMod  ::=  ( '*' | '?' | '+' | '{' ( Integer ( ',' ( '}' | Integer '}' ) | '}' ) ) )
[80]  PathPrimary  ::=  ( IRIref | 'a' | '!' PathNegatedPropertyClass | '(' Path ')' )
[81]  PathNegatedPropertyClass  ::=  ( PathOneInPropertyClass | '(' ( PathOneInPropertyClass ( '|' PathOneInPropertyClass )* )? ')' )
[82]  PathOneInPropertyClass  ::=  IRIref | 'a'
[83]  Integer  ::=  INTEGER
[84]  TriplesNode  ::=  Collection | BlankNodePropertyList
[85]  BlankNodePropertyList  ::=  '[' PropertyListNotEmpty ']'
[86]  Collection  ::=  '(' GraphNode+ ')'
[87]  GraphNode  ::=  VarOrTerm | TriplesNode
[88]  VarOrTerm  ::=  Var | GraphTerm
[89]  VarOrIRIref  ::=  Var | IRIref
[90]  Var  ::=  VAR1 | VAR2
[91]  GraphTerm  ::=  IRIref | RDFLiteral | NumericLiteral | BooleanLiteral | BlankNode | NIL
[92]  Expression  ::=  ConditionalOrExpression
[93]  ConditionalOrExpression  ::=  ConditionalAndExpression ( '||' ConditionalAndExpression )*
[94]  ConditionalAndExpression  ::=  ValueLogical ( '&&' ValueLogical )*
[95]  ValueLogical  ::=  RelationalExpression
[96]  RelationalExpression  ::=  NumericExpression ( '=' NumericExpression | '!=' NumericExpression | '<' NumericExpression | '>' NumericExpression | '<=' NumericExpression | '>=' NumericExpression | 'IN' ExpressionList | 'NOT IN' ExpressionList )?
[97]  NumericExpression  ::=  AdditiveExpression
[98]  AdditiveExpression  ::=  MultiplicativeExpression ( '+' MultiplicativeExpression | '-' MultiplicativeExpression | ( NumericLiteralPositive | NumericLiteralNegative ) ( ( '*' UnaryExpression ) | ( '/' UnaryExpression ) )? )*
[99]  MultiplicativeExpression  ::=  UnaryExpression ( '*' UnaryExpression | '/' UnaryExpression )*
[100]  UnaryExpression  ::=    '!' PrimaryExpression
| '+' PrimaryExpression
| '-' PrimaryExpression
| PrimaryExpression
[101]  PrimaryExpression  ::=  BrackettedExpression | BuiltInCall | IRIrefOrFunction | RDFLiteral | NumericLiteral | BooleanLiteral | Var | Aggregate
[102]  BrackettedExpression  ::=  '(' Expression ')'
[103]  BuiltInCall  ::=    'STR' '(' Expression ')'
| 'LANG' '(' Expression ')'
| 'LANGMATCHES' '(' Expression ',' Expression ')'
| 'DATATYPE' '(' Expression ')'
| 'BOUND' '(' Var ')'
| 'IRI' '(' Expression ')'
| 'URI' '(' Expression ')'
| 'BNODE' ( '(' Expression ')' | NIL )
| 'COALESCE' ExpressionList
| 'IF' '(' Expression ',' Expression ',' Expression ')'
| 'STRLANG' '(' Expression ',' Expression ')'
| 'STRDT' '(' Expression ',' Expression ')'
| 'sameTerm' '(' Expression ',' Expression ')'
| 'isIRI' '(' Expression ')'
| 'isURI' '(' Expression ')'
| 'isBLANK' '(' Expression ')'
| 'isLITERAL' '(' Expression ')'
| RegexExpression
| ExistsFunc
| NotExistsFunc
[104]  RegexExpression  ::=  'REGEX' '(' Expression ',' Expression ( ',' Expression )? ')'
[105]  ExistsFunc  ::=  'EXISTS' GroupGraphPattern
[106]  NotExistsFunc  ::=  'NOT EXISTS' GroupGraphPattern
[107]  Aggregate  ::=  ( 'COUNT' '(' 'DISTINCT'? ( '*' | Expression ) ')' | 'SUM' ExprAggArg | 'MIN' ExprAggArg | 'MAX' ExprAggArg | 'AVG' ExprAggArg | 'SAMPLE' ExprAggArg | 'GROUP_CONCAT' '(' 'DISTINCT'? Expression ( ',' Expression )* ( ';' 'SEPARATOR' '=' String )? ')' )
[108]  IRIrefOrFunction  ::=  IRIref ArgList?
[109]  RDFLiteral  ::=  String ( LANGTAG | ( '^^' IRIref ) )?
[110]  NumericLiteral  ::=  NumericLiteralUnsigned | NumericLiteralPositive | NumericLiteralNegative
[111]  NumericLiteralUnsigned  ::=  INTEGER | DECIMAL | DOUBLE
[112]  NumericLiteralPositive  ::=  INTEGER_POSITIVE | DECIMAL_POSITIVE | DOUBLE_POSITIVE
[113]  NumericLiteralNegative  ::=  INTEGER_NEGATIVE | DECIMAL_NEGATIVE | DOUBLE_NEGATIVE
[114]  BooleanLiteral  ::=  'true' | 'false'
[115]  String  ::=  STRING_LITERAL1 | STRING_LITERAL2 | STRING_LITERAL_LONG1 | STRING_LITERAL_LONG2
[116]  IRIref  ::=  IRI_REF | PrefixedName
[117]  PrefixedName  ::=  PNAME_LN | PNAME_NS
[118]  BlankNode  ::=  BLANK_NODE_LABEL | ANON
[119]  IRI_REF  ::=  '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
[120]  PNAME_NS  ::=  PN_PREFIX? ':'
[121]  PNAME_LN  ::=  PNAME_NS PN_LOCAL
[122]  BLANK_NODE_LABEL  ::=  '_:' PN_LOCAL
[123]  VAR1  ::=  '?' VARNAME
[124]  VAR2  ::=  '$' VARNAME
[125]  LANGTAG  ::=  '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
[126]  INTEGER  ::=  [0-9]+
[127]  DECIMAL  ::=  [0-9]+ '.' [0-9]* | '.' [0-9]+
[128]  DOUBLE  ::=  [0-9]+ '.' [0-9]* EXPONENT | '.' ([0-9])+ EXPONENT | ([0-9])+ EXPONENT
[129]  INTEGER_POSITIVE  ::=  '+' INTEGER
[130]  DECIMAL_POSITIVE  ::=  '+' DECIMAL
[131]  DOUBLE_POSITIVE  ::=  '+' DOUBLE
[132]  INTEGER_NEGATIVE  ::=  '-' INTEGER
[133]  DECIMAL_NEGATIVE  ::=  '-' DECIMAL
[134]  DOUBLE_NEGATIVE  ::=  '-' DOUBLE
[135]  EXPONENT  ::=  [eE] [+-]? [0-9]+
[136]  STRING_LITERAL1  ::=  "'" ( ([^#x27#x5C#xA#xD]) | ECHAR )* "'"
[137]  STRING_LITERAL2  ::=  '"' ( ([^#x22#x5C#xA#xD]) | ECHAR )* '"'
[138]  STRING_LITERAL_LONG1  ::=  "'''" ( ( "'" | "''" )? ( [^'\] | ECHAR ) )* "'''"
[139]  STRING_LITERAL_LONG2  ::=  '"""' ( ( '"' | '""' )? ( [^"\] | ECHAR ) )* '"""'
[140]  ECHAR  ::=  '\' [tbnrf\"']
[141]  NIL  ::=  '(' WS* ')'
[142]  WS  ::=  #x20 | #x9 | #xD | #xA
[143]  ANON  ::=  '[' WS* ']'
[144]  PN_CHARS_BASE  ::=  [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[145]  PN_CHARS_U  ::=  PN_CHARS_BASE | '_'
[146]  VARNAME  ::=  ( PN_CHARS_U | [0-9] ) ( PN_CHARS_U | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040] )*
[147]  PN_CHARS  ::=  PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[148]  PN_PREFIX  ::=  PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
[149]  PN_LOCAL  ::=  ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)?
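Several of the terminal rules above are plain regular expressions. The Python sketch below transcribes productions [125] to [128] and [135] into the `re` syntax and uses them to classify a possibly signed numeric token. The mapping of INTEGER, DECIMAL and DOUBLE to xsd:integer, xsd:decimal and xsd:double follows SPARQL's literal syntax. The function name is hypothetical. DOUBLE is tested first, because a token such as "1.5e3" would otherwise match only a prefix of the DECIMAL rule.

```python
import re

EXPONENT = r"[eE][+-]?[0-9]+"                                    # [135]
INTEGER = r"[0-9]+"                                              # [126]
DECIMAL = r"(?:[0-9]+\.[0-9]*|\.[0-9]+)"                         # [127]
DOUBLE = rf"(?:[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT})"  # [128]
LANGTAG = r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"                        # [125]


def numeric_datatype(token: str) -> str:
    """Classify a (possibly signed) numeric token using the terminal rules above."""
    body = token[1:] if token.startswith(("+", "-")) else token
    if re.fullmatch(DOUBLE, body):
        return "xsd:double"
    if re.fullmatch(DECIMAL, body):
        return "xsd:decimal"
    if re.fullmatch(INTEGER, body):
        return "xsd:integer"
    raise ValueError(f"not a numeric literal: {token!r}")
```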

Notes:

  1. The SPARQL grammar is LL(1) when the rules with uppercased names are used as terminals.
  2. In signed numbers, no white space is allowed between the sign and the number. The AdditiveExpression grammar rule allows for this by covering the two cases of an expression followed by a signed number. These produce an addition or subtraction of the unsigned number as appropriate.

Some grammar files for some commonly used tools are available here.

18 Conformance

See section 17 SPARQL Grammar regarding conformance of SPARQL Query strings, and section 10 Query Forms for conformance of query results. See section 20 Internet Media Type, File Extension and Macintosh File Type for conformance to the application/sparql-query media type.

This specification is intended for use in conjunction with the SPARQL Protocol [SPROT] and the SPARQL Query Results XML Format [RESULTS]. See those specifications for their conformance criteria.

Note that the SPARQL protocol describes an abstract interface as well as a network protocol, and the abstract interface may apply to APIs as well as network interfaces.

19 Security Considerations (Informative)

SPARQL queries using FROM, FROM NAMED, or GRAPH may cause the specified URI to be dereferenced. This may cause additional use of network, disk or CPU resources along with associated secondary issues such as denial of service. The security issues of Uniform Resource Identifier (URI): Generic Syntax [RFC3986] Section 7 should be considered. In addition, the contents of file: URIs can in some cases be accessed, processed and returned as results, providing unintended access to local resources.

SPARQL requests may cause additional requests to be issued from the SPARQL endpoint, such as FROM NAMED. The endpoint is potentially within an organisation's firewall or DMZ, and so such queries may be a source of indirection attacks.

The SPARQL language permits extensions, which will have their own security implications.

Multiple IRIs may have the same appearance. Characters in different scripts may look similar (a Cyrillic "о" may appear similar to a Latin "o"). A character followed by combining characters may have the same visual representation as another character (LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT has the same visual representation as LATIN SMALL LETTER E WITH ACUTE). Users of SPARQL must take care to construct queries with IRIs that match the IRIs in the data. Further information about matching of similar characters can be found in Unicode Security Considerations [UNISEC] and Internationalized Resource Identifiers (IRIs) [RFC3987] Section 8.

20 Internet Media Type, File Extension and Macintosh File Type

contact:
Eric Prud'hommeaux
See also:
How to Register a Media Type for a W3C Specification
Internet Media Type registration, consistency of use
TAG Finding 3 June 2002 (Revised 4 September 2002)

The Internet Media Type / MIME Type for the SPARQL Query Language is "application/sparql-query".

It is recommended that SPARQL query files have the extension ".rq" (all lowercase) on all platforms.

It is recommended that SPARQL query files stored on Macintosh HFS file systems be given a file type of "TEXT".

Type name:
application
Subtype name:
sparql-query
Required parameters:
None
Optional parameters:
None
Encoding considerations:
The syntax of the SPARQL Query Language is expressed over code points in Unicode [UNICODE]. The encoding is always UTF-8 [RFC3629].
Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-Fa-f].
Security considerations:
See section 19, Security Considerations, of this specification as well as RFC 3629 [RFC3629] section 7, Security Considerations.
Interoperability considerations:
There are no known interoperability issues.
Published specification:
This specification.
Applications which use this media type:
No known applications currently use this media type.
Additional information:
Magic number(s):
A SPARQL query may have the string 'PREFIX' (case independent) near the beginning of the document.
File extension(s):
".rq"
Base URI:
The SPARQL 'BASE <IRIref>' term can change the current base URI for relative IRIrefs in the query language that are used sequentially later in the document.
Macintosh file type code(s):
"TEXT"
Person & email address to contact for further information:
public-rdf-dawg-comments@w3.org
Intended usage:
COMMON
Restrictions on usage:
None
Author/Change controller:
The SPARQL specification is a work product of the World Wide Web Consortium's RDF Data Access Working Group. The W3C has change control over these specifications.

A References

A.1 Normative References

[CHARMOD]
Character Model for the World Wide Web 1.0: Fundamentals, R. Ishida, F. Yergeau, M. J. Dürst, M. Wolf, T. Texin, Editors, W3C Recommendation, 15 February 2005, http://www.w3.org/TR/2005/REC-charmod-20050215/ . Latest version available at http://www.w3.org/TR/charmod/ .
[CONCEPTS]
Resource Description Framework (RDF): Concepts and Abstract Syntax, G. Klyne, J. J. Carroll, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ . Latest version available at http://www.w3.org/TR/rdf-concepts/ .
[FUNCOP]
XQuery 1.0 and XPath 2.0 Functions and Operators, J. Melton, A. Malhotra, N. Walsh, Editors, W3C Recommendation, 23 January 2007, http://www.w3.org/TR/2007/REC-xpath-functions-20070123/ . Latest version available at http://www.w3.org/TR/xpath-functions/ .
[RDF-MT]
RDF Semantics, P. Hayes, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-mt-20040210/ . Latest version available at http://www.w3.org/TR/rdf-mt/ .
[RFC3629]
RFC 3629 UTF-8, a transformation format of ISO 10646, F. Yergeau, November 2003
[RFC4647]
RFC 4647 Matching of Language Tags, A. Phillips, M. Davis, September 2006
[RFC3986]
RFC 3986 Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, January 2005
[RFC3987]
RFC 3987, "Internationalized Resource Identifiers (IRIs)", M. Dürst, M. Suignard
[UNICODE]
The Unicode Standard, Version 4. ISBN 0-321-18578-1, as updated from time to time by the publication of new versions. The latest version of Unicode and additional information on versions of the standard and of the Unicode Character Database is available at http://www.unicode.org/unicode/standard/versions/.
[XML11]
Extensible Markup Language (XML) 1.1, J. Cowan, J. Paoli, E. Maler, C. M. Sperberg-McQueen, F. Yergeau, T. Bray, Editors, W3C Recommendation, 4 February 2004, http://www.w3.org/TR/2004/REC-xml11-20040204/ . Latest version available at http://www.w3.org/TR/xml11/ .
[XPATH20]
XML Path Language (XPath) 2.0, A. Berglund, S. Boag, D. Chamberlin, M. F. Fernández, M. Kay, J. Robie, J. Siméon, Editors, W3C Recommendation, 23 January 2007, http://www.w3.org/TR/2007/REC-xpath20-20070123/ . Latest version available at http://www.w3.org/TR/xpath20/ .
[XQUERY]
XQuery 1.0: An XML Query Language, S. Boag, D. Chamberlin, M. F. Fernández, D. Florescu, J. Robie, J. Siméon, Editors, W3C Recommendation, 23 January 2007, http://www.w3.org/TR/2007/REC-xquery-20070123/. Latest version available at http://www.w3.org/TR/xquery/ .
[XSDT]
XML Schema Part 2: Datatypes Second Edition, P. V. Biron, A. Malhotra, Editors, W3C Recommendation, 28 October 2004, http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/ . Latest version available at http://www.w3.org/TR/xmlschema-2/ .
[BCP47]
Best Current Practice 47, Tags for Identifying Languages and Matching of Language Tags, A. Phillips, M. Davis, http://www.rfc-editor.org/rfc/bcp/bcp47.txt .

A.2 Other References

[CBD]
CBD - Concise Bounded Description, Patrick Stickler, Nokia, W3C Member Submission, 3 June 2005.
[DC]
Expressing Simple Dublin Core in RDF/XML, Dublin Core Metadata Initiative Recommendation, 2002-07-31.
[Multiset]
Multiset, Wikipedia, The Free Encyclopedia. Article as given on October 25, 2007 at http://en.wikipedia.org/w/index.php?title=Multiset&oldid=163605900. The latest version of this article is at http://en.wikipedia.org/wiki/Multiset.
[OWL-Semantics]
OWL Web Ontology Language Semantics and Abstract Syntax, Peter F. Patel-Schneider, Patrick Hayes, Ian Horrocks, Editors, W3C Recommendation http://www.w3.org/TR/2004/REC-owl-semantics-20040210/. Latest version at http://www.w3.org/TR/owl-semantics/.
[RDFS]
RDF Vocabulary Description Language 1.0: RDF Schema, Dan Brickley, R.V. Guha, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-schema-20040210/ . Latest version at http://www.w3.org/TR/rdf-schema/ .
[RESULTS]
SPARQL Query Results XML Format, D. Beckett, Editor, W3C Recommendation, 15 January 2008, http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/ . Latest version available at http://www.w3.org/TR/rdf-sparql-XMLres/ .
[SPROT]
SPARQL Protocol for RDF, K. Clark, Editor, W3C Recommendation, 15 January 2008, http://www.w3.org/TR/2008/REC-rdf-sparql-protocol-20080115/ . Latest version available at http://www.w3.org/TR/rdf-sparql-protocol/ .
[TURTLE]
Turtle - Terse RDF Triple Language, Dave Beckett.
[UCNR]
RDF Data Access Use Cases and Requirements, K. Clark, Editor, W3C Working Draft, 25 March 2005, http://www.w3.org/TR/2005/WD-rdf-dawg-uc-20050325/ . Latest version available at http://www.w3.org/TR/rdf-dawg-uc/ .
[UNISEC]
Unicode Security Considerations, Mark Davis, Michel Suignard.
[VCARD]
Representing vCard Objects in RDF/XML, Renato Iannella, W3C Note, 22 February 2001, http://www.w3.org/TR/2001/NOTE-vcard-rdf-20010222/ . Latest version is available at http://www.w3.org/TR/vcard-rdf .
[WEBARCH]
Architecture of the World Wide Web, Volume One, I. Jacobs, N. Walsh, Editors, W3C Recommendation, 15 December 2004, http://www.w3.org/TR/2004/REC-webarch-20041215/ . Latest version is available at http://www.w3.org/TR/webarch/ .
[UNIID]
Identifier and Pattern Syntax 4.1.0, Mark Davis, Unicode Standard Annex #31, 25 March 2005, http://www.unicode.org/reports/tr31/tr31-5.html . Latest version available at http://www.unicode.org/reports/tr31/ .
[SPARQL-sem-05]
A relational algebra for SPARQL, Richard Cyganiak, 2005
[SPARQL-sem-06]
Semantics of SPARQL, Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez, 2006

B CVS History

$Log: Overview.html,v $
Revision 1.2  2010/06/01 16:56:33  lfeigenb
fix broken links

Revision 1.1  2010/06/01 15:41:01  lfeigenb
initial checkin

Revision 1.4  2010/05/25 15:02:31  lfeigenb
small pubrules fixes

Revision 1.3  2010/05/25 14:49:11  lfeigenb
pubrules and date tweaks

Revision 1.2  2010/05/24 10:31:34  aseaborne
WD publication HTML

Revision 1.66  2010/05/24 10:25:22  aseaborne
Changes for WD publication

Revision 1.65  2010/05/24 08:06:21  aseaborne
Editorial fix 2010May/0002

Revision 1.64  2010/05/20 19:01:23  aseaborne
Comments from 2010AprJun/0220

Revision 1.63  2010/05/19 21:33:53  sharris2
Fix HTML problems with blockquotes inside ps

Revision 1.62  2010/05/19 19:54:13  aseaborne
Fix link into old spec

Revision 1.61  2010/05/18 18:15:45  sharris2
Fix HTML errors

Revision 1.60  2010/05/18 18:08:34  aseaborne
Fix typo

Revision 1.59  2010/05/18 17:13:19  aseaborne
Update section "negation" to reflect ISSUE-29.

Revision 1.58  2010/05/18 13:51:02  sharris2
Fix subsection numbering in Aggregates section

Revision 1.57  2010/05/18 09:29:18  aseaborne
Fix XSL/CSS to work from same directory - aids publishing portability

Revision 1.56  2010/05/17 21:33:36  aseaborne
Update grammar

Revision 1.55  2010/05/17 20:03:05  aseaborne
Clean up some validation errors

Revision 1.54  2010/05/17 09:43:08  sharris2
Added missing {s to example in Aggregates section

Revision 1.53  2010/05/08 19:21:13  aseaborne
WG comments: see 2010AprJun/0115

Revision 1.52  2010/05/07 22:23:32  aseaborne
Correct MINUS example

Revision 1.51  2010/05/04 10:39:29  aseaborne
Add example for MINUS
Add section about differences in MINUS and NOT EXISTS

Revision 1.50  2010/04/29 16:58:11  sharris2
More work on aggregates, now closer to being a complete description

Revision 1.49  2010/04/28 17:03:25  sharris2
Much more work on aggregates, now mostly hangs together, not yet complete

Revision 1.48  2010/04/27 13:59:17  sharris2
Partial update of Aggregates section to match F2F discussion. Set functions refined down to GroupConcat

Revision 1.47  2010/04/27 11:44:28  sharris2
Fix references to "aggregate functions".

Revision 1.46  2010/04/27 09:11:02  aseaborne
Minor: Fix example in 2.5

Revision 1.45  2010/04/26 09:22:05  aseaborne
Placeholders for grammar notes

Revision 1.44  2010/04/22 13:02:18  aseaborne
Typo in grammar section

Revision 1.43  2010/04/22 12:47:38  aseaborne
Edit BGP extension conditions 1 and 4

Revision 1.42  2010/04/16 09:33:56  aseaborne
More editing for SELECT expressions and algebra definitions

Revision 1.41  2010/04/15 22:20:01  aseaborne
Moved definitions for SELECT expression into various definitions sections.

Add @@ placeholders

Revision 1.40  2010/04/01 13:59:07  aseaborne
Added more placeholders

Revision 1.39  2010/03/30 08:33:28  aseaborne
Add placeholders for SPARQL 1.1 extra functions

Revision 1.37  2010/03/30 08:29:52  aseaborne
Add placeholders for SPARQL 1.1 extra functions

Revision 1.36  2010/02/26 10:21:04  aseaborne
Add @@ to note library functions to be documented

Revision 1.35  2010/01/26 16:13:16  aseaborne
Put in references from SPARQL 1.0 to fix broken links

Revision 1.34  2010/01/24 15:24:04  apollere2
Commented


Revision 1.33  2010/01/22 01:15:08  apollere2
Changed previous version link, pubrules complained about
different shortname and we have a new previous version.

Revision 1.32  2010/01/22 01:05:50  apollere2
Changed previous version to FPWD sparql 1.1

Revision 1.31  2010/01/22 00:49:06  apollere2
Fixed some validation error.

Revision 1.30  2010/01/06 13:59:51  aseaborne
Add previous editor

Revision 1.29  2010/01/05 13:42:05  aseaborne
Corrections in response to 2010JanMar/0022. See 2010JanMar/0025.

Revision 1.28  2010/01/05 11:01:08  aseaborne
Editorial corrections

Revision 1.27  2010/01/05 10:57:17  sharris2
Fixed typo SELCT -> SELECT
Fixed error in query in §10
Added text about variable scope in subqueries to end of §10

Revision 1.26  2010/01/04 16:16:41  aseaborne
Fix markup

Revision 1.25  2010/01/04 16:04:34  aseaborne
Editorial fixes from 2010JanMar/0001.  See 2010JanMar/0014.

Revision 1.24  2010/01/04 14:12:53  aseaborne
Editorial fixes from 2010JanMar/0000.

Revision 1.23  2010/01/04 11:30:00  sharris2
Fix english in §9 (aggregateFunctions)
Fix defn. of key(), added ref. to ISSUE-53 in §9.2
Changed 2nd subquery example in §10 (subqueries)
Fixed typo funstion -> function
           reuslting -> resulting

Revision 1.22  2009/12/30 21:30:26  aseaborne
Put in editors and document name

Revision 1.21  2009/12/22 12:23:59  sharris2
Added paragraph to Security Considerations section about indirection attacks to
close ACTION-135

Revision 1.20  2009/12/21 15:06:31  aseaborne
Editorial

Revision 1.19  2009/12/21 14:56:23  aseaborne
Editorial

Revision 1.18  2009/12/21 13:00:00  sharris2
Cleanup section heading for aggregates

Revision 1.17  2009/12/21 12:51:06  sharris2
Added section on subqueries from FPWD

Revision 1.16  2009/12/21 12:16:20  sharris2
Added text explaining the rules around projecting in aggregated queries

Revision 1.15  2009/12/20 19:49:37  sharris2
Removed dead link marker

Revision 1.14  2009/12/20 19:48:09  sharris2
Added section on aggregate functions

Revision 1.13  2009/12/19 18:35:03  aseaborne
Typo

Revision 1.12  2009/12/19 18:23:02  aseaborne
Update abstract

Revision 1.11  2009/12/16 13:12:53  aseaborne
Editorial

Revision 1.10  2009/12/16 13:10:55  aseaborne
Added content for NOT EXISTS as a new section.

Changed affiliation for Andy

Make use of bold in definitions consistent.

Revision 1.9  2009/12/14 14:20:50  aseaborne
Put SELECT expression text into SELECT section.

Fixup <pre> (had leading blank line)

Revision 1.8  2009/12/14 06:25:53  lfeigenb
add relative path for xmlspec.dtd

Revision 1.7  2009/12/07 17:12:02  aseaborne

12.1.6 Solution Mapping
Remove bold on Solution Mapping and Solution Sequence

Add "isBLANK" to keyword table.

Revision 1.6  2009/12/07 16:56:05  aseaborne
Missed applying s/non-unique/non-distinct/

Fixed section depth in 10.2.1, .2, .3

Revision 1.5  2009/12/07 16:31:50  aseaborne
Errata applied (SPARQL 1.0):

Sections numbers refer to SPARQL 1.0:

See Wiki Errata page.

1-- 11.4.1 
SELECT ?name  .., ?givenName 
Should be SELECT ?givenName

2-- TOC /Restricting the Value/ should be /Values/
TOC is now automatically created

3-- 12.1.7 REDUCED
s/non-unique/non-distinct/

4-- 9.1 Order By
Remove incorrect example (was third in list)

5-- 9.3.1 DISTINCT
s/solution set/solution sequence/

6-- 12.4 SPARQL Algebra (Left Join showed in full)
Added "or Ω2 is empty" to cases 2 and 3.

7-- 10.3 ASK
Fix SPARQL XML Results example

8-- 11.3 Operator Mapping of SameTerm
s/sameTERM(A)/sameTERM(A, B)/

9-- 12 Definition of SPARQL
s/outcome of executing a SPARQL/outcome of executing a SPARQL query/

10-- 12.6 Extending SPARQL Basic Graph Matching
s/pattern solution mapping/pattern instance mapping/

11-- 12.1.6 Definition: Solution Mapping
s/V -> T/V -> RDF-T/

12-- 12.2.1
s/the point at the simplification step/the point at which the simplification step/

13-- 9.3.2 REDUCED
s/an REDUCED/a REDUCED

14-- 12.2.1 Converting Graph Patterns
s/a SPARQL graph patterns/a SPARQL graph pattern/

15-- In 12.6 Extending SPARQL Basic Graph Matching,
s/share no/shares no/

Revision 1.4  2009/11/08 17:07:09  aseaborne
Use common XML processing from ../shared

Revision 1.3  2009/09/29 15:40:39  eric
...

Revision 1.2  2009/09/29 15:31:36  eric
...

Revision 1.1  2009/09/29 15:27:28  eric
CREATED

Revision 1.7  2009/09/01 14:45:46  eric
~ fixed prev version link

Revision 1.6  2009/09/01 14:25:51  eric
~ abandoning relative refs to the xmlspec DTD

Revision 1.5  2009/09/01 14:22:20  eric
~ trying validating with relative refs to the TR/2008/REC-xml-20081126/xmlspec.dtd DTD

Revision 1.4  2009/09/01 14:20:12  eric
~ experimenting with boundaries on the CVS log

Revision 1.3  2009/09/01 14:15:51  eric
+ cvs log

Revision 1.2  2009/09/01 14:13:54  eric
+ sections for Subqueries, Negation, Project Expressions