W3C Member Submission

XSPARQL: Semantics

W3C Member Submission 20 January 2009

This version:
http://www.w3.org/Submission/2009/SUBM-xsparql-semantics-20090120/
Latest version:
http://www.w3.org/Submission/xsparql-semantics/
Authors:
Thomas Krennwallner - Institute of Information Systems, Vienna University of Technology
Nuno Lopes - DERI, NUI Galway
Axel Polleres - DERI, NUI Galway

This work is supported by Science Foundation Ireland under grant numbers SFI/02/CE1/I131 and SFI/08/CE/I1380 and by the European Commission under the FP6 project inContext (IST-034718).

This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.


Abstract

XSPARQL is a query language combining XQuery and SPARQL for transformations between RDF and XML. This document defines the semantics of XSPARQL.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a part of the XSPARQL Submission which comprises five documents:

  1. XSPARQL Language Specification
  2. XSPARQL: Semantics (this document)
  3. XSPARQL: Implementation and Test-cases
  4. XSPARQL: Use cases
  5. Examples, Test cases and Use cases (ZIP Archive)

By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.


1. Introduction

This document defines the semantics of XSPARQL based on the XQuery semantics [XQUERYSEMANTICS]. Each pragma-free XQuery is an XSPARQL query and has the same result under both XQuery and XSPARQL semantics.

2. XSPARQL Semantics

The semantics of XSPARQL follows the formal treatment of the XQuery semantics [XQUERYSEMANTICS]. We adopt the notation provided there and define XSPARQL semantics by means of respective normalization mapping rules and inference rules.

We define dynamic evaluation inference rules for a new built-in function fs:sparql, which evaluates SPARQL queries according to the SPARQL semantics, cf. [SPARQL]. Other modifications include normalization of the XSPARQL constructs to XQuery expressions. This means that we do not need new grammar productions, only those defined in the XQuery Core syntax.

2.1. FLWOR' Expressions

The XSPARQL syntax [XSPARQLLANGUAGE] defines, alongside the XQuery FLWOR expression, a new for-loop for iterating over SPARQL results: SparqlForClause. This clause stands at the same level as XQuery's for and let expressions, i.e., such clauses may start new FLWOR' expressions or may occur inside nested XSPARQL queries.

To this end, our new normalization mapping rules [·]Expr' inherit from the definitions of XQuery's [·]Expr mapping rules and overload some expressions to accommodate XSPARQL's new syntactic objects. The semantics of XSPARQL expressions hence stands on top of XQuery's semantics.

A single SparqlForClause is normalized as follows:

[ for $VarName1 ... $VarNamen DatasetClause where
GroupGraphPattern SolutionModifier ReturnClause
]Expr'
==
[
let $_aux_queryresult :=
  [ $VarName1 ... $VarNamen DatasetClause
where GroupGraphPattern SolutionModifier
]SparqlQuery
for $_aux_result in $_aux_queryresult//_sparql_result:result
  [$VarName1]SparqlResult
            ⋮
  [$VarNamen]SparqlResult
ReturnClause
]Expr

Here, [·]SparqlQuery and [·]SparqlResult are auxiliary mapping rules for expanding the given expressions:

[$VarName]SparqlResult
==
let $_VarName_Node := $_aux_result/_sparql_result:binding[@name = "VarName"]
let $VarName := data($_VarName_Node/*)
let $_VarName_NodeType := name($_VarName_Node/*)
let $_VarName_NodeDatatype := string($_VarName_Node/*/@datatype)
let $_VarName_NodeLang := string($_VarName_Node/*/@xml:lang)
let $_VarName_RDFTerm := _rdf_term($_VarName_Node)

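Here, for each SPARQL variable $VarName, the rule binds $VarName itself together with the auxiliary variables $_VarName_Node, $_VarName_NodeType, $_VarName_NodeDatatype, $_VarName_NodeLang, and $_VarName_RDFTerm. The second auxiliary mapping rule is: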

  [ $VarName1 ... $VarNamen DatasetClause
where GroupGraphPattern SolutionModifier
]SparqlQuery
==
fs:sparql([ fs:serialize("SELECT $VarName1 ... $VarNamen", DatasetClause, " where { ",
    fs:serialize( GroupGraphPattern ), " } SolutionModifier")
]Expr')
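The string assembly performed by [·]SparqlQuery can be illustrated with a minimal Python sketch. It is not part of the formal semantics: the function name build_select_query, the list representation of the already-substituted graph pattern, and the example dataset URI are assumptions made for illustration, and the sketch adds separating spaces that the rule leaves implicit.

```python
def build_select_query(var_names, dataset_clause, pattern_parts, solution_modifier=""):
    """Assemble the query string that would be handed to fs:sparql.

    pattern_parts is the already-substituted GroupGraphPattern, given as a
    list of plain strings (a hypothetical representation for this sketch).
    """
    select = "SELECT " + " ".join("$" + name for name in var_names)
    pieces = [select, dataset_clause, "where {", "".join(pattern_parts), "}", solution_modifier]
    # Join with single spaces and drop empty pieces (e.g. a missing modifier).
    return " ".join(piece.strip() for piece in pieces if piece.strip())


query = build_select_query(
    ["name"],
    "from <http://example.org/people.rdf>",   # hypothetical dataset
    ["$person foaf:name $name"],
    "order by $name",
)
print(query)
```

The resulting string contains only SPARQL syntax; variables that XQuery has already bound would have been replaced by their RDF terms before this step.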

The function _rdf_term($_VarName_Node) is defined as follows:

statEnv |- $_VarName_Node bound

statEnv |- [_rdf_term($_VarName_Node)]Expr =
[
if ($_VarName_NodeType = "literal") then
fn:concat("""", $VarName, """",
if ($_VarName_NodeLang) then fn:concat("@", $_VarName_NodeLang) else "",
if ($_VarName_NodeDatatype) then fn:concat("^^<", $_VarName_NodeDatatype, ">") else "")
else if ($_VarName_NodeType = "bnode") then fn:concat("_:", $VarName)
else if ($_VarName_NodeType = "uri") then fn:concat("<", $VarName, ">")
else ""
]Expr
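As an informal illustration of _rdf_term, the following Python sketch computes the same string from one binding element of a SPARQL result document. It assumes the namespace URI that this document ties to the _sparql_result: prefix, reads the language tag from the xml:lang attribute, and uses made-up sample data; the helper names rdf_term and terms_for_result are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI this document associates with the _sparql_result: prefix.
NS = "http://www.w3.org/2007/SPARQL/result#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<sparql xmlns="http://www.w3.org/2007/SPARQL/result#">
  <results>
    <result>
      <binding name="s"><uri>http://example.org/alice</uri></binding>
      <binding name="n"><literal xml:lang="en">Alice</literal></binding>
      <binding name="b"><bnode>b0</bnode></binding>
    </result>
  </results>
</sparql>"""


def rdf_term(binding):
    """Mirror _rdf_term for one binding element (None means unbound)."""
    if binding is None or len(binding) == 0:
        return ""
    term = binding[0]                      # corresponds to $_VarName_Node/*
    node_type = term.tag.split("}")[-1]    # corresponds to $_VarName_NodeType
    value = term.text or ""                # corresponds to $VarName
    if node_type == "literal":
        out = '"' + value + '"'
        lang = term.get(XML_LANG, "")
        datatype = term.get("datatype", "")
        if lang:
            out += "@" + lang
        if datatype:
            out += "^^<" + datatype + ">"
        return out
    if node_type == "bnode":
        return "_:" + value
    if node_type == "uri":
        return "<" + value + ">"
    return ""


def terms_for_result(result, names):
    """Look up each variable's binding in one result and serialize it."""
    found = {}
    for name in names:
        binding = result.find("{%s}binding[@name='%s']" % (NS, name))
        found[name] = rdf_term(binding)
    return found


root = ET.fromstring(SAMPLE)
result = root.find(".//{%s}result" % NS)
print(terms_for_result(result, ["s", "n", "b", "x"]))
```

The variable x has no binding in the sample, so it serializes to the empty string, matching the final else branch of the rule.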

We now define the meaning of fs:sparql. It is, following the style of [XQUERYSEMANTICS], an abstract function which returns a SPARQL query result XML document [SPARQLPROTOCOL] expected as the result of a SPARQL select query. I.e., the result of fs:sparql conforms to the XML Schema definition http://www.w3.org/2007/SPARQL/result.xsd. We further assume that the _sparql_result: namespace is declared implicitly in each XSPARQL query and tied to the namespace URI http://www.w3.org/2007/SPARQL/result#.

fs:sparql($query as xs:string) as document-node(schema-element(_sparql_result:sparql))

Firstly, fs:sparql implicitly expands PrefixedNames within the SPARQL query string it is given, according to the namespace declarations in the prolog of XSPARQL queries.

Static typing rules apply here according to the rules given in the XQuery semantics.

The fs:serialize function behaves like the fn:concat function on arguments of xs:anyAtomicType, cf. [XPATHFUNCT], and serializes arguments bound to XML nodes to xs:string by converting them to a corresponding string in the lexical space of rdf:XMLLiteral as defined in [RDFCONCEPTS]. This ensures that the structure of the XML literals is preserved in the output, while the conversion to xs:string of fn:concat only retains the text data of the XML Literal.

Since this function must be evaluated according to the SPARQL semantics, we need to get the value of fs:sparql in the dynamic evaluation semantics of XSPARQL.

The built-in function fs:sparql applied to Value1 yields Value

dynEnv |- function fs:sparql with types (xs:string) on values (Value1) yields Value

In case of error (for instance, the query string is not syntactically correct, or the DatasetClause cannot be accessed), fs:sparql issues an error:

Value1 cannot be evaluated according to SPARQL semantics

dynEnv |- function fs:sparql with types (xs:string) on values (Value1) yields fn:error()

The only remaining part is defining the semantics of a GroupGraphPattern using our extended [·]Expr'. This mapping rule ensures that variables in scope of XSPARQL expressions are properly substituted using the evaluation mechanism of XQuery. To this end, we assume that [·]Expr' takes expressions in SPARQL's GroupGraphPattern syntax and constructs a sequence of strings and variables, by applying the auxiliary mapping rule [·]VarSubst to each of the graph pattern's variables. This rule looks up each variable in the static environment and replaces it either by a variable (if it is bound) or by a string expression whose value is the name of the variable (if it is unbound). This has the effect that unbound variables in GroupGraphPattern will be evaluated by SPARQL instead of XQuery. The static semantics of [·]VarSubst is defined by the following inference rules. They use the new judgement $VarName bound, which holds if the variable $VarName is bound in the current static environment.

statEnv |- $_VarName_RDFTerm bound

statEnv |- [$VarName]VarSubst = [$_VarName_RDFTerm]Expr

 

statEnv |- $VarName bound          statEnv |- not($_VarName_RDFTerm bound)

statEnv |- [$VarName]VarSubst = [$VarName]Expr

 

statEnv |- not($VarName bound)          statEnv |- not($_VarName_RDFTerm bound)

statEnv |- [$VarName]VarSubst = ["$VarName"]Expr
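The three [·]VarSubst rules can be read as a simple case distinction. The Python sketch below is an informal illustration only: it models the static environment as a set of bound variable names, splits a graph pattern on $-variables with a regular expression, and does not escape quotes inside the pattern; the names var_subst and subst_pattern are hypothetical.

```python
import re


def var_subst(var_name, bound):
    """Return the XQuery fragment that [.]VarSubst yields for $var_name.

    bound is a set of variable names (without "$") that are bound in the
    static environment; it is a stand-in for statEnv in this sketch.
    """
    rdf_term_var = "_%s_RDFTerm" % var_name
    if rdf_term_var in bound:
        return "$" + rdf_term_var      # first rule: bound by a SparqlForClause
    if var_name in bound:
        return "$" + var_name          # second rule: bound by XQuery only
    return '"$%s"' % var_name          # third rule: left for SPARQL to evaluate


def subst_pattern(pattern, bound):
    """Turn a graph pattern into the argument list of strings and variables."""
    args = []
    for part in re.split(r"(\$\w+)", pattern):
        if not part:
            continue
        if part.startswith("$"):
            args.append(var_subst(part[1:], bound))
        else:
            args.append('"%s"' % part)  # plain pattern text becomes a string literal
    return args


print(subst_pattern("$s foaf:knows $o", {"s", "_s_RDFTerm"}))
```

Here $s was bound by an enclosing SPARQL for-loop and is replaced by its RDF term variable, while $o is unbound and stays in the query string as a SPARQL variable.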

Next, we define the normalization of for expressions. In order to handle blank nodes appropriately in construct expressions, we need to decorate the variables of standard XQuery for-expressions with position variables. First, we must normalize for-expressions to core for-loops:

[ for $VarName1 OptTypeDeclaration1 OptPositionalVar1 in Expr1, ...,
$VarNamen OptTypeDeclarationn OptPositionalVarn in Exprn ReturnClause
]Expr'
==
[ for $VarName1 OptTypeDeclaration1 OptPositionalVar1 in Expr1 return
    ⋮
for $VarNamen OptTypeDeclarationn OptPositionalVarn in Exprn
ReturnClause
]Expr'

Now we can apply our decoration of the core for-loops (without position variables) recursively:

[ for $VarNamei OptTypeDeclarationi in Expri ReturnClause ]Expr'
==
[for $VarNamei OptTypeDeclarationi at $_VarNamei_Pos in [Expri]Expr' [ReturnClause]Expr' ]Expr

Similarly, let expressions must be normalized as follows:

[ let $VarName1 OptTypeDeclaration1 := Expr1, ...,
$VarNamen OptTypeDeclarationn := Exprn ReturnClause
]Expr'
==
[ let $VarName1 OptTypeDeclaration1 := Expr1 return
    ⋮
let $VarNamen OptTypeDeclarationn := Exprn
ReturnClause
]Expr'

Now we can recursively apply [·]Expr' on the core let-expressions:

[ let $VarNamei OptTypeDeclarationi := Expri ReturnClause ]Expr'
==
[ let $VarNamei OptTypeDeclarationi := [Expri]Expr' [ReturnClause]Expr' ]Expr

We do not specify where and order by clauses here, as they can be handled similarly to the let and for expressions above.

2.2. CONSTRUCT Expressions

We now define the semantics of the ReturnClause. Expressions of the form return Expr are evaluated as defined in the XQuery semantics. Stand-alone construct-clauses are normalized as follows:

[ construct ConstructTemplate ]Expr'
==
[ return fs:serialize([ConstructTemplate]SubjPredObjList) ]Expr

The auxiliary mapping rule [·]SubjPredObjList rewrites variables and blank nodes inside of ConstructTemplates using the normalization mapping rules [·]Subject, [·]PredObjList, and [·]ObjList. They use the judgements expr is valid subject, expr is valid predicate, and expr is valid object, which hold if the expression expr is, according to the RDF specification [RDFCONCEPTS], a valid subject, predicate, or object, respectively; i.e., subjects must be bound and not literals; predicates must be bound, not literals, and not blank nodes; and objects must be bound. If any of these criteria fails, the triple containing the ill-formed expression is removed from the output. Free variables in the construct are unbound, hence triples containing such variables must be removed too. The boundness condition can be checked at runtime by wrapping each variable and FLWOR' into a fn:empty() assertion, which removes the corresponding triple from the ConstructTemplate output. Next, we sketch only some of the normalization rules; the missing rules should be clear from the context:

statEnv |- VarOrTerm is valid subject

statEnv |-     [ VarOrTerm PropertyListNotEmpty ]SubjPredObjList     ==
[ fs:serialize([VarOrTerm]Subject, [PropertyListNotEmpty]PredObjList) ]Expr

 

statEnv |-     [ [ PropertyListNotEmpty ] ]SubjPredObjList     ==
[ fs:serialize("[ ", [PropertyListNotEmpty]PredObjList, " ]") ]Expr

 

statEnv |- Verb is valid predicate
statEnv |- Object1 is valid object
    ⋮
statEnv |- Objectn is valid object

statEnv |-     [ Verb Object1, ..., Objectn ]PredObjList     ==
[ fs:serialize([Verb]Expr', ",", [Object1]Expr',",",...,",",[Objectn]Expr') ]Expr

Otherwise, if one of the premises is not true, we suppress the generation of this triple. One of the negated rules is the following:

statEnv |- not(VarOrTerm is valid subject)

statEnv |- [VarOrTerm PropertyListNotEmpty]SubjPredObjList = [""]Expr
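The effect of the validity judgements can be sketched informally in Python. The sketch assumes terms are already serialized as strings in the style of _rdf_term (literals start with a double quote, blank nodes with "_:", anything else is treated as an IRI or prefixed name) and uses None for an unbound variable; all function names and the output layout are hypothetical.

```python
def kind(term):
    """Classify a serialized RDF term; None stands for an unbound variable."""
    if term is None:
        return "unbound"
    if term.startswith('"'):
        return "literal"
    if term.startswith("_:"):
        return "bnode"
    return "iri"


def valid_subject(term):
    return kind(term) in ("iri", "bnode")      # bound and not a literal


def valid_predicate(term):
    return kind(term) == "iri"                 # bound, no literal, no blank node


def valid_object(term):
    return kind(term) != "unbound"             # any bound term


def serialize_triples(triples):
    """Emit only well-formed triples; ill-formed ones are silently dropped."""
    lines = []
    for s, p, o in triples:
        if valid_subject(s) and valid_predicate(p) and valid_object(o):
            lines.append("%s %s %s ." % (s, p, o))
    return "\n".join(lines)


print(serialize_triples([
    ("<http://example.org/alice>", "foaf:name", '"Alice"'),
    ('"Alice"', "foaf:name", '"Alice"'),                      # literal subject
    ("_:b1", "_:b2", '"x"'),                                  # blank node predicate
    ("<http://example.org/bob>", "foaf:knows", None),         # unbound object
]))
```

Only the first triple survives; the other three each violate one of the three judgements and contribute the empty string to the output.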

The normalization for subjects, verbs, and objects according to [·]Expr' is similar to GroupGraphPattern: all variables in it will be replaced using [·]VarSubst.

Blank nodes inside of construction templates must be treated carefully by adding position variables from surrounding for expressions. To this end, we use [·]BNodeSubst. Since we normalize every for-loop by attaching position variables, we just need to retrieve the available position variables from the static environment. We assume a new static environment component statEnv.posVars which holds - similar to the statEnv.varType component - all in-context positional variables in the given static environment, that is, the variables defined in the at clause of any enclosing for loop.

statEnv |- statEnv.posVars = VarName1_Pos, ..., VarNamen_Pos

statEnv |- [_:BNodeName]BNodeSubst = [fs:serialize("_:",BNodeName,"_", VarName1_Pos , ..., VarNamen_Pos ) ]Expr
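To illustrate why position variables yield a fresh blank node per iteration, the following Python sketch mimics the [·]BNodeSubst rule. It is an informal analogy: enumerate with start=1 plays the role of the at $_VarName_Pos decoration, the positions are concatenated literally as fs:serialize would, and the sample data and function names are made up.

```python
def bnode_subst(bnode_name, pos_vars):
    """Concatenate "_:", the label, "_", and all in-scope positions."""
    return "_:" + bnode_name + "_" + "".join(str(pos) for pos in pos_vars)


def construct_names(people):
    """Build one triple per friend, with a blank node label per iteration."""
    triples = []
    # enumerate(..., start=1) stands in for the `at $_VarName_Pos` decoration.
    for person_pos, person in enumerate(people, start=1):
        for friend_pos, friend in enumerate(person["knows"], start=1):
            label = bnode_subst("b", [person_pos, friend_pos])
            triples.append((label, "foaf:name", '"%s"' % friend))
    return triples


for triple in construct_names([{"knows": ["Bob", "Carol"]}, {"knows": ["Dave"]}]):
    print(triple)
```

The same template blank node _:b receives a different label in each iteration, so triples produced in different iterations do not share a blank node.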

2.3. SPARQL Filter Operators

SPARQL filter expressions in the WHERE GroupGraphPattern are evaluated using fs:sparql. In addition, XSPARQL allows the following functions inherited from SPARQL:

BOUND($A as xs:string) as xs:boolean
isIRI($A as xs:string) as xs:boolean
isBLANK($A as xs:string) as xs:boolean
isLITERAL($A as xs:string) as xs:boolean
LANG($A as xs:string) as xs:string
DATATYPE($A as xs:string) as xs:anyURI

The semantics of the above functions is defined as follows:

statEnv |- $_VarName_Node bound

statEnv |- [BOUND($VarName)]Expr' = [if (fn:empty($_VarName_Node)) then fn:false() else fn:true()]Expr

 

statEnv |- $_VarName_NodeType bound

statEnv |- [isIRI($VarName)]Expr' = [ if ($_VarName_NodeType = "uri")
then fn:true() else fn:false()
]Expr

 

statEnv |- $_VarName_NodeType bound

statEnv |- [isBLANK($VarName)]Expr' = [ if ($_VarName_NodeType = "bnode")
then fn:true() else fn:false()
]Expr

 

statEnv |- $_VarName_NodeType bound

statEnv |- [isLITERAL($VarName)]Expr' = [ if ($_VarName_NodeType = "literal")
then fn:true() else fn:false()
]Expr

 

statEnv |- $_VarName_Node bound

statEnv |- [LANG($VarName)]Expr' = [fn:string($_VarName_Node/*/@xml:lang)]Expr

 

statEnv |- $_VarName_Node bound

statEnv |- [DATATYPE($VarName)]Expr' = [$_VarName_Node/*/@datatype]Expr
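The intended behaviour of these functions can be sketched informally in Python over a single binding element of a SPARQL result document. The sketch assumes that the term element is the only child of the binding, that blank nodes are represented by a bnode element (the name tested in _rdf_term), and that the language tag is carried by xml:lang; the function names are hypothetical counterparts of the XSPARQL functions.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _term(binding):
    """Return the term element inside a binding, or None if unbound."""
    if binding is None or len(binding) == 0:
        return None
    return binding[0]


def node_type(binding):
    """Local name of the term element: uri, bnode, or literal."""
    term = _term(binding)
    return term.tag.split("}")[-1] if term is not None else ""


def bound(binding):
    return _term(binding) is not None


def is_iri(binding):
    return node_type(binding) == "uri"


def is_blank(binding):
    return node_type(binding) == "bnode"


def is_literal(binding):
    return node_type(binding) == "literal"


def lang(binding):
    term = _term(binding)
    return term.get(XML_LANG, "") if term is not None else ""


def datatype(binding):
    term = _term(binding)
    return term.get("datatype", "") if term is not None else ""


name = ET.fromstring('<binding name="n"><literal xml:lang="en">Alice</literal></binding>')
print(bound(name), is_literal(name), is_iri(name), lang(name))
```

An unbound variable is modelled by passing None, in which case bound returns False and the accessors return the empty string.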

3. Correspondence between XSPARQL and XQuery

XSPARQL syntactically subsumes XQuery and - taking into account the shortcut notations defined in Section 4 of [XSPARQLLANGUAGE] - SPARQL construct queries. Concerning semantics, XSPARQL equally builds on top of its constituent languages. As shown above, we have extended the formal semantics of XQuery from [XQUERYSEMANTICS] by additional reduction rules which reduce each XSPARQL query to XQuery expressions that operate on the results of SPARQL queries in SPARQL's XML result format [SPARQLPROTOCOL].

Since we add only new reduction rules for SPARQL-like heads and bodies, it is easy to see that each native XQuery is treated in a semantically equivalent way in XSPARQL. The only rules affecting native XQuery queries are the "decoration" rules, which, however, do not affect the final query result.

Proposition 1. Each pragma-free XQuery is an XSPARQL query and has the same result under both XQuery and XSPARQL semantics.

Proof (Sketch). Given an XSPARQL query falling in the XQuery fragment, the result of our normalization is again an XQuery. Note, however, that even in this fragment our additional rewriting rules do change the original query in some cases. More concretely, our "decoration" rule decorates each position-variable-free for loop (i.e., one that does not have an at clause) with a new position variable. As these new position variables start with an underscore they cannot occur in the original query, so this rewriting does not interfere with the semantics of the original query. The only rewriting rules which use the newly created position variables are those for rewriting blank nodes in construct parts, i.e., the [·]BNodeSubst rule. However, this rule only applies to XSPARQL queries which fall outside the native XQuery fragment.

A similar correspondence holds for native SPARQL queries. Let us now sketch the proof showing the equivalence of XSPARQL's semantics and the evaluation of rewritten SPARQL queries into native XQuery. Intuitively, we "inherit" the SPARQL semantics from the fs:sparql "oracle".

Let Ω denote a solution sequence of an abstract SPARQL query q = (E, DS, R), where E is a SPARQL algebra expression, DS is an RDF Dataset, and R is a set of variables called the query form (cf. [SPARQL]). Then, by SPARQLResult(Ω) we denote the SPARQL result XML format representation of Ω.

We are now ready to state some properties about our transformations. The following proposition states that any SPARQL select query can be equivalently viewed as an XSPARQL F'DWMR query.

Proposition 2. Let q = (EWM,DS,$x1,...,$xn) be a SPARQL query of the form select $x1,...,$xn DWM, where DS denotes the RDF dataset (cf. [SPARQL]) corresponding to the DatasetClause (D), G the respective default graph of DS, EWM the SPARQL algebra expression corresponding to WM, and P the pattern defined in the where part (W). If eval(DS(G), q) = Ω1, and

statEnv; dynEnv |- for $x1 ... $xn from D(G) where P return ($x1, ..., $xn) ⇒ Ω2,

then Ω1 ≡ Ω2 modulo representation. Here, by equivalence (≡) modulo representation we mean that both Ω1 and Ω2 represent the same sequences of (partial) variable bindings.

Proof (Sketch). By the rule

[ for $x1 ... $xn from D(G) where P return ($x1, ..., $xn) ]Expr'
==
[ let $_aux_queryresult := [·]SparqlQuery ... for ... [·]SparqlResult ... return ($x1, ..., $xn) ]Expr'

[·]SparqlQuery builds q as a string without replacing any variable, since all variables in P are free. The resulting string is then passed to fs:sparql, which - since q is unchanged - by definition returns exactly SPARQLResult(Ω1). Thus the return part return ($x1, ..., $xn), which extracts Ω2, yields just a representational variant of Ω1.

By similar arguments, we can see that SPARQL's construct queries are treated in a semantically equivalent way in XSPARQL and in SPARQL, taking into account the shortcut notations defined in Section 4 of [XSPARQLLANGUAGE]. The idea here is that the rewriting rules for construct expressions in Section 2.2 extract exactly the triples from the solution sequence of the body, as defined in the SPARQL semantics [SPARQL].


4. References

[RDFCONCEPTS]
Graham Klyne and Jeremy Carroll (eds.). Resource Description Framework (RDF): Concepts and Abstract Syntax, February 2004. W3C Recommendation, available at http://www.w3.org/TR/rdf-concepts/.
[SPARQL]
Eric Prud'hommeaux and Andy Seaborne (eds.). SPARQL Query Language for RDF, 15 January 2008. W3C Recommendation, available at http://www.w3.org/TR/rdf-sparql-query/.
[SPARQLPROTOCOL]
Kendall Grant Clark, Lee Feigenbaum, and Elias Torres. SPARQL Protocol for RDF, 15 January 2008. W3C Recommendation, available at http://www.w3.org/TR/rdf-sparql-protocol/.
[XPATHFUNCT]
Ashok Malhotra, Jim Melton, and Norman Walsh (eds.). XQuery 1.0 and XPath 2.0 Functions and Operators, 23 January 2007. W3C Recommendation, available at http://www.w3.org/TR/xpath-functions/.
[XQUERYSEMANTICS]
Denise Draper, Peter Fankhauser, Mary Fernández, Ashok Malhotra, Kristoffer Rose, Michael Rys, Jérôme Siméon, and Philip Wadler. XQuery 1.0 and XPath 2.0 Formal Semantics, January 2007. W3C Recommendation, available at http://www.w3.org/TR/xquery-semantics/.
[XSPARQLLANGUAGE]
XSPARQL Language Specification. Document included in the present W3C Member Submission.