Copyright © 2009 by DERI Galway at the National University of Ireland, Galway, Ireland.
This work is supported by Science Foundation Ireland under grants number SFI/02/CE1/I131 and SFI/08/CE/I1380 and under the European Commission European FP6 project inContext (IST-034718).
This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.
XSPARQL is a query language combining XQuery and SPARQL for transformations between RDF and XML. This document defines the semantics of XSPARQL.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a part of the XSPARQL Submission which comprises five documents:
By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.
This document defines the semantics of XSPARQL based on the XQuery semantics [XQUERYSEMANTICS]. Each pragma-free XQuery is an XSPARQL query and has the same result under both XQuery and XSPARQL semantics.
The semantics of XSPARQL follows the formal treatment of the XQuery semantics [XQUERYSEMANTICS]. We adopt the notation provided there and define XSPARQL semantics by means of respective normalization mapping rules and inference rules.
We define dynamic evaluation inference rules for a new built-in function fs:sparql, which evaluates SPARQL queries according to the SPARQL semantics, cf. [SPARQL]. Other modifications include normalization of the XSPARQL constructs to XQuery expressions. This means that we do not need new grammar productions beyond those defined in the XQuery Core syntax.
The XSPARQL syntax [XSPARQLLANGUAGE] defines, alongside the XQuery FLWOR expression, a new for-loop for iterating over SPARQL results: SparqlForClause. This clause stands at the same level as XQuery's for and let expressions, i.e., such clauses may start new FLWOR' expressions or occur inside nested XSPARQL queries.
To this end, our new normalization mapping rules [·]Expr' inherit from the definitions of XQuery's [·]Expr mapping rules and overload some expressions to accommodate XSPARQL's new syntactic objects. The semantics of XSPARQL expressions hence stands on top of XQuery's semantics.
A single SparqlForClause is normalized as follows:
[normalization rule for SparqlForClause not preserved]
Here, [·]SparqlQuery and [·]SparqlResult are auxiliary mapping rules for expanding the given expressions:
[$VarName]SparqlResult
==
let $_VarName_Node := $_aux_result/_sparql_result:binding[@name = "VarName"]
let $VarName := data($_VarName_Node/*)
let $_VarName_NodeType := name($_VarName_Node/*)
let $_VarName_NodeDatatype := string($_VarName_Node/*/@datatype)
let $_VarName_NodeLang := string($_VarName_Node/*/@lang)
let $_VarName_RDFTerm := _rdf_term($_VarName_Node)
here, for each SPARQL variable:
and
[normalization rule not preserved]
The function _rdf_term($_VarName_Node) is defined as follows:
statEnv |- $_VarName_Node bound
----------------------------------------
[_rdf_term($_VarName_Node)]Expr = [right-hand side not preserved]
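The per-variable bindings of [·]SparqlResult, together with one plausible reading of _rdf_term, can be sketched in Python. This is a minimal sketch under stated assumptions, not the normative definition: the names Binding, sparql_result and rdf_term are hypothetical, the namespace used is that of the W3C SPARQL Query Results XML Format (where the language tag is carried by xml:lang), and the N-Triples-like output syntax of rdf_term is an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Assumption: namespace of the W3C SPARQL Query Results XML Format.
SR = "{http://www.w3.org/2005/sparql-results#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class Binding:
    value: Optional[str]      # plays the role of $VarName
    node_type: Optional[str]  # $_VarName_NodeType: "uri", "bnode" or "literal"
    datatype: str             # $_VarName_NodeDatatype ("" if absent)
    lang: str                 # $_VarName_NodeLang ("" if absent)


def sparql_result(result: ET.Element, var_name: str) -> Binding:
    """Mimic the let-bindings of [$VarName]SparqlResult for one <result>."""
    for binding in result.findall(f"{SR}binding"):
        if binding.get("name") == var_name:
            term = binding[0]  # the single child: <uri>, <bnode> or <literal>
            return Binding(
                value=term.text or "",
                node_type=term.tag[len(SR):],
                datatype=term.get("datatype", ""),
                lang=term.get(XML_LANG, ""),
            )
    return Binding(None, None, "", "")  # variable unbound in this solution


def rdf_term(b: Binding) -> str:
    """Hypothetical serialization of a binding as an RDF term string."""
    value = b.value or ""
    if b.node_type == "uri":
        return f"<{value}>"
    if b.node_type == "bnode":
        return f"_:{value}"
    if b.node_type == "literal":
        # Only backslashes and double quotes are escaped in this sketch.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        if b.lang:
            return f'"{escaped}"@{b.lang}'
        if b.datatype:
            return f'"{escaped}"^^<{b.datatype}>'
        return f'"{escaped}"'
    return ""  # unbound variable


RESULT_XML = """<result xmlns="http://www.w3.org/2005/sparql-results#">
  <binding name="s"><uri>http://example.org/alice</uri></binding>
  <binding name="name"><literal xml:lang="en">Alice</literal></binding>
</result>"""

result = ET.fromstring(RESULT_XML)
print(rdf_term(sparql_result(result, "s")))     # <http://example.org/alice>
print(rdf_term(sparql_result(result, "name")))  # "Alice"@en
```

An unbound variable yields the empty string here, which lets later steps detect and drop triples that mention it.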
We now define the meaning of fs:sparql. It is, following the style of [XQUERYSEMANTICS], an abstract function which returns a SPARQL query result XML document [SPARQLPROTOCOL] expected as the result of a SPARQL select query. I.e., the result of fs:sparql conforms to the XML Schema definition http://www.w3.org/2007/SPARQL/result.xsd. We further assume that the _sparql_result: namespace is declared implicitly in each XSPARQL query and tied to the namespace URI http://www.w3.org/2007/SPARQL/result#.
fs:sparql($query as xs:string) as document-node(schema-element(_sparql_result:sparql))
Firstly, fs:sparql implicitly expands PrefixedNames within the SPARQL query string it is given, according to the namespace declarations in the prolog of XSPARQL queries.
Static typing rules apply here according to the rules given in the XQuery semantics.
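The implicit expansion of PrefixedNames can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name expand_prefixed_names and the prefix table are hypothetical, local names are restricted to word characters and hyphens, and SPARQL comments are not handled. IRIs and string literals are matched first so that colons inside them are never rewritten.

```python
import re
from typing import Dict

_TOKEN = re.compile(
    r"""
    (?P<iri><[^<>\s]*>)                    # full IRI reference, left untouched
  | (?P<string>"(?:[^"\\]|\\.)*"           # double-quoted string literal
             |'(?:[^'\\]|\\.)*')           # single-quoted string literal
  | (?<![\w:])(?P<prefix>[A-Za-z][\w-]*):(?P<local>[\w-]+)  # prefixed name
    """,
    re.VERBOSE,
)


def expand_prefixed_names(query: str, prefixes: Dict[str, str]) -> str:
    """Replace prefix:local by <namespace + local> for declared prefixes."""

    def replace(match: "re.Match[str]") -> str:
        prefix = match.group("prefix")
        if prefix is None or prefix not in prefixes:
            # IRI, string literal, or undeclared prefix: keep as written.
            return match.group(0)
        return f"<{prefixes[prefix]}{match.group('local')}>"

    return _TOKEN.sub(replace, query)


PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}
QUERY = (
    'select $n where { $p foaf:name $n . '
    '$p foaf:homepage <http://example.org/a:b> . '
    'filter($n != "foaf:name") }'
)
print(expand_prefixed_names(QUERY, PREFIXES))
```

Blank node labels such as _:b1 are left alone because the colon is preceded by a word character and no prefix can start with an underscore in this sketch.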
The fs:serialize function behaves like the fn:concat function on arguments of xs:anyAtomicType, cf. [XPATHFUNCT], and serializes arguments bound to XML nodes to xs:string by converting them to a corresponding string in the lexical space of rdf:XMLLiteral as defined in [RDFCONCEPTS]. This ensures that the structure of the XML literals is preserved in the output, while the conversion to xs:string of fn:concat only retains the text data of the XML Literal.
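The difference between fs:serialize and a plain fn:concat can be sketched in Python as follows. The function names fs_serialize and concat_text_only are hypothetical, and ElementTree's serialization stands in for the rdf:XMLLiteral lexical form, which it only approximates (no canonicalization is performed).

```python
import xml.etree.ElementTree as ET


def fs_serialize(*args) -> str:
    """Concatenate arguments, keeping the markup of XML node arguments."""
    parts = []
    for arg in args:
        if isinstance(arg, ET.Element):
            # XML node: keep element structure in the resulting string.
            parts.append(ET.tostring(arg, encoding="unicode"))
        else:
            # Atomic value: plain string conversion, as fn:concat would do.
            parts.append(str(arg))
    return "".join(parts)


def concat_text_only(*args) -> str:
    """Analogue of fn:concat: XML nodes contribute only their text data."""
    return "".join(
        "".join(arg.itertext()) if isinstance(arg, ET.Element) else str(arg)
        for arg in args
    )


node = ET.fromstring("<b>bold <i>text</i></b>")
print(fs_serialize('"', node, '"'))      # "<b>bold <i>text</i></b>"
print(concat_text_only('"', node, '"'))  # "bold text"
```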
Since this function must be evaluated according to the SPARQL semantics, we need to get the value of fs:sparql in the dynamic evaluation semantics of XSPARQL.
The built-in function fs:sparql applied to Value1 yields Value
----------------------------------------
dynEnv |- function fs:sparql with types (xs:string) on values (Value1) yields Value
In case of error (for instance, the query string is not syntactically correct, or the DatasetClause cannot be accessed), fs:sparql issues an error:
Value1 cannot be evaluated according to SPARQL semantics
----------------------------------------
dynEnv |- function fs:sparql with types (xs:string) on values (Value1) yields fn:error()
The only remaining part is defining the semantics of a GroupGraphPattern using our extended [·]Expr'. This mapping rule ensures that variables in scope of XSPARQL expressions are properly substituted using the evaluation mechanism of XQuery. To this end, we assume that [·]Expr' takes expressions in SPARQL's GroupGraphPattern syntax and constructs a sequence of strings and variables by applying the auxiliary mapping rule [·]VarSubst to each of the graph pattern's variables. This rule looks up variables in the static environment and replaces each of them either with a variable or with a string expression whose value is the name of the variable. As a result, unbound variables in a GroupGraphPattern are evaluated by SPARQL instead of XQuery. The static semantics of [·]VarSubst is defined by the following inference rules. They use the new judgement $VarName bound, which holds if the variable $VarName is bound in the current static environment.
statEnv |- $_VarName_RDFTerm bound
----------------------------------------
statEnv |- [$VarName]VarSubst = [$_VarName_RDFTerm]Expr

statEnv |- $VarName bound
statEnv |- not($_VarName_RDFTerm bound)
----------------------------------------
statEnv |- [$VarName]VarSubst = [$VarName]Expr

statEnv |- not($VarName bound)
statEnv |- not($_VarName_RDFTerm bound)
----------------------------------------
statEnv |- [$VarName]VarSubst = ["$VarName"]Expr
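The three [·]VarSubst cases can be mirrored by a small Python function. This is a sketch only: var_subst is a hypothetical name, the static environment is modelled as a plain set of bound variable names, and the result is the text of the expression that replaces the variable.

```python
from typing import Set


def var_subst(var_name: str, bound: Set[str]) -> str:
    """Return the expression text that replaces $var_name in a graph pattern."""
    rdf_term_var = f"_{var_name}_RDFTerm"
    if rdf_term_var in bound:
        # Bound by an enclosing SparqlForClause: use the serialized RDF term.
        return f"${rdf_term_var}"
    if var_name in bound:
        # Bound by an ordinary XQuery clause: use the variable itself.
        return f"${var_name}"
    # Unbound: emit a string holding the variable name, so that the
    # variable is left for the SPARQL engine to evaluate.
    return f'"${var_name}"'


bound_vars = {"x", "_x_RDFTerm", "y"}
print(var_subst("x", bound_vars))  # $_x_RDFTerm
print(var_subst("y", bound_vars))  # $y
print(var_subst("z", bound_vars))  # "$z"
```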
Next, we define the normalization of for expressions. In order to handle blank nodes appropriately in construct expressions, we need to decorate the variables of standard XQuery for-expressions with position variables. First, we must normalize for-expressions to core for-loops:
[normalization rule not preserved]
Now we can apply our decoration of the core for-loops (without position variables) recursively:
[ for $VarNamei OptTypeDeclarationi in Expri ReturnClause ]Expr'
==
[ for $VarNamei OptTypeDeclarationi at $_VarNamei_Pos in [Expri]Expr' [ReturnClause]Expr' ]Expr
Similarly, let expressions must be normalized as follows:
[normalization rule not preserved]
Now we can recursively apply [·]Expr' on the core let-expressions:
[ let $VarNamei OptTypeDeclarationi := Expri ReturnClause ]Expr'
==
[ let $VarNamei OptTypeDeclarationi := [Expri]Expr' [ReturnClause]Expr' ]Expr
We do not specify where and order by clauses here, as they can be handled similarly to the let and for expressions above.
We now define the semantics of the ReturnClause. Expressions of the form return Expr are evaluated as defined in the XQuery semantics. Stand-alone construct clauses are normalized as follows:
[ construct ConstructTemplate ]Expr'
==
[ return fs:serialize([ConstructTemplate]SubjPredObjList) ]Expr
The auxiliary mapping rule [·]SubjPredObjList rewrites variables and blank nodes inside ConstructTemplates using the normalization mapping rules [·]Subject, [·]PredObjList, and [·]ObjList. They use the judgements expr is valid subject, expr is valid predicate, and expr is valid object, which hold if the expression expr is, according to the RDF specification [RDFCONCEPTS], a valid subject, predicate, or object, respectively: subjects must be bound and must not be literals; predicates must be bound and must be neither literals nor blank nodes; objects must be bound. If any of these criteria fails, the triple containing the ill-formed expression is removed from the output. Free variables in the construct are unbound, hence triples containing such variables must be removed too. The boundness condition can be checked at runtime by wrapping each variable and FLWOR' into a fn:empty() assertion, which removes the corresponding triple from the ConstructTemplate output. Next, we sketch only some of the normalization rules; the missing rules should be clear from the context:
statEnv |- VarOrTerm is valid subject
----------------------------------------
[conclusion not preserved]

[ [ PropertyListNotEmpty ] ]SubjPredObjList
==
statEnv |- [ fs:serialize("[ ", [PropertyListNotEmpty]PredObjectList, " ]") ]Expr

statEnv |- Verb is valid predicate
statEnv |- Object1 is valid object
⋮
statEnv |- Objectn is valid object
----------------------------------------
[conclusion not preserved]
Otherwise, if one of the premises is not true, we suppress the generation of this triple. One of the negated rules is the following:
statEnv |- not(VarOrTerm is valid subject)
----------------------------------------
statEnv |- [VarOrTerm PropertyListNotEmpty]SubjPredObjList = [""]Expr
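The effect of the validity judgements, namely that ill-formed triples are silently dropped from the constructed output, can be sketched in Python. The representation is an assumption made for illustration: terms are strings in an N-Triples-like syntax ("<...>" for IRIs, "_:..." for blank nodes, a leading double quote for literals), the empty string stands for an unbound variable, and all function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def is_bound(term: str) -> bool:
    return term != ""


def is_literal(term: str) -> bool:
    return term.startswith('"')


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def valid_subject(term: str) -> bool:
    # Subjects must be bound and must not be literals.
    return is_bound(term) and not is_literal(term)


def valid_predicate(term: str) -> bool:
    # Predicates must be bound and neither literals nor blank nodes.
    return is_bound(term) and not is_literal(term) and not is_blank(term)


def valid_object(term: str) -> bool:
    # Objects only need to be bound.
    return is_bound(term)


def construct_triples(candidates: Iterable[Triple]) -> List[Triple]:
    """Keep only triples whose three positions are all valid."""
    return [
        (s, p, o)
        for s, p, o in candidates
        if valid_subject(s) and valid_predicate(p) and valid_object(o)
    ]


candidates = [
    ("<http://ex.org/a>", "<http://ex.org/p>", '"v"'),     # kept
    ('"lit"', "<http://ex.org/p>", "<http://ex.org/b>"),   # literal subject
    ("_:b1", "_:b2", "<http://ex.org/b>"),                 # blank predicate
    ("<http://ex.org/a>", "<http://ex.org/p>", ""),        # unbound object
    ("_:b1", "<http://ex.org/p>", "_:b2"),                 # kept
]
print(construct_triples(candidates))
```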
The normalization for subjects, verbs, and objects according to [·]Expr' is similar to GroupGraphPattern: all variables in it will be replaced using [·]VarSubst.
Blank nodes inside of construction templates must be treated carefully by adding position variables from surrounding for expressions. To this end, we use [·]BNodeSubst. Since we normalize every for-loop by attaching position variables, we just need to retrieve the available position variables from the static environment. We assume a new static environment component statEnv.posVars which holds - similar to the statEnv.varType component - all in-context positional variables in the given static environment, that is, the variables defined in the at clause of any enclosing for loop.
statEnv |- statEnv.posVars = VarName1_Pos, ..., VarNamen_Pos
----------------------------------------
statEnv |- [_:BNodeName]BNodeSubst = [fs:serialize("_:",BNodeName,"_", VarName1_Pos , ..., VarNamen_Pos ) ]Expr
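A minimal Python sketch of the [·]BNodeSubst idea: the blank node label is extended with the positions of all enclosing for loops, so that each iteration yields a distinct blank node. The names bnode_subst and label_all are hypothetical, and Python's enumerate stands in for the at position variables.

```python
from typing import Iterable, List, Sequence


def bnode_subst(name: str, positions: Sequence[int]) -> str:
    """Concatenate "_:", the label, "_" and all in-scope positions."""
    return "_:" + name + "_" + "".join(str(p) for p in positions)


def label_all(outer: Iterable, inner: Iterable) -> List[str]:
    """Labels produced for _:b inside two nested for loops."""
    inner_items = list(inner)
    labels = []
    for i, _ in enumerate(outer, start=1):            # outer position variable
        for j, _ in enumerate(inner_items, start=1):  # inner position variable
            labels.append(bnode_subst("b", [i, j]))
    return labels


print(label_all("xy", "pq"))  # ['_:b_11', '_:b_12', '_:b_21', '_:b_22']
```

Because the positions are concatenated without a separator, as in the rule as written, the position pairs (1, 12) and (11, 2) would both give _:b_112 in this sketch.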
SPARQL filter expressions in WHERE GroupGraphPattern are evaluated using fs:sparql. But we additionally allow the following functions inherited from SPARQL in XSPARQL:
BOUND($A as xs:string) as xs:boolean
isIRI($A as xs:string) as xs:boolean
isBLANK($A as xs:string) as xs:boolean
isLITERAL($A as xs:string) as xs:boolean
LANG($A as xs:string) as xs:string
DATATYPE($A as xs:string) as xs:anyURI
The semantics of the above functions is defined as follows:
statEnv |- $_VarName_Node bound
----------------------------------------
statEnv |- [BOUND($VarName)]Expr' = [if (fn:empty($_VarName_Node)) then fn:false() else fn:true()]Expr
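statEnv |- $_VarName_NodeType bound
----------------------------------------
[conclusion not preserved]

statEnv |- $_VarName_NodeType bound
----------------------------------------
[conclusion not preserved]

statEnv |- $_VarName_NodeType bound
----------------------------------------
[conclusion not preserved]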
statEnv |- $_VarName_Node bound
----------------------------------------
statEnv |- [LANG($VarName)]Expr' = [fn:string($_VarName_Node/@xml:lang)]Expr

statEnv |- $_VarName_Node bound
----------------------------------------
statEnv |- [DATATYPE($VarName)]Expr' = [$_VarName_Node/@datatype]Expr
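For illustration, the six functions can be sketched in Python over a simplified model of a binding. The model is an assumption: a bound variable is a dictionary with a "type" key holding "uri", "bnode" or "literal" (the element names of the SPARQL XML result format) plus optional "lang" and "datatype" keys, and an unbound variable is None. The function names are hypothetical, and the type tests follow the usual SPARQL reading of isIRI, isBLANK and isLITERAL.

```python
from typing import Dict, Optional

Node = Optional[Dict[str, str]]  # None models an unbound variable


def fn_bound(node: Node) -> bool:
    # Mirrors the BOUND rule: false exactly when the node is empty.
    return node is not None


def is_iri(node: Node) -> bool:
    return node is not None and node.get("type") == "uri"


def is_blank(node: Node) -> bool:
    return node is not None and node.get("type") == "bnode"


def is_literal(node: Node) -> bool:
    return node is not None and node.get("type") == "literal"


def fn_lang(node: Node) -> str:
    # Empty string when there is no language tag, like fn:string on ().
    return "" if node is None else node.get("lang", "")


def fn_datatype(node: Node) -> str:
    return "" if node is None else node.get("datatype", "")


name = {"type": "literal", "value": "Alice", "lang": "en"}
person = {"type": "uri", "value": "http://example.org/alice"}

print(fn_bound(name), is_literal(name), fn_lang(name))  # True True en
print(is_iri(person), is_blank(person))                 # True False
print(fn_bound(None), fn_lang(None))
```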
XSPARQL syntactically subsumes XQuery and - taking into account the shortcut notations defined in Section 4 of [XSPARQLLANGUAGE] - SPARQL construct queries. Concerning semantics, XSPARQL equally builds on top of its constituent languages. As shown above, we have extended the formal semantics of XQuery from [XQUERYSEMANTICS] by additional reduction rules which reduce each XSPARQL query to XQuery expressions which operate on results of SPARQL queries in SPARQL's XML result format [SPARQLPROTOCOL].
Since we add only new reduction rules for SPARQL-like heads and bodies, it is easy to see that each native XQuery is treated in a semantically equivalent way in XSPARQL. The only rules affecting native XQuery queries are the "decoration" rules, which, however, do not affect the final query result.
Proposition 1. Each pragma-free XQuery is an XSPARQL query and has the same result under both XQuery and XSPARQL semantics.
Proof (Sketch). As is easily seen, given an XSPARQL query falling in the XQuery fragment, the result of our normalization is again an XQuery. Note, however, that even in this fragment our additional rewriting rules do change the original query in some cases. More concretely, our "decoration" rule decorates each for loop that has no position variable (i.e., no at clause) with a new position variable. As these new position variables start with an underscore, they cannot occur in the original query, so this rewriting does not interfere with the semantics of the original query. The only rewriting rules which use the newly created position variables are those for rewriting blank nodes in construct parts, i.e., the [·]BNodeSubst rule. However, this rule only applies to XSPARQL queries which fall outside the native XQuery fragment.
A similar correspondence holds for native SPARQL queries. Let us now sketch the proof showing the equivalence of XSPARQL's semantics and the evaluation of rewritten SPARQL queries into native XQuery. Intuitively, we "inherit" the SPARQL semantics from the fs:sparql "oracle".
Let Ω denote a solution sequence of an abstract SPARQL query q = (E, DS, R), where E is a SPARQL algebra expression, DS is an RDF Dataset, and R is a set of variables called the query form (cf. [SPARQL]). Then, by SPARQLResult(Ω) we denote the SPARQL result XML format representation of Ω.
We are now ready to state some properties about our transformations. The following proposition states that any SPARQL select query can be equivalently viewed as an XSPARQL F'DWMR query.
Proposition 2. Let q = (EWM,DS,$x1,...,$xn) be a SPARQL query of the form select $x1,...,$xn DWM, where we denote by DS the RDF dataset (cf. [SPARQL]) corresponding to the DatasetClause (D), by G the respective default graph of DS, and by EWM the SPARQL algebra expression corresponding to WM and P be the pattern defined in the where part (W). If eval(DS(G), q) = Ω1, and
Then, Ω1 ≡ Ω2 modulo representation. Here, by equivalence (≡) modulo representation we mean that both Ω1 and Ω2 represent the same sequences of (partial) variable bindings.
Proof (Sketch). By the rule
[ for $x1 ... $xn from D(G) where P return ($x1, ..., $xn) ]Expr'
==
[ let $aux_queryresult := [·]SparqlQuery ... for ... [·]SparqlResult ... return ($x1, ..., $xn) ]Expr'
[·]SparqlQuery builds q as a string without replacing any variable, since all variables in P are free. The resulting string is then passed to fs:sparql, which - since q is unchanged - by definition returns exactly SPARQLResult(Ω1). Thus the return part return ($x1, ..., $xn), which extracts Ω2, yields just a representational variant of Ω1.
By similar arguments, we can see that SPARQL's construct queries are treated semantically equivalently in XSPARQL and in SPARQL, taking into account the shortcut notations defined in Section 4 of [XSPARQLLANGUAGE]. The idea here is that the rewriting rules for constructs from Section 2.2 extract exactly the triples from the solution sequence of the body, as defined in the SPARQL semantics [SPARQL].