This document describes an RDF vocabulary for recording the results of queries. It was written in support of the ad hoc work on "RDF Query (and Rule) Testcase" and is also related to Work Package 7 of SWAD-e.
This section outlines the range of query languages we hope to cover. We also discuss the forms of query results and identify the most basic form.
The languages of interest are those that are based on conjunctive triple patterns. This class of query language is the focus because it covers a significant number of the many query languages under development for the semantic web. At the RDF level they do have one significant element in common: the first stage can be expressed as a matching of a graph pattern [2] against the graph. A survey "RDF Query and Rules Status" has been compiled by Eric Prud'hommeaux.
This graph pattern is an RDF graph with variables for some arcs or nodes (resources, bNodes or literals). Some queries also include "bArcs" (blank arcs), which takes the pattern outside pure RDF.
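As an illustration of matching a graph pattern against a graph, here is a minimal sketch in Python. It is not taken from any particular query engine: the tuple representation of triples, the "?" prefix for variables, and the names `match_triple` and `match_pattern` are all assumptions made for the example.

```python
# Triples are (subject, predicate, object) tuples.
# Assumption for this sketch: strings starting with "?" are variables.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result

def match_pattern(patterns, graph):
    """Return every binding under which all triple patterns match statements in `graph`."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            for extended in [match_triple(pattern, triple, binding)]
            if extended is not None
        ]
    return solutions

graph = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:b", "ex:name", "Bee"),
}
pattern = [("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")]
solutions = match_pattern(pattern, graph)
```

Because a variable may appear in any position, the same sketch also covers patterns with a variable in the arc (predicate) position.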
Many of these query systems have significant other features, such as numerical constraints on the values allowed to match the pattern part of the query. We do not consider these features further here: they lie outside the RDF processing, even though they are very important for the practical use of a query processor.
For the class of query languages we are considering, results can be returned in a number of forms: as a set of variable bindings, as a sequence of matching subgraphs, or as a single subgraph.
A graph pattern may match the target graph in a number of ways. The first two forms of results take this into account; the third just merges the different ways into a single result.
The bound variable approach and the multiple subgraph approach are equivalent in terms of the RDF information returned, but they are not the same. Given the details of the bound variables, an application can recreate the multiple subgraphs by substituting the values of the variables into the graph pattern, so the two forms record the same RDF information.
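The substitution step can be sketched as follows. This is only an illustration: the tuple representation, the "?" variable prefix and the function name `substitute` are assumptions, not part of the vocabulary.

```python
def substitute(patterns, binding):
    """Instantiate a graph pattern with one solution's bindings, giving a matching subgraph."""
    return frozenset(
        # Terms that are not bound variables (constants) are kept unchanged.
        tuple(binding.get(term, term) for term in pattern)
        for pattern in patterns
    )

pattern = [("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")]
bindings = [
    {"?x": "ex:a", "?y": "ex:b", "?n": "Bee"},
    {"?x": "ex:c", "?y": "ex:d", "?n": "Dee"},
]

# One subgraph per solution, in the same order as the solutions.
subgraphs = [substitute(pattern, b) for b in bindings]
```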
In the opposite direction, each matching subgraph can match the pattern in one or more ways, yielding some ambiguity. An alternative way of thinking about this is that the same subgraph can occur more than once in the sequence of matches: there are as many repeats of this identical subgraph as there are ways to match the graph pattern. Note also that a matching subgraph may have fewer statements than the graph pattern has triple patterns, because an RDF model is a set of statements and, within a single solution, one statement may match more than one triple pattern of the query.
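Both effects can be seen in a small sketch. The patterns, solutions and the helper `substitute` below are hypothetical and chosen only to make the two cases visible.

```python
def substitute(patterns, binding):
    """Instantiate a graph pattern with one solution's bindings."""
    return frozenset(tuple(binding.get(t, t) for t in p) for p in patterns)

# Case 1: two different solutions that yield the same subgraph.
symmetric = [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?x")]
sol_1 = {"?x": "ex:a", "?y": "ex:b"}
sol_2 = {"?x": "ex:b", "?y": "ex:a"}
same_subgraph = substitute(symmetric, sol_1) == substitute(symmetric, sol_2)

# Case 2: one statement satisfying two triple patterns within a single solution,
# so the subgraph has fewer statements than the pattern has triple patterns.
doubled = [("?x", "ex:knows", "?y"), ("?p", "ex:knows", "?q")]
sol_3 = {"?x": "ex:a", "?y": "ex:b", "?p": "ex:a", "?q": "ex:b"}
collapsed = substitute(doubled, sol_3)
```

In the first case the subgraph alone cannot tell us which of the two bindings produced it; in the second, the set semantics collapse two pattern matches into one statement.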
A fourth possibility is described in "Possible RDF query work". The proposal is for a results template, which unifies query with a subset of rules [6]. The vocabulary described below is one special case of this where the template is fixed to describe variable bindings.
In modelling results in a single RDF graph, we may wish to capture each of the three ways of returning results above.
The last case, the single subgraph, is trivial: it is an RDF graph already.
The second, a sequence of subgraphs, would need reification. We cannot just put all the subgraphs into one RDF graph because that loses the separate identity of each subgraph. (An alternative way to think of it is that RDF statements are independently true: putting everything into one graph makes statements from one subgraph "true" at the same time as the results in the other subgraphs are "true".) Reification is needed to embed a subgraph in the result record without making the statements in the subgraph assertions in the result record.
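A small sketch of why merging loses the subgraph boundaries: two different sequences of subgraphs can merge to exactly the same graph. The statements and the function name `merge` are made up for the illustration.

```python
s1 = ("ex:a", "ex:knows", "ex:b")
s2 = ("ex:b", "ex:knows", "ex:c")
s3 = ("ex:c", "ex:knows", "ex:d")

# Two different sequences of matching subgraphs over the same statements.
sequence_1 = [frozenset({s1, s2}), frozenset({s3})]
sequence_2 = [frozenset({s1}), frozenset({s2, s3})]

def merge(subgraphs):
    """Union every subgraph into a single graph (a set of statements)."""
    return frozenset().union(*subgraphs)

merged_1 = merge(sequence_1)
merged_2 = merge(sequence_2)
```

Here `merged_1` and `merged_2` are equal even though the sequences differ, so the merged graph alone cannot be split back into the original subgraphs.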
The set of variable bindings records all the match information and can be used to recreate the two graph forms. Therefore, we record the ways the variables can be bound to values.
This vocabulary models a table of results as a number of solutions to the query. Each solution is a number of variable/value pair bindings. Variables are recorded as their name; values can be any RDF object (URI, bNode, literal).
The structure is that some resource represents the result set. A result set consists of a number of solutions, and each solution is a number of associations of variable names with their values for that solution. A value can be a resource, bNode or literal. The vocabulary defines a unique resource "undefined" to record when a variable is not bound; some query languages, such as DQL, provide optional binding variables.
An alternative conceptualization is that a result set is a table whose columns are the variables and whose rows are the various ways a set of values can satisfy the query.
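The table view can be sketched in a few lines. This is an illustration only: the string `"rs:undefined"` is an assumed stand-in for the vocabulary's "undefined" resource, and `to_table` is a hypothetical helper.

```python
# Assumption: a placeholder for the vocabulary's "undefined" resource.
UNDEFINED = "rs:undefined"

def to_table(variables, solutions):
    """Lay solutions out as rows, one column per variable, filling gaps with UNDEFINED."""
    return [[solution.get(v, UNDEFINED) for v in variables] for solution in solutions]

variables = ["x", "y"]
solutions = [
    {"x": "ex:a", "y": "ex:b"},
    {"x": "ex:c"},  # y is an optional variable and is not bound in this solution
]
table = to_table(variables, solutions)
```

Each row has one cell per variable, so an unbound optional variable still occupies its column.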
The namespace is http://jena.hpl.hp.com/2003/03/result-set#, here using the prefix "rs".
Suppose the results are as in the table below:
x | y |
---|---|
"123"^^xsd:integer | <http://example.com/resource1> |
"2003-01-21" | <http://example.com/resource2> |
"anon1" | _:a |
"anon2" | _:a |
then we can encode these results in RDF/XML or in N3. '_:a' represents a bNode that occurs in two solutions.
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:rs='http://jena.hpl.hp.com/2003/03/result-set#'>
  <rs:ResultSet rdf:about=''>
    <rs:resultVariable>x</rs:resultVariable>
    <rs:resultVariable>y</rs:resultVariable>
    <rs:size rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>4</rs:size>
    <rs:solution>
      <rs:ResultSolution>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>x</rs:variable>
          <rs:value rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>123</rs:value>
        </rs:binding>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>y</rs:variable>
          <rs:value rdf:resource='http://example.com/resource1'/>
        </rs:binding>
      </rs:ResultSolution>
    </rs:solution>
    <rs:solution>
      <rs:ResultSolution>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>x</rs:variable>
          <rs:value>2003-01-21</rs:value>
        </rs:binding>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>y</rs:variable>
          <rs:value rdf:resource='http://example.com/resource2'/>
        </rs:binding>
      </rs:ResultSolution>
    </rs:solution>
    <rs:solution>
      <rs:ResultSolution>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>x</rs:variable>
          <rs:value>anon1</rs:value>
        </rs:binding>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>y</rs:variable>
          <rs:value rdf:nodeID='a'/>
        </rs:binding>
      </rs:ResultSolution>
    </rs:solution>
    <rs:solution>
      <rs:ResultSolution>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>x</rs:variable>
          <rs:value>anon2</rs:value>
        </rs:binding>
        <rs:binding rdf:parseType='Resource'>
          <rs:variable>y</rs:variable>
          <rs:value rdf:nodeID='a'/>
        </rs:binding>
      </rs:ResultSolution>
    </rs:solution>
  </rs:ResultSet>
</rdf:RDF>
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rs:  <http://jena.hpl.hp.com/2003/03/result-set#> .

<> rdf:type rs:ResultSet ;
   rs:size 4 ;
   rs:resultVariable "x" ;
   rs:resultVariable "y" ;
   rs:solution
       [ rdf:type rs:ResultSolution ;
         rs:binding [ rs:variable "x" ; rs:value 123 ] ;
         rs:binding [ rs:variable "y" ; rs:value <http://example.com/resource1> ]
       ] ;
   rs:solution
       [ rdf:type rs:ResultSolution ;
         rs:binding [ rs:variable "x" ; rs:value "2003-01-21" ] ;
         rs:binding [ rs:variable "y" ; rs:value <http://example.com/resource2> ]
       ] ;
   rs:solution
       [ rdf:type rs:ResultSolution ;
         rs:binding [ rs:variable "x" ; rs:value "anon1" ] ;
         rs:binding [ rs:variable "y" ; rs:value _:a ]
       ] ;
   rs:solution
       [ rdf:type rs:ResultSolution ;
         rs:binding [ rs:variable "x" ; rs:value "anon2" ] ;
         rs:binding [ rs:variable "y" ; rs:value _:a ]
       ] ;
   .
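To show how such an encoding could be produced mechanically, here is a minimal sketch that turns a table of results into triples using the property and class names that appear in the examples above. It is not the Jena implementation: the function `encode_result_set`, the generated bNode labels (`_:r0`, `_:r1`, ...) and the use of plain Python values in place of proper RDF terms (URIs written with angle brackets, literals as bare values) are all simplifying assumptions.

```python
from itertools import count

RS = "http://jena.hpl.hp.com/2003/03/result-set#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def encode_result_set(variables, rows, root="<>"):
    """Return (subject, predicate, object) triples describing `rows` as a result set."""
    fresh = count()

    def bnode():
        # Hypothetical labelling scheme for the solution and binding bNodes.
        return f"_:r{next(fresh)}"

    triples = [(root, RDF_TYPE, RS + "ResultSet"), (root, RS + "size", len(rows))]
    triples += [(root, RS + "resultVariable", v) for v in variables]
    for row in rows:
        solution = bnode()
        triples += [
            (root, RS + "solution", solution),
            (solution, RDF_TYPE, RS + "ResultSolution"),
        ]
        for var, value in zip(variables, row):
            binding = bnode()
            triples += [
                (solution, RS + "binding", binding),
                (binding, RS + "variable", var),
                (binding, RS + "value", value),
            ]
    return triples

# The four rows of the example table; "_:a" is shared by two solutions.
rows = [
    (123, "<http://example.com/resource1>"),
    ("2003-01-21", "<http://example.com/resource2>"),
    ("anon1", "_:a"),
    ("anon2", "_:a"),
]
triples = encode_result_set(["x", "y"], rows)
```

A real implementation would use an RDF library's node types so that URIs, bNodes and typed literals stay distinct; the sketch only shows the shape of the graph.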
Andy Seaborne: andy.seaborne@hp.com
$Id: recording-query-results.html,v 1.9 2004/06/03 12:41:37 aseaborne Exp $