Recording Query Results

URI: http://www.w3.org/2003/03/rdfqr-tests/recording-query-results.html
Authors and Contributors: Andy Seaborne, folks on #rdfig
Abstract: This document describes a way to record the results of queries where the queries are from languages that return bound variables. The recording of the results is in RDF, enabling graph comparison to be used for testing whether two sets of query results are equivalent.
Status: This is a discussion document: comments to www-rdf-rules@w3.org please (archived).

1. Introduction
2. Scope
- 2.1 Query Languages
- 2.2 Forms of Query Results
3. Modelling Results in RDF
4. Result Set Vocabulary
5. Example
6. References and Resources

1. Introduction

This document describes an RDF vocabulary for recording the results of queries. It was written in support of the ad hoc work on "RDF Query (and Rule) Testcase" and is also related to Work Package 7 of SWAD-e.

2. Scope

This section outlines the range of query languages we hope to cover. We also discuss the forms of query results and identify the most basic form.

2.1 Query Languages

The languages of interest are those that are based on conjunctive triple patterns. This class of query language is the focus because it covers a significant number of the many query languages under development for the semantic web. At the RDF level they have one significant element in common: the first stage can be expressed as a matching of a graph pattern [2] against the graph. A survey, "RDF Query and Rules Status" [1], has been compiled by Eric Prud'hommeaux.

This graph pattern is an RDF graph with variables for some arcs or nodes (resources, bNodes or literals). Some queries also use "bArcs" (blank arcs), which takes the pattern outside pure RDF.
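As a rough illustration of matching a conjunctive graph pattern, the sketch below binds variables triple pattern by triple pattern. It is not taken from any particular query engine: the `?name` variable syntax, the plain-string terms, the `ex:` names and the function names are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumed convention for this sketch: variables are written "?name".
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable keeps its earlier value; a clash means no match.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def match_pattern(patterns: List[Triple], graph: Set[Triple]) -> List[Binding]:
    """Return every binding under which all triple patterns occur in the graph."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in sorted(graph):  # sorted only to make the order repeatable
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data: "ex:" names are placeholders, not real resources.
graph = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", '"Alice"'),
}
patterns = [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")]
solutions = match_pattern(patterns, graph)
```

Each element of `solutions` is one way the pattern matches the graph, expressed as a variable-to-value mapping.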

Many of these query systems have significant other features, such as numerical constraints on the values allowed to match the pattern part of the query. We do not consider these features further here: although very important for practical use of a query processor, they are outside the RDF processing.

2.2 Forms of Query Results

For the class of query languages we are considering, results can be returned in a number of forms:

- The result table: a set of rows, each row being a set of pairs of variables and their values (i.e. the columns correspond to the variables, the table entries to values).
- A sequence of subgraphs, one for each way the pattern matches the graph.
- A single subgraph that is the merge of all the ways the query can match. This is also the smallest subgraph of the original graph which would yield the same results for the query as the first two forms.

A graph pattern may match the target graph in a number of ways. The first two forms of results take this into account; the third just merges the different ways into a single result.

The bound variable approach and the multiple subgraph approach are therefore equivalent in terms of the RDF information returned, but they are not the same. Given the variable bindings, an application can recreate the multiple subgraphs by substituting the bound values for the variables in the graph pattern, so the bindings record the same RDF information.
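The substitution step can be sketched as follows. This is a minimal illustration under assumed conventions (variables written `?name`, terms as plain strings, hypothetical `ex:` names), not a description of any existing implementation.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def substitute(patterns: List[Triple], binding: Dict[str, str]) -> Set[Triple]:
    """Replace every bound variable in the graph pattern by its value."""
    return {tuple(binding.get(term, term) for term in pattern) for pattern in patterns}


# Hypothetical pattern and result table (one dict per row).
patterns = [("?x", "ex:knows", "?y")]
rows = [
    {"?x": "ex:a", "?y": "ex:b"},
    {"?x": "ex:b", "?y": "ex:c"},
]

# Second form of results: one subgraph per row of the table.
subgraphs = [substitute(patterns, row) for row in rows]

# Third form of results: the merge of all matching subgraphs.
merged = set().union(*subgraphs)
```

Both graph forms are derived from the table alone, which is why recording the bindings is sufficient.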

In the opposite direction, each matching subgraph can match the pattern in one or more ways, yielding some ambiguity. An alternative way of thinking about this is that the same subgraph can occur more than once in the sequence of matches: there are as many repeats of this identical subgraph as there are ways to match the graph pattern. Note also that the matching subgraph may have fewer statements than the number of triple patterns making up the graph pattern, because an RDF model is a set of statements and one statement may be matched by more than one triple pattern in a query within a single solution.
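A small worked case may make both points concrete. The pattern, the two-statement graph and the four solutions below are written out by hand for illustration; the `ex:` names and the `?name` variable syntax are assumptions of this sketch.

```python
from collections import Counter

# Two triple patterns that share the subject variable ?x.
patterns = [("?x", "ex:p", "?y"), ("?x", "ex:p", "?z")]

# The four ways this pattern matches a graph holding only
# (ex:s ex:p ex:o1) and (ex:s ex:p ex:o2), listed by hand.
solutions = [
    {"?x": "ex:s", "?y": "ex:o1", "?z": "ex:o1"},
    {"?x": "ex:s", "?y": "ex:o1", "?z": "ex:o2"},
    {"?x": "ex:s", "?y": "ex:o2", "?z": "ex:o1"},
    {"?x": "ex:s", "?y": "ex:o2", "?z": "ex:o2"},
]


def substitute(patterns, binding):
    """Instantiate the pattern; a frozenset models a graph as a set of statements."""
    return frozenset(tuple(binding.get(t, t) for t in pattern) for pattern in patterns)


subgraphs = [substitute(patterns, solution) for solution in solutions]

# How many times each distinct subgraph occurs in the sequence of matches.
occurrences = Counter(subgraphs)
both = frozenset({("ex:s", "ex:p", "ex:o1"), ("ex:s", "ex:p", "ex:o2")})
```

Here `occurrences[both]` counts the repeats of the two-statement subgraph, and the first and last solutions each yield a one-statement subgraph even though the pattern has two triple patterns.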

A fourth possibility is described in "Possible RDF query work". The proposal is for a results template, which unifies query with a subset of rules [6]. The vocabulary described below is one special case of this where the template is fixed to describe variable bindings.

3. Modelling Results in RDF

In modelling results in a single RDF graph, we may wish to capture each of the three ways of returning results above.

The last case, the single subgraph, is trivial: it is an RDF graph already.

The second, a sequence of subgraphs, would need reification: we can't just put all the subgraphs into one RDF graph because that loses the separate subgraph nature (an alternative way to think of it is that RDF statements are independently true: putting everything into one graph makes statements from one subgraph "true" at the same time as the results in the other subgraphs are "true"). Reification is needed to embed a subgraph in the result record without making the statements in the subgraph assertions in the result record.

The set of variable bindings records all the match information and can be used to recreate the two graph forms. Therefore, we record the ways the variables can be bound to values.

4. Result Set Vocabulary

This vocabulary models a table of results as a number of solutions to the query. Each solution is a number of variable/value pair bindings. Variables are recorded as their name; values can be any RDF object (URI, bNode, literal).

The structure is that there is some resource that represents the result set. Such a result set consists of a number of solutions, and each solution is a number of associations of variable names with the value for this solution. The value can be a resource, bNode or literal. The vocabulary defines a unique resource "undefined" to record when a variable is not bound; some query languages, such as DQL, provide optionally bound variables.

An alternative conceptualization is that a result set is a table whose columns are the variables and whose rows are the various ways a set of values can satisfy the query.

Vocabulary in N3 – (local copy)
Vocabulary in RDF/XML – (local copy)

The namespace is http://jena.hpl.hp.com/2003/03/result-set#, here using the prefix "rs"

Classes

rs:ResultSet: The class of the whole solution set
rs:ResultSolution: The class of things that are one possible solution.
rs:Binding: The class of a single association of a variable and a value.

Properties

rs:solution: Property relating a result set to one of its solutions.
rs:size: Records the number of results (solutions) in this result set. This is a convenience value and should agree with the number of statements using property rs:solution on this result set.
rs:resultVariable: Names of the variables used. Again, convenience.
rs:binding: Property relating a single solution to a single variable and its value in this solution.
rs:variable: The name of the variable in a binding.
rs:value: The value associated with the variable in this binding.
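
To show how the properties fit together, here is a minimal sketch that turns a table of results into triples and checks the rs:size convenience value. Several things are assumptions made for illustration: terms are plain Python strings, the generated bNode labels and the `urn:example:results` identifier are hypothetical, the IRI `rs:undefined` is only a guess at how the "undefined" resource is named, and treating rs:size as optional is a reading of "convenience value".

```python
from itertools import count

RS = "http://jena.hpl.hp.com/2003/03/result-set#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: the "undefined" resource lives in the rs: namespace under this name.
UNDEFINED = RS + "undefined"


def encode_result_set(result_set, variables, rows):
    """Return (subject, property, object) triples recording a table of results."""
    fresh = count()
    triples = [
        (result_set, RDF_TYPE, RS + "ResultSet"),
        (result_set, RS + "size", len(rows)),
    ]
    triples += [(result_set, RS + "resultVariable", name) for name in variables]
    for row in rows:
        solution = f"_:s{next(fresh)}"  # hypothetical bNode label
        triples.append((result_set, RS + "solution", solution))
        for name in variables:
            binding = f"_:b{next(fresh)}"  # hypothetical bNode label
            triples.append((solution, RS + "binding", binding))
            triples.append((binding, RS + "variable", name))
            # An unbound variable is recorded with the "undefined" resource.
            triples.append((binding, RS + "value", row.get(name, UNDEFINED)))
    return triples


def size_is_consistent(triples, result_set):
    """Check that any rs:size agrees with the number of rs:solution statements."""
    declared = [o for s, p, o in triples if s == result_set and p == RS + "size"]
    actual = sum(1 for s, p, o in triples if s == result_set and p == RS + "solution")
    return all(d == actual for d in declared)


# Hypothetical table: the second row leaves y unbound.
RESULTS = "urn:example:results"
rows = [{"x": '"anon1"', "y": "_:a"}, {"x": '"anon2"'}]
triples = encode_result_set(RESULTS, ["x", "y"], rows)
```

The rdf:type statements for solutions and bindings are left out of this sketch for brevity.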

5. Example

Suppose the results are as in the table below:

x	y
"123"^^xsd:integer	<http://example.com/resource1>
"2003-01-21"	<http://example.com/resource2>
"anon1"	_:a
"anon2"	_:a

then we can encode these results in RDF/XML or in N3. '_:a' represents a bNode that occurs in two solutions.

<rdf:RDF
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:rs='http://jena.hpl.hp.com/2003/03/result-set#'>
    <rs:ResultSet rdf:about=''>
        <rs:resultVariable>x</rs:resultVariable>
        <rs:resultVariable>y</rs:resultVariable>
        <rs:size rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>4</rs:size>

        <rs:solution>
            <rs:ResultSolution>
                <rs:binding rdf:parseType='Resource'>
                    <rs:variable>x</rs:variable>
                    <rs:value rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>123</rs:value>
                </rs:binding>
                <rs:binding rdf:parseType='Resource'>
                    <rs:variable>y</rs:variable>
                    <rs:value rdf:resource='http://example.com/resource1'/>
                </rs:binding>
            </rs:ResultSolution>
        </rs:solution>

        <rs:solution>
            <rs:ResultSolution>
                <rs:binding rdf:parseType='Resource'>
                    <rs:variable>x</rs:variable>
                    <rs:value>2003-01-21</rs:value>
                </rs:binding>
                <rs:binding rdf:parseType='Resource' >
                    <rs:variable>y</rs:variable>
                    <rs:value rdf:resource='http://example.com/resource2'/>
                </rs:binding>
            </rs:ResultSolution>
        </rs:solution>

        <rs:solution>
            <rs:ResultSolution>
                <rs:binding rdf:parseType='Resource'>
                    <rs:variable>x</rs:variable>
                    <rs:value>anon1</rs:value>
                </rs:binding>
                <rs:binding rdf:parseType='Resource' >
                    <rs:variable>y</rs:variable>
                    <rs:value rdf:nodeID='a'/>
                </rs:binding>
            </rs:ResultSolution>
        </rs:solution>

        <rs:solution>
            <rs:ResultSolution>
                <rs:binding rdf:parseType='Resource'>
                    <rs:variable>x</rs:variable>
                    <rs:value>anon2</rs:value>
                </rs:binding>
                <rs:binding rdf:parseType='Resource' >
                    <rs:variable>y</rs:variable>
                    <rs:value rdf:nodeID='a'/>
                </rs:binding>
            </rs:ResultSolution>
        </rs:solution>

    </rs:ResultSet>
</rdf:RDF>

@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rs:     <http://jena.hpl.hp.com/2003/03/result-set#> .


<>  rdf:type rs:ResultSet ;
    rs:size 4 ;
    rs:resultVariable "x" ; rs:resultVariable "y" ;
    rs:solution
        [ rdf:type rs:ResultSolution ;
          rs:binding [ rs:variable "x" ; rs:value 123 ] ;
          rs:binding [ rs:variable "y" ; rs:value <http://example.com/resource1> ]
        ] ;

    rs:solution
        [ rdf:type rs:ResultSolution ;
          rs:binding [ rs:variable "x" ;
                      rs:value "2003-01-21" ] ;
          rs:binding [ rs:variable "y" ;
                      rs:value <http://example.com/resource2> ]
        ] ;

    rs:solution
        [ rdf:type rs:ResultSolution ;
          rs:binding [ rs:variable "x" ;
                      rs:value "anon1" ] ;
          rs:binding [ rs:variable "y" ;
                      rs:value _:a ]
        ] ;

    rs:solution
        [ rdf:type rs:ResultSolution ;
          rs:binding [ rs:variable "x" ;
                      rs:value "anon2" ] ;
          rs:binding [ rs:variable "y" ;
                      rs:value _:a ]
        ] ;
    .
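
Reading such a record back into a table can be sketched as below, together with a simplified equivalence check. This is an illustration only: triples are plain tuples of strings, the `urn:example:results` identifier and all bNode labels are hypothetical, and the comparison treats bNode labels that appear as values (such as `_:a`) literally, which is weaker than a full RDF graph comparison.

```python
from collections import Counter

RS = "http://jena.hpl.hp.com/2003/03/result-set#"


def decode_rows(triples, result_set):
    """Rebuild the result table (one dict per solution) from the triples."""

    def objects(subject, prop):
        return [o for s, p, o in triples if s == subject and p == RS + prop]

    rows = []
    for solution in objects(result_set, "solution"):
        row = {}
        for binding in objects(solution, "binding"):
            (name,) = objects(binding, "variable")  # exactly one name expected
            (value,) = objects(binding, "value")  # exactly one value expected
            row[name] = value
        rows.append(row)
    return rows


def same_table(rows_a, rows_b):
    """Compare two tables as multisets of rows, ignoring row order."""

    def freeze(rows):
        return Counter(frozenset(row.items()) for row in rows)

    return freeze(rows_a) == freeze(rows_b)


# Hand-written triples for the last two rows of the example table.
RESULTS = "urn:example:results"
g1 = [
    (RESULTS, RS + "solution", "_:s1"),
    ("_:s1", RS + "binding", "_:b1"),
    ("_:b1", RS + "variable", "x"),
    ("_:b1", RS + "value", '"anon1"'),
    ("_:s1", RS + "binding", "_:b2"),
    ("_:b2", RS + "variable", "y"),
    ("_:b2", RS + "value", "_:a"),
    (RESULTS, RS + "solution", "_:s2"),
    ("_:s2", RS + "binding", "_:b3"),
    ("_:b3", RS + "variable", "x"),
    ("_:b3", RS + "value", '"anon2"'),
    ("_:s2", RS + "binding", "_:b4"),
    ("_:b4", RS + "variable", "y"),
    ("_:b4", RS + "value", "_:a"),
]

# The same record with the structural bNodes relabelled and the statements reordered.
rename = {"_:s1": "_:u", "_:s2": "_:v", "_:b1": "_:w", "_:b2": "_:x",
          "_:b3": "_:y", "_:b4": "_:z"}
g2 = [tuple(rename.get(term, term) for term in triple) for triple in reversed(g1)]

table1 = decode_rows(g1, RESULTS)
table2 = decode_rows(g2, RESULTS)
equivalent = same_table(table1, table2)
```

Because the bNodes for solutions and bindings disappear in the decoded table, their labels and the order of statements do not affect the comparison.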

6. References and Resources

[1] RDF Query and Rules Status: Query and rule language survey - maintained by Eric Prud'hommeaux
[2] QL98: W3C Workshop on Query Languages
[3] Enabling Inference: R.V. Guha, Ora Lassila, Eric Miller, Dan Brickley
[4] Query use cases and examples: Alberto Reggiori and Andy Seaborne
[5] SWAD-Europe: Databases, Query, API, Interfaces report on Query languages: Libby Miller, Jan Grant
[6] RuleML: A set of DTDs for describing rules - including query as a special case of derivation rules. RuleML DTDs v0.8 are "RDF-friendly" but are XML.

Andy Seaborne : andy.seaborne@hp.com
$Id: recording-query-results.html,v 1.9 2004/06/03 12:41:37 aseaborne Exp $