Summary of RDF query tests work, February-May 2003

URI: http://www.w3.org/2003/03/rdfqr-tests/summary.html
Authors and Contributors: Libby Miller, Dan Brickley, #rdfig IRC participants (see Acknowledgements section)
Abstract: A brief summary of work carried out in 2003 to collect and run RDF query test cases with different implementations. In conjunction with two other documents describing the result set [1] and manifest [2] formats, we hope the work might be useful for the W3C RDF Data Access Working Group [3], which has just begun as this is written (March 2004).
Status: This is a discussion document: comments to www-rdf-rules@w3.org please (archived). It will likely be superseded by work in the W3C Data Access Working Group [3].

1. Introduction
2. Query syntax and model
Example
3. Manifest format
Example
4. Result Set format
Example
5. Test cases
6. Problems and issues
7. Acknowledgements
8. References and Resources

1. Introduction

This document summarises work carried out in February-May 2003 to gather together tests that could be used to check if different implementations of simple conjunctive RDF query languages produced the same results.

The result is a collection of tests using

a query language syntax (N-Triples [4], with bNode labels treated specially)
a manifest format (documentation [2])
a result set format (documentation [1])

The testcases [5] are available from an RDF Query (and Rule) Testcase Workspace [6].

The aim was to get something that could be implemented quickly by four developers with four implementations of two simple RDF query languages. The work was driven by the pragmatic need to find out if the RDF query implementations were producing the same results for the same query. This was a first step towards interoperability between the different systems, and a way to tease out the assumptions behind the implementations.

2. Query syntax and model

The query language syntax is just N-Triples; the query model is an RDF graph with a question mark after it - an RDF graph reinterpreted as a question [7]. BNodes are treated as variables.

N-Triples was chosen as a syntax because it was already specified for other purposes, was simple, and was verbose enough to avoid confusion with an end-user query language proposal.

There are various disadvantages to using this syntax (some are documented here [8]). The most obvious is that N-Triples does not have 'bArcs' (blank arcs), which restricts the types of queries that can be written down in N-Triples as strictly interpreted. Pragmatically, however, the N-Triples parsers were very easy to alter, and so for our limited purpose the N-Triples syntax was sufficient.

Example

Get the variables per, img and mb, where per is something with a foaf:depiction of img, and per also has a foaf:mbox, mb.

 _:per <http://xmlns.com/foaf/0.1/depiction> _:img .
 _:per <http://xmlns.com/foaf/0.1/mbox> _:mb .

As a graph:

[Figure: a graph showing the query _:per foaf:depiction _:img and _:per foaf:mbox _:mb]
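
The query model described in this section (an RDF graph read as a question, with bNodes acting as variables) can be illustrated with a minimal Python sketch. It is not taken from any of the implementations discussed here: triples are plain tuples of strings with the N-Triples angle brackets omitted, the function names are invented for illustration, and the source data is hypothetical.

```python
# Minimal sketch: evaluate a conjunctive query whose bNodes act as variables.
# Terms are plain strings; a query term starting with "_:" is a variable.

def is_var(term):
    return term.startswith("_:")

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return new

def evaluate(query, data):
    """Return every binding that satisfies all query patterns against `data`."""
    bindings = [{}]
    for pattern in query:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings

FOAF = "http://xmlns.com/foaf/0.1/"

# The example query from this section.
query = [
    ("_:per", FOAF + "depiction", "_:img"),
    ("_:per", FOAF + "mbox", "_:mb"),
]

# Hypothetical source data, invented for illustration.
data = [
    ("http://example.org/people#a", FOAF + "depiction", "http://example.org/a.jpg"),
    ("http://example.org/people#a", FOAF + "mbox", "mailto:a@example.org"),
    ("http://example.org/people#b", FOAF + "mbox", "mailto:b@example.org"),
]

results = evaluate(query, data)
```

Only the first person has both a depiction and a mailbox, so only that person produces a binding for all three variables. A real implementation would also need to decide which of the bound variables are returned.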

3. Manifest format

The manifest format is documented in more detail in a document by Alberto Reggiori [2].

The manifest is a list of tests with a number, a human-readable description, and links to input and output documents.

input documents are the source RDF files to be queried
output documents are the expected result sets, also expressed in RDF
query documents specify the query file to use

The documents have a syntax specified in the manifest: N-Triples, N3 or RDF/XML. Any document can be in any format. Multiple input documents (source files) are allowed, but only one query and one result file.

In practice all these links were to local files, anticipating that the tests would be downloaded as a bundle rather than retrieved from online sources as required.

In later versions, the variable names and the number of rows in the result set were added. The variable names matter most: they specify which of the bNode values used in the query are to be returned and which are not. This can substantially improve processing time and reduce the number of rows returned. The number of rows can be used as a preliminary check that the results are roughly right, without doing a full graph comparison of the results.

Example

A manifest showing a single test with one input (source) document, one query and one output (result) document. The test is number 1; 4 rows are expected in the result set.

<rdf:RDF
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:mf='http://www.w3.org/2003/03/rdfqr-tests/manifest.rdfs#'
    xmlns:tq='http://www.w3.org/2003/03/rdfqr-tests/query.rdfs#' >

  <rdf:Description>
     <mf:tests rdf:parseType='Collection'>
        <mf:Test mf:num='1'>
            <mf:name>test1</mf:name>
            <mf:description>simple test</mf:description>
            <mf:status>true</mf:status>
            <mf:input rdf:parseType='Resource'>
                <tq:queryDocument rdf:resource='file:queries/nt/q1.nt' 
                        rdf:type='http://www.w3.org/2003/03/rdfqr-tests/query.rdfs#NT-Document'/>
                <tq:inputDocument rdf:resource='file:inputs/000068.rdf' 
                        rdf:type='http://www.w3.org/2003/03/rdfqr-tests/query.rdfs#RDF-XML-Document'/>
            </mf:input>
            <mf:output rdf:parseType='Resource'>
                <tq:outputDocument rdf:resource='file:results/rs1.rdf'
                        rdf:type='http://www.w3.org/2003/03/rdfqr-tests/query.rdfs#RDF-XML-Document'
                tq:numberRows='4'/>
            </mf:output>
        </mf:Test>
    </mf:tests>
  </rdf:Description>
</rdf:RDF>
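
As a rough illustration of how a test harness might use such a manifest, the following Python sketch reads the test number, document links and expected row count, then applies the row count as a preliminary check. It is a sketch under stated assumptions: it treats the manifest as plain XML using the standard library rather than parsing it as RDF, so it only works for manifests laid out like the example above. The function names `read_tests` and `row_count_ok` are hypothetical, and the embedded manifest is a trimmed copy of the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MF = "http://www.w3.org/2003/03/rdfqr-tests/manifest.rdfs#"
TQ = "http://www.w3.org/2003/03/rdfqr-tests/query.rdfs#"

# Trimmed copy of the example manifest (type attributes omitted).
MANIFEST = f"""<rdf:RDF xmlns:rdf='{RDF}' xmlns:mf='{MF}' xmlns:tq='{TQ}'>
  <rdf:Description>
    <mf:tests rdf:parseType='Collection'>
      <mf:Test mf:num='1'>
        <mf:name>test1</mf:name>
        <mf:input rdf:parseType='Resource'>
          <tq:queryDocument rdf:resource='file:queries/nt/q1.nt'/>
          <tq:inputDocument rdf:resource='file:inputs/000068.rdf'/>
        </mf:input>
        <mf:output rdf:parseType='Resource'>
          <tq:outputDocument rdf:resource='file:results/rs1.rdf' tq:numberRows='4'/>
        </mf:output>
      </mf:Test>
    </mf:tests>
  </rdf:Description>
</rdf:RDF>"""

def q(namespace, local):
    """Build the '{namespace}local' form that ElementTree uses for names."""
    return f"{{{namespace}}}{local}"

def read_tests(manifest_xml):
    """Extract one dictionary per mf:Test element (XML-level read, not RDF)."""
    root = ET.fromstring(manifest_xml)
    tests = []
    for test in root.iter(q(MF, "Test")):
        query_doc = test.find(q(MF, "input") + "/" + q(TQ, "queryDocument"))
        input_docs = test.findall(q(MF, "input") + "/" + q(TQ, "inputDocument"))
        output_doc = test.find(q(MF, "output") + "/" + q(TQ, "outputDocument"))
        tests.append({
            "num": test.get(q(MF, "num")),
            "name": test.findtext(q(MF, "name")),
            "query": query_doc.get(q(RDF, "resource")),
            "inputs": [doc.get(q(RDF, "resource")) for doc in input_docs],
            "output": output_doc.get(q(RDF, "resource")),
            "expected_rows": int(output_doc.get(q(TQ, "numberRows"))),
        })
    return tests

def row_count_ok(test, actual_rows):
    """Preliminary check: does the number of rows match the manifest?"""
    return len(actual_rows) == test["expected_rows"]

tests = read_tests(MANIFEST)
```

A passing row count does not show that the results are correct; it only rules out results that are obviously wrong before a full graph comparison is attempted.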

4. Result Set format

This is described in more detail in a document by Andy Seaborne [1].

The result set format is basically a table described in RDF. The aim was that a JDBC result set (for example) could be quickly transformed into the RDF format, so that the results could be checked against an exemplar result set also expressed in RDF, using a graph comparison tool, such as Jena's rdfcompare [9], or Cwm [10].

Datatypes, resources and literals are described as you would expect; bNode representations can be tricky for graph comparisons.

Example

<rdf:RDF 
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' 
    xmlns:rs='http://jena.hpl.hp.com/2003/03/result-set#'> 
   <rs:ResultSet rdf:about=''> 
        <rs:resultVariable>img</rs:resultVariable>
        <rs:resultVariable>mb</rs:resultVariable>
        <rs:size rdf:datatype='http://www.w3.org/2000/10/XMLSchema#integer'>2</rs:size>
        <rs:solution>
           <rs:ResultSolution>
                <rs:binding rdf:parseType='Resource'>
                   <rs:variable>img</rs:variable>
                   <rs:value rdf:resource='http://swordfish.rdfweb.org/photos/2001/04/23/000068.JPG'/>
                </rs:binding>
                <rs:binding rdf:parseType='Resource'>
                   <rs:variable>mb</rs:variable>
                   <rs:value rdf:resource='mailto:sean@example.com' />
                </rs:binding>
           </rs:ResultSolution>
        </rs:solution>

        <rs:solution>
           <rs:ResultSolution>
                <rs:binding rdf:parseType='Resource'>
                   <rs:variable>img</rs:variable>
                   <rs:value rdf:resource='http://swordfish.rdfweb.org/photos/2001/04/23/000068.JPG'/>
                </rs:binding>
                <rs:binding rdf:parseType='Resource'>
                   <rs:variable>mb</rs:variable>
                   <rs:value rdf:resource='mailto:geez@example.com' />
                </rs:binding>
           </rs:ResultSolution>
        </rs:solution>
</rs:ResultSet>
</rdf:RDF>
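
The tabular nature of the format can be illustrated with a small Python sketch that reads two result sets back into rows and compares them regardless of row order. This is not a substitute for the graph comparison tools mentioned above: it reads the RDF/XML as plain XML with the standard library, it assumes the layout shown in the example, and it does nothing special for bNode values. Reading any value without an rdf:resource attribute as literal text is an assumption, and the helper names (`read_rows`, `solution`, `document`) are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RS = "http://jena.hpl.hp.com/2003/03/result-set#"

def read_rows(result_set_xml):
    """Return the solutions as a multiset of rows, ignoring row order."""
    root = ET.fromstring(result_set_xml)
    rows = Counter()
    for solution_el in root.iter(f"{{{RS}}}ResultSolution"):
        row = set()
        for binding in solution_el.findall(f"{{{RS}}}binding"):
            variable = binding.findtext(f"{{{RS}}}variable")
            value_el = binding.find(f"{{{RS}}}value")
            # Assumption: URIs appear as rdf:resource; anything else is
            # read as literal text.
            value = value_el.get(f"{{{RDF}}}resource", value_el.text)
            row.add((variable, value))
        rows[frozenset(row)] += 1
    return rows

def solution(img, mb):
    """Illustration helper: one rs:solution in the layout shown above."""
    return (
        "<rs:solution><rs:ResultSolution>"
        "<rs:binding rdf:parseType='Resource'><rs:variable>img</rs:variable>"
        f"<rs:value rdf:resource='{img}'/></rs:binding>"
        "<rs:binding rdf:parseType='Resource'><rs:variable>mb</rs:variable>"
        f"<rs:value rdf:resource='{mb}'/></rs:binding>"
        "</rs:ResultSolution></rs:solution>"
    )

def document(*solutions):
    """Illustration helper: wrap solutions in rdf:RDF and rs:ResultSet."""
    return (
        f"<rdf:RDF xmlns:rdf='{RDF}' xmlns:rs='{RS}'><rs:ResultSet rdf:about=''>"
        + "".join(solutions)
        + "</rs:ResultSet></rdf:RDF>"
    )

IMG = "http://swordfish.rdfweb.org/photos/2001/04/23/000068.JPG"
first = solution(IMG, "mailto:sean@example.com")
second = solution(IMG, "mailto:geez@example.com")

expected = read_rows(document(first, second))
actual = read_rows(document(second, first))  # same rows, different order
same = expected == actual
```

Using a multiset rather than a set keeps duplicate rows distinct, so a result with a repeated row is not treated as equal to one without it.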

5. Test cases

These are available here on this site@@ under the W3C license [11].

They are in three sections: RDFStore donated by Alberto Reggiori of @Semantics, RDQL donated by Andy Seaborne from HP Labs, and Squish donated by Libby Miller, University of Bristol. Jena and RDFStore [12] both use the RDQL [13] RDF query language; Squish [14] is similar to RDQL but not identical (both are based on the earlier rdfDB QL by R.V. Guha [15]). All have conversions to N-Triples format.

6. Problems and issues

This was a simple, pragmatic solution to the practical problem of testing RDF query with several implementations and query languages. Many issues were never addressed, or were briefly discussed and postponed. Examples are

the relationship between this work and RDF Core testcases and W3C QA work
bNodes in result sets: graph comparison won't work unless nodeID is used
representing more complex query languages as N-Triples: for example, how could optionals be expressed, and what about substring matching and similar features?
issues about altering of N-Triples semantics by using graphs as queries
a possible vocabulary for describing more complex characteristics of tests, for example, simple conjunctive, substring match, greater than, less than, datatypes

7. Acknowledgements

This work was a collaborative effort amongst members of the RDF (now 'Semantic Web') Interest Group[16], conducted largely through IRC chat sessions in #rdfig. Details of the meetings are available at RDF Query (and Rule) Testcase Repository. Many people contributed, with tests and documents or with their own implementation and process experience. Here is a non-exhaustive list of people who contributed on IRC or by mailing list.

Andy Seaborne
Alberto Reggiori
Dan Brickley
Jeen Broekstra
Jos De Roo
Tim Berners-Lee
Jan Grant
Eric Miller
Matt Biddulph
Lars Marius Garshol
Eric Prud'hommeaux
Arjohn Kampman
Daniel Krech
Dan Connolly
Libby Miller

Libby Miller and Dan Brickley's contribution to this work was funded through the SWAD-Europe project [17].

8. References and Resources

Selected resources

RDF Query and Rules Status: Query and rule language survey - maintained by Eric Prud'hommeaux
QL98: W3C Workshop on Query Languages, which yielded influential papers such as
Enabling Inference: R.V. Guha, Ora Lassila, Eric Miller, Dan Brickley
Query use cases and examples: Alberto Reggiori and Andy Seaborne; a great place to get use cases and look at some of the commonality and diversity of RDF query languages
SWAD-Europe: Databases, Query, API, Interfaces report on Query languages: Libby Miller, Jan Grant; An FAQ document for RDF query
[1] Recording Query Results: Andy Seaborne
[2] RDF Query Test Cases manifest Format: Alberto Reggiori
[3] W3C RDF Data Access Working Group
[4] N-Triples: part of RDF Test Cases, by Jan Grant and Dave Beckett
[5] Some RDF Query test cases
[6] RDF Query (and Rule) Testcases Workspace
[7] What is an RDF Query? (email to www-rdf-rules): Pat Hayes
[8] SWAD-Europe FAQ Entry - Why not use an RDF graph with blanks for querying RDF?: Dave Reynolds
[9] rdfcompare, a graph comparison tool in Jena, HP's Java RDF library
[10] Cwm - a general-purpose data processor for the semantic web (written in Python)
[11] W3C SOFTWARE NOTICE AND LICENSE: Joseph Reagle
[12] RDFStore, a Perl API for RDF Storage
[13] RDQL - RDF Data Query Language
[14] Squish, an SQL-ish query language for RDF
[15] rdfDB : An RDF Database
[16] Semantic Web Interest Group
[17] Semantic Web Advanced Development in Europe

Libby Miller : libby.miller@bristol.ac.uk, Dan Brickley : danbri@w3.org
$Id: summary.html,v 1.5 2004/03/17 15:56:04 lmiller Exp $
