W3C Member Submission

RDQL - A Query Language for RDF

W3C Member Submission 9 January 2004

This version:
http://www.w3.org/submissions/2004/SUBM-RDQL-20040109/
Latest version:
http://www.w3.org/submissions/RDQL/
Author:
Andy Seaborne, HP Labs Bristol, andy.seaborne@hp.com

Abstract

This document describes RDQL (RDF Data Query Language), which has been implemented in a number of RDF systems for extracting information from RDF graphs. It first gives a brief introduction to the language and then a more formal description of its grammar.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document.

By publishing this document, W3C acknowledges that Hewlett-Packard has made a formal submission to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership; please consult the complete list of acknowledged W3C Member Submissions. See also Submission request and Team Comment.


Table of contents

  1. Introduction
  2. Description
  3. Implementations
  4. Grammar
  5. References and Resources

Introduction

This document describes RDQL (RDF Data Query Language), which has been implemented in a number of RDF systems for extracting information from RDF graphs.

RDQL evolved from several earlier languages and includes ideas described in [6]. See [1] for the original paper about three similar query languages, together with some history and context. See [2] for a comprehensive survey of many RDF query languages (and also rule systems) and [3] for a number of use cases with examples in several languages.

Description

This section is an explanation of the RDQL syntax with examples.  It is not a tutorial (see [4] for the Jena tutorial section on RDQL) but a quick description of the key elements of the query language. The grammar, given later, is definitive.

An RDF [8] model is a graph, often expressed as a set of triples. An RDQL query consists of a graph pattern, expressed as a list of triple patterns. Each triple pattern is composed of named variables and RDF values (URIs and literals). An RDQL query can additionally have a set of constraints on the values of those variables, and a list of the variables required in the answer set.
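As a rough illustration of these three parts, the following hypothetical Python sketch holds a query as plain data. The names Query, is_var, pattern_variables and result_variables are invented for illustration; variables are modelled as strings beginning with "?", and URIs and literals as other strings. The SELECT * case follows the SelectClause production in the grammar, and the instance built at the end is the query of Example 1 below.

```python
from __future__ import annotations
from dataclasses import dataclass, field

Term = str  # hypothetical: "?x" is a variable; any other string is a URI or literal


def is_var(term: Term) -> bool:
    """A term is a variable if it starts with "?" ("?" is not part of the name)."""
    return term.startswith("?")


@dataclass
class Query:
    """Hypothetical container for the parts of an RDQL query."""
    select: list[str]                                      # variable names, or ["*"]
    patterns: list[tuple[Term, Term, Term]]                # graph pattern, one tuple per edge
    constraints: list[str] = field(default_factory=list)   # constraint expressions, as text

    def pattern_variables(self) -> list[str]:
        """Variable names in order of first appearance in the triple patterns."""
        names: list[str] = []
        for triple in self.patterns:
            for term in triple:
                if is_var(term) and term[1:] not in names:
                    names.append(term[1:])
        return names

    def result_variables(self) -> list[str]:
        """Names in the answer set; SELECT * selects every pattern variable."""
        return self.pattern_variables() if self.select == ["*"] else list(self.select)


# Example 1, held as data.
example1 = Query(
    select=["x"],
    patterns=[("?x", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
               "http://example.com/someType")],
)
print(example1.result_variables())   # ['x']
```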

Example 1:

SELECT ?x
WHERE (?x,  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,
                                   <http://example.com/someType>)

This triple pattern matches all statements in the graph that have predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type and object http://example.com/someType. The variable "?x" is bound to the label of the subject resource, and all such values of "x" are returned. (Strictly, "x" is the variable name: "?" introduces a variable but is not part of its name.)
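A single triple pattern can be matched by comparing it term by term with each triple in the graph. The sketch below is hypothetical: it models the graph as a Python set of (subject, predicate, object) string tuples, treats any string starting with "?" as a variable (so literal values are assumed never to start with "?"), and uses invented resource URIs under http://example.com/.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            name = p_term[1:]               # "?" introduces the variable; the name is "x"
            if name in result and result[name] != t_term:
                return None                 # already bound to a different value
            result[name] = t_term
        elif p_term != t_term:
            return None                     # a constant term must match exactly
    return result


graph = {
    ("http://example.com/a", RDF_TYPE, "http://example.com/someType"),
    ("http://example.com/b", RDF_TYPE, "http://example.com/otherType"),
    ("http://example.com/c", RDF_TYPE, "http://example.com/someType"),
}

pattern = ("?x", RDF_TYPE, "http://example.com/someType")   # Example 1
answers = []
for triple in graph:
    binding = match_pattern(pattern, triple, {})
    if binding is not None:
        answers.append(binding["x"])
print(sorted(answers))   # ['http://example.com/a', 'http://example.com/c']
```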

An RDQL query treats an RDF graph purely as data. If the implementation of that graph uses inferencing to present "virtual triples" (that is, triples that appear in the graph but are not among the ground facts), then an RDQL query will include those triples as possible matches for its triple patterns. RDQL makes no distinction between inferred triples and ground triples.
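To illustrate why the query side need not distinguish the two kinds of triple, here is a hypothetical graph view that yields the ground triples followed by triples derived from one rdfs:subClassOf rule applied in a single step. The class InferringGraph, the function subjects_matching and the example.org resources are invented for this sketch; a real reasoner would apply many more rules and compute transitive closure.

```python
from typing import Iterator, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


class InferringGraph:
    """Hypothetical graph view: ground triples plus 'virtual' inferred ones.

    Only one illustrative rule is applied, in a single step:
    (x rdf:type C) and (C rdfs:subClassOf D) give (x rdf:type D).
    """

    def __init__(self, ground: Set[Triple]):
        self.ground = ground

    def triples(self) -> Iterator[Triple]:
        inferred = {
            (s, RDF_TYPE, d)
            for (s, p, c) in self.ground if p == RDF_TYPE
            for (c2, p2, d) in self.ground if p2 == SUBCLASS and c2 == c
        }
        yield from self.ground
        yield from inferred - self.ground   # do not report a ground fact twice


def subjects_matching(graph: InferringGraph, predicate: str, obj: str) -> Set[str]:
    """Match the pattern (?x, predicate, obj) without asking how a triple arose."""
    return {s for (s, p, o) in graph.triples() if p == predicate and o == obj}


g = InferringGraph({
    ("http://example.org/rex", RDF_TYPE, "http://example.org/Dog"),
    ("http://example.org/Dog", SUBCLASS, "http://example.org/Animal"),
})
print(subjects_matching(g, RDF_TYPE, "http://example.org/Animal"))   # {'http://example.org/rex'}
```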

The terms quoted by "<>" are URIrefs. Other RDF values are literals which, following N-Triples syntax [7], consist of a string with an optional language tag (introduced with '@') and an optional datatype URI (introduced with '^^'). URIrefs can also be abbreviated with an XML QName-like form; this is purely syntactic assistance and is translated to the full URIref.

The example query above has just one triple pattern, forming a single edge in the graph pattern. More complicated graph patterns are made by writing all the edges in the query. As in RDF, these are interpreted conjunctively: all of them must match for a result to be added to the result set of the query. A variable that appears in more than one triple pattern links those patterns together, because it must take the same value wherever it occurs.

Example 2:

SELECT ?family , ?given
WHERE  (?vcard  vcard:FN "John Smith")
       (?vcard  vcard:N  ?name)
       (?name   vcard:Family  ?family)
       (?name   vcard:Given  ?given)
USING  vcard FOR <http://www.w3.org/2001/vcard-rdf/3.0#>

This query, based on the vCard vocabulary [9], finds the family name and given name from any vcards with formatted name (FN) "John Smith".  The vCard vocabulary has a structured value for the name, using the vcard:N property to point to another node in the RDF graph.  This node, in turn, has the various name elements as further statements. This intermediate node can be a blank node (an RDF node without a URIref in this RDF graph).
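Conjunctive matching with shared variables can be sketched as a backtracking search: match the first triple pattern, then match the remaining patterns under the bindings obtained so far. This is a hypothetical illustration, not a description of any particular implementation. It assumes the graph is a Python set of string tuples, that the vcard: QNames of Example 2 have already been expanded, that literals are plain strings, and that "_:n1" labels the blank node; the example.org resources are invented.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # setdefault binds a new variable, or returns the value it already has
            if result.setdefault(p_term[1:], t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def solve(patterns: List[Triple], graph: Set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding under which all patterns match (a conjunction)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_triple(first, triple, binding)
        if extended is not None:
            # Variables bound by `first` now constrain the remaining patterns.
            yield from solve(rest, graph, extended)


V = "http://www.w3.org/2001/vcard-rdf/3.0#"
graph = {
    ("http://example.org/jsmith", V + "FN", "John Smith"),
    ("http://example.org/jsmith", V + "N", "_:n1"),     # blank node
    ("_:n1", V + "Family", "Smith"),
    ("_:n1", V + "Given", "John"),
    ("http://example.org/jdoe", V + "FN", "Jane Doe"),
}

# Example 2 with the vcard: QNames expanded.
query = [
    ("?vcard", V + "FN", "John Smith"),
    ("?vcard", V + "N", "?name"),
    ("?name", V + "Family", "?family"),
    ("?name", V + "Given", "?given"),
]
for b in solve(query, graph):
    print(b["family"], b["given"])   # Smith John
```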

We have used the prefix 'vcard' to abbreviate the URIref. Writing the full URIref or the abbreviated form gives the same query, because RDF deals only with full URIrefs.
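Because the abbreviation is purely syntactic, expanding a QName is simple string concatenation of the declared namespace and the local part. A minimal sketch, assuming a Python dictionary built from USING clauses such as the one in Example 2; the names PREFIXES and expand_qname are hypothetical.

```python
from typing import Dict

# Hypothetical prefix table built from USING clauses such as
#   USING vcard FOR <http://www.w3.org/2001/vcard-rdf/3.0#>
PREFIXES: Dict[str, str] = {
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "info": "http://example.org/peopleInfo#",
}


def expand_qname(qname: str, prefixes: Dict[str, str]) -> str:
    """Translate prefix:local into the full URIref by string concatenation."""
    prefix, colon, local = qname.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {qname!r}")
    # RDQL allows an empty local part, so "info:" expands to the namespace itself.
    return prefixes[prefix] + local


full = "http://www.w3.org/2001/vcard-rdf/3.0#FN"
print(expand_qname("vcard:FN", PREFIXES) == full)   # True
```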

We have used a comma to separate the variables in the SELECT clause. Commas within triple patterns, and wherever else lists of items occur, are optional; the application writer can choose whether to use them, for readability and personal style.

Example 3:

SELECT ?resource
WHERE (?resource info:age ?age)
AND ?age >= 24
USING info FOR <http://example.org/peopleInfo#>

In this example, a constraint (introduced by AND) restricts the object value of the matched statements, so only resources with an info:age of 24 or more are returned.
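A constraint acts as a filter on the variable bindings produced by the triple patterns, and when several constraints are given (separated by AND or commas in the ConstraintClause), all of them must hold. The following is a hypothetical sketch: it assumes the ?age values have already been converted from RDF literals to Python integers, represents the expression ?age >= 24 as a Python function, and uses invented example.org resources.

```python
from typing import Callable, Dict, Iterable, List

Binding = Dict[str, object]
Constraint = Callable[[Binding], bool]


def apply_constraints(bindings: Iterable[Binding],
                      constraints: List[Constraint]) -> List[Binding]:
    """Keep a binding only if every constraint holds for it."""
    return [b for b in bindings if all(check(b) for check in constraints)]


# Hypothetical bindings produced by matching (?resource info:age ?age).
candidates = [
    {"resource": "http://example.org/people#alice", "age": 31},
    {"resource": "http://example.org/people#bob", "age": 19},
    {"resource": "http://example.org/people#carol", "age": 24},
]


def age_at_least_24(b: Binding) -> bool:
    return b["age"] >= 24          # stands in for the RDQL expression ?age >= 24


selected = [b["resource"] for b in apply_constraints(candidates, [age_at_least_24])]
print(selected)   # ['http://example.org/people#alice', 'http://example.org/people#carol']
```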

Example 4:

SELECT ?resource
FROM   <http://example.org/someWebPage>
WHERE  (?resource info:age ?age)
AND    ?age >= 24
USING  info FOR <http://example.org/peopleInfo#>

In this example, the source of the data to be queried is supplied.

Where no source is supplied, the execution environment is responsible for associating the query with the RDF graph to be queried. Such mechanisms are outside the scope of this document.

Implementations

RDQL was first released in Jena 1.2.0. At the time of writing, the following systems are known to provide RDQL. There is no formal compliance test, but all these systems implement the triple pattern matching and constraint system closely enough to be described as "RDQL". To the author's knowledge, they are all derived from the original grammar [5].

In addition, RDQL is one of the languages used for remote query by the Joseki RDF Server.

Grammar

This grammar is derived from the Jena implementation of RDQL.

Note: this is a permissive grammar.  It is designed for convenience and includes liberal interpretations of terms from other systems.

Lexical Tokens

QuotedURI  ::=  '<' URI characters (from RFC 2396) '>'
NSPrefix  ::=  NCName as defined in Namespaces in XML 1.1 and XML 1.1
LocalPart  ::=  NCName as defined in Namespaces in XML 1.1 and XML 1.1
SELECT  ::=  'SELECT' Case Insensitive match
FROM  ::=  'FROM'   Case Insensitive match
SOURCE  ::=  'SOURCE' Case Insensitive match
WHERE  ::=  'WHERE' Case Insensitive match
SUCHTHAT  ::=  'AND' Case Insensitive match
PREFIXES  ::=  'USING' Case Insensitive match
FOR  ::=  'FOR' Case Insensitive match
IDENTIFIER  ::=  ( [a-z] | [A-Z] | [0-9] | [-_.] )+
EOF  ::=  End of file
COMMA  ::=  ','
INTEGER_LITERAL   ::=  ([0-9])+
FLOATING_POINT_LITERAL  ::=  ([0-9])*'.'([0-9])+('e'('+'|'-')?([0-9])+)?
STRING_LITERAL1  ::=  '"'UTF-8 characters'"' (with escaped \")
STRING_LITERAL2  ::=  "'"UTF-8 characters"'" (with escaped \')
BOOLEAN_LITERAL  ::=  'true' | 'false'
NULL_LITERAL  ::=  'null'
LPAREN  ::=  '('
RPAREN  ::=  ')'
DOT  ::=  '.'
GT  ::=  '>'
LT  ::=  '<'
BANG  ::=  '!'
TILDE  ::=  '~'
HOOK  ::=  '?'
COLON  ::=  ':'
EQ  ::=  '=='
NEQ  ::=  '!='
LE  ::=  '<='
GE  ::=  '>='
SC_OR  ::=  '||'
SC_AND  ::=  '&&'
STR_EQ  ::= 'EQ' Case Insensitive match
STR_NE  ::= 'NE' Case Insensitive match
PLUS  ::=  '+'
MINUS  ::=  '-'
STAR  ::=  '*'
SLASH  ::=  '/'
REM  ::=  '%'
STR_MATCH  ::=  '=~' | '~~'
STR_NMATCH  ::=  '!~'
DATATYPE  ::=  '^^'
AT  ::=  '@'
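The lexical rules can be tried out with a regular-expression tokenizer. This is a hypothetical sketch, not the Jena lexer: it covers only a subset of the tokens, resolves the ambiguity between LT and the opening '<' of a QuotedURI by assuming a URI contains no whitespace, and reports token kinds using the names that the grammar productions refer to (for example, the keyword AND is reported as SUCHTHAT and USING as PREFIXES).

```python
import re

# Hypothetical tokenizer for a subset of the lexical tokens above.
# Alternatives are tried in order, so multi-character operators precede
# the single characters they start with.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("QuotedURI", r"<[^<>\s]*>"),          # assumption: no whitespace inside <...>
    ("VAR", r"\?[A-Za-z0-9_.\-]+"),        # HOOK followed by an identifier
    ("QNAME", r"[A-Za-z_][A-Za-z0-9_.\-]*:[A-Za-z0-9_.\-]*"),
    ("FLOATING_POINT_LITERAL", r"[0-9]*\.[0-9]+(?:e[+-]?[0-9]+)?"),
    ("INTEGER_LITERAL", r"[0-9]+"),
    ("STRING_LITERAL1", r'"(?:[^"\\]|\\.)*"'),
    ("STRING_LITERAL2", r"'(?:[^'\\]|\\.)*'"),
    ("COMMA", r","),
    ("OP", r"==|!=|<=|>=|&&|\|\||=~|~~|!~|\^\^|[()<>!~+\-*/%@]"),
    ("WORD", r"[A-Za-z0-9_.\-]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

# Keywords are words matched case-insensitively; kinds use the grammar's token names.
KEYWORDS = {
    "SELECT": "SELECT", "SOURCE": "SOURCE", "FROM": "FROM", "WHERE": "WHERE",
    "AND": "SUCHTHAT", "USING": "PREFIXES", "FOR": "FOR",
    "EQ": "STR_EQ", "NE": "STR_NE",
}


def tokenize(text: str) -> list:
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue                               # whitespace is skipped
        if kind == "WORD":
            kind = KEYWORDS.get(value.upper(), "IDENTIFIER")
        tokens.append((kind, value))
    return tokens


print(tokenize("SELECT ?family , ?given"))
print(tokenize("select ?family ?given"))
```

Dropping the COMMA token from the first result leaves the same token kinds as the second, which is what allows a parser to treat commas as optional separators.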

Grammar

References to lexical tokens are enclosed in <>.  Whitespace is skipped.

Note: the term "literal" refers to a constant value, and not only to an RDF Literal.

CompilationUnit  ::=  Query <EOF>
CommaOpt  ::=  ( <COMMA> )?
Query  ::=  SelectClause ( SourceClause )? TriplePatternClause ( ConstraintClause )? ( PrefixesClause )?
SelectClause  ::=  ( <SELECT> Var ( CommaOpt Var )* | <SELECT> <STAR> )
SourceClause  ::=  ( <SOURCE> | <FROM> ) SourceSelector ( CommaOpt SourceSelector )*
SourceSelector  ::=  URI
TriplePatternClause  ::=  <WHERE> TriplePattern ( CommaOpt TriplePattern )*
ConstraintClause  ::=  <SUCHTHAT> Expression ( ( <COMMA> | <SUCHTHAT> ) Expression )*
TriplePattern  ::=  <LPAREN> VarOrURI CommaOpt VarOrURI CommaOpt VarOrConst <RPAREN>
VarOrURI  ::=  Var
 | URI
VarOrConst  ::=  Var
 | Const
Var  ::=  "?" Identifier
PrefixesClause  ::=  <PREFIXES> PrefixDecl ( CommaOpt PrefixDecl )*
PrefixDecl  ::=  Identifier <FOR> <QuotedURI>
Expression  ::=  ConditionalOrExpression
ConditionalOrExpression  ::=  ConditionalAndExpression ( <SC_OR> ConditionalAndExpression )*
ConditionalAndExpression  ::=  StringEqualityExpression ( <SC_AND> StringEqualityExpression )*
StringEqualityExpression  ::=  ArithmeticCondition ( <STR_EQ> ArithmeticCondition | <STR_NE> ArithmeticCondition | <STR_MATCH> PatternLiteral | <STR_NMATCH> PatternLiteral )*
ArithmeticCondition  ::=  EqualityExpression
EqualityExpression  ::=  RelationalExpression ( <EQ> RelationalExpression | <NEQ> RelationalExpression )?
RelationalExpression  ::=  AdditiveExpression ( <LT> AdditiveExpression | <GT> AdditiveExpression | <LE> AdditiveExpression | <GE> AdditiveExpression )?
AdditiveExpression  ::=  MultiplicativeExpression ( <PLUS> MultiplicativeExpression | <MINUS> MultiplicativeExpression )*
MultiplicativeExpression  ::=  UnaryExpression ( <STAR> UnaryExpression | <SLASH> UnaryExpression | <REM> UnaryExpression )*
UnaryExpression  ::=  UnaryExpressionNotPlusMinus
 | ( <PLUS> UnaryExpression | <MINUS> UnaryExpression )
UnaryExpressionNotPlusMinus  ::=  ( <TILDE> | <BANG> ) UnaryExpression
 | PrimaryExpression
PrimaryExpression  ::=  Var
 | Const
 | <LPAREN> Expression <RPAREN>
Const  ::=  URI
 | NumericLiteral
 | TextLiteral
 | BooleanLiteral
 | NullLiteral
NumericLiteral  ::=  ( <INTEGER_LITERAL> | <FLOATING_POINT_LITERAL> )
TextLiteral  ::=  ( <STRING_LITERAL1> | <STRING_LITERAL2> ) ( <AT> Identifier )? ( <DATATYPE> URI )?
PatternLiteral  ::= 
BooleanLiteral  ::=  <BOOLEAN_LITERAL>
NullLiteral  ::=  <NULL_LITERAL>
URI  ::=  <QuotedURI>
 |  QName
QName  ::=  <NSPrefix> ':' (<LocalPart>)?
Unlike in XML Namespaces, the local part is optional.
Identifier  ::=  ( <IDENTIFIER> | <SELECT> | <SOURCE> | <FROM> | <WHERE> | <PREFIXES> | <FOR> | <STR_EQ> | <STR_NE> )
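The nesting of the Expression productions fixes operator precedence: || binds loosest, then &&, then the string operators, equality, relational comparison, + and -, and finally *, / and %, with unary operators and parenthesised expressions innermost. The following hypothetical recursive-descent evaluator, with one method per level, illustrates this. It is a sketch, not the Jena implementation: it covers only the numeric and boolean operators (the StringEqualityExpression level, TILDE, URIs and text literals are omitted), and it evaluates against a Python dictionary of variable values. As in the grammar, equality and relational operators are not chained.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\?[A-Za-z0-9_.\-]+|[0-9]*\.[0-9]+|[0-9]+"
                   r"|\|\||&&|==|!=|<=|>=|[<>()+\-*/%!])")
REL = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}
ADD = {"+": operator.add, "-": operator.sub}
MUL = {"*": operator.mul, "/": operator.truediv, "%": operator.mod}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Evaluator:
    def __init__(self, text, binding):
        self.tokens, self.i, self.binding = tokenize(text), 0, binding

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def evaluate(self):
        value = self.or_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def or_expr(self):              # ConditionalOrExpression
        value = self.and_expr()
        while self.peek() == "||":
            self.take()
            rhs = self.and_expr()
            value = bool(value) or bool(rhs)
        return value

    def and_expr(self):             # ConditionalAndExpression (string level skipped)
        value = self.equality()
        while self.peek() == "&&":
            self.take()
            rhs = self.equality()
            value = bool(value) and bool(rhs)
        return value

    def equality(self):             # EqualityExpression: at most one == or !=
        left = self.relational()
        if self.peek() in ("==", "!="):
            op = self.take()
            right = self.relational()
            return left == right if op == "==" else left != right
        return left

    def relational(self):           # RelationalExpression: at most one comparison
        left = self.additive()
        if self.peek() in REL:
            op = self.take()
            return REL[op](left, self.additive())
        return left

    def additive(self):             # AdditiveExpression
        value = self.multiplicative()
        while self.peek() in ADD:
            op = self.take()
            value = ADD[op](value, self.multiplicative())
        return value

    def multiplicative(self):       # MultiplicativeExpression
        value = self.unary()
        while self.peek() in MUL:
            op = self.take()
            value = MUL[op](value, self.unary())
        return value

    def unary(self):                # UnaryExpression / UnaryExpressionNotPlusMinus
        tok = self.peek()
        if tok in ("+", "-"):
            self.take()
            operand = self.unary()
            return operand if tok == "+" else -operand
        if tok == "!":
            self.take()
            return not self.unary()
        return self.primary()

    def primary(self):              # PrimaryExpression: Var, number, or ( Expression )
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        if tok == "(":
            value = self.or_expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.startswith("?"):
            return self.binding[tok[1:]]
        if tok[0].isdigit() or tok[0] == ".":
            return float(tok) if "." in tok else int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text, binding):
    return Evaluator(text, binding).evaluate()


print(evaluate("1 + 2 * 3 == 7", {}))   # True
```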

Acknowledgements

The author would like to thank Dave Beckett for his help with earlier versions of this submission.

References and Resources

Resources

References

[1] "Three Implementations of SquishQL, a Simple RDF Query Language", Libby Miller, Andy Seaborne, Alberto Reggiori; ISWC2002

[2] "RDF Query and Rules: A Framework and Survey", Eric Prud'hommeaux

[3] "RDF Query and Rule languages Use Cases and Example", Alberto Reggiori, Andy Seaborne

[4] "RDQL Tutorial for Jena" (in the Jena tutorial).

[5] "RDQL BNF from Jena"

[6] "Enabling Inference", R.V. Guha, Ora Lassila, Eric Miller, Dan Brickley

[7] "N-Triples"

[8] RDF, http://www.w3.org/RDF/

[9] "Representing vCard Objects in RDF/XML", Renato Iannella, W3C Note.

 
