Copyright Hewlett-Packard 2003. Distribution policies are governed by the W3C intellectual property terms.
This document describes RDQL (RDF Data Query Language), which has been implemented in a number of RDF systems for extracting information from RDF graphs. It begins with a brief introduction to the language, followed by a more formal description of the grammar.
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
By publishing this document, W3C acknowledges that Hewlett-Packard has made a formal submission to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership; please consult the complete list of acknowledged W3C Member Submissions. See also Submission request and Team Comment.
RDQL evolved from several languages and includes ideas described in [6]. See [1] for the original paper about three similar query languages, together with some history and context. See [2] for a comprehensive survey of many RDF query languages (and also rule systems) and [3] for a number of use cases with examples in several languages.
This section is an explanation of the RDQL syntax with examples. It is not a tutorial (see [4] for the Jena tutorial section on RDQL) but a quick description of the key elements of the query language. The grammar, given later, is definitive.
An RDF [8] model is a graph, often expressed as a set of triples. An RDQL query consists of a graph pattern, expressed as a list of triple patterns. Each triple pattern comprises named variables and RDF values (URIs and literals). An RDQL query can additionally have a set of constraints on the values of those variables, and a list of the variables required in the answer set.
Example 1:
SELECT ?x
WHERE (?x, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.com/someType>)
This triple pattern matches all statements in the graph that have predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type and object http://example.com/someType. The variable "?x" will be bound to the label of the subject resource. All such "x" are returned (strictly, "x" is the variable name; "?" introduces a variable but is not part of its name).
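The matching described for Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any RDQL implementation works internally: the graph is a plain set of (subject, predicate, object) tuples, terms are bare strings, and the subject URIs and helper names (is_variable, match_pattern) are hypothetical.

```python
# Hypothetical in-memory graph: a set of (subject, predicate, object) tuples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOME_TYPE = "http://example.com/someType"

graph = {
    ("http://example.com/a", RDF_TYPE, SOME_TYPE),
    ("http://example.com/b", RDF_TYPE, "http://example.com/otherType"),
    ("http://example.com/c", RDF_TYPE, SOME_TYPE),
}


def is_variable(term):
    """A term is treated as a variable if it is written with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple):
    """Return the variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for pattern_term, value in zip(pattern, triple):
        if is_variable(pattern_term):
            name = pattern_term[1:]  # '?' is not part of the variable name
            if bindings.get(name, value) != value:
                return None  # the same variable would need two different values
            bindings[name] = value
        elif pattern_term != value:
            return None  # a fixed term in the pattern does not match
    return bindings


# SELECT ?x WHERE (?x, rdf:type, <http://example.com/someType>)
pattern = ("?x", RDF_TYPE, SOME_TYPE)
candidates = (match_pattern(pattern, triple) for triple in graph)
results = sorted(b["x"] for b in candidates if b is not None)
print(results)  # ['http://example.com/a', 'http://example.com/c']
```

Only the two subjects typed as http://example.com/someType are bound to "x"; the third statement fails on its object.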
An RDQL query treats an RDF graph purely as data. If the implementation of that graph provides inferencing that appears as "virtual triples" (i.e. triples that appear in the graph but are not in the ground facts), then an RDQL query will include those triples as possible matches in triple patterns. RDQL makes no distinction between inferred triples and ground triples.
The terms quoted by "<>" are URIrefs. Other RDF values are literals which, following N-Triples syntax [7], are a string with an optional language tag (introduced with '@') and an optional datatype URI (introduced by '^^'). URIrefs can also be abbreviated with an XML QName-like form; this is syntactic assistance and is translated to the full URIref.
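As an illustration of the literal form just described, the following Python sketch splits a double-quoted literal into its string, language tag and datatype URI. It is a simplification under stated assumptions: the regular expression is not taken from the RDQL grammar, escape sequences are left as written rather than decoded, and the function name parse_literal is hypothetical.

```python
import re

# Assumed shape: "string" then optional @lang then optional ^^<datatype URI>.
LITERAL = re.compile(
    r'"(?P<lexical>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*))?'
    r'(?:\^\^<(?P<datatype>[^>]*)>)?'
)


def parse_literal(text):
    """Return (string, language tag, datatype URI), or None if text is not a literal."""
    match = LITERAL.fullmatch(text)
    if match is None:
        return None
    return match.group("lexical"), match.group("lang"), match.group("datatype")


print(parse_literal('"chat"@fr'))
print(parse_literal('"24"^^<http://www.w3.org/2001/XMLSchema#integer>'))
```

Parts that are absent from the literal are reported as None.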
The example query above had just one triple pattern forming a single edge in the graph pattern. More complicated graph patterns are made by writing all the edges in the query. Like RDF, these are interpreted conjunctively – all of them must match for a result to be added to the result set of the query. This may mean that variables are used to link together triple patterns.
Example 2:
SELECT ?family , ?given
WHERE (?vcard vcard:FN "John Smith")
      (?vcard vcard:N ?name)
      (?name vcard:Family ?family)
      (?name vcard:Given ?given)
USING vcard FOR <http://www.w3.org/2001/vcard-rdf/3.0#>
This query, based on the vCard vocabulary [9], finds the family name and
given name from any vcards with formatted name (FN) "John Smith". The
vCard vocabulary has a structured value for the name, using the vcard:N
property to point to another node in the RDF graph. This node, in turn, has the various name
elements as further statements. This intermediate node can be a blank node (an
RDF node without a URIref in this RDF graph).
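The conjunctive reading of Example 2, where shared variables link the triple patterns, can be sketched as a simple nested-loop join in Python. This is an illustrative sketch only: the data, the "_:b0" label standing in for the blank node, and the helper names are assumptions, and URIrefs and literals are both represented as plain strings for brevity.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Hypothetical data; "_:b0" stands in for the blank node holding the name parts.
graph = [
    ("http://example.org/card1", VCARD + "FN", "John Smith"),
    ("http://example.org/card1", VCARD + "N", "_:b0"),
    ("_:b0", VCARD + "Family", "Smith"),
    ("_:b0", VCARD + "Given", "John"),
    ("http://example.org/card2", VCARD + "FN", "Jane Doe"),
]


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that the pattern matches the triple, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable already bound by an earlier pattern must keep its value.
            if extended.setdefault(term[1:], value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Match all patterns conjunctively, carrying bindings from one to the next."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


patterns = [
    ("?vcard", VCARD + "FN", "John Smith"),
    ("?vcard", VCARD + "N", "?name"),
    ("?name", VCARD + "Family", "?family"),
    ("?name", VCARD + "Given", "?given"),
]
rows = [(s["family"], s["given"]) for s in query(patterns, graph)]
print(rows)  # [('Smith', 'John')]
```

The variable "name" is never selected, but it is what ties the vCard to its family and given names.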
We have used the prefix 'vcard' to abbreviate the URIref. Writing the full URIref or the abbreviated form yields the same query, because RDF deals only with full URIrefs.
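The translation from the abbreviated form to the full URIref amounts to a string substitution using the prefixes declared in the USING clause. A minimal Python sketch, assuming the prefixes have already been collected into a dictionary (the function name expand_qname is hypothetical):

```python
def expand_qname(term, prefixes):
    """Expand 'prefix:local' to a full URIref; leave other terms unchanged."""
    prefix, separator, local = term.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    return term


# USING vcard FOR <http://www.w3.org/2001/vcard-rdf/3.0#>
prefixes = {"vcard": "http://www.w3.org/2001/vcard-rdf/3.0#"}

print(expand_qname("vcard:FN", prefixes))
# http://www.w3.org/2001/vcard-rdf/3.0#FN
```

A term whose prefix has not been declared is returned as written.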
We have used a comma to separate the variables in the SELECT clause. Commas are optional in triple patterns and wherever lists of items occur; the application writer can use or omit them for readability and personal style.
Example 3:
SELECT ?resource
WHERE (?resource info:age ?age)
AND ?age >= 24
USING info FOR <http://example.org/peopleInfo#>
In this example, there is a constraint to restrict the object value of the matched statements.
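One way to picture the constraint in Example 3 is as a filter applied to the bindings produced by the triple pattern. The Python sketch below is illustrative only: the data is hypothetical, ages are held as Python integers rather than RDF literals, and the helper name age_constraint is an assumption.

```python
INFO = "http://example.org/peopleInfo#"

# Hypothetical data: (subject, predicate, object) with integer ages.
graph = [
    ("http://example.org/alice", INFO + "age", 30),
    ("http://example.org/bob", INFO + "age", 19),
    ("http://example.org/carol", INFO + "age", 24),
]

# Step 1: match the triple pattern (?resource info:age ?age).
solutions = [
    {"resource": subject, "age": obj}
    for subject, predicate, obj in graph
    if predicate == INFO + "age"
]


def age_constraint(bindings):
    """The constraint from the AND clause: ?age >= 24."""
    return bindings["age"] >= 24


# Step 2: keep only the solutions that satisfy the constraint,
# then project the variable named in the SELECT clause.
selected = [b["resource"] for b in solutions if age_constraint(b)]
print(selected)
# ['http://example.org/alice', 'http://example.org/carol']
```

The pattern alone matches all three statements; the constraint then removes the solution whose age is below 24.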
Example 4:
SELECT ?resource
FROM <http://example.org/someWebPage>
WHERE (?resource info:age ?age)
AND ?age >= 24
USING info FOR <http://example.org/peopleInfo#>
In this example, the source of the data to be queried is supplied.
Where not supplied, it is the responsibility of the execution environment to associate the query with the RDF graph to be queried. Such mechanisms are outside the scope of this note.
RDQL was first released in Jena 1.2.0. At the time of writing, the following systems are known to provide RDQL. There is no formal compliance test, but each of these systems implements enough of the triple pattern matching and constraint system to be described as "RDQL". They are all (to the author's knowledge) derived from the original grammar [5].
In addition, RDQL is one language used for remote query by the Joseki RDF Server.
This grammar is derived from the Jena implementation of RDQL.
Note: this is a permissive grammar. It is designed for convenience and includes liberal interpretations of terms from other systems.
QuotedURI | ::= | '<' URI characters (from RFC 2396) '>' | |
NSPrefix | ::= | NCName As defined in XML Namespace v1.1 and XML 1.1 | |
LocalPart | ::= | NCName As defined in XML Namespace v1.1 and XML 1.1 | |
SELECT | ::= | 'SELECT' | Case Insensitive match |
FROM | ::= | 'FROM' | Case Insensitive match |
SOURCE | ::= | 'SOURCE' | Case Insensitive match |
WHERE | ::= | 'WHERE' | Case Insensitive match |
AND | ::= | 'AND' | Case Insensitive match |
USING | ::= | 'USING' | Case Insensitive match |
Identifier | ::= | ([a-zA-Z0-9_.-])+ | |
EOF | ::= | End of file | |
COMMA | ::= | ',' | |
INTEGER_LITERAL | ::= | ([0-9])+ | |
FLOATING_POINT_LITERAL | ::= | ([0-9])*'.'([0-9])+('e'('+'|'-')?([0-9])+)? | |
STRING_LITERAL1 | ::= | '"'UTF-8 characters'"' (with escaped \") | |
STRING_LITERAL2 | ::= | "'"UTF-8 characters"'" (with escaped \') | |
LPAREN | ::= | '(' | |
RPAREN | ::= | ')' | |
DOT | ::= | '.' | |
GT | ::= | '>' | |
LT | ::= | '<' | |
BANG | ::= | '!' | |
TILDE | ::= | '~' | |
HOOK | ::= | '?' | |
COLON | ::= | ':' | |
EQ | ::= | '==' | |
NEQ | ::= | '!=' | |
LE | ::= | '<=' | |
GE | ::= | '>=' | |
SC_OR | ::= | '||' | |
SC_AND | ::= | '&&' | |
STR_EQ | ::= | 'EQ' | Case Insensitive match |
STR_NE | ::= | 'NE' | Case Insensitive match |
PLUS | ::= | '+' | |
MINUS | ::= | '-' | |
STAR | ::= | '*' | |
SLASH | ::= | '/' | |
REM | ::= | '%' | |
STR_MATCH | ::= | '=~' | '~~' | |
STR_NMATCH | ::= | '!~' | |
DATATYPE | ::= | '^^' | |
AT | ::= | '@' | |
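To make the numeric token definitions above concrete, they can be transcribed into Python regular expressions. This is a sketch for illustration, not part of the grammar; the function name classify_number is hypothetical, and the patterns follow the table literally (a lowercase 'e' for the exponent, and at least one digit after the '.').

```python
import re

# Direct transcriptions of the INTEGER_LITERAL and FLOATING_POINT_LITERAL tokens.
INTEGER_LITERAL = re.compile(r"[0-9]+")
FLOATING_POINT_LITERAL = re.compile(r"[0-9]*\.[0-9]+(e[+-]?[0-9]+)?")


def classify_number(text):
    """Return the name of the numeric token that matches text, or None."""
    if INTEGER_LITERAL.fullmatch(text):
        return "INTEGER_LITERAL"
    if FLOATING_POINT_LITERAL.fullmatch(text):
        return "FLOATING_POINT_LITERAL"
    return None


for sample in ["24", ".5", "1.5e-3", "1.", "1e5"]:
    print(sample, classify_number(sample))
```

Read literally, the table does not admit "1." or "1e5" as numeric tokens, since the floating point form requires a '.' followed by at least one digit.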
References to lexical tokens are enclosed in <>. Whitespace is skipped.
Notes: The term "literal" refers to a constant value, and not only an RDF Literal.
The author would like to thank Dave Beckett for his help with earlier versions of this submission.
Resources
References
[1] "Three Implementations of SquishQL, a Simple RDF Query Language", Libby Miller, Andy Seaborne, Alberto Reggiori; ISWC2002
[2] "RDF Query and Rules: A Framework and Survey", Eric Prud'hommeaux
[3] "RDF Query and Rule languages Use Cases and Example", Alberto Reggiori, Andy Seaborne
[4] RDQL Tutorial for Jena (in the Jena tutorial).
[6] "Enabling Inference", R.V. Guha, Ora Lassila, Eric Miller, Dan Brickley
[8] RDF http://www.w3.org/RDF/
[9] "Representing vCard Objects in RDF/XML", Renato Iannella, W3C Note.