This document is complete, but may be updated throughout the life of the project. First version published 2002-10-01. This version 2003-04-01.
This report is part of SWAD-Europe Work package 7: Databases, Query, API, Interfaces. It is intended to compare existing RDF query language functionality, documenting different scenarios and users for RDF query languages (for example scripters, programmers; data, schema). There are many existing documents covering this area. Rather than repeating work that has already been done we have concentrated on frequently asked questions (FAQs): we have looked through the lists www-rdf-interest@w3.org (archive), www-rdf-rules@w3.org (archive), jena-dev@yahoogroups.com (archives), gathered together the questions and located answers from mailing lists and available expertise. An ongoing draft of the RDF FAQ is also available.
As part of the work on this deliverable, the project has begun to gather together people and materials for creating a repository of tests for RDF query. We are working with people who have their own query languages, creators of a repository of RDF query usecases, and with the creator of a comparison document for RDF query and rules languages to hold a series of IRC and face to face meetings to discuss a common manifest format and results format for the tests. The aim is that the creators of similar RDF query languages can share testcases and thereby improve interoperability. More information is available on the Extended Semantic Web wiki, and in testcases for RDF query FAQ below.
Note that similar questions are grouped together and answered together.
There is no one 'official' RDF query language - there has not been a W3C activity in this area, for example. However there are many RDF query language implementations in use. There is no explicit consensus about what an RDF query is, although there are many similarities in the available implementations.
It is important to distinguish between the syntax of an RDF query and what it does. Many RDF query languages (but not all) have different syntaxes but do basically the same thing, that is, describe an RDF graph with parts missing, assign those parts variable names, and get a series of bindings to those variables. Examples of query languages that do this include RDFdb QL, Algae, Squish, RDQL, RDFql.
For example, this is a representation of a query which is a graph with parts missing. It expresses the query:
"find me the name of the person whose email address is libby.miller@bristol.ac.uk, and also find me the title and identifier of anything that she has created"
This simple description is not sufficient to encompass all RDF query languages by any means. Some (in particular RQL) have a specific syntax for accessing RDF schema information (such as subclass, subproperty). Others have an XML Path-like structure that can recursively match subgraphs (Versa).
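The basic idea described above, a graph with parts missing whose gaps are named by variables, can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm of any of the implementations named here: the sample triples, the `match` helper and the property names (`foaf:mbox`, `dc:creator` and so on) are assumptions made up for the example.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("_:libby", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("_:libby", "foaf:name", "Libby Miller"),
    ("http://example.com/doc1", "dc:creator", "_:libby"),
    ("http://example.com/doc1", "dc:title", "An example document"),
    ("http://example.com/doc1", "dc:identifier", "doc-1"),
    ("_:dan", "foaf:name", "Dan Brickley"),
]


def is_var(term):
    """Terms written with a leading '?' are the missing parts of the graph."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield one dict of variable bindings per way the patterns fit the triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for want, have in zip(first, triple):
            if is_var(want):
                # Bind the variable, or reject if it is already bound elsewhere.
                if candidate.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield from match(rest, triples, candidate)


# The example query: name of the person with this mailbox, plus the title
# and identifier of anything that person has created.
QUERY = [
    ("?person", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("?person", "foaf:name", "?name"),
    ("?work", "dc:creator", "?person"),
    ("?work", "dc:title", "?title"),
    ("?work", "dc:identifier", "?id"),
]

results = list(match(QUERY, TRIPLES))
for row in results:
    print(row["?name"], row["?title"], row["?id"])
```

Each result row is one set of bindings, which is roughly what the languages listed above return, whatever their surface syntax.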
For more information, see Enabling Inference (R.V. Guha, Ora Lassila, Eric Miller, Dan Brickley). For an overview of available query language implementations for RDF, try the RDF Query and Rule languages Use Cases and Examples survey by Alberto Reggiori and Andy Seaborne, and Eric Prud'hommeaux's RDF query survey.
Numerous answers from a thread on www-rdf-rules@w3.org, September 2001.
#rdfig irc channel, 2003-02-27
libby: so, I'm working through my rdfquery faq, and I get to: Why isn't there a W3C working group or taskforce for RDF query?
libby: anyone know?
libby: not that I want one
danbri: Sure: W3C has not decided to Charter a WG yet. That's why there isn't one.
danbri: Re 'taskforce', this is a vague notion anyway. There is an IG-related mailing list, www-rdf-rules, which effectively services that purpose.
It seems possible that there will be one in the near future: see RDF interest group chatlogs for 2003-02-28, and work from the 2003 W3C All Groups Meeting and Technical Plenary - (Semantic Web architecture annotated agenda and logs, Possible query work, Notes from RDF query BOFs (birds of a feather meetings), trip report).
The W3C has information about the process of creating working groups and interest groups. There is a www-rdf-rules@w3.org list archive, which anyone can join - information about how to do that is at the bottom of that page.
It is not possible to query generic RDF in XML using XQuery without preprocessing. This is primarily because the same piece of RDF can be expressed in many different syntactic formats in XML, making syntactic XML-based query formats like XSLT and XQuery difficult to use on arbitrary RDF. Attempts have been made to normalize RDF into a different XML syntax which can be queried with these tools, although none of these have been adopted officially by the RDF Core working group. Max Froumentin's XSLT RDF parser tackles similar issues, first normalizing and then parsing the RDF.
For more information, see the XML Europe 2001 paper by Robie et al., The Syntactic Web. Max Froumentin's XSLT RDF parser is documented here. There was also a thread about RDF and XQuery on the www-ql@w3.org mailing list in June 2001.
RQL can explicitly be used to retrieve RDFS (RDF Schema) information.
For simpler query languages, whether all the subclasses of a given class are returned, rather than only the immediate subclasses, depends on the underlying database. If the database implements RDF Schema (either by explicitly adding all the subclasses and subproperties whenever a class or property is encountered, or by using rules to do this), then simple query languages will retrieve the same information. Simpler databases which do not compute the deductive closure of subclass and subproperty in this way will not return the information.
Here's an example from Jeen Broekstra showing RQL and RDQL queries over Sesame, illustrating both scenarios.
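The difference between the two kinds of store can also be sketched in Python. This is only an illustration of the principle, not of how Sesame or any other database works: the class names and the `subclass_closure` and `subclasses_of` helpers are hypothetical, and only the transitivity of subclass is computed.

```python
RDFS_SUBCLASS = "rdfs:subClassOf"

# Hypothetical schema: Novel is a kind of Book, Book is a kind of Document.
ASSERTED = {
    ("ex:Novel", RDFS_SUBCLASS, "ex:Book"),
    ("ex:Book", RDFS_SUBCLASS, "ex:Document"),
}


def subclass_closure(triples):
    """Return the triples plus every subClassOf triple implied by transitivity."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in closed if p == RDFS_SUBCLASS]
        for sub, mid in pairs:
            for mid2, sup in pairs:
                new = (sub, RDFS_SUBCLASS, sup)
                if mid == mid2 and new not in closed:
                    closed.add(new)
                    changed = True
    return closed


def subclasses_of(cls, triples):
    """The 'simple query': which ?x satisfy (?x rdfs:subClassOf cls)?"""
    return sorted(s for s, p, o in triples if p == RDFS_SUBCLASS and o == cls)


plain = subclasses_of("ex:Document", ASSERTED)
inferred = subclasses_of("ex:Document", subclass_closure(ASSERTED))
print(plain)     # ['ex:Book']
print(inferred)  # ['ex:Book', 'ex:Novel']
```

The query itself is identical in both cases; only the data the store exposes to it differs.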
DAML+OIL is a language defined in RDF whose semantics are built on top of the RDF model. Therefore you can query DAML+OIL data, just as you can query RDF Schema information using RDF query languages. However you will not get any DAML+OIL-specific semantics from RDF query languages, so the result may be awkward to deal with.
There is also a DAML query language, DQL (DQL Semantics, proposed DQL syntax), download. DQL is described using RDF and DAML+OIL, but is essentially an RDF graph-matching language and does not express any DAML+OIL-specific query facilities.
Because Jena has a DAML API, the Jena team are often asked questions about the relationship between their RDF query language RDQL and DAML. Some answers are below.
"RDQL only returns Resource/Property/Literal objects - not the higher-level DAML objects that the DAML sub-system of Jena has that get via the DAML API. Also, it does not currently (to be changed) have access to the implications of the DAML inference even though these manifest themselves as virtual triples."
Andy Seaborne, jena-dev message
"Once your data/ontology is DAML compliant, you will be able to do RDQL queries on it (actually, you can query without it being DAML compliant - it needs only to be legal RDF but then it might not be saying what you mean it to say :-). N.B. RDQL querys RDF as data - you would not get DAML inference as a result of an RDQL query."
Andy Seaborne, jena-dev message
"RDQL ("RDF Data Query Language") queries models at the RDF and so it is rather difficult, at the moment, to query at the DAML level with RDQL - you have to know how DAML uses RDF and there is no inference at all. If you wish to access a DAML model in DAML terms you need to go through the DAML API. If you have a data-oriented requirement then RDQL, on an RDF model, can be used instead of a sequence of Jena core APIs calls."
Andy Seaborne, jena-dev message
Datatypes are specified in the RDF Core concepts document and there is some informative explanation in the RDF Semantics document.
These documents are at Last Call Working Draft stage:
"This document is in the Last Call review period, which ends on 21 February 2003. This document has been endorsed by the RDF Core Working Group. This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to how the changes made affect existing implementations and content."
Datatyping has only been introduced comparatively recently, and I have not found any query languages that implement datatypes. This section will be updated when more information is available.
Optional variables are very useful when the data to be queried is not completely consistent. For example, if the data to be queried contains RDF about documents, and the RDF describes titles for some, descriptions for others, creators for some and contributors for others, then without optional variables, accessing this information would require a different query for each of these variations. This is the case if all query variables must be matched in order for the query to complete successfully.
If optional variables are allowed, this information can be accessed in one query.
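A small Python sketch may make the difference concrete. It does not follow the syntax or semantics of any particular query language: the data, the `query` helper and its `required` and `optional` parameters are all assumptions made up for illustration.

```python
# Hypothetical, deliberately inconsistent metadata about three documents.
TRIPLES = [
    ("ex:doc1", "dc:title", "First report"),
    ("ex:doc1", "dc:creator", "Alice"),
    ("ex:doc2", "dc:description", "Untitled notes"),
    ("ex:doc2", "dc:contributor", "Bob"),
    ("ex:doc3", "dc:title", "Third report"),
]


def values(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def query(required, optional):
    """Return one row per subject that has every required property.

    Optional properties are bound when present and left as None otherwise.
    Only the first value of each property is used, to keep the sketch short.
    """
    rows = []
    subjects = sorted({s for s, _, _ in TRIPLES})
    for subject in subjects:
        row = {"doc": subject}
        for prop in required:
            found = values(subject, prop)
            if not found:
                break  # a required variable is unmatched: no row at all
            row[prop] = found[0]
        else:
            for prop in optional:
                found = values(subject, prop)
                row[prop] = found[0] if found else None
            rows.append(row)
    return rows


PROPS = ["dc:title", "dc:description", "dc:creator", "dc:contributor"]

# Every variable required: only fully described documents come back.
strict = query(required=["dc:title", "dc:creator"], optional=[])
# Every variable optional: one query covers all the variations.
relaxed = query(required=[], optional=PROPS)

for row in relaxed:
    print(row)
```

With all variables required, only `ex:doc1` is returned; with optional variables, all three documents are returned and the gaps are left unbound.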
"There are two "forms" that XUL templates may be written in. The "simple" form, which is currently the most common form in the Mozilla codebase, and the "extended" form, which allows for sophisticated pattern matching against the RDF graph."
Here is the extended form primer, and the XUL reference.
Algae by Eric Prud'hommeaux can also do optional variables. From a private communication about Algae's optional/required syntax:
optional terms: (ask '((requiredP1 ?n1 ?n2) (requiredP2 ?n2 ?n3) ~(optionalP3 ?n3 ?n4) ~(optionalP4 ?n4 ?n5)) collect '(?n1 ?n3 ?n5))
"Blank nodes are treated as simply indicating the existence of a thing, without using, or saying anything about, the name of that thing. (This is not the same as assuming that the blank node indicates an 'unknown' uriref; for example, it does not assume that there is any uriref which refers to the thing. The discussion of skolemization in the proof appendix is relevant to this point.)"
For more information, see the primer introduction.
Because b-nodes do not have names, it is not usually possible to ask for them by name directly. In Jena, for example, it is possible to use b-nodes in RDQL queries, but only if you construct the query programmatically via the API, having first retrieved the b-node identifier using another method. See the brief chat with Andy Seaborne on the RDF interest group IRC channel, 2003-02-27.
Reified statements (primer introduction) can usually be queried in the same way as you would query any other part of an RDF graph, for example, in Squish:
select ?s, ?p, ?o
from http://example.com/reification/example.rdf
where (rdf:subject ?statement ?s)
      (rdf:predicate ?statement ?p)
      (rdf:object ?statement ?o)
      (rdf:type ?statement rdf:Statement)
using rdf for http://www.w3.org/1999/02/22-rdf-syntax-ns#
Jena seems to be a special case - see Jena-dev message from Dave Reynolds.
'Provenance' information is just some information about where the RDF data came from originally. Many RDF implementations store this kind of information. However provenance information is external to the RDF model, meaning that there is no way of representing for each triple where it came from within RDF. Reification will not do it as reified triples are not asserted. There is a construct in N3 that allows grouping of RDF statements which may be used for provenance information, but this is not part of the RDF Core working group's RDF Model theory.
What this means is that RDF query languages which query only the RDF model (the triples) cannot directly access provenance information, even if the underlying store retains this information. However, if a query language is able to return objects (rather than strings) bound to the variables, then it may be possible to retrieve provenance information from this source.
Alberto Reggiori's Perl implementation of RDQL in RDFStore has an optional fourth argument to each query triple. This fourth argument is the URL from which the data to be retrieved originally came. This is useful where the database is very large and the original source of the data to be searched for is known. More information about RDF Store and provenance.
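The idea of a fourth, source argument can be sketched in Python as follows. This is not RDFStore's Perl API: the `QUADS` list, the `find` helper and the example URLs are hypothetical, and the sketch only shows how keeping the source alongside each triple lets a query be restricted to one origin.

```python
# Hypothetical store that keeps the source URL alongside each triple.
QUADS = [
    ("ex:doc1", "dc:title", "First report", "http://example.com/a.rdf"),
    ("ex:doc1", "dc:creator", "Alice", "http://example.com/a.rdf"),
    ("ex:doc1", "dc:title", "Report no. 1", "http://example.org/b.rdf"),
]


def find(subject=None, predicate=None, obj=None, source=None):
    """Return matching quads; None acts as a wildcard for that position."""
    wanted = (subject, predicate, obj, source)
    return [
        quad for quad in QUADS
        if all(w is None or w == have for w, have in zip(wanted, quad))
    ]


# Without the fourth argument, titles from every source are returned.
anywhere = find(predicate="dc:title")
# With it, the search is limited to data that came from one document.
from_a = find(predicate="dc:title", source="http://example.com/a.rdf")

print(len(anywhere), len(from_a))
```

Restricting by source both narrows the search in a large store and tells the caller where each answer came from.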
Versa, David Allsopp's query language ('reachable'), RDF Objects (Alex Barnell).
Here is an example of how you might do this, by Dan Brickley.
People have begun to discuss this (summary) as a possible method for describing an RDF query in a neutral way for query test cases. There are various problems; for example:
Here are various other answers from the www-rdf-rules list.
The SWAD-Europe report Mapping Semantic Web Data with RDBMSes describes several schemas used by different database systems for storing RDF in SQL databases. There is also a somewhat older survey by Sergey Melnik.
For RDF queries, an interesting approach is Matt Biddulph's query rewriter. This takes a query described in a simple query language for RDF and rewrites it into the SQL required for retrieval from a one-table store. Matt has a PHP version, and there is also a Java version for two-table SQL schema for RDF, based on Matt's work.
The advantage of this approach is that applications need only make one hit on the database, rather than one for each part of the query.
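The rewriting approach can be sketched in Python with the standard `sqlite3` module. This is a rough illustration of the general technique, not Matt Biddulph's code: the one-table schema (`triples` with `subject`, `predicate` and `object` columns), the `to_sql` helper and the sample data are assumptions.

```python
import sqlite3


def to_sql(patterns):
    """Rewrite triple patterns into one SELECT with a self-join per pattern.

    Terms starting with '?' are variables; everything else is a constant.
    Returns the SQL text, the parameter list and the variable names selected.
    """
    columns = ("subject", "predicate", "object")
    first_seen = {}  # variable name -> column reference where it first appeared
    conditions, params = [], []
    for i, pattern in enumerate(patterns):
        for column, term in zip(columns, pattern):
            ref = f"t{i}.{column}"
            if not term.startswith("?"):
                conditions.append(f"{ref} = ?")  # constant: bound parameter
                params.append(term)
            elif term in first_seen:
                conditions.append(f"{ref} = {first_seen[term]}")  # join
            else:
                first_seen[term] = ref
    select = ", ".join(first_seen.values()) or "1"
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    where = " AND ".join(conditions) or "1 = 1"
    return f"SELECT {select} FROM {tables} WHERE {where}", params, list(first_seen)


# Hypothetical one-table store held in memory.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("_:libby", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("_:libby", "foaf:name", "Libby Miller"),
    ("ex:doc1", "dc:creator", "_:libby"),
    ("ex:doc1", "dc:title", "An example document"),
])

sql, params, names = to_sql([
    ("?person", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("?person", "foaf:name", "?name"),
    ("?work", "dc:creator", "?person"),
    ("?work", "dc:title", "?title"),
])
rows = db.execute(sql, params).fetchall()  # one round trip for the whole query
print(sql)
print(names)
print(rows)
```

Each triple pattern becomes one alias of the same table, shared variables become join conditions, and constants become bound parameters, so the whole graph pattern is answered by a single SQL statement.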
Alternatively it may be possible to query ordinary relational database tables as RDF.
There is no single RDF query language and therefore no one query API. Jena does have a programmatic query API for its query language RDQL (Jena javadoc). See com.hp.hpl.jena.rdf.query.Query for more details.
RDF Access to Relational Databases, Eric Prud'hommeaux's new tool for generating specific tables from a generic triple store
A message from Eric Prud'hommeaux about Algae and cwm being able to query a relational database with an application-specific schema.
There are some related threads from www-rdf-rules@w3.org January 2003, www-rdf-interest March 2002, and www-rdf-interest May 2001.
Many implementations have their own testsuites, however there is no cross-implementation set of tests available as yet.
There is now an RDF query testcases repository on the W3C site which contains proposals for RDF query manifest and result set formats, and details of ongoing IRC (Internet Relay Chat) and face to face meetings. IRC meetings are currently being held regularly on this topic: everyone interested is welcome.
Preliminary discussions on this subject were held on the www-rdf-rules mailing list and summarised on the SWAD-Europe wiki. An IRC meeting was held in February 2003 on the topic of deciding a manifest format for tests (summary of discussion).
http://www.w3.org/TandS/QL/QL98/
A workshop held in 1998 which produced a number of very influential papers on RDF query (in particular Enabling Inference, by R.V. Guha, Ora Lassila, Eric Miller, Dan Brickley).
http://www.w3.org/2001/11/13-RDF-Query-Rules/
A summary of RDF query language characteristics, and sample queries from various implementations.
"This document is intended to provide an understanding of the concepts and issues related to querying semantic web data. Further it provides a survey of implementations. Web service-related examples come from a strawman WSDL RDF model proposed in another document."
http://rdfstore.sourceforge.net/2002/06/24/rdf-query/
A database of usecases and sample queries that can be added to online, including a very useful use-case based query comparison document, and a schema for usecases
Announcement,
Report of ad hoc Query/NetAPI meeting at ISWC/Sardinia.
"we, that is a group of implementors working on different versions of SquishQL, RDQL and other similar RDF query and rule languages, met in Chia for the ISWC2002 at the beginning of June; after a very informal meeting we decided to set up a survey about Use Cases and practical examples about how to query and access remotely RDF databases."
http://139.91.183.30:9090/RDF/publications/tr308.pdf
This is a very detailed and comprehensive summary of RDF tools available. It documents url, documentation, tutorials, demonstration, versions, platform and pricing policies of tools for storage and query of RDF, DAML, Topic Maps. It also covers performance and scalability, inference support, query language, update and API support.
Announcement. "Apart from a general description of each language/tool, we provide preliminary criteria for comparing the expressiveness of the existing query languages as well as the technical characteristics of the supporting tools."
http://lists.w3.org/Archives/Public/www-rdf-interest/2002Jan/0220.html
A brief summary, with links, of various query implementations, sent to www-rdf-interest@w3.org, January 2002.
http://ilrt.org/discovery/2001/08/rdfquery-bof/
http://lists.w3.org/Archives/Public/www-rdf-interest/2002Jan/0199.html
This thread illustrates API calls and queries from various implementations for a particular testcase concerning datatype handling. (Note that this was before the method of handling RDF datatypes was decided on in the RDF Core working group).
See also the thread on www-rdf-rules@w3.org, November 2001.
http://infomesh.net/2002/eep3/
"Eep3 is a general Semantic Web API written in Python, with various features:-
http://www.w3.org/2000/10/swap/doc/cwm.html
Cwm is a general-purpose data processor for the semantic web. It is a forward chaining reasoner which can be used for querying, checking, transforming and filtering information. Its core language is RDF, extended to include rules, and it uses RDF/XML or N3 serializations as required. The name originally came from "Closed world machine", because it processed information in a limited space, but cwm does not make any assumptions about a closed world. Think of it as a defined area but with openings, like a valley. Cwm is written in Python.
http://www.ninebynine.org/RDFNotes/RDFForLittleLanguages.htm
download, announcement to www-rdf-interest@w3.org
"I've been doing some experiments in the course of putting together a simple RDF application, covering RDF query formats, report generation/data transformation, and using RDF to encode "little languages". The primary goal was to build a simple but flexible application to generate HTML from RDF/N3 data. Along the way, I've been experimenting with query patterns and transformation/formatting templates, all coded in RDF/N3. And there's yet another N3 parser in Python."
http://www.w3.org/1999/11/11-WWWProposal/rdfqdemo.html
"The real point of the demo was to begin to explore quite how we'd like RDF query and data structures to show up in mainstream scripting environments, eg. what might the notion of a query 'result set' look like to an programmer working with an RDF query-able system in (say) Java, Javascript or Perl."
"Algae is a constraint-based query interface based on algernon."
http://uche.ogbuji.net/tech/rdf/versa/
"I am one of the developers of Versa, a query language for RDF. There are many other query languages for RDF, and probably will be at least until the community agrees to standardize. The Versa developers tried most of these and found them unsuitable for various practical reasons. In particular, Versa is designed to be integrated into other programming languages and systems. It is inspired in many ways by XPath, the very successful query language (of sorts) for XML."
http://www.hpl.hp.com/semweb/rdql.html
"RDQL is an implementation of an SQL-like query language for RDF. It treats RDF as data and provides query with triple patterns and constraints over a single RDF model. The target usage is for scripting and for experimentation in information modelling languages."
http://swordfish.rdfweb.org/rdfquery/
"This is an RDF query engine, written in Java, which can take SQL-like query strings and which uses the JDBC API"
http://www.ics.forth.gr/proj/isst/RDF/RQL/rql.html
download, demos, further demos.
"RDFStore implements the RDQL language to query RDF repositories directly from Perl. The toolkit consists of a Perl API, a streaming SiRPAC parser and a generic hashed data storage custom designed for the RDF model. The storage sub-system allows transparently storage and retrieval of RDF nodes, arcs and labels, either from an in-memory structure, from the local disk or from a very fast and scaleable remote storage. The latter is a fast networked TCP/IP based transactional storage library that uses multiple single key hash based BerkeleyDB files together with an optimized network routing daemon with a single thread/process per database. The data indexing model is general enough to retrieve RDF subgraphs and properties using free-text and statement-group sensible matching. Each literal value gets indexed in its full Unicode form and in-memory data structures or objects can also be serialised on disk. The API supports bNodes (blank Nodes or anonymous-resources) but the storage internally does treat them like any other resource. Being in Perl, an un-typed language, the toolkit at the moment does not treat typed literals in any special way; all query filtering operations on the values are processed using pure Perl regular expressions and eval constructs." (Three Implementations of SquishQL, a Simple RDF Query Language, 2002)
http://edutella.jxta.org/reports/edutella-whitepaper.pdf
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Oct/0095.html
http://www.langdale.com.au/RDF/NexusQueryLanguage.pdf
http://lists.w3.org/Archives/Public/www-rdf-interest/2002May/0063.html
http://www.intellidimension.com/
http://tap.stanford.edu/overview.html
http://www-uk.hpl.hp.com/people/afs/Joseki/
http://sesame.aidministrator.nl/
A Prolog engine written in Java (refactored from XProlog) which is loaded with RDF rules. At present the knowledgebase can be entered as Prolog-format triple(X, Y, Z) or as N-Triples. It has a simple command-line style UI. More information.