Ideas for an RDF query language

This page is a starting point for gathering (!) ideas/suggestions/requirements/comments on RDF Query Languages.

The idea of using a Wiki area grew out of several ad hoc conversations, the RDF Query Testcase chats organised by Libby Miller on the rdf-interest IRC channel (irc://irc.freenode.net:6667/rdfig#rdfig), and email on www-rdf-rules (Archive).

A few resources as starting points:

Query test cases
RDF Query and Rule languages Use Cases and Examples survey
Tim Berners-Lee's notes for the March 2003 W3C Tech plenary.
RDF Query and Rules: A Framework and Survey
The W3C Public Query mailing list (Archive) www-rdf-rules@w3.org

Please add your ideas or comment on other people's - think of this as informal requirements gathering. When there is enough material, we will summarise it.

 -- AndySeaborne, Alberto Reggiori // April 2003

Initial Summary

From the discussion on email -- AndySeaborne

This initial list is taken from Alberto's message and Andy's reply. Only short summaries are given here.

Optional Matches

RDF is semistructured data. When querying RDF, real applications often need to extract information that may or may not be present, without the whole query failing when it is absent (example: get the title of a book, and its publication date if there is one).

If the query is returning RDF, this means returning optional triples. If the query is returning bound variables, this means returning extra variables on some matches.

DQL (DQL Syntax) provides "may-bind" variables, which indicate that the variable need not be bound in order for the query to succeed.
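
The book example above can be sketched in a few lines of Python. This is only an illustration of the idea, not any existing query language: the graph is a plain list of tuples, and the data, the property names and the function names are all made up for the example.

```python
# Toy graph: triples as (subject, predicate, object) tuples.
# All data and names here are hypothetical.
triples = [
    ("book1", "dc:title", "RDF Primer"),
    ("book1", "dc:date", "2003"),
    ("book2", "dc:title", "Untitled Notes"),
]

def objects(graph, subject, predicate):
    """Return all objects for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def books_with_optional_date(graph):
    """Require a title; bind the date only when one is present."""
    results = []
    for s, p, title in graph:
        if p != "dc:title":
            continue
        binding = {"book": s, "title": title}
        dates = objects(graph, s, "dc:date")
        if dates:
            # Optional part matched: one solution per date found.
            for date in dates:
                results.append({**binding, "date": date})
        else:
            # Optional part did not match: keep the solution anyway.
            results.append(binding)
    return results

results = books_with_optional_date(triples)
```

Here book2 still appears in the results, just without a "date" variable, which is the "extra variables on some matches" behaviour described above.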

Provenance Information

RDF sources gathered from a variety of places are "flattened" once parsed and stored into an RDF database. At query time you very often need to filter statements based on the "context" in which they were asserted, i.e. the source URL or some other RDF resource, which could itself be further described.
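
One way to picture this (a sketch only, assuming the store keeps a fourth "source" term alongside each triple; the URLs, data and function name are invented for the example):

```python
# Toy store: each statement carries the source it was asserted in.
# All data and names here are hypothetical.
quads = [
    ("book1", "dc:title", "RDF Primer", "http://example.org/a.rdf"),
    ("book1", "dc:date", "2003", "http://example.org/b.rdf"),
    ("book2", "dc:title", "Untitled Notes", "http://example.org/b.rdf"),
]

def match(store, s=None, p=None, o=None, source=None):
    """Return quads matching the given terms; None acts as a wildcard."""
    pattern = (s, p, o, source)
    return [
        quad for quad in store
        if all(want is None or want == got for want, got in zip(pattern, quad))
    ]

# Titles asserted in one particular source document.
titles_from_b = match(quads, p="dc:title", source="http://example.org/b.rdf")
```

Without the fourth term, the two title statements would be indistinguishable by origin once loaded, which is the "flattening" problem.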

Result formatting

Applications often want results back in RDF for further processing: the results of a query are formatted as RDF.

Some ideas in this area are provided in the SeRQL announcement and manual.

This is getting into the area of rules - see "cwm --filter" (cwm). See also Tim Berners-Lee's notes for the March 2003 W3C Tech plenary in the section "The abstract syntax of the returned result".

Comments

A section for short comments - add an area if the current ones don't suit. Add a whole new section or add a wiki page for longer ones.

General

Some of the features of interest here are at the RDF level and some are outside the current RDF.

We know that current RDF isn't enough, as your contextual/provenance discussion highlights (or at least it is unproven). It is important to distinguish when a feature is RDF and when a feature is beyond RDF, because other systems will be experimenting with the next wave of "RDF" (RDF-ng) while the core RDF is more stable.

-- AndySeaborne

Optional Matches

Provenance Information

Result Formatting

Query Processing Model

AndyS: (from an IRC chat with DanBri):

One generalization is for queries to be in three parts: locate, extract and present.

Locate is all exact matches of the (conjunctive) query graph pattern (c.f. QL98) and produces a number of solutions, where each solution is a set of variable bindings.

Extract is zero or more optional patterns, each of which is tried for each exact match and can extend each variable binding set with new variables (need not do so for all solutions).

Present deals with the form of the output. This may be the variables actually required, like in a SELECT clause, or it may be an RDF template where a graph has variables in it. The values of the variables are substituted to form a subgraph for each solution.
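
The three parts can be sketched as a small pipeline. This is a rough illustration under stated assumptions, not how any of the systems mentioned here work: triples are tuples, variables are strings starting with "?", and the function names, the ex: properties and the data are all invented. It needs Python 3.8 or later.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_triple(pattern, triple, binding):
    """Unify one pattern with one triple; return the extended binding or None."""
    new = dict(binding)
    for want, got in zip(pattern, triple):
        if is_var(want):
            if new.setdefault(want, got) != got:
                return None
        elif want != got:
            return None
    return new

def locate(graph, patterns):
    """All bindings that satisfy every pattern (conjunctive match)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := match_triple(pattern, t, b)) is not None]
    return solutions

def extract(graph, solutions, optional_patterns):
    """Try each optional pattern per solution; keep the solution if it fails."""
    for pattern in optional_patterns:
        extended = []
        for b in solutions:
            matches = [b2 for t in graph
                       if (b2 := match_triple(pattern, t, b)) is not None]
            extended.extend(matches or [b])
        solutions = extended
    return solutions

def present(solutions, template):
    """Fill an RDF template per solution, skipping triples with unbound variables."""
    out = []
    for b in solutions:
        for pattern in template:
            triple = tuple(b.get(t) if is_var(t) else t for t in pattern)
            if None not in triple:
                out.append(triple)
    return out

# Hypothetical data and query.
graph = [
    ("book1", "dc:title", "RDF Primer"),
    ("book1", "dc:date", "2003"),
    ("book2", "dc:title", "Untitled Notes"),
]
located = locate(graph, [("?b", "dc:title", "?t")])
extended = extract(graph, located, [("?b", "dc:date", "?d")])
result = present(extended, [("?b", "ex:label", "?t"), ("?b", "ex:year", "?d")])
```

Note that extract never removes a solution produced by locate; it can only add variables. Dropping template triples with unbound variables in present is one possible choice, not the only one.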

Example:

In a system like cwm, the left-hand side of log:implies is the locate part, the right-hand side is the present part. There is no extract (optional bindings).

Similarly in RDQL and Inkling, the locate part is the WHERE clause, the present part is the SELECT clause (not RDF) and there is no extract part.

DQL does have this optionality through the 'may bind' variables.

Issue: this implicit model of query execution may be limited in that optional variable bindings in the extract part do not take part in the locate process.

What use cases does this miss?

Is it compatible with a path-like view of query?

I'm wondering if DataBinding between your program's native data format and an RDF graph would be considered part of the presentation, or a stage beyond the presentation. (Since it's not necessarily a part of a query - it could just be part of serializing and deserializing.)

-- LionKimbro DateTime(2004-07-04T23:56:36)