author: Dan Brickley
This document consists of my raw notes from a meeting at ILRT, Bristol of several RDF / Semantic Web developers, where we discussed approaches to RDF query language design and implementation. These notes should not be considered minutes or a formal record, and have not been reviewed by the other participants.
Libby Miller; Andy Seaborne; Jeremy Carroll (?sp); Dan Brickley; Brian McBride; Dave Beckett; Jan Grant;
Libby and Andy originally arranged to chat; as several of the rest of us were meeting for an RDF Core telecon, this expanded somewhat into an ad-hoc 'semantic web southwest' discussion and lunch.
Nearby: Libby and Andy's issue list | W3C Semantic Web Development | Inkling | RDFdb | Redland | Jena | W3C Perllib | Enabling Inference paper | Semantic Web South-West mailing list (bristol RDF discussions) | RDF IG
andy:
goal is to model workflows explicitly; want a big soup of shared information. An
architecture for sharing disparate information.
examples: Tesco recently gave suppliers access to point of sale info,
better flow of info. Tesco charge for EPOS. takes lots of managers out of
the decision loop. Want this style of operation for b2b, hence rdf
interest.
matter of organising, want a more 'open'(?) system than xml (eg. freely
annotational). Using RDF for modelling. Businesses like absolutes, want
true/false etc w.r.t. models, no woolliness.
re query, want to be able to get at all this information; data query
against the graphs.
looked around, found squish.
libby:
been working on rdf for a while now, a few databases that can handle rdf.
We were working on a fairly low level API, node/arc interface. So next
step was to make it as easy as possible to use; keep it understandable for
the kind of humans who're used to scripting. Focus on what it looks like
to use in apps, rather than optimised implementation.
have found it much easier to prototype apps.
question now: do we want ORs as well as ANDs?
do we want transitive closure a la RQL etc...
nail down a syntax...
problems with syntax;
no inverted commas around literals
sql like constraints '<' etc., reflect into rdf relations
working on relation to sql datatypes (for SQL-backed storage).
jeremy:
in design of a query language, can ignore WG
andy: you talked about OR queries, what use cases?
libby: this came up a while back. Apps can build things with lots of
little queries or one big one
jan: we're doing record retrieval, rdf as an encoding protocol
we use udc, library of congress etc. And I want to be able to say
"give me udc=101.10 or a dcc equiv or...". I don't want to find the
conjunction...
jeremy: looks more complicated! user query may be "...or some equivalent"
rather than have the user compose it all explicitly.
jan: we treat it like stemming...
andy: identical problem with UDDI, since multiple ways of classifying
industries etc. And we want to reflect the hierarchies.
need to do translation, query rewrites etc.
UDDI thinking of OR/AND
jeremy: implicit OR in rdf query anyway...
jan: if you're looking for bits of graph, and two sub-constraints are
independently met but don't join, you get cartesian product
jan: you need some boolean capability, (which SQL does have...)
...seems very useful. Output is a table of variable bindings,
very simple.
for general rdf query, you're going to want the rdf view too
danbri: some heritage here, w.r.t guha's implementation and the enabling
inference paper: we say there are two views, as a blob of rdf, or a table
of bindings.
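A minimal sketch of the two views mentioned here (a table of bindings, or a blob of rdf), and of the cartesian product jan describes when sub-constraints don't join. It is written in Python purely for illustration; the triples, property names and function names are made up and this is not Squish or any of the implementations discussed.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def extend(pattern: Triple, triple: Triple, binding: Binding):
    """Return `binding` extended so `pattern` matches `triple`, else None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant term does not match
    return new


def bindings(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Table view: one row per way of matching all patterns at once."""
    rows: List[Binding] = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in graph:
                new = extend(pattern, triple, row)
                if new is not None:
                    next_rows.append(new)
        rows = next_rows
    return rows


def subgraph(patterns: List[Triple], rows: List[Binding]) -> Set[Triple]:
    """RDF view: the triples obtained by filling each pattern from each row."""
    return {tuple(row.get(t, t) for t in p) for row in rows for p in patterns}


graph = [
    ("doc1", "dc:subject", "maths"), ("doc1", "dc:creator", "alice"),
    ("doc2", "dc:subject", "maths"), ("doc2", "dc:creator", "bob"),
]
joined = [("?d", "dc:subject", "maths"), ("?d", "dc:creator", "?who")]
unjoined = [("?d", "dc:subject", "maths"), ("?e", "dc:creator", "?who")]

table = bindings(joined, graph)      # patterns share ?d: 2 rows
product = bindings(unjoined, graph)  # nothing shared: 2 x 2 = 4 rows
blob = subgraph(joined, table)       # the matched triples as a small graph
```

The same query gives either view: `table` is what a JDBC-style caller would iterate over, `blob` is what an RDF API caller would get back as a model.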
brian: a query, you can get a list of statements, or a model
could extend for squish-like query
danbri: but query is
brian: jena query interface is pretty abstract, you pass in a selector
object...
...
danbri: why use jena query interface instead of say JDBC
brian: perhaps we're t
..query returns a model,
are you proposing that they don't see an rdf api
dan: yep
brian: could do that (libby implemented much of this)
andy: how long can we keep up the illusion
jan: so long as the 'sql' queries begin 'SELECT' :)
andy: brian suggested putting query engine inside jena; we need to talk
about interface to ultimate store that's answering the query
brian: depends what you mean by inside
currently query engine going through jena public apis
...
query processor is going to want to have some internal knowledge
of how internal store is structured
jeremy: sounds like you're proposing something different
danbri: was thinking of a flag, eg a java interface, saying "I'm capable
of eating these queries whole, don't hammer my dumb rdf graph apis"
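A rough sketch of that "flag" idea, assuming a Python `Protocol` in place of a Java interface. `EatsQueriesWhole`, `PlainStore`, `CleverStore` and `run` are hypothetical names, not part of Jena or any other API mentioned in these notes. The engine hands the whole query to a store that advertises the capability, and otherwise falls back to hammering the triple-matching call.

```python
from typing import Dict, List, Optional, Protocol, Tuple, runtime_checkable

Triple = Tuple[str, str, str]


@runtime_checkable
class EatsQueriesWhole(Protocol):
    """Hypothetical marker interface: the store answers whole queries itself."""

    def answer(self, patterns: List[Triple]) -> List[Dict[str, str]]: ...


class PlainStore:
    """Only offers triple matching; None acts as a wildcard."""

    def __init__(self, triples: List[Triple]):
        self.triples = triples
        self.find_calls = 0

    def find(self, s: Optional[str], p: Optional[str], o: Optional[str]):
        self.find_calls += 1
        return [t for t in self.triples
                if all(q is None or q == v for q, v in zip((s, p, o), t))]


def evaluate_by_triples(store, patterns):
    """Generic engine: one find() call per pattern per partial row."""
    rows = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            # Bound variables become constants, unbound ones become wildcards.
            probe = [row.get(t) if t.startswith("?") else t for t in pattern]
            for triple in store.find(*probe):
                new = dict(row)
                if all(new.setdefault(t, v) == v
                       for t, v in zip(pattern, triple) if t.startswith("?")):
                    next_rows.append(new)
        rows = next_rows
    return rows


class CleverStore(PlainStore):
    """Stands in for a store (say, SQL-backed) with its own query machinery."""

    def __init__(self, triples):
        super().__init__(triples)
        self.whole_queries = 0

    def answer(self, patterns):
        self.whole_queries += 1
        return evaluate_by_triples(self, patterns)  # placeholder for a native plan


def run(store, patterns):
    if isinstance(store, EatsQueriesWhole):
        return store.answer(patterns)  # hand the query over in one piece
    return evaluate_by_triples(store, patterns)


data = [("a", "knows", "b"), ("b", "knows", "c")]
query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
plain, clever = PlainStore(data), CleverStore(data)
plain_rows = run(plain, query)
clever_rows = run(clever, query)
```

Both stores give the same rows; the difference is only in who does the work, which is the point of the flag.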
jan: a lot of similar languages around, eg. guhas, ericp's
brian: what do we hope to get out of this...
what did andy want to get...?
jeremy: [takes to whiteboard]
looked at regular expression queries, eg.
subclass / subclass queries. Can imagine a regex language
for queries, impl. by finite state automata.
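One way to read "regex language implemented by finite state automata": walk the graph and the automaton together, moving to a new (node, state) pair whenever an arc's property matches a transition. The sketch below assumes a hand-built automaton for `subClassOf+`; the transition-table encoding and the function name are illustrative assumptions, not what was drawn on the whiteboard.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def path_query(graph: List[Triple], start: str,
               transitions: Dict[Tuple[int, str], Set[int]],
               start_state: int, accepting: Set[int]) -> Set[str]:
    """Nodes reachable from `start` by a property path the automaton accepts."""
    seen = {(start, start_state)}
    queue = deque(seen)
    found: Set[str] = set()
    while queue:
        node, state = queue.popleft()
        for s, p, o in graph:
            if s != node:
                continue
            for nxt in transitions.get((state, p), ()):
                if (o, nxt) not in seen:  # each (node, state) visited once
                    seen.add((o, nxt))
                    queue.append((o, nxt))
                    if nxt in accepting:
                        found.add(o)
    return found


# Automaton for "one or more rdfs:subClassOf arcs".
SUBCLASS_PLUS = {
    (0, "rdfs:subClassOf"): {1},
    (1, "rdfs:subClassOf"): {1},
}
graph = [
    ("dog", "rdfs:subClassOf", "mammal"),
    ("mammal", "rdfs:subClassOf", "animal"),
    ("dog", "rdfs:label", "Dog"),
]
ancestors = path_query(graph, "dog", SUBCLASS_PLUS, 0, {1})
# ancestors == {"mammal", "animal"}
```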
andy: you needn't be querying just triples,
danbri (interrupting): squish queries needn't be against concretely stored
triples; there might be implied arcs
andy: languages, APIs, engines,...
query representation goes between languages/api and engines
eg. (?x ?y ?z) "these are just operators". Query engine needn't
know anything about rdf.
[P ? P 2]
like sql in that you don't get to manipulate the tables
you'll end up with generators
that only create triples when you ask them certain questions
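A small sketch of such a generator, assuming a made-up `ex:lessThan` property (echoing the earlier point about reflecting '<' into rdf relations). It cannot enumerate its triples, so it stays silent unless both ends are supplied; the names and the `find(s, p, o)` convention with `None` as a wildcard are assumptions for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Finder = Callable[[Optional[str], Optional[str], Optional[str]], Iterator[Triple]]


def stored(triples: List[Triple]) -> Finder:
    """Ordinary source: yields the concretely stored triples that match."""
    def find(s, p, o):
        for t in triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t
    return find


def less_than(s, p, o):
    """Generator source: creates an ex:lessThan triple only when asked about it."""
    if p != "ex:lessThan" or s is None or o is None:
        return  # cannot enumerate every pair of numbers
    try:
        holds = float(s) < float(o)
    except ValueError:
        return  # not numeric literals
    if holds:
        yield (s, p, o)


def combined(*finders: Finder) -> Finder:
    """Query engine sees one source; it need not know which arcs are stored."""
    def find(s, p, o):
        for finder in finders:
            yield from finder(s, p, o)
    return find


find = combined(stored([("item1", "ex:price", "3")]), less_than)
yes = list(find("3", "ex:lessThan", "5"))    # one triple, created on demand
no = list(find("5", "ex:lessThan", "3"))     # nothing
open_ended = list(find(None, "ex:lessThan", "5"))  # nothing: question too open
```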
brian: you could have an inferencing query processor that does general
logic stuff; or you could have something with limited smarts, eg.
can do subclass / subproperty stuff.
worth looking at what the xml query guys have done
jan: yes, needn't be too constrained by sql
eg want to be able to get little bits of subgraph back, eg.
find me everything with mathematics as its dc:subject and then
some part of the graph nearby
andy: do we do this in query language, or elsewhere in the pipeline?
rather than as per sql have it as a huge dumping ground for all
this functionality?
sql implementations aren't particularly interoperable
jeremy: need for a query optimiser. doing mathematical ops on filtering,
generation stage.
andy: create it, generate access plan, execute it.
jeremy: but access plan etc needs to sit down in the engine
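To make "generate access plan" concrete, here is a deliberately naive sketch: order the triple patterns greedily, preferring the one with the fewest still-unbound variables and, as a tie-break, the fewest candidate matches. The cost measure and the name `make_plan` are assumptions for illustration; a real engine would use statistics from the store, which is jeremy's point about the plan living down in the engine.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def make_plan(patterns: List[Triple], graph: List[Triple]) -> List[Triple]:
    """Return the patterns in the order a simple engine might execute them."""
    remaining = list(patterns)
    bound = set()  # variables that earlier steps of the plan will have bound
    plan = []
    while remaining:
        def cost(pattern):
            unbound = sum(1 for t in pattern
                          if t.startswith("?") and t not in bound)
            matches = sum(1 for triple in graph
                          if all(a.startswith("?") or a == b
                                 for a, b in zip(pattern, triple)))
            return (unbound, matches)

        best = min(remaining, key=cost)
        remaining.remove(best)
        plan.append(best)
        bound.update(t for t in best if t.startswith("?"))
    return plan


graph = [
    ("p1", "rdf:type", "Person"),
    ("p2", "rdf:type", "Person"),
    ("p1", "foaf:mbox", "mailto:someone@example.org"),
]
patterns = [
    ("?x", "rdf:type", "?c"),
    ("?x", "foaf:mbox", "mailto:someone@example.org"),
]
plan = make_plan(patterns, graph)
# The selective mbox pattern is scheduled first, then the type lookup.
```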
brian: question re libby's squish implementation, does it map onto this
picture.
libby: probably!
syntax (in flux), a parser, parsed into an in-memory query object
some machinery for passing queries to a query engine, using JDBC API
once we've got a query representation, the engine knows about a
datasource and (currently) sits on top of a Jena, SiRPAC or any
other triples matching API. Then extra constraints sit on top.
What we're trying to do now: pass the query representation through to,
e.g., SQL stores.
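A sketch of what passing the query representation through to an SQL store might look like, assuming a single `triples(s, p, o)` table (a hypothetical schema, not the one any implementation here uses) and Python's `sqlite3` standing in for JDBC. Each triple pattern becomes one alias of the table; a repeated variable becomes a join condition and a constant becomes a bound parameter.

```python
import sqlite3
from typing import List, Tuple

Triple = Tuple[str, str, str]


def to_sql(select_vars: List[str], patterns: List[Triple]):
    """Translate a conjunctive query into one SELECT over a triples table."""
    first_seen = {}  # variable -> column where it first appears
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    where.append(f"{ref} = {first_seen[term]}")  # join
                else:
                    first_seen[term] = ref
            else:
                where.append(f"{ref} = ?")  # constant, passed as a parameter
                params.append(term)
    select = ", ".join(first_seen[v] for v in select_vars)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("_:a", "foaf:mbox", "mailto:someone@example.org"),
    ("_:a", "foaf:name", "Someone"),
    ("_:b", "foaf:name", "Other"),
])
sql, params = to_sql(["?name"], [
    ("?x", "foaf:mbox", "mailto:someone@example.org"),
    ("?x", "foaf:name", "?name"),
])
rows = conn.execute(sql, params).fetchall()
```

The whole conjunction goes to the database as one statement, so the SQL optimiser gets to choose the join order rather than the triple-matching layer.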
andy: this isn't dissimilar, except i don't allow multiple databases,
and these can sit behind a composite
dan: ?
dave: want multiple datasource, eg for views
jeremy: should there be explicit support in query language for asking
against multiple sources? or application does this?
andy: sql has this explicitly or implicitly
jan: concern, if you base a query engine on top of fundamental primitives
if you do this atop
andy: same gruntwork, we're arguing about which layer
danbri: example, if we have things eg like the Google backlinks service,
we want the query engine layer, not a triple-layer, thing to do
the reasoning about which thing to ask first.
jan: almost missed brian's point; sometimes you want to pull out a bit of
a graph...
jeremy: result set needs to be able to return resources, particularly anon
resources, we actually want to return a java object
jan: anon resources
danbri: we need to know that the database didn't have a URI name for a resource
danbri: in my perl api, a node can switch out the database that represents
it, in jena?
brian: jena binds the java objects more closely to their describing database
...envisage being able to smush so we can use email addresses etc
jeremy: want this as a view, eg. that a query is against a db that does smushing
danbri: is this params in the query language, or is it a service
discovery, description layer
dave: want flags for 'this does smushing' etc
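For readers unfamiliar with "smushing": merging nodes (often anonymous ones) that share a value for an identifying property such as an email address. A minimal sketch, assuming `foaf:mbox` is the identifying property and using a small union-find; the function name and the data are illustrative, and this says nothing about how Jena or the Perl API would do it.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def smush(triples: List[Triple],
          identifying: Iterable[str] = ("foaf:mbox",)) -> Set[Triple]:
    """Merge nodes that share a value for any identifying property."""
    identifying = set(identifying)
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Follow merge links to the representative node.
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    first_with: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        if p in identifying:
            other = first_with.setdefault((p, o), s)
            a, b = find(other), find(s)
            if a != b:
                parent[b] = a  # merge s into the node first seen with this value

    # Rewrite every triple in terms of the representative nodes.
    return {(find(s), p, find(o)) for s, p, o in triples}


merged = smush([
    ("_:a", "foaf:mbox", "mailto:someone@example.org"),
    ("_:a", "foaf:name", "Dan"),
    ("_:b", "foaf:mbox", "mailto:someone@example.org"),
    ("_:b", "foaf:knows", "_:c"),
])
# _:a and _:b collapse into one node carrying both the name and the knows arc.
```

Whether a database does this is exactly the kind of thing a 'this does smushing' flag would advertise, since it changes the answers a query gets.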
jan: one way to proceed, look at the kinds of queries we want to answer
eg. if i'm hacking away with a triples api...
andy: that's where i got to with this
danbri: shows the power of the model
jeremy: lack of transitive closure, won't the onto folk dislike this?
jan: yes, onto need that.
from clause does part of this, but
dan: can't the service do this (trans closure etc) anyway
[...]
brian: suppose i have a different view...
if i do a query 'is x a subclass of y' i can have a language to
express that. but i can also have a language for adding rules.
isn't there advantage in doing this separately?
do the rules thing in a separate language
danbri: yup, that's my view
andy: we want it to spit out rdf and eat rdf, for composability
jeremy: a new rdf model that depends on a query against the old rdf model;
so we could do it as a virtual implementation
dave: that's how i planned to do it
brian: i prefer the virtual implementation
danbri: may give more basis for fancy rewriting
dave: yes, do it as views
brian: query against multiple things that are themselves views
then optimise it down
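A sketch of the "virtual implementation" idea: a view is an rdf model whose triples are computed, on demand, from a query against another model, and views can be stacked. `Stored`, `View` and the `ex:` properties are hypothetical names. Nothing is optimised here; each view simply re-runs its query when asked, which is the naive version of what would later be optimised down.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def match(patterns: List[Triple], triples: List[Triple]) -> List[Dict[str, str]]:
    """Conjunctive matching: rows of variable bindings satisfying all patterns."""
    rows: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in triples:
                new = dict(row)
                if all(new.setdefault(a, b) == b if a.startswith("?") else a == b
                       for a, b in zip(pattern, triple)):
                    next_rows.append(new)
        rows = next_rows
    return rows


class Stored:
    """A concrete model: just a list of triples."""

    def __init__(self, data: List[Triple]):
        self.data = data

    def triples(self) -> Iterator[Triple]:
        return iter(self.data)


class View:
    """A virtual model defined by a query (where) and a template (construct)."""

    def __init__(self, base, where: List[Triple], construct: List[Triple]):
        self.base, self.where, self.construct = base, where, construct

    def triples(self) -> Iterator[Triple]:
        base = list(self.base.triples())  # evaluated only when asked
        for row in match(self.where, base):
            for template in self.construct:
                yield tuple(row.get(t, t) for t in template)


store = Stored([("doc1", "dc:creator", "_:p"), ("_:p", "foaf:name", "Libby")])
names = View(store,
             [("?d", "dc:creator", "?p"), ("?p", "foaf:name", "?n")],
             [("?d", "ex:creatorName", "?n")])
wrote = View(names,  # a view over a view
             [("?d", "ex:creatorName", "?n")],
             [("?n", "ex:wrote", "?d")])

before = set(wrote.triples())
store.data.append(("doc2", "dc:creator", "_:p"))
after = set(wrote.triples())  # the new document shows up without any rebuild
```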
jeremy: that's disgusting! i mean, really confusing to use...
brian: I have some working code that does this
danbri: for query language end
brian: for SQL
andy: w.r.t hinting being passed down, pass constraints,
dave: conjunctive query ORs etc., more than just sequence of triples
andy: transitive closure takes you outside relational algebra
dave: it would be an rdf model
dan: what do you mean 'an rdf model'
dave: 'written in' rdf
andy: do you have examples of regular expressions?
jan: example...
sc ---subprop-->subclassOf
a--sc-->b
you can't express this in regular expressions
if i want to find all As that are subclass of Bs
brian: double transitive closure
jan: want pieces of regular expressions; a simple regex engine won't
get there further
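jan's example worked through as a sketch, assuming the usual reading of rdfs:subPropertyOf: since `sc` is a subproperty of subClassOf, the arc `a --sc--> b` counts as a subclass arc. Finding superclasses then needs two closures, one over subPropertyOf to learn which properties count, and one over those properties. The extra triple `b subClassOf c` and the function names are added for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def closure(graph: List[Triple], start: str, props: Set[str],
            inverse: bool = False) -> Set[str]:
    """Nodes reachable from `start` in one or more steps along arcs in `props`."""
    found: Set[str] = set()
    frontier = {start}
    while frontier:
        step = set()
        for s, p, o in graph:
            if p not in props:
                continue
            src, dst = (o, s) if inverse else (s, o)
            if src in frontier and dst not in found:
                step.add(dst)
        found |= step
        frontier = step
    return found


def superclasses(graph: List[Triple], cls: str) -> Set[str]:
    # First closure: every property that is a subproperty of rdfs:subClassOf.
    props = closure(graph, "rdfs:subClassOf", {"rdfs:subPropertyOf"}, inverse=True)
    props.add("rdfs:subClassOf")
    # Second closure: follow any of those properties upwards from cls.
    return closure(graph, cls, props)


graph = [
    ("sc", "rdfs:subPropertyOf", "rdfs:subClassOf"),
    ("a", "sc", "b"),
    ("b", "rdfs:subClassOf", "c"),
]
supers = superclasses(graph, "a")
# supers == {"b", "c"}
```

A path expression fixed in advance over the single label subClassOf would miss the `sc` arc, because which labels count is itself something that has to be looked up in the graph.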
jan: "i'm not sure you'll always want to do this sort of thing, but
sometimes you might; eg. query language optimisation"
danbri: what about eg DAML, or RDF versioning
jeremy: ugly brilliant example :)
danbri: [ rambling account of origins of rdfs:subPropertyOf, dc datamodel etc]
jeremy: one regular expression type that I couldn't find many
implementations for...
"connect a resource to others
by alternating pairs of property P and property Q"
something like a cousin relationship
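A sketch of that alternating pattern, read here as "one or more (P, Q) pairs" starting from a resource. The property names `ex:P` and `ex:Q`, the data and the function name are placeholders, and this is only one interpretation of what jeremy described.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def alternating(graph: List[Triple], start: str, p: str, q: str) -> Set[str]:
    """Resources reached from `start` by repeating: one P arc, then one Q arc."""
    reached: Set[str] = set()
    frontier = {start}
    while frontier:
        mids = {o for s, pr, o in graph if s in frontier and pr == p}
        ends = {o for s, pr, o in graph if s in mids and pr == q}
        frontier = ends - reached  # only expand newly reached resources
        reached |= ends
    return reached


graph = [
    ("a", "ex:P", "m1"), ("m1", "ex:Q", "b"),
    ("b", "ex:P", "m2"), ("m2", "ex:Q", "c"),
    ("a", "ex:Q", "z"),  # wrong first step, so z is not reached
]
connected = alternating(graph, "a", "ex:P", "ex:Q")
# connected == {"b", "c"}
```

Only newly reached resources are expanded, so the loop stops even when the data goes round in circles.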
brian: you could go around in circles a long time with these queries :)
jan: particularly in devon...
danbri: is there a value for a simple query language with multiple
implementations
brian: would be a good start
andy: I started this viewing it just as data; writing Jena fragments is a
pain for that kind of application
andy: Something at the squish level, or even "here's a triple plus a
constraint"
libby: just pootling around a graph isn't enough, you want constraints
jeremy:
--
dan: swcg / daml+oil being query
dave: what's scope of a simple query language
andy: some sql level thing, begins with 'select'
andy: my personal style, begin with something simple and done
dave: start at the bottom
dan: send a string to a service, get back a bunch of data
dave: that's too vague to implement
libby: we have a bunch towards this already; issue is the exact sort of
constraints we want to implement
brian: is squish a good start
jan: yeah, there are bunch of operators
you can partition it into two parts
i think you want conjunctive and disjunctive
danbri: conjunctive only is visualisable, explainable to developers etc
andy: let's nail down a throwaway language
dan: jan, can you live without OR?
jeremy: could the query language be AND
but the implementations be OR
you seem to want to do something real simple for practical and
political reasons
danbri: could we rule OR out of scope initially, but have the concrete
syntax allow a place for OR to live syntactically
andy:
you can do the OR by doing it in the constraints section;
the optimiser could be clever
jeremy:
SELECT ?a ?b
FROM ...
WHERE
?a p1 ?b
OR
?a q1 ?c
?c q2 ?b
becomes
AND (?a= ?a1 AND ?b=b2 )
OR (?a=?a2 AND ?b=?b2 )
[some discussion]
jeremy: doesn't work
dave: I would vote for putting in disjunction
jan: requirement is that the variables projected out
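One possible reading of the OR query above, sketched under the assumption that every branch binds all of the selected variables: run each branch as its own conjunctive query, project onto the SELECT variables, and union the rows. This is an illustration in Python with made-up data, not the rewrite jeremy tried and not an agreed semantics.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def match(patterns: List[Triple], graph: List[Triple]) -> List[Dict[str, str]]:
    """Conjunctive matching: rows of variable bindings satisfying all patterns."""
    rows: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in graph:
                new = dict(row)
                if all(new.setdefault(a, b) == b if a.startswith("?") else a == b
                       for a, b in zip(pattern, triple)):
                    next_rows.append(new)
        rows = next_rows
    return rows


def select(variables: List[str], branches: List[List[Triple]],
           graph: List[Triple]) -> Set[Tuple[str, ...]]:
    """Union of the branches, each projected onto the selected variables."""
    result: Set[Tuple[str, ...]] = set()
    for patterns in branches:
        for row in match(patterns, graph):
            # Assumes every branch binds every selected variable.
            result.add(tuple(row[v] for v in variables))
    return result


graph = [
    ("x", "p1", "y"),
    ("u", "q1", "m"), ("m", "q2", "v"),
    ("u", "q1", "n"),  # no q2 arc from n, so this path contributes nothing
]
branches = [
    [("?a", "p1", "?b")],                        # ?a p1 ?b
    [("?a", "q1", "?c"), ("?c", "q2", "?b")],    # OR: ?a q1 ?c, ?c q2 ?b
]
answers = select(["?a", "?b"], branches, graph)
# answers == {("x", "y"), ("u", "v")}
```

Note that ?c never reaches the result: it exists only inside the second branch, which is why projection has to happen per branch before the union.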
andy: i'd like something simple for starters
disjunction likely out...
dave: does not simple include views, regex, typing, classes etc
danbri: you're asking questions, getting answers:
Q:"Is fido a mammal?"
A: yes (clever service)
B: no (dumb service)
punts work onto service description
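The fido example as a sketch: two services hold the same data and accept the same question, but only one of them follows rdfs:subClassOf. The function names and the data are invented for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

DATA: List[Triple] = [
    ("fido", "rdf:type", "Dog"),
    ("Dog", "rdfs:subClassOf", "Mammal"),
]


def dumb_service(graph: List[Triple], thing: str, cls: str) -> bool:
    """Answers only from explicitly stored rdf:type triples."""
    return (thing, "rdf:type", cls) in graph


def clever_service(graph: List[Triple], thing: str, cls: str) -> bool:
    """Also follows rdfs:subClassOf upwards from each stored type."""
    todo = [o for s, p, o in graph if s == thing and p == "rdf:type"]
    seen = set()
    while todo:
        current = todo.pop()
        if current == cls:
            return True
        if current in seen:
            continue
        seen.add(current)
        todo.extend(o for s, p, o in graph
                    if s == current and p == "rdfs:subClassOf")
    return False


dumb_answer = dumb_service(DATA, "fido", "Mammal")      # False
clever_answer = clever_service(DATA, "fido", "Mammal")  # True
```

The query is identical in both cases; what differs is the service, so the difference has to be described somewhere other than the query language.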
andy/jeremy: hmm, danger of being pseudo-compatible a la JDBC
j: this is fairly persuasive partitioning
dave: who'll write this up
libby: I'd like to write up description of existing
resolved: we're happy talking on a public egroups list for Bristol RDF/SW
geeks
action: dan set up mailing list
dan: libby, will you summarise query languages
l: yup
andy: where are the uncertainties in the spec?
l: commas, literals not in inverted commas,
shortened version of URIs.
geoff's use of [] around full URIs instead of ::
brian: random suggestion, try using same convention as n-triple
(put language as part of the literal... ;-)