RDF Query Discussion notes, 2001-07-07

author: Dan Brickley

Overview

This document consists of my raw notes from a meeting at ILRT, Bristol of several RDF / Semantic Web developers, where we discussed approaches to RDF query language design and implementation. These notes should not be considered minutes or a formal record, and have not been reviewed by the other participants.

Participants

Libby Miller; Andy Seaborne; Jeremy Carroll (?sp); Dan Brickley; Brian McBride; Dave Beckett; Jan Grant;

Context

Libby and Andy originally arranged to chat; as several of the rest of us were meeting for an RDF Core telecon, this expanded into an ad hoc 'Semantic Web South-West' discussion and lunch.

Nearby: Libby and Andy's issue list | W3C Semantic Web Development | Inkling | RDFdb | Redland | Jena | W3C Perllib | Enabling Inference paper | Semantic Web South-West mailing list (bristol RDF discussions) | RDF IG

Raw Discussion Notes

andy:
goal is to model workflows explicitly; want a big soup of shared information. An
architecture for sharing disparate information. 

examples: Tesco recently gave suppliers access to point-of-sale info,
better flow of info. Tesco charge for EPOS. Takes lots of managers out of
the decision loop. Want this style of operation for B2B, hence RDF
interest.

matter of organising; want a more 'open'(?) system than XML (e.g. freely
annotatable). Using RDF for modelling. Businesses like absolutes, want
true/false etc. w.r.t. models, no woolliness.

re query, want to be able to get at all this information; data query
against the graphs. 

looked around, found squish.

libby:

been working on rdf for a while now, a few databases that can handle rdf.
We were working on a fairly low-level API, a node/arc interface. So the next
step was to make it as easy as possible to use; keep it understandable for
the kind of humans who're used to scripting. Focus on what it looks like
to use in apps, rather than optimised implementation. 

have found it much easier to prototype apps.

    question now: do we want OR as well as AND?

    do we want transitive closure a la RQL etc...

    nail down a syntax...


problems with syntax;
    no inverted commas around literals

    sql like constraints '<' etc., reflect into rdf relations

working on relation to sql datatypes (for SQL-backed storage).

jeremy:
    in design of a query language, can ignore WG


andy: you talked about OR queries, what use cases?

libby: this came up a while back. Apps can build things with lots of
    little queries or one big one

jan:    we're doing record retrieval, RDF as an encoding protocol.
    We use UDC, Library of Congress etc. And I want to be able to say
    'give me udc=101.10 or a DDC equiv or...'. I don't want to find the
    conjunction...

jeremy: looks more complicated! user query may be "...or some equivalent"
rather than have the user compose it all explicitly.

jan:    we treat it like stemming...


andy:   identical problem with UDDI, since there are multiple ways of classifying
    industries etc. And we want to reflect the hierarchies.

    need to do translation, query rewrites etc.

    UDDI thinking of OR/AND

jeremy: implicit OR in rdf query anyway...

jan:    if you're looking for bits of a graph, and two sub-constraints are
    independently met but don't join, you get a cartesian product


jan: you need some boolean capability, (which SQL does have...)

    ...seems very useful. Output is a table of variable bindings,
    very simple.

    for general rdf query, you're going to want the rdf view too
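A minimal Python sketch of the bindings-table output and of Jan's cartesian-product point. The helpers `match` and `query` and the sample triples are hypothetical, not Squish's or Jena's code. Terms starting with `?` are treated as variables. Patterns that share a variable join. Patterns that share none multiply.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms beginning with '?' are variables; anything else must match exactly.
    """
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return new


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Conjunctive (AND-only) query: every pattern must match."""
    rows: List[Binding] = [{}]
    for pattern in patterns:
        rows = [new for row in rows for triple in graph
                if (new := match(pattern, triple, row)) is not None]
    return rows


graph = [
    ("doc1", "dc:subject", "mathematics"),
    ("doc2", "dc:subject", "physics"),
    ("alice", "foaf:name", "Alice"),
    ("bob", "foaf:name", "Bob"),
]

# The patterns share ?d, so they join: one row.
joined = query(graph, [("?d", "dc:subject", "mathematics"), ("?d", "dc:subject", "?s")])

# The patterns share no variable: each subject row pairs with each name row.
product = query(graph, [("?d", "dc:subject", "?s"), ("?p", "foaf:name", "?n")])
print(len(joined), len(product))  # 1 4
```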


danbri: some heritage here, w.r.t. Guha's implementation and the Enabling
Inference paper: we say there are two views, as a blob of RDF, or as a table
of bindings.
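The two views danbri describes can be illustrated over the same solutions. They can be returned as rows, or substituted into a triple template to give a small graph. `as_table`, `as_graph` and the `ex:about` property are invented for this sketch.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def as_table(rows: List[Binding], columns: List[str]) -> List[Tuple[str, ...]]:
    """The 'table of bindings' view: one tuple per solution, in column order."""
    return [tuple(row[c] for c in columns) for row in rows]


def as_graph(rows: List[Binding], template: List[Triple]) -> Set[Triple]:
    """The 'blob of RDF' view: substitute each solution into a triple template."""
    out: Set[Triple] = set()
    for row in rows:
        for pattern in template:
            out.add(tuple(row[t] if t.startswith("?") else t for t in pattern))
    return out


rows = [{"?d": "doc1", "?s": "mathematics"}, {"?d": "doc3", "?s": "mathematics"}]
print(as_table(rows, ["?d", "?s"]))
print(sorted(as_graph(rows, [("?d", "ex:about", "?s")])))
```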

brian: a query, you can get a list of statements, or a model
    could extend for squish-like query

danbri: but query is 

brian:  jena query interface is pretty abstract, you pass in a selector
    object...

...


danbri: why use jena query interface instead of say JDBC

brian: perhaps we're t

    ...query returns a model,

    are you proposing that they don't see an rdf api?

dan: yep

brian: could do that (libby implemented much of this)

andy: how long can we keep up the illusion

jan: so long as the 'sql' queries begin 'SELECT'  :)

andy: brian suggested putting query engine inside jena; we need to talk
    about interface to ultimate store that's answering the query

brian: depends what you mean by inside
    currently query engine going through jena public apis
    
    ...
    query processor is going to want to have some internal knowledge
    of how internal store is structured



jeremy: sounds like you're proposing something different

danbri: was thinking of a flag, eg a java interface, saying "I'm capable
    of eating these queries whole, don't hammer my dumb rdf graph apis"
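The flag danbri suggests can be sketched in Python as a stand-in for the Java interface he mentions. A store that implements a hypothetical `WholeQueryCapable` marker receives the whole query. Any other store is driven with one `find()` call per pattern. All class and method names here are illustrative, not Jena's API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


class TripleSource(ABC):
    """The 'dumb' graph API: match one triple pattern at a time."""

    @abstractmethod
    def find(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> Iterator[Triple]:
        """Yield stored triples; None is a wildcard."""


class WholeQueryCapable(ABC):
    """Hypothetical marker: 'I'm capable of eating these queries whole'."""

    @abstractmethod
    def answer(self, patterns: List[Triple]) -> List[Binding]: ...


class ListSource(TripleSource):
    def __init__(self, triples: List[Triple]):
        self.triples = list(triples)
        self.find_calls = 0

    def find(self, s, p, o):
        self.find_calls += 1
        for t in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t


def join_via_find(source: TripleSource, patterns: List[Triple]) -> List[Binding]:
    """Fallback: one find() call per pattern per partial solution."""
    rows: List[Binding] = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            bound = [row.get(t, t) for t in pattern]  # substitute known variables
            wildcards = [None if t.startswith("?") else t for t in bound]
            for triple in source.find(*wildcards):
                new = dict(row)
                if all(new.setdefault(t, v) == v
                       for t, v in zip(bound, triple) if t.startswith("?")):
                    next_rows.append(new)
        rows = next_rows
    return rows


class SmartStore(ListSource, WholeQueryCapable):
    def __init__(self, triples):
        super().__init__(triples)
        self.whole_queries = 0

    def answer(self, patterns):
        self.whole_queries += 1
        # Stand-in for e.g. translating the whole query into one SQL statement;
        # here it evaluates over a private copy without calling self.find().
        return join_via_find(ListSource(self.triples), patterns)


def run(source: TripleSource, patterns: List[Triple]) -> List[Binding]:
    if isinstance(source, WholeQueryCapable):
        return source.answer(patterns)  # hand the query down in one piece
    return join_via_find(source, patterns)


triples = [("doc1", "dc:creator", "jan"), ("jan", "foaf:name", "Jan Grant")]
patterns = [("?d", "dc:creator", "?c"), ("?c", "foaf:name", "?n")]
dumb, smart = ListSource(triples), SmartStore(triples)
same = run(dumb, patterns) == run(smart, patterns)
print(same, dumb.find_calls, smart.find_calls, smart.whole_queries)
```

In this sketch the engine checks the flag rather than the caller. Applications therefore issue the same query either way.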

jan:    a lot of similar languages around, e.g. Guha's, EricP's

brian:  what do we hope to get out of this...

    what did andy want to get...?


jeremy: [takes to whiteboard]
    
    looked at regular expression queries, eg.
    subclass / subclass queries. Can imagine a regex language
    for queries, impl. by finite state automata.
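Jeremy's idea can be sketched by pairing each graph node with an automaton state and searching breadth-first. This sketch rests on some assumptions. The automaton is hand-built rather than compiled from a regex string. `regular_path` and the fido/Dog data are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
# Automaton transitions: (state, property) -> set of next states.
NFA = Dict[Tuple[int, str], Set[int]]


def regular_path(graph: Iterable[Triple], nfa: NFA, start: int,
                 accepting: Set[int], source: str) -> Set[str]:
    """Nodes reachable from `source` by a path whose property labels the NFA accepts.

    Breadth-first search over (graph node, automaton state) pairs, so cycles
    in the data terminate.
    """
    edges: Dict[str, list] = {}
    for s, p, o in graph:
        edges.setdefault(s, []).append((p, o))
    seen = {(source, start)}
    queue = deque(seen)
    found: Set[str] = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            found.add(node)
        for prop, target in edges.get(node, []):
            for nxt in nfa.get((state, prop), ()):
                if (target, nxt) not in seen:
                    seen.add((target, nxt))
                    queue.append((target, nxt))
    return found


# Hand-built automaton for the expression  rdf:type rdfs:subClassOf*
nfa = {(0, "rdf:type"): {1}, (1, "rdfs:subClassOf"): {1}}
graph = [
    ("fido", "rdf:type", "Dog"),
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
]
print(sorted(regular_path(graph, nfa, 0, {1}, "fido")))  # ['Animal', 'Dog', 'Mammal']
```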
    

andy: you needn't be querying just triples,

danbri (interrupting): Squish queries needn't be against concretely stored
    triples; there might be implied arcs


andy: languages, APIs, engines,...
    query representation goes between languages/api and engines

    eg. (?x ?y ?z) "these are just operators". Query engine needn't
    know anything about rdf.
    [P ? P 2]

    like sql in that you don't get to manipulate the tables

    you'll end up with generators that only create triples when you ask
    them certain questions
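Andy's "generators that only create triples when you ask them certain questions" can be sketched as a `find()` that is itself a Python generator. It yields stored triples. Only when the question could involve `rdf:type` does it compute implied type arcs from `rdfs:subClassOf`. `InferringSource` and the fido data are hypothetical, and subclass inference is just one example of an implied arc.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


class InferringSource:
    """Stores only explicit triples; implied rdf:type arcs are generated
    on demand, when a find() call could match them."""

    def __init__(self, triples):
        self.triples: Set[Triple] = set(triples)

    def _superclasses(self, cls: str) -> Set[str]:
        seen: Set[str] = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            for s, p, o in self.triples:
                if s == current and p == SUBCLASS and o not in seen:
                    seen.add(o)
                    stack.append(o)
        return seen

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Iterator[Triple]:
        emitted: Set[Triple] = set()
        for t in self.triples:  # explicitly stored triples
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                emitted.add(t)
                yield t
        if p not in (None, TYPE):
            return  # the question cannot involve an implied rdf:type arc
        for x, q, cls in list(self.triples):
            if q == TYPE and s in (None, x):
                for sup in self._superclasses(cls):
                    t = (x, TYPE, sup)
                    if o in (None, sup) and t not in emitted:
                        emitted.add(t)
                        yield t


store = InferringSource([("fido", TYPE, "Dog"), ("Dog", SUBCLASS, "Mammal")])
print(list(store.find("fido", TYPE, "Mammal")))  # [('fido', 'rdf:type', 'Mammal')]
print(("fido", TYPE, "Mammal") in store.triples)  # False: never stored
```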


brian:   you could have an inferencing query processor that does general
    logic stuff; or you could have something with limited smarts, eg.
    can do subclass / subproperty stuff.

    worth looking at what the xml query guys have done

jan:    yes, needn't be constrained by SQL too much

    eg want to be able to get little bits of subgraph back, eg. 
    find me everything with mathematics as its dc:subject and then
    some part of the graph nearby

andy:   do we do this in the query language, or elsewhere in the pipeline,
    rather than, as per SQL, having it as a huge dumping ground for all
    this functionality?
    SQL's implementations aren't particularly interoperable

jeremy: need for a query optimiser. doing mathematical ops on filtering,
    generation stage.

andy:   create it, generate access plan, execute it.
    
jeremy: but access plan etc needs to sit down in the engine
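Andy's "create it, generate access plan, execute it" can be illustrated with just the planning step. The plan is a greedy ordering that starts from the most selective pattern and then prefers patterns that join with variables already bound. The cost model, which counts triples that match a pattern's constants, is a deliberately crude assumption. `plan` is a made-up helper, and a real engine would keep such statistics inside the store, as Jeremy says.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def constant_matches(graph: List[Triple], pattern: Triple) -> int:
    """Crude selectivity estimate: triples matching the pattern's constants."""
    return sum(1 for t in graph
               if all(is_var(q) or q == v for q, v in zip(pattern, t)))


def cost(graph: List[Triple], pattern: Triple, bound: Set[str], started: bool) -> Tuple[int, int]:
    # Prefer patterns connected to already-bound variables (no cartesian
    # product), then the most selective one.
    joins = any(is_var(t) and t in bound for t in pattern)
    return (0 if joins or not started else 1, constant_matches(graph, pattern))


def plan(graph: List[Triple], patterns: List[Triple]) -> List[Triple]:
    """Greedy access plan: the order in which an executor would match patterns."""
    remaining = list(patterns)
    ordered: List[Triple] = []
    bound: Set[str] = set()
    while remaining:
        best = min(remaining, key=lambda pat: cost(graph, pat, bound, bool(ordered)))
        remaining.remove(best)
        ordered.append(best)
        bound.update(t for t in best if is_var(t))
    return ordered


graph = [
    ("d1", "dc:subject", "maths"), ("d2", "dc:subject", "physics"),
    ("d3", "dc:subject", "maths"),
    ("d1", "dc:creator", "jan"), ("d2", "dc:creator", "libby"),
    ("jan", "foaf:mbox", "mailto:jan@example.org"),
    ("libby", "foaf:mbox", "mailto:libby@example.org"),
]
patterns = [
    ("?d", "dc:subject", "?s"),
    ("?d", "dc:creator", "?c"),
    ("?c", "foaf:mbox", "mailto:jan@example.org"),
]
for step in plan(graph, patterns):
    print(step)
```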

brian: question re libby's squish implementation, does it map onto this
    picture.

libby:  probably!
    
    syntax (in flux), a parser, parsed into an in-memory query object

    some machinery for passing queries to a query engine, using JDBC API
    
    once we've got a query representation, the engine knows about a 
    datasource and (currently) sits on top of a Jena, SiRPAC or any
    other triples matching API. Then extra constraints sit on top.

    What we're trying to do now: pass the query representation through to,
    e.g., SQL stores.
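Libby's pipeline might look roughly like the following: a parsed query object, an engine over a triple-matching API, and constraints on top. The sketch skips the parser and the JDBC layer. `Query`, `ListSource` and `execute` are hypothetical names, and constraints are modelled as Python callables rather than parsed expressions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


@dataclass
class Query:
    """In-memory query object, as a parser might produce it."""
    select: List[str]
    where: List[Triple]
    constraints: List[Callable[[Binding], bool]] = field(default_factory=list)


class ListSource:
    """Stand-in for any triple-matching API (Jena, SiRPAC, an SQL store...)."""

    def __init__(self, triples: List[Triple]):
        self.triples = list(triples)

    def find(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> Iterator[Triple]:
        for t in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t


def execute(query: Query, source) -> List[Binding]:
    rows: List[Binding] = [{}]
    for pattern in query.where:  # 1. match triple patterns via find()
        next_rows = []
        for row in rows:
            bound = [row.get(t, t) for t in pattern]
            for triple in source.find(*(None if t.startswith("?") else t for t in bound)):
                new = dict(row)
                if all(new.setdefault(t, v) == v
                       for t, v in zip(bound, triple) if t.startswith("?")):
                    next_rows.append(new)
        rows = next_rows
    # 2. constraints sit on top, filtering complete solutions
    rows = [r for r in rows if all(check(r) for check in query.constraints)]
    # 3. project the selected variables
    return [{v: r[v] for v in query.select} for r in rows]


source = ListSource([
    ("doc1", "dc:title", "RDF query notes"),
    ("doc2", "dc:title", "Lunch menu"),
])
q = Query(select=["?doc"], where=[("?doc", "dc:title", "?title")],
          constraints=[lambda b: "RDF" in b["?title"]])
print(execute(q, source))  # [{'?doc': 'doc1'}]
```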


andy:    this isn't dissimilar, except i don't allow multiple databases,
    and these can sit behind a composite    

dan: ?

dave: want multiple datasource, eg for views

jeremy: should there be explicit support in the query language for asking
    against multiple sources? or does the application do this?

andy: sql has this explicitly or implicitly
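A composite of the kind Andy mentions can hide several datasources behind the same `find()` interface. An engine written against one store then queries the union without changes. This sketch takes the "application does it" side of Jeremy's question. `CompositeSource` and the data are illustrative only.

```python
from itertools import chain
from typing import Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]


class ListSource:
    def __init__(self, triples: List[Triple]):
        self.triples = list(triples)

    def find(self, s=None, p=None, o=None) -> Iterator[Triple]:
        for t in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t


class CompositeSource:
    """Several sources presented as one: find() yields the union of their
    answers, with duplicates suppressed."""

    def __init__(self, *sources):
        self.sources = sources

    def find(self, s=None, p=None, o=None) -> Iterator[Triple]:
        seen: Set[Triple] = set()
        for t in chain.from_iterable(src.find(s, p, o) for src in self.sources):
            if t not in seen:
                seen.add(t)
                yield t


home = ListSource([("danbri", "foaf:name", "Dan Brickley")])
work = ListSource([("danbri", "foaf:name", "Dan Brickley"),
                   ("libby", "foaf:name", "Libby Miller")])
both = CompositeSource(home, work)
print(list(both.find(p="foaf:name")))
```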

jan: concern, if you base a query engine on top of fundamental primitives
    if you do this atop 

andy: same gruntwork, we're arguing about which layer 

danbri: example, if we have things like the Google backlinks service,
    we want the query engine layer, not a triple-layer, thing to do
    the reasoning about which thing to ask first.

jan: almost missed brian's point; sometimes you want to pull out a bit of
    a graph...

jeremy: result set needs to be able to return resources, particularly anon
resources; we actually want to return a Java object

jan:    anon resources



danbri: we need to know that the database didn't have a URI name for a resource
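The point about anonymous resources can be sketched by returning node objects rather than strings in result sets, with a distinct class for resources that have no URI. `URI`, `Literal` and `BNode` are illustrative Python classes, not Jena's `Resource` API. Prefixed names such as `foaf:name` stand in for full URIs.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


class BNode:
    """An anonymous resource: it has identity but, deliberately, no URI.
    Equality is object identity (the default), so two BNodes are never equal."""

    _counter = itertools.count()

    def __init__(self) -> None:
        self.label = f"_:b{next(BNode._counter)}"  # local label, not a URI

    def __repr__(self) -> str:
        return self.label


FOAF_MBOX, FOAF_NAME = URI("foaf:mbox"), URI("foaf:name")
person = BNode()
graph = [
    (person, FOAF_NAME, Literal("Jan Grant")),
    (person, FOAF_MBOX, URI("mailto:jan@example.org")),
]

# Result rows carry node objects, not strings, so the application can tell
# that the database holds no URI for this resource.
names = [(s, o) for s, p, o in graph if p == FOAF_NAME]
for subject, name in names:
    print(name.value, "-> anonymous" if isinstance(subject, BNode) else subject.value)
```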



danbri: in my perl api, a node can switch out the database that represents
it, in jena?

brian: jena binds the java objects more closely to their describing database

...envisage being able to smush so we can use email addresses etc

jeremy: want this as a view, eg. that a query is against a db that does smushing

danbri: is this params in the query language, or is it a service
discovery, description layer

dave: want flags for 'this does smushing' etc
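Smushing can be sketched as merging nodes that share a value for an inverse-functional property such as an email address. Here that property is `foaf:mbox`, which is an assumption, since the notes only say "email addresses". The union-find approach and the `smush` helper are illustrative, not any participant's implementation. Where such code should live (store, view, or flagged service) is the open question above.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def smush(triples: Iterable[Triple], ifps=("foaf:mbox",)) -> Set[Triple]:
    """Merge nodes that share a value for any inverse-functional property."""
    triples = list(triples)
    parent: Dict[str, str] = {}

    def root(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = root(a), root(b)
        if ra != rb:
            keep, drop = sorted((ra, rb))  # deterministic canonical name
            parent[drop] = keep

    first_owner: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        if p in ifps:
            if (p, o) in first_owner:
                union(first_owner[(p, o)], s)
            else:
                first_owner[(p, o)] = s
    return {(root(s), p, root(o)) for s, p, o in triples}


merged = smush([
    ("_:a", "foaf:mbox", "mailto:dan@example.org"),
    ("_:a", "foaf:name", "Dan"),
    ("_:b", "foaf:mbox", "mailto:dan@example.org"),
    ("_:b", "foaf:homepage", "http://example.org/"),
])
for t in sorted(merged):
    print(t)
```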

jan:    one way to proceed, look at the kinds of queries we want to answer
    eg. if i'm hacking away with a triples api...

andy: that's where i got to with this

danbri: shows the power of the model

jeremy: lack of transitive closure: won't the onto folk dislike this?

jan:    yes, onto need that.
    from clause does part of this, but 


dan: can't the service do this (trans closure etc) anyway

[...]

brian: suppose i have a different view...
    
    if i do a query 'is x a subclass of y' i can have a language to
    express that. but i can also have a language for adding rules.

    isn't there advantage in doing this separately?
    do the rules thing in a separate language

danbri: yup, that's my view 

andy: we want it to spit out rdf and eat rdf, for composability


jeremy: a new rdf model that depends on a query against the old rdf model;
    so we could do it as a virtual implementation

dave:   that's how i planned to do it

brian:  i prefer the virtual implementation

danbri: may give more basis for fancy rewriting

dave:   yes, do it as views

brian:  query against multiple things that are themselves views
    then optimise it down


jeremy: that's disgusting! i mean, really confusing to use...

brian: I have some working code that does  this 

danbri: for query language end

brian: for SQL


andy: w.r.t hinting being passed down, pass constraints, 

dave: conjunctive query ORs etc., more than just sequence of triples

andy:   transitive closure takes you outside relational algebra


dave: it would be an rdf model

dan: what do you mean 'an rdf model'

dave: 'written in' rdf


andy: do you have examples of regular expressions?

jan: example...

    sc ---subprop-->subclassOf
    a--sc-->b

you can't express this in regular expressions

if i want to find all As that are subclasses of Bs

brian: double transitive closure

jan: want pieces of regular expressions; a simple regex engine won't
    get there further
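Jan's example needs two closures in sequence. The first finds every property that is (transitively) a sub-property of `rdfs:subClassOf`. The second takes the transitive closure of the subclass arcs stated with any of those properties. A minimal sketch under those assumptions follows. The extra property `sc2` is invented to show why the first closure matters. The fixpoint loop is the step Andy notes falls outside relational algebra.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]
SUBPROP, SUBCLASS = "rdfs:subPropertyOf", "rdfs:subClassOf"


def closure(pairs: Iterable[Pair]) -> Set[Pair]:
    """Transitive closure by repeating a self-join until nothing new appears."""
    result = set(pairs)
    while True:
        new = {(a, d) for a, b in result for c, d in result if b == c} - result
        if not new:
            return result
        result |= new


def subclass_pairs(graph: Iterable[Tuple[str, str, str]]) -> Set[Pair]:
    graph = list(graph)
    # First closure: properties that are (transitively) sub-properties of subClassOf.
    sub = closure((s, o) for s, p, o in graph if p == SUBPROP)
    means_subclass = {SUBCLASS} | {p for p, q in sub if q == SUBCLASS}
    # Second closure: the subclass relation stated via any of those properties.
    return closure((s, o) for s, p, o in graph if p in means_subclass)


graph = [
    ("sc", SUBPROP, SUBCLASS),  # sc ---subprop--> subClassOf
    ("sc2", SUBPROP, "sc"),
    ("a", "sc", "b"),           # a --sc--> b
    ("b", SUBCLASS, "c"),
    ("c", "sc2", "d"),
]
print(sorted(subclass_pairs(graph)))
```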


jan:    "i'm not sure you'll always want to do this sort of thing, but
    sometimes you might; eg. query language optimisation"



danbri: what about eg DAML, or RDF versioning

jeremy: ugly brilliant example :)

danbri: [ rambling account of origins of rdfs:subPropertyOf, dc datamodel etc]

jeremy: one regular expression type that I couldn't find many
    implementations for...
    
    "connect a resource to others
    by alternating pairs of property P and property Q"

    something like a cousin relationship
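Jeremy's alternating-property pattern is the path expression (P Q)+ over property labels, and a two-state search suffices. `alternating`, `ex:childOf`, `ex:parentOf` and the data are made up for illustration. With P = childOf and Q = parentOf, the search finds siblings, including the start node itself.

```python
from collections import deque
from typing import Iterable, Set, Tuple


def alternating(graph: Iterable[Tuple[str, str, str]], start: str,
                p: str, q: str) -> Set[str]:
    """Nodes reachable from `start` by a path P Q P Q ... that ends after a
    complete P-then-Q pair, i.e. the path expression (P Q)+."""
    graph = list(graph)
    expected = {0: p, 1: q}  # state 0: next arc must be P; state 1: must be Q
    seen = {(start, 0)}
    queue = deque(seen)
    found: Set[str] = set()
    while queue:
        node, state = queue.popleft()
        for s, prop, o in graph:
            if s == node and prop == expected[state]:
                nxt = (o, 1 - state)
                if nxt[1] == 0:
                    found.add(o)  # just finished a P-then-Q pair
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return found


family = [
    ("alice", "ex:childOf", "carol"), ("bob", "ex:childOf", "carol"),
    ("carol", "ex:parentOf", "alice"), ("carol", "ex:parentOf", "bob"),
]
print(sorted(alternating(family, "alice", "ex:childOf", "ex:parentOf")))  # ['alice', 'bob']
```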


brian: you could go around in circles a long time with these queries :)

jan: particularly in devon...

danbri: is there value in a simple query language with multiple
implementations?

brian: would be a good start

andy: I started this viewing it just as data; writing Jena fragments is a 
    pain for that kind of application

andy: something at the Squish level, or even "here's a triple plus a
    constraint"


libby:  just pootling around a graph isn't enough, you want constraints

jeremy: 


--
dan: swcg / daml+oil being query


dave: what's scope of a simple query language

andy: some SQL-level thing, begins with 'SELECT'

andy: my personal style, begin with something simple and done

dave: start at the bottom

dan: send a string to a service, get back a bunch of data

dave: that's too vague to implement

libby: we have a bunch towards this already; issue is the exact sort of
    constraints we want to implement

brian: is Squish a good start?

jan:    yeah, there are a bunch of operators
    you can partition it into two parts
    
    i think you want conjunctive and disjunctive 

danbri: conjunctive only is visualisable, explainable to developers etc

andy:   let's nail down a throwaway language

dan:    jan, can you live without OR?


jeremy: could the query language be AND
    but the implementations be OR

    you seem to want to do something really simple for practical and
    political reasons

danbri: could we rule OR out of scope initially, but have the concrete
syntax allow a place for OR to live syntactically

andy:
    you can do the OR by doing it in the constraints section;
    the optimiser could be clever


jeremy:
    SELECT ?a ?b 
    FROM  ...
    WHERE 
    ?a p1 ?b
    OR
    ?a q1 ?c
    ?c q2 ?b

becomes
    AND (?a=?a1 AND ?b=?b1)
     OR (?a=?a2 AND ?b=?b2)

[some discussion]

jeremy: doesn't work
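One way to see a problem with the rewrite is to run both forms on small made-up data. If either branch has no matches, the single conjunctive WHERE clause has no solutions at all, so the OR in the constraints never gets a chance to apply. When both branches match, the rows are their cartesian product, which was Jan's earlier point. This is one plausible reading of "doesn't work", since the discussion itself was not recorded. The helpers below are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def extend(row: Binding, pattern: Triple, triple: Triple) -> Optional[Binding]:
    new = dict(row)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    rows: List[Binding] = [{}]
    for pattern in patterns:
        rows = [n for r in rows for t in graph if (n := extend(r, pattern, t)) is not None]
    return rows


def union_answer(graph: List[Triple]) -> Set[Tuple[str, str]]:
    # Evaluate each branch on its own, then take the union of (?a, ?b) pairs.
    left = solve(graph, [("?a", "p1", "?b")])
    right = solve(graph, [("?a", "q1", "?c"), ("?c", "q2", "?b")])
    return {(r["?a"], r["?b"]) for r in left + right}


REWRITTEN = [("?a1", "p1", "?b1"), ("?a2", "q1", "?c"), ("?c", "q2", "?b2")]


def rewritten_answer(graph: List[Triple]) -> Set[Tuple[str, str]]:
    # One conjunctive WHERE clause over renamed variables; the OR is left to
    # the constraints, modelled here as reading either (?a1,?b1) or (?a2,?b2).
    rows = solve(graph, REWRITTEN)
    return {pair for r in rows for pair in ((r["?a1"], r["?b1"]), (r["?a2"], r["?b2"]))}


only_p1 = [("x", "p1", "y")]
both = [("x", "p1", "y"), ("u", "p1", "v"),
        ("x", "q1", "m"), ("m", "q2", "z"), ("u", "q1", "n"), ("n", "q2", "w")]
print(union_answer(only_p1), rewritten_answer(only_p1))
print(len(solve(both, REWRITTEN)))  # rows in the cartesian product
```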


dave: I would vote for putting in disjunction

jan: requirement is that the variables projected out


andy: i'd like something simple for starters

      disjunction likely out...

    
dave: doesn't 'simple' include views, regex, typing, classes etc?



danbri: you're asking questions, getting answers:
    Q:"Is fido a mammal?"
    A: yes (clever service)
    B: no (dumb service)

    punts work onto service description
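danbri's point can be made concrete with two hypothetical services answering the same yes/no question over the same data. One applies subclass reasoning and one does not, and each advertises what it does in a made-up `description` dictionary. Neither the class names nor the description key correspond to a real service-description vocabulary.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
DATA = {("fido", "rdf:type", "Dog"), ("Dog", "rdfs:subClassOf", "Mammal")}


class DumbService:
    description = {"subclass-inference": False}  # hypothetical key

    def __init__(self, triples: Set[Triple]):
        self.triples = set(triples)

    def ask(self, triple: Triple) -> bool:
        return triple in self.triples  # stored triples only


class CleverService(DumbService):
    description = {"subclass-inference": True}

    def ask(self, triple: Triple) -> bool:
        s, p, o = triple
        if p != "rdf:type":
            return super().ask(triple)
        # Follow rdf:type, then any chain of rdfs:subClassOf arcs.
        frontier = {c for x, q, c in self.triples if x == s and q == "rdf:type"}
        seen: Set[str] = set()
        while frontier:
            cls = frontier.pop()
            if cls == o:
                return True
            seen.add(cls)
            frontier |= {sup for x, q, sup in self.triples
                         if x == cls and q == "rdfs:subClassOf"} - seen
        return False


question = ("fido", "rdf:type", "Mammal")  # "Is fido a mammal?"
for service in (CleverService(DATA), DumbService(DATA)):
    print(type(service).__name__, service.description, service.ask(question))
```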


andy/jeremy: hmm, danger of being pseudo-compatible a la JDBC

j:  this is fairly persuasive partitioning

dave:   who'll write this up

libby: I'd like to write up description of existing 




resolved: we're happy talking on a public egroups list for Bristol RDF/SW
geeks

action: dan set up mailing list


dan: libby, will you summarise query languages

l: yup

andy: where are the uncertainties in the spec?
l: commas, literals not in inverted commas,
    shortened version of URIs.
    geoff's use of [] around full URIs instead of ::

brian: random suggestion, try using the same convention as N-Triples

   (put language as part of the literal... ;-) 
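The syntax uncertainties Libby lists can be compared with a small term parser that accepts each form: unquoted literals, `::` shortened URIs, and Geoff's `[]` around full URIs. It also accepts Brian's N-Triples-style suggestion. This is a hypothetical sketch, not the Squish grammar. The `PREFIXES` table stands in for whatever declares `dc::`, unquoted literals are rejected, and language tags on literals are not handled.

```python
import re
from typing import Dict, Tuple

TERM = re.compile(r"""
    (?P<var>\?\w+)                      # ?x
  | <(?P<angle_uri>[^>]*)>              # <http://...>  (N-Triples style)
  | \[(?P<bracket_uri>[^\]]*)\]         # [http://...]  (Geoff's style)
  | "(?P<literal>[^"]*)"                # quoted literal
  | (?P<prefix>\w+)::(?P<local>\w+)     # dc::title  (shortened URI)
""", re.VERBOSE)

PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def parse_term(text: str, prefixes: Dict[str, str]) -> Tuple[str, str]:
    m = TERM.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised term: {text!r}")  # e.g. an unquoted literal
    if m["var"]:
        return ("var", m["var"])
    for group in ("angle_uri", "bracket_uri"):
        if m[group] is not None:
            return ("uri", m[group])
    if m["literal"] is not None:
        return ("literal", m["literal"])
    return ("uri", prefixes[m["prefix"]] + m["local"])


for term in ["?x", "dc::title", "[http://purl.org/dc/elements/1.1/title]",
             "<http://purl.org/dc/elements/1.1/title>", '"Dan Brickley"']:
    print(term, "->", parse_term(term, PREFIXES))
```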


maintained by: danbri@w3.org