Ruby-RDF: Database and Query Support

by Dan Brickley

This page documents the Ruby-RDF database and query tools.

Overview

RubyRDF is evolving some data query facilities. There is a PostgreSQL-based storage and query system, a generic RDF query engine that works over the basicrdf API, and some Web Services code providing client and server tools for RDF query. This work is inadequately documented and tested, and is far from complete. A screenshot of RDFAuthor querying a RubyRDF RDF query server shows that something is working. The squish/ directory contains the implementation details and some example scripts. (Screenshot: graphical query of a Ruby server using RDFAuthor.)

PostgreSQL data query

The basicrdf.rb code now provides some simple support for storing RDF data in PostgreSQL relational tables, and (in squish.rb) for translating RDF queries into suitable SQL. The latter code is a reworking (pretty much a java2ruby transliteration) of the Inkling Java Squish2SQL tool, and is implemented in squish.rb alongside a parser for the Squish RDF query language.

These facilities are not yet integrated into any kind of comprehensive API. They do work reasonably well on the commandline, though.

Data storage

RDF query

These assume you have a PostgreSQL database up and running, with data organised as above, and that you've some way of actually running the SQL queries that squish.rb generates. One way is simply to pipe them into the commandline tool 'psql'.
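
If you would rather drive psql from a Ruby script than from a shell pipe, something like the following might do. It is only a sketch: psql_command and run_sql are made-up names, not part of Ruby-RDF, and it assumes psql is on your PATH and that the database already exists.

```ruby
require "open3"

# Hypothetical helper (not part of Ruby-RDF): the argument list used to
# start psql against a given database.
def psql_command(database)
  ["psql", database]
end

# Hypothetical helper: feed a string of SQL to psql on standard input.
# Returns psql's combined stdout/stderr text and its exit status.
def run_sql(sql, database)
  Open3.capture2e(*psql_command(database), stdin_data: sql)
end
```

Calling run_sql(generated_sql, "rubytest1") would then play the same role as piping the output of squish.rb into psql by hand.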

Examples

Some example usage from the commandline.

Loading from SQL script and querying with Squish

Some example commandline interactions, with annotations and omissions for readability...

wget http://rdfweb.org/2002/02/java/codepict-data/codepict.SQL

danbri@fireball:~/s-rubyrdf/tmp > createdb rubytest1

CREATE DATABASE

danbri@fireball:~/s-rubyrdf/tmp > psql rubytest1 < codepict.SQL
CREATE

SELECT DISTINCT b2.value AS name, b3.value AS thumb, b6.value AS mbox FROM  triples a1,  triples a2,  triples a3,  triples a4,  triples a5,
triples a6,  triples a7, resources b2, resources b3, resources b6
WHERE
        b2.key=a5.object AND b3.key=a6.object AND b6.key=a4.object AND a1.predicate = '116868652' AND a2.predicate = '116868652' AND
a3.predicate = '1547507681' AND a3.object = '1145937192' AND a4.predicate = '1547507681' AND a5.predicate = '1577895888' AND a6.predicate =
'-1848367484' AND a7.predicate = '-221079518' AND  a1.object = a2.object  AND  a2.object = a6.subject  AND  a6.subject = a7.subject  AND
 a1.subject = a3.subject  AND  a2.subject = a4.subject  AND  a4.subject = a5.subject


danbri@fireball:~/s-rubyrdf/tmp > ../squish/squish.rb ../squish/samples/test1.squish | psql rubytest1

(query results should scroll by)

Loading from RDF/XML
danbri@fireball:~/s-rubyrdf/tmp > ../util/redparse ../examples/data/foafcorp.rdf | ../db/nt2sql.rb > tmp.sql

note:
    redparse is a Perl script that uses the Redland parser to get NTriples; 
    Ruby-RDF's XSLT-based parsing support isn't stable yet.

As before, this gives us lots of SQL of the following form:

    insert into resources values ( '-435081385', 'Bob Wright' );
    insert into resources values ( '-2004095611', 'Safeway' );
    insert into resources values ( '1698796385', 'http://rdfweb.org/2002/02/theyrule#pid_359' );
    [...]
    insert into triples values ('-471625023', '-425569083',  '-509211168','assertid-src-notyet:ruby-rdf:$Id: dataquery.html,v 1.9 2002/11/30 22:56:57 danbri Exp $','personidid:notyet','t');
    insert into triples values ('-471625023', '-425569083',  '-342378553','assertid-src-notyet:ruby-rdf:$Id: dataquery.html,v 1.9 2002/11/30 22:56:57 danbri Exp $','personidid:notyet','t');
    insert into triples values ('-471625023', '-425569083',  '1479696990','assertid-src-notyet:ruby-rdf:$Id: dataquery.html,v 1.9 2002/11/30 22:56:57 danbri Exp $','personidid:notyet','t');
    [...]
    (note the junk in the 4th and 5th columns; we don't use these fields yet)
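
To make the shape of that output a little more concrete, here is a rough Ruby sketch of how one triple could be turned into rows of this kind. It is not the code in nt2sql.rb. The key function is an assumption, picked only because the keys above look like signed 32-bit integers; key_for, sql_quote and triple_to_sql are illustrative names. The sketch stops at the three key columns of the triples table and leaves the remaining columns out.

```ruby
# Assumption: a 31-multiplier string hash folded into a signed 32-bit
# integer. The key function nt2sql.rb really uses is not shown on this page.
def key_for(value)
  h = 0
  value.each_char { |c| h = (31 * h + c.ord) & 0xFFFFFFFF }
  h >= 0x80000000 ? h - 0x100000000 : h
end

# Quote a string as an SQL literal, doubling any embedded single quotes.
def sql_quote(text)
  "'" + text.gsub("'", "''") + "'"
end

# Illustrative only: one resources row per term, then one triples row that
# refers to the terms by key.
def triple_to_sql(subject, predicate, object)
  terms = [subject, predicate, object]
  keys = terms.map { |term| key_for(term) }
  lines = terms.zip(keys).map do |term, key|
    "insert into resources values ( '#{key}', #{sql_quote(term)} );"
  end
  lines << "insert into triples values ('#{keys[0]}', '#{keys[1]}', '#{keys[2]}');"
  lines
end

puts triple_to_sql("http://example.org/bob", "http://example.org/name", "Bob Wright")
```

A real loader would also have to avoid inserting the same resource twice when a term turns up in more than one triple; the sketch ignores that.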

And again, as before, we can load up this data into our PostgreSQL triples table. Here we show how to load it into the same table. This would be more interesting if the RDFWeb co-depiction dataset and the FOAF-Corp datasets actually overlapped...

cat tmp.sql | psql rubytest1

(lots of INSERT statements should scroll past)

Now we try querying (see the query file ../squish/samples/test2.squish):

../squish/squish.rb ../squish/samples/test2.squish | psql rubytest1

(and more results scroll past)

The query it generated was:

SELECT DISTINCT b1.value AS x, b2.value AS n FROM  triples a1, resources b1, resources b2
WHERE
        b1.key=a1.subject AND b2.key=a1.object AND a1.predicate = '-1296757095'
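
For the curious, the translation can be pictured roughly as follows. This is a simplified sketch and not the code in squish.rb: patterns_to_sql is a made-up name, predicates are assumed to be already resolved to their integer keys, and variables are only handled in subject and object position. Each triple pattern gets its own alias on the triples table, a variable that appears more than once becomes an equality join, and each selected variable is joined to the resources table to recover its value.

```ruby
# Hypothetical helper, not part of squish.rb. Each pattern is
# [predicate_key, subject, object]; a subject or object starting with "?"
# is a variable, anything else is treated as a constant key.
def patterns_to_sql(select_vars, patterns)
  tables = []
  where = []
  columns = Hash.new { |hash, var| hash[var] = [] } # variable => bound columns

  patterns.each_with_index do |(predicate, subject, object), i|
    triple = "a#{i + 1}"
    tables << "triples #{triple}"
    where << "#{triple}.predicate = '#{predicate}'"
    { "subject" => subject, "object" => object }.each do |column, term|
      if term.to_s.start_with?("?")
        columns[term.to_s[1..-1]] << "#{triple}.#{column}"
      else
        where << "#{triple}.#{column} = '#{term}'"
      end
    end
  end

  # A variable used in several places becomes a chain of equality joins.
  columns.each_value do |cols|
    cols.each_cons(2) { |left, right| where << "#{left} = #{right}" }
  end

  # Each selected variable is looked up in resources for its string value.
  selected = select_vars.each_with_index.map do |var, j|
    raise ArgumentError, "unbound variable ?#{var}" unless columns.key?(var)
    resource = "b#{j + 1}"
    tables << "resources #{resource}"
    where << "#{resource}.key = #{columns[var].first}"
    "#{resource}.value AS #{var}"
  end

  "SELECT DISTINCT #{selected.join(', ')} FROM #{tables.join(', ')} " \
    "WHERE #{where.join(' AND ')}"
end

puts patterns_to_sql(%w[x n], [[-1296757095, "?x", "?n"]])
```

For the single pattern above, the sketch produces the same tables and constraints as the generated query, though the WHERE clauses come out in a different order. Terms are interpolated straight into the SQL, which is tolerable for integer keys but would need proper quoting for anything else.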


test3.squish also works against this dataset:

     ../squish/squish.rb ../squish/samples/test3.squish | psql rubytest1

The end.

Summary

The Ruby-RDF query and database support is minimalistic, but surprisingly useful. You can pipe data from the Web into a database and run RDF queries without your applications caring anything about how the data is stored. This is rather handy...

The idea is to make things as easy as 1, 2, 3:

(1) Create an RDF store, (2) fill it with web data, (3) ask it questions...

Here we do the whole thing, including grabbing an RDF dump of the FOAFCorp/theyrule dataset from the Web:

Plans

It would be nice to tie this stuff up with the API in basicrdf.rb, and to have some tests that reassure me it's all working ok...

Dan Brickley