From W3C Wiki

RubyRDF was an experimental bundle of RDF-related tools written for the Ruby programming language.

The project is now finished, unlike the code. There are now a number of other (more complete) Ruby RDF implementations. RubyRdf shouldn't be considered to compete with these. See the public-rdf-ruby list at W3C for discussions.

RubyRDF includes a parser (using either the Expat or the REXML XML parser), storage (via PostgreSQL or MySQL), and manipulation and query facilities (a Squish2SQL rewriter or a generic in-memory query engine).

  • Overview
  • RDF database
  • RDF query
  • RDF parser
  • Testing and Bugs

Nearby: RubyRdfDatabaseSetup, RubyRdfInstallationLog, WebDataInterfaceDesign, RdfPerlLib

Topics being explored with this codebase: EventDiscovery, RestaurantRecommendation, ThingsVersusTheirNames, PersonsVersusTheirDescriptions, RdfCalendar, FaqIdeas, JabberChickenEgg, SemanticWeb4ContentFiltering

What is RubyRdf?

This is a simple, experimental RDF system implemented in the Ruby programming language. It serves three purposes: to help me learn Ruby, to support an RDFWeb idea I'm prototyping, and to explore some design options for RDF APIs. This is not production-grade stuff. --DanBri

NOTE: This Wiki site is something of a ProxyTopic shadowing the 'real' RubyRDF page. But it's easier to edit and to involve collaborators with, so it is likely that the RubyRDF website will be slimmed down in favour of Wiki-based documentation. Hmm, I just copied the RubyRDF home page into this wiki. Reworking now. --DanBri

The Resource Description Framework (RDF) provides conventions for data mixing and sharing using XML (document format) and URI (naming) technologies. RubyRDF in turn provides some utilities for reading, storing, querying and merging RDF/XML data. It uses SQL-backed storage systems (PostgreSQL and MySQL), includes a parser (for interpreting RDF's XML syntax) and interfaces for accessing parsed, stored and queried RDF data.

OK, how do I download RubyRdf?

Pretty rough, but available for the curious:

What's new with RubyRdf?

Recent changes that may be of interest...

April 2003
  • documentation cleanup (moving to Wiki for most of it)
  • integrating RDF core tests for parser
  • integrating RDF query test case machinery
  • added rdf:nodeID support, see tests.

The rdf:nodeID fix is necessary background to hooking up the query components to the rdf query testcases being developed in the RDF Interest Group.

Any old news worth remembering?

News: 2002-03-20 - These tools were spotted on ruby-talk so I've created a RubyRDF entry in the Ruby Application Archive (RAA).
I've also been learning a bit more about the RAA data, the Rdoc Ruby documentation tool (helped add RDF support into it) and other software description apps from the Ruby community (rough notes and experiments). A mini victory was getting Rdoc-based RDF queryable via PostgreSQL (see successlog). Nearby in the Web: a note I'm writing on Web services and RDF: 'Techniques for SOAP data aggregation and testing' uses these tools and SOAP4R to (eventually) show RDF query of data from the RAA SOAP service, merged with Rdoc. I've also downloaded and Rdoc'd 187 Ruby packages to see if we can do queries on things like "who has subclassed this class" across the entire Ruby community. In progress. (If we had unique identifiers for Ruby classes this would be pretty easy...).
News: 2002-02-27 - I did some more work on this! See the database and query support document for details. This includes a re-coding of an RDF query to SQL filter from Libby Miller, so Ruby-RDF now has basic RDF query support (requires PostgreSQL database).

What is Ruby?

Ruby is an object-oriented scripting language that I finally got around to investigating this week. More information is available from the Ruby home page, as well as from sites like Ruby Central, where you can find an online book Programming Ruby and other introductory and reference material.

How do I use RubyRdf?

First, be warned: this is a work in progress. Don't go flying spaceships with it...

To summarise. In Ruby-RDF you can:
  • load RDF from documents in RDF/XML or NTriples syntax
  • create, merge and query RDF graphs from within Ruby
  • export RDF data as text documents in NTriples syntax
  • more things I've not summarised here yet

Can you summarise the features without boring me?

(@@this needs update)

The API features the usual notions of Graph, Statement and Node based on the corresponding structures in W3C's RDF information model. A Graph is an object that encapsulates some data, conceived of as RDF statements, which we think of as the directed labeled arcs in our graph. A Node is an object representing a 'resource' or 'literal' content in the RDF graph. Resource nodes may be blank or named with a URI; literal nodes have data content (text etc.).
There is also a simple XSLT-based RDF import facility, using Jason Diamond's XSLT RDF parser and the Sablotron library. It doesn't dereference URIs or understand the notion of a base URI yet, nor behave well if the support library is missing (todo: find out about exception handling in Ruby).
The basic idea with this API was to offer several flavours of interface, to see how they compare for application use.
  • "statements matching" interface (Graph.ask)
  • "graph navigation metaphor" interface (as per Mozilla)
  • "node centric" interface (based on asking questions of nodes not graphs)
  • Query language interfaces (SquishQL etc) and Web Services

Show me some example code! Something easy and useful please...

Removed (as was out of date) the example

todo: add RSS example here.

Status? What can I do with this thing now?

(WriteMe -- update this! point to tests as examples?)

There is a simple in-memory Graph implementation now. You can add data using tell() or query data using ask(). Not all of ask() is implemented yet. I've not started on the Mozilla-like API. There is not much by way of distinction between interface and implementation. The node-centric API works at a proof of concept level, but needs serious attention.
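
To make the tell()/ask() idea concrete, here is a minimal sketch of an in-memory graph. It is not the RubyRDF code: the class names SketchGraph and SketchStatement, the use of plain strings for nodes, and the convention that nil acts as a wildcard are all assumptions made for illustration. Only the method names tell and ask, and the idea that ask returns a graph, come from the text above.

```ruby
# Hypothetical sketch of an in-memory graph; not the RubyRDF implementation.
SketchStatement = Struct.new(:subject, :predicate, :object)

class SketchGraph
  attr_reader :statements

  def initialize
    @statements = []
  end

  # Add one statement, ignoring exact duplicates.
  def tell(subject, predicate, object)
    st = SketchStatement.new(subject, predicate, object)
    @statements << st unless @statements.include?(st)
    self
  end

  # Return a new graph holding every statement that matches the pattern.
  # A nil in any position is treated as a wildcard (an assumption of this sketch).
  def ask(subject = nil, predicate = nil, object = nil)
    result = SketchGraph.new
    @statements.each do |st|
      next unless subject.nil? || st.subject == subject
      next unless predicate.nil? || st.predicate == predicate
      next unless object.nil? || st.object == object
      result.tell(st.subject, st.predicate, st.object)
    end
    result
  end

  def size
    @statements.size
  end
end

g = SketchGraph.new
g.tell('ex:dan', 'foaf:name', 'Dan')
g.tell('ex:libby', 'foaf:name', 'Libby')
g.tell('ex:dan', 'foaf:knows', 'ex:libby')
names = g.ask(nil, 'foaf:name', nil) # a graph containing the two name statements
```

Returning a graph from ask() keeps the interface uniform, at the cost of building a short-lived object for every query.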

Any known bugs?

(update this / merge with stuff below)

There are more bugs than I can list here. Known problems:
  • NTriple import doesn't deal with 'blank' nodes, ie. those without URI labels
  • ...or even with literal text properly
  • character encoding and internationalization issues haven't been explored
  • The ask() basic query functionality is only halfway implemented (the useful half ;-)
  • There is no error handling. None!
  • There are no formal tests yet (except the NTriples roundtrip)

What're the plans for this package? It's gonna last forever and take over the world, right?

I might not do any more work on this. If I do, I'll be fixing up basic facilities (so it can be used) before worrying about efficiency, scalability, beauty or even full compliance with the specs. I might finish reading the Ruby docs first; if the current code looks like Perl, there's a reason for that...

All that said, the package is maturing nicely, 2001-2003, and the tests + debian packaging are giving me some confidence of its creeping utility. --DanBri

How does this compare to other node-oriented RDF APIs?

Other node-oriented RDF API experiments...

What support does RubyRdf offer for database access and persistence of RDF data?


  • in memory implementation (efficiency? lack thereof...)
  • SQL-backed version(s)
    • sha1 hashing of uris; is this a dodgy hack? probability of collisions w/ cut down sha1?
    • RDBMS schemas
    • PostgreSQL vs MySQL issues
  • related work, swad-e report etc.
  • sample code / tests
  • RDF4R RDF/XML serializer (more info needed. API?)
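
The question in the list above about cut-down SHA1 hashes of URIs can be explored with a few lines of Ruby. This is only a sketch: the helper names are hypothetical, and the choice of keeping 32 bits (the first 8 hex digits) is an assumption, since the notes do not say how far the hash is cut down. The collision estimate uses the standard birthday-bound approximation.

```ruby
require 'digest/sha1'

# Hypothetical helper: derive an integer key from a URI by keeping only the
# first hex_digits hex digits of its SHA1 digest (8 digits = 32 bits).
def uri_key(uri, hex_digits = 8)
  Digest::SHA1.hexdigest(uri)[0, hex_digits].to_i(16)
end

# Birthday-bound approximation of the chance that at least two of n distinct
# URIs end up with the same key in a key space of 2**bits values.
def collision_probability(n, bits)
  1.0 - Math.exp(-(n * (n - 1)) / (2.0 * 2**bits))
end

key = uri_key('http://xmlns.com/foaf/0.1/name')
p_small = collision_probability(1_000, 32)    # a few thousand URIs: tiny risk
p_large = collision_probability(100_000, 32)  # a hundred thousand: well over one half
```

Under these assumptions a 32-bit key is fine for small experiments but becomes risky well before a store reaches hundreds of thousands of distinct URIs; keeping more digits pushes that point out quickly.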

Does RubyRdf support any RDF query languages? What support is offered? How is it tested?

RubyRDF offers facilities for querying RDF data using Squish, a simple RDF query language. Squish corresponds to a subset of the RDQL language, and was based on R.V.Guha's RDFdb query language as refined in Libby Miller's Inkling implementation. The general approach is described in the 'Enabling Inference' paper proposed to W3C at the Query Languages '98 workshop in Boston.

RubyRDF includes a parser for a textual representation of Squish queries, but also allows for queries to be composed through an API (and hence could support alternate representations of a query). Queries in RubyRDF, following the Squish approach, can be thought of as simply RDF graphs "with bits missing". The missing bits are annotated with variable names, giving the familiarly SQL-ish "SELECT ?age, ?name, ?height ..." that gave rise to the name "Squish". The results of such a query can be treated as either an RDF graph (the subgraph of the original which matched), or as a set of variable-to-value bindings. Both perspectives have their uses.
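
The "graph with bits missing" view can be sketched using nothing more than triple matching. The code below is an illustration, not RubyRDF's query engine: the function names, the representation of triples as three-element arrays of strings, and the convention that a string starting with '?' is a variable are all assumptions. It returns the bindings view of the result (one Hash per solution).

```ruby
# A pattern term is a variable if it is a string starting with '?'.
def variable?(term)
  term.is_a?(String) && term.start_with?('?')
end

# Try to extend one set of bindings so that the pattern matches the triple.
# Returns the extended Hash, or nil if the triple does not fit.
def match_pattern(pattern, triple, bindings)
  result = bindings.dup
  pattern.zip(triple).each do |want, have|
    if variable?(want)
      return nil if result.key?(want) && result[want] != have
      result[want] = have
    elsif want != have
      return nil
    end
  end
  result
end

# Evaluate a list of patterns against an array of triples, one pattern at a
# time, keeping only the bindings that survive every pattern.
def query(patterns, triples)
  patterns.inject([{}]) do |solutions, pattern|
    solutions.flat_map do |bindings|
      triples.map { |triple| match_pattern(pattern, triple, bindings) }.compact
    end
  end
end

foaf = 'http://xmlns.com/foaf/0.1/'
data = [
  ['_:a', foaf + 'name', 'Libby'],
  ['_:a', foaf + 'mbox', 'mailto:libby@example.org'],
  ['_:b', foaf + 'name', 'Dan']
]
rows = query([['?who', foaf + 'name', '?name'],
              ['?who', foaf + 'mbox', '?mbox']], data)
# rows holds one solution: only _:a has both a name and an mbox.
```

Substituting each solution back into the patterns would give the other view of the result, the matching subgraph.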

There are two query implementations in RubyRDF. The generic implementation can implement Squish against any RDF datasource that implements the most basic 'triples matching' API. There is also a more optimised SQL-backed implementation which (credits due here to Matt Biddulph and Libby Miller) rewrites a Squish query into a self-joining SQL query. This assumes of course a certain strategy for SQL RDF storage (see the RDF database section).
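
As a rough illustration of the self-join rewriting idea, the sketch below turns a list of triple patterns into one SQL SELECT over a single table, aliased once per pattern. It is not the Squish2SQL code: the function name, the table name triples, and its subject/predicate/object columns are assumptions, and RubyRDF's real schema (which hashes URIs) will differ. Constants are inlined with naive quoting purely to keep the example short; real code should use bind parameters.

```ruby
# Hypothetical rewriter: each pattern becomes one alias (t0, t1, ...) of the
# same table; a variable shared between patterns becomes a join condition.
def squish_to_sql(select_vars, patterns, table = 'triples')
  columns = %w[subject predicate object]
  first_seen = {} # variable name => "tN.column" where it first appeared
  conditions = []

  patterns.each_with_index do |pattern, i|
    pattern.each_with_index do |term, j|
      ref = "t#{i}.#{columns[j]}"
      if term.start_with?('?')
        if first_seen.key?(term)
          conditions << "#{ref} = #{first_seen[term]}"
        else
          first_seen[term] = ref
        end
      else
        quoted = term.gsub("'", "''") # naive SQL string escaping
        conditions << "#{ref} = '#{quoted}'"
      end
    end
  end

  select = select_vars.map { |v| "#{first_seen.fetch(v)} AS #{v.delete('?')}" }
  from = (0...patterns.size).map { |i| "#{table} t#{i}" }
  sql = "SELECT #{select.join(', ')} FROM #{from.join(', ')}"
  sql += " WHERE #{conditions.join(' AND ')}" unless conditions.empty?
  sql
end

sql = squish_to_sql(['?name', '?mbox'],
                    [['?who', 'foaf:name', '?name'],
                     ['?who', 'foaf:mbox', '?mbox']])
```

Each extra pattern adds one more alias of the same table, which is why the generated query is described as self-joining.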

rdf query testing

See #rdfig chat with libby. gqt.rb generates _genq.rb.

Work is in progress to use a common framework for RDFQueryTestCases in RubyRDF. In the meantime, we have:

Query tests have output like this:

ruby ./gqt.rb > ./_genq.rb
ruby ./_genq.rb
Loaded suite ./_genq
Finished in 0.016589 seconds.

  1) Failure!!!
test_query_8(TC_QueryTest) [./_genq.rb:311]:
The actual (7) and expected (9 number of variables should agree

9 tests, 26 assertions, 1 failures, 0 errors

To do:

  • need framework for comparing actual and expected query resultsets (including bnode issues)

What RDF/XML parser support is there in RubyRdf? What's the deal with plugging in an XML parser?

RubyRDF includes an RDF parser, written originally by Brandt Kurowski. This has been reworked a little by DanBri in pursuit of XML parser independence (it previously assumed Expat, can now run with REXML) and RDF Core spec compliance (now handles the rdf:nodeID construct).

Parser testing is based on the RDF Core test cases collection.


  • expected.txt and .sh extract (from counting lines in the ntriple output files) the expected number of triples. we don't currently use these in our tests.

What's the status of SAX and SAX2 in Ruby? Surely that'd make your life easier here?

Very likely it would. There are efforts to wrap Ruby XML parsers in a common SAX/SAX2 interface. More information on this would be handy. (LinkMe/WriteMe)

How do I use Expat as the underlying XML parser?

This is the default for RubyRdf's built-in RDF parser.

For XML parsing in Ruby, one uses either Expat, wrapped using xmlparser, or the pure Ruby rexml package.

XMLParser is packaged for Debian as 'libxml-parser-ruby'. There is no MacOS X Fink package for it yet; however, it does install smoothly, and the underlying Expat parser is packaged in Fink (as 'expat').

Can I use REXML, the pure Ruby XML parser that's being bundled with Ruby 1.8+?

Sorta. RubyRdf contains some code for wiring it in, but the namespace support DanBri added is rough (ie. broken) and (worse) no tests are written to show up the problem. This was a train-journey hack.

The pure Ruby REXML parser is available as the Debian package 'librexml-ruby'. See pack/tests/syntax/ for experiments in using this instead of Expat: see rexmltest.rb

What I tried to do here was rig things so that we use rexml instead of expat to generate SAX2 events. It is architected all wrong, and doesn't even keep namespace-context state properly. But the potential is there. I'm cleaning things up a bit. There is now RDFParser, REXRDFParser, and XMLParser. The latter subclasses the Expat XML code, and mixes in generic RDF parsing functionality from RDFParser (see ruby mix-ins for context).
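
For readers new to mix-ins, the arrangement described above (subclass the XML-specific code, mix in the parser-independent RDF behaviour) looks roughly like this. Everything here is a stand-in: the module and class names are hypothetical, and the "parsers" are fed pre-cooked events rather than real XML, so that the sketch needs neither Expat nor REXML.

```ruby
# Parser-independent behaviour lives in a module.
module SketchRDFHandling
  def triples
    @triples ||= []
  end

  def start_description(about)
    @current_subject = about
  end

  def property(predicate, value)
    triples << [@current_subject, predicate, value]
  end
end

# Stand-in for an event-driven (Expat-like) base class that we do not control.
class SketchEventParser
  def feed(events)
    events.each { |name, *args| send(name, *args) }
  end
end

# Subclass the XML code and mix in the generic RDF behaviour.
class SketchExpatRDFParser < SketchEventParser
  include SketchRDFHandling
end

# A second driver with an unrelated superclass reuses the same module.
class SketchTreeRDFParser
  include SketchRDFHandling

  def walk(tree)
    tree.each do |about, props|
      start_description(about)
      props.each { |predicate, value| property(predicate, value) }
    end
  end
end

a = SketchExpatRDFParser.new
a.feed([[:start_description, 'ex:dan'], [:property, 'foaf:name', 'Dan']])

b = SketchTreeRDFParser.new
b.walk('ex:dan' => [['foaf:name', 'Dan']])
# a.triples and b.triples now hold the same single triple.
```

The appeal of the mix-in is that the two drivers need no common superclass: each keeps whatever parent its XML library requires.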

Can I use the xmlscan pure-ruby XML parser?

Not yet.

xmlscan is another pure-Ruby XML parser, claiming XML 1.0 compliance, multi-charset abilities, speediness, and tentative XML namespaces support. As of April 2003 it appears to be under active development.

I downloaded it but didn't quite get it working, error msg below. Could be a trivial mistake. --DanBri

lib/xmlscan/scanner.rb:487:in `initialize': failed to convert nil into String (TypeError)

How do I get PostgreSQL up and running?

You'll need Ruby's DBI package and the PostgreSQL extension library.

  • in debian these are ...
  • or build/install yourself

Something like this is needed on a Debian box using Ruby>1.6, since DBI etc aren't packaged yet:

ruby1.8 extconf.rb --with-pgsql-include-dir=/usr/include/postgresql/ --with-pgsql-lib-dir=/usr/lib/postgresql/

Bugs, there must be bugs. Where's the bug list? tests? issue tracker?

See the bugzilla issue list for known problems. Or leave a scribble here.

A testing gotcha that got me: If we write "$:.unshift '../../lib/'" at the top of tests, so the local copy gets precedence, make sure to have the right number of ../../ in the path, else it may silently fail and test an older system-wide installation instead. Not sure how best to watch out for this.

what I'm currently doing is a bit of a hack:

$:.unshift '../lib/'
$:.unshift '../../lib/'

...but seems to cover the common case of running it from one of two dirs.
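
One possible way to watch out for the silent failure is to resolve the library directory relative to the test file itself, rather than the current working directory, and to complain loudly if it is missing. This is a suggestion, not what RubyRDF does; the helper name prepend_local_lib is hypothetical.

```ruby
# Hypothetical helper for the top of a test script. It resolves the lib
# directory relative to the test file, so the result does not depend on
# which directory the tests are run from.
def prepend_local_lib(test_file, relative = '../lib')
  lib = File.expand_path(relative, File.dirname(test_file))
  unless File.directory?(lib)
    # Fail loudly instead of silently testing a system-wide installation.
    raise LoadError, "local lib directory not found: #{lib}"
  end
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  lib
end

# Typical call from a script such as tests/some_test.rb:
#   prepend_local_lib(__FILE__, '../lib')
```

Because the path is anchored on the test file, a wrong number of ../ segments shows up immediately as a LoadError.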

This is all thoroughly tested, right?

Uhm, we're in a creeping professionalism situation.

RubyRDF uses TestUnit for tests, driven from RDF-described test case descriptions. See also UsingTestUnit in the Ruby wiki (for test/unit), and (very useful!) Ruby Specific Techniques for Testing by Hugh Sasse (based around TestUnit).

See also: Test First, by Intention: a code and culture translation from the original Smalltalk to Ruby, original by Ronald Jeffries, translation by Aleksi Niemela and Dave Thomas

A question from DanBri: If I anticipate many of these tests will fail, can I indicate this somehow? So that the really exceptional/worrying failures stand out...?

cout in irc writes, danbri: what we do is make sure all our bugs are entered into a bug database (such as bugzilla), and all our bugs have a bug number. if the test checks for a known bug, then the assertion message should include that bug number. if we fix a bug, then the assertion message includes the bug number, but also indicates that the bug is already fixed (so if it shows up again, we'll know).
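
cout's suggestion could be sketched as a small helper that builds assertion messages from a table of known bugs. The helper name, the table, and the bug numbers and summaries in it are all invented for illustration; in practice the numbers would come from the real bug database.

```ruby
# Hypothetical registry of known bugs. The numbers and summaries are made up.
KNOWN_BUGS = {
  101 => { fixed: false, summary: 'blank nodes lost on NTriples import' },
  102 => { fixed: true,  summary: 'rdf:nodeID not handled' }
}

# Build an assertion message that carries the bug number and its status, so
# that an expected failure is easy to tell apart from a regression.
def bug_message(bug_id, text)
  bug = KNOWN_BUGS.fetch(bug_id)
  status = bug[:fixed] ? 'FIXED, so this is a regression' : 'known, still open'
  "[bug #{bug_id}: #{status}] #{text}"
end

# Inside a test/unit test case the message goes in as the last argument:
#   assert_equal(9, actual_count, bug_message(101, 'variable count differs'))
```

A failure report can then be scanned (or grepped) for "regression" to find the failures that deserve attention first.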

How does RubyRdf's test machinery work? Any design issues to work out?

Current approach in RubyRDF is to generate (from source RDF/XML test data) some Ruby test scripts, and run those. It would be interesting to learn more about testing frameworks that were designed with data-driven tests in mind; doing code-generation for this seems like overkill.

See #rdfig notes on testing design: whether to do code-generation or not. I'm inclined not to, now, but unsure how to avoid having one large test() method. I guess one test, and lots of assertions, one or several per test in the manifest, might work. (...) Time passes, mind changes. After discussion on #ruby-lang, and experience with non-generated approach, I'm now more inclined towards generating Ruby test scripts automatically. It will allow the test machinery of RubyUnit to offer finer-grained statistics on pass/fail performance. --DanBri
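
The code-generation approach can be illustrated with a toy generator that emits one test method per manifest entry, which is what lets the test runner report pass/fail per test. This is not gqt.rb: the manifest format (an array of hashes with :query, :data, :expected_rows and :description keys), the generated class name, and the run_query helper that the generated script expects to find are all assumptions.

```ruby
# Hypothetical generator: turn manifest entries into the source text of a
# test/unit script with one test method per entry.
def generate_tests(manifest, class_name = 'TC_GeneratedQueryTest')
  lines = []
  lines << "require 'test/unit'"
  lines << ''
  lines << "class #{class_name} < Test::Unit::TestCase"
  manifest.each_with_index do |entry, i|
    lines << "  def test_query_#{i + 1}"
    # run_query is assumed to be defined elsewhere by the test harness.
    lines << "    actual = run_query(#{entry[:query].inspect}, #{entry[:data].inspect})"
    lines << "    assert_equal(#{Integer(entry[:expected_rows])}, actual.size, #{entry[:description].inspect})"
    lines << '  end'
  end
  lines << 'end'
  lines.join("\n") + "\n"
end

manifest = [
  { query: 'q1.squish', data: 'd1.rdf', expected_rows: 2, description: 'simple name lookup' },
  { query: 'q2.squish', data: 'd2.rdf', expected_rows: 0, description: 'no matches expected' }
]
source = generate_tests(manifest)
# Write `source` to a file and run that file with ruby to execute the tests.
```

Using inspect for the strings keeps quotes in descriptions or file names from breaking the generated source.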

Also in progress, same thing for RDF Core syntax tests: gcore.rb, re-written in codegeneration style. See output from tests. Todo: add more helpful output, eg text associated with tests.

More detail on tests is included alongside documentation for the relevant parts of RubyRDF.

Has RubyRdf been packaged for Debian? For Fink?

RubyRDF is packaged for Debian (tentatively) but not distributed through the main Debian distribution.

See the Debian maintainers' guide for advice on files that belong in the debian/ directory. Also the Debian policy documentation.

There is an experimental Debian package. See notes and files for more info. There is also an attempt at fink packaging. The former pre-decides which directories to put its files into; the latter runs a configure/install script on the target box. Neither is well tested.

-- DanBri - 07 Feb 2003

Here is where files get stashed by debian installer, as of Mar 20 2003


Here's how debian packages the rexml parser we use as a pure-ruby alternative to expat: apt-get install librexml-ruby ...should add this to our deb package metadata as useful optional package to accompany ours.

How do I use your experimental debian apt-source setup?

A quick howto:

Add the 'deb' line to /etc/apt/sources.list, ie.:

deb ./

Note that this is AT YOUR OWN RISK! I'm new to debian packaging. It may make a mess (but I don't believe it will). --DanBri

apt-get update
apt-get install rubyrdf-ruby

...also grab these xml parser packages (not sure if they'll be auto-included)

apt-get install librexml-ruby
apt-get install libxml-parser-ruby
apt-get install libdbi-ruby
apt-get install libdbd-pg-ruby libdbd-mysql-ruby

after which rdftest.rb should work on .rdf files. Tell it which file you want using --rdf= commandline switch. (hmm I think I broke this under Ruby 1.8, but we didn't package this for 1.8 anyways...)

Another test app to try: ayftest.rb, a tiny RDF harvester. need to install one of the xml parser packages by hand too: (try both, you can choose which with --parser=rubyexpat or --parser=rexml). The apt-get lines above do that. At some point I'll fix the dependencies so this happens automatically.

How would I go about testing your early Debian packaging attempts?

for alpha-test 'volunteers', ie. Libby.

help needed:

nearby: RubyRDF homepage | Debian info

Alpha Tester notes and scribbles: If you try it, leave notes or links to more info here. thanks --DanBri

trying to build packages on other machine, remembered that hard codes a full path on my machine. The makefile should generate instead. TODO. DONE! I think. Some ruby script spits it out now. See Makefile.

Libby's notes: rubyrdf alpha - debian install notes

Testing / installation hint: if I rebuild the .deb but don't change the version numbers etc., it is not enough to apt-get remove and then apt-get install the package. I need to do 'apt-get update' otherwise Debian may use a cached copy of the older .deb file.

TODO: find out where test cases should live, in Debian filetree layout. This also useful for Cwm, see CwmTips.

How do I install RubyRdf?

If you have a working Ruby interpreter, this should just work. Watch out for usual things; for example, you may need to edit #!/usr/local/bin/ruby at the top of any scripts. Also, the require() call may need the directory path to basicrdf.rb.

Nearby: Template:Http:// for Win32] (includes Expat and xmlparser)

You can grab a recent tarball from the Web and install at your own risk, see RubyRdfInstallationLog. Note that it fails a bunch of tests. It would be good if we could indicate which failures were (sadly) normal, versus which are signs of something bad. Maybe RubyUnit can do this? Advice welcomed!

So, er, what exactly is an RDF API?

(see also WebDataInterfaceDesign)

The code here implements some parts of a basic RDF API and in-memory store for the Ruby programming language. It is a rewrite of a similar system I worked on in Perl. To understand what an RDF API is for, you probably need a little background on RDF and its relationship to XML and the Web, more than can be provided here. The RDF home page and Semantic Web activity provide some useful links. This brief document on RDF "striped" syntax may help, as might Tim Bray's excellent What is RDF? article. Assuming some basic familiarity with RDF, we still need to know what an RDF API might do...
So what is an RDF API? How does it differ from XML's SAX and DOM interfaces? RDF, after all, is written in XML syntax...
An RDF API provides a way for application authors to access data that is structured according to the W3C RDF information model, that is, as a directed labeled graph of "nodes and arcs". Each Node represents something, either a "resource" (some identifiable thing, whether conceptual, physical or digital) or a "literal" chunk of data such as a string or number. Ruby fans, for whom (I'm told) numbers and strings are objects, might wonder why RDF makes such a distinction. But it does, and RDF implementations consequently distinguish between nodes that are "resources" and nodes that are "literals".
The connections between these things, in RDF, are called "properties". They correspond to (binary) relations between nodes in the graph, and to the notion of attributes. In RDF we identify properties using Web identifiers, URIs. A URI (Uniform Resource Identifier) is a simple textual string (such as a URL), and provides a useful decentralised convention for naming things on the Internet. The neat idea in RDF is to use URIs to identify not just the things that we might want to describe using RDF, but also to identify the classes and properties that we use to describe them. So in RDF, each "arc" (or edge, or connection, or link) in the graph is named with a URI, and each "node" has a "type" which is some (URI-named) class.
So the RDF information model amounts to little more than a collection of triples (we call them "statements") consisting of nodes and (URI-named) arc labels. These are usefully thought of as forming a graph data structure, and most RDF APIs present themselves as interfaces to this graph.
We call these triples "statements" in acknowledgement that RDF content is supposed to be meaningful, to say something about the world. RDF data corresponds to a set of claims, ie. statements, about the named properties of named objects, where the names are written using URI syntax. An RDF API will offer applications an interface to objects that model the world in these terms, typically allowing data to be added, or questions to be asked of the Graph and Node objects.
So the reason we have RDF APIs, in addition to pure XML interfaces such as SAX and DOM, is that there are many mappings of RDF graphs into XML documents. By coding to an RDF API instead of to an XML API, we can set aside the detail of how our RDF is written, and deal directly with interfaces that care about the content of the RDF. In other words, we can have APIs that load XML/RDF data, and expose an interface couched in terms of "nodes and arcs" (objects and their relationships), rather than in terms of the XML document structures that encode this data.
It should be clear by now that the notion of 'object' is serving at least two purposes. Ruby (and other programming languages) present programmers with objects (that have properties and methods that receive messages). RDF presents programmers with a network of objects (nodes in a graph, each node representing some "resource" or thing), connected by directed, labeled arcs. Somehow we need to represent the latter using the former. This becomes interesting, since both Ruby and RDF have the notion of a class hierarchy, and ways of representing things with properties. The Node-centric RDF API outlined below explores one trick for reflecting RDF's notion of property into Ruby's notion of (missing) methods.
There is a third sense of the word 'object' that should be mentioned at this point: in RDF, the three parts of an RDF statement are called the "subject", "predicate" and "object". The subject is the node whose property is being described, the predicate is the type of property, and the object is the value of the property. To complicate things further, there are sometimes nodes in the graph corresponding to the types of property (such as 'worksFor') and sometimes even for so-called "reified" RDF statements, representations of statements and their component parts, ie the subject, predicate and object. Such complexities can largely be ignored for now; but they're worth mentioning as they are reflected in the basic RDF API. For example, our Graph object allows you to list all the subjects, predicates or objects in the graph. More details on this to follow.
In Ruby, everything is an object. In RDF, everything is a "resource", and is described using the RDF information model. This simplifies the world of Ruby programmers, and simplifies the world of RDF programmers too. RDF offers a consistent approach to representing a lot of different kinds of data. Our goal here is to find some practical conventions for exposing that data to Ruby applications.
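
The trick mentioned above, reflecting RDF properties into Ruby's missing methods, can be sketched with method_missing. This is an illustration of the general technique and not RubyRDF's node-centric API: the class name SketchNode, the triples-as-arrays representation, and the rule that a method name is appended to a single namespace prefix to form the property URI are all assumptions.

```ruby
# Hypothetical node wrapper: an unknown method name is treated as the local
# name of a property, and the call returns that property's values.
class SketchNode
  attr_reader :uri

  def initialize(uri, triples, namespace)
    @uri = uri
    @triples = triples      # array of [subject, predicate, object]
    @namespace = namespace  # prefix used to build a property URI from a method name
  end

  def method_missing(name, *args)
    return super unless args.empty?
    values = values_for(@namespace + name.to_s)
    return super if values.empty? # no such property: behave like a normal NoMethodError
    values
  end

  def respond_to_missing?(name, include_private = false)
    !values_for(@namespace + name.to_s).empty? || super
  end

  private

  def values_for(property)
    @triples.select { |s, p, _| s == @uri && p == property }.map { |_, _, o| o }
  end
end

foaf = 'http://xmlns.com/foaf/0.1/'
triples = [
  ['ex:dan', foaf + 'name', 'Dan'],
  ['ex:dan', foaf + 'knows', 'ex:libby']
]
dan = SketchNode.new('ex:dan', triples, foaf)
dan.name   # the values of foaf:name for this node
dan.knows  # the values of foaf:knows for this node
```

Returning an array reflects the fact that an RDF property may have several values; a real design would also have to decide what to do about property names that clash with existing Ruby methods.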

What RDF API design issues need more attention here?

A bunch. It'd be good to sync with the Cwm/SWAP and rdflib designs, since Ruby and Python are close enough... and many of the same issues have cropped up.

I like having nodes and graphs loosely coupled: you can change the graph that a node is attached to. I like the current (intended if not yet verified -- I think there may be a bug here) behaviour of only having one node object exist per URI or literal. But I don't think these two approaches are working well together. The nodes that come back from querying a graph, when new, are attached to that graph. When they're not new, we get counter-intuitive behaviour.
The ask() facility on a Graph (and possibly on remote web services, which may share this interface) currently takes only a simple 'match this triple' argument. I'd like to be able to ask a Graph a more sophisticated query, eg. pass in SquishQL expressions and get back a table of bindings. Should the name ask() be reserved for this more ambitious use? How to offer multiple ways of querying the graph?
It's handy having Node, Statement etc., but sometimes these are overkill. We should be able to use simple string URIs and literals in a number of places that currently expect structured objects, eg. adding triples into a graph.
Indexing: the Graph object currently maintains a couple of indexes (@fp and @bp). Since the ask() method returns a graph (basically a collection of statements) as the result of a query, concern is that we're going to a lot of expense creating a rather transient graph object, indexing its contents etc.
Provenance: I want to keep track of where statements came from, implement aggregation of multiple graphs into a virtual database etc.
Query interface: should be able to add a SquishQL query interface easily enough, though the only implementation currently will be remote SOAP web services.

Got any handy links to more Ruby resources / background materials?

Sure. These are handy...

Realtime chat interfaces? What's available?

  • for IRC, there is rbot, but that's more of an application with plugin framework than a library (LinkMe)
  • for Jabber, the Jabber4R library is wonderful. (I'm up and running in seconds, with no problem. --DanBri)

Ruby Debugger

M-x load-library rubydb3x.el
M-x rubydb


The xmlparser/sax library needs patching to work in Ruby 1.8. From Yoshida Masato:

Dan Brickley <> writes:
> /usr/local/lib/ruby/site_ruby/1.8/xml/sax.rb:343: warning: undefining `initialize' may cause serious problem

Apply the following patch:

RCS file: /Users/yoshidam/.cvs/xmlparser/lib/xml/sax.rb,v
retrieving revision
retrieving revision 1.2
diff -u -r1.1.1.1 -r1.2
--- sax.rb      2003/03/12 05:54:42
+++ sax.rb      2003/03/12 06:38:21     1.2
@@ -339,9 +339,7 @@
     module Helpers
-      class ParserFactory
-        undef initialize
+      module ParserFactory
         def ParserFactory::makeParser(klass)
           if klass.kind_of?(Class)