rdflib is a Python library for working with RDF. The library attempts to follow the terminology and concepts found in the RDF Semantics document as closely as possible (with a few notable exceptions such as the use of "TripleStore" instead of "Model"). Among other functionality, the library implements a TripleStore interface tailored to the Python language; for example, the methods for retrieving information from the TripleStore are implemented using Python generators.
There are several TripleStore backends, ranging from an in-memory backend to a persistent backend that uses Sleepycat BTrees. There are also a number of parsers and serializers -- RDF/XML, NTriples, and a few others in the works -- for getting RDF into and out of the TripleStore.
The initial implementation focus for rdflib was on the TripleStore interface, and the initial TripleStore backend used nested, in-memory dictionaries for the indices (for example, spo, pos, and osp). Because of this initial focus, rdflib is partially optimized for small tasks where in-memory storage is sufficient.
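The nested-dictionary indices and the generator-based retrieval described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not rdflib's actual code: the class name TinyTripleStore, its method names, and the use of None as a wildcard are all hypothetical.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory store with nested-dictionary indices.

    Hypothetical sketch; the names here are not rdflib's API.
    """

    def __init__(self):
        # spo[s][p] -> set of o, pos[p][o] -> set of s, osp[o][s] -> set of p
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every triple is recorded once per index.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def triples(self, s=None, p=None, o=None):
        """Yield matching triples lazily; None acts as a wildcard."""
        if s is not None:
            # Subject is known: walk the spo index.
            by_p = self.spo.get(s, {})
            preds = [p] if p is not None else list(by_p)
            for p_ in preds:
                for o_ in by_p.get(p_, ()):
                    if o is None or o_ == o:
                        yield (s, p_, o_)
        elif p is not None:
            # Predicate is known: walk the pos index.
            by_o = self.pos.get(p, {})
            objs = [o] if o is not None else list(by_o)
            for o_ in objs:
                for s_ in by_o.get(o_, ()):
                    yield (s_, p, o_)
        elif o is not None:
            # Only the object is known: walk the osp index.
            for s_, preds in self.osp.get(o, {}).items():
                for p_ in preds:
                    yield (s_, p_, o)
        else:
            # Nothing is bound: enumerate everything via spo.
            for s_, by_p in self.spo.items():
                for p_, objs in by_p.items():
                    for o_ in objs:
                        yield (s_, p_, o_)
```

Keeping three indices trades memory for lookup speed: whichever term of a pattern is bound, some index has that term as its outermost key, so a query never has to scan the whole store unless all three terms are wildcards.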
After the initial implementation experience, and as applications started to outgrow the simple in-memory TripleStore, two persistent backends were added: first one using nested ZODB BTrees, and then one using Sleepycat BTrees without nesting.
Some applications, which might otherwise have used a TripleStore fruitfully, require additional interface support for aggregation and provenance. A simple use case for this additional data is a web spider that needs to be able to update data that it has already spidered. As a result of these applications, rdflib has a TripleStore variant which supports the idea of assertional contexts -- what rdflib calls an InformationStore.
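The web spider use case can be sketched as below. This is only an illustration of the idea of assertional contexts, assuming each context is identified by the URL the triples were fetched from; TinyContextStore, respider, and their methods are hypothetical names and do not reflect the InformationStore interface.

```python
from collections import defaultdict


class TinyContextStore:
    """Toy context-aware store.

    Hypothetical sketch; not rdflib's InformationStore API.
    """

    def __init__(self):
        # context identifier (assumed: the source URL) -> set of (s, p, o)
        self._by_context = defaultdict(set)

    def add(self, triple, context):
        self._by_context[context].add(triple)

    def remove_context(self, context):
        # Forget everything a given source has asserted.
        self._by_context.pop(context, None)

    def triples(self, context=None):
        """Yield triples from one context, or the distinct union of all."""
        if context is not None:
            yield from self._by_context.get(context, ())
            return
        seen = set()
        for triples in self._by_context.values():
            for t in triples:
                if t not in seen:
                    seen.add(t)
                    yield t

    def contexts(self, triple):
        """Yield every context that asserts the triple (its provenance)."""
        for context, triples in self._by_context.items():
            if triple in triples:
                yield context


def respider(store, url, fresh_triples):
    """Replace whatever `url` previously asserted with the new crawl."""
    store.remove_context(url)
    for t in fresh_triples:
        store.add(t, url)
```

Because a triple is tracked per context, re-spidering one page removes only that page's assertions; the same triple asserted by another page survives in the aggregate view.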
The pluggable TripleStore backend capability of rdflib -- in addition to some of Python's charms -- has made it an ideal platform for experimentation, particularly with regard to different backend strategies and implementations. For example, rdflib has an in-memory backend that uses KDTrees. Other experiments have been done using ideas from Donald Knuth's Sorting and Searching book[1], the Reiser4 white paper[2], and Dan Gusfield's Algorithms on Strings, Trees, and Sequences book[3].
[1] The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition) by Donald E. Knuth http://www.amazon.com/exec/obidos/ASIN/0201896850/102-8454493-8075306
[2] Reiser4 http://www.namesys.com/v4/v4.html
[3] Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield http://www.amazon.com/exec/obidos/tg/detail/-/0521585198/102-8454493-8075306
Persistent backend using nested Zope BTrees.
Indices: cspo, cpos, spo, pos, c
Persistent backend using Sleepycat BTrees (flat, not nested).
Indices: i2k, k2i, c, cspo, cpos, cosp, spo, pos, osp
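To illustrate the difference between a nested and a flat index, here is a rough sketch of the flat approach. It rests on several assumptions not stated above: that i2k and k2i map identifiers to compact keys and back, that a flat index stores one composite key per triple in sorted order, and that a sorted Python list with bisect is an acceptable stand-in for an ordered BTree. FlatIndexStore is a hypothetical name, and only the spo index is shown.

```python
import bisect


class FlatIndexStore:
    """Toy flat-index store; a sorted list stands in for an ordered BTree.

    Hypothetical sketch; not the layout of rdflib's Sleepycat backend.
    """

    def __init__(self):
        self.i2k = {}  # identifier -> compact integer key (assumed meaning)
        self.k2i = {}  # integer key -> identifier (assumed meaning)
        self.spo = []  # sorted list of (s_key, p_key, o_key) tuples

    def _key(self, identifier):
        # Assign the next integer key the first time an identifier is seen.
        if identifier not in self.i2k:
            k = len(self.i2k)
            self.i2k[identifier] = k
            self.k2i[k] = identifier
        return self.i2k[identifier]

    def add(self, s, p, o):
        entry = (self._key(s), self._key(p), self._key(o))
        pos = bisect.bisect_left(self.spo, entry)
        # Insert in sorted position, skipping duplicates.
        if pos == len(self.spo) or self.spo[pos] != entry:
            self.spo.insert(pos, entry)

    def triples_for_subject(self, s):
        """Range-scan every entry whose first component is s's key."""
        if s not in self.i2k:
            return
        sk = self.i2k[s]
        # (sk,) sorts before every (sk, p, o), so this finds the first match.
        pos = bisect.bisect_left(self.spo, (sk,))
        while pos < len(self.spo) and self.spo[pos][0] == sk:
            _, pk, ok = self.spo[pos]
            yield (s, self.k2i[pk], self.k2i[ok])
            pos += 1
```

In a flat layout a lookup becomes a prefix range scan over one ordered structure rather than a walk through dictionaries of dictionaries; the other index names listed above would, under the same assumptions, be further sorted structures with the components in a different order.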