Data Stores ...

Status of this Document

This document is no longer maintained. For a more recent report on database schemas, please see Mapping Semantic Web Data with RDBMSes.

Following is a snapshot of script output as of 22 April, 2003. The script is no longer available and the source data should be considered static.

Survey of RDF/Triple Data Stores

RDFStore - Perl API for RDF Storage

Property	Value
author	Alberto Reggiori
introduction	RDFStore implements a generic hashed data store that can serialise RDF models, resources, properties and property values either to disk or to in-memory data structures. It supports several persistent storage models, such as SDBM, BerkeleyDB (standard and Sleepycat) and DBMS. The latter is a custom TCP/IP-based storage library that allows a Perl script to transparently read and write hashed data values stored on a remote database server. One RDFStore database currently consists of 4 on-disk DB files, but a completely new indexing method that should use 5 distinct files is under development.
implementation	The DBMS storage module is a fast, networked, transactional object store that uses multiple single-key, hash-based BerkeleyDB databases along with object serialization in Perl and an optimized network routing daemon with a single thread/process per database. The actual running code consists of two parts: the TCP/IP daemon (written in C) and a Perl extension that ties hashes to DBMS storage. The daemon can handle multiple connections concurrently; each table accessor is given its own thread of execution by forking. Forking means no locking overhead; dbmsd supports only the original BerkeleyDB 1.85-style interface. All operations are atomic and serialised using a FIFO-like algorithm; the storage supports arbitrarily sized data. To reduce latency and avoid stagger situations, dbms uses non-blocking IO and extensive buffering, but the dbms server is still 100% IO limited. The DBMS system has been tested in the past with threading (like rdfdb) instead of forking, but it did not show any serious performance gain (in part because threads were not yet that efficient on either FreeBSD 3.2 or Linux 2.2.3).
query	Triples-matching. Free-text search over triples and a SQL/DBI interface to persistent storage are planned.
inference	basic RDF Schema inference
scalability	1470000 triples get stored in a ~98MB database. A new indexing method is under development that should reduce the storage requirements even more.
performance	Remote DBMS has been tested in the past at 2000 tps. Local Sleepycat BerkeleyDB storage with locking under apache+mod_perl performs ~183 read operations/second.
provenance	None
license	BSD like license
api	Perl 5 APIs
transaction	None
platform	Tested on FreeBSD and Linux but should run on any platform Perl runs :)
seealso	http://rdfstore.jrc.it/dbms.html
lastupdate	2001-06-06
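
As a rough reading of the scalability figure above, the average on-disk cost per triple can be worked out directly. This is a back-of-the-envelope sketch only: it assumes that "MB" means 10^6 bytes (the entry does not say), and the variable names are illustrative.

```python
# Figures quoted in the RDFStore scalability entry.
TRIPLES = 1_470_000
DATABASE_BYTES = 98 * 10**6  # assumption: 1 MB = 10**6 bytes

# Average storage cost per triple, index overhead included.
bytes_per_triple = DATABASE_BYTES / TRIPLES
print(f"{bytes_per_triple:.1f} bytes per triple")  # prints "66.7 bytes per triple"
```

Under that assumption the store spends roughly 67 bytes per triple, which is the baseline the planned indexing method would have to improve on.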

Cwm

Property	Value
author	Tim Berners-Lee, Dan Connolly, et al
introduction	A general purpose data processor for the Semantic Web. Not optimized, but demonstrating the feasibility of Semantic Web ideas. Please see the Cwm home page for details.
implementation	Python, Open source
query	Notation3 is an RDF syntax that is extended to be able to express queries and rules.
inference	Forward chaining, with built-in functions and remote query delegation
scalability
performance	Not optimized.
provenance
license	W3C licence
api
transaction
platform	Any Python platform.
distribution	Query delegation
seealso
lastupdate	2003-02-26
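
The forward chaining listed under inference can be illustrated with a naive fixed-point loop over triples. This is a minimal sketch and not Cwm's code or API: the functions `match`, `match_all` and `forward_chain`, the rule format, and the example vocabulary are all hypothetical, and built-in functions and remote query delegation are left out.

```python
def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):              # variable term
            if result.setdefault(p, t) != t:
                return None                # already bound to something else
        elif p != t:                       # constant term must match exactly
            return None
    return result


def match_all(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new triple is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Materialise matches first so the set is not mutated mid-iteration.
            for b in list(match_all(premises, facts)):
                new = tuple(b.get(term, term) for term in conclusion)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts


# Hypothetical rule: a parent of a parent is a grandparent.
rules = [
    ((("?x", "parent", "?y"), ("?y", "parent", "?z")), ("?x", "grandparent", "?z")),
]
facts = {("alice", "parent", "bob"), ("bob", "parent", "carol")}
derived = forward_chain(facts, rules)
```

After the loop settles, `derived` holds the two original triples plus `("alice", "grandparent", "carol")`.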

The RDF Schema Specific DataBase (RSSDB)

Property	Value
author	Sofia Alexaki ICS-FORTH mailto:alexaki@ics.forth.gr
introduction	RSSDB is a persistent RDF store for loading resource descriptions in an object-relational DBMS (ORDBMS) by exploiting the available RDF schema knowledge. It preserves the flexibility of RDF in refining schemas and/or enriching descriptions at any time, while it can be customized in several ways (as opposed to triple-based repositories) according to the specificities of both the manipulated RDF descriptions (i.e., schemas) and the underlying RDF application queries.
implementation	RSSDB has been implemented on top of an ORDBMS (i.e., PostgreSQL). It comprises a Loading and an Update module, both implemented in Java using a number of primitive methods (i.e., APIs) for inserting/deleting/modifying RDF triples. Access to the ORDBMS is accomplished through the JDBC interface in order to ensure interoperability with various commercial or public domain ORDBMSs.
query	Querying of stored RDF descriptions is accomplished by RQL. RQL is a typed language following a functional approach (a la ODMG OQL) and supports generalized path expressions featuring variables on both labels for nodes (i.e., classes) and edges (i.e., properties). RQL relies on a formal graph model (as opposed to triple-based approaches) that captures the RDF modeling primitives and permits the interpretation of superimposed resource descriptions by means of one or more schemas. The novelty of RQL lies in its ability to smoothly switch between schema and data querying while exploiting, in a transparent way, the taxonomies of labels and multiple classification of resources. The functionality and formal interpretation of RQL are given for several classes of useful queries required by Semantic Web applications. For a comparison between RQL and Squish see http://swordfish.rdfweb.org:8085/tests/.
inference	RSSDB and RQL have built-in support for recursive traversal of class and property hierarchies. Furthermore, RQL provides universal and existential quantification over RDFS classes and properties. Thus, members of a Community Web are able to query resources described according to their preferred schema, while subsequently discovering how the same resources are also described using another community schema. Finally, RQL fully supports (a) XML Schema data types (for filtering literal values), (b) powerful grouping primitives (for constructing complex XML results), (c) aggregate functions (for extracting statistics) and, (d) in the near future, sorting. An online demo of the RQL filtering/navigation/restructuring capabilities is available at http://139.91.183.30:9090/RDF/RQL/.
scalability	Our experiments showed that the size of the DBMS scales linearly with the number of triples (see http://139.91.183.30:9090/RDF/publications/semweb2001.html). We used as testbed the Open Directory RDF dump, which comprises about 6 million triples.
performance	In most real-scale RDF applications, variations of a basic database representation are required in order to take into account the specific characteristics of the employed schema classes and properties, as well as those of the intended query functionality. The main goal of the RSSDB schema-specific representation is the separation of the RDF schema from data information, as well as the distinction between unary and binary relations holding the instances of classes and properties. We have carried out experiments in order to compare the RSSDB representation with the triple-based one, using as testbed the Open Directory RDF dump. The results illustrate that our approach yields considerable performance gains in query processing and storage volumes. Detailed information about the database size and time required for the storage of RDF descriptions, as well as about the querying time for both representations, can be found in the publication "The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases" at the URL http://139.91.183.30:9090/RDF/publications/semweb2001.html
license	C-Web (c-web.inria.fr) Open Source Software License
api	We are currently working on the specification of a Java API covering the whole spectrum of RDF manipulation, namely, constructing, validating, storing, updating, and querying RDF triples.
transaction	Dependent on the underlying ORDBMS support (user option)
platform	Java 2 Platform. Tested on Solaris and Linux (RQL+PostgreSQL).
seealso	RDFSuite http://139.91.183.30:9090/RDF/
lastupdate	2001-08-15
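
The contrast drawn in the performance entry, between a generic triple table and a schema-specific layout with unary relations for class instances and binary relations for property instances, can be sketched as follows. This is not RSSDB's actual schema: the table names and sample data are hypothetical, and Python's built-in `sqlite3` module stands in for the ORDBMS purely so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Generic representation: one table holds every statement.
cur.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
cur.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:guernica", "rdf:type", "ex:Painting"),
        ("ex:guernica", "ex:paintedBy", "ex:picasso"),
        ("ex:picasso", "rdf:type", "ex:Painter"),
    ],
)

# Schema-specific representation (hypothetical names): a unary relation
# per class and a binary relation per property.
cur.execute("CREATE TABLE class_painting (uri TEXT PRIMARY KEY)")
cur.execute("CREATE TABLE class_painter (uri TEXT PRIMARY KEY)")
cur.execute("CREATE TABLE prop_painted_by (source TEXT, target TEXT)")
cur.execute("INSERT INTO class_painting VALUES ('ex:guernica')")
cur.execute("INSERT INTO class_painter VALUES ('ex:picasso')")
cur.execute("INSERT INTO prop_painted_by VALUES ('ex:guernica', 'ex:picasso')")

# "Which paintings were painted by which painters?" on the triple table
# needs a three-way self-join with predicate filters.
generic = cur.execute(
    """
    SELECT p.subject, p.object
    FROM triples p
    JOIN triples s ON s.subject = p.subject
    JOIN triples o ON o.subject = p.object
    WHERE p.predicate = 'ex:paintedBy'
      AND s.predicate = 'rdf:type' AND s.object = 'ex:Painting'
      AND o.predicate = 'rdf:type' AND o.object = 'ex:Painter'
    """
).fetchall()

# The schema-specific layout answers the same question by joining
# small dedicated tables.
specific = cur.execute(
    """
    SELECT pb.source, pb.target
    FROM prop_painted_by pb
    JOIN class_painting c ON c.uri = pb.source
    JOIN class_painter r ON r.uri = pb.target
    """
).fetchall()
```

Both queries return the same single row; the difference lies in how much of one large table each has to scan, which is the kind of effect the cited experiments measure.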

rdfDB

Property	Value
author	R.V.Guha
introduction	rdfDB is intended to be a simple, scalable, open-source database for RDF.
implementation	Small, C based, uses Sleepycat DB for on-disk storage.
query	Supports a graph-oriented API via a textual query language a la SQL (aka Squish). C and Perl bindings for this API.
inference	none.
scalability	Tested with about 20 million triples. Should scale to much more.
license	Mozilla Public License
api	C and Perl APIs.
transaction	none
platform	Unix (linux, bsd, solaris)
lastupdate	2001-05-22

Redland

Property	Value
author	Dave Beckett
introduction	Redland abstracts the storage implementation, but I'll consider the Berkeley DB (BDB) based storage which uses several (currently 3) on-disk BDB databases.
query	triples-matching with wildcards
inference	None
scalability	Unknown, but tested with 1.5M stored statements.
performance	The exact query speed is 6,200 statements/second for the 1.5M statements stored
provenance	None
license	3 alternatives - GPL, LGPL or MPL (Mozilla)
api	C (native); Perl; Python; Tcl; will compile with C++; Java support tested, not complete
transaction	None
platform	Pretty portable POSIX based - has been built on Linux, Solaris, OSF/1 Alpha, FreeBSD, MacOS X.
distribution	None
lastupdate	2001-04-20
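
Triples-matching with wildcards, as listed under query, amounts to filtering statements against a pattern in which any position may be left open. The following is a minimal sketch of the idea, not Redland's API: the function `find_statements` and the sample data are hypothetical, and `None` is used here as the wildcard.

```python
def find_statements(statements, subject=None, predicate=None, obj=None):
    """Return the statements matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        s for s in statements
        if all(want is None or want == have for want, have in zip(pattern, s))
    ]


statements = [
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc1", "dc:title", "Survey"),
    ("ex:doc2", "dc:creator", "ex:alice"),
]

# Everything created by ex:alice, whatever the subject.
by_alice = find_statements(statements, predicate="dc:creator", obj="ex:alice")
```

This version scans every statement linearly; a disk-backed store would normally answer the same patterns from indexes instead.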

Extensible Open Rdf

Property	Value
author	OCLC Office of Research and the Dublin Core Metadata Initiative
introduction	EOR is an open source project, whose goal is to facilitate the rapid development of RDF applications focused on the discovery, management, integration and navigation of metadata.
implementation	The EOR toolkit is a collection of extensible Java classes and services which serve as a code base, demonstrating by example functions and services common to RDF applications, i.e., metadata capture, search engines, etc.
query	Triples-matching with wildcards. Model-centric approach. RDB/JDBC data store using Melnik "Hashed With Origin" data model.
scalability	Unknown, but tested with > 1000 RDF models.
performance	Unknown.
license	Dublin Core Open Source Software License
api	SQL/JDBC
transaction	Dependent on the underlying RDB (user option)
platform	All Java Platforms
lastupdate	2001-06-12

Sesame

Property	Value
author	Sesame
introduction	Sesame is a system consisting of a repository, a query engine and an administration module for adding and deleting RDF data and Schema information. Sesame is being developed as part of the OnToKnowledge project. A public demo server running Sesame can be found at http://sesame.aidministrator.nl/. This site also contains documentation on Sesame. People who would like to test Sesame on their own data can get their own repository on this demo server.
implementation	Sesame's repository is not only a repository for RDF, but also for RDF Schema. Sesame understands the semantics of most of the RDF Schema classes and properties and correctly handles transitive properties like rdfs:subClassOf and rdfs:subPropertyOf. Sesame currently uses PostgreSQL for its repository, but it can switch to other (kinds of) databases quite easily. The rest of Sesame is completely implemented in Java and can run on any platform for which there exists a Java 2 runtime environment.
query	The language for the query engine is based on RQL (from ICS-FORTH), which offers full support for querying both plain RDF and RDF Schema. Our RQL implementation is slightly different than ICS-FORTH's because our interpretation of RDF Schema differs from theirs and because Sesame is less restrictive on the (RDF Schema-) ontologies that can be used. Our query engine does not support all features of RQL yet.
inference	Sesame supports the basic inferencing needed for supporting RDF Schema, such as transitivity of subClassOf- and subPropertyOf-properties.
scalability	Unknown, but tested with 300,000 stored statements from the wordnet nouns file available at http://www.semanticweb.org/library/
performance	We haven't done any serious performance testing on Sesame yet, but the aim for the OnToKnowledge project is to support at least ontologies of O(10^3) classes and O(10^5) triples on desk-top hardware. Note that these are minimum requirements on Sesame, and that it probably will support larger ontologies and larger numbers of triples.
platform	Java 2
seealso	Some documents and papers related to Sesame: [1] OnToKnowledge deliverable 9: Query Language Definition http://www.ontoknowledge.org/countd/countdown.cgi?del9.pdf [2] Babysteps in Sesame RQL: a tutorial on Sesame's RQL http://sesame.aidministrator.nl/doc/rql-babysteps.html [3] Sesame's interpretation of RDF Schema http://sesame.aidministrator.nl/doc/rdf-interpretation.html
lastupdate	2001-05-22
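
The transitivity of rdfs:subClassOf mentioned under inference can be sketched as a closure computation followed by type propagation. This is an illustration under stated assumptions and not Sesame's implementation: the functions `subclass_closure` and `infer_types` and the example classes are hypothetical.

```python
def subclass_closure(direct):
    """Return all (sub, super) pairs implied by transitivity of subClassOf."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a subClassOf b and b subClassOf d implies a subClassOf d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def infer_types(type_statements, closure):
    """Add the (resource, class) pairs implied by the class hierarchy."""
    inferred = set(type_statements)
    for resource, cls in type_statements:
        for sub, sup in closure:
            if sub == cls:
                inferred.add((resource, sup))
    return inferred


direct = {("ex:Painter", "ex:Artist"), ("ex:Artist", "ex:Person")}
closure = subclass_closure(direct)
types = infer_types({("ex:picasso", "ex:Painter")}, closure)
```

Here `closure` gains the pair `("ex:Painter", "ex:Person")`, and `ex:picasso` is inferred to be an `ex:Artist` and an `ex:Person` as well as an `ex:Painter`. The same closure applies to rdfs:subPropertyOf.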

Jena

Property	Value
author	The Jena Team
introduction	Jena is an RDF toolkit that contains: a model/graph API; persistent storage and in-memory storage; an RDF parser; a query language; and support for DAML+OIL ontologies. Jena has a storage abstraction that enables new storage subsystems to be integrated. The persistent storage mechanisms are currently based on BerkeleyDB and SQL.
implementation	The SQL implementation of this storage system supports multiple database layouts and different database types through a mixture of Java subclassing and dynamically loaded SQL driver files. Current versions support two variants on a generic triple table layout and two variants on a hash-indexed layout, and have been tested on Interbase and PostgreSQL. Other layouts are in development. The BerkeleyDB implementation uses a number of tables: the base data and the indexing tables SP->O, PO->S and OS->P.
query	The query language is RDQL, which is a syntax and a query API that can extract information from a model. RDQL is not tied to any storage implementation but can be used with any Jena model implementation, including any storage mechanism. RDQL provides subgraph patterns and boolean expressions in an SQL-like syntax (see SquishQL).
inference	The current toolkit does not provide any inferencing mechanisms. RDQL does not provide inference; model implementers can do so if this manifests itself through the triple interface.
scalability	Unknown - limitations are due to the underlying storage technology. RDQL queries have been executed on 800K statement models (custom memory mapped file). In-memory storage has been used with 600K statements (wordnet).
performance	Small scale performance tests have used a tiny fragment of the dmoz dataset and tested link following, reversed link following and search based on string and integer matching constraints. Performance on workstation class machines for the SQL store is around 10ms/statement load, 1-7ms/returned-statement search. The BDB implementation is typically 10x faster but does not provide transaction support.
provenance	No support provided
license	BSD (version 1)
api	Java
transaction	SQL backend supports transactions through the transaction isolation capability of the underlying database. The BDB subsystem does not currently support transactions.
platform	Java 1.2 and up. Tested on MS Windows and Linux with BerkeleyDB version 3.3.11
distribution	None
lastupdate	2001-11-23
seealso	More information is available at the HPLabs semantic web website
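
The three indexing tables named in the implementation entry (SP->O, PO->S and OS->P) each map a pair of known terms to the remaining one. The sketch below shows that idea with in-memory dictionaries. It is not Jena's code or its BerkeleyDB layout: the class `IndexedStore`, its method names, and the sample data are hypothetical.

```python
from collections import defaultdict


class IndexedStore:
    """Toy triple store with one lookup table per pair of known terms."""

    def __init__(self):
        self.statements = set()        # base data
        self.sp_o = defaultdict(set)   # (subject, predicate) -> objects
        self.po_s = defaultdict(set)   # (predicate, object)  -> subjects
        self.os_p = defaultdict(set)   # (object, subject)    -> predicates

    def add(self, s, p, o):
        """Store a statement and update all three indexes."""
        self.statements.add((s, p, o))
        self.sp_o[(s, p)].add(o)
        self.po_s[(p, o)].add(s)
        self.os_p[(o, s)].add(p)

    def objects(self, s, p):
        """Link following: values of property p on subject s."""
        return set(self.sp_o.get((s, p), ()))

    def subjects(self, p, o):
        """Reversed link following: subjects whose property p has value o."""
        return set(self.po_s.get((p, o), ()))

    def predicates(self, o, s):
        """Properties that connect subject s to object o."""
        return set(self.os_p.get((o, s), ()))


store = IndexedStore()
store.add("ex:doc1", "dc:creator", "ex:alice")
store.add("ex:doc2", "dc:creator", "ex:alice")
store.add("ex:doc1", "dc:publisher", "ex:alice")
```

With these indexes, any query that fixes two of the three positions is a single key lookup, at the cost of writing each statement four times.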