W3C

Status of this Document

This document is no longer maintained. For a more recent report on database schemas, please see Mapping Semantic Web Data with RDBMSes.

Following is a snapshot of script output as of 22 April, 2003. The script is no longer available and the source data should be considered static.

Survey of RDF/Triple Data Stores

RDFStore - Perl API for RDF Storage

Property Value
author Alberto Reggiori
introduction RDFStore implements a generic hashed data storage that can serialise RDF models, resources, properties and property values either to disk or to in-memory data structures. It supports several persistent storage models, such as SDBM, BerkeleyDB (standard and Sleepycat) and DBMS. The latter is a custom TCP/IP-based storage library that allows a Perl script to transparently read and write hashed data values stored on a remote database server. One RDFStore database currently consists of 4 on-disk DB files, but a completely new indexing method that should use 5 distinct files is under development.
implementation The DBMS storage module is a fast networked transactional object store that uses multiple single-key hash-based BerkeleyDB databases along with object serialization in Perl and an optimized network routing daemon with a single thread/process per database. The actual running code consists of two parts: the TCP/IP daemon (written in C) and a Perl extension to tie hashes to DBMS storages. The daemon can handle multiple connections concurrently, where each table accessor is given its own thread of execution by forking. Having them forked means no locking overhead; the dbmsd supports only the original BerkeleyDB 1.85 style interface. All operations are atomic and serialised using a FIFO-like algorithm; the storage supports arbitrarily sized data. To reduce latency and avoid stagger situations, dbms uses non-blocking I/O and extensive buffering, but the dbms server is still 100% I/O limited. The DBMS system has been tested in the past with threading (like rdfdb) instead of forking, but it did not show any serious performance gain (in part because threads were not yet that efficient on either FreeBSD 3.2 or Linux 2.2.3).
query Triples-matching. Free-text search over triples and a SQL/DBI interface on persistent storages are planned.
inference basic RDF Schema inference
scalability 1470000 triples get stored in a ~98MB database. A new indexing method is under development that should reduce the storage requirements even more.
performance Remote DBMS has been tested in the past at 2000 tps. Local Sleepycat BerkeleyDB storages using locking under Apache+mod_perl perform ~183 read operations/second.
provenance None
license BSD like license
api Perl 5 APIs
transaction None
platform Tested on FreeBSD and Linux but should run on any platform Perl runs :)
seealso http://rdfstore.jrc.it/dbms.html
lastupdate 2001-06-06
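
The scalability figures in the table above imply an average storage cost per triple. The following is a minimal arithmetic sketch, assuming that "MB" means 2^20 bytes (the survey does not say) and that the ~98MB figure covers all on-disk files; the function name is hypothetical.

```python
# Figures quoted in the RDFStore scalability row.
TRIPLES = 1_470_000
DB_SIZE_MB = 98  # approximate, as reported


def bytes_per_triple(triples: int, size_mb: float, mb: int = 1024 * 1024) -> float:
    """Average on-disk bytes per stored triple.

    `mb` is the number of bytes assumed per megabyte (an assumption,
    since the survey does not define the unit).
    """
    return size_mb * mb / triples


binary = bytes_per_triple(TRIPLES, DB_SIZE_MB)               # MB = 2**20 bytes
decimal = bytes_per_triple(TRIPLES, DB_SIZE_MB, mb=10 ** 6)  # MB = 10**6 bytes

print(f"about {binary:.0f} bytes per triple (binary MB)")
print(f"about {decimal:.0f} bytes per triple (decimal MB)")
```

Under either reading of the unit the result is roughly 67 to 70 bytes per triple, including index overhead.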

Cwm

Property Value
author Tim Berners-Lee, Dan Connolly, et al
introduction A general purpose data processor for the Semantic Web. Not optimized, but demonstrating the feasibility of Semantic Web ideas. Please see the Cwm home page for details.
implementation Python, Open source
query Notation3 is an RDF syntax which is extended to be able to express queries and rules.
inference Forward chaining, with built-in functions and remote query delegation
scalability
performance Not optimized.
provenance
license W3C licence
api
transaction
platform Any Python platform.
distribution Query delegation
seealso
lastupdate 2003-02-26
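
The forward chaining mentioned in the inference row can be illustrated with a small sketch. This is not Cwm's code or its rule syntax: the rule representation (a list of body patterns plus one head pattern, with variables written as strings starting with "?") and all function names are assumptions made for illustration only.

```python
def is_var(term):
    """Variables are strings that start with '?' (an illustrative convention)."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None      # constant mismatch
    return new


def solve(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from solve(patterns[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new triple is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Materialise the matches first so `known` is not modified
            # while it is being iterated.
            for b in list(solve(body, known)):
                derived = tuple(b.get(term, term) for term in head)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


facts = {("alice", "parent", "bob"), ("bob", "parent", "carol")}
rules = [
    ([("?x", "parent", "?y")], ("?x", "ancestor", "?y")),
    ([("?x", "parent", "?y"), ("?y", "ancestor", "?z")], ("?x", "ancestor", "?z")),
]
closure = forward_chain(facts, rules)
print(sorted(t for t in closure if t[1] == "ancestor"))
```

The loop stops at a fixpoint, so recursive rules such as the second one terminate once every derivable triple is present.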

The RDF Schema Specific DataBase (RSSDB)

Property Value
author Sofia Alexaki ICS-FORTH mailto:alexaki@ics.forth.gr
introduction RSSDB is a persistent RDF Store for loading resource descriptions in an object-relational DBMS (ORDBMS) by exploiting the available RDF schema knowledge. It preserves the flexibility of RDF in refining schemas and/or enriching descriptions at any time whilst it can be customized in several ways (as opposed to triple-based repositories) according to the specificities of both the manipulated RDF descriptions (i.e., schemas) and the underlying RDF application queries.
implementation RSSDB has been implemented on top of an ORDBMS (i.e., PostgreSQL). It comprises a Loading and an Update module, both implemented in Java using a number of primitive methods (i.e., APIs) for inserting/deleting/modifying RDF triples. Access to the ORDBMS is accomplished through the JDBC interface in order to ensure interoperability with various commercial or public domain ORDBMS.
query Querying of stored RDF descriptions is accomplished by RQL. RQL is a typed language following a functional approach (a la ODMG OQL) and supports generalized path expressions featuring variables on both labels for nodes (i.e., classes) and edges (i.e., properties). RQL relies on a formal graph model (as opposed to triple-based approaches) that captures the RDF modeling primitives and permits the interpretation of superimposed resource descriptions by means of one or more schemas. The novelty of RQL lies in its ability to smoothly switch between schema and data querying while exploiting - in a transparent way - the taxonomies of labels and multiple classification of resources. The functionality and formal interpretation of RQL is given for several classes of useful queries required by Semantic Web Applications. For a comparison between RQL and Squish see http://swordfish.rdfweb.org:8085/tests/.
inference RSSDB and RQL have built-in support for recursive traversal of class and property hierarchies. Furthermore, RQL provides universal and existential quantification over RDFS classes and properties. Thus, members of a Community Web are able to query resources described according to their preferred schema and then discover how the same resources are also described using another community schema. Finally, RQL fully supports (a) XML Schema data types (for filtering literal values), (b) powerful grouping primitives (for constructing complex XML results), (c) aggregate functions (for extracting statistics) and, (d) in the near future, sorting. An online demo of the RQL filtering/navigation/restructuring capabilities is available at http://139.91.183.30:9090/RDF/RQL/.
scalability Our experiments showed that the size of the database scales linearly with the number of triples (see http://139.91.183.30:9090/RDF/publications/semweb2001.html). We used the Open Directory RDF dump, which comprises about 6 million triples, as the testbed.
performance In most real-scale RDF applications, variations of a basic database representation are required in order to take into account the specific characteristics of the employed schema classes and properties, as well as those of the intended query functionality. The main goal of the RSSDB schema-specific representation is the separation of the RDF schema from data information, as well as the distinction between unary and binary relations holding the instances of classes and properties. We have carried out experiments in order to compare the RSSDB representation with the triple-based one, using the Open Directory RDF dump as the testbed. The results illustrate that our approach yields considerable performance gains in query processing and storage volumes. Detailed information about the database size and time required for the storage of RDF descriptions, as well as about the querying time for both representations, can be found in the publication "The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases" at http://139.91.183.30:9090/RDF/publications/semweb2001.html
license C-Web (c-web.inria.fr) Open Source Software License
api We are currently working on the specification of a Java API covering the whole spectrum of RDF manipulation, namely constructing, validating, storing, updating, and querying RDF triples.
transaction Dependent on the underlying ORDBMS support (user option)
platform Java 2 Platform. Tested on Solaris and Linux (RQL+PostgreSQL).
seealso RDFSuite http://139.91.183.30:9090/RDF/
lastupdate 2001-08-15
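
The contrast drawn in the performance row, between a generic triple table and a schema-specific layout with unary relations for classes and binary relations for properties, can be sketched as follows. This is an illustration only: it uses SQLite from the Python standard library instead of PostgreSQL, and every table name, column name and URI is hypothetical rather than taken from RSSDB.

```python
import sqlite3

# Hypothetical mapping from schema terms to tables; RSSDB's real schema
# is not described in this survey.
CLASS_TABLES = {"ex:Painter": "class_painter", "ex:Artist": "class_artist"}
PROPERTY_TABLES = {"ex:paints": "prop_paints"}


def create_schema(conn):
    conn.executescript("""
        -- Generic layout: every statement is a row in one table.
        CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
        -- Schema-specific layout: schema information kept apart from data,
        -- one unary relation per class, one binary relation per property.
        CREATE TABLE subclass_of (subclass TEXT, superclass TEXT);
        CREATE TABLE class_painter (uri TEXT PRIMARY KEY);
        CREATE TABLE class_artist (uri TEXT PRIMARY KEY);
        CREATE TABLE prop_paints (source TEXT, target TEXT);
    """)


def load(conn, statements):
    """Store each statement in both layouts so they can be compared."""
    for s, p, o in statements:
        conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (s, p, o))
        if p == "rdfs:subClassOf":
            conn.execute("INSERT INTO subclass_of VALUES (?, ?)", (s, o))
        elif p == "rdf:type":
            # Table names come from the fixed mapping above, not user input.
            conn.execute(f"INSERT OR IGNORE INTO {CLASS_TABLES[o]} VALUES (?)", (s,))
        else:
            conn.execute(f"INSERT INTO {PROPERTY_TABLES[p]} VALUES (?, ?)", (s, o))


conn = sqlite3.connect(":memory:")
create_schema(conn)
load(conn, [
    ("ex:Painter", "rdfs:subClassOf", "ex:Artist"),
    ("ex:picasso", "rdf:type", "ex:Painter"),
    ("ex:rodin", "rdf:type", "ex:Artist"),
    ("ex:picasso", "ex:paints", "ex:guernica"),
])

# The same question asked of each layout.
generic = conn.execute(
    "SELECT subject, object FROM triples WHERE predicate = 'ex:paints'"
).fetchall()
specific = conn.execute("SELECT source, target FROM prop_paints").fetchall()
print(generic, specific)
```

In the schema-specific layout the property query scans only the rows of that property, whereas the generic layout has to filter one table that holds every statement.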

rdfDB

Property Value
author R.V.Guha
introduction rdfDB is intended to be a simple, scalable, open-source database for RDF.
implementation Small, C based, uses Sleepycat DB for on-disk storage.
query Supports a graph-oriented API via a textual query language a la SQL (aka Squish). C and Perl bindings for this API.
inference none.
scalability Tested with about 20 million triples. Should scale to much more.
license Mozilla Public License
api C and Perl APIs.
transaction none
platform Unix (Linux, BSD, Solaris)
lastupdate 2001-05-22

Redland

Property Value
author Dave Beckett
introduction Redland abstracts the storage implementation, but I'll consider the Berkeley DB (BDB) based storage which uses several (currently 3) on-disk BDB databases.
query triples-matching with wildcards
inference None
scalability Unknown, but tested with 1.5M stored statements.
performance The exact query speed is 6,200 statements/second for the 1.5M statements stored
provenance None
license 3 alternatives - GPL, LGPL or MPL (Mozilla)
api C (native); Perl; Python; Tcl; will compile with C++; Java support tested, not complete
transaction None
platform Pretty portable POSIX based - has been built on Linux, Solaris, OSF/1 Alpha, FreeBSD, MacOS X.
distribution None
lastupdate 2001-04-20
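
"Triples-matching with wildcards", as listed in the query row above, can be illustrated with a short sketch. This is not Redland's API: the function name and the use of None as the wildcard are assumptions, and a plain list stands in for the BDB-backed store.

```python
def find_statements(store, subject=None, predicate=None, obj=None):
    """Yield statements matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    for statement in store:
        if all(want is None or want == got
               for want, got in zip(pattern, statement)):
            yield statement


store = [
    ("ex:doc1", "dc:creator", "Dave"),
    ("ex:doc1", "dc:title", "Notes"),
    ("ex:doc2", "dc:creator", "Dave"),
]

by_dave = list(find_statements(store, predicate="dc:creator", obj="Dave"))
about_doc1 = list(find_statements(store, subject="ex:doc1"))
print(by_dave)
print(about_doc1)
```

A linear scan like this is only for illustration; a real store would answer such patterns from its indexes.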

Extensible Open RDF

Property Value
author OCLC Office of Research and the Dublin Core Metadata Initiative
introduction EOR is an open source project, whose goal is to facilitate the rapid development of RDF applications focused on the discovery, management, integration and navigation of metadata.
implementation The EOR toolkit is a collection of extensible Java classes and services which serve as a code base, demonstrating by example functions and services common to RDF applications, i.e., metadata capture, search engines, etc.
query Triples-matching with wildcards. Model-centric approach. RDB/JDBC data store using Melnik "Hashed With Origin" data model.
scalability Unknown, but tested with > 1000 RDF models.
performance Unknown.
license Dublin Core Open Source Software License
api SQL/JDBC
transaction Dependent on the underlying RDB (user option)
platform All Java Platforms
lastupdate 2001-06-12

Sesame

Property Value
author Sesame
introduction Sesame is a system consisting of a repository, a query engine and an administration module for adding and deleting RDF data and Schema information. Sesame is being developed as part of the OnToKnowledge project. A public demo server running Sesame can be found at http://sesame.aidministrator.nl/. This site also contains documentation on Sesame. People who would like to test Sesame on their own data can get their own repository on this demo server.
implementation Sesame's repository is not only a repository for RDF, but also for RDF Schema. Sesame understands the semantics of most of the RDF Schema classes and properties and correctly handles transitive properties like rdfs:subClassOf and rdfs:subPropertyOf. Sesame currently uses PostgreSQL for its repository, but it can switch to other (kinds of) databases quite easily. The rest of Sesame is completely implemented in Java and can run on any platform for which there exists a Java 2 runtime environment.
query The language for the query engine is based on RQL (from ICS-FORTH), which offers full support for querying both plain RDF and RDF Schema. Our RQL implementation is slightly different than ICS-FORTH's because our interpretation of RDF Schema differs from theirs and because Sesame is less restrictive on the (RDF Schema-) ontologies that can be used. Our query engine does not support all features of RQL yet.
inference Sesame supports the basic inferencing needed for supporting RDF Schema, such as transitivity of subClassOf- and subPropertyOf-properties.
scalability Unknown, but tested with 300,000 stored statements from the wordnet nouns file available at http://www.semanticweb.org/library/
performance We haven't done any serious performance testing on Sesame yet, but the aim for the OnToKnowledge project is to support *at least* ontologies of O(10^3) classes and O(10^5) triples on desk-top hardware. Note that these are minimum requirements on Sesame, and that it probably will support larger ontologies and larger numbers of triples.
platform Java 2
seealso Some documents and papers related to Sesame:

[1] OnToKnowledge deliverable 9: Query Language Definition http://www.ontoknowledge.org/countd/countdown.cgi?del9.pdf

[2] Babysteps in Sesame RQL: a tutorial on Sesame's RQL http://sesame.aidministrator.nl/doc/rql-babysteps.html

[3] Sesame's interpretation of RDF Schema http://sesame.aidministrator.nl/doc/rdf-interpretation.html
lastupdate 2001-05-22
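
The transitivity of rdfs:subClassOf described in the inference row can be sketched as a reachability computation over the declared subclass edges. This is an illustrative sketch, not Sesame's implementation; the function names and the sample class names are hypothetical.

```python
from collections import defaultdict


def superclasses(direct, cls):
    """All classes reachable from cls by following subClassOf edges."""
    parents = defaultdict(set)
    for sub, sup in direct:
        parents[sub].add(sup)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(types, direct, cls):
    """Resources typed as cls or as any direct or indirect subclass of cls."""
    return {r for r, t in types if t == cls or cls in superclasses(direct, t)}


direct = [("Dog", "Mammal"), ("Mammal", "Animal")]  # declared subClassOf pairs
types = [("rex", "Dog"), ("tweety", "Bird")]        # declared rdf:type pairs

print(superclasses(direct, "Dog"))
print(instances_of(types, direct, "Animal"))
```

A query for instances of "Animal" returns "rex" even though only its membership of "Dog" was stated, which is the kind of answer schema-aware inferencing is meant to provide.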

Jena

Property Value
author The Jena Team
introduction

Jena is an RDF toolkit that contains:

  • Model/graph API
  • Persistent storage and in-memory storage
  • An RDF parser
  • A query language
  • Support for DAML+OIL ontology

Jena has a storage abstraction that enables new storage subsystems to be integrated. The persistent storage mechanisms are currently based on BerkeleyDB and SQL.

implementation The SQL implementation of this storage system supports multiple database layouts and different database types through a mixture of Java subclassing and dynamically loaded SQL driver files. Current versions support two variants on a generic triple table layout, two variants on a hash indexed layout, and have been tested on Interbase and PostgreSQL. Other layouts are in development. The BerkeleyDB implementation uses a number of tables: the base data and indexing tables SP->O, PO->S and OS->P.
query

The query language is RDQL, which is a syntax and a query API that can extract information from a model. RDQL is not tied to any storage implementation but can be used with any Jena model implementation, including any storage mechanism.

RDQL provides subgraph patterns and boolean expressions in an SQL-like syntax (see SquishQL).

inference The current toolkit does not provide any inferencing mechanisms. RDQL does not provide inference; model implementers can do so if this manifests itself through the triple interface.
scalability Unknown - limitations are due to the underlying storage technology. RDQL queries have been executed on 800K statement models (custom memory mapped file). In-memory storage has been used with 600K statements (wordnet).
performance

Small scale performance tests have used a tiny fragment of the dmoz dataset and tested link following, reversed link following and search based on string and integer matching constraints. Performance on workstation class machines for the SQL store is around 10ms/statement load, 1-7ms/returned-statement search.

The BDB implementation is typically 10x faster but does not provide transaction support.

provenance No support provided
license BSD (version 1)
api Java
transaction SQL backend supports transactions through the transaction isolation capability of the underlying database. The BDB subsystem does not currently support transactions.
platform Java 1.2 and up. Tested on MS Windows and Linux with BerkeleyDB version 3.3.11
distribution None
lastupdate 2001-11-23
seealso

More information is available at the HPLabs semantic web website
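
The SP->O, PO->S and OS->P indexing tables named in the Jena implementation row can be illustrated with a small in-memory sketch. It is written in Python rather than Java, uses dictionaries in place of BerkeleyDB tables, and the class and method names are hypothetical; it is not Jena's code.

```python
from collections import defaultdict


class IndexedStore:
    """Base data plus three pair-keyed indexes, one per missing term."""

    def __init__(self):
        self.base = set()              # the statements themselves
        self.sp_o = defaultdict(set)   # (subject, predicate) -> objects
        self.po_s = defaultdict(set)   # (predicate, object) -> subjects
        self.os_p = defaultdict(set)   # (object, subject) -> predicates

    def add(self, s, p, o):
        self.base.add((s, p, o))
        self.sp_o[(s, p)].add(o)
        self.po_s[(p, o)].add(s)
        self.os_p[(o, s)].add(p)

    def objects(self, s, p):
        """Link following: which objects does s have for p?"""
        return self.sp_o.get((s, p), set())

    def subjects(self, p, o):
        """Reversed link following: which subjects point at o via p?"""
        return self.po_s.get((p, o), set())

    def predicates(self, s, o):
        """Which properties connect s to o?"""
        return self.os_p.get((o, s), set())


store = IndexedStore()
store.add("ex:page", "dc:creator", "ex:alice")
store.add("ex:page", "dc:contributor", "ex:alice")
store.add("ex:other", "dc:creator", "ex:alice")

print(store.objects("ex:page", "dc:creator"))
print(store.subjects("dc:creator", "ex:alice"))
print(store.predicates("ex:page", "ex:alice"))
```

Each lookup with two known terms becomes a single keyed access, at the cost of writing every statement to several tables on load.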