FOAF driven development
FOAF Galway Workshop position paper

Dave Beckett
Senior Technical Developer
Semantic Web Advanced Development Europe (SWADE),
Institute for Learning and Research Technology (ILRT),
University of Bristol

Abstract

This paper describes the key features of FOAF that have influenced the development of the Redland RDF Application Framework[1].

1. Introduction

The Redland RDF tools are a set of packages consisting of the Redland API, Raptor RDF parser library[2] and Rasqal RDF query library[3].

They have been developed since mid-2000, as described in [4], and provide core libraries that can be embedded in applications to add RDF support, as well as foundations for new semantic web applications. Redland is similar to earlier work in being object-based, but was designed to be cross-language as well as portable. The initial version concentrated on storing graphs and providing convenient ways to access and manipulate them, and to parse and serialize them to concrete syntaxes. This was an appropriate choice for the initial work but, in due course, feedback from applications was needed to direct further development.

This direction started to emerge as FOAF applications began to be built. They differed in several ways from earlier RDF applications, which tended to be specialised standalone applications such as Amaya and Annotea (for annotations), a data model embedded inside the Mozilla browser, or applications using early (before 2000) Java APIs.

FOAF has brought out and required the key features described in the following sections, much more than the earlier applications did.

2. The Web in "Semantic Web"

The Web part is often overlooked, but it is fundamental and gives the Semantic Web features and trends similar to those of the Web itself: the data is distributed and made by individuals as well as organisations; linking is expected and happens without pre-coordination; people need to jump from document to document to find things; and content is expected to be updated, moved, broken (404ed) and so on.

This led to adding a lot of resilience to bad data in Raptor[2], so that it can try to report the errors, and to making it deal with large RDF/XML files efficiently: streaming results using a minimum of memory and without leaking memory. The same Web characteristics also drove the developments described in the following sections.
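As a rough illustration of error reporting combined with streaming, the sketch below parses a heavily simplified N-Triples subset one line at a time. It is not Raptor's implementation or API: the N-Triples stand-in for RDF/XML, the regular expression, and the names `stream_triples` and `TRIPLE_RE` are all assumptions made for this example.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Simplified N-Triples line: <subject> <predicate> (<object> | "literal") .
# Assumption: no blank nodes, language tags or datatypes, to keep the sketch short.
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def stream_triples(lines: Iterable[str], errors: List[str]) -> Iterator[Triple]:
    """Yield triples one at a time; record bad lines in `errors` and carry on."""
    for lineno, line in enumerate(lines, start=1):
        # Skip blank lines and comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = TRIPLE_RE.match(line)
        if m is None:
            # Report the problem but keep going with the rest of the input.
            errors.append(f"line {lineno}: cannot parse {line.strip()!r}")
            continue
        s, p, o_uri, o_lit = m.groups()
        yield (s, p, o_uri if o_uri is not None else o_lit)


sample = [
    '<http://example.org/me> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    "this line is broken",
    '<http://example.org/me> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .',
]
errors: List[str] = []
triples = list(stream_triples(sample, errors))
```

Because `stream_triples` is a generator, a caller can process each triple as it arrives instead of holding the whole document in memory, and one bad line costs one error message rather than the whole file.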

3. Data Aggregation

FOAF quickly showed the need to consider the network aspects of dealing with web-scale information, and to manage data aggregation and updating. This was demonstrated by FOAF crawlers such as Edd Dumbill's FOAFbot[5], built with Redland and Python.

The problem FOAFbot found was in tracking FOAF content that changed as it was crawled and re-crawled, which was hard to do in a simple triple store. This led to the development of Redland Contexts[6], which allow sets of triples to be added to and removed from a graph efficiently (preserving duplicates), so that aggregated information is easy to update.
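The idea behind contexts can be sketched in a few lines of Python. This is a toy model, not the Redland Contexts API: the class `ContextGraph`, its methods and the example URIs are hypothetical. Each triple is recorded against the source document it came from, so re-crawling a document means removing that document's triples and adding the fresh ones, without disturbing identical triples asserted elsewhere.

```python
from collections import Counter, defaultdict


class ContextGraph:
    """Toy aggregated graph that remembers which context asserted each triple."""

    def __init__(self):
        # context -> Counter of the triples asserted in that context
        self._by_context = defaultdict(Counter)
        # triple -> total number of copies across all contexts
        self._counts = Counter()

    def add(self, triple, context):
        self._by_context[context][triple] += 1
        self._counts[triple] += 1

    def remove_context(self, context):
        """Drop every triple asserted by `context`, keeping other copies."""
        for triple, n in self._by_context.pop(context, Counter()).items():
            self._counts[triple] -= n
            if self._counts[triple] <= 0:
                del self._counts[triple]

    def __contains__(self, triple):
        return self._counts.get(triple, 0) > 0


knows = ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
         "http://example.org/bob")
name = ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice")

g = ContextGraph()
g.add(knows, "http://example.org/alice.rdf")
g.add(name, "http://example.org/alice.rdf")
g.add(knows, "http://example.org/carol.rdf")  # same triple from a second document

# Re-crawl alice.rdf: discard its old triples before adding the new ones.
g.remove_context("http://example.org/alice.rdf")
still_known = knows in g   # carol.rdf still asserts it
name_gone = name not in g  # only alice.rdf asserted it
```

Keeping a count per triple is what "preserving duplicates" amounts to in this sketch: a triple disappears from the merged graph only when the last document asserting it has been removed.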

4. Triple Stores

There is a wide variety of storage issues for FOAF, ranging from the large (store all the triples ever seen) to the small (temporary graphs for new files before they are merged in). Some of these might be best served by a database backend, while others arise in small environments where a database is not appropriate.

These needs led to extending the original two Redland stores (memory and indexed hashes) to include a MySQL store for large graphs, written by Morten Frederiksen, and several others, described in [7].

5. Language Bindings

Writing Redland in C gave portability, but it also enabled something that promotes RDF and applications such as FOAF: ease of use from other languages through wrapper bindings. Most modern languages, such as Python, Ruby and C#, have good or very good support for calling native C functions. Redland at present has 7 language bindings available[8], allowing a wide range of developers to work easily in their own application languages without needing to deal with C.

This is a supporting feature rather than one that FOAF applications alone required, but it enables easier development for all applications.
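To show what calling native C from a dynamic language looks like, the sketch below wraps the standard C library function `strlen` using Python's `ctypes` module. It is only an illustration of the wrapper idea and says nothing about how the Redland bindings themselves are built; it assumes a POSIX-like system, and `c_string_length` is a hypothetical name.

```python
import ctypes
import ctypes.util

# Load the platform C library. Assumption: a POSIX-like system. If
# find_library returns None, CDLL(None) falls back to the symbols
# already loaded into the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_string_length(text: str) -> int:
    """Thin wrapper: hide the byte encoding and the C call behind a Python function."""
    return libc.strlen(text.encode("utf-8"))


length = c_string_length("foaf")
```

The caller of `c_string_length` sees an ordinary Python function; the C types and the encoding step stay inside the wrapper, which is the property that lets application developers avoid dealing with C directly.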

6. RDF Query

To make FOAF applications simpler, it is sometimes necessary to abstract away from the data details and just ask questions like "give me the person whose foaf:mbox_sha1sum is XXX". This can be done at the API level, but is much easier using a query language such as Squish, RDQL or SeRQL.

This led to the creation of the Rasqal RDF Query library[3] currently providing RDQL for Redland, along with language bindings for (at present) Python, Perl and C#.
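The kind of question above can be sketched with a single triple pattern matched against a list of triples. This is neither RDQL nor the Rasqal API: the function `match`, the `?variable` convention and the example data are assumptions for illustration. The only FOAF-specific fact used is that foaf:mbox_sha1sum is the SHA1 hash of a mailbox URI, including the `mailto:` prefix.

```python
import hashlib

FOAF = "http://xmlns.com/foaf/0.1/"


def mbox_sha1sum(mailbox_uri: str) -> str:
    """SHA1 hex digest of a mailbox URI such as 'mailto:alice@example.org'."""
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


def match(triples, pattern):
    """Yield variable bindings for one triple pattern.

    Pattern terms that start with '?' are variables; all others must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


alice_sum = mbox_sha1sum("mailto:alice@example.org")
triples = [
    ("_:a", FOAF + "name", "Alice"),
    ("_:a", FOAF + "mbox_sha1sum", alice_sum),
    ("_:b", FOAF + "name", "Bob"),
    ("_:b", FOAF + "mbox_sha1sum", mbox_sha1sum("mailto:bob@example.org")),
]

# "Give me the person whose foaf:mbox_sha1sum is XXX"
people = [b["?person"]
          for b in match(triples, ("?person", FOAF + "mbox_sha1sum", alice_sum))]
```

A real query language adds joins across several patterns, constraints and result formatting, but the application code stays at this level: it states the pattern and receives bindings, rather than walking the graph by hand.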

7. Summary

Redland has had a large part of its development driven by FOAF applications, as FOAF is the largest and webbiest Semantic Web application, with lots of real data available, and this is only going to continue.

References

[1]
Redland RDF Application Framework, Dave Beckett
[2]
Raptor RDF Parser Toolkit, Dave Beckett, a Redland package.
[3]
Rasqal RDF Query Library, Dave Beckett, a Redland package.
[4]
The Design and Implementation of the Redland RDF Application Framework, Dave Beckett, in proceedings of WWW10, May 2-5, 2001, Hong Kong, ACM 1-58113-348-0/01/0005.
[5]
FOAFBot: IRC Community Support Agent, Edd Dumbill as described in his articles Tracking provenance of RDF data and Support online communities with FOAF, for IBM developerWorks.
[6]
Large Scale Resource Discovery and Presentation Demonstrator, Dave Beckett, SWAD-Europe Deliverable 12.4.1, 2004-06-04
[7]
Redland Storage Modules, Dave Beckett
[8]
Redland Language Bindings, Dave Beckett, a Redland package