Recipe 6 implementation

Following ISSUE-17, this page is a discussion on the implementation of Recipe 6 of the BestPracticeRecipes for publishing RDF vocabularies.

NOTE: This page is intended merely to support the production of Recipe 6 and as such does not even constitute a draft document. While it can be cited, it is highly volatile and should not be considered final in any way.

NOTE: There is an ongoing action to recast this recipe as an implementation pattern, see http://www.w3.org/2007/10/09-swd-minutes.html#action02 (scroll up to read the resolution).

Recipe 6 features

From the working draft:

1. Extended configuration (i.e.: both HTML and RDF versions, content negotiation)
2. Slash namespace, for instance:
- http://example.com/example6/ (vocabulary)
- http://example.com/example6/ClassA (class)
- http://example.com/example6/propA (property)
3. Multiple hyperlinked HTML documents
4. RDF content being made available via some sort of query service such that clients can obtain a partial RDF description of the vocabulary as appropriate

Features 1-3 are shared with Recipe 5; only the last feature is actually new. However, serving dynamically generated RDF fragments alongside static HTML seems an unlikely scenario. Therefore, the partial HTML documents should also be available via a query service (maybe the same one?).

Implementation patterns

Recipe 6 just provides some hints on how the vocabulary should be published, and what the external behavior should look like (see the previous section). There are several ways to implement this recipe, using different programming languages and techniques. This document does not prescribe any particular implementation, nor does it describe any one implementation in great detail. Consequently, some web programming knowledge is required to implement this recipe. This document describes some common implementation patterns which can be used as guidelines to implement the recipe.

Using application logic

This pattern relies on some application logic deployed in the web server. This logic can be a thin layer that simply redirects the requests (or acts as a proxy) to a third-party web server (see the DBLP example below), or a thick layer that loads an RDF datasource and translates the HTTP requests into API calls to execute the queries.

There are two alternatives to introduce server-side logic:

1. Server-side script. Common server-side scripting languages (PHP, Python, Perl, Ruby) have RDF APIs and bindings to RDF stores, and are therefore suitable for writing a simple script that queries an RDF file or RDF store and returns the relevant portion.
- Pros: ease of deployment (many web hosting servers have support for one of these languages)
- Cons: webmasters are expected to write a (probably ad-hoc) script
2. Java Servlet (or equivalent).
- Pros: Java's fairly good support for RDF, SPARQL and RDF stores
- Cons: heavyweight solution, difficult to deploy (requires a servlet container)

Sample implementation (valid for alternatives 1 and 2). The DBLP server published by the Free University of Berlin is used in the following example as the content provider (Apache rewrite rules, beware of line wraps):

RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteRule ^example6/(.+) http://www4.wiwiss.fu-berlin.de/dblp/page/person/$1 [R=303]

RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^example6/(.+) http://www4.wiwiss.fu-berlin.de/dblp/data/person/$1 [R=303]

RewriteRule ^example6/(.+) http://www4.wiwiss.fu-berlin.de/dblp/data/person/$1 [R=303]

The rules are straightforward. Requests for HTML data are redirected (HTTP 303) to the URL of the HTML version exported by the D2R servlet; requests for RDF data (or without an Accept: header) are redirected to the URL of the RDF data exported by the D2R servlet. This is an example of a thin layer.
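For contrast with the thin layer, a thick layer can be sketched as follows. This is only an illustration under stated assumptions: the vocabulary is held in a hypothetical in-memory list of triples rather than a real RDF store or RDF API, the names VOCABULARY, describe and to_ntriples are invented for this page, and the output format (N-Triples) is chosen only because it is easy to produce by hand.

```python
# Sketch of a "thick layer": the script itself selects the partial RDF
# description of the requested term. All names here are hypothetical.

EXAMPLE_NS = "http://example.com/example6/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumption: a tiny in-memory stand-in for an RDF file or RDF store.
# Each entry is (subject, predicate, object, object_is_literal).
VOCABULARY = [
    (EXAMPLE_NS + "ClassA", RDF_TYPE, RDFS_CLASS, False),
    (EXAMPLE_NS + "ClassA", RDFS_LABEL, "Class A", True),
    (EXAMPLE_NS + "propA", RDF_TYPE, RDF_PROPERTY, False),
    (EXAMPLE_NS + "propA", RDFS_LABEL, "Property A", True),
]


def describe(local_name, triples=VOCABULARY):
    """Return only the triples whose subject is the requested term."""
    subject = EXAMPLE_NS + local_name
    return [t for t in triples if t[0] == subject]


def to_ntriples(triples):
    """Serialize triples as N-Triples (escapes backslash, quote, newline only)."""
    lines = []
    for s, p, o, is_literal in triples:
        if is_literal:
            escaped = (o.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
            obj = '"%s"' % escaped
        else:
            obj = "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)
```

A request for http://example.com/example6/propA would then be answered with to_ntriples(describe("propA")), i.e. only the two triples about propA. A real deployment would more likely query an RDF store through one of the RDF APIs mentioned above.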

The trickiest part of implementing Recipe 6 using application logic is to correctly implement HTTP content negotiation from scratch. While most web scripting languages (PHP, Python...) and frameworks provide access to the values of the HTTP headers (and thus, to the Accept: header), choosing the appropriate return type is far from trivial. The Accept header may contain wildcards and q-values, so regular expressions or simple string comparison functions are not enough (NOTE: is there any library to perform content negotiation? feedback is welcome).
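To illustrate why simple string matching is not enough, here is a minimal sketch of Accept-header handling with wildcards and q-values, using only the Python standard library. It is not a complete implementation: media type parameters other than q are ignored, and the names (parse_accept, choose_type, OFFERED, DEFAULT) are hypothetical. As an assumption mirroring the rewrite rules above, the RDF version is returned when the header is missing or when nothing offered is acceptable.

```python
# Sketch of server-driven content negotiation on the Accept header.
# All function and constant names are hypothetical.

OFFERED = ["application/rdf+xml", "text/html", "application/xhtml+xml"]
DEFAULT = "application/rdf+xml"


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # a range without an explicit q-value counts as q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def specificity(media_range, media_type):
    """3 for an exact match, 2 for type/*, 1 for */*, 0 for no match."""
    if media_range == media_type:
        return 3
    range_type, _, range_sub = media_range.partition("/")
    if range_sub == "*" and range_type == media_type.partition("/")[0]:
        return 2
    if media_range == "*/*":
        return 1
    return 0


def quality(media_type, ranges):
    """q-value of the most specific range matching media_type, else 0.0."""
    best_spec, best_q = 0, 0.0
    for media_range, q in ranges:
        spec = specificity(media_range, media_type)
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def choose_type(accept_header, offered=OFFERED, default=DEFAULT):
    """Pick the offered type with the highest q; ties go to the earliest."""
    if not accept_header or not accept_header.strip():
        return default
    ranges = parse_accept(accept_header)
    best, best_q = default, 0.0
    for media_type in offered:
        q = quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

For a header such as "text/html;q=0.3, */*;q=0.8", a substring test for text/html would select HTML, whereas this sketch selects application/rdf+xml, because the wildcard gives the RDF version a higher q-value than the one stated explicitly for text/html.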

Redirecting to a SPARQL endpoint using Apache

This pattern does not involve writing any application logic. Instead, requests are HTTP-redirected using Apache mod_rewrite. This technique is particularly suited to wrapping a SPARQL endpoint. We exploit the fact that many SPARQL endpoints export HTTP bindings.

Forward HTTP requests to a SPARQL endpoint with HTTP bindings
- Pros: lightweight, requires no programming
- Cons: a SPARQL endpoint for the vocabulary must be available somewhere

Sample implementation (Apache rewrite rule, beware of line wraps):

RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteRule ^example6/(.+) http://www4.wiwiss.fu-berlin.de/dblp/page/person/$1 [R=303]

RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^example6/(.+) http://www4.wiwiss.fu-berlin.de/dblp/sparql?query=DESCRIBE+<http://www4.wiwiss.fu-berlin.de/dblp/data/person/$1> [R=303]

RewriteRule ^example6/(.+) http://www4.wiwiss.fu-berlin.de/dblp/sparql?query=DESCRIBE+<http://www4.wiwiss.fu-berlin.de/dblp/data/person/$1> [R=303]

In this example, when the client asks for HTML content, it is redirected to an external web server. However, RDF requests are handled differently. A request for a URI such as http://example.com/example6/100007 is redirected to the result of executing a DESCRIBE <http://www4.wiwiss.fu-berlin.de/dblp/data/person/100007> query against the D2R SPARQL endpoint. The result is an RDF graph which describes the resource (in this particular case, TimBL's publications from DBLP).
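The mapping from a requested local name to the redirect target can also be sketched in a few lines of Python, which makes the URL encoding of the query explicit. The endpoint and resource base URLs are taken from the rewrite rules above; the function name describe_redirect_url is hypothetical, and percent-encoding the angle brackets (rather than sending them raw, as the rewrite rule does) is a choice made for this sketch.

```python
from urllib.parse import quote, urlencode

# URLs taken from the rewrite rules above.
SPARQL_ENDPOINT = "http://www4.wiwiss.fu-berlin.de/dblp/sparql"
RESOURCE_BASE = "http://www4.wiwiss.fu-berlin.de/dblp/data/person/"


def describe_redirect_url(local_name):
    """Build the URL a 303 redirect would point to for the given local name."""
    # Percent-encode the local name so that characters such as '>' or a
    # space cannot terminate the IRI early inside the DESCRIBE query.
    resource_uri = RESOURCE_BASE + quote(local_name, safe="")
    query = "DESCRIBE <%s>" % resource_uri
    # urlencode escapes the query text for use in a URL query string
    # (spaces become '+', '<' becomes %3C, and so on).
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})
```

Calling describe_redirect_url("100007") yields the endpoint URL with the query parameter set to the encoded form of DESCRIBE <http://www4.wiwiss.fu-berlin.de/dblp/data/person/100007>.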

RalphS doesn't like DESCRIBE, see http://www.w3.org/2007/10/09-swd-minutes.html#item04.

Case studies

Joshua Tauberer announced that he has exposed a large RDF dataset from the US Census. There is a SPARQL endpoint available, and the URIs are dereferenceable by means of URL rewriting (see, for instance, http://www.rdfabout.com/rdf/usgov/geo/us).
D2R can publish large datasets and it uses redirects to SPARQL queries. More information in section 5 of 'Cool URIs for the Semantic Web'.