How to add RDF information to a page using RDFa?

The Semantic Web Activity home page contains a fair amount of information that might be of interest for the Semantic Web (e.g., for data integration): references to existing recommendations, talks on the subject given by working group members or the W3C staff, references to active groups, etc. In other words, it sounds like a good idea to make this information available in RDF, too. Of course, one could achieve that by publishing two files: http://www.w3.org/2001/sw/Overview.html for the HTML version and a separate http://www.w3.org/2001/sw/Overview.rdf for RDF (remember that W3C’s Apache setup is such that the default index file is called “Overview”). But keeping two copies of the same information in sync would lead to versioning problems; not a good idea!

This is where RDFa comes into the picture: don’t duplicate information if you can avoid it. Instead, add the RDFa attributes to the HTML file and let the machines do the rest. And this is what has been done. If you look under the hood, http://www.w3.org/2001/sw/Overview.html is, in fact, in RDFa, i.e., the core (X)HTML content is enriched with some attributes that allow the automatic generation of the corresponding RDF data.

The best way to see the details is, of course, to look at the source; here is just one example. This is, essentially, what an entry for a recommendation looks like in XHTML+RDFa:

<li resource="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">
  <a rel="doc:versionOf" href="http://www.w3.org/TR/rdf-syntax-grammar" property="dc:title">RDF/XML Syntax Specification (Revised)</a>,
  <span rel="rdf:type" resource="[tr:REC]">W3C Recommendation,</span>
  <span property="dc:date" content="2004-02-10">February 10, 2004,</span>
  <span rel="tr:editor"><span typeof="contact:Person" property="contact:fullName">Dave Beckett</span></span>, ed.
</li>

yielding, in RDF:

<http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/> a tr:REC;
     dc:date "2004-02-10";
     dc:title "RDF/XML Syntax Specification (Revised)";
     doc:versionOf <http://www.w3.org/TR/rdf-syntax-grammar>;
     tr:editor [ a contact:Person;
                 contact:fullName "Dave Beckett" ] .

using a number of existing vocabularies (e.g., Dublin Core or the “TR” vocabulary that W3C has been using for years to describe its documents).

So how would one set up the server to return the right version of the document for the right request? At some point, one can expect (RDF) browsers to pick up the RDF information automatically; but what should be done in the meantime? What one would like to have is:

  • http://www.w3.org/2001/sw/ should return
    • XHTML by default
    • RDF/XML or Turtle if so requested by the client, generated from the XHTML file on-the-fly via an RDFa processor
  • http://www.w3.org/2001/sw/Overview.rdf should return RDF/XML, generated on-the-fly
  • http://www.w3.org/2001/sw/Overview.ttl should return RDF in Turtle, again generated on-the-fly
  • http://www.w3.org/2001/sw/Overview.html should return, obviously, XHTML
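
Seen from the client side, the wish list above boils down to sending different Accept headers to the same URI. Here is a minimal Python sketch of that, using only the standard library. The helper name `negotiated_request` and the `MEDIA_TYPES` table are made up for illustration, and the requests are only constructed, not sent:

```python
from urllib.request import Request

SW_HOME = "http://www.w3.org/2001/sw/"

# Media types for the three representations discussed above.
MEDIA_TYPES = {
    "xhtml": "text/html",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def negotiated_request(fmt: str, url: str = SW_HOME) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


if __name__ == "__main__":
    for fmt in MEDIA_TYPES:
        req = negotiated_request(fmt)
        print(req.full_url, "Accept:", req.get_header("Accept"))
```

Passing one of these request objects to `urllib.request.urlopen` would perform the actual fetch; what comes back then depends on the server setup described next.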

A bit of Apache wizardry works here. First, a special Apache file (a type map) is created to control content negotiation. The usual setup is to associate this with the “var” extension, i.e., Overview.var in this case. The file itself looks fairly simple:

URI: Overview

URI: Overview.html
Content-Type: text/html

URI: Overview.rdf
Content-Type: application/rdf+xml; qs=0.4

URI: Overview.ttl
Content-Type: text/turtle; qs=0.5

This instructs the Apache server to choose the right file depending on the Accept header of the request. HTML is returned if both HTML and RDF/XML are accepted, and Turtle is preferred if both RDF/XML and Turtle are accepted by the client (that is the role of those “qs” values; a variant without an explicit qs gets the default of 1.0).
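
To see how the “qs” values interact with the client's preferences, here is a simplified Python model of the selection: the client's q for each media type is multiplied by the variant's qs, and the highest product wins. This is only a sketch of the media-type step; Apache's real negotiation also looks at language, charset, encoding, and treats wildcards specially. All function names are made up for illustration.

```python
from typing import Dict, Optional

# Variants and their source qualities (qs), as listed in Overview.var.
# A Content-Type without an explicit qs defaults to 1.0.
VARIANTS = {
    "Overview.html": ("text/html", 1.0),
    "Overview.rdf": ("application/rdf+xml", 0.4),
    "Overview.ttl": ("text/turtle", 0.5),
}


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an Accept header into {media-range: q}; q defaults to 1.0."""
    ranges = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranges[parts[0].lower()] = q
    return ranges


def client_q(media_type: str, ranges: Dict[str, float]) -> float:
    """Return the q of the most specific media range matching media_type."""
    main_type = media_type.split("/")[0]
    for candidate in (media_type, main_type + "/*", "*/*"):
        if candidate in ranges:
            return ranges[candidate]
    return 0.0


def choose_variant(accept: str) -> Optional[str]:
    """Pick the variant with the highest q * qs, or None if none is acceptable."""
    ranges = parse_accept(accept)
    best, best_score = None, 0.0
    for uri, (media_type, qs) in VARIANTS.items():
        score = client_q(media_type, ranges) * qs
        if score > best_score:
            best, best_score = uri, score
    return best
```

For instance, `choose_variant("text/html, application/rdf+xml")` picks Overview.html (1.0 beats 0.4), while `choose_variant("application/rdf+xml, text/turtle")` picks Overview.ttl (0.5 beats 0.4).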

That takes care of content negotiation, but we are not done yet because, remember, the goal is to generate the RDF/XML and Turtle versions on-the-fly. This is achieved by adding the following lines to the .htaccess file in the directory:

RewriteEngine On
RewriteBase /2001/sw/
RewriteRule ^Overview\.rdf$ /2007/08/pyRdfa/extract?uri=http://www.w3.org/2001/sw/Overview.html [L]
RewriteRule ^Overview\.ttl$ /2007/08/pyRdfa/extract?format=turtle&uri=http://www.w3.org/2001/sw/Overview.html [L]

This instructs the server to run a script (i.e., the RDFa distiller) on the (X)HTML file whenever the RDF/XML or Turtle version is requested.
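
The effect of the two rewrite rules above can be mirrored in a few lines of Python, which may help when checking what the distiller is actually asked to do. This is only an illustrative sketch: the function name `rewrite_target` is hypothetical, and the mapping is hard-coded for the two files discussed here rather than derived from the Apache configuration.

```python
from typing import Optional
from urllib.parse import urlencode

DISTILLER = "/2007/08/pyRdfa/extract"
SOURCE = "http://www.w3.org/2001/sw/Overview.html"


def rewrite_target(filename: str) -> Optional[str]:
    """Return the distiller path a request for `filename` is rewritten to,
    or None when no rule applies (the file is then served as-is)."""
    if filename == "Overview.rdf":
        params = {"uri": SOURCE}
    elif filename == "Overview.ttl":
        params = {"format": "turtle", "uri": SOURCE}
    else:
        return None
    # safe=":/" leaves the source URI unescaped, as in the rewrite rules.
    return DISTILLER + "?" + urlencode(params, safe=":/")


if __name__ == "__main__":
    for name in ("Overview.rdf", "Overview.ttl", "Overview.html"):
        print(name, "->", rewrite_target(name))
```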

That is it… (And thanks to Ralph Swick and Tim Berners-Lee who gave me the right push and information to handle Apache.)