Update of the RDFa distiller

Now that RDFa has been published as a Candidate Recommendation, it was time to make a new version of the RDFa distiller (i.e., pyRdfa). The last update was done when RDFa went into Last Call; there have been some improvements since. Besides the (obvious) fact that the distiller follows the latest RDFa syntax, both the RDF/XML and the Turtle serializers went through serious changes: some of the earlier problems with the original serializers of RDFLib have been taken care of.

The most interesting new feature is, however, the distiller’s parser. By default, pyRdfa uses a standard Python XML parser. However, when invoked with the right option, it can also use an HTML5 parser which, after parsing HTML5 (or non-XML HTML in general), simply returns a DOM tree. Since the RDFa syntax is defined in terms of a simple DOM, the adaptation to the HTML5 parser worked essentially without problems (I want to thank Elias Torres, who drew my attention to this parser and made the first steps towards its integration into pyRdfa). This also means that, using pyRdfa, RDFa attributes added to, e.g., non-XML HTML4.01 files also yield the appropriate RDF graph.
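To make the "defined in terms of a DOM" point a bit more concrete, here is a minimal sketch of walking a DOM tree and collecting the RDFa attributes found on each element. This is not pyRdfa's actual code: the function name rdfa_attributes, the RDFA_ATTRS tuple and the sample markup are made up for illustration, and the sketch only lists attributes without generating any triples. It uses the standard library's xml.dom.minidom; the assumption is that a tree produced by an HTML5 parser with the same DOM interface could be passed to the same function unchanged.

```python
from xml.dom import minidom

# RDFa attribute names to look for (illustrative list, not taken from pyRdfa).
RDFA_ATTRS = ("about", "property", "rel", "rev", "resource",
              "href", "content", "typeof", "datatype")


def rdfa_attributes(node):
    """Yield (tag name, {attribute: value}) for each element with RDFa attributes.

    Only generic DOM calls are used, so the origin of the tree (XML parser
    or some other DOM-producing parser) does not matter to this function.
    """
    if node.nodeType == node.ELEMENT_NODE:
        found = {name: node.getAttribute(name)
                 for name in RDFA_ATTRS if node.hasAttribute(name)}
        if found:
            yield node.tagName, found
    # Recurse in document order; text nodes simply have no children.
    for child in node.childNodes:
        yield from rdfa_attributes(child)


# Hypothetical sample markup, well-formed so that the XML parser accepts it.
SAMPLE = """<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="http://example.org/post">
  <h1 property="dc:title">Update of the RDFa distiller</h1>
  <a rel="dc:creator" href="http://www.w3.org/People/Ivan/">Ivan Herman</a>
</div>"""

dom = minidom.parseString(SAMPLE)
for tag, attrs in rdfa_attributes(dom):
    print(tag, attrs)
```

A real distiller has to do much more on top of this walk (resolving prefixes, tracking the current subject, handling literals), but none of that depends on which parser produced the tree.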

pyRdfa is only one of many implementations of RDFa: the implementation report (which is not yet up to date!) already lists 9 independent implementations. Manu Sporny’s library is in C (and may become part of Dave Beckett’s Redland one day), Benjamin Nowack did one in PHP, Shane McCarron just finished one in Perl, Fabien Gandon did it in XSLT, Ben Adida in client-side JavaScript, and he also has one, I believe, in Ruby… We can be confident that all these implementations will eventually pass all the official tests; this is certainly the goal of all implementers. Actually, some of those implementations also implement the HTML5 parsing feature just like pyRdfa does. Not bad for a technology that has just entered the Candidate Recommendation phase…

A number of pages under the W3C Semantic Web Activity are now in XHTML+RDFa, using the setup already described elsewhere. This is the case for the SW Activity Home page, various entries of the SW Use Cases and Case Studies’ collection (see, for example, one of the latest on Semantic Web and Social spaces), or my own talk pages, like the talk I gave in Nancy last week. More will follow…

About Ivan Herman

Ivan Herman is the leader of the Digital Publishing Activity at W3C. For more details, see http://www.w3.org/People/Ivan/