This service has been discontinued and the underlying software is no longer maintained. It corresponded to the RDFa 1.0 specification. In 2012, W3C published an updated version of that specification, called RDFa Core 1.1. A new distiller, processing RDFa 1.1 content, was implemented and ran as a separate service until its retirement in 2022. The underlying software for the 1.1 version is publicly available if someone is interested in re-establishing the service somewhere.

RDFa Distiller and Parser

If you intend to use this service regularly on a large scale, consider downloading the package and using it locally. Storing a (conceptually) “cached” version of the generated RDF, instead of referring to the live service, might also be an alternative to consider to avoid overloading this server.

Distill by File Upload

Distill by direct input

What is it?

RDFa is a specification for attributes to be used with XHTML or SVG Tiny to express structured data. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don’t need to repeat significant data in the document content. The underlying abstract representation is RDF, which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time.

pyRdfa is a distiller that generates the RDF triples from an (X)HTML+RDFa or SVG Tiny 1.2 file in various RDF serialization formats. It can be used either directly from the command line or via a CGI service. It corresponds to the RDFa Recommendation, published on the 14th of October, 2008, and, for the SVG version, to the SVG Tiny 1.2 Recommendation, published on the 22nd of December, 2008. The forms above can be used to start the service installed at this site. To learn more about RDFa, please consult the RDFa Syntax Document. See below for how to download the package.

pyRdfa is a server-side implementation of RDFa. This means that pages that generate their XHTML content dynamically (e.g., using AJAX) will not be properly processed by this distiller. The present implementation does not handle password-protected content either.

Distiller options

Output format (option: “format”, values: “xml”, “turtle”, “nt”)
The default output format is RDF/XML. Alternative formats are Turtle and N-triples.
Warnings (option: “warnings”, values: “true”/“false”)
When started with the value “true”, the generated RDF graph may include warnings, error messages, or informational messages in the form of RDF comment triples. A typical example of a warning is when about="pref:b" is found instead of about="[pref:b]", unless pref stands for one of the commonly used URI protocols like http, ftp, etc. The default is not to generate warnings.
Parser options (option: “parser”, values: “lax”/“strict”)
The pyRdfa parser can be run in two different modes, namely “strict” and “lax”. In both cases the input is parsed first with a built-in XML parser. If an error occurs and the parser is run in “strict” mode, then the processing stops at that point with an error message. However, if the parser is “lax”, the input is also re-parsed using an HTML5 parser. As this parser also handles, essentially, HTML4.01 “tag soup” files, this means that when using the “lax” option the strict XML requirement is relaxed. The default setting is lax.

(Note that the HTML5 parser is a work in progress, so errors may occur. Note also that the XML parser does not validate the content against the XHTML+RDFa DTD, although a warning is issued if none of the conformance options in the RDFa syntax are used.)

Whitespace preservation in literals (option: “space-preserve”, values: “true”/“false”)
The RDFa syntax specifies that whitespace characters in the original XHTML must be preserved in the literal output. Setting this option to “false” instructs the distiller to “normalize” the whitespace instead. The default is to preserve whitespace, i.e., not to normalize.
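
The “strict” and “lax” parser modes described above can be illustrated with a minimal sketch. It is not the pyRdfa code: the names parse_input and TagCollector are hypothetical, and Python’s standard html.parser module is used only as a forgiving stand-in for the HTML5 parser that pyRdfa relies on.

```python
import xml.dom.minidom
from html.parser import HTMLParser
from xml.parsers.expat import ExpatError


class TagCollector(HTMLParser):
    """Forgiving stand-in for an HTML5 parser: it only records start tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_input(text, mode="lax"):
    """Try the XML parser first; fall back to the forgiving parser in lax mode.

    Returns ("xml", dom) or ("html", list_of_start_tags).
    In strict mode an XML error is re-raised and processing stops.
    """
    try:
        return "xml", xml.dom.minidom.parseString(text)
    except ExpatError:
        if mode == "strict":
            raise  # strict: stop at the first XML error
    # lax: re-parse the same input, accepting "tag soup"
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return "html", collector.tags
```

With this sketch, well-formed input such as "<p>ok</p>" is handled by the XML parser in both modes, whereas "<p>Hello<br></p>" raises an error in “strict” mode and is re-parsed by the fallback parser in “lax” mode.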

SVG handling

The SVG Tiny 1.2 recommendation, published in December 2008, also adopted RDFa as a means to add RDF (meta)data. The semantics of the RDFa attributes are identical to the XHTML case but the fact that the host language is SVG does lead to two small differences:

  1. SVG uses xml:base, whereas XHTML1+RDFa disallows it
  2. SVG inherits from earlier versions the possibility to add RDF/XML directly into the SVG content via the metadata element. An SVG+RDFa distiller ought to understand this RDF graph and merge it with the graph produced by the regular RDFa processing. Such interpretation is meaningless in the XHTML case.

The distiller automatically recognizes SVG content if it uses the correct SVG namespace and the top-level element is svg. For other XML dialects, the extra “host” option with value “xml” can be used to trigger identical behaviour.
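
The recognition rule above can be sketched as follows. This is an illustration only, not the distiller’s own code: the function name uses_xml_host_rules and its host parameter are assumptions made for this example, and the check is limited to the namespace and the name of the top-level element.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def uses_xml_host_rules(xml_text, host=None):
    """Return True if the input should be handled like SVG.

    That is the case when the top-level element is "svg" in the SVG
    namespace, or when the caller passes host="xml" explicitly.
    """
    if host == "xml":
        return True
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace}localname"
    return root.tag == "{%s}svg" % SVG_NS
```

An svg element without the SVG namespace is not recognized by this check, which mirrors the requirement that the correct namespace be used.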

Alternative access to the Distiller

If you use Firefox or Opera, then you can also drag the following bookmarklets to your browser bar and use them to distill the current page: “RDFa it (RDF/XML)!”, “RDFa it (Turtle)!”, “RDFa it (N triples)!”.

When using the distiller URI directly, options that keep their default values can be omitted. For example, the URI for the RDF/XML formatted RDFa output of http://www.example.com/rdfa.html, with whitespace preservation and without warnings, and using the “lax” parser is:

http://www.w3.org/2007/08/pyRdfa/extract?uri=http://www.example.com/rdfa.html

The same RDF content in Turtle:

http://www.w3.org/2007/08/pyRdfa/extract?format=turtle&uri=http://www.example.com/rdfa.html

The same RDF content but with possible warnings:

http://www.w3.org/2007/08/pyRdfa/extract?warnings=true&uri=http://www.example.com/rdfa.html

Etc. It is also possible to use a fixed pseudo URI:

http://www.w3.org/2007/08/pyRdfa/extract?uri=referer

to generate the RDF (with the default options) of the current file without specifying the URI of the page. This can be used, say, as a link for a button on the page.
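
The URI patterns above can also be assembled programmatically. The following is a minimal sketch that only builds the URI string and does not contact the (now discontinued) service. The helper name distiller_uri and the DEFAULTS table are assumptions made for this example; the default values are taken from the option descriptions on this page, with “true” for “space-preserve” inferred from the statement that the default is not to normalize.

```python
from urllib.parse import urlencode

BASE = "http://www.w3.org/2007/08/pyRdfa/extract"

# Defaults as described in the "Distiller options" section (assumed spelling)
DEFAULTS = {
    "format": "xml",
    "warnings": "false",
    "parser": "lax",
    "space-preserve": "true",
}


def distiller_uri(page_uri, **options):
    """Build a distiller URI, leaving out options that keep their defaults."""
    params = {}
    for name, value in options.items():
        name = name.replace("_", "-")  # space_preserve -> space-preserve
        if name not in DEFAULTS:
            raise ValueError("unknown option: " + name)
        if value != DEFAULTS[name]:
            params[name] = value
    params["uri"] = page_uri  # "referer" may be passed as the pseudo URI
    # Keep ":" and "/" unescaped so the result matches the examples above
    return BASE + "?" + urlencode(params, safe=":/")
```

For instance, distiller_uri("http://www.example.com/rdfa.html", format="turtle") yields the Turtle URI shown above. Page URIs that themselves contain characters such as “&” or “?” are percent-encoded by urlencode.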

Distribution

The underlying package, called pyRdfa and implemented as a Python module, is also available for download. The core package relies on the RDFLib package, on Deron Meranda’s httpheader module, and, if the “lax” mode is used, on an HTML5 parser. Otherwise it needs only the standard Python 2.X distribution (it has been tested on version 2.6). The package includes a possible CGI interface script to start a service like this one.


Ivan Herman, (ivan@w3.org)
Last revised: $Date: 2022/06/28 04:29:19 $ (see in RDF)

This software is available for use under the W3C® SOFTWARE NOTICE AND LICENSE
