RDFa 1.1 Distiller and Parser (Closed)

This service has been discontinued and the underlying software is no longer maintained. The software remains publicly available for anyone interested in re-establishing the service elsewhere.

Warning: This version implements RDFa 1.1 Core, including the handling of the Role Attribute. The distiller can also run in XHTML+RDFa 1.0 mode (if the incoming XHTML content uses the RDFa 1.0 DTD and/or sets the version attribute). The package is available for download, although it may be slightly out of sync with the code running this service.

Distill by URI

URI:
Output Format:
Returned content:
Expand vocabularies:
Generate warnings for non RDFa 1.1 Lite usage:

More (non-standard) options

Include turtle content embedded in a <script> element, or RDF/XML content in its own namespace:
Whitespace preserved in literals:
Use caching for vocabularies:
Report the details of vocabulary caching:
Bypass time check on vocabulary cache, i.e., generate a new cache every time:

Distill by File Upload

Local File:
Host Language:
Output Format:
Returned content:
Expand vocabularies:
Generate warnings for non RDFa 1.1 Lite usage:

More (non-standard) options

Include turtle content embedded in a <script> element, or RDF/XML content in its own namespace:
Whitespace preserved in literals:
Use caching for vocabularies:
Report the details of vocabulary caching:
Bypass time check on vocab cache, i.e., generate a new cache every time:

Distill by Direct Text Input

Enter the Markup to distill:

Host Language:
Output Format:
Returned content:
Expand vocabularies:
Generate warnings for non RDFa 1.1 Lite usage:

More (non-standard) options

Include turtle content embedded in a <script> element, or RDF/XML content in its own namespace:
Whitespace preserved in literals:
Use caching for vocabularies:
Report the details of vocabulary caching:
Bypass time check on vocab cache, i.e., generate a new cache every time:

If you intend to use this service regularly and on a large scale, consider downloading the package and using it locally. Storing a (conceptually) “cached” version of the generated RDF, instead of referring to the live service, is another way to avoid overloading this server.
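The caching idea mentioned above can be sketched roughly as follows. This is only an illustration written for a current Python 3, not part of pyRdfa: the function name `cached_rdf`, the cache directory layout, the `.ttl` file extension, and the `fetch` callback (a stand-in for whatever actually produces the RDF) are all assumptions.

```python
import hashlib
import os
import time


def cached_rdf(uri, fetch, cache_dir, max_age=24 * 3600):
    """Return the RDF for `uri`, calling `fetch(uri)` only when no fresh copy is stored.

    `fetch` is a hypothetical callback that returns the serialized RDF as a string.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URI so that it can be used safely as a file name.
    key = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".ttl")

    # Reuse the stored copy if it is younger than `max_age` seconds.
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, encoding="utf-8") as stored:
            return stored.read()

    # Otherwise regenerate the RDF and store it for later calls.
    data = fetch(uri)
    with open(path, "w", encoding="utf-8") as stored:
        stored.write(data)
    return data
```

With such a wrapper, repeated requests for the same document within `max_age` seconds are served from disk and never reach the distiller.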

What is it?

RDFa 1.1 is a specification for attributes to be used with XML languages or with HTML5 to express structured data. The rendered, hypertext data of XML or HTML is reused by the RDFa markup, so that publishers don’t need to repeat significant data in the document content. The underlying abstract representation is RDF, which lets publishers build their own vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time.

pyRdfa is a distiller that generates RDF triples, in various RDF serialization formats, from an XML or HTML5 file annotated with RDFa. It can be used either directly from the command line or via a CGI service. It conforms to the RDFa 1.1 Core, XHTML+RDFa, and HTML+RDFa specifications, as well as to the SVG Tiny 1.2 Recommendation for the SVG version.

The forms above can be used to start the service installed at this site. To learn more about RDFa, please consult the RDFa 1.1 Core document. See also below for ways to download the package.

As installed, this service is a server-side implementation of RDFa. This also means that pages that generate their (X)HTML content dynamically (e.g., using AJAX) will not be properly processed by this distiller.
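To give a feel for what “distilling” means, here is a deliberately tiny sketch in Python 3. It is not pyRdfa and not a conforming RDFa processor: it only understands the `vocab`, `about`, `property`, and `content` attributes on well-formed XML input, and ignores prefixes, `typeof`, `rel`/`rev`, `resource`, datatypes, languages, and everything else the specification defines. The function name `toy_distill` is hypothetical.

```python
import xml.etree.ElementTree as ET


def toy_distill(markup, base):
    """Extract (subject, predicate, literal) tuples from a small RDFa-like snippet.

    Toy illustration only: handles `vocab`, `about`, `property`, and `content`.
    """
    triples = []

    def walk(elem, subject, vocab):
        # `vocab` and `about` are inherited by descendants unless overridden.
        vocab = elem.get("vocab", vocab)
        subject = elem.get("about", subject)
        prop = elem.get("property")
        if prop is not None and vocab:
            # The literal is the `content` attribute if present, else the text.
            value = elem.get("content")
            if value is None:
                value = "".join(elem.itertext())
            triples.append((subject, vocab + prop, value))
        for child in elem:
            walk(child, subject, vocab)

    walk(ET.fromstring(markup), base, "")
    return triples


snippet = (
    '<div vocab="http://schema.org/" about="http://example.org/#me">'
    '<span property="name">Alice</span>'
    "</div>"
)
for triple in toy_distill(snippet, "http://example.org/"):
    print(triple)
```

Because the text of the `span` is reused as the literal value, the name does not have to be repeated anywhere else in the markup.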

Distiller options

Determination of host language type

When the RDFa resource is accessed through HTTP, the host language is determined based on the content type of the return header as follows:
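A content-type dispatch of this kind might be sketched as follows in Python 3. The mapping is an assumption, built from the registered media types of the host languages mentioned on this page (HTML5, XHTML, SVG, generic XML); the names `HOST_LANGUAGE_BY_MEDIA_TYPE` and `guess_host_language`, the short labels, and the fallback value are hypothetical, and the rules actually used by the distiller may differ.

```python
# Assumed mapping from media type to host language label (illustrative only).
HOST_LANGUAGE_BY_MEDIA_TYPE = {
    "text/html": "html5",
    "application/xhtml+xml": "xhtml",
    "image/svg+xml": "svg",
    "application/xml": "xml",
    "text/xml": "xml",
}


def guess_host_language(content_type, default="html5"):
    """Pick a host language label from an HTTP Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return HOST_LANGUAGE_BY_MEDIA_TYPE.get(media_type, default)
```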

Alternative access to the Distiller

If you use Firefox, Safari, Chrome, or Opera, you can also drag the following bookmarklets to your browser bar and use them to distill the current page: “RDFa it (Turtle)!”, “RDFa it (RDF/XML)!”, “RDFa it (N triples)!”.

When using the distiller URI directly, the option names for the default options can be omitted. Some examples:
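As a rough sketch of how such a request URI could be assembled in Python 3, leaving out every option that carries its default value: the base address, the option names (`uri`, `format`, `rdfagraph`), and the default values below are assumptions made for illustration, not a record of the service's actual interface.

```python
from urllib.parse import urlencode

# Hypothetical defaults; options set to these values are left out of the URI.
DEFAULTS = {"format": "turtle", "rdfagraph": "output"}


def distiller_url(base, uri, **options):
    """Build a request URI, skipping options that match the assumed defaults."""
    params = {"uri": uri}
    for name, value in options.items():
        if DEFAULTS.get(name) != value:
            params[name] = value
    return base + "?" + urlencode(params)


# The base address is a placeholder, not the address of a running service.
print(distiller_url("https://distiller.example.org/extract",
                    "http://example.org/doc.html", format="json"))
```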

Error reporting

The distiller adds error, warning, or information triples to the processor graph. Some of these are defined by the RDFa Core document; others are additional messages generated by the distiller itself. The latter category includes, e.g., HTTP 404 errors, which are reported using the same error structure as the ones defined by the standard.
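For orientation, the sketch below (Python 3) formats one such message as Turtle, following the pattern used for processor graph entries in RDFa Core: a blank node typed as `rdfa:Error`, `rdfa:Warning`, or `rdfa:Info`, with `dc:description`, `dc:date`, and `rdfa:context`. The function name `processor_message` is hypothetical, and the exact triples and more specific error classes emitted by the distiller are not reproduced here.

```python
from datetime import datetime, timezone

PREFIXES = (
    "@prefix rdfa: <http://www.w3.org/ns/rdfa#> .\n"
    "@prefix dc: <http://purl.org/dc/terms/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def processor_message(kind, description, context_uri, when=None):
    """Format a processor graph entry as Turtle text (illustrative sketch)."""
    if kind not in ("Error", "Warning", "Info"):
        raise ValueError("kind must be 'Error', 'Warning', or 'Info'")
    when = when or datetime.now(timezone.utc)
    # Escape the characters that would break a Turtle string literal.
    text = (description.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
    return (
        PREFIXES
        + "[] a rdfa:%s ;\n" % kind
        + '   dc:description "%s" ;\n' % text
        + '   dc:date "%s"^^xsd:dateTime ;\n' % when.isoformat()
        + "   rdfa:context <%s> .\n" % context_uri
    )


print(processor_message("Error", "HTTP 404: resource not found",
                        "http://example.org/missing.html"))
```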

Distribution

The underlying software, pyRdfa, is implemented as a Python package and is available for download from GitHub. The package is based on the standard Python 2.x.y distribution, where 'x' should be 5 or higher. (It has been tested on version 2.7.2, which is the highest, and probably the last, stable release in Python 2.x; if possible, use that one.) The module does not (yet) run on the Python 3.x family. The documentation of the package can be consulted online (and is also part of the distribution).

The core package relies on the RDFLib package. It has been tested with RDFLib 3.1.0, but it also runs with the RDFLib 2.x versions. RDFLib 3.x is preferred because its serialization modules are of better quality. (Note, however, that the JSON serialization does not run on RDFLib 2.x versions!) The Python HTML5 parser is used to process HTML5. The general package also relies on a slightly modified version of Deron Meranda’s httpheader module. Finally, for reasons that I do not really understand, the RDFLib distribution in some cases generates an import error on a module called isodate, which then has to be installed manually. (The HTML5 Parser, the httpheader, and the isodate modules are included in the distribution to make installation easier.)

For the JSON-LD serialization, two more external packages are used: Armin Ronacher’s Ordered Dictionary (odict) package, as well as Bob Ippolito’s simplejson package. odict is needed unless Python 2.7.x is used (an ordered dictionary module has been added to the standard distribution of Python 2.7.x); simplejson is needed for Python 2.5 (json has been added to the standard Python 2.6.x distribution).
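The version-dependent choice described above is commonly written as a pair of fallback imports. The sketch below shows the general pattern only and is not copied from pyRdfa; in particular, the assumption that the odict module exposes a class named `odict` and the helper name `ordered_json` are mine.

```python
# Prefer the standard library; fall back to the external packages on older Pythons.
try:
    from collections import OrderedDict  # standard since Python 2.7
except ImportError:
    from odict import odict as OrderedDict  # assumed class name in the odict module

try:
    import json  # standard since Python 2.6
except ImportError:
    import simplejson as json  # needed on Python 2.5


def ordered_json(pairs):
    """Serialize key/value pairs to JSON while keeping their insertion order."""
    return json.dumps(OrderedDict(pairs))
```

Keeping the key order stable matters mostly for readability and for reproducible output; the JSON data itself is the same either way.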

To install the package, download the distribution file from GitHub and either move the pyRdfa directory to your PYTHONPATH or modify your PYTHONPATH to include that directory. Alternatively, you can use the standard 'setup.py' script. The odict and httpheader modules (each consisting of a single Python file) have been added to the pyRdfa package under ‘extras’; you do not have to do anything special to install these. The HTML5 parser must be installed independently; to make this step easier, the compressed tar file has been added to the pyRdfa distribution file. The same is true for the simplejson package although, if you run Python 2.6.x or higher, that module can be ignored.
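If changing PYTHONPATH in the environment is inconvenient, the same effect can be had from inside a script by extending `sys.path` before the import. A minimal sketch, assuming the distribution was unpacked so that the placeholder directory below contains the `pyRdfa` package directory; the path is hypothetical and must be adapted.

```python
import sys

# Hypothetical location: the directory that contains the pyRdfa package directory.
PYRDFA_PARENT = "/path/to/unpacked/distribution"

if PYRDFA_PARENT not in sys.path:
    sys.path.insert(0, PYRDFA_PARENT)

try:
    import pyRdfa  # succeeds only if the package is really at that location
    available = True
except ImportError:
    available = False

print("pyRdfa importable:", available)
```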