Microdata to RDF Distiller

This service implements the W3C IG Note on converting microdata to RDF (Second Edition).


Distill by File Upload

Distill by direct input

If you intend to use this service regularly on a large scale, consider downloading the package and using it locally. Storing a (conceptually) “cached” version of the generated RDF, instead of referring to the live service, is another way to avoid overloading this server.
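The caching idea can be sketched as follows. This is a minimal illustration, not part of the service or of pyMicrodata: the function name cached_rdf, the fetch callback, and the cache directory layout are all assumptions made for the example. The fetch callback stands in for whatever code actually requests the RDF from the live service.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_rdf(page_uri, fetch, cache_dir):
    """Return the RDF for page_uri, calling fetch(page_uri) only on a cache miss.

    `fetch` is a hypothetical callable that asks the live service for the RDF.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URI so that any URI maps to a safe, fixed-length file name.
    key = hashlib.sha256(page_uri.encode("utf-8")).hexdigest()
    cache_file = cache_dir / (key + ".ttl")
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    rdf = fetch(page_uri)
    cache_file.write_text(rdf, encoding="utf-8")
    return rdf


# Demonstration with a stand-in fetcher that records how often it is called.
requests_made = []


def fake_fetch(uri):
    requests_made.append(uri)
    return "# RDF generated for " + uri + "\n"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_rdf("http://www.example.com/md.html", fake_fetch, tmp)
    second = cached_rdf("http://www.example.com/md.html", fake_fetch, tmp)
    print(first == second, len(requests_made))
```

The second call is answered from the local file, so the stand-in fetcher runs only once. A real cache would also need some way to expire entries when the source page changes.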

What is it?

Microdata is a specification for attributes to be used with HTML5 to express structured data. A separate Semantic Web Interest Group Note defines a mapping from HTML5+Microdata to RDF. pyMicrodata is a distiller that generates RDF triples, in various RDF serialization formats, from an HTML5 file annotated with Microdata. It can be used either directly from the command line or via a CGI service. To learn more about Microdata, please consult the HTML5 Microdata Document, as well as the RDF Conversion algorithm. See also below for how to download the package.

As installed, this service is a server-side implementation of the conversion. This means that pages that generate their (X)HTML content dynamically (e.g., using AJAX) will not be properly processed by this distiller.
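The limitation can be illustrated with a small sketch. It does not use the distiller's own parser; it relies on Python's standard html.parser module, and the class ItempropCollector, the function static_itemprops, and the sample page are invented for the example. A server-side processor sees only the markup as delivered, so a property that a script would add in the browser never appears.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect itemprop names present in the markup as delivered by the server."""

    def __init__(self):
        super().__init__()
        self.props = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop" and value:
                # itemprop may hold several space-separated names.
                self.props.extend(value.split())


def static_itemprops(html_text):
    collector = ItempropCollector()
    collector.feed(html_text)
    collector.close()
    return collector.props


# The script below would add an "email" property in a browser,
# but no script is executed when the markup is parsed on the server.
PAGE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Alice</span>
  <script>
    var el = document.createElement('span');
    el.setAttribute('itemprop', 'email');
    el.textContent = 'alice@example.com';
    document.body.appendChild(el);
  </script>
</div>
"""

print(static_itemprops(PAGE))
```

Only the statically present name property is found; the email property exists only after a browser has run the script.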

Distiller options

Output format (option: format; values: turtle, xml, json, nt; default: turtle)
The default output format is Turtle. Alternative formats are RDF/XML, JSON-LD, and N-Triples.
Perform vocabulary expansion (option: vocab_expansion; values: true, false; default: false)
RDFa 1.1 defines the possibility to “expand” the vocabulary provided by the vocab attribute, i.e., to retrieve the corresponding RDF file and follow the possible subclass and subproperty relationships. See the RDFa 1.1 Core document for further details. This option makes this possibility available for microdata as well, where the vocabulary is defined by the itemtype. This behavior is non-standard and therefore optional.
Use caching for vocabulary expansion (option: vocab_cache; values: true, false; default: true)
If vocabulary expansion is enabled, a built-in caching mechanism stores the vocabulary information locally to the processor.

Alternative access to the Distiller

If you use Firefox, Safari, Chrome, or Opera, you can also drag the following bookmarklets to your browser bar and use them to distill the current page: “Microdata it (Turtle)!”, “Microdata it (RDF/XML)!”, “Microdata it (N triples)!”.

When using the distiller URI directly, the option names for the default options can be omitted. Some examples:

Extract the RDF from http://www.example.com/md.html, serialized in Turtle:
Extract the RDF from http://www.example.com/md.html, serialized in RDF/XML:
Use a fixed, pseudo URI to extract the RDF from the current page without specifying its URI (with default options); this can be used, say, as a link for a button on the page:
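As a rough sketch of how such request URIs could be assembled, the following reads the rule above as "options left at their default value need not appear in the URI". The option names, values, and defaults are those listed under "Distiller options"; the service address and the name of the parameter carrying the page address ("uri") are placeholders assumed for this example, not taken from this page.

```python
from urllib.parse import urlencode

# Hypothetical service address; substitute the real distiller URI.
SERVICE = "https://distiller.example.org/microdata"

# Option names, allowed values, and defaults as listed under "Distiller options".
DEFAULTS = {"format": "turtle", "vocab_expansion": "false", "vocab_cache": "true"}
ALLOWED = {
    "format": {"turtle", "xml", "json", "nt"},
    "vocab_expansion": {"true", "false"},
    "vocab_cache": {"true", "false"},
}


def distiller_uri(page_uri, **options):
    """Build a request URI, leaving out options that keep their default value."""
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(sorted(unknown)))
    # "uri" is an assumed name for the parameter carrying the page address.
    params = [("uri", page_uri)]
    for name, default in DEFAULTS.items():
        value = options.get(name, default)
        if isinstance(value, bool):
            # Accept Python booleans for the true/false options.
            value = "true" if value else "false"
        if value not in ALLOWED[name]:
            raise ValueError("invalid value for %s: %r" % (name, value))
        if value != default:
            params.append((name, value))
    return SERVICE + "?" + urlencode(params)


print(distiller_uri("http://www.example.com/md.html"))
print(distiller_uri("http://www.example.com/md.html", format="xml"))
```

The first call requests Turtle implicitly, since turtle is the default format; the second adds format=xml because it differs from the default.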


The underlying package, pyMicrodata, is implemented in Python and is available for download from GitHub. It is based on the standard Python 2.x distribution, where 'x' is greater than or equal to 5. (It has been tested on version 2.7.6, which is the highest, and probably the last, stable release in Python 2.x.) The module may run on the Python 3.x family with the latest release of RDFLib, but has not been thoroughly tested on that version.

To install the package, download the distribution file (it is a compressed tar file) and either move the pyMicrodata directory to your PYTHONPATH or modify your PYTHONPATH to include that directory. Alternatively, the included setup.py can also be used to install the library into the system-dependent areas.
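For a quick check without changing the environment, the effect of a PYTHONPATH entry can be reproduced at run time. The sketch below is only an illustration: the helper make_importable and the example directory are hypothetical. It relies on standard Python import behavior, where the directory placed on the path must be the one that contains the package directory.

```python
import importlib.util
import sys


def make_importable(package_parent_dir, package_name="pyMicrodata"):
    """Add package_parent_dir to sys.path and report whether package_name resolves.

    package_parent_dir is the directory that contains the package directory,
    which is what a PYTHONPATH entry would point to.
    """
    if package_parent_dir not in sys.path:
        sys.path.insert(0, package_parent_dir)
    # Make sure directories added after interpreter start-up are noticed.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None


# Hypothetical location of the unpacked distribution.
if make_importable("/opt/pyMicrodata-dist"):
    print("pyMicrodata can be imported")
else:
    print("pyMicrodata not found; check the directory or use setup.py")
```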

Ivan Herman (ivan@w3.org)
Last revised: 2014-12-17 08:54:54

This software is available for use under the W3C® SOFTWARE NOTICE AND LICENSE
