This service has been discontinued and the underlying software is no longer maintained. The software remains publicly available for anyone interested in re-establishing the service elsewhere.

Microdata to RDF Distiller (closed)

This service implements the W3C IG Note on converting microdata to RDF (Second Edition).

Distill by URI

Distill by File Upload

Distill by direct input

If you intend to use this service regularly on a large scale, consider downloading the package and using it locally. Storing a (conceptually) “cached” version of the generated RDF, instead of referring to the live service, is another alternative to consider to avoid overloading this server.

What is it?

Microdata is a specification for attributes to be used with HTML5 to express structured data. A separate Semantic Web Interest Group Note defines a mapping from HTML5+Microdata to RDF. pyMicrodata is a distiller that generates RDF triples, in various RDF serialization formats, from an HTML5 file annotated with Microdata. It can be used either directly from the command line or via a CGI service. To learn more about Microdata, please consult the HTML5 Microdata Document, as well as the RDF Conversion algorithm. See the Distribution section below for how to download the package.

The distiller, as installed in this service, is a server-side implementation of the conversion. This means that pages that generate their (X)HTML content dynamically (e.g., using AJAX) will not be processed properly by this distiller.
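To make the idea of turning Microdata attributes into triples more concrete, here is a minimal sketch in Python 3 that uses only the standard library. It is not pyMicrodata and it does not implement the conversion algorithm of the Note: property names are kept as plain strings rather than expanded to URIs, itemref and itemid are ignored, void elements such as meta and link are skipped, and the input is assumed to be well-formed. The names MicrodataSketch and extract, the "_:itemN" labels, and the "rdf:type" shorthand are all illustrative assumptions.

```python
from html.parser import HTMLParser

# Elements without an end tag; skipped to keep the open-element stack balanced.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Collect simplified (subject, property, value) tuples from Microdata markup."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = []    # one entry per open (non-void) element
        self._items = []   # labels of the items currently in scope
        self._count = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        opens_item = "itemscope" in attributes
        if opens_item:
            self._count += 1
            label = "_:item%d" % self._count
            if prop and self._items:
                # A nested item is the value of the enclosing item's property.
                self.triples.append((self._items[-1], prop, label))
            if attributes.get("itemtype"):
                self.triples.append((label, "rdf:type", attributes["itemtype"]))
            self._items.append(label)
            prop = None  # the value is the nested item, not the element text
        self._open.append({"item": opens_item, "prop": prop, "text": []})

    def handle_data(self, data):
        # Text contributes to every open element that carries a text-valued property.
        for entry in self._open:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self._open:
            return
        entry = self._open.pop()
        if entry["prop"] and self._items:
            value = "".join(entry["text"]).strip()
            self.triples.append((self._items[-1], entry["prop"], value))
        if entry["item"]:
            self._items.pop()


def extract(html):
    """Return the list of simplified triples found in an HTML string."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.triples


SAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Paris</span>
  </div>
</div>
"""

# Markup that only exists after a script runs is invisible to a static parser.
DYNAMIC = '<div itemscope><script>document.write("<b itemprop=name>X<" + "/b>")</script></div>'

if __name__ == "__main__":
    for triple in extract(SAMPLE):
        print(triple)
    print(extract(DYNAMIC))
```

The second input illustrates the limitation described above: a parser that reads only the static markup never sees properties that a script would add in the browser.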

Distiller options

Output format (option: format; values: turtle, xml, json, nt; default: turtle)
The default output format is Turtle. Alternative formats are RDF/XML, JSON-LD, and N-Triples.

Alternative access to the Distiller

If you use Firefox, Safari, Chrome, or Opera, you can also drag the following bookmarklets to your browser bar and use them to distill the current page: “MD it (Turtle)!”, “MD it (RDF/XML)!”, “MD it (N triples)!”.

When using the distiller URI directly, the option names for the default options can be omitted. Some examples:

Extract the RDF from http://www.example.com/microdata.html, serialized in Turtle:
http://www.w3.org/2012/pyMicrodata/extract?uri=http://www.example.com/microdata.html
Extract the RDF from http://www.example.com/rdfa.html, serialized in RDF/XML:
http://www.w3.org/2012/pyMicrodata/extract?format=xml&uri=http://www.example.com/rdfa.html
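
The same URIs can be assembled programmatically. The following Python 3 sketch only builds the request string; it does not contact the service, which is closed. The helper name distiller_url and the constants SERVICE and FORMATS are illustrative assumptions; the endpoint, the option names (format, uri), and the format values are the ones listed above.

```python
from urllib.parse import urlencode

# Endpoint and option values as documented on this page.
SERVICE = "http://www.w3.org/2012/pyMicrodata/extract"
FORMATS = ("turtle", "xml", "json", "nt")


def distiller_url(page_uri, fmt="turtle"):
    """Build a distiller request URI, omitting the format option when it is the default."""
    if fmt not in FORMATS:
        raise ValueError("unknown format: %r" % (fmt,))
    params = []
    if fmt != "turtle":
        params.append(("format", fmt))
    params.append(("uri", page_uri))
    # Keep ':' and '/' readable, as in the examples above; other reserved
    # characters in page_uri (such as '?' or '&') are percent-encoded.
    return SERVICE + "?" + urlencode(params, safe=":/")


if __name__ == "__main__":
    print(distiller_url("http://www.example.com/microdata.html"))
    print(distiller_url("http://www.example.com/rdfa.html", fmt="xml"))
```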

Distribution

The underlying Python package, pyMicrodata, is available for download from GitHub. The package is based on the standard Python 2.x.y distribution, where 'x' should be 5 or higher. (It has been tested on version 2.7.6, which is the highest, and probably the last, stable release in Python 2.x; if possible, use that one.) The module may run on the Python 3.x family with the latest release of RDFLib, but it has not been thoroughly tested on that version.

The core package relies on the RDFLib package.

To install the package, download the distribution file from GitHub and either move the pyMicrodata directory to your PYTHONPATH or modify your PYTHONPATH to include that directory. Alternatively, you can use the standard 'setup.py' script.
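
As a rough illustration of the PYTHONPATH route, the Python 3 sketch below adds a directory to the import path at run time and checks whether a package can then be found. It assumes that the path entry must be the directory that contains the package directory, which is how Python's import system resolves packages. The helper name add_package_parent and the path in the usage example are hypothetical.

```python
import importlib.util
import sys


def add_package_parent(parent_dir, package="pyMicrodata"):
    """Put parent_dir on sys.path and report whether `package` can be found.

    parent_dir is the directory that contains the package directory,
    not the package directory itself.
    """
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    # Make sure the import system notices directories created recently.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Hypothetical location of the unpacked distribution.
    print(add_package_parent("/path/to/unpacked/distribution"))
```

Setting the PYTHONPATH environment variable to the same directory before starting Python has an equivalent effect without any code changes.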


Ivan Herman, (ivan@w3.org)
Last revised: $Date: 2022/06/21 04:18:32 $

This software is available for use under the W3C® SOFTWARE NOTICE AND LICENSE