Dublin Core Extraction Service

How does it work?

The form invokes a generic XSLT service that takes

an XSLT transformation

The default transformation for this form, dc-extract.xsl, converts from the format given in Expressing Dublin Core in HTML/XHTML meta and link elements and produces RDF. (An earlier version used the format given in Encoding Dublin Core Metadata in HTML, December 1999, by J. Kunze. It's not yet clear to what extent the new format subsumes the old.)

some XML data

dcq-test1.html has a number of examples from the DC spec.

Try the tidy service if you have HTML that isn't well-formed.

For example, the ADAM page isn't well-formed (i.e. it isn't XHTML), but the result of running the ADAM page through tidy is.

and returns the result.
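
As a rough illustration of the flow above, a client could hand the service two addresses, one for the transformation and one for the XML data, and read back the result. The sketch below only builds such a request URL. The endpoint address and the parameter names xslfile and xmlfile are assumptions made for illustration; this page does not state them, and the example.org addresses are placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real service address is not given on this page.
SERVICE = "http://example.org/xslt"


def build_request(transform_url, data_url, service=SERVICE):
    """Return a GET URL asking the service to apply transform_url to data_url.

    The parameter names 'xslfile' and 'xmlfile' are assumed, not confirmed.
    """
    query = urlencode({"xslfile": transform_url, "xmlfile": data_url})
    return f"{service}?{query}"


# Placeholder addresses for the default transformation and the test data.
url = build_request(
    "http://example.org/dc-extract.xsl",
    "http://example.org/dcq-test1.html",
)
print(url)
```

Passing both inputs by address keeps the service generic: any transformation can be paired with any well-formed document.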

Inspiration

I wrote the guts of dc-extract.xsl on my Palm Pilot, over drinks with Eric Miller and Dan Brickley in Amsterdam after WWW9, in an effort to show them how easy it is to use XSLT to extract RDF from real-world data.

Version 2 (thanks to Andy Powell)

This version (in testing) lowercases correctly (we want dc:title not dc:Title) and 'dumbs down' constructs such as <meta name="DC.Date.modified" content="2002-11-15" /> to simple RDF properties.
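
The lowercasing and dumbing-down can be sketched outside XSLT as well. The Python below is not dc-extract.xsl, only an approximation of the behavior described above. It assumes that dumbing down means keeping the element name and dropping any qualifier, so that DC.Date.modified becomes dc:date; the function name extract_dc is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_dc(xhtml_text):
    """Return (property, value) pairs for the DC.* meta elements in well-formed XHTML."""
    root = ET.fromstring(xhtml_text)
    pairs = []
    for el in root.iter():
        # Compare local names so the XHTML namespace does not matter.
        if el.tag.rsplit("}", 1)[-1] != "meta":
            continue
        parts = el.get("name", "").split(".")
        if len(parts) < 2 or parts[0].upper() != "DC":
            continue  # not a Dublin Core meta element
        # Lowercase the element name and drop any qualifier (assumed reading
        # of "dumbs down"): DC.Title -> dc:title, DC.Date.modified -> dc:date.
        pairs.append(("dc:" + parts[1].lower(), el.get("content", "")))
    return pairs


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta name="DC.Title" content="Example" />'
    '<meta name="DC.Date.modified" content="2002-11-15" />'
    '<meta name="keywords" content="not Dublin Core" />'
    "</head></html>"
)
print(extract_dc(sample))
# [('dc:title', 'Example'), ('dc:date', '2002-11-15')]
```

Like the service itself, this sketch needs well-formed input; HTML that isn't well-formed would have to go through tidy first.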

Dan Connolly
$Revision: 1.6 $ of $Date: 2005/09/07 17:15:26 $ by $Author: connolly $