Gleaning Resource Descriptions from Dialects of Languages (GRDDL)

W3C Working Draft 24 October 2006

Dan Connolly
see Acknowledgments


GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. This GRDDL specification introduces markup for declaring that an XML document includes gleanable data and for linking to an algorithm, typically represented in XSLT, for gleaning the resource descriptions from the document.

The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data.

A corresponding GRDDL Use Case Working Draft provides motivating examples. A GRDDL Primer demonstrates the mechanism on XHTML documents which include widely-deployed dialects, more recently known as microformats.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the First Public Working Draft of the GRDDL design by the GRDDL Working Group, which was chartered in July 2006 to review the specification and develop use cases, tutorial materials, and tests. The GRDDL design was released as a Note as early as April 2004; see the change log appendix for details.

GRDDL is intended to contribute to addressing Web Architecture issues such as RDFinXHTML-35 and namespaceDocument-8 as well as issues postponed by the RDF Core working group such as rdfms-validating-embedded-rdf and faq-html-compliance.

There are now multiple implementations, including an online service, and a growing collection of tests. A number of issues remain to be decided by the working group; this draft takes a position on some of them.

A few editorial notes and TODOs in this style remain. In particular, the figures have not yet been updated with respect to changes in the text; we hope they are more helpful than distracting in their present state.

Please send comments about this document to public-grddl-comments@w3.org (with public archive).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

  1. Introduction: Data and Documents
  2. Adding GRDDL to well-formed XML
  3. Using GRDDL with XML Namespace Documents
  4. Using GRDDL with valid XHTML
  5. GRDDL for HTML Profiles
  6. Transformation Algorithms
  7. Security considerations
  8. The GRDDL Vocabulary
  9. References

1. Introduction: Data and Documents

There are many dialects of languages in practice among the many XML documents on the web. There are dialects of XHTML, XML and RDF that are used to represent everything from poetry to prose, purchase orders to invoices, spreadsheets to databases, schemas to scripts, and linked lists to ontologies. Some offer more formally defined semantics and others more loosely coupled semantics. Recently, two progressive encoding techniques have emerged to overlay additional semantics onto valid XHTML documents: RDFa and microformats offer simple, open data formats built upon existing and widely adopted standards.

While this breadth of expression is quite liberating, inspiring new dialects to codify both common and customized meanings, it can prove to be a barrier to understanding across different domains or fields. How, for example, does software discover the author of a poem, a spreadsheet and an ontology? And how can software determine whether authors of each are in fact the same person?

DocBook V4.X:
<title>The Stand</title>

TEI:
<TEI ... >
<title>The Stand</title>

Atom:
<entry ... >
<title>The Stand</title>
<name>Stephen King</name>

Open Office:
<office:document-meta ... >
<dc:title>The Stand</dc:title>
<dc:creator>Stephen King</dc:creator>

Resource Descriptions

The Resource Description Framework[RDFC04] provides a standard for making statements about resources in the form of a subject-predicate-object expression. One way to represent the fact "The Stand's author is Stephen King" in RDF would be as a triple whose subject is "The Stand," whose predicate is "has the author," and whose object is "Stephen King." The predicate, "has the author" expresses a relationship between the subject (The Stand) and the object (Stephen King). Using URIs to uniquely identify the book, the author and even the relationship would facilitate software design because not everyone knows Stephen King or even spells his name consistently.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

<rdf:Description rdf:about="http://www.stephenking.com/pages/works/stand/">
  <dc:title>The Stand</dc:title>
  <dc:creator>Stephen King</dc:creator>
      <foaf:isPrimaryTopicOf rdf:resource="http://en.wikipedia.org/wiki/Stephen_King" />


GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. That is, GRDDL provides a relatively inexpensive mechanism for bootstrapping RDF content from uniform XML dialects, shifting the burden from formulating RDF to creating transformation algorithms specifically for each dialect. XML transformation languages such as XSLT are quite versatile in their ability to process, manipulate, and generate XML. The use of XSLT to generate XHTML from single-purpose XML vocabularies is historically celebrated as a powerful idiom for separating structured content from presentation.

GRDDL shifts this idiom to a different end: separating structured content from its authoritative meaning (or semantics). GRDDL works by associating transformations for an individual document, either through direct inclusion of references or indirectly through profile and namespace documents. Content authors can nominate the transformations for producing RDF from their content and use GRDDL to refer to them.

Faithful Renditions

By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition of the source document, or some portion of the source document, that preserves its meaning in RDF.

Likewise, by specifying a GRDDL namespace transformation or profile transformation, the creator of that namespace or profile states that the transformation will provide a faithful rendition of a class of source documents which relate to that namespace or profile, or some portion of such a source document, that preserves its meaning in RDF. A namespace document or a profile document also provides a means for its author to explain, in prose, the purpose of the transformation or any policy statements.

GRDDL Primer

The GRDDL Primer[primer] is a step-by-step tutorial on the GRDDL mechanism. It builds on a number of examples from the GRDDL Use Cases document to illustrate GRDDL techniques for associating documents with transformations for extracting RDF.

GRDDL Use Cases

The use cases document[usecases] collects a number of use cases together with their goals and requirements for GRDDL. These use cases also illustrate how XML and XHTML documents can be decorated with microformat, Embedded RDF or RDFa statements to support GRDDL transformations in charge of extracting valuable data that can then be used to automate a variety of tasks.

GRDDL Specification

This GRDDL specification is a concise technical specification of the GRDDL mechanism and its XML syntax. It specifies the GRDDL syntax to use in valid XHTML and well-formed XML documents, as well as how to encode GRDDL into namespaces and HTML profiles. The GRDDL transformation link and security issues are also discussed. Appendices provide links to extended examples and existing software and services that employ GRDDL.

2. Adding GRDDL to well-formed XML

The general way to associate a GRDDL transformation link with a well-formed XML document is to adorn the root element with a grddl namespace declaration and a grddl:transformation attribute whose value is a URI reference, or list of URI references, that refer to executable scripts or programs which are expected to transform the source document into RDF. This method is suitable for use with a wide variety of XML dialects that are not constrained from adding attributes by an XML DTD.

Stated more formally:

GRDDL in well-formed XML

In any dialect of XML not constrained by an XML DTD, the following example applies:

<root-element xmlns:grddl="http://www.w3.org/2003/g/data-view#"
<etc> ... </etc>

GRDDL in well-formed XHTML

In any dialect of XHTML that is not constrained by DTD syntax, the above example can be written:

<html xmlns="http://www.w3.org/1999/xhtml"
  <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>

Notice that data-view was used in the namespace declaration in place of grddl in this example to illustrate the fact that the prefix does not have to be grddl and to emphasize the data-centric focus of the RDF/XML view.
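
As a rough illustration of the attribute-based form, the following Python sketch shows how a GRDDL-aware agent might read the transformation links from the root element. It is not part of this specification: the function name root_transformations, the sample document, and the example.org URIs are all hypothetical, and whitespace is assumed to separate the URI references in the list. Because the lookup uses the namespace name rather than the prefix, it finds the attribute whether the prefix is grddl, data-view, or anything else.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace name used for the GRDDL attribute in this document's examples.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def root_transformations(xml_text, base_uri):
    """Return absolute URIs listed in the root element's transformation attribute.

    Assumption: the attribute value is a whitespace-separated list of
    URI references, resolved here against the document's own URI.
    """
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return [urljoin(base_uri, ref) for ref in value.split()]


# Hypothetical source document; the prefix "dv" is arbitrary.
SAMPLE = """<doc xmlns:dv="http://www.w3.org/2003/g/data-view#"
     dv:transformation="glean_title.xsl http://example.org/other.xsl">
  <title>The Stand</title>
</doc>"""

if __name__ == "__main__":
    for uri in root_transformations(SAMPLE, "http://example.org/books/stand.xml"):
        print(uri)
```

The sketch only discovers the links; retrieving and running each transformation is left to the agent.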

As you will see in later sections, there are other ways to add GRDDL to HTML documents, especially designed to leverage HTML's existing capabilities and thereby overcome constraints imposed by the XML DTDs for some dialects of HTML. See Using GRDDL with valid XHTML and GRDDL for HTML Profiles.

3. Using GRDDL with XML Namespace Documents

Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace. Any resource available for retrieval from a namespace URI is a namespace document (cf. section 4.5.4. Namespace documents in [WEBARCH]). For example, a namespace document may have an XML Schema representation or an RDF Schema representation, or perhaps both, using content negotiation.

To associate a GRDDL transformation with a whole dialect, have the namespace document include the grddl:namespaceTransformation property. The precise methods for allowing various types of namespace documents to include this property are detailed below, first formally and then by example.

Note: issue issue-mt-ns is open. Perhaps a special case is needed for the RDF/XML namespace: RDF/XML documents are associated with RDF graphs as per the RDF/XML specification.

Using GRDDL with an RDF Namespace document

For example, consider this privacy policy written in P3Q, a contrived analog to P3P[P3P]:

<POLICIES xmlns="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
	<EXPIRY max-age="604800"/>

The namespace document for P3Q relates the grokP3Q.xsl transformation to all P3Q documents:

 <rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example">

Figure: transformation applied to namespace (diagram: glean via namespace)

The Working Group is likely to add a section to the GRDDL primer much like this subsection. Since this subsection has no novel normative material, we're interested in feedback on whether it should remain part of this specification once it is covered by the primer.
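
The namespace-based lookup described in this subsection can be sketched in Python as follows. This is an illustration under stated assumptions rather than a normative algorithm: the namespace document shown is a hypothetical reconstruction (only the rdf:about value and the grokP3Q.xsl name come from the text above), the function names are invented, and the RDF/XML is inspected with a plain XML parser instead of a full RDF parser, so only the simple rdf:Description form with a literal rdf:about match is recognized.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def namespace_transformations(ns_doc_text, ns_uri):
    """Collect namespaceTransformation links stated about ns_uri.

    Simplification: rdf:about is compared literally with ns_uri.
    """
    found = []
    for desc in ET.fromstring(ns_doc_text).iter("{%s}Description" % RDF_NS):
        if desc.get("{%s}about" % RDF_NS) != ns_uri:
            continue
        for prop in desc.findall("{%s}namespaceTransformation" % GRDDL_NS):
            ref = prop.get("{%s}resource" % RDF_NS)
            if ref:
                found.append(urljoin(ns_uri, ref))
    return found


# Source document in the P3Q dialect (closing tag added so it parses).
P3Q_DOC = """<POLICIES xmlns="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
  <EXPIRY max-age="604800"/>
</POLICIES>"""

# Hypothetical namespace document for P3Q.
P3Q_NS_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dv="http://www.w3.org/2003/g/data-view#">
  <rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
    <dv:namespaceTransformation rdf:resource="grokP3Q.xsl"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    ns = root_namespace(P3Q_DOC)
    # A real agent would retrieve the namespace document from ns here.
    print(namespace_transformations(P3Q_NS_DOC, ns))
```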

Using GRDDL with an XML Schema namespace document

A namespace transformation link may be discoverable by transforming the namespace document itself. Note that this means that namespace documents need not be written in RDF/XML directly.

Consider a purchase order that has a namespace document represented in XML Schema, where the XML Schema bears a data-view:transformation attribute licensing extraction of statements that include namespaceTransformation statements:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            data-view:transformation="http://www.w3.org/2003/g/embeddedRDF.xsl" >
    <xsd:element name="Order" type="OrderType">
      <xsd:documentation>This element is the root element.</xsd:documentation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<rdf:Description rdf:about="http://www.w3.org/2003/g/po-ex">
	      rdf:resource="grokPO.xsl" />

Every purchase order using that schema as a namespace document is linked to the grokPO.xsl transformation, as illustrated below:

Figure: using GRDDL with an XML Schema (diagram: glean via namespace)

@@oops... the figure uses "result" as the result of an XSLT transformation, but that clashes with GRDDL result.

The Working Group is likely to add a section to the GRDDL primer much like this subsection. Since this subsection has no novel normative material, we're interested in feedback on whether it should remain part of this specification once it is covered by the primer.

4. Using GRDDL with valid XHTML

To accommodate the DTD-based syntax of XHTML[XHTML], which precludes using attributes from foreign namespaces, we use http://www.w3.org/2003/g/data-view as a metadata profile (cf. section Meta data profiles of [HTML4]).

The general form of adding a GRDDL assertion to a valid XHTML document is to specify the GRDDL profile in the profile attribute of the head element, and to give transformation as the value of the rel attribute of a link element or an a element whose href attribute value is a URI reference to an executable script or program that is expected to transform the source document into RDF. This method is suitable for use with valid XHTML documents which are constrained by an XML DTD.

Stated more formally:

@@TODO: be more clear about what "whose metadata profiles" means

An example Dublin Core META transformation

For example, this document follows the conventions of [RFC2731], and it explicitly uses the GRDDL profile and links to an XSLT transformation that extracts the metadata in RDF/XML in a way that preserves the meaning of the document:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>

    <link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject"
       content="ADAM; Simple Search; Index+; prototype" />

In the figure below, the arrow labelled info relates a document to an abstract notion of the information contained in the document. It shows that the RDF data extracted via the dc-extract.xsl transformation is part of the information contained in the document:

Figure: Decoding HTML meta-data to RDF (diagram: link to transformation)

This is what the data looks like in RDF/XML:

<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="">
    <dc:subject>ADAM; Simple Search; Index+; prototype</dc:subject>
  </rdf:Description>
</rdf:RDF>

Multiple transformations in XHTML

An XHTML document may conform to a number of dialects simultaneously and link to more than one decoding algorithm. However, since the href attribute of the link and a elements accepts only a single URI reference, multiple instances of these elements must be used to assert multiple links:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>

  <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" />
  <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl" />
  <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl" />

Figure: multiple transformations (diagram: link to multiple transformations)
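
The profile-based form for valid XHTML can be sketched in Python along the following lines. The function name xhtml_transformations and the example.org base URI are hypothetical, the sample document is a completed version of the truncated example above (the body content is invented), and rel values are matched case-sensitively for simplicity. The sketch only honours the links when the head element names the GRDDL profile.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def xhtml_transformations(xhtml_text, base_uri):
    """Return transformation links from an XHTML document that uses the GRDDL profile."""
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    # Without the GRDDL profile, rel="transformation" carries no GRDDL meaning.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    link_tags = ("{%s}link" % XHTML_NS, "{%s}a" % XHTML_NS)
    found = []
    for el in root.iter():
        if el.tag not in link_tags:
            continue
        # rel holds a space-separated list of link types.
        if "transformation" in el.get("rel", "").split() and el.get("href"):
            found.append(urljoin(base_uri, el.get("href")))
    return found


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
  <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" />
  <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl" />
  <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl" />
</head>
<body><p>Placeholder body.</p></body>
</html>"""

if __name__ == "__main__":
    for uri in xhtml_transformations(SAMPLE, "http://example.org/joe.html"):
        print(uri)
```

Each link found this way names a separate transformation of the same source document.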


5. GRDDL for HTML Profiles

XHTML provides the profile mechanism to link to the meaning of properties and the set of legal values for those properties. As with namespace documents, a profile document can effectively be written using XHTML with embedded RDF statements and a GRDDL transformation to extract the definition of terms that are applicable. Those terms can then be used in an XHTML document to convey profile-dependent meaning. As discussed in Using GRDDL with valid XHTML, the GRDDL profile can be used with XHTML documents to apply GRDDL semantics over link elements where the value of the rel attribute is transformation. This very powerful and flexible mechanism integrates well with microformat profiles[MF-RDF-FAQ] which overlay the normally semantically-poor HTML markup.

Adding a GRDDL profileTransformation assertion to a profile document is much like adding a namespaceTransformation assertion to a namespace document. For a dialect defined by a valid XHTML profile document, add profile="http://www.w3.org/2003/g/data-view" to the head element and make a link of type profileTransformation to the transformation of the dialect.

A more formal description on the relation between GRDDL and XHTML profiles follows:

In the following example, written in XHTML, the a element is a link by HTML conventions and a profile transformation assertion by GRDDL convention:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
<link rel="transformation"
    href="http://www.w3.org/2003/g/glean-profile" />
<p>This is a profile transformation link: 
<a rel="profileTransformation"
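
The two-step lookup through a profile document might be sketched in Python as below. Several things here are assumptions made for illustration: the function names, the example.org profile URI, the grokReview.xsl transformation, and the documents dictionary that stands in for retrieval over the network. The sketch also takes a shortcut by reading the rel="profileTransformation" link directly from the profile document instead of running the profile document's own transformation and inspecting the resulting RDF.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"


def profile_uris(xhtml_text):
    """Return the profile URIs listed on the head element of an XHTML document."""
    head = ET.fromstring(xhtml_text).find("{%s}head" % XHTML_NS)
    return head.get("profile", "").split() if head is not None else []


def profile_transformations(profile_text, profile_uri):
    """Return profileTransformation links found in a profile document."""
    link_tags = ("{%s}a" % XHTML_NS, "{%s}link" % XHTML_NS)
    found = []
    for el in ET.fromstring(profile_text).iter():
        if el.tag in link_tags and el.get("href"):
            if "profileTransformation" in el.get("rel", "").split():
                found.append(urljoin(profile_uri, el.get("href")))
    return found


def transformations_via_profiles(source_text, documents):
    """Follow each profile of the source document; documents maps URI to text."""
    found = []
    for uri in profile_uris(source_text):
        if uri in documents:
            found.extend(profile_transformations(documents[uri], uri))
    return found


# Hypothetical profile document and a source document that uses it.
PROFILE_URI = "http://example.org/profiles/review"
PROFILE_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
<title>Hypothetical review profile</title>
<link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
</head>
<body>
<p>This is a profile transformation link:
<a rel="profileTransformation" href="grokReview.xsl">grokReview</a></p>
</body>
</html>"""
SOURCE_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://example.org/profiles/review"><title>A review</title></head>
<body><p>Placeholder body.</p></body>
</html>"""

if __name__ == "__main__":
    print(transformations_via_profiles(SOURCE_DOC, {PROFILE_URI: PROFILE_DOC}))
```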

6. Transformation Algorithms

Transformations should have available representations in widely-supported formats. We expect most consumers to support XSLT version 1[XSLT1] for the foreseeable future, though XSLT2[XSLT2] deployment is increasing. While JavaScript, C, or any other programming language technically expresses the relevant information, XSLT is specifically designed to express XML-to-XML transformations and has some good safety characteristics.

XProc: An XML Pipeline Language[XPROC], a language for describing operations to be performed on XML documents, has recently been published as a W3C Working Draft. It merits consideration for expressing more complex or sophisticated transformations which require control over the flow of processing through a variety of XML processing tools. Using XProc, one could apply a sequence of operations such as XInclude, validation, and transformation to a document, aborting if the result of an intermediate stage is not valid, for example.

7. Security considerations

RFC 2046, in section 9. Security Considerations says:

Implementors should pay special attention to the security implications of any media types that can cause the remote execution of any actions in the recipient's environment. In such cases, the discussion of the "application/postscript" type may serve as a model for considering other media types with remote execution capabilities.

Given the expressive power of XSLT, and the possibility to access external resources from an XSLT style sheet (e.g. through the document function or the xsl:import mechanism), implementors should take the appropriate measures to prevent malicious usage of this mechanism.

Note that evaluating a transformation may involve finding representations for not only the resource identified as the transformation, but also any resources referred to by way of mechanisms such as xsl:include. Likewise, it may involve finding representations for not only the source document but also any resources referred to using mechanisms such as the XSLT document() function.
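
One possible measure, offered only as a sketch and not as a recommendation of this specification, is for an agent to check every URI it is asked to dereference (the transformation itself, xsl:include and xsl:import targets, and document() arguments) against a local policy before fetching it. The Python function below is hypothetical, as are its default scheme and host lists; a deployed agent would likely need a stricter policy than this.

```python
from urllib.parse import urlparse


def may_dereference(uri,
                    allowed_schemes=("http", "https"),
                    blocked_hosts=("localhost", "127.0.0.1")):
    """Return True if a hypothetical local policy permits fetching uri.

    The policy rejects non-HTTP schemes (for example file:) and a small
    list of hosts that would expose the agent's own machine.
    """
    parts = urlparse(uri)
    if parts.scheme not in allowed_schemes:
        return False
    host = (parts.hostname or "").lower()
    return bool(host) and host not in blocked_hosts


if __name__ == "__main__":
    for candidate in ("http://www.w3.org/2003/g/embeddedRDF.xsl",
                      "file:///etc/passwd",
                      "http://localhost/private.xsl"):
        print(candidate, may_dereference(candidate))
```

A check of this kind addresses only which resources are retrieved; it does not limit what a transformation can compute once it runs.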

8. The GRDDL Vocabulary

The following extract from the GRDDL profile document is written using XHTML and embedded RDF statements and includes all of the markup required to define an XHTML profile. (View the source to see the embedded markup)

This document, http://www.w3.org/2003/g/data-view, is a metadata profile in the sense of the HTML specification, in section Meta data profiles.

We define the following terms as XHTML link relationships and RDF properties:


HTML4 definition of the 'rel' attribute.

transformation: relates a Document to an Algorithm, usually represented in XSLT, for extracting an RDF/XML representation of (some of) the document's meaning. See GRDDL specification for full details.
namespaceTransformation: relates a Document, e.g. a namespace document, to an Algorithm, usually encoded in XSLT, for extracting an RDF/XML representation of (some of) the meaning of any document whose root element's namespace name refers to the subject document.
profileTransformation: relates a Document to an Algorithm, usually encoded in XSLT, for extracting an RDF/XML representation of (some of) the meaning of any XHTML document with a profile that refers to the subject document.

This document uses Embedded RDF to encode Description of a Project (DOAP) data as well as RDF Schema data and one or two RDDL properties. We have moved away from the RDDL syntax itself.

9. References

Normative References

Parts of the following specifications are included in this one by reference:

HTML 4.01 Specification , D. Raggett, A. Le Hors, I. Jacobs, Editors, W3C Recommendation, 24 December 1999, http://www.w3.org/TR/1999/REC-html401-19991224 . Latest version available at http://www.w3.org/TR/html401 .
Resource Description Framework (RDF): Concepts and Abstract Syntax , G. Klyne, J. J. Carroll, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ . Latest version available at http://www.w3.org/TR/rdf-concepts/ .
Modularization of XHTML™ , S. Schnitzenbaumer, F. Boumphrey, T. Wugofski, S. McCarron, M. Altheim, S. Dooley, Editors, W3C Recommendation, 10 April 2001, http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/ . Latest version available at http://www.w3.org/TR/xhtml-modularization/ .
Architecture of the World Wide Web, Volume One , N. Walsh, I. Jacobs, Editors, W3C Recommendation, 15 December 2004, http://www.w3.org/TR/2004/REC-webarch-20041215/ . Latest version available at http://www.w3.org/TR/webarch/ .
RDF Semantics , P. Hayes, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-mt-20040210/ . Latest version available at http://www.w3.org/TR/rdf-mt/ .

@@TODO: cite FOAF normatively or remove the dependency on foaf:Document from the GRDDL namespace document.

Informative references

The following documents provide additional background but are not part of this specification.

GRDDL Primer , I. Davis, Editor, W3C Working Draft (work in progress), 2 October 2006, http://www.w3.org/TR/2006/WD-grddl-primer-20061002/ . Latest version available at http://www.w3.org/TR/grddl-primer/ .
GRDDL Use Cases: Scenarios of extracting RDF data from XML documents , F. Gandon, Editor, W3C Working Draft (work in progress), 2 October 2006, http://www.w3.org/TR/2006/WD-grddl-scenarios-20061002/ . Latest version available at http://www.w3.org/TR/grddl-scenarios/ .
XSL Transformations (XSLT) Version 1.0 , J. Clark, Editor, W3C Recommendation, 16 November 1999, http://www.w3.org/TR/1999/REC-xslt-19991116 . Latest version available at http://www.w3.org/TR/xslt .
XSL Transformations (XSLT) Version 2.0 , M. Kay, Editor, W3C Working Draft (work in progress), 11 February 2005, http://www.w3.org/TR/2005/WD-xslt20-20050211/ . Latest version available at http://www.w3.org/TR/xslt20 .
Encoding Dublin Core Metadata in HTML , J. Kunze, RFC 2731, December 1999.
Expressing Simple Dublin Core in RDF/XML , D. Beckett, E. Miller, D. Brickley, 31 July 2002.
The Platform for Privacy Preferences 1.0 (P3P1.0) Specification , M. Marchiori, Editor, W3C Recommendation, 16 April 2002, http://www.w3.org/TR/2002/REC-P3P-20020416/ . Latest version available at http://www.w3.org/TR/P3P/ .
Associating Style Sheets with XML documents , J. Clark, Editor, W3C Recommendation, 29 June 1999, http://www.w3.org/1999/06/REC-xml-stylesheet-19990629 . Latest version available at http://www.w3.org/TR/xml-stylesheet .
XProc: An XML Pipeline Language , N. Walsh, Editor, W3C Working Draft (work in progress), 28 September 2006, http://www.w3.org/TR/2006/WD-xproc-20060928/ . Latest version available at http://www.w3.org/TR/xproc/ .
Microformat FAQs for RDF Fans, last modified 17:57, 30 May 2006

Appendix: Transformations for Styling versus data extraction

The xml-stylesheet processing instruction[STYPI] is generally deployed for automated presentation processing. This type of link is different from links to GRDDL transformation algorithms, which are intended to facilitate extracting data. Also, parsing the content of processing instructions is not supported by XML tools such as XSLT processors, and grounding processing instructions in URI space is not as straightforward as using namespaces with attributes.

Appendix: Available Software and Services

The authors provide a pair of online services on an experimental, best-effort basis:

Client-side implementations are also in development:

Appendix: Issues

The editor acknowledges the following issues and expects the Working Group to make decisions about them:

The following issues have been resolved by the Working Group:

Appendix: Test Cases

A collection of test cases is in development. The original announcement was 02 Feb 2005. As of April 2005, they include:

Tests pending:

Extended Example

An example homepage with Dublin Core, GeoURL, RSS, Creative Commons, etc. demonstrates several transformations and dialects.

Acknowledgements and Change History

The GRDDL Working Group convened August 2006 with Harry Halpin as chair and several of the contributors and implementors above participating, plus Chimezie Ogbuji, Fabien Gandon, Brian Suda, and Rachel Yager.