Bridging XHTML, XML and RDF with GRDDL

Dominique Hazaël-Massieux

W3C Quality Assurance Activity Lead and Systems Engineer
World Wide Web Consortium (W3C)

La Seyne sur Mer

This paper was first submitted to XTech 2005 (published as part of the proceedings) on April 15 2005, and presented on May 26 of the same year. It has evolved since then as described in the changelog.


While SGML and XML languages have long been able to describe the syntactic constraints of their vocabularies using DTDs and other schema languages, no specific mechanism exists for mapping these syntactic constraints to their semantic implications.

GRDDL, a technology under development at W3C, makes it possible to incorporate semantics from XML vocabularies and XHTML conventions into the Semantic Web by re-using existing extensibility hooks of the Web. This paper explains the basic principles of its mechanisms and explores how it can be applied by various communities.

Table of Contents

Bridging semantics across markup languages
GRDDL mechanisms
Specifying a Transformation For a Family of Documents
Specifying a Transformation For an Individual Document
Scenarios of applications
GRDDL status and future development
Test Suite


Re-using the same technologies that share documents on the Web to share information and data that computers can process directly is an idea as old as the Web itself.

The Semantic Web, built on the Resource Description Framework (RDF), is the point of reference for sharing computer-processable information on the Web. At this time, however, far more information is available in non-Semantic Web formats than in RDF-based ones: (X)HTML documents obviously, but also a significant set of other formats encoding images, documents, spreadsheets, newsfeeds, etc. Not all of this information is formalized enough to fit in the framework set by the Semantic Web, but a lot of it is, and would benefit from being integrated into this Web of data.

This paper describes GRDDL, which stands for "Gleaning Resource Descriptions from Dialects of Languages", a technology under development at W3C designed to fill part of this gap by allowing document authors to automatically associate formalized RDF statements with XHTML and XML-based formats.

Bridging semantics across markup languages

Markup languages allow the use of descriptive names to annotate and structure documents, encode data, describe images, etc. These descriptive names and the structure they create add implicit or explicit semantics to documents. For instance,

Example 1. Semantics in HTML snippet

<head><title>Example of an HTML document</title>

asserts more or less formally that this HTML document has a title of "Example of an HTML document". This assertion can be translated into RDF/N3 [N3] as <document.html> dc:title "Example of an HTML document".
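To make this translation concrete, here is a minimal sketch in Python of how a program might turn the title of the snippet above into the N3 statement just shown. The function name title_as_n3 and the class TitleExtractor are made up for this illustration, and the dc: prefix is left undeclared, as in the statement above.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text content of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.found = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.found:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.found = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_as_n3(html_text, document_uri):
    """Return an N3 statement for the document title, or None if there is none."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    if not parser.found:
        return None
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = parser.title.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{document_uri}> dc:title "{escaped}" .'


snippet = "<head><title>Example of an HTML document</title>"
print(title_as_n3(snippet, "document.html"))
```

The point of the sketch is that the mapping lives in application code: another application wanting the same statement would have to re-implement it, which is the problem discussed next.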

While SGML and XML languages have long been able to describe the syntactic constraints of their vocabularies using DTDs and other schema languages, no specific mechanism exists for mapping these syntactic constraints to their semantic implications.

But why would one need to do this mapping? Most of the time, a markup vocabulary is developed for an application-specific purpose, and the semantics bound to this vocabulary are encoded in the applications themselves.

The problem with that approach is that all the work done to define the semantics of these vocabularies precisely gets encapsulated in application code and cannot be re-used in new contexts without developing new code. In other words, it hurts the potential for re-use of existing data.

Moreover, some vocabularies are designed to fit in a multitude of formats, with various syntactic constraints and processing models.

For instance, Creative Commons [CC] defines a set of metadata to describe the licensing rights associated with various forms of content. Users of this metadata vocabulary will want to embed this information in a very wide range of formats (SMIL, SVG, OAI, XHTML). The syntactic flexibility allowed by these formats varies widely, and the way to publish the information must be adapted to each of them. Moreover, a processor trying to detect and analyse the various ways this information is embedded in the host formats would need to learn each of the particular embedding methods.

While RDF [RDF] and OWL [OWL], W3C Recommendations since February 2004, provide a direct solution for re-using and combining vocabularies, many existing applications and markup systems cannot realistically be moved to this data model, and even today, many new applications are likely to be built on existing XML and HTML toolkits with their well-deployed workflow tools, rather than on the less ubiquitous RDF ones.

In addition to these XML use cases, the need to incorporate fine-grained metadata in HTML documents, beyond what the HTML specification defines, has arisen again and again in the history of the Web, either to benefit from the deployment of well-known RDF vocabularies, or simply to use HTML as a lever to deploy what has been casually called the "lowercase semantic web", namely the possibility of encoding lightweight semantics through HTML markup conventions:

  • The Dublin Core community, whose vocabulary allows authors to annotate documents with a set of well-defined metadata, has been requesting a proper way to embed RDF in HTML documents for many years.

  • Similarly, the FOAF (Friend of a Friend) community wants to promote the usage of FOAF data [FOAF], which describe relationships between people, and would like to use HTML as a way to help propagate this vocabulary.

  • XFN [XFN] has developed a set of well-defined relationships between authors of Web pages, mainly targeting the blogging community; this kind of information should clearly be part of the Semantic Web, allowing users to compare and merge these data with data from completely different sources and topics.

  • GeoURL, a set of HTML conventions to annotate content with geographical information, is a similar case of rich data embedded in HTML documents that would benefit from and enhance the Semantic Web.

  • The blogging community has created several markup conventions to manage trackback and pingback between blogs. This application demonstrates a need to extend the semantics of the basic HTML language.

In all the cases above it seems natural to use RDF and OWL as well-grounded foundations to describe the semantics associated with these markup constructions, as the Cambridge Communiqué [CambComm] suggested in 1999.

GRDDL, standing for "Gleaning Resource Descriptions from Dialects of Languages" [GRDDL], proposes a mechanism to make these associations possible for any XML or XHTML document. GRDDL grounds these associations in URI space, making it simple to extend the collection of transformations as new XML vocabularies are deployed.

GRDDL mechanisms

Specifying a Transformation For a Family of Documents

Let us refer to a class of XML or XHTML documents that share a specific semantically-meaningful structure as a "family" of documents. The idea behind GRDDL is that a family of documents will advertise its family via a well-known URI and that dereferencing this URI should lead a GRDDL processor to an algorithm to map from the structure to the semantics. The XHTML1 namespace URI is an example of one such family identifier.

The existence of such a well-known URI for vocabularies deployed on the Web is consistent with the World Wide Web Architecture [WEBARCH] described by the W3C Technical Architecture Group:

To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources.

In the case of XML vocabularies, the most deployed form of URI as identifier for a family of documents is the namespace of the root element - although this is not enforced by any specification, see the relevant TAG issue [mixedNamespaceMeaning-13] for more details.

For instance, all P3P Policy Reference files [P3P] start with a META element in the namespace; interestingly in the case of P3P, the associated meaning of the XML vocabulary has been formally translated into RDF, as described in the RDF Schema for P3P [P3P-RDF]. Similarly, SVG files and XML Schema files can be recognized by the namespace of their root element, making it a good identifier for the type of structure and semantics associated with these families of documents.

XHTML has also been used as a container for sub-vocabularies (e.g. the XHTML Friends Network proposal [XFN]). The proper way to anchor these additional semantics in the Web is to use the profile attribute on the head element, as warranted by the HTML 4.01 specification [HTML4]:
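As an illustration of recognizing a family of documents by the namespace of its root element, the following Python sketch reads that namespace with the standard library's ElementTree parser. The helper name root_namespace is hypothetical; the SVG namespace URI used in the sample is the one defined by the SVG specification.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


svg_doc = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'
print(root_namespace(svg_doc))
```

A processor would treat the returned URI as the family identifier to dereference; a document without a namespace simply has no family identifier of this kind.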

The profile attribute of the HEAD specifies the location of a meta data profile. The value of the profile attribute is a URI. User agents may use this URI in two ways:

  • As a globally unique name. User agents may be able to recognize the name (without actually retrieving the profile) and perform some activity based on known conventions for that profile. For instance, search engines could provide an interface for searching through catalogs of HTML documents, where these documents all use the same profile for representing catalog entries.

  • As a link. User agents may dereference the URI and perform some activity based on the actual definitions within the profile (e.g., authorize the usage of the profile within the current HTML document). [The HTML4] specification does not define formats for profiles.

Indeed, an XHTML document using the set of relationships defined in XFN [XFN] must reference the XFN profile in its head element.
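A minimal Python sketch of how a processor might read the profile of an XHTML document follows. The helper name profile_uris and the sample page are illustrative only. The code treats the attribute value as a whitespace-separated list, as HTML 4.01 advises user agents to do.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def profile_uris(xhtml_text):
    """Return the profile URIs declared on the head element (possibly empty)."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    return head.get("profile", "").split()


# Sample page invented for this sketch; it declares the XFN 1.1 profile.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://gmpg.org/xfn/11"><title>My blog</title></head>
  <body><p><a href="http://example.org/jane" rel="friend met">Jane</a></p></body>
</html>"""

print(profile_uris(page))
```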

Both for XML and XHTML, when such a family identifier URI exists and is dereferenceable, GRDDL proposes that dereferencing this URI should provide one or more algorithms that transform a document of this family into RDF/XML statements. When such a transformation exists, GRDDL specifies that these statements are indeed part of the intended meaning of the document. This is illustrated in Figure 1, “Extracting RDF Statements from a P3P document using GRDDL”.

Figure 1. Extracting RDF Statements from a P3P document using GRDDL


In this figure, we assume that the P3P namespace document has been set up for GRDDL; a GRDDL processor encountering a P3P document DOC would dereference the namespace URI of the root element, find the relevant namespace document NS, determine a URI referencing an algorithm (XSLT) in it, and apply the algorithm to extract RDF statements (RDF) from the original P3P document.

Practically speaking, this works as follows:

  1. A GRDDL processor encounters an XML document whose root element has a dereferenceable namespace URI. If this document is an XHTML document, the head is examined for a dereferenceable profile URI.

  2. Upon dereferencing the said URI, the GRDDL processor finds that it references one or more URIs identifying a set of algorithms (this step is described in more detail below).

  3. If the identified algorithms are well-known to the GRDDL processor, it applies them to the initial XML (or XHTML) document; otherwise, it dereferences the URIs and finds, e.g., an XSL transformation as one of the representations.

  4. Applying the XSL transformations to the initial document, the GRDDL processor extracts a set of RDF/XML statements that are asserted as being part of the intended meaning of the document.

Given the XML nature of the targeted vocabularies, and the growing availability of XSLT processors, GRDDL suggests that algorithms should be expressed in this language, so that a processor configured to fetch new algorithms on the fly could use XSLT [XSLT1] as a common transformation language. At the time of this writing, XSLT 2 is still a Working Draft [XSLT2], and the question of whether GRDDL should require support for XSLT 2 has not been fully addressed, although its admittedly superior expressive power makes it an interesting candidate.

This does not prevent the use of URIs as simple identifiers relying on a library for well-known transformation algorithms, nor the use of other techniques than XSLT to process the said XML documents.
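The last two steps of the process above, including the shortcut for well-known algorithms, can be sketched as follows in Python. This is a toy model under stated assumptions, not the behaviour mandated by the specification: transformations are represented as plain Python functions rather than XSLT style sheets, statements are plain strings, and the names glean, well_known and load_remote are invented for the example.

```python
from typing import Callable, Dict, Iterable, List

# A transformation takes the source document text and returns statements
# (plain strings here, standing in for RDF/XML output).
Transformation = Callable[[str], List[str]]


def glean(
    document: str,
    transformation_uris: Iterable[str],
    well_known: Dict[str, Transformation],
    load_remote: Callable[[str], Transformation],
) -> List[str]:
    """Apply every referenced transformation and merge the resulting statements."""
    statements: List[str] = []
    for uri in transformation_uris:
        # Step 3: prefer a locally known algorithm, otherwise dereference the URI.
        transform = well_known.get(uri)
        if transform is None:
            transform = load_remote(uri)
        # Step 4: the extracted statements are part of the document's meaning.
        statements.extend(transform(document))
    return statements


# Hypothetical set-up: one algorithm is known locally, the other must be fetched.
fetched = []


def fake_load_remote(uri):
    fetched.append(uri)
    return lambda doc: [f"# statements produced by {uri}"]


known = {"http://example.org/tx/length": lambda doc: [f'<> ex:length "{len(doc)}" .']}

result = glean(
    "<doc/>",
    ["http://example.org/tx/length", "http://example.org/tx/other"],
    known,
    fake_load_remote,
)
print(result)
print(fetched)
```

Only the second URI is dereferenced, since the first one is recognized as a simple identifier for an algorithm the processor already has.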

A point has been left open in the process above: how to detect the URI referencing the algorithms in the namespace (or XHTML profile) document, given that there is no standard format for it? (See W3C TAG issue [namespaceDocument-8]). And indeed, namespace owners have put a wide variety of documents as representations of their URIs, from a simple HTML document to content-negotiated schemas, DTDs, or RDDL documents to dispatch between these various relevant data.

To solve that problem, GRDDL instructs a processor not to look for a particular format, but to look for a given RDF property stated by the namespace or profile document. And since it is not possible to assume that all namespace and profile documents are published in RDF, the GRDDL processing is applied here recursively: namely, if the given namespace/profile document is not in RDF/XML but in some XML format, the processor should simply try to extract RDF statements from it using GRDDL processing.

The examples attached to the XSLT-based GRDDL processor [XSLT-DEMO] show how this recursive processing can be applied fairly easily to namespace and profile documents given in XML Schema, XHTML and RDDL formats, and can be easily extended to any XML format. Figure 2, “Applying GRDDL recursively through an XML Schema-based namespace document” illustrates how this would work applied to a namespace represented by an XML Schema - thus partially implementing one of the goals put forward in the Cambridge Communiqué [CambComm].

A final question arises with this recursive method: how to stop the recursion without having to modify all the namespace and profile documents of formats that may be used only a few times as containers for GRDDL markup? Not only would this be unlikely to be achievable, it would also limit the number of ways one could use a given format as a container for GRDDL markup.

The following section details the second GRDDL mechanism that makes this possible.

Specifying a Transformation For an Individual Document

There are various situations where it is not possible, practical or desirable to have a URI identifying a given family of documents, or to have it referenced in the places mentioned above (i.e. as namespace of the root element or as profile in the head element), or to change the representation available at such a URI.

Thus, GRDDL provides a second mechanism that allows authors to associate an individual document with a given transformation algorithm. It does so through

  • an attribute (dataview:transformation) on the root element for the generic XML case

    Example 2. GRDDL attribute in an XML Schema document

    <xs:schema xmlns:xs="" 

  • for XHTML, given the syntactic constraints imposed by the required DTD validity, adding an attribute in the html root element is not an option. Thus, GRDDL proposes to use a specific rel attribute value (transformation), anchored in URI space through a defined profile attribute value:

    Example 3. GRDDL link in an XHTML document

    <html xmlns="">
      <head profile="">
        <title>Some Document</title>
        <link rel="transformation"
           href="" />

This mechanism means that a GRDDL processor should

  1. look for the said attribute (resp. the said profile value and the targets of the links with the said rel value) in the root element of processed XML documents (resp. XHTML documents);

  2. if the URIs given in this attribute (resp. in the links) identify well-known algorithms, apply these algorithms to the initial document to extract RDF statements;

  3. otherwise, dereference the URIs and apply the XSL style sheet representations available to the initial document to extract RDF/XML statements;

  4. consider the extracted statements part of the intended meaning of the document.
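The first step of this list can be sketched in Python as follows. The namespace URI bound to the dataview prefix and the profile URI are placeholders chosen for this illustration (the examples above do not show them), and the function name document_transformations is hypothetical. The sketch assumes that the attribute may carry several whitespace-separated URIs.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Placeholder values: the actual GRDDL namespace and profile URIs are not
# shown in the examples above, so these are assumptions for illustration.
DATAVIEW_NS = "http://example.org/data-view#"
DATAVIEW_PROFILE = "http://example.org/data-view"


def document_transformations(xml_text):
    """Return the transformation URIs that a single document references."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{XHTML_NS}}}html":
        head = root.find(f"{{{XHTML_NS}}}head")
        # The rel value only counts when the profile anchors it in URI space.
        if head is None or DATAVIEW_PROFILE not in head.get("profile", "").split():
            return []
        return [
            link.get("href")
            for link in head.findall(f"{{{XHTML_NS}}}link")
            if "transformation" in link.get("rel", "").split() and link.get("href")
        ]
    # Generic XML: an attribute on the root element.
    return root.get(f"{{{DATAVIEW_NS}}}transformation", "").split()


schema = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'xmlns:dataview="http://example.org/data-view#" '
    'dataview:transformation="http://example.org/schema2rdf.xsl"/>'
)
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://example.org/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="http://example.org/page2rdf.xsl" />
  </head>
  <body/>
</html>"""

print(document_transformations(schema))
print(document_transformations(page))
```

Checking the profile before trusting the rel value mirrors the reasoning above: without the profile, a link with rel="transformation" has no agreed meaning.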

This mechanism offers a simple and short way to close the recursive processing explained above: in the case of an XML Schema-based namespace document (Figure 2, “Applying GRDDL recursively through an XML Schema-based namespace document”), one would just need to add a single transformation reference to the XML Schema to allow the GRDDL processor to extract the transformations that apply to all the documents defined in this namespace.

Figure 2. Applying GRDDL recursively through an XML Schema-based namespace document


The figure above illustrates a GRDDL processor encountering a purchase order document (DOC) whose namespace URI has a representation in XML Schema (P.O. NS). This XML Schema itself references a GRDDL transformation (XSLT-1), that when applied to the schema document gives RDF Statements (RDF-1) referencing a transformation (XSLT-2) to be applied on the initial purchase order to extract RDF statements (RDF-2). This is actually implemented in the XSLT-based demonstrator [XSLT-DEMO].
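The chain of Figure 2 can be simulated with a small Python model. Everything in it is a simplification made for this sketch: documents are reduced to a Doc record, the Web to a dictionary, transformations to Python functions, and the property name ex:familyTransformation is a hypothetical stand-in for the RDF property a processor would look for. It is not the XSLT-based demonstrator.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical name for the RDF property that links a namespace or profile
# document to a transformation; used here only for illustration.
TRANSFORMATION_PROPERTY = "ex:familyTransformation"


@dataclass
class Doc:
    """Toy stand-in for a fetched document."""
    family: Optional[str] = None                    # namespace or profile URI, if any
    links: List[str] = field(default_factory=list)  # transformations the document itself references
    triples: Optional[List[Triple]] = None          # set when the document is already RDF


def glean(doc, web, transforms, depth=0, max_depth=5):
    """Return the statements gleaned from doc, recursing through its family document."""
    if doc.triples is not None:
        return list(doc.triples)
    if depth > max_depth:
        return []
    uris = list(doc.links)
    family_doc = web.get(doc.family) if doc.family else None
    if family_doc is not None:
        # Recursive step: glean the namespace document itself, then keep the
        # transformations it advertises for its family.
        for s, p, o in glean(family_doc, web, transforms, depth + 1, max_depth):
            if s == doc.family and p == TRANSFORMATION_PROPERTY:
                uris.append(o)
    statements = []
    for uri in uris:
        statements.extend(transforms[uri](doc))
    return statements


PO_NS = "http://example.org/po"
web = {PO_NS: Doc(links=["http://example.org/xslt-1"])}  # the XML Schema (P.O. NS)
transforms = {
    # XSLT-1: applied to the schema, yields RDF-1, which points to XSLT-2.
    "http://example.org/xslt-1": lambda doc: [
        (PO_NS, TRANSFORMATION_PROPERTY, "http://example.org/xslt-2")
    ],
    # XSLT-2: applied to the purchase order, yields RDF-2.
    "http://example.org/xslt-2": lambda doc: [("order-1", "ex:total", "42")],
}

purchase_order = Doc(family=PO_NS)  # DOC
print(glean(purchase_order, web, transforms))
```

The recursion stops at the schema because it references its transformation directly and has no family document of its own in this model; the max_depth guard is an extra safety net added for the sketch.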

How can these mechanisms be used in practice with vocabularies deployed today? How do they allow communities to take part in the Semantic Web goal of re-using information as much as possible?

Scenarios of applications

Most of the deployed vocabularies (either XML or XHTML based) are the results of some agreement inside a community on the meaning of the terms defined by the vocabulary.

Some of these communities may wish to integrate these vocabularies into the Semantic Web, either to benefit from the network effect promised by sharing a common format to carry semantics, or to solve a particular problem of vocabulary integration (see the section called “Bridging semantics across markup languages”), or simply to re-use some of the tools available for Semantic Web technologies to make inferences, visualize relationships, or ease indexing by Semantic Web bots.

We explore here a few usage scenarios for various communities, and how they could effectively use GRDDL to meet their needs. This section is purely speculative and does not necessarily reflect the thinking of these communities.

Example 4. The W3C XHTML Working Group scenario

The HTML Working Group is working on XHTML 2.0 [XHTML 2.0]. One of the most promising aspects of this new version of XHTML is the possibility to express in XHTML markup a very wide range of RDF statements. This would offer a whole new community the opportunity to take a direct part in the Semantic Web.

The current proposal makes it possible to extract RDF/XML statements from XHTML 2 documents using XSLT, making it compatible with the GRDDL mechanisms. As such, any Semantic Web agent implementing GRDDL would be able to parse XHTML 2 documents, as long as the XHTML Working Group publishes in the XHTML 2 namespace document a proper link to the relevant XSLT style sheet.

Example 5. The W3C SVG Working Group scenario

The SVG specification [SVG] offers a metadata element, and its associated XML Schema allows any type of markup inside this element, thus allowing RDF/XML statements to be embedded inside any kind of SVG content.

To help make these metadata elements available to Semantic Web agents, the SVG Working Group could decide to update the SVG namespace document, published in XHTML, to make it point to an XSL style sheet that would simply extract the content of such metadata elements expressed in RDF/XML, instantly making all these metadata part of the Semantic Web.
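What such a style sheet would do can be sketched in Python instead of XSLT: locate the metadata elements of an SVG document and return the RDF/XML blocks they contain. The function name embedded_rdf and the sample drawing are invented for this illustration; the namespace URIs are those of SVG, RDF and Dublin Core.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def embedded_rdf(svg_text):
    """Return every rdf:RDF element that is a direct child of an SVG metadata element."""
    root = ET.fromstring(svg_text)
    blocks = []
    for metadata in root.iter(f"{{{SVG_NS}}}metadata"):
        blocks.extend(metadata.findall(f"{{{RDF_NS}}}RDF"))
    return blocks


# Sample drawing invented for this sketch.
drawing = """<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="">
        <dc:title>A red circle</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <circle r="5" fill="red"/>
</svg>"""

blocks = embedded_rdf(drawing)
titles = [el.text for rdf in blocks for el in rdf.iter(f"{{{DC_NS}}}title")]
print(titles)
```

Metadata content that is not RDF/XML is ignored, which matches the idea of extracting only the statements that are already expressed in RDF.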

Example 6. The XHTML Friends Network community scenario

The XFN community has defined a set of relationship names [XFN] anchored in URI space through an XHTML profile. Since these relationships have been made explicit for authors and authoring tool developers, it is reasonable to assume that people using these conventions agree that doing so expresses the intended meaning.

If the XFN community wanted to bring their data into the Semantic Web, for instance to be able to re-use the existing visualization tools developed for Semantic Web languages, they would simply need:

  1. to create an RDF/XML description of these relationships - or to make the existing XHTML description equivalent to a set of RDF/XML ones using GRDDL

  2. to update the existing XHTML profile to make it reference an XSL transformation that would turn links with XFN relationships into proper RDF statements, à la [ foaf:homepage <myhomepage.html>] xfn:met [ foaf:homepage <> ]. (in RDF/N3 for ease of reading)
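The kind of output such a transformation would produce can be sketched in Python. The function name xfn_statements, the sample page and its URIs are hypothetical, only a subset of the XFN relationship values is listed, and the check that the page actually declares the XFN profile is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# A subset of the XFN 1.1 relationship values, enough for the example.
XFN_VALUES = {"friend", "acquaintance", "contact", "met", "co-worker", "colleague"}


def xfn_statements(xhtml_text, page_uri):
    """Return one N3 statement per XFN relationship found on a link."""
    root = ET.fromstring(xhtml_text)
    statements = []
    for a in root.iter(f"{{{XHTML_NS}}}a"):
        href = a.get("href")
        if not href:
            continue
        # rel holds a whitespace-separated list; non-XFN values are skipped.
        for rel in a.get("rel", "").split():
            if rel in XFN_VALUES:
                statements.append(
                    f"[ foaf:homepage <{page_uri}> ] xfn:{rel} [ foaf:homepage <{href}> ] ."
                )
    return statements


blog = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://gmpg.org/xfn/11"><title>My blog</title></head>
  <body>
    <p><a href="http://example.org/jane" rel="friend met">Jane</a></p>
    <p><a href="http://example.org/ad" rel="nofollow">An ad</a></p>
  </body>
</html>"""

for statement in xfn_statements(blog, "http://example.org/me"):
    print(statement)
```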

Example 7. The blogging community scenario

The blogging community has been defining and re-using a number of HTML conventions to mark up relationships between blog authors (XFN), geographical indications (GeoURL), communication endpoints (trackback, pingback), link endorsement (nofollow), etc.

This growing number of conventions allows the encoding of a fascinating amount of data that a few ad-hoc applications have started to explore. But the diversity of the topics and the number of conventions in use mean that each new use of these data depends on the creation of a new application.

Moreover, a number of these conventions have not been grounded in URI space, which makes the process of interpreting and evolving them somewhat fragile; in the future, new conventions may end up using clashing names for lack of coordination or due to bad timing.

To lower these risks and benefit from a stronger foundation, the blogging community could create a profile (or a set of profiles) that would associate the existing conventions to a URI; if this profile was set up with a set of GRDDL transformations, any content referencing the said profile could automatically be processed by Semantic Web agents supporting GRDDL. Moreover, any addition of a new convention to the given profile (as a result of a consensus in the community) could be supported in processing tools by simply adding a link to a new XSL style sheet to the said profile.

Example 8. The Creative Commons community scenario

The Creative Commons have defined a set of RDF properties describing the licenses they propose for authors to use on their content. They want search engines to be able to parse this data in a wide variety of formats, from XHTML to the Open Archives Initiative format.

But each of these formats has a set of syntactic constraints that need to be addressed separately, using a different embedding technique.

Instead of having to maintain a library of formats that Creative Commons search engines need to know how to process, the Creative Commons could create a set of XSL transformations that would work for each of these formats, and either propose them as directly included in individual documents, or work with the community owning these formats to have them include the links in the relevant namespace and profile documents.

These various examples reveal an interesting property of GRDDL: given its levels of indirection, it can be adapted to a wide range of community processes and be used as a technical tool to express existing consensus within these communities.

GRDDL status and future development


As of May 16 2005, GRDDL was last published as a Team Submission. It was first published in April 2004 as a Coordination Group Note resulting from a task force set up to explore ways to solve the long-standing question of how to embed RDF in HTML - see [STORING] for more details.

Since May 2004, this task force has been integrated into the Semantic Web Best Practices and Deployment Working Group; this group serves today as the forum for discussing GRDDL (esp. through the mailing list) and determining what future development GRDDL needs.

As of May 2005, the GRDDL specification has not been endorsed by W3C Membership.

The authors of the GRDDL specification are interested in feedback on whether this specification should go through the W3C Recommendation track to help disseminate it through the relevant communities.

On the technical side, one of the interesting issues yet to be resolved concerns the support for XSLT 2 in GRDDL processors; although XSLT 2 is only at Working Draft stage at this time and thus not very widely deployed yet, using it would bring a whole new set of functions and capabilities that are likely to prove very useful in transforming existing XML and XHTML structures into RDF/XML. In particular, XSLT 2.0 may be needed to make XHTML 2.0 fully processable through GRDDL.


As of April 1st 2005, five partial or full implementations of GRDDL have been announced:

The diversity of implementations and their number at this stage of development of the specification is a positive sign of the interest in the technology among the Semantic Web community; hopefully this number should grow even larger, and GRDDL could become a basic part of any RDF toolkit.

Test Suite

To accompany both the development of the specification and the development of GRDDL implementations, a test suite has been developed, with a series of test cases for each mechanism described in the specification, and a small Python test harness that automates the running of the test suite.


GRDDL proposes a set of mechanisms strongly anchored in the Web Architecture through its use of URIs, and has the potential to address a number of issues that have arisen through the co-deployment of XHTML, XML-based vocabularies and RDF-based technologies.

The authors of the GRDDL specification are interested in feedback on the technical content of the specification, as well as on the status that potential users of this technology would like to see attached to this specification: is the development of this specification through the W3C Process to make a W3C Recommendation needed to help its deployment in and out of the Semantic Web community?

Many thanks to Ralph Swick, Dan Connolly and Dan Brickley for their reviews, suggestions and ideas, which have helped in writing this document.


[CambComm] The Cambridge Communiqué, R. R. Swick, Henry S. Thompson, editors, W3C Note 7 October 1999. Available at

[CC] Implementing Creative Commons Metadata, Creative Commons. Available at

[FOAF] FOAF Vocabulary Specification, D. Brickley, L. Miller, RDFWeb FOAF Project 3 April 2005

[GRDDL] Gleaning Resource Descriptions from Dialects of Languages (GRDDL), D. Hazaël-Massieux, D. Connolly, W3C Team Submission 19 May 2005. Available at

[HTML4] HTML 4.01 Specification, D. Raggett, A. Le Hors, I. Jacobs, Editors, W3C Recommendation 24 December 1999. Available at

[mixedNamespaceMeaning-13] What is the meaning of a document composed of content in mixed namespaces?, W3C TAG issue raised on 22 April 2002. Available at

[N3] Notation 3, An RDF language for the Semantic Web, T. Berners-Lee. Available at

[namespaceDocument-8] What should a "namespace document" look like?, W3C TAG issue raised on 14 January 2002. Available at

[OWL] OWL Web Ontology Language Overview, D. L. McGuinness, F. van Harmelen, editors, W3C Recommendation 10 February 2004. Available at

[P3P] The Platform for Privacy Preferences 1.0 (P3P1.0) Specification, L. Cranor, M. Langheinrich, M. Marchiori, M. Presler-Marshall, J. Reagle, W3C Recommendation 16 April 2002. Available at

[P3P-RDF] An RDF Schema for P3P, B. McBride, R.Wenning, L.Cranor, W3C Note 25 January 2002. Available at

[RDF] Resource Description Framework (RDF): Concepts and Abstract Syntax, G. Klyne, J. J. Carroll, B. McBride, editors, W3C Recommendation 10 February 2004. Available at

[STORING] Storing Data in Documents: The Design History and Rationale for GRDDL, D. Connolly. Available at

[SVG] Scalable Vector Graphics (SVG) 1.2, D. Jackson, editor, W3C Working Draft (work in progress) 27 October 2004. Available at

[WEBARCH] Architecture of the World Wide Web, Volume One, I. Jacobs, N. Walsh, Editors, W3C Recommendation 15 December 2004. Available at

[XFN] XFN 1.1 relationships meta data profile, T. Çelik, M. Mullenweg, E. Meyer, Global Multimedia Protocols Group. Available at

[XHTML 2.0] XHTML 2.0, J. Axelsson, B. Epperson, M. Ishikawa, S. McCarron, A. Navarro, S. Pemberton, Editors, W3C Working Draft (work in progress) 22 July 2004. Available at

[XSLT-DEMO] Demonstration of GRDDL applied to XML, D. Hazaël-Massieux, D. Connolly. Available at

[XSLT1] XSL Transformations (XSLT) Version 1.0, J. Clark, Editor, W3C Recommendation 16 November 1999. Available at

[XSLT2] XSL Transformations (XSLT) Version 2.0, M. Kay, Editor, W3C Working Draft (work in progress) 4 April 2005. Available at


$Log: xtech-grddl.html,v $
Revision 1.21  2005/11/30 22:02:48  dom
link to XTech proceeding
fixed link to slides

Revision 1.20  2005/06/02 11:42:32  dom
link to my home page

Revision 1.19  2005/06/02 11:41:27  dom
links to slide presentations

Revision 1.18  2005/05/19 10:28:49  dom
Fix broken anchor and added changelog to ToC

Revision 1.17  2005/05/19 10:27:35  dom
GRDDL is now a Team Submission, so updated status and bibliography accordingly