W3C

Gleaning Resource Descriptions from Dialects of Languages (GRDDL)

W3C Coordination Group Note 13 April 2004

This Version:
http://www.w3.org/TR/2004/NOTE-grddl-20040413/
Latest Version:
http://www.w3.org/TR/grddl/
Authors:
Dominique Hazaël-Massieux
Dan Connolly

Abstract

This document presents GRDDL, a mechanism for encoding RDF statements in XHTML and XML to be extracted by programs such as XSLT transformations.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

As part of the work of the W3C Semantic Web Activity, the Semantic Web Coordination Group (Member-only) and the HTML Working Group started a task force on RDF in XHTML. This draft is a snapshot of one of the designs discussed in that task force.

Please send review comments, implementation experience reports, etc. to public-rdf-in-xhtml-tf@w3.org, a mailing list with public archive.

The EmbeddingRDFinHTML wiki topic is also available as a shared space for collected wisdom on related topics.

A related design history and rationale discusses the contribution of this draft to RDF issues such as faq-html-compliance and rdfms-validating-embedded-rdf and Web Architecture issues such as RDFinXHTML-35 and namespaceDocument-8.

This is something of a design sketch, but it is backed by running code. We provide a pair of online services, one demo for XHTML and one demo for generic XML, on an experimental, best-effort basis.

The editors are aware of a few remaining issues, marked up like this @@@.

A log of changes is appended.

Publication as a Coordination Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Contents

  1. Introduction
  2. The GRDDL profile for XHTML
  3. The GRDDL attribute in XML
  4. XML Namespaces and embedded RDF
  5. Security considerations
  6. @@ References

Supplementary Material

1. Introduction

An article by J. Kunze in 1999, Encoding Dublin Core Metadata in HTML, explains one way that the Dublin Core community encodes its metadata in HTML documents. This metadata can also be expressed in the Resource Description Framework (RDF).

The mapping between the HTML encoding and the RDF encoding can be represented as an XSLT transformation, dc-extract.xsl:

Figure: Decoding HTML metadata to RDF (HTML to RDF via dc-extract.xsl)

If the HTML author understood and agreed to these encoding conventions, then their HTML document will conform to the syntactic conventions. In this case, the mapping preserves the author's meaning. But an author may have accidentally conformed to the syntactic conventions without any knowledge of Dublin Core at all. In that case, the mapping most likely does not preserve the author's meaning.

2. The GRDDL profile for XHTML

The HTML specification, in section 7.4.4.3 Meta data profiles, provides a mechanism for authors to use particular metadata vocabularies and thereby indicate the author's intent to use those terms in accordance with the conventions of the community that originated the terms.

Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types.

GRDDL is such a profile; it's a mechanism for Gleaning Resource Descriptions from Dialects of Languages. Use of the http://www.w3.org/2003/g/data-view profile indicates that RDF statements that result from transformation of the HTML document to RDF by designated algorithms are part of the document's meaning.

In this profile, the transformation link relationship relates a document to an algorithm for gleaning resource descriptions from the dialect the document is written in.

Figure: Decoding HTML metadata to RDF (link to transformation)

@@@ Should we namespace-qualify the token used in rel? Cf. Profiles attribute: A format to be defined, Karl Dubost, 15 Jan 2004.

For example:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject"
       content="ADAM; Simple Search; Index+; prototype" />
    ...
  </head>
  ...
</html>

The following RDF statement is part of the meaning of this document:

<rdf:RDF
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  >
  <rdf:Description rdf:about="">
    <dc:subject>ADAM; Simple Search; Index+; prototype</dc:subject>
  </rdf:Description>
</rdf:RDF>
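The mapping from the meta element to the RDF statement above can be illustrated in a few lines of Python. This is only a sketch and not a port of dc-extract.xsl: the function name glean_dc is hypothetical, and the rule that a name of the form "DC.X" maps to the Dublin Core property with the lowercased local name "x" is an assumption made for this example. The empty string stands for the document itself, as rdf:about="" does above.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
DC = "http://purl.org/dc/elements/1.1/"

def glean_dc(xhtml_text, base=""):
    """Return (subject, predicate, object) triples for DC.* meta elements.

    Hypothetical helper: a rough stand-in for what an extraction
    transformation might produce, not the actual dc-extract.xsl logic.
    """
    root = ET.fromstring(xhtml_text)
    triples = []
    for meta in root.iter(f"{{{XHTML}}}meta"):
        name = meta.get("name", "")
        content = meta.get("content")
        if name.startswith("DC.") and content is not None:
            # Assumed rule: "DC.Subject" -> http://purl.org/dc/elements/1.1/subject
            triples.append((base, DC + name[3:].lower(), content))
    return triples

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <meta name="DC.Subject"
       content="ADAM; Simple Search; Index+; prototype" />
  </head>
  <body/>
</html>"""

for triple in glean_dc(doc):
    print(triple)
```

A real GRDDL processor would instead fetch and apply the linked transformation; the sketch only shows the shape of the result.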

Transformation algorithms should be represented in XSLT. While JavaScript, C, or any other programming language can technically express the relevant information, XSLT is specifically designed to express XML to XML transformations and has some good safety characteristics. Other representations may be used by prior agreement of all concerned parties.

Transformation algorithms should be well-defined functions whose only input is the source document. The use of the XSLT document() function to incorporate other data at transformation time is an error.

Limitations on xsl:import?

Note that an XHTML document may conform to a number of dialects simultaneously and link to more than one decoding algorithm. For example, the fictional Joe Lambda's Homepage demonstrates a mixture of Dublin Core, Creative Commons, RSS, FOAF, and geoURL dialects.
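As a rough illustration of how a client might discover the decoding algorithms for such a document, the following Python sketch collects the href of every transformation link, but only when the head element carries the GRDDL profile. The function name find_transformations and the second stylesheet URI are hypothetical. The sketch assumes that profile and rel are white space separated lists, as in HTML 4, and that rel values may be compared case-insensitively.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

def find_transformations(xhtml_text):
    """Return the transformation URIs linked from a GRDDL-profiled XHTML document."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML}}}head")
    if head is None:
        return []
    # Without the GRDDL profile, a rel="transformation" link carries no
    # agreed meaning, so nothing is gleaned.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    hrefs = []
    for link in head.iter(f"{{{XHTML}}}link"):
        rels = link.get("rel", "").lower().split()
        if "transformation" in rels and link.get("href"):
            hrefs.append(link.get("href"))
    return hrefs

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
    <link rel="transformation"
       href="http://www.example.org/hypothetical-extract.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body/>
</html>"""

print(find_transformations(doc))
```

Relative href values would still need to be resolved against the document's base URI before being fetched; that step is left out here.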

3. The GRDDL attribute in XML

The GRDDL profile mechanism is a special case of GRDDL designed to fit within the syntax of XHTML 1.0. The general form of GRDDL is an attribute suitable for use with a wide variety of XML dialects.

Use of the interpreter attribute in the http://www.w3.org/2003/g/data-view# namespace on the root element of an XML document indicates that RDF statements that result from transformation of the XML document to RDF by designated algorithms are part of the document's meaning.

The value of the interpreter attribute designates a list of algorithms by URI reference. @@@ IRI reference?

For example: update to P3Q example?

<svg xmlns="http://www.w3.org/2000/svg"
    xmlns:data-view="http://www.w3.org/2003/g/data-view#"
    data-view:interpreter="http://www.example.org/2004/01/svg2dc.xsl"
    width="4cm" height="8cm"
    version="1.1" baseProfile="tiny">
  ...
</svg>
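Reading the attribute from an arbitrary XML document might look like the Python sketch below. The function name find_interpreters is hypothetical, and splitting the value on white space is an assumption: the text above says only that the value designates a list of URI references and does not fix a separator. Because the lookup uses the namespace name, the prefix chosen by the author (data-view in the example) does not matter.

```python
import xml.etree.ElementTree as ET

DATA_VIEW = "http://www.w3.org/2003/g/data-view#"

def find_interpreters(xml_text):
    """Return the interpreter URIs named on the root element, if any."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{DATA_VIEW}}}interpreter", "")
    # Assumption: multiple URI references are separated by white space.
    return value.split()

svg = """<svg xmlns="http://www.w3.org/2000/svg"
    xmlns:data-view="http://www.w3.org/2003/g/data-view#"
    data-view:interpreter="http://www.example.org/2004/01/svg2dc.xsl"
    width="4cm" height="8cm"
    version="1.1" baseProfile="tiny">
</svg>"""

print(find_interpreters(svg))
```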

4. XML Namespaces and embedded RDF

The RDF property http://www.w3.org/2003/g/data-view#namespaceTransformation links an XML Namespace to an interpreter that may be applied to any document which has its root element in that namespace, such that the output of the interpreter will be an RDF/XML form of some (or all) of the information content of the document.

For instance, given the XML Namespace http://www.example.net/fooML,

<rdf:Description rdf:about="http://www.example.net/fooML">
 <namespaceTransformation xmlns='http://www.w3.org/2003/g/data-view#'
     rdf:resource='http://www.example.net/fooML2rdf.xsl' />
</rdf:Description>

asserts that if an XML document has its root element in the http://www.example.net/fooML namespace and is run through the XSLT style sheet http://www.example.net/fooML2rdf.xsl, then the result will be valid RDF/XML that can be considered to have been expressed by the document.
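The lookup this implies can be sketched in Python as follows. The helper names namespace_transformations and root_namespace are hypothetical, and the sketch is not a general RDF/XML parser: it assumes the statement is written in exactly the rdf:Description form shown above, wrapped in an rdf:RDF element so that the rdf prefix is declared.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DATA_VIEW = "http://www.w3.org/2003/g/data-view#"

def namespace_transformations(rdf_text):
    """Map each described namespace URI to its list of interpreter URIs."""
    table = {}
    root = ET.fromstring(rdf_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        ns = desc.get(f"{{{RDF}}}about")
        for prop in desc.findall(f"{{{DATA_VIEW}}}namespaceTransformation"):
            target = prop.get(f"{{{RDF}}}resource")
            if ns and target:
                table.setdefault(ns, []).append(target)
    return table

def root_namespace(xml_text):
    """Return the namespace name of the document's root element ('' if none)."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

description = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://www.example.net/fooML">
 <namespaceTransformation xmlns='http://www.w3.org/2003/g/data-view#'
     rdf:resource='http://www.example.net/fooML2rdf.xsl' />
</rdf:Description>
</rdf:RDF>"""

instance = '<foo xmlns="http://www.example.net/fooML"><bar/></foo>'

table = namespace_transformations(description)
print(table.get(root_namespace(instance), []))
```

How the description of the namespace is obtained in the first place (for example, by dereferencing the namespace URI) is outside the scope of the sketch.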

5. Security considerations

RFC 2046, in section 9. Security Considerations says:

Implementors should pay special attention to the security implications of any media types that can cause the remote execution of any actions in the recipient's environment. In such cases, the discussion of the "application/postscript" type may serve as a model for considering other media types with remote execution capabilities.

Given the expressive power of XSLT, and the possibility of accessing external resources from an XSLT style sheet (e.g. through the document function or the xsl:import mechanism), implementors should take the appropriate measures to prevent malicious usage of this mechanism.
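One possible measure, sketched here in Python, is to inspect a style sheet before running it and report the places where it may reach external resources. The function name external_access_points is hypothetical, and the check is a heuristic only: it looks for xsl:import and xsl:include elements and for the text "document(" in attribute values, and an empty result does not prove that a style sheet is safe. Implementations would more likely rely on the resolver or sandboxing controls of their XSLT processor.

```python
import re
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
# Matches a call to document(...) that is not part of a longer name.
DOCUMENT_CALL = re.compile(r"(?<![\w.-])document\s*\(")

def external_access_points(xslt_text):
    """List elements and expressions that may load external resources."""
    findings = []
    root = ET.fromstring(xslt_text)
    for el in root.iter():
        if el.tag in (f"{{{XSL}}}import", f"{{{XSL}}}include"):
            local = el.tag.rsplit("}", 1)[-1]
            findings.append(f"{local} href={el.get('href')!r}")
        for name, value in el.attrib.items():
            if DOCUMENT_CALL.search(value):
                findings.append(f"document() in @{name}: {value!r}")
    return findings

stylesheet = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="http://www.example.org/other.xsl"/>
  <xsl:template match="/">
    <xsl:copy-of select="document('http://www.example.org/data.xml')"/>
  </xsl:template>
</xsl:stylesheet>"""

for finding in external_access_points(stylesheet):
    print(finding)
```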

Change History

The Nov 2003 draft is a predecessor of this spec.

An editor's working draft is also available; v1.11 was announced in a message of 16 Jan.