Copyright © 2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. This GRDDL specification introduces markup for declaring that an XML document includes gleanable data and for linking to an algorithm, typically represented in XSLT, for gleaning the resource descriptions from the document.
The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data and for linking to an algorithm for gleaning the data.
A corresponding GRDDL Use Case Working Draft provides motivating examples. A GRDDL Primer demonstrates the mechanism on XHTML documents which include widely-deployed dialects, more recently known as microformats.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the First Public Working Draft of the GRDDL design by the GRDDL Working Group, which was chartered in July 2006 to review the specification and develop use cases, tutorial materials, and tests. The GRDDL design was released as a Note as early as April 2004; see the change log appendix for details.
GRDDL is intended to contribute to addressing Web Architecture issues such as RDFinXHTML-35 and namespaceDocument-8 as well as issues postponed by the RDF Core working group such as rdfms-validating-embedded-rdf and faq-html-compliance.
There are now multiple implementations, including an online service, and a growing collection of tests. A number of issues remain to be decided by the working group; this draft takes a position on some of them.
A few editorial notes and TODOs in this style remain. In particular, the figures have not yet been updated with respect to changes in the text; we hope they are more helpful than distracting in their present state.
Please send comments about this document to public-grddl-comments@w3.org (with public archive).
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
There are many dialects of languages in practice among the many XML documents on the web. There are dialects of XHTML, XML and RDF that are used to represent everything from poetry to prose, purchase orders to invoices, spreadsheets to databases, schemas to scripts, and linked lists to ontologies. Some offer more formally defined semantics and others more loosely coupled semantics. Recently, two progressive encoding techniques have emerged to overlay additional semantics onto valid XHTML documents: RDFa and microformats offer simple, open data formats built upon existing and widely adopted standards.
While this breadth of expression is quite liberating, inspiring new dialects to codify both common and customized meanings, it can prove to be a barrier to understanding across different domains or fields. How, for example, does software discover the author of a poem, a spreadsheet and an ontology? And how can software determine whether authors of each are in fact the same person?
DocBook V4.X:
<book> <bookinfo> <title>The Stand</title> <author> <firstname>Stephen</firstname> <surname>King</surname> </author> </bookinfo> ... </book>
TEI:
<TEI ... > ... <title>The Stand</title> <author><persName> <forename>Stephen</forename> <surname>King</surname> </persName></author> ... </TEI>
Atom:
<entry ... > <title>The Stand</title> <author> <name>Stephen King</name> </author> ... </entry>
Open Office:
<office:document-meta ... > <office:meta> <dc:title>The Stand</dc:title> <meta:initial-creator> Stephen King </meta:initial-creator> <dc:creator>Stephen King</dc:creator> </office:meta> </office:document-meta>
The Resource Description Framework[RDFC04] provides a standard for making statements about resources in the form of a subject-predicate-object expression. One way to represent the fact "The Stand's author is Stephen King" in RDF would be as a triple whose subject is "The Stand," whose predicate is "has the author," and whose object is "Stephen King." The predicate, "has the author" expresses a relationship between the subject (The Stand) and the object (Stephen King). Using URIs to uniquely identify the book, the author and even the relationship would facilitate software design because not everyone knows Stephen King or even spells his name consistently.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foaf="http://xmlns.com/foaf/0.1/" > <rdf:Description rdf:about="http://www.stephenking.com/pages/works/stand/"> <dc:title>The Stand</dc:title> <dc:creator>Stephen King</dc:creator> <foaf:maker> <foaf:Person> <foaf:isPrimaryTopicOf rdf:resource="http://en.wikipedia.org/wiki/Stephen_King" /> </foaf:Person> </foaf:maker> <dc:format>Book</dc:format> </rdf:Description> </rdf:RDF>
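To make the subject-predicate-object model concrete, the following minimal Python sketch holds two of the statements from the RDF/XML above as plain triples and queries them. The `Triple` type and the `objects` helper are illustrative names only, and the sketch does not distinguish literals from URI references as a real RDF library would.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


DC = "http://purl.org/dc/elements/1.1/"
BOOK = "http://www.stephenking.com/pages/works/stand/"

# A graph is modelled here as a plain set of triples.
graph = {
    Triple(BOOK, DC + "title", "The Stand"),
    Triple(BOOK, DC + "creator", "Stephen King"),
}


def objects(graph, subject, predicate):
    """Return, sorted, every object stated for the given subject and predicate."""
    return sorted(
        t.obj for t in graph
        if t.subject == subject and t.predicate == predicate
    )


print(objects(graph, BOOK, DC + "creator"))
```

Because the book and the relationship are identified by URIs, software can match statements without relying on how a name is spelled.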
GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. That is, GRDDL provides a relatively inexpensive mechanism for bootstrapping RDF content from uniform XML dialects, shifting the burden from formulating RDF to creating transformation algorithms specifically for each dialect. XML transformation languages such as XSLT are quite versatile in their ability to process, manipulate, and generate XML. The use of XSLT to generate XHTML from single-purpose XML vocabularies is historically celebrated as a powerful idiom for separating structured content from presentation.
GRDDL shifts this idiom to a different end: separating structured content from its authoritative meaning (or semantics). GRDDL works by associating transformations for an individual document, either through direct inclusion of references or indirectly through profile and namespace documents. Content authors can nominate the transformations for producing RDF from their content and use GRDDL to refer to them.
By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition of the source document, or some portion of the source document, that preserves its meaning in RDF.
Likewise, by specifying a GRDDL namespace transformation or profile transformation, the creator of that namespace or profile states that the transformation will provide a faithful rendition of a class of source documents which relate to that namespace or profile, or some portion of such a source document, that preserves its meaning in RDF. A namespace document or a profile document also provides a means for its author to explain, in prose, the purpose of the transformation or any policy statements.
The GRDDL Primer[primer] is a step-by-step tutorial on the GRDDL mechanism. It builds on a number of examples from the GRDDL Use Cases document to illustrate GRDDL techniques for associating documents with transformations for extracting RDF.
The use cases document[usecases] collects a number of use cases together with their goals and requirements for GRDDL. These use cases also illustrate how XML and XHTML documents can be decorated with microformat, Embedded RDF or RDFa statements to support GRDDL transformations in charge of extracting valuable data that can then be used to automate a variety of tasks.
This GRDDL specification is a concise technical specification of the GRDDL mechanism and its XML syntax. It specifies the GRDDL syntax to use in valid XHTML and well-formed XML documents, as well as how to encode GRDDL into namespaces and HTML profiles. Discussions of the GRDDL transformation link and security issues are also covered. Appendices provide links to extended examples and existing software and services that employ GRDDL.
The general way to associate a GRDDL transformation link with a well-formed XML document is to adorn the root element with a grddl namespace declaration and a grddl:transformation attribute whose value is a URI reference, or list of URI references, that refer to executable scripts or programs which are expected to transform the source document into RDF. This method is suitable for use with a wide variety of XML dialects that are not constrained from adding attributes by an XML DTD.
Stated more formally:
An XML document whose root element has an attribute with a local name of transformation and a namespace name of http://www.w3.org/2003/g/data-view# has a GRDDL transformation for each resource identified by a URI reference listed in the value of the attribute (cf. section 4.4.1. URI references in [WEBARCH]).
Note that issue-output-formats is open; this draft takes the somewhat liberal position that GRDDL transformations yield RDF graphs, not RDF/XML documents.
In any dialect of XML not constrained by an XML DTD, the following example applies:
<root-element xmlns:grddl="http://www.w3.org/2003/g/data-view#" grddl:transformation="http://example.com/fmt3/txformRDF.xsl"> <etc> ... </etc> </root-element>
In any dialect of XHTML that is not constrained by DTD syntax, the above example can be written:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:data-view="http://www.w3.org/2003/g/data-view#" data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl"> <head> <title>Joe Lambda's Home page [an example of RDF in XHTML]</title> ...
Notice that data-view was used in the namespace declaration in place of grddl in this example to illustrate the fact that the prefix does not have to be grddl and to emphasize the data-centric focus of the RDF/XML view.
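The discovery step for general-purpose XML can be sketched in a few lines of Python. This is a minimal illustration rather than a conforming implementation: the function name `root_transformations` and the optional `base_uri` parameter are assumptions made for the example. Matching on the namespace name rather than the prefix is what makes the grddl and data-view spellings equivalent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def root_transformations(xml_text, base_uri=""):
    """List the transformation URIs named on the root element.

    The attribute is looked up by namespace name and local name, so any
    prefix works. The value is a whitespace-separated list of URI
    references; each is resolved against base_uri (an assumed parameter).
    """
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return [urljoin(base_uri, ref) for ref in value.split()]


with_grddl_prefix = '''<root-element
    xmlns:grddl="http://www.w3.org/2003/g/data-view#"
    grddl:transformation="http://example.com/fmt3/txformRDF.xsl">
  <etc/>
</root-element>'''

with_other_prefix = '''<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:data-view="http://www.w3.org/2003/g/data-view#"
    data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
                              http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl">
  <head><title>Joe Lambda's Home page</title></head>
</html>'''

print(root_transformations(with_grddl_prefix))
print(root_transformations(with_other_prefix))
```

A document without the attribute simply yields an empty list, which a caller can treat as "no transformation declared on the root element".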
As you will see in later sections, there are other ways to add GRDDL to HTML documents, especially designed to leverage HTML's existing capabilities and thereby overcome constraints imposed by the XML DTDs for some dialects of HTML. See Using GRDDL with valid XHTML and GRDDL for HTML Profiles.
Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace. Any resource available for retrieval from a namespace URI is a namespace document (cf. section 4.5.4. Namespace documents in [WEBARCH]). For example, a namespace document may have an XML Schema representation or an RDF Schema representation, or perhaps both, using content negotiation.
To associate a GRDDL transformation with a whole dialect, have the namespace document include the grddl:namespaceTransformation property. The precise methods for allowing various types of namespace documents to include this property are detailed below, first formally and then by example.
Note that issue issue-mt-ns is open. Perhaps a special case is needed for the RDF/XML namespace: RDF/XML documents are associated with RDF graphs as per the RDF/XML specification.
For example, consider this privacy policy written in P3Q, a contrived analog to P3P[P3P]:
<POLICIES xmlns="http://www.w3.org/2004/01/rdxh/p3q-ns-example"> <EXPIRY max-age="604800"/> ...
The namespace document for P3Q relates the grokP3Q.xsl transformation to all P3Q documents:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dataview="http://www.w3.org/2003/g/data-view#"> <rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example"> <dataview:namespaceTransformation rdf:resource="http://www.w3.org/2004/01/rdxh/grokP3Q.xsl"/> </rdf:Description> </rdf:RDF>
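The lookup from a source document to its namespace transformation can be sketched as follows. This is a simplified illustration under stated assumptions: it reads the namespace document at the XML level and only recognizes the rdf:Description form shown above, whereas a real implementation would parse the namespace document with an RDF parser. The function names and the well-formed stand-in for the elided P3Q policy are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if unqualified."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}")[0] if tag.startswith("{") else None


def namespace_transformations(ns_doc_text, namespace_uri):
    """Find namespaceTransformation resources stated about namespace_uri.

    Only rdf:Description elements whose rdf:about resolves to the
    namespace URI are considered (a simplification of RDF/XML).
    """
    found = []
    for desc in ET.fromstring(ns_doc_text).iter("{%s}Description" % RDF_NS):
        about = desc.get("{%s}about" % RDF_NS)
        if about is None or urljoin(namespace_uri, about) != namespace_uri:
            continue
        for prop in desc.findall("{%s}namespaceTransformation" % GRDDL_NS):
            resource = prop.get("{%s}resource" % RDF_NS)
            if resource:
                found.append(urljoin(namespace_uri, resource))
    return found


# A shortened, well-formed stand-in for the P3Q policy shown above.
source = '''<POLICIES xmlns="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
  <EXPIRY max-age="604800"/>
</POLICIES>'''

ns_doc = '''<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dataview="http://www.w3.org/2003/g/data-view#">
  <rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
    <dataview:namespaceTransformation
        rdf:resource="http://www.w3.org/2004/01/rdxh/grokP3Q.xsl"/>
  </rdf:Description>
</rdf:RDF>'''

ns = root_namespace(source)
print(ns, namespace_transformations(ns_doc, ns))
```

In practice the namespace document would be retrieved by dereferencing the namespace URI; here it is supplied as a string so the sketch stays self-contained.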
The Working Group is likely to add a section to the GRDDL primer much like this subsection. Since this subsection has no novel normative material, we're interested in feedback on whether it should remain part of this specification once it is covered by the primer.
A namespace transformation link may be discoverable by transforming the namespace document itself. Note that this means that namespace documents need not be written in RDF/XML directly.
Consider a purchase order that has a namespace document represented in XML Schema, where the XML Schema bears a data-view:transformation attribute licensing extraction of statements that include namespaceTransformation statements:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http:.../Order-1.0" targetNamespace="http:.../Order-1.0" version="1.0" ... xmlns:data-view="http://www.w3.org/2003/g/data-view#" data-view:transformation="http://www.w3.org/2003/g/embeddedRDF.xsl" > <xsd:element name="Order" type="OrderType"> <xsd:annotation> <xsd:documentation>This element is the root element.</xsd:documentation> </xsd:annotation> ... <xsd:annotation> <xsd:appinfo> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="http://www.w3.org/2003/g/po-ex"> <data-view:namespaceTransformation rdf:resource="grokPO.xsl" /> </rdf:Description> </rdf:RDF> </xsd:appinfo> </xsd:annotation>
Every purchase order using that schema as a namespace document is linked to the grokPO.xsl transformation, as illustrated below:
Figure: Using GRDDL with an XML Schema.
@@ The figure uses "result" for the result of an XSLT transformation, which clashes with the GRDDL result.
The Working Group is likely to add a section to the GRDDL primer much like this subsection. Since this subsection has no novel normative material, we're interested in feedback on whether it should remain part of this specification once it is covered by the primer.
To accommodate the DTD-based syntax of XHTML[XHTML], which precludes using attributes from foreign namespaces, we use http://www.w3.org/2003/g/data-view as a metadata profile (cf. section 7.4.4.3 Meta data profiles of [HTML4]).
The general way to add a GRDDL assertion to a valid XHTML document is to specify the GRDDL profile in the profile attribute of the head element, and to give transformation as the value of the rel attribute of a link element or an a element whose href attribute value is a URI reference that refers to an executable script or program which is expected to transform the source document into RDF. This method is suitable for use with valid XHTML documents which are constrained by an XML DTD.
Stated more formally:
@@TODO: be more clear about what "whose metadata profiles" means
For example, this document follows the conventions of [RFC2731], and it explicitly uses the GRDDL profile and links to an XSLT transformation that extracts the metadata in RDF/XML in a way that preserves the meaning of the document:
<html xmlns="http://www.w3.org/1999/xhtml"> <head profile="http://www.w3.org/2003/g/data-view"> <title>Some Document</title> <link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" /> <meta name="DC.Subject" content="ADAM; Simple Search; Index+; prototype" /> ... </head> ... </html>
In the figure below, the arrow labelled info relates a document to an abstract notion of the information contained in the document. It shows that the RDF data extracted via the dc-extract.xsl transformation is part of the information contained in the document:
Figure: Decoding HTML meta-data to RDF.
This is what the data looks like in RDF/XML:
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about=""> <dc:subject>ADAM; Simple Search; Index+; prototype</dc:subject> </rdf:Description> </rdf:RDF>
An XHTML document may conform to a number of dialects simultaneously and link to more than one decoding algorithm. However, since the href attribute of the link and a elements accepts only a single URI reference, multiple instances of these elements must be used to assert multiple links:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head profile="http://www.w3.org/2003/g/data-view"> <title>Joe Lambda's Home page [an example of RDF in XHTML]</title> <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" /> <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl" /> <link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl" /> ...
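The profile-based form can be sketched in Python as a two-part check: the head element must name the GRDDL profile, and only then are link and a elements with rel="transformation" collected. The function name is hypothetical, and the sample omits the DOCTYPE and any XHTML entity references so that it parses with the standard library alone.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def xhtml_transformations(xhtml_text):
    """Collect href values of transformation links in a valid XHTML document.

    Links are honoured only when the head element lists the GRDDL profile.
    Both profile and rel are treated as whitespace-separated token lists.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    wanted = ("{%s}link" % XHTML_NS, "{%s}a" % XHTML_NS)
    hrefs = []
    for el in root.iter():
        if el.tag in wanted and "transformation" in el.get("rel", "").split():
            href = el.get("href")
            if href:
                hrefs.append(href)
    return hrefs


page = '''<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
    <link rel="transformation"
          href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" />
    <link rel="transformation"
          href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl" />
    <link rel="transformation"
          href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl" />
  </head>
  <body><p>...</p></body>
</html>'''

for href in xhtml_transformations(page):
    print(href)
```

Requiring the profile first keeps an ordinary page that happens to use rel="transformation" for some other purpose from being treated as a GRDDL source.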
Figure: Multiple transformations.
XHTML provides the profile mechanism to link to the meaning of properties and the set of legal values for those properties. As with namespace documents, a profile document can effectively be written using XHTML with embedded RDF statements and a GRDDL transformation to extract the definition of terms that are applicable. Those terms can then be used in an XHTML document to convey profile-dependent meaning.
As discussed in Using GRDDL with valid XHTML, the GRDDL profile can be used with XHTML documents to apply GRDDL semantics over link elements where the value of the rel attribute is transformation. This very powerful and flexible mechanism integrates well with microformat profiles[MF-RDF-FAQ] which overlay the normally semantically-poor HTML markup.
Adding a GRDDL profileTransformation assertion to a profile document is much like adding a namespaceTransformation assertion to a namespace document. For a dialect defined by a valid XHTML profile document, add profile="http://www.w3.org/2003/g/data-view" to the head element and make a link of type profileTransformation to the transformation of the dialect.
A more formal description of the relation between GRDDL and XHTML profiles follows:
In the following example, written in XHTML, the a element is a link by HTML conventions and a profile transformation assertion by GRDDL convention:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head profile="http://www.w3.org/2003/g/data-view"> <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" /> ... <p>This is a profile transformation link: <a rel="profileTransformation" href="http://example.org/BIZ/calendar/extract-rdf.xsl">extract-rdf.xsl</a>
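The indirection through a profile document can be sketched as a two-step lookup in Python. This is a shortcut made for illustration: it reads rel="profileTransformation" links directly from the profile document instead of applying the profile document's own GRDDL transformation to it. The `fetch` callable, the in-memory `PROFILES` table, and the profile URI http://example.org/BIZ/calendar/profile are all hypothetical stand-ins for dereferencing real profile URIs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"


def profile_uris(source_xhtml):
    """Return the profile URIs listed on the head element of the source."""
    head = ET.fromstring(source_xhtml).find("{%s}head" % XHTML_NS)
    return head.get("profile", "").split() if head is not None else []


def profile_transformations(source_xhtml, fetch):
    """Follow each profile URI and collect its profileTransformation links.

    fetch maps a URI to document text, or None when nothing is available.
    """
    found = []
    for uri in profile_uris(source_xhtml):
        profile_doc = fetch(uri)
        if profile_doc is None:
            continue
        for el in ET.fromstring(profile_doc).iter():
            if "profileTransformation" in el.get("rel", "").split():
                href = el.get("href")
                if href:
                    found.append(urljoin(uri, href))
    return found


# Hypothetical profile document, held in memory instead of being fetched.
PROFILES = {
    "http://example.org/BIZ/calendar/profile": '''
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view"><title>Profile</title></head>
  <body><p>This is a profile transformation link:
    <a rel="profileTransformation"
       href="http://example.org/BIZ/calendar/extract-rdf.xsl">extract-rdf.xsl</a>
  </p></body>
</html>''',
}

source = '''<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://example.org/BIZ/calendar/profile">
    <title>A calendar page</title>
  </head>
  <body><p>...</p></body>
</html>'''

print(profile_transformations(source, PROFILES.get))
```

Passing the fetch step in as a parameter keeps network policy, caching, and error handling outside the discovery logic.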
Transformations should have available representations in widely-supported formats. We expect most consumers to support XSLT version 1[XSLT1] for the foreseeable future, though XSLT2[XSLT2] deployment is increasing. While JavaScript, C, or any other programming language technically expresses the relevant information, XSLT is specifically designed to express XML to XML transformations and has some good safety characteristics.
XProc: An XML Pipeline Language[XPROC], a language for describing operations to be performed on XML documents, has recently been published as a W3C Working Draft. It merits consideration for expressing more complex or sophisticated transformations which require control over the flow of processing through a variety of XML processing tools. Using XProc, one could apply a sequence of operations such as XInclude, validation, and transformation to a document, aborting if the result of an intermediate stage is not valid, for example.
RFC 2046, in section 9. Security Considerations says:
Implementors should pay special attention to the security implications of any media types that can cause the remote execution of any actions in the recipient's environment. In such cases, the discussion of the "application/postscript" type may serve as a model for considering other media types with remote execution capabilities.
Given the expressive power of XSLT, and the possibility to access external resources from an XSLT style sheet (e.g. through the document function or the xsl:import mechanism), implementors should take the appropriate measures to prevent malicious usage of this mechanism.
Note that evaluating a transformation may involve finding representations for not only the resource identified as the transformation, but also any resources referred to by way of mechanisms such as xsl:include. Likewise, it may involve finding representations for not only the source document but also any resources referred to using mechanisms such as the XSLT document() function.
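One possible precaution, sketched below in Python, is to check every URI against an allow-list before dereferencing it. This specification does not prescribe such a policy; the function name, the default schemes, and the choice of hosts are assumptions made for the example. To be effective, the same check would have to be applied to resources requested during the transformation itself (for instance via xsl:import, xsl:include, or document()), typically through whatever resolver hook the XSLT processor offers.

```python
from urllib.parse import urlsplit


def may_fetch(uri, allowed_hosts, allowed_schemes=("http", "https")):
    """Decide whether a URI may be dereferenced under a simple allow-list.

    Rejects schemes such as file: that could expose local resources, and
    hosts that the operator has not explicitly approved.
    """
    parts = urlsplit(uri)
    return parts.scheme in allowed_schemes and parts.hostname in allowed_hosts


trusted = {"www.w3.org"}  # hypothetical operator policy

for candidate in (
    "http://www.w3.org/2003/g/embeddedRDF.xsl",
    "file:///etc/passwd",
    "http://example.com/fmt3/txformRDF.xsl",
):
    print(candidate, may_fetch(candidate, trusted))
```

An allow-list is deliberately conservative: a transformation hosted elsewhere is refused until an operator adds its host, which trades convenience for a smaller attack surface.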
The following extract from the GRDDL profile document is written using XHTML and embedded RDF statements and includes all of the markup required to define an XHTML profile. (View the source to see the embedded markup)
This document, http://www.w3.org/2003/g/data-view, is a metadata profile in the sense of the HTML specification, in section 7.4.4.3 Meta data profiles.
We define the following terms as XHTML link relationships and RDF properties:
- rel
HTML4 definition of the 'rel' attribute.
- transformation
- relates a Document to an Algorithm, usually represented in XSLT, for extracting an RDF/XML representation of (some of) the document's meaning. See GRDDL specification for full details.
- namespaceTransformation
- relates a Document, e.g. a namespace document, to an Algorithm, usually encoded in XSLT, for extracting an RDF/XML representation of (some of) the meaning of any document whose root element's namespace name refers to the subject document.
- profileTransformation
- relates a Document to an Algorithm, usually encoded in XSLT, for extracting an RDF/XML representation of (some of) the meaning of any XHTML document with a profile that refers to the subject document.
This document uses Embedded RDF to encode Description of a Project (DOAP) data as well as RDF Schema data and one or two RDDL properties. We have moved away from the RDDL syntax itself.
Parts of the following specifications are included in this one by reference:
@@TODO: cite FOAF normatively or remove the dependency on foaf:Document from the GRDDL namespace document.
The following documents provide additional background but are not part of this specification.
The xml-stylesheet processing instruction[STYPI] is generally deployed for automated presentation processing. This type of link is different from links to GRDDL transformation algorithms, which are intended to facilitate extracting data. Also, parsing the content of processing instructions is not supported by XML tools such as XSLT processors, and grounding processing instructions in URI space is not as straightforward as using namespaces with attributes.
The authors provide a pair of online services on an experimental, best-effort basis:
Client-side implementations are also in development:
The editor acknowledges the following issues and expects the Working Group to make decisions about them:
discussed in the March 2006 SemWeb IG meeting; see irc notes
See also GRDDL extraction *to* RDFa Ben Adida (Friday, 8 September) and following, and comments on Sequential Transformations 20 Oct
The following issues have been resolved by the Working Group:
A collection of test cases is in development. The original announcement was 02 Feb 2005. As of April 2005, they include:
Tests pending:
An example homepage with Dublin Core, GeoURL, RSS, Creative Commons, etc. demonstrates several transformations and dialects.
The GRDDL Working Group convened August 2006 with Harry Halpin as chair and several of the contributors and implementors above participating, plus Chimezie Ogbuji, Fabien Gandon, Brian Suda, and Rachel Yager.