W3C

GRDDL Primer

W3C Working Draft 2 October 2006

This version:
http://www.w3.org/TR/2006/WD-grddl-primer-20061002/
Latest version:
http://www.w3.org/TR/grddl-primer/
Editor:
Ian Davis, Talis
Authors:
see Acknowledgements

Abstract

GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages. Authors may explicitly associate documents with transformation algorithms, typically represented in XSLT, using a link element in the head of the document. Alternatively the information needed to obtain the transformation may be held in an associated metadata profile document or namespace document. Clients reading the document can follow their nose using techniques described in the GRDDL specification to discover the appropriate transformations. This document uses a number of examples from the GRDDL Use Cases document to illustrate in detail the techniques GRDDL provides for associating documents with appropriate instructions for extracting any embedded data.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a First Public Working Draft of the GRDDL Primer. The GRDDL design was first released as a W3C technical report in April 2004. This document was developed by the GRDDL Working Group, which was chartered in July 2006 to review the specification and develop use cases, tutorial materials, and tests. The first few examples in this draft have been worked out in detail, though the examples later in the document are still under discussion. The Working Group expects to advance GRDDL to Recommendation Status, though this primer may end up as a separate Working Group Note.

GRDDL is intended to contribute to addressing Web Architecture issues such as RDFinXHTML-35 and namespaceDocument-8 as well as issues postponed by the RDF Core working group such as rdfms-validating-embedded-rdf and faq-html-compliance.

Please send comments about this document to public-grddl-comments@w3.org (with public archive). A log of changes is maintained for the convenience of editors and reviewers.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


Introduction

GRDDL provides a relatively inexpensive set of mechanisms for bootstrapping RDF content from uniform XML dialects, shifting the burden of formulating RDF to transformation algorithms written specifically for those dialects. XML transformation languages such as XSLT are versatile in their ability to process, manipulate, and generate XML. The use of XSLT to generate XHTML from single-purpose XML vocabularies is a long-established idiom for separating structured content from presentation.

GRDDL shifts this idiom to a different end: separating structured content from its authoritative meaning (or semantics). The way in which GRDDL empowers authors of web content can be considered somewhat analogous to allowing a non-native speaker to learn the spoken form of a new language first, before attempting to master its written form - rather than trying to learn both simultaneously.

GRDDL works through associating transformations with an individual document either through direct inclusion of references or indirectly through profile documents. Content authors can nominate the transformations for producing RDF from their content and use GRDDL to refer to them. For XML formats the transformations are commonly expressed using XSLT 1.0, although other methods are permissible. Generally, if the transformation can be fully expressed in XSLT 1.0 then it is preferable to use that format since all GRDDL processors should be capable of interpreting an XSLT 1.0 document.

This document may be read in conjunction with the GRDDL Use Cases document, which describes a series of common scenarios for which GRDDL may be suitable. Readers desiring complete technical detail on the GRDDL mechanism should refer to the GRDDL Working Draft.

In this document the term HTML is used to refer to the XHTML dialect of HTML.

Scheduling Example

To introduce GRDDL concepts, the following section explores how GRDDL can be used to satisfy the scheduling use case. In this use case Jane, a frequent traveller, is trying to schedule a meeting with three of her friends.

Linking to a GRDDL Transform

GRDDL provides a number of ways for GRDDL Transformations to be associated with content, each of which is appropriate in different situations. The simplest method for authors of HTML content is to embed a reference to the transformations using a link element in the head of the document.

Microformats are simple conventions for embedding semantic markup for a specific domain in human-readable documents. In our example one of Jane's friends, Robin, has marked up her schedule using the hCalendar microformat. The hCalendar microformat uses HTML class attributes to associate event-related semantics with elements in the markup:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Robin's Schedule</title>
  </head>
  <body>
    <ol class="schedule">
      <li>2006
        <ol>
          <li class="vevent">
            <strong class="summary">Fashion Expo</strong> in 
            <span class="location">Paris, France</span>:
            <abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to 
            <abbr class="dtend" title="2006-10-22">22</abbr>
           </li>
        
          <li class="vevent">
            <strong class="summary">New line review</strong> in 
            <span class="location">Köln, Germany</span>:
            <abbr class="dtstart" title="2006-10-26">Oct 26</abbr> to 
            <abbr class="dtend" title="2006-10-27">27</abbr>
           </li>
    
          <li class="vevent">
            <strong class="summary">Clothing 2006</strong> in 
            <span class="location">Rome, Italy</span>:
            <abbr class="dtstart" title="2006-12-01">Dec 1</abbr> to 
            <abbr class="dtend" title="2006-12-05">5</abbr>
          </li>
        </ol>
      </li>
      <li>2007
        <ol>
          <li class="vevent">
            <strong class="summary">Diva Awards</strong> in 
            <span class="location">Los Angeles, USA</span>:
            <abbr class="dtstart" title="2007-01-06">Jan 6</abbr> to 
            <abbr class="dtend" title="2007-01-08">8</abbr>
           </li>
        
          <li class="vevent">
            <strong class="summary">Board Review</strong> in 
            <span class="location">New York, USA</span>:
            <abbr class="dtstart" title="2007-02-23">Feb 23</abbr> to 
            <abbr class="dtend" title="2007-02-24">24</abbr>
           </li>
   
        </ol>
      </li>
    </ol>
  </body>
</html>
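
To illustrate how the class attributes carry the event data, the following is a minimal Python sketch that collects the fields of each vevent from a reduced copy of Robin's markup. It is not the transformation GRDDL would apply; the function names (has_class, extract_events) and the choice of fields are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Fields read from each event; an illustrative subset of hCalendar.
FIELDS = ("summary", "location", "dtstart", "dtend")


def has_class(element, token):
    """True if the space-separated class attribute contains the token."""
    return token in element.get("class", "").split()


def extract_events(xhtml_text):
    """Return one dict per element carrying the 'vevent' class."""
    events = []
    for element in ET.fromstring(xhtml_text).iter():
        if not has_class(element, "vevent"):
            continue
        event = {}
        for descendant in element.iter():
            for field in FIELDS:
                if has_class(descendant, field) and field not in event:
                    # abbr elements keep the machine-readable value in title;
                    # otherwise fall back to the element's text content.
                    value = descendant.get("title") or "".join(descendant.itertext())
                    event[field] = value.strip()
        events.append(event)
    return events


SAMPLE = """<ol xmlns="http://www.w3.org/1999/xhtml">
  <li class="vevent">
    <strong class="summary">Fashion Expo</strong> in
    <span class="location">Paris, France</span>:
    <abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to
    <abbr class="dtend" title="2006-10-22">22</abbr>
  </li>
</ol>"""

for event in extract_events(SAMPLE):
    print(event)
```

The sketch shows that the event data is already present in the page; what is missing is an explicit statement of how to map it to RDF.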
  

To explicitly relate the data in this document to the RDF data model, the author, Robin, needs to make two changes. First she needs to add a profile attribute to the head element to denote that her document contains GRDDL metadata. In HTML, profiles are used to link documents to descriptions of the metadata schemes they employ. The profile URI for GRDDL is http://www.w3.org/2003/g/data-view and by including this URI in her document Robin is declaring that the metadata in her markup can be interpreted using GRDDL.

The resulting HTML might look like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Robin's Schedule</title>
  </head>
  <body>
  ...

Then she needs to add a link element containing the reference to the specific instructions for converting HTML containing hCalendar patterns into RDF. She can either write her own instructions or re-use an existing set. The link element contains the token transformation in the rel attribute and the URI of the instructions for extracting RDF in the href attribute:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Robin's Schedule</title>
    <link rel="transformation" href="http://www.w3.org/2002/12/cal/glean-hcal"/>
  </head>
  <body>
  ...

The profile URI in the resulting document signals that the receiver of the document may look for link elements with a rel attribute containing the token transformation and use any or all of those links to determine how to extract the data as RDF.
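
The discovery step a receiver performs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a GRDDL processor: the function name find_transformations is hypothetical, only link elements in the head are inspected, relative href values are not resolved, and the document is assumed to parse as well-formed XML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def find_transformations(xhtml_text):
    """Return the transformation URIs a GRDDL-aware client may apply."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    # The profile attribute holds a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    uris = []
    for link in head.findall(f"{{{XHTML_NS}}}link"):
        # rel is a space-separated list of tokens as well.
        rel_tokens = link.get("rel", "").split()
        if "transformation" in rel_tokens and link.get("href"):
            uris.append(link.get("href"))
    return uris


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Robin's Schedule</title>
    <link rel="transformation" href="http://www.w3.org/2002/12/cal/glean-hcal"/>
  </head>
  <body/>
</html>"""

print(find_transformations(SAMPLE))
```

Without the GRDDL profile URI on the head element the sketch returns an empty list, which mirrors the role of the profile as the signal that transformation links may be followed.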

A diagram indicating the sequence of steps described for obtaining RDF from a document using an explicit link to the transformation as described in the preceding paragraph

Referencing Via Profile Documents

Another way to associate GRDDL instructions with a document is to reference the transformations from a profile document that is itself referenced in the head of the HTML. This method can be more convenient for the content author but requires that the profile document contain GRDDL metadata and be accessible to the GRDDL client.

In our example another of Jane's friends, David, has chosen to mark up his schedule using Embedded RDF:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://purl.org/NET/erdf/profile">
    <title>Where Am I</title>
    <link rel="schema.cal" href="http://www.w3.org/2002/12/cal#" />
  </head>
  <body>
    <p class="-cal-Vevent" id="tiddlywinks">
      From <span class="cal-dtstart" title="2006-10-07">7 October, 2006</span>
      to <span class="cal-dtend"  title="2006-10-13">12 October, 2006</span> 
      I will be attending the <span class="cal-summary">National Tiddlywinks
      Championship</span> in 
      <span class="cal-location">Bognor Regis, England</span>
    </p>
   
    <p class="-cal-Vevent" id="holiday">
      Then I'm <span class="cal-summary">on holiday</span> in the 
      <span class="cal-location">Cayman Islands</span> between
      <span class="cal-dtstart" title="2006-11-14">14 November, 2006</span>
      and <span class="cal-dtend"  title="2007-01-02">1 January, 2007</span> 
    </p>

    <p class="-cal-Vevent" id="award">
      I'm back in the US on <span class="cal-dtstart" title="2007-01-08">the 8th
      January</span> to <span class="cal-summary">pick up a lifetime
      achievement award from the world gamers association</span>. This time
      the ceremony is in <span class="cal-location">Los Angeles</span>. I'll be
      flying home on the <span class="cal-dtend"  title="2007-01-11">10th</span> 
    </p>
  </body>
</html>
  

Note that in this document the profile attribute does not contain a reference to the GRDDL profile. Instead it references the standard profile URI for Embedded RDF which does contain the GRDDL metadata. Anyone wishing to get the RDF data out of David's page can fetch the Embedded RDF profile URI to obtain the following profile document:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Embedded RDF HTML Profile</title>
    <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
  </head>
  <body>
    <p>
      <a rel="profileTransformation" 
          href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a>
    </p>
  </body>
</html>
  

This document contains a reference to the GRDDL profile which again indicates that it may contain link elements with references to GRDDL instructions that can be applied. Note that these instructions are applied to this profile document, not David's document. Because the client is inspecting a profile document it expects that the instructions identified by http://www.w3.org/2003/g/glean-profile are for producing a list of URIs identifying instructions to be applied to David's HTML document. Those instructions are identified in the profile document using links with a rel attribute of profileTransformation.

In this case the profile transformation refers to a stylesheet that can convert HTML containing Embedded RDF into RDF/XML. This stylesheet can be applied to David's document to obtain the equivalent RDF triples.
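
The indirection through a profile document can be sketched in Python as follows. The sketch rests on several assumptions: the function names are hypothetical, fetching is replaced by a lookup in a local dictionary, and the profileTransformation links are read directly from the profile markup rather than by applying the glean-profile instructions as a real client would.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def head_profiles(xhtml_text):
    """Return the profile URIs listed on the head element, if any."""
    head = ET.fromstring(xhtml_text).find(f"{{{XHTML_NS}}}head")
    return head.get("profile", "").split() if head is not None else []


def profile_transformations(profile_text):
    """Return href values of a/link elements marked profileTransformation."""
    wanted = {f"{{{XHTML_NS}}}a", f"{{{XHTML_NS}}}link"}
    uris = []
    for element in ET.fromstring(profile_text).iter():
        rel_tokens = element.get("rel", "").split()
        if element.tag in wanted and "profileTransformation" in rel_tokens:
            if element.get("href"):
                uris.append(element.get("href"))
    return uris


def transformations_for(source_text, fetch):
    """Follow each profile URI and collect the transformations it names."""
    found = []
    for profile_uri in head_profiles(source_text):
        found.extend(profile_transformations(fetch(profile_uri)))
    return found


SOURCE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://purl.org/NET/erdf/profile"><title>Where Am I</title></head>
  <body/>
</html>"""

PROFILE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Embedded RDF HTML Profile</title>
    <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
  </head>
  <body>
    <p><a rel="profileTransformation"
          href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a></p>
  </body>
</html>"""

# Stand-in for network access: a dictionary keyed by profile URI.
PAGES = {"http://purl.org/NET/erdf/profile": PROFILE}

print(transformations_for(SOURCE, PAGES.__getitem__))
```

Passing the fetch function as a parameter keeps the sketch free of network access; a client would substitute an HTTP retrieval there.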

A diagram indicating the sequence of steps for obtaining RDF from a document using the profile URI as described in the preceding paragraph

Buying a Guitar Example

This section is not worked out in as much detail as the sections above. In particular, the relationship between XFN and FOAF is still under study. Stay tuned for future drafts, or better yet, send us suggested improvements.

In this section the guitar review use case is used to explain more fully the role of GRDDL in aggregating data from a variety of different sources.

Stephan is an avid guitar player. He wishes to buy a new guitar, so he decides to check reviews. There are various special interest publications online which feature musical instrument reviews, and there could be blogs which contain reviews by individuals. Among the reviewers there may be friends of Stephan and people whose opinion Stephan values (e.g. well-known musicians and people whose reviews Stephan has found useful in the past). There may also be reviews planted by instrument manufacturers which offer very biased views.

First, Stephan needs to get a list of people he considers trusted sources into some sort of machine-readable document. One choice would be FOAF (Friend of a Friend), a popular RDF vocabulary for describing social networks of friends and personal data. Other choices include vCard/RDF. The question is how to obtain this data. Microformats define simple formats that can easily be converted from HTML to RDF through the use of GRDDL. To extract vCard/RDF from HTML, Stephan applies an XSLT stylesheet to the hCard-encoded HTML document.

<address class="vcard" id="smith-stephan">
<a href="http://example.org/ssmith" class="fn url">Stephan Smith</a>
</address>
  

Applying the XSLT to this snippet of HTML yields the following RDF:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF 
  xmlns:rdf  ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#">
    
 <rdf:Description rdf:about="http://example.org/ssmith">
  <vCard:FN>Stephan Smith</vCard:FN>
  <vCard:URL>http://example.org/ssmith</vCard:URL>
 </rdf:Description>
</rdf:RDF>
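
The mapping from the hCard snippet to these two statements can be sketched in Python. This is not the XSLT stylesheet itself; the function name hcard_to_triples is hypothetical, and the sketch assumes the simplest case shown here, where a single element carries both the fn and url classes.

```python
import xml.etree.ElementTree as ET

VCARD_NS = "http://www.w3.org/2001/vcard-rdf/3.0#"


def hcard_to_triples(fragment):
    """Return (subject, predicate, object) tuples for each vcard element."""
    triples = []
    for card in ET.fromstring(fragment).iter():
        if "vcard" not in card.get("class", "").split():
            continue
        for element in card.iter():
            classes = element.get("class", "").split()
            href = element.get("href")
            if "url" not in classes or not href:
                continue
            # The linked URL doubles as the subject, as in the RDF above.
            if "fn" in classes:
                name = "".join(element.itertext()).strip()
                triples.append((href, VCARD_NS + "FN", name))
            triples.append((href, VCARD_NS + "URL", href))
    return triples


SAMPLE = """<address class="vcard" id="smith-stephan">
<a href="http://example.org/ssmith" class="fn url">Stephan Smith</a>
</address>"""

for triple in hcard_to_triples(SAMPLE):
    print(triple)
```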
  

Another microformat that allows for more information to be gleaned from the document is XFN. XFN is the XHTML Friends Network. XFN outlines relationships between individuals using a controlled set of values in the rel attributes of links. Examples of such relationships are friends, colleagues, co-workers, etc.

<ul>
  <li><a href="http://peter.example.org/" rel="met friend colleague">Peter Smith</a></li>
  <li><a href="http://john.example.org/" rel="met">John Doe</a></li>
  <li><a href="http://paul.example.org/" rel="met">Paul Revere</a></li>
</ul>
  

Since XFN relationships are embedded in anchor (a) elements, they can be expressed in RDF in a variety of ways. Given a document marked up with XFN, a GRDDL transformation can extract RDF data about Stephan's friends. These descriptions would allow an RDF spider (a scutter) to follow links to additional RDF content that may include vCard and FOAF descriptions.
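
One of those ways can be sketched in Python: each recognised rel token becomes a statement linking the page to the target of the link. The choice of the page URI as subject, the function name xfn_triples, and the subset of XFN values recognised are all assumptions made for illustration; an actual transformation may model people and their homepages separately.

```python
import xml.etree.ElementTree as ET

XFN_NS = "http://gmpg.org/xfn/11#"
# Illustrative subset of XFN relationship values.
XFN_VALUES = {"friend", "acquaintance", "contact", "met", "co-worker", "colleague"}


def xfn_triples(fragment, page_uri):
    """Return one (subject, predicate, object) tuple per XFN rel token."""
    triples = []
    for anchor in ET.fromstring(fragment).iter("a"):
        href = anchor.get("href")
        if not href:
            continue
        for token in anchor.get("rel", "").split():
            if token in XFN_VALUES:
                triples.append((page_uri, XFN_NS + token, href))
    return triples


PAGE = "http://stephans-homepage.org/blogroll/xfn/"
SAMPLE = """<ul>
  <li><a href="http://peter.example.org/" rel="met friend colleague">Peter Smith</a></li>
  <li><a href="http://john.example.org/" rel="met">John Doe</a></li>
  <li><a href="http://paul.example.org/" rel="met">Paul Revere</a></li>
</ul>"""

for triple in xfn_triples(SAMPLE, PAGE):
    print(triple)
```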

On the Guitar site, there are product reviews for each guitar. The guitars are also marked up with microformats, so using GRDDL we can extract machine-readable data about each item. Along with manufacturer data, each member of the site can also leave feedback about the item in the form of a review, using a microformat like hReview that we can also convert to RDF.

With all of these tools we can find Stephan's friends and find the guitar reviews that those friends created. Using GRDDL we can glean information about the guitar in the form of product specifications supplied by the manufacturer and reviews from site members. Once we have this data as RDF, queries can be run on it using SPARQL. SPARQL (the SPARQL Protocol and RDF Query Language) is a query language for RDF.

If Stephan were looking for a guitar with a specific review rating or higher from his group of friends, we now have enough data in RDF to do just that:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev: <http://www.purl.org/stuff/rev#>

SELECT DISTINCT ?name ?rating

FROM <http://example.org/guitar/1234/>

WHERE {
  ?x rev:reviewer ?reviewer ;
     rev:rating ?rating . 
  FILTER (?rating > "2") .
  ?reviewer foaf:name ?name .
}
  

The first restriction on the data can be a check on review data to make sure it includes reviewers and ratings. Once we have all the matching reviews, we can then restrict the data so that the reviews are all those by Stephan's friends. From the XFN links in Stephan's page which identify people Stephan trusts, we can match URIs to other locations where they have been asserted (the guitar review page for instance).

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX xfn: <http://gmpg.org/xfn/11#>

SELECT DISTINCT ?name ?rating ?xfnhomepage ?foafhomepage

FROM <http://example.org/guitar/1234/>
FROM <http://stephans-homepage.org/blogroll/xfn/>

WHERE {
  ?x rev:reviewer ?reviewer ;
     rev:rating ?rating .
  FILTER (?rating > "2") . 

  ?reviewer foaf:name ?name ;
            foaf:homepage ?foafhomepage .

  ?y xfn:friend ?xperson .
  ?xperson foaf:homepage ?xfnhomepage . 
  FILTER (?xfnhomepage = ?foafhomepage) 
}
  

SPARQL results can be obtained as XML or JSON and can easily be consumed by another application. That application might display the results on screen, email them to Stephan, or search the web for the best prices on the short list of guitars.
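
As an illustration of consuming such results, the following Python sketch flattens a result set in the JSON serialization of SPARQL query results into plain dictionaries. The sample result is invented for this example, and the function name rows is hypothetical; only the head/vars and results/bindings layout of the format is relied upon.

```python
import json

# Invented sample in the JSON serialization of SPARQL query results.
SAMPLE_RESULTS = """{
  "head": {"vars": ["name", "rating"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Peter Smith"},
     "rating": {"type": "literal", "value": "4"}}
  ]}
}"""


def rows(results_json):
    """Return one {variable: value} dict per solution in the result set."""
    document = json.loads(results_json)
    variables = document["head"]["vars"]
    flattened = []
    for binding in document["results"]["bindings"]:
        # A variable may be unbound in a given solution, so check first.
        flattened.append(
            {name: binding[name]["value"] for name in variables if name in binding}
        )
    return flattened


for row in rows(SAMPLE_RESULTS):
    print(row["name"], row["rating"])
```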

Further Information

This concludes the GRDDL Primer. Full technical detail of the GRDDL mechanism may be found in the corresponding Gleaning Resource Descriptions from Dialects of Languages (GRDDL) Working Draft.

References

[GRDDL Draft]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL), Dominique Hazaël-Massieux, Dan Connolly, Authors' draft, 2006/03/09 15:45:31, http://www.w3.org/2004/01/rdxh/spec. Latest version available at http://www.w3.org/TR/grddl/.
[Microformats]
Microformats.org, 2006/08/30 11:05:31, http://microformats.org/ .
[RDF]
Resource Description Framework (RDF) Model and Syntax Specification, Ora Lassila, Ralph R. Swick, Editors. World Wide Web Consortium Recommendation, 1999,
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.
Latest version available at http://www.w3.org/TR/REC-rdf-syntax/.
RDF Vocabulary Description Language 1.0: RDF Schema, Dan Brickley and R.V. Guha, Editors. W3C Recommendation, 10 February 2004,
http://www.w3.org/TR/2004/REC-rdf-schema-20040210/ .
Latest version available at http://www.w3.org/TR/rdf-schema/.
[SPARQL]
SPARQL Query Language for RDF, Eric Prud'hommeaux and Andy Seaborne, Editors. W3C Candidate Recommendation 6 April 2006,
http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/ .
Latest version available at http://www.w3.org/TR/rdf-sparql-query/.

Acknowledgements

The editor would like to thank the following Working Group members for authoring this document:

This document is a product of the GRDDL Working Group.

Change Log

Changes since the WG decision to publish on 27 Sep include

$Log: Overview.html,v $
Revision 1.4  2006/10/03 19:51:59  jean-gui
Fixed an encoding issue with Dom's name

Revision 1.3  2006/10/03 16:16:05  connolly
removed editor's draft blurb from status section

Revision 1.2  2006/10/03 15:42:34  jean-gui
Removed some editor's draft markup

Revision 1.1  2006/10/03 15:13:04  jean-gui
Renamed primer.html to Overview.html

Revision 1.1  2006/10/03 15:11:54  jean-gui
/TR/2006/WD-grddl-primer-20061002/

Revision 1.16  2006/10/02 22:51:19  connolly
turned public-grddl-comments mailbox into a link

Revision 1.15  2006/09/30 00:38:47  connolly
note in the status section that some examples are incomplete

Revision 1.14  2006/09/30 00:35:01  connolly
removed some links to the glossary that were copied from the use cases document
updated link to suda.co.uk

Revision 1.13  2006/09/30 00:27:26  connolly
fix link from title page to acknowledgements section

Revision 1.12  2006/09/30 00:26:10  connolly
update parts of the status section that are different between
use cases and primer

Revision 1.11  2006/09/30 00:24:34  connolly
- remove "previous version" link to talis copy from title page
- move pubrules check to status section
- expand change log to give full audit trail since WG decision
- remove XHTML 1.1 icon, since pubrules requires 1.0 :-/

Revision 1.10  2006/09/29 23:54:08  hhalpin
fixed minor errors and links

revision 1.9
date: 2006/09/29 23:20:05;  author: hhalpin;  state: Exp;  lines: +5 -90
primer chnages for pubrules
----------------------------
revision 1.8
date: 2006/09/29 23:10:58;  author: hhalpin;  state: Exp;  lines: +1 -1
primer changes again
----------------------------
revision 1.7
date: 2006/09/29 23:07:42;  author: hhalpin;  state: Exp;  lines: +170 -42
primer changes again
----------------------------
revision 1.6
date: 2006/09/29 22:43:53;  author: hhalpin;  state: Exp;  lines: +2 -2
primer changes again spelling errors
----------------------------
revision 1.5
date: 2006/09/29 22:35:39;  author: hhalpin;  state: Exp;  lines: +6 -7
primer changes again
----------------------------
revision 1.4
date: 2006/09/29 22:33:00;  author: hhalpin;  state: Exp;  lines: +33 -70
primer changes
----------------------------

Revision 1.3  2006/09/29 22:05:17  connolly
"under construction" sign atop the section with XFN in it

Revision 1.2  2006/09/29 19:49:46  connolly
copied from devcvs v 1.4 2006/09/29 19:00:43 idavis

Revision 1.4  2006/09/29 19:00:43  idavis
Fixed formatting of CVS log at end of document

----------------------------
revision 1.3
date: 2006/09/29 18:58:18;  author: idavis;  state: Exp;  lines: +22 -13
Revised abstract to align more with use cases; checked in supporting HTML and PNG files
----------------------------
revision 1.2
date: 2006/09/29 18:22:17;  author: idavis;  state: Exp;  lines: +591 -437
Inserted current, latest and previous version links; revised abstract completely; normalised to linefeed line endings
----------------------------
revision 1.1
date: 2006/09/29 16:38:15;  author: connolly;  state: Exp;
6180 2006-09-27 13:29:57Z http://research.talis.com/2006/grddl-wg/primer.html