
Deploying Web-scale Mash-ups by Linking Microformats and the Semantic Web

Dan Connolly, W3C/MIT
Harry Halpin, University of Edinburgh
16th International World Wide Web Conference
Banff, Alberta, Canada
April 2007

Overview

Calendar Mash-up: The Problem

When can Jane, David, and Robin meet?

Calendar Mash-up: Problem Solved

GRDDL and SPARQL to the rescue!

Start         End           Place          Summary
"2007-01-08"  "2007-01-11"  Edinburgh, UK  Web Design Conference

Jane's friends Robin and David are both in town with her in Edinburgh on January 8th through 10th for the Web Design Conference.

Diagram: calendars → Data in RDF → SPARQL
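Once every calendar is available as RDF, the answer above comes down to intersecting date ranges. Here is a minimal Python sketch. It assumes one event per person per location and iCalendar-style exclusive end dates. The `EVENTS` tuples are a hypothetical in-memory stand-in for the SPARQL results, and Robin's Edinburgh dates are made up for illustration.

```python
from datetime import date

# (person, location, start, end), with `end` exclusive as in iCalendar DTEND.
# Jane's and David's Edinburgh entries match their pages later in this talk;
# Robin's dates are a hypothetical stand-in.
EVENTS = [
    ("Jane", "Edinburgh, UK", date(2007, 1, 8), date(2007, 1, 11)),
    ("David", "Edinburgh, UK", date(2007, 1, 8), date(2007, 1, 11)),
    ("Robin", "Edinburgh, UK", date(2007, 1, 8), date(2007, 1, 11)),
    ("David", "Bognor Regis, UK", date(2006, 10, 7), date(2006, 10, 13)),
]


def common_window(events, people, location):
    """Return (start, end) when all `people` are at `location`, or None.

    Assumes at most one event per person per location; a real scheduler
    would intersect sets of intervals instead.
    """
    spans = {}
    for person, place, start, end in events:
        if person in people and place == location:
            spans[person] = (start, end)
    if set(spans) != set(people):
        return None  # somebody is never there
    start = max(s for s, _ in spans.values())
    end = min(e for _, e in spans.values())
    return (start, end) if start < end else None


print(common_window(EVENTS, {"Jane", "David", "Robin"}, "Edinburgh, UK"))
```

Because the end date is exclusive, the window from 2007-01-08 to 2007-01-11 covers January 8th through 10th.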

Robin uses hCalendar

Events in Robin's schedule...

... are marked up like this:

          <li class="vevent">
            <strong class="summary">Fashion Expo</strong> in 
            <span class="location">Paris, France</span>:
            <abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to 
            <abbr class="dtend" title="2006-10-23">22</abbr>
          </li>

What is hCalendar?


hCalendar = iCalendar in XHTML

iCalendar (RFC2445):

BEGIN:VEVENT
UID:20020630T230445Z-3895-69-1-7@jammer
DTSTART;VALUE=DATE:20020703
DTEND;VALUE=DATE:20020706
SUMMARY:XYZ Conference
LOCATION:San Francisco
END:VEVENT
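To make "hCalendar = iCalendar in XHTML" concrete, here is a rough Python sketch. It uses the standard-library `html.parser` to read the four hCalendar properties from Robin's markup, then prints them as an iCalendar VEVENT. It is not a complete hCalendar parser. It assumes property elements are not nested, ignores all other properties, and omits the UID that RFC 2445 requires.

```python
from html.parser import HTMLParser

# Only the four properties used in Robin's markup; hCalendar defines more.
PROPS = ("summary", "location", "dtstart", "dtend")


class HCalendarParser(HTMLParser):
    """Collect hCalendar properties from elements inside class="vevent"."""

    def __init__(self):
        super().__init__()
        self.events = []     # one dict per vevent
        self.current = None  # property whose text content we are reading

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            self.events.append({})
            return
        for prop in PROPS:
            if prop in classes and self.events:
                if tag == "abbr" and attrs.get("title"):
                    # the abbr design pattern: title holds the machine value
                    self.events[-1][prop] = attrs["title"]
                else:
                    self.events[-1][prop] = ""
                    self.current = prop

    def handle_endtag(self, tag):
        self.current = None  # assumes property elements are not nested

    def handle_data(self, data):
        if self.current:
            self.events[-1][self.current] += data


def to_icalendar(event):
    """Render one event as VEVENT lines ("\n" here; RFC 2445 uses CRLF)."""
    lines = ["BEGIN:VEVENT"]
    for prop in PROPS:
        if prop in event:
            value = " ".join(event[prop].split())
            if prop.startswith("dt"):
                # ISO 8601 2006-10-20 -> iCalendar basic form 20061020
                lines.append(prop.upper() + ";VALUE=DATE:" + value.replace("-", ""))
            else:
                lines.append(prop.upper() + ":" + value)
    lines.append("END:VEVENT")
    return "\n".join(lines)


ROBIN = """<li class="vevent">
  <strong class="summary">Fashion Expo</strong> in
  <span class="location">Paris, France</span>:
  <abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to
  <abbr class="dtend" title="2006-10-23">22</abbr>
</li>"""

parser = HCalendarParser()
parser.feed(ROBIN)
print(to_icalendar(parser.events[0]))
```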

Why hCalendar?


hCalendar and other microformats have shared tools, knowledge, process...

Demo of Microformat Browsing using Tails

Microformats Unleashed

Microformats are centralized data formats for different types of data, often (nearly) isomorphic to already widely adopted non-Web standards:

The lower-case semantic web

The Limits of Microformats

Hotel Review Mash-Up: Problem

Hotel Review Mash-Up: Problem Solved

A hotel with a rating of 5 reviewed by a trusted friend:

Diagram: reviews → Data in RDF → query

rating  name    region     homepage                  hotelname
5       PeterS  Edinburgh  http://peter.example.org  Witch's Caldron Hotel, Edinburgh

"How did you do that?" I'm glad you asked...

Data Mash-Up

hotel review query answer diagram

The Straw that broke the Camel's Back

twitter screenshot

Too many services replicate the same sort of data. What if you have a Friendster, a MySpace, and a Twitter account?

photo by Jon Hicks

Toward Open Data

Is this what Web 2.0 is all about? If so, maybe it's not such a bad thing.

The Semantic Web

... is an open world and universal space for machine-readable data.

things in documents
To a computer, then, the web is a flat, boring world devoid of meaning...This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them...Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.
TimBL, WWW1994

The element of the Semantic Web

An arrow's tail, body, and head are the subject, property, and value.

<#p> foaf:name "PeterS".
<#p> foaf:homepage <http://peter.example.org>.

Note the relationship to HTML links, especially with the re-discovery of the rel attribute.

Use URIs to name relationships

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix rev: <http://www.purl.org/stuff/rev#>.
@prefix vcard: <http://www.w3.org/2006/vcard/ns#>.
@prefix xfn: <http://gmpg.org/xfn/11#>.

_:hotel
  vcard:adr [ vcard:locality "Edinburgh" ];
  rev:hasReview [
  rev:rating 5;
  rev:reviewer _:who;
  rdfs:label "Witch's Caldron Hotel, Edinburgh"
 ].

<jane> xfn:friend _:who.

_:who
   foaf:name "PeterS";
   foaf:homepage <http://peter.example.org>.

Data Mash-Up

hotel review query answer diagram

Semantic Web Basics

Semantic web includes tables, trees...

Arrows can make a table, an arrow from each row to each value

... and tangly messes

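Here is a sketch of the table case in Python. Triples are plain tuples, and the blank-node labels such as `_:row0` are made up. Each row becomes a node, each column name becomes a property, and each cell becomes one arrow. Grouping the arrows by their tail recovers the table.

```python
# Two rows borrowed from the hotel-review results later in the talk.
ROWS = [
    {"name": "PeterS", "rating": 5, "hotelname": "Witch's Caldron Hotel, Edinburgh"},
    {"name": "RexR", "rating": 5, "hotelname": "McRae Palace, Edinburgh"},
]


def table_to_triples(rows):
    """One node per row, one arrow (triple) per cell: column name = property."""
    triples = set()
    for i, row in enumerate(rows):
        node = f"_:row{i}"  # hypothetical blank-node labels
        for column, value in row.items():
            triples.add((node, column, value))
    return triples


def triples_to_table(triples):
    """Invert the mapping: group arrows by their tail to rebuild the rows."""
    rows = {}
    for node, column, value in triples:
        rows.setdefault(node, {})[column] = value
    return [rows[n] for n in sorted(rows)]


graph = table_to_triples(ROWS)
print(len(graph))  # 2 rows x 3 columns
```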

Semantic Web Architecture

The Semantic Web is to spreadsheets and databases what the Web of hypertext documents is to word processor files.

                     Web                 Semantic Web
Traditional design   hypertext           database, spreadsheet, logic
+                    URIs                URIs
-                    link consistency    global consistency?
=                    viral growth        viral growth

XML and Trees

XML (Extensible Markup Language) is a generalization of HTML that lets anyone name the elements and attributes.

XML

Think ASCII for the 21st Century!

Also a tree model (DOM - Document Object Model), which is a handy data structure.

RDF merges naturally

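Merging is the property the mash-ups depend on. Below is a minimal Python sketch. It assumes triples are tuples and blank nodes are strings starting with `_:`. The merge keeps blank nodes from different graphs apart and then takes the set union, so nodes only join up when they share a URI.

```python
import itertools

_counter = itertools.count()


def standardize_apart(graph):
    """Relabel every blank node ("_:..." string) with a label never produced
    before, consistently within `graph`."""
    mapping = {}

    def relabel(term):
        if isinstance(term, str) and term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"_:b{next(_counter)}"
            return mapping[term]
        return term

    return {tuple(relabel(t) for t in triple) for triple in graph}


def merge(*graphs):
    """RDF merge: keep blank nodes apart, then take the set union."""
    merged = set()
    for g in graphs:
        merged |= standardize_apart(g)
    return merged


# Both graphs happen to use the label _:who, for unrelated nodes.
friends = {("<jane>", "xfn:friend", "_:who"),
           ("_:who", "foaf:homepage", "<http://peter.example.org>")}
reviews = {("_:who", "foaf:name", "PeterS")}
merged = merge(friends, reviews)
```

The two `_:who` nodes stay distinct after the merge. That is why the trusted-review query later in this talk links reviewers to friends through a shared URI (the `foaf:homepage`) rather than through blank-node labels.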

Partial Understanding

RDF statements* are independent. RDF semantics are monotonic.

RDF, premise:
<Book rdf:ID="book1">
 <dc:title>The Grapes of Wrath</dc:title>
 <dc:creator>Steinbeck</dc:creator>
</Book>
RDF, conclusion:
<Book rdf:ID="book1">
 <dc:title>The Grapes of Wrath</dc:title>
</Book>

XML, premise:
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
XML, conclusion:
<!-- no, this does not follow -->
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>

*RDF/XML does have an rdf:parseType="Collection" syntax, which expands to a Lisp-style binary tree in the abstract syntax. This erasure property works not on XML elements, but on RDF statements.
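For graphs without blank nodes, RDF simple entailment reduces to a subset test, which is why erasing statements is always safe. The short Python sketch below models the `Book` example as tuples. The `rdf:type` triple stands for what the `<Book>` element contributes.

```python
# rdf:ID="book1" names the resource #book1; the <Book> element adds a type.
PREMISE = {
    ("#book1", "rdf:type", "Book"),
    ("#book1", "dc:title", "The Grapes of Wrath"),
    ("#book1", "dc:creator", "Steinbeck"),
}
CONCLUSION = {
    ("#book1", "rdf:type", "Book"),
    ("#book1", "dc:title", "The Grapes of Wrath"),
}


def entails_ground(graph, other):
    """Simple entailment between graphs without blank nodes: `graph`
    entails `other` exactly when every triple of `other` is in `graph`."""
    return other <= graph


# Erasing any one statement leaves something the premise still entails.
for triple in PREMISE:
    assert entails_ground(PREMISE, PREMISE - {triple})
print(entails_ground(PREMISE, CONCLUSION))
```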

On RDF/XML Syntax

At least the issues in the 1998 spec have all been resolved, complete with test cases. There are plenty of interoperable parsers. And it works great with Relax-NG and nxml-mode :)

Using GRDDL to get RDF from XML, XHTML

GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a way to bootstrap RDF out of XML data, and XHTML data in particular, by explicitly linking transformations from XML to RDF.

GRDDL terminology:

  1. Source Document: an XML document that references at least one GRDDL transformation and hence licenses a GRDDL-aware agent to extract RDF.
  2. GRDDL-aware agent: a software agent able to identify the GRDDL transformations and run them to extract RDF.
  3. GRDDL Transformation: an algorithm for getting RDF from a source document
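The first thing a GRDDL-aware agent does with an XHTML source document can be sketched in Python with the standard library. The agent checks that the `head` declares the data-view profile, then collects and resolves the `rel="transformation"` links. This sketch covers only that one pattern, not profile or namespace documents. Running the XSLT itself is left to an XSLT processor, and the base URI in the usage is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
DATA_VIEW = "http://www.w3.org/2003/g/data-view"


def transformation_links(source, base_uri):
    """List the GRDDL transformations an XHTML source document links to.

    Covers only <link rel="transformation">, and only when the head's
    profile attribute includes the GRDDL data-view profile.
    """
    root = ET.fromstring(source)
    head = root.find(XHTML + "head")
    if head is None or DATA_VIEW not in (head.get("profile") or "").split():
        return []
    return [urljoin(base_uri, link.get("href"))
            for link in head.iter(XHTML + "link")
            if "transformation" in (link.get("rel") or "").split()]


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Hotel Reviews from Example.com</title>
    <link rel="transformation"
       href="http://www.w3.org/2001/sw/grddl-wg/doc29/hreview2rdfxml.xsl"/>
  </head>
  <body/>
</html>"""

print(transformation_links(PAGE, "http://example.com/reviews.html"))
```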

Microformats and the Semantic Web?

Describing your Social Network

Recall that Jane needs her list of trusted sources in some machine-readable format.

As long as these formats can be mapped to RDF, they can be mapped to each other.

XHTML Friends Network (XFN): an XHTML profile

  1. XFN is a microformat for social network data; this profile of HTML is named with a URI.
  2. The page of Jane's friends adds one attribute to declare this profile:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head profile="http://gmpg.org/xfn/11">
	<title>Jane's XFN List</title>
</head>
<body>
  <h1>Jane's <abbr title="XHTML Friends Network">XFN</abbr> List</h1>
  <ul class="xoxo">
    <li class="vcard"><a href="http://peter.example.org" class="url fn"
    rel="met colleague friend">Peter Smith</a></li>
      <li class="vcard"><a href="http://john.example.org" class="url fn"
    rel="met">John Doe</a></li>
      <li class="vcard"><a href="http://paul.example.org" class="url fn"
    rel="met">Paul Revere</a></li>
    </ul>
</body>
</html>

* actually, the XFN profile isn't quite GRDDL-happy yet; but the eRDF profile is.

Out comes mergeable RDF data

magic* happens here...

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix h: <http://www.w3.org/1999/xhtml> .
@prefix xfn: <http://gmpg.org/xfn/11#> .

[]
    foaf:homepage <http://www.w3.org/2001/sw/grddl-wg/doc29/janefriends.html>;
    xfn:friend [
        foaf:homepage <http://peter.example.org>
    ];
    xfn:met [
        foaf:homepage <http://peter.example.org>
    ], [
        foaf:homepage <http://john.example.org>
    ], [
        foaf:homepage <http://paul.example.org>
    ] .

*we'll explain the trick later.
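The effect of the XFN transformation can be approximated in Python. This sketch simplifies in two ways. It relates Jane's page URI directly to each friend's homepage, rather than to blank nodes identified by `foaf:homepage` as in the output above. It also emits every XFN 1.1 value it finds, so Peter's `colleague` link appears too.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
XFN_PROFILE = "http://gmpg.org/xfn/11"
XFN_VALUES = {  # the XFN 1.1 relationship values
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


def xfn_triples(source, page_uri):
    """Turn XFN rel values on <a href> links into (page, xfn:rel, href)."""
    root = ET.fromstring(source)
    head = root.find(XHTML + "head")
    if head is None or XFN_PROFILE not in (head.get("profile") or "").split():
        return set()  # no XFN profile declared, so rel values are not XFN
    triples = set()
    for a in root.iter(XHTML + "a"):
        for rel in (a.get("rel") or "").split():
            if rel in XFN_VALUES and a.get("href"):
                triples.add((page_uri, "xfn:" + rel, a.get("href")))
    return triples


JANE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://gmpg.org/xfn/11"><title>Jane's XFN List</title></head>
  <body><ul class="xoxo">
    <li class="vcard"><a href="http://peter.example.org" class="url fn"
      rel="met colleague friend">Peter Smith</a></li>
    <li class="vcard"><a href="http://john.example.org" class="url fn"
      rel="met">John Doe</a></li>
  </ul></body>
</html>"""

PAGE_URI = "http://www.w3.org/2001/sw/grddl-wg/doc29/janefriends.html"
for triple in sorted(xfn_triples(JANE, PAGE_URI)):
    print(triple)
```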

Using a GRDDL Transformation Directly

The hReview microformat doesn't have an established profile yet, so the Hotel Review data uses GRDDL directly:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Hotel Reviews from Example.com</title>
    <link rel="transformation" 
       href="http://www.w3.org/2001/sw/grddl-wg/doc29/hreview2rdfxml.xsl"/>
</head>
<body>
<div class="hreview" id="_665">
  <div class="item vcard">
  <b class="fn org">Witch's Caldron Hotel, Edinburgh</b>
  <span><span class="rating">5</span> out of 5 stars</span>
  ...
  </div>
</div>
</body>
</html>

  1. The transformation link tells a GRDDL-aware agent where to find a transformation, ../hreview2rdfxml.xsl, from this syntax to standard RDF/XML syntax.
  2. The http://www.w3.org/2003/g/data-view profile shows that this document uses rel="transformation" as specified by GRDDL.

SQL * URIs = SPARQL

table subject/property/value
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  PREFIX c: <http://www.w3.org/2002/12/cal/icaltzd#>
  SELECT ?name ?summary ?when
   FROM <myFriendsBlogsData>
   WHERE { ?somebody foaf:name ?name; foaf:mbox ?mbox.
           ?event c:summary ?summary;
                  c:dtstart ?when;
                  c:attendee [ c:calAddress ?mbox ]
         }
?name          ?summary           ?when
Tantek Çelik   Web 2.0            2005-10-05
Norm Walsh     XML 2005           2005-11-13
Dan Connolly   W3C tech plenary   2006-02-27

See SPARQL Query Language for RDF W3C Working Draft.
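The SQL analogy can be made concrete. Each triple pattern acts like a scan over a three-column table, and shared variables act like join conditions. Below is a minimal Python evaluator for basic graph patterns. It keeps prefixed names as plain strings (no IRI expansion) and supports no FILTER or DISTINCT. The data is hypothetical, including a made-up mailbox.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None  # variable already bound to something else
        elif p != t:
            return None
    return result


def select(graph, patterns, variables):
    """Evaluate a basic graph pattern, then project `variables`."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in graph:
                candidate = match(pattern, triple, solution)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return [tuple(s[v] for v in variables) for s in solutions]


# Hypothetical data in the shape the query expects.
GRAPH = {
    ("_:dan", "foaf:name", "Dan Connolly"),
    ("_:dan", "foaf:mbox", "mailto:dan@example.org"),
    ("_:e1", "c:summary", "W3C tech plenary"),
    ("_:e1", "c:dtstart", "2006-02-27"),
    ("_:e1", "c:attendee", "_:a1"),
    ("_:a1", "c:calAddress", "mailto:dan@example.org"),
}

PATTERNS = [
    ("?somebody", "foaf:name", "?name"),
    ("?somebody", "foaf:mbox", "?mbox"),
    ("?event", "c:summary", "?summary"),
    ("?event", "c:dtstart", "?when"),
    ("?event", "c:attendee", "?who"),  # the [ ... ] blank node, as a variable
    ("?who", "c:calAddress", "?mbox"),
]

print(select(GRAPH, PATTERNS, ["?name", "?summary", "?when"]))
```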

Putting RDF to Work with SPARQL

"Find reviews better than 2 stars and tell me the name of the hotel and the reviewer."


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?name ?rating ?hotelname

FROM <http://www.w3.org/2001/sw/grddl-wg/doc29/review.rdf>

WHERE {
?x rev:hasReview ?review.
?review rev:rating ?rating;
  rdfs:label ?hotelname; 
  rev:reviewer [ foaf:name ?name ].
FILTER (?rating > 2).
}

Details: hotelquery1.rq

SPARQL results from hReview data

The query worked, but it's not precise enough:

name rating hotelname
"PeterS" 5 "Enlightenment Amsterdam Hotel"
"RexR" 5 "Pilgrim Hostel"
"PeterS" 4 "Fano Hotel"
"MaryV" 5 "Franklin Hotel, Philadelphia"
"Simon" 5 "Forest Cafe Youth Hostel, Edinburgh"
"JennyR" 3 "Merton Atlanta"
"JohnD" 4 "Walter Scot Hotel, Edinburgh"
"PeterS" 5 "Royal Moon Hotel, Boston"
"JohnD" 5 "Elena Plaza Hotel"
"PeterS" 5 "Witch's Caldron Hotel, Edinburgh"
"RexR" 3 "Bond Plaza Hotel"
"RexR" 5 "McRae Palace, Edinburgh"
"RexR" 5 "Ritchie Centre, Edinburgh"
"PeterS" 5 "Maximus New York Hotel & Towers"

More precise SPARQL query

"Find reviews of hotels in Edinburgh better than 2 stars and tell me the name of the hotel and the reviewer."


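PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>

SELECT DISTINCT ?rating ?name ?hotelname

FROM <hotel-data.rdf>
FROM <janefriends.rdf>

WHERE {
?x rev:hasReview ?review;
  vcard:adr [ vcard:locality "Edinburgh" ].

?review rev:rating ?rating;
  rdfs:label ?hotelname; 
  rev:reviewer [ foaf:name ?name ].
FILTER (?rating > 2).
}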

Sample Query Online

SPARQL results from hReview and vcard locality data

This shows hotels located in Edinburgh with a rating better than 2 stars, but there might be review spam:

rating name hotelname region
5 "RexR" "Ritchie Centre, Edinburgh" "Edinburgh"
5 "PeterS" "Witch's Caldron Hotel, Edinburgh" "Edinburgh"
5 "Simon" "Forest Cafe Youth Hostel, Edinburgh" "Edinburgh"
5 "RexR" "McRae Palace, Edinburgh" "Edinburgh"
4 "JohnD" "Walter Scott Hotel, Edinburgh" "Edinburgh"

Mashing up Friends and Hotel Reviews

Querying for Trusted Reviews

"Find reviews by my friends of hotels in Edinburgh better than 2 stars and tell me the name of the hotel and the reviewer."


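PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
PREFIX xfn: <http://gmpg.org/xfn/11#>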

SELECT DISTINCT ?rating ?name ?homepage ?hotelname
FROM <review.rdf>
FROM <xfn.rdf>
WHERE {
?place rev:hasReview ?review;
  vcard:adr [ vcard:locality "Edinburgh"].
?review
  rdfs:label ?hotelname;
  rev:rating ?rating;
  rev:reviewer ?reviewer.

FILTER (?rating > 2).

?reviewer foaf:name ?name;
  foaf:homepage ?homepage.

[ foaf:homepage <janefriends.html> ]
   xfn:friend [ foaf:homepage ?homepage ].
}

Details: hotelquery3.rq
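Under the hood, the trusted-review query joins two sets of solutions on their shared variable `?homepage` and then applies the FILTER. Below is a small Python sketch of that join, with solutions as dictionaries. The review solutions are hand-written for illustration, and John's homepage is hypothetical.

```python
def compatible(a, b):
    """Two solutions are compatible if their shared variables agree."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """Nested-loop join of two solution lists (SPARQL's Join of groups)."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


# Solutions for the review half of the query.
REVIEWS = [
    {"?name": "PeterS", "?rating": 5,
     "?hotelname": "Witch's Caldron Hotel, Edinburgh",
     "?homepage": "http://peter.example.org"},
    {"?name": "JohnD", "?rating": 4,
     "?hotelname": "Walter Scott Hotel, Edinburgh",
     "?homepage": "http://john.example.org"},  # hypothetical homepage
]
# Solutions for: [ foaf:homepage <janefriends.html> ] xfn:friend [ foaf:homepage ?homepage ]
FRIENDS = [{"?homepage": "http://peter.example.org"}]

trusted = [s for s in join(REVIEWS, FRIENDS) if s["?rating"] > 2]  # FILTER
for s in trusted:
    print(s["?rating"], s["?name"], s["?homepage"], s["?hotelname"])
```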

Hotel query diagram

"Find reviews by my friends of hotels in Edinburgh better than 2 stars and tell me the name of the hotel and the reviewer."

hotel review query diagram

Trusted Reviews from SPARQL via GRDDL

Just right:

rating  name    region     homepage                  hotelname
5       PeterS  Edinburgh  http://peter.example.org  Witch's Caldron Hotel, Edinburgh

Return to Calendar Mash-up

When can Jane, David, and Robin meet?

Embedded RDF

David has chosen to mark up his schedule using Embedded RDF (eRDF, an alternative to RDFa), a convention for embedding RDF in XHTML that GRDDL can then extract.

Embedded RDF file online

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head profile="http://purl.org/NET/erdf/profile">
    <title>Where Am I</title>
    <link rel="schema.cal" href="http://www.w3.org/2002/12/cal#" />
  </head>

  <body>
    <p class="-cal-Vevent" id="tiddlywinks">
      From <span class="cal-dtstart" title="2006-10-07">7 October, 2006</span>
      to <span class="cal-dtend"  title="2006-10-13">12 October, 2006</span> 
      I will be attending the <span class="cal-summary">National Tiddlywinks
      Championship</span> in 
      <span class="cal-location">Bognor Regis, UK</span>.
    </p>

   
    <p class="-cal-Vevent" id="holiday">
      Then I'm <span class="cal-summary">on holiday</span> in the 
      <span class="cal-location">Cayman Islands</span> between
      <span class="cal-dtstart" title="2006-11-14">14 November, 2006</span>

      and <span class="cal-dtend"  title="2007-01-02">1 January, 2007</span>. 
    </p>

    <p class="-cal-Vevent" id="award">
      I then visit Scotland on <span class="cal-dtstart" title="2007-01-08">the 8th
      January</span> to <span class="cal-summary">pick up a lifetime
      achievement award from the world gamers association</span>. This time
      the ceremony is in <span class="cal-location">Edinburgh, UK</span>. I'll be
      taking the train home on the <span class="cal-dtend"  title="2007-01-11">10th</span>. 
    </p>

  </body>
</html>
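Here is a rough Python sketch of what an eRDF extractor does with David's page. It assumes only the idioms used above. `link rel="schema.cal"` declares a prefix, an element with an `id` and a `-cal-Vevent` class becomes a typed subject, and `cal-summary`-style classes on its descendants become properties. The sketch prefers a `title` attribute over text content, which matches how David's page carries its dates. The exact eRDF rules should be checked against the eRDF specification, and the base URI is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def erdf_triples(source, base_uri):
    """Extract a small subset of Embedded RDF from an XHTML document."""
    root = ET.fromstring(source)
    # <link rel="schema.cal" href="..."/> binds the prefix "cal".
    schemas = {}
    for link in root.iter(XHTML + "link"):
        rel = link.get("rel") or ""
        if rel.startswith("schema."):
            schemas[rel[len("schema."):]] = link.get("href")
    triples = set()
    for el in root.iter():
        if el.get("id") is None:
            continue  # only elements with an id act as subjects here
        subject = base_uri + "#" + el.get("id")
        for cls in (el.get("class") or "").split():
            if cls.startswith("-"):  # "-cal-Vevent": a type
                prefix, _, name = cls[1:].partition("-")
                if prefix in schemas and name:
                    triples.add((subject, RDF_TYPE, schemas[prefix] + name))
        for child in el.iter():
            if child is el:
                continue
            for cls in (child.get("class") or "").split():
                prefix, _, name = cls.partition("-")  # "cal-dtstart"
                if prefix in schemas and name:
                    text = " ".join("".join(child.itertext()).split())
                    value = child.get("title") or text  # assumed: title wins
                    triples.add((subject, schemas[prefix] + name, value))
    return triples


DAVID = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://purl.org/NET/erdf/profile">
    <title>Where Am I</title>
    <link rel="schema.cal" href="http://www.w3.org/2002/12/cal#" />
  </head>
  <body>
    <p class="-cal-Vevent" id="award">
      I then visit Scotland on <span class="cal-dtstart" title="2007-01-08">the 8th
      January</span> to <span class="cal-summary">pick up a lifetime
      achievement award from the world gamers association</span>. This time
      the ceremony is in <span class="cal-location">Edinburgh, UK</span>. I'll be
      taking the train home on the <span class="cal-dtend" title="2007-01-11">10th</span>.
    </p>
  </body>
</html>"""

for t in sorted(erdf_triples(DAVID, "http://example.org/david")):
    print(t)
```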
  

The magic trick: GRDDL Recursion

GRDDL has gone meta!

The HTML profile document can itself be GRDDL-enabled: it links to the standard library transformation <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />, which extracts the http://www.w3.org/2003/g/data-view#profileTransformation property whose object is the transformation itself.

A diagram indicating the sequence of steps for obtaining RDF from a document using the profile URI as described in the preceding paragraph

Linking GRDDL to a Profile Document

Embedded RDF has a link to a GRDDL transformation in its profile document.

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Embedded RDF HTML Profile</title>
    <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
  </head>
  <body>
    <p>
      <a rel="profileTransformation" 
          href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a>
    </p>
  </body>
</html>
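The recursion in the diagram can be sketched in Python, with a dictionary standing in for HTTP. The agent dereferences each profile URI and notices that the profile document is itself a GRDDL source document linked to glean-profile. It then reads the `profileTransformation` link. That link scan stands in for running the real glean-profile XSLT. The `WEB` dictionary and the sample source are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
GLEAN_PROFILE = "http://www.w3.org/2003/g/glean-profile"

# A stand-in for HTTP GET: URI -> document (no network access here).
WEB = {
    "http://purl.org/NET/erdf/profile": """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Embedded RDF HTML Profile</title>
    <link rel="transformation" href="http://www.w3.org/2003/g/glean-profile" />
  </head>
  <body><p><a rel="profileTransformation"
      href="http://purl.org/NET/erdf/extract-rdf">GRDDL transform</a></p></body>
</html>""",
}


def rel_hrefs(root, tag, rel_value, base):
    """Resolved hrefs of <tag rel="..."> elements whose rel includes rel_value."""
    return [urljoin(base, el.get("href"))
            for el in root.iter(XHTML + tag)
            if rel_value in (el.get("rel") or "").split()]


def profile_transformations(source):
    """Find transformations via the source's profile URIs (one level deep)."""
    head = ET.fromstring(source).find(XHTML + "head")
    found = []
    for profile in (head.get("profile") or "").split():
        if profile not in WEB:
            continue  # a real agent would fetch it over HTTP
        doc = ET.fromstring(WEB[profile])
        # Step 1: the profile document is itself a GRDDL source document.
        if GLEAN_PROFILE in rel_hrefs(doc, "link", "transformation", profile):
            # Step 2: stand-in for running glean-profile, which yields
            # data-view:profileTransformation statements.
            found += rel_hrefs(doc, "a", "profileTransformation", profile)
    return found


SOURCE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://purl.org/NET/erdf/profile"><title>Where Am I</title></head>
  <body/>
</html>"""

print(profile_transformations(SOURCE))
```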

GRDDL in Namespace Documents

No Transformation Links - just go to the namespace document!

In RDF, OWL, RDF Schema:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dataview="http://www.w3.org/2003/g/data-view#">
 <rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
   <dataview:namespaceTransformation
       rdf:resource="http://www.w3.org/2004/01/rdxh/grokP3Q.xsl"/>
 </rdf:Description>
</rdf:RDF>

In XML Schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http:.../Order-1.0"
            targetNamespace="http:.../Order-1.0"
            version="1.0"
            ...
            xmlns:data-view="http://www.w3.org/2003/g/data-view#"
            data-view:transformation="http://www.w3.org/2003/g/embeddedRDF.xsl" >
    <xsd:element name="Order" type="OrderType">
    <xsd:annotation>
      <xsd:documentation>This element is the root element.</xsd:documentation>
    </xsd:annotation>
                 ...
  <xsd:annotation>
    <xsd:appinfo>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<rdf:Description rdf:about="http://www.w3.org/2003/g/po-ex">
	  <data-view:namespaceTransformation
	      rdf:resource="grokPO.xsl" />
	</rdf:Description>
      </rdf:RDF>
    </xsd:appinfo>
  </xsd:annotation>
...

RDF for Free

Is it too much work to ask people to add the transformation and profile to their individual instance data?

Creators or maintainers of vocabularies can also give users the option of having their data transformed into RDF without adding any new markup to individual documents.

Once the transformation has been linked from the profile or namespace document, all users of the dialect get the added value of RDF for free.

The profile document must contain the RDF property http://www.w3.org/2003/g/data-view#profileTransformation, and the namespace document the property http://www.w3.org/2003/g/data-view#namespaceTransformation. In each case the subject is the profile or namespace URI and the object is the transformation itself.
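For namespace documents, the lookup can be sketched in Python. Take the namespace URI of the source's root element and look for a `data-view:namespaceTransformation` statement about it. This sketch only reads the simple RDF/XML shape shown earlier, not the XML Schema `appinfo` form or RDF/XML in general. It also takes the namespace document as a string instead of fetching it, and uses a hypothetical `policy` root element.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DATA_VIEW = "{http://www.w3.org/2003/g/data-view#}"

NAMESPACE_DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dataview="http://www.w3.org/2003/g/data-view#">
 <rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
   <dataview:namespaceTransformation
       rdf:resource="http://www.w3.org/2004/01/rdxh/grokP3Q.xsl"/>
 </rdf:Description>
</rdf:RDF>"""


def root_namespace(source):
    """Namespace URI of the document element, or None if it has none."""
    tag = ET.fromstring(source).tag
    return tag[1:].partition("}")[0] if tag.startswith("{") else None


def namespace_transformations(source, namespace_doc):
    """Transformations declared for the namespace of source's root element."""
    ns = root_namespace(source)
    doc = ET.fromstring(namespace_doc)
    return [t.get(RDF + "resource")
            for desc in doc.iter(RDF + "Description")
            if desc.get(RDF + "about") == ns
            for t in desc.iter(DATA_VIEW + "namespaceTransformation")]


# Hypothetical instance document in that namespace (element name made up).
INSTANCE = '<policy xmlns="http://www.w3.org/2004/01/rdxh/p3q-ns-example"/>'
print(namespace_transformations(INSTANCE, NAMESPACE_DOC))
```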

RDFa

While GRDDL has so far been used in the wild mainly to convert widely deployed microformats to RDF, it can also be used with RDFa, a W3C work item that lets one embed arbitrary RDF statements in HTML "microformat-style".

RDFa is useful because microformats are a set of centrally maintained vocabularies: what if you want to mark up metadata in a web page about a subject for which there is no microformat?

Since RDFa is still a moving target, we personally recommend Embedded RDF for the time being, unless you are willing to track the changes in RDFa. RDFa is, however, more expressive than Embedded RDF (it allows XML Schema datatypes, etc.).

HTML Contains Implicit Structure

This document is licensed under a

<a href="http://cc.org/licenses/by/3.0/">
   CC License
</a>

and was written by TimBL.

Basic Stuff: Typing a Link

    This document is licensed under a

    <a href="http://cc.org/licenses/by/3.0/"
       xmlns:cc="http://cc.org/ns#" rel="cc:license">
       CC License
    </a>

    and was written by TimBL.
    

More Complex Structure: RDFa goes Deep

    This document 
    ...
    <div rel="dc:creator" class="foaf:Person"
       xmlns:dc="http://..." xmlns:foaf="http://...">
       and was written by
       <span property="foaf:nickname">
          TimBL
       </span>.
    </div>
    

yields

    <> dc:creator [a foaf:Person ; foaf:nickname "TimBL"] .
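The RDFa idioms on these slides can be approximated with Python's `html.parser`. This toy extractor handles only what the examples use:

- `rel` with `href`
- `rel` without `href`, chaining to a new blank node
- prefixed `class` names as types
- `property` with text or a `content` attribute

It ignores `datatype` and assumes every element is explicitly closed. Where the slide elides the namespace URIs, it fills in the usual Dublin Core and FOAF ones. It is not a conforming RDFa processor.

```python
import itertools
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TinyRDFa(HTMLParser):
    """Toy extractor for a few 2007-era RDFa idioms (see lead-in)."""

    def __init__(self, base=""):
        super().__init__()
        self.triples = set()
        self.stack = [({}, base)]  # (prefix map, current subject) per open element
        self.literal = None        # (subject, property, text parts, depth)
        self.bnodes = itertools.count()

    @staticmethod
    def expand(ns, curie):
        """Expand "prefix:local" with the in-scope prefixes, else None."""
        prefix, _, local = (curie or "").partition(":")
        return ns[prefix] + local if prefix in ns and local else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ns, subject = self.stack[-1]
        ns = dict(ns)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                ns[name[len("xmlns:"):]] = value
        subject = attrs.get("about") or subject
        rel = self.expand(ns, attrs.get("rel"))
        if rel and attrs.get("href"):
            self.triples.add((subject, rel, attrs["href"]))
        elif rel:
            node = f"_:b{next(self.bnodes)}"
            self.triples.add((subject, rel, node))
            subject = node  # children now describe the new node
        for cls in (attrs.get("class") or "").split():
            typ = self.expand(ns, cls)
            if typ:
                self.triples.add((subject, RDF_TYPE, typ))
        prop = self.expand(ns, attrs.get("property"))
        if prop and attrs.get("content") is not None:
            self.triples.add((subject, prop, attrs["content"]))
        elif prop:
            self.literal = (subject, prop, [], len(self.stack))
        self.stack.append((ns, subject))

    def handle_data(self, data):
        if self.literal:
            self.literal[2].append(data)

    def handle_endtag(self, tag):
        self.stack.pop()
        if self.literal and len(self.stack) == self.literal[3]:
            subject, prop, parts, _ = self.literal
            self.triples.add((subject, prop, " ".join("".join(parts).split())))
            self.literal = None


LICENSE = ('<a href="http://cc.org/licenses/by/3.0/" xmlns:cc="http://cc.org/ns#" '
           'rel="cc:license">CC License</a>')
DEEP = """<div rel="dc:creator" class="foaf:Person"
   xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foaf="http://xmlns.com/foaf/0.1/">
   and was written by
   <span property="foaf:nickname">
      TimBL
   </span>.
</div>"""

for fragment in (LICENSE, DEEP):
    parser = TinyRDFa()  # base "" stands for <>, the current document
    parser.feed(fragment)
    for triple in sorted(parser.triples):
        print(triple)
```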

GRDDL Does RDFa

RDFa for Jane's schedule online

RDFa After GRDDL

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cal="http://www.w3.org/2002/12/cal/icaltzd#"
      xmlns:xs="http://www.w3.org/2001/XMLSchema#">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Jane's Blog</title>
  <link rel="transformation"
        href="http://www.w3.org/2001/sw/grddl-wg/td/RDFa2RDFXML.xsl"/>
</head>
<body>
<p about="#event1" class="cal:Vevent">
  <b property="cal:summary">Weekend off in Iona</b>:
  <span property="cal:dtstart" content="2006-10-21" datatype="xs:date">Oct 21st</span>
  to <span property="cal:dtend" content="2006-10-23" datatype="xs:date">Oct 23rd</span>.
  See <a rel="cal:url" href="http://freetime.example.org/">FreeTime.Example.org</a> for
  info on <span property="cal:location">Iona, UK</span>.
</p>
<p about="#event2" class="cal:Vevent">
  <b property="cal:summary">Holiday in Ireland</b>:
  <span property="cal:dtstart" content="2006-12-23" datatype="xs:date">Dec 23rd</span>
  to <span property="cal:dtend" content="2006-12-27" datatype="xs:date">Dec 27th</span>.
  See <a rel="cal:url" href="http://vacation.example.org/">Vacation.Example.org</a> for
  info on <span property="cal:location">Belfast, Ireland</span>.
</p>
<p><b>New Years!</b> Now it's 2007...</p>
<p about="#event3" class="cal:Vevent">
  <b property="cal:summary">Web Conference</b>:
  <span property="cal:dtstart" content="2007-01-08" datatype="xs:date">Jan 8th</span>
  to <span property="cal:dtend" content="2007-01-11" datatype="xs:date">Jan 11th</span>.
  See <a rel="cal:url" href="http://webconf.example.org/">webconf.example.org</a> for
  info on <span property="cal:location">Edinburgh, UK</span>.
</p>
</body>
</html>

Onwards!

Time for a break!

Now to Second Part