Semantic Web Data Integration with hCalendar and GRDDL

Dan Connolly
XML Conference & Exposition 2005 | From Syntax to Semantics (XML 2005)
Atlanta, GA, USA
16 November 2005

Toward Open Data

Is this what Web 2.0 is all about? If so, maybe it's not such a bad thing.

Outline

  1. A history of (X)HTML and the Web
  2. Toward data integration, Web style
  3. Semantic Web * iCalendar = RDF Calendar
  4. GRDDL: Semantic Web data in XHTML
  5. hCalendar and microformats
  6. hCalendar * GRDDL = RDF Calendar
  7. Microformats + SPARQL
  8. Microformats + OWL, rules

Objective: making it cost-effective to record and share knowledge formally, i.e. so that computers can manipulate it.

Getting into the Web

What was the tipping point, the killer app, for you?

Getting into the Web: downhill steps

To grow, start with some actual value plus lots of potential value and a downhill path for contributors:

Architect for participation -- Tim O'Reilly @ W3C's 10th anniversary

Web Basics: URIs, HTTP, ...

So what makes all that work? Much of it is a story for another day.

See Architecture of the World Wide Web, Volume One
W3C Recommendation 15 December 2004

Web Basics: HTML

Aside: CSS deployment

... and the importance of test suites

HTML / SGML * XML = XHTML

along with CSS, of course

The Personal Information Disaster

RDF Calendar for travel itineraries

cal screen shot

But why not just spit out .ics from the nasty Perl script?

Travel from place to place

map of itinerary

iCalendar does have location/geo fields, but only one per event. Flights have a departure and an arrival.

TODO list views of email archives

iCal screenshot

details: bug status in .ics of Jan 2004

Semantic Web Basics

The element of the Semantic Web

Arrow tail, body and head are subject, property and value.

      @prefix foaf: <http://xmlns.com/foaf/0.1/>.
      <people.rdf#dan> foaf:name "Dan Connolly".
      <people.rdf#dan> foaf:interest <http://www.w3.org/XML/>.
    

Note the relationship to HTML links, especially with the re-discovery of the rel attribute.
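The arrow picture can be sketched in a few lines of Python, where each statement is a (subject, property, value) tuple. This is only an illustration: the `http://example.org/` base URI and the `link_to_triple` helper are assumptions, not part of the talk, and the sketch does not distinguish URI values from literal strings.

```python
from urllib.parse import urljoin

FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/people.rdf"  # assumed base URI, not from the slides

# Each statement is one arrow: (subject, property, value).
triples = {
    (urljoin(BASE, "people.rdf#dan"), FOAF + "name", "Dan Connolly"),
    (urljoin(BASE, "people.rdf#dan"), FOAF + "interest", "http://www.w3.org/XML/"),
}

def link_to_triple(page, rel, href):
    """Read an HTML link as an arrow (hypothetical helper).

    The page is the subject, the rel value names the property,
    and the resolved href is the value.
    """
    return (page, rel, urljoin(page, href))

print(link_to_triple("http://example.org/a.html", "next", "b.html"))
```

Seen this way, a link with a rel attribute is already a labelled arrow from one page to another.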

Semantic web includes tables, trees...

Arrows can make a table, an arrow from each row to each value

... and tangly messes

RDFS and OWL

RDF Schema (RDFS) and the Web Ontology Language (OWL) correspond to UML notions such as subclass, domain, range, cardinality, ...

travel concepts schema

RDF Calendar in a nutshell

For details, see RDF Calendar - an application of the Resource Description Framework to iCalendar Data, Connolly and Miller September 2005

Round-trip testing

Comparing .ics files is tricky, so...

On RDF/XML Syntax

At least the issues in the 1998 spec have all been resolved, complete with test cases. There are plenty of interoperable parsers. And it works great with RELAX NG and nxml-mode :)

Observation: lots of structured data in XHTML dialects

Data in Documents

I believe that one of the best ways to transition into RDF, if not a long-term deployment strategy for RDF, is to manage the information in human-consumable form (XHTML) annotated with just enough info to extract the RDF statements that the human info is intended to convey. In other words: using a relational database or some sort of native RDF data store, and spitting out HTML dynamically, is a lot of infrastructure to operate and probably not worth it for lots of interesting cases.

We all know that we have to produce a human-readable version of the thing... why not use that as the primary source?

XSLT for screen-scraping RDF out of real-world data
Dan Connolly to www-rdf-interest March 2000

Case Study: News Syndication at W3C

Site Summaries in XHTML is a cost-effective way to formalize our news metadata.

GRDDL Semantics: explicit, grounded in the Web

A person (or a machine) can "follow your nose" from the document to the transformation algorithm, to the data, to the definitions of the terms used in the data.

GRDDL Syntax: author's choice

GRDDL drawback: Turing completeness

GRDDL Details

GRDDL: multiple dialects allowed

one document, multiple transformations

GRDDL: one profile for lots of documents

transformation via profile

For details, see Gleaning Resource Descriptions from Dialects of Languages (GRDDL), Hazaël-Massieux and Connolly May 2005

GRDDL for any XML, not just XHTML

<java version="1.5.0_04" class="java.beans.XMLDecoder"
   xmlns:grddl="http://www.w3.org/2003/g/data-view#"
   grddl:transformation="grokVioletUML.xsl">
 <object class="com.horstmann.violet.ClassDiagramGraph"> 
  <void method="addNode"> 
   ...
UML diagram with OWL formalization
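How a consumer might discover the transformation can be sketched as follows. This is a minimal illustration, not a GRDDL implementation: it assumes the attribute holds a whitespace-separated list of URI references on the root element, the sample document is a closed-off stand-in for the truncated snippet above, and the `http://example.org/` base URI is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

def transformations(xml_text, base_uri):
    """Return absolute URIs of the transformations named on the root element."""
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # Resolve each reference against the document's own address.
    return [urljoin(base_uri, ref) for ref in value.split()]

# Stand-in document: the original snippet is truncated, so it is closed off here.
doc = """<java version="1.5.0_04" class="java.beans.XMLDecoder"
   xmlns:grddl="http://www.w3.org/2003/g/data-view#"
   grddl:transformation="grokVioletUML.xsl">
 <object class="com.horstmann.violet.ClassDiagramGraph"/>
</java>"""

print(transformations(doc, "http://example.org/diagrams/travel.violet"))
```

Applying the stylesheet found this way is left out; the point is only that the document itself names the algorithm that extracts its data.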

Origins of hCalendar and Microformats

Borrowing from Microformats: Evolving the Web by Tantek Çelik Sep 2005

Microformats Principles

XHTML * iCalendar = hCalendar

more hCalendar

hCalendar + GRDDL = RDF Calendar

Upcoming events on blogs

From Tantek's Thoughts:

Mix with glean-hcal.xsl and we get:

<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:c="http://www.w3.org/2002/12/cal/icaltzd#">
...
      <c:Vevent>
        <c:summary xml:lang="en-us">Web 2.0 </c:summary>
        <c:dtstart r:datatype="http://www.w3.org/2001/XMLSchema#date">2005-10-05</c:dtstart>
        <c:dtend r:datatype="http://www.w3.org/2001/XMLSchema#date">2005-10-08</c:dtend>
        <c:url r:resource="http://web2con.com/"/>
        <c:location xml:lang="en-us">The Argent, San Francisco </c:location>
      </c:Vevent>
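The kind of extraction glean-hcal.xsl performs can be approximated in Python, as a rough sketch. This is not the stylesheet's logic: the sample markup is hypothetical, reconstructed from the RDF output above rather than copied from the blog post, and only the hCalendar class names vevent, summary, dtstart, dtend, location and url are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical hCalendar markup, reconstructed from the RDF output above.
SAMPLE = """<div xmlns="http://www.w3.org/1999/xhtml" class="vevent">
 <a class="url" href="http://web2con.com/"><span class="summary">Web 2.0</span></a>:
 <abbr class="dtstart" title="2005-10-05">October 5</abbr> to
 <abbr class="dtend" title="2005-10-08">7</abbr>, at
 <span class="location">The Argent, San Francisco</span>
</div>"""

def has_class(elem, name):
    """True if name is one of the space-separated class tokens on elem."""
    return name in elem.get("class", "").split()

def glean_events(xhtml_text):
    """Collect one dict of properties per element marked class="vevent"."""
    events = []
    for node in ET.fromstring(xhtml_text).iter():
        if not has_class(node, "vevent"):
            continue
        event = {}
        for child in node.iter():
            for prop in ("summary", "location"):
                if has_class(child, prop):
                    event[prop] = "".join(child.itertext()).strip()
            for prop in ("dtstart", "dtend"):
                if has_class(child, prop):
                    event[prop] = child.get("title")  # machine-readable date
            if has_class(child, "url"):
                event["url"] = child.get("href")
        events.append(event)
    return events

print(glean_events(SAMPLE))
```

The human-readable text stays the primary source; the class names carry just enough information to recover the calendar properties.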

SQL * URIs = SPARQL

Use GRDDL to aggregate data from friends etc, then...

table subject/property/value
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  PREFIX c: <http://www.w3.org/2002/12/cal/icaltzd#>
  SELECT ?name ?summary ?when
   FROM <myFriendsBlogsData>
   WHERE { ?somebody foaf:name ?name; foaf:mbox ?mbox.
           ?event c:summary ?summary;
                  c:dtstart ?when;
                  c:attendee [ c:calAddress ?mbox ]
         }
?name          ?summary           ?when
Tantek Çelik   Web 2.0            2005-10-05
Norm Walsh     XML 2005           2005-11-13
Dan Connolly   W3C tech plenary   2006-02-27

See SPARQL Query Language for RDF W3C Working Draft 21 July 2005, plus Norm's travel this year
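To see what the query asks of the data, here is the same join written out over a plain list of triples. It is a sketch of the pattern matching only, not a SPARQL engine; the sample triples, the `_:` blank-node labels and the "Unattended event" entry are made up for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
CAL = "http://www.w3.org/2002/12/cal/icaltzd#"

# Hypothetical aggregated data; "_:..." strings stand in for blank nodes.
triples = [
    ("_:p1", FOAF + "name", "Dan Connolly"),
    ("_:p1", FOAF + "mbox", "mailto:connolly@w3.org"),
    ("_:e1", CAL + "summary", "W3C tech plenary"),
    ("_:e1", CAL + "dtstart", "2006-02-27"),
    ("_:e1", CAL + "attendee", "_:a1"),
    ("_:a1", CAL + "calAddress", "mailto:connolly@w3.org"),
    ("_:e2", CAL + "summary", "Unattended event"),
    ("_:e2", CAL + "dtstart", "2006-01-01"),
]

def values(subject, prop):
    """All values of prop for the given subject."""
    return [o for s, p, o in triples if s == subject and p == prop]

def subjects(prop):
    """All subjects that have at least one value for prop."""
    return sorted({s for s, p, o in triples if p == prop})

def who_goes_where():
    """Join people to events through a shared mailbox, as the query does."""
    rows = []
    for person in subjects(FOAF + "name"):
        for name in values(person, FOAF + "name"):
            for mbox in values(person, FOAF + "mbox"):
                for event in subjects(CAL + "summary"):
                    for summary in values(event, CAL + "summary"):
                        for when in values(event, CAL + "dtstart"):
                            for attendee in values(event, CAL + "attendee"):
                                if mbox in values(attendee, CAL + "calAddress"):
                                    rows.append((name, summary, when))
    return rows

print(who_goes_where())
```

The mailbox is the only thing linking the person data to the calendar data, which is why the join works across sources that never coordinated.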

Fill in blanks with rules

Mix with some logical rules:

Full details: calendar background rules

A model behind hCalendar

person, place, dates

Can we do this event too?

RDF merges trivially

stack stack
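Why the merge is trivial can be shown with sets, as a minimal sketch. It assumes ground graphs, that is, no blank nodes; with blank nodes the two graphs' nodes would first have to be kept apart. The URIs and property names are placeholders.

```python
def merge(*graphs):
    """Merge ground RDF graphs: the result is the union of their statements."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

# Two hypothetical sources describing the same event.
blog = {
    ("http://example.org/ev#web2", "summary", "Web 2.0"),
    ("http://example.org/ev#web2", "dtstart", "2005-10-05"),
}
itinerary = {
    ("http://example.org/ev#web2", "dtstart", "2005-10-05"),
    ("http://example.org/ev#web2", "location", "The Argent, San Francisco"),
}

for statement in sorted(merge(blog, itinerary)):
    print(statement)
```

No schema alignment step is needed: a statement repeated in both sources collapses to one, and everything else simply accumulates.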

Partial Understanding

RDF statements* are independent. RDF semantics are monotonic.

RDF
Premise
<Book rdf:ID="book1">
 <dc:title>The Grapes of Wrath</dc:title>
 <dc:creator>Steinbeck</dc:creator>
</Book>
Conclusion
<Book rdf:ID="book1">
 <dc:title>The Grapes of Wrath</dc:title>
</Book>

XML
Premise
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
Conclusion
<!-- no, this does not follow -->
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>

*RDF/XML does have an rdf:parseType="Collection" syntax, which expands to a Lisp-style binary tree in the abstract syntax. This erasure property works not on XML elements, but on RDF statements.
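The erasure property on the RDF side can be sketched as a subset test. This covers only the simplest case, ground statements with no blank nodes, where being a subset is sufficient for the conclusion to follow; the `#book1` and `#Book` names are shorthand assumed for the example, and the XML Schema side is not modelled.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC = "http://purl.org/dc/elements/1.1/"

premise = {
    ("#book1", RDF_TYPE, "#Book"),
    ("#book1", DC + "title", "The Grapes of Wrath"),
    ("#book1", DC + "creator", "Steinbeck"),
}

# Erase the creator statement; what remains is still true.
conclusion = premise - {("#book1", DC + "creator", "Steinbeck")}

def follows_by_erasure(premise, conclusion):
    """A set of statements follows from any set that contains all of them."""
    return conclusion <= premise

print(follows_by_erasure(premise, conclusion))
```

A consumer that understands only dc:title can therefore ignore the rest of the graph without drawing any wrong conclusion.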

Merging hCalendar data

two events, one person, several days and places

OWL identity reasoning

Premise
# one-to-many
foaf:mbox a owl:InverseFunctionalProperty.

:dan foaf:mbox <mailto:connolly@w3.org>.
:dan foaf:name "Dan Connolly".

:daniel foaf:mbox <mailto:connolly@w3.org>.
:daniel foaf:name "Daniel W. Connolly".
Conclusion:
:daniel owl:sameAs :dan.
:daniel foaf:name "Dan Connolly".
:daniel foaf:name "Daniel W. Connolly".
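The inference above can be approximated in a few lines, as a sketch rather than an OWL reasoner. It handles only one pattern: subjects that share a value of an inverse functional property are identified, and statements about one are copied to the other. The `identity_closure` function is hypothetical, and the terms are kept as the abbreviated strings used in the premise.

```python
from collections import defaultdict

premise = {
    (":dan", "foaf:mbox", "<mailto:connolly@w3.org>"),
    (":dan", "foaf:name", "Dan Connolly"),
    (":daniel", "foaf:mbox", "<mailto:connolly@w3.org>"),
    (":daniel", "foaf:name", "Daniel W. Connolly"),
}
inverse_functional = {"foaf:mbox"}

def identity_closure(triples, ifps):
    """Add owl:sameAs links implied by inverse functional properties,
    plus the statements each alias inherits from the other (one pass only)."""
    # Step 1: group subjects by the (property, value) pair they share.
    by_value = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            by_value[(p, o)].add(s)
    same = {(a, b)
            for group in by_value.values()
            for a in group for b in group if a != b}
    # Step 2: whatever is said about one alias holds for the other.
    inferred = {(a, "owl:sameAs", b) for a, b in same}
    for a, b in same:
        for s, p, o in triples:
            if s == b:
                inferred.add((a, p, o))
    return triples | inferred

for statement in sorted(identity_closure(premise, inverse_functional)):
    print(statement)
```

A full reasoner would also substitute aliases in the value position and iterate to a fixed point; this single pass is enough to reproduce the conclusion shown.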

Consistency checking

Review

Acknowledgements, Colophon

These slides: http://www.w3.org/2002/12/cal/mash/slides