W3C SW Dev

HyperRDF: Using XHTML Authoring Tools with XSLT to produce RDF Schemas

Introduction

XML syntax is a little tedious, but lots of people are evidently willing and able to edit it by hand. RDF adds another layer of tedium, but there are still a few folks willing to write it by hand. I make heavy use of reification/quoting in my representation of logical formulas in RDF. This adds another layer of tedium that I find unmanageable, and I have been writing XML/SGML/HTML by hand for 10 years.

I have had a lot of success lately using XSLT to screen-scrape RDF out of XHTML pages, and I'm quite happy to use a hypertext editor (e.g. Amaya) to record my knowledge. I make use of the occasional class or rel attribute to distinguish the information that a particular XSLT transformation is looking for from stuff that just happens to be there for other reasons. For example, I can write a typed link:

<a rel="interest" href="http://www.w3.org/XML/">XML</a>

on my home page, and convert it to RDF ala:

<rdf:Description about="">
  <interest>
    <rdf:Description
         rdf:about="http://www.w3.org/XML/">
      <rdfs:label>XML</rdfs:label>
    </rdf:Description>
  </interest>
</rdf:Description>
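
As a rough illustration of the scraping step (in Python rather than XSLT, and not the transformation actually used here), the sketch below pulls typed links out of an XHTML page. The function name typed_links, the sample page, and the tuple layout are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

def typed_links(xhtml_text, base=""):
    """Yield (subject, relation, target, label) for each a element that
    carries both a rel and an href attribute. The subject is the page
    itself, written as the (possibly empty) URI reference `base`."""
    root = ET.fromstring(xhtml_text)
    for a in root.iter("{%s}a" % XHTML):
        rel = a.get("rel")
        href = a.get("href")
        if rel and href:
            label = "".join(a.itertext()).strip()
            yield (base, rel, href, label)

# Hypothetical home page: one typed link, one ordinary link.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>I work on <a rel="interest" href="http://www.w3.org/XML/">XML</a>
    and read <a href="http://www.w3.org/">other pages</a>.</p>
  </body>
</html>"""

links = list(typed_links(page))
for subject, rel, target, label in links:
    print(repr(subject), rel, target, label)
```

Only the link with a rel attribute is reported; the untyped link is the "stuff that just happens to be there for other reasons".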

But I want to go beyond the post-hoc/third-party style of screen-scraping and make it clear that I, the author of the web pages, am making the very RDF assertions that the XSLT transformation generates, when I write my web pages. And I'm starting to think that this technique is sufficiently useful that it will be deployed beyond the single-use transformations I have been doing, to a scale where managing collisions among link relationship names and class names is essential.

Grounding link relationships and class names in the Web

NOTE: This section is being reconsidered in light of GRDDL

The HTML 4.0 specification, in section 6.12 Link types, enumerates a few useful link relationships, and then adds:

Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types. Please see the profile attribute of the HEAD element for more details.

We hereby establish the following conventions used to define some link types:

First, a mechanism somewhat analogous to the binding of element and attribute name prefixes to URIs in Namespaces in XML: a link relationship name whose prefix matches the id attribute of the head element denotes the URI resulting from the concatenation of the profile URI (in absolute form) and the local part of the link relationship name. For example:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    <title>example</title>
    <link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  </head>
  ...
</html>

A relationship name containing no colon (':') character has an empty ("") prefix. The empty prefix should be declared explicitly ala <head id='' profile='...'> rather than by omitting the id attribute.
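
The first convention can be sketched as a small Python function. This is only my reading of the rule above: the name resolve_rel is hypothetical, the profile URI is assumed to be absolute already, and an undeclared prefix is assumed to denote nothing (None).

```python
def resolve_rel(name, head_id, profile):
    """Map a link relationship name to a URI, or None if its prefix
    is not the one bound by the head element's id attribute.

    name     -- value of a rel attribute, e.g. "rel:classes"
    head_id  -- value of the head element's id attribute, or None if
                the attribute is omitted (then no prefix is bound)
    profile  -- the head element's profile URI, assumed absolute
    """
    prefix, sep, local = name.partition(":")
    if not sep:
        # No colon: the whole name is the local part, with an empty prefix.
        prefix, local = "", name
    if head_id is None or prefix != head_id:
        return None
    return profile + local

profile = "http://www.w3.org/2000/07/hs78#"
print(resolve_rel("rel:classes", "rel", profile))
print(resolve_rel("classes", "", profile))      # empty prefix, declared explicitly
print(resolve_rel("classes", "rel", profile))   # empty prefix, not declared
```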

Second, we define a link relationship called classes that allows class names to denote URIs. A link element that uses this link relationship binds the prefix in its id attribute to the URI denoted by its href attribute. In the following example, the rel attribute refers to this classes link relationship, and the class attribute refers to the Rule class, described below.

<html xmlns="http://www.w3.org/1999/xhtml">
  <head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    <title>example</title>
    <link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  </head>
  <body>
    <dl class="c:Rule">
      ...
    </dl>
  </body>
</html>
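
A minimal Python sketch of the second convention follows. It assumes that a class name resolves by concatenating the bound href with the local part, by analogy with the rule for link relationships; the text above does not spell that out. The names class_bindings and resolve_class are made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
CLASSES = "http://www.w3.org/2000/07/hs78#classes"

def split_name(name):
    """Split 'prefix:local' into (prefix, local); no colon means prefix ''."""
    prefix, sep, local = name.partition(":")
    return (prefix, local) if sep else ("", name)

def class_bindings(root):
    """Collect prefix -> URI bindings from link elements whose rel
    resolves, via the head id/profile convention, to the classes relation."""
    head = root.find(XHTML + "head")
    head_id, profile = head.get("id"), head.get("profile", "")
    bindings = {}
    for link in head.iter(XHTML + "link"):
        prefix, local = split_name(link.get("rel", ""))
        if prefix == head_id and profile + local == CLASSES:
            if link.get("id") is not None:
                bindings[link.get("id")] = link.get("href")
    return bindings

def resolve_class(name, bindings):
    """Assumed rule: class URI = bound href + local part of the class name."""
    prefix, local = split_name(name)
    base = bindings.get(prefix)
    return None if base is None else base + local

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head id="rel" profile="http://www.w3.org/2000/07/hs78#">
    <title>example</title>
    <link id="c" rel="rel:classes" href="http://www.w3.org/2000/07/hs78#" />
  </head>
  <body>
    <dl class="c:Rule"></dl>
  </body>
</html>"""

root = ET.fromstring(doc)
bindings = class_bindings(root)
rule_uri = resolve_class("c:Rule", bindings)
print(bindings, rule_uri)
```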

@@hmm... I'm using the same URI for three mechanisms here: (a) link relationship namespace mechanism, (b) a namespace for a link relationship, (c) a namespace for three classes. I should probably provide separate URIs for each of those, and define this one as implying all three.

target-namespace (obsolete)

Declaring a hierarchy of classes

A div element bearing the global class name ClassTree declares a hierarchy of classes, one for each li element in the div element.

Here's an example from lists; note that we refer to the Seq class, but we declare the List class:

<div class="ClassTree">
<h2>Class hierarchy</h2>
<ul>
  <li><a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq">Seq</a>
   <ul>
    <li><b id="List">List</b> e.g. <em id="empty">empty</em></li>
   </ul>
  </li>
</ul>
</div>

Note that you can declare instances ala <em id="empty">empty</em> or <a href="..ref...">thatThing</a>. This markup is translated to the following RDF (see the whole file for details such as namespace declarations):

  <s:Class r:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq"
  s:label="Seq" />
  <s:Class r:ID="List" s:label="List">
    <s:subClassOf
    r:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq" />
  </s:Class>
  <r:Description r:ID="empty">
    <r:type r:resource="#List" />
  </r:Description>
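
To make the mapping concrete, here is a Python sketch that walks the same markup and lists the statements it implies. It is not the XSLT itself. It assumes that the first child element of each li that carries an href or id names the class, and that any later such element in the same li names an instance; the prefixed names (rdf:type and so on) are written as plain strings rather than full URIs.

```python
import xml.etree.ElementTree as ET

def ref(el):
    """URI reference for a naming element: its href, else '#' + its id."""
    return el.get("href") or "#" + el.get("id")

def walk(ul, parent, triples):
    """Append (subject, predicate, object) tuples for every li in ul."""
    for li in ul.findall("li"):
        named = [c for c in li
                 if c.tag != "ul" and (c.get("href") or c.get("id"))]
        if not named:
            continue
        cls = ref(named[0])
        triples.append((cls, "rdf:type", "rdfs:Class"))
        triples.append((cls, "rdfs:label", "".join(named[0].itertext())))
        if parent is not None:
            triples.append((cls, "rdfs:subClassOf", parent))
        for inst in named[1:]:          # assumed: later elements are instances
            triples.append((ref(inst), "rdf:type", cls))
        for sub in li.findall("ul"):    # nested lists declare subclasses
            walk(sub, cls, triples)

markup = """<div class="ClassTree">
<h2>Class hierarchy</h2>
<ul>
  <li><a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq">Seq</a>
   <ul>
    <li><b id="List">List</b> e.g. <em id="empty">empty</em></li>
   </ul>
  </li>
</ul>
</div>"""

div = ET.fromstring(markup)
triples = []
for ul in div.findall("ul"):
    walk(ul, None, triples)
for t in triples:
    print(t)
```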

@@TODO: unions, enumerated sets.

This direct translation of id attributes in HTML to id attributes in RDF relies on an assumption that the RDF will be made available at the same address as the HTML; i.e. they are variants of the same generic resource (in the sense of section 14.44 Vary in the HTTP specification; see also: Generic Resources). @@TODO: model this generic/variant relationship in RDF.

Declaring a property

An li element bearing the global class name Property declares a property whose URI and label are taken from the id attribute and content of the first element in the li element. The domain and range of the property are taken from the first and second a elements in the li element, respectively, if present. A p element in the li is taken as a comment. For example:

  <li class="Property"><b id="first">first</b>:
    <a href="#List">List</a> ->    anything
    <p>first(l, x) = x is the first item in l</p>
  </li>

is transformed to:

  <r:Property r:ID="first" s:label="first" s:domain="#List">
    <s:comment>first(l, x) = x is the first item in l</s:comment>
  </r:Property>
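
The extraction rules for a property can be sketched in Python as follows. This is an illustration under assumptions, not the stylesheet's own logic: describe_property and the dictionary layout are invented for the example, and only a elements that are direct children of the li are considered.

```python
import xml.etree.ElementTree as ET

def describe_property(li):
    """Read a property declaration out of an li element with class Property."""
    name = li[0]                 # first child element: carries id and label
    anchors = li.findall("a")    # first a = domain, second a = range
    comment = li.find("p")       # optional comment
    return {
        "id": name.get("id"),
        "label": "".join(name.itertext()),
        "domain": anchors[0].get("href") if len(anchors) > 0 else None,
        "range": anchors[1].get("href") if len(anchors) > 1 else None,
        "comment": "".join(comment.itertext()) if comment is not None else None,
    }

markup = """<li class="Property"><b id="first">first</b>:
    <a href="#List">List</a> ->    anything
    <p>first(l, x) = x is the first item in l</p>
  </li>"""

prop = describe_property(ET.fromstring(markup))
print(prop)
```

Here the range is left unset, since the markup names only one a element; that matches the RDF above, which has a domain but no range.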

You can link to the property (using <a href="...xyz">xyz</a>) as well as declaring it (using <b id="xyz">xyz</b>).

@@TODO: syntax for "facets", i.e. properties of properties; stuff like inverse, transitive, subproperty, etc.

Declaring an inference rule

A dl element bearing the global class name Rule declares an inference rule. It should have just one dt/dd pair: the dt is the conclusion of the rule, and the dd contains a list (ul) of premises (li elements). Each statement (i.e. premise or conclusion) is written as an element for the predicate, an element for the subject, and an element for the object. Each of the predicate, subject, and object elements is either an a or a var element. var elements represent variables, and a elements refer (by the URI reference in the href attribute) to constants. @@TODO: support for RDF literals, i.e. strings, using tt.
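
As a sketch of how such a rule might be read back out, the Python below parses one dl into a conclusion and a list of premises, each a (predicate, subject, object) triple of terms. The sample rule is invented for illustration (it is not taken from the lists schema), and the names term, statement, and parse_rule are hypothetical.

```python
import xml.etree.ElementTree as ET

def term(el):
    """A var element is a variable; an a element is a constant (its href)."""
    if el.tag == "var":
        return ("var", "".join(el.itertext()))
    return ("uri", el.get("href"))

def statement(container):
    """Read predicate, subject, object (in that order) from the a and
    var elements found inside a dt or li element."""
    terms = [term(el) for el in container.iter() if el.tag in ("a", "var")]
    if len(terms) != 3:
        raise ValueError("expected predicate, subject and object")
    predicate, subject, obj = terms
    return (predicate, subject, obj)

def parse_rule(dl):
    """Return (conclusion, premises) for a dl with one dt/dd pair."""
    conclusion = statement(dl.find("dt"))
    premises = [statement(li) for li in dl.findall("dd/ul/li")]
    return conclusion, premises

# Invented example: anything that has a first item is a List.
markup = """<dl class="Rule">
  <dt><a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">type</a>
      (<var>l</var>, <a href="#List">List</a>)</dt>
  <dd><ul>
    <li><a href="#first">first</a> (<var>l</var>, <var>x</var>)</li>
  </ul></dd>
</dl>"""

conclusion, premises = parse_rule(ET.fromstring(markup))
print(conclusion)
print(premises)
```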

See the lists schema for examples.

@@TODO: support for n-ary relations.

Currently, the only HTML structure interpreted as a knowledge representation by WebKB is the definition list. Its use is similar to the frame-oriented CG notation with strings as type names

2.1.6.1 HTML structures in The WebKB set of tools by Philippe MARTIN

Appendix: Notes on Amaya as an authoring tool

Appendix: Style sheet

I developed a style sheet for use with these markup conventions. It only works with the empty prefix (i.e. class="ClassTree", not class="my:ClassTree").

A general convention in this stylesheet is: underlined stuff is significant to the transformation to RDF (there are some exceptions: links in free text don't get transformed to RDF).

Appendix: Examples

template

aka stuff to revisit when I upgrade this transformation...


Dan Connolly
originally prepared for a meeting of 19-20 Jul 2000
$Revision: 1.35 $ of $Date: 2005/08/09 04:51:36 $ by $Author: connolly $