RDAL - RDf Annotations Language

Status

This document shows an example run of an experimental schema annotation system. Neither this document, nor the implementation, nor the schema annotation conventions are endorsed by the membership of W3C.

Abstract

There is, of course, a large amount of data available in non-RDF XML that would be useful to the semantic web. One can write XSLT templates to convert this data to an RDF/XML idiom, but that is tedious, error-prone, and not tied to the specification of the XML grammar. RDAL is an annotation convention for expressing semantic actions on the productions of an XML grammar. This implementation uses RelaxNG Compact Syntax as the annotation language and calls functions in a library that expresses RDF triples in the N-Triples syntax.

Note: RDAL is not confined to making RDF statements; that is merely the test scenario. The XQuery in the annotations may call APIs other than RdfXS.

HR Document Example

Suppose we have a colloquial XML document that expresses information for a human resources department: employee names, addresses, and department affiliations.

<per:Personel xmlns:per="http://example.com/Personel"
	      xmlns:addr="http://example.com/Address">
  <per:Person per:ID="bsmith">
    <per:given>Bob</per:given>
    <per:family>Smith</per:family>
    <per:email>bsmith@example.com</per:email>
    <per:addr per:href="#bsmith_addr"/>
  </per:Person>
  ...
  <per:Departement>
    <per:name>R-n-D</per:name>
    <per:manager per:href="#bsmith"/>
    ...
    <per:location per:href="#rnd_addr"/>
  </per:Departement>
  <addr:Address per:ID="rnd_addr">1 king street</addr:Address>
  <addr:Address per:ID="bsmith_addr">123 elm street</addr:Address>
  ...
</per:Personel>

We have a graph in mind and want some RDF triples out of this document, like:

<http://localhost/#bsmith> <http://xmlns.com/foaf/0.1/givenname> "Bob" .
<http://localhost/#bsmith> <http://xmlns.com/foaf/0.1/familyname> "Smith" .
<http://localhost/#bsmith> <http://xmlns.com/foaf/0.1/email> <mailto:bsmith@example.com> .
<http://localhost/#bsmith> <http://example.com/HR#addr> <http://www.w3.org/2004/02/03-rdal/HR.xml#bsmith_addr> .
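
For comparison, the mapping from document to triples can be written by hand. The following is a minimal Python sketch, not part of RDAL: it uses the standard library's ElementTree, covers only per:Person, and assumes one base URI for both the subject and the address reference. The helper names (per, person_triples) and the SAMPLE string are illustrative.

```python
import xml.etree.ElementTree as ET

PER = "http://example.com/Personel"   # namespace used by the example document
FOAF = "http://xmlns.com/foaf/0.1/"
HR = "http://example.com/HR#"

# A cut-down copy of the example document (one Person only).
SAMPLE = """<per:Personel xmlns:per="http://example.com/Personel">
  <per:Person per:ID="bsmith">
    <per:given>Bob</per:given>
    <per:family>Smith</per:family>
    <per:email>bsmith@example.com</per:email>
    <per:addr per:href="#bsmith_addr"/>
  </per:Person>
</per:Personel>"""


def per(local_name):
    """Return the ElementTree qualified name for a name in the per: namespace."""
    return "{%s}%s" % (PER, local_name)


def person_triples(xml_text, base_uri):
    """Yield one N-Triples line per fact about each per:Person.

    Literal escaping is omitted; the sample values contain nothing to escape.
    """
    root = ET.fromstring(xml_text)
    for person in root.iter(per("Person")):
        subject = "<%s#%s>" % (base_uri, person.get(per("ID")))
        yield '%s <%sgivenname> "%s" .' % (subject, FOAF, person.findtext(per("given")))
        yield '%s <%sfamilyname> "%s" .' % (subject, FOAF, person.findtext(per("family")))
        yield "%s <%semail> <mailto:%s> ." % (subject, FOAF, person.findtext(per("email")))
        href = person.find(per("addr")).get(per("href"))  # e.g. "#bsmith_addr"
        yield "%s <%saddr> <%s%s> ." % (subject, HR, base_uri, href)


if __name__ == "__main__":
    for line in person_triples(SAMPLE, "http://localhost/"):
        print(line)
```

Code like this has to be kept in step with the grammar by hand, which is the maintenance burden the annotations are meant to remove.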

HR Annotations

This example shows the HR schema, written in RelaxNG Compact Syntax, with RDAL annotations added.

Style Key

orig
  original schema
rdal
  RDAL schema annotations
xquery
  XQuery content
RdfXS
  RdfXS API
comment
  rnc comment

HR-rdal.rnc

# default namespace = "http://example.com/Personel"
namespace per = "http://example.com/Personel"
namespace addr = "http://example.com/Address"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace a = "http://www.w3.org/2002/12/26-XMLgrammer2RDFdb/annot#"
namespace foaf = "http://xmlns.com/foaf/0.1/"
namespace r = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

start = doc

doc = 
  Personel
  # Function declarations.
  >> a:prototype ["declare function addTriple($predicate, $subject, $object) \x{a}" ~ 
                  "declare function addTriple_lit($predicate, $subject, $object) \x{a}" ~ 
                  "declare function addTriple_ref($predicate, $subject, $object, $baseUri) \x{a}" ~ 
                  "declare function error($hint, $expected)"]
  # All the predicates are here for easy maintenance.
  >> a:globals ["declare global $baseURI:='-- need base URI parameter --' \x{a}" ~ 
                "declare global $foaf:='http://xmlns.com/foaf/0.1/' \x{a}" ~ 
                "declare global $hr:='http://example.com/HR#' \x{a}" ~ 
                "declare global $addr:='http://example.com/Addr#' \x{a}" ~ 
                "declare global $Given:=concat('&lt;', $foaf, 'givenname&gt;') \x{a}" ~ 
                "declare global $Family:=concat('&lt;', $foaf, 'familyname&gt;') \x{a}" ~ 
                "declare global $Email:=concat('&lt;', $foaf, 'email&gt;') \x{a}" ~ 
                "declare global $Addr:=concat('&lt;', $hr, 'addr&gt;') \x{a}" ~ 
                "declare global $Manager:=concat('&lt;', $hr, 'manager&gt;') \x{a}" ~ 
                "declare global $Grunt:=concat('&lt;', $hr, 'grunt&gt;') \x{a}" ~ 
                "declare global $Location:=concat('&lt;', $hr, 'location&gt;') \x{a}" ~ 
                "declare global $StreetAddr:=concat('&lt;', $addr, 'streetAddr&gt;')"]

Personel = 
  element per:Personel {
     PersonelElts+
  }

PersonelElts = 
  Person | 
  Department | 
  Address

Person = 
  element per:Person {
      # The subject of the following triples comes from the @per:ID.
      [a:assignment["let $subject:=concat('&lt;', $baseURI, '#', @per:ID, '&gt;')"]]attribute per:ID { xsd:NMTOKEN },
      element per:given { [a:action["addTriple_lit($Given, $subject, text())"]]Name },
      element per:family { [a:action["addTriple_lit($Family, $subject, text())"]]Name },
      element per:email { [a:action["addTriple($Email, $subject, concat('&lt;mailto:', text(), '&gt;'))"]]text },
      element per:addr {
        [a:action["addTriple_ref($Addr, $subject, @per:href, $baseURI)"]]attribute per:href { Ref }
      }
    }

Department = 
  element per:Departement {
      [a:assignment["let $subject:=concat('&lt;', $baseURI, '#', per:name/text(), '&gt;')"]]element per:name { text },
      element per:manager {
        [a:action["addTriple_ref($Manager, $subject, @per:href, $baseURI)"]]attribute per:href { Ref }
      },
      element per:grunt {
        [a:action["addTriple_ref($Grunt, $subject, @per:href, $baseURI)"]]attribute per:href { Ref }
      }+,
      element per:location {
        [a:action["addTriple_ref($Location, $subject, @per:href, $baseURI)"]]attribute per:href { Ref }
      }
    }

Address = 
  element addr:Address {
    attribute per:ID { xsd:NMTOKEN },
    text
} >> a:action ["addTriple_lit($StreetAddr, concat('&lt;', $baseURI, '#', @per:ID, '&gt;'), text())"]

Ref =
  string
# text
# xsd:NCName

Name =
  text
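
The schema declares the RdfXS functions only by prototype. As a rough guide to what the actions compute, here is a Python sketch of the three calls. The behavior is an assumption inferred from the prototypes and from the sample triples, not the actual RdfXS library; the snake_case names and the escape_literal helper are illustrative.

```python
def add_triple(predicate, subject, obj):
    """Return one N-Triples statement from three already-serialized terms.

    The argument order (predicate, subject, object) follows the prototypes
    declared in the schema.
    """
    return "%s %s %s ." % (subject, predicate, obj)


def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def add_triple_lit(predicate, subject, text):
    """Emit a statement whose object is a plain literal."""
    return add_triple(predicate, subject, '"%s"' % escape_literal(text))


def add_triple_ref(predicate, subject, href, base_uri):
    """Emit a statement whose object is a URI reference.

    Assumption: a same-document reference such as "#bsmith_addr" is appended
    to the base URI; any other value is used unchanged.
    """
    target = base_uri + href if href.startswith("#") else href
    return add_triple(predicate, subject, "<%s>" % target)


if __name__ == "__main__":
    given = "<http://xmlns.com/foaf/0.1/givenname>"
    subject = "<http://localhost/#bsmith>"
    print(add_triple_lit(given, subject, "Bob"))
```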

Invoked Commands

Take a RelaxNG schema, add some RDAL annotations, and run rngSerializer -m xsl to get an XSLT stylesheet. Run that stylesheet on an instance document to get N-Triples.

./rngSerializer -m xsl ../test/rng/HR-rdal.rnc -l RdfXStoNTriple.xsl > HR.xsl
xsltproc --stringparam baseURI http://www.w3.org/2004/02/03-rdal/HR.xml HR.xsl ../test/rng/HR.xml > HR.ntriple

To get a graph image:

./rngSerializer -m xsl ../test/rng/HR-rdal.rnc -l RdfXStoDot.xsl > HRdot.xsl
xsltproc --stringparam baseURI http://www.w3.org/2004/02/03-rdal/HR.xml HRdot.xsl ../test/rng/HR.xml > HR-img.dot
# Impose some namespaces.
perl -pi -e "s|http://example.com/HR#|hr:|g" HR-img.dot
perl -pi -e "s|http://example.com/Addr#|addr:|g" HR-img.dot
perl -pi -e "s|http://xmlns.com/foaf/0.1/|foaf:|g" HR-img.dot
perl -pi -e "s|http://www.w3.org/2004/02/03-rdal/HR.xml#|doc:|g" HR-img.dot
dot -T png -o HR-img.png HR-img.dot
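
The namespace abbreviation step can also be sketched in Python. This is an illustrative equivalent of the four perl substitutions above, with the same prefix table; the function name abbreviate is an assumption. Unlike the perl patterns, where an unescaped "." matches any character, it does plain string replacement.

```python
# Prefix table taken from the perl commands above.
PREFIXES = {
    "http://example.com/HR#": "hr:",
    "http://example.com/Addr#": "addr:",
    "http://xmlns.com/foaf/0.1/": "foaf:",
    "http://www.w3.org/2004/02/03-rdal/HR.xml#": "doc:",
}


def abbreviate(text, prefixes=PREFIXES):
    """Replace each namespace URI in text with its short prefix.

    Longer URIs are replaced first so that a URI which is a prefix of
    another cannot shadow it.
    """
    for uri in sorted(prefixes, key=len, reverse=True):
        text = text.replace(uri, prefixes[uri])
    return text


if __name__ == "__main__":
    print(abbreviate('"http://xmlns.com/foaf/0.1/givenname"'))
```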

Process

rngSerializer parses the RNC into a SchemaValidationCompileTree. It then calls toXsl on the objects in this tree. These calls generate a set of XSLT templates that traverse the entire grammar, validating the input document against the grammar. Another component, the Rdal handler, gets callbacks for everything in the rdal namespace and adds output text to the templates. In short, rngSerializer is a small tool that connects the output of a RelaxNG parser to the Rdal handler and prints the result.

Given a schema with no annotations, rngSerializer will yield a stylesheet that validates the input document, but produces no output. This is comparable to the step of hand-generating XSLT minus output text.
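
The division of labor described above can be illustrated with a toy model. The Python sketch below is not the rngSerializer code: the class names (ElementNode, ValueOfHandler), the to_xsl and on_annotation methods, and the shape of the generated templates are all assumptions made for illustration. It shows only that the tree emits the traversal templates while a separate handler supplies the output text for annotations.

```python
from dataclasses import dataclass, field


@dataclass
class ElementNode:
    """Hypothetical compile-tree node for one element production."""
    name: str
    children: list = field(default_factory=list)
    annotations: list = field(default_factory=list)  # raw annotation strings

    def to_xsl(self, handler):
        """Return the template lines for this node and its descendants."""
        lines = ['<xsl:template match="%s">' % self.name]
        for note in self.annotations:
            # Output text comes only from the handler callback.
            lines.append("  " + handler.on_annotation(note))
        for child in self.children:
            lines.append('  <xsl:apply-templates select="%s"/>' % child.name)
        lines.append("</xsl:template>")
        for child in self.children:
            lines.extend(child.to_xsl(handler))
        return lines


class ValueOfHandler:
    """Hypothetical handler: turns an annotation into an output instruction."""

    def on_annotation(self, note):
        return '<xsl:value-of select="%s"/>' % note


if __name__ == "__main__":
    tree = ElementNode("R", [ElementNode("A", annotations=["text()"])])
    print("\n".join(tree.to_xsl(ValueOfHandler())))
```

With no annotations in the tree, the handler is never called and the generated templates only traverse the document, which mirrors the no-output case described above.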

Next Steps

Order enforcement

RelaxNG describes the production for an element or attribute in terms of a name class, which defines the allowable names, and a pattern, which defines the content model. The content model is a set of elements, attributes, text, and other productions, combined by operators that describe sequence and choice (and, for attributes and text, the allowed values). Take the simple case of a root R containing two elements, A followed by B:

start = element R {
  element A { empty },
  element B { empty }
}

The order can be enforced with XSLT:

<!-- root: . A,B<END> -->
<xsl:template match="/R">
  <xsl:call-template name="AB_A">
    <!-- XPath positions start at 1 -->
    <xsl:with-param name="__INDEX" select="1"/>
  </xsl:call-template>
</xsl:template>

<xsl:template name="AB_A">
  <xsl:param name="__INDEX"/>

  <xsl:for-each select="*[position() = $__INDEX]">
    <xsl:choose>
      <xsl:when test="self::A">
        <!-- A: . <END> -->
        <xsl:call-template name="_END">
          <xsl:with-param name="__INDEX" select="1"/>
        </xsl:call-template>

        <!-- root: A, . B<END> -->
        <!-- call-template takes no select; for-each moves the context back to the parent -->
        <xsl:for-each select="..">
          <xsl:call-template name="AB_B">
            <xsl:with-param name="__INDEX" select="$__INDEX + 1"/>
          </xsl:call-template>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="error"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>

<xsl:template name="AB_B">
  <xsl:param name="__INDEX"/>

  <xsl:for-each select="*[position() = $__INDEX]">
    <xsl:choose>
      <xsl:when test="self::B">
        <!-- B: . <END> -->
        <xsl:call-template name="_END">
          <xsl:with-param name="__INDEX" select="1"/>
        </xsl:call-template>

        <!-- root: A,B . <END> -->
        <xsl:for-each select="..">
          <xsl:call-template name="_END">
            <xsl:with-param name="__INDEX" select="$__INDEX + 1"/>
          </xsl:call-template>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="error"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>

<xsl:template name="_END">
  <xsl:param name="__INDEX"/>

  <!-- ... . <END> : any element found at this position is an error -->
  <xsl:for-each select="*[position() = $__INDEX]">
    <xsl:call-template name="error"/>
  </xsl:for-each>
</xsl:template>

Template names (AB_A, AB_B, ...) are helpful to the reader, but will need to be made unique to prevent name collisions in some grammars (even in XML DTDs, two elements may have the same content model).

I think this step will allow XSLT to represent the DFA coming from a RelaxNG schema and thus offer complete validation. This will also make the semantic actions dispatch provably reliable.
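
The state stepping that these templates perform can be modeled outside XSLT. The Python sketch below is an illustration under assumptions, not generated code: the function name validate_sequence is hypothetical, it handles only a fixed sequence of element names, and it also reports a missing element, a check that the templates in this section do not make.

```python
def validate_sequence(children, expected):
    """Check that children (element names, in document order) match expected.

    One loop iteration corresponds to one state of the sequence, and index
    plays the role of the __INDEX parameter. Returns a list of error
    messages, which is empty when the content is valid.
    """
    index = 0
    for name in expected:
        if index >= len(children):
            return ["missing <%s> at position %d" % (name, index + 1)]
        if children[index] != name:
            return ["expected <%s> at position %d, found <%s>"
                    % (name, index + 1, children[index])]
        index += 1
    # The <END> state: nothing may follow the last expected element.
    if index < len(children):
        return ["unexpected <%s> at position %d" % (children[index], index + 1)]
    return []


if __name__ == "__main__":
    print(validate_sequence(["A", "B"], ["A", "B"]))
    print(validate_sequence(["B", "A"], ["A", "B"]))
```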

Other Schema Annotation Systems

schema_extraction
  Mark Nottingham's early proposal based on W3C XML Schema, described in this text from MNot.
Annotating Schemas for the Semantic Web
  Joseph Reagle's proposal and implementation for extracting triples from schemas following normal forms.
Combining RDF and XML Schemas to Enhance Interoperability Between Metadata Application Profiles
  Asserts RDF Schema constraints within W3C XML Schema annotations. Presented at WWW10.
Using RDAL to Express RDF Database API Calls in an XML Grammer Language
  An XML grammar language for making RDF database API calls.

Bugs

XQuery grammar incorrect:
Rather than implement a full XQuery parser, I took a stab at a very terse subset of XQuery. In the process, I didn't get real paths working, so @foo/ns.bar will give you a parser exception. Consequently, some of the triples are incorrect.

$Id: Overview.html,v 1.14 2004/06/25 09:37:40 eric Exp $

Eric Prud'hommeaux