This document describes an XSLT stylesheet that transforms RDF/XML (application/rdf+xml) into a series of RDF database API calls. It also describes a schema annotation system for generating that XSLT, as well as other grammar-defined applications. This research is used in RDAL.


This document represents some experiments in using XSLT to parse RDF. This is not endorsed by the W3C membership.




This basic RDF API has four syntactically differentiated datatypes (currently without constructors):

and five functions:

where the parameters $predicate, $subject, $object should all be interned from the database atom dictionary.
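To make "interned from the database atom dictionary" concrete, here is a minimal Python sketch. AtomDictionary, TripleSink and the in-memory triple list are hypothetical stand-ins for the database. Only the names addTriple, startDocument and endDocument come from this document; their signatures are assumptions, with the (predicate, subject, object) argument order taken from the collection pseudocode below.

```python
class AtomDictionary:
    """Hypothetical stand-in for the database atom dictionary: strings <-> small ints."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def intern(self, s):
        # Reuse the existing atom for s, or allocate the next integer id.
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, atom):
        return self._strings[atom]


class TripleSink:
    """Toy API sink. Method names come from this document; signatures are assumed."""

    def __init__(self):
        self.atoms = AtomDictionary()
        self.triples = []
        self._open = False

    def startDocument(self):
        self._open = True

    def endDocument(self):
        self._open = False

    def addTriple(self, predicate, subject, obj):
        # Arguments are interned atoms (ints), never raw strings.
        if not self._open:
            raise RuntimeError("addTriple called outside startDocument/endDocument")
        self.triples.append((predicate, subject, obj))


sink = TripleSink()
atom = sink.atoms.intern
sink.startDocument()
sink.addTriple(atom("http://example.org/stuff/1.0/hasFruit"),
               atom("http://example.org/basket"),
               atom("http://example.org/banana"))
sink.endDocument()
```

Interning once means the store compares small integers rather than URI strings; the string is recovered with lookup only when a presentation (such as ntriples) needs it.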

RdfXS Todo

It seems practical to add the following constructors:

uri: intern a URI string into the database internal representation
bnode: get a bnode object from the database object dictionary
literal(string, [{prefix1, ns1}, ...]): XML-encoded literal data
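A minimal Python sketch of what the uri and bnode constructors might do. The Constructors class, the separate uri_from_id helper for rdf:ID, and the "_:bN" labels are all hypothetical. It shows two properties the templates rely on: relative references resolve against the document base (rdf:ID="x" names base#x), and the same label, whether an rdf:nodeID value or a generate-id() value, always yields the same bnode within one document. The literal constructor is not sketched.

```python
from itertools import count
from urllib.parse import urljoin


class Constructors:
    """Hypothetical uri/bnode constructors in the spirit of the RdfXS Todo list."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self._bnodes = {}        # label -> bnode name, scoped to one document
        self._fresh = count(1)

    def uri(self, ref, base_uri=None):
        # Resolve a (possibly relative) URI reference against the base.
        return urljoin(base_uri or self.base_uri, ref)

    def uri_from_id(self, rdf_id):
        # rdf:ID="x" names base#x.
        return urljoin(self.base_uri, "#" + rdf_id)

    def bnode(self, label):
        # The same label always maps to the same bnode within this document.
        if label not in self._bnodes:
            self._bnodes[label] = "_:b%d" % next(self._fresh)
        return self._bnodes[label]


c = Constructors("http://example.org/doc")
```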

XSLT variables cannot be reassigned, which limits variable assignment to an awkward construction: segment the template and pass all of the state into the new second segment of the template. In a terse syntax, this looks roughly like:

  # Call typedNode with a predicate and a subject.
  call-template typedNode_0(predicate="p1", subject="s1")

  # Template for the typedNode production: choose the object, then chain.
  # The cases are mutually exclusive (an xsl:choose, not a series of xsl:ifs).
  template typedNode_0 (predicate, subject)
    if (@r:about)       call-template typedNode_1(predicate, subject, object=uri(@r:about))
    else if (@r:ID)     call-template typedNode_1(predicate, subject, object=uri(@r:ID, baseUri))
    else if (@r:nodeID) call-template typedNode_1(predicate, subject, object=bnode(@r:nodeID))
    else                call-template typedNode_1(predicate, subject, object=bnode(generate-id(.)))

  # Chained typedNode production with the object variable set.
  template typedNode_1 (predicate, subject, object)
    # Continue the typedNode template with object set.
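The same two-segment shape in Python, as a sketch. Python could simply reassign, so this only mirrors the control flow: the first segment picks the object, and everything it knows travels as parameters into the second. Elements are modeled as plain dicts with an "attrs" map and a "gen_id" standing in for generate-id(.). Those, BASE_URI, the N-Triples-style term strings, and having typedNode_1 emit the (subject, predicate, object) triple are all assumptions.

```python
# Hypothetical base URI; the pseudocode passes it in as baseUri.
BASE_URI = "http://example.org/doc"


def typed_node_0(elem, predicate, subject, out):
    """First segment: pick the object term, then hand all state to the next segment."""
    attrs = elem["attrs"]
    if "rdf:about" in attrs:
        obj = "<%s>" % attrs["rdf:about"]
    elif "rdf:ID" in attrs:
        obj = "<%s#%s>" % (BASE_URI, attrs["rdf:ID"])
    elif "rdf:nodeID" in attrs:
        obj = "_:" + attrs["rdf:nodeID"]
    else:
        obj = "_:" + elem["gen_id"]  # stands in for generate-id(.)
    # XSLT cannot rebind obj later, so it travels as a parameter instead.
    typed_node_1(elem, predicate, subject, obj, out)


def typed_node_1(elem, predicate, subject, obj, out):
    """Second segment: the rest of the production, with obj already bound.

    elem is passed along because a fuller version would go on to its children.
    """
    out.append((subject, predicate, obj))
```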

Making uri, bnode and literal real templates would require a version of each template for every possible set of parameters passed on to the next template. Yeah, right. It may be easier to implement them as a sort of macro that gets expanded when the XSLT is written.
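One way to read the macro idea: when generating the stylesheet, expand each constructor inline into the select expression of an xsl:with-param, so no constructor ever needs to be a template. Below is a Python sketch using ElementTree. The MACROS table, the N-Triples-style string encoding of terms, the uriFromID case and the $baseUri stylesheet parameter are assumptions, and chain_template and call_next are hypothetical names. The xsl:choose, xsl:when, xsl:otherwise, xsl:call-template and xsl:with-param elements are standard XSLT 1.0.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)

# Hypothetical macro table: each constructor expands inline to an XPath 1.0
# expression producing an N-Triples-style term string (an assumed encoding).
MACROS = {
    "uri": lambda arg: "concat('<', %s, '>')" % arg,
    "uriFromID": lambda arg: "concat('<', $baseUri, '#', %s, '>')" % arg,
    "bnode": lambda arg: "concat('_:', %s)" % arg,
}


def q(tag):
    """Qualify an XSLT element name for ElementTree."""
    return "{%s}%s" % (XSL, tag)


def call_next(parent, next_template, params, obj_expr):
    # Pass every existing parameter through unchanged, plus the new 'object'.
    call = ET.SubElement(parent, q("call-template"), name=next_template)
    for p in params:
        ET.SubElement(call, q("with-param"), name=p, select="$" + p)
    ET.SubElement(call, q("with-param"), name="object", select=obj_expr)


def chain_template(name, next_template, params, cases, default):
    """Emit a template that picks 'object' by case analysis, then chains onward."""
    tmpl = ET.Element(q("template"), name=name)
    for p in params:
        ET.SubElement(tmpl, q("param"), name=p)
    choose = ET.SubElement(tmpl, q("choose"))
    for test, (macro, arg) in cases:
        when = ET.SubElement(choose, q("when"), test=test)
        call_next(when, next_template, params, MACROS[macro](arg))
    otherwise = ET.SubElement(choose, q("otherwise"))
    macro, arg = default
    call_next(otherwise, next_template, params, MACROS[macro](arg))
    return tmpl


tmpl = chain_template(
    "typedNode_0", "typedNode_1", ["predicate", "subject"],
    [("@rdf:about", ("uri", "@rdf:about")),
     ("@rdf:ID", ("uriFromID", "@rdf:ID")),
     ("@rdf:nodeID", ("bnode", "@rdf:nodeID"))],
    ("bnode", "generate-id(.)"))
xslt_text = ET.tostring(tmpl, encoding="unicode")
```

Because the expansion happens at generation time, each chained template needs only one version regardless of which constructor produced the object.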


This language is called XQ-like because it resembles XQuery in syntax and even shares some semantics, such as variable assignment and XPath node access. It is intended to use a very small subset of XQuery. That subset currently excludes access to parent and child nodes (apart from the current element's own attribute nodes) in order to keep the SAX event handlers simple. It will be possible to write handlers that track state for access to XPath nodes during other events, but that seemed like work so I punted. (Some of this has to be done to distinguish productions that differ in their nested elements.)
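To see why the restriction keeps handlers simple: with Python's xml.sax and namespace processing turned on, each startElementNS call receives only the element's name and its own attributes, which is exactly what the XQ-like subset may touch. The handler class and the tuples it records are illustrative.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class AttributeOnlyHandler(ContentHandler):
    """Dispatch on each start tag using only the element name and its own attributes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair; attrs are keyed the same way.
        # No parent or child nodes are consulted, matching the XQ-like subset.
        self.events.append((name[1], attrs.get((RDF, "about")), attrs.get((RDF, "nodeID"))))


def scan(data):
    handler = AttributeOnlyHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.events
```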

Bits from XPath


Bits from XQuery

let $f:=
declare function foo

Fear, Uncertainty, Doubt

Generating the collection template is going to be hard. I fear it. I rue the day.

RDF is unordered by default. The rdf:parseType="Collection" attribute is used to specify ordered, closed (exhaustively enumerated) collections in RDF. For example,

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/basket">
    <ex:hasFruit rdf:parseType="Collection">
      <rdf:Description rdf:about="http://example.org/banana"/>
      <rdf:Description rdf:about="http://example.org/apple"/>
      <rdf:Description rdf:about="http://example.org/pear"/>
    </ex:hasFruit>
  </rdf:Description>
</rdf:RDF>

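states that the basket has exactly the members banana, apple and pear, in that order. This is represented by a graph of 7 triples: the basket's ex:hasFruit points at the first list node, each list node points at its member via rdf:first and at the next list node via rdf:rest, and the last rdf:rest points at rdf:nil. This nil node indicates the end of the list (and keeps anyone from adding to the closed list).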

The tricky part is adding the rdf:rest arc from the last list node to rdf:nil. RDFXMLtoRdfXS currently uses the test ./*[$index+1] to find the last element in a collection and has some conditional code to stitch the earlier elements together. This lookahead breaks the easy mapping to SAX handlers: when a streaming parser reports a member's start tag, it cannot yet know whether another member follows. I guess this will require some of the state-tracking handlers alluded to in XQ-like.
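A sketch of such a state-tracking handler, in Python. Instead of asking whether a next member exists, it remembers the latest list node and writes that node's rdf:rest only when the next member's start tag or the collection's end tag arrives. The class, its member/end callbacks and the "_:cellN" labels are hypothetical; terms are N-Triples-style strings.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST = "<%sfirst>" % RDF
REST = "<%srest>" % RDF
NIL = "<%snil>" % RDF


class CollectionBuilder:
    """Streaming sketch: each rdf:rest arc is emitted one member late, so no lookahead."""

    def __init__(self, subject, predicate, out):
        self.subject = subject
        self.predicate = predicate
        self.out = out
        self.cell = None  # list node of the latest member; its rdf:rest is still unknown
        self.count = 0    # a real parser would use a document-wide bnode dictionary

    def member(self, node):
        """Call on each member's start tag with that member's node term."""
        self.count += 1
        cell = "_:cell%d" % self.count
        if self.cell is None:
            # First member: the property points at the head of the list.
            self.out.append((self.subject, self.predicate, cell))
        else:
            # Only now do we learn that the previous member was not the last.
            self.out.append((self.cell, REST, cell))
        self.out.append((cell, FIRST, node))
        self.cell = cell

    def end(self):
        """Call on the property element's end tag."""
        if self.cell is None:
            self.out.append((self.subject, self.predicate, NIL))  # empty collection
        else:
            self.out.append((self.cell, REST, NIL))
```

For the basket example, three member() calls followed by one end() yield the 7 triples.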

Parsing Collections with XSLT

The hand-coded stylesheet uses a recursive template to walk through children (members of the collection):

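  if (@parseType = 'Collection')
    if (./*)
      addTriple(predicate, subject, cell(1))
      collection_r(cell(1), 1)
    else
      addTriple(predicate, subject, r:nil)

  # cell(n) is the list node for the nth member. The prefix keeps it distinct
  # from the member's own bnode(generate-id(.)).
  cell(n) = bnode(concat('cell-', generate-id(./*[n])))

  collection_r(subject, index)
    for-each select="./*[$index]"
      typedNode_0(r:first, subject)
    if (./*[$index+1])
      addTriple(r:rest, subject, cell(index+1))
      collection_r(cell(index+1), index+1)
    else
      addTriple(r:rest, subject, r:nil)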
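A Python transcription of the recursive template, handy for checking the triple count against the basket example (3 members give 7 triples). Members are given as a list of already-built node terms rather than produced by typedNode_0, and cell() is a hypothetical stand-in for the list-node naming. It recurses once per member, so in Python very long lists would hit the default recursion limit.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST = "<%sfirst>" % RDF
REST = "<%srest>" % RDF
NIL = "<%snil>" % RDF


def cell(n):
    # Hypothetical stand-in for bnode(concat('cell-', generate-id(./*[n]))).
    return "_:cell-%d" % n


def collection(predicate, subject, members, out):
    """Entry point for parseType="Collection"; members are already-built node terms."""
    if members:
        out.append((subject, predicate, cell(1)))
        collection_r(cell(1), 1, members, out)
    else:
        out.append((subject, predicate, NIL))


def collection_r(subject, index, members, out):
    """One step of the recursion; index is 1-based like ./*[$index]."""
    out.append((subject, FIRST, members[index - 1]))
    if index < len(members):  # the ./*[$index+1] lookahead
        out.append((subject, REST, cell(index + 1)))
        collection_r(cell(index + 1), index + 1, members, out)
    else:
        out.append((subject, REST, NIL))
```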


So far, I've only tested the hand-generated XSLT on a few RDF tests:

name               input          output            problems
kitchen sink test  test.rdf       test.ntriple      needs XSLT for c14n
attribute          testAttr.rdf   testAttr.ntriple
literal            literal.rdf    literal.ntriple

The current machine-generated XSLT is much more rigorous, though not yet functional. Features:

attribute test
Test that all attributes are allowed with a given production.
production selection -- not working
I actually have a version of this that's a few hours closer to working, if I can recover my disk drive.
template chaining -- not generalized
This is needed for productions that make assignments while staying in the same production, specifically parseType="Collection".
semantic actions dispatch -- not started
Provide variable assignment and calls to templates described with function declarations. This will lean on template chaining for variable assignment.
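A Python sketch of the attribute test: given the production being tried and an element's attributes, check that every rdf: attribute is permitted and that mutually exclusive ones do not co-occur. The production names follow the RDF/XML grammar, but the ALLOWED and EXCLUSIVE tables are simplified illustrations, and attributes_allowed is a hypothetical name.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative table, not the full RDF/XML grammar: the rdf: attributes each
# production accepts. Non-rdf: attributes are treated as property attributes
# and accepted everywhere, which is a simplification.
ALLOWED = {
    "nodeElement": {"ID", "nodeID", "about"},
    "resourcePropertyElt": {"ID"},
    "parseTypeCollectionPropertyElt": {"ID", "parseType"},
}
# Groups of which at most one attribute may appear on a given production.
EXCLUSIVE = {"nodeElement": {"ID", "nodeID", "about"}}


def attributes_allowed(production, attr_names):
    """attr_names: iterable of (namespace URI, local name) pairs, as SAX reports them."""
    rdf_attrs = {local for ns, local in attr_names if ns == RDF}
    if not rdf_attrs <= ALLOWED[production]:
        return False
    return len(rdf_attrs & EXCLUSIVE.get(production, set())) <= 1
```

Production selection could then try candidate productions in turn and keep the first one whose attribute test passes.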

Revision History

The bulk of the work is in the RDFXMLtoRdfXS.xsl script, so versions track its CVS version:

added startDocument and endDocument functions to the API.
wrote a dot presentation.
separated ntriple presentation via the RdfXS API.
imported from rdfToDB and output ntriples. Added support for parseType="Collection" and "Literal".

This work started with rdfToDB.xsl, which is no longer being maintained:

added parseType="Literal" support
s/GENID/genid/ for consistency
import of Evan Lenz's work


Eric Prud'hommeaux
Last modified: Sat Jan 3 04:40:19 EST 2004