Harvesting RDF Statements from XLinks

1 Introduction

The XLink specification [XLink] defines ways for XML documents to establish hyperlinks between resources. The Resource Description Framework specification [RDF] defines a framework for the provision of machine-understandable information about web resources.

Both XLink and RDF provide a way of asserting relations between resources. RDF is primarily for describing resources and their relations, while XLink is primarily for specifying and traversing hyperlinks. However, the overlap between the two is sufficient that a mapping from XLink links to statements in an RDF model can be defined. Such a mapping allows XLink elements to be harvested as a source of RDF statements. XLink links (hereafter, "links") thus provide an alternate syntax for RDF information that may be useful in some situations.

This Note specifies such a mapping, so that links can be harvested and RDF statements generated. The purpose of this harvesting is to create RDF models that, in some sense, represent the intent of the XML document. The purpose is not to represent the XLink structure in enough detail that a set of links could be round-tripped through an RDF model.

Readers of this Note are assumed to be familiar with [XLink] and [RDF]. Terms that are defined in those specifications will not be defined here. Readers should also be familiar with XML Base [XML Base]. Familiarity with the RDF Schema Candidate Recommendation [RDFSchema] will be necessary for those who wish to make use of the mappings provided here that use RDF Schema Classes.

1.1 Terminology

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [IETF RFC 2119].]

Some special terms are defined here in order to clarify their relationship to similar terms used in the technologies on which the mapping is based. Refer to [XLink] and [RDF] for definitions of other technical terms used here.

[Definition: harvesting]: The process of generating RDF statements from XLink elements.
[Definition: Resource]: A "resource" is anything identified by a URI.
[Definition: Participating resource]: A resource that has been identified in a link to serve as a potential starting or ending point of traversal.

1.2 Notation and Document Conventions

The xlink: and rdf: prefixes are used throughout to stand for the XLink and RDF namespaces, respectively. Each prefix is assumed to be declared on the element where it appears or on some ancestor element, whether or not a namespace declaration is present in the example. The use of specific namespace prefixes is an editorial convenience; as dictated by the Names in XML Recommendation [XML-Names], any prefix may be used as long as the URI it maps to is the correct one.

2 Principles of the Mapping

Simple RDF statements consist of a subject, a predicate, and an object. The subject and predicate are identified by URI references, and the object may be a URI reference or a literal string. To map an XLink link into an RDF statement, we need to be able to determine the URI references of the subject and predicate. We must also be able to determine the object, be it a URI reference or a literal.

The general principle behind the mapping specified here is that each arc in a link gives rise to one RDF statement. The starting resource of the arc is mapped to the subject of the RDF statement. The ending resource of the arc is mapped to the object of the RDF statement. The arc role is mapped to the predicate of the RDF statement. However, a number of corner cases arise, described in 3 Mapping Specification.

RDF statements are typically collected together into "models." The details of how models are structured are implementation dependent. This Note assumes that harvested statements are added to "the current model," which is the model being constructed when the statement was harvested. But this Note, like [RDFSchema], does not specify exactly how models must be structured.

3 Mapping Specification

The following sections describe the mapping in detail.

3.1 Synthesizing XPointers

RDF is based on the use of URIs for identifying resources. In XLink, the linking element itself (in the case of a simple link) or a subelement of the linking element (in the case of an extended link) often serves as one of the participating resources in the link. This requires that we be able to define URI references that identify those linking elements. Those URI references must follow the XPointer specification.

Any legal XPointer that identifies the proper element is allowed. However, in order that different implementations harvest equivalent RDF statements from an XLink, the procedure in this section should be used when synthesizing XPointers for such linking elements. The general approach recommended is for the synthesized XPointer to do element-wise navigation down the tree to reach the linking element. The navigation begins at the nearest identified point in the tree.

More formally, the base of the synthesized URI reference shall be specified as defined in [XML Base].

The fragment identifier of the synthesized URI reference shall be delimited from the URI by the '#' character, as required by RFC 2396 [RFC 2396]. The fragment identifier of the synthesized URI reference shall be an XPointer [XPTR].

The XPointer should follow the production:

Recommended Syntax for Synthesized XPointers

     XPointer    ::= Name
                   | ChildSeq

     ChildSeq    ::=  '/' [1-9] [0-9]* ('/' [1-9] [0-9]*)*
                   |  Name ('/' [1-9] [0-9]*)+

Note:

This is an edited version of the ChildSeq production, assuming that the production is updated in line with recent working group discussions. In the case of any differences between this document and the final XPointer specification, the XPointer specification's definition of ChildSeq should be followed.

The initial locator term of the XPointer should be an ID reference to the nearest ancestor of the linking element, including the linking element itself, that bears an attribute of type ID. If no such attribute exists on the linking element or any of its ancestors, the '/' character should begin the first locator term, indicating that navigation proceeds from the document element.

As an example, consider a document that contains the following simple link:

In heavy trading, <org
  xlink:type='simple'
  xlink:href="http://www.foo.com/"
  xml:base="http://www.bar.com/report1"
  ID="com231"
>Foo Manufacturing</org> closed sharply lower...

The synthesized XPointer for this linking element is:

http://www.bar.com/report1#com231
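The procedure above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not a normative algorithm: the class name XPointerSketch and its methods are hypothetical, the base URI is assumed to have been resolved already according to [XML Base], and an attribute literally named "ID" is assumed to be of type ID because no DTD is consulted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper for illustration; not defined by this Note. */
final class XPointerSketch {

  /** Assumption: an attribute named "ID" is of type ID (no DTD is consulted). */
  static final String ID_ATTR = "ID";

  /** Returns base + "#" + (Name | ChildSeq) for the given element. */
  static String synthesize(String base, Element target) {
    StringBuilder steps = new StringBuilder();
    Node node = target;
    while (node instanceof Element) {
      Element e = (Element) node;
      if (e.hasAttribute(ID_ATTR)) {
        // Nearest ancestor-or-self bearing an ID: navigation starts there.
        return base + "#" + e.getAttribute(ID_ATTR) + steps;
      }
      // Prepend this element's position, then move up one level.
      steps.insert(0, "/" + elementPosition(e));
      node = e.getParentNode();
    }
    // No ID found: the sequence starts at the document element, which is "/1".
    return base + "#" + steps;
  }

  /** 1-based position of e among its element siblings. */
  static int elementPosition(Element e) {
    int pos = 1;
    for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
      if (s.getNodeType() == Node.ELEMENT_NODE) {
        pos++;
      }
    }
    return pos;
  }

  /** Parses a string into a DOM document; wraps checked exceptions. */
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
      f.setNamespaceAware(true);
      return f.newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception ex) {
      throw new IllegalArgumentException("could not parse XML", ex);
    }
  }
}
```

An element that bears an ID yields the bare-name form, a descendant of such an element yields the ID followed by child steps, and an element with no identified ancestor yields a child sequence that starts with "/1".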

3.2 Generating RDF Predicates

Unless stated otherwise, RDF statements are generated to represent the information for the arcs in a link. The value of the xlink:arcrole attribute, if one is given on an "arc"-type element or "simple"-type element, must be mapped to the predicate of the RDF statement. Note that the value of the xlink:arcrole attribute is already required, by the XLink specification, to be a URI reference.

If no xlink:arcrole attribute is specified, harvesting software should not generate an RDF statement. That is the safest course. However, many XML files may not have arcrole attributes. Implementations that wish to harvest RDF statements from such files may map the element type of the linking element to the predicate of the RDF statement, as long as the element type is namespace qualified. This ensures that an absolute URI reference can be constructed from the namespace URI and the local part. In this case the namespace URI and the local part are concatenated to synthesize the absolute URI reference for the predicate. Implementations should test whether the namespace URI ends in one of the URI characters '#', '?', or '/'. If it does, the namespace URI and the local part shall be concatenated using the simple approach documented in [RDF]. If it does not, implementations may create a URI reference by inserting a '#' character between the namespace URI and the local part. Note, however, that such URI references must not be exchanged with external parties, because the resources they name are not guaranteed to exist.
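The predicate rules of this section might be expressed as in the following Java sketch. The class PredicateSketch, its method, and the allowElementTypeFallback flag are hypothetical names introduced for illustration; the flag models the optional ("may") fallback to the element type.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not defined by this Note. */
final class PredicateSketch {

  /**
   * Returns the predicate URI reference, or empty if no statement
   * should be generated. A null argument means "not present".
   */
  static Optional<String> predicate(String arcrole, String namespaceUri,
                                    String localName,
                                    boolean allowElementTypeFallback) {
    if (arcrole != null && !arcrole.isEmpty()) {
      return Optional.of(arcrole);            // arcrole is always preferred
    }
    if (!allowElementTypeFallback
        || namespaceUri == null || namespaceUri.isEmpty()) {
      return Optional.empty();                // the safest course: no statement
    }
    char last = namespaceUri.charAt(namespaceUri.length() - 1);
    boolean endsWithDelimiter = last == '#' || last == '?' || last == '/';
    // Plain concatenation when the namespace URI already ends in a
    // delimiter; otherwise insert '#' (such URIs are for internal use only).
    return Optional.of(endsWithDelimiter
        ? namespaceUri + localName
        : namespaceUri + "#" + localName);
  }
}
```

Returning an empty Optional rather than a placeholder URI keeps the "do not generate a statement" case explicit for the caller.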

3.3 Simple Linking Elements

If a simple link's xlink:arcrole attribute has the value "http://www.w3.org/1999/xlink/properties/linkbase", the link shall be harvested according to section 3.5 Linkbases. Otherwise the mapping defined in this section shall be used.

Every simple link defines zero or one traversal arcs. No traversal arc is specified if the xlink:href attribute is not specified. Therefore, harvesting software shall not generate an RDF statement if there is no xlink:href attribute in the link.

The starting resource of the simple link shall be mapped to the subject of the RDF statement. Note that the starting resource of a simple link is the linking element itself. Therefore, the harvesting software must create a URI reference that identifies the linking element, as defined in section 3.1 Synthesizing XPointers.

The predicate of the RDF statement is obtained from the simple link as defined in 3.2 Generating RDF Predicates.

The ending resource of the simple link shall be mapped to the object of the RDF statement. Note that the ending resource of a simple link is always a URI reference, provided as the value of the xlink:href attribute.

If an xlink:role attribute is specified on the simple link, it shall result in an additional statement being added to the model. The subject of the statement is the ending resource of the simple link, its predicate is "rdf:type", and its object is the resource identified by the role attribute.

If an implementation wishes to use facilities defined in the RDF Schema specification [RDFSchema], it may add a second statement to the RDF model when an xlink:role attribute is specified. The subject of the second statement is the resource identified by the role attribute, its predicate is "rdf:type", and its object is the resource "rdfs:Class". The second statement should only be added to the model if an equivalent statement is not already part of the model.

An example of a simple linking element is:

... In a <x:extRef
  xlink:type="simple"
  xlink:href="http://www.foo.com/papers/crops.txt"
  xlink:arcrole="http://links.org/namespace/cite"
  xlink:role="http://links.org/namespace/screed"
>recent paper</x:extRef>, Dr. Taylor assumes that ...

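Mapping that link according to this specification (and assuming it was the fourth child element within the third child element of the document) results in an RDF model whose figure is not reproduced here. By the rules of this section, that model contains two statements. The first has the synthesized XPointer for the x:extRef element as its subject, "http://links.org/namespace/cite" as its predicate, and "http://www.foo.com/papers/crops.txt" as its object. The second has "http://www.foo.com/papers/crops.txt" as its subject, "rdf:type" as its predicate, and "http://links.org/namespace/screed" as its object.

If the xlink:role attribute had not been specified, then the resulting RDF model would have contained only the first of those statements.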
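The rules for simple links can be sketched in Java as follows. The class SimpleLinkSketch is hypothetical, each statement is represented as a plain [subject, predicate, object] list, and the subject is assumed to have been synthesized already as described in section 3.1. The optional rdfs:Class statement and the element-type fallback for the predicate are omitted. Adding the rdf:type statement even when no arcrole is present is a choice of this sketch, matching the stylesheet in Appendix A.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not defined by this Note. */
final class SimpleLinkSketch {

  static final String RDF_TYPE =
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
  static final String LINKBASE =
      "http://www.w3.org/1999/xlink/properties/linkbase";

  /**
   * Harvests statements from one simple link. Each statement is a
   * [subject, predicate, object] list; a null argument means the
   * corresponding attribute is absent.
   */
  static List<List<String>> harvest(String subject, String href,
                                    String arcrole, String role) {
    List<List<String>> model = new ArrayList<>();
    if (href == null || LINKBASE.equals(arcrole)) {
      return model;   // no traversal arc, or a linkbase arc (section 3.5)
    }
    if (arcrole != null) {
      model.add(Arrays.asList(subject, arcrole, href));
    }
    if (role != null) {
      // The role types the ending resource of the link.
      model.add(Arrays.asList(href, RDF_TYPE, role));
    }
    return model;
  }
}
```

Applied to the x:extRef example with an illustrative subject such as "#/1/3/4" (an assumed child sequence, with the document element counted as "/1"), the sketch yields the cite statement followed by the rdf:type statement.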

3.4 Extended XML Links

We first describe the rules for harvesting the components of an extended link (arcs, locators, and resources). Then we describe the rules for the extended link as a whole.

3.4.1 `arc`-Type Element

If an arc contains an xlink:arcrole attribute whose value is "http://www.w3.org/1999/xlink/properties/linkbase", it shall be harvested according to the procedure in section 3.5 Linkbases. Otherwise the procedures in this section shall be used.

XLink elements of the arc type use the xlink:from and xlink:to attributes to specify the endpoints of zero or more possible traversals. These attributes reference not URIs but labels that have been defined in the xlink:label attributes of locator-type and resource-type elements.

The number of RDF statements harvested from a single arc-type element is equal to the number of possible traversals specified by that element. That quantity is the product of the number of resource and/or locator elements identified by the xlink:from attribute and the number identified by the xlink:to attribute. Each RDF statement corresponds to exactly one of the traversals.

The starting resources of the traversals shall be mapped to the subject of the RDF statement(s). The ending resources of the traversals shall be mapped to the object of the RDF statement(s). The predicate of the RDF statement is obtained as specified in 3.2 Generating RDF Predicates.

Note that any element content of an arc is not harvested.
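The expansion of one arc-type element into statements can be sketched in Java as below. The class ArcSketch and the labels map are hypothetical: the map is assumed to associate each xlink:label value with the URI references of the locator and resource elements that carry it. The sketch assumes that both xlink:from and xlink:to are present and that linkbase arcs have already been filtered out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not defined by this Note. */
final class ArcSketch {

  /**
   * Produces one [subject, predicate, object] statement per traversal:
   * every resource labelled 'from' paired with every resource
   * labelled 'to'.
   */
  static List<List<String>> harvestArc(Map<String, List<String>> labels,
                                       String from, String to,
                                       String arcrole) {
    List<List<String>> statements = new ArrayList<>();
    if (arcrole == null) {
      return statements;   // no predicate available, so nothing is harvested
    }
    List<String> starts = labels.getOrDefault(from, Collections.emptyList());
    List<String> ends = labels.getOrDefault(to, Collections.emptyList());
    for (String subject : starts) {
      for (String object : ends) {
        statements.add(Arrays.asList(subject, arcrole, object));
      }
    }
    return statements;
  }
}
```

With two resources labelled by the xlink:from value and three labelled by the xlink:to value, the nested loops produce six statements, one per traversal.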

3.4.2 `locator`-Type Element

Each XLink locator-type element gives rise to zero or more statements in the RDF model. The subject of all of those statements is the value of the xlink:href attribute of the locator, except as noted below.

If the locator element provides an xlink:role attribute, one additional statement shall be added to the model. The value of the locator's xlink:href attribute shall be mapped to the subject of the statement. The value of the xlink:role attribute shall be mapped to the object, and the predicate shall be "rdf:type". Harvesting software that uses the facilities of the RDF Schema specification may generate an additional statement whose subject is the value of the xlink:role attribute, whose predicate is "rdf:type" and whose object is "rdfs:Class". The second statement should not be added to the RDF model if an equivalent statement already exists in the model.

If the locator element provides an xlink:label attribute, an RDF statement shall be added to the model. The value of the xlink:href attribute shall be mapped to the subject of the statement. The predicate of the statement shall be "xlink:label". The object of the statement shall be the value of the xlink:label attribute.

If the locator element provides an xlink:title attribute, an RDF statement shall be added to the model. The value of the xlink:href attribute shall be mapped to the subject of the statement. The predicate of the statement shall be "xlink:title". The object of the statement shall be the value of the title attribute.

If the locator element contains one or more title elements, they are harvested as described in section 3.4.4 title-Type Element.
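The attribute-driven statements for a locator might be generated as in this Java sketch. The class LocatorSketch is hypothetical. The predicates "xlink:label" and "xlink:title" are kept in the qualified-name form used in this Note, since their expansion to URI references is not specified here. Child title elements and the optional rdfs:Class statement are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not defined by this Note. */
final class LocatorSketch {

  static final String RDF_TYPE =
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

  /**
   * Harvests the role, label, and title attributes of one locator.
   * The subject of every statement is the locator's href value;
   * a null argument means the attribute is absent.
   */
  static List<List<String>> harvest(String href, String role,
                                    String label, String title) {
    List<List<String>> model = new ArrayList<>();
    if (role != null) {
      model.add(Arrays.asList(href, RDF_TYPE, role));
    }
    if (label != null) {
      model.add(Arrays.asList(href, "xlink:label", label));
    }
    if (title != null) {
      model.add(Arrays.asList(href, "xlink:title", title));
    }
    return model;
  }
}
```

A resource-type element would follow the same pattern, with the synthesized XPointer of the element in place of the href value.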

3.4.3 `resource`-Type Element

Each XLink resource-type element gives rise to zero or more statements in the RDF model. Unless noted otherwise, the subject of all of those statements is the resource element itself, identified by an XPointer synthesized according to the procedure described in section 3.1 Synthesizing XPointers.

If the resource element provides an xlink:role attribute, one RDF statement shall be added to the model, and a second RDF statement may be added to the model. The subject of the first statement is the synthesized URI reference for the resource. The value of the xlink:role attribute is mapped to the object of the statement. The predicate of the statement is "rdf:type". A second statement may be added to the model if the software supports the RDF Schema specification [RDFSchema]. The value of the xlink:role attribute is mapped to the subject of the optional statement. The predicate of the statement is "rdf:type" and the object is "rdfs:Class". The second statement should not be added to the model if an identical statement already exists in the model.

If the resource element provides an xlink:label attribute, another RDF statement shall be added to the model. The subject of the statement is the synthesized URI reference for the resource. The predicate of the statement is "xlink:label". The object of the statement is the value of the label attribute.

If the resource element provides an xlink:title attribute, another RDF statement shall be added to the model. The subject of the statement is the synthesized URI reference for the resource. The predicate of the statement is "xlink:title". The object of the statement is the value of the title attribute.

If the resource element contains one or more title elements, they are harvested as described in section 3.4.4 title-Type Element.

3.4.4 `title`-Type Element

XLink title-type elements have an XLink-defined meaning only if they appear as a child element within an extended, locator, or resource element.

If an XLink extended-, locator-, or resource-type element contains one or more title-type elements, one RDF statement shall be added to the model for each title element. The subject of the statement shall be either the value of the xlink:href attribute (in the case of a locator element) or a synthesized XPointer identifying the extended or resource element. The predicate of the statement shall be xlink:title. The object of the statement shall be a synthesized XPointer identifying the title element. (Identifying the title element, rather than just its content, allows attributes such as xml:lang to be captured along with the title.)

Note:

Implementations may add a second RDF statement to the model for each title-type element. The subject of the second statement shall be a synthesized XPointer identifying the title element. The predicate of the second statement shall be rdf:value. The object of the second statement shall be the content of the title element. (If the title element contains mixed content, the object is a string containing XML markup. The implementation's facilities for dealing with situations where the rdf:parseType attribute has the value "Literal" will be needed.)

As an example, consider the following fragment of an extended link:

<annotation xlink:type='extended' ID='genid22'>
  <caption xlink:type='title' ID='genid23'><i>Recent</i> comments</caption>
  <link xlink:type='arc' ...

The RDF statements harvested from the title are shown below:

[Figure: RDF model with 2 arcs, second one pointing at <i>Recent</i>]

3.5 Linkbases

A linkbase is an XML document which functions like a database of links. A linkbase arc is an XLink element (simple- or arc-type) whose xlink:arcrole attribute takes the value of "http://www.w3.org/1999/xlink/properties/linkbase". The ending resource of a linkbase arc is a linkbase.

When harvesting software encounters a linkbase arc, it shall not generate an RDF statement for the arc. It should traverse the arc to retrieve the linkbase, and harvest the links from the linkbase to add to the current model using the methods specified in this Note.

Note:

Different applications might make different tradeoffs on depth of traversal in light of varying network conditions. This Note does not mandate specific behavior, but does recommend that all harvesting applications attempt to obtain at least the immediately referenced linkbase.
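One possible way to bound the depth of linkbase traversal is sketched below in Java. The class LinkbaseSketch and the linkbaseArcsOf callback are hypothetical. The callback is assumed to retrieve the linkbase at a URI, harvest its ordinary links into the current model, and return the URIs of any further linkbases that it references. A depth limit of 0 retrieves only the immediately referenced linkbase.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper for illustration; not defined by this Note. */
final class LinkbaseSketch {

  /**
   * Visits linkbases breadth-first, starting from the immediately
   * referenced one, and returns the URIs actually retrieved.
   * Each linkbase is retrieved at most once, so cycles terminate.
   */
  static Set<String> collect(String start, int maxDepth,
                             Function<String, List<String>> linkbaseArcsOf) {
    Set<String> visited = new LinkedHashSet<>();
    List<String> frontier = new ArrayList<>();
    frontier.add(start);
    for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
      List<String> next = new ArrayList<>();
      for (String uri : frontier) {
        if (visited.add(uri)) {
          // Retrieve and harvest this linkbase; remember what it references.
          next.addAll(linkbaseArcsOf.apply(uri));
        }
      }
      frontier = next;
    }
    return visited;
  }
}
```

Breadth-first order is used so that a linkbase reachable by both a short and a long chain of references is always processed at its shallowest depth.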

4 References

XLink: Steve DeRose, Eve Maler, David Orchard, and Ben Trafford, editors. XML Linking Language (XLink). World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xlink.)
XML-Names: Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. Textuality, Hewlett-Packard, and Microsoft. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names.)
IETF RFC 2119: S. Bradner, editor. Key words for use in RFCs to Indicate Requirement Levels. March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
RDF: Ora Lassila and Ralph Swick, editors. Resource Description Framework (RDF) Model and Syntax Specification. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-rdf-syntax.)
XPTR: Ron Daniel, Steve DeRose, and Eve Maler, editors. XML Pointer Language (XPointer) V1.0. World Wide Web Consortium, 1998. (See http://www.w3.org/TR/xptr.)
RFC 2396: RFC 2396: Uniform Resource Identifiers. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
RDFSchema: Dan Brickley and R.V. Guha, editors. Resource Description Framework (RDF) Schema Specification 1.0. World Wide Web Consortium, 2000. (See http://www.w3.org/TR/rdf-schema.)
XML Base: Jonathan Marsh, editor. XML Base (XBase). World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xmlbase.)

A Implementing the Harvesting in XSLT

One way of harvesting RDF statements from XML documents that contain XLinks is through the use of XSLT. This appendix presents a simple example of such a harvester. It has a number of limitations. First, it does not generate the synthesized XPointers that are preferred for reasons of interoperability. Second, it uses a Java extension function for adding statements to an RDF storage manager, and there is no standard API for such a storage manager. Accordingly, this appendix is provided as an example only. It does not specify any normative behavior.

The stylesheet itself is given in listing 1. Its operation is very simple. It looks for simple XLinks, stores the subject, object, and predicate in variables, then calls an extension function to add the RDF statement to some storage mechanism.


<!--

  Simple XSLT stylesheet to harvest RDF statements from simple
  XLinks. XLinks are detected and an extension function called
  to add RDF statements to an RDF repository. The extension function
  is a very simple mockup that just prints its arguments to stdout.

  Note that the repository is updated as a side effect of
  examining the document. While practical, this is somewhat at odds
  with the philosophy behind XSLT. Any application that actually
  cares about the order in which RDF statements are made is
  cautioned about using this approach.

  Credit where due: I got a head start on this by using Dan Connolly's 
  stylesheet that tried to turn XLinks into RDF's XML syntax.

  Ron Daniel Jr.
  rdaniel@metacode.com   2000-09-15
-->

<xsl:stylesheet 
    version="1.0"
    xmlns:xsl  ="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf  ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:rdfs ="http://www.w3.org/TR/rdf-schema/"

    xmlns:Radix="http://www.example.com/com.example.Radix"
>

  <!-- Some 'useful' declarations. -->
  <xsl:variable name='rdf-type'
	      select='"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"'/>
  <xsl:variable name='rdfs-class'
	      select='"http://www.w3.org/TR/rdf-schema#Class"'/>

  <xsl:output method="text"/>

  <!-- Look for simple links that are linkbase references, ignore them.
       (This rule is explicitly given a higher priority
        than the system-assigned priorities of the other rules so
        that it will match and discard linkbase references.)
  -->
  <xsl:template priority="2"
       match='*[@xlink:type="simple"][@xlink:arcrole=
               "http://www.w3.org/1999/xlink/properties/linkbase"]'>
      <!-- For now, do nothing. A more complete example could pull in
           the linkbase and harvest its content.
      -->
  </xsl:template>


  <!-- Process the simple links that are not linkbase references.
       Pull the various bits of info into variables and then call the
       extension function.
       (Make sure that we have an href specified. If not, there is no
       arc to add to the RDF model.)
  -->
  <xsl:template match='*[@xlink:type="simple"][@xlink:href]'>

    <!-- Subject - Synthesizing the XPointer in an interoperable way is
         left for more ambitious examples. For simplicity, just generate
         a unique ID.
    -->
    <xsl:variable name='subject'
		  select='concat("#xpointer-for-", generate-id())'/>

    <!-- Predicate name comes from arcrole (preferred) or element type
	 (allowed). Look for arcrole, but if it doesn't exist, use
	 element name. (Note that we have to concatenate the namespace
         URI and the local name according to the RDF spec to make a URI
         reference. This should really test to see if namespace URI
         ends with #, /, or ?)
     -->
    <xsl:variable name='predicate'>
      <xsl:choose>
        <!-- Get arcrole attribute if possible. -->
        <xsl:when test='@xlink:arcrole'>
          <xsl:value-of select='@xlink:arcrole'/>
        </xsl:when>
        <!-- If no arcrole, use element type as long as there is a
             namespace URI so it can be made into a URI reference.
             local-name() is used because name() would include the
             namespace prefix.
        -->
        <xsl:when test='namespace-uri()'>
          <xsl:value-of select='concat(namespace-uri(), local-name())'/>
        </xsl:when>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name='object' select='string(@xlink:href)'/>
    <xsl:variable name='objType' select='string(@xlink:role)'/>

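    <!--  Here it is - the main call to add a statement to the
          RDF database. The test uses string() because $predicate
          is a result tree fragment, which converts to true even
          when it is empty.
    -->
    <xsl:if test='string($predicate)'>
      <xsl:value-of select='Radix:addStatement($subject,
               string($predicate), $object)'/>
    </xsl:if>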

    <!-- Additional call if xlink:role specified. (We rely on the
         underlying RDF storage implementation to deal with possible
         multiple additions of the  rdf:type(objType, rdfs:Class)
         statement).
    -->
    <xsl:if test='$objType'>
      <xsl:value-of select='Radix:addStatement($object,
	       $rdf-type, $objType)'/>
      <xsl:value-of select='Radix:addStatement($objType,
	       $rdf-type, $rdfs-class)'/>
    </xsl:if>
  </xsl:template>


  <!-- don't pass text thru -->
  <xsl:template match="text()|@*">
  </xsl:template>

</xsl:stylesheet>

The stylesheet makes use of an extension function, Radix:addStatement(subject, predicate, object), which would actually add the statement to an RDF storage manager. For demonstration purposes, a dummy implementation of that extension function is given in listing 2. This was tested using the Saxon implementation of XSLT. Other implementations may have different conventions for the use of extension functions.


package com.example;
/** Simple demo of a RDF interface. This one is trivial, it has
  * one call that lets statements be added.
  */
public class Radix
{

  /** Mockup of a routine to add RDF statements to a model being
    * constructed. Prints the predicate, subject, and object on
    * different lines with progressive indentation to make it easy
    * to read.
    */
  public static String
  addStatement(String subject, String predicate, String object)
  {
    System.out.println(predicate + "\n  " + subject + "\n    " + object);
    return "";
  }
}

W3C Note 29 September 2000
