Extracting XML attributes for RDF using GRDDL

This is an XSLT 1.0 module, intended to be imported into GRDDL transforms in order to produce appropriate xml:base and xml:lang attributes on property elements and node elements in RDF/XML.

The namespace of this module is given by:


There are three named templates. One of them should be called when producing an element in the GRDDL output that needs an xml:lang attribute, an xml:base attribute, or both, added according to generally useful rules.

The three templates are:

xa:lang
may produce an xml:lang attribute.
xa:base
may produce an xml:base attribute.
xa:base-and-lang
may produce either or both attributes.

When there is no information, the templates do not produce attributes.

It is an error to call any of these templates at a point where it would not be legal to use <xsl:attribute> elements.

Basic Usage

Each template has a parameter named "node". This can be omitted and defaults to the current() node.

In the simplest cases, these templates are called immediately after generating an element that should be qualified with language and base information copied from the input document. Such an element may have property attributes, textual content, or an rdf:about or rdf:resource attribute. As long as no parent or other ancestor element in the output has an xml:lang or xml:base attribute, the named templates produce appropriate values, computed from some specific node of the input.
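A minimal sketch of the basic case follows. The namespace URI bound to the xa prefix, the import href, and the input elements "item" and "title" are assumptions made for illustration only. Substitute the namespace and location that the module actually declares.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xa="http://www.w3.org/2003/g/xml-attributes#"
    exclude-result-prefixes="xa">

  <!-- ASSUMPTION: the xa namespace URI above and this href are placeholders -->
  <xsl:import href="http://www.w3.org/2003/g/xml-attributes.xsl"/>

  <xsl:template match="/">
    <rdf:RDF>
      <xsl:apply-templates select="//item"/>
    </rdf:RDF>
  </xsl:template>

  <!-- "item" and "title" are hypothetical input elements -->
  <xsl:template match="item">
    <!-- No ancestor in the output carries xml:lang or xml:base,
         so this is the simple case. -->
    <rdf:Description rdf:about="{@href}" dc:title="{title}">
      <!-- Called immediately after the element is opened, while
           xsl:attribute is still legal. The explicit "node" parameter
           makes the title child the source of language and base. -->
      <xsl:call-template name="xa:base-and-lang">
        <xsl:with-param name="node" select="title"/>
      </xsl:call-template>
    </rdf:Description>
  </xsl:template>
</xsl:stylesheet>
```

Omitting the xsl:with-param would make the template use the current node, here the item element, instead of its title child.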

Some examples include: inline-rdf, extract-xlink1, and extract-xlink2.

Advanced Usage

Advanced xml:lang

When a parent or other ancestor element in the output has (possibly) had xml:lang or xml:base values defined, greater care is needed.

In the xml:lang case, the template does not normally produce xml:lang="" attributes. So if a parent has been given an xml:lang, and the child node may have no language, the correct approach is first to set xml:lang to "" and then to call the named template. Following the documentation for xsl:attribute, any attribute the template produces overwrites the value "".

As an example:

  <rdf:Description rdf:about="">
    <!-- copy attributes -->
    <xsl:copy-of select="@*" />
    <!-- redefine xml:lang and xml:base -->
    <xsl:call-template name="xa:base-and-lang"/>
    <!-- inherited xml:lang may be inappropriate -->
    <rdfs:comment xml:lang="">
      <!-- compute appropriate xml:lang -->
      <xsl:call-template name="xa:lang"/>
      <xsl:value-of select="item/comment"/>
    </rdfs:comment>
    <!-- copy other content -->
    <xsl:copy-of select="node()"/>
  </rdf:Description>

Advanced xml:base

For xml:base the situation is decidedly more complicated. This module is an XSLT 1.0 module, so it cannot use the XPath 2.0 base-uri() function. It also does not use the W3C XSLT URI library. If either of these is available for a specific transform, it may provide a better solution.

As is, this module cannot resolve a relative reference to an absolute one. This limits its applicability, and in some cases it simply gives up, with an xsl:message terminate="yes".

However, in many cases, a relative reference can simply be copied across to the output, and the RDF/XML parser will correctly resolve nested xml:bases.
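As an illustration of how an RDF/XML parser resolves nested xml:base values, consider the output fragment below. The URIs and the dc:title property are invented for this example and are not produced by any of the transforms mentioned here.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://example.org/docs/">
  <!-- The relative xml:base "2007/" is resolved against the base in
       scope on the parent, giving http://example.org/docs/2007/ -->
  <rdf:Description xml:base="2007/" rdf:about="report">
    <!-- The subject is therefore http://example.org/docs/2007/report -->
    <dc:title>Annual report</dc:title>
  </rdf:Description>
</rdf:RDF>
```

Because each relative xml:base is resolved against the one in scope above it, emitting the same relative value on two nested elements would compound it (here, http://example.org/docs/2007/2007/).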

To do this correctly, and to avoid copying a relative xml:base into an output context where a parent or ancestor node has an inappropriate xml:base, the transform author has to give this module more guidance.

The supported cases are when the parent or ancestor node has had its xml:base set using this module.

In this case, the node that was used to compute that xml:base (either as an explicit "node" parameter, or the then current() node), must be passed as a second parameter when computing the xml:base on the child node.

The xa:base and xa:base-and-lang templates have an additional optional parameter, "node-used-by-ancestor":

If present, this must be the "node" parameter passed to the xa:base or xa:base-and-lang template used on the closest ancestor, in the output, to the current output element that potentially defines an xml:base. This node and its ancestors are ignored when computing the correct value of xml:base. When no parent element in the output has already had an xml:base attribute generated, the "node-used-by-ancestor" parameter is omitted. In some cases, failure to use this parameter may result in the same relative path being output in two nested places, which gives an incorrect result.
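A sketch of how the parameter might be passed is shown below. It is a fragment, meant to sit inside a stylesheet that imports this module and binds the xa, rdf and rdfs prefixes. The input elements "section" and "link" and their attributes are hypothetical.

```xml
<!-- "section" and "link" are hypothetical input elements -->
<xsl:template match="section">
  <!-- Remember the node used for the outer xml:base computation -->
  <xsl:variable name="outer" select="."/>
  <rdf:Description rdf:about="{@id}">
    <!-- Outer call: "node" defaults to the current section element -->
    <xsl:call-template name="xa:base-and-lang"/>
    <xsl:for-each select="link">
      <rdfs:seeAlso rdf:resource="{@href}">
        <!-- Inner call: pass the node used by the closest output
             ancestor, so that it and its ancestors are ignored -->
        <xsl:call-template name="xa:base">
          <xsl:with-param name="node" select="."/>
          <xsl:with-param name="node-used-by-ancestor" select="$outer"/>
        </xsl:call-template>
      </rdfs:seeAlso>
    </xsl:for-each>
  </rdf:Description>
</xsl:template>
```

Here the inner call only has to account for xml:base attributes found between the link element and the section element, since anything at or above the section is already reflected in the parent's xml:base.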

As an example, see the transform trix2rdfxml. This is intended to apply to input such as trix3, trix4, trix5, trix6 or trix7. In this transform:

the first rule
copies the xml:base attribute from a graph element;
the second rule
copies the xml:base attribute from the first child of the triple element, passing the graph element node in as the node-used-by-ancestor.
the third rule
copies the xml:base attribute from the third child of the triple element, passing the first child in as the node-used-by-ancestor.

trix7 is example input that the library module finds too difficult, because it does not resolve relative references.

Examination of the output will show that some of the xml:base attributes generated are unnecessary. A more careful, but more complex, transform would avoid generating so many redundant values, and so reduce the chance of failing on input that has too many relative references, such as trix7.

Description of Implementation

The rules used by the templates are as follows:

First, for non-XHTML documents:

xml:lang
is produced according to the inheritance rules, applied to $node, as given in the XML Recommendation.
xml:base
is produced according to the inheritance rules, applied to $node, as given in the XML Base Recommendation. However, an xml:base on the root element is ignored, since this will be handled by the GRDDL aware agent. In the advanced case, the correct set of ancestors to ignore is computed using the "node-used-by-ancestor" parameter.

Second, for XHTML documents, in the default processing mode:

xml:lang
is produced according to the inheritance rules, applied to $node, as given in the XHTML Recommendation: in particular, if both xml:lang and lang attributes are present in the ancestors of $node, xml:lang is used. If only one is present, then that is used.
xml:base
is not produced at all, and any xml:base attributes are (silently) ignored. xml:base is not supported in XHTML (see the XHTML Recommendation, in particular the DTD). An HTML Base element, if any, will be handled by the GRDDL aware agent. Note that some derived DTDs, for example those of An XHTML + MathML + SVG Profile and the Compound Document by Reference Framework, do support xml:base, and the default processing mode is not appropriate for these cases. Also, if the input of the transform is not necessarily intended to be DTD valid (for example, in the transform inline-rdf), then the default processing mode may not be appropriate.
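The xml:lang rule for XHTML given above can be sketched in XSLT 1.0 as follows. This is one reading of the rule, not the module's actual implementation. The template name ex:xhtml-lang and its namespace are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ex="http://example.org/ns/sketch#">

  <!-- Hypothetical helper, not part of the module -->
  <xsl:template name="ex:xhtml-lang">
    <xsl:param name="node" select="."/>
    <!-- Nearest xml:lang and nearest lang on $node or its ancestors;
         [1] on a reverse axis selects the closest element -->
    <xsl:variable name="xml-lang"
        select="$node/ancestor-or-self::*[@xml:lang][1]/@xml:lang"/>
    <xsl:variable name="html-lang"
        select="$node/ancestor-or-self::*[@lang][1]/@lang"/>
    <xsl:choose>
      <!-- If xml:lang is present anywhere, it wins -->
      <xsl:when test="$xml-lang">
        <xsl:attribute name="xml:lang">
          <xsl:value-of select="$xml-lang"/>
        </xsl:attribute>
      </xsl:when>
      <!-- Otherwise fall back to the HTML lang attribute -->
      <xsl:when test="$html-lang">
        <xsl:attribute name="xml:lang">
          <xsl:value-of select="$html-lang"/>
        </xsl:attribute>
      </xsl:when>
      <!-- No information: produce no attribute.
           Empty attribute values are not treated specially here. -->
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```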

Since, in some instances, as indicated above, a transform may wish to honour xml:base attributes within an XHTML document, it is possible to change the processing mode by setting the variable xa:use-xml-base-in-xhtml to true, e.g.

<xsl:variable name="xa:use-xml-base-in-xhtml"
               select="true()" />

In this case, the processing of xml:base in XHTML documents then follows this rule:

xml:base
is produced according to the inheritance rules, applied to $node, as given in the XML Base Recommendation. This includes an xml:base on the root element, which may not be handled by the GRDDL aware agent.

This is part of the standard library of GRDDL transforms.

This transform is copyright, 2007, W3C. It is available for use under the terms of the W3C Software License

$Id: xml-attributes.xsl,v 1.25 2007/08/09 10:58:15 jcarroll Exp $
