Base IRIs and GRDDL

Status: This is a draft document by Jeremy J. Carroll, for consideration by the GRDDL WG. It has no official status.

The Base IRI used for processing GRDDL Results

The GRDDL specification says @@@todo:link:

The base IRI for interpreting relative IRI references in a serialization of a graph produced by a GRDDL transformation is the @@@fixme [[[base]]] IRI of the source document.

This corresponds to RFC 3986, particularly section 5.1, which illustrates the identification of a base URI, with the following picture:

         .----------------------------------------------------------.
         |  .----------------------------------------------------.  |
         |  |  .----------------------------------------------.  |  |
         |  |  |  .----------------------------------------.  |  |  |
         |  |  |  |  .----------------------------------.  |  |  |  |
         |  |  |  |  |       <relative-reference>       |  |  |  |  |
         |  |  |  |  `----------------------------------'  |  |  |  |
         |  |  |  | (5.1.1) Base URI embedded in content   |  |  |  |
         |  |  |  `----------------------------------------'  |  |  |
         |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
         |  |  |         (message, representation, or none)   |  |  |
         |  |  `----------------------------------------------'  |  |
         |  | (5.1.3) URI used to retrieve the entity            |  |
         |  `----------------------------------------------------'  |
         | (5.1.4) Default Base URI (application-dependent)         |
         `----------------------------------------------------------'

During typical GRDDL processing, an intermediate RDF/XML serialization is produced as the output of a transform. To convert this serialization into an RDF graph, a base IRI parameter is often needed. To identify the appropriate base IRI, first check for a base URI embedded within this RDF/XML, following XML Base, as permitted by RDF Syntax. If there is no such base, then, because this serialization has an encapsulating entity, section 5.1.2 of RFC 3986 applies, and the base IRI of the original document is used as the base IRI of the serialization.
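The selection described above can be sketched in Python. This is only an illustration under stated assumptions, not part of GRDDL or of any GRDDL library: the function name serialization_base is hypothetical, and only an xml:base attribute on the root element of the RDF/XML is inspected, since a conforming RDF/XML parser deals with xml:base attributes deeper in the serialization itself.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the fixed XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def serialization_base(rdfxml_text, source_base):
    """Return the base IRI that applies at the root of an RDF/XML result.

    source_base is the base IRI of the original (source) document.
    (Hypothetical helper, for illustration only.)
    """
    root = ET.fromstring(rdfxml_text)
    embedded = root.get(XML_BASE)
    if embedded is None:
        # No embedded base: fall back to the encapsulating entity,
        # that is, the base IRI of the source document.
        return source_base
    # An embedded base wins; a relative value is resolved against
    # the source document's base IRI.
    return urljoin(source_base, embedded)
```

A transform that emits no xml:base therefore has its relative references resolved against the base IRI of the source document.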

The original document may be an XHTML family document, or it may be some other XML document.

The Base IRI of an XHTML Family document

For an XHTML family document, the base IRI may be specified as the value of the href attribute of the <base> element (if any). This is in accordance with section 5.1.1 of RFC 3986.

In many other cases, section 5.1.2 does not apply, and section 5.1.3 does apply. Section 5.1.3 specifies the use of the retrieval IRI as the base IRI. Furthermore, section 5.1.3 of RFC 3986 specifies that:

if the retrieval was the result of a redirected request, the last URI used (i.e., the URI that resulted in the actual retrieval of the representation) is the base URI.

The resulting IRI is used as the base IRI parameter for processing the intermediate RDF/XML serialization.
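As a rough sketch of the two XHTML cases, the following Python function looks for a base element in the head of a well-formed XHTML document and otherwise falls back to the retrieval IRI. The function name xhtml_base is hypothetical, and the caller is assumed to pass the last URI used after any redirects, as section 5.1.3 requires.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xhtml_base(xhtml_text, retrieval_uri):
    """Base IRI of an XHTML family document (illustrative helper only).

    retrieval_uri must be the URI that resulted in the actual retrieval,
    that is, the last URI after following redirects.
    """
    root = ET.fromstring(xhtml_text)
    base = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}base")
    href = base.get("href") if base is not None else None
    if href:
        # Section 5.1.1: base embedded in content. An absolute href is
        # returned unchanged by urljoin.
        return urljoin(retrieval_uri, href)
    # Section 5.1.3: no embedded base, so use the retrieval URI.
    return retrieval_uri
```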

The base IRI of other XML documents

Other XML documents may have used XML Base. This is only recommended when the specific document format permits the use of XML Base. Specifically, XML Base should not be used with XHTML family documents.

When an xml:base attribute is present on the root element of an XML document, this specifies the base IRI for that document, following section 5.1.1 of RFC 3986.

When there is no xml:base attribute on the root element, even if there is such an attribute on a descendant element, then section 5.1.1 of RFC 3986 does not apply.

As in the XHTML case, we then have to consider sections 5.1.2, 5.1.3 and 5.1.4 of RFC 3986.

Of these, section 5.1.3 is the most common case, and the note about redirected retrieval also applies.
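The root-only rule for other XML documents can be illustrated with a short Python sketch. The function name xml_document_base is an assumption made for this example; the point it shows is that an xml:base on a descendant element does not contribute to the document's base IRI.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def xml_document_base(xml_text, retrieval_uri):
    """Base IRI of a non-XHTML XML document (illustrative helper only)."""
    root = ET.fromstring(xml_text)
    # Only the root element is consulted; descendants are ignored here.
    root_base = root.get(XML_BASE)
    if root_base is None:
        # Section 5.1.1 does not apply, so use the retrieval URI (5.1.3).
        return retrieval_uri
    # A relative xml:base on the root is resolved against the retrieval URI.
    return urljoin(retrieval_uri, root_base)
```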

The base IRI in a processing pipeline

A GRDDL aware agent computes GRDDL results when

given a URI I of an information resource IR, and an XPath node N for a representation of IR

To use a GRDDL aware agent in a processing pipeline, it is necessary to specify a corresponding IRI I as well as the XPath node N. This IRI is used as the base IRI when the other mechanisms do not apply, which corresponds to section 5.1.4 of RFC 3986. It is even possible for the default IRI to bear no relationship to the XPath node N, but for such cases RFC 3986 warns:

As this definition is necessarily application-dependent, failing to define a base URI by using one of the other methods may result in the same content being interpreted differently by different types of applications.
A sender of a representation containing relative references is responsible for ensuring that a base URI for those references can be established.

Responsibilities for correct processing of base IRIs

Document authors, including profile and namespace documents

Document authors should, in general, include a base URI if the document is retrievable from some other URI.

For an XHTML family document, this is done using the base element @@@link.

For other XML documents, if the format supports xml:base then this should be used. In general, experience suggests that confusion is least likely when this is done on the root element. Document authors may also use xml:base attributes elsewhere in their documents, as permitted by the document format, with semantics as defined by XML Base @@@ref.

For XML documents in formats that do not support xml:base, and are not XHTML family documents, there is no support in GRDDL for specifying an in-line base-URI.

When a profile or namespace document can be accessed via multiple URIs, for instance by a redirect, document authors should, in general, provide a GRDDL result that specifies profile transformations or namespace transformations for each of these URIs. The library function @@@{link, implement, document} glean-profile provides support for this @@@{not yet} via the rel="alternate-profile-URI" attribute.

GRDDL aware agents

When a GRDDL result is represented in RDF/XML using the rule for RDF/XML, a base URI may be needed for this representation, in order to convert it into an RDF graph, following the rules in the RDF/XML Syntax Specification @@@link.

GRDDL results represented in other ways may also need a base URI.

Following the analysis above, this base URI is the base URI of the original document. This is computed by following section 5.1 of RFC 3986, as discussed above.

Following section 5.1.1, a base URI can be derived using the rules for the HTML base, for an XHTML family document, or from the value of xml:base on the root element for other XML documents.

In some cases, section 5.1.2 may apply: this case lies outside the scope of this advice.

In other cases, the retrieval URI should be used as the base URI, as in section 5.1.3 of RFC 3986, paying particular attention to the correct treatment of HTTP redirects.

If no retrieval URI is available, then an application default URI is used, as in section 5.1.4. In many applications of GRDDL, the possibility of GRDDL results that depend on the application default URI is highly undesirable; GRDDL aware agents may choose to treat this case as an error.

Note: a GRDDL aware agent ignores xml:base attributes that do not appear on the root element.
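The order of precedence described in this section might be sketched as follows. The names choose_base and NoBaseIRIError are hypothetical, the embedded value is assumed to have been extracted and made absolute already, and the section 5.1.2 case is left out because it lies outside the scope of this advice. Treating the application default as an error is modelled by the strict flag.

```python
class NoBaseIRIError(ValueError):
    """Raised when only an application default base would be available."""


def choose_base(embedded=None, retrieval=None, default=None, strict=True):
    """Pick the base URI of the source document (illustrative only).

    embedded:  absolute base from the HTML base element or the root xml:base
    retrieval: last URI used to retrieve the document, after redirects
    default:   application default URI (section 5.1.4)
    strict:    if True, refuse to fall back to the application default
    """
    if embedded is not None:
        return embedded      # section 5.1.1
    if retrieval is not None:
        return retrieval     # section 5.1.3
    if strict or default is None:
        raise NoBaseIRIError("no embedded base and no retrieval URI")
    return default           # section 5.1.4
```

An agent that prefers to tolerate the application default can pass strict=False; the stricter behaviour is the default in this sketch only because the text above describes dependence on the default URI as highly undesirable.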

GRDDL transformation authors

In general, when writing a GRDDL transformation from an XHTML family document to RDF/XML, the best advice is to ignore issues to do with the base URI. The easiest approach is to produce relative URIs in the output, corresponding to any relative URIs in the input, and absolute URIs corresponding to any concepts built into the transform. Such relative URIs will be resolved, during the processing performed by a GRDDL aware agent, against the correct base URI.

When writing a GRDDL transformation for an XML document format that does not support xml:base, and has no means to represent an in-line base URI, there is little choice but to ignore issues of the correct base.

When writing a GRDDL transformation for an XML document format, other than an XHTML family document, that does not support xml:base, but has some other means to represent an in-line base URI, then a GRDDL aware agent will be ignorant of this means, and a well-written GRDDL transformation will attempt to correct for this. When a base URI is specified in such a way, one approach is to insert the base URI into the RDF/XML output as the value of an xml:base attribute, so that the RDF/XML parser will resolve relative URIs against that base, and ignore the base URI passed by the GRDDL aware agent, which will have been computed ignoring the conventions specific to this format.
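The effect of inserting such a base into the output can be shown outside XSLT with a small Python sketch. It is not a GRDDL transformation: the function name add_output_base is hypothetical, and inline_base stands for whatever base URI the format-specific convention yields.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def add_output_base(rdfxml_text, inline_base):
    """Put a format-specific base URI on the root of an RDF/XML result.

    (Illustrative helper; inline_base is assumed to be an absolute URI.)
    """
    # Keep the conventional rdf: prefix when the tree is serialized again.
    ET.register_namespace("rdf", RDF_NS)
    root = ET.fromstring(rdfxml_text)
    # An RDF/XML parser resolves relative URIs against this value in
    # preference to the base URI passed in by the GRDDL aware agent.
    root.set(XML_BASE, inline_base)
    return ET.tostring(root, encoding="unicode")
```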

When writing a GRDDL transformation for an XML document format that does support xml:base, it must be remembered that a GRDDL aware agent has responsibility for handling an xml:base on the root element. If there is such an xml:base attribute, then the simplest behaviour for a GRDDL transformation is to ignore it.

However, other xml:base attributes, not on the root element, are the responsibility of the transform, since the GRDDL aware agent ignores these. Thus, these lower level xml:base attributes should be honored, most easily by copying them into the output graph in the appropriate place. However, in general, xml:base attributes on ancestor nodes also have to be taken into account, unless there is an intervening xml:base attribute with an absolute URI as its value. This is clearly non-trivial to get right: to assist, the GRDDL library provides a module to be imported into your stylesheet, see below.
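To show why ancestor attributes matter, the following Python sketch computes the effective base of every element by passing the inherited base down the tree, as XML Base prescribes. It is not the GRDDL library module, which is written in XSLT; the generator name effective_bases is an assumption made for this illustration.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def effective_bases(elem, inherited_base):
    """Yield (element, effective base) pairs in document order.

    inherited_base is the base in force for the parent of elem; for the
    root element this is the base URI of the document entity.
    """
    own = elem.get(XML_BASE)
    # A relative xml:base is resolved against the inherited base; an
    # absolute one replaces it, cutting off all ancestors above.
    base = urljoin(inherited_base, own) if own is not None else inherited_base
    yield elem, base
    for child in elem:
        yield from effective_bases(child, base)
```

For example, an element with xml:base="c.xml" nested inside an element with xml:base="b/" under a root with xml:base="http://example.org/a/" has the effective base http://example.org/a/b/c.xml.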

In all cases, while often unnecessary, if a transform is aware of an absolute base URI, specified in its input, for the whole document, it is never incorrect to use this base URI as the base URI for the output, for example, by adding an appropriate xml:base attribute to the rdf:RDF element.

Transforms that do this need to guard against the possibly incorrect similar treatment of relative base URIs. For example, an xml:base=".." attribute on the root element might, in the interaction between a correct GRDDL aware agent and a poorly written transform, be applied twice, resulting in relative references being resolved at the wrong level in the directory hierarchy.
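The double application can be worked through with Python's urljoin, which implements the reference resolution of RFC 3986. The URIs below are invented for the example.

```python
from urllib.parse import urljoin

retrieval_uri = "http://example.org/a/b/c/doc.xml"  # invented example URI
root_xml_base = ".."                                # relative base on the root

# A correct GRDDL aware agent applies the root xml:base once.
agent_base = urljoin(retrieval_uri, root_xml_base)
correct = urljoin(agent_base, "data.rdf")

# A poorly written transform copies xml:base=".." onto rdf:RDF, so the
# RDF/XML parser applies it a second time on top of the agent's base.
doubled_base = urljoin(agent_base, root_xml_base)
wrong = urljoin(doubled_base, "data.rdf")

print(agent_base)    # http://example.org/a/b/
print(correct)       # http://example.org/a/b/data.rdf
print(doubled_base)  # http://example.org/a/
print(wrong)         # http://example.org/a/data.rdf
```

The relative reference data.rdf ends up one directory too high when the relative base is applied twice.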

Library support

The library includes a module @@@{write, link test, etc.} xml-attributes, which is intended to be imported into other XSL transforms, such as glean-profile.

This library includes three named templates that generate attributes in the XML namespace.

<xsl:template name="xattrs:base" xmlns:xattrs="http://www.s3.org/2003/g/xml-attributes">
This template generates an appropriate xml:base attribute.
<xsl:template name="xattrs:lang" xmlns:xattrs="http://www.s3.org/2003/g/xml-attributes">
This template generates an appropriate xml:lang attribute.
<xsl:template name="xattrs:both" xmlns:xattrs="http://www.s3.org/2003/g/xml-attributes">
This template generates appropriate xml:base and xml:lang attributes.

Each of these named templates may omit attributes when no relevant information is present in the input.

Each of these named templates has a single optional parameter node, defaulting to the current @@@{check terminology} node. The generated attributes are appropriate with respect to that node.

It is an error to call any of these templates at a point where it would not be legal to use an <xsl:attribute> element.

If your stylesheet is also generating other attributes for the current output node @@@@{fix terminology here}, then the ordering rules of XSLT should be borne in mind, since they determine whether the attributes generated by these templates or other attributes generated by your stylesheet take precedence. The ordering rules are @@@@{what are they?}.

Examples of use of Library code

@@@todo

