XML Base

W3C Working Draft 07-June-2000

This version:
Latest version:
Previous version:
Editor:
Jonathan Marsh (Microsoft) <jmarsh@microsoft.com>


This document proposes a facility, similar to that of HTML BASE, for defining base URIs for parts of XML documents.

Status of this document

The XML Linking Working Group, with this 07 June 2000 XML Base second Last Call working draft, invites comment on this specification. Because of the number and nature of comments received, a second Last Call Working Draft is considered prudent. See the XML Base Disposition of Comments. The Last Call period begins 7 June and ends 28 June 2000.

The W3C Membership and other interested parties are invited to review the specification and report early implementation experience. Please send comments to www-xml-linking-comments@w3.org (archive).

While we welcome implementation experience reports, the XML Linking Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.

For background on this work, please see the XML Activity Statement.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Table of Contents

1. Introduction
2. Terminology
3. xml:base Attribute
4. Resolving Relative URIs
  4.1. URI Reference Encoding and Escaping


A. References
B. References (Non-Normative)
C. Impacts on Other Standards (Non-Normative)

1. Introduction

The XML Linking Language [XLink] defines XML constructs to describe links between resources. One of the stated requirements on XLink is to support HTML [HTML 4.01] linking constructs in a generic way. The HTML BASE element is one such construct which the XLink Working Group has considered. BASE allows authors to explicitly specify a document's base URI for the purpose of resolving relative URIs in links to external images, applets, form-processing programs, style sheets, and so on.

This document describes a mechanism for providing base URI services to XLink, but as a modular specification so that other XML applications benefiting from additional control over relative URIs but not built upon XLink can also make use of it. The syntax consists of a single XML attribute named xml:base.

The deployment of XML Base is through normative references by new specifications, for example XLink and the XML Infoset. Applications and specifications built upon these technologies will natively support XML Base.

2. Terminology

The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [IETF RFC 2119].

The terms base URI and relative URI are used in this specification as they are defined in [IETF RFC 2396].

3. xml:base Attribute

The attribute xml:base may be inserted in XML documents to specify a base URI other than the base URI of the document or external entity. The value of this attribute is interpreted as a URI Reference as defined in RFC 2396 [IETF RFC 2396], after processing according to Section 4.1.

In namespace-aware XML processors, the "xml" prefix is bound to the namespace name http://www.w3.org/XML/1998/namespace as described in Namespaces in XML [XML Names]. Note that xml:base can still be used by non-namespace-aware processors.

An example of xml:base in a simple XHTML document follows.

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>See <a href="new.xml">what's new</a>!</p>
    <p>Check out the hot picks of the day!</p>
    <ol xml:base="/hotpicks/">
      <li><a href="pick1.xml">Hot Pick #1</a></li>
      <li><a href="pick2.xml">Hot Pick #2</a></li>
      <li><a href="pick3.xml">Hot Pick #3</a></li>
    </ol>
  </body>
</html>

The URIs in this example resolve to full URIs thus:
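As an illustration only, the resolution of the links in the example can be sketched in Python. The base URI http://example.org/library/index.xml assumed to be in scope for the html element is hypothetical (the example does not state one), and urllib.parse.urljoin is used as a stand-in for the resolution algorithm of [IETF RFC 2396]; for simple references such as these the two are expected to agree.

```python
from urllib.parse import urljoin

# Hypothetical base URI in scope for the html element; this value is an
# assumption made for illustration and does not come from the specification.
DOCUMENT_BASE = "http://example.org/library/index.xml"

# The xml:base value on the ol element is itself a relative reference,
# so it is first resolved against the base URI in scope outside that element.
ol_base = urljoin(DOCUMENT_BASE, "/hotpicks/")

resolved = {
    # Outside the ol element, only the assumed document base applies.
    "new.xml": urljoin(DOCUMENT_BASE, "new.xml"),
    # Inside the ol element, the xml:base attribute supplies the base.
    "pick1.xml": urljoin(ol_base, "pick1.xml"),
    "pick2.xml": urljoin(ol_base, "pick2.xml"),
    "pick3.xml": urljoin(ol_base, "pick3.xml"),
}

for reference, uri in resolved.items():
    print(f"{reference} -> {uri}")
```

Under that assumed base, "new.xml" resolves within the /library/ directory, while the three hot picks resolve under /hotpicks/ on the same host.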

4. Resolving Relative URIs

Relative URIs appearing in an XML document are resolved against xml:base attributes as follows:

In each case, if no ancestor with an xml:base attribute is found, the base is determined by applying RFC 2396 [IETF RFC 2396], which is summarized as follows (highest priority to lowest):

  1. The base URI is determined by the scope of xml:base attributes within the current XML entity, as described above.

  2. The base URI is that of the encapsulating entity (message, document, or none).

  3. The base URI is that of the URI used to retrieve the entity.

  4. The base URI is defined by the context of the application.

The scope of xml:base does not extend into external entities, but it does extend into internal entities.
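The scoping behaviour described above can be sketched as a tree walk that carries the inherited base URI downward. This is a minimal sketch, not a conforming implementation: the function name resolve_hrefs, the sample document, and the use of an unqualified href attribute as the place where relative URIs appear are all assumptions made for illustration. The entity_base argument stands for whichever of the lower-priority sources (encapsulating entity, retrieval URI, or application context) applies, and external entities are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the namespace name bound to "xml".
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_hrefs(root, entity_base):
    """Return (href, resolved URI) pairs for every element with an href.

    entity_base is the base URI used when no xml:base attribute is in scope.
    """
    results = []

    def walk(elem, inherited):
        value = elem.get(XML_BASE)
        # An xml:base value may itself be relative; resolve it against
        # the base URI inherited from the parent element.
        base = urljoin(inherited, value) if value is not None else inherited
        href = elem.get("href")
        if href is not None:
            results.append((href, urljoin(base, href)))
        for child in elem:
            walk(child, base)

    walk(root, entity_base)
    return results


# Hypothetical document used only to exercise the sketch.
DOC = """<doc xml:base="http://example.org/today/">
  <link href="new.xml"/>
  <list xml:base="/hotpicks/">
    <link href="pick1.xml"/>
  </list>
</doc>"""

links = resolve_hrefs(ET.fromstring(DOC), "http://example.org/fallback/doc.xml")
for href, uri in links:
    print(f"{href} -> {uri}")
```

Passing the inherited base as a parameter keeps the nearest enclosing xml:base in effect for each subtree without needing parent pointers, which ElementTree does not provide.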

NOTE: The presence of xml:base attributes might lead to unexpected results in the case where the attribute value is provided, not directly in the XML document entity, but via a default attribute declared in an external entity. Such declarations might not be read by software which is based on a non-validating XML processor. Many XML applications fail to require validating processors. For correct operation with such applications, xml:base values should be provided either directly or via default attributes declared in the internal subset of the DTD.

4.1. URI Reference Encoding and Escaping

The set of characters allowed in xml:base attributes is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.

The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the crosshatch (#) and percent sign (%) characters and the square bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped as follows:

  1. Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or more bytes.

  2. Any octets corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).

  3. The original character is replaced by the resulting character sequence.
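The three escaping steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name escape_uri_reference is hypothetical, and the set of disallowed ASCII characters is this sketch's reading of the excluded characters in Section 2.4 of [IETF RFC 2396] (control characters, space, and the delimiter and "unwise" characters), minus "#", "%", "[" and "]".

```python
# ASCII characters treated as disallowed in this sketch: controls
# (0x00-0x1F), space (0x20), DEL (0x7F), and the RFC 2396 delimiter and
# "unwise" characters other than '#', '%', '[' and ']'.
EXCLUDED_ASCII = set(range(0x00, 0x21)) | {0x7F} | {ord(c) for c in '<>"{}|\\^`'}


def escape_uri_reference(value: str) -> str:
    """Escape disallowed characters in an xml:base attribute value."""
    pieces = []
    for ch in value:
        code = ord(ch)
        if code > 0x7F or code in EXCLUDED_ASCII:
            # Step 1: convert the character to UTF-8.
            # Step 2: escape each resulting octet as %HH.
            # Step 3: the escaped sequence replaces the original character.
            pieces.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
        else:
            pieces.append(ch)
    return "".join(pieces)


print(escape_uri_reference("résumé dir/"))  # r%C3%A9sum%C3%A9%20dir/
print(escape_uri_reference("a#b%20[1]"))    # a#b%20[1] (unchanged)
```

Because "%" is left alone, a value that is already escaped passes through unchanged rather than being escaped a second time.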


A. References

IETF RFC 2119
RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Internet Engineering Task Force, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
IETF RFC 2279
RFC 2279: UTF-8, a transformation format of ISO 10646. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2279.txt.)
IETF RFC 2396
RFC 2396: Uniform Resource Identifiers. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
IETF RFC 2732
RFC 2732: Format for Literal IPv6 Addresses in URL's. Internet Engineering Task Force, 1999. (See http://www.ietf.org/rfc/rfc2732.txt.)
Unicode
The Unicode Consortium. The Unicode Standard. (See http://www.unicode.org/unicode/standard/standard.html.)
XML
Tim Bray, Jean Paoli, and C.M. Sperberg-McQueen, editors. Extensible Markup Language (XML) 1.0. World Wide Web Consortium, 1998. (See http://www.w3.org/TR/REC-xml.)
XML Names
Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. Textuality, Hewlett-Packard, and Microsoft. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names/.)

B. References (Non-Normative)

HTML 4.01
Dave Raggett, Arnaud Le Hors, Ian Jacobs, editors. HTML 4.01 Specification. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/html4/.)
XHTML
Steven Pemberton, et al. XHTML(TM) 1.0: The Extensible HyperText Markup Language. World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xhtml1/.)
XLink
Steve DeRose, Eve Maler, David Orchard, and Ben Trafford, editors. XML Linking Language (XLink). World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xlink/.)
XML Datatypes
Paul V. Biron, Ashok Malhotra, editors. XML Schema Part 2: Datatypes. World Wide Web Consortium Working Draft. (See http://www.w3.org/TR/xmlschema-2/.)
XML Infoset
John Cowan and David Megginson, editors. XML Information Set. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xml-infoset.)

C. Impacts on Other Standards (Non-Normative)