The presentation of this document has been augmented to identify changes from a previous version. Three kinds of changes are highlighted: new, added text, changed text, and deleted text.
Copyright © 2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document describes a facility, similar to that of HTML BASE, for defining base URIs for parts of XML documents.
This document is an editors' copy that has no official standing.
This is a public Editor's Draft, published to encourage review of the proposed changes to this document. Since the changes are in the nature of corrections for errata, after public review and possible further modifications in light of comments received, the group expects to request publication as a PER.
This revision has been produced by the W3C XML Core Working Group as part of the XML Activity in the W3C Architecture Domain. For background on this work, please see the XML Activity Statement.
Please send comments on this document to the public email list www-xml-linking-comments@w3.org (archive at http://lists.w3.org/Archives/Public/www-xml-linking-comments/). Please indicate in the subject line that your comment refers to XML Base.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
1 Editor's Notes
2 Introduction
3 Terminology
4 xml:base Attribute
4.1 URI Reference Encoding and Escaping
5 Resolving Relative URIs
5.1 Relation to RFC 3986
5.2 Granularity of base URI information
5.3 Matching URIs with base URIs
5.4 Interpretation of same-document references
6 Conformance
A References
B References (Non-Normative)
C Impacts on Other Standards (Non-Normative)
The changes planned in this revision are:
switch the definition of URI reference from RFC2396 to 3986;
redescribe xml:base attributes as XML resource identifiers (a new term introduced in XLink 1.1);
encourage implementations to return base "URIs" without escaping non-URI characters;
clarify the meaning of xml:base="" and xml:base="#frag";
incorporate the existing errata (see http://www.w3.org/2001/06/xmlbase-errata).
The XML Linking Language [XLink] defines Extensible Markup Language (XML) 1.0 [XML] constructs to describe links between resources. One of the stated requirements on XLink is to support HTML [HTML 4.01] linking constructs in a generic way. The HTML BASE element is one such construct which the XLink Working Group has considered. BASE allows authors to explicitly specify a document's base URI for the purpose of resolving relative URIs in links to external images, applets, form-processing programs, style sheets, and so on.
This document describes a mechanism for providing base URI services to XLink, but as a modular specification so that other XML applications benefiting from additional control over relative URIs but not built upon XLink can also make use of it. The syntax consists of a single XML attribute named xml:base.
The deployment of XML Base is through normative reference by new specifications, for example XLink and the XML Infoset. Applications and specifications built upon these new technologies will natively support XML Base. The behavior of xml:base attributes in applications based on specifications that do not have direct or indirect normative reference to XML Base is undefined.
[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]
The terms base URI and relative URI are used in this specification as they are defined in [RFC 3986].
xml:base Attribute
The attribute xml:base may be inserted in XML documents to specify a base URI other than the base URI of the document or external entity. The value of this attribute is interpreted as an XML Resource Identifier as defined in XLink 1.1 [XLink11].
In namespace-aware XML processors, the "xml" prefix is bound to the namespace name http://www.w3.org/XML/1998/namespace as described in Namespaces in XML [XML Names]. Note that xml:base can still be used by non-namespace-aware processors.
An example of xml:base in a simple document containing XLinks follows. XLink normatively references XML Base for interpretation of relative URI references in xlink:href attributes.
<?xml version="1.0"?>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>See <link xlink:type="simple" xlink:href="new.xml">what's new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>
The URIs in this example resolve to full URIs as follows:
"what's new" resolves to the URI "http://example.org/today/new.xml"
"Hot Pick #1" resolves to the URI "http://example.org/hotpicks/pick1.xml"
"Hot Pick #2" resolves to the URI "http://example.org/hotpicks/pick2.xml"
"Hot Pick #3" resolves to the URI "http://example.org/hotpicks/pick3.xml"
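The resolutions listed above can be reproduced with a general-purpose URI resolver. The following is a minimal sketch, assuming Python's standard urllib.parse.urljoin as the resolver; the variable names are illustrative and not part of this specification.

```python
from urllib.parse import urljoin

# Base URI set by xml:base on the <doc> element.
doc_base = "http://example.org/today/"

# The xml:base on <olist> is itself relative, so it is resolved
# against the base URI already in scope from <doc>.
olist_base = urljoin(doc_base, "/hotpicks/")

resolved = {
    "what's new": urljoin(doc_base, "new.xml"),
    "Hot Pick #1": urljoin(olist_base, "pick1.xml"),
    "Hot Pick #2": urljoin(olist_base, "pick2.xml"),
    "Hot Pick #3": urljoin(olist_base, "pick3.xml"),
}

for label, uri in resolved.items():
    print(f"{label}: {uri}")
```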
The set of characters allowed in xml:base attributes is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.
The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [RFC2396], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC 2732]. Disallowed characters must be escaped as follows:
Each disallowed character is converted to UTF-8 [RFC 2279] as one or more bytes.
Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.
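The three escaping steps above can be sketched as follows. This is an illustration only: the function name escape_disallowed is hypothetical, and the set of disallowed ASCII characters is an assumption based on a reading of the excluded characters in Section 2.4 of [RFC2396] (control characters, space, and the delimiter and "unwise" characters), with the number sign, percent sign, and square brackets left untouched.

```python
# Assumed set of disallowed printable ASCII characters: space, the
# RFC 2396 delimiters < > " and the "unwise" characters { } | \ ^ `
# (the characters # % [ ] are deliberately not in this set).
DISALLOWED_ASCII = set(' <>"{}|\\^`')


def escape_disallowed(value: str) -> str:
    """Percent-escape disallowed characters via their UTF-8 bytes."""
    out = []
    for ch in value:
        code = ord(ch)
        is_control = code < 0x20 or code == 0x7F
        is_non_ascii = code > 0x7F
        if is_control or is_non_ascii or ch in DISALLOWED_ASCII:
            # Step 1: convert the character to UTF-8 bytes.
            # Step 2: write each byte as %HH.
            # Step 3: the sequence replaces the original character.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


print(escape_disallowed("http://example.org/résumé dir/#frag"))
```

Characters that are already legal in a URI reference, including an existing %HH escape, pass through unchanged in this sketch.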
The value of an xml:base attribute is an XML Resource Identifier, and may contain characters not allowed in URIs. These characters must be escaped as described in [XLink11] before the value is used for retrieval of a resource. In accordance with the principle that this escaping must occur as late as possible in the processing chain, applications which provide access to the base URI of an element should calculate and return the value without escaping.
RFC 3986 [RFC 3986] provides for base URI information to be embedded within a document. The rules for determining the base URI can be summarized as follows (highest priority to lowest):
The base URI is embedded in the document's content.
The base URI is that of the encapsulating entity (message, document, or none).
The base URI is the URI used to retrieve the entity.
The base URI is defined by the context of the application.
Note:
The term "entity" in points #2 and #3 above uses the RFC 3986 meaning of the term. Elsewhere in this document the term "entity" is used in the XML sense.
This document specifies the details of rule #1 for embedding base URI information in the specific case of XML documents.
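The priority order of the four rules can be read as a simple fallback chain. The sketch below is illustrative only; the function select_base_uri and its parameter names are hypothetical, and each argument stands for a base URI that the corresponding rule may or may not supply.

```python
from typing import Optional


def select_base_uri(
    embedded: Optional[str] = None,              # rule 1: embedded in content
    encapsulating_entity: Optional[str] = None,  # rule 2: encapsulating entity
    retrieval_uri: Optional[str] = None,         # rule 3: URI used for retrieval
    application_default: Optional[str] = None,   # rule 4: application context
) -> Optional[str]:
    """Return the first available base URI, highest priority first."""
    for candidate in (embedded, encapsulating_entity,
                      retrieval_uri, application_default):
        if candidate is not None:
            return candidate
    return None


# With no embedded base URI, the retrieval URI wins over the default.
print(select_base_uri(retrieval_uri="http://example.org/doc.xml",
                      application_default="http://example.net/"))
```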
Relative URIs appearing in an XML document are always resolved relative to either an element, a document entity, or an external entity. There is no provision for finer granularity, such as per-attribute, per-character, or per-entity base information. Neither internal entities, whether declared in the internal subset or in an external DTD, nor freestanding text (text not enclosed in an element) in an external entity, are considered to set a base URI separate from the base URI in scope for the entity reference.
The base URI of a document entity or an external entity is determined by RFC 3986 rules, namely, that the base URI is the URI used to retrieve the document entity or external entity.
The base URI of an element is:
the base URI specified by an xml:base attribute on the element, if one exists, otherwise
the base URI of the element's parent element within the document entity or external entity, if one exists, otherwise
the base URI of the document entity or external entity containing the element.
The base URI of an element bearing an xml:base attribute with a value that is not a valid XML Resource Identifier is application dependent.
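The inheritance rule for elements can be sketched as a tree walk. The example below is a simplified illustration, assuming Python's xml.etree.ElementTree and urllib.parse.urljoin; the function element_base_uris and the document URI http://example.com/library.xml are hypothetical. It does not track external entity boundaries, which ElementTree does not expose, and it does not check whether an xml:base value is a valid XML Resource Identifier.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace name.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def element_base_uris(root, document_base):
    """Map every element in the tree to its base URI."""
    result = {}

    def visit(element, inherited):
        value = element.get(XML_BASE)
        # An xml:base value is resolved against the base URI in scope
        # from the parent; without one, the parent's base is inherited.
        base = urljoin(inherited, value) if value is not None else inherited
        result[element] = base
        for child in element:
            visit(child, base)

    visit(root, document_base)
    return result


doc = ET.fromstring(
    '<doc xml:base="http://example.org/today/">'
    '<body><olist xml:base="/hotpicks/"><item/></olist></body>'
    '</doc>'
)
bases = element_base_uris(doc, "http://example.com/library.xml")
for element, base in bases.items():
    print(element.tag, base)
```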
The base URI corresponding to a given relative URI appearing in an XML document is determined as follows:
The base URI for a URI reference appearing in text content is the base URI of the element containing the text.
The base URI for a URI reference appearing in an xml:base attribute is the base URI of the parent element of the element bearing the xml:base attribute, if one exists within the document entity or external entity, otherwise the base URI of the document entity or external entity containing the element.
The base URI for a URI reference appearing in any other attribute value, including default attribute values, is the base URI of the element bearing the attribute.
The base URI for a URI reference appearing in the content of a processing instruction is the base URI of the parent element of the processing instruction, if one exists within the document entity or external entity, otherwise the base URI of the document entity or external entity containing the processing instruction.
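The difference between the xml:base attribute and any other attribute on the same element can be shown with a short calculation. This is a sketch assuming urllib.parse.urljoin as the resolver; the attribute value "list.xml" is a hypothetical addition to the olist element of the earlier example.

```python
from urllib.parse import urljoin

# Base URI of the parent of an element such as
# <olist xml:base="/hotpicks/" href="list.xml">.
parent_base = "http://example.org/today/"

# The xml:base value is resolved against the PARENT's base URI.
element_base = urljoin(parent_base, "/hotpicks/")

# Any other attribute on the element is resolved against the
# element's OWN base URI, i.e. the result of the step above.
other_attribute = urljoin(element_base, "list.xml")

# A URI reference in the element's text content uses the same base.
text_reference = urljoin(element_base, "pick1.xml")

print(element_base)
print(other_attribute)
print(text_reference)
```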
Note:
The presence of xml:base attributes might lead to unexpected results in the case where the attribute value is provided, not directly in the XML document entity, but via a default attribute declared in an external entity. Such declarations might not be read by software which is based on a non-validating XML processor. Many XML applications fail to require validating processors. For correct operation with such applications, xml:base values should be provided either directly or via default attributes declared in the internal subset of the DTD.
Note:
The presence of xml:base attributes might lead to unexpected results in the case where the attribute value is provided, not directly in the XML document entity, but via a default attribute. For instance, such a declaration in an external entity might not be read by software which is based on a non-validating XML processor. Defaulting attributes through an external mechanism such as XML Schema may also lead to unexpected results; even if a validating processor is used by the application, the addition of defaulted attributes subsequent to creation of the infoset can cause xml:base attributes to get out of sync with the [base URI] infoset property. For these reasons, xml:base values should be provided either directly in the XML document instance or via default attributes declared in the internal subset of the DTD.
RFC 3986 defines certain relative URI references, in particular the empty string and those of the form #fragment, as same-document references. Dereferencing of same-document references is handled specially. However, their use as the value of an xml:base attribute does not involve dereferencing, and XML Base processors should resolve them in the usual way. In particular, xml:base="" does not reset the base URI to that of the containing document.
Note:
Some existing processors do treat these xml:base values as resetting the base URI to that of the containing document, so the use of such values is strongly discouraged.
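The intended treatment of same-document references can be illustrated with ordinary resolution. The sketch below assumes urllib.parse.urljoin as the resolver and uses hypothetical URIs: a document retrieved from http://example.org/doc.xml, an outer element that sets xml:base to http://example.com/other/, and inner elements bearing xml:base="" and xml:base="#frag".

```python
from urllib.parse import urljoin

document_base = "http://example.org/doc.xml"

# Outer element: <outer xml:base="http://example.com/other/">
outer_base = urljoin(document_base, "http://example.com/other/")

# Inner element with xml:base="": resolved in the usual way against
# the base URI in scope, so the outer base is kept and the base URI
# is NOT reset to that of the containing document.
empty_base = urljoin(outer_base, "")

# Inner element with xml:base="#frag": likewise resolved against the
# base URI in scope rather than against the document URI.
fragment_base = urljoin(outer_base, "#frag")

print(empty_base)
print(fragment_base)
```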
XML Base defines a mechanism for embedding base URI information within an XML document. It does not define a mechanism to recognize which content or attribute values might contain URIs. This is only known by the specifications or applications assigning semantics to the vocabulary.
It is the intention of XML Base that future specifications and revisions of XML vocabularies identify which parts of the XML document are considered to be URIs, and provide normative reference to this specification in order to ensure that relative URIs are treated consistently across XML documents.
The impacts of XML Base on other standards (as of the publication date of this document) are described below.
XML 1.0 [XML] uses URI references in the system identifiers for external entities. Since these declarations appear outside of the document element (in an internal subset or external DTD), the scoping rules for xml:base prevent these URIs from being affected by the value of xml:base.
The XML Infoset [XML Infoset] defines the base URI property of element information items. The latest Infoset specification supports XML Base for purposes of determining the value of this property. Interfaces, applications, and specifications referencing this infoset property will support XML Base natively.
Namespaces in XML [XML Names] uses URI references, which as currently defined should not be resolved relative to the base URI defined by xml:base for the purposes of namespace identification. Higher level processes which dereference namespace URIs are not covered by the namespaces specification and might at their option specify that xml:base is honored for the purposes of fetching resources at those URIs.
The XPath [XPath] data model preserves neither base URI information nor the boundaries of external entities and thus is insufficient to support correct resolution of relative URI references within these entities. This includes relative URI references in xml:base attributes.
The XSLT [XSLT] extensions to the XPath data model do provide for base URI information to be retained, but define this information in a way that precludes support for XML Base. Future XSLT versions might want to require support for XML Base.
XML Schema Part 2: Datatypes [XML Datatypes] defines a uriReference primitive datatype. The XML Datatypes specification might want to require that applications recognizing this datatype and resolving such URIs be aware of XML Base.
The XLink [XLink] specification requires support for XML Base.
XHTML [XHTML] uses URI references beyond those expressible in XLink. These URI references might be resolved by an application relative to the base URI defined by XML Base. The XHTML specification might want to describe its level of support for XML Base.