This technical note addresses some of the issues related to
inheritance of the XML attributes
xml:base and xml:id and the W3C
Recommendation for Canonical XML Version 1.0 [C14N10] (Errata). Shortcomings of C14N/1.0
are pointed out and the use of a new C14N/1.1 recommendation
with the XML Digital Signature 1.0 Recommendation [XMLDSIG] is discussed.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the W3C Working Group Note of "Known Issues with Canonical XML 1.0 (C14N/1.0)", produced by the XML Core Working Group, as part of the XML Activity. A companion note, "XML Digital Signatures in the 2006 XML Environment" [XMLDSIG2006], describes in further detail how a revised canonicalization algorithm (C14N/1.1 or other) may be used with the current XML-SIG/1.0 Specification.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
2. Interaction with XML Base
2.1 Inheriting xml:base values
2.2 Special values of xml:base
3. Interaction with XML Id
4. Implicit use of Canonical XML 1.0 by XML Signature
5. Further considerations for C14N/1.1
5.1 xml:base and URI reference simplification
5.2 An XML infoset strategy for canonicalizing XML base
Section 2.4 of the Canonical XML 1.0 [C14N10]
Specification defines special treatment for attributes in the
XML namespace when a representation of a document subset
is generated. The processing specified assumes that attributes in the
XML namespace are inherited by copying them from the nearest ancestor.
The inheritance rule given is appropriate for the processing of the
xml:lang attribute, but not for xml:base, which
needs a special inheritance mechanism, or for xml:id, which
should not be inherited at all [XML-BASE-Problem].
Related problems exist in the Decryption Transform for XML
Signature [XMLENCDEC] W3C
Recommendation, which applies a modified C14N/1.0 algorithm and
adds rules concerning the copying of attributes in the
xml namespace. These rules are based on the
same assumptions as their counterparts in C14N/1.0.
The XML Base Recommendation [XMLBASE] defines
the base URI of an element as the value of the element's
xml:base attribute, the base URI of the element's
parent element within the document or external entity, or the base
URI of the document entity or external entity containing the
element. In particular, the meaning of relative URI references in an
xml:base attribute can depend on the chain of
xml:base attributes along an element's ancestor axis.
The canonicalization of
xml:base requires a more
specific algorithm than just copying or inheriting the values of
xml:base attributes. The following cases must
be taken into account:
xml:base values may consist of only a fragment identifier (this is a no-op)
xml:base values may be empty (this is a no-op)
xml:base values may be absolute or relative URI references
Depending on the input node-set passed to Canonical XML, one can canonicalize either a whole document or a subset of the document's nodes. For example, in [XMLDSIG], one can use either XPointer to dereference only parts of a document, or the XPath Filter and XPath Filter 2.0 transforms to refer to a given fragment of the document that one wants to sign.
Consider the following XML document (document 1):
<?xml version="1.0"?> <a xml:lang="en"> <b xml:base="http://www.example.org/pathseg1/" xml:lang="de"> <c> </c> </b> </a>
Figure 1: Sample XML document 1
We now canonicalize document 1 with the input node-set of C14N/1.0 being
<c>. The element nodes along
<c>'s ancestor axis are examined for the first
occurrence of each attribute in the
xml namespace, and these are then
merged into the attribute list of <c>:
<?xml version="1.0"?> <c xml:base="http://www.example.org/pathseg1/" xml:lang="de"> </c>
Figure 2: Canonical form of sample XML document 1
The xml:base attribute on the <c/>
element in the canonicalized node-set indeed contains the base URI of the
<c/> element as present in document 1.
Up to now, there have been no problems with the simple duplication of
xml:base for maintaining the inheritance. However,
this is not always possible. Let's now consider the following XML
document (document 2):
<?xml version="1.0"?> <a xml:base="http://www.example.org/pathseg1/" xml:lang="en"> <b xml:base="../pathsegA/" xml:lang="de" > <c> </c> </b> </a>
Figure 3: Sample XML document 2
We now canonicalize document 2, the input node-set of C14N/1.0 being the element <c>:
<?xml version="1.0"?> <c xml:base="../pathsegA/" xml:lang="de"> </c>
Figure 4: Canonical form of sample XML document 2
In the case of
xml:lang, copying the parent's
attributes allowed the context to be retained. In the case of
xml:base, we have lost the context needed to resolve the
relative URI reference. Thus, for a given node-set, the application of the C14N/1.0
inheritance rule can lead to xml:base
attributes which specify a base URI that is different from the one
in the original document context.
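The loss of context in document 2 can be checked with a few lines of Python. This is only an illustrative sketch: it uses the standard library's urljoin as a stand-in for XML Base resolution, and HYPOTHETICAL_DOC_URI is an invented location for the canonicalized fragment, not something taken from this note.

```python
from urllib.parse import urljoin

# xml:base values found on <a> and <b> in sample document 2.
A_BASE = "http://www.example.org/pathseg1/"
B_BASE = "../pathsegA/"

# Base URI of <c> in the original document: resolve outermost first.
original_base = urljoin(A_BASE, B_BASE)

# After C14N/1.0, <c> carries only xml:base="../pathsegA/". The relative
# reference is then resolved against whatever base the consumer assumes.
# HYPOTHETICAL_DOC_URI is an assumption made purely for illustration.
HYPOTHETICAL_DOC_URI = "http://other.example.net/dir/sub/doc.xml"
canonical_base = urljoin(HYPOTHETICAL_DOC_URI, B_BASE)

print(original_base)
print(canonical_base)
```

The two printed values differ, which is the mismatch described above: the copied relative value no longer identifies the base URI that <c> had in its original context.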
C14N/1.0 also has issues in that it does not specify how to process
xml:base attributes that have no value or have values
that are a same-document reference (section 4.2 of [RFC
2396]). As indicated by
Roy Fielding and Richard Tobin, these should be treated as "do
nothing", that is, as a no-operation (no-op), in xml:base processing.
Consider the following document located at (
<?xml version="1.0"?> <a xml:base="http://www.example.org/pathseg1/"> <b xml:base="file.ext" xml:lang="de"> <c xml:base="" > <d xml:base="" href="file.ext#some-id1"> </d> <e xml:base="#some-fragment" href="file.ext#some-id2"> </e> </c> </b> </a>
Figure 5: Sample XML document 3
We now canonicalize document 3 with the input nodeset of C14N/1.0 being
<c> and all its descendants:
<?xml version="1.0"?> <c xml:base=""> <d xml:base="" href="#some-id1"> </d> <e xml:base="#some-fragment" href="#some-id2"> </e> </c>
Figure 6: Incorrect canonical form of sample XML document 3
As there already exists an
xml:base="" attribute in
<c>, C14N/1.0 rules won't let <c> inherit the xml:base value of its ancestors.
Let's now consider the case in which the node that has
xml:base="" is in the input node-set and
xml:base="" is treated as a no-operation (no-op).
According to the C14N/1.0 rules, we would need to copy the ancestor's xml:base
value that is not in the input-nodeset. However, this would not
The inheritance rules of the XML Base Recommendation [XMLBASE,
section 4] allow for successive use of relative references. Also,
such successive relative references may not be in the input node-set
and hence may not be rendered. So an inheritance rule for
xml:base would have to combine an element's xml:base value
with its omitted ancestors'
xml:base values. However, this
is not stated.
A correct canonicalization of element <c> and all its descendants that preserves the base URI from the original context would be as follows:
<?xml version="1.0"?> <c xml:base="http://www.example.org/pathseg1/file.ext" > <d href="file.ext#some-id1"> </d> <e href="file.ext#some-id2"> </e> </c>
Figure 7: Correct canonical form of sample XML document 3
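The combination of successive xml:base values, with empty and fragment-only values treated as no-ops, can be sketched as follows. This is a minimal illustration rather than a normative algorithm: the helper names is_noop and effective_base are hypothetical, and urljoin from the Python standard library is assumed to be an adequate approximation of the resolution rules.

```python
from urllib.parse import urljoin

def is_noop(value):
    """Empty and fragment-only xml:base values leave the base URI unchanged."""
    return value == "" or value.startswith("#")

def effective_base(xml_base_chain, document_uri=""):
    """Resolve a chain of xml:base values, outermost ancestor first."""
    base = document_uri
    for value in xml_base_chain:
        if is_noop(value):
            continue  # treated as a no-op, as discussed in the text
        base = urljoin(base, value)
    return base

# xml:base values on <a>, <b> and <c> in sample document 3.
chain_for_c = ["http://www.example.org/pathseg1/", "file.ext", ""]
# <e> additionally carries a fragment-only value.
chain_for_e = chain_for_c + ["#some-fragment"]

print(effective_base(chain_for_c))
print(effective_base(chain_for_e))
```

Both calls yield the value shown on <c> in Figure 7, because the omitted ancestors' values are folded in before the subset is serialized.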
The xml:id [XMLID] attribute is part of the
XML Information Set [XMLINFOSET]. It allows
any XML element to be associated with a unique identifier. Therefore, the
value of a given
xml:id attribute is unique within an XML document. The
xml:id Recommendation was issued after
Canonical XML 1.0 had become a Recommendation.
The recommended C14N/1.0 processing behavior that requires
inheritance of attributes by copying them from the nearest ancestor
can produce badly-formed documents with respect to the xml:id
Recommendation. Consider the following fragment of an XML document:
<a xml:id="id_a"> <b /> <c /> </a>
If we select the children of node
<a> and apply the C14N/1.0
processing rules, both node <b> and node
<c> would obtain a copy of the
xml:id attribute. This produces a badly-formed XML
document as two
xml:id attributes have the same value:
<b xml:id="id_a" /> <c xml:id="id_a" />
Note that even if only one element inherited the xml:id
attribute, the result would still be wrong: the
attribute value would be assigned to the wrong element. For example,
let's now select node
<b>. The C14N/1.0 processing would copy the
xml:id attribute value of <a> onto <b>:
<b xml:id="id_a" />
Therefore, C14N/1.0 cannot be applied to documents containing
xml:id attributes. Inheritance of any xml:id
attributes would produce a wrong or a badly-formed document.
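The duplication of xml:id values can be reproduced with a small simulation. The sketch below does not call a real canonicalizer; the helper inherited_xml_attrs is a hypothetical approximation of the C14N/1.0 rule that copies the nearest ancestor's attributes in the xml namespace onto an element whose ancestors are omitted.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = "{%s}id" % XML_NS

def inherited_xml_attrs(ancestors):
    """Collect xml:* attributes from omitted ancestors, nearest one winning.

    `ancestors` is ordered outermost first, so it is walked in reverse.
    """
    found = {}
    for ancestor in reversed(ancestors):
        for name, value in ancestor.attrib.items():
            if name.startswith("{%s}" % XML_NS) and name not in found:
                found[name] = value
    return found

root = ET.fromstring('<a xml:id="id_a"><b/><c/></a>')

# Select the children of <a>; <a> itself is not in the node-set.
resulting_ids = []
for child in root:
    attrs = inherited_xml_attrs([root])
    attrs.update(child.attrib)  # an element's own attributes take precedence
    resulting_ids.append(attrs.get(XML_ID))

print(resulting_ids)
```

Both selected elements end up with the identifier that belonged to <a>, so the serialized subset is no longer valid with respect to xml:id uniqueness.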
XML Signature [XMLDSIG] identifies the
canonicalization method by a URI inside
<ds:CanonicalizationMethod> at the
<ds:SignedInfo> level. More importantly, the same
is needed at the data object or <ds:Reference> level,
by using a
<ds:Transform> inside a
<ds:Reference>. In the latter case, if no such
<ds:Transform> is given at the data object level,
and if a node-set is subject to a transformation that requires an
octet stream or is to be hashed using the message digest, the XML
Signature Reference Processing Model uses Canonical XML C14N/1.0
implicitly to convert a node-set into an octet stream.
If applications require processing according to a particular version of Canonical XML, then they should explicitly give the appropriate algorithm URI. Specifically, the following cases must be taken into account:
insert an explicit <ds:Transform> invoking a new version of Canonical XML before each <ds:Transform> that requires an octet stream as input, but is applied to a node-set
if the previous transform outputs a node-set, append a <ds:Transform> invoking a new version of Canonical XML as the last <ds:Transform> before the digest is computed
use this URI inside <ds:CanonicalizationMethod>
Such an approach, however, will increase the size and the complexity of
XML digital signatures. Future versions of XML Signature [XMLDSIG] should consider the use of
<ds:CanonicalizationMethod> to specify a default
node-set to octet stream conversion method for the XML
Signature Reference Processing Model.
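As an illustration of appending an explicit canonicalization transform, the following sketch builds a <ds:Reference> whose last <ds:Transform> names a canonicalization algorithm. Several things here are assumptions not taken from this note: the helper name reference_with_explicit_c14n, the "#payload" reference URI, and the use of the algorithm identifier http://www.w3.org/2006/12/xml-c14n11, which is the one later assigned by the Canonical XML 1.1 Recommendation. The <ds:DigestValue> element is deliberately omitted.

```python
import xml.etree.ElementTree as ET

DS = "http://www.w3.org/2000/09/xmldsig#"
# Assumption: identifier defined by the later Canonical XML 1.1 Recommendation.
C14N11 = "http://www.w3.org/2006/12/xml-c14n11"
SHA1 = "http://www.w3.org/2000/09/xmldsig#sha1"

def reference_with_explicit_c14n(uri, digest_alg, earlier_transforms=(), c14n_alg=C14N11):
    """Build a ds:Reference whose last ds:Transform is an explicit canonicalization."""
    ref = ET.Element("{%s}Reference" % DS, {"URI": uri})
    transforms = ET.SubElement(ref, "{%s}Transforms" % DS)
    for alg in earlier_transforms:
        ET.SubElement(transforms, "{%s}Transform" % DS, {"Algorithm": alg})
    # Explicit node-set to octet-stream conversion, so that the implicit
    # C14N/1.0 step of the Reference Processing Model is never reached.
    ET.SubElement(transforms, "{%s}Transform" % DS, {"Algorithm": c14n_alg})
    ET.SubElement(ref, "{%s}DigestMethod" % DS, {"Algorithm": digest_alg})
    return ref

ET.register_namespace("ds", DS)
ref = reference_with_explicit_c14n("#payload", SHA1)
print(ET.tostring(ref, encoding="unicode"))
```

The extra element on every reference is the size and complexity cost mentioned above.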
One should also note that great care will have to be taken in
future signature creation, as every transform (including the digest)
that requires an octet stream as input but is applied to a node-set
will need to be preceded by a
<ds:Transform> invoking such a revised version of Canonical XML.
For further information, please refer to the companion note, "XML Digital Signatures in the 2006 XML Environment" [XMLDSIG2006], which describes in more detail how a revised canonicalization algorithm (C14N/1.1 or other) may be used with the current XML-SIG/1.0 Specification.
Inheritance rules will also have to be able to deal with relative
references having "./" and "../" segments appearing in the values of xml:base.
According to the rules laid down in the XML Base Recommendation [XMLBASE,
Section 4], relative references are resolved against the
xml:base attribute of the element or element's
ancestor. This implies that relative references are absolutized and
normalized as specified in [RFC 2396, Section
This operation can only be performed from the outermost to the
innermost relative reference. Thus, when defining an inheritance rule, there is no value in keeping dot
and dot-dot segments when fixing up relative reference values of
xml:base.
Some special considerations are needed. When normalizing a relative
URI reference, it is crucial to keep the leading "../" segments of
relative-path references. Otherwise, path-segments of ancestors'
xml:base URIs may not be removed appropriately. Another
issue is that one could create erroneous output that looks similar to
that of a network-path reference when normalizing an absolute-path reference.
For instance, an incorrect
Note: [RFC 3986, Section 4.2] defines the terms relative-path, network-path and absolute-path reference as used in this document.
The removal of dot-segments causes more logically equivalent documents to produce the same canonicalized output. Furthermore, XML Signatures [XMLDSIG] will benefit from such normalization as the likelihood of false negatives on signature validation decreases.
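The two pitfalls mentioned in this section, dropping leading "../" segments and producing something that reads as a network-path reference, can be illustrated with the standard library's urljoin. The values "pathsegA" and "pathsegB" and the file name are invented for the example, and urljoin is only assumed to approximate the resolution rules cited above.

```python
from urllib.parse import urljoin, urlsplit

ANCESTOR_BASE = "http://www.example.org/pathseg1/"

# Keeping the leading "../" lets the ancestor's last path segment be removed.
kept = urljoin(ANCESTOR_BASE, "../pathsegA/")
# Dropping it silently resolves to a different location.
dropped = urljoin(ANCESTOR_BASE, "pathsegA/")

# An absolute-path reference stays on the ancestor's authority ...
absolute_path = urljoin(ANCESTOR_BASE, "/pathsegB/file.ext")
# ... but a value starting with "//" is read as a network-path reference,
# so its first segment is interpreted as an authority.
network_path = urljoin(ANCESTOR_BASE, "//pathsegB/file.ext")

print(kept, dropped)
print(absolute_path, network_path)
```

A normalization step that accidentally emits a leading "//" therefore changes the authority of the resulting base URI, not just its path.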
As stated earlier in this note, the rules for the inheritance of
xml:base require many considerations. Another, more
straightforward approach would be to use a strategy based on the XML
infoset [C14N-INFOSET], namely: writing
EII for an element information item to be canonicalized, and
EIIC for the element information item corresponding to
EII in the result of parsing the canonical serialization of the node-set containing
EII, add an xml:base attribute to the serialization of EII if and only if
EIIC's [base URI] would otherwise be different from
EII's [base URI].
This has the advantage that not only does it correctly produce
<a xml:base="http://example.org"> <c xml:base="test/" /> </a>
from
<a xml:base="http://example.org"> <b xml:base="test/"> <c/> </b> </a>
when <b>...</b> is filtered out, but it will also correctly produce
<a xml:base="http://example.org"> <c xml:base="http://example.org/test/test/" /> </a>
from
<a xml:base="http://example.org"> <b xml:base="test/"> <c xml:base="test/" /> </b> </a>
when <b>...</b> is filtered out.
But we can't say it that way, because C14N as written does not use the infoset. Canonical XML is currently defined on the XPath data model.
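The infoset-based strategy of this section can be sketched as a comparison of two base URIs. The helper names resolve_chain and fixup_for are hypothetical, and the sketch makes one simplifying assumption: when a fix-up is needed it always emits the absolute base URI. In the first example above that gives "http://example.org/test/" instead of the relative "test/", which yields the same [base URI].

```python
from urllib.parse import urljoin

def resolve_chain(xml_base_values):
    """Resolve xml:base values, outermost first, into a single base URI."""
    base = ""
    for value in xml_base_values:
        base = urljoin(base, value)
    return base

def fixup_for(original_chain, retained_ancestor_chain, own_value=None):
    """Return the xml:base value to emit on an element, or None if none is needed.

    original_chain: xml:base values on the element's ancestors-or-self in
    the source document. retained_ancestor_chain: values on the ancestors
    that survive in the node-set. own_value: the element's own xml:base.
    """
    original = resolve_chain(original_chain)
    reparsed_chain = list(retained_ancestor_chain)
    if own_value is not None:
        reparsed_chain.append(own_value)
    reparsed = resolve_chain(reparsed_chain)
    # Emit a fix-up only if the reparsed base URI would otherwise differ.
    return None if reparsed == original else original

# Second example of this section: <b xml:base="test/"> is filtered out.
print(fixup_for(["http://example.org", "test/", "test/"],
                ["http://example.org"], own_value="test/"))
```

When <b> is retained the two base URIs agree and no attribute is added, which is the "if and only if" part of the rule.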