W3C

Canonical XML Version 2.0

W3C Working Draft 22 October 2009

This version:
http://www.w3.org/TR/2009/WD-xml-c14n2-20091022/
Latest version:
http://www.w3.org/TR/xml-c14n2/
Editors:
John Boyer, IBM (formerly PureEdge Solutions Inc., Version 1.0)
Glenn Marcy, IBM (Version 1.0)
Pratik Datta, Oracle
Frederick Hirsch, Nokia

Abstract

Canonical XML Version 2.0 is a major rewrite of Canonical XML Version 1.1 to address issues around performance, streaming, hardware implementation, robustness, minimizing attack surface, determining what is signed and more. It also incorporates an update to Exclusive Canonicalization, effectively a 2.0 version of it.

Any XML document is part of a set of XML documents that are logically equivalent within an application context, but which vary in physical representation based on syntactic changes permitted by XML 1.0 [XML] and Namespaces in XML 1.0 [Namespaces]. This specification describes a method for generating a physical representation, the canonical form, of an XML document that accounts for the permissible changes. Except for limitations regarding a few unusual cases, if two documents have the same canonical form, then the two documents are logically equivalent within the given application context. Note that two documents may have differing canonical forms yet still be equivalent in a given context based on application-specific equivalence rules for which no generalized XML specification could account.

Canonical XML Version 2.0 is applicable to XML 1.0. It is not defined for XML 1.1.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a First Public Working Draft of "Canonical XML Version 2.0".

This document is expected to be further updated based on both Working Group input and public comments. The Working Group anticipates eventually publishing a stabilized version of this document as a W3C Recommendation.

This version of the XML Canonicalization specification defines a version of the canonicalization algorithm that is applied to a less general class of possible nodesets than previous versions. The restrictions of this algorithm dovetail with the transform model defined in XML Signature 2.0. While less generic, we anticipate gains in terms of simplicity, lower attack surface, and streamability. We appreciate early comments on this general approach.

This document was developed by the XML Security Working Group.

Please send comments about this document to public-xmlsec-comments@w3.org (with public archive).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 Terminology
    1.2 Applications
    1.3 Limitations
    1.4 Requirements for 2.0
        1.4.1 Performance
        1.4.2 Streaming
        1.4.3 Robustness
        1.4.4 Simplicity
2 XML Canonicalization
    2.1 Data Model
    2.2 Parameters
    2.3 Processing Model for DOM
        2.3.1 XML Attribute Processing
            2.3.1.1 join-URI-References function
        2.3.2 Node Processing
        2.3.3 Namespace Processing
        2.3.4 Output rules
        2.3.5 Other ideas considered
    2.4 Processing model for Streaming XML parsers
3 References

Appendix

A Remove Dot Segments


1 Introduction

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [Keywords].

See [Namespaces] for the definition of QName.

document subset

A document subset is a portion of an XML document that may not include all of the nodes in the document.

canonical form

The canonical form of an XML document is the physical representation of the document produced by the method described in this specification.

canonical XML

The term canonical XML refers to XML that is in canonical form. The XML canonicalization method is the algorithm defined by this specification that generates the canonical form of a given XML document or document subset. The term XML canonicalization refers to the process of applying the XML canonicalization method to an XML document or document subset.

subtree

Subtree refers to one XML element node and everything it contains. In XPath terminology, it is an element node and all its descendant nodes.

1.2 Applications

Since the XML 1.0 Recommendation [XML] and the Namespaces in XML 1.0 Recommendation [Namespaces] define multiple syntactic methods for expressing the same information, XML applications tend to take liberties with changes that have no impact on the information content of the document. XML canonicalization is designed to be useful to applications that require the ability to test whether the information content of a document or document subset has been changed. This is done by comparing the canonical form of the original document before application processing with the canonical form of the document resulting from the application processing.

For example, a digital signature over the canonical form of an XML document or document subset would allow the signature digest calculations to be oblivious to changes in the original document's physical representation, provided that the changes are defined to be logically equivalent by XML 1.0 or Namespaces in XML 1.0. During signature generation, the digest is computed over the canonical form of the document. The document is then transferred to the relying party, which validates the signature by reading the document and computing a digest of the canonical form of the received document. The equivalence of the digests computed by the signing and relying parties (and hence the equivalence of the canonical forms over which they were computed) ensures that the information content of the document has not been altered since it was signed.

Note: Although not stated as a requirement on implementations, nor formally proved to be the case, it is the intent of this specification that if the text generated by canonicalizing a document according to this specification is itself parsed and canonicalized according to this specification, the text generated by the second canonicalization will be the same as that generated by the first canonicalization.

1.4 Requirements for 2.0

XML Canonicalization 2.0 solves most of the major issues that have been identified by implementers with Canonical XML 1.0 [C14N10] and 1.1 [C14N11].

1.4.1 Performance

Canonicalization with Canonical XML 1.1 is often a major factor in the performance issues noted in XML Signature. Canonicalization will be slow if an implementation applies the Canonical XML 1.1 specification literally, without any attempt at optimization. This specification rectifies this problem by incorporating lessons learned from implementations into the specification. Most mature C14N implementations solve the performance problem by first inspecting the signature to see whether it can be canonicalized using a simple tree-walk algorithm, whose performance is similar to that of regular XML serialization. If not, they fall back to the expensive nodeset-based algorithm.

The use cases that cannot be handled by the simple tree-walk algorithm are mostly edge cases. This specification restricts the input of the canonicalization algorithm so that implementations can always use the simple tree-walk algorithm.

C14N 1.x uses an "XPath 1.0 Nodeset" to describe a document subset. This is the root cause of the performance problem, and it can be solved by not using a nodeset. This version of the specification does not use a nodeset; it visits each node exactly once, and it only visits the nodes that are being canonicalized.

1.4.2 Streaming

A streaming implementation must be able to process very large documents without holding the whole document in memory, i.e. it should be able to process the document one chunk at a time.

1.4.3 Robustness

Whitespace handling was a common cause of signature breakage. XML libraries allow one to "pretty print" an XML document, and most people wrongly assume that the whitespace introduced by pretty printing will be removed by canonicalization, but that is not the case. This specification adds three techniques to improve robustness:

  1. Remove leading and trailing whitespace from text nodes,

  2. Allow for QNames in content especially in the xsi:type attribute,

  3. Rewrite prefixes

1.4.4 Simplicity

C14N 1.x algorithms are complex and depend on a full XPath library. This makes it very hard for scripting languages to use XML Signatures. This specification addresses this issue by not using the complex nodeset model, and therefore not relying completely on XPath. It also introduces a minimal canonicalization mode.

2 XML Canonicalization

2.1 Data Model

The input to the canonicalization algorithm consists of an XML document subset and a set of options. The XML document subset can be expressed in two ways: with a DOM model or with a stream model.

In a DOM model the XML subset is expressed as

  • either a whole document, or a list of one or more disjoint subtrees.

  • a list of exclusion subtrees or exclusion attribute nodes. (Note: this model purposely does not support re-inclusion, i.e. all the exclusions are applied after all the inclusions. This is therefore unlike the XPath Filter 2.0 model [XPath-Filter-2], where there is an ordered list of union, intersect and subtract operations.)

Note: Exclusion is very limited. Only complete subtrees and attribute nodes can be excluded; other kinds of nodes, such as text nodes, comment nodes and PI nodes, cannot be excluded. Even attribute exclusion has limitations: namespace declarations and attributes in the XML namespace cannot be excluded.

Note: This input model is a very limited form of the generic XPath nodeset that was the input model for Canonical XML 1.x. It is designed to be simple and to allow a high-performance algorithm, while still supporting the essential use cases. Specifically, this model does not allow these kinds of document subsets:

  • an attribute all by itself

  • an attribute in the document subset, without its owner element being also in the document subset

  • a text node all by itself

  • a text node in the document subset, without its parent element also being in the document subset

  • an element without some of its text node children
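The inclusion and exclusion rules above can be illustrated with a short Python sketch. It uses the standard-library xml.dom.minidom module as a stand-in DOM; the helper names can_exclude and in_subset are hypothetical and not defined by this specification. The sketch checks which nodes may appear in the exclusion list, and shows that an exclusion always wins over an inclusion, so a descendant of an excluded subtree cannot be re-included.

```python
from xml.dom import Node, minidom


def can_exclude(node) -> bool:
    """Return True if `node` is allowed in the exclusion list (sketch)."""
    if node.nodeType == Node.ELEMENT_NODE:
        return True  # a complete subtree can be excluded
    if node.nodeType == Node.ATTRIBUTE_NODE:
        name = node.name
        # Namespace declarations and xml:* attributes cannot be excluded.
        return not (name == "xmlns" or name.startswith(("xmlns:", "xml:")))
    return False  # text, comment and PI nodes cannot be excluded


def _contains(nodes, node) -> bool:
    # Compare by identity: we want this very node, not an equal-looking one.
    return any(n is node for n in nodes)


def in_subset(elem, subtree_roots, excluded) -> bool:
    """An element is included if it or an ancestor is a subtree root, and
    excluded if it or an ancestor is excluded. Exclusion always wins."""
    included = False
    node = elem
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        if _contains(excluded, node):
            return False
        if _contains(subtree_roots, node):
            included = True
        node = node.parentNode
    return included


doc = minidom.parseString("<r><a><b/></a><c/></r>")
r = doc.documentElement
a, c = r.firstChild, r.lastChild
b = a.firstChild
# b is listed as a subtree root, but it lies inside the excluded subtree a.
print(in_subset(b, [b], [a]))
```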

2.2 Parameters

Instead of separate algorithms for each variant of canonicalization, this specification uses a single algorithm whose behavior varies slightly depending on the following parameters.

Each parameter is listed with its allowed values and its default, followed by a description.

  • exclusiveMode (true or false; default false): whether namespaces are handled inclusively or exclusively. In exclusive mode, the inclusiveNamespacePrefixList parameter can be specified to list the prefixes that are to be treated inclusively.

  • inclusiveNamespacePrefixList (space-separated list of prefixes; default empty): list of prefixes to be treated inclusively. The special token #default indicates the default namespace.

  • ignoreComments (true or false; default true): whether to ignore comments during canonicalization.

  • trimTextNodes (true or false; default false): whether to trim all text nodes (i.e. remove leading and trailing whitespace) when canonicalizing. Adjacent text nodes must be coalesced prior to trimming. If an element has an xml:space="preserve" attribute, then text node descendants of that element are not trimmed, regardless of the value of this parameter.

  • serialization (XML or EXI; default XML): whether to do the normal XML serialization or an EXI serialization. EXI serialization is useful if the original document to be signed is already in EXI format.

  • prefixRewrite (none, sequential or derived; default none): with none, prefixes are not changed; with sequential, prefixes are changed to n0, n1, n2, ...; with derived, each prefix is changed to nSuffix, where the suffix is derived by computing a digest of the namespace URI.

  • sortAttributes (true or false; default true): whether the attributes need to be sorted before canonicalization. In some environments the order of attributes changes in transit, so sorting is important.

  • ignoreDTD (true or false; default false): if set to true, ignore the DTD completely, which means: do not normalize attributes, do not look into entity definitions, and do not add default attributes to each element.

  • expandEntities (true or false; default true): if set to true, ignore all entity declarations and expand only the predefined entities (lt, gt, amp, apos, quot) and character references. (Entity declarations are potential attack points: [BradHill] mentions an entity that is 2 GB in length, and expanding external entities can lead to cross-site scripting attacks.)

  • xmlBaseAncestors (inherit, none or combine; default combine): whether to inherit xml:base attributes from ancestors (like C14N 1.0), not inherit them (like Exc C14N 1.0), or combine them (like C14N 1.1).

  • xmlIdAncestors (inherit or none; default none): whether to inherit xml:id attributes from ancestors (like C14N 1.0) or not (like C14N 1.1 or Exc C14N 1.0).

  • xmlLangAncestors (inherit or none; default inherit): whether to inherit xml:lang attributes from ancestors (like C14N 1.0 and C14N 1.1) or not (like Exc C14N 1.0).

  • xmlSpaceAncestors (inherit or none; default inherit): whether to inherit xml:space attributes from ancestors (like C14N 1.0 and C14N 1.1) or not (like Exc C14N 1.0).

  • xsiTypeAware (true or false; default false): if set to true, also look for namespace prefix usages in xsi:type attribute values; otherwise xsi:type attributes are treated just like regular attributes.

With the default values, the output is identical to Canonical XML 1.1 without comments.
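A minimal Python sketch of how an implementation might hold these parameters. The class C14nParams and the helper parse_prefix_list are hypothetical; field names are snake_case forms of the parameter names above, and representing the #default token as the empty string is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class C14nParams:
    # Defaults are the ones listed above (Canonical XML 1.1 without comments).
    exclusive_mode: bool = False
    inclusive_namespace_prefix_list: frozenset[str] = frozenset()
    ignore_comments: bool = True
    trim_text_nodes: bool = False
    serialization: Literal["XML", "EXI"] = "XML"
    prefix_rewrite: Literal["none", "sequential", "derived"] = "none"
    sort_attributes: bool = True
    ignore_dtd: bool = False
    expand_entities: bool = True
    xml_base_ancestors: Literal["inherit", "none", "combine"] = "combine"
    xml_id_ancestors: Literal["inherit", "none"] = "none"
    xml_lang_ancestors: Literal["inherit", "none"] = "inherit"
    xml_space_ancestors: Literal["inherit", "none"] = "inherit"
    xsi_type_aware: bool = False


def parse_prefix_list(value: str) -> frozenset[str]:
    """Parse a space-separated inclusiveNamespacePrefixList.

    The token #default (the default namespace) is mapped to "" here.
    """
    return frozenset("" if tok == "#default" else tok for tok in value.split())


# Example: exclusive mode, treating the default namespace and "soap" inclusively.
params = C14nParams(exclusive_mode=True,
                    inclusive_namespace_prefix_list=parse_prefix_list("#default soap"))
```

Freezing the dataclass keeps one parameter set from being modified part-way through a canonicalization.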

Implementations are not required to support all possible combinations of these parameters. Instead, the parameters are grouped into "named parameter sets", and an implementation can choose to support one or more of these.

  • canonical-xml-1.1-nocomments: exclusiveMode=false, xsiTypeAware=false ...

    This produces exactly the same output as Canonical XML 1.1.

  • exclusive-canonical-xml-1.0-nocomments: exclusiveMode=true, xsiTypeAware=false ...

    This produces exactly the same output as Exclusive Canonical XML 1.0.

  • minimal-canonicalization: sortAttributes=false, ...

    Requires very little processing; intended for situations where the XML content is expected to remain mostly unchanged during transport.

2.3 Processing Model for DOM

The basic canonicalization process consists of traversing the tree and outputting octets for each node. The algorithm is presented here in pseudocode, using a recursive function to traverse the tree.

Sort the subtrees by document order, and then start processing each subtree.

canonicalize(list of subtrees, list of exclusion elements and attributes, parameters)
{
   put the exclusion elements and attributes in a hash table for easier lookup
   
   sort the multiple subtrees by document order
   
   for each subtree
      canonicalizeSubtree(subtree) 
}

Note: These subtrees should be distinct, i.e. no subtree should include any of the other subtrees. If that is not the case, ignore the included subtrees.

For the special case where the subtree is the whole document or the document element, start processing the node directly. Otherwise, collect the list of ancestors of the subtree root and look for namespace declarations in these ancestor nodes. Also look for any xml: attributes that need to be inherited, temporarily add them to the subtree root, and then start processing the subtree root.
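The ordering and nesting rule can be sketched in Python, again assuming xml.dom.minidom as the DOM. The helpers index_path and order_subtrees are hypothetical: each node is identified by its path of child indices from the document node, so sorting the paths gives document order, and a subtree is nested in another exactly when the other's path is a prefix of its own.

```python
from xml.dom import minidom


def index_path(node) -> tuple:
    """Child-index path from the document node down to `node`."""
    path = []
    while node.parentNode is not None:
        parent = node.parentNode
        path.append(parent.childNodes.index(node))
        node = parent
    return tuple(reversed(path))


def order_subtrees(roots):
    """Sort subtree roots into document order and drop nested ones."""
    kept, kept_paths = [], []
    for root in sorted(roots, key=index_path):
        path = index_path(root)
        last = kept_paths[-1] if kept_paths else None
        # In document order, a nested root always follows its enclosing root,
        # so comparing against the last kept root is sufficient.
        if last is not None and path[: len(last)] == last:
            continue
        kept.append(root)
        kept_paths.append(path)
    return kept


doc = minidom.parseString("<r><a><b/></a><c/></r>")
r = doc.documentElement
a, c = r.firstChild, r.lastChild
b = a.firstChild
ordered = order_subtrees([c, b, a])  # b is dropped because it is inside a
```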

canonicalizeSubtree(node)
{
   initialize namespaceContext to contain the default prefix, mapped
   to an empty URI, and hasBeenOutput to true 
   
   if (node is the document node or the document element)
   {
      // (whole document is being processed, no ancestors to worry about)
      call processNode(node, namespaceContext)
   }
   else
   {
      starting from node, walk up the tree to collect a list of
      its ancestor elements

      for each of these ancestor elements, starting with the document
      element, but not including node itself
        addNamespaces(ancestorElem, namespaceContext)

      initialize xmlattribContext to empty

      for each of these ancestor elements, starting with the document
      element, and also including node itself
        addXMLAttribute(ancestorElem, xmlattribContext)

      if there are any attributes in xmlattribContext
         temporarily add/replace these XML attributes in node

      processNode(node, namespaceContext)
          
      restore the original XML attributes
   }   
}
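Two steps of canonicalizeSubtree, collecting the ancestors in root-first order and temporarily placing inherited xml: attributes on the subtree root, might look as follows in Python with xml.dom.minidom. The function names are hypothetical; the context manager restores the original attributes even if processing raises an exception.

```python
from contextlib import contextmanager
from xml.dom import Node, minidom


def ancestors_root_first(node):
    """Ancestor elements of `node`, starting with the document element."""
    chain = []
    parent = node.parentNode
    while parent is not None and parent.nodeType == Node.ELEMENT_NODE:
        chain.append(parent)
        parent = parent.parentNode
    return list(reversed(chain))


@contextmanager
def inherited_xml_attributes(node, xml_attrs):
    """Temporarily add/replace xml:* attributes on `node`, then restore them."""
    saved = {name: node.getAttribute(name) if node.hasAttribute(name) else None
             for name in xml_attrs}
    try:
        for name, value in xml_attrs.items():
            node.setAttribute(name, value)
        yield node
    finally:
        for name, old in saved.items():
            if old is not None:
                node.setAttribute(name, old)      # put the original value back
            elif node.hasAttribute(name):
                node.removeAttribute(name)        # it was not there before
```

A caller would wrap its processNode call in `with inherited_xml_attributes(root, ctx):` so that the DOM is left unchanged afterwards.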

2.3.1 XML Attribute Processing

Special processing is required for the xml:id, xml:lang, xml:space and xml:base attributes. To process them, keep a hash table that maps attribute names to attribute values.

xmlattribContext is a hash table of  name -> value

While processing the ancestors of each subtree, these special XML attributes need to be inherited, combined or ignored, depending on the parameters.

addXMLAttribute(element, xmlattribContext)
{
   for each of the xml: attributes of this element
   {
      case xml:id attribute:
        if xmlIdAncestors is inherit then store this attribute value, else do nothing

      case xml:lang attribute:
        if xmlLangAncestors is inherit then store this attribute value, else do nothing

      case xml:space attribute:
        if xmlSpaceAncestors is inherit then store this attribute value, else do nothing

      case xml:base attribute:
        if xmlBaseAncestors is inherit then store this attribute value,
        else if xmlBaseAncestors is combine
           if there is a previous value of xml:base
              then store join-URI-References(old value, new value)
           else store this attribute value
        else do nothing
   } 
}
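A Python sketch of addXMLAttribute working on plain dictionaries. The name add_xml_attribute, the dictionary form of the parameters, and passing join-URI-References in as a callable are assumptions of this sketch. When xmlBaseAncestors is combine and no earlier xml:base value has been seen, the sketch simply stores the new value.

```python
def add_xml_attribute(element_attrs, ctx, params, join):
    """Fold one element's xml:* attributes into xmlattribContext (`ctx`).

    element_attrs: e.g. {"xml:lang": "en"} for one element.
    params: mapping holding the four xml*Ancestors parameter values.
    join: a join-URI-References implementation supplied by the caller.
    """
    policy = {
        "xml:id": params["xmlIdAncestors"],
        "xml:lang": params["xmlLangAncestors"],
        "xml:space": params["xmlSpaceAncestors"],
        "xml:base": params["xmlBaseAncestors"],
    }
    for name, value in element_attrs.items():
        mode = policy.get(name)
        if mode == "inherit":
            ctx[name] = value                    # a nearer ancestor overrides
        elif mode == "combine" and name in ctx:  # only xml:base allows combine
            ctx[name] = join(ctx[name], value)
        elif mode == "combine":
            ctx[name] = value                    # nothing to combine with yet
        # mode "none", or not one of the four special attributes: store nothing
```

Called once per ancestor, root first, this leaves the nearest value for inherited attributes and an accumulated value for a combined xml:base.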
2.3.1.1 join-URI-References function

The join-URI-References function takes the xml:base attribute values from all the ancestor elements and combines them to create a value for an updated xml:base attribute. A simple method for doing this is similar to that found in sections 5.2.1, 5.2.2 and 5.2.4 of RFC 3986 [URI], with the following modifications:

  • Perform RFC 3986 section 5.2.1. "Pre-parse the Base URI" modified as follows.

    • The scheme component is not required in the base URI (Base). (i.e. Base.scheme may be null)

    • Replace a trailing ".." segment with a "../" segment before processing.

  • Section 5.2.4. "Remove Dot Segments" is modified as follows:

    • Keep leading "../" segments

    • Replace multiple consecutive "/" characters with a single "/" character.

    • Append a "/" character to a trailing ".." segment

  • The "Remove Dot Segments" algorithm is modified to ensure that a combination of two xml:base attribute values that include relative path components (i.e., path components that do not begin with a '/' character) results in an attribute value that is a relative path component.

  • Perform RFC 3986 section 5.2.2. "Transform References" modified as follows to ignore the fragment part of R

    • After parsing R set R.fragment = null

The following examples illustrate the modification of the "Remove Dot Segments" algorithm:
  • "abc/" and "../" are combined as "abc/../" and the result is ""

  • "../" and "../" are combined as "../../" and the result is "../../"

  • ".." and ".." are combined as "../../" and the result is "../../"
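The modified algorithm can be sketched in Python for the simple case where both xml:base values are path-only references (no scheme, authority or query). This is not a full RFC 3986 resolver; remove_dot_segments and join_uri_references are hypothetical names, and the merge step follows RFC 3986 section 5.2.3.

```python
import re


def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4 with the modifications listed above (sketch)."""
    path = re.sub(r"/{2,}", "/", path)  # collapse consecutive "/" characters
    absolute = path.startswith("/")
    segments = [s for s in path.split("/") if s]
    # A trailing "/" survives, and a trailing "." or ".." also yields one.
    trailing_slash = path.endswith("/") or (bool(segments) and segments[-1] in (".", ".."))
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()          # ".." cancels the previous real segment
            elif not absolute:
                out.append("..")   # keep leading "../" in relative paths
            # in an absolute path, ".." above the root is dropped
        else:
            out.append(seg)
    result = ("/" if absolute else "") + "/".join(out)
    if out and trailing_slash:
        result += "/"
    return result


def join_uri_references(base: str, ref: str) -> str:
    """Combine two path-only xml:base values."""
    ref = ref.split("#", 1)[0]           # ignore the fragment part of R
    if base == ".." or base.endswith("/.."):
        base += "/"                      # trailing ".." becomes "../"
    if not ref:
        merged = base
    elif ref.startswith("/"):
        merged = ref
    else:
        merged = base[: base.rfind("/") + 1] + ref  # RFC 3986 section 5.2.3
    return remove_dot_segments(merged)
```

The three examples above correspond to `join_uri_references("abc/", "../")`, `join_uri_references("../", "../")` and `join_uri_references("..", "..")`.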

2.3.2 Node Processing

The following pseudocode uses a recursive function, processNode(node, namespaceContext), to traverse the tree.

  • Generic node: Redirect to appropriate node processing function

    processNode(node, namespaceContext)
    {
      call the appropriate function - processElement, processTextNode, ... 
    }
    
  • Document node: Loop through all the children

    processDocument(document, namespaceContext)
    {
      Loop through all child nodes and call
        processNode(child, namespaceContext)
    }
    
  • Element nodes: First check whether the element matches an exclusion node; if so, completely ignore this element and all its descendants. Otherwise, process the namespaces as described in the next section. This returns the list of namespaces to be output and also computes the rewritten prefix values. Then output the element start tag, followed by the list of namespace declarations and then the list of attributes. After that, loop through all the children, and then output the element end tag.

    processElement(element, namespaceContext)
    {
      if this exists in the exclusion hash table
        return
        
      make a copy of xmlattribContext and namespaceContext
      // (by copying, any changes made can be undone when this function returns)

      nsToBeOutputList = processNamespaces(element, namespaceContext)

      output('<')
      output(element QName)

      for each of the namespaces in the nsToBeOutputList
        output this namespace declaration

      if sortAttributes is true
        sort the non-namespace attributes by namespace URI first, then by local name
      output each of these attributes, skipping any in the exclusion hash table
    
      output('>')
      
      Loop through all child nodes and call
        processNode(child, namespaceContext)
      
      output('</')
      output(element QName)
      output('>')
      
      restore xmlattribContext and namespaceContext
    }
    

    Note: Take special care when the prefixRewrite parameter is not none. In that case, use the new prefix values in all QNames: element names, attribute names, and also QNames in xsi:type attributes.

  • Text nodes: Ignore text nodes outside the document element. For text nodes inside the document element, replace special characters. Also, if trimTextNodes is true and there is no xml:space="preserve" declaration in scope, trim leading and trailing whitespace.

    processTextNode(textNode)
    {
      if this text node is outside the document element
         return

      in the text replace
        all ampersands (&) by &amp;,
        all open angle brackets (<) by &lt;,
        all closing angle brackets (>) by &gt;,
        and all #xD characters by &#xD;.

      if trimTextNodes is true and there is no xml:space="preserve" declaration in scope
        trim leading and trailing whitespace
          
      output(text)
    }                                       
    

    Note: The DOM parser might have split a long text node into multiple adjacent text nodes, some of which may be empty. In that case, take care when trimming leading and trailing whitespace: the result should be the same as if the adjacent text nodes had been concatenated into one.

  • Processing Instruction (PI) nodes: Output the opening PI symbol (<?), the PI target name, a space, the string value of the node, and the closing PI symbol (?>). If the string value is empty, do not add the leading space. Also, output a trailing #xA after the closing PI symbol for PI children of the root node that are before the document element, and a leading #xA before the opening PI symbol for PI children of the root node that are after the document element.

    processPINode(piNode)
    {
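      if piNode is a child of the root node and is after the document element
        output(#xA)

      output('<?')
      output(the PI target name of the node)
      if the PI string value is not empty
        output(' ')
        output(the PI string value)
      output('?>')

      if piNode is a child of the root node and is before the document element
        output(#xA)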
    }                                        
    
  • Comment nodes: Output nothing if generating canonical XML without comments. For canonical XML with comments, generate the opening comment symbol (<!--), the string value of the node, and the closing comment symbol (-->). Also, output a trailing #xA after the closing comment symbol for comment children of the root node that are before the document element, and a leading #xA before the opening comment symbol for comment children of the root node that are after the document element. (Comment children of the root node represent comments outside of the top-level document element and outside of the document type declaration.)

    processCommentNode(commentNode)
    {
      if ignoreComments
        return
        
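      if commentNode is a child of the root node and is after the document element
        output(#xA)

      output('<!--')
      output(string value of node)
      output('-->')

      if commentNode is a child of the root node and is before the document element
        output(#xA)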
    }
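The character-level rules for text, PI and comment nodes can be sketched in Python. The function names and the position argument ("before", "inside" or "after" the document element) are hypothetical. Following processTextNode, adjacent text nodes are coalesced first, then escaped, then trimmed.

```python
def escape_text(text: str) -> str:
    # "&" must be replaced first so the later replacements are not re-escaped.
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\r", "&#xD;"))


def canonical_text(chunks, trim: bool, preserve: bool) -> str:
    """Serialize a run of adjacent text nodes (a DOM may split one run)."""
    text = escape_text("".join(chunks))  # coalesce before trimming
    if trim and not preserve:
        # Escaping happens first, so a literal #xD is already "&#xD;"
        # and is therefore not trimmed.
        text = text.strip(" \t\n")
    return text


def _wrap(body: str, position: str) -> str:
    # Only children of the root node outside the document element get #xA.
    lead = "\n" if position == "after" else ""
    trail = "\n" if position == "before" else ""
    return lead + body + trail


def serialize_pi(target: str, data: str, position: str) -> str:
    return _wrap("<?" + target + (" " + data if data else "") + "?>", position)


def serialize_comment(data: str, position: str, ignore_comments: bool = True) -> str:
    if ignore_comments:
        return ""
    return _wrap("<!--" + data + "-->", position)
```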
    

2.3.3 Namespace Processing

  • Explicit and implicit namespace declarations: In DOM, there is no special node type for namespace declarations; they are present as regular Attr nodes whose prefix is "xmlns" and whose localName is the prefix being declared. DOM also allows declaring a namespace "implicitly": if a new DOM element or attribute is constructed using the createElementNS or createAttributeNS methods but there is no declaration for that prefix, the declaration is automatically added when the document is serialized.

  • Default namespace: The default namespace is declared by xmlns="...". If no such declaration exists, the default namespace is null.

  • Visibly utilized: This concept is required for exclusive canonicalization. A namespace prefix is visibly utilized by an element when

    • The element itself uses the prefix. (Note: if an element does not have a prefix, it visibly utilizes the default namespace.)

    • An attribute of that element uses that prefix, and that attribute is not in the exclusion list. (Note: unlike elements, an attribute without a prefix is a locally scoped attribute; it does NOT visibly utilize the default namespace.)

    • xsiTypeAware is true, and the element has an xsi:type attribute, and this attribute's value uses this prefix.

  • Namespace context

    namespaceContext is a hash table of  prefix -> (uri, hasBeenOutput, newPrefix)
    
    While traversing the subtrees, maintain a "namespace context", which is a mapping of prefixes to URIs. Each prefix should also have
    • a boolean flag hasBeenOutput: whether the namespace declaration has been output

    • a new prefix value: used for prefix rewriting.

    At the beginning of the canonicalization, initialize this to contain only one entry: the default namespace mapped to an empty URI, with hasBeenOutput = true. A prefix value of null can be used to denote the default namespace.
  • addNamespaces: This function is called for every ancestor element, and also at every element of the subtrees (except the excluded elements). It adds the namespace declarations to namespaceContext.

    addNamespaces(element, namespaceContext)
    {
      for each of the explicit and implicit namespace declarations in the element
      {
         if there is already an entry for this prefix, and this
         declaration's URI is different from the existing one
            overwrite the URI, and set hasBeenOutput to false

         if there is no entry for this prefix
            add an entry mapping this prefix to this URI, with hasBeenOutput set to false
      } 
    }
    
  • At every element of the subtree (except the excluded subtrees), compute a list of namespaces that need to be output. In inclusive mode, output the namespace declarations right away; in exclusive mode, delay outputting them until the namespace prefix is visibly utilized. After computing the list of namespaces to be output, do prefix rewriting:

    • If prefixRewrite is none, just sort the namespaces to be output by prefix name. The default prefix is an empty string, so if present it goes first.

    • If prefixRewrite is sequential, sort the namespaces to be output by URI, then sequentially assign them prefixes n0, n1, n2, .... For this, keep a counter variable that is initialized to 0 at the beginning of the canonicalization and incremented to get each new prefix.

    • If prefixRewrite is digest, sort the namespaces to be output by URI, then assign them prefixes based on a SHA1 digest of the URI. The digest is base64-encoded, and the base64 characters '/' and '+' are replaced by '_' and '-' respectively so that the result conforms to XML name rules.

    processNamespaces(element, namespaceContext)
    {
      addNamespaces(element, namespaceContext)
      
      initialize nsToBeOutputList to empty list
      
      for each prefix in the namespaceContext for which hasBeenOutput is false
      {
         if exclusiveMode and this prefix is not in the inclusiveNamespacePrefixList
         {
            if the prefix is visibly utilized by this element
               add the prefix to the nsToBeOutputList and set hasBeenOutput to true
         }
         else
            add the prefix to the nsToBeOutputList and set hasBeenOutput to true
      }

      if (prefixRewrite is none)
      {
        sort the nsToBeOutputList by the prefix
      }
      else if (prefixRewrite is sequential)
      {
        sort the nsToBeOutputList by URI
        assign new prefix values "nN" to each prefix in this
        nsToBeOutputList, where N is an incremented counter value,
        i.e. n0, n1, n2 ...
        // the counter is set to 0 at the beginning of the canonicalization
        // note: prefix numbers are assigned in the order in which the
        // prefixes are present in nsToBeOutputList
      }
      else if (prefixRewrite is digest)
      {
        sort the nsToBeOutputList by URI
        assign new prefix values "nD" to each prefix in this nsToBeOutputList, where
          D is the SHA1 digest of the URI, represented as a Base64
          string (with '/' and '+' replaced by '_' and '-')
        // refer to presentation by Ed Simon
      }
      
      return nsToBeOutputList    
    }
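A Python sketch of addNamespaces and processNamespaces over a dictionary-based namespace context. All names are hypothetical. The empty string stands for the default prefix, and the caller is assumed to compute the set of visibly utilized prefixes and to copy the context for each element. Dropping the trailing base64 '=' padding in derived prefixes is an assumption, since '=' is not allowed in XML names.

```python
import base64
import hashlib
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NsEntry:
    uri: str
    has_been_output: bool
    new_prefix: Optional[str] = None


def add_namespaces(decls, ctx):
    """decls: prefix -> URI declared on one element ("" is the default)."""
    for prefix, uri in decls.items():
        entry = ctx.get(prefix)
        if entry is None or entry.uri != uri:
            ctx[prefix] = NsEntry(uri, False)


def derived_prefix(uri: str) -> str:
    digest = hashlib.sha1(uri.encode("utf-8")).digest()
    b64 = base64.b64encode(digest).decode("ascii").rstrip("=")  # assumption
    return "n" + b64.replace("/", "_").replace("+", "-")


def process_namespaces(decls, used, ctx, exclusive, inclusive, rewrite, counter):
    """Return (prefix, uri) pairs to declare on this element.

    used: prefixes visibly utilized by the element.
    counter: an itertools.count() shared across the whole canonicalization.
    """
    add_namespaces(decls, ctx)
    pending = [p for p, e in ctx.items()
               if not e.has_been_output
               and (not exclusive or p in inclusive or p in used)]
    if rewrite == "none":
        pending.sort()                       # "" (default prefix) sorts first
    else:
        pending.sort(key=lambda p: ctx[p].uri)
    result = []
    for prefix in pending:
        uri = ctx[prefix].uri
        if rewrite == "sequential":
            new = f"n{next(counter)}"
        elif rewrite in ("derived", "digest"):   # the draft uses both names
            new = derived_prefix(uri)
        else:
            new = prefix
        # Replace rather than mutate: entries may be shared with the copy
        # of the namespace context held by an ancestor element.
        ctx[prefix] = NsEntry(uri, True, new)
        result.append((new, uri))
    return result
```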
    

2.3.4 Output rules

  • The document is encoded in UTF-8

  • Line breaks are normalized to #xA on input (this is done automatically by a DOM parser)

  • Attribute values are normalized, if ignoreDTD is false

  • Character and parsed entity references are replaced

  • CDATA sections are replaced with their character content

  • The XML declaration and document type declaration are removed

  • Empty elements are converted to start-end tag pairs

  • Whitespace outside of the document element and within start and end tags is normalized

  • Attribute value delimiters are set to quotation marks (double quotes)

  • Special characters in attribute values and character content are replaced by character references

  • Default attributes are added to each element, if ignoreDTD is false
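The attribute-related output rules can be sketched in Python. The escaping set (&, <, ", #x9, #xA, #xD) is the one used by Canonical XML 1.x [C14N10]; the tuple representation of attributes and the function names are assumptions of this sketch. Attributes are sorted by namespace URI, then by local name, as in processElement.

```python
def escape_attr_value(value: str) -> str:
    # Canonical XML 1.x attribute escaping; "&" must be handled first.
    return (value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
                 .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;"))


def serialize_attributes(attrs, sort_attributes: bool = True) -> str:
    """attrs: (namespace_uri, local_name, qname, value) tuples.

    Unqualified attributes use "" as their namespace URI, so they sort first.
    Values are written between double quotes.
    """
    if sort_attributes:
        attrs = sorted(attrs, key=lambda a: (a[0], a[1]))
    return "".join(f' {qname}="{escape_attr_value(value)}"'
                   for _, _, qname, value in attrs)
```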

2.3.5 Other ideas considered

  • QNames in content: Have another parameter listing other element or attribute names, besides xsi:type, whose values can contain QNames. Alternatively, simply search all text content for QNames.

  • Significant whitespace: Have a parameter listing the elements in which whitespace is significant. Instead of listing individual element names, an entire target namespace URI could be specified; e.g. in many elements in the XHTML namespace, whitespace is significant.

2.4 Processing model for Streaming XML parsers

Unlike DOM parsers, which represent an XML document as a tree of nodes, streaming parsers represent an XML document as a stream of events such as "start-element", "end-element" and "text". A document subset can also be represented as a stream of events. This stream of events is in exactly the same order as a tree walk, so the above canonicalization algorithm can also be used to canonicalize an event stream.
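To illustrate, the following Python sketch drives output directly from the standard-library SAX parser (xml.sax), which delivers start-element, characters and end-element events in tree-walk order. It is a simplified, hypothetical example: it assumes a namespace-free document, sorts attributes by qualified name, and ignores comments, PIs, subsets and exclusions.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def _escape_text(s: str) -> str:
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\r", "&#xD;")


def _escape_attr(s: str) -> str:
    return (s.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
             .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;"))


class CanonicalWriter(ContentHandler):
    """Emit output directly from SAX events, with no tree in memory."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        self.parts.append("<" + name)
        # Simplification: sort by qualified name (namespace-free input only).
        for qname in sorted(attrs.getNames()):
            self.parts.append(f' {qname}="{_escape_attr(attrs.getValue(qname))}"')
        self.parts.append(">")

    def endElement(self, name):
        self.parts.append(f"</{name}>")  # empty elements become start-end pairs

    def characters(self, content):
        self.parts.append(_escape_text(content))  # may arrive in several chunks


def canonicalize_stream(data: bytes) -> str:
    handler = CanonicalWriter()
    xml.sax.parseString(data, handler)
    return "".join(handler.parts)
```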

3 References

C14N-20000119
Canonical XML Version 1.0, W3C Working Draft. T. Bray, J. Clark, J. Tauber, and J. Cowan. January 19, 2000. http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html.
C14N-Issues
Known Issues with Canonical XML 1.0, W3C Working Group Note. J. Kahan, K. Lanz. December 2006. http://www.w3.org/TR/C14N-issues/.
C14N10
Canonical XML Version 1.0, W3C Recommendation. ed. J. Boyer. 15 March 2001. http://www.w3.org/TR/xml-c14n.
C14N11
Canonical XML Version 1.1, W3C Recommendation. ed. J. Boyer, G. Marcy. 2 May 2008. http://www.w3.org/TR/xml-c14n11/.
CharModel
Character Model for the World Wide Web, W3C Working Draft. eds. Martin J. Dürst, François Yergeau, Misha Wolf, Asmus Freytag and Tex Texin. http://www.w3.org/TR/charmod/.
CowanExample
Example of Harmful Effect of Character Model Normalization, Letter in XML Signature Working Group Mail Archive. John Cowan, July 7, 2000. http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0038.html.
DSig-Usage
Using XML Digital Signatures in the 2006 XML Environment, W3C Working Group Note. Thomas Roessler. December 2006. http://www.w3.org/TR/DSig-usage/.
ISO-8859-1
ISO-8859-1 Latin 1 Character Set. http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html or http://www.iso.org/iso/iso_catalogue.htm.
Infoset
XML Information Set, W3C Working Draft. eds. John Cowan and Richard Tobin. http://www.w3.org/TR/xml-infoset/.
Keywords
Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119. S. Bradner. March 1997. http://www.ietf.org/rfc/rfc2119.txt.
NFC
TR15, Unicode Normalization Forms. M. Davis, M. Dürst. Revision 18: November 1999. http://www.unicode.org/unicode/reports/tr15/tr15-18.html.
NFC-Corrigendum
Normalization Corrigendum. The Unicode Consortium. http://www.unicode.org/unicode/uni2errata/Normalization_Corrigendum.html.
Namespaces
Namespaces in XML 1.0 (Second Edition), W3C Recommendation. eds. Tim Bray, Dave Hollander, Andrew Layman, and Richard Tobin. http://www.w3.org/TR/REC-xml-names/.
URI
Uniform Resource Identifiers (URI): Generic Syntax, IETF RFC 3986. T. Berners-Lee, R. Fielding, L. Masinter. January 2005. http://www.ietf.org/rfc/rfc3986.txt.
UTF-16
UTF-16, an encoding of ISO 10646, IETF RFC 2781. P. Hoffman, F. Yergeau. February 2000. http://www.ietf.org/rfc/rfc2781.txt.
UTF-8
UTF-8, a transformation format of ISO 10646, IETF RFC 2279. F. Yergeau. January 1998. http://www.ietf.org/rfc/rfc2279.txt.
Unicode
The Unicode Standard, version 3.0. The Unicode Consortium. ISBN 0-201-61633-5. http://www.unicode.org/unicode/standard/versions/Unicode3.0.html.
XBase
XML Base, W3C Recommendation. ed. Jonathan Marsh. 27 June 2001. http://www.w3.org/TR/xmlbase/.
XML
Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation. eds. Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, François Yergeau and Eve Maler. 16 August 2006. http://www.w3.org/TR/REC-xml/.
XML ID
xml:id Version 1.0, W3C Recommendation. eds. Norman Walsh, Daniel Veillard and Jonathan Marsh. 9 September 2005. http://www.w3.org/TR/xml-id/.
XML Plenary Decision
W3C XML Plenary Decision on relative URI References In namespace declarations, W3C Document. 11 September 2000. http://lists.w3.org/Archives/Public/xml-uri/2000Sep/0083.html.
XML-DSig
XML-Signature Syntax and Processing, IETF Draft/W3C Candidate Recommendation. D. Eastlake, J. Reagle, D. Solo, M. Bartel, J. Boyer, B. Fox, and E. Simon. 31 October 2000. http://www.w3.org/TR/xmldsig-core/.
XMLDSIG2
XML Signature Syntax and Processing, Version 2.0, W3C Working Draft. 22 October 2009. http://www.w3.org/TR/2009/WD-xmldsig-core2-20091022/.
XMLDSIG2nd
XML Signature Syntax and Processing (Second Edition), W3C Recommendation. 10 June 2008. http://www.w3.org/TR/2008/REC-xmldsig-core-20080610/.
XPath
XML Path Language (XPath) Version 1.0, W3C Recommendation. eds. James Clark and Steven DeRose. 16 November 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.
XPath-Filter-2
XML-Signature XPath Filter 2.0, W3C Recommendation. J. Boyer, M. Hughes, J. Reagle. 8 November 2002. http://www.w3.org/TR/2002/REC-xmldsig-filter2-20021108/.

A Remove Dot Segments

The following informative table shows example results of the modified Remove Dot Segments algorithm described in Section 2.3.1.1 (join-URI-References function).

Input                                        Output
no/.././/pseudo-netpath/seg/file.ext         pseudo-netpath/seg/file.ext
no/..//.///pseudo-netpath/seg/file.ext       pseudo-netpath/seg/file.ext
yes/no//..//.///pseudo-netpath/seg/file.ext  yes/pseudo-netpath/seg/file.ext
no/../yes                                    yes
no/../yes/                                   yes/
no/../yes/no/..                              yes/
../../no/../..                               ../../../
no/../..                                     ../
no/..                                        (empty)
no/../                                       (empty)
/a/b/c/./../../g                             /a/g
mid/content=5/../6                           mid/6
../../..                                     ../../../
no/../../                                    ../
..yes/..no/..no/..no/../../../..yes          ..yes/..yes
..yes/..no/..no/..no/../../../..yes/         ..yes/..yes/
../..                                        ../../
../../../                                    ../../../
.                                            (empty)
./                                           (empty)
./.                                          (empty)
//no/..                                      /
../../no/..                                  ../../
../../no/../                                 ../../
yes/no/../                                   yes/
yes/no/no/../..                              yes/
yes/no/no/no/../../..                        yes/
yes/no/../yes/no/no/../..                    yes/yes/
yes/no/no/no/../../../yes                    yes/yes
yes/no/no/no/../../../yes/                   yes/yes/
/no/../                                      /
/yes/no/../                                  /yes/
/yes/no/no/../..                             /yes/
/yes/no/no/no/../../..                       /yes/
../../..no/..                                ../../
../../..no/../                               ../../
..yes/..no/../                               ..yes/
..yes/..no/..no/../..                        ..yes/
..yes/...no/..no/..no/../../..               ..yes/
..yes/..no/../..yes/..no/..no/../..          ..yes/..yes/
/..no/../                                    /
/..yes/..no/../                              /..yes/
/..yes/..no/..no/../..                       /..yes/
/..yes/..no/..no/..no/../../..               /..yes/
/                                            /
/.                                           /
/./                                          /
/./.                                         /
/././                                        /
/..                                          /
/../..                                       /
/../../..                                    /
/../../..                                    /
//..                                         /
//..//..                                     /
//..//..//..                                 /
/./..                                        /
/./.././..                                   /
/./.././.././..                              /
.                                            (empty)
./                                           (empty)
./.                                          (empty)
..                                           ../
../                                          ../
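The stack-based Python function below is an illustrative sketch, not a transcription of the buffer-based steps in Section 2.3.1.1. It was checked by hand against every row of the table above. The function name `remove_dot_segments` is our own. The sketch captures the two behaviours that the table shows to differ from RFC 3986 Section 5.2.4 [URI]. First, runs of "/" collapse to a single "/". Second, leading ".." segments of a relative path are kept rather than discarded.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of the modified Remove Dot Segments behaviour shown in the table."""
    absolute = path.startswith("/")
    raw = path.split("/")
    # A final '/', '.' or '..' means a non-empty result ends with '/'.
    trailing_slash = raw[-1] in ("", ".", "..")
    stack = []
    for seg in raw:
        if seg in ("", "."):
            continue  # empty segments (from '//') and '.' contribute nothing
        if seg == "..":
            if stack and stack[-1] != "..":
                stack.pop()          # cancel the preceding ordinary segment
            elif not absolute:
                stack.append("..")   # relative path: keep the unresolved '..'
            # absolute path: '..' above the root is simply dropped
        else:
            stack.append(seg)        # ordinary segment, including '..yes' or '...no'
    result = ("/" if absolute else "") + "/".join(stack)
    if trailing_slash and stack:
        result += "/"
    return result
```

A segment stack keeps the sketch short. A segment such as "..yes" is treated as an ordinary name because only an exact ".." cancels the previous segment.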