W3C

Canonical EXI

W3C Recommendation 07 June 2018

This version:
https://www.w3.org/TR/2018/REC-exi-c14n-20180607/
Latest version:
https://www.w3.org/TR/exi-c14n/
Previous version:
https://www.w3.org/TR/2018/PR-exi-c14n-20180426/
https://www.w3.org/TR/2016/CR-exi-c14n-20161103/
https://www.w3.org/TR/2015/WD-exi-c14n-20150521/
Editors:
Sebastian Käbisch, Siemens AG
Daniel Peintner, Siemens AG

Please refer to the errata for this document, which may include normative corrections.

See also translations.


Abstract

Any EXI document is part of a set of EXI documents that are logically equivalent within an application context, but which vary in physical representation based on differences permitted by [EXI Format 1.0]. This specification describes a relatively simple method for generating a physical representation of an EXI document, the canonical form, that accounts for the permissible differences. One example of the applications targeted by this specification is one that needs to guarantee non-repudiation using XML Signature, yet allows intermediaries some flexibility to reconstitute the documents before they reach their final destination without breaking the signatures. Note that two documents may have differing canonical forms yet still be equivalent in a given context based on more elaborate application-specific equivalence rules, which are out of scope of this specification.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This is the W3C Recommendation of the Canonical EXI specification and it has been produced by the EXI Working Group.

A diff-marked version against the previous version of this document is available.

Please send comments about this document as GitHub issues with label "Canonical EXI".

The interoperability testing has been performed using test cases developed in the EXI testsuite. An implementation report has been produced.

This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 February 2018 W3C Process Document.


1. Introduction

The EXI 1.0 Recommendation [EXI Format 1.0] specifies the syntax of a class of resources called EXI streams. It is possible for EXI streams that are equivalent for the purposes of many applications to differ in physical representation. For example, they may differ in their datatype representation and attribute ordering. It is the goal of this specification to establish a method for determining whether two documents are equivalent, or whether an application has not changed a document, except for transformations permitted by EXI 1.0.

1.1 Notational Conventions and Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear EMPHASIZED in this document, are to be interpreted as described in RFC 2119 [IETF RFC 2119].

The term canonical is used throughout this document to denote a normative form in regard to the physical representation. The term canonical EXI refers to EXI that is in canonical form produced by the method described in this specification.

The term sorted lexicographically denotes lexicographical ordering of strings which is done by comparing character by character. Individual characters are ordered by comparing their Unicode code points.
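As an illustration only, the following Python sketch sorts strings the way the definition above reads: character by character, by Unicode code point. The helper names are hypothetical. Python already compares `str` values code point by code point, so the explicit key only makes the rule visible.

```python
from typing import List


def code_points(s: str) -> List[int]:
    """Return the Unicode code points of s, one per character."""
    return [ord(c) for c in s]


def sort_lexicographically(values: List[str]) -> List[str]:
    """Sort strings character by character, comparing Unicode code points.

    A string that is a proper prefix of another sorts first, because its
    shorter code point list compares as smaller.
    """
    return sorted(values, key=code_points)


# Uppercase letters (U+0041..U+005A) precede lowercase (U+0061..U+007A).
print(sort_lexicographically(["b", "B", "a", "ab"]))  # ['B', 'a', 'ab', 'b']
```

Care is needed in languages whose native string comparison works on UTF-16 code units (for example Java's `String.compareTo`). Such a comparison puts U+1F600 before U+FFFD, whereas code point order puts U+FFFD first.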

1.2 Motivation

Many environments and device classes have difficulty handling plain-text XML for various reasons (e.g., document size and processing overhead). W3C's Efficient XML Interchange (EXI) Format has been developed to provide a solution to these issues and to extend the use of XML and its tools.

With EXI, constrained environments and device classes (with limited memory, bandwidth, and processing power) can become part of the XML world. However, some use cases also require a canonical representation of the XML-based data to compare logical and physical equivalence. Hence, where nothing but EXI is available, EXI canonicalization that does not go through plain-text XML is needed.

In addition, EXI canonicalization is useful for traditional XML users. For example, EXI canonicalization provides a type-aware canonicalization scheme that can discern that +1, 1, 1.0, 1e0 and 1E0 are equivalent representations of the same floating-point value. This allows intermediaries to use binding-models and/or type-aware processing without breaking signatures. Moreover, with a fast EXI processor, EXI canonicalization can be much faster than traditional XML canonicalization and can help address some of the well-known processing bottlenecks for XML security.

1.3 Applications

One application field for the canonical form of an XML-based document or document subset is digital signatures. During signature generation, the digest is computed over the canonical form of the document or document subset. The document is then transferred to the receiving party, which validates the signature by reading the document and computing a digest of the canonical form of the received document (see E.1 Signature Processing Steps). If the digests are equal, the receiving party can be assured that the information content of the document has not been altered since it was signed.

Although EXI supports plain-text XML Signature by preserving XML information such as comments and prefixes (see EXI Best Practices for XML Signature), this optional strategy is not the most efficient and is not well suited to all environments and use cases.

It is the goal of this specification to provide a canonical EXI form for various use-cases. For example, restricted and very limited devices should be able to create or check against a canonical EXI stream. This applies to devices that may be able to speak only a given EXI language (according to an XML Schema) or support only a subset of all EXI features.

2. Canonical EXI Stream

The EXI specification defines an EXI stream as an EXI header followed by an EXI body. In this sense, a Canonical EXI stream is a Canonical EXI Header (see 3. Canonical EXI Header) followed by a Canonical EXI Body (see 4. Canonical EXI Body).

EXI Canonicalization may be used as a canonicalization method algorithm in XML Signature [XMLDSIG-CORE1] and XML Encryption [XMLENC-CORE1]. This document specifies the following identifier:

http://www.w3.org/TR/exi-c14n

2.1 Canonical EXI Options

The Canonical EXI Options provide a single, simple, unambiguous way to express the EXI-C14N options. The following table describes the options that may be specified in the Canonical EXI Options document.

Table 2-1. Canonical EXI Options in Canonical Options Document
Canonical EXI Option    Description                             Default Value
omitOptionsDocument     Omit EXI Options document               false
utcTime                 Use Coordinated Universal Time (UTC)    false

Appendix B XML Schema for Canonical EXI Options Document provides an XML Schema describing the Canonical EXI Options document. This schema is designed for efficient transmission by utilizing EXI options.

[Definition: The omitOptionsDocument option specifies whether the EXI Options document is omitted. ]

[Definition: The utcTime option is used to specify whether Date-Time values must be represented using Coordinated Universal Time (UTC, sometimes called "Greenwich Mean Time"). ]

3. Canonical EXI Header

Each EXI stream begins with an EXI header. The EXI header identifies the version of the EXI format being used and specifies the options used to process the body of the EXI stream. The EXI header has the following structure:

[EXI Cookie] | Distinguishing Bits | Presence Bit for EXI Options | EXI Format Version | [EXI Options] | [Padding Bits]

A Canonical EXI Header MUST NOT begin with the optional EXI Cookie, and padding bits (if any) MUST always be represented as a sequence of 0 (zero) bits.

If the Canonical EXI Option omitOptionsDocument is true, the Presence Bit for EXI Options MUST be 0 (false) to indicate that the fifth part of the EXI header, the EXI Options, is absent. If the Canonical EXI Option omitOptionsDocument is false, the Presence Bit for EXI Options MUST be 1 (true) to indicate that the EXI Options are present.
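A Canonical EXI Header carries no EXI Cookie, and omitOptionsDocument fixes its Presence Bit, so the first octet of a canonical EXI 1.0 header is fully determined. The Python sketch below, with a hypothetical function name, assembles that octet. It relies on the EXI 1.0 header layout: distinguishing bits "10", the presence bit, a final-version flag of 0, and the 4-bit version field 0000 that denotes version 1. It does not cover the EXI Options or any padding bits that may follow.

```python
def canonical_header_first_octet(omit_options_document: bool) -> bytes:
    """First octet of a canonical EXI 1.0 header (no EXI Cookie).

    Bits, most significant first:
      1 0   distinguishing bits
      P     presence bit for EXI Options (0 = absent, 1 = present)
      0     final-version flag (1 would mean a preview version)
      0000  4-bit version field; 0 denotes EXI Format version 1
    """
    presence = 0 if omit_options_document else 1
    octet = (0b10 << 6) | (presence << 5)  # remaining five bits stay 0
    return bytes([octet])


print(canonical_header_first_octet(True).hex())   # 80
print(canonical_header_first_octet(False).hex())  # a0
```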

The EXI Options are represented as an EXI Options document. Accordingly, the canonicalization steps described below take as input a set of EXI Options (i.e., an EXI Options document) and produce as output a canonicalized set of EXI Options, which MUST be encoded according to 4. Canonical EXI Body.

A canonical EXI Options document MUST respect the following constraints.

  1. An EXI Options element blockSize that matches the default value (i.e., <blockSize>1000000</blockSize>) MUST be omitted.

  2. The element blockSize MUST be omitted if neither compression nor pre-compress is present.

  3. When the alignment option compression is set, pre-compress MUST be used instead of compression.

  4. When the value of the Preserve.lexicalValues fidelity option is true, the element datatypeRepresentationMap MUST be omitted. When the value of the Preserve.lexicalValues fidelity option is false and the element datatypeRepresentationMap contains nested element tuples (each a tuple of schema datatype and datatype representation), the tuples are to be sorted lexicographically by schema datatype, first by {name} and then by {namespace}. Moreover, the EXI event sequence of each nested element MUST be Start Element (SE) followed by End Element (EE). Mappings that match the default built-in EXI datatype representations map (e.g., {http://www.w3.org/2001/XMLSchema}base64Binary → {http://www.w3.org/2009/exi}base64Binary) MUST be omitted.

  5. The user-defined meta-data MUST NOT be used unless it conveys a convention used by the application. At the time of writing, the only available convention is the [EXI Profile]. The associated exi:p element MUST always be represented using the following sequence of EXI events: an SE(exi:p) event, followed by an AT(xsi:type)="xsd:decimal" event, followed by a CH event, followed by an EE event.

    Note:

    The user defined meta-data conveys auxiliary information and does not alter or extend the EXI data format. Hence it is deemed acceptable to omit this information.

  6. Elements that are necessary to structure the EXI Options document according to the XML schema (i.e., lesscommon, uncommon, alignment, datatypeRepresentationMap, preserve, and common) MUST be omitted unless they contain at least one nested element after the previous steps have been applied.
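Here is a minimal Python sketch of rules 1 to 4 above. It works on a hypothetical flat dictionary model of the options document rather than on real XML. It lists only part of the built-in datatype representation map. It does not model the user-defined meta-data (rule 5) or the omission of empty container elements (rule 6).

```python
from typing import Dict, Tuple

XSD = "http://www.w3.org/2001/XMLSchema"
EXI = "http://www.w3.org/2009/exi"
DEFAULT_BLOCK_SIZE = 1_000_000

# Part of the built-in EXI datatype representation map, for illustration only.
BUILT_IN_MAP: Dict[Tuple[str, str], Tuple[str, str]] = {
    (XSD, "base64Binary"): (EXI, "base64Binary"),
    (XSD, "hexBinary"): (EXI, "hexBinary"),
    (XSD, "boolean"): (EXI, "boolean"),
    (XSD, "decimal"): (EXI, "decimal"),
    (XSD, "double"): (EXI, "double"),
    (XSD, "integer"): (EXI, "integer"),
    (XSD, "string"): (EXI, "string"),
}


def canonicalize_options(options: dict) -> dict:
    """Apply rules 1-4 to a hypothetical flat options model:

    "alignment": "bit-packed" | "byte" | "pre-compress" | "compression"
    "blockSize": int
    "lexicalValues": bool (Preserve.lexicalValues)
    "dtrMap": list of ((type_uri, type_name), (repr_uri, repr_name))
    """
    result = dict(options)
    # Rule 3: compression is replaced by pre-compress.
    if result.get("alignment") == "compression":
        result["alignment"] = "pre-compress"
    # Rules 1 and 2: drop a default blockSize, and drop any blockSize unless
    # pre-compress (possibly converted from compression above) is in use.
    if (result.get("blockSize") == DEFAULT_BLOCK_SIZE
            or result.get("alignment") != "pre-compress"):
        result.pop("blockSize", None)
    # Rule 4: datatypeRepresentationMap handling.
    if result.get("lexicalValues"):
        result.pop("dtrMap", None)
    elif "dtrMap" in result:
        kept = [(t, r) for t, r in result["dtrMap"] if BUILT_IN_MAP.get(t) != r]
        # Sort by schema datatype: first {name}, then {namespace}.
        kept.sort(key=lambda pair: (pair[0][1], pair[0][0]))
        if kept:
            result["dtrMap"] = kept
        else:
            del result["dtrMap"]
    return result
```

Applied to the left-hand side of Example 3-1, the sketch yields an alignment of pre-compress and no blockSize, in line with the canonical form shown there.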

The example below illustrates some requirements and the associated modifications that have been described.

Example 3-1. EXI Options vs. Canonical EXI Options

EXI Options:

<exi:header
    xmlns:exi="http://www.w3.org/2009/exi">
    <exi:lesscommon>
        <exi:preserve/>
        <exi:blockSize>1000000</exi:blockSize>
    </exi:lesscommon>
    <exi:common>
        <exi:compression/>
        <exi:fragment/>
    </exi:common>
</exi:header>

Canonical EXI Options:

<exi:header
    xmlns:exi="http://www.w3.org/2009/exi">
    <exi:lesscommon>
        <exi:uncommon>
            <exi:alignment>
                <exi:pre-compress/>
            </exi:alignment>
        </exi:uncommon>
    </exi:lesscommon>
    <exi:common>
        <exi:fragment/>
    </exi:common>
</exi:header>

Warning:

Applications that use Canonical EXI need to ensure that the senders and the receivers of EXI documents are using the same schema information. This applies regardless of whether schemaId is included in the header options. Failure to do so may result in an incomparable (and perhaps undecipherable) EXI document.

4. Canonical EXI Body

The EXI canonicalization steps and algorithms described below take XML information items as input. Each information item is mapped to its respective set of EXI events (see Table 4-1 in the EXI specification), and the result is a canonicalized EXI body stream. Following the presented algorithms guarantees that logically identical documents produce identical serialized EXI body streams (assuming the same EXI coding options).

Each event in an EXI stream participates in a mapping system that relates events to XML Information Items so that an EXI document or an EXI fragment as a whole serves to represent an [XML Information Set]. Appendix B Infoset Mapping of [EXI Format 1.0] describes the mapping system in detail.

Note:

An EXI stream can be passed to a final recipient over multiple intermediate nodes. In general, it is feasible to parse and re-encode the EXI stream on such an intermediate node without affecting the canonical EXI stream. However, note that altering the EXI Options (e.g., the preserve options or schemaId) used to encode the body of the EXI stream may lead to irrecoverable data loss or differences. The same issue applies to XML intermediate nodes (e.g., intermediate nodes that remove DTDs).

4.1 EXI Alignment Options and Streams

EXI provides four alignment options, namely bit-packed, byte-alignment, pre-compression, and compression.

The canonicalized EXI form is the resulting EXI stream following the rules defined in this document. When the alignment option compression is set for an EXI stream, its canonical form is computed as if the EXI stream was encoded using the alignment option pre-compression instead.

EXI processors may make use of padding bits, for example to make the length of the EXI stream byte-aligned. If used, the padding bits in a Canonical EXI stream MUST always be represented as a sequence of 0 (zero) bits.

4.2 EXI Event Selection

EXI processors represent a given event such as a start element or an attribute by serializing an event code first, followed by the corresponding event content. Each event code is represented by a sequence of 1 to 3 parts that uniquely identifies an event.

In situations where EXI grammars provide more than one possible event, the canonical EXI form prescribes which event (and hence which event code) has to be chosen. Such situations are not uncommon, because an EXI processor often has some flexibility in choosing the appropriate EXI grammar production, or respectively the appropriate event.

Canonical EXI processors MUST follow a two-step process for selecting the single valid event (and event code):

4.2.1 Select productions according to specific conventions

The availability of grammar productions is subject to the convention used by the application. A prominent convention is the [EXI Profile], which is more restrictive in regard to which production is usable than the [EXI Format 1.0] specification.

Note:

The EXI Profile uses the xsi:type attribute to switch from an evolving built-in element grammar to a non-evolving schema-informed grammar. The specification mentions in particular the xsd:anyType complex type but does not require this specific type. Canonical EXI processors MUST use xsd:anyType.

4.2.2 Use the event that matches most precisely

After excluding productions that are not usable (according to the convention in use) a canonical EXI processor MUST use the event that matches the following prioritized heuristics most precisely.

  1. For Start Element (SE) events the order is as follows:

    1. SE ( qname )

    2. SE ( uri : * )

    3. SE ( * )

    For Character (CH) events the order is as follows:

    1. CH [schema-typed value]

    2. CH [untyped value]

    For Attribute (AT) events the order is as follows:

    1. AT ( qname ) [schema-typed value]

    2. AT ( qname ) [untyped value]

    3. AT ( uri : * )

    4. AT ( * )

    5. AT ( * ) [untyped value]

  2. If representational accuracy is unaffected, use the event with the fewest event code parts.
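The selection can be sketched in Python as a lookup in the priority lists above, followed by the event code tie-break. The `Candidate` type and the string labels are hypothetical. The sketch assumes that rule 2 only decides between candidates that rule 1 ranks equally. It also assumes that productions excluded by the convention in use (4.2.1) have already been removed.

```python
from typing import Dict, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    kind: str                     # production label, e.g. "SE(qname)"
    event_code: Tuple[int, ...]   # 1 to 3 event code parts


# Priority lists from 4.2.2, most precise first.
PRIORITY: Dict[str, List[str]] = {
    "SE": ["SE(qname)", "SE(uri:*)", "SE(*)"],
    "CH": ["CH[schema-typed]", "CH[untyped]"],
    "AT": ["AT(qname)[schema-typed]", "AT(qname)[untyped]",
           "AT(uri:*)", "AT(*)", "AT(*)[untyped]"],
}


def select_event(event_type: str, candidates: List[Candidate]) -> Candidate:
    """Pick the most precise usable candidate, then the shortest event code."""
    order = PRIORITY[event_type]
    usable = [c for c in candidates if c.kind in order]
    if not usable:
        raise ValueError(f"no usable {event_type} production")
    best_rank = min(order.index(c.kind) for c in usable)
    best = [c for c in usable if order.index(c.kind) == best_rank]
    # min() keeps the first of several equally short codes, so the result is
    # deterministic for a given candidate order.
    return min(best, key=lambda c: len(c.event_code))
```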

Note:

The verification is based solely on EXI grammars and EXI datatypes. Like an EXI processor, a Canonical EXI processor does not account for XML schema validity, in order to maintain high performance.

The appendix section D.1 EXI Event Selection depicts one concrete example for choosing the correct event.

4.3 EXI Content Handling

4.3.1 Exclude extraneous events

The EXI grammars permit EXI processors to include extraneous empty-string CH ("") events that are not required by the grammar and do not change the resulting XML Infoset of the produced document.

Canonical EXI MUST exclude extraneous CH ("") events unless they are required by the EXI grammar.

Note:

EXI grammars may still require empty-string CH ("") events. One example is an element typed as a string (e.g., element foo typed as xsd:string). When Strict is true, the event sequence SE (foo) EE cannot be used. Instead, SE (foo) CH ("") EE is required by the following strict simple type grammar.

Example 4-1. Strict Simple Type grammar
Type i, 0 :
    CH [schema-typed value]    Type i, 1
Type i, 1 :
    EE

Moreover, applications may, for various reasons, send a series of consecutive CH events. Canonical EXI MUST merge consecutive CH events into a single CH event.
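The two rules of this subsection can be sketched in Python over a hypothetical flat list of event tuples such as ("SE", "foo"), ("CH", "text"), and ("EE",). The check for whether the grammar requires CH ("") is reduced to a caller-supplied set of element names. This set is an assumption that stands in for a real grammar lookup.

```python
from typing import FrozenSet, List, Tuple

Event = Tuple[str, ...]  # ("SE", qname) | ("CH", text) | ("EE",)


def normalize_characters(events: List[Event],
                         needs_empty_ch: FrozenSet[str] = frozenset()) -> List[Event]:
    """Merge consecutive CH events, then drop CH("") unless required.

    needs_empty_ch: hypothetical set of element names whose grammar
    (e.g. a strict simple type grammar) requires SE CH("") EE.
    """
    merged: List[Event] = []
    for ev in events:
        if ev[0] == "CH" and merged and merged[-1][0] == "CH":
            merged[-1] = ("CH", merged[-1][1] + ev[1])
        else:
            merged.append(ev)
    result: List[Event] = []
    for i, ev in enumerate(merged):
        if ev == ("CH", ""):
            prev = merged[i - 1] if i > 0 else None
            nxt = merged[i + 1] if i + 1 < len(merged) else None
            required = (prev is not None and prev[0] == "SE"
                        and prev[1] in needs_empty_ch and nxt == ("EE",))
            if not required:
                continue  # extraneous empty-string CH event
        result.append(ev)
    return result
```

Merging runs first, so after that step no two CH events are adjacent. Dropping an empty CH therefore cannot create a new run that would need merging.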

4.3.2 Whitespace Handling

In general, Canonical EXI SHALL NOT change XML information items. One exception to this statement concerns significant whitespace characters. Except as specified below, Canonical EXI MUST respect xml:space="preserve".

Note:

  • It is not possible to respect whitespace-handling rules in all situations, for example when the grammar in effect is a schema-informed grammar and xml:space is "preserve". The value " 123 " (with a leading and a trailing space character) typed as xsd:int cannot preserve its leading and trailing whitespace when the typed datatype representation is used.

  • Use-cases requiring whitespace preservation might consider using the Preserve.lexicalValues option set to true. When Preserve.lexicalValues is true CH [schema-typed value] and AT [schema-typed value] productions MUST be used in all cases given that the restricted character sets can represent any string value.

When the current xml:space is not "preserve", different rules apply for 4.3.2.1 Simple Whitespace Data and 4.3.2.2 Complex Whitespace Data.

4.3.2.1 Simple Whitespace Data

The term simple data refers to data between SE and EE (i.e., Start Element tag followed by End Element tag).

When the grammar in effect is a schema-informed grammar, use the whiteSpace facet, if any, to normalize whitespace.

When the grammar in effect is a schema-less grammar, all whitespace MUST be preserved.
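For simple data under a schema-informed grammar, the whiteSpace facet values defined by XML Schema (preserve, replace, collapse) can be applied as in the following Python sketch. The function name is hypothetical. Passing None stands for the schema-less case or a missing facet, and in that case the value is left unchanged.

```python
import re
from typing import Optional

# #x9 (tab), #xA (line feed) and #xD (carriage return) map to #x20 (space).
_TO_SPACE = {0x9: 0x20, 0xA: 0x20, 0xD: 0x20}


def apply_whitespace_facet(value: str, facet: Optional[str]) -> str:
    """Normalize `value` according to an XML Schema whiteSpace facet value."""
    if facet is None or facet == "preserve":
        return value
    replaced = value.translate(_TO_SPACE)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # Collapse runs of spaces to one, then drop leading/trailing spaces.
        return re.sub(" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace facet: {facet!r}")
```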

4.3.2.2 Complex Whitespace Data

The term complex data refers to data between SE and SE, EE and SE, or EE and EE.

For complex data, whitespace nodes (i.e., strings that consist solely of whitespace characters) MUST be removed.
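The complex-data rule can be sketched in Python over a hypothetical list of event tuples such as ("SE", "a"), ("CH", " "), and ("EE",). The sketch assumes three things: consecutive CH events have already been merged, the stream contains only SE, CH, and EE events, and no enclosing element has xml:space="preserve". Whitespace is taken to be the four XML whitespace characters (space, tab, carriage return, line feed).

```python
from typing import List, Tuple

Event = Tuple[str, ...]  # ("SE", qname) | ("CH", text) | ("EE",)
XML_WHITESPACE = frozenset(" \t\r\n")
COMPLEX_CONTEXTS = {("SE", "SE"), ("EE", "SE"), ("EE", "EE")}


def strip_complex_whitespace(events: List[Event]) -> List[Event]:
    """Remove whitespace-only CH events that sit in complex data."""
    result: List[Event] = []
    for i, ev in enumerate(events):
        if (ev[0] == "CH" and all(c in XML_WHITESPACE for c in ev[1])
                and 0 < i < len(events) - 1):
            context = (events[i - 1][0], events[i + 1][0])
            if context in COMPLEX_CONTEXTS:
                continue  # whitespace node between SE/SE, EE/SE or EE/EE
        result.append(ev)
    return result
```

Whitespace between SE and EE is simple data, so the sketch leaves it for the simple whitespace rules.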

4.4 Stream Order

In general, a canonical EXI processor SHALL NOT change the order of the input sequence. The only exceptions to this statement are sequences of attributes and/or namespace declarations.

The EXI specification defines that namespace (NS) and attribute (AT) events associated with a given element occur directly after the start element (SE) event in the following order:

NS NS ... NS AT (xsi:type) AT (xsi:nil) AT AT ... AT

In addition, canonical EXI specifies that namespace declarations for a given element MUST be sorted lexicographically according to the NS prefix. Further, canonical EXI strictly requires that an xsi:type or an xsi:nil attribute MUST occur before other AT events even if it does not impact grammar selection. Moreover, attributes other than xsi:type and xsi:nil for a given element MUST be sorted lexicographically, first by qname local-name then by qname uri.
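The ordering rules can be sketched in Python as two sort keys. Namespace declarations are modeled as hypothetical (prefix, uri) pairs, with the empty string as the default namespace prefix. Attributes are modeled as (uri, local_name, value) triples. Python's `str` comparison provides the code point ordering required by 1.1.

```python
from typing import List, Tuple

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def canonical_order(namespaces: List[Tuple[str, str]],
                    attributes: List[Tuple[str, str, str]]):
    """Order NS events by prefix; order attributes as xsi:type, xsi:nil,
    then the rest by local-name and then by uri."""
    ns_sorted = sorted(namespaces, key=lambda ns: ns[0])

    def attribute_key(attr: Tuple[str, str, str]) -> Tuple[int, str, str]:
        uri, local_name, _value = attr
        if uri == XSI and local_name == "type":
            return (0, "", "")
        if uri == XSI and local_name == "nil":
            return (1, "", "")
        return (2, local_name, uri)

    return ns_sorted, sorted(attributes, key=attribute_key)
```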

Note:

Optimizations such as pruning insignificant xsi:type values (e.g., xsi:type="xsd:string" for string values) or insignificant xsi:nil values (e.g., xsi:nil="false") are prohibited for a Canonical EXI processor.

For example, the EXI Profile uses the xsi:type attribute (e.g., xsi:type="xsd:anyType") to switch to a non-evolving schema-informed grammar.

4.5 EXI Datatypes

This section describes the built-in EXI datatype representations used for representing content items in canonical EXI streams.

When the Preserve.lexicalValues option is true, individual items are represented as String. Each value MUST be represented as a String with the associated restricted character set, if such a set is defined for the associated datatype representation (see Restricted Character Sets for Built-in EXI Datatype Representations). String content items associated with a restricted character set MUST also follow the rules described in 4.5.7 Restricted Character Sets.

When the Preserve.lexicalValues option is false, a value content item MUST be represented with the associated datatype representation. The following sub-sections describe the Canonical EXI behavior for datatypes that otherwise may not lead to a uniquely defined representation.

4.5.1 Unsigned Integer

The EXI specification defines that the Unsigned Integer datatype representation supports unsigned integer numbers of arbitrary magnitude. EXI processors SHOULD support arbitrarily large Unsigned Integer values. EXI processors MUST support Unsigned Integer values less than 2147483648.

Canonical EXI processors MUST use the Unsigned Integer datatype representation even for values greater than 2147483647.
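The Unsigned Integer representation has no upper bound. Each octet carries seven value bits, least significant group first, and its most significant bit signals whether another octet follows. The Python sketch below (hypothetical function names) relies on Python's arbitrary-precision `int`, so values beyond 2147483647 need no special handling.

```python
def encode_unsigned_integer(value: int) -> bytes:
    """Encode a non-negative integer as an EXI Unsigned Integer."""
    if value < 0:
        raise ValueError("Unsigned Integer cannot be negative")
    out = bytearray()
    while True:
        group = value & 0x7F          # next 7 value bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit 1: more octets follow
        else:
            out.append(group)         # high bit 0: last octet
            return bytes(out)


def decode_unsigned_integer(data: bytes) -> int:
    """Decode an EXI Unsigned Integer from the start of `data`."""
    value, shift = 0, 0
    for octet in data:
        value |= (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:
            return value
    raise ValueError("truncated Unsigned Integer")


print(encode_unsigned_integer(2147483648).hex())  # 8080808008
```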

4.5.2 Enumeration

The EXI Enumeration assigns to each item an unsigned integer value that corresponds to its ordinal position in the enumeration in schema-order starting with position zero. When there is more than one item that represents the same value in the enumeration, the value MUST be represented by using the first ordinal position that represents the value.
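Here is a Python sketch of the first-ordinal rule. It assumes the enumeration items have already been parsed into values of the schema datatype. It uses `decimal.Decimal` here, so "1" and "1.0" compare equal. The function name is hypothetical.

```python
from decimal import Decimal
from typing import Sequence


def enumeration_ordinal(enumeration: Sequence, value) -> int:
    """Return the first ordinal position (from zero, in schema order) whose
    item equals `value`."""
    for position, item in enumerate(enumeration):
        if item == value:
            return position
    raise ValueError(f"{value!r} is not in the enumeration")


# "1" and "1.0" denote the same xsd:decimal value; the first position wins.
items = [Decimal("2"), Decimal("1"), Decimal("1.0")]
print(enumeration_ordinal(items, Decimal("1.00")))  # 1
```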

4.5.3 Decimal

The EXI Decimal datatype is a Boolean sign followed by two Unsigned Integers. A sign value of zero (0) is used to represent positive Decimal values and a sign value of one (1) is used to represent negative Decimal values. The first Unsigned Integer represents the integral portion of the Decimal value. The second Unsigned Integer represents the fractional portion of the Decimal value with the digits in reverse order to preserve leading zeros.

The canonical EXI Decimal MUST respect the following constraint.

  • The sign value MUST be zero (0) if both the integral portion and the fractional portion of the Decimal value are 0 (zero).
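The Python sketch below maps an xsd:decimal lexical value to the (sign, integral, reversed fractional) triple and applies the zero-sign constraint. It assumes a well-formed xsd:decimal string with no exponent. The function name is hypothetical. Reversing the fractional digits also drops trailing zeros, so 1.50 and 1.5 yield the same triple.

```python
from typing import Tuple


def exi_decimal(lexical: str) -> Tuple[int, int, int]:
    """Return (sign, integral, fractional) for an xsd:decimal lexical value.

    fractional holds the fraction digits in reverse order, as in EXI.
    """
    s = lexical.strip()
    negative = s.startswith("-")
    if s[:1] in ("+", "-"):
        s = s[1:]
    int_part, _, frac_part = s.partition(".")
    integral = int(int_part or "0")
    fractional = int(frac_part[::-1] or "0")
    # Canonical constraint: zero is never negative.
    sign = 1 if negative and (integral or fractional) else 0
    return sign, integral, fractional


print(exi_decimal("-0.0"))  # (0, 0, 0)
```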

4.5.4 Float

The EXI Float datatype uses two consecutive EXI Integers. The first Integer represents the mantissa of the floating point number and the second Integer represents the base-10 exponent of the floating point number.

The canonical EXI Float MUST respect the following constraints.

  • A mantissa value of -0 is not permitted.

  • An exponent value of -0 is not permitted.

  • If the mantissa is 0 and the exponent value is not -(2^14), the exponent MUST be 0.

  • If the mantissa is not 0, mantissas MUST have no trailing zeros.

  • If the exponent value is -(2^14) and the mantissa value is neither 1 nor -1, the mantissa MUST be 0 to indicate the special value not-a-number (NaN).

Given an EXI Float value that consists of one integer representing its mantissa and the other integer representing its exponent, Canonical EXI processors MUST find an equivalent canonical EXI Float that satisfies the above constraints, where the rules of determining equivalence are described below.

Two floats A and B, denoted by the (mantissa, exponent) pairs (mA, eA) and (mB, eB) with eA >= eB, are equivalent under the following circumstances.

  1. Both mantissa and exponent are the same between the two floats.

  2. Otherwise, if the two exponents are different (i.e., eA > eB), substitute A with A2, where A2 has exponent eB and mantissa mA * 10^(eA - eB). If A2 and B are equivalent per rule 1 above, A and B are equivalent.

The appendix section D.3 EXI Floats depicts one example algorithm for finding the canonical EXI Float that is equivalent to a given EXI Float value.
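The following Python sketch shows one way to satisfy the Float constraints above. It is an independent sketch, not the algorithm of D.3. It assumes the special-value exponent is -(2^14). The mantissa and exponent are Python `int` values, which cannot be -0; an encoder that uses a separate sign bit would still have to emit a positive sign for zero. The sketch does not check that the resulting exponent stays within the range EXI permits.

```python
from typing import Tuple

SPECIAL_EXPONENT = -(2 ** 14)  # exponent reserved for INF, -INF and NaN


def canonical_float(mantissa: int, exponent: int) -> Tuple[int, int]:
    """Return the canonical (mantissa, exponent) pair equivalent to the input."""
    if exponent == SPECIAL_EXPONENT:
        # Mantissa 1 is INF and -1 is -INF; anything else is NaN, written as 0.
        return (mantissa, exponent) if mantissa in (1, -1) else (0, exponent)
    if mantissa == 0:
        return (0, 0)  # zero always uses exponent 0
    while mantissa % 10 == 0:
        # Strip a trailing zero: m * 10^e == (m / 10) * 10^(e + 1).
        mantissa //= 10
        exponent += 1
    return (mantissa, exponent)


def floats_equivalent(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Rules 1 and 2 above for finite values: rescale the operand with the
    larger exponent to the smaller exponent and compare mantissas."""
    (m_hi, e_hi), (m_lo, e_lo) = (a, b) if a[1] >= b[1] else (b, a)
    return m_hi * 10 ** (e_hi - e_lo) == m_lo


print(canonical_float(1200, 3))  # (12, 5)
```

For instance, among the lexical values from 1.2 Motivation, 1.0 would naturally parse to (10, -1), and 1, +1, 1e0, and 1E0 to (1, 0). All of them canonicalize to (1, 0).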

4.5.5 Date-Time

The EXI Date-Time is a sequence of values representing the individual components of the Date-Time.

The canonical EXI Date-Time MUST respect the following constraints.

  • The Hour value used to compute the Time component MUST NOT be 24.

  • The optional FractionalSecs component MUST be omitted if its value is zero.

  • If the Canonical EXI Option utcTime is equal to true, Date-Time values must be represented using Coordinated Universal Time (UTC, sometimes called "Greenwich Mean Time"). Doing so requires applying the algorithm defined in adding durations to dateTimes [XML Schema Datatypes] without modifying the value of seconds.
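Here is a Python sketch of the Date-Time rules for a full dateTime value. The function and its parameters are hypothetical. It uses Python's `datetime` and `timedelta` in place of the XML Schema duration-addition algorithm, and therefore assumes years 1 to 9999 and seconds below 60 (no leap seconds). It also assumes that a value without a time zone is left unconverted when utcTime is true.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def canonical_datetime(year: int, month: int, day: int,
                       hour: int, minute: int, second: int,
                       fractional_secs: int = 0,
                       tz_minutes: Optional[int] = None,
                       utc_time: bool = False) -> Tuple:
    """Return (year, month, day, hour, minute, second, fractional_secs, tz_minutes).

    fractional_secs is None when it is zero (the component is omitted).
    tz_minutes is the offset from UTC in minutes, or None if absent.
    """
    end_of_day = hour == 24  # 24:00:00 denotes midnight at the end of the day
    dt = datetime(year, month, day, 0 if end_of_day else hour, minute, second)
    if end_of_day:
        dt += timedelta(days=1)  # rewrite as 00:00:00 on the next day
    if utc_time and tz_minutes is not None:
        # Shift to UTC: hours, minutes and possibly the date change,
        # the seconds value does not.
        dt -= timedelta(minutes=tz_minutes)
        tz_minutes = 0
    return (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second,
            fractional_secs or None, tz_minutes)
```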

4.5.6 String and String Table

The EXI String datatype representation is a length-prefixed sequence of characters. If no restricted character set is defined for the string, each character is represented by its Unicode code point. The Unicode standard allows multiple different representations of certain characters. A canonical EXI processor MUST NOT change the code points (see appendix C.2 No Unicode Normalization for the rationale).

In EXI, a string value content item is assigned to two partitions, a "local" value partition and the global value partition (see Partitions Optimized for Frequent use of String Literals). When a string value is found in the global or "local" partition, it may be represented using a compact identifier. In Canonical EXI, a string value MUST be represented using a compact identifier if possible. Unless the application indicates a convention in 2.1 Canonical EXI Options that dictates otherwise (e.g., the EXI Profile parameter localValuePartitions set to "0"), EXI processors MUST first try to use the "local" compact identifier, and only when this is not successful try to use the global compact identifier.

Note:

  • One reason why the attempt to represent the string value as a "local" compact identifier may fail is that the string has already been used as a "local" compact identifier previously. EXI supports only one local partition entry per value.

  • When a string value is not found in the global or "local" value partition a processor MAY also need to follow the rules described in 4.5.7 Restricted Character Sets according to the given restricted character set, if available.

Note that a Canonical EXI processor MUST also respect the XML schema whiteSpace facet, if defined.
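The local-first lookup order can be sketched in Python with a deliberately simplified string table. The class is hypothetical. It ignores the table limits (such as valueMaxLength and valuePartitionCapacity) and the localValuePartitions convention. The length-prefix values noted in the comments (0 for a local hit, 1 for a global hit, length + 2 for a miss) follow EXI 1.0.

```python
from typing import Dict, Tuple, Union


class StringTable:
    """Simplified EXI string table: one global partition plus one local
    partition per element/attribute qname."""

    def __init__(self) -> None:
        self.global_ids: Dict[str, int] = {}
        self.local_ids: Dict[str, Dict[str, int]] = {}

    def encode(self, qname: str, value: str) -> Tuple[str, Union[int, str]]:
        local = self.local_ids.setdefault(qname, {})
        if value in local:
            # Local hit: length prefix 0, then the local compact identifier.
            return ("local", local[value])
        if value in self.global_ids:
            # Global hit: length prefix 1, then the global compact identifier.
            return ("global", self.global_ids[value])
        # Miss: length + 2, then the characters. The value is added to the
        # global partition and to the local partition of this qname only.
        self.global_ids[value] = len(self.global_ids)
        local[value] = len(local)
        return ("miss", value)
```

Because a value enters a local partition only on a miss, a value first seen under one qname can only be a global hit under any other qname.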

4.5.7 Restricted Character Sets

Restricted Character Sets are applied in EXI to restrict the characters of string values. The canonical representation dictates that each character that belongs to the restricted character set MUST be represented by its corresponding n-bit Unsigned Integer. Hence, only characters that are not in the set SHALL be represented by the n-bit Unsigned Integer N (the number of characters in the set), followed by the character's Unicode code point represented as an Unsigned Integer.
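Here is a Python sketch of this rule. The function name and token labels are hypothetical, and the string's length prefix is omitted. The set is sorted by code point, so each character's index is its n-bit value. The width n = ceil(log2(N + 1)) leaves room for the escape value N.

```python
from typing import List, Tuple


def encode_restricted(value: str, charset: str) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (n, tokens) for `value` under the restricted character set `charset`.

    In-set characters become ("n-bit", index). Every other character becomes
    the escape ("n-bit", N) followed by ("uint", code point).
    """
    ordered = sorted(set(charset))            # ascending code point order
    index = {c: i for i, c in enumerate(ordered)}
    size = len(ordered)                       # N
    n = size.bit_length()                     # equals ceil(log2(N + 1))
    tokens: List[Tuple[str, int]] = []
    for ch in value:
        if ch in index:
            tokens.append(("n-bit", index[ch]))
        else:
            tokens.append(("n-bit", size))    # escape value N
            tokens.append(("uint", ord(ch)))  # code point as Unsigned Integer
    return n, tokens
```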

4.5.8 Datatype Representation Map

The EXI option datatypeRepresentationMap may specify an alternate set of datatype representations for typed values in the EXI body stream. This specification does not define any canonicalization rules for alternate representations. Other specifications and/or groups making use of this feature MAY describe a canonical form.

A References

EXI Format 1.0
Efficient XML Interchange (EXI) Format 1.0 (Second Edition), John Schneider, Takuki Kamiya, Daniel Peintner, Rumen Kyusakov, Editors. World Wide Web Consortium. The latest version is available at https://www.w3.org/TR/exi/. (See https://www.w3.org/TR/2014/REC-exi-20140211/.)
XML Schema Datatypes
XML Schema Part 2: Datatypes Second Edition, P. Biron and A. Malhotra, Editors. World Wide Web Consortium, 2 May 2001, revised 28 October 2004. The latest version is available at https://www.w3.org/TR/xmlschema-2. (See https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/.)
Canonical XML
Canonical XML Version 1.1, John Boyer and Glenn Marcy, Editors. World Wide Web Consortium, W3C Recommendation 2 May 2008. The latest version is available at https://www.w3.org/TR/xml-c14n11/. (See https://www.w3.org/TR/2008/REC-xml-c14n11-20080502/.)
XML Information Set
XML Information Set (Second Edition), J. Cowan and R. Tobin, Editors. World Wide Web Consortium, 24 October 2001, revised 4 February 2004. This version is https://www.w3.org/TR/2004/REC-xml-infoset-20040204. The latest version is available at https://www.w3.org/TR/xml-infoset. (See https://www.w3.org/TR/2004/REC-xml-infoset-20040204/.)
EXI Impacts
Efficient XML Interchange (EXI) Impacts, Jaakko Kangasharju, Editor. World Wide Web Consortium. The latest version is available at https://www.w3.org/TR/exi-impacts/. (See https://www.w3.org/TR/2008/WD-exi-impacts-20080903/.)
EXI Best Practices
Efficient XML Interchange (EXI) Best Practices, Mike Cokus and Daniel Vogelheim, Editors. World Wide Web Consortium. The latest version is available at https://www.w3.org/TR/exi-best-practices/. (See https://www.w3.org/TR/2007/WD-exi-best-practices-20071219/.)
EXI Profile
Efficient XML Interchange (EXI) Profile, Youenn Fablet and Daniel Peintner, Editors. World Wide Web Consortium. The latest version is available at https://www.w3.org/TR/exi-profile/. (See https://www.w3.org/TR/2012/WD-exi-profile-20120731/.)
XMLDSIG-CORE1
XML Signature Syntax and Processing Version 1.1, Donald Eastlake, Joseph Reagle, David Solo, Frederick Hirsch, Magnus Nyström, Thomas Roessler, Kelvin Yiu, Editors. Mark Bartel, John Boyer, Barb Fox, Brian LaMacchia, Ed Simon, Authors. World Wide Web Consortium, W3C Recommendation 11 April 2013. The latest version is available at https://www.w3.org/TR/xmldsig-core1/. (See https://www.w3.org/TR/2013/REC-xmldsig-core1-20130411/.)
Character Model Fundamentals
Character Model for the World Wide Web 1.0: Fundamentals, Martin J. Dürst, François Yergeau, Richard Ishida, Misha Wolf, Tex Texin, Authors. World Wide Web Consortium, W3C Recommendation 15 February 2005. The latest version is available at https://www.w3.org/TR/charmod/. (See https://www.w3.org/TR/2005/REC-charmod-20050215/.)
Character Model Identity
Character Model for the World Wide Web: String Matching and Searching, Addison Phillips, Author. World Wide Web Consortium, W3C Working Draft 19 November 2015. The latest version is available at https://www.w3.org/TR/charmod-norm/. (See https://www.w3.org/TR/2015/WD-charmod-norm-20151119/.)
XMLENC-CORE1
XML Encryption Syntax and Processing Version 1.1, Donald Eastlake, Joseph Reagle, Frederick Hirsch, Thomas Roessler, Editors. Takeshi Imamura, Blair Dillaway, Ed Simon, Kelvin Yiu, Magnus Nyström, Authors. World Wide Web Consortium, W3C Recommendation 11 April 2013. The latest version is available at https://www.w3.org/TR/xmlenc-core1/. (See https://www.w3.org/TR/2013/REC-xmlenc-core1-20130411/.)
IETF RFC 2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, Author. Internet Engineering Task Force, June 1999. Available at http://www.ietf.org/rfc/rfc2119.txt. (See http://www.ietf.org/rfc/rfc2119.txt.)
IETF RFC 3986
Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, Authors. Internet Engineering Task Force, January 2005. Available at https://tools.ietf.org/rfc/rfc3986.txt. (See https://tools.ietf.org/rfc/rfc3986.txt.)

B XML Schema for Canonical EXI Options Document

The following schema describes the Canonical EXI options document, which communicates all Canonical EXI options. It reuses the EXI options document by importing its exi:header element.

<xsd:schema targetNamespace="http://www.w3.org/2016/exi-c14n"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:exi="http://www.w3.org/2009/exi"
    elementFormDefault="qualified">

    <xsd:import namespace="http://www.w3.org/2009/exi"
        schemaLocation="https://www.w3.org/2009/exi/options.xsd"/>

    <xsd:element name="options">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="omitOptionsDocument" minOccurs="0">
                    <xsd:complexType/>
                </xsd:element>
                <xsd:element name="utcTime" minOccurs="0">
                    <xsd:complexType/>
                </xsd:element>
                <xsd:element ref="exi:header" minOccurs="0"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

</xsd:schema>

C Design Decisions (Non-Normative)

This section discusses a number of key decision points in the design of Canonical EXI. A rationale for each decision is given and background information is provided.

C.1 Relationship to XML Security

Canonical XML was designed to be useful to applications that test whether an XML document has been changed (e.g., XML Signature). EXI can be used in such use cases and offers benefits with respect to compact data exchange and fast processing. To ensure that the relevant Infoset items are available, the following EXI fidelity options must always be enabled: Preserve.pis, Preserve.prefixes, and Preserve.lexicalValues. When the XML canonicalization algorithm preserves comments in a document, the EXI fidelity option Preserve.comments must also be enabled.

Canonical EXI, in contrast to Canonical XML, operates on EXI documents and therefore avoids the overhead associated with plain-text XML data.

Both normal forms, Canonical XML and Canonical EXI, can be used to build a normal form of an XML Infoset and are applicable in the XML security context. Depending on the application and its requirements, one or the other may be better suited; XML Signature applications, for example, can ideally choose whichever fits better.

Note:

In environments that use Canonical EXI for signing and have intermediate nodes that represent the associated Infoset using text XML, it is important to ensure the Canonical EXI signer and validator use the same set of options (see section E.2 Exchange Canonical EXI Options (Best Practices)).

C.2 No Unicode Normalization

The Unicode standard allows multiple different representations of certain "precomposed characters" (a simple example is "ç", which can be encoded either as the single code point U+00E7 or as "c" followed by U+0327 COMBINING CEDILLA). Thus, two character sequences that have the same appearance and meaning when printed or displayed may differ in their sequences of code points. The W3C provides a reference for interoperable text [Character Model Fundamentals] and also a normalized representation [Character Model Identity], but many XML processors do not perform this normalization. Furthermore, applications that must solve this problem can typically enforce character model normalization at all times, starting when character content is created, in order to avoid processing failures (see No Character Model Normalization in Canonical XML).

Therefore, character model normalization is out of scope for EXI canonicalization and a canonical EXI processor must not change the code points.
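As an illustration of why code points must be left untouched, the following minimal Python sketch shows the two encodings of "ç" mentioned above. It uses only the standard unicodedata module; the helper name code_points is hypothetical and introduced for illustration only.

```python
import unicodedata

# "ç" as a single precomposed code point (U+00E7 LATIN SMALL LETTER C WITH CEDILLA)
precomposed = "\u00e7"
# "ç" as "c" (U+0063) followed by U+0327 COMBINING CEDILLA
decomposed = "c\u0327"


def code_points(text: str) -> list[str]:
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))  # ['U+00E7']
print(code_points(decomposed))   # ['U+0063', 'U+0327']

# The two strings differ code point by code point, so a processor that
# leaves code points untouched keeps them distinct ...
print(precomposed == decomposed)  # False
# ... although NFC normalization maps both to the same sequence.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Because a canonical EXI processor does not apply NFC, the two strings above yield different canonical EXI streams; applications that need them to compare equal have to normalize before encoding.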

C.3 Date-Time Canonicalization Option

XML Schema defines a canonical representation for dateTime values. However, the EXI Working Group (also based on external feedback) found that this canonical form is designed to make it easy to determine whether two Date-Time values refer to the same instant, regardless of the timezone used; to achieve this, values with a timezone are expressed in UTC, which discards the original timezone.

On the one hand, for many applications the Date-Time timezone is an important piece of information that should be preserved. As such, it would be surprising if a digital signature were not able to detect changes to this information. In addition, users might be surprised if the canonical EXI form discarded all their timezone information and changed all their Date-Time values.

On the other hand, other applications may require the above-mentioned canonical form.

Therefore, Date-Time values are, by default, not normalized to UTC, but UTC normalization can be requested with the Canonical EXI option utcTime.

D Canonical EXI Examples (Non-Normative)

D.1 EXI Event Selection

The following example shows the available productions for an example DocContent grammar. From the perspective of the [EXI Format 1.0] specification, a start element "A" may be matched with either event code 0 (zero) or event code 4 (four). A canonical EXI form prescribes event code 0 (zero), because SE ("A") matches more precisely than SE (*) (see 4.2.2 Use the event that matches most precisely).

Example D-1. Example productions with event codes
Syntax                        Event Code
DocContent :
    SE ("A") DocEnd           0
    SE ("B") DocEnd           1
    SE ("C") DocEnd           2
    SE ("D") DocEnd           3
    SE (*) DocEnd             4
    DT DocContent             5.0
    CM DocContent             5.1.0
    PI DocContent             5.1.1
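The selection rule of Example D-1 can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm: productions are modeled by a hypothetical Production tuple, "most precise" is approximated as "an exact name beats a wildcard", and remaining ties are broken by the smallest event code.

```python
from typing import NamedTuple, Optional


class Production(NamedTuple):
    event: str           # event type: "SE", "DT", "CM", "PI", ...
    name: Optional[str]  # qualified name for SE ("..."); None for SE (*) and nameless events
    code: str            # event code as listed in Example D-1


# DocContent productions of Example D-1 (right-hand-side grammars omitted).
DOC_CONTENT = [
    Production("SE", "A", "0"),
    Production("SE", "B", "1"),
    Production("SE", "C", "2"),
    Production("SE", "D", "3"),
    Production("SE", None, "4"),      # SE (*)
    Production("DT", None, "5.0"),
    Production("CM", None, "5.1.0"),
    Production("PI", None, "5.1.1"),
]


def matches(p: Production, event: str, name: Optional[str]) -> bool:
    """The event type must agree; the name must be a wildcard (None) or equal."""
    return p.event == event and (p.name is None or p.name == name)


def code_key(code: str) -> tuple[int, ...]:
    """Turn an event code such as "5.1.0" into (5, 1, 0) for ordering."""
    return tuple(int(part) for part in code.split("."))


def select_event_code(productions: list[Production], event: str,
                      name: Optional[str] = None) -> str:
    candidates = [p for p in productions if matches(p, event, name)]
    if not candidates:
        raise ValueError(f"no production matches {event}({name})")
    # Prefer an exact name over a wildcard, then the smallest event code.
    best = min(candidates, key=lambda p: (p.name is None, code_key(p.code)))
    return best.code


print(select_event_code(DOC_CONTENT, "SE", "A"))  # 0  (not 4)
print(select_event_code(DOC_CONTENT, "SE", "X"))  # 4  (only SE (*) matches)
print(select_event_code(DOC_CONTENT, "CM"))       # 5.1.0
```

Both SE ("A") and SE (*) match a start element "A", but the sketch picks code 0 because the named production is more specific.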

D.2 Stream Order

Example D-2. Attribute and Namespace Declaration Sorting
Stream (Event Input Sequence)
SD SE(root) NS(www.foo.com, foo) NS(www.bla.com, bla) AT(c) AT(b) AT(a) EE ED
Canonical EXI Stream
SD SE(root) NS(www.bla.com, bla) NS(www.foo.com, foo) AT(a) AT(b) AT(c) EE ED
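The reordering in Example D-2 can be reproduced with a short Python sketch. The sort keys are an assumption chosen to reproduce this example (namespace declarations by prefix, attributes by local name and then URI, both compared lexicographically by code point); the normative ordering rules are those in the main body of this specification. The NS, AT, canonical_order, and render names are hypothetical.

```python
from typing import NamedTuple


class NS(NamedTuple):
    uri: str
    prefix: str


class AT(NamedTuple):
    uri: str         # "" for attributes without a namespace
    local_name: str


def canonical_order(ns_decls: list[NS], attributes: list[AT]) -> tuple[list[NS], list[AT]]:
    # Assumed sort keys: NS by prefix; AT by local name, then by URI.
    ns_sorted = sorted(ns_decls, key=lambda ns: ns.prefix)
    at_sorted = sorted(attributes, key=lambda at: (at.local_name, at.uri))
    return ns_sorted, at_sorted


def render(element: str, ns_decls: list[NS], attributes: list[AT]) -> str:
    """Render the events in the notation of Example D-2."""
    events = ["SD", f"SE({element})"]
    events += [f"NS({ns.uri}, {ns.prefix})" for ns in ns_decls]
    events += [f"AT({at.local_name})" for at in attributes]
    events += ["EE", "ED"]
    return " ".join(events)


ns_in = [NS("www.foo.com", "foo"), NS("www.bla.com", "bla")]
at_in = [AT("", "c"), AT("", "b"), AT("", "a")]
print(render("root", *canonical_order(ns_in, at_in)))
# SD SE(root) NS(www.bla.com, bla) NS(www.foo.com, foo) AT(a) AT(b) AT(c) EE ED
```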

D.3 EXI Floats

The Float datatype representation can be converted to the canonical form by the following steps. Note that implementations are free to choose any strategy as long as the constraints in 4.5.4 Float are met.

Example D-3. An algorithm for converting float values to the canonical form

Let the float value have a decimal notation of the form <before>.<after> where before represents the value before the decimal point and after represents the value after the decimal point. The canonical representation of the mantissa and exponent shall be determined as follows:

  1. Initialize the exponent with the value 0 (zero).

  2. Examine the float value and extract the two portions before and after the decimal point. If the value after the decimal point can be represented as 0 (zero) without losing precision, then jump to step 4, otherwise jump to step 3.

  3. Decrement the exponent by 1 (one) and shift the decimal point of the float value by one digit to the right. Jump back to step 2.

  4. The portion before the decimal point can be safely converted to the signed mantissa value.

  5. If the signed mantissa is not equal to 0 (zero) or -0 (negative zero) and has a trailing zero, then jump to step 6, otherwise jump to step 7.

  6. Increment the exponent by 1 (one) and shift the mantissa by one digit to the right (that is, remove its trailing zero). Jump back to step 5.

  7. If the mantissa is equal to -0 (negative zero), set the mantissa value to 0 (zero). Finished.
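The steps of Example D-3 can be sketched in Python using the standard decimal module, which avoids binary floating-point rounding while shifting the decimal point. This is one possible strategy, not a required one; the function name canonical_float is hypothetical, and the sketch neither handles INF, -INF, and NaN nor checks the value ranges that EXI imposes on mantissa and exponent.

```python
from decimal import Decimal


def canonical_float(text: str) -> tuple[int, int]:
    """Return (mantissa, exponent) such that value == mantissa * 10**exponent,
    following the steps of Example D-3."""
    value = Decimal(text)
    if not value.is_finite():
        raise ValueError("INF, -INF and NaN are not handled by this sketch")
    exponent = 0                                  # step 1
    # Steps 2-3: shift the decimal point right until the fractional part is zero.
    while value != value.to_integral_value():
        value = value.scaleb(1)
        exponent -= 1
    # Step 4: the integral value becomes the signed mantissa. Python ints have
    # no negative zero, so int(Decimal("-0.0")) == 0 already covers step 7.
    mantissa = int(value)
    # Steps 5-6: drop trailing zeros of a non-zero mantissa.
    while mantissa != 0 and mantissa % 10 == 0:
        mantissa //= 10
        exponent += 1
    return mantissa, exponent


for literal in ["123.012300", "-0.0", "-1230.01", "0.1230", "12300", "120E-1", "1.2E1"]:
    print(literal, canonical_float(literal))
```

Each result matches the corresponding row of Example D-4; for instance, 12300 becomes mantissa 123 with exponent 2.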

The following examples show float values alongside their canonical forms.

Example D-4. Canonicalized EXI Float values
Float Value     Canonical EXI Float Value
                Mantissa      Exponent
123.012300      1230123       -4
0.0             0             0
-0.0            0             0
1.0             1             0
-1230.01        -123001       -2
0.1230          123           -3
12300           123           2
12.0            12            0
120E-1          12            0
1.2E1           12            0

D.4 EXI Date-Times

The following examples show Date-Time values alongside their canonical forms. The canonical EXI form always retains the original seconds part.

Example D-5. Canonicalized EXI Date-Time values
Date-Time Value              Canonical EXI Date-Time Value
                             (utcTime==false)             (utcTime==true)
2015-08-11T24:00:00-07:30    2015-08-12T00:00:00-07:30    2015-08-12T07:30:00Z
2012-06-30T23:59:60-06:00    2012-06-30T23:59:60-06:00    2012-07-01T05:59:60Z
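The two rows of Example D-5 can be reproduced with the following Python sketch. It is limited by assumption to values of the form yyyy-mm-ddThh:mm:ss with an explicit timezone (no fractional seconds, negative years, or missing timezone), copies the timezone unchanged when utcTime is false, and does not validate its input. Because Python's datetime rejects hour 24 and second 60, both are handled explicitly. The function name canonical_date_time is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# yyyy-mm-ddThh:mm:ss followed by "Z" or a +hh:mm / -hh:mm offset.
DATE_TIME = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})"
)


def canonical_date_time(text: str, utc_time: bool = False) -> str:
    m = DATE_TIME.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported dateTime value: {text}")
    hour, minute, second = int(m["hour"]), int(m["minute"]), int(m["second"])
    tz = m["tz"]
    if tz == "Z":
        offset = timedelta(0)
    else:
        sign = -1 if tz[0] == "-" else 1
        offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))

    # 24:00:00 becomes 00:00:00 of the following day ...
    next_day = timedelta(days=1) if hour == 24 else timedelta(0)
    if hour == 24:
        hour = 0
    # ... and a leap second is computed as :59 and restored afterwards,
    # so the original seconds part is retained.
    leap_second = second == 60
    if leap_second:
        second = 59

    value = datetime.strptime(m["date"], "%Y-%m-%d").replace(
        hour=hour, minute=minute, second=second, tzinfo=timezone(offset)
    ) + next_day
    if utc_time:
        value = value.astimezone(timezone.utc)
        tz = "Z"
    seconds = 60 if leap_second else value.second
    return f"{value:%Y-%m-%dT%H:%M}:{seconds:02d}{tz}"


print(canonical_date_time("2015-08-11T24:00:00-07:30"))                 # 2015-08-12T00:00:00-07:30
print(canonical_date_time("2012-06-30T23:59:60-06:00", utc_time=True))  # 2012-07-01T05:59:60Z
```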

E Canonical EXI Applications (Non-Normative)

E.1 Signature Processing Steps

The figure below shows the processing steps involved when Canonical EXI is used to sign an EXI document or fragment and the signature value is embedded within the document. First, the EXI stream, or the fragment of the EXI stream to be signed, has to be transformed into its canonical form according to the requirements given in this document (see 2. Canonical EXI Stream). Then, the canonical representation is used to determine the signature value with the intended signature algorithm. At this point, the signature value can be embedded in the EXI document, which can then be transmitted to the recipient.

To verify the signature value, the receiver has to build the canonical EXI stream for the signed portion. This step can be skipped if the receiver knows in advance that the EXI stream already fulfills the requirements of Canonical EXI. Finally, to determine whether the signature value is correct (based on the signature algorithm), the value obtained from the canonical stream is compared with the embedded signature value provided by the sender.



Figure E-1. Canonical EXI used in Signature


In the case of XML Signature processing, either a detached signature or an enveloped signature can be used; in the latter case, the signature element is excluded from the digest calculation.

During signature generation, the digest is computed over the canonical EXI stream. Two independent aspects must be addressed:

  1. What gets hashed, and

  2. How to exchange and share Canonical EXI options (other than out-of-band) (see E.2 Exchange Canonical EXI Options (Best Practices))
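The "what gets hashed" aspect can be sketched in Python with the standard hashlib and hmac modules. The sketch only covers the digest step of Figure E-1; signing the digest itself (with a private key or MAC key) is omitted. The canonicalize function is a hypothetical placeholder that assumes its input is already canonical, SHA-256 is an assumed digest algorithm, and the input bytes are made up for illustration.

```python
import hashlib
import hmac


def canonicalize(exi_stream: bytes) -> bytes:
    """Hypothetical placeholder for a Canonical EXI processor.

    A real implementation would re-encode the stream according to the
    Canonical EXI header and body rules; this sketch assumes the input is
    already canonical and returns it unchanged."""
    return exi_stream


def compute_digest(exi_stream: bytes) -> bytes:
    """Signer side: hash the canonical form (SHA-256 is an assumed choice)."""
    return hashlib.sha256(canonicalize(exi_stream)).digest()


def verify_digest(exi_stream: bytes, expected: bytes) -> bool:
    """Receiver side: rebuild the canonical form, hash it, and compare the
    result with the digest supplied by the sender in constant time."""
    return hmac.compare_digest(compute_digest(exi_stream), expected)


signed_part = bytes.fromhex("80400000")  # hypothetical EXI bytes
digest = compute_digest(signed_part)
print(verify_digest(signed_part, digest))                # True
print(verify_digest(bytes.fromhex("80400001"), digest))  # False
```

Both sides must hash exactly the same canonical bytes, which is why signer and receiver have to agree on the Canonical EXI options discussed next.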

E.2 Exchange Canonical EXI Options (Best Practices)

The canonicalization process of EXI relies on knowing which Canonical EXI Options were used (see 2.1 Canonical EXI Options); these options include the EXI options. The Canonical EXI options specify whether the EXI Options document is omitted and whether Date-Time values must be represented using Coordinated Universal Time (UTC). In addition, the EXI options part (which as a whole may be optional in the EXI header) communicates the various options that were used to encode the actual XML information with EXI and is essential for any EXI processor.

Appendix B XML Schema for Canonical EXI Options Document provides an XML Schema describing the Canonical EXI Options document. Specifically, it allows users to communicate EXI-C14N (and EXI) options as part of the digital signature framework, via an out-of-band protocol, an overarching specification, or in other use-cases.

E.2.1 Example - XML Signature

According to section 6.1 of XML Signature Syntax and Processing Version 1.1, explicit additional parameters to an algorithm appear as content elements within the algorithm role element. In this case, the role element is "CanonicalizationMethod". Hence, Canonical EXI use cases can place their options in sub-elements of CanonicalizationMethod.

The CanonicalizationMethod element is given the following schema type:

    <element name="CanonicalizationMethod" type="ds:CanonicalizationMethodType"/> 
    
    <complexType name="CanonicalizationMethodType" mixed="true">
        <sequence>
            <any namespace="##any" minOccurs="0" maxOccurs="unbounded"/>
            <!-- (0,unbounded) elements from (1,1) namespace -->
        </sequence>
        <attribute name="Algorithm" type="anyURI" use="required"/> 
    </complexType>
                    

A possible Canonical EXI options document (as part of XML Signature) may look as follows:

...
<CanonicalizationMethod xmlns="http://www.w3.org/2000/09/xmldsig#" 
    Algorithm="http://www.w3.org/TR/exi-c14n">
    <exi-c14n:options xmlns:exi="http://www.w3.org/2009/exi"
        xmlns:exi-c14n="http://www.w3.org/2016/exi-c14n">
        <exi-c14n:omitOptionsDocument/>
        <exi-c14n:utcTime/>
        <exi:header>
            <exi:common>
                <exi:compression/>
                <exi:fragment/>
            </exi:common>
        </exi:header>
    </exi-c14n:options>
</CanonicalizationMethod> 
...
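A signature verifier has to recover these options before it can rebuild the canonical stream. The following Python sketch reads them from the example above using the standard xml.etree.ElementTree module. The function name read_c14n_options and the returned dictionary layout are hypothetical, and treating a missing element as "option not set" is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

DSIG = "http://www.w3.org/2000/09/xmldsig#"
EXI_C14N = "http://www.w3.org/2016/exi-c14n"
EXI = "http://www.w3.org/2009/exi"


def read_c14n_options(xml_text: str) -> dict:
    """Extract Canonical EXI options from a ds:CanonicalizationMethod element."""
    method = ET.fromstring(xml_text)
    if method.tag != f"{{{DSIG}}}CanonicalizationMethod":
        raise ValueError("expected a ds:CanonicalizationMethod element")
    options = method.find(f"{{{EXI_C14N}}}options")

    def has(tag: str) -> bool:
        # Assumption: an absent element means the option is not set.
        return options is not None and options.find(f"{{{EXI_C14N}}}{tag}") is not None

    return {
        "algorithm": method.get("Algorithm"),
        "omitOptionsDocument": has("omitOptionsDocument"),
        "utcTime": has("utcTime"),
        # The exi:header element can be handed on to an EXI options parser.
        "header": None if options is None else options.find(f"{{{EXI}}}header"),
    }


EXAMPLE = """
<CanonicalizationMethod xmlns="http://www.w3.org/2000/09/xmldsig#"
    Algorithm="http://www.w3.org/TR/exi-c14n">
    <exi-c14n:options xmlns:exi="http://www.w3.org/2009/exi"
        xmlns:exi-c14n="http://www.w3.org/2016/exi-c14n">
        <exi-c14n:omitOptionsDocument/>
        <exi-c14n:utcTime/>
        <exi:header>
            <exi:common>
                <exi:compression/>
                <exi:fragment/>
            </exi:common>
        </exi:header>
    </exi-c14n:options>
</CanonicalizationMethod>
""".strip()

opts = read_c14n_options(EXAMPLE)
print(opts["algorithm"], opts["omitOptionsDocument"], opts["utcTime"])
# http://www.w3.org/TR/exi-c14n True True
```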
                    

Note:

Another proposal is to use the SignatureProperties element.

E.2.2 Decision Criteria

This section has provided best practices for exchanging Canonical EXI options, but use cases are not limited to the aforementioned proposals. Other methods for communicating Canonical EXI options include:

  • A community of interest might decide on a set of Canonical EXI options that are appropriate for their use case and codify them in their specifications / standards. Implementations that comply with these specifications / standards will all use the same options, without the need for communicating them dynamically at runtime.

  • A community of interest may devise a protocol for exchanging the Canonical EXI options dynamically out-of-band as needed.

  • and so forth

F Useful References (Non-Normative)

RFC 3986

Normalization and comparison of URL and URI values, if applied within a document, is best performed in accordance with [IETF RFC 3986].

G Recent Specification Changes (Non-Normative)

G.1 Changes from Candidate Recommendation

G.2 Changes from Last Call Working Draft

  • Updated the section on motivation to make clear that EXI canonicalization is useful for traditional XML users also (see 1.2 Motivation).

  • Introduced the concept of Canonical EXI Options to provide a single, simple, unambiguous way to express the rules of EXI-C14N options (see 2.1 Canonical EXI Options).

  • The Canonical EXI Option omitOptionsDocument specifies whether the fifth part of the EXI Header, the EXI Options, is present (see 3. Canonical EXI Header).

  • Changed the presence of the element schemaId in the Canonical EXI Header from "MUST" to "MAY" by describing the consequences in an accompanying warning (see 3. Canonical EXI Header).

  • It was made clear that the element datatypeRepresentationMap must be omitted when the value of the Preserve.lexicalValues fidelity option is true (see 3. Canonical EXI Header).

  • It is now explicitly stated that each event in an EXI stream participates in a mapping system that relates events to XML Information Items (see 4. Canonical EXI Body).

  • Character (CH) events have been added to the process of selecting the EXI event that matches most precisely (see 4.2.2 Use the event that matches most precisely).

  • A section about EXI content handling specifies how to deal with extraneous events, empty element content, and whitespace (see 4.3 EXI Content Handling).

  • Added further constraints for Date-Time values. Moreover, the Canonical EXI Option utcTime specifies whether Date-Time values must be represented using Coordinated Universal Time (see 4.5.5 Date-Time).

H Acknowledgements (Non-Normative)

This document is the work of the Efficient XML Interchange (EXI) WG.

Members of the Working Group are (at the time of writing, sorted alphabetically by last name):

The EXI Working Group would like to acknowledge the following former members or external experts for their leadership, guidance and expertise they provided throughout the process of creating this document (sorted alphabetically by last name):