W3C

Efficient XML Interchange (EXI) Primer

W3C Working Draft 24 April 2014

This version:
http://www.w3.org/TR/2014/WD-exi-primer-20140424/
Latest version:
http://www.w3.org/TR/exi-primer/
Previous version:
http://www.w3.org/TR/2009/WD-exi-primer-20091208/
Editors:
Daniel Peintner, Siemens AG
Santiago Pericas-Geertsen, Sun Microsystems (till 2009-12-08)

Abstract

This is a non-normative document intended to provide an easily readable technical background on the Efficient XML Interchange (EXI) format. It is oriented towards quickly understanding how the EXI format can be used in practice and how options can be set to achieve specific needs. Section 2. Concepts describes the structure and content of an EXI document and introduces the notions of EXI header, EXI body, and EXI grammar which are fundamental to the understanding of the EXI format. Furthermore, additional details about data type representation, compression, and their interaction with other format features are presented. Finally, Section 3. Efficient XML Interchange by Example provides a detailed, bit-level description of both, a schema-less and a schema-informed example.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This working draft is an updated version of the EXI Primer document to reflect changes according to the Second Edition Recommendation of the Efficient XML Interchange Format 1.0. At the same time the encoding and decoding examples have been improved and harmonized to ease understanding.

This document has been produced by the Efficient XML Interchange Working Group as part of the W3C XML Activity. The goals of the Efficient XML Interchange (EXI) Format are discussed in the Efficient XML Interchange (EXI) Format document. The authors of this document are the members of the Efficient XML Interchange Working Group.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Please send comments about this document to the public-exi@w3.org mailing list (Archives).

Table of Contents

1. Introduction
2. Concepts
    2.1 EXI Streams
        2.1.1 EXI Header
        2.1.2 EXI Body
        2.1.3 EXI Grammars
            2.1.3.1 Built-In Grammar
            2.1.3.2 Schema-informed Grammar
    2.2 Content Representation
        2.2.1 Built-in EXI Datatype Representations
        2.2.2 String Table
    2.3 Compression
3. Efficient XML Interchange by Example
    3.1 Notation
    3.2 Options
    3.3 Encoding Example
    3.4 Decoding Example

Appendix

A References


1. Introduction

The intended audience of this document includes users and developers of EXI with a basic understanding of XML and XML Schema. This document provides an informal description of the EXI format; the reader is referred to the [Efficient XML Interchange (EXI) Format 1.0 (Second Edition)] document for further details. Hereinafter, the presentation assumes that the reader is familiar with the basic concepts of XML and the way XML Schema can be used to describe and enforce constraints on XML documents.

The document is comprised of two major parts. The first part describes the structure and content of an EXI document with and without compression. More specifically, it describes the concept of an EXI stream and how it is generated using EXI grammars, as well as the implications on structure and content ordering in an EXI stream when compression is enabled. As a practical application of the concepts from the first part, the second part presents a complete bit-level description of an EXI document.

2. Concepts

The development of the Efficient XML Interchange (EXI) format was guided by five design principles, namely, the format had to be general, minimal, efficient, flexible, and interoperable. The format satisfies these prerequisites, achieving generality, efficiency, and flexibility while at the same time keeping complexity in check.

Many of the concepts employed by the EXI format are applicable to the encoding of arbitrary languages that can be described by a grammar. Even though EXI utilizes schema information to improve compactness and processing efficiency, it does not depend on accurate, complete, or current schemas to work.

2.1 EXI Streams

EXI represents the contents of an XML document as an EXI stream. As shown below, an EXI stream consists of an EXI header followed by an EXI body.

Table 2-1. EXI Stream Structure
      EXI Header             EXI Body      

The EXI header conveys format version information and may also include the set of options that were used during encoding. If these options are omitted, it is assumed that the decoder has access to them out of band. The EXI body comprises an event sequence describing the document (or document fragment) that is encoded. The following two sections describe the EXI header and EXI body in more detail.

2.1.1 EXI Header

The header communicates encoding properties that are needed to decode the EXI body. The minimal header can be represented in a single byte. This keeps the overhead and complexity to a minimum and does not sacrifice compactness, especially for small documents where a header can introduce a large constant factor.

The structure of an EXI header is depicted in the following figure.

Table 2-2. EXI Header Structure
[EXI Cookie]Distinguishing BitsPresence Bit for EXI OptionsEXI Format Version[EXI Options][Padding Bits]

The EXI header starts with an optional four-byte EXI Cookie. The four byte field consists of four characters " $ " , " E ", " X " and " I " in that order, each represented as an ASCII octet, that can be used to distinguish an EXI stream from a broad range of data streams.

The EXI Cookie is followed by a pair of Distinguishing Bits. The two bit-sequence (1 0) can be used to distinguish an EXI document from a textual XML document and is sufficient to distinguish EXI streams from XML streams based on a broad range of character encodings.

The Presence Bit for EXI Options follows the distinguishing bits. The value of this single bit is used to indicate the presence or absence of the EXI Options that appear later in the header.

The EXI Format Version identifies the version of EXI in use and allows future improvements and modifications. A leading 0 (zero) bit indicates that the document is encoded according to the final version of the recommendation, while a leading 1 (one) indicates that it is a preview version. The differentiation is introduced to facilitate early releases of preview versions with less strict interoperability requirements. Only final versions are required to be processed by compliant processors. The leading bit is followed by one or more 4-bit sequences which are collectively interpreted as a format version number starting at 1. For example, the 4-bit sequence 0000 is interpreted as version 1 and the two 4-bit sequences 1111 0001 are interpreted as 15 + 2 or version 17.

The EXI Options specify how the body of an EXI stream is encoded and, as stated earlier, their presence is controlled by the presence bit earlier in the header. The overhead introduced by the EXI options is comparatively small given that they are formally described using an XML schema and are encoded using EXI as well. The following table describes the EXI options that can be specified in the EXI header. When the EXI Options document does not specify a value for a particular option, the default value is assumed.

Table 2-3. EXI Options
EXI OptionDescriptionDefault Value
alignmentAlignment of event codes and content items bit-packed
compressionIndicates if EXI compression is to be used for better compactnessfalse
strictStrict interpretation of schema is used to achieve better compactnessfalse
fragmentIndicates if the body is to be encoded as an EXI fragment instead of an EXI documentfalse
preserveA set of options that controls whether comments, processing instructions, etc. are preservedall false
selfContainedEnables self-contained elements. Self-contained elements may be read independently from the rest of the EXI bodyfalse
schemaIdIdentify the schema information, if any, used to encode the body no default value
datatypeRepresentationMapSpecify alternate datatype representations for typed values in the EXI body no default value
blockSizeSpecifies the number of Attribute (AT) and Character (CH) values for each block used for EXI compression 1,000,000
valueMaxLengthSpecifies the maximum string length of value content items to be considered for addition to the string table unbounded
valuePartitionCapacitySpecifies the total capacity of value partitions in a string table unbounded
[user defined meta-data]User defined options may be added no default value

Most of the options are straightforward and act as boolean values to enable or disable a feature. They are represented using optional XML elements which are also encoded using EXI. For more information on the XML schema that is used to encode these options, the reader is referred to XML Schema for EXI Options Header.

The preserve options shown in the table above are really a family of options that control which XML items are preserved and which XML items are ignored. These are collectively known as fidelity options. These options can be used to eliminate the associated overhead of communicating unused XML items. Certain XML items such as processing instructions or DTDs may never occur (like in SOAP) or are simply unimportant to the use case or application domain. Fidelity options are used to manage filters for certain XML items as shown in the following table.

Table 2-4. Fidelity Options
Fidelity OptionEffect
Preserve.commentsProductions of CM (Comment) events are preserved in grammars
Preserve.pisProductions of PI (Processing Instruction) events are preserved in grammars
Preserve.dtdProductions of DOCTYPE and ER (Entity Reference) events are preserved
Preserve.prefixesNS (Namespace Declaration) events and namespace prefixes are preserved
Preserve.lexicalValuesLexical form of element and attribute values is preserved

Naturally, XML items that are discarded at encoding time (due to a particular setting of the fidelity options) cannot be reconstructed exactly at decoding time. The next section deals with the EXI Body and discusses in more detail the effects of enabling and disabling fidelity options.

2.1.2 EXI Body

The body of an EXI document is composed of a sequence of EXI events. The notion of an event in this context is similar to that in the StAX and SAX APIs. XML items are encoded into one or more EXI events; for example, an attribute named foo can be encoded as AT("foo") and an element named bar as the pair of events SE("bar") and EE. EXI events may have additional content associated with them. For example, the attribute event AT("foo") may have an attribute value foo1 associated with it. The following table shows all the possible event types together with their associated information items distinguished by structure and content. In EXI terminology, content denotes attribute and character values while all other information items are considered as belonging to the structure category.

Table 2-5. EXI Event types
EXI Event TypeGrammar Notation Information Items
StructureContent
¹EXI Options such as preserve and selfContained can be used to prune events from the EXI stream to realize a more compact representation.
Start DocumentSD  
End DocumentED  
Start ElementSE ( qname )[prefix] 
SE ( uri:* ) local-name, [prefix]  
SE ( * ) qname, [prefix]  
End ElementEE  
AttributeAT ( qname )[prefix] value
AT ( uri:* ) local-name, [prefix]
AT ( * ) qname, [prefix]
CharactersCH  value
Namespace Declaration¹NS uri, prefix, local-element-ns  
Comment¹CM text  
Processing Instruction¹PI name, text  
DOCTYPE¹DT name, public, system, text  
Entity Reference¹ER name  
Self Contained¹SC    

For named XML items, such as elements and attributes, there are three types of events: SE(qname), SE(uri:*) and SE(*) as well as AT(qname), AT(uri:*) and AT(*). These events differ in their associated structure: when SE(qname) or AT(qname) are used, the actual qname of the XML item is not encoded as part of the event while SE(uri:*) and AT(uri:*) events do not encode the uri. The decision to use one type of event over the other will be explained later after introducing the notion of EXI grammars. Additionally, Fidelity Options may allow the preservation of namespace prefixes.

The fidelity options introduced in Section 2.1.1 EXI Header may be used to prune EXI events such as Namespace Declaration (NS), Comment (CM), Processing Instruction (PI), DOCTYPE (DT) or Entity Reference (ER). Grammar pruning simplifies the encoding and decoding process and also improves compactness by filtering out unused event types.

Consider a simple XML document from a notebook application:

Example 2-1. Notebook (XML Document)
<?xml version="1.0" encoding="UTF-8"?> 
<notebook date="2007-09-12">
 <note category="EXI" date="2007-07-23">
  <subject>EXI</subject>
  <body>Do not forget it!</body>
 </note>
 <note date="2007-09-12">
  <subject>Shopping List</subject>
  <body>milk, honey</body>
 </note>
</notebook>

The sequence of EXI events corresponding to the body of this XML document is shown below.


EXI Body Stream

Figure 2-1. EXI Body Stream


This sequence of EXI events can be easily mapped to the structure of the XML document shown above. Every document begins with a Start Document (SD) and ends with an End Document (ED). The order in which attributes are encoded may be different in schema-less and schema-informed EXI streams, as is the exact content associated with each event.

The actual number of bits used to represent each type of event, excluding its content, differs depending on the context. The more event types can occur in a certain context, the larger the number of bits required to represent an event in that context. What constitutes a context in this case is more formally defined by an EXI grammar production in the next section.

2.1.3 EXI Grammars

EXI is a knowledge based encoding that uses a set of grammars to determine which events are most likely to occur at any given point in an EXI stream and encodes the most likely alternatives in fewer bits. It does this by mapping the stream of events to a lower entropy set of representative values and encoding those values using a set of simple variable length codes or an EXI compression algorithm.

EXI grammars are regular grammars in which productions are associated with event codes. An EXI encoder, driven by an XML event stream, matches events to grammar productions and uses their associated event codes to represent an XML document or XML fragment. Since EXI grammars are regular grammars, the sequence of event codes written by an encoder corresponds to a path in the finite automaton that accepts the grammar. In reality, given that XML is not a regular language, a single grammar cannot be used to represent an entire XML event stream. Instead, an EXI encoder uses a stack of grammars, one for each element content model (just like an XML Schema validator might do).

An event code is represented by a sequence of one to three parts, where each part is a non-negative integer. Event codes in an EXI grammar are assigned to productions in such a way that shorter event codes are used to represent productions that are more likely to occur. Conversely, longer event codes are used to represent productions that are less likely to occur. EXI grammars are designed in a way that the average number of bits needed to represent each production is less than that for a grammar in which more likely and less likely productions are not distinguished. The following tables illustrate this principle via an example.

Table 2-6. Event Code Assignment
EventIndicator#bits
AT(date)04
AT(category)1
EE2
AT(*)3
NS4
SE(*)5
CH6
CM7
PI8
#distinct values9 
     
EventEventCode#bits
AT(date)0  2
AT(category)1  
EE20 2 + 3
AT(*)21 
NS22 
SE(*)23 
CH24 
CM2502 + 3 + 1
PI251
#distinct values362 
Naive Event Code Assignmentvs.EXI Event Code Assignment

In the first table, where productions are not separated according to their probability, a 4-bit code is needed to represent each entry. In the second table, on the other hand, code lengths vary from 2 bits to 6 bits since productions are grouped based on their likelihood to occur. Assuming the content model for the element being encoded corresponds to the sequence AT(category) AT(date) (i.e., the element declares two attributes) then the encoding of all the event codes will be 4 bits shorter using the second table.

EXI grammars take advantage of a priori knowledge of the kind of data being encoded, namely, XML documents and XML fragments. In particular, EXI grammars can take advantage of the fact that, on any given grammar, certain XML items are more popular than others. For example, by simple inspection of real-world documents, it is easy to verify that attributes occur more frequently than processing instructions and should therefore receive shorter event codes.

Further improvements in grammar design are possible if schema information is available. In this case, we can not only take advantage of generic XML knowledge but also of knowledge that is specific to the type of documents being encoded. For example, as shown in the tables above, we can add specific productions such as AT(category) and AT(date) with shorter event codes than AT(*).

The following two sections describe the differences between the built-in grammars and the schema-informed grammars. Note that an EXI encoder may only have partial schema information in which case it will use a combination of built-in and schema-informed grammars during encoding.

2.1.3.1 Built-In Grammar

EXI uses a set of built-in grammars to encode XML documents and XML fragments when no schema information is available. There are built-in grammars to encode documents, fragments, and elements. Document grammars and fragment grammars describe the top-level structure, while element grammars describe the structure of every element. Fragment grammars are more lenient than document grammars; for example, they allow multiple top-level elements to be encoded as siblings. For more information on these grammars, the reader is referred to Built-in XML Grammars.

The EXI format describes a mechanism by which built-in grammars are dynamically extended using information from the actual instance being encoded. Stated differently, the EXI format describes a learning mechanism to further improve efficiency when no schema information is available statically. Newly learned productions are assigned short event codes, improving compactness for every subsequent use of those productions. In addition, by adding new productions to the grammar, certain data associated with an event only needs to be encoded once. For example, if an element named notebook is matched by SE(*) and subsequently matched by SE(notebook), the actual localName "notebook" is only encoded once as part of the SE(*) event.

As pointed out in the previous section, EXI grammars are always regular and can, therefore, be accepted by finite automata (FA). To provide a more operational view of an EXI processor, we will opt for the use of FA to explain how grammars work. The following figure shows a stack of grammars in which the top-level grammar accepts "note" elements. State transitions in black correspond to the built-in element grammar; state transitions in red have been learned as a result of encoding the element before.


Built-In Grammar for SE(note)

Figure 2-2. Built-In Grammar for SE(note)


The built-in element automaton has two distinguished states: StartTag and Element. The former accepts attribute and namespace events that must occur before any element content; the latter accepts only element content which excludes attribute and namespace events. This separation enables the use of short codes which improves compactness and processing time.

As stated earlier, transitions in red are extensions to the built-in element grammar based on knowledge acquired about the element note. Notice how AT(date), AT(category) and SE(subject) have been added out of the StartTag state while SE(body) has been added out of the Element state. In particular, this suggests that SE(subject) is expected to occur before SE(body), and that both of these SE events are expected to occur after any AT event. In addition, notice that both AT(*) and SE(*) are still available to enable future learning.

2.1.3.2 Schema-informed Grammar

EXI grammars can be further improved if schema information is known statically. Schema information can be interpreted in two different ways or encoding modes: strict and non-strict. In strict mode, the instances being encoded must be largely valid with respect to the schema; most deviations from the schema will result in an encoding error. In non-strict mode, any deviations are accepted and encoded using more generic events. Examples of deviations are attributes whose actual values do not match the type defined in the schema or elements whose structure does not correspond to that in the schema. Given that strict grammars have fewer productions (no need for SE(*) or AT(*) in most cases) shorter event codes can be used to encode each option.

Instead of being dynamically extensible as the built-in grammars, schema-informed grammars are created statically based on the information in the available schema. This process will add productions of the form AT(qname) or SE(qname) guided by the attribute and element declarations in the schema. Let us continue the example from the previous section by assuming the following schema is available statically.

Example 2-2. Notebook (XML Schema)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="notebook">
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:element name="note" type="Note"/>
      </xs:sequence>
      <xs:attribute ref="date" />
    </xs:complexType>
  </xs:element>
  <xs:complexType name="Note">
    <xs:sequence>
      <xs:element name="subject" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
    <xs:attribute ref="date" use="required" />
    <xs:attribute name="category" type="xs:string"/>
  </xs:complexType>
  <xs:attribute name="date" type="xs:date"  />
</xs:schema>

The schema for the element note states that it has a mandatory attribute date and an optional attribute category, and that its structure is composed of an element subject followed by an element body. An automaton that corresponds to the strict grammar for this element is shown next.


Strict Schema-Informed Grammar for SE(note)

Figure 2-3. Strict Schema-Informed Grammar for SE(note)


Note that AT(category) is accepted before AT(date) even though their order is reversed in the schema. This is because attributes in schema-informed grammars must be sorted lexicographically, first by local name and then by namespace URI. Attribute sorting reduces the number of options which, in turn, greatly simplifies grammar creation and improves compactness. Since this automaton does not include transitions on AT(*) or SE(*), any deviations from the schema will result in an encoding error.

Generally speaking, schema-informed grammars should be favored over built-in grammars. Schema knowledge characterizes constraints of the structure and content type of XML information items. Hence schema-informed grammars do not evolve while processing and value items such as characters and attribute values are encoded according to their types. As a consequence, processing speed increases and compaction is improved.

2.2 Content Representation

2.2.1 Built-in EXI Datatype Representations

EXI uses built-in datatype representations to represent so called content value items in an efficient manner. In other words, all attribute and character values are encoded according to their EXI datatype representation associated with their XML Schema datatype. Type information can be retrieved from available schema information. The following table lists the mapping between [XML Schema Datatypes] and the Built-in Datatype Representations supported in EXI.

Table 2-7. Built-in EXI Datatypes
Built-in EXI Datatype RepresentationXML Schema Datatypes
Binary base64Binary, hexBinary
Boolean boolean
Date-Time dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth
Decimal decimal
Float float, double
n-bit Unsigned Integerinteger with bounded range 4096 or smaller as determined by the values of minInclusive, minExclusive, maxInclusive and maxExclusive facets.
Unsigned Integer nonNegativeInteger or integer with either minInclusive facet specified with a value equal to or greater than 0, or minExclusive facet specified with a value equal to or greater than -1.
Integer any other integer that is not already covered by n-bit Unsigned Integer or Unsigned Integer
String string, anySimpleType and all types directly derived by union
ListAll types directly derived by list, including NMTOKENS, IDREFS and ENTITIES
QNamexsi:type attribute values when Preserve.lexicalValues option value is false

Note:

The built-in EXI datatype QName is used for structure coding, such as qualified names for XML elements and attributes.

Editorial note: DP 
Does this note still make sense?

Enumerated values are efficiently encoded as n-bit Unsigned Integers where n = ⌈ log 2 m ⌉ and m is the number of items in the enumerated type. The ordinal position (starting with position zero) of each value item in the enumeration in schema-order is used as identifier. Exceptions are for schema union datatypes, list datatypes, as well as QName or Notation and types derived therefrom by restriction. The values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.

The interested reader is referred to the Efficient XML Interchange (EXI) Format 1.0 document which describes in details the encoding rules for representing built-in EXI datatypes. When the preserve.lexicalValues option is true, all values are represented as Strings. Some values that would have otherwise been designated to certain built-in EXI datatype representations are represented as Strings with restricted character sets. In the absence of external type information (no available schema information) all attribute and character values are typed as Strings.

2.2.2 String Table

String tables are used in memory-sensitive areas allowing a compact representation of repeated string values. Re-occurring string values are represented using an associated compact identifier rather than encoding the string literally again. When a string value is found in the string table (i.e. a string table hit) the value is encoded using a compact identifier. Only if a string value is not found in the associated table (i.e. a string table miss) is the string encoded as String and a new compact identifier is introduced.

EXI puts the following four information items into string tables and partitions the string tables based on the context in which they occur.

  • uri

  • prefix

  • local-name

  • value

The table below shows EXI information items used in section 2.1.2 EXI Body and its relations to the string table partitions (e.g. a prefix information item is assigned to the Prefix partition while value information items are assigned to the Value partition).

Table 2-8. String Table Partition Assignment
Information itemUsed in EXI EventString Table Partition
local-name SE (uri:*), AT (uri:*) LocalName
prefix NS, SE, AT Prefix
qname SE ( * ), AT ( * ) URI (uri portion of qname)
LocalName (local-name portion of qname)
uri NS URI
value CH, AT Value

Each EXI string table partition is optimized for more frequent use of either compact identifiers or string literals depending on the purpose of the partition. The URI and the Prefix partitions are optimized for frequent use of compact identifiers while the LocalName and the Value partitions are optimized for frequent use of string literals.

In the subsequent paragraphs, more details about the different partitions are given by making use of the previously introduced Notebook example. The Notebook XML Document example is repeated here to simplify algorithm illustrations.

<?xml version="1.0" encoding="UTF-8"?> 
<notebook date="2007-09-12">
 <note category="EXI" date="2007-07-23">
  <subject>EXI</subject>
  <body>Do not forget it!</body>
 </note>
 <note date="2007-09-12">
  <subject>Shopping List</subject>
  <body>milk, honey</body>
 </note>
</notebook>

The uri portion of qname content items and uri content items are assigned to the URI partition. The partition is initially pre-populated with three likely entries (see figure below). When a schema is provided, there is an additional entry that is appended and the URI partition is also pre-populated with the name of each target namespace declared in the schema, plus namespace URIs allowed in wildcard terms and attribute wildcards.


Initial Entries in URI Partition

Figure 2-4. Initial Entries in URI Partition


The local-name portion of qname content items and local-name content items are assigned to LocalName partitions. Respectively, prefix content items are assigned to Prefix partitions. Both partition types are initially pre-populated with likely entries (see figure below). These types of partitions are further differentiated according to the associated namespace URI. In our notebook example no prefixes are used and the default namespace URI ("" [empty-string]) is used. When a schema is provided, LocalName partitions are also pre-populated with the local-name of each attribute, element and type declared in the schema, sorted lexicographically. Further local-name entries are appended in the order they appear in the actual XML instance (no additional sorting is applied).


Initial Entries in Prefix and LocalName Partitions

Figure 2-5. Initial Entries in Prefix and LocalName Partitions


The figure above shows in highlighted form uri and local-name items used throughout the entire example document. For instance, the notebook sample assigns 7 local-name entries, such as notebook and date, to the empty URI namespace. Whenever local-name and/or uri information items occur again, the compact identifier is used instead.

The last string table partition type is the Value partition. A Value partition is initially empty and grows while processing an XML instance. The total number of value items or the maximum string length of value content items can be restricted to save memory on small devices (see valuePartitionCapacity and valueMaxLength EXI Options). Attribute and Character content-values of type String are assigned to this partition.


Final Entries in Value Partition

Figure 2-6. Final Entries in Value Partition


The figure above illustrates that value content items can be indexed from two different partitions, a 'global' partition and a 'local' partition. When a string value is neither found in the global nor in the local value section its string literal is encoded as String and the string value is added to both, the associated local and the global string index.

In our example we assume that all value items are represented as String, as it is the case in schema-less grammars or if the fidelity option Preserve.lexicalValues is true. When a string value is found in the local value section for a given element or attribute, EXI can use the corresponding compact identifier to encode the re-appearance more efficiently. The value "2007-09-12" appears twice in the date context. The second occurrence results in a local value hit and is respectively encoded as a 1 bit compact identifier.

On the other side, if a string value is not found in the local section, but is found in the global section, the corresponding global compact identifier is used. The value "EXI" appears two times in the notebook example, respectively in the context of category and subject. Hence, the second appearance results in a global value hit encoded as a 3 bit compact identifier since there are 6 entries in the global section (3 = ⌈ log 2 6 ⌉). Due to the different table sizes, a global compact identifier is generally less compact than a local compact identifier. Still, global value hits avoid the repeated encoding of string literals.

The number of bits needed to encode a compact identifier depends on the actual number of entries of the associated table at that time. Since all tables are growing while parsing an XML instance, the number of bits is not fixed. The figure above illustrates the situation after coding the entire XML instance. This growth effect applies to all string table partitions and makes the format very compact for small documents.

Note:

This section describes EXI String Tables at a conceptual level. The exact bit representation of table misses and indices is not presented in this document, but is described in full in the Efficient XML Interchange (EXI) Format 1.0 document.

2.3 Compression

EXI can use additional computational resources to achieve higher compaction. EXI compression leverages knowledge of XML to achieve higher compression efficiency than generic compression of an EXI stream. It multiplexes an EXI stream of heterogeneous data elements into channels of more homogeneous data elements that compress better together.

The mechanism used to combine homogeneous data is simple and flexible enough so that it can be used in both, schema-informed and schema-less EXI streams. Element and attribute values are grouped according to their qnames. Structure information such as Event Codes is also combined. To keep compression overhead at a minimum, smaller qname channels are combined while larger channels are compressed separately.

The figure below depicts a bit-packed EXI Body Stream where no EXI compression is in use. Grey buckets represent structure information and colored buckets are used for content information. The color is determined by the associated qname (e.g. date, category, subject, body).


EXI Body Stream

Figure 2-7. EXI Body Stream


XML instances can thus be treated as a combination of structure and content information. The content information can be further divided in different sections according to the context (surrounding structure as indicated by a qname). EXI treats XML instances this way and uses these implied partitions, referred to as channels, to provide blocked input to a standard compression algorithm. This grouping of similar data increases compression efficiency.


EXI Compression

Figure 2-8. EXI Compression


Note:

An pre-compression phase creates a byte-aligned representation of event codes and content items that is more amenable to compression algorithms compared to unaligned representations. Most compression algorithms operate on a series of bytes to identify redundancies in the octets.

By combining smaller channels into the same compressed stream while others are compressed separately, EXI keeps the compression overhead at a minimum. The mechanism to determine whether channels are combined or compressed separately is guided by the number of value content items present in the EXI stream. For small documents (≤ 100 value content items) EXI uses a single compressed stream while larger documents (> 100 value content items) result in several independent compressed streams. The notebook example falls in the first category and is encoded as a single compressed deflate stream containing first the structure channel, followed by the qname channels in the order they appear in the document (date, category, subject, body). The reader is referred to the Efficient XML Interchange (EXI) Format 1.0 document for further details.

The additional task of applying EXI compression is legitimated by the fact that, in many use-cases, encoded files become over 100 times smaller than XML and up to 14 times more compact than gzipped XML. In addition, the use of computational resources to achieve higher compaction is in most cases less than that of conventional compression such as GZip (see [EXI Evaluation Note]).

3. Efficient XML Interchange by Example

This section walks through the EXI coding of the Notebook Example, explaining the concepts previously introduced in a step-by-step approach.

3.1 Notation

The table below shows the notation that is used in the description of EXI encoding and decoding in subsequent sections.

Table 3-1. Example Notation
NotationDescription
N n An unsigned integer N is represented as n-bit Unsigned Integer.
N uint An unsigned integer N is represented as Unsigned Integer.
"Literal String"The string shown between double quotation marks is represented as String (length prefixed sequence of characters) with the length part of the string being omitted.
"Literal String" +n The string shown between double quotation marks is represented as String (length prefixed sequence of characters) with the length part being the length of the string incremented by n.

3.2 Options

We do not make use of specific encoding options such as using compression or user-defined datatype representations to encode and decode the body. The table below shows the fidelity options used throughout the presented example.

Table 3-2. Fidelity options used in this document
Fidelity optionValueEffect
Preserve.commentsfalseProductions of CM events are pruned from grammars
Preserve.pisfalseProductions of PI events are pruned from grammars
Preserve.dtdfalseProductions of DOCTYPE and ER events are pruned from grammars
Preserve.prefixesfalseNS events are pruned from grammars and namespace prefixes are not preserved
Preserve.lexicalValuesfalseLexical form of element and attribute values is not preserved

The fidelity options setting shown above prunes those productions in an EXI grammar that contain CM (Comment), PI (Processing Instruction), DT (DocType) or ER (Entity Reference) events. This is so as to make the grammar presentation as concise as possible, yet preserving the variety of event types that are actually used in the example XML document.

In the example encodings, whitespaces that are originally present in the example XML document are omitted, which is intentional. EXI format is capable of representing those whitespaces in an efficient manner. They are omitted in the example encodings to make the encoding description more articulate by focusing on the primary data and structure without being distracted by the frequent occurrence of indentation whitespaces.

3.3 Encoding Example

This section describes the encoding of an EXI Body. The sample XML document is transcoded on the one hand into an EXI document in the absence of any schemas using built-in grammars and on the other hand using schema-informed grammars according to provided schema information (see Notebook XML Schema).

XML information set item
   EXI       without a schema    
   EXI       with schema information    

While coding an EXI Body stream a stack of EXI grammars is in use. A grammar consists of an ordered list of productions. Each production (e.g. SD  DocContent  0) is made up of an EXI Event, a following grammar (except for End Element), and an associated Event Code.

The initial stack item is a Document or a Fragment Grammar, depending on whether we deal with an XML document or with an XML fragment, respectively. The grammar at the top of the stack represents the context and constitutes the possible productions or events of this context. In addition, Start Element (SE) events push a new grammar onto the top of the stack while End Element (EE) events pop the current grammar.

Each XML information set item is split up into the according event, grammar, and encoding part accompanied with notes.

Table 3-3. Encoding Example Illustration
EventGrammarEXI EncodingNote
EventCodeContent
<?xml version="1.0" encoding="UTF-8"?>
SD
Document
SDDocContent0
00  

The Document grammar is the starting point for any EXI document. Given that there is only one possible production SD, the associated Event Code 0 is represented in zero bits (i.e., omitted).

  1. The current Document grammar moves on to the DocContent grammar.
SD
Document
SDDocContent0
00  
<notebook>
SE
DocContent
SE(*)DocEnd0
00 12 "notebook"+1
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. "notebook"+1 (local-name miss)

    Local-Names (default)
    0"notebook"
  3. DocContent moves on to DocEnd and StartTagNotebook is pushed on grammar-stack

SE
DocContent
SE(notebook)DocEnd0
[Undeclared]1
01  
  1. DocContent moves on to DocEnd and StartTagNotebook1 is pushed on grammar-stack

date = "2007-09-12"
AT
StartTagNotebook
EE0.0
AT(*)StartTagNotebook0.1
SE(*)ElementNotebook0.2
CHElementNotebook0.3
00 12 12 "date"+1 "2007-09-12"+2
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. "date"+1 (local-name miss)

    Local-Names (default)
    0"notebook"
    1"date"
  3. "2007-09-12"+2 (value miss)

    Values (date)
    0"2007-09-12"
  4. StartTagNotebook extended by leading AT(date)

AT
StartTagNotebook1
AT(date)StartTagNotebook20
SE(note)ElementNotebook11
[Undeclared]2
02 01 7uint 3009 01
  1. "2007-09-12" as Date-Time (date)

    YearOffset from 2000 as Integer01 7uint
    MonthDay(Month * 32 + Day) as 9-bit Unsigned Integer3009
    presenceBoolean indicator01
    [TimeZone]
  2. StartTagNotebook1 moves on to StartTagNotebook2

<note>
SE
StartTagNotebook
AT(date)StartTagNotebook0
EE1.0
AT(*)StartTagNotebook1.1
SE(*)ElementNotebook1.2
CHElementNotebook1.3
11 22 12 "note"+1
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. "note"+1 (local-name miss)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
  3. StartTagNotebook extended by leading SE(note)

  4. StartTagNotebook moves on to ElementNotebook and StartTagNote is pushed on grammar-stack

SE
StartTagNotebook2
SE(note)ElementNotebook10
[Undeclared]1
01  
  1. StartTagNotebook2 moves on to ElementNotebook1 and StartTagNote1 is pushed on grammar-stack
category="EXI"
AT
StartTagNote
EE0.0
AT(*)StartTagNote0.1
SE(*)ElementNote0.2
CHElementNote0.3
00 12 12 "category"+1 "EXI"+2
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. "category"+1 (local-name miss)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
  3. "EXI"+2 (value miss)

    Values (category)
    0"EXI"
  4. StartTagNote extended by leading AT(category)

AT
StartTagNote1
AT(category)StartTagNote20
AT(date)StartTagNote31
[Undeclared]2
02 "EXI"+2
  1. "EXI"+2 (value miss)

    Values (category)
    0"EXI"
  2. StartTagNote1 moves on to StartTagNote2
date="2007-07-23"
AT
StartTagNote
AT(category)StartTagNote0
EE1.0
AT(*)StartTagNote1.1
SE(*)ElementNote1.2
CHElementNote1.3
11 12 12 0uint 12 "2007-07-23"+2
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. 0uint 12 (local-name hit)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
  3. "2007-07-23"+2 (value miss)

    Values (date)
    0"2007-09-12"
    1"2007-07-23"
  4. StartTagNote extended by leading AT(date)

AT
StartTagNote2
AT(date)StartTagNote30
[Undeclared]1
01 01 7uint 2479 01
  1. "2007-07-23" as Date-Time (date)

    YearOffset from 2000 as Integer01 7uint
    MonthDay(Month * 32 + Day) as 9-bit Unsigned Integer2479
    presenceBoolean indicator01
    [TimeZone]
  2. StartTagNote2 moves on to StartTagNote3
<subject>
SE
StartTagNote
AT(date)StartTagNote0
AT(category)StartTagNote1
EE2.0
AT(*)StartTagNote2.1
SE(*)ElementNote2.2
CHElementNote2.3
22 22 12 "subject"+1
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. "subject"+1 (local-name miss)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
    4"subject"
  3. StartTagNote extended by leading SE(subject)

  4. StartTagNote moves on to ElementNote and StartTagSubject is pushed on grammar-stack

SE
StartTagNote3
SE(subject)ElementNote10
[Undeclared]1
01  
  1. StartTagNote3 moves on to ElementNote1 and StartTagSubject1 is pushed on grammar-stack
EXI
CH
StartTagSubject
EE0.0
AT(*)StartTagSubject0.1
SE(*)ElementSubject0.2
CHElementSubject0.3
00 32 1uint 12
  1. 1uint 12 (global-value hit)

    Global-Values
    0"2007-09-12"
    1"EXI"
    2"2007-07-23"
  2. StartTagSubject extended by leading CH

  3. StartTagSubject moves on to ElementSubject

CH
StartTagSubject1
CHElementSubject10
[Undeclared]1
01 1uint 00
  1. 1uint 00 (global-value hit)

    Global-Values
    0"EXI"
  2. StartTagSubject1 moves on to ElementSubject1

</subject>
EE
ElementSubject
EE0
SE(*)ElementSubject1.0
CHElementSubject1.1
01  
  1. ElementSubject popped from grammar-stack

EE
ElementSubject1
EE0
[Undeclared]1
01  
  1. ElementSubject1 popped from grammar-stack

<body>
SE
ElementNote
EE0
SE(*)ElementNote1.0
CHElementNote1.1
11 01 12 "body"+1
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. "body"+1 (local-name miss)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
    4"subject"
    5"body"
  3. ElementNote extended by leading SE(body)

  4. StartTagBody is pushed on grammar-stack

SE
ElementNote1
SE(body)ElementNote20
[Undeclared]1
01 
  1. ElementNote1 moves on to ElementNote2 and StartTagBody1 is pushed on grammar-stack

Do not forget it!
CH
StartTagBody
EE0.0
AT(*)StartTagBody0.1
SE(*)ElementBody0.2
CHElementBody0.3
00 32 "Do not forget it!"+2
  1. "Do not forget it!"+2 (value miss)

    Values (body)
    0"Do not forget it!"
  2. StartTagBody extended by leading CH

  3. StartTagBody moves on to ElementBody

CH
StartTagBody1
CHElementBody10
[Undeclared]1
01 "Do not forget it!"+2
  1. "Do not forget it!"+2 (value miss)

    Values (body)
    0"Do not forget it!"
  2. StartTagBody1 moves on to ElementBody1

</body>
EE
ElementBody
EE0
SE(*)ElementBody1.0
CHElementBody1.1
01  
  1. ElementBody popped from grammar-stack

EE
ElementBody1
EE0
[Undeclared]1
01  
  1. ElementBody1 popped from grammar-stack

</note>
EE
ElementNote
SE(body)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
12  
  1. ElementNote popped from grammar-stack

EE
ElementNote2
EE0
[Undeclared]1
01  
  1. ElementNote2 popped from grammar-stack

<note>
SE
ElementNotebook
EE0
SE(*)ElementNotebook1.0
CHElementNotebook1.1
11 01 12 0uint 23
  1. 12 (uri hit)

    URIs
    0[uri miss]
    1"" [empty string]
    2".../XML/1998/namespace"
    3".../XMLSchema-instance"
  2. 0unit 23 (local-name hit)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
    4"subject"
    5"body"
  3. ElementNotebook extended by leading SE(note)

  4. StartTagNote is pushed on grammar-stack

SE
ElementNotebook1
SE(note)ElementNotebook10
EE1
[Undeclared]2
02 
  1. StartTagNote1 is pushed on grammar-stack

date="2007-09-12"
AT
StartTagNote
SE(subject)ElementNote0
AT(date)StartTagNote1
AT(category)StartTagNote2
EE3.0
AT(*)StartTagNote3.1
SE(*)ElementNote3.2
CHElementNote3.3
12 0uint 01
  1. 0uint 01 (value hit)

    Values (date)
    0"2007-09-12"
    1"2007-07-23"
AT
StartTagNote1
AT(category)StartTagNote20
AT(date)StartTagNote31
[Undeclared]2
12 01 7uint 3009 01
  1. "2007-09-12" as Date-Time (date)

    YearOffset from 2000 as Integer01 7uint
    MonthDay(Month * 32 + Day) as 9-bit Unsigned Integer3009
    presenceBoolean indicator01
    [TimeZone]
  2. StartTagNote1 moves on to StartTagNote3
<subject>
SE
StartTagNote
SE(subject)ElementNote0
AT(date)StartTagNote1
AT(category)StartTagNote2
EE3.0
AT(*)StartTagNote3.1
SE(*)ElementNote3.2
CHElementNote3.3
02  
  1. StartTagNote moves on to ElementNote and StartTagSubject is pushed on grammar-stack

SE
StartTagNote3
SE(subject)ElementNote10
[Undeclared]1
01  
  1. StartTagNote3 moves on to ElementNote1 and StartTagSubject1 is pushed on grammar-stack
Shopping List
CH
StartTagSubject
CHElementSubject0
EE1.0
AT(*)StartTagSubject1.1
SE(*)ElementSubject1.2
CHElementSubject1.3
01 "Shopping List"+2
  1. "Shopping List"+2 (value miss)

    Values (subject)
    0"Shopping List"
  2. StartTagSubject moves on to ElementSubject

CH
StartTagSubject1
CHElementSubject10
[Undeclared]1
01 "Shopping List"+2
  1. "Shopping List"+2 (value miss)

    Values (subject)
    0"Shopping List"
  2. StartTagSubject1 moves on to ElementSubject1

</subject>
EE
ElementSubject
EE0
SE(*)ElementSubject1.0
CHElementSubject1.1
01  
  1. ElementSubject popped from grammar-stack

EE
ElementSubject1
EE0
[Undeclared]1
01  
  1. ElementSubject1 popped from grammar-stack

<body>
SE
ElementNote
SE(body)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
02  
  1. StartTagBody is pushed on grammar-stack
SE
ElementNote1
SE(body)ElementNote20
[Undeclared]1
01 
  1. ElementNote1 moves on to ElementNote2 and StartTagBody1 is pushed on grammar-stack

milk, honey
CH
StartTagBody
CHElementBody0
EE1.0
AT(*)StartTagBody1.1
SE(*)ElementBody1.2
CHElementBody1.3
01 "milk, honey"+2
  1. "milk, honey"+2 (value miss)

    Values (body)
    0"Do not forget it!"
    1"milk, honey"
  2. StartTagBody moves on to ElementBody

CH
StartTagBody1
CHElementBody10
[Undeclared]1
01 "milk, honey"+2
  1. "milk, honey"+2 (value miss)

    Values (body)
    0"Do not forget it!"
    1"milk, honey"
  2. StartTagBody1 moves on to ElementBody1

</body>
EE
ElementBody
EE0
SE(*)ElementBody1.0
CHElementBody1.1
01  
  1. ElementBody popped from grammar-stack

EE
ElementBody1
EE0
[Undeclared]1
01  
  1. ElementBody1 popped from grammar-stack

</note>
EE
ElementNote
SE(body)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
12  
  1. ElementNote popped from grammar-stack

EE
ElementNote2
EE0
[Undeclared]1
01  
  1. ElementNote2 popped from grammar-stack

</notebook>
EE
ElementNotebook
SE(note)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
12  
  1. ElementNotebook popped from grammar-stack
EE
ElementNotebook1
SE(note)ElementNotebook10
EE1
[Undeclared]2
12 
  1. ElementNotebook1 is popped from grammar-stack

EOF
ED
DocEnd
ED0
00   
ED
DocEnd
ED0
00   

Please find below the resulting byte sequences of the step-by-step encoding descriptions:

EXI body streams
without a schema 42 5B 9B DD 19 58 9B DB DA D4 15 91 85 D1 94 30 C8 C0 C0 DC B4 C0 E4 B4 C4 CB 20 AD CD EE 8C AA 00 86 19 18 18 1B 96 98 1B 96 99 19 D4 25 8D 85 D1 95 9D BD C9 E4 15 15 61 26 90 87 37 56 26 A6 56 37 4C 06 48 2B 13 7B 23 CE 26 88 DE 40 DC DE E8 40 CC DE E4 CE CA E8 40 D2 E8 42 64 01 40 00 1E E6 D0 DE E0 E0 D2 DC CE 40 D8 D2 E6 E8 01 AD AD 2D 8D 65 84 0D 0D ED CC AF 25
with schema information 00 79 60 02 A2 AC 24 80 EF 70 02 04 D1 1B C8 1B 9B DD 08 19 9B DC 99 D9 5D 08 1A 5D 08 41 03 CB 00 7B 9B 43 7B 83 83 4B 73 39 03 63 4B 9B A0 0D 6D 69 6C 6B 2C 20 68 6F 6E 65 79 10

3.4 Decoding Example

This section describes the process of decoding an EXI Body stream using an example. The EXI body streams produced by the encoding example in the previous section are parsed into EXI events. As was the case with the encoding example, the process is shown using two separate modes in parallel. One mode uses no schema information (i.e. schema-less), and the other mode uses an appropriate schema (Notebook XML Schema).

EXI Grammar
   Tinged with light gray       without a schema (i.e. schema-less)    
   Tinged with dark gray       with schema information    

Given an initial grammar (either a Document Grammar or a Fragment Grammar as determined by assumed or specified EXI options), the process of decoding an EXI Body Stream simply repeats the following steps.

  1. Decode Event Code (according to grammar-rule context)

  2. Decode event content (if present, see EXI Event Types)

  3. Move forward in grammar according to current grammar rules

  4. Return to step 1 if last event was not EndDocument (ED)

  5. [Done]

Table 3-4. Decoding Example Illustration
GrammarEXI DecodingNote
EventCodeEventContent
Document
SDDocContent0
00 SD 
  • The Document grammar is the starting point for any EXI document. Given that there is only one possible production SD, the associated Event Code 0 is represented in zero bits (i.e., omitted).
  • The current Document grammar moves on to the DocContent grammar.
Document
SDDocContent0
00 SD 
<?xml version="1.0" encoding="UTF-8"?>
DocContent
SE(*)DocEnd0
00 SE 12 "notebook"+1

Decode content as qname.

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. "notebook"+1 (literal local-name)

    Add it to local-name partition for uri "" (empty string).

    Local-Names (default)
    0"notebook"

Then,

  • DocContent moves on to DocEnd and StartTagNotebook is pushed on grammar-stack
DocContent
SE(notebook)DocEnd0
[Undeclared]1
01 SE 
  • DocContent moves on to DocEnd and StartTagNotebook1 is pushed on grammar-stack
<notebook>
StartTagNotebook
EE0.0
AT(*)StartTagNotebook0.1
SE(*)ElementNotebook0.2
CHElementNotebook0.3
00 12 AT 12 "date"+1 "2007-09-12"+2

Decode content as qname and value (String)

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. "date"+1 (literal local-name)

    Add it to local-name partition for uri "" (empty string).

    Local-Names (default)
    0"notebook"
    1"date"
  3. "2007-09-12"+2 (literal string value)

    Add it to local value partition for qname date.

    Values (date)
    0"2007-09-12"

Then,

  • StartTagNotebook extended by leading AT(date)
StartTagNotebook1
AT(date)StartTagNotebook20
SE(note)ElementNotebook11
[Undeclared]2
02 AT01 7uint 3009 01

Decode content as value of Date-Time (date):

  1. 01 7uint (Year offset from 2000 as Integer)
  2. 3009 (Month*32+Day as 9-bit Unsigned Integer)
  3. 01 (TimeZone presence indicator as Boolean)

to reconstitute "2007-09-12".

Then,

  • StartTagNotebook1 moves on to StartTagNotebook2
date = "2007-09-12"
StartTagNotebook
AT(date)StartTagNotebook0
EE1.0
AT(*)StartTagNotebook1.1
SE(*)ElementNotebook1.2
CHElementNotebook1.3
11 22 SE 12 "note"+1

Decode content as qname.

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. "note"+1 (literal local-name)

    Add it to local-name partition for uri "" (empty string).

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"

Then,

  • StartTagNotebook extended by leading SE(note)

  • StartTagNotebook moves on to ElementNotebook and StartTagNote is pushed on grammar-stack

StartTagNotebook2
SE(note)ElementNotebook10
[Undeclared]1
01 SE 
  • StartTagNotebook2 moves on to ElementNotebook1 and StartTagNote1 is pushed on grammar-stack
<note>
StartTagNote
EE0.0
AT(*)StartTagNote0.1
SE(*)ElementNote0.2
CHElementNote0.3
00 12 AT 12 "category"+1 "EXI"+2

Decode content as qname and value (String).

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. "category"+1 (literal local-name)

    Add it to local-name partition for uri "" (empty string).

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
  3. "EXI"+2 (literal string value)

    Add it to local value partition for qname category.

    Values (category)
    0"EXI"

Then,

  • StartTagNote extended by leading AT(category)

StartTagNote1
AT(category)StartTagNote20
AT(date)StartTagNote31
[Undeclared]2
02 AT"EXI"+2

Decode content as value (String).

  1. "EXI"+2 (literal string value)

    Add it to local value partition for qname category.

    Values (category)
    0"EXI"

Then,

  • StartTagNote1 moves on to StartTagNote2
category="EXI"
StartTagNote
AT(category)StartTagNote0
EE1.0
AT(*)StartTagNote1.1
SE(*)ElementNote1.2
CHElementNote1.3
11 12 AT 12 0uint 12 "2007-07-23"+2

Decode content as qname and value (String).

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. 0uint 12 (local-name id)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
  3. "2007-07-23"+2 (literal string value)

    Add it to local value partition for qname date.

    Values (date)
    0"2007-09-12"
    1"2007-07-23"

Then,

  • StartTagNote extended by leading AT(date)

StartTagNote2
AT(date)StartTagNote30
[Undeclared]1
01 AT01 7uint 2479 01

Decode content as value of Date-Time (date):

  1. 01 7uint (Year offset from 2000 as Integer)
  2. 2479 (Month*32+Day as 9-bit Unsigned Integer)
  3. 01 (TimeZone presence indicator as Boolean)

to reconstitute "2007-07-23".

Then,

  • StartTagNote2 moves on to StartTagNote3
date="2007-07-23"
StartTagNote
AT(date)StartTagNote0
AT(category)StartTagNote1
EE2.0
AT(*)StartTagNote2.1
SE(*)ElementNote2.2
CHElementNote2.3
22 22 SE 12 "subject"+1

Decode content as qname.

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. "subject"+1 (literal local-name)

    Add it to local-name partition for uri "" (empty string).

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
    4"subject"

Then,

  • StartTagNote extended by leading SE(subject)

  • StartTagNote moves on to ElementNote and StartTagSubject is pushed on grammar-stack

StartTagNote3
SE(subject)ElementNote10
[Undeclared]1
01 SE 
  • StartTagNote3 moves on to ElementNote1 and StartTagSubject1 is pushed on grammar-stack
<subject>
StartTagSubject
EE0.0
AT(*)StartTagSubject0.1
SE(*)ElementSubject0.2
CHElementSubject0.3
00 32 CH 1uint 12

Decode content as value (String).

  1. 1uint 12 (value id in global value partition)

    Global-Values
    0"2007-09-12"
    1"EXI"
    2"2007-07-23"

Then,

  • StartTagSubject extended by leading CH

  • StartTagSubject moves on to ElementSubject

StartTagSubject1
CHElementSubject10
[Undeclared]1
01CH 1uint 00

Decode content as value (String).

  1. 1uint 00 (value id in global value partition)

    Global-Values
    0"EXI"

Then,

  • StartTagSubject1 moves on to ElementSubject1

EXI
ElementSubject
EE0
SE(*)ElementSubject1.0
CHElementSubject1.1
01 EE  
  • ElementSubject popped from grammar-stack

ElementSubject1
EE0
[Undeclared]1
01 EE  
  • ElementSubject1 popped from grammar-stack

</subject>
ElementNote
EE0
SE(*)ElementNote1.0
CHElementNote1.1
11 01 SE 12 "body"+1

Decode content as qname.

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. "body"+1 (literal local-name)

    Add it to local-name partition for uri "" (empty string).

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
    4"subject"
    5"body"

Then,

  • ElementNote extended by leading SE(body)

  • StartTagBody is pushed on grammar-stack

ElementNote1
SE(body)ElementNote20
[Undeclared]1
01SE 
  • ElementNote1 moves on to ElementNote2 and StartTagBody1 is pushed on grammar-stack

<body>
StartTagBody
EE0.0
AT(*)StartTagBody0.1
SE(*)ElementBody0.2
CHElementBody0.3
00 32 CH "Do not forget it!"+2

Decode content as value (String).

  1. "Do not forget it!"+2 (literal string value)

    Add it to local value partition for qname body.

    Values (body)
    0"Do not forget it!"

Then,

  • StartTagBody extended by leading CH

  • StartTagBody moves on to ElementBody

StartTagBody1
CHElementBody10
[Undeclared]1
01CH "Do not forget it!"+2

Decode content as value (String).

  1. "Do not forget it!"+2 (literal string value)

    Add it to local value partition for qname body.

    Values (body)
    0"Do not forget it!"

Then,

  • StartTagBody1 moves on to ElementBody1

Do not forget it!
ElementBody
EE0
SE(*)ElementBody1.0
CHElementBody1.1
01 EE  
  • ElementBody popped from grammar-stack

ElementBody1
EE0
[Undeclared]1
01 EE  
  • ElementBody1 popped from grammar-stack

</body>
ElementNote
SE(body)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
12 EE  
  • ElementNote popped from grammar-stack

ElementNote2
EE0
[Undeclared]1
01 EE  
  • ElementNote2 popped from grammar-stack

</note>
ElementNotebook
EE0
SE(*)ElementNotebook1.0
CHElementNotebook1.1
11 01 SE 12 0uint 23

Decode content as qname.

  1. 12 makes uri id 0 (i.e. 12 - 1 = 0)

    URIs
    0"" [empty string]
    1".../XML/1998/namespace"
    2".../XMLSchema-instance"
  2. 0unit 23 (local-name id)

    Local-Names (default)
    0"notebook"
    1"date"
    2"note"
    3"category"
    4"subject"
    5"body"

Then,

  • ElementNotebook extended by leading SE(note)

  • StartTagNote is pushed on grammar-stack

ElementNotebook1
SE(note)ElementNotebook10
EE1
[Undeclared]2
02SE 
  • StartTagNote1 is pushed on grammar-stack

<note>
StartTagNote
SE(subject)ElementNote0
AT(date)StartTagNote1
AT(category)StartTagNote2
EE3.0
AT(*)StartTagNote3.1
SE(*)ElementNote3.2
CHElementNote3.3
12 AT 0uint 01

Decode content as value (String).

  1. 0uint 01 (value id in local value partition for qname date)

    Values (date)
    0"2007-09-12"
    1"2007-07-23"
StartTagNote1
AT(category)StartTagNote20
AT(date)StartTagNote31
[Undeclared]2
12 AT01 7uint 3009 01

Decode content as value of Date-Time (date):

  1. 01 7uint (Year offset from 2000 as Integer)
  2. 3009 (Month*32+Day as 9-bit Unsigned Integer)
  3. 01 (TimeZone presence indicator as Boolean)

to reconstitute "2007-09-12".

Then,

  • StartTagNote1 moves on to StartTagNote3
date="2007-09-12"
StartTagNote
SE(subject)ElementNote0
AT(date)StartTagNote1
AT(category)StartTagNote2
EE3.0
AT(*)StartTagNote3.1
SE(*)ElementNote3.2
CHElementNote3.3
02 SE 
  • StartTagNote moves on to ElementNote and StartTagSubject is pushed on grammar-stack

StartTagNote3
SE(subject)ElementNote10
[Undeclared]1
01 SE 
  • StartTagNote3 moves on to ElementNote1 and StartTagSubject1 is pushed on grammar-stack
<subject>
StartTagSubject
CHElementSubject0
EE1.0
AT(*)StartTagSubject1.1
SE(*)ElementSubject1.2
CHElementSubject1.3
01 CH "Shopping List"+2

Decode content as value (String).

  1. "Shopping List"+2 (literal string value)

    Add it to local value partition for qname subject.

    Values (subject)
    0"Shopping List"

Then,

  • StartTagSubject moves on to ElementSubject

StartTagSubject1
CHElementSubject10
[Undeclared]1
01CH "Shopping List"+2

Decode content as value (String).

  1. "Shopping List"+2 (literal string value)

    Add it to local value partition for qname subject.

    Values (subject)
    0"Shopping List"

Then,

  • StartTagSubject1 moves on to ElementSubject1

Shopping List
ElementSubject
EE0
SE(*)ElementSubject1.0
CHElementSubject1.1
01 EE  
  • ElementSubject popped from grammar-stack

ElementSubject1
EE0
[Undeclared]1
01 EE  
  • ElementSubject1 popped from grammar-stack

</subject>
ElementNote
SE(body)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
02 SE 
  • StartTagBody is pushed on grammar-stack
ElementNote1
SE(body)ElementNote20
[Undeclared]1
01SE 
  • ElementNote1 moves on to ElementNote2 and StartTagBody1 is pushed on grammar-stack

<body>
StartTagBody
CHElementBody0
EE1.0
AT(*)StartTagBody1.1
SE(*)ElementBody1.2
CHElementBody1.3
01 CH "milk, honey"+2

Decode content as value (String).

  1. "milk, honey"+2 (literal string value)

    Add it to local value partition for qname body.

    Values (body)
    0"Do not forget it!"
    1"milk, honey"

Then,

  • StartTagBody moves on to ElementBody

StartTagBody1
CHElementBody10
[Undeclared]1
01CH "milk, honey"+2

Decode content as value (String).

  1. "milk, honey"+2 (literal string value)

    Add it to local value partition for qname body.

    Values (body)
    0"Do not forget it!"
    1"milk, honey"

Then,

  • StartTagBody1 moves on to ElementBody1

milk, honey
ElementBody
EE0
SE(*)ElementBody1.0
CHElementBody1.1
01 EE  
  • ElementBody popped from grammar-stack

ElementBody1
EE0
[Undeclared]1
01 EE  
  • ElementBody1 popped from grammar-stack

</body>
ElementNote
SE(body)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
12 EE  
  • ElementNote popped from grammar-stack

ElementNote2
EE0
[Undeclared]1
01 EE  
  • ElementNote2 popped from grammar-stack

</note>
ElementNotebook
SE(note)ElementNote0
EE1
SE(*)ElementNote2.0
CHElementNote2.1
12 EE 
  • ElementNotebook popped from grammar-stack
ElementNotebook1
SE(note)ElementNotebook10
EE1
[Undeclared]2
12EE 
  • ElementNotebook1 is popped from grammar-stack

</notebook>
DocEnd
ED0
00 ED  
DocEnd
ED0
00 ED  
EOF

The resulting XML instance assembled from decoded EXI events is shown below.

Example 3-1. Decoded XML Document (Notebook)
<?xml version="1.0" encoding="UTF-8"?> 
<notebook date="2007-09-12">
 <note category="EXI" date="2007-07-23">
  <subject>EXI</subject>
  <body>Do not forget it!</body>
 </note>
 <note date="2007-09-12">
  <subject>Shopping List</subject>
  <body>milk, honey</body>
 </note>
</notebook>

A References

Efficient XML Interchange (EXI) Format 1.0 (Second Edition)
Efficient XML Interchange (EXI) Format 1.0 (Second Edition), John Schneider, Takuki Kamiya, Daniel Peintner, Rumen Kyusakov, Editors. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi/. (See http://www.w3.org/TR/2014/REC-exi-20140211/.)
EXI Evaluation Note
Efficient XML Interchange Evaluation, Carine Bournez, Editor. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-evaluation/. (See http://www.w3.org/TR/2009/WD-exi-evaluation-20090407/.)
XML Schema Datatypes
XML Schema Part 2: Datatypes Second Edition, P. Byron and A. Malhotra, Editors. World Wide Web Consortium, 2 May 2001, revised 28 October 2004. The latest version is available at http://www.w3.org/TR/xmlschema-2. (See http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/.)