Efficient XML Interchange (EXI) Primer

1. Introduction

The intended audience of this document includes users and developers with basic understanding of XML and XML Schema. This document provides an informal description of the EXI format; the reader is referred to the Efficient XML Interchange (EXI) Format 1.0 for further details. Hereinafter, the presentation assumes that the reader is familiar with the basic concepts of XML and the way XML Schema can be used to describe and enforce constraints on XML document families.

The document is comprised of two major parts. The first part describes the structure of an EXI document with and without compression. More specifically, it describes the concept of an EXI stream and how it is generated using EXI grammars, as well as the implications on structure and content ordering in an EXI stream when compression is enabled. As a practical application of the concepts from the first part, the second part presents a complete bit-level description of an EXI document.

2. Concepts

The development of the Efficient XML Interchange (EXI) format was guided by five design principles, namely, the format had to be general, minimal, efficient, flexible, and interoperable. The format satisfies these prerequisites, achieving generality, flexibility, and performance while at the same time keeping complexity in check.

Many of the concepts employed by the EXI format are applicable to the encoding of arbitrary languages that can be described by a grammar. Even though EXI utilizes schema information to improve compactness and processing efficiency, it does not depend on accurate, complete or current schemas to work.

2.1 EXI Streams

EXI streams are the basic structure of EXI documents. As shown below, an EXI stream consists of an EXI header followed by an EXI body.

Table 2-1. EXI Stream Structure
EXI Header	EXI Body

The EXI header conveys format version information and may also include the set of options that were used during encoding; if these options are omitted, then it is assumed that the decoder has access to them out of band. The EXI body comprises an event sequence describing the document (or document fragment) that is encoded. The following two sections describe the EXI header and EXI body in more detail.

2.1.1 EXI Header

The header communicates encoding properties that are needed to decode the EXI body. The default settings can be represented in a single byte. This keeps the overhead and complexity to a minimum and does not sacrifice compactness, especially for small documents where a header can introduce a large constant factor.

The structure of an EXI header is depicted in the following figure. Note that even though EXI is a bit aligned format, the header is padded to the next byte to support fast header interpretation.

Table 2-2. EXI Header Structure
Distinguishing Bits	Presence Bit for EXI Options	EXI Format Version	EXI Options	[Padding Bits]

The EXI header, and hence every EXI document, starts with a pair of Distinguishing Bits that can be used to recognize an EXI document from a textual XML document. The two bit-sequence (1 0) is sufficient to distinguish EXI streams from XML streams based on a broad range of character encodings.

Editorial note
The integration of a magic cookie is under consideration by the EXI WG. A magic cookie would allow distinguishing an EXI document from formats other than XML or from future character encodings.

The Presence Bit for EXI Options follows the distinguishing bits. The value of this single bit is used to indicate the presence or absence of the EXI Options that appear later in the header.

The EXI Format Version identifies the version of EXI in use and allows future improvements and modifications. A leading 0 (zero) bit indicates that the document is encoded according to the final version of the recommendation, while a leading 1 (one) indicates that it is a preview version. The differentiation is introduced to facilitate early releases of preview versions with less strict interoperability requirements. Only final versions are required to be processed by compliant processors. The leading bit is followed by one or more 4-bit sequences which are collectively interpreted as a format version number starting at 1. For example, the 4-bit sequence 0000 is interpreted as version 1 and the two 4-bit sequences 1111 0001 is interpreted as 15 + 2 or version 17.

The EXI Options specify how the body of an EXI stream is encoded and, as stated earlier, their presence is controlled by the present bit earlier in the header. The overhead introduced by the EXI options is comparatively small given that they are formally described using an XML schema and can therefore be encoded using EXI as well. The following table describes the EXI options that can be specified in the EXI header.

Table 2-3. EXI Options
EXI Option	Description
alignment	Alignment of event codes and content items
compression	Indicates if EXI compression is to be used for better compactness
fragment	Indicates if the body is to be encoded as an EXI fragment instead of an EXI document
preserve	A set of options that controls whether comments, processing instructions, etc. are preserved
schemaID	Identifies the schema used during encoding
codecMap	Identifies any pluggable CODECs used to encode the body
blockSize	Specifies the block size used for EXI compression
[user defined]	User defined headers may be added

Most of the options are straightforward and act as boolean values to enable or disable a feature. They are represented using optional XML elements which are also encoded using EXI. For more information on the XML schema that is used to encode these options, the reader is referred to XML Schema for EXI Options Header.

The preserve options shown in the table above is really a family of options that control what XML items are preserved and what XML items are ignored. These are collectively known as fidelity options. These options can be used to eliminate the associated overhead of communicating unused XML items. Certain XML items such as processing instructions or DTDs may never occur (like in SOAP) or are simply unimportant to the use case or application domain. Fidelity options are used to manage filters for certain XML items as shown in the following table.

Table 2-4. Fidelity Options
Fidelity Option	Effect
Preserve.comments	Productions of CM (Comment) events are preserved in grammars
Preserve.pis	Productions of PI (Processing Instruction) events are preserved in grammars
Preserve.dtd	Productions of DOCTYPE and ER (Entity Reference) events are preserved
Preserve.prefixes	NS (Namespace Declaration) events and namespace prefixes are preserved
Preserve.lexicalValues	Lexical form of element and attribute values are preserved

Naturally, XML items that are discarded at encoding time (due to a particular setting of the fidelity options) cannot be reconstructed at decoding time. The next section deals with the EXI Body and discusses in more detail the effects of enabling and disabling fidelity options.

2.1.2 EXI Body

The body of an EXI document is composed of a sequence of EXI events. The notion of an event in this context is similar to that in the StAX and SAX APIs. XML items are encoded into one or more EXI events; for example, an attribute named foo can be encoded as AT("foo") and an element named bar as the pair of events SE("bar") and EE. EXI events may have additional content associated with them. For example, the attribute event AT("foo") may have an attribute value foo1 associated with it. The following table shows all the possible event types together with their associated content.

Table 2-5. EXI Event types
EXI Event Type		Grammar Notation	Information Items
EXI Event Type		Grammar Notation	Structure	Content
(+) Fidelity Options can be used to prune events from the EXI stream to realize a more compact representation.
Start Document		SD
End Document		ED
Start Element		SE (qname)
Start Element		SE (*)	qname
End Element		EE
Attribute		AT ( qname )		value
Attribute		AT (*)	qname	value
Characters		CH		value
Namespace Declaration	(+)	NS	prefix, uri, indicator
Comment		CM	text
Processing Instruction		PI	name, text
DOCTYPE		DT	name, public, system, text
Entity Reference		ER	name

For named XML items, such as element and attributes, there are two types of events: SE(qname) and SE(*) as well as AT(qname) and AT(*). These events differ in their associated content: when SE(qname) or AT(qname) are used, the actual qname of the XML item is not encoded as part of event. The decision to use one type of event over the other will be explained later after introducing the notion of EXI grammars.

The fidelity options introduced in Section 2.1.1 EXI Header may be used to prune EXI events like NS, CM, PI, DT (DocType) or ER (Entity Reference). Grammar pruning simplifies the encoding and decoding process and also improves compactness by filtering out unused event types.

Consider a simple XML document from a notebook application:

Example 2-1. Notebook (XML Document)

<?xml version="1.0" encoding="UTF-8"?> 
<notebook date="2007-09-12">
 <note date="2007-07-23" category="EXI">
  <subject>EXI</subject>
  <body>Do not forget it!</body>
 </note>
 <note date="2007-09-12">
  <subject>Shopping List</subject>
  <body>milk, honey</body>
 </note>
</notebook>

The sequence of EXI events corresponding to the body of this XML document is shown below.

Figure 2-1. EXI Body Stream

This sequence of EXI events can be easily mapped to the structure of the XML document shown above. Every document begins with a SD and ends with an ED. The order in which attributes are encoded may be different in schema-less and schema-informed modes, as is the exact content associated with each event.

The actual number of bits used to represent each type of event, excluding its content, differs depending on context. The more event types that can occur in a certain context, the larger the number of bits required to represent an event in that context. What constitutes a context in this case is more formally defined by an EXI grammar production in the next section.

2.1.3 EXI Grammars

EXI is a knowledge based encoding that uses a set of grammars to determine which events are most likely to occur at any given point in an EXI stream and encodes the most likely alternatives in fewer bits. It does this by mapping the stream of events to a lower entropy set of representative values and encoding those values using a set of simple variable length codes or an EXI compression algorithm.

EXI grammars are regular grammars in which productions are associated with event codes. An EXI encoder, driven by an XML event stream, matches grammar productions and uses their associated event codes to represent an XML document or XML fragment. Since EXI grammars are regular grammars, the sequence of event codes written by an encoder corresponds to a path in the finite automaton that accepts the grammar. In reality, given that XML is not a regular language, a single grammar cannot be used to represent an entire XML event stream. Instead, an EXI encoder uses a stack of grammars, one for each element content model (just like an XML Schema validator would).

An event code is represented by a sequence of one to three parts, where each part is a non-negative integer. Event codes in an EXI grammar are assigned to productions in such a way that shorter event codes are used to represent more likely to occur productions. Conversely, longer event codes are used to represent less likely to occur productions. EXI grammars are designed so that the average number of bits needed to represent each production is less than that for a grammar in which more likely and less likely productions are not distinguished. The following tables illustrate this principle via an example.

Table 2-6. Event Code Assignment

Event	Indicator	#bits
AT("category")	0	4
AT("date")	1
EE	2
AT(*)	3
NS	4
SE(*)	5
CH	6
CM	7
PI	8

#distinct values	9

Event	EventCode			#bits
AT("category")	0			2
AT("date")	1			2
EE	2	0		2 + 3
AT(*)	2	1
NS	2	2
SE(*)	2	3
CH	2	4
CM	2	5	0	2 + 3 + 1
PI	2	5	1	2 + 3 + 1

#distinct values	3	6	2

Naive Event Code Assignment

vs.

EXI Event Code Assignment

In the first table, where productions are not separated according to their popularity, a 4-bit code is needed to represent each entry. In the second table, on the other hand, code lengths vary from 2 bits to 6 bits after productions are group based on their likelihood to occur. Assuming the content model for the element being encoded corresponds to the sequence AT("category") AT("date") (i.e., the element declares two attributes) then the encoding of all the event codes will be 4 bits shorter using the second table.

EXI grammars take advantage of a priori knowledge of the kind of data being encoded, namely, XML documents and XML fragments. In particular, EXI grammars can take advantage of the fact that, on any given grammar, certain XML items are more popular than others. For example, by simple inspection of documents in the wild, it is easy to verify that attributes occur more frequently than processing instructions and should therefore receive shorter event codes.

Further improvements in how grammars are designed are possible if schema information is also known at encoding time. In this case, we can not only take advance of generic XML knowledge but also of knowledge that is specific to the type of documents being encoded. For example, as shown in the tables above, we can add specific productions such as AT("category") and AT("date") with shorter event codes than AT(*).

The following two sections describe the differences between the built-in grammars and the schema-informed grammars. Note that an EXI encoder may only have partial schema information in which case it will use a combination of built-in and schema-informed grammars during encoding.

2.1.3.1 Built-In Grammar

EXI uses a set of built-in grammars to encode XML documents and XML fragments when no schema information is available. There are built-in grammars to encode documents, fragments and elements. Document grammars and fragment grammars describe the top-level structure, while element grammars describe the structure of every element. Fragment grammars are more lenient than document grammars; for example, they allow multiple top-level elements to be encoded as siblings. For more information on these grammars, the reader is referred to Built-in XML Grammars.

The EXI format describes a mechanism by which built-in grammars are dynamically extended using information from the actual instance being encoded. Stated differently, the EXI format describes a learning mechanism to further improve efficiency when no schema information is available statically. Newly learned productions are assigned short event codes, improving compactness for every subsequent use of those productions. In addition, by adding new productions to the grammar, certain data associated with an event only needs to be encoded once. For example, if an element named notebook is matched by SE(*) and subsequently matched by SE("notebook"), the actual string "notebook" is only encoded once as part of the SE(*) event.

As pointed out in the previous section, EXI grammars are always regular and can, therefore, be accepted by finite automata (FA). To provide a more operational view of an EXI processor, we will opt for the use of FA to explain how grammars work. The following figure shows a stack of grammars in which the top-level grammar accepts "note" elements. State transitions in black correspond to the built-in element grammar; state transitions in red have been learned as a result of encoding the element before.

Figure 2-2. Built-In Grammar for SE(note)

The built-in element automaton has two distinguished states: StartTag and Element. The former accepts attribute and namespace events that must occur before any element content; the latter accepts only element content which excludes attribute and namespace events. This separation enables the use of short codes which improves compactness and processing time.

As stated earlier, transitions in red are extensions to the built-in element grammar based on knowledge acquired about the element "note". Notice how AT("category"), AT("date") and SE("subject") have been added out of the StartTag state while SE(body) has been added out of the Element state. In particular, this suggests that SE("subject") is expected to occur before SE("body"), and that both of these SE events are expected to occur after any AT event. In addition, notice that both AT(*) and SE(*) are still available to enable future learning.

2.1.3.2 Schema-informed Grammar

EXI grammars can be further improved if schema information is known statically. Schema information can be interpreted in two different ways or encoding modes: strict and non-strict. In strict mode, the instances being encoded must be valid with respect to the schema; any deviation from the schema will result in an encoding error. In non-strict mode, deviations are accepted and encoded using more generic events. Examples of deviations are attributes whose actual values do not match the type defined in the schema or elements whose structure does not correspond to that in the schema. Given that strict grammars have fewer productions (no need for SE(*) or AT(*) in most cases) shorter event codes can be used to encode each option.

Instead of being dynamically extensible as the built-in grammars, schema-informed grammars are created statically based on the information in the available schema. This process will add productions of the form AT(qname) or SE(qname) guided by the attribute and element declarations in the schema. Let us continue the example from the previous section by assuming the following schema is available statically.

Example 2-2. Notebook (XML Schema)

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="notebook">
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:element name="note" type="Note"/>
      </xs:sequence>
      <xs:attribute ref="date" />
    </xs:complexType>
  </xs:element>
  <xs:complexType name="Note">
    <xs:sequence>
      <xs:element name="subject" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
    <xs:attribute ref="date" use="required" />
    <xs:attribute name="category" type="xs:string"/>
  </xs:complexType>
  <xs:attribute name="date" type="xs:date"  />
</xs:schema>

The schema for the element "note" states that it has a mandatory attribute "date" and an optional attribute "category", and that its structure is composed of an element "subject" followed by an element "body". An automaton that corresponds to the strict grammar for this element is shown next.

Figure 2-3. Schema-Informed Grammar for SE(note)

Note that AT("category") is accepted before AT("date") even though their order is reversed in the schema. This is because attributes in schema-informed grammars must be lexicographically sorted first by local name and then by namespace URI. Attribute sorting reduces the number of options which, in turn, greatly simplifies grammar creation and improves compactness. Since this automaton does not include transitions on AT(*) or SE(*) any deviations from the schema will result in an encoding error.

2.2 Content Representation

2.2.1 Built-In Types

EXI uses built-in types to represent so called content value items in an efficient manner. In other words, all attribute and character values are encoded according to its type information. Type information can be retrieved from available schema information. The following table lists the mapping between XML Schema Types and Built-in Types in EXI.

Table 2-7. Built-in EXI Datatypes
Built-in EXI Datatype	XML Schema Types
Binary	base64Binary, hexBinary
Boolean	boolean
Date-Time	dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth
Decimal	decimal
Float	float, double
Integer	integer without minInclusive or minExclusive facets, or with minInclusive or minExclusive facet of negative value
Unsigned Integer	nonNegativeInteger or integer with minInclusive or minExclusive facet value of 0 or above
n-bit Unsigned Integer	integer with bounded range of 4095 or smaller as determined by the values of minInclusive, minExclusive, maxInclusive and maxExclusive facets
String	string, anySimpleType, anyURI, duration, All types derived by union
List	All types derived by list, including IDREFS and ENTITIES
QName

Note:

The datatype QName is used for structure coding only, such as qualified names for XML elements and attributes.

The interested reader is referred to the EXI specification which describes in details the encoding rules for representing built-in EXI datatypes. In the absence of external type information (no available schema information) or when the preserve.lexicalValues option is set to true, all attribute and character values are typed as String.

2.2.2 String Table

String tables are used in memory-sensitive areas allowing a compact representation of repeating string values. Re-occurring string values are represented using an associated compact identifier rather than encoding the string literally again. When a string value is found in the string table (string table hit) the value is encoded using a compact identifier. Only if a string value is not found in the associated table (string table miss) the string is encoded as String and a new compact identifier is introduced.

EXI uses four different string table partitions to reflect the different uses of strings in the XML Infoset:

URI
Prefix
LocalName
Value

An EXI string table partition is optimized for more frequent use of either compact identifiers or string literals depending on the purpose of the partition. The URI and Prefix partition is optimized for frequent use of compact identifiers while LocalName and Value partition is optimized for frequent use of string literals.

The table below shows EXI content items used in section 2.1.2 EXI Body to describe the content of EXI events and their mapping to Built-in Datatypes. In addition relations to the string table partitions are annotated (e.g. content item prefix is assigned to the Prefix partition).

Table 2-8. Data types of event content items
Content item	Used in	Built-in EXI Datatype	String Table (Partition)
indicator	NS	Boolean
name	PI, DT, ER	String
prefix	NS	String	Prefix
public	DT	String
qname	SE, AT	QName	URI, LocalName
system	DT	String
text	CM, PI	String
uri	NS	String	URI
value	CH, AT	According to schema type	Value
value	CH, AT	String	Value

In the subsequent paragraphs more details about the different partitions are given by making use of the previously introduced Notebook example. The XML example is inserted inline once again to facilitate a better understanding.

<?xml version="1.0" encoding="UTF-8"?> 
<notebook date="2007-09-12">
 <note date="2007-07-23" category="EXI">
  <subject>EXI</subject>
  <body>Do not forget it!</body>
 </note>
 <note date="2007-09-12">
  <subject>Shopping List</subject>
  <body>milk, honey</body>
 </note>
</notebook>

The URI portion of qname content items and uri content items are assigned to the URI partition. The partition is initially pre-populated with three likely entries (see figure below). When XML Schemas are used (schema-informed mode) there is an additional entry that is appended to the URI partition.

Figure 2-4. URI Partition

The LocalName portion of qname content items and prefix content items are assigned to the LocalName or respectively Prefix partition. Both partitions are initially pre-populated with likely entries (see figure below). The partitions are further subdivided in sections according to the associated namespace URI. In our notebook example no prefixes are used and the default namespace URI ("" [empty-string]) is in charge.

Figure 2-5. Prefix and LocalName Partition

The figure above shows in highlighted form the URI and LocalName items used throughout the entire example document. For instance the notebook sample assigns six local-name entries, such as notebook and date, to the empty URI namespace. Whenever the LocalName and/or URI portion of a qname occur again, the compact identifier is used instead.

The last and probably most interesting partition is the Value Partition which is initially empty and grows while processing an XML instance. Attribute and Character content-values are assigned to this partition.

Figure 2-6. Value Partition

The figure above illustrates that value content items can be referenced in two different ways, namely as "global" and "local" value. When a string value is neither found in the global nor in the local value section its string literal is encoded as String and the string value is added to both, the associated local and the global string index.

When a string value is found in the local value section EXI uses the corresponding compact identifier to encode the re-appearance more efficiently. The value "2007-09-12" appears twice in the date context. The second occurrence results in a local value hit and is respectively encoded as a 1 bit compact identifier.

On the other side, if a string value is not found in the local section, but is found in the global section, the corresponding global compact identifier is used. The value "EXI" appears two times in the notebook example, respectively in the context of category and subject, and results in a global value hit encoded in 3 bits. Due to the different table sizes a global compact identifier is less compact than a local compact identifier hit. Nevertheless global value hits avoid encoding string literals again.

The number of bits needed to encode a compact identifier depends on the actual number of entries of the associated table at that time. Since all tables are growing while parsing an XML instance, the number of bits are not fixed. The figure above illustrates the situation after coding the entire XML instance. This growth effect applies to all string table partitions and makes the format very compact for small documents.

Note:

This section describes EXI String Tables at a conceptual level. The exact bit representation of table misses and indices is not presented in this document, but is described in full in the EXI specification document.

2.3 Compression

EXI can use additional computational resources to achieve higher compaction. Instead of compressing the entire stream EXI combines the knowledge of XML and the application of a standard compression algorithm. Homogeneous data is combined and fed separately to the compression engine.

The mechanism used to combine homogeneous data is simple and flexible enough so that it can be used in schema-informed and schema–less mode. Element and attribute values are grouped according to their qualified names while structure information like Event Codes is combined. To keep compression overhead at a minimum, smaller QName channels are combined while larger channels are compressed separately.

Figure 2-7. EXI Body Stream

The figure above uses grey buckets for structure information and colored buckets for content information. The color is determined by the associated QName (e.g. date, category, subject, body).

XML instances can be seen as a combination of structure and content information. The content information can be further divided in different sections according to the context (surrounding structure as indicated by a QName). EXI treats XML instances this way and uses these implied partitions, so called channels, to provide blocked input to a standard compression algorithm. This blocking of similar data increases compression efficiency.

Figure 2-8. EXI Compression

Note:

An alignment phase creates a byte-aligned representation of event codes and content items that is more amenable to compression algorithms compared to unaligned representations. Most compression algorithms operate on a series of bytes to identify redundancies in the octets.

By combining smaller channels into the same compressed stream while others are compressed separately EXI keeps the compression overhead at a minimum. The mechanism to determine whether channels are combined or compressed separately is guided by the number of value content items present in the EXI stream. For small documents (≤ 100 value content items) EXI uses a single compressed stream while larger documents (> 100 value content items) result in several independent compressed streams. The reader is referred to the EXI specification for further details.

The notebook example falls in the first category and is encoded as a single compressed deflate stream containing first the structure channel, followed by the QName channels in the order they appear in the document (date, category, subject, body).

3. Efficient XML Interchange by Example

This section walks through the coding of the Notebook Example explaining the concepts previously introduced in a step-by-step approach.

3.1 Notation

The table below shows the notations that are used in the description of EXI encoding in subsequent sections.

Notation	Description
N _n	An unsigned integer N is encoded as n-bit Unsigned Integer.
N _uint	An unsigned integer N is encoded as Unsigned Integer.
"Literal String"	The string shown between double quotation marks is encoded as String (length prefixed sequence of characters).
"Literal String" _+n	The string shown between double quotation marks is encoded as String (length prefixed sequence of characters) with the length part being the length of the string incremented by n.

3.2 Options

We do not make use of specific encoding options like using compression or pluggable codecs to encode the body.

The table below shows the fidelity options used in the presented example throughout the document.

Table 3-1. Fidelity options used in this document
Fidelity option	Value	Effect
Preserve.comments	false	Productions of CM events are pruned from grammars
Preserve.pis	false	Productions of PI events are pruned from grammars
Preserve.dtd	false	Productions of DOCTYPE and ER events are pruned from grammars
Preserve.prefixes	true	NS events and namespace prefixes are preserved
Preserve.lexicalValues	false	Lexical form of element and attribute values is not preserved

The fidelity options setting shown above prunes those productions in EXI grammar that contain CM (Comment), PI (Processing Instruction), DT (DocType) or ER (Entity Reference) events. This is so as to make the grammar presentation as concise as possible, yet preserving the variety of event types that are actually used in the example XML document.

In the example encodings, whitespaces that are originally present in the example XML document are omitted, which is intentional. EXI format is capable of representing those whitespaces in an efficient manner. They are omitted in the example encodings to make the encoding description more articulate by focusing on the primary data and structure without being distracted by the frequent occurrence of indentation whitespaces.

3.3 Encoding without a schema

This section describes the encoding of EXI stream body when the sample XML document is transcoded into an EXI document in the absence of any schemas.

Event Grammar EXI Encoding Note

EventCode Content

<?xml version="1.0"
                                    encoding="UTF-8"?>

Document
SD	DocContent	0

0₀

<notebook>

DocContent
SE(*)	DocEnd	0

0₀

1₂ "notebook"₊₁

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
"notebook"₊₁ (local-name miss)
Local-Names (default)
0 "notebook"
DocContent moves on to DocEnd and StartTagNotebook is pushed on rule-stack

date = "2007-09-12"

StartTagNotebook
EE		0.0
AT(*)	StartTagNotebook	0.1
NS	StartTagNotebook	0.2
SE(*)	ElementNotebook	0.3
CH	ElementNotebook	0.4

0₀ 1₃

1₂ "date"₊₁ "2007-09-12"₊₂

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
"date"₊₁ (local-name miss)
Local-Names (default)
0 "notebook"
1 "date"
"2007-09-12"₊₂ (value miss)
Values (date)
0 "2007-09-12"
StartTagNotebook extended by leading AT(date)

<note>

StartTagNotebook
AT(date)	StartTagNotebook	0
EE		1.0
AT(*)	StartTagNotebook	1.1
NS	StartTagNotebook	1.2
SE(*)	ElementNotebook	1.3
CH	ElementNotebook	1.4

1₁ 3₃

1₂ "note"₊₁

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
"note"₊₁ (local-name miss)
Local-Names (default)
0 "notebook"
1 "date"
2 "note"
StartTagNotebook extended by leading SE(note)
StartTagNotebook moves on to ElementNotebook and StartTagNote is pushed on rule-stack

date="2007-07-23"

StartTagNote
EE		0.0
AT(*)	StartTagNote	0.1
NS	StartTagNote	0.2
SE(*)	ElementNote	0.3
CH	ElementNote	0.4

0₀ 1₃

1₂ 0_1B 1₂ "2007-07-23"₊₂

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
0_uint 1₂ (local-name hit)
Local-Names (default)
0 "notebook"
1 "date"
2 "note"
"2007-07-23"₊₂ (value miss)
Values (date)
0 "2007-09-12"
1 "2007-07-23"
StartTagNote extended by leading AT(date)

category="EXI"

StartTagNote
AT(date)	StartTagNote	0
EE		1.0
AT(*)	StartTagNote	1.1
NS	StartTagNote	1.2
SE(*)	ElementNote	1.3
CH	ElementNote	1.4

1₁ 1₃

1₂ "category"₊₁ "EXI"₊₂

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
"category"₊₁ (local-name miss)
Local-Names (default)
0 "notebook"
1 "date"
2 "note"
3 "category"
"EXI"₊₂ (value miss)
Values (category)
0 "EXI"
StartTagNote extended by leading AT(category)

<subject>

StartTagNote
AT(category)	StartTagNote	0
AT(date)	StartTagNote	1
EE		2.0
AT(*)	StartTagNote	2.1
NS	StartTagNote	2.2
SE(*)	ElementNote	2.3
CH	ElementNote	2.4

2₂ 3₃

1₂ "subject"₊₁

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
"subject"₊₁ (local-name miss)
Local-Names (default)
0 "notebook"
1 "date"
2 "note"
3 "category"
4 "subject"
StartTagNote extended by leading SE(subject)
StartTagNote moves on to ElementNote and StartTagSubject is pushed on rule-stack

EXI

StartTagSubject
EE		0.0
AT(*)	StartTagSubject	0.1
NS	StartTagSubject	0.2
SE(*)	ElementSubject	0.3
CH	ElementSubject	0.4

0₀ 4₃

1_1B 2₂

1_uint 2₂ (global-value hit)
Global-Values
0 "2007-09-12"
1 "2007-07-23"
2 "EXI"
StartTagSubject extended by leading CH
StartTagSubject moves on to ElementSubject

</subject>

ElementSubject
EE		0
SE(*)	ElementSubject	1.0
CH	ElementSubject	1.1

0₁

ElementSubject popped from rule-stack

<body>

ElementNote
EE		0
SE(*)	ElementNote	1.0
CH	ElementNote	1.1

1₁ 0₁

1₂ "body"₊₁

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
"body"₊₁ (local-name miss)
Local-Names (default)
0 "notebook"
1 "date"
2 "note"
3 "category"
4 "subject"
5 "body"
ElementNote extended by leading SE(body)
StartTagBody is pushed on rule-stack

Do not forget it!

StartTagBody
EE		0.0
AT(*)	StartTagBody	0.1
NS	StartTagBody	0.2
SE(*)	ElementBody	0.3
CH	ElementBody	0.4

0₀ 4₃

"Do not forget it!"₊₂

"Do not forget it!"₊₂ (value miss)
Values (body)
0 "Do not forget it!"
StartTagBody extended by leading CH
StartTagBody moves on to ElementBody

</body>

ElementBody
EE		0
SE(*)	ElementBody	1.0
CH	ElementBody	1.1

0₁

ElementBody popped from rule-stack

</note>

ElementNote
SE(body)	ElementNote	0
EE		1
SE(*)	ElementNote	2.0
CH	ElementNote	2.1

1₂

ElementNote popped from rule-stack

<note>

ElementNotebook
EE		0
SE(*)	ElementNotebook	1.0
CH	ElementNotebook	1.1

1₁ 0₁

1₂ 0_1B 2₃

(0+1)₂ "" (uri hit)
URIs
0 "" [empty string]
1 ".../XML/1998/namespace"
2 ".../XMLSchema-instance"
3 ".../XMLSchema"
0_unit 2₃ (local-name hit)
Local-Names (default)
0 "notebook"
1 "date"
2 "note"
3 "category"
4 "subject"
5 "body"
ElementNotebook extended by leading SE(note)
StartTagNote is pushed on rule-stack

date="2007-09-12"

StartTagNote
SE(subject)	ElementNote	0
AT(category)	StartTagNote	1
AT(date)	StartTagNote	2
EE		3.0
AT(*)	StartTagNote	3.1
NS	StartTagNote	3.2
SE(*)	ElementNote	3.3
CH	ElementNote	3.4

2₂

0_1B 0₁

0_uint 1₁ (value hit)
Values (date)
0 "2007-09-12"
1 "2007-07-23"

<subject>

StartTagNote
SE(subject)	ElementNote	0
AT(category)	StartTagNote	1
AT(date)	StartTagNote	2
EE		3.0
AT(*)	StartTagNote	3.1
NS	StartTagNote	3.2
SE(*)	ElementNote	3.3
CH	ElementNote	3.4

0₂

StartTagNote moves on to ElementNote and StartTagSubject is pushed on rule-stack

Shopping List

StartTagSubject
CH	ElementSubject	0
EE		1.0
AT(*)	StartTagSubject	1.1
NS	StartTagSubject	1.2
SE(*)	ElementSubject	1.3
CH	ElementSubject	1.4

0₁

"Shopping List"₊₂

"Shopping List"₊₂ (value miss)
Values (subject)
0 "Shopping List"
StartTagSubject moves on to ElementSubject

</subject>

ElementSubject
EE		0
SE(*)	ElementSubject	1.0
CH	ElementSubject	1.1

0₁

ElementSubject popped from rule-stack

<body>

ElementNote
SE(body)	ElementNote	0
EE		1
SE(*)	ElementNote	2.0
CH	ElementNote	2.1

0₂

StartTagBody is pushed on rule-stack

milk, honey

StartTagBody
CH	ElementBody	0
EE		1.0
AT(*)	StartTagBody	1.1
NS	StartTagBody	1.2
SE(*)	ElementBody	1.3
CH	ElementBody	1.4

0₁

"milk, honey"₊₂

"milk, honey"₊₂ (value miss)
Values (body)
0 "Do not forget it!"
1 "milk, honey"
StartTagBody moves on to ElementBody

</body>

ElementBody
EE		0
SE(*)	ElementBody	1.0
CH	ElementBody	1.1

0₁

ElementBody popped from rule-stack

</note>

ElementNote
SE(body)	ElementNote	0
EE		1
SE(*)	ElementNote	2.0
CH	ElementNote	2.1

1₂

ElementNote popped from rule-stack

</notebook>

ElementNotebook
SE(note)	ElementNote	0
EE		1
SE(*)	ElementNote	2.0
CH	ElementNote	2.1

1₂

ElementNotebook popped from rule-stack

EOF

DocEnd
ED		0

0₀

3.4 Schema-less Decoding

Steps to decode an EXI document

Decode Event Code (according to grammar-rule context)
Decode event content (if present, see EXI Event Types)
Move forward in grammar according to current grammar rules
Return to step 1 if last event was not EndDocument (ED)
[Done]

Grammar EXI Decoding Note

EventCode Event Content

Document
SD	DocContent	0

0₀

Decode EventCode SD
0₀
Document moves on to DocContent

<?xml version="1.0" ?>

DocContent
SE(*)	DocEnd	0

0₀

1₂ 14_uint "questionnaire"_13B

Decode EventCode SE(*)
0₀
1₂ (uri hit)
val id URIs
0 [uri miss]
1 0 "" [empty string]
2 1 ".../XML/1998/namespace"
3 2 ".../XMLSchema-instance"
14_uint "questionnaire"_13B (local-name miss)
val_uint Local-Names
0
id Local-Name Partition
0 "questionnaire"

> 0 "Literal String"_(val-1)B
DocContent moves on to DocEnd and StartTagQuestionnaire is pushed on rule-stack

<questionnaire>

StartTagQuestionnaire
EE		0.0
AT(*)	StartTagQuestionnaire	0.1
NS	StartTagQuestionnaire	0.2
SE(*)	ElementQuestionnaire	0.3
CH	ElementQuestionnaire	0.4

0₀ 3₃

1₂ 9_uint "question"_8B

Decode EventCode SE(*)
0₀ 3₃
Decode QName (uri & local-name)
1₂ (uri hit)
val id URIs
0 [uri miss]
1 0 "" [empty string]
2 1 ".../XML/1998/namespace"
3 2 ".../XMLSchema-instance"
9_uint "question"_8B (local-name miss)
val_uint Local-Names
0
id Local-Name Partition
0 "questionnaire"
1 "question"

> 0 "Literal String"_(val-1)B
StartTagQuestionnaire is extended by leading SE(question)
StartTagQuestionnaire moves on to ElementQuestionnaire and StartTagQuestion is pushed on rule-stack

<question>

StartTagQuestion
EE		0.0
AT(*)	StartTagQuestion	0.1
NS	StartTagQuestion	0.2
SE(*)	ElementQuestion	0.3
CH	ElementQuestion	0.4

0₀ 4₃

29_uint "Is EXI ..."_27B

Decode EventCode CH
0₀ 4₃

Decode Characters

29_uint "Is EXI difficult to decode?"_27B (value miss)

val_uint

Values

id	Values (question)
0	"Is EXI difficult to decode?"

id	Global-Values
0	"Is EXI difficult to decode?"

> 1

"Literal String"_(val-2)B

StartTagQuestion is extended by a leading CH
StartTagQuestion moves on to ElementQuestion

Is EXI difficult to decode?

ElementQuestion
EE		0
SE(*)	ElementQuestion	1.0
CH	ElementQuestion	1.1

0₁

Decode EventCode EE
0₁
ElementQuestion popped from rule-stack

</question>

ElementQuestionnaire
EE		0
SE(*)	ElementQuestionnaire	1.0
CH	ElementQuestionnaire	1.1

1₁ 0₁

1₂ 8_uint "choices"_7B

Decode EventCode SE(*)
1₁ 0₁

Decode QName (uri & local-name)

1₂ (uri hit)

val	id	URIs
0		[uri miss]
1	0	"" [empty string]
2	1	".../XML/1998/namespace"
3	2	".../XMLSchema-instance"

8_uint "choices"_7B (local-name miss)

val_uint

Local-Names

id	Local-Name Partition
0	"questionnaire"
1	"question"
2	"choices"

> 0

"Literal String"_(val-1)B

ElementQuestionnaire is extended by leading SE(choices)
StartTagChoices is pushed on rule-stack

<choices>

StartTagChoices
EE		0.0
AT(*)	StartTagChoices	0.1
NS	StartTagChoices	0.2
SE(*)	ElementChoices	0.3
CH	ElementChoices	0.4

0₀ 3₃

1₂ 7_uint "choice"_6B

Decode EventCode SE(*)
0₀ 3₃

Decode QName (uri & local-name)

1₂ (uri hit)

val	id	URIs
0		[uri miss]
1	0	"" [empty string]
2	1	".../XML/1998/namespace"
3	2	".../XMLSchema-instance"

7_uint "choice"_6B (local-name miss)

val_uint

Local-Names

id	Local-Name Partition
0	"questionnaire"
1	"question"
2	"choices"
3	"choice"

> 0

"Literal String"_(val-1)B

StartTagChoices is extended by leading SE(choice)
StartTagChoices moves on to ElementChoices and StartTagChoice is pushed on rule-stack

<choice>

StartTagChoice
EE		0.0
AT(*)	StartTagChoice	0.1
NS	StartTagChoice	0.2
SE(*)	ElementChoice	0.3
CH	ElementChoice	0.4

0₀ 4₃

5_uint "Yes"_3B

Decode EventCode CH
0₀ 4₃

Decode Characters

5_uint "Yes"_3B (value miss)

val_uint

Values

id	Values (choice)
0	"Yes"

id	Global-Values
0	"Is EXI difficult to decode?"
1	"Yes"

> 1

"Literal String"_(val-2)B

StartTagChoice is extended by leading CH
StartTagChoice moves on to ElementChoice

Yes

ElementChoice
EE		0
SE(*)	ElementChoice	1.0
CH	ElementChoice	1.1

0₁

Decode EventCode SE
0₁
ElementChoice popped from rule-stack

</choice>

ElementChoices
EE		0
SE(*)	ElementChoices	1.0
CH	ElementChoices	1.1

1₁ 0₁

1₂ 0_uint 3₂

Decode EventCode SE(*)
1₁ 0₁

Decode QName (uri & local-name)

1₂ (uri hit)

val	id	URIs
0		[uri miss]
1	0	"" [empty string]
2	1	".../XML/1998/namespace"
3	2	".../XMLSchema-instance"

0_uint 3₂ (local-name hit)

val_uint

Local-Names

id	Local-Name Partition
0	"questionnaire"
1	"question"
2	"choices"
3	"choice"

> 0

"Literal String"_(val-1)B

ElementChoices extended by leading SE(choice) and StartTagChoise pushed on rule-stack

<choice>

StartTagChoice
CH	ElementChoice	0
EE		1.0
AT(*)	StartTagChoice	1.1
NS	StartTagChoice	1.2
SE(*)	ElementChoice	1.3
CH	ElementChoice	1.4

0₁

4_uint "No"_2B

Decode EventCode CH
0₁

Decode Characters

4_uint "No"_2B (value miss)

val_uint

Values

id	Values (choice)
0	"Yes"
1	"No"

id	Global-Values
0	"Is EXI difficult to decode?"
1	"Yes"
1	"No"

> 1

"Literal String"_(val-2)B

StartTagChoice moves on to ElementChoice

No

ElementChoice
EE		0
SE(*)	ElementChoice	1.0
CH	ElementChoice	1.1

0₁

Decode EventCode EE
0₁
ElementChoice popped from rule-stack

</choice>

ElementChoices
SE(choice)	ElementChoices	0
EE		1
SE(*)	ElementChoices	2.0
CH	ElementChoices	2.1

1₂

Decode EventCode EE
1₂
ElementChoices popped from rule-stack

</choices>

ElementQuestionnaire
SE(choices)	ElementQuestionnaire	0
EE		1
SE(*)	ElementQuestionnaire	2.0
CH	ElementQuestionnaire	2.1

1₂

Decode EventCode EE
1₂
ElementQuestionnaire popped from rule-stack

</questionnaire>

DocEnd
ED		0

0₀

Decode EventCode ED
0₀

EOF

The resulting XML instance is shown below.

Example 3-1. Decoded XML Document (Questionnaire)

<?xml version="1.0" encoding="UTF-8"?> 
<questionnaire>
  <question>Is EXI difficult to decode?</question>
  <choices>
    <choice>Yes</choice>
    <choice>No</choice>
  </choices>
</questionnaire>

Efficient XML Interchange (EXI) Primer

W3C Working Draft 19 December 2007

Abstract

Status of this Document

Table of Contents

Appendix

1. Introduction

2. Concepts

2.1 EXI Streams

2.1.1 EXI Header

2.1.2 EXI Body

2.1.3 EXI Grammars

2.1.3.1 Built-In Grammar

2.1.3.2 Schema-informed Grammar

2.2 Content Representation

2.2.1 Built-In Types

2.2.2 String Table

2.3 Compression

3. Efficient XML Interchange by Example

3.1 Notation

3.2 Options

3.3 Encoding without a schema

3.4 Schema-less Decoding

A Encoding Examples