<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.2//EN"
       "http://www.w3.org/2002/xmlspec/dtd/2.2/xmlspec.dtd" [
  <!ENTITY proposed-features SYSTEM "proposed-features.xml">       
  <!ENTITY exins "http://www.w3.org/2007/07/exi">
  <!ENTITY times "&#215;">
  <!ENTITY ne "&#8800;">
  <!ENTITY le "&#8804;">
  <!ENTITY oplus "&#8853;">
  <!ENTITY hellip "&#8230;">
  <!ENTITY vellip "&#8942;">
  <!ENTITY lceil "&#8968;">
  <!ENTITY rceil "&#8969;">
  <!ENTITY sqcup "&#x2294;">
  <!-- cup looked so close to alphabet "U", use sqcup instead -->
  <!-- !ENTITY cup "&#8746;" -->
  <!ENTITY nbsp "&#160;">
  <!ENTITY mdash "&#8212;">
]>
<!--

/*
 * Copyright (c) 2007 World Wide Web Consortium,
 *
 * (Massachusetts Institute of Technology, European Research Consortium for
 * Informatics and Mathematics, Keio University). All Rights Reserved. This
 * work is distributed under the W3C(r) Document License [1] in the hope that
 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231
 */

    -->
<!-- $Id: exi.xml,v 1.2 2007/12/18 15:25:55 cbournez Exp $ -->
<?xml-stylesheet type="text/xsl" href="exi.xsl"?>
<spec w3c-doctype="wd">
<header>
<title>Efficient XML Interchange (EXI) Format</title>
<version>1.0</version>
<w3c-designation>WD-exi-20071219</w3c-designation>
<!-- w3c-doctype>W3C Editors' Draft</w3c-doctype -->
<w3c-doctype>W3C Working Draft</w3c-doctype>
<pubdate>
<day>19</day>
<month>December</month>
<year>2007</year></pubdate>
<publoc>
<!-- loc href="http://www.w3.org/XML/Group/EXI/docs/format/exi.html">http://www.w3.org/XML/Group/EXI/docs/format/exi.html</loc -->
<loc href="http://www.w3.org/TR/2007/WD-exi-20071219/">http://www.w3.org/TR/2007/WD-exi-20071219/</loc>
</publoc>
<altlocs>
<loc role="xml" href="exi.xml">XML</loc></altlocs>
<prevlocs>
<loc href="http://www.w3.org/TR/2007/WD-exi-20070716/">http://www.w3.org/TR/2007/WD-exi-20070716/</loc>
</prevlocs>
<latestloc>
<loc href="http://www.w3.org/TR/exi/">http://www.w3.org/TR/exi/</loc></latestloc>
<authlist>
<author>
<name>John Schneider</name>
<affiliation>AgileDelta, Inc.</affiliation>
<!-- email></email --></author>
<author>
<name>Takuki Kamiya</name>
<affiliation>Fujitsu Laboratories of America, Inc.</affiliation>
<!-- email></email --></author></authlist>
<abstract>
<p>This document is the specification of the Efficient XML Interchange (EXI)
format. EXI is a very compact representation for the Extensible Markup
Language (XML) Information Set that is intended to simultaneously optimize
performance and the utilization of computational resources. The EXI
format uses a hybrid approach drawn from the information and formal language
theories, plus practical techniques verified by measurements,
for entropy encoding XML information. Using a relatively simple algorithm,
which is amenable to fast and compact implementation, and a small set of
data types, it reliably produces efficient encodings of XML event streams.
The event production system and format definition of EXI are presented.</p>
<p>As elaborated in the <a href="#Status">Status</a> section, this specification is subject to change. Items presently under consideration by the WG are either noted in the main text or listed in <specref ref="proposedFeatures"/>.</p>
</abstract>
<status id="Status">
<p>
<emph>This section describes the status of this document at the time
of its publication. Other documents may supersede this document. A
list of current W3C publications and the latest revision of this
technical report can be found in the <loc
href="http://www.w3.org/TR/">W3C technical reports index</loc> at
http://www.w3.org/TR/.</emph></p>
<!-- p>This document is an Editors' Draft of a possible future W3C
Recommendation for internal review by W3C members and has no official
standing. It has been developed by the <loc
href="http://www.w3.org/XML/EXI/">Efficient XML Interchange (EXI)
Working Group</loc>, which is part of the <loc
href="http://www.w3.org/XML/Activity">Extensible Markup Language (XML)
Activity</loc>.</p -->
<p>
This is the second Public Working Draft of the Efficient XML Interchange (EXI) Format 1.0 specification and is intended for review by W3C members
and other interested parties.
It has been developed by the <loc href="http://www.w3.org/XML/EXI/">
Efficient XML Interchange (EXI) Working Group</loc>, which is part of
the <loc href="http://www.w3.org/XML/Activity">Extensible Markup Language
(XML) Activity</loc>.

A summary <xspecref href='#changes'>list of changes</xspecref> made to this document since the last publication is available.

</p>
<p>The features and algorithms described in the normative portion of the
document are specified in enough detail to be adequate for early
implementation experiments. Other features
the group is considering are found in the non-normative Appendix
<specref ref="proposedFeatures"/>, for which only brief descriptions are
provided, and should probably not yet be considered for implementation.</p>
<p>Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>
<!-- p>The facilities described herein are based on EXI Member
Submissions. The Working Group anticipates substantial changes in the
mechanisms described herein and expects additional functions
integrated in subsequent Working Drafts.</p -->
<p>Comments on this document are invited and are to be sent to the
public <loc href="mailto:public-exi@w3.org">public-exi@w3.org</loc>
mailing list (<loc
href="http://lists.w3.org/Archives/Public/public-exi/">public
archive</loc>).</p>
<!-- p>Discussion of this document takes place on the <loc
     href="mailto:public-exi@w3.org">public-exi@w3.org</loc>

  mailing list (<loc
  href="http://lists.w3.org/Archives/Public/public-exi/">public
  archive</loc>).</p -->
<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/38502/status#specs">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>
</status>
<langusage>
<language id="en-US">English</language></langusage>
<revisiondesc>
<p>Last Modified: $Date: 2007/12/18 15:25:55 $</p></revisiondesc></header>
<body>
<div1 id="introduction">
<head>Introduction</head>
<p>The Efficient XML Interchange (EXI) format is a very compact, high
performance XML representation that was designed to work well for a
broad range of applications.  It simultaneously improves performance
and significantly reduces bandwidth requirements without compromising
efficient use of other resources such as battery life, code size,
processing power, and memory.
</p>
<p>EXI uses a grammar-driven approach that achieves very efficient
encodings using a straightforward encoding algorithm and a small set
of data types. Consequently, <termref def="key-exiprocessor">EXI processors</termref> are relatively simple and
can be implemented on devices with limited capacity. </p> <p>EXI is schema
&quot;informed&quot;, meaning that it can utilize available schema
information to improve compactness and performance, but does not
depend on accurate, complete or current schemas to work. It supports
arbitrary schema extensions and deviations and also works very
effectively with partial schemas or in the absence of any schema.  The
format itself also does not depend on any particular schema language,
or format, for schema information. </p>
<p><termdef id="key-exiprocessor" term="EXI processor">A program module
called an <term>EXI processor</term>, whether implemented in software or
hardware, is used by application programs to encode their structured data
into <termref def="key-existream">EXI streams</termref> and/or to decode
<termref def="key-existream">EXI streams</termref> to make the structured
data accessible to them.</termdef> This document not only specifies the
EXI format, but also defines errors that EXI processors are required to
detect and act upon.</p>
<p>The primary goal of this document is to define the EXI format completely and unambiguously so that implementations can interoperate. As such, the document describes the design and features of the format in a systematic manner, often declaratively and with relatively few explanatory annotations and examples. Readers who prefer a step-by-step introduction to the EXI format design and features are encouraged to start with the non-normative <bibref ref="exiprimer"/>.
</p>
<div2 id="history">
<head>History and Design</head>
<p>EXI is the result of extensive work carried out by the W3C's XML
Binary Characterization (XBC) and Efficient XML Interchange (EXI)
Working Groups. XBC was chartered to investigate the costs and
benefits of an alternative form of XML, and formulate a way to objectively
evaluate the potential of a substitute format for XML.  Based on XBC's
recommendations, EXI was chartered, first to measure, evaluate, and
compare the performance of various XML technologies (using metrics
developed by XBC <bibref ref="xbcmeas"/>), and then, if it appeared
suitable, to formulate a recommendation for a W3C format
specification. The measurement results and analyses are presented
elsewhere <bibref ref="eximeas"/>. The format described in this
document is the specification so recommended. 
</p>
<p>The functional requirements of the EXI format are those that were
prepared by the XBC WG in their analysis of the desirable properties
of a high performance encoding for XML <bibref ref="xbcproperties"/>.
Those properties were derived from a very broad set of use cases also
identified by the XBC working group <bibref ref="xbcusecases"/>.
</p>
<p>The design of the format presented here is largely based on the
results of the measurements carried out by the group to evaluate the
performance characteristics (mainly processing efficiency and
compactness) of various existing formats. The EXI format is based on
Efficient XML <bibref ref="efx"/>, including for example the basic heuristic grammar approach,
compression algorithm, and resulting entropy encoding. Present work
centers around evaluating and integrating some features from other
measured format technologies into EXI (see Appendix <specref ref="proposedFeatures"/>).
</p>
<p>EXI is compatible with XML at the XML Information Set <bibref
ref="XMLInfoset"/> level, rather than at the XML syntax level. This
permits it to encapsulate an efficient alternative syntax and grammar
for XML, while facilitating at least the potential for minimizing the
impact on XML application interoperability.
</p>    
</div2>
<div2 id="conventions">
<head>Notational Conventions and Terminology</head>
<p>The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear
EMPHASIZED in this document, are to be interpreted as described in RFC
2119 <bibref ref="RFC2119"/>. Other terminology used to describe the EXI
format is defined in the body of this specification.
</p>
<p>The terms <term>event</term> and <term>stream</term> are used throughout this document to denote <term><termref def="key-exievent">EXI event</termref></term> and <term><termref def="key-existream">EXI stream</termref></term> respectively, unless they are explicitly qualified to mean otherwise.</p>
<p>This document specifies an abstract grammar for EXI. In grammar notation, all terminal
symbols are represented in plain text and all non-terminal symbols are
represented in <emph>italics</emph>. Grammar productions are
represented as follows: </p>
<table width="100%">
<tbody>
<tr>
<td width="5%"></td>
<td>
<emph>LeftHandSide</emph> :&nbsp;&nbsp;
Event&nbsp;&nbsp;<emph>NonTerminal</emph></td></tr></tbody></table>
<p>A set of one or more grammar productions that share the same
left-hand-side non-terminal symbol are often presented together along
with <termref def="key-eventcode">event codes</termref> that uniquely
identify events among the collocated productions as follows: 
</p>
<table width="100%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>LeftHandSide</emph> :</td></tr>

<tr>
<td></td>
<td width="5%"></td>
<td width="75%">
Event <sub>1</sub>&nbsp;&nbsp;<emph>NonTerminal 
<sub>1</sub></emph></td>
<td>EventCode<sub>1</sub></td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>2</sub>&nbsp;&nbsp;<emph>NonTerminal 
<sub>2</sub></emph></td>
<td>EventCode<sub>2</sub></td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>3</sub>&nbsp;&nbsp;<emph>NonTerminal 
<sub>3</sub></emph></td>
<td>EventCode<sub>3</sub></td></tr>
<tr>
<td></td>
<td></td>
<td>...</td>
<td></td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>n</sub>&nbsp;&nbsp;<emph>NonTerminal 
<sub>n</sub></emph></td>
<td>EventCode<sub>n</sub></td></tr>
</tbody></table>
<p>Section <specref ref="grammarNotation"/> introduces additional notations for describing productions and event codes in grammars. Those additional notations facilitate concise representation of the EXI grammar system.
</p>
<p>Terminal symbols that are qualified with a qname permit the use of a wildcard symbol (*) in place of a qname. The terminal symbol SE (*) matches a start element (SE) event with any qname. Similarly, the terminal symbol AT (*) matches an attribute (AT) event with any qname.
</p>
<p>Several prefixes are used throughout this document to designate certain namespaces. The bindings shown below are assumed; however, any prefixes can be used in practice if they are properly bound to the namespaces.</p>
<table width="80%" border="1">
<colgroup align="center" width="25%"></colgroup>
<colgroup/>
<thead>
<tr>
<th>Prefix</th>
<th>Namespace Name</th>
</tr>
</thead>
<tbody>
<tr>
<td>exi</td>
<td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&exins;</td>
</tr>
<tr>
<td>xml</td>
<td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
http://www.w3.org/XML/1998/namespace</td>
</tr>
<tr>
<td>xsd</td>
<td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
http://www.w3.org/2001/XMLSchema</td>
</tr>
<tr>
<td>xsi</td>
<td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
http://www.w3.org/2001/XMLSchema-instance</td>
</tr>
</tbody>
</table>
<p>In describing the layout of an EXI format construct, a pair of square brackets [ ] is used to surround the name of a field to denote that the occurrence of the field is optional in the structure of the part or component that contains the field.
</p>
<p>In arithmetic expressions, the notation &lceil;<emph>x</emph>&rceil; where <emph>x</emph> represents a real number denotes the ceiling of <emph>x</emph>, that is, the smallest integer greater than or equal to <emph>x</emph>.
</p>
</div2></div1>
<div1 id="principles">
<head>Design Principles</head>
<p>The following design principles were used to guide the development of EXI and encourage consistent design decisions. They are listed here to provide insight into the EXI design rationale and to anchor discussions on desirable EXI traits.</p>
<glist>
<gitem>
<label>General:</label>
<def>
<p>One of the primary objectives of EXI is to maximize the number of systems, devices and applications that can communicate using XML data. Specialized approaches optimized for specific use cases should be avoided.</p></def></gitem>
<gitem>
<label>Minimal:</label>
<def>
<p>To reach the broadest set of small, mobile and embedded applications, simple, elegant approaches are preferred to large, analytical or complex ones. </p></def></gitem>
<gitem>
<label>Efficient:</label>
<def>
<p>EXI must be competitive with hand-optimized binary formats so it can be used by applications that require this level of efficiency. </p></def></gitem>
<gitem>
<label>Flexible:</label>
<def>
<p>EXI must deal flexibly and efficiently with documents that contain arbitrary schema extensions or deviate from their schema. Documents that contain schema deviations should not cause encoding to fail. </p></def></gitem>
<gitem>
<label>Interoperable:</label>
<def>
<p>EXI must integrate well with existing XML technologies, minimizing the changes required to those technologies. It must be compatible with the XML Information Set <bibref ref="XMLInfoset"/>, without significant subsetting or supersetting, in order to maintain interoperability with existing and prospective XML specifications.</p></def></gitem></glist></div1>
<div1 id="concepts">
<head>Basic Concepts</head>
<p>EXI achieves broad generality, flexibility, and performance by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur at any given point in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, EXI is optimized specifically for XML languages. </p>
<p>The built-in EXI grammar accepts any XML document or fragment and may be augmented with productions derived from XML Schemas <bibref ref="schema1"/><bibref ref="schema2"/>, RELAX NG schemas <bibref ref="relaxng"/>, DTDs <bibref ref="XML10"/> or other sources of information about what is likely to occur in a set of XML documents. The EXI encoder uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream of events. </p>
<p>The encoder then represents the stream of events using a set of simple variable length codes called <termref def="key-eventcode">event codes</termref>. <termref def="key-eventcode">Event codes</termref> are similar to Huffman codes <bibref ref="huffman"/>, but are much simpler to compute and maintain. They are encoded directly as a sequence of values, or if additional compression is desired, they are passed to the <termref def="compression">EXI compression</termref> algorithm, which replaces frequently occurring event patterns to further reduce size. </p>
<p>When schemas are used, EXI also supports a user-customizable set of typed encodings for efficiently encoding typed values. </p></div1>

<div1 id="streams">
<head>EXI Streams</head>
<p><termdef id="key-existream" term="EXI Stream">An <term>EXI stream</term> is an EXI header followed by an EXI stream body.</termdef> It is the EXI stream body that carries the content of the document, while the EXI header amongst its roles communicates the options that were used for encoding the EXI stream body. Section
<specref ref="header"/> describes the EXI header. Values in an EXI stream are packed into bytes most significant bit first.</p>
<p>Applications that embed EXI streams in a container data format may wish to omit the EXI header, provided that the container format identifies the content as an EXI stream and dictates the EXI format version and the EXI Options used for its encoding. Although an EXI stream body alone is not a valid EXI stream, EXI processors MAY provide a capability to process an EXI stream body independently of an EXI stream.
</p>
<p><termdef id="key-exievent" term="EXI Event">The building block of an EXI stream body is an <term>EXI event</term>.</termdef> An EXI stream body consists of a sequence of EXI events representing an EXI document or an <termref def="key-exifragment">EXI fragment</termref>.</p>
<p>The EXI events permitted at any given position in an EXI stream are determined by the EXI grammar. The events occur in a well-formed manner with matching start element and end element events in the same fashion as XML. The EXI grammar incorporates knowledge of the XML grammar and may be augmented and refined using schema information and fidelity options. EXI grammar is formally specified in section <specref ref="grammars"/>.</p>
<p>The following table summarizes the EXI events and associated content that occur in an EXI stream. In addition, the table includes the grammar notation used to represent each event in this specification. Each event in an EXI stream participates in a mapping system that relates events to XML Information Items so that an EXI document as a whole serves to represent an XML Information Set. The table shows XML Information Items relevant to each EXI event type. Appendix <specref ref="InfosetMapping"/> describes the mapping system in detail.</p>
<table id="eventTypes" border="1">
<caption>EXI event types</caption>
<thead>
<tr>
<th>EXI Event Type</th>
<th>Content</th>
<th>Grammar Notation</th>
<th>Information Item</th>
</tr>
</thead>
<tbody>
<tr>
<td>Start Document</td>
<td>&nbsp;</td>
<td>SD</td>
<td rowspan="2"><specref ref="DocumentInformationItem"/></td></tr>
<tr>
<td>End Document</td>
<td>&nbsp;</td>
<td>ED</td></tr>
<tr>
<td rowspan="2">Start Element</td>
<td rowspan="2"><emph>qname</emph>
</td>
<td>SE ( 
<emph>qname</emph> )</td>
<td rowspan="3"><specref ref="ElementInformationItem"/></td></tr>
<tr>
<td>SE ( 
<emph>*</emph> )</td></tr>
<tr>
<td>End Element</td>
<td>&nbsp;</td>
<td>EE</td></tr>
<tr>
<td rowspan="2">Attribute</td>
<td rowspan="2"><emph>qname, value</emph></td>
<td>AT ( 
<emph>qname</emph> )</td>
<td rowspan="2"><specref ref="AttributeInformationItem"/></td></tr>
<tr>
<td>AT ( 
<emph>*</emph> )</td></tr>
<tr>
<td>Characters</td>
<td><emph>value</emph></td>
<td>CH</td>
<td><specref ref="CharacterInformationItem"/></td></tr>
<tr>
<td>Namespace Declaration</td>
<td>
<emph>prefix</emph>, <emph>uri</emph>, <emph>indicator</emph>
</td>
<td>NS</td>
<td><specref ref="NamespaceInformationItem"/></td></tr>
<tr>
<td>Comment</td>
<td>
<emph>text</emph></td>
<td>CM</td>
<td><specref ref="CommentInformationItem"/></td></tr>
<tr>
<td>Processing Instruction</td>
<td>
<emph>name, text</emph></td>
<td>PI</td>
<td><specref ref="ProcessingInstructionInformationItem"/></td></tr>
<tr>
<td>DOCTYPE</td>
<td>
<emph>name, public, system, text</emph></td>
<td>DT</td>
<td><specref ref="DocumentTypeDeclaractionInformationItem"/></td></tr>
<tr>
<td>Entity Reference</td>
<td>
<emph>name</emph></td>
<td>ER</td>
<td><specref ref="UnexpandedEntityInformationItem"/></td></tr></tbody></table>
<p>Section 
<specref ref="encodingEvents"/> describes the algorithm used to encode events in the EXI stream. 
As indicated in the table above, there are some event types that carry content with their event instances while other event types function as markers without content. A grammar production may match a specific Element or Attribute by <emph>qname</emph> or match any Element or Attribute using wildcard notation. When a grammar matches an Element or Attribute by <emph>qname</emph>, the <emph>qname</emph> is not part of the content. When a grammar matches an Element or Attribute using wildcard notation, the <emph>qname</emph> is part of the content.</p>
<p>SE events may be followed by a series of NS events. Each NS event either associates a prefix with a URI, assigns a default namespace, or, in the case of a namespace declaration with an empty URI, rescinds one such association in effect at the point of its occurrence. The effect of the association or disassociation caused by an NS event stays in effect until the corresponding EE event occurs. The series of NS events may include at most one NS event that carries an <emph>indicator</emph> value of 1. When the <emph>indicator</emph> has the value of 1, the <emph>prefix</emph> of the NS event is used as the effective prefix of the element's <emph>qname</emph>. The <emph>uri</emph> of an NS event with an <emph>indicator</emph> value of 1 MUST match the <emph>uri</emph> of the associated SE event.
</p>
<p>Each item in the event content has a data type associated with it as shown in the following table. The content of each event, if any, is encoded as a sequence of items, each encoded according to its data type, in order from the first item to the last.</p>
<table border="1" width="95%" id='table2'>
<caption>Data types of event content items</caption>
<colgroup width="20%"/>
<colgroup width="30%"/>
<colgroup width="50%"/>
<thead>
<tr>
<th>Content item</th>
<th>Used in</th>
<th>Type</th></tr>
</thead>
<tbody>
<tr>
<td id="key-indicatorContentItem">
<emph>indicator</emph></td>
<td>NS</td>
<td>
<specref ref="encodingBoolean"/></td></tr>
<tr>
<td id="key-nameContentItem">
<emph>name</emph></td>
<td>PI, DT, ER</td>
<td>
<specref ref="encodingString"/></td></tr>
<tr>
<td id="key-prefixContentItem">
<emph>prefix</emph></td>
<td>NS</td>
<td>
<specref ref="encodingString"/></td></tr>
<tr>
<td id="key-publicContentItem">
<emph>public</emph></td>
<td>DT</td>
<td>
<specref ref="encodingString"/></td></tr>
<tr>
<td id="key-qnameContentItem">
<emph>qname</emph></td>
<td>SE, AT</td>
<td>
<specref ref="encodingQName"/></td></tr>
<tr>
<td id="key-systemContentItem">
<emph>system</emph></td>
<td>DT</td>
<td>
<specref ref="encodingString"/></td></tr>
<tr>
<td id="key-textContentItem">
<emph>text</emph></td>
<td>CM, PI</td>
<td>
<specref ref="encodingString"/></td></tr>
<tr>
<td id="key-uriContentItem">
<emph>uri</emph></td>
<td>NS</td>
<td>
<specref ref="encodingString"/></td></tr>
<tr>
<td id="key-valueContentItem">
<emph>value</emph></td>
<td>CH, AT</td>
<td>According to the schema type (see 
<specref ref="encodingValues"/>) if any is in effect and the <termref def="key-preserveLexicalValuesOption">preserve.lexicalValues</termref> option is set to false, otherwise <specref ref="encodingString"/></td></tr></tbody></table>

<p>Content items other than <emph>value</emph> have inherent, fixed data types independent of their uses. The data type that governs each occurrence of the <emph>value</emph> item depends on the schema type, if any, that is in effect for the value in question. The type xsd:anySimpleType is used for <emph>value</emph>s that do not have an associated schema type, are schema-invalid, or occur in mixed content. Section 
<specref ref="encodingValues"/> describes how each of the types listed above is encoded in an EXI stream. </p></div1>
<div1 id="header">
<head>EXI Header</head>
<p>Each EXI stream begins with an EXI header. The EXI header
distinguishes EXI documents from text XML documents, identifies the
version of the EXI format being used, and can specify the options used to encode
the body of the EXI stream. The EXI header has the following
structure:</p>

<table border="1">
<tbody>
<tr>
<td align="center" width="160"><termref def="key-distinguishingbits">Distinguishing Bits</termref></td>
<td align="center" width="160">
<table border="0">
<tbody>
<tr><td align="center">Presence Bit</td></tr>
<tr><td align="center">for EXI Options</td></tr>
</tbody>
</table></td>
<td align="center" width="160"><termref def="key-version">EXI Format Version</termref></td>
<td align="center" width="160">[<termref def="key-options">EXI Options</termref>]</td>
<td align="center" width="160">[Padding Bits]</td></tr>
</tbody>
</table>
<p>The EXI Options field within an EXI header is optional.  Its presence is indicated by
the value of the presence bit that follows the <termref def="key-distinguishingbits">Distinguishing Bits</termref>. Presence is indicated by the value 1 and absence by the value 0.</p>


<p>When either <termref def="key-compressionOption">compression</termref> or
<termref def="key-alignmentOption">alignment</termref> of the EXI stream is turned on
by the EXI options, the minimum number of padding bits required to make the
length of the header a whole number of bytes is added at the end of the header.</p>
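<p>As a rough illustration of the padding rule above, the number of padding bits can be computed from the length of the header in bits. This is a sketch only: the function name <code>padding_bits</code> and the 13-bit options length used in the second example are assumptions made for illustration and are not defined by this specification.</p>
<eg xml:space="preserve"><![CDATA[
```python
def padding_bits(header_bits: int) -> int:
    """Return the minimum number of padding bits that byte-aligns the header.

    header_bits is the length of the header before padding, in bits
    (Distinguishing Bits + presence bit + version field + any EXI Options).
    """
    if header_bits < 0:
        raise ValueError("header length cannot be negative")
    # Distance to the next multiple of 8; 0 when already byte-aligned.
    return (-header_bits) % 8


# 2 Distinguishing Bits + 1 presence bit + 5-bit version field = 8 bits.
print(padding_bits(2 + 1 + 5))       # 0
# Same header followed by a hypothetical 13-bit EXI Options field.
print(padding_bits(2 + 1 + 5 + 13))  # 3
```
]]></eg>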

<p>The following sections describe the remaining three parts of the header.</p>

<div2 id="DistinguishingBits">
<head>Distinguishing Bits</head>
<p><termdef id="key-distinguishingbits" term="Distinguishing Bits">
An EXI header starts with the <term>Distinguishing Bits</term> part,
which is a two-bit field used to distinguish EXI documents from text
XML documents</termdef>. The first bit
contains the value 1 and the second bit contains the value 0, as follows.</p>

<table border="1">
<tbody>
<tr class="bitcell">
<td align="center" class="bitcell">1</td>
<td align="center" class="bitcell">0</td></tr>
</tbody>
</table>

<p>This 2-bit sequence is the minimum that suffices to distinguish EXI
documents from XML documents since it is the minimum length bit
pattern that cannot occur as the first two bits of a well-formed XML
document represented in any one of the conventional character
encodings, such as UTF-8, UTF-16, UCS-2, UCS-4, EBCDIC, ISO 8859,
Shift-JIS and EUC, according to XML 1.0 <bibref ref="XML10"/>. Therefore, XML
processors are expected to reject an EXI stream as soon as they read
and process the first byte from the stream.</p>

<p>
Systems that use EXI documents as well as XML documents can look at
the Distinguishing Bits to determine whether to interpret a particular
stream as XML or EXI.
</p>
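<p>A system that accepts both kinds of input might perform this check along the following lines. The sketch assumes that the first byte of the stream is already available as an integer and relies on values being packed most significant bit first; the function name <code>looks_like_exi</code> is hypothetical.</p>
<eg xml:space="preserve"><![CDATA[
```python
def looks_like_exi(first_byte: int) -> bool:
    """Return True if the two most significant bits of the byte are 1 0."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    # The Distinguishing Bits occupy the top two bits of the first byte.
    return (first_byte >> 6) == 0b10


print(looks_like_exi(0x80))      # True: the byte starts with bits 1 0
print(looks_like_exi(ord("<")))  # False: 0x3C starts with bits 0 0
```
]]></eg>
<p>A positive result only indicates that the stream is not a well-formed XML document in one of the conventional encodings; it does not by itself establish that the remainder of the stream is valid EXI.</p>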

<ednote>
<edtext>

In addition to distinguishing EXI from XML, the 2-bit distinguishing
bit pattern given above can distinguish EXI streams from quite a broad
range of popular content types that occur on the web, such as PNG.
However, the working group is actively considering the introduction of
a larger set of bits, such as a magic cookie, to distinguish EXI from a
broader range of data types.

</edtext>

</ednote>

</div2>

<div2 id="version">
<head>EXI Format Version</head>
<p><termdef id="key-version" term="EXI Format Version">
The third part in the EXI header is the <term>EXI Format Version</term>, which identifies the version
of the EXI format being used.</termdef>
EXI format version numbers are integers. Each version of the EXI Format Specification specifies the corresponding EXI format version number to be used by conforming implementations. The EXI format version number that corresponds with this version of the EXI format specification is 1 (one).</p>

<p>The first bit of the version field indicates whether the version is a preview or final version of the EXI format.
A value of 0 indicates this is a final version and a value of 1 indicates this is a preview
version. Final versions correspond to final, approved versions of the EXI format specification.
An <termref def="key-exiprocessor">EXI processor</termref> that implements a final version of the EXI format specification is REQUIRED to process EXI streams that have a version field with its first bit set to 0 followed by a version number that corresponds to the version of the EXI specification the processor implements.
<!-- <termref def="key-exiprocessor">EXI processors</termref> are REQUIRED to process EXI streams that have a version field with its first bit set to 0 and thereof MUST conform to the version of EXI specification that corresponds to the EXI format version that is in use in the stream. -->
Preview versions of the EXI format are useful for
gaining implementation and deployment experience prior to finalizing a
particular version of the EXI format. While preview versions may match drafts of this specification, they are not governed by this specification and the behaviour of EXI processors encountering preview versions of the EXI format is implementation dependent. Implementers are free to coordinate to achieve interoperability between different preview versions of the EXI format.
</p>

<p>Following the first bit of the version is a sequence of one or more
4-bit unsigned integers representing the version number. The version
number is determined by summing this sequence of 4-bit unsigned
values and adding 1. The sequence is terminated by any 4-bit unsigned integer with
a value in the range 0-14. As such, the first 15 version numbers are
represented by 4 bits, the next 15 are represented by 8 bits, etc.</p>

<p>Given an EXI stream with its stream cursor positioned just past the first bit of the EXI format version field, the EXI format version number can be computed by going through the following steps with version number initially set to 1.</p>
<olist>
<item>Read the next 4 bits as an unsigned integer value.</item>
<item>Add the value that was just read to the version number.</item>
<item>If the value is 15, go to step 1, otherwise (i.e. the value being in the range of 0-14), use the current value of the version number as the EXI version number.</item>
</olist>
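<p>The steps above can be sketched in Python as follows. This is an illustrative, non-normative sketch: the function names are hypothetical, the version field is modelled as a string of "0" and "1" characters, and each 4-bit group is assumed to be read most significant bit first, which is consistent with the examples in this section.</p>
<eg xml:space="preserve"><![CDATA[
```python
def read_version_number(bits):
    """Compute the version number from an iterator of 0/1 integers.

    The iterator is assumed to be positioned just past the first
    (preview/final) bit of the EXI format version field.
    """
    version = 1                      # the version number starts at 1
    while True:
        value = 0
        for _ in range(4):           # step 1: read the next 4 bits
            value = (value << 1) | next(bits)
        version += value             # step 2: add the value just read
        if value != 15:              # step 3: a value of 0-14 ends the sequence
            return version


def decode_version_field(field):
    """Return (is_preview, version_number) for a field such as '0 1111 0001'."""
    bits = (int(ch) for ch in field if ch in "01")
    is_preview = next(bits) == 1     # 1 = preview version, 0 = final version
    return is_preview, read_version_number(bits)


def encode_version_field(is_preview, version):
    """Inverse of decode_version_field (hypothetical helper)."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    remaining = version - 1
    groups = []
    while remaining >= 15:           # 1111 signals that another group follows
        groups.append("1111")
        remaining -= 15
    groups.append(format(remaining, "04b"))
    return " ".join(["1" if is_preview else "0"] + groups)
```
]]></eg>
<p>Under these assumptions, <code>decode_version_field("0 1111 0001")</code> yields a final version with version number 17, and <code>encode_version_field</code> produces the same field layout as the examples in this section.</p>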

<p>The following are example EXI format version numbers.</p>

<example>
<head>EXI Format Version Examples</head>
<table border="1">
<!-- caption>EXI Version Examples</caption -->
<thead>
<tr>
<th width="200">EXI Format Version Field</th>
<th width="200">Description</th></tr>
</thead>
<tbody>
<tr>
<td>&nbsp;&nbsp;1 0000</td>
<td>&nbsp;&nbsp;Preview version 1</td>
</tr>
<tr>
<td>&nbsp;&nbsp;0 0000</td>
<td>&nbsp;&nbsp;Final version 1</td>
</tr>
<tr>
<td>&nbsp;&nbsp;0 1110</td>
<td>&nbsp;&nbsp;Final version 15</td>
</tr>
<tr>
<td>&nbsp;&nbsp;0 1111 0000</td>
<td>&nbsp;&nbsp;Final version 16</td>
</tr>
<tr>
<td>&nbsp;&nbsp;0 1111 0001</td>
<td>&nbsp;&nbsp;Final version 17</td>
</tr>
</tbody>
</table>
</example>

<p><termref def="key-exiprocessor">EXI processors</termref> conforming to the final version of this
specification MUST use the 5-bit value 0 0000 as the EXI format version
field.</p>

</div2>
<div2 id="options">
<head>EXI Options</head>
<p><termdef id="key-options" term="EXI Options">The fourth part of the EXI
header is the <term>EXI Options</term>, which provides a way to specify the
options used to encode the body of the EXI stream</termdef>.
<termdef id="key-optionsDoc" term="EXI Options document">

The EXI Options are represented as an <term>EXI Options document</term>, which is an XML document encoded using the EXI format described in this specification.

</termdef>
This results in a very compact header
format that can be read and written with very little additional software.
</p>
<p>The presence of the EXI Options as a whole is optional in the EXI header,
and is predicated on the value of the presence bit that follows the
<termref def="key-distinguishingbits">Distinguishing Bits</termref>.
When EXI Options are present in the header, an EXI processor MUST observe the
specified options to process the EXI stream that follows. Otherwise,
an EXI processor may obtain the EXI Options using another mechanism.</p>
<p>
<termref def="key-exiprocessor">EXI processors</termref> MAY provide external means for applications or users to
specify EXI Options when the EXI header is absent.
Such <termref def="key-exiprocessor">EXI processors</termref> are typically used in controlled systems
where the knowledge about the effective EXI Options is shared prior to
the exchange of EXI documents. The mechanism used to communicate out-of-band
EXI Options and their representation in such systems are implementation dependent.</p>
<p>The following table describes the EXI options specified in the
options field.</p>

<table border="1">
<caption>EXI Options in Options Field</caption>
<thead>
<tr>
<th>EXI Option</th>
<th>Description</th>

<th>Default Value</th>

</tr>
</thead>
<tbody>
<!-- tr>
<td>strict</td>
<td>Strict interpretation of schema is used to achieve better compactness</td>
</tr -->
<tr>
<td>

<termref def="key-alignmentOption">alignment</termref>

</td>
<td>
Alignment of event codes and content items

</td>
<td>

<termref def="key-unaligned">bit-packed</termref>

</td>
</tr>
<tr>
<td><termref def="key-compressionOption">compression</termref></td>
<td>EXI compression is used to achieve better compactness</td>
<td>

false

</td>
</tr>
<tr>
<td><termref def="key-fragmentOption">fragment</termref></td>
<td>Body is encoded as an <termref def="key-exifragment">EXI fragment</termref> instead of an EXI document</td>
<td>

false

</td>
</tr>
<tr>
<td><termref def="key-preserveOption">preserve</termref></td>
<td>Specifies whether comments, processing instructions, etc. are preserved</td>
<td>

all false

</td>
</tr>
<tr>
<td><termref def="key-schemaIDOption">schemaID</termref></td>
<td>Identifies the schema information, if any, used to encode the body</td>
<td>

none

</td>
</tr>
<tr>
<td><termref def="key-codecMapOption">codecMap</termref></td>
<td>Identifies pluggable CODECs used to encode the body</td>
<td>

none

</td>
</tr>
<tr>
<td><termref def="key-blockSizeOption">blockSize</termref></td>
<td>Specifies the block size used for EXI compression</td>
<td>

1,000,000

</td>
</tr>
<tr>
<td>[user defined]</td>
<td>User defined headers may be added</td>
<td>

none

</td>
</tr>
</tbody>
</table>

<p>Appendix <specref ref="optionsSchema"/> provides an XML Schema
describing <termref def="key-optionsDoc">the EXI Options document</termref>.
This schema is designed to produce smaller headers
for option combinations used when compactness is critical.</p>

<p>The <termref def="key-optionsDoc">EXI Options document</termref> is
encoded as an EXI body using the default options specified by the following XML document:</p>

<reprdef>
<head>Header options used for encoding the <termref def="key-optionsDoc">EXI Options document</termref></head>
<repr xml:space="preserve">
  &lt;header xmlns="&exins;"&gt;
  &lt;/header&gt;
</repr>
</reprdef>

<ednote>
<edtext>
The above EXI Options document for encoding <termref def="key-optionsDoc">EXI Options documents</termref> will be revised to use "strict" option when the feature is specified in this specification.</edtext>
</ednote>

<p><termdef id="key-alignmentOption">The <term>alignment option</term> is used to control the alignment of event codes and content items.</termdef> The value is one of <termref def="key-unaligned">bit-packed</termref>, <termref def="key-bytealignment">byte-alignment</termref> or <termref def="key-precompression">pre-compression</termref>, of which <termref def="key-unaligned">bit-packed</termref> is the default value assumed when the "alignment" element is absent in the <termref def="key-optionsDoc">EXI Options document</termref>.</p>

<p><termdef id="key-unaligned">Alignment option value <term>bit-packed</term> indicates that the event codes and associated content are packed in bits without any padding in between.</termdef>
</p>

<p><termdef id="key-bytealignment">Alignment option value <term>byte-alignment</term> indicates that the event codes and associated content are aligned on byte boundaries.</termdef> While byte-alignment generally results in larger EXI streams than their bit-packed equivalents, byte-alignment can help in use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.</p>

<p>
<termdef id="key-precompression">Alignment option value <term>pre-compression</term> indicates that all steps involved in compression (see section <specref ref="compression"/>) are to be done with the exception of the final step of applying the DEFLATE algorithm.</termdef> The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.
</p>

<p>
<termdef id="key-compressionOption">The <term>compression option</term> is a Boolean used to increase compactness using additional computational resources.</termdef> The default value "false" is assumed when the "compression" element is absent in the <termref def="key-optionsDoc">EXI Options document</termref>.
When set to true, the event codes and associated content are compressed according to <specref ref="compression"/> regardless of the <termref def="key-alignmentOption">alignment</termref> option value.
</p>

<!-- p>If <termref def="key-compressionOption">compression</termref> or  
<termref def="key-alignmentOption">alignment</termref> are off, the event codes and 
associated content are represented as a sequence of bit-encoded values. 
</p -->

<p>
<termdef id="key-fragmentOption">The <term>fragment option</term> is a Boolean that indicates whether the EXI body is an EXI document or an EXI fragment.</termdef> When set to true, the EXI body is an EXI fragment. Otherwise, the EXI body is an EXI document. <termdef id="key-exifragment" term="EXI fragment"><term>EXI fragments</term> are analogous in concept to <xspecref spec="XML" ref='wf-entities'>external general parsed entities</xspecref> in XML in that they consist of a sequence of elements, processing instructions and comments in containers of their own that are physically separate from the documents in which they are to be used.</termdef> An EXI fragment is formally defined in terms of its grammar in Section <specref ref="builtinFragGrammars"/>. The XML Information Set onto which an EXI stream is mapped contains a document information item if the stream represents an EXI document, and does not contain one if the stream represents an EXI fragment. The order among elements, processing instructions and comments that appear at the root in an EXI fragment is deemed significant and MUST be preserved by <termref def="key-exiprocessor">EXI processors</termref>.</p>

<p><termdef id="key-preserveOption">The <term>preserve option</term> is a set of Booleans that can be set independently to control whether certain information items are preserved in the EXI stream.</termdef> <specref ref="fidelityOptions"/> describes the set of information items affected by the preserve option.</p>

<p><termdef id="key-schemaIDOption">The <term>schemaID option</term> may be used to identify the schema information used to encode the EXI body.</termdef> When the schemaID is nil, no schema information was used to encode the EXI body. When the schemaID option is absent (i.e., undefined), no statement is made about the schema information used to encode the EXI body and it is assumed this information is communicated out of band.</p>

<p><termdef id="key-codecMapOption">The <term>codecMap option</term> identifies pluggable CODECs used to encode the EXI body as described in <specref ref="pluggableCodecs"/>.</termdef></p>

<p><termdef id="key-blockSizeOption">The <term>blockSize option</term> specifies the block size used for EXI compression.</termdef> When the blockSize option is absent, the default blockSize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.</p>



<!-- It is encoded using the schema in Appendix <specref ref="optionsSchema"/>
with the options specified by the following XML document: -->

</div2>
</div1>

<div1 id="encodingEvents">
<head>Encoding EXI Streams</head>
<p>The rules for encoding a series of events as an EXI stream are very
simple and are driven by a declarative set of grammars that describes
the structure of an EXI stream. Every event in the stream is
encoded using the same set of encoding rules, which are summarized as
follows: </p>
<olist>
<item>Get the next event to be encoded</item>
<item>If fidelity options indicate this event type is not processed,
go to step 1</item>
<item>Use the grammars to determine the <termref def="key-eventcode">event code</termref> of the event</item>
<item>Encode the event code followed by the event content</item>
<item>Evaluate the grammar production matched by the event</item>
<item>Repeat until the End Document (ED) event is encoded</item></olist>

<p>Namespace (NS) events are encoded before attribute (AT) events in the stream following the associated start element (SE) event. 

When <termref def="key-builtinElementGrammar">built-in element grammars</termref> are used, attribute events can occur in any order. Otherwise, when <termref def="key-informedElementGrammar">schema-informed grammars</termref> are used for processing an element, the AT(xsi:type) event comes first if present, followed by the AT(xsi:nil) event if present, followed by the rest of the attribute (AT) events in lexical order sorted first by <emph>qname</emph>'s local-name then by <emph>qname</emph>'s URI. Namespace (NS) events can occur in any order regardless of the grammars used for processing the associated element.</p>
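<p>The attribute ordering used with schema-informed grammars can be sketched as a sort key. This is a non-normative illustration with several assumptions: attributes are modelled as (URI, local-name) tuples, the function name is hypothetical, and "lexical order" is taken to mean the code point order that Python uses when comparing strings.</p>
<eg xml:space="preserve"><![CDATA[
```python
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_informed_attribute_order(attributes):
    """Return attributes, given as (uri, local_name) tuples, in AT event order."""
    def sort_key(qname):
        uri, local_name = qname
        if qname == (XSI, "type"):
            return (0, "", "")           # AT(xsi:type) comes first
        if qname == (XSI, "nil"):
            return (1, "", "")           # AT(xsi:nil) comes second
        return (2, local_name, uri)      # then by local-name, then by URI
    return sorted(attributes, key=sort_key)
```
]]></eg>
<p>For instance, attributes named "b" and "a" in no namespace, plus "a" in the namespace "urn:x", would be ordered as "a" (no namespace), "a" ("urn:x"), "b" under this sketch, after any xsi:type and xsi:nil attributes.</p>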

<p>EXI uses the same simple procedure described above to encode well-formed documents, document fragments, schema-valid information items, schema-invalid information items, information items partially described by schemas and information items with no schema at all. Only the grammars that describe these items differ. For example, an element with no schema information is encoded according to the XML grammar defined by the XML specification, while an element with schema information is encoded according to the more specific grammar defined by that schema. </p>

<p><termdef id="key-eventcode" term="Event Code">An <term>event code</term> is a sequence of 1 to 3 non-negative integers called parts. Each production in a grammar has an event code that distinguishes its event from that of other productions that share the same left-hand-side non-terminal symbol. </termdef></p>

<p>Section 
<specref ref="eventCodes"/> describes in detail how the grammar is used to determine the event code of an event. Section 
<specref ref="encodingEventCodes"/> describes in detail how event codes are represented as bits. Section 
<specref ref="fidelityOptions"/> describes available fidelity options and how they affect the EXI stream. Section 
<specref ref="encodingValues"/> describes how the typed event contents are represented as bits. </p>
<div2 id="eventCodes">
<head>Determining Event Codes</head>
<p>The structure of an EXI stream is described by the EXI grammars, which are formally specified in section 
<specref ref="grammars"/>. Each grammar defines which events are permitted to occur at any given point in the EXI stream and provides a pre-assigned event code for each event.</p>

<p>For example, the grammar productions below describe the events that can occur in a schema-informed EXI stream after the Start-Document (SD) event, provided there are four global elements defined in the schema, and provide an event code for each event:
</p>
<example>
<head>Example productions with event codes</head>

<table width="95%">
<thead>
<tr>
<th align="left" colspan="3">Syntax</th>
<th align="left">Event Code</th></tr>
</thead>
<tbody>
<tr>
<td width="5%"></td>
<td colspan="3"><emph>DocContent</emph></td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">SE ("A") 
<emph>DocEnd</emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">SE ("B") 
<emph>DocEnd</emph></td>
<td>1</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">SE ("C") 
<emph>DocEnd</emph></td>
<td>2</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">SE ("D") 
<emph>DocEnd</emph></td>
<td>3</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">SE (*) 
<emph>DocEnd</emph></td>
<td>4.0</td></tr>
<tr>
<td></td>
<td></td>
<td>DT 
<emph>DocContent</emph></td>
<td>4.1</td></tr>
<tr>
<td></td>
<td></td>
<td>CH 
<emph>DocContent</emph></td>
<td>4.2</td></tr>
<tr>
<td></td>
<td></td>
<td>CM 
<emph>DocContent</emph></td>
<td>4.3.0</td></tr>
<tr>
<td></td>
<td></td>
<td>PI 
<emph>DocContent</emph></td>
<td>4.3.1</td></tr></tbody></table></example>
<p>At the point in an EXI stream where the above grammar productions are in effect, the event code of Start Element "A" (i.e. SE("A")) is 0. The event code of a DOCTYPE (DT) event at this point in the stream is 4.1, and so on. 
</p>
</div2>

<div2 id="encodingEventCodes">
<head>Representing Event Codes</head>

<p>Each event code is represented by a sequence of 1 to 3 parts that uniquely identify an event. 
Event code parts are encoded in order starting with the first part followed by subsequent parts.</p>

<p>When EXI compression and alignment are not in effect for the current processing of the stream, 
the <emph>i</emph>th part of an event code is encoded using the minimum number of bits required to distinguish it from the <emph>i</emph>th part of the other sibling event codes in the current grammar. Specifically, the 
<emph>i</emph>th part of an event code is encoded as an <emph>n</emph>-bit unsigned integer (<specref ref="encodingBoundedUnsigned" />), of which 
<emph>n</emph> is &lceil; log <sub>2</sub> <emph>m</emph> &rceil; where <emph>m</emph> is the number of distinct values used as the 
<emph>i</emph>th part of its own and all its sibling event codes in the current grammar.
<!-- In cases, where there is only one distinct value for a given part, the part is omitted (i.e., encoded in log 
<sub>2</sub> 1 = 0 bits). -->
Two event codes are siblings at the <emph>i</emph>th part if and only if they share the same values in all preceding parts. All event codes are siblings at the first part.
</p>
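<p>The bit-width rule above can be sketched as follows, using the event codes of the example <emph>DocContent</emph> productions. This is a non-normative illustration: the function names and the representation of event codes as tuples of parts are assumptions made for the sketch. The quantity &lceil; log <sub>2</sub> <emph>m</emph> &rceil; is computed with integer arithmetic to avoid floating-point rounding.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Event codes of the example DocContent productions, one tuple of parts each.
DOC_CONTENT = {
    "SE(A)": (0,), "SE(B)": (1,), "SE(C)": (2,), "SE(D)": (3,),
    "SE(*)": (4, 0), "DT": (4, 1), "CH": (4, 2),
    "CM": (4, 3, 0), "PI": (4, 3, 1),
}


def part_width_bits(code, i, all_codes):
    """Number of bits used for part i (0-based) of code in the given grammar."""
    prefix = code[:i]
    # Siblings at part i share the same values in all preceding parts.
    distinct = {c[i] for c in all_codes if len(c) > i and c[:i] == prefix}
    m = len(distinct)
    return (m - 1).bit_length()      # equals ceil(log2(m)) for m >= 1


def encode_bit_packed(code, all_codes):
    """Render the bit-packed event code as a string, one group per part."""
    groups = []
    for i, part in enumerate(code):
        n = part_width_bits(code, i, all_codes)
        if n:                        # a part with n == 0 is omitted
            groups.append(format(part, "0{}b".format(n)))
    return " ".join(groups)
```
]]></eg>
<p>With these definitions, the event code of CM (parts 4, 3 and 0) is rendered as three groups of 3, 2 and 1 bits.</p>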
<p>When the EXI events are subsequently subject to EXI compression or alignment, 
the <emph>i</emph>th part of an event code is encoded using the minimum number of bytes instead of 
bits required to distinguish it from the <emph>i</emph>th part of the other sibling event codes in 
the current grammar.  Each part is encoded as an <emph>n</emph>-bit unsigned integer 
(<specref ref="encodingBoundedUnsigned" />), of which 
<emph>n</emph> is &lceil; log <sub>2</sub> <emph>m</emph> &rceil; where <emph>m</emph> is the 
number of distinct values used as the 
<emph>i</emph>th part of its own and all its sibling event codes in the current grammar.
The number of bytes used for the <emph>n</emph>-bit unsigned integer representation in this case 
is equal to &lceil; <emph>n</emph> / 8 &rceil;.</p>
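<p>The byte counts used when EXI compression or alignment is in effect can be sketched in the same way. This non-normative illustration only computes lengths. The function names and the tuple representation of event codes are assumptions, and the layout of parts that need more than one byte is not covered here.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Event codes of the example DocContent productions, one tuple of parts each.
DOC_CONTENT_CODES = [
    (0,), (1,), (2,), (3,),
    (4, 0), (4, 1), (4, 2),
    (4, 3, 0), (4, 3, 1),
]


def bytes_per_part(code, i, all_codes):
    """Number of bytes used for part i (0-based) of code in the given grammar."""
    prefix = code[:i]
    # Siblings at part i share the same values in all preceding parts.
    m = len({c[i] for c in all_codes if len(c) > i and c[:i] == prefix})
    n = (m - 1).bit_length()         # n = ceil(log2(m)) bits
    return (n + 7) // 8              # ceil(n / 8) bytes


def event_code_length_bytes(code, all_codes):
    """Total number of bytes used to represent all parts of an event code."""
    return sum(bytes_per_part(code, i, all_codes) for i in range(len(code)))
```
]]></eg>
<p>For the example grammar, every part occupies one byte, so an event code takes as many bytes as it has parts.</p>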

<p>Regardless of the EXI compression and alignment options, if there is only one distinct value for a given part, the part is omitted (i.e., encoded in log <sub>2</sub> 1 = 0 bits = 0 bytes).
</p>
<p>For example, the nine event codes shown in the 
<emph>DocContent</emph> grammar above have a value ranging from 0 to 4 for their first part. There are five distinct values needed to identify the first part of these event codes. Therefore, when EXI compression and alignment are not in effect, the first part can be encoded in &lceil; log <sub>2</sub> 5 &rceil; = 3 bits. In the same fashion, the numbers of bits used for encoding the second and third parts (if present) are calculated as &lceil; log <sub>2</sub> 4 &rceil; = 2 bits and &lceil; log <sub>2</sub> 2 &rceil; = 1 bit, respectively.
On the other hand, when EXI compression or alignment is in effect, the number of bytes used for each part is &lceil; 3 / 8 &rceil; = 1 byte for the first part, &lceil; 2 / 8 &rceil; = 1 byte for the second part and &lceil; 1 / 8 &rceil; = 1 byte for the third part.</p>

<p>The tables below illustrate how the event code of each event in the 
<emph>DocContent</emph> grammar above is encoded. </p>
<example>
<head>Example event code encoding</head>
<p></p>
<table border="1" width="95%">
<caption>Example event code encoding when EXI compression and alignment are not in effect</caption>
<colgroup></colgroup>
<colgroup span="3" align="center"></colgroup>
<colgroup></colgroup>
<colgroup align="center"></colgroup>
<thead>
<tr>
<th width="30%">Event</th>
<th colspan="3">Part values</th>
<th width="40%">Event Code Encoding</th>
<th width="10%"># bits</th></tr>
</thead>
<tbody>
<tr>
<td>SE ("A")</td>
<td>0</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>000</td><td>3</td></tr>
<tr>
<td>SE ("B")</td>
<td>1</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>001</td><td>3</td></tr>
<tr>
<td>SE ("C")</td>
<td>2</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>010</td><td>3</td></tr>
<tr>
<td>SE ("D")</td>
<td>3</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>011</td><td>3</td></tr>
<tr>
<td>SE (*)</td>
<td>4</td>
<td>0</td>
<td>&nbsp;</td>
<td>100&nbsp;&nbsp;00</td><td>5</td></tr>
<tr>
<td>DT</td>
<td>4</td>
<td>1</td>
<td>&nbsp;</td>
<td>100&nbsp;&nbsp;01</td><td>5</td></tr>
<tr>
<td>CH</td>
<td>4</td>
<td>2</td>
<td>&nbsp;</td>
<td>100&nbsp;&nbsp;10</td><td>5</td></tr>
<tr>
<td>CM</td>
<td>4</td>
<td>3</td>
<td>0</td>
<td>100&nbsp;&nbsp;11&nbsp;&nbsp;0</td><td>6</td></tr>
<tr>
<td>PI</td>
<td>4</td>
<td>3</td>
<td>1</td>
<td>100&nbsp;&nbsp;11&nbsp;&nbsp;1</td><td>6</td></tr></tbody></table>
<table border="1" width="95%">
<colgroup></colgroup>
<colgroup span="3" align="center"></colgroup>
<colgroup></colgroup>
<colgroup></colgroup>
<tbody>
<tr>
<td width="30%"># distinct values ( 
<emph>m</emph>)</td>
<td>5</td>
<td>4</td>
<td>2</td>
<td width="40%">&nbsp;</td><td width="10%">&nbsp;</td></tr>
<tr>
<td><table border="0">
<tr><td># bits per part</td></tr>
<tr><td>&nbsp;&nbsp;&lceil; log <sub>2</sub> <emph>m</emph> &rceil;</td></tr>
</table></td>
<td>3</td>
<td>2</td>
<td>1</td>
<td>&nbsp;</td><td>&nbsp;</td></tr></tbody></table>
<p></p>

<table border="1" width="95%">
<caption>Example event code encoding when EXI compression or alignment is in effect</caption>
<colgroup></colgroup>
<colgroup span="3" align="center"></colgroup>
<colgroup></colgroup>
<colgroup align="center"></colgroup>
<thead>
<tr>
<th width="30%">Event</th>
<th colspan="3">Part values</th>
<th width="40%">Event Code Encoding</th>
<th width="10%"># bytes</th></tr>
</thead>
<tbody>
<tr>
<td>SE ("A")</td>
<td>0</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>00000000</td><td>1</td></tr>
<tr>
<td>SE ("B")</td>
<td>1</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>00000001</td><td>1</td></tr>
<tr>
<td>SE ("C")</td>
<td>2</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>00000010</td><td>1</td></tr>
<tr>
<td>SE ("D")</td>
<td>3</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>00000011</td><td>1</td></tr>
<tr>
<td>SE (*)</td>
<td>4</td>
<td>0</td>
<td>&nbsp;</td>
<td>00000100&nbsp;&nbsp;00000000</td><td>2</td></tr>
<tr>
<td>DT</td>
<td>4</td>
<td>1</td>
<td>&nbsp;</td>
<td>00000100&nbsp;&nbsp;00000001</td><td>2</td></tr>
<tr>
<td>CH</td>
<td>4</td>
<td>2</td>
<td>&nbsp;</td>
<td>00000100&nbsp;&nbsp;00000010</td><td>2</td></tr>
<tr>
<td>CM</td>
<td>4</td>
<td>3</td>
<td>0</td>
<td>00000100&nbsp;&nbsp;00000011&nbsp;&nbsp;00000000</td><td>3</td></tr>
<tr>
<td>PI</td>
<td>4</td>
<td>3</td>
<td>1</td>
<td>00000100&nbsp;&nbsp;00000011&nbsp;&nbsp;00000001</td><td>3</td></tr></tbody></table>
<table border="1" width="95%">
<colgroup></colgroup>
<colgroup span="3" align="center"></colgroup>
<colgroup></colgroup>
<colgroup></colgroup>
<tbody>
<tr>
<td width="30%"># distinct values (<emph>m</emph>)</td>
<td>5</td>
<td>4</td>
<td>2</td>
<td width="40%">&nbsp;</td><td width="10%">&nbsp;</td></tr>
<tr>
<td><table border="0">
<tr><td># bytes per part</td></tr>
<tr><td>&nbsp;&nbsp;&lceil; (log <sub>2</sub> <emph>m</emph>) / 8 &rceil;</td></tr>
</table></td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>&nbsp;</td><td>&nbsp;</td></tr>
</tbody></table>
</example>

</div2>
<div2 id="fidelityOptions">
<head>Fidelity Options</head>
<p>Some XML applications do not require the entire XML feature set and would prefer to eliminate the overhead associated with unused features. For example, the SOAP 1.2 specification 
<bibref ref="soap12" /> prohibits the use of XML processing-instructions. In addition, there are many data-exchange use cases that do not require XML comments or DTDs. </p>
<p>Applications can use a set of fidelity options to specify the XML features they require. As specified in section 
<specref ref="pruningProductions"/>, EXI processors MUST use these fidelity options to prune the events that are not required from the grammars, improving compactness and processing efficiency. </p>
<p>The table below lists the fidelity options supported by this version of the EXI specification and describes the effect setting these options has on the EXI stream. </p>
<table border="1">
<caption>Fidelity options</caption>
<thead>
<tr>
<th>Fidelity option</th>
<th>Effect</th></tr>
</thead>
<tbody>
<tr>
<td>Preserve.comments</td>
<td>CM events are preserved</td></tr>
<tr>
<td>Preserve.pis</td>
<td>PI events are preserved</td></tr>
<!-- tr>
<td>Preserve.whitespace</td>
<td>CH events containing only insignificant whitespace are preserved</td></tr -->
<tr>
<td>Preserve.dtd</td>
<td>DOCTYPE and ER events are preserved</td></tr>
<tr>
<td id="key-preservePrefixesOption">Preserve.prefixes</td>
<td>NS events and namespace prefixes are preserved</td></tr>
<tr>
<td id="key-preserveLexicalValuesOption">Preserve.lexicalValues</td>
<td>Lexical form of element and attribute values is preserved</td></tr></tbody></table>
<p>EXI processors may report an error if the application attempts to encode events that have been pruned from the grammar or may simply ignore these events. </p>
<!-- p>Which whitespace is deemed to be insignificant, depends on the available schema information and the xml:space attribute. If xml:space=&quot;preserve&quot; for the current element context or a schema exists and specifies that the content model of the current element is mixed, then all whitespace inside the element is significant. Otherwise, only the whitespace that occurs between consecutive, corresponding start tags and end tags is significant. </p -->
</div2></div1>

<div1 id="encodingValues">
<head>Representing Event Content</head>
<p>The content of each event in an EXI body is represented according to its type (see <specref ref='table2'/>). In the absence of external type information or when the <termref def="key-preserveLexicalValuesOption">preserve.lexicalValues</termref> option is set to true, all attribute and character 
<emph>values</emph> are typed as String. </p>

<p><termdef id="key-exidatatype" term="EXI Datatype">EXI defines a minimal set of data types called <term>Built-in EXI datatypes</term> that define how values are represented in EXI streams as described in <specref ref="encodingDatatypes"/>.</termdef> The following table lists the built-in EXI datatypes, associated type identifiers and the XML Schema Language <bibref ref="schema2" /> built-in types each is used to represent by default.</p>
<table border="1" id="builtInEXITypes">
<caption>Built-in EXI Datatypes</caption>
<thead>
<tr>
<th>Built-in EXI Datatype</th>
<th>EXI Datatype ID</th>
<th colspan="2">
<xspecref spec="XS2" ref="built-in-datatypes">XML Schema Datatypes</xspecref>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">
<xspecref href="#encodingBinary">Binary</xspecref>
</td>
<td>xsd:base64Binary</td>
<td colspan="2"><emph>base64Binary</emph></td>
</tr>
<tr>
<!-- td/ -->
<td>xsd:hexBinary</td>
<td colspan="2"><emph>hexBinary</emph></td>
</tr>
<tr>
<td>
<xspecref href="#encodingBoolean">Boolean</xspecref>
</td>
<td>xsd:boolean</td>
<td colspan="2"><emph>boolean</emph></td>
</tr>
<tr>
<td>
<xspecref href="#encodingDateTime">Date-Time</xspecref>
</td>
<td>xsd:dateTime</td>
<td colspan="2"><emph>dateTime</emph>, <emph>time</emph>, <emph>date</emph>, <emph>gYearMonth</emph>, <emph>gYear</emph>, <emph>gMonthDay</emph>, <emph>gDay</emph>, <emph>gMonth</emph></td>
</tr>
<tr>
<td>
<xspecref href="#encodingDecimal">Decimal</xspecref>
</td>
<td>xsd:decimal</td>
<td colspan="2"><emph>decimal</emph></td>
</tr>
<tr>
<td>
<xspecref href="#encodingFloat">Float</xspecref>
</td>
<td>xsd:double</td>
<td colspan="2"><emph>float</emph>, <emph>double</emph></td>
</tr>
<tr>
<td>
<xspecref href="#encodingInteger">Integer</xspecref>
</td>
<td rowspan="3">xsd:integer</td>
<td colspan="2"><emph>integer</emph> without minInclusive or minExclusive facets, or with minInclusive or minExclusive facet of negative value</td></tr>
<tr>
<td>
<xspecref href="#encodingUnsignedInteger">Unsigned Integer</xspecref>
</td>
<!-- td>&nbsp;</td -->
<td colspan="2"><emph>nonNegativeInteger</emph> or <emph>integer</emph> with minInclusive or minExclusive facet value of 0 or above</td></tr>
<tr>
<td>
<xspecref href="#encodingBoundedUnsigned">n-bit Unsigned Integer</xspecref>
</td>
<!-- td>&nbsp;</td -->
<td colspan="2">
<emph>integer</emph> with bounded range of 4095 or smaller as determined by the values of minInclusive, minExclusive, maxInclusive and maxExclusive facets.
</td></tr>
<tr>
<td>
<xspecref href="#encodingString">String</xspecref>
</td>
<td>xsd:string</td>
<td colspan="2"><emph>string</emph>, <emph>anySimpleType</emph>, <emph>anyURI</emph>, <emph>duration</emph>, and all types derived by <emph>union</emph></td>
</tr>
<tr>
<td>
<xspecref href="#encodingList">List</xspecref>
</td>
<td>&nbsp;</td>
<td colspan="2">All types derived by <emph>list</emph>, including
<emph>IDREFS</emph> and <emph>ENTITIES</emph></td></tr>
<tr>
<td>
<xspecref href="#encodingQName">QName</xspecref>
</td>
<td>&nbsp;</td>
<td colspan="2">
<!-- All element and attribute <emph>qnames</emph>,--> <!-- NOTE: these are not schema types. -->
<!-- <xspecref href='http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#QName'>QName</xspecref>, <xspecref href='http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#Notation'>Notation</xspecref>--> 
<!-- note : the qname type is not used for element/attribute values - only for element/attribute names -->
&nbsp;</td></tr>
</tbody></table>
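<p>The choice among the three integer representations in the table above can be sketched as follows. This is a non-normative illustration built on assumptions that the table does not spell out: exclusive facets are assumed to have been converted to inclusive bounds by the caller, the "bounded range" is assumed to be the difference between the upper and lower bound, and the n-bit representation is assumed to take precedence when more than one row applies. The function name is hypothetical.</p>
<eg xml:space="preserve"><![CDATA[
```python
def integer_representation(min_inclusive=None, max_inclusive=None):
    """Pick a built-in EXI datatype for an integer type from its bounds."""
    bounded = min_inclusive is not None and max_inclusive is not None
    # Assumption: "bounded range" means max_inclusive - min_inclusive.
    if bounded and max_inclusive - min_inclusive <= 4095:
        return "n-bit Unsigned Integer"
    if min_inclusive is not None and min_inclusive >= 0:
        return "Unsigned Integer"
    # No lower bound, or a negative lower bound.
    return "Integer"
```
]]></eg>
<p>Under these assumptions, an integer type restricted to the values -10 through 10 would use the n-bit Unsigned Integer representation, while a type with only a lower bound of 0 would use Unsigned Integer.</p>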
<p>By default, types derived from the XML Schema types above are also represented by the associated <termref def="key-exidatatype">built-in EXI datatype</termref>. When a type is derived, directly or indirectly, from more than one of the XML Schema types above, the closest ancestor is used to determine the <termref def="key-exidatatype">built-in EXI datatype</termref>. For example, a value of XML Schema type xsd:int is represented by the same built-in EXI datatype as XML Schema type xsd:integer. Although xsd:int is derived indirectly from xsd:integer and, further up, from xsd:decimal, a value of xsd:int is processed as an instance of xsd:integer because xsd:integer is closer to xsd:int than xsd:decimal is in the datatype inheritance hierarchy.</p>
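<p>The closest-ancestor rule can be sketched as a walk up the type derivation chain. This is a non-normative illustration: the dictionaries hold only a small, hand-picked subset of the XML Schema built-in type hierarchy and of the table above, the names are hypothetical, and refinement by facets (which can select the Unsigned Integer or n-bit Unsigned Integer representation) is deliberately ignored.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Partial derivation chain of XML Schema built-in types (derived -> base).
BASE_TYPE = {
    "byte": "short", "short": "int", "int": "long", "long": "integer",
    "integer": "decimal",
    "nonNegativeInteger": "integer",
    "positiveInteger": "nonNegativeInteger",
    "unsignedLong": "nonNegativeInteger",
    "token": "normalizedString", "normalizedString": "string",
}

# Subset of the default associations listed in the table above.
DEFAULT_EXI_DATATYPE = {
    "base64Binary": "Binary", "hexBinary": "Binary",
    "boolean": "Boolean", "dateTime": "Date-Time",
    "decimal": "Decimal", "float": "Float", "double": "Float",
    "integer": "Integer", "nonNegativeInteger": "Unsigned Integer",
    "string": "String", "anyURI": "String", "duration": "String",
}


def default_exi_datatype(type_name):
    """Return the built-in EXI datatype of the closest listed ancestor."""
    current = type_name
    while current is not None:
        if current in DEFAULT_EXI_DATATYPE:
            return DEFAULT_EXI_DATATYPE[current]
        current = BASE_TYPE.get(current)     # move one step up the chain
    raise KeyError("no listed ancestor for " + type_name)
```
]]></eg>
<p>In this sketch, "int" resolves to Integer rather than Decimal because "integer" is reached before "decimal" when walking up from "int".</p>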

<p>Each EXI datatype identifier above is a QName that uniquely identifies one of the built-in EXI datatypes. Datatype identifiers are used by <termref def="key-pluggablecodecs">Pluggable CODECs</termref> to map XML Schema types to <termref def="key-exidatatype">built-in EXI datatypes</termref> other than the ones associated by default. Not all built-in EXI datatypes are assigned datatype identifiers. Only those that have identifiers are usable by Pluggable CODECs for designating alternative representations.
</p>
<p>The rules used to represent values of String depend on the content items to which the values belong. There are certain content items whose value representation involves the use of string tables, while other content items are represented using the encoding rule described in <specref ref="encodingString"/> without involvement of string tables. The content items that use string tables, and how each such content item uses string tables to represent its values, are described in <specref ref="stringTable"/>.</p>
<p>Schemas can provide one or more enumerated values for types. EXI exploits those pre-defined values when they are available to represent values of such types more efficiently than it otherwise would using built-in EXI datatypes. The encoding rule for representing a type of enumerated values is described in <specref ref="encodingEnumerations"/>. Types that are derived from other types by union and their subtypes are always represented as String regardless of the availability of enumerated values. The representation of values whose schema type is QName, Notation or a type derived from either by restriction is likewise not affected by enumerated values.
</p>
<!-- p>The encoding rule to represent schema types that are derived by list and their subtypes, including <xspecref href='http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#IDREFS'>IDREFS</xspecref> and <xspecref href='http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#ENTITIES'>ENTITIES</xspecref> is described in <specref ref="encodingList"/>.
</p -->

<div2 id="encodingDatatypes">
<head>Built-in EXI Datatypes Representation</head>
<p>The following sections describe the encoding rules for representing <termref def="key-exidatatype">built-in EXI datatypes</termref>.
</p>
<div3 id="encodingBinary">
<head>Binary</head>
<p>Values typed as Binary are represented as a length-prefixed sequence of octets representing the binary content. The length is represented as an Unsigned Integer (see 
<specref ref="encodingUnsignedInteger"/>). </p></div3>
<div3 id="encodingBoolean">
<head>Boolean</head>

<p>The number of distinct values that a Boolean can represent depends on the presence of pattern facets in the associated schema datatype. In the absence of patterns, the values are zero (0) and one (1); when patterns are available, they are zero (0), one (1), two (2) and three (3). With patterns, the value set distinguishes values not only arithmetically (0 or 1) but also by lexical variant. The following table shows the possible Boolean values and compares what each value represents with and without patterns.
</p>
<table width="95%" border="1">
<colgroup align="center" width="32%"></colgroup>
<colgroup width="34%"/>
<colgroup width="34%"/>
<thead>
<tr>
<th>Boolean value</th>
<th>without patterns</th>
<th>with patterns</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>&nbsp;&nbsp;0 ("false" or "0")</td>
<td>&nbsp;&nbsp;0 ("false")</td>
</tr>
<tr>
<td>1</td>
<td>&nbsp;&nbsp;1 ("true" or "1")</td>
<td>&nbsp;&nbsp;0 ("0")</td>
</tr>
<tr>
<td>2</td>
<td rowspan="2" align="center">N/A</td>
<td>&nbsp;&nbsp;1 ("true")</td>
</tr>
<tr>
<td>3</td>
<!-- td></td -->
<td>&nbsp;&nbsp;1 ("1")</td>
</tr>
</tbody>
</table>
<table width="95%" border="1">
<colgroup align="center" width="32%"></colgroup>
<colgroup align="center" width="34%"/>
<colgroup align="center" width="34%"/>
<tbody>
<tr>
<td># of distinct values (<emph>N</emph>)</td>
<td>&nbsp;&nbsp;2</td>
<td>&nbsp;&nbsp;4</td>
</tr>
</tbody>
</table>
<p>
When the value of the <termref def="key-compressionOption">compression option</termref> is false and
the <termref def="key-alignmentOption">alignment option</termref> is set to <termref def="key-unaligned">bit-packed</termref>, 
values typed as Boolean 
are represented as <emph>n</emph>-bit Unsigned Integers (<specref ref="encodingBoundedUnsigned" />), where 
<emph>n</emph> equals log<sub>2</sub>(<emph>N</emph>) and <emph>N</emph> is the number of distinct values, which is 2 or 4 
depending on the presence of pattern facets as shown in the table above. Otherwise, they are represented using one byte.
</p>
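<p>The Boolean value table and the width rule above can be sketched in Python as follows. This is an illustrative, non-normative sketch; the names <code>boolean_value</code> and <code>boolean_width_in_bits</code> are hypothetical and do not appear in this specification.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; all names below are hypothetical.

# Lexical form -> Boolean value, per the table above.
WITHOUT_PATTERNS = {"false": 0, "0": 0, "true": 1, "1": 1}
WITH_PATTERNS = {"false": 0, "0": 1, "true": 2, "1": 3}


def boolean_value(lexical, has_patterns):
    """Return the Boolean value used to represent a lexical form."""
    table = WITH_PATTERNS if has_patterns else WITHOUT_PATTERNS
    return table[lexical]


def boolean_width_in_bits(has_patterns, bit_packed_uncompressed):
    """Return the number of bits used to represent one Boolean value."""
    if not bit_packed_uncompressed:
        return 8  # otherwise, one byte is used
    distinct = 4 if has_patterns else 2  # N
    return (distinct - 1).bit_length()   # log2(N) for N = 2 or 4
```
]]></eg>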
</div3>

<div3 id="encodingDecimal">
<head>Decimal</head>
<p>Values typed as Decimal are represented as a Boolean sign (see <specref ref="encodingBoolean"/>) followed by two  Unsigned Integers (see <specref
ref="encodingUnsignedInteger"/>). A sign value of zero (0) is used to represent positive Decimal values and a sign value of one (1) is used to represent negative Decimal values. The first Unsigned Integer represents the integral portion of the Decimal value. The second Unsigned Integer represents the fractional portion of the Decimal value with the digits in reverse order to preserve leading zeros.</p>
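<p>As a non-normative illustration, the following Python sketch splits a decimal literal into the three components described above and reassembles it. The function names are hypothetical, and the sketch assumes a well-formed decimal literal as input; it does not perform the actual bit-level encoding.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; function names are hypothetical.

def decimal_components(lexical):
    """Split a literal such as "-12.0034" into (sign, integral, fractional)."""
    text = lexical.strip()
    sign = 1 if text.startswith("-") else 0
    text = text.lstrip("+-")
    integral_digits, _, fraction_digits = text.partition(".")
    integral = int(integral_digits or "0")
    # Reversing the fractional digits turns leading zeros into trailing
    # digits of the integer, so they are preserved.
    fractional = int(fraction_digits[::-1] or "0")
    return sign, integral, fractional


def decimal_lexical(sign, integral, fractional):
    """Rebuild a decimal literal from its three components."""
    fraction_digits = str(fractional)[::-1]
    return ("-" if sign else "") + str(integral) + "." + fraction_digits
```
]]></eg>
<p>For example, the literal -12.0034 yields a sign of 1, an integral portion of 12 and a fractional portion of 4300.</p>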
</div3>

<div3 id="encodingFloat">
<head>Float</head>
<p>Values typed as Float are represented as two consecutive Integers (see 
<specref ref="encodingInteger"/>). The first Integer represents the mantissa of the floating point number and the second Integer represents the base-10 exponent of the floating point number. The range of the mantissa is - (2<sup>63</sup>) to 2<sup>63</sup>-1 and the range of the exponent is - (2<sup>14</sup>-1) to 2<sup>14</sup>-1. Values typed as Float with a mantissa or exponent outside the accepted range are represented as schema-invalid values.</p>

<p>The exponent value -(2<sup>14</sup>) is used to indicate one of the special values: infinity, negative infinity and not-a-number (NaN). An exponent value -(2<sup>14</sup>) with mantissa values 1 and -1 represents 
positive infinity (INF) and negative infinity (-INF) respectively. An exponent value -(2<sup>14</sup>) with any other mantissa value represents NaN.
</p>

<p>A value represented as Float can be decoded by going through the following steps.</p>
<olist>
<item>Retrieve the mantissa value using the procedure described in <specref ref="encodingInteger"/>.</item>
<item>Retrieve the exponent value using the procedure described in <specref ref="encodingInteger"/>.</item>
<item>If the exponent value is -(2<sup>14</sup>), the mantissa value 1 represents INF, the mantissa value -1 represents -INF and any other mantissa value represents NaN. If the exponent value is not -(2<sup>14</sup>), the float value is <emph>m</emph> &times; 10<sup><emph>e</emph></sup> where <emph>m</emph> is the mantissa and <emph>e</emph> is the exponent obtained in the preceding steps.
</item>
</olist>
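<p>Step 3 above can be sketched in Python as follows. This is a non-normative illustration that assumes the mantissa and exponent have already been retrieved as integers; the name <code>decode_float</code> is hypothetical, and the result is mapped to a Python double-precision float purely for convenience.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; names are hypothetical.

SPECIAL_EXPONENT = -(2 ** 14)  # exponent value reserved for INF, -INF and NaN


def decode_float(mantissa, exponent):
    """Combine an already-decoded mantissa and base-10 exponent."""
    if exponent == SPECIAL_EXPONENT:
        if mantissa == 1:
            return float("inf")
        if mantissa == -1:
            return float("-inf")
        return float("nan")
    # m x 10^e, parsed from a decimal literal to avoid intermediate rounding.
    return float(f"{mantissa}e{exponent}")
```
]]></eg>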
<note>
<p>Support for IEEE float representation is currently under consideration. (See <specref ref="ieeeFloats"/>)</p>
</note></div3>

<div3 id="encodingInteger">
<head>Integer</head>
<p>The Integer type supports signed integer numbers of arbitrary magnitude. Values typed as Integer are represented as a Boolean sign (see <specref ref="encodingBoolean" />) followed by an Unsigned Integer (see <specref ref="encodingUnsignedInteger" />). A sign value of zero (0) represents non-negative integers and a sign value of one (1) represents negative integers. For non-negative values, the Unsigned Integer holds the magnitude of the value. For negative values, the Unsigned Integer holds the magnitude of the value minus 1, so that negative one (-1) is represented with an Unsigned Integer of zero (0). </p>
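<p>The mapping between a signed value and its sign and Unsigned Integer components can be sketched in Python as follows. The sketch is non-normative and the function names are hypothetical.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; function names are hypothetical.

def integer_components(value):
    """Return (sign, unsigned) for a signed integer value."""
    if value >= 0:
        return 0, value
    return 1, -value - 1  # magnitude minus 1 for negative values


def integer_from_components(sign, unsigned):
    """Inverse of integer_components."""
    return unsigned if sign == 0 else -(unsigned + 1)
```
]]></eg>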
</div3>

<div3 id="encodingUnsignedInteger">
<head>Unsigned Integer</head>
<p>The Unsigned Integer type supports unsigned integer numbers of arbitrary magnitude. Values typed as Unsigned Integer are represented using a sequence of octets. The sequence is terminated by an octet with its most significant bit set to 0. The value of the unsigned integer is stored in the least significant 7 bits of each octet as a sequence of 7-bit groups, with the least significant group first. </p>
<!-- Unsigned Integer values SHOULD be stored in the minimum number of required octets. -->
<p>A value represented as Unsigned Integer can be decoded by going through the following steps with the initial value set to 0 and the initial multiplier set to 1.</p>
<olist>
<item>Read the next octet.</item>
<item>Multiply the value of the unsigned number represented by the 7 least significant bits of the octet by the current multiplier and add the result to the current value.</item>
<item>Multiply the multiplier by 128.</item>
<item>If the most significant bit of the octet was 1, go back to step 1.</item>
</olist>
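<p>The decoding steps above, together with a matching encoder, can be sketched in Python as follows. This is a non-normative illustration operating on byte strings; the function names are hypothetical.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; function names are hypothetical.

def encode_unsigned(value):
    """Encode a non-negative integer, 7 bits per octet, low group first."""
    if value < 0:
        raise ValueError("Unsigned Integer cannot be negative")
    octets = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            octets.append(low7 | 0x80)  # more octets follow
        else:
            octets.append(low7)         # final octet: top bit is 0
            return bytes(octets)


def decode_unsigned(octets):
    """Return (value, number of octets consumed)."""
    value, multiplier = 0, 1
    for consumed, octet in enumerate(octets, start=1):
        value += (octet & 0x7F) * multiplier
        multiplier *= 128
        if not octet & 0x80:
            return value, consumed
    raise ValueError("truncated Unsigned Integer")
```
]]></eg>
<p>For example, the value 300 is represented by the two octets 0xAC and 0x02.</p>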
<ednote>
<edtext>
EXI also provides a modified representation for Integers that will not fit within a 64-bit integer to facilitate processing by devices that do not support big integers. This capability has not yet been specified. 
</edtext>
</ednote>
</div3>

<div3 id="encodingQName">
<head>QName</head>
<p>Values of type QName are encoded as a sequence of values representing the URI, local-name and prefix components of the QName in that order, where the prefix component is present only when the <termref def="key-preservePrefixesOption">preserve.prefixes</termref> option is set to true.
</p>
<p>When the QName value is specified by a schema-informed grammar using the SE(<emph>qname</emph>) or AT(<emph>qname</emph>) terminal symbols, the URI and local-name are implicit and are omitted.
Otherwise, the URI and local-name components are encoded as Strings (see 
<specref ref="encodingString"/>) per the rules defined for the <termref def="key-uriContentItem"><emph>uri</emph></termref> content item and the <emph id="key-localName">local-name</emph> content item, respectively.
If the QName is in no namespace, the URI is represented by a zero-length String. 
</p>
<p>When present, prefixes are represented as <emph>n</emph>-bit Unsigned Integers (<specref ref="encodingBoundedUnsigned" />), where <emph>n</emph> is &lceil; log<sub>2</sub> <emph>N</emph> &rceil; and <emph>N</emph> is the number of unique <emph>prefix</emph>es specified for the URI of the QName by preceding NS events in the EXI stream. Each unique <emph>prefix</emph> is assigned a unique <emph>n</emph>-bit integer (0 ... <emph>N</emph>-1) according to the order in which the associated NS event occurs in the EXI stream. If there are no <emph>prefix</emph>es specified for the URI of the QName by preceding NS events in the EXI stream, the prefix is undefined. An undefined prefix is represented using zero bits (i.e., omitted).
</p>
<p>Given either an <emph>n</emph>-bit unsigned integer <emph>m</emph> that represents the prefix value or an undefined prefix, the effective prefix value is determined by applying the rules below in order. A QName is in error if it has an undefined prefix that cannot be resolved by these rules.
</p>
<olist>
<item>If the prefix is defined, select the <emph>m</emph>-th <emph>prefix</emph> value associated with the URI of the QName as the candidate prefix value. Otherwise, there is no candidate prefix value.
</item>
<item>If the QName value is part of an SE event followed by an associated NS event with an indicator value of 1, the prefix value is the <emph>prefix</emph> of such NS event. Otherwise, the prefix value is the candidate value, if any, selected in step 1 above.
</item>
</olist>
<!-- olist>
<item --><!-- /item>
<item id="key-localName"--><!-- /item>
</olist -->
</div3>

<div3 id="encodingDateTime">
<head>Date-Time</head>
<p>Values typed as Date-Time are encoded as a sequence
of values representing the individual components of the Date-Time. The
following table specifies each of the possible date-time components
along with how they are encoded.</p>
<table border="1">
<caption>Date-Time components</caption>
<thead>
<tr>
<th>Component</th>
<th>Value</th>
<th>Type</th></tr>
</thead>
<tbody>
<!-- tr>
<td>Type</td>
<td>The type of date (see below)</td>
<td>3-bit Unsigned Integer (<specref ref="encodingBoundedUnsigned"/>)</td></tr -->
<tr>
<td>Year</td>
<td>Offset from 2000</td>
<td>Integer ( 
<specref ref="encodingInteger"/>)</td></tr>
<tr>
<td>MonthDay</td>
<td>Month * 31 + Day</td> <td>9-bit Unsigned Integer (<specref
ref="encodingBoundedUnsigned"/>) where day is a value in the range 0-30 and month is a value in the range 1-12.</td></tr>
<tr>
<td>Time</td>
<td>((Hour * 60) + Minutes) * 60 + Seconds</td>
<td>17-bit Unsigned Integer (<specref ref="encodingBoundedUnsigned"/>)</td></tr>
<!-- tr>
<td>FractionalSecs?</td>
<td>Boolean presence indicator</td>
<td>Boolean (<specref ref="encodingBoolean"/>)</td></tr -->
<tr>
<td>FractionalSecs</td>
<td>Fractional seconds</td>
<td>Unsigned Integer ( 
<specref ref="encodingUnsignedInteger"/>) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros</td></tr>
<!-- tr>
<td>TimeZone?</td>
<td>Boolean presence indicator</td>
<td>Boolean (<specref ref="encodingBoolean"/>)</td></tr-->
<tr>
<td>TimeZone</td>
<td>TZHours * 60 + TZMinutes</td>
<td>11-bit Unsigned Integer (<specref ref="encodingBoundedUnsigned"/>) representing a signed integer offset by 840 ( = 14 * 60 )</td></tr>
<tr>
<td>presence</td>
<td>Boolean presence indicator</td>
<td>Boolean (<specref ref="encodingBoolean"/>)</td></tr>
</tbody></table>
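<p>The arithmetic in the table above can be sketched in Python as follows. This is a non-normative illustration; the function names are hypothetical. The sketch takes the day as a value in the range 0-30, exactly as stated in the table, and assumes that both TZHours and TZMinutes carry the sign of the time zone offset.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; function names are hypothetical.

def year_component(year):
    """Year is stored as an offset from 2000."""
    return year - 2000


def month_day_component(month, day):
    """Month * 31 + Day, with month in 1-12 and day in 0-30."""
    if not (1 <= month <= 12 and 0 <= day <= 30):
        raise ValueError("month or day out of range")
    return month * 31 + day  # at most 402, which fits in 9 bits


def time_component(hour, minutes, seconds):
    """((Hour * 60) + Minutes) * 60 + Seconds; fits in 17 bits."""
    return (hour * 60 + minutes) * 60 + seconds


def time_zone_component(tz_hours, tz_minutes):
    """Signed offset in minutes, shifted by 840 (= 14 * 60) to be unsigned.

    Assumption: tz_hours and tz_minutes both carry the sign of the offset.
    """
    return tz_hours * 60 + tz_minutes + 840  # at most 1680, fits in 11 bits
```
]]></eg>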
<p>
The components that constitute a value, and the order in which they appear, depend on the XML Schema type associated with the value. The following table shows which components are included in a value of each XML Schema type that is relevant to the Date-Time datatype. Items listed in square brackets are included if and only if the value of the immediately preceding presence indicator (specified above) is set to true.</p>
<table border="1">
<caption>Assortment of Date-Time components</caption>
<thead>
<tr>
<th>XML Schema Type</th>
<th>Included Components</th></tr>
</thead>
<tbody>
<tr>
<td><xspecref spec="XS2" ref='gYear'>gYear</xspecref></td>
<td>Year, presence, [TimeZone]</td></tr>
<tr>
<td><xspecref spec="XS2" ref='gYearMonth'>gYearMonth</xspecref></td>
<td rowspan="2">Year, MonthDay, presence, [TimeZone]</td></tr>
<tr>
<td><xspecref spec="XS2" ref='date'>date</xspecref></td>
<!-- td>Year, MonthDay, [TimeZone]</td --></tr>
<tr>
<td><xspecref spec="XS2" ref='dateTime'>dateTime</xspecref></td>
<td>Year, MonthDay, Time, presence, [FractionalSecs], presence, [TimeZone]</td></tr>
<tr>
<td><xspecref spec="XS2" ref='gMonth'>gMonth</xspecref></td>
<td rowspan="3">MonthDay, presence, [TimeZone]</td></tr>
<tr>
<td><xspecref spec="XS2" ref='gMonthDay'>gMonthDay</xspecref></td>
<!-- td>MonthDay, [TimeZone]</td --></tr>
<tr>
<td><xspecref spec="XS2" ref='gDay'>gDay</xspecref></td>
<!-- td>MonthDay, [TimeZone]</td --></tr>
<tr>
<td><xspecref spec="XS2" ref='time'>time</xspecref></td>
<td>Time, presence, [FractionalSecs], presence, [TimeZone]</td></tr></tbody></table></div3>

<div3 id="encodingBoundedUnsigned">
<head><emph>n</emph>-bit Unsigned Integer</head>
<p>
When the value of <termref def="key-compressionOption">compression option</termref> is false and
the value <termref def="key-unaligned">bit-packed</termref> is used for <termref def="key-alignmentOption">alignment options</termref>, 
values of type 
<emph>n</emph>-bit Unsigned Integer are represented as an unsigned binary integer using <emph>n</emph> bits. 
Otherwise, they are represented as an unsigned integer using the minimum number of bytes required to store 
<emph>n</emph> bits. Bytes are ordered with the least significant byte first.</p>

<p>The <emph>n</emph>-bit Unsigned Integer encoding is also used to encode <emph>bounded integers</emph>. 
These are integer values that have been constrained explicitly through the use of schema facets 
(for example, the XML Schema minInclusive and maxInclusive facets) or implicitly through the use 
of a restricted datatype (for example, the XML Schema <emph>unsignedByte</emph> type).</p>

<p>A bounded integer value is encoded as an offset (or delta) from the minimum value in the range. 
It is encoded in the minimum number of bits that would be necessary to hold any value within the 
full range.  For example, if an integer is constrained to have a value between 3 and 10 
(inclusively) and the value to be encoded is 7, the number encoded would be 7 - 3 = 4 and the 
number of bits needed would be 3.</p>

<p>If the range defined by the bounds is large, the average number of bits needed to encode a 
set of values can be larger than the number of bits needed if those values are encoded as 
variable-length integers (see <specref ref="encodingInteger"/>). For this reason, a maximum 
range value is imposed: if the range defined by the bounds is larger than this maximum, 
the variable-length integer encoding is used instead.  The maximum range value is 4095, which corresponds to a 
bit field length of no more than 12 bits.</p>
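<p>The bounded integer offset and bit width, and the byte-aligned layout described at the start of this section, can be sketched in Python as follows. This is a non-normative illustration; the function names are hypothetical, and the fallback to variable-length encoding for large ranges is deliberately not modelled.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; function names are hypothetical.

def bounded_integer(value, minimum, maximum):
    """Return (offset, bit_width) for a value constrained to [minimum, maximum]."""
    if not minimum <= value <= maximum:
        raise ValueError("value outside the declared bounds")
    offset = value - minimum
    # Minimum number of bits able to hold any offset in 0 .. (maximum - minimum).
    bit_width = (maximum - minimum).bit_length()
    return offset, bit_width


def to_aligned_bytes(value, bit_width):
    """Byte-aligned form: minimum number of bytes for bit_width bits,
    least significant byte first."""
    return value.to_bytes((bit_width + 7) // 8, "little")
```
]]></eg>
<p>With bounds of 3 and 10 and a value of 7, the sketch yields an offset of 4 and a width of 3 bits, matching the example above.</p>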
</div3>

<div3 id="encodingString">
<head>String</head>
<p>Values of type String are represented as a length-prefixed sequence of
characters. The length indicates the number of characters in the
string and is represented as an Unsigned Integer (see <specref
ref="encodingUnsignedInteger"/>). If a restricted character set is defined for the string (see <specref ref="restrictedCharSet"/>), each character is represented as an <emph>n</emph>-bit Unsigned Integer (see <specref ref="encodingBoundedUnsigned"/>). Otherwise, each character is represented by its UCS code point encoded as an Unsigned Integer (see <specref ref="encodingUnsignedInteger"/>).
</p>
<p>EXI uses a string table to represent certain
content items more efficiently. Section <specref ref="stringTable"/>
describes the string table and how it is applied to different content
items.</p>
<div4 id="restrictedCharSet">
<head>Restricted Character Sets</head>
<p>If a string value is associated with a schema datatype and one or more of the datatypes in its datatype hierarchy has one or more pattern facets, there may be a restricted character set defined for the string value. The following steps are used to determine the restricted character set, if any, defined for a given string value associated with such a schema datatype.
</p>
<p>First, determine the character set for each datatype in the datatype hierarchy of the string value that has one or more pattern facets according to section <specref ref="regexToCharset"/>. For each datatype with more than one pattern facet, compute the restricted character set based on the union of the regular expressions specified by its pattern facets. If the restricted character set for a datatype contains at least 255 characters or contains non-BMP characters, the character set of the datatype is not restricted and can be omitted from further consideration.</p>

<p>Then, compute the restricted character set for the string value as the intersection of all the character sets computed above. If the resulting character set contains fewer than 255 characters, the string value has a restricted character set and each character is represented using an <emph>n</emph>-bit Unsigned Integer (see <specref ref="encodingBoundedUnsigned"/>), where <emph>n</emph> is &lceil; log<sub>2</sub>(<emph>N</emph> + 1) &rceil; and <emph>N</emph> is the number of characters in the restricted character set.</p>

<p>The characters in the restricted character set are sorted by UCS code point and represented by integer values in the range (0 ... <emph>N</emph>-1) according to their ordinal position in the set. Characters that are not in this set are represented by the integer <emph>N</emph> followed by the UCS code point of the character represented as an Unsigned Integer.</p>
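<p>The assignment of integer codes to characters can be sketched in Python as follows. This is a non-normative illustration that takes an already-determined restricted character set as input; the function name is hypothetical, and the bit width is assumed to be rounded up to a whole number of bits.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; the function name is hypothetical.

def restricted_codes(text, charset):
    """Return (bit_width, codes) for text under a restricted character set.

    Each entry of codes is a tuple: (index,) for a character in the set,
    or (N, code_point) for a character outside it.
    """
    ordered = sorted(set(charset))       # Python sorts str by code point
    index = {ch: i for i, ch in enumerate(ordered)}
    count = len(ordered)                 # N
    bit_width = count.bit_length()       # smallest n with 2**n >= N + 1
    codes = []
    for ch in text:
        if ch in index:
            codes.append((index[ch],))
        else:
            # Escape value N, followed by the code point, which would be
            # written as an Unsigned Integer.
            codes.append((count, ord(ch)))
    return bit_width, codes
```
]]></eg>
<p>For a hypothetical set consisting of the ten decimal digits and the hyphen-minus, <emph>N</emph> is 11 and each character code occupies 4 bits.</p>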

<!-- reworded for clarity: jcs 12/13/07
<ol>
<li> </li>
<li>If the string value does not have have a datatype If the datatype is an ur-type, the character set of the datatype is the entire XML character set.</li>
<li>Otherwise, the character set of the datatype is determined as follows.
</li>
<ol>
<li>If the datatype does not have pattern facets specified within its own definition, the character set of the datatype equals to the character set of its base datatype.
</li>
<li>Otherwise, "local character set" of the datatype is obtained by making union of all the character sets derived from the patterns (i.e. regular expressions) specified within its own datatype definition. See <specref ref="regexToCharset"/> for how to derive a character set from a pattern. Then the character set of the datatype equals to the intersection of the local character set and base datatype's character set.
</li></ol></ol>
<p>Given the number of member characters <emph>N</emph> in the character set in effect, when <emph>N</emph> is greater than 255 or non-BMP characters are included in the set, each character in the string is encoded as Unsigned Integer (see <specref ref="encodingUnsignedInteger"/>) representing its UCS <bibref ref="ISO10646"/> code point. Otherwise (i.e. <emph>N</emph> is equal to or smaller than 255 and only BMP <bibref ref="ISO10646"/> characters are contained in the set), <emph>n</emph>-bit Unsigned Integer (see <specref ref="encodingBoundedUnsigned"/>) is used for character representation, where <emph>n</emph> equals to log<sub>2</sub>(<emph>N</emph> + 1). The characters in the character set are sorted by UCS code point and character serial numbers are assigned sequentially (0 ... <emph>N</emph>-1, inclusively). These serial numbers are used in the <emph>n</emph>-bit representation, and the serial number <emph>N</emph> is reserved to indicate a character that does not participate in the character set. Character serial number <emph>N</emph> is always followed by the UCS code point of the character represented as an Unsigned Integer.
</p>
-->
<p>The figure below illustrates an overview of the process for determining and using restricted character sets described in this section. </p>
<graphic source="restrictedCharset.png" alt="String Processing Model"/>
</div4>
</div3>
<div3 id="encodingList">
<head>List</head>
<p>Values of type List are encoded as a length-prefixed
sequence of values. The length is encoded as an Unsigned Integer (see
<specref ref="encodingUnsignedInteger"/>) and each value is encoded according
to its type (see <specref ref="encodingValues"/>).</p>
</div3>

</div2>
<div2 id="encodingEnumerations">
<head>Enumerations</head>
<p>Values of enumerated types are encoded as
<emph>n</emph>-bit Unsigned Integers (<specref ref="encodingBoundedUnsigned"/>) where <emph>n</emph> = &lceil; log <sub>2</sub> <emph>m</emph> &rceil; and <emph>m</emph> is the number of items
in the enumerated type. The value assigned to each item corresponds to
its ordinal position in the enumeration in schema-order starting with
position zero (0).</p>
<p>The exceptions are schema types derived from other types by union and their subtypes, as well as QName, Notation and types derived therefrom by restriction. Values of such types are processed by their respective built-in EXI datatypes instead of being represented as enumerations.</p>
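<p>The ordinal and bit width of an enumerated value can be sketched in Python as follows. This is a non-normative illustration; the function name and the sample enumeration are hypothetical.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; the function name is hypothetical.

def enumeration_code(items, value):
    """Return (ordinal, bit_width) for value in a schema-ordered enumeration."""
    m = len(items)
    bit_width = (m - 1).bit_length() if m > 0 else 0  # ceil(log2(m))
    return items.index(value), bit_width
```
]]></eg>
<p>For a hypothetical enumeration of three items, the third item is assigned the ordinal 2 and is represented using 2 bits.</p>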
</div2>

<div2 id="stringTable">
<head>String Table</head>
<p>EXI uses a string table to assign "compact identifiers" to some
string values. Occurrences of string values found in the string table
are represented using the associated compact identifier rather than
encoding the entire "string literal". The string table is initially pre-populated with
string values that are likely to occur in certain contexts and is
dynamically expanded to include additional string values encountered
in the document. The following content items are encoded using a
string table: </p>

<ulist>
<item>
<termref def="key-uriContentItem"><emph>uris</emph></termref></item>
<item>
<termref def="key-prefixContentItem"><emph>prefixes</emph></termref></item>
<item>
<termref def="key-localName"><emph>local-names</emph></termref></item>
<!-- item>
<termref def="key-nameContentItem"><emph>names</emph></termref></item -->
<item>
<termref def="key-valueContentItem"><emph>values</emph></termref></item></ulist>

<p>The <emph>uris</emph> and <emph>local-names</emph> used in <emph>qname</emph> content items are also encoded using a string table. When a string value is found in the string table, the value is encoded
using the compact identifier and no changes are made to the string table as a result. 
When a string value is not found in the string table, its string literal is encoded
as a String without using a compact identifier. Only then is
the string table augmented with the string value and a newly assigned
compact identifier.</p>

<p>The string table is divided into partitions and each partition is
optimized for more frequent use of either compact identifiers or string literals
depending on the purpose of the partition. Section <specref
ref="stringTablePartitions"/> describes how the EXI string table is
partitioned. Section <specref ref="encodingOptimizedForHits"/>
describes how string values are encoded when the associated partition
is optimized for more frequent use of compact identifiers. Section <specref
ref="encodingOptimizedForMisses"/> describes how string values are
encoded when the associated partition is optimized for more frequent use
of string literals.</p>
<p>The life cycle of a string table spans the processing of 
a single EXI stream. String tables are not represented in an EXI stream or exchanged
between EXI processors. A string table cannot be reused across multiple EXI streams;
therefore, EXI processors MUST use a string table that is equivalent to
the one that would have been newly created and pre-populated with initial
values for processing each EXI stream.
</p>


<div3 id="stringTablePartitions">
<head>String Table Partitions</head>
<p>The string table is organized into partitions
so that the indices assigned to compact identifiers can stay relatively small.
Smaller indices improve average compactness and the efficiency
of table operations. Each partition has a separate set of compact identifiers and
content items are assigned to specific partitions as described below. 
</p>
<p><termref def="key-uriContentItem"><emph>Uri</emph></termref> content items and the URI portion of <emph>qname</emph> content items are assigned to the uri
partition. The uri partition is optimized for frequent use of compact identifiers and is
pre-populated with initial entries as described in <specref ref="initialUriValues"/>.
When a schema is provided, the uri partition is also pre-populated with
each namespace URI declared in the schema,
appended in lexicographical order.</p>

<p><termref def="key-prefixContentItem"><emph>Prefix</emph></termref> content items are assigned to partitions based
on their associated namespace URI. Partitions containing
<emph>prefix</emph> content items are optimized for frequent use of compact identifiers and the
string table is pre-populated with entries as described in
<specref ref="initialPrefixValues"/>.</p>

<p>
<termref def="key-localName"><emph>Local-name</emph></termref> content items and the local-name portion of <emph>qname</emph> content items are assigned to partitions based
on the namespace URI of the NS event or <emph>qname</emph> content item of which the local-name is a part. Partitions containing <termref def="key-localName"><emph>local-name</emph></termref>
content items are optimized for frequent use of string literals and the string table is pre-populated
with entries as described in <specref ref="initialLocalNames"/>.
When a schema is provided, the string table is also pre-populated with the
local name of each attribute, element and type declared in the
schema, partitioned by namespace URI and sorted lexicographically.</p>

<!-- p><termref def="key-nameContentItem"><emph>Name</emph></termref> content items are assigned to the
name partition. The name partition is
optimized for frequent use of string literals and is initially empty.</p -->

<p>
<termref def="key-valueContentItem"><emph>Value</emph></termref>
content items are assigned simultaneously to the global value partition
and to the "local" value partition that corresponds to the
<emph>qname</emph> of the attribute or element in context at the time
the string table is looked up. This assignment takes place when the string value is found in neither the global nor the local value partition.
Partitions containing <termref def="key-valueContentItem"><emph>value</emph>
</termref> content items are optimized for frequent use of string literals and are initially empty.</p>
</div3>

<div3 id="encodingOptimizedForHits">
<head>Partitions Optimized for Frequent Use of Compact Identifiers</head>
<p>String table partitions that are expected to contain a relatively
small number of entries used repeatedly throughout the document are
optimized for the frequent use of compact identifiers. This includes the <termref def="key-uriContentItem"><emph>uri</emph></termref> partition and
all partitions containing <termref def="key-prefixContentItem"><emph>prefix</emph></termref> content items. </p>

<p>When a string value is found in a partition optimized for frequent use of compact identifiers,
the string value is represented as the value (<emph>i</emph>+1)
encoded as an <emph>n</emph>-bit Unsigned Integer (<specref ref="encodingBoundedUnsigned"/>), where
<emph>i</emph> is the value of the compact identifier, <emph>n</emph> is
&lceil; log<sub>2</sub> (<emph>m</emph>+1) &rceil; and <emph>m</emph> is the number of
entries in the string table partition at the time of the operation.
</p>

<p>When a string value is not found in a partition optimized for frequent use of compact identifiers,
the String value is represented as zero (0) encoded as an
<emph>n</emph>-bit Unsigned Integer, followed by the string literal
encoded as a String (<specref ref="encodingString"/>). After
encoding the String value, it is added to the string table partition
and assigned the next available compact identifier <emph>m</emph>.</p>
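<p>The behaviour of a partition optimized for frequent use of compact identifiers can be sketched in Python as follows. This is a non-normative illustration; the class name, the tuple layout of the result and the sample entries are hypothetical.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; class and method names are hypothetical.

class CompactIdPartition:
    """Partition optimized for frequent use of compact identifiers."""

    def __init__(self, initial=()):
        self.entries = list(initial)

    def encode(self, value):
        """Return (code, bit_width, literal).

        code is written as an n-bit Unsigned Integer of bit_width bits;
        literal is the String that follows it on a miss, else None.
        """
        m = len(self.entries)
        bit_width = m.bit_length()       # ceil(log2(m + 1))
        if value in self.entries:
            i = self.entries.index(value)
            return i + 1, bit_width, None
        self.entries.append(value)       # assigned compact identifier m
        return 0, bit_width, value
```
]]></eg>
<p>Because the value zero is reserved to signal a miss, a partition holding <emph>m</emph> entries needs room for <emph>m</emph>+1 distinct codes.</p>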
</div3>

<div3 id="encodingOptimizedForMisses">
<head>Partitions Optimized for Frequent Use of String Literals</head>
<p>The remaining string table partitions are optimized for
the frequent use of string literals. This includes all string table partitions containing
<termref def="key-localName"><emph>local-name</emph></termref> content items
and all string table partitions containing <termref def="key-valueContentItem"><emph>value</emph></termref> content
items.</p>

<p>When a string value is found in the partitions containing
<emph>local-name</emph> content items, the
string value is represented as zero (0) encoded as an Unsigned Integer (see
<specref ref="encodingUnsignedInteger"/>) followed by the compact
identifier of the string value. The compact identifier of the string
value is encoded as an <emph>n</emph>-bit unsigned integer (<specref ref="encodingBoundedUnsigned"/>), where
<emph>n</emph> is &lceil; log<sub>2</sub> <emph>m</emph> &rceil; and <emph>m</emph> is
the number of entries in the string table partition at the time of the operation.</p>

<p>When a string value is not found in the partitions containing
<emph>local-name</emph> content items, its
string literal is encoded as a String (see <specref
ref="encodingString"/>) with the length of the string incremented
by one. After encoding the string value, it is added to the string
table partition and assigned the next available compact
identifier <emph>m</emph>.</p>

<p>As described above, <emph>value</emph> content items are assigned
to two partitions, a "local" value partition and the global
value partition. When a string value is found in the "local" value partition,
the string value is represented as zero (0) encoded as an Unsigned Integer (see
<specref ref="encodingUnsignedInteger"/>) followed by the compact identifier
of the string value in the "local" value partition. 
When a string value is found in the global value partition, but not in the "local" value
partition, the String value is represented as one (1) encoded as an
Unsigned Integer (see <specref ref="encodingUnsignedInteger"/>) followed by the compact
identifier of the String value in the global value
partition. The compact identifier is encoded as an <emph>n</emph>-bit
unsigned integer (<specref ref="encodingBoundedUnsigned"/>), where <emph>n</emph> is &lceil; log<sub>2</sub><emph>m</emph> &rceil; and <emph>m</emph> is the number of entries in the
associated partition at the time of the operation.</p>

<p>When a string value is not found in the global or "local" 
<emph>value</emph> partition, its string literal is encoded as a
String (see <specref ref="encodingString"/>) with the length
incremented by two. After encoding the string value, it is added to
both the associated "local" value string table partition and the global value
string table partition.</p>
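<p>The handling of <emph>value</emph> content items across the "local" and global value partitions can be sketched in Python as follows. This is a non-normative illustration; the class name and the tuple layout of the results are hypothetical, and the length is counted in characters.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Illustrative sketch only; class and function names are hypothetical.

def ceil_log2(m):
    """Number of bits needed to distinguish m entries."""
    return (m - 1).bit_length() if m > 0 else 0


class ValuePartitions:
    """Global value partition plus one "local" partition per qname."""

    def __init__(self):
        self.global_entries = []
        self.local_entries = {}  # qname -> list of string values

    def encode(self, qname, value):
        """Return (0, id, bits) on a local hit, (1, id, bits) on a global
        hit, or (length + 2, literal) on a miss."""
        local = self.local_entries.setdefault(qname, [])
        if value in local:
            return 0, local.index(value), ceil_log2(len(local))
        if value in self.global_entries:
            entries = self.global_entries
            return 1, entries.index(value), ceil_log2(len(entries))
        # Miss: the literal is encoded first, then added to both partitions.
        local.append(value)
        self.global_entries.append(value)
        return len(value) + 2, value
```
]]></eg>
<p>Incrementing the length by two on a miss keeps the leading Unsigned Integer values 0 and 1 free to signal a local or global hit.</p>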

</div3>
</div2>

<div2 id="pluggableCodecs">
<head>Pluggable CODECS</head>
<p>By default, each typed value in an EXI stream is represented by the
associated built-in EXI datatype (see <specref
ref="builtInEXITypes"/>). However, <termdef id="key-pluggablecodecs"
term="Pluggable CODECS"><termref def="key-exiprocessor">EXI processors</termref> MAY provide the capability to
specify different built-in types or user-defined encoder/decoders
(CODECS) for representing specific schema types. This capability is
called <term>Pluggable CODECS</term></termdef>.
</p>

<p>
EXI processors that support Pluggable CODECS MAY provide
external means to define and install user-defined CODECS; the
mechanisms for doing so are implementation dependent. EXI
processors MAY also provide means for applications or users to specify
alternate built-in types or user-defined CODECS for representing
specific schema types; these mechanisms are likewise
implementation dependent.
</p>
<p>When an EXI processor encodes an EXI stream using Pluggable CODECS,
it MUST specify
in the EXI header each schema type that is not represented using the
default built-in type and the alternate built-in type or user-defined
CODEC used for each one unless the whole <termref def="key-options">EXI Options</termref> part of the header is omitted.
An EXI processor that attempts to decode an
EXI stream that specifies a user-defined CODEC in the EXI header that
it does not recognize MAY report a warning, but this is not an
error. However, when an EXI processor encounters a typed value that
was encoded by a user-defined CODEC that it does not support, it MUST
report an error.</p>
<p>The EXI options header, when it appears in an EXI stream, MUST include a codecMap element for each
schema type that is not represented using the default built-in
type. The codecMap element includes two child elements. The QName of
the first child element identifies the schema type that is not
represented using the default built-in type and the QName of the
second child element identifies the alternate built-in type or
user-defined CODEC used to represent that type. Built-in types are
identified by the type identifiers in <specref
ref="builtInEXITypes"/>. </p>

<p>For example, the following codecMap element indicates that all values of
type xsd:decimal in the EXI stream are represented using the built-in
String type, which has the type ID xsd:string: </p>

<example>
<head>codecMap indicating that all Decimal values are represented using
the built-in String type</head>
<eg xml:space="preserve">
    &lt;codecMap xmlns:xsd="http://www.w3.org/2001/XMLSchema"&gt;
        &lt;xsd:decimal/&gt;
        &lt;xsd:string/&gt;
    &lt;/codecMap&gt;
</eg>
</example>

<p>It is the responsibility of an EXI processor to interface properly with a particular implementation of built-in types or user-defined CODECs. In the example above, an EXI processor may need to provide a string value for data typed as xsd:decimal in order to interface with the built-in String type. A processor that starts with a decimal value may translate it into a string before passing it to the built-in String type, while a processor that already holds the data as a string can pass it directly without any translation.
</p>

<p>As another example, the following codecMap element indicates that all
values of the user-defined type geo:geometricSurface are represented
using the user-defined CODEC geo:geometricInterpolator: </p>

<example>
<head>codecMap illustrating a user-defined type represented by a
user-defined CODEC</head>
<eg xml:space="preserve">
    &lt;codecMap xmlns:geo="http://www.example.com/Geometry"&gt;
        &lt;geo:geometricSurface/&gt;
        &lt;geo:geometricInterpolator/&gt;
    &lt;/codecMap&gt;
</eg>
</example>
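
<p>As a non-normative illustration, a processor could read such a codecMap element into a pair of qualified names. The sketch below uses Python and its standard <code>xml.etree.ElementTree</code> module; the function name <code>read_codec_map</code> is an assumption of this example, and the namespace of the codecMap element itself is deliberately not checked.</p>
<example>
<head>Sketch: reading a codecMap element into a pair of qualified names</head>
<eg xml:space="preserve"><![CDATA[
```python
import xml.etree.ElementTree as ET

def read_codec_map(xml_text):
    """Return (schema type, representation) as '{namespace}local' strings."""
    root = ET.fromstring(xml_text)
    # Compare the local name only; the namespace of codecMap is not assumed here.
    if root.tag.rsplit("}", 1)[-1] != "codecMap":
        raise ValueError("expected a codecMap element")
    children = list(root)
    if len(children) != 2:
        raise ValueError("codecMap must have exactly two child elements")
    # First child: the schema type. Second child: the built-in type or CODEC.
    return children[0].tag, children[1].tag

XSD = "http://www.w3.org/2001/XMLSchema"
example = (
    '<codecMap xmlns:xsd="' + XSD + '">'
    "<xsd:decimal/><xsd:string/>"
    "</codecMap>"
)
schema_type, representation = read_codec_map(example)
```
]]></eg>
</example>
<p>ElementTree reports each qualified name in the form <code>{namespace}local</code>, so the first name of the pair identifies xsd:decimal and the second identifies xsd:string.</p>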

<note>

EXI only defines a way to indicate the use of user-defined CODECs for representing values of specific types. CODECs that are assigned to types by QName are universally available only if the QName identifies one of the built-in EXI datatypes. For CODECs identified by other QNames, EXI neither provides nor suggests a method by which they are identified and shared between EXI processors. The use of user-defined CODECs should therefore be restrained, weighing the alternatives and the consequences of each, in order to avoid an unruly proliferation of documents that use custom CODECs. Applications that find Pluggable CODECS useful should make sure that they exchange such documents only among parties that are known in advance, or have been discovered, to be able to process the user-defined CODECs in use. If it is not certain that a recipient understands the particular user-defined CODECs, the sender should not send documents that use them to that recipient.

</note>

</div2>

</div1>

<div1 id="grammars">
<head>EXI Grammars</head>
<p>EXI is a knowledge-based encoding that uses a set of grammars to
determine which events are most likely to occur at any given point in
an EXI stream and encodes the most likely alternatives in fewer
bits. It does this by mapping the stream of events to a lower-entropy
set of representative values and encoding those values using a set of
simple variable-length codes or an EXI compression algorithm. </p>
<p>The result is a very simple, small algorithm that uniformly handles
schema-less encoding, schema-informed encoding, schema deviations,
and any combination thereof in EXI streams. These variations do
not require different algorithms or different parsers; they are simply
informed by different combinations of grammars. </p>
<p>The following sections describe the grammars used to inform the EXI encoding. </p>
<!-- note>The grammars in this specification are intentionally permissive. They accept all valid documents, but also accept several invalid documents. </note -->
<note>The grammar semantics in this specification are written for clarity and generality. They do not prescribe a particular implementation approach. </note>
<div2 id="grammarNotation">
<head>Grammar Notation</head>
<!-- 
<p>In this specification, all terminal symbols are represented in plain text and all non- terminal symbols are represented in 
<emph>italics</emph>. Grammar productions are represented as follows: </p>
<table width="100%">
<tbody>
<tr>
<td width="5%"></td>
<td>
<emph>LeftHandSide</emph> : 
<emph>RightHandSide</emph></td></tr></tbody></table>
<p>A set of one or more grammar productions that share the same left-hand-side non- terminal symbol may be represented as follows: </p>
<table width="100%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="2">
<emph>LeftHandSide</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td>
<emph>RightHandSide 
<sub>1</sub></emph> 
<emph>RightHandSize 
<sub>2</sub></emph></td></tr>
<tr>
<td></td>
<td></td>
<td>
<emph>RightHandSide 
<sub>3</sub></emph></td></tr>
<tr>
<td></td>
<td></td>
<td>...</td></tr>
<tr>
<td></td>
<td></td>
<td>
<emph>RightHandSide 
<sub>n</sub></emph></td></tr></tbody></table -->

<div3 id="fixedEventCodes">
<head>Fixed Event Codes</head>
<p>Each grammar production has an <termref def="key-eventcode">event code</termref>, which is represented by a sequence of one to three parts separated by periods (&quot;.&quot;). Each part is an unsigned integer. The following are examples of grammar productions with event codes as they appear in this specification. </p>
<example>
<head>Example productions with fixed event codes</head>

<table width="95%">
<thead>
<tr>
<th colspan="3" align="left">Productions</th>
<th align="left">Event Codes</th></tr>
</thead>
<tbody>
<tr>
<td>&nbsp;</td></tr>
<tr>
<td width="5%"></td>
<td colspan="4">
<emph>LeftHandSide <sub>1</sub></emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">
Event <sub>1</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>1</sub></emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>2</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>2</sub></emph></td>
<td>1</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>3</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>3</sub></emph></td>
<td>2.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>4</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>4</sub></emph></td>
<td>2.1</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>5</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>5</sub></emph></td>
<td>2.2.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>6</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>6</sub></emph></td>
<td>2.2.1</td></tr>
<tr>
<td colspan="5">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="4">
<emph>LeftHandSide <sub>2</sub></emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>1</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>1</sub></emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>2</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>2</sub></emph></td>
<td>1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>3</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>3</sub></emph></td>
<td>1.1</td></tr></tbody></table>
</example>
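<p>As a non-normative illustration, an event code can be held as a tuple of unsigned integers, one per part. The Python sketch below parses the event codes of <emph>LeftHandSide <sub>1</sub></emph> from the example above and checks that no two of them are equal; the function names <code>parse_event_code</code> and <code>check_unique</code> are assumptions of this example.</p>
<example>
<head>Sketch: event codes as tuples of parts</head>
<eg xml:space="preserve"><![CDATA[
```python
def parse_event_code(text):
    """Parse an event code such as '2.2.1' into a tuple of unsigned integers."""
    parts = tuple(int(p) for p in text.split("."))
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError("an event code has one to three unsigned integer parts")
    return parts

def check_unique(codes):
    """Raise if two productions with the same left-hand side share an event code."""
    seen = set()
    for code in codes:
        if code in seen:
            raise ValueError("duplicate event code: " + ".".join(map(str, code)))
        seen.add(code)

left_hand_side_1 = [
    parse_event_code(c) for c in ("0", "1", "2.0", "2.1", "2.2.0", "2.2.1")
]
check_unique(left_hand_side_1)
# The number of parts of each event code.
lengths = [len(code) for code in left_hand_side_1]
```
]]></eg>
</example>
<p>For <emph>LeftHandSide <sub>1</sub></emph>, the numbers of parts come out as 1, 1, 2, 2, 3 and 3.</p>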
<p>The number of parts in a given event code is called the event code's length. No two productions with the same non-terminal symbol on the left-hand-side are permitted to have the same event code. </p></div3>
<div3 id="variableEventCodes">
<head>Variable Event Codes</head>
<p>Some non-terminal symbols are used on the right-hand-side of a production without an event prefixed to them. Such non-terminal symbols are macros: each captures a recurring set of productions in a single symbol, so that the grammar representation can use the symbol instead of repeating every production the macro represents each time it is used.
</p>

<example>
<head>Example productions that use macro non-terminal symbols</head>
<table width="95%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>ABigProduction <sub>1</sub></emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">
Event <sub>1</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>1</sub></emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>2</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>2</sub></emph></td>
<td>1</td></tr>
<tr>
<td></td>
<td></td>
<td>
<emph>LEFTHANDSIDE <sub>1</sub></emph> (2.0)</td>
<td>2.0</td></tr>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="3">
<emph>ABigProduction <sub>2</sub></emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>1</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>1</sub></emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>
<emph>LEFTHANDSIDE <sub>1</sub></emph> (1.1)</td>
<td>1.1</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>2</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>2</sub></emph></td>
<td>1.2</td></tr>
</tbody></table>

</example>

<p>
Because non-terminal macros are injected into the right-hand-side of more than one production,
the event codes of productions with these macro non-terminals on the left-hand-side are not fixed; their values depend on the context in which the macro non-terminal appears. This specification calls these variable event codes and uses variables in place of individual event code parts to indicate that those parts are determined by the context. Below are some examples of variable event codes: </p>
<example>
<head>Example non-terminal macro and its productions with variable event codes</head>

<table width="95%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="4">
<emph>LEFTHANDSIDE <sub>1</sub> (n.m)</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">
EVENT <sub>1</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>1</sub></emph></td>
<td>
<emph>n</emph>.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>2</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>2</sub></emph></td>
<td>
<emph>n</emph>.1</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>3</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>3</sub></emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+2)</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>4</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>4</sub></emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+3)</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>5</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>5</sub></emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+4).0</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>6</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>6</sub></emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+4).1</td></tr>
</example>
<p>Unless otherwise specified, the variable <emph>n</emph> evaluates to the event code of the production in which the macro non-terminal <emph>LEFTHANDSIDE <sub>1</sub></emph> appears on the right-hand-side. Similarly, the expression <emph>n</emph>.<emph>m</emph> represents the first two parts of the event code of the production in which the macro non-terminal <emph>LEFTHANDSIDE <sub>1</sub></emph> appears on the right-hand-side. </p>
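<p>As a non-normative illustration, evaluating the variable event codes of <emph>LEFTHANDSIDE <sub>1</sub></emph> amounts to substituting concrete values for <emph>n</emph> and <emph>m</emph>. The Python sketch below transcribes the six event codes of the example macro literally; the function name <code>expand_lefthandside_1</code> is an assumption of this example.</p>
<example>
<head>Sketch: evaluating the variable event codes of the example macro</head>
<eg xml:space="preserve"><![CDATA[
```python
def expand_lefthandside_1(n, m):
    """Evaluate the event codes of LEFTHANDSIDE 1 (n.m) for concrete n and m."""
    return [
        ("EVENT 1", (n, 0)),
        ("EVENT 2", (n, 1)),
        ("EVENT 3", (n, m + 2)),
        ("EVENT 4", (n, m + 3)),
        ("EVENT 5", (n, m + 4, 0)),
        ("EVENT 6", (n, m + 4, 1)),
    ]

# The macro appears with (2.0) in ABigProduction 1 and with (1.1) in ABigProduction 2.
in_production_1 = expand_lefthandside_1(2, 0)
in_production_2 = expand_lefthandside_1(1, 1)
```
]]></eg>
</example>
<p>With <emph>n</emph>.<emph>m</emph> = 2.0 the codes are 2.0, 2.1, 2.2, 2.3, 2.4.0 and 2.4.1; with <emph>n</emph>.<emph>m</emph> = 1.1 they are 1.0, 1.1, 1.3, 1.4, 1.5.0 and 1.5.1.</p>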

<p>Non-terminal macros are used in this specification for notational convenience only.
They are not non-terminals, even though they are used in place of non-terminals.
Productions that use non-terminal macros on the right-hand-side need to be expanded by macro substitution before such productions are interpreted.
Therefore, <emph>ABigProduction <sub>1</sub></emph> and <emph>ABigProduction <sub>2</sub></emph> shown in the preceding example are equivalent to the following set of productions derived by expanding the non-terminal macro symbol <emph>LEFTHANDSIDE 
<sub>1</sub></emph> and evaluating the variable event codes.
</p>
<example>
<head>Expanded productions equivalent to the productions used above</head>

<table width="95%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="4">
<emph>ABigProduction <sub>1</sub></emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td>
Event <sub>1</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>1</sub></emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>2</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>2</sub></emph></td>
<td>1</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>1</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>1</sub></emph></td>
<td>2.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>2</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>2</sub></emph></td>
<td>2.1</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>3</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>3</sub></emph></td>
<td>2.2</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>4</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>4</sub></emph></td>
<td>2.3</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>5</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>5</sub></emph></td>
<td>2.4.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>6</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>6</sub></emph></td>
<td>2.4.1</td></tr>
<tr>
<td colspan="5">&nbsp;</td></tr>


<tr>
<td width="5%"></td>
<td colspan="4">
<emph>ABigProduction <sub>2</sub></emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>1</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>1</sub></emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td width="75%">
EVENT <sub>1</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>1</sub></emph></td>
<td>1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>2</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>2</sub></emph></td>
<td>1.1</td></tr>
<tr>
<td></td>
<td></td>
<td>
Event <sub>2</sub>&nbsp;&nbsp;
<emph>NonTerminal <sub>2</sub></emph></td>
<td>1.2</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>3</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>3</sub></emph></td>
<td>1.3</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>4</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>4</sub></emph></td>
<td>1.4</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>5</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>5</sub></emph></td>
<td>1.5.0</td></tr>
<tr>
<td></td>
<td></td>
<td>
EVENT <sub>6</sub>&nbsp;&nbsp;<emph>NONTERMINAL 
<sub>6</sub></emph></td>
<td>1.5.1</td></tr>

</tbody></table>

</example></div3>
<!-- div3 id="productionBag">
<head>Production Bag</head>
<p>Some non-terminal symbols are used on the right-hand-side in a production with a pseudo event code of the form of a range (0 ... <emph>n</emph>). Such a non-terminal symbol represents a bag of productions where variable <emph>n</emph> used in the pseudo event code denotes the number of productions in the bag, and is used without an event prefixed to them.
</p>

<example>
<head>Example use of a production bag</head>
<table width="95%">
<thead>
<tr>
<th align="left" colspan="3">&nbsp;</th>
<th align="left">Event Code</th></tr>
</thead>
<tbody>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>ABigProduction</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%"><emph>ProductionBag</emph></td>
<td>0 ... (<emph>n</emph>-1)</td></tr>
<tr>
<td></td>
<td></td>
<td>
<emph>LEFTHANDSIDE (n)</emph></td>
<td><emph>n</emph></td></tr>

<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="3">
<emph>LEFTHANDSIDE (n)</emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>1</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>1</sub></emph></td>
<td><emph>n</emph></td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>2</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>2</sub></emph></td>
<td><emph>n</emph>+1</td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>3</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>3</sub></emph></td>
<td>(<emph>n</emph>+2).0</td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>4</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>4</sub></emph></td>
<td>(<emph>n</emph>+2).1</td></tr>
</tbody></table>
</example>

<p>The content of a production bag can be either static or dynamic. A static bag contains a fixed set of productions throughout its life cycle whereas a dynamic bag grows while processing an EXI stream. Production bags are used in this specification for notational convenience only. They are not non-terminals, even though they are used in place of non-terminals. Productions that use production bags on the right-hand-side need to be expanded by substituting the bags with their content before such productions are interpreted.
</p>

<p>The grammar <emph>ABigProduction</emph> shown in the preceding example is equivalent to the following set of productions when the production bag <emph>ProductionBag</emph> contains productions that have <emph>RightHandSide <sub>1</sub></emph>, <emph>RightHandSide <sub>2</sub></emph> and <emph>RightHandSide <sub>3</sub></emph> as the right-hand-side with respective event code 0, 1 and 2.
</p>

<example>
<head>Expanded productions equivalent to the productions used above</head>
<table width="95%">
<thead>
<tr>
<th align="left" colspan="3">&nbsp;</th>
<th align="left">Event Code</th></tr>
</thead>
<tbody>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>ABigProduction</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%"><emph>RightHandSide <sub>1</sub></emph></td>
<td><emph>0</emph></td></tr>
<tr>
<td></td>
<td></td>
<td><emph>RightHandSide <sub>2</sub></emph></td>
<td><emph>1</emph></td></tr>
<tr>
<td></td>
<td></td>
<td><emph>RightHandSide <sub>3</sub></emph></td>
<td><emph>2</emph></td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>1</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>1</sub></emph></td>
<td><emph>3</emph></td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>2</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>2</sub></emph></td>
<td>4</td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>3</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>3</sub></emph></td>
<td>5.0</td></tr>
<tr>
<td></td>
<td></td>
<td>EVENT <sub>4</sub>&nbsp;&nbsp;<emph>NONTERMINAL <sub>4</sub></emph></td>
<td>5.1</td></tr>
</tbody></table>
</example>

</div3 -->
</div2>
<div2 id="grammarEventCodes">
<head>Grammar Event Codes</head>
<p>Each production rule in the EXI grammar includes an event code value that approximates the likelihood that the associated production rule will be matched over the other productions with the same left-hand-side non-terminal symbol. Ultimately, the event codes determine the value(s) by which each non-terminal symbol will be represented in the EXI stream. </p>
<p>To understand how a given event code approximates the likelihood that a given production will be matched, it is useful to visualize the event codes for a set of production rules that have the same non-terminal symbol on the left-hand-side as a tree. For example, the following set of productions: </p>
<example>
<head>Example productions with event codes</head>

<table width="95%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="4">
<emph>ElementContent</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">EE</td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>SE (*) 
<emph>ElementContent</emph></td>
<td>1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>CH 
<emph>ElementContent</emph></td>
<td>1.1</td></tr>
<tr>
<td></td>
<td></td>
<td>ER 
<emph>ElementContent</emph></td>
<td>1.2</td></tr>
<tr>
<td></td>
<td></td>
<td>CM 
<emph>ElementContent</emph></td>
<td>1.3.0</td></tr>
<tr>
<td></td>
<td></td>
<td>PI 
<emph>ElementContent</emph></td>
<td>1.3.1</td></tr></tbody></table></example>
<p>represents a set of information items that might occur as element content after the start tag. Using the production event codes, we can visualize this set of productions as follows: </p>
<graphic source="eventCodeTree.png" alt="Event code tree for ElementContent grammar"/>
<p>where the non-terminal symbols are represented by the leaf nodes of the tree and the event code of each production rule that contains a non-terminal symbol defines a path from the root of the tree to the node associated with that symbol. We call this the event code tree for a given set of productions. </p>
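<p>As a non-normative illustration, an event code tree can be built by treating each part of an event code as a branch selector. The Python sketch below represents the tree as nested dictionaries and labels each leaf with the event of its production; this representation and the function name <code>build_event_code_tree</code> are assumptions of this example.</p>
<example>
<head>Sketch: building an event code tree from a set of productions</head>
<eg xml:space="preserve"><![CDATA[
```python
def build_event_code_tree(productions):
    """Build nested dicts: each event code part selects a branch, leaves hold a label."""
    tree = {}
    for label, code in productions:
        node = tree
        for part in code[:-1]:
            # Descend, creating intermediate nodes as needed.
            node = node.setdefault(part, {})
        node[code[-1]] = label
    return tree

element_content = [
    ("EE", (0,)),
    ("SE (*)", (1, 0)),
    ("CH", (1, 1)),
    ("ER", (1, 2)),
    ("CM", (1, 3, 0)),
    ("PI", (1, 3, 1)),
]
tree = build_event_code_tree(element_content)
```
]]></eg>
</example>
<p>In the resulting tree, EE hangs directly off the root, SE (*), CH and ER sit one level down, and CM and PI sit at the third level.</p>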
<p>An event code tree is similar to a Huffman tree <bibref ref="huffman"/> in that shorter paths are generally used for symbols that are considered more likely. However, event code trees are far simpler and less costly to compute and maintain. Event code trees are shallow and contain at most three levels. In addition, the length of each event code in the event code tree is assigned statically without analyzing the data. This classification provides some of the benefits of a Huffman tree without the cost. </p></div2>
<div2 id="pruningProductions">
<head>Pruning Unneeded Productions</head>
<p>As discussed in section 
<specref ref="fidelityOptions"/>, applications MAY provide a set of fidelity options to specify the XML features they require. EXI processors MUST use these fidelity options to prune the events that are not required from the grammars, improving compactness and processing efficiency.</p>
<p>For example, the following set of productions represents the set of information items that might occur as element content after the start tag.</p>
<example>
<head>Example productions with full fidelity</head>

<table width="95%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>ElementContent</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">EE</td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>SE (*) 
<emph>ElementContent</emph></td>
<td>1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>CH 
<emph>ElementContent</emph></td>
<td>1.1</td></tr>
<tr>
<td></td>
<td></td>
<td>ER 
<emph>ElementContent</emph></td>
<td>1.2</td></tr>
<tr>
<td></td>
<td></td>
<td>CM 
<emph>ElementContent</emph></td>
<td>1.3.0</td></tr>
<tr>
<td></td>
<td></td>
<td>PI 
<emph>ElementContent</emph></td>
<td>1.3.1</td></tr></tbody></table>
</example>
<p>If an application sets the fidelity options preserve.comments, preserve.pis and preserve.dtd to false, the productions matching comment (CM), processing instruction (PI) and entity reference (ER) events are pruned from the grammar, producing the following set of productions: </p>
<example>
<head>Example productions after pruning</head>

<table width="95%">
<tbody>
<tr>
<td width="5%"></td>
<td colspan="4">
<emph>ElementContent</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="75%">EE</td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>SE (*) 
<emph>ElementContent</emph></td>
<td>1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>CH 
<emph>ElementContent</emph></td>
<td>1.1</td></tr></tbody></table>
</example>
<p>Removing these productions from the grammar tells EXI processors that comments, processing instructions and entity references will never occur in the EXI stream, which reduces the entropy of the stream and allows it to be encoded in fewer bits. </p>
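<p>As a non-normative illustration, pruning can be sketched as removing the unwanted productions and then renumbering the remaining sibling event code parts so that they stay contiguous. The Python sketch below makes two assumptions that this specification does not state: productions are identified by their event label, and a branch that is left with fewer children keeps its depth rather than being collapsed. The function names <code>renumber</code> and <code>prune</code> are likewise assumptions of this example.</p>
<example>
<head>Sketch: pruning productions and keeping event codes contiguous</head>
<eg xml:space="preserve"><![CDATA[
```python
def renumber(codes):
    """Map each kept event code to a code whose sibling parts are contiguous."""
    new_codes = {}

    def walk(group, depth, prefix):
        # `group` holds sorted codes that share the same parts before `depth`.
        index_of = {}
        for code in group:
            index_of.setdefault(code[depth], len(index_of))
        for part, index in index_of.items():
            members = [c for c in group if c[depth] == part]
            new_prefix = prefix + (index,)
            for c in members:
                if len(c) == depth + 1:
                    new_codes[c] = new_prefix
            deeper = [c for c in members if len(c) > depth + 1]
            if deeper:
                walk(deeper, depth + 1, new_prefix)

    walk(sorted(codes), 0, ())
    return new_codes

def prune(productions, unwanted_events):
    """Drop the unwanted productions, then renumber the ones that remain."""
    kept = [(event, code) for event, code in productions
            if event not in unwanted_events]
    new_codes = renumber([code for _, code in kept])
    return [(event, new_codes[code]) for event, code in kept]

element_content = [
    ("EE", (0,)), ("SE (*)", (1, 0)), ("CH", (1, 1)),
    ("ER", (1, 2)), ("CM", (1, 3, 0)), ("PI", (1, 3, 1)),
]
pruned = prune(element_content, {"ER", "CM", "PI"})
without_er = prune(element_content, {"ER"})
```
]]></eg>
</example>
<p>Pruning ER, CM and PI leaves EE, SE (*) and CH with the event codes 0, 1.0 and 1.1, which are already contiguous. Pruning only ER leaves a gap at 1.2, so in this sketch CM and PI move from 1.3.0 and 1.3.1 to 1.2.0 and 1.2.1.</p>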
<p>Each time a production is removed from a grammar, the event codes of the other productions with the same non-terminal symbol on the left-hand-side MUST be adjusted to keep them contiguous if its removal has left the remaining productions with non-contiguous event codes.</p></div2>
<div2 id="builtinGrammars">
<head>Built-in XML Grammars</head>
<p>This section describes the built-in XML grammar used by EXI when no additional information is available to describe the contents of the EXI stream. The built-in XML grammar is used when no schema exists, for elements with unrestricted types (e.g., xsd:anyType) and for schema extensions and deviations that are not declared by the schema. </p>
<p>A built-in XML grammar is self-evolving. While an EXI stream is processed, the grammar continuously incorporates what it learns from the stream, refining itself for subsequent use within the processing of that same stream.</p>
<div3 id="builtinDocGrammars">
<head>Built-in Document Grammar</head>
<p>In the absence of additional information about the content of the EXI stream, the following grammar describes the events that will occur in an EXI document. </p>
<table width="100%">
<tbody>
<tr>
<th align="left" colspan="3">Syntax</th>
<th align="left">Event Code</th></tr>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>Document</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="60%">SD 
<emph>DocContent</emph></td>
<td width="30%">0</td></tr>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="3">
<emph>DocContent</emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>SE (*) 
<emph>DocEnd</emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>DT 
<emph>DocContent</emph></td>
<td>1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>CM 
<emph>DocContent</emph></td>
<td>1.1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>PI 
<emph>DocContent</emph></td>
<td>1.1.1</td></tr>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="3">
<emph>DocEnd</emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>ED</td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>CM 
<emph>DocEnd</emph></td>
<td>1.0</td></tr>
<tr>
<td></td>
<td></td>
<td>PI 
<emph>DocEnd</emph></td>
<td>1.1</td></tr></tbody></table>
<p></p>
<table>
<tbody>
<tr>
<th align="left">Semantics:</th></tr></tbody></table>
<p>All productions in the built-in Document grammars of the form 
<emph>LeftHandSide</emph> : SE (*) <emph>RightHandSide</emph>
are evaluated as follows: </p>
<olist>
<item>Let <emph>qname</emph> be the qualified name of the element matched by SE (*) </item>
<item>If a grammar does not exist for element 
<emph>qname</emph>, create one based on the <termref def="key-builtinElementGrammar">Built-in Element Grammar</termref></item>
<item>Evaluate the element contents using a built-in grammar for element <emph>qname</emph></item>
<item>Evaluate the remainder of the event sequence using <emph>RightHandSide</emph>.</item>
</olist>
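<p>As a non-normative illustration, steps 2 and 3 amount to a look-up that creates a grammar only on the first occurrence of a qualified name. In the Python sketch below, the class name <code>ElementGrammars</code>, the representation of a qualified name as a (namespace, local name) pair and the placeholder grammar factory are all assumptions of this example.</p>
<example>
<head>Sketch: one built-in element grammar instance per qualified name</head>
<eg xml:space="preserve"><![CDATA[
```python
class ElementGrammars:
    """Keeps one evolving grammar instance per element qualified name."""

    def __init__(self, new_built_in_grammar):
        # Factory that returns a fresh built-in element grammar.
        self._new = new_built_in_grammar
        self._by_qname = {}

    def grammar_for(self, qname):
        # Step 2: create a grammar on the first occurrence of qname only.
        if qname not in self._by_qname:
            self._by_qname[qname] = self._new()
        # Step 3: later occurrences reuse the same instance.
        return self._by_qname[qname]

# Placeholder factory: a real one would return the initial productions of the
# built-in element grammar.
grammars = ElementGrammars(lambda: {"StartTagContent": [], "ElementContent": []})
first = grammars.grammar_for(("http://www.example.com/Geometry", "surface"))
again = grammars.grammar_for(("http://www.example.com/Geometry", "surface"))
other = grammars.grammar_for(("", "note"))
```
]]></eg>
</example>
<p>Here <code>first</code> and <code>again</code> refer to the same grammar instance, while <code>other</code> is a separate instance for a different qualified name.</p>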
</div3>
<div3 id="builtinFragGrammars">
<head>Built-in Fragment Grammar</head>
<p>In the absence of additional information about the contents of an EXI stream, the following grammar describes the events that will occur in an EXI fragment. The grammar shown below is the initial set of productions of a built-in fragment grammar at the start of stream processing. The semantic description that follows it gives the rules by which the grammar evolves, so that it is better prepared for subsequent uses during the rest of the processing of the stream.</p>

<table width="100%">
<thead>
<tr>
<th align="left" colspan="3">Syntax</th>
<th align="left">Event Code</th></tr>
</thead>
<tbody>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>Fragment</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="60%">SD 
<emph>FragmentContent</emph></td>
<td width="30%">0</td></tr>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="3">
<emph>FragmentContent</emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>SE (*) 
<emph>FragmentContent</emph></td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>ED</td>
<td>1</td></tr>
<tr>
<td></td>
<td></td>
<td>CM 
<emph>FragmentContent</emph></td>
<td>2.0</td></tr>
<tr>
<td></td>
<td></td>
<td>PI 
<emph>FragmentContent</emph></td>
<td>2.1</td></tr>
</tbody></table>
<p></p>
<table>
<tbody>
<tr>
<th align="left">Semantics:</th></tr></tbody></table>
<p>All productions in the built-in Fragment grammars of the form 
<emph>LeftHandSide</emph> : SE (*) <emph>RightHandSide</emph>
are evaluated as follows: </p>
<olist>
<item>Let <emph>qname</emph> be the qualified name of the element matched by SE (*) </item>
<item>If a grammar does not exist for element 
<emph>qname</emph>, create one based on the <termref def="key-builtinElementGrammar">Built-in Element Grammar</termref></item>
<item>Evaluate the element contents using a built-in grammar for element <emph>qname</emph></item>
<item>Create a production of the form <emph>LeftHandSide</emph> : SE (<emph>qname</emph>) <emph>RightHandSide</emph> with an event code 0</item>
<item>Increment the first part of the event code of each production in the current grammar with the non-terminal <emph>LeftHandSide</emph> on the left hand side.</item>
<item>Add the production created in step 4 to the grammar</item>
<item>Evaluate the remainder of the event sequence using <emph>RightHandSide</emph>.</item>
</olist>

<p>All productions of the form <emph>LeftHandSide</emph> : SE (<emph>qname</emph>) <emph>RightHandSide</emph> that were previously added to the grammar upon the first occurrence of the element that has the qualified name <emph>qname</emph> are evaluated as follows when they are matched: </p>
<olist>
<item>Evaluate the element contents using a built-in grammar for element <emph>qname</emph></item>
<item>Evaluate the remainder of the event sequence using <emph>RightHandSide</emph>.</item>
</olist>
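
<p>As a non-normative illustration, steps 4 to 6 of the SE (*) semantics above can be sketched as a function that returns the evolved list of productions. The Python sketch below models a production as an (event, right-hand side, event code) triple; this representation and the function name <code>learn_start_element</code> are assumptions of this example.</p>
<example>
<head>Sketch: evolving FragmentContent after the first occurrence of an element</head>
<eg xml:space="preserve"><![CDATA[
```python
def learn_start_element(productions, qname, right_hand_side):
    """Add SE (qname) with event code 0 and increment the first part of all others."""
    shifted = [
        (event, rhs, (code[0] + 1,) + code[1:])
        for event, rhs, code in productions
    ]
    return [("SE (%s)" % qname, right_hand_side, (0,))] + shifted

fragment_content = [
    ("SE (*)", "FragmentContent", (0,)),
    ("ED", None, (1,)),
    ("CM", "FragmentContent", (2, 0)),
    ("PI", "FragmentContent", (2, 1)),
]
after_a = learn_start_element(fragment_content, "a", "FragmentContent")
after_b = learn_start_element(after_a, "b", "FragmentContent")
```
]]></eg>
</example>
<p>After the first occurrence of an element named a, SE (a) has the event code 0 and SE (*), ED, CM and PI have the event codes 1, 2, 3.0 and 3.1. After the first occurrence of an element named b, SE (b) takes the event code 0 and every other first part is incremented once more.</p>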

</div3>
<div3 id="builtinElemGrammars">
<head>Built-in Element Grammar</head>
<p><termdef id="key-builtinElementGrammar" term="Built-in Element Grammar">EXI defines a <term>built-in element grammar</term> that is used in the absence of additional information about the contents of an EXI element prior to its processing.</termdef> The built-in element grammar shown below is prescribed by EXI to reflect, in general terms, the events that can occur in an element and their order, without any further constraint on what is likely or unlikely to occur inside elements.</p>
<p>A single instance of the built-in element grammar is shared by those elements in a stream that have the same qualified name and have no additional a priori constraints on their content. A separate instance is assigned to each qualified name upon the first occurrence of an element with that name; thereafter, the instance continuously evolves to reflect the knowledge learned while processing the content of those elements. The grammar shown below is the initial set of productions of a built-in element grammar when a new instance is created. The semantic description that follows it gives the rules the grammar applies to itself in order to evolve and be better prepared for subsequent uses of the same instance during the rest of the processing of the stream.</p>
<table width="100%">
<thead>
<tr>
<th align="left" colspan="3">Syntax</th>
<th align="left">Event Code</th></tr>
</thead>
<tbody>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td width="5%"></td>
<td colspan="3">
<emph>StartTagContent</emph> :</td></tr>
<tr>
<td></td>
<td width="5%"></td>
<td width="60%">EE</td>
<td width="30%">0.0</td></tr>
<tr>
<td></td>
<td></td>
<td>AT (*) 
<emph>StartTagContent</emph></td>
<td>0.1</td></tr>
<tr>
<td></td>
<td></td>
<td>NS 
<emph>StartTagContent</emph></td>
<td>0.2</td></tr>
<tr>
<td></td>
<td></td>
<td>
<emph>ChildContentItems</emph> (0.3)</td>
<td></td></tr>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="3">
<emph>ElementContent</emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>EE</td>
<td>0</td></tr>
<tr>
<td></td>
<td></td>
<td>
<emph>ChildContentItems</emph> (1.0)</td>
<td></td></tr>
<tr>
<td colspan="4">&nbsp;</td></tr>
<tr>
<td></td>
<td colspan="3">
<emph>ChildContentItems (n.m)</emph> :</td></tr>
<tr>
<td></td>
<td></td>
<td>SE (*) <emph>ElementContent</emph></td>
<td>
<emph>n</emph>.<emph>m</emph></td></tr>
<tr>
<td></td>
<td></td>
<td>CH <emph>ElementContent</emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+1)</td></tr>
<tr>
<td></td>
<td></td>
<td>ER <emph>ElementContent</emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+2)</td></tr>
<tr>
<td></td>
<td></td>
<td>CM <emph>ElementContent</emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+3).0</td></tr>
<tr>
<td></td>
<td></td>
<td>PI <emph>ElementContent</emph></td>
<td>
<emph>n</emph>.(<emph>m</emph>+3).1</td></tr>
</tbody></table>
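<p>As a non-normative illustration, the table above can be expanded into concrete event codes by substituting the parameters <emph>n</emph> and <emph>m</emph> of <emph>ChildContentItems</emph>. The Python sketch below is not part of the format. The function names <code>child_content_items</code>, <code>new_builtin_element_grammar</code> and <code>format_code</code>, and the representation of a production as an (event, next non-terminal, event code) tuple, are assumptions made for this example only.</p>
<eg xml:space="preserve"><![CDATA[
```python
def child_content_items(n, m):
    """Expand ChildContentItems (n.m) into (event, next non-terminal, event code) tuples."""
    return [
        ("SE(*)", "ElementContent", (n, m)),
        ("CH", "ElementContent", (n, m + 1)),
        ("ER", "ElementContent", (n, m + 2)),
        ("CM", "ElementContent", (n, m + 3, 0)),
        ("PI", "ElementContent", (n, m + 3, 1)),
    ]


def new_builtin_element_grammar():
    """Return the initial productions of a fresh built-in element grammar instance.

    EE has no following non-terminal, which is modelled here as None
    (an assumption of this sketch).
    """
    return {
        "StartTagContent": [
            ("EE", None, (0, 0)),
            ("AT(*)", "StartTagContent", (0, 1)),
            ("NS", "StartTagContent", (0, 2)),
        ] + child_content_items(0, 3),
        "ElementContent": [
            ("EE", None, (0,)),
        ] + child_content_items(1, 0),
    }


def format_code(code):
    """Render an event code tuple in the dotted notation used by the table."""
    return ".".join(str(part) for part in code)
```
]]></eg>
<p>Under these assumptions, the SE (*) production of <emph>StartTagContent</emph> receives event code 0.3 and the PI production of <emph>ElementContent</emph> receives event code 1.3.1.</p>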
<p></p>
<table>
<tbody>
<tr>
<th align="left">Semantics:</th></tr></tbody></table>
<p>All productions in the built-in element grammar of the form 
<emph>LeftHandSide</emph> : AT (*) 
<emph>RightHandSide</emph> are evaluated as follows: </p>
<olist>
<item>Let 
<emph>qname</emph> be the qualified name of the attribute matched by AT (*) </item>
<item>Create a production of the form 
<emph>LeftHandSide</emph> : AT (<emph>qname</emph>) <emph>StartTagContent</emph>
with event code 0, and increment the first part of the event code of each production already in the current grammar that has the non-terminal <emph>LeftHandSide</emph> on its left-hand side</item>
<item>Add the production created in the previous step to the grammar</item>
<item>Evaluate the remainder of the event sequence using <emph>RightHandSide</emph>.</item>
</olist>
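<p>The steps above can be sketched, non-normatively, as follows. The function name <code>learn_attribute</code> and the representation of a grammar as a mapping from a non-terminal name to a list of (event, next non-terminal, event code) tuples are assumptions of this example and are not defined by the format.</p>
<eg xml:space="preserve"><![CDATA[
```python
def learn_attribute(grammar, lhs, qname):
    """Record that AT(*) matched an attribute named qname under non-terminal lhs.

    grammar maps a non-terminal name to a list of
    (event, next non-terminal, event code tuple) productions.
    """
    # Step 2: increment the first part of the event code of every
    # production that has lhs on its left-hand side.
    shifted = [
        (event, rhs, (code[0] + 1,) + code[1:])
        for event, rhs, code in grammar[lhs]
    ]
    # Steps 2 and 3: create AT(qname) StartTagContent with event code 0
    # and add it to the grammar.
    learned = ("AT(%s)" % qname, "StartTagContent", (0,))
    grammar[lhs] = [learned] + shifted
    # Step 4 (evaluating the remainder of the event sequence) is left
    # to the caller in this sketch.
    return learned
```
]]></eg>
<p>In this sketch, once an attribute has been learned, the EE production that initially had event code 0.0 has event code 1.0, and the learned attribute production has the one-part event code 0.</p>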
<p>All productions in the built-in element grammar of the form 
<emph>LeftHandSide</emph> : SE (*) <emph>RightHandSide</emph> are evaluated as follows: </p>
<olist>
<item>Let <emph>qname</emph> be the qualified name of the element matched by SE (*) </item>
<item>If a grammar does not exist for element 
<emph>qname</emph>, create one based on the <termref def="key-builtinElementGrammar">Built-in Element Grammar</termref></item>
<item>Evaluate the element contents using the built-in grammar for element <emph>qname</emph></item>
<item>Create a production of the form <emph>LeftHandSide</emph> : SE (<emph>qname</emph>) <emph>RightHandSide</emph> with event code 0</item>
<item>Increment the first part of the event code of each production in the current grammar that has the non-terminal <emph>LeftHandSide</emph> on its left-hand side.</item>
<item>Add the production created in step 4 to the grammar</item>
<item>Evaluate the remainder of the event sequence using <emph>RightHandSide</emph>.</item>
</olist>
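<p>The steps above can be sketched, non-normatively, as follows. The names <code>new_element_grammar</code> and <code>on_wildcard_start_element</code>, the abbreviated set of initial productions, and the use of a dictionary keyed by qualified name to hold grammar instances are all assumptions of this example. Evaluating the element contents (step 3) and the remainder of the event sequence (step 7) is left to the caller.</p>
<eg xml:space="preserve"><![CDATA[
```python
def new_element_grammar():
    """Return an abbreviated initial built-in element grammar (illustration only)."""
    return {
        "StartTagContent": [
            ("EE", None, (0, 0)),
            ("SE(*)", "ElementContent", (0, 3)),
        ],
        "ElementContent": [
            ("EE", None, (0,)),
            ("SE(*)", "ElementContent", (1, 0)),
            ("CH", "ElementContent", (1, 1)),
        ],
    }


def on_wildcard_start_element(grammars, current, lhs, rhs, qname):
    """Apply the SE(*) rules to the grammar `current` for an element named qname.

    grammars maps a qualified name to its shared grammar instance.
    Returns the grammar to use for the element contents and the
    non-terminal to continue with afterwards.
    """
    # Step 2: create a grammar for qname if none exists yet.
    if qname not in grammars:
        grammars[qname] = new_element_grammar()
    child = grammars[qname]
    # Step 3 (evaluating the element contents with `child`) is omitted.
    # Step 5: increment the first part of each event code under lhs.
    shifted = [
        (event, nxt, (code[0] + 1,) + code[1:])
        for event, nxt, code in current[lhs]
    ]
    # Steps 4 and 6: create SE(qname) rhs with event code 0 and add it.
    current[lhs] = [("SE(%s)" % qname, rhs, (0,))] + shifted
    return child, rhs
```
]]></eg>
<p>Because the grammar instances are looked up by qualified name, a later element with the same name reuses the instance created here, which matches the sharing described at the start of this section.</p>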
<p>All productions of the form <emph>LeftHandSide</emph> : SE (<emph>qname</emph>) <emph>RightHandSide</emph> that were previously added to the grammar upon the first occurrence of the element that has the qualified name <emph>qname</emph> are evaluated as follows when they are matched: </p>
<olist>
<item>E