Efficient XML Interchange (EXI) Format 1.0

1. Introduction

The Efficient XML Interchange (EXI) format is a very compact, high performance XML representation that was designed to work well for a broad range of applications. It simultaneously improves performance and significantly reduces bandwidth requirements without compromising efficient use of other resources such as battery life, code size, processing power, and memory.

EXI uses a grammar-driven approach that achieves very efficient encodings using a straightforward encoding algorithm and a small set of ~~data types.~~ datatype representations. Consequently, EXI processors are relatively simple and can be implemented on devices with limited capacity.

EXI is schema "informed", meaning that it can utilize available schema information to improve compactness and performance, but does not depend on accurate, complete or current schemas to work. It supports arbitrary schema extensions and deviations and also works very effectively with partial schemas or in the absence of any schema. The format itself also does not depend on any particular schema language, or format, for schema information.

[Definition:] A program module called an EXI processor , whether it is ~~part of a~~ software or a hardware, is used by application programs to encode their structured data into EXI streams and/or to decode EXI streams to make the structured data ~~accessible to them.~~ accessible. The former and ~~the~~ latter ~~of the~~ aforementioned roles of EXI processors are ~~each~~ called [Definition:] EXI stream encoder and [Definition:] EXI stream decoder . respectfully. This document not only specifies the EXI format, but also defines errors that EXI processors are required to detect and behave upon.

The primary goal of this document is to define the EXI format completely without leaving ambiguity so as to make it feasible for implementations to interoperate. As such, the document lends itself to describing the design and features of the format in a systematic manner, often declaratively with relatively few prosaic annotations and examples. Those readers who prefer a step-by-step introduction to the EXI format design and features are suggested to start with the non-normative [EXI Primer] .

1.1 History and Design

EXI is the result of extensive work carried out by the W3C's XML Binary Characterization (XBC) and Efficient XML Interchange (EXI) Working Groups. XBC was chartered to investigate the costs and benefits of an alternative form of XML, and formulate a way to objectively evaluate the potential of a substitute format for XML. Based on XBC's recommendations, EXI was chartered, first to measure, evaluate, and compare the performance of various XML technologies (using metrics developed by XBC [XBC Measurement Methodologies] ), and then, if it appeared suitable, to formulate a recommendation for a W3C format specification. The measurements results and analyses, are presented elsewhere [EXI Measurements Note] . The format described in this document is the specification so recommended.

The functional requirements of the EXI format are those that were prepared by the XBC WG in their analysis of the desirable properties of a high performance ~~encoding~~ representation for XML [XBC Properties] . Those properties were derived from a very broad set of use cases also identified by the XBC working group [XBC Use Cases] .

The design of the format presented here, is largely based on the results of the measurements carried out by the group to evaluate the performance characteristics (mainly of processing efficiency and compactness) of various existing formats. The EXI format is based on Efficient XML [Efficient XML] , including for example the basis heuristic grammar approach, compression algorithm, and resulting entropy encoding.

EXI is compatible with XML at the XML Information Set [XML Information Set] level, rather than at the XML syntax level. This permits it to encapsulate an efficient alternative syntax and grammar for XML, while facilitating at least the potential for minimizing the impact on XML application interoperability.

1.2 Notational Conventions and Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear EMPHASIZED in this document, are to be interpreted as described in RFC 2119 [IETF RFC 2119] . Other terminology used to describe the EXI format is defined in the body of this specification.

The term event and stream is used throughout this document to denote EXI event and EXI stream respectively unless the words are qualified differently to mean otherwise.

This document specifies an abstract grammar for EXI. In grammar notation, all terminal symbols are represented in plain text and all non-terminal symbols are represented in italics . Grammar productions are represented as follows:

LeftHandSide : ~~Event~~ Terminal NonTerminal

A set of one or more grammar productions that share the same ~~left-hand-side~~ left-hand side non-terminal symbol are often presented together ~~along~~ annotated with event codes that ~~uniquely identify~~ specify how events ~~among~~ matching the ~~collocated~~ terminal symbols of the associated productions are represented in the EXI stream as follows:

LeftHandSide :
	~~Event~~ Terminal _1 1 NonTerminal _1 1	EventCode _1 1
	~~Event~~ Terminal _2 2 NonTerminal _2 2	EventCode _2 2
	~~Event~~ Terminal _3 3 NonTerminal _3 3	EventCode _3 3
	...
	~~Event~~ Terminal _n n NonTerminal _n n	EventCode _n n

Section 8.1 Grammar Notation introduces additional notations for describing productions and event codes in grammars. Those additional notations ~~facilitates~~ facilitate concise representation of the EXI grammar system.

[Definition:] In this document, the term qname is used to denote a QName ^XS2. When used to qualify terminal symbols in grammars (see Table 4-1 for notation), to identify built-in element grammars (see 8.4.3 Built-in Element Grammar ) and global type grammars (see 8.5.4.1.3 Type Grammars ), or to distinguish value channels in EXI compression (see 9.2.2 Value Channels ), such uses of qname represent QName ~~values, which~~ values are ~~tuples~~ composed of ~~{ uri, local-name }. Otherwise, a qname represents a QName value affixed with a prefix part to make~~ a ~~triplet of { prefix,~~ uri, ~~local-name }, where the absence of prefix is indicated by "" (an empty string).~~ a localname and an optional prefix. Two qnames are considered equal ~~when~~ if they have the same uri and ~~the same local-name to each other~~ localname, regardless of their prefix values. In cases where prefixes are not relevant, such as in the grammar notation, they are not specified by this document.

Terminal symbols that are qualified with a qname permit the use of a wildcard symbol (*) in place of or as part of a qname. The forms of terminal symbols involving qname wildcards used in grammars and their definitions are described in the table below.

Wildcard	Definition
~~SE ()~~ SE ()	The terminal symbol that matches a start element (SE) event with any qname.
SE ( uri : *)	The terminal symbol that matches a start element (SE) event with any local-name in namespace uri .
~~AT ()~~ AT ()	The terminal symbol that matches an attribute (AT) event with any qname.
AT ( uri : *)	The terminal symbol that matches an attribute (AT) event with any local-name in namespace uri .

Several prefixes are used throughout this document to designate certain namespaces. The bindings shown below are assumed, however, any prefixes can be used in practice if they are properly bound to the namespaces.

Prefix	Namespace Name
exi	~~http://www.w3.org/2007/07/exi xml http://www.w3.org/XML/1998/namespace~~ http://www.w3.org/2009/exi
xsd	http://www.w3.org/2001/XMLSchema
xsi	http://www.w3.org/2001/XMLSchema-instance

In describing the layout of an EXI format construct, a pair of square brackets [ ] are used to surround the name of a field to denote that the occurrence of the field is optional in the structure of the part or component that contains the field.

In arithmetic expressions, the notation ⌈ x ⌉ where x represents a real number denotes the ceiling of x , that is, the smallest integer greater than or equal to x .

When it is stated that strings are sorted in lexicographical order, it is done so character by character, and the order among characters are determined by comparing their Unicode codepoints.

2. Design Principles

The following design principles were used to guide the development of EXI and encourage consistent design decisions. They are listed here to provide insight into the EXI design rationale and to anchor discussions on desirable EXI traits.

General:: One of primary objectives of EXI is to maximize the number of systems, devices and applications that can communicate using XML data. Specialized approaches optimized for specific use cases should be avoided.
Minimal:: To reach the broadest set of small, mobile and embedded applications, simple, elegant approaches are preferred to large, analytical or complex ones.
Efficient:: EXI must be competitive with hand-optimized binary formats so it can be used by applications that require this level of efficiency.
Flexible:: EXI must deal flexibly and efficiently with documents that contain arbitrary schema extensions or deviate from their schema. Documents that contain schema deviations should not cause encoding to fail.
Interoperable:: EXI must integrate well with existing XML technologies, minimizing the changes required to those technologies. It must be compatible with the XML Information Set [XML Information Set] , without significant subsetting or supersetting, in order to maintain interoperability with existing and prospective XML specifications.

3. Basic Concepts

EXI achieves broad generality, flexibility, and performance, by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur at any given point in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, EXI is optimized specifically for XML languages.

The built-in EXI ~~grammar accepts~~ grammars accept any XML document or fragment and may be augmented with productions derived from XML Schemas [XML Schema Structures] [XML Schema Datatypes] , RELAX NG schemas [ISO/IEC 19757-2:2003] , DTDs [XML 1.0] [XML 1.1] or other sources of information about what is likely to occur in a set of XML documents. The EXI stream encoder uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream of ~~events.~~ events .

The EXI stream encoder then represents the stream of events using a set of simple variable length codes called event codes . Event codes are similar to Huffman codes [Huffman Coding] , but are much simpler to compute and maintain. They are encoded directly as a sequence of values, or if additional compression is desired, they are passed to the EXI compression algorithm, which replaces frequently occurring event patterns to further reduce size.

When schemas are used, EXI also supports a user-customizable set of ~~typed encodings~~ Datatype Representations for efficiently ~~encoding~~ representing typed values.

4. EXI Streams

[Definition:] An EXI stream is an EXI header followed by an EXI body. [Definition:] ~~It is the~~ The EXI body ~~that~~ carries the content of the document, while the EXI header ~~amongst its roles~~ communicates the options ~~that were~~ used for encoding the EXI body. Section 5. EXI Header describes the EXI header . ~~Values in an EXI stream are packed into bytes most significant bit first.~~

[Definition:] The building block of an EXI body is an EXI event . An EXI body consists of a sequence of EXI events representing an EXI document or an EXI fragment .

The EXI events permitted at any given position in an EXI stream are determined by the EXI grammar. As is the case with XML, the events occur with nesting pairs of matching start element and end element events where any pair does not intersect with another except when it is fully contained in the other. The EXI grammar incorporates knowledge of the XML grammar and may be augmented and refined using schema information and fidelity options. The EXI grammar is formally specified in section 8. EXI Grammars .

~~The~~ An EXI ~~grammars either permits only~~ body can represent an EXI document with a single root element or ~~multiple root elements in~~ an EXI ~~body, depending on the top-level grammar used for processing the body.~~ fragment with zero or more root elements. [Definition:] EXI documents are EXI bodies ~~encoded using either~~ with a single root element that conform to the Built-in Document Grammar (See 8.4.1 Built-in Document Grammar ) or Schema-informed Document Grammar (See 8.5.1 Schema-informed Document Grammar ~~), and are inherently restricted to each contain only a single root element as per the grammars.~~ ). [Definition:] EXI fragments are EXI bodies ~~encoded using either~~ with zero or more root elements that conform to the Built-in Fragment Grammar (See 8.4.2 Built-in Fragment Grammar ) or Schema-informed Fragment Grammar (See 8.5.2 Schema-informed Fragment Grammar ~~), and are permitted to each contain multiple root elements.~~ ).

[Definition:] When schema information is available to describe the contents of an EXI body, such an EXI stream is a schema-informed EXI stream , and ~~either~~ the EXI body is interpreted according to the Schema-informed Document Grammar (See 8.5.1 Schema-informed Document Grammar ) or Schema-informed Fragment Grammar (See 8.5.2 Schema-informed Fragment Grammar ~~) is used to process the EXI body.~~ ). [Definition:] Otherwise, an EXI stream is a schema-less EXI stream , and ~~either~~ the EXI body is interpreted according to the Built-in Document Grammar (See 8.4.1 Built-in Document Grammar ) or Built-in Fragment Grammar (See 8.4.2 Built-in Fragment Grammar ~~) is used to process the EXI body.~~ ).

The following table summarizes the EXI ~~events~~ event types and associated event content that occur in an EXI stream. [Definition:] The content of an event consists of content items , and the content items appear in an EXI stream in the order they are shown in the ~~table.~~ table following their respective event codes that each marks the start of an event . In addition, the table includes the grammar notation used to represent each event in this specification. Each event in an EXI stream participates in a mapping system that relates events to XML Information Items so that an EXI document or an EXI fragment as a whole serves to represent an XML Information Set. The table shows XML Information Items relevant to each EXI ~~event type.~~ event. Appendix B Infoset Mapping describes the mapping system in detail.

Table 4-1. EXI ~~event types~~ events
EXI Event Type	Event Content (Content Items)	Grammar Notation (Terminal Symbols)	Information Item
Start Document		SD	B.1 Document Information Item
End Document		ED	B.1 Document Information Item
Start Element	qname	SE ( ( qname ) )	B.2 Element Information Items
		~~SE ( * )~~ SE ( * )
		~~SE (~~ SE ( uri : uri :* ) * )
End Element		EE
Attribute	qname, value	~~AT (~~ AT ( qname ) )	B.3 Attribute Information Item
		~~AT (~~ AT ( * )
		AT ( * uri : ) * )
Characters	value	CH	B.6 Character Information item
Namespace Declaration	uri , prefix , local-element-ns	NS	B.11 Namespace Information Item
Comment	text	CM	B.7 Comment Information item
Processing Instruction	name, text	PI	B.4 Processing Instruction Information Item
DOCTYPE	name, public, system, text	DT	B.8 Document Type Declaration Information item
Entity Reference	name	ER	B.5 Unexpanded Entity Reference Information item
Self Contained		SC

Section 6. Encoding EXI Streams describes the algorithm used to encode events in the EXI stream. As indicated in the table above, there are some event types that carry content with their event instances while other event types function as markers without content.

SE events may be followed by a series of NS events. Each NS event either associates a prefix with an URI, assigns a default namespace, or in the case of a namespace declaration with an empty URI, rescinds one of such associations in effect at the point of its occurrence. The effect of the association or disassociation caused by a NS event stays in effect until the corresponding EE event occurs.

Like XML, the namespace of a particular element may be specified by a namespace declaration ~~preceeding~~ preceding the element or a local namespace declaration following the element name. When the namespace is specified by a local namespace declaration, the local-element-ns flag of the associated NS event is set to true and the prefix of the element is set to the prefix of that NS event. When the namespace is specified by a previous namespace declaration, the local-element-ns flag of all local NS events is false and the prefix of the element is set according to the prefix component of the element qname . The series of NS events associated with a particular element may include at most one NS event with its local-element-ns flag set to true. The uri of a NS event with its local-element-ns flag set to true MUST match the uri of the associated SE event.

An SE event may be followed by a SC event, indicating the element is self-contained and can be read independently from the rest of the EXI body. Applications may use self-contained elements to index portions of the EXI body for random access.

The representation of event codes which identify the event type and start each event is described in 6.2 Representing Event Codes . Each item in the event content has a ~~data type~~ datatype representation associated with it as shown in the following table. The content of each ~~event,~~ event , if any, is encoded as a sequence of items each of which being encoded according to its ~~data type~~ datatype representation in order starting with the first item followed by subsequent items.

Table 4-2. ~~Data types~~ Datatype representations of event content items
Content item	Used in	~~Type~~ Datatype representation
name	PI, DT, ER	7.1.10 String
prefix	NS	7.1.10 String
local-element-ns	NS	7.1.2 Boolean
public	DT	7.1.10 String
qname	SE, AT	7.1.7 QName
system	DT	7.1.10 String
text	CM, PI	7.1.10 String
uri	NS	7.1.10 String
value	CH, AT	According to the schema ~~type~~ datatype (see 7. Representing Event Content ) if any is in effect, otherwise 7.1.10 String

~~Content items other than value have their inherent, fixed data types independent of their uses.~~ The ~~data type that governs~~ datatype representation used for each ~~occurrence of the~~ value content item depends on the schema ~~type~~ datatype ^XS2 if any that is in effect for ~~the~~ that value ~~in question.~~ . The ~~type xsd:anySimpleType~~ String datatype representation (see 7.1.10 String ) is used for ~~value~~ values s that do not have an associated ~~schema-type,~~ schema datatype, cannot be or are ~~schema-invalid,~~ opted not to be represented by their associated datatype representations, or occur in mixed content. Section 7. Representing Event Content describes how each of the types listed above are encoded in an EXI stream.

Editorial note

The syntax and semantics of the NS event ~~is so formulated in favor of simplicity in order not~~ are designed to ~~incur processing cost that would have otherwise be involved by such operations as sorting and conditional branching, yet that it keeps~~ minimize the ~~number of additional bits~~ overhead required ~~to achieve the functionality~~ for representing namespace prefixes in ~~overall~~ EXI streams ~~to the minimal with the observation that~~ without introducing significant complexity. In the ~~number of~~ common case where each namespace ~~declarations in an EXI stream~~ is ~~generally small.~~ bound to a single prefix, this design reduces the overhead for representing all element and attribute namespace prefixes to zero bits.

5. EXI Header

Each EXI stream begins with an EXI header. [Definition:] The EXI header can identify EXI streams, distinguish EXI streams from text XML documents, identify the version of the EXI format being used, and specify the options used to process the body of the EXI stream. The EXI header has the following structure:

[ EXI Cookie ]	Distinguishing Bits	Presence Bit	EXI Format	[ EXI Options ]	[Padding Bits]
		for EXI Options	Version

The EXI Options field within an EXI header is optional. Its presence is indicated by the value of the presence bit that follows Distinguishing Bits . The presence and absence is indicated by the value 1 and 0, respectively.

When ~~either~~ the compression option is ~~used,~~ true, or the alignment ~~used~~ option is ~~one of~~ byte-alignment or pre-compression ~~as dictated by EXI Options~~ , padding bits of minumum length required to make the whole length of the header byte-aligned are added at the end of the header. On the other hand, there are no padding bits when the alignment in use is bit-packed . The padding bits field if it is present can contain any values of bits as its contents.

The details of the EXI Cookie , Distinguishing Bits , EXI Format Version and EXI Options are described in the following sections.

5.1 EXI Cookie

[Definition:] An EXI header MAY start with an EXI Cookie , which is a four byte field that serves to indicate that the stream of which it is a part is an EXI stream. The four byte field consists of four characters " $ " , " E ", " X " and " I " in that order, each represented as an ASCII octet, as follows.

' $ '

' E '

' X '

' I '

This four byte sequence is particular to EXI and specific enough to distinguish EXI streams from a broad range of data types currently used on the Web. While the EXI cookie is optional, its use is RECOMMENDED in the EXI header when the EXI stream is exchanged in a context where a longer, more ~~solid~~ specific content-based datatype ~~identification~~ identifier is desired than ~~what is~~ that provided by the Distinguishing Bits , whose role is ~~rather~~ more narrowly focused on distinguishing EXI streams from XML documents.

5.2 Distinguishing Bits

[Definition:] The second part in the EXI header is the Distinguishing Bits , which is a two bit field of which the first bit contains the value 1 and the second bit contains the value 0, as follows.

Unlike the optional EXI cookie that MAY occur to precede this field, the presence of Distinguishing Bits is REQUIRED in the EXI header. It is used to distinguish EXI streams from text XML documents in the absence of an EXI cookie . This two bit sequence is the minimum that suffices to distinguish EXI streams from XML documents since it is the minimum length bit pattern that cannot occur as the first two bits of a well-formed XML document represented in any one of the conventional character encodings, such as UTF-8, UTF-16, UCS-2, UCS-4, EBCDIC, ISO 8859, Shift-JIS and EUC, according to XML ~~1.0~~ [XML 1.0] [XML 1.1] . Therefore, XML Processors that do not support EXI are expected to reject an EXI stream as early as they read and process the first byte from the stream.

Systems that use EXI streams as well as XML documents can reliably look at the Distinguishing Bits to determine whether to interpret a particular stream as XML or EXI.

5.3 EXI Format Version

[Definition:] The fourth part in the EXI header is the EXI Format Version , which identifies the version of the EXI format being used. EXI format version numbers are integers. Each version of the EXI Format Specification specifies the corresponding EXI format version number to be used by conforming implementations. The EXI format version number that corresponds with this version of the EXI format specification is ~~0 (zero).~~ 1 (one).

The first bit of the version field indicates whether the version is a preview or final version of the EXI format. A value of 0 indicates this is a final version and a value of 1 indicates this is a preview version. Final versions correspond to final, approved versions of the EXI format specification. An EXI processor that implements a final version of the EXI format specification is REQUIRED to process EXI streams that have a version field with its first bit set to 0 followed by a version number that corresponds to the version of the EXI specification the processor implements. The behavior of an EXI processor on an EXI stream with its first bit set to 0 followed by a version not corresponding to a version implemented by the processor is not constrained by this specification. For example, the EXI processor MAY reject such a stream outright or it MAY attempt to process the EXI body. Preview versions of the EXI format are useful for gaining implementation and deployment experience prior to finalizing a particular version of the EXI format. While preview versions may match drafts of this specification, they are not governed by this specification and the behaviour of EXI processors encountering preview versions of the EXI format is implementation dependent. Implementers are free to coordinate to achieve interoperability between different preview versions of the EXI format.

Following the first bit of the version is a sequence of one or more 4-bit unsigned integers representing the version number. The version number is determined by summing this sequence of 4-bit unsigned ~~values.~~ values and adding 1 (one). The sequence is terminated by any 4-bit unsigned integer with a value in the range 0-14. As such, the first 15 version numbers are represented by 4 bits, the next 15 are represented by 8 bits, etc.

Given an EXI stream with its stream cursor positioned just past the first bit of the EXI format version field, the EXI format version number can be computed by going through the following steps with version number initially set to 1.

Read next 4 bits as an unsigned integer value.
Add the value that was just read to the version number.
If the value is 15, go to step 1, otherwise (i.e. the value being in the range of 0-14), use the current value of the version number as the EXI version number.

The following are example EXI format version numbers.

Example 5-1. EXI Format Version Examples

EXI Format Version Field	Description
1 0000	Preview version 1
0 0000	Final version 1
0 1110	Final version 15
0 1111 0000	Final version 16
0 1111 0001	Final version 17

EXI processors conforming with the final version of this specification MUST use the 5-bit value 0 0000 as the version number.

5.4 EXI Options

[Definition:] The fifth part of the EXI header is the EXI Options , which provides a way to specify the options used to encode the body of the EXI stream . [Definition:] The EXI Options are represented as an EXI Options document , which is an XML document encoded using the EXI format described in this specification. This results in a very compact header format that can be read and written with very little additional software.

The presence of EXI Options in its entirety is optional in EXI header, and it is predicated on the value of the presence bit that follows the Distinguishing Bits . When EXI Options are present in the header, an EXI Processor MUST observe the specified options to process the EXI stream that follows. Otherwise, an EXI Procesor may obtain the EXI options using another mechanism. There are no fallback option values provided by this specification for use in the absence of the whole EXI Options part.

EXI processors MAY provide external means for applications or users to specify EXI Options when the EXI ~~header~~ Options document is absent. Such EXI processors are typically used in controlled systems where the knowledge about the effective EXI Options is shared prior to the exchange of EXI streams . The ~~mechanism~~ mechanisms to communicate out-of-bound EXI Options and their representation ~~used in such systems~~ are implementation dependent.

The following table describes the EXI options that may be specified in the ~~options field.~~ EXI Options document.

Table 5-1. EXI Options in Options ~~Field~~ Document
EXI Option	Description	Default Value
alignment	Alignment of event codes and content items	bit-packed
compression	EXI compression is used to achieve better compactness	false
strict	Strict interpretation of schemas is used to achieve better compactness	false
fragment	Body is encoded as an EXI fragment instead of an EXI document	false
preserve	Specifies whether comments, pis, etc. are preserved	all false
selfContained	Enables self-contained elements	false
schemaID	Identify the schema information, if any, used to encode the body	~~none~~ no default value
datatypeRepresentationMap	~~Identify~~ Specify alternate datatype representations ~~used to encode~~ for typed values in the EXI body	~~none~~ no default value
blockSize	Specifies the block size used for EXI compression	1,000,000
valueMaxLength	Specifies the maximum string length of value content items to be considered for addition to the string table.	unbounded
valuePartitionCapacity	Specifies the total capacity of value partitions in a string table	unbounded
[user ~~defined]~~ defined meta-data]	User defined ~~options~~ meta-data may be added	~~none~~ no default value

Appendix C XML Schema for EXI Options ~~Header~~ Document provides an XML Schema describing the EXI Options document . This schema is designed to produce smaller headers for option combinations used when compactness is critical.

The EXI Options document is ~~encoded~~ represented as an EXI body informed by the above mentioned schema using the default options specified by the following XML document. An EXI Options document consists only of an EXI body, and MUST NOT start with an EXI header. ~~<header xmlns="http://www.w3.org/2007/07/exi">~~

Header options used for encoding the EXI Options document

  <header xmlns="http://www.w3.org/2009/exi">
    <strict/>
  </header>

Note that this specification does not require EXI processors to read and process the schema prescribed for EXI options document ( C XML Schema for EXI Options Document ), in order to process EXI options documents. EXI processors MUST use the schema-informed grammars that stem from the schema in processing EXI options documents, beyond which there is no requirement as to the use of the schema, and implementations are free to use any methods to retrieve the instructions that observe the grammars for processing EXI options documents. Section 8.5 Schema-informed Grammars describes the system to derive schema-informed grammars from XML Schemas.

Below is a brief description of each EXI option.

[Definition:] The alignment option is used to control the alignment of event codes and content ~~items.~~ items . The value is one of bit-packed , byte-alignment or pre-compression , of which bit-packed is the default value assumed when the "alignment" element is absent in the EXI Options document . When the value of compression option is set to true, alignment of the ~~way event codes and associated contents are represented~~ EXI Body is governed by the ~~rule~~ rules specified in 9. EXI Compression instead of the alignment option ~~value, thus the compression option value "true" effectively rescinds the effect of an alignment option~~ value. The "alignment" element MUST NOT appear in an EXI options document when the "compression" element is present.

[Definition:] ~~Alignment~~ The alignment option value bit-packed indicates that the the event codes and associated content are packed in bits without any ~~paddings~~ padding in-between.

[Definition:] ~~Alignment~~ The alignment option value byte-alignment indicates that the event codes and associated content are aligned on byte boundaries. While byte-alignment generally results in EXI streams of larger sizes compared with their bit-packed equivalents, byte-alignment may provide a help in some use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in-place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.

[Definition:] ~~Alignment~~ The alignment option value pre-compression ~~alignment~~ indicates that all steps involved in compression (see section 9. EXI Compression ) are to be done with the exception of the final step of applying the DEFLATE algorithm. The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.

[Definition:] The compression option is a Boolean used to increase compactness using additional computational resources. The default value "false" is assumed when the "compression" element is absent in the EXI Options document . When set to true, the event codes and associated content are compressed according to 9. EXI Compression regardless of the alignment option value. As mentioned above, the "compression" element MUST NOT appear in an EXI options document when the "alignment" element is present.

[Definition:] The strict option is a Boolean used to increase compactness by using a strict interpretation of the schemas and omitting preservation of certain items, such as comments, processing instructions and namespace prefixes. The default value "false" is assumed when the "strict" element is absent in the EXI Options document . When set to true, those productions that have NS, CM, PI, ER ER, and SC ~~events~~ terminal symbols are ~~pruned~~ omitted from the EXI grammars, and schema-informed element and type grammars are restricted to only permit items declared in the schemas. Consequently, when the strict option is set to true, xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes are only permitted to appear in a schema-informed element where they match specific schema declarations (i.e., wildcards or ur-types). The "strict" element MUST NOT appear in an EXI options document when ~~the "preserve"~~ one of "dtd", "prefixes", "comments", "pis" or "selfContained" element is present in the same options document.

[Definition:] The fragment option is a Boolean that indicates whether the EXI body is an EXI document or an EXI fragment . When set to true, the EXI body is an EXI fragment . Otherwise, the EXI body is an EXI document . Unlike EXI documents, EXI fragments are capable of representing multiple elements at the root level. They are analogous in concept to external general parsed entities XML in XML in that they consist of a sequence of elements, processing instructions and comments in containers of their own that are physically separate from the documents in which they are to be used. An EXI fragment is formally defined in terms of its grammar in Sections 8.4.2 Built-in Fragment Grammar and 8.5.2 Schema-informed Fragment Grammar . The ~~XML Information Set an EXI stream~~ default value "false" is mapped onto contains a document information item if the stream represents an EXI document, otherwise, the XML Information Set does not have a document information item if the stream represents an EXI fragment. The order among elements, processing instructions and comments that appear at assumed when the ~~root in an EXI fragment~~ "fragment" element is ~~deemed significant and MUST be preserved by~~ absent in the EXI ~~processors~~ Options document .

[Definition:] The preserve option is a set of Booleans that can be set independently to control whether or how certain information items are preserved in the EXI stream. Section 6.3 Fidelity Options describes the set of information items ~~effected~~ affected by the preserve option. The ~~"preserve" element~~ elements "dtd", "prefixes", "comments" and "pis" MUST NOT appear in an EXI options document when the "strict" element is present in the same options document. The element "lexicalValues", on the other hand, is permitted to occur in the presence of "strict" element.

[Definition:] The selfContained option is a Boolean used to enable the use of ~~self contained~~ self-contained elements in the EXI stream. ~~Self contained~~ Self-contained elements may be read independently from the rest of the EXI body, allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in an EXI options document when ~~the "compression" or~~ one of "compression", "pre-compression" or "strict" elements are present in the same options document. The default value "false" is assumed when the "selfContained" element is abscent from the EXI Options document .

[Definition:] The schemaID option may be used to identify the schema information used ~~when encoding~~ for processing the EXI body. When the "schemaID" element in the EXI options document contains the xsi:nil ~~attribute,~~ attribute with its value set to true, no schema information ~~was~~ is used ~~when encoding~~ for processing the EXI body. When the value of the "schemaID" element is empty, no user defined schema information ~~was~~ is used ~~when encoding~~ for processing the EXI body; however, the built-in XML ~~Schema~~ schema types ~~may have been used with~~ are available for use in the ~~xsi:type attribute to specify element types.~~ EXI body. When the schemaID option is absent (i.e., undefined), no statement is made about the schema information used to encode the EXI body and this information MUST be communicated out of band. This specification does not dictate the syntax or semantics of other values specified in this field. An example schemaID scheme is the use of URI that is apt for globally identifying schema resources on the Web. The parties involved in the exchange are free to agree on the scheme of schemaID field that is appropriate for their use to uniquely identify the schema information.

[Definition:] The datatypeRepresentationMap option ~~, represented by a "datatypeRepresentationMap" element, identifies~~ specifies an alternate set of datatype representations ~~used to encode~~ for typed values in the EXI body as described in 7.4 Datatype Representation Map . When there are no "datatypeRepresentationMap" elements in the EXI Options document , no Datatype Representation Map is used for processing the EXI body. This option does not take effect when the value of the preserve.lexicalValues fidelity option is true (see 6.3 Fidelity Options ), or when the EXI stream is a schema-less EXI stream.

[Definition:] The blockSize option specifies the block size used for EXI compression. When the ~~blockSize option~~ "blockSize" element is ~~absent,~~ absent in the EXI Options document , the default blocksize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.

[Definition:] The valueMaxLength option specifies the maximum length of ~~string values representing value content items to be considered for addition to the string table. When the valueMaxLength option is absent, the maximum length is unbounded. String values representing~~ value content items ~~that have length larger than~~ to be considered for addition to the ~~valueMaxLength option~~ string table. The default value ~~are excluded from further consideration on account of~~ "unbounded" is assumed when the "valueMaxLength" element is absent in the ~~valuePartitionCapacity~~ EXI Options document ~~for addition to the string table.~~ .

[Definition:] The valuePartitionCapacity option specifies the ~~total capacity of the global and all local value partitions of a string table, where the measurement unit of the capacity is the~~ maximum number of ~~unique enitiries. When the valuePartitionCapacity option is absent, an unbounded capacity is assumed. A string representing a~~ value content ~~item that has length smaller than or equal to~~ items in the string table at any given time. The default value "unbounded" is assumed when the "valuePartitionCapacity" element is absent in the ~~valueMaxLength~~ EXI Options document . Section 7.3.3 Partitions Optimized for Frequent use of String Literals ~~option value and is not found in the value partitions at~~ specifies the ~~time~~ behavior of the ~~value occurrence is to be added into the~~ string table ~~only~~ when ~~doing so would not cause the number of unique values in value partitions to exceed the capacity.~~ this capacity is reached.

[Definition:] The user defined meta-data conveys auxillary information that applications may use to facilitate interpretation of ~~valuePartitionCapacity option value and~~ the EXI stream. . The user defined meta-data MUST NOT be interpreted in a way that alters or extends the ~~number of unique values are counted for value partitions are described~~ EXI data format defined in this specification. User defined meta-data may be added to an EXI Options document just prior to the ~~7.3.1 String Table Partitions~~ . alignment option.

6. Encoding EXI Streams

The rules for encoding a series of events as an EXI stream are very simple and are driven by a declarative set of grammars that describes the structure of an EXI ~~stream.~~ stream . Every event in the stream is encoded using the same set of encoding rules, which are summarized as follows:

Get the next event data to be encoded
If fidelity options (see 6.3 Fidelity Options ) indicate this event type is not processed, go to step 1
Use the grammars to determine the event code of the event
Encode the event code followed by the event content (see Table 4-1 )
Evaluate the grammar production matched by the event
Repeat until the End Document (ED) event is encoded

Self-contained (SC), namespace (NS) and attribute (AT) events associated with a given element occur directly after the start element (SE) event in the following order:

...

AT (xsi:type)

AT (xsi:nil)

...

Namespace (NS) events occur in document order. ~~AT(xsi:type)~~ The xsi:type and ~~AT(xsi:nil)~~ xsi:nil attributes occur before all other AT events. In When the grammar currently in effect for the element is either a ~~schema-less EXI stream~~ built-in element grammar , (see 8.4.3 Built-in Element Grammar ) or a schema-informed element fragment grammar (see 8.5.3 Schema-informed Element Fragment Grammar ), the remaining attribute (AT) events can occur in any order. In Otherwise, when the grammar in effect is a schema-informed ~~EXI stream~~ element grammar , or a schema-informed type grammar (see 8.5.4 Schema-informed Element and Type Grammars ), the remaining ~~attribute (AT) events~~ attributes can occur in ~~lexical~~ any order that is permitted by the grammar, though in practice they SHOULD occur in lexicographical order sorted first by qname 's local-name then by qname ~~'s URI.~~ uri for achieving better compactness, where a qname is a qname of an attribute.

EXI uses the same simple procedure described above, to encode well-formed documents, document fragments, schema-valid information items, schema-invalid information items, information items partially described by schemas and information items with no schema at all. Only the grammars that describe these items differ. For example, an element with no schema information is encoded according to the XML grammar defined by the XML specification, while an element with schema information is encoded according to the more specific grammar defined by that schema.

[Definition:] An event code is a sequence of 1 to 3 non-negative integers called ~~parts. Each production~~ parts used to identify each event in ~~a grammar has~~ an ~~event~~ EXI stream. The EXI grammars describe which events may occur at each point in an EXI stream and associate an even code ~~that distinguishes its event from that of other productions that share the same left-hand-side non-terminal symbol.~~ with each one. (See 8.2 Grammar Event Codes for more description of event codes.)

Section 6.1 Determining Event Codes describes in detail how the grammar is used to determine the event code of an ~~event.~~ event . Section 6.2 Representing Event Codes describes in detail how event codes are represented as bits. Section 6.3 Fidelity Options describes available fidelity options and how they ~~effect~~ affect the EXI stream. Section 7. Representing Event Content describes how the typed event contents are represented as bits.

6.1 Determining Event Codes

The structure of an EXI stream is described by the EXI grammars, which are formally specified in section 8. EXI Grammars . Each grammar defines which events are permitted to occur at any given point in the EXI stream and provides a pre-assigned event code for each ~~event.~~ one.

For example, the grammar productions below describe the events that can occur in a schema-informed EXI stream after the Start-Document (SD) event provided there are four global elements defined in the schema and ~~provide~~ assign an event code for each ~~event:~~ one. See 8.5.1 Schema-informed Document Grammar for the process used for generating the grammar productions below from the schema.

Example 6-1. Example productions with event codes


~~Syntax~~ Syntax			Event Code
	DocContent
		SE ("A") DocEnd	0
		SE ("B") DocEnd	1
		SE ("C") DocEnd	2
		SE ("D") DocEnd	3
		~~SE ()~~ SE () DocEnd	~~4.0~~ 4
		DT DocContent	~~4.1 CH DocContent 4.2~~ 5.0
		CM DocContent	~~4.3.0~~ 5.1.0
		PI DocContent	~~4.3.1~~ 5.1.1

At the point in an EXI stream where the above grammar productions are in effect, the event code of Start Element "A" (i.e. SE("A")) is 0. The event code of a DOCTYPE (DT) event at this point in the stream is ~~4.1,~~ 5.0, and so on.

6.2 Representing Event Codes

Each event code is represented by a sequence of 1 to 3 parts that uniquely identify an event. Event code parts are encoded in order starting with the first part followed by subsequent parts.

~~When the value~~ The i -th part of ~~compression option is false, and~~ an ~~bit-packed~~ event code alignment option is used for the current processing of the stream, the i th part of an event code is encoded using the minimum number of bits required to distinguish it from the i th part of the other sibling event codes in the current grammar. Specifically, the i th part of an event code is encoded as an n -bit unsigned integer ( 7.1.9 n-bit Unsigned Integer ), ~~of which~~ where n is ~~⌈ log~~ ⌈ log _2

2 m ~~⌉ where~~ ⌉ and m is the number of distinct values used as the i th -th part of its own and all its sibling event codes in the current grammar. Two event codes are siblings at the i th -th part if and only if they share the same values in all preceding parts. All event codes are siblings at the first part.

~~On the other hand, when the~~ If there is only one distinct value ~~of compression option~~ for a given part, the part is ~~true, or either byte-alignment or~~ omitted (i.e., encoded in log ₂ 1 = 0 bits = 0 bytes).

For example, the eight ~~pre-compression~~ event codes ~~alignment option is used,~~ shown in the i DocContent ~~th part of an event code is encoded using~~ grammar above have values ranging from 0 to 5 for the ~~minimum number of bytes instead of bits required~~ first part. Six distinct values are needed to ~~distinguish it from~~ identify the ~~i th~~ first part of ~~the other sibling~~ these event codes in . Therefore, the ~~current grammar. Each~~ first part is can be encoded as an n -bit unsigned integer ~~( 7.1.9 n-bit Unsigned Integer ), of which~~ where n ~~is ⌈ log~~ = ⌈ log _2

2 ~~m ⌉ where m is~~ 6 ⌉ = 3. In the ~~number of distinct values used as~~ same fashion, the ~~i th part of its own~~ second and ~~all its sibling event codes in the current grammar. The number of bytes used for the~~ third part (if present) are represented as n -bit unsigned ~~integer representation in this case is equal to ⌈~~ integers where n ~~/ 8 ⌉.~~ is ⌈ log ₂ 2 ⌉ = 1 and ⌈ log ₂ 2 ⌉ = 1 respectively.

~~Regardless of~~ When the ~~values~~ value of the compression option is false and ~~alignment option~~ bit-packed ~~, if there is only one distinct value for a given part, the part~~ alignment is ~~omitted (i.e., encoded in log 2 1 = 0 bits = 0 bytes). For example,~~ used, n -bit unsigned integers are represented using n bits. The first table below illustrates how the ~~nine~~ event codes ~~shown in~~ of each event matched by the DocContent grammar above ~~have a value ranging from 0 to 4 for their first part. There~~ are ~~five distinct values needed to identify the first part of these event codes. Therefore, when EXI compression and alignment are not in effect, the first part can be encoded~~ represented in ~~⌈ log 2 5 ⌉ = 3 bits. In the same fashion,~~ this case.

When the ~~number~~ value of ~~bits used for encoding second and third part (if present) are calculated as ⌈ log 2 4 ⌉ = 2 bits and ⌈ log 2 2 ⌉ = 1 bits, respectively. On the other hand, when EXI~~ compression option is true, or either byte-alignment or pre-compression alignment option is ~~in effect,~~ used, n -bit unsigned integers are represented using the minimum number of bytes ~~used for each part is ⌈ 3 / 8 ⌉ = 1 bytes for the first part, ⌈ 2 / 8 ⌉ = 1 bytes for the second part and ⌈ 1 / 8 ⌉ = 1 bytes for the third part.~~ required to store n bits. The second table below illustrates how the event codes of each event in matched by the DocContent grammar above ~~is encoded.~~ are represented in this case.

Example 6-2. Example event code encoding

Table 6-1. Example event code encoding when EXI compression is not in effect and bit-packed alignment option is used
Event	Part values			Event Code Encoding	# bits
SE ("A")	0			000	3
SE ("B")	1			001	3
SE ("C")	2			010	3
SE ("D")	3			011	3
~~SE ()~~ SE ()	4	0		~~100 00~~ 100	5 3
DT	~~4 1 100 01~~ 5 ~~CH 4~~	2 0		~~100 10~~ 101 0	5 4
CM	4 5	3 1	0	~~100 11 0~~ 101 1 0	6 5
PI	4 5	3 1	1	~~100 11 1~~ 101 1 1	6 5

# distinct values ( m )

5 6

4 2

# bits per part

⌈ log ₂ m ⌉

2 1

Table 6-2. Example event code encoding when EXI compression is in effect, or either byte-alignment or pre-compression alignment option is used
Event	Part values			Event Code Encoding	# bytes
SE ("A")	0			00000000	1
SE ("B")	1			00000001	1
SE ("C")	2			00000010	1
SE ("D")	3			00000011	1
~~SE ()~~ SE ()	4	0	~~00000100 00000000 2 DT 4 1~~	~~00000100 00000001~~ 00000100	2 1
CH DT	4 5	2 0		~~00000100 00000010~~ 00000101 00000000	2
CM	4 5	3 1	0	~~00000100 00000011 00000000~~ 00000101 00000001 00000000	3
PI	4 5	3 1	1	~~00000100 00000011 00000001~~ 00000101 00000001 00000001	3

# distinct values ( m )

5 6

4 2

# bytes per part

⌈ (log ₂ m ) / 8 ⌉

6.3 Fidelity Options

Some XML applications do not require the entire XML feature set and would prefer to eliminate the overhead associated with unused features. For example, the SOAP 1.2 specification [SOAP 1.2] prohibits the use of XML ~~processing-instructions.~~ processing instructions. In addition, there are many data-exchange use cases that do not require XML comments or DTDs.

~~Applications can use~~ The preserve option in EXI Options comprises a set of fidelity ~~options~~ options, each of which independently controls the preservation (or preservation level) of a certain type of information item. Applications can use the preserve option to specify the ~~XML features~~ set of fidelity options they require. As specified in section 8.3 Pruning Unneeded Productions , EXI processors MUST use these fidelity options to prune productions that match the associated events ~~that are not required~~ from the grammars, improving compactness and processing efficiency.

The table below lists the fidelity options supported by this version of the EXI specification and describes the effect setting these options has on the EXI ~~stream.~~ stream .

Table 6-3. Fidelity options
Fidelity option	Effect
Preserve.comments	CM events are preserved
Preserve.pis	PI events are preserved
Preserve.dtd	DOCTYPE and ER events are preserved
Preserve.prefixes	NS events and namespace prefixes are preserved
Preserve.lexicalValues	Lexical form of element and attribute values is preserved in value content items

When qualified names ^NS are used in the value s of AT or CH events in an EXI Stream, the Preserve.prefixes fidelity option SHOULD be turned on to preserve the NS prefix declarations used by these values. See section 4. EXI Streams for the definition of EXI event types and their associated content items .

7. Representing Event Content

The ~~content~~ event code of each event in an EXI body is represented as a sequence of n -bit unsigned integers ( 7.1.9 n-bit Unsigned Integer ). See section 6.2 Representing Event Codes for the description of the event code representation. The content items of an event are represented according to ~~its type~~ their datatype representations (see Table 4-2 ). In the absence of ~~external type information,~~ an associated datatype representation, attribute and character values are ~~typed~~ represented as ~~String.~~ String ( 7.1.10 String ).

[Definition:] EXI defines a minimal set of datatype representations called Built-in EXI datatype representations that define how ~~values~~ content items as well as the parts of an event code are represented in EXI streams. When the preserve.lexicalValues option is false, values are represented ~~according to their schema datatypes per Table 7-1 below~~ using built-in EXI datatype representations ~~as described in~~ (see 7.1 Built-in EXI Datatype Representations . ) or user-defined datatype representations (see 7.4 Datatype Representation Map ) associated with the schema datatypes ^XS2. Otherwise, values are represented as Strings with restricted character sets (see Table 7-2 below). The following table lists the built-in EXI datatype representations, associated ~~type~~ EXI datatype identifiers and the XML Schema ~~Language [XML Schema Datatypes]~~ built-in datatypes ^XS2 each EXI datatype representation is used to represent by default.

~~XS2~~ ~~facets, use~~

Table 7-1. Built-in EXI Datatypes
Built-in EXI Datatype Representation	EXI Datatype ID	XML Schema Datatypes ~~XS2~~
Binary	~~xsd:base64Binary~~ exi:base64Binary	base64Binary
Binary	~~xsd:hexBinary~~ exi:hexBinary	hexBinary
Boolean	~~xsd:boolean~~ exi:boolean	boolean
Date-Time	~~xsd:dateTime~~ exi:dateTime	dateTime ,
	exi:time	time ,
	exi:date	date ,
	exi:gYearMonth	gYearMonth ,
	exi:gYear	gYear ,
	exi:gMonthDay	gMonthDay ,
	exi:gDay	gDay ,
	exi:gMonth	gMonth
Decimal	~~xsd:decimal~~ exi:decimal	decimal
Float	~~xsd:double~~ exi:double	float , double
~~n-bit Unsigned~~ Integer ~~xsd:integer~~	exi:integer	integer ~~, the representation of which depends on the facet~~
String	exi:string	~~values as follows. When the bounded range of~~ ~~integer~~ string , anySimpleType , anyURI , duration , QName ~~is 4095 or smaller as determined by the~~ (except for values of ~~minInclusive XS2 , minExclusive XS2 , maxInclusive XS2 and maxExclusive~~ xsi:type attribute), Notation , All types derived by union ~~XS2~~
n-bit Unsigned Integer ~~representation. Otherwise, when the integer satisfies one of the followings, use Unsigned~~		Used by Integer ~~representation. It is~~ datatype representation for some bounded ~~nonNegativeInteger . Either minInclusive XS2 facet is specified with a value equal to or greater than 0, or minExclusive XS2 facet is specified with a value equal to or greater than -1. Otherwise, use~~ integers (see 7.1.5 Integer ~~representation.~~ )
Unsigned Integer		Used by Integer ~~String xsd:string string , anySimpleType , anyURI , duration , All types derived by~~ datatype representation for unsigned ~~union~~ integers (see 7.1.5 Integer )
List		All types derived by list , including IDREFS and ENTITIES
QName		QName but only for the value of xsi:type attributes

By default, datatypes derived from the XML Schema datatypes above are also represented according to the associated built-in EXI datatype representation . When there are more than one XML Schema datatypes above from which a datatype is derived directly or indirectly, the closest ancestor is used to determine the built-in EXI datatype representation . For example, a value of XML Schema datatype xsd:int is represented according to the same built-in EXI datatype representation as a value of XML Schema datatype xsd:integer. Although xsd:int is derived indirectly from xsd:integer and also further from xsd:decimal, a value of xsd:int is processed as an instance of xsd:integer because xsd:integer is closer to xsd:int than xsd:decimal is in the datatype inheritance hierarchy.

Each EXI datatype identifier above is a qname ~~. Datatype identifiers~~ that uniquely ~~identify~~ identifies one of the built-in EXI datatype representations. ~~They~~ EXI datatype identifiers are used by in Datatype Representation ~~Map~~ Maps to ~~designate XML Schema datatypes to built-in EXI~~ change the datatype representations used for specific schema datatypes ^{different
from
the
ones
that
are
associated
by
default.
Not
all

XS2} and their sub-types. Only built-in EXI datatype representations ~~are assigned datatype identifiers. Only those~~ that have been assigned identifiers are usable by in Datatype Representation ~~Map~~ Maps ~~for designating alternative representations.~~ .

When the preserve.lexicalValues option is true, all values are represented as Strings. ~~Some~~ The table below defines restricted character sets for several of the built-in EXI datatypes. Each ~~values~~ value that would have otherwise been ~~designated to certain built-in EXI datatype representations are~~ represented by one of these datatypes is instead represented as ~~Strings~~ a String with the associated restricted character ~~sets as defined by~~ set, regardless of the ~~table below.~~ actual pattern facets, if any, specified in the definitions of the schema datatypes.

Table 7-2. Restricted Character Sets for Built-in EXI Datatype Representations
EXI Datatype ID	Restricted Character Set
~~xsd:base64Binary~~ exi:base64Binary	{ #x9, #xA, #xD, #x20, +, /, [0-9], =, [A-Z], [a-z] }
~~xsd:hexBinary~~ exi:hexBinary	{ #x9, #xA, #xD, #x20, [0-9], [A-F], [a-f] }
~~xsd:boolean~~ exi:boolean	{ #x9, #xA, #xD, #x20, 0, 1, a, e, f, l, r, s, t, u }
~~xsd:dateTime~~ exi:dateTime	{ #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z }
~~xsd:decimal~~ exi:time
exi:date
exi:gYearMonth
exi:gYear
exi:gMonthDay
exi:gDay
exi:gMonth
exi:decimal	{ #x9, #xA, #xD, #x20, +, -, ., [0-9] }
~~xsd:double~~ exi:double	{ #x9, #xA, #xD, #x20, +, -, ., [0-9], E, F, I, N, a, e }
~~xsd:integer~~ exi:integer	{ #x9, #xA, #xD, #x20, +, -, [0-9] }

The restricted character set for the EXI List datatype representation is ~~determined by~~ the restricted character set of the EXI datatype representation of the ~~values in the List.~~ List item type.

The rules used to represent values of String depend on the content items to which the values belong. There are certain content items whose value representation involve the use of string tables while other content items are represented using the encoding rule described in 7.1.10 String without involvement of string tables. The content items that use string tables and how each of such content items uses string tables to represent their values are described in 7.3 String Table .

Schemas can provide one or more enumerated values for ~~types.~~ datatypes. When the preserve.lexicalValues option is false, EXI exploits those pre-defined values when they are available to represent values of such ~~types~~ datatypes in a more efficient manner than it would have done otherwise without using ~~built-in EXI datatypes.~~ pre-defined values. The encoding rule for representing ~~a type of~~ enumerated values is described in 7.2 Enumerations . ~~Types~~ Datatypes that are derived from ~~other types~~ another by union and their subtypes are always represented as String regardless of the availability of enumerated values. Representation of values of which the ~~schema type~~ datatype is one of QName, Notation or a ~~type~~ datatype derived therefrom by restriction are also not affected by enumerated values if any.

7.1 Built-in EXI Datatype Representations

The following sections describe the ~~encoding rules of~~ built-in EXI datatype representations used for representing event codes ~~values~~ and content items in EXI streams. Unless otherwise stated, individual items in an EXI stream are packed into bytes most significant bit first.

7.1.1 Binary

~~Values typed as~~ The Binary ~~are represented as~~ datatype representation is a length-prefixed sequence of octets representing the binary content. The length is represented as an Unsigned Integer (see 7.1.6 Unsigned Integer ).

7.1.2 Boolean

In the absence of pattern facets in the schema datatype, ~~values typed as~~ the Boolean ~~are represented as~~ datatype representation is a n -bit unsigned integer ( 7.1.9 n-bit Unsigned Integer ), where n is one ~~(1) and the~~ (1). The value zero (0) represents false and the value one (1) represents true.

Otherwise, when pattern facets are available in the schema datatype, the Boolean datatype representation is ~~able to distinguish values not only arithmetically (0 or 1) but also between lexical variances ("0", "1", "false" and "true"), and values typed as Boolean are represented as~~ a n -bit unsigned integer ( 7.1.9 n-bit Unsigned Integer ), where n is two (2) and the ~~value~~ values zero (0), one (1), two (2) and three (3) ~~each represents value~~ represent the values "false", "0", "true" and ~~"1".~~ "1" respectively.

7.1.3 Decimal

~~Values typed as~~ The Decimal ~~are represented as~~ datatype representation is a Boolean sign (see 7.1.2 Boolean ) followed by two Unsigned Integers (see 7.1.6 Unsigned Integer ). A sign value of zero (0) is used to represent positive Decimal values and a sign value of one (1) is used to represent negative Decimal values. The first Unsigned Integer represents the integral portion of the Decimal value. The second Unsigned Integer represents the fractional portion of the Decimal value with the digits in reverse order to preserve leading zeros.

7.1.4 Float

~~Values typed as~~ The Float ~~are represented as~~ datatype representation is two consecutive Integers (see 7.1.5 Integer ). The first Integer represents the mantissa of the floating point number and the second Integer represents the base-10 exponent of the floating point number. The range of the mantissa is - (2 ⁶³ ) to 2 ⁶³ -1 and the range of the exponent is - (2 ¹⁴ -1) to 2 ¹⁴ -1. Values typed as Float with a mantissa or exponent outside the accepted range are represented as ~~schema-invalid~~ untyped values.

The exponent value -(2 ¹⁴ ) is used to indicate one of the special values: infinity, negative infinity and not-a-number (NaN). An exponent value -(2 ¹⁴ ) with mantissa values 1 and -1 represents positive infinity (INF) and negative infinity (-INF) respectively. An exponent value -(2 ¹⁴ ) with any other mantissa value represents NaN.

~~A value represented as~~ The Float datatype representation can be decoded by going through the following steps.

Retrieve the mantissa value using the procedure described in 7.1.5 Integer .
Retrieve the exponent value using the procedure described in 7.1.5 Integer .
If the exponent value is -(2 ¹⁴ ), the mantissa value 1 represents INF, the mantissa value -1 represents -INF and any other mantissa value represents NaN. If the exponent value is not -(2 ¹⁴ ), the float value is m × 10 ^e where m is the mantissa and e is the exponent obtained in the preceding steps.

7.1.5 Integer

The Integer ~~type~~ datatype representation supports signed integer numbers of arbitrary magnitude. ~~Values typed~~ The specific representation used depends on the facet ^XS2 values of the associated schema datatype as follows.

If the bounded range of the associated schema datatype is 4096 or smaller as determined by the values of minInclusive ^XS2, minExclusive ^XS2, maxInclusive ^XS2 and maxExclusive ^XS2 facets, the value is represented as an n-bit Unsigned Integer ~~are~~ where n is ⌈ log ₂ m ⌉ and m is the bounded range of the schema datatype.

Otherwise, if the minInclusive ^XS2 or minExclusive ^XS2 facets specify a lower bound greater than or equal to zero (0), the value is represented as an Unsigned Integer .

Otherwise, the value is represented as a Boolean sign (see 7.1.2 Boolean ) followed by an Unsigned Integer (see 7.1.6 Unsigned Integer ). A sign value of zero (0) is used to represent positive integers and a sign value of one (1) is used to represent negative integers. For non-negative values, the Unsigned Integer holds the magnitude of the value. For negative values, the Unsigned Integer holds the magnitude of the value minus 1.

7.1.6 Unsigned Integer

The Unsigned Integer ~~type~~ datatype representation supports unsigned integer numbers of arbitrary magnitude. ~~Values typed as Unsigned Integer are~~ It is represented ~~using~~ as a sequence of ~~octets. The sequence is~~ octets terminated by an octet with its most significant bit set to 0. The value of the unsigned integer is stored in the least significant 7 bits of the octets as a sequence of 7-bit bytes, with the least significant byte first.

EXI processors SHOULD support arbitrarily large Unsigned Integer values. EXI processors MUST support Unsigned Integer values less than ~~4294967296.~~ 2147483648.

~~A value represented as~~ The Unsigned Integer datatype representation can be decoded by going through the following steps.

Example 7-1. Example algorithm for decoding an Unsigned Integer

Start with the initial value set to 0 and the initial multiplier set to 1.
Read the next octet.
Multiply the value of the unsigned number represented by the 7 least significant bits of the octet by the current multiplier and add the result to the current value.
Multiply the multiplier by 128.
If the most significant bit of the octet was 1, go back to step 2.

7.1.7 QName

~~Values of type~~ The QName ~~are encoded as~~ datatype representation is a sequence of values representing the URI, local-name and prefix components of the QName in that order, where the prefix component is present only when the preserve.prefixes option is set to true.

When the QName value is specified by a schema-informed grammar using the ~~SE(~~ SE ( qname ) or ~~AT(~~ AT ( qname ) terminal symbols, URI and local-name are implicit and are omitted. Similarly, when the URI of the QName value is derived from a schema-informed grammar using ~~SE(~~ SE ( uri : *) or AT ( uri : *) terminal symbols, URI is implicit thus omitted in the representation, and only the local-name component is encoded as a String (see 7.1.10 String ). Otherwise, URI and local-name components are encoded as Strings. If the QName is in no namespace, the URI is represented by a zero length String.

When present, prefixes are represented as n -bit unsigned integers ( 7.1.9 n-bit Unsigned Integer ), where n is ~~log~~ ⌈ log ₂ ( N ) ) ⌉ and N is the number of ~~unique~~ prefix es ~~specified for~~ in the prefix string table partition associated with the URI of the QName ~~by preceding NS events~~ or one (1) if there are no prefixes in this partition. If the ~~EXI stream. Each unique~~ given prefix ~~is assigned a unique n -bit integer (0 ... N -1) according to the order~~ exists in ~~which~~ the associated ~~NS event occurs in~~ prefix string table partition, it is represented using the ~~EXI stream.~~ compact identifier assiged by the partition. If ~~there are no~~ the given prefix ~~es specified for~~ does not exist in the ~~URI of~~ associated partition, the QName MUST be part of an SE event and the prefix MUST be resolved by ~~preceding~~ one of the NS events in immediately following the ~~EXI stream,~~ SE event (see resolution rules below). In this case, the unresolved prefix representation is ~~undefined. An undefined~~ not used and can be zero (0) or the compact identifer of any prefix in the associated partition.

Note:

When N is one, the prefix is represented using zero bits ~~(i.e.,~~ (i.e. omitted).

Given ~~either~~ a n -bit unsigned integer m that represents either the prefix value or an ~~undefined prefix,~~ unresolved prefix value, the effective prefix value is determined by following the rules described below in order. A QName is in error if ~~it has an undefined~~ its prefix ~~that~~ cannot be resolved by the rules below.

If the prefix ~~is defined, select~~ string table partition associated with the URI of the QName assigns the compact identifier m ~~-th~~ to a prefix value, select this prefix value ~~associated with the URI of the QName~~ as the candidate prefix value. Otherwise, there is no candidate prefix value.
If the QName value is part of an SE event followed by an associated NS event with a its local-element-ns flag value ~~being~~ set to true, the prefix value is the prefix of ~~such~~ this NS event. Otherwise, the prefix value is the candidate value, if any, selected in step 1 above.

7.1.8 Date-Time

~~Values typed as~~ The Date-Time ~~are encoded as~~ datatype representation is a sequence of values representing the individual components of the Date-Time. The following table specifies each of the possible date-time components along with how they are encoded.

Table 7-3. Date-Time components
Component	Value	Type
Year	Offset from 2000	Integer ( 7.1.5 Integer )
MonthDay	Month * 32 + Day	9-bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer ) where day is a value in the range 1-31 and month is a value in the range 1-12.
Time	((Hour * ~~60)~~ 64) + Minutes) * 60 64 + seconds	17-bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer )
FractionalSecs	Fractional seconds	Unsigned Integer ( 7.1.6 Unsigned Integer ) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros
TimeZone	TZHours * 60 + TZMinutes	11-bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer ) representing a signed integer offset by 840 ( = 14 * 60 )
presence	Boolean presence indicator	Boolean ( 7.1.2 Boolean )

The variety of components that constitute a value and their appearance order depend on the XML Schema type associated with the value. The following table shows which components are included in a value of each XML Schema type that is relevant to Date-Time datatype. Items listed in square brackets are included if and only if the value of its preceding presence indicator (specified above) is set to true.

Table 7-4. Assortment of Date-Time components
XML Schema Datatype	Included Components
gYear ^XS2	Year, presence, [TimeZone]
gYearMonth ^XS2	Year, MonthDay, presence, [TimeZone]
date ^XS2	Year, MonthDay, presence, [TimeZone]
dateTime ^XS2	Year, MonthDay, Time, presence, [FractionalSecs], presence, [TimeZone]
gMonth ^XS2	MonthDay, presence, [TimeZone]
gMonthDay ^XS2
gDay ^XS2
time ^XS2	Time, presence, [FractionalSecs], presence, [TimeZone]

7.1.9 n -bit Unsigned Integer

When the value of the compression option is false and the ~~value~~ bit-packed ~~is used for~~ alignment ~~options , values of type~~ is used, the n -bit Unsigned Integer ~~are represented as~~ datatype representation is an unsigned binary integer using n bits. Otherwise, ~~they are represented as~~ it is an unsigned integer using the minimum number of bytes required to store n bits. Bytes are ordered with the least significant byte first.

The n -bit unsigned integer is used for ~~encoding~~ representing event codes , the prefix component of ~~QName~~ QNames (see 7.1.7 QName ) ~~as well as~~ and certain value content items, as described in respective relevant parts of this document. As ~~shown~~ described in ~~table Table 7-1~~ section 7.1.5 Integer , integers with a bounded range size m equal to ~~4095~~ 4096 or smaller are ~~encoded using~~ represented as n -bit unsigned ~~integer~~ integers with n being ⌈ log ₂ m ⌉, as an offset from the minimum value in the range.

7.1.10 String

~~Values of type~~ The String ~~are represented as~~ datatype representation is a length prefixed sequence of characters. The length indicates the number of characters in the string and is represented as an Unsigned Integer (see 7.1.6 Unsigned Integer ). If a restricted character set is defined for the string (see 7.1.10.1 Restricted Character Sets ), each character is represented as an n -bit Unsigned Integer (see 7.1.9 n-bit Unsigned Integer ). Otherwise, each character is represented by its UCS [ISO/IEC 10646] code point encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ).

EXI uses a string table to represent certain content items more efficiently. Section 7.3 String Table describes the string table and how it is applied to different content items.

7.1.10.1 Restricted Character Sets

If a string value is associated with a schema datatype ^XS2 derived from xsd:string and one or more of the datatypes in its datatype hierarchy has one or more pattern facets, there may be a restricted character set defined for the string value. The following steps are used to determine the restricted character set, if any, defined for a given string value associated with such a schema datatype.

~~First, determine~~ Given the ~~character set for each datatype in~~ schema datatype, let the target datatype ~~hierarchy~~ definition be the definition of the ~~string value~~ most-derived datatype that has one or more pattern facets immediately specified in its definition in the schema among those in the datatype inheritance hierarchy that traces backwards toward primitive datatypes ^XS2 starting from the datatype. If the target datatype definition is a definition for a built-in datatype ^XS2, there is no restricted character set for the string value.Otherwise, determine the set of characters for each immediate pattern facet of the target datatype definition according to section E Deriving ~~Character Sets~~ Set of Characters from XML Schema Regular Expressions . ~~For each datatype with more than one pattern facet,~~ Then, compute the restricted ~~character~~ set ~~based on the union~~ of ~~the regular expressions specified by its pattern facets. If the restricted character set for a datatype contains at least 255~~ characters ~~or contains non-BMP characters, the character set of the datatype is not restricted and can be omitted from further consideration. Then, compute the restricted character set~~ for the string value as the ~~intersection~~ union of all the ~~character~~ sets of characters computed ~~above.~~ in the previous step. If the resulting ~~character~~ set of characters contains less than ~~255~~ 256 characters and contains only BMP characters, the string value has a restricted character set and each character is represented using an n -bit Unsigned Integer (see 7.1.9 n-bit Unsigned Integer ), where n is ⌈ log ₂ ( N + 1) ⌉ and N is the number of characters in the restricted character set.

The characters in the restricted character set are sorted by UCS [ISO/IEC 10646] code point and represented by integer values in the range (0 ... N ~~-1)~~ −1) according to their ordinal position in the set. Characters that are not in this set are represented by the ~~integer~~ n -bit Unsigned Integer N followed by the UCS code point of the character represented as an Unsigned Integer.

The figure below illustrates an overview of the process for determining and using restricted character sets described in this section.

Figure 7-1. String Processing Model

7.1.11 List

Values of type List are encoded as a length prefixed sequence of values. The length is encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) and each value is encoded according to its type (see 7. Representing Event Content ).

7.2 Enumerations

~~Values of~~ When the preserve.lexicalValues option is false, enumerated ~~types~~ values are encoded as n -bit Unsigned Integers ( 7.1.9 n-bit Unsigned Integer ) where n = ⌈ log ₂ m ⌉ and m is the number of items in the enumerated type. The value assigned to each item corresponds to its ordinal position in the enumeration in schema-order starting with position zero (0).

Exceptions are for schema types derived from others by union and their subtypes, QName or Notation and types derived therefrom by restriction. The values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.

7.3 String Table

EXI uses a string table to assign "compact identifiers" to some string values. Occurrences of string values found in the string table are represented using the associated compact identifier rather than encoding the entire "string literal". The string table is initially pre-populated with string values that are likely to occur in certain contexts and is dynamically expanded to include additional string values encountered in the document. The following content items are encoded using a string table:

uris
prefixes
uri and local-name in qnames
values

When a string value is found in the string table, the value is encoded using the compact identifier and no changes are made to the string table as a result. When a string value is not found in the string table, its string literal is encoded as a String without using a compact identifier, only after which the string table is augmented by including the string value with an assigned compact identifier unless the string value represents a value content item and fails to satisfy the criteria in effect by virtue of valuePartitionCapacity and valueMaxLength options .

The string table is divided into partitions and each partition is optimized for more frequent use of either compact identifiers or string literals depending on the purpose of the partition. Section 7.3.1 String Table Partitions describes how EXI string table is partitioned. Section 7.3.2 Partitions Optimized for Frequent use of Compact Identifiers describes how string values are encoded when the associated partition is optimized for more frequent use of compact identifiers. Section 7.3.3 Partitions Optimized for Frequent use of String Literals describes how string values are encoded when the associated partition is optimized for more frequent use of string literals.

The life cycle of a string table spans the processing of a single EXI stream. String tables are not represented in an EXI stream or exchanged between EXI processors. A string table cannot be reused across multiple EXI streams; therefore, EXI processors MUST use a string table that is equivalent to the one that would have been newly created and pre-populated with initial values for processing each EXI stream.

7.3.1 String Table Partitions

The string table is organized into partitions so that the indices assigned to compact identifiers can stay relatively small. Smaller number of indices results in improved average compactness and the efficiency of table operations. Each partition has a separate set of compact identifiers and content items are assigned to specific partitions as described below.

Uri content items and the URI portion of qname content items are assigned to the uri partition. The uri partition is optimized for frequent use of compact identifiers and is pre-populated with initial entries as described in D.1 Initial Entries in Uri Partition . When a schema is provided, the uri partition is also pre-populated with the name of each target namespace URI declared in the schema, plus some of the namespace URIs used in wildcard terms and attribute wildcards (see section 8.5.4.1.7 Wildcard Terms and 8.5.4.1.3.2 Complex Type Grammars , respectively for the condition), appended in lexicographical order.

Prefix content items are assigned to partitions based on their associated namespace URI. Partitions containing prefix content items are optimized for frequent use of compact identifiers and the string table is pre-populated with entries as described in D.2 Initial Entries in Prefix Partitions .

The local-name portion of qname content items are assigned to partitions based on the namespace URI of the qname content item of which the local-name is a part. Partitions containing local-names are optimized for frequent use of string literals and the string table is pre-populated with entries as described in D.3 Initial Entries in Local-Name Partitions . When a schema is provided, the string table is also pre-populated with the local name of each attribute, element and type declared in the schema, partitioned by namespace URI and sorted lexicographically.

Each ~~Value~~ value content ~~items are~~ item is assigned ~~simultaneously~~ to both the global value partition ~~as well as to the~~ and a "local" value partition ~~that corresponds to~~ based on the qname of the attribute or element in context at the time ~~when the string table is looked up and~~ the ~~string~~ value is ~~not found in both global and local~~ added to the value partitions. Partitions containing value content items are optimized for frequent use of string literals and are initially empty. [Definition:] ~~All value partitions in a string table share a single~~ The variable ~~valueAmount~~ globalID ~~the value of which~~ is a non-negative integer ~~that reflects~~ representing the ~~current number~~ compact identifier of ~~unique values in~~ the next item added to the global value ~~partitions.~~ partition. Its value is initially set to 0 (zero) and changes while processing an EXI stream per the rule described in 7.3.3 Partitions Optimized for Frequent use of String Literals .

7.3.2 Partitions Optimized for Frequent use of Compact Identifiers

String table partitions that are expected to contain a relatively small number of entries used repeatedly throughout the document are optimized for the frequent use of compact identifiers. This includes the uri partition and all partitions containing prefix content items.

When a string value is found in a partition optimized for frequent use of compact identifiers, the string value is represented as the value ( i +1) encoded as an n -bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer ), where i is the value of the compact identifier, n is ⌈ log ₂ ( m +1) ⌉ and m is the number of entries in the string table partition at the time of the operation.

When a string value is not found in a partition optimized for frequent use of compact identifiers, the String value is represented as zero (0) encoded as an n -bit Unsigned Integer, followed by the string literal encoded as a String ( 7.1.10 String ). After encoding the String value, it is added to the string table partition and assigned the next available compact identifier m .

7.3.3 Partitions Optimized for Frequent use of String Literals

The remaining string table partitions are optimized for the frequent use of string literals. This includes all string table partitions containing local-names and all string table partitions containing value content items.

When a string value is found in the partitions containing local-names, the string value is represented as zero (0) encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) followed by an the compact identifier of the string value. The compact identifier of the string value is encoded as an n -bit unsigned integer ( 7.1.9 n-bit Unsigned Integer ), where n is ⌈ log ₂ m ⌉ and m is the number of entries in the string table partition at the time of the operation.

When a string value is not found in the partitions containing local-names, its string literal is encoded as a String (see 7.1.10 String ) with the length of the string is incremented by one. After encoding the string value, it is added to the string table partition and assigned the next available compact identifier m .

As described above, each value content ~~items are~~ item is assigned to two partitions, a "local" value partition and the global value partition. When a string value is found in the "local" value partition, the string value is may be represented as zero (0) encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) followed by the compact identifier of the string value in the "local" value partition. When a string value is found in the global value partition, ~~but not in the "local" value partition,~~ the String value is may be represented as one (1) encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) followed by the compact identifier of the String value in the global value partition. The compact identifier is encoded as an n -bit unsigned integer ( 7.1.9 n-bit Unsigned Integer ), where n is ⌈ log ₂ m ⌉ and m is the number of entries in the associated partition at the time of the operation.

When a string value S is not found in the global or "local" value partition, its string literal is encoded as a String (see 7.1.10 String ) with the length L + 2 (incremented by two) where L is the number of characters in the string ~~value . After encoding the string value, it~~ value. If valuePartitionCapacity is ~~added to both the associated "local" value string table partition~~ not zero, and ~~the global value string table partition if~~ L is ~~equal to or smaller~~ greater than zero and no more than ~~the~~ valueMaxLength ~~option~~ ~~value,~~ , the string S is added to the associated "local" value partition using the next available compact identifier m and added to the global value of partition using the compact identifier ~~valueAmount~~ globalID . When S is ~~smaller than~~ added to the ~~capacity specified by~~ global value partition and there was already a string V in the global value partition associated with the compact identifier ~~valuePartitionCapacity option~~ globalID . , the string S replaces the string V in the global table, and the string V is removed from its associated local value partition by rendering its compact identifier permanently unassigned. When the string value ~~was~~ is added to the global value ~~partitions,~~ partition, the value of ~~valueAmount~~ globalID is incremented by ~~1 . Editorial note String values representing value content items are never added to~~ one (1). If the ~~string table once~~ resulting value of ~~valueAmount~~ globalID ~~reaches~~ is equal to valuePartitionCapacity ~~. The working group~~ , its value is ~~still looking at other alternatives~~ reset to cap the amount of memory used for value partitions that can result in more compact representation of string values overall, including those that involve reassignment of compact identifiers using some sort of round-robin selection method, and the expected effect on processing efficiency of each alternative. zero (0)

7.4 Datatype Representation Map

By default, each typed value in an EXI stream is represented ~~by the associated~~ using its default built-in EXI datatype representation ~~(e.g., see~~ (see Table 7-1 ). However, [Definition:] EXI processors MAY provide the capability to specify ~~different~~ alternate built-in EXI datatype representations or user-defined datatype representations for ~~representing~~ specific schema ~~datatypes.~~ datatypes ^XS2. This capability is called a Datatype Representation Map .

Note:

This feature is relevant only to simple types in the schema. EXI does not provide a way for applications to infuse custom representations of structured data bound to complex types into the format.

EXI processors that support Datatype Representation ~~Map~~ Maps MAY provide ~~external~~ implementation specific means to define and install user-defined datatype ~~representations , of which EXI processors are free to choose implementation dependent mechanisms.~~ representations. EXI processors MAY also provide implementation specific means for applications or users to specify alternate built-in EXI datatype representations or user-defined datatype representations for representing specific schema ~~datatypes,~~ datatypes. As with the ~~mechanisms of which~~ default EXI datatype representations, alternate datatype representations are ~~again implementation dependent.~~ used for the associated XML Schema types specified in the Datatype Representation Map and XML Schema datatypes derived from those datatypes. When there are built-in or user-defined datatype representations associated with more than one XML Schema datatype in the type hierarchy of a particular datatype, the closest ancestor with an associated datatype representation is used to determine the EXI datatype representation.

When an EXI processor encodes an EXI stream using a Datatype Representation Map ~~, it MUST specify in the EXI header each schema datatype that is not represented using the default built-in EXI datatype representation~~ and the ~~alternate built-in EXI datatype representation or user-defined datatype representation used for each one unless the whole~~ EXI Options part of the header is ~~omitted.~~ present, the EXI options part MUST specify all alternate datatype representations used in the EXI stream. An EXI processor that attempts to decode an EXI stream that specifies a user-defined datatype representation in the EXI header that it does not recognize MAY report a warning, but this is not an error. However, when an EXI processor encounters a typed value that was encoded by a user-defined datatype representation that it does not support, it MUST report an error.

The EXI options header, when it appears in an EXI stream, MUST include a "datatypeRepresentationMap" element for each schema datatype ~~that is~~ of which the descendant datatypes derived by restriction as well as itself are not represented using the default built-in EXI datatype representation. The "datatypeRepresentationMap" element includes two child elements. The qname of the first child element identifies the schema datatype that is not represented using the default built-in EXI datatype representation and the qname of the second child element identifies the alternate built-in EXI datatype representation or user-defined datatype representation used to represent that type. Built-in EXI datatype representations are identified by the type identifiers in Table 7-1 .

For example, the following ~~"datatypeRepresentationMap" element~~ "datatypeRepresentationMap"element indicates all values of type xsd:decimal and the datatypes derived from it by restriction in the EXI stream are represented using the built-in String datatype representation, which has the type ID ~~xsd:string:~~ exi:string :

Example 7-2. datatypeRepresentationMap indicating all Decimal values are represented using ~~built-in String datatype representation <datatypeRepresentationMap xmlns:xsd="http://www.w3.org/2001/XMLSchema">~~ built-in String datatype representation

    <exi:datatypeRepresentationMap xmlns:xsd="http://www.w3.org/2001/XMLSchema">

        <xsd:decimal/>
        <xsd:string/>
    </datatypeRepresentationMap>

        <exi:string/>
    </exi:datatypeRepresentationMap>

It is the responsibility of an EXI processor to interface with a particular implementation of built-in EXI datatype representations or user-defined datatype representations properly. In the example above, an EXI processor may need to provide a string value of the data being processed that is typed as xsd:decimal in order to interface with an implementation of built-in String datatype representation. In such a case, some EXI processors may have started with a decimal value and such processors may well translate the value into a string before passing the data to the implementation of built-in String datatype representation while other EXI processors may already have a string value of the data so that it can pass the value directly to the implementation of built-in String datatype representation without any translation.

As another example, the following "datatypeRepresentationMap" element indicates all values of the used-defined ~~datatype~~ simple type geo:geometricSurface and the datatypes derived from it by restriction are represented using the user-defined datatype representation geo:geometricInterpolator:

~~<datatypeRepresentationMap xmlns:geo="http://www.example.com/Geometry">~~

Example 7-3. datatypeRepresentationMap illustrating a user-defined type represented by a user-defined datatype representation

    <exi:datatypeRepresentationMap xmlns:geo="http://example.com/Geometry">
        <geo:geometricSurface/>
	<geo:geometricInterpolator/>
    </datatypeRepresentationMap>

    </exi:datatypeRepresentationMap>

Note:

EXI only defines a way to indicate the use of user-defined datatype representations for representing values of specific datatypes. Datatype representations ~~which~~ are ~~assigned~~ referred to ~~datatypes~~ by their respective qnames ~~, are~~ in "datatypeRepresentationMap" elements. A datatype representation is omnipresent only if ~~the~~ its qname is one of those that represent built-in EXI datatype representations. For datatype representations of other qnames , EXI does not provide nor suggest a method by which they are identified and shared between EXI Processors. ~~Therefore, its~~ This suggests that the use of user-defined (i.e. custom) datatype representations needs to be restrained by weighing alternatives and considering the consequences of each in pros and cons, in order to avoid unruly proliferation of documents that use ~~custom~~ such datatype ~~representations~~ representations. Those applications that ever find Datatype Representation Map useful should make sure that they exchange such documents only among the parties that are pre-known or discovered to be able to process the user-defined datatype representations that are in use. Otherwise, if it is not for certain if a receiver undestands the particular user-defined datatype representations, the sender should never attempt to send documents that use user-defined datatype representations to that recipient.

8. EXI Grammars

EXI is a knowledge based encoding that uses a set of grammars to determine which events are most likely to occur at any given point in an EXI stream and encodes the most likely alternatives in fewer bits. It does this by mapping the stream of events to a lower entropy set of representative values and encoding those values using a set of simple variable length codes or an EXI compression algorithm.

The result is a very simple, small algorithm that uniformly handles schema-less encoding, schema-informed encoding, schema deviations, and any combination thereof in EXI streams. These variations do not require different algorithms or different parsers, they are simply informed by different combinations of grammars.

The following sections describe the grammars used to inform the EXI encoding.

Note:

The grammar semantics in this specification are written for clarity and generality. They do not prescribe a particular implementation approach.

8.1 Grammar Notation

8.1.1 Fixed Event Codes

Each grammar production has an event code , which is represented by a sequence of one to three parts separated by periods ("."). Each part is an unsigned integer. The following are examples of grammar productions with event codes as they appear in this specification.

Example 8-1. Example productions with fixed event codes


Productions		Event Codes

LeftHandSide _1 1 :
	~~Event~~ Terminal _1 1 NonTerminal _1 1	0
	~~Event~~ Terminal _2 2 NonTerminal _2 2	1
	~~Event~~ Terminal _3 3 NonTerminal _3 3	2.0
	~~Event~~ Terminal _4 4 NonTerminal _4 4	2.1
	~~Event~~ Terminal _5 5 NonTerminal _5 5	2.2.0
	~~Event~~ Terminal _6 6 NonTerminal _6 6	2.2.1

LeftHandSide _2 2 :
	~~Event~~ Terminal _1 1 NonTerminal _1 1	0
	~~Event~~ Terminal _2 2 NonTerminal _2 2	1.0
	~~Event~~ Terminal _3 3 NonTerminal _3 3	1.1

The number of parts in a given event code is called the event code's length. No two productions with the same non-terminal symbol on the ~~left-hand-side~~ left-hand side are permitted to have the same event ~~code.~~ code .

8.1.2 Variable Event Codes

Some non-terminal symbols are used on the ~~right-hand-side~~ right-hand side in a production without ~~an event~~ a terminal symbol prefixed to ~~them.~~ them, but with a parenthesized event code affixed instead. Such non-terminal symbols are macros and they are used to capture some recurring set of productions ~~into~~ as symbols so that a symbol can be used in the grammar representation instead of including all the productions the macro represents in place every time it is used.

Example 8-2. Example productions that use macro non-terminal symbols


ABigProduction _1 1 :
	~~Event~~ Terminal _1 1 NonTerminal _1 1	0
	~~Event~~ Terminal _2 2 NonTerminal _2 2	1
	LEFTHANDSIDE ₁ (2.0)	2.0

ABigProduction _2 2 :
	~~Event~~ Terminal _1 1 NonTerminal _1 1	0
	LEFTHANDSIDE ₁ (1.1)	1.1
	~~Event~~ Terminal _2 2 NonTerminal _2 2	1.2

Because non-terminal macros are injected into the ~~right-hand-side~~ right-hand side of more than one production, the event codes of productions with these macro non-terminals on the ~~left-hand-side~~ left-hand side are not fixed, but will have different event code values depending on the context in which the macro non-terminal appears. This specification calls these variable event codes and uses variables in place of individual event code parts to indicate the event code parts are determined by the context. Below are some examples of variable event codes:

Example 8-3. Example non-terminal macros and its productions with variable event codes


LEFTHANDSIDE _1 1 ~~(n.m)~~ (n.m) :
	~~EVENT~~ TERMINAL _1 1 NONTERMINAL _1 1	n .0
	~~EVENT~~ TERMINAL _2 2 NONTERMINAL _2 2	n .1
	~~EVENT~~ TERMINAL _3 3 NONTERMINAL _3 3	n . m +2
	~~EVENT~~ TERMINAL _4 4 NONTERMINAL _4 4	n . m +3
	~~EVENT~~ TERMINAL _5 5 NONTERMINAL _5 5	n . m +4.0
	~~EVENT~~ TERMINAL _6 6 NONTERMINAL _6 6	n . m +4.1

Unless otherwise specified, the variable n evaluates to the first part of the event code of the production in which the macro non-terminal LEFTHANDSIDE _1

1 appears on the ~~right-hand-side.~~ right-hand side. Similarly, the expression n . m represents the first two parts of the event code of the production in which the macro non-terminal LEFTHANDSIDE _1

1 appears on the ~~right-hand-side.~~ right-hand side.

Non-terminal macros are used in this specification for notational convenience only. They are not non-terminals, even though they are used in place of non-terminals. Productions that use non-terminal macros on the ~~right-hand-side~~ right-hand side need to be expanded by macro substitution before such productions are interpreted. Therefore, ABigProduction _1

1 and ABigProduction _2

2 shown in the preceding example are equivalent to the following set of productions ~~derived~~ obtained by expanding the non-terminal macro symbol LEFTHANDSIDE _1

1 and evaluating the variable event codes.

Example 8-4. Expanded productions equivalent to the productions used above


ABigProduction _1 1 :
	~~Event~~ Terminal _1 1 NonTerminal _1 1	0
	~~Event~~ Terminal _2 2 NonTerminal _2 2	1
	~~EVENT~~ TERMINAL _1 1 NONTERMINAL _1 1	2.0
	~~EVENT~~ TERMINAL _2 2 NONTERMINAL _2 2	2.1
	~~EVENT~~ TERMINAL _3 3 NONTERMINAL _3 3	2.2
	~~EVENT~~ TERMINAL _4 4 NONTERMINAL _4 4	2.3
	~~EVENT~~ TERMINAL _5 5 NONTERMINAL _5 5	2.4.0
	~~EVENT~~ TERMINAL _6 6 NONTERMINAL _6 6	2.4.1

ABigProduction _2 2 :
	~~Event~~ Terminal _1 1 NonTerminal _1 1	0
	~~EVENT~~ TERMINAL _1 1 NONTERMINAL _1 1	1.0
	~~EVENT~~ TERMINAL _2 2 NONTERMINAL _2 2	1.1
	~~Event~~ Terminal _2 2 NonTerminal _2 2	1.2
	~~EVENT~~ TERMINAL _3 3 NONTERMINAL _3 3	1.3
	~~EVENT~~ TERMINAL _4 4 NONTERMINAL _4 4	1.4
	~~EVENT~~ TERMINAL _5 5 NONTERMINAL _5 5	1.5.0
	~~EVENT~~ TERMINAL _6 6 NONTERMINAL _6 6	1.5.1

8.2 Grammar Event Codes

Each production rule in the EXI grammar includes an event code value that approximates the likelihood the associated production rule will be matched over the other productions with the same left-hand-side non-terminal symbol. Ultimately, the event codes determine the value(s) by which each non-terminal symbol will be represented in the EXI stream.

To understand how a given event code approximates the likelihood a given production will ~~matched,~~ match, it is useful to visualize the event codes for a set of production rules that have the same non-terminal symbol on the ~~left-hand-side~~ left-hand side as a tree. For example, the following set of productions:

Example 8-5. Example productions with event codes

ElementContent :
	EE	0
	~~SE ()~~ SE () ElementContent	1.0
	CH CH ElementContent	1.1
	ER ER ElementContent	1.2
	CM CM ElementContent	1.3.0
	PI PI ElementContent	1.3.1

represents a set of information items that might occur as element content after the start tag. Using the production event ~~codes,~~ codes , we can visualize this set of productions as follows:

Figure 8-1. Event code tree for ElementContent grammar

where the ~~non-terminal~~ terminal symbols are represented by the leaf nodes of the ~~tree~~ tree, and the event code of each production rule ~~that contains a non-terminal symbol~~ defines a path from the root of the tree to the node ~~associated with~~ that ~~symbol.~~ represents the terminal symbol that is on the right-hand side of the production. We call this the event code tree for a given set of productions.

An event code tree is similar to a Huffman tree [Huffman Coding] in that shorter paths are generally used for symbols that are considered more likely. However, event code trees are far simpler and less costly to compute and maintain. Event code trees are shallow and contain at most three levels. In addition, the length of each event code in the event code tree is assigned statically without analyzing the data. This classification provides some of the benefits of a Huffman tree without the cost.

8.3 Pruning Unneeded Productions

As discussed in section 6.3 Fidelity Options , applications MAY provide a set of fidelity options to specify the XML features they require. EXI processors MUST use these fidelity options to prune the productions of which the terminal symbols represent the events that are not required from the grammars, improving compactness and processing efficiency.

For example, the following set of productions represent the set of information items that might occur as element content after the start tag.

Example 8-6. Example productions with full fidelity

ElementContent :
	EE	0
	~~SE ()~~ SE () ElementContent	1.0
	CH CH ElementContent	1.1
	ER ER ElementContent	1.2
	CM CM ElementContent	1.3.0
	PI PI ElementContent	1.3.1

If an application sets the fidelity options preserve.comments, preserve.pis and preserve.dtd to false, the productions matching comment (CM), processing instruction (PI) and entity reference (ER) events are pruned from the grammar, producing the following set of productions:

Example 8-7. Example productions after pruning

ElementContent :
	EE	0
	~~SE ()~~ SE () ElementContent	1.0
	CH CH ElementContent	1.1

Removing these productions from the grammar tells EXI processors that comments and processing instructions will never occur in the EXI stream, which reduces the entropy of the stream allowing it to be encoded in fewer bits.

Each time a production is removed from a grammar, the event codes of the other productions with the same non-terminal symbol on the ~~left-hand-side~~ left-hand side MUST be adjusted to keep them contiguous if its removal has left the remaining productions with non-contiguous event codes.

8.4 Built-in XML Grammars

This section describes the built-in XML ~~grammar~~ grammars used by EXI when no ~~additional~~ schema information is available ~~to describe the contents~~ or when available schema information describes only portions of the EXI stream.

The built-in XML ~~grammar is used when no schema exists,~~ grammars are dynamic and ~~for schema extensions~~ continuously evolve to reflect knowledge learned while processing an EXI stream. New built-in element grammars are created to describe the content of newly encountered elements and ~~deviations that~~ new grammar productions are ~~not declared by~~ added to refine existing built-in grammars. Newly learned grammars and productions are used to more efficiently represent subsequent events in the ~~schema.~~ EXI stream. All newly created built-in element grammars are global element grammars .

[Definition:] A ~~built-in XML~~ global element grammar is ~~self-evolving. The built-in~~ a grammar ~~continuously reflects~~ describing the ~~knowledge being learned while~~ content of an element that has global scope (i.e. a global element). At the onset of processing an EXI ~~stream onto itself in order to keep refining itself for subsequent use~~ stream, the set of global element grammars is the set of all schema-informed element grammars derived from element declarations that have a {scope} property of global . Each built-in element grammar ~~within~~ created while processing an EXI stream is added to the ~~extent~~ set of ~~processing~~ global element grammars. Each global element grammar has a ~~single stream.~~ unique qname .

8.4.1 Built-in Document Grammar

In the absence of ~~additional~~ schema information ~~about~~ describing the content of the EXI stream, the following grammar describes the events that will occur in an EXI document .

		Event Code

Document :
	SD DocContent	0

DocContent :
	~~SE ()~~ SE () DocEnd	0
	DT DocContent	1.0
	CM DocContent	1.1.0
	PI DocContent	1.1.1

DocEnd :
	ED	0
	CM DocEnd	1.0
	PI DocEnd	1.1

Semantics:

All productions in the built-in ~~Document~~ document grammars of the form LeftHandSide : ~~SE (*)~~ SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by ~~SE (*)~~ SE (*)
If a global element grammar does not exist for element qname , create one ~~based on the~~ according to section 8.4.3 Built-in Element Grammar .
Evaluate the element ~~contents~~ content using ~~a built-in~~ the global element grammar for element qname .
Evaluate the remainder of event sequence using RightHandSide .

8.4.2 Built-in Fragment Grammar

In the absence of ~~additional~~ schema information ~~about~~ describing the contents of an EXI stream, the following grammar describes the events that ~~will~~ may occur in an EXI fragment . The grammar ~~shown~~ below represents the initial set of productions ~~that belong to a~~ in the built-in fragment grammar at the start of a EXI stream ~~processing, which is supplemented by the semantic description that explains the rules used to evolve~~ processing. The associated semantics explain how the built-in fragment grammar evolves to ~~continuously improve it and be better prepared for~~ more efficiently represent subsequent ~~uses of the same grammar during the rest of the processing of~~ events in the EXI stream.

		Event Code

Fragment :
	SD FragmentContent	0

FragmentContent :
	~~SE ()~~ SE () FragmentContent	0
	ED	1
	CM FragmentContent	2.0
	PI FragmentContent	2.1

Semantics:

All productions in the built-in ~~Fragment~~ fragment grammars of the form LeftHandSide : ~~SE (*)~~ SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by ~~SE (*)~~ SE (*)
If a global element grammar does not exist for element qname , create one ~~based on the~~ according to section 8.4.3 Built-in Element Grammar ~~Evaluate the element contents using a built-in grammar for element qname~~ .
Create a production of the form LeftHandSide : SE ( qname ) RightHandSide with an event code 0
Increment the first part of the event code of each production in the current grammar with the non-terminal LeftHandSide on the ~~left hand side.~~ left-hand side
Add the production created in step 4 3 to the grammar
Evaluate the element content using the global element grammar for element qname .
Evaluate the remainder of event sequence using RightHandSide .

All productions of the form LeftHandSide : SE ( qname ) RightHandSide that were previously added to the grammar upon the first occurrence of the element that has the qname qname are evaluated as follows when they are matched:

Evaluate the element ~~contents~~ content using ~~a built-in~~ the global element grammar for element qname
Evaluate the remainder of event sequence using RightHandSide .

8.4.3 Built-in Element Grammar

[Definition:] ~~EXI defines a built-in element~~ When no grammar ~~that is used~~ exists for an element occuring in ~~the absence of additional information about the contents of~~ an EXI ~~element prior to its processing. A~~ stream, a built-in element grammar ~~shown below~~ is ~~prescibed by EXI to reflect the events~~ created for that ~~will occur in an~~ element. Built-in element grammars are initially generic and are progressively refined as the ~~order amongst them in general without any further constraint about what~~ specific content for the associated element is ~~likely or not likely to occur inside elements. A single instance of~~ learned. All built-in element ~~grammar is shared by those elements in a stream that have the same~~ grammars are ~~qname~~ global element grammar s and ~~do not have additional a priori constraints as to their content. A separate instance~~ can be uniquely identified by the qname of the global element they describe. At the outset of processing an EXI stream, the set of built-in element ~~grammar~~ grammars is ~~assigned to each~~ empty.

Below is the initial set of productions used for all newly created ~~qname~~ built-in element grammars ~~upon the first occurrence of the elements of the same~~ . The semantics describe how productions are added to each ~~qname~~ built-in element grammar ~~, thereafter the grammar continuously evolves by reflecting the knowledge learned while processing~~ as the content of ~~those elements. The grammar shown below represents~~ the ~~initial set of productions that belong to a built-in~~ associated element ~~grammar at the time when a new instance is created, which~~ is supplemented by the semantic description that explains the rules that are applied by the grammar onto itself to evolve and be better prepared for subsequent uses of the same grammar instance during the rest of the processing of the stream. learned.

		Event Code

StartTagContent :
	EE	0.0
	~~AT ()~~ AT () StartTagContent	0.1
	NS StartTagContent	0.2
	SC Fragment	0.3
	ChildContentItems (0.4)

ElementContent :
	EE	0
	ChildContentItems (1.0)

ChildContentItems (n.m) :
	~~SE ()~~ SE () ElementContent	n . m
	CH ElementContent	n .( m +1)
	ER ElementContent	n .( m +2)
	CM ElementContent	n .( m +3).0
	PI ElementContent	n .( m +3).1

Note:

~~The~~ When the value of ~~each AT (xsi:type) event matching the AT() terminal is represented as a QName (see 7.1.7 QName ). If there is no namespace in scope for~~ the ~~specified~~ ~~qname~~ selfContained option ~~prefix,~~ is false, the ~~QName uri~~ production with the terminal symbol SC on the right-hand side is ~~set to empty ("")~~ absent from the above grammar, and the ~~QName~~ use of the non-terminal macro ~~localName~~ ChildContentItems* ~~is set to~~ that has StartTagContent non-terminal on the ~~full lexical value~~ left-hand side (shown in bold above) gets expanded with variable values (0.3) instead of (0.4) used above.
When the ~~QName, including~~ xsi:type attribute appears in an element where the ~~prefix.~~ built-in element grammar is in effect, it MUST occur before any other AT events of the same element.

Semantics:

All productions in the built-in ~~Element~~ element grammar of the form LeftHandSide : ~~AT (*)~~ AT (*) RightHandSide are evaluated as follows:

Let qname be the qname of the attribute matched by ~~AT (*)~~ AT (*)
~~If qname is not xsi:type or xsi:nil, create~~ Create a production of the form LeftHandSide : ~~AT (~~ AT ( qname ) ~~StartTagContent~~ RightHandSide with an event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminal LeftHandSide on the ~~left hand~~ left-hand side. Add this production to the grammar.
If qname is xsi:type, let ~~type-qname~~ target-type be the value of the xsi:type ~~attribute,~~ attribute and if assign it the QName datatype representation (see 7.1.7 QName ). If there is no namespace in scope for the specified qname prefix, set the uri of target-type to empty ("") and the localName to the full lexical value of the QName, including the prefix. Encode target-type according to section 7. Representing Event Content . If a grammar ~~exists~~ can be found for the ~~type-qname~~ target-type ~~type,~~ type using the encoded target-type representation, evaluate the ~~remainder of event sequence~~ element contents using the grammar for ~~type-qname~~ target-type type instead of RightHandSide . ~~Otherwise, evaluate the remainder of event sequence using RightHandSide .~~

All productions of the form LeftHandSide : SC Fragment are evaluated as follows:

Save the string table, ~~grammars, namespace prefixes~~ grammars and any implementation-specific state learned while processing this EXI Body.
Initialize the string table, ~~grammars, namespace prefixes~~ grammars and any implementation-specific state learned while processing this EXI Body to the state they held just prior to processing this EXI Body.
Skip to the next byte-aligned boundary in the ~~stream.~~ stream if it is not already at such a boundary.
Let qname be the qname of the SE event immediately preceding this SC event.
Let content be the sequence of events following this SC event that match the grammar for element qname , up to and including the terminating EE event.
Evaluate the sequence of events (SD, SE( qname ), content , ED) according to the Fragment ~~grammar.~~ grammar (see 8.4.2 Built-in Fragment Grammar ).
Skip to the next byte-aligned boundary in the stream if it is not already at such a boundary.
Restore the string table, ~~grammars, namespace prefixes~~ grammars and implementation-specific state learned while processing this EXI Body to that saved in step 1 above.

All productions in the built-in ~~Element grammars~~ element grammar of the form LeftHandSide : ~~SE (*)~~ SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by ~~SE (*)~~ SE (*)
If a global element grammar does not exist for element qname , create one ~~based on the~~ according to section 8.4.3 Built-in Element Grammar ~~Evaluate the element contents using a built-in grammar for element qname~~ .
Create a production of the form LeftHandSide : SE ( qname ) RightHandSide with an event code 0
Increment the first part of the event code of each production in the current grammar with the non-terminal LeftHandSide on the ~~left hand side.~~ left-hand side
Add the production created in step 4 3 to the grammar
Evaluate the element content using the global element grammar for element qname .
Evaluate the remainder of event sequence using RightHandSide .

Evaluate the element ~~contents~~ content using ~~a built-in~~ the global element grammar for element qname
Evaluate the remainder of event sequence using RightHandSide .

All productions in the built-in ~~Element~~ element grammar of the form LeftHandSide : CH RightHandSide are evaluated as follows:

If a production of the form, LeftHandSide : CH RightHandSide with an event code of length 1 does not exist in the current element grammar, create one with event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminal LeftHandSide on the ~~left hand~~ left-hand side.
Add the production created in step 1 to the grammar
Evaluate the remainder of event sequence using RightHandSide .

All productions in the built-in element grammar of the form LeftHandSide : EE are evaluated as follows:

If a production of the form, LeftHandSide : EE with an event code of length 1 does not exist in the current element grammar, create one with event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminal LeftHandSide on the left-hand side.
Add the production created in step 1 to the grammar

8.5 Schema-informed Grammars

This section describes the schema-informed grammars used by EXI when schema information is available to describe the contents of the EXI ~~stream.~~ stream . Schema information used for processing an EXI stream is either indicated by the header option schemaID , or communicated out-of-band in the absence of schemaID . Schema-informed grammars are independent of any particular schema language and can be derived from ~~W3C~~ XML ~~Schemas,~~ Schemas [XML Schema Structures] [XML Schema Datatypes] , RELAX NG ~~schemas,~~ schemas [ISO/IEC 19757-2:2003] , DTDs [XML 1.0] [XML 1.1] or other schema languages for describing what is likely to occur in an EXI stream.

Schema-informed grammars accept all XML documents and fragments regardless of whether and how closely they match the schema. The EXI stream encoder encodes individual events using schema-informed grammars where they are available and falls back to the built-in XML grammars where they are not. In general, events for which a schema-informed grammar exists will be encoded more efficiently.

Unlike built-in XML grammars, schema-informed grammars are static and do not evolve, which permits the reuse of schema-informed grammars across the processing of multiple EXI streams. This is a single outstanding difference between the two grammar systems.

~~With such differences, however, their uses are not exclusive, but are connected together at individual grammar level. Of particular note~~ It is important to note that schema-informed and built-in grammars ~~that~~ are ~~called upon for schema-deviated parts are still subject~~ often used together within the context of a single EXI stream . While processing a schema-informed grammar, built-in grammars may be created to ~~dynamic grammar learning during~~ represent schema deviations or elements that match wildcards declared in the ~~rest~~ schema. Even though these built-in grammars occur in the context of a schema-informed stream, they are still dynamic and evolve to represent content learned while processing the EXI stream ~~processing~~ as is described in ~~8.4.2~~ 8.4 Built-in ~~Fragment Grammar~~ XML Grammars .

8.5.1 Schema-informed Document Grammar

When schema information is available to describe the contents of an EXI ~~stream,~~ stream , the following grammar describes the events that will occur in an EXI document .

Syntax

Event Code

Document :

SD DocContent

DocContent :

SE (G ₀ ) DocEnd

SE (G ₁ ) DocEnd

⋮

SE (G _n

-1

−1 ) DocEnd

n -1 −1

~~SE (*)~~ SE (*) DocEnd

n .0

DT DocContent

( n .1 +1).0

CM DocContent

( n ~~.2.0~~ +1).1.0

PI DocContent

( n ~~.2.1~~ +1).1.1

~~N.B.~~ Note:

The variable n in the grammar above is the number of global elements declared in the schema. G ₀, G ₁, ... G _n

-1

−1 represent all the qnames of global elements sorted lexicographically, first by ~~localName ,~~ local-name, then by ~~uri . The terminal symbol SE (*) is only matched if a more specific match does not exist among SE(G 0 ), SE(G 1 ), ... SE(G n -1 ).~~ uri.

DocEnd :

CM DocEnd

1.0

PI DocEnd

1.1

Semantics:

~~All productions in the~~ In a schema-informed ~~document grammars~~ grammar, all productions of the form LeftHandSide : ~~SE (*)~~ SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by ~~SE (*)~~ SE (*)
If a global element grammar does not exist for ~~Element~~ element qname , create one ~~based on~~ according to section 8.4.3 Built-in Element Grammar .
Evaluate the element content using the ~~Built-in Element Grammar~~ global element grammar for element qname .
Evaluate the ~~element contents~~ remainder of event sequence using ~~the SE (~~ ~~qname~~ RightHandSide ~~) grammar~~

8.5.2 Schema-informed Fragment Grammar

When schema information is available to describe the contents of an EXI ~~stream,~~ stream , the following grammar describes the events that will occur in an EXI fragment .

Syntax

Event Code

Fragment :

SD FragmentContent

FragmentContent :

SE (F ₀ ) FragmentContent

SE (F ₁ ) FragmentContent

⋮

SE (F _n

-1

−1 ) FragmentContent

n -1 −1

ED SE (*) FragmentContent

~~SE (*) FragmentContent~~ ED

( n ~~+1).0~~ +1

CM FragmentContent

( n ~~+1).1.0~~ +2).0

PI FragmentContent

( n ~~+1).1.1~~ +2).1

~~N.B.~~ Note:

The variable n in the grammar above represents the number of unique element qnames declared in the schema. The variables F _0

0, F _1

1, ... F _n
-1

n−1 represent these qnames sorted lexicographically, first by ~~localName ,~~ local-name, then by ~~uri .~~ uri. If there is more than one element declared with the same ~~qname,~~ qname , the qname is included only ~~once~~ once. If all such elements have the same schema type name and ~~its~~ {nillable} property value, their content is evaluated according to the specific grammar for that element declaration. Otherwise, their content is evaluated according to the relaxed Element Fragment grammar described in 8.5.3 Schema-informed Element Fragment Grammar .

~~The terminal symbol SE (*) is only matched if a more specific match does not exist among SE(F 0 ), SE(F 1 ), ... SE(F n -1 ).~~

Semantics:

~~All productions in the~~ In a schema-informed ~~fragment grammars~~ grammar, all productions of the form LeftHandSide : ~~SE (*)~~ SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by ~~SE (*)~~ SE (*)
If a global element grammar does not exist for ~~Element~~ element qname , create one ~~based on~~ according to section 8.4.3 Built-in Element Grammar .
Evaluate the element content using the ~~Built-in Element Grammar~~ global element grammar for element qname .
Evaluate the ~~element contents~~ remainder of event sequence using ~~the SE (~~ ~~qname~~ RightHandSide ~~) grammar~~

8.5.3 Schema-informed Element Fragment Grammar

[Definition:] When schema information is available to describe the contents of an EXI stream and more than one element is declared with the same ~~qname,~~ qname , but not all such elements have the ~~following grammar describes~~ same type name and {nillable} property value, the Schema-informed Element Fragment Grammar are used for processing the events that may occur in ~~these~~ such elements when they occur inside an EXI fragment or EXI Element Fragment. The schema-informed element fragment grammar consists of ElementFragment and ElementFragmentTypeEmpty which are defined below. ElementFragment is a grammar that accounts both element declarations and attribute declarations in the schemas, whereas ElementFragmentTypeEmpty is a grammar that regards only attribute declarations.

Syntax

Event Code

~~ElementFragmentStartTag~~ ElementFragment ₀ :

~~AT (A~~ AT (A ₀ ) [schema-type value] ~~ElementFragmentStartTag~~ ElementFragment ₀

~~AT (A~~ AT (A ₁ ) [schema-typed value] ~~ElementFragmentStartTag~~ ElementFragment ₀

⋮

~~AT (A~~ AT (A _n

-1

−1 ) [schema-typed value] ~~ElementFragmentStartTag~~ ElementFragment ₀

n −1

AT (*) ElementFragment ₀

n -1

SE (F ₀ ) ~~ElementFragmentContent~~ ElementFragment ₁

n +1

SE (F ₁ ) ~~ElementFragmentContent~~ ElementFragment ₁

n +1 +2

⋮

SE (F _m

-1 ) ~~ElementFragmentContent~~ ElementFragment ₁

n + m -1

EE SE (*) ElementFragment ₁

n + m +1

n + m +2

CH [untyped value] ~~ElementFragmentContent~~ ElementFragment ₁

n + m +1 +3

~~ElementFragmentContent~~ ElementFragment ₁ :

SE (F ₀ ) ~~ElementFragmentContent~~ ElementFragment ₁

SE (F ₁ ) ~~ElementFragmentContent~~ ElementFragment ₁

⋮

SE (F _m

-1 ) ~~ElementFragmentContent~~ ElementFragment ₁

m -1

EE SE (*) ElementFragment ₁

m +1

CH [untyped value] ~~ElementFragmentContent~~ ElementFragment ₁

m +2

ElementFragmentTypeEmpty ₀ :

AT (A ₀ ) [schema-valid value] ElementFragmentTypeEmpty ₀

AT (A ₁ ) [schema-valid value] ElementFragmentTypeEmpty ₀

⋮

AT (A _n

−1 ) [schema-valid value] ElementFragmentTypeEmpty ₀

n −1

AT (*) ElementFragmentTypeEmpty ₀

n +1

ElementFragmentTypeEmpty ₁ :

~~N.B.~~ Note:

The variable n in the grammar above represents the number of unique attribute qnames declared in the schema. The variables A ₀, A ₁, ... A _n

-1

−1 represent these qnames sorted lexicographically, first by ~~localName ,~~ local-name, then by ~~uri .~~ uri. If there is more than one attribute declared with the same ~~qname,~~ qname , the qname is included only ~~once and its~~ once. If all such attributes have the same schema type name, their value is ~~evaluated~~ represented using that type. Otherwise, their value is represented as a String.

The variable m in the grammar above represents the number of unique element qnames declared in the schema. The variables F ₀, F ₁, ... F _m

-1 represent these qnames sorted lexicographically, first by ~~localName ,~~ local-name, then by ~~uri .~~ uri. If there is more than one element declared with the same ~~qname,~~ qname , the qname is included only ~~once~~ once. If all such elements have the same type name and ~~its~~ {nillable} property value, their content is evaluated according to specific grammar for that element declaration. Otherwise, their content is evaluated according to the relaxed Element Fragment grammar described above.

Semantics:

In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by SE (*)
If a global element grammar does not exist for element qname , create one according to section 8.4.3 Built-in Element Grammar .
Evaluate the element content using the global element grammar for element qname .
Evaluate the remainder of event sequence using RightHandSide

All productions in the schema-informed element fragment grammar of the form LeftHandSide : AT (*) RightHandSide are evaluated as follows:

Let qname be the qname of the attribute matched by AT (*)
If qname is xsi:type, let target-type be the value of the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName ). If there is no namespace in scope for the specified qname prefix, set the uri of target-type to empty ("") and the localName to the full lexical value of the QName, including the prefix. Encode target-type according to section 7. Representing Event Content . If a grammar can be found for the target-type type using the encoded target-type representation, evaluate the element contents using the grammar for target-type type instead of RightHandSide .
If qname is xsi:nil, let nil be the value of the xsi:nil attribute. If nil is a valid Boolean value, assign it the Boolean datatype representation (see 7.1.2 Boolean ) and encode it according to section 7. Representing Event Content . If nil is not a valid Boolean value, represent the xsi:nil attribute event using the AT (*) [untyped value] terminal (see 8.5.4.4 Undeclared Productions ). If the value of nil is true, evaluate the element contents using the grammar ElementFragmentTypeEmpty defined above rather than RightHandSide .
If a global attribute definition exists for qname , let global-type be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated with global-type , it SHOULD be represented using the datatype representation associated with global-type (see 7. Representing Event Content ). If the attribute value is not represented using the datatype representation associated with global-type , represent the attribute event using the AT (*) [untyped value] terminal (see 8.5.4.4 Undeclared Productions ).

As with all schema informed element grammars, the ~~Element Fragment~~ schema-informed element fragment grammar is augmented with additional productions that describe events that may occur in an EXI stream, but are not explicity declared in the schema. The process for augmenting the grammar is described in 8.5.4.4 Undeclared Productions . For the purposes of this process, the schema-informed element fragment grammar is treated as though it is created from an element declaration with a {nillable} property value of true and a type declaration that has named sub-types, and ElementFragmentTypeEmpty is used to serve as the TypeEmpty of the type in the process.

The content index of grammars ElementFragment and ElementFragmentTypeEmpty are both 1 (one).

8.5.4 Schema-informed Element and Type Grammars

[Definition:] When one or more XML Schema is available to describe the contents of an EXI stream, a schema-informed element grammar Element _i is derived for each element declaration E _i described by the schemas, where 0 ≤ i < n and n is the number of element declarations in the schema.

[Definition:] When one or more XML Schema is available to describe the contents of an EXI stream, a schema-informed type grammar Type _i is derived for each named type declaration T _i described by the schemas as well as for each of the built-in primitive types ^XS2 and built-in derived types ^XS2, the complex ur-type ^XS1 and the simple ur-type ^XS2 defined by XML Schema specification [XML Schema Structures] [XML Schema Datatypes] , where 0 ≤ i < n and n is the total number of such available types.

Each schema-informed element grammar and type grammar is constructed according to the following four steps:

Create a proto-grammar that describes the content model according to available schema information (see section 8.5.4.1 EXI Proto-Grammars ).
Normalize the proto-grammar into an EXI grammar (see section 8.5.4.2 EXI Normalized Grammars ).
Assign event codes to each production in the normalized EXI grammar (see section 8.5.4.3 Event Code Assignment ).
Add additional productions to the normalized EXI grammar to represent events that may occur in the EXI stream, but are not described by the schema, such as comments, processing-instructions, schema-deviations, etc. (see section 8.5.4.4 Undeclared Productions ).

Each element grammar Element _i includes a sequence of n non-terminals Element _i, j, where 0 ≤ j < n . The content of the entire element is described by the first non-terminal Element _i, 0. The remaining non-terminals describe portions of the element content. Likewise, each type grammar Type _i includes a sequence of n non-terminals Type _i, j and the content of the entire type is described by the first non-terminal Type _i, 0.

The algorithms expressed in this section provide a concise and formal description of the EXI grammars for a given set of XML Schema definitions. More efficient algorithms likely exist for generating these EXI grammars and EXI implementations are free to use any algorithm that produces grammars and event codes that generate EXI encodings that match those produced by the grammars described here.

An example is provided in the appendix (see H Schema-informed Grammar Examples ) that demonstrates the process described in this section to generate a complete schema-informed element grammar from an element declaration in a schema.

8.5.4.1 EXI Proto-Grammars

This section describes the process for creating the EXI proto-grammars from XML Schema declarations and definitions. EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:

	LeftHandSide :
		~~RightHandSize~~ RightHandSide

where LeftHandSide and RightHandSide are both non-terminals. Whereas, all productions in a normalized EXI grammar contain exactly one terminal symbol and at most one non-terminal symbol on the ~~right hand~~ right-hand side. This is a restricted form of Greibach normal form [Greibach Normal Form] .

EXI proto-grammars are derived from XML Schema in a straight-forward manner and can easily be normalized with simple algorithm (see 8.5.4.2 EXI Normalized Grammars ).

8.5.4.1.1 Grammar Concatenation Operator

Proto-grammars are specified in a modular, constructive fashion. XML Schema components such as terms, particles, attribute uses are transformed each into a distinct proto-grammar, leveraging proto-grammars of their sub-components. At various stages of proto-grammar construction, two or more of proto-grammars are concatenated one after another to form more composite grammars.

The grammar concatenation operator ⊕ is a binary, associative operator that creates a new grammar from its left and right grammar operands. The new grammar accepts any set of symbols accepted by its left operand followed by any set of symbols accepted by its right operand.

Given a left operand Grammar ^L and a right operand Grammar ^R , the following operation

Grammar ^L ⊕ Grammar ^R

creates a combined grammar by replacing each production of the form

	Grammar ^L _k :
		EE

where 0 ≤ k < n and n is the number of non-terminals that occur on the ~~left hand~~ left-hand side of productions in Grammar ^L , with a production of the form

	Grammar ^L _k :
		Grammar ^R ₀

connecting each accept state of Grammar ^L with the start state of Grammar ^R .

8.5.4.1.2 Element Grammars

This section describes the process for creating an EXI element grammar from an XML Schema element declaration ^XS1.

Given an element declaration E _i, with properties {name}, ~~{target namespace}, {type definition},~~ {target namespace}, {type definition}, {scope} and {nillable}, create a corresponding EXI grammar Element _i for evaluating the contents of elements in the specified {scope} with qname ~~localName~~ local-name = {name} and qname uri = {target namespace} where ~~uri~~ qname ~~= {target namespace} .~~ is the qname of the elements.

Let T _j be the ~~{type definition}~~ {type definition} of E _i and Type _j

j be the type grammar created from T _j. The grammar Element _i describing the content model of E _i is created as follows.

Syntax:
	Element _{i, 0 i , 0} :
		Type _{j, 0 j , 0}

8.5.4.1.3 Type Grammars

Given an XML Schema type definition T _i
, with properties {name} and {target namespace}, two type grammars are created, which are denoted by Type _i and TypeEmpty _i. [Definition:] Type _i is a grammar that fully reflects the type definition of T _i
, , whereas [Definition:] TypeEmpty _i is a grammar that ~~accepts~~ regards only the attribute uses and attribute wildcards of T _i, if ~~any.~~ any .

The grammar Type _i is used for evaluating the content of elements that are defined to be of type T _i in the schema. [Definition:] Type _i is a global type grammar when T _i is a named type. Type _i, when it is a global type grammar, can additionally be used as the effective grammar designated by a xsi:type attribute with the attribute value that is a qname with local-name = {name} and uri = {target namespace}. TypeEmpty _i is used in place of Type _i when the element instance that is being evaluated has a xsi:nil attribute with the value true .

[Definition:] For each type grammar Type _i, an unique index number content is determined such that all non-terminal symbols of indices smaller than content have at least one AT ~~event~~ terminal symbol and the rest of the non-terminal symbols in Type _i do not have AT ~~events~~ terminal symbols on their ~~right-hand-side,~~ right-hand side, where indices are assigned to non-terminal symbols in ascending order with the entry non-terminal symbol of Type _i being assined index 0 (zero). There is also a content index associated with each TypeEmpty _i where its value is determined in the same manner as for Type _i.

Sections 8.5.4.1.3.1 ~~SimpleType~~ Simple Type Grammars and 8.5.4.1.3.2 Complex Type Grammars describe the processes for creating Type _i and TypeEmpty _i from XML Schema simple type definitions ^XS1 and complex type definitions ^XS1 defined in schemas as well as built-in primitive types ^XS2, built-in derived types ^XS2 and simple ur-type ^XS2 defined by XML Schema specification [XML Schema Datatypes] . Section 8.5.4.1.3.3 Complex Ur-Type Grammar defines the grammar used for processing instances of element contents of type xsd:anyType ^XS1.

8.5.4.1.3.1 SimpleType Simple Type Grammars

This section describes the process for creating an EXI type grammar from an XML Schema simple type definition ^XS1.

Given a simple type definition T _i, ~~with properties {name} and {target namespace},~~ create two new EXI grammars Type _i and TypeEmpty _i ~~for evaluating instances of types with qname localName = {name} and qname uri = {target namespace}.~~ following the procedure described below.

Add the following grammar productions to Type _i and TypeEmpty _i

i :

Syntax:
	Type _i, 0 :
		CH ~~[schema-valid value ]~~ [schema-typed value] Type _i, 1
	Type _i, 1 :
		EE

	TypeEmpty _i, 0 :
		EE

Note:

Productions of the form LeftHandSide : CH ~~[schema-valid value ]~~ [schema-typed value] RightHandSide represent typed character data that ~~is valid~~ can be represented using the EXI datatype representation associated with ~~respect to~~ the ~~schema. Schema-invalid character~~ simple type definition (see 7. Representing Event Content ). Character data that can be represented using the EXI datatype representation associated with the simple type definition SHOULD be represented this way. Character data that is not represented using the EXI datatype representation associated with the simple type definition is represented by productions of the form LeftHandSide : CH ~~[schema-invalid value ]~~ [untyped value] RightHandSide described in section ~~8.5.4.4.1 Adding~~ 8.5.4.4 Undeclared Productions ~~when Strict is False~~ .

The content index of grammar Type _i and TypeEmpty _i created from an XML Schema simple type definition is always 0 (zero).

8.5.4.1.3.2 Complex Type Grammars

This section describes the process for creating an EXI type grammar from an XML Schema complex type definition ^XS1.

Given a complex type definition T _i, with properties {name}, ~~{target namespace},~~ {target namespace}, {attribute uses}, {attribute wildcard} and {content type}, create two EXI grammars Type _i and TypeEmpty _i ~~for evaluating instances of types with qname local-name = {name} and qname uri = {target namespace} , as follows .~~ following the procedure described below.

Generate a grammar Attribute _i, for each attribute use A _i in {attribute uses} according to section 8.5.4.1.4 Attribute Uses .

Sort the attribute use grammars first by qname local-name, then by qname uri to form a sequence of grammars G ₀, G ₁, …, G _{n-1

n−1}, where n is the number of attribute uses in {attribute uses}.

~~Generate~~ If an {attribute wildcard} is specified, increment n and generate an additional attribute use grammar G _n

n−1 as follows:

	G _{n, 0 n−1, 0} :
		EE

~~If an~~ When the {attribute ~~wildcard}~~ wildcard}'s {namespace constraint} is ~~specified with the value~~ any , or a pair of not and either a namespace name or the special value absent indicating no namespace, add the following production to each grammar G _i generated ~~above:~~ above, where 0 ≤ i < n :

	G _i, 0 :
		~~AT()~~ AT () G _i, 0

~~If an {attribute wildcard}~~ Otherwise, that is, when {namespace constraint} is ~~specified with~~ a set of values whose members are namespace names or the special value absent indicating no namespace, add the following production to each grammar G _i generated ~~above:~~ above where 0 ≤ i < n :

G _i, 0 :

AT( uri _x : *) G _i, 0

~~for each~~ where uri _x is a member value of {namespace constraint}, provided that it is the empty string (i.e. "") that is used as uri _i

x when the member value is the special value absent . Each uri _x is used to augment the uri partition of the String table. Section 7.3.1 String Table Partitions describes how these uri strings are put into String table for pre-population.

Note:

When xsi:type and/or xsi:nil attributes appear in ~~{attribute wildcard}.~~ an element where schema-informed grammars are in effect, they MUST occur before any other AT events of the same element, with xsi:type placed before xsi:nil when they both occur.

Semantics:

~~Note that~~ In complex type grammars, all productions of the form LeftHandSide ~~: AT()~~* : AT () RightHandSide* and LeftHandSide ~~: AT (~~ : AT( uri _x x : ) RightHandSide* that stem from attribute wildcards are ~~only~~ evaluated as follows:

Let qname be the qname of the attribute matched if by AT (*) or AT( uri _x : *)
If qname is xsi:type, let target-type be the value of the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName ). If there is no namespace in scope for the specified qname prefix, set the uri of target-type to empty ("") and the localName to the full lexical value of the QName, including the prefix. Encode target-type according to section 7. Representing Event Content . If a ~~more specific match~~ grammar can be found for the ~~current~~ target-type type using the encoded target-type representation, evaluate the element contents using the grammar for target-type type instead of RightHandSide .
If qname is xsi:nil, let nil be the value of the xsi:nil attribute. If nil is a valid Boolean value, assign it the Boolean datatype representation (see 7.1.2 Boolean ) and encode it according to section 7. Representing Event Content . If nil is not a valid Boolean value, represent the xsi:nil attribute event ~~does~~ using the AT (*) [untyped value] terminal (see 8.5.4.4 Undeclared Productions ). If the value of nil is true, evaluate the element contents using the grammar for the TypeEmpty _i type defined below rather than RightHandSide .
If a global attribute definition exists for qname , let global-type be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated with global-type , it SHOULD be represented using the datatype representation associated with global-type (see 7. Representing Event Content ). If the attribute value is not ~~exist in~~ represented using the ~~grammar.~~ datatype representation associated with global-type , represent the attribute event using the AT (*) [untyped value] terminal (see 8.5.4.4 Undeclared Productions ).

The grammar TypeEmpty _i is created by combining the sequence of attribute use grammars terminated by an empty {content type} grammar as follows:

TypeEmpty _i = G ₀ ⊕ G ₁ ⊕ … ⊕ G _n

n−1 ⊕ Content _i

where the grammar Content _i is created as follows:

	Content _i, 0 :
		EE

The content index of grammar TypeEmpty _i is the index of its last non-terminal symbol.

The grammar Type _i is generated as follows.

If {content type} is a simple type definition T _j , generate a grammar Content _i as Type _j according to section 8.5.4.1.3.1 ~~SimpleType~~ Simple Type Grammars . If {content type} has a content model particle, generate a grammar Content _i according to section 8.5.4.1.5 Particles . Otherwise, if {content type} is empty , create a grammar Content _i as follows:

	Content _i :
		EE

If {content type} is a content model particle with mixed content, add a production for each non-terminal Content _i , j in Content _i as follows:

	Content _i, j :
		CH [untyped value] Content _i, j

Note:
	The value of each Characters event that has an [untyped value] is represented as a String (see 7.1.10 String ).

Then, create a copy H _i of each attribute use grammar G _i and create the grammar Type _i by combining this sequence of attribute use grammars and the Content _i grammar using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:

Type _i = H ₀ ⊕ H ₁ ⊕ … ⊕ H _n

n−1 ⊕ Content _i

The content index of grammar Type _i created from an XML Schema complex type definition is the index of the first non-terminal symbol of Content _i within the context of Type _i.

8.5.4.1.3.3 Complex Ur-Type Grammar

XML Schema [XML Schema Structures] defines a complex ur-type ^XS1 called xsd:anyType ^XS1, which is the default type for declared elements when no type is specified in the declaration. The type xsd:anyType can be used as the type of declared elements in schemas, or as the explicit type given to elements by means of xsi:type attribute in schema-informed EXI streams .

When schemas are available to describe the body of an EXI stream, create ~~an ur-type grammar~~ grammars ~~UrType~~ Type _ur-type and TypeEmpty _ur-type that is are used to process the element contents of type xsd:anyType ^XS1 as ~~follows.~~ shown below.

	~~Ur-Type~~ Type _ur-type,₀ :
		~~AT()~~ AT () ~~Ur-Type~~ Type _ur-type,₀
		SE() ~~Ur-Type~~ Type* _ur-type,₁
		EE
		CH ~~Ur-Type~~ Type _ur-type,₁

	~~Ur-Type~~ Type _ur-type,₁ :
		SE() ~~Ur-Type~~ Type* _ur-type,₁
		EE
		CH ~~Ur-Type~~ Type _ur-type,₁

	TypeEmpty _ur-type,₀ :
		AT () TypeEmpty* _ur-type,₀
		EE

	TypeEmpty _ur-type,₁ :
		EE

Semantics:

In a schema-informed grammar, all productions of the form LeftHandSide : AT (*) RightHandSide are evaluated as follows:

Let qname be the qname of the attribute matched by AT (*)
If qname is xsi:type, let target-type be the value of the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName ). If there is no namespace in scope for the specified qname prefix, set the uri of target-type to empty ("") and the localName to the full lexical value of the QName, including the prefix. Encode target-type according to section 7. Representing Event Content . If a grammar can be found for the target-type type using the encoded target-type representation, evaluate the element contents using the grammar for target-type type instead of RightHandSide .
If qname is xsi:nil, let nil be the value of the xsi:nil attribute. If nil is a valid Boolean value, assign it the Boolean datatype representation (see 7.1.2 Boolean ) and encode it according to section 7. Representing Event Content . If nil is not a valid Boolean value, represent the xsi:nil attribute event using the AT (*) [untyped value] terminal (see 8.5.4.4 Undeclared Productions ). If the value of nil is true, evaluate the element contents using the grammar for the TypeEmpty _ur-type type defined above rather than RightHandSide .
If a global attribute definition exists for qname , let global-type be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated with global-type , it SHOULD be represented using the datatype representation associated with global-type (see 7. Representing Event Content ). If the attribute value is not represented using the datatype representation associated with global-type , represent the attribute event using the AT (*) [untyped value] terminal (see 8.5.4.4 Undeclared Productions ).

In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by SE (*)
If a global element grammar does not exist for element qname , create one according to section 8.4.3 Built-in Element Grammar .
Evaluate the element content using the global element grammar for element qname .
Evaluate the remainder of event sequence using RightHandSide

The content index of ~~grammar~~ grammars ~~UrType~~ Type ~~is always~~ _ur-type and TypeEmpty _ur-type are both 1 (one).

8.5.4.1.4 Attribute Uses

Given an attribute use A _i with properties {required} and ~~{attribute declaration},~~ {attribute declaration}, where ~~{attribute declaration}~~ {attribute declaration} has properties {name}, ~~{target namespace}~~ {target namespace}, {type definition} and {scope}, generate a new EXI grammar Attribute _i

i for evaluating attributes in the specified {scope} with qname ~~localName~~ local-name = {name} and qname uri = ~~{target namespace}.~~ {target namespace} where qname is the qname of the attributes. Add the following grammar productions to Attribute _i:

	Attribute _i, 0 :
		AT( qname ) [schema-typed value] Attribute _i, 1

	Attribute _i, 1 :
		EE

If the {required} property of A _i is false, add the following grammar production to indicate this attribute occurrence may be omitted from the content model.

	Attribute _i, 0 :
		EE

Note:

Productions of the form LeftHandSide : AT( qname ) [schema-typed value] RightHandSide represent typed attributes that occur in schema-valid contexts with values that can be represented using the EXI datatype representation associated with the attribute's {type definition} (see 7. Representing Event Content ). Attributes that occur in schema-valid contexts that can be represented using the EXI datatype representation associated with the attribute's {type definition}, SHOULD be represented this way. Attributes that are not represented this way, are represented using the alternate forms of AT events described in section 8.5.4.4 Undeclared Productions .

8.5.4.1.5 Particles

Given an XML Schema particle ^XS1 P _i with ~~{min occurs}, {max occurs}~~ {min occurs}, {max occurs} and {term} properties, generate a grammar Particle _i for evaluating instances of P _i as follows.

If {term} is an element declaration, generate the grammar Term ₀ according to section 8.5.4.1.6 Element Terms . If {term} is a wildcard, generate the grammar Term ₀ according to section 8.5.4.1.7 Wildcard Terms Wildcard Terms. If {term} is a model group, generate the grammar Term ₀ according to section 8.5.4.1.8 Model Group Terms .

Create ~~{min occurs}~~ {min occurs} copies of Term ₀.

G ₀, G ₁, …, G _{{min
occurs}-1

{min occurs}-1}

If ~~{max occurs}~~ {max occurs} is not unbounded, create ~~{max occurs}~~ {max occurs} – ~~{min occurs}~~ {min occurs} additional copies of Term ₀,

G _{{min
occurs}

{min occurs}}, G _{{min
occurs}+1

{min occurs}+1}, …, G _{{max
occurs}-1

{max occurs}-1}

Add the following productions to each of the grammars that do not already have a production of this form.

	G _i, 0 :
		EE where ~~{min occurs} ≤~~ {min occurs} ≤ i ~~< {max occurs}~~ < {max occurs}

indicating these instances of Term ₀ may be omitted from the content model. Then, create the grammar for Particle _i using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:

Particle _i = G ₀ ⊕ G ₁ ⊕ … ⊕ G _{{max
occurs}-1

{max occurs}-1}

Otherwise, if ~~{max occurs}~~ {max occurs} is unbounded, generate one additional copy of Term ₀, G _{{min
occurs}

{min occurs}} and replace all productions of the form:

	G _{{min occurs}, k {min occurs}, k} :
		EE

with productions of the form:

	G _{{min occurs}, k {min occurs}, k} :
		G _{{min occurs}, 0 {min occurs}, 0}

indicating this term may be repeated indefinitely. Then if there is no production of the form:

	G _{{min occurs}, 0 {min occurs}, 0} :
		EE

add one after the other productions with the non-terminal G _{{min
occurs}, 0

{min occurs}, 0} on the ~~left hand~~ left-hand side, indicating this term may be omitted from the content model. Then, create the grammar for Particle _i using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:

Particle _i = G ₀ ⊕ G ₁ ⊕ … ⊕ G _{{min
occurs}

{min occurs}}

8.5.4.1.6 Element Terms

Given a particle {term} PT _i that is an XML Schema element declaration ^XS1 with properties {name} and ~~{target namespace},~~ {target namespace}, let S be the set of element declarations that directly or indirectly reaches the element declaration PT _i through the chain of ~~{substitution group affiliation}~~ {substitution group affiliation} property of the elements, plus PT _i itself if was not in the set. Sort the element declarations in S lexicographically first by {name} then by ~~{target namespace},~~ {target namespace}, which makes a sorted list of element declarations E ₀ , E ₁ , … E _{n-1

n−1} where n is the cardinality of S . Then create the grammar ParticleTerm _i

i with the following grammar productions:

Syntax:

	ParticleTerm _i, 0 :
		SE( qname ₀ ) ParticleTerm _i, 1
		SE( qname ₁ ) ParticleTerm _i, 1
		⋮
		SE( qname _{n-1 n−1} ) ParticleTerm _i, 1

	ParticleTerm _i, 1 :
		EE

Note:

	In the productions above, qname _x (where 0 ≤ ≤ x < n) represents a qname of which ~~localname~~ local-name and uri are {name} property and ~~{target namespace}~~ {target namespace} property of the element declaration E _x , respectively.

Semantics:

	In a schema-informed grammar, all productions of the form LeftHandSide : SE( qname ) RightHandSide are evaluated as follows: Evaluate the element contents using the SE( qname ) grammar. Evaluate the remainder of the event sequence using RightHandSide

8.5.4.1.7 Wildcard Terms

Given a particle {term} PT _i

i that is an XML Schema wildcard ^XS1 with property ~~{namespace constraint},~~ {namespace constraint}, a grammar that reflects the wildcard definition is created as follows.

Create a grammar ParticleTerm _i containing the following grammar production:

	ParticleTerm _i, 1 :
		EE

When the wildcard's ~~{namespace constraint}~~ {namespace constraint} is ~~either~~ any , or a pair of not and either a namespace name or the special value ~~other ,~~ absent indicating no namespace, add the following production to ParticleTerm _i.

	ParticleTerm _i, 0 :
		SE() ParticleTerm* _i, 1

~~Otherwise (i.e. {namespace constraint} being~~ Otherwise, that is, when {namespace constraint} is a set of values whose members are namespace ~~names), for each member value uri x in {namespace constraint} where 0 ≤ x < n , and n is the number of members, augment the uri partition of~~ names or the ~~String table with~~ special value ~~uri x~~ absent ~~(see section 7.3.1 String Table Partitions for String table pre-population), and~~ indicating no namespace, add the following production to ParticleTerm _i
.:

	ParticleTerm _i, 0 :
		SE( uri _x : ) ParticleTerm* _i, 1

~~Note that productions of which right hand side start with terminal SE(*) or SE(~~ for each member value uri _x

x in {namespace constraint}, provided that it is the empty string (i.e. "") that is used as uri ~~: *) are only matched if a more specific match for~~ _x when the ~~current event does not exist in~~ member value is the ~~grammar. Terminals SE(~~ special value absent . Each uri _x ~~: *) are matched before it falls back~~ is used to ~~SE(*) among~~ augment the ~~productions~~ uri partition of ~~same~~ the String table. Section 7.3.1 String Table Partitions describes how these ~~LeftHandSide , if any.~~ uri strings are put into String table for pre-population.

Semantics:

In a schema-informed grammar, all productions of the form LeftHandSide : Terminal RightHandSide where Terminal is one of ~~SE(*)~~ SE (*) or ~~SE(~~ SE ( uri _x : *) are evaluated as follows:

Let qname be the qname of the element matched by ~~SE (*)~~ SE (*) or SE( uri _x : *)
If a global element grammar does not exist for ~~Element~~ element qname , create one ~~based on the~~ according to section 8.4.3 Built-in Element Grammar .
Evaluate the element ~~contents~~ content using the ~~SE(~~ global element grammar for element qname ~~) grammar.~~ .
Evaluate the remainder of the event sequence using RightHandSide

8.5.4.1.8 Model Group Terms

8.5.4.1.8.1 Sequence Model Groups

Given a particle {term} PT _i that is a model group with {compositor} equal to "sequence" and a list of n {particles} P ₀, P ₁, …, P _{n-1

n−1}, create a grammar ParticleTerm _i

i as follows:

If the value of n is 0, add the following productions to the grammar ParticleTerm _i.

	ParticleTerm _i, 0 :
		EE

Otherwise, generate a sequence of grammars Particle ₀, Particle ₁, …, Particle _n-1

n−1 corresponding to the list of particles P ₀, P ₁, …, P _n-1

n−1 according to section 8.5.4.1.5 Particles . Then combine the sequence of grammars using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:

ParticleTerm _i

i = Particle ₀ ⊕ Particle ₁ ⊕ … ⊕ Particle _n-1

n−1

8.5.4.1.8.2 Choice Model Groups

Given a particle {term} PT _i that is a model group with {compositor} equal to "choice" and a list of n {particles} P ₀, P ₁, …, P _{n-1

n−1}, create a grammar ParticleTerm _i

i as follows:

If the value of n is 0, add the following productions to the grammar ParticleTerm _i.

	ParticleTerm _i, 0 :
		EE

Otherwise, generate a sequence of grammar productions Particle ₀, Particle ₁, …, Particle _n-1

n−1 corresponding to the list of particles P ₀, P ₁, …, P _n-1

n−1 according to section 8.5.4.1.5 Particles . Then create the grammar ParticleTerm _i with the following grammar productions:

	ParticleTerm _i, 0 :
		Particle _0, 0

		Particle _1, 0
		⋮
		Particle _{n-1, 0 n−1, 0}

indicating the grammar for the term may accept any one of the given {particles}.

8.5.4.1.8.3 All Model Groups

Given a particle {term} PT _i that is a model group with {compositor} equal to "all" and a list of n ~~{particles}~~ { particles } P ₀, P ₁, ..., P _{n-1

n−1}, create a grammar ParticleTerm _i

i as follows:

~~Generate a set of grammars~~ Add the following production to the grammar S ParticleTerm _0

i.

	ParticleTerm _i, 0 :
		EE

If the value of n ~~= {~~ is not 0, generate a sequence of grammar productions Particle ₀, Particle ₁, ~~...,~~ …, Particle _{n-1

n−1} } corresponding to the list of particles P ₀, P ₁, ~~...,~~ …, P _n-1

n−1 according to section 8.5.4.1.5 Particles . ~~Then, generate the grammar ParticleTerm i from the set S 0 by applying~~

Replace all productions of the ~~following rules.~~ form:

~~Given a newly created grammar G and a set of m grammars, S = { G~~ ~~0 , G~~

~~1 , …,~~	G Particle _{m-1 j , k} ~~}, the productions of G are derived from S by the following steps.~~ :
~~If the value of m is 0, add the following~~		EE

with productions ~~to the grammar G , which completes~~ of the ~~grammar G .~~ form:

	G Particle _j , k :
		~~EE Otherwise, if~~ ~~m > 0, generate a sequence of grammars: Part j = C j ⊕ All ( S – { G~~ ParticleTerm _j i, 0 })

where 0 ≤ ≤ j < m n , ~~C j is a copy of G j~~ and 0 ≤ ~~All~~ k ( < S m ~~– {~~ with G m ~~j }) is~~ denoting the ~~All grammar for~~ number non-terminals in the ~~set ( S – {~~ grammar G Particle _{j
})
created
by
applying
this
sequence
of
rules
recursively
starting
step
1.
Then
add}.

Add the following productions to the grammar ~~G :~~ ParticleTerm _i.

	G ParticleTerm _i, 0 :
		~~Part~~ Particle _0, 0

		~~Part~~ Particle _1, 0
		⋮
		~~Part~~ Particle _{m-1, 0 n−1, 0}

Note:

~~Although the algorithm used for constructing all model group grammars presented~~ The grammar above ~~is well suited for clearly describing~~ can accept any sequence of the ~~grammar system~~ given {particles} in ~~an implementation-independent manner,~~ any order. This grammar is intentionally simple and succinct, enabling high-performance, low-footprint implementations ~~may choose~~ on a more efficient method for grammar construction because the number of grammars the algorithm would generate tends to grow exponentially relative to the number of particles. An example alternative grammar generation strategy that may lend better to implementation would be to use an array of boolean values each of which corresponds one of the particles to keep track wide range of ~~the particle occurrences~~ devices, including those with very limited memory resources. More elaborate and ~~construct the~~ precise grammars ~~at run-time based on those particles that have yet to occur at~~ for the ~~time of evaluation. This sort of simplified special handling of all model groups might be appealling given that all model~~ "all" group ~~instances~~ are ~~properly disconnected from another model group instances in that all model group can only appear as the immediate content model of a complex type and~~ possible; however, the ~~particles that they contain have always element terms,~~ associated improvement in precision is not ~~model group terms. Implementations are free to choose or invent any method or any combination of methods~~ sufficient to ~~construct grammars for all model groups.~~ justify their code-footprint and memory resource requirements.

8.5.4.2 EXI Normalized Grammars

This section describes the process for converting an EXI proto-grammar ~~derived~~ generated from an XML Schema in accordance with section 8.5.4.1 EXI Proto-Grammars into an EXI normalized grammar. Each production in an EXI normalized grammar has exactly one non-terminal symbol on the ~~left hand~~ left-hand side and one terminal symbol on the ~~right hand~~ right-hand side followed by at most one non-terminal symbol on the ~~right hand~~ right-hand side. In addition, EXI normalized grammars contain no two grammar productions with the same non-terminal on the ~~left~~ left-hand side and the same terminal symbol on the ~~right-hand-side.~~ right-hand side. This is a restricted form of Greibach normal form [Greibach Normal Form] .

EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:

	LeftHandSide :
		~~RightHandSize~~ RightHandSide

where LeftHandSide and RightHandSide are both non-terminals. Therefore, the first step of the normalization process focuses on replacing productions in this form with productions that conform to the EXI normalized grammar rules. This process can produce a grammar that has more than one production with the same non-terminal on the ~~left hand~~ left-hand side and the same terminal symbol on the ~~right hand~~ right-hand side. Therefore, the second step focuses on eliminating such productions.

The first step of the normalization process is described in Section 8.5.4.2.1 Eliminating Productions with no Terminal Symbol . The second step is described in section 8.5.4.2.2 Eliminating Duplicate Terminal Symbols . Once these two steps are completed, the grammar will be an EXI normalized grammar.

8.5.4.2.1 Eliminating Productions with no Terminal Symbol

Given an EXI proto-grammar G _i, with non-terminals G _i, 0, G _i, 1, …, G _{i, n-1

i, n−1}, replace each production of the form:

	G _i, j :
		G _i, k where ~~0 ≤~~ 0 ≤ j < n and ~~0 ≤~~ 0 ≤ k < n

with a set of productions:

	G _i, j :
		RHS ( G _i, k ) ₀
		RHS ( G _i, k ) ₁
		⋮
		RHS ( G _i, k ) _m-1

where RHS ( G _i, k ) ₀, RHS ( G _i, k ) ₁, …, RHS ( G _i, k ) _m-1 represents the ~~right hand~~ right-hand side of each production in G _i that has the non-terminal G _{j, k

i, k} on the ~~left hand~~ left-hand side and m is the number of such productions.

Remove such productions if any among G _i, j : RHS ( G _i, k ) _h where 0 ≤ h < m of which the right-hand side either is identical to the left-hand side, or has previously been replaced while applying the process described in this section to productions with G _i, j on the left-hand side.

Repeat this process until there are no more ~~production~~ productions of the form:

	G _i, j :
		G _i, k where ~~0 ≤~~ 0 ≤ j < n and ~~0 ≤~~ 0 ≤ k < n

in the grammar G _i.

8.5.4.2.2 Eliminating Duplicate Terminal Symbols

Given an EXI proto-grammar G _i, with non-terminals G _i, 0, G _i, 1, …, G _{i, n-1

i, n−1}, identify all pairs of productions that have the same non-terminal on the ~~left hand~~ left-hand side and the same terminal symbol on the ~~right hand~~ right-hand side of the form:

	G _i, j :
		Terminal G _i, k
		Terminal G _i, l

where k ≠ l and Terminal represents a particular terminal symbol and replace them with a single production:

	G _i, j :
		Terminal G _{i, k ⊔ l}

where G _{i, k ⊔ l} is a distinct non-terminal that accepts the inputs accepted by G _i, k and the inputs accepted by G _i, l. Here the notation " k ⊔ l " denotes a union set of integers and is used to uniquely identify the index of such a non-terminal.

When G _i is a type grammar, if both k and l are smaller than content index of G _i, k ⊔ l is also considered to be smaller than content for the purpose of index comparison purposes. Otherwise, if either k or l is not smaller than content , k ⊔ l is considered to be larger than content .

If the non-terminal G _{i, k ⊔ l} does not exist, create it as follows:

	G _{i, k ⊔ l} :
		RHS ( G _i, k ) ₀
		RHS ( G _i, k ) ₁
		⋮
		RHS ( G _i, k ) _m-1


		RHS ( G _i, l ) ₀
		RHS ( G _i, l ) ₁
		⋮
		RHS ( G _i, l ) _n-1 n−1

where RHS ( G _i, k ) ₀, RHS ( G _i, k ) ₁, …, RHS ( G _i, k ) _m-1 and RHS ( G _i, l ) ₀, RHS ( G _i, l ) ₁, …, RHS ( G _i, l ) _n-1

n−1 represent the ~~right hand~~ right-hand side of each production in the Grammar G _i that has the non-terminals G _j, k and G _j, l on the ~~left hand~~ left-hand side respectively and m and n are the number of such productions.

Repeat this process until there are no more productions in the grammar G _i of the form:

	G _i, j :
		Terminal G _i, k
		Terminal G _i, l

Then, identify any identical productions of the following form:

	G _i, j :
		Terminal G _i, k
		Terminal G _i, k

where 0 ≤ k < n , n is the number of productions in G _i and Terminal represents a specific terminal symbol, then remove one of them until there are no more productions remaining in the grammar G _i of this form.

8.5.4.3 Event Code Assignment

This section describes the process for assigning unique event codes to each production in a normalized EXI grammar. Given a normalized EXI grammar G _i, apply the following process to each unique non-terminal G _i, j that occurs on the ~~left hand~~ left-hand side of the productions in G _i where ~~0 ≤~~ 0 ≤ j < n and n is the number of such non-terminals in G _i.

Sort all productions with G _i, j on the ~~left hand~~ left-hand side in the following order:

~~All~~ all productions with AT( qname ) on the ~~right hand~~ right-hand side sorted ~~lexically~~ lexicographically by qname ~~localName,~~ local-name, then by qname uri, followed by
all productions with AT( uri _x : *) on the right-hand side sorted lexicographically by uri , followed by
any production with ~~AT(*)~~ AT (*) on the ~~right hand~~ right-hand side, followed by
all productions with SE( qname ) on the ~~right hand~~ right-hand side sorted in schema order, followed by
all productions with SE( uri _x : *) on the right-hand side sorted in schema order, followed by
any production with SE(*) on the right-hand side, followed by
any production with EE on the ~~right hand~~ right-hand side, followed by
any production with CH on the ~~right hand~~ right-hand side.

In step 4 and step 5, the schema order of productions with SE( qname ) and SE( uri _x : *) on the right-hand side is determined by the order of the corresponding particles in the schema after any references to named model groups in the schema are expanded in place with the group definitions themselves. A content model of a complex type can be seen as a tree that consists of particles where particles of either element declaration terms or wildcard terms appear as leaves, and the order is assigned to those leaf particles by traversing the tree by depth-first method.

Given the sorted list of productions P ₀, P ₁, … P _n with the non-terminal G _i, j on the ~~left hand~~ left-hand side, assign event codes to each of the productions as follows:

Productions		Event Code

	P ₀	0
	P ₁	1
	⋮	⋮
	P _n-1 n−1	n -1 −1

8.5.4.4 Undeclared Productions

The normalized element and type grammars ~~derived~~ generated from a schema describe the sequences of child elements, attributes and character events that may occur in a particular EXI stream. However, there are additional events that may occur in an EXI stream that are not described by the schema, for example events representing comments, processing-instructions, schema deviations, etc.

This section first describes the process for, in cases with strict option value set to false, augmenting the normalized element and type grammars with productions that describe events that may occur in the EXI stream, but are not explicitly declared in the schema. It then describes the way, in cases with strict option value set to true, normalized element and type grammars are supplemented with productions to be prepared for the occurrences of xsi:type and xsi:nil attributes that are permitted by the schema.

In the normalized ~~element and type~~ grammars, terminal symbols AT and CH represent ~~attributes or~~ attribute and character events that ~~have schema-valid values per~~ can be represented by the EXI datatype representations associated ~~datatypes.~~ with their schema datatypes (see 7. Representing Event Content ). When the strict option ~~value~~ is ~~set to~~ false, ~~in order to efficiently permit schema-invalid values for these event types, terminal symbols~~ additional untyped AT and CH ~~predicated as schema-invalid~~ terminal symbols are ~~introduced to convey~~ added that ~~their values are schema-invalid.~~ can be used for representing attributes and character events that cannot be represented by the associated EXI datatype representations (e.g., schema-invalid values). The following table shows the notation used for such AT and CH terminals along with their definitions.

~~Notation~~ Notation	Definition
AT ( qname ~~) [schema-invalid]~~ ) [untyped value]	Terminal symbol that matches an attribute ~~(AT)~~ event with qname qname and ~~a schema-invalid~~ an untyped value.
~~AT () [schema-invalid]~~ AT () [untyped value]	Terminal symbol that matches an attribute ~~(AT)~~ event with any qname and ~~a schema-invalid~~ an untyped value.
CH ~~[schema-invalid]~~ [untyped value]	Terminal symbol that matches an a characters ~~(CH)~~ event with ~~a schema-invalid~~ an untyped value.

8.5.4.4.1 Adding Productions when Strict is False

This section describes the process for augmenting the normalized grammars when the value of the strict option is false. For each normalized element grammar Element _i, create a copy Element _i, content2 of Element _i, content where the index "content" is the content of the type of the element from which Element _i was created. Then, apply the following procedures.

Add the following production to each non-terminal Element _i, j that does not already include a production of the form Element _i, j : EE, such that 0 ≤ j ≤ content.

		Event Code

Element _i, j :
	EE	n . m

where n . m represents the next available event code with length 2.

Let E _i be the element declaration from which Element _i was created and T _k be the ~~{type definition}~~ {type definition} of E _i. Let Type _k and TypeEmpty _k be the type grammars created from T _k (see section 8.5.4.1.3 Type Grammars ). Add the following productions to Element _i.

		Event Code

Element _i, 0 :
	AT(xsi:type) Element _i, 0	n . m
	AT(xsi:nil) Element _i, 0	n .( m +1)

where n . m represents the next available event code with length 2.

Note:

	When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other attribute events of the same element, with xsi:type placed before xsi:nil when they both occur.

Semantics:

	When using schemas, all productions of the form LeftHandSide : AT (xsi:type) RightHandSide are evaluated as follows: ~~The~~ Let target-type be the value of ~~each AT (xsi:type) event is represented as a~~ the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName ). If there is no namespace in scope for the specified qname prefix, set the ~~QName~~ uri ~~is set~~ of target-type to empty ("") and the ~~QName~~ localName ~~is set~~ to the full lexical value of the QName, including the prefix. ~~The value of each AT (xsi:nil) event is represented as a Boolean (see 7.1.2 Boolean ) unless the given value is not a valid Boolean. If~~ Represent the value ~~is not a schema-valid Boolean, the AT (xsi:nil) event is represented by the AT() [schema-invalid value] terminal (see below). Semantics: When using schemas, all productions~~ of ~~the form~~ ~~LeftHandSide~~ target-type* ~~: AT (xsi:type) are evaluated as follows:~~ according to section 7. Representing Event Content ~~Let qname be the value of the xsi:type attribute~~ . If a grammar ~~exists~~ can be found for the ~~qname~~ target-type ~~type,~~ type using the encoded target-type representation, evaluate the element contents using the grammar for ~~the~~ ~~qname~~ target-type type instead of ~~the declared type for the current element~~ RightHandSide .
	When using schemas, productions of the form LeftHandSide : ~~AT (xsi:nil)~~ AT (xsi:nil) RightHandSide are evaluated as follows: Let nil be the value of the xsi:nil attribute. If nil is a valid Boolean, assign it the Boolean datatype representation (see 7.1.2 Boolean ) and encode it according to section 7. Representing Event Content . If nil is not a valid Boolean, represent the xsi:nil attribute event using the AT () [untyped value] terminal (see 8.5.4.4 Undeclared Productions* ). If the value of nil is true, evaluate the element contents using the grammar ~~for the~~ TypeEmpty _k, 0 k ~~type instead of the declared type for the current element~~ defined above rather than RightHandSide .

~~Add the following productions to~~ For each non-terminal Element _i, j, such that 0 ≤ j ≤ ~~content .~~ content , with zero or more productions of the following form:

	Element _i, j :
		AT ( qname ₀ ) [schema-typed value] NonTerminal ₀
		AT ( qname ₁ ) [schema-typed value] NonTerminal ₁
		⋮
		AT ( qname _x -1 ) [schema-typed value] NonTerminal _x-1

where x represents the number of attributes declared in the schema for this context, add the following productions:

		Event Code

Element _i, j :
	~~AT ()~~ AT () Element _i, j	n . m
	~~AT (~~ AT ( qname ₀ ) ~~[schema-invalid value]~~ [untyped value] ~~Element~~ NonTerminal _i, j 0	n .( m +1).0
	~~AT (~~ AT ( qname ₁ ) ~~[schema-invalid value]~~ [untyped value] ~~Element~~ NonTerminal _i, j 1	n .( m +1).1
	⋮	⋮
	~~AT (~~ AT ( qname _x -1 ) ~~[schema-invalid value]~~ [untyped value] ~~Element~~ NonTerminal _i, j x-1	n .( m +1).( x -1)
	~~AT () [schema-invalid value]~~ AT () [untyped value] Element _i, j	n .( m +1).( x )

where n . m represents the next available event code with length ~~2 and x represents the number of attributes declared in the schema for this context.~~ 2.

Note:

The value of each AT attribute event that has ~~a [schema-invalid value]~~ an [untyped value] is represented as a String (see 7.1.10 String ).

Like an element, an attribute may occur in a schema-invalid context, have a ~~schema-invalid~~ untyped (e.g., schema-invalid) value or both. However, unlike an element whose ~~occurance~~ occurrence and value are represented by separate SE and CH events, the ~~occurance~~ occurrence and value of an attribute are represented by a single AT event. Consequently, four kinds of AT ~~terminals~~ terminal symbols are needed ~~to represent~~ for the four possible ~~validity states for~~ representations of an ~~attribute.~~ attribute event in schema-informed grammars. The table below shows ~~the AT terminals that represent each of~~ these four ~~validity states~~ kinds of AT terminal symbols along with the equivalent combinations of SE and CH ~~events~~ terminal symbols for representing elements.

Table 8-1. ~~Events representing schema-valid and schema-invalid attributes~~ Equivalent terminal symbols for different attribute and ~~elements~~ element representations

~~Schema-valid~~ Schema-typed value

~~Schema-invalid~~ Untyped value

Schema-valid ~~occurance~~ occurrence

~~AT(~~ AT ( qname ) [schema-typed value]

~~SE(~~ SE ( qname ) CH [schema-typed value]

~~AT(~~ AT ( qname ) ~~[schema-invalid value]~~ [untyped value]

~~SE(~~ SE ( qname ) CH ~~[schema-invalid value ]~~ [untyped value]

Schema-invalid ~~occurance~~ occurrence

~~AT(*)~~ AT (*)

~~SE(*)~~ SE (*) CH [schema-typed value]

~~AT(*) [schema-invalid value]~~ AT (*) [untyped value]

~~SE(*)~~ SE (*) CH ~~[schema-invalid value ]~~ [untyped value]

When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other attribute events of the same element, with xsi:type placed before xsi:nil when they both occur.

Semantics:

~~When using schemas,~~	In a schema-informed grammar, all productions of the form LeftHandSide : ~~AT ()~~ AT () are evaluated as follows: Let qname be the qname of the attribute matched by ~~AT()~~ AT () If qname is xsi:type, let target-type be the value of the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName ). If there is no namespace in scope for the specified qname prefix, set the uri of target-type to empty ("") and the localName to the full lexical value of the QName, including the prefix. Encode target-type according to section 7. Representing Event Content . If a grammar can be found for the target-type type using the encoded target-type representation, evaluate the element contents using the grammar for target-type type instead of RightHandSide . If qname is xsi:nil, let nil be the value of the xsi:nil attribute. If nil is a valid Boolean value, assign it the Boolean datatype representation (see 7.1.2 Boolean ) and encode it according to section 7. Representing Event Content . If nil is not a valid Boolean value, represent the xsi:nil attribute event using the AT () [untyped value] terminal (see 8.5.4.4 Undeclared Productions* ). If the value of nil is true, evaluate the element contents using the grammar for the TypeEmpty _k type defined above rather than RightHandSide . If a global attribute definition exists for qname , ~~represent~~ let global-type be the ~~value~~ datatype of the global attribute. If the attribute ~~according to its~~ value can be represented using the datatype representation associated with global-type , it SHOULD be represented using the datatype representation associated with global-type (see 7. Representing Event Content ). ~~Otherwise, represent~~ If the attribute value of is not represented using the datatype representation associated with global-type , represent the attribute ~~as a String~~ event using the AT () [untyped value] terminal (see ~~7.1.10 String~~ 8.5.4.4 Undeclared Productions* ).

Add the following production to Element _i.

		Event Code

Element _i, 0 :
	NS Element _i, 0	n . m

where n . m represents the next available event code with length 2.

When the value of the selfContained option is true, add the following production to Element _i.

		Event Code

Element _i, 0 :
	SC Fragment	n . m

where n . m represents the next available event code with length 2.

Semantics:

All productions of the form LeftHandSide : SC Fragment are evaluated as follows:

Save the string table, ~~grammars, namespace prefixes~~ grammars and any implementation-specific state learned while processing this EXI Body.
Initialize the string table, ~~grammars, namespace prefixes~~ grammars and any implementation-specific state learned while processing this EXI Body to the state they held just prior to processing this EXI Body.
Skip to the next byte-aligned boundary in the ~~stream.~~ stream if it is not already at such a boundary.
Let qname be the qname of the SE event immediately preceding this SC event.
Let content be the sequence of events following this SC event that match the grammar for element qname , up to and including the terminating EE event.
Evaluate the sequence of events (SD, SE( qname ), content , ED) according to the Fragment grammar. (see 8.5.2 Schema-informed Fragment Grammar )
Skip to the next byte-aligned boundary in the stream if it is not already at such a boundary.
Restore the string table, ~~grammars, namespace prefixes~~ grammars and implementation-specific state learned while processing this EXI Body to that saved in step 1 above.

Add the following productions to each non-terminal Element _i, j, such that 0 ≤ j ≤ content .

		Event Code

Element _i, j :
	~~SE ()~~ SE () Element _i, content2	n . m
	CH ~~[schema-invalid value ]~~ [untyped value] Element _i, content2	n .( m +1)
	ER Element _i, content2	n .( m +2)
	CM Element _i, content2	n .( m +3).0
	PI Element _i, content2	n .( m +3).1

where n . m represents the next available event code with length 2.

Note:

Productions of the form LeftHandSide : CH ~~[schema-invalid value ]~~ [untyped value] RightHandSide match ~~schema-invalid~~ untyped character data ~~that is~~ represented as ~~untyped data~~ a String in the EXI ~~stream. Schema-valid character~~ stream (e.g., schema-invalid values). Character data is represented ~~as typed data in~~ using the ~~EXI stream~~ datatype representation associated with the schema datatype of the character data are matched by productions of the form LeftHandSide : CH ~~[schema-valid value ]~~ [schema-typed value] RightHandSide described in section 8.5.4.1.3.1 ~~SimpleType~~ Simple Type Grammars .

Semantics:

In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:

~~The terminal symbol SE (*) is only~~ Let qname be the qname of the element matched if by SE (*)
If a ~~more specific match for the current event~~ global element grammar does not exist in for element qname , create one according to section 8.4.3 Built-in Element Grammar .
Evaluate the ~~grammar.~~ element content using the global element grammar for element qname .
Evaluate the remainder of the event sequence using RightHandSide

Add the following production to Element _i, content2 and to each non-terminal Element _i, j that does not already include a production of the form Element _i, j : EE, such that content < j < n , where n is the number of non-terminals in Element _i.

		Event Code

Element _i, j :
	EE	n . m

where n . m represents the next available event code with length 2.

Add the following productions to Element _i, content2 and to each non-terminal Element _i, j, such that content < j < n , where n is the number of non-terminals in Element _i.

		Event Code

Element _i, j :
	~~SE ()~~ SE () Element _i, j	n . m
	CH ~~[schema-invalid value ]~~ [untyped value] Element _i, j	n .( m +1)
	ER Element _i, j	n .( m +2)
	CM Element _i, j	n .( m +3).0
	PI Element _i, j	n .( m +3).1

where n . m represents the next available event code with length 2.

Semantics:

In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:

Let qname be the qname of the element matched by SE (*)
If a global element grammar does not exist for element qname , create one according to section 8.4.3 Built-in Element Grammar .
Evaluate the element content using the global element grammar for element qname .
Evaluate the remainder of the event sequence using RightHandSide

Apply the process described above for element grammars to each normalized type grammar Type _i
. and TypeEmpty _i.

8.5.4.4.2 Adding Productions when Strict is True

This section describes the process for augmenting the normalized grammars when the value of the strict option is true. For each normalized element grammar Element _i

i, apply the following procedures.

Let E _i be the element declaration from which Element _i was created and T _k be the ~~{type definition}~~ {type definition} of E _i. If T _k either has named ~~sub-types,~~ sub-types or is a simple type definition of which {variety} is union , add the following production to Element _i.

		Event Code

Element _i, 0 :
	AT(xsi:type) Element _i, 0	n . m

where n . m represents the next available event code with length 2.

~~Note:~~

Semantics:

When using schemas, productions of the form LeftHandSide : AT (xsi:type) RightHandSide are evaluated as follows:

	~~The~~ Let target-type be the value of ~~each AT (xsi:type) event is represented as a~~ the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName ). If there is no namespace in scope for the specified qname prefix, set the ~~QName~~ uri ~~is set~~ of target-type to empty ("") and the ~~QName~~ localName ~~is set~~ to the full lexical value of the QName, including the prefix. ~~Semantics:~~ ~~When using schemas, all productions of~~ Represent the ~~form~~ value of ~~LeftHandSide~~ target-type ~~: AT (xsi:type) are evaluated as follows:~~ according to section 7. Representing Event Content ~~Let qname be the value of the xsi:type attribute~~ . If a grammar ~~exists~~ can be found for the ~~qname~~ target-type ~~type,~~ type using the encoded target-type representation, evaluate the element contents using the grammar for ~~the~~ ~~qname~~ target-type type instead of ~~the declared type for the current element~~ RightHandSide .

Let Type _k and TypeEmpty _k be the type grammars created from T _k (see section 8.5.4.1.3 Type Grammars ). If the {nillable} property of E _i is true, add the following production to Element _i.

		Event Code

Element _i, 0 :
	AT(xsi:nil) Element _i, 0	n . m

where n . m represents the next available event code with length 2.

~~Note: The value of each AT (xsi:nil) event is represented as a Boolean (see 7.1.2 Boolean ).~~

Semantics:

When using schemas, productions of the form LeftHandSide : ~~AT (xsi:nil)~~ AT (xsi:nil) RightHandSide are evaluated as follows:

	Let nil be the value of the xsi:nil ~~attribute~~ attribute. If ~~the value of~~ nil is ~~true, evaluate~~ a valid Boolean, assign it the ~~element contents using~~ Boolean datatype representation (see 7.1.2 Boolean ) and encode it according to section 7. Representing Event Content . Otherwise, the ~~grammar for~~ value of the ~~TypeEmpty~~ nil ~~k, 0 type instead of the declared type for the current element When~~ is schema-invalid and cannot be represented when the value of the ~~selfContained~~ strict option is ~~true, add~~ true. If the ~~following production to~~ value of ~~Element~~ nil ~~i : Syntax Event Code~~ is true, evaluate the element contents using the grammar ~~Element~~ TypeEmpty _i, 0 k ~~: SC Fragment n . m where~~ defined above rather than n RightHandSide . ~~m represents the next available event code with length 2.~~

~~Semantics:~~ ~~All productions of the form LeftHandSide : SC Fragment are evaluated as follows:~~

Note:

Save the string table, grammars, namespace prefixes and any implementation-specific state learned while processing this EXI Body. Initialize the string table, grammars, namespace prefixes and any implementation-specific state learned while processing this EXI Body to the state they held just prior to processing this EXI Body. Skip to the next byte-aligned boundary in When the ~~stream. Let qname be~~ value of the ~~qname~~ strict option ~~of the SE event immediately preceding this SC event. Let content be the sequence of events following this SC event that match the grammar for element qname , up~~ is true, it is not possible to use xsi:type and ~~including~~ xsi:nil attributes together on the ~~terminating EE event. Evaluate~~ same element unless there is a wildcard attribute declared for the ~~sequence of events (SD, SE( qname ), content , ED) according~~ element. This is due to the ~~Fragment grammar. Restore the string table, grammars, namespace prefixes and implementation-specific state learned while processing this EXI Body to~~ fact that ~~saved~~ xsi:type specifies a target type definition, but xsi:nil is only permitted on nillable elements, not type definitions.
When xsi:type and/or xsi:nil attributes appear in ~~step 1 above.~~ an element where schema-informed grammars are in effect, they MUST occur before any other attribute events of the same element, with xsi:type placed before xsi:nil when they both occur.

9. EXI Compression

The use of EXI compression increases compactness utilizing additional computational resources. EXI compression combines knowledge of XML with a widely adopted, standard compression algorithm to achieve higher compression ratios than would be achievable by applying compression to the entire stream.

EXI compression is applied when compression is turned on or when alignment is set to pre-compression . Byte-aligned representations of event codes and content items are more amenable to compression algorithms compared to unaligned representations because most compression algorithms operate on series of bytes to identify redundancies in the octets. Therefore, when EXI compression is used, event codes and content items of EXI events are encoded as aligned bytes in accordance with 6.2 Representing Event Codes and 7. Representing Event Content .

EXI compression splits a sequence of EXI events into a number of contiguous blocks of events. Events that belong to the same block are transformed into lower entropy groups of similar values called channels , which are individually well suited for standard compression algorithms. To reduce compression overhead, smaller channels are combined before compressing them, while larger channels are compressed independently. The criteria EXI compression uses to define and combine channels is intentionally simple to facilitate implementation, reduce processing overhead, and avoid the need to encode channel ordering or grouping information in the format. The figure below presents a schematic view of the steps involved in EXI compression.

Figure 9-1. EXI Compression Overview

In the following sections, 9.1 Blocks defines blocks and explains how EXI events are partitioned into blocks. Section 9.2 Channels defines channels, their organization as well as how a group of channels correlate to its corresponding block of events. Section 9.3 Compressed Streams describes how some channels are combined as needed in preparation for applying compression algorithms on channels.

9.1 Blocks

EXI compression partitions the sequence of EXI events into a sequence of one or more non-overlapping blocks. Each block preceding the final block contains the minimum set of consecutive events that ~~have~~ result in exactly blockSize ~~Attribute (AT) and Character (CH)~~ values , in its value channels (see 9.2.2 Value Channels ), where blockSize is the block size of the EXI stream (see 5.4 EXI Options ). The final block contains no more than blockSize ~~Attribute (AT) and Character (CH)~~ values . in its value channels.

9.2 Channels

Events inside each block are multiplexed into channels. The first channel of each block is the structure channel described in Section 9.2.1 Structure Channel . The remaining channels in each block are value channels described in Section 9.2.2 Value Channels . The diagram below presents an exemplary view of the transformation in which events within a block are multiplexed into channels in one way and channels are demultiplexed into events in the other way.

Figure 9-2. Multiplexing EXI events into channels

9.2.1 Structure Channel

The structure channel of each block defines the overall order and structure of the events in that block. It contains the event codes and associated content for each event in the block, except for Attribute (AT) and Character (CH) values , , which are stored in the value channels. In addition, there are two kinds of attribute events whose values are stored in the structure channel instead of in value ~~channels, which are xsi:nil and~~ channels. The value of each xsi:type ~~attributes~~ attribute is stored in the structure channel. The value of each xsi:nil attribute that ~~match~~ matches a schema-informed grammar ~~production.~~ production and has a schema-valid value is also stored in the structure channel. These attribute events are intrinsic to the grammar system thus are essential in processing the structure channel because their values affect the grammar to be used for processing the rest of the elements on which they appear. All event codes and content in the structure stream occur in the same order as they occur in the EXI event sequence.

9.2.2 Value Channels

The values of the Attribute (AT) and Character (CH) events in each block are organized into separate channels based on the qname of the associated attribute or element. Specifically, the value of each Attribute (AT) event is placed in the channel identified by the qname of the Attribute and the value of each Character (CH) event is placed in the channel identified by the qname of its parent Start Element (SE) event. Each block contains exactly one channel for each distinct element or attribute qname that occurs in the block. The values in each channel occur in the order they occur in the EXI event sequence.

9.3 Compressed Streams

The channels in a block are further organized into compressed streams. Smaller channels are combined into the same compressed stream, while others are each compressed separately. Below are the rules applied within the scope of a block used to determine the channels to be combined together, the order of the compressed streams and the order amongst the channels that are combined into the same compressed stream.

If the block contains at most 100 values , the block will contain only 1 compressed stream containing the structure channel followed by all of the value channels. The order of the value channels within the compressed stream is defined by the order in which the first value in each channel occurs in the EXI event sequence.

If the block contains more than 100 values , the first compressed stream contains only the structure channel. The second compressed stream contains all value channels that contain ~~no more than~~ at most 100 values . And the remaining compressed streams each contain only one channel, each having more than 100 values . The order of the value channels within the second compressed stream is defined by the order in which the first value in each channel occurs in the EXI event sequence. Similarly, the order of the compressed streams following the second compressed stream in the block is defined by the order in which the first value of the channel inside each compressed stream occurs in the EXI event sequence.

Note:

EXI compression changes the order in which event codes and value s are read and written to and from an EXI stream. ~~Implementations~~ EXI processors must encode and decode value s in this revised order so order sensitive constructs like the string table (see 7.3 String Table ) work properly.

When the value of the compression option is set to true, each compressed stream in a block is stored using the standard DEFLATE Compressed Data Format defined by RFC 1951 [IETF RFC 1951] . Otherwise, when the value of the alignment option is set to pre-compression , each compressed stream in a block is stored directly without the DEFLATE algorithm.

10. Conformance

10.1 EXI Stream Conformance

[Definition:] A ~~Conformant~~ conformant EXI ~~streams~~ stream ~~consist~~ consists of a sequence of octets that follows the syntax of EXI stream that is defined in this document. [Definition:] EXI format provides a way to involve user-defined datatype representations in EXI streams processing, which is an extension point that, when used in conjunction with relevant datatype representations specifications external to this document, leads to the formulation of Extended EXI streams .

Conformance of extended EXI streams ~~are~~ is relative to the syntax defined by the relevant user-defined datatype representations specifications. The definitions of user-defined datatype representations syntax are out of the scope of this document. [Definition:] An extended EXI stream is a conformant extended EXI ~~streams~~ stream if replacing value items represented using user-defined datatype representations with their intrinsic representations would make the stream a conformant EXI ~~streams~~ stream . ~~When the use of user-defined datatype representations is expected, and agreed upon prior to the exchange of~~ An extended EXI ~~streams, the parties intended to participate in the exchange not only need to share the knowledge about the datatype representations, but also MUST advertise the~~ stream described as ~~"EXI streams with regards to datatype representations S " instead of simply as "EXI streams" when they are asked to do so, where S is~~ an ~~unordered set of datatype representations. An~~ "EXI ~~streams~~ stream with regards to datatype representations S " " where S is the set of datatype representations can be processed by an EXI stream decoder only if the processor has the shared knowledge about each one of the datatype representations in the set S . ~~EXI stream decoders MAY fail with an error when they receive an extended EXI Stream that uses an user-defined datatype representations that it does not understand.~~

The structural syntax of EXI streams and extended EXI streams is described by the abstract EXI grammar system defined in this document. Although this document specifies the normative way in which XML Schema schemas are mapped into the EXI grammar system to make a schema-informed ~~grammar,~~ grammars, EXI allows the use of other schema languages to process EXI streams or extended EXI streams so far as there is a well known EXI grammar binding to of the schema language and the binding preserves the semantics ~~part~~ of the EXI grammar system. EXI streams or extended EXI streams generated using schemas of such schema language are still conformant. The definitions of grammar ~~binding to~~ bindings for schema languages other than XML Schema is are out of the scope of this document, and each ~~community of~~ schema ~~languages~~ language community is encouraged to define a its own binding in order to make it possible to harness the ~~most~~ utmost efficiency out of EXI when schemas of ~~that~~ the language are ~~available .~~ available.

10.2 EXI Processor Conformance

The conformance of EXI Processors ~~are~~ is defined separately for each of the two processor roles, EXI stream encoders and EXI stream decoders ; the conformance of the former is described in terms of the conformance of the EXI streams or extended EXI streams that they produce, while that of the latter is based on the set of format features that EXI stream decoders are prepared ~~with~~ for in the processing of conformant EXI streams or conformant extended EXI streams .

An EXI stream encoder is conformant if and only if it is capable of generating conformant EXI streams or conformant extended EXI streams given any input structured data it is made to work on. On the other hand, EXI stream decoders MUST support all format features described in this document as they are explained, except for the capability of handling Datatype Representation Map which is an optional feature. EXI stream decoders that do not implement Datatype Representation Map feature MUST report an error with a meaningful message upon encountering a "datatypeRepresentationMap" element while processing EXI options documents in EXI headers . Both an EXI stream encoder and an EXI stream decoder MAY support only a certain range of values for the EXI header option blockSize . For interoperability between processors, every EXI processors SHOULD at least support the blockSize option value of 1,000,000.

A References

A.1 Normative References

IETF RFC 1951: DEFLATE Compressed Data Format Specification version 1.3 , P. Deutsch, Author. Internet Engineering Task Force, May 1996. Available at http://www.ietf.org/rfc/rfc1951.txt.
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels , S. Bradner, Author. Internet Engineering Task Force, June 1999. Available at http://www.ietf.org/rfc/rfc2119.txt.
IETF RFC 3023: XML Media Types , M. Murata, S. St.Laurent and D. Kohn, Author. Internet Engineering Task Force, January 2001. Available at http://www.ietf.org/rfc/rfc3023.txt.
ISO/IEC 10646: ISO/IEC 10646-1:2000. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane and ISO/IEC 10646-2:2001. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 2: Supplementary Planes , as, from time to time, amended, replaced by a new edition or expanded by the addition of new parts. [Geneva]: International Organization for Standardization. (See http://www.iso.org for the latest version.)
~~Unicode Database~~ XML 1.0: Extensible Markup Language (XML) 1.0 (Fifth Edition) , T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, Editors. World Wide Web Consortium, 10 February 1998, revised 26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The ~~Unicode Consortium. Unicode Character Database (Revision 5.0.0). Available at: http://www.unicode.org/Public/5.0.0/ucd/UCD.html~~ latest version is available at http://www.w3.org/TR/REC-xml .
XML ~~1.0~~ 1.1: Extensible Markup Language (XML) ~~1.0 (Fourth~~ 1.1 (Second Edition) , T. Bray, J. Paoli, C. M. Sperberg-McQueen, ~~and~~ E. Maler, F. Yergeau, and J. Cowan, Editors. World Wide Web Consortium, 10 04 February ~~1998,~~ 2004, revised 16 August 2006. This version is ~~http://www.w3.org/TR/2006/REC-xml-20060816.~~ http://www.w3.org/TR/2006/REC-xml11-20060816. The latest version is available at ~~http://www.w3.org/TR/REC-xml~~ http://www.w3.org/TR/xml11 .
Namespaces in XML: Namespaces in XML 1.0 (Second Edition) , T. Bray, D. Hollander, A. Layman, and R. Tobin, Editors. World Wide Web Consortium, 14 January 1999, revised 16 August 2006. This version is http://www.w3.org/TR/2006/REC-xml-names-20060816 . The latest version is available at http://www.w3.org/TR/xml-names .
XML Information Set: XML Information Set (Second Edition) , J. Cowan and R. Tobin, Editors. World Wide Web Consortium, 24 October 2001, revised 4 February 2004. This version is http://www.w3.org/TR/2004/REC-xml-infoset-20040204. The latest version is available at http://www.w3.org/TR/xml-infoset .
XML Schema Structures: XML Schema Part 1: Structures Second Edition , H. Thompson, D. Beech, M. Maloney, and N. Mendelsohn, Editors. World Wide Web Consortium, 2 May 2001, revised 28 October 2004. This version is http://www.w3.org/TR/2004/REC-xmlschema-1-20041028. The latest version is available at http://www.w3.org/TR/xmlschema-1 .
XML Schema Datatypes: XML Schema Part 2: Datatypes Second Edition , P. Byron and A. Malhotra, Editors. World Wide Web Consortium, 2 May 2001, revised 28 October 2004. This version is http://www.w3.org/TR/2004/REC-xmlschema-2-20041028. The latest version is available at http://www.w3.org/TR/xmlschema-2 .

A.2 Other References

Efficient XML: Efficient XML , part of [EXI Measurements Note] independently referenced. The latest version is available at http://www.w3.org/TR/exi-measurements/#contributions-efx .
EXI Evaluation Note: Efficient XML Interchange Evaluation , Carine Bournez, Editor. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-evaluation/ .
EXI Impacts Note: Efficient XML Interchange (EXI) Impacts , Jaakko Kangasharju, Editor. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-impacts/ .
EXI Measurements Note: Efficient XML Interchange Measurements Note , Greg White, Jaakko Kangasharju, Don Brutzman and Stephen Williams, Editors. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-measurements/ .
EXI Primer: Efficient XML Interchange (EXI) Primer , Daniel Peintner, Santiago Pericas-Geertsen, Editors. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-primer/ .
Greibach Normal Form: A New Normal-Form Theorem for Context-Free Phrase Structure Grammars , Sheila A. Greibach, Author. Journal of the ACM Volume 12 Issue 1, January 1965, pp. 42–52.
Huffman Coding: A Method for the Construction of Minimum-Redundancy Codes , D. A. Huffman, Author. Proceedings of the I.R.E., September 1952, pp. 1098-1102.
IEEE 754-2008: IEEE Standard for Floating-Point Arithmetic
ISO/IEC 19757-2:2003: Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG
SOAP 1.2: SOAP Version 1.2 Part 1: Messaging Framework , M. Gudgin, M. Hadley, N. Mendelsohn, J-J. Moreau, H. Frystyk Nielsen, Editors. World Wide Web Consortium, 24 June 2003. This version is http://www.w3.org/TR/2003/REC-soap12-part1-20030624/. The latest version is available at http://www.w3.org/TR/soap12-part1/ .
XBC Measurement Methodologies: XML Binary Characterization Measurement Methodologies , S. D. Williams and P. Haggar, Editors. World Wide Web Consortium, 31 March 2005. This version is http://www.w3.org/TR/2005/NOTE-xbc-measurement-20050331/. The latest version is available at http://www.w3.org/TR/xbc-measurement .
XBC Use Cases: XML Binary Characterization Use Cases , Mike Cokus and Santiago Pericas-Geertsen, Editors. World Wide Web Consortium, 31 March 2005. This version is http://www.w3.org/TR/2005/NOTE-xbc-use-cases-20050331/. The latest version is available at http://www.w3.org/TR/xbc-use-cases .
XBC Properties: XML Binary Characterization Properties , Mike Cokus and Santiago Pericas-Geertsen, Editors. World Wide Web Consortium, 31 March 2005. This version is http://www.w3.org/TR/2005/NOTE-xbc-properties-20050331/ The latest version is available at http://www.w3.org/TR/xbc-properties/ .

B Infoset Mapping

This appendix contains the mappings between the XML Information Set [XML Information Set] model and the EXI format. Starting from the document information item, each information item definition is mapped to its respective unordered set of EXI event ~~types.~~ types (see Table 4-1 ). The actual order amongst information set items when it is relevant reflects the occurrence order of EXI events or their references in an EXI stream that correlate to the infoset items. As used in the XML Information Set specification, the Infoset property names are shown in square brackets, [thus] .

Note:

As has been prescribed in section 2. Design Principles , EXI is designed to be compatible with the XML Information Set. While this approach is both legitimate and practical for designing a succinct format interoperable with XML family of specifications and technologies, it entails that some lexical constructs of XML not recognized by the XML Information Set are not represented by EXI, either. Examples of such unrepresented lexical constructs of XML include white space outside the document element, white space within tags, the kind of quotation marks (single or double) used to quote attribute values, and the boundaries of CDATA marked sections.

No constructs in EXI format facilitate the representation of [character encoding scheme] , [standalone] and [version] properties which are available in the definition of Document Information Item of XML Information Set (see B.1 Document Information Item ). EXI is made agnostic about [character encoding scheme] and [version] properties as they are in XML Information Set, and considers them to be the properties of XML serializers in use. EXI forgoes [standalone] property because simply having no references to any external markup declarations practically serves the purpose with less complexity.

B.1 Document Information Item

A document information item maps to a pair of SD Start Document (SD) and ~~ED event~~ End Document (ED) events with each of its properties subject to further mapping as shown in the following table.

Table B-1. Mapping between the document information item properties to EXI event types
Property	EXI event types
[children]	CM* PI* DT? [SE, EE]
[document element]	[SE, EE]
[notations]	Computed based on text content item of DT to which each notation information set item ~~maps to.~~ maps.
[unparsed entities]	Computed based on text content item of DT to which each unparsed entity information set item ~~maps to.~~ maps.
[base URI]	The base URI of the EXI stream
[character encoding scheme]	N/A
[standalone]	Not available
[version]	Not available
[all declarations processed]	True if all declarations contained directly or indirectly in DT are processed, otherwise false, which is the processor quality as opposed to the information provided by the format.

B.2 Element Information Items

An element information item maps to a pair of a SE Start Element (SE) event and the corresponding EE End Element (EE) event with each of its properties subject to further mapping as shown in the following table.

Table B-2. Mapping of the element information item properties to EXI event types
Property	EXI event types
[namespace name]	SE
[local name]	SE
[prefix]	SE
[children]	[SE, EE]* PI* CM* CH* ER*
[attributes]	AT*
[namespace attributes]	NS*
[in-scope namespaces]	The namespace information items computed using the [namespace attributes] properties of this information item and its ancestors
[base URI]	The base URI of the element information item
[parent]	Computed based on the last SE event encountered that did not get a matching EE event if any, or computed based on the SD event

B.3 Attribute Information Item

An attribute information item maps to an AT Attribute (AT) event with each of its properties subject to further mapping as shown in the following table.

Table B-3. Mapping of the attribute information item properties to EXI event types
Property	EXI event types
[namespace name]	AT
[local name]	AT
[prefix]	AT
[normalized value]	The value of AT
[specified]	True if the item maps to AT, otherwise false
[attribute type]	Computed based on AT and DT
[references]	Computed based on [attribute type] and value of AT
[owner element]	Computed based on the last SE event encountered that did not get a matching EE event

B.4 Processing Instruction Information Item

A processing instruction information maps to a PI Processing Instruction (PI) event with each of its properties subject to further mapping as shown in the following table.

Table B-4. Mapping of the processing instruction information item properties to EXI event types
Property	EXI event types
[target]	PI
[content]	PI
[base URI]	The base URI of the processing information item
[notation]	Computed based on the availability of the internal DTD subset
[parent]	Computed based on the last SE event encountered that did not get a matching EE event type

B.5 Unexpanded Entity Reference Information item

An unexpanded entity reference information item maps to an ER Entity Reference (ER) event with each of its properties subject to further mapping as shown in the following table.

Table B-5. Mapping of the entity reference information item properties to the EXI event types
Property	EXI event types
[name]	ER
[system identifier]	Based on the availability of the internal DTD subset
[public identifier]	Based on the availability of the internal DTD subset
[declaration base URI]	The base URI of the unexpanded entity reference information item
[parent]	Computed based on the last SE event encountered that did not get a matching EE event type

B.6 Character Information item

A character information item maps to the individual characters contained in a CH Characters (CH) event following a SE event that did not get a matching EE event.

Table B-6. Mapping of the character information item properties and the EXI event types
Property	EXI event types
[character code]	Each character in CH
[element content whitespace]	Computed based on [parent] and DT
[parent]	Computed based on the last SE event encountered that did not get a matching EE event

B.7 Comment Information item

A comment information item maps to a CM Comment (CM) event with each of its properties subject to further mapping as shown in the following table.

Table B-7. Mapping of the comment information item properties and the EXI event types
Property	EXI event types
[content]	text content item of CM
[parent]	Computed based on the last SE event encountered that did not get a matching EE event, or the SD event

B.8 Document Type Declaration Information item

A document type declaration information item maps to a DT DOCTYPE (DT) event with each of its properties subject to further mapping as shown in the following table.

Table B-8. Mapping of the document type declaration information item properties to the EXI event types
Property	EXI event types
[system identifier]	DT
[public identifier]	DT
[children]	Computed based on text content item of DT
[parent]	Computed based on the SD event

B.9 Unparsed Entity Information Item

An unparsed entity information item maps to part of the text content item of DT DOCTYPE (DT) event with each of its properties subject to further mapping as shown in the following table.

Table B-9. Mapping of the unparsed entity information item properties to EXI event types
Property	EXI event types
[name]	Computed based on text content item of DT
[system identifier]	Computed based on text content item of DT
[public identifier]	Computed based on text content item of DT
[declaration base URI]	The base URI of the unparsed entity information item
[notation name]	Computed based on text content item of DT
[notation]	Computed based on text content item of DT

B.10 Notation Information Item

An notation information item maps to part of the text content item of DT DOCTYPE (DT) event with each of its properties subject to further mapping as shown in the following table.

Table B-10. Mapping of the notation information item properties to EXI event types
Property	EXI event types
[name]	Computed based on text content item of DT
[system identifier]	Computed based on text content item of DT
[public identifier]	Computed based on text content item of DT
[declaration base URI]	The base URI of the notation information item

B.11 Namespace Information Item

An namespace information item ~~ismaps toa NS~~ maps to a Namespace Declaration (NS) event with each of its properties subject to further mapping as shown in the following table.

Table B-11. Mapping of the namespace information item properties to EXI event types
Property	EXI event types
[prefix]	NS
[namespace name]	NS

C XML Schema for EXI Options Header Document

The following schema describes the EXI options header. It is designed to produce smaller headers for option combinations used when ~~compactness is critical.<xsd:schema targetNamespace="http://www.w3.org/2007/07/exi"~~ compactness is critical.

<xsd:schema targetNamespace="http://www.w3.org/2009/exi"

            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="header">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="lesscommon" minOccurs="0">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="uncommon" minOccurs="0">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded" />

                    <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded" 
                             processContents="skip" />

                    <xsd:element name="alignment" minOccurs="0">
                      <xsd:complexType>
                        <xsd:choice>
                          <xsd:element name="byte">
                            <xsd:complexType />
                          </xsd:element>
                          <xsd:element name="pre-compress">
                            <xsd:complexType />
                          </xsd:element>
                        </xsd:choice>
                      </xsd:complexType>
                    </xsd:element>
                    <xsd:element name="selfContained" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="valueMaxLength" minOccurs="0">
                      <xsd:simpleType>
                        <xsd:restriction base="xsd:unsignedInt" /> 
                      </xsd:simpleType>
                    </xsd:element>
                    <xsd:element name="valuePartitionCapacity" minOccurs="0">
                      <xsd:simpleType>
                        <xsd:restriction base="xsd:unsignedInt" /> 
                      </xsd:simpleType>
                    </xsd:element>
                    <xsd:element name="datatypeRepresentationMap" minOccurs="0" maxOccurs="unbounded">

                    <xsd:element name="datatypeRepresentationMap" 
                                 minOccurs="0" maxOccurs="unbounded">

                      <xsd:complexType>
                        <xsd:sequence>
                          <xsd:any namespace="##other" /> <!-- schema datatype -->
                          <xsd:any namespace="##other" /> <!-- datatype representation -->

                          <!-- schema datatype -->
                          <xsd:any namespace="##other" processContents="skip" /> 
                          <!-- datatype representation -->
                          <xsd:any processContents="skip" /> 

                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
              <xsd:element name="preserve" minOccurs="0">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="dtd" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="prefixes" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="lexicalValues" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="comments" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="pis" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
              <xsd:element name="blockSize" minOccurs="0">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:unsignedInt" /> 
                </xsd:simpleType>
              </xsd:element>                 
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="common" minOccurs="0">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="compression" minOccurs="0">
                <xsd:complexType />
              </xsd:element>
              <xsd:element name="fragment" minOccurs="0">
                <xsd:complexType />
              </xsd:element>
              <xsd:element name="schemaId" minOccurs="0" nillable="true">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string" />
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="strict" minOccurs="0">
          <xsd:complexType />
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

  <!-- Built-in EXI Datatype IDs for use in datatype representation maps -->
  <xsd:simpleType name="base64Binary">
     <xsd:restriction base="xsd:base64Binary"/>
  </xsd:simpleType>
  <xsd:simpleType name="hexBinary" >
     <xsd:restriction base="xsd:hexBinary"/>
  </xsd:simpleType>
  <xsd:simpleType name="boolean" >
     <xsd:restriction base="xsd:boolean"/>
  </xsd:simpleType>
  <xsd:simpleType name="decimal" >
     <xsd:restriction base="xsd:decimal"/>
  </xsd:simpleType>
  <xsd:simpleType name="double" >
     <xsd:restriction base="xsd:double"/>
  </xsd:simpleType>
  <xsd:simpleType name="integer" >
     <xsd:restriction base="xsd:integer"/>
  </xsd:simpleType>
  <xsd:simpleType name="string" >
     <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>
  <xsd:simpleType name="dateTime" >
     <xsd:restriction base="xsd:dateTime"/>
  </xsd:simpleType>
  <xsd:simpleType name="date" >
     <xsd:restriction base="xsd:date"/>
  </xsd:simpleType>
  <xsd:simpleType name="time" >
     <xsd:restriction base="xsd:time"/>
  </xsd:simpleType>
  <xsd:simpleType name="gYearMonth" >
     <xsd:restriction base="xsd:gYearMonth"/>
  </xsd:simpleType>
  <xsd:simpleType name="gMonthDay" >
     <xsd:restriction base="xsd:gMonthDay"/>
  </xsd:simpleType>
  <xsd:simpleType name="gYear" >
     <xsd:restriction base="xsd:gYear"/>
  </xsd:simpleType>
  <xsd:simpleType name="gMonth" >
     <xsd:restriction base="xsd:gMonth"/>
  </xsd:simpleType>
  <xsd:simpleType name="gDay" >
     <xsd:restriction base="xsd:gDay"/>
  </xsd:simpleType>

  <!-- Qnames reserved for future use in datatype representation maps -->
  <xsd:simpleType name="ieeeBinary32" >
     <xsd:restriction base="xsd:float"/>
  </xsd:simpleType>
  <xsd:simpleType name="ieeeBinary64" >
     <xsd:restriction base="xsd:double"/>
  </xsd:simpleType>
</xsd:schema>

Note:

The qnames exi:ieeeBinary32 and exi:ieeeBinary64 defined above are reserved for future use in Datatype Representation Maps to identify the 32-bit and 64-bit Binary Interchange Formats defined by the IEEE 754-2008 standard [IEEE 754-2008] .

D Initial Entries in String Table Partitions

D.1 Initial Entries in Uri Partition

The following table lists the entries that are initially populated in uri partitions, where partition name URI denotes that they are entries in the uri partition.

Table D-1. Initial values in *uri* partition
Partition	String ID	String Value
URI	0	"" [empty string]
URI	1	"http://www.w3.org/XML/1998/namespace"
URI	2	"http://www.w3.org/2001/XMLSchema-instance"

When XML Schemas are used to inform the grammars for processing EXI body, there is an additional entry that is appended to the uri partition.

Table D-2. Additional entry when XML Schemas are used
Partition	String ID	String Value
URI	3	"http://www.w3.org/2001/XMLSchema"

D.2 Initial Entries in Prefix Partitions

The following table lists the entries that are initially populated in prefix partitions, where XML-PF represents the partition for prefixes in the "http://www.w3.org/XML/1998/namespace" namespace and XSI-PF represents the partition for prefixes in the "http://www.w3.org/2001/XMLSchema-instance" namespace.

Table D-3. Initial *prefix* string table entries
Partition	String ID	String Value
""	0	"" [empty string]
XML-PF	0	"xml"
XSI-PF	0	"xsi"

D.3 Initial Entries in Local-Name Partitions

The following ~~table lists~~ tables list the entries that are initially populated in local-name partitions, where XML-NS represents the partition for local-names in the "http://www.w3.org/XML/1998/namespace" namespace, XSI-NS represents the partition for local-names in the "http://www.w3.org/2001/XMLSchema-instance" namespace, and XSD-NS represents the partition for local-names in the "http://www.w3.org/2001/XMLSchema" namespace.

Table D-4. Initial *local-name* string table entries
Partition	String ID	String Value
XML-NS	0	~~"space"~~ "base"
XML-NS	1	~~"lang"~~ "id"
XML-NS	2	~~"id"~~ "lang"
XML-NS	3	~~"base"~~ "space"
XSI-NS	0	~~"type"~~ "nil"
XSI-NS	1	~~"nil"~~ "type"

When XML Schemas are used to inform the grammars for processing EXI body, there an additional partition that is appended to the local-name partitions.

Table D-5. Additional initial *local-name* string table entries for schema-informed EXI streams
Partition	String ID	String Value
XSD-NS	0	~~"anyType"~~ "ENTITIES"
XSD-NS	1	~~"anySimpleType"~~ "ENTITY"
XSD-NS	2	~~"string"~~ "ID"
XSD-NS	3	~~"normalizedString"~~ "IDREF"
XSD-NS	4	~~"token"~~ "IDREFS"
XSD-NS	5	~~"language"~~ "NCName"
XSD-NS	6	~~"Name"~~ "NMTOKEN"
XSD-NS	7	~~"NCName"~~ "NMTOKENS"
XSD-NS	8	~~"ID"~~ "NOTATION"
XSD-NS	9	~~"IDREF"~~ "Name"
XSD-NS	10	~~"IDREFS"~~ "QName"
XSD-NS	11	~~"ENTITY"~~ "anySimpleType"
XSD-NS	12	~~"ENTITIES"~~ "anyType"
XSD-NS	13	~~"NMTOKEN"~~ "anyURI"
XSD-NS	14	~~"NMTOKENS"~~ "base64Binary"
XSD-NS	15	~~"duration"~~ "boolean"
XSD-NS	16	~~"dateTime"~~ "byte"
XSD-NS	17	~~"time"~~ "date"
XSD-NS	18	~~"date"~~ "dateTime"
XSD-NS	19	~~"gYearMonth"~~ "decimal"
XSD-NS	20	~~"gYear"~~ "double"
XSD-NS	21	~~"gMonthDay"~~ "duration"
XSD-NS	22	~~"gDay"~~ "float"
XSD-NS	23	~~"gMonth"~~ "gDay"
XSD-NS	24	~~"boolean"~~ "gMonth"
XSD-NS	25	~~"base64Binary"~~ "gMonthDay"
XSD-NS	26	~~"hexBinary"~~ "gYear"
XSD-NS	27	~~"float"~~ "gYearMonth"
XSD-NS	28	~~"double"~~ "hexBinary"
XSD-NS	29	~~"anyURI"~~ "int"
XSD-NS	30	~~"QName"~~ "integer"
XSD-NS	31	~~"NOTATION"~~ "language"
XSD-NS	32	~~"decimal"~~ "long"
XSD-NS	33	~~"integer"~~ "negativeInteger"
XSD-NS	34	~~"nonPositiveInteger"~~ "nonNegativeInteger"
XSD-NS	35	~~"negativeInteger"~~ "nonPositiveInteger"
XSD-NS	36	~~"long"~~ "normalizedString"
XSD-NS	37	~~"int"~~ "positiveInteger"
XSD-NS	38	~~"short"~~ "short"
XSD-NS	39	~~"byte"~~ "string"
XSD-NS	40	~~"nonNegativeInteger"~~ "time"
XSD-NS	41	~~"positiveInteger"~~ "token"
XSD-NS	42	~~"unsignedLong"~~ "unsignedByte"
XSD-NS	43	~~"unsignedInt"~~ "unsignedInt"
XSD-NS	44	~~"unsignedShort"~~ "unsignedLong"
XSD-NS	45	~~"unsignedByte"~~ "unsignedShort"

E Deriving Character Sets Set of Characters from XML Schema Regular Expressions

XML Schema datatypes specification [XML Schema Datatypes] defines ~~its~~ a regular expression ^XS2 syntax for use in pattern facets of simple type definitions. Pattern facets ~~are applied to values literally to~~ constrain the set of valid values to those that lexically ~~matches~~ match the specified regular expression. ~~Though regular expression syntax is defined by dozens of productions, after all, they are character sets that constitute a regular expression at~~ This section describes the ~~finest granularity of~~ rules for deriving the grammar, which are leveraged such as being combined, concatenated, complemented or subtracted in a bottom-up fashion to form a regular expression. In this regard, a regular expression can be seen as a sort of micro-schema that suggests a concrete character set ~~to which~~ of characters allowed in a string ~~are likely~~ value that conforms to ~~belong. The remainder of this section describes a method for deriving~~ a ~~character set from~~ given regular expression in an XML ~~Schema regular expression. Hereinafter, "character set" and "XML Schema regular expression" are referred to as "charset" and "regexp", respectively. Regexp syntax permits~~ Schema. In the ~~use of character class escapes XS2 some of which depend on~~ following description, the ~~mapping from code points to character properties. This document assumes~~ term "set-of-chars" is used as the ~~use~~ shorthand form of ~~revision 5.0.0~~ "set of ~~Unicode Standard [Unicode Database] to obtain the mapping.~~ characters".

At the top level, ~~regexp~~ the XML Schema regular expression syntax is summarized by the following ~~three productions,~~ production excerpted here from [XML Schema Datatypes] . Note the notation used for the numbers that tag the productions. "XSD:" is prefixed to the original numeric tags to make it easier to discern them as belonging to XML Schema specification.

[XSD:1] ^XS2 regExp ::= branch ( '|' branch )* ~~[XSD:2] XS2 branch ::= piece* [XSD:3] XS2 piece ::= atom quantifier?~~

~~These productions indicate that the charset of~~ The set-of-chars for a ~~regexp (i.e. regExp above) equals~~ regex that conforms to the syntax above is the union of ~~all~~ the ~~charsets~~ set-of-chars defined for each branch. Each branch of which corresponds to an atom that participates in the regexp. There are exceptions which are based on empirical observations that are introduced here to identify certain regexps that are not subject to the computation of charsets. If any atom itself a regex is ~~or contains one of~~ described by the following ~~character groups directly or indirectly, the charset of the whole regexp is defined to be the entire set of XML characters. All multi-character escapes~~ production:

[XSD:2] ^XS2 ~~(including meta-character '.' ) except~~ branch ::= piece*

The set-of-chars for ~~'\s' and '\d' . All category escapes XS2 that carry one~~ each branch of a regex is the ~~following character properties. All category names that are~~ union of the forms: 'L'[ulo]? , 'M'[n]? , 'N' , 'P' , 'Z' , 'S'[mo]? or 'C'[o]? . The following block names: Ethiopic, UnifiedCanadianAboriginalSyllabics, CJKUnifiedIdeographs, CJKCompatibilityIdeographs, ArabicPresentationForms-A, CJKUnifiedIdeographsExtensionA, YiSyllables, HangulSyllables and PrivateUse. complEsc XS2 (examples set-of-chars for each piece of ~~which are '\P{ L }' and '\P{ N }' ). negCharGroup~~ the branch. Each piece of a branch is described by the following production:

[XSD:3] ^XS2 ~~as indicated by meta-character '^' . See [XSD:15].~~ piece ::= atom quantifier?

Most regexps that contain one of the character groups listed above result in a very large number of characters, and even in such rare cases where it is not necessarily the case, there are usually alternative ways to describe the same effect more intuitively using none of the above constructs. The ~~rest of this section describes the system to derive character sets from such regexps that do not contain any~~ set-of-chars for each piece of ~~the character groups listed above. Shown below~~ a branch is the ~~rule~~ set-of-chars for ~~assembling~~ the ~~charset of a regexp given a list~~ atom portion of ~~atoms that are contained directly in the regexp, excluding those atoms contained in sub-regexp parenthesized within the regexp (see [XSD:9] below) in which case the sub-regexp itself is~~ the piece. The atom ~~that is included in the list. Note the pseudo-function notation~~ portion of ~~the form CS(arg) with arg being~~ a ~~regexp construct denotes the method of obtaining the character set of~~ piece is described by the ~~argument. [1] CS(regExp) := union of every CS(atom[0...N-1]) where N represents the number of atoms~~ following production:

~~There are three kinds of atoms per its definition [XSD:9].~~ [XSD:9] ^XS2 atom ::= Char | charClass | ( '(' regExp ')' )

~~This production directly translates to the following rule~~ The set-of-chars for ~~acquiring~~ the charset of an atom. [2] CS(atom) := a single char represented by Char (if atom is Char) or CS(charClass) (if atom is charClass. See rule [3]) or CS(regExp) (if atom is sub-regexp. See rule [1]) Similarly, there are three choices for charClass per its definition [XSD:11] below, which atom is ~~followed by~~ the ~~corresponding rule~~ set-of-chars for ~~acquiring~~ the ~~charset of a~~ Char, charClass . [XSD:11] XS2 charClass ::= charClassEsc | charClassExpr | WildcardEsc [3] CS(charClass) := CS(charClassEsc) (if charClass is charClassEsc. See [XSD:23] that defines the characters contained in CS(charClassEsc) for each kind of charClassEsc) CS(charClassExpr) (if charClass charClassExpr. See rule [3]) Note or regExp that ~~there is no rule specified above~~ constitutes the atom.

The set-of-chars for a ~~charClass~~ Char that ~~is a WildcardEsc . This is because the presence of a WildcardEsc causes to conclude the charset of~~ constitutes an atom contains the ~~regExp~~ single character that ~~contains this charClass to be~~ matches the entire XML charset (See rule [1] above). A charClassExpr is either posCharGroup , negCharGroup or charClassSub per production [XSD:12] and [XSD:13] as excerpted below. [XSD:12] XS2   charClassExpr  ::=  '['  charGroup  ']' [XSD:13] Char expression ^{XS2

  charGroup  ::=  posCharGroup  |  negCharGroup  |  charClassSub
[4] CS(charClassExpr) :=
CS(posCharGroup) (if charClassExpr is posCharGroup. See rule [5])
CS(charClassSub) (if charClassExpr is charClassSub. See rule [6])
Note
that
there
is
no
rule
specified
above}. The set-of-chars for a ~~charClassExpr~~ charClass that constitutes an atom is ~~a negCharGroup . This is because the presence of a negCharGroup causes to conclude~~ the ~~charset~~ set of characters specified by the ~~regExp that contains this charClassExpr to be the entire XML charset (See rule [1] above). posCharGroup is defined to be a sequence of charRange and charClassEsc per production [XSD:14]. [XSD:14]~~ charClass expression ^{XS2

  posCharGroup  ::=  (  charRange  |  charClassEsc  )+}. The ~~above production translates to the following rule~~ set-of-chars for ~~acquiring the charset of~~ a posCharGroup . [5] CS(posCharGroup) := union of every CS(charRange[0...M-1]) and every CS(charClassEsc[0...N-1]) where M and N represent the number of charRanges and charClassEscs contained in the posCharGroup, respectively. Lastly, charClassSub is defined using a subtraction operation as follows. [XSD:16] XS2   charClassSub  ::=  (  posCharGroup  |  negCharGroup  )  '-'  charClassExpr Because the presence of negCharGroup would have resulted in the containing regExp ~~to have the entire XML charset~~ sub-expression enclosed in parenthesis that constitutes an atom is the ~~first place, negCharGroup can be pruned from the above production, which makes~~ set-of-chars for the ~~following reduced version of [XSD:16]. [XSD:16']  charClassSub  ::=  posCharGroup  '-'  charClassExpr The above production translates to~~ regExp itself derived by recursively applying the ~~following~~ rule for acquiring the charset of a charClassSub . [6] CS(charClassSub) := characters that are found in CS(posCharGroup) (See rule [5]) but not in CS(charClassExpr) (See rule [4]) defined above.

F Content Coding and Internet Media Type

Two labels are defined for ~~use in~~ identifying and negotiating the ~~interchange~~ use of ~~XML Information Set data encoded as~~ EXI ~~streams.~~ for representing XML information in higher-level protocols. They serve two distinct ~~roles of indicating metadata in data interchange; one~~ roles. One is for content ~~coding,~~ coding and the other is for internet media type. In such protocols that support a mechanism to indicate the encoding transformation of the data being exchanged, the label "exi" is used as a content coding (see section F.1 Content Coding ) in an occurrence of or a request of an XML Information Set interchange of which the document body is encoded as an EXI stream. For other protocols that lack the capability of indicating the encoding transformation of the data being transferred, the other label "application/exi" is defined as an internet media type (see section F.2 Internet Media Type ) in order to identify that the data being retrieved or sent is an XML Information Set represented as an EXI stream.

F.1 Content Coding

The content-coding value "exi" is registered with the Internet Assigned Numbers Authority (IANA) for use with EXI. Protocols that ~~support a mechanism to indicate~~ can identify and negotiate the ~~encoding transformation~~ content coding of ~~the data being transferred (e.g. HTTP 1.1)~~ XML information independent of its media type, SHOULD use the ~~label~~ content coding "exi" (case-insensitive) to ~~annotate the transfer or the request of data structured as an XML Information Set to~~ convey the acceptance or actual use of ~~or the acceptance of~~ EXI encoding for ~~the interchange that is underway.~~ XML information.

F.2 Internet Media Type

A new media type registration "application/exi" described below is being proposed for community review, with the intent to eventually submit it to the IESG for review, approval, and registration with IANA.

Type name:

application

Subtype name:

exi

Required parameters:

none

Optional parameters:

none

Encoding considerations:

binary

Security considerations:

When used as an XML replacement in an application, EXI shares the same security concerns as XML, described in IETF RFC 3023 [IETF RFC 3023] , section 10.

In addition to concerns shared with XML, the schema identifier refers to information external to the EXI document itself. If an attacker is able to substitute another schema in place of the intended one, the semantics of the EXI document could be changed in some ways. As an example, EXI is sensitive to the order of the values in an enumeration. It is not known whether such an attack is possible on the actual structure of the document.

Also, EXI supports user-defined datatype representations, and such representations, if present in a document and purportedly understood by a processor, can be a security weakness. Definitions of these representations are expected to be external, often application- or industry-specific, so any definition needs to be analyzed carefully from the security perspective before being adopted.

Interoperability considerations:

The datatype representation map feature of EXI requires coordination between the producer and consumer of an EXI document, and is not recommended except in controlled environments or using standardized datatype representations potentially defined in the future.

EXI permits information necessary to decode a document to be omitted with the expectation that such information has been communicated out of band. Such omissions hinder interoperability in uncontrolled environments.

Published specification:

Efficient XML Interchange (EXI) Format 1.0, World Wide Web Consortium

Applications that use this media type:

No known applications currently use this media type.

Additional information:

Magic number(s):

	The first four octets may be hexadecimal 24 45 58 49 ("$EXI"). The first octet after these, or the first octet of the whole content if they are not present, has its high two bits set to values 1 and 0 in that order.

File extension(s):
	.exi

Macintosh file type code(s):
	APPL

Consideration of alternatives :
	When transferring EXI streams over a protocol that can identify and negotiate the content coding of XML information independent of its media-type, the content-coding should be used to identify and negotiate how the XML information is encoded and the media-type should be used to negotiate and identify what type of information is transfered.

Person & email address to contact for further information:

World Wide Web Consortium <web-human@w3.org>

Intended usage:

COMMON

Restrictions on usage:

none

Author/Change controller:

The EXI specification is the product of the World Wide Web Consortium's Efficient XML Interchange Working Group. The W3C has change control over this specification.

G Example Encoding (Non-Normative)

EXI Primer [EXI Primer] contains a section that explains the workings of EXI format using simple example documents. Those examples are intended to serve as a tool to confirm the understanding of the EXI format in action by going through encoding and decoding processes step by step.

H Schema-informed Grammar Examples (Non-Normative)

As an example to exercise the process to produce schema-informed element grammars, consider the following XML Schema fragment declaring two complex-typed elements, <product> and <order>:

~~<xs:element name="product">~~

Example H-1. Example XML Schema fragment

<xs:element name="product"> 
  <xs:complexType> 
    <xs:sequence maxOccurs="2"> 
      <xs:element name="description" type="xs:string" minOccurs="0"/> 
      <xs:element name="quantity" type="xs:integer" /> 
      <xs:element name="price" type="xs:float" /> 
    </xs:sequence> 
    <xs:attribute name="sku" type="xs:string" use="required" /> 
    <xs:attribute name="color" type="xs:string" use="optional" /> 
  </xs:complexType> 
</xs:element> 
<xs:element name="order"> 
  <xs:complexType> 
    <xs:sequence> 
      <xs:element ref="product" maxOccurs="unbounded" /> 
    </xs:sequence> 
  </xs:complexType> 
</xs:element>

Section H.1 Proto-Grammar Examples guides you through the process of ~~deriving~~ generating EXI proto-grammars from the schema components available in the example schema above. EXI grammars in the normalized form that correspond to the proto-grammars are shown in section H.2 Normalized Grammar Examples . Section H.3 Complete Grammar Examples shows the complete EXI grammars for elements <product> and <order>.

H.1 Proto-Grammar Examples

Grammars for element declaration terms "description", "quantity" and "price" are as follows. See section 8.5.4.1.6 Element Terms for the rules used to ~~derive~~ generate grammars ~~from~~ for element terms.

Term_description

	Term_description ₀ :
		SE( "description" ) Term_description ₁

	Term_description ₁ :
		EE

Term_quantity

	Term_quantity ₀ :
		SE( "quantity" ) Term_quantity ₁

	Term_quantity ₁ :
		EE

Term_price

	Term_price ₀ :
		SE( "price" ) Term_price ₁

	Term_price ₁ :
		EE

~~Grammars~~ The grammar for element particle "description" ~~are derived from~~ is created based on Term_description given ~~{ minOccurs }~~ { minOccurs } value of 0 and ~~{ maxOccurs }~~ { maxOccurs } value of 1. See section 8.5.4.1.5 Particles for the rules used to ~~derive~~ generate grammars ~~from~~ for particles.

Particle_description

	Term_description ₀ :
		SE( "description" ) Term_description ₁
		EE

	Term_description ₁ :
		EE

Grammars for element particle "quantity" and "prices" are the same as those of their terms ( Term_quantity and Term_price , respectively) because { minOccurs } and { maxOccurs } are both 1.

Particle_quantity

	Term_quantity ₀ :
		SE( "quantity" ) Term_quantity ₁

	Term_quantity ₁ :
		EE

Particle_price

	Term_price ₀ :
		SE( "price" ) Term_price ₁

	Term_price ₁ :
		EE

~~Grammars~~ The grammar for the sequence group term in <product> element declaration is ~~derived from~~ created based on the grammars of subordinate particles as follows. See section 8.5.4.1.8.1 Sequence Model Groups for the rules used to ~~derive~~ generate grammars ~~from a~~ for sequence ~~group.~~ groups.

Term_sequence = Particle_description ⊕ Particle_quantity ⊕ Particle_price

which yields the following grammars for Term_sequence .

Term_sequence

	Term_description ₀ :
		SE("description") Term_description ₁
		Term_quantity ₀

	Term_description ₁ :
		Term_quantity ₀

	Term_quantity ₀ :
		SE("quantity") Term_quantity ₁

	Term_quantity ₁ :
		Term_price ₀

	Term_price ₀ :
		SE("price") Term_price ₁

	Term_price ₁ :
		EE

~~Grammars~~ The grammar for the particle that is the content model of element <product> ~~are derived from~~ is created based on Term_sequence (shown above) given { minOccurs } value of 1 and { maxOccurs } value of 2. See section 8.5.4.1.5 Particles for the rules used to ~~derive~~ generate grammars ~~from~~ for particles.

Particle_sequence

	Term_description _0,0 :
		SE("description") Term_description _0,1
		Term_quantity _0,0

	Term_description _0,1 :
		Term_quantity _0,0

	Term_quantity _0,0 :
		SE("quantity") Term_quantity _0,1

	Term_quantity _0,1 :
		Term_price _0,0

	Term_price _0,0 :
		SE("price") Term_price _0,1

	Term_price _0,1 :
		Term_description _1,0

	Term_description _1,0 :
		SE("description") Term_description _1,1
		Term_quantity _1,0
		EE

	Term_description _1,1 :
		Term_quantity _1,0

	Term_quantity _1,0 :
		SE("quantity") Term_quantity _1,1

	Term_quantity _1,1 :
		Term_price _1,0

	Term_price _1,0 :
		SE("price") Term_price _1,1

	Term_price _1,1 :
		EE

Grammars for attribute uses of attributes "sku" and "color" are as follows. See section 8.5.4.1.4 Attribute Uses for the rules used to ~~derive~~ generate grammars ~~from~~ for attribute uses.

Use_sku

	Use_sku ₀ :
		AT("sku") [schema-typed value] Use_sku ₁

	Use_sku ₁ :
		EE

Use_color

	Use_color ₀ :
		AT("color") [schema-typed value] Use_color ₁
		EE

	Use_color ₁ :
		EE

Note the subtle difference between the forms of the two grammars Use_sku and Use_color . In At the ~~first grammar~~ outset of ~~each,~~ the grammars, only Use_color contains a production of which the ~~right hand~~ right-hand side starts with EE, which ~~stems from~~ is the result of the difference in their occurrence ~~optionality as~~ requirement defined in the schema.

Finally, ~~grammars~~ the grammar for the element <product> is ~~derived from~~ created based on the grammars of its attribute uses and content model particle as follows. See section 8.5.4.1.3.2 Complex Type Grammars for the rules used to ~~derive~~ generate grammars ~~from a~~ for complex ~~type.~~ types.

ProtoG_ProductElement = Use_color ⊕ Use_sku ⊕ Particle_sequence

which yields the following ~~grammars~~ grammar for element <product>.

ProtoG_ProductElement

	Use_color ₀ :
		AT("color") [schema-typed value] Use_color ₁
		Use_sku ₀

	Use_color ₁ :
		Use_sku ₀

	Use_sku ₀ :
		AT("sku") [schema-typed value] Use_sku ₁

	Use_sku ₁ :
		Term_description _0,0

	Term_description _0,0 :
		SE("description") Term_description _0,1
		Term_quantity _0,0

	Term_description _0,1 :
		Term_quantity _0,0

	Term_quantity _0,0 :
		SE("quantity") Term_quantity _0,1

	Term_quantity _0,1 :
		Term_price _0,0

	Term_price _0,0 :
		SE("price") Term_price _0,1

	Term_price _0,1 :
		Term_description _1,0

	Term_description _1,0 :
		SE("description") Term_description _1,1
		Term_quantity _1,0
		EE

	Term_description _1,1 :
		Term_quantity _1,0

	Term_quantity _1,0 :
		SE("quantity") Term_quantity _1,1

	Term_quantity _1,1 :
		Term_price _1,0

	Term_price _1,0 :
		SE("price") Term_price _1,1

	Term_price _1,1 :
		EE

The other element declaration <order> can be processed to generate its proto-grammar in ~~the same~~ a similar fashion as ~~was seen done above~~ follows.


Term_product

	Term_product ₀ :
		SE( "product" ) Term_product ₁

	Term_product ₁ :
		EE

The grammar for element ~~<product>, which would generate~~ particle "product" is created based on Term_product given { minOccurs } value of 1 and { maxOccurs } value of unbounded . See section 8.5.4.1.5 Particles for the ~~following proto-grammars.~~ rules used to generate grammars for particles.


Particle_product (before simplification)


	Term_product _0,0 :
		SE("product") Term_product _0,1

	Term_product _0,1 :
		Term_product _1,0

	Term_product _1,0 :
		SE("product") Term_product _1,1
		EE

	Term_product _1,1 :
		Term_product _1,0

In the above grammars, two grammars Term_product _0,1 and Term_product _1,1 are redundant because they serve for no other purpose than simply relaying one non-terminal to another. Though it is not required, the uses of non-terminals Term_product _0,1 and Term_product _1,1 are each replaced by Term_product _1,0 and Term_product _1,0, which produces the following ~~modified~~ simplified proto-grammars.

~~ProtoG_OrderElement~~ Particle_product (after simplification)



	Term_product _0,0 :
		SE("product") Term_product _1,0

	Term_product _0,1 1,0 :
		SE("product") Term_product _1,0
		EE

The proto-grammar of the element <order> equates to Particle_product because the type definition of element <order> has no attribute uses, and its content model has both { minOccurs } and { maxOccurs } property values of 1 where the element particle "product" is the sole member of the content model.

~~Term_product~~ ProtoG_OrderElement ~~1,0 :~~



~~SE("product")~~	Term_product _1,0 0,0 :
		EE SE("product") Term_product _1,0

	Term_product _1,1 1,0 :
		SE("product") Term_product _1,0
		EE

H.2 Normalized Grammar Examples

The element proto-grammars ProtoG_ProductElement and ProtoG_OrderElement produced in the previous section can be turned into their normalized forms which are shown below with an event code assigned to each production. See section 8.5.4.2 EXI Normalized Grammars for the process that converts proto-grammars into normalized grammars, and section 8.5.4.3 Event Code Assignment for the rules that determine the event codes of productions in normalized grammars.


		Event Code
Use_color ₀ :
	AT("color") [schema-typed value] Use_color ₁	0
	AT("sku") [schema-typed value] Use_sku ₁	1

Use_color ₁ :
	AT("sku") [schema-typed value] Use_sku ₁	0

Use_sku ₁ :
	SE("description") Term_description _0,1	0
	SE("quantity") Term_quantity _0,1	1

Term_description _0,1 :
	SE("quantity") Term_quantity _0,1	0

Term_quantity _0,1 :
	SE("price") Term_price _0,1	0

Term_price _0,1 :
	SE("description") Term_description _1,1	0
	SE("quantity") Term_quantity _1,1	1
	EE	2

Term_description _1,1 :
	SE("quantity") Term_quantity _1,1	0

Term_quantity _1,1 :
	SE("price") Term_price _1,1	0

Term_price _1,1 :
	EE	0


		Event Code
Term_product _0,0 :
	SE("product") Term_product _1,0	0

Term_product _1,0:
	SE("product") Term_product _1,0	0
	EE	1

Note that some ~~grammars~~ productions that were present in the proto-grammars have been removed in the normalized grammars. Those ~~grammars~~ productions were culled upon the completion of grammar normalization because their ~~LeftHandSide~~ left-hand-side non-terminals are not referenced from ~~RightHandSide~~ right-hand side of any available ~~productions.~~ productions, and yet those non-terminals are not the first non-terminals of the grammar they belong to. .

H.3 Complete Grammar Examples

The normalized grammars NormG_ProductElement and NormG_OrderElement are ~~augumented~~ augmented with undeclared productions to become complete grammars. See section 8.5.4.4 Undeclared Productions for the process ~~that augments~~ to augment normalized grammars with productions ~~that represent~~ for accepting terminal symbols not declared in schemas. ~~Those productions not necessary per fidelity options~~ The complete grammars for elements <product> and <order> are shown below. Note that the default grammar settings (i.e. the settings that can be described by an empty header options document <exi:header/> is used for the sake of this augmentation process, and those productions that accept ER, NS, CM and PI have been pruned ~~using~~ according to the rules described in section 8.3 Pruning Unneeded Productions ~~. The resulting grammar with~~ since those terminal symbols are not preserved in the default ~~fidelity options setting is shown below.~~ grammar settings.


		Event Code
Use_color ₀ :
	AT("color") [schema-typed value] Use_color ₁	0
	AT("sku") [schema-typed value] Use_sku ₁	1
	EE	2.0
	AT(xsi:type) Use_color ₀	2.1
	AT(xsi:nil) Use_color ₀	2.2
	~~AT()~~ AT () Use_color ₀	2.3
	AT("color") ~~[schema-invalid value]~~ [untyped value] Use_color _0 1	2.4.0
	AT("sku") ~~[schema-invalid value]~~ [untyped value] ~~Use_color~~ Use_sku _0 1	2.4.1
	~~AT() [schema-invalid value]~~ AT () [untyped value] Use_color ₀	2.4.2
	SE() Use_sku* _{1 1_copied}	2.5
	CH ~~[schema-invalid value]~~ [untyped value] Use_sku _{1 1_copied}	2.6

Use_color ₁ :
	AT("sku") [schema-typed value] Use_sku ₁	0
	EE	1.0
	~~AT()~~ AT () Use_color ₁	1.1
	AT("sku") ~~[schema-invalid value]~~ [untyped value] ~~Use_color~~ Use_sku ₁	1.2.0
	~~AT() [schema-invalid value]~~ AT () [untyped value] Use_color ₁	1.2.1
	SE() Use_sku* _{1 1_copied}	1.3
	CH ~~[schema-invalid value]~~ [untyped value] Use_sku _{1 1_copied}	1.4

Use_sku ₁ :
	SE("description") Term_description _0,1	0
	SE("quantity") Term_quantity _0,1	1
	EE	2.0
	~~AT()~~ AT () Use_sku ₁	2.1
	~~AT() [schema-invalid value]~~ AT () [untyped value] Use_sku ₁	2.2.0
	SE() Use_sku* _{1 1_copied}	2.3
	CH ~~[schema-invalid value]~~ [untyped value] Use_sku _{1 1_copied}	2.4

Use_sku _{1_copied} :
	SE("description") Term_description _0,1	0
	SE("quantity") Term_quantity _0,1	1
	EE	2.0
	SE() Use_sku* _{1_copied}	2.1
	CH [untyped value] Use_sku _{1_copied}	2.2

Term_description _0,1 :
	SE("quantity") Term_quantity _0,1	0
	EE	1
	SE() Term_description* _0,1	2.0
	CH ~~[schema-invalid value]~~ [untyped value] Term_description _0,1	2.1

Term_quantity _0,1 :
	SE("price") Term_price _0,1	0
	EE	1
	SE() Term_quantity* _0,1	2.0
	CH ~~[schema-invalid value]~~ [untyped value] Term_quantity _0,1	2.1

Term_price _0,1 :
	SE("description") Term_description _1,1	0
	SE("quantity") Term_quantity _1,1	1
	EE	2
	SE() Term_price* _0,1	3.0
	CH ~~[schema-invalid value]~~ [untyped value] Term_price _0,1	3.1

Term_description _1,1 :
	SE("quantity") Term_quantity _1,1	0
	EE	1
	SE() Term_description* _1,1	2.0
	CH ~~[schema-invalid value]~~ [untyped value] Term_description _1,1	2.1

Term_quantity _1,1 :
	SE("price") Term_price _1,1	0
	EE	1
	SE() Term_quantity* _1,1	2.0
	CH ~~[schema-invalid value]~~ [untyped value] Term_quantity _1,1	2.1

Term_price _1,1 :
	EE	0
	SE() Term_price* _1,1	1.0
	CH ~~[schema-invalid value]~~ [untyped value] Term_price _1,1	1.1


		Event Code
Term_product _0,0 :
	SE("product") Term_product _1,0	0
	EE	1.0
	AT(xsi:type) Term_product _0,0	1.1
	AT(xsi:nil) Term_product _0,0	1.2
	~~AT()~~ AT () Term_product _0,0	1.3
	~~AT() [schema-invalid value]~~ AT () [untyped value] Term_product _0,0	1.4.0
	SE() Term_product* _{0,0 0,0_copied}	1.5
	CH ~~[schema-invalid value]~~ [untyped value] Term_product _{0,0 0,0_copied}	1.6

Term_product _{0,0_copied} :
	SE("product") Term_product _1,0	0
	EE	1.0
	SE() Term_product* _{0,0_copied}	1.1
	CH [untyped value] Term_product _{0,0_copied}	1.2

Term_product _1,0 :
	SE("product") Term_product _1,0	0
	EE	1
	SE() Term_product* _1,0	2.0
	CH ~~[schema-invalid value]~~ [untyped value] Term_product _1,0	2.1

I Recent Specification Changes (Non-Normative)

I.1 Changes from Last Call Working Draft

Defined schema-order used for sorting productions in schema-informed element and type grammars. (see 8.5.4.3 Event Code Assignment )

Added a caveat regarding the use of content-coding in EXI, clarifying that it is applicable only to XML documents and it is neither byte- nor character-preserving. (see F.1 Content Coding )

Hour, Minute and Second values are comparted into separate sequence of bits in order to make it easier to retrieve each values from Date-Time representation. (see 7.1.8 Date-Time )

Made it clear that value content items of type xsd:QName and xsd:NOTATION are represented as String with exception of xsi:type attribute whose representation uses xsd:QName. (see Table 7-1 )

Described the behaviour expected of an EXI processor when it encountered EXI streams that have EXI versions not implemented by the processor. (see 5.3 EXI Format Version )

AT (*) and/or SE (*) have been permitted on the right-hand side of Schema-informed Element Fragment Grammar. (see 8.5.3 Schema-informed Element Fragment Grammar )

A condition was specified that makes those elements of the same name be excluded from the application of Relaxed Element Fragment grammar. Likewise, a condition that excludes attributes of the same name from the alternative application of String datatype representation. (see 8.5.2 Schema-informed Fragment Grammar and 8.5.3 Schema-informed Element Fragment Grammar )

Schema-informed Element Fragment Grammar is treated as though it was derived from an element declaration with a {nillable} property value of true and a type declaration that has named sub-types, when the grammar goes through the process of undeclared productions augmentation. (see 8.5.3 Schema-informed Element Fragment Grammar )

Added missing semantics for AT (*) productions in Complex Type Grammars and Complex Ur-Type Grammar. (see 8.5.4.1.3.2 Complex Type Grammars and 8.5.4.1.3.3 Complex Ur-Type Grammar )

Added a note indicating xsi:type and xsi:nil attributes cannot be used together on the same element when strict is true.(see 8.5.4.4.2 Adding Productions when Strict is True )

Clarified conditions under which xsi:nil values are stored in structure channel. (see 9.2.1 Structure Channel )

Added simple-type definitions for built-in EXI datatype representations to EXI options document schema. (see C XML Schema for EXI Options Document )

Fixed issue in semantics that prohibited learning of AT(xsi:type) and AT(xsi:nil) in built-in element grammar . (see 8.4.3 Built-in Element Grammar )

In appendix section E Deriving Set of Characters from XML Schema Regular Expressions , the terms "character set" and "charset" have been rephrased as "set of characters" and "set-of-chars", respectively. This is so as to avoid conjuring up some different connotations that these terms often bear in documents such as XML Schema specification.

It was made clear that "strict" element is not permitted to appear alongside "selfContained" element in an options document. (see the term definitions of " strict option " and " selfContained option ")

xsi:type is now allowed on all union types when strict option is true. (see 8.5.4.4.2 Adding Productions when Strict is True )

It has been made clearer that xsi:type and/or xsi:nil attribute events MUST occur before any other attribute events of the same element.(see 8.4.3 Built-in Element Grammar , 8.5.4.1.3.2 Complex Type Grammars , 8.5.4.4.1 Adding Productions when Strict is False and 8.5.4.4.2 Adding Productions when Strict is True )

The order of attribute occurrences when schema-informed grammar is in effect was relaxed to be in any order as long as the grammar permits it. (see 6. Encoding EXI Streams )

The event after a self-contained element starts at an octet boundary. (see 8.4.3 Built-in Element Grammar and see 8.5.4.4.1 Adding Productions when Strict is False )

The provision for selfContained option was removed from section 8.5.4.4.2 Adding Productions when Strict is True . This removal does not cause any change to the format, because selfContained and strict options are mutually exclusive, therefore the provision there could never be utilized thus was was simply extraneous.

SE(*) semantics has been clarified with regards to how grammars are resolved given an element qname .

The initial entries of local-name string table partitions has been split into two, where the second contains entries for the XSD-NS partition and gets appended to first only when XML Schemas are used. (see D.3 Initial Entries in Local-Name Partitions )

It is now explicitly stated that Preserve.prefixes fidelity option SHOULD be turned on when qualified names are used in value content items. (see 6.3 Fidelity Options )

Defined the term " global element grammars " for use in describing the SE (*) semantics.

Fixed various bugs that were present in the examples described in H Schema-informed Grammar Examples .

Sorted the initial entries in local-name partition of string tables were sorted in lexicographical order. (see D.3 Initial Entries in Local-Name Partitions )

Simplified the way in which restricted character sets are derived from datatypes with pattern facets. (see 7.1.10.1 Restricted Character Sets and E Deriving Set of Characters from XML Schema Regular Expressions )

Added references to XML 1.1 alongside those of XML 1.0 to consistently indicate that EXI can be used with both versions of XML. (see 3. Basic Concepts and 5.2 Distinguishing Bits )

Round-robin style mechanism of entry reuse for value string table partitions is described. (see 7.3.3 Partitions Optimized for Frequent use of String Literals )

Added exi:time, exi:date, exi:gYearMonth, exi:gYear, exi:gMonthDay, exi:gDay and exi:gMonth to Table 7-2 .

Appended an empty content grammar to the definition of TypeEmpty _i in "8.5.4.1.3.2 Complex Type Grammars". (see 8.5.4.1.3.2 Complex Type Grammars )

Changed the threshold number used for determining the use of restricted character sets from 255 to 256. (see 7.1.10.1 Restricted Character Sets )

Implementations are allowed to choose whether to use local or global table hit when both are available for representing string values. (see 7.3.3 Partitions Optimized for Frequent use of String Literals )

Pattern facets defined for built-in schema datatypes are disregarded when computing restricted character sets. (see 7.1.10.1 Restricted Character Sets )

The prefix string table partitions are made use of in the encoding of the prefix component of QName. (see 7.1.7 QName )

Changed the default value of schemaID, datatypeRepresentationMap and [user defined meta-data] options to "no default value" in Table 5-1 .

It is made clear that [user defined meta-data] option cannot modify the EXI format. (see 5.4 EXI Options )

The predicates used for AT and CH terminal symbols (i.e. those previously denoted as [schema-valid value] and [schema-invalid value]) have been changed to [schema-typed value] and [untyped value] to more accurately reflect the associated semantics.

Updated the section on content coding to reflect that the "exi" content coding is now in the IANA registry. (see F.1 Content Coding )

Added a non-terminal TypeEmpty _ur-type,₁ to the complex ur-type grammar TypeEmpty _ur-type, and defined the content index of the TypeEmpty _ur-type grammar. (see 8.5.4.1.3.3 Complex Ur-Type Grammar )

Removed the paragraph that described the processor conformance defining the minimal value range required of blockSize option support. (see 10.2 EXI Processor Conformance )

It was made clear that the datatypeRepresentationMap option does not take effect when the value of the preserve.lexicalValues fidelity option is true. (See the definition of datatypeRepresentationMap in 5.4 EXI Options )

It was made clear that the section 7.1.10.1 Restricted Character Sets only applies to datatypes derived from xsd:string.

The maximum size of the range used for determining the use of n -bit Unsigned Integer for representing values of xsd:integer was changed from 4095 to 4096. (see Table 7-1 and 7.1.9 n-bit Unsigned Integer ).

Corrected the semantics associated with the production of the form LeftHandSide : EE in section 8.4.3 Built-in Element Grammar .

Added ElementFragmentTypeEmpty grammar to the schema-informed element fragment grammar, and defined the content index of grammars ElementFragment and ElementFragmentTypeEmpty . (see 8.5.3 Schema-informed Element Fragment Grammar )

The target namespace name of the XML Schema for EXI options document was changed to "http://www.w3.org/2009/exi". (see C XML Schema for EXI Options Document )

Updated the semantics associated with xsi:type and xsi:nil attributes to ensure they are encoded property when the value of preserve.lexicalValues option is true.

Removed the reference to Unicode 5.0.0 character database. Section E Deriving Set of Characters from XML Schema Regular Expressions no longer has dependency on Unicode 5.0.0. A normative reference to XML Schema datatypes specification [XML Schema Datatypes] in regards to processing regular expressions effectively have character class escapes ^XS2 be each resolved into a set of characters using the exact version of XML and Unicode, namely XML 1.0 (Second Edition) and Unicode 3.1.0.

Added the "processContents" attribute with value "skip" to the wildcard terms in the XML Schema for EXI options document. (see C XML Schema for EXI Options Document )

Added a statement that suggests to turn on Preserve.prefixes fidelity option when the Preserve.lexicalValues fidelity option is true and xsi:type attributes are used in the EXI stream. (see 6.3 Fidelity Options )

The EXI datatype identifier exi:integer represents the Integer datatype representation, and Table 7-1 has been updated to accurately reflect the association. The rule previously applied to xsd:Integer datatype in that table is now described in section 7.1.5 Integer .

Two qname definitions "ieeeBinary32" and "ieeeBinary64" have been given in the EXI options document schema. (see C XML Schema for EXI Options Document )

Implentations are required to support all Unsigned Integer values less than 2147483648, whereas the threshold value was previously 4294967296. (see 7.1.6 Unsigned Integer )

I.2 Changes from Fourth Public Working Draft

Added the section F Content Coding and Internet Media Type .

I.2 I.3 Changes from Third Public Working Draft

Added the section 10. Conformance .

EXI Cookie was introduced in the EXI header in order to provide a facility to distinguish EXI streams from a broader range of document types used on the Web. (see 5.1 EXI Cookie )

Header options fields valueMaxLength and valuePartitionCapacity have beed added in EXI options to limit the length of a string and the total number of strings that are put into value partitions of a string table.

Added support for mixed and empty content model in complex type grammar generation. (see 8.5.4.1.3.2 Complex Type Grammars )

Described how built-in element grammars handle xsi:type and ~~xsi:type~~ xsi:nil attribute occurrences in schema-informed EXI streams. (see 8.4.3 Built-in Element Grammar )

Defined the grammar that represents xsd:anyType. (see 8.5.4.1.3.3 Complex Ur-Type Grammar )

Described the restricted character sets used for certain typed values when preserve.lexicalValues is true. (see Table 7-2 )

Added a section 8.5.3 Schema-informed Element Fragment Grammar describing the content of elements declared with the same qname when they occur inside an EXI fragment or EXI Element Fragment.

Added selfContained option for creating elements that can be indexed for random access. (see 5.4 EXI Options )

Added mention of Unsigned Integer size that EXI processors SHOULD and MUST support. (see 7.1.6 Unsigned Integer )

Changed the order of productions representing CH events during event code assignment. (see 8.5.4.3 Event Code Assignment )

Added semantics for empty "schemaID" element value. (see 5.4 EXI Options )

The term "CODEC" has been replaced by a more appropriate term "datatype representation" throughout this document.

Added support for parameterized attribute wildcards in the generation of complex type grammars. (see 8.5.4.1.3.2 Complex Type Grammars )

I.3 I.4 Changes from Second Public Working Draft

Added strict option (see 5.4 EXI Options and 8.5.4.4.2 Adding Productions when Strict is True ).

The order of content items for NS event has been corrected. (see Table 4-1 )

Improved the description of EXI Options document to make it clear that its representation does not start with an EXI header (see section 5.4 EXI Options ).

Reworked the section 8.5.4.4 Undeclared Productions for grammar system accuracy as well as describing how grammars are augumented with undeclared productions when strict option is true.

The indicator content item in NS event type has been renamed to"local-element-ns", and improved its description for clarity (see section 4. EXI Streams ).

The calculus of the MonthDay component used in Date-Time representation has been corrected (see section 7.1.8 Date-Time ).

TypeEmpty is created for each type grammars to facilitate xsi:nil handling (see section 8.5.4.1.3 Type Grammars ).

Described how the use of substitution groups in schemas affects the grammars (see section 8.5.4.1.6 Element Terms ).

Described how the namespace constraints of wildcard terms are factored into the grammars. (see section 8.5.4.1.7 Wildcard Terms ).

Regular expressions do not apply to constrain character sets when regexps contain certain category escapes (see section E Deriving ~~Character Sets~~ Set of Characters from XML Schema Regular Expressions ).

I.4 I.5 Changes from First Public Working Draft

Specified how schema-informed grammars are derived from available XML Schemas (see section 8.5.4 Schema-informed Element and Type Grammars ).

Described how QNames are represented with prefixes when prefix preservation is turned on (see section 7.1.7 QName ).

The alignment option was introduced in EXI Options to support pre-compression stream as well as plain byte-alignment stream, in addition to the default bit-packed representation.

Described how the presence of pattern facets affects the encoding of EXI Boolean (see section 7.1.2 Boolean ) and EXI String (see section 7.1.10 String ) representation.

Values typed as integer with bounded range of 4095 or smaller are now represented as n -bit Unsigned Integers (see section 7.1.9 n-bit Unsigned Integer ).

Added a section that lists additional items that have been advised and are currently under consideration.

J Acknowledgements (Non-Normative)

This document is the work of the Efficient XML Interchange (EXI) WG .

Members of the Working Group are (at the time of writing, sorted alphabetically by last name):

Carine Bournez, W3C/ERCIM ( staff contact )
Don Brutzman, Web3D Consortium
~~Alex Ceponkus, AgileDelta, Inc.~~ Michael Cokus, MITRE Corporation ( co-chair )
~~Roger Cutler, Chevron~~ Youenn Fablet, Canon, Inc.
~~Ed Day, Objective Systems,~~ Jun Fujisawa, Canon, Inc.
Joerg Heuer, Siemens AG
Alan Hudson, Web3D Consortium
Takuki Kamiya, Fujitsu Laboratories of America, Inc. ( co-chair )
Jaakko Kangasharju, University of Helsinki
Richard Kuntschke, Siemens AG
Don McGregor, Web3D Consortium
Daniel Peintner, Siemens AG
Santiago Pericas-Geertsen, Sun Microsystems, Inc.
Liam Quin, W3C/MIT ( staff contact )
Rich Rollman, AgileDelta, Inc.
Paul Sandoz, Sun Microsystems, Inc.
John Schneider, AgileDelta, Inc.
~~Young Wang, Intel Corporation~~ Sheldon Snyder, Web3D Consortium
Greg White, Stanford University ( former co-chair )

The EXI Working Group would like to acknowledge the following former members of the group for their leadership, guidance and expertise they provided throughout their individual tenure in the WG. (sorted alphabetically by last name)

Robin Berjon, Expway ( former co-chair ) (until 17 October 2006)
Oliver Goldman, Adobe Systems, Inc. ( former co-chair ) (until 08 June 2006)
Peter Haggar, IBM (until 07 March 2007)
Kimmo Raatikainen, Nokia (until 13 March 2008)
Paul Thorpe, OSS Nokalva, Inc. (until 11 Sept 2007)
Daniel Vogelheim, Invited Expert ( former co-chair then from Siemens AG) (until 15 July 2008)
Stephen Williams, High Performance Technologies, Inc. (until ~~30 June~~ 8 Aug 2008)
Ed Day, Objective Systems, Inc. (until 23 Oct 2009)

The EXI working group owes so much to our distinguished colleague from Nokia, Kimmo Raatikainen (1955-2008), on the progress of our work, who succumbed to an ailment on March 13, 2008. His breadth of knowledge, depth of insight, ingenuity and courage to speak up constantly shed a light onto us whenever the group seemed to stray into a futile path of disagreements during the course. We shall never forget and will always appreciate his presence in us, and great contribution that is omnipresent in every aspect of our work throughout.

Efficient XML Interchange (EXI) Format 1.0

W3C Working Draft 19 September 2008 Candidate Recommendation 08 December 2009

Abstract

Status of this Document

Table of Contents

Appendices

1. Introduction

1.1 History and Design

1.2 Notational Conventions and Terminology

2. Design Principles

3. Basic Concepts

4. EXI Streams

5. EXI Header

5.1 EXI Cookie

5.2 Distinguishing Bits

5.3 EXI Format Version

5.4 EXI Options

6. Encoding EXI Streams

6.1 Determining Event Codes

6.2 Representing Event Codes

6.3 Fidelity Options

7. Representing Event Content

7.1 Built-in EXI Datatype Representations

7.1.1 Binary

7.1.2 Boolean

7.1.3 Decimal

7.1.4 Float

7.1.5 Integer

7.1.6 Unsigned Integer

7.1.7 QName

7.1.8 Date-Time

7.1.9 n -bit Unsigned Integer

7.1.10 String

7.1.10.1 Restricted Character Sets

7.1.11 List

7.2 Enumerations

7.3 String Table

7.3.1 String Table Partitions

7.3.2 Partitions Optimized for Frequent use of Compact Identifiers

7.3.3 Partitions Optimized for Frequent use of String Literals

7.4 Datatype Representation Map

8. EXI Grammars

8.1 Grammar Notation

8.1.1 Fixed Event Codes

8.1.2 Variable Event Codes

8.2 Grammar Event Codes

8.3 Pruning Unneeded Productions

8.4 Built-in XML Grammars

8.4.1 Built-in Document Grammar

8.4.2 Built-in Fragment Grammar

8.4.3 Built-in Element Grammar

8.5 Schema-informed Grammars

8.5.1 Schema-informed Document Grammar

8.5.2 Schema-informed Fragment Grammar

8.5.3 Schema-informed Element Fragment Grammar

8.5.4 Schema-informed Element and Type Grammars

8.5.4.1 EXI Proto-Grammars

8.5.4.1.1 Grammar Concatenation Operator

8.5.4.1.2 Element Grammars

8.5.4.1.3 Type Grammars

8.5.4.1.3.1 SimpleType Simple Type Grammars

8.5.4.1.3.2 Complex Type Grammars

8.5.4.1.3.3 Complex Ur-Type Grammar

8.5.4.1.4 Attribute Uses

8.5.4.1.5 Particles

8.5.4.1.6 Element Terms

8.5.4.1.7 Wildcard Terms

8.5.4.1.8 Model Group Terms

8.5.4.1.8.1 Sequence Model Groups

8.5.4.1.8.2 Choice Model Groups

8.5.4.1.8.3 All Model Groups

8.5.4.2 EXI Normalized Grammars

8.5.4.2.1 Eliminating Productions with no Terminal Symbol

8.5.4.2.2 Eliminating Duplicate Terminal Symbols

8.5.4.3 Event Code Assignment

8.5.4.4 Undeclared Productions

8.5.4.4.1 Adding Productions when Strict is False

8.5.4.4.2 Adding Productions when Strict is True

9. EXI Compression

9.1 Blocks