Efficient XML Interchange (EXI) Profile

1. Introduction

Many device classes and use cases would like to use EXI as their exchange format. Because of various restrictions, some of these application areas cannot, or are not allowed to, rely on arbitrary memory growth at runtime. Evaluations of EXI in the context of such areas exposed challenges in keeping memory usage predictably within their respective limits.

This EXI profile document specifies rules that ensure these memory restrictions are respected while keeping compatibility with the EXI 1.0 specification. Section 2. Grammar Capping defines the mechanisms and parameters that limit grammar learning. Section 3. Local Value Capping defines the mechanisms to simplify value indexing. Section 4. Parameters representation defines how the parameters defined in the profile can be represented as part of the EXI header.

To keep EXI 1.0 compatibility, the EXI profile does not provide a specific mechanism to bound the memory used for name tables. The working group discussed strategies and rules that allow EXI processors to overcome this issue. The appendices (C Prefix Workarounds, D Name Table Workarounds) describe some of these implementation strategies and best-practice rules.

2. Grammar Capping

To disable grammar learning, the xsi:type attribute may be used to switch from an evolving built-in element grammar to a non-evolving schema-informed grammar. In particular, the xsd:anyType complex type can be used to represent arbitrary XML elements.

Note that the EXI profile can only limit grammar learning for schema-informed EXI streams but not for schema-less EXI streams. In the case where no schema is available, the "schemaId" element may be set to the empty value, so that all grammars derived from the built-in XML Schema types become available through xsi:type grammar switching, in particular the grammar corresponding to the xsd:anyType complex type.

Several prefixes are used throughout this document to designate certain namespaces. The bindings shown below are assumed; however, any prefixes can be used in practice if they are properly bound to the namespaces.

Prefix	Namespace Name
xsd	http://www.w3.org/2001/XMLSchema
xsi	http://www.w3.org/2001/XMLSchema-instance

2.1 Grammar Learning Disabling Mechanism

For a given element E, grammar learning is disabled by inserting an xsi:type attribute event with the xsd:anyType value as the first event after the SE event of the element E. If an element E already has an xsi:type attribute and grammar learning is disabled for the grammar representing the element E, the xsi:type attribute value MUST refer to a known schema-informed grammar that can represent the given element.

If grammar learning is disabled for an element E in the case of a new built-in element grammar G, the following rules apply:

An xsi:type attribute event is inserted with the xsd:anyType value after the SE event representing this element. The xsi:type attribute event MUST always be represented by the AT(*) production whose event code length is 2. The same rule applies to all elements that have the same QName and that are not represented by a schema-informed grammar.
For all elements following the element E in the EXI stream, that have the same QName as the element E and are represented by a built-in element grammar, an xsi:type attribute event is inserted with the xsd:anyType value directly after the corresponding SE event. The xsi:type attribute event MUST always be represented by the AT(*) production whose event code length is 2.

If grammar learning is disabled in the case of a production insertion in a given grammar, named G, the following rules apply:

All events of the given element E that remain to be represented are represented using productions whose event code length is 2. Note that in such a case, the EXI processor must still increment the event codes of the productions according to the rules defined in section 8.4.3 of the EXI 1.0 specification, but does not need to create and insert the top-level productions. In particular, the EXI processor may need to keep track of whether a CH production has already been inserted in the grammar G in order to maintain the number of top-level productions.
For all elements following this element E in the EXI stream and represented by the grammar G, an xsi:type attribute event is inserted with the xsd:anyType value directly after the corresponding SE event. The xsi:type attribute event MUST always be represented by the AT(*) production whose event code length is 2.

Once production insertion is disabled for a given grammar, the grammar use is restricted so that only the number of top level productions needs to be stored for that grammar.
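
As an illustration of the insertion rule in this section, the following Python sketch rewrites a stream of events so that an xsi:type attribute event with the xsd:anyType value directly follows every SE event of a QName for which grammar learning is disabled. The tuple-based event model, the function name insert_type_switches and the "disabled" set are assumptions made for this example only; they are not defined by the profile or by the EXI 1.0 specification. The sketch assumes that an existing xsi:type attribute, if any, is the first event after the SE event, and leaves it untouched.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical event model: ("SE", qname), ("AT", name, value), ("CH", text), ("EE",)
Event = Tuple[str, ...]
XSI_TYPE_ANY: Event = ("AT", "xsi:type", "xsd:anyType")


def insert_type_switches(events: Iterable[Event], disabled: Set[str]) -> Iterator[Event]:
    """Yield events, adding xsi:type="xsd:anyType" after SE events of disabled QNames."""
    pending = False  # True right after an SE event whose QName is in `disabled`
    for event in events:
        if pending:
            already_typed = event[0] == "AT" and event[1] == "xsi:type"
            if not already_typed:
                # First event after SE: switch to the xsd:anyType grammar.
                yield XSI_TYPE_ANY
            pending = False
        yield event
        if event[0] == "SE" and event[1] in disabled:
            pending = True
```

An element that already carries an xsi:type attribute is passed through unchanged; checking that its value refers to a known schema-informed grammar is left to the caller.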

2.2 Grammar Learning Disabling Parameters

[Definition: A built-in element grammar is considered to be an evolving built-in element grammar if a production has been dynamically inserted within the grammar. ]

To limit the growth in memory consumption due to grammar learning, the EXI profile makes it possible to limit the number of evolving built-in grammars and the number of inserted productions. Two parameters are defined for that purpose:

[Definition: The maximumNumberOfBuiltInElementGrammars option is the maximum number of elements for which evolving built-in element grammars can be instantiated. ]

[Definition: The maximumNumberOfBuiltInProductions option is the maximum number of top-level productions that can be dynamically inserted in built-in fragment and built-in element grammars. ] Note that only dynamically inserted top level productions are counted. In particular, the top level EE productions of the built-in ElementContent grammar are not counted since they are added when creating each ElementContent grammar.

Grammar learning is disabled if any of the following conditions is true:

The number of dynamically inserted top-level productions of all built-in fragment and element grammars is equal to or greater than the maximumNumberOfBuiltInProductions value.
The augmentation process would turn a grammar into a new evolving built-in element grammar while the number of already existing evolving built-in element grammars is equal to or greater than the maximumNumberOfBuiltInElementGrammars value.

Whenever the parameters are set, the rules above MUST be properly applied. The parameters may be left unspecified in the case of an application that wants to define a finer-grained control on the application of the grammar learning disabling mechanism. In such a case, the EXI processor will not set the two parameters defined and should use an out-of-band mechanism to convey the precise grammar disabling strategy in use. It may for instance be beneficial to apply grammar learning for elements that occur frequently and that are regular while disabling grammar learning for elements that occur rarely.
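
The two disabling conditions of this section can be sketched as a small counter object. This is a minimal illustration, not a normative algorithm: the class name GrammarLearningBudget, its method names, and the use of None to model an unbounded parameter are assumptions of this example, and grammars are identified here by a plain string key.

```python
from typing import Optional, Set


class GrammarLearningBudget:
    """Tracks the two counters bounded by the profile parameters (hypothetical helper)."""

    def __init__(self, max_grammars: Optional[int], max_productions: Optional[int]):
        # None models an unbounded parameter.
        self.max_grammars = max_grammars
        self.max_productions = max_productions
        self.evolving: Set[str] = set()  # grammars that already received a production
        self.inserted_productions = 0    # dynamically inserted top-level productions

    def may_insert_production(self, grammar_key: str) -> bool:
        """Return True if a top-level production may still be inserted."""
        if (self.max_productions is not None
                and self.inserted_productions >= self.max_productions):
            return False  # first condition: production budget exhausted
        if (grammar_key not in self.evolving
                and self.max_grammars is not None
                and len(self.evolving) >= self.max_grammars):
            return False  # second condition: no new evolving grammar allowed
        return True

    def record_insertion(self, grammar_key: str) -> None:
        """Account for one inserted top-level production."""
        self.evolving.add(grammar_key)
        self.inserted_productions += 1
```

A processor would call may_insert_production before each insertion and fall back to the disabling mechanism of section 2.1 when it returns False.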

3. Local Value Capping

Some classes of EXI processors may not be able to afford the cost of building local value table representations. This profile defines a parameter that can disable the use of local value references. Global value indexing may be controlled using the options defined in the EXI 1.0 specification.

[Definition: The localValuePartitions option is a Boolean used to indicate whether local value partitions are used. ] The value "0" indicates that no local value partition is used while "1" represents the behavior of the EXI 1.0 specification.

When the localValuePartitions value is set to "0", it is an error to represent a string value as a reference to an entry of a local value partition.
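
The effect of the parameter on an encoder can be sketched as follows. The function name, the set-based partitions and the returned labels are assumptions made for this illustration; the actual bit-level representation of hits and misses is the one defined in the EXI 1.0 specification and is not reproduced here.

```python
from typing import Set


def choose_string_representation(
    value: str,
    local_partition: Set[str],
    global_partition: Set[str],
    local_value_partitions: int,
) -> str:
    """Return "local-hit", "global-hit" or "miss" for a string value."""
    # A local reference is only allowed when localValuePartitions is 1.
    if local_value_partitions == 1 and value in local_partition:
        return "local-hit"
    if value in global_partition:
        return "global-hit"
    return "miss"
```

With localValuePartitions set to 0, a value that would have been a local hit is represented through the global partition when it is present there, and literally otherwise.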

Some processors may decide to use a fine-grained strategy for building and using local value tables. For instance, some processors may decide to use local value references based on the QName of the element. In such a case, the localValuePartitions parameter should not be set to any value and the fine-grained strategy should be exchanged by an out-of-band mechanism.

4. Parameters representation

The use of the EXI profile is advertised by encoding the following XML element in the user-defined meta-data section of the EXI options of an EXI stream:

                        <p xmlns="http://www.w3.org/2009/exi" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                            xsi:type="xsd:decimal">
                            ...
                        </p>

The content of this element indicates the value of the three parameters of the profile, encoded using a single decimal value. Each profile parameter is represented as follows:

The localValuePartitions parameter is encoded as the sign of the decimal value: the parameter is equal to 0 if the decimal value is positive and 1 if the decimal value is negative.
The maximumNumberOfBuiltInElementGrammars parameter is represented by the first unsigned integer corresponding to the integral portion of the decimal value: the maximumNumberOfBuiltInElementGrammars parameter is unbounded if the unsigned integer value is 0; otherwise it is equal to the unsigned integer value - 1.
The maximumNumberOfBuiltInProductions parameter is represented by the second unsigned integer corresponding to the fractional portion in reverse order of the decimal value: the maximumNumberOfBuiltInProductions parameter is unbounded if the unsigned integer value is 0; otherwise it is equal to the unsigned integer value - 1.
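
The three rules above can be illustrated with a small Python sketch that recovers the parameters from the lexical form of the decimal value. The function and type names are hypothetical, None is used here to model an unbounded parameter, and the sketch assumes that the second unsigned integer is obtained by reading the fractional digits in reverse order, as in the Decimal representation of the EXI 1.0 specification.

```python
from typing import NamedTuple, Optional


class ProfileParameters(NamedTuple):
    local_value_partitions: int                          # 0 or 1
    max_built_in_element_grammars: Optional[int]         # None means unbounded
    max_built_in_productions: Optional[int]              # None means unbounded


def _bounded(unsigned: int) -> Optional[int]:
    # 0 means unbounded; any other value n stands for the bound n - 1.
    return None if unsigned == 0 else unsigned - 1


def parse_profile_decimal(text: str) -> ProfileParameters:
    """Decode the content of the exi:p element given as a decimal string."""
    text = text.strip()
    negative = text.startswith("-")
    integral, _, fractional = text.lstrip("+-").partition(".")
    first = int(integral or "0")
    # The fractional digits are read in reverse order to form the second integer.
    second = int(fractional[::-1] or "0")
    return ProfileParameters(1 if negative else 0, _bounded(first), _bounded(second))
```

Under these assumptions, the value "-3.05" decodes to localValuePartitions = 1, maximumNumberOfBuiltInElementGrammars = 2 and maximumNumberOfBuiltInProductions = 49 (the fractional digits "05" read in reverse give 50).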

To indicate that the EXI profile is in use without advertising each parameter value, the exi:p element is encoded without any content, as follows:

                        <p xmlns="http://www.w3.org/2009/exi"/>

In such a case, the actual profile parameters (or fine-grained capping strategies) should be defined by an out-of-band mechanism.

Editorial note
Some working group members question the benefit of having maximumNumberOfBuiltInElementGrammars and maximumNumberOfBuiltInProductions parameters in the header. Need to decide whether it is useful to define special undefined values for the decimal unsigned integer values: 0=not-defined, 1= unbounded, above=max-2.

A References

Efficient XML Interchange (EXI) Format 1.0: Efficient XML Interchange (EXI) Format 1.0, John Schneider and Takuki Kamiya, Editors. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi/. (See http://www.w3.org/TR/2011/REC-exi-20110310/.)
EXI Evaluation Note: Efficient XML Interchange Evaluation, Carine Bournez, Editor. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-evaluation/. (See http://www.w3.org/TR/2009/WD-exi-evaluation-20090407/.)
XML Schema Datatypes: XML Schema Part 2: Datatypes Second Edition, Paul V. Biron and Ashok Malhotra, Editors. World Wide Web Consortium, 2 May 2001, revised 28 October 2004. The latest version is available at http://www.w3.org/TR/xmlschema-2. (See http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/.)

B Header Considerations

The processing of the EXI header may require memory allocation to handle grammar learning and value partitions. The processing of the EXI header should not exceed the memory requirements of the EXI body processing. In particular:

Built-in grammar learning should be kept to a minimum, optimally not requiring the storage of an element grammar beyond the scope of this element.
String partitions should not be needed to correctly parse the corresponding header.

C Prefix Workarounds

Users who want to preserve prefixes while capping memory requirements are advised to set their namespace declarations so that:

QName prefixes are encoded using 0 bits
Prefixes in NS events are always encoded literally

In those conditions, an EXI processor does not need to build the prefix indexing tables. In addition, generic XML parsers and XML serializers often store namespace mapping information and provide APIs to query it. In such a case, applications can retrieve the prefix based on the given namespace URI from the XML parser or serializer.

D Name Table Workarounds

Name tables cannot be restricted without breaking the compatibility with the EXI 1.0 specification. This section describes how an implementation may circumvent this issue by bringing the knowledge of the application to the EXI processor.

D.1 Name Table Encoder Workarounds

The application may declare to the EXI encoder all the QNames it is aware of. Amongst several possible optimizations, this allows the EXI encoder to pre-allocate statically the memory used to store all the possible QNames, be they pre-populated using schema knowledge or not. In addition the string representation of these QNames may also be shared between the application and the EXI processor.

If all the QNames used by the application are declared to the EXI encoder, no name table entry will be dynamically inserted and no additional memory allocation may be needed for the name tables. The EXI encoder will assign indexes dynamically for names that are not pre-populated by the schema knowledge to keep the compatibility with the EXI 1.0 specification. It will also need to keep track of the number of entries that are indexed to compute the binary representation size of entry indexes.

D.2 Name Table Decoder Workarounds

The application may declare to the EXI decoder all the QNames it is interested in. This allows the EXI decoder to pre-allocate the memory used to store these QNames, be they pre-populated using schema knowledge or not. In addition the string representation of these QNames may also be shared between the application and the EXI processor.

When a literally encoded URI or local name string occurs in an EXI stream, the EXI decoder may check whether it is of interest to the application. If so, the EXI processor will be able to dynamically assign an index to that string without allocating any new memory to store this name table entry.

When a URI or local name string that was NOT declared of interest by the application occurs in an EXI stream, the EXI processor may follow a behavior specified by the application: adding it to the name tables as defined in the EXI 1.0 specification if there is enough memory, incrementing the corresponding name table counter but not storing the string and skipping the corresponding event, raising an error, and so on.
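
The decoder strategy described in this section can be sketched for a single name table partition as follows. The class DeclaredNameTable and its methods are hypothetical names introduced for this illustration. The sketch implements the "increment the counter but do not store the string" option for undeclared names, and for brevity it ignores entries pre-populated from schema knowledge.

```python
from typing import List, Optional, Sequence


class DeclaredNameTable:
    """One name table partition restricted to names declared by the application."""

    def __init__(self, declared: Sequence[str]):
        # All storage is sized once from the declared names and never grows.
        self._names = tuple(declared)
        self._slot = {name: i for i, name in enumerate(self._names)}
        self._assigned: List[Optional[int]] = [None] * len(self._names)
        self.count = 0  # number of entries, including names whose string was dropped

    def on_literal(self, name: str) -> bool:
        """Register a literally encoded name; return True if it was declared."""
        index = self.count
        self.count += 1  # the index space advances even for undeclared names
        slot = self._slot.get(name)
        if slot is None:
            return False  # string not stored; the caller may skip the event
        self._assigned[slot] = index
        return True

    def lookup(self, index: int) -> Optional[str]:
        """Return the declared name bound to an index, or None if it was dropped."""
        for slot, assigned in enumerate(self._assigned):
            if assigned == index:
                return self._names[slot]
        return None
```

Keeping count up to date for every literal, declared or not, is what lets the decoder continue to compute the size of subsequent index references.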

E Grammar Restriction Considerations

E.1 Grammar Restriction Encoder Considerations

For a given QName, EXI encoders will generally need to retain some information related to the state of the corresponding built-in element grammar. In particular, they may need to know the following information for each QName that has no associated schema-informed grammar:

The number of top level productions of the grammar associated to that QName
Whether an xsi:type attribute must be inserted after an SE event of that QName
Whether an xsi:type event was already encoded using the grammar of that QName

Note that in the case of no grammar learning at all, this information may be stored as a boolean, representing whether that QName was already encoded as part of an SE event.
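
The per-QName information listed in this section could, for instance, be held in a record such as the one below. The field names and the new_state_table helper are assumptions of this sketch, not terms defined by the profile.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict


@dataclass
class BuiltInGrammarState:
    """Encoder-side state kept for a QName without schema-informed grammar."""
    top_level_productions: int = 0      # number of top-level productions
    must_insert_xsi_type: bool = False  # insert xsi:type after each SE event
    xsi_type_encoded: bool = False      # an xsi:type event was already encoded


def new_state_table() -> DefaultDict[str, BuiltInGrammarState]:
    """Create a table that lazily adds a default state for each new QName."""
    return defaultdict(BuiltInGrammarState)
```

A table created with new_state_table() returns a fresh default record the first time a QName is looked up, so the encoder only pays for QNames it actually meets.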

E.2 Grammar Restriction Decoder Considerations

Once grammar learning is disabled, the EXI decoder will not need to create any new built-in element grammar. The EXI decoder can directly deduce the corresponding grammar from the EXI stream:

If the first bit (in bit-packed mode) or byte (in byte-aligned mode) after the SE event of a QName is set to 0, the corresponding grammar is a new built-in element grammar with no top-level production at all.
If the first bit (in bit-packed mode) or byte (in byte-aligned mode) after the SE event of a QName is set to 1, the corresponding grammar is a built-in element grammar with one top level production, AT(xsi:type).

Note that the second part of the encoded production must be equal to 1, so as to refer to an AT(*) production.

In the case of built-in element grammars for which one or more productions were inserted before grammar learning was disabled and for which a production should have been inserted after grammar learning was disabled, the EXI decoder may only need to store the number of top-level productions of this grammar and whether the top-level productions already include an AT(xsi:type) production. The first part of the encoded production must be equal to the number of top-level productions. The second part of the encoded production must be equal to 1, so as to refer to an AT(*) production.

An EXI profile decoder SHOULD strip any xsi:type attribute with the xsd:anyType value from the infoset that corresponds to the grammar learning disabling mechanism.