This document describes a profile of the EXI 1.0 specification for devices with limited memory capacities.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the Last Call Public Working Draft of the Efficient XML Interchange (EXI) Profile specification. It is intended for review by W3C members and other interested parties. The document contains several minor updates to the previous version of this document. These updates ensure that EXI streams that follow this profile are fully conformant with the EXI 1.0 specification.
Please send comments about this document to the email@example.com mailing list (Archives). When preparing comments to send in, please provide a separate email message for each distinct issue to the extent possible. The Last Call review period for this document extends until 14 September 2012.
This document has been produced by the Efficient XML Interchange Working Group as part of the W3C XML Activity. The goals of the Efficient XML Interchange (EXI) Format are discussed in the Efficient XML Interchange (EXI) Format 1.0 document. The authors of this document are the members of the Efficient XML Interchange Working Group.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
B Header Considerations (Non-Normative)
C Prefix Workarounds (Non-Normative)
D Name Table Workarounds (Non-Normative)
D.1 Name Table Encoder Workarounds
D.2 Name Table Decoder Workarounds
E Grammar Restriction Considerations (Non-Normative)
E.1 Grammar Restriction Encoder Considerations
E.2 Grammar Restriction Decoder Considerations
Many device classes and use cases would benefit from EXI as their exchange format. Due to various restrictions, some of those application areas cannot, or are not allowed to, require arbitrary memory growth at runtime. Evaluations of EXI in such contexts exposed challenges in keeping memory usage predictably within the limited threshold of each device.
This EXI profile document specifies rules to ensure that the memory restrictions are respected while keeping compatibility with the EXI 1.0 specification. Section 2. Grammar Capping defines the mechanisms and parameters used to limit grammar learning. Section 3. Local Value Capping defines the mechanisms to simplify value indexing. Section 4. Parameters representation defines how the parameters defined in the profile can be represented as part of the EXI header.
To keep EXI 1.0 compatibility, the EXI profile does not provide a specific mechanism to bound the memory used for name tables. The working group discussed strategies and rules that allow EXI processors to overcome this issue. The appendix section (C Prefix Workarounds, D Name Table Workarounds) describes some of these implementation strategies and best practice rules.
To disable grammar learning, the xsi:type attribute may be used to switch from an evolving built-in element grammar to a non-evolving schema-informed grammar. In particular, the xsd:anyType complex type can be used to represent arbitrary XML elements.
Note that the EXI profile can only limit grammar learning for schema-informed EXI streams, not for schema-less EXI streams. Where no schema is available, the "schemaId" element may be set to the empty value, so that all grammars derived from the built-in XML Schema types become available through xsi:type grammar switching, in particular the grammar corresponding to the xsd:anyType complex type.
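Several prefixes are used throughout this document to designate certain namespaces. The following bindings are assumed; however, any prefixes can be used in practice if they are properly bound to the namespaces: exi is bound to http://www.w3.org/2009/exi, xsd to http://www.w3.org/2001/XMLSchema, and xsi to http://www.w3.org/2001/XMLSchema-instance.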
For a given element E, grammar learning is disabled by inserting an xsi:type attribute event with the xsd:anyType value as the first event after the SE event of the element E. If an element E already has an xsi:type attribute and grammar learning is disabled for the grammar representing the element E, the xsi:type attribute value MUST refer to a known schema-informed grammar that can represent the given element.
If grammar learning is disabled for an element E in the case of a new built-in element grammar G, the following rules apply:
An xsi:type attribute event, if not already present, is inserted with the xsd:anyType value after the SE event representing this element. The xsi:type attribute event MUST always be represented by the AT(*) production whose event code length is 2.
For all elements following the element E in the EXI stream, that have the same QName as the element E and are represented by a built-in element grammar, an xsi:type attribute event, if not already present, is inserted with the xsd:anyType value directly after the corresponding SE event. The xsi:type attribute event MUST always be represented by the AT(*) production whose event code length is 2.
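The insertion rule above can be illustrated with a minimal Python sketch. The event representation (tuples such as ("SE", qname)) and the function name insert_any_type are hypothetical and not part of the profile. The sketch assumes that capped_qnames contains exactly the QNames of elements represented by built-in element grammars for which grammar learning is disabled, and that an existing xsi:type attribute, if any, directly follows its SE event.

```python
# Hypothetical event model: ("SE", qname), ("AT", qname, value), ("CH", text), ("EE",)
XSI_TYPE = ("http://www.w3.org/2001/XMLSchema-instance", "type")
XSD_ANY_TYPE = ("http://www.w3.org/2001/XMLSchema", "anyType")


def insert_any_type(events, capped_qnames):
    """Return a new event list in which every SE event whose QName is in
    capped_qnames is directly followed by AT(xsi:type) = xsd:anyType,
    unless an xsi:type attribute is already present at that position."""
    result = []
    for position, event in enumerate(events):
        result.append(event)
        if event[0] != "SE" or event[1] not in capped_qnames:
            continue
        following = events[position + 1] if position + 1 < len(events) else None
        already_typed = (
            following is not None
            and following[0] == "AT"
            and following[1] == XSI_TYPE
        )
        if not already_typed:
            result.append(("AT", XSI_TYPE, XSD_ANY_TYPE))
    return result
```

The sketch only rewrites the event sequence; selecting the AT(*) production with an event code of length 2 remains the responsibility of the encoder itself.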
If grammar learning is disabled in the case of a production insertion in a given grammar, named G, the following rules apply:
All events to be encoded that can be represented by top-level productions already inserted in the grammar G are represented by the corresponding top-level productions.
All events of the given element E that remain to be represented are represented using productions whose event code length is 2. Note that in such a case, the EXI processor must still increment the event codes of the productions according to the rules defined in section 8.4.3 of the EXI 1.0 specification, but does not need to create and insert the top-level productions. In particular, the EXI processor may need to keep track of whether a CH production has already been inserted in the grammar G, to keep the number of top-level productions consistent with the rules defined in section 8.4.3 of the EXI 1.0 specification.
For all elements following this element E in the EXI stream and represented by the grammar G, an xsi:type attribute event MAY be inserted with the xsd:anyType value directly after the corresponding SE event. The xsi:type attribute event MUST always be represented by the AT(*) production whose event code length is 2.
Once production insertion is disabled for a given grammar, the use of that grammar is restricted so that only the productions already inserted and the number of top-level productions need to be stored for it. Note that even if production insertion is disabled for a given grammar, the productions inserted before that point may be used throughout the whole document. Additional information, in particular with regard to implementation strategy and impact, is available in the appendix section E Grammar Restriction Considerations.
To limit the growth in memory consumption due to grammar learning, the EXI profile makes it possible to limit the number of evolving built-in grammars and the number of inserted productions. Two parameters are defined for that purpose:
[Definition: The maximumNumberOfBuiltInElementGrammars option is the maximum number of built-in element grammars to which productions other than AT(xsi:type) productions have been dynamically added. ]
[Definition: The maximumNumberOfBuiltInProductions option is the maximum number of top-level productions, excluding AT(xsi:type) productions, that can be dynamically inserted in built-in element grammars. ]
Grammar learning is disabled for an element E for which a new built-in element grammar G must be created if the following condition is true:
The number of built-in element grammars to which productions other than AT(xsi:type) productions have been dynamically added is greater than or equal to the maximumNumberOfBuiltInElementGrammars value.
Grammar learning is disabled in the case of a production insertion if the following condition is true:
The total number of dynamically inserted top-level productions across all built-in element grammars is greater than or equal to the maximumNumberOfBuiltInProductions value.
Whenever the parameters are set, the rules above MUST be properly applied. The parameters may be left unspecified in the case of an application that wants to define a finer-grained control on the application of the grammar learning disabling mechanism. In such a case, the EXI processor will not set the two parameters defined and should use an out-of-band mechanism to convey the precise grammar disabling strategy in use. It may for instance be beneficial to apply grammar learning for elements that occur frequently and that are regular while disabling grammar learning for elements that occur rarely.
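The two conditions above can be tracked with a pair of counters. The following Python sketch is illustrative only: the class and method names are hypothetical, None is used here to mean that a parameter is unbounded, and the caller is assumed to report only insertions of top-level productions other than AT(xsi:type).

```python
from typing import Optional


class GrammarCappingState:
    """Counters for the two grammar capping conditions (hypothetical helper)."""

    def __init__(self, max_grammars: Optional[int] = None,
                 max_productions: Optional[int] = None):
        self.max_grammars = max_grammars        # maximumNumberOfBuiltInElementGrammars
        self.max_productions = max_productions  # maximumNumberOfBuiltInProductions
        self.evolved_grammars = 0       # grammars with a non-AT(xsi:type) production added
        self.inserted_productions = 0   # top-level productions inserted, AT(xsi:type) excluded

    def new_grammar_learning_disabled(self) -> bool:
        """Condition checked when a new built-in element grammar must be created."""
        return (self.max_grammars is not None
                and self.evolved_grammars >= self.max_grammars)

    def production_insertion_disabled(self) -> bool:
        """Condition checked before inserting a top-level production."""
        return (self.max_productions is not None
                and self.inserted_productions >= self.max_productions)

    def record_insertion(self, first_for_grammar: bool) -> None:
        """Record one inserted top-level production other than AT(xsi:type).
        first_for_grammar is True when it is the first such production
        added to its grammar."""
        self.inserted_productions += 1
        if first_for_grammar:
            self.evolved_grammars += 1
```

An encoder and a decoder that apply the same counters to the same stream reach the same decisions, which is what allows the limits to be conveyed as two integers.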
Some classes of EXI processors may not afford the cost of building local value table representations. This profile defines a parameter that can disable the use of local value references. Global value indexing may be controlled using the options defined in the EXI 1.0 specification.
[Definition: The localValuePartitions option is a Boolean used to indicate whether local value partitions are used. ] The value "0" indicates that no local value partition is used while "1" represents the behavior of the EXI 1.0 specification.
When localValuePartitions is set to "0", it is an error to represent a string value as a reference to an entry of a local value partition.
Some processors may adopt a fine-grained strategy for building and using local value tables. For instance, some processors may decide to use local value references based on the QName of the element. In such a case, localValuePartitions should not be set to any value and the fine-grained strategy should be exchanged by an out-of-band mechanism.
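The effect of localValuePartitions on an encoder can be sketched as follows. This Python example is a simplification under stated assumptions: the function name and the tuple results are hypothetical, partitions are modelled as plain lists, and the local hit, global hit and miss cases are assumed to follow the string value representation of the EXI 1.0 specification, ignoring its valueMaxLength and valuePartitionCapacity options.

```python
def represent_string_value(value, local_partition, global_partition,
                           local_value_partitions=1):
    """Choose how a string value is represented.

    Returns ("local-hit", index), ("global-hit", index) or ("miss", value).
    With local_value_partitions == 0 a local hit is never produced and the
    local partition is never filled."""
    if local_value_partitions == 1 and value in local_partition:
        return ("local-hit", local_partition.index(value))
    if value in global_partition:
        return ("global-hit", global_partition.index(value))
    # Miss: the literal is encoded and the value becomes available for later hits.
    global_partition.append(value)
    if local_value_partitions == 1:
        local_partition.append(value)
    return ("miss", value)
```

With the parameter set to 0, a repeated value is found through the global partition only, so no per-QName table has to be kept.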
The use of the EXI profile is advertised by encoding the following XML element in the user-defined meta-data section of the EXI options of an EXI stream:
<p xmlns="http://www.w3.org/2009/exi" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xsd:decimal"> ... </p>
The content of this element indicates the value of the three parameters of the profile, encoded using a single decimal value. Each profile parameter is represented as follows:
The localValuePartitions parameter is encoded as the sign of the decimal value: the parameter is equal to 0 if the decimal value is positive and 1 if the decimal value is negative.
The maximumNumberOfBuiltInElementGrammars parameter is represented by the first unsigned integer, corresponding to the integral portion of the decimal value: the maximumNumberOfBuiltInElementGrammars parameter is unbounded if the unsigned integer value is 0; otherwise it is equal to the unsigned integer value - 1.
The maximumNumberOfBuiltInProductions parameter is represented by the second unsigned integer corresponding to the fractional portion in reverse order of the decimal value: the maximumNumberOfBuiltInProductions parameter is unbounded if the unsigned integer value is 0; otherwise it is equal to the unsigned integer value - 1.
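The mapping between the three parameters and the single decimal value can be sketched in Python as below. The names ProfileParameters, encode_profile_decimal and decode_profile_decimal are hypothetical. The sketch works on the lexical form of the decimal, uses None for an unbounded parameter, treats any value without a leading minus sign as positive, and assumes that a negative value whose digits are all zero is written as "-0.0".

```python
from typing import NamedTuple, Optional


class ProfileParameters(NamedTuple):
    local_value_partitions: int      # 0 or 1
    max_grammars: Optional[int]      # None means unbounded
    max_productions: Optional[int]   # None means unbounded


def _to_unsigned(limit: Optional[int]) -> int:
    # 0 stands for "unbounded"; any other limit is stored incremented by 1.
    return 0 if limit is None else limit + 1


def _from_unsigned(value: int) -> Optional[int]:
    return None if value == 0 else value - 1


def encode_profile_decimal(params: ProfileParameters) -> str:
    sign = "-" if params.local_value_partitions == 1 else ""
    integral = str(_to_unsigned(params.max_grammars))
    # The fractional digits are the second unsigned integer in reverse order.
    fractional = str(_to_unsigned(params.max_productions))[::-1]
    return f"{sign}{integral}.{fractional}"


def decode_profile_decimal(text: str) -> ProfileParameters:
    text = text.strip()
    negative = text.startswith("-")
    integral, _, fractional = text.lstrip("+-").partition(".")
    first = int(integral or "0")
    second = int(fractional[::-1]) if fractional else 0
    return ProfileParameters(1 if negative else 0,
                             _from_unsigned(first),
                             _from_unsigned(second))
```

For instance, under these assumptions a limit of 4 grammars and 9 productions with localValuePartitions equal to 0 yields the integral portion 5 and the second unsigned integer 10, whose digits reversed give the fractional portion "01", hence the lexical value "5.01".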
To indicate that the EXI profile is in use without advertising each parameter value, the exi:p element is encoded without any content, that is, as an empty element: <p xmlns="http://www.w3.org/2009/exi"/>
In such a case, the actual profile parameters (or fine-grained capping strategies) should be defined by an out-of-band mechanism.
The processing of the EXI header may require memory allocation to handle grammar learning and value partitions. It is advised that the processing of the EXI header does not exceed the memory requirements of the EXI body processing. In particular:
Built-in grammar learning should be kept to a minimum, optimally not requiring the storage of an element grammar beyond the scope of that element.
String partitions should not be needed to correctly parse the corresponding header.
Users who want to preserve prefixes while capping memory requirements are advised to set their namespace declarations so that:
QName prefixes are encoded using 0 bits
Prefixes in NS events are always encoded literally
In those conditions, an EXI processor does not need to build the prefix indexing tables. In addition, generic XML parsers and XML serializers often store namespace mapping information and provide APIs to query it. In such a case, applications can retrieve the prefix based on the given namespace URI from the XML parser or serializer.
Name tables cannot be restricted without breaking the compatibility with the EXI 1.0 specification. This section describes how an implementation may circumvent this issue by bringing the knowledge of the application to the EXI processor.
The application may declare to the EXI encoder all the QNames it is aware of. Amongst several possible optimizations, this allows the EXI encoder to pre-allocate statically the memory used to store all the possible QNames, be they pre-populated using schema knowledge or not. In addition the string representation of these QNames may also be shared between the application and the EXI processor.
If all the QNames used by the application are declared to the EXI encoder, no name table entry will be dynamically inserted and no additional memory allocation is needed for the name tables. To keep compatibility with the EXI 1.0 specification, the EXI encoder will assign indexes dynamically for names that are not pre-populated by the schema knowledge. It will also need to keep track of the number of indexed entries to compute the size of the binary representation of entry indexes.
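The encoder-side bookkeeping described above can be sketched in Python for the local names of a single URI. The class DeclaredLocalNameTable is hypothetical. Python cannot demonstrate static allocation, so the sketch only shows the logic: all slots are known up front, indexes are assigned on first use, and the entry count drives the index width. The width is assumed here to be ceil(log2(m)) bits for a partition of m entries, as for local-name compact identifiers in EXI 1.0.

```python
class DeclaredLocalNameTable:
    """Local-name partition whose possible entries are all declared up front."""

    def __init__(self, schema_names, declared_names):
        # Names pre-populated by schema knowledge are indexed from the start,
        # in the order given by the caller.
        self._index = {name: i for i, name in enumerate(schema_names)}
        # Names declared by the application but not yet indexed.
        self._pending = {n for n in declared_names if n not in self._index}
        self.entry_count = len(self._index)

    def lookup_or_assign(self, name):
        """Return (index, was_hit). A miss assigns the next free index."""
        if name in self._index:
            return self._index[name], True
        if name not in self._pending:
            raise KeyError(f"undeclared name: {name!r}")
        self._pending.discard(name)
        self._index[name] = self.entry_count
        self.entry_count += 1
        return self._index[name], False

    def index_bit_width(self) -> int:
        """Bits needed for an index, assuming ceil(log2(entry_count))."""
        if self.entry_count == 0:
            return 0
        return (self.entry_count - 1).bit_length()
```

A name reported as a miss is still encoded literally in the stream; only the storage for it was reserved in advance.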
The application may declare to the EXI decoder all the QNames it is interested in. This allows the EXI decoder to pre-allocate the memory used to store these QNames, be they pre-populated using schema knowledge or not. In addition the string representation of these QNames may also be shared between the application and the EXI processor.
When a literally encoded URI or local name string occurs in an EXI stream, the EXI decoder may check whether it is of interest to the application. If so, the EXI processor can dynamically assign an index to that string without allocating any new memory to store this name table entry.
When a URI or local name string that was not declared of interest by the application occurs in an EXI stream, the EXI processor may follow a behavior specified by the application, for example: adding the string to the name tables as defined in the EXI 1.0 specification if there is enough memory; incrementing the corresponding name table counter without storing the string and skipping the corresponding event; or raising an error.
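The decoder-side behaviors listed above can be sketched as a policy applied to each literally encoded name. The names UndeclaredPolicy and DecoderNameTable are hypothetical, and spare_capacity is a stand-in for "enough memory". When the STORE policy runs out of spare capacity, this sketch falls back to counting only, which is one possible choice among others.

```python
from enum import Enum


class UndeclaredPolicy(Enum):
    STORE = "store"            # add to the name table while memory allows
    COUNT_ONLY = "count-only"  # increment the counter, do not store the string
    ERROR = "error"            # reject the stream


class DecoderNameTable:
    """Name table partition of a decoder that knows the names of interest."""

    def __init__(self, declared, policy, spare_capacity=0):
        self._declared = frozenset(declared)
        self._policy = policy
        self._spare = spare_capacity
        self.count = 0     # number of indexed entries, stored or not
        self.stored = {}   # index -> string, only for entries that were kept

    def on_literal(self, name):
        """Handle a literally encoded name. Returns (index, stored)."""
        index = self.count
        keep = name in self._declared
        if not keep:
            if self._policy is UndeclaredPolicy.ERROR:
                raise ValueError(f"undeclared name in stream: {name!r}")
            if self._policy is UndeclaredPolicy.STORE and self._spare > 0:
                self._spare -= 1
                keep = True
        self.count += 1
        if keep:
            self.stored[index] = name
        return index, keep
```

Keeping the counter accurate even for strings that are not stored is what keeps later index widths consistent with the stream.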
If local value string entries are not disabled, the EXI processor needs to store the number of local value partition entries.
For a given QName, EXI encoders will generally need to retain some information related to the state of the corresponding built-in element grammar. In particular, they may need to know the following information for each QName that has no associated schema-informed grammar:
The number of top level productions of the grammar associated to that QName
Whether an xsi:type attribute must be inserted after an SE event of that QName
Whether an xsi:type event was already encoded using the grammar of that QName
Note that in the case of no grammar learning at all, this information may be stored as a boolean, representing whether that QName was already encoded as part of an SE event.
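The per-QName information listed above fits in a small record. The following Python sketch uses hypothetical names (BuiltInGrammarState, EncoderGrammarStates) and models a QName as a (URI, local name) pair. The method second_level_first_part reflects the rule that the first part of a second-level event code equals the number of top-level productions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

QName = Tuple[str, str]  # (namespace URI, local name)


@dataclass
class BuiltInGrammarState:
    top_level_productions: int = 0          # number of top-level productions
    must_insert_xsi_type: bool = False      # insert xsi:type after SE of this QName
    xsi_type_already_encoded: bool = False  # an xsi:type event was already encoded

    def second_level_first_part(self) -> int:
        """First part of the event code of any second-level production."""
        return self.top_level_productions


class EncoderGrammarStates:
    """State kept for QNames that have no schema-informed grammar."""

    def __init__(self) -> None:
        self._states: Dict[QName, BuiltInGrammarState] = {}

    def state_for(self, qname: QName) -> BuiltInGrammarState:
        # A QName seen for the first time starts with a fresh grammar state.
        return self._states.setdefault(qname, BuiltInGrammarState())
```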
In the case of built-in element grammars in which one or more productions were inserted before grammar learning was disabled, and in which a production would have been inserted after grammar learning was disabled for a given element E, encoders must use second-level productions to encode all remaining events of that element E that cannot be represented using already inserted productions. Note that every AT(*) or SE(*) second-level production encoded will cause EXI decoders that are unaware of the EXI profile to insert a production in the corresponding grammar. The number of inserted productions will grow with the number of attributes or start element events that cannot be represented using already inserted productions. This number may be arbitrarily large. Encoders need to take that into account when encoding documents so that decoders that are unaware of the EXI profile can still properly decode the documents. For instance, encoders may decide to use xsi:type based grammar switching to limit the size of such a grammar.
Once grammar learning is disabled, the EXI decoder will not need to create any new built-in element grammar. For any such grammar, the EXI decoder can directly deduce the corresponding grammar state from the EXI stream:
In bit-packed mode:
If the first bit after the SE event of a QName is set to 0, the corresponding grammar is a new built-in element grammar with no top-level production at all.
If the first bit after the SE event of a QName is set to 1, the corresponding grammar is a built-in element grammar with one top level production, AT(xsi:type).
In byte-aligned mode:
The first byte after the SE event of a QName must be equal to 1.
If the second byte after the SE event is 1, the corresponding grammar is a built-in element grammar with one top level production, AT(xsi:type).
Otherwise, the second byte after the SE event is equal to 3 and the corresponding grammar is a new built-in element grammar with no top-level production at all.
Note that the second part of the encoded production must be equal to 1, so as to refer to an AT(*) production.
In the case of built-in element grammars in which one or more productions were inserted before grammar learning was disabled, and in which a production would have been inserted after grammar learning was disabled, the EXI decoder may only need to store the number of top-level productions of this grammar in addition to the already inserted productions. It will also need to check whether the top-level productions already include an AT(xsi:type) production. The first part of the encoded production must be equal to the number of top-level productions. The second part of the encoded production must be equal to 1, so as to refer to an AT(*) production.
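The deduction rules above translate directly into two small checks. The function names in this Python sketch are hypothetical. Each function returns the number of top-level productions of the built-in element grammar (0 for a new grammar, 1 for a grammar holding only AT(xsi:type)) and raises an error for values that the rules above do not allow.

```python
def grammar_state_bit_packed(first_bit: int) -> int:
    """Bit-packed mode: inspect the first bit after the SE event."""
    if first_bit not in (0, 1):
        raise ValueError("a bit must be 0 or 1")
    # 0: new grammar without top-level production; 1: one AT(xsi:type) production.
    return first_bit


def grammar_state_byte_aligned(first_byte: int, second_byte: int) -> int:
    """Byte-aligned mode: inspect the first two bytes after the SE event."""
    if first_byte != 1:
        raise ValueError("first byte after the SE event must be 1")
    if second_byte == 1:
        return 1  # grammar with one top-level production, AT(xsi:type)
    if second_byte == 3:
        return 0  # new grammar with no top-level production
    raise ValueError("second byte after the SE event must be 1 or 3")
```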
An EXI profile decoder SHOULD strip any xsi:type attribute with the xsd:anyType value from the infoset that corresponds to the grammar learning disabling mechanism.