From W3C EXI WG's Public Wiki


This is a compilation of Frequently Asked Questions from developers implementing EXI applications.

General Questions

Q: Why is EXI needed? Why can't a standard compression technology such as gzip be used?

A: Standard compression technologies such as gzip have two main problems: a) they require a great deal of computational resources to perform the compression, and b) the compressed result is not streamable. A user of the technology would have to fully restore a compressed XML document to XML text form before being able to process it. This is a limiting factor in many constrained environments.

It was also discovered in testing that, in certain cases, the gzip-compressed version of an XML document turned out to be larger than the original. The EXI format, by contrast, was found to always produce a smaller document.

Q: What about the ISO Fast Infoset standard? Wasn't this designed to handle this issue?

A: It was, and it was a candidate for the W3C recommendation, as were other viable formats. Its standing as an existing standard was counted as a strong positive in the evaluation process. In the end, however, it was decided that the base format ultimately chosen covered the use cases much better than Fast Infoset did.

Q: Is there a simple example of an XML document and corresponding EXI encoding available that can be used as a reference to get started?

A: Yes, a good example can be found in the primer and at the following location: http://www.w3.org/XML/EXI/tutorial/exi-examples.html.

Q: Is there any test data available so that I can test my implementation for interoperability?

A: Yes, a set of EXI-encoded documents is now available at the following URL: http://www.movesinstitute.org/exi

Implementor Questions

String Table

Q: When developing the string table, do all strings go into the table regardless of size?

A: The specification defines a maximum string length option (valueMaxLength). If it is set, only strings whose length is less than or equal to this value are added to the table. If it is not set, all strings are added to the table.
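
The length rule can be illustrated with a minimal sketch. The class and method names below are hypothetical, and the model is deliberately simplified: a single table with no capacity limit and no partitioning, so it shows only the maximum-length check and not the full string table behavior.

```python
from typing import Dict, Optional


class ValueTable:
    """Simplified value string table (one partition, unbounded capacity)."""

    def __init__(self, value_max_length: Optional[int] = None) -> None:
        # None models "not set": every string is eligible for the table.
        self.value_max_length = value_max_length
        self.ids: Dict[str, int] = {}

    def lookup(self, value: str) -> Optional[int]:
        """Return the compact ID of a previously stored string, if any."""
        return self.ids.get(value)

    def offer(self, value: str) -> bool:
        """Add the string if it is eligible; return True when it was added."""
        if value in self.ids:
            return False  # already known, nothing to add
        if (self.value_max_length is not None
                and len(value) > self.value_max_length):
            return False  # too long: it is written out literally each time
        self.ids[value] = len(self.ids)
        return True
```

With ValueTable(value_max_length=5), offering "short" stores it under ID 0, while "too long" is rejected and would be encoded as a literal on every occurrence.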

Q: Is an encoder considered to be conformant if it chooses to simply ignore the string table? All strings can be encoded with standard length prefixes in this case.

A: This is not a recommended practice, because there is a standard way of specifying that the string table is not used. If the encoder simply ignores the table, a decoder has no way of knowing this and will make unnecessary memory allocations to store strings. This, however, would not make the EXI processor non-conformant.

EXI Header

Q: The options field in the EXI header is not required to be present. What should a conformant EXI processor do if this field is not present?

A: If the field is not present, it is assumed that the options were agreed upon between sender and receiver in an out-of-band way. If this was not done and a receiver encounters an EXI encoding without options, it should raise an exception and not attempt to continue decoding the message. It cannot assume that the default options were used; use of the defaults is signaled by the options field being present but empty.
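
The decision described above might be sketched as follows. The function name, the dictionary representation of the options, and the two entries in DEFAULT_OPTIONS are assumptions made for illustration; a real processor would cover the full set of EXI options.

```python
from typing import Mapping, Optional

# Two option defaults, listed for illustration only (not the full set).
DEFAULT_OPTIONS = {"strict": False, "compression": False}


def resolve_options(
    header_options: Optional[Mapping[str, object]],
    out_of_band_options: Optional[Mapping[str, object]] = None,
) -> dict:
    """Pick the options a decoder should use for one EXI stream.

    header_options is None when the options field is absent from the
    header, and a (possibly empty) mapping when it is present.
    """
    if header_options is not None:
        # Field present: whatever it omits takes its default value,
        # so an empty field means "all defaults".
        return {**DEFAULT_OPTIONS, **header_options}
    if out_of_band_options is not None:
        # Field absent: rely on what sender and receiver agreed on beforehand.
        return dict(out_of_band_options)
    # Field absent and nothing agreed: do not guess, stop decoding.
    raise ValueError("EXI options absent and no out-of-band agreement")
```

Here resolve_options({}) yields the defaults, whereas resolve_options(None) raises instead of silently assuming them.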

Q: Is schemaID specified in a thorough enough way as to ensure that the wrong version of a schema is not used to decode an EXI message? Using the wrong version of a schema can have the undesirable side effect of allowing a decoder to successfully decode an EXI instance, but return invalid data.

A: The group has determined that it is beyond the scope of the specification to mandate how the schema ID is specified. There are certain use cases where a full URI would be too large. For this reason, how it is specified is left as an implementation detail.

Q: Where is the character encoding for an XML document (i.e. the encoding field in the XML header) maintained in EXI?

A: The group has determined that the character encoding used for an XML document is not a significant detail, and therefore it is not preserved. When an EXI document is decoded back to XML form, any character encoding can be used (normally, the encoding of the locale on the target platform is used).

Schema-Informed Grammar

Q: If schema-informed encoding is used, does exactly the same schema need to be used to decode the message? In other words, is schema versioning supported in such a way as to allow a different version of the schema to be used between encoder and decoder?

A: The extensions and deviations feature of EXI provides the capability to encode/decode an XML document that does not exactly match the schema. However, if an EXI document is encoded in schema-informed mode with one version of a schema, then that exact same version of the schema must be used to decode the document. The only way around this is to use schemaless mode, or to use wildcards (xsd:any, xsd:anyAttribute) in places where future elements/attributes are expected.

Q: How is schema mapping to be handled on mobile devices? It would seem the infrastructure required to support the XSD grammars would be far too large for use on a mobile device.

A: It is assumed that a compact schema representation of some sort would be required for this type of environment. At this time, this is regarded as an implementation detail; however, there is an item under consideration to define a standard compact mapping.

Q: In the notebook example in the primer, why is a bit added for the SE(notebook) event in strict mode? Since this is the only global element declared, should it not be encoded in a zero bit field?

A: The first bit is used to distinguish between SE(notebook) and SE(*). SE(*) is required in the document and fragment grammars when strict=true to maintain conformance with the XML schema specification.
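
The arithmetic behind that single bit can be sketched as below, assuming bit-packed alignment, where an event code part that must distinguish n alternatives occupies the smallest number of bits able to hold n distinct values. The function name is hypothetical.

```python
def event_code_bits(alternatives: int) -> int:
    """Bits needed to tell `alternatives` distinct event codes apart."""
    if alternatives < 1:
        raise ValueError("a grammar state needs at least one production")
    # ceil(log2(alternatives)), computed with integer arithmetic
    return (alternatives - 1).bit_length()
```

With SE(notebook) as the only alternative, event_code_bits(1) is 0. Once SE(*) is also present there are two alternatives, and event_code_bits(2) is 1.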

XML Schema Type Encoding

Q: In Decimal/Float encoding, is there any easy way to go from floating point format to EXI format? It seems it is necessary to first convert to string form. For example, "123.4567" as float. One needs to determine 1234567 is the first integer and -4 the second.

A: There is presently no standardized, trivial way to go from IEEE 754, as used in Java for instance, to the EXI Float format. At the time of writing, which is before any typed APIs exist, this is not considered a serious drawback, because the encoding/decoding algorithm is not hard. However, it is anticipated that when adoption of EXI motivates the production of typed APIs and data binding, it will become important to some users to communicate floating point data in a way that minimizes the overhead of directly encoding from, and decoding to, native machine format (for instance, for high performance web services). At that time, users for whom such efficiency is critical are likely to use the EXI facility of user-defined datatype representations for encoding floats, plus EXI Lists and application-level metadata, to define floating point aggregates such as arrays, matrices, sparse matrices, mesh grids, and so on.
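
As an illustration of how simple the conversion is, the sketch below splits a decimal literal into the two integers mentioned in the question. It still goes through the string form, as the question suggests, and uses Python's standard decimal module; the function name is hypothetical, and special values (INF, NaN) and any range limits of the EXI format are left out.

```python
from decimal import Decimal
from typing import Tuple


def to_mantissa_exponent(text: str) -> Tuple[int, int]:
    """Split a decimal literal into (mantissa, base-10 exponent)."""
    d = Decimal(text)
    if not d.is_finite():
        raise ValueError("INF and NaN need separate handling")
    sign, digits, exponent = d.as_tuple()
    mantissa = int("".join(map(str, digits)))
    return (-mantissa if sign else mantissa, exponent)
```

For the literal in the question, to_mantissa_exponent("123.4567") returns (1234567, -4). A native float can be passed through repr() first, for example to_mantissa_exponent(repr(123.4567)).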

Q: How do I use IEEE754 floats in EXI?

A: The Datatype Representation Map capability must be used for this. At this time, there is no standard Datatype Representation Map available; however, the group is considering developing one for this purpose, since it is an often-requested feature.

Q: Are integers encoded in big-endian or little-endian format?

A: Little-endian format is used for integer encodings. The byte order is the same in both bit-packed and byte-aligned modes.
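
A minimal sketch of this ordering for the variable-length Unsigned Integer follows. It assumes that each octet carries 7 value bits, that the most significant bit of an octet flags that another octet follows, and that the least significant group comes first. The function names are hypothetical, and the sketch works on whole octets only, so it does not show how the octets are packed into a bit stream.

```python
def encode_unsigned(value: int) -> bytes:
    """Encode a non-negative integer, least significant 7-bit group first."""
    if value < 0:
        raise ValueError("unsigned integers cannot be negative")
    out = bytearray()
    while True:
        group = value & 0x7F          # low 7 bits of what is left
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow
        else:
            out.append(group)         # last group: top bit clear
            return bytes(out)


def decode_unsigned(data: bytes) -> int:
    """Decode one unsigned integer from the start of `data`."""
    value = 0
    shift = 0
    for octet in data:
        value |= (octet & 0x7F) << shift
        if not octet & 0x80:
            return value
        shift += 7
    raise ValueError("truncated integer")
```

For example, encode_unsigned(300) gives the two octets 0xAC 0x02: the low seven bits (44) with the continuation flag set, then the remaining value 2.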


Q: I tried to decode the compressed streams provided by the EXI working group and did not succeed. Are there any options that need to be set for the inflater/deflater?

A: EXI uses the standard DEFLATE Compressed Data Format defined by IETF RFC 1951. Some libraries (e.g. Sun's Java Runtime) create their deflater and inflater with the option *nowrap* set to FALSE by default. Note: with *nowrap* set to FALSE, the data is wrapped in a ZLIB header and checksum fields. With *nowrap* set to TRUE, these fields are omitted, which is also what the GZIP and PKZIP formats require. *nowrap* set to TRUE should be used for EXI.
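
For readers working outside Java, the same distinction can be sketched with Python's standard zlib module, where a negative wbits value selects raw DEFLATE without the ZLIB header and checksum. This is only an analogue of *nowrap* set to TRUE, and the helper names are hypothetical. It shows the raw DEFLATE layer alone, not how an EXI processor arranges its data before compressing it.

```python
import zlib

# Negative window size: raw DEFLATE (RFC 1951), no ZLIB header or checksum.
RAW_DEFLATE = -zlib.MAX_WBITS


def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream."""
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, RAW_DEFLATE
    )
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream."""
    return zlib.decompress(data, wbits=RAW_DEFLATE)
```

By contrast, zlib.compress(data) with its default settings produces the wrapped form, which starts with the ZLIB header byte 0x78 and is not what a raw inflater expects.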