Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This Working Group Note presents measurement results of various high-performance XML interchange encoding formats and their associated processors, made by the Efficient XML Interchange (EXI) Working Group. The measurements have been conducted following the recommendations of the XML Binary Characterization (XBC) Working Group. In particular, this draft covers measurements of the properties of "compactness", "processing efficiency" and "roundtrip support", as defined by the XBC WG. We start by describing the context in which this analysis is being made, and the position of an efficient format in the landscape of high performance XML strategies. Then we describe the measured quantities in detail and the test framework in which they were made, and give a short description of each format. Finally, a summary of the results and the conclusions of the group are included. The full measurements and analysis are included in an appendix and supporting documents.
As a result of the measurements described here, the working group selected Efficient XML ([EffXML]) to be the basis for the proposed encoding specification to be prepared as a candidate W3C Recommendation. Follow up work has centered around integrating some features from the other measured format technologies, particularly variations for both more efficient structural and value encodings.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This note presents measurement results of the EXI Working Group's tests of various high performance XML interchange encoding formats. This draft (see non-normative version above) concentrates on the properties of "compactness", "processing efficiency" and "roundtrip support".
This draft adds analysis to the measurement results. Additionally measurements of network interactions are included. The results reported in this draft are considered to be stable, though there are still some minor issues to correct with the measurement framework that are also reported.
Following drafts of this note may add measurements of further features under consideration for the candidate format recommendation, such as performance enhancement under strict schema dependence, IEEE float support, random access, etc. (as listed, at the time of writing, in appendix E of the First Draft of the format specification).
While compactness will not generally vary from implementation to implementation, processing efficiency may vary widely between implementations of a given format. The current framework characterizes processing efficiency of specific implementations, rather than attempting to evaluate any nominal upper limit of processing efficiency for each format's primary algorithm.
This document was developed by the Efficient XML Interchange (EXI) Working Group. A complete list of changes to this document is available.
Comments on this document are invited and are to be sent to the public public-exi@w3.org mailing list (public archive). If substantive comments are received, the Working Group may revise this Working Group Note.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The objective of this document is to provide an analysis of the expected performance characteristics of a potential "Efficient XML Interchange" (EXI) encoding format. A successful EXI format will include facilities for helping computers encode and decode the entities of XML documents efficiently and in a compact form. The purpose of such a format would be to enable the use of tools and processing models in the current XML technology stack, in environments where the costs of producing, exchanging, and consuming XML are currently high or prohibitive. Additionally, such a format may well enable an expansion of the use of web technologies to new applications or industries, which presently find some facilities of XML attractive, but are limited by some hard constraints on encoded length or some aspect of computational efficiency.
Based on the measurements described here, the working group has selected Efficient XML ([EffXML]) to be the basis for the proposed encoding specification to be prepared as a candidate W3C Recommendation.
The Extensible Markup Language (XML), is an application profile of Standard Generalized Markup Language (SGML). It provides a well defined syntax and encoding by which structured, textual, data can be examined by people, and exchanged and interpreted by computers. The well defined basis of XML, and its simple syntax that permits ready semantic interpretation, has resulted in a very broad range of successful uses. There are a number of use cases for which these advantages are tempered by inefficiencies which stem from the textual encoding of XML. These cases, for instance, require a compact form, such as in small portable devices like mobile phones, or an encoding which avoids the time spent on floating point conversion, such as in scientific and engineering fields. Many use cases, and the properties of XML to which they are sensitive, were documented extensively by the XML Binary Characterization Working Group, in [XBC Use Cases] and [XBC properties].
The XBC WG recommended that work on an Efficient XML Interchange format proceed, since alternative formats which appeared to meet some or all of the property criteria already existed. However, the XBC WG did not perform a quantitative analysis of those formats, nor define measurable thresholds of performance which should be achieved for each use case and property. This Note provides measurement data regarding some of the quantitative Properties of XML relevant to the creation of an efficient format for the interchange of XML, including its encoding and processing. In general, we do not here review the simple-valued properties of each format, such as whether they are "Royalty Free".
This draft concentrates on the two most critical properties of XML for a potential efficient interchange format: "compactness" and "processing efficiency". These are important in that they are both significant drawbacks with existing XML solutions in many use cases, and they are difficult to accurately model. In addition, results for the "roundtrip support" property are reported, as that is a necessary property for any XML format, and failing to properly roundtrip may allow a candidate to achieve unrealistically good results in the other two properties.
In order to characterize the performance of the candidates with respect to these properties, the EXI working group assembled a significantly larger set of test data than was collected by the XBC. The group has also created a testing framework, based on Japex, which enables efficient testing of different candidate formats and implementations against a variety of different XML data. The wide set of test data is intended to highlight any binary-encoding format which achieves good results at the expense of being narrowly focused, and to help determine the generality of each format. The inclusion of many candidate algorithms improves the likelihood of identifying superior solutions which might ultimately inform an ideal single-format EXI recommendation.
Results are included for each of these candidates versus common XML solutions. Since processing efficiency is a measure of both format and implementation combined, the EXI group issued a public call for fast XML implementations to avoid drawing misleading conclusions. As a result of this call, an existing high-performance XML parser (Xals) was provided, and used in the measurements to obtain a baseline of XML processing efficiency. XML Screamer was tested outside the framework, but intellectual property issues forbade its inclusion in the framework and therefore comparative measurements could not be made.
As documented in [XBC Characterization], the main properties that are considered lacking from XML are Compactness and Processing Efficiency, and these shortcomings have led to the development of alternative formats. However, due to the ubiquity of XML, it would be better for interoperability to attempt to rectify these problems without moving away from XML. For Compactness, the only widely available alternative is generic compression below the XML level, as XML-specific compression algorithms have not achieved wide usage. Another option is to design the XML document structure so that it becomes more compact, for instance by choosing short element and attribute names and preferring content in attributes, as is the case with FIXML.
For improving Processing Efficiency, there are more options. An XML parser is a complex system and a naïve implementation is rarely very efficient. Furthermore, Processing Efficiency also encompasses phases like schema validation and conversion into application data, so improvements may need to be considered throughout the XML stack. The general way to improve Processing Efficiency is to optimize parser performance. Even widely-used parsers may have room for significant improvement. For instance, preliminary measurements in the EXI framework indicate that the default parser shipped with Java improved noticeably from version 5.0 to version 6, showing 2-3-fold improvement for some cases.
Other improvements to XML Processing Efficiency are cases where the performance is improved for a specific usage pattern. Published techniques can be largely divided into three classes: schema-derived parsing, differential parsing, and stack integration. None of these techniques improves processing performance for all, or even a majority of, XML documents in a single application. Rather, they provide techniques that can be used to improve performance in each applicable use case individually.
Schema-derived parsing refers to the technique of pre-compiling an available schema in some manner so that documents conforming to the schema are parsed efficiently. The product of the compilation can be either executable code [ChiuLu] or data structures for a generic parser [EngelenAutomata]. Of these, the latter is usually preferred in dynamic environments where new schemas may be required to be recognized. Furthermore, there should be little difference in efficiency between the two techniques as XML is a simple enough language that techniques for generating efficient parsing tables are well established.
Typical XML processing systems do not expect to receive arbitrary XML. Rather, the common case is that incoming documents in a single use case follow some schema, either an explicit or an implicit one. This observation is the basis behind differential parsing. A differential parser will store information on the XML documents it processes. Then, when another document is being processed, this stored information is used to efficiently process the parts in the document that match the stored information, leaving only the differences to be processed with the general system. Differential parsing has been based both on saving the parser state by creating checkpoints in the processed stream [Abu-Ghazaleh2] and on creating a finite automaton based on the byte sequences in the processed documents [Takase].
Stack integration considers the full XML processing system, not just the parser. By selectively combining the components of the processing stack through abstract APIs, the system can directly produce application data from the bytes that were read. Two prominent examples of this technique are [Screamer] and [EngelenGSOAP]. Both of these can also be called schema-derived as they compile a schema into code. However, neither simply generates a generic parser, but rather a full stack for converting between application data and serialized XML. This gives a significant improvement compared to just applying the pre-compilation to the parsing layer.
Of these techniques, none is specific to XML, but could be applied to the implementation of an EXI format as well. An EXI format is required to be able to use a schema to achieve improved Compactness, which implies a level of schema awareness that could potentially be used to also improve Processing Efficiency. Depending on the format specifics, it may be amenable to differential parsing as well. Properly-designed stack integration of an EXI format is expected to provide large benefits, especially for applications that process large amounts of data in floating point format or other formats with an inefficient text representation [BXSA].
The EXI Working Group published a call for efficient XML parser implementations to obtain a reasonable point of comparison in the measurements. The only parser that was provided as a result of this call was Xals from Fujitsu, a fully conforming high-performance XML parser that supports the usual APIs such as SAX and DOM. The main technique used in Xals is the integration of character decoding into the parser instead of using the platform's default libraries. This avoids the need to copy data in memory prior to it being passed to the application. Xals also checks most of the XML constraints prior to decoding, achieving greater efficiency than with character-based checking. An additional benefit of these techniques is greater portability, as there is no need to rely on a platform's default library for character decoding.
This section describes the measured characteristics, the measurement process, and the analysis employed, to evaluate the performance of submitted candidate formats for efficient XML interchange.
The independent data characteristics over which we measured the candidates were firstly size, and secondly an aggregate called "content density", described below. We also present a taxonomy of use case groups to classify the test data, and so scope the results for a given potential user.
The XBC Measurements Document [XBC Measurements] 6.2.1.5 (Measurement Considerations) says that processing efficiency is related to size and to complexity. Additionally, it discusses data complexity in the context of a large number of scenarios and property profiles of use cases. However, how to quantitatively determine the complexity of a single document, in order to gauge the response of a format's processing characteristics to complexity, is not discussed.
Size can naturally be considered the primary determinant of candidate format performance for many quantitatively measurable properties, and therefore the test documents must be characterized along the size axis. As the measurement of size, the group selected the size of the document in bytes, encoded as it was provided to the group. Since processing efficiency is partly determined by the amount of time that it takes for the document to be transferred into accessible memory, using the number of bytes instead of characters as the metric is the sensible choice.
Quantitatively measuring the complexity of an XML document is more difficult than simply measuring its size. Determining the Complexity of XML Documents ([DET-COMPLEXITY]) uses the following complexity metrics in addition to size:
[DET-COMPLEXITY] also considers the complexity of DTDs, but since many of the test documents do not have associated DTDs or other schemas, these considerations are not applicable.
Characterizing the test suite along too many different complexity axes may not be sensible, as the test documents would then be split into groups too small to facilitate meaningful aggregate conclusions. Therefore, the group decided to use only one complexity metric in addition to size for measuring the amount of structure in an XML document. This metric is content density (CD), computed as follows:
In summary, the documents in the test suite are characterized along two axes: size in bytes and content density. These characteristics are largely independent of each other, since content density is measured as a percentage.
For test data, the group obtained more than 10000 XML documents of various vocabularies. A subset of these were selected for analysis, based on overall representativeness, and the XBC use cases. These data are characterized in detail in the Appendix B: Description of Measurements Test Suite. Almost all of the XBC use cases were present. The missing four use cases are X3D Graphics Model Compression, Serialization and Transmission, XMPP Instant Messaging Compression, Business and Knowledge Processing, and SyncML for Data Synchronization.
As described above, the independent characteristics of the test documents over which we measured the performance of the various formats, were document size, and content density (the ratio between text and markup, abbreviated CD). A plot of these two metrics for the EXI test suite is shown below.
Looking at the plot above, four clusters of approximately equal size can be distinguished. The four also exhibit properties that make them interesting as analysis groups.
Note that the test documents do not divide neatly such that each test group fits entirely into a single cluster, thereby entailing that documents from the same test group frequently belong to separate content density clusters.
Another axis along which to order the test groups relates to the use cases to which they correspond (either because they map to the same use case, or because the use cases that they map to are similar in terms of the properties that they require). This approach yields eight rough categories of test groups, henceforth called Use Groups, that show no overlap:
Conversely to the case with the content density clusters, with use groups every test group belongs to a single use group, but the individual characteristics of the documents inside a use group may vary, even significantly.
As many of the test groups are applicable to more than one use case, the division of test groups into use groups is, to a degree, a matter of judgment. The major intent behind this specific division has been to include a sufficient number of test documents so that aggregate conclusions are valid over a use group, but not too many so that individual results get lost in the noise.
There are still some caveats regarding the test data and the parameters that were used in the measurements.
It also needs to be noted that, despite the attempt to provide useful use groups, some of them may not be of sufficient quality to enable useful aggregate conclusions to be made. In particular, the Broadcast use group consists of only a single test group where all of the documents are very similar to each other in size and somewhat similar in content density as well. This means that aggregate statistics are in danger of being perturbed by large amounts due to anomalous behavior from even a single document. Another potentially problematic use group is the Sensor group that consists of a number of minor variations of a single very small document, one middle-sized document, and one very large document that is also atypical XML. Therefore it is not advisable to attempt to make significant conclusions based on the results of this use group either.
For this draft of the Note we measured one of the three Algorithmic Properties described in the XML Binary Characterization Properties, for each contributed format; Processing Efficiency. Additionally, the tests measure the Format Properties of Compactness and Roundtrip Support. This section describes the quantities used to characterize those properties in each potential EXI format that has been measured.
An Algorithmic Property important for many use cases is Space Efficiency, which is not measured in this document. An implementation of the measurement exists in the framework for the Java-based candidates, using access to heap usage statistics provided by Java, but currently the measurement does not properly differentiate between different components of the property, causing anomalous results and thus making it not sensible to report the measured values.
This subsection describes the quantities we use to evaluate each format's Processing Efficiency.
In addition to characterizing the speed of processing, the XBC Measurement Methodologies [XBC Measurements] document, section 6.2.1 Processing Efficiency Description, delineates four properties of a format's algorithm which need to be analyzed for processing efficiency; Incremental Overhead, Standard APIs vs Abstract Properties, Processing Phases, and Complexity [of the algorithm itself, not of the input data]. Of these, really only incremental overhead is amenable to empirical testing. Incremental Overhead refers to whether a format "allows and supports the ability to operate efficiently so that processing is linear to the application logic steps rather than the size of data complexity of the instance" [emphasis added]. That is, it is reasonable to expect the processing time of an application using an efficient format to be dominated by the complexity of the application, not by the processing implied by the processing of the format. So the desirable characteristic of a format is that its processing time be (only) linearly or sublinearly dependent on the input data complexity.
For most users of XML, linearity over a broad range is likely to be a secondary concern. Of primary interest will be expected wall-clock elapsed time to process "most" typical examples of an individual's use cases, in their own scenarios. From the perspective of the XML community however, we require good performance over a very broad spectrum of document complexities. Therefore, as described above in Data Characteristics and Complexity, the complexity requirement as a whole is dominated by size, which makes the linearity requirement a necessary one in addition to the small elapsed wall-clock time.
Processing speed was measured for each of the following processing tasks, derived from the XML Binary Characterization Measurement Methodologies document, part 6.2.2. However, the measurement process was different for Java as opposed to C/C++ ("native") format candidates. These are described separately below.
For Java based candidates, processing efficiency was measured in each of the following contexts:
For each of these contexts, measurements were made for each of the "application classes" (see below), neither (no schema and no compression), document (compression), schema (use of metadata), and both (use of both compression and metadata).
For the Java candidates, processing efficiency measurements were made over a variety of networks. See below, Measurement Framework for a full description of the measurement process.
For C/C++ ("native") candidates, processing efficiency was measured only for encoding and decoding to and from memory.
Formats which include schema-aware encoding methods were allowed to use them in cases where schemas exist. Since some formats required a schema, where no schema existed for a test group, a naive XML Schema (root of xsd:anyType) instance was generated for purposes of format performance comparison. Where a schema or some other meaningful shared state might be exchanged, timing tests do not include the time it would take to share the state, such as exchanging the schema.
A figure of merit for compactness used in the literature, is the normalized compactification rate = 1-(c/l), (that's one minus c over l) where l is the length of the original XML document (say utf-8 encoded on disk), and c is the length using some compactification scheme. The factor can be multiplied by 100 to get a percent compaction. This formalism results in higher positive values (0..1-eps; or 0..100-eps%) the more compaction there is (and negative values when the output is bigger than the input). In order to provide more intuitive figures we have chosen to use, at least initially and for presentation of results, simply c/l, or (c/l)*100%. That is, smaller is better.
Note that the XBC Measurement Methodologies document [XBC Measurements] describes the property "Compactness", of which one method is "compression". "Compression" in this Note, means "document compression", which is taken to mean loss-less compression algorithms—those which use redundant information in a document to encode its data in a smaller space. Roughly speaking, compactness includes compression, but may also derive from other methods.
The XBC Measurements Methodology document says that the compactness property for a given format can be characterized separately for each of the following methods from which overall compactness may be derived ([XBC Measurements], section 6.1.2):
In accordance with the XBC Measurements Methodology document, each of the elements in this 6-vector was to be measured and evaluated numerically, or else characterized as follows "N/S (not supported) if the method is not supported, or N/A (not applicable) if the method does not apply" ([XBC Measurements] part 6.1.2 para 1). The evaluation of lossy compression itself was to be characterized by a vector, being the amount of compression obtained as a function of the lossiness implied by each (if more than one) combination of lossy compression schemes used by a format (possibly in combination with an input "permitted lossiness parameter").
For these measurements of compaction we have taken two simplifying steps. First, the tests do not attempt to characterize the compactification that may be gained from delta encodings and lossy encodings (5 and 6 above). None of the formats submitted so far utilize deltas in any integral way. If a non-trivial schema was not provided for the test group, the format is not permitted to look for one "elsewhere" (like at a URI). Just as for the processing efficiency measurement, if there is no non-trivial schema, the format's processor is permitted to use a pre-generated trivial schema (root of xsd:anyType).
Also none of the formats we considered include a lossy compression scheme.
Each candidate's measurement framework driver transforms the XML document into the candidate's own format. The result is placed into a memory buffer. The EXI framework gets a reference to this buffer and calculates its size. At the time of writing, all such compactness results reported in this draft, in Appendix A: Measurement Details, are expressed as a factor to that of XML.
As all considered candidates are able to utilize only schema-based techniques and document analysis on top of the always-allowed tokenization, these two techniques combine to form four different application classes:
As the techniques available to candidates will be different in each application class, the data analysis of processing efficiency results was split four ways according to these classes, in addition to the analysis of compactness. In the Document and Both classes, candidates are compared against gzipped XML, while in the Neither and Schema cases, the comparison was to plain XML.
In order to make consistent measurements of all candidates over the whole test suite set, we used a framework based on Japex. Japex is a simple tool that is used to write Java-based micro-benchmarks. It can also be used to test native language (e.g. C/C++) systems via the Java Native Interface (JNI), which we used to test native candidate implementations.
Japex is similar in spirit to JUnit in that it does most of the repetitive programming tasks necessary to make a measurement. These tasks include loading and initializing the required drivers, warming up the VM, forking multiple threads, timing the inner loop, etc.
The input to Japex is an XML file describing a given test. The file's primary constituents are the location of a test data group (for example all of our examples of X3D XML files), and references to one or more "drivers". These drivers interface Japex to the code which is under test. Each implements some well defined micro-benchmark measurement, such as "encoding with schema".
The output of Japex is a timestamped report in XML and HTML formats. The HTML reports include one or more charts generated using JFreeChart. The output gives, for each observable under test, results for each data file, plus aggregated results for the test group.
To measure a candidate's Processing Efficiency, we measured aggregate throughput for each of the processing tasks.
For our purposes, the term "throughput" is defined to be work over time. Japex is designed to estimate an independent throughput for each of the tests in the input. Throughput estimation is done based on some parameters defined in the input file. There are basically two ways to specify that: (i) fix the amount of work and estimate time, or (ii) fix the amount of time and estimate work. We fixed the time and measured work. In addition to each test's individual throughput, aggregate throughtputs in the form of arithmetic, geometric and harmonic means of results for all files in the test suite are computed for each test.
The measurement consisted of feeding the events of the in-memory representation, one by one, to the measured encoder, and then using the measured decoder to parse the produced bytes back into the in-memory representation. Subsequently, if the application class was either Document or Both, the output, or input, streams were wrapped into GZIP deflater or inflater objects, respectively - except in the case of EFX where the built-in compressor was used. Processing time was measured separately for the encode and decode phases.
See the Japex Manual [JAPEX] for further information about Japex.
Two different computer systems were used in the measurements, detailed in Appendix E: Characterization of Measurement Machines. The systems were carefully prepared to ensure reproducible measurements by disconnecting any peripherals that were not needed to avoid interrupts and by running only the measurement framework during the measurements. As the measurement is intended solely for comparing between technologies, this use of a limited number of computer systems is a reasonable choice.
In our tests, measurements were made by a Java-based micro-benchmark framework, both for processors implemented in Java and those implemented in native languages. Since a Java Virtual Machine (JVM) typically performs a just-in-time compilation of the running code, the first run does not usually reflect actual application performance. Therefore, the actions of each experiment were repeated as a warm-up, without measurement, until sufficient cycles had passed to ensure that the measurements were stable.
This process has the intended effect of approximating and benchmarking the performance of a warmed-up application. However, systematic effects may still arise from caching (or warming-up) the input data, in addition to the code. For example, if the benchmarking framework repeatedly reads a copy of the input data from a fixed location in memory while warming up the code, the processor will have the input data in high-speed cache memory, allowing access times typically over 25 times faster than a real application might encounter. One way to address this problem is by sequentially processing multiple copies of the input data laid out end-to-end in memory. However, simplistic versions of this memory access pattern might cause the results to be skewed by pre-fetch caching hardware. Although these issues can be addressed, care must be taken to tailor buffers, gaps, and total data so that cache levels are cleared while avoiding virtual memory paging.
The cache cannot be removed or defeated for an empirical assessment because caching behavior has a significant positive impact on the performance of modern systems. For example, an algorithm designed to have a good locality of reference to take advantage of caching and pre-caching architectures would perform significantly worse if the cache is disabled than it would in actual practice. In addition to data cache effects, modern processors are affected significantly by branch prediction. Modern branch prediction is based on local and global historical branch activity, meaning that warming up the processor on data that is significantly different than the test data can result in some performance difference. Most of the performance of modern systems relies on caching, branch prediction, and related capabilities.
Our primary interest was to understand the performance of each potential EXI format algorithm independently of I/O related factors. Therefore, the benchmarking framework attempts to accurately model the code paths and memory access patterns of real applications to the extent possible, while minimizing the cost of I/O, blocking, context switches, etc. To factor out the majority of the I/O, blocking and context switching costs associated with network I/O, we used a local loopback interface. The local loopback reads and writes all data through local memory instead of a physical network, essentially modeling a memory-speed network. This approach uses the same buffering algorithms, code paths and memory access patterns as a real-world application, reproducing realistic caching effects without warming-up the input data. At the same time, since it's memory-speed, the interface will probably provide data faster than an EXI algorithm can consume it. This way, the algorithm remains CPU bound throughout the duration of the test and is not subject to I/O related blocking or context switching. The resulting benchmarks show the processing efficiency of each EXI algorithm isolated from the speed or I/O effects of a particular network.
The benchmarks that are likely to most accurately reflect the performance of real EXI applications are those that include I/O to some external media, like a network or file system. EXI is about interchange, so most EXI use cases will read or write EXI documents to and from devices that are often slower than main memory (i.e., networks or storage media) although there are important cases of memory to memory interchange. Therefore, the benchmarks that are likely to most accurately model real-world performance for many cases will be those that read and write EXI documents to and from networks and storage devices. Careful scrutiny is needed, especially when not using a loopback interface, to detect bandwidth bottlenecks that are smaller than the throughput of algorithms being tested. These benchmarks are expected to accurately model the interaction between the caching algorithms employed by modern computing architectures and the memory access patterns of buffering algorithms employed by typical device drivers. They also are expected to accurately model the subtle, but significant performance implications of blocking and context switching that occur as algorithms shift repeatedly between CPU and I/O. For a given platform and use case, the average time spent doing context switches and being I/O-bound vs. CPU-bound will differ from one EXI format to another. It will be influenced by a variety of factors, including the average throughput of the EXI algorithm, the average throughput of the I/O device, and the compactness of the EXI format. It must be noted however that this interaction with operating system characteristics may not be deterministic. The determinism of a trial may be subject to particular buffer sizes or other details that are not obvious. Therefore, for higher confidence, a comparison might be needed between a network-intermediated trial and a cache-clearing, memory to memory trial.
To look more closely at those effects of the network, in a real-world- like setting, the framework has been extended to measure the performance of EXI algorithms reading and writing from any TCP/IP based network (e.g., Wired LAN, WI-FI LAN, Internet, GPRS, etc.). This will enable us to collect benchmarks using the same network media, buffering algorithms, device drivers, and code paths employed by EXI use cases. As such, it replicates the memory access patterns, blocking patterns, context switches, etc. of, at least, TCP/IP based real-world applications. Results from 100Mbps network tests using this extension, have been included in this draft of the note.
This section describes the measurement framework's setup for the measurement of Java based candidates. The quantities so measured for Java based candidates are given above.
For the measurement of decoding speed of a Java candidate, the candidate's driver first transforms the XML document into its own format. The result is placed into a memory buffer. If decoding from memory, this memory buffer is wrapped into an input stream and passed to the driver for parsing. All Java drivers parse this input stream using the SAX API (an XMLReader) with an empty SAX content handler, essentially dropping every event after it's reported. If decoding from the network, an "echoing" server is started either as a separate thread, if in localhost mode, or as a separate process, if in non-localhost mode. The framework then sends the entire stream to the echoing service for buffering. It then creates an input stream that reads from the network socket connected to the echoing server and passes it to each driver. Each driver operates identically regardless of whether they are reading from a memory buffer or the network.
For the measurement of encoding speed, the XML document is parsed using a SAX parser and all the events recorded in a data structure (an event array). If encoding to memory, a memory buffer is created and wrapped into an output stream. This output stream is set in each driver and the stream of SAX events is played back to the candidate's encoder. If encoding to the network, a server is started, either as a separate thread, if in localhost mode, or as a separate process, if in non- localhost mode. The framework then creates an output stream connected to the socket on which the server listens. Just as in the in-memory case, the stream of SAX events is played back to each driver.
This section describes the measurement framework's setup for the measurement of C/C++ based candidates. The quantities so measured for C/C++ based candidates are given above.
In the case of libxml2, the reference XML processor, the serialization process was first to read the entire XML document to be tested into a DOM tree structure in the initialization function (the time to do this was not measured). It would then repeatedly serialize this DOM tree to an XML document in memory for either a fixed duration or for a set number of iterations depending on how the test was set up. For decoding, the time to execute empty SAX handlers was measured. The document was read into memory in the initialization function and then repeatedly parsed in the time measurement loop by the SAX parser.
All of the native candidates were based on ASN.1, and for each of them a form of data binding was used. An ASN.1 specification that was equivalent to a given XML schema specification was created using a standardized procedure (specified in X.694). This specification was then run through an ASN.1 compiler to created hard bindings of the structure to C program variables. For example, an XML schema sequence having 3 integer member variables (a, b, and c) would result in the generation of a C structure with 3 equivalent integer members and code would be produced to encode to and from this structure. For encoding, the initialization procedure loaded the XML document into this custom C structure. The encoding loop would then repeatedly invoke the custom encode function to serialize the data to XML form. For decoding, the process was reversed. The XML document was read into memory in the initializer. It was then repeatedly decoded into the C structure in the timing loop to get the decode time. Note that the C structure variable would be cleaned and memory reset between invocations to alleviate the memory cache effect.
Some of the candidates currently being considered as a basis for an EXI standard only have C or C++ implementations, as opposed to Java, available at this time. In order to test the processing efficiency of these candidates using Japex, the framework was augmented to provide a driver API through the Java Native Interface (JNI), that allows non-Java applications to be benchmarked.
The JNI is used in such a way as to isolate the performance of the C/C++ implementation from the Java processing in the Japex framework. This is done by doing all of the timing on the native (i.e. C/C++) side of the interface. Clearly, there will be some residual systematic effect, since at least a JVM is cohosted, but we believe that effect is negligible.
However, the processing efficiency timing results that are produced cannot be directly compared with results for Java candidate implementations for several reasons:
For these reasons, it would not be accurate to compare native candidates with Java ones for processing efficiency. Instead, a comparison is done with an existing C/C++ based XML processor. We used the open source libxml2 library (http://xmlsoft.org), a highly performant implementation available in most Linux distributions. Libxml2 doesn't resolve all issues; for instance, it does not provide data binding.
In this section we also describe the taxonomy with which the group evaluated the extent to which each EXI format candidate satisfies the lexical reproducibility requirements of the examples in the test suite .
The degree to which a candidate can accurately reproduce the information represented by a test group is determined jointly by the following:
To characterize the extent to which each candidate preserved the information necessary for a test group, each test group was annotated with what information must be preserved, and with what accuracy, to satisfy the particular requirements of that case.
The "roundtrip support" measurement property of the test framework was used to determine whether the candidate accurately reproduced the information represented by each test group, according to the annotation. If the result of this measurement was that the candidate did not reproduce the information with the required fidelity, or it was predetermined to fail without running the test, then the candidate was determined to have failed the round-trip support measurement property for that test group.
The fidelity requirements were communicated to the test framework, and to each candidate under test, through japex parameters. Those parameters prescribed what did and did not need to be preserved. Candidates were free to examine the parameters to optimize their encoding. For example, if preservation of comments and processing instructions was not required, as is the case for SOAP message infosets, then a candidate was allowed to use that knowledge to encode SOAP message infosets more efficiently than it could were it not permitted to make such a narrowing assumption.
The following information represented by a test group did not need to be preserved by a candidate:
It is not anticipated nor expected that candidates will accurately reproduce information on a syntactic or byte-per-byte basis [see fidelity scales 3 and 4].
The following subsections present the distinct sets of information that may or may not be preserved by a candidate.
If a white space needs to be preserved then all non-markup white space information [see 2.10 White Space Handling, XML 1.0] represented by the test group MUST be preserved. Otherwise, all non-ignorable white space information MUST be preserved. Such non-ignorable white space information may be determined from a schema, if present with the test case, and/or by inspection of the test group.
If comments need to be preserved then all comment information represented by the test group MUST be preserved. Otherwise, comment information need not be preserved.
If processing instructions need to be preserved then all processing instruction information represented by the test group MUST be preserved. Otherwise, processing instruction information need not be preserved.
If namespace prefixes need to be preserved exactly then all namespace prefix information represented by the test group MUST be preserved verbatim. Otherwise, namespace prefix information need not be preserved identically.
The verbatim preservation of namespace prefixes is important when a test group contains signed information or prefixes as part of a qualified name information in element content or attribute values. Preservation of namespace prefixes is important in any case to maintain document validity and correctness.
If lexical values need to be preserved then all character data (see Extensible Markup language (XML) 1.0 [XML 1.0] section 2.4 Character Data and Markup) information represented by the test group must be preserved. Otherwise, lexical values need not be preserved.
Lexical values may not be fully preserved if a candidate chooses to encode a lexical value into a more efficient lexical value, or in binary form, and the original lexical value cannot be reproduced. For example, the decimal lexical value of "+1.0000000" might be converted to the more efficient canonical decimal lexical value "1.0". Similarly, the boolean lexical value "1" might be converted to binary form as one bit of information and it may not be determinable from the binary form whether the original lexical value was "true" or "1".
If the document type declaration and internal subset need to be preserved then the document type declaration and internal subset information represented by the test group must be preserved. Otherwise, document type declaration and internal subset need not be preserved.
Candidate formats, and test groups, can be classified according to a "scale of fidelity". For formats, the scale is a metric of the extent to which information is preserved with respect to various XML data models. For test groups the same scale is used to classify the requirement the of the test group for round-trip accuracy. We defined it to go in order of increasing fidelity. A candidate which has been determined to score at a certain level on the scale may still preserve some, but not all, information specified at higher levels.
Note that this does not imply that documents produced by candidates have an isomorphic representation of the information represented by certain XML data models, rather that a candidate stores enough information to be able to reproduce the preserved information.
| Level | Preserves |
|---|---|
| -1 | Preserves only a subset of the XPath data model Level -1 defines the class of candidates that preserve a subset of the XPath 1.0 data model. For example, a candidate might not preserve namespace prefixes. |
| 0 | Preserves the XPath data model Level 0 defines the class of candidates that preserve the XPath 1.0 data model. Such preservation includes the root node, elements, text nodes, attributes (not including namespace declarations), namespaces, processing instructions and comments. Unexpanded entities, the document type declaration and the internal subset are not preserved. This level corresponds to a common denominator in XML processing, it is equivalent to the SOAP XML subset with the addition of processing instructions and comments. |
| 1 | Preserves the XML Information Set Level 1 defines the class of candidates that preserve the XML Information Set data model. The XML Information Set preserves more of the information represented in the document type declaration and internal subset than the XPath 1.0 data model but not all information, such as attribute and element declarations, is preserved. The [all declarations processed] property of the Document Information Item is not, strictly speaking, part of the Infoset [XML Infoset] and therefore does not need to be preserved. NOTE: A candidate can fully support level 1, level 2, and the preservation of the complete internal subset by including and encoding the internal subset as a string, thereby leaving it up to the decoder to optionally process it. |
| 2 | Preserves the the XML Information Set and all
declarations Level 2 defines the class of candidates that preserve the XML Information Set data model, as in level 1, in addition to all information that is not purely syntactic. Such additional information will include the attribute, element and entity declarations. This level covers the preservation of all information represented by an XML document but does not accede to supporting purely syntactic constructs. NOTE: A candidate can fully support level 2 by including and encoding the internal subset as a string, and thus leaving it up to the decoder to optionally process it. |
| 3 | Preservation of syntactic sugar of the XML document Level 3 defines the class of candidates that preserves all information represented by an XML document, as in level 2, in additional to some but not all syntactic information. NOTE: Full syntactic information is supported by level 4. Such preservation includes but is not limited to:
|
| 4 | Preservation of bytes Level 4 defines the class of candidates that preserve the XML document byte-per-byte. NOTE: This is the simple level supported by generic encodings such as gzip. |
Creating a benchmarking framework that is able to produce a variety of measurements fairly and accurately, using multiple format implementations from many vendors is a complex and arduous task. This difficulty has affected both the analysis aspects covered in this document as well as the whole methodology of producing measurements.
First, while the XBC Characterization Note lists a number of requirements for an EXI format, this document covers only three: roundtrip support, compactness, and processing efficiency. However, many of the listed requirements are not amenable to measurement as such, but would rather be evaluated based on the format specification. Since this task does not require precise measurement, it is expected to take considerably less time and has been left for later.
The full methodology is an iterative process, starting with a measurement run. The results of such a run are reviewed for stability and perceived correctness. Any discovered problems are corrected and a new run commissioned. Meanwhile, results that have been deemed stable are used as a basis for preliminary analysis. The results presented here come from a stored snapshot that has been judged sufficiently stable for making decisions.
Even so, there is still some variation in the results. This is a concern especially with the C-based candidates that may exhibit enormous and inexplicable variance between different test documents. It is therefore likely that there are still some problems with how the test framework's Java code interacts with the C code of the candidates. Accordingly, the results from the C-based candidates were filtered more carefully, by eliminating results for individual documents that were obviously incorrect. In addition, the stableness review of the results paid careful attention to any improvements in these candidates in particular.
The measurement process produces a large amount of data, consisting of 88 documents, four application classes, three separate runs, and eight candidates. Furthermore, as different applications require different properties of a format in differing proportions, it is not feasible to produce a single, or even a few, figure of merit for comparing candidates. Instead, comparison needs to be performed over a variety of use cases or the like.
A variety of analysis methodologies were used, ranging from graphs, both summaries and detailed ones, to statistical methods for estimating average performance. Comparisons were always made to best available XML implementations, used as would be normal for each particular case under analysis.
Review of the results and aggregation was performed with a spreadsheet program that allows live re-grouping of the data. This leads to an easy viewing of the anomalous results and their elimination from consideration. Other analysis methods included drawing of graphs for particular sub-groups of the whole test suite, making inferences based on these graphs, and then verifying the inferences and establishing precise performance bounds by reviewing the individual measurements.
The overall goal of the measurement and analysis process was to select one or more distinguished candidates that could serve as a basis for the EXI format. Accordingly, much of the analysis was focused on picking the top performers and verify that their top performance was consistent throughout the different use cases, and especially to make sure that none of the distinguished candidates disqualified themselves in any particular case.
This section provides a technical description of the basic architecture of each of the contributed XML formats whose performance characteristics were measured (see Results). The statements in these descriptions are those of representatives of each format and have not been validated by the working group.
The X.694 ASN.1 BER candidate submission uses a set of standards that have been in place for many years for binary messaging (ASN.1 itself since the early 1980s). There are three parts to the candidate:
In this format candidate, BER encoding was selected in preference to PER (Packed Encoding Rules - see X.694 ASN.1 with PER below) because it was felt that the flexibility in the tag-length-values (TLV) that it provides, is closer in spirit and functionality to XML than the tightly coupled encodings produced by PER. The general principle is that each element construct in XML;
<tag>textual content</tag>
is replaced by a similar binary encoding consisting of a tag-length-value descriptor.
Efficiencies are gained from two properties of this format: 1) the tags and lengths are binary tokens that are in general much shorter than XML textual start and end tags, and 2) the content is in binary instead of textual form.
Some advantages of using ASN.1 BER to encode XML are that the TLV format is similar to XML's start-tag / content / end-tag pattern; encoded length can be 5 to 10 times less than textual XML, and encoding and decoding is very efficient - no compression or other CPU intensive algorithms are used. Additionally, ASN.1 BER is mature and stable, and its security has been studied in-depth.
However, it is schema-based - an XSD or similar schema is needed to encode and decode, and some fraction of the XML Information Set can not be represented (for example, randomly occurring comments and Processing Instructions).
See references [USE-OF-ASN.1], [ASN.1], [BER], [X.694].
Similarly to the X.694 ASN.1 with BER candidate format described above, this one uses X.694 to map from an XML Schema document to ASN.1. In this variation, we add the use of two further ITU-T standards, X.693 and PER. These additions allow direct output in textual XML, and a somewhat more compact binary encoding.
PER, which stands for Packed Encoding Rules [PER], is one of the ASN.1 Encoding Rules published by the ITU-T, IEC, and ISO. PER was specifically designed to minimize the size of messages needed to convey information between machines. It has been widely adopted in critical infrastructure where bandwidth is limited. It originated from work on an efficient air-to-ground communication protocol for commercial aviation, and has since been used in many areas including cell phones, internet routers, satellite communications, internet audio/video, and many other areas.
Another of the ASN.1 Encoding Rules, Extended XML Encoding Rules, or "EXTENDED-XOR" [X.693], defines a set of encoding rules, that can be used in a way analogous to XML Schema, and applied to ASN.1 types to produce textual XML.
Since the schema notation used by XML Schema is quite different from the ASN.1 notation, ITU-T Rec. X.694 | ISO/IEC 8825-5 was created to give a standard mapping from schemas in XML Schema to ASN.1 schemas in such a way that XML Schema aware endpoints, could exchange documents with ASN.1 aware endpoints, using the EXTENDED-XER Encoding Rules.
With the ASN.1 schema generated using X.694 from an XML Schema instance, one can generate not only XML documents (using the ASN.1 engine with EXTENDED-XER), but binary encodings with any of the ASN.1 Encoding Rules, of which PER is the most compact.
The Xebu format is a product of research into XML messaging on small mobile devices. The central XBC Properties considered in its design were; Streamable, Small Footprint, and Implementation Cost.
Xebu models an XML document as a sequence of events, similarly to StAX or SAX. The event sequence is serialized one by one. The basic Xebu format, applicable to general XML data, includes mappings from strings in the XML document to small binary tokens. These mappings are discovered dynamically during processing.
Xebu also includes three optional techniques for when a schema is available:
The main advantages of Xebu are; its support for general XML with varying levels of schema-awareness, a direct correspondence with well-understood XML data models, so making XML compatibility easy to achieve, and that it's simple and straightforward to implement.
See reference [XEBU1]
The implementation used in the measurements has been written for mobile phones, and therefore it does not perform as well as an implementation written for desktop machines or servers would.
The measurements with a schema were run with only pretokenization enabled. Typed-content encoding was not enabled due to the difficulties of accessing type information efficiently. The event omission implementation cannot handle arbitrary schemas, so only a subset of the test document schemas would have produced an effect. Furthermore, the implementation is written for RELAX NG, and conversion from XML Schema worked only for a very few cases. Because of these issues, event omission was also disabled in the measurements.
Extensible Schema-Based Compression (XSBC) is a system for encoding XML documents that are described by schemas into a binary format. The result is more compact, faster to parse, and has better databinding performance than textual XML.
XSBC preprocesses the schema that describes an XML document, and creates lookup tables consisting of integers that index the string values of element names. The schema's type information about marked-up data, if any, is captured in one lookup table. In the same way, element attribute names are added to a lookup table, along with their type information. The size of the integers that correspond to the element and attribute names is chosen so that if only a few names are in the document, smaller numbers are used.
Once the schema has been processed and the lookup tables have been populated, the XML file is transcoded into binary format. Text element start and end tags are replaced by the binary format whole numbers to which they correspond in the element lookup table. Additionally, if any marked-up text data can be represented in an equivalent binary format, such as floating-point, it is replaced.
Element attributes are added after the binary element start tag. Attribute representations may be only the attribute start tag, followed by the attribute data, in binary format, if the schema type information makes this possible.
The marked-up data in binary format, for either element data or attribute data is then passed through data compressors. In the case of simple, fixed-length integers and floats, the standard IEEE 754 format can be used. Variable length data, such as strings, can be represented by a starting value that includes the length, followed by the actual data. The data compression system is easily extended to handle data other than that represented by the standard data compressors, and in fact can be extended dynamically. For instance, it's possible to write custom compressors for sparse or repetitive matrices, floating point data which is representable in only a certain number of significant digits, or other specialized data types.
Encoding and decoding a textual XML document to an XSBC document is quite straightforward. A simple finite state machine can be constructed to decode XSBC documents. An XSBC decoder can step through the document easily because of the structure of the document.
An XML Schema instance is required to encode the document. In future, the XSBC team plans to implement a feature in which a document without a schema can be "pre-preprocessed" and a stand-in schema generated for it. This stand-in schema will be untyped (all data will be represented as strings) but this would allow an arbitrary XML document without a schema to be represented.
XSBC's virtue is its simplicity. It is, essentially, every programmer's first idea of how to represent an XML document in binary form. There is a strong correspondence between the textual XML representation and the binary representation.
The Fujitsu XML Data Interchange (FXDI) format was designed to serve as an alternative encoding of the XML Infoset, which allows for more efficiency, both in the exchange of data between applications and in the processing of data at each end-point. FXDI's primary design goals were document compactness, and to enable the implementation of fast decoder and encoder programs, which run in a small footprint, and without involving much complexity.
FXDI is based on the W3C XML Schema Post Schema Validation Infoset (PSVI), though some of the format features derive from the XPath2 Data Model. Although FXDI performs much better when schemas are prescribed before documents are processed, it is capable of handling schema-less documents and fragments through its support for Infoset tokenization.
At the core of FXDI is the "compact schema". FXDI uses Fujitsu Schema Compiler to compile W3C XML Schema into a "schema corpus". A schema corpus contains all the information expressed in the source XML Schema document plus certain computed information such as state transition tables. A compact form of the schema is then computed from a schema corpus by distilling only those information items which are relevant to the function of FXDI processors.
There are two types of FXDI processors. Firstly, those that are used to generate FXDI documents. These are called "FXDI Encoders". Conversely, FXDI Decoders, decode FXDI documents into data usable by programs.
FXDI supports two different methods to create FXDI documents:
In the EXI Test Framework tests, the validating encoder was used in the compactness tests. The validating encoder carries out schema-validation as part of encoding process and logs any errors in test case XML documents. This is useful for diagnosing performance anomalies caused by test case documents that deviate from their associated XML Schemas. On the other hand, the non-validating encoder is used in the encoding tests for maximizing processing efficiency.
FXDI works well with conventional document redundancy-based compression such as gzip. That facilitates use cases that need the additional compression and can spend the additional CPU cycles.
See W3C EXI WG and Fujitsu for more information.
Fast Infoset is an open, standards-based binary format based on the XML Information Set [XML Infoset]. ITU-T Rec. X.891 | ISO/IEC 24824-1 (Fast Infoset) [FI] is also itself an approved ITU-T Recommendation as of 14 May 2005. An ISO ballot has been initiated (and is near completion) that will result in ISO/IEC 24824-1 being available for free when published.
The XML Information Set specifies the result of parsing an XML document, referred to as an "XML infoset" (or just an "infoset"), and a glossary of terms to identify infoset components, referred to as "information items" and "properties". An XML infoset is an abstract model of the information stored in an XML document; it establishes a separation between data and its representation that suits most common uses of XML. An XML infoset (such as a DOM tree, StAX events or SAX events in programmatic representations) may be serialized to an XML 1.x document or, as specified by the Fast Infoset specification, may be serialized to a Fast Infoset document. Fast Infoset documents are generally smaller in size and faster to parse and serialize than equivalent XML documents.
The Fast Infoset format has been designed to jointly optimize the axes of compression, serialization and parsing, while retaining the properties of self-description and simplicity. The approach has been to find, when not taking advantage of advanced features, a "sweet spot" where moderate compression can be achieved but not at the undue expense of creation, processing performance and simplicity.
The use of tables and indexing is the primary mechanism by which Fast Infoset compresses many of the strings present in an infoset. Recurring strings may be replaced with an index (an integer value) which points to a string in a table. A serializer will add the first occurrence of a common string to the string table, and then, on the next occurrence of that string, refer to it using its index. A hash table can be used for efficient checking of strings (the string being the key to obtaining the index; every time a unique string is added to the hash table, the index of the table is incremented. A parser will add the first occurrence of a common string to the string table, and then, on the next occurrence of that string, obtain the string by using the index into the table.
Fast Infoset is a very extensible format. It is possible, via the use of encoding algorithms, to selectively apply redundancy-based compression or optimized encodings to certain fragments. Using this capability, as well as other advanced features, it is possible to tune the "sweet spot" for a particular application domain. An example of this is the use of a built-in encoding algorithm to directly encode binary blobs without the need for any additional encoding, in a way similar to the Message Transmission Optimization Mechanism(MTOM), but with the binary blobs encoded inline as opposed to as attachments. Other built-in algorithms can be used to efficiently encode arrays of primitive data types like integers and floats, a feature often used by scientific applications to reduce message size and increase processing efficiency.
Additional features of Fast Infoset include support for restricted alphabets (for better compactness) and for external indexing tables, for those cases in which a tighter coupling is acceptable in the interest of achieving better performance.
For the Schema and Both application classes Fast Infoset is not fully optimised to utilise schema:
Efficient XML is a general purpose interchange format that works well for a very broad range of applications. It was designed to optimize performance while reducing bandwidth, battery life, processing power and memory requirements. It is the only format currently being tested that supports all of the features specified by the minimum binary XML requirements defined by the W3C XBC group XML Binary Characterization [XBC Characterization].
The encoding is schema "informed", meaning that it can leverage available schema information to improve compactness and performance, but does not depend on accurate, complete or current schemas to work. It will work very effectively with partial schemas or no schemas at all. It also supports arbitrary schema extensions and deviations and allows dynamic schema negotiation, discovery and acquisition.
Efficient XML achieves broad generality, flexibility, and performance, by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, Efficient XML is optimized specifically for XML languages. The built-in Efficient XML grammar accepts any XML document or XML fragment and may be augmented with productions derived from XML Schemas, RelaxNG schemas, DTDs or other sources of information about what is likely to occur in a set of XML documents. The Efficient XML encoder uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream of tokens. The encoder then encodes the stream of tokens using a Huffman tree derived from the grammar or, if additional compression is desired, passes the stream of tokens to a more sophisticated XML compression algorithm that replaces frequently occurring token patterns to further reduce size. When schemas are used, Efficient XML also supports a user-customizable set of datatype CODECs for efficiently encoding typed values and provides typed streaming APIs for efficiently accessing typed values.
The binary form of Efficient XML is very compact. It is competitive with hand-optimized formats and is consistently smaller than both ASN.1 PER and gzipped XML. Even on very large, repetitive documents where gzip works best, it is not uncommon for Efficient XML to be 2-5 times smaller than gzipped XML.
Production implementations of Efficient XML have been integrated into a broad range of platforms, including mass market mobile phones, PDAs, application servers, web servers, high-volume message routers, pub-sub systems, vehicles, aircraft, and satellite broadcast systems. High quality, commercial implementations are available for Unix, MS-Windows and a wide variety of mobile devices running both Java and Microsoft.NET.
See Theory, Benefits and Requirements for Efficient Encoding of XML Documents [EffXML] for more information.
The Efficient XML implementation used for W3C tests is a non-production implementation designated for evaluating W3C-proposed changes to Efficient XML. In addition to implementing the minimum W3C requirements, it includes the format features required to support advanced requirements, such as random access, accelerated sequential access and digital signatures. As a non-production version it has not been fully optimized.
This uses both "X.694 with PER" as described above as well as "Fast Infoset" also described above. Where there is an XML Schema, X.694 is used to map the schema to ASN.1. If there is no schema, or if the XML Document deviates from the schema, the entire XML document is serialized using Fast Infoset instead.
Note that in general better performance will be gained if there is a schema from which the documents do not deviate (in which case PER will be used). Any exceptions to this will be handled by Fast Infoset.
Efficiency Structured XML (esXML) is a format that encodes the XML data model in a way that is flexible, compact, and efficient to process. The format allows a range of encoding methods, from purely byte-oriented tokenized data to bit-oriented, variable-token table-based encoding and from fully self-contained instances to a spectrum of externalized forms, such as schema-based encoding. This externalized information, which can include metadata, value typing and table priming, structure, templates, and encoding choices is encoded in the esXML format in an XML Meta Structure (XMS) instance. (A template is a range of XML data that is referred to later in a document so that its structure is copied, with or without data, as a reference with new data slotted.) An XMS instance can be created from one or more schemas, example data, or any other process. The XMS instance captures any information externalized from a logical document and, along with esXML instances created relative to it, can be used to recreate the fully self-contained logical data. The important distinction for an XMS is that it is a directly-interpretable, portable, and standardized representation of schema-like information that can be shared at run time between disparate implementations and used to both encode and decode application data. This avoids sharing schemas, compiling them into usable form, and even agreeing on a common schema language. This also allows for automated choice of encoding options.
Certain widely-useful semantics are part of esXML which could be layered on XML only in inefficient fashion. These semantics allow for stable pointers to data in an instance, copy-on-write layering of arbitrary changes to a base document at both high and low levels, and flexible indexing of element content. These mechanisms can be used for efficiently capturing changes in a delta instance, for direct efficient representation of any data structure, such as a graph, and for random access into an instance. Indexing can be both deep and shallow, hash or sorted, and can occur at any element.
Encoding of data types is flexible, supporting not only text-based values, but opportunistic and schema-informed binary encoding of scalar types, including IEEE floats, doubles, and quads. The two major byte orders are allowed, reader makes right, with only a bit indicating the default order for the document. A unique restricted character set encoding with escapes allows taking advantage of narrow use of character values even in the presence of occasional exceptions. Binary data is encoded directly if indicated by the schema or library hints from the application. An encapsulation token allows any element subtree to be individually compressed, signed, or encrypted without affecting the data model.
The structure of an esXML instance is encoded in either of two modes: byte oriented or bit oriented. The byte oriented mode uses a compact token, which is either 4 bits + ID, 8 bits, or 8 bits + argument, and lengths that are encoded in an variable-length integer, called a "Scalable Int", which is byte by byte continued with the 8th bit. The bit-oriented encoding can be thought of as a table-based encoding of byte sequences which are tokens, IDs, and lengths. In other words, the sequence "Element, ID=8, Length=13" which would take 2 bytes in byte oriented mode, might take 2 bits in bit-oriented mode.
Because esXML has a major requirement to support in-place accessing of data, whether for random access or simply to reduce copies when traversing an instance, aligned byte-oriented operation is desired whenever possible. An example of this might be a series of byte-sized character strings or a large array of IEEE floats. A key insight is that bit-oriented and byte-oriented encoding can be combined in the same instance. While this is sometimes done with padding before every byte-sized value, this is inefficient and often seriously detracts from the bit-oriented space savings. EsXML now uses a Hybrid Byte-Aligned Format (HBAF) which is a simple, low-level mechanism in the memory access layer to pack bit-aligned data while aligning byte-sized data. This mechanism can be tuned so that very little buffering is required: 128 bytes would yield almost no overhead compared to fully bit-packing all data. HBAF simply maintains two output pointers, gliding bit-sized data to holes before recent byte-sized data until a pointer spread distance is reached, at which point the rest of the remaining hole becomes padding. This method could also be used for word aligning scalars if desired.
Because XML instances often vary in schema and characteristic, often distinguished by an envelope and content elements from differing specifications, esXML supports the use of multiple XMS instances, created from different schemas perhaps, and mode switching at each element when desired. This allows an envelope to be processed most efficiently while a content element might be most compressed relative to an XMS instance.
EsXML supports both start/stop token indication and length-prefixed data as there are instances where each can be more efficient in terms of space, buffering, and processing efficiency. In particular, higher-level elements in a large document can be length prefixed to facilitate rapidly skipping through data. To allow such length prefixing to be done without arbitrary buffering requirements, a continuation token was invented that allows length prefixing to be chunked in an efficient manner. The continuation token indicates which level of the XML tree is being continued in the next chunk of data.
EsXML was created with respect to many requirements that became those of the W3C XBC and EXI working groups. Additional requirements to directly support certain ranges of important application architectures indicate the need for pointers, deltas, random access, and, in some cases, random modification. EsXML addresses this full set of requirements by the optional addition of a Storage Layer format, in addition to the Representation Layer format described above. This Storage Layer, using a structure called "Elastic Memory", directly supports efficient processing of pointers, low level deltas (i.e. byte/bit oriented ranges), and random modification (inserts, deletes, replacement) in a way that allows an application stack to avoid parsing and serialization.
See The esXML Specification [ESXML] for more information.
The tested version of esXML does not implement schema-informed encoding, has dropped tests for a couple files, and is producing copious debugging information which drastically affects processing efficiency. The version tested operates only in byte-oriented mode, not the more compact bit-oriented compact form. No XMS, templates, or restricted character sets are turned on.
The XBC Characterization document specifies the set of minimum requirements a format must satisfy to meet W3C requirements. The chart below lists all of the candidates reviewed here in the EXI measurements document and shows which requirements each claims to satisfy. Each candidate name also links to a more detailed discussion on the table entries for that candidate. It is important to note that at this points the assessment has been performed by the candidate submitters themselves.
The XML+gzip candidate in the below table is built from technologies available currently, which means XML that is compressed with gzip when the use case allows document analysis. The candidate assessments are made on the basis of each candidate's encoding, plus generic compression, or a format specific compression scheme if the candidate has one. An example of the later for instance, is Efficient XML, which implements integrated schema-analysis and document analysis.
| Status | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Format | XML + gzip | Fast Infoset | FXDI (Fujitsu Binary) | Efficient XML | Xebu | X.694 with BER | X.694 with PER | X.694 with PER + Fast Infoset | esXML |
| Passes min. bar? | No | No | No | Yes | No | No | No | Yes | No |
| MUST Support | |||||||||
| Format | XML + gzip | Fast Infoset | FXDI (Fujitsu Binary) | Efficient XML | Xebu | X.694 with BER | X.694 with PER | X.694 with PER + Fast Infoset | esXML |
| Directly Readable & Writable | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Transport Independence | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Compactness | No | No | Yes | Yes | No | No | Yes | Yes | No |
| Human Language Neutral | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Platform Neutrality | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Integratable into XML Stack | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Royalty Free | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Fragmentable | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Streamable | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Roundtrip Support | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Generality | No | No | Yes | Yes | No | No | No | Yes | No |
| Schema Extensions and Deviations | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes |
| Format Version Identifier | Yes | Yes | No | Yes | No | No | No | Yes | Yes |
| Content Type Management | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Self-Contained | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes |
| MUST NOT Prevent (DNP = Does Not Prevent; P = Prevents) | |||||||||
| Format | XML + gzip | Fast Infoset | FXDI (Fujitsu Binary) | Efficient XML | Xebu | X.694 with BER | X.694 with PER | X.694 with PER + Fast Infoset | esXML |
| Processing Efficiency | P | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP |
| Small Footprint | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP |
| Widespread Adoption | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP |
| Space Efficiency | P | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP |
| Implementation Cost | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP |
| Forward Compatibility | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP | DNP |
Of the candidates reviewed in this draft, two claim to satisfy all the minimum W3C requirements. While any of the formats above could form the basis for a W3C standard, it must be noted that those that satisfy fewer constraints would likely require more modifications than those that already meet the minimum W3C requirements.
Each modification to a format will impact the processing efficiency of the format and may also impact other characteristics, such as compactness. For example, it is possible (but not necessary) that modifications required to meet the W3C compactness requirements may significantly reduce processing efficiency and modifications required to improve processing efficiency may impact compactness. As such, the performance characteristics of each candidate presented in this table that does not meet all requirements illustrate what is possible when certain W3C requirements are not met, but do not illustrate the performance of a format based that candidate modified to comply to the W3C requirements.
It must therefore be noted that analysis of candidates should take into account whether candidates meet all or a subset of the W3C requirements.
The full analysis of the results, being quite long, has been placed in Appendix A: Measurement Details. This section provides a summary of those results.
These summaries focus on the level of complete content density clusters and use groups, as such level of detail was considered more appropriate when the intent is not to provide all the details. It must be noted, though, that there can be variation in performance also inside such groups, and therefore a full consideration of the performance must consider the detailed analysis as well.
While Compactness is defined as a "binary" property, that each format either has or does not have, this is typically insufficient information for application. Therefore, it is also necessary to analyze the precise Compactness results as measured in the framework.
To begin with, the XBC Characterization Measurement Methodologies Note defines thresholds for whether a candidate format achieves sufficient compactness for each of the application classes identified above.