Shankar Pal, Jonathan Marsh, Andrew Layman
Microsoft Corporation
{shankarp, jmarsh, andrewl}@microsoft.com
XML
1.0 [1] uses a text representation and has been very
successful as a portable, platform-independent format for data exchange. The format, like most text formats such as
SGML [2] and HTML [3], frequently uses many more bits to encode an element
of information than the theoretical minimum. This fact has led to the idea that
XML could benefit from having an alternate encoding format, a “binary XML,”
that takes fewer bits for the same data and thereby saves transmission
bandwidth or other costs such as parsing time.
The
advantages of the single text form are well-known: It is a ubiquitous format
that all data can be rendered into and that all XML parsers can process,
thereby enabling universal portability of the
data representation. The singularity of the format precludes splintering
into multiple dialects with eventually differing meanings. It is human readable
without the need for special viewers and editors. A wide range of
text-processing tools are available for its manipulation, again without
requiring any special, additional filters or converters.
Against this we need to weigh the potential for reducing the size of the encoding or improving other performance-related measures.
This paper presents evidence that some benefits (such as smaller size) claimed for a binary XML can often be achieved better by other means. More
fundamentally, it argues that attempting to create a standard “binary” form of
XML will lead to either great complexity or a plurality of new forms of XML,
inconsistent with what we understand the goal of XML to be, namely a single,
simple, ubiquitous form for portable exchange of data.
In this section, we consider factors that need to be addressed for
a binary XML standard.
There are no known standards that address only performance requirements. This raises a fundamental, philosophical question of whether a binary representation of XML is an appropriate candidate for standardization. Innovation should be left to vendors, and the convergence of their solutions should lead to a de facto standard. If this does not happen, then imposing a standard upon vendors and users cannot solve the problem either.
A
new, standardized binary XML adds complexity to the XML space. Specifically, it
increases the barrier to entry for working with XML since vendors and users
will have to support two different formats (text and binary) to perform the
same task. Vendors and users already have to deal with the increasing complexity of the XML family of technologies. W3C should find ways to simplify the XML landscape, not impose specialized solutions on the XML community. It is also conceivable that an innovative idea could produce a more efficient binary representation than the standard allows, in which case the standard would only stifle its adoption.
The binary representation must preserve, or be isomorphic to, the Infoset [4] content of the XML value, since this is important for portability. It should not require XML Schema [5] or DTD association for generating the binary encoding. A binary DOM format is not attractive since it does not meet the goal of being isomorphic to the Infoset.
When a schema exists for the data, it should be used for optimizations. For obvious reasons, our preference is for an XML Schema-based approach (rather than a DTD-based one). The post-schema-validation Infoset (PSVI) [5], which adds type information to the Infoset, should
be encoded in the binary representation. This can improve parsing speed
significantly.
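As a rough illustration of how PSVI type information can help, the sketch below (our own Python example, not part of any proposal) encodes a value known from the schema to be an xs:int as four fixed-width bytes, so the receiver recovers a typed value without scanning and converting character data.

    import struct

    def encode_typed_int(value: int) -> bytes:
        # xs:int written as a 4-byte signed integer; the byte order is an arbitrary choice here.
        return struct.pack(">i", value)

    def decode_typed_int(data: bytes) -> int:
        return struct.unpack(">i", data)[0]

    encoded = encode_typed_int(1234567)          # 4 bytes instead of 7 text characters
    assert decode_typed_int(encoded) == 1234567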
One downside of the Infoset-based encoding is that other specifications may change the Infoset (as XML Schema did), and this information would also have to be carried over into the binary format, resulting in continual maintenance of the standard.
One or more criteria could be used as guiding principles for the binary representation. Some of them can be satisfied individually (e.g. low memory footprint or low parse time), but it is impossible to optimize them all at the same time; a global optimum is virtually impossible to find. The following are the most interesting criteria to pursue.
A
binary representation generally has a smaller memory footprint than text XML. Consequently,
it is useful for communicating with small devices such as PDAs and cell phones, and it benefits low-bandwidth connections as well. On a bigger machine,
such as a server, the emphasis shifts to better usage of bandwidth. A server
can exchange more information with clients using the same number of bytes.
Text XML can be compressed using techniques such as
gzip [6] or XMill [7] before transmission on the wire. These can achieve
very good compression. The compressed representation must be decompressed back into text XML on the receiving side before being consumed, so two passes over the data are required for parsing; this conflicts with optimizing for low parse time.
These formats are suitable when a high compression ratio is required (e.g. low
bandwidth connection) and generation and parsing costs are less of a concern
(e.g. the data is archived and not often queried). On a handheld device, it
becomes a tradeoff between smaller memory footprint and higher parsing cost,
which consumes more power.
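A minimal sketch of this pipeline, with made-up document content: the text XML is gzip-compressed before transmission, and the receiver must decompress it back to text before an ordinary parser can consume it, which is the extra pass discussed above.

    import gzip
    import xml.etree.ElementTree as ET

    xml_text = "<orders>" + "<order id='1'><total>9.99</total></order>" * 100 + "</orders>"

    compressed = gzip.compress(xml_text.encode("utf-8"))     # sender compresses text XML
    restored = gzip.decompress(compressed).decode("utf-8")   # receiver decompresses first ...
    root = ET.fromstring(restored)                           # ... then parses as ordinary text XML

    print(len(xml_text), "->", len(compressed), "bytes")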
These compression techniques do not allow streaming
of XML data. That is, the whole XML must be compressed and decompressed, which
makes the overall process costly and infeasible in some scenarios (e.g. small
devices). Gzip and XMill allow chunking, which mitigates the issue to a large
extent. Streaming is, however, useful for the scalability of a data server. With small amounts of data, it may be feasible to store the binary representation and scan it multiple times, even on small devices. However, this approach is infeasible when the data is large and single-pass parsing is desired (e.g. for displaying data). The primary benefit of streaming is the lower memory requirement for parsing/generation of XML, which is a different requirement from the small
footprint of the binary representation.
Often, the gain from hardware-based network compression (e.g. MNP-5) is significant, which dilutes the need for a binary XML representation.
If the binary representation must be stored in files or inside a database server, then optimizing for a low memory footprint may be appropriate. This
reasoning also applies to data caching and messaging applications. Low memory
footprint is desirable when storage and retrieval are the predominant
operations. The increased processing cost may be more than offset by the saving
in the I/O cost. The thresholds are unpredictable, however, so beyond low-bandwidth situations a low-memory-footprint solution is probably not practical.
An
important measure is the parsing cost of the binary form compared to the text
representation. Parsing the binary form can be an order of magnitude faster, or even better,
which saves a great deal of power on small devices. Binary XML parsers can be
more complex than their text counterparts depending upon the complexity of the
encoding, although they can be designed to be as simple as text XML parsers
with a different set of goals in mind. Similarly, the generation of the binary
representation (from, say, tabular data) can be costly depending upon the
encoding, or take roughly the same time as text XML depending upon the factors
being optimized. Parsing
and generation are strongly correlated; when one cost goes up, so does the
other.
Lower parsing/generation cost requires a fairly
simple binary representation that includes enough information to aid efficient
parsing. One such representation is obtained by creating a map from element and
attribute names to numbers. This yields good compression when element and attribute names are long and occur many times in the XML value. What is left is the data, whose binary values are encoded in the binary stream if the schema is known. Furthermore, there is no need for entity resolution or whitespace normalization, which contributes to faster parsing.
It is not hard to imagine that a binary parser can be simpler than a text one.
Low parse/generation cost optimization may not yield much compaction of the
binary representation, and conflicts with optimizations for small footprint.
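The sketch below is our own toy illustration of such a name map; the event model and layout are assumptions, not a concrete format. Element and attribute names are replaced by small integer codes, and the name table travels once with the stream.

    def tokenize_names(events):
        """events: sequence of (kind, name) pairs, e.g. ('start', 'customer')."""
        name_table = {}    # name -> small integer code, sent once with the stream
        stream = []        # encoded stream of (kind, code) pairs
        for kind, name in events:
            code = name_table.setdefault(name, len(name_table))
            stream.append((kind, code))
        return name_table, stream

    events = [("start", "order"), ("attribute", "id"), ("start", "total"),
              ("end", "total"), ("end", "order")]
    table, stream = tokenize_names(events)
    # Long, repeated names cost one table entry plus one small code per occurrence.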
Random access during forward-only parsing can yield
significant speedup in some scenarios (e.g. execution of XPath expressions).
Additional structures must be encoded, which increases generation time and
slows down parsing of the whole XML value. True random access (i.e. not forward-only parsing) causes an increase in the size of the XML and penalizes modifications of larger XML values. Thus, there are tricky tradeoffs to be made about how much to
speed up random access at the cost of slowing down parse/generation of the
whole XML value, since this optimization is largely determined by the workload.
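To make the tradeoff concrete, here is a hypothetical sketch (the record layout is invented for illustration): an auxiliary offset index lets a reader jump to a named part of the encoded value without scanning what precedes it, at the cost of extra bytes and extra work at generation time.

    def build_index(records):
        """records: list of (element_name, encoded_bytes); returns payload plus index."""
        payload = bytearray()
        index = {}                            # element name -> (offset, length)
        for name, blob in records:
            index[name] = (len(payload), len(blob))
            payload += blob
        return bytes(payload), index

    payload, index = build_index([("header", b"\x01\x02"), ("body", b"\x03" * 64)])
    offset, length = index["body"]
    body = payload[offset:offset + length]    # jump straight to "body", skipping "header"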
Parsing/generation speed is a suitable criterion
from a server’s viewpoint. A web or database server sends data out in chunks
and fast generation speed leads to greater scalability. Buffering the entire
data might be OK for small amounts of data, but for large data transfers, the
server’s scalability would be poor.
Many client-side applications prefer faster parsing speed, especially for visual rendering. A low memory footprint might be
desirable if the data is cached at the client and may not be used (e.g. the
user does not proceed beyond the first result of a search query). Thus, the
optimization criterion heavily depends upon the application, and there is no
one solution to make everyone happy.
It is possible to use ingenious techniques to extract the last drop of compression from the binary representation. Typically, the greater the compression, the greater the parsing time. Beyond a certain
point, the parsing/generation cost outweighs the benefits. This point is not
clearly demarcated, and it is very hard to strike the right balance between
these conflicting requirements.
If
the sender and the receiver each have the XML schema used in the encoding, and
the XML schema is strict (no open content), further optimizations are possible.
Consider the case in which tabular data is converted into XML — only the data
needs to be encoded. This can yield very good compression ratios. Furthermore,
the parser will be a validating one, so that additional functionality gets
built in.
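A hedged sketch of this data-only idea under an assumed schema (the column list and types below are our own illustration, agreed out of band by sender and receiver): only the values travel, and the markup is implied by the shared schema.

    import struct

    # Column names and struct type codes agreed out of band (the "schema"); invented for illustration.
    SCHEMA = [("id", "i"), ("qty", "i"), ("price", "f")]
    ROW_FORMAT = ">" + "".join(code for _, code in SCHEMA)

    def encode_row(row):
        return struct.pack(ROW_FORMAT, *(row[name] for name, _ in SCHEMA))

    def decode_row(data):
        values = struct.unpack(ROW_FORMAT, data)
        return dict(zip((name for name, _ in SCHEMA), values))

    wire = encode_row({"id": 7, "qty": 3, "price": 9.99})   # 12 bytes, no tags on the wire
    assert decode_row(wire)["id"] == 7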
The
benefits are large for large amounts of data, and applications can build in
data-only compression. For example, this approach is suitable for web services
in which a WSDL [8] is the point of agreement — the encoded data and the
WSDL together are adequate to decode the data. The WAP binary XML protocol [9] falls into this class (although it is DTD based).
Individual vendors can also provide such solutions. The encoding by itself is
no longer self-describing. Thus, this approach is at odds with portability of
data and works on a single “platform”.
The
schema-driven approach is suitable for inter-process data exchange in which the goal is extensibility of the component architecture rather
than portability of data. By changing the schemas, a different behavior is
obtained from the system. This, however, is an internal need of a specific
application and platform, and outside the purview of a standardization process.
Different
applications benefit from different optimizations for the binary
representation. A server wants faster generation speed, a mid-tier server emphasizes portability of data, and a client desires a small memory footprint over
slow connections. Trying to combine all these requirements and optimizations
into a single, standardized format may eliminate the performance benefits expected
from binary XML.
Multiple
binary representations could be allowed within the standard. This can be done in a way similar to the "encoding" declaration in XML 1.0. There needs to be a standard set of encodings allowed in binary representations, each optimizing one or more facets; the format also has to handle all character encodings of XML for internationalization (I18N). Each encoding is designed to benefit an important and widely used
class of applications. Within one encoding, a different encoding for a subpart
of the content should be avoided. Thus, at database servers, the encoding that optimizes parse/generation speed might be more prevalent than the small-footprint encoding used by clients.
The
net result will be that the sender gets to choose the format to generate while
the receiver must be able to decode multiple representations. Each side must be
prepared to receive and process all binary encodings in addition to XML 1.0,
thereby adding to the complexity of the software.
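The following sketch makes that added complexity tangible; the format identifiers and binary decoders are hypothetical placeholders, not real encodings. Every receiver ends up carrying a dispatcher over text XML 1.0 plus every allowed binary encoding.

    import xml.etree.ElementTree as ET

    def decode_text_xml(payload: bytes):
        return ET.fromstring(payload.decode("utf-8"))

    def decode_small_footprint(payload: bytes):
        raise NotImplementedError("hypothetical binary decoder #1")

    def decode_fast_parse(payload: bytes):
        raise NotImplementedError("hypothetical binary decoder #2")

    # The receiver must be prepared for every allowed encoding, not just the one it prefers.
    DECODERS = {
        "xml-1.0": decode_text_xml,
        "binary-small-footprint": decode_small_footprint,
        "binary-fast-parse": decode_fast_parse,
    }

    def decode_message(format_id: str, payload: bytes):
        if format_id not in DECODERS:
            raise ValueError("unsupported encoding: " + format_id)
        return DECODERS[format_id](payload)

    doc = decode_message("xml-1.0", b"<ok/>")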
Big-endian and little-endian encodings are conflicting requirements. There cannot be agreement on one over the other, and the standard will have to allow both.
On a compatible platform, both sender and receiver
can use the same encoding, and both derive performance benefits. However, in a
cross platform scenario, the performance benefits at the receiver might vanish.
Text XML runs into a similar issue with Unicode strings, which can be handled without much trouble, so byte order may not be a major issue.
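A small sketch of the issue (the choice of a big-endian wire format here is purely an assumption): a receiver on the "wrong" platform either misreads the bytes or pays for a byte swap on every value.

    import struct
    import sys

    value = 0x12345678
    wire = struct.pack(">I", value)             # suppose the wire format is big-endian
    naive = struct.unpack("=I", wire)[0]        # native-order read: wrong on little-endian hosts
    correct = struct.unpack(">I", wire)[0]      # explicit byte order: correct everywhere, at the cost of a swap

    print(sys.byteorder, hex(naive), hex(correct))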
For
different classes of applications, the criterion (minimize footprint or
minimize parse/generate time) for the binary representation is different and
often conflicting. There is no single criterion that optimizes all
applications. Consequently, a binary standard could result in a suite of
allowable representations that clients and servers must be prepared to receive
and process. This is a retrograde step from the portability goals of XML 1.0.
Furthermore, the optimal binary representation depends on the machine and OS architectures on each end; translating between binary representations negates much of the advantage that binary XML has over text.
We
need viable ideas to simplify the picture instead of muddying the waters by introducing more guidelines. We second the technical concerns voiced by Chris Lilley [10]. Furthermore, the standardization work could go on for years, and the ensuing standard could become a burden on vendors and stifle innovation. It is quite likely that factors other than those
addressed in this paper are equally important for some classes of applications.
Until these are clearly understood, it is prudent to continue innovating to find solutions that benefit the community at large.
We generally assume an all-or-nothing model in which
either the entire XML value is converted into a binary representation or it is
not converted at all. There may be intermediate solutions. For example, an interleaved text and binary format that preserves Infoset content, in which large blobs of data (e.g. pictures) are sent as binary attachments, is a more promising direction to follow. Its main strength remains portability, and it can improve parsing speed sufficiently to
satisfy a large class of applications.
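As a hedged sketch of this interleaved direction (the cid: convention and container layout are our own illustrative assumptions, not an existing standard), the document stays text XML while a large opaque blob travels as a separate binary part referenced from the markup.

    # Container layout invented for illustration only.
    xml_part = (
        "<report>"
        "<title>Quarterly summary</title>"
        "<chart href='cid:chart-1'/>"            # reference to a binary attachment
        "</report>"
    )
    attachments = {"chart-1": b"\x89PNG\r\n\x1a\n" + b"\x00" * 32}   # raw image bytes

    message = {"xml": xml_part, "parts": attachments}
    # A receiver parses message["xml"] with any ordinary XML parser and resolves the
    # cid: reference against message["parts"], so the large blob never needs base64
    # encoding and the XML itself stays fully portable text.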
1. "Extensible Markup Language (XML) 1.0 (Second Edition)", October 2000. URL: http://www.w3.org/TR/REC-xml.
2. "Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)", 1986. URL: http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=16387.
3. "HTML 4.01 Specification", December 1999. URL: http://www.w3.org/TR/html401.
4. "XML Information Set", October 2001. URL: http://www.w3.org/TR/xml-infoset.
5. "XML Schema", May 2001. URL: http://www.w3.org/TR/xmlschema-0 {-1, -2}.
6. gzip compression. URL: http://www.gzip.org/.
7. "XMill: An Efficient Compressor for XML", Hartmut Liefke and Dan Suciu, in Proceedings of SIGMOD, pp. 153-164, 2000. URL: http://www.research.att.com/sw/tools/xmill.
8. "Web Services Description Language (WSDL) 1.1", March 2001. URL: http://www.w3.org/TR/wsdl.
9. "WAP Binary XML Content Format", June 1999. URL: http://www.w3.org/TR/wbxml.
10. "Binary XML problem statement", Chris Lilley, Feb 2003. URL: http://lists.w3.org/Archives/Public/www-tag/2003Feb/0224.