XimpleWare W3C Position Paper


jzhang@ximpleware.com
Klovette@ximpleware.com

Abstract

Despite its promise as the foundation technology for next-generation Web-based applications, XML faces many issues that have so far slowed its adoption in the contexts of Web services, B2B and EAI. Most recent efforts to replace XML with binary surrogates have been directed at the "verbosity" and/or processing performance of XML, at the expense of losing XML's human readability. XimpleWare introduces a new way of optimizing XML processing performance without compromising the "view source" principle. We describe a procedure by which one can attach the parsed state of an XML document, as a small binary file, to the original document, so that servers at the receiving end can directly access, query and update the XML document without parsing it again. Furthermore, XimpleWare's processing model can be mapped directly onto an FPGA or ASIC to achieve sustained Gigabit performance.

Position

Performance: The Bigger Problem

Although verbosity and processing performance are two oft-mentioned issues with XML, usually only one of the two is the dominant bottleneck in any particular context. Whereas verbosity matters more in a low-bandwidth, high-latency environment such as mobile computing, for a large class of applications for which available bandwidth ranges from sufficient to abundant, it is the performance of XML processing that has become the glaring limiting factor. This is in part attributable to enterprises' network infrastructure build-up over the past decade, during which the rate of bandwidth commoditization has significantly outpaced Moore's law for computing power. Various research reports and publicly available benchmarks have shown that (1) servers' XML processing throughput amounts to only a small fraction of network bandwidth, and (2) comparable applications based on XML underperform those based on proprietary middleware by orders of magnitude. For those applications, XML processing performance presents a significant barrier to adoption and is clearly the bigger problem.

Binary XML: At What Cost?

Binary XML generally refers to alternative XML Infoset serialization formats (other than text) whose goals are to optimize processing performance and/or bandwidth requirements. However, those goals are often achieved at the expense of human readability. XML is, by design, human-readable, which simplifies many aspects of XML application programming and helps lower the barriers to learning. Adopting binary XML would force XML programmers to give up the luxury of reading the wire format to quickly figure out how things work or what went wrong, and go back to the "dark ages" of CORBA and DCOM. Binary XML also often mandates the presence of a schema, which is precisely why the previous generation of distributed computing has been considered rigid and brittle. Departure from such "tight coupling" is what made the Web such a great success: a Web browser makes few assumptions about what an HTML page is supposed to look like. In other words, the coupling between a Web browser and a Web server is "loose." XML was chosen as the wire format because it brings a similar value proposition to the Web services paradigm.

Why XML Processing Is Slow

There are at least two factors contributing to the lackluster performance of XML processing, summarized as follows:
First, current XML processing inherits heavily from traditional, LEX-style text processing techniques, which require that tokenization be done by picking the original XML document apart into many string objects. This is both slow and wasteful of memory because of the operating system's inherent inefficiency in managing small memory blocks (see the sketch after these two points).

Second, modern processors are designed to be flexible, i.e., able to do many different types of tasks, but they do no one task particularly well. Many of their design features, such as the sequential execution model, deep pipelines and multi-level memory hierarchies, become liabilities for high-performance text processing, which requires a small set of operations to be performed repetitively over a large amount of data. This is an area where custom hardware, designed to perform a small set of operations at very high speed, can help.
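To make the first factor concrete, below is a minimal Java sketch of the allocation pattern described above. The file name and the deliberately crude delimiter scan are illustrative only, not an actual parser: the point is that every token materializes as its own String, so a document with many tokens triggers just as many small heap allocations.

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class StringTokenDemo {
        public static void main(String[] args) throws Exception {
            byte[] doc = Files.readAllBytes(Paths.get("po.xml")); // hypothetical input
            List<String> tokens = new ArrayList<>();
            int start = 0;
            for (int i = 0; i < doc.length; i++) {
                if (doc[i] == '<' || doc[i] == '>') {             // crude delimiter scan
                    if (i > start) {
                        // each token becomes a separate small object on the heap
                        tokens.add(new String(doc, start, i - start,
                                              StandardCharsets.UTF_8));
                    }
                    start = i + 1;
                }
            }
            System.out.println(tokens.size() + " string objects allocated");
        }
    }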

Programmers working with XML also confront the dilemma of picking the right processing model. Generally people like DOM (Document Object Model) because it offers a tree view of the XML document and is a natural and easy way to work with XML. But DOM's data-structure building is slow and quite resource-intensive, making it unsuitable for most high-performance applications. SAX (Simple API for XML) is faster and consumes less memory, but doesn't provide much structural information about an XML document. As a result, programmers using SAX often have to maintain state information manually, which can be quite tedious for a complex XML document. In light of these issues, XML luminary James Clark, in a recent interview (http://www-106.ibm.com/developerworks/xml/library/x-jclark.html?dwzone=xml), points out that one of the challenges for XML is to "Improve XML processing models. Right now, developers are generally caught between the inefficiencies of DOM and the unfamiliar feel of SAX. An API that offers the best of both is needed."
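As a hedged illustration of the state-keeping burden just described, the sketch below extracts a single value from a nested document with SAX; the element names are hypothetical. The handler has to track where it is in the tree by hand, something a DOM tree gives for free.

    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class PriceHandler extends DefaultHandler {
        private boolean inItem = false;       // manual structural state
        private boolean inPrice = false;
        private final StringBuilder price = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes a) {
            if ("item".equals(qName)) inItem = true;
            if (inItem && "price".equals(qName)) inPrice = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inPrice) price.append(ch, start, length);   // buffer text events
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("price".equals(qName)) inPrice = false;
            if ("item".equals(qName)) inItem = false;
        }
    }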

A New Processing Model

Based on the premise that server throughput is, in a typical enterprise data center setting, much slower than the available bandwidth, XimpleWare has chosen to focus its efforts on improving XML processing performance rather than reducing the bandwidth requirement. To address this challenge, we propose a new XML processing model that works as follows: the entire document is kept in memory in its original form, and a user gets a complete structural view of it by navigating a separate binary file (analogous to an index in a DBMS, it allows a DOM/XPath implementation to quickly locate the requested section of the text document), which is typically around 30%-50% of the size of the XML document. Generating the binary file in software is comparable to SAX in performance; a custom hardware implementation can achieve sustained performance of over 100 MB/sec, sufficient to keep pace with a Gigabit connection. Also, because the binary file is inherently persistent, one can attach it to the XML document so that an application server at the receiving end can process the XML data directly, without parsing it again.
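The paper does not prescribe a record layout for the binary index; as a purely illustrative sketch, each entry might record a token's position in the unmodified text, along with enough structure to reconstruct the tree view, along these lines:

    // A plausible (hypothetical) layout for one entry of the binary index:
    // tokens are recorded as offsets into the unmodified XML text, so
    // navigation never copies or re-parses the document itself.
    final class TokenRecord {
        final int offset;   // byte offset of the token in the original document
        final int length;   // token length in bytes
        final byte depth;   // nesting depth, giving the structural (tree) view
        final byte type;    // e.g. element name, attribute name, text, ...

        TokenRecord(int offset, int length, byte depth, byte type) {
            this.offset = offset; this.length = length;
            this.depth = depth; this.type = type;
        }

        // Materialize the token text lazily, only when actually needed.
        String text(byte[] doc) {
            return new String(doc, offset, length,
                              java.nio.charset.StandardCharsets.UTF_8);
        }
    }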

Its immediate advantages are those outlined above: DOM-style random access to the document at a fraction of DOM's memory cost, and a parsed state that travels with the document, so servers downstream never parse it again. In addition, we plan to move Schema validation on chip, where it can be done in parallel with parsing and incur no penalty in parsing performance. This is achievable because a Schema, in its essence, is nothing more than a finite state machine, which is well suited to hardware implementation.
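To illustrate the finite-state-machine observation with a hypothetical content model (not one taken from the paper): validating that an <item> element contains exactly a <name> followed by a <price> reduces to a handful of state transitions, and such a transition table maps naturally onto hardware logic.

    final class ItemValidator {
        enum State { START, SEEN_NAME, SEEN_PRICE, ERROR }

        // One transition per child element seen under <item>.
        static State step(State s, String child) {
            switch (s) {
                case START:     return "name".equals(child)  ? State.SEEN_NAME  : State.ERROR;
                case SEEN_NAME: return "price".equals(child) ? State.SEEN_PRICE : State.ERROR;
                default:        return State.ERROR;          // no further children allowed
            }
        }
        // Valid iff feeding the children in document order ends in SEEN_PRICE.
    }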

How does our processing model compare with DOM or SAX? In short, it is more DOM-like in that it loads everything into memory and allows DOM-style random access within the XML document, at much lower memory consumption. What's more, by keeping the XML document in its serialized form, dynamically updating or modifying the document doesn't require re-serializing the irrelevant parts, resulting in a dramatic serialization performance improvement. This is again superior to traditional text processing techniques, which require the round trip of taking the document apart and putting everything back together, regardless of the processing need.
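A minimal sketch of why updates avoid full re-serialization, reusing the hypothetical TokenRecord above: because the document stays as serialized bytes and the index holds offsets, replacing one text node is a byte-range splice; the untouched bytes are copied verbatim, never re-encoded. (A real implementation would also adjust the offsets of subsequent index entries.)

    // Splice a new value over one token's byte range; everything else is
    // copied as-is. Would live alongside the TokenRecord sketch above.
    static byte[] replaceToken(byte[] doc, TokenRecord t, byte[] newValue) {
        byte[] out = new byte[doc.length - t.length + newValue.length];
        System.arraycopy(doc, 0, out, 0, t.offset);                       // prefix, untouched
        System.arraycopy(newValue, 0, out, t.offset, newValue.length);    // new token text
        System.arraycopy(doc, t.offset + t.length, out,
                         t.offset + newValue.length,
                         doc.length - t.offset - t.length);               // suffix, untouched
        return out;
    }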

Preliminary Benchmark Performance

Our preliminary performance test was designed to compare the data-structure building of our Java-based processing solution with two similar types of processing technology: Xerces DOM and XMLCursor. The test platform was an Athlon XP 1900+ machine with 512 MB of RAM running Red Hat Linux, kernel version 2.4. The test file was a single 100 KB purchase-order document taken from BEA's XMLCursor performance benchmark package.

Processor                                       Xerces 2.3   XMLCursor   XimpleWare
Average time for building data structure (ms)   42.2         25.1        15.4

Table 1: Preliminary performance figures for data-structure building

The SAX-like performance of our processing model is by design, and the figures shown above are preliminary and subject to further optimization. We intend to provide more up-to-date results by the time of the workshop. Navigation performance, which we were still measuring at the time of submission, is on par with Xerces and XMLCursor; we intend to show those results at the workshop as well.

Potential Applications


By allowing XML processors that require optimized processing to consume the binary file efficiently, without losing the standard text file, this processing model could be particularly useful in a number of scenarios: XML message load balancers, which can boost XML processing throughput by quickly generating the binary file and attaching it to the XML document; XML database servers, which could build the binary file as part of the storage/indexing process and make it available upon retrieval to speed up downstream DOM and XPath processing; and, finally, XML intermediary applications operating at near network speed, for which the model is probably most efficient, since usually only a limited amount of update is required while most of the XML payload is left unmodified.


Q/A Section

1. What work has your organization done in this area? (We are particularly interested in measurements!)
We (XimpleWare) are focusing on optimizing XML processing performance rather than the bandwidth requirement of XML, and we achieve our goal without compromising the "view source" principle.
2. What goals do you believe are most important in this area? (e.g. reducing bandwidth usage; reducing parse time; simplifying APIs or data structures, or other goals)
We think that processing performance and API simplification are the most important goals in this area.
3. What sort of documents have you studied the most? (e.g. gigabyte-long relational database table dumps; 20-MByte telephone exchange repair manuals; 2 KByte web service requests)
Our processing model works with any XML file. For benchmarking purposes, we used XML files ranging from 10 KB to 1 MB in size containing fairly complex structures.
4. What sorts of applications did you have in mind?
Our technology is applicable wherever performance is important. Possible areas of application include XML intermediaries (firewalls, data routers, etc.), XML middleware applications, and XML database applications.
5. If you implemented something, how did you ensure that internationalization and accessibility were not compromised?
Since our processing model maintains the XML document in its original form, it goes where XML goes; no internationalization or accessibility is compromised.
6. How does your proposal differ from using gzip on raw XML?
Unlike gzip, we do not initially optimize the bandwidth requirement of XML; instead, we attach a parsed state that spares the receiving end from parsing again.
7. Does your solution work with any XML? How is it affected by choice of Schema language? (e.g. W3C XML Schema, DTD, Relax NG)
Yes. Because our technology is an innovation at the tokenization level, it works with any XML and is Schema-agnostic.
8. How important to you are random access within a document, dynamic update and streaming, and how do you see a binary format as impacting these issues?
They are very important. In terms of raw speed, SAX is much faster than DOM; however, when the overhead of state management is factored in, SAX's real-world performance can be a lot slower than its raw performance. In that regard, our binary "companion" file combines the best of both DOM and SAX.


Summary

While the processing performance of XML is a very important issue that will directly impact its adoption as a platform-independent, interoperable and open data/document encoding format, a good solution should not compromise XML's human readability, which lies at the core of its value proposition. By pioneering a hardware-accelerated XML processing model with these properties, XimpleWare hopes to help alleviate XML's performance issues, and to do so without violating the "view source" principle.