XML Binary Characterization

Editors' copy $Date: 2005/03/30 06:40:30 $ @@ @@@ 2005

This version:
Latest version:
Previous version:
Editors:
Oliver Goldman, Adobe Systems Inc.
Dmitry Lenkov, Oracle


Abstract

This document describes the processes and results of the XML Binary Characterization Working Group in evaluating the need and feasibility of a "binary XML" recommendation. It includes an analysis of which properties such a format must possess. It recommends that the W3C produce a "binary XML" recommendation and enumerates the minimum requirements which this "binary XML" recommendation must meet.

Status of this Document

This document is an editors' copy that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Table of Contents

1 Introduction
2 Background
    2.1 Definition of Binary XML
    2.2 Analysis Methodology
3 W3C Requirements
4 Use Case Requirements
5 Decision Tree Requirements
    5.1 Property Decision Tree
6 Minimum Binary XML Requirements
7 Feasibility of Binary XML
8 Conclusions
9 References


A Acknowledgments
B XML Binary Characterization Changes

1 Introduction

The W3C XML Binary Characterization Working Group (XBC WG) has evaluated a set of use cases across a variety of domains which establish the potential benefits of "binary XML". We recommend that the W3C proceed to produce a "binary XML" recommendation.

We believe such a format will not be successful if it does not maintain interoperability with XML and the family of XML-related standards. In order to preserve XML interoperability, producing a Binary XML recommendation must be a W3C activity.

The use cases have been analyzed to determine the minimum set of requirements which Binary XML must meet. This has been done by defining a single set of properties of formats and then stating use case requirements in terms of those properties. The properties can then be further understood in terms of the number of use cases which require them, their presence or absence in XML, and so forth.

Most of these required properties can be met by existing solutions. However, none of those solutions has been adopted in all or even most of the represented domains. At the same time, there are enough existing solutions to demonstrate the feasibility of creating a format which meets the majority of the requirements.

The remainder of this document contains a detailed analysis. Section 2 provides background on the notion of "binary XML" and attempts a pragmatic definition of the term. It also describes our approach to the analysis.

Sections 3 through 7 form the core of the analysis. Section 3 enumerates required properties informed by W3C architectural principles. Section 4 enumerates required properties derived from the use cases. Section 5 analyzes all collected properties based on how they would affect interoperability with XML. Section 6 consolidates the analysis of the previous three sections and contains the minimum required property list for "binary XML". Section 7 addresses the feasibility of a recommendation which might meet those requirements.

Section 8 concludes with a recommendation for proceeding with the development of "binary XML".

2 Background

The debate over "binary XML" has been commonplace since XML's inception. It arises because XML [XML 1.0] [XML 1.1] is a successful, widely adopted standard, so there are good reasons to adopt it, yet various properties of the format make it unsuitable for some uses. Some of those properties, such as a lack of terseness, are an intentional part of XML. The driving notion behind "binary XML" is generally that it would provide an equally interoperable format with a different set of properties. This would make it suitable to a different set of use cases which are not currently well-served by XML. The perception of which parts of XML a "binary XML" standard should keep or discard often varies.

2.1 Definition of Binary XML

To discuss "binary XML" requires at least a pragmatic definition of the term. For purposes of this document we define Binary XML as a format which does not conform to the XML specification yet maintains a well-defined, useful relationship with XML. By "useful" we mean that practical systems may take advantage of this relationship with little additional effort. For example, it may be useful to convert a file from XML to Binary XML.

For the remainder of this document we use the term Binary XML (without quotation) to refer to this definition. Later we further examine the relationship of Binary XML to XML.

One of the most important questions concerning Binary XML is whether a single solution can operate efficiently on a vast and uneven set of requirements, from full-fledged GUI applications using document-oriented vocabularies such as SVG on mobile devices, to high-performance data-intensive SOAP messages sent between powerful servers over broadband LANs (to take slightly caricatural examples). Indeed, the domains in which XML is or could be used cover so much ground, and the situations into which its adopters wish to bring as much of its value as possible are so diverse, that the variety of the requirements being expressed with regard to Binary XML is larger than most expect. To this end, the XBC WG has documented a cross-section of these possible requirements in its Use Cases document.

Other important considerations include whether the introduction of a Binary XML format into the core set of XML specifications defined by the W3C would be harmful to the XML ecosystem; whether leaving the definition of such a solution to other, more domain-specific, organizations would be better or worse; and whether standardizing a single Binary XML format would be more or less valuable than not doing so.

The goals of this document are therefore to answer the following questions:

  • Do the use case requirements justify either the development or adoption of a Binary XML recommendation? There is a cost to developing Binary XML. Justifying this cost means that the requirements would have to suggest a format substantially different from XML.

  • Is it possible to create a recommendation which reasonably addresses these requirements? It will be of little value to suggest working on something which is not feasible or something so complex as to be unusable.

2.2 Analysis Methodology

The efforts of the working group proceeded roughly as follows.

We began by documenting a set of use cases [XBC Use Cases] which benefit from the use of a widely standardized format. In most of these use cases XML has either been considered or adopted but not found satisfactory. Many of these use cases are themselves aggregations of many more distinct applications within their own domains. We ultimately concluded that a Binary XML standard should address all of the use cases we identified.

We then analyzed the requirements of each use case and, by comparing these requirements across use cases, created a set of properties [XBC Properties]. Each of these is either a property a format may possess (e.g., a format may be compact) or a property of software implementations which process the format (e.g., a format may permit implementations with a small code footprint).

We next used these as input to answer the first question put to the group, that is, do the requirements justify the development of a Binary XML standard? In other words, would the properties which this format would have to possess differ significantly from XML yet justify the cost of developing such a standard? We derived these properties by:

  • Determining what properties the format would, by virtue of being a W3C standard, need to possess.

  • Determining which properties the use case requirements suggested a Binary XML standard should support.

  • Determining which properties would or would not permit Binary XML to maintain a well-defined, useful relationship with XML.

Beginning in Section 3 we use the keywords MUST, MUST NOT, SHOULD, and SHOULD NOT when stating requirements on Binary XML. When these words appear in this document they are to be interpreted as described in [RFC 2119].

Finally, we addressed the feasibility of the resulting set of requirements by looking at existing standards and the expertise of the members of the XBC WG.

3 W3C Requirements

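As a chartered activity of the W3C, we thought it essential that our recommendation conform to the architectural principles and best practices laid down by the W3C [WWWArch]. As such, Binary XML MUST support the properties required for this conformance. The properties in this category are:

  • Transport Independence

  • Human Language Neutral

  • Platform Neutrality

  • Integratable into XML Stack

  • Royalty Free

  • Content Type Management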

4 Use Case Requirements

We reviewed each property in the context of each use case and assigned it to one of four categories:

Must have: This is the set of properties which must be supported for a format to be adopted in the use case domain. This is intended to be a high bar in that an unsupported must have property would not simply make a format undesirable but actually unusable.

Should have: This is the set of properties which are important, but not critical, to the use case. A format which did not support should have properties would be significantly less desirable than one that did. However, formats not supporting "should have" properties would still be usable for that use case.

Nice to have: This is the set of properties which are not important, but supporting them brings some benefit to the use case. However, the benefit is generally minor and would be traded off to support should have or must have properties for that use case.

Irrelevant: The property is generally irrelevant to the use case. However, if the inclusion of the property in the format prohibits a must have, should have, or nice to have property then it is undesirable.

The XBC Use Cases document provides a description, for each use case, of its must, should, and nice to have properties. Irrelevant properties are omitted from the descriptions and must be determined by their absence.

It is helpful to view the results of this exercise using the following chart. This chart shows, for each property, the number of use cases which rank it must have, should have, or nice to have. The number of use cases ranking a property as irrelevant is not explicitly indicated.

[Figure: Bar chart showing the number of use cases requiring each property]

The chart is sorted first by the number of use cases for which a property is a must have, then by the number of use cases for which a property is a should have, and finally by the number of use cases for which a property is a nice to have.
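The tally and sort order described above can be sketched in a few lines of Python. This is only an illustration: the rankings below are invented, the function names are hypothetical, and descending order is an assumption, since the text states the sort keys but not the direction.

```python
from collections import Counter

# Hypothetical rankings (use case, property, category). The counts are
# made up for illustration and are not the Working Group's data.
RANKINGS = [
    ("use-case-a", "Compactness", "must"),
    ("use-case-b", "Compactness", "must"),
    ("use-case-a", "Processing Efficiency", "must"),
    ("use-case-b", "Processing Efficiency", "should"),
    ("use-case-c", "Processing Efficiency", "should"),
    ("use-case-a", "Small Footprint", "must"),
    ("use-case-c", "Small Footprint", "should"),
    ("use-case-b", "Signable", "should"),
    ("use-case-c", "Signable", "nice"),
]


def demand_table(rankings):
    """Count must/should/nice rankings per property.

    Irrelevant rankings are not recorded, mirroring the chart, which
    does not show them explicitly.
    """
    counts = {}
    for _use_case, prop, category in rankings:
        counts.setdefault(prop, Counter())[category] += 1
    return counts


def chart_order(counts):
    """Order properties by must, then should, then nice counts.

    Descending order is an assumption made for this sketch.
    """
    return sorted(
        counts,
        key=lambda p: (counts[p]["must"], counts[p]["should"], counts[p]["nice"]),
        reverse=True,
    )


if __name__ == "__main__":
    for name in chart_order(demand_table(RANKINGS)):
        print(name)
```

Using a tuple as the sort key keeps the three-level ordering in one place, so a tie on must have counts falls through to should have and then to nice to have.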

We began the analysis by including all properties designated as must have for at least one use case. This was the minimum bar which still permitted all use cases to be addressed. While we were prepared to eliminate use cases and thereby eliminate additional requirements if necessary, further analysis showed this was not necessary. In part, this is because some requirements listed here were eliminated by the decision tree applied in Section 5, below. This list, therefore, continues to include each property which is a must have for at least one use case.

5 Decision Tree Requirements

We believe Binary XML must have a well-defined and useful relationship with XML. The inclusion of the property Integratable into XML Stack in 16 of the 18 use cases supports this belief. This position can also be seen as enhancing interoperability not only within the Binary XML community and XML community but between them as well.

Some proposals for Binary XML, such as the use of GZIP [GZIP], operate directly on XML, thus preserving it byte-for-byte. (The use of GZIP as a content encoding or transfer encoding applicable to various types of content including XML is standardized by HTTP 1.1 [HTTP 1.1].) However, the importance of Directly Readable and Writable, Compactness, and Processing Efficiency suggests that any viable candidate must support all three. These three properties cannot be achieved with any approach such as GZIP which requires creating an XML representation as an intermediate step in creating Binary XML. This places an upper bound on how tightly XML and Binary XML can be coupled.
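The intermediate step can be made concrete with a small Python sketch built on the standard `gzip` and `xml.etree.ElementTree` modules. The record structure and the function names `encode` and `decode` are hypothetical and chosen only for illustration. The point is that the data must first be serialized to XML text before it can be compressed, and restored to XML text before it can be parsed.

```python
import gzip
import xml.etree.ElementTree as ET


def encode(records):
    """Serialize records to XML, then compress the XML bytes with GZIP."""
    root = ET.Element("records")
    for name in records:
        ET.SubElement(root, "record").text = name
    xml_bytes = ET.tostring(root)  # intermediate XML representation
    return xml_bytes, gzip.compress(xml_bytes)


def decode(compressed):
    """Decompress back to XML bytes, then parse the XML."""
    xml_bytes = gzip.decompress(compressed)  # XML text is restored first
    root = ET.fromstring(xml_bytes)
    return xml_bytes, [record.text for record in root.findall("record")]


if __name__ == "__main__":
    original_xml, packed = encode(["alpha", "beta"])
    restored_xml, names = decode(packed)
    # GZIP preserves the XML byte-for-byte.
    print(restored_xml == original_xml, names)
```

Neither function can read or write the compressed form without passing through the XML serialization, which is why such an approach does not provide the Directly Readable and Writable property.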

The working group observed that, of the set of properties indicated in the previous section, some had to be "intrinsic" properties of a new format but others could be obtained via other means. For example, if a Binary XML format is Integratable into XML Stack then it could be signable by virtue of integration with the XML Signatures [XMLDSig] recommendation and without defining a new signature mechanism specific to Binary XML.

The working group addressed this issue in a systematic way for each property by applying the following decision tree to determine whether a property should be made a part of Binary XML or addressed in some other fashion.

The use of this decision tree should not be taken as a recommendation to either change or not change other recommendations in the XML stack. Rather, it recommends parity between XML and Binary XML. Subsequent recommendations in the XML stack, whether new or revised, should address both XML and Binary XML equally. For example, a revision to XML Signatures should define a canonicalization algorithm which does not require a conversion to XML, since such a conversion negates many benefits of Binary XML in some use cases.

5.1 Property Decision Tree

Does XML support the property directly?

  1. Yes. The Binary XML format should directly support the property.

  2. No. Does XML support this property when combined with other recommendations in the XML stack?

    1. Yes. Binary XML should work with the other recommendations in the XML stack.

    2. No. Is it feasible for XML to support this property?

      1. Yes. The property should be addressed by a general approach (e.g., new recommendation) that works for both XML and Binary XML.

      2. No. The property should be directly supported by Binary XML.
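The decision tree above can be written as a short function. The following Python sketch is illustrative only: the names `Outcome` and `classify` and the three boolean inputs are hypothetical, and answering each question for a real property remains a judgment call that the code does not make.

```python
from enum import Enum


class Outcome(Enum):
    DIRECT_IN_BINARY_XML = "supported directly by Binary XML"
    VIA_XML_STACK = "Binary XML works with other recommendations in the XML stack"
    GENERAL_APPROACH = "general approach that works for both XML and Binary XML"


def classify(xml_direct: bool, xml_with_stack: bool, feasible_for_xml: bool) -> Outcome:
    """Walk the three questions of the decision tree in order."""
    if xml_direct:
        # XML supports the property directly.
        return Outcome.DIRECT_IN_BINARY_XML
    if xml_with_stack:
        # XML supports it when combined with other recommendations.
        return Outcome.VIA_XML_STACK
    if feasible_for_xml:
        # XML could feasibly support it, so solve it once for both formats.
        return Outcome.GENERAL_APPROACH
    # XML cannot feasibly support it, so Binary XML must.
    return Outcome.DIRECT_IN_BINARY_XML


if __name__ == "__main__":
    # Signing, as discussed earlier in this section: not part of XML itself,
    # but available through XML Signatures.
    print(classify(xml_direct=False, xml_with_stack=True, feasible_for_xml=True))
```

Note that the first and last branches lead to the same outcome: a property is built into Binary XML either because XML already has it or because XML never could.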

The decision tree divides properties into a number of categories. The following properties are those which the decision tree suggests Binary XML MUST support:

Based on the decision tree we determined that the following properties should be addressed by separate technologies designed to work with both XML and Binary XML. Some of these technologies, such as for signatures and encryption, already exist. These properties therefore fall into the category of properties which Binary XML SHOULD NOT support but which Binary XML also SHOULD NOT prevent separate technologies from addressing:

6 Minimum Binary XML Requirements

The results of the analysis in the three preceding sections were combined to determine the minimum requirements on Binary XML. A property appears in the minimal requirements because either:

  1. It is included in the list required of us as a W3C activity, or

  2. It is a must have property for at least one use case and should, per the decision tree, be supported directly by Binary XML.

All properties met the second test. Those which also met the first have been annotated as such in the table.
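The two inclusion tests can be expressed as a simple set computation. The sketch below is a hedged illustration in Python: the function name `minimum_requirements`, its parameters, and the sample inputs are all hypothetical and do not reproduce the Working Group's data.

```python
def minimum_requirements(w3c_required, must_have_counts, direct_per_tree):
    """Apply the two inclusion tests and annotate W3C-derived properties.

    w3c_required: properties required of any W3C activity (test 1).
    must_have_counts: property -> number of use cases ranking it must have.
    direct_per_tree: properties the decision tree assigns to Binary XML itself.
    Returns a mapping of property -> True if it is also on the W3C list.
    """
    from_use_cases = {
        prop
        for prop, count in must_have_counts.items()
        if count >= 1 and prop in direct_per_tree  # test 2
    }
    selected = set(w3c_required) | from_use_cases
    return {prop: prop in w3c_required for prop in sorted(selected)}


if __name__ == "__main__":
    # Invented sample inputs, for illustration only.
    result = minimum_requirements(
        w3c_required={"Royalty Free"},
        must_have_counts={"Roundtrip Support": 3, "Signable": 2, "Royalty Free": 1},
        direct_per_tree={"Roundtrip Support", "Royalty Free"},
    )
    for prop, is_w3c in result.items():
        print(prop, "(W3C)" if is_w3c else "")
```

In this invented example, Signable is a must have for two use cases but is excluded because the decision tree does not assign it to Binary XML itself.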

Five of these properties are either algorithmic properties or additional considerations related primarily to a Binary XML processor implementation. Binary XML MUST NOT prevent an implementation from achieving these properties. However, an implementation might reasonably choose not to attain one of these properties even for a format which permits it. For example, an implementation may elect to trade off achieving Small Footprint for further improvements in Processing Efficiency. These five properties are therefore stated as MUST NOT Prevent requirements instead of MUST Support requirements.

MUST Support

  • Directly Readable and Writable
  • Transport Independence (W3C)
  • Human Language Neutral (W3C)
  • Platform Neutrality (W3C)
  • Integratable into XML Stack (W3C)
  • Royalty Free (W3C)
  • Widespread Adoption
  • Roundtrip Support
  • Schema Extensions and Deviations
  • Format Version Identifier
  • Content Type Management (W3C)
  • Self Contained
  • Forward Compatibility

MUST NOT Prevent

  • Processing Efficiency
  • Small Footprint
  • Space Efficiency
  • Implementation Cost

7 Feasibility of Binary XML

For Binary XML to be a widely accepted standard it should successfully address a wide and varied range of problems that may involve different, and occasionally conflicting, requirements. This quality has been captured by the Generality property, which is a property Binary XML MUST support. As the Use Cases document shows, XML does not achieve the desired degree of generality, i.e., it is not compact enough for some applications, it prevents certain degrees of efficiency for others, it lacks certain features for yet others, etc.

However, it would be of little value to recommend the creation of Binary XML if it is not feasible to balance these many constraints. In evaluating the feasibility of a single standard Binary XML format the working group relied on the following two factors:

The table below characterizes existing formats against the list of required properties identified in Section 6 of this document. The characterization of each format was submitted by a particular company or organization represented on the Working Group. Each company or organization is identified by a number in the table; after the table, each number is associated with information about the company or organization that submitted the corresponding format. The Working Group has not done any measurements on submitted formats. Their placement in the table is in no way an endorsement of any of these formats as being appropriate for a standardization activity.

8 Conclusions

The XBC WG developed 18 extensive use cases and documented 38 different format properties and considerations which those use cases might require. The sheer number of requirements has suggested to some that either Binary XML is not achievable or, in an attempt to satisfy too many requirements, is destined to collapse under its own weight.

After conducting our analysis, the WG is confident that creation and adoption of Binary XML can be reasonably achieved. Of the 38 properties, 17 (nearly half) did not make the minimum requirements list. Six of those that remain are derived from W3C architectural principles and would presumably be required of any W3C recommendation.

The decision tree-based analysis suggests another 11 properties which, if addressed, should be addressed in the XML stack and not as part of Binary XML itself. While a complete set of requirements for Binary XML is outside the scope of this document, it is clear that development of a Binary XML recommendation would not need to satisfy all 38 properties to achieve success.

This achievable minimum requirements list will address the must have requirements of all use cases. While excluding certain use cases might have made Binary XML yet more achievable, it would also have made it less widely applicable.

In conclusion:

Binary XML is needed. Working Group domain experts have collected and examined a comprehensive set of use cases which establish this need for Binary XML. The use cases lay out the properties Binary XML must possess in order to be successful. Formats which possess these properties are being adopted now within the represented domains.

Binary XML is feasible. The number of required properties determined to be must haves for adoption by the use cases is less than half of the nearly forty properties identified. Evaluation of existing approaches has shown that there is at least one format capable of implementing all the required properties.

The W3C must produce Binary XML. Many of the represented domains are already adopting Binary XML formats. In order to preserve XML interoperability and to prevent the establishment of multiple, incompatible binary formats, producing a standard Binary XML must be a W3C activity.

Binary XML must integrate with XML. The required properties make it clear that Binary XML must integrate with the existing XML stack and not require changes to XML itself. Binary XML will significantly widen the domains to which XML expertise and software will apply.

9 References

XBC Use Cases
XML Binary Characterization Use Cases (See http://www.w3.org/TR/xbc-use-cases/.)
XBC Properties
XML Binary Characterization Properties (See http://www.w3.org/TR/xbc-properties/.)
XBC Measurement Methodologies
XML Binary Characterization Measurement Methodologies (See http://www.w3.org/TR/xbc-measurement/.)
XML 1.0
Extensible Markup Language (XML) 1.0 (See http://www.w3.org/TR/REC-xml/.)
XML 1.1
Extensible Markup Language (XML) 1.1 (See http://www.w3.org/TR/xml11/.)
XMLDSig
XML-Signature Syntax and Processing (See http://www.w3.org/TR/xmldsig-core/.)
WWWArch
Architecture of the World Wide Web, Volume One (See http://www.w3.org/TR/webarch/.)
GZIP
GZIP file format specification version 4.3 (See http://www.ietf.org/rfc/rfc1952.txt.)
HTTP 1.1
Hypertext Transfer Protocol -- HTTP/1.1 (See http://www.ietf.org/rfc/rfc2616.txt.)
RFC 2119
Key words for use in RFCs to Indicate Requirement Levels (See http://www.ietf.org/rfc/rfc2119.txt.)

A Acknowledgments

The editors would like to thank the many contributors from the working group. Special thanks go to Robin Berjon and Mike Cokus.

B XML Binary Characterization Changes

2005-03-28 OG Editorial changes from John.
2005-03-23OG Changes from 3/23 telecon. Updated property demand chart with new data from updated use cases. Revised various property lists.
2005-03-20OG Added abstract. Introduce RFC 2119 and use RFC 2119-defined keywords when making recommendations. Update definitions of property categories to match Properties document. Reference HTTP, not SVG, for definition of GZIP-based content transfer. Clarify "Must not prevent" category. Re-introduce Dmitry's introductory text in chapter 6, plus address comments. Update list of references.
2005-03-16OG Various small changes from emails. Fix many links, update references. New feasibility text from Dmitry.
2005-03-09OG Most changes from the Boston F2F.
2005-02-17OG Significant update incorporating text for all major sections. Did not bring properties list or references up-to-date.
2004-12-20OG Removed "Format Characterizations" section, which is replaced by corresponding text in the Measurements document. Add introductory text in Background from Robin. Add decision tree and related text. Add use case and property analysis charts.
2004-11-18 OG Initial draft. Outline only.