W3C

Efficient XML Interchange (EXI) Impacts

W3C Working Draft 03 September 2008

This version:
http://www.w3.org/TR/2008/WD-exi-impacts-20080903
Latest version:
http://www.w3.org/TR/exi-impacts/
Editor:
Jaakko Kangasharju, University of Helsinki

This document is also available in these non-normative formats: XML.


Abstract

The Efficient XML Interchange (EXI) format defines a new representation for the Extensible Markup Language (XML) Information Set. The introduction of such a format may cause disruption in systems that have so far been able to assume XML as the only representation of XML Information Set data. This document reviews areas where the introduction of EXI may disrupt or otherwise have an impact on existing XML technologies, XML processors, and applications. It also describes EXI design features and steps that may be taken by implementors to reduce or eliminate disruption and impacts.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a First Public Working Draft of “Efficient XML Interchange (EXI) Impacts.”

This document is intended to help people in the XML community determine whether their particular area of interest is affected by the introduction of EXI. It currently contains the significant impacts identified by the Efficient XML Interchange Working Group, and the group would also appreciate hearing from the XML community if any potential impacts have been missed.

This document was developed by the Efficient XML Interchange (EXI) Working Group.

Please send comments about this document to public-exi@w3.org (public archive).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Terminology and Discussion
3 Existing XML Processors and Applications
4 Existing XML Technologies
    4.1 XML Security
        4.1.1 XML Signature
        4.1.2 XML Encryption
        4.1.3 XML Canonicalization
    4.2 Existing XML Processing APIs
    4.3 XML and Binary Attachments
5 Sacrificing Human Readability
6 Other Impacts
7 Conclusions
8 References

Appendix

A Acknowledgements


1 Introduction

While the introduction of EXI has the potential to bring XML to new communities, it can also have adverse effects on the existing XML community. The precise scope of these effects may not be fully knowable in advance, but based on experience with existing binary formats, educated estimates can be made.

The main goals of EXI with regard to existing systems are to provide maximally seamless compatibility with XML and to avoid disruption of existing XML technologies and specifications. In particular, EXI should not require modifications to existing XML systems, unless these systems are extended to adopt EXI. The purpose of this document is to identify any immediate impacts that require changes to existing XML-based specifications or XML-using applications. It also identifies cases where changes to existing specifications or applications are not required, but might be desirable to increase efficiency.

2 Terminology and Discussion

This section collects relevant definitions from the [XML] and [EXI] specifications.

XML Processor

A module used to read XML documents and provide access to their content and structure

EXI Processor

A module used to encode structured data into EXI streams and/or to decode EXI streams to make structured data accessible

Application

A module on behalf of which an XML processor or an EXI processor does its work

In a system containing both an XML and an EXI processor, the modules would normally be completely separate from each other. The application would be responsible for deciding which processor is to process each document. It could use either out-of-band means, such as communication protocol metadata, or in-band means, such as the distinguishing bits of EXI, to make this decision.

3 Existing XML Processors and Applications

EXI offers two in-band means to distinguish it from other formats: the mandatory Distinguishing Bits and the optional EXI Cookie. Either of these is sufficient to distinguish EXI from XML when using any conventional character encoding (see [EXI Best Practices], section 4.1.1). Assuming such a conventional character encoding, the first octet of an EXI document, whether it is the octet containing the Distinguishing Bits or the first octet of the EXI Cookie, cannot appear as the first octet of a well-formed XML document. Therefore, an XML processor is required by the XML specification to reject any EXI document immediately upon reading that first octet.
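The in-band check described above can be sketched as follows. This is a minimal illustration, not a conformant EXI header parser. It assumes the values given in the [EXI] format specification: a cookie consisting of the four octets "$EXI", and Distinguishing Bits with the value 1 0 in the two most significant bits of the first header octet. The function name looks_like_exi is hypothetical.

```python
# Assumption: cookie value taken from the EXI format specification.
EXI_COOKIE = b"$EXI"


def looks_like_exi(data: bytes) -> bool:
    """Return True if the leading octets are consistent with an EXI stream."""
    if not data:
        return False
    # Optional EXI Cookie at the very start of the stream.
    if data.startswith(EXI_COOKIE):
        return True
    # Mandatory Distinguishing Bits: the top two bits of the first octet are 1 0.
    return (data[0] & 0b11000000) == 0b10000000
```

An application hosting both processors might call such a function on the first octets of incoming content and hand the stream to its EXI processor only when it returns True. A well-formed XML document in a conventional encoding starts with an octet such as "<", whitespace, or a byte order mark, none of which satisfies either test.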

XML is often used in conjunction with other protocols and technologies. In some such cases, in particular the World Wide Web and Web services where HTTP is common, the protocol supports content negotiation to allow applications to indicate which content types and encodings they are prepared to handle. [EXI Best Practices] describes how such support can be used to introduce EXI to such an environment with no impact on applications that have not adopted EXI.
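As a rough sketch of how a server might use HTTP content negotiation for this purpose, the following function checks whether a client has explicitly listed EXI in its Accept header. The media type string "application/exi" and the function name accepts_exi are assumptions made for illustration; [EXI Best Practices] remains the authority on the identifiers to use. Wildcards such as */* are deliberately ignored, since they do not show that the client understands EXI.

```python
# Assumption: media type label used to advertise EXI support.
EXI_MEDIA_TYPE = "application/exi"


def accepts_exi(accept_header: str) -> bool:
    """Return True only if the Accept header explicitly lists EXI with q > 0."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if media_type == EXI_MEDIA_TYPE and quality > 0:
            return True
    return False
```

A server using this check would respond with XML whenever the function returns False, so clients that have not adopted EXI see no change in behavior.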

More generally, in an environment consisting of multiple XML applications, where some but not all applications wish to adopt EXI, coordination is needed to avoid transmitting EXI to applications that are not prepared to handle it. Following the EXI best practices, the burden of such coordination should fall only on the applications that adopt EXI, as they should not send EXI to applications that are not known to understand it. When processing incoming transmissions, an application adopting EXI will need to implement an internal mechanism for routing the incoming content to the appropriate processor (XML or EXI), but a non-EXI-aware application can continue using its XML processor for everything. If the communication protocol does not offer any method for content negotiation, a non-EXI-aware application may occasionally be sent EXI content. In such cases, the immediate rejection described above should be communicated to the sender so that it can avoid sending EXI content to that receiver in the future.
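The sender-side coordination described in this section might look like the following sketch. The class ExiSenderPolicy and its method names are hypothetical; how a sender learns that a receiver understands EXI, and how a rejection is reported back, depend on the protocol in use and are outside the scope of the sketch.

```python
class ExiSenderPolicy:
    """Tracks which receivers are known to understand EXI (illustrative only)."""

    def __init__(self) -> None:
        self._exi_capable = set()

    def learn_capable(self, receiver: str) -> None:
        # Called when out-of-band or negotiated information shows EXI support.
        self._exi_capable.add(receiver)

    def record_rejection(self, receiver: str) -> None:
        # Called when a receiver reports that it rejected EXI content.
        self._exi_capable.discard(receiver)

    def choose_format(self, receiver: str) -> str:
        # Default to XML: EXI is only sent to receivers known to understand it.
        return "exi" if receiver in self._exi_capable else "xml"
```

The default of XML for unknown receivers keeps the burden on the EXI-adopting application, as recommended above.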

4 Existing XML Technologies

Most existing XML technologies are specified based on the [XML Infoset]. EXI has been designed as an encoding format of the Infoset and is therefore immediately applicable to such technologies. Some technologies, however, are specified in terms of character or octet data, and therefore require further consideration of the impacts of EXI. This also means that applications requiring byte-for-byte preservation of XML documents cannot always use EXI, though EXI is capable of preserving all the information relevant to [Canonical XML]. Other technologies may gain additional significant benefits if modified to support EXI. While such modifications are not required immediately, they may be desirable in future versions of the relevant specifications.

4.1 XML Security

The XML security specifications [XML Signature] and [XML Encryption] can be used as they currently exist with EXI, so EXI has no immediate impact on them. For interoperability in current environments, this requires computing signatures over an XML serialization and making sure that any encrypted content has been serialized as XML.

4.1.1 XML Signature

In current environments, XML Signature can be used with EXI by specifying an existing XML canonicalization algorithm, such as [Canonical XML]. A signed document can be transmitted using EXI, as long as the necessary fidelity options are enabled. As with XML, the receiver will need to serialize the signed content using the selected XML canonicalization algorithm to verify the signatures. In the future, XML use could be avoided completely by using a URI that designates a to-be-defined EXI canonicalization algorithm, rather than an XML canonicalization.
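The role of canonicalization in this process can be illustrated with a small sketch. It is not an XML Signature implementation. It uses the canonicalize function from Python's standard library (available from Python 3.8), which implements Canonical XML 2.0 rather than the [Canonical XML] 1.0 algorithm cited here, and it uses SHA-256 purely as an example digest. The element names and the function canonical_digest are hypothetical.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # C14N 2.0, Python 3.8+


def canonical_digest(xml_text: str) -> str:
    """Digest the canonical XML serialization of a document."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# What the signer serialized.
signed = '<order id="7" currency="EUR"><item>widget</item></order>'
# What a receiver might reconstruct after the Infoset has travelled in another
# encoding: same information, different attribute order and quoting.
received = "<order currency='EUR' id='7'><item>widget</item></order>"

same_digest = canonical_digest(signed) == canonical_digest(received)
```

Because both serializations carry the same information, their canonical forms and digests agree. This is why a receiver that decodes EXI with sufficient fidelity can still verify a signature computed over canonical XML.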

4.1.2 XML Encryption

Use of XML Encryption in mixed XML/EXI environments may require using XML as the format for any data that is encrypted, as the producer may not know whether the ultimate recipient of the document is capable of understanding EXI. If it is known that the recipient understands EXI, the MimeType attribute of the EncryptedData element could be used to indicate EXI as the format of the encrypted data (though this appears to require a minor modification to [XML Encryption]).

4.1.3 XML Canonicalization

EXI has no impact on existing XML canonicalization algorithms ([Canonical XML], [Excl XML Canonicalization]). For use in signatures, it may be beneficial in the future to define a URI for “canonical EXI” that defines a specific EXI Options document to use in generating a canonical form, but this consideration is completely separate from existing specifications.

4.2 Existing XML Processing APIs

As EXI is an encoding of the XML Infoset, an EXI implementation can support any of the commonly-used XML APIs for XML processing, so EXI has no immediate impact on existing XML APIs. However, using an existing XML API also requires that all names and text appearing in the EXI document be converted into strings. In the future, more efficiency might be achievable if the higher layers could directly use these data as the typed values appearing in the EXI document. For instance, if a higher layer needs typed data, going through the string form can incur a performance penalty, so an extended API that supports typed data directly could improve performance when used with EXI.
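The string round trip mentioned above can be made concrete with a toy example. Everything here is hypothetical: typed_events stands in for an EXI decoder that yields typed values, and as_string_api mimics a conventional XML API that exposes only strings. No real EXI library is used.

```python
from typing import Iterable, Iterator, Tuple, Union

TypedValue = Union[int, float]


def typed_events() -> Iterator[Tuple[str, TypedValue]]:
    """Stand-in for a decoder that already holds typed values."""
    yield ("temperature", 21.5)
    yield ("count", 42)


def as_string_api(
    events: Iterable[Tuple[str, TypedValue]]
) -> Iterator[Tuple[str, str]]:
    """Mimic a conventional XML API: every value is delivered as a string."""
    for name, value in events:
        yield name, str(value)


# Via the string API, the application has to parse every value again.
via_strings = {name: float(text) for name, text in as_string_api(typed_events())}

# With a typed API, the values can be used as delivered.
direct = dict(typed_events())
```

Both paths yield the same numbers, but the first converts each value to a string and back, which is the extra work an extended, typed API could avoid.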

4.3 XML and Binary Attachments

Some use cases require the inclusion of binary data in XML documents, and to avoid the required base64 conversions, specifications such as [XOP] exist to package the binary data separately from XML. Since EXI is capable of encoding binary data directly, it is possible to simply include the binary data inside an EXI document without a loss in efficiency. If a use case requires a packaging where XML content is separated from the binary data, EXI can still be used as the format for the XML part.
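The cost of the base64 conversion that these approaches avoid is easy to quantify: base64 represents every 3 octets of binary data as 4 characters. The following sketch measures this with Python's standard library; the function name base64_size is chosen for illustration.

```python
import base64


def base64_size(binary_length: int) -> int:
    """Return the number of characters needed to base64-encode the given number of octets."""
    return len(base64.b64encode(bytes(binary_length)))


# 3000 octets of binary data become 4000 base64 characters,
# an increase of one third before any other XML overhead.
overhead = base64_size(3000) - 3000
```

A format that carries the binary data directly, as EXI can, does not pay this one-third size increase or the encoding and decoding time.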

5 Sacrificing Human Readability

As a text-based format, XML allows direct editing with generic text editors as well as debugging generated XML by simply using “view source” features. EXI, as a binary format, does not conveniently permit this, so generating and inspecting EXI is mostly in the domain of specific tools that include an EXI processor.

Many applications that support viewing and editing XML already parse the XML to present a structured view of the data, which is more attractive than unformatted text. Such applications are usually easy to modify to recognize new data formats, so plugging in an existing EXI processor would have low cost and would provide the same data inspection opportunities as the application already provides for XML. Thus, because of the XML compatibility that EXI provides, the sacrifice of human readability is not as large a concern as it might initially seem.

6 Other Impacts

Content negotiation in protocols like HTTP is based on peers informing each other what content types and encodings they support. While this is sufficient for basic usage of EXI, many use cases also require information on common schemas and datatype representation maps. Negotiation of such additional parameters might be accomplished through a variety of methods, and it is not yet clear which methods are best suited for the task.

7 Conclusions

EXI has been designed to be compatible with XML and can be introduced into the existing family of XML technologies without immediate disruption to XML-using applications. However, with certain modifications to existing XML-related specifications in the future it may be possible to achieve additional benefits when using EXI, still without disruption to existing XML-based applications. Furthermore, in a multi-application system where only some applications adopt EXI, sending EXI data to the other applications can potentially cause disruption, so care is needed to account for differing format support among the participating applications.

8 References

EXI
Efficient XML Interchange (EXI) Format 1.0 (Working Draft), John Schneider and Takuki Kamiya, Editors. World Wide Web Consortium. (See http://www.w3.org/TR/exi/.)
XML
Extensible Markup Language (XML) 1.0 (Fourth Edition), Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau, Editors. World Wide Web Consortium, 16 August 2006. (See http://www.w3.org/TR/2006/REC-xml-20060816/.)
XML Infoset
XML Information Set (Second Edition), John Cowan and Richard Tobin, Editors. World Wide Web Consortium, 4 February 2004. (See http://www.w3.org/TR/2004/REC-xml-infoset-20040204.)
XML Signature
XML-Signature Syntax and Processing, Donald Eastlake, Joseph Reagle, and David Solo, Editors. World Wide Web Consortium, 12 February 2002. (See http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/.)
XML Encryption
XML Encryption Syntax and Processing, Donald Eastlake and Joseph Reagle, Editors. World Wide Web Consortium, 10 December 2002. (See http://www.w3.org/TR/2002/REC-xmlenc-core-20021210/.)
Canonical XML
Canonical XML Version 1.0, John Boyer, Editor. World Wide Web Consortium, 15 March 2001. (See http://www.w3.org/TR/2001/REC-xml-c14n-20010315.)
Excl XML Canonicalization
Exclusive XML Canonicalization Version 1.0, John Boyer, Donald E. Eastlake, and Joseph Reagle, Editors. World Wide Web Consortium, 18 July 2002. (See http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/.)
XOP
XML-binary Optimized Packaging, Martin Gudgin, Noah Mendelsohn, Mark Nottingham, and Hervé Ruellan, Editors. World Wide Web Consortium, 25 January 2005. (See http://www.w3.org/TR/2005/REC-xop10-20050125/.)
EXI Best Practices
Efficient XML Interchange (EXI) Best Practices (Working Draft), Mike Cokus and Daniel Vogelheim, Editors. World Wide Web Consortium. (See http://www.w3.org/TR/exi-best-practices.)

A Acknowledgements

This document is the work of the Efficient XML Interchange (EXI) WG.

Members of the Working Group are (at the time of publication, sorted alphabetically by last name):

Carine Bournez, W3C/ERCIM (staff contact)
Don Brutzman, Web3D Consortium
Alex Ceponkus, AgileDelta, Inc.
Michael Cokus, MITRE Corporation (chair)
Roger Cutler, Chevron
Ed Day, Objective Systems, Inc.
Philippe de Cuetos, Expway
Joerg Heuer, Siemens AG
Alan Hudson, Web3D Consortium
Takuki Kamiya, Fujitsu Limited
Jaakko Kangasharju, University of Helsinki
Richard Kuntschke, Siemens AG
Don McGregor, Web3D Consortium
Daniel Peintner, Siemens AG
Santiago Pericas-Geertsen, Sun Microsystems, Inc.
Liam Quin, W3C/MIT
Rich Rollman, AgileDelta, Inc.
Paul Sandoz, Sun Microsystems, Inc.
John Schneider, AgileDelta, Inc.
Cedric Thienot, Expway
Yun Wang, Intel Corporation
Greg White, Stanford University (former co-chair)

The EXI Working Group would like to acknowledge the following former members of the group for the leadership, guidance, and expertise they provided throughout their individual tenures in the WG (sorted alphabetically by last name):

Robin Berjon, Expway (former co-chair) (until 17 October 2006)
Oliver Goldman, Adobe Systems, Inc. (former co-chair) (until 08 June 2006)
Peter Haggar, IBM (until 07 March 2007)
Kimmo Raatikainen, Nokia (until 18 March 2008)
Paul Thorpe, OSS Nokalva, Inc. (until 11 September 2007)
Daniel Vogelheim, Invited Expert (former co-chair, then from Siemens AG) (until 28 February 2008)
Stephen Williams, High Performance Technologies, Inc. (until 30 June 2008)