W3C

XML Binary Characterization Use Cases

W3C Working Draft 28 July 2004

This version:
http://www.w3.org/TR/2004/WD-xbc-use-cases-20040728/
Latest version:
http://www.w3.org/TR/xbc-use-cases
Editors:
Mike Cokus, MITRE Corporation
Santiago Pericas-Geertsen, Sun Microsystems

Abstract

This document describes use cases for evaluating the potential benefits of an alternate serialization for XML. The use cases are documented here to understand the constraints involved in environments for which XML employment is currently problematic because of one or more inherent inefficiencies in XML 1.x. Desirable properties of alternate XML serializations to address the use cases are derived and discussed in a separate publication of the XML Binary Characterization Working Group (XBC WG).

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the First Public Working Draft of the XML Binary Characterization Use Cases document. It has been produced by the XML Binary Characterization Working Group, which is part of the XML Activity.

This document will be part of a series of documents following this work on Use Cases determination. Further work in the XML Binary Characterization Working Group will focus on characterizing the properties that are required by the use cases, and establishing objective, shared measurements to help judge whether XML 1.x and alternate binary encodings provide the required properties.

This is a First Public Working Draft and is expected to change. The XML Binary Characterization Working Group does not expect this document to become a Recommendation. Rather, after further development, review and refinement, it will be published and maintained as a Working Group Note.

Comments on this document should be sent to public-xml-binary-comments@w3.org (public archives). It is inappropriate to send discussion emails to this address.

Discussion of this document takes place on the public public-xml-binary@w3.org (public archives).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Table of Contents

1 Introduction
2 Use Case Structure
3 Documented Use Cases
    3.1 Binary XML in Broadcast Systems (video metadata and TV EPG/ESG data)
        3.1.1 Description
        3.1.2 Domain & Stakeholders
        3.1.3 Justification
        3.1.4 Analysis
        3.1.5 Alternatives
        3.1.6 References
    3.2 Floating Point Arrays in the Energy Industry
        3.2.1 Description
        3.2.2 Domain
        3.2.3 Justification
        3.2.4 Analysis
        3.2.5 Alternatives
        3.2.6 References
    3.3 X3D Graphics Model Compression, Serialization and Transmission
        3.3.1 Description
        3.3.2 Domain & Stakeholders
        3.3.3 Justification
        3.3.4 Analysis
        3.3.5 Alternatives
        3.3.6 References
    3.4 Web Services for Small Devices
        3.4.1 Description
        3.4.2 Domain
        3.4.3 Justification
        3.4.4 Analysis
        3.4.5 Alternatives
        3.4.6 References
    3.5 Web Services as an Alternative to CORBA
        3.5.1 Description
        3.5.2 Domain
        3.5.3 Justification
        3.5.4 Analysis
        3.5.5 Alternatives
        3.5.6 References
    3.6 Embedding External Data in XML Documents
        3.6.1 Description
        3.6.2 Domain
        3.6.3 Justification
        3.6.4 Analysis
        3.6.5 Alternatives
        3.6.6 References
    3.7 Electronic Documents
        3.7.1 Description
        3.7.2 Domain & Stakeholders
        3.7.3 Justification
        3.7.4 Analysis
        3.7.5 Alternatives
        3.7.6 References
    3.8 FIXML in the Securities Industry
        3.8.1 Description
        3.8.2 Domain
        3.8.3 Justification
        3.8.4 Analysis
        3.8.5 Alternatives
        3.8.6 References
    3.9 Multimedia XML Documents for Mobile Handsets
        3.9.1 Description
        3.9.2 Domain
        3.9.3 Justification
        3.9.4 Analysis
        3.9.5 Alternatives
        3.9.6 References
    3.10 PC-free Photo Printing
        3.10.1 Description
        3.10.2 Domain
        3.10.3 Justification
        3.10.4 Analysis
        3.10.5 Alternatives
        3.10.6 References
    3.11 PC-free Photo Album Generation
        3.11.1 Description
        3.11.2 Domain
        3.11.3 Justification
        3.11.4 Analysis
        3.11.5 Alternatives
        3.11.6 References
    3.12 Intra/Inter Business Communication
        3.12.1 Description
        3.12.2 Domain & Stakeholders
        3.12.3 Justification
        3.12.4 Analysis
        3.12.5 Alternatives
        3.12.6 References
    3.13 X3D CAD Files
        3.13.1 Description
        3.13.2 Domain & Stakeholders
        3.13.3 Justification
        3.13.4 Analysis
        3.13.5 Alternatives
        3.13.6 References
    3.14 Businesses Process with XML Documents
        3.14.1 Description
        3.14.2 Domain & Stakeholders
        3.14.3 Justification
        3.14.4 Analysis
        3.14.5 Alternatives
        3.14.6 References
4 Summary
5 References

Appendices

A Acknowledgments (Non-Normative)
B XML Binary Characterization Use Cases Changes (Non-Normative)


1 Introduction

While XML has been enormously successful as a markup language for documents and data, the overhead associated with generating, parsing, transmitting, storing, or accessing XML-based data has hindered its employment in some environments. The question has been raised as to whether some optimized serialization of XML is appropriate to satisfy the constraints present in such environments. In order to address this question, a compatible means of classifying the requirements posed by specific use cases and the applicable characteristics of XML 1.x must be devised. This allows a characterization of the gap between what XML 1.x supports and use case requirements. In addition, it also provides a way to compare use case requirements to determine the degree to which an alternate serialization would be beneficial.

Use cases describing situations where the limitations of XML prevent its effective use are presented in this document. The XBC WG has made efforts through internal discussion and dialog with the XML community to define a set of use cases that are representative of environments in which an alternate XML serialization has benefit. Comments on the set of use cases presented are invited, especially if important use cases have been omitted.

2 Use Case Structure

In this section we elaborate on the template used to present the use cases. All the use cases collected by this WG are listed in 3 Documented Use Cases.

3 Documented Use Cases

The use cases identified by or submitted to the working group are documented below, in accordance with the metadata defined in 2 Use Case Structure.

3.1 Binary XML in Broadcast Systems (video metadata and TV EPG/ESG data)

3.1.1 Description

The constant progress of digital TV, the multiplication of channels, the competition and convergence with the Web, and the widespread deployment of a variety of set-top boxes (notably PVRs) call for services on TV sets that extend beyond simply broadcasting audio/video content.

For instance, above a certain number of channels, broadcasters find themselves having to provide EPG (Electronic Program Guide) services to their users, who would otherwise be overwhelmed by the sheer amount of available content. These EPGs also allow PVRs (Personal Video Recorders) to automatically pick up recording schedules for given programs, based on user-defined criteria that match against metadata broadcast alongside the data.

Similarly, broadcasters are also trying to make their offer more attractive by integrating TV with Web technologies or the Web at large. This includes notably using Web Services from PVRs that benefit from a return channel, using Web UI technologies such as SVG and XForms to define their applications' interfaces, making TV services available to mobile devices, and so forth.

However, there are constraints that cause problems when trying to deploy such services, all of which rely on XML, to television sets:

  • Bandwidth. TV bandwidth is extremely expensive, and how much data you use for services directly constrains the number of channels that you are able to send. In addition to the potential technical issues, there are strong economic motivations to reduce bandwidth usage as much as possible;

  • Processing Power. Most set-top boxes are cheap, and the low-end ones have roughly half the power (if not worse) of a low-end mobile device. Unlike mobile devices, a box the size of the average STB faces few physical limits on the processing power it can house; in particular, heat and battery life are of little or no concern. However, large-scale deployment of STBs and similar devices into households requires them to be extremely cheap and therefore as limited as possible. In addition, convergence with mobile devices remains a prime motivator for the television industry, so the constraints applicable to mobile devices apply equally to broadcast XML metadata;

  • Unidirectional Network. This being broadcast, there is not typically a way for TVs to request data. Instead, it is continuously streamed and restreamed to them, a process called carouselling (the data itself being 'on a carousel'). Some set-top boxes do in fact have a return channel (notably the ones that support Web Services) but most don't. If the data were sent as an XML document, it would have to be fragmented so that STBs would not have to wait for the entire document to be carouselled before starting to use the data. XML has not been shown to be easily fragmentable, and carouselling currently relies on the ability of binary formats to be fragmented;

  • Change Resilience. Upgrading several million STBs is often very impractical. Therefore, it must be possible to evolve the broadcast format without breaking older hardware. While XML is perfectly suited to this, the above issues make it unusable. It is thus required that the binary XML format replacing it be resilient to changes in the schema.

As a result, MPEG-7 BiM (a binary encoding of XML originally created to carry video metadata) has been integrated into a number of broadcasting standards, notably ARIB, DVB, and TV Anytime.
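The fragmentation constraint described under Unidirectional Network can be illustrated with a minimal sketch. It is not drawn from MPEG-7 BiM or any broadcasting standard: the EPG document, the element names (epg, programme, epg-fragment) and the helper functions are all hypothetical. It assumes that each programme entry, together with the channel context copied from the root, can be processed independently, so that a receiver joining the carousel at an arbitrary point can start using data immediately.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical EPG document; element and attribute names are illustrative only.
EPG = """<epg channel="7">
  <programme id="p1"><title>News</title><start>18:00</start></programme>
  <programme id="p2"><title>Weather</title><start>18:30</start></programme>
  <programme id="p3"><title>Film</title><start>19:00</start></programme>
</epg>"""


def fragment(document):
    """Split the guide into fragments that each parse on their own."""
    root = ET.fromstring(document)
    fragments = []
    for programme in root.findall("programme"):
        # Copy the context a receiver needs (here, the channel) into each fragment.
        wrapper = ET.Element("epg-fragment", dict(root.attrib))
        wrapper.append(programme)
        fragments.append(ET.tostring(wrapper))
    return fragments


def carousel(fragments):
    """Repeat the fragments endlessly, as a broadcast carousel would."""
    return itertools.cycle(fragments)


def receive(stream, count):
    """Consume `count` fragments from wherever the receiver happened to tune in."""
    guide = {}
    for raw in itertools.islice(stream, count):
        prog = ET.fromstring(raw).find("programme")
        guide[prog.get("id")] = prog.findtext("title")
    return guide


frags = fragment(EPG)
stream = carousel(frags)
next(stream)  # the receiver tunes in late and misses the first fragment
guide = receive(stream, len(frags))
print(guide)
```

The receiver in this sketch never sees the original document as a whole, yet recovers every entry after one full turn of the carousel. A real system would also have to deal with versioning and with context that cannot simply be copied into each fragment.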

3.1.2 Domain & Stakeholders

This use case is relevant to the entirety of the television distribution industry, comprising content providers, broadcast infrastructure deployers, television and set-top box manufacturers, and of course the broadcasting companies themselves.

It also covers similar requirements that can be found in digital radio broadcasting, where one equally needs to broadcast EPG metadata to very limited devices and to integrate with mobile devices (for instance by sending SVG ads as part of the radio stream).

And finally, convergence with TV is considered to be a major next step in mobile services, and all participants on both sides of the fence are presently being extremely active in making television available anywhere, at any time, and on any device.

3.1.3 Justification

Television is a very large market that has a strong need for program metadata, and is increasingly converging with the Web at large (with a strong emphasis on mobile devices at first), notably using technologies such as XHTML, SVG, XForms, SMIL, and Web Services.

Deployed systems already use binary XML, currently standardised as part of ISO MPEG and industry fora such as ARIB, DVB, or TV-Anytime.

3.1.4 Analysis

XML is appropriate for these situations because:

  1. existing specifications based on XML are being reused wholesale;

  2. most major TV standards in the area are already XML-based, and the industry has no wish to go through another standards cycle;

  3. XML is well-suited to describing structured information such as metadata;

  4. XML has proven to be a good format in which to specify user interfaces, notably using XHTML, SVG, or XForms. These are needed for TV applications, and they need to be broadcast;

  5. the industry wishes to publish its data, especially the Electronic Program Guides, to as many media as possible. XML enables it to publish directly to desktops, mobiles, and TVs using off-the-shelf or Open Source software.

3.1.5 Alternatives

DVB EIT schedules (mostly obsolete).

3.1.6 References

  1. [DVB]

  2. [TV Anytime]

  3. [ARIB]

  4. [MPEG-7]

3.2 Floating Point Arrays in the Energy Industry

3.2.1 Description

The upstream segment of the energy industry is concerned with exploration for and production of oil and gas. XML-based techniques have made very little penetration into the upstream technology part of the energy industry. The most basic reason for this is the nature of the data, which does not at this time lend itself to being represented usefully in XML.

There are basically two core types of data in this industry: well logs and seismic data. Well logs are moderately large datasets, while seismic datasets are very large, typically on the order of gigabytes. Although the Petrotechnical Open Standards Consortium (POSC) has produced an XML schema for well logs, it has not been adopted by the industry. At the time of writing, nobody has even considered defining a schema for seismic data.

Both seismic and well log data include control data, easily represented in XML, as well as large arrays of floating point numbers, not easily represented efficiently in XML. Although in practice an XML representation is not used, such data may be represented as shown in the following fragment (with a whole document consisting of a large number of these fragments):

    
    <header>
      <linename>westcam 2811</linename>
      <ntrace>1207</ntrace>
      <nsamp>3001</nsamp>
      <tstart>0.0</tstart>
      <tinc>4.0</tinc>
      <shot>120</shot>
      <geophone>7</geophone>
    </header>
    <trace>0.0, 0.0, 468.34, 3.245672E04, 6.9762345E05, ... (3001 floats)</trace>
    
    

3.2.2 Domain

The scope within the Energy Industry as discussed above is very broad, encompassing a very large number of technical issues and usage scenarios involving, for example, integration of drilling information, processing of seismic and well data, integration of seismic and well data into interpretation systems, and so on.

3.2.3 Justification

There are a number of dominant technology vendors in this sector as well as a number of small companies that "work around the edges". The dominant technology vendors (none of which are technology giants) provide proprietary solutions that do not interoperate easily with each other. Providing communications between these products within a company, or between companies, is a constant problem: this is the main motivator to develop Web service interfaces for these products. A second motivator for a standard is that it will open the door for smaller companies to provide useful add-on products. Large budgets in this sector are allocated to the purchase of software packages and display devices, but these budgets are small compared to the leverage of mass-market devices, so a longer term objective is to encourage a situation where more technologies with mass-market cost leverage can be used.

3.2.4 Analysis

Given that this scenario involves interoperability between companies using disparate systems, XML is a natural choice due to its ubiquity and tool availability.

The main shortcoming of XML for this application is the expense incurred while converting floating point data to and from a character representation, as well as the extra size of some of these representations. Thus, the main requirement for this use case is the ability to represent sequences of floating point numbers in a binary format (as close to the native representation as possible), in order to facilitate efficient binding into programmatic objects (primarily, floating point arrays). In the example shown above, the header information would still have a textual representation (useful for any infoset-based processing), but the trace of floating point numbers would appear as an opaque binary stream.

The format chosen to represent a floating point number must be platform independent, with tools supporting conversions to and from the appropriate native format. In practice, most operations involve moving data between machines with the same floating point formats, so the solution should not impose undue overhead on the most common situation in order to handle the less common ones.
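The difference between the two representations can be sketched as follows. This is an illustration only, not a proposed format: the sample values are made up in the spirit of the trace fragment shown earlier, and the choice of IEEE 754 double precision with a fixed little-endian byte order is an assumption made for the example.

```python
import struct

# A few sample values in the spirit of the <trace> fragment (illustrative data).
trace = [0.0, 0.0, 468.34, 3.245672e04, 6.9762345e05]

# Textual form, as it would appear as XML character content.
# Every value must be formatted on output and parsed again on input.
text = ", ".join(repr(v) for v in trace)
parsed = [float(token) for token in text.split(",")]

# Binary form: IEEE 754 doubles with an explicit byte order ("<" = little-endian),
# so the layout does not depend on the platform that wrote it.
packed = struct.pack(f"<{len(trace)}d", *trace)
unpacked = list(struct.unpack(f"<{len(trace)}d", packed))

print(len(text.encode("ascii")), "bytes as text")
print(len(packed), "bytes as binary")
```

The binary form occupies exactly 8 bytes per value here (4 with the "f" format code, at reduced precision), and a reader whose native layout already matches the declared byte order can use the bytes without any per-value conversion, which corresponds to the common case described above.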

3.2.5 Alternatives

  • Data Compression: One expert in this area has said, "For us, binary compression is probably not that important because transmission speeds are constantly improving. The additional time needed to compress and decompress seismic data would probably slow things down. We also place a greater value in the message structures than the transmission mechanics". Or, in more picturesque words, again from an expert in the field when asked about compressing seismic data, "Been there, done that, doesn't work, not interested". Bear in mind that this epigram encapsulates decades of experience and highly sophisticated R&D.

  • CORBA: There is, in fact, a CORBA-based integration platform currently deployed (although perhaps not widely) in this space. Without diving into technical details, it is clear that some companies would prefer an approach based on Web services.

  • XML Protocol Attachments: It is possible to represent seismic data control information in XML and to put the floating point arrays in a binary attachment using XOP. This data architecture is certainly viable, assuming that the issues involving floating point numbers are addressed, as evidenced by the fact that many of the proprietary vendor data formats work this way. It is, however, less flexible than the header-trace architecture described above, which is probably one reason why the latter is used in industry-wide seismic data standards (e.g. SEGY). Nonetheless, Web services that return data using XOP are an attractive alternative for dealing with seismic data.

3.2.6 References

  1. [POSC]

3.3 X3D Graphics Model Compression, Serialization and Transmission

3.3.1 Description

Extensible 3D (X3D) Graphics is an XML-enabled ISO Standard 3D file format to enable real-time communication of 3D data across all applications and network applications. It has a rich set of features for use in commercial applications, engineering and scientific visualization, medical imaging, training, modeling and simulation (M&S), multimedia, entertainment, education, and more. [1][2] Computer-Aided Design (CAD) and architecture scenes are also supported, but are treated as a separate use case due to their even larger sizes and greater complexity.

File sizes in this use case typically range from 1-1000 KB of data, often delivered over low-bandwidth links (56 Kbps or less). Binary serialization must be performed in concert with geometric compression (e.g. combining coplanar polygons, quantizing colors, etc.). Lossy geometric compression is sometimes acceptable and typically results in compression ratios of 20:1. Due to interactivity requirements, the latency associated with deserialization, decompression and parsing must be minimal. Digital signature and encryption compatibilities are also important for protecting digital content assets.
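As a rough illustration of what lossy quantization involves, the sketch below maps floating point coordinates onto a fixed number of integer levels and measures the error introduced. It is a generic uniform quantizer with hypothetical function names and sample data, not the X3D compression scheme, and on its own it does not approach the compression ratios quoted above.

```python
import struct


def quantize(values, bits=12):
    """Map floats onto integers in [0, 2**bits - 1] over the observed range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return lo, scale, codes


def dequantize(lo, scale, codes):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


coords = [0.0, 0.125, 0.5, 0.731, 1.0]  # illustrative vertex coordinates
lo, scale, codes = quantize(coords, bits=12)
restored = dequantize(lo, scale, codes)
max_error = max(abs(a - b) for a, b in zip(coords, restored))

# Each 12-bit code fits in an unsigned 16-bit integer ("H").
packed = struct.pack(f"<{len(codes)}H", *codes)
print(len(packed), "bytes, maximum error", max_error)
```

The reconstruction error is bounded by half a quantization step, which is why the acceptable number of bits depends on the precision the scene actually needs.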

3.3.2 Domain & Stakeholders

Support Web-based interchange, rendering and interactivity for 3D graphics scenes.

3.3.3 Justification

The X3D Compressed Binary Encoding Request For Proposals (RFP) [3] lists and justifies ten separate technical requirements. A related XML binary encoding development effort demonstrates the simultaneous successful composition of all these requirements. The Web3D Consortium and X3D designers see great value in aligning this technical approach with a W3C-developed XML binary compression recommendation.

3.3.4 Analysis

Taken together, the following technical requirements for the X3D Compressed Binary Encoding perhaps provide a superset of most compressed binary XML requirements. [4] A further technical challenge is that these capabilities must coexist compatibly in a single document.

  • X3D Compatibility: The compressed binary encoding shall be able to encode all of the abstract functionality described in X3D Abstract Specification.

  • Interoperability: The compressed binary encoding shall contain identical information to the other X3D encodings (XML and Classic VRML). It shall support an identical round-trip conversion between the X3D encodings.

  • Multiple, separable data types: The compressed binary encoding shall support multiple, separable media data types, including all node (element) and field (attribute) types in X3D. In particular, it shall include geometric compression for the following.

    • X3D Geometry - polygons and surfaces, including NURBS

    • Interpolation data - spline and animation data, including particularly long sequences such as motion capture (also see Streaming requirement)

    • Textures - PixelTexture, other texture and multitexture formats (also see Bundling requirement)

    • Array Datatypes - arrays of generic and geometric data types

    • Tokens - tags, element and attribute descriptors, or field and node textual headers

  • Processing Performance: The compressed binary encoding shall be easy and efficient to process in a runtime environment. Outputs must include directly typed scene-graph data structures, not just strings which might then need another parsing pass. End-to-end processing performance for construction of a scene-graph as in-memory typed data structures (i.e. decompression and deserialization) shall be superior to that offered by gzip and string parsing.

  • Ease of Implementation: Binary compression algorithms shall be easy to implement, as demonstrated by the ongoing Web3D requirement for multiple implementations. Two (or more) implementations are needed for eventual advancement, including at least one open-source implementation.

  • Streaming: Compressed binary encoding will operate in a variety of network-streaming environments, including http and sockets, at various (high and low) bandwidths. Local file retrieval of such files shall remain feasible and practical.

  • Authorability: Compressed binary encoding shall consist of implementable compression and decompression algorithms that may be used during scene-authoring preparation, network delivery and run-time viewing.

  • Compression: Compressed binary encoding algorithms will together enable effective compression of diverse datatypes. At a minimum, such algorithms shall support lossless compression. Lossy compression alternatives may also be supported. When compression results are claimed by proposal submitters, both lossless and lossy characteristics must be described and quantified.

  • Security: Compressed binary encoding will optionally enable security, content protection, privacy preferences and metadata such as encryption, conditional access, and watermarking. Default solutions are those defined by the W3C Recommendations for XML Encryption and XML Signature.

  • Bundling: Mechanisms for bundling multiple files (e.g. X3D scene, Inlined subscenes, image files, audio file, etc.) into a single archive file will be considered.

  • Intellectual Property Rights (IPR): Technology submissions must meet the Web3D Consortium IPR policy. (Of note is that all such submissions and the forthcoming specification are further compatible with the W3C Patent Policy.)

3.3.5 Alternatives

GZIP is the specified compression scheme for the Virtual Reality Modeling Language (VRML 97) specification, the second-generation ISO predecessor to X3D. GZIP is not type-aware and does not compress large sets of floating-point numbers well. GZIP allows staged decompression of 64KB blocks, which might be used to support streaming capabilities. GZIP outputs are strings and require a second pass for any parsing, thus degrading parsing and loading performance.

Numerous piecemeal, incompatible proprietary solutions exist in the 3D graphics industry for Web-page plugins. None address the breadth of technical capabilities that might be enabled by binary XML compression.

An X3D-specific binary compression and serialization algorithm for XML is certainly feasible and has been demonstrated. Compatibility with a general recommendation for XML compression is desirable in order to maximize interoperability with other XML technologies and reduce implementation cost. Because many of these issues are common to other use-case domains, broad mutual benefits become possible via a common recommendation.

3.4 Web Services for Small Devices

3.4.1 Description

As Web services become more and more ubiquitous, there is a greater demand to use this technology as a way to deliver content to small devices such as PDAs, pagers and mobile phones. These devices often share the following characteristics:

  1. They have limited memory and limited processing power.

  2. Battery life is at a premium.

  3. They are connected to low-bandwidth, high-latency networks which in some cases are regulated by "pay-per-byte" policies.

XML-based messaging is at the heart of the current Web services technology. XML's self-describing nature has significant advantages, but they come at the price of bandwidth and performance. XML-based messages are larger and require more processing than other protocols, and are therefore not well suited for a domain having the characteristics outlined above. Increased bandwidth usage affects wireless networks due to bandwidth restrictions allotted for communication by each device. In addition, the larger the message the higher the probability of a retransmission as a result of an on-the-air collision.

Mainstream devices limiting code size to 64K and heap size to 230K are the target platform of this use case. The transport packet size may vary from network to network, but it is typically measured in bytes (e.g. 128 bytes).

3.4.2 Domain

Small devices connected to low-bandwidth, high-latency networks.

3.4.3 Justification

XML is the fundamental technology underlying a Web services infrastructure, and one of the main reasons why Web services are not being deployed in the mobile space. A number of alternative serializations have already been developed to deliver XML content to small devices; however, many of these are not interoperable. This lack of interoperability results in fragmentation and the need for specialized gateways to transcode proprietary formats.

3.4.4 Analysis

In order to satisfy the requirements of this use case, an alternative serialization must be faster to process and must produce smaller packets. Faster processing will result in lower battery consumption, while smaller packets will result in reduced latency as well as, assuming a pay-per-byte model, a more cost-effective service. In addition to being small and fast, an alternative serialization should also be streamable, i.e. it should be possible for the client application to operate on any prefix of the serialized data.
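The streamability property can be sketched with textual XML and an incremental parser, which shows the behaviour an alternative serialization would need to preserve. The message content and element names are hypothetical, and the 128-byte chunk size simply reuses the example packet size given in the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical response message; element names are illustrative only.
message = (
    b"<quotes>"
    + b"".join(b"<quote symbol='S%03d'>%d</quote>" % (i, 100 + i) for i in range(40))
    + b"</quotes>"
)

PACKET_SIZE = 128  # example transport packet size from the description

parser = ET.XMLPullParser(events=("end",))
quotes = []
seen_after_packet = []
for offset in range(0, len(message), PACKET_SIZE):
    # Feed one packet at a time, as if it had just arrived from the network.
    parser.feed(message[offset:offset + PACKET_SIZE])
    for _event, elem in parser.read_events():
        if elem.tag == "quote":
            quotes.append((elem.get("symbol"), int(elem.text)))
    # Record how many complete entries were usable after this packet.
    seen_after_packet.append(len(quotes))
parser.close()

print(seen_after_packet[0], "entries usable after the first packet")
print(len(quotes), "entries after the last packet")
```

The client can act on the first entries as soon as the first packet arrives instead of waiting for the closing tag of the whole message.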

Assuming that the same amount of information is encoded in an alternative serialization, a way to quantify efficiency is to consider the instruction-to-data ratio, that is, the amount of effort needed to produce or consume a unit of data. Even though this is an implementation requirement, an alternative serialization must enable the creation of "thin" stacks with a low instruction-to-data ratio.

The reduction in latency that results by improving parsing speed may or may not be noticeable to the consumer depending on the transport latency of the network --transport latency is the dominant factor in many existing networks. Nevertheless, a more efficient parsing method will improve battery life on the device as well as throughput on the server.

3.4.5 Alternatives

Proprietary solutions result in the so-called gatewayed networks, where communication is always routed through a single point that translates to and from XML. This architecture not only creates a single point of failure within a network but also fragments the entire network by creating non-interoperable, domain-specific solutions.

Message size reductions are attainable via the use of standard data compression techniques. Even though in general decompression is less expensive than compression, it is still too costly for most small devices. Additionally, the extra burden of compressing packets has a negative impact on the overall system throughput.

In addition to the added cost, redundancy-based compression algorithms tend to perform very poorly on small messages, in many cases resulting in larger messages. Mobile clients often carry on dialogs with servers which consist of a large number of small messages. Examples of this include data synchronization, stateful web services, multi-player games, and querying and browsing data. In all of these use cases, the cumulative stream of messages that make up the dialog can grow very large even though all of the individual messages are rather small. Thus, there is still a need to reduce the amount of data exchanged, but doing this by compressing each message individually is not a viable solution.
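The behaviour on small messages can be observed with a general-purpose compressor. The sketch below uses zlib (DEFLATE) as a stand-in for redundancy-based compression in general; the dialog content is hypothetical. It compares compressing each message on its own with compressing the dialog as a single stream in which later messages can refer back to earlier ones. The second variant is shown only to make the redundancy across the dialog visible; it still incurs the processing cost discussed above.

```python
import zlib

# Hypothetical dialog of small, similar messages (illustrative content only).
messages = [b'<move player="p1" x="%d" y="%d"/>' % (i, 2 * i) for i in range(50)]

# 1. Compress each message on its own: every message pays the fixed format
#    overhead and has no earlier data to refer back to.
individual = [zlib.compress(m) for m in messages]

# 2. Compress the dialog as one stream, flushing after each message so that it
#    can be sent immediately while later messages reuse the shared history.
compressor = zlib.compressobj()
streamed = [compressor.compress(m) + compressor.flush(zlib.Z_SYNC_FLUSH) for m in messages]

# The receiver mirrors this with a single decompressor for the whole dialog.
decompressor = zlib.decompressobj()
decoded = [decompressor.decompress(chunk) for chunk in streamed]

tiny = b"<ok/>"
print(len(tiny), "->", len(zlib.compress(tiny)), "bytes for a tiny message")
print("raw:", sum(map(len, messages)))
print("compressed individually:", sum(map(len, individual)))
print("compressed as one stream:", sum(map(len, streamed)))
```

A very small message grows when compressed in isolation, because the format overhead exceeds any saving, whereas most of the redundancy in the dialog lies between messages rather than within them.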

3.4.6 References

  1. [Fast Web Services]

  2. [SOAP Performance]

  3. (more to come ...)

3.5 Web Services as an Alternative to CORBA

3.5.1 Description

A large number of existing enterprise systems are built using distributed technologies such as RMI, DCOM and CORBA. As the industry moves from distributed object systems to Service Oriented Architectures (SOAs), the use of Web services technologies becomes more significant even within the confines of a single enterprise. Many of the concepts behind SOAs are applicable to divisions within a corporation, so it is only natural to extend the applicability of Web services to intranet systems.

A stumbling block that several re-architected systems are facing is that XML-based messages are larger and require more processing than those from existing protocols: data is represented inefficiently and binding requires more computation. It has been shown that an RMI service can perform up to an order of magnitude faster than an equivalent Web Service due to the processing required to parse and bind XML data into programmatic objects.

3.5.2 Domain

Web Services within the enterprise.

3.5.3 Justification

There are some important economic reasons that support the use of Web services as an alternative to existing technologies for building distributed systems. First, preliminary results show more powerful hardware is needed to re-deploy existing systems using Web services technologies given the additional processing requirements of an XML messaging system. Second, assuming the company in question already develops (or is planning to develop) Web services to communicate outside their firewall, there is an extra incentive to use the same set of tools and the same development team to build intranet applications. This reduces both software fees (e.g., by reusing application servers and development tools) and the training costs associated with having separate development teams for each technology. Third, some companies that have successfully deployed CORBA-based systems, but are not planning on deploying Web services, may find an additional incentive to do so if a more efficient serialization is standardized.

3.5.4 Analysis

Intranet Web services differ from Internet Web services especially in the areas of deployment and security: deployments are easier to manage and security is typically defined by a single domain. The requirements for intranet systems are somewhat different from those for Internet systems, permitting the use of certain optimizations in the former which would be difficult or simply impossible to implement in the latter. Consequently, in many cases the degree of coupling between the systems can be adjusted if this helps achieve the desired performance goals.

The main requirement for this use case is reducing XML processing time in order to achieve a level of performance comparable to the existing systems. Due to the available of high-speed networks in these scenarios, reducing message sizes is of a lesser priority. It is worth pointing out that not all systems re-deployed using Web services will be unable to achieve their performance requirements. Therefore, this use case applies only to a subset of the aforementioned re-deployments.

3.5.5 Alternatives

  • Keep using existing technology without migrating to Web services.

  • Re-design the system's interfaces to make them more coarse grained in order to reduce the number of messages exchanged.

3.6 Embedding External Data in XML Documents

3.6.1 Description

Although the data in an XML document is encoded as text, it is often the case that portions of that text are in fact embedded documents in and of themselves. This frequently occurs when XML documents contain images, recordings, or other multimedia elements which have their own file formats, such as JPEG and MP3. In order to embed these documents they are translated to a textual representation using base64-encoding or a similar scheme. It is worth noting that these embedded documents are often much larger than the encapsulating document.

For document-oriented applications, these embedded files are often part of the document, in the sense that they are intended to be rendered along with the text in the encapsulating XML document. Thus, the translation from text back to the original file format often occurs as the rest of the document is being parsed. The case in which the embedded document is in fact an XML document (e.g., an SVG graphic embedded within an XSL-FO document) can be regarded as a special case in which no translation is required.

For message-oriented applications, embedded documents are simply payload elements encapsulated within, e.g., a SOAP body element. In these cases, the payloads may be of arbitrary or even unknown formats. They are often translated to text during transmission and back to their original form upon reception, even if not otherwise immediately consumed, in order to reduce storage space. Because these embedded documents can be large, storage requirements are further reduced by streaming on input and output. In these cases the embedded document may also be XML but may contain either processing instructions or DTDs, both of which should not appear within a SOAP body element (WS-I Basic Profile, R1008 and R1009). Therefore, such files may be treated as if they were binary data and base64-encoded even if they are, in fact, valid XML files.

3.6.2 Domain

This use case considers the domains of both electronic documents and Web services.

3.6.3 Justification

XML was not designed to contain binary data, and other packaging mechanisms, such as MIME, exist and are in many ways suitable to the purpose at hand. On the other hand, the ability to treat embedded documents as part of the primary documents, and therefore make them accessible to XML-based standards and tools like XPath without resorting to additional standards like MIME, is useful in practice. Thus, this use case is a good demonstration of why one might wish to extend XML with a binary encoding.

3.6.4 Analysis

This use case builds on applications of XML for documents and Web services that are already well established. Furthermore, those uses already involve the transmission of documents with binary data like integers and floats. The question is, for each given embeddable datum, should it be placed inside the document or should it be carried as an attachment?

The drawbacks to embedding are the penalties in time due to the translation into text, and in space, due to the larger size of the translated data. The benefits include access to other XML-based technologies like XPath, XQuery, etc. and the avoidance of an additional dependency on a packaging technology such as MIME.
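The space penalty mentioned above can be made concrete with a small sketch. It assumes base64 as the text translation and uses random bytes as a stand-in for an embedded image (the payload is not real JPEG data, and the function name is hypothetical). Base64 maps every 3 input bytes to 4 output characters, so the encoded form is about one third larger than the original.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return the size of the base64 text relative to the raw bytes."""
    return len(base64.b64encode(raw)) / len(raw)


# Stand-in for an embedded binary resource (assumption: 30,000 random bytes).
payload = os.urandom(30_000)
encoded = base64.b64encode(payload)

# The translation is lossless but costs both time and space.
assert base64.b64decode(encoded) == payload
print(len(payload), len(encoded), round(base64_overhead(payload), 3))
```

The time penalty is not measured here; it depends on the platform and on whether the encoded text must also be parsed as XML character data.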

To address this use case, a binary XML format must permit binary documents to be embedded within an XML document without requiring a translation to a text form; a binary XML format must also support the streaming of such XML documents, a desirable feature in Web service calls.

3.6.5 Alternatives

SOAP with Attachments provides a MIME-based mechanism for packaging binary data with SOAP messages. It avoids the translation costs, but does not make the binary data part of the XML document itself. In that respect, it is not a streamable format.

XOP describes how a MIME-based package can be used to encapsulate the binary data without a translation overhead by keeping it (at least conceptually) as part of the encapsulating XML document. Because of its use of MIME, this approach suffers from many of the same shortcomings as the SOAP with Attachments case.

3.7 Electronic Documents

3.7.1 Description

Documents are the most basic form of recorded human communication, dating back thousands of years. Electronic documents are the transition of this invention to the online, computerized world. Books, forms, contracts, emails, spreadsheets, and Web pages are only some of the forms in which electronic documents are used. Unlike paper-based documents, electronic documents are not limited to static text and images. Electronic documents regularly contain static content, dynamic content (e.g., animations, video), and interactive content (e.g., form fields). This wide range of content has a great effect on selecting an appropriate representation format and must be considered in evaluating this use case.

Documents are first created in some authoring environment. During the creation process the author may elect to include text, fonts, images, videos, or other resources which are to be rendered more than once when the document is displayed. For example, a company logo may appear in the header of each page of a document, but this should not require adding the logo to the document more than once.

In a special case of document assembly, new documents are created by assembling a set of existing documents into a single aggregate document. For example, this may be done to combine a basic product manual with additional documentation for optional product accessories into a customized manual for an individual purchaser. When documents are bound together in this way it may be important that the data in the original documents is not modified, so as to preserve signatures or other properties of the file, or it may be desirable to identify and eliminate duplicated resources, such as fonts.

After a document has been created it is usually read, in whole or in part. Documents are not necessarily read front to back; a particular reader may select a different order or read only part of a document. A reader may, for example, obtain the document by traversing a hyperlink which points to a specific location within the document. It is important that rendering a document for reading be fast, even when starting at an arbitrary location in this way, and even when documents are large (millions of pages). This implies that it must be possible to navigate to specific sections within a document quickly, as well as follow links to shared resources within the document, as mentioned above under document creation. Finally, if a document is being retrieved over a slow link, it may be useful to fetch portions of the document in the order in which they are being rendered and read (e.g., starting at page 700), as opposed to document order (i.e., starting at page 1).

Documents often contain information of a sensitive or proprietary nature and so can be secured using encryption technologies. Encrypting the document can serve either to keep the contents confidential, to--in conjunction with the rendering application--allow only certain operations ("rights") on the document, or both. Typically a description of any rights granted is embedded within the document itself when it is encrypted. It is often desirable that only portions of a document be encrypted so that intermediaries can access some portion of the data in the file.

Documents, and especially those used in business transactions, are often signed to indicate authenticity of, consent to, or agreement with the document. In electronic documents, this is implemented by digitally signing the document. The digital signature must itself be stored in the document. Multiple signatures may be applied to a document, each one signing those which came before it. Additional information is sometimes added to a document after it has been signed but without invalidating a signature--in the same way one can initial a correction to a paper document--but so that it is clear that any subsequent changes were not present when the pre-existing signatures were applied. In some cases signatures should apply to only part of a document, leaving other parts for later modification. Finally, it must be possible for a recipient to validate all of these signatures.

Documents are often long-lived and, during the course of their lives, used in different environments with varying constraints. For example, when a document is being published for general consumption, it might be most desirable to select an encoding such as XML which is widely understood. If, however, the same document is being transmitted between partners with known expectations a more compact format such as XOP might be preferred. Thus, a single document may sometimes be transformed between different encodings at different times and for different purposes. Such transformations should preserve the information in the document, but these operations cannot be expected to be compatible with encryption mechanisms used to secure documents.

Even when various encodings are available documents tend to push the available storage and bandwidth of the devices on which they are created, stored, transmitted, and read. In other words, as device capabilities increase, users respond by creating larger documents. Note that these documents rarely contain only text; they generally contain larger elements such as fonts and images and, increasingly, video and 3D models which these same enhanced devices make possible.

Electronic documents, like their paper counterparts, can be modified or repurposed. In electronic documents, this typically occurs when pages, images, videos, and so forth are either copied out of a document to be used elsewhere or removed from a document to produce an altered version of that document. Again, these operations should be efficient: removing any one page from a one million page document should not take significantly longer than doing the same to a ten page document.

Documents may also be modified by their recipients to include comments of various types--editors' marks, sticky notes, etc.--usually intended to communicate responses back to the author. These comments may be stored within the document itself; both adding them to and extracting them from the document should be efficient.

Finally, some documents are designed to be interactive beyond the limited interactions of rendering, signing, and annotating. These documents may contain form fields, GUI widgets such as buttons and listboxes, or other active elements, data islands bound to these widgets, and code, scripts, or declarative logic to validate input to these elements, enable or disable the elements, transmit the document, modify the document, interact with the rendering application, and so forth. It must be possible to describe and access all of these elements within the document itself.

3.7.2 Domain & Stakeholders

Electronic documents are used extensively throughout government, business, and personal domains as well as in the interchange between these entities.

3.7.3 Justification

XML is in its roots a syntax for marking documents, and so the electronic document use cases seem highly relevant. Interestingly, XML has a number of shortcomings (discussed below) with respect to many of the requirements derived from this use case. Arguably, these occur because XML (and SGML) were focused largely on textual documents, but such documents represent a decreasing fraction of all electronic documents. Thus, Binary XML as a natural extension of XML to handle new document types, and documents containing new content, seems particularly relevant.

3.7.4 Analysis

Documents are almost always exchanged between two or more people, and often between larger entities such as corporations or governments. It is, therefore, extremely desirable that an electronic document format should be easily consumable by all parties involved. XML, as a widely accepted, implemented, and used format, fits this need quite well.

Unfortunately there are a number of requirements imposed by electronic documents which XML fails to address:

  • Documents frequently contain embedded resources such as fonts, images, and video which are themselves encoded in binary formats. It must be possible to efficiently embed these resources in documents. XML does not meet this requirement because it requires that such resources are transformed to a text encoding, which adds both time and space costs.

  • The conversion of a document between different encodings must preserve all information in the document, including digital signatures.

  • It must be possible to navigate to and render a specified location in better than linear time with respect to the size of the document (i.e., "random access").

  • The document encoding must be efficient with respect to space, that is, it must have low redundancy.

  • In order to make updates efficient, it must be possible to update a document in time proportional to the size of the update rather than the size of the document.
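As an illustration of the random-access requirement in the list above, the following sketch keeps a table of byte offsets alongside the serialized pages so that a reader can seek directly to a given page instead of scanning from the beginning. The page markup, the index layout, and the function names are all hypothetical; no particular binary XML format is implied.

```python
import io


def build_index(pages):
    """Serialize pages back to back and record (offset, length) for each."""
    buf = io.BytesIO()
    index = []
    for page in pages:
        data = page.encode("utf-8")
        index.append((buf.tell(), len(data)))
        buf.write(data)
    return buf, index


def read_page(buf, index, n):
    """Fetch page n with a single seek instead of a scan from the start."""
    offset, length = index[n]
    buf.seek(offset)
    return buf.read(length).decode("utf-8")


# Made-up content: 1000 small pages.
pages = [f"<page n='{i}'>text of page {i}</page>" for i in range(1000)]
buf, index = build_index(pages)
print(read_page(buf, index, 700))
```

With such an index, the cost of reaching a page depends on the lookup rather than on the amount of data that precedes the page. A plain XML 1.x text stream carries no comparable table, so a parser generally has to read from the start of the document.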

There are a number of requirements which XML does address, but which are enumerated here as well because they would also be requirements on any Binary XML encoding:

  • Re-usable resources may appear, or be referenced from, multiple locations within the document. In order to maintain reasonable document sizes, it must be possible for these resources to be used by reference, rather than by duplication.

  • It must be possible to efficiently assemble even large documents.

  • It must be possible to assemble signed documents in such a way that their signatures are preserved.

  • It must be possible for a document to contain multiple signatures, full or selective, from one or more signers.

  • It must be possible to read a secured (encrypted) document without suffering an unreasonable delay when first viewing the document, without unreasonably exposing the decrypted contents of the document, and while obeying rights associated with the document.

  • It must be possible to efficiently extract data from the document (i.e., a document fragment) and without modification to the extracted data.

3.7.5 Alternatives

The current de facto standard for interchange of electronic documents is Adobe's Portable Document Format, or PDF. PDF meets all of the requirements stated here.

3.7.6 References

  1. [PDF]

3.8 FIXML in the Securities Industry

3.8.1 Description

The Securities industry has cooperated to define a standard protocol and a common messaging language called FIX which allows real-time, vendor/platform neutral electronic exchange of securities transactions between financial institutions.

The original definition of FIX was as a tag-value pair format. Due to increased competition by the year 1999, and to better accommodate the business models of emerging initiatives, an XML-based message format for application-layer messages called FIXML was devised. Even though FIXML was designed to have minimum impact on existing systems, in order to protect investments in traditional FIX systems and processes, it soon became evident that the new message size was as much as 6 times larger than its tag-value predecessor, a condition that prevented key participants in the industry from integrating FIXML into their systems. This problem, together with some positive findings made through experiments, spurred the discussion of size reduction for FIXML messages, which culminated in a new format called Transport Optimized FIXML (TO-FIXML) in FIXML version 4.4. TO-FIXML is essentially a collection of XML Schema definitions that uses name abbreviations as well as attributes instead of elements wherever possible to collectively reduce the size of FIXML messages by up to a factor of 4.
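The size relationships described above can be sketched with a single order expressed three ways. The tag numbers are standard FIX field tags (55 Symbol, 54 Side, 38 OrderQty, 44 Price), but both XML shapes below are illustrative assumptions and are not taken from the FIXML or TO-FIXML schemas; the resulting ratios are therefore not expected to match the figures quoted in the text.

```python
SOH = "\x01"  # FIX tag-value field delimiter

# Body fields of one order, as (tag, value) pairs.
fields = [("55", "IBM"), ("54", "1"), ("38", "100"), ("44", "125.50")]
tag_value = SOH.join(f"{tag}={value}" for tag, value in fields) + SOH

# Hypothetical element-per-field XML rendering of the same data.
verbose_xml = (
    "<Order><Symbol>IBM</Symbol><Side>1</Side>"
    "<OrderQty>100</OrderQty><Price>125.50</Price></Order>"
)

# Hypothetical rendering using abbreviated names and attributes,
# in the spirit of the transport-optimized approach.
compact_xml = '<Order Sym="IBM" Side="1" Qty="100" Px="125.50"/>'

for label, text in [("tag-value", tag_value),
                    ("verbose XML", verbose_xml),
                    ("compact XML", compact_xml)]:
    print(f"{label:12s} {len(text.encode('utf-8'))} bytes")
```

Even in this toy case the abbreviated, attribute-based form is noticeably smaller than the element-based form, while the tag-value form remains the smallest of the three.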

3.8.2 Domain

The securities industry engaged in capital markets such as derivatives, equity, and fixed-income markets, where the FIX protocol is applicable and where participants are moving towards service-oriented architectures based on FIXML. Major roles played in the industry include brokers, exchanges, and clearing houses.

3.8.3 Justification

Even though TO-FIXML has been designed to minimize message sizes, some industry participants still consider it to be a sub-optimal solution and envisage the possibility of further optimization by studying binary-compatible XML formats.

3.8.4 Analysis

XML was the natural choice for the securities industry in light of its expandability and flexibility, which was required for the continuous and rapid evolution of the FIX protocol. There was also a demand for cross-industry interoperability given the broad adoption of XML by other financial industries.

XML Schema is the point of agreement for multiple parties to share a common transport format. However, the bloated size of the XML instances resulted in artificial changes to the schemas, with the sole purpose of reducing the number of bytes on the wire. Clearly, XML Schema is not the right place to tackle this problem given that the syntax verbosity is a property exclusive to the XML serialization. Stated differently, XML Schema is the point of agreement in terms of vocabulary and structure, not in terms of syntax.

3.8.5 Alternatives

Message size alone can be substantially reduced by standard compression methods. However, there is a study that shows compression of FIXML instances increases round trip time over a 10 Mbps network. Compression may be useful for considerably slower networks, which is not the typical case in FIXML. The same study also suggests that marshalling/unmarshalling costs do not seem to make a tangible performance difference in those data sets typically seen in FIX scenarios.

3.9 Multimedia XML Documents for Mobile Handsets

3.9.1 Description

The Service Enabler standard for mobile handsets benefits from extensive use of XML-based technologies for interoperability. For example, SMIL, SVG and XHTML are used as document formats for mobile content services such as:

  • Multimedia Messaging Services (MMS): MMS in 3G consists of multiple XML documents, such as SMIL, SVG and XHTML. The handset is required to parse and render multi-namespaced XML documents.

  • Map Services: Map data delivered to a handset is split into multiple chunks based on region and level of detail; handsets retrieve additional chunks in response to user zooms and scrolls. Additional data, such as restaurant information supplied by other content providers, can also be overlaid on top.

XML documents in these services can be quite large. For instance, the map data represented in SVG could be 100 KB or more. Rich content MMS could also be very large. Even on today's high-end handsets with 120 MHz 32-bit RISC processors, parsing a raw 100 KB XML document takes approximately 10 seconds.

3.9.2 Domain

This use case applies to multimedia services for mobile handsets.

3.9.3 Justification

XML is required for maximum interoperability. In fact, XML technology is already widely adopted in the mobile services space. As this area requires a solution for narrow band and limited footprint devices, the importance of this use case should be considered high.

3.9.4 Analysis

This use case requires the following capabilities of XML to be preserved:

  • Interoperability

  • Multiple namespace support

  • In-memory, random access using a DOM

Interoperability is mandatory as the same documents must be shared among different handsets. Moreover, for map services, the layering of multiple source map data requires interoperability among the providers. Support for multiple namespaces is a must in order to deliver multi-format messages (e.g. HTML + SVG) to the devices. DOM access is required to support ECMA scripting as well as for efficient rendering of formats such as SVG.

The requirements not satisfied by current XML solutions that must be addressed are:

  • Efficient transmission of XML documents by reducing their sizes

  • Efficient access to a DOM, i.e. efficient DOM parsing

3.9.5 Alternatives

The WAP Forum defined a WAP Binary XML format as an alternative serialization for XML. However, this format has a number of shortcomings, the biggest of which is the lack of support for multi-namespace documents due to the use of a "single dimension" system of 6-bit tags.

3.10 PC-free Photo Printing

Editorial note: MC/SP July 20, 2004
This use case is currently under development.

3.10.1 Description

PictBridge is a standard used to directly connect digital cameras to print devices, supported by numerous vendors. Future products may require improved prints containing borders, metadata, etc. The ideal format for such display is an XML based presentation format such as SVG or XHTML. Cameras and printers both have limited CPU power and thus cannot afford to consume cycles to base64 encode and decode the image data. A binary packaged aggregate containing the XML document and its referenced image data is required.

3.10.2 Domain

Digital photography.

3.10.3 Justification

Camera to printer direct exists now. Vendors will either adopt a proposed standard or develop their own under the same group that developed PictBridge.

3.10.4 Analysis

XML is appropriate as the SVG namespace provides rich graphical functionality that would enhance photo prints. Few other open standards for such graphical data exist.

3.10.5 Alternatives

Flash. PostScript.

3.10.6 References

Digital camera makers, printer makers, e.g. Canon, Nikon, Eastman Kodak, HP, etc. Standards bodies include W3C and US Camera & Imaging Products Association (CIPA).

3.11 PC-free Photo Album Generation

Editorial note: MC/SP July 20, 2004
This use case is currently under development.

3.11.1 Description

A person takes a number of photos with their digital camera. They choose an inbuilt template for photo album layout. They connect the camera to an SVGP enabled printer directly and send the final form SVG graphic including images, borders, framing etc. with no driver or PC required.

3.11.2 Domain

Digital photography.

3.11.3 Justification

Camera to printer direct exists now. Vendors will either adopt a proposed standard or develop their own under the same group that developed PictBridge.

3.11.4 Analysis

XML is appropriate as the SVG namespace provides rich graphical functionality that would enhance photo prints. Few other open standards for such graphical data exist.

3.11.5 Alternatives

Flash. PostScript.

3.11.6 References

Digital camera makers, printer makers, e.g. Canon, Nikon, Eastman Kodak, HP, etc. Standards bodies include W3C and US Camera & Imaging Products Association (CIPA).

3.12 Intra/Inter Business Communication

3.12.1 Description

Editorial note: MC/SP July 20, 2004
This use case is currently under development.

A large business communicates via XML with a number of remote businesses, some of which can be small business partners. These remote or small businesses often have access only to slow transmission lines and have limited hardware and technical expertise. The large business cannot expect the smaller partners to upgrade often or to use expensive technology. The primary illustrations of this use case come from the energy and banking industries.

In the energy industry, the major upstream (exploration and production) operations of oil companies are largely in developing countries (e.g. Nigeria, Angola, Papua New Guinea), and it is a common problem to have very slow and perhaps unreliable communications between the main office and remote sites. It's not that the oil companies don't know how to set up a satellite feed, it's that they are often required by the local governments to use the communication facilities provided by that government, and these communications can be technically low-end and (very) expensive. So the common problem is one where there is plenty of processing power and bandwidth at both central and remote sites, but the communication between the two is slow.

Although many scenarios illustrating this problem have to do with upstream operations, this specific example will be from downstream (refining and marketing). It involves transmission of Point of Sale (POS) information back and forth between back office systems and remote sites. The data flowing to the remote sites includes "incremental price book" for dry goods and wet stock, currency exchange rates, promotion codes/rates/groups and so on. The data coming back includes raw sales transactions data, tank data, etc. One might have 1000 transactions per day per site with an average file size of 3 KB, for a total size of 3 MB typically broken up into 12 documents (transmittal every 2 hours, referred to as "trickle feed"). Each document would then average 250 KB. Currently the scope is for many thousands of sites connected to several regional back-office hubs. Connectivity ranges from VSAT to 32k analog connections. The 32k connections would only communicate once a day. This downstream situation includes a factor which is not common in upstream operations. Not only are there communication limitations, but in this case some of the remote sites also have limited processing capabilities because they are small businesses with limited resources.
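The figures in the preceding paragraph can be checked with a short calculation. The sketch below assumes decimal units (3,000 bytes per transaction) and reads the "32k" connection as 32 kbit/s with no protocol overhead; both readings are assumptions, and the constant names are chosen here for illustration.

```python
TRANSACTIONS_PER_DAY = 1000
BYTES_PER_TRANSACTION = 3_000   # average transaction size, read as 3,000 bytes
DOCUMENTS_PER_DAY = 12          # one transmittal every 2 hours
LINK_BITS_PER_SECOND = 32_000   # "32k" read as 32 kbit/s (assumption)

daily_bytes = TRANSACTIONS_PER_DAY * BYTES_PER_TRANSACTION
bytes_per_document = daily_bytes // DOCUMENTS_PER_DAY

# Transfer times ignore framing, retransmission, and line noise.
seconds_per_document = bytes_per_document * 8 / LINK_BITS_PER_SECOND
seconds_per_day = daily_bytes * 8 / LINK_BITS_PER_SECOND

print(f"daily volume:     {daily_bytes:,} bytes")
print(f"per document:     {bytes_per_document:,} bytes")
print(f"per document:     {seconds_per_document} s at 32 kbit/s")
print(f"once-a-day batch: {seconds_per_day / 60} min at 32 kbit/s")
```

Under these assumptions a site that communicates once a day over the slowest link needs roughly 12.5 minutes of line time for uncompressed data, which gives a sense of why message size matters in this scenario.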

In the banking industry, there are typically one or more main data centers and several connected branch offices, ATMs, and business partners. The main data center usually has the latest in technology; however, the connected branch offices, ATMs, and partners are often without access to high speed connections and powerful hardware. Communicating between the various entities can be accomplished with XML Web Services; however, the size and speed issues of XML are troublesome for those without access to high speed lines and/or powerful machines.

3.12.2 Domain & Stakeholders

Retailing operations of large companies, particularly those where the actual retail outlets are SMEs (Small to Medium Size Enterprises) and large companies with various small business partners and/or branch offices. The belief is that the experience gained in the scenarios described above is likely to be directly applicable to a number of other scenarios in the energy, banking, and other industries.

Note that the players in this use case have rather different situations and needs. The large company has significant sunk investment in complex backoffice systems, lots of hardware and a team of IT professionals. The objective here is to integrate the solution into a complex, high-tech environment. The SME, partner, or branch office, may have very limited hardware and technical resources and is probably highly motivated toward a simple, low-cost solution, preferably one that plugs-and-plays off the shelf without extensive configuration or integration. This creates a tension between flexible and capable on one hand and simple and cheap on the other.

3.12.3 Justification

TBD

3.12.4 Analysis

TBD

3.12.5 Alternatives

This is a case where there is a need to compress the entire data files, which are composed of a bunch of tags with relatively short data fields. That is, no huge arrays of floating point numbers causing special problems. Also note that the energy industry expects no particular problem with processing on either end, just the transmission, so the overhead of using compression algorithms is not a problem.

The energy industry currently plans to use native VSAT compression and probably one of many standard compression algorithms like ZIP for other transmission mechanisms. Initial tests with freeware ZIP compression software yielded a compression of 39:1, which is plenty. One problem that did occur, however, on small machines such as might be used by a small business, is that the compression algorithm may need to read the entire document into memory and work on it globally. On small machines this can cause paging and the resulting performance difficulties can be painful. Some sort of compression algorithm that works in a streaming mode or on "chunks" of data would obviously be preferable in these cases.
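The streaming or "chunked" compression preferred above can be sketched with Python's standard zlib module, which implements DEFLATE, the method commonly used inside ZIP archives. The record markup and function names are hypothetical; the point is only that the compressor is fed one chunk at a time, so the whole document never has to be held in memory.

```python
import zlib


def compress_stream(chunks):
    """Yield compressed pieces without buffering the whole document."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:
            yield out
    yield compressor.flush()


def decompress_stream(pieces):
    """Inverse of compress_stream, also working piece by piece."""
    decompressor = zlib.decompressobj()
    for piece in pieces:
        out = decompressor.decompress(piece)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


def sample_chunks(n):
    """Made-up POS-style records, produced one at a time."""
    for i in range(n):
        yield f"<sale id='{i}'><item>fuel</item><amount>42.00</amount></sale>\n".encode()


original = b"".join(sample_chunks(2000))
compressed = b"".join(compress_stream(sample_chunks(2000)))
pieces = [compressed[i:i + 4096] for i in range(0, len(compressed), 4096)]
restored = b"".join(decompress_stream(pieces))

assert restored == original
print(len(original), len(compressed))
```

The compression ratio obtained on this artificial, highly repetitive data says nothing about the 39:1 figure reported for real files; only the memory behaviour is being illustrated.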

There are cases, however, where the SME is unwilling to use the CPU required by the compression. This has been encountered in cases where the small business has a computer, perhaps a mainframe, that is overburdened by other routine tasks. In this case binary serialization may be an attractive alternative assuming it can be done without extra CPU cycles.

The idea is that if one is going to have to parse the XML anyway (which may or may not be the case, depending on the business process), the CPU required to do that parsing is a "sunk cost". Once parsed, a binary serialization of the XML will probably be smaller than the usual text serialization because the tags are not repeated in text and some of the data fields (e.g. some numbers) may be smaller in their binary representation. For typical business documents one might expect a reduction on the order of a factor of two from binary serialization. This moderate reduction in file size may hit the "sweet spot" in cases where CPU is a big problem and file size a moderate concern. It seems likely, however, that this scenario will be less common than the case where reasonable computational capability is available in the small business and the slow transmission lines are the big problem. In these cases, as documented above, compression via standard techniques of the conventional text serialization of XML is probably the preferred solution.
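The two sources of size reduction named above (tags not repeated as text, and numbers stored in binary) can be illustrated with a toy comparison. The record layout below is a hypothetical fixed binary layout, not a general binary XML serialization, and the data is made up; because it drops all structural information, it overstates the reduction a general-purpose format would achieve.

```python
import struct

# Made-up records: (site number, item number, price).
records = [(i, 1000 + i, 19.99 + i) for i in range(100)]

# Conventional text serialization: every tag is spelled out for every record.
text = "".join(
    f"<sale><site>{site}</site><item>{item}</item><price>{price:.2f}</price></sale>"
    for site, item, price in records
).encode("utf-8")

# Hypothetical fixed layout: two unsigned 16-bit integers and one
# 64-bit float per record, little-endian, no padding.
RECORD = struct.Struct("<HHd")
binary = b"".join(RECORD.pack(site, item, price) for site, item, price in records)

print(len(text), len(binary))
print(RECORD.unpack(binary[:RECORD.size]))
```

A real binary serialization would still have to carry element identities and nesting in some tokenized form, which is one reason the text above anticipates a more moderate reduction for typical business documents.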

Note that in both cases the needs of both small and large businesses can potentially be met. The large business gets the XML document it needs in order to integrate with its complex systems. The small business either uses inexpensive compression software or the parser outputs the binary serialization directly, so the complexity of the solution from their viewpoint is minimized.

3.12.6 References

TBD

3.13 X3D CAD Files

3.13.1 Description

Editorial note: MC/SP July 20, 2004
This use case is currently under development.

X3D is a 3D graphics standard developed by the Web3D Consortium designed to enable real-time communication of 3D data. It has a rich set of features for use in engineering and scientific visualization, CAD and architecture, medical visualization, training and simulation, multimedia, entertainment, education, and more.

The CAD working group within the consortium focuses on how to deliver CAD data for downstream uses like visualization and training. These files are XML encoded and typically range from 10-1000 MBytes of data. A desired feature is to deliver these files utilizing multi-namespaces to embed 2D data using SVG and other CAD industry specific languages. Current XML parsing speeds for these files are a major hindrance to further usage. In addition, compression schemes (such as gzip) do not deliver the compression rates needed for Internet delivery. For this reason, X3D allows for pluggable algorithms to support content-aware compression. In XML terms, this translates to registering specific algorithms for attribute types, elements and document fragments (i.e. an element and its children).

3.13.2 Domain & Stakeholders

3D Visualization.

3.13.3 Justification

Typically, float-intensive data formats have been deployed as custom binary formats. These formats will not mesh well with other XML specifications. 3D data cannot just live on its own island; it needs to be interleaved with other formats like XHTML and SVG to form a complete document.

3.13.4 Analysis

TBD

3.13.5 Alternatives

Using multiple XML specifications together makes an X3D-specific binary format problematic. Essentially, such a format would need to duplicate a general XML binary compression scheme.

3.13.6 References

  1. [Extensible 3D (X3D) Graphics]

3.14 Business Processes with XML Documents

Editorial note: MC/SP July 20, 2004
This use case is currently under development.

3.14.1 Description

Large XML documents flow through a business process. During the flow of a document, various business processes perform different, disjoint tasks. In addition, each distinct business process may require only portions of the entire XML document to complete its task. For example, a purchase order document may contain customer information, shipping information, payment and billing information, etc. A business process is then defined in which this document is passed to various entities, some serviced by outside vendors, to approve and fill the purchase order.

3.14.2 Domain & Stakeholders

This Use Case pertains to small, medium, and large businesses that utilize XML to support intra- and inter-business process workflow.

3.14.3 Justification

TBD

3.14.4 Analysis

Business processes often utilize a workflow where each step in the process only needs and processes certain subsets of the entire document. This results in the different steps in the business process performing disjoint tasks on random parts of the overall document. These disjoint tasks, since they are only processing a subset of the document, do not require the entire schema to perform their task.

Even though the entire document is not required at each step in the business process, the entire document is passed each time. In addition, each business process requires a distinct and disjoint subset of the entire document to perform its task.

The document passed to each entity can be large, meaning that large amounts of potentially unused data are passed to each endpoint. GZIP is not an option because the cost of compressing and decompressing would be paid at each endpoint. In addition, each endpoint may require direct access into the document. If the document were compressed with GZIP, this type of access would be impossible without first decompressing it.
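The GZIP limitation can be illustrated with a minimal sketch using Python's standard `gzip` and `xml.etree.ElementTree` modules. The purchase order content and its element names are invented for the example.

```python
import gzip
import xml.etree.ElementTree as ET

# Invented purchase order; element names are assumptions for illustration only.
order_xml = (
    "<purchaseOrder>"
    "<customer><name>Example Corp</name></customer>"
    "<shipping><city>Springfield</city></shipping>"
    "<billing><total>100.00</total></billing>"
    "</purchaseOrder>"
).encode("utf-8")

compressed = gzip.compress(order_xml)


def read_shipping_city(gz_bytes):
    """An endpoint that needs only <shipping> still inflates and parses everything."""
    xml_bytes = gzip.decompress(gz_bytes)  # full decompression of the whole document
    root = ET.fromstring(xml_bytes)        # full parse of the whole document
    return root.findtext("shipping/city")


city = read_shipping_city(compressed)
```

Every endpoint in the workflow would repeat both steps, even when it touches a single element.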

To avoid both the compression overhead and the bandwidth problem, a binary encoding that represents the data more compactly could be used. The encoding would also need to allow each endpoint to quickly extract and process a subset of the entire original document.

In addition, the document can be modified along the way. This requires that each endpoint have a way to quickly modify a part of the document, then send it to the next step in the workflow process. Modifying the document in place is desirable because creating a DOM, making the change, and writing it back out would be too costly.

The requirements include that the alternate form of the data be more compact than the original XML. This more compact form must be lossless. This means the alternate form can be converted back into the original XML with no differences. In addition, the creation of the alternate form and conversion back to XML must be efficient such that the entire business process does not take more time than it did with XML. Furthermore, the alternate encoding must allow for efficient direct access into the document such that the entire document does not have to be processed only to access a small subset contained at some specified location.
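The direct access and lossless round-trip requirements can be sketched with a toy container that stores an offset index ahead of the document sections. This is not a proposed encoding and does nothing for compactness; the layout (4-byte header length, JSON index, concatenated fragments) and the function names are assumptions chosen for brevity.

```python
import json
import struct


def pack_sections(sections):
    """Build a toy container: 4-byte header length, JSON index, then section bytes.

    `sections` maps a section name to its serialized XML fragment (bytes).
    """
    index = {}
    body = b""
    for name, fragment in sections.items():
        index[name] = [len(body), len(fragment)]  # offset and length within body
        body += fragment
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(header)) + header + body


def read_index(container):
    """Return (index, offset of the body) by decoding only the header."""
    (header_len,) = struct.unpack(">I", container[:4])
    index = json.loads(container[4:4 + header_len].decode("utf-8"))
    return index, 4 + header_len


def read_section(container, name):
    """Return one section by slicing at its recorded offset; other sections are not parsed."""
    index, body_start = read_index(container)
    offset, length = index[name]
    start = body_start + offset
    return container[start:start + length]


def unpack_all(container):
    """Recover every section byte-for-byte, in original order."""
    index, _ = read_index(container)
    return {name: read_section(container, name) for name in index}


# Invented purchase order fragments.
sections = {
    "customer": b"<customer><name>Example Corp</name></customer>",
    "shipping": b"<shipping><city>Springfield</city></shipping>",
    "billing": b"<billing><total>100.00</total></billing>",
}
container = pack_sections(sections)
shipping = read_section(container, "shipping")
```

An endpoint interested only in shipping data reads the header and one slice. Because the fragments are stored verbatim, `unpack_all` returns the original bytes unchanged.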

Editorial note: SP June 7, 2004
What are the exact requirements of this UC? Is it fragmentation? Passing documents by reference using queries? An alternative serialization may not do any good if the entire document still needs to be loaded in memory, for example.

3.14.5 Alternatives

Editorial note: MC July 16, 2004
Some concerns have been raised in the WG pertaining to the use of XInclude described below. This should be resolved before publication.

Usage of XInclude is a possible alternative whereby the document sent includes only the relevant pieces of the original. This may not work, however, because point A may send a subset of the document to point B, and point B may then need to send the entire document to point C. Point B would not have the entire document if point A sent it only a subset.

3.14.6 References

TBD

4 Summary

This section includes observations concerning overlaps and other patterns in use case requirements.

Editorial note: MC 4 June, 2004
Perhaps we could have a table or something to indicate groups of related use cases, where the grouping is based on overlapping requirements?

5 References

XBC Properties
XML Binary Characterization Properties (See http://www.w3.org/XML/Binary/Properties/.)
XML 1.0
Extensible Markup Language (XML) 1.0 (See http://www.w3.org/TR/REC-xml/.)
XML 1.1
Extensible Markup Language (XML) 1.1 (See http://www.w3.org/TR/xml11/.)
Extensible 3D (X3D) Graphics
Extensible 3D (X3D) Graphics (See http://www.web3D.org/x3d/.)
Fast Web Services
Fast Web Services (See http://java.sun.com/developer/technicalArticles/WebServices/fastWS/.)
SOAP Performance
Investigating the Limits of SOAP Performance for Scientific Computing (See http://www.extreme.indiana.edu/xgws/papers/soap-hpdc2002/soap-hpdc2002.pdf.)
MTOM
SOAP Message Transmission Optimization Mechanism (See http://www.w3.org/TR/2004/WD-soap12-mtom-20040209/.)
XOP
XML-binary Optimized Packaging (See http://www.w3.org/TR/2004/WD-xop10-20040209/.)
SOAP with Attachments
SOAP Messages with Attachments (See http://www.w3.org/TR/SOAP-attachments.)
WS-I Attachments Profile
WS-I: Attachments Profile Version 1.0 (See http://www.ws-i.org/Profiles/Basic/2003-08/AttachmentsProfile-1.0.pdf.)
DVB
Digital Video Broadcasting (See http://www.dvb.org/.)
TV Anytime
TV Anytime (See http://tv-anytime.org/.)
ARIB
Association of Radio Industries and Businesses (See http://www.arib.or.jp/.)
MPEG-7
MPEG-7 (See http://www.iso.org/iso/en/prods-services/popstds/mpeg.html.)
PDF
PDF Reference, 4th Ed. (See http://partners.adobe.com/asn/acrobat/sdk/public/docs/PDFReference15_v6.pdf.)
SOAP in Real-Time Trading Systems
Evaluating SOAP for High Performance Business Application: Real-Time Trading Systems. (See http://www2003.org/cdrom/papers/alternate/P872/p872-kohlhoff.html.)
MIME
The MIME Multipart/Related Content-type (See http://www.faqs.org/rfcs/rfc2387.html.)
Mobile SVG
Mobile SVG Profiles: SVG Tiny and SVG Basic (See http://www.w3.org/TR/SVGMobile/.)
Mobile XHTML
XHTML Mobile Profile (See http://www.openmobilealliance.org/tech/affiliates/wap/wap-277-xhtmlmp-20011029-a.pdf.)
SMIL
Synchronized Multimedia Integration Language (SMIL 2.0) (See http://www.w3.org/TR/smil20/.)
G-XML
G-XML (See http://gisclh.dpc.or.jp/gxml/contents-e/.)
MMS
Multimedia Messaging Services (See http://www.3gpp.org/ftp/Specs/html-info/26140.htm.)
WAP WBXML
Binary XML Content Format Specification (See http://www.openmobilealliance.org/tech/affiliates/wap/wap-192-wbxml-20010725-a.pdf.)

A Acknowledgments (Non-Normative)

The editors would like to thank the contributors...

B XML Binary Characterization Use Cases Changes (Non-Normative)

2004-05-19 SP Document Created.