Position Paper for the W3C Workshop on Binary Interchange of XML Information Item Sets

Submitted by Margaret Green, Ontonet.

1. What work has your organization done in this area?
Ontonet has implemented the XML Infoset, the DOM2 Core as a view onto that Infoset implementation, and a partial implementation of XQuery that queries the Infoset and returns sequences of DOM nodes as its results.

Both the Infoset implementation and its DOM2 Core view were designed with an eye to efficient exchange of document infosets. The next version will include PSVI and Infoset Exchange. We are actively working on the binary exchange now, and we are currently researching compression.

2. What goals do you believe are most important in this area?
Reducing bandwidth usage is the primary goal. Eliminating redundant parsing is a secondary goal. Doing this without creating another logical model is our design goal.

3. What sort of documents have you studied the most?
We test documents of varied sizes, with and without namespaces, DTDs, and XML Schemas.

4. What sorts of applications did you have in mind?
Processing by intermediaries, SOAP intermediaries in particular, is especially important. XML database replication and the exchange of XQuery results are also important uses.

5. If you implemented something, how did you ensure that internationalization and accessibility were not compromised?
The SAX parsing is augmented with pre-processing to obtain the prolog from the source. Success at this is limited; some input streams do not allow it. Our implementation platform is Java. To the extent that Java succeeds with international character sets, so do we.
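
As an illustration only, the following Java sketch shows one way such pre-processing could read the XML declaration before handing the stream to a SAX parser. It is not Ontonet's code: the class name PrologSniffer, the method sniffDeclaration, and the 512-byte limit are hypothetical, and the sketch assumes an ASCII-compatible encoding with no byte order mark. It also shows why some input streams defeat the approach: a stream without mark/reset support cannot be rewound after it has been inspected.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class PrologSniffer {
    // Hypothetical cap on how many bytes are inspected for the declaration.
    private static final int LIMIT = 512;

    /**
     * Returns the XML declaration if the stream starts with one, else null.
     * The stream is reset afterwards so a parser can read it from the start.
     */
    static String sniffDeclaration(InputStream in) throws IOException {
        if (!in.markSupported()) {
            // The limitation noted above: this stream cannot be rewound.
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(LIMIT);
        byte[] buf = new byte[LIMIT];
        int total = 0;
        int n;
        while (total < LIMIT && (n = in.read(buf, total, LIMIT - total)) > 0) {
            total += n;
        }
        in.reset();
        // ISO-8859-1 maps each byte to one char, which is enough to find
        // the ASCII-only declaration in an ASCII-compatible encoding.
        String head = new String(buf, 0, total, StandardCharsets.ISO_8859_1);
        if (!head.startsWith("<?xml ")) {
            return null;
        }
        int end = head.indexOf("?>");
        return end < 0 ? null : head.substring(0, end + 2);
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(doc));
        System.out.println(sniffDeclaration(in));
    }
}
```

Wrapping an arbitrary stream in a BufferedInputStream, as main does, is the usual way to obtain mark/reset support when the caller controls how the stream is constructed.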

6. How does your proposal differ from using gzip on raw XML?
Ontonet wants to exchange the XML Infoset itself rather than compressed document text. We intend to optimize our implementation to support separately compressible structure, namespace, and content sections, applying the lessons of XMill.
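
To make the contrast with gzip on raw XML concrete, the sketch below separates a document into a structure stream and per-element content containers, each of which can then be compressed on its own. It is a simplified illustration in the spirit of XMill, not Ontonet's format: the class ContainerSplitter and its token encoding are hypothetical, attributes and namespaces are ignored, and the output is not reversible as written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ContainerSplitter extends DefaultHandler {
    // Structure stream: "<name" per start tag and ">" per end tag.
    final StringBuilder structure = new StringBuilder();
    // Content containers: character data grouped by enclosing element name.
    final Map<String, StringBuilder> containers = new LinkedHashMap<>();
    private final Deque<String> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        structure.append('<').append(qName).append('\n');
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        structure.append(">\n");
        open.pop();
        // Terminate this element's value in its container, if it had text.
        StringBuilder c = containers.get(qName);
        if (c != null) {
            c.append('\n');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            containers.computeIfAbsent(open.peek(), k -> new StringBuilder())
                      .append(ch, start, length);
        }
    }

    // Compresses one stream or container independently of the others.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    static ContainerSplitter split(String xml) throws Exception {
        ContainerSplitter h = new ContainerSplitter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        return h;
    }

    public static void main(String[] args) throws Exception {
        ContainerSplitter h = split("<r><a>x</a><b>1</b><a>y</a></r>");
        System.out.println("structure: " + gzip(h.structure.toString()).length + " bytes");
        for (Map.Entry<String, StringBuilder> e : h.containers.entrySet()) {
            System.out.println(e.getKey() + ": " + gzip(e.getValue().toString()).length + " bytes");
        }
    }
}
```

Grouping similar values together is what gives a general-purpose compressor more redundancy to exploit than it finds in interleaved markup and text. On a document as small as the one in main, the per-stream gzip overhead will dominate, so any benefit should be expected only on larger inputs.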

7. Does your solution work with any XML? How is it affected by choice of Schema language?
Our next version includes Post-Schema-Validation Infoset (PSVI) information. Currently we augment SAX parsing with pre-processing to preserve in-line DTD declarations. If RELAX NG can be exposed through SAX2 parsing, we could incorporate it as well.
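
For readers unfamiliar with how DTD declarations surface in SAX2, the sketch below uses the standard SAX2 extension handlers (LexicalHandler and DeclHandler, through DefaultHandler2) to record declarations from an internal subset. It illustrates the general mechanism and is not a description of Ontonet's pre-processing: the class DtdCapture and the string format of the recorded entries are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DtdCapture extends DefaultHandler2 {
    // Recorded declarations, in the order the parser reports them.
    final List<String> declarations = new ArrayList<>();

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        declarations.add("DOCTYPE " + name);
    }

    @Override
    public void elementDecl(String name, String model) {
        declarations.add("ELEMENT " + name + " " + model);
    }

    @Override
    public void attributeDecl(String eName, String aName, String type,
                              String mode, String value) {
        declarations.add("ATTLIST " + eName + " " + aName + " " + type);
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        declarations.add("ENTITY " + name + " " + value);
    }

    static List<String> capture(String xml) throws Exception {
        DtdCapture h = new DtdCapture();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 property names for the two extension handlers.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", h);
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", h);
        reader.setContentHandler(h);
        reader.parse(new InputSource(new StringReader(xml)));
        return h.declarations;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE r [<!ELEMENT r EMPTY><!ENTITY e \"v\">]><r/>";
        for (String d : capture(xml)) {
            System.out.println(d);
        }
    }
}
```

These callbacks report declarations in normalized form rather than as the original source text, which is one reason additional pre-processing may be needed when the declarations must be preserved exactly.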

8. How important to you are random access within a document, dynamic update and streaming, and how do you see a binary format as impacting these issues?
Our database supports random access to information items, transactions, and update. The binary format is for exchange. The format should enable stream processing by intermediaries.