Position Paper for the W3C Workshop on Binary Interchange of XML Information
Submitted by Margaret Green, Ontonet.
1. What work has your organization done in this area?
Ontonet has implemented the XML Infoset, the DOM2 Core as a view onto that
Infoset implementation, and a partial implementation of XQuery that queries
the Infoset and returns sequences of DOM nodes as its results.
Efficient exchange of document infosets is a design criterion for both.
The next version will include PSVI and Infoset Exchange.
We are actively working on the binary exchange now. We are researching compression
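The idea of a query that returns DOM nodes from a document view, as described in this answer, can be illustrated with a minimal sketch. It does not use Ontonet's API, which is not shown in this paper. It uses only the JDK's standard DOM parser and its XPath 1.0 engine as a stand-in for XQuery; the class name DomQuerySketch and the method select are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only: JDK DOM + XPath 1.0 standing in for
// an Infoset-backed DOM view queried with XQuery.
public class DomQuerySketch {

    /** Parses the XML text and returns the DOM nodes selected by the XPath expression. */
    public static List<Node> select(String xml, String xpath) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Evaluate the path against the document and ask for a node set.
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);

        // Copy the node set into an ordered sequence of DOM nodes.
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            result.add(hits.item(i));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item>pen</item><item>ink</item></order>";
        for (Node n : select(xml, "/order/item")) {
            System.out.println(n.getNodeName() + " = " + n.getTextContent());
        }
    }
}
```

The returned nodes remain attached to the parsed document, so a caller can navigate from each result to its parent or siblings through the usual DOM2 Core interfaces.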
2. What goals do you believe are most important in this area?
Reducing bandwidth usage is the primary goal. Eliminating redundant parsing
is a secondary goal. Doing this without creating another logical model is
our design goal.
3. What sort of documents have you studied the most?
We test documents of varied size, with and without namespaces, DTD and XML
4. What sorts of applications did you have in mind?
Processing by intermediaries, SOAP intermediaries in particular, is especially
important. XML database replication and XQuery results are also important uses.
5. If you implemented something, how did you ensure that internationalization
and accessibility were not compromised?
SAX parsing is augmented with a pre-processing step that obtains the prolog
from the source. Success is limited, because some input streams do not allow this.
Our implementation platform is Java. To the extent that Java handles international
character sets, so do we.
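One way to pre-process a stream for its prolog, and to see why some input streams do not allow it, is sketched below. This is not Ontonet's implementation. It assumes an ASCII-compatible encoding with no byte order mark, looks only at the XML declaration rather than the whole prolog, and relies on the standard mark/reset support of java.io.InputStream. The class name PrologPeekSketch and the 256-byte limit are hypothetical choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: peek at the XML declaration, then rewind
// the stream so a SAX parser can read it from the beginning.
public class PrologPeekSketch {

    // Assumed upper bound on the length of an XML declaration.
    private static final int PEEK_LIMIT = 256;

    /**
     * Returns the XML declaration at the start of the stream, or null if
     * none is found within PEEK_LIMIT bytes. The stream is rewound afterwards.
     */
    public static String peekXmlDeclaration(InputStream in) throws IOException {
        if (!in.markSupported()) {
            // Streams without mark/reset cannot be rewound after peeking.
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK_LIMIT);
        byte[] head = new byte[PEEK_LIMIT];
        int total = 0;
        int n;
        while (total < PEEK_LIMIT
                && (n = in.read(head, total, PEEK_LIMIT - total)) > 0) {
            total += n;
        }
        in.reset();

        // ISO-8859-1 maps every byte to one char, which is enough to scan
        // the ASCII-only declaration under the stated assumption.
        String text = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        boolean isDeclaration = text.startsWith("<?xml")
                && text.length() > 5
                && Character.isWhitespace(text.charAt(5));
        if (!isDeclaration) {
            return null;
        }
        int end = text.indexOf("?>");
        return end < 0 ? null : text.substring(0, end + 2);
    }
}
```

A raw stream can usually be wrapped in a java.io.BufferedInputStream to gain mark/reset support. A stream that an earlier layer has already partly consumed cannot be rewound to its start, which is one plausible source of the limitation described above.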
6. How does your proposal differ from using gzip on raw XML?
Ontonet wants to exchange the XML Infoset. Drawing on lessons from XMILL, we
intend to optimize our implementation to support compressible structure,
namespace, and content sections.
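The lesson usually taken from XMILL is to separate document structure from character content so that each part can be compressed on its own. The sketch below illustrates that separation under simplifying assumptions: it ignores attributes and namespaces, uses an ad hoc text encoding for the structure, and compresses each container with the JDK's GZIPOutputStream. It is not XMILL's format and not Ontonet's. The names ContainerSplitSketch, Splitter, split and gzip are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: route structure and content into
// separate containers, then compress each container independently.
public class ContainerSplitSketch {

    /** SAX handler that collects element structure and text in separate buffers. */
    public static final class Splitter extends DefaultHandler {
        public final StringBuilder structure = new StringBuilder();
        public final StringBuilder content = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            structure.append('<').append(qName); // attributes are ignored in this sketch
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            structure.append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            structure.append('#');                          // placeholder for a text chunk
            content.append(ch, start, length).append('\0'); // NUL-terminated chunk
        }
    }

    /** Parses the XML text and returns the filled structure and content containers. */
    public static Splitter split(String xml) throws Exception {
        Splitter splitter = new Splitter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), splitter);
        return splitter;
    }

    /** Compresses one container with gzip. */
    public static byte[] gzip(String container) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(container.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }
}
```

A caller would run split on a document and pass the structure and content buffers to gzip separately. Whether this beats gzip applied to the raw XML depends on the document and is not claimed here.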
7. Does your solution work with any XML? How is it affected by choice
of Schema language?
Our next version includes Post Schema Validation Infoset (PSVI) information.
Currently we augment SAX parsing with pre-processing to preserve in-line
DTD declarations. If RELAX NG can be exposed with SAX2 parsing we could incorporate
8. How important to you are random access within a document, dynamic
update and streaming, and how do you see a binary format as impacting these
Our database supports random access to information items, transactions, and
update. The binary format is for exchange. The format should enable stream
processing by intermediaries.