2. Document Object Model Load and Save

Editors: Andy Heninger, IBM

2.1. Load and Save Requirements

2.1. Load and Save Requirements

DOM Level 3 will provide an API for loading XML source documents into a DOM representation and for saving a DOM representation as a XML document.

Some environments, such as the Java platform or COM, have their own ways to persist objects to streams and to restore them. There is no direct relationship between these mechanisms and the DOM load/save mechanism. This specification defines how to serialize documents only to and from XML format.

2.1.1. General Requirements

Requirements that apply to both loading and saving documents.

2.1.1.1. Document Sources

Documents must be able to be parsed from and saved to the following sources:

Input and Output Streams
URIs
Files

Note that Input and Output streams take care of the in memory case. One point of caution is that a stream doesn't allow a base URI to be defined against which all relative URIs in the document are resolved.

2.1.1.2. Content Model Loading

While creating a new document using the DOM API, a mechanism must be provided to specify that the new document uses a pre-existing Content Model and to cause that Content Model to be loaded.

Note that while DOM Level 2 creation can specify a Content Model when creating a document (public and system IDs for the external subset, and a string for the internal subset), DOM Level 2 implementations do not process the Content Model's content. For DOM Level 3, the Content Model's content must be be read.

2.1.1.3. Content Model Reuse

When processing a series of documents, all of which use the same Content Model, implementations should be able to reuse the already parsed and load ed Content Model rather than reparsing it again for each new document.

This feature may not have an explicit DOM API associated with it, but it does require that nothing in this section, or the Content Model section, of this specification block it or make it difficult to implement.

2.1.1.4. Entity Resolution

Some means is required to allow applications to map public and system IDs to the correct document. This facility should provide sufficient capability to allow the implementation of catalogs, but providing catalogs themselves is not a requirement. In addition XML Base needs to be addressed.

2.1.1.5. Error Reporting

Loading a document can cause the generation of errors including:

I/O Errors, such as the inability to find or open the specified document.
XML well formedness errors.
Validity errors

Saving a document can cause the generation of errors including:

I/O Errors, such as the inability to write to a specified stream, url, or file.
Improper constructs, such as '--' in comments, in the DOM that cannot be represented as well formed XML.

This section, as well as the DOM Level 3 Content Model section should use a common error reporting mechanism. Well-formedness and validity checking are in the domain of the Content Model section, even though they may be commonly generated in response to an application asking that a document be loaded.

2.1.2. Load Requirements

The following requirements apply to loading documents.

2.1.2.1. Parser Properties and Options

Parsers may have properties or options that can be set by applications. Examples include:

Expansion of entity references.
Creation of entity ref nodes.
Handling of white space in element content.
Enabling of namespace handling.
Enabling of content model validation.

A mechanism to set properties, query the state of properties, and to query the set of properties supported by a particular DOM implementation is required.

2.1.3. XML Writer Requirements

The fundamental requirement is to write a DOM document as XML source. All information to be serialized should be available via the normal DOM API.

2.1.3.1. XML Writer Properties and Options

There are several options that can be defined when saving an XML document. Some of these are:

Saving to Canonical XML format.
Pretty Printing.
Specify the encoding in which a document is written.
How and when to use character entities.
Namespace prefix handling.
Saving of Content Models.
Handling of external entities.

2.1.3.2. Content Model Saving

Requirement from the Content Model group.

2.1.4. Other Items Under Consideration

The following items are not committed to, but are under consideration. Public feedback on these items is especially requested.

2.1.4.1. Incremental and/or Concurrent Parsing

Provide the ability for a thread that requested the loading of a document to continue execution without blocking while the document is being loaded. This would require some sort of notification or completion event when the loading process was done.

Provide the ability to examine the partial DOM representation before it has been fully loaded.

In one form, a document may be loaded asynchronously while a DOM based application is accessing the document. In another form, the application may explicitly ask for the next incremental portion of a document to be loaded.

2.1.4.2. Filtered Save

Provide the capability to write out only a part of a document. May be able to leverage TreeWalkers, or the Filters associated with TreeWalkers, or Ranges as a means of specifying the portion of the document to be written.

2.1.4.3. Document Fragments

Document fragments, as specified by the XML Fragment specification, should be able to be loaded. This is useful to applications that only need to process some part of a large document. Because the DOM is typically implemented as an in-memory representation of a document, fully loading large documents can require large amounts of memory.

XPath should also be considered as a way to identify XML Document fragments to load.

2.1.4.4. Document Fragments in Context of Existing DOM

Document fragments, as specified by the XML Fragment specification, should be able to be loaded into the context of an existing document at a point specified by a node position, or perhaps a range. This is a separate feature than simply loading document fragments as a new Node.