01 November, 2000

2. Document Object Model Load and Save

Editors
Andy Heninger, IBM

2.1. Load and Save Requirements

DOM Level 3 will provide an API for loading XML source documents into a DOM representation and for saving a DOM representation as an XML document.

Some environments, such as the Java platform or COM, have their own ways to persist objects to streams and to restore them. There is no direct relationship between these mechanisms and the DOM load/save mechanism. This specification defines how to serialize documents only to and from XML format.

2.1.1. General Requirements

Requirements that apply to both loading and saving documents.

2.1.1.1. Document Sources

Documents must be able to be parsed from and saved to the following sources:

  • Input and Output Streams
  • URIs
  • Files

Note that Input and Output streams take care of the in-memory case. One point of caution is that a stream by itself carries no base URI against which the relative URIs in the document can be resolved.
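
As a minimal sketch of the in-memory case, the following Python fragment parses a document from a byte stream. The standard library's xml.dom.minidom stands in here for a DOM Level 3 parser, which is an assumption of the example and not part of this specification. Because the stream carries no base URI, the application supplies one itself; the base URI and the href attribute are hypothetical.

```python
import io
from urllib.parse import urljoin
from xml.dom import minidom

# An in-memory document; the stream itself knows nothing about its origin.
xml_bytes = b'<doc><link href="chapter2.xml"/></doc>'
document = minidom.parse(io.BytesIO(xml_bytes))

# Hypothetical base URI that the application has to track on its own.
base_uri = "http://example.org/books/index.xml"

# Resolve the relative reference against the externally supplied base.
link = document.getElementsByTagName("link")[0]
resolved = urljoin(base_uri, link.getAttribute("href"))
print(resolved)
```

Without the separately supplied base URI, the relative reference in the document could not be turned into an absolute one.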

2.1.1.2. Content Model Loading

While creating a new document using the DOM API, a mechanism must be provided to specify that the new document uses a pre-existing Content Model and to cause that Content Model to be loaded.

Note that while DOM Level 2 creation can specify a Content Model when creating a document (public and system IDs for the external subset, and a string for the subset), DOM Level 2 implementations do not process the Content Model's content. For DOM Level 3, the Content Model's content must be read.
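
The DOM Level 2 behaviour described above can be illustrated with a short Python sketch. It uses xml.dom.minidom as a stand-in DOM implementation; the document type name and both identifiers are hypothetical.

```python
from xml.dom import minidom

implementation = minidom.getDOMImplementation()

# Hypothetical public and system identifiers for the external subset.
doctype = implementation.createDocumentType(
    "book", "-//EXAMPLE//DTD Book 1.0//EN", "book.dtd")
document = implementation.createDocument(None, "book", doctype)

# The identifiers are recorded on the doctype node, but book.dtd is never read.
print(document.doctype.publicId)
print(document.doctype.systemId)
```

The file book.dtd does not need to exist for this to run, which is the gap the DOM Level 3 requirement is meant to close.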

2.1.1.3. Content Model Reuse

When processing a series of documents, all of which use the same Content Model, implementations should be able to reuse the already parsed and loaded Content Model rather than reparsing it for each new document.

This feature may not have an explicit DOM API associated with it, but it does require that nothing in this section, or the Content Model section, of this specification block it or make it difficult to implement.

2.1.1.4. Entity Resolution

Some means is required to allow applications to map public and system IDs to the correct document. This facility should provide sufficient capability to allow the implementation of catalogs, but providing catalogs themselves is not a requirement. In addition, XML Base needs to be addressed.
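
To make the catalog idea concrete, here is a small Python sketch of the kind of mapping an application might build on top of such a facility. The catalog contents and the function name resolve_system_id are hypothetical and are not defined by this specification.

```python
# Hypothetical catalog: public identifiers mapped to local copies.
CATALOG = {
    "-//EXAMPLE//DTD Book 1.0//EN": "file:///usr/share/xml/example/book.dtd",
}


def resolve_system_id(public_id, system_id, catalog=CATALOG):
    """Return the URI to load for an external entity.

    A catalog hit on the public ID wins; otherwise the system ID
    found in the document is used unchanged.
    """
    if public_id is not None and public_id in catalog:
        return catalog[public_id]
    return system_id


mapped = resolve_system_id(
    "-//EXAMPLE//DTD Book 1.0//EN", "http://example.org/book.dtd")
unmapped = resolve_system_id(None, "http://example.org/other.dtd")
print(mapped)
print(unmapped)
```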

2.1.1.5. Error Reporting

Loading a document can cause the generation of errors including:

  • I/O Errors, such as the inability to find or open the specified document.
  • XML well formedness errors.
  • Validity errors.

Saving a document can cause the generation of errors including:

  • I/O Errors, such as the inability to write to a specified stream, URL, or file.
  • Improper constructs, such as '--' in comments, in the DOM that cannot be represented as well formed XML.

This section, as well as the DOM Level 3 Content Model section, should use a common error reporting mechanism. Well-formedness and validity checking are in the domain of the Content Model section, even though such errors are commonly generated in response to an application asking that a document be loaded.
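
The '--' in comments case mentioned among the save errors can be checked before writing. The following Python sketch walks a tree built with xml.dom.minidom (a stand-in DOM implementation) and collects the offending comment nodes; the helper name find_unserializable_comments is hypothetical.

```python
from xml.dom import Node, minidom


def find_unserializable_comments(node):
    """Collect comment nodes whose text cannot appear in well-formed XML."""
    problems = []
    if node.nodeType == Node.COMMENT_NODE:
        # XML 1.0 forbids "--" inside a comment and a trailing "-".
        if "--" in node.data or node.data.endswith("-"):
            problems.append(node)
    for child in node.childNodes:
        problems.extend(find_unserializable_comments(child))
    return problems


document = minidom.parseString("<doc><!-- fine --></doc>")
# The DOM API accepts this text even though it cannot be written out as XML.
document.documentElement.appendChild(document.createComment("bad -- comment"))

bad = find_unserializable_comments(document)
print([comment.data for comment in bad])
```

How such problems are reported to the application is left to the common error reporting mechanism discussed above.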

2.1.2. Load Requirements

The following requirements apply to loading documents.

2.1.2.1. Parser Properties and Options

Parsers may have properties or options that can be set by applications. Examples include:

  • Expansion of entity references.
  • Creation of entity reference nodes.
  • Handling of white space in element content.
  • Enabling of namespace handling.
  • Enabling of content model validation.

A mechanism to set properties, query the state of properties, and to query the set of properties supported by a particular DOM implementation is required.
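
For comparison, SAX2 already offers a set/query mechanism of this kind. The Python sketch below uses the standard xml.sax module to set a feature, read it back, and probe whether another feature can be enabled. It shows the SAX2 approach only and is not the DOM API under design; the helper can_enable is hypothetical.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_namespaces, feature_validation

parser = xml.sax.make_parser()

# Set an option, then query its state.
parser.setFeature(feature_namespaces, True)
print(parser.getFeature(feature_namespaces))


def can_enable(reader, feature):
    """Probe whether a feature can be switched on by this parser."""
    try:
        reader.setFeature(feature, True)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        return False
    return True


validation_available = can_enable(parser, feature_validation)
print(validation_available)
```

Note that probing by setting changes the parser state when it succeeds and relies on exceptions when it fails, which is one reason a side-effect-free query is desirable.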

2.1.3. XML Writer Requirements

The fundamental requirement is to write a DOM document as XML source. All information to be serialized should be available via the normal DOM API.

2.1.3.1. XML Writer Properties and Options

There are several options that can be defined when saving an XML document. Some of these are:

  • Saving to Canonical XML format.
  • Pretty Printing.
  • Specifying the encoding in which a document is written.
  • How and when to use character entities.
  • Namespace prefix handling.
  • Saving of Content Models.
  • Handling of external entities.
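
Two of these options, encoding selection and pretty printing, can be sketched with an existing serializer. The Python fragment below uses xml.dom.minidom purely as an illustration of the options; it is not the XML Writer interface this section is specifying.

```python
from xml.dom import minidom

document = minidom.parseString("<doc><title>Load and Save</title></doc>")

# Compact form, written as bytes in an explicitly chosen encoding.
compact = document.toxml(encoding="utf-8")

# Pretty-printed form, returned as text with two-space indentation.
pretty = document.toprettyxml(indent="  ")

print(compact.decode("utf-8"))
print(pretty)
```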

2.1.3.2. Content Model Saving

Requirement from the Content Model group.

2.1.4. Other Items Under Consideration

The following items are not committed to, but are under consideration. Public feedback on these items is especially requested.

2.1.4.1. Incremental and/or Concurrent Parsing

Provide the ability for a thread that requested the loading of a document to continue execution without blocking while the document is being loaded. This would require some sort of notification or completion event when the loading process was done.

Provide the ability to examine the partial DOM representation before it has been fully loaded.

In one form, a document may be loaded asynchronously while a DOM based application is accessing the document. In another form, the application may explicitly ask for the next incremental portion of a document to be loaded.
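
The second form, in which the application explicitly asks for the next portion, resembles what Python's xml.dom.pulldom module offers. The sketch below is offered only as an analogy under that assumption; the sample document is hypothetical.

```python
from xml.dom import pulldom

source = (
    '<orders>'
    '<item id="1"><name>pen</name></item>'
    '<item id="2"><name>ink</name></item>'
    '</orders>'
)

events = pulldom.parseString(source)
names = []
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == "item":
        # Ask the parser to build the DOM subtree for this element only.
        events.expandNode(node)
        name = node.getElementsByTagName("name")[0]
        names.append((node.getAttribute("id"), name.firstChild.data))

print(names)
```

Each item subtree is available to the application as soon as it has been expanded, before the rest of the document has been read.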

2.1.4.2. Filtered Save

Provide the capability to write out only a part of a document. It may be possible to leverage TreeWalkers, or the Filters associated with TreeWalkers, or Ranges as a means of specifying the portion of the document to be written.
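
The simplest case of a filtered save is writing a single subtree. A minimal Python sketch, again using xml.dom.minidom as a stand-in and a hypothetical sample document:

```python
from xml.dom import minidom

document = minidom.parseString(
    "<book><chapter n='1'>Load</chapter><chapter n='2'>Save</chapter></book>")

# Serialize only the second chapter rather than the whole document.
second = document.getElementsByTagName("chapter")[1]
fragment = second.toxml()
print(fragment)
```

Selecting the portion through a TreeWalker filter or a Range, as suggested above, would generalize this beyond whole subtrees.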

2.1.4.3. Document Fragments

Document fragments, as specified by the XML Fragment specification, should be able to be loaded. This is useful to applications that only need to process some part of a large document. Because the DOM is typically implemented as an in-memory representation of a document, fully loading large documents can require large amounts of memory.

XPath should also be considered as a way to identify XML Document fragments to load.

2.1.4.4. Document Fragments in Context of Existing DOM

Document fragments, as specified by the XML Fragment specification, should be able to be loaded into the context of an existing document at a point specified by a node position, or perhaps a range. This is a separate feature from simply loading document fragments as a new Node.

2.2. Issue List

2.2.1. Open Issues

Issue LS-Issue-10:
Error Reporting. Loading will be reporting well-formedness and validation errors, just like CM. A common error reporting mechanism needs to be developed.
Issue LS-Issue-12:
Definition of "Non-validating". Exactly how much processing is done by "non-validating" parsers is not fully defined by the XML specification. In particular, they are not required to read any external entities, but are not prohibited from doing so.
Another common user request: a mode that completely ignores DTDs, both internal and external. Such a parser would not conform to XML 1.0, however.
For the documents produced by a non-validating load to be the same, we need to tie down exactly what processing must be done. The XML Core WG also has this question as an open issue.
Some discussion is at http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JanMar/0192.html
Here is a proposal: have three classes of parsers:
  • Minimal. No external entities of any type are accessed. The DTD subset is processed normally, as required by XML 1.0, including all entity definitions it contains.
  • Non-Validating. All external entities are read. Does everything except validation.
  • Validating. As defined by XML 1.0 rec.

Tentative resolution: use the options from SAX2. These provide separate flags for validation, reading of external general entities and reading of external parameter entities.
Issue LS-Issue-14:
Should there be separate DOM modules for browser or scripting style loading (document.load("whatever")) and server style parsers? It's probably easy for the server style parsers to implement the browser style interface, but the reverse may not be true.
Issue LS-Issue-16:
Loading and saving of content models - DTDs or Schemas - outside of the context of a document is not addressed.
Issue LS-Issue-17:
Loading while validating using an already loaded content model is not addressed. Applications should be able to load a content model (issue 16), and then repeatedly reuse it during the loading of additional documents.
Issue LS-Issue-20:
Action from September f2f to "add issues raised by schema discussion". What were these?
Issue LS-Issue-22:
What do the bindings for things like InputStream look like in ECMA Script?
Issue LS-Issue-27:
How is validation handled when there are multiple possible content models associated with the document? How is one selected?
Issue LS-Issue-30:
Possible additional parser features - option to not create CDATA nodes, and to merge CDATA contents with adjacent TEXT nodes if they exist. Otherwise just create a TEXT node.
Option to omit Comments.
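
As an illustration of the tentative resolution of LS-Issue-12, the three proposed parser classes can be read as fixed combinations of the three SAX2 flags. The Python sketch below encodes one such reading using the feature names exported by xml.sax.handler; the mapping itself is an interpretation of the proposal, not a decision recorded in this document.

```python
from xml.sax.handler import (
    feature_external_ges,
    feature_external_pes,
    feature_validation,
)

# One reading of the proposal: each class is a combination of three flags.
PARSER_CLASSES = {
    "minimal": {
        feature_validation: False,
        feature_external_ges: False,
        feature_external_pes: False,
    },
    "non-validating": {
        feature_validation: False,
        feature_external_ges: True,
        feature_external_pes: True,
    },
    "validating": {
        feature_validation: True,
        feature_external_ges: True,
        feature_external_pes: True,
    },
}


def flags_for(parser_class):
    """Return a copy of the flag settings for the named parser class."""
    return dict(PARSER_CLASSES[parser_class])


print(flags_for("non-validating")[feature_validation])
```

Separate flags also permit combinations outside the three classes, such as reading external general entities but not external parameter entities.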

2.2.2. Resolved Issues

Issue LS-Issue-1:
Should these methods be in a new interface, or should they be added to the existing DOMImplementation Interface? I think that adding them to the existing interface is cleaner, because it helps avoid an explosion of new interfaces.
The methods are in a separate interface in this description for convenience in preparing the doc, so that I don't need to edit Core to add the methods. (The same argument could perhaps be made for implementations.)
Resolution: The methods are in a separate DOMImplementationLS interface. Because Load/Save is an optional module, we don't want to add its methods to the core DOMImplementation interface.
Issue LS-Issue-2:
SAX handles the setting of parser attributes differently. Rather than having distinct getters and setters for each attribute, it has a generic setter and getter of named properties, where properties are specified by a URL. This has an advantage in that implementations do not need to extend the interface when providing additional attributes.
If we choose to use strings, their syntax needs to be chosen. URIs would make sense, except for the fact that these are just names that do not refer to any resources. Dereferencing them would be meaningless. Yet the direction of the W3C is that all URIs must be dereferencable, and refer to something on the web.
Resolution: Use strings for properties. Use Java package name syntax for the identifying names. The question was revisited at the July f2f, with the same conclusion. But some discussion of using URLs continues.
This issue was revisited once again at the 9/2000 meeting. Now all DOM properties or features will be short, descriptive names, and we will recommend that all vendor-specific extensions be prefixed to avoid collisions, but will not make specific recommendations for the syntax of the prefix.
Issue LS-Issue-3:
It's not obvious what name to choose for the parser interface. Taking any of the names already in use by parser implementations would create problems when trying to support both the new API and the existing old API. That leaves out DocumentBuilder (Sun) and DOMParser (Xerces).
Resolution: This issue is really just a comment. The "resolution" is in the names appearing in the API.
Issue LS-Issue-4:
Question: should ResolveEntity pass a baseURI string back to the application, in addition to the publicId, systemId, and/or stream? Particularly in the case of an input stream.
Resolution: No. SAX2 explicitly says that the system ID URI must be fully resolved before passing it out to the entity resolver. We will follow SAX's lead on this unless some additional use case surfaces. This is from the 9/2000 f2f, and reverses an earlier decision.
Issue LS-Issue-5:
When parsing a document that contains errors, should the whole document be decreed unusable, or should we say that portions prior to the point where the error was detected are OK?
Resolution: In the case of errors in the XML source, what, if any, document is returned is implementation dependent.
Issue LS-Issue-6:
The relationship between SAXExceptions and DOM exceptions seems confusing.
Resolution: This issue goes away because we are no longer using SAX. Any exceptions will be DOM Exceptions.
Issue LS-Issue-7:
Question: In the original Java definition, are the strings returned from the methods SAXException.toString() and SAXException.getMessage() always the same? If not, we need to add another attribute.
Resolution: No longer an issue because we are no longer using SAX.
Issue LS-Issue-8:
JAXP defines a mechanism, based on Java system properties, by which the Document Builder Factory locates the specific parser implementation to be used. This ability to redirect to different parsers is a key feature of JAXP. How this redirection works in the context of this design may be something that needs to be defined separately for each language binding.
This question was discussed at the July f2f, without resolution. Agreed that the feature is not critical to the rest of the API, and can be postponed.
Resolution: The issue is moving to core, where it is part of the bigger question of where does the DOM implementation come from, and how do multiple implementations coexist. Allowing separate, or mix-and-match, specification of the parser and the rest of the DOM is not generally practical because parsers generally have some degree of private knowledge about their DOMs.
Issue LS-Issue-9:
The use of interfaces from SAX2 raises some questions. The Java bindings for these interfaces need to be exactly the SAX2 definitions, including the original org.xml.sax package name.
The IDL presented here for these interfaces is an attempt to map the Java into IDL, but it will certainly not round-trip accurately - Java bindings generated from the IDL will not match the original Java.
The reasons for using the SAX interfaces are that they are well designed, widely implemented and used, and provide what is needed. Designing something new would create confusion for application developers (which should be used?) and make extra work for implementers of the DOM, most of whom probably already provide SAX, all for no real gain.
Resolution: Problem is gone. We are not using SAX2. The design will borrow features and concepts from SAX2 when it makes sense to do so.
Issue LS-Issue-11:
Another Error Reporting Question. We decided at the June f2f that validity errors should not be exceptions. This means that a document load operation could encounter multiple errors. Should these be collected and delivered as some sort of collection at the (otherwise) successful completion of the load, or should there be some sort of callback? Callbacks are harder for applications to deal with.
Resolution: Provide a callback mechanism. Provide a default error handler that throws an exception and stops further processing. From July f2f.
Issue LS-Issue-13:
Use of System or Language specific types for Input and Output
Loading and Saving requires that one of the possible sources or destinations of the XML data be some sort of stream that can be used with io streams or memory buffers, or anything else that might take or supply data. The type will vary, depending on the language binding.
The question is, what should be put into the IDL interfaces for these? Should we define an XML stream to abstract out the dependency, or use system classes directly in the bindings?
Resolution: Define IDL types for use in the rest of the interface definitions. These types will be mapped directly to system types for each language binding.
Issue LS-Issue-15:
System Exceptions. Loading involves file opens and reads, and these can result in a variety of system errors that may already have associated system exceptions. Should these system exceptions pass through as is, or should they be somehow wrapped in DOMExceptions, or should there be a parallel set of DOM Exceptions, or what?
Resolution: Introduce a new DOMSystemException to standardize the reporting of common I/O errors across different DOM environments. Let it wrap an underlying system exception or error code when appropriate. To be defined in the common ErrorReporting module, to be shared with ContentModel.
Issue LS-Issue-18:
For the list of parser properties, which must all implementations recognize, which settings must all implementations support, and which are optional?
Resolution: Done
Issue LS-Issue-19:
DOMOutputStream: should this be an interface with methods, or just an opaque type that maps onto an appropriate binding-specific stream type?
If we specify an actual interface with methods, applications can implement it to wrap any arbitrary destination that they may have. If we go with the system type it's simpler to output to that type of stream, but harder otherwise.
Resolution: Opaque.
Issue LS-Issue-21:
Define exceptions. A DOMSystemException needs to be defined as part of the error handling module that is to be shared with CM. Common I/O type errors need to be defined for it, so that they can be reported in a uniform way. A way to embed errors or exceptions from the OS or language environment is needed, to provide full information to applications that want it.
Resolution: Duplicate of issue #15
Issue LS-Issue-23:
To Do: Add a method or methods to DOMBuilder that will provide information about a parser feature - is the name recognized, which (boolean) values are supported - without throwing exceptions.
Resolution: Done. Added canSetFeature.
Issue LS-Issue-24:
Clearly identify which of the parser properties must be recognized, and which of their settings must be supported by all conforming implementations.
Resolution: Done. All must be recognized.
Issue LS-Issue-25:
How does the validation property work in SAX, and how should it work for us? The default value in SAX2 is "true". Non-validating parsers only support a value of false. Does this mean that the default depends on the parser, or that some sort of an error happens if a parse is attempted before resetting the property, or what?
The same question applies to the External Entities properties too.
Resolution: Make the default value for the validation property be false.
Issue LS-Issue-26:
Do we want to rename the "auto-validation" property to "validate-if-cm"? Proposed at f2f. Resolution unclear.
Resolution: Changed the name to "validate-if-cm".
Issue LS-Issue-29:
Should all properties except namespaces default to false? Discussed at f2f. I'm not so sure now. Some of the properties have somewhat non-standard behavior when false - leaving out ER nodes or whitespace, for example - and support of false will probably not even be required.
Resolution: Not all properties should default to false. But validation should.
Issue LS-Issue-28:
To do: add new parser property "createEntityNodes". default is true. Illegal for it to be false and createEntityReferenceNodes to be true.
Is this really what we want?
Resolution: new feature added.

2.3. Interfaces

This section defines an API for loading (parsing) XML source documents into a DOM representation and for saving (serializing) a DOM representation as an XML document.

The proposal for loading is influenced by Sun's JAXP API for XML Parsing in Java, http://java.sun.com/xml/download.html, and by SAX2, available at http://www.megginson.com/SAX/index.html

2.3.1. Interface Summary

Here is a list of the interfaces involved in loading and saving XML documents.

  • DOMImplementationLS -- A new DOMImplementation interface that provides the factory methods for creating the objects required for loading and saving.
  • DOMBuilder -- A parser interface.
  • DOMInputSource -- Encapsulates information about the source of the XML to be loaded.
  • DOMEntityResolver -- During loading, provides a way for applications to redirect references to external entities.
  • DOMBuilderFilter -- Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document.
  • DOMFormatter -- Provides for the actual formatting of DOM data into the output format.
  • DOMWriter -- An interface for writing out DOM documents. The form in which the data from the DOM will be written is controlled by a DOMFormatter, and the destination for the data is a DOMOutputStream.

2.3.2. Interfaces

Interface DOMImplementationLS

DOMImplementationLS contains the factory methods for creating objects implementing the DOMBuilder (parser) and DOMWriter interfaces.


IDL Definition
interface DOMImplementationLS {
  DOMBuilder         createDOMBuilder();
  DOMWriter          createDOMWriter();
};

Methods
createDOMBuilder
Create a new DOMBuilder. The newly constructed parser may then be configured by means of its setFeature() method, and used to parse documents by means of its parseURI() and parseDOMInputSource() methods.
Return Value

DOMBuilder

The newly created parser object.

No Parameters
No Exceptions
createDOMWriter
Create a new DOMWriter object. DOMWriters are used to serialize a DOM tree back into source XML form.
Return Value

DOMWriter

The newly created DOMWriter object.

No Parameters
No Exceptions
Interface DOMBuilder

A parser interface.

DOMBuilder provides an API for parsing XML documents and building the corresponding DOM document tree. A DOMBuilder instance is obtained from the DOMImplementationLS interface by invoking its createDOMBuilder() method.

DOMBuilders have a number of named properties that can be queried or set. Here is a list of properties that must be recognized by all implementations.

  • namespaces
    true: perform Namespace processing.
    false: do not perform Namespace processing.
    default: true.
    supported values: true: required; false: optional
  • namespace-declarations
    true: include namespace declarations (xmlns attributes) in the DOM document.
    false: discard all namespace declarations. In either case, namespace prefixes will be retained.
    default: true.
    supported values: true: required; false: optional
  • validation
    true: report validation errors. Setting this feature to true also forces the external-general-entities and external-parameter-entities features to be set true. Note also that the validate-if-cm feature alters the validation behavior when this feature is set true.
    false: do not report validation errors.
    default: false.
    supported values: true: optional; false: required
  • external-general-entities
    true: include all external general (text) entities.
    false: do not include external general entities.
    default: true.
    supported values: true: required; false: optional
  • external-parameter-entities
    true: include all external parameter entities.
    false: do not include external parameter entities.
    default: true.
    supported values: true: required; false: optional
  • validate-if-cm
    true: when both this feature and validation are true, enable validation only when the document being processed has a content model. Documents without content models are parsed without validation.
    false: the validation feature alone controls whether the document is checked for validity. Documents without content models are not valid.
    default: false.
    supported values: true: optional; false: required
  • create-entity-ref-nodes
    true: create entity reference nodes in the DOM document. Setting this value true will also set create-entity-nodes to be true.
    false: omit all entity reference nodes from the DOM document, putting the entity expansions directly in their place.
    default: true.
    supported values: true: required; false: optional
  • create-entity-nodes
    true: create entity nodes in the DOM document.
    false: omit all entity nodes from the DOM document. Setting this value false will also set create-entity-ref-nodes false.
    default: true.
    supported values: true: required; false: optional
  • white-space-in-element-content
    true: include white space in element content in the DOM document. This is sometimes referred to as ignorable white space.
    false: omit said white space. Note that white space in element content will only be omitted if it can be identified as such, and not all parsers may be able to do so.
    default: true.
    supported values: true: required; false: optional

IDL Definition
interface DOMBuilder {
           attribute DOMEntityResolver  entityResolver;
           attribute DOMErrorHandler  errorHandler;
           attribute DOMBuilderFilter  filter;
  void               setFeature(in DOMString name, 
                                in boolean state)
                                        raises(DOMException);
  boolean            supportsFeature(in DOMString name);
  boolean            canSetFeature(in DOMString name, 
                                   in boolean state);
  boolean            getFeature(in DOMString name)
                                        raises(DOMException);
  Document           parseURI(in DOMString uri)
                                        raises(DOMException, 
                                               DOMSystemException);
  Document           parseDOMInputSource(in DOMInputSource is)
                                        raises(DOMException, 
                                               DOMSystemException);
};

Attributes
entityResolver of type DOMEntityResolver
If a DOMEntityResolver has been specified, each time a reference to an external entity is encountered the DOMBuilder will pass the public and system IDs to the entity resolver, which can then specify the actual source of the entity.
errorHandler of type DOMErrorHandler
In the event that an error is encountered in the XML document being parsed, the DOMBuilder will call back to the errorHandler with the error information.

Note: The DOMErrorHandler interface is being developed separately, in conjunction with the design of the content model and validation module.

filter of type DOMBuilderFilter
When the application provides a filter, the parser will call out to the filter at the completion of the construction of each element node. The filter implementation can choose to remove the element from the document being constructed or to terminate the parse early.
Methods
canSetFeature
Query whether setting a feature is supported.
The feature name has the same form as a DOM hasFeature string.
It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value.
Parameters
name of type DOMString
The feature name, which is a DOM has-feature style string.
state of type boolean
The requested state of the feature (true or false).
Return Value

boolean

true if the feature could be successfully set to the specified value, or false if the feature is not recognized or the requested value is not supported. The value of the feature itself is not changed.

No Exceptions
getFeature
Look up the value of a feature.
The feature name has the same form as a DOM hasFeature string.
Parameters
name of type DOMString
The feature name, which is a string with DOM has-feature syntax.
Return Value

boolean

The current state of the feature (true or false).

Exceptions

DOMException

Raise a NOT_FOUND_ERR when the DOMBuilder does not recognize the feature name.

parseDOMInputSource
Parse an XML document from a location identified by a DOMInputSource.
Parameters
is of type DOMInputSource
The DOMInputSource from which the source document is to be read.
Return Value

Document

The newly created and populated Document.

Exceptions

DOMException

Exceptions raised by parseDOMInputSource() originate with the installed ErrorHandler, and thus depend on the implementation of the DOMErrorHandler interfaces. The default ErrorHandlers will raise a DOMException if any form of XML validation or well-formedness error or warning occurs during the parse, but application-defined ErrorHandlers are not required to do so.

DOMSystemException

Exceptions raised by parseDOMInputSource() originate with the installed ErrorHandler, and thus depend on the implementation of the DOMErrorHandler interfaces. The default ErrorHandlers will raise a DOMSystemException if any form of I/O or other system error occurs during the parse, but application-defined ErrorHandlers are not required to do so.

parseURI
Parse an XML document from a location identified by a URI.
Parameters
uri of type DOMString
The location of the XML document to be read.
Return Value

Document

The newly created and populated Document.

Exceptions

DOMException

Exceptions raised by parseURI() originate with the installed ErrorHandler, and thus depend on the implementation of the DOMErrorHandler interfaces. The default error handlers will raise a DOMException if any form of XML validation or well-formedness error or warning occurs during the parse, but application-defined error handlers are not required to do so.

DOMSystemException

Exceptions raised by parseURI() originate with the installed ErrorHandler, and thus depend on the implementation of the DOMErrorHandler interfaces. The default error handlers will raise a DOMSystemException if any form of I/O or other system error occurs during the parse, but application-defined error handlers are not required to do so.

setFeature
Set the state of a feature.
The feature name has the same form as a DOM hasFeature string.
It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value.
Parameters
name of type DOMString
The feature name, which is a DOM has-feature style string.
state of type boolean
The requested state of the feature (true or false).
Exceptions

DOMException

Raise a NOT_SUPPORTED_ERR exception when the DOMBuilder recognizes the feature name but cannot set the requested value.

Raise a NOT_FOUND_ERR when the DOMBuilder does not recognize the feature name.

No Return Value
supportsFeature
Query whether the DOMBuilder recognizes a feature name.
The feature name has the same form as a DOM hasFeature string.
It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value. For example, a non-validating parser would recognize the feature "validation", would report that its value was false, and would raise an exception if an attempt was made to enable validation by setting the feature to true.
Parameters
name of type DOMString
The feature name, which has the same syntax as a DOM has-feature string.
Return Value

boolean

true if the feature name is recognized by the DOMBuilder. False if the feature name is not recognized.

No Exceptions
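
The distinction between recognizing a feature and being able to set it can be sketched in Python as follows. SketchBuilder is a hypothetical non-validating builder that models only the four feature methods of the IDL above; it is not a binding of DOMBuilder. The exception classes NotFoundErr and NotSupportedErr are the ones Python's xml.dom module provides for NOT_FOUND_ERR and NOT_SUPPORTED_ERR.

```python
from xml.dom import NotFoundErr, NotSupportedErr


class SketchBuilder:
    """Hypothetical non-validating builder; only feature handling is modelled."""

    def __init__(self):
        # Current state of each recognized feature (a subset, for brevity).
        self._state = {"namespaces": True, "validation": False}
        # Values this implementation is able to honour for each feature.
        self._allowed = {"namespaces": {True}, "validation": {False}}

    def supportsFeature(self, name):
        return name in self._state

    def canSetFeature(self, name, state):
        # Never raises and never changes the current value.
        return name in self._allowed and state in self._allowed[name]

    def getFeature(self, name):
        if not self.supportsFeature(name):
            raise NotFoundErr("unrecognized feature: " + name)
        return self._state[name]

    def setFeature(self, name, state):
        if not self.supportsFeature(name):
            raise NotFoundErr("unrecognized feature: " + name)
        if not self.canSetFeature(name, state):
            raise NotSupportedErr("unsupported value for feature: " + name)
        self._state[name] = state


builder = SketchBuilder()
print(builder.supportsFeature("validation"))
print(builder.canSetFeature("validation", True))
try:
    builder.setFeature("validation", True)
except NotSupportedErr:
    print("validation cannot be enabled")
```

An application that wants to avoid exceptions would call canSetFeature() first and only then call setFeature().
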
Interface DOMInputSource

This interface represents a single input source for an XML entity.

This interface allows an application to encapsulate information about an input source in a single object, which may include a public identifier, a system identifier, a byte stream (possibly with a specified encoding), and/or a character stream.

The exact definitions of a byte stream and a character stream are binding dependent.

There are two places that the application will deliver this input source to the parser: as the argument to the parseDOMInputSource method, or as the return value of the DOMEntityResolver.resolveEntity method.

The DOMBuilder will use the DOMInputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly; if not, the parser will use a byte stream, if available; if neither a character stream nor a byte stream is available, the parser will attempt to open a URI connection to the resource identified by the system identifier.

A DOMInputSource object belongs to the application: the parser shall never modify it in any way (it may modify a copy if necessary).
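
The order in which the parser consults the input source can be written down as a short Python sketch. InputSourceSketch is a hypothetical stand-in that mirrors the five attributes of the interface, and choose_input is a hypothetical helper; neither is a language binding.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional, TextIO


@dataclass
class InputSourceSketch:
    """Hypothetical stand-in mirroring the DOMInputSource attributes."""
    byteStream: Optional[BinaryIO] = None
    characterStream: Optional[TextIO] = None
    encoding: Optional[str] = None
    publicId: Optional[str] = None
    systemId: Optional[str] = None


def choose_input(source):
    """Return the name of the input the parser would read from."""
    if source.characterStream is not None:
        return "characterStream"
    if source.byteStream is not None:
        return "byteStream"
    if source.systemId is not None:
        return "systemId"
    raise ValueError("input source has no stream and no system identifier")


everything = InputSourceSketch(
    byteStream=io.BytesIO(b"<a/>"),
    characterStream=io.StringIO("<a/>"),
    systemId="http://example.org/a.xml",
)
print(choose_input(everything))
print(choose_input(InputSourceSketch(systemId="http://example.org/a.xml")))
```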


IDL Definition
interface DOMInputSource {
           attribute DOMInputStream   byteStream;
           attribute DOMReader        characterStream;
           attribute DOMString        encoding;
           attribute DOMString        publicId;
           attribute DOMString        systemId;
};

Attributes
byteStream of type DOMInputStream
An attribute of a language-binding dependent type that represents a stream of bytes.
The parser will ignore this if there is also a character stream specified, but it will use a byte stream in preference to opening a URI connection itself.
If the application knows the character encoding of the byte stream, it should set the encoding property. Setting the encoding in this way will override any encoding specified in the XML declaration itself.
characterStream of type DOMReader
An attribute of a language-binding dependent type that represents a stream of 16-bit values (UTF-16 encoded characters).
If a character stream is specified, the parser will ignore any byte stream and will not attempt to open a URI connection to the system identifier.
encoding of type DOMString
The character encoding, if known. The encoding must be a string acceptable for an XML encoding declaration (see section 4.3.3 of the XML 1.0 recommendation).
This attribute has no effect when the application provides a character stream. For other sources of input, any encoding specified by means of this attribute will override that from the XML encoding declaration itself.
publicId of type DOMString
The public identifier for this input source. The public identifier is always optional: if the application writer includes one, it will be provided as part of the location information.
systemId of type DOMString
The system identifier for this input source. The system identifier is optional if there is a byte stream or a character stream, but it is still useful to provide one, since the application can use it to resolve relative URIs and can include it in error messages and warnings (the parser will attempt to open a connection to the URI only if there is no byte stream or character stream specified).
If the application knows the character encoding of the object pointed to by the system identifier, it can register the encoding by setting the encoding attribute.
If the system ID is a URL, it must be fully resolved.
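The interaction between the encoding attribute and the XML encoding declaration, as described for the byteStream and encoding attributes above, might be sketched like this. The function names are hypothetical, and the regular expression is a simplification that only handles declarations written in an ASCII-compatible encoding. Returning None stands for "leave it to the parser's normal XML 1.0 encoding detection".

```python
import re
from typing import Optional


def declared_encoding(head: bytes) -> Optional[str]:
    """Read the encoding from an XML declaration at the start of the input, if any.

    Simplification: assumes the declaration is in an ASCII-compatible encoding.
    """
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', head
    )
    return match.group(1).decode("ascii") if match else None


def effective_encoding(
    attr_encoding: Optional[str], head: bytes, has_character_stream: bool
) -> Optional[str]:
    """Pick the encoding used to decode the input, following the rules in the text."""
    if has_character_stream:
        # Characters are already decoded; the attribute has no effect.
        return None
    if attr_encoding:
        # The attribute overrides the XML encoding declaration.
        return attr_encoding
    return declared_encoding(head)


head = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'
print(effective_encoding(None, head, False))
print(effective_encoding("UTF-8", head, False))
```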
Interface DOMEntityResolver

DOMEntityResolver provides a way for applications to redirect references to external entities.

Applications needing to implement customized handling for external entities must implement this interface and register their implementation by setting the entityResolver property of the DOMBuilder.

The DOMBuilder will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities) before including them.

Many DOM applications will not need to implement this interface, but it will be especially useful for applications that build XML documents from databases or other specialized input sources, or for applications that use URI types other than URLs.

DOMEntityResolver is based on the SAX2 EntityResolver interface, described at http://www.megginson.com/SAX/Java/javadoc/org/xml/sax/EntityResolver.html


IDL Definition
interface DOMEntityResolver {
  DOMInputSource     resolveEntity(in DOMString publicId, 
                                   in DOMString systemId )
                                        raises(DOMSystemException);
};

Methods
resolveEntity
Allow the application to resolve external entities.
The DOMBuilder will call this method before opening any external entity other than the top-level document entity. This includes the external DTD subset, external entities referenced within the DTD, and external entities referenced within the document element. The application may request that the DOMBuilder resolve the entity itself, that it use an alternative URI, or that it use an entirely different input source.
Application writers can use this method to redirect external system identifiers to secure and/or local URIs, to look up public identifiers in a catalogue, or to read an entity from a database or other input source (including, for example, a dialog box).
If the system identifier is a URL, the DOMBuilder must resolve it fully before reporting it to the application through this interface.

Note: See issue #4. An alternative would be to pass the URL out without resolving it, and to provide a base as an additional parameter. SAX resolves URLs first, and does not provide a base.

Parameters
publicId of type DOMString
The public identifier of the external entity being referenced, or null if none was supplied.
systemId of type DOMString
The system identifier of the external entity being referenced.
Return Value

DOMInputSource

A DOMInputSource object describing the new input source, or null to request that the parser open a regular URI connection to the system identifier.

Exceptions

DOMSystemException

Any DOMSystemException, possibly wrapping another exception.
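As an illustration of the catalogue use case mentioned above, a resolver might map public identifiers to local copies and defer to the parser for everything else. This is a sketch under assumptions: the InputSource and CatalogResolver classes are hypothetical Python stand-ins for the IDL interfaces, and the local file path is invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InputSource:
    """Hypothetical stand-in for DOMInputSource (only the identifiers are modeled)."""
    public_id: Optional[str] = None
    system_id: Optional[str] = None


class CatalogResolver:
    """Hypothetical DOMEntityResolver that redirects known public IDs to local URIs."""

    def __init__(self, catalog: Dict[str, str]) -> None:
        self.catalog = catalog

    def resolve_entity(
        self, public_id: Optional[str], system_id: str
    ) -> Optional[InputSource]:
        local = self.catalog.get(public_id) if public_id is not None else None
        if local is None:
            # None asks the parser to open a regular URI connection to system_id.
            return None
        return InputSource(public_id=public_id, system_id=local)


resolver = CatalogResolver(
    # The local path below is an invented example location.
    {"-//W3C//DTD XHTML 1.0 Strict//EN": "file:///usr/share/xml/xhtml1-strict.dtd"}
)
redirected = resolver.resolve_entity(
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
)
print(redirected.system_id)
```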

Interface DOMBuilderFilter

DOMBuilderFilters provide applications the ability to examine Element nodes as they are being constructed during a parse. As each element is examined, it may be modified or removed, or the entire parse may be terminated early.


IDL Definition
interface DOMBuilderFilter {
  boolean            endElement(in Element element);
};

Methods
endElement
This method will be called by the parser at the completion of the parse of each element. The element node will exist and be complete, as will all of its children, and their children, recursively. The element's parent node will also exist, although that node may be incomplete, as it may have additional children that have not yet been parsed.
From within this method, the new node may be freely modified - children may be added or removed, text nodes modified, etc. This node may also be removed from its parent node, which will prevent it from appearing in the final document at the completion of the parse. Aside from this one operation on the node's parent, the state of the rest of the document outside of this node is not defined, and the effect of any attempt to navigate to or modify any other part of the document is undefined.
For validating parsers, the checks are made on the original document, before any modification by the filter. No validity checks are made on any document modifications made by the filter.
Parameters
element of type Element
The newly constructed element. At the time this method is called, the element is complete - it has all of its children (and their children, recursively) and attributes, and is attached as a child to its parent.
Return Value

boolean

return true

No Exceptions
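A filter that removes unwanted elements could look roughly like the sketch below. Python's xml.dom.minidom has no filter hook, so the run_filter helper only simulates the parser's callbacks on an already built tree, visiting children before their parent; in a real parse the parent could still be incomplete when the callback runs. The DropDebugFilter class, the "debug" element name and the helper are hypothetical. The filter returns True as listed above; this draft does not yet define what the return value means.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


class DropDebugFilter:
    """Hypothetical DOMBuilderFilter that removes every <debug> element."""

    def end_element(self, element) -> bool:
        if element.tagName == "debug" and element.parentNode is not None:
            # Removing the element from its parent keeps it out of the final document.
            element.parentNode.removeChild(element)
        return True


def run_filter(node, flt) -> None:
    """Simulate the callbacks: each element is reported after all of its children."""
    for child in list(node.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            run_filter(child, flt)
    if node.nodeType == Node.ELEMENT_NODE:
        flt.end_element(node)


doc = minidom.parseString("<root><item>1</item><debug>x</debug><item>2</item></root>")
run_filter(doc.documentElement, DropDebugFilter())
print(doc.documentElement.toxml())  # <root><item>1</item><item>2</item></root>
```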
Interface DOMWriter

DOMWriter provides the API that an application will use when serializing (writing) a DOM document out in the form of a source document.

Use of a DOMWriter requires two other objects be supplied: a DOMFormatter, which defines the output format in which the document will be expressed, and a DOMOutputStream, which defines where the output will go.


IDL Definition
interface DOMWriter {
           attribute DOMFormatter     formatter;
  void               writeNode(in DOMOutputStream destination, 
                               in Node node)
                                        raises(DOMSystemException);
  void               writeTreeWalker(in DOMOutputStream destination, 
                                     in TreeWalker tree)
                                        raises(DOMSystemException);
  void               writeString(in DOMOutputStream destination, 
                                 in DOMString aString)
                                        raises(DOMSystemException);
};

Attributes
formatter of type DOMFormatter
The formatter defines the output format that will be produced when serializing using a DOMWriter. For now, only an XML formatter is defined, but others, such as an HTML formatter or an arbitrary user-supplied formatter, could be used.
formatter defaults to an XML formatter, meaning that applications do not need to explicitly set this attribute before using a DOMWriter.
Methods
writeNode
Write out the specified node, and, recursively, any children of the node. The format of the output depends on the formatter and the node type.
Parameters
destination of type DOMOutputStream
The destination for the data to be written.
node of type Node
The root node of the tree of nodes to be written.
Exceptions

DOMSystemException

This exception will be raised in response to any sort of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.

No Return Value
writeString
Write out the specified DOMString.
Parameters
destination of type DOMOutputStream
The destination for the data to be written.
aString of type DOMString
The string to be written.
Exceptions

DOMSystemException

This exception will be raised in response to any sort of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.

No Return Value
writeTreeWalker
Write out the tree of nodes selected by the specified TreeWalker.
Parameters
destination of type DOMOutputStream
The destination for the data to be written.
tree of type TreeWalker
The tree of nodes to be written.
Exceptions

DOMSystemException

This exception will be raised in response to any sort of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.

No Return Value
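The division of labour between a DOMWriter and its formatter can be sketched as follows. The Writer and XMLFormatter classes are hypothetical Python stand-ins for the IDL interfaces; the formatter simply delegates to minidom's own serializer and assumes UTF-8 output, and io.BytesIO plays the role of a DOMOutputStream.

```python
import io
from xml.dom import minidom


class XMLFormatter:
    """Hypothetical default formatter; delegates to minidom's serializer."""

    def format_node(self, root_node, destination) -> None:
        # Assumption: UTF-8 output, to keep the sketch short.
        destination.write(root_node.toxml().encode("utf-8"))


class Writer:
    """Hypothetical DOMWriter: holds a formatter and forwards nodes to it."""

    def __init__(self) -> None:
        # Defaults to an XML formatter, so applications need not set one.
        self.formatter = XMLFormatter()

    def write_node(self, destination, node) -> None:
        self.formatter.format_node(node, destination)

    def write_string(self, destination, a_string: str) -> None:
        destination.write(a_string.encode("utf-8"))


doc = minidom.parseString("<greeting>hello</greeting>")
out = io.BytesIO()
Writer().write_node(out, doc.documentElement)
print(out.getvalue())  # b'<greeting>hello</greeting>'
```

Replacing the formatter attribute with another object that offers the same format_node method would change the output format without touching the calling code.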
Interface DOMFormatter

DOMFormatter defines the interface through which the application controls the format in which a document will be written.

Three options are available for the general appearance of the formatted output: As-is, canonical and reformatted.

  • As-is formatting leaves all "white space in element content" and new-lines unchanged. If the DOM document originated as XML source, and if all white space was retained, this option will come the closest to recovering the format of the original document. (There may still be differences due to normalization of attribute values and new-line characters or the handling of character references.)
  • Canonical formatting writes the document according to the rules specified by W3C Canonical XML Version 1.0. http://www.w3.org/TR/xml-c14n
  • Reformatted output has white space and newlines adjusted to produce a pretty-printed, indented, human-readable form. The exact form of the transformations is not specified.

Nodes of different types are written as follows:

  • Documents are written including an XML declaration and a DTD subset, if one exists in the DOM.
  • Entity nodes, when written directly by DOMWriter.writeNode(), output the entity expansion and a Text Decl. The resulting output will be valid as an external entity.
    No output is generated for any entity nodes when writing a Document.
  • Entity References result in an entity reference ("&entityName;") in the output. Children (the expansion) of the entity reference are ignored.
    To write out a document with entities expanded, and no entity references, use a TreeWalker that is configured to deliver that kind of a view to the DOMWriter.
  • All other node types (Element, Text, etc.) are written without modification or addition, beyond any implied by the white space or pretty printing formatting options.

Any characters that cannot be represented directly, either because of the rules of XML (& or <), or because of limitations of the output encoding, will be replaced with character references. If this is not possible (in a CDATA section, for example) the substitution character(s) will be output instead.
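For normal content, the replacement rule above might be sketched as follows. The escape_text function is a hypothetical helper, and it emits numeric character references for "&" and "<" to follow the wording of the text; many serializers emit the predefined entities &amp; and &lt; instead, which is equivalent.

```python
def escape_text(text: str, encoding: str) -> str:
    """Replace characters that cannot be written directly with character references."""
    pieces = []
    for ch in text:
        if ch in "&<":
            # Not representable directly because of the rules of XML.
            pieces.append(f"&#{ord(ch)};")
            continue
        try:
            ch.encode(encoding)
            pieces.append(ch)
        except UnicodeEncodeError:
            # Not representable in the output encoding.
            pieces.append(f"&#{ord(ch)};")
    return "".join(pieces)


print(escape_text("a<b \u00e9", "ascii"))  # a&#60;b &#233;
```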

The XML to be written is assumed to be well formed. The output is undefined if an attempt is made to write XML that is not well formed, such as a Comment containing "--", or an Element containing two attributes with the same name.

Namespace prefixes, declarations and URIs are not checked for consistency by the DOMWriter. If necessary, the (there is one, right) function from the validation module should be used to bring these items into a consistent state within the DOM prior to writing the document.


IDL Definition
interface DOMFormatter {
           attribute DOMString        encoding;
  readonly attribute DOMString        lastEncoding;
           attribute DOMString        substituteChars;
           attribute unsigned short   format;
  void               formatNode(in Node rootNode, 
                                in DOMOutputStream destination)
                                        raises(DOMSystemException);
  void               formatTreeWalker(in TreeWalker tree, 
                                      in DOMOutputStream destination)
                                        raises(DOMSystemException);
};

Attributes
encoding of type DOMString
The character encoding in which the output will be written.
The encoding to use when writing is determined as follows:
  • If the encoding attribute has been set, that value will be used.
  • If the encoding attribute is null or empty, but the item to be written includes an encoding declaration, that value will be used.
  • If neither of the above provides an encoding name, a default encoding of "utf-8" will be used.

The default value is null.
format of type unsigned short
As-is, canonical or reformatted. Need to add constants for these.
The default value is as-is.
lastEncoding of type DOMString, readonly
The actual character encoding that was last used by this formatter.
substituteChars of type DOMString
If a character to be written cannot be represented in the output encoding, the substituteChars string will be output in its place. If any of the characters from the substituteChars string cannot be represented, they will be replaced by '?'. ('?' can be represented in all known character encodings.)
This substitution only occurs when serializing CDATA sections. In normal content, characters that cannot be represented are output as numeric character references.
The default value of the substitution string is "?".
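The encoding selection rules and the CDATA substitution rule described for these attributes can be sketched together. The function names are hypothetical, and Python codec names stand in for XML encoding names.

```python
from typing import Optional


def select_output_encoding(attr_encoding: Optional[str], declared: Optional[str]) -> str:
    """Apply the three rules listed for the encoding attribute, in order."""
    if attr_encoding:
        return attr_encoding
    if declared:
        return declared
    return "utf-8"


def encodable(ch: str, encoding: str) -> bool:
    """True if the character can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def cdata_text(text: str, encoding: str, substitute_chars: str = "?") -> str:
    """Substitute unrepresentable characters inside a CDATA section."""
    # Characters of the substitution string that cannot be encoded become '?'.
    safe = "".join(c if encodable(c, encoding) else "?" for c in substitute_chars)
    return "".join(c if encodable(c, encoding) else safe for c in text)


print(select_output_encoding(None, None))             # utf-8
print(cdata_text("na\u00efve", "ascii"))              # na?ve
print(cdata_text("na\u00efve", "ascii", "[\u2022]"))  # na[?]ve
```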
Methods
formatNode
Format the tree whose root is the specified node, and put the resulting data to the specified output stream.
This method is not intended to be called directly from applications. Use DOMWriter.writeNode() instead, which will indirectly call back here. This interface (and this method) are intended to be implemented by classes that provide alternative output formats for DOM documents.
Parameters
rootNode of type Node
The root node of the tree of nodes to be formatted.
destination of type DOMOutputStream
The destination for the data to be written.
Exceptions

DOMSystemException

This exception will be raised in response to any kind of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.

No Return Value
formatTreeWalker
Format the tree selected by the supplied TreeWalker, and put the resulting data to the specified output stream.
This method is not intended to be called directly from applications. Use DOMWriter.writeTreeWalker() instead, which will indirectly call back here. This interface (and this method) are intended to be implemented by classes that provide alternative output formats for DOM documents.
Parameters
tree of type TreeWalker
The tree of nodes to be formatted.
destination of type DOMOutputStream
The destination for the data to be written.
Exceptions

DOMSystemException

This exception will be raised in response to any kind of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.

No Return Value