1. Content Models and Validation

Editors: Ben Chang, Oracle; Joe Kesselman, IBM; Rezaur Rahman, Intel Corporation

1.1. Overview

This chapter describes the optional DOM Level 3 Content Model (CM) feature. This module provides a representation for XML content models, e.g., DTDs and XML Schemas, together with operations on those content models, and describes how the information they contain can be applied to XML documents in both the document-editing and CM-editing worlds. It also provides additional tests for well-formedness of XML documents, including Namespace well-formedness. A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether a given DOM supports these capabilities. The feature string for the CM-editing interfaces listed in this section is "CM-EDIT", and the feature string for the document-editing interfaces is "CM-DOC".
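
As a rough illustration of the feature test described above, the following sketch queries a DOM implementation for the two feature strings. It uses Python's standard xml.dom.minidom only because that module exposes hasFeature; minidom is not a Content Model implementation, and the helper name cm_support is hypothetical.

```python
from xml.dom.minidom import getDOMImplementation

# Feature strings named in this chapter.
CM_FEATURES = ("CM-EDIT", "CM-DOC")


def cm_support(impl, version=None):
    """Map each CM feature string to whether `impl` claims to support it.

    A version of None asks about any version of the feature.
    """
    return {name: bool(impl.hasFeature(name, version)) for name in CM_FEATURES}


impl = getDOMImplementation()
support = cm_support(impl)
```

An application would typically fall back to plain Core DOM editing when either entry in the returned mapping is false.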

This chapter interacts strongly with the Load and Save chapter, which is also under development in DOM Level 3. Not only will that code serialize/deserialize content models, but it may also wind up defining its well-formedness and validity checks in terms of what is defined in this chapter. In addition, the CM and Load/Save functional areas will share a common error-reporting mechanism allowing user-registered error callbacks. Note that this may not imply that the parser actually calls the DOM's validation code -- it may be able to achieve better performance via its own -- but the appearance to the user should probably be "as if" the DOM has been asked to validate the document, and parsers should probably be able to validate newly loaded documents in terms of a previously loaded DOM CM.

Finally, this chapter will have separate sections to address the needs of the document-editing and CM-editing worlds, along with a section that details overlapping areas such as validation. In this manner, the document-editing world's focuses on editing aspects and usage of information in the CM are made distinct from the CM-editing world's focuses on defining and manipulating the information in the CM.

1.1.1. General Characteristics

In the October 9, 1997 DOM requirements document, the following appeared: "There will be a way to determine the presence of a DTD. There will be a way to add, remove, and change declarations in the underlying DTD (if available). There will be a way to test conformance of all or part of the given document against a DTD (if available)." In later discussions, the following was added, "There will be a way to query element/attribute (and maybe other) declarations in the underlying DTD (if available)," supplementing the primitive support for these in Level 1.

That work was deferred past Level 2, in the hope that XML Schemas would be addressed as well. It is anticipated that lowest common denominator general APIs generated in this chapter can support both DTDs and XML Schemas, and other XML content models down the road.

The kinds of information that a Content Model must make available are mostly self-evident from the definitions of Infoset, DTDs, and XML Schemas. Note that some kinds of information on which the DOM already relies, e.g., default values for attributes, will finally be given a visible representation here, however.

1.1.2. Use Cases and Requirements

The content model referenced in these use cases/requirements is an abstraction and does not refer solely to DTDs or XML Schemas.

For the CM-editing and document-editing worlds, the following use cases and requirements are common to both and could be labeled as the "Validation and Other Common Functionality" section:

Use Cases:

CU1. Associating a content model (external and/or internal) with a document, or changing the current association.
CU2. Using the same external content model with several documents, without having to reload it.

Requirements:

CR1. Validate against the content model.
CR2. Retrieve information from content model.
CR3. Load an existing content model, perhaps independently from a document.
CR4. Being able to determine if a document has a content model associated with it.
CR5. Associate a CM with a document and make it the active CM.

Specific to the CM-editing world, the following are use cases and requirements and could be labeled as the "CM-editing" section:

Use Cases:

CMU1. Clone/map all or parts of an existing content model to a new or existing content model.
CMU2. Save a content model in a separate file. For example, if a DTD can be broken up into reusable pieces, which are then brought in via entity references, these can then be saved in a separate file. Note that the external subset of a DTD, which includes both an internal and external subset, is a special case of dividing a content model into entities.
CMU3. Modify an existing content model.
CMU4. Create a new content model.
CMU5. Partial content model checking. For example, the document need only be validated against a selected portion of the content model.

Requirements:

CMR1. View and modify all parts of the content model.
CMR2. Validate the content model itself.
CMR3. Serialize the content model.
CMR4. Clone all or parts of an existing content model.
CMR5. Create a new content model object.
CMR6. Validate portions of the XML document against the content model.

Specific to the document-editing world, the following are use cases and requirements and could be labeled as the "Document-editing" section:

Use Cases:

DU1. For editing documents with an associated content model, provide the guidance necessary so that valid documents can be modified and remain valid.
DU2. For editing documents with an associated content model, provide the guidance necessary to transform an invalid document into a valid one.

Requirements:

DR1. Be able to determine if the document is well-formed, and if not, be given enough guidance to locate the error.
DR2. Be able to determine if the document is namespace well-formed, and if not, be given enough guidance to locate the error.
DR3. Be able to determine if the document is valid with respect to its associated content model, and if not, give enough guidance to locate the error.
DR4. Be able to determine if specific modifications to a document would make it become invalid.
DR5. Retrieve information from all content models. One example might be getting a list of all the defined element names for document editing purposes.

General Issues:

I1. Some concerns exist regarding whether a single abstract Content Model structure can successfully represent both namespace-unaware, e.g., DTD, and namespace-aware, e.g., XML Schema, models of a document's content. For example, when you ask what elements can be inserted in a specific place, the former will report the element's QName, e.g., foo:bar, whereas the latter will report its namespace and local name, e.g., {http://my.namespace}bar. We have added the isNamespaceAware attribute to the generic CM object to help applications determine which of these fields are important, but we are still analyzing this challenge.
I2. An XML document may be associated with multiple CMs. We have decided that only one of these is "active" (for validation and guidance) at a time. DOM applications may switch which CM is active, remove CMs that are no longer relevant, or add CMs to the list. If it becomes necessary to simultaneously consult more than one CM, it should be possible to write a "union" CM which provides that capability within this framework.
I3. The content model should be able to handle more datatypes than strings. Currently, this functionality is not available and should be dealt with in the future.
I4. Round-trippability for include/ignore statements and other constructs such as parameter entities, e.g., "macro-like" constructs, will not be supported since no data representation exists to support these constructs without having to re-parse them.
I5. A basic interface for a common error handler is needed for both CM and Load/Save. The agreement is to use user-registered callbacks, but other details remain to be worked out.
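
The "one active CM at a time" rule from issue I2 can be sketched as follows. This is a minimal illustration, not the proposed interface: the class name CMRegistry and its method names are hypothetical, and the choice to leave no CM active after the active one is removed is an assumption.

```python
class CMRegistry:
    """Hypothetical holder for the CMs associated with one document."""

    def __init__(self):
        self._cms = []       # associated content models, in insertion order
        self._active = None  # at most one CM is active at a time

    def add_cm(self, cm):
        if cm not in self._cms:
            self._cms.append(cm)

    def remove_cm(self, cm):
        if cm in self._cms:
            self._cms.remove(cm)
        if self._active is cm:
            self._active = None  # assumption: no CM is active afterwards

    def activate_cm(self, cm):
        """Make `cm` the active CM; return False if it is not associated."""
        if cm not in self._cms:
            return False
        self._active = cm
        return True

    @property
    def active(self):
        return self._active
```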

1.2. Content Model and CM-Editing Interfaces

A list of the proposed Content Model data structures and functions follows, starting with the data structures and "CM-editing" methods.

Interface CMModel

CMModel is an abstract object that could map to a DTD, an XML Schema, a database schema, etc. It is a generalized content model object that has both an internal and an external subset. The internal subset always exists, even if empty, with the external subset (if present) being represented by an "active" CMExternalModel. Many CMExternalModels could exist, but only one can be specified as "active"; it is also possible that none are "active". The issue of multiple content models is misleading since, in this architecture, only one CMModel exists, with an internal subset that references the external subset. If the external subset changes to another "active" CMExternalModel, the internal subset is "fixed up." The CMModel also contains the factory methods needed to create the various types of CMNodes, such as CMElementDeclaration and CMAttributeDeclaration.

IDL Definition

interface CMModel : CMNode {
  readonly attribute boolean          isNamespaceAware;
           attribute CMElementDeclaration  rootElementDecl;
  DOMString          getLocation();
  nsElement          getCMNamespace();
  CMNamedNodeMap     getCMNodes();
  boolean            removeNode(in CMNode node);
  boolean            insertBefore(in CMNode newNode, 
                                  in CMNode refNode);
  boolean            validate();
  CMElementDeclaration createCMElementDeclaration(inout DOMString namespaceURI, 
                                                  in DOMString qualifiedElementName, 
                                                  in int contentSpec)
                                        raises(DOMException);
  CMAttributeDeclaration createCMAttributeDeclaration(inout DOMString namespaceURI, 
                                                      in DOMString qualifiedName)
                                        raises(DOMException);
  CMNotationDeclaration createCMNotationDeclaration(in DOMString name, 
                                                    in DOMString systemIdentifier, 
                                                    inout DOMString publicIdentifier)
                                        raises(DOMException);
  CMEntityDeclaration createCMEntityDeclaration(in DOMString name)
                                        raises(DOMException);
  CMChildren         createCMChildren(in unsigned long minOccurs, 
                                      in unsigned long maxOccurs, 
                                      inout unsigned short operator)
                                        raises(DOMException);
};

Attributes

isNamespaceAware of type boolean, readonly: True if this content model defines the document structure in terms of namespaces and local names; false if the document structure is defined only in terms of QNames.
rootElementDecl of type CMElementDeclaration: The root element declaration for the content model. Although a root element is specified in the document instance, when a content model is generated, a user should be able to choose the root element for editing purposes. This is just a placeholder for that element. It could also be null. For validating an XML document, the root element must be defined in its active content model. CMModel.rootElementDecl provides access to that root element declaration. This recommendation does not say how to fill in rootElementDecl. It could be done manually by the user before validating a document, or, where possible, the CMModel loader may be able to fill it in.
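
To illustrate how an application might use isNamespaceAware when reporting element names, the sketch below formats a name either as a QName or in the {namespace}localName form mentioned in issue I1. The function name and its parameters are hypothetical.

```python
def describe_element(namespace_uri, prefix, local_name, is_namespace_aware):
    """Return the name an application would show for an element declaration."""
    if is_namespace_aware:
        # Namespace-aware models (e.g., XML Schema): namespace plus local name.
        return "{%s}%s" % (namespace_uri, local_name)
    # Namespace-unaware models (e.g., DTD): the QName as written.
    return "%s:%s" % (prefix, local_name) if prefix else local_name
```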

Methods

createCMAttributeDeclaration

Creates an attribute declaration. The returned object implements the CMAttributeDeclaration interface.

Parameters

namespaceURI of type DOMString
qualifiedName of type DOMString: The name of the attribute being declared.

Return Value

CMAttributeDeclaration

A new CMAttributeDeclaration object with its attrName attribute set to the input qualifiedName parameter.

Exceptions

DOMException

INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.

createCMChildren

Creates a new CMChildren object. The subModels of the CMChildren are built using the CMChildren interface methods.

Parameters

minOccurs of type unsigned long: The minimum occurrence for the subModels of this CMChildren.
maxOccurs of type unsigned long: The maximum occurrence for the subModels of this CMChildren.
operator of type unsigned short: The list operator; one of CHOICE, SEQUENCE or NONE.

Return Value

CMChildren

A new CMChildren object.

Exceptions

DOMException

INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.

createCMElementDeclaration

Creates an element declaration for the element type specified. The returned object implements the CMElementDeclaration interface.

Parameters

namespaceURI of type DOMString
qualifiedElementName of type DOMString: The qualified name of the element type being declared.
contentSpec of type int: Constant for MIXED, EMPTY, ANY and CHILDREN.

Return Value

CMElementDeclaration

A new CMElementDeclaration object with its name set to qualifiedElementName and its content type set to contentSpec. Other attributes of the element declaration are set through the CMElementDeclaration interface methods.

Exceptions

DOMException

INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.

DUPLICATE_NAME_ERR: Raised if an element declaration already exists with the same name for a given CMModel.
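
The two error conditions above can be sketched as follows. This is an illustration under several assumptions: the class ModelSketch, the numeric values of the content-type constants, the numeric value of DUPLICATE_NAME_ERR, and the simplified ASCII-only name pattern are all hypothetical, and duplicates are keyed on the (namespaceURI, qualified name) pair. InvalidCharacterErr comes from Python's xml.dom module.

```python
import re
from xml.dom import DOMException, InvalidCharacterErr

DUPLICATE_NAME_ERR = 1001  # hypothetical code; not defined by the Core DOM
EMPTY, ANY, MIXED, CHILDREN = range(4)  # hypothetical constant values


class DuplicateNameErr(DOMException):
    code = DUPLICATE_NAME_ERR


# Simplified, ASCII-only approximation of an XML qualified name.
_QNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*(:[A-Za-z_][A-Za-z0-9_.\-]*)?")


class ModelSketch:
    def __init__(self):
        self._elements = {}

    def create_element_declaration(self, namespace_uri, qualified_name, content_spec):
        if not _QNAME.fullmatch(qualified_name):
            raise InvalidCharacterErr(qualified_name)
        key = (namespace_uri, qualified_name)
        if key in self._elements:
            raise DuplicateNameErr(qualified_name)
        decl = {"namespaceURI": namespace_uri,
                "tagName": qualified_name,
                "contentType": content_spec}
        self._elements[key] = decl
        return decl
```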

createCMEntityDeclaration

Creates a new entity declaration. The returned object implements the CMEntityDeclaration interface.

Parameters

name of type DOMString: The name of the entity being declared.

Return Value

CMEntityDeclaration

A new CMEntityDeclaration object with entityName attribute set to name.

Exceptions

DOMException

INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.

createCMNotationDeclaration

Creates a new notation declaration. The returned object implements the CMNotationDeclaration interface.

Parameters

name of type DOMString: The name of the notation being declared.
systemIdentifier of type DOMString: The system identifier for the notation declaration.
publicIdentifier of type DOMString: The public identifier for the notation declaration.

Return Value

CMNotationDeclaration

A new CMNotationDeclaration object with notationName attribute set to name.

Exceptions

DOMException

INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.

DUPLICATE_NAME_ERR: Raised if a notation declaration already exists with the same name for a given CMModel.

getCMNamespace

Determines namespace of CMModel.

Return Value

nsElement

Namespace of CMModel.

No Parameters

No Exceptions

getCMNodes

Returns CMNode list of all the constituent nodes in the content model.

Return Value

CMNamedNodeMap

List of all CMNodes of the content model.

No Parameters

No Exceptions

getLocation

Location of the document describing the content model defined in this CMModel.

Return Value

DOMString

This method returns a DOMString defining the absolute location from which this document is retrieved including the document name.

No Parameters

No Exceptions

insertBefore

Insert CMNode.

Parameters

newNode of type CMNode: CMNode to be inserted.
refNode of type CMNode: CMNode to be inserted before.

Return Value

boolean

Success or failure.

No Exceptions

removeNode

Removes the specified CMNode.

Parameters

node of type CMNode: CMNode to be removed.

Return Value

boolean

Success or failure.

No Exceptions

validate

Determines whether the CMModel, or CMExternalModel, is itself valid, i.e., confirms that it is well-formed and valid per its own formal grammar. Note that a CMModel can contain a pointer to a CMExternalModel.

Return Value

boolean

Is the CM valid?

No Parameters

No Exceptions

Interface CMExternalModel

CMExternalModel is an abstract object that could map to a DTD, an XML Schema, a database schema, etc. It's a generalized content model object that is not bound to a particular XML document.

IDL Definition

interface CMExternalModel : CMModel {
};

Interface CMNode

CMNode is analogous to a Node in the Core DOM, e.g., an element declaration. This can exist for both CMExternalModel and CMModel. It should be able to handle constructs such as comments and processing instructions.

Opaque.

IDL Definition

interface CMNode {
  const unsigned short      CM_ELEMENT_DECLARATION         = 1;
  const unsigned short      CM_ATTRIBUTE_DECLARATION       = 2;
  const unsigned short      CM_NOTATION_DECLARATION        = 3;
  const unsigned short      CM_ENTITY_DECLARATION          = 4;
  const unsigned short      CM_CHILDREN                    = 5;
  const unsigned short      CM_MODEL                       = 6;
  const unsigned short      CM_EXTERNALMODEL               = 7;
  readonly attribute unsigned short   cmNodeType;
           attribute CMModel          ownerCMModel;
           attribute DOMString        nodeName;
           attribute DOMString        prefix;
           attribute DOMString        localName;
           attribute DOMString        namespaceURI;
  CMNode             clone();
};

Constant CM_ELEMENT_DECLARATION

The node is a CMElementDeclaration.

Constant CM_ATTRIBUTE_DECLARATION

The node is a CMAttributeDeclaration.

Constant CM_NOTATION_DECLARATION

The node is a CMNotationDeclaration.

Constant CM_ENTITY_DECLARATION

The node is a CMEntityDeclaration.

Constant CM_CHILDREN

The node is a CMChildren.

Constant CM_MODEL

The node is a CMModel.

Constant CM_EXTERNALMODEL

The node is a CMExternalModel.

Attributes

cmNodeType of type unsigned short, readonly: A code representing the underlying object as defined above.
localName of type DOMString: Returns the local part of the qualified name of this CMNode.
namespaceURI of type DOMString: The namespace URI of this node, or null if it is unspecified.
nodeName of type DOMString: The qualified name of this CMNode depending on the CMNode type.
ownerCMModel of type CMModel: The CMModel object associated with this CMNode. For a node of type CM_MODEL, this is null.
prefix of type DOMString: The namespace prefix of this node, or null if it is unspecified.
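
As a small illustration of the node type codes and name attributes above, the sketch below mirrors the cmNodeType constants in a Python enumeration and splits a qualified nodeName into its prefix and localName parts. The names CMNodeType and split_qname are hypothetical, and the split rule (everything before the first colon is the prefix) is an assumption for this example.

```python
from enum import IntEnum


class CMNodeType(IntEnum):
    """Mirror of the cmNodeType constants declared on CMNode."""
    CM_ELEMENT_DECLARATION = 1
    CM_ATTRIBUTE_DECLARATION = 2
    CM_NOTATION_DECLARATION = 3
    CM_ENTITY_DECLARATION = 4
    CM_CHILDREN = 5
    CM_MODEL = 6
    CM_EXTERNALMODEL = 7


def split_qname(node_name):
    """Split a qualified name into (prefix, localName); prefix is None if absent."""
    prefix, sep, local = node_name.partition(":")
    return (prefix, local) if sep else (None, node_name)
```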

Methods

clone

Creates a copy of CMNode.

Return Value

CMNode

Cloned CMNode.

No Parameters

No Exceptions

Interface CMNodeList

CMNodeList is the CM analogue to NodeList; the document order is meaningful, as opposed to CMNamedNodeMap.

IDL Definition

interface CMNodeList {
};

Interface CMNamedNodeMap

CMNamedNodeMap is the CM analogue to NamedNodeMap. The order is not meaningful.

IDL Definition

interface CMNamedNodeMap {
};

Interface CMDataType

The primitive datatypes supported by the base DOM CM implementation: the string type only.

IDL Definition

interface CMDataType {
  const short               STRING_DATATYPE                = 1;
  short              getCMPrimitiveType();
};

Constant STRING_DATATYPE

code representing the string data type as defined in XML Schema Datatypes.

Methods

getCMPrimitiveType

Returns one of the enumerated codes representing the primitive data type.

Return Value

short

code representing the primitive type of the attached data item.

No Parameters

No Exceptions

Interface CMPrimitiveType

The primitive types supported by optional DOM CM implementations. A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether this interface is supported or not. The feature string for all the interfaces listed in this section is "CMPTYPES" and the version is "3.0".

IDL Definition

interface CMPrimitiveType : CMDataType {
  const short               BOOLEAN_DATATYPE               = 2;
  const short               FLOAT_DATATYPE                 = 3;
  const short               DOUBLE_DATATYPE                = 4;
  const short               DECIMAL_DATATYPE               = 5;
  const short               HEXBINARY_DATATYPE             = 6;
  const short               BASE64BINARY_DATATYPE          = 7;
  const short               ANYURI_DATATYPE                = 8;
  const short               QNAME_DATATYPE                 = 9;
  const short               DURATION_DATATYPE              = 10;
  const short               DATETIME_DATATYPE              = 11;
  const short               DATE_DATATYPE                  = 12;
  const short               TIME_DATATYPE                  = 13;
  const short               YEARMONTH_DATATYPE             = 14;
  const short               YEAR_DATATYPE                  = 15;
  const short               MONTHDAY_DATATYPE              = 16;
  const short               DAY_DATATYPE                   = 17;
  const short               MONTH_DATATYPE                 = 18;
  const short               NOTATION_DATATYPE              = 19;
           attribute decimal          lowValue;
           attribute decimal          highValue;
};

Constant BOOLEAN_DATATYPE

code representing the boolean data type as defined in XML Schema Datatypes.

Constant FLOAT_DATATYPE

code representing the float data type as defined in XML Schema Datatypes.

Constant DOUBLE_DATATYPE

code representing the double data type as defined in XML Schema Datatypes.

Constant DECIMAL_DATATYPE

code representing a decimal data type as defined in XML Schema Datatypes.

Constant HEXBINARY_DATATYPE

code representing a hexbinary data type as defined in XML Schema Datatypes.

Constant BASE64BINARY_DATATYPE

code representing a base64binary data type as defined in XML Schema Datatypes.

Constant ANYURI_DATATYPE

code representing a URI reference data type as defined in XML Schema Datatypes.

Note: @@uriReference is no longer part of the XML Schema PR draft.

Constant QNAME_DATATYPE

code representing an XML qualified name data type as defined in XML Schema Datatypes.

Constant DURATION_DATATYPE

code representing a duration data type as defined in XML Schema Datatypes.

Constant DATETIME_DATATYPE

code representing a datetime data type as defined in XML Schema Datatypes.

Constant DATE_DATATYPE

code representing a date data type as defined in XML Schema Datatypes.

Constant TIME_DATATYPE

code representing a time data type as defined in XML Schema Datatypes.

Constant YEARMONTH_DATATYPE

code representing a yearmonth data type as defined in XML Schema Datatypes.

Constant YEAR_DATATYPE

code representing a year data type as defined in XML Schema Datatypes.

Constant MONTHDAY_DATATYPE

code representing a monthday data type as defined in XML Schema Datatypes.

Constant DAY_DATATYPE

code representing a day data type as defined in XML Schema Datatypes.

Constant MONTH_DATATYPE

code representing a month data type as defined in XML Schema Datatypes.

Constant NOTATION_DATATYPE

code representing a NOTATION data type as defined in XML Schema Datatypes.

Attributes

highValue of type decimal: The high value for a primitive DECIMAL_DATATYPE in the value range.
lowValue of type decimal: The low value for a primitive DECIMAL_DATATYPE in the value range.
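
One plausible use of lowValue and highValue is a range check on a decimal value. The sketch below assumes that both bounds are inclusive, which the text above does not state; the function name in_value_range is hypothetical.

```python
from decimal import Decimal


def in_value_range(text, low_value, high_value):
    """Return True if the decimal written in `text` lies within the range.

    Both bounds are treated as inclusive (an assumption).
    """
    value = Decimal(text)
    return low_value <= value <= high_value
```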

Interface CMElementDeclaration

The element name along with the content specification in the context of a CMNode.

IDL Definition

interface CMElementDeclaration : CMNode {
           attribute CMDataType       elementType;
  readonly attribute boolean          isPCDataOnly;
           attribute DOMString        tagName;
  int                getContentType();
  CMChildren         getCMChildren();
  CMNamedNodeMap     getCMAttributes();
  CMNamedNodeMap     getCMGrandChildren();
};

Attributes

elementType of type CMDataType: Datatype of the element.
isPCDataOnly of type boolean, readonly: For mixed element types, indicates whether the element contains PCDATA only or both child elements and PCDATA. True if the element is of type PCDATA only. Relevant only for mixed content type elements.
tagName of type DOMString: tagName of the element being declared.

Methods

getCMAttributes

Returns a CMNamedNodeMap containing CMAttributeDeclarations for all the attributes that can appear on this type of element.

Return Value

CMNamedNodeMap

Attributes list for this CMNode.

No Parameters

No Exceptions

getCMChildren

Gets content model of element.

Return Value

CMChildren

Content model of element.

No Parameters

No Exceptions

getCMGrandChildren

Returns a CMNamedNodeMap containing CMElementDeclarations for all the Elements that can appear as children of this type of element. Note that which ones can actually appear, and in what order, is defined by the CMChildren.

Return Value

CMNamedNodeMap

Children list for this CMNode.

No Parameters

No Exceptions

getContentType

Gets content type, e.g., empty, any, mixed, elements, PCDATA, of an element.

Return Value

int

Content type constant.

No Parameters

No Exceptions

Interface CMChildren

The content model of a declared element.

IDL Definition

interface CMChildren : CMNode {
  const unsigned long       UNBOUNDED                      = MAX_LONG;
  const unsigned short      NONE                           = 0;
  const unsigned short      SEQUENCE                       = 1;
  const unsigned short      CHOICE                         = 2;
           attribute unsigned short   listOperator;
           attribute unsigned long    minOccurs;
           attribute unsigned long    maxOccurs;
           attribute CMNodeList       subModels;
  CMNode             removeCMNode(in unsigned long nodeIndex);
  int                insertCMNode(in unsigned long nodeIndex, 
                                  in CMNode newNode);
  int                appendCMNode(in CMNode newNode);
};

Constant UNBOUNDED

Signifies unbounded upper limit. The MAX_LONG value is the maximum value of an unsigned long integer for a given language binding.

Constant NONE

No operators defined on the subModels. This is usually the case where the subModels contain a single element declaration.

Constant SEQUENCE

This constant value signifies a sequence operator ",".

Constant CHOICE

This constant value signifies a choice operator "|".

Attributes

listOperator of type unsigned short: One of CHOICE or SEQUENCE. The operator is applied to all the components (CMNodes) in the subModels. For example, if the list operator is CHOICE and the components in subModels are a, b and c, then the content model for the element being declared is (a|b|c).
maxOccurs of type unsigned long: maximum occurrence for this content particle. Valid values are from 0 to UNBOUNDED.
minOccurs of type unsigned long: min occurrence for this content particle. Valid values are from 0 to UNBOUNDED.
subModels of type CMNodeList: Additional CMNodes in which the element can be defined.
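
The relationship between listOperator, minOccurs, maxOccurs and subModels can be sketched by rendering a CMChildren-like structure in DTD notation. This is an illustration only: the function names are hypothetical, UNBOUNDED is assumed to be the 32-bit maximum, and only the occurrence pairs that a DTD can express are handled.

```python
UNBOUNDED = 2**32 - 1  # stands in for MAX_LONG; the 32-bit width is an assumption
NONE, SEQUENCE, CHOICE = 0, 1, 2

_SUFFIX = {(1, 1): "", (0, 1): "?", (0, UNBOUNDED): "*", (1, UNBOUNDED): "+"}


def occurrence_suffix(min_occurs, max_occurs):
    """DTD occurrence indicator for a (minOccurs, maxOccurs) pair."""
    try:
        return _SUFFIX[(min_occurs, max_occurs)]
    except KeyError:
        raise ValueError("occurrence range not expressible in a DTD")


def render(sub_models, list_operator, min_occurs=1, max_occurs=1):
    """Render sub-models (names or already rendered groups) as a DTD group."""
    if list_operator == NONE and len(sub_models) != 1:
        raise ValueError("NONE expects a single sub-model")
    separator = "|" if list_operator == CHOICE else ","
    group = "(" + separator.join(sub_models) + ")"
    return group + occurrence_suffix(min_occurs, max_occurs)
```

Nested groups are built by passing the result of one render call as a sub-model of another.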

Methods

appendCMNode

Appends a new node to the end of the list representing the subModels.

Parameters

newNode of type CMNode: The new node to be appended.

Return Value

int

The length of the subModels list.

No Exceptions

insertCMNode

Inserts a new node at the position in the submodel referred to by nodeIndex. Nodes already existing in the list are moved as needed.

Parameters

nodeIndex of type unsigned long: The position of where the newNode is inserted.
newNode of type CMNode: The new node to be inserted.

Return Value

int

The index value at which it is inserted. If the nodeIndex is outside the bound of the subModels list, the item is inserted at the back of the list.

No Exceptions

removeCMNode

Removes the CMNode at the indicated index position in the submodel.

Parameters

nodeIndex of type unsigned long: Index of the node being removed.

Return Value

CMNode

The node removed is returned as a result of this method call. The method returns null if the index is outside the bounds of the subModels list.

No Exceptions
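
The index handling described for appendCMNode, insertCMNode and removeCMNode can be sketched with a plain Python list. The class name SubModels and its method names are hypothetical; the behaviour follows the return values stated above.

```python
class SubModels:
    """Hypothetical list-backed stand-in for the subModels of a CMChildren."""

    def __init__(self):
        self._items = []

    @property
    def items(self):
        return list(self._items)

    def append_cm_node(self, new_node):
        self._items.append(new_node)
        return len(self._items)  # length of the subModels list

    def insert_cm_node(self, node_index, new_node):
        if not 0 <= node_index <= len(self._items):
            node_index = len(self._items)  # out of bounds: insert at the back
        self._items.insert(node_index, new_node)
        return node_index  # index at which the node was inserted

    def remove_cm_node(self, node_index):
        if not 0 <= node_index < len(self._items):
            return None  # out of bounds: nothing is removed
        return self._items.pop(node_index)
```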

Interface CMAttributeDeclaration

An attribute declaration in the context of a CMNode.

IDL Definition

interface CMAttributeDeclaration : CMNode {
  const short               NO_VALUE_CONSTRAINT            = 0;
  const short               DEFAULT_VALUE_CONSTRAINT       = 1;
  const short               FIXED_VALUE_CONSTRAINT         = 2;
           attribute DOMString        attrName;
           attribute CMDataType       attrType;
           attribute DOMString        attributeValue;
           attribute DOMString        enumAttr;
           attribute CMNodeList       ownerElement;
           attribute short            constraintType;
};

Constant NO_VALUE_CONSTRAINT

Describes that the attribute does not have any value constraint.

Constant DEFAULT_VALUE_CONSTRAINT

Indicates that there is a default value constraint for this attribute.

Constant FIXED_VALUE_CONSTRAINT

Indicates that there is a fixed value constraint for this attribute.

Attributes

attrName of type DOMString: Name of the attribute.
attrType of type CMDataType: Datatype of the attribute.
attributeValue of type DOMString: Default value.
constraintType of type short: Constraint type if any for this attribute.
enumAttr of type DOMString: Enumeration of attribute.
ownerElement of type CMNodeList: Owner element CMNode of attribute.
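
The effect of constraintType on an attribute's value can be sketched as follows. The function name effective_value is hypothetical, and the sketch assumes that attributeValue holds the fixed value as well as the default value, which the attribute description above does not state.

```python
NO_VALUE_CONSTRAINT = 0
DEFAULT_VALUE_CONSTRAINT = 1
FIXED_VALUE_CONSTRAINT = 2


def effective_value(supplied, constraint_type, attribute_value):
    """Return the value an attribute ends up with, or raise ValueError.

    `supplied` is the value given in the document, or None if absent.
    """
    if constraint_type == FIXED_VALUE_CONSTRAINT:
        if supplied is not None and supplied != attribute_value:
            raise ValueError("fixed attribute must equal %r" % attribute_value)
        return attribute_value
    if constraint_type == DEFAULT_VALUE_CONSTRAINT and supplied is None:
        return attribute_value  # the default fills in for a missing attribute
    return supplied
```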

Interface CMEntityDeclaration

Models a general entity declaration in a content model.

(ED: The content model does not handle parameter entities. It is assumed that parameter entities are expanded by the implementation as the content model is built.)

IDL Definition

interface CMEntityDeclaration : CMNode {
  const short               INTERNAL_ENTITY                = 1;
  const short               EXTERNAL_ENTITY                = 2;
           attribute short            entityType;
           attribute DOMString        entityName;
           attribute DOMString        entityValue;
           attribute DOMString        systemId;
           attribute DOMString        publicId;
           attribute DOMString        notationName;
};

Constant INTERNAL_ENTITY

constant defining an internal entity.

Constant EXTERNAL_ENTITY

constant defining an external entity.

Attributes

entityName of type DOMString: The name of the declared general entity.
entityType of type short: One of the INTERNAL_ENTITY or EXTERNAL_ENTITY.
entityValue of type DOMString: The replacement text for the internal entity. The entity references within the replacement text are kept intact. For an entity of type EXTERNAL_ENTITY this is null.
notationName of type DOMString: For unparsed entities, the name of the notation declaration for the entity. For parsed entities, this is null.
publicId of type DOMString: The public identifier associated with the entity, if specified. If the public identifier was not specified, this is null.
systemId of type DOMString: The system identifier associated with the entity, if specified. If the system identifier was not specified, this is null.
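
The relationships among these attributes can be sketched with a small data class. The class EntityDeclSketch and its consistency rules are hypothetical; they follow the descriptions above together with the XML 1.0 rules that an external entity always has a system identifier and that only external entities can be unparsed.

```python
from dataclasses import dataclass
from typing import Optional

INTERNAL_ENTITY, EXTERNAL_ENTITY = 1, 2


@dataclass
class EntityDeclSketch:
    entity_name: str
    entity_type: int
    entity_value: Optional[str] = None
    system_id: Optional[str] = None
    public_id: Optional[str] = None
    notation_name: Optional[str] = None

    def is_unparsed(self):
        # Unparsed entities name a notation; parsed entities have none.
        return self.notation_name is not None

    def is_consistent(self):
        if self.entity_type == INTERNAL_ENTITY:
            return self.entity_value is not None and self.notation_name is None
        if self.entity_type == EXTERNAL_ENTITY:
            return self.entity_value is None and self.system_id is not None
        return False
```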

Interface CMNotationDeclaration

This interface represents a notation declaration.

IDL Definition

interface CMNotationDeclaration : CMNode {
           attribute DOMString        notationName;
           attribute DOMString        systemId;
           attribute DOMString        publicId;
};

Attributes

notationName of type DOMString: The name of this notation declaration.
publicId of type DOMString: The string representing the public identifier for this notation declaration.
systemId of type DOMString: The URI representing the system identifier for the notation declaration, if present, null otherwise.

1.3. Validation and Other Interfaces

This section contains "Validation and Other" methods common to both the document-editing and CM-editing worlds (includes Document, DOMImplementation, and DOMErrorHandler methods).

Interface Document

The setErrorHandler method is defined on the Document interface.

IDL Definition

interface Document {
  void               setErrorHandler(in DOMErrorHandler handler);
};

Methods

setErrorHandler

Allow an application to register an error event handler.

Parameters

handler of type DOMErrorHandler: The error handler

No Return Value

No Exceptions

Interface DocumentCM

This interface extends the Document interface with additional methods for both document and CM editing.

IDL Definition

interface DocumentCM : Document {
  const short               WF_CHECK                       = 1;
  const short               NS_WF_CHECK                    = 2;
  const short               PARTIAL_VALIDITY_CHECK         = 3;
  const short               STRICT_VALIDITY_CHECK          = 4;
           attribute boolean          continuousValidityChecking;
           attribute short            wfValidityCheckLevel;
  int                numCMs();
  CMModel            getInternalCM();
  CMNodeList         getCMs();
  CMModel            getActiveCM();
  void               addCM(in CMModel cm);
  void               removeCM(in CMModel cm);
  boolean            activateCM(in CMModel cm);
};

Constant WF_CHECK

Check for well-formedness of the document.

Constant NS_WF_CHECK

Check for namespace well-formedness. It includes WF_CHECK.

Constant PARTIAL_VALIDITY_CHECK

Checks whether the document is partially valid. It includes NS_WF_CHECK.
A document is said to be partially valid if it contains elements or attributes for which no element or attribute declaration has been made in the active content model. However, if an element or attribute does have a declaration in the content model, it must be valid with respect to that declaration.

Constant STRICT_VALIDITY_CHECK

Checks for strict validity of the document with respect to the active CM, which by definition includes NS_WF_CHECK.
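The difference between partial and strict validity can be illustrated with a small sketch. This is not part of the proposed API. It assumes a toy content model in which each declared element maps its declared attributes to a set of permitted values. The names find_problems, declarations, and strict are hypothetical and chosen for illustration only.

```python
def find_problems(elements, declarations, strict):
    """Return a list of validity problems for a flat list of elements.

    elements:     list of (element_name, {attr_name: value}) pairs.
    declarations: {element_name: {attr_name: set_of_allowed_values}};
                  a toy stand-in for the active content model.
    strict:       True for strict validity, False for partial validity.
    """
    problems = []
    for name, attrs in elements:
        attr_decls = declarations.get(name)
        if attr_decls is None:
            # Partial validity tolerates undeclared elements.
            if strict:
                problems.append(f"undeclared element {name}")
            continue
        for attr, value in attrs.items():
            allowed = attr_decls.get(attr)
            if allowed is None:
                # Partial validity tolerates undeclared attributes.
                if strict:
                    problems.append(f"undeclared attribute {name}/@{attr}")
            elif value not in allowed:
                # Declared items must be valid at both levels.
                problems.append(f"invalid value {name}/@{attr}={value}")
    return problems
```

Under these assumptions, a document that uses an undeclared element passes the partial check but fails the strict one, while a declared attribute with a disallowed value fails both.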

Attributes

continuousValidityChecking of type boolean: An attribute specifying whether continuous checking for the validity of the document is enforced or not. When set to true the implementation is free to raise the VALIDATION_ERR exception on DOM operations that would make the document invalid with respect to "partial validity". This attribute is false by default.
(ED: Add VALIDATION_ERR code to the list of constants in DOMException.)
wfValidityCheckLevel of type short: This attribute defines the level at which the validity and well-formedness testing is done by the isValid method.

Methods

activateCM

Make the given CMModel active. Note that if a user wants to activate one CM to get default attribute values and then activate another to do validation, a user can do that; however, only one CM is active at a time. In the case where an attribute is declared in an internal subset and the corresponding ownerElement points to a CMElementDeclaration defined in an external subset, changing the active CM will cause the ownerElement to be re-computed. If the owner element is not defined in the newly active CM, the ownerElement will be an empty node list.

Parameters

cm of type CMModel: CM to be active for the document. The CMModel points to a list of CMExternalModels; with this call, only the specified CM will be active.

Return Value

boolean

True if the CMModel has already been associated with the document using addCM(); false if not.

No Exceptions

addCM

Associate a CMModel with a document. Can be invoked multiple times to result in a list of CMExternalModels. Note, however, that only one internal CMModel is associated with the document, and that only one of the possible list of CMExternalModels is active at any one time.

Parameters

cm of type CMModel: CM to be associated with the document.

No Return Value

No Exceptions

getActiveCM

Find the active CMExternalModel for a document.

Return Value

CMModel

CMModel with a pointer to the active CMExternalModel of document.

No Parameters

No Exceptions

getCMs

Obtains the list of CMNodes of type CM_EXTERNALMODEL associated with the document. This list arises when addCM() is invoked.

Return Value

CMNodeList

A list of CMExternalModels associated with a document.

No Parameters

No Exceptions

getInternalCM

Find the sole CMModel of a document. Only one CMModel may be associated with the document.

Return Value

CMModel

CMModel.

No Parameters

No Exceptions

numCMs

Determines number of CMExternalModels associated with the document. Only one CMModel can be associated with the document, but it may point to a list of CMExternalModels.

Return Value

int

Non-negative number of external CM objects.

No Parameters

No Exceptions

removeCM

Removes a CM associated with a document; actually removes a CMExternalModel. Can be invoked multiple times to remove a number of these in the list of CMExternalModels.

Parameters

cm of type CMModel: CM to be removed.

No Return Value

No Exceptions
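The bookkeeping implied by the DocumentCM methods can be sketched as a toy in-memory model. This is an illustration only, not a binding of the interface. The class name ToyDocumentCM is hypothetical. The sketch also makes assumptions the text does not settle: duplicate additions are ignored, activateCM does nothing when it returns false, and removing the active CM leaves no CM active.

```python
class ToyDocumentCM:
    """Toy model of the CM association state kept by a DocumentCM."""

    def __init__(self, internal_cm=None):
        self._internal = internal_cm   # the single internal CM, if any
        self._external = []            # CMs registered through add_cm
        self._active = None            # at most one CM is active

    def add_cm(self, cm):
        # Assumption: adding the same CM twice has no further effect.
        if cm not in self._external:
            self._external.append(cm)

    def remove_cm(self, cm):
        if cm in self._external:
            self._external.remove(cm)
        # Assumption: removing the active CM deactivates it.
        if self._active == cm:
            self._active = None

    def activate_cm(self, cm):
        # Returns False if the CM was never associated via add_cm.
        if cm not in self._external:
            return False
        self._active = cm
        return True

    def num_cms(self):
        return len(self._external)

    def get_cms(self):
        return list(self._external)

    def get_internal_cm(self):
        return self._internal

    def get_active_cm(self):
        return self._active
```

A caller could, for example, add two CMs, activate one to obtain default attribute values, and later activate the other for validation. Only one is active at any time.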

Interface DOMImplementationCM

This interface extends the DOMImplementation interface with additional methods.

IDL Definition

interface DOMImplementationCM : DOMImplementation {
  CMModel            createCM();
  CMExternalModel    createExternalCM();
};

Methods

createCM

Creates a CMModel.

Return Value

CMModel

A NULL return indicates failure.

No Parameters

No Exceptions

createExternalCM

Creates a CMExternalModel.

Return Value

CMExternalModel

A NULL return indicates failure.

No Parameters

No Exceptions

1.4. Document-Editing Interfaces

This section contains "Document-editing" methods (includes Node, Element, Text and Document methods).

Interface NodeCM

This interface extends the Node interface with additional methods for guided document editing.

IDL Definition

interface NodeCM : Node {
  boolean            canInsertBefore(in Node newChild, 
                                     in Node refChild)
                                        raises(DOMException);
  boolean            canRemoveChild(in Node oldChild)
                                        raises(DOMException);
  boolean            canReplaceChild(in Node newChild, 
                                     in Node oldChild)
                                        raises(DOMException);
  boolean            canAppendChild(in Node newChild)
                                        raises(DOMException);
  boolean            isValid()
                                        raises(DOMException);
};

Methods

canAppendChild

Takes the same arguments as the Node appendChild method.

Parameters

newChild of type Node: Node to be appended.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

canInsertBefore

Determines whether the Node::InsertBefore operation would make this document invalid with respect to the currently active CM. ISSUE: Describe "valid" when referring to partially completed documents.

Parameters

newChild of type Node: Node to be inserted.
refChild of type Node: Reference Node.

Return Value

boolean

A boolean that is true if the Node::InsertBefore operation is allowed.

Exceptions

DOMException

DOMException.

canRemoveChild

Takes the same arguments as the Node removeChild method.

Parameters

oldChild of type Node: Node to be removed.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

canReplaceChild

Takes the same arguments as the Node replaceChild method.

Parameters

newChild of type Node: New Node.
oldChild of type Node: Node to be replaced.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

isValid

Determines if the Node is valid relative to currently active CM.

Return Value

boolean

True if the node is valid/well-formed in the current context and check level defined by wfValidityCheckLevel, false if not.

Exceptions

DOMException

NO_CM_AVAILABLE: Exception is raised if the DocumentCM related to this node does not have any activeCM and wfValidityCheckLevel is set to STRICT_VALIDITY_CHECK.

No Parameters
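The intended usage pattern for the NodeCM "can" methods is to query first and edit only if the query succeeds. The following sketch illustrates that pattern under strong simplifying assumptions: the content model is reduced to a table of permitted child names per parent, with no ordering or cardinality constraints. ToyNode, ALLOWED_CHILDREN, can_append_child, and append_if_allowed are hypothetical names.

```python
class ToyNode:
    """Minimal stand-in for an element node."""

    def __init__(self, name):
        self.name = name
        self.children = []


# Toy content model: parent element name -> permitted child element names.
ALLOWED_CHILDREN = {
    "list": {"item"},
    "item": {"para", "list"},
    "para": set(),
}


def can_append_child(parent, new_child, allowed=ALLOWED_CHILDREN):
    """Report whether appending new_child would be permitted; never mutates."""
    return new_child.name in allowed.get(parent.name, set())


def append_if_allowed(parent, new_child):
    """Query first, then perform the edit only if the query succeeded."""
    if not can_append_child(parent, new_child):
        return False
    parent.children.append(new_child)
    return True
```

The query itself leaves the tree untouched, so an editor can call it repeatedly, for instance to enable or disable menu entries, without side effects.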

Interface ElementCM

This interface extends the Element interface with additional methods for guided document editing.

IDL Definition

interface ElementCM : Element,NodeCM {
  int                contentType();
  CMElementDeclaration getElementDeclaration()
                                        raises(DOMException);
  boolean            canSetAttribute(in DOMString attrname, 
                                     in DOMString attrval);
  boolean            canSetAttributeNode(in Node node);
  boolean            canSetAttributeNodeNS(in Node node);
  boolean            canSetAttributeNS(in DOMString attrname, 
                                       in DOMString attrval, 
                                       in DOMString namespaceURI, 
                                       in DOMString localName);
  boolean            canRemoveAttribute(in DOMString attrname);
  boolean            canRemoveAttributeNS(in DOMString attrname, 
                                          inout DOMString namespaceURI);
  boolean            canRemoveAttributeNode(in Node node);
};

Methods

canRemoveAttribute

Verifies if an attribute by the given name can be removed.

Parameters

attrname of type DOMString: Name of attribute.

Return Value

boolean

true or false.

No Exceptions

canRemoveAttributeNS

Verifies if an attribute by the given name and namespace can be removed.

Parameters

attrname of type DOMString: Qualified name of the attribute to be removed.
namespaceURI of type DOMString: The namespace URI of the attribute to remove.

Return Value

boolean

true or false.

No Exceptions

canRemoveAttributeNode

Determines if an attribute node can be removed.

Parameters

node of type Node: The Attr node to remove from the attribute list.

Return Value

boolean

true or false.

No Exceptions

canSetAttribute

Determines if the value for specified attribute can be set.

Parameters

attrname of type DOMString: Name of attribute.
attrval of type DOMString: Value to be assigned to the attribute.

Return Value

boolean

true or false.

No Exceptions

canSetAttributeNS

Determines if the attribute with the given namespace and local name can be created if not already present in the attribute list of the element. If the attribute with the same local name and namespaceURI is already present in the element's attribute list, it sets the value of the attribute and its prefix to the new value. See DOM core setAttributeNS.

Parameters

attrname of type DOMString: Name of attribute.
attrval of type DOMString: Value to be assigned to the attribute.
namespaceURI of type DOMString: namespaceURI of namespace.
localName of type DOMString: localName of namespace.

Return Value

boolean

Success or failure.

No Exceptions

canSetAttributeNode

Determines if attribute node can be added.

Parameters

node of type Node: Node in which the attribute can possibly be set.

Return Value

boolean

Success or failure.

No Exceptions

canSetAttributeNodeNS

Determines if the attribute node with the given namespace can be added.

Parameters

node of type Node: The Attr to be added to the attribute list.

Return Value

boolean

Success or failure.

No Exceptions

contentType

Determines element content type.

Return Value

int

Constant for mixed, empty, any, etc.

No Parameters

No Exceptions

getElementDeclaration

Gets the CM-editing object describing this element.

Return Value

CMElementDeclaration

CMElementDeclaration object

Exceptions

DOMException

Raised if no DTD is present.

No Parameters

Interface CharacterDataCM

This interface extends the CharacterData interface with additional methods for document editing.

IDL Definition

interface CharacterDataCM : Text,NodeCM {
  boolean            isWhitespaceOnly();
  boolean            canSetData(in unsigned long offset, 
                                in DOMString arg)
                                        raises(DOMException);
  boolean            canAppendData(in DOMString arg)
                                        raises(DOMException);
  boolean            canReplaceData(in unsigned long offset, 
                                    in unsigned long count, 
                                    in DOMString arg)
                                        raises(DOMException);
  boolean            canInsertData(in unsigned long offset, 
                                   in DOMString arg)
                                        raises(DOMException);
  boolean            canDeleteData(in unsigned long offset, 
                                   in DOMString arg)
                                        raises(DOMException);
};

Methods

canAppendData

Determines if data can be appended.

Parameters

arg of type DOMString: Argument to be appended.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

canDeleteData

Determines if data can be deleted.

Parameters

offset of type unsigned long: Offset.
arg of type DOMString: Argument to be set.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

canInsertData

Determines if data can be inserted.

Parameters

offset of type unsigned long: Offset.
arg of type DOMString: Argument to be set.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

canReplaceData

Determines if data can be replaced.

Parameters

offset of type unsigned long: Offset.
count of type unsigned long: Number of 16-bit units to replace.
arg of type DOMString: Argument to be set.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

canSetData

Determines if data can be set.

Parameters

offset of type unsigned long: Offset.
arg of type DOMString: Argument to be set.

Return Value

boolean

Success or failure.

Exceptions

DOMException

DOMException.

isWhitespaceOnly

Determines if content is only whitespace.

Return Value

boolean

True if the content is only whitespace; false if it contains non-whitespace characters. This applies when the node is a text node in element content.

No Parameters

No Exceptions
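A whitespace test of this kind can be sketched as follows. The sketch uses the XML 1.0 definition of white space (space, tab, carriage return, line feed), which is narrower than the notion of whitespace built into many programming languages. The function name is hypothetical, and treating the empty string as whitespace-only is an assumption of this sketch.

```python
# The four characters XML 1.0 treats as white space.
XML_WHITESPACE = frozenset(" \t\r\n")


def is_whitespace_only(data: str) -> bool:
    """True if every character of data is XML white space.

    Assumption: an empty string counts as whitespace-only.
    """
    return all(ch in XML_WHITESPACE for ch in data)
```

Note that a no-break space (U+00A0) is not XML white space, so a generic "is space" test from a standard library would give a different answer for it.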

Interface DocumentTypeCM

This interface extends the DocumentType interface with additional methods for document editing.

IDL Definition

interface DocumentTypeCM : DocumentType,NodeCM {
  boolean            isElementDefined(in DOMString elemTypeName);
  boolean            isElementDefinedNS(in DOMString elemTypeName, 
                                        in DOMString namespaceURI, 
                                        in DOMString localName);
  boolean            isAttributeDefined(in DOMString elemTypeName, 
                                        in DOMString attrName);
  boolean            isAttributeDefinedNS(in DOMString elemTypeName, 
                                          in DOMString attrName, 
                                          in DOMString namespaceURI, 
                                          in DOMString localName);
  boolean            isEntityDefined(in DOMString entName);
};

Methods

isAttributeDefined

Determines if this attribute is defined for this element in the currently active CM.

Parameters

elemTypeName of type DOMString: Name of the element.
attrName of type DOMString: Name of the attribute.

Return Value

boolean

Success or failure.

No Exceptions

isAttributeDefinedNS

Determines if this attribute's namespace is defined in the currently active CM.

Parameters

elemTypeName of type DOMString: Name of element.
attrName of type DOMString: Name of attribute.
namespaceURI of type DOMString: namespaceURI of namespace.
localName of type DOMString: localName of namespace.

Return Value

boolean

Success or failure.

No Exceptions

isElementDefined

Determines if this element is defined in the currently active CM.

Parameters

elemTypeName of type DOMString: Name of element.

Return Value

boolean

Success or failure.

No Exceptions

isElementDefinedNS

Determines if this element's namespace is defined in the currently active CM.

Parameters

elemTypeName of type DOMString: Name of element.
namespaceURI of type DOMString: namespaceURI of namespace.
localName of type DOMString: localName of namespace.

Return Value

boolean

Success or failure.

No Exceptions

isEntityDefined

Determines if an entity is defined in the document.
ISSUE: Should methods be added to the DocumentTypeCM for the complete list of defined elements and, for a particular element type, the complete list of defined attributes? These two methods might return a list of strings, which is a type not yet described in the DOM spec.

Parameters

entName of type DOMString: Name of entity.

Return Value

boolean

Success or failure.

No Exceptions

Interface AttributeCM

This interface extends Attr to provide guided editing of an XML document.

IDL Definition

interface AttributeCM : Attr,NodeCM {
  CMAttributeDeclaration getAttributeDeclaration();
  CMNotationDeclaration getNotation()
                                        raises(DOMException);
};

Methods

getAttributeDeclaration

Returns the corresponding attribute declaration in the content model.

Return Value

CMAttributeDeclaration

The attribute declaration corresponding to this attribute

No Parameters

No Exceptions

getNotation

Returns the notation declaration for attributes declared with type NOTATION.

Return Value

CMNotationDeclaration

Returns the notation declaration for this attribute if the type is of notation type, null otherwise.

Exceptions

DOMException

DOMException

No Parameters

1.5. DOM Error Handler Interfaces

This section contains DOM error handling interfaces.

Interface DOMErrorHandler

Basic interface for DOM error handlers. If an application needs to implement customized error handling for DOM such as CM or Load/Save, it must implement this interface and then register an instance using the setErrorHandler method. All errors and warnings will then be reported through this interface. Application writers can override the methods in a subclass to take user-specified actions.

IDL Definition

interface DOMErrorHandler {
  void               warning(in DOMLocator where, 
                             in DOMString how, 
                             in DOMString why)
                                        raises(DOMSystemException);
  void               fatalError(in DOMLocator where, 
                                in DOMString how, 
                                in DOMString why)
                                        raises(DOMSystemException);
  void               error(in DOMLocator where, 
                           in DOMString how, 
                           in DOMString why)
                                        raises(DOMSystemException);
};

Methods

error

Receive notification of a recoverable error per section 1.2 of the W3C XML 1.0 recommendation. The default behavior if the user doesn't register a handler is to report conditions that are not fatal errors, and allow the calling application to continue processing.

Parameters

where of type DOMLocator: Location of the error, which could be either a source position in the case of loading, or a node reference for later validation. The public ID and system ID for the error location could be some of the information.
how of type DOMString: How the error occurred.
why of type DOMString: Why the error occurred.

Exceptions

DOMSystemException

A subclass of DOMException.

No Return Value

fatalError

Report a fatal, non-recoverable CM or Load/Save error per section 1.2 of the W3C XML 1.0 recommendation. The default behavior if the user doesn't register a handler is to throw a DOMSystemException and stop all further processing.

Parameters

where of type DOMLocator: Location of the fatal error, which could be either a source position in the case of loading, or a node reference for later validation. The public ID and system ID for the error location could be some of the information.
how of type DOMString: How the fatal error occurred.
why of type DOMString: Why the fatal error occurred.

Exceptions

DOMSystemException

A subclass of DOMException.

No Return Value

warning

Receive notification of a warning per the W3C XML 1.0 recommendation. The default behavior if the user doesn't register a handler is to report conditions that are not errors or fatal errors, and then allow the calling application to continue even after invoking this method.

Parameters

where of type DOMLocator: Location of the warning, which could be either a source position in the case of loading, or a node reference for later validation. The public ID and system ID for the error location could be some of the information.
how of type DOMString: How the warning occurred.
why of type DOMString: Why the warning occurred.

Exceptions

DOMSystemException

A subclass of DOMException.

No Return Value
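An application-side handler for the three severity levels might look like the following sketch. It is an illustration only: ToyLocator, ToyDOMSystemException, and CollectingErrorHandler are hypothetical stand-ins, and the choice to raise from fatal_error after recording the report is an assumption of this sketch rather than a requirement of the interface.

```python
from dataclasses import dataclass
from typing import Optional


class ToyDOMSystemException(Exception):
    """Hypothetical stand-in for DOMSystemException."""


@dataclass
class ToyLocator:
    """Hypothetical stand-in for DOMLocator; -1 and None mean 'not available'."""
    line_number: int = -1
    column_number: int = -1
    public_id: Optional[str] = None
    system_id: Optional[str] = None


class CollectingErrorHandler:
    """Records every report; stops processing only on fatal errors."""

    def __init__(self):
        self.reports = []

    def _record(self, severity, where, how, why):
        self.reports.append(
            (severity, where.line_number, where.column_number, how, why)
        )

    def warning(self, where, how, why):
        self._record("warning", where, how, why)

    def error(self, where, how, why):
        # Recoverable: record it and let the caller continue.
        self._record("error", where, how, why)

    def fatal_error(self, where, how, why):
        # Non-recoverable: record it, then abort by raising.
        self._record("fatalError", where, how, why)
        raise ToyDOMSystemException(f"{how}: {why}")
```

Collecting reports instead of printing them lets an editing application present all warnings and errors together once an operation has finished.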

Interface DOMLocator

This interface provides document location information and is similar to a SAX locator object.

IDL Definition

interface DOMLocator {
  int                getColumnNumber();
  int                getLineNumber();
  DOMString          getPublicID();
  DOMString          getSystemID();
  Node               getNode();
};

Methods

getColumnNumber

Return the column number.

Return Value

int

The column number, or -1 if none is available.

No Parameters

No Exceptions

getLineNumber

Return the line number.

Return Value

int

The line number, or -1 if none is available.

No Parameters

No Exceptions

getNode

Return the Node.

Return Value

Node

The Node, or null if none is available.

No Parameters

No Exceptions

getPublicID

Return the public identifier.

Return Value

DOMString

A string containing the public identifier, or null if none is available.

No Parameters

No Exceptions

getSystemID

Return the system identifier.

Return Value

DOMString

A string containing the system identifier, or null if none is available.

No Parameters

No Exceptions

1.6. Editing and Generating a Content Model

Editing and generating a content model falls in the CM-editing world. The most obvious requirement for this set of requirements is for tools that author content models, either under user control, i.e., explicitly designed document types, or generated from other representations. The latter class includes transcoding tools, e.g., synthesizing an XML representation to match a database schema.

It's important to note here that a DTD's "internal subset" is part of the Content Model, yet is loaded, stored, and maintained as part of the individual document instance. This implies that even tools which do not want to let users change the definition of the Document Type may need to support editing operations upon this portion of the CM. It also means that our representation of the CM must be aware of where each portion of its content resides, so that when the serializer processes this document it can write out just the internal subset. A similar issue may arise with external parsed entities, or if schemas introduce the ability to reference other schemas. Finally, the internal-subset case suggests that we may want at least a two-level representation of content models, so a single DOM representation of a DTD can be shared among several documents, each potentially also having its own internal subset; it's possible that entity layering may be represented the same way.

The API for altering the content model may also be the CM's official interface with parsers. One of the ongoing problems in the DOM is that there is some information which must currently be created via completely undocumented mechanisms, which limits the ability to mix and match DOMs and parsers. Given that specialized DOMs are going to become more common (sub-classed, or wrappers around other kinds of storage, or optimized for specific tasks), we must avoid that situation and provide a "builder" API. Particular pairs of DOMs and parsers may bypass it, but it's required as a portability mechanism.

Note that several of these applications require that a CM be able to be created, loaded, and manipulated without/before being bound to a specific Document. A related issue is that we'd want to be able to share a single representation of a CM among several documents, both for storage efficiency and so that changes in the CM can quickly be tested by validating it against a set of known-good documents. Similarly, there is a known problem in DOM Level 2 where we assume that the DocumentType will be created before the Document, which is fine for newly-constructed documents but not a good match for the order in which an XML parser encounters this data; being able to "rebind" a Document to a new CM, after it has been created may be desirable.

As noted earlier, questions about whether one can alter the content of the CM via its syntax, via higher-level abstractions, or both, exist. It's also worth noting that many of the editing concepts from the Document tree still apply; users should probably be able to clone part of a CM, remove and re-insert parts, and so on.

1.7. Content Model-directed Document Manipulation

In addition to using the content model to validate a document instance, applications would like to be able to use it to guide construction and editing of documents, which falls into the document-editing world. Examples of this sort of guided editing already exist, and are becoming more common. The necessary queries can be phrased in several ways, the most useful of which may be a combination of "what does the DTD allow me to insert here" and "if I insert this here, will the document still be valid". The former is better suited to presentation to humans via a user interface, and when taken together with sub-tree validation may subsume the latter.

It has been proposed that in addition to asking questions about specific parts of the content model, there should be a reasonable way to obtain a list of all the defined symbols of a given type (element, attribute, entity) independent of whether they're valid in a given location; that might be useful in building a list in a user-interface, which could then be updated to reflect which of these are relevant for the program's current state.

Remember that namespaces also weigh in on this issue. In the case of attributes, a "can-this-go-there" query may prompt a namespace-well-formedness check and warn you if you're about to conflict with or overwrite another attribute with the same namespaceURI/localName but a different prefix, or the same nodeName but a different namespaceURI.

As mentioned above, we have to deal with the fact that the shortest distance between two valid documents may be through an invalid one. Users may want to know several levels of detail (all the possible children, those which would be valid given what precedes this point, those which would be valid given both preceding and following siblings). Also, once XML Schemas introduce context sensitive validity, we may have to consider the effect of children as well as the individual node being inserted.
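The query "what may be inserted here, given what precedes this point" can be sketched with a small state machine. The sketch hard-codes a hypothetical content model equivalent to the DTD expression (title, para+) as a transition table. The names TRANSITIONS, ACCEPTING, state_after, allowed_next, and is_complete are assumptions made for illustration.

```python
# Transition table for the toy content model (title, para+).
# State 0: nothing seen; 1: title seen; 2: at least one para seen.
TRANSITIONS = {
    0: {"title": 1},
    1: {"para": 2},
    2: {"para": 2},
}
ACCEPTING = {2}


def state_after(children):
    """Return the state reached after the given child names, or None if stuck."""
    state = 0
    for name in children:
        state = TRANSITIONS.get(state, {}).get(name)
        if state is None:
            return None
    return state


def allowed_next(children):
    """Names that may follow the existing children without leaving the model."""
    state = state_after(children)
    if state is None:
        return set()
    return set(TRANSITIONS.get(state, {}))


def is_complete(children):
    """True if the children already form a valid instance of the model."""
    return state_after(children) in ACCEPTING
```

With this model, a parent holding only a title is not yet valid, but it lies on a path to a valid document. That is the situation in which an editor needs to distinguish "incomplete" from "wrong".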

1.8. Validating a Document Against a Content Model

The most obvious use for a content model (DTD or XML Schema or any Content Model) is to use it to validate that a given XML document is in fact a properly constructed instance of the document type described by this CM. This again falls into the document-editing world. The XML spec only discusses performing this test at the time the document is loaded into the "processor", which most of us have taken to mean that this check should be performed at parse time. But it is obviously desirable to be able to validate a document -- or selected subtrees -- again at other times. One such case would be validating an edited or newly constructed document before serializing it or otherwise passing it to other users. This issue also arises if the "internal subset" is altered -- or if the whole Content Model changes.

In the past, the DOM has allowed users to create invalid documents, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax... or that they would be checked for validity when read back in. We considered adding validity checks to the DOM's existing editing operations to prevent creation of invalid documents, but are currently inclined against this for several reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations, e.g., if the change is occurring in a context where we know the result will be valid. Second, "the shortest distance between two good documents may be through a bad document". Preventing a document from becoming temporarily invalid may impose a considerable amount of additional work on higher-level code and users. Hence our current plan is to continue to permit editing to produce invalid DOMs, but provide operations which permit a user to check the validity of a node on demand.

Note that validation includes checking that ID attributes are unique, and that IDREFs point to IDs which actually exist.
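The ID and IDREF checks mentioned above can be sketched with Python's standard xml.etree.ElementTree module. Because the sketch has no content model to consult, it assumes that an attribute named "id" is of type ID and an attribute named "ref" is of type IDREF. In a real implementation these types would come from the active CM. The function name check_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_ids(root, id_attr="id", idref_attr="ref"):
    """Return (duplicate_ids, dangling_idrefs) for the tree under root.

    Assumption: id_attr and idref_attr name the ID- and IDREF-typed
    attributes; a real check would read the types from the content model.
    """
    ids = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    dangling = sorted(
        {
            el.get(idref_attr)
            for el in root.iter()
            if el.get(idref_attr) is not None and el.get(idref_attr) not in ids
        }
    )
    return duplicates, dangling
```

Both lists being empty means the subtree passes these two constraints. Running the same function on a subtree alone may report dangling references whose targets lie elsewhere in the document.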

1.9. Well-formedness Testing

XML defined the "well-formed" (WF) state for documents which are parsed without reference to their DTDs. Knowing that a document is well-formed may be useful by itself even when a DTD is available. For example, users may wish to deliberately save an invalid document, perhaps as a checkpoint before further editing. Hence, the CM feature will permit both full validity checking (see previous section) and "lightweight" WF checking, as requested by the caller, as well as processing entity declarations in the CM even if validation is not turned on. This falls within the document-editing world.

While the DOM inherently enforces some of XML's well-formedness conditions (proper nesting of elements, constraints on which children may be placed within each node), there are some checks that are not yet performed. These include:

Character restrictions for text content and attribute values. Some characters aren't permitted even when expressed as numeric character references.
The three-character sequence "]]>" in CDATASections.
The two-character sequence "--" in comments. (Which, be it noted, some XML validators don't currently remember to test...)
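The three checks listed above are simple string tests, which can be sketched as follows. The character ranges are those of the Char production in XML 1.0. The function names are hypothetical, and the sketch checks node content only, not serialized markup.

```python
def is_xml_char(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def text_is_wf(data):
    """Text content and attribute values may contain only legal characters."""
    return all(is_xml_char(ch) for ch in data)


def cdata_is_wf(data):
    """A CDATA section must not contain its own terminator."""
    return text_is_wf(data) and "]]>" not in data


def comment_is_wf(data):
    """A comment must not contain '--' and must not end with '-'."""
    return text_is_wf(data) and "--" not in data and not data.endswith("-")
```

The trailing-hyphen test in comment_is_wf follows from the XML 1.0 comment grammar: a comment ending in "-" would serialize as "--->", which the grammar does not allow.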

In addition, Namespaces introduce their own concepts of well-formedness. Specifically:

No two attributes on a single Element may have the same combination of namespaceURI and localName, even if their prefixes are different and hence they don't conflict under XML 1.0 rules.
NamespaceURIs must be legal URI syntax. (Note that once we have this code, it may be reusable for the URI "datatype" in document content; see discussion of datatypes.)
The mapping of namespace prefixes to their URIs must be declared and consistent. That isn't required during normal DOM operation, since we perform "early binding" and thereafter refer to nodes primarily via their namespaceURIs and localName. But it does become an issue when we want to serialize the DOM to XML syntax, and may be an issue if an application is assuming that all the declarations are present and correct. This may imply that we should provide a namespaceNormalize operation, which would create the implied declarations and reconcile conflicts in some reasonably standardized manner. This may be a major undertaking, since some DOMs may be using the namespace to direct subclassing of the nodes or similar special treatment; as with the existing normalize method, you may be left with a different-but-equivalent set of node objects.
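The first of these namespace checks, that no two attributes on an element share the same namespaceURI and localName, can be sketched as below. Attributes are modeled as plain (prefix, namespaceURI, localName) tuples rather than Attr nodes, and the function name is hypothetical.

```python
def find_ns_attribute_conflicts(attributes):
    """Find attributes that collide on (namespaceURI, localName).

    attributes: iterable of (prefix, namespace_uri, local_name) tuples
                for a single element.
    Returns a list of (first_prefix, clashing_prefix, namespace_uri,
    local_name) tuples, one per clash.
    """
    seen = {}
    conflicts = []
    for prefix, ns_uri, local in attributes:
        key = (ns_uri, local)
        if key in seen:
            conflicts.append((seen[key], prefix, ns_uri, local))
        else:
            seen[key] = prefix
    return conflicts
```

Two attributes written as a:id and b:id are distinct under XML 1.0 rules, but if both prefixes are bound to the same URI they collide here, which is exactly the case the namespace well-formedness check is meant to catch.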

In the past, the DOM has allowed users to create documents which violate these rules, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax. We considered adding WF checks to the DOM's existing editing operations to prevent WF violations from arising, but are currently inclined against this for two reasons. First, it would impose a significant amount of computational overhead to the DOM, which might be unnecessary in many situations (for example, if the change is occurring in a context where we know the illegal characters have already been prevented from arising). Second, "the shortest distance between two good documents may be through a bad document" -- preventing a document from becoming temporarily ill-formed may impose a considerable amount of additional work on higher-level code and users. (Note possible issue for Serialization: In some applications, being able to save and reload marginally poorly-formed DOMs might be useful -- editor checkpoint files, for example.) Hence our current plan is to continue to permit editing to produce ill-formed DOMs, but provide operations which permit a user to check the well-formedness of a node on demand, and possibly provide some of the primitive (e.g., string-checking) functions directly.
