1. Validation

Editors: Ben Chang, Oracle; Joe Kesselman, IBM (until September 2001); Rezaur Rahman, Intel Corporation (until July 2001)

1.1. Overview
- 1.1.1. General Characteristics
- 1.1.2. Use Cases and Requirements
1.2. Exceptions
- ExceptionVAL, ExceptionVALCode
1.3. Document-Editing Interfaces
- DocumentEditVAL, NodeEditVAL, ElementEditVAL, CharacterDataEditVAL
1.4. Document Manipulation
1.5. Validating a Document
1.6. Well-formedness Testing

1.1. Overview

This chapter describes the optional DOM Level 3 Validation feature. This module provides APIs to query information about the XML document.

A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether a given DOM supports these capabilities. This module defines one feature string: "VAL-DOC" for document-editing interfaces.

This chapter focuses on the editing aspects used in the XML document-editing world and usage of such information.

1.1.1. General Characteristics

In the October 9, 1997 DOM requirements document, the following appeared: "There will be a way to determine the presence of a DTD. There will be a way to add, remove, and change declarations in the underlying DTD (if available). There will be a way to test conformance of all or part of the given document against a DTD (if available)." In later discussions, the following was added, "There will be a way to query element/attribute (and maybe other) declarations in the underlying DTD (if available)," supplementing the primitive support for these in Level 1.

That work was deferred past Level 2, in the hope that XML Schemas would be addressed as well. Work was deferred on lowest common denominator general grammar APIs due to heightened interest in XML Schema-specific APIs; however, work on querying information on the grammar was only done for DOM Level 3.

1.1.2. Use Cases and Requirements

The following use cases and requirements prompted the functionality in this document:

Use Cases:

DU1. For editing documents with an associated grammar, provide the guidance necessary so that valid documents can be modified and remain valid.
DU2. For editing documents with an associated grammar, provide the guidance necessary to transform an invalid document into a valid one.

Requirements:

DR1. Be able to determine if the document is well-formed, and if not, be given enough guidance to locate the error.
DR2. Be able to determine if the document is namespace well-formed, and if not, be given enough guidance to locate the error.
DR3. Be able to determine if the document is valid with respect to its associated grammar.
DR4. Be able to determine if specific modifications to a document would make it become invalid.
DR5. Retrieve information from the grammar as a whole. One example might be getting a list of all the defined element names for document editing purposes.

1.2. Exceptions

This section describes the "VAL-DOC" exceptions.

Exception ExceptionVAL

Operations may throw an ExceptionVAL as specified in their descriptions.

IDL Definition

exception ExceptionVAL {
  unsigned short   code;
};
// ExceptionVALCode
const unsigned short      NO_GRAMMAR_AVAILABLE           = 71;
const unsigned short      VALIDATION_ERR                 = 72;

Definition group ExceptionVALCode

An integer indicating the type of error generated.

Defined Constants

NO_GRAMMAR_AVAILABLE: Raised if the DocumentEditVAL related to the node does not have any grammar and wFValidityCheckLevel is set to PARTIAL_VALIDITY_CHECK or STRICT_VALIDITY_CHECK.
VALIDATION_ERR: Raised if document is invalid.
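As a minimal sketch of how a language binding might surface this exception, the following maps ExceptionVAL and its two codes onto a Python exception class. The class layout and the describe helper are assumptions made for illustration; only the code values 71 and 72 come from the IDL above.

```python
# ExceptionVALCode values, as given in the IDL definition.
NO_GRAMMAR_AVAILABLE = 71
VALIDATION_ERR = 72


class ExceptionVAL(Exception):
    """Hypothetical Python rendering of the IDL exception: it carries a code."""

    def __init__(self, code: int, message: str = ""):
        super().__init__(message or f"ExceptionVAL code {code}")
        self.code = code


def describe(exc: ExceptionVAL) -> str:
    """Map a code back to its constant name (illustrative helper, not in the IDL)."""
    names = {
        NO_GRAMMAR_AVAILABLE: "NO_GRAMMAR_AVAILABLE",
        VALIDATION_ERR: "VALIDATION_ERR",
    }
    return names.get(exc.code, "UNKNOWN")


try:
    raise ExceptionVAL(VALIDATION_ERR, "document is invalid")
except ExceptionVAL as exc:
    caught_name = describe(exc)
```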

1.3. Document-Editing Interfaces

This section contains "Document-editing" methods (includes Node, Element, Text and Document methods).

A DOM application may use the hasFeature(feature, version) method of the DOMImplementation interface with parameter values "VAL-DOC" and "3.0" (respectively) to determine whether or not the Document-Editing interfaces are supported by the implementation.
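The feature test can be sketched as follows. Python's standard xml.dom.minidom is used here only as a convenient DOM implementation to call hasFeature on; it is not expected to implement the Validation module, so an application would normally take the fallback branch. The helper name supports_val_doc is hypothetical.

```python
from xml.dom.minidom import getDOMImplementation


def supports_val_doc(impl) -> bool:
    """Return True if impl advertises the "VAL-DOC" feature at version "3.0"."""
    return bool(impl.hasFeature("VAL-DOC", "3.0"))


impl = getDOMImplementation()
if supports_val_doc(impl):
    print("Document-editing validation interfaces are available")
else:
    print("VAL-DOC 3.0 is not supported; editing proceeds without guidance")
```

Testing the feature once, up front, lets an editor decide whether to offer guided editing at all rather than probing each node for the interfaces.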

Interface DocumentEditVAL

This interface extends the NodeEditVAL interface with additional methods for document editing.

IDL Definition

interface DocumentEditVAL : NodeEditVAL {
           attribute boolean         continuousValidityChecking;
  void               validateDocument()
                                        raises(ExceptionVAL);
};

Attributes

continuousValidityChecking of type boolean: An attribute specifying whether continuous checking for the validity of the document is enforced or not. Setting this to true will result in an exception being thrown, i.e., VALIDATION_ERR, for documents that are invalid at the time of the call. When set to true, the implementation is free to raise the VALIDATION_ERR exception on DOM operations that would make the document invalid with respect to "partial validity." If the document is invalid, then this attribute will remain false. This attribute is false by default.

Methods

validateDocument

Validates the document against the grammar. If the document is mutated during validation, a warning will be issued. In addition, the validation cannot modify the document, e.g., for default attributes. This method makes use of the passed-in error handler, as described in [DOM Level 3 Core] interface.

Exceptions

ExceptionVAL

NO_GRAMMAR_AVAILABLE: Raised if no grammar is available for the document.

No Parameters

No Return Value
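The behaviour of continuousValidityChecking described above can be illustrated with a toy model. This is a sketch under stated assumptions, not an implementation of the interface: ToyDocument, its flat list of child names, and the "grammar" expressed as a set of allowed names are all hypothetical stand-ins for a real DOM and grammar.

```python
VALIDATION_ERR = 72  # ExceptionVALCode value from the IDL


class ExceptionVAL(Exception):
    def __init__(self, code):
        super().__init__(f"ExceptionVAL code {code}")
        self.code = code


class ToyDocument:
    """Toy stand-in: the grammar is just a set of allowed child names."""

    def __init__(self, children, allowed):
        self.children = list(children)
        self.allowed = set(allowed)
        self._continuous = False  # the attribute is false by default

    def _valid(self, children):
        return all(name in self.allowed for name in children)

    @property
    def continuous_validity_checking(self):
        return self._continuous

    @continuous_validity_checking.setter
    def continuous_validity_checking(self, value):
        if value and not self._valid(self.children):
            # Document is invalid at the time of the call:
            # raise, and leave the attribute false.
            raise ExceptionVAL(VALIDATION_ERR)
        self._continuous = bool(value)

    def append_child(self, name):
        # With continuous checking on, refuse edits that would break validity.
        if self._continuous and not self._valid(self.children + [name]):
            raise ExceptionVAL(VALIDATION_ERR)
        self.children.append(name)


doc = ToyDocument(["title", "para"], allowed={"title", "para"})
doc.continuous_validity_checking = True
try:
    doc.append_child("bogus")
    rejected_code = None
except ExceptionVAL as exc:
    rejected_code = exc.code
```

In this sketch the rejected edit leaves the document untouched, which is one way to read "the implementation is free to raise"; an implementation could equally choose not to raise.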

Interface NodeEditVAL

This interface is similar to the [DOM Level 3 Core] Node interfaces, with methods for guided document editing.

IDL Definition

interface NodeEditVAL {

  // CheckTypeVAL
  const unsigned short      WF_CHECK                       = 1;
  const unsigned short      NS_WF_CHECK                    = 2;
  const unsigned short      PARTIAL_VALIDITY_CHECK         = 3;
  const unsigned short      STRICT_VALIDITY_CHECK          = 4;

  boolean            canInsertBefore(in Node newChild, 
                                     in Node refChild);
  boolean            canRemoveChild(in Node oldChild);
  boolean            canReplaceChild(in Node newChild, 
                                     in Node oldChild);
  boolean            canAppendChild(in Node newChild);
  boolean            isNodeValid(in boolean deep, 
                                 in unsigned short wFValidityCheckLevel)
                                        raises(ExceptionVAL);
};

Definition group CheckTypeVAL

An integer indicating which type of validation this is.

Defined Constants

NS_WF_CHECK: Checks for namespace well-formedness of this node. It includes WF_CHECK.
PARTIAL_VALIDITY_CHECK: Checks whether this node is partially valid. It includes NS_WF_CHECK.
STRICT_VALIDITY_CHECK: Checks for strict validity of the node with respect to the active grammar, which by definition includes NS_WF_CHECK.
WF_CHECK: Checks for well-formedness of this node.
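The inclusion relationships among the check levels can be made explicit with a small table. The sketch below encodes only the inclusions stated in the constant descriptions and follows them transitively; the helper name implied_checks is hypothetical, and whether STRICT_VALIDITY_CHECK also implies PARTIAL_VALIDITY_CHECK is deliberately left out because the text does not say.

```python
WF_CHECK = 1
NS_WF_CHECK = 2
PARTIAL_VALIDITY_CHECK = 3
STRICT_VALIDITY_CHECK = 4

# Direct inclusions exactly as stated in the constant descriptions.
_INCLUDES = {
    WF_CHECK: (),
    NS_WF_CHECK: (WF_CHECK,),
    PARTIAL_VALIDITY_CHECK: (NS_WF_CHECK,),
    STRICT_VALIDITY_CHECK: (NS_WF_CHECK,),
}


def implied_checks(level):
    """Return the set of checks performed at a level, following inclusions."""
    if level not in _INCLUDES:
        raise ValueError(f"unknown check level: {level}")
    result = {level}
    for included in _INCLUDES[level]:
        result |= implied_checks(included)
    return result


strict = sorted(implied_checks(STRICT_VALIDITY_CHECK))    # [1, 2, 4]
partial = sorted(implied_checks(PARTIAL_VALIDITY_CHECK))  # [1, 2, 3]
```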

Methods

canAppendChild

Has the same arguments as Node.appendChild.

Parameters

newChild of type Node: Node to be appended.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions
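The intended calling pattern is "ask first, then edit". The following sketch shows that pattern with xml.dom.minidom nodes and a toy content table; the ALLOWED mapping and the can_append_child function are hypothetical stand-ins for a real grammar and for NodeEditVAL.canAppendChild, which minidom does not provide.

```python
from xml.dom.minidom import parseString

# Toy grammar: parent element name -> element names allowed as children.
ALLOWED = {"list": {"item"}, "item": set()}


def can_append_child(parent, new_child):
    """Return True if the toy grammar gives no reason to refuse the append."""
    if new_child.nodeType != new_child.ELEMENT_NODE:
        return True  # this toy grammar constrains element children only
    return new_child.tagName in ALLOWED.get(parent.tagName, set())


doc = parseString("<list><item/></list>")
root = doc.documentElement
candidates = [doc.createElement("item"), doc.createElement("table")]

for candidate in candidates:
    # Query first; mutate only when the query reports no objection.
    if can_append_child(root, candidate):
        root.appendChild(candidate)

child_names = [child.tagName for child in root.childNodes]
```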

canInsertBefore

Determines whether the Node.insertBefore operation would make this document not partially valid with respect to the currently active grammar.

Parameters

newChild of type Node: Node to be inserted.
refChild of type Node: Reference Node.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canRemoveChild

Has the same arguments as Node.removeChild.

Parameters

oldChild of type Node: Node to be removed.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canReplaceChild

Has the same arguments as Node.replaceChild.

Parameters

newChild of type Node: New Node.
oldChild of type Node: Node to be replaced.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

isNodeValid

Determines if the Node is valid relative to the currently active grammar. It doesn't normalize before checking if the document is valid. To do so, one would need to explicitly call a normalize method.

Parameters

deep of type boolean: Setting the deep flag to true causes the isNodeValid method to check the whole subtree of the current node for validity. Setting it to false checks only the current node and its immediate child nodes. The validateDocument method on the DocumentEditVAL interface, however, checks whether the entire document is valid.
wFValidityCheckLevel of type unsigned short: Flag to tell at what level validity and well-formedness checking is done.

Return Value

boolean

true if the node is valid/well-formed in the current context and at the check level defined by wFValidityCheckLevel, false if not.

Exceptions

ExceptionVAL

NO_GRAMMAR_AVAILABLE: Raised if the DocumentEditVAL related to this node does not have any grammar associated with it and wFValidityCheckLevel is set to PARTIAL_VALIDITY_CHECK or STRICT_VALIDITY_CHECK.
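The difference between a shallow and a deep check, and the NO_GRAMMAR_AVAILABLE case, can be sketched as follows. The grammar dictionary and the is_node_valid function are hypothetical; the well-formedness levels are not modelled, on the assumption that a tree built by a parser is already well-formed.

```python
from xml.dom.minidom import parseString

PARTIAL_VALIDITY_CHECK = 3
STRICT_VALIDITY_CHECK = 4
NO_GRAMMAR_AVAILABLE = 71


class ExceptionVAL(Exception):
    def __init__(self, code):
        super().__init__(f"ExceptionVAL code {code}")
        self.code = code


def _element_children(elem):
    return [c for c in elem.childNodes if c.nodeType == c.ELEMENT_NODE]


def is_node_valid(elem, deep, level, grammar):
    """Toy check: grammar maps an element name to its allowed child names."""
    if level not in (PARTIAL_VALIDITY_CHECK, STRICT_VALIDITY_CHECK):
        return True  # WF_CHECK / NS_WF_CHECK are not modelled in this sketch
    if grammar is None:
        raise ExceptionVAL(NO_GRAMMAR_AVAILABLE)
    allowed = grammar.get(elem.tagName, set())
    children = _element_children(elem)
    # Shallow part: the node together with its immediate children.
    if any(child.tagName not in allowed for child in children):
        return False
    # Deep part: repeat the same check for the whole subtree.
    if deep:
        return all(is_node_valid(c, True, level, grammar) for c in children)
    return True


GRAMMAR = {"list": {"item"}, "item": set(), "table": set()}
root = parseString("<list><item><table/></item></list>").documentElement

shallow = is_node_valid(root, False, PARTIAL_VALIDITY_CHECK, GRAMMAR)
deep = is_node_valid(root, True, PARTIAL_VALIDITY_CHECK, GRAMMAR)
```

Here the shallow check passes because "item" is an allowed child of "list", while the deep check fails because "table" is not allowed inside "item".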

Interface ElementEditVAL

This interface extends the Element interface with additional methods for guided document editing. An object implementing this interface must also implement the NodeEditVAL interface.

IDL Definition

interface ElementEditVAL : NodeEditVAL {
  readonly attribute NodeList        definedElementTypes;
  unsigned short     contentType();
  boolean            canSetAttribute(in DOMString attrname, 
                                     in DOMString attrval);
  boolean            canSetAttributeNode(in Attr attrNode);
  boolean            canSetAttributeNS(in DOMString namespaceURI, 
                                       in DOMString qualifiedName, 
                                       in DOMString value);
  boolean            canRemoveAttribute(in DOMString attrname);
  boolean            canRemoveAttributeNS(in DOMString namespaceURI, 
                                          in DOMString localName);
  boolean            canRemoveAttributeNode(in Node attrNode);
  NodeList           getChildElements();
  NodeList           getParentElements();
  NodeList           getAttributeList();
  boolean            isElementDefined(in DOMString name);
  boolean            isElementDefinedNS(in DOMString name, 
                                        in DOMString namespaceURI);
};

Attributes

definedElementTypes of type NodeList, readonly: The list of all element nodes defined in a non-namespace-aware grammar, or the list of all element nodes belonging to the particular namespace. These are not nodes from the instance document, but rather are new nodes that could be inserted in the document.

Methods

canRemoveAttribute

Verifies if an attribute by the given name can be removed.

Parameters

attrname of type DOMString: Name of attribute.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canRemoveAttributeNS

Verifies if an attribute by the given local name and namespace can be removed.

Parameters

namespaceURI of type DOMString: The namespace URI of the attribute to remove.
localName of type DOMString: Local name of the attribute to be removed.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canRemoveAttributeNode

Determines if an attribute node can be removed.

Parameters

attrNode of type Node: The Attr node to remove from the attribute list.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canSetAttribute

Determines if the value for specified attribute can be set.

Parameters

attrname of type DOMString: Name of attribute.
attrval of type DOMString: Value to be assigned to the attribute.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canSetAttributeNS

Determines if the attribute with given namespace and qualified name can be created if not already present in the attribute list of the element. If the attribute with the same qualified name and namespaceURI is already present in the element's attribute list, it tests whether the value of the attribute and its prefix can be set to the new value. See DOM core setAttributeNS.

Parameters

namespaceURI of type DOMString: namespaceURI of namespace.
qualifiedName of type DOMString: Qualified name of attribute.
value of type DOMString: Value to be assigned to the attribute.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canSetAttributeNode

Determines if an attribute node can be added with respect to the validity check level.

Parameters

attrNode of type Attr: Node in which the attribute can possibly be set.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

contentType

Determines element content type.

Return Value

unsigned short

Constant for one of EMPTY_CONTENTTYPE, ANY_CONTENTTYPE, MIXED_CONTENTTYPE, ELEMENTS_CONTENTTYPE.

No Parameters

No Exceptions

getAttributeList

Returns a NodeList containing all the possible Attrs that can appear with this type of element. These are not nodes from the instance document, but rather are new nodes that could be inserted in the document.

Return Value

NodeList

List of possible attributes of this element.

No Parameters

No Exceptions

getChildElements

Returns a NodeList containing the possible Element nodes that can appear as children of this type of element, with certain conditions as specified below. These are not nodes from the instance document, but rather are new nodes that could be inserted in the document.

Return Value

NodeList

List of possible children element types of this element. Note that if no context of this element exists, then a NULL is returned; an empty list is returned if the element is not in the document tree.

No Parameters

No Exceptions

getParentElements

Returns a NodeList containing the possible Element nodes that can appear as a parent of this type of element, with certain conditions as specified below. These are not nodes from the instance document, but rather are new nodes that could be inserted in the document.

Return Value

NodeList

List of possible parent element types of this element. Note that if no context of this element exists, for example, the parent element of this element, then a NULL is returned; an empty list is returned if the element is not in the document tree.

No Parameters

No Exceptions

isElementDefined

Determines if name is defined in the currently active grammar.

Parameters

name of type DOMString: Name of element.

Return Value

boolean

A boolean that is true if the element is defined, false otherwise.

No Exceptions

isElementDefinedNS

Determines if name in this namespace is defined in the currently active grammar.

Parameters

name of type DOMString: Name of element.
namespaceURI of type DOMString: namespaceURI of namespace.

Return Value

boolean

A boolean that is true if the element is defined, false otherwise.

No Exceptions
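The grammar queries on ElementEditVAL can be pictured as lookups in a table of element declarations. The sketch below is a simplification under stated assumptions: GRAMMAR is a hypothetical mapping from element name to permitted child names, and the functions return names rather than the new nodes that the interface describes.

```python
# Hypothetical grammar: element name -> names permitted as children.
GRAMMAR = {
    "book": ["title", "chapter"],
    "chapter": ["title", "para"],
    "title": [],
    "para": [],
}


def is_element_defined(name):
    """Analogue of isElementDefined: is the name declared in the grammar?"""
    return name in GRAMMAR


def defined_element_types():
    """Analogue of definedElementTypes: every declared element name."""
    return sorted(GRAMMAR)


def child_element_names(name):
    """Analogue of getChildElements: names that may appear as children."""
    return list(GRAMMAR.get(name, []))


def parent_element_names(name):
    """Analogue of getParentElements: names that may contain this element."""
    return sorted(p for p, kids in GRAMMAR.items() if name in kids)


title_parents = parent_element_names("title")    # ["book", "chapter"]
chapter_children = child_element_names("chapter")  # ["title", "para"]
```

An editor could use defined_element_types to populate a palette and then narrow it with child_element_names for the element under the cursor.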

Interface CharacterDataEditVAL

This interface extends the NodeEditVAL interface with additional methods for document editing. An object implementing this interface must also implement the NodeEditVAL interface.

IDL Definition

interface CharacterDataEditVAL : NodeEditVAL {
  readonly attribute boolean         isWhitespaceOnly;
  boolean            canSetData(in unsigned long offset, 
                                in DOMString arg);
  boolean            canAppendData(in DOMString arg);
  boolean            canReplaceData(in unsigned long offset, 
                                    in unsigned long count, 
                                    in DOMString arg);
  boolean            canInsertData(in unsigned long offset, 
                                   in DOMString arg);
  boolean            canDeleteData(in unsigned long offset, 
                                   in unsigned long count);
};

Attributes

isWhitespaceOnly of type boolean, readonly: true if the content consists only of whitespace; false if it contains any non-whitespace character.

Methods

canAppendData

Determines if data can be appended.

Parameters

arg of type DOMString: Data to be appended.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canDeleteData

Determines if data can be deleted.

Parameters

offset of type unsigned long: Offset.
count of type unsigned long: Number of 16-bit units to delete.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canInsertData

Determines if data can be inserted.

Parameters

offset of type unsigned long: Offset.
arg of type DOMString: Data to be inserted.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canReplaceData

Determines if data can be replaced.

Parameters

offset of type unsigned long: Offset.
count of type unsigned long: Number of 16-bit units to replace.
arg of type DOMString: Replacement data.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions

canSetData

Determines if data can be set.

Parameters

offset of type unsigned long: Offset.
arg of type DOMString: Argument to be set.

Return Value

boolean

true if no reason it can't be done; false if it can't be done.

No Exceptions
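A minimal sketch of two of these checks follows. It assumes the XML 1.0 definition of whitespace (space, tab, carriage return, line feed) and one plausible rule for canAppendData: character data inside element-only content may only be whitespace. Both function names and the parent_is_element_only flag are hypothetical, and treating empty content as whitespace-only is a choice this sketch makes, not something the text specifies.

```python
XML_WHITESPACE = " \t\r\n"  # the S production of XML 1.0


def is_whitespace_only(data):
    """Analogue of isWhitespaceOnly (empty data counts as whitespace here)."""
    return all(ch in XML_WHITESPACE for ch in data)


def can_append_data(parent_is_element_only, arg):
    """Toy analogue of canAppendData.

    In element-only content only whitespace may appear between child
    elements, so appending anything else would break validity.
    """
    if not parent_is_element_only:
        return True
    return is_whitespace_only(arg)


ok_indent = can_append_data(True, "\n  ")      # True
bad_text = can_append_data(True, "some text")  # False
```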

1.4. Document Manipulation

Applications would like to be able to use functionality to guide construction and editing of documents, which falls into the document-editing world. Examples of this sort of guided editing already exist, and are becoming more common. The necessary queries can be phrased in several ways, the most useful of which may be a combination of "what does the DTD allow me to insert here" and "if I insert this here, will the document still be valid". The former is better suited to presentation to humans via a user interface, and when taken together with sub-tree validation may subsume the latter.

It has been proposed that in addition to asking questions about specific parts of the grammar, there should be a reasonable way to obtain a list of all the defined symbols of a given type (element, attribute, entity) independent of whether they're valid in a given location; that might be useful in building a list in a user-interface, which could then be updated to reflect which of these are relevant for the program's current state.

Remember that namespaces also weigh in on this issue. In the case of attributes, a "can-this-go-there" query may prompt a namespace well-formedness check and warn you if you are about to conflict with or overwrite another attribute that has the same namespaceURI/localName but a different prefix, or the same nodeName but a different namespaceURI.

We have to deal with the fact that "the shortest distance between two valid documents may be through an invalid one". Users may want to know several levels of detail (all the possible children, those which would be valid given what precedes this point, those which would be valid given both preceding and following siblings). Also, once XML Schemas introduce context sensitive validity, we may have to consider the effect of children as well as the individual node being inserted.
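Two of the levels of detail mentioned above, "all the possible children" and "those which would be valid given what precedes this point", can be sketched for a simple sequence content model. Everything here is an assumption made for illustration: the model is a list of (name, min, max) particles with max None meaning unbounded, adjacent particles are assumed to have distinct names, and following siblings are not considered.

```python
# Hypothetical content model, roughly (title, para*, footer?).
MODEL = [("title", 1, 1), ("para", 0, None), ("footer", 0, 1)]


def all_children(model):
    """Every child name the model mentions, regardless of position."""
    return [name for name, _lo, _hi in model]


def _advance(model, i, count, name):
    """State after consuming name from particle i, or None if not allowed."""
    while i < len(model):
        particle, lo, hi = model[i]
        if particle == name and (hi is None or count < hi):
            return i, count + 1
        if count < lo:
            return None  # a required particle would be skipped
        i, count = i + 1, 0
    return None


def allowed_next(model, preceding):
    """Names that may validly follow the given preceding siblings."""
    i, count = 0, 0
    for name in preceding:
        state = _advance(model, i, count, name)
        if state is None:
            return []  # the preceding siblings already violate the model
        i, count = state
    names = []
    while i < len(model):
        particle, lo, hi = model[i]
        if hi is None or count < hi:
            names.append(particle)
        if count < lo:
            break  # this particle is still required; nothing later may come
        i, count = i + 1, 0
    return names


at_start = allowed_next(MODEL, [])            # ["title"]
after_title = allowed_next(MODEL, ["title"])  # ["para", "footer"]
```

A user interface might show all_children greyed out and highlight only the names returned by allowed_next for the current insertion point.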

1.5. Validating a Document

The most obvious use for a DTD or XML Schema or any grammar is to use it to validate a given XML document. This again falls into the document-editing world. The XML spec only discusses performing this test at the time the document is loaded into the "processor", which most of us have taken to mean that this check should be performed at parse time. But it is obviously desirable to be able to validate a document -- or selected subtrees -- again at other times. One such case would be validating an edited or newly constructed document before serializing it or otherwise passing it to other users. This issue also arises if the "internal subset" is altered -- or if the grammar changes.

In the past, the DOM has allowed users to create invalid documents, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax... or that they would be checked for validity when read back in. We considered adding validity checks to the DOM's existing editing operations to prevent creation of invalid documents, but are currently inclined against this for several reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations, e.g., if the change is occurring in a context where we know the result will be valid. Second, "the shortest distance between two good documents may be through a bad document". Preventing a document from becoming temporarily invalid may impose a considerable amount of additional work on higher-level code and users. Hence our current plan is to continue to permit editing to produce invalid DOMs, but provide operations which permit a user to check the validity of a node on demand. If needed, one can use the continuousValidityChecking flag to ensure that the DOM remains valid during the editing process.

Note that validation includes checking that ID attributes are unique, and that IDREFs point to IDs which actually exist.
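The ID and IDREF checks can be sketched with xml.dom.minidom. Because minidom carries no attribute type information here, the sketch assumes that an attribute named "id" is of type ID and one named "ref" is of type IDREF; those names and the check_ids helper are hypothetical.

```python
from xml.dom.minidom import parseString


def check_ids(doc, id_attr="id", idref_attr="ref"):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    seen, duplicates, refs = set(), [], []
    for elem in doc.getElementsByTagName("*"):
        if elem.hasAttribute(id_attr):
            value = elem.getAttribute(id_attr)
            if value in seen:
                duplicates.append(value)
            seen.add(value)
        if elem.hasAttribute(idref_attr):
            refs.append(elem.getAttribute(idref_attr))
    # References are resolved only after all IDs are known,
    # so forward references are handled correctly.
    dangling = [ref for ref in refs if ref not in seen]
    return duplicates, dangling


doc = parseString(
    '<doc><link ref="b"/><sec id="a"/><sec id="a"/>'
    '<sec id="b"/><link ref="missing"/></doc>'
)
duplicates, dangling = check_ids(doc)
```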

1.6. Well-formedness Testing

XML defined the "well-formed" (WF) state for documents which are parsed without reference to their DTDs. Knowing that a document is well-formed may be useful by itself even when a DTD is available. For example, users may wish to deliberately save an invalid document, perhaps as a checkpoint before further editing. Hence, the "Validation" features will permit both full validity checking (see previous section) and "lightweight" WF checking, as requested by the caller, as well as processing entity declarations in the AS even if validation is not turned on.

While the DOM inherently enforces some of XML's well-formedness conditions (proper nesting of elements, constraints on which children may be placed within each node), there are some checks that are not yet performed. These include:

- Character restrictions for text content and attribute values. Some characters aren't permitted even when expressed as numeric character entities.
- The three-character sequence "]]>" in CDATASections.
- The two-character sequence "--" in comments. (Which, be it noted, some XML validators don't currently remember to test...)
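The three checks listed above are simple string tests, sketched below. The character class follows the Char production of XML 1.0; the comment check also rejects a trailing "-", which the XML 1.0 comment grammar likewise forbids. The function names are hypothetical.

```python
import re

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def has_illegal_chars(text):
    """True if text contains a character XML 1.0 does not permit."""
    return _ILLEGAL.search(text) is not None


def cdata_ok(data):
    """A CDATA section must not contain its own terminator."""
    return "]]>" not in data and not has_illegal_chars(data)


def comment_ok(data):
    """A comment must not contain "--" and must not end with "-"."""
    return (
        "--" not in data
        and not data.endswith("-")
        and not has_illegal_chars(data)
    )


checks = {
    "nul in text": has_illegal_chars("a\x00b"),
    "cdata terminator": cdata_ok("x]]>y"),
    "double hyphen": comment_ok("a -- b"),
}
```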

In addition, Namespaces introduce their own concepts of well-formedness. Specifically:

- No two attributes on a single Element may have the same combination of namespaceURI and localName, even if their prefixes are different and hence they don't conflict under XML 1.0 rules.
- NamespaceURIs must be legal URI syntax. (Note that once we have this code, it may be reusable for the URI "datatype" in document content; see discussion of datatypes.)
- The mapping of namespace prefixes to their URIs must be declared and consistent. That isn't required during normal DOM operation, since we perform "early binding" and thereafter refer to nodes primarily via their namespaceURIs and localName. But it does become an issue when we want to serialize the DOM to XML syntax, and may be an issue if an application is assuming that all the declarations are present and correct. This may imply that we should provide a namespaceNormalize operation, which would create the implied declarations and reconcile conflicts in some reasonably standardized manner. This may be a major undertaking, since some DOMs may be using the namespace to direct subclassing of the nodes or similar special treatment; as with the existing normalize method, you may be left with a different-but-equivalent set of node objects.
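The first of these conditions can be checked by comparing expanded names rather than qualified names. The sketch below works on plain (namespaceURI, prefix, localName) tuples rather than DOM nodes, which is an assumption made to keep it self-contained; the function name is hypothetical. Attributes without a namespace are skipped, since duplicates among them are already excluded by XML 1.0.

```python
from collections import Counter


def duplicate_expanded_names(attrs):
    """Return (namespaceURI, localName) pairs that occur more than once.

    attrs is an iterable of (namespaceURI, prefix, localName) tuples;
    the prefix is ignored on purpose, because two different prefixes
    bound to the same URI still name the same attribute.
    """
    counts = Counter(
        (uri, local) for uri, _prefix, local in attrs if uri is not None
    )
    return sorted(name for name, n in counts.items() if n > 1)


attributes = [
    ("urn:example:x", "p", "id"),
    ("urn:example:x", "q", "id"),   # same expanded name, different prefix
    ("urn:example:y", "r", "id"),   # different namespace: no conflict
    (None, None, "id"),             # no namespace: outside this check
]
conflicts = duplicate_expanded_names(attributes)
```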

In the past, the DOM has allowed users to create documents which violate these rules, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax. We considered adding WF checks to the DOM's existing editing operations to prevent WF violations from arising, but are currently inclined against this for two reasons. First, it would impose a significant amount of computational overhead to the DOM, which might be unnecessary in many situations (for example, if the change is occurring in a context where we know the illegal characters have already been prevented from arising). Second, "the shortest distance between two good documents may be through a bad document" -- preventing a document from becoming temporarily ill-formed may impose a considerable amount of additional work on higher-level code and users. (Note possible issue for Serialization: In some applications, being able to save and reload marginally poorly-formed DOMs might be useful -- editor checkpoint files, for example.) Hence our current plan is to continue to permit editing to produce ill-formed DOMs, but provide operations which permit a user to check the well-formedness of a node on demand, and possibly provide some of the primitive (e.g., string-checking) functions directly.
