This chapter describes the optional DOM Level 3 Validation feature. This module provides APIs to query information about the XML document.
A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether a given DOM supports these capabilities or not. This module defines one feature string: "VAL-DOC" for document-editing interfaces.
This chapter focuses on the editing aspects used in the XML document-editing world and usage of such information.
In the October 9, 1997 DOM requirements document, the following appeared: "There will be a way to determine the presence of a DTD. There will be a way to add, remove, and change declarations in the underlying DTD (if available). There will be a way to test conformance of all or part of the given document against a DTD (if available)." In later discussions, the following was added, "There will be a way to query element/attribute (and maybe other) declarations in the underlying DTD (if available)," supplementing the primitive support for these in Level 1.
That work was deferred past Level 2, in the hope that XML Schemas would be addressed as well. Work on lowest-common-denominator general grammar APIs was deferred due to heightened interest in XML Schema-specific APIs; however, work on querying information on the grammar was only done for DOM Level 3.
The following use cases and requirements prompted the functionality in this document:
Use Cases:
Requirements:
This section describes the "VAL-DOC" exceptions.
These operations may throw an ExceptionVAL as described in their descriptions.
exception ExceptionVAL {
  unsigned short   code;
};
// ExceptionVALCode
const unsigned short      NO_GRAMMAR_AVAILABLE           = 71;
const unsigned short      VALIDATION_ERR                 = 72;
An integer indicating the type of error generated.
NO_GRAMMAR_AVAILABLE
Raised when the DocumentEditVAL related to the node does not have any grammar and wfValidityCheckLevel is set to PARTIAL_VALIDITY_CHECK or STRICT_VALIDITY_CHECK.
VALIDATION_ERR
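As an illustration only, the exception and its two codes could be modeled in Python roughly as follows. This is a sketch, not a language binding defined by this document: the class layout and the describe helper are assumptions, and only the names ExceptionVAL, NO_GRAMMAR_AVAILABLE, VALIDATION_ERR and the values 71 and 72 come from the IDL.

```python
class ExceptionVAL(Exception):
    """Python stand-in for the IDL exception ExceptionVAL (a sketch, not a binding)."""

    # ExceptionVALCode values, copied from the IDL.
    NO_GRAMMAR_AVAILABLE = 71
    VALIDATION_ERR = 72

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def describe(exc):
    """Map an ExceptionVAL code to its constant name (hypothetical helper)."""
    names = {
        ExceptionVAL.NO_GRAMMAR_AVAILABLE: "NO_GRAMMAR_AVAILABLE",
        ExceptionVAL.VALIDATION_ERR: "VALIDATION_ERR",
    }
    return names.get(exc.code, "UNKNOWN")


# A caller inspects the code attribute to tell the two error types apart.
try:
    raise ExceptionVAL(ExceptionVAL.NO_GRAMMAR_AVAILABLE, "no grammar for this document")
except ExceptionVAL as exc:
    caught = describe(exc)
```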
This section contains "Document-editing" methods (includes Node, Element, Text and Document methods).
A DOM application may use the hasFeature(feature, version) method of the DOMImplementation interface with parameter values "VAL-DOC" and "3.0" (respectively) to determine whether or not the Document-Editing interfaces are supported by the implementation.
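The feature test can be tried against any DOM implementation that exposes hasFeature. The sketch below uses Python's standard xml.dom.minidom purely as a convenient example; minidom is not expected to implement this module, so the check should report that the interfaces are unavailable. The variable names are illustrative.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

# "VAL-DOC" and "3.0" are the feature string and version named in the text.
supports_val_doc = impl.hasFeature("VAL-DOC", "3.0")

if supports_val_doc:
    print("Document-editing interfaces are available")
else:
    # Fall back to plain DOM editing without guided-editing queries.
    print("Document-editing interfaces are not available")
```

An application would typically run this test once at startup and enable its guided-editing features only when the result is true.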
This interface extends the NodeEditVAL interface with additional methods for document editing.
interface DocumentEditVAL : NodeEditVAL {
           attribute boolean         continuousValidityChecking;
  void               validateDocument()
                                        raises(ExceptionVAL);
};
continuousValidityChecking of type boolean
Setting this attribute to true will result in an exception being thrown, i.e., VALIDATION_ERR, for documents that are invalid at the time of the call. When set to true, the implementation is free to raise the VALIDATION_ERR exception on DOM operations that would make the document invalid with respect to "partial validity." If the document is invalid, then this attribute will remain false. This attribute is false by default.
validateDocument
Exceptions:
NO_GRAMMAR_AVAILABLE: Raised if the grammar is not available for the document.
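The behavior of continuousValidityChecking can be illustrated with a small Python model. Everything here other than the attribute name and the VALIDATION_ERR code is an assumption made for the example: ToyDocument, its append method, and the one_title rule are hypothetical, and the toy checks full validity where the text speaks of "partial validity".

```python
class ExceptionVAL(Exception):
    VALIDATION_ERR = 72

    def __init__(self, code):
        super().__init__(f"ExceptionVAL code {code}")
        self.code = code


class ToyDocument:
    """Hypothetical document whose validity is decided by a caller-supplied rule."""

    def __init__(self, children, is_valid):
        self.children = list(children)
        self._is_valid = is_valid
        self._continuous = False  # false by default, as the text states

    @property
    def continuousValidityChecking(self):
        return self._continuous

    @continuousValidityChecking.setter
    def continuousValidityChecking(self, value):
        if value and not self._is_valid(self.children):
            # Invalid at the time of the call: raise, and the attribute stays false.
            raise ExceptionVAL(ExceptionVAL.VALIDATION_ERR)
        self._continuous = bool(value)

    def append(self, name):
        # With checking on, refuse edits that would make the document invalid.
        if self._continuous and not self._is_valid(self.children + [name]):
            raise ExceptionVAL(ExceptionVAL.VALIDATION_ERR)
        self.children.append(name)


def one_title(children):
    """Toy validity rule (an assumption): exactly one "title" child."""
    return children.count("title") == 1


doc = ToyDocument(["para"], one_title)
try:
    doc.continuousValidityChecking = True  # document is invalid right now
except ExceptionVAL as exc:
    first_error = exc.code

doc.append("title")                        # allowed: checking is still off
doc.continuousValidityChecking = True      # document is valid now, so this succeeds
try:
    doc.append("title")                    # a second title would be invalid
except ExceptionVAL as exc:
    second_error = exc.code
```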
This interface is similar to the [DOM Level 3 Core] Node interface, with methods for guided document editing.
interface NodeEditVAL {

  // CheckTypeVAL
  const unsigned short      WF_CHECK                       = 1;
  const unsigned short      NS_WF_CHECK                    = 2;
  const unsigned short      PARTIAL_VALIDITY_CHECK         = 3;
  const unsigned short      STRICT_VALIDITY_CHECK          = 4;

  boolean            canInsertBefore(in Node newChild,
                                     in Node refChild);
  boolean            canRemoveChild(in Node oldChild);
  boolean            canReplaceChild(in Node newChild,
                                     in Node oldChild);
  boolean            canAppendChild(in Node newChild);
  boolean            isNodeValid(in boolean deep,
                                 in unsigned short wFValidityCheckLevel)
                                        raises(ExceptionVAL);
};
An integer indicating which type of validation this is.
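NS_WF_CHECK
Checks for namespace well-formedness; includes WF_CHECK.
PARTIAL_VALIDITY_CHECK
Checks whether this node is partially valid; includes NS_WF_CHECK.
STRICT_VALIDITY_CHECK
Checks for strict validity of the node with respect to the currently active grammar; includes NS_WF_CHECK.
WF_CHECK
Checks for well-formedness of this node.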
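Because each check level names another level that it includes, the set of checks implied by a requested level can be computed mechanically. The following Python sketch assumes exactly the inclusion relation stated for the constants above and nothing more; the table and the implied_checks function are illustrative names, not part of the interface.

```python
# CheckTypeVAL values, copied from the IDL.
WF_CHECK = 1
NS_WF_CHECK = 2
PARTIAL_VALIDITY_CHECK = 3
STRICT_VALIDITY_CHECK = 4

# Direct "includes" relation as given in the constant descriptions.
DIRECT_INCLUDES = {
    WF_CHECK: set(),
    NS_WF_CHECK: {WF_CHECK},
    PARTIAL_VALIDITY_CHECK: {NS_WF_CHECK},
    STRICT_VALIDITY_CHECK: {NS_WF_CHECK},
}


def implied_checks(level):
    """Return every check level implied by `level`, including itself."""
    seen = set()
    stack = [level]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DIRECT_INCLUDES[current])
    return seen


strict_implies = implied_checks(STRICT_VALIDITY_CHECK)
```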
canAppendChild
Has the same arguments as Node.appendChild.
Parameters:
newChild of type Node: Node to be appended.

canInsertBefore
Determines whether the Node.insertBefore operation would make this document not partially valid with respect to the currently active grammar.
Parameters:
newChild of type Node: Node to be inserted.
refChild of type Node: Reference Node.

canRemoveChild
Has the same arguments as Node.removeChild.
Parameters:
oldChild of type Node: Node to be removed.

canReplaceChild
Has the same arguments as Node.replaceChild.
Parameters:
newChild of type Node: New Node.
oldChild of type Node: Node to be replaced.
isNodeValid
Parameters:
deep of type boolean: Setting the deep flag on causes the isNodeValid method to check the whole subtree of the current node for validity. Setting it to false only checks the current node and its immediate child nodes. The validateDocument method on the DocumentEditVAL interface, however, checks to determine whether the entire document is valid.
wFValidityCheckLevel of type unsigned short
Exceptions:
NO_GRAMMAR_AVAILABLE: Exception is raised if the DocumentEditVAL related to this node does not have any grammar associated with it and wfValidityCheckLevel is set to PARTIAL_VALIDITY_CHECK or STRICT_VALIDITY_CHECK.
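The difference between a shallow and a deep check can be sketched with a toy tree. In this Python example the grammar table ALLOWED, the ToyNode class, and the is_node_valid function are all hypothetical; the sketch models only the deep flag and ignores the check-level parameter and the NO_GRAMMAR_AVAILABLE case.

```python
class ToyNode:
    """Minimal stand-in for an element node (hypothetical)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


# Hypothetical grammar: element name -> names allowed as direct children.
ALLOWED = {"book": {"chapter"}, "chapter": {"para"}, "para": set()}


def children_allowed(node):
    allowed = ALLOWED.get(node.name, set())
    return all(child.name in allowed for child in node.children)


def is_node_valid(node, deep):
    """deep=False checks the node and its immediate children; deep=True the whole subtree."""
    if not children_allowed(node):
        return False
    if deep:
        return all(is_node_valid(child, True) for child in node.children)
    return True


# "book" may contain "chapter", but "chapter" may not contain "book".
tree = ToyNode("book", [ToyNode("chapter", [ToyNode("book")])])
shallow_result = is_node_valid(tree, deep=False)
deep_result = is_node_valid(tree, deep=True)
```

The shallow check passes because the problem sits two levels down; only the deep check reaches it.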
This interface extends the Element interface with additional methods for guided document editing. An object implementing this interface must also implement the NodeEditVAL interface.
interface ElementEditVAL : NodeEditVAL {
  readonly attribute NodeList        definedElementTypes;
  unsigned short     contentType();
  boolean            canSetAttribute(in DOMString attrname,
                                     in DOMString attrval);
  boolean            canSetAttributeNode(in Attr attrNode);
  boolean            canSetAttributeNS(in DOMString namespaceURI,
                                       in DOMString qualifiedName,
                                       in DOMString value);
  boolean            canRemoveAttribute(in DOMString attrname);
  boolean            canRemoveAttributeNS(in DOMString namespaceURI,
                                          in DOMString localName);
  boolean            canRemoveAttributeNode(in Node attrNode);
  NodeList           getChildElements();
  NodeList           getParentElements();
  NodeList           getAttributeList();
  boolean            isElementDefined(in DOMString name);
  boolean            isElementDefinedNS(in DOMString name,
                                        in DOMString namespaceURI);
};
definedElementTypes of type NodeList, readonly

canRemoveAttribute
Parameters:
attrname of type DOMString

canRemoveAttributeNS
Parameters:
namespaceURI of type DOMString
localName of type DOMString

canRemoveAttributeNode
Parameters:
attrNode of type Node: Attr node to remove from the attribute list.

canSetAttribute
Parameters:
attrname of type DOMString
attrval of type DOMString

canSetAttributeNS
Has the same arguments as setAttributeNS.
Parameters:
namespaceURI of type DOMString: namespaceURI of namespace.
qualifiedName of type DOMString
value of type DOMString

canSetAttributeNode
Parameters:
attrNode of type Attr: Node in which the attribute can possibly be set.
contentType
Return value:
Constant for one of

getAttributeList
Returns a NodeList containing all the possible Attrs that can appear with this type of element. These are not nodes from the instance document, but rather are new nodes that could be inserted in the document.
Return value:
List of possible attributes of this element.

getChildElements
Returns a NodeList containing the possible Element nodes that can appear as children of this type of element, with certain conditions as specified below. These are not nodes from the instance document, but rather are new nodes that could be inserted in the document.
Return value:
List of possible children element types of this element. Note that if no context of this element exists, then a

getParentElements
Returns a NodeList containing the possible Element nodes that can appear as a parent of this type of element, with certain conditions as specified below. These are not nodes from the instance document, but rather are new nodes that could be inserted in the document.
Return value:
List of possible parent element types of this element. Note that if no context of this element exists, for example, the parent element of this element, then a

isElementDefined
Determines if name is defined in the currently active grammar.
Parameters:
name of type DOMString
Return value:
A boolean that is

isElementDefinedNS
Determines if name in this namespace is defined in the currently active grammar.
Parameters:
name of type DOMString
namespaceURI of type DOMString: namespaceURI of namespace.
Return value:
A boolean that is
This interface extends the NodeEditVAL interface with additional methods for document editing. An object implementing this interface must also implement the NodeEditVAL interface.
interface CharacterDataEditVAL : NodeEditVAL {
  readonly attribute boolean         isWhitespaceOnly;
  boolean            canSetData(in unsigned long offset,
                                in DOMString arg);
  boolean            canAppendData(in DOMString arg);
  boolean            canReplaceData(in unsigned long offset,
                                    in unsigned long count,
                                    in DOMString arg);
  boolean            canInsertData(in unsigned long offset,
                                   in DOMString arg);
  boolean            canDeleteData(in unsigned long offset,
                                   in unsigned long count);
};
isWhitespaceOnly of type boolean, readonly
true if content only whitespace; false for non-whitespace.

canAppendData
Parameters:
arg of type DOMString

canDeleteData
Parameters:
offset of type unsigned long
count of type unsigned long

canInsertData
Parameters:
offset of type unsigned long
arg of type DOMString

canReplaceData
Parameters:
offset of type unsigned long
count of type unsigned long
arg of type DOMString

canSetData
Parameters:
offset of type unsigned long
arg of type DOMString
Applications would like to be able to use functionality to guide construction and editing of documents, which falls into the document-editing world. Examples of this sort of guided editing already exist, and are becoming more common. The necessary queries can be phrased in several ways, the most useful of which may be a combination of "what does the DTD allow me to insert here" and "if I insert this here, will the document still be valid". The former is better suited to presentation to humans via a user interface, and when taken together with sub-tree validation may subsume the latter.
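The two kinds of query described above can be sketched against a toy grammar. In this Python example the content models, the element names, and both function names are assumptions made for illustration; a real implementation would derive the models from the DTD or schema rather than from hand-written regular expressions.

```python
import re

# Hypothetical content models: element name -> regex over space-separated child names.
CONTENT_MODELS = {
    "book": re.compile(r"title( chapter)*"),
    "chapter": re.compile(r"para( para)*"),
}
DEFINED_ELEMENTS = ["book", "chapter", "para", "title"]


def can_append_child(parent, children, new_child):
    """'If I insert this here, will the content still match the model?'"""
    model = CONTENT_MODELS.get(parent)
    if model is None:
        return False
    return model.fullmatch(" ".join(children + [new_child])) is not None


def insertable_children(parent, children):
    """'What does the grammar allow me to append here?'"""
    return [name for name in DEFINED_ELEMENTS
            if can_append_child(parent, children, name)]


empty_book_options = insertable_children("book", [])
titled_book_options = insertable_children("book", ["title"])
para_in_book = can_append_child("book", ["title"], "para")
```

A user interface would typically present the list from the first query as a menu, and use the second query to enable or disable a specific paste or drop operation.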
It has been proposed that in addition to asking questions about specific parts of the grammar, there should be a reasonable way to obtain a list of all the defined symbols of a given type (element, attribute, entity) independent of whether they're valid in a given location; that might be useful in building a list in a user-interface, which could then be updated to reflect which of these are relevant for the program's current state.
Remember that namespaces also weigh in on this issue: in the case of attributes, a "can-this-go-there" check may prompt a namespace-well-formedness check and warn you if you're about to conflict with or overwrite another attribute with the same namespaceURI/localName but different prefix, or same nodeName but different namespaceURI.
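The two attribute conflicts mentioned here can be detected with a simple comparison. The Python sketch below is illustrative only: ToyAttr, attribute_conflicts, and the example namespace URIs are hypothetical, and the field names merely mirror the DOM attributes namespaceURI, prefix, localName and nodeName.

```python
from typing import List, NamedTuple, Optional


class ToyAttr(NamedTuple):
    """Hypothetical attribute record mirroring the relevant DOM fields."""
    namespace_uri: Optional[str]
    prefix: Optional[str]
    local_name: str

    @property
    def node_name(self) -> str:
        return f"{self.prefix}:{self.local_name}" if self.prefix else self.local_name


def attribute_conflicts(existing: List[ToyAttr], candidate: ToyAttr) -> List[str]:
    """List the namespace conflicts that adding `candidate` would cause."""
    problems = []
    for attr in existing:
        same_expanded_name = (
            (attr.namespace_uri, attr.local_name)
            == (candidate.namespace_uri, candidate.local_name)
        )
        if same_expanded_name and attr.prefix != candidate.prefix:
            problems.append("same namespaceURI/localName, different prefix")
        elif (attr.node_name == candidate.node_name
              and attr.namespace_uri != candidate.namespace_uri):
            problems.append("same nodeName, different namespaceURI")
    return problems


existing = [ToyAttr("urn:example:a", "a", "id")]
prefix_clash = attribute_conflicts(existing, ToyAttr("urn:example:a", "b", "id"))
name_clash = attribute_conflicts(existing, ToyAttr("urn:example:b", "a", "id"))
no_clash = attribute_conflicts(existing, ToyAttr("urn:example:c", "c", "lang"))
```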
We have to deal with the fact that "the shortest distance between two valid documents may be through an invalid one". Users may want to know several levels of detail (all the possible children, those which would be valid given what precedes this point, those which would be valid given both preceding and following siblings). Also, once XML Schemas introduce context sensitive validity, we may have to consider the effect of children as well as the individual node being inserted.
The most obvious use for a DTD or XML Schema or any grammar is to use it to validate a given XML document. This again falls into the document-editing world. The XML spec only discusses performing this test at the time the document is loaded into the "processor", which most of us have taken to mean that this check should be performed at parse time. But it is obviously desirable to be able to validate a document -- or selected subtrees -- again at other times. One such case would be validating an edited or newly constructed document before serializing it or otherwise passing it to other users. This issue also arises if the "internal subset" is altered -- or if the grammar changes.
In the past, the DOM has allowed users to create invalid documents, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax... or that they would be checked for validity when read back in. We considered adding validity checks to the DOM's existing editing operations to prevent creation of invalid documents, but are currently inclined against this for several reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations, e.g., if the change is occurring in a context where we know the result will be valid. Second, "the shortest distance between two good documents may be through a bad document". Preventing a document from becoming temporarily invalid may impose a considerable amount of additional work on higher-level code and users. Hence our current plan is to continue to permit editing to produce invalid DOMs, but provide operations which permit a user to check the validity of a node on demand. If needed, one can use the continuousValidityChecking flag to ensure that the DOM remains valid during the editing process.
Note that validation includes checking that ID attributes are unique, and that IDREFs point to IDs which actually exist.
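A minimal sketch of the ID and IDREF part of validation follows, written in Python with the standard xml.dom.minidom parser. Without a grammar the parser cannot know which attributes have type ID or IDREF, so the attribute names "id" and "ref" are assumptions passed in by the caller; check_ids and the sample document are likewise hypothetical.

```python
from xml.dom.minidom import parseString


def check_ids(xml_text, id_attr="id", idref_attr="ref"):
    """Return (duplicate IDs, IDREF values that match no ID)."""
    doc = parseString(xml_text)
    seen, duplicates, refs = set(), [], []
    for element in doc.getElementsByTagName("*"):
        if element.hasAttribute(id_attr):
            value = element.getAttribute(id_attr)
            if value in seen:
                duplicates.append(value)
            seen.add(value)
        if element.hasAttribute(idref_attr):
            refs.append(element.getAttribute(idref_attr))
    # References are resolved only after all IDs are known, so forward
    # references are accepted.
    dangling = [value for value in refs if value not in seen]
    return duplicates, dangling


sample = '<doc><a id="x"/><b id="x"/><c ref="y"/><d ref="x"/></doc>'
duplicates, dangling = check_ids(sample)
```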
XML defined the "well-formed" (WF) state for documents which are parsed without reference to their DTDs. Knowing that a document is well-formed may be useful by itself even when a DTD is available. For example, users may wish to deliberately save an invalid document, perhaps as a checkpoint before further editing. Hence, the "Validation" features will permit both full validity checking (see previous section) and "lightweight" WF checking, as requested by the caller, as well as processing entity declarations in the AS even if validation is not turned on.
While the DOM inherently enforces some of XML's well-formedness conditions (proper nesting of elements, constraints on which children may be placed within each node), there are some checks that are not yet performed. These include:
In addition, Namespaces introduce their own concepts of well-formedness. Specifically:
namespaceNormalize operation, which would create the implied declarations and reconcile conflicts in some reasonably standardized manner. This may be a major undertaking, since some DOMs may be using the namespace to direct subclassing of the nodes or similar special treatment; as with the existing normalize method, you may be left with a different-but-equivalent set of node objects.
In the past, the DOM has allowed users to create documents which violate these rules, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax. We considered adding WF checks to the DOM's existing editing operations to prevent WF violations from arising, but are currently inclined against this for two reasons. First, it would impose a significant amount of computational overhead to the DOM, which might be unnecessary in many situations (for example, if the change is occurring in a context where we know the illegal characters have already been prevented from arising). Second, "the shortest distance between two good documents may be through a bad document" -- preventing a document from becoming temporarily ill-formed may impose a considerable amount of additional work on higher-level code and users. (Note possible issue for Serialization: In some applications, being able to save and reload marginally poorly-formed DOMs might be useful -- editor checkpoint files, for example.) Hence our current plan is to continue to permit editing to produce ill-formed DOMs, but provide operations which permit a user to check the well-formedness of a node on demand, and possibly provide some of the primitive (e.g., string-checking) functions directly.
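One of the primitive string-checking functions mentioned above could look like the following Python sketch, which tests text against the Char production of XML 1.0. The function names are hypothetical; the character ranges are those of the XML 1.0 Recommendation.

```python
def is_xml_char(ch):
    """True if `ch` matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(text):
    """Return the index of the first character XML 1.0 forbids, or None."""
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            return index
    return None


clean = first_illegal_char("tab\tand newline\nare fine")
dirty = first_illegal_char("ok\x00")
```

An editor could call such a function before setting character data, and leave the decision to reject, strip, or merely flag the offending character to the application.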