This chapter describes the optional DOM Level 3 Content
Model (CM) feature. This module provides a
representation for XML content models, e.g., DTDs and XML Schemas,
together with operations on the content models, and how such
information within the content models could be applied to XML
documents used in both the document-editing and CM-editing worlds.
It also provides additional tests for well-formedness of XML
documents, including Namespace well-formedness. A DOM application
can use the hasFeature method of
the DOMImplementation interface to determine whether a
given DOM supports these capabilities or not. One feature string
for the CM-editing interfaces listed in this section is "CM-EDIT"
and another feature string for document-editing interfaces is
"CM-DOC".
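As a rough illustration of the feature test described above, the sketch below probes an implementation for the two feature strings. It uses Python's xml.dom.minidom only because its DOMImplementation exposes a real hasFeature(feature, version) method; minidom does not implement the CM module, so both probes are expected to report no support. The version string "3.0" and the helper name supported_cm_features are assumptions made for this example and are not taken from the text above.

```python
from xml.dom.minidom import getDOMImplementation

# Feature strings named in this chapter.
CM_FEATURES = ("CM-EDIT", "CM-DOC")


def supported_cm_features(impl, version="3.0"):
    """Map each CM feature string to whether `impl` claims to support it.

    `version` is an assumed value; the chapter does not state a version
    for these two feature strings.
    """
    return {name: bool(impl.hasFeature(name, version)) for name in CM_FEATURES}


impl = getDOMImplementation()
features = supported_cm_features(impl)
for name, ok in features.items():
    print(f"{name}: {'supported' if ok else 'not supported'}")
```

An application would typically run such a probe once and fall back to plain Core DOM editing when a feature is absent.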
This chapter interacts strongly with the Load and Save chapter, which is also under development in DOM Level 3. Not only will that code serialize/deserialize content models, but it may also wind up defining its well-formedness and validity checks in terms of what is defined in this chapter. In addition, the CM and Load/Save functional areas will share a common error-reporting mechanism allowing user-registered error callbacks. Note that this may not imply that the parser actually calls the DOM's validation code -- it may be able to achieve better performance via its own -- but the appearance to the user should probably be "as if" the DOM has been asked to validate the document, and parsers should probably be able to validate newly loaded documents in terms of a previously loaded DOM CM.
Finally, this chapter will have separate sections to address the needs of the document-editing and CM-editing worlds, along with a section that details overlapping areas such as validation. In this manner, the document-editing world's focus on editing and on using the information in the CM is kept distinct from the CM-editing world's focus on defining and manipulating that information.
In the October 9, 1997 DOM requirements document, the following appeared: "There will be a way to determine the presence of a DTD. There will be a way to add, remove, and change declarations in the underlying DTD (if available). There will be a way to test conformance of all or part of the given document against a DTD (if available)." In later discussions, the following was added, "There will be a way to query element/attribute (and maybe other) declarations in the underlying DTD (if available)," supplementing the primitive support for these in Level 1.
That work was deferred past Level 2, in the hope that XML Schemas would be addressed as well. It is anticipated that lowest common denominator general APIs generated in this chapter can support both DTDs and XML Schemas, and other XML content models down the road.
The kinds of information that a Content Model must make available are mostly self-evident from the definitions of Infoset, DTDs, and XML Schemas. Note, however, that some kinds of information on which the DOM already relies, e.g., default values for attributes, will finally be given a visible representation here.
The content model referenced in these use cases/requirements is an abstraction and does not refer solely to DTDs or XML Schemas.
For the CM-editing and document-editing worlds, the following use cases and requirements are common to both and could be labeled as the "Validation and Other Common Functionality" section:
Use Cases:
Requirements:
Specific to the CM-editing world, the following are use cases and requirements and could be labeled as the "CM-editing" section:
Use Cases:
Requirements:
Specific to the document-editing world, the following are use cases and requirements and could be labeled as the "Document-editing" section:
Use Cases:
Requirements:
General Issues:
QName, e.g.,
foo:bar, whereas the latter will report its namespace
and local name, e.g., {http://my.namespace}bar. We
have added the isNamespaceAware attribute to the
generic CM object to help applications determine which of these
fields are important, but we are still analyzing this
challenge.
A list of the proposed Content Model data structures and functions follows, starting with the data structures and "CM-editing" methods.
CMModel is an abstract object that could map to a
DTD, an XML Schema, a database schema, etc. It's a generalized
content model object, that has both an internal and external
subset. The internal subset would always exist, even if empty, with
the external subset (if present) being represented by an
"active" CMExternalModel.
Many CMExternalModels
could exist, but only one can be specified as "active"; it is also
possible that none are "active". The issue of multiple content
models is misleading since in this architecture, only one
CMModel exists, with an internal subset that
references the external subset. If the external subset changes to
another "active" CMExternalModel,
the internal subset is "fixed up." The CMModel also contains the
factory methods needed to create the various types of CMNodes, such as
CMElementDeclaration,
CMAttributeDeclaration, etc.
interface CMModel : CMNode {
readonly attribute boolean isNamespaceAware;
attribute CMElementDeclaration rootElementDecl;
DOMString getLocation();
nsElement getCMNamespace();
CMNamedNodeMap getCMNodes();
boolean removeNode(in CMNode node);
boolean insertBefore(in CMNode newNode,
in CMNode refNode);
boolean validate();
CMElementDeclaration createCMElementDeclaration(inout DOMString namespaceURI,
in DOMString qualifiedElementName,
in int contentSpec)
raises(DOMException);
CMAttributeDeclaration createCMAttributeDeclaration(inout DOMString namespaceURI,
in DOMString qualifiedName)
raises(DOMException);
CMNotationDeclaration createCMNotationDeclaration(in DOMString name,
in DOMString systemIdentifier,
inout DOMString publicIdentifier)
raises(DOMException);
CMEntityDeclaration createCMEntityDeclaration(in DOMString name)
raises(DOMException);
CMChildren createCMChildren(in unsigned long minOccurs,
in unsigned long maxOccurs,
inout unsigned short operator)
raises(DOMException);
};
Attributes:
isNamespaceAware of type boolean, readonly
  ... QNames.
rootElementDecl of type CMElementDeclaration
Methods:
createCMAttributeDeclaration
  Parameters:
    namespaceURI of type DOMString
    qualifiedName of type DOMString
  Return value: A new CMAttributeDeclaration object with ...
  Exceptions: INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
createCMChildren
  Parameters:
    minOccurs of type unsigned long
    maxOccurs of type unsigned long
    operator of type unsigned short
  Return value: A new CMChildren object.
  Exceptions: INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
createCMElementDeclaration
  Parameters:
    namespaceURI of type DOMString
    qualifiedElementName of type DOMString
    contentSpec of type int
  Return value: A new CMElementDeclaration object with ...
  Exceptions: INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character. DUPLICATE_NAME_ERR: Raised if an element declaration already exists with the same name for a given CMModel.
createCMEntityDeclaration
  Parameters:
    name of type DOMString
  Return value: A new CMEntityDeclaration object with ...
  Exceptions: INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
createCMNotationDeclaration
  Parameters:
    name of type DOMString
    systemIdentifier of type DOMString
    publicIdentifier of type DOMString
  Return value: A new CMNotationDeclaration object with ...
  Exceptions: INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character. DUPLICATE_NAME_ERR: Raised if a notation declaration already exists with the same name for a given CMModel.
getCMNamespace
  ... CMModel.
  Return value: Namespace of ...
getCMNodes
getLocation
  Return value: This method returns a DOMString defining the absolute location from which this document is retrieved, including the document name.
insertBefore
removeNode
validate
  Return value: Is the CM valid?
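The relationship described for CMModel (one internal subset that always exists, any number of external models, at most one of them active) can be sketched as simple bookkeeping. The class and method names below are hypothetical stand-ins, not the IDL above, and the rule that the internal subset wins on lookup is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExternalModel:
    """Hypothetical stand-in for CMExternalModel: a named set of declarations."""
    location: str
    declarations: Dict[str, str] = field(default_factory=dict)


@dataclass
class Model:
    """Hypothetical stand-in for CMModel."""
    internal: Dict[str, str] = field(default_factory=dict)  # always present, may be empty
    externals: List[ExternalModel] = field(default_factory=list)
    active: Optional[ExternalModel] = None  # at most one active external model

    def add_external(self, ext: ExternalModel) -> None:
        if ext not in self.externals:
            self.externals.append(ext)

    def activate(self, ext: ExternalModel) -> bool:
        """Make `ext` the only active external model; False if it was never added."""
        if ext not in self.externals:
            return False
        self.active = ext
        return True

    def lookup(self, name: str) -> Optional[str]:
        """Resolve a declaration; the internal subset is consulted first (assumption)."""
        if name in self.internal:
            return self.internal[name]
        if self.active is not None:
            return self.active.declarations.get(name)
        return None


model = Model(internal={"note": "(#PCDATA)"})
dtd_a = ExternalModel("a.dtd", {"book": "(title, para+)"})
dtd_b = ExternalModel("b.dtd", {"book": "(title)"})
model.add_external(dtd_a)
model.add_external(dtd_b)
model.activate(dtd_a)
first = model.lookup("book")   # resolved through a.dtd
model.activate(dtd_b)
second = model.lookup("book")  # same name, now resolved through b.dtd
```

Switching the active external model changes what a lookup returns without touching the internal subset, which is the "fixing up" behaviour the text alludes to in its simplest form.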
CMExternalModel is an abstract object that could
map to a DTD, an XML Schema, a database schema, etc. It's a
generalized content model object that is not bound to a particular
XML document.
interface CMExternalModel : CMModel {
};
CMNode is analogous to a Node in the
Core DOM, e.g., an element declaration. This can exist for both CMExternalModel
and CMModel.
It should be able to handle constructs such as comments and
processing instructions.
Opaque.
interface CMNode {
const unsigned short CM_ELEMENT_DECLARATION = 1;
const unsigned short CM_ATTRIBUTE_DECLARATION = 2;
const unsigned short CM_NOTATION_DECLARATION = 3;
const unsigned short CM_ENTITY_DECLARATION = 4;
const unsigned short CM_CHILDREN = 5;
const unsigned short CM_MODEL = 6;
const unsigned short CM_EXTERNALMODEL = 7;
readonly attribute unsigned short cmNodeType;
attribute CMModel ownerCMModel;
attribute DOMString nodeName;
attribute DOMString prefix;
attribute DOMString localName;
attribute DOMString namespaceURI;
CMNode clone();
};
Defined constants:
CM_ELEMENT_DECLARATION: ... CMElementDeclaration.
CM_ATTRIBUTE_DECLARATION: ... CMAttributeDeclaration.
CM_NOTATION_DECLARATION: ... CMNotationDeclaration.
CM_ENTITY_DECLARATION: ... CMEntityDeclaration.
CM_CHILDREN: ... CMChildren.
CM_MODEL: ... CMModel.
CM_EXTERNALMODEL: ... CMExternalModel.
Attributes:
cmNodeType of type unsigned short, readonly
localName of type DOMString
  ... qualified name of this CMNode.
namespaceURI of type DOMString
nodeName of type DOMString
  ... qualified name of this CMNode depending on the CMNode type.
ownerCMModel of type CMModel
  ... CMModel object associated with this CMNode. For a node of type CM_MODEL, this is null.
prefix of type DOMString
CMNodeList is the CM analogue to
NodeList; the document order is meaningful, as opposed
to CMNamedNodeMap.
interface CMNodeList {
};
CMNamedNodeMap is the CM analogue to
NamedNodeMap. The order is not meaningful.
interface CMNamedNodeMap {
};
The base DOM CM implementation supports only one primitive
datatype: string.
interface CMDataType {
const short STRING_DATATYPE = 1;
short getCMPrimitiveType();
};
Defined constants:
STRING_DATATYPE: string data type as defined in XML Schema Datatypes.
Methods:
getCMPrimitiveType
  Return value: Code representing the primitive type of the attached data item.
This interface defines the primitive types supported by optional DOM CM implementations. A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether this interface is supported or not. The feature string for all the interfaces listed in this section is "CMPTYPES" and the version is "3.0".
interface CMPrimitiveType : CMDataType {
const short BOOLEAN_DATATYPE = 2;
const short FLOAT_DATATYPE = 3;
const short DOUBLE_DATATYPE = 4;
const short DECIMAL_DATATYPE = 5;
const short HEXBINARY_DATATYPE = 6;
const short BASE64BINARY_DATATYPE = 7;
const short ANYURI_DATATYPE = 8;
const short QNAME_DATATYPE = 9;
const short DURATION_DATATYPE = 10;
const short DATETIME_DATATYPE = 11;
const short DATE_DATATYPE = 12;
const short TIME_DATATYPE = 13;
const short YEARMONTH_DATATYPE = 14;
const short YEAR_DATATYPE = 15;
const short MONTHDAY_DATATYPE = 16;
const short DAY_DATATYPE = 17;
const short MONTH_DATATYPE = 18;
const short NOTATION_DATATYPE = 19;
attribute decimal lowValue;
attribute decimal highValue;
};
Defined constants:
BOOLEAN_DATATYPE: boolean data type as defined in XML Schema Datatypes.
FLOAT_DATATYPE: float data type as defined in XML Schema Datatypes.
DOUBLE_DATATYPE: double data type as defined in XML Schema Datatypes.
DECIMAL_DATATYPE: decimal data type as defined in XML Schema Datatypes.
HEXBINARY_DATATYPE: hexbinary data type as defined in XML Schema Datatypes.
BASE64BINARY_DATATYPE: base64binary data type as defined in XML Schema Datatypes.
ANYURI_DATATYPE: uri reference data type as defined in XML Schema Datatypes. Note: @@uriReference is no longer part of the XML Schema PR draft.
QNAME_DATATYPE: XML qualified name data type as defined in XML Schema Datatypes.
DURATION_DATATYPE: duration data type as defined in XML Schema Datatypes.
DATETIME_DATATYPE: datetime data type as defined in XML Schema Datatypes.
DATE_DATATYPE: date data type as defined in XML Schema Datatypes.
TIME_DATATYPE: time data type as defined in XML Schema Datatypes.
YEARMONTH_DATATYPE: yearmonth data type as defined in XML Schema Datatypes.
YEAR_DATATYPE: year data type as defined in XML Schema Datatypes.
MONTHDAY_DATATYPE: monthday data type as defined in XML Schema Datatypes.
DAY_DATATYPE: day data type as defined in XML Schema Datatypes.
MONTH_DATATYPE: month data type as defined in XML Schema Datatypes.
NOTATION_DATATYPE: NOTATION data type as defined in XML Schema Datatypes.
The element name along with the content specification in the
context of a CMNode.
interface CMElementDeclaration : CMNode {
attribute CMDataType elementType;
readonly attribute boolean isPCDataOnly;
attribute DOMString tagName;
int getContentType();
CMChildren getCMChildren();
CMNamedNodeMap getCMAttributes();
CMNamedNodeMap getCMGrandChildren();
};
Attributes:
elementType of type CMDataType
isPCDataOnly of type boolean, readonly
tagName of type DOMString
Methods:
getCMAttributes
  ... CMNamedNodeMap containing CMAttributeDeclarations for all the attributes that can appear on this type of element.
  Return value: Attributes list for this ...
getCMChildren
  Return value: Content model of element.
getCMGrandChildren
  ... CMNamedNodeMap containing CMElementDeclarations for all the Elements that can appear as children of this type of element. Note that which ones can actually appear, and in what order, is defined by the CMChildren.
  Return value: Children list for this ...
getContentType
  Return value: Content type constant.
The content model of a declared element.
interface CMChildren : CMNode {
const unsigned long UNBOUNDED = MAX_LONG;
const unsigned short NONE = 0;
const unsigned short SEQUENCE = 1;
const unsigned short CHOICE = 2;
attribute unsigned short listOperator;
attribute unsigned long minOccurs;
attribute unsigned long maxOccurs;
attribute CMNodeList subModels;
CMNode removeCMNode(in unsigned long nodeIndex);
int insertCMNode(in unsigned long nodeIndex,
in CMNode newNode);
int appendCMNode(in CMNode newNode);
};
Defined constants:
  ... subModels. This is usually the case where the subModels contain a single element declaration.
Attributes:
listOperator of type unsigned short
  ... subModels. For example, if the list operator is CHOICE and the components in subModels are a, b and c, then the content model for the element being declared is (a|b|c).
maxOccurs of type unsigned long
minOccurs of type unsigned long
subModels of type CMNodeList
  ... CMNodes in which the element can be defined.
Methods:
appendCMNode
  ... subModels.
  Parameters:
    newNode of type CMNode
  Return value: the length of the ...
insertCMNode
  Parameters:
    nodeIndex of type unsigned long
    newNode of type CMNode
  Return value: The index value at which it is inserted. If the nodeIndex is outside the bound of the ...
removeCMNode
  Parameters:
    nodeIndex of type unsigned long
  Return value: The node removed is returned as a result of this method call. The method returns ...
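To make the interplay of listOperator, minOccurs, maxOccurs and subModels concrete, the following sketch renders a nested structure in DTD-style syntax, reproducing the (a|b|c) example given for listOperator. The Python names are illustrative stand-ins for the IDL attributes, the numeric value chosen for UNBOUNDED is an assumption (the IDL only says MAX_LONG), and only the four occurrence combinations that DTD syntax can express are handled.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Operator codes mirror the constants in the CMChildren interface.
NONE, SEQUENCE, CHOICE = 0, 1, 2
# Stand-in for UNBOUNDED; the concrete value of MAX_LONG is an assumption.
UNBOUNDED = 2**31 - 1


@dataclass
class Children:
    list_operator: int = NONE
    min_occurs: int = 1
    max_occurs: int = 1
    sub_models: List[Union[str, "Children"]] = field(default_factory=list)


def occurrence_suffix(min_occurs: int, max_occurs: int) -> str:
    """Map occurrence bounds to DTD indicators; other bounds have no DTD form."""
    table = {(1, 1): "", (0, 1): "?", (0, UNBOUNDED): "*", (1, UNBOUNDED): "+"}
    try:
        return table[(min_occurs, max_occurs)]
    except KeyError:
        raise ValueError("bounds cannot be written in DTD syntax") from None


def render(model: Union[str, Children]) -> str:
    """Render a content model in DTD-style syntax."""
    if isinstance(model, str):  # a plain element name
        return model
    parts = [render(m) for m in model.sub_models]
    separator = "|" if model.list_operator == CHOICE else ","
    suffix = occurrence_suffix(model.min_occurs, model.max_occurs)
    return "(" + separator.join(parts) + ")" + suffix


choice = Children(CHOICE, 1, 1, ["a", "b", "c"])
nested = Children(SEQUENCE, 1, 1,
                  ["title", Children(CHOICE, 0, UNBOUNDED, ["para", "list"])])
print(render(choice))
print(render(nested))
```

Bounds such as minOccurs=2 are expressible in XML Schema but not in a DTD, which is why the sketch raises an error instead of guessing a rendering.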
An attribute declaration in the context of a CMNode.
interface CMAttributeDeclaration : CMNode {
const short NO_VALUE_CONSTRAINT = 0;
const short DEFAULT_VALUE_CONSTRAINT = 1;
const short FIXED_VALUE_CONSTRAINT = 2;
attribute DOMString attrName;
attribute CMDataType attrType;
attribute DOMString attributeValue;
attribute DOMString enumAttr;
attribute CMNodeList ownerElement;
attribute short constraintType;
};
Attributes:
attrName of type DOMString
attrType of type CMDataType
attributeValue of type DOMString
constraintType of type short
enumAttr of type DOMString
ownerElement of type CMNodeList
Models a general entity declaration in a content model.
interface CMEntityDeclaration : CMNode {
const short INTERNAL_ENTITY = 1;
const short EXTERNAL_ENTITY = 2;
attribute short entityType;
attribute DOMString entityName;
attribute DOMString entityValue;
attribute DOMString systemId;
attribute DOMString publicId;
attribute DOMString notationName;
};
Attributes:
entityName of type DOMString
entityType of type short
entityValue of type DOMString
  ... null.
notationName of type DOMString
  ... null.
publicId of type DOMString
  ... null.
systemId of type DOMString
  ... null.
This interface represents a notation declaration.
interface CMNotationDeclaration : CMNode {
attribute DOMString notationName;
attribute DOMString systemId;
attribute DOMString publicId;
};
Attributes:
notationName of type DOMString
publicId of type DOMString
systemId of type DOMString
This section contains "Validation and Other" methods common to
both the document-editing and CM-editing worlds (includes Document,
DOMImplementation, and DOMErrorHandler
methods).
The setErrorHandler method is defined on the
Document interface.
interface Document {
void setErrorHandler(in DOMErrorHandler handler);
};
Methods:
setErrorHandler
  Parameters:
    handler of type DOMErrorHandler
This interface extends the Document
interface with additional methods for both document and CM
editing.
interface DocumentCM : Document {
const short WF_CHECK = 1;
const short NS_WF_CHECK = 2;
const short PARTIAL_VALIDITY_CHECK = 3;
const short STRICT_VALIDITY_CHECK = 4;
attribute boolean continuousValidityChecking;
attribute short wfValidityCheckLevel;
int numCMs();
CMModel getInternalCM();
CMNodeList getCMs();
CMModel getActiveCM();
void addCM(in CMModel cm);
void removeCM(in CMModel cm);
boolean activateCM(in CMModel cm);
};
Attributes:
continuousValidityChecking of type boolean
wfValidityCheckLevel of type short
  ... isValid method.
Methods:
activateCM
  ... CMModel active. Note that if a user wants to activate one CM to get default attribute values and then activate another to do validation, the user can do that; however, only one CM is active at a time. In the case where an attribute is declared in an internal subset and the corresponding ownerElement points to a CMElementDeclaration defined in an external subset, changing the active CM will cause the ownerElement to be re-computed. If the owner element is not defined in the newly active CM, the ownerElement will be an empty node list.
  Parameters:
    cm of type CMModel
      ... CMModel points to a list of CMExternalModels; with this call, only the specified CM will be active.
  Return value: True if the ...
addCM
  ... CMModel with a document. Can be invoked multiple times to result in a list of CMExternalModels. Note, however, that only one internal CMModel is associated with the document, and that only one of the possible list of CMExternalModels is active at any one time.
  Parameters:
    cm of type CMModel
getActiveCM
  ... CMExternalModel for a document.
getCMs
  ... CMNodes of type CM_EXTERNALMODELs associated with the document. This list arises when addCM() is invoked.
  Return value: A list of ...
getInternalCM
numCMs
  ... CMExternalModels associated with the document. Only one CMModel can be associated with the document, but it may point to a list of CMExternalModels.
  Return value: Non-negative number of external CM objects.
removeCM
  ... CMExternalModel. Can be invoked multiple times to remove a number of these in the list of CMExternalModels.
  Parameters:
    cm of type CMModel
This interface extends the DOMImplementation
interface with additional methods.
interface DOMImplementationCM : DOMImplementation {
CMModel createCM();
CMExternalModel createExternalCM();
};
Methods:
createCM
  Return value: A NULL return indicates failure.
createExternalCM
  Return value: A NULL return indicates failure.
This section contains "Document-editing" methods (includes
Node, Element, Text and Document
methods).
This interface extends the Node interface with
additional methods for guided document editing.
interface NodeCM : Node {
boolean canInsertBefore(in Node newChild,
in Node refChild)
raises(DOMException);
boolean canRemoveChild(in Node oldChild)
raises(DOMException);
boolean canReplaceChild(in Node newChild,
in Node oldChild)
raises(DOMException);
boolean canAppendChild(in Node newChild)
raises(DOMException);
boolean isValid()
raises(DOMException);
};
Methods:
canAppendChild
  ... AppendChild.
  Parameters:
    newChild of type Node: Node to be appended.
  Return value: Success or failure.
  Exceptions: DOMException.
canInsertBefore
  ... Node::InsertBefore operation would make this document invalid with respect to the currently active CM. ISSUE: Describe "valid" when referring to partially completed documents.
  Parameters:
    newChild of type Node: Node to be inserted.
    refChild of type Node: ... Node.
  Return value: A boolean that is true if the ...
  Exceptions: DOMException.
canRemoveChild
  ... RemoveChild.
  Parameters:
    oldChild of type Node: Node to be removed.
  Return value: Success or failure.
  Exceptions: DOMException.
canReplaceChild
  ... ReplaceChild.
  Parameters:
    newChild of type Node: ... Node.
    oldChild of type Node: Node to be replaced.
  Return value: Success or failure.
  Exceptions: DOMException.
isValid
  Return value: True if the node is valid/well-formed in the current context and check level defined by wfValidityCheckLevel, false if not.
  Exceptions: NO_CM_AVAILABLE: Exception is raised if the DocumentCM related to this node does not have any activeCM and wfValidityCheckLevel is set to STRICT_VALIDITY_CHECK.
This interface extends the Element interface with
additional methods for guided document editing.
interface ElementCM : Element,NodeCM {
int contentType();
CMElementDeclaration getElementDeclaration()
raises(DOMException);
boolean canSetAttribute(in DOMString attrname,
in DOMString attrval);
boolean canSetAttributeNode(in Node node);
boolean canSetAttributeNodeNS(in Node node);
boolean canSetAttributeNS(in DOMString attrname,
in DOMString attrval,
in DOMString namespaceURI,
in DOMString localName);
boolean canRemoveAttribute(in DOMString attrname);
boolean canRemoveAttributeNS(in DOMString attrname,
inout DOMString namespaceURI);
boolean canRemoveAttributeNode(in Node node);
};
Methods:
canRemoveAttribute
  Parameters:
    attrname of type DOMString
  Return value: true or false.
canRemoveAttributeNS
  Parameters:
    attrname of type DOMString
    namespaceURI of type DOMString
  Return value: true or false.
canRemoveAttributeNode
  Parameters:
    node of type Node: Attr node to remove from the attribute list.
  Return value: true or false.
canSetAttribute
  Parameters:
    attrname of type DOMString
    attrval of type DOMString
  Return value: true or false.
canSetAttributeNS
  ... setAttributeNS.
  Parameters:
    attrname of type DOMString
    attrval of type DOMString
    namespaceURI of type DOMString: namespaceURI of namespace.
    localName of type DOMString: localName of namespace.
  Return value: Success or failure.
canSetAttributeNode
  Parameters:
    node of type Node: Node in which the attribute can possibly be set.
  Return value: Success or failure.
canSetAttributeNodeNS
  Parameters:
    node of type Node: Attr to be added to the attribute list.
  Return value: Success or failure.
contentType
  Return value: Constant for mixed, empty, any, etc.
getElementDeclaration
  Return value: CMElementDeclaration object.
  Exceptions: Raised if no DTD is present.
This interface extends the CharacterData interface
with additional methods for document editing.
interface CharacterDataCM : Text,NodeCM {
boolean isWhitespaceOnly();
boolean canSetData(in unsigned long offset,
in DOMString arg)
raises(DOMException);
boolean canAppendData(in DOMString arg)
raises(DOMException);
boolean canReplaceData(in unsigned long offset,
in unsigned long count,
in DOMString arg)
raises(DOMException);
boolean canInsertData(in unsigned long offset,
in DOMString arg)
raises(DOMException);
boolean canDeleteData(in unsigned long offset,
in DOMString arg)
raises(DOMException);
};
Methods:
canAppendData
  Parameters:
    arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canDeleteData
  Parameters:
    offset of type unsigned long
    arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canInsertData
  Parameters:
    offset of type unsigned long
    arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canReplaceData
  Parameters:
    offset of type unsigned long
    count of type unsigned long
    arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canSetData
  Parameters:
    offset of type unsigned long
    arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
isWhitespaceOnly
  Return value: True if content only whitespace; false for non-whitespace if it is a text node in element content.
This interface extends the DocumentType interface
with additional methods for document editing.
interface DocumentTypeCM : DocumentType,NodeCM {
boolean isElementDefined(in DOMString elemTypeName);
boolean isElementDefinedNS(in DOMString elemTypeName,
in DOMString namespaceURI,
in DOMString localName);
boolean isAttributeDefined(in DOMString elemTypeName,
in DOMString attrName);
boolean isAttributeDefinedNS(in DOMString elemTypeName,
in DOMString attrName,
in DOMString namespaceURI,
in DOMString localName);
boolean isEntityDefined(in DOMString entName);
};
Methods:
isAttributeDefined
  Parameters:
    elemTypeName of type DOMString
    attrName of type DOMString
  Return value: Success or failure.
isAttributeDefinedNS
  Parameters:
    elemTypeName of type DOMString
    attrName of type DOMString
    namespaceURI of type DOMString: namespaceURI of namespace.
    localName of type DOMString: localName of namespace.
  Return value: Success or failure.
isElementDefined
  Parameters:
    elemTypeName of type DOMString
  Return value: Success or failure.
isElementDefinedNS
  Parameters:
    elemTypeName of type DOMString
    namespaceURI of type DOMString: namespaceURI of namespace.
    localName of type DOMString: localName of namespace.
  Return value: Success or failure.
isEntityDefined
  Parameters:
    entName of type DOMString
  Return value: Success or failure.
This interface extends Attr to provide guided
editing of an XML document.
interface AttributeCM : Attr,NodeCM {
CMAttributeDeclaration getAttributeDeclaration();
CMNotationDeclaration getNotation()
raises(DOMException);
};
Methods:
getAttributeDeclaration
  Return value: The attribute declaration corresponding to this attribute.
getNotation
  Return value: Returns the notation declaration for this attribute if the type is of notation type, null otherwise.
  Exceptions: DOMException.
This section contains DOM error handling interfaces.
Basic interface for DOM error handlers. If an application needs
to implement customized error handling for DOM such as CM or
Load/Save, it must implement this interface and then register an
instance using the setErrorHandler method. All errors
and warnings will then be reported through this interface.
Application writers can override the methods in a subclass to take
user-specified actions.
interface DOMErrorHandler {
void warning(in DOMLocator where,
in DOMString how,
in DOMString why)
raises(DOMSystemException);
void fatalError(in DOMLocator where,
in DOMString how,
in DOMString why)
raises(DOMSystemException);
void error(in DOMLocator where,
in DOMString how,
in DOMString why)
raises(DOMSystemException);
};
Methods:
error
  Parameters:
    where of type DOMLocator
    how of type DOMString
    why of type DOMString
  Exceptions: A subclass of DOMException.
fatalError
  Parameters:
    where of type DOMLocator
    how of type DOMString
    why of type DOMString
  Exceptions: A subclass of DOMException.
warning
  Parameters:
    where of type DOMLocator
    how of type DOMString
    why of type DOMString
  Exceptions: A subclass of DOMException.
This interface provides document location information and is similar to a SAX locator object.
interface DOMLocator {
int getColumnNumber();
int getLineNumber();
DOMString getPublicID();
DOMString getSystemID();
Node getNode();
};
Methods:
getColumnNumber
  Return value: The column number, or -1 if none is available.
getLineNumber
  Return value: The line number, or -1 if none is available.
getNode
  Return value: The Node, or null if none is available.
getPublicID
  Return value: A string containing the public identifier, or null if none is available.
getSystemID
  Return value: A string containing the system identifier, or null if none is available.
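The error-reporting pattern above (an application-supplied handler with warning, error and fatalError callbacks, each receiving a locator) might look roughly like this in application code. This is only a sketch: the class names are hypothetical, the registration step through setErrorHandler is omitted, and the sample strings passed for how and why are invented, since the text does not define their contents.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Locator:
    """Hypothetical stand-in for DOMLocator; -1 / None mean "not available"."""
    line: int = -1
    column: int = -1
    system_id: Optional[str] = None


class CollectingErrorHandler:
    """Records every report instead of printing or aborting."""

    def __init__(self) -> None:
        self.reports: List[Tuple[str, Locator, str, str]] = []

    def warning(self, where: Locator, how: str, why: str) -> None:
        self.reports.append(("warning", where, how, why))

    def error(self, where: Locator, how: str, why: str) -> None:
        self.reports.append(("error", where, how, why))

    def fatal_error(self, where: Locator, how: str, why: str) -> None:
        self.reports.append(("fatal", where, how, why))

    def worst(self) -> Optional[str]:
        """Return the most severe level seen so far, or None if nothing was reported."""
        order = ["warning", "error", "fatal"]
        seen = [level for level, *_ in self.reports]
        return max(seen, key=order.index) if seen else None


handler = CollectingErrorHandler()
# The `how` and `why` strings are invented sample values.
handler.warning(Locator(3, 14, "doc.xml"), "validation", "undeclared attribute")
handler.error(Locator(7, 2, "doc.xml"), "validation", "missing required child")
print(handler.worst())
```

Collecting reports rather than raising immediately lets an editor show all problems at once, which fits the on-demand checking style this chapter favours.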
Editing and generating a content model falls in the CM-editing world. The most obvious need for this functionality comes from tools that author content models, whether under user control, i.e., explicitly designed document types, or generated from other representations. The latter class includes transcoding tools, e.g., tools that synthesize an XML representation to match a database schema.
It's important to note here that a DTD's "internal subset" is part of the Content Model, yet is loaded, stored, and maintained as part of the individual document instance. This implies that even tools which do not want to let users change the definition of the Document Type may need to support editing operations upon this portion of the CM. It also means that our representation of the CM must be aware of where each portion of its content resides, so that when the serializer processes this document it can write out just the internal subset. A similar issue may arise with external parsed entities, or if schemas introduce the ability to reference other schemas. Finally, the internal-subset case suggests that we may want at least a two-level representation of content models, so a single DOM representation of a DTD can be shared among several documents, each potentially also having its own internal subset; it's possible that entity layering may be represented the same way.
The API for altering the content model may also be the CM's official interface with parsers. One of the ongoing problems in the DOM is that there is some information which must currently be created via completely undocumented mechanisms, which limits the ability to mix and match DOMs and parsers. Given that specialized DOMs are going to become more common (sub-classed, or wrappers around other kinds of storage, or optimized for specific tasks), we must avoid that situation and provide a "builder" API. Particular pairs of DOMs and parsers may bypass it, but it's required as a portability mechanism.
Note that several of these applications require that a CM be able to be created, loaded, and manipulated without/before being bound to a specific Document. A related issue is that we'd want to be able to share a single representation of a CM among several documents, both for storage efficiency and so that changes in the CM can quickly be tested by validating it against a set of known-good documents. Similarly, there is a known problem in DOM Level 2 where we assume that the DocumentType will be created before the Document, which is fine for newly-constructed documents but not a good match for the order in which an XML parser encounters this data; being able to "rebind" a Document to a new CM, after it has been created may be desirable.
As noted earlier, there are open questions about whether one can alter the content of the CM via its syntax, via higher-level abstractions, or both. It's also worth noting that many of the editing concepts from the Document tree still apply; users should probably be able to clone part of a CM, remove and re-insert parts, and so on.
In addition to using the content model to validate a document instance, applications would like to be able to use it to guide construction and editing of documents, which falls into the document-editing world. Examples of this sort of guided editing already exist, and are becoming more common. The necessary queries can be phrased in several ways, the most useful of which may be a combination of "what does the DTD allow me to insert here" and "if I insert this here, will the document still be valid". The former is better suited to presentation to humans via a user interface, and when taken together with sub-tree validation may subsume the latter.
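The two query styles mentioned above ("what may I insert here" and "if I insert this here, is the result still valid") can be sketched with a content model compiled to a regular expression over child names. Everything here is an assumption made for illustration: the model (title, para+), the encoding of child lists as space-terminated names, and the function names; it is not the NodeCM API.

```python
import re
from typing import List


def _encode(children: List[str]) -> str:
    # Each name is followed by a space so that names cannot run together.
    return "".join(name + " " for name in children)


# Assumed content model for illustration: (title, para+)
BOOK_MODEL = re.compile(r"title (?:para )+")


def can_insert(children: List[str], index: int, name: str,
               model=BOOK_MODEL) -> bool:
    """Would the child list match the model after inserting `name` at `index`?"""
    candidate = children[:index] + [name] + children[index:]
    return model.fullmatch(_encode(candidate)) is not None


def allowed_here(children: List[str], index: int, vocabulary: List[str],
                 model=BOOK_MODEL) -> List[str]:
    """List the names from `vocabulary` that may be inserted at `index`."""
    return [n for n in vocabulary if can_insert(children, index, n, model)]


current = ["title", "para"]
print(allowed_here(current, 2, ["title", "para", "note"]))
print(can_insert([], 0, "title"))
```

Note that can_insert([], 0, "title") reports False: an empty element can never reach (title, para+) one insertion at a time if every intermediate state must be valid, which is exactly why strict checks on each edit are problematic.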
It has been proposed that in addition to asking questions about specific parts of the content model, there should be a reasonable way to obtain a list of all the defined symbols of a given type (element, attribute, entity) independent of whether they're valid in a given location; that might be useful in building a list in a user-interface, which could then be updated to reflect which of these are relevant for the program's current state.
Remember that namespaces also weigh in on this issue. In the case of attributes, a "can-this-go-there" query may prompt a namespace-well-formedness check and warn you if you're about to conflict with or overwrite another attribute with the same namespaceURI/localName but a different prefix, or the same nodeName but a different namespaceURI.
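The two attribute conflicts just described can be detected with a small comparison over (prefix, localName, namespaceURI) triples. The sketch below is self-contained and does not use any DOM implementation; the AttrName type, the conflicts function and the example namespace URIs are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class AttrName(NamedTuple):
    prefix: Optional[str]
    local_name: str
    namespace_uri: Optional[str]

    @property
    def node_name(self) -> str:
        return f"{self.prefix}:{self.local_name}" if self.prefix else self.local_name


def conflicts(existing: List[AttrName], new: AttrName) -> List[str]:
    """Describe namespace-related clashes between `new` and existing attributes."""
    problems = []
    for attr in existing:
        same_expanded = (attr.namespace_uri == new.namespace_uri
                         and attr.local_name == new.local_name)
        if same_expanded and attr.prefix != new.prefix:
            # Same namespaceURI/localName, different prefix.
            problems.append(
                f"{new.node_name} duplicates {attr.node_name} under another prefix")
        elif attr.node_name == new.node_name and attr.namespace_uri != new.namespace_uri:
            # Same nodeName, different namespaceURI.
            problems.append(
                f"{new.node_name} reuses a name bound to a different namespace")
    return problems


existing = [AttrName("a", "href", "http://example.org/ns1")]
print(conflicts(existing, AttrName("b", "href", "http://example.org/ns1")))
print(conflicts(existing, AttrName("a", "href", "http://example.org/ns2")))
```

An attribute with the same prefix, local name and namespace as an existing one is treated as an ordinary overwrite and is not reported here.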
As mentioned above, we have to deal with the fact that the shortest distance between two valid documents may be through an invalid one. Users may want to know several levels of detail (all the possible children, those which would be valid given what precedes this point, those which would be valid given both preceding and following siblings). Also, once XML Schemas introduce context sensitive validity, we may have to consider the effect of children as well as the individual node being inserted.
The most obvious use for a content model (DTD or XML Schema or any Content Model) is to use it to validate that a given XML document is in fact a properly constructed instance of the document type described by this CM. This again falls into the document-editing world. The XML spec only discusses performing this test at the time the document is loaded into the "processor", which most of us have taken to mean that this check should be performed at parse time. But it is obviously desirable to be able to validate a document -- or selected subtrees -- again at other times. One such case would be validating an edited or newly constructed document before serializing it or otherwise passing it to other users. This issue also arises if the "internal subset" is altered -- or if the whole Content Model changes.
In the past, the DOM has allowed users to create invalid documents, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax... or that they would be checked for validity when read back in. We considered adding validity checks to the DOM's existing editing operations to prevent creation of invalid documents, but are currently inclined against this for several reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations, e.g., if the change is occurring in a context where we know the result will be valid. Second, "the shortest distance between two good documents may be through a bad document". Preventing a document from becoming temporarily invalid may impose a considerable amount of additional work on higher-level code and users. Hence our current plan is to continue to permit editing to produce invalid DOMs, but provide operations which permit a user to check the validity of a node on demand.
Note that validation includes checking that ID attributes are unique, and that IDREFs point to IDs which actually exist.
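As a small illustration of the ID/IDREF part of validation, the sketch below scans a document for duplicate ID values and for references that point nowhere. A real validator would learn which attributes are of type ID or IDREF from the content model; here the attribute names "id" and "ref" are simply assumed, and the function name check_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple


def check_ids(xml_text: str, id_attr: str = "id",
              idref_attr: str = "ref") -> Tuple[List[str], List[str]]:
    """Return (duplicate ID values, IDREF values with no matching ID)."""
    root = ET.fromstring(xml_text)
    ids = Counter(el.get(id_attr) for el in root.iter()
                  if el.get(id_attr) is not None)
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    refs = {el.get(idref_attr) for el in root.iter()
            if el.get(idref_attr) is not None}
    dangling = sorted(refs - set(ids))
    return duplicates, dangling


SAMPLE = """<doc>
  <sec id="s1"/><sec id="s2"/><sec id="s1"/>
  <link ref="s2"/><link ref="s9"/>
</doc>"""

duplicates, dangling = check_ids(SAMPLE)
print("duplicate IDs:", duplicates)
print("dangling IDREFs:", dangling)
```

Both checks need a view of the whole document, so they cannot be decided by looking at a single node in isolation.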
XML defined the "well-formed" (WF) state for documents which are parsed without reference to their DTDs. Knowing that a document is well-formed may be useful by itself even when a DTD is available. For example, users may wish to deliberately save an invalid document, perhaps as a checkpoint before further editing. Hence, the CM feature will permit both full validity checking (see previous section) and "lightweight" WF checking, as requested by the caller, as well as processing entity declarations in the CM even if validation is not turned on. This falls within the document-editing world.
While the DOM inherently enforces some of XML's well-formedness conditions (proper nesting of elements, constraints on which children may be placed within each node), there are some checks that are not yet performed. These include:
In addition, Namespaces introduce their own concepts of well-formedness. Specifically:
namespaceNormalize operation, which would
create the implied declarations and reconcile conflicts in some
reasonably standardized manner. This may be a major undertaking,
since some DOMs may be using the namespace to direct subclassing of
the nodes or similar special treatment; as with the existing
normalize method, you may be left with a
different-but-equivalent set of node objects.
In the past, the DOM has allowed users to create documents which violate these rules, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax. We considered adding WF checks to the DOM's existing editing operations to prevent WF violations from arising, but are currently inclined against this for two reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations (for example, if the change is occurring in a context where we know the illegal characters have already been prevented from arising). Second, "the shortest distance between two good documents may be through a bad document" -- preventing a document from becoming temporarily ill-formed may impose a considerable amount of additional work on higher-level code and users. (Note possible issue for Serialization: In some applications, being able to save and reload marginally poorly-formed DOMs might be useful -- editor checkpoint files, for example.) Hence our current plan is to continue to permit editing to produce ill-formed DOMs, but provide operations which permit a user to check the well-formedness of a node on demand, and possibly provide some of the primitive (e.g., string-checking) functions directly.
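The primitive string-checking functions mentioned above could be as small as the following sketch. It checks text against the XML 1.0 Char production and applies the XML 1.0 rule that a comment may not contain "--" or end with "-". The function names are hypothetical, and other constructs (names, processing instructions, CDATA sections) would need their own checks.

```python
def is_xml_char(ch: str) -> bool:
    """True if `ch` matches the XML 1.0 `Char` production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def text_is_well_formed(data: str) -> bool:
    """Character data may contain only legal XML characters."""
    return all(is_xml_char(ch) for ch in data)


def comment_is_well_formed(data: str) -> bool:
    """A comment must not contain "--" or end with "-"."""
    return (text_is_well_formed(data)
            and "--" not in data
            and not data.endswith("-"))


print(text_is_well_formed("plain text\n"))
print(text_is_well_formed("form feed \x0c is not allowed"))
print(comment_is_well_formed("a -- b"))
```

Exposing such checks separately lets an application test a string before storing it in a node, without paying for the check on every edit.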