WD-DOM-Level-2-19981228

3. Document Object Model Filters and Iterators

Editors: Mike Champion, Arbortext; Jonathan Robie, Texcel

3.1. Overview of the DOM Level 2 Filter and Iterator Interfaces

The DOM Level 2 Filter and Iterator interfaces extend the functionality of the DOM to allow simple and efficient traversal of document subtrees, node lists, or the results of queries.

This proposal contains Filter and Iterator interfaces, but no query interfaces. A separate specification will be prepared for query interfaces, which will be query-language independent.

3.1.1. Iterators

In several popular approaches to software design, iterators are considered a basic building block for building reusable software and software libraries. For instance, they are fundamental to the Design Patterns approach, STL, and the Java libraries. The main advantages of node iterators in the DOM are:

Abstracting out the way that specific data structures are navigated. Functions that use iterators can operate on any data structure without knowing the details of how that data structure is navigated; e.g., the same function could process the nodes in a document, a document subtree, or a nodelist. The function can keep asking for the next node without worrying about how that node is found.
Allowing more efficient navigation. Because an iterator hides the manner in which a data structure is navigated, it can use indexes or other supplementary data structures to allow more efficient navigation than might be possible by naively navigating from one node to the next.
Providing views for the most common ways applications want to navigate document structures. Some applications traverse only the element tree, others process additional nodes such as processing instructions or comments, others prefer yet another view. There is no one right way to navigate a document tree, but iterators provide a simple, efficient way to choose the most appropriate view of the document tree for a given application.

An iterator allows the nodes of a data structure to be returned sequentially. When an iterator is first created, calling nextNode() returns the first node. When no more nodes are present, nextNode() returns a null. It is important to remember that DOM structures may change as a document is loaded - when nextNode() finds no more nodes, it is still quite possible that further nodes may be added in the next instant. Since iterators do not know how to predict the future, there is no way to check whether further nodes may be added at any given time.Mutation and Iterators

Since the DOM permits editing, and an iterator may be active while the data structure it navigates is being edited, an iterator must behave gracefully in the face of change.

(ED: The Working Group agrees on the principle that an iterator should "gracefully" adapt to changes in the underlying document, but not on the details of how this can be rigorously specified. This will be addressed in the next draft of the DOM Level 2 spec.Among the issues under consideration are:

Should all iterators be "live" or should there be separate interfaces or factory methods that specify whether the iterator attempts to adapt to document changes or not?
May we assume that the iterator is notified of impending changes before the underlying tree is changed so that it can fix itself up, or do we assume that the iterator "gracefully adapts" to changes when the next() method is called?
What is the cost/benefit justification of specifying a relatively simple fixup that may have undesirable (but easily understandable) behavior under some circumstances, versus specifying a more complex algorithm that has more generally desireable behavior?

)

3.1.2. Filters

Filters allow the user to "filter out" nodes. Each filter contains a user-written function that looks at a node and determines whether or not it should be filtered out. To use a filter, you create an iterator that uses the filter. The iterator applies the filter to each node, and if the filter rejects the node, the iterator skips over the node as though it were not present in the document. Filters are easy to write, since they need not know how to navigate the structure on which they operate, and they can be reused for different kinds of iterators that operate on different data structures.

Let's use a filter to write code to find the named anchors in an HTML document. In HTML, an HREF can refer to any <A> element that has a NAME attribute. The first step is to write a filter that looks at a node and determines whether it is a named anchor:



            class NamedAnchorFilter implements NodeFilter
            {
                boolean acceptNode(Node n) {
                    if (n instanceof Element) {
                        Element e = n;
                        if (n.getAttribute("NAME") != NULL) {
                            return true;
                        }
                    }
                    return false;
                }
            }

To use this filter, create an instance of the filter and create an iterator using it:

(ED: We need to specify the details of how this will work in ECMAScript, which does not have the concept of abstract interfaces or data types, more formally)



            NamedAnchorFilter naf;
            NodeIterator nit = document.createFilteredTreeIterator(naf);

At this point, the iterator will show only the named anchors in the document. Writing equivalent code without filters would be marginally simpler, and no less efficient. The advantage of using filters is that it allows reuse. For instance, if you have another part of your program that needs to find the named anchors in a NodeList, you can use the filter the same way you used it for the document:



            NamedAnchorFilter naf;
            NodeIterator nit = nodelist.createFilteredTreeIterator(naf);

3.2. Formal Interface Definition

Interface NodeIterator

IDL Definition

interface NodeIterator {
  Node                      nextNode();
  void                      reset();
};

Methods

nextNode

Return Value: The next Node in the set being iterated over, or NULL if there are no more members in that set.

This method has no parameters.
This method raises no exceptions.

reset: Resets the iterator to the same state as a new iterator would be if constructed by the same factory method with the same arguments as used to construct this iterator.
This method has no parameters.
This method returns nothing.
This method raises no exceptions.

Interface NodeFilter

IDL Definition

interface NodeFilter {
  boolean                   acceptNode(in Node n);
};

Methods

acceptNode

Parameters

n

The node to check to see if it passes the filter or not.

Return Value: TRUE if a this node is to be passed through the filter and returned by the NodeIterator::nextNode() method, FALSE if this node is to be ignored.</

This method raises no exceptions.

3.3. NodeIterator

NodeIterators are used to step through a set of nodes, e.g. the set of nodes in a NodeList, the document subtree governed by a particular node, the results of a query, or any other set of nodes. The NodeIterator interface is very simple, containing only two methods.



	interface NodeIterator {

		  Node nextNode();
		  void reset();

	};

The nextNode() method returns the next Node. The reset() method sets the internal position back to the start position. When a NodeIterator is first created or reset, the nextNode() method returns the first node. If there are no more nodes, nextNode() returns null.

Any iterator that returns nodes may implement the NodeIterator interface. Users and vendor libraries may also choose to create iterators that implement the NodeIterator interface.

3.4. Iterator Factory Methods

Iterators can be used with or without filters, and flags can be set to determine which node types are traversed by an iterator. The factory methods that create iterators also determine the exact behavior of the iterator.

Node has methods that create iterators to traverse the node and its children in document order (depth first, pre-order traversal, which is equivalent to the order in which the start tags occur in the text representation of the document). The following Node methods create iterators:



   NodeIterator createTreeIterator();
   NodeIterator createSelectiveTreeIterator(int whatToShow);
   NodeIterator createFilteredTreeIterator(NodeFilter filter);
   NodeIterator createSelectiveFilteredTreeIterator(int whatToShow, NodeFilter filter);

In the above factories, the "whatToShow" flag is allows parameters to determine whether entities are shown as expanded, and which node types are shown:



   public static final int  TW_DEFAULT       = 0x0022;
   public static final int  TW_ALL           = 0xFFFF;
   public static final int  TW_ELEMENT       = 0x0002;
   public static final int  TW_PI            = 0x0008;
   public static final int  TW_COMMENT       = 0x0010;
   public static final int  TW_TEXT          = 0x0020;
   public static final int  TW_ENTITYREF     = 0x0040;

These flags can be combined using OR:



   Node iter=factory.create(root, TW_ELEMENT | TW_PI | TW_COMMENT | TW_EXPANDED);

The default view shows elements and text, but no other nodes (attributes are retrieved from the elements). The constant TW_DEFAULT is a mask that defines this default view.

If TW_ENTITYREF is not set, entities are expanded. If TW_ENTITYREF is set, entity references will be encountered by the iterator. There is no setting that shows both the entity reference and its expansion.

Several people have suggested that the functionality of whatToShow be implemented using filters. We feel that it is better to implement them using iterators, since it makes it possible to provide a more efficient implementation. A filter must examine each node individually; an iterator can make use of internal data structures to examine only those nodes that are desired.

Containers should also be able to create iterators. For instance, NodeList will contain the following method, which creates an iterator to traverse the list:



   NodeIterator createListIterator();

In a later version of Level 2, when queries are supported, we will also want factory methods that can issue a query and provide an iterator for the result set. These methods may look something like this:



   NodeIterator createTreeQueryIterator(DOMString query);
   NodeIterator createListQueryIterator(DOMString query);

3.5. Filters

Filters are simply functions that "filter out" nodes. If an iterator is given a filter, before it returns the next node, it applies the filter. If the filter says to show the node, it returns it; otherwise, the iterator looks for the next node. Node filters implement the NodeFilter interface:



   interface NodeFilter
   {
   	boolean acceptNode( Node n );
   }

If acceptNode() returns true, the node is accepted; otherwise, it is rejected.

Filters do not need to know how to iterate, nor do they need to know anything about the data structure that is being iterated. This makes it very easy to write filters, since the only thing they have to know how to do is evaluate a single node. One filter may be used with a number of different kinds of iterators, encouraging code reuse.

The DOM does not specify any concrete NodeFilter classes; NodeFilter is just an interface that allows users to write their own filters.