WD-DOM-Level-2-19990304


6. Document Object Model Filters and Iterators

Editors
Mike Champion, Aliaron
Jonathan Robie, Texcel

Table of contents


6.1. Overview of the DOM Level 2 Query, Iterator, and Filter Interfaces

The DOM Level 2 Query, Iterator, and Filter interfaces extend the functionality of the DOM to allow simple and efficient traversal of document subtrees, node lists, or the results of queries.

This proposal contains Iterator and Filter interfaces, but no query interfaces. A separate specification will be prepared for query interfaces, which will be query-language independent.

6.1.1. Iterators

In several popular approaches to software design, iterators are considered a basic building block for building reusable software and software libraries. For instance, they are fundamental to the Design Patterns approach, STL, and the Java libraries. The main advantages of node iterators in the DOM are:

  1. Abstracting out the way that specific data structures are navigated. Functions that use iterators can operate on any data structure without knowing the details of how that data structure is navigated; e.g., the same function could process the nodes in a document, a document subtree, or a nodelist. The function can keep asking for the next node without worrying about how that node is found.
  2. Allowing more efficient navigation. Because an iterator hides the manner in which a data structure is navigated, it can use indexes or other supplementary data structures to allow more efficient navigation than might be possible by naively navigating from one node to the next.
  3. Providing views for the most common ways applications want to navigate document structures. Some applications traverse only the element tree, others process additional nodes such as processing instructions or comments, others prefer yet another view. There is no one right way to navigate a document tree, but iterators provide a simple, efficient way to choose the most appropriate view of the document tree for a given application.

An iterator allows the nodes of a data structure to be returned sequentially. When an iterator is first created, calling nextNode() returns the first node. When no more nodes are present, nextNode() returns a null. It is important to remember that DOM structures may change as a document is loaded - when nextNode() finds no more nodes, it is still quite possible that further nodes may be added in the next instant. Since iterators do not know how to predict the future, there is no way to check whether further nodes may be added at any given time.

Since the DOM permits liveness and editing, and an iterator may be active while the data structure it navigates is being edited, an iterator must behave gracefully in the face of change. Additions and deletions in the underlying data structure do not invalidate an iterator.

Using ordered set semantics, the position of the iterator is determined by the relative position in the ordered set. There is no current node. When an iterator is created for a list, the position is set before the first element:


 A B C D E F G H I
^

Each call to next() returns a node and advances the position. For instance, if we start with the above position, the first call to next() returns "A" and advances the iterator:


 A B C D E F G H I
  ^

The relative position of the iterator remains valid when nodes are deleted. Suppose the nodes in our list do not come from a tree, but are merely a set of nodes in which none of the nodes are children of other nodes. If you delete "A", the position of the iterator is unchanged with respect to the remaining nodes:


 B C D E F G H I
^

Similarly, if "B" and "C" are deleted, the position remains unchanged with respect to the remaining nodes:


 D E F G H I
^

Moving the "D" node to the end of the set does not change the current position:


 E F G H I D
^

Note that the relative position of the iterator is not the same as the absolute position within the set. The position of the iterator is relative to the node before it and the node after it, which is why the position floats gracefully when nodes are deleted or inserted before or after the position of the iterator. If an iterator were based on absolute position, then an iterator at position 5 would suddenly point to a different item if node 3 were deleted. In many implementations, iterators may need to be adjusted when nodes are inserted or deleted.

(ED: The fix-ups required by this model complicate implementation somewhat, but make life simpler for the user of iterators. Much of the complexity of fix-ups is in notification - the fix-ups themselves are then relatively straightforward. It might seem simpler to invalidate an iterator when changes are made, but invalidation also requires notification. We currently feel that handling change gracefully is worth the added implementation cost, but are interested in feedback on this issue.)

6.1.2. Filters

Filters allow the user to "filter out" nodes. Each filter contains a user-written function that looks at a node and determines whether or not it should be filtered out. To use a filter, you create an iterator that uses the filter. The iterator applies the filter to each node, and if the filter rejects the node, the iterator skips over the node as though it were not present in the document. Filters are easy to write, since they need not know how to navigate the structure on which they operate, and they can be reused for different kinds of iterators that operate on different data structures.

Let's use a filter to write code to find the named anchors in an HTML document. In HTML, an HREF can refer to any <A> element that has a NAME attribute. The first step is to write a filter that looks at a node and determines whether it is a named anchor:



            class NamedAnchorFilter implements NodeFilter
            {
                boolean acceptNode(Node n) {
                    if (n instanceof Element) {
                        Element e = n;
                        if (n.getAttribute("NAME") != NULL) {
                            return true;
                        }
                    }
                    return false;
                }
            }
          

To use this filter, create an instance of the filter and create an iterator using it:

These flags can be combined using OR:



    		Node iter=factory.create(root, TW_ELEMENT | TW_PI | TW_COMMENT | TW_EXPANDED);
		

The default view shows elements and text, but no other nodes (attributes are retrieved from the elements). The constant TW_DEFAULT is a mask that defines this default view.

If TW_ENTITYREF is not set, entities are expanded. If TW_ENTITYREF is set, entity references will be encountered by the iterator. There is no setting that shows both the entity reference and its expansion.

(ED: We need to specify the details of how this will work in ECMAScript, which does not have the concept of abstract interfaces or data types, more formally)


            NamedAnchorFilter naf;
            NodeIterator nit = document.createFilteredTreeIterator(naf);
          

At this point, the iterator will show only the named anchors in the document. Writing equivalent code without filters would be marginally simpler, and no less efficient. The advantage of using filters is that it allows reuse. For instance, if you have another part of your program that needs to find the named anchors in a NodeList, you can use the filter the same way you used it for the document:



            NamedAnchorFilter naf;
            NodeIterator nit = nodelist.createFilteredTreeIterator(naf);
          


6.2. Formal Interface Definition

Interface NodeIterator

NodeIterators are used to step through a set of nodes, e.g. the set of nodes in a NodeList, the document subtree governed by a particular node, the results of a query, or any other set of nodes. The set of nodes to be iterated is determined by the factory that creates the iterator.

Any iterator that returns nodes may implement the NodeIterator interface. Users and vendor libraries may also choose to create iterators that implement the NodeIterator interface.

IDL Definition
interface NodeIterator {
  Node                      nextNode();
  Node                      prevNode();
};

Methods
nextNode
Returns the next node in the set and advances the position of the iterator in the set. After a NodeIterator is created, the first call to nextNode() returns the first node in the set.
Return Value
The next Node in the set being iterated over, or NULL if there are no more members in that set.

This method has no parameters.
This method raises no exceptions.
prevNode
Returns the previous node in the set and moves the position of the iterator backwards in the set.
Return Value
The previous Node in the set being iterated over, or NULL if there are no more members in that set.

This method has no parameters.
This method raises no exceptions.
(ED: Some felt that firstNode() and lastNode() would be useful to position to the beginning or end of the iterated set. Others felt this requires the implementation to maintain too much state. For now, we have chosen not to specify these methods, but we are open to feedback on this issue. One implementor suggested that prevNode() was too complex when nodes are kept in a singly linked list. We suspect that the ability to traverse in both directions is extremely useful, and a quick, informal poll suggested that most DOM implementations probably need to do this already.)
Interface Document

Document contains methods that creates iterators to traverse a node and its children in document order (depth first, pre-order traversal, which is equivalent to the order in which the start tags occur in the text representation of the document).

IDL Definition
interface Document {
  boolean                   createTreeIterator(in Node root, 
                                               in short whatToShow);
};

(ED: What about createListIterator?)
(ED: In a later version of Level 2, when queries are supported, we will also want factory methods that can issue a query and provide an iterator for the result set. These methods may look something like this:


    		NodeIterator createTreeQueryIterator(DOMString query);
    		NodeIterator createListQueryIterator(DOMString query);
		
)
Methods
createTreeIterator
Parameters
root

The node which will be iterated together with its children.

whatToShow

This flag determines whether entities are expanded, and whether comments, processing instructions, or text are presented via the iterator.



			public static final int  TW_DEFAULT       = 0x0022;
			public static final int  TW_ALL           = 0xFFFF;
			public static final int  TW_ELEMENT       = 0x0002;
			public static final int  TW_PI            = 0x0008;
			public static final int  TW_COMMENT       = 0x0010;
			public static final int  TW_TEXT          = 0x0020;
			public static final int  TW_ENTITYREF     = 0x0040;
		

These flags can be combined using OR:



	    		Node iter=factory.create(root, TW_ELEMENT | TW_PI | TW_COMMENT | TW_EXPANDED);
			

The default view shows elements and text, but no other nodes (attributes are retrieved from the elements). The constant TW_DEFAULT is a mask that defines this default view.

If TW_ENTITYREF is not set, entities are expanded. If TW_ENTITYREF is set, entity references will be encountered by the iterator. There is no setting that shows both the entity reference and its expansion.

(ED: Several people have suggested that the functionality of whatToShow be implemented using filters. We feel that it is better to implement them using iterators, since it makes it possible to provide a more efficient implementation. A filter must examine each node individually; an iterator can make use of internal data structures to examine only those nodes that are desired.)
Return Value
TRUE if a this node is to be passed through the filter and returned by the NodeIterator::nextNode() method, FALSE if this node is to be ignored.

This method raises no exceptions.
Interface NodeFilter

Filters are simply objects that know how to "filter out" nodes. If an iterator is given a filter, before it returns the next node, it applies the filter. If the filter says to accept the node, the iterator returns it; otherwise, the iterator looks for the next node and pretends that the node that was rejected was not there.

The DOM does not provide any filters. Filter is just an interface that users can implement to provide their own filters. The introduction to this chapter gives an example of how a user can implement a filter to perform a specific function.

Filters do not need to know how to iterate, nor do they need to know anything about the data structure that is being iterated. This makes it very easy to write filters, since the only thing they have to know how to do is evaluate a single node. One filter may be used with a number of different kinds of iterators, encouraging code reuse.

IDL Definition
interface NodeFilter {
  boolean                   acceptNode(in Node n);
};

Methods
acceptNode
Parameters
n

The node to check to see if it passes the filter or not.

Return Value
TRUE if a this node is to be passed through the filter and returned by the NodeIterator::nextNode() method, FALSE if this node is to be ignored.

This method raises no exceptions.