25412 – Allow some recursion in NodeIterator/TreeWalker

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25412 - Allow some recursion in NodeIterator/TreeWalker

Summary: Allow some recursion in NodeIterator/TreeWalker

Status:	RESOLVED MOVED

Alias:	None

Product:	WebAppsWG
Classification:	Unclassified
Component:	DOM (show other bugs)
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Aryeh Gregor
QA Contact:	public-webapps-bugzilla

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-04-22 15:58 UTC by Aryeh Gregor
Modified:	2017-08-08 07:08 UTC (History)
CC List:	10 users (show)

See Also:

Attachments

Description Aryeh Gregor 2014-04-22 15:58:47 UTC

The effect of the active flag is that if the filter causes anything to be called that might invoke the filter, an InvalidStateError is thrown.  Is it really necessary to be so strict?  This test indicates that Chrome and IE both allow some amount of recursion before throwing:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/2957

Firefox seems to behave like the spec, but I think behavior more like Chrome/IE is preferable, because it's more tolerant of possibly reasonable code.  Chrome allows around 20 layers of recursion in my test, and IE allows 1770 or something.  I'd like to replace the active flag with a counter that caps at some arbitrary value like 100 or 1000.

Comment 1 Olli Pettay 2014-04-22 16:24:35 UTC

The flag does make sense to me.
What should nextNode() do if filter calls it recursively?



Relevant Gecko bug
https://bugzilla.mozilla.org/show_bug.cgi?id=559526

Comment 2 Boris Zbarsky 2014-04-23 00:38:34 UTC

Aryeh, it sounds probable that Chrome and IE are just hitting stack limits and throwing due to being out of stack space...  Good luck specifying that.

Comment 3 Aryeh Gregor 2014-04-23 12:08:46 UTC

(In reply to Olli Pettay from comment #1)
> The flag does make sense to me.
> What should nextNode() do if filter calls it recursively?

As now, except instead of a boolean flag to determine whether to throw InvalidStateError, have a counter that tracks recursion depth and only throw if it gets too high.  This still avoids crashes or running out of stack space, but is more generous in case the code does a limited amount of recursion for some reason.  I don't think we need to make all recursion throw, if we're only concerned about not running out of stack space -- given that Chrome and IE both do allow limited recursion.

(In reply to Boris Zbarsky from comment #2)
> Aryeh, it sounds probable that Chrome and IE are just hitting stack limits
> and throwing due to being out of stack space...  Good luck specifying that.

IE, maybe, but Chrome's limit is much too low to just be running out of stack space -- a maximum depth of around 20.

Comment 4 Boris Zbarsky 2014-04-23 14:08:35 UTC

> but Chrome's limit is much too low to just be running out of stack space -- a
> maximum depth of around 20

So I just poked around in Chrome's code and there is no limit check of any sort in the following functions: NodeIteratorBase::acceptNode, NodeIterator::nextNode, NodeIterator::previousNode, NodeFilter::acceptNode.

The exception message in this case is "Maximum call stack size exceeded."  The only place that string appears in the Blink code is in V8Binding.cpp in the handleMaxRecursionDepthExceeded function.  The only callers of that function are in V8ScriptRunner.cpp, which compares the recursion level to kMaxRecursionDepth before calling out into JS.  So this is not something specific to NodeIterator/TreeWalker, but a general limit of some sort Chrome has on invoking JS from C++.  kMaxRecursionDepth seems to be set to 22.

I don't think we want to duplicate this behavior across the whole platform.

Comment 5 Aryeh Gregor 2014-04-23 14:19:09 UTC

(In reply to Boris Zbarsky from comment #4)
> I don't think we want to duplicate this behavior across the whole platform.

Any specific reason why not, other than the fact that no one actually implements a NodeFilter-specific recursion limit right now?  The required changes to Gecko code would be negligible.  I guess the current way is simpler.

Anne, Ms2ger, any thoughts?

Comment 6 Boris Zbarsky 2014-04-23 14:45:44 UTC

> Any specific reason why not

Why not add a limit of 20 on event dispatch and other callback invocations?  Because I don't think it's needed and I don't think we need that compat hit.

> The required changes to Gecko code would be negligible.

I very much disagree, but maybe you're talking about a different change than the one I think you're talking about?

Comment 7 Aryeh Gregor 2014-04-23 15:12:52 UTC

(In reply to Boris Zbarsky from comment #6)
> Why not add a limit of 20 on event dispatch and other callback invocations? 
> Because I don't think it's needed and I don't think we need that compat hit.

What currently happens in these cases?

> > The required changes to Gecko code would be negligible.
> 
> I very much disagree, but maybe you're talking about a different change than
> the one I think you're talking about?

Change mInAcceptNode from boolean to int8_t, change "if (mInAcceptNode)" to "if (mInAcceptNode > MAX)", change "mInAcceptNode = true" to "mInAcceptNode++", change the AutoRestore thing to decrement instead of resetting to false.  Probably rename mInAcceptNode to mAcceptNodeCtr or something.

Comment 8 Boris Zbarsky 2014-04-23 15:37:57 UTC

> What currently happens in these cases?

We just make the call.  If we hit native stack limits, an exception is thrown.  More or less like the IE behavior you're seeing.

> Change mInAcceptNode 

OK, that's not the "duplicate this behavior across the whole platform" thing I was objecting to, so I'm not sure why you quoted that bit if you were not responding to it... ;)

Comment 9 Aryeh Gregor 2014-04-23 15:50:46 UTC

(In reply to Boris Zbarsky from comment #8)
> We just make the call.  If we hit native stack limits, an exception is
> thrown.  More or less like the IE behavior you're seeing.

So why do we have a special guard for this filter thing to begin with?  Why not just let the usual stack limit provisions handle it?

Comment 10 Boris Zbarsky 2014-04-23 15:56:40 UTC

> So why do we have a special guard for this filter thing to begin with?

Because the algorithm for NodeIterator/TreeWalker doesn't safely handle the case when the filter modifies the iterator/walker location, last I checked.

If it's changed to do that, we could probably remove the recursion check here.

Comment 11 Aryeh Gregor 2014-04-23 16:08:57 UTC

This doesn't catch that case for TreeWalker, because you can set .currentNode to anything from the filter and it doesn't hit the recursion guard.  For NodeIterator, it would behave strangely if you called nextNode() or previousNode() from the filter, but no more strangely than if you mutated the DOM it's operating on.  For instance, according to a strict implementation of the spec, you could get .referenceNode to not be a descendant of .root by calling .nextNode(), and then having the filter move the node somewhere else before the iterator actually updates its position.  Probably something similar is possible in implementations (I didn't test).

So I don't see why we shouldn't just remove the recursion guard entirely, given that IE/Chrome don't implement it.

Comment 12 Boris Zbarsky 2014-04-23 17:46:23 UTC

> So I don't see why we shouldn't just remove the recursion guard entirely, given
> that IE/Chrome don't implement it.

I think we can, as long as we audit the spec to make sure it behaves correctly without it.

Looking at Chrome's implementation of NodeIterator::nextNode, it doesn't match the spec in all sorts of ways (e.g. the m_candidateNode member has no equivalent in the spec, there is nothing like pointerBeforeReferenceNode in the Blink impl afaict, there's an extra m_detached check that has no spec equivalent, and a few other odds and ends), so the fact that they consider it safe to invoke nextNode from inside the filter tells me nothing about whether it's safe with the spec's algorithm.

Comment 13 Aryeh Gregor 2014-04-24 12:17:35 UTC

(In reply to Boris Zbarsky from comment #12)
> I think we can, as long as we audit the spec to make sure it behaves
> correctly without it.

What would you consider "correctly"?  It is certainly true that if the filter mutates the iterator, the return value of .nextNode()/.previousNode() will not reflect the iterator's current state in general.  I don't think this is a problem -- I would say if authors write such a filter, they deserve what they get.  The same type of issue arises if you mutate the DOM from the filter -- loops like while (node = iter.nextNode()) could fail to terminate, for instance.

Or did you mean something else by "correctly"?

Comment 14 Boris Zbarsky 2014-04-24 16:29:16 UTC

> What would you consider "correctly"?

That's a good question.

My main criterion is "A C++ implementation of this algorithm will not crash".  So no getting properties from null pointers, for example, or putting the iterator in a state that the spec doesn't expect it to be in.

Past that, I don't really care about _sanity_ in this case, because as you point out there won't be any no matter what.

Comment 15 Aryeh Gregor 2014-04-25 10:21:59 UTC

I just looked over the spec briefly, and am pretty sure nothing there would be made more problematic by removing the recursion guard.  As it stands, implementers have to be very careful here anyway, just like with mutation events, and e.g. hold strong references to everything they want to reference after the filter runs.

Comment 16 Anne 2014-04-25 10:33:09 UTC

Aryeh, how does it prevent https://bugzilla.mozilla.org/show_bug.cgi?id=559526 from happening if we just remove the flag?

Comment 17 Olli Pettay 2014-04-25 10:58:26 UTC

(In reply to Aryeh Gregor from comment #15)
> I just looked over the spec briefly, and am pretty sure nothing there would
> be made more problematic by removing the recursion guard.  As it stands,
> implementers have to be very careful here anyway, just like with mutation
> events,
"just like mutation events" and those are a nightmare which we all want
to get rid of. 

But even then, what is the sane behavior with nested
next/previousNode() calls? And is there any reason why we should allow that?

Comment 18 Olli Pettay 2014-04-25 11:09:30 UTC

Hmm, I was missing some bugmail here.
Indeed filter callback may mutate the DOM, so perhaps we could remove the flag.
However, disallowing nested calls does prevent certain types of
programming errors. I can't really see a good use case for nested calls.

Comment 19 Aryeh Gregor 2014-04-25 11:59:13 UTC

(In reply to Anne from comment #16)
> Aryeh, how does it prevent
> https://bugzilla.mozilla.org/show_bug.cgi?id=559526 from happening if we
> just remove the flag?

I don't understand how the crash happened there, but it looks like a QoI issue to me, not a spec issue.  The code just needs to make sure that it's reentrant (no static variables, etc.) and allows for the possibility that the current node changed when the filter was called.  If the spec changes, I'll fix Gecko and make sure that it doesn't crash on the testcase through some other means.

(In reply to Olli Pettay from comment #18)
> However, disallowing nested calls does prevent certain types of
> programming errors. I can't really see a good use case for nested calls.

Agreed, but I don't see why we should go out of our way to prevent them.  As Boris alluded to in comment 6, there are other places where we have callbacks, and we don't require recursion guards like this there.  Removing the active flag makes the spec that much shorter and simpler, and probably requires fewer UAs to change (at least among IE/Chrome/Gecko).

Comment 20 Olli Pettay 2014-04-25 12:05:06 UTC

(In reply to Aryeh Gregor from comment #19)
> Agreed, but I don't see why we should go out of our way to prevent them.  As
> Boris alluded to in comment 6, there are other places where we have
> callbacks, and we don't require recursion guards like this there.
And we have other places where we have guards to prevent recursions.
For example click()


But I could live with the change, and after looking at Gecko code it should
be able to deal with this quite easily.


However, before the change some more testing with UAs should be done.
How do they deal with nested calls. I'm not talking about endless recursion
but filter cb just calling prev/nextNode() once or couple of times etc.

Comment 21 Anne 2014-04-25 13:02:57 UTC

The proposed change is here btw: https://github.com/ayg/dom/commit/61544aa440d0ef6cff10776a952a9562aa67e295

Comment 22 Anne 2014-04-28 12:49:41 UTC

Aryeh, are you willing to do the additional tests Olli asks for?

Comment 23 Aryeh Gregor 2014-04-28 12:55:30 UTC

I already tested what a few UAs (IE/Chrome/Firefox) did in this case, which is how I came to file this bug.  I don't know what other testing is necessary.

Comment 24 Anne 2014-04-28 13:32:16 UTC

https://github.com/whatwg/dom/commit/ccf51e85f205efd8a4497322f35fc39226e6616c

Comment 25 Olli Pettay 2014-04-28 14:08:52 UTC

Where are those tests? The one mentioned in comment 0 tests just nextNode() + prevNode(). nextNode() + nextNode() would be more interesting, and in a way which doesn't lead to endless recursion.

I'd just like to see some testing so that we don't end up changing spec
which is followed by 1 or 2 (Gecko + Presto) implementations to something
which isn't followed by any implementation.

Comment 26 Arkadiusz Michalski (Spirit) 2014-08-24 00:08:47 UTC

Reopened because IE11 and Chrome36 allow recursion but support them in other way. Small test case with only NodeIterator and NodeIterator.nextNode().

<div id="box">
	<p>First P.</p>
	<P>Second P.</p>
	<P>Third P.</P>
	<hr>
</div>

<script>

	var root = document.getElementById("box");
	var i = 0;
	var iterator = document.createNodeIterator(root, NodeFilter.SHOW_ELEMENT, function(node){

		i++;

		if (i < 4){

			(function(i){

				var node = iterator.nextNode();
				document.write("iterator.nextNode(): " + node);
				document.write("<br>");
				document.write("iterator.nextNode().textContent: " + node.textContent);
				document.write("<br>");
				document.write(i);
				document.write("<br><br>");
			
			})(i);

		}

		return NodeFilter.FILTER_ACCEPT;

	}, false);

	document.write("init: " + iterator.nextNode());

</script>

Results:
- IE iterate by all (root and P);
- Chrome stuck on root;

Maybe its only bug in Chrome and allow recursion can stay in DOM, or maybe dissallow this completly (like was before).

Comment 27 Anne 2014-08-24 06:55:57 UTC

Chrome does not seem to throw an exception there. How does this relate to the active flag?

Comment 28 Boris Zbarsky 2014-08-24 13:11:15 UTC

Arkadiusz's point, if I understood it correctly, is that when recursion happens IE and Chrome behave differently.  Which, if either, matches what the spec is currently specifying?

Quite honestly, I don't think this spec change should have been made.  It clearly makes the behavior more complicated to specify and implement interoperably...  assuming interoperable implementation is even possible now.

Comment 29 Arkadiusz Michalski (Spirit) 2014-08-24 13:57:04 UTC

Remove active flag allow recursion (what we have in Chrome and IE) for NodeIterator/TreeWalker, but we still get different behaviour when use recursion and not out of stack limit. So probably this change need more test and work by implementators oposite to not remove active flag. I'm not sure wich browsers do this rigth now (if any, when read "traverse" algo in DOM, now realize that probably Chrome).

So if intention of this bug is only allow recursion for traversal (without paying attention to what is the result in actual browsers) then can be close again and wait that browsers will be corrected. Just report this behaviour, because one example when we cross stuck limit is not enought (like Olli Pettay noted).

Comment 30 Aryeh Gregor 2014-08-25 13:05:08 UTC

I'm not active in spec stuff these days, I'm doing Gecko code cleanup with what little time I spend on work.  I won't object at this point if this is reverted and WONTFIXed, although I do still think it shouldn't be.

Comment 31 Philippe Le Hegaret 2014-09-19 18:30:29 UTC

For completeness, WPT has a PR with a test that invokes nextNode() recursively in
TreeWalker-navigate-from-filter.html :
 https://github.com/w3c/web-platform-tests/pull/1235

Comment 32 Boris Zbarsky 2014-09-19 18:42:27 UTC

Note that those tests check that an exception is thrown (like the spec used to say) and don't check interesting edge-case behavior in the cases when none is thrown or infinite recursion hits.

Comment 33 Anne 2016-08-17 12:45:53 UTC

Aryeh, any thoughts on this two years later? Given the inconsistent implementations I'm inclined to revert the change and add the flag back, though it would be good to get some input from Blink and WebKit folks as they appear to handle this differently.

Comment 34 Aryeh Gregor 2016-08-17 13:22:12 UTC

I don't think it makes sense to add this flag here unless we add such a flag in every case where we allow the user to supply a callback of some kind.  What's special here?  If UAs don't want to crash on recursive JS calls, they need general protection against it to prevent function f() { f() }; f() from crashing.  I tested in Edge and Chrome just now, and both allow recursion in this case (100 for Chrome and 10000 for IE).  Edge, Firefox, and Chrome also all have some sort of generic limit against infinite recursion in JavaScript:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4387

So no, I definitely do not think we should revert to the previous spec, since the current spec matches all UAs other than Gecko, and is simpler and more consistent.  Gecko should change, and if they (we) have concerns about crashes, that's a QoI issue that all other UAs seem to have dealt with to their satisfaction without imposing a special recursion guard.  And which Gecko deals with for all other JS.

Comment 35 Boris Zbarsky 2016-08-22 13:58:46 UTC

> since the current spec matches all UAs other than Gecko

I'm sorry, but that's bullshit.  The current spec can't match "all UAs other than Gecko", because as clearly stated in this bug they DON'T MATCH EACH OTHER.

Comment 36 Aryeh Gregor 2016-08-22 14:17:26 UTC

To be more precise: the current spec specifies no special recursion limit, which is closer to a limit of 10000 or 100 (Edge/Blink) than 1 (Gecko).

Or, if you prefer: the current spec matches Edge and Blink for reasonable code, where code that recurses more than 100 levels is classified as "unreasonable".

However you phrase it exactly, I still don't see any argument for prohibiting recursion here entirely.

You are correct that all UAs do have some sort of recursion limit here that seems to be lower than their generic JavaScript stack limit, based on comparing these two tests:

http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=2957
http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4387

Apparently such an extra limit is necessary, although I'm not sure why.  If we wanted to spec it, we could, but I'm not sure it's necessary unless you think the difference between 100 and 10000 is going to be significant on real-world pages.  Generally I think we don't bother speccing this sort of arbitrary cap, although maybe we should?

Comment 37 Boris Zbarsky 2016-08-22 14:30:10 UTC

I'm talking about behavior when reentry happens, which is not interoperable across browsers.  I'm not talking about the precise point at which recursion is cut off, because that's not really possible to make interoperable short of imposing some arbitrary small-enough-to-not-blow-out-C-stacks limit.  Of course that arbitrary limit could just be "1" at that point...

But again, my point was that the current spec, which allows reentry and specifies what the behavior should be when reentry happens, does NOT in fact match the behavior of all UAs that allow reentry, because those UAs don't match each other.

Comment 38 Aryeh Gregor 2016-08-22 15:23:26 UTC

Okay, okay, I just completely failed to understand what was going on, as usual.  :)  Yes, you're right (as usual), they aren't interoperable.  But I still don't see the problem speccing this.  I think the spec says that Blink's behavior is correct: referenceNode is only updated after the filter is run, so when the filter is called repeatedly, it gets the same referenceNode.  Edge is presumably updating referenceNode before calling the filter.  Is there something wrong with leaving the spec as-is, writing a test, and declaring Edge buggy?  Are there cases that are unclear in the current spec?

Comment 39 Boris Zbarsky 2016-08-22 15:41:15 UTC

I don't think there is anything wrong with that approach, assuming we actually write decent tests that test the various spec edge cases, check them against implementations, file bugs as needed, and get implementors to agree on what the spec should say.

This last step is key: I am willing to remove the recursion change from Gecko, possibly in conjunction with some changes to align to the spec, but I am not willing to do that, then discover that what the spec says is not web-compatible or not willing to be implemented by other UAs and change behavior _again_.  I've been burned by that sort of thing too many times in the past.

Comment 40 Aryeh Gregor 2016-08-22 17:46:56 UTC

Do you think exact behavior in the event of recursion here is likely to have any implications for web compat?  Gecko doesn't support it at all and hasn't gotten bug reports (has it?), and other UAs don't agree on the most basic aspects of how it behaves.  It doesn't seem likely anyone is relying on it.  (But it's still worth speccing because these edge cases can collectively contribute to lack of interop, even if none is important by itself.)

Are there any specific edge cases you would like tested?

By the way, these issues with recursion are similar to what comes up if you set TreeWalker's .currentNode from the filter, right?

Comment 41 Arkadiusz Michalski (Spirit) 2016-08-22 17:50:50 UTC

What is the benefits of recursion if we stuck on referenceNode no matter how much we call filter? IE/Edge behaviour is not more useful here?

Comment 42 Aryeh Gregor 2016-08-22 17:56:16 UTC

I don't know why anyone would actually want to use recursion here.  Unless someone has a use-case, I just want something standardized that requires as little implementation change as possible to get interop, so that when pages do weird stuff they'll be identically weird in all browsers.  I don't care if the recursion actually does anything useful.  3/4 of engines do allow some type of recursion, and it's slightly simpler to spec, so I don't think we should go out of our way to prohibit it.  If WebKit and Chrome agree, I'd go with them, unless Edge's behavior is more useful for something.

Comment 43 Boris Zbarsky 2016-08-22 18:56:03 UTC

> Do you think exact behavior in the event of recursion here is
> likely to have any implications for web compat?

I have a hard time judging likelihood here.  I know for a fact I've encountered cases where pages expected an operation to either throw or work a specific way and broke when it didn't throw and didn't do that specific thing.  So I'm wary of assuming that something that throws in one UA but doesn't in others is OK to not throw but also not match the other UAs.

> Are there any specific edge cases you would like tested?

I haven't looked at this API in enough depth recently to answer this question off the top of my head.  I can try to make time to do that if no one else has either, I guess.

> By the way, these issues with recursion are similar to what comes up if you set
> TreeWalker's .currentNode from the filter, right?

Somewhat, yes.

In terms of implementation complexity, prohibiting something outright is _way_ easier to implement and reason about than allowing it but verifying that it does not violate various invariants.  So if the goal is to have as little implementation change as possible, measured in man-hours of both implementors and spec writers, throwing on reentry is the thing to do, just fwiw.

Comment 44 Aryeh Gregor 2016-08-23 10:51:55 UTC

That's true.  So from that perspective, on consideration, I'd be happy with just banning recursion (like the old spec and Gecko) if other UAs are interested in changing.  If no one has a use-case for recursion here, and no one has requested it in Gecko for all these years, it doesn't make sense to waste time speccing and testing and converging.  Calling .nextNode() from within a .nextNode() call does not seem like a reasonable thing to do ever.

I would like buy-in from at least one additional UA, though.

Comment 45 Travis Leithead [MSFT] 2016-08-23 14:57:45 UTC

Can we get some use-counter data to support the limitation? It would be helpful to know what the real potential impact of this change might be on the web...

Comment 46 Aryeh Gregor 2016-08-23 15:38:21 UTC

We should be able to add usage counting to Gecko, but I don't know how interesting it will be.  Currently we throw in this case, and have thrown for years, and have never received any complaints that I know of, so we already know that the behavior is web-compatible for us.  If you're concerned that you might go down different code paths than Gecko, you'll have to add usage counting yourself.  (This doesn't strike me as plausible in this case, but the web is a very strange place!)

Comment 47 Aryeh Gregor 2016-10-29 20:28:06 UTC

https://github.com/whatwg/dom/pull/359