Bug 17541 - Carefully think through how Range.insertNode should behave when the node to be inserted is already at the beginning of the range
Summary: Carefully think through how Range.insertNode should behave when the node to b...
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: DOM (show other bugs)
Version: unspecified
Hardware: All Windows 3.1
: P2 minor
Target Milestone: ---
Assignee: Anne
QA Contact: public-webapps-bugzilla
Depends on:
Reported: 2012-06-19 07:02 UTC by Aryeh Gregor
Modified: 2014-04-30 13:58 UTC (History)
6 users (show)

See Also:


Note You need to log in before you can comment on or make changes to this bug.
Description Aryeh Gregor 2012-06-19 07:02:52 UTC

data:text/html,<!doctype html>
var range = document.createRange();
// Collapse range between "ab" and "cd"
range.setStart(document.body.firstChild.firstChild, 2);
document.body.textContent =
range.startContainer.nodeName + " " +
range.startContainer.nodeValue + " " +
range.startOffset + " " +
range.endContainer.nodeName + " " +
range.endContainer.nodeValue + " " +

The spec appears to require "P null 0 P null 2", and this is what Firefox 16.0a1 does.  Chrome 21 dev throws HierarchyRequestError.  Opera Next 12.00 alpha gives "P null 0 #text cd 0".  IE10 Developer Preview gives "#text ab 0 P null 2".  Or in pictures:

  Spec, Firefox: {abcd}
  Opera:         {ab]cd
  IE:            [abcd}

On reflection, the spec/Firefox behavior does make the most sense given what the user asked for.  But this is a complete coincidence -- basically we have two spec bugs combining to cancel out.  On the one hand we're splitting the node the user asked to insert, so really now we're only inserting half of it.  But on the other hand, the computation for "new offset" doesn't account for the fact that we're not actually inserting a node that wasn't already there, so it incorrectly adds one to the offset and puts the end after the reference node (when it was supposed to be before).  So in the end we actually include the whole node that we were asked to insert -- it's just split.

This is kind of amusing, but it does indicate that the algorithm needs more thought.  At the very least, a note needs to alert the reader to these subtleties.  I discovered it when I was fixing <https://bugzilla.mozilla.org/show_bug.cgi?id=765799>, and I initially tried to do so by substituting a different algorithm that didn't use indexOf() at all.  Which failed in this case.
Comment 1 Anne 2014-02-13 16:27:33 UTC
I don't really understand what is going on here. E.g. if I change the 2 in the setStart() call to 3 or 1 the result is still the same in Gecko. I don't know why.
Comment 2 Aryeh Gregor 2014-02-19 12:19:59 UTC
It doesn't matter where in the text node it falls, no.  Why should it?

What the spec currently requires is this:

Step 3: Split the text node, set referenceNode to the second one ("cd").  The selection is at the end of the first node ("ab"), which is <var>node</var>.

Step 5: parent is <p>.

Step 6: newOffset is 1.

Step 7: newOffset is 2.  This is wrong -- it's meant to adjust for the fact that we're about to add a node, but actually we remove a node first.

Step 8: We remove the "ab" node, which causes the range to jump up to (<p>, 0).  This is not anticipated.  Then we reinsert the node, and the range is still at (<p>, 0).  This means that the range has jumped from "ab[]cd" to "{}abcd".

Step 9: Set the end of the parent to (<p>, 2), so it's "{abcd}".

I assume that's what Gecko does, without looking at the source.

What's expected behavior in this case?  There are two bugs here, I think:

1) We set the new position of the range (step 7) without noticing if we're going to remove a node.

2) We split the node we were given, so we only wind up moving half of it.

As I noted, these happen to more or less cancel, but that's accidental.  You could have a non-text node case where they don't, like:

data:text/html,<!doctype html>
var range = document.createRange();
/* Collapse range at end of <b> */
range.setStart(document.querySelector("b"), 1);
document.body.textContent =
range.startContainer.nodeName + " " +
range.startContainer.nodeValue + " " +
range.startOffset + " " +
range.endContainer.nodeName + " " +
range.endContainer.nodeValue + " " +

In this case the spec would say to throw an IndexSizeError, which is what Gecko in fact does, because the last step of the algorithm tries to set the range beyond the offset of the parent.  Again, step 7 increases the length to account for a node that is not in fact going to be removed.

I think the simplest way to fix this is just to remove <var>node</var> from its parent before doing anything else.  I think that would mean that the example from comment #0 would result in "{abcd}" with the text node not split, and the second case I give would result in "<b>{foo}</b><i>bar</i>", both of which strike me as quite expected.

Comment 3 Anne 2014-03-19 16:14:58 UTC
So we'd add a step 0 that'd check if /node/ has a parent, and if so, removes it from that parent? Or should that be step 1.5 so we throw an exception first?
Comment 4 Aryeh Gregor 2014-04-02 14:15:24 UTC
Definitely throw first.  Otherwise, if you called it on a Comment, it would remove the node and then do nothing else, which is silly.
Comment 6 Aryeh Gregor 2014-04-23 14:52:10 UTC
This has an undesired side effect I spotted while updating the tests -- if the eventual insertion will fail, we still removed the node from its parent.  E.g., if you have a range in the middle of a text node and try range.insertNode(document.doctype), we will now remove the doctype from its current position and then fail to insert it.  Maybe we should test that the node can actually be inserted at the current position before doing any DOM mutations?  We've avoided doing that to date, so e.g. we will split a text node even if the eventual insert will fail.
Comment 7 Anne 2014-04-23 17:49:11 UTC
The "can be inserted" check is quite elaborate as can be seen in #concept-node-pre-insert & friends. Is that really necessary?
Comment 8 Aryeh Gregor 2014-04-24 12:47:19 UTC
If we want to leave the DOM in a consistent state, yes, I think so.  I.e., if we don't want an algorithm that's going to fail in the end to do the job halfway before failing.  I think this would be useful for some of the other nontrivial algorithms in the spec as well, although I can't remember which offhand.  If we want things to be fully correct, we should split out all those checks into a separate algorithm we can call from wherever needed.
Comment 9 Anne 2014-04-24 13:23:29 UTC
Are implementers interested in maintaining that as well? Note that we already have two algorithms (in pre-insert and replace) for checking that are much alike, but different due to context.
Comment 10 Aryeh Gregor 2014-04-24 13:29:55 UTC
I think those two can be unified and split into a separate algorithm.  Let me try.
Comment 11 Aryeh Gregor 2014-04-24 13:41:35 UTC
I looked closer and you're right, they can't.  However, insertNode() should be able to use the same exact checks as pre-insert, no extra maintenance needed.  Are you interested in reviewing a patch if I write one?
Comment 12 Anne 2014-04-24 13:49:23 UTC
Comment 13 Anne 2014-04-28 12:52:03 UTC
See https://github.com/whatwg/dom/pull/11