This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 14254 - insertText has to handle things like \r, \0, etc. sanely
Summary: insertText has to handle things like \r, \0, etc. sanely
Status: NEW
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: HISTORICAL - HTML Editing APIs
Version: unspecified
Hardware: All Windows 3.1
Importance: P2 minor
Target Milestone: ---
Assignee: Aryeh Gregor
QA Contact: HTML Editing APIs spec bugbot
Depends on:
Reported: 2011-09-22 21:19 UTC by Aryeh Gregor
Modified: 2012-12-04 00:52 UTC
CC List: 3 users

See Also:


Description Aryeh Gregor 2011-09-22 21:19:49 UTC
What should happen if you do document.execCommand("inserttext", false, "\0") or something like that?  A DOM containing that string won't survive a round trip through text/html serialization and parsing.  Presumably the input needs to be sanitized somehow, but how?  The brute-force option is to say you have to apply a function that works like

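  function normalizeText(text) {
    var span = document.createElement("span");
    span.textContent = text;          // store the raw string as a single text node
    span.innerHTML = span.innerHTML;  // serialize to text/html, then reparse it
    return span.textContent;          // read back the text as the parser left it
  }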
This will work, but is there a simpler way?  It would be pretty ridiculous to require calling the HTML parsing and serialization algorithms here.  I could just require that the results be the same, but that invites bugs.
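One possible answer to "is there a simpler way?", offered as an assumption rather than anything the spec adopted: at least for CR and NUL, the effect of the parse/serialize round trip in normalizeText can be written down directly. The HTML parser's input-stream preprocessing turns CRLF and lone CR into LF, and in body content the tree builder ignores U+0000 character tokens. The name normalizeTextLikeParser is hypothetical, and other edge cases (such as lone surrogates) are not modeled.

```javascript
// Hypothetical helper, not from any spec: approximates what the
// textContent -> innerHTML -> innerHTML round trip does to plain text,
// assuming the fragment is parsed in an "in body"-like context.
function normalizeTextLikeParser(text) {
  return text
    // Input-stream preprocessing: CRLF and lone CR both become LF.
    .replace(/\r\n?/g, "\n")
    // The tree builder ignores U+0000 character tokens in body content.
    .replace(/\0/g, "");
}

// Usage: CR and CRLF become LF, the NUL disappears.
console.log(JSON.stringify(normalizeTextLikeParser("a\r\nb\rc\0d")));
// prints "a\nb\ncd"
```

Whether a spec should mandate this directly or only require results equal to the round trip is exactly the trade-off raised above; the sketch just shows that the CR/NUL part is small.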
Comment 1 Simon Pieters 2011-09-23 08:26:12 UTC
The DOM can already contain \r and \0, e.g. by setting textContent. Why is it a problem for execCommand?
Comment 2 Aryeh Gregor 2011-09-23 19:00:44 UTC
Hmm . . . good point.  I was aiming to only produce DOMs that serialize as text/html, but if the author is going to go out of their way to have scripts insert bogus stuff like this, no reason we should stand in their way.

In that case, I need to update gentest.html so it will generate these tests even though the DOM doesn't serialize.  Currently it assumes that such a test is buggy and skips generating it.
Comment 3 Ehsan Akhgari [:ehsan] 2011-09-23 19:15:05 UTC
FWIW, Gecko doesn't store \r characters in the DOM, but it stores every other character as passed in.  The only reason we handle \r specially is to avoid line-breaking hell across multiple platforms.
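The policy described in the comment above (keep everything except CR) can be illustrated with a small hypothetical sketch; it is not Gecko code. The comment doesn't say whether a CR is dropped or converted to a line feed, so the invented helper filterInsertedText takes a crMode parameter covering both guesses. Unlike the parser round trip, NUL passes through unchanged.

```javascript
// Hypothetical sketch of "store every character as passed in, except CR".
// Whether CR is dropped or turned into LF is an assumption, hence crMode.
function filterInsertedText(text, crMode = "drop") {
  if (crMode === "lf") {
    // Treat CRLF as one line break, and a lone CR as a line break too.
    return text.replace(/\r\n?/g, "\n");
  }
  // Default: remove every CR; NUL and all other characters survive.
  return text.replace(/\r/g, "");
}

console.log(JSON.stringify(filterInsertedText("a\r\nb\0c"))); // prints "a\nb\u0000c"
console.log(JSON.stringify(filterInsertedText("a\rb", "lf"))); // prints "a\nb"
```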