[whatwg] textarea newline format - raw value vs. transformed value and setSelectionRange

Consider the following:

<!DOCTYPE html>
<html>
     <head>
         <meta charset="utf-8">
         <title></title>
         <script>
             window.addEventListener("DOMContentLoaded", function() {
                 var ta = document.getElementsByTagName("textarea")[0];
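                 // Show the value with each CR/LF percent-encoded (%0D / %0A).
                 alert(ta.value.replace(/\r|\n/g, encodeURIComponent));
                 ta.focus();
                 // Collapse the selection to a caret at offset 8.
                 ta.setSelectionRange(8, 8);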
             }, false);
         </script>
     </head>
     <body>
         <textarea rows="3">Line 1
Line 2
Line 3</textarea>
     </body>
</html>

Firefox 4 (latest trunk) and Opera 10.70 (latest snapshot) behave
differently here because they use different newline formats.

Firefox uses "\n" while Opera uses "\r\n", so offset 8 places the cursor
at a different position in each browser.
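
To make the difference concrete, here is a small sketch of where offset 8
falls under each newline format. The strings mirror the textarea content
above; the helper name splitAt and the variable names are made up for
illustration and are not part of any API.

```javascript
// Illustrative only: the same textarea content under the two newline formats.
var lf = "Line 1\nLine 2\nLine 3";
var crlf = "Line 1\r\nLine 2\r\nLine 3";

// Hypothetical helper: split a string at the offset where
// setSelectionRange(offset, offset) would place the caret.
function splitAt(s, offset) {
    return [s.slice(0, offset), s.slice(offset)];
}

// With "\n" newlines, offset 8 is after the "L" of "Line 2".
var lfParts = splitAt(lf, 8);

// With "\r\n" newlines, offset 8 is at the very start of "Line 2".
var crlfParts = splitAt(crlf, 8);
```

So the same call ends up one character apart after the first newline, and
the gap would grow by one for every further newline before the offset.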

See step 1 at  
<http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#attr-textarea-wrap-hard-state>.

That says that the 'value' getter returns the raw value with newlines
normalized to "\r\n". I always thought that meant the raw value (what
was parsed into the DOM) also contained newlines normalized to "\r\n" for
textareas, and that a browser with an HTML5 parser, like Firefox, would
automatically show newlines as "\r\n" without needing an internal
conversion in the 'value' getter. But now I'm not so sure.

I'm also not sure whether step 1 applies to the 'value' setter. It looks
like it doesn't, but I can't tell for certain.

Also, does everyone agree with step 1?

Also, I'm not sure if setSelectionRange() should operate on the raw value,  
or the transformed value in step 1.

Opera's not using an HTML5 parser yet, so I can't check what it might do,  
but could this be clarified?

In  
<http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream>  
it says:

"U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)  
characters are treated specially. Any CR characters that are followed by  
LF characters must be removed, and any CR characters not followed by LF  
characters must be converted to LF characters. Thus, newlines in HTML DOMs  
are represented by LF characters, and there are never any CR characters in  
the input to the tokenization stage."

Does that mean that the raw value of the parsed textarea should only ever  
have '\n' for newlines (unless the 'value' setter is used in JS to  
introduce '\r' characters)?
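
As a rough sketch of the preprocessing rule quoted above (only the CR/LF
part of it, and only as I read it), the following hypothetical function
shows why a parsed textarea would end up with "\n" alone. The name
preprocessNewlines is invented for this example.

```javascript
// Hypothetical helper modelling only the CR/LF handling quoted above:
// a CR followed by LF is removed, and a lone CR is converted to LF.
function preprocessNewlines(input) {
    // "\r\n" collapses to "\n"; a "\r" without a following "\n" becomes "\n".
    return input.replace(/\r\n?/g, "\n");
}

// Source markup saved with mixed line endings.
var source = "Line 1\r\nLine 2\rLine 3";
var parsed = preprocessNewlines(source);
var hasCarriageReturn = parsed.indexOf("\r") !== -1;
```

Under that reading, parsed contains no "\r" at all, so any "\r\n" seen
through the 'value' getter would have to come from a later transformation.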

If so, does that mean that setSelectionRange() should operate on the raw,  
internal value (that just has '\n' for newlines in it normally), but the  
'value' getter still returns the transformed value with newlines  
normalized to "\r\n"?
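
If that is the intended model, script would have to translate between the
two views itself. Below is a minimal sketch, assuming the raw value only
contains "\n" newlines and that step 1 amounts to turning every newline
into "\r\n". Both function names are hypothetical and nothing here is
taken from the spec text.

```javascript
// Hypothetical model of the step 1 transformation: every newline
// ("\r\n", lone "\r", or lone "\n") becomes "\r\n".
function toApiValue(raw) {
    return raw.replace(/\r\n|\r|\n/g, "\r\n");
}

// Hypothetical helper: map an offset into the raw value to the matching
// offset into the transformed value. Assumes raw uses "\n" only.
function rawOffsetToApiOffset(raw, offset) {
    return toApiValue(raw.slice(0, offset)).length;
}

var raw = "Line 1\nLine 2\nLine 3";
var apiValue = toApiValue(raw);

// Start of "Line 2": offset 7 in the raw value, offset 8 in the
// transformed value.
var startOfLine2 = rawOffsetToApiOffset(raw, 7);
```

Whichever value setSelectionRange() is defined against, authors would need
something like this to convert offsets taken from the other one.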

I see  
<http://www.whatwg.org/specs/web-apps/current-work/multipage/editing.html#dom-textarea/input-setselectionrange>,  
but it doesn't mention this.

-- 
Michael

Received on Sunday, 10 October 2010 20:44:02 UTC