[whatwg] textarea newline format - raw value vs. transformed value and setSelectionRange

On Tue, 04 Jan 2011 19:38:17 -0500, Ian Hickson <ian at hixie.ch> wrote:

> On Sun, 10 Oct 2010, Michael A. Puls II wrote:
>>
>> Consider the following [simplified]:
>>
>> <!DOCTYPE html>
>> <title></title>
>> <script>
>>   window.addEventListener("DOMContentLoaded", function() {
>>       var ta = document.getElementsByTagName("textarea")[0];
>>       ta.value = ta.value.replace(/\r|\n/g, encodeURIComponent);
>>   }, false);
>> </script>
>> <textarea rows="3">Line 1
>> Line 2
>> Line 3</textarea>
>>
>> The behavior between Firefox 4 latest trunk and Opera 10.70 latest
>> snapshot is different because they're using different newline formats.
>
> The correct behaviour is that the element's value becomes
>    "Line 1%0ALine 2%0ALine 3"

O.K.
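
To make that concrete, here is a minimal sketch of the same replace call run against a plain string instead of a live textarea. It assumes the raw value holds LF-only newlines, as the parser would leave it; `rawValue` and `encoded` are names made up for this example.

```javascript
// Assumption: the parsed textarea's raw value uses LF-only newlines.
const rawValue = "Line 1\nLine 2\nLine 3";

// Same replace as in the test page: each CR or LF is handed to
// encodeURIComponent, which percent-encodes it. Only the first callback
// argument (the matched character) is used by encodeURIComponent.
const encoded = rawValue.replace(/\r|\n/g, encodeURIComponent);

console.log(encoded); // Line 1%0ALine 2%0ALine 3
```

If the getter handed back "\r\n" newlines instead, the same call would produce "%0D%0A" between the lines.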

>> See step 1 at
>> <http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#attr-textarea-wrap-hard-state>.
>>
>> That says that the 'value' getter returns the raw value + newlines
>> normalized to "\r\n".
>
> No, it says that the submission value has that transformation applied.
> The '.value' getter returns the _raw_ value, which doesn't have U+000Ds
> added by the user agent (they can only be there if the script added them).

O.K.
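
As a rough sketch of the distinction, the newline part of the submission transformation can be modelled as a pure function over the raw value. The helper name `toSubmissionNewlines` is hypothetical, and any hard-wrap line breaking is left out; this only shows that the raw value itself stays untouched.

```javascript
// Hypothetical helper: models only the newline normalization applied
// when building the submission value. Not a full implementation.
function toSubmissionNewlines(raw) {
  // Match an existing CRLF pair first so it is not doubled, then
  // lone CR and lone LF characters.
  return raw.replace(/\r\n|\r|\n/g, "\r\n");
}

const rawText = "Line 1\nLine 2\nLine 3";
const submitted = toSubmissionNewlines(rawText);

console.log(JSON.stringify(rawText));   // raw value is unchanged
console.log(JSON.stringify(submitted)); // newlines are now CRLF
```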

>> I always thought that meant that the raw value (what was parsed into
>> the DOM)
>
> The "raw value" is what the user edits.

O.K.

>> contained newlines normalized to "\r\n" too for textareas and that a
>> browser with an HTML5 parser like Firefox would automatically show
>> newlines normalized to "\r\n" without even having a conversion done
>> (internally) for the 'value' getter.
>
> No, the HTML parser strips U+000D characters ("\r").

O.K.

>> I'm also not sure step 1 applies to the 'value' setter. I can't tell for
>> sure. It looks like not, but not sure.
>
> It doesn't apply to .value at all, only to the 'value' concept, which is
> a concept used in form submission and constraint validation.

O.K.

>> Also, I'm not sure if setSelectionRange() should operate on the raw
>> value, or the transformed value in step 1.
>
> Raw value, because <textarea> is defined as an element that "represents a
> multiline plain text edit control for the element's raw value".

O.K.
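
The practical consequence is that selection offsets count each newline as one character. A small sketch of how the two numberings diverge, assuming an LF-only raw value; `rawToSubmissionOffset` is a hypothetical helper for illustration, not a DOM API.

```javascript
// Hypothetical helper: maps an offset into an LF-only raw value to the
// corresponding offset in the CRLF-normalized form of the same text.
function rawToSubmissionOffset(raw, offset) {
  // Every LF before the offset becomes two characters (CR LF).
  const lfCount = raw.slice(0, offset).split("\n").length - 1;
  return offset + lfCount;
}

const text = "Line 1\nLine 2\nLine 3";
const rawStart = text.indexOf("Line 3");
const submissionStart = rawToSubmissionOffset(text, rawStart);

console.log(rawStart);        // 14
console.log(submissionStart); // 16
```

So a script that wants to select "Line 3" would pass offsets based on 14, not 16, to setSelectionRange().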

>> Opera's not using an HTML5 parser yet, so I can't check what it might
>> do, but could this be clarified?
>
> It's not clear to me what isn't clear. :-) Could you elaborate on what
> the spec says that led you to your interpretation?

At some point, I got the idea that all browsers were going to make the  
.value setter/getter normalize newlines so that it matched the newline  
format that's submitted. Opera does this.

I don't remember how I got the idea for sure, but I think I suggested this  
a while ago and just thought you agreed and put it in the spec. I guess it  
just slipped by me that you were not talking about the value getter/setter.

So, Opera is just completely wrong with its behavior. Even when it gets
an HTML5 parser, the value and textContent getters/setters and the user
input handling will have to be fixed so they do not normalize newlines to
\r\n. Then Opera will match Firefox, I think. Although, last time I
checked, I think WebKit's .value getter/setter normalizes everything to
just \n, which would be wrong too, as no normalization should be done.

So, I understand now. If everyone else understands too, no need to clarify  
anything here.

But what happens when pressing ENTER in a textarea? Should it always
create a \n in the raw value? What if you paste content that has
"Line 1\r\nLine 2" into an empty textarea? Will the raw value contain
"Line 1\nLine 2" then?

Just want to make sure.

>> In
>> <http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream>
>> it says:
>>
>> "U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
>> characters are treated specially. Any CR characters that are followed by
>> LF characters must be removed, and any CR characters not followed by LF
>> characters must be converted to LF characters. Thus, newlines in HTML
>> DOMs are represented by LF characters, and there are never any CR
>> characters in the input to the tokenization stage."
>>
>> Does that mean that the raw value of the parsed textarea should only
>> ever have '\n' for newlines (unless the 'value' setter is used in JS to
>> introduce '\r' characters)?
>
> Yes.

O.K.
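
The quoted preprocessing rule can be sketched as a single replace. `preprocessNewlines` is a hypothetical name, and this models only the CR/LF handling from the quoted paragraph, not the rest of input-stream preprocessing.

```javascript
// Hypothetical helper modelling the quoted rule: a CR followed by LF is
// removed (leaving the LF), and a CR not followed by LF becomes LF.
function preprocessNewlines(input) {
  // "\r\n?" matches CRLF as one unit, or a lone CR; both become one LF.
  return input.replace(/\r\n?/g, "\n");
}

const parsed = preprocessNewlines("Line 1\r\nLine 2\rLine 3\nLine 4");

console.log(JSON.stringify(parsed)); // "Line 1\nLine 2\nLine 3\nLine 4"
```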

>> If so, does that mean that setSelectionRange() should operate on the
>> raw, internal value (that just has '\n' for newlines in it normally),
>> but the 'value' getter still returns the transformed value with newlines
>> normalized to "\r\n"?
>
> The value getter doesn't return the transformed value. See the
> definition of the value getter for details.

O.K.

>> I see
>> <http://www.whatwg.org/specs/web-apps/current-work/multipage/editing.html#dom-textarea/input-setselectionrange>,
>> but it doesn't mention this.
>
> I've clarified the spec to indicate that setSelectionRange() and company
> operate on the raw value.

Thanks

-- 
Michael

Received on Tuesday, 4 January 2011 17:30:33 UTC