This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
At the moment, the input value sanitization algorithm is required to run only in very specific situations (form reset, type change, content attribute set), which means that if, for example, the user types " foo@bar.com ", it seems fine per spec to return " foo@bar.com " as the input's value. Chrome behaves differently: " foo@bar.com " yields "foo@bar.com" as the input's value, because Chrome runs the sanitization algorithm more often than the specification requires. This is leading to developer confusion (see the discussion at https://bugzil.la/1132142). As far as Boris and I can tell, this behaviour is not per spec, and it might be good to update the spec to require it, given that it would significantly improve developers' experience. Tamura, what do you think?
I thought the intention of the specification was that "value IDL attribute always returns a sanitized value." Returning an arbitrary string doesn't help developers. The specification doesn't assume any UI. Sanitizing user input in Chrome is a part of the UI implementation.
> I thought the intention of the specification was that "value IDL attribute
> always returns a sanitized value."
Where do you see this in the specification? Per spec, _setting_ the value sanitizes, but getting .value doesn't perform sanitization of any sort.
> Sanitizing user input in Chrome is a part of the UI implementation.
Can you not dispatch key events to the input to effectively type in a value?
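The difference under discussion can be sketched with a small model. This is only an illustration, not a real DOM object: `ModelEmailInput`, `userTypes`, and the `sanitizeOnUserInput` switch are invented names, and the sanitization step assumes the HTML Standard's rule for type=email without `multiple` (strip newlines, then strip leading and trailing ASCII whitespace).

```javascript
// Sketch only: a hypothetical model of <input type=email>, not a real DOM object.

// Assumed value sanitization for type=email without `multiple`:
// strip newlines, then strip leading/trailing ASCII whitespace
// (TAB, LF, FF, CR, SPACE).
function sanitizeEmailValue(value) {
  return value
    .replace(/[\n\r]/g, "")
    .replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
}

class ModelEmailInput {
  // sanitizeOnUserInput is an invented switch between the two behaviours.
  constructor(sanitizeOnUserInput) {
    this.sanitizeOnUserInput = sanitizeOnUserInput;
    this.storedValue = "";
  }

  // Setting .value from script sanitizes in both models.
  set value(newValue) {
    this.storedValue = sanitizeEmailValue(String(newValue));
  }

  // Getting .value returns whatever is stored, with no further sanitization.
  get value() {
    return this.storedValue;
  }

  // Stand-in for the user typing into the control.
  userTypes(text) {
    this.storedValue = this.sanitizeOnUserInput
      ? sanitizeEmailValue(text)
      : text;
  }
}

const specMinimum = new ModelEmailInput(false);
const chromeLike = new ModelEmailInput(true);
specMinimum.userTypes(" foo@bar.com ");
chromeLike.userTypes(" foo@bar.com ");

console.log(JSON.stringify(specMinimum.value)); // " foo@bar.com "
console.log(JSON.stringify(chromeLike.value)); // "foo@bar.com"
```

In this model both inputs agree once a script assigns `.value`; they only diverge on text that arrives through the UI.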
(In reply to Boris Zbarsky from comment #2)
> > I thought the intention of the specification was that "value IDL attribute
> > always returns a sanitized value."
>
> Where do you see this in the specification? Per spec, _setting_ the value
> sanitizes, but getting .value doesn't perform sanitization of any sort.
Nowhere. It was just my impression when we decided the behavior.
> > Sanitizing user input in Chrome is a part of the UI implementation.
>
> Can you not dispatch key events to the input to effectively type in a value?
Do you mean refusing to let the user type leading/trailing whitespace? Yeah, we can do it in many cases.
No, I mean if a page can dispatch key events that look like typing to the input, and that will cause its value to change, then the sanitization behavior of typing, if any, needs to be specified.
(In reply to Boris Zbarsky from comment #4)
> No, I mean if a page can dispatch key events that look like typing to the
> input, and that will cause its value to change, then the sanitization
> behavior of typing, if any, needs to be specified.
It's unacceptable. Even non-editable <div tabindex=0> dispatches key events.
> It's unacceptable.
What's unacceptable?
> Even non-editable <div tabindex=0> dispatches key events.
What does that have to do with my point about pages simulating typing and then examining the UA's reaction to it?
(In reply to Boris Zbarsky from comment #6)
> What does that have to do with my point about pages simulating typing and
> then examining the UA's reaction to it?
Oh, I'm sorry I misunderstood your comment #4. Do you mean calling EventTarget::dispatchEvent() by script in a page?
Yes, exactly.
dispatchEvent() shouldn't cause text inputs to do anything, since the editing happens as a result of the browser doing the equivalent of:
   if (dispatchEvent(key)) {
     // do the edit
   }
The value sanitization algorithm is supposed to sanitise the _content attribute_ from the serialisation, not the user input. It has little to do with user input.
There's nothing technically in the spec that defines how the UI should work, intentionally. You could implement <input type=email> as something that only accepts interpretive dance input, or, more seriously, sign language input or some such. In some situations, there's no concept of trailing whitespace. In others, there is. Whether it's exposed or not is up to the UA.
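The "dispatch, then edit" pattern described above can be sketched as follows. This is a toy model under stated assumptions: `ModelTextControl`, `ModelKeyEvent`, and `uaHandleKeyPress` are hypothetical names standing in for UA internals, and the only real APIs used are `EventTarget`, `Event`, `dispatchEvent()`, and `preventDefault()`.

```javascript
// Hypothetical key event; real key events are created by the UA.
class ModelKeyEvent extends Event {
  constructor(key) {
    super("keydown", { cancelable: true });
    this.key = key;
  }
}

// Hypothetical stand-in for a text control. The editing step lives in
// uaHandleKeyPress, which page script cannot call in a real browser.
class ModelTextControl extends EventTarget {
  constructor() {
    super();
    this.value = "";
  }

  // What the UA does for a real key press: dispatch the event, then
  // perform the edit only if no listener cancelled it.
  uaHandleKeyPress(key) {
    if (this.dispatchEvent(new ModelKeyEvent(key))) {
      this.value += key; // "do the edit"
    }
  }
}

const control = new ModelTextControl();

// A script-dispatched event runs listeners but performs no edit.
control.dispatchEvent(new ModelKeyEvent("a"));
console.log(JSON.stringify(control.value)); // ""

// A UA-originated key press dispatches and then edits.
control.uaHandleKeyPress("a");
console.log(JSON.stringify(control.value)); // "a"

// A listener that cancels the event suppresses the edit.
control.addEventListener("keydown", (event) => {
  if (event.key === " ") event.preventDefault();
});
control.uaHandleKeyPress(" ");
control.uaHandleKeyPress("b");
console.log(JSON.stringify(control.value)); // "ab"
```

The edit is a consequence of the UA's own handling, not of the event itself, which is why dispatching from script leaves the value alone in this model.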
(In reply to Ian 'Hixie' Hickson from comment #9)
> The value sanitization algorithm is supposed to sanitise the _content
> attribute_ from the serialisation, not the user input. It has little to do
> with user input.
So, the value [1] may return an arbitrary string, and the form data set [2] can contain an arbitrary string even for email inputs. Is this expected and useful for web authors?
[1] https://html.spec.whatwg.org/multipage/forms.html#concept-fe-value
[2] https://html.spec.whatwg.org/multipage/forms.html#constructing-the-form-data-set
The form data set can't because you can't submit a form that has a form control in the "invalid data" state. (Unless you override the validation logic, in which case saving the user's bogus input is the whole point.)
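A rough sketch of that gating, assuming a simplified model in which each control is a plain object with a precomputed `invalid` flag. The names `submitModelForm` and `noValidate` are invented for illustration; `noValidate` stands in for any way of overriding validation.

```javascript
// Hypothetical model of form submission. Returns the name/value pairs that
// would be submitted, or null when validation blocks the submission.
function submitModelForm(controls, options = {}) {
  const noValidate = options.noValidate === true;
  if (!noValidate && controls.some((control) => control.invalid)) {
    return null; // an invalid control blocks submission
  }
  return controls.map((control) => [control.name, control.value]);
}

const controls = [
  // Leading/trailing spaces make this value an invalid e-mail address.
  { name: "email", value: " foo@bar.com ", invalid: true },
];

console.log(submitModelForm(controls)); // null
console.log(JSON.stringify(submitModelForm(controls, { noValidate: true })));
// [["email"," foo@bar.com "]]
```

With validation on, the unsanitized value never reaches the form data set; with validation overridden, the raw input is submitted as typed.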
(In reply to Ian 'Hixie' Hickson from comment #11)
> The form data set can't because you can't submit a form that has a form
> control in the "invalid data" state. (Unless you override the validation
> logic, in which case saving the user's bogus input is the whole point.)
Ah, I see. It's reasonable.
I'd like to ask for a sentence to be added to the specification saying that 'value' can be an arbitrary string.
Summary:
- 'value' can return an arbitrary string because the specification doesn't define <input> UI behavior, and a UI implementation may put an arbitrary string into the value.
- Chrome always puts a sanitized string into the value. That is part of Chrome's UI implementation, and it conforms to the standard.
- Even if the value is valid, web authors can't assume 'value' is sanitized. So, an email input value might have leading/trailing spaces. It's same for form submission.
Yeah that's reasonable.
(In reply to Kent Tamura from comment #12)
> - Even if the value is valid, web authors can't assume 'value' is
> sanitized. So, an email input value might have leading/trailing spaces.
> It's same for form submission.
That creates compatibility issues, because authors will develop on a browser that has this behaviour and will not realize it is actually browser-specific. Why not make this a requirement?
If there was a browser that only allowed you to enter e-mail addresses in ASCII, which is a perfectly allowable UI, should we require that UI of all browsers? If there was a browser that allowed emoji in e-mail addresses, should we require all browsers to allow that? Whether you can type in invalid e-mail addresses or not is a UI issue. Mandating UI is not a path I think we should go down.
(In reply to Kent Tamura from comment #12)
> - Even if the value is valid, web authors can't assume 'value' is
> sanitized. So, an email input value might have leading/trailing spaces.
> It's same for form submission.
I read the specification again, and found the second sentence was incorrect.
https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)
> Constraint validation: While the value of the element is neither the empty string nor a single valid e-mail address, the element is suffering from a type mismatch.
An email address with leading/trailing spaces is invalid.
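The quoted constraint can be checked with a short sketch. The regular expression below is, to the best of my recollection, the one the HTML Standard offers as an equivalent of its "valid e-mail address" production; treat its exact text as an assumption and compare it against the spec before relying on it. `suffersFromTypeMismatch` is an invented helper name.

```javascript
// Assumed to match the HTML Standard's "valid e-mail address" production.
const VALID_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Type mismatch: the value is neither the empty string nor a single
// valid e-mail address.
function suffersFromTypeMismatch(value) {
  return value !== "" && !VALID_EMAIL.test(value);
}

console.log(suffersFromTypeMismatch("foo@bar.com")); // false
console.log(suffersFromTypeMismatch(" foo@bar.com ")); // true
console.log(suffersFromTypeMismatch("")); // false
```

Because a space is not part of the allowed character set, an unsanitized value with leading or trailing spaces fails the check.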
Right, I said that in comment 11. :-)
https://github.com/whatwg/html/pull/1019