This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 28401 - More aggressive sanitizing input value policy
Summary: More aggressive sanitizing input value policy
Status: RESOLVED MOVED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 28402
Reported: 2015-04-03 15:04 UTC by Mounir Lamouri
Modified: 2016-04-09 00:15 UTC
CC List: 5 users

Description Mounir Lamouri 2015-04-03 15:04:02 UTC
At the moment, the input value sanitizing algorithm is required to be run in very specific situations (form reset, type change, content attribute set) which means that, for example, if the user types "  foo@bar.com  " it seems fine per spec to return "  foo@bar.com  " as the input's value.

Chrome has a different behaviour where "  foo@bar.com  " will yield "foo@bar.com" as the input's value because the sanitization algorithm runs more often than the specification requires. This is leading to developer confusion (see discussion here https://bugzil.la/1132142).
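
To make the difference concrete, here is a minimal sketch of what sanitizing an e-mail value amounts to, assuming an <input type=email> without the multiple attribute (strip newlines, then strip leading and trailing ASCII whitespace). The function name sanitizeEmailValue is hypothetical and is not part of any browser API.

```javascript
// Hypothetical helper, not a browser API: models value sanitization for
// <input type=email> without the `multiple` attribute.
function sanitizeEmailValue(raw) {
  // Step 1: strip newlines (LF and CR) anywhere in the string.
  const withoutNewlines = raw.replace(/[\n\r]/g, "");
  // Step 2: strip leading and trailing ASCII whitespace
  // (TAB, LF, FF, CR, SPACE).
  return withoutNewlines.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
}

const typed = "  foo@bar.com  ";
const sanitized = sanitizeEmailValue(typed);
console.log(JSON.stringify(typed), "->", JSON.stringify(sanitized));
```

A browser that applies this to user input would expose the second string through .value; one that only applies it in the situations the spec lists could expose the first.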

As far as Boris and I can tell, this behaviour is not per spec and it might be good to update the spec to require it given that it would significantly improve developers' experience.

Tamura, what do you think?
Comment 1 Kent Tamura 2015-04-05 23:08:12 UTC
I thought the intention of the specification was that "value IDL attribute always returns a sanitized value."  Returning an arbitrary string doesn't help developers.
The specification doesn't assume any UI.  Sanitizing user input in Chrome is a part of the UI implementation.
Comment 2 Boris Zbarsky 2015-04-06 02:10:42 UTC
> I thought the intention of the specification was that "value IDL attribute
> always returns a sanitized value." 

Where do you see this in the specification?  Per spec, _setting_ the value sanitizes, but getting .value doesn't perform sanitization of any sort.

> Sanitizing user input in Chrome is a part of the UI implementation.

Can you not dispatch key events to the input to effectively type in a value?
Comment 3 Kent Tamura 2015-04-06 06:41:14 UTC
(In reply to Boris Zbarsky from comment #2)
> > I thought the intention of the specification was that "value IDL attribute
> > always returns a sanitized value." 
> 
> Where do you see this in the specification?  Per spec, _setting_ the value
> sanitizes, but getting .value doesn't perform sanitization of any sort.

Nowhere.  It was just my impression when we decided the behavior.

> > Sanitizing user input in Chrome is a part of the UI implementation.
> 
> Can you not dispatch key events to the input to effectively type in a value?

Do you mean refusing to accept typed leading/trailing whitespace?  Yeah, we can do it in many cases.
Comment 4 Boris Zbarsky 2015-04-06 06:55:23 UTC
No, I mean if a page can dispatch key events that look like typing to the input, and that will cause its value to change, then the sanitization behavior of typing, if any, needs to be specified.
Comment 5 Kent Tamura 2015-04-06 07:00:06 UTC
(In reply to Boris Zbarsky from comment #4)
> No, I mean if a page can dispatch key events that look like typing to the
> input, and that will cause its value to change, then the sanitization
> behavior of typing, if any, needs to be specified.

It's unacceptable.  Even non-editable <div tabindex=0> dispatches key events.
Comment 6 Boris Zbarsky 2015-04-06 07:07:38 UTC
> It's unacceptable. 

What's unacceptable?

> Even non-editable <div tabindex=0> dispatches key events.

What does that have to do with my point about pages simulating typing and then examining the UA's reaction to it?
Comment 7 Kent Tamura 2015-04-06 07:18:12 UTC
(In reply to Boris Zbarsky from comment #6)
> What does that have to do with my point about pages simulating typing and
> then examining the UA's reaction to it?

Oh, I'm sorry I misunderstood your comment #4.  Do you mean calling EventTarget::dispatchEvent() by script in a page?
Comment 8 Boris Zbarsky 2015-04-06 07:26:18 UTC
Yes, exactly.
Comment 9 Ian 'Hixie' Hickson 2015-04-07 22:57:20 UTC
dispatchEvent() shouldn't cause text inputs to do anything, since the editing happens as a result of the browser doing the equivalent of:

   if (dispatchEvent(key)) {
     // do the edit
   }
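
The pattern above can be sketched as a runnable toy model. This is not how any real browser is implemented: the class ToyTextInput and its userTypes method are hypothetical, and only EventTarget, Event, dispatchEvent and preventDefault are real DOM APIs.

```javascript
// Toy model, not a real browser: the "user agent" owns the edit step.
class ToyTextInput extends EventTarget {
  constructor() {
    super();
    this.value = "";
  }

  // Hypothetical stand-in for the UA reacting to a physical key press.
  userTypes(character) {
    const event = new Event("keydown", { cancelable: true });
    // dispatchEvent() returns false if a listener called preventDefault().
    if (this.dispatchEvent(event)) {
      this.value += character; // the edit is the UA's own follow-up step
    }
  }
}

const input = new ToyTextInput();

// A script-dispatched event only runs listeners; nothing performs an edit.
input.dispatchEvent(new Event("keydown", { cancelable: true }));
const afterScriptEvent = input.value;

// The UA path dispatches the event and then edits.
input.userTypes("a");
const afterUserTyping = input.value;

// A listener that cancels the event suppresses the edit.
input.addEventListener("keydown", (event) => event.preventDefault());
input.userTypes("b");
const afterCancelledTyping = input.value;
```

In this model the value changes only on the userTypes path, which is why a page calling dispatchEvent() by itself cannot observe how typing is sanitized.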

The value sanitization algorithm is supposed to sanitise the _content attribute_ from the serialisation, not the user input. It has little to do with user input.

There's nothing technically in the spec that defines how the UI should work, intentionally. You could implement <input type=email> as something that only accepts interpretive dance input, or, more seriously, sign language input or some such. In some situations, there's no concept of trailing whitespace. In others, there is. Whether it's exposed or not is up to the UA.
Comment 10 Kent Tamura 2015-04-07 23:19:53 UTC
(In reply to Ian 'Hixie' Hickson from comment #9)
> The value sanitization algorithm is supposed to sanitise the _content
> attribute_ from the serialisation, not the user input. It has little to do
> with user input.

So, the value [1] may return an arbitrary string, and the form data set [2] can contain an arbitrary string even for email inputs.
Is this expected and useful for web authors?

[1] https://html.spec.whatwg.org/multipage/forms.html#concept-fe-value
[2] https://html.spec.whatwg.org/multipage/forms.html#constructing-the-form-data-set
Comment 11 Ian 'Hixie' Hickson 2015-04-09 19:09:53 UTC
The form data set can't because you can't submit a form that has a form control in the "invalid data" state. (Unless you override the validation logic, in which case saving the user's bogus input is the whole point.)
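
A rough sketch of that gating, as a toy model rather than the spec's actual submission algorithm. The function collectFormDataSet and the shape of the control objects are assumptions made for illustration; the noValidate option stands in for overriding validation, for example with the novalidate attribute.

```javascript
// Toy model of interactive submission; names and shapes are hypothetical.
function collectFormDataSet(controls, { noValidate = false } = {}) {
  const hasInvalidControl = controls.some((control) => control.typeMismatch);
  if (hasInvalidControl && !noValidate) {
    return null; // submission is blocked, so no form data set is built
  }
  // Otherwise each control contributes its value exactly as it stands.
  return controls.map((control) => [control.name, control.value]);
}

const controls = [
  { name: "email", value: "  foo@bar.com  ", typeMismatch: true },
];

const blocked = collectFormDataSet(controls);
const overridden = collectFormDataSet(controls, { noValidate: true });
```

Here blocked is null, while overridden carries the untrimmed string, which matches the parenthetical about deliberately saving the user's bogus input.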
Comment 12 Kent Tamura 2015-04-15 01:11:48 UTC
(In reply to Ian 'Hixie' Hickson from comment #11)
> The form data set can't because you can't submit a form that has a form
> control in the "invalid data" state. (Unless you override the validation
> logic, in which case saving the user's bogus input is the whole point.)

Ah, I see.  It's reasonable.
I'd like to ask that a sentence be added to the specification saying that 'value' can be an arbitrary string.

Summary:
 - 'value' can return an arbitrary string because the specification doesn't define <input> UI behavior and a UI implementation may put an arbitrary string into the value.

 - Chrome always puts a sanitized string into the value.  It's a part of Chrome's UI implementation, and conforms to the standard.

 - Even if the value is valid, web authors can't assume 'value' is sanitized.  So, an email input value might have leading/trailing spaces.  It's the same for form submission.
Comment 13 Ian 'Hixie' Hickson 2015-04-16 22:10:29 UTC
Yeah that's reasonable.
Comment 14 Mounir Lamouri 2015-04-22 09:50:38 UTC
(In reply to Kent Tamura from comment #12)
>  - Even if the value is valid, web authors can't assume 'value' is
> sanitized.  So, an email input value might have leading/trailing spaces. 
> It's the same for form submission.

That creates compatibility issues because authors will develop on a browser which has that behaviour and will not realize it is actually browser-specific. Why not make this a requirement?
Comment 15 Ian 'Hixie' Hickson 2015-04-23 03:25:41 UTC
If there was a browser that only allowed you to enter e-mail addresses in ASCII, which is a perfectly allowable UI, should we require that UI of all browsers?

If there was a browser that allowed emoji in e-mail addresses, should we require all browsers to allow that?

Whether you can type in invalid e-mail addresses or not is a UI issue. Mandating UI is not a path I think we should go down.
Comment 16 Kent Tamura 2015-04-23 05:39:47 UTC
(In reply to Kent Tamura from comment #12)
>  - Even if the value is valid, web authors can't assume 'value' is
> sanitized.  So, an email input value might have leading/trailing spaces. 
> It's the same for form submission.

I read the specification again, and found the second sentence was incorrect.

https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)
> Constraint validation: While the value of the element is neither the empty string nor a single valid e-mail address, the element is suffering from a type mismatch.

An email address with leading/trailing spaces is invalid.
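
As a sketch of that check: the regular expression below is, to my understanding, the JavaScript-compatible pattern the HTML Standard gives for a valid e-mail address, and the function name suffersFromTypeMismatch is hypothetical.

```javascript
// Pattern for a "valid e-mail address" as given in the HTML Standard
// (assumed here to match the text of the spec at the time of writing).
const VALID_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: type mismatch when the value is neither the empty
// string nor a single valid e-mail address.
function suffersFromTypeMismatch(value) {
  return value !== "" && !VALID_EMAIL.test(value);
}

const padded = suffersFromTypeMismatch("  foo@bar.com  ");
const trimmed = suffersFromTypeMismatch("foo@bar.com");
const empty = suffersFromTypeMismatch("");
```

Spaces are not in the allowed character set, so the padded value is a type mismatch while the trimmed value and the empty string are not.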
Comment 17 Ian 'Hixie' Hickson 2015-04-24 19:24:02 UTC
Right, I said that in comment 11. :-)
Comment 18 Domenic Denicola 2016-04-09 00:15:26 UTC
https://github.com/whatwg/html/pull/1019