<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>28401</bug_id>
          
          <creation_ts>2015-04-03 15:04:02 +0000</creation_ts>
          <short_desc>More aggressive sanitizing input value policy</short_desc>
          <delta_ts>2016-04-09 00:15:26 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>HTML</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>MOVED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          <blocked>28402</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Mounir Lamouri">mounir</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>bzbarsky</cc>
    
    <cc>d</cc>
    
    <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>tkent</cc>
          
          <qa_contact>contributor</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>119220</commentid>
    <comment_count>0</comment_count>
    <who name="Mounir Lamouri">mounir</who>
    <bug_when>2015-04-03 15:04:02 +0000</bug_when>
    <thetext>At the moment, the input value sanitization algorithm is only required to run in very specific situations (form reset, type change, content attribute set). This means that, for example, if the user types &quot;  foo@bar.com  &quot;, it seems fine per spec to return &quot;  foo@bar.com  &quot; as the input&apos;s value.

Chrome has a different behaviour where &quot;  foo@bar.com  &quot; will yield &quot;foo@bar.com&quot; as the input&apos;s value because the sanitization algorithm runs more often than the specification requires. This is leading to developer confusion (see the discussion at https://bugzil.la/1132142).
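
For illustration, the difference can be sketched as follows. This is a minimal sketch, assuming the sanitization the spec describes for a single-address type=email input (strip newlines, then strip leading and trailing whitespace); sanitizeEmailValue is a hypothetical helper name, not a browser API.

```typescript
// Hypothetical helper, not a browser API: sketches the sanitization
// described for a single-address type=email input.
function sanitizeEmailValue(raw: string): string {
  // Remove line breaks first, then leading and trailing whitespace
  // (space, tab, form feed; CR and LF are already gone).
  const withoutNewlines = raw.replace(/[\r\n]/g, "");
  return withoutNewlines.replace(/^[ \t\f]+|[ \t\f]+$/g, "");
}

// What the user typed versus what a sanitizing UA would expose as the value.
const typed = "  foo@bar.com  ";
console.log(JSON.stringify(typed));                     // "  foo@bar.com  "
console.log(JSON.stringify(sanitizeEmailValue(typed))); // "foo@bar.com"
```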

As far as Boris and I can tell, this behaviour is not per spec and it might be good to update the spec to require it given that it would significantly improve developers&apos; experience.

Tamura, what do you think?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119240</commentid>
    <comment_count>1</comment_count>
    <who name="Kent Tamura">tkent</who>
    <bug_when>2015-04-05 23:08:12 +0000</bug_when>
    <thetext>I thought the intention of the specification was that &quot;value IDL attribute always returns a sanitized value.&quot;  Returning an arbitrary string doesn&apos;t help developers.
The specification doesn&apos;t assume any UI.  Sanitizing user input in Chrome is a part of the UI implementation.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119243</commentid>
    <comment_count>2</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2015-04-06 02:10:42 +0000</bug_when>
    <thetext>&gt; I thought the intention of the specification was that &quot;value IDL attribute
&gt; always returns a sanitized value.&quot; 

Where do you see this in the specification?  Per spec, _setting_ the value sanitizes, but getting .value doesn&apos;t perform sanitization of any sort.

&gt; Sanitizing user input in Chrome is a part of the UI implementation.

Can you not dispatch key events to the input to effectively type in a value?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119245</commentid>
    <comment_count>3</comment_count>
    <who name="Kent Tamura">tkent</who>
    <bug_when>2015-04-06 06:41:14 +0000</bug_when>
    <thetext>(In reply to Boris Zbarsky from comment #2)
&gt; &gt; I thought the intention of the specification was that &quot;value IDL attribute
&gt; &gt; always returns a sanitized value.&quot; 
&gt; 
&gt; Where do you see this in the specification?  Per spec, _setting_ the value
&gt; sanitizes, but getting .value doesn&apos;t perform sanitization of any sort.

Nowhere.  It was just my impression when we decided the behavior.

&gt; &gt; Sanitizing user input in Chrome is a part of the UI implementation.
&gt; 
&gt; Can you not dispatch key events to the input to effectively type in a value?

Do you mean refusing to accept leading/trailing whitespace as it is typed?  Yeah, we can do that in many cases.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119246</commentid>
    <comment_count>4</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2015-04-06 06:55:23 +0000</bug_when>
    <thetext>No, I mean if a page can dispatch key events that look like typing to the input, and that will cause its value to change, then the sanitization behavior of typing, if any, needs to be specified.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119247</commentid>
    <comment_count>5</comment_count>
    <who name="Kent Tamura">tkent</who>
    <bug_when>2015-04-06 07:00:06 +0000</bug_when>
    <thetext>(In reply to Boris Zbarsky from comment #4)
&gt; No, I mean if a page can dispatch key events that look like typing to the
&gt; input, and that will cause its value to change, then the sanitization
&gt; behavior of typing, if any, needs to be specified.

It&apos;s unacceptable.  Even non-editable &lt;div tabindex=0&gt; dispatches key events.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119248</commentid>
    <comment_count>6</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2015-04-06 07:07:38 +0000</bug_when>
    <thetext>&gt; It&apos;s unacceptable. 

What&apos;s unacceptable?

&gt; Even non-editable &lt;div tabindex=0&gt; dispatches key events.

What does that have to do with my point about pages simulating typing and then examining the UA&apos;s reaction to it?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119249</commentid>
    <comment_count>7</comment_count>
    <who name="Kent Tamura">tkent</who>
    <bug_when>2015-04-06 07:18:12 +0000</bug_when>
    <thetext>(In reply to Boris Zbarsky from comment #6)
&gt; What does that have to do with my point about pages simulating typing and
&gt; then examining the UA&apos;s reaction to it?

Oh, I&apos;m sorry I misunderstood your comment #4.  Do you mean calling EventTarget::dispatchEvent() by script in a page?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119250</commentid>
    <comment_count>8</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2015-04-06 07:26:18 +0000</bug_when>
    <thetext>Yes, exactly.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119332</commentid>
    <comment_count>9</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2015-04-07 22:57:20 +0000</bug_when>
    <thetext>dispatchEvent() shouldn&apos;t cause text inputs to do anything, since the editing happens as a result of the browser doing the equivalent of:

   if (dispatchEvent(key)) {
     // do the edit
   }
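
A slightly fuller sketch of that pattern, assuming a toy model in which the edit is a plain string append; handleKey and its parameters are hypothetical names, not anything a browser exposes:

```typescript
// Toy model of the pattern above; handleKey is a hypothetical name.
// The "edit" is the browser's default action, taken only if no listener
// cancelled the event that the browser itself dispatched.
function handleKey(target: EventTarget, value: string, key: string): string {
  const event = new Event("keydown", { cancelable: true });
  // dispatchEvent() returns false when a listener called preventDefault().
  if (target.dispatchEvent(event)) {
    return value + key; // do the edit
  }
  return value; // edit was cancelled, value unchanged
}
```

In this model, a script that calls dispatchEvent() itself only runs the listeners; nothing performs the edit on its behalf.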

The value sanitization algorithm is supposed to sanitise the _content attribute_ from the serialisation, not the user input. It has little to do with user input.

There&apos;s nothing technically in the spec that defines how the UI should work, intentionally. You could implement &lt;input type=email&gt; as something that only accepts interpretive dance input, or, more seriously, sign language input or some such. In some situations, there&apos;s no concept of trailing whitespace. In others, there is. Whether it&apos;s exposed or not is up to the UA.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119334</commentid>
    <comment_count>10</comment_count>
    <who name="Kent Tamura">tkent</who>
    <bug_when>2015-04-07 23:19:53 +0000</bug_when>
    <thetext>(In reply to Ian &apos;Hixie&apos; Hickson from comment #9)
&gt; The value sanitization algorithm is supposed to sanitise the _content
&gt; attribute_ from the serialisation, not the user input. It has little to do
&gt; with user input.

So, the value [1] may return an arbitrary string, and the form data set [2] can contain an arbitrary string even for email inputs.
Is this expected and useful for web authors?

[1] https://html.spec.whatwg.org/multipage/forms.html#concept-fe-value
[2] https://html.spec.whatwg.org/multipage/forms.html#constructing-the-form-data-set</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119406</commentid>
    <comment_count>11</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2015-04-09 19:09:53 +0000</bug_when>
    <thetext>The form data set can&apos;t because you can&apos;t submit a form that has a form control in the &quot;invalid data&quot; state. (Unless you override the validation logic, in which case saving the user&apos;s bogus input is the whole point.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119506</commentid>
    <comment_count>12</comment_count>
    <who name="Kent Tamura">tkent</who>
    <bug_when>2015-04-15 01:11:48 +0000</bug_when>
    <thetext>(In reply to Ian &apos;Hixie&apos; Hickson from comment #11)
&gt; The form data set can&apos;t because you can&apos;t submit a form that has a form
&gt; control in the &quot;invalid data&quot; state. (Unless you override the validation
&gt; logic, in which case saving the user&apos;s bogus input is the whole point.)

Ah, I see.  It&apos;s reasonable.
I&apos;d like to ask for a sentence to be added to the specification saying that &apos;value&apos; can be an arbitrary string.

Summary:
 - &apos;value&apos; can return an arbitrary string because the specification doesn&apos;t define &lt;input&gt; UI behavior and a UI implementation may put an arbitrary string into the value.

 - Chrome always puts a sanitized string into the value.  It&apos;s a part of Chrome&apos;s UI implementation, and conforms to the standard.

 - Even if the value is valid, web authors can&apos;t assume &apos;value&apos; is sanitized.  So, an email input value might have leading/trailing spaces.  It&apos;s the same for form submission.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119589</commentid>
    <comment_count>13</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2015-04-16 22:10:29 +0000</bug_when>
    <thetext>Yeah that&apos;s reasonable.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119750</commentid>
    <comment_count>14</comment_count>
    <who name="Mounir Lamouri">mounir</who>
    <bug_when>2015-04-22 09:50:38 +0000</bug_when>
    <thetext>(In reply to Kent Tamura from comment #12)
&gt;  - Even if the value is valid, web authors can&apos;t assume &apos;value&apos; is
&gt; sanitized.  So, an email input value might have leading/trailing spaces. 
&gt; It&apos;s the same for form submission.

That creates compatibility issues because authors will develop on a browser that has that behaviour and will not realize it is actually browser-specific. Why not make this a requirement?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119798</commentid>
    <comment_count>15</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2015-04-23 03:25:41 +0000</bug_when>
    <thetext>If there was a browser that only allowed you to enter e-mail addresses in ASCII, which is a perfectly allowable UI, should we require that UI of all browsers?

If there was a browser that allowed emoji in e-mail addresses, should we require all browsers to allow that?

Whether you can type in invalid e-mail addresses or not is a UI issue. Mandating UI is not a path I think we should go down.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119799</commentid>
    <comment_count>16</comment_count>
    <who name="Kent Tamura">tkent</who>
    <bug_when>2015-04-23 05:39:47 +0000</bug_when>
    <thetext>(In reply to Kent Tamura from comment #12)
&gt;  - Even if the value is valid, web authors can&apos;t assume &apos;value&apos; is
&gt; sanitized.  So, an email input value might have leading/trailing spaces. 
&gt; It&apos;s the same for form submission.

I read the specification again, and found the second sentence was incorrect.

https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)
&gt; Constraint validation: While the value of the element is neither the empty string nor a single valid e-mail address, the element is suffering from a type mismatch.
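
A minimal sketch of that rule, assuming a deliberately simplified pattern in place of the spec&apos;s full valid e-mail address grammar; SIMPLE_EMAIL and suffersFromTypeMismatch are hypothetical names:

```typescript
// Simplified stand-in for the spec's "valid e-mail address" production:
// exactly one "@" with no whitespace on either side.
// This is an assumption for illustration, not the spec's actual grammar.
const SIMPLE_EMAIL = /^[^\s@]+@[^\s@]+$/;

// Hypothetical helper mirroring the constraint quoted above.
function suffersFromTypeMismatch(value: string): boolean {
  if (value === "") {
    return false; // the empty string is never a type mismatch
  }
  return !SIMPLE_EMAIL.test(value);
}

console.log(suffersFromTypeMismatch("foo@bar.com"));     // false
console.log(suffersFromTypeMismatch("  foo@bar.com  ")); // true
```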

An email address with leading/trailing spaces is invalid.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119891</commentid>
    <comment_count>17</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2015-04-24 19:24:02 +0000</bug_when>
    <thetext>Right, I said that in comment 11. :-)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125788</commentid>
    <comment_count>18</comment_count>
    <who name="Domenic Denicola">d</who>
    <bug_when>2016-04-09 00:15:26 +0000</bug_when>
    <thetext>https://github.com/whatwg/html/pull/1019</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>