4.10.22 Form submission

4.10.22.1 Introduction

This section is non-normative.

When a form is submitted, the data in the form is converted into the structure specified by the enctype, and then sent to the destination specified by the action using the given method.

For example, take the following form:

<form action="/find.cgi" method=get>
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>

If the user types "cats" into the first field and "fur" into the second, and then presses the submit button, the user agent will load /find.cgi?t=cats&q=fur.

On the other hand, consider this form:

<form action="/find.cgi" method=post enctype="multipart/form-data">
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>

Given the same user input, the result on submission is quite different: the user agent instead performs an HTTP POST to the given URL, with an entity body similar to the following:

------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="t"

cats
------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="q"

fur
------kYFrd4jNJEgCervE--

4.10.22.2 Implicit submission

A form element's default button is the first submit button in tree order whose form owner is that form element.

If the platform supports letting the user submit a form implicitly (for example, on some platforms hitting the "enter" key while a text field is focused implicitly submits the form), then doing so must cause the form's default button's activation behavior, if any, to be run.

Consequently, if the default button is disabled, the form is not submitted when such an implicit submission mechanism is used. (A button has no activation behavior when disabled.)

If the form has no submit button, then the implicit submission mechanism must just submit the form element from the form element itself.
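To illustrate how the default button is chosen and why a disabled default button blocks implicit submission, here is a minimal Python sketch. The Control class, its fields, and the returned description strings are hypothetical, and the list of controls is assumed to already be in tree order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Control:
    # Hypothetical, simplified stand-in for a form-associated element.
    is_submit_button: bool   # e.g. input type=submit/image, button type=submit
    form_owner: str          # identifier of the element's form owner
    disabled: bool = False


def default_button(controls: List[Control], form_id: str) -> Optional[Control]:
    """Return the first submit button (controls assumed in tree order) owned by form_id."""
    for control in controls:
        if control.is_submit_button and control.form_owner == form_id:
            return control
    return None


def implicit_submission(controls: List[Control], form_id: str) -> str:
    """Describe what an implicit submission (e.g. pressing Enter) would do."""
    button = default_button(controls, form_id)
    if button is None:
        # No submit button at all: submit the form from the form element itself.
        return "submit from form"
    if button.disabled:
        # A disabled button has no activation behavior, so nothing happens,
        # even if a later submit button in the same form is enabled.
        return "nothing"
    return "run default button's activation behavior"
```

Note that only the first submit button counts: an enabled button later in tree order does not take over when the first one is disabled.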

4.10.22.3 Form submission algorithm

When a form element form is submitted from an element submitter (typically a button), optionally with a submitted from submit() method flag set, the user agent must run the following steps:

  1. Let form document be the form's Document.

  2. If form document has no associated browsing context or its browsing context had its sandboxed forms browsing context flag set when the Document was created, then abort these steps without doing anything.

  3. Let form browsing context be the browsing context of form document.

  4. If form is already being submitted (i.e. the form was submitted again while processing the events fired from the next two steps, probably from a script redundantly calling the submit() method on form), then abort these steps. This doesn't affect the earlier instance of this algorithm.

  5. If the submitted from submit() method flag is not set, and the submitter element's no-validate state is false, then interactively validate the constraints of form and examine the result: if the result is negative (the constraint validation concluded that there were invalid fields and probably informed the user of this) then abort these steps.

  6. If the submitted from submit() method flag is not set, then fire a simple event that is cancelable named submit, at form. If the event's default action is prevented (i.e. if the event is canceled) then abort these steps. Otherwise, continue (effectively the default action is to perform the submission).

  7. Let form data set be the result of constructing the form data set for form in the context of submitter.

  8. Let action be the submitter element's action.

  9. If action is the empty string, let action be the document's address of the form document.

    This step is a willful violation of RFC 3986, which would require base URL processing here. This violation is motivated by a desire for compatibility with legacy content. [RFC3986]

  10. Resolve the URL action, relative to the submitter element. If this fails, abort these steps. Otherwise, let action be the resulting absolute URL.

  11. Let scheme be the <scheme> of the resulting absolute URL.

  12. Let enctype be the submitter element's enctype.

  13. Let method be the submitter element's method.

  14. Let target be the submitter element's target.

  15. If the user indicated a specific browsing context to use when submitting the form, then let target browsing context be that browsing context. Otherwise, apply the rules for choosing a browsing context given a browsing context name using target as the name and form browsing context as the context in which the algorithm is executed, and let target browsing context be the resulting browsing context.

  16. If target browsing context was created in the previous step, or if the form document has not yet completely loaded, then let replace be true. Otherwise, let it be false.

  17. Select the appropriate row in the table below based on the value of scheme as given by the first cell of each row. Then, select the appropriate cell on that row based on the value of method as given in the first cell of each column. Then, jump to the steps named in that cell and defined below the table.

    Scheme       GET                  POST
    http         Mutate action URL    Submit as entity body
    https        Mutate action URL    Submit as entity body
    ftp          Get action URL       Get action URL
    javascript   Get action URL       Get action URL
    data         Get action URL       Post to data:
    mailto       Mail with headers    Mail as body

    If scheme is not one of those listed in this table, then the behavior is not defined by this specification. User agents should, in the absence of another specification defining this, act in a manner analogous to that defined in this specification for similar schemes.

    The behaviors are as follows:

    Mutate action URL

    Let query be the result of encoding the form data set using the application/x-www-form-urlencoded encoding algorithm, interpreted as a US-ASCII string.

    Let destination be a new URL that is equal to the action except that its <query> component is replaced by query (adding a "?" (U+003F) character if appropriate).

    Navigate target browsing context to destination. If replace is true, then target browsing context must be navigated with replacement enabled.

    Submit as entity body

    Let entity body be the result of encoding the form data set using the appropriate form encoding algorithm.

    Let MIME type be determined as follows:

    If enctype is application/x-www-form-urlencoded
      Let MIME type be "application/x-www-form-urlencoded".
    If enctype is multipart/form-data
      Let MIME type be the concatenation of the string "multipart/form-data;", a U+0020 SPACE character, the string "boundary=", and the multipart/form-data boundary string generated by the multipart/form-data encoding algorithm.
    If enctype is text/plain
      Let MIME type be "text/plain".

    Navigate target browsing context to action using the HTTP method given by method and with entity body as the entity body, of type MIME type. If replace is true, then target browsing context must be navigated with replacement enabled.

    Get action URL

    Navigate target browsing context to action. If replace is true, then target browsing context must be navigated with replacement enabled.

    Post to data:

    Let data be the result of encoding the form data set using the appropriate form encoding algorithm.

    If action contains the string "%%%%" (four U+0025 PERCENT SIGN characters), then %-escape all bytes in data that, if interpreted as US-ASCII, do not match the unreserved production in the URI Generic Syntax, and then, treating the result as a US-ASCII string, further %-escape all the U+0025 PERCENT SIGN characters in the resulting string and replace the first occurrence of "%%%%" in action with the resulting double-escaped string. [RFC3986]

    Otherwise, if action contains the string "%%" (two U+0025 PERCENT SIGN characters in a row, but not four), then %-escape all bytes in data that, if interpreted as US-ASCII, do not match the unreserved production in the URI Generic Syntax, and then, treating the result as a US-ASCII string, replace the first occurrence of "%%" in action with the resulting escaped string. [RFC3986]

    Navigate target browsing context to the potentially modified action (which will be a data: URL). If replace is true, then target browsing context must be navigated with replacement enabled.

    Mail with headers

    Let headers be the result of encoding the form data set using the application/x-www-form-urlencoded encoding algorithm, interpreted as a US-ASCII string.

    Replace occurrences of "+" (U+002B) characters in headers with the string "%20".

    Let destination consist of all the characters from the first character in action to the character immediately before the first "?" (U+003F) character, if any, or the end of the string if there are none.

    Append a single "?" (U+003F) character to destination.

    Append headers to destination.

    Navigate target browsing context to destination. If replace is true, then target browsing context must be navigated with replacement enabled.

    Mail as body

    Let body be the result of encoding the form data set using the appropriate form encoding algorithm and then %-escaping all the bytes in the resulting byte string that, when interpreted as US-ASCII, do not match the unreserved production in the URI Generic Syntax. [RFC3986]

    Let destination have the same value as action.

    If destination does not contain a "?" (U+003F) character, append a single "?" (U+003F) character to destination. Otherwise, append a single U+0026 AMPERSAND character (&).

    Append the string "body=" to destination.

    Append body, interpreted as a US-ASCII string, to destination.

    Navigate target browsing context to destination. If replace is true, then target browsing context must be navigated with replacement enabled.

    The appropriate form encoding algorithm is determined as follows:

    If enctype is application/x-www-form-urlencoded
      Use the application/x-www-form-urlencoded encoding algorithm.
    If enctype is multipart/form-data
      Use the multipart/form-data encoding algorithm.
    If enctype is text/plain
      Use the text/plain encoding algorithm.
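The URL-manipulating parts of the submission algorithm (steps 9 and 10, the scheme/method table, and the behaviors that build a new URL) can be sketched in Python as follows. All function names and the BEHAVIORS dictionary are hypothetical; resolving "relative to the submitter element" is approximated by urllib.parse.urljoin against an explicit base_url, and the encoded form data is taken as an already-computed input rather than produced here.

```python
from urllib.parse import urljoin

# Scheme/method table from step 17 (method keywords in lowercase).
BEHAVIORS = {
    ("http", "get"): "mutate action URL", ("http", "post"): "submit as entity body",
    ("https", "get"): "mutate action URL", ("https", "post"): "submit as entity body",
    ("ftp", "get"): "get action URL", ("ftp", "post"): "get action URL",
    ("javascript", "get"): "get action URL", ("javascript", "post"): "get action URL",
    ("data", "get"): "get action URL", ("data", "post"): "post to data:",
    ("mailto", "get"): "mail with headers", ("mailto", "post"): "mail as body",
}

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
UNRESERVED = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


def resolve_action(action: str, document_address: str, base_url: str) -> str:
    # Step 9: an empty action means the document's address, NOT the base URL.
    if action == "":
        action = document_address
    # Step 10: resolve relative to the submitter (approximated by base_url).
    return urljoin(base_url, action)


def pct_escape(data: bytes) -> str:
    # %-escape every byte outside the unreserved set.
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)


def mutate_action_url(action: str, query: str) -> str:
    # Replace the query component, keeping any fragment. This sketch always
    # adds "?", which is one reading of "if appropriate".
    before_hash, hash_sign, fragment = action.partition("#")
    path = before_hash.split("?", 1)[0]
    return f"{path}?{query}{hash_sign}{fragment}"


def post_to_data(action: str, data: bytes) -> str:
    escaped = pct_escape(data)
    if "%%%%" in action:
        # Double escaping: every "%" produced above becomes "%25".
        return action.replace("%%%%", escaped.replace("%", "%25"), 1)
    if "%%" in action:
        return action.replace("%%", escaped, 1)
    return action


def mail_with_headers(action: str, urlencoded: str) -> str:
    # Spaces were encoded as "+"; mailto: URLs need "%20" instead. Literal
    # "+" characters were already encoded as "%2B", so they are unaffected.
    headers = urlencoded.replace("+", "%20")
    return action.split("?", 1)[0] + "?" + headers


def mail_as_body(action: str, encoded_body: bytes) -> str:
    separator = "&" if "?" in action else "?"
    return action + separator + "body=" + pct_escape(encoded_body)
```

For example, with a base URL pointing at a different host, resolve_action("", ...) still returns the document's own address, which is exactly the willful violation described in step 9.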
4.10.22.4 Constructing the form data set

The algorithm to construct the form data set for a form form optionally in the context of a submitter submitter is as follows. If not specified otherwise, submitter is null.

  1. Let controls be a list of all the submittable elements whose form owner is form, in tree order.

  2. Let the form data set be a list of name-value-type tuples, initially empty.

  3. Loop: For each element field in controls, in tree order, run the following substeps:

    1. If any of the following conditions are met, then skip these substeps for this element:

      • The field element has a datalist element ancestor.
      • The field element is disabled.
      • The field element is a button but it is not submitter.
      • The field element is an input element whose type attribute is in the Checkbox state and whose checkedness is false.
      • The field element is an input element whose type attribute is in the Radio Button state and whose checkedness is false.
      • The field element is not an input element whose type attribute is in the Image Button state, and either the field element does not have a name attribute specified, or its name attribute's value is the empty string.
      • The field element is an object element that is not using a plugin.

      Otherwise, process field as follows:

    2. Let type be the value of the type IDL attribute of field.

    3. If the field element is an input element whose type attribute is in the Image Button state, then run these further nested substeps:

      1. If the field element has a name attribute specified and its value is not the empty string, let name be that value followed by a single "." (U+002E) character. Otherwise, let name be the empty string.

      2. Let namex be the string consisting of the concatenation of name and a single "x" (U+0078) character.

      3. Let namey be the string consisting of the concatenation of name and a single "y" (U+0079) character.

      4. The field element is submitter, and before this algorithm was invoked the user indicated a coordinate. Let x be the x-component of the coordinate selected by the user, and let y be the y-component of the coordinate selected by the user.

      5. Append an entry to the form data set with the name namex, the value x, and the type type.

      6. Append an entry to the form data set with the name namey, the value y, and the type type.

      7. Skip the remaining substeps for this element: if there are any more elements in controls, return to the top of the loop step, otherwise, jump to the end step below.

    4. Let name be the value of the field element's name attribute.

    5. If the field element is a select element, then for each option element in the select element whose selectedness is true, append an entry to the form data set with the name as the name, the value of the option element as the value, and type as the type.

    6. Otherwise, if the field element is an input element whose type attribute is in the Checkbox state or the Radio Button state, then run these further nested substeps:

      1. If the field element has a value attribute specified, then let value be the value of that attribute; otherwise, let value be the string "on".

      2. Append an entry to the form data set with name as the name, value as the value, and type as the type.

    7. Otherwise, if the field element is an input element whose type attribute is in the File Upload state, then for each file selected in the input element, append an entry to the form data set with the name as the name, the file (consisting of the name, the type, and the body) as the value, and type as the type. If there are no selected files, then append an entry to the form data set with the name as the name, the empty string as the value, and application/octet-stream as the type.

    8. Otherwise, if the field element is an object element: try to obtain a form submission value from the plugin, and if that is successful, append an entry to the form data set with name as the name, the returned form submission value as the value, and the string "object" as the type.

    9. Otherwise, append an entry to the form data set with name as the name, the value of the field element as the value, and type as the type.

    10. If the element has a form control dirname attribute, and that attribute's value is not the empty string, then run these substeps:

      1. Let dirname be the value of the element's dirname attribute.

      2. Let dir be the string "ltr" if the directionality of the element is 'ltr', and "rtl" otherwise (i.e. when the directionality of the element is 'rtl').

      3. Append an entry to the form data set with dirname as the name, dir as the value, and the string "direction" as the type.

      An element can only have a form control dirname attribute if it is a textarea element or an input element whose type attribute is in either the Text state or the Search state.

  4. End: For the name and value of each entry in the form data set whose type is not "file", replace every occurrence of a "CR" (U+000D) character not followed by a "LF" (U+000A) character, and every occurrence of a "LF" (U+000A) character not preceded by a "CR" (U+000D) character, with a U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.

    In the case of the value of textarea elements, this newline normalization is redundant, as it is already normalized from its raw value for the purposes of the DOM API.

  5. Return the form data set.
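The construction algorithm above can be approximated with the following Python sketch. The Field model and its attributes are hypothetical simplifications: the caller is assumed to pass only the submittable elements owned by the form, in tree order; datalist ancestors, object elements, and file inputs are not modeled, so every entry is newline-normalized in the final step.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BUTTON_INPUT_TYPES = {"submit", "image", "reset", "button"}
Entry = Tuple[str, str, str]  # (name, value, type)


@dataclass(eq=False)  # compare by identity, like DOM elements
class Field:
    # Hypothetical, simplified model of a submittable element.
    tag: str                                  # "input", "select", "textarea", "button"
    type: str                                 # value of the type IDL attribute
    name: Optional[str] = None
    value: Optional[str] = None               # current value; value attribute for checkbox/radio
    checked: bool = False
    disabled: bool = False
    selected: List[str] = field(default_factory=list)  # values of selected options
    dirname: Optional[str] = None
    direction: str = "ltr"
    coordinate: Optional[Tuple[int, int]] = None        # image buttons only


def normalize_newlines(s: str) -> str:
    # Lone CR and lone LF both become CRLF; existing CRLF pairs are kept.
    return s.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")


def construct_form_data_set(controls: List[Field], submitter: Optional[Field] = None) -> List[Entry]:
    entries: List[Entry] = []
    for f in controls:
        is_button = f.tag == "button" or (f.tag == "input" and f.type in BUTTON_INPUT_TYPES)
        is_image = f.tag == "input" and f.type == "image"
        if (f.disabled
                or (is_button and f is not submitter)
                or (f.tag == "input" and f.type in ("checkbox", "radio") and not f.checked)
                or (not is_image and not f.name)):
            continue
        if is_image:
            # Only reachable for the submitter, which carries the chosen coordinate.
            prefix = f.name + "." if f.name else ""
            x, y = f.coordinate
            entries.append((prefix + "x", str(x), f.type))
            entries.append((prefix + "y", str(y), f.type))
            continue
        if f.tag == "select":
            entries.extend((f.name, v, f.type) for v in f.selected)
        elif f.tag == "input" and f.type in ("checkbox", "radio"):
            entries.append((f.name, f.value if f.value is not None else "on", f.type))
        else:
            entries.append((f.name, f.value or "", f.type))
        if f.dirname:
            entries.append((f.dirname, "ltr" if f.direction == "ltr" else "rtl", "direction"))
    return [(normalize_newlines(n), normalize_newlines(v), t) for n, v, t in entries]
```

With an image button as the submitter, the ordinary submit button in the same form is skipped, while the image button contributes two entries, name.x and name.y.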

4.10.22.5 URL-encoded form data

This form data set encoding is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences.

The application/x-www-form-urlencoded encoding algorithm is as follows:

  1. Let result be the empty string.

  2. If the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute that is an ASCII-compatible character encoding. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.

    Otherwise, if the document's character encoding is an ASCII-compatible character encoding, then that is the selected character encoding.

    Otherwise, let the selected character encoding be UTF-8.

  3. Let charset be the preferred MIME name of the selected character encoding.

  4. For each entry in the form data set, perform these substeps:

    1. If the entry's name is "_charset_" and its type is "hidden", replace its value with charset.

    2. If the entry's type is "file", replace its value with the file's filename only.

    3. For each character in the entry's name and value that cannot be expressed using the selected character encoding, replace the character by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or more characters in the range ASCII digits representing the Unicode code point of the character in base ten, and finally a U+003B SEMICOLON character (;).

    4. Encode the entry's name and value using the selected character encoding. The entry's name and value are now byte strings.

    5. For each byte in the entry's name and value, apply the appropriate subsubsteps from the following list:

      The byte is 0x20 (U+0020 SPACE if interpreted as ASCII)
        Replace the byte with a single 0x2B byte ("+" (U+002B) character if interpreted as ASCII).

      The byte is 0x2A, 0x2D, 0x2E, or 0x5F, or is in one of the ranges 0x30 to 0x39, 0x41 to 0x5A, or 0x61 to 0x7A
        Leave the byte as is.

      Otherwise
        1. Let s be a string consisting of a "%" (U+0025) character followed by two characters in the ranges ASCII digits and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F representing the hexadecimal value of the byte in question (zero-padded if necessary).

        2. Encode the string s as US-ASCII, so that it is now a byte string.

        3. Replace the byte in question in the name or value being processed by the bytes in s, preserving their relative order.

    6. Interpret the entry's name and value as Unicode strings encoded in US-ASCII. (All of the bytes in the string will be in the range 0x00 to 0x7F; the high bit will be zero throughout.) The entry's name and value are now Unicode strings again.

    7. If the entry's name is "isindex", its type is "text", and this is the first entry in the form data set, then append the value to result and skip the rest of the substeps for this entry, moving on to the next entry, if any, or the next step in the overall algorithm otherwise.

    8. If this is not the first entry, append a single U+0026 AMPERSAND character (&) to result.

    9. Append the entry's name to result.

    10. Append a single "=" (U+003D) character to result.

    11. Append the entry's value to result.

  5. Encode result as US-ASCII and return the resulting byte stream.
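A compact Python sketch of this encoding follows. Entries are assumed to be (name, value, type) tuples, with file entries carrying a (filename, ...) tuple as their value; the function names are hypothetical, and the selected encoding and its preferred MIME name are passed in rather than derived from accept-charset. Python's "xmlcharrefreplace" error handler produces the decimal "&#NNNN;" references required by step 4.3 in one go.

```python
from typing import List, Tuple

# Bytes left as is: 0x2A, 0x2D, 0x2E, 0x5F and the ASCII digit/letter ranges.
SAFE_BYTES = frozenset(
    [0x2A, 0x2D, 0x2E, 0x5F]
    + list(range(0x30, 0x3A))
    + list(range(0x41, 0x5B))
    + list(range(0x61, 0x7B))
)


def _encode_component(text: str, encoding: str) -> str:
    # Steps 4.3-4.4: unencodable characters become "&#NNNN;" references.
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    out = []
    for byte in raw:  # step 4.5, byte by byte
        if byte == 0x20:
            out.append("+")
        elif byte in SAFE_BYTES:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


def urlencode_form_data(entries: List[Tuple], encoding: str = "utf-8", charset: str = "UTF-8") -> bytes:
    pieces = []
    for index, (name, value, type_) in enumerate(entries):
        if name == "_charset_" and type_ == "hidden":
            value = charset
        if type_ == "file":
            value = value[0]  # assumed (filename, ...) tuple: keep the filename only
        name = _encode_component(name, encoding)
        value = _encode_component(value, encoding)
        if index == 0 and name == "isindex" and type_ == "text":
            pieces.append(value)  # legacy isindex: value only, no "name="
        else:
            pieces.append(f"{name}={value}")
    return "&".join(pieces).encode("ascii")
```

Note that this differs from Python's own urllib.parse.urlencode: the algorithm above leaves "*" unescaped but escapes "~", and an unencodable character such as "€" under ISO-8859-1 ends up as the escaped reference "%26%238364%3B".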

To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. This algorithm uses as inputs the payload itself, payload, consisting of a Unicode string using only characters in the range U+0000 to U+007F; a default character encoding encoding; and optionally an isindex flag indicating that the payload is to be processed as if it had been generated for a form containing an isindex control. The output of this algorithm is an ordered list of name-value pairs, in the order in which they appear in the payload. If the isindex flag is set and the first control really was an isindex control, then the first name-value pair will have as its name the empty string.

  1. Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).

  2. If the isindex flag is set and the first string in strings does not contain a "=" (U+003D) character, insert a U+003D EQUALS SIGN character (=) at the start of the first string in strings.

  3. Let pairs be an empty list of name-value pairs.

  4. For each string string in strings, run these substeps:

    1. If string contains a "=" (U+003D) character, then let name be the substring of string from the start of string up to but excluding its first "=" (U+003D) character, and let value be the substring from the first character, if any, after the first "=" (U+003D) character up to the end of string. If the first U+003D EQUALS SIGN character (=) is the first character, then name will be the empty string. If it is the last character, then value will be the empty string.

      Otherwise, string contains no "=" (U+003D) characters. Let name have the value of string and let value be the empty string.

    2. Replace any "+" (U+002B) characters in name and value with U+0020 SPACE characters.

    3. Replace any escape in name and value with the character represented by the escape. This replacement must not be recursive.

      An escape is a "%" (U+0025) character followed by two characters in the ranges ASCII digits, U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, and U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F.

      The character represented by an escape is the Unicode character whose code point is equal to the value of the two characters after the "%" (U+0025) character, interpreted as a hexadecimal number (in the range 0..255).

      So for instance the string "A%2BC" would become "A+C". Similarly, the string "100%25AA%21" becomes the string "100%AA!".

    4. Convert the name and value strings to their byte representation in ISO-8859-1 (i.e. convert the Unicode string to a byte string, mapping code points to byte values directly).

    5. Add a pair consisting of name and value to pairs.

  5. If any of the name-value pairs in pairs have a name component consisting of the string "_charset_" encoded in US-ASCII, and the value component of the first such pair, when decoded as US-ASCII, is the name of a supported character encoding, then let encoding be that character encoding (replacing the default passed to the algorithm).

  6. Convert the name and value components of each name-value pair in pairs to Unicode by interpreting the bytes according to the encoding encoding.

  7. Return pairs.

Parameters on the application/x-www-form-urlencoded MIME type are ignored. In particular, this MIME type does not support the charset parameter.
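The decoding algorithm can be sketched in Python as shown below. The function names are hypothetical; "supported character encoding" is approximated by whether Python's codecs.lookup recognizes the label (Python codec names are not identical to the encoding labels browsers accept), and decoding invalid byte sequences with errors="replace" is an assumption, since the steps above do not say how to handle them.

```python
import codecs
import re
from typing import List, Tuple

ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _unescape(text: str) -> str:
    # re.sub scans the string once, so the replacement is not recursive:
    # "%2541" becomes "%41", not "A".
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_urlencoded(payload: str, encoding: str = "utf-8", isindex: bool = False) -> List[Tuple[str, str]]:
    strings = payload.split("&")  # strict split: empty strings are kept
    if isindex and "=" not in strings[0]:
        strings[0] = "=" + strings[0]
    pairs = []
    for piece in strings:
        # partition covers both cases: with no "=", value is the empty string.
        name, _, value = piece.partition("=")
        name = _unescape(name.replace("+", " "))
        value = _unescape(value.replace("+", " "))
        # All code points are now <= U+00FF, so ISO-8859-1 maps them 1:1 to bytes.
        pairs.append((name.encode("latin-1"), value.encode("latin-1")))
    for name, value in pairs:
        if name == b"_charset_":
            try:
                label = value.decode("ascii")
                codecs.lookup(label)  # assumption: stands in for "supported"
                encoding = label
            except (UnicodeDecodeError, LookupError):
                pass
            break  # only the first _charset_ pair is consulted
    return [(n.decode(encoding, errors="replace"), v.decode(encoding, errors="replace")) for n, v in pairs]
```

Running it on "A%2BC=100%25AA%21" yields the single pair ("A+C", "100%AA!"), matching the worked example in step 4.3.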

4.10.22.6 Multipart form data

The multipart/form-data encoding algorithm is as follows:

  1. Let result be the empty string.

  2. If the algorithm was invoked with an explicit character encoding, let the selected character encoding be that encoding. (This algorithm is used by other specifications, which provide an explicit character encoding to avoid the dependency on the form element described in the next paragraph.)

    Otherwise, if the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute that is an ASCII-compatible character encoding. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.

    Otherwise, if the document's character encoding is an ASCII-compatible character encoding, then that is the selected character encoding.

    Otherwise, let the selected character encoding be UTF-8.

  3. Let charset be the preferred MIME name of the selected character encoding.

  4. For each entry in the form data set, perform these substeps:

    1. If the entry's name is "_charset_" and its type is "hidden", replace its value with charset.

    2. For each character in the entry's name and value that cannot be expressed using the selected character encoding, replace the character by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or more characters in the range ASCII digits representing the Unicode code point of the character in base ten, and finally a U+003B SEMICOLON character (;).

  5. Encode the (now mutated) form data set using the rules described by RFC 2388, Returning Values from Forms: multipart/form-data, and return the resulting byte stream. [RFC2388]

    Each entry in the form data set is a field, the name of the entry is the field name and the value of the entry is the field value.

    The order of parts must be the same as the order of fields in the form data set. Multiple entries with the same name must be treated as distinct fields.

    In particular, this means that multiple files submitted as part of a single <input type=file multiple> element will result in each file having its own field; the "sets of files" feature ("multipart/mixed") of RFC 2388 is not used.

    The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).

    File names included in the generated multipart/form-data resource (as part of file fields) must use the character encoding selected above, though the precise name may be approximated if necessary (e.g. newlines could be removed from file names, quotes could be changed to "%22", and characters not expressible in the selected character encoding could be replaced by other characters). User agents must not use the RFC 2231 encoding suggested by RFC 2388.

    The boundary used by the user agent in generating the return value of this algorithm is the multipart/form-data boundary string. (This value is used to generate the MIME type of the form submission payload generated by this algorithm.)
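The following Python sketch shows the general shape of the multipart/form-data output and of the MIME type built from the boundary string. It is a simplification under stated assumptions: entries are (name, value, type) tuples, file entries carry a hypothetical (filename, MIME type, bytes) tuple, quotes in names and filenames become "%22" and newlines are dropped from filenames (two of the approximations mentioned above), and the random boundary format is an arbitrary choice.

```python
import secrets
from typing import List, Optional, Tuple


def encode_multipart(entries: List[Tuple], encoding: str = "utf-8", charset: str = "UTF-8",
                     boundary: Optional[str] = None) -> Tuple[str, bytes]:
    if boundary is None:
        # Arbitrary choice: any sufficiently random, boundary-safe string works.
        boundary = "----" + secrets.token_hex(16)

    def quote(text: str) -> str:
        return text.replace('"', "%22")

    body = bytearray()
    for name, value, type_ in entries:
        if name == "_charset_" and type_ == "hidden":
            value = charset
        body += f"--{boundary}\r\n".encode("ascii")
        disposition = f'Content-Disposition: form-data; name="{quote(name)}"'
        if type_ == "file":
            filename, content_type, data = value  # assumed (filename, MIME type, bytes)
            filename = quote(filename.replace("\r", "").replace("\n", ""))
            header = f'{disposition}; filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
            body += header.encode(encoding, errors="xmlcharrefreplace")
            body += data
        else:
            # Non-file parts carry no Content-Type header; unencodable
            # characters become "&#NNNN;" references, as in step 4.2.
            body += f"{disposition}\r\n\r\n".encode(encoding, errors="xmlcharrefreplace")
            body += value.encode(encoding, errors="xmlcharrefreplace")
        body += b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", bytes(body)
```

Called with the boundary "----kYFrd4jNJEgCervE" and the entries t=cats and q=fur, this produces the same body as the introductory example, with CRLF line endings.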

For details on how to interpret multipart/form-data payloads, see RFC 2388. [RFC2388]

4.10.22.7 Plain text form data

The text/plain encoding algorithm is as follows:

  1. Let result be the empty string.

  2. If the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.

    Otherwise, the selected character encoding is the document's character encoding.

  3. Let charset be the preferred MIME name of the selected character encoding.

  4. For each entry in the form data set whose name is "_charset_" and whose type is "hidden", replace its value with charset.

  5. For each entry in the form data set whose type is "file", replace its value with the file's filename only.

  6. For each entry in the form data set, perform these substeps:

    1. Append the entry's name to result.

    2. Append a single "=" (U+003D) character to result.

    3. Append the entry's value to result.

    4. Append a "CR" (U+000D) "LF" (U+000A) character pair to result.

  7. Encode result using the selected character encoding and return the resulting byte stream.

Payloads using the text/plain format are intended to be human readable. They are not reliably interpretable by computer, as the format is ambiguous (for example, there is no way to distinguish a literal newline in a value from the newline at the end of the value).
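The ambiguity is easy to demonstrate with a minimal Python sketch of the text/plain encoding. The function name is hypothetical, file entries are assumed to carry a (filename, ...) tuple, and the "xmlcharrefreplace" handling of unencodable characters is an assumption, because the algorithm above does not say what happens to them.

```python
from typing import List, Tuple


def encode_text_plain(entries: List[Tuple], encoding: str = "utf-8", charset: str = "UTF-8") -> bytes:
    result = []
    for name, value, type_ in entries:
        if name == "_charset_" and type_ == "hidden":
            value = charset
        if type_ == "file":
            value = value[0]  # assumed (filename, ...) tuple: keep the filename only
        result.append(f"{name}={value}\r\n")
    # Assumption: unencodable characters become "&#NNNN;" references.
    return "".join(result).encode(encoding, errors="xmlcharrefreplace")


# Two different form data sets that produce byte-identical payloads:
one = encode_text_plain([("a", "b\r\nc=d", "textarea")])
two = encode_text_plain([("a", "b", "text"), ("c", "d", "text")])
```

Both one and two are b"a=b\r\nc=d\r\n", so a receiver cannot tell whether it saw one field containing a newline or two separate fields.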

4.10.23 Resetting a form

When a form element form is reset, the user agent must fire a simple event named reset, that is cancelable, at form, and then, if that event is not canceled, must invoke the reset algorithm of each resettable element whose form owner is form.

Each resettable element defines its own reset algorithm. Changes made to form controls as part of these algorithms do not count as changes caused by the user (and thus, e.g., do not cause input events to fire).
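The reset sequence (a cancelable event first, then each control's own reset algorithm) can be sketched in Python as below. TextControl and dispatch_reset are hypothetical stand-ins: the callback models firing the cancelable reset event and returns False when a listener cancels it, and the reset algorithm shown, restoring a default value, is a simplification of what a text control does.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextControl:
    # Hypothetical resettable control with a simplified reset algorithm.
    default_value: str
    value: str = ""

    def reset(self) -> None:
        self.value = self.default_value


def reset_form(controls: List[TextControl], dispatch_reset: Callable[[], bool]) -> bool:
    # dispatch_reset stands in for firing the cancelable "reset" event.
    if not dispatch_reset():
        return False  # event canceled: controls keep their current values
    for control in controls:  # each resettable element owned by the form
        control.reset()       # programmatic change, so no input events fire
    return True
```

Because the changes come from the reset algorithms rather than from the user, a real implementation would not fire input events for them, which the sketch reflects by simply assigning the value.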