HTML5: Edition for Web Authors

← 4.10.21 Constraints – Table of contents – 4.11 Interactive elements →

1. 1. 4.10.22 Form submission
    1. 4.10.22.1 URL-encoded form data
    2. 4.10.22.2 Plain text form data

4.10.22 Form submission

When a form is submitted, the data in the form is converted into the structure specified by the enctype, and then sent to the destination specified by the action using the given method.

For example, take the following form:

<form action="/find.cgi" method=get>
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>

If the user types in "cats" in the first field and "fur" in the second, and then hits the submit button, then the user agent will load /find.cgi?t=cats&q=fur.

On the other hand, consider this form:

<form action="/find.cgi" method=post enctype="multipart/form-data">
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>

Given the same user input, the result on submission is quite different: the user agent instead does an HTTP POST to the given URL, with as the entity body something like the following text:

------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="t"

cats
------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="q"

fur
------kYFrd4jNJEgCervE--

4.10.22.1 URL-encoded form data

This form data set encoding is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences.

To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. This algorithm uses as inputs the payload itself, payload, consisting of a Unicode string using only characters in the range U+0000 to U+007F; a default character encoding encoding; and optionally an isindex flag indicating that the payload is to be processed as if it had been generated for a form containing an isindex control. The output of this algorithm is a sorted list of name-value pairs. If the isindex flag is set and the first control really was an isindex control, then the first name-value pair will have as its name the empty string.

Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).
If the isindex flag is set and the first string in strings does not contain a "=" (U+003D) character, insert a U+003D EQUALS SIGN character (=) at the start of the first string in strings.
Let pairs be an empty list of name-value pairs.
For each string string in strings, run these substeps:
1. If string contains a "=" (U+003D) character, then let name be the substring of string from the start of string up to but excluding its first "=" (U+003D) character, and let value be the substring from the first character, if any, after the first "=" (U+003D) character up to the end of string. If the first U+003D EQUALS SIGN character (=) is the first character, then name will be the empty string. If it is the last character, then value will be the empty string.
  
  Otherwise, string contains no "=" (U+003D) characters. Let name have the value of string and let value be the empty string.
2. Replace any "+" (U+002B) characters in name and value with U+0020 SPACE characters.
3. Replace any escape in name and value with the character represented by the escape. This replacement most not be recursive.
  
  An escape is a "%" (U+0025) character followed by two characters in the ranges ASCII digits, U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, and U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F.
  
  The character represented by an escape is the Unicode character whose code point is equal to the value of the two characters after the "%" (U+0025) character, interpreted as a hexadecimal number (in the range 0..255).
  
  So for instance the string "A%2BC" would become "A+C". Similarly, the string "100%25AA%21" becomes the string "100%AA!".
4. Convert the name and value strings to their byte representation in ISO-8859-1 (i.e. convert the Unicode string to a byte string, mapping code points to byte values directly).
5. Add a pair consisting of name and value to pairs.
If any of the name-value pairs in pairs have a name component consisting of the string "_charset_" encoded in US-ASCII, and the value component of the first such pair, when decoded as US-ASCII, is the name of a supported character encoding, then let encoding be that character encoding (replacing the default passed to the algorithm).
Convert the name and value components of each name-value pair in pairs to Unicode by interpreting the bytes according to the encoding encoding.
Return pairs.

Parameters on the application/x-www-form-urlencoded MIME type are ignored. In particular, this MIME type does not support the charset parameter.

For details on how to interpret multipart/form-data payloads, see RFC 2388. [RFC2388]

4.10.22.2 Plain text form data

Payloads using the text/plain format are intended to be human readable. They are not reliably interpretable by computer, as the format is ambiguous (for example, there is no way to distinguish a literal newline in a value from the newline at the end of the value).