When a form is submitted, the data in the form is converted into the structure specified by the enctype, and then sent to the destination specified by the action using the given method.
For example, take the following form:
<form action="/find.cgi" method=get>
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>
If the user types "cats" in the first field and "fur" in the second, and then hits the submit button, the user agent will load /find.cgi?t=cats&q=fur.
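The GET case can be sketched in Python as follows. This is only an illustration, not the user agent's actual algorithm: `build_get_url` is a hypothetical helper, and it assumes the action URL has no existing query component.

```python
from urllib.parse import urlencode


def build_get_url(action: str, fields: list[tuple[str, str]]) -> str:
    """Append the URL-encoded form data set to the action as a query string.

    Hypothetical helper: assumes `action` carries no query of its own.
    """
    return f"{action}?{urlencode(fields)}"


print(build_get_url("/find.cgi", [("t", "cats"), ("q", "fur")]))
# /find.cgi?t=cats&q=fur
```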
On the other hand, consider this form:
<form action="/find.cgi" method=post enctype="multipart/form-data">
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>
Given the same user input, the result on submission is quite different: the user agent instead does an HTTP POST to the given URL, with an entity body something like the following text:
------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="t"

cats
------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="q"

fur
------kYFrd4jNJEgCervE--
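As a rough illustration of how such an entity body is put together, the following Python sketch joins one part per field with a boundary line. `build_multipart_body` is a hypothetical helper; the UTF-8 encoding and the lack of any escaping in field names are simplifying assumptions, and RFC 2388 remains the authority on the format.

```python
def build_multipart_body(fields: list[tuple[str, str]], boundary: str) -> bytes:
    """Assemble a minimal multipart/form-data body for plain text fields.

    Sketch only: file uploads and name escaping are not handled.
    """
    lines = []
    for name, value in fields:
        lines.append(f"--{boundary}")  # each part starts with "--" + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")  # UTF-8 is an assumption


body = build_multipart_body([("t", "cats"), ("q", "fur")], "----kYFrd4jNJEgCervE")
print(body.decode("utf-8"))
```

Note that the boundary passed in is `----kYFrd4jNJEgCervE`; the two extra hyphens seen at the start of each delimiter line are added by the format itself.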
The application/x-www-form-urlencoded form data set encoding is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences.
To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. This algorithm uses as inputs the payload itself, payload, consisting of a Unicode string using only characters in the range U+0000 to U+007F; a default character encoding encoding; and optionally an isindex flag indicating that the payload is to be processed as if it had been generated for a form containing an isindex control. The output of this algorithm is a sorted list of name-value pairs. If the isindex flag is set and the first control really was an isindex control, then the first name-value pair will have as its name the empty string.
Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).
If the isindex flag is set and the first string in strings does not contain a "=" (U+003D) character, insert a U+003D EQUALS SIGN character (=) at the start of the first string in strings.
Let pairs be an empty list of name-value pairs.
For each string string in strings, run these substeps:
If string contains a "=" (U+003D) character, then let name be the substring of string from the start of string up to but excluding its first "=" (U+003D) character, and let value be the substring from the first character, if any, after the first "=" (U+003D) character up to the end of string. If the first U+003D EQUALS SIGN character (=) is the first character, then name will be the empty string. If it is the last character, then value will be the empty string.
Otherwise, string contains no "=" (U+003D) characters. Let name have the value of string and let value be the empty string.
Replace any "+" (U+002B) characters in name and value with U+0020 SPACE characters.
Replace any escape in name and value with the character represented by the escape. This replacement must not be recursive.
An escape is a "%" (U+0025) character followed by two characters in the ranges ASCII digits, U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, and U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F.
The character represented by an escape is the Unicode character whose code point is equal to the value of the two characters after the "%" (U+0025) character, interpreted as a hexadecimal number (in the range 0..255).
So for instance the string "A%2BC" would become "A+C". Similarly, the string "100%25AA%21" becomes the string "100%AA!".
Convert the name and value strings to their byte representation in ISO-8859-1 (i.e. convert the Unicode string to a byte string, mapping code points to byte values directly).
Add a pair consisting of name and value to pairs.
If any of the name-value pairs in pairs have a name component consisting of the string "_charset_" encoded in US-ASCII, and the value component of the first such pair, when decoded as US-ASCII, is the name of a supported character encoding, then let encoding be that character encoding (replacing the default passed to the algorithm).
Convert the name and value components of each name-value pair in pairs to Unicode by interpreting the bytes according to the encoding encoding.
Return pairs.
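The steps above can be sketched in Python roughly as follows. This is an illustrative reading, not a reference implementation: `decode_urlencoded` and `_unescape` are hypothetical names, Python's codec registry stands in for "supported character encoding", undecodable bytes are replaced rather than rejected, and the pairs are returned in payload order because the steps themselves describe no sorting.

```python
import codecs
import re

# A "%" followed by exactly two hexadecimal digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _unescape(text: str) -> str:
    """Replace "+" with a space, then each escape with its code point."""
    text = text.replace("+", " ")
    # re.sub makes a single left-to-right pass, so a "%" produced by one
    # escape is never re-examined: the replacement is not recursive.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_urlencoded(payload: str, encoding: str = "utf-8",
                      isindex: bool = False) -> list[tuple[str, str]]:
    strings = payload.split("&")
    if isindex and "=" not in strings[0]:
        strings[0] = "=" + strings[0]

    pairs = []
    for string in strings:
        # partition() splits on the first "=" only; with no "=" present
        # the whole string becomes the name and the value is empty.
        name, _, value = string.partition("=")
        # Code points are now at most U+00FF, so Latin-1 maps them 1:1 to bytes.
        pairs.append((_unescape(name).encode("latin-1"),
                      _unescape(value).encode("latin-1")))

    for name, value in pairs:
        if name == b"_charset_":
            try:
                encoding = codecs.lookup(value.decode("ascii")).name
            except (LookupError, ValueError):
                pass  # unknown or non-ASCII label: keep the default
            break  # only the first _charset_ pair is consulted

    # errors="replace" is an assumption; the steps do not say what to do
    # with byte sequences that are invalid in the chosen encoding.
    return [(n.decode(encoding, "replace"), v.decode(encoding, "replace"))
            for n, v in pairs]


print(decode_urlencoded("t=cats&q=fur"))         # [('t', 'cats'), ('q', 'fur')]
print(decode_urlencoded("x=100%25AA%21"))        # [('x', '100%AA!')]
print(decode_urlencoded("cats", isindex=True))   # [('', 'cats')]
```

The intermediate Latin-1 round trip mirrors the conversion between strings and bytes described in the steps: escapes are first turned into code points, those are mapped directly to bytes, and only then are the bytes interpreted in the final encoding.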
Parameters on the application/x-www-form-urlencoded MIME type are ignored. In particular, this MIME type does not support the charset parameter.
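One way a receiver might act on this is to compare only the type and subtype of the Content-Type header and discard everything after the first semicolon. The helper name `is_urlencoded` is hypothetical, and the sketch assumes the header value has already been extracted as a string.

```python
FORM_TYPE = "application/x-www-form-urlencoded"


def is_urlencoded(content_type: str) -> bool:
    """Return True for the URL-encoded form type, ignoring any parameters."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence == FORM_TYPE


print(is_urlencoded("application/x-www-form-urlencoded; charset=utf-8"))  # True
print(is_urlencoded("multipart/form-data; boundary=x"))                   # False
```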
For details on how to interpret multipart/form-data payloads, see RFC 2388. [RFC2388]
Payloads using the text/plain format are intended to be human readable. They are not reliably interpretable by computer, as the format is ambiguous (for example, there is no way to distinguish a literal newline in a value from the newline at the end of the value).
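The ambiguity can be demonstrated with a short Python sketch. It assumes the text/plain encoding writes each pair as name=value followed by CRLF; `encode_text_plain` is a hypothetical helper written for this illustration.

```python
def encode_text_plain(fields: list[tuple[str, str]]) -> str:
    """Write each pair as name=value followed by CRLF (assumed layout)."""
    return "".join(f"{name}={value}\r\n" for name, value in fields)


# One field whose value contains a newline...
one_field = encode_text_plain([("comment", "hi\r\nname=bob")])
# ...and two separate fields.
two_fields = encode_text_plain([("comment", "hi"), ("name", "bob")])

# Both form data sets serialize to the same payload, so a receiver
# cannot tell which one was submitted.
print(one_field == two_fields)  # True
```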