This is revision 1.5612.
This section is non-normative.
When a form is submitted, the data in the form is converted into the structure specified by the enctype, and then sent to the destination specified by the action using the given method.
For example, take the following form:
<form action="/find.cgi" method=get>
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>
If the user types in "cats" in the first field and "fur" in the second, and then hits the submit button, then the user agent will load /find.cgi?t=cats&q=fur.
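As a rough illustration of how that URL comes about, the sketch below appends urlencoded fields to the action. The function name `get_submission_url` is hypothetical, and Python's `urlencode` only approximates the application/x-www-form-urlencoded encoding algorithm defined later in this section (its set of unescaped characters differs slightly).

```python
from urllib.parse import urlencode


def get_submission_url(action: str, fields: list[tuple[str, str]]) -> str:
    """Append the urlencoded fields to the action as its query component.

    Assumes the action has no query component of its own.
    """
    return action + "?" + urlencode(fields)


print(get_submission_url("/find.cgi", [("t", "cats"), ("q", "fur")]))
# prints /find.cgi?t=cats&q=fur
```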
On the other hand, consider this form:
<form action="/find.cgi" method=post enctype="multipart/form-data">
 <input type=text name=t>
 <input type=search name=q>
 <input type=submit>
</form>
Given the same user input, the result on submission is quite different: the user agent instead does an HTTP POST to the given URL, with an entity body that looks something like the following text:
------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="t"

cats
------kYFrd4jNJEgCervE
Content-Disposition: form-data; name="q"

fur
------kYFrd4jNJEgCervE--
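A body of that shape could be assembled roughly as follows. This is a minimal sketch for non-file text fields only; the function name `multipart_body` is hypothetical, the boundary is supplied by the caller rather than generated, and a real implementation would work on bytes in the selected character encoding.

```python
def multipart_body(fields: list[tuple[str, str]], boundary: str) -> str:
    """Build a multipart/form-data body for simple text fields.

    Each part is introduced by "--" + boundary; the body ends with
    "--" + boundary + "--". Lines are separated by CRLF.
    """
    lines = []
    for name, value in fields:
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")
    return "\r\n".join(lines) + "\r\n"


# The boundary string from the example above is "----kYFrd4jNJEgCervE";
# the extra two hyphens on each delimiter line come from the "--" prefix.
body = multipart_body([("t", "cats"), ("q", "fur")], "----kYFrd4jNJEgCervE")
print(body)
```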
A form element's default button is the first submit button in tree order whose form owner is that form element.
If the platform supports letting the user submit a form implicitly (for example, on some platforms hitting the "enter" key while a text field is focused implicitly submits the form), then doing so must cause the form's default button's activation behavior, if any, to be run.
Consequently, if the default button is disabled, the form is not submitted when such an implicit submission mechanism is used. (A button has no activation behavior when disabled.)
If the form has no submit button, then the implicit submission mechanism must just submit the form element from the form element itself.
When a form element form is submitted from an element submitter (typically a button), optionally with a submitted from submit() method flag set, the user agent must run the following steps:
Let form document be the form's Document.
If form document has no associated browsing context or its browsing context had its sandboxed forms browsing context flag set when the Document was created, then abort these steps without doing anything.
Let form browsing context be the browsing context of form document.
If form is already being submitted (i.e. the form was submitted again while processing the events fired from the next two steps, probably from a script redundantly calling the submit() method on form), then abort these steps. This doesn't affect the earlier instance of this algorithm.
If the submitted from submit() method flag is not set, and the submitter element's no-validate state is false, then interactively validate the constraints of form and examine the result: if the result is negative (the constraint validation concluded that there were invalid fields and probably informed the user of this) then abort these steps.
If the submitted from submit() method flag is not set, then fire a simple event that is cancelable named submit, at form. If the event's default action is prevented (i.e. if the event is canceled) then abort these steps. Otherwise, continue (effectively the default action is to perform the submission).
Let form data set be the result of constructing the form data set for form in the context of submitter.
Let action be the submitter element's action.
If action is the empty string, let action be the document's address of the form document.
This step is a willful violation of RFC 3986, which would require base URL processing here. This violation is motivated by a desire for compatibility with legacy content. [RFC3986]
Resolve the URL action, relative to the submitter element. If this fails, abort these steps. Otherwise, let action be the resulting absolute URL.
Let scheme be the <scheme> of the resulting absolute URL.
Let enctype be the submitter element's enctype.
Let method be the submitter element's method.
Let target be the submitter element's target.
If the user indicated a specific browsing context to use when submitting the form, then let target browsing context be that browsing context. Otherwise, apply the rules for choosing a browsing context given a browsing context name using target as the name and form browsing context as the context in which the algorithm is executed, and let target browsing context be the resulting browsing context.
If target browsing context was created in the previous step, or if the form document has not yet completely loaded, then let replace be true. Otherwise, let it be false.
Select the appropriate row in the table below based on the value of scheme as given by the first cell of each row. Then, select the appropriate cell on that row based on the value of method as given in the first cell of each column. Then, jump to the steps named in that cell and defined below the table.
Scheme | GET | POST |
---|---|---|
http | Mutate action URL | Submit as entity body |
https | Mutate action URL | Submit as entity body |
ftp | Get action URL | Get action URL |
javascript | Get action URL | Get action URL |
data | Get action URL | Post to data: |
mailto | Mail with headers | Mail as body |
If scheme is not one of those listed in this table, then the behavior is not defined by this specification. User agents should, in the absence of another specification defining this, act in a manner analogous to that defined in this specification for similar schemes.
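The table lookup can be pictured as a two-level mapping. The sketch below is illustrative only: `BEHAVIORS` and `select_behavior` are hypothetical names, and returning `None` for an unlisted scheme simply marks the case that this specification leaves undefined.

```python
from typing import Optional

# Rows are schemes, columns are methods, cells are the named step sets.
BEHAVIORS = {
    "http": {"GET": "Mutate action URL", "POST": "Submit as entity body"},
    "https": {"GET": "Mutate action URL", "POST": "Submit as entity body"},
    "ftp": {"GET": "Get action URL", "POST": "Get action URL"},
    "javascript": {"GET": "Get action URL", "POST": "Get action URL"},
    "data": {"GET": "Get action URL", "POST": "Post to data:"},
    "mailto": {"GET": "Mail with headers", "POST": "Mail as body"},
}


def select_behavior(scheme: str, method: str) -> Optional[str]:
    """Return the name of the steps to run, or None for unlisted schemes."""
    row = BEHAVIORS.get(scheme.lower())
    if row is None:
        return None  # behavior not defined by the table
    return row[method.upper()]


print(select_behavior("mailto", "post"))  # prints Mail as body
```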
The behaviors are as follows:
Mutate action URL
Let query be the result of encoding the form data set using the application/x-www-form-urlencoded encoding algorithm, interpreted as a US-ASCII string.
Let destination be a new URL that is equal to the action except that its <query> component is replaced by query (adding a "?" (U+003F) character if appropriate).
Navigate target browsing context to destination. If replace is true, then target browsing context must be navigated with replacement enabled.
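Replacing the query component while leaving the rest of the URL intact might look like the following sketch. The function name `mutate_action_url` is hypothetical; note that `urlunsplit` omits the "?" when the query is empty, which may not match what user agents do for an empty form data set.

```python
from urllib.parse import urlsplit, urlunsplit


def mutate_action_url(action: str, query: str) -> str:
    """Return action with its query component replaced by query.

    Any existing query is discarded; the fragment, if any, is preserved.
    """
    parts = urlsplit(action)
    return urlunsplit(parts._replace(query=query))


print(mutate_action_url("http://example.com/find.cgi?old=1", "t=cats&q=fur"))
# prints http://example.com/find.cgi?t=cats&q=fur
```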
Submit as entity body
Let entity body be the result of encoding the form data set using the appropriate form encoding algorithm.
Let MIME type be determined as follows:
If enctype is application/x-www-form-urlencoded, let MIME type be "application/x-www-form-urlencoded".
If enctype is multipart/form-data, let MIME type be the concatenation of the string "multipart/form-data;", a U+0020 SPACE character, the string "boundary=", and the multipart/form-data boundary string generated by the multipart/form-data encoding algorithm.
If enctype is text/plain, let MIME type be "text/plain".
Navigate target browsing context to action using the HTTP method given by method and with entity body as the entity body, of type MIME type. If replace is true, then target browsing context must be navigated with replacement enabled.
Get action URL
Navigate target browsing context to action. If replace is true, then target browsing context must be navigated with replacement enabled.
Post to data:
Let data be the result of encoding the form data set using the appropriate form encoding algorithm.
If action contains the string "%%%%" (four U+0025 PERCENT SIGN characters), then %-escape all bytes in data that, if interpreted as US-ASCII, do not match the unreserved production in the URI Generic Syntax, and then, treating the result as a US-ASCII string, further %-escape all the U+0025 PERCENT SIGN characters in the resulting string and replace the first occurrence of "%%%%" in action with the resulting double-escaped string. [RFC3986]
Otherwise, if action contains the string "%%" (two U+0025 PERCENT SIGN characters in a row, but not four), then %-escape all characters in data that, if interpreted as US-ASCII, do not match the unreserved production in the URI Generic Syntax, and then, treating the result as a US-ASCII string, replace the first occurrence of "%%" in action with the resulting escaped string. [RFC3986]
Navigate target browsing context to the potentially modified action (which will be a data: URL). If replace is true, then target browsing context must be navigated with replacement enabled.
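The placeholder substitution for data: URLs can be sketched as below. The names `percent_escape` and `post_to_data` are hypothetical, and the unreserved set is written out from the RFC 3986 production (ALPHA, DIGIT, "-", ".", "_", "~").

```python
import string

# Byte values matching the RFC 3986 "unreserved" production.
UNRESERVED = frozenset((string.ascii_letters + string.digits + "-._~").encode("ascii"))


def percent_escape(data: bytes) -> str:
    """%-escape every byte that is not an unreserved character."""
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)


def post_to_data(action: str, data: bytes) -> str:
    """Substitute the encoded form data into a data: URL action."""
    if "%%%%" in action:
        # Double escaping: escape the data, then escape the "%" signs again.
        double_escaped = percent_escape(data).replace("%", "%25")
        return action.replace("%%%%", double_escaped, 1)
    if "%%" in action:
        return action.replace("%%", percent_escape(data), 1)
    return action  # no placeholder: action is used unmodified


print(post_to_data("data:text/plain,%%", b"a=b"))    # prints data:text/plain,a%3Db
print(post_to_data("data:text/plain,%%%%", b"a=b"))  # prints data:text/plain,a%253Db
```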
Mail with headers
Let headers be the result of encoding the form data set using the application/x-www-form-urlencoded encoding algorithm, interpreted as a US-ASCII string.
Replace occurrences of "+" (U+002B) characters in headers with the string "%20".
Let destination consist of all the characters from the first character in action to the character immediately before the first "?" (U+003F) character, if any, or the end of the string if there are none.
Append a single "?" (U+003F) character to destination.
Append headers to destination.
Navigate target browsing context to destination. If replace is true, then target browsing context must be navigated with replacement enabled.
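These steps amount to swapping the query of the mailto: URL for the encoded form data, as in this sketch. The function name `mail_with_headers` is hypothetical, and its `headers` argument is assumed to already be the urlencoded form data set.

```python
def mail_with_headers(action: str, headers: str) -> str:
    """Build the mailto: destination from urlencoded form data."""
    headers = headers.replace("+", "%20")
    # Keep everything before the first "?", or the whole string if none.
    destination = action.split("?", 1)[0]
    return destination + "?" + headers


print(mail_with_headers("mailto:a@example.com?subject=old", "subject=Hello+World&body=Hi"))
# prints mailto:a@example.com?subject=Hello%20World&body=Hi
```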
Mail as body
Let body be the result of encoding the form data set using the appropriate form encoding algorithm and then %-escaping all the bytes in the resulting byte string that, when interpreted as US-ASCII, do not match the unreserved production in the URI Generic Syntax. [RFC3986]
Let destination have the same value as action.
If destination does not contain a "?" (U+003F) character, append a single "?" (U+003F) character to destination. Otherwise, append a single U+0026 AMPERSAND character (&).
Append the string "body=" to destination.
Append body, interpreted as a US-ASCII string, to destination.
Navigate target browsing context to destination. If replace is true, then target browsing context must be navigated with replacement enabled.
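A minimal sketch of these steps follows. The names `percent_escape` and `mail_as_body` are hypothetical, and the `encoded` argument is assumed to be the byte string already produced by the appropriate form encoding algorithm.

```python
import string

# Byte values matching the RFC 3986 "unreserved" production.
UNRESERVED = frozenset((string.ascii_letters + string.digits + "-._~").encode("ascii"))


def percent_escape(data: bytes) -> str:
    """%-escape every byte that is not an unreserved character."""
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)


def mail_as_body(action: str, encoded: bytes) -> str:
    """Append the escaped form data to a mailto: URL as its body header."""
    destination = action
    destination += "&" if "?" in destination else "?"
    return destination + "body=" + percent_escape(encoded)


print(mail_as_body("mailto:a@example.com", b"t=cats\r\n"))
# prints mailto:a@example.com?body=t%3Dcats%0D%0A
```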
The appropriate form encoding algorithm is determined as follows:
If enctype is application/x-www-form-urlencoded, use the application/x-www-form-urlencoded encoding algorithm.
If enctype is multipart/form-data, use the multipart/form-data encoding algorithm.
If enctype is text/plain, use the text/plain encoding algorithm.
The algorithm to construct the form data set for a form form optionally in the context of a submitter submitter is as follows. If not specified otherwise, submitter is null.
Let controls be a list of all the submittable elements whose form owner is form, in tree order.
Let the form data set be a list of name-value-type tuples, initially empty.
Loop: For each element field in controls, in tree order, run the following substeps:
If any of the following conditions are met, then skip these substeps for this element:
The field element has a datalist element ancestor.
The field element is an input element whose type attribute is in the Checkbox state and whose checkedness is false.
The field element is an input element whose type attribute is in the Radio Button state and whose checkedness is false.
The field element is not an input element whose type attribute is in the Image Button state, and either the field element does not have a name attribute specified, or its name attribute's value is the empty string.
The field element is an object element that is not using a plugin.
Otherwise, process field as follows:
Let type be the value of the type IDL attribute of field.
If the field element is an input element whose type attribute is in the Image Button state, then run these further nested substeps:
If the field element has a name attribute specified and its value is not the empty string, let name be that value followed by a single "." (U+002E) character. Otherwise, let name be the empty string.
Let namex be the string consisting of the concatenation of name and a single "x" (U+0078) character.
Let namey be the string consisting of the concatenation of name and a single "y" (U+0079) character.
The field element is submitter, and before this algorithm was invoked the user indicated a coordinate. Let x be the x-component of the coordinate selected by the user, and let y be the y-component of the coordinate selected by the user.
Append an entry to the form data set with the name namex, the value x, and the type type.
Append an entry to the form data set with the name namey and the value y, and the type type.
Skip the remaining substeps for this element: if there are any more elements in controls, return to the top of the loop step, otherwise, jump to the end step below.
Let name be the value of the field element's name attribute.
If the field element is a select element, then for each option element in the select element whose selectedness is true, append an entry to the form data set with the name as the name, the value of the option element as the value, and type as the type.
Otherwise, if the field element is an input element whose type attribute is in the Checkbox state or the Radio Button state, then run these further nested substeps:
If the field element has a value attribute specified, then let value be the value of that attribute; otherwise, let value be the string "on".
Append an entry to the form data set with name as the name, value as the value, and type as the type.
Otherwise, if the field element is an input element whose type attribute is in the File Upload state, then for each file selected in the input element, append an entry to the form data set with the name as the name, the file (consisting of the name, the type, and the body) as the value, and type as the type. If there are no selected files, then append an entry to the form data set with the name as the name, the empty string as the value, and application/octet-stream as the type.
Otherwise, if the field element is an object element: try to obtain a form submission value from the plugin, and if that is successful, append an entry to the form data set with name as the name, the returned form submission value as the value, and the string "object" as the type.
Otherwise, append an entry to the form data set with name as the name, the value of the field element as the value, and type as the type.
If the element has a form control dirname attribute, and that attribute's value is not the empty string, then run these substeps:
Let dirname be the value of the element's dirname attribute.
Let dir be the string "ltr" if the directionality of the element is 'ltr', and "rtl" otherwise (i.e. when the directionality of the element is 'rtl').
Append an entry to the form data set with dirname as the name, dir as the value, and the string "direction" as the type.
An element can only have a form control dirname attribute if it is a textarea element or an input element whose type attribute is in either the Text state or the Search state.
End: For the name and value of each entry in the form data set whose type is not "file", replace every occurrence of a "CR" (U+000D) character not followed by a "LF" (U+000A) character, and every occurrence of a "LF" (U+000A) character not preceded by a "CR" (U+000D) character, by a two-character string consisting of a U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.
In the case of the value of textarea elements, this newline normalization is redundant, as it is already normalized from its raw value for the purposes of the DOM API.
Return the form data set.
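The newline normalization in the End step can be expressed as a single regular-expression substitution. This is only a sketch of that one step; the function name `normalize_newlines` is hypothetical.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_LONE_NEWLINE = re.compile(r"\r(?!\n)|(?<!\r)\n")


def normalize_newlines(text: str) -> str:
    """Replace every lone CR or lone LF with a CRLF pair.

    Existing CRLF pairs are left untouched.
    """
    return _LONE_NEWLINE.sub("\r\n", text)


print(repr(normalize_newlines("a\rb\nc\r\nd")))
# prints 'a\r\nb\r\nc\r\nd'
```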
This form data set encoding is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences.
The application/x-www-form-urlencoded encoding algorithm is as follows:
Let result be the empty string.
If the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute that is an ASCII-compatible character encoding. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.
Otherwise, if the document's character encoding is an ASCII-compatible character encoding, then that is the selected character encoding.
Otherwise, let the selected character encoding be UTF-8.
Let charset be the preferred MIME name of the selected character encoding.
For each entry in the form data set, perform these substeps:
If the entry's name is "_charset_" and its type is "hidden", replace its value with charset.
If the entry's type is "file", replace its value with the file's filename only.
For each character in the entry's name and value that cannot be expressed using the selected character encoding, replace the character by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or more characters in the range ASCII digits representing the Unicode code point of the character in base ten, and finally a U+003B SEMICOLON character (;).
Encode the entry's name and value using the selected character encoding. The entry's name and value are now byte strings.
For each byte in the entry's name and value, apply the appropriate subsubsteps from the following list:
Leave the byte as is.
Let s be a string consisting of a "%" (U+0025) character followed by two characters in the ranges ASCII digits and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F representing the hexadecimal value of the byte in question (zero-padded if necessary).
Encode the string s as US-ASCII, so that it is now a byte string.
Replace the byte in question in the name or value being processed by the bytes in s, preserving their relative order.
Interpret the entry's name and value as Unicode strings encoded in US-ASCII. (All of the bytes in the string will be in the range 0x00 to 0x7F; the high bit will be zero throughout.) The entry's name and value are now Unicode strings again.
If the entry's name is "isindex", its type is "text", and this is the first entry in the form data set, then append the value to result and skip the rest of the substeps for this entry, moving on to the next entry, if any, or the next step in the overall algorithm otherwise.
If this is not the first entry, append a single U+0026 AMPERSAND character (&) to result.
Append the entry's name to result.
Append a single "=" (U+003D) character to result.
Append the entry's value to result.
Encode result as US-ASCII and return the resulting byte stream.
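The interplay between character references, the selected encoding, and %-escaping is easier to see on a concrete value. The sketch below is an approximation, not the algorithm above: the function name `encode_component` is hypothetical, Python's `xmlcharrefreplace` error handler is used to produce the decimal character references, and `quote_plus` stands in for the per-byte step (its set of bytes left as-is is an assumption and may differ from the one user agents use).

```python
from urllib.parse import quote_plus


def encode_component(text: str, encoding: str) -> str:
    """Encode one name or value, roughly following the substeps above."""
    # Characters the encoding cannot express become "&#NNNN;" references,
    # then the whole string is encoded to bytes.
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    # Bytes are then escaped; the "&", "#" and ";" of the reference
    # are themselves %-escaped at this point.
    return quote_plus(raw)


print(encode_component("é€", "iso-8859-1"))  # prints %E9%26%238364%3B
print(encode_component("é€", "utf-8"))       # prints %C3%A9%E2%82%AC
```

Note how the euro sign, which ISO-8859-1 cannot express, ends up as an escaped character reference that a decoder cannot distinguish from a user who literally typed "&#8364;".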
To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. This algorithm uses as inputs the payload itself, payload, consisting of a Unicode string using only characters in the range U+0000 to U+007F; a default character encoding encoding; and optionally an isindex flag indicating that the payload is to be processed as if it had been generated for a form containing an isindex control. The output of this algorithm is a sorted list of name-value pairs. If the isindex flag is set and the first control really was an isindex control, then the first name-value pair will have as its name the empty string.
Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).
If the isindex flag is set and the first string in strings does not contain a "=" (U+003D) character, insert a U+003D EQUALS SIGN character (=) at the start of the first string in strings.
Let pairs be an empty list of name-value pairs.
For each string string in strings, run these substeps:
If string contains a "=" (U+003D) character, then let name be the substring of string from the start of string up to but excluding its first "=" (U+003D) character, and let value be the substring from the first character, if any, after the first "=" (U+003D) character up to the end of string. If the first U+003D EQUALS SIGN character (=) is the first character, then name will be the empty string. If it is the last character, then value will be the empty string.
Otherwise, string contains no "=" (U+003D) characters. Let name have the value of string and let value be the empty string.
Replace any "+" (U+002B) characters in name and value with U+0020 SPACE characters.
Replace any escape in name and value with the character represented by the escape. This replacement must not be recursive.
An escape is a "%" (U+0025) character followed by two characters in the ranges ASCII digits, U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, and U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F.
The character represented by an escape is the Unicode character whose code point is equal to the value of the two characters after the "%" (U+0025) character, interpreted as a hexadecimal number (in the range 0..255).
So for instance the string "A%2BC" would become "A+C". Similarly, the string "100%25AA%21" becomes the string "100%AA!".
Convert the name and value strings to their byte representation in ISO-8859-1 (i.e. convert the Unicode string to a byte string, mapping code points to byte values directly).
Add a pair consisting of name and value to pairs.
If any of the name-value pairs in pairs have a name component consisting of the string "_charset_" encoded in US-ASCII, and the value component of the first such pair, when decoded as US-ASCII, is the name of a supported character encoding, then let encoding be that character encoding (replacing the default passed to the algorithm).
Convert the name and value components of each name-value pair in pairs to Unicode by interpreting the bytes according to the encoding encoding.
Return pairs.
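The decoding steps translate fairly directly into code. The following is a sketch under stated assumptions: the function name `decode_urlencoded` is hypothetical, "supported character encoding" is taken to mean whatever Python's `codecs.lookup` recognizes, and undecodable bytes are replaced rather than reported.

```python
import codecs
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _unescape(text: str) -> bytes:
    """Apply the "+" and escape replacements, then map code points to bytes."""
    text = text.replace("+", " ")
    # A single, non-recursive pass over the escapes.
    text = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    return text.encode("iso-8859-1")


def decode_urlencoded(payload: str, encoding: str = "utf-8",
                      isindex: bool = False) -> list[tuple[str, str]]:
    strings = payload.split("&")
    if isindex and "=" not in strings[0]:
        strings[0] = "=" + strings[0]
    pairs = []
    for s in strings:
        # partition() yields an empty value when there is no "=".
        name, _, value = s.partition("=")
        pairs.append((_unescape(name), _unescape(value)))
    for name, value in pairs:
        if name == b"_charset_":
            try:
                encoding = codecs.lookup(value.decode("ascii")).name
            except (LookupError, UnicodeDecodeError):
                pass  # not a supported encoding: keep the default
            break  # only the first such pair is considered
    return [(n.decode(encoding, "replace"), v.decode(encoding, "replace"))
            for n, v in pairs]


print(decode_urlencoded("A%2BC=100%25AA%21"))
# prints [('A+C', '100%AA!')]
```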
Parameters on the application/x-www-form-urlencoded MIME type are ignored. In particular, this MIME type does not support the charset parameter.
The multipart/form-data encoding algorithm is as follows:
Let result be the empty string.
If the algorithm was invoked with an explicit character encoding, let the selected character encoding be that encoding. (This algorithm is used by other specifications, which provide an explicit character encoding to avoid the dependency on the form element described in the next paragraph.)
Otherwise, if the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute that is an ASCII-compatible character encoding. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.
Otherwise, if the document's character encoding is an ASCII-compatible character encoding, then that is the selected character encoding.
Otherwise, let the selected character encoding be UTF-8.
Let charset be the preferred MIME name of the selected character encoding.
For each entry in the form data set, perform these substeps:
If the entry's name is "_charset_" and its type is "hidden", replace its value with charset.
For each character in the entry's name and value that cannot be expressed using the selected character encoding, replace the character by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or more characters in the range ASCII digits representing the Unicode code point of the character in base ten, and finally a U+003B SEMICOLON character (;).
Encode the (now mutated) form data set using the rules described by RFC 2388, Returning Values from Forms: multipart/form-data, and return the resulting byte stream. [RFC2388]
Each entry in the form data set is a field, the name of the entry is the field name and the value of the entry is the field value.
The order of parts must be the same as the order of fields in the form data set. Multiple entries with the same name must be treated as distinct fields.
In particular, this means that multiple files submitted as part of a single <input type=file multiple> element will result in each file having its own field; the "sets of files" feature ("multipart/mixed") of RFC 2388 is not used.
The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).
File names included in the generated multipart/form-data resource (as part of file fields) must use the character encoding selected above, though the precise name may be approximated if necessary (e.g. newlines could be removed from file names, quotes could be changed to "%22", and characters not expressible in the selected character encoding could be replaced by other characters). User agents must not use the RFC 2231 encoding suggested by RFC 2388.
The boundary used by the user agent in generating the return value of this algorithm is the multipart/form-data boundary string. (This value is used to generate the MIME type of the form submission payload generated by this algorithm.)
For details on how to interpret multipart/form-data payloads, see RFC 2388. [RFC2388]
The text/plain encoding algorithm is as follows:
Let result be the empty string.
If the form element has an accept-charset attribute, then, taking into account the characters found in the form data set's names and values, and the character encodings supported by the user agent, select a character encoding from the list given in the form's accept-charset attribute. If none of the encodings are supported, or if none are listed, then let the selected character encoding be UTF-8.
Otherwise, the selected character encoding is the document's character encoding.
Let charset be the preferred MIME name of the selected character encoding.
If the entry's name is "_charset_" and its type is "hidden", replace its value with charset.
If the entry's type is "file", replace its value with the file's filename only.
For each entry in the form data set, perform these substeps:
Append the entry's name to result.
Append a single "=" (U+003D) character to result.
Append the entry's value to result.
Append a "CR" (U+000D) "LF" (U+000A) character pair to result.
Encode result using the selected character encoding and return the resulting byte stream.
Payloads using the text/plain format are intended to be human readable. They are not reliably interpretable by computer, as the format is ambiguous (for example, there is no way to distinguish a literal newline in a value from the newline at the end of the value).
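The ambiguity is easy to reproduce. In this sketch the function name `encode_text_plain` is hypothetical, entries are reduced to name-value pairs, and the character encoding is passed in rather than selected from accept-charset.

```python
def encode_text_plain(entries: list[tuple[str, str]],
                      encoding: str = "utf-8") -> bytes:
    """Serialize entries as name=value lines, each terminated by CRLF."""
    result = "".join(f"{name}={value}\r\n" for name, value in entries)
    return result.encode(encoding)


# One field whose value contains a newline...
one_field = encode_text_plain([("a", "1\r\nb=2")])
# ...and two separate fields produce identical payloads.
two_fields = encode_text_plain([("a", "1"), ("b", "2")])
print(one_field == two_fields)  # prints True
```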
When a form element form is reset, the user agent must fire a simple event named reset, that is cancelable, at form, and then, if that event is not canceled, must invoke the reset algorithm of each resettable element whose form owner is form.
Each resettable element defines its own reset algorithm. Changes made to form controls as part of these algorithms do not count as changes caused by the user (and thus, e.g., do not cause input events to fire).