HTTP URI
Section 3.2.1 unfortunately will need some careful wording to deal
with the sticky issues of character sets. The problem is that the BNF
of the URL specification is described in terms of characters, not
octets. That is, the URL "http://foo.baz/%23bar" is defined as a
sequence of characters that includes the characters "%", "2", "3", "b",
etc. The translation of those sequences of characters to octets used
in on-the-wire protocols is left up to the individual scheme. The
"ftp:" scheme, for example, calls for parsing the URL as a sequence of
characters, and then performing various operations in the FTP protocol
using the decoded octets. This is how "ftp://foo.baz/afs/a/b/c", which
is supposed to "CD afs", "CD a", "CD b" and then "RETR c", can be
distinguished from "ftp://foo.baz/%2fafs%2fa%2fb/c", which is supposed
to "CD /afs/a/b" and then "RETR c".
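
The "ftp:" interpretation can be sketched as follows. This is only an
illustration, not text from any specification: the function name
ftp_commands is hypothetical, and "CD"/"RETR" are the informal command
names used in this note. The point is the order of operations: the
path is split on literal "/" characters first, and only then is each
segment decoded to octets.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def ftp_commands(url):
    """Return the informal FTP commands implied by an ftp: URL (sketch)."""
    path = urlsplit(url).path
    # The first "/" only separates the host from the path; drop it.
    if path.startswith("/"):
        path = path[1:]
    # Split on literal "/" BEFORE decoding, so that an escaped "%2f"
    # stays inside its segment instead of acting as a separator.
    segments = [unquote_to_bytes(segment) for segment in path.split("/")]
    *directories, filename = segments
    return [b"CD " + d for d in directories] + [b"RETR " + filename]


print(ftp_commands("ftp://foo.baz/afs/a/b/c"))
print(ftp_commands("ftp://foo.baz/%2fafs%2fa%2fb/c"))
```

The first call yields one "CD" per path segment followed by "RETR c";
the second yields a single "CD /afs/a/b" followed by "RETR c".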
However, for the HTTP protocol, the translation from characters to
octets does not involve any decoding; the characters in the URL
after the host name (and the entire URL when talking to a proxy) are
turned into octets using the US-ASCII encoding.
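
By contrast, the HTTP case can be sketched as below. The function name
request_target and the via_proxy flag are hypothetical, and sending
"/" for an empty path is an assumption of this sketch rather than
something stated above. Note that "%23" is passed through untouched:
nothing is decoded, and the characters are simply encoded as US-ASCII.

```python
from urllib.parse import urlsplit


def request_target(url, via_proxy=False):
    """Return the octets sent for the URL in an HTTP request (sketch)."""
    if via_proxy:
        # A proxy is given the entire URL.
        target = url
    else:
        parts = urlsplit(url)
        # Assumption: an empty path is sent as "/".
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    # No percent-decoding takes place; characters map to US-ASCII octets.
    # A non-ASCII character raises UnicodeEncodeError here.
    return target.encode("ascii")


print(request_target("http://foo.baz/%23bar"))
print(request_target("http://foo.baz/%23bar", via_proxy=True))
```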
A question arises in the special circumstances where octets that are
not US-ASCII are used in a URL, e.g., after a "?" which introduces a
query. The HTTP specification itself could be silent on this issue,
but in the case of HTML forms with <ISINDEX>, the encoding might need
to be specified as ISO-8859-1.
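
To see why the choice of encoding matters, the sketch below turns a
query term containing a non-ASCII character into octets, assuming
ISO-8859-1 as suggested above, and compares the escaped form with what
UTF-8 would produce. The function names are hypothetical, and whether
such octets are sent raw or escaped is not settled by this note.

```python
from urllib.parse import quote


def query_octets(term, charset="iso-8859-1"):
    """Encode a query term to octets under the given charset (sketch)."""
    return term.encode(charset)


def escaped_query(term, charset="iso-8859-1"):
    """Percent-escape the octets of a query term (sketch)."""
    return quote(term, safe="", encoding=charset)


term = "caf\u00e9"  # "cafe" with an acute accent on the final letter
print(query_octets(term))            # one octet, 0xE9, for the accented letter
print(escaped_query(term))           # caf%E9
print(escaped_query(term, "utf-8"))  # caf%C3%A9
```

The same characters yield different octets under the two encodings,
which is why a receiver cannot interpret them reliably unless the
encoding is specified somewhere.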
http working group issues, 2/24.