This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
RFC 1738 specifies the following unsafe characters, which should always be encoded: "{", "}", "|", "\", "^", "~", "[", "]", and "`".

"`" and "\" are already covered by the spec, but the others are not. Browsers are much less consistent about the others:
- Firefox v25 encodes all of them except "|" and "~".
- Chrome v31 encodes all of them except "[", "]" and "~", and converts "\" to "/".
- IE v10 encodes all of them except "[", "]" and "~", and converts "\" to "/".
- Safari v5:
  - When opening a link or using location.href: encodes none of them, and converts "\" to "/".
  - When typing in the URL bar: encodes all of them except "~" (yes, even "\").
Some additional notes: RFC 2396 actually removed "~" from the unwise character set. I would say that all characters in the RFC 2396 unwise set should be percent-encoded in URLs, with the sole exception of "\", which gets special handling. That would make parsed URLs standard-compliant URIs, which would be particularly good for interoperability with other systems.
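A minimal Python sketch of the normalization proposed above, assuming it applies only to the path: "\" is first mapped to "/" (as Chrome and IE do), then every remaining RFC 2396 unwise character is percent-encoded. The function name `normalize_path` is hypothetical; this illustrates the proposal, not any browser's or specification's actual behavior.

```python
# Sketch of the proposal above (hypothetical helper, not browser/spec behavior):
# map "\" to "/" first, then percent-encode every remaining RFC 2396 "unwise"
# character. "~" is deliberately left alone because RFC 2396 moved it out of
# the unwise set.

# RFC 2396, section 2.4.3: unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
UNWISE = set('{}|\\^[]`')


def normalize_path(path: str) -> str:
    """Return `path` with "\\" mapped to "/" and unwise characters percent-encoded."""
    # Special handling of "\": treat it as a path separator.
    # After this step the "\\" entry in UNWISE can no longer match.
    path = path.replace("\\", "/")
    out = []
    for ch in path:
        if ch in UNWISE:
            # Uppercase hex digits, as RFC 3986 recommends for percent-encodings.
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


print(normalize_path(r"/a{b}|c\d^e[f]`g~h"))  # prints /a%7Bb%7D%7Cc/d%5Ee%5Bf%5D%60g~h
```

Doing the "\" conversion before encoding is the design choice that matters here: encoding first would turn "\" into "%5C" and lose its role as a separator.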
The RFCs you quote are obsolete. I think I copied Safari here, since it is the most conservative. Do browsers always convert these characters, or only in the path?
I still haven't checked escaping outside the path. RFC 3986 is the same as RFC 2396 in this regard: these characters are no longer grouped as the "unwise" set, but they are still forbidden in the path.
I hit this issue and ran a survey:
1. Enter the following text in the browser's address bar:
http://example.com/ !"$%&'()*+,-.:;<=>@[\]^_`{|}~? !"$%&'()*+,-./:;<=>?@[\]^_`{|}~# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
2. Load the page.
3. Copy the URL from the address bar.

The results were:
Chrome: http://example.com/%20!%22$%&'()*+,-.:;%3C=%3E@[/]%5E_%60%7B%7C%7D~?%20!%22$%&%27()*+,-./:;%3C=%3E?@[\]^_`{|}~# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Safari: http://example.com/%20!%22$%&'()*+,-.:;%3C=%3E@[/]^_`{|}~?%20!%22$%&'()*+,-./:;%3C=%3E?@[\]^_`{|}~#%20!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Firefox: http://example.com/%20!%22$%&%27%28%29*+,-.:;%3C=%3E@%5B%5C%5D%5E_%60%7B|%7D~?%20!%22$%&%27%28%29*+,-./:;%3C=%3E?@[\]^_%60{|}~ #%20!%22#$%&%27%28%29*+,-./:;%3C=%3E?@[\]^_%60{|}~
IE: error
IE (without "%" in the path): http://example.com/%20!%22$&'()*+,-.:;%3C=%3E@[/]%5E_%60%7B%7C%7D~? !"$%&'()*+,-./:;<=>?@[\]^_`{|}~# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
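Comparing these long strings by eye is error-prone. The following Python sketch lists, for each component of a copied URL, which characters appear percent-encoded. The helper names `split_components`, `encoded_chars` and `summarize` are hypothetical, and the splitting assumes the simple layout of the probe URLs above: the first "?" starts the query and the first "#" starts the fragment.

```python
import re

# One percent-encoded octet such as "%3C". A bare "%" followed by non-hex
# characters (e.g. "%&" in the survey output) is not counted.
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def split_components(url: str) -> tuple[str, str, str]:
    """Split a URL into (path, query, fragment) on the first "#" and first "?".

    Simplification: assumes an absolute URL containing "://" with no "?" or
    "#" inside the host, which holds for the http://example.com/ probes.
    """
    before_fragment, _, fragment = url.partition("#")
    before_query, _, query = before_fragment.partition("?")
    _, _, after_scheme = before_query.partition("://")
    slash = after_scheme.find("/")
    path = after_scheme[slash:] if slash != -1 else ""
    return path, query, fragment


def encoded_chars(component: str) -> set[str]:
    """Return the set of characters that appear percent-encoded in `component`."""
    return {chr(int(h, 16)) for h in PCT.findall(component)}


def summarize(url: str) -> dict[str, str]:
    """Map each component name to its percent-encoded characters, sorted."""
    names = ("path", "query", "fragment")
    return {
        name: "".join(sorted(encoded_chars(part)))
        for name, part in zip(names, split_components(url))
    }
```

Applied to the Chrome result above, this reports no percent-encoded characters in the fragment, while in the path it finds space, ", <, >, ^, `, {, | and } encoded.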
Yui, the address bar is part of the UI. It's not a good place to run URL tests in.
Santiago also tested with the address bar. As far as I understand, the address bar accepts various inputs and outputs an escaped/normalized URL for other machines, so its output must be meaningful. Moreover, it shows the differences in which characters are escaped in the path, query, and fragment. For example, Safari, Chrome and IE don't escape "#" in the fragment.
(In reply to NARUSE, Yui from comment #6)
> Santiago also tests with address bar.
>
> As far as I understand, address bar gets various inputs, and output the
> escaped/normalized URL for other machines.
> It must be meaningful.

It's much more reliable to test how, e.g., <a> or new URL() parses URLs.

> Moreover you can see the difference between escaped characters in
> path/query/fragment.
> For example Safari, Chrome and IE doesn't escape # in fragment.

The specification doesn't either at the moment, though I think Firefox hit some issues implementing that. What exactly is the desired change here?
I think this is mostly resolved now, especially given:
https://github.com/whatwg/url/issues/16
https://github.com/whatwg/url/issues/17
Please open new GitHub issues if there's anything left here.