This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 24163 - Unsafe characters should be percent-encoded
Summary: Unsafe characters should be percent-encoded
Status: RESOLVED WORKSFORME
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
Reported: 2013-12-23 18:40 UTC by Santiago M. Mola
Modified: 2015-06-18 09:44 UTC

Description Santiago M. Mola 2013-12-23 18:40:40 UTC
RFC 1738 specifies the following unsafe characters, which should always be encoded: "{", "}", "|", "\", "^", "~", "[", "]", and "`".

"`" and "\" are already covered by the spec, but not the others.

Browsers handle the others less consistently:
- Firefox v25 encodes all of them except "|" and "~".
- Chrome v31 encodes all of them except "[", "]" and "~", and converts "\" to "/".
- IE v10 encodes all of them except "[", "]" and "~", and converts "\" to "/".
- Safari v5:
   - When opening a link or using location.href: encodes none of them, and converts "\" to "/".
   - When typing in the URL bar: encodes all of them except "~" (yes, even "\").
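
As a rough illustration, the observations in the list above can be written down as data and queried. The following Python sketch is not part of the original report: the names `NOT_ENCODED`, `encoded_by` and `encoded_everywhere` are hypothetical, and counting a "\" that is rewritten to "/" as "not encoded" is an assumption made only for this model.

```python
# The nine characters discussed in the description.
CHARS = '{}|\\^~[]`'

# Characters each browser was reported NOT to percent-encode.
# Assumption: a "\" that is rewritten to "/" counts as "not encoded".
NOT_ENCODED = {
    'Firefox 25': set('|~'),
    'Chrome 31': set('[]~\\'),
    'IE 10': set('[]~\\'),
    'Safari 5 (link or location.href)': set(CHARS),
    'Safari 5 (URL bar)': set('~'),
}


def encoded_by(browser):
    """Return the characters the given browser was reported to encode."""
    return ''.join(c for c in CHARS if c not in NOT_ENCODED[browser])


def encoded_everywhere(browsers):
    """Return the characters that every listed browser was reported to encode."""
    return ''.join(
        c for c in CHARS
        if all(c not in NOT_ENCODED[b] for b in browsers)
    )


if __name__ == '__main__':
    for browser in NOT_ENCODED:
        print(browser, '->', encoded_by(browser))
```

Under these assumptions, calling `encoded_everywhere` with every entry except the Safari link case shows which characters those browsers agree on.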
Comment 1 Santiago M. Mola 2013-12-25 18:22:28 UTC
Some additional notes:

RFC 2396 actually removed "~" from the unwise character set.

I would say that all characters in the unwise character set as of RFC 2396 should be percent-encoded in URLs, with the sole exception of "\", which gets special handling.

That would make parsed URLs standard-compliant URIs, which would be particularly good for interoperability with other systems.
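
A minimal Python sketch of this proposal, for illustration only. The function name `encode_unwise` and the `backslash_to_slash` flag are hypothetical, and interpreting the "special handling" of "\" as the rewrite to "/" that the browsers above perform is an assumption; this is not how any specification or browser is known to implement it.

```python
# The "unwise" characters of RFC 2396 ("~" is no longer in the set).
UNWISE = '{}|\\^[]`'


def encode_unwise(path, backslash_to_slash=True):
    """Percent-encode the unwise characters in a URL path.

    Assumption: the special handling of "\\" means rewriting it to "/".
    Pass backslash_to_slash=False to percent-encode it like the others.
    """
    out = []
    for ch in path:
        if ch == '\\' and backslash_to_slash:
            out.append('/')
        elif ch in UNWISE:
            # Two uppercase hex digits, e.g. "{" becomes "%7B".
            out.append('%{:02X}'.format(ord(ch)))
        else:
            out.append(ch)
    return ''.join(out)


if __name__ == '__main__':
    print(encode_unwise('/a{b}|c\\d^e[f]`g~'))
```

The sketch deliberately leaves every other character alone, including "~" and "%", so it does not double-encode existing escapes.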
Comment 2 Anne 2014-01-15 11:10:38 UTC
The RFCs you quote are obsolete. I think I copied Safari here, it being the most conservative.

Do browsers always convert them, or only in the path?
Comment 3 Santiago M. Mola 2014-01-19 01:42:12 UTC
I still haven't checked escaping outside the path.

RFC 3986 is the same as RFC 2396 in this regard (the characters are no longer defined as the "unwise" character set, but they are still forbidden in the path).
Comment 4 NARUSE, Yui 2014-07-01 02:36:39 UTC
I hit this issue and ran a survey:

1. Enter the following text in the browser's address bar

http://example.com/ !"$%&'()*+,-.:;<=>@[\]^_`{|}~? !"$%&'()*+,-./:;<=>?@[\]^_`{|}~# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

2. Load the page
3. Copy the URL from the address bar

The results are as follows:

Chrome:
http://example.com/%20!%22$%&'()*+,-.:;%3C=%3E@[/]%5E_%60%7B%7C%7D~?%20!%22$%&%27()*+,-./:;%3C=%3E?@[\]^_`{|}~# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Safari:
http://example.com/%20!%22$%&'()*+,-.:;%3C=%3E@[/]^_`{|}~?%20!%22$%&'()*+,-./:;%3C=%3E?@[\]^_`{|}~#%20!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Firefox:
http://example.com/%20!%22$%&%27%28%29*+,-.:;%3C=%3E@%5B%5C%5D%5E_%60%7B|%7D~?%20!%22$%&%27%28%29*+,-./:;%3C=%3E?@[\]^_%60{|}~#%20!%22#$%&%27%28%29*+,-./:;%3C=%3E?@[\]^_%60{|}~

IE: error
IE without % in path:
http://example.com/%20!%22$&'()*+,-.:;%3C=%3E@[/]%5E_%60%7B%7C%7D~? !"$%&'()*+,-./:;<=>?@[\]^_`{|}~# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Comment 5 Anne 2014-07-01 07:27:42 UTC
Yui, the address bar is part of the UI. It's not a good place to run URL tests in.
Comment 6 NARUSE, Yui 2014-07-01 09:11:18 UTC
Santiago also tested with the address bar.

As far as I understand, the address bar accepts various inputs and outputs an escaped, normalized URL for other machines, so its behavior must be meaningful.

Moreover, you can see the differences in which characters are escaped in the path, query, and fragment.
For example, Safari, Chrome and IE don't escape "#" in the fragment.
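
The per-component differences can be checked mechanically. The Python sketch below is an illustration, not part of the original survey: it splits a copied URL at the first "#" and the first "?" and reports which characters of interest survive unescaped in each part. The helper names and the `CANDIDATES` list are hypothetical, and the simple string split is an assumption that only holds for URLs shaped like the ones in this survey.

```python
# Characters of interest: the nine from the description, plus "#".
CANDIDATES = '{}|\\^~[]`#'


def split_components(url):
    """Split an absolute URL into (path, query, fragment) by plain string ops."""
    before_fragment, _, fragment = url.partition('#')
    before_query, _, query = before_fragment.partition('?')
    after_scheme = before_query.split('://', 1)[-1]
    slash = after_scheme.find('/')
    path = after_scheme[slash:] if slash != -1 else ''
    return path, query, fragment


def raw_characters(component, candidates=CANDIDATES):
    """Return the candidate characters that appear literally in the component."""
    return ''.join(c for c in candidates if c in component)


# The Chrome result reported in comment 4, copied verbatim.
CHROME_RESULT = r'''http://example.com/%20!%22$%&'()*+,-.:;%3C=%3E@[/]%5E_%60%7B%7C%7D~?%20!%22$%&%27()*+,-./:;%3C=%3E?@[\]^_`{|}~# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~'''

if __name__ == '__main__':
    parts = split_components(CHROME_RESULT)
    for name, part in zip(('path', 'query', 'fragment'), parts):
        print(name, '->', raw_characters(part))
```

A plain string split is used here instead of a URL parsing library so that nothing is normalized or rejected before the comparison.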
Comment 7 Anne 2015-06-15 14:47:53 UTC
(In reply to NARUSE, Yui from comment #6)
> Santiago also tested with the address bar.
> 
> As far as I understand, the address bar accepts various inputs and outputs
> an escaped, normalized URL for other machines, so its behavior must be
> meaningful.

It's much more reliable to test how, e.g., <a> or new URL() parses the input.


> Moreover, you can see the differences in which characters are escaped in
> the path, query, and fragment.
> For example, Safari, Chrome and IE don't escape "#" in the fragment.

The specification doesn't either at the moment, though I think Firefox hit some issues implementing that.

What exactly is the desired change here?
Comment 8 Anne 2015-06-18 09:44:07 UTC
I think this is mostly resolved now, especially given:

  https://github.com/whatwg/url/issues/16
  https://github.com/whatwg/url/issues/17

Please open new GitHub issues if there's anything left here.