The subsection entitled "Terminology" at
http://dev.w3.org/html5/spec/Overview.html#terminology-0 says:
A URL is a valid URL if at least one of the following conditions holds:
o The URL is a valid URI reference [RFC3986].
o The URL is a valid IRI reference and it has no query component. [RFC3987]
o The URL is a valid IRI reference and its query component contains no
unescaped non-ASCII characters. [RFC3987]
o The URL is a valid IRI reference and the character encoding of the URL's
Document is UTF-8 or a UTF-16 encoding. [RFC3987]
The problem of query components being interpreted in the document encoding is
acute for http: and https:, but it does not arise for mailto: and, hopefully,
for no other scheme. The text above therefore has to be changed to take this
into account.
Because the conditions are ORed together, the simplest fix would be to add
another condition, such as:
o The URL is a valid IRI reference and the scheme of the URL, potentially
after converting from relative to absolute form, is not http: or https:.
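A minimal sketch of the proposed extra condition, assuming Python's urllib.parse is used to resolve the reference against the document's base URL. The function name passes_proposed_condition and the example URLs are hypothetical, and checking that the reference is a valid IRI reference per RFC 3987 is assumed to happen separately.

```python
from urllib.parse import urljoin, urlsplit

# Schemes whose query component is interpreted in the document's
# character encoding, per the discussion above.
ENCODING_SENSITIVE_SCHEMES = {"http", "https"}


def passes_proposed_condition(url: str, base: str) -> bool:
    """Hypothetical check for the proposed condition: after resolving a
    possibly relative URL against the document's base URL, its scheme
    must not be http or https. IRI validity itself is NOT checked here."""
    absolute = urljoin(base, url)           # relative -> absolute form
    scheme = urlsplit(absolute).scheme      # urlsplit lowercases the scheme
    return scheme not in ENCODING_SENSITIVE_SCHEMES


print(passes_proposed_condition("mailto:foo@bar?subject=å", "http://example.org/"))
print(passes_proposed_condition("search?q=å", "http://example.org/page"))
```

The relative reference "search?q=å" resolves to an http: URL, so it does not satisfy the new condition and would still need one of the other conditions to count as valid.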
Regards, Martin. (Martin Dürst, firstname.lastname@example.org; please feel free
to contact me for further discussion.)
So, per the new standard at http://url.spec.whatwg.org/, the document-encoding rules would not apply to non-hierarchical URL schemes such as mailto, and you would get normal UTF-8 percent-encoding. I have yet to test whether that is correct and matches browsers.
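The scheme-dependent behaviour can be sketched roughly as follows. This is a simplified illustration, not the URL standard's algorithm: it encodes a single query value, treats only http and https as using the document encoding, and assumes that characters the document encoding cannot represent become HTML numeric character references (&#NNN;) before percent-encoding, as in form submission. The function name encode_query_value is hypothetical.

```python
from urllib.parse import quote


def encode_query_value(value: str, scheme: str, document_encoding: str) -> str:
    """Hypothetical sketch: percent-encode one query value.

    http/https: encode in the document's encoding; unencodable characters
    are replaced by numeric character references (&#NNN;), an assumption
    modelled on form submission.
    Other schemes (e.g. mailto): always encode as UTF-8.
    """
    if scheme.lower() in {"http", "https"}:
        raw = value.encode(document_encoding, errors="xmlcharrefreplace")
    else:
        raw = value.encode("utf-8")
    # quote() accepts bytes; safe="" percent-encodes '&', '#' and ';' too.
    return quote(raw, safe="")


# "å" (U+00E5) has no mapping in windows-1251 (Python's cp1251 codec).
print(encode_query_value("å", "http", "cp1251"))
print(encode_query_value("å", "mailto", "cp1251"))
```

Under these assumptions the http: value comes out as %26%23229%3B (the escaped form of "&#229;"), while the mailto: value is the plain UTF-8 form %C3%A5.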
Testing with
data:text/html;charset=windows-1251,<!DOCTYPE html><a href="mailto:foo@bar?subject=å">foo
the (new) Opera shows a literal "å" in the status bar, while for the same link with http: it form-escapes the character. Clicking the mailto: link makes the "å" round-trip successfully to Opera Mail in both Opera and Firefox, but Firefox falls back to UTF-8 for unencodable characters, so that does not tell us much.
But for Opera/Chrome it seems sufficient.