Bug 14832 - Check whether the encoding problems for query components apply to mailto: URLs and other non-HTTP URLs and see if we can change the definition of "valid URL" accordingly
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: Other, OS: other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL: http://www.whatwg.org/specs/web-apps/...
Depends on:
Reported: 2011-11-15 09:51 UTC by contributor
Modified: 2014-01-15 14:28 UTC
CC List: 9 users

See Also:


Description contributor 2011-11-15 09:51:44 UTC
Specification: http://dev.w3.org/html5/spec/Overview.html
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

At http://dev.w3.org/html5/spec/Overview.html#terminology-0, a subsection
entitled "Terminology", it says:

A URL is a valid URL if at least one of the following conditions holds:

o The URL is a valid URI reference [RFC3986].
o The URL is a valid IRI reference and it has no query component. [RFC3987]
o The URL is a valid IRI reference and its query component contains no
unescaped non-ASCII characters. [RFC3987]
o The URL is a valid IRI reference and the character encoding of the URL's
Document is UTF-8 or a UTF-16 encoding. [RFC3987]
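The four conditions are alternatives: a URL is valid as soon as any one of them holds. A minimal Python sketch of that logic, assuming the caller has already run the RFC 3986 and RFC 3987 syntax checks and passes in the results; `is_valid_url_draft` and its parameters are hypothetical names, not part of any spec or library.

```python
from typing import Optional

# Encodings for which condition 4 holds (names assumed lowercase-comparable).
UTF_ENCODINGS = {"utf-8", "utf-16", "utf-16le", "utf-16be"}


def is_valid_url_draft(is_uri_ref: bool, is_iri_ref: bool,
                       query: Optional[str], doc_encoding: str) -> bool:
    """Sketch of the draft's 'valid URL' test; syntax checks are inputs."""
    if is_uri_ref:
        return True  # condition 1: valid URI reference
    if not is_iri_ref:
        return False  # conditions 2-4 all require a valid IRI reference
    return (
        query is None                               # condition 2: no query
        or query.isascii()                          # condition 3: ASCII-only query
        or doc_encoding.lower() in UTF_ENCODINGS    # condition 4: UTF-8/UTF-16 doc
    )
```

With this sketch, `is_valid_url_draft(False, True, "subject=\u00e5", "windows-1251")` returns False whether the URL is http: or mailto:, because the scheme plays no part in any of the four conditions.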

The problem that query components are interpreted in the document's encoding
is acute for http: and https:, but not for mailto:, and hopefully not for any
other scheme. The text above therefore has to be changed to take this into
account. Because the conditions are or-ed together, the simplest fix would be
to add another condition such as:

o The URL is a valid IRI reference and the scheme of the URL, potentially
after converting from relative to absolute form, is not http: or https:.
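The proposed extra condition can be sketched the same way. This version assumes Python's `urllib.parse.urljoin` as a stand-in for "converting from relative to absolute form" (it implements RFC 3986 reference resolution, not the HTML or URL Standard parser) and again takes the IRI syntax check as an input; `proposed_condition` is a hypothetical name.

```python
from urllib.parse import urljoin, urlsplit


def proposed_condition(is_iri_ref: bool, url: str, base_url: str) -> bool:
    """Sketch: valid IRI reference whose absolute scheme is not http(s)."""
    if not is_iri_ref:
        return False
    # Resolve a possibly relative reference against the document's base URL.
    absolute = urljoin(base_url, url)
    scheme = urlsplit(absolute).scheme.lower()
    return scheme not in ("http", "https")
```

A relative reference such as "?subject=å" against an http: base resolves to an http: URL and fails the check, while "mailto:foo@bar?subject=å" passes regardless of the document's encoding.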

Regards,   Martin. (Martin Dürst, duerst@it.aoyama.ac.jp, please feel free to
contact me for further discussion)

Posted from:
User agent: Opera/9.80 (Windows NT 6.1; U; en) Presto/2.9.168 Version/11.52
Comment 1 Anne 2012-09-28 10:36:38 UTC
So per the new standard at http://url.spec.whatwg.org/, these rules would not apply to non-hierarchical URL schemes such as mailto:, and you would get normal UTF-8 percent-encoding. Whether that is correct and matches browsers I have yet to test.
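The difference described above can be sketched as a query encoder that picks the encoding by scheme. This is a simplified illustration, not the URL Standard's algorithm. It assumes that only http: and https: use the document's encoding (as in the original report), treats UTF-16 documents as sending UTF-8 queries, and percent-encodes roughly everything outside printable ASCII. For characters the document encoding cannot represent, it emits a percent-encoded `&#NNN;` reference, which is what Comment 2 calls form-escaping. `encode_query` and `DOC_ENCODING_SCHEMES` are hypothetical names.

```python
# Assumption: only these schemes encode the query in the document's
# encoding; every other scheme gets UTF-8.
DOC_ENCODING_SCHEMES = {"http", "https"}
UTF16_NAMES = {"utf-16", "utf-16le", "utf-16be"}


def _percent_encode(data: bytes) -> str:
    # Keep printable ASCII except a few delimiters; escape every other byte.
    return "".join(
        chr(b) if 0x21 <= b <= 0x7E and chr(b) not in '"#<>' else f"%{b:02X}"
        for b in data
    )


def encode_query(query: str, scheme: str, doc_encoding: str) -> str:
    encoding = doc_encoding if scheme.lower() in DOC_ENCODING_SCHEMES else "utf-8"
    if encoding.lower() in UTF16_NAMES:
        encoding = "utf-8"  # assumed: UTF-16 documents send UTF-8 queries
    out = []
    for ch in query:
        try:
            out.append(_percent_encode(ch.encode(encoding)))
        except UnicodeEncodeError:
            # Unencodable character: percent-encoded "&#NNN;" (form-escaping).
            out.append(f"%26%23{ord(ch)}%3B")
    return "".join(out)
```

With a windows-1251 document, `encode_query("subject=\u00e5", "mailto", "windows-1251")` yields "subject=%C3%A5", while the http: variant yields "subject=%26%23229%3B", since "å" does not exist in windows-1251.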
Comment 2 Anne 2014-01-15 14:28:25 UTC
From Simon:

data:text/html;charset=windows-1251,<!DOCTYPE html><a href="mailto:foo@bar?subject=&aring;">foo

This shows a literal "å" in the status bar in (new) Opera, while the same test with an http: URL form-escapes it. Clicking the link makes the "å" round-trip successfully to Opera Mail in both Opera and Firefox, but Firefox falls back to UTF-8 for unencodable characters, so that does not tell us much.

But for Opera and Chrome, it seems sufficient.
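To rerun Simon's test with other schemes or charsets, the test page can be generated rather than typed. A small Python sketch; `make_test_page` is a hypothetical helper, and the http: target below is a placeholder, not a URL from the original test.

```python
def make_test_page(href_base: str, charset: str = "windows-1251") -> str:
    """Build a data: URL for a one-link page whose href query holds a "å".

    &aring; is the HTML named character reference for U+00E5, so the page
    source stays pure ASCII whatever charset is declared; the HTML parser
    decodes it to "å" before the link's URL is parsed.
    """
    return (
        f"data:text/html;charset={charset},"
        f'<!DOCTYPE html><a href="{href_base}?subject=&aring;">foo</a>'
    )


if __name__ == "__main__":
    # Compare the mailto: link with an http: link (placeholder host).
    for base in ("mailto:foo@bar", "http://example.com/"):
        print(make_test_page(base))
```

Opening the two generated URLs in the same browser gives the mailto:/http: status-bar comparison from the test above.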