<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>14832</bug_id>
          
          <creation_ts>2011-11-15 09:51:44 +0000</creation_ts>
          <short_desc>Check whether the encoding problems for query components apply to mailto: URLs and other non-HTTP URLs, and see if we can change the definition of &quot;valid URL&quot; accordingly</short_desc>
          <delta_ts>2014-01-15 14:28:25 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>URL</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WORKSFORME</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#top</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>annevk</cc>
    
    <cc>duerst</cc>
    
    <cc>erik.arvidsson</cc>
    
    <cc>ian</cc>
    
    <cc>julian.reschke</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>public-webapps</cc>
          
          <qa_contact>sideshowbarker+urlspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>60040</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-11-15 09:51:44 +0000</bug_when>
    <thetext>Specification: http://dev.w3.org/html5/spec/Overview.html
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
At http://dev.w3.org/html5/spec/Overview.html#terminology-0, a subsection
entitled &quot;Terminology&quot;, it says:

A URL is a valid URL if at least one of the following conditions holds:

o The URL is a valid URI reference [RFC3986].
o The URL is a valid IRI reference and it has no query component. [RFC3987]
o The URL is a valid IRI reference and its query component contains no
unescaped non-ASCII characters. [RFC3987]
o The URL is a valid IRI reference and the character encoding of the URL&apos;s
Document is UTF-8 or a UTF-16 encoding. [RFC3987]

The problem that query components are interpreted in the document encoding is
acute for http: and https:, but not for mailto:, and hopefully not for any
other scheme. The above text therefore has to be changed to take this into
account. Because the conditions are or-ed together, the simplest change would
be to add another condition such as:

o The URL is a valid IRI reference and the scheme of the URL, potentially
after converting from relative to absolute form, is not http: or https:.
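
To illustrate why the document encoding matters for query components, here is
a minimal Python sketch. It is not taken from any specification or browser;
the helper name encode_query and the choice of ISO-8859-1 as the legacy
document encoding are assumptions made only for illustration. It shows that
the same character percent-encodes to different bytes depending on the
encoding used.

```python
from urllib.parse import quote


def encode_query(value, document_encoding):
    """Percent-encode a query value using the given encoding.

    Hypothetical helper for illustration only. errors="strict" makes the
    call raise UnicodeEncodeError if the encoding cannot represent a
    character, instead of silently substituting something else.
    """
    return quote(value, safe="", encoding=document_encoding, errors="strict")


# The same character yields different percent-encoded bytes.
utf8_form = encode_query("å", "utf-8")         # "%C3%A5"
latin1_form = encode_query("å", "iso-8859-1")  # "%E5"

print(utf8_form, latin1_form)
```

A receiver that assumes UTF-8 would misread the second form, which is the
ambiguity the conditions above try to exclude.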

Regards,   Martin. (Martin Dürst, duerst@it.aoyama.ac.jp, please feel free to
contact me for further discussion)

Posted from: 133.2.210.73
User agent: Opera/9.80 (Windows NT 6.1; U; en) Presto/2.9.168 Version/11.52</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74775</commentid>
    <comment_count>1</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-09-28 10:36:38 +0000</bug_when>
    <thetext>Per the new standard, http://url.spec.whatwg.org/, these rules would not apply to non-hierarchical URL schemes such as mailto:, and you would get normal UTF-8 percent-encoding. I have yet to test whether that is correct and matches browsers.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>98502</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-01-15 14:28:25 +0000</bug_when>
    <thetext>From Simon:

data:text/html;charset=windows-1251,&lt;!DOCTYPE html&gt;&lt;a href=&quot;mailto:foo@bar?subject=&amp;aring;&quot;&gt;foo

This shows a literal &quot;å&quot; in the status bar in (new) Opera, while the same test for http: form-escapes it. Clicking the link makes the å round-trip successfully to Opera Mail in both Opera and Firefox, but Firefox falls back to UTF-8 for unencodable characters, so that&apos;s not telling much.

But for Opera/Chrome it seems sufficient.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>