<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>21267</bug_id>
          
          <creation_ts>2013-03-13 16:20:06 +0000</creation_ts>
          <short_desc>Should even &quot;non-relative&quot; URLs have a query string?</short_desc>
          <delta_ts>2013-07-03 10:07:00 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>URL</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Peter Occil">poccil</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>mike</cc>
          
          <qa_contact>sideshowbarker+urlspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>84346</commentid>
    <comment_count>0</comment_count>
    <who name="Peter Occil">poccil</who>
    <bug_when>2013-03-13 16:20:06 +0000</bug_when>
    <thetext>Earlier I posted an issue with serializing the query in non-relative URLs. But after
I read more about URIs, I am not sure whether the scheme data and query string
should be kept separate.  There is a distinction between how the URL specification
categorizes URLs and how the URI standards (RFC3986 and RFC3987) classify URIs.

Both standards allow fragments to appear in all URLs/URIs, but they differ on whether
a query string is parsed.  In the URL standard, query strings can occur in all URLs, but
in the URI standards, a query string is not parsed if the URI contains a scheme but
the scheme data doesn&apos;t begin with a slash (that is, if the URI is an &quot;opaque&quot; URI).

Take the following as an example:

mailto:me@example.com?subject=Hi

In the URL standard, the URL is parsed as:

scheme - mailto
scheme data - me@example.com
query - subject=Hi

but in the URI standards, the URI is parsed as:

scheme - mailto
scheme-specific part - me@example.com?subject=Hi

Here, in the mailto scheme, separating the scheme data and the query may be a useful distinction.

As another example, the string

jar:http://example.com/jar?x=1!/com/example/Foo.class

is parsed in the URI standards as:

scheme - jar
scheme-specific part - http://example.com/jar?x=1!/com/example/Foo.class

but in the URL standard as:

scheme - jar
scheme data - http://example.com/jar
query - x=1!/com/example/Foo.class

A better distinction for the jar scheme would have been &quot;http://example.com/jar?x=1&quot;
and &quot;com/example/Foo.class&quot;, but this is specific to the jar scheme.

This shows that while it&apos;s useful for some schemes to parse the query string, it&apos;s not so useful for others.  That&apos;s because not all schemes recognize a query string in opaque URIs, and each scheme has different parsing rules.  In both examples, mailto and jar are not relative schemes in the URL standard.
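
The two readings in the examples above can be sketched in a few lines of Python. This is only an illustration of the splits as described in this comment, not an implementation of either specification; the function names split_opaque and split_with_query are hypothetical.

```python
def split_opaque(url):
    # Reading 1 (as described above): everything between the scheme and
    # the fragment is kept as one opaque scheme-specific part.
    scheme, _, rest = url.partition(":")
    rest, _, fragment = rest.partition("#")
    return {"scheme": scheme, "scheme_specific_part": rest, "fragment": fragment}


def split_with_query(url):
    # Reading 2 (as described above): the first "?" separates the
    # scheme data from the query, whatever the scheme is.
    scheme, _, rest = url.partition(":")
    rest, _, fragment = rest.partition("#")
    data, _, query = rest.partition("?")
    return {"scheme": scheme, "scheme_data": data, "query": query, "fragment": fragment}


for url in ("mailto:me@example.com?subject=Hi",
            "jar:http://example.com/jar?x=1!/com/example/Foo.class"):
    print(split_opaque(url))
    print(split_with_query(url))
```

For the jar example the second function returns x=1!/com/example/Foo.class as the query, which is the split that the jar scheme does not want.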

But what about a scheme that _is_ a relative scheme?

The URL &quot;http:example.com&quot; would be parsed as follows:

in the URL standard:

scheme - http
path - example.com

or in the URI standards:

scheme - http
scheme-specific part - example.com

(Since the URL doesn&apos;t contain a slash, &quot;example.com&quot; is not treated as a host;
in fact, this URL would be disallowed under RFC2616 section 3.2.2, and for the
other relative schemes, the relevant RFCs don&apos;t seem to allow a syntax like that.)

But when someone enters that URL in Firefox or Google Chrome, it gets treated like
&quot;http://example.com&quot; and is probably parsed that way too.

So the following questions should be discussed:

- Should the URL standard not parse the query string in the &quot;scheme data&quot; state?  
This will allow jar to work well, but may be inconvenient for mailto and other schemes, 
since it requires an additional step by the application.
- Should the URL standard parse the query string only for certain schemes that allow it, 
such as mailto?  This will require adding another category of schemes in addition to 
&quot;relative schemes&quot;.
- As stated above, the scheme data in &quot;relative&quot; schemes must start with &quot;//&quot;, so they
are, mostly correctly, handled differently.  But there are other &quot;non-relative&quot; schemes,
such as nntp, that follow the same rules.  Should those schemes be added to the
list of relative schemes?  Or should the URL standard parse all URLs with a scheme and &quot;//&quot; at the start like &quot;hierarchical&quot; URIs? (The list of currently registered schemes is at this page: &lt;http://www.iana.org/assignments/uri-schemes.html&gt;.)

I intend for this to be a discussion rather than a bug report.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>84349</commentid>
    <comment_count>1</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-03-13 16:28:45 +0000</bug_when>
    <thetext>It would help if you filed separate tickets for distinct issues. Or in case of discussion, email whatwg@whatwg.org.

I separated query from data so that about:blank?test would still be about:blank as that matched some number of UAs. It also helps with the URLQuery API.

How http:example is parsed depends on the base URL. If the base is http://test/ it comes out as http://test/example, but if the base is https://test/ (a different scheme) it comes out as http://example/ ...
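
A rough sketch of that base-dependent behaviour, assuming only inputs of the form scheme:rest with no leading slash. The helper name resolve_slashless is hypothetical, and this is not the URL Standard's parsing algorithm, just the two outcomes described above.

```python
from urllib.parse import urljoin


def resolve_slashless(reference, base):
    # Hypothetical helper: handles only "scheme:rest" where rest does
    # not start with a slash.
    scheme, _, rest = reference.partition(":")
    base_scheme = base.partition(":")[0]
    if scheme.lower() == base_scheme.lower():
        # Same scheme as the base: treat the remainder as a relative path.
        return urljoin(base, rest)
    # Different scheme: treat the remainder as the host.
    return scheme + "://" + rest + "/"


print(resolve_slashless("http:example", "http://test/"))   # http://test/example
print(resolve_slashless("http:example", "https://test/"))  # http://example/
```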

We might add more relative schemes, depending on whether or not that makes sense.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>84368</commentid>
    <comment_count>2</comment_count>
    <who name="Peter Occil">poccil</who>
    <bug_when>2013-03-13 21:11:53 +0000</bug_when>
    <thetext>I&apos;ve posted this message to the mailing list.  I encourage you to add your thoughts there.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>84369</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-03-13 21:14:03 +0000</bug_when>
    <thetext>Thanks Peter! (Also thanks for all the review on various specs, been great.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>90219</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-07-03 10:07:00 +0000</bug_when>
    <thetext>I think this is defined in the URL Standard as we want it. Even though the query is indeed distinct, you still have the same number of options with it as before.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>