This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 21267 - Should even "non-relative" URLs have a query string?
Summary: Should even "non-relative" URLs have a query string?
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-13 16:20 UTC by Peter Occil
Modified: 2013-07-03 10:07 UTC
CC List: 1 user

See Also:


Description Peter Occil 2013-03-13 16:20:06 UTC
Earlier I posted an issue with serializing the query in non-relative URLs. But after
I read more about URIs, I am not sure whether the scheme data and query string
should be kept separate.  There is a distinction between how the URL specification
categorizes URLs and how the URI standards (RFC3986 and RFC3987) classify URIs.

Both standards allow fragments to appear in all URLs/URIs, but they differ on whether
a query string is parsed.  In the URL standard, query strings can occur in all URLs, but
in the URI standards, a query string is not parsed if the URI contains a scheme but
the scheme data doesn't begin with a slash (that is, if the URI is an "opaque" URI).
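The two rules can be sketched in Python. This is a simplified illustration, not either specification's algorithm: the function names are hypothetical, percent-encoding and validation are ignored, and the opaque-URI rule modeled here is the one in RFC 2396's grammar (also followed by java.net.URI), where a query is only recognized when the scheme-specific part starts with "/".

```python
def split_fragment(url):
    """Both models split off a fragment at the first '#'."""
    head, sep, fragment = url.partition("#")
    return head, (fragment if sep else None)


def parse_url_standard_style(url):
    """Non-relative URL, URL Standard style: the query is split off
    the scheme data at the first '?'."""
    head, fragment = split_fragment(url)
    scheme, _, rest = head.partition(":")
    scheme_data, sep, query = rest.partition("?")
    return {"scheme": scheme, "scheme data": scheme_data,
            "query": query if sep else None, "fragment": fragment}


def parse_opaque_uri_style(url):
    """RFC 2396 style: a query is recognized only when the
    scheme-specific part starts with '/' (a hierarchical URI)."""
    head, fragment = split_fragment(url)
    scheme, _, ssp = head.partition(":")
    if ssp.startswith("/"):
        hier_part, sep, query = ssp.partition("?")
        return {"scheme": scheme, "hier part": hier_part,
                "query": query if sep else None, "fragment": fragment}
    # Opaque URI: the scheme-specific part is kept whole, '?' included.
    return {"scheme": scheme, "scheme-specific part": ssp,
            "fragment": fragment}


for example in ("mailto:me@example.com?subject=Hi",
                "jar:http://example.com/jar?x=1!/com/example/Foo.class"):
    print(parse_url_standard_style(example))
    print(parse_opaque_uri_style(example))
```

For both examples, the first function puts everything after the first "?" into the query, while the second keeps the scheme-specific part intact, because neither "me@example.com?..." nor "http://..." begins with "/".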

Take the following as an example:

mailto:me@example.com?subject=Hi

In the URL standard, the URL is parsed as:

scheme - mailto
scheme data - me@example.com
query - subject=Hi

but in the URI standards, the URI is parsed as:

scheme - mailto
scheme-specific part - me@example.com?subject=Hi

For the mailto scheme, separating the scheme data from the query may be a useful distinction.
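With the query already separated, a mailto handler can read header fields from it directly. A minimal sketch, assuming a hypothetical helper name and simplified RFC 6068-style rules: fields are separated by "&", name and value by "=", and both are percent-decoded with urllib.parse.unquote. It does not validate header names.

```python
from urllib.parse import unquote


def mailto_headers(query):
    """Split a mailto query such as 'subject=Hi&cc=x%40example.com'
    into (name, value) pairs. Simplified: no header-name validation."""
    headers = []
    if not query:
        return headers
    for field in query.split("&"):
        name, _, value = field.partition("=")
        # Percent-decode only; unlike form encoding, '+' is not a space.
        headers.append((unquote(name), unquote(value)))
    return headers


# The query from the URL Standard's parse of the example above.
print(mailto_headers("subject=Hi"))  # [('subject', 'Hi')]
```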

As another example, the string

jar:http://example.com/jar?x=1!/com/example/Foo.class

is parsed in the URI standards as:

scheme - jar
scheme-specific part - http://example.com/jar?x=1!/com/example/Foo.class

but in the URL standard as:

scheme - jar
scheme data - http://example.com/jar
query - x=1!/com/example/Foo.class

A more useful split for the jar scheme would have been "http://example.com/jar?x=1"
(the URL of the JAR file) and "com/example/Foo.class" (the entry inside it), divided
at the "!/" separator, but that rule is specific to the jar scheme.
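A jar-aware parser would split at "!/" rather than at "?". The following is a sketch under that assumption only; the function name is hypothetical, and it splits at the first "!/" without handling nested jar URLs, percent-encoding or fragments.

```python
def split_jar_url(url):
    """Split 'jar:<inner URL>!/<entry>' into (inner URL, entry name)."""
    scheme, _, ssp = url.partition(":")
    if scheme.lower() != "jar":
        raise ValueError("not a jar URL: " + url)
    # Split at the first '!/'; a '?' in the inner URL is left alone.
    inner, sep, entry = ssp.partition("!/")
    if not sep:
        raise ValueError("no '!/' separator in: " + url)
    return inner, entry


inner, entry = split_jar_url(
    "jar:http://example.com/jar?x=1!/com/example/Foo.class")
print(inner)  # http://example.com/jar?x=1
print(entry)  # com/example/Foo.class
```

This only works on the unsplit scheme-specific part: after the URL Standard's split, the "!/" sits inside the query, so the scheme data and query would have to be rejoined first.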

This shows that while parsing the query string is useful for some schemes, it is not useful for others, because not all schemes recognize a query string in opaque URIs and each scheme has its own parsing rules.  Neither mailto nor jar is a relative scheme in the URL standard.

But what about a scheme that _is_ a relative scheme?

The URL "http:example.com" would be parsed as follows:

in the URL standard:

scheme - http
path - example.com

or in the URI standards:

scheme - http
scheme-specific part - example.com

(Since no "//" follows the scheme, "example.com" is not treated as a host. In fact,
RFC2616 section 3.2.2 requires "//" in http URLs, so this URL would be disallowed there,
and the RFCs for the other relative schemes don't seem to allow such a syntax either.)

But when someone enters that URL in Firefox or Google Chrome, it gets treated like
"http://example.com" and is probably parsed that way too.

So the following questions should be discussed:

- Should the URL standard not parse the query string in the "scheme data" state?  
This will allow jar to work well, but may be inconvenient for mailto and other schemes, 
since it requires an additional step by the application.
- Should the URL standard parse the query string only for certain schemes that allow it, 
such as mailto?  This will require adding another category of schemes in addition to 
"relative schemes".
- As stated above, the scheme data in "relative" schemes must start with "//", so they
are, mostly correctly, handled differently.  But some "non-relative" schemes, such as
nntp, follow the same rules.  Should those schemes be added to the list of relative
schemes?  Or should the URL standard parse every URL whose scheme is followed by "//"
as a "hierarchical" URI? (The list of currently registered schemes is at
<http://www.iana.org/assignments/uri-schemes.html>.)
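The second and third options could be combined roughly as follows. This is only a sketch: QUERY_SCHEMES and split_query are hypothetical names, fragment handling is omitted, and treating any scheme-specific part that starts with "//" as hierarchical is one reading of the third option, not something the URL Standard defines.

```python
# Hypothetical allowlist of schemes whose opaque form still gets a
# query component (the second option).
QUERY_SCHEMES = {"mailto"}


def split_query(url):
    """Return (scheme, rest, query); query is None if not split off."""
    scheme, _, rest = url.partition(":")
    scheme = scheme.lower()
    # Third option: anything written as "scheme://..." is hierarchical.
    if rest.startswith("//") or scheme in QUERY_SCHEMES:
        rest, sep, query = rest.partition("?")
        return scheme, rest, (query if sep else None)
    # Everything else (e.g. jar) keeps its scheme-specific part intact.
    return scheme, rest, None


for example in ("mailto:me@example.com?subject=Hi",
                "jar:http://example.com/jar?x=1!/com/example/Foo.class",
                "nntp://news.example.com/comp.infosystems.www"):
    print(split_query(example))
```

Under this policy mailto gets its query split off, jar keeps "http://example.com/jar?x=1!/com/example/Foo.class" whole, and nntp is handled like a hierarchical URL without being added to the relative schemes.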

I intend for this to be a discussion rather than a bug report.
Comment 1 Anne 2013-03-13 16:28:45 UTC
It would help if you filed separate tickets for distinct issues. Or in case of discussion, email whatwg@whatwg.org.

I separated the query from the scheme data so that about:blank?test would still be about:blank, as that matched the behavior of some UAs. It also helps with the URLQuery API.
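Keeping the query separate means a check for about:blank only has to look at the scheme data. A minimal sketch of that idea; the helper name is hypothetical and it does not otherwise validate the input the way the URL Standard's parser would.

```python
def is_about_blank(url):
    """True if the scheme is 'about' and the scheme data is 'blank',
    ignoring any query or fragment."""
    scheme, _, rest = url.partition(":")
    # Drop the fragment, then the query, leaving only the scheme data.
    rest = rest.partition("#")[0]
    scheme_data = rest.partition("?")[0]
    return scheme.lower() == "about" and scheme_data == "blank"


print(is_about_blank("about:blank?test"))  # True
print(is_about_blank("about:blanket"))     # False
```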

How http:example is parsed depends on the base URL. If the base is http://test/, it comes out as http://test/example, but if the base is https://test/ (a different scheme), it comes out as http://example/ ...
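The base-URL dependence can be illustrated with a deliberately narrow sketch. It only handles input of the form "scheme:rest" for a relative scheme such as http, where rest does not start with "/"; it assumes the base has a path (for example "http://test/"), and the function name is hypothetical. Dot segments, queries, ports and the rest of the real parser are ignored.

```python
def resolve_scheme_no_slashes(url, base):
    """Resolve input like 'http:example' against a base URL.

    Same scheme as the base: 'rest' is treated as a relative path.
    Different scheme: 'rest' is treated as if it followed '//'."""
    scheme, _, rest = url.partition(":")
    base_scheme = base.partition(":")[0]
    if scheme.lower() == base_scheme.lower():
        # Relative path: replace everything after the base's last '/'.
        return base[: base.rfind("/") + 1] + rest
    # Different scheme: the rest becomes the host (plus any path).
    host, _, path = rest.partition("/")
    return scheme.lower() + "://" + host + "/" + path


print(resolve_scheme_no_slashes("http:example", "http://test/"))   # http://test/example
print(resolve_scheme_no_slashes("http:example", "https://test/"))  # http://example/
```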

We might add more relative schemes, depending on whether or not that makes sense.
Comment 2 Peter Occil 2013-03-13 21:11:53 UTC
I've posted this message to the mailing list.  I encourage you to add your thoughts there.
Comment 3 Anne 2013-03-13 21:14:03 UTC
Thanks Peter! (Also thanks for all the review on various specs, been great.)
Comment 4 Anne 2013-07-03 10:07:00 UTC
I think this is defined in the URL Standard as we want it. Even though the query is indeed distinct, you still have the same options with it as before.