This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Earlier I posted an issue with serializing the query in non-relative URLs. But after I read more about URIs, I am not sure whether the scheme data and query string should be kept separate.

There is a distinction between how the URL specification categorizes URLs and how the URI standards (RFC3986 and RFC3987) classify URIs. Both standards allow fragments to appear in all URLs/URIs, but they differ on whether a query string is parsed. In the URL standard, query strings can occur in all URLs, but in the URI standards, a query string is not parsed if the URI contains a scheme but the scheme data doesn't begin with a slash (that is, if the URI is an "opaque" URI).

Take the following as an example:

  mailto:me@example.com?subject=Hi

In the URL standard, the URL is parsed as:

  scheme - mailto
  scheme data - me@example.com
  query - subject=Hi

but in the URI standards, the URI is parsed as:

  scheme - mailto
  scheme-specific part - me@example.com?subject=Hi

Here, in the mailto scheme, separating the scheme data and the query may be a useful distinction.

As another example, the string

  jar:http://example.com/jar?x=1!/com/example/Foo.class

is parsed in the URI standards as:

  scheme - jar
  scheme-specific part - http://example.com/jar?x=1!/com/example/Foo.class

but in the URL standard as:

  scheme - jar
  scheme data - http://example.com/jar
  query - x=1!/com/example/Foo.class

A better distinction for the jar scheme would have been "http://example.com/jar?x=1" and "com/example/Foo.class", but this is specific to the jar scheme.

This shows that while it's useful for some schemes to parse the query string, it's not so useful for others. That's because not all schemes recognize a query string in opaque URIs, and each scheme has different parsing rules.

In both examples, mailto and jar are not relative schemes in the URL standard. But what about a scheme that _is_ a relative scheme?
The URL "http:example.com" would be parsed as follows:

in the URL standard:

  scheme - http
  path - example.com

or in the URI standards:

  scheme - http
  scheme-specific part - example.com

(Since the URL doesn't contain a slash, "example.com" is not treated as a host; in fact, this URL would be disallowed under RFC2616 section 3.2.2, and for the other relative schemes, the relevant RFCs don't seem to allow a syntax like that.)

But when someone enters that URL in Firefox or Google Chrome, it gets treated like "http://example.com" and is probably parsed that way too.

So the following questions should be discussed:

- Should the URL standard not parse the query string in the "scheme data" state? This will allow jar to work well, but may be inconvenient for mailto and other schemes, since it requires an additional step by the application.
- Should the URL standard parse the query string only for certain schemes that allow it, such as mailto? This will require adding another category of schemes in addition to "relative schemes".
- As stated above, the scheme data in "relative" schemes must start with "//", so they are, mostly correctly, handled differently. But there are other "non-relative" schemes, such as nntp, that follow the same rules. Should those schemes be added to the list of relative schemes? Or should the URL standard parse all URLs with a scheme and "//" at the start like "hierarchical" URIs? (The list of currently registered schemes is at this page: <http://www.iana.org/assignments/uri-schemes.html>.)

I intend for this to be a discussion rather than a bug report.
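To make the two decompositions compared in this comment concrete, here is a minimal Python sketch. It is not an implementation of either specification: the `Parts` type and both function names are hypothetical, and each function only encodes the rule as described above (always split at "?", versus split at "?" only when the data after the scheme begins with a slash).

```python
from typing import NamedTuple, Optional


class Parts(NamedTuple):
    scheme: str
    data: str                 # "scheme data" / "scheme-specific part"
    query: Optional[str]
    fragment: Optional[str]


def _split_scheme_and_fragment(uri: str):
    # Both models in the comment treat the fragment the same way,
    # so it is removed first in either case.
    scheme, _, rest = uri.partition(":")
    rest, hash_mark, fragment = rest.partition("#")
    return scheme, rest, (fragment if hash_mark else None)


def split_with_query(uri: str) -> Parts:
    """Rule described for the URL standard: '?' always starts a query."""
    scheme, rest, fragment = _split_scheme_and_fragment(uri)
    data, qmark, query = rest.partition("?")
    return Parts(scheme, data, query if qmark else None, fragment)


def split_opaque(uri: str) -> Parts:
    """Rule described for 'opaque' URIs: the query is split off only
    when the data after the scheme begins with a slash."""
    scheme, rest, fragment = _split_scheme_and_fragment(uri)
    if rest.startswith("/"):
        data, qmark, query = rest.partition("?")
        return Parts(scheme, data, query if qmark else None, fragment)
    return Parts(scheme, rest, None, fragment)


if __name__ == "__main__":
    for uri in ("mailto:me@example.com?subject=Hi",
                "jar:http://example.com/jar?x=1!/com/example/Foo.class"):
        print(split_with_query(uri))
        print(split_opaque(uri))
```

Running both functions on the mailto and jar strings reproduces the breakdowns listed in the comment, which makes it easy to see that neither rule gives the jar scheme the split it would actually want.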
It would help if you filed separate tickets for distinct issues. Or in case of discussion, email whatwg@whatwg.org.

I separated query from data so that about:blank?test would still be about:blank as that matched some number of UAs. It also helps with the URLQuery API.

How http:example is parsed depends on the base URL. If the base is http://test/ it comes out as http://test/example but if the base is https://test/ (different scheme) it comes out as http://example/ ...

We might add more relative schemes, depending on whether or not that makes sense.
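The base-URL dependence described above can be illustrated with a small Python sketch. This is an assumption-laden toy, not the URL Standard's parser: `resolve_scheme_only` is a hypothetical helper that handles only inputs shaped like "scheme:rest" against bases shaped like "scheme://host/path", and it mirrors just the two outcomes given in this comment.

```python
def resolve_scheme_only(base: str, ref: str) -> str:
    """Resolve a reference like 'http:example' against a base URL.

    Hypothetical helper: assumes `ref` has no slashes after the colon
    and `base` looks like 'scheme://host/path'.
    """
    ref_scheme, _, rest = ref.partition(":")
    base_scheme, _, base_rest = base.partition("://")

    if ref_scheme == base_scheme:
        # Same scheme as the base: the remainder acts as a relative path,
        # resolved against the base's host and directory.
        host, _, base_path = base_rest.partition("/")
        directory = base_path.rsplit("/", 1)[0] if "/" in base_path else ""
        prefix = f"{directory}/" if directory else ""
        return f"{base_scheme}://{host}/{prefix}{rest}"

    # Different scheme: the remainder is treated as the host instead.
    return f"{ref_scheme}://{rest}/"


if __name__ == "__main__":
    print(resolve_scheme_only("http://test/", "http:example"))
    print(resolve_scheme_only("https://test/", "http:example"))
```

With the two bases from the comment, the sketch yields http://test/example and http://example/ respectively.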
I've posted this message to the mailing list. I encourage you to add your thoughts there.
Thanks Peter! (Also thanks for all the review on various specs, been great.)
I think the URL Standard defines this the way we want it. Even though the query is indeed kept distinct, you still have the same number of options with it as before.