Bugzilla – Bug 11379
[pending URL spec] definition of hierarchical URL inconsistent with rfc 3986
Last modified: 2012-12-15 10:47:46 UTC
Section 2.6.1 defines a hierarchical URL thus:
"An absolute URL is a hierarchical URL if, when resolved and then parsed, there is a character immediately after the <scheme> component and it is a U+002F SOLIDUS character (/)."
However, RFC3986 Section 3 defines all URIs as containing a hierarchical part as follows:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
and, further, does not require the hierarchical part to start with "/". In particular, it defines hier-part as:
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
Which, when expanding these components into their definitions, corresponds to:
= "//" authority
| "//" authority 1*( "/" segment )
| "/" [ segment-nz *( "/" segment ) ]
| segment-nz *( "/" segment )
| 0<pchar>
Note that the last two alternatives do not start with "/", yet are still considered a "hierarchical" part by RFC3986. For example, the following URIs match this syntax, with hier-part mapping to path-rootless:
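One way to check this claim is a minimal Python sketch that names the hier-part alternative a given URI uses. It relies on the generic splitting regular expression from RFC3986 Appendix B. The function names, and the reading of the 2.6.1 test as "the first character after the scheme's colon is a slash", are assumptions made for illustration. The sample URIs in the usage note come from RFC3986 Section 1.1.2.

```python
import re

# Generic component split, adapted from RFC 3986 Appendix B
# (non-capturing groups used where the delimiter itself is not needed).
URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$",
    re.DOTALL,
)


def hier_part_kind(uri: str) -> str:
    """Name the RFC 3986 hier-part alternative used by an absolute URI."""
    m = URI_RE.match(uri)
    scheme, slashes_authority, path = m.group(1), m.group(2), m.group(4)
    if scheme is None:
        raise ValueError("not an absolute URI (no scheme): %r" % uri)
    if slashes_authority is not None:
        return '"//" authority path-abempty'
    if path == "":
        return "path-empty"
    if path.startswith("/"):
        return "path-absolute"
    return "path-rootless"


def slash_after_scheme(uri: str) -> bool:
    """Assumed reading of the quoted 2.6.1 test: is the first character
    after 'scheme:' a U+002F SOLIDUS?"""
    rest = uri.split(":", 1)[1]
    return rest.startswith("/")
```

With these definitions, `hier_part_kind("mailto:John.Doe@example.com")` and `hier_part_kind("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")` both report `path-rootless`, while `slash_after_scheme` is false for each. Under the assumed reading, such URIs have a hier-part in the RFC3986 sense but would not count as "hierarchical URLs" under 2.6.1.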
To avoid confusion, it may be desirable for HTML5 to use a term other than "hierarchical URL" here. Alternatively, a note could be added that distinguishes the defined usage from the like-named (but different) construct in RFC3986.
I would also note that, in terms of the definitions found in 2.6.1, all "authority-based URLs" are also "hierarchical URLs". I can't tell whether this is intentional; if it is, a note saying so would be useful.
I'll look into this in more detail once Adam's spec on how to parse URLs is ready. From a quick glance, though, it seems not too unreasonable to come up with different terminology if there's a better term than "hierarchical" here. Any suggestions?
I've been using the term "standard URL" but that might not be the optimal term either.
mass-move component to LC1
http://url.spec.whatwg.org/ defines URLs now. Per that document, a URL is always "absolute" (perhaps invalid, but always absolute). The input to the parsing algorithm may be relative to something else, but you always end up with a URL that has all the relevant information (although it could be invalid if the input is relative and there is nothing to resolve it against).
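As a rough illustration of "relative input, absolute result", here is a minimal Python sketch. It uses the standard library's `urllib.parse`, which follows RFC 3986-style resolution rather than the parsing algorithm in the WHATWG document, so edge cases may differ. The `resolve` helper, and the choice to model "invalid" as `None`, are assumptions made for this sketch.

```python
from urllib.parse import urljoin, urlsplit


def resolve(input_str, base=None):
    """Return an absolute URL string, or None when the input is relative
    and there is no base to resolve it against."""
    candidate = urljoin(base, input_str) if base else input_str
    if not urlsplit(candidate).scheme:
        # Relative input with nothing to resolve against: no usable URL.
        return None
    return candidate


print(resolve("../c?x=1", base="http://example.org/a/b"))
print(resolve("mailto:x@example.org", base="http://example.org/a/b"))
print(resolve("../c?x=1"))
```

The first call resolves against the base, the second keeps its own scheme and ignores the base, and the third has nothing to resolve against, so the helper returns `None`.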