11379 – [pending URL spec] definition of hierarchical URL inconsistent with rfc 3986

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL
Version:	unspecified
Hardware:	All All

Importance:	P4 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+urlspec

Reported:	2010-11-22 18:40 UTC by Glenn Adams
Modified:	2012-12-15 10:47 UTC
CC List:	8 users

Description Glenn Adams 2010-11-22 18:40:56 UTC

Section 2.6.1 defines a hierarchical URL thus:

"An absolute URL is a hierarchical URL if, when resolved and then parsed, there is a character immediately after the <scheme> component and it is a U+002F SOLIDUS character (/)."

However, RFC3986 Section 3 defines all URIs as containing a hierarchical part as follows:

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

and, further, does not require the hierarchical part to start with "/". In particular, it defines hier-part as:

hier-part   = "//" authority path-abempty
            / path-absolute
            / path-rootless
            / path-empty

Expanding these components into their definitions gives:

hier-part
          = "//" authority
          | "//" authority 1*( "/" segment )
          | "/" [ segment-nz *( "/" segment ) ]
          | segment-nz *( "/" segment )
          | 0<pchar>

Note that the last two alternatives do not start with "/", yet are still considered a "hierarchical" part by RFC3986. For example, the following URIs match this syntax, with hier-part mapping to path-rootless:

about:blank
file:foo/bar
urn:example.net:foo:bar
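
As a rough illustration of the point above (not part of the original report), the following Python sketch splits a URI with the generic regular expression given in RFC 3986, Appendix B, and reports which hier-part alternative it falls under. The function names classify_hier_part and hier_part_starts_with_slash are hypothetical, and the second function is only one possible reading of the quoted 2.6.1 wording, taken here to mean "the first character after the scheme and its colon is '/'".

```python
import re

# Generic URI-splitting regular expression from RFC 3986, Appendix B.
# Group 2 = scheme, group 3 = "//" authority, group 5 = path.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def classify_hier_part(uri: str) -> str:
    """Name the RFC 3986 hier-part alternative that `uri` uses.

    Hypothetical helper for illustration; it does not validate the URI.
    """
    m = URI_RE.match(uri)  # every component is optional, so this always matches
    if m.group(3) is not None:
        return '"//" authority path-abempty'
    path = m.group(5)
    if path.startswith("/"):
        return "path-absolute"
    if path == "":
        return "path-empty"
    return "path-rootless"


def hier_part_starts_with_slash(uri: str) -> bool:
    """One assumed reading of the quoted 2.6.1 test: hier-part begins with "/"."""
    m = URI_RE.match(uri)
    return m.group(3) is not None or m.group(5).startswith("/")


if __name__ == "__main__":
    for uri in ("about:blank", "file:foo/bar", "urn:example.net:foo:bar",
                "file:/etc/hosts", "http://example.org/a"):
        print(uri, "|", classify_hier_part(uri), "|",
              hier_part_starts_with_slash(uri))
```

Under these assumptions, the three examples above are all classified as path-rootless and fail the leading-slash test, even though RFC 3986 still treats each of them as having a hier-part.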

To avoid confusion, it may be desirable to use a different term in HTML5 than "hierarchical URL" in this regard. Alternatively, a note could be added that distinguishes the defined usage from the like-named (but different) constructs in RFC3986.

I would also note that, in terms of the definitions found in 2.6.1, all "authority-based URLs" are also "hierarchical URLs". I can't tell whether this is intentional; if it is, then perhaps a note indicating this would be useful.

Regards,
Glenn

Comment 1 Ian 'Hixie' Hickson 2011-01-01 05:50:13 UTC

I'll look into this in more detail once Adam's spec on how to parse URLs is ready. From a quick glance, though, it seems not too unreasonable to come up with different terminology if there's a better term than "hierarchical" here. Any suggestions?

Comment 2 Adam Barth 2011-01-01 21:51:32 UTC

I've been using the term "standard URL" but that might not be the optimal term either.

Comment 3 Michael[tm] Smith 2011-08-04 05:13:26 UTC

mass-move component to LC1

Comment 4 Anne 2012-09-28 10:51:51 UTC

http://url.spec.whatwg.org/ defines URLs now. Per that document, a URL is always "absolute" (perhaps invalid, but always absolute). The input to the parsing algorithm may be relative to something else, but you always end up with a URL that has all the relevant information (although it could be invalid if there's relative input and nothing to resolve it against).
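
A minimal sketch of the "input may be relative, result is always absolute" idea, using Python's standard urllib.parse. Note the assumptions: urllib.parse follows the RFC 3986 resolution rules rather than the WHATWG URL parsing algorithm, so this only approximates the behaviour described; BASE and resolve are hypothetical names introduced for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URL that relative input is resolved against.
BASE = "http://example.org/a/b"


def resolve(url_input: str, base: str = BASE) -> str:
    """Resolve possibly-relative input against `base`.

    Illustrative only: uses RFC 3986-style resolution from the standard
    library, not the WHATWG URL parser.
    """
    return urljoin(base, url_input)


if __name__ == "__main__":
    for url_input in ("c", "../c", "https://other.example/x"):
        resolved = resolve(url_input)
        # Whatever the input looked like, the result carries a scheme.
        print(url_input, "->", resolved, "| scheme:", urlsplit(resolved).scheme)
```

With a base available, both relative inputs come out carrying a scheme and host, while the already-absolute input is returned unchanged.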