Bugzilla – Bug 20642
[FO30] non-hierarchic URIs
Last modified: 2013-03-12 16:14:23 UTC
fn:resolve-uri makes reference to non-hierarchic URIs.
I couldn't find a description of how to determine whether a URI is non-hierarchic.
I'm guessing that it is what Java describes as Opaque URIs, rather than requiring knowledge of the specifics of all possible URI schemes.
Could this be clarified in the specification?
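For reference, the Java notion mentioned above can be checked directly. This is only a sketch of what java.net.URI reports, not a definition of "non-hierarchic" in the F&O sense; the class name OpaqueCheck and the sample URIs are made up for illustration. Java treats a URI as opaque when it is absolute and its scheme-specific part does not begin with a slash.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class OpaqueCheck {
    // Returns what java.net.URI considers "opaque": an absolute URI whose
    // scheme-specific part does not start with '/'.
    static boolean isOpaque(String uri) {
        return URI.create(uri).isOpaque();
    }

    public static void main(String[] args) {
        String[] samples = {
            "mailto:email@example.com",
            "urn:isbn:foo",
            "http://example.com/a/b",
            "b.html"
        };
        for (String s : samples) {
            System.out.println(s + " -> opaque? " + isOpaque(s));
        }
    }
}
```

Note that a relative reference such as "b.html" is not opaque in Java's terms, because opacity is only defined for absolute URIs.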
The term "hierarchic" is used in the sense of RFC 3986, which uses the term extensively though it does not define it formally.
RFC 3986 in its list of changes states "All references to "opaque" URIs have been replaced with a better description of how the path component may be opaque to hierarchy." Java is using the old terminology.
I'm happy to add an informal explanation, e.g. hierarchic URI (in the sense of RFC 3986).
(In reply to comment #1)
> The term "hierarchic" is used in the sense of RFC 3986, which uses the term
> extensively though it does not define it formally.
I read through that, and it does refer to "base URI has a non-hierarchical path" but I couldn't point to anything obvious in the specification that identified that non-hierarchical paths meant URIs which match the path-rootless production.
It's more obvious from the RFC 2396 production:
absoluteURI = scheme ":" ( hier_part | opaque_part )
The text from RFC 3986:
"For some URI
schemes, the visible hierarchy is limited to the scheme itself:
everything after the scheme component delimiter (":") is considered
opaque to URI processing. Other URI schemes make the hierarchy
explicit and visible to generic parsing algorithms."
seems to indicate that the hierarchic nature is dependent on the scheme, e.g. mailto is always non-hierarchic, http is always hierarchic.
> I'm happy to add an informal explanation, e.g. hierarchic URI (in the sense
> of RFC 3986).
Yes please - or a fairly specific reference into the RFC. Thanks.
" A path consists of a sequence of path segments separated by a slash
("/") character. A path is always defined for a URI, though the
defined path may be empty (zero length). Use of the slash character
to indicate hierarchy is only required when a URI will be used as the
context for relative references. "
might lead one to believe that a URI is hierarchic if it contains a slash.
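A quick way to see that the "contains a slash" reading and Java's opaque/hierarchical split disagree is sketched below. The class and method names are hypothetical, and "urn:example:a/b" is an invented sample; neither test is claimed to be what RFC 3986 or F&O intends.

```java
import java.net.URI;

// Hypothetical comparison of two possible readings of "hierarchic".
class SlashHeuristic {
    // Naive reading: the URI is hierarchic if it contains a slash anywhere.
    static boolean containsSlash(String uri) {
        return uri.indexOf('/') >= 0;
    }

    // java.net.URI's reading: absolute and not opaque, i.e. the
    // scheme-specific part begins with '/'.
    static boolean javaHierarchical(String uri) {
        URI u = URI.create(uri);
        return u.isAbsolute() && !u.isOpaque();
    }

    public static void main(String[] args) {
        String[] samples = {"urn:example:a/b", "http://example.com/x", "mailto:email@example.com"};
        for (String s : samples) {
            System.out.println(s + " slash=" + containsSlash(s)
                    + " javaHierarchical=" + javaHierarchical(s));
        }
    }
}
```

The URN sample contains a slash yet is opaque to java.net.URI, so the two readings give different answers for it.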
Microsoft's .NET System.Uri class allows resolution against "opaque" URLs. Why does F&O forbid it, since an "opaque" URI is absolute?
var urn = new Uri("urn:isbn:foo");
var mailto = new Uri("mailto:email@example.com");
var foo = new Uri(urn, mailto);
Console.WriteLine(foo); // prints mailto:email@example.com
var bar = new Uri(mailto, urn);
Console.WriteLine(bar); // prints urn:isbn:foo
As regards your example, I think this rule:
If $relative is an absolute IRI (as defined above), then it is returned unchanged.
is intended to take precedence over this one:
A dynamic error is raised [err:FORG0002] if $base is not ... a suitable IRI to use ...
I agree that isn't necessarily clear in the spec.
Is this what the issue is here?
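The ordering of the two rules described above could be sketched as follows. This is an assumption-laden illustration, not the F&O algorithm: resolveUri is a hypothetical helper, Java's isOpaque() stands in for "non-hierarchic", and an IllegalArgumentException stands in for err:FORG0002.

```java
import java.net.URI;

// Hypothetical sketch of the rule ordering; not an implementation of fn:resolve-uri.
class ResolveOrder {
    static String resolveUri(String relative, String base) {
        URI rel = URI.create(relative);
        // Rule 1 takes precedence: an absolute $relative is returned unchanged,
        // without examining $base at all.
        if (rel.isAbsolute()) {
            return relative;
        }
        // Rule 2: only now is the base checked. Assumption: Java's notion of
        // "opaque" approximates "non-hierarchic". The explicit check is needed
        // because URI.resolve() would otherwise silently return the relative
        // URI when the base is opaque.
        URI b = URI.create(base);
        if (!b.isAbsolute() || b.isOpaque()) {
            throw new IllegalArgumentException("FORG0002: unsuitable base URI: " + base);
        }
        return b.resolve(rel).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveUri("mailto:email@example.com", "urn:isbn:foo"));
        System.out.println(resolveUri("b.html", "http://example.com/a/c.html"));
        try {
            resolveUri("b.html", "urn:isbn:foo");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this ordering, resolving one absolute URI against an opaque base succeeds, while resolving a genuinely relative reference against the same base is an error.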
As regards the definition of a "non-hierarchic URI", I don't want to attempt to publish any kind of interpretation or clarification of the RFC: if timbl thinks it's a good enough specification, then it's not for me to say he's wrong. If there's more than one interpretation of what the term means, then I'm happy to say that you can use whichever one you want. The RFC states in 1.2.3 "...relative references can only be used within the context of a hierarchical URI", and we are merely referring to that statement, not attempting to amplify it.
> Is this what the issue is here?
No. My example was wrong. I meant to use "b.html" as one of the URIs. .NET does indeed throw a (rather obscure) error when trying to resolve this against the URN. Just ignore all that!
The issue is how to determine what constitutes a non-hierarchic URI. I'm guessing what Java identifies as Opaque URIs is one approximation. But I suspect it really boils down to how any specific URI scheme has been defined - and that's covered by scheme specific RFCs, not RFC 3986. In that sense, RFC 3986 isn't at all at fault.
Were we to add the test:
what result would you expect?
Tim, yes, individual URI scheme registrations are responsible for saying whether a URI is hierarchical.
I agree it's messy, and if a library decides to be over-optimistic and allow resolution of a relative URI against, say, a mailto: URI, it's up to the user not to do that :-) - otherwise you'd be stuck when a new hierarchical scheme was defined that the library didn't recognise.
The WG discussed this in Agenda 2013-03-05 and decided no change was to be made to the specification.