This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29381 - [QT3] resolve-uri-28 expects error FORG0002, but seems valid according to RFC-3986
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3 & XPath 3 Test Suite
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: O'Neil Delpratt
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-01-19 15:39 UTC by Abel Braaksma
Modified: 2016-06-21 22:46 UTC

Description Abel Braaksma 2016-01-19 15:39:34 UTC
Section 5.1 of the RFC says:

   A base URI
   must conform to the <absolute-URI> syntax rule (Section 4.3).  If the
   base URI is obtained from a URI reference, then that reference must
   be converted to absolute form and stripped of any fragment component
   prior to its use as a base URI.

Section 4.3 has the production:

   absolute-URI  = scheme ":" hier-part [ "?" query ]

Section 3 has the productions:

   URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part   = "//" authority path-abempty
               / path-absolute
               / path-rootless
               / path-empty

It furthermore has the following example:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose


Which imo clearly shows that a "non-hierarchic" URI still has a hier-part. The test evaluates the following expression:

    resolve-uri("b.html", "urn:isbn:01234567890X")

Since the path-merging algorithm described in 5.2.3 assumes a path separator of "/" and removes the last path segment, here the entire base path "isbn:01234567890X" (which contains no "/") must be removed, resulting in the valid new URI:

   "urn:b.html"

I believe this to be the correct interpretation of the spec. I know that this RFC is often considered fuzzy, and perhaps somewhere it allows this to be considered "not a valid base URI", but I can't find it, hence my argument that this test should NOT raise FORG0002.
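
The reading above can be checked mechanically. What follows is a minimal sketch in Python, not taken from any implementation discussed here: it assumes the component-splitting regular expression from Appendix B of RFC 3986 and the pseudocode of sections 5.2.2 to 5.2.4 and 5.3. The function names (split_uri, merge, remove_dot_segments, resolve) are hypothetical, and the sketch does no validation of either argument.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Return (scheme, authority, path, query, fragment); None means undefined."""
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """Sketch of RFC 3986 section 5.2.4: drop "." and ".." segments."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if out:
                out.pop()
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            cut = path.find("/", 1)
            if cut == -1:
                cut = len(path)
            out.append(path[:cut])
            path = path[cut:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """Sketch of RFC 3986 section 5.2.3 (Merge Paths)."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    # Keep the base path up to and including its last "/";
    # if it contains no "/", nothing of the base path is kept.
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Sketch of RFC 3986 sections 5.2.2 and 5.3; no validation is done."""
    bs, ba, bp, bq, _ = split_uri(base)
    rs, ra, rp, rq, rf = split_uri(ref)
    if rs is not None:
        ts, ta, tp, tq = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        ts, ta, tp, tq = bs, ra, remove_dot_segments(rp), rq
    elif rp == "":
        ts, ta, tp = bs, ba, bp
        tq = rq if rq is not None else bq
    else:
        if rp.startswith("/"):
            tp = remove_dot_segments(rp)
        else:
            tp = remove_dot_segments(merge(ba, bp, rp))
        ts, ta, tq = bs, ba, rq
    # Component recomposition (section 5.3).
    result = ""
    if ts is not None:
        result += ts + ":"
    if ta is not None:
        result += "//" + ta
    result += tp
    if tq is not None:
        result += "?" + tq
    if rf is not None:
        result += "#" + rf
    return result


print(resolve("urn:isbn:01234567890X", "b.html"))  # urn:b.html
print(resolve("http://a/b/c/d;p?q", "../g"))       # http://a/b/g
```

Note the argument order: this sketch takes the base first, whereas fn:resolve-uri takes the relative reference first. Whether a processor should run the algorithm at all for such a base, rather than raise FORG0002, is the question this bug is about; the sketch only shows what the algorithm yields if it is run.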
Comment 1 Michael Kay 2016-01-19 21:01:07 UTC
I fear we will never converge on this one.

The F+O spec states:

<quote>
A dynamic error is raised [err:FORG0002] if $base is not a valid IRI according to the rules of RFC3987, extended with an implementation-defined subset of the extensions permitted in LEIRI, or if it is not a suitable IRI to use as input to the chosen resolution algorithm (for example, if it is a relative IRI reference, if it is a non-hierarchic URI, or if it contains a fragment identifier).
</quote>

Now, one can argue about all the three examples of "unsuitable" URIs listed in the parentheses, because RFC3986 is vague about these situations. It's true you can mechanistically apply the algorithm of 3986 to a non-hierarchic URI. But section 1.2.3 makes it pretty clear that you are not expected to do so: it says that some schemes are hierarchic and others aren't, and that "relative references can only be used within the context of a hierarchical URI". OK, that's a "can only" not a "must only", but it's all we have to go on.

We've made a conscious decision not to try to make resolve-uri any more rigorous or prescriptive than the underlying RFC; the fact that the F+O reference to non-hierarchic URIs is in parentheses is a deliberate choice, because we recognize that it's legitimate to apply a different reading to the words of the RFC. But I believe we made a conscious decision about this, and the most I would accept in terms of the test case is to allow a "success" result as an alternative result.
Comment 2 Liam R E Quin 2016-01-19 23:25:31 UTC
Personal response...

The URI spec is specific that only / is used for hierarchical processing.
https://tools.ietf.org/html/rfc3986#section-1.2.3 [[
  All URI references are parsed by generic syntax parsers when used.
   However, because hierarchical processing has no effect on an absolute
   URI used in a reference unless it contains one or more dot-segments
   (complete path segments of "." or "..", as described in Section 3.3),
   URI scheme specifications can define opaque identifiers by
   disallowing use of slash characters, question mark characters, and
   the URIs "scheme:." and "scheme:..".
]]

(so no, you can't split a URN at a colon).

The URN spec is specific that it avoids the use of /.

https://tools.ietf.org/html/rfc2141 [[
 RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
   purposes. The URN-WG has not yet debated the applicability and
   precise semantics of those purposes as applied to URNs. Therefore,
   these characters are RESERVED for future developments.  Namespace
   developers SHOULD NOT use these characters in unencoded form, but
   rather use the appropriate %-encoding for each character.
]]


Therefore, there's no such thing as a partial URI reference for a URN.

Hope this helps.

Liam
Comment 3 O'Neil Delpratt 2016-04-26 14:54:53 UTC
I am marking this one as invalid. Please reopen if you don't agree.
Comment 4 Abel Braaksma 2016-04-28 00:47:25 UTC
I'm afraid we are using the wrong arguments here. Let me try to explain myself better.

(In reply to Liam R E Quin from comment #2)
> (so no, you can't split a URN at a colon).
I am not suggesting we can. The first colon is not a path separator, it is part of the scheme and separates [scheme] and [hier-part]. In a URN, [hier-part] is the whole of the part *after* the scheme-colon, without any subsequent splitting (I guess that is why in earlier specs it is called "opaque").

That's why I think that fn:resolve-uri("ietf:rfc:2648", "urn:isbn:0451450523") should therefore return "urn:ietf:rfc:2648".
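
One caveat on this particular example, offered as a reading of the RFC and not as part of the original comment: section 4.2 of RFC 3986 notes that a first path segment containing a colon would be mistaken for a scheme name. A generic parser would therefore probably see "ietf:rfc:2648" as an absolute URI with scheme "ietf" and return it unchanged, whereas a reference such as "./ietf:rfc:2648" or "b.html" has no scheme. A minimal sketch of that check, assuming the scheme production of section 3.1 (the helper name leading_scheme is hypothetical):

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
# (RFC 3986, section 3.1).
SCHEME_PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def leading_scheme(reference):
    """Return the scheme a generic parser would see, or None if there is none."""
    m = SCHEME_PREFIX.match(reference)
    return m.group(1) if m else None


print(leading_scheme("ietf:rfc:2648"))    # ietf
print(leading_scheme("./ietf:rfc:2648"))  # None
print(leading_scheme("b.html"))           # None
```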


> Therefore, there's no such thing as a partial URI reference for a URN.
The URN itself cannot be split (except that some schemes have a hierarchy, like DOI, but they are not standardized). But from the point of view of URIs, a URN consists of a scheme and a [hier-part]. The hier-part is opaque, but it can be split from the scheme, and I think that is what RFC 3986 tells us.

(In reply to Michael Kay from comment #1)
> because RFC3986 is vague about these situations. It's
> true you can mechanistically apply the algorithm of 3986 to a non-hierarchic
> URI. But section 1.2.3 makes it pretty clear that you are not expected to do
> so: it says that some schemes are hierarchic and others aren't, and that
> "relative references can only be used within the context of a hierarchical
> URI". OK, that's a "can only" not a "must only", but it's all we have to go
> on.
That same section 1.2.3 also starts out by saying that all URIs are, by definition, hierarchic:

   The URI syntax is organized hierarchically, with components listed in
   order of decreasing significance from left to right.  For some URI
   schemes, the visible hierarchy is limited to the scheme itself:
   everything after the scheme component delimiter (":") is considered
   opaque to URI processing.  Other URI schemes make the hierarchy
   explicit and visible to generic parsing algorithms.

So: "For some URI schemes, the visible hierarchy is limited to the scheme itself".

Which I think sums it up... 

Whether all this makes "real world sense" I don't know, but I think it would be a good thing to follow how other implementations (in this case: browsers*) have been going about resolving URIs.

> I fear we will never converge on this one.
<snip>
> and the most I would accept in terms of the test case is to allow a 
> "success" result as an alternative result.
If consensus cannot be reached, then I would vote for that. But I hope we can agree that absence of a "/" does not mean absence of a [hier-part] and that in fact the URI spec is meant to be used orthogonally with any type of URI. 

-----
* Note that browsers that follow the HTML5 standard use the term URL, but nevertheless point to RFC 3986, and further specify processing normatively in https://url.spec.whatwg.org/, where non-hierarchical schemes are allowed and resolve as explained above. However, browsers being what they are, different browsers interpret base URIs that are URNs differently.
Comment 5 Abel Braaksma 2016-05-10 09:13:26 UTC
In addition to what has already been said, please note that RFC-3986 has an explicit remark on this scenario in the section on the algorithm (5.x). In 5.2.1 it says:

   "Note that only the scheme component is required to be
   present in a base URI; the other components may be empty or
   undefined."

And:

   "A component is undefined if its associated delimiter does
   not appear in the URI reference; the path component is never
   undefined, though it may be empty."

Section 3.3 describes the path component of a non-hierarchical URI as everything after "urn:" and before any "#" or "?"; the ABNF calls this form "path-rootless".

So I think the described scenario is neither an error nor undefined in the spec.
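
To illustrate the quoted passages, here is a small sketch (an illustration under assumptions, not any implementation's code) that splits a URI with the regular expression given in Appendix B of RFC 3986. The names Components and components are hypothetical. A value of None stands for "undefined" in the sense of 5.2.1, as opposed to an empty string.

```python
import re
from typing import NamedTuple, Optional

# Component-splitting regular expression from RFC 3986, Appendix B.
APPENDIX_B = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Components(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str  # never undefined, though it may be empty
    query: Optional[str]
    fragment: Optional[str]


def components(uri: str) -> Components:
    """Split a URI reference into its five generic components."""
    m = APPENDIX_B.match(uri)
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


urn = components("urn:isbn:01234567890X")
print(urn.scheme)     # urn
print(urn.authority)  # None (no "//" delimiter, so undefined)
print(urn.path)       # isbn:01234567890X
print(urn.query)      # None

foo = components("foo://example.com:8042/over/there?name=ferret#nose")
print(foo.authority)  # example.com:8042
print(foo.path)       # /over/there
```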
Comment 6 O'Neil Delpratt 2016-06-21 16:56:15 UTC
At today's telcon the WG decided to remove this test case.

Removed from QT3 and committed to cvs
Comment 7 Abel Braaksma 2016-06-21 22:46:37 UTC
For the sake of reference, let me add the rationale for this decision.

The reason we dropped this test case is that the WG concluded that the RFC-3986 we reference in the spec is too vague in this respect to create a reliable test-case for fn:resolve-uri with a URN.

It was recognized that a strict following of the algorithm in sections 5.2.1 and 5.2.2 of RFC-3986 would allow using fn:resolve-uri with a URN without raising an error.

However, it was also recognized that it didn't make sense to use fn:resolve-uri with non-hierarchical URIs. While RFC-3986 is written in such a way as to remove the differences between hierarchical and non-hierarchical URIs, we decided that a base URI that is not hierarchical is allowed to raise FORG0002.

Since raising an error and returning a result based on the algorithm of 5.2.2 are both allowed, it was considered senseless to keep the test.