empty authority implies rel-path, not net-path?

Hi again,

While doing some regression testing for URI reference syntax validation and
parsing, I ran across something of note in the RFC 2396bis grammar. I'm
working from http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html and am
fully prepared to be publicly humiliated when someone points out that either
this is not the current version or that I've overlooked something that makes
the whole thing a non-issue.

As far as I can tell, according to the grammar, the authority component can no
longer be an empty string. So if you have a reference that begins with '//',
optionally preceded by a scheme and ':', then everything from the '//' up to
the next '?' or '#' or end of the string must be a rel-path, not a net-path.

This situation causes problems when trying to interpret URI references such as
'//', 'file:///', and 'myscheme://a:b/', not to mention Roy's own test cases
for relative URI resolution where he uses base URIs like 'fred:///s//a/b/c'.
The components of these references that used to be net-paths under RFC 2396
are now rel-paths under RFC 2396bis, unless I'm mistaken, which I probably am.

If this new interpretation is correct, then it makes the regex in Appendix B
misbehave, since it will assume that any path that starts with '//' must be a
net-path. If this regex is used as the basis for the parser that implements
the resolution algorithm in section 5 of RFC 2396bis, then you're going to
have problems (as I currently have).

If the grammar is correct, then I would suggest changing the first asterisk in
the regex to a plus, but only after fully considering the dramatic effect this
has on the test cases in
    http://www.ics.uci.edu/~fielding/url/test4.html and
    http://www.ics.uci.edu/~fielding/url/test5.html,
if you use the regex to parse each URI into its components.

-Mike

Received on Sunday, 21 December 2003 16:18:41 UTC