This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 27257 - anyURI_b006 seems to be valid
Summary: anyURI_b006 seems to be valid
Status: NEW
Alias: None
Product: XML Schema Test Suite
Classification: Unclassified
Component: Microsoft tests
Version: 2006-11-06
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema Test Suite mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-06 12:19 UTC by Georgiy Rakov
Modified: 2014-11-07 12:16 UTC

Description Georgiy Rakov 2014-11-06 12:19:09 UTC
Bug 4048 [1] resulted in the expected result for the anyURI_b006 test being marked "invalid", because "//" (double slash) was considered an invalid URI. However, according to the reading of rfc2396 [2] presented below, a double slash should be considered a valid URI.

Section "5. Relative URI References" from rfc2396.txt [2] states that:

   A relative reference beginning with two slash characters is termed a
   network-path reference, as defined by <net_path> in Section 3.  

Section "3. URI Syntactic Components" from rfc2396 [2] states:

      net_path      = "//" authority [ abs_path ]

Section "3.2. Authority Component" from rfc2396 [2] states:

      authority     = server | reg_name

So if the 'server' component can be empty, then '//' should be considered a valid URI. The following reasoning shows that the 'server' component can be empty.

Section "3.2.2. Server-based Naming Authority" from rfc2396 [2] states:

      server        = [ [ userinfo "@" ] hostport ]

That is, according to the BNF rules above, the 'server' component is allowed to be empty (the outer square brackets make its entire content optional), so '//' can be considered a relative network-path reference with an empty authority.
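
The derivation above can be sketched as a regular expression built from the quoted productions. This is only an illustrative sketch, not a full RFC 2396 validator: the class and method names (Rfc2396NetPath, isNetPath) are hypothetical, the input is assumed to be ASCII, and the hostport and abs_path productions, which are not quoted in this report, are replaced by coarse approximations.

```java
import java.util.regex.Pattern;

public class Rfc2396NetPath {
    // unreserved = alphanum | mark, escaped = "%" hex hex (RFC 2396 Appendix A)
    private static final String UNRESERVED = "[A-Za-z0-9\\-_.!~*'()]";
    private static final String ESCAPED = "%[0-9A-Fa-f]{2}";
    private static final String USERINFO =
            "(?:" + UNRESERVED + "|" + ESCAPED + "|[;:&=+$,])*";
    // ASSUMPTION: coarse stand-in for hostport = host [ ":" port ]
    private static final String HOSTPORT = "[A-Za-z0-9.\\-]+(?::[0-9]*)?";
    // server = [ [ userinfo "@" ] hostport ]  (outer brackets: may be empty)
    private static final String SERVER =
            "(?:(?:" + USERINFO + "@)?" + HOSTPORT + ")?";
    // reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
    private static final String REG_NAME =
            "(?:" + UNRESERVED + "|" + ESCAPED + "|[$,;:@&=+])+";
    // authority = server | reg_name
    private static final String AUTHORITY = "(?:" + SERVER + "|" + REG_NAME + ")";
    // ASSUMPTION: coarse stand-in for abs_path = "/" path_segments
    private static final String ABS_PATH =
            "/(?:" + UNRESERVED + "|" + ESCAPED + "|[:@&=+$,;/])*";
    // net_path = "//" authority [ abs_path ]
    private static final Pattern NET_PATH =
            Pattern.compile("//" + AUTHORITY + "(?:" + ABS_PATH + ")?");

    /** Returns true if the whole string matches the simplified net_path production. */
    public static boolean isNetPath(String s) {
        return NET_PATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNetPath("//"));
        System.out.println(isNetPath("//example.org/a"));
        System.out.println(isNetPath("/a"));
    }
}
```

Under these assumptions isNetPath("//") returns true, because the optional 'server' production matches the empty string; "/a" is rejected because it does not begin with two slashes.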

I understand that section 3.2.2 of rfc2396 [2] begins by stating:

   URL schemes that involve the direct use of an IP-based protocol to a
   specified server on the Internet use a common syntax for the server
   component of the URI's scheme-specific data:

      <userinfo>@<host>:<port>

   where <userinfo> may consist of a user name and, optionally, scheme-
   specific information about how to gain authorization to access the
   server. The parts "<userinfo>@" and ":<port>" may be omitted.

Thus it may look as though, from:
1. the definition '<userinfo>@<host>:<port>'
2. and the excerpt above: 'The parts "<userinfo>@" and ":<port>" may be omitted'
it follows that the '<host>' part is obligatory.
However, section "1.6. Syntax Notation and Common Elements" states:

   This document uses two conventions to describe and define the syntax
   for URI.  The first, called the layout form, is a general description
   of the order of components and component separators, as in

      <first>/<second>;<third>?<fourth>

   The component names are enclosed in angle-brackets and any characters
   outside angle-brackets are literal separators.  Whitespace should be
   ignored.  These descriptions are used informally and do not define
   the syntax requirements.

Namely, it says: "These descriptions are used informally and do not define the syntax requirements." Hence I believe no conclusions about syntax should be drawn from the layout-form definition '<userinfo>@<host>:<port>' of the 'server' component.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=4048
[2] http://www.ietf.org/rfc/rfc2396.txt
Comment 1 Henry S. Thompson 2014-11-06 13:17:09 UTC
2396 was obsoleted by 3986 [3], whose BNF does _not_ allow the
authority to be empty:

  relative-part = "//" authority path-abempty
  authority     = [ userinfo "@" ] host [ ":" port ]

ht

[3] http://tools.ietf.org/html/rfc3986
Comment 2 Georgiy Rakov 2014-11-06 14:03:54 UTC
Yes, but XML Schema Part 2: Datatypes Second Edition [4] references rfc2396 rather than rfc3986.

[4] http://www.w3.org/TR/xmlschema-2/

Georgiy.
Comment 3 Henry S. Thompson 2014-11-06 14:26:36 UTC
Indeed it does.  And 2396 says it has been replaced by 3986.  See recent discussion about 'tight binding' vs. 'loose binding':

 http://lists.w3.org/Archives/Public/www-xml-schema-comments/2014OctDec/0004.html
Comment 4 Michael Kay 2014-11-06 14:37:37 UTC
I have some sympathy with Georgiy on this one. XSD 1.0 references RFC 2396. The problem is that RFC 2396 is a mess.

When I raised this as a bug in bug #4048, I was probably influenced by the fact that the java.net.URI class rejects "//", with the error:

java.net.URISyntaxException: Expected authority at index 2: //

I suspect that the designers of class java.net.URI noted that very often when the RFC mentions the term "authority", it means a non-empty authority. Examples of this usage are: "A base URI without an authority component", "some URI schemes do not allow an <authority> component", "If the authority component is defined".

The Javadoc comments for java.net.URI say:

"This constructor parses the given string exactly as specified by the grammar in RFC 2396, Appendix A, except for the following deviations:

(1) An empty authority component is permitted as long as it is followed by a non-empty path, a query component, or a fragment component. This allows the parsing of URIs such as "file:///foo/bar", which seems to be the intent of RFC 2396 although the grammar does not permit it. If the authority component is empty then the user-information, host, and port components are undefined.

(2) ..."

So I think the justification for rejecting "//" is the belief that RFC 2396 doesn't mean what it says.
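
The java.net.URI behaviour described above can be checked with a small program. This is a minimal sketch: the class and method names (UriDoubleSlash, parses) are hypothetical, and the exact exception message may vary between JDK versions.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriDoubleSlash {
    /** Returns true if java.net.URI accepts the string, false if it throws. */
    public static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            // Print the parser's own explanation for the rejection.
            System.out.println(s + " rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // Empty authority with nothing after it: the case at issue in this bug.
        System.out.println(parses("//"));
        // Empty authority followed by a non-empty path: deviation (1) in the Javadoc.
        System.out.println(parses("file:///foo/bar"));
        // With an empty authority the host component is undefined (null).
        System.out.println(new URI("file:///foo/bar").getHost());
    }
}
```

The two inputs differ only in what follows the empty authority, which is the distinction drawn by deviation (1) in the Javadoc quoted above.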
Comment 5 Georgiy Rakov 2014-11-07 12:16:36 UTC
If I understand correctly, the intention is to treat the reference to rfc2396 within [4] in a 'loose binding' manner (is this correct?). But the W3C spec [4] doesn't state that its reference to rfc2396 is made in a 'loose binding' way. BTW: rfc2396 doesn't have any references to rfc3986, but even if such a reference existed, I believe it wouldn't be obvious that it should have a 'superseding' effect when applied to [4].

So, as I see it, there is no normative spec stating that rfc2396 should be superseded by rfc3986 when applied to the W3C spec [4]. I believe 'tight binding' is the 'default understanding' (it's closer to a literal interpretation of the text).

Nor is there any statement that rfc2396 should be read with some corrections taken into account (as Michael said, rfc2396 is a mess).