This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 3959 - fn-base-uri-23 incorrect URI?
Summary: fn-base-uri-23 incorrect URI?
Status: CLOSED INVALID
Alias: None
Product: XML Query Test Suite
Classification: Unclassified
Component: XML Query Test Suite
Version: 1.0.1
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Andrew Eisenberg
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-11-08 00:03 UTC by Ivan Shcheklein
Modified: 2006-12-26 08:37 UTC

See Also:


Description Ivan Shcheklein 2006-11-08 00:03:53 UTC
The query is as follows:

fn:string(fn:base-uri(<anElement xml:base="http:\\example.com\\examples">Element content</anElement>))

In RFC 2616 we read:

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

So perhaps the expected result should be FORG0001?

Thanks,
Ivan
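The check Ivan has in mind can be sketched in Python. This is a simplified reading of the RFC 2616 http_URL production, not a full implementation: the host, abs_path and query rules are reduced as noted in the comments, and the helper name is_http_url is hypothetical.

```python
import re

# Simplified from RFC 2616, section 3.2.2:
#   http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
# Assumptions: host is reduced to letters, digits, "-" and ".", and
# abs_path/query are reduced to runs without "?" or whitespace.
HTTP_URL = re.compile(
    r"http://"
    r"[A-Za-z0-9](?:[A-Za-z0-9\-.]*[A-Za-z0-9])?"  # host
    r"(?::[0-9]*)?"                                # [ ":" port ]
    r"(?:/[^?\s]*(?:\?\S*)?)?"                     # [ abs_path [ "?" query ] ]
)


def is_http_url(s: str) -> bool:
    """Hypothetical helper: does s match the simplified http_URL production?"""
    return HTTP_URL.fullmatch(s) is not None


value = r"http:\\example.com\\examples"  # the xml:base value, backslashes kept literally
print(is_http_url(value))                          # False: "http:" is not followed by "//"
print(is_http_url("http://example.com/examples"))  # True
```

Under this reading the attribute value fails at the very first step, because "http:" is followed by backslashes rather than "//".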
Comment 1 Michael Kay 2006-11-08 09:32:34 UTC
There's been a fair bit of discussion about this test case; I suggest you search for the test name in the archives. The XML Base specification does not require the value of the xml:base attribute to be a valid URI (whether against RFC 2616 or against any other RFC). Rather, section 3.1 says that if the attribute contains characters that aren't allowed in a URI, "processors must encode and escape these characters to obtain a valid URI reference". The Infoset specification then says that the base URI property of the node is the unescaped URI, and after discussion of this test case in the WGs we decided that this unescaped value is also the base URI property of the node in XDM, and hence the value returned by the base-uri() function.
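As a rough illustration of the XML Base section 3.1 escaping step, the sketch below percent-encodes every character outside a chosen "safe" set using Python's urllib.parse.quote, which encodes to UTF-8 and then writes each byte as %HH. The safe set and the helper name escape_xml_base are assumptions for illustration, not taken from the XML Base text or from any XQuery processor. Per the comment above, this escaped form is only needed to obtain a usable URI reference; the base URI property itself keeps the unescaped value.

```python
from urllib.parse import quote

# Assumed safe set: reserved characters, "%" (so existing escapes survive) and "#".
# quote() additionally always keeps ASCII letters, digits and "_.-~".
URI_SAFE = "!#$%&'()*+,/:;=?@[]"


def escape_xml_base(value: str) -> str:
    """Hypothetical helper: %-encode characters that may not appear in a URI reference."""
    return quote(value, safe=URI_SAFE)


raw = r"http:\\example.com\\examples"  # value of the xml:base attribute
print(escape_xml_base(raw))             # http:%5C%5Cexample.com%5C%5Cexamples
```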
Comment 2 Ivan Shcheklein 2006-11-08 13:20:34 UTC
Thank you for the answer, Michael. Yes, I saw the previous discussion about this test case, but it had a slightly different focus: whether or not to escape this URI.

I agree that the XML Base specification does not require the value of the xml:base attribute to be a valid URI, but the return type of fn:base-uri is xs:anyURI.

I may be wrong, but I think the lexical representation should be checked here in the same way as when an xs:anyURI is created with its constructor function.
Comment 3 Michael Kay 2006-11-08 14:16:41 UTC
The data type xs:anyURI is fairly liberal in what it permits. In particular, it allows any string that would be a valid URI if special characters were %-encoded (what I sometimes call a wannabe-URI). I think that the string in this example, although horrible, is a legal instance of xs:anyURI even though it is not a legal URI.
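Kay's "wannabe-URI" reading can be sketched as "escape, then check the generic URI syntax". The Python below uses a simplified RFC 3986 grammar (IP-literal hosts are omitted) as a stand-in for RFC 2396, which XML Schema 1.0 refers to; is_uri_reference and is_wannabe_uri are hypothetical helpers, and the quote() safe set is an assumed approximation of the XLink escaping rule.

```python
import re
from urllib.parse import quote

# Simplified RFC 3986 generic syntax (sections 3 and 4.1). Assumption: IP-literal
# hosts are omitted; IPv4 addresses still match because they are valid reg-names.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT})"
SEGMENT_NZ_NC = rf"(?:[{UNRESERVED}{SUB_DELIMS}@]|{PCT})+"  # first relative segment: no ":"
USERINFO = rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|{PCT})*"
REG_NAME = rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT})*"
AUTHORITY = rf"(?:{USERINFO}@)?{REG_NAME}(?::[0-9]*)?"
PATH_ABEMPTY = rf"(?:/{PCHAR}*)*"
PATH_ABSOLUTE = rf"/(?:{PCHAR}+(?:/{PCHAR}*)*)?"
PATH_ROOTLESS = rf"{PCHAR}+(?:/{PCHAR}*)*"
PATH_NOSCHEME = rf"{SEGMENT_NZ_NC}(?:/{PCHAR}*)*"
TAIL = rf"(?:\?(?:{PCHAR}|[/?])*)?(?:#(?:{PCHAR}|[/?])*)?"  # [ "?" query ] [ "#" fragment ]

URI = re.compile(
    rf"[A-Za-z][A-Za-z0-9+\-.]*:"
    rf"(?://{AUTHORITY}{PATH_ABEMPTY}|{PATH_ABSOLUTE}|{PATH_ROOTLESS}|){TAIL}"
)
RELATIVE_REF = re.compile(
    rf"(?://{AUTHORITY}{PATH_ABEMPTY}|{PATH_ABSOLUTE}|{PATH_NOSCHEME}|){TAIL}"
)

# Characters kept as-is when escaping: reserved characters, "%" and "#".
SAFE = "!#$%&'()*+,/:;=?@[]"


def is_uri_reference(s: str) -> bool:
    """True if s matches URI-reference = URI / relative-ref (simplified)."""
    return URI.fullmatch(s) is not None or RELATIVE_REF.fullmatch(s) is not None


def is_wannabe_uri(s: str) -> bool:
    """Hypothetical check: is s a URI reference once disallowed characters are %-escaped?"""
    return is_uri_reference(quote(s, safe=SAFE))


raw = r"http:\\example.com\\examples"
print(is_uri_reference(raw))  # False
print(is_wannabe_uri(raw))    # True
print(is_wannabe_uri(":"))    # False
```

Under these assumptions the raw attribute value is not a URI reference, but its escaped form is: it parses as a URI with scheme "http" and a rootless path (the same shape as mailto:user@example.com). The generic syntax says nothing about what "http" requires, which is why the same escaped string still fails the scheme-specific http_URL production of RFC 2616.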
Comment 4 Ivan Shcheklein 2006-11-08 15:40:39 UTC
The extent to which an implementation validates the lexical form of xs:anyURI is implementation-dependent. I think this means that if our implementation is strict, it must raise FORG0001 in this test case.

I am not insisting that FORG0001 be the only expected result, but it seems to me that FORG0001 should be allowed as an alternative expected result.

If the goal is a non-strict fn:base-uri, why not return xs:string rather than xs:anyURI?

P.S. Moreover, even the percent-encoded form of the URI in this test case is not a correct URI according to RFC 2616.
Comment 5 Carmelo Montanez 2006-12-14 21:24:44 UTC
Ivan:

I do not follow your last point. The returned value is not supposed to be escaped, and the returned value looks OK. I will mark this as WONTFIX. Please reopen this bug if you still have an issue.

Thanks,
Carmelo
Comment 6 Ivan Shcheklein 2006-12-14 21:33:18 UTC
Yes, I still have an issue. I didn't say that the returned value must be escaped. I said that FORG0001 should be allowed here because 'http:\\example.com\\examples' is not in the lexical space of the xs:anyURI type.
Comment 7 Michael Kay 2006-12-14 22:05:10 UTC
But, horrible though this string is, it actually *is* in the lexical space of xs:anyURI; that's because xs:anyURI according to XML Schema allows strings that would be valid URIs if special characters were escaped.
Comment 8 Ivan Shcheklein 2006-12-14 22:50:17 UTC

OK, if almost any string (possibly after escaping) is within the lexical space of xs:anyURI, why should we raise FORG0002 in fn-resolve-uri-3, and especially in fn-resolve-uri-4, which contains a URI that is invalid only under scheme-specific rules (as far as I know, "http://" is valid according to the generic URI syntax)?

Comment 9 Michael Kay 2006-12-14 23:26:21 UTC
While it might be true that *almost* any string is permitted by xs:anyURI, it is not true that *every* string is permitted.

fn-resolve-uri-3 uses ":" as a relative URI; that's illegal according to the syntax in the RFC, and it remains illegal after escaping as described in XLink.

fn-resolve-uri-4 uses "http://" as an absolute URI; that's also illegal according to the RFC. The rules are complex, but the essence, I think, is that when the colon is followed by exactly two slashes, then an authority component must be present in the URI.
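The fn-resolve-uri-3 argument rests on one rule of RFC 3986 (section 4.2): a colon in the first path segment makes that segment look like a scheme, so such a string is only acceptable if the text before the colon is a valid scheme name. A minimal Python sketch of just that rule follows; first_segment_check is a hypothetical helper that does not validate the rest of the grammar, and the quote() safe set is an assumed stand-in for XLink-style escaping.

```python
import re
from urllib.parse import quote

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def first_segment_check(ref: str) -> str:
    """Apply only the RFC 3986 section 4.2 first-segment rule (hypothetical helper).

    A colon before the first "/", "?" or "#" is acceptable only if the text
    before it is a valid scheme name; the rest of the grammar is not checked.
    """
    head = re.split(r"[/?#]", ref, maxsplit=1)[0]
    if ":" not in head:
        return "no colon in first segment"
    scheme = head.split(":", 1)[0]
    if SCHEME.fullmatch(scheme):
        return "absolute: scheme " + scheme
    return "invalid"


# Escaping leaves ":" alone: it is a reserved character, not a disallowed one.
escaped = quote(":", safe="!#$%&'()*+,/:;=?@[]")
print(escaped, first_segment_check(escaped))  # : invalid
print(first_segment_check("./this:that"))     # no colon in first segment
```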
Comment 10 Ivan Shcheklein 2006-12-14 23:46:27 UTC
fn-resolve-uri-3:
As I understand the condition from fn:resolve-uri:
"If $relative or $base is not a valid xs:anyURI an error is raised [err:FORG0002]"
it means that we should check the validity of both arguments without regard to whether they are relative or absolute.

fn-resolve-uri-4:
"http://" is a valid absolute URI according to RFC 2396, because the authority component can be empty under the following rules:

...
authority     = server | reg_name
...
server        = [ [ userinfo "@" ] hostport ]
...
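Ivan's reading of the RFC 2396 grammar can be checked mechanically: because server is wrapped in an optional [ ... ], authority can match the empty string, and "http://" then parses as scheme ":" "//" authority. A simplified Python sketch; the hostname, abs_path and query rules are approximated, as stated in the comments.

```python
import re

# Simplified RFC 2396 (section 3.2) productions. Assumption: hostname is reduced
# to alphanumeric labels joined by "-" or "."; IPv4 addresses match that too.
ESCAPED = r"%[0-9A-Fa-f]{2}"
USERINFO = rf"(?:[A-Za-z0-9\-_.!~*'();:&=+$,]|{ESCAPED})*"
HOSTPORT = r"[A-Za-z0-9](?:[A-Za-z0-9\-.]*[A-Za-z0-9])?(?::[0-9]*)?"
SERVER = rf"(?:(?:{USERINFO}@)?{HOSTPORT})?"                 # server = [ [ userinfo "@" ] hostport ]
REG_NAME = rf"(?:[A-Za-z0-9\-_.!~*'()$,;:@&=+]|{ESCAPED})+"  # reg_name = 1*( ... )
AUTHORITY = rf"(?:{SERVER}|{REG_NAME})"                      # authority = server | reg_name

# scheme ":" "//" authority [ abs_path ] [ "?" query ]
# Assumption: abs_path and query are approximated by runs without "?" / "#".
NET_PATH_URI = re.compile(
    rf"[A-Za-z][A-Za-z0-9+\-.]*://{AUTHORITY}(?:/[^?#]*)?(?:\?[^#]*)?"
)

print(re.fullmatch(AUTHORITY, "") is not None)       # True: server may be empty
print(NET_PATH_URI.fullmatch("http://") is not None)  # True
```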
Comment 11 Michael Kay 2006-12-15 00:21:22 UTC
Fair enough. I was looking at RFC 3986, which has:

authority   = [ userinfo "@" ] host [ ":" port ]

host =        IP-literal / IPv4address / reg-name
Comment 12 Ivan Shcheklein 2006-12-15 00:48:06 UTC
I thought that I should use RFC 2396, as referenced by XML Schema Datatypes for the xs:anyURI type.

Nevertheless, 'host' can be empty in RFC 3986 too.

Moreover, what should I do with similar fn-doc queries? Why do we have 'invalid URI' in fn-doc-17, for example?
Comment 13 Ivan Shcheklein 2006-12-15 00:50:53 UTC
A small addition to my previous comment: 'host' can be empty because reg-name can be empty:

reg-name   = *( unreserved / pct-encoded / sub-delims )
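The same point under RFC 3986 can be sketched in Python: reg-name is defined with "*", so it matches zero characters, host may therefore be empty, and "http://" parses as scheme ":" "//" authority path-abempty. This is a simplified grammar (IP-literal omitted, query and fragment left out), not a complete RFC 3986 validator.

```python
import re

# RFC 3986, section 3.2, with host reduced to reg-name. Assumption: IP-literal
# is omitted; an IPv4address is also a syntactically valid reg-name.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"
USERINFO = rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|{PCT})*"
REG_NAME = rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT})*"    # "*" permits zero characters
AUTHORITY = rf"(?:{USERINFO}@)?{REG_NAME}(?::[0-9]*)?"  # [ userinfo "@" ] host [ ":" port ]
PATH_ABEMPTY = rf"(?:/(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT})*)*"

# hier-part form "//" authority path-abempty; query and fragment omitted here.
URI_WITH_AUTHORITY = re.compile(rf"[A-Za-z][A-Za-z0-9+\-.]*://{AUTHORITY}{PATH_ABEMPTY}")

print(re.fullmatch(REG_NAME, "") is not None)               # True
print(URI_WITH_AUTHORITY.fullmatch("http://") is not None)  # True
```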
Comment 14 Ivan Shcheklein 2006-12-25 23:46:55 UTC
I think I should close this bug as INVALID: fn-base-uri-23 is correct, and the other test cases should be discussed in separate threads.

Thanks, Michael, for the discussion.