Re: ISSUE-60: Name of draft should be changed to refer to URI's not URL's

John: on today's call, I promised a bit of a followup.

John Kemp wrote:

> What would a '#' mean in a URN? RFC2141[1] suggests that '#' is a 
> reserved character, and would thus 
> require escaping.


I'm not quite sure why URNs are coming up as a big consideration with 
respect to this change.  RFC 3986 [1] is the syntax for all Web identifiers, 
including for 
example those using the http scheme.  So, the main reason that some of us 
pushed to change URL to URI in the title and content of the draft is that 
it's the preferred initialism for the identifiers we're discussing, 
including those that use http.

Regarding the urn scheme, unless I'm missing something, fragment 
identifiers are allowed with any URI scheme.  From section 3.5 of [1]:

"The semantics of a fragment identifier are defined by the set of 
representations that might result from a retrieval action on the primary 
resource.  The fragment's format and resolution is therefore dependent on 
the media type [RFC2046] of a potentially retrieved representation, even 
though such a retrieval is only performed if the URI is dereferenced.  If 
no such representation exists, then the semantics of the fragment are 
considered unknown and are effectively unconstrained.  Fragment identifier 
semantics are independent of the URI scheme and thus cannot be redefined 
by scheme specifications."

[...]

"As with any URI, use of a fragment identifier component does not imply 
that a retrieval action will take place.  A URI with a fragment identifier 
may be used to refer to the secondary resource without any implication 
that the primary resource is accessible or will ever be accessed."

So, that refers to the types of representations, and goes pretty far in 
signaling that schemes don't matter.  The urn scheme doesn't give you a 
fixed way of retrieving a representation of a resource, but a) if you had 
a way of getting such a media-typed representation I think that fragids 
could be used per the spec for that media type, and b) RFC 3986 makes 
clear that fragids can be used even when retrieval is not possible at all, 
though the semantics are "unconstrained".

As to escaping, I'm not quick enough to compose all the pertinent BNF of 
RFC 2141 with that of RFC 3986, but I'm fairly sure that RFC 2141 is 
defining the strings that match the 'scheme ":" hier-part [ "?" query ]' 
portion of the generic syntax of URIs:

        URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

So, if I've got that right, the requirement to escape # would apply only 
if that character occurred within the URN itself, not when it introduces 
the fragment identifier. 
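
To make the split concrete, here is a rough Python sketch that applies the 
component-parsing regular expression given in Appendix B of RFC 3986 to a 
URN.  The function name split_uri and the urn:example:... identifiers are 
made up purely for illustration, and the sketch only shows how the generic 
syntax divides the string; it does not validate anything against RFC 2141.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI reference into its five generic components.

    split_uri is a hypothetical helper name. Components that are absent
    from the input come back as None.
    """
    m = URI_RE.match(uri)  # every sub-pattern is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Hypothetical URN followed by a fragment identifier: the generic syntax
# treats everything after the first "#" as the fragment.
with_fragment = split_uri("urn:example:a123#sec2")
print(with_fragment["path"], with_fragment["fragment"])

# Same hypothetical URN with "#" percent-encoded as %23 inside the URN
# itself: no fragment component is found.
escaped = split_uri("urn:example:a%23123")
print(escaped["path"], escaped["fragment"])
```

In the first case the fragment component is "sec2" and the rest is the URN; 
in the second the %23 stays inside the path and there is no fragment.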
 

Noah

[1] http://www.ietf.org/rfc/rfc3986.txt

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Thursday, 2 April 2009 23:08:26 UTC