WG's position on IDREF and URI

[Dan's take on URIs and IDREFs is below and is a worthwhile read; here I'm trying to summarize the WG's position in response.]

I asked the XML Schema editors about the URI datatype because I needed to understand the syntactic validation constraints placed on that type. If the type permits fragments, we will likely have to create a user-generated type by specifying a pattern facet over the string type. This is admittedly awkward, and I'm of a mixed mind on it, as are many of the WG members, but the reasons for it follow:

The WG is presently doing two things "oddly" in its treatment of references.

Our present course is to define a URI-clean type (a URI reference sans the fragment), such that:

URI-clean = [ absoluteURI | relativeURI ]
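
As a rough illustration of the pattern-facet idea, the check below accepts any string that contains no "#" and rejects everything else. It is only a sketch, not the WG's definition: the name is_uri_clean and the pattern [^#]* are assumptions made for this example, and it tests only for the absence of a fragment, not for full RFC 2396 syntax.

```python
import re

# Hypothetical pattern: any string without the "#" delimiter. In RFC 2396 an
# unescaped "#" only ever introduces a fragment, so a string without one
# carries no fragment part.
URI_CLEAN_PATTERN = re.compile(r"[^#]*")


def is_uri_clean(value: str) -> bool:
    """Return True if value has no fragment part (assumed URI-clean test)."""
    return URI_CLEAN_PATTERN.fullmatch(value) is not None


for candidate in ["../foo", "http://example.com/", "#bar", "../foo#bar"]:
    print(candidate, is_uri_clean(candidate))
```

A schema pattern facet would presumably express the same restriction declaratively; the exact facet syntax depends on the Schema draft in force.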

This is done because the treatment of XPath/XSLT or other fragment expressions in the context of a URI can be confusing. XPath is a feature some WG members will want to use, and the semantics of the transform are very important to the signature, so it makes sense that they be explicitly represented as part of a transform. As part of a transform that we identify, the WG _can_ properly specify any serialization or canonicalization necessary for XPath/XSLT to work for our application. (Given that serialization and attribute order are purposefully not specified by those specs and are punted to the application, I wonder how other applications will address the issue of consistent serialization when such expressions appear merely as part of a URI...)

However, we still need to support signature references to XML elements within a local document (that is, where a signature is enveloped by, or envelops, XML content in the same document). Given our URI definition it makes sense to rely upon IDREFs for this purpose for the following reasons:
1. I believe this was the intent of ID/IDREF as specified in XML1.0.
2. This method permits those members not keen on XPath to reference local XML elements (within the same document) by using presently implemented XML and not having to support XPath immediately.
However, there are a number of reasons/arguments not to do this:
1. It is my understanding that ID/IDREFs are not thought of that highly by Berners-Lee as they permit "closed-world" references.
  "The local identifier space is a subset of URI space. When an attribute is defined as a URI, the simple "#" prefix gives access to the local ID space - while still allowing great power of expression by reference to anything else on the Web. When the "IDREF" form is used, this is not possible. The IDREF form is a weak form IMHO and not wise for new designs which are not to be deliberately constraining." http://www.w3.org/DesignIssues/Syntax.html
2. For XML applications to understand IDREFs they need access to the DTD. However, I've heard arguments that this is not the case. (Though I'm not sure how relevant the DTD is in any case, as this document will have element types from two different DTDs/schemas: the document and the signature.)
3. The end result of this is rather kludgey as already noted.
Consequently the following two arguments were put forward:
1. Boyer has proposed we use XPath (or some profile subset/hack) for doing local references. Everyone must support this particular XPath instance, though not the whole specification.
2. Karlinger has seconded Boyer's argument, and has even suggested that any XPath specification in a URI needs to be interpreted in the context of our application's serialization and canonicalization profile.
However, this issue was discussed at the FTF meeting last week with the result that:

Schaad: let's stay with what we have until we hear a compelling argument that we understand and agree with before we move away from what we have.
Reagle: what about the "clean-URI" type? There is no such thing. Result: define our own "clean-URI" XML datatype.

At 22:30 00/01/26 -0600, Dan Connolly wrote:
 >[copy to w3c-archive in case I write something useful; feel free
 >to forward to anywhere, including public forums like the dsig WG]
 >
 >"C. M. Sperberg-McQueen" wrote:
 >> 
 >> At 14:32 00/01/11 -0500, you wrote:
 >> >The URI schema data type does envision "#fragment" being a valid URI, right?
 >
 >Yes, I believe "#fragment" is supposed to be a happy value in
 >the case where the schema says the datatype is the one given at
 >http://www.w3.org/TR/1999/WD-xmlschema-2-19991217/#uri
 >
 >> The type we define almost certainly should allow values like
 >> "#fragment" -- we just have to be careful to use the right term for
 >> it.  If people need both types, and wish to distinguish them,
 >> then that's a good requirement for version 2.
 >
 >Like somebody said (David Beech?), I think it's somewhat misleading to
 >call that datatype "URI". uriRef (or URI-Reference or whatever) is more
 >consistent with the URI spec,
 >http://www.ietf.org/rfc/rfc2396.txt
 >
 >
 >> I don't know.  Dan has persuaded me to be cautious in using
 >> the terms 'URI' and 'URI reference', but so far I have not managed
 >> to get fully straight on which is which.  In general, I believe
 >> 'URI reference' is more general, but at the last Schema ftf, Dan
 >> persuaded me that that was only true on some axes, and on other
 >> axes the generality ran the other way.  Result:  I am terminally
 >> confused.
 >
 >Perhaps you haven't read
 >	URI terminology, esp. in XML specs
 >	From: Dan Connolly (connolly@w3.org)
 >	Date: Mon, Jan 10 2000 
 >	http://lists.w3.org/Archives/Public/uri/2000Jan/0002.html
 >
 >But in case you have, and you're still confused, I'll try again...
 >
 >What I know for certain is that RFC2396 clearly defines two syntactic
 >constructs and much of their semantics:
 >	URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
 >and
 >	absoluteURI   = scheme ":" ( hier_part | opaque_part )
 >
 >URI-reference is what you know and love from HTML as the thing inside
 >the href="..." (except for the I18N-friendly but mathematically awkward
 >conventions in HTML 4 for non-ascii characters in URIs. Let's don't go
 >there for now). The following are all URI-references:
 >
 >	../foo
 >	#bar
 >	../foo#bar
 >	http://example.com/
 >	http://example.com/#bar
 >
 >absoluteURI is the thing that you give to lynx or wget or libwww when
 >you want to suck some bytes (and maybe a MIME type) down from the
 >network. It generally refers to a resource you can get at via the network
 >(but not always; e.g. uuid:23j23lkj32 or isbn:nnnn or whatever).
 >Of those above, only the following is an absolute URI:
 >
 >	http://example.com/
 >
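
[Inline note: the classification above can be sketched as follows. This is a simplification under stated assumptions, not RFC 2396's full grammar: is_absolute_uri is a name made up for this example, and it checks only that the string starts with a scheme and has no "#fragment".]

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." ), followed by ":"
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute_uri(value: str) -> bool:
    """Rough test: starts with 'scheme:' and carries no fragment."""
    return SCHEME_PREFIX.match(value) is not None and "#" not in value


examples = [
    "../foo",
    "#bar",
    "../foo#bar",
    "http://example.com/",
    "http://example.com/#bar",
]
for ref in examples:
    print(ref, is_absolute_uri(ref))
```
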
 >You may have seen the term "URI" used as the union of those syntactic
 >constructs. But as I worked out a formalization of all this stuff,
 >	http://www.w3.org/XML/9711theory/URI.html in
 >	http://www.w3.org/XML/9711theory/
 >I decided that it doesn't make sense to look at their union semantically.
 >It would be like taking the union of time-points and time-offsets.
 >Yes, the value 3 makes sense both as the time point '3 seconds since the
 >epoch' and as the time-offset between 12:00:00 and 12:00:03, but they're
 >quite different beasts.
 >
 >Similarly, http://example.com/ should be looked at differently when
 >it's used as a URI-reference than when it's used as an absoluteURI.
 >When it's a URI-reference, it's not something you can hand to your
 >network layer and get content back; you have to combine it with
 >a base absoluteURI to get the referent of the URI-reference,
 >another absoluteURI; then you can hand that to the network layer
 >and get bytes back. Don't let the fact that
 >	X + http://example.com/ = http://example.com/
 >for all X confuse you.
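
[Inline note: the "X + reference" operation can be tried out with Python's urllib.parse.urljoin. This is only an illustration: urljoin follows the later RFC 3986 resolution rules rather than RFC 2396 itself, and the base URIs below are invented for the example.]

```python
from urllib.parse import urljoin

# Invented base URIs standing in for "X".
bases = [
    "http://other.example/a/b",
    "http://example.com/xyz",
    "ftp://host.example/dir/",
]

for base in bases:
    # A reference that already has a scheme resolves to itself for every base.
    print(base, "+ http://example.com/ =", urljoin(base, "http://example.com/"))
    # A relative reference resolves differently depending on the base.
    print(base, "+ ../foo =", urljoin(base, "../foo"))
```
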
 >
 >Now let's check your understanding; try this: the URI-reference
 >http://example.com/ refers to the absoluteURI http://example.com/
 >regardless what base absoluteURI that URI-reference is...
 >um... added to.
 >
 >The hard part is generalizing that sentence:
 >	With respect to some base absoluteURI, a URI-reference
 >	refers to a ?????.
 >
 >There's no standardized term to put in the ????, even though it's
 >the one the Namespace spec needs so badly. absoluteURI almost
 >works, except when the URI-reference in question has a fragmentID.
 >
 >i.e.
 >
 >	http://example.com/xyz + ../foo#bar = http://example.com/foo#bar
 >
 >but what do you call http://example.com/foo#bar ? it doesn't match
 >the syntax of absoluteURI, so that's no good.
 >
 >It was just called a URI in RFC1630, but the IETF folks objected cuz
 >the #bar part doesn't affect the network operation of the thingy;
 >but it doesn't make sense, web-architecturally, to treat URIs
 >with #fragids as second-class citizens.
 >
 >But it's not a URI-reference; it's the referent of a URI-reference.
 >It's a time-point now, no longer a time-offset.
 >
 >I use the term absolute-uri-with-optional-fragid in conversations
 >like this one sometimes, and I abbreviated that to URIwf in my
 >formalism:
 >
 >   URIwf tuple of abs: absoluteURI, fragment: Fragment 
 >   absoluteURI tuple of scheme: URISchemeID, path: PathName 
 >
 >[...]
 >
 >   asserts
 >
 >     \forall i1, i2: absoluteURI, if1, if2: URIwf, r1, r2: URI_reference,
 >        frag: Fragment
 >
 >        i1 # frag == [i1, frag];
 >
 >        combine(if1.abs, asRef(if2)) == if2;
 >
 >        combine(if1.abs, wrt(if2, if1)) = if2;
 >
 >        combine(i1, r1).fragment = fragment(r1);
 >
 >        % asRef is 1-1
 >        asRef(if1) = asRef(if2) => if1=if2;
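
[Inline note: a few of these assertions can be exercised with a small model. This is a loose sketch and not Dan's formalism: URIwf is modelled as a pair of strings, combine is built on urllib.parse.urljoin (RFC 3986 rules), an empty fragment is treated the same as no fragment, and wrt is not modelled at all. The names combine and as_ref mirror the spec above but their bodies are assumptions.]

```python
from typing import NamedTuple
from urllib.parse import urldefrag, urljoin


class URIwf(NamedTuple):
    """Absolute URI with optional fragment (fragment == "" means none)."""
    abs: str
    fragment: str


def combine(base_abs: str, ref: str) -> URIwf:
    """Resolve a URI-reference against a base absoluteURI."""
    url, frag = urldefrag(urljoin(base_abs, ref))
    return URIwf(url, frag)


def as_ref(u: URIwf) -> str:
    """Write a URIwf back out as a URI-reference string."""
    return u.abs + ("#" + u.fragment if u.fragment else "")


target = combine("http://example.com/xyz", "../foo#bar")
print(target)

# combine(if1.abs, asRef(if2)) == if2, for an arbitrary other base
assert combine("http://other.example/base", as_ref(target)) == target
# combine(i1, r1).fragment = fragment(r1)
assert combine("http://example.com/xyz", "../foo#bar").fragment == "bar"
```
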
 >
 >
 >> I hope against hope that this helps, though I recognize that "I
 >> do not know" cannot be a reassuring answer.  I'll have to look it
 >> up and ask Dan to consult the entrails of a goat or two ...
 >
 >I hope this explanation is less mystical than goat entrails ...
 >
 >I think it might make a nifty informational RFC, especially if I
 >elaborate on both the historical notes and the formalism.
 >
 >After all...
 >
 >"Dan received a B.S. in Computer Science [...] His research interest is
 >investigating the value of formal descriptions of
 >chaotic systems like the Web, especially in the consensus-building
 >process."
 >	-- my bio
 >	http://www.w3.org/People/all#connolly%40w3.org