This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3264 - xs:anyURI definition
Summary: xs:anyURI definition
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: CR
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: cluster: anyURI
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2006-05-09 11:28 UTC by Michael Kay
Modified: 2009-04-18 08:45 UTC
CC List: 3 users

See Also:



Description Michael Kay 2006-05-09 11:28:12 UTC
QT approved comment:

Section 3.3.18 (anyURI) appears to lack any definition of the value space, or any definition of identity and equality. 

(One gets the impression that the types defined later in the spec have received less attention from the editors, just as they have from this reviewer).
Comment 1 C. M. Sperberg-McQueen 2007-10-14 19:16:57 UTC
The WG discussed this briefly with QT at the October 2007 ftf meetings.

The nature of the value space is currently entailed by the description of
the lexical space and the description of the lexical mapping as the
identity function.

The nature of the value space should probably be stated explicitly.  It does
not need to change, it just needs to be clearly and explicitly stated.
Comment 2 C. M. Sperberg-McQueen 2008-05-24 03:06:18 UTC
Since this issue (in the WG's analysis as summarized in comment #1, at
any rate) is directed at clarifying, rather than changing, the definition
of the value space in question, I am tentatively marking it editorial.
This may have the effect that the issue will be addressed after, rather
than before, the next published working draft (but it also will have the
effect of ensuring that the issue is not closed as WONTFIX before that
WD is published).
Comment 3 C. M. Sperberg-McQueen 2009-03-27 16:20:39 UTC
At its call today, the XML Schema WG agreed to instruct the editors
to revise the discussion of anyURI 

  a) to refer non-normatively to the Note on Legacy Encoded IRIs (LEIRIs)
  b) to mention explicitly that (since the lexical space of anyURI is
      essentially the set of strings, and the lexical mapping is identity,
      it follows that) the value space is isomorphic to that of xsd:string

 
Comment 4 C. M. Sperberg-McQueen 2009-04-13 00:36:56 UTC
A wording proposal intended to resolve this issue is at 

  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b3264.html
  (member-only link)

Comment 5 C. M. Sperberg-McQueen 2009-04-13 15:32:08 UTC
Noah Mendelsohn has pointed out a flaw in the proposed wording, which 
also applies to the description of the lexical space.  ("The set of possibly
empty character sequences, huh?  OK, here's a character sequence:
'http://www.w3.org/'.  Could it possibly be empty? No, it's not empty, 
and it couldn't possibly be empty without no longer being itself:  I can 
see characters right there. I guess it's not a member of the value space,
or of the lexical space either.")

Perhaps it would be better to change the proposed new section on value 
space and the existing first sentence of the section on lexical mapping 
from:

    3.3.18.1 Value Space

    The value space of anyURI is the set of possibly empty character 
    sequences.

    3.3.18.2 Lexical Mapping

    The ·lexical space· of anyURI is the set of possibly empty finite-length 
    character sequences.

to

    3.3.18.1 Value Space

    The value space of anyURI is the set of finite-length sequences
    of zero or more characters.

    3.3.18.2 Lexical Mapping

    The ·lexical space· of anyURI is the set of finite-length sequences 
    of zero or more characters.

The proposed amendment is modeled on the formulation used to 
describe the value space of string.

Comment 6 Dave Peterson 2009-04-13 17:01:18 UTC
(In reply to comment #5)
> Noah Mendelsohn has pointed out a flaw in the proposed wording, which 
> also applies to the description of the lexical space.  ("The set of possibly
> empty character sequences, huh?  OK, here's a character sequence:
> 'http://www.w3.org/'.  Could it possibly be empty? No, it's not empty, 
> and it couldn't possibly be empty without no longer being itself:  I can 
> see characters right there. I guess it's not a member of the value space,
> or of the lexical space either.")
> 
> Perhaps it would be better to change the proposed new section on value 
> space and the existing first sentence of the section on lexical mapping 

IIRC, the "possibly empty" was directed by the WG some time ago (since some FLCS are required to be non-empty, and we wanted to be explicit at every occurrence). There are 8 occurrences of 'possibly empty' in the current spec. Either we expect readers to understand what's meant, or we must fix all eight. I suspect that anyone who comes up with Noah's "flaw" will understand what we meant. If we're going to fix things at this level of nit-pick, we've got a lot of other changes to be made too. Let's not go down that slippery slope.

OTOH, note that the lexical space is limited to finite sequences; the value space (by the proposed wording) is not. Since, when we ensured that we were explicit about allowing or disallowing the empty string, we also chose to be careful to disallow infinite strings, the wording from Lexical Mapping should be used.
Comment 7 C. M. Sperberg-McQueen 2009-04-13 17:32:18 UTC
Having reviewed the occurrences of "possibly empty", I believe
that it is only those occurring in the descriptions of hexBinary,
base64Binary, and anyURI that are prone to the unsatisfactory
reading identified by Noah.  The others differ in syntax or context
enough that I do not believe they need to change.

So I propose to amend my proposed amendment to the wording
proposal to include analogous changes in hexBinary and
base64Binary, from

    ...  the set of possibly empty finite-length sequences of 
    binary octets 

to 

    ...  the set of finite-length sequences of zero or more 
    binary octets
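To make "finite-length sequences of zero or more binary octets" concrete, here is a rough Python sketch in which `bytes` plays the role of an octet sequence. The function names are hypothetical, and the base64 handling is a simplification that does not enforce every base64Binary lexical rule. Both mappings accept the empty string and return an empty octet sequence, which is the "zero or more" case.

```python
import base64
import binascii


def hex_binary_value(lexical: str) -> bytes:
    """Map a hexBinary lexical form to a finite sequence of zero or more octets."""
    # binascii.unhexlify raises binascii.Error on odd length or non-hex digits.
    return binascii.unhexlify(lexical)


def base64_binary_value(lexical: str) -> bytes:
    """Map a base64Binary lexical form to a finite sequence of zero or more octets."""
    # Simplification: drop spaces, then decode strictly (validate=True rejects
    # characters outside the base64 alphabet). This does not check every
    # lexical constraint XSD places on base64Binary.
    return base64.b64decode(lexical.replace(" ", ""), validate=True)


print(list(hex_binary_value("0FB7")))     # [15, 183]
print(list(base64_binary_value("AQID")))  # [1, 2, 3]
print(hex_binary_value(""))               # b'' (zero octets)
```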

The press of time is a good reason for not going out of our
way to find minor improvement to the spec and raise new
issues about them.  But in this case we have an issue and need
to change the spec in either case; if we can do so without
delaying the WG I don't see that we should not try to make
the wording as clear as we can, in the time available.
Comment 8 Noah Mendelsohn 2009-04-13 18:23:59 UTC
Michael Sperberg-McQueen writes (after suggesting changes to the binary types as well as to anyURI):

> The press of time is a good reason for not going
> out of our way to find minor improvement to the
> spec and raise new issues about them.  But in this
> case we have an issue and need to change the spec
> in either case; if we can do so without delaying
> the WG I don't see that we should not try to make
> the wording as clear as we can, in the time
> available.

As Michael knows but perhaps others do not, I made a point of making my original comment to Michael privately, just to ensure that there is no added burden on the working group of trying to satisfy >me< in particular with respect to this concern.  I am grateful that he felt it worth the trouble to carry them forward to the working group.

For the record, I am very pleased with Michael's proposal to update the descriptions of the binary types as well as the proposed new description for anyURI, but I have not formally asked for any of these changes.  If other commentators and members of the working group are happy with changing or not changing these, then so am I.  In short:  don't let me slow you down.  Thank you.

Noah


 
Comment 9 Sandy Gao 2009-04-13 19:31:21 UTC
One comment on the proposal in comment #5.

I like the idea of following the value space description for string. I think we should go all the way and copy the entire sentence.

"The ·value space· of anyURI is the set of finite-length sequences of zero or more characters (as defined in [XML]) that ·match· the Char production from [XML]."

Otherwise anyURI could have values that can't be represented in XML and xs:string. (Unless that was intended by the proposal in comment #5.)
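The membership test implied by this wording can be sketched in Python. This assumes the Char production of XML 1.0 (Fifth Edition), `#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`; XML 1.1 defines Char differently. The function names are hypothetical. The sketch also answers the reading discussed in comment #5: both the empty string and 'http://www.w3.org/' come out as members.

```python
def matches_xml_char(ch: str) -> bool:
    """True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def in_anyuri_value_space(value: str) -> bool:
    """Sketch of membership in the value space as worded above."""
    # all() over an empty sequence is True, so the empty string,
    # a sequence of zero characters, is a member.
    return all(matches_xml_char(ch) for ch in value)


print(in_anyuri_value_space(""))                    # True
print(in_anyuri_value_space("http://www.w3.org/"))  # True
print(in_anyuri_value_space("a\x00b"))              # False: NUL is not an XML Char
```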
Comment 10 Dave Peterson 2009-04-13 20:09:09 UTC
(In reply to comment #7)

> So I propose to amend my proposed amendment to the wording
> proposal to include analogous changes in hexBinary and
> base64Binary, from
> 
>     ...  the set of possibly empty finite-length sequences of 
>     binary octets 
> 
> to 
> 
>     ...  the set of finite-length sequences of zero or more 
>     binary octets
> 
> The press of time is a good reason for not going out of our
> way to find minor improvement to the spec and raise new
> issues about them.  But in this case we have an issue and need
> to change the spec in either case; if we can do so without
> delaying the WG I don't see that we should not try to make
> the wording as clear as we can, in the time available.

In which case:  Presumably a binary octet is a sequence of bits.  A sequence of sequences of bits is not a sequence of bits, since a bit is not a sequence of bits.  (Please, let's not violate the axiom of regularity!) Therefore, a finite-length sequence of zero or more binary octets is not a sequence of bits.  What we want is the concatenation of all the terms of the sequence.  So:

   ... the set of finite-length concatenations of sequences of zero
   or more binary octets.

or "the set of finite-length bit-strings of zero or more binary octets" (because concatenation is generally implied for strings but not sequences; that's one of the more important connotational distinctions between strings and sequences).
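The distinction between a sequence of octets and their concatenation into one bit-string can be illustrated with a small Python sketch. The function name is hypothetical, and the bit order within each octet is an assumption exposed as a parameter, since nothing in the wording fixes whether the high-order bit comes first. For n octets the result always has 8n bits.

```python
def octets_to_bit_string(octets: bytes, high_order_first: bool = True) -> str:
    """Concatenate zero or more octets into one bit-string (a sketch)."""
    bits = []
    for octet in octets:
        # Each octet becomes exactly 8 bits, padded with leading zeros.
        chunk = format(octet, "08b")
        # Bit order within an octet is an assumption; reverse for low-order-first.
        bits.append(chunk if high_order_first else chunk[::-1])
    return "".join(bits)


print(octets_to_bit_string(b"\x0f\xb7"))         # 0000111110110111
print(octets_to_bit_string(b"\x0f\xb7", False))  # 1111000011101101
print(repr(octets_to_bit_string(b"")))           # '' (zero octets, zero bits)
```

The two differing outputs for the same octets show why a description in terms of octets, rather than bits, can stay agnostic about bit order.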

And that's why I didn't want to start the slippery slope.
Comment 11 Dave Peterson 2009-04-13 20:45:32 UTC
(In reply to comment #7)

> So I propose to amend my proposed amendment to the wording
> proposal to include analogous changes in hexBinary and
> base64Binary, from
> 
>     ...  the set of possibly empty finite-length sequences of 
>     binary octets 
> 
> to 
> 
>     ...  the set of finite-length sequences of zero or more 
>     binary octets

Every definition I've come across defining 'octet' in computer science contexts is effectively "bit-string of length 8".  So 'binary' is redundant.
Comment 12 Dave Peterson 2009-04-13 20:49:33 UTC
(In reply to comment #10)

>    ... the set of finite-length concatenations of sequences of zero
>    or more binary octets.
> 
> or "the set of finite-length bit-strings of zero or more binary octets"

or, perhaps most concisely, "the set of finite-length concatenations of zero or more octets".
Comment 13 C. M. Sperberg-McQueen 2009-04-13 22:24:57 UTC
in re comment #11:  do the sources you cite specify whether the high-order
bit comes first or last?  

I had the impression that in network protocols the notion of octet was 
carefully formulated to remain agnostic on that point.  (But IANAEE.)
If that's so, then RFC 3548 may be being careful instead of careless when it
describes the base 64 encoding as encoding sequences of octets, rather than
sequences of bits.
Comment 14 Dave Peterson 2009-04-13 22:30:51 UTC
This is not worth the time we're spending discussing it.  Pick an option (MSM's, SG's, or mine) and go with it.  I can live with any.
Comment 15 David Ezell 2009-04-17 15:47:48 UTC
Decided:
- 3264 (XML Query and XSL WGs): xs:anyURI definition.
http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b3264.html

Summary: the value space of anyURI needs to be specified.

Note: some discussion in Bugzilla, some amendments proposed.  One
possible point of controversy: are base64Binary and hexBinary
intended to encode sequences of bits or of octets?

MSM's recommendations: fairly quick.

   - Amend as described in comment 7: in hexBinary and
     base64Binary, change

         ...  the set of possibly empty finite-length sequences of
         binary octets

     to

         ...  the set of finite-length sequences of zero or more
         binary octets

   - Amend as described in comment 9: in the new 3.3.18.1, read

         The ·value space· of anyURI is the set of finite-length
         sequences of zero or more characters (as defined in
         [XML]) that ·match· the Char production from [XML].

   - Amend as suggested in comment 5, in the light of comment 9:
     in 3.3.18.2 (anyURI lexical mapping) for

         The ·lexical space· of anyURI is the set of possibly
         empty finite-length character sequences.

     read

         The ·lexical space· of anyURI is the set of finite-length
         sequences of zero or more characters (as defined in
         [XML]) that ·match· the Char production from [XML].

   - And for the record, optionally reaffirm in the minutes that
     base64Binary encodes octet sequences, not (by itself) bit
     sequences.

Comment 16 C. M. Sperberg-McQueen 2009-04-18 03:40:34 UTC
As noted in comment 15, the XML Schema WG discussed this issue today
and resolved it as described there.

Michael, if you as the originator of the issue would report the disposition
to the XSL and XML Query WGs, we'll be grateful.  Close or reopen the
issue in the usual way to signify agreement or disagreement with our
disposition; if we don't hear from you or QT in 10 days or so, we'll assume
you are happy with this disposition.

Thank you.