This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 6695 - whiteSpace=collapse
Summary: whiteSpace=collapse
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.0/1.1 both
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-13 08:27 UTC by Michael Kay
Modified: 2009-04-20 15:46 UTC

Description Michael Kay 2009-03-13 08:27:50 UTC
According to a message from Dieter Guthmann on xmlschema-dev today, at least one product (Liquid XML Studio) has interpreted the rules for whiteSpace=collapse to mean that if the input consists entirely of space characters, it is normalized to a single space.

The definition of "collapse" in 4.3.6 relies on the interpretation of the undefined terms "leading" and "trailing", which means that this reading of the spec cannot be dismissed as perverse. (Can one be a leader if one has no followers?)

A more rigorous definition might be:

collapse:
  After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and any #x20 at the start or end of the string is then removed.
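
Read literally, the proposed definition can be sketched in a few lines of Python. This is only an illustration, not text from the specification: the function names replace_ws and collapse_ws are hypothetical, and the replace step is assumed to map each tab (#x9), line feed (#xA), and carriage return (#xD) to #x20, as whiteSpace=replace is described in XSD Part 2.

```python
import re


def replace_ws(value: str) -> str:
    """whiteSpace=replace (assumed): map #x9, #xA and #xD to #x20."""
    return re.sub(r"[\t\n\r]", " ", value)


def collapse_ws(value: str) -> str:
    """whiteSpace=collapse, following the proposed wording step by step."""
    replaced = replace_ws(value)
    # Contiguous sequences of #x20 are collapsed to a single #x20 ...
    collapsed = re.sub(r" {2,}", " ", replaced)
    # ... and any #x20 at the start or end of the string is then removed.
    return collapsed.strip(" ")
```

Under this reading, collapse_ws("   ") yields the empty string rather than a single space, because the one remaining #x20 sits at both the start and the end of the string.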
Comment 1 Dave Peterson 2009-03-22 00:45:25 UTC
(In reply to comment #0)

> A more rigorous definition might be:
> 
> collapse:
>   After the processing implied by replace, contiguous sequences of #x20's are
> collapsed to a single #x20, and any #x20 at the start or end of the string is
> then removed.

I gather the question is whether an input consisting only of whitespace (at least one character of it) should collapse to a single space character or to the empty string. I don't think the proposed definition answers the question any better than the status quo version. Assuming we *want* to have one space character remain, we should say something explicit, like "; if the original string consisted of only whitespace, the final result is one space character" (added just before the final period).
Comment 2 Dave Peterson 2009-04-13 17:27:42 UTC
It appears that 'leading' and 'trailing' can be interpreted either to require something "in between" or not. It's clear that the WG intended "not": for example, consider hexBinary, which has whiteSpace = collapse. The element

   <gorp xsi:type="hexBinary">
   00
   </gorp>

has as its [actual value] the single-byte bit-string 00000000. It would be very counterintuitive if

   <gorp xsi:type="hexBinary">
   
   </gorp>

were invalid rather than having the empty bit-string as its [actual value]. (Note that if all whitespace were removed from each example, their [actual value]s *would* be 00000000 and the empty bit-string, respectively.)

Accordingly it appears clear that the intent in the spec is to interpret 'leading' and 'trailing' as *not* requiring something in between.
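
The hexBinary argument can be checked with a small Python sketch. It is an illustration under stated assumptions, not a conforming validator: the names HEX_BINARY, collapse, and hex_binary_value are hypothetical, and the hexBinary lexical space is assumed to be zero or more pairs of hexadecimal digits.

```python
import re

# Assumed lexical space of hexBinary: zero or more pairs of hex digits.
HEX_BINARY = re.compile(r"(?:[0-9a-fA-F]{2})*")


def collapse(value: str) -> str:
    """Collapse runs of XML whitespace to one #x20, then trim both ends."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def hex_binary_value(text: str) -> bytes:
    """Return the value of hexBinary element content after whiteSpace=collapse."""
    lexical = collapse(text)
    if HEX_BINARY.fullmatch(lexical) is None:
        raise ValueError(f"not a valid hexBinary lexical form: {lexical!r}")
    return bytes.fromhex(lexical)
```

With this sketch, hex_binary_value("\n   00\n   ") gives the single zero byte and hex_binary_value("\n   \n   ") gives the empty byte string. Had the whitespace-only content collapsed to a single space instead, the pattern would not match and the second element would be invalid.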

The editors propose to add, after the sentence beginning "After the processing implied by replace":

    This means a string consisting entirely of whitespace is
    first replaced with a single space character (#x20) and
    then that character is removed, since it is considered to
    be both a leading and trailing character; the final result
    is the empty character string.

Comment 3 Dave Peterson 2009-04-20 15:33:36 UTC
On 17 Apr the WG agreed to adopt the wording proposed in comment #0

> collapse:
>   After the processing implied by replace, contiguous sequences of #x20's are
> collapsed to a single #x20, and any #x20 at the start or end of the string is
> then removed.

to ensure that there is no question whether collapsing a whitespace-only string results in an empty string.

Mike Kay, assuming you concur (since the fix is from your comment), please mark this issue CLOSED.

I will respond to the question raised by Dieter Guthmann on xmlschema-dev answering that the above is the correct result.