This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
According to a message from Dieter Guthmann on xmlschema-dev today, at least one product (Liquid XML Studio) has interpreted the rules for whiteSpace=collapse to mean that if the input consists entirely of space characters, it is normalized to a single space. The definition of "collapse" in 4.3.6 relies on the interpretation of the undefined terms "leading" and "trailing", which means that this reading of the spec cannot be dismissed as perverse. (Can one be a leader if one has no followers?)

A more rigorous definition might be:

collapse:
After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and any #x20 at the start or end of the string is then removed.
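For illustration, the proposed wording can be read as a three-step procedure. The following is a minimal Python sketch, not text from the spec: it assumes that "replace" maps each tab (#x9), line feed (#xA) and carriage return (#xD) to #x20, and the function names are made up for this example. `collapse_keeping_lone_space` models the alternative reading in which a lone space is neither "leading" nor "trailing".

```python
import re


def replace_ws(s: str) -> str:
    # "replace": each #x9, #xA and #xD becomes #x20 (assumed from the whiteSpace facet).
    return re.sub(r"[\t\n\r]", " ", s)


def collapse(s: str) -> str:
    # Reading under the proposed wording.
    s = replace_ws(s)
    s = re.sub(r" {2,}", " ", s)  # contiguous #x20's become a single #x20
    if s.startswith(" "):         # any #x20 at the start is removed
        s = s[1:]
    if s.endswith(" "):           # any #x20 at the end is removed
        s = s[:-1]
    return s


def collapse_keeping_lone_space(s: str) -> str:
    # Hypothetical model of the other reading: a space with nothing
    # before or after it is not treated as leading or trailing.
    s = re.sub(r" {2,}", " ", replace_ws(s))
    if s == " ":
        return s
    return s.strip(" ")


print(repr(collapse("   ")))                     # ''
print(repr(collapse_keeping_lone_space("   ")))  # ' '
print(repr(collapse(" \t a \n  b ")))            # 'a b'
```

The two functions differ only on whitespace-only input, which is exactly the case in dispute.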
(In reply to comment #0)
> A more rigorous definition might be:
>
> collapse:
> After the processing implied by replace, contiguous sequences of #x20's are
> collapsed to a single #x20, and any #x20 at the start or end of the string is
> then removed.

I gather the question is whether an input consisting only of whitespace (at least one character) should collapse to a single space character or to the empty string. I don't think that proposed definition answers the question any better than the status quo version. Assuming we *want* to have one space character remain, we should say something explicit, like "; if the original string consisted of only whitespace, the final result is one space character" (added just before the final period).
It appears that 'leading' and 'trailing' can be interpreted to require something "in between", or not. It's clear that the WG intended "not": E.g., consider hexBinary, which has whiteSpace = collapse. The element <gorp xsi:type=hexBinary> 00 </gorp> has as its [actual value] the single-byte bit-string 00000000. It would be very counterintuitive if <gorp xsi:type=hexBinary> </gorp> were invalid rather than having the empty bit-string as its [actual value]. (Note that if all whitespace were removed from each example, their [actual value]s *would* be 00000000 and the empty bit-string, respectively.)

Accordingly it appears clear that the intent in the spec is to interpret 'leading' and 'trailing' as *not* requiring something in between. The editors propose to add, after the sentence beginning "After the processing implied by replace":

This means a string consisting entirely of whitespace is first replaced with a single space character (#x20) and then that character is removed, since it is considered to be both a leading and trailing character; the final result is the empty character string.
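The hexBinary argument can be checked mechanically. The sketch below is illustrative only: it assumes the hexBinary lexical space is zero or more pairs of hexadecimal digits, and `parse_hex_binary` and the `keep_lone_space` flag are hypothetical names used to model the two readings.

```python
import re

# Assumed hexBinary lexical form: zero or more two-digit hex octets.
HEX_BINARY = re.compile(r"(?:[0-9a-fA-F]{2})*")


def collapse(s: str, keep_lone_space: bool = False) -> str:
    # Replace and collapse in one step: any run of #x20, #x9, #xA, #xD
    # becomes a single #x20.
    s = re.sub(r"[ \t\n\r]+", " ", s)
    if keep_lone_space and s == " ":
        return s  # reading that requires something "in between"
    return s.strip(" ")


def parse_hex_binary(content: str, keep_lone_space: bool = False) -> bytes:
    lexical = collapse(content, keep_lone_space)
    if HEX_BINARY.fullmatch(lexical) is None:
        raise ValueError(f"not a valid hexBinary lexical form: {lexical!r}")
    return bytes.fromhex(lexical)


print(parse_hex_binary(" 00 "))  # b'\x00'
print(parse_hex_binary(" "))     # b''
```

With `keep_lone_space=True`, `parse_hex_binary(" ")` raises `ValueError`, because a single space is not a sequence of hex octets. That is the counterintuitive outcome described above.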
On 17 Apr the WG agreed to adopt the wording proposed in comment #0

> collapse:
> After the processing implied by replace, contiguous sequences of #x20's are
> collapsed to a single #x20, and any #x20 at the start or end of the string is
> then removed.

to ensure that there was no question whether or not collapsing a whitespace-only string resulted in an empty string.

Mike Kay, assuming you concur (since the fix is from your comment), please mark this issue CLOSED. I will respond to the question raised by Dieter Guthmann on xmlschema-dev, answering that the above is the correct result.