2448 – [F&O] Clarification for semantics of upper-case() and lower-case()

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 2448 - [F&O] Clarification for semantics of upper-case() and lower-case()

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Functions and Operators 1.0
Version:	Candidate Recommendation
Hardware:	PC Linux

Importance:	P2 minor
Target Milestone:	---
Assignee:	Ashok Malhotra
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2005-11-04 10:03 UTC by Colin Adams
Modified:	2006-03-04 18:12 UTC
CC List:	0 users

Description Colin Adams 2005-11-04 10:03:26 UTC

I am not totally clear on the semantics of upper-case() and lower-case().
Are they supposed to implement full case mappings (as is strongly suggested by:
"Case mappings may change the length of a string."), or simple case mappings?
It would be nice to explicitly say that the full default mappings must be used.

I think it would also be good to add a warning that
lower-case ($A) eq lower-case ($B) is not a true case-insensitive comparison
(for that we would need a fold-case() function, but I'm not suggesting that be
added for 2.0 at this late stage).
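
The point about comparison can be illustrated with a small sketch. It uses Python's built-in string methods purely as a stand-in for the XPath functions (an assumption for illustration; nothing in this report refers to Python), with str.lower() playing the role of lower-case() and str.casefold() playing the role of the hypothetical fold-case():

```python
# Illustration only: Python string methods stand in for the XPath functions.
a = "Straße"
b = "STRASSE"

# Lower-casing both sides leaves "ß" in place, so the two results differ.
lower_equal = a.lower() == b.lower()

# Case folding maps "ß" to "ss", so the two results compare equal.
fold_equal = a.casefold() == b.casefold()

print(lower_equal, fold_equal)
```

Under these assumptions, comparing the lower-cased strings reports a mismatch while comparing the case-folded strings reports a match.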

Comment 1 Joanne Tong 2006-02-01 15:34:56 UTC

The XSL and XQuery working group discussed this issue on Feb 1, 2006 and 
accepted the following proposed text for fn:upper-case (and similarly for 
fn:lower-case):

Summary: Returns the value of $arg after translating every character to 
its upper-case correspondent as defined <new>in the appropriate case 
mappings section</new> in the Unicode standard [The Unicode Standard]. 
<new>For versions of Unicode beginning with 2.1.8 update, only 
locale-insensitive case mappings should be applied.  Beginning with 
version 3.2.0 (and likely future versions) of Unicode, precise mappings 
are described in default case operations, which is case mappings in the 
absence of tailoring for particular languages and environments.</new> 
Every lower-case character that does not have an upper-case correspondent, 
as well as every upper-case character, is included in the returned value 
in its original form. 

Note that the last sentence differs from the CR draft due to another comment 
that was accepted by the working groups.

Regarding your suggestion to add a warning that lower-case ($A) eq lower-case 
($B) is not a true case-insensitive comparison, the working groups felt that 
such details should be left to the Unicode specification.

Thank you for raising this comment.  Please let us know within one week if the 
proposed text is acceptable.

Joanne

Comment 2 Ashok Malhotra 2006-02-02 00:09:50 UTC

The joint WGs decided to close this issue on 2006-02-01 by implementing wording
suggested by Joanne Tong and Jim Melton.

Comment 3 Joanne Tong 2006-02-02 13:47:30 UTC

Comment from Colin Adams:

I'm still not entirely happy with the wording. This phrase:

" Beginning with 
version 3.2.0 (and likely future versions) of Unicode, precise mappings 
are described in default case operations, which is case mappings in the 
absence of tailoring for particular languages and environments."

In fact, two mappings are described - simple mappings (in which the
string length doesn't change), and full mappings (in which the string
length can change). Both apply " in the absence of tailoring for
particular languages and environments". 
The simple mappings are only intended for use in legacy applications
that cannot cope with string-length changes (it says this somewhere in
the standard).

Your wording leaves open the possibility of either mapping being used, I
think. If that is intended, then it is implementation defined or
dependent behaviour.
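
The difference between the two mappings can be sketched as follows. The tables SIMPLE_UPPER and SPECIAL_UPPER and the functions simple_upper and full_upper are hypothetical names with a few hand-copied entries, chosen only for illustration; a real implementation would read the Unicode data files rather than hard-code them:

```python
# Hypothetical one-to-one (simple) entries for ASCII letters only.
SIMPLE_UPPER = {chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)}

# Hypothetical one-to-many (full) entries: sharp s and the "fi" ligature.
SPECIAL_UPPER = {"\u00df": "SS", "\ufb01": "FI"}


def simple_upper(text: str) -> str:
    """Apply only one-to-one mappings; the string length never changes."""
    return "".join(SIMPLE_UPPER.get(ch, ch) for ch in text)


def full_upper(text: str) -> str:
    """Prefer a one-to-many mapping when one exists, else the simple one."""
    return "".join(SPECIAL_UPPER.get(ch, SIMPLE_UPPER.get(ch, ch)) for ch in text)


word = "straße"
simple_result = simple_upper(word)  # "ß" has no simple upper-case form here
full_result = full_upper(word)      # "ß" expands to "SS"
print(simple_result, len(simple_result))
print(full_result, len(full_result))
```

With the simple table the result keeps its original length and the "ß" is left untouched, whereas the full table lengthens the string by one character. An implementation free to pick either would therefore give visibly different answers for the same input.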

Comment 4 Jim Melton 2006-02-24 21:50:29 UTC

Colin, thanks for your pushback; it forced us to re-read the Unicode material
dealing with case conversions.  I think that we can satisfy both the Working
Groups' intent and your last concern by inserting the word "full" as follows:

"Beginning with version 3.2.0 (and likely future versions) of Unicode, precise
mappings are described in default case operations, which are full case mappings
in the absence of tailoring for particular languages and environments."

I believe that change requires implementations of these two functions to use
both UnicodeData.txt and SpecialCasing.txt, but without application of a
"higher-level protocol" that would invoke tailoring for "particular languages
and environments".  And *that*, I'm virtually positive, satisfies the intent of
the Working Groups for these functions. 
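
As a rough illustration of "full mappings without tailoring", the sketch below uses Python's built-in str.upper() and str.lower(), which do not consult the locale. Python is only a stand-in here (an assumption for illustration, not something the Working Groups referred to):

```python
dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Untailored upper-casing of "i" yields plain "I"; a Turkish tailoring
# would instead produce the dotted capital U+0130.
upper_i = "i".upper()

# The untailored full mapping lower-cases U+0130 to "i" followed by
# U+0307 (COMBINING DOT ABOVE), so the result is two code points long.
lower_dotted = dotted_capital_i.lower()

print(upper_i == "I", len(lower_dotted))
```

The first result shows the absence of language-specific tailoring, and the second shows a full (length-changing) mapping being applied.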

Would you be so kind as to respond to this note indicating your level of comfort
with that solution? 

This is a PERSONAL opinion only and has not been approved by the Working Groups.

Comment 5 Colin Adams 2006-02-25 08:35:41 UTC

I'm very happy with your proposed wording, Jim.

Comment 6 Jim Melton 2006-03-04 00:22:57 UTC

Colin, thanks for your confirmation that our proposed wording resolves your
concern.  We would be grateful if you would explicitly change the status of the
bug to CLOSED.  Thanks!