This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 27618 - [XSLT30] Defining Decimal Formats Permits Only Single Characters [I18N-ISSUE-398]
Status: CLOSED WONTFIX
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Last Call drafts
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2014-12-15 22:35 UTC by Addison Phillips
Modified: 2015-10-29 09:50 UTC

Description Addison Phillips 2014-12-15 22:35:02 UTC
When decimal formats are defined, only single characters are permitted for decimal format elements. The zero-digit element allows only a single numeric character. This means that a sequence such as U+0030 U+20E3 would not be permitted. See section 5.5.

[ This item converted from I18N-ACTION-375 ]

This is an I18N WG comment.
Comment 1 Jim Melton 2014-12-15 23:27:40 UTC
This bug applies equally well to the XPath and XQuery Functions and Operators 3.1 specification.
Comment 2 Michael Kay 2015-01-23 09:56:05 UTC
(For those who don't know, 20E3 is a "combining enclosing keycap". So this would allow you to format a number with each digit in a box that looks like a key on a keyboard.)

Initial reaction: horror. Why is a digit in bold or italic the same character, but a digit in a rectangular box with rounded corners a different character? Surely this should be done at the level of choosing a font for the digits?

(Mind you, the same is true for a digit in a circle, which isn't even a modifier, it's a completely separate character. What a mess.)

But if they've done it, they've done it, and we can't change it.

A difficulty here is that we use one property in the decimal-format to indicate a set of ten strings used to represent digits. Using 0 to represent 0,1,2,3...9 works algorithmically because the Unicode sequences are contiguous. Using *0* to represent *0*, *1*, *2*, ... gets more problematical. Is the rule that the string must contain exactly one digit character? Or that it must contain a digit followed optionally by modifiers?
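To illustrate the contiguity point, here is a minimal sketch in Python rather than XPath. The function names are hypothetical, and `keycap_family` assumes one possible answer to the question above (a digit followed by a fixed modifier); nothing here is taken from the specification.

```python
def digit_family(zero_digit: str) -> list[str]:
    """Derive the ten digits from a single zero-digit character.

    This only works because each Unicode decimal digit set occupies
    ten consecutive code points starting at its zero.
    """
    if len(zero_digit) != 1:
        raise ValueError("zero-digit must be a single character")
    base = ord(zero_digit)
    return [chr(base + i) for i in range(10)]


def keycap_family(zero_digit: str, modifier: str = "\u20e3") -> list[str]:
    """Hypothetical extension: each digit followed by a combining modifier.

    The result is a family of ten two-character strings, so it can no
    longer be described by a single code point plus an offset.
    """
    return [digit + modifier for digit in digit_family(zero_digit)]


ascii_digits = digit_family("0")             # '0' .. '9'
arabic_indic = digit_family("\u0660")        # U+0660 .. U+0669
keycaps = keycap_family("0")                 # '0' + U+20E3, '1' + U+20E3, ...
```

The single-character case needs only the zero digit as input; the keycap case needs an extra rule about which part of the string is the digit and which part is carried along unchanged.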

The rules for grouping positions also become more complicated. Are we counting digits or characters?
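The difference between the two counts can be seen in a short Python sketch. It assumes that "digit" means a character in Unicode general category Nd; the helper name `count_units` is hypothetical.

```python
import unicodedata


def count_units(text: str) -> tuple[int, int]:
    """Return (number of code points, number of decimal digits) in text."""
    code_points = len(text)
    digits = sum(1 for ch in text if unicodedata.category(ch) == "Nd")
    return code_points, digits


# Four keycap digits: each is a digit followed by U+20E3.
keycap_number = "1\u20e32\u20e33\u20e34\u20e3"
code_points, digits = count_units(keycap_number)
```

For this string the two counts differ by a factor of two, so a grouping rule phrased in terms of characters would place separators differently from one phrased in terms of digits.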

I'm inclined to think that the requirement is sufficiently specialized that we can leave people to handle it by post-processing the formatted number using the replace() function.
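A minimal sketch of that post-processing approach, written in Python with `re.sub` standing in for the replace() function. The helper name `add_keycaps` and the use of Python's own number formatting in place of format-number() are assumptions made for illustration.

```python
import re

KEYCAP = "\u20e3"  # COMBINING ENCLOSING KEYCAP


def add_keycaps(formatted: str) -> str:
    """Append the keycap modifier to every ASCII digit of a formatted number."""
    return re.sub(r"[0-9]", lambda m: m.group(0) + KEYCAP, formatted)


# Format first with ordinary single-character digits, then decorate.
formatted = f"{1234.5:,.2f}"      # grouping and two decimal places
decorated = add_keycaps(formatted)
```

Grouping and decimal separators are placed while every digit is still a single character, so the existing formatting rules are untouched; only the final string changes.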
Comment 3 Michael Kay 2015-01-30 21:22:23 UTC
The Working Group discussed this suggestion and did not feel that the use case was compelling, given the additional complexities of (a) describing a family of strings to be used for the ten decimal digits, and (b) refining the rules on things such as grouping positions, which are currently all predicated on each digit being a single character.

A particular challenge would be defining the rules to ensure that the syntax of the pattern remains unambiguous. For example, if we allowed the percent, permille, and exponent separator to be multiple characters, then it would no longer be sufficient to say they must be different strings; we would need a rule to ensure that when they appear in a pattern, we can tell which is which.
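The ambiguity can be sketched in Python. The token strings below are invented for illustration only (no real decimal format uses them), and `segmentations` is a hypothetical helper that lists every way a piece of a pattern could be read as a sequence of the given non-empty strings.

```python
def segmentations(pattern: str, tokens: dict[str, str]) -> list[list[str]]:
    """List every way to split pattern into the given (non-empty) token strings."""
    if not pattern:
        return [[]]
    results = []
    for name, text in tokens.items():
        if pattern.startswith(text):
            for rest in segmentations(pattern[len(text):], tokens):
                results.append([name] + rest)
    return results


# Three pairwise different strings (made up for this example).
tokens = {
    "percent": "ab",
    "per-mille": "a",
    "exponent-separator": "b",
}
readings = segmentations("ab", tokens)
```

Here `readings` contains two entries: the fragment is either a percent sign or a per-mille sign followed by an exponent separator. The strings are all different, yet the pattern cannot be read unambiguously. With single-character values this cannot happen, since distinct characters never overlap.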

We also felt that since we are dealing with a computationally complete language, format-number() is really just a convenience function for handling common requirements: it doesn't need to do everything imaginable. At some stage, it becomes easier for users to write their own formatting function.

I'm marking this as resolved, and the XSL WG hopes that I18N will indicate its agreement by closing the bug. Thank you for your comments.