This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29488 - [QT3] format-integer-030
Summary: [QT3] format-integer-030
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3 & XPath 3 Test Suite
Version: Candidate Recommendation
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: O'Neil Delpratt
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-02-19 23:38 UTC by Benito van der Zander
Modified: 2016-04-02 15:54 UTC

Description Benito van der Zander 2016-02-19 23:38:54 UTC
The test <test>format-integer(602347826, '#(000)000-000')</test> expects '(602)347-826', but F&O says "A grouping separator is included in the formatted number only if there is a digit to its left".

There is no digit to the left of the "(", so the output should be '602)347-826'.
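
A minimal sketch of this reading, in Python rather than XPath. It assumes ASCII digits, treats every picture character other than '#' and 0-9 as a grouping separator, and measures a separator's position as the number of digit signs to its right; the helper names are hypothetical and not taken from any implementation.

```python
DIGITS = "0123456789"


def separator_positions(picture):
    """List (position, char) for each separator, where position is the
    number of digit signs ('#' or 0-9) to the right of the separator."""
    found = []
    digit_signs = 0
    for ch in reversed(picture):
        if ch == "#" or ch in DIGITS:
            digit_signs += 1
        else:
            found.append((digit_signs, ch))
    return found


def kept_separators(value, picture):
    """Keep only separators with at least one output digit to their left.
    Assumption: value is a non-negative integer."""
    mandatory = sum(ch in DIGITS for ch in picture)
    # Leading zeros forced by mandatory digit signs also count as digits.
    digits = max(len(str(value)), mandatory)
    return [(pos, ch) for pos, ch in separator_positions(picture) if pos < digits]


print(separator_positions("#(000)000-000"))
print(kept_separators(602347826, "#(000)000-000"))
```

For the picture in this test the '(' sits at position 9, and a 9-digit number has no digit to its left, so only ')' and '-' survive.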
Comment 1 O'Neil Delpratt 2016-03-04 13:18:18 UTC
(In reply to Benito van der Zander from comment #0)
> " A grouping separator is included in the
> formatted number only if there is a digit to its left"

In the spec there are two conditions for the above:

(a) the number is large enough to require that digit, or 
(b) the number of mandatory-digit-signs in the format token requires insignificant leading zeros to be present.

I would say condition (b) applies here.
Comment 2 Benito van der Zander 2016-03-04 13:26:22 UTC
but there are no leading zeros
Comment 3 Michael Kay 2016-03-04 14:23:44 UTC
I'm inclined to agree with Benito here.
Comment 4 Abel Braaksma 2016-03-09 16:14:29 UTC
Reading the rules in 4.6.1 of F&O I also agree with Benito and Michael, but I would like to add that this appears to be an unfortunate limitation. The test shows a certain use-case (phone numbers, though similar formatting with parens is often used with negative numbers as well) where there are "no numbers on the left" of the grouping separator, but you would still want that grouping separator to appear.

However, it appears that this rule has been present since F&O 3.0, so I am not sure it is worth the trouble to try to change it (i.e., to something like: only repetitive gs's do not appear, but non-repetitive gs's also appear if there is no digit to the left).
Comment 5 Josh Spiegel 2016-03-09 16:19:01 UTC
I don't think the grouping separator is intended to be used like this.  e.g. you wouldn't want this:
 
  format-integer(123456, '###,000,000') ==> ",123,456"
Comment 6 Abel Braaksma 2016-03-09 16:35:30 UTC
(In reply to Josh Spiegel from comment #5)
> I don't think the grouping separator is intended to be used like this.  e.g.
> you wouldn't want this:
>  
>   format-integer(123456, '###,000,000') ==> ",123,456"

That's why I suggested to only do it for non-repetitive grouping separators. The above example is repetitive according to the format-integer rules.

To make them non-repetitive there must be more than one and their positions must not be multiples of each other, counting from the right, and they must not be different.

That may leave room for improvement. But whether the WG would want to do that I don't know, it's a fairly small, albeit somewhat common, use-case.
Comment 7 Abel Braaksma 2016-03-09 16:36:46 UTC
> and they must not be different.
sorry, they must not be the same (if position is multiple, but character is different, it is not repetitive, which is the case in Benito's example).
Comment 8 Michael Kay 2016-03-09 17:09:37 UTC
The rule for irregular grouping separators was intended for India:

http://www.statisticalconsultants.co.nz/blog/how-the-world-separates-its-digits.html
Comment 9 Benito van der Zander 2016-03-09 17:28:02 UTC
By the way, it is also not clear how the absence of grouping separators is handled.

E.g. in format-integer(123456789, "###0,00")

It says a single separator is always repeated/"extrapolated to the left", so it seems the above should return 1,23,45,67,89
but that looks odd
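
Under that reading, a lone separator at position N is repeated every N digits. A minimal Python sketch, assuming a non-negative integer and a hypothetical helper name; it illustrates the interpretation in this comment, not necessarily the behaviour the WG settled on.

```python
def group_every(value, size, sep=","):
    """Insert sep after every `size` digits, counting from the right.
    Assumptions: value is a non-negative integer and size is positive."""
    digits = str(value)
    groups = []
    while digits:
        groups.append(digits[-size:])   # take the right-most group
        digits = digits[:-size]         # and drop it from the input
    return sep.join(reversed(groups))


print(group_every(123456789, 2))  # 1,23,45,67,89
```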
Comment 10 Abel Braaksma 2016-03-10 11:27:06 UTC
(In reply to Benito van der Zander from comment #9)
> It says a single separator is always repeated/"extrapolated to the left", so
> it seems the above should return 1,23,45,67,89
> but that looks odd
That is precisely how it should work, you interpret the spec correctly there. If a user wants decimal point behavior, he should use format-number instead.

(In reply to Michael Kay from comment #8)
> The rule for irregular grouping separators was intended for India:
> 
> http://www.statisticalconsultants.co.nz/blog/how-the-world-separates-its-
> digits.html
Wikipedia has a nice list as well: https://en.wikipedia.org/wiki/Decimal_mark#Digit_grouping. Some grouping mentioned there is not easy to express, even with the current rules.

I agree that changing the rules here apparently means breaking other supported behaviors / locales. Guess that phone-number formatting, as Benito found, has to be done in a different way then (I don't see a way to express it with merely format-integer).
Comment 11 Benito van der Zander 2016-03-11 20:13:59 UTC
>That is precisely how it should work, you interpret the spec correctly there. If a user wants decimal point behavior, he should use format-number instead.

So you agree that the absence does not matter?

There just appeared a test ( format-integer-072 ) for the opposite behavior, where the absence does matter.
Comment 12 Josh Spiegel 2016-03-11 20:29:50 UTC
I just added the test. 

    <test-case name="format-integer-072">...
        <test>format-integer(123456789, '000,00,00')</test>
        <result>
            <assert-eq>'12345,67,89'</assert-eq>
        </result>
    </test-case>

Here is what the specification says:

"If grouping-separator-signs appear at regular intervals within the format token, that is if the same grouping separator appears at positions forming a sequence N, 2N, 3N, ... for some integer value N (including the case where there is only one number in the list), then the sequence is extrapolated to the left.."

My analysis is:

N=3.
At position N  (3) = ,
At position 2N (6) = ,
At position 3N (9) = 3

So no pattern...

Did I miss something?
Comment 13 Benito van der Zander 2016-03-11 20:32:44 UTC
I interpret it as:

, appears at  positions 3 and 6

The sequence (3, 6) matches the pattern 3N
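
Comments 12 and 13 appear to count positions by character, separators included, whereas the definition proposed in comment 19 counts only digit signs. A small Python sketch of both conventions for the picture '000,00,00'; the helper names are hypothetical and ASCII digits are assumed.

```python
DIGITS = "0123456789"


def char_positions(picture):
    """1-based index of each separator counted from the right,
    with the separators themselves included in the count."""
    return [i for i, ch in enumerate(reversed(picture), start=1)
            if not (ch == "#" or ch in DIGITS)]


def digit_positions(picture):
    """Number of digit signs to the right of each separator."""
    positions, digit_signs = [], 0
    for ch in reversed(picture):
        if ch == "#" or ch in DIGITS:
            digit_signs += 1
        else:
            positions.append(digit_signs)
    return positions


print(char_positions("000,00,00"))   # [3, 6]
print(digit_positions("000,00,00"))  # [2, 4]
```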
Comment 14 Josh Spiegel 2016-03-11 21:15:52 UTC
Ok, then as you say, I think we need some clarification in the spec.  

You gave this example:

  format-integer(123456789, "###0,00") ==> 1,23,45,67,89

The specification also says:

  "The only purpose of optional-digit-signs is to mark the position of grouping-separator-signs. "

So, under your interpretation are the optional digit signs in this example meaningless?  i.e. is it equivalent to this?

   format-integer(123456789, "0,00")

Also, I think it could be more explicit about patterns like these:

   format-integer(123456789, '0.000,00,00')
   format-integer(12345678912345, '00.00.00,00,00')

I assume the intent is that there is no extrapolation in these cases.
Comment 15 Abel Braaksma 2016-03-14 00:07:53 UTC
(In reply to Josh Spiegel from comment #12)
> "If grouping-separator-signs appear at regular intervals within the format
> token, that is if the same grouping separator appears at positions forming a
> sequence N, 2N, 3N, ... for some integer value N (including the case where
> there is only one number in the list), then the sequence is extrapolated to
> the left.."
> 
> My analysis is:
> 
> N=3.
> At position N  (3) = ,
> At position 2N (6) = ,
> At position 3N (9) = 3
> 
> So no pattern...
> 
> Did I miss something?
The way I understand the specification, and I believe there are a couple of tests for it (but I didn't check), is that:

1) if you have only one GS on pos N, it is always repeated
2) if you have 2 or more GS, they are the same, they are on positions that are multiples (same distances), it is always repeated
3) any other case is not repeated

That means that the output of "format-integer(123456789, '000,00,00')" should be "1,23,45,67,89".

If that is not the case, I think I have misread the spec as well.

(In reply to Josh Spiegel from comment #14)
> So, under your interpretation are the optional digit signs in this example 
> meaningless?  i.e. is it equivalent to this?
> 
>   format-integer(123456789, "0,00")
Yes, the quoted example (with "format-integer(123456789, "###0,00")") is equivalent; the optional digit signs are meaningless here (at least, that is the way I interpret the spec text and, iirc, earlier discussions on the matter, but I can't find the mail again).

(In reply to Josh Spiegel from comment #14)
> I assume the intent is that there is no extrapolation in these cases.
I think you are right, because the format string contains different GS characters.
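
The three rules earlier in this comment can be sketched as follows. This is a minimal Python illustration of the reading given here, not of the final resolution; positions are counted as the number of digit signs to the right of a separator, ASCII digits are assumed, and the function name is hypothetical.

```python
DIGITS = "0123456789"


def repeating_group_size(picture):
    """Return the group size if the separators repeat under the
    three rules in this comment, else None."""
    seps = []          # (position, char), collected right to left
    digit_signs = 0
    for ch in reversed(picture):
        if ch == "#" or ch in DIGITS:
            digit_signs += 1
        else:
            seps.append((digit_signs, ch))
    if not seps:
        return None
    if len({ch for _, ch in seps}) > 1:
        return None                      # rule 3: different characters
    first = seps[0][0]
    expected = [first * (i + 1) for i in range(len(seps))]
    if first > 0 and [pos for pos, _ in seps] == expected:
        return first                     # rules 1 and 2
    return None                          # rule 3: uneven spacing


for picture in ("000,00,00", "###0,00", "0.000,00,00", "#(000)000-000"):
    print(picture, repeating_group_size(picture))
```

Under these rules both '000,00,00' and '###0,00' repeat with a group size of 2, while '0.000,00,00' does not because it mixes two separator characters.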
Comment 16 Abel Braaksma 2016-03-14 00:11:44 UTC
(In reply to Benito van der Zander from comment #11)
> So you agree that the absence does not matter?
> 
> There just appeared a test ( format-integer-072 ) for the opposite behavior,
> where the absence does matter.
See also my comment#15. I think test format-integer-072 to be incorrect as currently written.
Comment 17 Benito van der Zander 2016-03-14 00:39:49 UTC
Btw, have you ever considered adding a big test case for this?

let $product := fold-left(?, 1, function($x, $y){$x * $y}),
    $big := $product(1 to 100000)
return  format-integer($big, "w") 

=> https://gist.githubusercontent.com/benibela/f0163b02562f647e4d2f/raw/8fe7d9699498e8666a77ad75f1cb7d6a403092fb/format-integer
Comment 18 Josh Spiegel 2016-03-14 15:25:49 UTC
Re comment 15 and comment 17, I think the problem is not that format-integer-072 is written incorrectly but rather that this feature is currently underspecified (as suggested by Benito in comment 9).  I ran this test in another publicly released implementation and found that it produced yet a third answer: 1,2345,67,89.  So, implementors have reached at least 3 different conclusions on this part of the definition.  

I propose changing this to a specification bug, requesting clarification.  Depending on the outcome, I will adjust the result of format-integer-072 as necessary.  

I think the specification should clarify what constitutes a grouping separator pattern.  
  (1) Can there be multiple overlapping patterns?  
  (2) Can overlapping patterns involve different grouping separators?  
  (3) Does a pattern extrapolate within the range of the picture string?

I think Abel, Benito, and I agree that the answer to 1 and 2 should be 'no'.  However, I have a different opinion on 3.  My inclination is that additional characters in the picture should break a pattern based on these observations:  

 - Why would the user include the additional digits without a separator if she really wanted a separator?  The separator can always be added in the picture to complete the pattern if that is what is really desired.  In other words, the function is more expressive/flexible if additional digits break the pattern.

 - If additional characters do not break the pattern, it is then confusing that additional trailing optional digits are allowed but serve no purpose.  

 - It "looks odd" (comment 9) if a pattern extrapolates within the range of the picture.  I think the expectation is that the formatted number looks similar to the picture string.
Comment 19 Michael Kay 2016-03-14 16:34:02 UTC
I propose to clarify the specification to say:

The *position* of a grouping separator is the number of optional-digit-signs and mandatory-digit-signs appearing between the grouping separator and the right-hand end of the primary format token.

Grouping separators are defined to be *regular* if:

* there is at least one grouping separator
* every grouping separator is the same character C
* there is a positive integer G (the grouping size) such that:

** the position of every grouping separator is an integer multiple of G

** every positive integer multiple of G that is less than the number of optional-digit-signs and mandatory-digit-signs in the primary format token is the position of a grouping separator.

The *grouping separator template* is a (possibly infinite) set of (position, character) pairs. 

* If grouping separators are regular then the grouping separator template contains one pair of the form (n*G, C) for every positive integer n where G is the grouping size and C is the grouping character.

* Otherwise, the grouping separator template contains one pair of the form (P, C) for every grouping separator found in the primary formatting token, where C is the grouping separator character and P is its position.

Note: if there are no grouping separators, the grouping separator template is an empty set.

The number is formatted as follows:

1. The number is formatted in decimal notation as if by casting the supplied integer to a string.

2. If the number of digits is less than the number of mandatory-digit-signs in the primary format picture then it is extended on the left with leading zeroes to make it up to this size.

3. All digits 0-9 are replaced by corresponding digits from the selected digit family, producing a string S.

4. For every (position P, character C) pair in the grouping separator template where P is less than the number of digits in S, character C is inserted into S at position P (counting from the right-hand end).

5. If the ordinal modifier is present, then the resulting string is converted into ordinal form as described below.
 

I don't propose to extend the specification to handle the "phone number" use case, by allowing grouping separators to appear at the start or end, or adjacent to each other. I don't think the spec was intended for this purpose and the changes would be non-trivial.
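
The proposal in this comment can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: ASCII digits only (step 3 is skipped), no ordinal modifier (step 5 is skipped), no validation of the picture, and any character other than '#' and 0-9 is treated as a grouping separator. All function names are hypothetical.

```python
DIGITS = "0123456789"


def parse_picture(picture):
    """Return (separators, digit_signs, mandatory) for a primary token."""
    separators = []            # (position, char), found right to left
    digit_signs = mandatory = 0
    for ch in reversed(picture):
        if ch == "#":
            digit_signs += 1
        elif ch in DIGITS:
            digit_signs += 1
            mandatory += 1
        else:
            separators.append((digit_signs, ch))
    return separators, digit_signs, mandatory


def grouping_size(separators, digit_signs):
    """Return G if the separators are regular, else None."""
    if not separators or len({ch for _, ch in separators}) != 1:
        return None
    positions = {pos for pos, _ in separators}
    for g in range(1, digit_signs):
        all_positions_are_multiples = all(pos % g == 0 for pos in positions)
        all_multiples_are_positions = all(
            m in positions for m in range(g, digit_signs, g))
        if all_positions_are_multiples and all_multiples_are_positions:
            return g
    return None


def format_integer(value, picture):
    separators, digit_signs, mandatory = parse_picture(picture)
    g = grouping_size(separators, digit_signs)
    irregular = dict(separators)

    def separator_at(position):
        # Grouping separator template: (n*G, C) if regular, else (P, C).
        if g is not None:
            return separators[0][1] if position % g == 0 else None
        return irregular.get(position)

    digits = str(abs(value)).zfill(mandatory)        # steps 1 and 2
    pieces = []                                       # built right to left
    for position, digit in enumerate(reversed(digits)):
        sep = separator_at(position) if position > 0 else None
        if sep is not None:                           # step 4
            pieces.append(sep)
        pieces.append(digit)
    sign = "-" if value < 0 else ""
    return sign + "".join(reversed(pieces))


print(format_integer(123456789, "000,00,00"))
print(format_integer(123456789, "0,00"))
print(format_integer(602347826, "#(000)000-000"))
```

With these rules '000,00,00' is not regular (6 is a multiple of 2 below the 7 digit signs but carries no separator), so 123456789 formats as '12345,67,89'; '0,00' is regular with G = 2 and gives '1,23,45,67,89'; and the '(' in the phone-number picture is dropped because its position is not less than the number of digits.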
Comment 20 Josh Spiegel 2016-03-14 17:37:28 UTC
I am in favor of the direction of comment 19 but I would like clarification on the following point. 

Position is only defined for grouping separators.  However, you say "** every positive integer multiple of G that is less than the number of
optional-digit-signs and mandatory-digit-signs in the primary format token is
the position of a grouping separator."

Which implies that characters other than grouping separators also have a position.  And if all characters in the pattern have a position, then given the current definition, multiple characters will commonly have the same position. For example:

   1,234

The position of both '1' and ',' is 3.
Comment 21 Josh Spiegel 2016-03-14 17:41:58 UTC
Please ignore comment 20.  I see how it works now.
Comment 22 Benito van der Zander 2016-03-14 18:43:18 UTC
format-number has the same issue with grouping separators, right?
Comment 23 Michael Kay 2016-03-14 18:50:52 UTC
No, format-number does not do grouping.
Comment 24 Benito van der Zander 2016-03-15 12:28:42 UTC
But FO says about format-number's picture string:
>The integer-part-grouping-positions is a sequence of integers representing the positions of grouping separators within the integer part of the sub-picture. For each grouping-separator-sign that appears within the integer part of the sub-picture, this sequence contains an integer that is equal to the total number of optional-digit-sign and decimal-digit-family characters that appear within the integer part of the sub-picture and to the right of the grouping-separator-sign. In addition, if these integer-part-grouping-positions are at regular intervals (that is, if they form a sequence N, 2N, 3N, ... for some integer value N, including the case where there is only one number in the list), then the sequence contains all integer multiples of N as far as necessary to accommodate the largest possible number.
Comment 25 Michael Kay 2016-03-15 12:59:00 UTC
Right, thanks, I overlooked that.
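
For comparison, the format-number rule quoted in comment 24 can be sketched as below. This is a minimal Python illustration that assumes the default characters (',' as grouping separator, '#' as optional digit sign, ASCII 0-9 as the digit family); the function names are hypothetical.

```python
DIGITS = "0123456789"


def integer_part_grouping_positions(integer_part):
    """Positions of grouping separators in the integer part of a
    sub-picture: the count of digit characters to the right of each."""
    positions, digit_chars = [], 0
    for ch in reversed(integer_part):
        if ch == ",":
            positions.append(digit_chars)
        elif ch == "#" or ch in DIGITS:
            digit_chars += 1
    return positions


def regular_interval(positions):
    """Return N if positions form N, 2N, 3N, ... (a single position
    counts as regular), else None."""
    if not positions:
        return None
    n = positions[0]
    if n > 0 and positions == [n * (i + 1) for i in range(len(positions))]:
        return n
    return None


for part in ("#,##0", "#,##,##0", "000,00,00"):
    positions = integer_part_grouping_positions(part)
    print(part, positions, regular_interval(positions))
```

Read literally, positions [2, 4] count as regular here regardless of how many digit signs follow to the left, which is looser than the condition proposed in comment 19 for format-integer.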
Comment 26 Josh Spiegel 2016-03-15 18:26:12 UTC
DECISION: For bug 29488, accept the proposal in comment 19.  We will not enhance fn:format-integer to support the telephone number use case (format-integer-030).  Note, under the accepted proposal, the current result for format-integer-072 is correct.

ACTION A-636-04: MKay to adopt the proposal in comment 19 of bug 29488 and fix the expected result of format-integer-030.

(We didn't talk about fn:format-number(), if you think there is also an issue there, can you please file another bug?)
Comment 27 Abel Braaksma 2016-04-02 15:54:39 UTC
(In reply to Josh Spiegel from comment #26)
> (We didn't talk about fn:format-number(), if you think there is also an
> issue there, can you please file another bug?)
For reference to later visitors, Benito filed this under Bug 29534.