<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>29488</bug_id>
          
          <creation_ts>2016-02-19 23:38:54 +0000</creation_ts>
          <short_desc>[QT3] format-integer-030</short_desc>
          <delta_ts>2016-04-02 15:54:39 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>XQuery 3 &amp; XPath 3 Test Suite</component>
          <version>Candidate Recommendation</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Benito van der Zander">benito</reporter>
          <assigned_to name="O&apos;Neil Delpratt">oneil</assigned_to>
          <cc>abel.braaksma</cc>
    
    <cc>josh.spiegel</cc>
    
    <cc>mike</cc>
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>125170</commentid>
    <comment_count>0</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-02-19 23:38:54 +0000</bug_when>
    <thetext>The &lt;test&gt;format-integer(602347826, &apos;#(000)000-000&apos;)&lt;/test&gt; expects &apos;(602)347-826&apos;, but the FO says  &quot; A grouping separator is included in the formatted number only if there is a digit to its left&quot;

There is no digit to the left of the &quot;(&quot;, so the output should be &apos;602)347-826&apos;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125363</commentid>
    <comment_count>1</comment_count>
    <who name="O&apos;Neil Delpratt">oneil</who>
    <bug_when>2016-03-04 13:18:18 +0000</bug_when>
    <thetext>(In reply to Benito van der Zander from comment #0)
&gt; &quot; A grouping separator is included in the
&gt; formatted number only if there is a digit to its left&quot;

In the spec there are two conditions for the above:

(a) the number is large enough to require that digit, or 
(b) the number of mandatory-digit-signs in the format token requires insignificant leading zeros to be present.

I would say condition (b) applies here.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125364</commentid>
    <comment_count>2</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-03-04 13:26:22 +0000</bug_when>
    <thetext>but there are no leading zeros</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125366</commentid>
    <comment_count>3</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-03-04 14:23:44 +0000</bug_when>
    <thetext>I&apos;m inclined to agree with Benito here.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125406</commentid>
    <comment_count>4</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-03-09 16:14:29 +0000</bug_when>
    <thetext>Reading the rules in 4.6.1 of F&amp;O I also agree with Benito and Michael, but I would like to add that this appears to be an unfortunate limitation. The test shows a certain use-case (phone numbers, though similar formatting with parens is often used with negative numbers as well) where there are &quot;no numbers on the left&quot; of the grouping separator, but you would still want that grouping separator to appear.

However, it appears that this rule has been present since F&amp;O 3.0, so I am not sure it is worth the trouble to try to change this (i.e., something like: only repetitive gs&apos;s do not appear, but non-repetitive gs&apos;s also appear if there is no digit to the left).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125407</commentid>
    <comment_count>5</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-03-09 16:19:01 +0000</bug_when>
    <thetext>
I don&apos;t think the grouping separator is intended to be used like this.  e.g. you wouldn&apos;t want this:
 
  format-integer(123456, &apos;###,000,000&apos;) ==&gt; &quot;,123,456&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125409</commentid>
    <comment_count>6</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-03-09 16:35:30 +0000</bug_when>
    <thetext>(In reply to Josh Spiegel from comment #5)
&gt; I don&apos;t think the grouping separator is intended to be used like this.  e.g.
&gt; you wouldn&apos;t want this:
&gt;  
&gt;   format-integer(123456, &apos;###,000,000&apos;) ==&gt; &quot;,123,456&quot;

That&apos;s why I suggested to only do it for non-repetitive grouping separators. The above example is repetitive according to the format-integer rules.

To make them non-repetitive there must be more than one and their positions must not be multiples of each other, counting from the right, and they must not be different.

That may leave room for improvement. But whether the WG would want to do that I don&apos;t know, it&apos;s a fairly small, albeit somewhat common, use-case.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125410</commentid>
    <comment_count>7</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-03-09 16:36:46 +0000</bug_when>
    <thetext>&gt; and they must not be different.
sorry, they must not be the same (if position is multiple, but character is different, it is not repetitive, which is the case in Benito&apos;s example).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125411</commentid>
    <comment_count>8</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-03-09 17:09:37 +0000</bug_when>
    <thetext>The rule for irregular grouping separators was intended for India:

http://www.statisticalconsultants.co.nz/blog/how-the-world-separates-its-digits.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125412</commentid>
    <comment_count>9</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-03-09 17:28:02 +0000</bug_when>
    <thetext>By the way, it is also not clear how the absence of grouping separators is handled.

E.g. in format-integer(123456789, &quot;###0,00&quot;)

It says a single separator is always repeated/&quot;extrapolated to the left&quot;, so it seems the above should return 1,23,45,67,89
but that looks odd</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125429</commentid>
    <comment_count>10</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-03-10 11:27:06 +0000</bug_when>
    <thetext>(In reply to Benito van der Zander from comment #9)
&gt; It says a single separator is always repeated/&quot;extrapolated to the left&quot;, so
&gt; it seems the above should return 1,23,45,67,89
&gt; but that looks odd
That is precisely how it should work, you interpret the spec correctly there. If a user wants decimal point behavior, he should use format-number instead.

(In reply to Michael Kay from comment #8)
&gt; The rule for irregular grouping separators was intended for India:
&gt; 
&gt; http://www.statisticalconsultants.co.nz/blog/how-the-world-separates-its-
&gt; digits.html
Wikipedia has a nice list as well: https://en.wikipedia.org/wiki/Decimal_mark#Digit_grouping. Some grouping mentioned there is not easy to express, even with the current rules.

I agree that changing the rules here apparently means breaking other supported behaviors / locales. Guess that phone-number formatting, as Benito found, has to be done in a different way then (I don&apos;t see a way to express it with merely format-integer).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125447</commentid>
    <comment_count>11</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-03-11 20:13:59 +0000</bug_when>
    <thetext>&gt;That is precisely how it should work, you interpret the spec correctly there. If a user wants decimal point behavior, he should use format-number instead.

So you agree that the absence does not matter?

There just appeared a test ( format-integer-072 ) for the opposite behavior, where the absence does matter.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125448</commentid>
    <comment_count>12</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-03-11 20:29:50 +0000</bug_when>
    <thetext>I just added the test. 

    &lt;test-case name=&quot;format-integer-072&quot;&gt;...
        &lt;test&gt;format-integer(123456789, &apos;000,00,00&apos;)&lt;/test&gt;
        &lt;result&gt;
            &lt;assert-eq&gt;&apos;12345,67,89&apos;&lt;/assert-eq&gt;
        &lt;/result&gt;
    &lt;/test-case&gt;

Here is what the specification says:

&quot;If grouping-separator-signs appear at regular intervals within the format token, that is if the same grouping separator appears at positions forming a sequence N, 2N, 3N, ... for some integer value N (including the case where there is only one number in the list), then the sequence is extrapolated to the left..&quot;

My analysis is:

N=3.
At position N  (3) = ,
At position 2N (6) = ,
At position 3N (9) = 3

So no pattern...

Did I miss something?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125449</commentid>
    <comment_count>13</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-03-11 20:32:44 +0000</bug_when>
    <thetext>I interpret it as:

, appears at  positions 3 and 6

The sequence (3, 6) matches the pattern 3N</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125450</commentid>
    <comment_count>14</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-03-11 21:15:52 +0000</bug_when>
    <thetext>Ok, then as you say, I think we need some clarification in the spec.  

You gave this example:

  format-integer(123456789, &quot;###0,00&quot;) ==&gt; 1,23,45,67,89

The specification also says:

  &quot;The only purpose of optional-digit-signs is to mark the position of grouping-separator-signs. &quot;

So, under your interpretation are the optional digit signs in this example meaningless?  i.e. is it equivalent to this?

   format-integer(123456789, &quot;0,00&quot;)

Also, I think it could be more explicit about patterns like these:

   format-integer(123456789, &apos;0.000,00,00&apos;)
   format-integer(12345678912345, &apos;00.00.00,00,00&apos;)

I assume the intent is that there is no extrapolation in these cases.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125454</commentid>
    <comment_count>15</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-03-14 00:07:53 +0000</bug_when>
    <thetext>(In reply to Josh Spiegel from comment #12)
&gt; &quot;If grouping-separator-signs appear at regular intervals within the format
&gt; token, that is if the same grouping separator appears at positions forming a
&gt; sequence N, 2N, 3N, ... for some integer value N (including the case where
&gt; there is only one number in the list), then the sequence is extrapolated to
&gt; the left..&quot;
&gt; 
&gt; My analysis is:
&gt; 
&gt; N=3.
&gt; At position N  (3) = ,
&gt; At position 2N (6) = ,
&gt; At position 3N (9) = 3
&gt; 
&gt; So no pattern...
&gt; 
&gt; Did I miss something?
The way I understand the specification, and I believe there are a couple of tests for this (but I didn&apos;t check), is that:

1) if you have only one GS on pos N, it is always repeated
2) if you have 2 or more GS that are the same character and sit on positions that are multiples (same distances), they are always repeated
3) any other case is not repeated

That means that the output of &quot;format-integer(123456789, &apos;000,00,00&apos;)&quot; should be &quot;1,23,45,67,89&quot;.

If that is not the case, I think I have misread the spec as well.

(In reply to Josh Spiegel from comment #14)
&gt; So, under your interpretation are the optional digit signs in this example 
&gt; meaningless?  i.e. is it equivalent to this?
&gt; 
&gt;   format-integer(123456789, &quot;0,00&quot;)
Yes, the quoted example (with &quot;format-integer(123456789, &quot;###0,00&quot;)&quot;) is equivalent, the optional digit signs are meaningless here (at least, that is the way I interpret the spec text and, iirc, earlier discussions on the matter, but I can&apos;t find the mail again).

(In reply to Josh Spiegel from comment #14)
&gt; I assume the intent is that there is no extrapolation in these cases.
I think you are right, because the format string contains different GS characters.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125455</commentid>
    <comment_count>16</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-03-14 00:11:44 +0000</bug_when>
    <thetext>(In reply to Benito van der Zander from comment #11)
&gt; So you agree that the absence does not matter?
&gt; 
&gt; There just appeared a test ( format-integer-072 ) for the opposite behavior,
&gt; where the absence does matter.
See also my comment #15. I think test format-integer-072 is incorrect as currently written.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125456</commentid>
    <comment_count>17</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-03-14 00:39:49 +0000</bug_when>
    <thetext>Btw, have you ever considered adding a big test case for this?

let $product := fold-left(?, 1, function($x, $y){$x * $y}),
    $big := $product(1 to 100000)
return  format-integer($big, &quot;w&quot;) 

=&gt; https://gist.githubusercontent.com/benibela/f0163b02562f647e4d2f/raw/8fe7d9699498e8666a77ad75f1cb7d6a403092fb/format-integer</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125475</commentid>
    <comment_count>18</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-03-14 15:25:49 +0000</bug_when>
    <thetext>Re comment 15 and comment 17, I think the problem is not that format-integer-072 is written incorrectly but rather that this feature is currently underspecified (as suggested by Benito in comment 9).  I ran this test in another publicly released implementation and found that it produced yet a third answer: 1,2345,67,89.  So, implementors have reached at least 3 different conclusions on this part of the definition.  

I propose changing this to a specification bug, requesting clarification.  Depending on the outcome, I will adjust the result of format-integer-072 as necessary.  

I think the specification should clarify what constitutes a grouping separator pattern.  
  (1) Can there be multiple overlapping patterns?  
  (2) Can overlapping patterns involve different grouping separators?  
  (3) Does a pattern extrapolate within the range of the picture string?

I think Abel, Benito, and I agree that the answer to 1 and 2 should be &apos;no&apos;.  However, I have a different opinion on 3.  My inclination is that additional characters in the picture should break a pattern based on these observations:  

 - Why would the user include the additional digits without a separator if she really wanted a separator?  The separator can always be added in the picture to complete the pattern if that is what is really desired.  In other words, the function is more expressive/flexible if additional digits break the pattern.

 - If additional characters do not break the pattern, it is then confusing that additional trailing optional digits are allowed but serve no purpose.  

 - It &quot;looks odd&quot; (comment 9) if a pattern extrapolates within the range of the picture.  I think the expectation is that the formatted number looks similar to the picture string.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125482</commentid>
    <comment_count>19</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-03-14 16:34:02 +0000</bug_when>
    <thetext>I propose to clarify the specification to say:

The *position* of a grouping separator is the number of optional-digit-signs and mandatory-digit-signs appearing between the grouping separator and the right-hand end of the primary format token.

Grouping separators are defined to be *regular* if:

* there is at least one grouping separator
* every grouping separator is the same character C
* there is a positive integer G (the grouping size) such that:

** the position of every grouping separator is an integer multiple of G

** every positive integer multiple of G that is less than the number of optional-digit-signs and mandatory-digit-signs in the primary format token is the position of a grouping separator.

The *grouping separator template* is a (possibly infinite) set of (position, character) pairs. 

* If grouping separators are regular then the grouping separator template contains one pair of the form (n*G, C) for every positive integer n where G is the grouping size and C is the grouping character.

* Otherwise, the grouping separator template contains one pair of the form (P, C) for every grouping separator found in the primary formatting token, where C is the grouping separator character and P is its position.

Note: if there are no grouping separators, the grouping separator template is an empty set.

The number is formatted as follows:

1. The number is formatted in decimal notation as if by casting the supplied integer to a string.

2. If the number of digits is less than the number of mandatory-digit-signs in the primary format picture then it is extended on the left with leading zeroes to make it up to this size.

3. All digits 0-9 are replaced by corresponding digits from the selected digit family, producing a string S.

4. For every (position P, character C) pair in the grouping separator template where P is less than the number of digits in S, character C is inserted into S at position P (counting from the right-hand end).

5. If the ordinal modifier is present, then the resulting string is converted into ordinal form as described below.
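
For illustration only, the rules above can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function name format_integer_sketch is hypothetical, it handles only non-negative integers and ASCII decimal pictures, it assumes a valid picture (no leading, trailing or adjacent grouping separators), and it skips step 3 (digit families) and step 5 (ordinals).

```python
def format_integer_sketch(value, picture):
    """Hypothetical helper: steps 1, 2 and 4 for an ASCII decimal picture."""
    digit_signs = set("#0123456789")
    separators = []   # (position, character) pairs found in the picture
    total = 0         # optional + mandatory digit signs seen so far
    mandatory = 0     # mandatory digit signs only
    # Scan from the right: the position of a separator is the number of
    # digit signs between it and the right-hand end of the picture.
    for ch in reversed(picture):
        if ch in digit_signs:
            total += 1
            if ch != "#":
                mandatory += 1
        else:
            separators.append((total, ch))

    positions = [p for p, _ in separators]
    chars = {c for _, c in separators}

    # Regular: a single character C and a grouping size G such that every
    # position is a multiple of G and every multiple of G below the
    # digit-sign count is the position of a separator.
    grouping = None
    if separators and len(chars) == 1:
        for g in range(1, total + 1):
            if (all(p % g == 0 for p in positions)
                    and all(m in positions for m in range(g, total, g))):
                grouping = g
                break

    # Steps 1 and 2: decimal string, padded with leading zeros.
    digits = str(value).zfill(mandatory)

    # Grouping separator template, cut off at the length of the digits.
    if grouping is not None:
        sep = separators[0][1]
        template = {p: sep for p in range(grouping, len(digits), grouping)}
    else:
        template = dict(separators)

    # Step 4: walk the digits from the right; index i equals the number of
    # digits already emitted, so a template entry at i goes in before digit i.
    out = []
    for i, d in enumerate(reversed(digits)):
        if i in template:
            out.append(template[i])
        out.append(d)
    return "".join(reversed(out))


print(format_integer_sketch(602347826, "#(000)000-000"))  # 602)347-826
print(format_integer_sketch(123456789, "000,00,00"))      # 12345,67,89
print(format_integer_sketch(123456789, "0,00"))           # 1,23,45,67,89
```

Under this reading the picture from format-integer-030 drops the opening parenthesis, the picture from format-integer-072 is not regular and so is not extrapolated, and a lone separator in "0,00" is extrapolated to the left.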
 

I don&apos;t propose to extend the specification to handle the &quot;phone number&quot; use case, by allowing grouping separators to appear at the start or end, or adjacent to each other. I don&apos;t think the spec was intended for this purpose and the changes would be non-trivial.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125485</commentid>
    <comment_count>20</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-03-14 17:37:28 +0000</bug_when>
    <thetext>I am in favor of the direction of comment 19 but I would like clarification on the following point. 

Position is only defined for grouping separators.  However, you say &quot;** every positive integer multiple of G that is less than the number of
optional-digit-signs and mandatory-digit-signs in the primary format token is
the position of a grouping separator.&quot;

Which implies that characters other than grouping separators also have a position.  And if all characters in the pattern have a position, then given the current definition, multiple characters will commonly have the same position. For example:

   1,234

The position of both &apos;1&apos; and &apos;,&apos; is 3.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125486</commentid>
    <comment_count>21</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-03-14 17:41:58 +0000</bug_when>
    <thetext>Please ignore comment 20.  I see how it works now.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125490</commentid>
    <comment_count>22</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-03-14 18:43:18 +0000</bug_when>
    <thetext>format-number has the same issue with grouping separators, right?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125491</commentid>
    <comment_count>23</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-03-14 18:50:52 +0000</bug_when>
    <thetext>No, format-number does not do grouping.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125499</commentid>
    <comment_count>24</comment_count>
    <who name="Benito van der Zander">benito</who>
    <bug_when>2016-03-15 12:28:42 +0000</bug_when>
    <thetext>But FO says about format-number&apos;s picture string:
&gt;The integer-part-grouping-positions is a sequence of integers representing the positions of grouping separators within the integer part of the sub-picture. For each grouping-separator-sign that appears within the integer part of the sub-picture, this sequence contains an integer that is equal to the total number of optional-digit-sign and decimal-digit-family characters that appear within the integer part of the sub-picture and to the right of the grouping-separator-sign. In addition, if these integer-part-grouping-positions are at regular intervals (that is, if they form a sequence N, 2N, 3N, ... for some integer value N, including the case where there is only one number in the list), then the sequence contains all integer multiples of N as far as necessary to accommodate the largest possible number.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125500</commentid>
    <comment_count>25</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-03-15 12:59:00 +0000</bug_when>
    <thetext>Right, thanks, I overlooked that.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125511</commentid>
    <comment_count>26</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-03-15 18:26:12 +0000</bug_when>
    <thetext>DECISION: For bug 29488, accept the proposal in comment 19.  We will not enhance fn:format-integer to support the telephone number use case (format-integer-030).  Note, under the accepted proposal, the current result for format-integer-072 is correct.

ACTION A-636-04: MKay to adopt the proposal in comment 19 of bug 29488 and fix the expected result of format-integer-030.

(We didn&apos;t talk about fn:format-number(), if you think there is also an issue there, can you please file another bug?)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125694</commentid>
    <comment_count>27</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-04-02 15:54:39 +0000</bug_when>
    <thetext>(In reply to Josh Spiegel from comment #26)
&gt; (We didn&apos;t talk about fn:format-number(), if you think there is also an
&gt; issue there, can you please file another bug?)
For reference to later visitors, Benito filed this under Bug 29534.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>