28845 – fn:format-number, formatting rules for exponential notation

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 28845 - fn:format-number, formatting rules for exponential notation

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Functions and Operators 3.1
Version:	Candidate Recommendation
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2015-06-24 09:20 UTC by Christian Gruen
Modified:	2016-03-22 10:12 UTC

Description Christian Gruen 2015-06-24 09:20:46 UTC

I have a slightly hard time understanding the formatting semantics for the new exponential notation of fn:format-number. This is the current wording for Rules 4 and 5:
_______________________________________________

4. If the sub-picture contains a percent-sign, the number is multiplied by 100. If the sub-picture contains a per-mille-sign, the number is multiplied by 1000. The resulting number is referred to below as the adjusted number.

5. If the minimum exponent size is non-zero, then the adjusted number is scaled to establish a mantissa and an integer exponent. The mantissa and exponent are chosen such that

(a) the primitive type of the mantissa is the same as the primitive type of the adjusted number (integer, decimal, float, or double),
(b) the mantissa multiplied by ten to the power of the exponent is equal to the adjusted number, and
(c) the number of significant digits in the integer part of the mantissa is equal to the minimum integer part size.
_______________________________________________

Some comments/questions:

- Rule 4: The following sentence could be added (provided that it's correct): "If there is no percent-sign or per-mille-sign, the adjusted number will be equal to the original number."

- Rule 5a is clear, but 5b and 5c are a bit confusing to me (although I'm sorry I cannot provide a better solution yet, because I don't understand the exact semantics yet). Is the leading zero of a number larger than 0 and smaller than 1 a "significant digit"? What is going to happen if the minimum integer part size is 0 and if there are optional digit signs in the integer part of the pattern? In the current test cases "numberformat135" and "numberformat136"...

fn:format-number(0.2, '#.e9')
fn:format-number(0.2, '9e9')

...the expected result is "2e-1". Is the result correct for both test cases?

Thanks in advance,
Christian

Comment 1 Michael Kay 2015-06-24 09:59:06 UTC

Firstly, rule (c) is talking about the number of significant digits in an integer. The number of significant digits in an integer is the number of digits excluding leading zeroes.

I believe that rules (b) and (c) in conjunction uniquely define the values of the mantissa and exponent. If the adjusted number is (say) 1234.5678, then rule (b) allows you to choose (M,E) as any of (for example) (1234.5678, 0) or (123.45678, 1) or (12.345678, 2) etc. Rule (c) then restricts your choice among these alternatives: if the minimum integer part size is 3, then you have to choose (123.45678, 1) because it is the only one with three significant digits in the integer part.
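
The selection described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `significant_integer_digits` and `scale` are hypothetical (they are not part of any specification or library), the input is a non-zero decimal, and the minimum integer part size is at least 1.

```python
from decimal import Decimal


def significant_integer_digits(m: Decimal) -> int:
    """Count the digits of the integer part of m, excluding leading zeroes."""
    integer_part = int(abs(m))  # int() truncates toward zero
    return 0 if integer_part == 0 else len(str(integer_part))


def scale(number: Decimal, min_int_size: int) -> tuple[Decimal, int]:
    """Choose (mantissa, exponent) so that mantissa * 10**exponent == number
    (rule b) and the mantissa's integer part has exactly min_int_size
    significant digits (rule c).

    Assumption: number != 0 and min_int_size >= 1.
    """
    # Decimal.adjusted() is the power of ten of the most significant digit.
    exponent = number.adjusted() + 1 - min_int_size
    # scaleb() shifts the decimal point without changing the digits.
    mantissa = number.scaleb(-exponent)
    return mantissa, exponent
```

With the number from the comment, `scale(Decimal("1234.5678"), 3)` picks the pair (123.45678, 1), and `significant_integer_digits` confirms that this mantissa has three significant digits in its integer part.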

You ask: What is going to happen if the minimum integer part size is 0 and if there are optional digit signs in the integer part of the pattern? 

Optional digit signs to the left of the decimal point are more-or-less ignored; their only effect is to allow you to space out the grouping separators. (This question was resolved during the development of XSLT 2.0; some people read the XSLT 1.0 spec as saying that given a picture of '00.00', the number 123 should be formatted as "23.00" and we decided that this interpretation was wrong; the effect of this decision is that the picture '##00.00' has exactly the same effect as '00.00'.)

You ask: In the current test cases "numberformat135" and "numberformat136"...

  fn:format-number(0.2, '#.e9')
  fn:format-number(0.2, '9e9') 

...the expected result is "2e-1". Is the result correct for both test cases?

The second one looks OK but I'm not convinced by the first. I think the rules give us a mantissa of 0.2 and an exponent of 1. The maximum fractional digits is 0 so 0.2 is rounded to zero. My reading of the rules says that after rule 7, the mantissa is ".", and after rule 12, it is "", so the result should be e1 (which isn't very nice, but you get what you ask for).

Comment 2 Christian Gruen 2015-06-24 11:40:38 UTC

Thanks for helping me through. I now agree with you that 5b and 5c should indeed be sufficient to choose mantissa and exponent. I was mostly confused by the test results, and also the varying uses of 'mantissa' [1].

I have recategorized this as a test suite bug, and it is about test case "numberformat135":

  fn:format-number(0.2, '#.e9')

I would expect "0e0" as result:

* according to 5b and 5c, (0.2, 0) must be chosen for (M, E)
* format-number(0,'#.') returns '0'

[1] https://en.wikipedia.org/wiki/Significand#Use_of_.22mantissa.22

Comment 3 Michael Kay 2015-07-14 23:00:14 UTC

You are quite correct to observe that the traditional computer-science use of the term "mantissa" has been criticized as being incorrect, and that purists prefer "significand". I will check to see what the impact of a change here might be.

XSD 1.0 uses "mantissa"; XSD 1.1 seems to get by without either term...

Comment 4 Michael Kay 2015-07-17 16:05:03 UTC

We walked through test case numberformat135 in the WG as follows.

The test is

fn:format-number(0.2, '#.e9')

4.7.4 Analyzing the picture string:

* integer-part-grouping-positions = empty set

* minimum-integer-part-size = 1

* prefix = ""

* fractional-part-grouping-positions = empty set

* minimum-fractional-part-size = 0

* maximum-fractional-part-size = 0

* minimum-exponent-size = 1

* suffix = ""

4.7.5 Formatting the number

1. If the input number is NaN - N/A

2. The positive sub-picture is used if the input number is positive

3. If the sub-picture contains a percent-sign: N/A

4. If the minimum exponent size is non-zero, then the adjusted number is scaled to establish a mantissa and an integer exponent. 

mantissa = 0.2, exponent = 0

5. The mantissa is converted (if necessary) to an xs:decimal value: no conversion needed.

6. The absolute value of the rounded number is converted to a string in decimal notation: ".2". ** Here I think we went wrong. We missed the sentence "This value is then rounded so that it uses no more than maximum-fractional-part-size digits in its fractional part. " Corrected result: "."

7. If the number of digits to the right of the decimal-separator-sign is less than minimum-fractional-part-size: N/A

8. For each integer N in the integer-part-grouping-positions list: no action

9. For each integer N in the fractional-part-grouping-positions list: no action

10. If there is no decimal-separator-sign in the sub-picture, or if there are no digits to the right of the decimal-separator-sign character in the string: Original analysis: N/A. Corrected analysis: The string is now "".

11. If an exponent exists, then the string produced from the mantissa as described above is extended with the following, in order: (a) the exponent separator sign; (b) if the exponent is negative, the minus sign; (c) the value of the exponent represented as a decimal integer, extended if necessary with leading zeroes to make it up to the minimum exponent size, using digits taken from the decimal digit family. => ".2e0" Corrected analysis: "e0"

12. The result of the function is the concatenation of the appropriate prefix, the string conversion of the number as obtained above, and the appropriate suffix. => .2e0. Corrected analysis: "e0"

A strange result perhaps, but you get what you ask for.

Comment 5 Debbie Lockett 2015-07-30 12:34:54 UTC

Apologies for not being aware of the ongoing discussion in this bug earlier.

I think there remain some clarifications to be made in the specification, relating to Christian's questions (and the test case numberformat135: fn:format-number(0.2, '#.e9') expected result '2e-1'). It seems there are three main issues, to do with significant digits, minimum-integer-part-size, and the rules for choosing the mantissa and exponent.

Recall that the definition of minimum-integer-part-size (in 4.7.4) is currently:
"an integer indicating the minimum number of digits that will appear to the left of the decimal-separator character. It is normally set to the number of ·decimal digit family· characters found in the integer part of the sub-picture. But if the sub-picture contains no ·decimal digit family· character and no decimal-separator character, it is set to one."

1. First of all, it would be worth properly defining "significant digit of a number" in the Spec (this has clearly caused some confusion). Specifically, stating that leading zeroes of a number are not significant. So the first (and only) significant digit of 0.2 is the digit '2'. Note that this definition holds for *any* number, not just an integer (Mike only referred to integers in comment 1, but in fact we need it for all numbers). Significant digits are referred to in "4.7.5 Formatting the number" in rule 5(c); but also earlier in "4.7.4 Analysing the picture string" - in the Note after the definition of minimum-integer-part-size; and also the Notes section of 4.7.2 (which says: "Numbers will always be formatted with the most significant digit on the left." But I don't know what this means. That numbers read from left to right??? Is it necessary to state that?)

Thus (using 5(c) "the number of significant digits in the integer part of the mantissa is equal to the minimum integer part size") we obtain for example:

fn:format-number(0.2, '0.0e9') has expected result '2.0e-1' (not 0.2e0, for which the mantissa has zero significant digits in its integer part), minimum-integer-part-size = 1.

fn:format-number(0.2, '9e9') has expected result '2e-1' (not 0e0), minimum-integer-part-size = 1.

fn:format-number(0.2, '000.0e9') has expected result '200.0e-3' (not '002.0e-1'). minimum-integer-part-size = 3, and the number of significant digits in the integer part of the mantissa of the result is also 3 (even though the input 'adjusted number' only has one significant digit '2').
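
The three expected results above can be reproduced with a small Python sketch. It does not parse picture strings: the function name `format_exponential` and its parameters `min_int` and `frac_digits` are hypothetical stand-ins for minimum-integer-part-size and a fixed number of fractional digits, and the sketch assumes a non-zero input, `min_int >= 1`, and no rounding carry into a new leading digit.

```python
from decimal import Decimal


def format_exponential(number: Decimal, min_int: int, frac_digits: int) -> str:
    """Scale number so the mantissa's integer part has min_int significant
    digits, then print mantissa and exponent separated by 'e'.

    Assumptions: number != 0, min_int >= 1, fixed frac_digits.
    """
    # Power of ten of the leading digit, shifted by the integer part size.
    exponent = number.adjusted() + 1 - min_int
    mantissa = number.scaleb(-exponent)
    # Decimal supports fixed-point format specs such as '.1f'.
    return f"{mantissa:.{frac_digits}f}e{exponent}"
```

Under these assumptions, the pictures '0.0e9', '9e9' and '000.0e9' correspond to `(min_int, frac_digits)` of (1, 1), (1, 0) and (3, 1) respectively.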

2. It appears that actually 4.7.5 rule 5 does not uniquely define the mantissa and exponent in the case that minimum-integer-part-size = 0. Consider fn:format-number(0.002, '.0e0'), for which minimum-integer-part-size = 0. Rule (b) allows (M,E) to be (0.002, 0) or (0.02, -1) or (0.2, -2) (as well as infinitely many other solutions). For each of these, the number of significant digits in the integer part of M is zero, so rule (c) cannot choose between them. What should the expected result be? e.g. the options supplied give '.0e0' or '.0e-1' or '.2e-2'. The last one looks most meaningful here.

Also consider fn:format-number(0.002, '.000e0') We have the same options for (M,E), but now the results are '.002e0' or '.020e-1' or '.200e-2'. Which should be expected?
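
The ambiguity in point 2 can be made concrete with a short Python sketch that enumerates every (M, E) pair satisfying rules (b) and (c) over a range of exponents. The names `significant_integer_digits` and `candidates` are hypothetical, and the exponent range is an arbitrary window chosen for illustration.

```python
from decimal import Decimal


def significant_integer_digits(m: Decimal) -> int:
    """Count the digits of the integer part of m, excluding leading zeroes."""
    integer_part = int(abs(m))
    return 0 if integer_part == 0 else len(str(integer_part))


def candidates(number: Decimal, min_int: int, exponents: range) -> list[tuple[Decimal, int]]:
    """Return every (mantissa, exponent) in the given exponent window with
    mantissa * 10**exponent == number (rule b) whose integer part has
    exactly min_int significant digits (rule c)."""
    found = []
    for e in exponents:
        m = number.scaleb(-e)  # exact shift of the decimal point
        if significant_integer_digits(m) == min_int:
            found.append((m, e))
    return found
```

For 0.002 with `min_int = 0`, all three pairs named above survive the filter within `range(-2, 1)`, whereas with `min_int = 1` only a single pair (2, -3) survives within `range(-5, 3)`.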

3. Again recall the definition of minimum-integer-part-size:
"It is normally set to the number of ·decimal digit family· characters found in the integer part of the sub-picture. But if the sub-picture contains no ·decimal digit family· character and no decimal-separator character, it is set to one."

I think the last sentence should be changed to (A) "But if the *mantissa part* contains no ·decimal digit family· character and no decimal-separator character, it is set to one.", to catch the case for the picture '#e9' (as well as '#', '###', etc).
e.g. fn:format-number(0.2, '#e9')
Note that the sub-picture contains a ·decimal digit family· character (though the mantissa part doesn't), so by the current definition minimum-integer-part-size = 0 (it is not set to one). We actually now hit the problem described in point 2. With the new definition minimum-integer-part-size is set to one, and you get the result '2e-1'.

In fact, as far as I can see, this last sentence was redundant before the introduction of formatting using exponential notation. The formatting rules in 4.7.5 would produce the desired results even without setting the minimum-integer-part-size to one for certain cases i.e. '#', '###', etc. (Compare to the analysis for '.#', for which the minimum-integer-part-size is not set to one.) The sentence *is* now necessary because I think you *do* want to set the minimum-integer-part-size to one in the case '#e9'.

What about '#.' and '#.suffix' and '#.e9'? Should these pictures really be valid? They are a bit odd, but are currently allowed. For each of these, minimum-integer-part-size=0 (by the old and suggested new definition), and by the formatting rules in 4.7.5 the first two produce results which I think are reasonable:
e.g. fn:format-number(0.2, '#.') result is '0'
fn:format-number(1.2, '#.') result is '1'
fn:format-number(0.2, '#.suffix') result is '0suffix'
fn:format-number(1.2, '#.suffix') result is '1suffix'

So finally, we get to fn:format-number(0.2, '#.e9'), test case numberformat135. If the picture '#.e9' is indeed supposed to be valid, I think you should expect the same results as for the picture '#e9'. Again minimum-integer-part-size should be set to one (in fact, we already thought this happened (me in the expected result, and you in the analysis in comment 4) but as described above, with the current definition this is not the case). Unfortunately, just dealing with this case requires further intricate tweaking of the definition, something like "But if the mantissa part contains no ·decimal digit family· character and no decimal-separator character, or if the mantissa part is not the whole sub-picture but contains no ·decimal digit family· character and the fractional part is empty, it is set to one." Then the expected result will be '2e-1'.

A better solution would be to change the definition to (B) "But if the mantissa part contains no ·decimal digit family· character and the fractional part is either empty or contains only passive characters, it is set to one." Which would mean that the minimum-integer-part-size is set to one for all of the pictures '#', '###', '#.', '#.suffix', '#.e9'; and you get (what I believe are) the right results.

Comment 6 O'Neil Delpratt 2015-07-31 10:00:38 UTC

In light of comment #5 I am changing this bug issue to a bug against the spec.

Comment 7 Debbie Lockett 2015-08-06 17:05:07 UTC

FYI, I have added a bunch of new tests to the format-number test set (numberformat201 - numberformat263) covering the examples I gave above, and more. Some of the results depend on the outcome of this bug, so I have labelled them as such.

Comment 8 Michael Kay 2015-09-09 17:51:56 UTC

A proposed resolution is at

https://lists.w3.org/Archives/Public/public-xsl-query/2015Sep/0008.html

Comment 9 Debbie Lockett 2015-09-10 09:33:04 UTC

Subtle amendment to the proposed adjustment to the rules for minimum-integer-part-size:

if minimum-integer-part-size and maximum-fractional-part-size are both zero, then:
	• if there is an exponent separator, set *minimum-fractional-part-size* (and maximum-fractional-part-size) to 1;
	• otherwise set minimum-integer-part-size to 1.

Corresponding amendment to 4.7.4 new rule:
if the effect of the above rules is that minimum-integer-part-size and maximum-fractional-part-size are both zero, then an adjustment is applied as follows: if an exponent separator is present then minimum-fractional-part-size is changed to 1 (one) (and so is maximum-fractional-part-size); otherwise minimum-integer-part-size is changed to 1 (one).

Then in the first example:
fn:format-number(0.2, '#.e9') => ".2e1"

(Explanation: min-int-part-size=0, scaling-factor=0. Initially max-frac-part-size=0, but it and min-frac-part-size get adjusted to 1 by the new rule. A '#' in the integer part of the picture has no effect except on grouping separators. But see the CODA, which would change the result to 0.2e1)
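
The amended adjustment rule can be written down as a few lines of Python. This is a sketch of the rule as worded in this comment only; the function name `adjust_sizes` and the flag `has_exponent_separator` are hypothetical, and the sizes are assumed to have been computed already by the earlier rules of 4.7.4.

```python
def adjust_sizes(
    min_int: int,
    min_frac: int,
    max_frac: int,
    has_exponent_separator: bool,
) -> tuple[int, int, int]:
    """Apply the adjustment from this comment and return the resulting
    (minimum-integer-part-size, minimum-fractional-part-size,
    maximum-fractional-part-size)."""
    if min_int == 0 and max_frac == 0:
        if has_exponent_separator:
            # Exponent present: force one fractional digit.
            min_frac = 1
            max_frac = 1
        else:
            # No exponent: force one integer digit.
            min_int = 1
    return min_int, min_frac, max_frac
```

For the picture '#.e9', starting from sizes (0, 0, 0) with an exponent separator, the sketch yields (0, 1, 1), which matches the explanation above. Sizes such as (0, 0, 1) are left unchanged because maximum-fractional-part-size is already non-zero.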

Comment 10 Tim Mills 2015-09-14 13:40:11 UTC

If the input to fn:format-number is xs:decimal, the scaling in rules:

(4) If the sub-picture contains a percent character, the number is multiplied by 100. If the sub-picture contains a per-mille character, the number is multiplied by 1000. The resulting number is referred to below as the adjusted number.

(5) If the minimum exponent size is non-zero, then the adjusted number is scaled to establish a mantissa and an integer exponent. The mantissa and exponent are chosen such that (a) the primitive type of the mantissa is the same as the primitive type of the adjusted number (integer, decimal, float, or double), (b) the mantissa multiplied by ten to the power of the exponent is equal to the adjusted number, and (c) the number of significant digits in the integer part of the mantissa is equal to the minimum integer part size.

may exceed the implementation's size limit for xs:decimal.  Should this not be avoided?

Comment 11 Tim Mills 2015-09-14 15:10:01 UTC

Also consider format-number with a large double argument, e.g. 1.7976931348623157E+308, and the use of a percent-sign or per-mille-sign multiplier, which will cause the adjusted number to become infinity.

Comment 12 Michael Kay 2015-09-14 16:50:05 UTC

The fact that the percent and per-mille scaling can cause overflow (comment #11) has been with us since XSLT 1.0 and is a purely theoretical problem because no-one is likely to use percentages with such extreme values. There are edge cases that we really don't need to legislate for.

I don't think that decimal overflow due to mantissa/exponent scaling is a very serious problem either. I'm not sure our specs say so explicitly but I think that in practice xs:decimal is usually implemented as floating point decimal. The effect of format-number(), unless you write a really perverse picture, will almost invariably change the exponent to be something fairly close to zero and therefore very unlikely to blow any limits.

Comment 13 Michael Kay 2015-09-30 11:05:37 UTC

The Working Group accepted the proposal in comment #8 (including CODA part (b)) as amended by comment #9.

The separate issue raised in comments #10 and #11 remains outstanding, and the editor was asked to propose a resolution.

The proposed resolution (in 4.7.5) is:

(1) in rule 4, about multiplying by 100 or 1000, specify that when this causes overflow, the adjusted number is positive or negative infinity as appropriate

(2) in rule 3, about infinity, say "if the *adjusted* number is positive or negative infinity"

(3) move rule 3 after rule 4.

(I have applied these changes in anticipation.)
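
For the xs:double case, the behaviour described in change (1) can be illustrated in Python, whose `float` is an IEEE 754 double on common platforms. The function name `adjusted_number` and its flags are hypothetical, and the sketch does not cover xs:decimal inputs, where overflow handling depends on the implementation.

```python
import math


def adjusted_number(value: float, percent: bool = False, per_mille: bool = False) -> float:
    """Multiply by 100 for a percent sign or by 1000 for a per-mille sign.

    With IEEE 754 doubles, a product that is too large becomes positive or
    negative infinity rather than raising an error.
    """
    if percent:
        value *= 100
    if per_mille:
        value *= 1000
    return value


def is_infinite_after_adjustment(value: float, percent: bool = False, per_mille: bool = False) -> bool:
    """Test for infinity on the adjusted number, not the original input,
    mirroring changes (2) and (3)."""
    return math.isinf(adjusted_number(value, percent, per_mille))
```

The largest finite double from comment 11 is not infinite on its own, but becomes infinite once the percent multiplier is applied, which is why the infinity check has to run after the multiplication.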

Comment 14 Tim Mills 2015-09-30 11:38:37 UTC

(In reply to Michael Kay from comment #13)
> The Working Group accepted the proposal in comment #8 (including CODA part
> (b)) as amended by comment #9.
> 
> The separate issue raised in comments #10 and #11 remain outstanding, and
> the editor was asked to propose a resolution.
> 
> The proposed resolution (in 4.7.5) is:
> 
> (1) in rule 4, about multiplying by 100 or 1000, specify that when this
> causes overflow, the adjusted number is positive or negative infinity as
> appropriate

Just to confirm, am I correct in thinking that if the input is an xs:decimal $d, the adjusted number is of type

1. xs:double if $d * 1000 would cause a numeric overflow error
2. xs:decimal otherwise.

Comment 15 Michael Kay 2015-09-30 11:51:13 UTC

> am I correct in thinking that if the input is an xs:decimal $d, the adjusted number is of type
>
> 1. xs:double if $d * 1000 would cause a numeric overflow error
> 2. xs:decimal otherwise.

Yes. Well, the rule says it's Infinity, and you could actually use either xs:double or xs:float infinity, both would produce the same result. I did think about whether I need to say more precisely what "multiply by 100" means, e.g. by referring to op:numeric-multiply, but it seems excessively paranoid.

Comment 16 Michael Kay 2015-10-06 17:03:41 UTC

The change in comment #13 (which had already been applied to the spec, and for which test cases are available) was approved by the WG today, so the bug is now resolved.

Comment 17 Tim Mills 2015-10-08 15:46:45 UTC

Sorry to reopen this, but I believe that under the new rules, the query

format-number(0, '.#')

which has

minimum-integer-part-size = 0
minimum-fractional-part-size = 0
maximum-fractional-part-size = 1

results in the empty string.

My intuition was that the following results would follow:

format-number(0,   '.#') -> .0
format-number(0.1, '.#') -> .1
format-number(1,   '.#') -> 1
format-number(1.1, '.#') -> 1.1

but I think my intuition might not be correct.

Comment 18 Michael Kay 2015-10-08 15:57:07 UTC

Isn't that a duplicate of bug #29164?

Comment 19 Tim Mills 2015-10-08 16:12:13 UTC

Sorry!  Thank goodness, so it is.