2644 – [F+O] Conversion from float/double to string

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Functions and Operators 1.0
Version:	Candidate Recommendation
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Ashok Malhotra
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2006-01-05 15:00 UTC by Michael Kay
Modified:	2007-02-25 23:22 UTC
CC List:	0 users

Description Michael Kay 2006-01-05 15:00:54 UTC

The rules for conversion of a float or double to a string do not give an
unambiguous answer. The rule as stated in 17.1.2 is:

Beyond the one required digit after the decimal point in the mantissa, there
must be as many, but only as many, additional digits as are needed to uniquely
distinguish the value from all other values for the datatype after rounding the
final digit.

Most of this comes straight from XPath 1.0, except for the phrase "after
rounding the final digit". I've no idea what that phrase is supposed to mean.
Final digit of what, pray?

Having dealt with that, what does the rest mean?

The particular test case that gave rise to this bug report is casthcds14. This
generates a float value whose internal IEEE representation is x

Consider, for example, the float whose internal IEEE representation is
x58901723. The table below shows that this can be produced by parsing any string
in the range 1.26743230E15 to 1.26743243E15. What is the correct string
representation of this float value? The expected test results are stated as
1.26743233E15. Saxon (and Java) produce 1.26743237E15. The only way I can read
the spec, however, suggests that the result should be 1.2674323E15.
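
The claim that all three candidate strings denote the same float can be checked mechanically. The following is a minimal Python sketch, not part of any specification; the helper names float32_bits and float32_exact_int are invented for illustration. It parses each string as a double and then rounds to single precision, which is assumed to be safe here because these particular decimal values are integers below 2^53 and so are exactly representable as doubles.

```python
import struct

def float32_bits(text):
    """Parse a decimal string and return the IEEE single-precision bit pattern."""
    # Assumption: going through a Python float (a double) introduces no extra
    # rounding for the strings used here, which are exact integers below 2**53.
    return struct.unpack(">I", struct.pack(">f", float(text)))[0]

def float32_exact_int(bits):
    """Return the exact value of a float32 bit pattern (valid when it is an integer)."""
    return int(struct.unpack(">f", struct.pack(">I", bits))[0])

if __name__ == "__main__":
    # Expected test result, Saxon/Java result, and the shorter reading of the spec.
    for candidate in ("1.26743233E15", "1.26743237E15", "1.2674323E15"):
        print(candidate, "->", format(float32_bits(candidate), "x"))
    print("exact value:", float32_exact_int(0x58901723))
```

All three strings map to the same bit pattern, so the disagreement is only about which of the round-tripping strings is the canonical one.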

And even if we can decide what the rules really mean, is there a known efficient
algorithm for implementing them?

The Java rule is clear and unambiguous:

[Where m is the mantissa] Let n be the unique integer such that 10^n <= m <
10^(n+1); then let a be the mathematically exact quotient of m and 10^n so that 1
<= a < 10. The magnitude is then represented as the integer part of a, as a
single decimal digit, followed by '.' ('\u002E'), followed by decimal digits
representing the fractional part of a.
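
To make the quantities n and a concrete, here is a small Python sketch that computes them exactly for a given float32 bit pattern. The function name exact_a_and_n is hypothetical. It relies on the fact that constructing a decimal.Decimal from a Python float yields the exact decimal expansion of the binary value. Note that this shows the full exact expansion of a, which is not the same as the shorter string Java actually prints.

```python
import struct
from decimal import Decimal

def exact_a_and_n(bits):
    """Return (a, n) for a finite float32 bit pattern, with a as a digit string."""
    value = struct.unpack(">f", struct.pack(">I", bits))[0]
    m = abs(Decimal(value))          # exact magnitude of the binary float
    n = m.adjusted()                 # unique n with 10^n <= m < 10^(n+1)
    digits = "".join(str(d) for d in m.as_tuple().digits)
    # Integer part of a is the leading digit; the rest is the fractional part.
    fraction = digits[1:].rstrip("0") or "0"
    return digits[0] + "." + fraction, n

if __name__ == "__main__":
    print(exact_a_and_n(0x58901723))
```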

I would propose that we adopt the Java rules.


Table: 
Column 1: a string
Column 2: the internal (hexadecimal) representation of the IEEE float produced
by parsing this string as a float 
Column 3: the Java string representation of the float in column 2. As it
happens, in this sequence 7 digits after the decimal point are always enough to
distinguish the value. But will this always be the case, and how does one know?

1.26743200E15 = 58901720 = 1.26743196E15
1.26743201E15 = 58901720 = 1.26743196E15
1.26743202E15 = 58901720 = 1.26743196E15
1.26743203E15 = 58901720 = 1.26743196E15
1.26743204E15 = 58901721 = 1.2674321E15
1.26743205E15 = 58901721 = 1.2674321E15
1.26743206E15 = 58901721 = 1.2674321E15
1.26743207E15 = 58901721 = 1.2674321E15
1.26743208E15 = 58901721 = 1.2674321E15
1.26743209E15 = 58901721 = 1.2674321E15
1.26743210E15 = 58901721 = 1.2674321E15
1.26743211E15 = 58901721 = 1.2674321E15
1.26743212E15 = 58901721 = 1.2674321E15
1.26743213E15 = 58901721 = 1.2674321E15
1.26743214E15 = 58901721 = 1.2674321E15
1.26743215E15 = 58901721 = 1.2674321E15
1.26743216E15 = 58901721 = 1.2674321E15
1.26743217E15 = 58901722 = 1.26743223E15
1.26743218E15 = 58901722 = 1.26743223E15
1.26743219E15 = 58901722 = 1.26743223E15
1.26743220E15 = 58901722 = 1.26743223E15
1.26743221E15 = 58901722 = 1.26743223E15
1.26743222E15 = 58901722 = 1.26743223E15
1.26743223E15 = 58901722 = 1.26743223E15
1.26743224E15 = 58901722 = 1.26743223E15
1.26743225E15 = 58901722 = 1.26743223E15
1.26743226E15 = 58901722 = 1.26743223E15
1.26743227E15 = 58901722 = 1.26743223E15
1.26743228E15 = 58901722 = 1.26743223E15
1.26743229E15 = 58901722 = 1.26743223E15
1.26743230E15 = 58901723 = 1.26743237E15
1.26743231E15 = 58901723 = 1.26743237E15
1.26743232E15 = 58901723 = 1.26743237E15
1.26743233E15 = 58901723 = 1.26743237E15
1.26743234E15 = 58901723 = 1.26743237E15
1.26743235E15 = 58901723 = 1.26743237E15
1.26743236E15 = 58901723 = 1.26743237E15
1.26743237E15 = 58901723 = 1.26743237E15
1.26743238E15 = 58901723 = 1.26743237E15
1.26743239E15 = 58901723 = 1.26743237E15
1.26743240E15 = 58901723 = 1.26743237E15
1.26743241E15 = 58901723 = 1.26743237E15
1.26743242E15 = 58901723 = 1.26743237E15
1.26743243E15 = 58901723 = 1.26743237E15
1.26743244E15 = 58901724 = 1.2674325E15
1.26743245E15 = 58901724 = 1.2674325E15
1.26743246E15 = 58901724 = 1.2674325E15
1.26743247E15 = 58901724 = 1.2674325E15
1.26743248E15 = 58901724 = 1.2674325E15
1.26743249E15 = 58901724 = 1.2674325E15
1.26743250E15 = 58901724 = 1.2674325E15
1.26743251E15 = 58901724 = 1.2674325E15
1.26743252E15 = 58901724 = 1.2674325E15
1.26743253E15 = 58901724 = 1.2674325E15
1.26743254E15 = 58901724 = 1.2674325E15
1.26743255E15 = 58901724 = 1.2674325E15
1.26743256E15 = 58901724 = 1.2674325E15
1.26743257E15 = 58901725 = 1.26743264E15
1.26743258E15 = 58901725 = 1.26743264E15
1.26743259E15 = 58901725 = 1.26743264E15
1.26743260E15 = 58901725 = 1.26743264E15
1.26743261E15 = 58901725 = 1.26743264E15
1.26743262E15 = 58901725 = 1.26743264E15
1.26743263E15 = 58901725 = 1.26743264E15
1.26743264E15 = 58901725 = 1.26743264E15
1.26743265E15 = 58901725 = 1.26743264E15
1.26743266E15 = 58901725 = 1.26743264E15
1.26743267E15 = 58901725 = 1.26743264E15
1.26743268E15 = 58901725 = 1.26743264E15
1.26743269E15 = 58901725 = 1.26743264E15
1.26743270E15 = 58901725 = 1.26743264E15
1.26743271E15 = 58901726 = 1.26743277E15
1.26743272E15 = 58901726 = 1.26743277E15
1.26743273E15 = 58901726 = 1.26743277E15
1.26743274E15 = 58901726 = 1.26743277E15
1.26743275E15 = 58901726 = 1.26743277E15
1.26743276E15 = 58901726 = 1.26743277E15
1.26743277E15 = 58901726 = 1.26743277E15
1.26743278E15 = 58901726 = 1.26743277E15
1.26743279E15 = 58901726 = 1.26743277E15
1.26743280E15 = 58901726 = 1.26743277E15
1.26743281E15 = 58901726 = 1.26743277E15
1.26743282E15 = 58901726 = 1.26743277E15
1.26743283E15 = 58901726 = 1.26743277E15
1.26743284E15 = 58901727 = 1.2674329E15
1.26743285E15 = 58901727 = 1.2674329E15
1.26743286E15 = 58901727 = 1.2674329E15
1.26743287E15 = 58901727 = 1.2674329E15
1.26743288E15 = 58901727 = 1.2674329E15
1.26743289E15 = 58901727 = 1.2674329E15
1.26743290E15 = 58901727 = 1.2674329E15
1.26743291E15 = 58901727 = 1.2674329E15
1.26743292E15 = 58901727 = 1.2674329E15
1.26743293E15 = 58901727 = 1.2674329E15
1.26743294E15 = 58901727 = 1.2674329E15
1.26743295E15 = 58901727 = 1.2674329E15
1.26743296E15 = 58901727 = 1.2674329E15
1.26743297E15 = 58901727 = 1.2674329E15
1.26743298E15 = 58901728 = 1.26743304E15
1.26743299E15 = 58901728 = 1.26743304E15

Comment 1 Michael Kay 2006-01-17 21:10:07 UTC

After doing some further work on this, I'm going to propose three options.

Option A: don't prescribe the rounding rules, except that the value must be a
lexical representation of the original float or double (to allow round-tripping
without loss of precision).

Option B: specify that the value must be an exact decimal representation of the
original binary value: no rounding or truncation of significant digits is
allowed even if the resulting value would round-trip to the original value.
(This appears to be what Java does, as distinct from what it says it does.)

Option C: specify the rounding rules. Here is text that does that:

In 17.1.2, delete the text

"Besides these special values, the general form of the canonical form for
xs:float and xs:double is a mantissa, .......... the value from all other values
for the datatype after rounding the final digit."

replacing it with:

"For other values, the canonical representation consists of a minus sign '-'
(x2D) if the value is negative, followed by the magnitude m (absolute value),
represented as follows. Let n be the unique integer such that 10^n <= m <
10^(n+1); then let a be the mathematically exact quotient of m and 10^n so that
1 <= a < 10. The magnitude is then represented as the integer part of a, as a
single decimal digit, followed by '.' (x2E), followed by decimal digits
representing the fractional part of a, followed by the letter 'E' (x45),
followed by a representation of n as a decimal integer, as produced by the rules
for converting xs:integer to xs:string. Suppose that the string of decimal
digits that exactly represents the fractional part of a is S. So long as the
length of S is at least one, if rounding of the last digit in S results in a
string that is a lexical representation of the original xs:float or xs:double
value, then such rounding is performed; and this process is repeated. The
rounding is done up or down according to the rules of the
fn:round-half-to-even() function."
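
One possible reading of the Option C text can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the names to_float32 and option_c_string are invented, only positive finite non-zero xs:float values are handled, the test string is converted through a double before rounding to single precision (which could in principle double-round in rare cases), and a rounding step that would carry the mantissa up to 10 simply stops the loop.

```python
import struct
from decimal import Decimal, ROUND_HALF_EVEN, localcontext

def to_float32(x):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def option_c_string(value):
    """value: a positive, finite, non-zero float32 held in a Python float."""
    with localcontext() as ctx:
        ctx.prec = 200                      # ample for any exact float32 expansion
        m = Decimal(value)                  # exact decimal value of the binary float
        n = m.adjusted()                    # 10^n <= m < 10^(n+1)
        a = m.scaleb(-n)                    # 1 <= a < 10, still exact
        if a.as_tuple().exponent > -1:
            a = a.quantize(Decimal("0.1"))  # keep the one mandatory fractional digit
        while a.as_tuple().exponent < -1:   # while S has more than one digit
            places = -a.as_tuple().exponent - 1
            shorter = a.quantize(Decimal(1).scaleb(-places),
                                 rounding=ROUND_HALF_EVEN)
            if shorter >= 10:               # carry case, not handled in this sketch
                break
            try:
                same = to_float32(float(shorter.scaleb(n))) == value
            except OverflowError:           # rounded string exceeds the float32 range
                same = False
            if not same:
                break
            a = shorter
    return f"{a}E{n}"

if __name__ == "__main__":
    sv = struct.unpack(">f", struct.pack(">I", 0x58901723))[0]
    print(option_c_string(sv))
```

Because each step rounds the already-rounded string rather than the exact value, the result depends on the order of rounding; a different reading of "this process is repeated" could give a different string.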

Note: I haven't reproduced the Java words exactly here, because I don't actually
think the Java words are precise. In any case, Java doesn't seem to do what the
spec says it should. (They actually use the same phrase "as many, but only as
many, more digits as are needed to uniquely distinguish" that appears in XPath
1.0, and then embellish them with a further explanation: but the explanation
doesn't seem to explain the results actually produced). I couldn't find a
description of the rules for C#, but the actual behaviour of C# appears to lose
precision - the result of round-tripping a float to a string and back is not
necessarily the original float, which is one of our objectives (but I don't know
the language well and might have missed something). My aim here is to reflect
the intent of the current words, and simply give a precise interpretation of
what I think they were probably intended to mean - not necessarily the only
interpretation possible.

Comment 2 Michael Kay 2006-01-26 15:57:30 UTC

The meeting on 24/1/2006 decided on option A, and I was asked to propose
concrete wording to implement this. Here is the proposed wording.

In F+O, section 17.1.2, bullet 4, sub-bullet 3, add a new sub-bullet 1:

* TV will be a string in the lexical space of xs:double or xs:float that when
converted to an xs:double or xs:float under the rules of section 17.1.1 produces
a value that is equal to SV. In addition, TV must satisfy the constraints in the
following sub-bullets.

In the current sub-sub-bullet 3, replace the current text, including the Note
that follows it, with:

* If SV is NaN, TV is the string "NaN".

* If SV is positive or negative infinity, TV is the string "INF" or "-INF"
respectively.

* In other cases, the result consists of a mantissa, which has the lexical form
of an xs:decimal, followed by the letter "E", followed by an exponent which has
the lexical form of an xs:integer. Leading zeroes and "+" signs are prohibited
in the exponent. For the mantissa, there must be a decimal point, and there must
be exactly one digit before the decimal point, which must be non-zero. The "+"
sign is prohibited. There must be at least one digit after the decimal point.
Apart from this mandatory digit, trailing zero digits are prohibited. 

Note:

The above rules allow more than one representation of the same value. For
example the xs:float value whose exact decimal representation is 1.26743223E15
might be represented by any of the strings "1.26743223E15", "1.26743222E15" or
"1.26743224E15" (inter alia). It is implementation-dependent which of these
representations is chosen.
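
The proposed wording amounts to two checks on a candidate string TV: a lexical check and a round-trip check. The Python sketch below illustrates this for xs:float under some assumptions of its own: the names is_acceptable_tv and to_float32 are hypothetical, the regular expression is one interpretation of the lexical constraints (it also rejects an exponent of "-0", which the wording does not explicitly address), zero is not handled, and parsing goes through a double before rounding to single precision.

```python
import math
import re
import struct

# One non-zero digit, a point, at least one fractional digit with no trailing
# zeros beyond the mandatory one, then "E" and an exponent without "+" or
# leading zeros.
_LEXICAL = re.compile(r"-?[1-9]\.(?:0|[0-9]*[1-9])E(?:0|-?[1-9][0-9]*)")

def to_float32(x):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def is_acceptable_tv(sv, tv):
    """Check whether string tv is an allowed result for the xs:float value sv."""
    if math.isnan(sv):
        return tv == "NaN"
    if math.isinf(sv):
        return tv == ("INF" if sv > 0 else "-INF")
    if _LEXICAL.fullmatch(tv) is None:
        return False
    try:
        return to_float32(float(tv)) == sv   # must convert back to SV
    except OverflowError:                    # tv lies outside the float32 range
        return False

if __name__ == "__main__":
    sv = struct.unpack(">f", struct.pack(">I", 0x58901722))[0]
    for tv in ("1.26743223E15", "1.26743222E15", "1.26743224E15", "1.26743220E15"):
        print(tv, is_acceptable_tv(sv, tv))
```

The last string in the usage loop converts back to the same float but is rejected on lexical grounds because of its trailing zero.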

Comment 3 Michael Kay 2006-01-27 11:44:09 UTC

There's a minor oversight in the proposed new first subbullet: change

TV will be a string in the lexical space of xs:double or xs:float that when
converted to an xs:double or xs:float under the rules of section 17.1.1 produces
a value that is equal to SV.

to read

TV will be a string in the lexical space of xs:double or xs:float that when
converted to an xs:double or xs:float under the rules of section 17.1.1 produces
a value that is equal to SV, or is NaN if SV is NaN.

Comment 4 Jim Melton 2007-02-25 23:22:55 UTC

Closing bug because commenter has not objected to the resolution posted and more than two weeks have passed.