This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 29555 - [FO31] 9.8.4.2 The Width Modifier
Summary: [FO31] 9.8.4.2 The Width Modifier
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-04-04 13:44 UTC by Tim Mills
Modified: 2016-12-16 19:55 UTC

Description Tim Mills 2016-04-04 13:44:44 UTC
In 9.8.4.2 The Width Modifier, we find

"A format token containing more than one digit, such as 001 or 9999, sets the minimum and maximum width to the number of digits appearing in the format token; if a width modifier is also present, then the width modifier takes precedence."

Suppose the format token is "#,###,000".  According to the above, this sets the minimum and maximum width to 3.  Is this intentional?
Comment 1 Michael Kay 2016-04-12 14:49:50 UTC
Yes, this needs clarification. I don't think we thought about this very carefully when migrating format-date/time from XSLT to F+O (the original referred to xsl:number rather than format-integer, and xsl:number does not allow grouping separators or optional digit signs in the picture).

I propose to change:

<old>
A format token containing more than one digit, such as 001 or 9999, sets the minimum and maximum width to the number of digits appearing in the format token; if a width modifier is also present, then the width modifier takes precedence.
</old>

to

<new>
The presentation modifier implicitly defines a minimum and maximum width (for example, the modifier 001 has a minimum and maximum width of 3, while #'##9 defines a minimum width of 1 and a maximum of 5). If a width modifier is also present, then the width modifier takes precedence. Special rules apply to the year, fractional seconds, and timezone components: see below. Values are truncated to the maximum width only in cases where the meaning is still clear.
</new>

There are some non-trivial cases to consider:

* fractional seconds. We made a decision to treat a single-digit token specially: [f1] means as many digits of precision as the implementation chooses. What about things like [f##1] or [f9.99]? I propose: 

(a) in the fractional seconds component, an optional-digit-sign is not allowed to precede a digit sign; in all other components, an optional-digit-sign is not allowed to follow a digit sign.

(b) if the first presentation modifier in the fractional seconds component consists of a single digit, this does not imply a minimum and maximum width of one; rather it implies a minimum width of one and a maximum that is implementation-dependent.

* years. It seems fairly intuitive that [Y##99] should output the year of death of Augustus as "14", the death of Constantine as "337", and the current year as "2016". But 2016 is formatted as "16" if you specify either [Y99] or [Y9,2-2]. So the rule is that truncation of significant high-order digits occurs only for the year component, and it occurs if the maximum width has been set either implicitly (using the first presentation modifier) or explicitly (using the width modifier).

* timezones. We've got a very detailed set of examples for timezones, but they don't include any with a '#'. I think the basic rule (as with integer-valued components like months and seconds) is that '#' is allowed in a leading position but essentially has no effect.

This all suggests the need for some more test cases...
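
As a rough illustration of the implicit widths described in the proposed text of this comment, the Python sketch below counts mandatory digits for the minimum width and all characters (including grouping separators and optional-digit signs) for the maximum. The function name implicit_width is hypothetical, and the counting rule is an assumption read off the two examples given (001 and #'##9), not settled specification text.

```python
def implicit_width(modifier):
    """Return (min_width, max_width) implied by a presentation modifier.

    Assumption: the minimum is the number of mandatory digit signs and the
    maximum is the total length of the modifier, as in the examples
    "001" -> (3, 3) and "#'##9" -> (1, 5).
    """
    minimum = sum(1 for ch in modifier if ch.isdecimal())
    maximum = len(modifier)
    return minimum, maximum


if __name__ == "__main__":
    for token in ("001", "#'##9", "#,###,000"):
        print(token, implicit_width(token))
```

Under this reading the token from the original report, "#,###,000", would imply a minimum of 3 and a maximum of 9 rather than 3 and 3.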
Comment 2 Tim Mills 2016-04-12 14:55:17 UTC
Regarding fractional seconds, it's not entirely clear whether the function formats a decimal value, or an integer value corresponding to the number of milliseconds (or microseconds, or some other division of time based on the number of decimal digits supported by the implementation's xs:time representation).

I had this drafted out to send when you responded on this bug...

Consider the formatting of fractional seconds. e.g.

format-time(xs:time('00:00:00.001'), '[f1]')


In this case the fractional seconds is 0.001.  Suppose that the implementation supports 6 decimal digits in the representation of fractional seconds (i.e. microseconds).  So in this case 0.001 is represented by the integer 1000 (i.e. 0.001 * 10^6).  

I don't think the specification is entirely clear as to what the full representation of this value is.  If it were treated like other integers, it would be 1000, however I suspect the intended answer is 001.

It's not clear how the full representation of the fractional seconds responds to formats such as "[f0/0]".  This appears to set a min-width and max-width of 2 characters.   In the case of the fractional seconds component, the value is rounded to the specified size as if by applying the function round-half-to-even(fractional-seconds, max-width).  So we need to compute the full representation of round-half-to-even(1000, 2), i.e. 1000.  Is the full representation 1/0/0/0, or 0/0/1?  Either way, it's still bigger than the requested minimum width.

If the input were smaller, such that the full representation was less than the min-width, the specification advises adding zero digits in the case of the fractional seconds component.  This would ignore any grouping specifiers.  Is that intentional?
Comment 3 Michael Kay 2016-04-19 09:42:40 UTC
Yes, the rules for fractional seconds are seriously under-specified.

It really doesn't make sense to allow [f###000], rather it makes sense to allow [f000###].

I think we probably need a separate section for formatting fractional seconds, rather as we introduced a new section for formatting timezones. Since grouping separators in the fractional part are so badly broken at the moment as to be effectively useless, I propose to disallow them. This suggests the following text:

9.8.4.4 Formatting Fractional Seconds

Special rules apply to the formatting of timezones. When the component specifier f is used, the rules in this section override any rules given elsewhere in the case of discrepancies.

For the fractional seconds component:

* the first presentation modifier must take the form \d+#* (that is, one or more digits followed by zero or more '#' signs). All digits must be from the same decimal digit family.

* the second presentation modifier must be absent.

* the formatting is controlled by three parameters: the decimal digit family, the minimum width, and the maximum width.

** The decimal digit family is the Unicode digit family used for the digits in the first presentation modifier

** If a width modifier is present, then the minimum and maximum width are determined by the width modifier as described in §§§.

** Otherwise (when no width modifier is present):

*** if the first presentation modifier consists of a single digit, then the minimum width is one and the maximum width is infinity.

*** otherwise, the minimum width is the number of digits in the first presentation modifier, and the maximum width is the total number of characters in the first presentation modifier

The actual value of the fractional seconds in the input value is then formatted as follows:

1. Let V be the value of seconds-from-dateTime(xs:dateTime($value)) mod 1.

2. Let W be the value of round-half-to-even(V, M) where M is the maximum width.

3. Cast W to a string and take the substring after the decimal point.

4. Add trailing zeroes if necessary to pad the value to the minimum width.

5. Replace all digits by corresponding digits from the selected decimal digit family.
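
The five steps above can be sketched in Python roughly as follows. This is only an illustration of the draft text in this comment: the function names are hypothetical, the width modifier is not handled (only the widths implied by the first presentation modifier), and Python's Decimal stands in for xs:decimal, with trailing zeroes stripped to mimic the canonical string form.

```python
from decimal import Decimal, ROUND_HALF_EVEN
import unicodedata


def fraction_widths(modifier):
    """Widths implied by a modifier of the form \\d+#* (no width modifier)."""
    if len(modifier) == 1:
        return 1, None  # single digit: minimum 1, maximum unbounded
    digits = sum(1 for ch in modifier if ch.isdecimal())
    return digits, len(modifier)


def format_fractional_seconds(seconds, modifier):
    min_width, max_width = fraction_widths(modifier)
    v = seconds % 1                                    # step 1
    if max_width is None:
        w = v                                          # unbounded: no rounding
    else:
        quantum = Decimal(1).scaleb(-max_width)
        w = v.quantize(quantum, rounding=ROUND_HALF_EVEN)  # step 2
    # Step 3: digits after the decimal point, without trailing zeroes.
    fraction = format(w, "f").partition(".")[2].rstrip("0")
    fraction = fraction.ljust(min_width, "0")          # step 4
    # Step 5: map onto the digit family of the modifier's first digit.
    zero = ord(modifier[0]) - unicodedata.decimal(modifier[0])
    return "".join(chr(zero + int(d)) for d in fraction)


if __name__ == "__main__":
    print(format_fractional_seconds(Decimal("33.123456"), "001"))
    print(format_fractional_seconds(Decimal("0.001"), "1"))
```

Note that a value which rounds up to a whole second (for example 0.9996 at three digits) leaves an empty fraction that is then padded with zeroes; the draft steps do not say how that case should be treated.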
Comment 4 Michael Kay 2016-04-19 15:22:46 UTC
Noted on the call that we need to clarify that the minimum and maximum width include grouping separators.
Comment 5 Josh Spiegel 2016-04-20 19:15:11 UTC
Does it prohibit the minimum from being greater than the maximum anywhere?  e.g. [f,4-2] 

The rules in comment 3 still work in this case but I am not sure what the point would be.  I suppose this should be an error.
Comment 6 Josh Spiegel 2016-04-20 19:25:00 UTC
Typo: "Special rules apply to the formatting of timezones."

timezones -> fractional seconds
Comment 7 Josh Spiegel 2016-04-27 15:35:47 UTC
On the last call, I think the decision was that grouping separators will contribute to the effective maximum width but will not contribute to the minimum width.  And comment 1 above proposes ".. while #'##9 defines a minimum width of 1 and a maximum of 5". And then "... rule is that truncation of significant high-order digits occurs only for the year component, and it occurs if the maximum width has been set either implicitly (using the first presentation modifier) or explicitly (using the width modifier)."

This is also consistent with the specification which says "In the case of the year component, setting max-width requests omission of high-order digits from the year, for example, if max-width is set to 2 then the year 2003 will be output as 03."

So putting this all together, I would expect this query:

  declare variable $dt := xs:date("2014-01-01");
  format-date($dt, "[Y0-0]")

To produce:

  "2-0-1-4"

Since the maximum is 3.

I would then expect this query:

  declare variable $dt := xs:date("2014-01-01");
  format-date($dt, "[Y00]")

To produce:

  "14"

Since the maximum is 2.

I don't think it makes sense that grouping separators should impact the decision to truncate the year.  Or am I missing something?
Comment 8 Abel Braaksma 2016-04-29 22:26:12 UTC
A few additional observations that I think fit the scope of this bug-report (while supporting the proposed solutions):

(1)
Typo in the 2nd of the last three bullets of "9.8.4.3 The Width Modifier", s/should/SHOULD/, i.e. replace:

   "For timezone offsets this should be done by"

with:

   "For timezone offsets this SHOULD be done by"

(2)
Also, that same paragraph suggests that:

[ZN,5] with tz 00:00 as "GMT:00"
[ZN,8] with tz 00:00 as "000GMT:00"
[Z00,6] with tz as 05:30 as "005:30" (this seems the intended meaning)
[ZZ,8-*] with tz as 05:00 as "0000R:00"

I think this can be fixed by specifying, just like the first bullet point, that it only applies when numeric tz representations are requested. Perhaps by starting the paragraph with:

   "For numerical representations of timezone offsets, this SHOULD be [...]" 

(3)
And one more nitpick: the paragraph just before this section may be more correct or complete if it added "by choosing one of the following", i.e.:

   "If the full representation of the value is shorter than the specified 
   minimum width, then the processor should pad the value to the specified 
   width by choosing one and only one of the following options:"

(this prevents the overlap between "numerical presentation of numbers" and "timezone offsets")
Comment 9 Abel Braaksma 2016-04-29 22:28:32 UTC
(In reply to Abel Braaksma from comment #8)
> [ZN,5] with tz 00:00 as "GMT:00"
> [ZN,8] with tz 00:00 as "000GMT:00"
Apparently I have limited counting skills, these examples should of course be:

[ZN,6] with tz 00:00 as "GMT:00"
[ZN,9] with tz 00:00 as "000GMT:00"
Comment 10 Michael Kay 2016-05-03 10:30:37 UTC
ACTION A-641-02 (bug 29555) MK to propose a precise algorithm for computing min and max width of a format-date component; also define an error when min>max. 

I propose the following rules, which apply when

(a) the component in question is an ordinary integer-valued component (specifically, one of MDdFWwHhms), and

(b) the first presentation modifier takes the form of a decimal digit pattern

1. If there is no width modifier, then the value is formatted according to the rules of the format-integer function.

2. If there is a width modifier, then the first presentation modifier is adjusted as follows:

2.a The number of mandatory-digit-sign characters in the presentation modifier is increased if necessary. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the right, and then prepending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier if there are any, or ASCII digits otherwise.

2.b The maximum width, if specified, is ignored.

The rules for the Year component (Y) are the same except that the year as output is the value of the year component of the supplied value modulo ten to the power N where N is determined as follows:

(i) if the width modifier is present and includes a maximum width, then that maximum width

(ii) otherwise, if the first presentation modifier takes the form of a decimal-digit-pattern, then the number of optional-digit-signs and mandatory-digit-signs in that decimal-digit-pattern.

Finally: we add a rule that if there is a width modifier that specifies both a minimum and maximum width, then FOFD1340 is reported if the minimum is greater than the maximum.
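
The min > max check could look roughly like the Python sketch below. It assumes the width modifier has the shape ",min" or ",min-max", where each width is an unsigned integer or "*"; the function name parse_width_modifier and the use of ValueError to carry the FOFD1340 code are illustrative choices only.

```python
import re

_WIDTH = re.compile(r",(\*|[0-9]+)(?:-(\*|[0-9]+))?")


def parse_width_modifier(text):
    """Parse ',min' or ',min-max'; '*' (or an absent maximum) means unbounded.

    Returns (minimum, maximum) with None standing for "unbounded".
    Raises ValueError mentioning FOFD1340 when minimum > maximum.
    """
    match = _WIDTH.fullmatch(text)
    if match is None:
        raise ValueError("not a width modifier: %r" % text)
    low, high = match.groups()
    minimum = None if low == "*" else int(low)
    maximum = None if high in (None, "*") else int(high)
    if minimum is not None and maximum is not None and minimum > maximum:
        raise ValueError("FOFD1340: minimum width exceeds maximum width")
    return minimum, maximum


if __name__ == "__main__":
    print(parse_width_modifier(",2-2"))
    print(parse_width_modifier(",8-*"))
```

With this sketch the modifier from comment 5, ",4-2", is rejected, while ",2-2" and ",8-*" are accepted.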
Comment 11 Josh Spiegel 2016-05-04 02:38:17 UTC
In comment 10, I think we may need a special case related to (ii).  If the pattern consists of a single mandatory digit, the year value should not be truncated.  e.g. [Y1] formats this year as "2016" and not "6".
Comment 12 Andrew Coleman 2016-05-06 09:45:11 UTC
At the meeting on 2016-05-03, the WG decided to adopt the proposal in comment #10.  Action A-642-01 was raised to track this.
Comment 13 Benito van der Zander 2016-05-10 14:49:02 UTC
From #29616:

Now the truncation does not seem to be consistent between 

  format-time(xs:time("12:22:33.123456"), "[f00,2]")
and
  format-date(xs:date('2016-01-01'), '[Y00,2]')
Comment 14 Benito van der Zander 2016-05-10 15:33:24 UTC
>Since grouping separators in the fractional part are so badly broken at the moment as to be effectively useless, I propose to disallow them. 

But that is where they seem to be the most useful

Consider 1.234567890123 seconds. Writing it as 1.234 567 890 123 makes it much more readable.

In the other components, what would you want to do? Write December as 1.2 ?
Comment 15 Michael Kay 2016-06-17 16:40:09 UTC
Here is a revised proposal that attempts to take into account comments on the previous effort.

ACTION A-641-02 (bug 29555) MK to propose a precise algorithm for computing min and max width of a format-date component; also define an error when min>max.

ACTION A-643-01: MikeK to update proposal on Bug 29555.  

The rules are defined separately for different kinds of component.

(A) For ordinary integer-valued components (specifically, one of MDdFWwHhms)

(A.1) If the first presentation modifier takes the form of a decimal digit pattern:

A.1.1. If there is no width modifier, then the value is formatted according to the rules of the format-integer function.

A.1.2. If there is a width modifier, then the first presentation modifier is adjusted as follows:

A.1.2.a If the decimal digit pattern includes a grouping separator, the output is implementation-defined (but this is not an error). [Note: use of a width modifier together with grouping separators is inadvisable for this reason. It is never necessary to use a width modifier with a decimal digit pattern, since the same effect can be achieved by use of optional digit signs]

A.1.2.b Otherwise, the number of mandatory-digit-sign characters in the presentation modifier is increased if necessary. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the right, and then prepending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier if there are any, or ASCII digits otherwise.

A.1.2.c The maximum width, if specified, is ignored.

A.1.2.d The output is then as defined using the format-integer function with this adjusted decimal digit pattern.
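
The adjustment in A.1.2.b can be sketched in Python as below, for patterns without grouping separators (the A.1.2.a case is excluded). The name adjust_pattern is hypothetical, and using the zero digit of the existing family for every added mandatory-digit-sign is an assumption; the proposal only requires that added digits come from the same family.

```python
import unicodedata


def adjust_pattern(pattern, min_width):
    """Raise the number of mandatory digit signs in `pattern` to `min_width`.

    Optional-digit signs ('#') are converted from the right first; any
    remaining shortfall is made up by prepending mandatory digits.
    """
    mandatory = [ch for ch in pattern if ch.isdecimal()]
    zero = "0"  # ASCII fallback when the pattern has no mandatory digit
    if mandatory:
        first = mandatory[0]
        zero = chr(ord(first) - unicodedata.decimal(first))
    chars = list(pattern)
    count = len(mandatory)
    for i in range(len(chars) - 1, -1, -1):
        if count >= min_width:
            break
        if chars[i] == "#":
            chars[i] = zero
            count += 1
    if count < min_width:
        chars = [zero] * (min_width - count) + chars
    return "".join(chars)


if __name__ == "__main__":
    print(adjust_pattern("#1", 3))
    print(adjust_pattern("##1", 2))
```

The adjusted pattern would then be handed to format-integer, as A.1.2.d says; the maximum width plays no part here.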

(A.2) If the first presentation modifier is one of N, n, or Nn:

Let FN be the full name of the component, that is, the form of the name that would be used in the absence of any width modifier.

If FN is shorter than the minimum width, then it is padded by appending spaces to the end of the name.

If FN is longer than the maximum width, then it is abbreviated, either by choosing a conventional abbreviation that fits within the maximum width (for example, "Wednesday" might be abbreviated to "Weds"), or by removing characters from the end of FN until it fits within the maximum width.
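
A small Python sketch of the (A.2) rules for names follows. The function fit_name and its abbreviations parameter are hypothetical; which conventional abbreviations exist is left to the caller, and padding an abbreviation that ends up shorter than the minimum width is an assumption that goes slightly beyond the text.

```python
def fit_name(full_name, min_width=None, max_width=None, abbreviations=()):
    """Fit a component name into [min_width, max_width].

    `abbreviations` is an ordered list of conventional short forms to try
    before falling back to cutting characters from the end.
    """
    name = full_name
    if max_width is not None and len(name) > max_width:
        for candidate in abbreviations:
            if len(candidate) <= max_width:
                name = candidate
                break
        else:
            name = name[:max_width]
    if min_width is not None and len(name) < min_width:
        name = name.ljust(min_width)  # pad with trailing spaces
    return name


if __name__ == "__main__":
    print(repr(fit_name("Wednesday", max_width=4, abbreviations=["Weds"])))
    print(repr(fit_name("May", min_width=5)))
```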

(A.3) For other presentation modifiers: Any adjustment of the value to fit within the requested width range is implementation-defined. The value should not be truncated if this results in output that will not be meaningful to users (for example, there is no sensible way to truncate Roman numerals). If shorter than the minimum width, the value should be padded to the minimum width, either by appending spaces, or in some other way appropriate to the numbering scheme.

(B) The rules for the Year component (Y) are the same as those in (A) above, except that the value of the year as output is the value of the year component of the supplied value modulo ten to the power N where N is determined as follows:

(i) if the width modifier is present and includes a maximum width, then that maximum width, or 2, whichever is greater.

(ii) otherwise, if the first presentation modifier takes the form of a decimal-digit-pattern, then the number of optional-digit-signs and mandatory-digit-signs in that decimal-digit-pattern, or 2, whichever is greater

For example: given the year 2016, [Y0001] produces "2016"; [Y01] produces "16"; [Y1] produces "2016"; [Y1,2-2] produces "16"; [Yi] produces "mmxvi"; [Yi,2-2] produces "xvi".
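
The year rule for decimal digit patterns can be sketched in Python as follows. The function year_output is hypothetical and covers only patterns without grouping separators; non-decimal formats such as [Yi] are out of scope. One assumption needs flagging: rule (ii) read literally gives N = 2 for [Y1], yet the example above lists "2016", so the sketch leaves a single-digit pattern untruncated when no maximum width is given (the special case raised in comment 11).

```python
def year_output(year, pattern, min_width=None, max_width=None):
    """Format `year` per rule (B) for a decimal digit pattern.

    min_width / max_width model an optional width modifier.
    """
    digit_signs = sum(1 for ch in pattern if ch.isdecimal() or ch == "#")
    mandatory = sum(1 for ch in pattern if ch.isdecimal())
    if min_width is not None:
        mandatory = max(mandatory, min_width)   # effect of A.1.2.b
    if max_width is not None:
        n = max(max_width, 2)                   # rule (i)
    elif digit_signs == 1:
        n = None                                # assumption: no truncation
    else:
        n = max(digit_signs, 2)                 # rule (ii)
    value = year if n is None else year % 10 ** n
    return str(value).zfill(mandatory)


if __name__ == "__main__":
    for picture in ("0001", "01", "1"):
        print(picture, year_output(2016, picture))
    print("1,2-2", year_output(2016, "1", min_width=2, max_width=2))
```

The same function gives "14" and "337" for the years 14 and 337 with the pattern ##99 discussed in comment 1.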

(C) The output for the fractional seconds component (f) is equivalent to the result of the following algorithm: 

C.1 If the first presentation modifier contains no Unicode digit, then the output is implementation-defined.

C.2 Otherwise, the value of the fractional seconds is output as follows: 

C.2.1. the sequence of characters in the first presentation modifier is reversed (For example, "999'###" becomes "###'999"). If the result is not a valid decimal digit pattern, then the output is implementation-defined.

C.2.2. the sequence of digits in the conventional decimal representation of the fractional seconds component is reversed, with insignificant zeroes removed, and the result is treated as an integer. For example, if the seconds value is 25.8235, the reversed fractional seconds value is 5328.

C.2.3. The reversed fractional seconds value is formatted using the reversed decimal digit pattern according to the rules in (A) above, taking into account any width modifiers. Given the examples above, the result is "5'328".

C.2.4. The resulting string is reversed. In our example, the result is "823'5".
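
Steps C.2.1 to C.2.4 can be traced with the Python sketch below. It is a simplification under stated assumptions: format_integer_simple is a hypothetical, much reduced stand-in for format-integer (ASCII digits only, grouping separators only at the positions written in the pattern), and width modifiers are ignored.

```python
def format_integer_simple(value, pattern):
    """Reduced stand-in for format-integer: digits, '#', and separators."""
    separators = {}   # digit position from the right -> separator character
    digits_seen = 0
    mandatory = 0
    for ch in reversed(pattern):
        if ch.isdecimal():
            mandatory += 1
            digits_seen += 1
        elif ch == "#":
            digits_seen += 1
        else:
            separators[digits_seen] = ch
    text = str(value).zfill(mandatory)
    out = []
    for i, ch in enumerate(reversed(text)):
        if i > 0 and i in separators:
            out.append(separators[i])
        out.append(ch)
    return "".join(reversed(out))


def format_fraction_reversed(fraction_digits, pattern):
    """fraction_digits: the digits after the decimal point, e.g. '8235'."""
    reversed_pattern = pattern[::-1]                        # C.2.1
    reversed_value = int(fraction_digits[::-1] or "0")      # C.2.2
    formatted = format_integer_simple(reversed_value, reversed_pattern)  # C.2.3
    return formatted[::-1]                                  # C.2.4


if __name__ == "__main__":
    print(format_fraction_reversed("8235", "999'###"))
```

For the seconds value 25.8235 and the modifier 999'### this reproduces the intermediate result "5'328" and the final result "823'5" given in the steps above.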



Finally: we add a rule that if there is a width modifier that specifies both a minimum and maximum width, then FOFD1340 is reported if the minimum is greater than the maximum.
Comment 16 Michael Kay 2016-06-20 07:43:23 UTC
I think a refinement is needed to the proposed rules for fractional seconds: the value should be truncated to the maximum width, which does not happen for integers. This needs an additional rule:

C.2.5 If the result contains more digits than the number of digit symbols in the decimal digit pattern, then excess digits are removed from the right hand end. Any grouping separator that immediately precedes a removed digit is also removed.
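
Rule C.2.5 might be sketched in Python as below, operating on the string produced by the earlier steps. The name truncate_fraction is hypothetical, and the sketch treats any non-digit character in the result as a grouping separator.

```python
def truncate_fraction(result, pattern):
    """Drop digits beyond the number of digit symbols in `pattern`.

    A grouping separator left immediately before the cut is removed too.
    """
    limit = sum(1 for ch in pattern if ch.isdecimal() or ch == "#")
    kept = []
    digits = 0
    for ch in result:
        if ch.isdecimal():
            if digits == limit:
                break          # first excess digit: stop here
            digits += 1
        kept.append(ch)
    while kept and not kept[-1].isdecimal():
        kept.pop()             # remove a dangling separator
    return "".join(kept)


if __name__ == "__main__":
    print(truncate_fraction("123'456'7", "999'###"))
    print(truncate_fraction("823'5", "999'###"))
```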
Comment 17 Michael Kay 2016-06-21 15:31:03 UTC
The proposal in comments #15 and #16 was accepted.