This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5671 - [FO] Type promotion in fn:min and fn:max
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 1.0
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2008-04-30 13:59 UTC by Oliver Hallam
Modified: 2012-03-27 23:13 UTC

Description Oliver Hallam 2008-04-30 13:59:26 UTC
The summary for fn:min/fn:max says:
Selects an item from the input sequence $arg whose value is [less/greater] than or equal to the value of every other item in the input sequence

However further down in their summaries:
This function returns an item from the converted sequence rather than the input sequence.

Could this be worded more clearly?


Reading the rules for promotion:
Numeric and xs:anyURI values are converted to the least common type that supports the [le/ge] operator by a combination of type promotion and subtype substitution.

From reading this I would say that if your input is a single value of type xs:unsignedShort, then you would return a value of type xs:integer, as this is "the least common type that supports the [le/ge] operator"; however the XQTS test K2-SeqMINFunc-15 seems to disagree with me here.  What is the correct behaviour?
Comment 1 Michael Kay 2008-05-20 15:56:15 UTC
This was considered by the WGs on 20 May 2008. It was noted that the resolution of bug #3358 proposed that the text should read "converted to their least common type by a combination of type promotion and subtype substitution." The actual text published was "converted to the least common type that supports the ge operator by a combination of type promotion and subtype substitution". It is not clear why the phrase "that supports the ge operator" was added, or what it was supposed to mean. The discussion of bug #3358 makes it clear that the intention of the WG was that the min() or max() of a sequence of hatsizes should be a hatsize.

I therefore propose to delete the phrase "that supports the ge operator".

Regarding the summary, I think it's a major piece of editorial work to ensure that all the function summaries are indeed summaries of the detailed rules for each function. I would like to tackle this for the 1.1 release, but I don't think piecemeal improvements by means of errata are appropriate in cases like this.
Comment 2 Michael Dyck 2008-05-20 20:10:58 UTC
So I believe the implication is that the XQTS test K2-SeqMINFunc-15 is correct.
That is,
    min(xs:unsignedShort(<e>1</e>)) instance of xs:unsignedShort
yields
    true

But what about something like
    min((xs:unsignedShort(<e>1</e>), xs:positiveInteger(<e>2</e>)))
    instance of xs:unsignedShort
?

The least common supertype of xs:unsignedShort and xs:positiveInteger is xs:nonNegativeInteger, so the text suggests that the two items in the input sequence are converted to xs:nonNegativeInteger by subtype substitution, yielding a 'converted sequence' consisting of two xs:nonNegativeInteger values, one of which is returned as the result of the fn:min call, which then fails the "instance of" test (i.e. yielding false for the whole expr).

However, XQuery 2.5.4 (SequenceType Matching) says:
    Subtype substitution does not change the actual type of a value.
    For example, if an xs:integer value is used where an xs:decimal
    value is expected, the value retains its type as xs:integer.
In terms of the example, if an xs:unsignedShort value is used where an xs:nonNegativeInteger is expected, the value retains its type as xs:unsignedShort. This suggests that the result of the fn:min call is actually an xs:unsignedShort value, and the "instance of" test yields true.
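
The "least common supertype" step in this comment can be illustrated with a minimal Python sketch. It is only an illustration: the `PARENT` table and the function names are assumptions made for this example, the table covers only the built-in XML Schema types derived from xs:decimal, and it says nothing about which type an implementation actually labels the result with.

```python
# Parent links for the built-in XML Schema types derived from xs:decimal.
# (Illustrative table; only the decimal/integer branch is modelled.)
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:nonPositiveInteger": "xs:integer",
    "xs:negativeInteger": "xs:nonPositiveInteger",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:short": "xs:int",
    "xs:byte": "xs:short",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedShort": "xs:unsignedInt",
    "xs:unsignedByte": "xs:unsignedShort",
    "xs:positiveInteger": "xs:nonNegativeInteger",
}


def ancestors(type_name):
    """Return type_name followed by its supertypes, nearest first."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def least_common_supertype(a, b):
    """Return the first type on a's ancestor chain that is also on b's chain."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    raise ValueError(f"no common supertype for {a} and {b} in this table")


print(least_common_supertype("xs:unsignedShort", "xs:positiveInteger"))
# prints xs:nonNegativeInteger
```

Whether the returned item is then labelled xs:nonNegativeInteger or keeps its original xs:unsignedShort label is exactly the question this comment raises; the sketch does not settle it.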
Comment 3 Michael Kay 2008-05-20 20:57:15 UTC
> The least common supertype of xs:unsignedShort and xs:positiveInteger is
> xs:nonNegativeInteger, so the text suggests that the two items in the input
> sequence are converted to xs:nonNegativeInteger

Yes, precisely.

> Subtype substitution does not change the actual type of a value

The phrase about conversion to the least common type is used in a number of places in the XPath/XQuery language specs. The formula used in some cases is "converted to the least common type reachable by a combination of type promotion and subtype substitution". Would you be more comfortable with that?

Incidentally, my interpretation of this rule is that it guarantees that the result will be an instance of xs:nonNegativeInteger. It does not say that the value might not also be an instance of some other type, such as xs:unsignedShort. Functions (and expressions generally) are always free to return a result that belongs to a subtype of the required type.
Comment 4 Tim Mills 2008-05-27 09:15:33 UTC
I believe that the upshot of this is that:

1. When no type promotion is required, we can always return an item (with its type unchanged) from the input sequence.  The static type of the function call will be the least common type of the input item types.

2. When type promotion is required, we can always return an item from the input cast as the promoted type (which will be xs:decimal, xs:float, xs:double or xs:string).  The static type of the function call will be the promoted type.
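
The two cases above can be sketched in Python, restricted to numeric items. This is a rough model under stated assumptions, not an implementation of fn:min: the `PRIMITIVE` lookup and `sketch_min` are hypothetical names, items are modelled as `(type_name, value)` pairs, Python's `float` stands in for both xs:float and xs:double (single-precision rounding is not modelled), and NaN handling is ignored.

```python
# Primitive numeric types in promotion order: xs:decimal, xs:float, xs:double.
PROMOTION_ORDER = ["xs:decimal", "xs:float", "xs:double"]

# Hypothetical lookup from a few item types to their primitive numeric type.
PRIMITIVE = {
    "xs:unsignedShort": "xs:decimal",
    "xs:positiveInteger": "xs:decimal",
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:decimal",
    "xs:float": "xs:float",
    "xs:double": "xs:double",
}


def sketch_min(items):
    """items: non-empty list of (type_name, value) pairs, numeric only.

    Returns a (type_name, value) pair for the smallest item.
    """
    ranks = [PROMOTION_ORDER.index(PRIMITIVE[t]) for t, _ in items]
    smallest_type, smallest_value = min(items, key=lambda item: item[1])
    if len(set(ranks)) == 1:
        # Case 1: no type promotion needed, so the item can be returned
        # with its type unchanged.
        return smallest_type, smallest_value
    # Case 2: promotion needed, so the item is returned cast to the
    # promoted type (the highest-ranked primitive type present).
    target = PROMOTION_ORDER[max(ranks)]
    return target, float(smallest_value)
```

For example, `sketch_min([("xs:unsignedShort", 1), ("xs:positiveInteger", 2)])` returns `("xs:unsignedShort", 1)`, while `sketch_min([("xs:integer", 1), ("xs:double", 2.5)])` returns `("xs:double", 1.0)`. In this model the promoted type is never xs:decimal, because promotion is only triggered when at least one xs:float or xs:double item is present.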

Comment 5 Michael Dyck 2008-07-08 00:21:59 UTC
[personal response:]

(In reply to comment #4)
> 
> 1. When no type promotion is required, we can always return an item (with its
> type unchanged) from the input sequence.

According to MKay's interpretation, you can. Perhaps some other interpretation says you can't. Anyhow, the user isn't guaranteed that you will. (You might return that item converted to a supertype.)

> The static type of the function call
> will be the least common type of the input item types.

Yes.

> 2. When type promotion is required, we can always return an item from the
> input cast as the promoted type (which will be xs:decimal, xs:float,
> xs:double or xs:string).

Any case in which the resultant type is xs:decimal wouldn't *require* type promotion (you could do it all with subtyping), so you could leave that one off the list.

> The static type of the function call will be the promoted type. 

Yes.
Comment 6 Michael Kay 2008-07-08 22:09:39 UTC
On 27 May 2008 (recorded in the minutes but sadly not here), the joint WGs decided to resolve this by changing the text (in fn:max() and fn:min()) from

Numeric and xs:anyURI values are converted to the least common type that supports the ge operator by a combination of type promotion and subtype substitution.

to

Numeric and xs:anyURI values are converted to the least common type reachable by a combination of type promotion and subtype substitution.

This decision was confirmed at the joint WG meeting on 7 July 2008.

I am marking this as resolved/fixed. Oliver, if you are content with this resolution, I would be grateful if you could mark the bug as closed.

(Incidentally, I think comments #4 and #5 are correct)
Comment 7 Michael Kay 2008-07-08 22:21:01 UTC
Will be the subject of erratum E27
Comment 8 Oliver Hallam 2008-07-09 14:31:56 UTC
I am marking this bug closed.

However, this solution does have ramifications for the formal semantics, and the typing rules (which are broken anyway; see bug #5459) should be updated.
Comment 9 Michael Dyck 2008-07-19 08:04:46 UTC
On further reflection (working on Bug #5459), I don't think the new wording correctly captures our intent.

Consider a sequence containing both values matching xs:anyURI and values
matching xs:string. I believe the intent is that all of those values will
be converted to xs:string (when forming the "converted sequence"). But if
we say "numeric and xs:anyURI values are converted to the least common type
reachable by a combination of type promotion and subtype substitution",
then we'll look at just the xs:anyURI values, convert them to a common type
(some subtype of xs:anyURI, possibly xs:anyURI itself), and leave the
xs:string values untouched. Then, when we say "All items in $arg must be
numeric or derived from a single base type for which the ge/le operator is
defined", it fails, because the xs:anyURI values and xs:string values are
not derived from a single base type. [I'm assuming that where it says
"$arg", it actually means "converted sequence", otherwise other things
happen.]

...

Also, in that latter quoted sentence, the "numeric or" is unnecessary,
since all numeric values have already been converted to a common type,
which certainly qualifies as "derived from a single base type".

We say "... a single base type for which the ge operator is defined. In
addition, the values in the sequence must have a total order." But does the
second sentence actually add anything?

It's odd that we would require values of two subtypes of xs:integer to be
converted to a common type (because they're numeric values), but not
require values of two subtypes of (say) xs:date to be converted to a common
type. Wouldn't it be correct to say that *all* values are converted to a
common type, not just numerics and xs:anyURI? (If so, it's redundant to say
"all items ... must be ... derived from a single base type".)

And it's odd that we say "Duration values must either all be
xs:yearMonthDuration values or must all be xs:dayTimeDuration values",
since surely that's implied by the "derived from a single base type"
requirement.
Comment 10 Michael Kay 2008-07-19 08:56:03 UTC
> But if we say "numeric and xs:anyURI values are converted to the least common type
> reachable by a combination of type promotion and subtype substitution",
> then we'll look at just the xs:anyURI values, convert them to a common type
> (some subtype of xs:anyURI, possibly xs:anyURI itself), and leave the
> xs:string values untouched.

My reading of "least common type" was "least common type of all the items in the input sequence", not "least common type among the numeric and xs:anyURI values". As you say, that latter reading wouldn't make sense.

> "All items in $arg must be
> numeric or derived from a single base type for which the ge/le operator is
> defined", it fails, because the xs:anyURI values and xs:string values are
> not derived from a single base type. [I'm assuming that where it says
> "$arg", it actually means "converted sequence", otherwise other things
> happen.]

I think that where it says $arg, it means $arg, and that it fails to capture the effective equivalence of xs:anyURI and xs:string.

> It's odd that we would require values of two subtypes of xs:integer to be
> converted to a common type (because they're numeric values), but not
> require values of two subtypes of (say) xs:date to be converted to a common
> type. Wouldn't it be correct to say that *all* values are converted to a
> common type, not just numerics and xs:anyURI?

Yes, it's a bit odd, but not odd enough to require a 1.0 change that will impact existing implementations.

> And it's odd that we say "Duration values must either all be
> xs:yearMonthDuration values or must all be xs:dayTimeDuration values",
> since surely that's implied

It's not unusual, unfortunately, for the F+O spec to say things more than once in different ways.
Comment 11 Michael Dyck 2008-07-19 09:56:20 UTC
(In reply to comment #10)
>
> My reading of "least common type" was "least common type of all the items in
> the input sequence",

Ah, I see. Well, I think that's a sufficiently non-obvious reading that it should be made explicit.

> I think that where it says $arg, it means $arg, and that it fails to capture
> the effective equivalence of xs:anyURI and xs:string.

Okay, so that's a mistake, right? Also, it fails to capture the exception for xs:untypedAtomic (i.e., you can have xs:untypedAtomic values in $arg even though they're neither numeric nor derived from a type for which the ge operator is defined).

Is it intended that $collation be ignored for comparison of xs:anyURI values?

> > It's odd that we would require values of two subtypes of xs:integer to be
> > converted to a common type (because they're numeric values), but not
> > require values of two subtypes of (say) xs:date to be converted to a common
> > type. Wouldn't it be correct to say that *all* values are converted to a
> > common type, not just numerics and xs:anyURI?
> 
> Yes, it's a bit odd, but not odd enough to require a 1.0 change that will
> impact existing implementations.

I'm not clear on how it would affect an existing implementation. If (for the example above) an implementation returns a subtype-of-date value that hasn't been converted to the common type, that would still be conformant, under the interpretation you gave in Comment #3.
Comment 12 Michael Dyck 2009-01-09 20:07:13 UTC
Reopening, which I probably should have done at comment #9,
as the points I raised then and since haven't been resolved yet.
Comment 13 Michael Kay 2009-02-10 17:04:24 UTC
The WG agreed subject to detailed wording that we need to fix the sentence

"All items in $arg must be numeric or derived from a single base type for which the ge operator is defined."

so that a sequence containing a mix of xs:string and xs:anyURI is acceptable.
Comment 14 Michael Kay 2009-02-14 21:21:45 UTC
Erratum E47 has been drafted to reflect this decision. It changes the wording of the relevant paragraph from "All items in $arg must be numeric or ..." to "All items in the converted sequence must be numeric or ...".
Comment 15 Michael Kay 2009-02-16 10:34:27 UTC
The problem noted in comment #9 also affects two other sentences.

The sentence "All items in $arg must be numeric or derived from a single base type for which the ge operator is defined." should be changed to "All items in the converted sequence must be derived from a single base type for which the ge operator is defined." (There is no need to mention numerics as a special case any more, since if they are numerics the condition will automatically be satisfied).

The paragraph 

"If the items in the value of $arg are of type xs:string or types derived by restriction from xs:string, then the determination of the item with the largest value is made according to the collation that is used. If the type of the items in $arg is not xs:string and $collation is specified, the collation is ignored." 

should change to:

"If the items in the converted sequence are of type xs:string or types derived by restriction from xs:string, then the determination of the item with the largest value is made according to the collation that is used. If the type of the items in the converted sequence is not xs:string and $collation is specified, the collation is ignored."

I am revising the draft E47 accordingly.

Comment 16 Michael Dyck 2009-02-24 08:48:48 UTC
Those changes are improvements, but I think there's still the problem,
raised in comment #9, of the wording:
    Numeric and xs:anyURI values are converted to the least common type...
The question is: the least common type of what?  I claim that a plausible
answer is:
    the least common type of the numeric and/or xs:anyURI values
    in the input sequence
(i.e., the values identified in the sentence's subject), which leads to
unintended results. In comment #10, you say that your reading is:
    the least common type of all the items in the input sequence
which I say is a non-obvious reading.
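
The difference between the two readings can be made concrete with a small Python sketch for a sequence mixing xs:anyURI and xs:string. It is a simplified model under stated assumptions: `convert` and `has_single_base_type` are hypothetical names, only the two primitive types xs:anyURI and xs:string are modelled (numerics are left out), and the reading labels `"subject-only"` and `"all-items"` are invented here to name the two interpretations.

```python
def convert(item_types, reading):
    """Model the 'converted sequence' types under one reading of the rule.

    item_types: list of "xs:anyURI" / "xs:string" type names.
    reading: "subject-only" takes the least common type of the xs:anyURI
             values alone; "all-items" takes it over every item.
    """
    if reading == "all-items":
        pool = list(item_types)
    elif reading == "subject-only":
        pool = [t for t in item_types if t == "xs:anyURI"]
    else:
        raise ValueError(f"unknown reading: {reading}")
    # xs:anyURI can be promoted to xs:string, so xs:string is the common
    # type only if some xs:string value is in the pool being considered.
    target = "xs:string" if "xs:string" in pool else "xs:anyURI"
    return [target if t == "xs:anyURI" else t for t in item_types]


def has_single_base_type(item_types):
    """Stand-in for 'derived from a single base type' in this two-type model."""
    return len(set(item_types)) == 1


mixed = ["xs:anyURI", "xs:string"]
print(convert(mixed, "subject-only"))  # prints ['xs:anyURI', 'xs:string']
print(convert(mixed, "all-items"))     # prints ['xs:string', 'xs:string']
```

Under the "subject-only" reading the converted sequence still mixes two base types, so the single-base-type requirement fails; under the "all-items" reading every item ends up as xs:string and the requirement is met.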

Moreover, I believe the wording still indicates that, if you call the
function with a sequence of xs:anyURI values and a collation, the values
are compared using the default collation, not the supplied collation.
I'm still wondering if that's intended.
Comment 17 Michael Dyck 2009-03-15 01:59:56 UTC
Here is a specific proposal to resolve the remaining concerns expressed in
the previous comment.

In the second bullet, change
    * Numeric and xs:anyURI values ...
to just
    * Numeric values ...

and instead, handle xs:anyURI values in a new (second) bullet:
    * Values of type xs:anyURI are promoted to xs:string.

With that, I believe the question of "the least common type of what?"
becomes moot. (That is, you get the same result whether you think it means
"the least common type of the numeric values" or "the least common type
of all values".)

Also, it ensures that xs:anyURI values in the input sequence will
(due to their promotion to xs:string) be subject to the paragraph re
collation-aware comparisons. (I assume that's what we intended.)
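
A minimal Python sketch of the effect of this proposal on collation handling follows. It rests on several assumptions: `build_converted_sequence`, `collation_applies` and `sketch_max` are hypothetical names, items are `(type_name, value)` pairs, only the xs:anyURI bullet is modelled (the numeric bullet is omitted), and a collation is stood in for by a Python key function such as `str.casefold`, which is not a real XPath collation.

```python
def build_converted_sequence(items):
    """Apply the proposed bullet: values of type xs:anyURI are promoted to xs:string."""
    return [("xs:string", v) if t == "xs:anyURI" else (t, v) for t, v in items]


def collation_applies(converted):
    """The collation paragraph only applies when the converted items are strings."""
    return all(t == "xs:string" for t, _ in converted)


def sketch_max(items, collation_key=None):
    """items: non-empty list of (type_name, value) pairs of mutually comparable values.

    collation_key: optional key function standing in for $collation.
    """
    converted = build_converted_sequence(items)
    if collation_key is not None and collation_applies(converted):
        # Strings (including promoted URIs) are compared under the collation.
        return max(converted, key=lambda item: collation_key(item[1]))
    # Otherwise the supplied collation is ignored.
    return max(converted, key=lambda item: item[1])


uris = [("xs:anyURI", "http://B.example/"), ("xs:anyURI", "http://a.example/")]
print(sketch_max(uris))                              # ('xs:string', 'http://a.example/')
print(sketch_max(uris, collation_key=str.casefold))  # ('xs:string', 'http://B.example/')
```

Because the URIs are promoted to xs:string before the collation check, a supplied collation changes the result for a URI-only sequence, which is the behaviour this comment assumes was intended.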
Comment 18 Michael Kay 2009-03-18 20:45:10 UTC
The change in comment #17 has been added to the draft erratum E47, as decided by the WG yesterday. Note that with the splitting of the "promotion" bullet into two, it is no longer necessary to specify that promotion is to a type having a le operator, since all numeric types have such an operator.
Comment 19 Michael Kay 2012-03-27 23:13:59 UTC
I believe that all the points made in this lengthy discussion were addressed in the published Second Edition and that the bug can now be closed.