This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5459 - [FS] Static type analysis for the fn:abs, fn:ceiling, fn:floor, fn:round, and fn:round-half-to-even functions
Summary: [FS] Static type analysis for the fn:abs, fn:ceiling, fn:floor, fn:round, and...
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Formal Semantics 1.0 (show other bugs)
Version: Recommendation
Hardware: PC All
: P2 normal
Target Milestone: ---
Assignee: Michael Dyck
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-02-07 13:57 UTC by Oliver Hallam
Modified: 2008-08-28 00:59 UTC (History)
2 users (show)

See Also:


Attachments

Description Oliver Hallam 2008-02-07 13:57:14 UTC
The static type analysis rules for fn:abs, fn:ceiling, fn:floor, fn:round, and fn:round-half-to-even are defined in section 7.2.3:

The second sentence reads:
Note that the fact that the type should be the least is cannot expressed by the inference rule notation used here.

Firstly, the phrase "least is cannot expressed by" makes no sense - I assume we are looking for the least type satisfying the inference rule.

Consider an argument of type numeric(), then the only type satisfying the rule is xs:double.  If the argument has (for example) an xs:integer value, the return type is xs:integer which disagrees with this inferred type.

It seems that typing rules similar to those in section C.2 need to be applied in this case.
Comment 1 Oliver Hallam 2008-02-07 15:00:07 UTC
A similar flaw exists in the type checking rules for fn:min, fn:max, fn:avg and fn:sum as defined in 7.2.10 of formal semantics.

If for example fn:min is applied to a sequence of values typed numeric, then the static type is inferred by these rules to be xs:double, and not xs:numeric.
Comment 2 Michael Dyck 2008-02-12 03:36:18 UTC
(In reply to comment #0)
>
> The second sentence reads:
> Note that the fact that the type should be the least is cannot expressed by
> the inference rule notation used here.
> 
> Firstly, the phrase "least is cannot expressed by" makes no sense

I agree, but that was already raised in Bug 1756 Comment #3, and fixed in FS erratum E014, by changing "is cannot expressed" to "is not expressed".

> I assume we are looking for the least type satisfying the inference rule.

Yes, where "least" means "lowest on the type promotion ladder".

> Consider an argument of type numeric(), then the only type satisfying the rule
> is xs:double.

I don't understand what you mean. As far as I can tell, there is no type numeric(). Or xs:numeric.

In the F+O spec, the functions in question are "declared" with a parameter type of "numeric?", but section 1.4 explains that this is just a shorthand for four declarations, each with a specific atomic type.
Comment 3 Oliver Hallam 2008-02-13 15:00:33 UTC
When I said xs:numeric I was referring to a union of the numeric types, which is how I had been (wrongly) treating the word "numeric" in the F+O declarations, and I agree that the rules for fn:abs, fn:ceiling, fn:floor, fn:round, and
fn:round-half-to-even are fine.

The functions fn:min, fn:max, fn:avg and fn:sum however are defined for xs:anyAtomicType*, and so this problem still occurs:
If you have an argument of type (xs:integer | xs:double)*, then (as far as I can tell) the static analysis rules for these functions imply a return type of xs:double, when it could also be xs:integer.
Comment 4 Michael Dyck 2008-02-15 09:43:34 UTC
Ah, I see. So, for example, consider:
    fn:min( if (fn:true()) then (1,2) else (3e0, 4e0) )
The argument's static type is
    (xs:integer,xs:integer) | (xs:double,xs:double)
whose prime type is
    xs:integer | xs:double
and the least type that both can promote to is xs:double, so the inferred static type for the whole is xs:double. However, we know that the expression's value is the integer 1, which does not match the type xs:double, so type soundness is broken.

One approach to fixing this might be the following: After computing the prime type and pushing it through convert_untypedAtomic, consider all possible combinations of its item types. For each such combination, find the least common type (by subtype substitution and numeric promotion). Then take the union of all the least types and 'multiply' it by aggregate_quantifier(...).

So, for the example above, there are three combinations:
    xs:integer
    xs:double
    xs:integer xs:double
The least common types are
    xs:integer
    xs:double
    xs:double
respectively. And the union of those three types (i.e., xs:integer | xs:double) is the inferred static type for the call.
Comment 5 Oliver Hallam 2008-02-15 10:44:27 UTC
I think it may be sufficient just to consider the types making up the union of the prime type, and not all combinations thereof.

With all these functions it is only possible to get a particular return type if one of the arguments is of that type - for example fn:min only returns xs:double if at least one argument is of type xs:double (or a type deriving from xs:double)

This way the static typing rules are directly analogous to those given in section C.2 for overloaded functions.  If you view for example fn:sum as a series of calls to fs:plus (which as far as I can tell is semantically the same) then it makes sense to apply the same static typing rules.
Comment 6 Michael Dyck 2008-02-16 05:09:07 UTC
(In reply to comment #5)
> I think it may be sufficient just to consider the types making up the union of
> the prime type, and not all combinations thereof.

I'm not sure exactly what you're suggesting. E.g, at first I thought you meant this:
    Get the prime type, call convert_untypedAtomic, then for each
    item type, find the least type (in the set of target types)
    that it can be promoted to, and take the union of those types.
That would work for my example, but in general it doesn't ensure that the argument types are compatible. E.g., it should (but doesn't) fail to infer a type when the arg type is (integer|date)*.

> With all these functions it is only possible to get a particular return type
> if one of the arguments is of that type - for example fn:min only returns
> xs:double if at least one argument is of type xs:double (or a type deriving
> from xs:double)

(or xs:untypedAtomic) Also, that observation isn't quite true for fn:avg -- it returns xs:decimal even when all the args are xs:integer.

> This way the static typing rules are directly analogous to those given in
> section C.2 for overloaded functions.

The rules in C.2 should be taken with a grain of salt. They have some bugs that haven't been reported yet.
Comment 7 Oliver Hallam 2008-03-31 13:31:34 UTC
I have had another look at the specification and I have convinced myself again that the static type analysis of abs, ceiling, round and round-half-to-even is wrong, due to the way that function calls are bound.

Consider the following query:
abs(
  remove( (xs:integer("-1"), xs:float("-2.0")), 2 )
) instance of xs:float

The spec says [XQuery 2.2.3.1]:
 "[During the static analysis phase,] The static context is used to resolve ... function names ... If a name of one of these kinds in the operation tree is not found in the static context, a static error ([err:XPST0008] or [err:XPST0017]) is raised."

Along with the rules given in Formal Semantics 4.1.5 this says that during static analysis, only a type is assigned to the function call (in this case the type assigned is xs:float through the rules in 7.2.3), and that the function itself is chosen when the call is evaluated and in this case the version operating on xs:floats is selected, and so the argument is cast to a float and the value of the query is true.

However, during static analysis it is acceptable to perform rewritings that do not affect the value of expressions, so when static analysis of the fn:abs call is performed the query could look like this:

abs( xs:integer(-1) ) instance of xs:float

Applying the same rules selects the xs:integer version of the function and hence the query evaluates to false.

I suggest that the static type of the function (in the first case) should have been inferred as xs:integer | xs:float.  This however leaves the selection of the function used ambiguous, and so addition judgments would be required dynamically to select the function used (similar to the current static rules).

This seems to mirror the behavior (although probably not the mechanism) used by both our implementation and Saxon.
Comment 8 Michael Dyck 2008-04-01 03:56:11 UTC
(In reply to comment #7)
>
> Consider the following query:
> abs(
>   remove( (xs:integer("-1"), xs:float("-2.0")), 2 )
> ) instance of xs:float
> 
> ...
> Along with the rules given in Formal Semantics 4.1.5 this says that during
> static analysis, only a type is assigned to the function call (in this case
> the type assigned is xs:float through the rules in 7.2.3),

Agreed.

> and that the function itself is chosen when the call is evaluated
> and in this case the version operating on xs:floats is selected,
> and so the argument is cast to a float

I'm not sure what leads you to that conclusion. The only suggestion I could find that there even *are* multiple "versions" of fn:abs is in F+O section 1.4, where it says that a signature involving 'numeric' represents four signatures. But the XQuery and FS specs are fairly clear that fn:abs has only one signature and one implementation. As far as I can tell, they agree that the argument to fn:abs is not cast to anything, and so the result is also an xs:integer.
 
> I suggest that the static type of the function (in the first case) should have
> been inferred as xs:integer | xs:float.

Yes, I agree, that needs to be fixed.

> This however leaves the selection of the function used ambiguous,

There I disagree. The static type inferred for a function call does not influence its dynamic evaluation.
Comment 9 Oliver Hallam 2008-04-01 10:50:12 UTC
> The only suggestion I could find that there even *are* multiple "versions" of
> fn:abs is in F+O section 1.4, where it says that a signature involving
> 'numeric' represents four signatures. But the XQuery and FS specs are fairly
> clear that fn:abs has only one signature and one implementation.

I had presumed from F&O 1.4 that there are four separate fn:abs functions (or at least four separate signatures that are chosen from at compile time).  Which parts of the specification imply otherwise?

If it is the case that fn:abs has four signatures then the rules given in FS 4.1.5 under "Dynamic Evaluation" deduces four different results for an argument of type xs:integer (since it can be promoted to match any signature).

Should this be marked as a bug against Functions & Operators?
Comment 10 Michael Dyck 2008-04-01 20:02:33 UTC
(In reply to comment #9)
> 
> I had presumed from F&O 1.4 that there are four separate fn:abs functions
> (or at least four separate signatures that are chosen from at compile time).
> Which parts of the specification imply otherwise?

XQuery
2.1.1 Static Context
Function signatures:
    Each function is uniquely identified by its expanded QName and its arity
    (number of parameters).
(Echoed in C.1 Static Context Components.)

FS
3.1.1 Static Context
statEnv.funcType:
    Because [XPath/XQuery] allows multiple functions with the same name
    differing in the number of parameters, this environment component maps
    an expanded QName and an arity to a function signature FunctionSig.

So where 4.1.5 / STA / rules 3+4 and 4.1.5 / Dyn Ev / rules 1+2 consult
statEnv.funcType for a given FunctionCall (with specific expanded-QName and
arity), it yields exactly one function signature (or else fails, if the
function name is undefined). There's no way to bring four signatures into
the picture.

> Should this be marked as a bug against Functions & Operators?

I don't think this Bugzilla bug should have its Component changed to F+O, if
that's what you're suggesting.  But it might be worth a new bug.
Comment 11 Tim Mills 2008-04-02 10:36:39 UTC
We think we have a solution for min/max etc.  Here's the rules for min.  We think the rules could be simplified, but as presented I hope it makes things clear.

1. The simple case of the empty sequence.

statEnv |- Type = empty
------------------------------------------------
statEnv |-  (FN-URI,"min")(Type) : empty


2. Consider prime atomic types.  The set of types in T are those for which ge(T, T) is defined.

statEnv |- Type2 = convert_untypedAtomic(Type, xs:double),
           Type2 <: T
           T in { xs:integer, xs:decimal, xs:float, xs:boolean,
                  xs:boolean, xs:string, 
                  xs:date, xs:time, xs:dateTime, 
                  xs:yearMonthDuration, xs:dayTimeDuration,
                  xs:anyURI }
------------------------------------------------
statEnv |-  (FN-URI,"min")(Type) : T


Rules for a union of prime atomic types.  

3. Rule for those types which are promotable to xs:double.

statEnv |- Type = (T1 | T2 ... Tn)
           (FN-URI,"min")(T1) : M1
           (FN-URI,"min")(T2) : M2 ...
           (FN-URI,"min")(Tn) : Mn
           M = (M1 | M2 | Mn)
           M <: (xs:integer | xs:decimal | xs:float  | xs:double)
------------------------------------------------
statEnv |-  (FN-URI,"min")(Type) : M


4. Rule for those types which are promotable to xs:string.

statEnv |- Type = (T1 | T2 ... Tn)
           (FN-URI,"min")(T1) : M1
           (FN-URI,"min")(T2) : M2 ...
           (FN-URI,"min")(Tn) : Mn
           M = (M1 | M2 | Mn)
           M <: (xs:anyURI | xs:string)
------------------------------------------------
statEnv |-  (FN-URI,"min")(Type) : M


5. Rule for the other permissable types.

statEnv |- Type = (T1 | T2 ... Tn)
           (FN-URI,"min")(T1) : M1
           (FN-URI,"min")(T2) : M2 ...
           (FN-URI,"min")(Tn) : Mn
           M = (M1 | M2 | Mn)
           M <: X in { xs:boolean, xs:date, xs:time, xs:dateTime, xs:yearMonthDuration, xs:dayTimeDuration }
------------------------------------------------
statEnv |-  (FN-URI,"min")(Type) : X


6. Finally, sort out the quantifiers.

statEnv |-  quantifier(Type) in {1, +}
statEnv |-  (FN-URI,"min")(prime(Type)) : T
------------------------------------------------
statEnv |-  (FN-URI,"min")(Type) : T


statEnv |-  quantifier(Type) in {?, *}
statEnv |-  (FN-URI,"min")(prime(Type)) : T
------------------------------------------------
statEnv |-  (FN-URI,"min")(Type) : T?


The rules for "max" should be identical.  "avg" is slightly different in that rule (2) must map xs:integer to xs:decimal, and the set of types in T are those for which div(T, T) is defined (xs:integer, xs:decimal, xs:float, xs:double, xs:yearMonthDuration, xs:dayTimeDuration).  Rule 4 would be ignored while in rule 5, the set of types in X would modified ppropriately.

"sum" is slightly different again, in that rule (1) must reflect the fact the the unary form of fn:sum returns xs;integer zero.  In rule (2), the set of types in T are those for which plus(T, T) is defined (xs:integer, xs:decimal, xs:float, xs:double, xs:yearMonthDuration, xs:dayTimeDuration).  I _think_ that the binary form of fn:sum must consider the union of the types of the two arguments.
Comment 12 Tim Mills 2008-04-02 12:13:45 UTC
One further observation on fn:sum...

From F&O

"Returns a value obtained by adding together the values in $arg. If $zero is not specified, then the value returned for an empty sequence is the xs:integer value 0."

Therefore

fn:sum( $arg as xs:dayTimeDuration* ) 

should be typed as xs:dayTimeDuration | xs:integer

I can't believe that this result is in any way desirable!  It make more sense for the unary version of fn:sum to return empty sequence.
Comment 13 Oliver Hallam 2008-07-10 10:27:41 UTC
While we are considering these static typing rules, would it not make sense to stipulate that fn:floor and fn:ceiling should return an xs:integer for xs:decimal arguments?
Comment 14 Michael Kay 2008-07-10 10:49:47 UTC
>While we are considering these static typing rules, would it not make sense to
stipulate that fn:floor and fn:ceiling should return an xs:integer for
xs:decimal arguments?

Sounds logical, but there's one hitch: implementations are permitted to support a smaller range of values for xs:integer than they support for xs:decimal.
Comment 15 Michael Dyck 2008-07-16 22:31:20 UTC
Here's my proposal for abs/ceiling/floor/round/round-half-to-even. That's
FS 7.2.3, corresponding to F+O 6.4 "Functions on Numeric Values".  In the
above discussion, Comments #0, 2-3, 7-10, 13-14 deal with these functions.

One thing I realized was that semantics of these functions doesn't really
involve type promotion, so the invocation of "can be promoted to" is gone.
Also, I remembered that normalization does some of the work. And lastly, I
noticed we were ignoring the two-parameter version of round-half-to-even.

So, replace the whole of 7.2.3 / STA with the following text:

    Note that, in the declarations for the built-in functions fn:abs,
    fn:ceiling, fn:floor, fn:round, and fn:round-half-to-even, the (first)
    parameter is declared with type "numeric?". Thus, for a call to one of
    these functions, the normalization rules of 4.1.5 will have wrapped the
    argument in calls to fn:data() and fs:convert-simple-operand() (with a
    'prototypical value' of type xs:double).  Thus, static analysis of the
    call is guaranteed that the argument type is a subtype of
    xs:anyAtomicType*, with no occurrences of xs:untypedAtomic.

    In the static typing rule for these functions, we do a cardinality
    check on the argument type, extract its prime type (which must be a
    choice of atomic types), find the base numeric type for each, and then
    form the choice of those results.

        expanded-QName in { (FN-URI,"abs"), (FN-URI,"ceiling"), [etc] }
        statEnv |- Type1 <: xs:anyAtomicType ?
        prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN
        AtomicTypeName1 has base numeric type AtomicTypeName1'
            ...
        AtomicTypeNameN has base numeric type AtomicTypeNameN'
        Type3 = AtomicTypeName1' | ... | AtomicTypeNameN'
        ------------------------------------------------------------
        statEnv |- expanded-QName(Type1) : Type3 ยท quantifier(Type1)

    The auxiliary judgment "has base numeric type" is defined as follows.

    If Type1 is (or is derived from) one of xs:integer, xs:float, or
    xs:double, then that is its base numeric type.

        Type2 in {xs:integer, xs:float, xs:double}
        Type1 <: Type2
        ------------------
        Type1 has base numeric type Type2

    Since xs:integer is a subtype of xs:decimal, we can't include xs:decimal
    in the above rule, otherwise subtypes of xs:integer would have both
    xs:integer and xs:decimal as their base numeric type. Instead,
    xs:decimal has its own rule, which excludes xs:integer and its subtypes.

        Type1 <: xs:decimal
        not( Type1 <: xs:integer )
        ------------------
        Type1 has base numeric type xs:decimal

    The fn:round-half-to-even function also has a two-parameter version.
    Its static type-checking can be reduced to that of the one-parameter
    version.

        expanded-QName = (FN-URI,"round-half-to-even")
        Type2 <: xs:integer
        statEnv |- expanded-QName(Type1) : Type3
        ------------------------------------------------------------
        statEnv |- expanded-QName(Type1, Type2) : Type3
Comment 16 Michael Dyck 2008-07-21 08:58:20 UTC
Okay, I finally have a proposal for min/max/sum/avg. That's FS 7.2.10,
corresponding to F+O 15.4 "Aggregate Functions" (minus fn:count). In the
above discussion, Comments #1, 3-6, 11-12 deal with these functions.

In addition to the unsoundness pointed out in comment #3, there are a few
other mistakes in the current rules:

    General:

        In STA 2 / rules 1, 3, 5, 7, we say (roughly)
            prime(Type) = empty
        but this can never be true. The proper judgment is simply
            Type = empty

        In STA 2 / rules 2, 4, 6, 8, we say (roughly)
            prime(Type) = FormalItemType1, ..., FormaItemTypeN
        but this can never be true for N>1.
        Instead, each ',' should be a '|'.

    min + max:

        In STA 2 / rules 2, 4, the set of target types should
        include xs:anyURI and xs:boolean.

    sum:

        STA 1 / rule 1 refers to expressions when it should be talking
        about types.

        STA 2 / rule 7 should deal with passing an empty seq to the
        *two-arg* version of sum.

        In STA 2 / rule 8, the set of target types should include
        xs:dayTimeDuration. (See Bug #5460).

        The prose says "The result type is the union of the target type and
        the type of the second argument." but the STA 2 / rule 8 doesn't do
        that.  Instead, it requires that the type of the second arg be a
        subtype of the target type. This has the effect that simple cases
        such as the first example in XQuery 15.4.5.1 fail to type-check.

        In STA 2 / rule 8, the use of aggregate_quantifier in the
        conclusion is incorrect. (Unlike min, max, and avg, passing an
        empty sequence to sum doesn't imply that the result will be empty.)

In addition, there are a few matters that, while not incorrect, could be
improved:

    In all rules in STA 2, the call to convert_untypedAtomic is redundant,
    because normalization has wrapped each argument in a call to
    fs:convert-simple-operand().
    
    Since we know we're only working with atomic types, occurrences of
    FormalItemType could be narrowed to AtomicTypeName.

    The rules for min and max are the same except for the function name.



The following proposal handles all of the above.

I believe that my proposal is more or less in agreement with that of
Comment #11 (although it looks fairly different). However, note that
rule 2 in Comment #11 has a problem: since xs:integer is a subtype of
xs:decimal, any Type2 satisfying T = xs:integer will also satisfy
T = xs:decimal, leading to two possible conclusions re the result type,
which is not what we want. In comment #15, I got around the integer/decimal
problem by introducing the 'has base numeric type' judgment, which I've
generalized into a 'has base atomic type' judgment:

    If Type1 is (or is a subtype of) any primitive atomic type other than
    xs:decimal, then that is its base atomic type.

        Type1 <: Type2
        Type2 is a primitive atomic type
        not(Type2 = xs:decimal)
        --------------
        Type1 has base atomic type Type2

    Similarly for xs:integer (which isn't primitive).

        Type1 <: xs:integer
        --------------
        Type1 has base atomic type xs:integer

    Finally, the special case for decimal:

        Type1 <: xs:decimal
        not( Type1 <: xs:integer )
        --------------
        Type1 has base atomic type xs:decimal

-------------------------------
Given that, here's my proposal:

Replace 7.2.10 / STA 1 with:

        statEnv |- (FN-URI,"sum")(Type1, xs:integer) : Type2
        ----------------------------------------
        statEnv |- (FN-URI,"sum")(Type1) : Type2

Replace 7.2.10 / STA 2 with:

    Now we can define the static typing rules for the aggregate functions.
    Note that the normalization rules of 4.1.5 will have wrapped each
    argument in calls to fn:data() and fs:convert-simple-operand() (with a
    'prototypical value' of type xs:double).  Thus, static analysis of the
    call to an aggregate function is guaranteed that any argument type is
    a subtype of xs:anyAtomicType*, with no occurrences of
    xs:untypedAtomic.

    The general approach for fn:min, fn:max, and fn:sum is as follows.
    First, we check that the input type(s) are acceptable for the function.
    Then we construct the (first) argument's prime type, a union of
    AtomicTypeNames. For each of the latter, we find the 'base atomic type'
    (a primitive atomic type or xs:integer). The union of these base atomic
    types is the basis for the result type, which may finally be adjusted
    for cardinality (fn:min and fn:max) or for the effect of the second
    argument (fn:sum). In addition, we provide a rule for the special case
    when the (first) argument has type 'empty'.

    For fn:min and fn:max, the permitted input types are all those for
    which ge(T,T) and le(T,T) are defined.  An empty input sequence yields
    an empty result.

        LocalPart in {"min", "max"}
        Type = empty
        --------------------------------------------
        statEnv |- (FN-URI,LocalPart)(Type) : empty


        LocalPart in {"min", "max"}
        Type1 <: Type3*
        Type3 in {
            fs:numeric,
            xs:anyURI|xs:string,
            xs:yearMonthDuration, xs:dayTimeDuration
            xs:date, xs:time, xs:dateTime, xs:boolean }
        prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN
        AtomicTypeName1 has base atomic type AtomicTypeName1'
        ...
        AtomicTypeNameN has base atomic type AtomicTypeNameN'
        AtomicTypeName1' | ...  | AtomicTypeNameN' = Type4
        ---------------------
        statEnv |- (FN-URI,LocalPart)(Type1) :
            Type4 . aggregate_quantifier(quantifier(Type1))


    For fn:sum, the permitted input types (for the first argument) are all
    those for which plus(T,T) is defined.  If you pass an empty sequence as
    the first argument, the function returns the value of the second
    argument. 

        Type1 = empty
        Type2 <: xs:anyAtomicType?
        --------------------------------------------
        statEnv |- (FN-URI,"sum")(Type1,Type2) : Type2


        Type1 <: Type3*
        Type3 in {fs:numeric, xs:yearMonthDuration, xs:dayTimeDuration}
        Type2 <: xs:anyAtomicType?
        prime(Type1) = AtomicTypeName1 | ... | AtomicTypeNameN
        AtomicTypeName1 has base atomic type AtomicTypeName1'
        ...
        AtomicTypeNameN has base atomic type AtomicTypeNameN'
        AtomicTypeName1' | ...  | AtomicTypeNamen' = Type4
        second arg contribution for sum of Type1 and Type2 is Type2'
        ---------------------
        statEnv |- (FN-URI,'sum')(Type1,Type2) : Type4 | Type2'

    The second argument's contribution (if any) to the above result type
    is determined as follows.  If the first argument could be the empty
    sequence, we add the type of the second argument to the result type.
    Otherwise, the type of the second argument is ignored.

        empty <: Type1
        --------------
        second arg contribution for sum of Type1 and Type2 is Type2

        not (empty <: Type1)
        --------------
        second arg contribution for sum of Type1 and Type2 is none


    For the purposes of static type analysis, fn:avg($arg) is equivalent to
        fs:div( fn:sum($arg,()), fn:count($arg) )
    Thus, we have the rule:

        statEnv |- (FN-URI,'sum')(Type1, empty) : Type2
        statEnv |- (FS-URI,'div')(Type2, xs:integer) : Type3
        ------------------------------------
        statEnv |- (FN-URI,'avg')(Type1) : Type3

--------
(end of proposal)

Note that the rule for min & max isn't as tight as some might like.

The way that the main rules are written is actually "optimized" based on
various properties of type-promotion as it is currently defined. If the
"can be promoted to" relation is altered/augmented at some point in the
future, we might have to change these rules. I spent some time thinking
about a more future-proof approach, but it got fairly complicated.
Comment 17 Michael Dyck 2008-07-24 18:41:22 UTC
(In reply to comment #12)
> It make more sense
> for the unary version of fn:sum to return empty sequence.

You'd have to raise that as an issue against the F+O spec.
Comment 18 Michael Kay 2008-07-24 18:52:59 UTC
>> It make more sense
>> for the unary version of fn:sum to return empty sequence.

> You'd have to raise that as an issue against the F+O spec.

Please don't. This was considered during the design phase and rejected. It would be horribly incompatible with XPath 1.0, and a horrible usability trap for the vast majority of users of XPath 2.0 who (a) normally apply the sum() function to numbers, (b) regard it as obvious that the sum of an empty set of numbers is zero, and (c) don't actually test their stylesheets against a large set of input documents, so would not spot this problem until they go into production.

(I think we could do better by allowing a processor to return zero in a different numeric type - or perhaps even a zero-length duration - if it is able to do static inferencing on the argument type, but that's a different question).
Comment 19 Michael Dyck 2008-07-30 01:51:21 UTC
At their meeting today, the WGs accepted my proposals in comment #15 and
comment #16.  Consequently, I'm marking this bug resolved-fixed, but I'd
still be interested in seeing a response from either of the CBCL commenters.
Comment 20 Oliver Hallam 2008-07-31 14:22:37 UTC
These type checking rules agree with our implementation except your definition of base atomic type for xs:dayTimeDuration and xs:yearMonthDuration is xs:duration and so for example the following (quite reasonable) query would fail to type check.:

min(sum(A), sum(B))

where A and B produce sequences of one or more xs:dayTimeDurations.

Since we already have the exception for xs:integer, I see no reason not to extend this to exceptions for xs:dayTimeDuration and xs:yearMonthDuration; that is I suggest altering the rules as follows:

    If Type1 is (or is a subtype of) any primitive atomic type other than
    xs:decimal or xs:duration, then that is its base atomic type.

        Type1 <: Type2
        Type2 is a primitive atomic type
        not(Type2 in { xs:decimal, xs:duration })
        --------------
        Type1 has base atomic type Type2

    Similarly for xs:integer, xs:dayTimeDuration and xs:yearMonthDuration
    (which aren't primitive).

        Type1 <: Type2
        Type2 in { xs:integer, xs:dayTimeDuration, xs:yearMonthDuration }
        --------------
        Type1 has base atomic type Type2

    Finally, the special case for xs:decimal:
        Type1 <: xs:decimal
        not( Type1 <: xs:integer )
        --------------
        Type1 has base atomic type xs:decimal

    and for xs:duration:
        Type1 <: xs:duration
        not( Type1 <: xs:yearMonthDuration )
        not( Type1 <: xs:dayTimeDuration )
        --------------
        Type1 has base atomic type xs:duration

Although the last rule is only for completeness and is never applied.


If the concept of base atomic type is being added anyway, then is it worth using it in place of base numeric type for abs, etc. ?
Comment 21 Michael Dyck 2008-07-31 21:46:13 UTC
(In reply to comment #20)
> These type checking rules agree with our implementation except your definition
> of base atomic type for xs:dayTimeDuration and xs:yearMonthDuration is
> xs:duration

Whoops! I forgot that those two aren't primitive. Thanks for catching that.
Your suggested alterations look fine.

> If the concept of base atomic type is being added anyway, then is it worth
> using it in place of base numeric type for abs, etc. ?

Yeah, I was thinking of doing that.

Comment 22 Michael Dyck 2008-08-28 00:59:50 UTC
This issue has been entered as FS erratum E033, and the proposed fixes have
been committed to the source files for the next edition of the FS document.
Consequently, I'm marking this issue CLOSED.