4841 – [FS] Use of fn:subsequence in relation to normalization rules for filter expressions in FS

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4841 - [FS] Use of fn:subsequence in relation to normalization rules for filter expressions in FS

Summary: [FS] Use of fn:subsequence in relation to normalization rules for filter expr...

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Formal Semantics 1.0 (show other bugs)
Version:	Recommendation
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Dyck
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-07-09 14:02 UTC by Oliver Hallam
Modified:	2008-08-17 01:01 UTC (History)
CC List:	0 users

See Also:

Attachments

Description Oliver Hallam 2007-07-09 14:02:10 UTC

Under "Normalization" in section 4.2.2 of the FS specification, there is the following example:

For example, the expression //a[3.4] always returns the empty sequence.

This is really an example of a predicate expression, and so probably should be moved down to section 4.3.2


However, in section 4.3.2, the normalization rules say that filter expressions with numeric literal predicates normalize as follows:

[PrimaryExpr PredicateList [ NumericLiteral ]]Expr
==
let $fs:sequence := [PrimaryExpr PredicateList]Expr return
fn:subsequence($fs:sequence, NumericLiteral,1)

and so the example above normalizes to:
let $fs:sequence := //a return
fn:subsequence($fs:sequence, 3.4, 1)


According to the definition of fn:subsequence in Functions and Operators 15.1.10, fn:subsequence(3.4, 1) is defined to be

$fs:sequence[fn:round(3.4) le $p and $p lt fn:round(3.4) + fn:round(1)]

Apart from being a somewhat circular definition, this contradicts the example above, as it implies that //a[3.4] should return the third item in //a.

Comment 1 Michael Kay 2007-07-09 14:46:05 UTC

It looks to me as if the FS normalization rule in 4.3.2 should apply only to IntegerLiteral and not to other kinds of NumericLiteral. (Real implementations can also infer a static type of empty-sequence when the predicate is a constant numeric value that is non-integer or less than one, but I suspect the FS doesn't want to do that.)

Comment 2 Oliver Hallam 2007-07-09 15:19:47 UTC

I agree, and have moved bug to formal semantics.

Comment 3 Michael Dyck 2007-07-10 22:38:30 UTC

(In reply to comment #0)
> Under "Normalization" in section 4.2.2 of the FS specification, there is the
> following example:
> 
> For example, the expression //a[3.4] always returns the empty sequence.
> 
> This is really an example of a predicate expression, and so probably should be
> moved down to section 4.3.2

I think you meant to say that it's an example of a *filter* expression, since that's the subject of section 4.3.2. But in fact, it isn't a filter expression. Of course, the whole expression is a PathExpr, but we're presumably only interested in the a[3.4] step. That step is not a FilterExpr, but rather an AxisStep (because the 'a' is not a PrimaryExpr but rather a ForwardStep [specifically, a NameTest]). So the appropriate section of the FS is not 4.3.2, but 4.2.1.

However, 4.2.1 has similar normalization rules to 4.3.2, so your point (that //a[3.4] is normalized without involving the []_Predicates rule) is still valid. (In fact, from the combination of 4.2.1 and 4.3.2, we can say that []_Predicates will never be applied to a NumericLiteral or a call to fn:last.)

So, I agree that the example of //a[3.4] is inappropriate for 4.2.2. Instead, something like
    let $v := 3.4 return //a[$v]
(whose normalization *does* involve []_Predicates) would be appropriate for 4.2.2.

----

The XQuery spec defines the dynamic semantics of
    //a[3.4]
and
    let $v := 3.4 return //a[$v]
as follows.

Section 3.2.2 says: "If the value of the predicate expression is a singleton atomic value of a numeric type or derived from a numeric type, the predicate truth value is 'true' if the value of the predicate expression is equal (by the 'eq' operator) to the context position, and is 'false' otherwise."

Then section 3.5.1 says that if you compare 3.4 (a literal of type xs:decimal) to the context position (of type xs:integer) via the 'eq' operator, the two operands are converted (via subtype substitution) to their least common type (i.e., xs:decimal) and then passed to op:numeric-equal(), which will always return false.

Thus, a predicate expression of 3.4 will always result in a predicate truth value of 'false', which will discard all items in the context sequence. So the two example expressions above will always yield the empty sequence (assuming they don't raise some error not pertinent to this discussion).

----

However, as you point out, the FS's normalizations for NumericLiteral as predicate-expresion involve fn:subsequence, which involves rounding, which allows //a[3.4] to return something other than the empty sequence. (Not "the third item in //a", that's (//a)[3.4], but rather every element that is the third a-child of some node in the context tree.) Thus, those normalizations (specifically, 4.2.1 / Norm / rule 1+3 and 4.3.2 / Norm / rule 1) do not correctly reflect the dynamic semantics defined in the XQuery spec.

Michael Kay's suggestion to change NumericLiteral to IntegerLiteral (extended to all three normalization rules) would eliminate this problem, however it would cause the FS to specify less precise types for cases involving other NumericLiterals. This would affect conformance of implementations as follows:
-- Any currently-conforming implementation would become a conforming implementation with a static typing extension for those cases.
-- Conceivably, certain currently-nonconforming implementations would become conforming.

I'm not sure how it would affect conformance of queries; I'm not even sure conformance of queries is defined.

Comment 4 Michael Kay 2007-07-11 10:23:13 UTC

>This would affect conformance of implementations ...

I'm not sure how far you want to go here. Saxon, given the expression A[3.4], will normalize it to () and then infer a type of empty-sequence (which under Saxon's optimistic static typing results in a warning). The FS has been reluctant to introduce normalization rules that take account of the value, as distinct from the type, of the operands: and normalizing to empty-sequence is undesirable as it makes the expression an error. I suppose that the fix that makes most sense is one that infers a cardinality of zero-or-one for any predicate value that is a numeric literal, without doing the incorrect normalization to subsequence().

Comment 5 Michael Dyck 2007-08-07 02:16:37 UTC

(In reply to comment #4)
> 
> The FS has been reluctant to introduce normalization rules that take
> account of the value, as distinct from the type, of the operands:

As far as the processing model is concerned, values aren't available statically, so can't be a basis for normalization. (Even just taking account of the type can be problematic, causing circularities if we're not careful.)

> I suppose that the fix that makes most sense is one that infers a
> cardinality of zero-or-one for any predicate value that is a numeric
> literal, without doing the incorrect normalization to subsequence().

If we don't normalize such cases to a special form, then I think it would be fairly difficult for the FS to infer a restricted cardinality, and perhaps impossible to do so without requiring implementations to infer a restricted cardinality in other unintended cases.

So, I believe the feasible solutions are:

1) Change 'NumericLiteral' to 'IntegerLiteral' in 4.2.1 / Norm / rule 1+3 and 4.3.2 / Norm / rule 1. (Simple, but changes the inferred static type when the predicate is a DecimalLiteral or DoubleLiteral.)

2) Change fn:subsequence to (say) fs:sub, which would be defined to have the same static semantics as fn:subsequence, but without the rounding in its dynamic semantics. This could either be done just in those three normalization rules (which might be somewhat confusing), or throughout the FS.

Comment 6 Michael Kay 2007-08-07 04:04:52 UTC

I don't know what the correct notation is, but I think the required outcome can probably be achieved by normalizing 

PrimaryExpr PredicateList [ NumericLiteral N where (N mod 1 eq 0 and N gt 0)] 

to

fn:subsequence($fs:sequence, NumericLiteral, 1)

and

PrimaryExpr PredicateList [ NumericLiteral N where (N mod 1 ne 0 or N le 0)] 

to

()

This gives the same static inference as the current rules except in cases where the current rules get it wrong.

Unlike the previous proposals, it still infers a singleton for numeric predicates expressed as decimals or doubles provided the actual value is equal to an integer. It's also a very localized change.

It does mean that expressions like foo[3.4] or foo[0] will fall foul of the static typing rule that says empty sequences are bad for your health; but that seems well within the spirit of the Act.

If we were doing this again I think I would want to normalize both filter expressions and subsequence to calls on a primitive such as fs:item-at(seq, int).

Comment 7 Michael Dyck 2007-08-07 08:18:55 UTC

(In reply to comment #6)
> 
> PrimaryExpr PredicateList [ NumericLiteral N where (N mod 1 ne 0 or N le 0)] 
> to
> ()

Yup, we could do that too, so that's solution #3. Like #1, it would result in a different inferred static type in some cases.

> This gives the same static inference as the current rules except in cases 
> where the current rules get it wrong.

Re "get it wrong": The current rules infer a static type that is correct. (It's less precise than possible in some cases, but that wasn't the point of the original comment.) 

> Unlike the previous proposals, it still infers a singleton for numeric
> predicates expressed as decimals or doubles provided the actual value is equal
> to an integer.

In those cases, the current rules and proposal #2 also "infer a singleton", if I understand what you mean by that.

> It's also a very localized change.

They're all pretty localized.

Comment 8 Michael Kay 2007-08-07 11:18:42 UTC

>The current rules infer a static type that is correct. 

Yes, the rules infer a correct static type for x[3.4], but at the cost of computing the wrong value for the expression. I think there are some users out there who care more about the result of evaluating the expression than about its static type - though I agree these users are not part of the target readership for the Formal Semantics ;-)

Comment 9 Michael Dyck 2007-08-09 21:31:21 UTC

> Yes, the rules infer a correct static type for x[3.4], but at the cost of
> computing the wrong value for the expression.

Yes, exactly.

Comment 10 Michael Dyck 2007-10-30 04:31:15 UTC

At meeting 337, I was directed to propose a solution that did not change the results of static type analysis. This excludes all of the above suggestions except #2 in Comment #5. Here is a more fleshed-out version of that solution.

(A)
At every occurrence of "fn:subsequence" in 4.2.1 and 4.3.2:
-- Change "fn:subsequence" to "fs:item-at".
-- Delete the third argument (always "1").

(B)
Create a new section:

  7.1.13 The fs:item-at function

    fs:item-at($sourceSeq as item()*, $loc as xs:double) as item()?

    The fs:item-at function returns the item at a specified position in
    a sequence.

  Dynamic Evaluation

    If $loc is numeric-equal to the position of some item in $sourceSeq,
    that item is returned. (This implies that $sourceSeq is non-empty,
    and $loc is numeric-equal to an integer between 1 and n inclusive,
    where n is the number of items in $sourceSeq.)

    Otherwise, the empty sequence is returned.

    The function is roughly equivalent to the following user-defined
    function.

        declare function fs:item-at(
            $sourceSeq as item()*,
            $loc as xs:double) as item()?
        {
            if ($loc div 1 eq 0) then
                fn:subsequence($sourceSeq,$loc,1)
            else
                ()
        };

  Static Type Analysis

    The static typing rules for invocations of fn:item-at depend on the
    syntactic form of the second argument.  By construction, this must
    be either $fs:last, $fs:position, or a NumericLiteral. Thus, those
    are the only cases considered by the following rules.

    [Then, from 7.2.13 / STA, move all but the last rule (and its
    preceding prose) to here, with the following changes:
        'fn:subsequence'       --> 'fs:item-at'
        (FN-URI,"subsequence") --> '(FS-URI,"item-at")'
        In each rule's conclusion, drop the function call's third arg
        (always "1").
    ]

(C)
In "7.2.13 The fn:subsequence function":

  Introduction:
    Drop it.

  Static Type Analysis:
    Everything except the last rule (the catch-all case) has been moved
    to 7.1.13.

    Change the last rule's preceding prose to: 
        The static type for the fn:subsequence is computed using prime
        and quantifier, which are defined in [8.4 Judgments for FLWOR
        and other expressions on sequences].

(D)
In "C.1 Functions and Operators used in the Formal Semantics", drop the
list item for fn:subsequence.

Comment 11 Michael Kay 2007-10-30 08:44:07 UTC

$loc div 1 eq 0

Should that be

$loc mod 1 eq 0

Comment 12 Oliver Hallam 2007-10-30 11:56:27 UTC

The changes proposed fail to provide a catch all case, and so the type is not defined for an argument with quantifier ? for example.

The typing rules could be made simpler (without changing the result) by just applying the following rules:

statEnv |-  QName of func expands to (FN-URI,"item-at")
statEnv |-  Expr : Type  quantifier(Type) in { 1, + }
--------------------------------------------------------
statEnv |-  QName(Expr, 1) : prime(Type)

statEnv |-  QName of func expands to (FN-URI,"item-at")
statEnv |-  Expr : Type
--------------------------------------------------------
statEnv |-  QName(Expr1, Expr2) : prime(Type) ?


Additionally, the static typing could be improved with the following rule: statEnv |-  QName of func expands to (FN-URI,"item-at")
statEnv |-  Expr : Type  quantifier(Type) in { 1, + }
--------------------------------------------------------
statEnv |-  QName(Expr1, $last) : prime(Type)

Comment 13 Michael Dyck 2007-10-30 15:06:04 UTC

(In reply to comment #11)
> $loc div 1 eq 0
> 
> Should that be
> 
> $loc mod 1 eq 0

Whoops, right, thanks!

Comment 14 Oliver Hallam 2007-10-30 16:04:10 UTC

One other concern with the proposed change is that removing the special case rules from subsequence causes static type errors to occur in XQTS queries such as K-RangeExpr-11 and K-ForExprPropositionalVar-23.

Perhaps the special case for subsequence(Expr,1,1) should be kept.

Comment 15 Michael Dyck 2007-10-31 02:28:17 UTC

(In reply to comment #12)
> The changes proposed fail to provide a catch all case, and so the type is not
> defined for an argument with quantifier ? for example.

Whoops again! You're right, the four moved rules do not cover all cases.

> The typing rules could be made simpler (without changing the result) by just
> applying the following rules:
> 
> statEnv |-  QName of func expands to (FN-URI,"item-at")
> statEnv |-  Expr : Type  quantifier(Type) in { 1, + }
> --------------------------------------------------------
> statEnv |-  QName(Expr, 1) : prime(Type)
> 
> statEnv |-  QName of func expands to (FN-URI,"item-at")
> statEnv |-  Expr : Type
> --------------------------------------------------------
> statEnv |-  QName(Expr1, Expr2) : prime(Type) ?

An excellent simplification. (I had to do some quantifier-arithmetic to convince myself that, in all cases, the resulting type was the same as currently inferred, but I agree that it is.)

> Additionally, the static typing could be improved with the following rule:
> statEnv |-  QName of func expands to (FN-URI,"item-at")
> statEnv |-  Expr : Type  quantifier(Type) in { 1, + }
> --------------------------------------------------------
> statEnv |-  QName(Expr1, $last) : prime(Type)

In fact, fn:subsequence used to have such a rule, but it was removed in the 20050915 WD and hasn't returned. (See Bug 1728 Comment #7.) My impression is that the removal was a mistake, and I think the rule should be reinstated (for item-at). However, that won't be happening as part of this issue, and probably not as an erratum either.

Comment 16 Michael Dyck 2007-10-31 05:55:36 UTC

(In reply to comment #14)
> One other concern with the proposed change is that removing the special case
> rules from subsequence causes static type errors to occur in XQTS queries such
> as K-RangeExpr-11 and K-ForExprPropositionalVar-23.

As far as I can tell, those tests should already be raising errors. More precisely, an implementation that conforms to the Formal Semantics and does not support a static typing extension will raise a type error during static type analysis of either of those tests.

For instance, consider K-RangeExpr-11:
   subsequence(-3 to -1, 1, 1) eq -3

For inferring the static type of the call to fn:subsequence, you might think that the first rule of 7.2.13 applies, yielding xs:integer. However, static type analysis always takes place on the normalized query.  Normalization transforms the query to:

   fs:eq(
      fs:convert-operand(
         fn:data((
            subsequence(
               fs:to(
                  (fs:unary-minus(fs:convert-operand(fn:data((3)),1.0E0))),
                  (fs:unary-minus(fs:convert-operand(fn:data((1)),1.0E0)))
               ),
               fs:convert-simple-operand(fn:data(1),1.0E0),
               fs:convert-simple-operand(fn:data(1),1.0E0),
            )
         )),
         "string"
      ),
      fs:convert-operand(
         fn:data((
            fs:unary-minus(fs:convert-operand(fn:data((3)),1.0E0))
         )),
         "string"
      )
   )

The normalized call to subsequence() matches none of the specialized forms in 7.2.13; only the final catch-all rule applies. The type of the first arg is xs:integer*, so prime(Type) is xs:integer and quantifier(Type) is *. Thus, the inferred type of the subsequence() call is xs:integer*. Ditto for the fn:data() call and fs:convert-operand() call that contain it.

So when it comes to typing the fs:eq() call, the xs:integer* argument type isn't accepted by the rules of C.2, and static type inference fails. (A type error is raised.)

Of course, an implementation is free to support a static typing extension, in which the subsequence() call might have a type of xs:integer? or xs:integer. In such a case, static typing of fs:eq() would succeed (yielding xs:boolean? or xs:boolean respectively).

Comment 17 Michael Dyck 2007-11-13 23:46:11 UTC

As directed by action item A-346-02, here is the consolidation of my proposal from Comment #10 with the amendments of Comment #11 (#13) and most of Comment #12 (#15), plus some minor editorial tweaks.

(A)
At every occurrence of "fn:subsequence" in 4.2.1 and 4.3.2:
-- Change "fn:subsequence" to "fs:item-at".
-- Delete the third argument (always "1").

(B)
Create a new section:

  7.1.13 The fs:item-at function

    fs:item-at($sourceSeq as item()*, $loc as xs:double) as item()?

    The fs:item-at function returns the item at a specified position in
    a sequence.

  Dynamic Evaluation

    If $loc is numeric-equal to the position of some item in $sourceSeq,
    that item is returned. (This implies that $sourceSeq is non-empty,
    and $loc is numeric-equal to an integer between 1 and n inclusive,
    where n is the number of items in $sourceSeq.)

    Otherwise, the empty sequence is returned.

    The function is roughly equivalent to the following user-defined
    function.

        declare function fs:item-at(
            $sourceSeq as item()*,
            $loc as xs:double) as item()?
        {
            if ($loc mod 1 eq 0) then
                fn:subsequence($sourceSeq,$loc,1)
            else
                ()
        };

  Static Type Analysis

    The static typing rules for invocations of fs:item-at depend on the
    syntactic form of the second argument. If it is the IntegerLiteral 1,
    then we can be relatively precise about the resulting type.

        statEnv |-  QName of func expands to (FS-URI,"item-at")
        statEnv |-  Expr1 : Type1
                    quantifier(Type1) in { 1, + }
        --------------------------------------------------------
        statEnv |-  QName(Expr1, 1) : prime(Type1)

    Otherwise, the following less precise rule is used.

        statEnv |-  QName of func expands to (FS-URI,"item-at")
        statEnv |-  Expr1 : Type1
        --------------------------------------------------------
        statEnv |-  QName(Expr1, Expr2) : prime(Type1) ?

    (Since invocations of fs:item-at arise only as the result of
    particular normalization rules, Expr2 in the above rule must be
    either $fs:last, $fs:position, or a NumericLiteral.  Thus, there
    is no need to check its type.)


(C)
In "7.2.13 The fn:subsequence function":

  Introduction:
    Drop it.

  Static Type Analysis:
    Drop everything except the last rule (the catch-all case).

    Change the last rule's preceding prose to:
        The static type of a call to fn:subsequence is computed using
        prime and quantifier, which are defined in [8.4 Judgments for
        FLWOR and other expressions on sequences].

(D)
In "C.1 Functions and Operators used in the Formal Semantics", drop the
list item for fn:subsequence.

Comment 18 Michael Dyck 2007-11-28 23:45:04 UTC

At its meeting on November 27th, the Joint XML Query WG and XSL WG approved the
solution presented in Comment #17.

Comment 19 Michael Dyck 2008-08-17 01:01:21 UTC

This issue has been entered as FS erratum E024, and its fix has been
committed to the source files for the next edition of the FS document.
Consequently, I'm marking this issue CLOSED.

(By the way, I also noticed (and fixed) a problem with the the catch-all
(i.e., last) rule for fn:subsequence in FS 7.2.13: it required that the
type of the arguments for $startingLoc and $length be xs:double, whereas
it should merely require that those types be *convertible* to xs:double.)