This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4446 - [XQuery] 2.3.4 Equivalent expressions
Summary: [XQuery] 2.3.4 Equivalent expressions
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 2.0 (show other bugs)
Version: Recommendation
Hardware: PC Windows XP
: P2 normal
Target Milestone: ---
Assignee: Don Chamberlin
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-02 21:27 UTC by Hans-Juergen Rennau
Modified: 2010-02-11 00:22 UTC (History)
1 user (show)

See Also:


Attachments

Description Hans-Juergen Rennau 2007-04-02 21:27:30 UTC
Section 2.3.4 Errors and Optimization introduces the notion of equivalent expressions:

 implementations are free to rewrite expressions into equivalent expressions. Other than the raising or not raising of errors, the result of evaluating an equivalent expression must be the same as the result of evaluating the original expression.

If  I understand the text correctly, this amounts to the definition of equivalent expressions.

Subsequent text shows that the expression

$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]

is unsafe, whereas

$N[if (@x castable as xs:date) then xs:date(@x) gt xs:date("2000-01-01")
   else false()]

is safe. But are these not equivalent expressions, according to the definition? If so, the second expression is not safe, because it might be rewritten! And I even wonder if the expression:

	if (not(local:isInputOK())) then error((),blablabla) 
        else local:doSomething()

is not equivalent to the expression

	local:doSomething()

because other than the raising or not raising or errors, the result is the same! I am sure no implementation would do such a rewrite, but the example points to the basic uncertainty.

Please consider explicit clarification of the notion equivalent expression so as to ensure fully predictable query results.
 
Regards,
- Hans-Juergen
Comment 1 Michael Kay 2007-06-06 09:38:25 UTC
There has been some email discussion on this within the WG, and there was some sentiment for leaving the text as it is on the grounds that we've spent a lot of time on the area and this is the best we can do; however I volunteered and was given an action (A-332-03) to see if we can't make any editorial improvements.

It seems to me that the complaint here is that the term "equivalent expression" has no definition, other than the circular definition implied from the sentence where the term is introduced. At the same time, it's not clear (to one reader at least) that the following paragraphs concerning conditional expressions and typeswitch are intended to qualify the general statement allowing arbitrary rewrites. There are other problems: it's undefined what it means for one result to be "the same as" another, and the freedom in respect of errors shouldn't extend to static errors. Finally, I think we can do better in terms of setting expectations about where rewrites might change error behaviour, without being over-prescriptive.

The following is my attempt to improve this section. Text in [#...#] is my commentary not intended for inclusion in the final spec.

<new>
For a variety of reasons, including optimization, implementations MAY rewrite expressions into a different form. There are a number of rules that limit the extent of this freedom:

* Other than the raising or not raising of errors, the result of evaluating a rewritten expression MUST conform to the semantics defined in this specification for the original expression. [# remove "be the same as", because where the specification allows implementations latitude e.g. over ordering, rewrites can exploit this freedom #]

  Note: this allows an implementation to return a result in cases where the original expression would have raised an error, or to raise an error in cases where the original expression would have returned a result. The main cases where this is likely to arise in practice are (a) where a rewrite changes the order of evaluation, such that a subexpression causing an error is evaluated when the expression is written one way and is not evaluated when the expression is written a different way, and (b) where intermediate results of the evaluation cause overflow or other out-of-range conditions. [# This note is non-normative and unenforceable, but sets expectations. It gives users a bit of leverage over vendors who decide that 1 div 0 is 42 #]

  Note: this rule does not mean that the result of the expression will always be the same in non-error cases as if it had not been rewritten, because there are many cases where the result of an expression is to some degree implementation-dependent or implementation-defined.

*  Conditional and typeswitch expressions MUST NOT raise a dynamic error in respect of subexpressions occurring in a branch that is not selected. Thus, the following example MUST NOT raise a dynamic error if the document abc.xml does not exist:

if (doc-available('abc.xml')) then doc('abc.xml') else ()

* As stated earlier, an expression MUST NOT be rewritten to dispense with a required cardinality check: for example, string-length(//title) MUST raise an error if the document contains more than one title element

* Expressions must not be rewritten in such a way as to create or remove static errors. [# see bug #3773 #] For example, there is a rule that in casting a string to a QName the operand must be a string literal. This rule applies to the original expression and not to any rewritten form of the expression.


Expression rewrite is illustrated by the following examples.

    * Consider the expression //part[color eq "Red"]. An implementation might choose to rewrite this expression as //part[color = "Red"][color eq "Red"]. The implementation might then process the expression as follows: First process the "=" predicate by probing an index on parts by color to quickly find all the parts that have a Red color; then process the "eq" predicate by checking each of these parts to make sure it has only a single color. The result would be as follows:
          o  Parts that have exactly one color that is Red are returned.
          o  If some part has color Red together with some other color, an error is raised.
          o The existence of some part that has no color Red but has multiple non-Red colors does not trigger an error.
    
*      The expression in the following example cannot raise a casting error if it is evaluated exactly as written (i.e., left to right). Since neither predicate depends on the context position, an implementation might choose to reorder the predicates to achieve better performance (for example, by taking advantage of an index). This reordering could cause the expression to raise an error.

      $N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]

      To avoid unexpected errors caused by expression rewrite, tests that are designed to prevent dynamic errors should be expressed using conditional or typeswitch expressions. For example, the above can be written as:

$N[if (@x castable as xs:date)
   then xs:date(@x) gt xs:date("2000-01-01")
   else false()]

</new>
Comment 2 Michael Kay 2007-06-15 14:46:11 UTC
Looking at a bug report that came in from a customer yesterday, I think the rule for conditional expressions could be a little stronger. The proposed text was:

*  Conditional and typeswitch expressions MUST NOT raise a dynamic error in
respect of subexpressions occurring in a branch that is not selected.

My customer is doing

if (...some complex condition)
  then true()
  else error()

and under the current rules (even after the proposed modifications)

(a) the processor can determine that the result is either true() or an error without evaluating one of the operands, namely the condition

(b) it's not covered by the the rule for conditional expressions as phrased above.

So I propose a stronger wording for this rule:

*  Conditional and typeswitch expressions MUST NOT raise a dynamic error in
respect of subexpressions occurring in a branch that is not selected, and must not return the value delivered by a branch unless that branch is selected.
Comment 3 Michael Dyck 2007-06-17 19:52:34 UTC
> (a) the processor can determine that the result is either true() or an
> error without evaluating one of the operands, namely the condition

And that's a perfectly valid deduction to make. I think the behaviour you're concerned about is for the processor to *always* return true() for this example. 

> *  Conditional and typeswitch expressions MUST NOT raise a dynamic error in
> respect of subexpressions occurring in a branch that is not selected, and must
> not return the value delivered by a branch unless that branch is selected.

How about "... MUST NOT evaluate a branch that is not selected" ?
Comment 4 Michael Dyck 2007-06-17 20:02:47 UTC
> How about "... MUST NOT evaluate a branch that is not selected" ?

(Interestingly, the current text of 3.10 and 3.12.2 says, roughly, that the processor *can* evaluate an unselected branch, but must ignore any dynamic errors it encounters there.)
Comment 5 Hans-Juergen Rennau 2007-06-17 21:45:25 UTC
(In reply to comment #2)

1. "unless that branch is selected"
===================================
> *  Conditional and typeswitch expressions ... must
> not return the value delivered by a branch unless that branch is selected.

Please check my understanding: is this equivalent to stating that "if the conditional expression as a whole is evaluated, the condition MUST be evaluated"? In other words, what does it exactly mean that a branch has been *selected*? I guess (and hope) it means that the branch in question has been selected by evaluating the condition. (How else?) But it seems to me the present wording might still leave room for uncertainty. For example, one might start to wonder if

> if (...some external function call)
>   then true()
>   else true()

might be rewritten simple as
   true()
arguing that in this case one at least has not returned the result of a branch that has *not* been selected... Probably a fallacious argument, but at any rate I suggest a wording that explicitly demands the evaluation of the condition. Or have I misunderstood your intention?

2. "semantics of conditional expressions"
=========================================
> Other than the raising or not raising of errors, the result of evaluating a
> rewritten expression MUST conform to the semantics defined in this
> specification for the original expression.

Concerning condition expressions, I wonder if the intuitive order of execution - first the condition, then the selected branch - is part of the semantics which are protected by the new wording. The spec 3.10 Conditional Expressions says: "The first step in processing a conditional expression is to find the effective boolean value of the test expression, ..."

So, to frame my question into a concrete testcase:
could the expression
if (java:openBase ne true()) then java:recoverOpenError() else true()

be rewritten this way:
let $res := java:recoverOpenError()
return if (java:openBase ne true()) then $res else true()

I hope it cannot, that is, I hope the execution order "condition first, branch second" is guaranteed, as this leaves to the query writer one of the last possibilities to control sequence...
Comment 6 Hans-Juergen Rennau 2007-06-17 22:19:32 UTC
(In reply to comment #3)
> I think the behaviour you're
> concerned about is for the processor to *always* return true() for this
> example. 
> ...
> How about "... MUST NOT evaluate a branch that is not selected" ?

It seems to me the problem is that the processor might skip the evaluation of the condition. If so, your proposal would less reliably enforce the execution, as compared to the wording of Michael Kay (if I understand it correctly, see comment #5).

As an example, consider the case that the conditional expression is operand of the fn:count function, and that both branches consist of function calls whose signature guarantees exactly one result item. It seems to me that your wording would permit to rewrite the conditional expression as the expression "1", whereas the original proposal would not allow such a rewrite, because it would amount to returning the result of a branch that has not been selected. 
Comment 7 Michael Kay 2007-06-17 23:15:33 UTC
>As an example, consider the case that the conditional expression is operand of the fn:count function, and that both branches consist of function calls whose signature guarantees exactly one result item. It seems to me that your wording would permit to rewrite the conditional expression as the expression "1", whereas the original proposal would not allow such a rewrite, because it would amount to returning the result of a branch that has not been selected.

I think that it in cases where the result of an expression can be deduced from (complete or partial) knowledge of the static type, then it should always be possible to return the result without actually evaluating the expression. For example, it is legitimate to return "true" as the result of

(if (EXPR) then 1 else 2) instance of xs:integer

without evaluating EXPR, and the same applies to

exists(if (EXPR) then 1 else 2)
Comment 8 Michael Dyck 2007-06-17 23:44:48 UTC
(In reply to comment #6)

> As an example, consider the case that the conditional expression is operand of
> the fn:count function, and that both branches consist of function calls whose
> signature guarantees exactly one result item. It seems to me that your wording
> would permit to rewrite the conditional expression as the expression "1",

I think you mean rewrite the fn:count(...) call as 1.

> whereas the original proposal would not allow such a rewrite, because it would
> amount to returning the result of a branch that has not been selected.

But it doesn't amount to evaluating a branch that has not been selected?

(I'm not sure it amounts to either, but I don't see how you can say it's one
and not the other.)
Comment 9 Hans-Juergen Rennau 2007-06-18 23:06:13 UTC
(In reply to comment #7)

> For example, it is legitimate to return "true" as the result of
> (if (EXPR) then 1 else 2) instance of xs:integer 
> without evaluating EXPR

Yes, I see. So my error was to apply your constraint

> Conditional and typeswitch expressions MUST NOT ... must not return the
> value delivered by a branch unless that branch is selected.

to the count(EXPR) example, ignoring that EXPR is not *evaluated*.

Apparently, I got confused what the word 'evaluating' means. I think that "to evaluate an expression" means to replace the expression by its value. And I now see that in the count(EXPR) - example one has done "something" with the argument expression - extracted information from it - but not evaluated it. 

Trying to arrive at a crystal clear understanding of what the constraint which you propose means, I now think:

"must not return the value delivered by a branch unless that branch is selected" applies only to those cases where the (conditional/typeswitch) expression is fully evaluated, that is, replaced by its value.

I added "fully" because sometimes a  conditional expression might be partially evaluated in the sense of determining a value set guaranteed to contain the real value. For example let EXPR denote the expression "if ($x eq 0) then 23 else 45". Then the expression: 
   EXPR < 100
can be evaluated, not replacing EXPR by its value, but analyzing the set of possible values. These cases are not limited to where the superset is statically known. (An example is given by replacing '23' and '45' by $a and $b.)

I believe you would not want your constraint to apply to such cases - hence "fully". 
Comment 10 Hans-Juergen Rennau 2007-06-18 23:15:31 UTC
(In reply to comment #8)
> But it doesn't amount to evaluating a branch that has not been selected?
> (I'm not sure it amounts to either, but I don't see how you can say it's one
> and not the other.)

You are right!

(I mixed up the extraction of information with real evaluation. In comment #9 I tried to arrive at a clear distinction.)
Comment 11 Michael Kay 2007-07-31 19:16:26 UTC
The WG discussed this at the F2F meeting in Pisa (see minutes at http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2007Jun/0140.html, member-only). The resolution is to accept the text proposed in comments #1 and #2. I am marking this FIXED for the moment; I'll defer marking it closed until the procedure for managing closure of fixed bugs is clarified.
Comment 12 Michael Kay 2007-11-16 11:53:37 UTC
The agreed change will be implemented in erratum XP.E4 for XPath, and XQ.E4 for XQuery.