4441 – [XQuery] 2.3.4 Optimization rules unclear?

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED INVALID

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XQuery 1.0
Version:	Recommendation
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Don Chamberlin
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2007-03-31 12:35 UTC by Hans-Juergen Rennau
Modified:	2007-04-12 00:41 UTC
CC List:	1 user

Description Hans-Juergen Rennau 2007-03-31 12:35:06 UTC

In my opinion, section 2.3.4 Errors and Optimization does not define the implementor's freedom in optimizing the evaluation process clearly enough. Full clarity is needed, however, when the query writer wants to enforce the evaluation of a function whose return value will not be used afterwards. The text states:

If, at such an intermediate stage of evaluation, a processor is able to establish that there are only two possible outcomes of evaluating Q, namely the value V or an error, then the processor may deliver the result V without evaluating further items in the operand E.

Please consider explicit clarification of the following question: May the processor skip the evaluation of a function call based on an evaluation of the function's return type?

Explanation
===========
For example, consider this code:

declare function local:prepareEnvironment() as xs:boolean {};

if (count(local:prepareEnvironment()) lt 1) then error((), "NEVER") else
local:produceResult()

This construction is meant to ensure that prepareEnvironment is indeed evaluated, and that it is evaluated before produceResult. But can one be sure that prepareEnvironment will be evaluated? The answer hinges on the clarification requested above.

Postscriptum
============
If the processor may indeed perform such skips, these would be the consequences:
- functions with return type empty-sequence() never need to be evaluated
- it becomes a serious problem to enforce the evaluation of a function call the result of which must be discarded (namely, is neither part of the result nor is it allowed to influence the result) 

Regards,
- Hans-Juergen

Comment 1 Michael Kay 2007-03-31 14:12:38 UTC

Personal response:

The rules (in particular, the sentence that you cite) make it clear that the processor does not need to call local:prepareEnvironment in this situation, because the result of the call (unless it fails) has no effect on the final result of the query.

My guess from the name of your function is that local:prepareEnvironment() is designed to have side-effects. This is also suggested by your phrase "when the query writer wants to enforce the evaluation of a function the return value of which will not be used afterwards".

XQuery doesn't say very much about the semantics of calls to external functions, and doesn't mention what happens if they have side-effects. The XSLT 2.0 spec (perhaps in the light of field experience with 1.0) has a number of warnings and caveats on this subject, saying that they are not prohibited but warning that since the order of execution is not well-defined, the results are unpredictable.

The WG tried to place as few constraints as possible on optimizers. It would be most unreasonable in my view to say that the rewrite (count(X) lt 1) => false() is not allowed when X has a static cardinality of exactly one.
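As an illustration of the rewrite mentioned above, here is a minimal sketch. The function bodies and the error name local:NEVER are hypothetical placeholders, since the original report leaves the bodies empty; only the shape of the conditional comes from the report.

```xquery
xquery version "1.0";

(: Hypothetical body: the original report leaves it empty. :)
declare function local:prepareEnvironment() as xs:boolean {
  true()
};

(: Hypothetical placeholder for the main part of the query. :)
declare function local:produceResult() as xs:string {
  "result"
};

(: As written by the query author. The declared return type xs:boolean
   means the call yields exactly one item, if it returns at all. :)
if (count(local:prepareEnvironment()) lt 1)
then error(xs:QName("local:NEVER"))
else local:produceResult()

(: Since count(X) lt 1 is statically false() when X has cardinality
   exactly one, a processor may evaluate only the else branch,
   that is, just the call to local:produceResult(), and never call
   local:prepareEnvironment() at all. :)
```

Whether a particular processor actually performs this rewrite is implementation-dependent; the sketch only shows why the rules permit it.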

XQuery doesn't provide any mechanism to force evaluation of expressions whose result is not used. It's open to implementors to provide such mechanisms using the extensibility features in the language.

Michael Kay

Comment 2 Martin Probst 2007-03-31 23:02:02 UTC

Also, it's often possible to re-write functions in a way that doesn't use side effects. E.g. you could formalize the environment using a variable (even if it doesn't really contain a value but evaluates to the empty sequence):
let $env := local:prepareEnvironment()
return ($env, local:do-something())

This enforces evaluation of the method body in all implementations if I'm not mistaken. Or you might actually find something useful to return...

Comment 3 Michael Kay 2007-03-31 23:28:46 UTC

<quote>
let $env := local:prepareEnvironment()
return ($env, local:do-something())

This enforces evaluation of the method body in all implementations
</quote>

Well, it's not 100% guaranteed. But Saxon for example is careful to treat an external function with a Java return type of "void" as having a static type of item()* rather than empty-sequence(), to stop the optimizer doing anything "clever" with such a construct.

It can still be difficult to enforce the order in which such functions are evaluated.

Michael Kay

Comment 4 Martin Probst 2007-04-01 12:13:48 UTC

It's probably even valid to treat anything that returns 'empty-sequence()' as a function with possible side-effects - why would you call it otherwise?

Comment 5 Michael Kay 2007-04-01 16:35:16 UTC

If you've inferred a static type of empty-sequence() for an expression such as

para[self::section]

then you'd be wasting all that effort if you then treated it as a function with possible side-effects!

Comment 6 Hans-Juergen Rennau 2007-04-02 16:51:38 UTC

You write:

"XQuery doesn't provide any mechanism to force evaluation of expressions whose
result is not used. It's open to implementors to provide such mechanisms using the extensibility features in the language."

And in another comment:

"It can still be difficult to enforce the order in which such functions are
evaluated."

The enforcement of (a) execution and (b) order of execution is of vital importance when writing standalone queries of extended functionality, e.g. standalone queries opening, using and closing JDBC connections. Don't you think that query authors need a reliable, standards-based pattern, rather than an implementation-specific feature? I suppose that the pattern described below is in fact reliable; the only constraint on the enforced functions is that they do not have the return type empty-sequence(). If I am wrong and the pattern is not reliable - cordial thanks to you or any implementor correcting my error.

(a) To enforce the silent execution of functions, wrap the function calls in a call to silentExec (see below)

(b) Let S1 be a sequence of function calls to be silently executed, and S2 the part of the query to be evaluated after execution of S1. To achieve this, use the pattern 
	if (local:silentExec(S1)) then error() else S2

(c) Example:
if (local:silentExec((local:initQuery(), java:initEnv())))
then error((), "NEVER") else
...

Sketch of local:silentExec (can certainly be improved, e.g. using an appropriate theorem):

declare function local:silentExec($items as item()*) as node()* {
   let $obscure as xs:integer := sum(for $item at $pos in $items return 
      string-length(concat(xs:string($pos),
         typeswitch($item)
            case $i as node() return $i/root()/comment()[1]
            default $i return xs:string($i)
      )))
   return
      comment {"NEVER"} 
         [9999999 eq sum(for $digit in string($obscure) return number($digit) * number($digit))]
};

Comment 7 Hans-Juergen Rennau 2007-04-02 17:13:35 UTC

(In reply to comment #2)

But there remain two problems, if I am not mistaken:

a) the order of evaluation is not guaranteed - the processor may choose to execute 'do-something' first and only afterwards 'prepareEnvironment'

b) if $env must not leave any trace in the query result (as was the case in the scenario we started at), there are two possibilities:

- if the code is based on the assumption that $env evaluates to the empty sequence, we have constraints on the function (empty result, but static return type T? or T*) - therefore the pattern is not generic

- otherwise I must still get rid of $env, and therefore I am still not on dry land - the riddance offers the processor a chance of optimizing the function call away

Comment 8 Michael Kay 2007-04-02 21:12:45 UTC

>The enforcement of  (a) execution and ( b) order of execution is of vital importance when writing standalone queries of extended functionality, e.g.
standalone queries opening, using and closing JDBC connections.

Then I would expect any vendor providing such extensions to provide them in such a way that they are usable for the purpose.

There's been initial design work done on procedural extensions to XQuery (see papers on XQueryP) which will provide semantics for extensions with side-effects. In the meantime it's up to vendors defining such extensions to explain to users how they should be used, and what guarantees are offered by the implementation.

Comment 9 Martin Probst 2007-04-03 16:37:06 UTC

I don't think it's necessary to write horrible code to trick an XQuery interpreter into executing statements in order. You can simply make them depend on each other via input and output data, and processors will be forced to execute them in order.

let $jdbc:connection := jdbc:get-connection('jdbc://some/url')
let $jdbc:result := jdbc:execute-sql($jdbc:connection, 'SELECT * FROM foo')
let $jdbc:connection := $jdbc:result/connection
 ... do something with the result ...
let $jdbc:close-token := jdbc:close($jdbc:connection)
return $jdbc:close-token

The $jdbc:* stuff doesn't actually need to really _be_ the connection or anything, it's sufficient if it's some sort of token, even a string simply containing 'hello world' will do.

I've seen people doing these side-effect calls several times, and I really think it's not a good idea. It might be a bit ugly to return stuff like $jdbc:close-token from the query, be it empty or an XML comment or anything, but it sure is better than having ugly side effects happening behind the back of your query processor. Especially with all the problems that ensue from that.

Comment 10 Hans-Juergen Rennau 2007-04-04 04:16:46 UTC

(In reply to comment #8)

First

I am not the only developer who has to write queries which
a) are production-quality
b) are portable (across processors and versions)
c) must initialize, use and clean up the external environment, successively

These queries must be written now, not in a couple of years, and I think it is possible.

Second

The word side-effects deserves closer inspection. I discern two types:
-	type 1 concerns the interaction with the external environment
-	type 2 concerns the update of XDM instances and variable bindings

Type 1 is an inherent part of XQuery 1.0, due to the feature of external functions. Type 2 is not.

XQueryP creates and tackles type 2 (along with type 1). Exciting work, but probably not a standard for a while yet.

Type 1 is an inherent part of XQuery 1.0. The specification allows external functions, so I refuse to accept that it may discourage a basic pattern of init/main/exit. Rather, it should support the developer who is obliged to realize it, adding a note or two on how to enforce execution, and sequence of execution.

Comment 11 Hans-Juergen Rennau 2007-04-07 13:09:19 UTC

(In reply to comment #9)

<quote>
You can simply make them depend on each other via input and output data, and processors will be forced to execute them in order.
</quote>

The pattern you suggest is in my opinion not reliable, as the evaluation of a function may simply ignore any argument not needed for determining the function result. So you have to begin writing something strange *within* the dependent functions (those which require a precursor). The whole approach amounts to a heavy load on function design: signature extended by unnecessary input, and function body extended by tricky code making the input seem necessary. (Further complication ensues if a dependent function needs the preceding execution of not one but several functions.) The bottom line is that though the technique of enforcing sequence by output/input chains is fine as an ad hoc approach in a specific situation, it does not seem to qualify as a reusable pattern to be employed in a consistent way in a larger collection of library modules. Also please keep in mind that it is not sufficient if testing proves that the code works with the present processor version; the code must be safe against future improvements of optimization.
 
<quote>
I've seen people doing these side-effect calls several times, and I really think it's not a good idea.
</quote>

I am not sure I understand what you mean by doing these side-effect calls. It cannot be that you discourage the side-effect itself (e.g. the opening of a connection). So maybe you mean decoupling the side-effect from a visible result to be passed on to a consumer?

Comment 12 Martin Probst 2007-04-10 07:29:15 UTC

Regarding the first part, I presumed that your functions eventually end up in Java, or anywhere outside of the scope of your processor's optimizations, so it cannot really tell if a parameter is used or not.

"It cannot be that you discourage the side-effect itself (e.g. the opening of a connection)"

I actually do, at least within a functional language. 

Of course, connections need to be opened, so there is state that needs to be handled. However, trying to handle state in a functional language, which is by definition stateless, is a really bad idea and will lead to many problems, IMHO (I don't know about the host language part of your implementation, but what about multi-threading?). I think it's possible to encapsulate such state management outside of XQuery, in the host language, and that's what I would do.

"signature extended by unnecessary input, and function body extended by tricky code making the input seem necessary."

The point is that in your example with a function "prepareEnvironment", the initialized environment is actually the return value of the function, and following functions obviously take the environment as necessary input, and depend on it. And they do use the input (the initialized environment) in their bodies. The difference between imperative languages like Java and functional languages like XQuery is that XQuery makes this dependency explicit and visible. And that is A Good Thing (tm), as it has lots of benefits (e.g. optimizability, code easier to understand, ...).
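The explicit dependency described above might look like the following minimal sketch. The element name env, its ready attribute, the error name local:NOT-READY and both function bodies are assumptions made for illustration; none of them come from the original report.

```xquery
xquery version "1.0";

(: Hypothetical: the initialized environment is the function's return value. :)
declare function local:prepareEnvironment() as element(env) {
  <env ready="true"/>
};

(: Hypothetical: the result genuinely depends on the environment passed in,
   so the dependency is visible to the processor. :)
declare function local:produceResult($env as element(env)) as xs:string {
  if ($env/@ready = "true")
  then "result"
  else error(xs:QName("local:NOT-READY"))
};

local:produceResult(local:prepareEnvironment())
```

This only makes the data dependency visible. What a given processor guarantees about external functions with side-effects remains implementation-dependent, as noted in the earlier comments.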

Comment 13 Daniel Engovatov 2007-04-12 00:41:34 UTC

The working group believes that this is an important use case. A requirements document has recently been published at http://www.w3.org/TR/xquery-sx-10-requirements/. It describes a planned XQuery extension that should address this issue.

At the same time, as was noted in comments, in particular in comment #1, this is not a valid XQuery 1.0 bug.  We have voted to close this bug.