5332 – [UPD] Parentheses around () or fn:error()

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Update Facility
Version:	Last Call drafts
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Andrew Eisenberg
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2007-12-31 14:46 UTC by Michael Kay
Modified:	2011-01-07 11:27 UTC
CC List:	3 users

Description Michael Kay 2007-12-31 14:46:09 UTC

The XQuery Update specification allows the expressions () or fn:error() to appear in certain contexts where updating expressions may appear. However, it does not allow the expression () or fn:error() to be enclosed in parentheses in such contexts. Thus it is OK to write:

if (b)
  then delete node $x
  else fn:error()

But it is wrong to write:

if (b)
  then (delete node $x)
  else (fn:error())

This violates the common expectation that any expression can be enclosed in parentheses without changing its meaning or validity. It makes life unnecessarily difficult for software that is generating XQuery code (for example, a stylesheet that generates XQuery from XQueryX), and it also makes life unnecessarily difficult for people implementing XQuery parsers, because they have to retain some representation of redundant parentheses in the expression tree until after semantic checks have been performed.

The problem can be solved by partitioning expressions into three categories instead of two: updating, non-updating, and neutral. The expressions () and fn:error() should fall into the neutral category, and a parenthesized expression should have (as now) the same category as its content. Contexts that currently allow an updating expression or () or fn:error() should allow any updating or neutral expression. (And of course, places that require a non-updating expression should now require a non-updating or neutral expression.)
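
As a rough illustration of the three-way partition proposed above, the following Python sketch assigns a category to a tiny expression tree. It is not taken from the specification: the class and function names are hypothetical, and the way a conditional combines its branches (neutral branches are ignored, and two neutral branches give a neutral conditional) is an assumption added for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    UPDATING = "updating"
    NON_UPDATING = "non-updating"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Leaf:
    """A primitive expression whose category is already known."""
    text: str
    category: Category


@dataclass(frozen=True)
class Paren:
    """A parenthesized expression: same category as its content."""
    inner: object


@dataclass(frozen=True)
class If:
    """A conditional; only the two branches affect the category."""
    then: object
    orelse: object


def categorize(expr):
    """Return the Category of expr, or raise ValueError on a mixed conditional."""
    if isinstance(expr, Leaf):
        return expr.category
    if isinstance(expr, Paren):
        # Parentheses are transparent, so redundant ones can be discarded early.
        return categorize(expr.inner)
    if isinstance(expr, If):
        found = {categorize(expr.then), categorize(expr.orelse)}
        found.discard(Category.NEUTRAL)  # assumption: neutral fits either side
        if not found:
            return Category.NEUTRAL
        if len(found) == 1:
            return found.pop()
        raise ValueError("branches mix updating and non-updating expressions")
    raise TypeError(f"unsupported expression: {expr!r}")


# The two conditionals from the description, without and with parentheses.
delete = Leaf("delete node $x", Category.UPDATING)
error = Leaf("fn:error()", Category.NEUTRAL)
plain = If(delete, error)
wrapped = If(Paren(delete), Paren(error))
print(categorize(plain), categorize(wrapped))
```

Under this model both forms receive the same category, because the category is computed from the content of the parentheses and never from their presence.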

I would suggest that fn:trace() should also be added to the neutral category.

Comment 1 Michael Kay 2008-01-01 18:49:04 UTC

Please ignore the comment about fn:trace() - allowing update instructions to be traced is a much more complex subject than I thought.

I've been trying to implement the rules that () and fn:error() (and no other non-updating expressions) are allowed in updating contexts, and the rules are surprisingly disruptive. The problem is that you can't tell whether one branch of a conditional, say, is an updating expression until you are well into the semantic analysis (because you can't do it until function calls have been bound to the relevant function declarations), and by the time you get to this stage of semantic analysis, a lot of the original syntactic detail will have been lost. Not just redundant parentheses, as mentioned in comment #0, but probably quite a lot else too. Many parsers work by constructing an expression tree in some kind of "core grammar" that discards syntactic distinctions - in my case, for example, this reduces typeswitch to an if/then/else construct. 

Ideally the rule that allows () and fn:error() in updating contexts should be rephrased to work in terms of the type system, so that the rule relates to the type of the result of the expression, and not to its syntactic form. That's probably a significant challenge. Short of that, I would suggest two new rules:

(a) If the processor is able to infer that the result of an expression will always be an empty sequence or an error then it may allow that expression to appear in a position where an updating expression would be allowed.

(b) If the processor is able to infer that a branch of a conditional/typeswitch will never be evaluated, then it is allowed to eliminate that branch before determining whether all branches are consistently updating or non-updating. (This would be the case if it were a dynamic error or type error rather than a static error, and one way of achieving this rule would be to reclassify the error.)

Comment 2 Michael Kay 2008-01-01 19:04:08 UTC

Another observation: the following is allowed by the current rules:

if ($a=1) 
  then ()
else if ($a=2)
  then error()
else if ($a=3)
  then delete node $x
else ()

but the following simplification is disallowed:

if ($a=3)
  then delete node $x
else if ($a=2)
  then error()
else ()

The reason is that in the second case the "else" branch of the outer "if" is not an updating expression and is not one of the two permitted exceptions (() and error()).
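
The asymmetry can be reproduced with a small Python model of the rule as described in this comment. This is a simplified sketch, not the specification's wording: a conditional with an updating branch accepts the other branch only if it is also updating or is literally () or error(). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """A primitive expression; 'updating' marks update primitives."""
    text: str
    updating: bool = False


@dataclass(frozen=True)
class If:
    """A conditional; the test expression is omitted because it is irrelevant here."""
    then: object
    orelse: object


EMPTY = Leaf("()")
ERROR = Leaf("error()")
DELETE = Leaf("delete node $x", updating=True)


def is_exception(expr):
    """True only for the two literal forms () and error()."""
    return isinstance(expr, Leaf) and expr.text in ("()", "error()", "fn:error()")


def is_updating(expr):
    """Apply the syntactic rule; raise ValueError where it reports a static error."""
    if isinstance(expr, Leaf):
        return expr.updating
    then_updating = is_updating(expr.then)
    else_updating = is_updating(expr.orelse)
    if then_updating and else_updating:
        return True
    if then_updating or else_updating:
        other = expr.orelse if then_updating else expr.then
        if is_exception(other):
            return True
        raise ValueError("non-updating branch is neither () nor error()")
    return False  # no updating branch: an ordinary non-updating conditional


# First example: each inner conditional is itself updating, so it is accepted.
first = If(EMPTY, If(ERROR, If(DELETE, EMPTY)))
# Second example: the outer else branch is a non-updating conditional.
second = If(DELETE, If(ERROR, EMPTY))
```

Calling is_updating(first) succeeds, while is_updating(second) raises, because the inner conditional "if ($a=2) then error() else ()" is neither updating nor one of the two literal exceptions.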

Comment 3 John Snelson 2008-01-07 12:17:26 UTC

"My eyes! The goggles, they do nothing!" etc, etc.

I'm sure this is the tip of the iceberg, and many more horrible examples could be derived. I agree with Michael that this is a horrible rule to understand as a user, and nigh-on impossible to implement. I know for a fact that XQilla can't tell the difference between () and (()) when checking this rule.

I think there are only two viable solutions to this problem:

1) Michael's updating / non-updating / neutral classification, along with a firming up of the rules for how these properties are derived.

2) Adopt the Scripting Extensions approach to updating expressions, and allow any non-updating expression to exist where an updating one is expected. This requires a definition of how certain expressions handle mixed updating / non-updating results, which we already understand reasonably well from SE.

I'm pretty sure that the rules for (1) will be hard to describe, and I consider (2) to be the only philosophically sound solution, although it's obviously more disruptive.

Comment 4 Michael Kay 2008-01-07 12:40:05 UTC

I think a possible (and relatively simple) fix would be to change the places where we refer to "the expression () or fn:error()" by a term such as "an ineffective expression", and define "ineffective" to mean "the expression (), or fn:error(), or any non-updating expression that the processor is statically able to determine will always either return an empty sequence or fail with a dynamic error".

We could attempt to define some additional kinds of ineffective expressions that processors are obliged to recognize as such, for the sake of interoperability.

Comment 5 John Snelson 2008-01-08 09:29:20 UTC

The trouble is that the phrase "expression that the processor is statically
able to determine will always either return an empty sequence or fail with a
dynamic error" needs defining unambiguously, otherwise this will be a source of incompatibility between implementations.

Comment 6 Michael Kay 2008-01-08 10:16:14 UTC

My proposal was to use the phrase knowing that it described behaviour that would not be fully interoperable. This is analogous with the pessimistic static typing rules - implementations that can make better inferences than those defined in the spec are allowed to do so.

Of course, if someone can come up with a better proposal, I'd jump at it. This is the best solution I can come up with.

Comment 7 Don Chamberlin 2008-01-26 21:54:53 UTC

Michael,
The working group considered this issue on 22 Jan 2008 and decided to resolve it by defining a category of expression ("vacuous expression") that can be combined with either updating or non-updating expressions. Vacuous expressions will be detected statically, and will include ( ) and error() as well as other expressions whose effective return values are computed by vacuous expressions. If you are satisfied with this resolution, please change the status of this bug to Closed.
Don Chamberlin (for the Query Working Group)

Comment 8 zhen hua liu 2009-02-06 18:05:42 UTC

This bug does not seem to be resolved yet, based on the latest XQUF spec.
>
>
>     From: w3c-xml-query-wg-request@w3.org [mailto:w3c-xml-query-wg-request@w3.org] On Behalf Of Zhen Liu
>     Sent: 05 February 2009 20:13
>     To: w3c-xml-query-wg@w3.org
>     Subject: issues of vacuous expression in XQUF
>
>     In the latest XQUF spec, vacuous expression is defined as
>     [Definition: A vacuous expression is a simple expression that can only return an empty sequence or raise an error.]
>     I recall it is introduced because March 2008 version of XQuery scripting extension has
>     added vacuous expression concept and then XQUF started to adopt this concept.
>
>     My first question:
>     =====================
>      My understanding from the rest of XQUF spec that defines how vacuous expression is
>     determined is that the vacuous expression analysis is done statically. This means one can
>     figure this out without dynamic evaluation. If so, we shall fix the definition to reflect this. It is
>     not clear from the definition that vacuous expression is determined statically.
>
>     Furthermore, what type of static analysis is allowed here ?
>
>     For example, given the following expression:
>     if (fn:true()) then () else 3
>
>     If the static analysis is sophisticated, it can figure out this expression only returns empty sequence,
>     so it  is a vacuous expression.  But if one follows the current XQUF spec AS IS, this is not
>     vacuous expression. This appears to be quite inconsistent as the current XQUF can deduce that
>     if (cond) then () else ()
>     as vacuous expression, but why not
>     if (fn:true()) then () else 3
>
>      My second question regarding to function call:
>     ========================================
>     According to 2.5.6 Function call in XQUF, it states that a call to the built-in function fn:error()
>     is a vacuous expression. What about calling a function which always return empty sequence?
>     That function call shall be considered as vacuous expression, right ?
>
>     If so, then just as 'updating' keyword,
>      'vacuous' shall be used as a keyword to describe function as well
>     But this is not the case in the XQUF spec.
>
>     The other situation is that in many situations, a caller may invoke functions defined in other modules
>     whose query text may not be available to do static analysis, in such case,
>     how could one determine if a
>     function call is vacuous unless vacuous is a keyword to describe a function.
>
>     So it seems to me that introducing vacuous expression in XQUF appears to be unnecessary.
>     The latest XQuery scripting extension has dropped the vacuous expression category completely.
>     Shall we revert the XQUF back to the version which does not have vacuous expression
>     definition ?
>
>     The trigger that promotes me on this is that the latest XQUF conformance tests start to add
>     test cases that require static analysis of  vacuous expressions crossing typeswitch, conditional
>     expr, sequence expr etc.  While adding static analysis to support vacuous expression determination
>     is a  simple exercise, there appears to be some fundamental issues
>     regarding to vacuous expressions (described above)  that needs to be resolved.
>
>     Thanks
>
>     zhen
>

Comment 9 Michael Kay 2009-02-06 18:28:13 UTC

1. I agree that the definition could be tightened. I would suggest:

[Definition: Various expressions are defined in this specification to be vacuous: examples are <code>()</code>, <code>fn:error()</code>, and <code>if (x) then () else ()</code>. When evaluated, a vacuous expression will either return an empty sequence or raise an error. The analysis to determine whether an expression is vacuous is done statically.]

2. My understanding is that the WG rejected my original proposal to allow processors flexibility to decide that expressions such as "if (true()) then () else 3" were vacuous, preferring instead to define a simple set of interoperable rules. I don't see any reason for reopening that question.

3. No-one is ever going to deliberately write a function that always returns () or throws an error, so there seems no point at all in defining a keyword to allow such functions to be labelled. One could extend the rules so that a function call is vacuous if the body of the function being called is vacuous; but that seems an unlikely scenario, and it's hard to define the rule without risking circularity if the function is recursive.

Comment 10 zhen hua liu 2009-02-06 19:47:28 UTC

Regarding your comment #9: consider an XQuery development environment where modules can be compiled
and linked separately. Without 'vacuous' as a keyword to describe a function,
how could static analysis determine whether the following expression is legal?

import module namespace md = "http://foo.com";
if ($a) then delete node $x/a/b else md:raiseErr()

From the function signature of md:raiseErr(), I only know that it is a simple
expression. I don't know whether it is really vacuous, and the source code
for module "http://foo.com" is not available.



I agree that determination of vacuous expression has to be done statically. However, depending on how sophisticated your static analysis is, the
interoperability is not guaranteed.

Comment 11 Michael Kay 2009-02-06 22:19:05 UTC

As currently defined, a function call on a function other than fn:error() is never vacuous and cannot therefore be mixed with an updating expression.

I can see why you might want to allow this, but I don't think that's an enhancement we should be considering at this stage of the game.

Comment 12 zhen hua liu 2009-02-06 22:56:15 UTC

Regarding your comment #11: I understand that disallowing vacuous functions simplifies the matter. However, isn't that inconsistent with the definition of a vacuous expression, which states that it is a simple expression that returns
only an empty sequence or raises an error?
For code modularity, people may wish to develop one generic
error function that raises errors and then have all their code call that
common generic error function.

Comment 13 Jonathan Robie 2009-04-24 17:06:42 UTC

On 25 February, the WG decided to change the definition to address concerns raised by Zhen:

RESOLVED: Change the definition of Vacuous Expression to:

  [Definition: Certain expressions are defined in this specification
   to be "vacuous expressions". These all have the characteristic that
   they can be determined statically to either return an empty
   sequence or raise an error.] Some expressions are always vacuous;
   for instance, an empty parenthesized expression ( ) is a vacuous
   expression. Other expressions may be vacuous if one of their
   operands is vacuous; for instance, if both branches of a
   conditional expression are vacuous, the conditional expression is a
   vacuous expression.

Comment 14 Jonathan Robie 2009-04-27 23:07:17 UTC

(In reply to comment #13)
> On 25 February, the WG decided to change the definition to address concerns
> raised by Zhen:
> 
> RESOLVED: Change the definition of Vacuous Expression to:
> 
>   [Definition: Certain expressions are defined in this specification
>    to be "vacuous expressions". These all have the characteristic that
>    they can be determined statically to either return an empty
>    sequence or raise an error.] Some expressions are always vacuous;
>    for instance, an empty parenthesized expression ( ) is a vacuous
>    expression. Other expressions may be vacuous if one of their
>    operands is vacuous; for instance, if both branches of a
>    conditional expression are vacuous, the conditional expression is a
>    vacuous expression.
> 


In the editor's draft, I changed this definition to the following, which I believe to be equivalent:


[Definition: A "vacuous expression" is an expression that can be determined statically to always return an empty sequence or raise an error.]
Some expressions are always vacuous; for instance, an empty parenthesized expression ( ) is a vacuous expression. Other expressions may be vacuous if one of their operands is vacuous; for instance, if both branches of a  conditional expression are vacuous, the conditional expression is a vacuous expression.

Comment 15 Michael Dyck 2009-04-28 04:03:56 UTC

See Zhen's "first question" in comment #8, specifically the example
    if (fn:true()) then () else 3
This expression can be determined statically to always return an empty sequence, and so qualifies as a vacuous expression using the definition in comment #14. However, the intent is that it not be a vacuous expression, which is allowed by the definition in comment #13.

Comment 16 Michael Kay 2009-04-28 07:10:46 UTC

Jonathan, I don't believe your wording is equivalent. The wording that we agreed on might seem contorted, but there was good reason for it: we wanted to make clear that "vacuous expressions" were defined extensionally (we provide a list of constructs considered vacuous), not intensionally (anything that can be statically inferred to return () or error() is by definition vacuous). Your revised wording fails to capture this distinction.

For example, someone could argue that under your definition, xx[0] is a vacuous expression: but it isn't.

Comment 17 Jonathan Robie 2009-04-28 09:02:41 UTC

(In reply to comment #16)
> Jonathan, I don't believe your wording is equivalent. The wording that we
> agreed on might seem contorted, but there was good reason for it: we wanted to
> make clear that "vacuous expressions" were defined extensionally (we provide a
> list of constructs considered vacuous), not intensionally (anything that can be
> statically inferred to return () or error() is by definition vacuous). Your
> revised wording fails to capture this distinction.
> 
> For example, someone could argue that under your definition, xx[0] is a vacuous
> expression: but it isn't.


OK, I think I understand the intent now. But the definition the WG agreed on is neither a clear extensional nor a clear intensional definition. An extensional definition has to state what constructs are vacuous, and not just some examples (outside the definition) listed as "for instance".

I suggest we hash this out on today's call.

Comment 18 Michael Kay 2009-04-28 09:16:28 UTC

>An extensional definition has to state what constructs are vacuous

What the agreed definition does is to say that the detail is to be found elsewhere.

>I suggest we hash this out on today's call.

We spent some time on this before; I personally don't see why it needs to be reopened.

Michael Kay

Comment 19 Jonathan Robie 2009-04-28 12:38:11 UTC

How about this definition, which is more explicit:

<snip>
The following expressions are defined by this specification to be "vacuous expressions": 

* An empty parenthesized expression ( ) is a vacuous expression. 
* A call to the built-in function fn:error is a vacuous expression.
* If all branches are vacuous expressions, the typeswitch expression is a vacuous expression.
* If both branches are vacuous expressions, the conditional expression is a vacuous expression.
* If all operands are vacuous expressions, the comma expression is a vacuous expression.

These expressions can be determined statically to always return an empty sequence or raise an error.
</snip>

Comment 20 Michael Kay 2009-04-28 12:46:52 UTC

It might be explicit but it's not complete, for example it leaves out

((()))

I can't see why you are trying to improve the text which we arrived at so painfully.

Comment 21 Jonathan Robie 2009-04-28 13:01:47 UTC

> It might be explicit but it's not complete, for example it leaves out
> 
> ((()))
> 
> I can't see why you are trying to improve the text which we arrived at so
> painfully.


Because the reader cannot easily determine what is and what is not a vacuous expression without reading the entire specification and thinking about it.

Because someone designing a test suite would have a hard time determining the complete list.

Because searching for terms like "is a vacuous expression" in the text also apparently leaves out some instances, like the one you just cited.

Comment 22 Jonathan Robie 2009-04-28 13:32:27 UTC

I can see that I need to at least add this:

A non-empty parenthesized expression is a vacuous expression if the expression it contains is a vacuous expression.

Are there other cases that I am missing? If not, how about this:

<snip>
 The following expressions are defined by this specification to be "vacuous
 expressions": 
 
 * A call to the built-in function fn:error is a vacuous expression.
 * An empty parenthesized expression ( ) is a vacuous expression. 
 * A non-empty parenthesized expression is a vacuous expression 
   if the expression it contains is a vacuous expression.
 * If all branches are vacuous expressions, the typeswitch expression 
   is a vacuous expression.
 * If both branches are vacuous expressions, the conditional expression 
   is a vacuous expression.
 * If all operands are vacuous expressions, the comma expression 
   is a vacuous expression.
 
 These expressions can be determined statically to always return an empty
 sequence or raise an error.
</snip>
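
To make the extensional nature of this list concrete, here is a minimal Python sketch that treats an expression as vacuous only when one of the listed rules says so. It is an illustration under stated assumptions, not the specification's algorithm: the expression classes and the is_vacuous function are hypothetical, and anything not covered by a rule (including every function call other than fn:error) is treated as non-vacuous.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Other:
    """Any expression not covered by the list, e.g. the literal 3."""
    text: str


@dataclass(frozen=True)
class FunctionCall:
    name: str


@dataclass(frozen=True)
class Paren:
    """inner is None for the empty parenthesized expression ( )."""
    inner: Optional[object] = None


@dataclass(frozen=True)
class If:
    test: object
    then: object
    orelse: object


@dataclass(frozen=True)
class Typeswitch:
    branches: Tuple[object, ...]


@dataclass(frozen=True)
class Comma:
    operands: Tuple[object, ...]


def is_vacuous(expr):
    """Purely rule-driven check; no attempt is made to evaluate conditions."""
    if isinstance(expr, FunctionCall):
        return expr.name == "fn:error"
    if isinstance(expr, Paren):
        return expr.inner is None or is_vacuous(expr.inner)
    if isinstance(expr, If):
        # The test is deliberately ignored, even when it is fn:true().
        return is_vacuous(expr.then) and is_vacuous(expr.orelse)
    if isinstance(expr, Typeswitch):
        return all(is_vacuous(branch) for branch in expr.branches)
    if isinstance(expr, Comma):
        return all(is_vacuous(operand) for operand in expr.operands)
    return False


# ((())) from comment 20, and "if (fn:true()) then () else 3" from comment 8.
nested_empty = Paren(Paren(Paren()))
always_empty = If(FunctionCall("fn:true"), Paren(), Other("3"))
print(is_vacuous(nested_empty), is_vacuous(always_empty))
```

In this sketch ((())) is vacuous through the parenthesized-expression rule, whereas "if (fn:true()) then () else 3" is not, even though a smarter analysis could prove that it always returns an empty sequence.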

Comment 23 Michael Kay 2009-04-28 14:03:43 UTC

There's at least one other I noticed, namely a FLWOR can be vacuous if its return clause is vacuous.

But on principle, I think it's a bad idea to include this list. It means we are saying things twice, and that creates the risk of inconsistency.

Michael Kay

Comment 24 Michael Kay 2011-01-07 11:27:54 UTC

For the record, the final outcome in the text of the PR is that the list of vacuous expressions is included as a non-normative note.