This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29346 - [XP31] XPath-style currying, or the arrow operator, may require a bit more specification
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Blocks: 29459
Reported: 2015-12-18 01:26 UTC by Abel Braaksma
Modified: 2016-03-08 15:52 UTC

Description Abel Braaksma 2015-12-18 01:26:20 UTC
Today's discussion with Christian Grün and Florent Georges (see https://lists.w3.org/Archives/Public/public-xsl-query/2015Dec/0028.html etc) led me to write this bug. It seems that the production rules, the current consensus on how to read them, and the text in this part of the spec are not entirely in line.

This applies to 3.22 Arrow operator (=>) of the internal WD (CR).

The production rules are as follows:

  ArrowExpr ::= UnaryExpr ( "=>" ArrowFunctionSpecifier ArgumentList )*
  ArrowFunctionSpecifier ::= EQName | VarRef | ParenthesizedExpr

The mandatory part of the spec in this section says:

<quote>
[Definition: An arrow operator applies a function to the value of a primary expression, using the value as the first argument to the function.] If $s is a sequence and f() is a function, then $s=>f() is equivalent to f($s), and $s=>f($j) is equivalent to f($s, $j).
</quote>

The (possible) issues:
----------------------

1) The text says "and f() is a function", but f() can never be a function. It can be a function call, or a function call evaluating to a function.

Suggestion, something like: "if ArrowFunctionSpecifier is either a function, a variable reference bound to a function or an expression evaluating to a function"

2) We don't specify what happens if this goes wrong. Maybe we don't need to, but I wonder what a => (concat#3)("b") would return. The spec suggests this is equivalent to (concat#3)(a, "b") but this will raise XPTY0004 (wrong no of args). Should we emphasize this can happen? Or will the returned value be a new function, with one argument, as in (concat#3)(a, "b", ?)?

3) Bug #26889 has some discussion that suggests placeholder-syntax cannot be used as the rhs. However, I believe the production rules, and the accompanying text leave room for a => (concat('b', ?)) being valid. Should it not be? I think it makes sense it is valid, but if it isn't, we should probably say so.

I believe it is valid because
a) we allow ParenthesizedExpr
b) the rhs translates to a function taking one argument
c) the equivalence-rule applies, effectively turning this into (concat('b', ?))(a), which is still legal.

4) The example in (3) is notably different from a => concat('b', ?), which translates to concat('a', 'b', ?). Still valid (returning a function item), but this difference is hard to spot and very subtle. While people have grown used to using parentheses to remove ambiguity of the comma operator, the difference between concat('b', ?) and (concat('b', ?)) is almost indiscernible.

If my interpretation is correct, perhaps we could mention this as a Note?

I believe this is valid because:
a) the prod rules above use ArgumentList
b) which allows for Argument between the ","
c) which allows for either an ArgumentPlaceHolder or a SingleExpr

5) We don't mention the possible occurrence of the rhs. Considering the equivalence rules, it stands to reason that
a) if rhs is empty sequence, an error is raised (cannot have empty seq. as target of a function call, XPTY0004)
b) if rhs has more-than-one, an error is raised (cannot have more-than-one as target of a function call, XPTY0004)

Though I wouldn't mind allowing something other than exactly-one, that's probably too much of a change at CR stage (besides other idiosyncrasies this may raise).

Not sure this point requires a change, other than perhaps pointing out that the same errors apply as for a (dynamic) function call, whichever applies.

6) The text calls upon the equivalence with normal function calls, but we don't allow NamedFunctionRef as an alternative in ArrowFunctionSpecifier, though with "normal" function calls it is allowed. For consistency, it seems appropriate to allow it (currently, it is allowed, but requires parentheses).

This would make (a, b) => string-join#1() legal. Perhaps not a big win, but at least consistent and a small change. It is strange that (a, b) => (string-join#1)() is a required rewrite to make this expression valid.
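
As an editorial illustration of point (6), here is a minimal sketch assuming an XQuery 3.1 processor; the literal values are made up for the example. It shows the two spellings the current grammar accepts, with the form proposed above left in a comment because the grammar does not allow it.

```xquery
xquery version "3.1";

(
  (: EQName on the rhs: equivalent to string-join(("a", "b")) :)
  ("a", "b") => string-join(),

  (: NamedFunctionRef is only accepted when wrapped in a ParenthesizedExpr :)
  ("a", "b") => (string-join#1)()

  (: ("a", "b") => string-join#1() is not matched by ArrowFunctionSpecifier :)
)
```

Both items of the result should be the string "ab".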


PS: I used "currying" in the title not because it is the same as currying in other functional languages (it is not), but because it does allow returning a partially applied function as the result of the arrow expression, which is "kind of" currying (except that, if you chain an operation that way using arrows, you need parentheses).
Comment 1 Abel Braaksma 2015-12-18 01:31:58 UTC
(In reply to Abel Braaksma from comment #0)
> I believe the production rules, and the accompanying text leave room for 
> a => (concat('b', ?)) being valid.
This should, of course, be a => (concat('b', ?))() to be a valid expression here, because the production rules require the parentheses at the end.

(which makes me wonder, why do we require them? Perhaps because the EBNF would otherwise become ambiguous?)
Comment 2 Abel Braaksma 2015-12-18 01:43:31 UTC
(In reply to Abel Braaksma from comment #0)
> 4) The example in (3) is notably different from a => concat('b', ?), which 
This was incorrect, and should have been a => concat('b', ?)(). Together with comment #1 this shows that my point (4) was hogwash: the required parens at the end make concat('b', ?)() and (concat('b', ?))() equivalent, and the surrounding parens don't change the meaning (which is good, of course).
Comment 3 Michael Kay 2015-12-18 10:24:51 UTC
I agree that the ArrowOperator is very poorly specified and in particular that it is entirely unclear what happens if the ArgumentList contains a PlaceHolder.

I'm inclined to interpret the spec as saying that it CAN contain a PlaceHolder, because (a) the grammar allows it, and (b) it makes sense semantically. But I don't think we have any tests that demonstrate this. With this interpretation, I would be inclined to rewrite the spec as follows:

An ArrowExpr is alternative syntax for a *static function call* or *dynamic function call*. 

* If the ArrowFunctionSpecifier is an EQName, then the expression A => f() (where A is any UnaryExpr and f is the EQName) is equivalent to the static function call f(A), and the expression A => f(ARGS) (where ARGS is a comma-separated sequence of *Argument*s) is equivalent to the static function call f(A, ARGS).

* If the ArrowFunctionSpecifier is a VarRef or ParenthesizedExpr, then the expression A => E() (where A is any UnaryExpr, and E is the VarRef or ParenthesizedExpr) is equivalent to the dynamic function call E(A), and the expression A => E(ARGS) (where ARGS is a comma-separated sequence of *Argument*s) is equivalent to the dynamic function call E(A, ARGS).

In both cases the ArgumentList may contain a PlaceHolder, in which case the ArrowExpr is a *partial function application*.

The equivalence of an ArrowExpr to a static or dynamic function call applies also to error conditions. For example, a type error occurs if the result of evaluating the VarRef or ParenthesizedExpr is not a function, or if it is a function with incorrect arity. 

This syntax is particularly useful when conventional function call syntax leads to an expression with deeply nested parentheses. For instance, the following expression:

tokenize((normalize-unicode(upper-case($string), 'NFC')), ",\s*")

can be replaced by the more easily readable:

$string => upper-case() => normalize-unicode('NFC') => tokenize(",\s*")
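
As an editorial illustration of the proposed wording, here is a minimal sketch assuming an XQuery 3.1 processor; the sample value bound to $string and the variable names are made up. It compares the nested and arrow spellings, and shows one arrow expression for each kind of ArrowFunctionSpecifier.

```xquery
xquery version "3.1";

let $string := "a, b,c"
let $nested := tokenize(normalize-unicode(upper-case($string), 'NFC'), ",\s*")
let $arrow  := $string => upper-case() => normalize-unicode('NFC') => tokenize(",\s*")
let $f      := concat#3
return (
  deep-equal($nested, $arrow),      (: both spellings give the same tokens :)
  "a" => concat("b", "c"),          (: EQName: static call concat("a", "b", "c") :)
  "a" => $f("b", "c"),              (: VarRef: dynamic call $f("a", "b", "c") :)
  "a" => (concat#3)("b", "c")       (: ParenthesizedExpr: dynamic call :)
)
```

Under the equivalence described above, the first item should be true and the remaining three should each be "abc".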
Comment 4 Abel Braaksma 2015-12-18 11:14:07 UTC
(In reply to Michael Kay from comment #3)
> With this interpretation, I would be inclined to rewrite the spec as follows:
> 
> An ArrowExpr is alternative syntax for a *static function call* or *dynamic
> function call*. 
> <snip />
Thanks for this. This seems to address all my concerns. It is a solid rewrite and, more importantly, it removes the possible misinterpretations and/or under-specification of the text.

> For example, a type error occurs if the result of evaluating the VarRef or 
> ParenthesizedExpr is not a function, or if it is a function with incorrect 
> arity. 
I agree that for *dyn function calls* this is indeed a type error, but if it is static, it is a syntax error (at least I see nothing special under 3.1.5.1 on static function calls), i.e., we raise XPST0017 statically, or XPTY0004 dynamically.

What is your take on aligning this more closely with the static function call syntax, i.e. my point (6) in comment#0? That is, allowing a => concat#3('b', 'c')? It requires adding NamedFunctionRef with otherwise the same semantics as you describe for EQName.

Conversely, if we don't want to add this, I don't see how a => (concat#3)('b', 'c') is a dynamic function call, I believe it is also static. In which case we should either expand the text, or expand the grammar, whichever is more convenient.
Comment 5 Abel Braaksma 2015-12-18 12:07:06 UTC
Hmm, I just realized something. I don't think we can so easily allow the placeholder syntax. Your current proposal makes:

a => concat('b', ?, 'd')

resolve to:

concat(a, 'b', ?, 'd')

while it will make:

a => (concat('b', ?, 'd'))()

resolve to:

concat('b', a, 'd')

and it will make the following illegal:

a => concat('b', ?, 'd')()

I am inclined to suggest that we unify these calls to mean the same. I can't think of a use-case why it is handy to treat an arrow expr with an argument list with a placeholder as returning a function that takes one or more arguments. Besides, it simply "reads" exactly the other way: the arrow operator (to me at least) suggests replacing the question mark on the RHS if there is one.

To that effect, I suggest changing your proposal slightly, by adding/augmenting as follows:

<proposal>
In the expression A => F(ARGS?), where A is UnaryExpr and F is ArrowFunctionSpecifier ArgumentList, rewrite as follows:
* If F(ARGS?) is a *function call* but not a *partial function application*, it is rewritten as the partial function application:
   a) if ARGS is empty, let FA be F(?)
   b) if ARGS is not empty, let FA be F(?, ARGS)
* If F(ARGS?) is a *partial function application*, FA is F(ARGS)
* The first ArgumentPlaceHolder in FA is now replaced by A and evaluated as described in 3.1.5.1.
</proposal>

I think this works with the following expressions:

a => concat('b', ?)
becomes step 1: concat('b', ?)
becomes step 2: concat('b', a)

a => concat('b', 'c')
becomes step 1: concat(?, 'b', 'c')
becomes step 2: concat(a, 'b', 'c')

a => concat(?, ?, 'd')
becomes step 1: concat(?, ?, 'd')
becomes step 2: concat(a, ?, 'd')

a => (let $c := concat#2 return $c)('b', ?)
becomes step 1: (let $c := concat#2 return $c)('b', ?)
becomes step 2: (let $c := concat#2 return $c)('b', a)

a => (let $c := concat#2 return $c)('b')
becomes step 1: (let $c := concat#2 return $c)(?, 'b')
becomes step 2: (let $c := concat#2 return $c)(a, 'b')

let $c := concat#3 return a => $c('b', 'c')
becomes step 1: ... return $c(?, 'b', 'c')
becomes step 2: ... return $c(a, 'b', 'c')

a => (function($b) { $b + 1 })()
becomes step 1: (function($b) { $b + 1 })(?)
becomes step 2: (function($b) { $b + 1 })(a)

I think that this way it is more "natural" and it is also (still) a valid interpretation of the way the current text is written. It emphasizes the fact that ArrowExpr is a shorthand, or a macro if you wish.

However, I realize that it deviates in details from your proposal.

PS: (though probably too late in CR), I think we can do without the superfluous, but required parentheses at the end for everything except the EQName variant, allowing the RHS to become an expression that returns just a function item with arity 1.
Comment 6 Michael Kay 2015-12-18 12:29:02 UTC
> What is your take on aligning this more closely with the static function call
> syntax, i.e. my point (6) in comment#0? That is, allowing a => concat#3('b',
> 'c')? It requires adding NamedFunctionRef with otherwise the same semantics as
> you describe for EQName.

We need a very strong justification to change the grammar at this stage, and I can't see that this is important enough. It only provides cosmetic improvements for something that is surely not a common requirement.

Incidentally, as defined in the spec concat#3(1,2,3) is not a static function call, it is a dynamic function call.

If we want more orthogonality in the grammar then the top thing on my list would be to add VarRef on the rhs of a lookup operator. But we don't want minor improvements now, we want to finish.
Comment 7 Michael Kay 2015-12-18 12:42:07 UTC
> Your current proposal makes:
>
> a => concat('b', ?, 'd') resolve to: concat(a, 'b', ?, 'd')

Correct

> while it will make: a => (concat('b', ?, 'd'))() resolve to: concat('b', a, 'd')

I wouldn't say it "resolves" to that, but yes, it's equivalent to that.

> and it will make the following illegal:
>
> a => concat('b', ?, 'd')()

Yes, because (a => concat('b', ?, 'd')) evaluates to a function of arity 1, so the argument list must have length 1. Why is this a problem?

> I can't think of a use-case why it is handy to treat an arrow expr with an argument
> list with a placeholder as returning a function that takes one or more
> arguments.

The general principle of orthogonality in language design is that you don't introduce restrictions and exceptions to prevent something working just because you can't think of a use case. There's no use case for using a function literal in a dynamic function call (as in round#1(93.7)) but we allow it for orthogonality reasons.

I don't think we need to allow the combination of "=>" and "?" in the same expression but there is also no justification to disallow it.

> The first ArgumentPlaceHolder in FA is now replaced by A and evaluated as
> described in 3.1.5.1.

I think that would be a poor design even if we were rethinking the design from scratch. But as it's attempting to add a feature after going to a second CR, and the feature is only cosmetic, I think it's impossible to justify.
Comment 8 Abel Braaksma 2015-12-18 16:55:13 UTC
(In reply to Michael Kay from comment #7)
> I think that would be a poor design even if we were rethinking the design
> from scratch. But as it's attempting to add a feature after going to a
> second CR, and the feature is only cosmetic, I think it's impossible to
> justify.
I am sorry, it was not my intent to add a feature. I believe the spec currently says next to nothing about ArgumentPlaceholders here, so I think that leaves us with the choice to:

a) disallow it as a syntax error
b) allow it by turning a => f(b, ?) into f(a, b, ?)
c) allow it by turning a => f(b, ?) into f(b, a)

Since the definition of the arrow operator reads:

  An arrow operator applies a function to the value of a primary expression, 
  using the value as the first argument to the function.

I think we can go either way. The first argument of the function returned by the expression "f(b, ?)" is "?" as it returns a function with one argument.

Likewise, if you do not think of "a function" as a function item per se, your solution applies equally well.

In terms of orthogonality, I think there is something to be said for either solution. And whether we choose (a) or (b) seems to me a matter of an arbitrary choice, albeit that I have a preference for (b) as it appears more "natural" to me and, since the purpose of the arrow operator is to get rid of parens, I think in practice it will lead to exactly that: fewer parens.

As a consequence of either (a), (b) or (c) it will always make a certain group of expressions illegal (raising dynamic errors), that are currently legal (or not) only by virtue of choice of implementers.

(In reply to Michael Kay from comment #6)
> We need a very strong justification to change the grammar at this stage, and
> I can't see that this is important enough. <snip />
Ok, makes sense.

> Incidentally, as defined in the spec concat#3(1,2,3) is not a static
> function call, it is a dynamic function call.
Ah, of course, thanks for pointing that out.

> If we want more orthogonality in the grammar then the top thing on my list
> would be to add VarRef on the rhs of a lookup operator. But we don't want
> minor improvements now, we want to finish.
Of course. (I didn't even realize that VarRef is not allowed there, I see now that you have to write it as M?($varref) to be valid, assuming that's what you meant here).
Comment 9 Abel Braaksma 2015-12-18 16:56:24 UTC
>  And whether we choose (a) or (b)
that should've been "(b) or (c)"...
Comment 10 Abel Braaksma 2015-12-18 16:57:36 UTC
> albeit that I have a preference for (b) 
And that should've been (c) (really, why can't I edit my own comments???)
Comment 11 Adam Retter 2016-01-10 22:06:45 UTC
Just to add some +1 to the thread and a preference for the outcome, please see my enquiry on xquery-talk about this very issue: http://markmail.org/search/list:com.x-query.talk#query:list%3Acom.x-query.talk+page:1+mid:bv46r2trzwzhp3jq+state:results
Comment 12 Josh Spiegel 2016-01-12 17:14:47 UTC
I like the clarification that Mike gave here:
https://lists.w3.org/Archives/Public/public-xsl-query/2015Sep/0019.html

"Given a UnaryExpr U, an ArrowFunctionSpecifier F, and a ArgumentList (A, B, C…), the expression U => F(A, B, C…) is equivalent to the expression F(U, A, B, C…)"

So: 

  a => f(b, ?) 

Is equivalent to:

  f(a, b, ?)
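
As an editorial illustration of this reading, here is a minimal sketch assuming an XQuery 3.1 processor; concat stands in for the generic f above, and the variable name $partial is made up. It contrasts a placeholder in the arrow's own ArgumentList with a placeholder inside a parenthesized rhs.

```xquery
xquery version "3.1";

(: Placeholder in the ArgumentList: same as concat("a", "b", ?),
   a partial function application yielding a function of arity 1 :)
let $partial := "a" => concat("b", ?)
return (
  $partial instance of function(*),
  function-arity($partial),
  $partial("c"),

  (: Placeholder inside the ParenthesizedExpr: same as (concat("b", ?))("a") :)
  "a" => (concat("b", ?))()
)
```

Under this reading the results should be true, 1, "abc" and "ba": the left-hand value always becomes the first argument and never fills the placeholder.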
Comment 13 Josh Spiegel 2016-01-12 17:34:33 UTC
At meeting #629 on 2016-01-12 the working group decided to clarify the definition of the arrow expression as described in comment 12.
Comment 14 Abel Braaksma 2016-02-20 13:55:09 UTC
I am reopening this bug report following discussion at the F2F of the XSLWG where we found that no changes were made to the text as suggested by comment#12 and comment#13.

I checked the changelog today and didn't see any new changes in XP31, the internal WD text is equal to the public CR text.
Comment 15 Josh Spiegel 2016-02-20 18:11:31 UTC
Abel, I made the change this past Wednesday.  I'm not sure why you can't see it.  Can you look again?

https://www.w3.org/XML/Group/qtspecs/specifications/xquery-31/html/xpath-31.html#id-arrow-operator

I am closing the bug but please reopen it if you still think there is a problem.
Comment 16 Abel Braaksma 2016-03-08 15:52:03 UTC
> Can you look again?
I have since seen the changes; it may have been a caching issue. My apologies.