8662 – [XQ31ReqUC] Requirement: Recursive inline functions

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 8662 - [XQ31ReqUC] Requirement: Recursive inline functions

Summary: [XQ31ReqUC] Requirement: Recursive inline functions

Status:	NEW

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Requirements for Future Versions (show other bugs)
Version:	Working drafts
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jim Melton
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2010-01-06 17:21 UTC by Pavel Minaev
Modified:	2014-05-20 16:44 UTC (History)
CC List:	6 users (show)

See Also:

Attachments

Description Pavel Minaev 2010-01-06 17:21:21 UTC

With the syntax and semantics of inline functions as defined by XQuery 1.1 Working Draft from 15 December 2009, there is no clean and easy way to define an inline function that can recursively call itself, nor there is any such way to define two or more mutually recursive inline functions. In particular:

    let $fib := function($n as xs:integer) as xs:integer {
       if ($n < 3) then 1 else $fib($n-1) + $fib($n+2)
    }
    return $fib(10)

doesn't work, because $fib isn't yet bound within the body of the inline function used for its initializer. The only workaround is to explicitly pass the function to itself:

    let $fib := function($fib as item(), $n as xs:integer) as xs:integer {
       if ($n < 3) then 1 else $fib($n-1) + $fib($n+2)
    }
    return $fib($fib, 10)

however, this requires the type of $fib to be specified as item(), as there is no way to spell out a recursive function type (that has parameter of its own type).

A typical usage scenario for this is a loop where every iteration depends on the result of computation of the previous one, expressed as a self-recursive inline function with accumulator-passing style (since such a thing cannot, in general, be expressed as an FLWOR expression) - a very common case in many other functional languages, such as Scheme, ML or Haskell.

Some possible approaches to tackle this:

1) Add a special construct analogous to "let", specifically to define recursive local functions, with identifier being defined accessible within the body of the function. E.g. "local function ... return ...":

    local function fib($n as xs:integer) as xs:integer {
       if ($n < 3) then 1 else $fib($n-1) + $fib($n+2)
    }
    return fib(10)

This has the advantage that it could be trivially extended to mutually recursive functions, wherein "local function" would allow a comma-separated list of function definitions like "let", all of which would be in scope of all function bodies thus defined within the same construct:

    local function foo() { bar() },
                   bar() { foo() }
    return foo()

2) Add a special, strongly typed construct to reference or call the function from within its body, e.g. "recurse":

    return (function($n as xs:integer) as xs:integer {
        if ($n < 3) then 1 else recurse($n-1) + recurse($n+2)
    })(10)

where "recurse" is only defined within a function body, always has exact same signature as the innermost enclosing function, and, ideally, can be used with operator # to produce function items, as in "recurse#1" in the above example.

This only covers the self-recursive case, but is significantly shorter for a particularly common case of function being called only once outside its body, by removing the need to bind it to a variable altogether. It may also make the typical intent of such a construct clearer, especially with a well-chosen syntax.

Comment 1 John Snelson 2010-01-06 18:26:08 UTC

1) Construct to Define Local Recursive Functions
   ---------------------------------------------

There are two related possibilities here:

a. Allow definition of named, scoped, local functions, ie:

declare function outer($a)
{
  declare function f1($b)
  {
    $a, f2($b)
  };

  declare function f2($b)
  {
    $b, f1($b)
  };

  f1#1;
};

b. Many functional languages have a recursive "let" whose variables are all in scope for any of the bound expressions:

letrec $f1 := function($b) { $a, $f2($b) },
       $f2 := function($b) { $b, $f1($b) }
return $f1

I think I'm personally in favour of a solution like this.

2) Reference to self
   -----------------

I dislike the idea of a special named function which is only in scope sometimes, and has a different type signature depending on where it is in scope. If we went this route, I would prefer a special variable like $this or $self of type function(*), which contains a reference to the current function.

Of course as you point out, this would not solve the mutually recursive function problem.

3) Recursive Global Variables
   --------------------------

Another (possibly tangential but related) issue is the rules about not allowing a global variable that references itself. It seems odd to allow this:

declare function f() { 1, f() };

but not this:

declare variable $f := 1, $f;

Both the function f() and the variable $f evaluate perfectly well on an implementation using lazy evaluation.

Maybe we could make our prolog rules simpler by allowing variables to refer to themselves.

Comment 2 Pavel Minaev 2010-01-06 20:11:01 UTC

(In reply to comment #1)
> There are two related possibilities here:
> 
> a. Allow definition of named, scoped, local functions, ie:
> 
> declare function outer($a)
> {
>   declare function f1($b)
>   {
>     $a, f2($b)
>   };
> 
>   declare function f2($b)
>   {
>     $b, f1($b)
>   };
> 
>   f1#1;
> };

What if I want a local function declaration at a deeper expression nesting level? The above syntax doesn't seem to allow for it, and "local function ... return ..." is effectively the above syntax adapted to allow arbitrary nesting.

> b. Many functional languages have a recursive "let" whose variables are all in
> scope for any of the bound expressions:
> 
> letrec $f1 := function($b) { $a, $f2($b) },
>        $f2 := function($b) { $b, $f1($b) }
> return $f1
> 
> I think I'm personally in favour of a solution like this.

This has the problem of also seemingly allowing (at least syntactically):

   letrec $x := $y, $y := $x

On the first glance this problem is already solved for globals, but that solution, if applied to locals, would prohibit the mutually recursive function declarations above. Furthermore, no simple rule can be devised, such as "recursive reference is legal within function body only", as that can still be abused:

   letrec $x := function(){$y}(), $y := function(){$x}()

The only case that can be easily statically guaranteed to be safe is the one where variables are directly bound to inline functions, and all recursive references are within bodies of said functions. At which point we get precisely the same semantics as "local function ... return ..." with a different syntax (which arguably seems to be more general than it actually is).

A more complicated version of the above is R5RS approach, where "letrec" requires that "it must be possible to evaluate each <init> without assigning or referring to the value of any <variable>". This, of course, also requires a strict definition of what it takes to "refer" to the value of variable (as opposed to merely referring to the variable itself), which is possible to do in Scheme because it explicitly defines variables as identifiers bound to locations holding values, not as identifiers bound directly to values. Coming up with a similar rigorous definition for XQuery may be tricky.

An alternative is to require implementation to detect any infinite recursion in initializers dynamically if it cannot prove it statically - the approach taken by ML with "let rec". That would be an additional burden on implementation with an unavoidable runtime penalty, however, and will be inconsistent with rules for global variable bindings.


> 2) Reference to self
>    -----------------
> 
> I dislike the idea of a special named function which is only in scope
> sometimes, and has a different type signature depending on where it is in
> scope. If we went this route, I would prefer a special variable like $this or
> $self of type function(*), which contains a reference to the current function.

It could also be a keyword that just looks like a function call. It would be desirable to have it properly typed, however, if only to require a static error if it is called with incorrect arity.

Also, any such "$this" would effectively be part of the static context, and, so far as I know, the existing practice is for accessors to static context parts to be functions, rather than implicitly-defined variables (in fact, are there any of the latter in the current XQuery 1.1?).

On the other hand, if type signature is made to be strictly corresponding to the function, then perhaps a keyword would best convey the idea that this isn't quite a "normal" function/variable, but has some magic associated with it.
 
> Another (possibly tangential but related) issue is the rules about not allowing
> a global variable that references itself. It seems odd to allow this:
> 
> declare function f() { 1, f() };
> 
> but not this:
> 
> declare variable $f := 1, $f;
> 
> Both the function f() and the variable $f evaluate perfectly well on an
> implementation using lazy evaluation.

But doesn't XQuery spec explicitly permit non-lazily-evaluating implementations by design? If so, the above construct, if allowed, would effectively have implementation-defined behavior with an extremely pronounced difference, with that not being obvious.

Also, even on a lazy implementation that would allow it, it would result in an infinite-length sequence. Such sequences don't seem to be explicitly prohibited by XDM (one could argue that the definition of "a sequence of zero or more items" includes "infinite"), but their non-existence is implied elsewhere. E.g. the definition of "context size" in XQuery (and XPath) defines it as an "integer greater than zero" - seemingly excluding "infinity" from the range of valid values. As well, the definition of count() - nor any other standard function - does not specify the effect should it be called on an infinite sequence. And any reasonable and useful definition (i.e. anything more specific than "entirely implementation-defined") would require lazy evaluation.

Comment 3 John Snelson 2010-01-07 12:06:10 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > There are two related possibilities here:
> > 
> > a. Allow definition of named, scoped, local functions, ie:
> > 
> > declare function outer($a)
> > {
> >   declare function f1($b)
> >   {
> >     $a, f2($b)
> >   };
> > 
> >   declare function f2($b)
> >   {
> >     $b, f1($b)
> >   };
> > 
> >   f1#1;
> > };
> 
> What if I want a local function declaration at a deeper expression nesting
> level? The above syntax doesn't seem to allow for it, and "local function ...
> return ..." is effectively the above syntax adapted to allow arbitrary nesting.

I agree that this solution doesn't allow arbitrary nesting of named function definitions, but to do that I think you need to go with option 1b.

> > b. Many functional languages have a recursive "let" whose variables are all in
> > scope for any of the bound expressions:
> > 
> > letrec $f1 := function($b) { $a, $f2($b) },
> >        $f2 := function($b) { $b, $f1($b) }
> > return $f1
> > 
> > I think I'm personally in favour of a solution like this.
> 
> This has the problem of also seemingly allowing (at least syntactically):
> 
>    letrec $x := $y, $y := $x

It does, but I don't see the problem with that. It's an infinite loop - but there are many ways to write an infinite loop in a turing complete language.

> On the first glance this problem is already solved for globals, but that
> solution, if applied to locals, would prohibit the mutually recursive function
> declarations above. Furthermore, no simple rule can be devised, such as
> "recursive reference is legal within function body only", as that can still be
> abused:
> 
>    letrec $x := function(){$y}(), $y := function(){$x}()
> 
> The only case that can be easily statically guaranteed to be safe is the one
> where variables are directly bound to inline functions, and all recursive
> references are within bodies of said functions. At which point we get precisely
> the same semantics as "local function ... return ..." with a different syntax
> (which arguably seems to be more general than it actually is).

I don't think there's a good reason to try to avoid infinite loops in variable values. You can already get infinite loops in functions, and we don't see the need to try to figure out complicated rules to avoid them. Additionally there are legitimate meaningful reasons to recursively refer to a variable from within it's initialising expression, ie:

letrec $fibonacci := 0, 1, zipWith(function($a, $b) { $a + $b },
                  $fibonacci, subsequence($fibonacci, 2))


> > 2) Reference to self
> >    -----------------
> > 
> > I dislike the idea of a special named function which is only in scope
> > sometimes, and has a different type signature depending on where it is in
> > scope. If we went this route, I would prefer a special variable like $this or
> > $self of type function(*), which contains a reference to the current function.
> 
> It could also be a keyword that just looks like a function call. It would be
> desirable to have it properly typed, however, if only to require a static error
> if it is called with incorrect arity.

XQuery (without the optional pessimistic "Static Typing" feature) is a dynamic typed language, so we only need to ensure that it's /possible/ for an implementation to assign a stricter type. In this case that is entirely possible, and I imagine exactly what most implementations would do.

> Also, any such "$this" would effectively be part of the static context, and, so
> far as I know, the existing practice is for accessors to static context parts
> to be functions, rather than implicitly-defined variables (in fact, are there
> any of the latter in the current XQuery 1.1?).

I think if it were a function, that function would return a function reference to the current function - not /be/ the current function by a different name.

> > Another (possibly tangential but related) issue is the rules about not allowing
> > a global variable that references itself. It seems odd to allow this:
> > 
> > declare function f() { 1, f() };
> > 
> > but not this:
> > 
> > declare variable $f := 1, $f;
> > 
> > Both the function f() and the variable $f evaluate perfectly well on an
> > implementation using lazy evaluation.
> 
> But doesn't XQuery spec explicitly permit non-lazily-evaluating implementations
> by design? If so, the above construct, if allowed, would effectively have
> implementation-defined behavior with an extremely pronounced difference, with
> that not being obvious.

This is already the case. My implementation of XQuery can quite happily work with infinite sequences, whilst others can't.

Comment 4 Michael Kay 2010-01-07 12:50:47 UTC

>declare variable $f := 1, $f;

I think it would be a very bad idea to do this, because it would force a processor to use lazy evaluation as its only strategy. This would prevent, for example, eagerly building an index over a sequence when the processor can see that there are repeated filter requests on the sequence. It would also be a fertile source of hard-to-diagnose user errors.

As for the general requirement, I'm not convinced that the value of the requested feature justifies the complexity of any of the solutions proposed so far. In the vast majority of cases, it's possible to write the function as a global named function. The exception is when the function accesses local variables external to itself; and it's easy enough to work around that by adding parameters to the global function.

But if we do decide to provide the feature, I think my preferred solution would be that within an inline function (or perhaps within any function body), a variable $fn:self should be implicitly bound, its value being the function that is being defined.

Comment 5 Jonathan Robie 2010-01-07 14:55:50 UTC

(In reply to comment #4)

> As for the general requirement, I'm not convinced that the value of the
> requested feature justifies the complexity of any of the solutions proposed so
> far. In the vast majority of cases, it's possible to write the function as a
> global named function. The exception is when the function accesses local
> variables external to itself; and it's easy enough to work around that by
> adding parameters to the global function.

I agree.

I'm concerned about complexity. I don't see a lot of functionality to justify  the complexity of the proposed features.

Comment 6 John Snelson 2010-02-05 20:49:48 UTC

I've written some analysis and a proposal, available here:

http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Feb/0015.html
http://jpcs.posterous.com/adding-recursive-inline-function-to-xquery-11

Comment 7 Pavel Minaev 2010-02-05 21:39:19 UTC

(In reply to comment #6)
> I've written some analysis and a proposal, available here:
> 
> http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Feb/0015.html
> http://jpcs.posterous.com/adding-recursive-inline-function-to-xquery-11

There are some uncertainties in the syntax. For one thing, the suggested syntax looks like a variable binding, complete with $ - but then functions are called without using $, as if they were true local functions and not variables:

   local $even := ...
   ...
   even(abs($a) - 1)
   
Also, the syntax itself is, IMO, too generic-looking while being specialized. In particular, keyword "local", on one hand, does not in any way indicate the main difference from "let" (scope of bound identifiers), and on the other hand, looks like a general-purpose local variable declaration, even though it cannot actually be used as such.

One idea would be to allow "letrec" (given that XQuery syntax is traditionally verbose, "let recursive" looks more in-line with the existing language) as described in point #3, but restrict initializer to inline function on grammar level. Such a restriction would fully enable this particular use case, and also allow for further extensions, if they are deemed desirable (e.g. mutually recursive lazy sequences a la Haskell, etc). It is also rather self-descriptive. So:

  let recursive
    $even := function($a)
    {
      if($a == 0) then true()
      else $odd(abs($a) - 1)
    },
    $odd := function($a)
    {
      if($a == 0) then false()
      else $even(abs($a) - 1)
    }
  return ($even, $odd) 

is fine, but neither:

   let recursive $x := $y, $y := $x

nor

   let recursive $x := 1, $y := 2

is not, because the expression must be an inline function.

If this syntax is still too misleading with respect to the limitation (less generic than it seems to be), then why not be fully explicit?

  let recursive function
    $even($a)
    {
      if($a == 0) then true()
      else $odd(abs($a) - 1)
    },
    $odd($a)
    {
      if($a == 0) then false()
      else $even(abs($a) - 1)
    }
  return ($even, $odd)

Comment 8 John Snelson 2010-02-05 23:20:12 UTC

(In reply to comment #7)
> There are some uncertainties in the syntax. For one thing, the suggested syntax
> looks like a variable binding, complete with $ - but then functions are called
> without using $, as if they were true local functions and not variables:
> 
>    local $even := ...
>    ...
>    even(abs($a) - 1)

That's an unfortunate typo - thanks for spotting it. Of course, the functions should be called using the variables ($even in the quoted example).

> Also, the syntax itself is, IMO, too generic-looking while being specialized.
> In particular, keyword "local", on one hand, does not in any way indicate the
> main difference from "let" (scope of bound identifiers), and on the other hand,
> looks like a general-purpose local variable declaration, even though it cannot
> actually be used as such.

I thought that "local" was more descriptive then "letrec", but maybe I've been looking at the syntax for too long.

> One idea would be to allow "letrec" (given that XQuery syntax is traditionally
> verbose, "let recursive" looks more in-line with the existing language) as
> described in point #3, but restrict initializer to inline function on grammar
> level. Such a restriction would fully enable this particular use case, and also
> allow for further extensions, if they are deemed desirable (e.g. mutually
> recursive lazy sequences a la Haskell, etc). It is also rather
> self-descriptive. So:
> 
>   let recursive
>     $even := function($a)
>     {
>       if($a == 0) then true()
>       else $odd(abs($a) - 1)
>     },
>     $odd := function($a)
>     {
>       if($a == 0) then false()
>       else $even(abs($a) - 1)
>     }
>   return ($even, $odd) 
> 
> is fine, but neither:
> 
>    let recursive $x := $y, $y := $x
> 
> nor
> 
>    let recursive $x := 1, $y := 2
> 
> is not, because the expression must be an inline function.

The proposal already restricts the variable binding expressions to be inline functions. I think that "let recursive" is a reasonable alternative syntax that should be considered.

Comment 9 John Snelson 2010-02-10 00:18:06 UTC

Today the XQuery Working Group considered and rejected the proposal linked from comment #6. There is hope that we can reconsider this feature as a requirement for XQuery 1.2.

Comment 10 Jonathan Robie 2014-05-20 16:44:18 UTC

Assigning to future requirements per Working Group decision (https://lists.w3.org/Archives/Member/w3c-xsl-query/2012Oct/0087.html).