This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25445 - [XP3.1] Replace curly array constructor with a function
Status: RESOLVED WONTFIX
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 3.1
Version: Last Call drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2014-04-24 22:03 UTC by Michael Kay
Modified: 2014-07-29 08:44 UTC
Description Michael Kay 2014-04-24 22:03:15 UTC
Semantically, array { X } is a pure function of X. I can't see any good reason why it needs special syntax; it should be a function call.

We have used curly brace syntax traditionally for things that are too complex to express as functions. This one isn't. It converts a sequence to an array; the inverse operation, to convert an array to a sequence, is a function.

There is no other case where we have used curly braces for something that is a simple non-higher-order function.

With the syntax growing, it's becoming increasingly difficult to remember where to use curlies and where to use parens. For things like typeswitch I have to look it up every time. So let's avoid new syntax where we don't need it, and let's avoid making the two parallel operations array() and seq() different.

Plus, using functions in a functional language has obvious benefits.
Comment 1 Jonathan Robie 2014-04-25 13:59:02 UTC
The only reason we have two syntaxes is that we want one of these syntaxes to use commas the same way functions do, and we want the other to accept an arbitrary sequence and create an array whose members are the items in the sequence.

The square array constructor uses commas the same way that functions do: commas delimit the members of the resulting array. If we were to use function syntax for an array constructor, this would be a better candidate.

The curly array constructor takes an arbitrary expression and creates an array with one member for each item in the resulting sequence. It does not use commas the same way as function calls; inside the braces, a comma is the comma operator.

One of the use cases creates an array of maps. The current solution uses JSONiq, so the square array constructor has the same semantics as our curly array constructor:

[
  for $w in $s()
  return { "pos" : $w(2), "lemma" : $w(1) }
]

Creating arrays of maps or maps of arrays will be common when working with JSON. But with our current square constructor, this expression creates an array with one member, a sequence of maps. I was not happy with that decision. JSON doesn't even allow that structure, and I expect this to be a common mistake for people working with JSON.
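
A minimal sketch of this pitfall, assuming the syntax of the final XQuery 3.1 Recommendation (which kept both constructors and requires the "map" keyword). The $words sequence is a hypothetical stand-in for $s() in the use case:

```xquery
xquery version "3.1";

(: Hypothetical input: each word is an array of (lemma, part of speech). :)
let $words := (["run", "VB"], ["dog", "NN"])

(: Square constructor with a single operand: the whole FLWOR result
   becomes ONE member, a sequence of two maps. :)
let $square := [
  for $w in $words
  return map { "pos" : $w(2), "lemma" : $w(1) }
]

(: Curly constructor: one member per item in the sequence. :)
let $curly := array {
  for $w in $words
  return map { "pos" : $w(2), "lemma" : $w(1) }
}

return (array:size($square), array:size($curly))
(: returns 1, 2 :)
```

Only the curly form yields a structure that serializes as a JSON array of objects.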

I think your suggestion is probably backwards. We need one syntax in which the comma delimits arguments the same way as commas in function calls. Let's use function call syntax for that. We need another syntax in which the items in an arbitrary sequence are used as the members of an array. Function call syntax doesn't work well for that.
Comment 2 Michael Kay 2014-04-25 14:47:17 UTC
I did not intend to reopen the discussion about the semantics of [a,b,c]. I think we got it right in Prague: it creates an array with three members, these being the values of a, b, and c. I think that's what most people would expect: as we discussed in Prague, many languages have such a construct, and they invariably create an array with N+1 members, where N is the number of commas. We can't do this with a function call unless it is a variable-arity function call.

I'm concerned here with the other construct, array{X}. I want to understand whether there is a good reason for having custom syntax for this, rather than using a function call as we do with its inverse, seq(X).

I see this being used in situations like

array {
  for $x in employee
  return $x/salary/data()
}

and perhaps this is why curlies were chosen; the FLWOR looks more like a statement than an expression to people from other cultures, because of its sentential syntax, and in those cultures curly braces are used to group "statements". But that's not our culture; we have an expression language, and array{} is semantically a pure function call. Making it a pure function allows things like

(a/b/c) => array()

which people will increasingly expect to be able to write.
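
For illustration only, the arrow-operator style can be approximated with a user-declared wrapper. This sketch assumes XQuery 3.1 as finally published, which has no built-in array() function; local:to-array and the sample $doc are hypothetical names introduced here:

```xquery
xquery version "3.1";

(: Hypothetical wrapper: a plain function around the curly constructor. :)
declare function local:to-array($items as item()*) as array(*) {
  array { $items }
};

(: Hypothetical sample document with two c elements. :)
let $doc := <a><b><c>1</c><c>2</c></b></a>

(: The sequence flows left to right through ordinary function calls. :)
return ($doc/b/c) => local:to-array() => array:size()
(: returns 2 :)
```

Because the wrapper is an ordinary function, it can also be passed as a function item (local:to-array#1), which the bare constructor syntax cannot.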
Comment 3 Jonathan Robie 2014-04-25 15:52:33 UTC
(In reply to Michael Kay from comment #2)
> I did not intend to reopen the discussion about the semantics of [a,b,c]. I
> think we got it right in Prague: it creates an array with three members,
> these being the values of a, b, and c.

I think this issue inherently reopens the discussion of the syntax of array constructors. And frankly, I'd rather first rewrite all the use cases in our current syntax before we revisit these decisions; we spent a lot of time getting to where we are now. If we do revisit these decisions, we should look at construction of arrays in general, not just one kind of array constructor.

In Prague, we had agreement that we needed two kinds of constructors. One camp felt that the comma in [a,b,c] should have the same meaning that it has in (a,b,c). The other camp felt that the comma should have the same meaning that it has in function calls, separating arguments.

In this issue, you have suggested that we use function call syntax for array constructors. I think that makes most sense for the syntax that uses commas the same way function calls do.

> We can't do this with a function call unless it is a variable-arity
> function call.

Yes, it would be variable arity.

> I'm concerned here with the other construct, array{X}. I want to understand
> whether there is a good reason for having custom syntax for this, rather
> than using a function call as we do with its inverse, seq(X).

Why do you have this question only for one of the two array constructor syntaxes?  The question seems equally apt for both.

> I see this being used in situations like
> 
> array {
>   for $x in employee
>   return $x/salary/data()
> }
> 
> and perhaps this is why curlies were chosen; the FLWOR looks more like a
> statement than an expression to people from other cultures, because of its
> sentential syntax, and in those cultures curly braces are used to group
> "statements". But that's not our culture; we have an expression language,
> and array{} is semantically a pure function call. 

In our culture, we use {} for computed constructors of many kinds - documents, elements, attributes, maps, even text nodes, PIs, and comments. To me, this seems perfectly in line with those constructs.
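
The parallel can be sketched as follows, assuming final XQuery 3.1 syntax; the element names and values are made up for illustration:

```xquery
xquery version "3.1";

(: Each computed constructor wraps its content expression in curly braces. :)
let $e := element employee {
            attribute id { "e1" },
            element salary { text { "1000" } }
          }
let $m := map { "salary" : 1000 }
let $a := array { $e/salary/data() }

return ($e/@id/string(), $m?salary, array:size($a))
(: returns "e1", 1000, 1 :)
```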

> Making it a pure function allows things like
> 
> (a/b/c) => array()
> 
> which people will increasingly expect to be able to write.

We could certainly add a function to do that. I'm not sure how important it is to create structures this way, or why arrays are different from documents, elements, attributes, maps, etc. 

We're talking about syntax here, and I think the best way to determine the most convenient syntax for expressions in a language is to look at a body of examples, written in each proposal. Syntax always has a high potential for bikeshedding, so I suggest we first rewrite the use cases in our current syntax, then entertain change proposals for array constructors.

[
  for $w in $s()
  return { "pos" : $w(2), "lemma" : $w(1) }
]

array { a/b/c }
Comment 4 Jonathan Robie 2014-07-27 20:17:56 UTC
Here is a set of examples from the use cases in three different syntaxes:

** XQuery 3.1 CWD

The syntax in the current working draft.

** Changing array { } to array(())

Michael's proposal in comment 0 of this BZ. To my eyes, these
expressions are harder to read because of the number of parentheses.

** Changing array { } to [], changing [] to array()

A counterproposal that allows us to replace one of our array
constructor syntaxes with array() rather than array(())

Here are some examples from the use cases.

Note: 

  I did not find an example that depends on the comma behavior
  we have defined for the current [] operator, so I will try to
  construct such an example in a subsequent comment.

* Example 1:

** XQuery 3.1 CWD

declare function local:spellcheck($languages, $text)
{
  map:new (
     { "languages" : $languages },
     { "raw" : $text  },
     for $l in $languages
     return map {
       $l : array { $text ! ext:sc($l, .) }
     }
  )
};

** Changing array { } to array(())

declare function local:spellcheck($languages, $text)
{
  map:new (
     { "languages" : $languages },
     { "raw" : $text  },
     for $l in $languages
     return map {
       $l : array (( $text ! ext:sc($l, .) ))
     }
  )
};

** Changing array { } to [], changing [] to array()

declare function local:spellcheck($languages, $text)
{
  map:new (
     { "languages" : $languages },
     { "raw" : $text  },
     for $l in $languages
     return map {
       $l : [ $text ! ext:sc($l, .) ]
     }
  )
};


* Example 2:

** XQuery 3.1 CWD

[
  for $w in $s()
  return array { "pos" : $w(2), "lemma" : $w(1) }
]

** Changing array { } to array(())

[
  for $w in $s()
  return array (( "pos" : $w(2), "lemma" : $w(1) ))
]

** Changing array { } to [], changing [] to array()

[
  for $w in $s()
  return [ "pos" : $w(2), "lemma" : $w(1) ]
]

* Example 3:

** XQuery 3.1 CWD

  map {
    true() : array { $s[$p(.)] },
    false() : array { $s[not($p(.))] }
  }

** Changing array { } to array(())

  map {
    true() : array (( $s[$p(.)] )),
    false() : array (( $s[not($p(.))] ))
  }


** Changing array { } to [], changing [] to array()

  map {
    true() : [ $s[$p(.)] ],
    false() : [ $s[not($p(.))] ]
  }


* Example 4:

** XQuery 3.1 CWD

declare function local:mult( $matrix1, $matrix2 )
{
  if (length($matrix1) != length($matrix2(1)))
  then error("Matrices must be m*n and n*p to multiply!")
  else array {
     for $i in 1 to length($matrix1)
     return array {
         for $j in 1 to length($matrix2(1))
         return
            sum (
               for $k in 1 to length($matrix2)
               return $matrix1($i)($k) * $matrix2($k)($j)
            )
     }
  }
};


** Changing array { } to array(())

declare function local:mult( $matrix1, $matrix2 )
{
  if (length($matrix1) != length($matrix2(1)))
  then error("Matrices must be m*n and n*p to multiply!")
  else array ((
     for $i in 1 to length($matrix1)
     return array ((
         for $j in 1 to length($matrix2(1))
         return
            sum (
               for $k in 1 to length($matrix2)
               return $matrix1($i)($k) * $matrix2($k)($j)
            )
     ))
  ))
};

** Changing array { } to [], changing [] to array()

declare function local:mult( $matrix1, $matrix2 )
{
  if (length($matrix1) != length($matrix2(1)))
  then error("Matrices must be m*n and n*p to multiply!")
  else [
     for $i in 1 to length($matrix1)
     return [
         for $j in 1 to length($matrix2(1))
         return
            sum (
               for $k in 1 to length($matrix2)
               return $matrix1($i)($k) * $matrix2($k)($j)
            )
     ]
  ]
};

* Example 5: assign items to groups

Note: We don't have really good use cases in our document for this. 
      I don't consider this one strong, but it illustrates the syntax.

** XQuery 3.1 CWD

let $x := (1, 2, 3, 4, 5, 6, 7, 8, 9)
return  [$x[. mod 2 eq 0], $x[. mod 3 eq 0], $x[. mod 5 eq 0]]

** Changing array { } to array(())

(Same as above.)

let $x := (1, 2, 3, 4, 5, 6, 7, 8, 9)
return  [$x[. mod 2 eq 0], $x[. mod 3 eq 0], $x[. mod 5 eq 0]]

** Changing array { } to [], changing [] to array()

let $x := (1, 2, 3, 4, 5, 6, 7, 8, 9)
return  array( $x[. mod 2 eq 0], $x[. mod 3 eq 0], $x[. mod 5 eq 0])
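
To make the comma dependence in Example 5 concrete, here is a small sketch assuming the constructors as they stand in the final XQuery 3.1 Recommendation. It compares the square form with what the curly form would produce for the same operands; the variable names $groups and $flat are introduced here for illustration:

```xquery
xquery version "3.1";

let $x := (1, 2, 3, 4, 5, 6, 7, 8, 9)

(: Square: each comma-separated operand is one member, so the groups survive. :)
let $groups := [ $x[. mod 2 eq 0], $x[. mod 3 eq 0], $x[. mod 5 eq 0] ]

(: Curly: the commas build a single flat sequence, so the grouping is lost. :)
let $flat := array { $x[. mod 2 eq 0], $x[. mod 3 eq 0], $x[. mod 5 eq 0] }

return (array:size($groups), array:size($flat), $groups(2))
(: returns 3, 8, 3, 6, 9 :)
```

The second group is still retrievable from the square form as $groups(2), whereas the curly form holds eight atomic members with no group boundaries.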
Comment 5 Jonathan Robie 2014-07-29 08:44:07 UTC
The WG decided to close this bug with no change.