This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 29080 - array:members
Summary: array:members
Status: CLOSED WONTFIX
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Working drafts
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-24 13:55 UTC by Christian Gruen
Modified: 2016-07-21 21:40 UTC

See Also:


Description Christian Gruen 2015-08-24 13:55:23 UTC
People are frequently asking us how to iterate through array entries. A future helper function "array:members" would be convenient, which wraps each array member into a new array:

Example query:

  (: Current representation in XQuery :)
  declare function array:members(
    $array as array(*)
  ) as array(*)* {
    for $i in 1 to array:size($array)
    return [ $array($i) ]
  };

  let $array := [ 1 to 2, 3, () ]
  for $member in array:members($array)
  return <hit>{ $member }</hit>

Result:

  <hit>1 2</hit>
  <hit>3</hit>
  <hit/>

A similar function could be provided for map entries.
Comment 1 Michael Kay 2015-08-24 14:59:32 UTC
Alternatively (1), an extension to FLWOR expressions that binds a variable to each member in turn, as a sequence:

let $my-array := [(1,2,3), (4,5)]
for member $m in $my-array return count($m)

returns (3,2)
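
For comparison, a minimal sketch of how the same result can be obtained without new syntax, assuming an XQuery 3.1 processor. The `array` prefix is declared explicitly in case the processor does not predeclare it:

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

let $my-array := [ (1, 2, 3), (4, 5) ]
(: Iterate over positions and look each member up by index. :)
for $i in 1 to array:size($my-array)
return count($my-array($i))   (: 3, then 2 :)
```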

Alternatively or additionally (2), a more concise syntax for inline functions, provisionally λ{expr}, so that this can be written, for example

$my-array => array:for-each(
   λ{ <hit>{$1}</hit> }
)

where $1 is implicitly bound to the first supplied argument.
Comment 2 Christian Gruen 2016-06-10 07:42:46 UTC
I just discovered Bug 29685, and I was surprised to see that array:put was just added to the spec at this stage.

We are working with arrays every day, and I would really appreciate it if array:members could be added as well. I believe looping through an array's members is a very basic requirement; it is straightforward to implement, and it makes a lot of XQuery code much nicer. A map:values function would be another nice addition, also for the sake of symmetry, but it may not be that pressing.
Comment 3 Abel Braaksma 2016-06-10 09:54:43 UTC
(In reply to Michael Kay from comment #1)
> Alternatively (1), an extension to FLWOR expressions that binds a variable
> to each member in turn, as a sequence:
> 
> let $my-array := [(1,2,3), (4,5)]
> for member $m in $my-array return count($m)
> 
> returns (3,2)
I find this interesting and a welcome addition (I'm not quite sure I understand your 2nd proposal with lambda-expressions). If this is added, I suggest to make it orthogonal with maps, as looping through a map is just as useful as looping through an array (apart from the undefined order).



(In reply to Christian Gruen from comment #2)
> A map:values function would be another nice addition, also to have some 
> analogy, but it may not be that pressing.
We tried, back in the day that maps were only a feature of XSLT 3.0. The problem is with:

let $m := map {'a' : (1,2,3), 'b' : (2,3,4)},
    $v := map:values($m)
return $v[2]

You might expect the second *value* to be returned (i.e., (2,3,4) or (1,2,3)), but the sequences would collapse into one flat sequence. Instead you would get 2, or 3, depending on the order of the map's entries. Furthermore, if the map is copied, the internal order may change, leading to the same code returning a different value the next time around.

While I believe map:values() to be a worthwhile addition, even if only to inspect its contents, this was one of the main reasons it was dropped (though we maintain it as an extension function in Exselt).
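
To make the collapse concrete, here is a minimal sketch, assuming XQuery 3.1. The `local:values` helper is hypothetical and only stands in for a flat-sequence map:values; it is not the Exselt extension function, whose exact behaviour is not described here:

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical helper: returns all map values as one flat sequence. :)
declare function local:values($m as map(*)) as item()* {
  map:for-each($m, function($key, $value) { $value })
};

let $m := map { 'a': (1, 2, 3), 'b': (2, 3, 4) }
let $v := local:values($m)
return (
  count($v),  (: 6 items: the two three-item values have merged :)
  $v[2]       (: an item from inside one value; which one depends on the map's order :)
)
```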
Comment 4 Christian Gruen 2016-06-10 10:08:08 UTC
(In reply to Abel Braaksma from comment #3)
> (In reply to Michael Kay from comment #1)
> > Alternatively (1), an extension to FLWOR expressions that binds a variable
> > to each member in turn, as a sequence:
> > 
> > let $my-array := [(1,2,3), (4,5)]
> > for member $m in $my-array return count($m)
> > 
> > returns (3,2)
> I find this interesting and a welcome addition (I'm not quite sure I
> understand your 2nd proposal with lambda-expressions). If this is added, I
> suggest to make it orthogonal with maps, as looping through a map is just as
> useful as looping through an array (apart from the undefined order).

Personally, I would vote for including a simple function. It requires no FLWOR expression and it should be a no-brainer in terms of specification and implementation. But I’m looking forward to more discussion on this.


> We tried, back in the day that maps were only a feature of XSLT 3.0.

Here the new magic of arrays would help out as well (the example is very similar to my array:members proposal):

  (: Current representation in XQuery :)
  declare function map:values(
    $map as map(*)
  ) as array(*)* {
    map:for-each($map, function($key, $value) {
      [ $value ]
    })
  };

  let $m := map {'a' : (1,2,3), 'b' : (2,3,4)}
  for $v in map:values($m)
  return <hit>{ $v }</hit>
Comment 5 Josh Spiegel 2016-06-10 14:37:39 UTC
I think this function would lead to surprising results in cases where there is no implicit flattening/atomization.  For example:

  let $arr := [1, 2, (), 4]
  return
    array {
      for $i in array:members($arr)  
      where not(empty($i))
      return 
        $i
    }

I might expect this to filter empty values from the array.  But actually it would evaluate to [[1],[2],[()],[4]]

You are right that we have made changes recently to the functions and operators.  We added array:put and modified the signature of array:remove and map:remove.  In these cases, the modifications were low-risk and straightforward changes that addressed usability problems.  While I have some sympathy for the problem you are trying to solve, I am not convinced that this is the right way to solve the problem. And, as you point out, users can effectively iterate array members using 1 to array:size().
 
  let $arr := [1, 2, (), 4]
  return
    array {
      for $i in 1 to array:size($arr)  
      where not(empty($arr($i)))
      return 
        $arr($i)
    }

==>
   
   [1, 2, 4]
Comment 6 Christian Gruen 2016-06-10 20:16:15 UTC
I see that the implicit conversion of array members to new arrays is probably not as obvious as I thought it would be. And indeed it may be necessary to flatten/atomize the looped entries.

To continue with John’s example (thanks for your helpful remarks), my notion was that people would automatically use array:join to rebuild arrays:

  let $arr := [1, 2, (), 4]
  return array:join(
    for $i in array:members($arr)
    where exists($i?*)
    return $i
  )

Or, shorter:

 array:join(array:members($arr)[exists(?*)])

As the example shows, the following check would always yield true:

  fn:deep-equal($array, array:join(array:members($array)))

Originally, I expected array:for-each to be the best candidate to process all array entries, but it has been decided that the function will always return a new array.
Comment 7 Abel Braaksma 2016-06-11 14:26:45 UTC
I think that whatever direct approach would be proposed (using a function), there will always be surprising effects, both for map:values() and array:values()/members(). And a syntax change in the CR phase has a large risk of not being accepted.

To counter that, we could consider an alternative which is less obtrusive a change to the spec:

I propose therefore to change the semantics of the SimpleMapExpr (!) slightly, to allow the left-hand to be a singleton array or a singleton map. If it is an array or map, the bang-operator loops over the items in the map or array (setting the focus item, size and position to the respective member). 

If a given item in a map or array has size > 1, the function or operator on the right-hand side must be able to deal with that, or an error is raised.

This approach has a couple of benefits:
1) It serves principle of least surprise ($arr ! name(.) will work as expected)
2) It is an unobtrusive change (I think) at this stage
3) It solves the problem raised in this bug report
4) It is easy to implement (I think)
5) It deals neatly with sequence concatenation and leaves it to the programmer
6) Easy to use in the typical case where items have arity zero-or-one
7) No backwards-compat issues, XP30 has no arrays and no maps
8) Solves quite a few use-cases, readable syntax

Down sides, surprises:
a) It has the effect that the rh-side can receive an empty seq (focus item of size zero) if a member of array or map is an empty seq, which is otherwise not possible. We could opt to skip such items, or allow this.
b) WG members may frown upon different business rules based on the type and size of the left-hand operand (but the Lookup operator has similar differences)
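
For reference, a minimal sketch of how the operator behaves today on an array, next to an existing XQuery 3.1 formulation that iterates over the members. The sample array is an assumption chosen for illustration:

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

let $arr := [ (1, 2, 3), (4, 5) ]
return (
  (: Today the array is a single item, so the right-hand side runs once. :)
  $arr ! count(.),                  (: 1 :)
  (: Applying a function per member is already possible with array:for-each. :)
  array:for-each($arr, count#1)?*   (: 3, 2 :)
)
```

Under the proposal above, the first expression would presumably return 3, 2 instead of 1.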
Comment 8 Michael Kay 2016-06-11 17:44:24 UTC
> I propose therefore to change the semantics of the SimpleMapExpr (!) slightly,
> to allow the left-hand to be a singleton array or a singleton map.

Please no. The LHS can already be a singleton array or a singleton map (indeed, it can be anything) and it currently offers complete orthogonality and substitutability. If we start special-casing we end up with the kind of mess represented by the "/" operator (as well as a backwards incompatibility); and as soon as there are special cases like this, your ability to reason about equivalence of expressions, and therefore to do optimization, rapidly diminishes.
Comment 9 Josh Spiegel 2016-06-13 23:51:51 UTC
Consider this query:

  declare variable $games := parse-json('[
     { "location" : "Webster", "scores" : [8, 10] },
     { "location" : "Kirkwood", "scores" : [5,7] },
     { "location" : "Webster", "scores" : [7,3] }
  ]');

  declare function local:getAverages($location) {
     $games?*[?location eq $location]?scores ! avg(.)
  };

  <result>
     <webster>{local:getAverages('Webster')}</webster>
     <kirkwood>{local:getAverages('Kirkwood')}</kirkwood>
  </result>

Currently this evaluates to:

  <result><webster>9 5</webster><kirkwood>6</kirkwood></result>

But under your proposal, it would return:

  <result><webster>9 5</webster><kirkwood>5 7</kirkwood></result>

Correct?

If so, I think a user would find this surprising.
Comment 10 Abel Braaksma 2016-06-14 14:23:23 UTC
(In reply to Josh Spiegel from comment #9)
> Currently this evaluates to:
> 
>   <result><webster>9 5</webster><kirkwood>6</kirkwood></result>
> 
> But under your proposal, it would return:
> 
>   <result><webster>9 5</webster><kirkwood>5 7</kirkwood></result>
> 
> Correct?
> 
> If so, I think a user would find this surprising.
I think this currently evaluates to an error, because, unless I misunderstand the signature or your code, fn:avg#1 does not take an array as its argument (which in itself may be considered surprising).

(In reply to Michael Kay from comment #8)
> Please no. The LHS can already be a singleton array or a singleton map
> (indeed, it can be anything) and it currently offers complete orthogonality
> and substitutability. 
I understand your resentment against the proposal, but we have done a similar thing, for instance, for the new lookup operator, which special cases based on whether it is an item or an array.

I think that, conversely, the orthogonality and substitutability argument can also be used in favor of allowing arrays in some places where currently only sequences are allowed. I.e., in the fn:avg example in comment#9 from Josh.

Some languages (Python, C#) allow any traversable sequences (arrays, dictionaries, collections, queues) for such functions as looping, iterations, averaging, concat operators etc. So I don't think the idea itself is very strange.

Anyway, I thought it was a good idea (in the sense of "an easy solution"), but it looks like there's a lot more to it than I thought.
Comment 11 Josh Spiegel 2016-06-14 14:29:40 UTC
> I think this currently evaluates to an error, because, unless I misunderstand
> the signature or your code, fn:avg#1 does not take an array as its argument
> (which in itself may be considered surprising).

This is not correct.  The signature of fn:avg is:

  fn:avg($arg as xs:anyAtomicType*) as xs:anyAtomicType?

The call to this function will implicitly atomize the input and thus would flatten array. 

e.g. 

   <test-case name="fn-avg-10">
      <test>avg([1,2,3,4,5])</test>
      <result>
         <assert-eq>3</assert-eq>
      </result>
   </test-case>
Comment 12 Abel Braaksma 2016-06-14 15:35:03 UTC
(In reply to Josh Spiegel from comment #11)
> The call to this function will implicitly atomize the input and thus would
> flatten array. 
ah, thanks, of course, I totally missed that...
Comment 13 Michael Kay 2016-06-14 17:15:43 UTC
We left this open today so we could give it further thought.

I think it's probably true that arrays containing non-singleton members are a bit tricky to use, and that using arrays rather than sequences as the members of a "nested collection" is probably a good idea.

Converting an array whose members are sequences to an array whose members are arrays is easy enough:

array:for-each($in, function($m) { [$m] })

Turning the array of arrays into a sequence of arrays is also easy enough: $a?*

So the proposed function could also be written:

declare function array:members($array as array(*)) as array(*)* {
    array:for-each($array, function($m) { [$m] }) ? *
};

Where a function has a one-line implementation, we need a very strong justification to add it to the spec. I think that's particularly true where the function is the composition of two simple operations that are easily expressed and likely to be often used in isolation: in this case, turning an array of sequences into an array of arrays, and turning an array of items into a sequence of items.
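
As a worked illustration of the two steps in isolation, here is a minimal sketch assuming XQuery 3.1. The input array and variable names are chosen for the example only:

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

let $in := [ (1, 2), 3, () ]
(: Step 1: wrap each member in its own array. :)
let $wrapped := array:for-each($in, function($m) { [ $m ] })
(: Step 2: unpack the outer array into a sequence of its members. :)
let $members := $wrapped?*
return (
  array:size($wrapped),       (: 3 :)
  count($members),            (: 3 :)
  $members ! array:size(.),   (: 1, 1, 1 :)
  $members ! count(.?*)       (: 2, 1, 0 :)
)
```

The last line shows that the empty member survives as an empty array rather than disappearing, which is the property the proposed function relies on.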

I've been trying to devise use cases, and they all suffer from the syndrome "I wouldn't start from here": I would start with an array of arrays rather than an array of sequences. In looking for cases where you would naturally do something that produces an array of sequences, I can't find any where it wouldn't be equally easy to produce an array of arrays.

So I'm not convinced of the need.
Comment 14 Christian Gruen 2016-06-14 20:49:21 UTC
I clearly see there’s not much consensus on the original proposal. The discussion has been enriching, though; it would be fine with me to reassign this back to "XQuery 3.2 Use Cases and Requirements".

PS: Sorry, Josh, for addressing you with John.
Comment 15 Josh Spiegel 2016-06-14 20:57:31 UTC
> PS: Sorry, Josh, for addressing you with John.

No problem!
Comment 16 Michael Kay 2016-06-21 15:20:31 UTC
The WG decided to close this with no action.