This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5630 - [DM] Tuples and maps
Summary: [DM] Tuples and maps
Status: CLOSED WONTFIX
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Data Model 3.0
Version: Working drafts
Hardware: PC Windows XP
Importance: P2 enhancement
Target Milestone: ---
Assignee: Anders Berglund
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL: http://www.nesterovsky-bros.com
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-07 08:33 UTC by Vladimir Nesterovsky
Modified: 2008-07-09 20:02 UTC

Description Vladimir Nesterovsky 2008-04-07 08:33:53 UTC
I suggest introducing two types into the XPath/XSLT/XQuery type system:
  opaque sequence (tuple), and
  associative array (map).

Tuples and maps should be considered atomic values. In contrast to sequences, a sequence of tuples or maps is not flattened into a single sequence.

Tuples and maps need to be comparable. 
This way one can sort/group by several values at once.

A tuple shall store a sequence of items, or of more specific types.
A tuple shall be constructed from a sequence.
A tuple shall allow access to the contained sequence.
Possible examples of tuple declarations:
  tuple() - tuple containing a sequence of items;
  tuple(*) - the same;
  tuple(xs:int, node()) - pair containing int and node;
  tuple(xs:string+) - tuple containing non empty sequence of strings.

A map shall store a collection of (key, value) pairs.
A map shall be constructed from a sequence of (key, value) pairs.
A map shall allow retrieval of the contained sequence of (key, value) pairs.
A map shall allow retrieval of the contained sequence of keys.
A map shall allow retrieval of the contained sequence of values.
A map shall allow retrieval of the sequence of values corresponding to a specified key.
Possible examples of map declarations:
  map() - map of items;
  map(xs:string, element()) - map of string (as key) to element() as a value.

To support tuples and maps, the library shall define these functions:
  tuple($items as item()*) as tuple();
    Constructs a tuple from items.

  tuple-items($tuple as tuple()) as item()*;
    Returns the items contained in a specified tuple.

  tuple-item($tuple as tuple(), $index as xs:integer) as item()?;
    Returns the item at a specified index in a specified tuple.

  map($items as item()*) as map();
    Constructs a map from (key, value) pairs of items.

  map-items($map as map()) as item()*;
    Returns the (key, value) items contained in a map.

  map-keys($map as map()) as item()*;
    Returns the keys contained in a map.

  map-values($map as map()) as item()*;
    Returns the values contained in a map.

  map-value($map as map(), $key as item()) as item()*;
    Returns the values contained in a map that correspond to a specified key.
Comment 1 Michael Kay 2008-04-07 08:42:57 UTC
I think this proposal would have a much better chance of acceptance if you motivate it with some use cases. You need to describe a problem, show how it is difficult or impossible to solve with current capabilities, and then show how it would be solved if this capability were added.
Comment 2 Vladimir Nesterovsky 2008-04-07 09:57:15 UTC
1. Tuples and maps supplement sequences and allow set-based
operations to be expressed in a more concise and natural way.

2. In principle, tuples can be emulated either with an XML tree, or with a sequence that uses some "terminator" items to separate subsequences.

Both of these approaches share a weakness:
they do not clearly express the algorithm's intention,
which is to operate over a collection of sequences.

Moreover, tuples that are part of the type system can be implemented
more efficiently than tuple emulations.

3. Maps can to some extent be emulated with an XML tree and
the key() function (in XSLT); however, a map is richer: it operates
not only over XML nodes (as keys do), but also over other types of items.

A map can be used to group a sequence by some criterion, and to operate on
this grouping at a later stage.

A map can be used as a state bag, and may achieve the same results in XQuery
as tunnel parameters do in XSLT.

4. Map use case.

Suppose you want to group items by some condition, and
allow further processing of these groups.

Maps allow you to solve this task as in the example below:

<xsl:template match="/">
  <root>
    <xsl:variable name="cities" as="element()*">
      <city name="Jerusalem" country="Israel"/>
      <city name="London" country="Great Britain"/>
      <city name="Paris" country="France"/>
      <city name="New York" country="USA"/>
      <city name="Brasilia" country="Brazil"/>
      <city name="Moscow" country="Russia"/>
      <city name="Tel Aviv" country="Israel"/>
      <city name="St. Petersburg" country="Russia"/>
    </xsl:variable>

    <!-- This constructs a map of pairs (country, city).  -->
    <xsl:variable name="map" as="map()" select="
      f:map
      (
        for $city in $cities return
          ($city/string(@country),  $city)
      )"/>
    
    <!-- ... Some processing that leads to a call to f:process($map) -->

  </root>
</xsl:template>

<xsl:function name="f:process">
  <xsl:param name="map" as="map()"/>

  <xsl:for-each select="f:map-keys($map)">
    <xsl:variable name="key" as="xs:string" select="."/>

    <country name="{$key}">
      <xsl:sequence select="f:map-value($map, $key)"/>
    </country>
  </xsl:for-each>
</xsl:function>

At present the only acceptable way to solve this task is to construct a temporary tree;
however, this does not preserve the identity of the items.
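
The identity loss can be seen in a minimal XSLT 2.0 sketch. It assumes a shortened inline city list and builds the temporary tree with xsl:for-each-group; the names "groups" and "same-node" are illustrative only and not part of the proposal.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:variable name="cities" as="element()*">
      <city name="Jerusalem" country="Israel"/>
      <city name="Tel Aviv" country="Israel"/>
      <city name="Paris" country="France"/>
    </xsl:variable>

    <!-- Temporary tree emulating a map: one country element per key. -->
    <xsl:variable name="groups" as="element()*">
      <xsl:for-each-group select="$cities" group-by="@country">
        <country name="{current-grouping-key()}">
          <!-- Attaching the cities to a new parent copies them. -->
          <xsl:sequence select="current-group()"/>
        </country>
      </xsl:for-each-group>
    </xsl:variable>

    <result>
      <!-- Compares the grouped Paris element with the original one.
           The test yields false, because the grouped element is a copy. -->
      <same-node>
        <xsl:value-of
          select="$groups[@name = 'France']/city[1] is $cities[3]"/>
      </same-node>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Any later processing that depends on node identity or on the original ancestors of a city therefore cannot work from the temporary tree alone.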

5. Tuple use case. "Java bean formatter".

After building a method's parameters, one needs to format them
in one way (compact) or the other (verbose), depending on a decision
that can be made only when all parameters are already built
(e.g. depending on the number of parameters).

At present, one option is to use a "terminator" to separate subsequences:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <!-- Terminator token. -->
  <xsl:variable name="t:terminator" as="xs:QName"
    select="xs:QName('t:terminator')"/>

  <!-- New line. -->
  <xsl:variable name="t:crlf" as="xs:string" select="'&#10;'"/>

  <xsl:template match="/">
    <!--
      We need to manipulate a sequence of sequence of tokens.
      To do this we use $t:terminator to separate sequences.
    -->
    <xsl:variable name="short-items" as="item()*">
      <xsl:sequence select="t:get-param('int', 'a')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'b')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'c')"/>
      <xsl:sequence select="$t:terminator"/>
    </xsl:variable>

    <xsl:variable name="long-items" as="item()*">
      <xsl:sequence select="t:get-param('int', 'a')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'b')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'c')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'd')"/>
      <xsl:sequence select="$t:terminator"/>
    </xsl:variable>

    <result>
      <short>
        <xsl:value-of select="t:format($short-items)" separator=""/>
      </short>
      <long>
        <xsl:value-of select="t:format($long-items)" separator=""/>
      </long>
    </result>
  </xsl:template>

  <!--
    Returns a sequence of tokens that defines a parameter.
      $type - parameter type.
      $name - parameter name.
      Returns sequence of parameter tokens.
  -->
  <xsl:function name="t:get-param" as="item()*">
    <xsl:param name="type" as="xs:string"/>
    <xsl:param name="name" as="xs:string"/>

    <xsl:sequence select="$type"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="$name"/>
  </xsl:function>

  <!--
    Format sequence of sequence of tokens separated with $t:terminator.
      $tokens - sequence of sequence of tokens to format.
      Returns formatted sequence of tokens.
  -->
  <xsl:function name="t:format" as="item()*">
    <xsl:param name="tokens" as="item()*"/>

    <xsl:variable name="terminators" as="xs:integer+"
      select="0, index-of($tokens, $t:terminator)"/>
    <xsl:variable name="count" as="xs:integer"
      select="count($terminators) - 1"/>
    <xsl:variable name="verbose" as="xs:boolean"
      select="$count > 3"/>

    <xsl:sequence select="
      for $i in 1 to $count return
      (
        subsequence
        (
          $tokens,
          $terminators[$i] + 1,
          $terminators[$i + 1] - $terminators[$i] - 1
        ),
        if ($i = $count) then () 
        else
        (
          ',',
          if ($verbose) then $t:crlf else ' '
        )
      )"/>
  </xsl:function>

</xsl:stylesheet>

If we allow the tuple() type, this task can be solved as:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <!-- New line. -->
  <xsl:variable name="t:crlf" as="xs:string" select="'&#10;'"/>

  <xsl:template match="/">
    <!--
      Use a sequence of tuples.
    -->
    <xsl:variable name="short-items" as="tuple()*">
      <xsl:sequence select="tuple(t:get-param('int', 'a'))"/>
      <xsl:sequence select="tuple(t:get-param('int', 'b'))"/>
      <xsl:sequence select="tuple(t:get-param('int', 'c'))"/>
    </xsl:variable>

    <xsl:variable name="long-items" as="tuple()*">
      <xsl:sequence select="tuple(t:get-param('int', 'a'))"/>
      <xsl:sequence select="tuple(t:get-param('int', 'b'))"/>
      <xsl:sequence select="tuple(t:get-param('int', 'c'))"/>
      <xsl:sequence select="tuple(t:get-param('int', 'd'))"/>
    </xsl:variable>

    <result>
      <short>
        <xsl:value-of select="t:format($short-items)" separator=""/>
      </short>
      <long>
        <xsl:value-of select="t:format($long-items)" separator=""/>
      </long>
    </result>
  </xsl:template>

  <!--
    Returns a sequence of tokens that defines a parameter.
      $type - parameter type.
      $name - parameter name.
      Returns sequence of parameter tokens.
  -->
  <xsl:function name="t:get-param" as="item()*">
    <xsl:param name="type" as="xs:string"/>
    <xsl:param name="name" as="xs:string"/>

    <xsl:sequence select="$type"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="$name"/>
  </xsl:function>

  <!--
    Formats a sequence of tuples, each containing a sequence of tokens.
      $tuples - sequence of tuples of tokens to format.
      Returns formatted sequence of tokens.
  -->
  <xsl:function name="t:format" as="item()*">
    <xsl:param name="tuples" as="tuple()*"/>

    <xsl:variable name="count" as="xs:integer"
      select="count($tuples)"/>
    <xsl:variable name="verbose" as="xs:boolean"
      select="$count > 3"/>

    <xsl:sequence select="
      for $i in 1 to $count return
      (
        tuple-items($tuples[$i]),
        if ($i = $count) then ()
        else
        (
          ',',
          if ($verbose) then $t:crlf else ' '
        )
      )"/>
  </xsl:function>

</xsl:stylesheet>

The XSLT that uses tuples is more concise and is expressed in terms of the task being solved, whereas
the XSLT that uses terminators exposes "lower level" methods for operating on a sequence of sequences.

6. A combination of tuples and maps makes it possible to group by several items at once.
Tuples in an xsl:sort element allow sorting data by several items at once, which
is a shorthand for several xsl:sort elements.
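
For comparison, this is roughly how the same effect is obtained in XSLT 2.0 without tuples. The sketch assumes a small inline city list with a hypothetical "zone" attribute. Sorting by several values takes one xsl:sort element per value, and grouping by several values commonly relies on a concatenated string key; the '|' separator is an assumption and must not occur in the data.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Illustrative data; the zone attribute is hypothetical. -->
    <xsl:variable name="cities" as="element()*">
      <city name="Tel Aviv" country="Israel" zone="A"/>
      <city name="Moscow" country="Russia" zone="B"/>
      <city name="Jerusalem" country="Israel" zone="A"/>
    </xsl:variable>

    <result>
      <!-- Sorting by two values: one xsl:sort element per value. -->
      <sorted>
        <xsl:for-each select="$cities">
          <xsl:sort select="@country"/>
          <xsl:sort select="@name"/>
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </sorted>

      <!-- Grouping by two values: a concatenated string stands in
           for what would be a tuple key under the proposal. -->
      <grouped>
        <xsl:for-each-group select="$cities"
          group-by="concat(@country, '|', @zone)">
          <group key="{current-grouping-key()}">
            <xsl:copy-of select="current-group()"/>
          </group>
        </xsl:for-each-group>
      </grouped>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The concatenated key works only for values that have a suitable string form, which is the limitation a comparable tuple would avoid.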
Comment 3 Vladimir Nesterovsky 2008-04-09 05:04:52 UTC
To continue with use cases, it's worth mentioning that
windowing (5609) can be solved with tuples,
through a library or custom function:

window($items as item()*, $window-size as xs:integer) as tuple()*
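
As a point of reference, a rough XSLT 2.0 emulation of such a function is sketched below. It assumes that "window" means consecutive, non-overlapping runs of $window-size items, which the signature above does not state, and it wraps each run in a hypothetical t:window element because no tuple type is available. It is also limited to atomic items.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs">

  <!--
    Hypothetical helper: splits atomic items into runs of $window-size
    and wraps each run in a t:window element.
  -->
  <xsl:function name="t:window" as="element(t:window)*">
    <xsl:param name="items" as="xs:anyAtomicType*"/>
    <xsl:param name="window-size" as="xs:integer"/>

    <!-- Items 1..n share key 0, items n+1..2n share key 1, and so on. -->
    <xsl:for-each-group select="$items"
      group-adjacent="(position() - 1) idiv $window-size">
      <t:window>
        <xsl:for-each select="current-group()">
          <t:item><xsl:value-of select="."/></t:item>
        </xsl:for-each>
      </t:window>
    </xsl:for-each-group>
  </xsl:function>

  <xsl:template match="/">
    <result>
      <!-- Seven items in windows of three: sizes 3, 3 and 1. -->
      <xsl:sequence select="t:window(1 to 7, 3)"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The wrapper elements play the role that tuple()* would play in the proposed signature, at the cost of converting every item to text.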

Positional grouping in XQuery (5608) can be solved using
a function that calculates group boundaries and returns
a sequence of tuples for these boundaries.

Another example is a node reference (5613), which
can be seen as tuple(node()?) - a tuple containing an optional node.
Comment 4 Anders Berglund 2008-07-09 20:01:47 UTC
At the joint meeting of the XSL and XQuery Working groups 2008-06-23
it was decided that a change of this nature would be too large for the
next "point" release of the Recommendations. The request for new
functionality will be considered for a future "main" release.