This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 29723 - [FO31] map:merge
Summary: [FO31] map:merge
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-07 15:31 UTC by Tim Mills
Modified: 2016-12-16 19:55 UTC

Description Tim Mills 2016-07-07 15:31:06 UTC
The "use-last" policy for map:merge means that the implementation has to process the entire input before it can determine the output.  This used to be the case for parse-json, but a change was made to parse-json so that it defaulted to "use-first".

Should map:merge not have a similar option, and a similar default?
Comment 1 Michael Kay 2016-07-10 09:38:47 UTC
Note also: in map:merge, the sentence (in a Note)

"This means that in edge cases, where (as a result of numeric promotion) the ·same key· relation is not transitive, different implementations may give different results."

is obsolete and should be deleted.

Otherwise, my preference is for the status quo. I don't think the rationale given is strong enough to justify a change.
Comment 2 Michael Kay 2016-07-19 17:22:58 UTC
ACTION A-650-02: Action on Mike Kay to write a proposal for adding options to map:merge (use-first, use-last, reject, combine, don’t care).

Proposal (in addition to deleting the redundant sentence mentioned above).

Add an optional second argument to map:merge, so there are two signatures:

map:merge($maps as map(*)*) as map(*)
map:merge($maps as map(*)*, $options as map(*)) as map(*)

<quote>
The effect of the arity-1 function is the same as the effect of the arity-2 function when an empty map is supplied as the value of $options.

The $options argument can be used to control the way in which duplicate keys are handled. The ·option parameter conventions· apply.

The entries that may appear in the $options map are as follows:

Key: duplicates

Meaning: Determines the policy for handling duplicate keys, when two or more maps in the value of $maps contain the ·same key·. The required type is xs:string. The default value is "use-first".

Values:

  reject
      An error is raised [err:FOJS0003] if duplicate keys are encountered.

  use-first
      If duplicate keys are present, all but the first of a set of duplicates are ignored, where the ordering is based on the order of maps in the $maps argument.

  use-last
      If duplicate keys are present, all but the last of a set of duplicates are ignored, where the ordering is based on the order of maps in the $maps argument.

  combine
      If duplicate keys are present, the result map includes an entry for the key whose associated value is the sequence-concatenation of all the values associated with the key, retaining order based on the order of maps in the $maps argument.

  unspecified
      If duplicate keys are present, the effect is implementation-defined; the implementation may choose one of the above strategies for handling duplicates, or may choose some other strategy.

Informally, the supplied maps are combined as follows:

There is one entry in the returned map for each distinct key present in the union of the input maps, where two keys are distinct if they are not the ·same key·.

For any key that appears in only one of the input maps, the associated value for that key in the result map is the same as the associated value in that input map.

For any key that appears in more than one of the input maps, the associated value for that key in the result map depends on the way in which duplicates are handled, as described above.

The definitive specification of map:merge#2 is as follows. The result of the function call map:merge($MAPS, $OPTIONS) is defined to be the result of the expression

let $FOJS0003 := QName("......", "FOJS0003"),

$duplicates-handler := map {
  "use-first": function($a, $b) {$a},
  "use-last": function($a, $b) {$b},
  "combine": function($a, $b) {$a, $b},
  "reject": function($a, $b) {error($FOJS0003)},
  "unspecified": function($a, $b) {(# vendor:defined #){}}
},

$combine-maps := function($A as map(*), $B as map(*), $deduplicator as function(*)) {
    fn:fold-left(map:keys($B), $A, function($z, $k){ 
        if (map:contains($z, $k))
        then map:put($z, $k, $deduplicator($z($k), $B($k)))
        else map:put($z, $k, $B($k))
    })
}
return fn:fold-left($MAPS, map{},
    $combine-maps(?, ?, $duplicates-handler(($OPTIONS?duplicates, "use-first")[1])))
            

</quote>

Change the example map:merge(($week, map{6:"Sonnabend"})) to illustrate the various options:

reject -> error
use-first -> 6:"Samstag"
use-last -> 6:"Sonnabend"
combine -> 6:("Samstag", "Sonnabend")
unspecified -> result is implementation-defined.
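
For illustration only, the four deterministic options above can be sketched as a single XQuery 3.1 query. This assumes a processor that implements the two-argument map:merge with the "duplicates" option as proposed in this comment, and it uses the $week map from the existing F+O example. The namespace declarations are written out explicitly in case the processor does not predeclare the prefixes, and the string returned from the catch clause is purely illustrative. "unspecified" is left out because its result is not predictable.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: $week as in the F+O examples; the second map supplies a duplicate key 6 :)
let $week := map {
  0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch",
  4: "Donnerstag", 5: "Freitag", 6: "Samstag"
}
let $maps := ($week, map { 6: "Sonnabend" })
return (
  (: first occurrence wins: "Samstag" :)
  map:merge($maps, map { "duplicates": "use-first" })(6),
  (: last occurrence wins: "Sonnabend" :)
  map:merge($maps, map { "duplicates": "use-last" })(6),
  (: values are concatenated in input order: ("Samstag", "Sonnabend") :)
  map:merge($maps, map { "duplicates": "combine" })(6),
  (: "reject" raises err:FOJS0003, caught here so the query still returns :)
  try {
    map:merge($maps, map { "duplicates": "reject" })(6)
  } catch err:FOJS0003 {
    "duplicate key rejected"
  }
)
```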

Change the example of map:merge that appears under fn:collation-key.
Comment 3 Michael Kay 2016-07-19 17:27:05 UTC
In addition, in XSLT §21.3, the specification of the xsl:map instruction can be changed to replace

let $keys := $maps!map:keys(.)
return if (count($keys) = count(distinct-values($keys)))
       then map:merge($maps)                        
       else error()

with the simpler formulation

map:merge($maps, map{"duplicates":"reject"})
Comment 4 Abel Braaksma 2016-07-20 13:42:44 UTC
> unspecified     If duplicate keys are present, the effect is 
> implementation-defined; the implementation may choose one of the above 
> strategies for handling duplicates, or may choose some other strategy.
I would like to suggest that "unspecified" should not be allowed to act as "combine" or "reject", because that makes the functional difference too unpredictable. It should act as "use-first", "use-last" or "use-either"; in other words, the only unpredictability of "unspecified" is whether the LH-value or the RH-value of any given key is used.

If we don't do this, a user cannot know whether he gets a combined sequence for a given key, or whether he should add a try/catch.

This then also resolves another use-case: if the user knows that duplicate keys exist, but with the same or interchangeable values, "unspecified" becomes the optimal strategy: the implementer can choose either value and can short-circuit evaluation, but will neither raise an error nor combine.
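
As a rough sketch of that use case (not part of the proposal itself): when every duplicate key carries the same value, "use-first" and "use-last" already produce identical maps, so a strategy that picks either value cannot change the result. The maps and keys below are hypothetical, and the query assumes a processor that supports the "duplicates" option.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical input: the duplicate key "lang" has the same value in both maps :)
let $maps := (
  map { "lang": "de", "a": 1 },
  map { "lang": "de", "b": 2 }
)
return deep-equal(
  map:merge($maps, map { "duplicates": "use-first" }),
  map:merge($maps, map { "duplicates": "use-last" })
)
(: returns true: with interchangeable values, the choice of duplicate is unobservable :)
```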
Comment 5 Abel Braaksma 2016-07-20 13:47:36 UTC
To make that concrete, I'd propose a text like this:

unspecified     If duplicate keys are present, the chosen value is implementation-defined; the implementation may choose the "use-first" or "use-last" strategy for handling duplicates, or may choose some other strategy. The implementation may not combine or raise an error on duplicates if this option is chosen.
Comment 6 Abel Braaksma 2016-07-20 13:51:35 UTC
Sorry, I think this is better (ignore comment 5), and better complements the existing wording:

unspecified      If duplicate keys are present, all but one of a set of duplicates are ignored, where it is implementation-defined which item from the set is chosen.
Comment 7 Michael Kay 2016-07-26 20:12:01 UTC
The proposal in comment 2 was accepted as modified by comment 6, with the additional proviso that the effect of "unspecified" is implementation-dependent rather than implementation-defined.
Comment 8 Michael Kay 2016-07-26 20:28:13 UTC
Test cases have been updated.
Comment 9 Michael Kay 2016-07-26 22:07:41 UTC
The F+O 3.1 and XSLT 3.0 specifications have been updated.