Bugzilla – Bug 20573

Random generator in XPath

Last modified: 2014-09-15 09:26:13 UTC

A random generator is present in almost every programming language. I was surprised that XPath does not have one. For example:

fn:random() as xs:double
    returns a random number from the interval [0,1]

fn:random($l as xs:double, $u as xs:double) as xs:double
    returns a random number from the interval [$l,$u]

The main reason for the omission, I think, is the difficulty of doing it with deterministic functional semantics. See also bug #13494 and bug #13747. The EXSLT library attempts to tackle the requirement with a deterministic function that generates a number that is a pseudo-random function of a supplied seed: http://www.exslt.org/random/functions/random-sequence/index.html
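To illustrate the seed-based approach, here is a minimal sketch in Python, used purely as a modelling language. It is not the EXSLT implementation; the name random_sequence and its parameters are assumptions made for illustration. The point is that the result is a pure function of its arguments, so the same length and seed always yield the same numbers.

```python
import random
from typing import List


def random_sequence(length: int, seed: int) -> List[float]:
    """Return `length` pseudo-random floats in [0, 1), determined solely by `seed`.

    Hypothetical helper for illustration only; not an EXSLT or XPath API.
    """
    rng = random.Random(seed)  # private generator, so no shared hidden state
    return [rng.random() for _ in range(length)]


first = random_sequence(5, seed=42)
second = random_sequence(5, seed=42)   # same arguments, same result
other = random_sequence(5, seed=43)    # different seed, different numbers
```

Because the caller must decide the length up front, this style gives a fixed-length sequence per call.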

Moving this to the 3.1 category as it's too late to be considered for 3.0.

The WG reviewed this on 2014-04-29 and there was sentiment in favour of finding a solution, provided the resulting function was purely deterministic and did not rely on hidden state. The EXSLT proposal was examined; it was recognized that having a fixed-length sequence of random numbers to play with created usability problems. On the other hand a mechanism that only generates a single random number from a seed has the difficulty (or at least danger) of going into a closed loop. Michael Sperberg-McQueen suggested a function random-number-generator(seed) which returns a composite value (array or map) containing (a) the next random number in the sequence, and (b) a function to step this along. We would need to construct some examples to see how usable this is.
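The closed-loop danger can be shown with a toy model. The sketch below is in Python and uses the classic middle-square method, chosen only because it is easy to follow; it is an assumption for illustration and not something the WG discussed. Each output is fed back in as the next seed, and because the output has only four digits the chain must revisit a value eventually, sometimes immediately.

```python
def next_number(previous: int) -> int:
    """Toy step function: square a 4-digit value and keep the middle 4 digits.

    The output doubles as the next seed, which is what makes loops possible.
    """
    squared = str(previous * previous).zfill(8)
    return int(squared[2:6])


def steps_until_repeat(seed: int) -> int:
    """Count the distinct values produced before the chain revisits one."""
    seen = set()
    value = seed
    while value not in seen:
        seen.add(value)
        value = next_number(value)
    return len(seen)


example_step = next_number(1234)       # 1234 * 1234 = 01522756, middle digits 5227
fixed_point = steps_until_repeat(100)  # 100 * 100 = 00010000, middle digits 0100
```

Keeping the generator state separate from the number handed to the caller, as in the composite-value suggestion, avoids tying the cycle length to the precision of the output.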

After considerable discussion in the WG, a proposal has been drafted and included in the F+O 3.1 specification, and was today accepted as status-quo text (with an invitation to WG members to review and comment).

Jakub, protocol dictates that it's your privilege to mark the bug as closed when you are satisfied that your comment has been addressed. So here is the spec of the proposed function for your review:

4.9.1 fn:random-number-generator

Summary

Returns a random number generator, which can be used to generate sequences of random numbers.

Signatures

    fn:random-number-generator() as map(xs:string, item())
    fn:random-number-generator($seed as xs:anyAtomicType) as map(xs:string, item())

Rules

The function returns a random number generator. A random number generator is represented as a map containing three entries. The keys of each entry are strings:

- The entry with key "number" holds a random number; it is an xs:double greater than or equal to zero (0.0e0), and less than one (1.0e0).
- The entry with key "next" is a zero-arity function that can be called to return another random number generator.
- The entry with key "permute" is a function with arity 1 (one), which takes an arbitrary sequence as its argument, and returns a random permutation of that sequence.

Calling the fn:random-number-generator function with no arguments is equivalent to calling the single-argument form of the function with an implementation-dependent seed.

If a $seed is supplied, it may be an atomic value of any type.

Both forms of the function are ·deterministic·: calling the function twice with the same arguments, within a single ·execution scope·, produces the same results.

The value of the number entry should be such that all eligible xs:double values are equally likely to be chosen.

The function returned in the permute entry should be such that all permutations of the supplied sequence are equally likely to be chosen.

The map returned by the random-number-generator function may contain additional entries beyond those specified here, but it must match the type map(xs:string, item()). The meaning of any additional entries is ·implementation-defined·. To avoid conflict with any future version of this specification, the keys of any such entries should start with an underscore character.

Notes

It is not meaningful to ask whether the functions returned in the next and permute functions resulting from two separate calls with the same seed are "the same function", but the functions must be equivalent in the sense that calling them produces the same sequence of random numbers.

The repeatability of the results of function calls in different execution scopes is outside the scope of this specification. It is recommended that when the same seed is provided explicitly, the same random number sequence should be delivered even in different execution scopes; while if no seed is provided, the processor should choose a seed that is likely to be different from one execution scope to another. (The same effect can be achieved explicitly by using fn:current-dateTime() as a seed.)

The specification does not place strong conformance requirements on the actual randomness of the result; this is left to the implementation. It is desirable, for example, when generating a sequence of random numbers that the sequence should not get into a repeating loop; but the specification does not attempt to dictate this.

Examples

The following example returns a random permutation of the integers in the range 1 to 100:

    fn:random-number-generator()?permute(1 to 100)

The following example returns a 10% sample of the items in an input sequence $seq, chosen at random:

    fn:random-number-generator()?permute($seq)[position() = 1 to (count($seq) idiv 10)]

The following code defines a function that can be called to produce a random sequence of xs:double values in the range zero to one, of specified length:

    (: assumes the prefix r is bound to a namespace in the prolog :)
    declare %public function r:random-sequence($length as xs:integer) as xs:double* {
      r:random-sequence($length, fn:random-number-generator())
    };

    declare %private function r:random-sequence($length as xs:integer,
                                                $G as map(xs:string, item())) {
      if ($length eq 0)
      then ()
      else ($G?number, r:random-sequence($length - 1, $G?next()))
    };

    r:random-sequence(200)
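For anyone who wants to experiment with the shape of this design outside an XPath processor, the following is a rough Python model of it. It is a sketch under stated assumptions, not an implementation of the spec: the use of SHA-256 to derive state, the helper names _digest and _make, and the choice to require an explicit seed (the zero-argument form is not modelled) are all mine. It makes no claim about statistical quality, and the number entry only ever takes values that are multiples of 2^-53.

```python
import hashlib
import random
from typing import Any, Dict, Iterable, List


def _digest(label: str, state: bytes) -> bytes:
    """Derive a new 32-byte value from the state for a given purpose (hypothetical helper)."""
    return hashlib.sha256(label.encode() + state).digest()


def _make(state: bytes) -> Dict[str, Any]:
    """Build the generator map for one fixed, immutable state."""
    # Take 53 bits from the digest and scale them into [0, 1).
    number = (int.from_bytes(_digest("number", state)[:8], "big") >> 11) / (1 << 53)

    def next_generator() -> Dict[str, Any]:
        # A new generator with a derived state; the current one is unchanged.
        return _make(_digest("next", state))

    def permute(items: Iterable[Any]) -> List[Any]:
        shuffled = list(items)
        shuffle_seed = int.from_bytes(_digest("permute", state), "big")
        random.Random(shuffle_seed).shuffle(shuffled)
        return shuffled

    return {"number": number, "next": next_generator, "permute": permute}


def random_number_generator(seed: Any) -> Dict[str, Any]:
    """Model of the single-argument form: the result depends only on the seed."""
    return _make(hashlib.sha256(repr(seed).encode()).digest())


def random_sequence(length: int, generator: Dict[str, Any]) -> List[float]:
    """Iterative counterpart of the r:random-sequence example."""
    numbers = []
    for _ in range(length):
        numbers.append(generator["number"])
        generator = generator["next"]()
    return numbers


g1 = random_number_generator(42)
g2 = random_number_generator(42)
sample = g1["permute"](range(100))[: 100 // 10]   # a 10% sample
```

Nothing here mutates hidden state: stepping along means calling "next" and using the map it returns, so two calls with the same seed can be walked independently and stay in agreement.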