This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
This test is as follows:

(every $s in tokenize('33a33', ',') satisfies matches($s, '^(?:(\d*){0,2}a\1)$')) and (every $s in tokenize('33a34', ',') satisfies not(matches($s, '^(?:(\d*){0,2}a\1)$')))

Part of this involves checking that matches('33a33', '^(?:(\d*){0,2}a\1)$') is true. If I understand the spec correctly, (\d*) can be matched 0 to 2 times. \d* matches '33' once, then matches '' (the empty string) on a second pass. The spec states that: "If a sub-expression matches more than one substring (because it is within a construct that allows repetition), then only the last substring that it matched will be captured." Thus, the back reference \1 has the value '', not the (presumably expected) value '33'.

If my understanding is correct, there are related problems in the following tests: re00973, re00974, re00975, re00976.
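To make the ambiguity concrete, here is a minimal Python sketch that enumerates every value \1 could hold after (\d*){0,2} has consumed the digits before the 'a'. It does not call any regex engine. The helper name last_captures is hypothetical, and the sketch assumes the text before the 'a' consists only of digits.

```python
def last_captures(text, min_reps, max_reps):
    """Return every value the group could hold after `text` is split into
    between min_reps and max_reps consecutive (possibly empty) pieces.
    Assumes every character of `text` is something the group may match."""
    results = set()

    def walk(start, reps, last):
        # All of `text` is consumed and enough passes were made: record it.
        if start == len(text) and reps >= min_reps:
            results.add(last)
        if reps == max_reps:
            return
        # Let the next pass consume text[start:end], which may be empty.
        for end in range(start, len(text) + 1):
            walk(end, reps + 1, text[start:end])

    walk(0, 0, None)
    return results


def surviving_captures(subject):
    """Captures for which the trailing back reference would equal the suffix."""
    prefix, suffix = subject.split("a")
    return {c for c in last_captures(prefix, 0, 2) if c == suffix}


candidates = last_captures("33", 0, 2)
positive = surviving_captures("33a33")
negative = surviving_captures("33a34")
```

Under these assumptions the candidates for \1 are '33', '3' and '', and only '33' lets '33a33' match, so the positive half of the test depends on which decomposition the implementation settles on. No candidate lets '33a34' match, so the negative half is unaffected.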
I think I've just checked in some changes to these tests. This was a bit naughty: I was under the impression that they were still under development and had not yet been committed.

I came to the conclusion that the spec here is underdefined: with a construct such as (a*)* that can match the input "aaaa" in various ways, we aren't prescriptive about what the contents of \1 should be. One can argue that the inner loop should be executed four times and the outer loop once, but the spec makes no attempt to mandate that. Your theory that the outer loop is executed twice, matching "aaaa" the first time and "" the second time, is equally defensible. In fact there's nothing in the spec to say that the implementation must terminate...
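As a rough illustration of "various ways", the following Python sketch lists every way the outer loop of (a*)* could cut "aaaa" into non-empty pieces. The function name compositions is hypothetical, and empty iterations are deliberately left out (an assumption of this sketch), since allowing them adds '' as a further candidate for \1 and removes any bound on the number of iterations.

```python
def compositions(text):
    """Yield every way to cut `text` into consecutive non-empty pieces."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        # First outer iteration consumes text[:end]; recurse on the rest.
        for rest in compositions(text[end:]):
            yield [text[:end]] + rest


ways = list(compositions("aaaa"))
# Per the "last substring matched" rule, \1 would be the final piece.
last_pieces = {way[-1] for way in ways}
```

Even with empty iterations excluded there are eight decompositions and four different candidate values for \1 ('a', 'aa', 'aaa', 'aaaa'), which is the sense in which the contents of \1 are not pinned down.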
Test re00976 still remains a problem.

(every $s in tokenize('22a22z', ',') satisfies matches($s, '^(?:(\d*){2,}?a\1z)$')) and (every $s in tokenize('22a22', ',') satisfies not(matches($s, '^(?:(\d*){2,}?a\1z)$')))

Here, (\d*){2,}? causes two passes of matching \d*. The first matches '22', the second matches '', hence the matching fails.

For what it's worth, my interpretation of the specification is that \d* must match the longest possible substring, inferred from the text: "Without the '?', the regular expression matches the longest possible substring."
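The two-pass case can be spelled out with a small Python sketch, assuming exactly two passes of the group (the minimum that {2,}? requires) and no regex engine involved. The name two_pass_splits is hypothetical.

```python
def two_pass_splits(digits):
    """Yield (first, second) for each way two passes can consume `digits`."""
    for cut in range(len(digits) + 1):
        yield digits[:cut], digits[cut:]


subject = "22a22z"
prefix, rest = subject.split("a")
suffix = rest[:-1]  # text the back reference must equal (trailing 'z' dropped)

# Map what the first pass consumed to whether the back reference would then
# match, given that \1 holds what the second (last) pass consumed.
outcomes = {first: second == suffix for first, second in two_pass_splits(prefix)}
```

In this sketch the only split that lets '22a22z' match is the one where the first pass consumes '' and the second consumes '22'. If the first pass is required to take the longest possible substring, '22', then \1 is '' and the match fails, which is the reading described above.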
Needs WG discussion, whether there is a spec issue here.
Spec bug 17160 has been raised to implement the WG decision that we should be a little liberal in this area. I have modified the test case to allow either a match or non-match.