This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
I'm surprised by the expected results for these tests. In each case, it appears that the captured subgroup contains only the last occurrence of the subexpression being matched. For example:

For analyze-string("how now brown cow", "(.*?ow\s+)+", ""), captured subgroup 1 = "now ", and not "how now ".
For analyze-string("banana", "(?:b(an)*a)"), captured subgroup 1 = "an", and not "anan".

Is this correct?
I believe the results are correct given the way the spec is currently written.

Under fn:analyze-string, it states: "For each capturing subexpression there will be at most one corresponding fn:group element in each fn:match element in the result."

And section 5.6.1, in defining capturing groups, says: "If a sub-expression matches more than one substring (because it is within a construct that allows repetition), then only the last substring that it matched will be captured."

It would be nice to mark up all instances of the captured group, but it could cause some compatibility problems for replace(), and I suspect it could be difficult to implement with some off-the-shelf regex libraries.
Agreed.
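For anyone wanting to see the "last substring wins" rule from 5.6.1 outside an XPath processor, here is a minimal sketch using Python's re module. This is only a stand-in: Python's regex dialect is not the XPath one, and the helper names last_capture and all_repetitions are made up for illustration. For these two patterns, though, it shows the same behaviour that the test results expect.

```python
import re


def last_capture(pattern, text):
    """Return (whole match, group 1) for the first match of pattern in text.

    Hypothetical helper: group 1 holds only the final iteration of a
    repeated capturing group, mirroring the rule quoted from 5.6.1.
    """
    m = re.search(pattern, text)
    if m is None:
        raise ValueError("pattern did not match")
    return m.group(0), m.group(1)


def all_repetitions(inner_pattern, text):
    """Collect every occurrence by matching the inner subexpression alone.

    Hypothetical workaround: apply the unrepeated subexpression repeatedly
    instead of relying on a capturing group inside a repetition.
    """
    return re.findall(inner_pattern, text)


# The whole match spans both repetitions, but group 1 keeps only the last one.
print(last_capture(r"(.*?ow\s+)+", "how now brown cow"))

# (an)* matches "an" twice; group 1 keeps only the final "an".
print(last_capture(r"(?:b(an)*a)", "banana"))

# Matching the inner subexpression on its own recovers each occurrence.
print(all_repetitions(r".*?ow\s+", "how now brown cow"))
```

The last call hints at why marking up every instance is a different operation from ordinary group capture: a typical regex engine keeps one slot per capturing group and overwrites it on each iteration, so recovering all occurrences means matching the inner subexpression separately.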