This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 14994 - [QT3] analyzeString-012, analyzeString-018
Summary: [QT3] analyzeString-012, analyzeString-018
Status: CLOSED INVALID
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3 & XPath 3 Test Suite
Version: Member-only Editors Drafts
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Benjamin Nguyen
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2011-11-29 12:36 UTC by Tim Mills
Modified: 2012-01-06 12:46 UTC (History)

Description Tim Mills 2011-11-29 12:36:11 UTC
I'm surprised by the expected results for these tests.

In each case it appears that the captured subgroup only contains the last occurrence of the subexpression being matched.

For example, for

analyze-string("how now brown cow", "(.*?ow\s+)+", "")

captured subgroup 1 = "now ", and not "how now ".

For

analyze-string("banana", "(?:b(an)*a)")

captured subgroup 1 = "an", and not "anan".

Is this correct?
Comment 1 Michael Kay 2011-11-29 18:13:59 UTC
I believe the results are correct given the way the spec is currently written. Under fn:analyze-string, it states:

For each capturing subexpression there will be at most one corresponding fn:group element in each fn:match element in the result.

and 5.6.1 in defining capturing groups says:

If a sub-expression matches more than one substring (because it is within a construct that allows repetition), then only the last substring that it matched will be captured.

It would be nice to mark up all instances of the captured group, but it could cause some compatibility problems for replace(), and I suspect it could be difficult to implement with some off-the-shelf regex libraries.
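
The "last substring wins" rule quoted above can be observed in a typical off-the-shelf regex library. The sketch below uses Python's re module as a stand-in; it is not the XPath regex dialect and not fn:analyze-string itself, and the helper name last_capture is hypothetical. It only illustrates that a capturing group inside a repetition retains its final iteration.

```python
import re

def last_capture(pattern, text, group=1):
    """Return (whole match, captured group) for the first match, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.group(group)

# Both patterns are taken from the bug report. In each case the capturing
# group sits inside a repetition, so only its last iteration is retained.
how_now = last_capture(r"(.*?ow\s+)+", "how now brown cow")
banana = last_capture(r"(?:b(an)*a)", "banana")

print(how_now)  # ('how now ', 'now ')
print(banana)   # ('banana', 'an')
```

In both cases the whole match spans every iteration, while group 1 holds only the final one, which agrees with the expected results of the two tests under discussion.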
Comment 2 Tim Mills 2012-01-06 12:46:19 UTC
Agreed.