This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 17160 - [FO3.0] Captured groups in regular expressions.
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.0
Version: Last Call drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2012-05-23 13:41 UTC by Michael Kay
Modified: 2012-06-13 08:14 UTC

Description Michael Kay 2012-05-23 13:41:37 UTC
ACTION A-510-05 on Michael Kay to raise a bug against the regular expression specification following up on test bug 15545; the proposal should be to permit some implementation-dependent variation in the matching of groups within a regex (for example by a back-reference) when the regex is technically ambiguous.

First, some editorial housekeeping. Delete this sentence: "The fn:replace function described below allows access to the parts of the input string that matched a sub-expression (called captured substrings)." and replace it with: "Some operations associated with regular expressions (for example, back-references, and the fn:replace function) allow access to the parts of the input string that matched a sub-expression (called captured substrings)." 

Then replace this sentence "If a sub-expression matches more than one substring (because it is within a construct that allows repetition), then only the last substring that it matched will be captured." with a new paragraph:

When parentheses are used in a part of the regular expression that is matched more than once (because it is within a construct that allows repetition), then only the last substring that it matched will be captured. Note that this rule is not sufficient in all cases to ensure an unambiguous result, especially in cases where (a) the regular expression contains nested repeating constructs, and/or (b) the repeating construct matches a zero-length string. In such cases it is implementation-dependent which substring is captured. For example, given the regular expression (a*)+ and the input string "aaaa", an implementation might legitimately capture either "aaaa" or a zero-length string as the content of the captured subgroup.
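
The two cases in the proposed paragraph can be tried out with a small sketch. It uses Python's re module purely as a stand-in regex engine (an assumption made for illustration; Python is not an implementation of the F&O regular expression dialect), and the pattern (ab|cd)+ with input "abcd" is a hypothetical example that does not come from the proposal. The sketch deliberately does not state which substring is captured for (a*)+, since that is exactly the point on which engines may differ.

```python
import re

# Unambiguous case (hypothetical pattern, not from the proposal):
# the group takes part in two iterations, first matching "ab" and then "cd".
# Only the last substring it matched is captured.
last = re.fullmatch(r"(ab|cd)+", "abcd").group(1)
print(repr(last))  # 'cd'

# Ambiguous case from the proposal: nested repetition where the inner
# construct can also match a zero-length string.
m = re.fullmatch(r"(a*)+", "aaaa")
whole, captured = m.group(0), m.group(1)

# The overall match is always the full input string. The captured group is
# engine-dependent: either "aaaa" or "" is a legitimate result under the
# proposed wording.
print(repr(whole), repr(captured))
```

Running the same two patterns through a different regex engine is a quick way to see whether it captures "aaaa" or the zero-length string in the second case.
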
Comment 1 Michael Kay 2012-06-12 15:28:28 UTC
Proposal accepted