This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
ACTION A-510-05 on Michael Kay to raise a bug against the regular expression specification following up on test bug 15545; the proposal should be to permit some implementation-dependent variation in the matching of groups within a regex (for example by a back-reference) when the regex is technically ambiguous.

First, some editorial housekeeping. Delete this sentence:

"The fn:replace function described below allows access to the parts of the input string that matched a sub-expression (called captured substrings)."

and replace it with:

"Some operations associated with regular expressions (for example, back-references, and the fn:replace function) allow access to the parts of the input string that matched a sub-expression (called captured substrings)."

Then replace this sentence:

"If a sub-expression matches more than one substring (because it is within a construct that allows repetition), then only the last substring that it matched will be captured."

with a new paragraph:

When parentheses are used in a part of the regular expression that is matched more than once (because it is within a construct that allows repetition), then only the last substring that it matched will be captured. Note that this rule is not sufficient in all cases to ensure an unambiguous result, especially in cases where (a) the regular expression contains nested repeating constructs, and/or (b) the repeating construct matches a zero-length string. In such cases it is implementation-dependent which substring is captured. For example given the regular expression (a*)+ and the input string "aaaa", an implementation might legitimately capture either "aaaa" or a zero length string as the content of the captured subgroup.
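The (a*)+ example can be tried against any regex engine to see which of the two captures it chooses. The following is a minimal sketch using Python's `re` module, which is not an XPath regex implementation and is used here only as a stand-in engine; the helper name `last_capture` is hypothetical. It deliberately does not assume which of the two permitted results comes back.

```python
import re
from typing import Optional, Tuple


def last_capture(pattern: str, text: str) -> Tuple[str, Optional[str]]:
    """Return (whole match, content of group 1) for a match at the start of text.

    Hypothetical helper for illustration; Python's `re` stands in for
    an XPath regex engine, whose behaviour may differ.
    """
    m = re.match(pattern, text)
    if m is None:
        raise ValueError("pattern did not match")
    return m.group(0), m.group(1)


whole, group = last_capture(r"(a*)+", "aaaa")

# The overall match is unambiguous: the greedy pattern consumes all of "aaaa".
print("whole match:", repr(whole))

# The captured group is the ambiguous part: an engine may report "aaaa"
# (the last non-empty iteration) or "" (a final zero-length iteration).
print("group 1:", repr(group))
```

Running the same pattern and input through several engines is one way to observe the variation that the proposed wording allows.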
Proposal accepted