This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 21425 - possible regex error in test cases from "fn-matches.re"
Summary: possible regex error in test cases from "fn-matches.re"
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3 & XPath 3 Test Suite
Version: Working drafts
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: O'Neil Delpratt
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-28 16:40 UTC by Sorin Nasoi
Modified: 2013-06-13 10:27 UTC
CC List: 3 users

See Also:


Attachments

Description Sorin Nasoi 2013-03-28 16:40:32 UTC
The test case "re00056" is:

    (every $s in tokenize('', ',')
      satisfies matches($s, '^(?:[^a-d-b-c])$'))
    and
    (every $s in tokenize('a-b,c-c,ab,cc', ',')
      satisfies not(matches($s, '^(?:[^a-d-b-c])$')))
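The shape of this test expression can be mirrored in Python to see what each half checks. This is only a sketch: `xpath_tokenize` is a hypothetical stand-in for `fn:tokenize` with a literal one-character separator, and Python's `re` module stands in for the XPath regex dialect, which differs from it in general. Note that `tokenize('', ',')` returns the empty sequence, so the first `every` clause is vacuously true.

```python
import re
from typing import List

PATTERN = r"^(?:[^a-d-b-c])$"


def xpath_tokenize(text: str, sep: str) -> List[str]:
    """Hypothetical stand-in for fn:tokenize with a literal one-char separator.

    fn:tokenize returns the empty sequence for a zero-length input,
    whereas str.split would return [''].
    """
    return [] if text == "" else text.split(sep)


def re00056() -> bool:
    # "every $s in (...) satisfies ..." maps to all(...), which is
    # vacuously true over an empty sequence.
    should_match = all(re.search(PATTERN, s) for s in xpath_tokenize("", ","))
    # Every one of these strings must be rejected by the pattern.
    should_not_match = all(
        not re.search(PATTERN, s) for s in xpath_tokenize("a-b,c-c,ab,cc", ",")
    )
    return should_match and should_not_match


print(re00056())
```

In this sketch every rejected string is two or three characters long, so a pattern that matches exactly one character rejects them however the hyphens inside the class are read.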

The regular expression [^a-d-b-c] seems wrong. The "a-d" means "'a' through 'd'", i.e., "abcd", and the "b-c" means "'b' through 'c'", i.e., "bc". However, the '-' between the 'd' and the 'b' makes no sense. It can't mean "'d' through 'b'", since 'b' is less than 'd', nor can it mean "a-d without 'b' and without 'c'", i.e., range subtraction per <http://www.w3.org/TR/xmlschema-2/#nt-charClassSub>, since subtraction requires a bracketed class after the hyphen, as in [a-d-[bc]].
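The two readings ruled out above can be made concrete with plain Python sets. This is a minimal sketch assuming single characters without escapes; `char_range` and `is_valid_range` are hypothetical helpers, not part of any regex library. Genuine XSD subtraction, written `[a-d-[bc]]`, corresponds to a set difference:

```python
from typing import Set


def is_valid_range(lo: str, hi: str) -> bool:
    # A range s-e only makes sense when s does not come after e.
    return ord(lo) <= ord(hi)


def char_range(lo: str, hi: str) -> Set[str]:
    """Expand a range such as a-d into the set of characters it covers."""
    if not is_valid_range(lo, hi):
        raise ValueError(f"reversed range {lo}-{hi}")
    return {chr(cp) for cp in range(ord(lo), ord(hi) + 1)}


a_to_d = char_range("a", "d")  # covers a, b, c, d
b_to_c = char_range("b", "c")  # covers b, c

# [a-d-[bc]]: 'a' through 'd' with 'b' and 'c' removed, leaving a and d.
subtraction = a_to_d - b_to_c

# "d-b" cannot be a range, because 'b' precedes 'd'.
print(is_valid_range("d", "b"))
```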

Similarly, the test case "re00086" is:

    (every $s in tokenize(',a-1x-7,c-4z-9,a-1z-8a-1z-9,a1z-9,a-1z8,a-1,z-9', ',')
      satisfies matches($s, '^(?:[a-c-1-4x-z-7-9]*)$'))
    and
    (every $s in tokenize('', ',')
      satisfies not(matches($s, '^(?:[a-c-1-4x-z-7-9]*)$')))

The regular expression [a-c-1-4x-z-7-9] seems wrong for the same reason.
Comment 1 Michael Kay 2013-03-28 17:21:46 UTC
The meaning of hyphens in regular expressions is underspecified in XSD 1.0, but is clarified in XSD 1.1. My advice would be that XPath implementations should follow what XSD 1.1 says, though this is not mandatory. I believe that the tests are consistent with this assumption.

For example, I think that [a-c-1-4x-z-7-9] means [a-c]|-|[1-4]|[x-z]|-|[7-9].
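The decomposition above can be reproduced with a greedy left-to-right reading: take `x-y` as a range whenever a character is followed by a hyphen and another character, and otherwise take the character (including a leftover `-`) literally. This is a sketch under that assumption, not a transcription of the XSD 1.1 grammar; `parse_group` and `matches_star` are hypothetical names, and escapes and subtraction are not handled.

```python
from typing import Set, Tuple


def parse_group(body: str) -> Tuple[bool, Set[str]]:
    """Read the inside of [...] as (negated, set of member characters)."""
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    members: Set[str] = set()
    i = 0
    while i < len(body):
        # Range: singleChar '-' singleChar, with a character after the hyphen.
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = body[i], body[i + 2]
            if ord(lo) > ord(hi):
                raise ValueError(f"reversed range {lo}-{hi}")
            members.update(chr(cp) for cp in range(ord(lo), ord(hi) + 1))
            i += 3
        else:
            # Anything else, including a hyphen right after a range, is literal.
            members.add(body[i])
            i += 1
    return negated, members


def matches_star(body: str, s: str) -> bool:
    """Whole-string match of ^(?:[body]*)$ under the greedy reading."""
    negated, members = parse_group(body)
    return all((ch in members) != negated for ch in s)


# [a-c-1-4x-z-7-9] -> [a-c] | '-' | [1-4] | [x-z] | '-' | [7-9]
print(sorted(parse_group("a-c-1-4x-z-7-9")[1]))

# Strings re00086 expects to match (fn:tokenize keeps the leading '').
tokens = ",a-1x-7,c-4z-9,a-1z-8a-1z-9,a1z-9,a-1z8,a-1,z-9".split(",")
print(all(matches_star("a-c-1-4x-z-7-9", t) for t in tokens))
```

Under the same reading, `[^a-d-b-c]` is the negation of the set {a, b, c, d, -}.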
Comment 2 Paul J. Lucas 2013-03-28 17:50:25 UTC
But according to XML Schema Part 2, section F.1 under "Character Range":

• The - character is a valid character range only at the beginning or end of a ·positive character group·.

Hence the two hyphens in the middle that you break out in your "means" would seem to contradict that.
Comment 3 Michael Kay 2013-03-28 18:11:58 UTC
Yes, XSD 1.0 is a mess in this area. It also says (two bullets earlier) that "-" is not a valid character range, which contradicts the sentence you cite. If you try and implement what XSD 1.0 says regarding hyphens, you tie yourself in knots, which is why I suggest that following the XSD 1.1 spec is the wisest course.

But you're probably right that these test results should be marked as being dependent on XSD 1.1 support. I wouldn't like to say what the correct result is for a processor following the 1.0 rules.
Comment 4 Michael Kay 2013-05-01 13:40:44 UTC
Looking at this again, although XSD 1.0 is confused about hyphens in regexes, I think it is clear that a-b-c-d is not allowed. (The rules are contradictory, but not about this particular case).

So I'm going to split these two tests into XSD 1.0 and XSD 1.1 versions.
Comment 5 Tim Mills 2013-06-13 09:59:10 UTC
If regular expressions of the form 

    [a-d-b-c]

are now considered to be errors, I think there are two other tests which are possibly incorrect.
	
re00102: regex contains [a-a-x-x]
K2-MatchesFunc-16: regex contains [0-9-.]
Comment 6 Michael Kay 2013-06-13 10:27:04 UTC
re00102 - agree, I have split it into two versions.

K2-MatchesFunc-16 - the problem was previously raised in bug #4466 and the test already allows alternative results. But the correct error is surely FORX0002 rather than FORG0001. I've split it into two versions.
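The XSD 1.0 versions of these tests expect an error instead. A minimal sketch of such a check, assuming only the rule quoted in Comment 2 (a hyphen that is not part of a range may appear only at the beginning or end of the positive group) and ignoring the contradictory bullet mentioned in Comment 3. `RegexError`, `check_xsd10_hyphens` and `xsd10_error_code` are hypothetical names; FORX0002 is the error code named in Comment 6.

```python
from typing import Optional


class RegexError(Exception):
    """Hypothetical error carrying an XPath error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def check_xsd10_hyphens(body: str) -> None:
    """Raise RegexError if a literal '-' sits in the middle of the group."""
    # In [^...], the positive group is whatever follows the '^'.
    group = body[1:] if body.startswith("^") else body
    i = 0
    while i < len(group):
        if i + 2 < len(group) and group[i + 1] == "-":
            i += 3  # a range such as a-d
        elif group[i] == "-" and 0 < i < len(group) - 1:
            raise RegexError("FORX0002", f"hyphen at position {i} in [{body}]")
        else:
            i += 1


def xsd10_error_code(body: str) -> Optional[str]:
    """Return the error code raised by the check, or None if accepted."""
    try:
        check_xsd10_hyphens(body)
    except RegexError as err:
        return err.code
    return None


# Groups from re00056, re00086, re00102 and K2-MatchesFunc-16.
for body in ["^a-d-b-c", "a-c-1-4x-z-7-9", "a-a-x-x", "0-9-."]:
    print(f"[{body}]", xsd10_error_code(body))
```

In this sketch all four groups are rejected, while a leading or trailing hyphen (as in `[-a-d]` or `[a-d-]`) is accepted, which fits the decision to give each affected test separate XSD 1.0 and XSD 1.1 versions.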