This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 20575 - [QT3TS] test-case re00216 in test-set fn-matches.re
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3 & XPath 3 Test Suite
Version: Last Call drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2013-01-05 22:55 UTC by Michael Kay
Modified: 2013-05-07 17:12 UTC

Description Michael Kay 2013-01-05 22:55:36 UTC
This test does

matches('qwerty','\p{IsaA0-a9}')

and expects an error on the grounds that the regular expression is invalid, since "IsaA0-a9" is not a recognized block name.

The specification states that an error is raised "if the value of $pattern is invalid according to the rules described in 5.6.1 Regular expression syntax", and section 5.6.1 says "The regular expression syntax and semantics are identical to those defined in [XML Schema Part 2: Datatypes Second Edition] with the following [irrelevant] additions..." The reference is to XSD 1.0, but we also state "Implementations of this specification may support either XSD 1.0 or XSD 1.1 or both."

The relevant syntax rule in both XSD 1.0 and XSD 1.1 is:

	IsBlock	   ::=   	'Is' [a-zA-Z0-9#x2D]+

Thus this regular expression matches the syntax.
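
As a rough illustration of the syntactic point only, the IsBlock production can be transcribed into a Python regular expression and applied to the name used in the test. This is a sketch: the names IS_BLOCK and matches_is_block are made up for the example, and it checks the grammar alone, not whether the block name is known to any version of Unicode.

```python
import re

# Transcription of the XSD production:
#   IsBlock ::= 'Is' [a-zA-Z0-9#x2D]+
# where #x2D is the hyphen-minus character.
IS_BLOCK = re.compile(r"Is[a-zA-Z0-9\-]+")


def matches_is_block(name: str) -> bool:
    # True if `name` matches the IsBlock production syntactically.
    # (Hypothetical helper; it says nothing about whether the block exists.)
    return IS_BLOCK.fullmatch(name) is not None


print(matches_is_block("IsaA0-a9"))      # True: the name used in the test case
print(matches_is_block("IsBasicLatin"))  # True: a genuine block name
print(matches_is_block("Is"))            # False: at least one character must follow 'Is'
```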

In XSD 1.0, no semantics are given for a regular expression that uses an unknown block name, but it is nowhere stated that this is an error.

The situation is clarified in XSD 1.1:

<quote>
If a string "IsX" matches the non-terminal IsBlock but X is not a recognized block name, then the expressions "\p{IsX}" and "\P{IsX}" each denote the set of all characters. Processors may ·at user option· treat both "\p{IsX}" and "\P{IsX}" as denoting the empty set, instead of the set of all characters....

Processors should issue a warning if they encounter a regular expression using a block name they do not recognize. Processors may ·at user option· treat unrecognized block names as ·errors· in the schema.

Note: Treating unrecognized block names as errors increases the likelihood that errors in spelling the block name will be detected and can be helpful in checking the correctness of schema documents. However, it also decreases the portability of schema documents among processors supporting different versions of [Unicode Database]; it is for this reason that processors are allowed to treat unrecognized block names as errors only when the user has explicitly requested this behavior.
</quote>
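
The three behaviours that the quoted XSD 1.1 text permits can be sketched as follows. This is only an illustration of the rule, not a regular expression engine: UnknownBlockPolicy, KNOWN_BLOCKS, block_escape_matches and matches_any are hypothetical names, and the block table is a deliberately tiny subset.

```python
from enum import Enum


class UnknownBlockPolicy(Enum):
    ALL_CHARACTERS = "all"  # XSD 1.1 default for an unrecognized block name
    EMPTY_SET = "empty"     # permitted at user option
    ERROR = "error"         # permitted at user option


# Illustrative subset of block names and their code point ranges.
KNOWN_BLOCKS = {
    "BasicLatin": (0x0000, 0x007F),
    "Latin-1Supplement": (0x0080, 0x00FF),
}


def block_escape_matches(ch, block, policy, negated=False):
    # Does `ch` match \p{IsX} (or \P{IsX} when negated=True), where X is `block`?
    if block in KNOWN_BLOCKS:
        low, high = KNOWN_BLOCKS[block]
        return (low <= ord(ch) <= high) != negated
    # Unrecognized block: \p{IsX} and \P{IsX} are treated alike.
    if policy is UnknownBlockPolicy.ALL_CHARACTERS:
        return True
    if policy is UnknownBlockPolicy.EMPTY_SET:
        return False
    raise ValueError("unrecognized block name: Is" + block)


def matches_any(text, block, policy):
    # Rough analogue of matches(text, '\p{IsX}'): true if any character matches.
    return any(block_escape_matches(ch, block, policy) for ch in text)


print(matches_any("qwerty", "aA0-a9", UnknownBlockPolicy.ALL_CHARACTERS))  # True
print(matches_any("qwerty", "aA0-a9", UnknownBlockPolicy.EMPTY_SET))       # False
```

Under the ERROR policy the same call raises ValueError instead of returning a result.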

We clearly have the opportunity to say something different for XPath regular expressions, but currently we do not do so. I think a clarification in the spec would be appropriate. In the meantime, based on the XSD 1.1 rules which we inherit, I propose to allow the alternative result "false".
Comment 1 Michael Kay 2013-01-05 22:57:52 UTC
Correction: on the basis that XSD 1.1 states that an unknown block name should be treated as matching all characters, the expected result is TRUE.
Comment 2 O'Neil Delpratt 2013-03-20 11:29:53 UTC
Marking this bug for WG discussion
Comment 3 Michael Kay 2013-03-20 12:17:15 UTC
My recommendation would be to state that for XPath regular expressions:

"If a Unicode Block name appears (in the construct \p{IsBlockName} or \P{IsBlockName}) that is not defined in any version of Unicode recognized by the processor, the regular expression is invalid". 

(Note: XSD 1.1 recommends treating an unrecognized block name as matching any character, but allows other semantics at user option. XSD 1.0 is open to interpretation on the question. Both the names of Unicode blocks and the characters assigned to them have changed from one version of Unicode to another; processors are free to choose which version(s) of Unicode block names are recognized.)
Comment 4 Michael Kay 2013-05-07 17:12:27 UTC
The Working Group accepted the proposal that use of unrecognized block names in a regular expression should be an error.

The result of this test case is therefore an error.

The test is being reclassified as a spec bug.