This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 29354 - Unreserved characters in regular expression matching tests
Summary: Unreserved characters in regular expression matching tests
Status: RESOLVED INVALID
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3 & XPath 3 Test Suite
Version: Working drafts
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: O'Neil Delpratt
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2016-01-01 13:45 UTC by Benito van der Zander
Modified: 2016-01-05 17:40 UTC

Description Benito van der Zander 2016-01-01 13:45:32 UTC
The regular expression syntax test cases in matches.re.xml test whether \p{block} matches every character in that Unicode block, including unassigned code points, e.g. ԰ (U+0530) for Armenian or ޿ (U+07BF) for Thaana.
It does not make much sense to test for characters that do not actually exist.
Comment 1 Michael Kay 2016-01-01 22:47:46 UTC
The fact that a test "does not make much sense" seems irrelevant to me. The question is, what does the spec say that an implementation should do with a given query? 

As far as I can see, the codepoints 1328 (U+0530) and 1983 (U+07BF) are legal codepoints in XML, and they are defined in the Unicode database (specifically, the Blocks.txt file) to be within the range of a particular block, and should therefore match \p{IsBlock} for that block.
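
The distinction drawn here, between belonging to a block and being an assigned character, can be sketched in Python. This is only an illustration under stated assumptions: the two ranges are the Armenian and Thaana entries from Blocks.txt (0530..058F and 0780..07BF), the names BLOCKS and in_block are hypothetical, and because Python's re module does not support \p{Is...} block escapes, membership is checked by range rather than by regular expression.

```python
import unicodedata

# Block ranges as listed in the Unicode Blocks.txt file (only two blocks shown).
# BLOCKS and in_block are hypothetical names used for illustration.
BLOCKS = {
    "Armenian": (0x0530, 0x058F),
    "Thaana": (0x0780, 0x07BF),
}


def in_block(codepoint: int, block: str) -> bool:
    """Return True if codepoint lies in the block's range, assigned or not."""
    start, end = BLOCKS[block]
    return start <= codepoint <= end


for cp, block in [(1328, "Armenian"), (1983, "Thaana")]:
    # The general category reports assignment status ("Cn" means unassigned)
    # for the Unicode version bundled with the running Python build; it is
    # independent of the range check above.
    category = unicodedata.category(chr(cp))
    print(f"U+{cp:04X} in {block}: {in_block(cp, block)}, category {category}")
```

The range check depends only on the block boundaries, so its answer does not change if a later Unicode version assigns a character to the code point; only the reported category would change.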
Comment 2 O'Neil Delpratt 2016-01-05 17:40:24 UTC
The WG decided that the test is valid:

The key rationale is to decouple our spec from Unicode: using an unassigned character may be (perhaps is) a violation of the agreement between data source and data sink, if they have agreed to use a particular version of Unicode.
But it is not (and should not be) a violation of our specs.
Unicode may assign the character next month or next year, and it should be possible to use the character then, in a conforming implementation.
That means it must be possible *now* to use it in a conforming implementation.