This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 27845 - msData\regex\reU6 should be valid
Summary: msData\regex\reU6 should be valid
Status: NEW
Alias: None
Product: XML Schema Test Suite
Classification: Unclassified
Component: Microsoft tests
Version: 2006-11-06
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema Test Suite mailing list
Reported: 2015-01-16 16:16 UTC by Georgiy Rakov
Modified: 2015-06-15 21:14 UTC

Description Georgiy Rakov 2015-01-16 16:16:52 UTC
This test is marked as invalid, which requires that the Unicode character U+023F not match the \w regex pattern. It should match, however: according to [1], U+023F belongs to the Ll general category (not P, Z or C), and the XSD standard [2] states:

    \w [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)

So the expected result for this test should be changed to valid.
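
As a rough illustration of the \w definition quoted above, here is a minimal Python sketch. It is not part of the test suite; the function name is_xsd_word_char is hypothetical, and the result depends on whichever Unicode database version the local Python build ships (reported by unicodedata.unidata_version).

```python
import unicodedata

def is_xsd_word_char(ch: str) -> bool:
    # XSD \w: every character except those whose general category
    # starts with P (punctuation), Z (separator) or C (other).
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")

ch = "\u023f"  # the character exercised by reU6
print("Unicode database version:", unicodedata.unidata_version)
print("General category:", unicodedata.category(ch))
print("Matches \\w:", is_xsd_word_char(ch))
```

Note that under this definition the underscore (category Pc) is not a word character, unlike in many other regex dialects.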

Could you please tell me whether you agree?

[1] http://www.unicode.org/Public/6.2.0/ucd/UnicodeData.txt
[2] http://www.w3.org/TR/xmlschema-2/
Comment 1 Michael Kay 2015-01-16 16:56:26 UTC
The metadata I have in my copy of this test reads:

<instanceTest name="reU6.i">
         <instanceDocument xlink:href="../msData/regex/reU6.xml"/>
         <expected validity="invalid" version="Unicode_4.0.0"/>
         <expected validity="valid" version="Unicode_6.0.0"/>
         <current date="2011-10-24" status="accepted" bugzilla="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13607"/>
         <prior date="2006-07-16" status="accepted"/>
      </instanceTest>

Are you seeing the same? This shows that the test was modified as a result of bug #13607 to give alternative results for Unicode 4 and Unicode 6.
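
To show how a test driver might pick the expected outcome by Unicode version from metadata like the above, here is a minimal Python sketch. It makes several assumptions: the fragment is treated as un-namespaced, exactly as quoted; an xmlns:xlink declaration has been added so the fragment parses on its own; and the function name expected_validity is hypothetical, not something defined by the test suite.

```python
import xml.etree.ElementTree as ET

# The quoted fragment, with an xlink namespace declaration added
# (an assumption) so that it is well-formed as a standalone document.
SNIPPET = """<instanceTest name="reU6.i" xmlns:xlink="http://www.w3.org/1999/xlink">
  <instanceDocument xlink:href="../msData/regex/reU6.xml"/>
  <expected validity="invalid" version="Unicode_4.0.0"/>
  <expected validity="valid" version="Unicode_6.0.0"/>
</instanceTest>"""

def expected_validity(test_xml: str, version: str):
    """Return the validity listed for the given version token, or None."""
    root = ET.fromstring(test_xml)
    for expected in root.iter("expected"):
        if expected.get("version") == version:
            return expected.get("validity")
    return None

print(expected_validity(SNIPPET, "Unicode_4.0.0"))
print(expected_validity(SNIPPET, "Unicode_6.0.0"))
```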
Comment 2 Georgiy Rakov 2015-01-21 12:11:48 UTC
(In reply to Michael Kay from comment #1)
> The metadata I have in my copy of this test reads:
> 
> <instanceTest name="reU6.i">
>          <instanceDocument xlink:href="../msData/regex/reU6.xml"/>
>          <expected validity="invalid" version="Unicode_4.0.0"/>
>          <expected validity="valid" version="Unicode_6.0.0"/>
>          <current date="2011-10-24" status="accepted"
> bugzilla="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13607"/>
>          <prior date="2006-07-16" status="accepted"/>
>       </instanceTest>
> 
> are you seeing the same? This shows that the test was modified as a result
> of bug #13607 to give alternative results for Unicode 4 and Unicode 6.

Thank you very much.
Yes, I can see this in CVS:
http://dev.w3.org/cvsweb/XML/xml-schema-test-suite/2004-01-14/xmlschema2006-11-06/msMeta/Regex_w3c.xml?rev=1.22;content-type=text%2Fplain
Comment 3 Georgiy Rakov 2015-01-21 12:35:59 UTC
Could you please tell me whether I understand the Target Milestone field of bug 13607 correctly? Does it mean that the bug was supposed to be fixed only for the XSD 1.1 test suite, and not for the XSD 1.0 test suite?
Comment 4 Michael Kay 2015-01-21 14:44:48 UTC
It might have meant something at the time; perhaps we were classifying test bugs according to whether they affected 1.0 or only 1.1. But there is no active maintenance plan for the test suite at the moment, so it's unlikely that bugs will be fixed (unless, of course, Oracle wants to provide some resources?).
Comment 5 Georgiy Rakov 2015-01-22 15:01:09 UTC
My understanding was that the bug had already been fixed, at least for reU6, because CVS contains the fixed metadata file you pointed to. Is this understanding incorrect?

In any case, my actual question is whether the decision made in bug 13607 is appropriate for conformance testing of an XML Schema 1.0 implementation.
Comment 6 Michael Kay 2015-01-22 16:35:59 UTC
That's a tough question to answer because you have to make a decision about how to interpret the fact that XSD 1.0 references Unicode 3.1. I think most WG members would say, go with a later version of Unicode if you want, but spec lawyers might not agree. W3C of course does not award conformance certificates.
Comment 7 Georgiy Rakov 2015-06-11 16:39:13 UTC
(In reply to Michael Kay from comment #6)
> That's a tough question to answer because you have to make a decision about
> how to interpret the fact that XSD 1.0 references Unicode 3.1. I think most
> WG members would say, go with a later version of Unicode if you want, but
> spec lawyers might not agree. W3C of course does not award conformance
> certificates.

Hello,

[1] contains the following note:

Note:  [Unicode Database] is subject to future revision. For example, the mapping from code points to character properties might be updated. All ·minimally conforming· processors ·must· support the character properties defined in the version of [Unicode Database] that is current at the time this specification became a W3C Recommendation. However, implementors are encouraged to support the character properties defined in any future version. 

That is, it states that implementors of conformant processors are encouraged to support the character properties defined in any future version of Unicode; before that, it states that Unicode may be revised by changing the mapping from code points to character properties.

I wonder whether this implies that a new Unicode version can redefine a character property that existed in the previous version by updating the mapping from code points to that property, and that implementors of conformant processors are allowed (even encouraged) to support this.

If so, I believe the decision made in bug 13607 can legitimately be applied to XML Schema 1.0, not just to XML Schema 1.1, because this is exactly what happened to the characters involved in the tests discussed in this bug.

Could you please tell me whether you agree with the reasoning above and with the conclusion that the decision made in bug 13607 can be applied to XML Schema 1.0?

[1] http://www.w3.org/TR/xmlschema-2/

Thank you,
Georgiy.
Comment 8 Michael Kay 2015-06-15 21:14:26 UTC
I can only tell you what I think I would do as an implementor, and that opinion carries no more weight than your own. I would implement the most recent version of Unicode, and interpret the spec as relating to that version, with compatibility flags or options to do something different if I thought my customer base needed them.
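
One way such a compatibility option could look is sketched below in Python. This is only an illustration under stated assumptions, not how any particular processor works: the names DATABASES and is_word_char are hypothetical, and Python's bundled Unicode 3.2.0 database (unicodedata.ucd_3_2_0) stands in for the Unicode 3.1 referenced by XSD 1.0, since the standard library ships no 3.1 data.

```python
import unicodedata

# Selectable Unicode databases: the latest one bundled with Python,
# and the legacy 3.2.0 snapshot as a stand-in for "the version
# current when the spec was published" (assumption, see lead-in).
DATABASES = {
    "latest": unicodedata,
    "legacy-3.2.0": unicodedata.ucd_3_2_0,
}

def is_word_char(ch: str, unicode_db: str = "latest") -> bool:
    """XSD \\w test against the chosen Unicode database."""
    db = DATABASES[unicode_db]
    # Exclude general categories P*, Z* and C*; unassigned code
    # points are Cn and therefore fall outside \w.
    return db.category(ch)[0] not in ("P", "Z", "C")

ch = "\u023f"
for name, db in DATABASES.items():
    print(name, db.unidata_version, db.category(ch), is_word_char(ch, name))
```

With the legacy database U+023F is still unassigned, so the same character falls outside \w there and inside it with the latest database, which mirrors the invalid/valid split in the test metadata.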