This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5948 - Reference to Unicode Database
Summary: Reference to Unicode Database
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL: http://www.w3.org/International/revie...
Whiteboard:
Keywords: resolved
Duplicates: 10008
Depends on:
Blocks:
 
Reported: 2008-08-11 11:33 UTC by Felix Sasaki
Modified: 2010-11-10 17:43 UTC

Description Felix Sasaki 2008-08-11 11:33:57 UTC
Hello,

this is a comment on behalf of the i18n core Working Group.
We suggest that you use a non-static reference to the Unicode database, and that you make the Note starting with "[Unicode Database] is subject to future revision" normative text.
Locations:
References
http://www.w3.org/TR/2008/WD-xmlschema11-2-20080620/#UnicodeDB
and G1
http://www.w3.org/TR/2008/WD-xmlschema11-2-20080620/#charcter-classes
Thank you,
Felix
Comment 1 Michael Kay 2008-09-05 14:36:34 UTC
I would be more inclined to agree with this if the Unicode Database offered better guarantees of backwards compatibility. But recent releases have, for example, renamed one of the character groups from "Greek" to "Greek and Coptic", and an implementation that followed that blindly would cause any schema using the regular expression \P{IsGreek} to become invalid overnight. Similarly there are characters that have changed category, which also changes the semantics of regular expressions, causing a message that is validated by the sender to be rejected by the recipient even though both use the same schema. 

I think we need to offer XML Schema users better stability than this.
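
To illustrate the version sensitivity described in this comment, here is a minimal Python sketch. It is not from the original discussion. It assumes CPython's standard `unicodedata` module, which bundles a current Unicode database alongside a frozen 3.2.0 snapshot (`unicodedata.ucd_3_2_0`). The sample code point U+1F600 is an arbitrary choice of a character that was assigned after 3.2.0, so it shows a newly assigned character rather than a recategorized one. The effect on a category-based pattern is similar, though: the two database versions disagree about the same character.

```python
import unicodedata

# Hypothetical sample: U+1F600 was unassigned in Unicode 3.2.0
# and assigned in a later release.
SAMPLE = "\U0001F600"


def category_by_version(ch: str) -> dict:
    """Return the general category of ch under two database versions."""
    return {
        # Frozen snapshot shipped with CPython.
        "3.2.0": unicodedata.ucd_3_2_0.category(ch),
        # Whatever version this Python build bundles.
        unicodedata.unidata_version: unicodedata.category(ch),
    }


if __name__ == "__main__":
    for version, category in category_by_version(SAMPLE).items():
        print(f"Unicode {version}: U+{ord(SAMPLE):04X} has category {category}")
```

A sender and a recipient whose validators embed different database versions would classify this character differently, which is the interoperability concern raised in this comment.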
Comment 2 Felix Sasaki 2008-09-29 05:57:15 UTC
Hello Michael,

we discussed your comment
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5948#c1
at
http://www.w3.org/2008/09/17-core-minutes#item05
We are now contacting the Unicode Technical Committee asking for stability of block names. See the thread at 
http://lists.w3.org/Archives/Member/member-i18n-core/2008Sep/0013.html
The short summary is that there is a trade-off between having a stable reference to the Unicode database and allowing for more characters in "yet to come" versions of Unicode. We think a good compromise would be to make it implementation-defined which version of the database is used. This proposal is also based on Bug 
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5818
Mark Davis pointed out at
http://lists.w3.org/Archives/Member/member-i18n-core/2008Sep/0025.html
that the property Alias file
http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt
provides information about previous block names, so you might want to take that into account as well.

Regards, Felix.
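
The idea of keeping earlier block names usable through an alias table can be sketched as follows. This is a hedged illustration only, not the wording adopted by either Working Group: the tables `BLOCK_RANGES` and `FORMER_NAMES` and the functions `resolve_block` and `in_block` are hypothetical, and the tables are hand-written here rather than loaded from PropertyValueAliases.txt or Blocks.txt. The only data assumed is that the block at U+0370..U+03FF is the one discussed in this bug under both the name "Greek" and the name "Greek and Coptic".

```python
# Hypothetical, hand-written tables; a real processor would derive them
# from the Unicode data files for the version it supports.
BLOCK_RANGES = {
    "basiclatin": (0x0000, 0x007F),
    "greekandcoptic": (0x0370, 0x03FF),
}

# Former block names mapped to the current name (normalized form).
FORMER_NAMES = {
    "greek": "greekandcoptic",
}


def _normalize(name: str) -> str:
    """Drop spaces, underscores and hyphens, and ignore case."""
    return "".join(ch for ch in name if ch.isalnum()).lower()


def resolve_block(name: str) -> tuple:
    """Return (first, last) code points for a current or former block name.

    Raises KeyError for a name that is in neither table.
    """
    key = _normalize(name)
    key = FORMER_NAMES.get(key, key)
    return BLOCK_RANGES[key]


def in_block(ch: str, name: str) -> bool:
    """Report whether the single character ch falls inside the named block."""
    first, last = resolve_block(name)
    return first <= ord(ch) <= last


if __name__ == "__main__":
    # Both the former and the current name resolve to the same range.
    print(in_block("\u03b1", "Greek"))
    print(in_block("\u03b1", "Greek and Coptic"))
```

With a table like this, a schema written against the older name keeps resolving to the same range, while an unknown name still fails loudly instead of silently matching nothing.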
Comment 3 David Ezell 2008-10-31 21:14:49 UTC
WG agreed at the Face to Face:

<MSM> We think the text is mostly ok, but some tweaks are desirable:
<MSM> - replace references to 'current version' with references to 'the version cited in the References' (or to the actual number of that version -- we won't change it)
<MSM> - change references to 'future' versions to 'later' versions
<MSM> - add prose to the reference re-stating the conformance rules (ceiling/floor)
Comment 4 David Ezell 2008-10-31 21:16:00 UTC
AND:
<MSM> - Make those two Notes normative text!
Comment 5 Felix Sasaki 2008-11-01 09:00:33 UTC
(In reply to comment #3)
> WG agreed at the Face to Face:
> 
> <MSM> We think the text is mostly ok, but some tweaks are desirable:
> <MSM> - replace references to 'current version' with references to 'the version
> cited in the References' (or to the actual number of that version -- we won't
> change it)
> <MSM> - change references to 'future' versions to 'later' versions
> <MSM> - add prose to the reference re-stating the conformance rules
> (ceiling/floor)
> 

Hello David, all,

it is a bit hard to see what the actual change will be; could you give a link to the updated draft later? Also, FYI and maybe of importance for this issue: the i18n core WG has talked to the Unicode Consortium about various stability policies, including character class names, which resulted in an update of these policies. See 
http://lists.w3.org/Archives/Member/member-i18n-core/2008Oct/0038.html

Felix
Comment 6 Michael Kay 2008-11-01 10:00:17 UTC
I don't see anything in that Unicode stability policy about names of blocks such as "Greek", which is my biggest concern here. Did I miss something? Also, a policy for the future doesn't solve the problem for the past - I do think we need to say something explicit to make sure that a schema using <pattern value="\p{IsGreek}*"/> continues to work.
Comment 7 Felix Sasaki 2008-11-13 02:10:28 UTC
(In reply to comment #6)
> I don't see anything in that Unicode stability policy about names of blocks
> such as "Greek", which is my biggest concern here. Did I miss something? Also,
> a policy for the future doesn't solve the problem for the past - I do think we
> need to say something explicit to make sure that a schema using <pattern
> value="p{IsGreek}*"/> continues to work.
> 

From a mail exchange with Mark Davis on the topic:

[Soon] "there will be a publicly available stability provision for all of the property aliases and property value aliases on

    * http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt
    * http://unicode.org/Public/UNIDATA/PropertyAliases.txt 

with the exception of Contributory properties listed on <http://www.unicode.org/Public/UNIDATA/UCD.html#Properties>. This is not completely final yet, since the exact wording has to be formulated by the editorial committee, and it actually requires approval by the officers, but I don't anticipate any problems.

So that will include block names. Note that the set of characters having a given property or property value may change (subject to the stability policies). What the above means is that the identifiers will always remain valid, so \p{script=Greek} or equivalent syntax will remain valid. That should address your concerns.

Mark"
Comment 8 C. M. Sperberg-McQueen 2008-12-19 03:34:03 UTC
A wording proposal intended to resolve bug 5948 and bug 5950 is at

  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b5948.html

This should make it easier to see what changes, exactly, are proposed.
Comment 9 Michael Kay 2008-12-19 09:39:34 UTC
In the wording proposal, the Note after the table of block names refers to PropertyAliases.txt and PropertyValueAliases.txt. The block names are actually defined in Blocks.txt
Comment 10 C. M. Sperberg-McQueen 2008-12-22 15:37:43 UTC
The wording proposal mentioned in comment #8 was approved by the XML Schema
WG at its telcon of 19 December 2008, with minor amendments.  We discussed the
point raised in comment #9; some WG members took comment #7 to mean that
the two files in question are in fact relevant to (changes in) block
assignment, or will be.  In the end, recalling that those two files are
already mentioned in the first note in the change proposal, not very far
above this location in the text, the WG decided just to delete that 
sentence.

The changes have been integrated into the status quo document at the usual
location.  

The WG decided not to close the issue, however.  Since XSD 1.0 refers to
version 3.1 of the Unicode database, the current draft of XSD 1.1 to 4.1,
and the Unicode Consortium's web site now carries version 5.1, we discussed
briefly which version to require of XSD 1.1 processors:  3.1 (for compatibility
with XSD 1.0)?, 4.1 (for compatibility with earlier drafts of 1.1)?, or 
5.1 (to be current)?  We decided to require 5.1, and instructed the editors
to check the block and property information and update the reference.

I'm marking this needsDrafting, accordingly.
Comment 11 C. M. Sperberg-McQueen 2009-01-21 00:04:12 UTC
On 16 January 2009, the XML Schema WG adopted the proposal at

  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b5948b.html

with amendments proposed by Michael Kay in 

  http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2009Jan/0011.html

(both of these are member-only links).

Felix, if you could convey this decision to the i18n WG and let us know
whether it resolves the issue to your and their satisfaction, we'd be
grateful. Close the issue if you're satisfied, reopen it if not.

If we don't hear from you in the next two weeks, we will assume you and
the i18n core WG are satisfied with the resolution of the issue. 
Comment 12 Michael Kay 2010-06-24 12:13:04 UTC
*** Bug 10008 has been marked as a duplicate of this bug. ***
Comment 13 David Ezell 2010-11-10 17:43:05 UTC
The WG reported this bug as FIXED on 2010-06-24.  We are closing this bug
as requiring no further work.  If there are issues remaining, you can reopen
this bug and enter a comment to indicate the problem.  Thanks very much for the
feedback.