This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 8732 - Unicode 5.2
Summary: Unicode 5.2
Status: RESOLVED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2010-01-13 14:18 UTC by Michael Kay
Modified: 2011-05-04 16:18 UTC
CC List: 1 user

See Also:


Attachments

Description Michael Kay 2010-01-13 14:18:31 UTC
Our current XSD 1.1 specs refer normatively to Unicode 5.1. Since 5.2 has been released, we should probably update to that.

This includes checking that the list of character categories and block names used in regular expressions is complete.
Comment 1 David Ezell 2010-01-15 16:26:45 UTC
On the telcon Michael Kay agreed to do some research to determine the required scope of the requested change.
Comment 2 David Ezell 2010-06-11 15:34:07 UTC
See email from Mike Kay:
http://lists.w3.org/Archives/Member/w3c-xml-schema-wg/2010Jun/0002.html
Comment 3 David Ezell 2010-06-11 15:42:10 UTC
WG decided to adopt Michael Kay's proposal in comment #2.

proposal -- update "Unicode 5.1" to "Unicode 5.2" and add the code blocks listed in my email.
Comment 4 C. M. Sperberg-McQueen 2010-11-27 01:09:01 UTC
A problem arises in executing this change, perhaps a minor one.  We currently point to 'The Unicode Character Database' with a URI of http://www.unicode.org/Public/5.1.0/ucd/UCD.html.  This page, in turn, follows the familiar convention of providing a link to the previous version, and to the latest version, of the document.  Following the link to the latest version, one is told that "The UCD Documentation File You Requested Has Been Replaced", as of version 5.2.0.

The current version of Unicode now (as of October 2010) appears to be Unicode 6.0.0.

We need to repeat the work MK did on 5.2.0, checking character categories and block names.  

And we need to figure out which document(s) to link to.  Perhaps the best way to point to the Unicode Database itself is via http://www.unicode.org/Public/6.0.0/, which has the advantage that it has both the traditional semicolon-separated value files and an XML version of the database files in its subdirectories.  But for documentation, it would probably be better to point people to http://www.unicode.org/ucd/, which in turn links to http://www.unicode.org/versions/latest/ for the Unicode spec itself.
Comment 5 C. M. Sperberg-McQueen 2010-11-29 15:38:54 UTC
It seems clear that the WG needs to revisit the question of our Unicode 
reference and how tightly we should couple XSD 1.1 to any particular
Unicode version.  So after discussion with the chair I'm changing the
status from needsDrafting back to needsAgreement.
Comment 6 C. M. Sperberg-McQueen 2010-12-17 02:47:01 UTC
I wonder, having just revisited our discussion of anyURI and its relation to the relevant RFCs, whether we should cultivate a looser coupling to the Unicode database. I can think of several variants that might simplify life:

(1) Just leave the reference at 5.1.  The spec allows implementations to support later versions of Unicode, and we don't have a pressing need to require them to be up to date.

(2) Move the reference back to Unicode 3 or so, on the same logic.

(3) Make the Unicode reference wholly generic and specify that the version of Unicode supported is implementation-defined.  

(4) Move the table of block names (and any other version-sensitive information about Unicode) to a non-normative document, and in that document provide not just the block names for the current version of Unicode but also a record of their changes (so users and implementors have a better chance of deciphering problems in this area).

Note that (3) is compatible with each of the others, or with updating the minimum Unicode version to 6.0.
Comment 7 David Ezell 2010-12-17 16:45:17 UTC
Resolution: put the actual definitions in a separate document and target version 3.1, time permitting; MSM to provide a Unicode history of the block names.
Comment 8 David Ezell 2011-03-18 15:25:56 UTC
<dezell> MSM: proposed -- I'd like to sever the tie to a particular unicode version.
<dezell> MSM: I'd like us to have a "first" unicode version, and allow later.
<dezell> ...: no restriction on supporting more than one version.

WG approved.
Comment 9 C. M. Sperberg-McQueen 2011-04-21 19:54:28 UTC
A proposal intended to resolve this issue is at

  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b8732.html
  (member-only link).
Comment 10 C. M. Sperberg-McQueen 2011-05-04 16:02:07 UTC
The proposal mentioned in comment 9 has been integrated into the status-quo document, so I'm closing this issue.

Michael, would you close or reopen the issue to signal assent or dissent? Thanks.