This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 10008 - Use of Unicode blocks that no longer exist in regular expressions.
Status: CLOSED DUPLICATE of bug 5948
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.0/1.1 both
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords:
Depends on: 5818
Blocks:
Reported: 2010-06-24 11:47 UTC by Oliver Hallam
Modified: 2010-11-10 17:22 UTC

See Also:


Description Oliver Hallam 2010-06-24 11:47:12 UTC
Section F.1.1 states the following:

Note:  [Unicode Database] is subject to future revision. For example, the grouping of code points into blocks might be updated. All ·minimally conforming· processors ·must· support the blocks defined in the version of [Unicode Database] that is current at the time this specification became a W3C Recommendation. However, implementors are encouraged to support the blocks defined in any future version of the Unicode Standard.

Unfortunately some of these blocks no longer exist in the current Unicode specification!  I believe the changes are limited to the following:

CombiningMarksforSymbols is now CombiningDiacriticalMarksforSymbols

Greek is now GreekandCoptic

PrivateUse has been split into three blocks (we think):
PrivateUseArea, SupplementaryPrivateUseAreaA and SupplementaryPrivateUseAreaB.

The behaviour for these old group names is left a bit vague.  I suggest that the correct behaviour should be one of the following, but this is not specified anywhere:

1) The old block names should no longer be valid.  This directly contradicts the specification and would cause compatibility problems.

2) The old names should refer to blocks in an older version of the Unicode specification that did have them.  In particular I suggest that this should be the version used in the Schema specification.

3) The old names should map to the equivalent blocks in the newer version of the specification.  I can't find this mapping specified anywhere, but I believe it to be as described above (at least for the current version).
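
For illustration only, option 3 could be sketched roughly as below. The alias table is just the mapping guessed at earlier in this report, not anything taken from the Schema or Unicode specifications, and the names LEGACY_BLOCK_ALIASES, resolve_block_name and current_blocks_for_escape are hypothetical helpers invented for this sketch.

```python
import re

# Assumed mapping from the old block names to their current equivalents,
# exactly as guessed in this report. It is not normative.
LEGACY_BLOCK_ALIASES = {
    "CombiningMarksforSymbols": ("CombiningDiacriticalMarksforSymbols",),
    "Greek": ("GreekandCoptic",),
    "PrivateUse": (
        "PrivateUseArea",
        "SupplementaryPrivateUseAreaA",
        "SupplementaryPrivateUseAreaB",
    ),
}

# Matches a single block escape such as \p{IsGreek} or \P{IsPrivateUse}.
_BLOCK_ESCAPE = re.compile(r"\\([pP])\{Is([A-Za-z0-9-]+)\}")


def resolve_block_name(name):
    """Return the current block name(s) that an old or current name maps to."""
    # Names without an alias are passed through unchanged.
    return LEGACY_BLOCK_ALIASES.get(name, (name,))


def current_blocks_for_escape(escape):
    """Return the current block name(s) referenced by one \\p{IsX} escape."""
    match = _BLOCK_ESCAPE.fullmatch(escape)
    if match is None:
        raise ValueError("not a block escape: %r" % escape)
    return resolve_block_name(match.group(2))


if __name__ == "__main__":
    for escape in (r"\p{IsGreek}", r"\p{IsPrivateUse}", r"\p{IsBasicLatin}"):
        print(escape, "->", ", ".join(current_blocks_for_escape(escape)))
```

A processor taking this route would then match the escape against the union of the returned blocks. Option 2 would instead need the code point ranges from the older Unicode version, which this sketch does not attempt to reproduce.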
Comment 1 Michael Kay 2010-06-24 12:13:04 UTC

*** This bug has been marked as a duplicate of bug 5948 ***
Comment 2 David Ezell 2010-11-10 17:22:38 UTC
The WG reported this bug as DUPLICATE on 2010-06-24.  We are closing this bug as
requiring no further work.  If there are issues remaining, you can reopen this
bug and enter a comment to indicate the problem.  Thanks very much for the
feedback.

Comment 1 contains a reference to the duplicate bug.