
Edit comment LC-2130 for Efficient Extensible Interchange Working Group


Comment LC-2130
Commenter: Yuri Delendik <yury_exi@yahoo.com>


Hello,

From 7.1.10.1 Restricted Character Sets:
"... If the restricted character set for a datatype contains at least 255 characters or contains non-BMP characters, the character set of the datatype is not restricted and can be omitted from further consideration..."

Appendix E, Deriving Character Sets from XML Schema Regular Expressions, explains how to build character sets. It enumerates the character groups that, when contained in a regular expression atom, cause the character set of the whole expression to be defined as the entire set of XML characters. One of the exceptions is the multi-character escape "\d". By the XSD definition it is equivalent to the category escape "\p{Nd}". But according to the UnicodeData.txt data file of Unicode 5.0.0, this category contains 290 characters (230 BMP and 60 non-BMP).

The exception for "\d" (and "\p{Nd}") is incorrect: after all processing, the expression "\d" is unsuitable for datatype encoding with a restricted character set, since the set has more than 255 characters and contains non-BMP characters.
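
The rule quoted from 7.1.10.1 can be sketched as a small check. This is only an illustration of the quoted sentence, not code from the EXI specification; the function name `is_restricted` and the constant `BMP_MAX` are hypothetical. The example uses the ASCII digits plus U+1D7CE (MATHEMATICAL BOLD DIGIT ZERO), one of the non-BMP members of "\p{Nd}".

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def is_restricted(code_points):
    """Return True if the set still counts as a restricted character set.

    Follows the sentence quoted from 7.1.10.1: a set with at least 255
    characters, or with any non-BMP character, is not restricted.
    """
    chars = set(code_points)
    if len(chars) >= 255:
        return False
    return all(cp <= BMP_MAX for cp in chars)


ascii_digits = list(range(0x30, 0x3A))  # '0'..'9'

print(is_restricted(ascii_digits))              # True
print(is_restricted(ascii_digits + [0x1D7CE]))  # False: non-BMP member
print(is_restricted(range(0, 255)))             # False: 255 characters
```

With the figures above, "\p{Nd}" fails both conditions: 290 characters in total, 60 of them outside the BMP.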

Here are the totals from UnicodeData.txt:
Category  BMP   non-BMP  Total chars  Excl. in EXI
\p{Cc}    65    0        65
\p{Cf}    33    105      138          ?
\p{Co}    2     4        6            X
\p{Cs}    6     0        6
\p{Ll}    1102  532      1634         X
\p{Lm}    167   0        167
\p{Lo}    6009  1954     7963         X
\p{Lt}    31    0        31
\p{Lu}    836   484      1320         X
\p{Mc}    167   8        175          ?
\p{Me}    10    0        10
\p{Mn}    602   278      880          X
\p{Nd}    230   60       290          ?
\p{Nl}    51    159      210          ?
\p{No}    252   84       336          ?
\p{Pc}    10    0        10
\p{Pd}    18    0        18
\p{Pe}    65    0        65
\p{Pf}    9     0        9
\p{Pi}    11    0        11
\p{Po}    260   18       278          ?
\p{Ps}    66    0        66
\p{Sc}    41    0        41
\p{Sk}    99    0        99
\p{Sm}    904   10       914          X
\p{So}    2350  608      2958         X
\p{Zl}    1     0        1
\p{Zp}    1     0        1
\p{Zs}    18    0        18
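
Totals of this kind could be reproduced with a short script along the following lines. It is a minimal sketch, not the script used for the table: the function name `count_categories` and the three sample lines are illustrative, and it assumes one count per UnicodeData.txt entry, without expanding "First"/"Last" range pairs. It relies only on the semicolon-separated layout of UnicodeData.txt, where field 0 is the code point and field 2 is the general category.

```python
from collections import defaultdict

BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def count_categories(lines):
    """Count UnicodeData.txt entries per general category.

    Returns a dict mapping category -> [BMP count, non-BMP count].
    Assumption: each line counts once; range pairs are not expanded.
    """
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        code_point = int(fields[0], 16)
        category = fields[2]
        counts[category][0 if code_point <= BMP_MAX else 1] += 1
    return dict(counts)


# Small illustrative sample in UnicodeData.txt format.
SAMPLE = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "1D7CE;MATHEMATICAL BOLD DIGIT ZERO;Nd;0;EN;<font> 0030;0;0;0;N;;;;;",
]

for category, (bmp, non_bmp) in sorted(count_categories(SAMPLE).items()):
    print(category, bmp, non_bmp, bmp + non_bmp)
```

To run it against the real data, pass an open file object for a local copy of UnicodeData.txt (the path is up to the reader) instead of `SAMPLE`.
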
Regards,
Yuri Delendik

