W3C

Comment LC-2108 for Efficient Extensible Interchange Working Group

Comment LC-2108
Commenter: <pub@upokecenter.com>

I would like to make a suggestion about the section 'Deriving Character Sets from XML Schema Regular Expressions':

I propose that a datatype whose regular expression contains a 'charClassSub' (character class subtraction) should have no restricted character set. Every other construct in the derivation only adds characters, so the result is a simple union, and that makes it efficient to determine whether the expression yields a restricted character set at all. A 'charClassSub' complicates this, because the program must now subtract portions of the character set as well as add to it. This can be a problem when the set being subtracted from contains a large number of characters, as in this example:

[&#x20;-&#xFF00;-[&#x60;-&#xFF00;]]

The regular expression above yields a restricted character set of only 64 characters (U+0020 through U+005F). However, an implementation (admittedly a naive one) may have to store more than 65,000 characters before excluding most of them in the 'charClassSub' portion of the expression. Another problem is nested 'charClassSub' sets. For example, the following regular expression is allowed:

[A-Z-[B-Z-[C-Z-[D-Z-[E-Z-[...]]]]]]
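
To make both costs concrete, here is a minimal Python sketch of the naive approach. It assumes a character class has already been parsed into a list of inclusive code point ranges plus an optional subtracted class; the `CharClass` model, `naive_chars`, and `nested` are hypothetical names for illustration, not part of the EXI specification. It expands every range into an explicit set, evaluates nested subtractions by recursion, and reports the largest single set it held along the way.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class CharClass:
    """Hypothetical model of a character class such as [A-Z-[B-Z]]."""
    ranges: List[Tuple[int, int]]       # inclusive code point ranges
    sub: Optional["CharClass"] = None   # the charClassSub part, if any


def naive_chars(cc: CharClass) -> Tuple[Set[int], int]:
    """Expand the class into an explicit set of code points.

    Returns the resulting set and the size of the largest single set
    held while computing it, to show the intermediate storage cost.
    """
    chars: Set[int] = set()
    for lo, hi in cc.ranges:
        chars.update(range(lo, hi + 1))   # store every code point
    peak = len(chars)
    if cc.sub is not None:
        # Nested charClassSub sets are evaluated by recursion.
        sub_chars, sub_peak = naive_chars(cc.sub)
        peak = max(peak, sub_peak)
        chars -= sub_chars
    return chars, peak


def nested(depth: int) -> CharClass:
    """Build [A-Z-[B-Z-[C-Z-...]]] nested `depth` levels deep (depth >= 1)."""
    cc: Optional[CharClass] = None
    for i in reversed(range(depth)):
        cc = CharClass([(ord("A") + i, ord("Z"))], cc)
    assert cc is not None
    return cc


# [&#x20;-&#xFF00;-[&#x60;-&#xFF00;]]
example = CharClass([(0x20, 0xFF00)], CharClass([(0x60, 0xFF00)]))
result, peak = naive_chars(example)
print(len(result), peak)  # 64 65249

# [A-Z-[B-Z-[C-Z]]] evaluates to A-Z minus {B}
print("".join(chr(c) for c in sorted(naive_chars(nested(3))[0])))
# ACDEFGHIJKLMNOPQRSTUVWXYZ
```

Under this model the first example reaches its 64 characters only after holding 65,249, and each additional nesting level adds another recursive step and another intermediate set, so the work grows with the input rather than with the size of the result.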

Together, these problems make 'charClassSub' awkward to handle when deriving a restricted character set. Thank you for your time.
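
The proposed rule itself is simple to apply. The following Python sketch assumes the pattern has already been reduced to a flat list of character classes, using a hypothetical `CharClass` model and a hypothetical `restricted_charset` function; it ignores everything else the actual derivation must handle. It only ever unions ranges, and returns `None`, meaning no restricted character set, as soon as it meets a 'charClassSub'.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CharClass:
    """Hypothetical model of a character class such as [A-Z-[B-Z]]."""
    ranges: List[Tuple[int, int]]       # inclusive code point ranges
    sub: Optional["CharClass"] = None   # the charClassSub part, if any


def restricted_charset(
    classes: List[CharClass],
) -> Optional[List[Tuple[int, int]]]:
    """Union the classes of a pre-parsed pattern into merged ranges.

    Returns None ("no restricted character set") if any class contains
    a charClassSub, as proposed in this comment.
    """
    collected: List[Tuple[int, int]] = []
    for cc in classes:
        if cc.sub is not None:
            return None                   # proposed rule: give up on subtraction
        collected.extend(cc.ranges)
    merged: List[Tuple[int, int]] = []
    for lo, hi in sorted(collected):
        if merged and lo <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


print(restricted_charset([CharClass([(0x41, 0x5A)]), CharClass([(0x30, 0x39)])]))
# [(48, 57), (65, 90)]
print(restricted_charset([CharClass([(0x20, 0xFF00)], CharClass([(0x60, 0xFF00)]))]))
# None
```

Because only unions are involved, the ranges can be merged in a single sorted pass without ever expanding them into individual characters.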


Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
$Id: 2108.html,v 1.1 2017/08/11 06:44:19 dom Exp $