Re: forbiddenCharacters data category - related to [ACTION-189] from Jirka Kosek on 2012-08-27 (public-multilingualweb-lt@w3.org from August 2012)

From: Jirka Kosek <jirka@kosek.cz>
Date: Tue, 28 Aug 2012 00:21:47 +0200
To: Yves Savourel <ysavourel@enlaso.com>
CC: public-multilingualweb-lt@w3.org
Message-ID: <503BF2FB.8060002@kosek.cz>

On 27.8.2012 22:30, Yves Savourel wrote:

> But, as you show, such character class subtraction doesn't have a common syntax across different engines, or it's not supported in others (like JavaScript, as far as I know). That's why I think we want to avoid picking all the features of one engine. For example, we wouldn't allow \d, \w, etc. either, because they are not interoperable.

We should think about users first and about implementations second.

ITS is used in XML documents. XML documents are described by XML
schemas. Global rules in ITS are expressed as XPath. These are good
reasons to use the regexp syntax defined by XML Schema (and also used
in XPath 2.0), because XML-savvy people are already familiar with it.

I don't think we should define our own subset; we should just reuse what
people know and what's already standardized in the XML world:

http://www.w3.org/TR/xmlschema-2/#nt-charClass
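
To make the syntax difference concrete, here is a minimal sketch of
character class subtraction. It is only an illustration, not text from
any ITS draft: the pattern, the sample strings and the helper name
find_consonants are all made up for this example. XML Schema writes
"a-z except vowels" as [a-z-[aeiou]]; Python's built-in re module has no
subtraction operator, so the sketch emulates the same set with a
negative lookahead.

```python
import re

# XML Schema syntax for "a-z except vowels" (character class subtraction).
# Shown for comparison only; it is NOT passed to Python's re module.
XSD_PATTERN = "[a-z-[aeiou]]"

# Assumed emulation for an engine without subtraction: a negative
# lookahead rejects the subtracted characters before the base class
# is tried.
PY_EQUIVALENT = re.compile(r"(?![aeiou])[a-z]")


def find_consonants(text):
    """Return every lowercase ASCII consonant in text (hypothetical helper)."""
    return PY_EQUIVALENT.findall(text)


print(find_consonants("its"))
```

An engine that implements the XML Schema grammar directly would accept
XSD_PATTERN as written, so no such per-platform translation would be
needed.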

Of course, implementations are also important. But since there are
open-source implementations of XML Schema regexps for all major
platforms -- for example Saxon for Java/.NET and libxml2 for C/C++ -- I
don't see any problem here. You can simply reuse existing code instead
of relying on the platform's default regexp engine.

Jirka

--
------------------------------------------------------------------
Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
------------------------------------------------------------------
Professional XML consulting and training services
DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
------------------------------------------------------------------

Received on Monday, 27 August 2012 22:22:16 UTC