RE: Comment on ITS 2.0 specification WD

Hi Jörg,

> What would be this "small sub-set that most engines support"?

See most of the old discussion here:
https://www.w3.org/International/multilingualweb/lt/track/actions/189

I think the last proposed sub-set is described in the attachment to this email:
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0269.html

> I'm asking because with XSD RE we have a standardized specification 
> which IMHO we wouldn't have with a "small sub-set".

When talking about regular expressions, most developers will see Perl or ECMA as the de-facto standards rather than XSD RE.

The idea is to define a sub-set that is common to most of the main regex 'standards' (including XSD RE). The syntax of that sub-set could be specified by using a pattern facet in the ITS schema (written using XSD RE :)

What I'd like to avoid is to have ITS allowing the use of constructs specific to a given regex engine (XSD RE or any other).

I know XSD RE is implemented in various programming languages and Jirka already pointed out ways to use such implementations, but I still think it would make more sense to use a sub-set for which the implementer doesn't have to do anything special and (in most cases) can simply use the regex engine of his/her programming language.

I've seen the same kind of implementation issue with SRX, where the 'standard' is the ICU RE. The outcome: as far as I know there is only one SRX engine that supports the ICU syntax properly; all the others simply use their programming language's regex engine. In the Allowed Characters case we could avoid that because the aim of the regex can be achieved with a basic sub-set of regex constructs. Why make the implementation difficult when it can be simpler?
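
A minimal sketch of that point, in Python. The expression and the function name are made up for illustration (they are not taken from the ITS draft or from the proposed sub-set); the assumption is only that an allowed-characters value is a plain character class, a construct that, as far as I know, Perl, ECMAScript, XSD RE and Python all interpret the same way:

```python
import re

# Hypothetical allowed-characters expression: one plain character class.
# Nothing here is specific to a particular regex flavour.
ALLOWED = "[a-zA-Z0-9_ ]"


def disallowed_chars(text, allowed=ALLOWED):
    """Return the characters of `text` that the allowed class does not match."""
    one_char = re.compile(allowed)
    # Test each character on its own against the class.
    return [ch for ch in text if one_char.fullmatch(ch) is None]


if __name__ == "__main__":
    print(disallowed_chars("file name_01"))
    print(disallowed_chars("a/b:c"))
```

The native `re` module is used as-is: no XSD-specific library and no translation step between regex flavours.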

cheers,
-yves

Received on Thursday, 3 January 2013 17:09:36 UTC