This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11011 - Since Javascript does not support mode specifiers inside the regular expression, there is no simple way of matching a single word case-insensitively besides turning into [Ww][Oo][Rr][Dd]
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords: WGDecision
Duplicates: 11319
Depends on:
Blocks:
 
Reported: 2010-10-12 11:58 UTC by contributor
Modified: 2014-09-29 20:52 UTC

Description contributor 2010-10-12 11:58:28 UTC
Section: http://www.whatwg.org/specs/web-apps/current-work/#the-pattern-attribute

Comment:
Since Javascript does not support mode specifiers inside the regular
expression, there is no simple way of matching a single word
case-insensitively besides turning into [Ww][Oo][Rr][Dd]

Posted from: 129.241.132.186
Comment 1 Vegard Larsen 2010-10-12 12:56:31 UTC
A small use case: you want to validate a password (simply [A-Za-z0-9]{5,}), but cannot accept any case variation of the word "password":

    <input 
      type="password" 
      name="foo" 
      pattern="(?![Pp][Aa][Ss][Ss][Ww][Oo][Rr][Dd])[A-Za-z0-9]{5,}" 
      required />

This becomes tedious very quickly. If mode specifiers were supported in Javascript regular expressions, one could try:

    (?i)(?!password)[A-Za-z0-9]{5,}

According to the spec, this should be equivalent to:

    /^(?:(?i)(?!password)[A-Za-z0-9]{5,})$/

This would have worked if Javascript's regular expressions had supported mode specifiers (it works when parsed as PCRE or a Java or a .NET regex).
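
The workaround in this comment can be sketched in JavaScript as follows. This is only an illustration: the helper names `caseInsensitiveWord` and `compilePattern` are hypothetical, and `compilePattern` merely mimics the anchored `^(?:...)$` wrapping quoted above.

```javascript
// Hypothetical helper: expand "word" into "[Ww][Oo][Rr][Dd]".
// Assumes ASCII letters and no regex metacharacters in the input.
function caseInsensitiveWord(word) {
  return Array.from(word, (ch) => {
    const upper = ch.toUpperCase();
    const lower = ch.toLowerCase();
    return upper === lower ? ch : `[${upper}${lower}]`;
  }).join("");
}

// Hypothetical helper: mimic how a pattern attribute value is compiled
// (anchored to the whole value, no flags).
function compilePattern(pattern) {
  return new RegExp(`^(?:${pattern})$`);
}

const pattern = `(?!${caseInsensitiveWord("password")})[A-Za-z0-9]{5,}`;
const re = compilePattern(pattern);

console.log(pattern);
console.log(re.test("PassWord")); // false: rejected
console.log(re.test("hunter22")); // true: accepted
```

Generating the character classes removes the typing, but the resulting attribute value is still hard to read, which is the complaint in this bug.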
Comment 2 Aryeh Gregor 2010-10-12 16:25:14 UTC
So what solution is possible here?
Comment 3 Simon Pieters 2010-10-12 16:37:02 UTC
patternflags="i"?
Comment 4 Jonas Sicking (Not reading bugmail) 2010-10-12 18:13:01 UTC
Are there other regexp flags that make sense here? If not then it might be better to just add a boolean patternignorecase attribute.
Comment 5 Ian 'Hixie' Hickson 2010-10-12 19:34:18 UTC
We could just add support for mode specifiers to JS. Or we could say that this use case really should be done from script (which is very quickly likely to become better if you're doing password strength testing).
Comment 6 Vegard Larsen 2010-10-12 20:37:32 UTC
(In reply to comment #4)
> Are there other regexp flags that make sense here? If not then it might be
> better to just add a boolean patternignorecase attribute.

Well, the other flags are global and multiline. global doesn't apply here, as the pattern has to match the entire string. multiline does not apply either, as there are no input elements that take multi-line values, and the pattern attribute isn't specified for textareas.

Your suggested attribute would work for me.

(In reply to comment #5)
> We could just add support for mode specifiers to JS. Or we could say that this
> use case really should be done from script (which is very quickly likely to
> become better if you're doing password strength testing).

That would mean changing the ECMAScript specification, but would be the best solution because it would bring Javascript regular expressions more in line with PCRE, .NET and Java regular expressions.

Depending on whether you support mode specifiers for parts of the regular expression or just globally changing the mode inside the regular expression, it could also provide much more powerful regular expressions.

As far as I can tell, there really are 4 options here (listed in my order of preference):

1. Drop the requirement that the regular expression must match the entire value, and let HTML authors write a complete Javascript regex in the pattern attribute. This is probably the easiest route to take that fixes the problem. (There are probably reasons behind the "match entire value"-policy that I fail to see)

2. Add a patternignorecase attribute as suggested above.

3. Try to enforce a change in Javascript's regular expression syntax to support inline mode specifiers, even for parts of a regular expression (it really doesn't make sense only to allow setting an inline mode specifier for the whole regex). This is probably a lot of bureaucratic work, but would also have other positive side effects.

4. Delegate this functionality to scripting. Won't require a change to existing implementations. (This is the "ignore the issue, hope I go away"-alternative)
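
For comparison, option 4 (doing the check from script) might look like the sketch below. The function name `isAcceptablePassword` is hypothetical; the regular expression is the one from comment 1, with the `i` flag supplied outside the pattern.

```javascript
// Hypothetical validator: at least five ASCII alphanumerics, and the value
// must not start with any case variation of the word "password".
function isAcceptablePassword(value) {
  return /^(?!password)[A-Za-z0-9]{5,}$/i.test(value);
}

console.log(isAcceptablePassword("PASSWORD")); // false
console.log(isAcceptablePassword("hunter22")); // true

// In a browser, the result could be reported through the constraint
// validation API, for example by calling input.setCustomValidity(...)
// from an "input" event listener. That wiring is omitted here.
```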
Comment 7 Jonas Sicking (Not reading bugmail) 2010-10-13 00:50:14 UTC
> 1. Drop the requirement that the regular expression must match the entire
> value, and let HTML authors write a complete Javascript regex in the pattern
> attribute. This is probably the easiest route to take that fixes the problem.
> (There are probably reasons behind the "match entire value"-policy that I fail
> to see)

The problem here is that it's very unintuitive to not use the "match entire value" policy. For example if you want to enforce a US phone number, the pattern

\d\d\d-\d\d\d-\d\d\d\d

is incorrect, as it would match "hi all, i have several phone numbers, my home phone is 555-1234, but my cell phone is 415-555-4711".

Regexps are generally designed for searching, i.e. to find a substring within a bigger string. This makes them designed for being generally inclusive. What we want is something that enforces that something uses a particular pattern, i.e. we want to err on the side of being exclusive.

If we didn't use the "match entire value" policy, almost everyone would have to write patterns like "^stuff here$", and many would forget, leading to a bad user experience.
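
The difference between searching and validating can be shown with the phone number example above. This is a minimal sketch; the variable names are made up for illustration.

```javascript
const source = "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d";
const text =
  "hi all, i have several phone numbers, my home phone is 555-1234, " +
  "but my cell phone is 415-555-4711";

// Unanchored: succeeds if the pattern occurs anywhere in the value.
const searching = new RegExp(source);
// Anchored: succeeds only if the whole value matches the pattern.
const validating = new RegExp(`^(?:${source})$`);

console.log(searching.test(text));            // true
console.log(validating.test(text));           // false
console.log(validating.test("415-555-4711")); // true
```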

> 3. Try to enforce a change in Javascript's regular expression syntax to
> support inline mode specifiers, even for parts of a regular expression (it
> really doesn't make sense only to allow setting an inline mode specifier for
> the whole regex). This is probably a lot of bureaucratic work, but would also
> have other positive side effects.

This would likely take a looooong time and not be fully standardized until harmony is released (years away).

It'd be nice to do this as well, since as you notice it'd be nice to be able to match some parts case insensitively and some parts case sensitively. But I'd rather not wait for that.


> 4. Delegate this functionality to scripting. Won't require a change to
> existing implementations. (This is the "ignore the issue, hope I go
> away"-alternative)

If we do this then we likely want to make ignore-case the default behavior. That is probably more commonly useful.
Comment 8 Vegard Larsen 2010-10-13 10:44:40 UTC
(In reply to comment #7)
> The problem here is that it's very unintuitive to not use the "match entire
> value" policy. [...]If we didn't use the "match entire value" policy, almost 
> everyone would have to write patterns like "^stuff here$", and many would 
> forget leading to bad user experience.

Agreed. How about allowing both complete AND partial regular expressions?

* If you provide a complete regular expression, it will not be anchored by 
  default, and you will have to anchor it yourself if you want it. You can also 
  toggle the wanted flags (only /i really makes sense). 

  (We could also automatically anchor even complete regular expressions. 
   Inserting ^(?: and )$ after the starting slash and before the ending slash 
   will not screw up any regexes, even if they are already anchored).

* If you provide a partial regular expression, we default to the anchored, 
  case-sensitive form.
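
A rough sketch of how such a dual-mode attribute value could be compiled is given below. It is not taken from any specification: the function name `compilePatternAttribute` and the rule for recognising a "complete" expression (a leading slash and a trailing slash with an optional `i`) are assumptions made for illustration.

```javascript
// Hypothetical: compile a pattern attribute value under the proposal above.
function compilePatternAttribute(value) {
  // "Complete" form: /source/ or /source/i. Used as written, not anchored.
  const literal = /^\/(.+)\/(i?)$/.exec(value);
  if (literal) {
    const [, source, flags] = literal;
    return new RegExp(source, flags);
  }
  // "Partial" form: anchored and case-sensitive.
  return new RegExp(`^(?:${value})$`);
}

console.log(compilePatternAttribute("/password/i").test("my PASSWORD")); // true
console.log(compilePatternAttribute("password").test("PASSWORD"));       // false
console.log(compilePatternAttribute("password").test("password"));       // true
```

One consequence of this kind of detection is that a partial pattern which itself begins and ends with a slash would be read as a complete expression, so the two forms are not always distinguishable.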

> If we do this then we likely want to make ignore-case the default behavior.
> That is probably more commonly useful.

That would only move the problem to the opposite side of the scale. With case-insensitivity as the default, you won't be able to demand that a password has both upper-case and lower-case characters, for instance, and this is a more apparent issue than lacking the option to do case-insensitive matching.
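
To make this concrete, the sketch below compiles one mixed-case requirement with and without the `i` flag. The pattern is an illustrative assumption, not something proposed in this bug.

```javascript
// Requires at least one lower-case and one upper-case ASCII letter.
const source = "^(?=.*[a-z])(?=.*[A-Z])[A-Za-z0-9]{5,}$";

const caseSensitive = new RegExp(source);
const caseInsensitive = new RegExp(source, "i");

console.log(caseSensitive.test("alllowercase"));   // false
console.log(caseSensitive.test("MixedCase"));      // true
// With "i", [A-Z] also matches lower-case letters, so the rule is lost.
console.log(caseInsensitive.test("alllowercase")); // true
```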
Comment 9 Ian 'Hixie' Hickson 2010-10-14 10:00:10 UTC
Browsers aren't going to gate their implementation of next-gen JS features until the next JS spec is done, any more than they are going to gate their implementation of HTML features until HTML is done. :-)

So I think the solution here is for us to at least try to see if browser vendors can implement this in their RegExp implementations before we add new attributes here. From a holistic point of view, it seems like the objectively better long-term solution.


EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: see above
Comment 10 Jonas Sicking (Not reading bugmail) 2010-10-14 15:05:20 UTC
So the proposed solution is to expand the JS regexp syntax to allow case-sensitivity control without the use of flags?

Do we even know if this is something that TC39 is working on? Is anyone here prepared to bring forward such a proposal to them?

Just want to make sure that we're not relying on a purely theoretical feature.
Comment 11 Vegard Larsen 2010-10-19 06:24:44 UTC
(In reply to comment #10)
> Do we even know if this is something that TC39 is working on? Is anyone here
> prepared to bring forward such a proposal to them?
> 
> Just want to make sure that we're not relying on a purely theoretical feature.

There are no current (public) proposals of this nature on the TC39 wiki. There are however other extensions/changes to the regular expression syntax being discussed:

http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps
http://wiki.ecmascript.org/doku.php?id=discussion:extend_regexps

I feel it is important to reiterate that this issue only arises because we are using partial regular expressions for the pattern attribute.

Explaining how to match a word (or worse, a sentence) case-insensitively to someone with little experience with regular expressions would be immensely complicated with the draft as is. 

Not to mention how cumbersome it would be to implement something like "type this sentence to let us know you really want to do this", which is common for digital signatures in some places.

I am escalating this issue to the WG.
Comment 12 Simon Pieters 2010-10-19 09:31:54 UTC
If you want to escalate, you should leave status as RESOLVED WONTFIX.
Comment 13 Sam Ruby 2010-10-21 18:37:31 UTC
http://www.w3.org/html/wg/tracker/issues/137
Comment 14 Jonas Sicking (Not reading bugmail) 2010-10-21 21:18:07 UTC
Just want to make sure:

Ian, given that there are currently no plans to add the ability to specify case sensitivity as part of the regexp, are you still opposed to adding an attribute to control case sensitivity?
Comment 15 Maciej Stachowiak 2010-11-10 10:50:43 UTC
(In reply to comment #9)
> Browsers aren't going to gate their implementation of next-gen JS features
> until the next JS spec is done, any more than they are going to gate their
> implementation of HTML features until HTML is done. :-)
> 
> So I think the solution here is for us to at least try to see if browser
> vendors can implement this in their RegExp implementations before we add new
> attributes here. From a holistic point of view, it seems like the objectively
> better long-term solution.
> 

JavaScript regexps take a case-insensitive flag that is not part of the regexp per se. The syntax looks like this:

case sensitive regexp: /foobar/ 
case-insensitive regexp: /foobar/i

It's near-certain the /i flag won't become part of the JavaScript regexp syntax as such (the Pattern production), since it is already provided for in the part of the regexp syntax that is not the Pattern production.

That's regular expression literals, but the RegExp constructor also takes the flags as a separate parameter, so there is very little chance of this being added as part of the core pattern syntax there too.

One simple way to address this without adding a new HTML attribute, and still deferring to ECMAScript, would be to allow the RegularExpressionLiteral production as an alternative to the Pattern production.
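
The separation between pattern and flags described in this comment is visible on the RegExp object itself. A small sketch:

```javascript
// The flag sits outside the pattern in a literal...
const literal = /foobar/i;
// ...and is a separate argument to the constructor.
const constructed = new RegExp("foobar", "i");

console.log(literal.source, literal.flags);         // foobar i
console.log(constructed.source, constructed.flags); // foobar i

console.log(/foobar/.test("FooBar"));    // false
console.log(literal.test("FooBar"));     // true
console.log(constructed.test("FooBar")); // true
```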
Comment 16 Aryeh Gregor 2010-11-10 15:02:05 UTC
(In reply to comment #15)
> It's near-certain the /i flag won't become part of the JavaScript regexp syntax
> as such (the Pattern production), since it is already provided for in the part
> of the regexp syntax that is not the Pattern production.
> 
> That's regular expression literals, but the RegExp constructor also takes the
> flags as a separate parameter, so there is very little chance of this being
> added as part of the core pattern syntax there too.

I don't see why.  Perl, PCRE, and Python regular expressions all support regular expressions like /abc (?idef)/ to mean /abc [dD][eE][fF]/, i.e., everything inside (?i...) is matched case-insensitively.  This could be used here, but also anytime you want to match part of a string case-sensitively and part case-insensitively (although I admit I've never actually had to do that in my life).

Anyway, I don't see why JavaScript can't support this syntax too.  (Except if it conflicts with existing syntax somehow?  Does /(?ifoo)/ match "?ifoo" in JavaScript right now?  I seem to get errors when I try it in Firefox and Chrome.)
Comment 17 Aryeh Gregor 2010-11-10 18:29:47 UTC
(In reply to comment #16)
> I don't see why.  Perl, PCRE, and Python regular expressions all support
> regular expressions like /abc (?idef)/ to mean /abc [dD][eE][fF]/, i.e.,
> everything inside (?i...) is matched case-insensitively.

Philip` points out that the actual syntax is /abc (?i)def/ or /abc ((?i)def)/.  But those don't seem to compile in Firefox or Chrome either, so the point stands.
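
The compile failure can be checked from script without crashing the page. The helper name `compiles` is hypothetical; the sketch assumes an engine that, like the browsers mentioned above, rejects the unscoped `(?i)` syntax with a SyntaxError.

```javascript
// Hypothetical helper: report whether a pattern source compiles.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

console.log(compiles("abc (?i)def"));      // false
console.log(compiles("abc ((?i)def)"));    // false
console.log(compiles("abc [dD][eE][fF]")); // true
```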
Comment 18 Aryeh Gregor 2010-11-16 00:37:56 UTC
*** Bug 11319 has been marked as a duplicate of this bug. ***
Comment 20 Michael[tm] Smith 2011-08-04 05:12:11 UTC
mass-move component to LC1