Bugzilla – Bug 20453
3.1: two questions re underscore removal
Last modified: 2013-08-06 04:10:46 UTC
In 3.1, a couple of places say:
the identifier [of some construct] is the value of the identifier token with
any single leading U+005F LOW LINE ("_") character (underscore) removed.
The significance of the word 'single' is unclear. It could mean:
(a) if the value of the identifier token begins with exactly one underscore,
it is removed.
(b) if the value of the identifier token begins with one or more underscores,
exactly one is removed.
E.g., if the value of the identifier token is "__foo", is the identifier
(a) '__foo' or (b) '_foo'?
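To make the two readings concrete, here is a minimal Python sketch. The function names are hypothetical and nothing here comes from the spec itself; it only restates (a) and (b) as code.

```python
def strip_if_exactly_one(token: str) -> str:
    """Reading (a): remove the underscore only if the token begins with exactly one."""
    if token.startswith("_") and not token.startswith("__"):
        return token[1:]
    return token


def strip_one_of_any(token: str) -> str:
    """Reading (b): remove exactly one leading underscore, however many there are."""
    return token[1:] if token.startswith("_") else token


# The two readings only disagree when there are two or more leading underscores.
for tok in ("foo", "_foo", "__foo"):
    print(tok, "->", repr(strip_if_exactly_one(tok)), repr(strip_one_of_any(tok)))
```

For "foo" and "_foo" both readings agree; only "__foo" tells them apart.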
identifier = [A-Z_a-z][0-9A-Z_a-z]*
allows an identifier token whose value is a single underscore.
This seems to imply that the resulting identifier has no characters.
Is that intended?
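A quick way to see the empty-identifier case, as a sketch only: the regex is the one quoted above, while identifier_from_token is a hypothetical helper that applies the underscore-removal rule (for a lone "_" both readings of 'single' give the same result).

```python
import re

# Regex for the identifier token, as quoted from 3.1.
IDENTIFIER_TOKEN = re.compile(r"[A-Z_a-z][0-9A-Z_a-z]*")


def identifier_from_token(token: str) -> str:
    """Hypothetical helper: validate the token, then drop one leading underscore."""
    if IDENTIFIER_TOKEN.fullmatch(token) is None:
        raise ValueError(f"not an identifier token: {token!r}")
    return token[1:] if token.startswith("_") else token


# A lone underscore is a valid token, and its identifier has no characters.
print(repr(identifier_from_token("_")))
```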
I removed the "single"s and updated the regex for the identifier token.
I've since noticed two things:
(1) Both the old and new regex allow an identifier token such as _42,
which would lead to an identifier of just 42, which seems undesirable.
(2) "The identifier of any of the abovementioned IDL constructs MUST NOT ...
begin with a U+005F LOW LINE ("_") character."
Which would seem to disallow an identifier token (or at least one that
provides the identifier for some IDL construct) from having two or more
leading underscores (which would obviate the original question about 'single').
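Both observations can be checked mechanically. The sketch below is only illustrative: it uses the regex originally quoted from 3.1 (the updated regex is not reproduced in this thread), and check_token is a hypothetical helper, not anything defined by the spec.

```python
import re
from typing import List

# Identifier token regex as originally quoted from 3.1 (assumed here; the
# updated version is not shown in this bug).
IDENTIFIER_TOKEN = re.compile(r"[A-Z_a-z][0-9A-Z_a-z]*")


def check_token(token: str) -> List[str]:
    """Hypothetical helper: list the problems with the identifier a token yields."""
    if IDENTIFIER_TOKEN.fullmatch(token) is None:
        return ["not an identifier token"]
    # Drop one leading underscore to get the identifier.
    identifier = token[1:] if token.startswith("_") else token
    issues = []
    if identifier == "":
        issues.append("empty identifier")
    elif identifier[0].isdigit():
        issues.append("identifier starts with a digit")  # point (1)
    elif identifier.startswith("_"):
        issues.append("identifier starts with an underscore")  # point (2)
    return issues


for tok in ("_foo", "_42", "__foo"):
    print(tok, check_token(tok))
```

Under these assumptions "_foo" is clean, "_42" trips point (1), and "__foo" trips the MUST NOT in point (2).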
So I wonder if the regex for identifier token should be:
Sounds good, thanks.