This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
In 3.1, a couple of places say:
    the identifier [of some construct] is the value of the identifier token with any single leading U+005F LOW LINE ("_") character (underscore) removed.
(1) The significance of the word 'single' is unclear. It could mean:
    (a) if the value of the identifier token begins with exactly one underscore, it is removed; or
    (b) if the value of the identifier token begins with one or more underscores, exactly one is removed.
E.g., if the value of the identifier token is "__foo", is the identifier (a) '__foo' or (b) '_foo'?
(2) The production:
    identifier = [A-Z_a-z][0-9A-Z_a-z]*
allows an identifier token whose value is a single underscore. This seems to imply that the resulting identifier has no characters. Is that intended?
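To make the two readings concrete, here is a minimal Python sketch. The function names strip_if_exactly_one and strip_one are hypothetical, chosen only for illustration; the regex is the identifier production quoted above. Nothing here is taken from the spec beyond that production.

```python
import re

# Identifier token production as quoted in the report.
IDENTIFIER_TOKEN = re.compile(r"[A-Z_a-z][0-9A-Z_a-z]*")


def strip_if_exactly_one(token: str) -> str:
    """Reading (a): remove the underscore only if the token begins with exactly one."""
    if token.startswith("_") and not token.startswith("__"):
        return token[1:]
    return token


def strip_one(token: str) -> str:
    """Reading (b): remove exactly one leading underscore, however many there are."""
    return token[1:] if token.startswith("_") else token


if __name__ == "__main__":
    for token in ["foo", "_foo", "__foo", "_"]:
        # Only tokens accepted by the production are interesting here.
        assert IDENTIFIER_TOKEN.fullmatch(token) is not None
        print(repr(token), repr(strip_if_exactly_one(token)), repr(strip_one(token)))
```

The two readings differ only for tokens with two or more leading underscores, and under either reading the token "_" yields an empty identifier, which is question (2).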
I removed the "single"s and updated the regex for the identifier token.
http://dev.w3.org/cvsweb/2006/webapi/WebIDL/Overview.xml.diff?r1=1.673;r2=1.674;f=h
http://dev.w3.org/cvsweb/2006/webapi/WebIDL/v1.xml.diff?r1=1.115;r2=1.116;f=h
I've since noticed two things:
(1) Both the old and new regex allow an identifier token such as _42, which would lead to an identifier of just 42, which seems undesirable.
(2) The spec says: "The identifier of any of the abovementioned IDL constructs MUST NOT ... begin with a U+005F LOW LINE ("_") character." This would seem to disallow an identifier token (or at least one that provides the identifier for some IDL construct) from having two or more leading underscores, which would obviate the original question about underscore removal.
So I wonder if the regex for the identifier token should be:
    _?[A-Za-z][0-9A-Z_a-z]*
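As a quick check of that suggestion, the following Python sketch compares the originally reported production with the proposed one. The names OLD, PROPOSED, accepts and identifier_of are hypothetical helpers for illustration; the intermediate regex from the first fix is not quoted in this thread, so it is not reproduced here.

```python
import re

# Production from the original report.
OLD = re.compile(r"[A-Z_a-z][0-9A-Z_a-z]*")
# Production proposed in the comment above.
PROPOSED = re.compile(r"_?[A-Za-z][0-9A-Z_a-z]*")


def accepts(pattern: re.Pattern, token: str) -> bool:
    """True if the whole token matches the production."""
    return pattern.fullmatch(token) is not None


def identifier_of(token: str) -> str:
    """Identifier derived from a token: a leading underscore, if any, is removed."""
    return token[1:] if token.startswith("_") else token


if __name__ == "__main__":
    for token in ["foo", "_foo", "foo_bar", "_", "_42", "__foo"]:
        print(f"{token!r}: old={accepts(OLD, token)} proposed={accepts(PROPOSED, token)}")
```

With the proposed production, every accepted token yields an identifier that starts with a letter: "_", "_42" and "__foo" are all rejected, so the empty identifier, the all-digit identifier and the leading-underscore identifier cannot arise.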
Sounds good, thanks.
http://dev.w3.org/cvsweb/2006/webapi/WebIDL/Overview.xml.diff?r1=1.687;r2=1.688;f=h
http://dev.w3.org/cvsweb/2006/webapi/WebIDL/v1.xml.diff?r1=1.129;r2=1.130;f=h