This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 20453 - 3.1: two questions re underscore removal
Summary: 3.1: two questions re underscore removal
Status: CLOSED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: WebIDL
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Cameron McCormack
QA Contact: public-webapps-bugzilla
URL:
Whiteboard: [v1]
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-20 04:55 UTC by Michael Dyck
Modified: 2013-08-06 04:10 UTC
CC List: 2 users

See Also:


Description Michael Dyck 2012-12-20 04:55:45 UTC
In 3.1, a couple of places say:
    the identifier [of some construct] is the value of the identifier token with
    any single leading U+005F LOW LINE ("_") character (underscore) removed.

(1)
The significance of the word 'single' is unclear. It could mean:
    (a) if the value of the identifier token begins with exactly one underscore,
        it is removed.
or
    (b) if the value of the identifier token begins with one or more underscores,
        exactly one is removed.

E.g., if the value of the identifier token is "__foo", is the identifier
(a) '__foo' or (b) '_foo' ?
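
To make the two readings concrete, here is a minimal Python sketch. The function names are hypothetical and do not come from the spec; each one only encodes one interpretation of "any single leading underscore removed".

```python
def strip_if_exactly_one(token: str) -> str:
    """Reading (a): remove the underscore only if the token begins with exactly one."""
    if token.startswith("_") and not token.startswith("__"):
        return token[1:]
    return token


def strip_one(token: str) -> str:
    """Reading (b): if the token begins with one or more underscores, remove exactly one."""
    return token[1:] if token.startswith("_") else token


# Show where the two readings agree and where they differ.
for tok in ("foo", "_foo", "__foo"):
    print(f"{tok!r}: (a) {strip_if_exactly_one(tok)!r}  (b) {strip_one(tok)!r}")
```

The readings agree on "foo" and "_foo" and differ only for tokens with two or more leading underscores, such as "__foo".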


(2)
The production:
    identifier = [A-Z_a-z][0-9A-Z_a-z]*
allows an identifier token whose value is a single underscore.
This seems to imply that the resulting identifier has no characters.
Is that intended?
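
A small Python sketch of this case, assuming the token production quoted above and a hypothetical helper `identifier_from_token` that removes one leading underscore (under either reading of "single", a lone underscore is removed):

```python
import re

# The identifier token production as quoted in this report.
IDENTIFIER_TOKEN = re.compile(r"[A-Z_a-z][0-9A-Z_a-z]*")


def identifier_from_token(token: str) -> str:
    """Hypothetical helper: validate the token, then drop one leading underscore."""
    if IDENTIFIER_TOKEN.fullmatch(token) is None:
        raise ValueError(f"not an identifier token: {token!r}")
    return token[1:] if token.startswith("_") else token


# A lone underscore is a valid token, but the resulting identifier is empty.
print(repr(identifier_from_token("_")))
```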
Comment 1 Cameron McCormack 2013-08-04 05:46:59 UTC
I removed the "single"s and updated the regex for the identifier token.

http://dev.w3.org/cvsweb/2006/webapi/WebIDL/Overview.xml.diff?r1=1.673;r2=1.674;f=h
http://dev.w3.org/cvsweb/2006/webapi/WebIDL/v1.xml.diff?r1=1.115;r2=1.116;f=h
Comment 2 Michael Dyck 2013-08-04 20:24:29 UTC
I've since noticed two things:

(1) Both the old and new regex allow an identifier token such as _42,
    which would lead to an identifier of just 42, which seems undesirable.

(2) "The identifier of any of the abovementioned IDL constructs MUST NOT ...
    begin with a U+005F LOW LINE ("_") character."
    Which would seem to disallow an identifier token (or at least one that
    provides the identifier for some IDL construct) from having two or more
    leading underscores (which would obviate the original question about
    underscore removal).

So I wonder if the regex for identifier token should be:

    _?[A-Za-z][0-9A-Z_a-z]*
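
As a rough check, the following Python sketch compares the regex originally quoted in the description with the one proposed here. The variable names and the helper `accepts` are illustrative only, and the regex actually committed in Comment 1 is not reproduced in this report, so it is not tested.

```python
import re

# Regex quoted in the original description.
ORIGINAL = re.compile(r"[A-Z_a-z][0-9A-Z_a-z]*")
# Regex proposed in this comment: at most one leading underscore, then a letter.
PROPOSED = re.compile(r"_?[A-Za-z][0-9A-Z_a-z]*")


def accepts(pattern: re.Pattern, token: str) -> bool:
    """True if the whole token matches the pattern."""
    return pattern.fullmatch(token) is not None


for tok in ("foo", "_foo", "foo_bar", "_", "_42", "__foo"):
    print(f"{tok!r}: original={accepts(ORIGINAL, tok)} proposed={accepts(PROPOSED, tok)}")
```

Under the proposed pattern, "_", "_42" and "__foo" are no longer identifier tokens, so removing the leading underscore can never yield an empty identifier, one that starts with a digit, or one that still starts with an underscore.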