Bug 24959 - "Exposed=Window,Worker" will be parsed to 2 extended attributes
Summary: "Exposed=Window,Worker" will be parsed to 2 extended attributes
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: WebIDL (show other bugs)
Version: unspecified
Hardware: PC Linux
: P2 normal
Target Milestone: ---
Assignee: Cameron McCormack
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-06 18:35 UTC by Stefan Haustein
Modified: 2014-07-29 15:51 UTC (History)
8 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Haustein 2014-03-06 18:35:58 UTC
http://heycam.github.io/webidl/#Exposed

[[
Exposed=Window,Worker
]]

If I understand the grammar correctly, this is parsed to two different extended attributes, "Exposed=Window" and "Worker", see http://heycam.github.io/webidl/#prod-ExtendedAttribute

I assume this is not intended here. Perhaps use a different delimiter, e.g. ';' or whitespace?
Comment 1 Nils Barth 2014-05-02 02:30:03 UTC
I believe that the issue is that extended attribute phrase grammar is *context-sensitive*, specifically due to collision of , as a extended attribute separator  (*between* extended attributes) and as an identifier separator (*within* an extended attribute).

Cameron, is this intended?

This makes parsing extended attributes *much* more difficult, because we need to do it during semantic analysis, not during parsing.

Consider supporting both [Exposed=Window,Worker] and [Global]
(grammar-wise: these are both valid interface extended attributes).

Writing this as:
[Exposed=Window,Worker,Global]
...is ambiguous, and requires complex semantic analysis just to *parse*.

Writing this as:
[Exposed=Window,Worker, Global]
...seems to disambiguate, but requires white space to be significant.

Using a different separator fixes this:
[Exposed=Window;Worker,Global]

In Blink we've been using | for extended attribute value separators, so we'd write this as:
[Exposed=Window|Worker,Global]

...or actually (style-wise):
[
    Exposed=Window|Worker,
    Global,
] interface Foo { ... };

Specifically, the problem is with the production rule ExtendedAttributeIdentList,
which allows one to write [Exposed=Window,Worker]
http://heycam.github.io/webidl/#prod-ExtendedAttributeIdentList

It's currently used for 3 extended attributes:
[Exposed], [Global], and [PrimaryGlobal]


The underlying issue is that the extended attribute phrase grammar is *context-sensitive*, and thus needs to be parsed during semantic analysis.

This is stated at:
http://heycam.github.io/webidl/#idl-extended-attributes
"""
Any extended attribute encountered in an IDL fragment is matched against the following six grammar symbols to determine which form (or forms) it is in:
ExtendedAttributeNoArgs
ExtendedAttributeArgList
ExtendedAttributeNamedArgList
ExtendedAttributeIdent
ExtendedAttributeIdentList
ExtendedAttributeTypePair
...
Each extended attribute definition will state which of the above six forms are allowed.
"""
I.e., the phrase grammar depends on the ExtendedAttribute (the value of the key).

This is *not* a serious issue for argument lists, like [Constructor(foo x, bar y)], since the () delimiters set this off, allowing context-free parsing.

However, for bare identifier lists, this causes parsing problems.

Could we change this?
Comment 2 Nils Barth 2014-05-02 08:10:08 UTC
Actually, how about making it a tuple, i.e., surrounding it in ()?

That just requires changing ExtendedAttributeIdentList
...from:
ExtendedAttributeIdentList → identifier "=" IdentifierList
...to:
ExtendedAttributeIdentList → identifier "=" "(" IdentifierList ")"

...and then allowing both ExtendedAttributeIdent and ExtendedAttributeIdentList.

Thus we could unambiguously write
[Exposed=(Window, Worker), Global]
...or:
[Exposed=Window, Global]
Comment 3 Dirk Schulze 2014-05-02 09:56:38 UTC
WebKit and Blink just use the '&' operator. This seems to be a good compromise in comparison to a more complex grammar:

Exposed=Window&Worker
Comment 4 Nils Barth 2014-05-02 10:01:52 UTC
Thanks Dirk: yup, the other alternative is to use a different separator.

In Blink, we use | or & for separating (depending on which is clearer in context); probably cleaner-looking would be ; as suggested by OP, though I think a tuple with commas (..., ...) is probably clearer still.
Comment 5 Dirk Schulze 2014-05-02 10:14:43 UTC
(In reply to Nils Barth from comment #4)
> Thanks Dirk: yup, the other alternative is to use a different separator.
> 
> In Blink, we use | or & for separating (depending on which is clearer in
> context); probably cleaner-looking would be ; as suggested by OP, though I
> think a tuple with commas (..., ...) is probably clearer still.

I fear that the grammar gets more complex. So far it is just Exposed that has this issue (at least in the WebIDL spec). Do we expect that we get into situations where tuples are the only choice and a different delimiter can't do the job? Something like nested tuples: ((.. , ..) , (.. , ..)) ? I hope not.
Comment 6 Christophe Dumez 2014-05-02 12:15:47 UTC
(In reply to Dirk Schulze from comment #5)
> (In reply to Nils Barth from comment #4)
> > Thanks Dirk: yup, the other alternative is to use a different separator.
> > 
> > In Blink, we use | or & for separating (depending on which is clearer in
> > context); probably cleaner-looking would be ; as suggested by OP, though I
> > think a tuple with commas (..., ...) is probably clearer still.
> 
> I fear that the grammar gets more complex. So far it is just Exposed that
> has this issue (at least in the WebIDL spec).

Actually, [Global] has the same issue:
[Global=Worker,DedicatedWorker]
Comment 7 Dirk Schulze 2014-05-02 12:49:03 UTC
(In reply to Christophe Dumez from comment #6)
> (In reply to Dirk Schulze from comment #5)
> > (In reply to Nils Barth from comment #4)
> > > Thanks Dirk: yup, the other alternative is to use a different separator.
> > > 
> > > In Blink, we use | or & for separating (depending on which is clearer in
> > > context); probably cleaner-looking would be ; as suggested by OP, though I
> > > think a tuple with commas (..., ...) is probably clearer still.
> > 
> > I fear that the grammar gets more complex. So far it is just Exposed that
> > has this issue (at least in the WebIDL spec).
> 
> Actually, [Global] has the same issue:
> [Global=Worker,DedicatedWorker]

No condition nesting so far.
Comment 8 Christophe Dumez 2014-05-02 13:45:32 UTC
(In reply to Dirk Schulze from comment #7)
> (In reply to Christophe Dumez from comment #6)
> > (In reply to Dirk Schulze from comment #5)
> > > (In reply to Nils Barth from comment #4)
> > > > Thanks Dirk: yup, the other alternative is to use a different separator.
> > > > 
> > > > In Blink, we use | or & for separating (depending on which is clearer in
> > > > context); probably cleaner-looking would be ; as suggested by OP, though I
> > > > think a tuple with commas (..., ...) is probably clearer still.
> > > 
> > > I fear that the grammar gets more complex. So far it is just Exposed that
> > > has this issue (at least in the WebIDL spec).
> > 
> > Actually, [Global] has the same issue:
> > [Global=Worker,DedicatedWorker]
> 
> No condition nesting so far.

That's right. I was just saying that [Exposed] and [Global] have the same issue. I am personally in favor of using another separator than the comma. We use the ampersand ('&') in Blink and I think this is fine.
Comment 9 Cameron McCormack 2014-07-22 22:51:14 UTC
I've gone with Nils' suggestion of parenthesizing the identifier list, but also allowing a single bare identifier.

https://github.com/heycam/webidl/commit/0527ec461492849d310257c50a7d2c54b4632b91
Comment 10 Dirk Schulze 2014-07-23 06:04:38 UTC
(In reply to Cameron McCormack from comment #9)
> I've gone with Nils' suggestion of parenthesizing the identifier list, but
> also allowing a single bare identifier.
> 
> https://github.com/heycam/webidl/commit/
> 0527ec461492849d310257c50a7d2c54b4632b91

I really don't think that this is a good idea.

With parenthesizing we have a slightly more complex IDL definition that needs to be maintained. IMO a IDL code generator must be as light weighted as possible and as maintainable as possible. After all, we are trusting the generator to produce code so that we don't need to do it manually. Everything that is less error prone is good for WebIDL. IMO a new separator is easier then scanning for start and end parenthesis.

Another argument in favor of a new separator is that we have at least two implementations with experience in it.
Comment 11 Boris Zbarsky 2014-07-23 06:45:03 UTC
Dirk, what is the grammar that you'd propose, exactly?  I'd like to see whether it's actually simpler than the grammar that ended up in the spec...  I assume you can just copy the grammar rules the Blink or WebKit IDL parser uses for extended attributes here.
Comment 12 Dirk Schulze 2014-07-23 07:19:53 UTC
(In reply to Boris Zbarsky from comment #11)
> Dirk, what is the grammar that you'd propose, exactly?  I'd like to see
> whether it's actually simpler than the grammar that ended up in the spec... 
> I assume you can just copy the grammar rules the Blink or WebKit IDL parser
> uses for extended attributes here.

To take:

    Exposed=(Window,Worker, Global)

as example, it would look like this in WebKit and Blink:

    Exposed=Window&Worker&Global

I do not advocate '&' as the separator of choice, even though I think it makes sense.
Comment 13 Boris Zbarsky 2014-07-23 12:52:55 UTC
I didn't ask what the syntax looks like; I know that part.  I asked what the grammar rules look like.  You're claiming that the current grammar rules in the spec are too complex.  The syntax you list does not match the old spec grammar, as far as I can tell, so presumably you have some other, simpler, grammar rules you are proposing.  What are those grammar rules?
Comment 14 Boris Zbarsky 2014-07-25 16:25:02 UTC
Alright, given lack of response, I'll assume it's just "I don't want to change my code" grumbling.  I'll be going ahead with the current spec text in Gecko.
Comment 15 Dirk Schulze 2014-07-25 16:34:20 UTC
(In reply to Boris Zbarsky from comment #14)
> Alright, given lack of response, I'll assume it's just "I don't want to
> change my code" grumbling.  I'll be going ahead with the current spec text
> in Gecko.

I am sure you didn't need my help to find the relevant code. You proved that at previous occasions. For the bug report record:

    my @callWithKeywords = split /\s*\&\s*/, $callWith;
Comment 16 Boris Zbarsky 2014-07-25 16:52:46 UTC
Dirk, that's not a grammar rule.  That's a regular expression match on an already-tokenized thing.  The grammar rule would be the thing that determines what exact text is in $callWith in that line.

And just because I'm willing to take the time to go read WebKit's code generator once doesn't mean I want to do it every time.
Comment 17 Boris Zbarsky 2014-07-25 17:46:33 UTC
Though, the grammar rules in the spec are no longer one-token lookahead.  This is what the spec has, basically:

ExtendedAttributeIdent	→	identifier "=" identifier
ExtendedAttributeIdentList	→	identifier "=" IdentOrParenthesizedIdentifierList
IdentOrParenthesizedIdentifierList	→	identifier
 | "(" IdentifierList ")"

So [Foo=Bar] it's ambiguous: is that an ExtendedAttributeIdent or an ExtendedAttributeIdentList without the parens?

It's not clear to me why we can't just allow the Exposed extended attribute value to match either ExtendedAttributeIdent or ExtendedAttributeIdentList and then require the latter to be parenthesized.
Comment 18 Boris Zbarsky 2014-07-25 18:10:40 UTC
So specifically, I'm proposing:

ExtendedAttributeIdent	→	identifier "=" identifier
ExtendedAttributeIdentList	→	identifier "=" "(" IdentifierList ")"
Comment 19 Nils Barth 2014-07-25 19:44:41 UTC
Boris, that sounds sensible (single ident vs. ident list); that way:
[Foo=Bar]
[Foo=(Bar)]
[Foo=(Bar, Zork)]
...are all unambiguous.
Comment 21 Nils Barth 2014-07-29 15:51:46 UTC
Thanks Cameron!