17452 – WebIDL: at some places in the grammar you probably intend mandatory whitespace

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED INVALID

Alias:	None

Product:	WebAppsWG
Classification:	Unclassified
Component:	WebIDL
Version:	unspecified
Hardware:	PC All

Importance:	P2 major
Target Milestone:	---
Assignee:	Cameron McCormack
QA Contact:	public-webapps-bugzilla

Reported:	2012-06-09 10:22 UTC by Wolfgang Keller
Modified:	2012-06-22 06:24 UTC
CC List:	2 users

Description Wolfgang Keller 2012-06-09 10:22:58 UTC

To quote from the WebIDL specification:

"Implicitly, the whitespace terminal is allowed between every terminal in the input text being parsed. Such whitespace terminals, which actually encompass both whitespace and comments, are ignored while parsing."

I believe that in some places the grammar needs mandatory whitespace between terminals, because otherwise it would probably be ambiguous.

An example of such a rule is
[25]	ImplementsStatement	→	identifier "implements" identifier ";"

Here you surely want to require whitespace between identifier and "implements", because otherwise we could not tell whether

"fooimplementsbarimplementsbluv" stands for
"fooimplementsbar implements bluv"
or
"foo implements barimplementsbluv".

You should mark every place in the grammar where whitespace is mandatory.
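
To make the concern concrete, here is a minimal Python sketch that lists every way the example string could be read as identifier "implements" identifier if a tokenizer were free to cut the input at any position. The function name naive_splits is hypothetical and used only for illustration, and the sketch does not check that each half is a valid identifier.

```python
KEYWORD = "implements"


def naive_splits(text):
    """Return every (left, right) pair with text == left + KEYWORD + right.

    Assumption for illustration: the tokenizer may cut the input anywhere,
    i.e. there is no longest-match rule.
    """
    splits = []
    start = text.find(KEYWORD)
    while start != -1:
        left = text[:start]
        right = text[start + len(KEYWORD):]
        # Both sides must be non-empty to stand in for identifiers.
        if left and right:
            splits.append((left, right))
        start = text.find(KEYWORD, start + 1)
    return splits


for left, right in naive_splits("fooimplementsbarimplementsbluv"):
    print(f"{left} implements {right}")
```

Under that assumption the string has exactly the two readings listed above.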

Comment 1 Cameron McCormack 2012-06-22 06:24:45 UTC

"fooimplementsbarimplementsbluv" must be tokenised as a single identifier token, because of the rule that says to tokenise the longest thing it can:

  When tokenizing, the longest possible match MUST be used. For example, if the
  input text is “a1”, it is tokenized as a single identifier, and not as a
  separate identifier and integer. If the longest possible match could match both
  an identifier and one of the quoted terminal symbols from the grammar, it MUST
  be tokenized as the quoted terminal symbol. Thus, the input text “long” is
  tokenized as the quoted terminal symbol "long" rather than an identifier called
  “long”.

So I don't think there is a need to annotate in the grammar where explicit white space is required.
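
The longest-match rule quoted above can be sketched with a small Python tokenizer. This is an illustration under stated assumptions, not the WebIDL lexical grammar: the identifier and integer patterns are simplified, KEYWORDS and PUNCTUATION hold only a small sample of the quoted terminals, and the function name tokenize is hypothetical.

```python
import re

# Simplified patterns (assumptions for illustration, not the spec's exact ones).
IDENTIFIER = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")
INTEGER = re.compile(r"[0-9]+")
WHITESPACE = re.compile(r"\s+")
KEYWORDS = {"implements", "long"}  # small sample of quoted terminals
PUNCTUATION = {";"}


def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Whitespace is allowed between tokens and is skipped.
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        # Collect every pattern that matches at this position.
        candidates = []
        for kind, pattern in (("identifier", IDENTIFIER), ("integer", INTEGER)):
            m = pattern.match(text, pos)
            if m:
                candidates.append((m.end() - pos, kind, m.group()))
        if text[pos] in PUNCTUATION:
            candidates.append((1, "terminal", text[pos]))
        if not candidates:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        # Longest possible match wins.
        length, kind, lexeme = max(candidates, key=lambda c: c[0])
        # If the longest match is also a quoted terminal, the terminal wins.
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "terminal"
        tokens.append((kind, lexeme))
        pos += length
    return tokens


print(tokenize("fooimplementsbarimplementsbluv"))
print(tokenize("foo implements bar;"))
```

In this sketch the first input comes out as one identifier token, while the second yields an identifier, the "implements" terminal, another identifier, and ";". The whitespace separates tokens only because the longest match stops at it, not because the grammar demands it.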