This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Here is my suggested alternate wording (referred to in several of my recent comments) for the content of section A.2, and the three 'restrictive' grammar-notes of A.1.1.

-- It incorporates suggestions from my recent comments.
-- It resolves some of the open-ended comments.
-- It expresses the three 'restrictive' grammar-notes in a more uniform way.
-- It doesn't need the "longest match" rule.
-- It doesn't need to define "terminal" (much less "delimiting" and "non-delimiting") or introduce long lists of symbols.
-- It defines the language without invoking parsing, but also includes remarks about how the language design affects parser construction.
-- It is generally more precise and concise (except for the syntax trees!), and yet (I think) it's also more readable, because it proceeds in a more logical fashion.

----------------------------------------

It needs this production added to A.1 EBNF:

    [154] Filler ::= ( #x20 | #x9 | #xD | #xA | Comment )+   /* ws: explicit */

(Alternatively,

    [154] Filler ::= ( S | Comment )+   /* ws: explicit */

might work.)

A.2 Whitespace Characters and Filler

A 'whitespace character' is any of the characters referenced in the right-hand-side of [...#NT-S].

In terms of this grammar, there are two mechanisms by which whitespace characters can appear in queries: explicit and implicit.

Explicit: In the EBNF, there are places where whitespace characters are specifically allowed, either via references to the symbols S or Char, or via the notation [^abc] (which implicitly involves Char). (Note that comments are *not* allowed via this mechanism.) For instance, whitespace characters are specified explicitly in the productions for direct constructors, in order to be more consistent with the corresponding constructs in the XML grammar.

Implicit: Whitespace characters and comments (collectively known as "filler") can be used in most expressions even though not explicitly allowed by the EBNF.
Specifically, for each production that is not marked 'ws: explicit', filler is allowed between any two symbols that the production directly derives. For example,

    [51] MultiplicativeExpr ::= UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*

directly derives infinitely many finite sequences of symbols, one of which is

    UnionExpr "*" UnionExpr "div" UnionExpr

Filler is allowed in the 4 'gaps' between these 5 symbols. In effect, the derivation step would result in:

    UnionExpr Filler? "*" Filler? UnionExpr Filler? "div" Filler? UnionExpr

Note that this interpolation of filler only applies to one derivation step, the one directly involving the production in question; subsequent derivation steps will use other productions, which might have a 'ws: explicit' marking (and thus would *not* interpolate filler between symbols).

Filler is also allowed at the start and end of a module.

A.3 Culls

The preceding two sections define a language containing all syntactically legal queries. However, it also includes some syntax trees / queries which we decree to be illegal. We do this in order to either eliminate ambiguity or make parsing easier. This section specifies those illegal queries.

Note that when we say a syntax tree is illegal, this doesn't necessarily mean that the resulting query (i.e., sequence of characters) is illegal. It's possible that there is a legal syntax tree resulting in the same query, in which case the query is legal. There's an example of this later.

Definition: A 'keyword' is a symbol that appears in the EBNF as a quoted string, such that the characters inside the quotes conform to the syntax of an NCName (e.g., "while", "preceding-sibling").

(1) In two cases, the filler that is merely 'allowed' in A.2 is required; i.e., it is illegal for the interpolated 'Filler?' to derive the empty string.
The cases are specified by the presence, in the syntax tree, of certain symbols to the left and right of the empty filler:

(a) a keyword on one side (either side), and a keyword, QName, NCName, NumericLiteral, or StringLiteral on the other.

For instance, consider the abbreviated syntax tree

                        Expr
                          |
                 MultiplicativeExpr
                          |
        +---------+-------+-------+--------+
        |         |       |       |        |
    UnionExpr  Filler?  "div"  Filler? UnionExpr
        |         |       |       |        |
 NumericLiteral   |       |       | NumericLiteral
        |         |       |       |        |
        ++        |      +++      |        |
        ||        |      |||      |        |
        10       [?]     div     [?]       3

The "div" is a keyword, so in both filler positions (bounded by NumericLiteral/keyword and keyword/NumericLiteral respectively), it is illegal to have empty filler. This leads to the conclusion that

    10 div 3
    10 div(:comment:)3

(for instance) are legal queries resulting from this tree, but

    10div 3
    10 div3
    10div3

are illegal queries.

(b) a keyword, QName, or NCName on the left, and "-" or "." on the right.

For instance, consider the abbreviated syntax tree

                     Expr
                       |
                 AdditiveExpr
                       |
        +-------+------+------+-------+
        |       |      |      |       |
    MultExpr Filler?  "-"  Filler? MultExpr
        |       |      |      |       |
      QName     |      |      |     QName
        |       |      |      |       |
       +++      |      |      |      +++
       |||      |      |      |      |||
       foo     [?]     -     [?]     foo

In the first filler position (bounded by QName/"-"), it is illegal to have empty filler. In the second position (bounded by "-"/QName), there is no such constraint. Thus,

    foo - foo
    foo -foo
    foo(: comment :)- foo
    foo(: comment :)-foo

are all legal queries resulting from this tree, but

    foo- foo

is illegal. The query

    foo-foo

is illegal as far as *this* tree is concerned, but it happens to be legal via a different syntax tree:

      Expr
       |
     QName
       |
    +++++++
    |||||||
    foo-foo

because hyphen is a valid name character.
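The interpolation from A.2 and the two cases of cull (1) can be sketched in a few lines of Python. This is only an illustration of the rules as worded above, not part of the proposed spec text: the function names, the string encoding of symbols (quoted strings for terminals, bare names for nonterminals), and the ASCII-only NCName pattern are all assumptions made for the sketch. The caller is assumed to apply interpolate_filler only to productions that are not marked 'ws: explicit'.

```python
import re

# ASCII-only approximation of NCName, for illustration only (the real
# definition allows many more characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

NAME_SYMBOLS = {"QName", "NCName"}
LITERAL_SYMBOLS = {"NumericLiteral", "StringLiteral"}


def is_keyword(symbol):
    """True for a quoted EBNF string whose contents look like an NCName."""
    quoted = len(symbol) >= 2 and symbol[0] == '"' and symbol[-1] == '"'
    return quoted and NCNAME.fullmatch(symbol[1:-1]) is not None


def interpolate_filler(symbols):
    """Insert 'Filler?' into every gap of one directly derived sequence."""
    result = []
    for index, symbol in enumerate(symbols):
        if index > 0:
            result.append("Filler?")
        result.append(symbol)
    return result


def filler_required(left, right):
    """Cull (1): must the filler between `left` and `right` be non-empty?"""
    def word_like(symbol):
        return (is_keyword(symbol)
                or symbol in NAME_SYMBOLS
                or symbol in LITERAL_SYMBOLS)

    # (1a): a keyword on either side, with a keyword, name, or literal opposite.
    if (is_keyword(left) and word_like(right)) or (is_keyword(right) and word_like(left)):
        return True
    # (1b): a keyword or name on the left, with "-" or "." on the right.
    return (is_keyword(left) or left in NAME_SYMBOLS) and right in ('"-"', '"."')
```

With these definitions, filler_required("QName", '"-"') is true while filler_required('"-"', "QName") is false, which matches the asymmetry in the foo - foo example.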
(2) [grammar-note: occurrence-indicators]

Consider these abbreviated syntax trees:

                                      AdditiveExpr
                                            |
                       +--------------------+----------+-----------+
                       |                               |           |
              MultiplicativeExpr                      "-" MultiplicativeExpr
                       |                               |           |
                   TreatExpr                           |           |
                       |                               |           |
        +-----------+--+---+-------------+             |           |
        |           |      |             |             |           |
  CastableExpr   "treat"  "as"     SequenceType        |           |
        |           |      |             |             |           |
        |           |      |       +-----+-----+       |           |
        |           |      |       |           |       |           |
        |           |      |   ItemType  OccIndicator  |           |
        |           |      |       |           |       |           |
        4         treat   as    item()         +       -           5

                                      AdditiveExpr
                                            |
                       +--------------------+----+-----------+
                       |                         |           |
              MultiplicativeExpr                "+" MultiplicativeExpr
                       |                         |           |
                   TreatExpr                     |       UnaryExpr
                       |                         |           |
        +-----------+--+---+-------------+       |       +---+---+
        |           |      |             |       |       |       |
  CastableExpr   "treat"  "as"     SequenceType  |      "-"  ValueExpr
        |           |      |             |       |       |       |
        |           |      |         ItemType    |       |       |
        |           |      |             |       |       |       |
        4         treat   as          item()     +       -       5

This illustrates an ambiguity in the EBNF, which is resolved by making the second tree illegal. Specifically, a syntax tree is illegal if an ItemType is followed by a "+", "*", or "?" that is *not* an OccurrenceIndicator. (The presence or absence of filler makes no difference to the illegality.)

(Thus, a parser, having recognized an ItemType, and seeing a "+", "*", or "?", can be certain that the latter is an OccurrenceIndicator.)

Note that the elimination of such trees does not create a semantic hole in the language: one can easily construct a legal query that is semantically equivalent to one of these illegal trees, simply by parenthesizing the expression that most closely contains the ItemType. (The illegal trees can only occur when the ItemType is within an expression.)

(3) [grammar-note: leading-lone-slash]

Consider the two abbreviated syntax trees:

            Expr                                  Expr
             |                                      |
         PathExpr                          MultiplicativeExpr
             |                                      |
      +------+------+                       +-------+-------+
      |             |                       |       |       |
     "/"    RelativePathExpr            UnionExpr  "*"  UnionExpr
      |             |                       |       |       |
      |         StepExpr                PathExpr    |       |
      |             |                       |       |       |
      |         Wildcard                   "/"      |       |
      |             |                       |       |       |
      /             *                       /       *       5

Although there isn't an ambiguity, some parsers would have trouble distinguishing between these two alternatives.
Similarly with:

            Expr                                    Expr
             |                                        |
         PathExpr                                 UnionExpr
             |                                        |
      +------+------+                       +---------+---------+
      |             |                       |         |         |
     "/"    RelativePathExpr            IntExExpr  "union"  IntExExpr
      |             |                       |         |         |
      |         StepExpr                PathExpr      |         |
      |             |                       |         |         |
      |           QName                    "/"        |         |
      |             |                       |         |         |
      /           union                     /       union       $x

Therefore, in each pair, the tree on the right is deemed illegal. Specifically, a syntax tree is illegal if it contains:
-- a PathExpr that only derives "/"
followed by:
-- "*" or a keyword.

(Thus, a parser, seeing (in an expression context) a slash followed by a star or what-could-be-a-keyword, can be certain that the slash is not a complete PathExpr, but rather the start of a PathExpr.)

Again, there is no semantic hole: one can easily construct a legal query that is semantically equivalent to one of these illegal trees, simply by putting the "lone slash" in parentheses.

(4) [grammar-note: reserved-function-names]

Consider these abbreviated syntax trees:

              Expr                                    Expr
                |                                       |
             IfExpr                               FunctionCall
                |                                       |
      +----+----+-----+-----+--....         +-----+-----+-+--------+
      |    |    |     |     |               |     |       |        |
    "if"  "(" Expr   ")" "then"           QName  "(" ExprSingle   ")"
      |    |    |     |     |               |     |       |        |
     if    (   foo    )   then             if     (      foo       )

Although there isn't an ambiguity, some parsers would be unable to distinguish between these two alternatives. Therefore, the right-hand tree is deemed illegal. Specifically, a syntax tree is illegal if it contains a FunctionCall whose QName:
-- does not have a prefix, and
-- has a local-part that is one of the following NCNames:
       attribute
       comment
       etc

(Thus, a parser, seeing (in an expression context) one of those words followed by an open parenthesis, can be certain that it is not the start of a FunctionCall.)

[Way to construct semantically equivalent query?]
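The parenthetical "Thus, a parser ..." remarks for culls (2) and (4) amount to simple token-level decisions. The following Python sketch is one hypothetical way to express them; the function names and the token encoding are assumptions, and the reserved-name set is deliberately incomplete because the list above is elided ("attribute comment etc"). The name "if" is included only because the example trees use it.

```python
# Deliberately incomplete: the proposal elides the full list.
RESERVED_FUNCTION_NAMES = {"attribute", "comment", "if"}

OCCURRENCE_CHARS = {"+", "*", "?"}


def token_after_item_type(token):
    """Cull (2): after an ItemType, '+', '*' or '?' must be an OccurrenceIndicator."""
    return "OccurrenceIndicator" if token in OCCURRENCE_CHARS else "other"


def may_start_function_call(qname, next_token):
    """Cull (4): can `qname` followed by `next_token` begin a FunctionCall?"""
    if next_token != "(":
        return False
    prefix, colon, local_part = qname.rpartition(":")
    if colon:
        # A prefixed QName is never excluded by this cull.
        return True
    return local_part not in RESERVED_FUNCTION_NAMES
```

Under these assumptions, may_start_function_call("if", "(") is false, so "if (" can only begin an IfExpr, while a prefixed name such as "local:if" (a hypothetical example) remains usable as a function name.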
As pointed out by Michael Kay over at Bug 1368, there *is* an ambiguity involving leading-lone-slash, which is a more compelling argument than the merely "troublesome" case that the spec (and then I) used. So please slot this replacement into A.3 (3) above:

(3) [grammar-note: leading-lone-slash]

Consider the two abbreviated syntax trees:

            Expr                                  Expr
              |                                     |
          PathExpr                         MultiplicativeExpr
              |                                     |
     +--------+--------+                   +--------+--------+
     |                 |                   |        |        |
    "/"        RelativePathExpr        UnionExpr   "*"   UnionExpr
     |                 |                   |        |        |
     |        +--------+--------+      PathExpr     |    PathExpr
     |        |        |        |          |        |        |
     |    StepExpr    "/"   StepExpr      "/"       |    +---+-----+
     |        |        |        |          |        |    |         |
     |    NameTest     |    NameTest       |        |   "/" RelativePathExpr
     |        |        |        |          |        |    |         |
     |    Wildcard     |      QName        |        |    |     StepExpr
     |        |        |        |          |        |    |         |
     /        *        /       foo         /        *    /        foo

This illustrates another ambiguity in the EBNF. Similarly with:

            Expr                                  Expr
              |                                     |
          PathExpr                              UnionExpr
              |                                     |
     +--------+--------+                   +--------+--------+
     |                 |                   |        |        |
    "/"        RelativePathExpr        IntExExpr "union" IntExExpr
     |                 |                   |        |        |
     |        +--------+--------+      PathExpr     |    PathExpr
     |        |        |        |          |        |        |
     |    StepExpr    "/"   StepExpr      "/"       |    +---+-----+
     |        |        |        |          |        |    |         |
     |    NameTest     |    NameTest       |        |   "/" RelativePathExpr
     |        |        |        |          |        |    |         |
     |      QName      |      QName        |        |    |     StepExpr
     |        |        |        |          |        |    |         |
     /      union      /       foo         /      union  /        foo

In each pair, the ambiguity is resolved by disallowing the tree on the right. Specifically, [etc as before]
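For the leading-lone-slash cull, the lookahead decision a parser would make can be sketched as below. This is a hypothetical illustration in Python, not text for the spec: the function name is invented, and SAMPLE_KEYWORDS contains only keywords that happen to appear in this thread, not the full set of quoted NCName-like strings in the EBNF. The function expresses only what cull (3) says; the rest of the grammar constrains what may follow a slash in other ways.

```python
# Sample of keywords that appear in this thread; an assumption for the sketch,
# not the complete keyword set of the grammar.
SAMPLE_KEYWORDS = {"div", "idiv", "mod", "union", "treat", "as", "if", "then"}


def cull3_allows_lone_slash(next_token):
    """Return False when '/' is followed by '*' or a keyword-spelled token.

    In that situation the slash cannot be a complete PathExpr; it must be
    the start of a longer PathExpr.
    """
    if next_token is None:
        # End of input: nothing follows the slash.
        return True
    return next_token != "*" and next_token not in SAMPLE_KEYWORDS
```

So for the input / union /foo, the first slash is followed by "union" and is therefore read as the start of a path whose first step is named union, which selects the left-hand tree of the second pair.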
(In reply to comment #0) I don't think this design or language is clearer than the existing design or language, though I may end up using some parts of it, the Filler production, for example. I certainly don't want to start using example parse trees. Once I'm done with all other modifications, the WG can compare that draft with this proposal and decide if they want to make a more radical change.
A joint meeting of the Query and XSLT working groups considered this comment on July 20, 2005. The WG does not agree that this proposal is a significant improvement over the existing text, and declines to adopt it.

If you do not agree with this resolution, please add a comment explaining why. If you wish to appeal the WG's decision to the Director, then change the Status of the record to Reopened. If we do not hear from you in the next two weeks, we will assume you agree with the WG decision.