This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 1390 - [XQuery] suggested alternate wording for A.2 and some A.1.1
Summary: [XQuery] suggested alternate wording for A.2 and some A.1.1
Status: CLOSED INVALID
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 1.0
Version: Last Call drafts
Hardware: All All
Importance: P2 major
Target Milestone: ---
Assignee: Scott Boag
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard: grammar
Keywords:
Depends on:
Blocks: 1367 1448
Reported: 2005-05-11 07:45 UTC by Michael Dyck
Modified: 2005-09-29 11:01 UTC (History)
CC List: 0 users

See Also:


Attachments

Description Michael Dyck 2005-05-11 07:45:38 UTC
Here is my suggested alternate wording (referred to in several of my recent
comments) for the content of section A.2, and the three 'restrictive'
grammar-notes of A.1.1.

--- It incorporates suggestions from my recent comments.
--- It resolves some of the open-ended comments.
--- It expresses the three 'restrictive' grammar-notes in a more uniform way.
--- It doesn't need the "longest match" rule.
--- It doesn't need to define "terminal" (much less "delimiting" and
    "non-delimiting") or introduce long lists of symbols.
--- It defines the language without invoking parsing, but also includes remarks
    about how the language design affects parser construction.
--- It is generally more precise and concise (except for the syntax trees!), and
    yet (I think) it's also more readable, because it proceeds in a more logical
    fashion.

----------------------------------------

It needs this production added to A.1 EBNF:
    [154] Filler ::= ( #x20 | #x9 | #xD | #xA | Comment )+ /* ws: explicit */
(Alternatively,
    [154] Filler ::= ( S | Comment )+  /* ws: explicit */
might work.)
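
A rough sketch of how a scanner might recognize the proposed Filler production. It assumes the XQuery 1.0 comment syntax "(:" ... ":)" with nesting allowed; the function names are hypothetical and not part of the proposal.

```python
# The four whitespace characters listed in the proposed production [154].
WHITESPACE = {"\x20", "\x09", "\x0D", "\x0A"}


def skip_comment(text, pos):
    """Return the index just past the comment starting at pos.

    Returns pos unchanged if no comment starts there. Nested comments
    are tracked with a depth counter (assumption: XQuery 1.0 comments nest).
    """
    if not text.startswith("(:", pos):
        return pos
    depth = 0
    i = pos
    while i < len(text):
        if text.startswith("(:", i):
            depth += 1
            i += 2
        elif text.startswith(":)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")


def match_filler(text, pos=0):
    """Return the end index of the longest Filler starting at pos.

    If no whitespace character or comment starts at pos, pos is returned,
    which corresponds to 'Filler?' deriving the empty string.
    """
    i = pos
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
        elif text.startswith("(:", i):
            i = skip_comment(text, i)
        else:
            break
    return i
```

For example, match_filler applied to a string that begins with a space and a nested comment returns the index of the first character after both.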


A.2 Whitespace Characters and Filler

    A 'whitespace character' is any of the characters referenced in the
    right-hand-side of [...#NT-S].

    In terms of this grammar, there are two mechanisms by which whitespace
    characters can appear in queries: explicit and implicit.

    Explicit: In the EBNF, there are places where whitespace characters are
    specifically allowed, either via references to the symbols S or Char, or
    via the notation [^abc] (which implicitly involves Char). (Note that
    comments are *not* allowed via this mechanism.)

    For instance, whitespace characters are specified explicitly in the
    productions for direct constructors, in order to be more consistent with
    the corresponding constructs in the XML grammar.

    Implicit: Whitespace characters and comments (collectively known as
    "filler") can be used in most expressions even though not explicitly
    allowed by the EBNF.  Specifically, for each production that is not
    marked 'ws: explicit', filler is allowed between any two symbols that
    the production directly derives.

    For example,
        [51] MultiplicativeExpr ::=
                        UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
    directly derives infinitely many finite sequences of symbols, one of
    which is
        UnionExpr "*" UnionExpr "div" UnionExpr
    Filler is allowed in the 4 'gaps' between these 5 symbols. In effect,
    the derivation step would result in:
        UnionExpr Filler? "*" Filler? UnionExpr Filler? "div" Filler? UnionExpr

    Note that this interpolation of filler only applies to one derivation step,
    the one directly involving the production in question; subsequent derivation
    steps will use other productions, which might have a 'ws: explicit' marking
    (and thus would *not* interpolate filler between symbols).
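
The interpolation described above can be sketched as follows. This is only an illustration: the function name and the representation of a derivation step as a list of symbol names are assumptions, not part of the proposed text.

```python
def interpolate_filler(symbols, ws_explicit=False):
    """Insert 'Filler?' into every gap between directly derived symbols.

    symbols: the sequence of symbols produced by ONE derivation step.
    ws_explicit: True if the production is marked 'ws: explicit', in
    which case no filler is interpolated.
    """
    if ws_explicit:
        return list(symbols)
    result = []
    for index, symbol in enumerate(symbols):
        if index > 0:
            result.append("Filler?")
        result.append(symbol)
    return result


# The MultiplicativeExpr example from the text: 5 symbols, 4 gaps.
step = ["UnionExpr", '"*"', "UnionExpr", '"div"', "UnionExpr"]
expanded = interpolate_filler(step)
```

Because the function is applied per derivation step, each later step decides for itself (via its own production's marking) whether filler is interpolated.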

    Filler is also allowed at the start and end of a module.

A.3 Culls

    The preceding two sections define a language containing all syntactically
    legal queries. However, it also includes some syntax trees / queries which
    we decree to be illegal.  We do this in order to either eliminate ambiguity
    or make parsing easier.  This section specifies those illegal queries.

    Note that when we say a syntax tree is illegal, this doesn't necessarily
    mean that the resulting query (i.e., sequence of characters) is illegal.
    It's possible that there is a legal syntax tree resulting in the same query,
    in which case the query is legal. There's an example of this later.

    Definition: A 'keyword' is a symbol that appears in the EBNF as a quoted
    string, such that the characters inside the quotes conform to the syntax
    of an NCName (e.g., "while", "preceding-sibling").
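
A minimal sketch of the keyword test, under a loud simplification: the pattern below only covers ASCII names, whereas the real NCName production also admits many non-ASCII characters. The function name is hypothetical.

```python
import re

# ASCII-only approximation of NCName (assumption: the full XML NCName
# production is wider than this and is deliberately not reproduced here).
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_keyword(quoted_contents):
    """Return True if the characters inside the quotes look like an NCName."""
    return _ASCII_NCNAME.fullmatch(quoted_contents) is not None
```

Under this test, "while" and "preceding-sibling" count as keywords, while punctuation such as "*" or "(" does not.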

    (1)
    In two cases, the filler that is merely 'allowed' in A.2 is required; i.e.,
    it is illegal for the interpolated 'Filler?' to derive the empty string.
    The cases are specified by the presence, in the syntax tree, of certain
    symbols to the left and right of the empty filler:

    (a) a keyword on one side (either side), and a keyword, QName, NCName,
        NumericLiteral, or StringLiteral on the other.

        For instance, consider the abbreviated syntax tree

                                   Expr
                                    |
                           MultiplicativeExpr
                                    |
                  +---------+-------+-------+--------+
                  |         |       |       |        |
              UnionExpr  Filler?  "div"  Filler?  UnionExpr
                  |         |       |       |        |
            NumericLiteral  |       |       |  NumericLiteral
                  |         |       |       |        |
                  ++        |      +++      |        |
                  ||        |      |||      |        |
                  10       [?]     div     [?]       3

        The "div" is a keyword, so in both filler positions (bounded by
        NumericLiteral/keyword and keyword/NumericLiteral respectively), it is
        illegal to have empty filler. This leads to the conclusion that
            10 div 3
            10 div(:comment:)3
        (for instance) are legal queries resulting from this tree, but
            10div 3
            10 div3
            10div3
        are illegal queries.

    (b) a keyword, QName, or NCName on the left, and "-" or "." on the
        right.

        For instance, consider the abbreviated syntax tree

                              Expr
                               |
                         AdditiveExpr
                               |
                +-------+------+------+-------+
                |       |      |      |       |
            MultExpr Filler?  "-"  Filler? MultExpr
                |       |      |      |       |
              QName     |      |      |     QName
                |       |      |      |       |
               +++      |      |      |      +++
               |||      |      |      |      |||
               foo     [?]     -     [?]     foo

        In the first filler position (bounded by QName/"-"), it is illegal
        to have empty filler. In the second position (bounded by "-"/QName),
        there is no such constraint. Thus,
            foo - foo
            foo -foo
            foo(: comment :)- foo
            foo(: comment :)-foo
        are all legal queries resulting from this tree, but
            foo- foo
        is illegal.

        The query
            foo-foo
        is illegal as far as *this* tree is concerned, but it happens to be
        legal via a different syntax tree:

              Expr
               |
             QName
               |
            +++++++
            |||||||
            foo-foo

        because hyphen is a valid name character.
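
Cases (a) and (b) can be summarized as a predicate on the symbols to the left and right of a filler position. This is a sketch only; the classification labels passed in ("keyword", "QName", and so on) and the function name are assumptions made for illustration.

```python
NAME_LIKE = {"keyword", "QName", "NCName"}
CASE_A_OTHER_SIDE = NAME_LIKE | {"NumericLiteral", "StringLiteral"}


def filler_required(left, right):
    """Return True if empty filler is illegal between left and right.

    left, right: the kind of symbol on each side of the filler position,
    e.g. "keyword" or "NumericLiteral"; for the punctuation in case (b),
    pass the literal "-" or ".".
    """
    # (a) keyword on either side, and one of the listed symbols on the other.
    case_a = (left == "keyword" and right in CASE_A_OTHER_SIDE) or (
        right == "keyword" and left in CASE_A_OTHER_SIDE
    )
    # (b) keyword, QName, or NCName on the left, "-" or "." on the right.
    case_b = left in NAME_LIKE and right in {"-", "."}
    return case_a or case_b
```

Applied to the trees above: both positions around "div" require filler, the position between the first QName and "-" requires filler, and the position between "-" and the second QName does not.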

    (2)
    [grammar-note: occurrence-indicators]

    Consider these abbreviated syntax trees:

                                        AdditiveExpr
                                             |
                        +--------------------+----------+-----------+
                        |                               |           |
                MultiplicativeExpr                     "-"  MultiplicativeExpr
                        |                               |           |
                     TreatExpr                          |           |
                        |                               |           |
         +-----------+--+---+-------------+             |           |
         |           |      |             |             |           |
    CastableExpr  "treat"  "as"     SequenceType        |           |
         |           |      |             |             |           |
         |           |      |        +----+----+        |           |
         |           |      |        |         |        |           |
         |           |      |    ItemType OccIndicator  |           |
         |           |      |        |         |        |           |
         4         treat    as     item()      +        -           5


                                        AdditiveExpr
                                             |
                        +--------------------+----+-----------+
                        |                         |           |
                MultiplicativeExpr               "+"  MultiplicativeExpr
                        |                         |           |
                     TreatExpr                    |        UnaryExpr
                        |                         |           |
         +-----------+--+---+-------------+       |       +-------+
         |           |      |             |       |       |       |
    CastableExpr  "treat"  "as"     SequenceType  |      "-"  ValueExpr
         |           |      |             |       |       |       |
         |           |      |             |       |       |       |
         |           |      |             |       |       |       |
         |           |      |         ItemType    |       |       |
         |           |      |             |       |       |       |
         4         treat    as          item()    +       -       5

    This illustrates an ambiguity in the EBNF, which is resolved by making the
    second tree illegal.  Specifically, a syntax tree is illegal if an ItemType
    is followed by a "+", "*", or "?" that is *not* an OccurrenceIndicator.
    (The presence or absence of filler makes no difference to the illegality.)

    (Thus, a parser, having recognized an ItemType, and seeing a "+", "*", or
    "?", can be certain that the latter is an OccurrenceIndicator.)

    Note that the elimination of such trees does not create a semantic hole in
    the language: one can easily construct a legal query that is semantically
    equivalent to one of these illegal trees, simply by parenthesizing the
    expression that most closely contains the ItemType. (The illegal trees can
    only occur when the ItemType is within an expression.)
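
A sketch of the parser behaviour this rule permits, assuming the input has already been split into tokens and that an ItemType arrives as a single token (both simplifications; the function name is hypothetical).

```python
OCCURRENCE_INDICATORS = {"+", "*", "?"}


def parse_sequence_type(tokens, pos):
    """Parse an ItemType at tokens[pos] plus an optional OccurrenceIndicator.

    Returns (item_type, indicator_or_None, next_pos). A "+", "*", or "?"
    directly after the ItemType is always taken as the indicator, which
    is exactly what the rule guarantees to be safe.
    """
    item_type = tokens[pos]
    pos += 1
    indicator = None
    if pos < len(tokens) and tokens[pos] in OCCURRENCE_INDICATORS:
        indicator = tokens[pos]
        pos += 1
    return item_type, indicator, pos


# First tree above: the "+" binds to item(), leaving "-" as the operator.
plain = ["4", "treat", "as", "item()", "+", "-", "5"]
# Parenthesized form: semantically equivalent to the (illegal) second tree.
parenthesized = ["(", "4", "treat", "as", "item()", ")", "+", "-", "5"]
```

In the parenthesized form the ")" follows the ItemType, so no indicator is consumed and the later "+" is free to act as the additive operator.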

    (3)
    [grammar-note: leading-lone-slash]
    Consider the two abbreviated syntax trees:

              Expr                            Expr
               |                               |
            PathExpr                  MultiplicativeExpr
               |                               |
         +-----+----+                  +-------+-------+
         |          |                  |       |       |
        "/"  RelativePathExpr      UnionExpr  "*"  UnionExpr
         |          |                  |       |       |
         |       StepExpr           PathExpr   |       |
         |          |                  |       |       |
         |       Wildcard             "/"      |       |
         |          |                  |       |       |
         /          *                  /       *       5

    Although there isn't an ambiguity, some parsers would have trouble
    distinguishing between these two alternatives.  Similarly with:

              Expr                              Expr
               |                                 |
            PathExpr                         UnionExpr
               |                                 |
         +-----+----+                  +---------+---------+
         |          |                  |         |         |
        "/"  RelativePathExpr      IntExExpr  "union"  IntExExpr
         |          |                  |         |         |
         |       StepExpr           PathExpr     |         |
         |          |                  |         |         |
         |        QName               "/"        |         |
         |          |                  |         |         |
         /        union                /       union      $x

    Therefore, in each pair, the tree on the right is deemed illegal.
    Specifically, a syntax tree is illegal if it contains:
        -- a PathExpr that only derives "/"
        followed by:
        -- "*" or a keyword.

    (Thus, a parser, seeing (in an expression context) a slash followed by a
    star or what-could-be-a-keyword, can be certain that the slash is not a
    complete PathExpr, but rather the start of a PathExpr.)

    Again, there is no semantic hole: one can easily construct a legal query
    that is semantically equivalent to one of these illegal trees, simply by
    putting the "lone slash" in parentheses.
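
The leading-lone-slash rule reduces to a check on the token that follows a PathExpr consisting only of "/". The sketch below assumes a pre-tokenized input and takes the set of keywords as a parameter; the function name and the small keyword set used in the example are hypothetical.

```python
def violates_lone_slash_rule(next_token, keywords):
    """Return True if a tree with a lone "/" PathExpr before next_token is illegal.

    next_token: the token following the lone slash, or None at end of input.
    keywords: the set of keyword strings drawn from the EBNF.
    """
    if next_token is None:
        return False
    return next_token == "*" or next_token in keywords


# Only keywords that appear in the examples above; not a complete list.
EXAMPLE_KEYWORDS = {"union", "div"}
```

A parser can use the same check in the other direction: when it returns True, the slash must be the start of a longer PathExpr rather than a complete one.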

    (4)
    [grammar-note: reserved-function-names]

    Consider these abbreviated syntax trees:

                 Expr                              Expr
                  |                                 |
                IfExpr                         FunctionCall
                  |                                 |
      +----+----+-----+-----+--....     +-----+-----+-+--------+
      |    |    |     |     |           |     |       |        |
    "if"  "("  Expr  ")"  "then"      QName  "("  ExprSingle  ")"
      |    |    |     |     |           |     |       |        |
     if    (   foo    )    then         if    (      foo       )

    Although there isn't an ambiguity, some parsers would be unable to
    distinguish between these two alternatives.  Therefore, the right-hand tree
    is deemed illegal.  Specifically, a syntax tree is illegal if it contains a
    FunctionCall whose QName:
    -- does not have a prefix, and
    -- has a local-part that is one of the following NCNames:
           attribute
           comment
           etc

    (Thus, a parser, seeing (in an expression context) one of those words
    followed by an open parenthesis, can be certain that it is not the start of
    a FunctionCall.)
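
A sketch of the reserved-function-names check. The set below is deliberately incomplete: it holds only the two names listed above plus "if", which is inferred from the example tree; the remaining names are elided in the text and are not guessed here. The function name is hypothetical.

```python
# Incomplete on purpose: "attribute" and "comment" are listed in the text,
# "if" is inferred from the example tree, and the rest is elided ("etc").
RESERVED_FUNCTION_NAMES = {"attribute", "comment", "if"}


def function_call_name_is_legal(qname):
    """Return True if qname may be used as the name in a FunctionCall.

    Only unprefixed names are restricted; a prefixed QName is always
    allowed, whatever its local part.
    """
    prefix, separator, local_part = qname.partition(":")
    if separator == "":
        # No colon: the whole QName is the local part.
        return qname not in RESERVED_FUNCTION_NAMES
    return True
```

So "if(foo)" can never be a FunctionCall, while a prefixed name such as "my:if(foo)" remains available.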

    [Way to construct semantically equivalent query?]
Comment 1 Michael Dyck 2005-05-11 23:44:34 UTC
As pointed out by Michael Kay over at Bug 1368, there *is* an ambiguity
involving leading-lone-slash, which is a more compelling argument than the
merely "troublesome" case that the spec (and then I) used.  So please slot
this replacement into A.3 (3) above:

    (3)
    [grammar-note: leading-lone-slash]
    Consider the two abbreviated syntax trees:

               Expr                             Expr
                |                                |
             PathExpr                   MultiplicativeExpr
                |                                |
         +------+-----+                  +-------+-------+
         |            |                  |       |       |
        "/"   RelativePathExpr       UnionExpr  "*"  UnionExpr
         |            |                  |       |       |
         |     +------+-----+         PathExpr   |    PathExpr
         |     |      |     |            |       |       |
         |  StepExpr "/" StepExpr       "/"      |   +---+-----+
         |     |      |     |            |       |   |         |
         |  NameTest  |  NameTest        |       |  "/" RelativePathExpr
         |     |      |     |            |       |   |         |
         |  Wildcard  |   QName          |       |   |     StepExpr
         |     |      |     |            |       |   |         |
         /     *      /    foo           /       *   /        foo

    This illustrates another ambiguity in the EBNF. Similarly with:

               Expr                               Expr
                |                                  |
             PathExpr                          UnionExpr
                |                                  |
         +------+-----+                  +---------+---------+
         |            |                  |         |         |
        "/"   RelativePathExpr       IntExExpr  "union"  IntExExpr
         |            |                  |         |         |
         |     +------+-----+         PathExpr     |      PathExpr
         |     |      |     |            |         |         |
         |  StepExpr "/" StepExpr       "/"        |     +---+-----+
         |     |      |     |            |         |     |         |
         |  NameTest  |  NameTest        |         |    "/" RelativePathExpr
         |     |      |     |            |         |     |         |
         |   QName    |   QName          |         |     |     StepExpr
         |     |      |     |            |         |     |         |
         /   union    /    foo           /       union   /        foo

    In each pair, the ambiguity is resolved by disallowing the tree on the
    right.  Specifically, [etc as before]
Comment 2 Scott Boag 2005-07-09 19:18:27 UTC
(In reply to comment #0)
I don't think this design or language is clearer than the existing design or
language, though I may end up using some parts of it, the Filler production, for
example.  I certainly don't want to start using example parse trees.

Once I'm done with all other modifications, the WG can compare that draft with
this proposal and decide if they want to make a more radical change.
Comment 3 Scott Boag 2005-07-22 20:28:58 UTC
A joint meeting of the Query and XSLT working groups considered this comment on 
July 20, 2005.  

The WG does not agree that this proposal is a significant improvement over the
existing text, and declines to adopt it.

If you do not agree with this resolution, please add a comment explaining why.
If you wish to appeal the WG's decision to the Director, then change the Status
of the record to Reopened. If we do not hear from you in the next two weeks, we
will assume you agree with the WG decision.