This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 1385 - [XQuery] some editorial comments on A.2.2.1 Default Whitespace Handling
Summary: [XQuery] some editorial comments on A.2.2.1 Default Whitespace Handling
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 1.0
Version: Last Call drafts
Hardware: All All
Importance: P2 minor
Target Milestone: ---
Assignee: Scott Boag
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2005-05-11 07:39 UTC by Michael Dyck
Modified: 2007-02-25 23:53 UTC

Description Michael Dyck 2005-05-11 07:39:45 UTC
A.2.2.1 Default Whitespace Handling

[See a later comment for suggested alternate wording.]

(making it all explicit)
    In http://www.w3.org/2005/04/xquery-issues.html#qt-2004Feb0853-01,
    Steven Buxton suggested "that you give up on implicit whitespace rules
    in the EBNF, and go with totally explicit whitespace in every
    EBNF." Apparently the proposal was accepted. And yet the proposed change
    did not occur. What happened?

"[Definition: Whitespace characters are defined by [http:...#NT-S]"
    Put "characters" in bold, because you're defining "whitespace characters",
    not "whitespace".

    Maybe put it in the singular: "A 'whitespace character' is any of the
    characters referenced in the right-hand-side of [...#NT-S]."

"when these characters occur outside of a StringLiteral.]"
    I think this exception is unnecessary. Consider that there isn't an
    exception for QuotAttrValueContent, DirElemContent, etc.

"Ignorable"
    Change to lower-case "i".

"Unless otherwise specified ..., Ignorable whitespace may occur between
terminals,"
    This is not a definition. The real definition comes later.

    It isn't clear how these two phrases relate. That is, given two adjacent
    terminals, how does one determine whether whitespace may be inserted between
    them, i.e., whether Default or Explicit Whitespace Handling applies?
    For example, in the query
        <a>{ "hello" }{ "world" }</a>
    consider the two terminals '}' and '{' in the middle. They both come from
    (different applications of) the EnclosedExpr production, which is not marked
    with 'ws: explicit', and so is subject to Default Whitespace Handling.
    However, you presumably don't want to suggest that ignorable whitespace can
    be inserted between these two terminals. Instead, what I imagine you have in
    mind is that a pair of successive terminals is governed by their nearest
    common ancestor in the syntax tree. In the above example, that's a
    DirElemConstructor, which symbol/production *is* marked 'ws: explicit', so
    ignorable whitespace cannot be inserted. However, as I say, it isn't clear
    that this is the intent.
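
    A minimal Python sketch of the "nearest common ancestor" reading described
    above. The Node class, the helper names, and the heavily abbreviated tree
    (the DirElemContent and CommonContent levels are omitted) are illustrative
    assumptions, not anything defined by the spec.

```python
class Node:
    """A node in a simplified syntax tree (hypothetical model)."""

    def __init__(self, symbol, ws_explicit=False):
        self.symbol = symbol
        self.ws_explicit = ws_explicit  # True if the production is marked 'ws: explicit'
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node):
    """Return the chain of ancestors from node's parent up to the root."""
    chain = []
    current = node.parent
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def nearest_common_ancestor(a, b):
    """Return the lowest node that is an ancestor of both a and b."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def ignorable_ws_allowed_between(a, b):
    """Under the assumed reading, the nearest common ancestor decides."""
    return not nearest_common_ancestor(a, b).ws_explicit


# Abbreviated tree for: <a>{ "hello" }{ "world" }</a>
root = Node("DirElemConstructor", ws_explicit=True)
first = root.add(Node("EnclosedExpr"))
second = root.add(Node("EnclosedExpr"))
open1 = first.add(Node("{"))
first.add(Node("Expr"))
close1 = first.add(Node("}"))
open2 = second.add(Node("{"))
second.add(Node("Expr"))
close2 = second.add(Node("}"))

# '}' and '{' in the middle are governed by DirElemConstructor (ws: explicit).
print(ignorable_ws_allowed_between(close1, open2))  # False
# '{' and '}' of the same EnclosedExpr are governed by EnclosedExpr (default).
print(ignorable_ws_allowed_between(open1, close1))  # True
```

    Under this reading, the '}' '{' pair is governed by the DirElemConstructor,
    while terminals inside a single EnclosedExpr fall under default handling.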

"and is not significant to the parse tree"
    Well, that's a bit tricky, since the presence/absence of whitespace can
    certainly be significant to the resulting parse tree ('a-b' vs 'a - b').
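
    To illustrate the point, here is a rough longest-match tokenizer in Python.
    The token patterns are a deliberately simplified assumption (real QNames
    allow far more characters and an optional prefix); the sketch only shows
    that whitespace changes which terminals are recognized.

```python
import re

# Simplified, assumed token patterns; not the XQuery grammar itself.
TOKEN = re.compile(
    r"""
    (?P<name>[A-Za-z_][A-Za-z0-9_.\-]*)   # a name may contain '-' after its first character
  | (?P<number>[0-9]+)
  | (?P<op>[-+*])
  | (?P<ws>\s+)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest match; whitespace is dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("a-b"))      # [('name', 'a-b')]
print(tokenize("a - b"))    # [('name', 'a'), ('op', '-'), ('name', 'b')]
print(tokenize("foo -foo")) # [('name', 'foo'), ('op', '-'), ('name', 'foo')]
print(tokenize("foo- foo")) # [('name', 'foo-'), ('name', 'foo')]
```

    So 'a-b' yields a single name, while 'a - b' yields a subtraction; the
    whitespace is what distinguishes the two token sequences.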

"For readability, whitespace may be used..."
    This certainly doesn't belong in a definition.

"All allowable whitespace that is not explicitly specified in the EBNF is
ignorable whitespace, and converse, this term does not apply to whitespace that
is explicitly specified. ]"
    Change "converse" to "conversely".
    Delete space before right paren.

    You could simplify it by saying
        "Ignorable whitespace is any allowable whitespace that is not explicitly
        specified in the EBNF."
    (Now that's a definition.)

    However, the phrase "allowable whitespace" is not defined. (In fact, this is
    the only occurrence of the word "allowable" in the whole spec.) You could
    delete it; the "not explicitly specified" phrase is doing the real work.

"Whitespace is allowed before the first terminal and after the last terminal of
an expression module."
    Change "an expression module" to just "a module".

"Whitespace is optional between delimiting terminals."
    Change "optional" to "allowed".

    You missed a case: Whitespace is allowed between a delimiting terminal and
    a non-delimiting terminal (in either order). It would be simpler to just
    say "Whitespace is allowed between any two terminals."

(that whole paragraph)
    This paragraph is backwards. It talks about what you can do with ignorable
    whitespace, then defines it in terms of allowable whitespace, then defines
    where whitespace is allowed. The opposite order seems like it would make
    more sense.

"Comments may also act as 'whitespace' to prevent two adjacent terminals from
being recognized as one."
    This suggests that that's the only context in which comments may act as
    whitespace, which is not what you want.

    Should be mentioned in 2.6?

"foo- foo is a syntax error."
    Change "is" to "results in".

"foo-" would be recognized as a QName.
    Not necessarily. That is, when the parser raises a syntax error, it doesn't
    have to "recognize" anything.

"foo -foo parses the same as foo - foo"
    Don't bring parsing into it if you don't have to. Change "parses the same
    as" to "is syntactically equivalent to".

"The parser would match..."
    These sentences are too implementation-specific.

"also parses the same as"
    Ditto previous substitution.

"When used as an operator after the characters of a name, the "-" must be
separated from the name, e.g. by using whitespace or parentheses."
    This is odd wording. It's as if you're saying (e.g.):
        When your query is
            foo-foo
        your query must be
            foo -foo
        or
            (foo)-foo
    which is self-contradictory.  See next point.

"10div 3 results in a syntax error, since the "10" and the "div" would both be
non-delimiting terminals and must be separated by delimiting terminals in order
to be recognized."
    This is very odd wording.  It's as if the parser must realize that I had
    "10" and "div" in mind as distinct terminals, so that it can apply the
    terminal-separation rules.  The "would be" is a tip-off.  Consider this:
    "dog" and "cat" 'would be' non-delimiting terminals, but that doesn't mean
    that "dogcat" results in a syntax error!

    In order to properly apply terminal-separation rules, you need a context in
    which (e.g.) "10" and "div" *are* terminals, rather than 'would be'
    terminals.  And that context is not the query, or the parser, but the
    derivation tree (or syntax tree). E.g., it's fine to say something like:

        Consider the (abbreviated) syntax tree:

                      Expr
                       |
              MultiplicativeExpr
                       |
              +--------+--------+
              |        |        |
          UnionExpr  "div"  UnionExpr
              |        |        |
        IntegerLiteral |  IntegerLiteral
              |        |        |
              ++      +++       +
              ||      |||       |
              10      div       3

        The symbols IntegerLiteral, "div", and IntegerLiteral are all NDTs,
        so the adjacent pairs must be separated by whitespace in the resulting
        query.
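
    Going from the derivation tree to query text can be sketched as follows
    (Python). The serialize function and the (text, is_non_delimiting) pair
    representation are hypothetical; the sketch covers only the rule that two
    adjacent non-delimiting terminals need separating whitespace, and ignores
    other cases such as "-" directly after a name.

```python
def serialize(terminals):
    """Join (text, is_non_delimiting) pairs into query text.

    A single space is inserted only between two adjacent non-delimiting
    terminals; everywhere else whitespace is optional and is omitted here.
    """
    out = []
    prev_ndt = False
    for text, is_ndt in terminals:
        if out and prev_ndt and is_ndt:
            out.append(" ")
        out.append(text)
        prev_ndt = is_ndt
    return "".join(out)


# Leaves of the tree above, left to right: IntegerLiteral, "div", IntegerLiteral.
leaves = [("10", True), ("div", True), ("3", True)]
print(serialize(leaves))  # 10 div 3

# Delimiting terminals such as parentheses need no separator.
print(serialize([("(", False), ("10", True), (")", False)]))  # (10)
```

    Starting from the tree, "10" and "div" are known to be distinct terminals,
    so the separation rule has something definite to apply to.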
Comment 1 Scott Boag 2005-07-09 18:32:56 UTC
(In reply to comment #0)
> A.2.2.1 Default Whitespace Handling
> 
> [See a later comment for suggested alternate wording.]
> 
> (making it all explicit)
>     In http://www.w3.org/2005/04/xquery-issues.html#qt-2004Feb0853-01,
>     Steven Buxton suggested "that you give up on implicit whitespace rules
>     in the EBNF, and go with totally explicit whitespace in every
>     EBNF." Apparently the proposal was accepted. And yet the proposed change
>     did not occur. What happened?

If you look carefully at the minutes linked from where it says "Proposal
Accepted", you'll see that it was my resolution proposal that was accepted:

> a) the whitespace rules for XQuery are so complex...

SB: The rules should now be simple.

> b) there are different whitespace rules for so many

SB: This should no longer be true.

> My suggestion is that you give up on implicit whitespace rules

SB: This would really clutter up the grammar. I believe we now have a workable
system.

Suggest disposition of "accepted-clarification". See changes made to
http://www.w3.org/XML/Group/xsl-query-specs/proposals/grammar-lc-response/xquery.html#whitespace-rules


>
> "[Definition: Whitespace characters are defined by [http:...#NT-S]"
>     Put "characters" in bold, because you're defining "whitespace characters",
>     not "whitespace".
> 
>     Maybe put it in the singular: "A 'whitespace character' is any of the
>     characters referenced in the right-hand-side of [...#NT-S]."

Done.

> 
> "when these characters occur outside of a StringLiteral.]"
>     I think this exception is unnecessary. Consider that there isn't an
>     exception for QuotAttrValueContent, DirElemContent, etc.

Done.

> 
> "Ignorable"
>     Change to lower-case "i".

Done.

> 
> "Unless otherwise specified ..., Ignorable whitespace may occur between
> terminals,"
>     This is not a definition. The real definition comes later.
> 
>     It isn't clear how these two phrases relate. That is, given two adjacent
>     terminals, how does one determine whether whitespace may be inserted between
>     them, i.e., whether Default or Explicit Whitespace Handling applies?
>     For example, in the query
>         <a>{ "hello" }{ "world" }</a>
>     consider the two terminals '}' and '{' in the middle. They both come from
>     (different applications of) the EnclosedExpr production, which is not marked
>     with 'ws: explicit', and so is subject to Default Whitespace Handling.
>     However, you presumably don't want to suggest that ignorable whitespace can
>     be inserted between these two terminals. Instead, what I imagine you have in
>     mind is that a pair of successive terminals is governed by their nearest
>     common ancestor in the syntax tree. In the above example, that's a
>     DirElemConstructor, which symbol/production *is* marked 'ws: explicit', so
>     ignorable whitespace cannot be inserted. However, as I say, it isn't clear
>     that this is the intent.
> 
> "and is not significant to the parse tree"
>     Well, that's a bit tricky, since the presence/absence of whitespace can
>     certainly be significant to the resulting parse tree ('a-b' vs 'a - b').
> 
> "For readability, whitespace may be used..."
>     This certainly doesn't belong in a definition.
> 
> "All allowable whitespace that is not explicitly specified in the EBNF is
> ignorable whitespace, and converse, this term does not apply to whitespace that
> is explicitly specified. ]"
>     Change "converse" to "conversely".
>     Delete space before right paren.
> 
>     You could simplify it by saying
>         "Ignorable whitespace is any allowable whitespace that is not explicitly
>         specified in the EBNF."
>     (Now that's a definition.)

Changed to:

<termdef term="Ignorable whitespace" id="IgnorableWhitespace">An <term>ignorable
whitespace</term> character is any <termref def="Whitespace">whitespace
character</termref> that may occur between <termref
def="terminal">terminals</termref>, unless these characters occur in the context
of a production marked with a <loc
href="#ExplicitWhitespaceHandling">ws:explicit</loc> annotation, in which case
they can occur only where explicitly specified (see <specref
ref="ExplicitWhitespaceHandling"/>).</termdef> Ignorable whitespace characters
are not significant to the semantics of an expression.

> 
>     However, the phrase "allowable whitespace" is not defined. (In fact, this is
>     the only occurrence of the word "allowable" in the whole spec.) You could
>     delete it; the "not explicitly specified" phrase is doing the real work.

yep.

> 
> "Whitespace is allowed before the first terminal and after the last terminal of
> an expression module."
>     Change "an expression module" to just "a module".

Done.

> 
> "Whitespace is optional between delimiting terminals."
>     Change "optional" to "allowed".

Done.

> 
>     You missed a case: Whitespace is allowed between a delimiting terminal and
>     a non-delimiting terminal (in either order). It would be simpler to just
>     say "Whitespace is allowed between any two terminals."
> 
> (that whole paragraph)
>     This paragraph is backwards. It talks about what you can do with ignorable
>     whitespace, then defines it in terms of allowable whitespace, then defines
>     where whitespace is allowed. The opposite order seems like it would make
>     more sense.

Mmmm.  See how it flows when all is done.

> 
> "Comments may also act as 'whitespace' to prevent two adjacent terminals from
> being recognized as one."
>     This suggests that that's the only context in which comments may act as
>     whitespace, which is not what you want.

I don't think it suggests that.

> 
>     Should be mentioned in 2.6?

OK, I'll get back to that when I process bug #1368.

> 
> "foo- foo is a syntax error."
>     Change "is" to "results in".

Done.

> 
> "foo-" would be recognized as a QName.
>     Not necessarily. That is, when the parser raises a syntax error, it doesn't
>     have to "recognize" anything.

Let's not split hairs on what "recognizes" means.  It's clear enough.

> 
> "foo -foo parses the same as foo - foo"
>     Don't bring parsing into it if you don't have to. Change "parses the same
>     as" to "is syntactically equivalent to".

Done.

> 
> "The parser would match..."
>     These sentences are too implementation-specific.

Sentences removed.

> 
> "also parses the same as"
>     Ditto previous substitution.

Done.

> 
> "When used as an operator after the characters of a name, the "-" must be
> separated from the name, e.g. by using whitespace or parentheses."
>     This is odd wording. It's as if you're saying (e.g.):
>         When your query is
>             foo-foo
>         your query must be
>             foo -foo
>         or
>             (foo)-foo
>     which is self-contradictory.  See next point.
> 
> "10div 3 results in a syntax error, since the "10" and the "div" would both be
> non-delimiting terminals and must be separated by delimiting terminals in order
> to be recognized."
>     This is very odd wording.  It's as if the parser must realize that I had
>     "10" and "div" in mind as distinct terminals, so that it can apply the
>     terminal-separation rules.  The "would be" is a tip-off.  Consider this:
>     "dog" and "cat" 'would be' non-delimiting terminals, but that doesn't mean
>     that "dogcat" results in a syntax error!
> 
>     In order to properly apply terminal-separation rules, you need a context in
>     which (e.g.) "10" and "div" *are* terminals, rather than 'would be'
>     terminals.  And that context is not the query, or the parser, but the
>     derivation tree (or syntax tree). E.g., it's fine to say something like:
> 
>         Consider the (abbreviated) syntax tree:
> 
>                       Expr
>                        |
>               MultiplicativeExpr
>                        |
>               +--------+--------+
>               |        |        |
>           UnionExpr  "div"  UnionExpr
>               |        |        |
>         IntegerLiteral |  IntegerLiteral
>               |        |        |
>               ++      +++       +
>               ||      |||       |
>               10      div       3
> 
>         The symbols IntegerLiteral, "div", and IntegerLiteral are all NDTs,
>         so the adjacent pairs must be separated by whitespace in the resulting
>         query.

I'm working on doing away with the DT and NDT terms, so, at least for the
moment, I've deleted the explanations and just say these are not legal.  I will
probably revisit this once I've got the whole DT/NDT thing smoothed out.
Comment 2 Michael Dyck 2005-07-10 20:07:56 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > > (making it all explicit)
> >     In http://www.w3.org/2005/04/xquery-issues.html#qt-2004Feb0853-01,
> >     Steven Buxton suggested "that you give up on implicit whitespace rules
> >     in the EBNF, and go with totally explicit whitespace in every
> >     EBNF." Apparently the proposal was accepted. And yet the proposed change
> >     did not occur. What happened?
> 
> If you look carefully at the minutes referenced by where it says "Proposal
> Accepted",

I can't look at the minutes, they're for W3C members only. (So, thanks for
quoting an excerpt.)
Comment 3 Scott Boag 2005-07-22 19:30:20 UTC
A joint meeting of the Query and XSLT working groups considered this comment on 
July 20, 2005.  

The WGs agreed to resolve these editorial issues as listed in my previous comment.

If you do not agree with this resolution, please add a comment explaining why.
If you wish to appeal the WG's decision to the Director, then change the Status
of the record to Reopened. If we do not hear from you in the next two weeks, we
will assume you agree with the WG decision.
Comment 4 Jim Melton 2007-02-25 23:53:38 UTC
Closing bug because commenter has not objected to the resolution posted and more than two weeks have passed.