1384 – [XQuery] some editorial comments on A.2.1 Terminal Types

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XQuery 1.0
Version:	Last Call drafts
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Scott Boag
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Whiteboard:	grammar

Reported:	2005-05-11 07:37 UTC by Michael Dyck
Modified:	2005-09-29 11:01 UTC
CC List:	0 users

Description Michael Dyck 2005-05-11 07:37:38 UTC

A.2.1 Terminal Types

[See a later comment for suggested alternate wording that achieves the effect of
this section without defining 'terminal' or making long lists of symbols.]

(intent)
    Is it your intention that the characters of every legal query can be
    partitioned into a sequence of terminals and intervening whitespace?
    If so, you'll need to add the following as terminals:
        Char
        "(#" and "#)" (or else Pragma)
        PITarget

    On the other hand, if it's your intention to only include terminals that
    could be next to ignorable whitespace, then there are a bunch that could be
    removed:
        ":"
        """
        "'"
        "<![CDATA["
        "]]>"
        PredefinedEntityRef
        CharRef
        "</"
        "{{"
        "}}"
        EscapeQuot
        EscapeApos
        S

    Note also that some of the terminals derive forms containing other
    terminals, which could complicate things:
        PredefinedEntityRef -> "lt", "gt", ";"
        CharRef             -> ";"
        Comment             -> CommentContents
        DecimalLiteral      -> Digits, "."
        DoubleLiteral       -> Digits, "."
        StringLiteral       -> '"', "'"
        QName               -> NCName, ":"

'terminal'
    This section waffles between two senses of 'terminal': a symbol in the
    grammar, or a group of characters in a query. (E.g., "The XQuery grammar
    defines 153 terminals." vs. "This query contains 1200 terminals.") Maybe
    nobody will mind.

(In my comments, I will abbreviate "delimiting terminal" and "non-delimiting
terminal" as "DT" and "NDT" respectively.)

"delimit"
    The rest of the spec uses "delimit" to mean "mark the start and end", e.g.:
        -- '(:' and ':)' are comment delimiters,
        -- braces delimit an enclosed expression, and
        -- a string literal can be delimited by apostrophes or quotation marks.
    However, in this section, the intended usage appears to be, for example,
    that in
        x+1
    the plus sign "delimits" x and 1. This is odd. It would be plainer to say
    that it "separates" them.  (Note that in A.2.2.1, the examples use the
    phrase "separated by DTs", not "delimited by DTs".)

    So I'd recommend changing "delimit/delimited" to "separate/separated".
    However, I don't recommend changing "(non-)delimiting terminal" to
    "(non-)separating terminal", as both seem like odd phrasing to me. For
    instance, in
        x+1-y
    it would seem reasonable to say that the 1 separates (or delimits, if you
    must) the plus and minus. So to decree that the plus is "separating"
    whereas the 1 is "non-separating" doesn't make sense.  Instead, I think
    "adjoinable" and "non-adjoinable" might be better, or "punctuation-like"
    and "word-like", or "closed" and "open", or just "class 1" and "class 2".

----

"A DT may delimit adjacent NDTs."
    This is not a definition. The real definition is the list.

(list of DTs)
    Delete initial comma.
    Delete "%%%".
    Maybe change """ to '"'.
    Not sure why you need both Comment *and* "(:" + ":)".
    Put them in ASCII order? Some kind of order would be nice.

    "." and "-" are going to cause problems, given that they're valid
    NCNameChars. E.g., if 'x' and '1' are two NDTs, and '-' is the DT that
    will 'delimit' them, you get x-1, which doesn't work. (It's misrecognized
    as a single NCName.)

    Expressing the problem in a different way: A.2.2.1 says that whitespace is
    only required between two NDTs, but '-' is not an NDT, so whitespace isn't
    required between 'x' and '-'. Which is not what you want.

    On the other hand, you can't make "-" an NDT, because then things like
    100-x and (blah)-1 would become illegal.
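
The x-1 problem can be reproduced with a small greedy tokenizer. This is only an illustrative sketch: it assumes an ASCII-only approximation of NCName (the real production admits many more characters), and the names NCNAME, TOKEN and tokenize are hypothetical, not taken from the spec.

```python
import re

# ASCII-only approximation of NCName (assumption: the real production is
# much larger). "-" and "." are allowed after the first character.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

# Alternatives tried in order at each position: whitespace, a name,
# a run of digits, or any single character (punctuation such as "-").
TOKEN = re.compile(rf"\s+|{NCNAME}|[0-9]+|.", re.DOTALL)


def tokenize(query):
    """Split a query string into tokens, dropping whitespace tokens."""
    return [t for t in TOKEN.findall(query) if not t.isspace()]


print(tokenize("x-1"))       # the name pattern swallows "-1"
print(tokenize("x - 1"))     # whitespace forces three tokens
print(tokenize("100-x"))     # "-" cannot continue a number
print(tokenize("(blah)-1"))  # "-" cannot continue ")"
```

Under these assumptions, only the first input needs whitespace to be read as a subtraction, which is why classifying "-" as either a DT or an NDT across the board gives the wrong answer for some inputs.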

"NDTs generally start and end with alphabetic characters or digits."
    This is almost a definition, but the "generally" makes it too vague.
    Again, the real definition is the list.

"Adjacent NDTs must be delimited by a DT."
    This doesn't belong in a definition.

(list of NDTs)
    Change ValidationMode to just "lax", "strict".
    Definitely put them in ASCII order.

----

"delimit adjacent"
    Both "definitions" have a phrase along the lines of:
        "a DT [may/must] delimit adjacent NDTs"
    but this makes no sense -- if the DT is between the NDTs, then the NDTs are
    not adjacent! Presumably you mean something like
        "two NDTs may not be adjacent"
    or
        "two adjacent terminals may not both be NDTs"
    or
        "in every pair of adjacent terminals in a query, at least one of the
        terminals must be a DT"

    However, a reasonable response (to any of these) would be:
        But I have no choice! For instance, in the production for VersionDecl,
        it says right there:
            "xquery" "version" etc.
        So the (NDTs) "xquery" and "version" *have* to be adjacent. I can't just
        grab some DT and stick it between them -- I'd get a syntax error!

    The answer, I assume, would be:
        Ah, but S and Comment are DTs, and you certainly *can* (and in fact,
        must) put either or both of those between the "xquery" and "version".

    This illustrates a couple of problems:
    (1) What you mean here by 'adjacent' may not be what the reader thinks.
    (2) S and Comment are the only DTs that people can actually use to separate
        two NDTs (without changing the structure of the query), and they are
        buried in the list, and not mentioned in the prose.

    It might help solve both of these problems if you moved/recast the content
    of this section into A.2.2 Whitespace Rules. (As far as I can tell, the only
    reason to define these classes of terminals is to be able to define where
    whitespace is allowed, and where required, so the move is appropriate.)
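
The third candidate wording ("in every pair of adjacent terminals in a query, at least one of the terminals must be a DT") can be written as a check over a terminal sequence. This is a sketch under a loud assumption: is_ndt below is a hypothetical stand-in that treats any terminal starting with a letter or digit as an NDT and everything else (symbols, whitespace, comments) as a DT. The spec's actual classification is the two lists, not this rule.

```python
def is_ndt(terminal):
    """Hypothetical stand-in for the NDT list: word-like or numeric terminals."""
    return terminal[:1].isalnum()


def adjacent_ndt_pairs(terminals):
    """Return every pair of neighbouring terminals in which both are NDTs."""
    return [
        (left, right)
        for left, right in zip(terminals, terminals[1:])
        if is_ndt(left) and is_ndt(right)
    ]


# "xquery" directly followed by "version" violates the rule...
print(adjacent_ndt_pairs(["xquery", "version", '"1.0"']))
# ...but whitespace (S) or a Comment between them satisfies it.
print(adjacent_ndt_pairs(["xquery", " ", "version", '"1.0"']))
print(adjacent_ndt_pairs(["xquery", "(: c :)", "version", '"1.0"']))
```

Note that the check only passes because S and Comment are counted as terminals in the sequence, which is exactly the reading of 'adjacent' that the prose leaves implicit.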

Comment 1 Scott Boag 2005-07-09 10:23:11 UTC

(In reply to comment #0)
> A.2.1 Terminal Types
...

> (intent)
>     Is it your intention that the characters of every legal query can be
>     partitioned into a sequence of terminals and intervening whitespace?

Yes.

>     If so, you'll need to add the following as terminals:
>         Char

Fixed.

>         "(#" and "#)" (or else Pragma)

Fixed.

>         PITarget

Fixed.  These were all production problems.

> 
>     On the other hand, if it's your intention to only include terminals that
>     could be next to ignorable whitespace, then there are a bunch that could 

No.

>     Note also that some of the terminals derive forms containing other
>     terminals, which could complicate things:

Yes, perhaps.  May come back and revisit this.

> 
> 'terminal'
>     This section waffles between two senses of 'terminal': a symbol in the
>     grammar, or a group of characters in a query. (E.g., "The XQuery grammar
>     defines 153 terminals." vs. "This query contains 1200 terminals.") Maybe
>     nobody will mind.

Michael Sperberg-McQueen and I have been working on a clearer definition of
terminal:

<termdef term="terminal" id="terminal">A <term>terminal</term> is a symbol or
string or pattern that can appear
    in the right-hand side of a rule, but never appears on the
    left hand side in the main grammar, although it may appear
    on the left-hand side of a rule in the grammar for terminals.</termdef>

We are also going back to dividing the grammar into a main grammar and a
section for terminals.  I'm still tweaking the details of this, but I'm hoping
it will clean up a bunch of these problems, at least to a point.
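
One way to read the proposed definition is as a set computation over the main grammar. The sketch below uses a toy grammar that is entirely hypothetical (the rule contents are simplified and are not the spec's productions); it only illustrates "appears on a right-hand side but never on a left-hand side in the main grammar".

```python
# Toy main grammar (hypothetical): left-hand side -> symbols used on the right.
MAIN_GRAMMAR = {
    "VersionDecl": ['"xquery"', '"version"', "StringLiteral"],
    "Expr": ["AdditiveExpr"],
    "AdditiveExpr": ["IntegerLiteral", '"+"', '"-"'],
}

# Toy grammar for terminals (hypothetical): a terminal may have a rule here.
TERMINAL_GRAMMAR = {
    "IntegerLiteral": ["Digits"],
    "Digits": ["[0-9]+"],
}


def terminals(main_grammar):
    """Symbols used on some right-hand side but never defined on a left-hand side."""
    used = {symbol for rhs in main_grammar.values() for symbol in rhs}
    return used - set(main_grammar)


for symbol in sorted(terminals(MAIN_GRAMMAR)):
    print(symbol)
```

In this toy case AdditiveExpr is excluded because the main grammar defines it, while IntegerLiteral counts as a terminal even though it has a rule in the grammar for terminals.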

> "delimit"
>     The rest of the spec uses "delimit" to mean "mark the start and end", e.g.:
>         -- '(:' and ':)' are comment delimiters,
>         -- braces delimit an enclosed expression, and
>         -- a string literal can be delimited by apostrophes or quotation marks.
>     However, in this section, the intended usage appears to be, for example,
>     that in
>         x+1
>     the plus sign "delimits" x and 1. This is odd. It would be plainer to say
>     that it "separates" them.  (Note that in A.2.2.1, the examples use the
>     phrase "separated by DTs", not "delimited by DTs".)

I used the term "separate" in the text, but I kept the terms, as I think they're
clear enough.

> 
>     So I'd recommend changing "delimit/delimited" to "separate/separated".
>     However, I don't recommend changing "(non-)delimiting terminal" to
>     "(non-)separating terminal", as both seem like odd phrasing to me. 

Right, agreed.

> For
>     instance, in
>         x+1-y
>     it would seem reasonable to say that the 1 separates (or delimits, if you
>     must) the plus and minus. So to decree that the plus is "separating"
>     whereas the 1 is "non-separating" doesn't make sense.  Instead, I think
>     "adjoinable" and "non-adjoinable" might be better, or "punctuation-like"
>     and "word-like", or "closed" and "open", or just "class 1" and "class 2".

I think the names for the categories are clear enough.

> 
> ----
> 
> "A DT may delimit adjacent NDTs."
>     This is not a definition. The real definition is the list.

I think it's clear enough as a term definition.  I don't really think the list
belongs in the term definition, and I would rather not restructure this in a
radical way.

Hmm... I may revisit this.  Maybe Michael can come up with some alternative wording?

> 
> (list of DTs)
>     Delete initial comma.

Fixed.

>     Delete "%%%".

Fixed.

>     Maybe change """ to '"'.

Not worth the extra work right now.  May revisit.

>     Not sure why you need both Comment *and* "(:" + ":)".

"(:" + ":)" have been removed from the list.

>     Put them in ASCII order? Some kind of order would be nice.

Yes it would.  They're in grammar occurrence order now.  Unfortunately, it's a
bit of a technical challenge to sort them (the joys of recursive processing in
XSLT 1.0).  I may get to this at some point, but it's got to be low on the
priority list.
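
For comparison only, and not as a suggestion about the spec's XSLT 1.0 build, ASCII ordering is a one-liner outside XSLT. The sample below is a handful of terminals mentioned in this report, not the full list, and ascii_order is a hypothetical helper name.

```python
# A few terminals mentioned in this report (not the complete spec list).
SAMPLE = ["</", "{{", "S", "EscapeQuot", "]]>", "CharRef", ":"]


def ascii_order(terminals):
    """Sort by code point, which is ASCII order for ASCII-only strings."""
    return sorted(terminals)


print(ascii_order(SAMPLE))
```

Punctuation, uppercase names and braces end up in separate runs, because their ASCII ranges do not overlap.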

> 
>     "." and "-" are going to cause problems, given that they're valid
>     NCNameChars. E.g., if 'x' and '1' are two NDTs, and '-' is the DT that
>     will 'delimit' them, you get x-1, which doesn't work. (It's misrecognized
>     as a single NCName.)
> 
>     Expressing the problem in a different way: A.2.2.1 says that whitespace is
>     only required between two NDTs, but '-' is not an NDT, so whitespace isn't
>     required between 'x' and '-'. Which is not what you want.
> 
>     On the other hand, you can't make "-" an NDT, because then things like
>     100-x and (blah)-1 would become illegal.

Yes, need to sleep on this.  I think I need to make some sort of exception for
these cases.
 
> "NDTs generally start and end with alphabetic characters or digits."
>     This is almost a definition, but the "generally" makes it too vague.
>     Again, the real definition is the list.

Ok, redefined as <termdef id="non-delimiting-token" term="Non-delimiting
Terminal"><term>Non-delimiting terminals</term> must be separated by
a <termref def="delimiting-token">delimiting terminal</termref>.</termdef>

I know you still don't like this.  Take it as a placeholder for now, I may revisit.

> 
> "Adjacent NDTs must be delimited by a DT."
>     This doesn't belong in a definition.

ditto

> 
> (list of NDTs)
>     Change ValidationMode to just "lax", "strict".

Fixed.

>     Definitely put them in ASCII order.

Yes, as I said, has to take lower priority.

> 
> ----
> 
> "delimit adjacent"
>     Both "definitions" have a phrase along the lines of:
>         "a DT [may/must] delimit adjacent NDTs"
>     but this makes no sense -- if the DT is between the NDTs, then the NDTs are
>     not adjacent! Presumably you mean something like
>         "two NDTs may not be adjacent"
>     or
>         "two adjacent terminals may not both be NDTs"
>     or
>         "in every pair of adjacent terminals in a query, at least one of the
>         terminals must be a DT"
> 
>     However, a reasonable response (to any of these) would be:
>         But I have no choice! For instance, in the production for VersionDecl,
>         it says right there:
>             "xquery" "version" etc.
>         So the (NDTs) "xquery" and "version" *have* to be adjacent. I can't just
>         grab some DT and stick it between them -- I'd get a syntax error!
> 
>     The answer, I assume, would be:
>         Ah, but S and Comment are DTs, and you certainly *can* (and in fact,
>         must) put either or both of those between the "xquery" and "version".
> 
>     This illustrates a couple of problems:
>     (1) What you mean here by 'adjacent' may not be what the reader thinks.
>     (2) S and Comment are the only DTs that people can actually use to separate
>         two NDTs (without changing the structure of the query), and they are
>         buried in the list, and not mentioned in the prose.
> 
>     It might help solve both of these problems if you moved/recast the content
>     of this section into A.2.2 Whitespace Rules. (As far as I can tell, the only
>     reason to define these classes of terminals is to be able to define where
>     whitespace is allowed, and where required, so the move is appropriate.)

You may be right.  I'm going to come back to this after I've processed the other
issues.

Comment 2 Scott Boag 2005-07-22 20:16:41 UTC

A joint meeting of the Query and XSLT working groups considered this comment on 
July 20, 2005.  

The WG agreed to the editorial changes as enumerated in my earlier reply, and
agreed to list the issue as resolved.

If you do not agree with this resolution, please add a comment explaining why.
If you wish to appeal the WG's decision to the Director, then change the Status
of the record to Reopened. If we do not hear from you in the next two weeks, we
will assume you agree with the WG decision.