This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4176 - [UPD] Syntax "do rename ... as ..." problematic with tokenization
Summary: [UPD] Syntax "do rename ... as ..." problematic with tokenization
Status: CLOSED INVALID
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Update Facility
Version: Working drafts
Hardware: Macintosh All
Importance: P2 normal
Target Milestone: ---
Assignee: Andrew Eisenberg
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-01-10 09:23 UTC by Martin Probst
Modified: 2007-03-08 11:05 UTC

Description Martin Probst 2007-01-10 09:23:36 UTC
While implementing the current XQuery updates working draft I noticed a problem with the syntax for "do rename". The syntax is "do rename" expr "as" expr. Assuming a stateful, stack-based lexer, the implementation will run into problems. I'm using the state names from the "Building a Tokenizer for XPath or XQuery" note.

The lexer will be in the default state after the "do rename" token, then lex the expr content. After that, it's in the operator state. When it then encounters an "as" token, it's ambiguous whether the next state should be itemtype or default. Type declarations like "let $x as foo :=" need the itemtype state, whereas do rename needs the default state.
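The ambiguity described above can be illustrated with a small sketch. It assumes a heavily simplified state machine with only the default and operator states, whitespace-separated tokens, and a hypothetical helper `key_at_as`; none of this is taken from the tokenizer note itself.

```python
# Illustrative only: a toy model of a purely lexical state machine.
# State names follow the thread; the transition rules are a simplification.
KEYWORDS = {"let", "do", "rename"}

# What the grammar needs after "as", per construct (from the description above).
REQUIRED_AFTER_AS = {
    "let $x as foo := 1": "ITEMTYPE",   # "as" introduces a sequence type
    "do rename $n as foo": "DEFAULT",   # "as" introduces an expression
}


def key_at_as(query):
    """Return the (state, token) pair the toy lexer sees when it reaches 'as'."""
    state = "DEFAULT"
    for tok in query.split():
        if tok == "as":
            return (state, tok)
        # Keywords leave the lexer expecting an operand; operands move it on.
        state = "DEFAULT" if tok in KEYWORDS else "OPERATOR"
    raise ValueError("no 'as' token found")


# Both inputs reach "as" with the same key, yet need different next states,
# so a table keyed only on (state, token) cannot serve both.
keys = {query: key_at_as(query) for query in REQUIRED_AFTER_AS}
```

In this toy model both queries produce the same key at "as", while `REQUIRED_AFTER_AS` lists two different target states, which is the conflict being reported.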

As far as I can see it's not possible to solve this using the stack. The only two workarounds are to either duplicate the expression states in the lexer just for the do rename statement (i.e. have a "default but do rename" state) or to inject the state knowledge from the parser. Neither would quite win a beauty prize.

Is "do rename expr into expr" correct English? That should cause fewer problems.
Comment 1 Michael Kay 2007-01-10 10:15:01 UTC
If it's problematic for rename, then it's presumably problematic for insert too?

My personal view on this is that I'm not at all comfortable with the English-like or Cobol-like statement syntax when used in a language that's an expression language rather than a statement language, and especially one that has no reserved words. We changed the sequence type syntax for this reason from "element x of type y" to element(x,y), and I think we would make life easier for both implementors and users if we applied the same treatment to update expressions. I can't see what's wrong with rename(x,y); or if there are semantic reasons for not using a function call, then perhaps rename{x,y} or rename{x}{y}. On the surface it looks more cryptic, but once you build compound expressions with a lot of nesting it's far clearer.

Michael Kay
Comment 2 Martin Probst 2007-01-11 12:59:22 UTC
In do insert ... (as first | as last) into ... you can do a lookahead and decide based on the following 'first into' or 'last into', as those would be syntax errors in the itemtype state (however, if only one word followed, that would be a problem...).
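A minimal sketch of that lookahead, assuming whitespace-separated tokens and a hypothetical helper `state_after_as`. It is not taken from any real implementation.

```python
def state_after_as(tokens, i):
    """Choose the lexer state after the 'as' at tokens[i] by peeking two tokens ahead.

    'as first into' and 'as last into' cannot be type declarations, so they
    select the default (expression) state; anything else is read as a type.
    """
    if tokens[i] != "as":
        raise ValueError("tokens[i] must be 'as'")
    following = tokens[i + 1:i + 3]
    if len(following) == 2 and following[0] in ("first", "last") and following[1] == "into":
        return "DEFAULT"
    return "ITEMTYPE"


insert_tokens = "do insert <a/> as first into $doc".split()
let_tokens = "let $x as first := 1".split()   # here "first" is just a type name

insert_state = state_after_as(insert_tokens, insert_tokens.index("as"))
let_state = state_after_as(let_tokens, let_tokens.index("as"))
```

The two-word pattern is what makes the decision safe. A single word after "as" could equally be a type name, which is the problem with "do rename ... as ...".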

I do second the idea of a function-based syntax. I implemented a custom syntax for updates using functions (which are extremely close to this spec, except for the syntax) before this spec was publicly available, and users seem to adapt to it without much trouble.
Comment 3 John Snelson 2007-01-11 14:36:59 UTC
I thought I had solved this problem, but I've just checked and found a bug. A simpler solution would be to use "to" instead of "as", since the "to" operator already changes back to the initial state.
Comment 4 Martin Probst 2007-01-12 22:48:39 UTC
If I remember this correctly the syntax used to be "do rename ... to ...", but the problem is that "to" conflicts with the syntax for integer list construction:

do rename 1 to 10 to xs:QName("foo")

(I think it's then non-LL(1)).
Comment 5 Scott Boag 2007-01-31 16:06:06 UTC
Irrespective of proposals for new syntax such as rename(a, b), the current syntax does not seem to have an ambiguity problem, as far as I can tell.

"Building a Tokenizer for XPath or XQuery" (http://www.w3.org/TR/xquery-xpath-parsing/) was supposed to be marked as obsolete.  That it's not is probably my fault.  I know someone had an action from a working group meeting to do so.  We were running into certain brick walls with trying to do lexical-based disambiguation, which is why we dropped much of the lexical state machine in the test parser in favor of a lookahead mechanism (http://www.w3.org/2006/11/applets/xqueryApplet.html ... I can send you an xquery-update version of this, including .jjt file), and made the note obsolete.

I think you are saying that something like the following is ambiguous:

do rename foo treat as element() as baz

or 

do rename let $x as element() := foo return $x as baz

They are not ambiguous, as the first "as" is clearly associated with the "treat", and the second, more stand-alone "as" only occurs within limited defined contexts that expect a type declaration or a non-default thing such as ":=" or "," or "{".  Lexically, they don't have to be distinguished... they can be the same token.  I don't think you actually need lookahead for this.   The test parser evaluates both expressions correctly.
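A rough sketch of the "same token, distinguished by grammatical context" idea follows. It is a toy recursive-descent parser over whitespace-separated tokens with a hypothetical `Parser` class and a drastically reduced grammar. It is not the working group's test parser, and because `element()` arrives as a single pre-split token it says nothing about how the characters after "as" should be lexed.

```python
class Parser:
    """Toy parser: every 'as' is the same token; the grammar rule decides its role."""

    def __init__(self, text):
        self.toks = text.split()
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def rename(self):
        self.eat("do")
        self.eat("rename")
        target = self.expr()
        self.eat("as")            # the rename "as": followed by an expression
        return ("rename", target, self.expr())

    def expr(self):
        if self.peek() == "let":
            self.eat("let")
            var = self.eat()
            self.eat("as")        # the let "as": followed by a type
            typ = self.eat()
            self.eat(":=")
            value = self.expr()
            self.eat("return")
            return ("let", var, typ, value, self.expr())
        node = self.eat()
        if self.peek() == "treat":
            self.eat("treat")
            self.eat("as")        # the treat "as": followed by a type
            node = ("treat", node, self.eat())
        return node


first = Parser("do rename foo treat as element() as baz").rename()
second = Parser("do rename let $x as element() := foo return $x as baz").rename()
```

Both example queries parse into a rename whose target is the whole preceding expression, with only the final "as" consumed by the rename rule.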
Comment 6 Martin Probst 2007-02-05 10:55:51 UTC
The problem for me is that the grammar started out as being LL(1) if a certain (quite complex) set of tokenization rules is being followed. With this addition of "as" as a token that leads to a non itemtype state, these rules break - it's no longer possible to write a tokenizer that is self-sufficient and correctly tokenizes any input without the help of a parser. The language is not ambiguous, it's just no longer LL(1), even with a tricky lexer.

To clarify: using 'as' as the keyword in this place would mean that a following 'element()' can no longer be lexed as an element test/type. The parser would then have to look ahead beyond the QName token 'element' and see what's coming up to decide if 'element' is a type name or the element test. That is not LL(1).

It's possible to solve this (my current way is to include a Horrible Hack (tm) where the parser tells the lexer in the rename state to expect that "as" token), but it's quite ugly to add a language extension that breaks a formerly good strategy to parse the language.
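The kind of workaround described above might look roughly like the following. This is a sketch under stated assumptions, not the actual implementation: the `Lexer` class, the `expect_rename_as` flag, and `parse_rename` are all hypothetical, tokens are whitespace-separated, and the rename target is limited to a single token.

```python
KEYWORDS = {"do", "rename", "let"}


class Lexer:
    def __init__(self, text):
        self.tokens = text.split()
        self.pos = 0
        self.state = "DEFAULT"
        self.expect_rename_as = False   # set by the parser, never by the lexer

    def next_token(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        if tok == "as":
            if self.expect_rename_as:
                # Parser said this "as" belongs to "do rename": expect an expression.
                self.state = "DEFAULT"
                self.expect_rename_as = False
            else:
                # Otherwise "as" introduces a type declaration.
                self.state = "ITEMTYPE"
        elif tok in KEYWORDS:
            self.state = "DEFAULT"
        else:
            self.state = "OPERATOR"
        return tok, self.state


def parse_rename(lexer):
    for expected in ("do", "rename"):
        tok, _ = lexer.next_token()
        if tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
    lexer.expect_rename_as = True       # the hack: push parser context into the lexer
    target, _ = lexer.next_token()      # single-token target, for brevity
    tok, state_after_as = lexer.next_token()
    if tok != "as":
        raise SyntaxError(f"expected 'as', got {tok!r}")
    new_name, _ = lexer.next_token()
    return target, new_name, state_after_as


result = parse_rename(Lexer("do rename $n as foo"))
```

A single flag like this would be consumed too early if the target expression itself contained an "as" (for example "treat as"), so a fuller version would need to track nesting, which is part of why the approach is ugly.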

I'd also say that adding more and more syntax to the language doesn't really make it more usable, quite the contrary. 'as' used to be an indicator of a type declaration (functions, flwor, typeswitch), now it's being used for two different things. I think it would also be easier for GUIs/editors if we tried to keep each keyword to a single meaning as much as possible.
Comment 7 Don Chamberlin 2007-03-07 19:30:00 UTC
Martin,
The Query Working Group considered this issue on 6 March 2007. We believe that multiple approaches exist that permit tokenization and parsing of the "rename ... as" syntax published in the current Working Draft. Therefore we are changing the status of this bug report to "Invalid". If you are satisfied with this resolution, please mark this bug report as Closed.
Thanks,
Don Chamberlin (for the Query Working Group)
Comment 8 Martin Probst 2007-03-08 11:05:05 UTC
Closing the bug.