This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23643 - [UPD 3.0] Convenient operator for transform expressions
Summary: [UPD 3.0] Convenient operator for transform expressions
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Update 3.0
Version: Working drafts
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: John Snelson
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2013-10-26 13:54 UTC by Christian Gruen
Modified: 2015-02-10 10:45 UTC
CC List: 6 users

See Also:


Attachments

Description Christian Gruen 2013-10-26 13:54:24 UTC
The XQUF transform expression has turned out to be an essential operation when it comes to updating XML documents in main memory. From our perspective, however, it has the following shortcomings:

* Its verbose syntax makes updating expressions inconvenient to read. The copy/modify/return construct is particularly bulky when embedded in a FLWOR expression.

* Our users frequently mix up the components of FLWOR and transform expressions. Many find it hard to grasp why "copy" cannot be chained with "for" or "let" clauses.

* Most transform expressions we have encountered so far contain a single copy variable and return the modified value without further modifications.

This is why I would like to propose a "modify" operator for XQUF 3.0, which would require only a few modifications to the grammar:

  AndExpr        ::= ModifyExpr ( "and" ModifyExpr )*
  ModifyExpr     ::= ComparisonExpr ( "!!" ComparisonExpr )?
  ComparisonExpr ::= ...

The modify operator works similarly to the XQuery 3.0 map operator (which is why the "!!" token was chosen): the value of the first expression is bound as the context item, and the second expression performs updates on this item. The updated item is returned as the result.

From the implementation point of view, this extension is straightforward:

* It can be derived entirely from the transform expression.
* Error codes are the same, too (XUST0002, XUTY0013).

Here is a small example of how an expression might look before and after:

* Using the transform expression:

  for $e in //item[@status ne 'up-to-date'] 
  let $c :=
    copy $c := $e
    modify insert node <sub/> into $c
    return $c
  return element updated { $c }

* Using the modify operator:

  for $e in //item[@status ne 'up-to-date'] 
  let $c := $e !! insert node <sub/> into .
  return element updated { $c }

I’m looking forward to the discussion.
Comment 1 John Snelson 2013-10-28 10:16:47 UTC
Interesting idea, although I'm not sure I like the way you spell the operator "!!". Maybe just using "modify" as an infix operator would be more clear.
Comment 2 Christian Gruen 2013-10-28 10:22:17 UTC
Yes, I’m open to alternative spellings.

One reason I chose special characters over a keyword is that we would otherwise need to introduce two keywords (the full-text keyword "contains" was changed to "contains text" for the same reason), and something like "<a/> modify node insert node <b/> into ." isn’t that catchy anymore.
Comment 3 Michael Kay 2013-10-28 11:37:25 UTC
We (or the user) could define a function 

modify($n as node(), $f as updating function($n as node())) {
  copy $m := $n
  modify $f($m)
  return $m
}

and then you could write

for $e in //item[@status ne 'up-to-date']
return element updated { modify($e, function($n) {insert node <sub/> into $n}) }

This would be even more attractive if we had a terser syntax for inline functions. In FtanSkrit I used 

{$1 + $2}

for a function that adds its two arguments (i.e. our "function ($n, $m) {$n + $m}"). We've decided to use a bare "{" for maps, but we could still go with something like {= $1 + $2 }, the leading '=' being familiar to spreadsheet users - or even just a leading '=' without the curlies:

for-each($seq, =($1 + 1))

and then Christian's example becomes

for $e in //item[@status ne 'up-to-date']
return element updated { modify($e, =insert node <sub/> into $1)}

I think we should be trying to build new capability by exploiting higher-order functions wherever we can, in preference to inventing new higher-order operators.
Comment 4 Christian Gruen 2013-10-28 13:09:19 UTC
Mike, thanks for the response. I also think that a higher-order approach could offer a comparable solution, provided that the syntax gets more concise at some stage (the FtanSkrit proposal looks compelling). 

As the modification of nodes in main memory has become a very basic operation, judging by our user feedback and our use cases, and as the effort for specifying and implementing the extension is minimal, I still believe it deserves its own operator, as the existing solution is just too clumsy. If a single keyword can be used, would "modify" be the better choice?
Comment 5 Christian Gruen 2013-10-28 23:49:30 UTC
I noticed that "modify" is not a viable alternative, as it would conflict with the existing keyword when used in the transform expression:

  copy $c := <c/> modify insert node <X/> into $c return $c

Other alternatives to "!!" could be the keyword "update" or the special characters "~" or "<<" (the latter indicating that the following expression affects the left-hand operand):

  <c/> !! insert node <X/> into .
  <c/> << insert node <X/> into .
  <c/> ~  insert node <X/> into .
  <c/> update insert node <X/> into .
Comment 6 Christian Gruen 2013-12-04 18:54:11 UTC
A little addition: we believe that the "update" keyword is the best choice. It reads well and connects better to the existing vocabulary of the XQUF:

CURRENTLY:
  for $item in db:open('data')//item
  return copy $c := $item
         modify delete node $c/text()
         return $c

PROPOSED:
  for $item in db:open('data')//item
  return $item update delete node text()
Comment 7 Christian Gruen 2014-01-22 19:55:06 UTC
If the 'update' keyword should cause conflicts, and if we want to avoid special characters, 'update node' or 'modify node' would probably be the most consistent choice.
Comment 8 Innovimax 2014-01-23 00:32:04 UTC
and what about "updated by" ?
Comment 9 Christian Gruen 2014-01-23 09:54:21 UTC
...yes, might be more readable. Another alternative (to align the wording with the existing transform expression) could be "modified by":

  doc('input.xml')//item modified by (
    replace node . with element TEXT { text() }
  )
Comment 10 Jonathan Robie 2014-02-19 11:09:18 UTC
The Working Group agrees. We are adding the following ExprSingle:

TransformExpr ::= PathExpr ("transform" "with" "{" Expr? "}")?
Comment 11 Ghislain Fourny 2014-02-19 11:10:16 UTC
and

SimpleMapExpr ::= TransformExpr ("!" TransformExpr)*
Comment 12 Christian Gruen 2014-02-20 11:32:04 UTC
Thanks for looking at this.

One more question: I was surprised by the addition of the curly brace, as we now have three mandatory tokens. Wouldn't one of the following expressions be syntactically valid and sufficient?

  a) <a/> transform with insert node <b/> into .
  b) <a/> transform { insert node <b/> into . }
Comment 13 John Snelson 2014-02-20 11:36:37 UTC
We need a terminating token for the TransformExpr as the insert expressions are all directly under ExprSingle, which creates ambiguity in operators after the TransformExpr.

I think "$E transform { ... }" would work syntactically.
Comment 14 Christian Gruen 2014-02-20 16:56:16 UTC
Fine. In this case, I would prefer to get rid of the additional keyword. What do others think?
Comment 15 Jonathan Robie 2014-03-14 20:50:02 UTC
At the Face-to-Face in Prague, we decided the following:

DECISION: Adopt the simple TransformExpr as described by bug 23643 and 
modified to use the following syntax:

SimpleMapExpr ::= TransformExpr ("!" TransformExpr)*
TransformExpr ::= PathExpr ("transform" "with" "{" Expr? "}")?

We apparently forgot to record that decision here.
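For illustration, the running example from the original description could be written with the adopted syntax roughly as follows. This is a sketch, assuming (as in the "!!" proposal) that the updated node is bound as the context item inside the braces and that the modified copy is returned; it is a query fragment that needs a context document containing item elements:

```xquery
(: mark every outdated item by inserting <sub/> into a copy of it :)
for $e in //item[@status ne 'up-to-date']
let $c := $e transform with { insert node <sub/> into . }
return element updated { $c }
```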
Comment 16 Christian Gruen 2014-03-14 21:41:54 UTC
I just reopened the issue, as there is already a TransformExpr in XQuery 3.0 Update:

[206] TransformExpr ::= "copy" "$" VarName ":=" ExprSingle (","
      "$" VarName ":=" ExprSingle)* "modify" ExprSingle "return" ExprSingle

We could rename the new expression to ModifyExpr or something similar.

Next, I still wonder whether we really need three tokens...

  "transform" "with" "{"

...or if we could get rid of "with" or "{". In my initial proposal, I suggested placing the expression between AndExpr and ComparisonExpr; would this help?
Comment 17 Michael Dyck 2014-08-18 22:34:36 UTC
(In reply to Christian Gruen from comment #16)
> I just reopened the issue, as there is already a TransformExpr in XQuery 3.0
> Update:
> 
> [206] TransformExpr ::= "copy" "$" VarName ":=" ExprSingle (","
>       "$" VarName ":=" ExprSingle)* "modify" ExprSingle "return" ExprSingle
> 
> We could rename the new expression to ModifyExpr or something similar.

So TransformExpr uses the keyword "modify",
whereas ModifyExpr uses the keyword "transform".

Any chance we could fix that?
Comment 18 Jonathan Robie 2014-08-18 22:40:45 UTC
(In reply to Michael Dyck from comment #17)

> So TransformExpr uses the keyword "modify",
> whereas ModifyExpr uses the keyword "transform".
> 
> Any chance we could fix that?

The names of nonterminals can be changed without changing the language.  We may need WG approval for this, but I think we should do this.
Comment 19 Michael Kay 2014-08-18 22:42:21 UTC
I've argued in the past for the copy/modify expression to be called something other than TransformExpr (but for some reason, I failed to convince people).

When the name of an expression bears no relationship to the keywords used, it makes it very difficult to construct error messages that will be meaningful to the 95% of users who haven't read the W3C spec.
Comment 20 Christian Gruen 2014-08-26 15:36:34 UTC
Maybe the tokens "update" "{" would be a feasible alternative? In that case, we could call the rule UpdateExpr.
Comment 21 John Snelson 2014-10-14 16:30:58 UTC
The WG discussed this at meeting #585 and made the following decision:

Rename the current TransformExpr to CopyModifyExpr, and name the nonterminal for the "transform with" expression TransformWithExpr.
Comment 22 Christian Gruen 2014-12-15 19:50:13 UTC
I have reopened this bug (sorry for my perseverance):

1. First, I would like to quote John Snelson's concerns, which I share:

> [...] I think the precedence for the transform with expression is wrong.
> I propose we move it to between CastExpr and UnaryExpr, so that it is
> situated along with the other textual operators, and so that a simple map
> expression can occur on its left-hand side without parentheses.

2. I'm still wondering whether one or two tokens, such as "transform" "{" (as John proposed in Comment 13), or even simply "transform", would be sufficient. I was reminded of my initial feature request, in which I suggested placing the new expression between AndExpr and ComparisonExpr:

  AndExpr        ::= TransformExpr ( "and" TransformExpr )*
  TransformExpr  ::= ComparisonExpr ( "transform" ComparisonExpr )?
  ComparisonExpr ::= ...

I am pretty sure I have overlooked something, but from the answers given so far I could not explain to myself why...

 a)  X and Y

...is a valid expression whereas it seems that...

 b)  X transform Y

would not be valid. Could someone possibly give me an example that demonstrates the problem? Or does the problem only occur if the TransformExpr is moved further down in the grammar?
Comment 23 John Snelson 2015-02-10 10:45:40 UTC
The working group reconsidered the syntax of the TransformWithExpr and decided not to make a change to it.