This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4285 - [UPD] Replacing optional content
Summary: [UPD] Replacing optional content
Status: CLOSED WONTFIX
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Update Facility (show other bugs)
Version: Working drafts
Hardware: PC Windows XP
: P2 major
Target Milestone: ---
Assignee: Andrew Eisenberg
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-01-29 15:42 UTC by Jonathan Robie
Modified: 2007-05-04 21:27 UTC (History)
0 users

See Also:


Attachments

Description Jonathan Robie 2007-01-29 15:42:48 UTC
Replacing optional content is very awkward, as discussed here:

http://lists.w3.org/Archives/Member/member-query-ultf/2006Feb/0040.html

My preferred solution to this is Version 2 of the fold proposal:

http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2006Oct/0058

Giorgio provides good reasons for preferring Version 2:

http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2006Oct/0064.html
Comment 1 Michael Kay 2007-01-29 16:17:30 UTC
The thing I don't like about this is that it special-cases operations on singletons. We should try to design operations that a general enough to work on sets or sequences.

Is it really that much of a hardship to do a delete first, then an insert?
Comment 2 Don Chamberlin 2007-02-12 20:17:04 UTC
Responding to comment #1 from Michael Kay:

As I understand it, the update spec does not currently provide any combination of expressions that is equivalent to the proposed "fold" expression. Here are two attempts to simulate the effects of the following "fold" expression:

fold <c/> into /a/b

(1) We could try a delete followed by an insert:

delete /a/b,
insert <c/> after /a/b

In this case, the new node <c/> will be inserted just after the position of the deleted b-node, but in nondeterministic order with other nodes that are also inserted after /a/b in the same snapshot. So this approach is not equivalent to the fold expression.

(2) We could try a conditional with a replace-branch and an insert-branch:

if (/a/b)
then replace /a/b with <c/>
else insert <c/> into /a

This approach has a serious problem: the else-branch does not specify the position within the parent where the new node should be inserted. It cannot say
"after /a/b" because in this branch we know that /a/b does not exist. So this approach is also not equivalent to the fold expression. 

Regards,
Don Chamberlin
Comment 3 Michael Kay 2007-02-12 20:41:28 UTC
I don't think I understand the semantics of the proposed fold operation. If my existing element has children Q, T, and G, and I fold in a sequence containing Z, T, and K, what is the resulting sequence? Does it use schema knowledge to know that the Z belongs before the Q and the K belongs after the G? If the schema said that the K element has to come first, would it give me K Z Q T G?

If this is intended to work in the absence of a schema, I fail to see how.

Michael Kay
Comment 4 Don Chamberlin 2007-02-12 20:57:08 UTC
CORRECTS ERRORS IN COMMENT #2 (SORRY)

As I understand it, the update spec does not currently provide any combination of expressions that is equivalent to the proposed "fold" expression. Here are two attempts to simulate the effects of the following "fold" expression:

fold <c/> into /a

(1) We could try a delete followed by an insert:

delete /a/c,
insert <c/> after /a/c

In this case, the new node <c/> will be inserted just after the position of the deleted c-node, but in nondeterministic order with other nodes that are also inserted after /a/c in the same snapshot. So this approach is not equivalent to the fold expression.

(2) We could try a conditional with a replace-branch and an insert-branch:

if (/a/c)
then replace /a/c with <c/>
else insert <c/> into /a

This approach has a serious problem: the else-branch does not specify the position within the parent where the new node should be inserted. It cannot say
"after /a/c" because in this branch we know that /a/c does not exist. So this approach is also not equivalent to the fold expression. 

Regards,
Don Chamberlin
Comment 5 Don Chamberlin 2007-02-12 21:00:48 UTC
As stated, the "fold" proposal fails to address an important issue.

Consider the example

fold <c/> into /a

If /a/c exists, the semantics are equivalent to 
replace /a/c with <c/>
and these semantics are well-defined.

If /a/c does not exist, the semantics are equivalent to
insert <c/> into /a
which causes a new c-element to be inserted into a non-deterministic position within /a. 

But the user will probably want to control the position of the new c-element within its parent. The user may intend the new node to be the first child of its parent, or the last child of its parent, or just before or after some designated sibling. These options are all supported by the insert expression but are missing from the proposed "fold" expression.

The missing functionality could be added to the "fold" expression by allowing the "into $parent" clause to take any of the following forms:

into $parent
as first into $parent
as last into $parent
before $sibling
after $sibling

Consider the following example:

fold <c/> after /a/b

The semantics would be as follows:

if /a/b/../c exists, semantics are equivalent to
replace /a/b/../c with <c/>

If /a/b/../c does not exist, semantics are equivalent to
insert <c/> after /a/b

It is worth noting that specifying "after", "as last", etc. can be done fairly easily with the keyword-syntax approach (Jonathan's Version 1) but not with the functional approach (Jonathan's Version 2).

This comment is not intended to express support or lack of support for the "fold" proposal, but simply to call attention to an issue that needs to be resolved to make the proposal well-specified.

Regards,
Don Chamberlin
Comment 6 Andrew Eisenberg 2007-02-13 15:44:00 UTC
Just thinbking out loud on this topic. The "do fold $children into $parent" seems to be a shorthand for the following (considering only element nodes):

let $matching := $children[name(.) = $parent/*/name()]
let $nonmatching := $children except $matching
return
   ( for $m in $matching
     return do replace ($parent/*[name() = name($m)]) with $m,
     do insert $nonmatching into $parent
   )

As Michael Kay pointed out, this will cause an error if one of the elements being has a sibling with the same name (caught by one of the executions of replace). I suppose that we could delete the 2nd and subsequent duplicates of the matching nodes and execute the replace only on the first copy. 

let $matching := $children[name(.) = $parent/*/name()]
let $nonmatching := $children except $matching
return
   ( for $m in $matching
     return
       ( let $matched := $parent/*[name() = name($m)]
         return
            ( do delete $matched[position() >= 2],
              do replace $matched[1] with $m
            )
       ),
     do insert $nonmatching into $parent
   )

Hmmm. Getting a bit complicated.

As Don has suggested, it could be extended to allow "do fold $children as first into $parent":

let $matching := $children[name() = $parent/*/name()]
let $nonmatching := $children except $matching
return
   ( for $m in $matching
     return do replace ($parent/*[name() = name($m)]) with $m,
     do insert $nonmatching as first into $parent
   )


It could be extended to allow the non-matching nodes to be placed before or after an existing node, using "do fold $children after $node":

let $parent := $node/..
let $matching := $children[name() = $parent/*/name()]
let $nonmatching := $children except $matching
return
   ( for $m in $matching
     return do replace ($parent/*[name() = name($m)]) with $m,
     do insert $nonmatching after $node
   )
Comment 7 Jonathan Robie 2007-02-26 23:15:52 UTC
(In reply to comment #1)
> The thing I don't like about this is that it special-cases operations on
> singletons. We should try to design operations that a general enough to work on
> sets or sequences.
> 
> Is it really that much of a hardship to do a delete first, then an insert?

There may not be a better solution. Optimizers can catch this fairly easily.

Unfortunately, users may try a variety of things to get around the problem, and I'm not sure optimizers will catch all of them, so there is at least an education issue (which could be addressed with a use case or an example in the language book). It's especially confusing if you implement static typing, the data is known to be there, but your system refuses to replace it because of static typing (which is why many implementations won't implement the static typing by the book).

Jonathan
Comment 8 Jonathan Robie 2007-02-26 23:18:22 UTC
(In reply to comment #3)
> I don't think I understand the semantics of the proposed fold operation. If my
> existing element has children Q, T, and G, and I fold in a sequence containing
> Z, T, and K, what is the resulting sequence? Does it use schema knowledge to
> know that the Z belongs before the Q and the K belongs after the G? If the
> schema said that the K element has to come first, would it give me K Z Q T G?
> 
> If this is intended to work in the absence of a schema, I fail to see how.

My intent was indeed to allow implementations to use a schema to determine the right place to put the data. For a view of a relational database, it would simply put data in the corresponding column. This part of the semantics is the reason for using "insert into" rather than specifying position.

Jonathan
Comment 9 Jonathan Robie 2007-02-26 23:24:36 UTC
> (2) We could try a conditional with a replace-branch and an insert-branch:
> 
> if (/a/c)
> then replace /a/c with <c/>
> else insert <c/> into /a
> 
> This approach has a serious problem: the else-branch does not specify the
> position within the parent where the new node should be inserted. It cannot say
> "after /a/c" because in this branch we know that /a/c does not exist. So this
> approach is also not equivalent to the fold expression. 

Actually, I believe this is precisely equivalent to the fold proposal, which says:

"If Parent has a child element or attribute $c with the same name and
node kind as $n, invokes upd:replaceNode($c, $n); if there is no child
element or attribute with the same name and node kind, invokes
upd:insertInto($parent, $n)."

Order is not intended to be specifiable - the reason this is easier in SQL is precisely that (1) order does not need to be specified and (2) the system knows that there is only one column with a given name in a table. If we try to make fold generalize to everything you can find in an arbitrary XML document, we lose the ability to simplify based on these two assumptions. The reason for this proposal is to deal with a specific impedance mismatch that is irritating.

Jonathan
Comment 10 Jonathan Robie 2007-02-27 17:16:26 UTC
(In reply to comment #5)

> It is worth noting that specifying "after", "as last", etc. can be done fairly
> easily with the keyword-syntax approach (Jonathan's Version 1) but not with the
> functional approach (Jonathan's Version 2).

Of course, that's also easily done using delete followed by an insert. Perhaps closing this issue with no action is the best we can do.

Jonathan
Comment 11 Jim Melton 2007-05-04 21:27:56 UTC
The response in comment #10 suggests that you are satisfied with the resolution of this bug.  Therefore, we are marking it CLOSED.