26958 – On not precluding updates in XQuery 3.1

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XQuery 3.1
Version:	Working drafts
Hardware:	PC Linux

Importance:	P2 minor
Target Milestone:	---
Assignee:	Jonathan Robie
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2014-10-02 22:39 UTC by Jonathan Robie
Modified:	2014-12-16 17:33 UTC
CC List:	5 users

See Also:	27001 27040

Description Jonathan Robie 2014-10-02 22:39:31 UTC

At the Hursley meeting, we decided to specify identity as follows in XDM, XPath/XQuery, and F&O:

ACTION A-579-17: Norm to write up maps and arrays in
the XDM in a way that doesn't preclude updates operating
on them (e.g. they have identity even if it's not exposed).

ACTION A-579-18: Jonathan to make sure XPath/XQuery text
does not preclude updates to maps and arrays, e.g. by being
explicit that they might have a non-exposed id.

ACTION A-579-19: Mike Kay to check that F&O does not preclude
updates to maps and arrays, e.g. by being explicit that
they might have a non-exposed id.

Note: There were two action items named A-579-18, and one of them got lost in our action item tracking due to the duplicate name.

I do not think that our LCWDs reflect that decision. On Tuesday's call, we revisited the issue and reached this compromise:

https://lists.w3.org/Archives/Member/w3c-xsl-query/2014Sep/0119.html

<p>This version of the XPath Data Model does not specify whether or
not maps have any identity other than their values. No operations
currently defined on maps depend on the notion of map identity. Other
specifications, for example, the XQuery Update Facility, may make
identity of maps explicit.</p>

There's a problem with this compromise: it results in two sets of behavior that can be distinguished only when updates are implemented. If implementations create and persist maps and arrays between now and then, identical queries can create XDM instances that behave differently when updated.

Suppose we ultimately decide that maps and arrays have identity, which is probably essential to have in-situ update. We do not specify whether an array or map that occurs as a value in another array or map is copied or referenced. That difference can be observed only when updates are available.

An example (in pseudocode, assuming maps and arrays do have identity):

$x := [ 1, 2, 3 ]
$y := { "x" : $x }
$z := { "x" : $x }

Two implementations persist data created with this query. Later, we implement updates for maps, and the value of $y("x") is replaced with the array [2, 4, 6].

What is the value of $z?

I do not believe that the value of $z should be changed, so I think that we should use copy semantics here. Is there a good way to say this without referring to identity?
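
The two readings can be sketched in a language that already has mutable containers. The following Python sketch is only an illustrative model, not XQuery and not any proposed update syntax: Python lists and dicts stand in for arrays and maps, and the function names are hypothetical. It models an in-place change to the array reached through $y("x"), because that is the kind of update in which sharing would become observable.

```python
import copy

def build_by_reference():
    # Assumption: the constructors store a reference to the same array in both maps.
    x = [1, 2, 3]
    y = {"x": x}
    z = {"x": x}
    return y, z

def build_by_copy():
    # Assumption: the constructors copy the array, so each map owns its own members.
    x = [1, 2, 3]
    y = {"x": copy.deepcopy(x)}
    z = {"x": copy.deepcopy(x)}
    return y, z

def update_in_place(m):
    # Stand-in for an in-situ update: replace the members of the array held under "x".
    m["x"][:] = [2, 4, 6]

y_ref, z_ref = build_by_reference()
update_in_place(y_ref)

y_copy, z_copy = build_by_copy()
update_in_place(y_copy)

print(z_ref["x"])   # [2, 4, 6]: the shared array changed under z as well
print(z_copy["x"])  # [1, 2, 3]: z is unaffected
```

Under the copy reading, the answer to "What is the value of $z?" does not depend on what happens to $y, which is the behavior argued for above.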

Elements do have identity:

$x := <i>1 2 3</i>
$y := { "x" : $x }
$z := { "x" : $x }

If this instance is persisted, what should the update semantics be?

Comment 1 Jonathan Robie 2014-10-02 22:40:50 UTC

Note that XQuery Update is not functional, and does not have referential transparency.

As the Haskell Wiki points out, mutable state and referential transparency do not easily mix:

http://www.haskell.org/haskellwiki/The_Monad.Reader/Issue3/Functional_Programming_vs_Object_Oriented_Programming

<quote>
A Haskell program is like an OO program where every object is immutable. All you can do is construct new objects (this makes the term "data constructor" seem more relevant). When you want to change an object, you make a copy of it and replace just the bits you want to change. This isn't as expensive as it sounds, due to the extensive data sharing that immutable objects make possible (the bits of the object that aren't modified aren't copied; you just retain the references to the originals). Also, you avoid any aliasing problems, which is a potential cause of errors in programs that use mutable state.

Haskell also has no notion of object identity. When you construct a value (an instance of a type), the constructor creates a box (pointer) around the value. When you pass this value to functions, you pass the box, so it could be considered a reference. But Haskell has no way of saying "are these the same object?" i.e. are these the same box? All it has is an equality operator (==), which is only concerned with the box contents.

Object systems do have a notion of object identity. This seems to go hand-in-hand with mutable state, because the questions "are these the same object" and "are these the same value" are both relevant.

When you have immutable objects, the question "are these the same object" seems a bit pointless, whereas "are these the same value" is always relevant. "Are these the same object" isn't interesting because the behaviour of your program can't vary with the answer, whereas with mutable state, the behaviour of your program can vary with the answer (this is aliasing).

If you have mutable state then you can't guarantee referential transparency. That's the big deal with Haskell: no mutable state means you can guarantee referential transparency, and that guarantee is a Very Good Thing (all sorts of benefits flow from this).
</quote>

Comment 2 Michael Kay 2014-10-03 11:24:47 UTC

Do we want to build a hierarchic database or a network database? A database which holds maps that can contain other maps but cannot contain references to other maps is basically a hierarchic database, whereas one that can hold references is a network database. An XML database is hierarchic too, and we get around the problems by using implicit (primary-key/foreign-key) relationships to represent the out-of-hierarchy relationships. Personally, from my background in databases and modelling, I find the limitations of hierarchic models very frustrating. A key aim in database work has always been "data independence" (the phrase is in the title of my 1975 PhD thesis), which means independence of the view of the data seen by query users from the arbitrary design decisions made by database designers. XML databases (and hierarchic databases generally) give very poor data independence, and I would hope we could do better. Certainly, I would love it if we could conduct the debate at that level, rather than nit-picking about exactly what we mean by "identity".

(Having said that, I'm really not sure that to achieve what I think is needed, I would want to start from here. I'm afraid that the XML/JSON hybrid database we seem to be heading towards is a dreadful mess, at least as bad as the XML/relational or relational/object hybrid databases. Taking two reasonably clean but very different data models and crunching them together is not something, in my view, that in the past has ever produced a thing of beauty.)

Comment 3 Jonathan Robie 2014-10-03 13:44:16 UTC

(In reply to Michael Kay from comment #2)

Our requirement is to support in-situ updates of maps and arrays, but I agree that thinking of this in terms of the databases that need it is helpful.

I'd like to rely on database models already in use in XML databases, JSON-based NoSQL databases, and relational databases. In most places I know of, data is stored as either JSON or XML, but not as a hybrid of both. I want to have databases that can store collections of XML documents or JSON objects, and I want to be able to query and use both kinds of data.  They may well be different physical stores.

In MongoDB and most similar databases, each JSON object is stored independently of other objects; a given map or array cannot be contained by two different maps or arrays. I believe this is also true of JSON support in SQL. This also mirrors what would happen in a naive file-based implementation if we serialize JSON and allow it to be updated. If we did anything different, I just don't think many vendors would implement our model. JSON does not have pointers; if you need cross-references among maps, you can do it by creating a GUID to represent the identity of a map, and using the same GUID for references to the map. I do not think we should try to go beyond this. Let's keep it simple.
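
A minimal sketch of the GUID approach, in Python rather than in any particular database's API. The in-memory `store`, the `save` and `load` helpers, and the `_id` and `author_ref` field names are all hypothetical; the only point is that each object is serialized independently and cross-references are ordinary values.

```python
import json
import uuid

# Each top-level object is stored on its own, as in a document store.
store = {}

def save(obj):
    # Assign a GUID that represents the object's identity and store a serialized copy.
    guid = str(uuid.uuid4())
    store[guid] = json.dumps(dict(obj, _id=guid))
    return guid

def load(guid):
    # Parsing always yields a fresh object; nothing is shared between loads.
    return json.loads(store[guid])

author_id = save({"name": "A. Author"})
book_a = save({"title": "First", "author_ref": author_id})
book_b = save({"title": "Second", "author_ref": author_id})

# Updating the author touches exactly one stored object...
author = load(author_id)
author["name"] = "A. N. Author"
store[author_id] = json.dumps(author)

# ...and both books see the change when they resolve the reference.
names = [load(load(b)["author_ref"])["name"] for b in (book_a, book_b)]
print(names)
```

No map is contained in two places here, so the store never has to decide whether a nested value is shared.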

I'll address the issue of XML contained in JSON in the next comment.

Comment 4 Jonathan Robie 2014-10-03 14:02:54 UTC

(In reply to Michael Kay from comment #2)

> (Having said that, I'm really not sure that to achieve what I think is
> needed, I would want to start from here. I'm afraid that the XML/JSON hybrid
> database we seem to be heading towards is a dreadful mess, at least as bad
> as the XML/relational or relational/object hybrid databases. Taking two
> reasonably clean but very different data models and crunching them together
> is not something, in my view, that in the past has ever produced a thing of
> beauty.)

I think the biggest need is to be able to support both XML stores and JSON stores. In my world, I have a lot of simple object data represented as JSON, often created by object persistence from JavaScript or Java objects, and I also have XML and HTML documents. I want to be able to use this data together, but I don't want to convert one into the other in order to do so.

Our model already says that XML nodes cannot contain maps or arrays, but maps and arrays can contain XML, as in the second example from the original comment:

$x := <i>1 2 3</i>
$y := { "x" : $x }
$z := { "x" : $x }

That raises three questions for a database vendor.

1. Can I just refuse to store a hybrid structure like that?

Our languages let you create this structure, but you can imagine that some vendors would not want to deal with it, because it adds complexity and may well not be a mainstream case.

2. If I store this structure, do I need to maintain the identity of the element so that updates to $x are propagated across $y and $z?

That would actually be a useful way to create a simple hierarchy of metadata to describe XML documents, and you could imagine that someone would want to do that. It also matches the run-time semantics we probably want in our languages, where the identity of elements is exposed and can be tested. But I think it would be fairly difficult with some existing data stores.  And it implies semantics that would not survive serializing and parsing the data (but so do our languages).

I don't think we want the in-memory semantics to differ from the semantics of stored instances. Creating new copies of XML nodes when persisting the data would make this a lot easier to implement in some environments, but it would force us to change the semantics of our languages in ways that lose identity.

This is the most complicated case, and arguably not the highest value case.  I'm inclined to find some way to say that it's OK to decide not to store such an instance, but if you do, persisting it should not change the identity of the XML nodes.

Comment 5 Ghislain Fourny 2014-10-06 09:07:00 UTC

Hi,

First, I think it is very good design that the core (XQuery 3.1) spec does not expose the identity of maps and arrays (meaning, no query allows a user to tell).

Second, I agree with Jonathan's point on document stores. When we designed JSONiq, we had precisely the document store (more precisely, MongoDB) use case in mind, and therefore used copy semantics in constructors, which considerably simplifies read/write to the underlying store. In terms of usability, it proved to be a good decision and I would make the exact same decision again without hesitation (for this very use case, of course). The drawback -- but it is the price to pay to achieve this usability -- is that it is hard to completely optimise copying away when it is not needed (e.g., when the child is only used once, which happens quite often in memory).

In general, I feel tension between two visions of the usage of maps and arrays. These two visions have been on top of our minds, I think, from the beginning of the design of the 3.1 spec. I believe that this is accurately described by Mike (network vs. hierarchy). It is important to note that the two corresponding use cases (1. identity- and type-preserving maps in memory, 2. storage as a document in a DB) are both equally important and needed by users (I think).

Perhaps the decision could be left to the design of the Update Facility 3.1, i.e. XQuery 3.1 also does not preclude the choice between copying or not copying?

I do think both use cases could potentially be supported by a static context property (in the Update Facility) that tells whether to copy or not. This property could be modified, like any other, (i) at the implementation level, (ii) at the module level and (iii) at the decision level.

I hope it helps.

Kind regards,
Ghislain

Comment 6 Ghislain Fourny 2014-10-06 09:09:20 UTC

I meant (iii) at the expression level.

But I think it is a nice and meaningful lapsus :-)

Comment 7 Jonathan Robie 2014-10-06 13:47:52 UTC

(In reply to Ghislain Fourny from comment #5)

> Perhaps the decision could be left to the design of the Update Facility 3.1,
> i.e. XQuery 3.1 also does not preclude the choice between copying or not
> copying?
> 
> I do think both use cases could potentially be supported by a static context
> property (in the Update Facility) that tells whether to copy or not. This
> property could be modified, like any other, (i) at the implementation level,
> (ii) at the module level and (iii) at the decision level.


Or perhaps we could leave this choice to the persistence layer, which is outside of what our specs actually define, but which we can talk about in notes and guidelines.

I don't think we need to support shared sub-arrays or shared sub-maps.  As long as that is not needed, this might work:

1. Maps and arrays support in-situ update (which can be represented in PULs). This implies identity.
2. Construction uses copy semantics for maps and arrays, but not for nodes.
3. A persistence layer can choose whether to create copies of nodes found in maps and arrays instead of preserving their identity, or refuse to store a map or array that contains nodes. Ideally, I would like this to be documented; could we make it implementation-defined?  I'm not sure we can, because I'm not sure that this is actually in the scope of what our languages define.
4. Updates work the same way on any XDM.  Only #3 distinguishes implementations.
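
Points 1 and 2 can be modelled roughly as follows. This is a Python sketch under stated assumptions, not spec text: the `Node` class and the `construct` function are hypothetical stand-ins for XML nodes and for map and array constructors.

```python
class Node:
    """Hypothetical stand-in for an XML node: it has identity and is never copied here."""
    def __init__(self, name):
        self.name = name

def construct(value):
    # Copy maps and arrays recursively; keep nodes (and atomic values) by reference.
    if isinstance(value, dict):
        return {k: construct(v) for k, v in value.items()}
    if isinstance(value, list):
        return [construct(v) for v in value]
    return value

element = Node("e")
array = [1, 2]
map1 = construct({"array": array, "element": element})
map2 = construct({"array": array, "element": element})

map1["array"].append(3)  # stand-in for an in-situ update on map1's own copy

print(map2["array"])                       # [1, 2]: the update did not leak into map2
print(map1["element"] is map2["element"])  # True: both maps hold the same node
```

Point 3 is about what a store does when such a structure is persisted, which this in-memory sketch deliberately leaves out.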

Comment 8 C. M. Sperberg-McQueen 2014-10-07 18:23:41 UTC

I am generally sympathetic to Michael Kay's request that we conduct our discussion in terms of data independence and not by splitting hairs over the meaning of terms. I apologize, therefore, for commenting solely on the question of terminology and not on the questions of design. My excuse is that I can't contribute to any design discussion if I cannot understand what people are saying, and the use of the unqualified term "identity" to mean solely "persistent identity across mutation or update" (instead of what I understand "identity" to mean) makes it very hard for me to follow some of the discussion here. Also, since I seem to be responsible for making JR self-conscious about his usage of the term, I would like to try to show that a less misleading usage is possible.

JR asks, in the initial description of the issue:

I do not believe that the value of $z should be changed, so
I think that we should use copy semantics here. Is there a
good way to say this without referring to identity?

Yes, there are plenty of ways to say it without any use of the term "identity". There are also plenty of ways to say it that use the term "identity" in its conventional English sense, without any notion that "identity" applies only to complex mutable objects and does not apply to (say) the integers.

By "identity" I believe normal English usage means either (a) similarity among distinct objects (as in "identical twins") or (b) the property of being itself and being distinct from other things. We really do not want sense (a) here or elsewhere. In sense (b), every thing which we can identify necessarily has "identity"; saying that maps, arrays, and elements have identity, therefore, is true but not particularly helpful, since it doesn't help distinguish them from other constructs in our data model or our languages. What is at issue here, I think, is that we envisage having operators whose results depend only on the identity of the maps, arrays, or elements to which the operators are applied, or (roughly the same thing in different words) we envisage having operators which expose the identity of maps and arrays in much the same way that 'is' and '<<' and '>>' expose the identity of nodes.

To test my claim that we can express what we need to express without using the term "identity" in the ways I continue to object to, let me suggest wordings for some sentences which, I believe, accurately convey the intended meaning.

- For "Suppose we ultimately decide that maps and arrays have identity," read "Suppose we ultimately decide to expose the identity of maps and arrays".

- For "(in pseudocode, assuming maps and arrays do have identity)" read "(in pseudocode)".

- For "Elements do have identity" read "Elements have node identity".

- For "creating a GUID to represent the identity of a map" read "creating a GUID to represent the identity of a map" (i.e. no change is needed).

- For "to change the semantics of our languages in ways that lose identity", I do not know what to write, because I'm not sure what's being said.

- For "This implies identity" read, perhaps, "This implies some sort of identity across updates".

None of the references to object identity, preserving identity, exposing identity, maintaining identity, or changing the identity of nodes needs revision, because all of them make perfect sense when "identity" is understood as the property of things which makes them identical to themselves and different from other things.

Comment 9 Michael Kay 2014-10-07 20:02:34 UTC

I think the key feature of the property we have been calling "identity" is that it is associated with things that have a life-history: a creation event, often followed by mutation, and finally destruction. For such things, identity is established by the creation event, and two things with different history are not identical even if their properties are otherwise the same. Integers do not have a life-history; documents do. The corollary is the existence of constructors which are not pure functions because each invocation returns a distinct thing.
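
The life-history reading can be illustrated with a small Python sketch. The `Document` class is hypothetical, and Python's `is` operator is used only as a convenient way to observe that two creation events produce two distinct things.

```python
class Document:
    """Hypothetical stand-in for a thing with a life-history."""
    def __init__(self, title):
        self.title = title  # the creation event establishes a new identity

a = Document("report")
b = Document("report")

same_properties = a.title == b.title  # True: the properties are the same
same_thing = a is b                   # False: two creation events, two things

# Mutation is part of one history only.
a.title = "revised report"
print(same_properties, same_thing, b.title)
```

The constructor is not a pure function in this sense: calling it twice with the same argument yields results that a later mutation can tell apart.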

Comment 10 Jonathan Robie 2014-10-07 21:20:13 UTC

(In reply to Michael Kay from comment #9)
> I think the key feature of the property we have been calling "identity" is
> that it is associated with things that have a life-history: a creation
> event, often followed by mutation, and finally destruction. For such things,
> identity is established by the creation event, and two things with different
> history are not identical even if their properties are otherwise the same.
> Integers do not have a life-history; documents do. The corollary is the
> existence of constructors which are not pure functions because each
> invocation returns a distinct thing.

Yes, I agree.

(In reply to C. M. Sperberg-McQueen from comment #8)

> What is at issue here, I think, is that we envisage having
> operators whose results depend only on the identity of the maps, arrays, or
> elements to which the operators are applied, or (roughly the same thing in
> different words) we envisage having operators which expose the identity of
> maps and arrays in much the same way that 'is' and '<<' and '>>' expose the
> identity of nodes.

Not for me.  I don't need those operators at all; I just need to be able to do in-situ updates on maps and arrays.
 
> To test my claim that we can express what we need to express without using
> the term "identity" in the ways I continue to object to, let me suggest
> wordings for some sentences which, I believe, accurately convey the intended
> meaning.

Could we open a separate bug for wording related to identity?  It's a good thing to discuss, but I think it's distinct from the requirements we are trying to satisfy for in-situ updates, and it can get detailed.

Comment 11 C. M. Sperberg-McQueen 2014-10-07 22:22:17 UTC

Michael Kay is quite right that the property of object identity is necessary to distinguish between objects with different histories.  Conversely, we can talk about objects having different histories if and only if they are different objects.  Since the determination of identity or non-identity of things with similar structures and values requires more care from language designers, and more attention from language users, than the identity or non-identity of immutable and/or history-less things, it is not surprising that the term "identity" often pops up in discussions of such things; there is no need, however, to depart from the normal meaning of the word "identity" in order to discuss them.  In particular, there is no advantage in trying to restrict the term "identity" to mutable objects or objects which can have different histories; it only confuses things.

Jonathan Robie seems to have understood my reference to operators which distinguish things based solely on their identities as a proposal for some relatively specific abstract syntax; it was not.  We envisage having a language in which some constructs have observable effects which depend on the identity or non-identity of some things.  If we are not going to have such constructs, then how will anyone tell whether or not the language has in-situ updates?

JR suggests opening "a separate bug for wording related to identity" and appears to want to regard comment 8 as not relevant to the bug.  But comment 8 is merely responding to a question asked towards the bottom of the original bug report.

And (as I am getting a little tired of pointing out) it's very difficult to have a useful discussion if some participants insist on using terms in special meanings, without definitions, even after requests for clearer wording, or definitions, have been made.

Comment 12 Jonathan Robie 2014-10-08 19:32:24 UTC

(In reply to Michael Kay from comment #2)
> Certainly, I would love it if we could conduct the debate at that level,
> rather than nit-picking about exactly what we mean by "identity".

I agree.

In this bug, I am not going to enter a debate on terminology, so I opened Bug 27001 to discuss the terminology we use surrounding identity.

Comment 13 Liam R E Quin 2014-10-27 21:24:08 UTC

As of F2F (meeting 585)

<jrobie> Jonathan would propose:
<jrobie> 1. map and array constructors preserve nodes, and do not copy them
<jrobie> 2. map and array constructors make copies of maps and arrays
<jrobie> 3. when you persist a map or array that contains nodes, it is
implementation defined whether the identity of nodes is preserved
<jrobie> 4. the distinction between "having identity" and "having the illusion
of identity for the sake of in-situ updates" is major editorial, but editorial,
it's a question of which abstraction is most convenient for our description

DECISION: adopt Jonathan's 4 points on map and array constructors from
meeting A-585 agendum 2.1, to reference nodes and otherwise copy, as guidance
to the editors.

Comment 14 Michael Kay 2014-10-27 23:20:35 UTC

I do not want F+O discussions of operators such as map:merge to contain any mention of concepts like "copying" a map or array. It's a meaningless concept at that level. Specs should be testable, and there is no way of testing whether map:merge has copied a map or array or not, therefore the spec should have nothing to say on the subject. Any discussion of identity-based semantics for maps and arrays must be confined solely to the Update specs, plus if necessary a permissive enabling statement in the data model to legitimize it.

I don't think the issue is resolved by leaving it to the editors, since I suspect that different editors will go off in different directions. Therefore re-opening.

Comment 15 Michael Kay 2014-10-28 00:04:01 UTC

If I could also add what is intended as a constructive remark on the original subject of this bug report ("not precluding updates") I think it would be very helpful if we had a concrete description of what kind of thing it is that we are intending not to preclude. It's become clear in recent weeks as we have discussed different models of trees and graphs that there are many different possibilities of what might happen once you allow maps to be both mutable and referenceable, and it seems that people are happy to preclude some of these models and not others.

So I would suggest (again) that the best way of not precluding any particular future development is to say nothing until we have designed and reached consensus on that future development. Otherwise we're being asked to sign a blank cheque. Just as we're being told that agreeing to the requirements document saying that we "would not preclude updates" means we have to put something in the data model, we are then going to be told that a carefully muted form of words in the data model means we agreed to some particular kind of database system becoming within the scope of our specs.

If we had a concrete proposal on the table for a spec that involved updateable maps and arrays, we would at least have some idea of what we are trying not to preclude.

Comment 16 Jonathan Robie 2014-11-04 20:57:54 UTC

(In reply to Michael Kay from comment #14)
> I do not want F+O discussions of operators such as map:merge to contain any
> mention of concepts like "copying" a map or array. It's a meaningless
> concept at that level. 

They already do that, and even specify most of the details.  For instance:

<quote>
17.1.1 map:merge
Creates a new map that combines entries from a number of existing maps.
</quote>

<quote>
17.1.8 map:remove
Constructs a new map by removing an entry from an existing map
</quote>

Sometimes this is a little confused:

<quote>
17.1.6 map:put
Creates a map that adds a single entry to an existing map, or replaces a single entry in an existing map.
</quote>

Does it really "replace a single entry in an existing map", or does it create a new map? Later text clarifies that a new map is created.

<quote>
The function map:put returns a new map. The new map contains all entries from the supplied $map, with the exception of any entry whose key is $key, together with a new entry whose key is $key and whose associated value is $value.
</quote>

> Specs should be testable, and there is no way of
> testing whether map:merge has copied a map or array or not, therefore the
> spec should have nothing to say on the subject. 

I believe the above assertions are testable without updates.  The level of clarity we need for updates is also needed for these functions.

For instance, variable references clearly show us whether a new copy has been made by map:merge, map:put, or map:remove. Some examples:

declare variable $element := <e/>;
declare variable $map := map { "one" : 1, "two" : 2 };
declare variable $array := [ 1, 2 ];

* Query 1 - identity of elements placed in a map

let $map1 := map { "value" : 1, "map" : $map, "array" : $array, "element" : $element }
let $map2 := map { "value" : 1, "map" : $map, "array" : $array, "element" : $element }
return $map1("element") is $map2("element")

If this returns 'true', then we have the same element in each case, and the map constructor does not create a new node with a new identity, as element constructors do.

* Query 2 - does map:put return a new map or not?

let $map1 := map { "value" : 1, "map" : $map, "array" : $array, "element" : $element }
let $map2 := map:put($map1, "color", "white")
return $map1("color")

The reference to $map1 returns the empty sequence, which shows that map:put did not modify $map1 in place: $map2 is a new map.

* Query 3 - does map:merge return a new map or not?

let $map1 := map { "value" : 1, "map" : $map, "array" : $array, "element" : $element }
let $map2 := map:merge(($map1, map:entry("color", "white")))
return $map1("color")

The reference to $map1 returns the empty sequence, which shows that map:merge did not modify $map1 in place: $map2 is a new map.

* Query 4 - does map:remove return a new map or not?

let $map1 := map { "value" : 1, "map" : $map, "array" : $array, "element" : $element }
let $map2 := map:remove($map1, "value") 
return $map1("value")

The reference to $map1 still returns 1, which shows that map:remove did not modify $map1 in place: $map2 is a new map.
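
The behaviour that Queries 2 and 4 test for can be mimicked in Python as a rough model. The `map_put` and `map_remove` functions below are hypothetical stand-ins, not the F&O functions, and the shallow copy made by `dict(m)` is an assumption of the sketch: without updates, nothing in these checks can tell whether nested values are shared or copied.

```python
def map_put(m, key, value):
    # Return a new map; the argument is left untouched.
    result = dict(m)
    result[key] = value
    return result

def map_remove(m, key):
    # Return a new map without the given key; the argument is left untouched.
    return {k: v for k, v in m.items() if k != key}

map1 = {"value": 1}
map2 = map_put(map1, "color", "white")
map3 = map_remove(map1, "value")

print(map1.get("color"))  # None: map1 has no "color" entry
print(map2["color"])      # white
print(map1["value"])      # 1: still present after map_remove
```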

Comment 17 Jonathan Robie 2014-11-04 21:28:04 UTC

(In reply to Michael Kay from comment #15)
> If I could also add what is intended as a constructive remark on the
> original subject of this bug report ("not precluding updates") I think it
> would be very helpful if we had a concrete description of what kind of thing
> it is that we are intending not to preclude. 

I assume that this is a response to comment #13, but I can't tell what part of comment #13 you are objecting to.

We intend not to preclude in-situ updates of deeply nested structures using an extension of the update specification.  We want to specify the results of these updates as clearly as the examples in comment #16.

One example of this is shown in the use cases - the internal version shows it at www.w3.org/XML/Group/qtspecs/requirements/xquery-31/html/Overview.html#update.example

And we want to be able to do this for deeply nested structures, using the basic model of the update spec: identify a target for the update with an expression, identify the update operation, the operation is placed on a pending update list.
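
As a rough illustration of that basic model, and not of the update specification's actual primitives or syntax, here is a Python sketch. The `PendingUpdateList` class, its `add` and `apply` methods, and the operation names are all hypothetical.

```python
class PendingUpdateList:
    """Hypothetical model: updates are recorded first and applied later."""
    def __init__(self):
        self._primitives = []

    def add(self, target, key, operation, value=None):
        # Record the update; nothing is changed yet.
        self._primitives.append((target, key, operation, value))

    def apply(self):
        # Apply all recorded updates in one step.
        for target, key, operation, value in self._primitives:
            if operation == "replace":
                target[key] = value
            elif operation == "delete":
                del target[key]
            else:
                raise ValueError(f"unknown operation: {operation}")
        self._primitives.clear()

doc = {"order": {"items": [{"sku": "a", "qty": 1}, {"sku": "b", "qty": 2}]}}
pul = PendingUpdateList()

# 1. Identify the target with an expression. 2. Name the operation. 3. Queue it.
target = doc["order"]["items"][1]
pul.add(target, "qty", "replace", 5)
pul.add(doc["order"]["items"][0], "sku", "delete")

before = doc["order"]["items"][1]["qty"]  # still 2: nothing applied yet
pul.apply()
after = doc["order"]["items"][1]["qty"]   # 5 once the list is applied
print(before, after)
```

The sketch only works because the queued `target` is the very map nested inside `doc`, not a copy of it, which is the property the discussion of constructors and copying is trying to pin down.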

> It's become clear in recent
> weeks as we have discussed different models of trees and graphs that there
> are many different possibilities of what might happen once you allow maps to
> be both mutable and referenceable, and it seems that people are happy to
> preclude some of these models and not others. 

I don't know if you were able to listen in by phone to any of the face-to-face discussion on this point, but comment #13 was the result of a relatively long discussion on this question.

We do need to make design decisions. Every design decision precludes some possibilities. But we tried to keep this relatively open.

> So I would suggest (again) that the best way of not precluding any
> particular future development is to say nothing until we have designed and
> reached consensus on that future development. Otherwise we're being asked
> to sign a blank cheque. 

I really do not think that comment #13 amounts to a blank check.  I wish you could have been there for the entire discussion that led to this decision.

> Just as we're being told that agreeing to the
> requirements document saying that we "would not preclude updates" means we
> have to put something in the data model, we are then going to be told that a
> carefully muted form of words in the data model means we agreed to some
> particular kind of database system becoming within the scope of our specs.

I think that's equivalent to precluding updates on nested maps and arrays.

We had to specify that element constructors create new copies of nodes precisely to be clear on the kinds of update semantics that we're discussing here.  And we also need to specify whether they create new copies of maps or arrays if we want to be able to support interoperable updates.

> If we had a concrete proposal on the table for a spec that involved
> updateable maps and arrays, we would at least have some idea of what we are
> trying not to preclude.

I hope we do not need a complete proposal for the 3.1 update specification before our current 3.1 specifications are allowed to progress.

I had thought this was relatively clear:

We want to do in-situ updates using the same basic model that the update specification uses for nodes: identify a target for the update with an expression, identify the update operation, the operation is placed on a pending update list.

That means that we need clarity on when map and array constructors or functions create copies, because otherwise we cannot specify the effects of updating a map or array.

What else do you believe needs to be specified to have sufficient clarity to progress?
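
As an illustration only, the ambiguity described above can be sketched in Python rather than XQuery. Everything here is an assumption made for the sketch: `construct`, `apply_updates`, and `run` are hypothetical names, Python dicts stand in for maps, and the list of `(target, key, value)` tuples is a simplified stand-in for a pending update list. It shows that the same queued update gives different results depending on whether the constructor copied the nested map.

```python
import copy

def construct(inner, copy_on_construct):
    """Build an outer map whose 'child' entry holds inner, either copied or shared."""
    return {"child": copy.deepcopy(inner) if copy_on_construct else inner}

def apply_updates(pending):
    """Apply a simplified pending update list of (target, key, value) entries."""
    for target, key, value in pending:
        target[key] = value

def run(copy_on_construct):
    inner = {"a": 1}
    outer = construct(inner, copy_on_construct)
    # The update targets the original inner map and is queued, not applied at once.
    pending = [(inner, "a", 2)]
    apply_updates(pending)
    return outer["child"]["a"]

copied_result = run(True)   # constructor copied: outer does not see the update
shared_result = run(False)  # constructor shared: outer sees the update
```

Without an update operation the two constructions are indistinguishable, which is why the question only becomes visible once updates are considered.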

Comment 18 C. M. Sperberg-McQueen 2014-11-04 21:34:26 UTC

With respect to comment 16, I do not think the argument "the text already does X, therefore we should not entertain objections to the text doing X" is a sound one, even if the text does in fact do X.

All of the occurrences of 'Creates a new map ...' and similar could easily be replaced, without loss of clarity, by 'Returns a map ...' or 'Evaluates to a map ...'.  In the case where the map passed in as an argument is deep-equal() to the map returned, the current wording can (as JR points out) be read as dictating the creation of a new map rather than potentially returning the same map (assuming that maps have intensional identity and not extensional identity).

If the existence of words in the current draft of a spec is taken as committing the WG to every view of the world that can be read out of things, and requires extraordinary measures to change, then the result will be to make WG members inordinately cautious about accepting draft wording.

Comment 19 Jonathan Robie 2014-11-04 21:45:09 UTC

(In reply to C. M. Sperberg-McQueen from comment #18)
> With respect to comment 16, I do not think the argument "the text already
> does X, therefore we should not entertain objections to the text doing X" is
> a sound one, even if the text does in fact do X.

Huh?  I think the argument is more along these lines:

A. "I don't think we should be required to change the text to do X"
B. "But the text already does X"

> All of the occurrences of 'Creates a new map ...' and similar could easily
> be replaced, without loss of clarity, by 'Returns a map ...' or 'Evaluates
> to a map ...'.  In the case where the map passed in as an argument is
> deep-equal() to the map returned, the current wording can (as JR points out)
> be read as dictating the creation of a new map rather than potentially
> returning the same map (assuming that maps have intensional identity and not
> extensional identity).

Whatever wording we use, we need to know the result of the kinds of queries I provided in comment #16. I don't think we have to change the wording significantly to do so, but I do think we need this level of clarity if we want to know the results of existing functions and operations defined in our language.

Comment 20 Michael Kay 2014-11-04 21:49:46 UTC

Concerning comments #16 and #18, the specification frequently uses terms such as "creates", "constructs", and "new" when referring to objects such as strings and sequences that have no existential identity; I think the main purpose of this is to cater to a readership that is accustomed to sequences and even strings being mutable. The language used for maps and arrays has continued this tradition, perhaps even more emphatically to avoid any danger that people familiar with other languages will interpret operations such as remove() as mutations. I quite agree that it's not really appropriate; it would be much better to say "Returns a sequence that..." or "Returns a string that... " or "Returns a map in which...". Unfortunately getting rid of the term "constructor functions" is rather more difficult.

Comment 21 Jonathan Robie 2014-11-04 22:54:58 UTC

(In reply to Michael Kay from comment #20)

Could you be more concrete? What other language would you propose that:

(a) clearly specifies the results of the queries shown in comment #16, and
(b) does not "preclude in-situ updates analogous to updates in the XQuery Update Facility"?

<quote>
The map feature MUST NOT preclude in-situ updates analogous to
updates in the XQuery Update Facility.

Arrays MUST NOT preclude in-situ updates analogous to updates in
the XQuery Update Facility.
</quote>

Comment 22 Michael Kay 2014-11-04 23:29:00 UTC

In response to comment #21, I do not believe any change to F+O is needed.

(a) it's clear that maps are currently immutable and that operations produce new maps rather than making in-situ modifications

(b) the specification does not preclude changes being made in the future. None of our specifications do.

However, words like "creates", "constructs", and "new" might be better avoided because they could be misinterpreted as saying something about identity, when this is not intended. 

For example, the summary of fn:remove is currently

Returns a new sequence containing all the items of $target except the item at position $position.

which might be better phrased as

Returns a sequence containing all the items of $target except the item at position $position.

(Saying it returns a "new sequence" is a bit like saying that 2+2 returns a "new integer")

Similarly, the summary of map:remove is currently

Constructs a new map by removing an entry from an existing map

which might be better phrased as

Returns a map containing all the entries of an existing map $map with the exception of the entry that has key $key.
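
The "Returns a map ..." phrasing describes a pure function of its arguments. A minimal Python sketch of that reading follows; `map_remove` is a hypothetical stand-in for map:remove and a dict stands in for a map, so this is an analogy, not the F+O definition.

```python
def map_remove(m, key):
    """Return a map with every entry of m except the one whose key is `key`."""
    return {k: v for k, v in m.items() if k != key}

original = {"a": 1, "b": 2}
result = map_remove(original, "a")
# `original` is left untouched; nothing here says whether `result` is "new"
# in any sense beyond having the stated entries.
```

The description is complete without any statement about identity, which is the point being made about the wording.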

Comment 23 Jonathan Robie 2014-11-05 16:16:41 UTC

Back to the comment that reopened this issue.

(In reply to Michael Kay from comment #14)
> I do not want F+O discussions of operators such as map:merge to contain any
> mention of concepts like "copying" a map or array. It's a meaningless
> concept at that level. 

I think map:merge and map:remove are already quite clear.

As comment #13 points out, this is not clear for the values of map constructors and array constructors. Comment #13 says how this should be clarified.

I believe the same clarifications need to be made for functions that create maps in which a map or an array is the value of a key/value pair - that's the only clarification that is needed.

> Specs should be testable, and there is no way of
> testing whether map:merge has copied a map or array or not, therefore the
> spec should have nothing to say on the subject. Any discussion of
> identity-based semantics for maps and arrays must be confined solely to the
> Update specs, plus if necessary a permissive enabling statement in the data
> model to legitimize it.

In an implementation that does not persist maps and arrays, these assertions are not testable, and will not be even with the update specification.  So this does not impede any optimization for such an implementation.

In an implementation that does persist maps and arrays, this clarifies the semantics of what it means to do an in-situ update that is analogous to in-situ updates for XML. Any implementation that persists instances between now and the release of the update specification needs to get this right, now, or else reconstruct all data once we clarify the answer.

The XQuery / XPath 3.1 specifications have three requirements:

1. For the sake of optimizability, a map SHOULD NOT expose identity
   via the is, <<, >>, union, intersect, or except operators, or any
   operation that exposes document order.

2. The map feature MUST NOT preclude in-situ updates analogous to
   updates in the XQuery Update Facility.

3. Arrays MUST NOT preclude in-situ updates analogous to updates in
   the XQuery Update Facility.

The ambiguity we are discussing means that #2 and #3 are not met for data created by these specifications and then persisted. This will affect primarily implementations interested in persisting JSON, and being able to update it in the future. People are doing this now, at least with JSONiq, and will presumably do the same with XQuery 3.1.

We can make sure we got it right by (1) imagining that the 'is' operator is available for maps and arrays, and making sure that we understand what the result would be, or (2) imagining that an in-situ update is made, and making sure that the result is clear. 
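
Thought experiment (1) can be sketched in Python, where an identity operator does exist. This is an analogy under stated assumptions: Python's `is` stands in for an imagined 'is' on maps, `==` stands in for a value comparison such as deep-equal(), and `copy.deepcopy` stands in for "a new map with the same values".

```python
import copy

inner = {"a": 1}
shared = {"child": inner}                  # constructor kept the same map
copied = {"child": copy.deepcopy(inner)}   # constructor made a new map

# Value comparison cannot tell the two constructions apart.
same_values = shared["child"] == copied["child"]

# An identity comparison can.
shared_is_inner = shared["child"] is inner
copied_is_inner = copied["child"] is inner
```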

If you insist that all assertions must be testable in this version of the specification, we could (a) relax the SHOULD NOT requirement so that the 'is' operator is available to make the assertions testable and we can meet our MUST NOT requirements, or (b) discuss how to manage the progression of our documents so that we're not inviting people to persist data created with our specifications without being clear what happens if it is updated.  But I think both of these are costly options, and I don't think it's costly to be clear in our current specifications.

Comment 24 Jonathan Robie 2014-11-05 16:19:42 UTC

(In reply to Michael Kay from comment #22)

> (b) the specification does not preclude changes being made in the future.
> None of our specifications do.

But people are creating and persisting data now.  If they ever plan to update that, there's a big gotcha due to an ambiguity in our specification. The Working Group has agreed, at least twice, to fix that gotcha.

I don't think we want to publish a specification with a warning that says "please don't persist this data, because we decided not to clarify some things that will bite you if you do updates in the future".

And I don't think that would satisfy our requirements.

Comment 25 Michael Kay 2014-11-06 09:04:06 UTC

>If they ever plan to update that, there's a big gotcha due to an ambiguity in our specification. The Working Group has agreed, at least twice, to fix that gotcha.

I don't believe there is any ambiguity in our specification, and I believe that the only thing we have agreed regarding updating of maps and arrays is that we will not do anything that prevents us deciding to do this in the future.

Comment 26 Jonathan Robie 2014-11-06 18:05:03 UTC

(In reply to Michael Kay from comment #25)
> >If they ever plan to update that, there's a big gotcha due to an ambiguity in our specification. The Working Group has agreed, at least twice, to fix that gotcha.
> 
> I don't believe there is any ambiguity in our specification, and I believe
> that the only thing we have agreed regarding updating of maps and arrays is
> that we will not do anything that prevents us deciding to do this in the
> future.

Comment #13 is the last Working Group decision on this. Is there new information that should require us to revisit that decision?

We have discussed and agreed to this requirement at least a handful of times, starting with the requirements document. There is no ambiguity as long as we preclude updates on the instances that are created.  But if we do allow any kind of in-situ updates analogous to the current XQuery Update Facility, we face the ambiguity mentioned in the original description of this issue.

Comment 27 Michael Kay 2014-11-06 18:14:18 UTC

> We have discussed and agreed to this requirement at least a handful of
> times, starting with the requirements document. There is no ambiguity as
> long as we preclude updates on the instances that are created.  But if we do
> allow any kind of in-situ updates analogous to the current XQuery Update
> Facility, we face the ambiguity mentioned in the original description of
> this issue.

The decision recorded in comment #13 appears to be a provisional decision affecting the design of a future version of XQuery Update. It doesn't affect our current specs. It says (I think) that map and array constructors should "copy" any maps and arrays supplied as input, but since "copying" a map or array is an undefined and meaningless operation until we introduce updates, that's not a meaningful thing to say in our current specs.

Comment 28 Michael Kay 2014-11-06 18:28:13 UTC

>There is no ambiguity as long as we preclude updates on the instances that are created.

Perhaps we have a different understanding of the word preclude?

My understanding is it means "to make impossible". The requirement is that we shall not do anything that makes updates impossible. I think we can achieve this by doing nothing. Your objection, going back to the original comment in this bug entry, is that doing nothing "results in two sets of behavior that can be distinguished only when updates are implemented". I don't think that's a problem; the time to define the detail of how updates work is when we design the update facility, not now. Many people (for example, most XPath/XSLT implementors) will never implement in-situ update, and it would be entirely wrong for specs other than the Update spec to attempt to dictate choices that have no effect on non-Update programs.

Comment 29 Jonathan Robie 2014-11-11 13:42:13 UTC

(In reply to Michael Kay from comment #28)

> Perhaps we have a different understanding of the word preclude?
> 
> My understanding is it means "to make impossible". The requirement is that
> we shall not do anything that makes updates impossible. I think we can
> achieve this by doing nothing. Your objection, going back to the original
> comment in this bug entry, is that doing nothing "results in two sets of
> behavior that can be distinguished only when updates are implemented". I
> don't think that's a problem; the time to define the detail of how updates
> work is when we design the update facility, not now. 

My objection is that:

1. People are creating persistent data now, and there's a big gotcha if we're going to tell them later that they created it the wrong way and it cannot be updated according to our specs.

2. There are already implementations that update JSON from an XQuery environment. Like the original update spec, implementation is happening before the specification because we are lagging on the spec.  People need some guidance.

3. We know what an in-situ update is.  We have a requirement to not preclude it. We have a working group decision on how to do this.

> Many people (for
> example, most XPath/XSLT implementors) will never implement in-situ update,
> and it would be entirely wrong for specs other than the Update spec to
> attempt to dictate choices that have no effect on non-Update programs.

This has absolutely no effect on an implementation that never implements in-situ updates.  But it still affects the non-update specs, just as XML constructors do.  For XML constructors, we carefully spell out what is constructed so that updates can be done consistently.

In fact, the basic question is this: should our constructors clearly state what is constructed or not?  For updates, it's important to do that, because otherwise we do not know what is being updated.  You seem to be taking the position that we don't need to be clear about this because it's only a gotcha if someone tries to update it.

This is essential for any implementation that does in-situ updates.  It has no effect on any other implementation.

Comment 30 Jonathan Robie 2014-11-11 14:43:23 UTC

I just added the text to the internal working draft so we can discuss this more concretely. I am using <add></add> to indicate new material.

For map constructors:

<quote section="3.10.1 Maps">
Each MapKeyExpr expression is evaluated and atomized; a type error [err:XPTY0004] occurs if the result is not a single atomic value. The associated value is the result of evaluating the corresponding MapValueExpr. <add>Unlike element constructors, if the MapValueExpr evaluates to a node, the associated value is the node itself, not a new node with the same values. If the MapValueExpr evaluates to a map or array, the associated value is a new map or array with the same values.

Note:

XQuery 3.1 has no operators that can distinguish a map or array from another map or array with the same values. However, we need to be clear about the data model instance that is constructed so that future versions of the XQuery Update Facility update the instances defined in this specification.</add>
</quote>
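
A rough Python sketch of the rule proposed in the text above for map constructors: a node value keeps its identity, while a map or array value becomes a new map or array with the same values. The `Node` class, `copy_value`, and `construct_map` are hypothetical names for this sketch. Copying nested maps and arrays recursively is an assumption; the proposed text does not say how deep the copy goes.

```python
class Node:
    """Stand-in for an XDM node; instances compare by identity."""
    def __init__(self, name):
        self.name = name

def copy_value(value):
    """Copy maps (dict) and arrays (list); keep nodes and atomic values as-is."""
    if isinstance(value, dict):
        return {k: copy_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [copy_value(v) for v in value]
    return value

def construct_map(entries):
    """Build a map from (key, value) pairs following the proposed rule."""
    return {key: copy_value(value) for key, value in entries}

node = Node("e")
inner = {"n": node}
m = construct_map([("node", node), ("map", inner)])
# m["node"] is the node itself; m["map"] is a distinct map with the same values.
```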

For array constructors:

<quote section="3.10.2.1 Array Constructors">
<add>
In both forms of an ArrayConstructor, if a member expression evaluates to a node, the associated value is the node itself, not a new node with the same values. If the member expression evaluates to a map or array, the associated value is a new map or array with the same values.

Note:

Comment 31 C. M. Sperberg-McQueen 2014-11-11 18:08:51 UTC

In comment 29, Jonathan says:

    1. People are creating persistent data now, and there's a big gotcha if 
       we're going to tell them later that they created it the wrong way and 
       it cannot be updated according to our specs.

This leads to a question about timing.  If it's true that people are creating persistent data now, then aren't we already committed to what you call a 'gotcha', as soon as we define the details of how updates work, whether we do it later (as I take MK to be suggesting) or now (as I take you to be suggesting)?  I assume your argument may be that "the gotcha will be less painful now than later", but the argument "the gotcha is less painful at time T1 than at time T2" and the argument "time T1 has no gotcha and time T2 has a gotcha" don't on the face of it seem the same to me.  Which are you making?

Jonathan also writes:

    3. We know what an in-situ update is.  

Do we?  It's not obvious to me that we do; why do you think so?

Comment 32 Jonathan Robie 2014-12-05 22:19:30 UTC

For maps, I changed the text to this:

<add>
Each MapKeyExpr expression is evaluated and atomized; a type error [err:XPTY0004] occurs if the result is not a single atomic value. The associated value is the result of evaluating the corresponding MapValueExpr. If the MapValueExpr evaluates to a node, the associated value is the node itself, not a new node with the same values.

Note:

XQuery 3.1 has no operators that can distinguish a map or array from another map or array with the same values. Future versions of the XQuery Update Facility, on the other hand, will expose this difference, and need to be clear about the data model instance that is constructed.

In some existing implementations that support updates via proprietary extensions, if the MapValueExpr evaluates to a map or array, the associated value is a new map or array with the same values.
</add>

For arrays, I changed it to this:

<add>
If a member of an array is a node, its node identity is preserved.

Note:

XQuery 3.1 has no operators that can distinguish a map or array from another map or array with the same values. Future versions of the XQuery Update Facility, on the other hand, will expose this difference, and need to be clear about the data model instance that is constructed.

In some existing implementations that support updates via proprietary extensions, if a member expression evaluates to a map or array, the member is a new map or array with the same values.
</add>