Change proposal for the terms 'identity', 'identical', etc.

C. M. Sperberg-McQueen, 13 October 2014



This document lists ten changes which I believe are needed in order to align the usage of the terms identity, identical, etc. in the QT specs with each other and with ordinary English usage. It also lists some passages where changes might be desirable, but where they seem less essential.

In some cases the document also includes brief discussions identifying what seems problematic in the existing text. These discussions are intended to identify the motivation for proposing the change for readers who do not see the motivation at a glance; they are not intended as statements of general principle to which the working groups are asked to agree. WG members may accept the wording changes proposed without assenting to or sharing the motivations given.

Such additional discussions may perhaps be suppressed when this document is presented to the working groups for action. If you're reading this document in its current form, I would be grateful for your thoughts on whether it's better to leave them in or suppress them.

The ordinary English usage relevant for our purposes, which these proposals attempt to make our specs follow, is instantiated by the following definitions, adapted from readily available dictionaries of English (details in separate attachment).

identity
  • the property (possessed by a thing) of being itself, and not another thing
  • the condition or fact of being some specific person or thing (as in "X and Y have the same identity")
identical
  • the same thing; the same one
  • sharing the same identity
identify
  • to recognize or establish as being a particular person or thing
  • to establish or ascertain the identity of (some thing)

Note that these definitions entail the conclusion that in ordinary English usage:

In casual usage, these terms are often used more broadly; in the broad sense, two things may be said to be identical if they are very, very similar, even though they are not thought to be the same entity (as in the phrase identical twins). I propose that we should continue to use identity etc. in the narrower senses given above.

Changes proposed

Note that no testable assertion in our specs is altered or invalidated by the wording changes proposed here.

XQuery and XPath Data Model 3.1

2.3 Node Identity

For

Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)

Note:
The concept of node identity should not be confused with the concept of a unique ID, which is a unique name assigned to an element by the author to represent references using ID/IDREF correlation.

read:

Each node has a unique identity; this identity is distinct from the values or other visible properties of the node. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. Nodes may be distinct even when they have the same values for all properties other than their identity.

(Atomic values also have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)

Note:
The concept of node identity should not be confused with the concept of a unique ID, which is a unique name assigned to an element by the author to represent references using ID/IDREF correlation.

Discussion: ordinary English usage uses identical as the adjectival form describing things that have identity. So any X and Y are identical if and only if they share the same identity. The existing text proposes that we deny that integers have identity at the same time that we say they are identical to themselves. That kind of baroque conceptual hair-splitting is bad enough when it's unavoidable; here it's not.
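The distinction drawn in the proposed wording, between a node's identity and its visible properties, can be illustrated outside the QT languages. What follows is only a minimal Python sketch, not XDM: the class Node and its fields name and value are hypothetical, Python's "is" stands in for node identity, and "==" stands in for a comparison of visible properties.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an XDM node; not part of any QT spec."""
    name: str
    value: str


a = Node("price", "5")
b = Node("price", "5")   # same visible properties, separately constructed
c = a                    # a second reference to the node already bound to a

same_properties = (a == b)   # dataclass equality compares name and value
same_node_ab = (a is b)      # identity test: a and b are distinct nodes
same_node_ac = (a is c)      # identity test: one node, reached two ways
```

Here a and b agree in every visible property and are nevertheless not identical, while a and c are identical because they are one and the same object.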

2.8.1 Functions

For

[Definition: A function is an item that can be called. ] Functions have no identity, cannot be compared, and have no serialization.

read

[Definition: A function is an item that can be called. ] Functions cannot be compared for identity, equality, or otherwise, and have no serialization.

Discussion: In order to evaluate a call to a function, a processor must succeed in distinguishing the function in question from other functions. In ordinary English, that means it must identify the function, and the function must have an identity. What is meant here is, I guess, that the identity of functions is not visible to expressions in our languages* and that the data model provides no operations for testing identity of functions. That is entailed, I think, by saying that they cannot be compared.

* not visible to expressions, that is, except to the extent that deterministic functions f and g which return distinct values for identical arguments cannot be the same function and thus cannot be identical.

The statement that functions have no identity seems to be contradicted by the statement in section 1.6.4 of Functions and Operators that from time to time "the specification requires two function items to be identical".

The addition to the phrase about comparison is intended (at the suggestion of an outside reader) to ward off the misapprehension that all we mean is that functions are not ordered.

6 Nodes

For

The children and attributes properties of a node must not contain two nodes with the same identity.

read

No node may appear more than once in the children or attributes properties of a node.

Discussion: if two nodes have the same identity, they are by definition not two nodes, but the same node, thus one node. If we did have one node appearing twice in the children property of its parent, we would still not have two nodes with the same identity -- it's not two nodes, but one, which appears twice.
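The constraint as reworded (no node appears more than once) can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not anything defined by XDM: the class Node and the helper repeated_nodes are hypothetical, and Python's id() stands in for node identity.

```python
class Node:
    """Hypothetical node; carries no properties in this sketch."""


def repeated_nodes(children):
    """Return each node that occurs more than once, judged by identity."""
    seen = set()       # ids of nodes encountered so far
    reported = set()   # ids of nodes already recorded as repeated
    repeated = []
    for node in children:
        key = id(node)  # unique among objects that are alive at the same time
        if key in seen and key not in reported:
            repeated.append(node)
            reported.add(key)
        seen.add(key)
    return repeated


n1, n2 = Node(), Node()
ok_children = [n1, n2]        # two distinct nodes
bad_children = [n1, n2, n1]   # three entries, but only two nodes
```

The list bad_children violates the constraint because n1 occurs twice; it still holds only two nodes, which is the point of the rewording.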

XPath and XQuery Functions and Operators 3.1

17.1 Functions that Operate on Maps

For

Like sequences, maps have no identity (or none that is exposed to applications). It is meaningful to compare the contents of two maps, but there is no way of asking whether they are "the same map": two maps with the same content are indistinguishable.

read

Like sequences, maps have no identity (or none that is exposed to applications). It is meaningful to compare the contents of two maps, but there is no way of asking whether they are "the same map": two maps with the same content are indistinguishable.

This change seems necessary to align with the statement in XDM that maps do have identity, even though that identity is not now exposed and may never be exposed.

17.1.1 map:merge

For

If the input is a sequence of length one, the result map is the same as the supplied map. (Since maps are immutable and have no discernible identity, it is meaningless to ask whether the result is the same map as the original, or a different map with the same content.)

read

If the input is a sequence of length one, the result map is the same as the supplied map. (Since maps are immutable and have no discernible identity, there is no way to ask whether the result is the same map as the original, or a different map with the same content.)

Whether the question is meaningless, or meaningful but unanswerable, is a sometimes contentious question in the philosophy of science. We need not commit ourselves to answering it one way or the other. (And if we did commit ourselves, some WG members might prefer to avoid naive verificationism.)

17.3 Functions that Operate on Arrays

For

Like sequences, arrays have no identity. It is meaningful to compare the contents of two arrays, but there is no way of asking whether they are "the same array": two arrays with the same content are indistinguishable.

read

Like sequences, arrays have no identity that is exposed to applications. It is meaningful to compare the contents of two arrays, but there is no way of asking whether they are "the same array": two arrays with the same content are indistinguishable.

XQuery 3.1: An XML Query Language

3.9.3.7 Computed Namespace Constructors

For

Note:
The newly created namespace node has all properties defined for a namespace node in the data model. Like all nodes, it has identity. Like all nodes which do not share a common parent, the relative order of these nodes is implementation dependent. As defined in the data model, the name of the node is the prefix, and the string value of the node is the URI.

read

Note:
The newly created namespace node has all properties defined for a namespace node in the data model. Like all nodes, it has identity. As defined in the data model, the name of the node is the prefix, and the string value of the node is the URI. Also as defined in the data model, the relative order of all nodes which share no common ancestor is implementation dependent. The relative order of namespace nodes which share a parent is also implementation dependent.

Typo: parent where ancestor is meant.

XSL Transformations (XSLT) Version 3.0

21.1 Maps

For

Like sequences, maps have no identity. It is meaningful to compare the contents of two maps, but there is no way of asking whether they are "the same map": two maps with the same content are indistinguishable.

read

Like sequences, maps have no identity that is exposed to applications. It is meaningful to compare the contents of two maps, but there is no way of asking whether they are "the same map": two maps with the same content are indistinguishable.

21.1.2.1 map:merge

For

If the input is a sequence of length one, the result map is the same as the supplied map. (Since maps are immutable and have no discernible identity, it is meaningless to ask whether the result is the same map as the original, or a different map with the same content.)

read ...

If the input is a sequence of length one, the result map is the same as the supplied map. (Since maps are immutable and have no discernible identity, there is no way to ask whether the result is the same map as the original, or a different map with the same content.)

XQuery 3.1 Requirements and use cases

3.5.2 Function Options

For:

Nodes are not copied; identity is retained

read:

Nodes are not copied; their identity is retained

Optional changes and miscellaneous comments

N.B. This list includes occurrences of the word identity which do not require any change, but which might nevertheless be improved by revision, or which seem to be worth recording here for some other reason. The changes suggested below seem to me less essential than those suggested above.

Other occurrences of the terms identity, identical, and identify not listed here seem to me acceptable as they stand.

XQuery and XPath Data Model 3.1

6 Nodes

In the list of general constraints on nodes:

Every node must have a unique identity, distinct from all other nodes.

No change needed. (This is in fact more or less true by definition: saying that two nodes are distinct is the same as saying they are not identical, and vice versa. But the sentence does no harm and does remind the reader that node identity is a topic that requires a certain amount of attention.)

XPath and XQuery Functions and Operators 3.1

1.6.4 Properties of functions

In item 2 of the list in the definition of identical, for

Both items are nodes, and represent the same node

read

Both items are nodes, and are the same node

4.2.6 op:numeric-mod

The operation a mod b for operands that are xs:integer or xs:decimal, or types derived from them, produces a result such that (a idiv b)*b+(a mod b) is equal to a and the magnitude of the result is always less than the magnitude of b. This identity holds even in the special case that the dividend is the negative integer of largest possible magnitude for its type and the divisor is -1 (the remainder is 0). It follows from this rule that the sign of the result is the sign of the dividend.

No change needed.

Note that the word identity as used here implicitly attributes identity to atomic values; an identity in the sense used here is an expression whose form guarantees that the value denoted by the left-hand side and that denoted by the right-hand side are always identical. The paragraph thus seems at first glance to contradict the claim made elsewhere that atomic values "have no identity".
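The identity in question, (a idiv b)*b+(a mod b) = a, can be checked on sample integers. The Python sketch below is illustrative only: the helpers idiv and mod are hypothetical and are written to truncate toward zero (Python's own // and % round toward negative infinity instead), and Python integers are unbounded, so the sketch does not model overflow in a fixed-width type.

```python
def idiv(a: int, b: int) -> int:
    """Integer division truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def mod(a: int, b: int) -> int:
    """Remainder whose sign is the sign of the dividend a."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r


def identity_holds(a: int, b: int) -> bool:
    """Check (a idiv b) * b + (a mod b) == a for one pair of operands."""
    return idiv(a, b) * b + mod(a, b) == a


samples = [(7, 2), (-7, 2), (7, -2), (-7, -2), (0, 3), (-(2**63), -1)]
results = [(a, b, idiv(a, b), mod(a, b), identity_holds(a, b))
           for a, b in samples]
```

For the pair (-7, 2) the sketch gives a quotient of -3 and a remainder of -1, and for the special case (-(2**63), -1) the remainder is 0, as the quoted paragraph states.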

13.5 op:is-same-node

If the node identified by the value of $arg1 is the same node as the node identified by the value of $arg2 (that is, the two nodes have the same identity), then the function returns true; otherwise, the function returns false.

No change needed.

A more careful formulation might replace "the two nodes have the same identity" with "$arg1 and $arg2 have the same identity" or "the first and second arguments have the same identity" to avoid referring to two occurrences of a single node as "two nodes".

XML Path Language (XPath) 3.1

2.3.4 Errors and Optimization

For this purpose, two values are considered to represent the same outcome if their items are pairwise the same, where nodes are the same if they have the same identity, and values are the same if they are equal and have exactly the same type.

No change needed.

The current wording does unnecessarily suggest an unidentified distinction between being the same and having the same identity: given the ordinary meaning of identity in English, it is hard to see how two nodes which are not the same node could have the same identity, or how any X and Y which are the same node could fail to have the same identity. But the sentence doesn't say anything actively false, misleading, or confusing.
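The rule quoted above (items pairwise the same, nodes by identity, other values by equality and exact type) can be sketched as a small comparison function. This is a hedged Python illustration, not the specification's definition: Node, same_item, and same_outcome are hypothetical names, and Python's own types stand in for XDM types.

```python
class Node:
    """Hypothetical node; compared by identity only."""


def same_item(x, y) -> bool:
    """Nodes are the same if identical; values if equal and of the same type."""
    if isinstance(x, Node) or isinstance(y, Node):
        return x is y
    return type(x) is type(y) and x == y


def same_outcome(xs, ys) -> bool:
    """Two sequences represent the same outcome if pairwise the same."""
    return len(xs) == len(ys) and all(same_item(x, y) for x, y in zip(xs, ys))


n = Node()
with_shared_node = same_outcome([n, 5], [n, 5])         # same node, same value
with_fresh_nodes = same_outcome([Node()], [Node()])     # two distinct nodes
with_other_type = same_outcome([5], [5.0])              # equal, different type
```

In this sketch the first comparison succeeds and the other two fail: distinct nodes are never the same, and 5 and 5.0 are equal but not of exactly the same type.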

3.7.3 Node Comparisons

A comparison with the is operator is true if the two operand nodes have the same identity, and are thus the same node; otherwise it is false. See [XQuery and XPath Data Model (XDM) 3.1] for a definition of node identity.

No change needed.

A simpler wording might be:

A comparison with the is operator is true if the two operands have the same identity, and thus are the same node; otherwise it is false. See [XQuery and XPath Data Model (XDM) 3.1] for a definition of node identity.

XQuery 3.1: An XML Query Language

Passages which appear to be the same as passages in the XPath spec already discussed are not listed again here.

3.17.6 Treat

If the value of expr1 is returned, its identity is preserved.

No change needed.

Note in passing that this formulation appears to assume that the phrase "its identity" is well defined for all values.