Bugzilla – Bug 12534

[DM] Definition of 'tree' missing and apparently non-standard

Last modified: 2014-03-04 16:35:58 UTC

The XDM 2E spec of 14 December 2010 says, in section 2.1 Nodes form a tree that consists of a root node plus all the nodes that are reachable directly or indirectly from the root node via the dm:children, dm:attributes, and dm:namespace-nodes accessors. Every node belongs to exactly one tree, and every tree has exactly one root node. The same words occur in the current draft (14 December 2010) of XDM 3.0; I'll open a separate bug report for that spec. Ideally, I'd be inclined to want an explicit definition in the XDM spec of what is meant by 'tree' here. I suppose an argument might be made that it's never possible to define absolutely everything; some notions end up being taken either as primitive undefined notions that can't be defined because they are primitives, or as well understood ideas that don't need to be defined because the spec is using them in the same sense they have in ordinary technical language. This argument has a major problem, though: the XDM spec cannot possibly be using 'tree' in any of the ordinary senses, because they are incompatible with the words quoted above. In mathematics, a tree is (in the words of Wolfram MathWorld [1]), "a simple, undirected, connected, acyclic graph (or, equivalently, a connected forest)". (Actually, in graph theory, 'tree' is routinely given any of several different definitions, and it is an exercise in elementary graph theory to prove that the different definitions are all equivalent. But Wolfram's definition will do.) But it is a direct consequence of the mathematical definition of tree that any node belongs to multiple trees (in most practical cases, to an infinite number of trees), because an undirected graph is just a set of nodes with a relation defined over the set of nodes, and any object that can be a member of a set is a member of multiple sets, unless it is the only object in the universe of discourse. When a universe of discourse has an infinite number of objects, any object will be a member of an infinite number of sets, and for each of those sets there will be at least one relation on the objects that defines a tree. It is also a fact that in ordinary mathematical discourse trees are not necessarily rooted; the trees XDM is interested in are directed graphs, not undirected graphs, and all the trees have roots. When 'tree' is used as the name of a data structure, it usually involves the notions of directedness and rootednees. But here, too, any node can belong to multiple trees, and almost always will; exceptions again involve universes of discourse with only one object. Even if we assume that a given parent : child relation is given and fixed (and if the reader is expected to assume that, this reader at least would really like to be told), the definition of 'tree' as a data structure is, in ordinary usage, formulated so as to guarantee that any tree with multiple nodes has a number of subtrees, each of which is by definition a tree. The discussion of trees as a datastructure in Wikipedia [2], for example, says explicitly that any subtree is a tree. So: either XDM is intending to use the term 'tree' in one of its ordinary senses, or it's intending to use the term in a non-ordinary sense. In the former case, the statement about any node belonging to a single tree is simply false and needs to be deleted or restated in a way that isn't bogus. In the latter case, this reader would like to know what definition is intended. My apologies for not noticing this problem earlier. [1] Weisstein, Eric W. "Tree." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/Tree.html [2] [Anonymous.] "Tree (data structure)." Wikipedia, last modified 18 April 2011. http://en.wikipedia.org/wiki/Tree_(data_structure)

The 3.0 analogue of this issue is bug 12535.

There's only one kind of tree that's of interest in XDM, and that's the tree of nodes defined implicitly by the dm:parent relation between nodes. XDM is talking about this kind of tree (and indeed, this kind of node), and XDM nodes each belong to exactly one tree of this kind. A "subtree" of this tree is not itself a tree, because it does not contain a root node; it is an essential property of a root node that it has no dm:parent. I wasn't aware that the use of the term "tree" in mathematics differed so widely from the use of the term in computer science - clearly it's the usage in computer science that's relevant here, not the usage in mathematics.

I think there are two features that characterize the use of "tree" in the DM: 1. "rooted": one node is designated to be the root (I am using the mathematical meaning of root here). 2. "maximal"/"a connected component of the graph induced by dm:parent" and this connected component happens to be a tree. In particular, it must include the node that has dm:parent not set (the "root" with our meaning, and which is also chosen to be the root of the tree with the mathematical meaning of bullet 1). I believe it is the second bullet point which implies that subtrees are not trees (but they would fulfill the first bullet point I think).

I thought it might be worth looking at some other uses of the term "tree" in the XML specification family. XML 1.05e has: * The document entity serves as the root of the entity tree * each leaf node in the syntax tree for the regular expression * There is exactly one element, called the root, ... Infoset: * This specification presents the information set as a modified tree * One of the element information items is the value of the [document element] property of the document information item, corresponding to the root of the element tree It seems to me that all these definitions use "tree" to mean "maximal tree" (a tree that is not part of a larger tree), and "root" to mean a node that is the root of a maximal tree. I think this is the usual usage in computing.

Michael Kay is right, I think, to suggest that it is not uncommon for the term "tree" to be used to refer to trees that are not parts of other trees; he is wrong, I think, to suggest that that is "the usual usage". (For one thing, if the usual usage of the term "tree" entails maximality, what on earth can a phrase like "a tree that is not part of a larger tree" possibly mean? And how is it possible for MK to use such a phrase with such confidence that any competent technical reader will understand what he's saying?) When we maintain a tree as a data structure for certain kinds of information, it is not at all uncommon for what GF and MK call the maximal tree to be the most common topic of interest. A reference, in context, to that data structure as "the tree" does not in any way entail the proposition that trees are not recursive, or that no node in that tree is a node in any other tree. All that a reference to "the tree" entails is (as is usual in English) that the reader can be expected to know which tree is meant -- there is no implication that no other trees exist, nor that the tree denoted has no proper subtrees. For this reason, I think that none of MK's examples entail the claim that the word "tree" as used in the QT specs has any particular connotation of maximalness. Each of the trees mentioned (the tree of entities of an XML document, the syntax tree for a regular expression, the element tree of an XML document, the tree constructed to hold all the information in an infoset, the element tree for the infoset of a document) are uniquely identified by their qualities. That doesn't mean they don't contain other trees as subtrees; it only means that the phrases "the entity tree", "the syntax tree" and so on serve to identity single trees, not groups of trees. A subtree of the entity tree is still a tree (and a node in that subtree is a node both in that subtree and in the full entity tree) -- it just happens not to be the tree denoted by the phrase "the entity tree". There are actually very few occurrences of the word "tree" in the QT specs that require being read as meaning "maximal tree constructed from the parent axis" to make sense. As I believe has been suggested before, they could easily be reworded not to require such a special reading of the word "tree", or the Data Model spec could observe, when introducing the concept that XDM nodes are arranged in trees, that occasionally an unqualified reference to "the tree" containing a certain node is a reference to the maximal tree constructed from the parent axis. (That is, the spec could define its use of the term "tree".) Conjectures about what is common usage are best supported by reference to data. I have already pointed, in other branches of this slow-motion discussion, to the usage in the handbooks of data structures and computer programming on my shelves. Those who wish to argue that the usage of Aho, Hopcroft, Knuth, Gonnet, Baeza-Yates, Wirth, and others is atypical of technical writing about computer programming really should provide some examples that support their claim. And in light of the examples cited in comment 4, it should probably be pointed out that to examples of the term "tree" which happen to denote a tree which is in some sense maximal do not in themselves support the claim that "tree" is commonly used to denote a non-recursive data structure: to support that claim, the example must be incompatible with the recursive interpretation of the term. None of the examples in comment 4 have that latter property: they are all perfectly compatible with the definitions of "tree" quoted in the bug report, as well as with the definitions to be found in the computing literature, which have I believe already been cited in this discussion. If this issue is now being discussed with a view to closing it, can we not adopt the wording change that led to the closure of bug 12535?

The WG does not propose to attempt to address this 1.0.

Why do the WGs not propose to attempt to address this in 1.0?