This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12535 - [XDM30] Definition of 'tree' missing and apparently non-standard
Summary: [XDM30] Definition of 'tree' missing and apparently non-standard
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Data Model 3.0 (show other bugs)
Version: Working drafts
Hardware: PC All
: P2 normal
Target Milestone: ---
Assignee: Norman Walsh
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-20 13:29 UTC by C. M. Sperberg-McQueen
Modified: 2011-10-11 14:26 UTC (History)
2 users (show)

See Also:


Attachments

Description C. M. Sperberg-McQueen 2011-04-20 13:29:23 UTC
The XDM 3.0 draft of 14 December 2010 says, in section 2.1

    Nodes form a tree that consists of a root node plus all the
    nodes that are reachable directly or indirectly from the root
    node via the dm:children, dm:attributes, and dm:namespace-nodes
    accessors. Every node belongs to exactly one tree, and every
    tree has exactly one root node.

The same words occur in XDM 2.0 2E, published on the same date; 
bug 12534 has been opened against that spec; its description is
essentially the same as given here

Ideally, I'd be inclined to want an explicit definition in the XDM
spec of what is meant by 'tree' here.  

I suppose an argument might be made that it's never possible to
define absolutely everything; some notions end up being taken either
as primitive undefined notions that can't be defined because they
are primitives, or as well understood ideas that don't need to be
defined because the spec is using them in the same sense they have
in ordinary technical language.  This argument has a major problem,
though: the XDM spec cannot possibly be using 'tree' in any of the
ordinary senses, because they are incompatible with the words quoted
above.

In mathematics, a tree is (in the words of Wolfram MathWorld [1]),
"a simple, undirected, connected, acyclic graph (or, equivalently, a
connected forest)".  (Actually, in graph theory, 'tree' is routinely
given any of several different definitions, and it is an exercise in
elementary graph theory to prove that the different definitions are
all equivalent.  But Wolfram's definition will do.)  

But it is a direct consequence of the mathematical definition of
tree that any node belongs to multiple trees (in most practical
cases, to an infinite number of trees), because an undirected graph
is just a set of nodes with a relation defined over the set of
nodes, and any object that can be a member of a set is a member of
multiple sets, unless it is the only object in the universe of
discourse.  When a universe of discourse has an infinite number of
objects, any object will be a member of an infinite number of sets,
and for each of those sets there will be at least one relation on
the objects that defines a tree.

It is also a fact that in ordinary mathematical discourse trees are
not necessarily rooted; the trees XDM is interested in are directed
graphs, not undirected graphs, and all the trees have roots.

When 'tree' is used as the name of a data structure, it usually
involves the notions of directedness and rootednees.  But here, too,
any node can belong to multiple trees, and almost always will;
exceptions again involve universes of discourse with only one
object.  Even if we assume that a given parent : child relation is
given and fixed (and if the reader is expected to assume that, this
reader at least would really like to be told), the definition of
'tree' as a data structure is, in ordinary usage, formulated so as
to guarantee that any tree with multiple nodes has a number of
subtrees, each of which is by definition a tree.  The discussion of
trees as a datastructure in Wikipedia [2], for example, says
explicitly that any subtree is a tree.

So: either XDM is intending to use the term 'tree' in one of its
ordinary senses, or it's intending to use the term in a non-ordinary
sense.  In the former case, the statement about any node belonging
to a single tree is simply false and needs to be deleted or restated
in a way that isn't bogus.  In the latter case, this reader would
like to know what definition is intended.

My apologies for not noticing this problem earlier.


[1] Weisstein, Eric W. "Tree." From MathWorld--A Wolfram Web
Resource. http://mathworld.wolfram.com/Tree.html

[2] [Anonymous.]  "Tree (data structure)."  Wikipedia, last modified
18 April 2011.  http://en.wikipedia.org/wiki/Tree_(data_structure)
Comment 1 Norman Walsh 2011-05-31 15:58:24 UTC
Would the following rewording of the paragraph in question resolve your concerns?

  Every node is one of the seven kinds of nodes defined in 6 Nodes.
  Nodes form a rooted, directed, connected, acyclic graph (a tree)
  that consists of a single root node plus all the nodes that are
  reachable directly or indirectly from the root node via the
  dm:children, dm:attributes, and dm:namespace-nodes accessors. A
  node can have at most one parent. Nodes have identity (see 2.3
  Node Identity) and every node belongs to exactly one tree.

I think it's very likely that you understand precisely what the Data Model is trying to describe. If you have specific suggestions for how this terminology could be improved, I'd be delighted to hear them.
Comment 2 C. M. Sperberg-McQueen 2011-05-31 19:55:19 UTC
Several of the changes suggested in comment 1 seem helpful, but on the
whole I don't think the reformulation renders the text satisfactory.
Sorry.

Norm writes: 

    I think it's very likely that you understand precisely what the
    Data Model is trying to describe.
    
I am much less certain than Norm seems to be that I already know what
the text is trying to tell me here.  In particular, it's not obvious
to me how much of the current text of this paragraph (or indeed some
other parts of 2.1) is intended as the normative statement of a
particular property, how much is mentioning facts normatively
established elsewhere, and how much is intended just to lay out some
of the logical consequences of the normative statements, for clarity.

But I can't suggest constructive improvements for that situation until
I understand the spec better.

Michael Kay's response in comment 2 of bug 12534 suggests that he
believes the point of this passage is (rephrasing slightly) that for
any set of XDM nodes, there is a unique set of trees defined
implicitly by the dm:parent relation and that each XDM node belongs to
exactly one tree in that set.  As formulated, I think this is still a
little loose, and I don't think we should follow MK's suggestion that
'tree' is or should be taken to denote a non-recursive data structure.

But if MK has correctly identified the gist of the paragraph, then for
the moment I'd suggest something like:

(VERSION A)

    Every node is one of the seven kinds of nodes defined in 6 Nodes.

    The dm:children, dm:attributes, and dm:namespace-nodes properties
    of nodes organize them naturally into trees (rooted, directed,
    connected, acyclic graphs), each of which contains all the nodes
    reachable directly or indirectly from the root node via the
    dm:children, dm:attributes, and dm:namespace-nodes accessors.
    Most references to 'trees' in this specification refer to such
    'natural' trees.

    NOTE: Each node belongs to a set of one or more such 'natural'
    trees, exactly one of which tree has a root node with no dm:parent
    and contains all the others.

or possibly 

(VERSION B)
 
    Every node is one of the seven kinds of nodes defined in 6 Nodes.

    The dm:children, dm:attributes, and dm:namespace-nodes properties
    of nodes organize them naturally into trees (rooted, directed,
    connected, acyclic graphs), each of which contains a root node
    with no dm:parent and also all the nodes reachable directly or
    indirectly from the root node via the dm:children, dm:attributes,
    and dm:namespace-nodes accessors.  Most references to 'trees' in
    this specification refer to such 'natural' trees.

    NOTE: Each node belongs to exactly one such 'natural' tree.

Version A assumes that the children of a document node are fragments;
version B assumes that the children of a document node are not
fragments.  I do not know which is intended; nothing in the XDM spec
depends on the difference.

Some differences between the proposals above and the current text (and
Norm's suggested rewording) may as well be noted explicitly here, in
case it saves time.

  - (deletion of 'a')

    Nodes are organized into trees, not into "a" tree.

  - (explicit specification in version B that root nodes have no
    dm:parent)

    We need to specify that the root has no dm:parent, or the
    uniqueness claim is false.

  - Optionally, we could follow the current text more closely by
    adding at the end "and each such tree has exactly one root node".
    It's a true statement.  

    I think it's better left unsaid, however: the fact that each
    non-empty tree has exactly one root node is a consequence of the
    definition of tree.  We don't need to make a normative statement
    of it, and if we do, some readers will wonder whether we are
    clueless, occupying some alternative universe, or (this is where
    we came in) using some alternative definition of 'tree'.

  - (didn't add Norm's remark about nodes having at most one parent) 

    That a node has at most one dm:parent is already stated
    normatively elsewhere (5.11); that a node has at most one parent
    in a tree is a fact about trees.

  - (didn't add Norm's remark about nodes having identity)
 
    That a node has identity tells us pretty much nothing except
    perhaps to signal that XDM expects the reader to be using
    Aristotelian and not fuzzy logic.  Every thing we take seriously
    as a thing has identity (No entity without identity, as the slogan
    says) with the possible exception of some objects in quantum
    theory.  I think what is meant is that they have a form of object
    identity or referential opacity.  (This is also a problem with
    section 2.3, which I expect to raise in due course.)
Comment 3 Michael Kay 2011-05-31 23:07:54 UTC
I think it's best to stick to the current model whereby every node belongs to exactly one tree, and a root node is therefore defined as being a node that has no parent. If we adopted the alternative definition that every node is the root of a tree, an awful lot of our terminology would have to change.

(Let's remember that the term comes from botany, not from mathematics. Each apple in my garden grows on exactly one tree.)
Comment 4 C. M. Sperberg-McQueen 2011-05-31 23:55:56 UTC
I am not sure whether comment 3 amounts to a suggestion that we adopt version B of my proposal in preference to version A, or to a suggestion that there is no need to change the text of the spec, because everyone who matters already knows what it means without needing to bother about little things like what the words mean.

I think it's best if we try to write our specs so that readers of good will can wrest a coherent meaning from them without recourse either to oral tradition or to revelation.
Comment 5 Michael Kay 2011-06-01 09:00:09 UTC
Well, personally I find the statement that every node belongs to exactly one tree leaves little room for doubt. But there's always scope for making things clearer, especially for readers who come to the spec with a different idea about what the words mean and need those preconceptions to be corrected.
Comment 6 Norman Walsh 2011-09-06 14:00:15 UTC
I added the following note in an effort to resolve this issue:

<note diff="add" at="2011-09-06"><p>This definition of tree differs from the common mathematical
definition where every node can be seen as the root of the nodes below it
and consequently every node belongs to many trees. In XML, the term tree
refers to the directed graph that is rooted exclusively at the single
node that has no parent.</p>
</note>
Comment 7 Norman Walsh 2011-09-06 15:25:28 UTC
Michael says "no" (in IRC)

Thinking some more.
Comment 8 Norman Walsh 2011-09-06 15:53:37 UTC
Rather than try to redefine tree, maybe it'd be simpler to just reword the "exactly one tree" sentence that's troubling. How's everyone feel about this?

Every node is one of the seven kinds of nodes defined in 6 Nodes. Nodes form a tree. Each node has at most one parent (reachable via the dm:parent accessor) and descendant nodes that are reachable directly or indirectly via the dm:children, dm:attributes, and dm:namespace-nodes accessors. [Definition: The root node is the topmost node of a tree, the node with no parent.] Every tree has exactly one root node and every other node can be reached from exactly one root node.
Comment 9 Norman Walsh 2011-10-11 14:26:54 UTC
Implemented comment 8 as decided by the WG.