This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
When a document is constructed from an Infoset, or from a PSVI, or using a document node constructor in XQuery, it ends up having a string value that is the concatenation of all the descendant text nodes in the document.

However, there appears to be no constraint in XDM that this is true of documents constructed in other ways, for example "synthetically" by an application. It appears to be possible for such document nodes to have a string value that is quite unrelated to the textual content of the document. The same is true (in a more complicated way) of element nodes. I'm sure that this was never intended.

XSLT, unlike XQuery, does not say what the string value of a newly constructed document is. We all assumed that this was specified in XDM, but it seems that it isn't.

Proposal: add a constraint to XDM.
I think Mike is right. I propose to add the following constraint to 6.1.1:

4. Regardless of how a document node is constructed, its string value must always be the concatenation of the string-values of all its Text Node descendants in document order or, if the document has no such descendants, the zero-length string.

It's less clear what we should do in the element case. I'm inclined to something less crisp. In 6.2.1:

14. The string-value of an element node must be consistent with its typed value.

That at least prevents some random construction process from creating an element with a typed value of 3.0 and a string-value of "New York State".
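To make the proposed document-node constraint concrete, here is a minimal sketch in Python. It uses the standard library's `xml.etree.ElementTree` purely as a stand-in: ElementTree is not an XDM implementation and has no document node, so the root element plays that role here. The helper name `string_value` is hypothetical, not an XDM accessor.

```python
import xml.etree.ElementTree as ET


def string_value(node: ET.Element) -> str:
    """Concatenate all descendant text in document order.

    Returns the zero-length string when there is no text at all,
    mirroring the wording of the proposed constraint.
    """
    # itertext() walks the subtree in document order and yields each
    # piece of character data it finds.
    return "".join(node.itertext())


# Assumption: the root element stands in for the document node.
doc = ET.fromstring("<doc><a>New </a><b>York <c>State</c></b></doc>")
empty = ET.fromstring("<doc><a/><b/></doc>")

print(repr(string_value(doc)))    # 'New York State'
print(repr(string_value(empty)))  # ''
```

Under the proposed rule, a synthetically built tree would have to report exactly this value, whatever the application happened to store.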
I think we should probably define the same constraint for element nodes (that is, the string value is the concatenation of the descendant text nodes). This seems to cover several cases:

(a1) if the element has simple content and what the implementation actually stores is the string value, then it must behave as if it had a text node with that value

(a2) if the element has simple content and what the implementation actually stores is the text node, then it must behave as if it had a string value with the same content as the text node

(a3) if the element has simple content and what the implementation actually stores is the typed value, then it must generate the string value and the text node from this typed value in the same way - it can't present a string value of "3" and a text node of "003".

(b) if the element has mixed content, then the string value must be the same as the typed value

(c) if the element has element-only content, then it has no typed value, but the string value must be the same as the concatenation of text nodes, as in the case for document nodes.
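Case (a3) can be illustrated with a small sketch, again in Python. Everything here is hypothetical: `SimpleElement`, `_lexical` and `is_consistent` are illustrative names, and using `str()` on a `Decimal` as the serialization is an assumption made for brevity, not the XDM or XML Schema canonical form. The point is only that the string value and the text node are derived from the typed value by one shared routine, so they cannot disagree.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class SimpleElement:
    """Hypothetical element with simple content that stores only its typed value."""

    name: str
    typed_value: Decimal

    def _lexical(self) -> str:
        # Single serialization shared by every string-based view.
        # Assumption: str(Decimal) stands in for a real lexical mapping.
        return str(self.typed_value)

    @property
    def string_value(self) -> str:
        return self._lexical()

    @property
    def text_nodes(self) -> List[str]:
        # An empty lexical form would mean there is no text node at all.
        text = self._lexical()
        return [text] if text else []


def is_consistent(string_value: str, text_nodes: List[str]) -> bool:
    """Check that the string value equals the concatenation of the text nodes."""
    return string_value == "".join(text_nodes)


elem = SimpleElement("quantity", Decimal("3"))
print(is_consistent(elem.string_value, elem.text_nodes))  # True
print(is_consistent("3", ["003"]))                        # False
```

The second call shows the mismatch that (a3) rules out: a string value of "3" presented alongside a text node of "003".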