3849 – [XQuery] Copied nodes and in-scope namespaces

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XQuery 1.0
Version:	Candidate Recommendation
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Don Chamberlin
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2006-10-18 22:23 UTC by Hans-Juergen Rennau
Modified:	2006-11-17 09:56 UTC
CC List:	0 users

Description Hans-Juergen Rennau 2006-10-18 22:23:42 UTC

This comment concerns the scope of the concept of "copied nodes" and, in consequence, the control of in-scope namespaces of constructed nodes. Two alternative changes to the specification are proposed, one merely editorial, the other changing the semantics. The proposals are preceded by a motivation.

Motivation

The concept of "copied nodes" is crucial to the control of in-scope namespaces. The term is quite suggestive and seems to distinguish nodes copied into the result from those constructed within the query. However, the scope is in fact broader: it includes any node added to the content of a constructed element or document node by evaluating an enclosed expression. In consequence, any constructor occurring between curly braces (delimiting either an enclosed expression or a function body) contributes a copied node to the result (or no node at all, depending on context). In particular, the descendants of constructed document nodes and of computed element nodes are necessarily copied nodes. Example: the query

declare copy-namespaces no-preserve, inherit;
declare namespace ns1="example.ns1";
document {
   <a xmlns:ns1="example.ns1">
     <ns1:b>bla1</ns1:b>
     <ns1:b>bla2</ns1:b>
     <ns1:b>bla3</ns1:b>
   </a>
}

produces the output:

<a>
   <ns1:b xmlns:ns1="example.ns1">bla1</ns1:b>
   <ns1:b xmlns:ns1="example.ns1">bla2</ns1:b>
   <ns1:b xmlns:ns1="example.ns1">bla3</ns1:b>
</a>

So the copy-namespaces settings have a broader scope than is perhaps generally appreciated. Checking several textbooks, I did not find any mention of this matter, so I believe the specification should strive for clarification. Please kindly consider the following two proposals, which are meant as alternatives.
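The same effect can be sketched for a computed element constructor, which the motivation above also mentions. This is a minimal illustration rather than a test case from the report; the element name wrapper is a hypothetical name chosen for the sketch.

```xquery
declare copy-namespaces no-preserve, inherit;
(: "wrapper" is a hypothetical element name used only for this sketch. :)
element wrapper {
   <a xmlns:ns1="example.ns1">
     <ns1:b>bla</ns1:b>
   </a>
}
```

Here the a element is returned by the content expression of the computed constructor, so a and its ns1:b child are copied nodes. Under no-preserve the copy of a loses its unused binding for ns1, and since wrapper has no such binding to inherit, the declaration should reappear on ns1:b, as in the document example.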

Alternative A: editorial change

To enhance clarity, the term "copied nodes" could be linked to an explicit definition rather than introduced in running text, and this definition could be supplemented by a note emphasizing the true scope of the term.

Alternative B: semantic change

It may be considered whether the present scope of "copied nodes" should be restricted to what the term suggests, that is, to exclude nodes constructed within the enclosed expression and delivered into the result by the constructor expression itself, rather than by reference via a variable or path. (Treatment of nodes created within functions would deserve special thought.)

Afterword: I apologize for the great length of this comment. What I have in mind is that the inadvertent disappearance of a namespace declaration attribute (e.g. caused by no-preserve, the setting recommended by an excellent textbook) can lead to a proliferation of namespace declaration attributes further down in the tree, exploding the size of the result and spoiling its readability.

Comment 1 Michael Kay 2006-10-19 09:05:39 UTC

Allow me to attempt a personal response, since the WG seems to have appointed me kicking and screaming into the unsolicited role of chief namespace guru.

Firstly, is the current text clear? I think it is. Section 4.9, describing the copy-namespaces declaration in the prolog, states clearly what its effect is on the static context. It clearly states that it sets the "copy-namespaces mode", and gives some helpful cross references as to where this is used.

Copy-namespaces mode is used only in section 3.7.1.3, clause 1.e.ii.D, where it explains the effect on "copied element nodes". Clause 1.e.ii starts "For each node returned by an enclosed expression, a new copy is made of the given node and all nodes that have the given node as an ancestor, collectively referred to as *copied nodes*." I think that this clearly explains what is meant by "copied nodes" within the body of this clause.

I agree it's not easy reading. Nothing to do with namespaces ever is. And I can appreciate the cause of the misunderstanding: your example doesn't feel like one that is copying anything, so it's not intuitive that a declaration that talks about copying is actually relevant. But I don't think we're suddenly going to make namespaces comprehensible to the average punter (who doesn't read textbooks, let alone the spec) just by choosing a more judicious keyword, even if we could come up with one.

Now to your suggested change:

>>It may be considered whether the present scope of "copied nodes" should be restricted to what the term suggests, that is, to exclude nodes constructed within the enclosed expression and delivered into the result by the constructor expression itself, rather than by reference via a variable or path.

I'm afraid it's not at all clear to me how such a change could be defined, even if it were desirable. XQuery is an expression language, with orthogonality as one of the design aims: the treatment of nodes in the result of an enclosed expression should therefore not depend on how and when those nodes were created, or on the syntactic form of the expression that created them.

Is a change to the semantics desirable? It seems to me that "copy-namespaces no-preserve" is a statement that you don't want namespace declarations on an element unless they are actually needed on that element. So in your example, you got exactly what you asked for. You didn't have to ask for it.
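For contrast, a minimal sketch of the reporter's query with preserve in place of no-preserve, shortened to two children. The expected result described below follows from the copying rules in 3.7.1.3 and is not output captured from a particular processor.

```xquery
(: Same query as in the description, with preserve instead of no-preserve. :)
declare copy-namespaces preserve, inherit;
document {
   <a xmlns:ns1="example.ns1">
     <ns1:b>bla1</ns1:b>
     <ns1:b>bla2</ns1:b>
   </a>
}
```

With preserve, the copy of a keeps its binding for ns1 even though the name a does not use it, so the result should declare the namespace once, on a, rather than on each ns1:b child.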

I do think (and I have argued this in the working group) that finer control over construction of namespace nodes will be needed by some users. A global switch at the prolog level is really too crude. A lot of people scream at me when I suggest this, because they say (rightly) that it's quite complicated enough already. But in XSLT 2.0, in the light of user experience, we added copy-namespaces="yes|no" at the level of an individual instruction, and we added an xsl:namespace instruction to construct individual namespace nodes dynamically; I think that field experience with XQuery will reveal the same requirements. But this is definitely for consideration only in a future version.

I will recommend to the WG that we close this with no action, and I would appreciate your concurrence with this.

Comment 2 Hans-Juergen Rennau 2006-10-22 01:11:33 UTC

Thank you very much for the detailed analysis and discussion! You explained how a change in the treatment of enclosed expressions would put fundamental design aims at stake. This shed new light on the matter; so my fresh understanding is as follows:

(a) at first glance, only those nodes of the expression result need to be actually copied (rather than taken as is and simply attached to their new parent) which existed prior to the expression evaluation (typically: nodes of an input document); (b) however, this amounts to treating the nodes differently according to information not contained in the static/dynamic context and node properties, thus breaking a key principle. Bottom line: unconditional copying is inevitable, even though it has an effect on in-scope namespaces which may be regarded as undesirable. (I would appreciate it if you would correct this view, should it be wrong.)

At any rate, the real issue is, as you pointed out, fine-grained control of the namespaces policy. It is a pity that your proposal was not accepted. I simply cannot understand the argument that it would make things more complicated. On the contrary, it would make things simple, whereas now they are mind-boggling. This view rests on the assumption that the real intent of the copy-namespaces mode is to control the namespaces in document fragments actually copied from existing sources, and that the effect on nodes newly constructed within constructors is often (to say the least) a side effect rather than intended. Of course query results can be quite complex, containing both fragments copied from input and parts constructed in the query. So when the need arises to control the in-scope namespaces in a copied fragment, how can the constructed part of the result be protected from the side effects? If, say, a constructed fragment has to be output only on the first of the month, it inevitably slips into an enclosed expression and suffers from settings meant for a copied fragment. With fine-grained control, everything is simple: set the usual default settings in the prolog and wrap the copying region in a redeclare-copy-namespaces expression. Such a redeclaration expression would be perfectly analogous to Ordered / Unordered expressions: it simply delimits a query region with a changed setting. I cannot imagine that any user would suffer any inconvenience: as with the Ordered expression, the user either uses it or ignores it.

(However, even better than introducing a special redeclare-copy-namespaces expression would be a general reset-static-context expression, which could change any combination of settings controlled by the prolog Setters (production rule 7). This approach would solve the problem of properly nesting query regions with different settings. The introduction of such an expression could be smoothed by defining each resettable mode as an optional feature. But of course these musings refer to a future version of XQuery.)

Finally, I do not agree that a clarifying note concerning the scope of the copy-namespaces declaration is superfluous. You are right in this: the present text is absolutely clear and unambiguous. But the real problem is not that a paragraph is difficult to read, as you suggest: the problem is that a correct understanding requires consciously linking scattered pieces of information, whereas a suggestive term seems to make such an effort unnecessary. Think of a computed element constructor. Everybody, I suppose, regards it as nothing more than a syntactic alternative to a direct constructor, required in the special case that the element name has to be computed. It is very easily overlooked that the mere syntax (!) forces any child element node to be a copied node. And this state of affairs is detected only by following a link. Please remember that the specification is a vital reference for developers, because hardly any textbook on XQuery is new enough to be a reliable source of information. The developer looks up Section 4.9 and more likely than not takes "an existing node copied by an element constructor" to mean nodes copied from input. This error can even be found in an excellent XQuery textbook. I suggest a short note in Section 4.9:

Note: It is important to be aware that the copy-namespaces mode applies not only to nodes copied from external sources, but also to nodes constructed within enclosed expressions in the content of element or document constructors. By implication, the mode applies to any descendants of computed element nodes and of document nodes.

Or something similar. However, should you reject my proposal even after these new arguments, I will accept your decision.

Hans-Juergen Rennau

PS: At any rate, the definitions of the copy-namespaces declaration and the copy-namespaces mode should be corrected. Instead of "copied by an element constructor" they should read: "copied by an element constructor or by a document constructor".

Comment 3 Michael Kay 2006-10-23 15:40:22 UTC

>>(a) at first glance, only those nodes of the expression result need to be
actually copied (rather than taken as is and simply attached to their new
parent) which existed prior to the expression evaluation (typically: nodes of
an input document); (b) however, this amounts to treating the nodes differently
according to information not contained in the static/dynamic context and node
properties, thus breaking a key principle.

I think you're confusing the language semantics and the behaviour of an actual implementation. Implementations can often avoid making an unnecessary copy; optimizers can use any information they want to make the code faster (of course, if they don't make a copy, then they must behave exactly as if they did). That doesn't affect the language semantics and doesn't break any language design principles.

>>At any rate, the real issue is, as you pointed out, fine-grained control of the namespaces policy....

If your result document contains QNames in content, you need to use copy-namespaces preserve. If it doesn't, you can safely use no-preserve. I think that's a fairly simple rule. It's true that a consequence of using no-preserve is that namespaces may be declared further down the tree than you would like, from the point of view of minimizing clutter in the serialized document. But that's largely aesthetic, it doesn't affect whether downstream applications using the XML work or not.
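A minimal sketch of this rule, assuming a hypothetical item element whose kind attribute carries a QName as its value. The names item, order and kind, the prefix t, and the namespace example.types are all invented for illustration.

```xquery
(: The binding for prefix t is used only inside an attribute value,
   not in any element or attribute name. :)
declare copy-namespaces preserve, inherit;
let $item := <item xmlns:t="example.types" kind="t:money">42</item>
return <order>{ $item }</order>
```

With preserve, the copy of item placed inside order keeps its binding for t, so a consumer can still resolve the value t:money. With no-preserve, that binding would be dropped from the copy because no element or attribute name uses it, and order has no binding for t to pass down by inheritance, leaving the prefix in the attribute value undeclared.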

>> Finally, I do not agree that a clarifying note concerning the scope of the
copy-namespaces declaration is superfluous. You are right in this: the present
text is absolutely clear and unambiguous. But the real problem...

We're writing a specification, not a tutorial. We've always taken the view that it's not the job of the language specification to give advice and education. If existing textbooks aren't good enough, that's hardly surprising given that the specification is not yet frozen. I'm sure good books will appear in time. I might even write one myself.

>> I suggest a short note in Section 4.9: ...

I'll leave that one for the editor to respond to. Logistically, it's not a good time to be making editorial improvements at the moment.

Michael Kay
personal response

Comment 4 Don Chamberlin 2006-11-02 01:19:02 UTC

Hans-Juergen,
Thanks for your comment. On Nov. 1, 2006, the Query Working Group considered your comment and decided not to add the suggested Note, for the reasons outlined by Michael Kay in Comment #3. However, we will correct the error you identified in the P.S. to Comment #2. In XQuery Section 4.9, Copy-Namespaces Declaration, we will change "... copied by an element constructor" to "... copied by an element constructor or document constructor." If you are satisfied with this resolution, please change the status of this bug to Closed.
Regards,
Don Chamberlin (for the Query Working Group)

Comment 5 Hans-Juergen Rennau 2006-11-17 09:56:16 UTC

Don and working group members,

thank you for considering my report and the associated comments. I accept the decision (with a sigh) and close this bug report.

But one afterword. In the meantime I checked eight implementations which claim to conform to the XQuery Candidate Recommendation of Nov 2005, or later. As a test case I used the example in this bug report. Result: one pass (Saxon), seven failures; among the failures, two implementations in fact implemented the semantic alternative discussed in this bug report. (A surprise to me.) It is a pity that the copy-namespaces mode is presently a dark corner of XQuery as far as implementations are concerned, not to mention textbooks.