29442 – [XSLT30] Some tweaks may be needed on 5.7 Sequence Constructors to create a better understanding between the two phases involved and the effect of build-tree

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XSLT 3.0
Version:	Candidate Recommendation
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2016-02-09 19:49 UTC by Abel Braaksma
Modified:	2016-10-06 18:42 UTC
CC List:	0 users

Description Abel Braaksma 2016-02-09 19:49:44 UTC

I have tried reading this several times and keep reading in circles, so I think there is a bug. 

1) Say you have:

<xsl:template match="/">
   <xsl:map-entry key="foo" select="42" />
</xsl:template>

You now have an xsl:template instruction with a sequence constructor. Following the rules in 5.7 we end up at Rule 8: raise XTDE0450 when the sequence contains a function item.

2) Say you have:

<xsl:function name="f:foo">
   <xsl:sequence select="name#1" />
</xsl:function>

Same as in (1), you have to raise XTDE0450 on every function call.

3) Say you have:

<xsl:variable name="f:arr" as="array(*)">
   <xsl:sequence select="arr:create(1)" />
</xsl:variable>

Different from (1), as in Rule 2 we have to flatten the array first. After flattening, it becomes a sequence of atomics, which are cast to a string. This is not roundtrippable, so the result (the "supplied value") cannot be cast back to an array.

4) Say you have:

<xsl:function name="f:find" as="xs:integer*">
    <xsl:for-each select="1 to 10">
        <xsl:if test=". = (5, 7)">
            <xsl:sequence select="." />
        </xsl:if>
    </xsl:for-each>
</xsl:function>

Here the inner seqtor returns two single integers, which are atomized (Rule 3) and cast to strings. The seqtor of xsl:for-each thus returns a sequence of two strings, "5" and "7". These are concatenated into the string "5 7", which is no longer castable to xs:integer*, so every call fails with a cast error.

5) Example from 11.10 in the spec

<xsl:variable name="values" as="xs:integer*">
    <xsl:sequence select="(1,2,3,4)"/>
    <xsl:sequence select="(8,9,10)"/>
</xsl:variable>

For the same reasons as (4), this will fail.

6) Say you have:

<xsl:variable name="txt" as="text()">
    <xsl:text>a</xsl:text>
    <xsl:text>b</xsl:text>
</xsl:variable>

If you evaluate this you end up with Rule 7: adjacent text nodes are merged into one. That means that the above example should work and NOT throw an error; however, most implementations (Saxon and Exselt, for instance) throw a cast error here.

I think it is debatable what should happen here. I have created two tests, seqtor-041 and seqtor-042, to that effect.

7) Rule 11

The above examples were written without Rule 11 in mind, which reads:

"Each node in the resulting sequence is attached as a namespace, attribute, or child of the newly constructed element or document node."

That means that *each and every* sequence constructor we encounter creates nodes. That is clearly not what is supposed to happen.

---------------

The above examples are supposed to be a literal reading of this part (section 5.7) of the spec. It is very well possible that I am missing the obvious here, because all the examples, except perhaps item (6), are without a doubt not how these instructions are supposed to behave.

Comment 1 Michael Kay 2016-02-09 20:34:00 UTC

Error XTDE0450 is part of "constructing complex content", and none of your examples invokes the rules in "constructing complex content". These rules are invoked only when creating the content of element and document nodes.
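
A minimal sketch of the distinction drawn in this comment, assuming an XSLT 3.0 processor that supports higher-order functions and is started at the initial template. The f namespace URI and the $probe variable are made up for illustration. The function body hands back its function item untouched, because no element or document node is being constructed there. The commented-out bad element shows the kind of place where XTDE0450 would be expected, since a literal result element does construct complex content.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.com/f"
    exclude-result-prefixes="f">

  <!-- Mirrors example (2): the body is a sequence constructor, but the
       function only returns its result, so the rules for constructing
       complex content are never invoked. -->
  <xsl:function name="f:foo" as="function(*)">
    <xsl:sequence select="name#1"/>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <!-- Hypothetical element used only as an argument for name#1. -->
    <xsl:variable name="probe" as="element()">
      <a/>
    </xsl:variable>
    <out>
      <!-- The function item is called here; only the resulting string
           ends up in the content of the ok element. -->
      <ok><xsl:value-of select="f:foo()($probe)"/></ok>
      <!-- Adding the function item itself to element content is where
           XTDE0450 would be expected:
           <bad><xsl:sequence select="f:foo()"/></bad>
      -->
    </out>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions the ok element should contain the string "a". A processor may report the commented-out case earlier than run time if it can detect it statically.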

Comment 2 Abel Braaksma 2016-02-09 21:41:48 UTC

Thanks, I knew I was missing the obvious and I think I have been stymied by this before.

In section 5.7 there are only two sections on constructing content, either simple or complex. The section on complex mentions an open-ended list: "...by evaluating the sequence constructor contained in an instruction such as xsl:copy, xsl:element, xsl:document, xsl:result-document, or a literal result element."

This led me to believe that this list is anything that is not in the all-inclusive list mentioned under Simple Content.

I see now that somewhat hidden in the preamble of this section there's a line that simply says: 

"The result of evaluating a sequence constructor is the sequence of items formed by concatenating the results of evaluating each of the nodes in the sequence constructor, retaining order."

Which, in hindsight, is probably the crux I was missing upon reading and re-reading.

Another caveat that led me into circles was:

"This section describes how the sequence obtained by evaluating a sequence constructor may be used to construct the children of a newly constructed document node...."

the offending part being "may be", which I (again, in hindsight) read as "it may be used for creating nodes, but this section describes how you evaluate it".

Finally, in the list of instructions that create complex content, also xsl:copy is mentioned. I think this is true (the only hierarchical data types are elements and document nodes, for which the seqtor is evaluated), so even when the seqtor of xsl:copy returns a function item, it can only ever become a child of a node (error!) or it is never evaluated (not an error).

I would like to suggest a few minor tweaks to the text to make it clear(er); perhaps we can discuss it briefly at the F2F. Suggestions like:

- make the list of instructions that trigger complex content a complete list
- add the "hidden" line of the preamble in its own section, say "processing sequence constructors" and again create a complete list, or make explicit that this is "anything but the instructions mentioned under Complex/Simple".
- make explicit, or add a note, that if Complex/Simple is not in effect, normalization does NOT take place and text nodes, even empty ones, are not concatenated, nor removed.
- some variant of the above

(though I am glad the bug appeared not to be the bug I thought it was...)

Comment 3 Abel Braaksma 2016-02-09 23:29:41 UTC

I think what I am looking for is something along these lines:

The result of a sequence constructor is dependent on the immediate instruction it is contained by and is one of the following:

1) The following seqtors create [#LINK Simple Content].
   a) Instructions xsl:attribute, xsl:comment, xsl:processing-instruction, xsl:namespace, and xsl:value-of 
   b) Attribute value templates and text value templates

2) The following seqtors create [#LINK Complex Content].
   a) Instructions xsl:element, xsl:document, xsl:copy, xsl:message, xsl:assert, xsl:variable/param/with-param without an as-attribute
   b) The declarations xsl:variable/param without an as-attribute
   c) Literal result elements
   d) The instruction xsl:result-document when the effective value for build-tree="true".
   e) The implicit xsl:document instruction in [f:construct-result-tree] if result-tree construction is requested for the entire transformation (usually by setting build-tree="true").

3) The following seqtors create [#LINK Sequences of items]
   a) Instructions xsl:fallback, xsl:break, xsl:choose, xsl:for-each, xsl:for-each-group, xsl:if, xsl:iterate, xsl:map, xsl:map-entry, xsl:merge-key, xsl:on-empty, xsl:on-non-empty, xsl:perform-sort, xsl:sequence, xsl:stream, xsl:try, xsl:variable/param/with-param with an as-attribute, xsl:where-populated
   b) The declarations xsl:key, xsl:function, xsl:template and xsl:variable/param/with-param when an as-attribute is present
   c) The non-instructions xsl:accumulator-rule, xsl:otherwise, xsl:when, xsl:catch, xsl:merge-action, xsl:merge-key, xsl:on-completion, xsl:matching-substring, xsl:non-matching-substring and xsl:sort
   d) The instruction xsl:result-document when the effective value for build-tree="false".
   e) The result of the entire transformation when result-tree creation is not requested (usually by setting build-tree="false" on xsl:output)

4) Extension instructions that take sequence constructors can fall in either or none of the above categories, depending on implementation-defined semantics.


The first two links are obvious, the third would go to a new section that simply has the quoted line from Comment#2.

I tried to be complete. Even if we decide to do nothing, at least we have a list that enumerates which instruction creates what, a common cause of confusion (and I'm afraid this won't be the last time I am stymied by it...).

Going over each instruction to find out whether or not it links to 5.7.1 or 5.7.2 also turned up a (tiny) additional editorial bug in the spec: xsl:result-document is listed under 5.7.1, but this is only true when build-tree="true".

I listed xsl:on-empty and xsl:on-non-empty under "Sequence" instead of complex/simple content because their result, *after* evaluating the sequence constructor in the normal, non-normalizing way, is added to the result of the containing instruction, which is then normalized if necessary.

Comment 4 Michael Kay 2016-02-09 23:50:40 UTC

>The result of a sequence constructor is dependent on the immediate instruction it is contained by

I think we should try and maintain proper orthogonality. The result of a sequence constructor is the same, regardless. The containing instruction then takes this result and does different things with it. Because many instructions use it either to construct complex content or to construct simple content, we factor out these usages so they don't have to be repeated ad nauseam. But the algorithms given in these two sections are logically part of the behaviour of the calling instruction, not part of the behaviour of the sequence constructor.

I'm reluctant to make wholesale changes to text that is largely the same as XSLT 2.0 and has stood the test of time. A few clarifying notes perhaps, but not a fundamental change to the processing model.
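
The separation described above can be sketched as follows, assuming an XSLT 3.0 processor started at the initial template. The variable names, the element e and the out element are made up for illustration. Both containers hold the same sequence constructor, borrowed from example (5), and so start from the same sequence of seven integers. The variable with an "as" attribute keeps that sequence, while the literal result element applies the rules for constructing complex content to it.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template name="xsl:initial-template">
    <!-- Container 1: the result is converted to xs:integer* by the
         function conversion rules, so it stays a sequence of integers. -->
    <xsl:variable name="values" as="xs:integer*">
      <xsl:sequence select="(1,2,3,4)"/>
      <xsl:sequence select="(8,9,10)"/>
    </xsl:variable>

    <!-- Container 2: the same sequence constructor inside a literal
         result element, which constructs complex content from it. -->
    <xsl:variable name="wrapped" as="element()">
      <e>
        <xsl:sequence select="(1,2,3,4)"/>
        <xsl:sequence select="(8,9,10)"/>
      </e>
    </xsl:variable>

    <!-- Report what each container made of the same sequence. -->
    <out items="{count($values)}"
         text-nodes="{count($wrapped/text())}"
         content="{$wrapped}"/>
  </xsl:template>

</xsl:stylesheet>
```

The expected report is items="7", text-nodes="1" and content="1 2 3 4 8 9 10": the integers survive in the typed variable, whereas inside the element they are cast to strings and joined with single spaces into one text node.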

Comment 5 Abel Braaksma 2016-02-10 00:58:53 UTC

I'm starting to understand the logic and its separation. Because "constructing complex content" is part of 5.7 Sequence Constructors, I have always viewed it as a monolithic, albeit two-phased, processing model.

In general, I think we are in agreement. I do not propose to change the programming model. And I think that by and large the information that I suggest here is already in place, just not as clear as it could be.

In part, I think what convolutes things a tad more than they have to is that in XSLT 3.0 we allow instructions that previously just returned a document node to return a sequence (xsl:result-document, the transformation result).

I'm sorry my comment appeared as a suggestion for a complete rewrite; that was not my intent. I agree that at best a few tweaks are needed. The lists I enumerated above are already present, but the spec seems not entirely correct or complete; fixing that should not change anything in the processing model.

Comment 6 Michael Kay 2016-02-15 15:54:14 UTC

Agreed that we don't need to make any technical changes here, but there is scope for improved presentation. Editor to make suggestions.

Comment 7 Michael Kay 2016-02-23 22:26:11 UTC

Proposed changes:

(a) change

The result of evaluating a sequence constructor is the sequence of items formed by concatenating the results of evaluating each of the nodes in the sequence constructor, retaining order.

to

[Definition: The result of evaluating a sequence constructor -- referred to as the *raw result* of the sequence constructor -- is the sequence of items formed by concatenating the results of evaluating each of the nodes in the sequence constructor, retaining order.]

(b) change

"There are several ways the result of a sequence constructor may be used."

to

"The way that the raw result of a sequence constructor is used depends on the containing element in the stylesheet, and is specified in the rules for that element. It is typically one of the following:"

(b) rewrite the first bullet and the subsequent Note as follows:

* The raw sequence may be bound to a variable or delivered as the result of a stylesheet function. In this case the "as" attribute of the containing variable or function may be used to declare its required type, and the *raw result* is then converted to the required type by applying the *function conversion rules*.

Note:

* In the absence of an "as" attribute, the result of a function is the raw sequence; but the value of a variable (for backwards compatibility reasons) is a document node whose content is formed by applying the rules for *Constructing Complex Content* to the raw sequence.

* The function conversion rules do not merge adjacent text nodes or insert separators between adjacent items. This means it is often inappropriate to use xsl:value-of in the body of xsl:variable or xsl:function, especially when the intent is to return an atomic result. The xsl:sequence instruction is designed for this purpose, and is usually a better choice.

* The result of a function, or the value of a variable, may contain nodes (such as elements, attributes, and text nodes) that are not attached to any parent node in a result tree.  The semantics of XPath expressions when applied to parentless nodes are well-defined; however, such expressions should be used with care. For example, the expression / causes a type error if the root of the tree containing the context node is not a document node.

* Parentless attribute nodes require particular care because they have no namespace nodes associated with them. A parentless attribute node is not permitted to contain namespace-sensitive content (for example, a QName or an XPath expression) because there is no information enabling the prefix to be resolved to a namespace URI. Parentless attributes can be useful in an application (for example, they provide an alternative to the use of attribute sets: see 10.2 Named Attribute Sets) but they need to be handled with care.
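
The first two bullets of the proposed Note can be illustrated with the text nodes from example (6). This is a minimal sketch, assuming an XSLT 3.0 processor started at the initial template; the variable names and the out element are made up for illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="xsl:initial-template">
    <!-- With an "as" attribute the function conversion rules apply;
         they do not merge adjacent text nodes. -->
    <xsl:variable name="raw" as="text()*">
      <xsl:text>a</xsl:text>
      <xsl:text>b</xsl:text>
    </xsl:variable>

    <!-- Without an "as" attribute the value is a document node built by
         the rules for constructing complex content, which merge
         adjacent text nodes. -->
    <xsl:variable name="tree">
      <xsl:text>a</xsl:text>
      <xsl:text>b</xsl:text>
    </xsl:variable>

    <!-- Count the text nodes that each variable ends up holding. -->
    <out raw="{count($raw)}" tree="{count($tree/text())}"/>
  </xsl:template>

</xsl:stylesheet>
```

The expected counts are raw="2" and tree="1". Declaring the first variable as="text()", as in example (6), would therefore be expected to fail with a type error, since two text nodes do not match a type that allows exactly one.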


(c) change the introductory paragraph of "Constructing Complex Content" from

This section describes how the sequence obtained by evaluating a sequence constructor may be used to construct the children of a newly constructed document node, or the children, attributes and namespaces of a newly constructed element node. The sequence of items may be obtained by evaluating the sequence constructor contained in an instruction such as xsl:copy, xsl:element, xsl:document, xsl:result-document, or a literal result element.

to

Many instructions, for example xsl:copy, xsl:element, xsl:document, xsl:result-document, and literal result elements, create a new parent node, and evaluate a sequence constructor forming the content of the instruction to create the attributes, namespaces, and children of the new parent node. The raw result of the sequence constructor is processed to create the content of the new parent node as described in this section.

(d) Use the new term "raw result" elsewhere where it aids understanding.

Comment 8 Michael Kay 2016-02-25 20:45:14 UTC

Following discussion and decision at the WG telcon today, I have applied these changes (with a little editorial discretion), substituting the term "immediate result" for "raw result".

Note: the "raw result" of a transformation obtained by invoking a particular component (template or function) is the "immediate result" of evaluating the sequence constructor contained in that template or function, converted if necessary to the declared type by applying the function conversion rules.