This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29455 - [XSLT30] Clarify/update rules of xsl:on-empty and xsl:on-non-empty with respect to sequences of empty strings or arrays
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Candidate Recommendation
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-02-14 14:19 UTC by Michael Kay
Modified: 2016-10-06 18:42 UTC

Description Michael Kay 2016-02-14 14:19:42 UTC
The rules for xsl:on-empty say:

An xsl:on-empty instruction is evaluated only if every preceding sibling instruction, text node, and literal result element in the same sequence constructor returns either an empty sequence, or a sequence consisting entirely of zero-length text nodes and/or document nodes with no children.

In this case one of the preceding sibling instructions is

<xsl:sequence select="''"/>

which delivers a zero-length string. This is not an empty sequence, it is not a zero-length text node, and it is not a document node with no children; therefore, according to the spec (whether we like it or not...), the xsl:on-empty should not be activated.
Comment 1 Michael Kay 2016-02-14 14:24:08 UTC
Also affects 113b, 114a, 114b.
Comment 2 Michael Kay 2016-02-17 09:59:36 UTC
We decided to change this to a spec bug.

We think it is desirable that the rules for xsl:on-empty and xsl:on-non-empty should be consistent with the rules for constructing complex content (and this design intent is already indicated by a Note in the spec). Specifically:

Consider the sequence constructor after removing any xsl:on-[non]-empty instructions. Apply the rules for constructing complex content, ignoring any error conditions. If the result is empty content, then on-empty fires (and if not, then on-non-empty fires).

The practical difference from the current rules is that some additional constructs are considered empty - for example, a zero-length string (but not if there are adjacent zero-length strings which would cause separators to be inserted).
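
To illustrate why separators matter under this CCC-based rule, here is a minimal Python sketch (not XSLT, and not the spec's algorithm). It assumes a simplified model of one step of complex content construction: adjacent atomic values are cast to strings and joined with a single space. The function names ccc_text and is_empty_content are hypothetical, and Python's str() is only a stand-in for casting to xs:string.

```python
def ccc_text(atomic_values):
    """Simplified model (an assumption): adjacent atomic values are cast to
    strings and joined with a single space to form one text node."""
    return " ".join(str(v) for v in atomic_values)


def is_empty_content(atomic_values):
    """In this model, content is empty when the resulting text is zero-length."""
    return ccc_text(atomic_values) == ""


print(is_empty_content([""]))      # a single zero-length string
print(is_empty_content(["", ""]))  # two adjacent zero-length strings
```

In this model a single zero-length string gives empty content, while two adjacent zero-length strings give a one-space text node, which is the edge case mentioned above.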
Comment 3 Michael Kay 2016-02-17 10:16:59 UTC
Slight problem: consider

<xsl:sequence>
  <xsl:sequence select="''"/>
  <xsl:on-non-empty select="'NOT EMPTY'"/>
  <xsl:sequence select="''"/>
</xsl:sequence>

Here we don't know whether on-non-empty fires until we know what comes next - this breaks the current algorithm which is designed to ensure streamability.

We think this can probably be solved by breaking the sequence constructor into parts at <xsl:on-non-empty/> boundaries, and applying the CCC rules to each of the parts.

ACTION on the editor to propose spec wording that implements this idea.
Comment 4 Michael Kay 2016-02-17 10:32:53 UTC
Studying the effect on this example:

        <xsl:for-each select="1 to 100">
            <xsl:sequence select="''" />
            <xsl:on-empty select="'|'" />
        </xsl:for-each>

we realised that if the conditions are right for on-empty to fire (which, under the new rules, is the case here), then the content produced by on-empty must be output *instead of* the result of the SC, not *in addition to* that result. So some buffering is necessary, but only of tiny amounts of data.
Comment 5 Michael Kay 2016-02-18 09:10:18 UTC
I have spent some time thinking about this. I think that incorporating rules about separators between atomic values into xsl:on-empty and xsl:on-non-empty makes things excessively complicated: it makes a difference only in extreme edge cases, and it is unnecessary. I think we can achieve very close alignment with the rules for complex content construction (CCC) with the following changes.

PROPOSAL:

A. In 8.4.2, xsl:on-empty, replace

An xsl:on-empty instruction is evaluated only if every preceding sibling instruction, text node, and literal result element in the same sequence constructor returns either an empty sequence, or a sequence consisting entirely of zero-length text nodes and/or document nodes with no children.

by

[Definition: an item is *insignificant* if it is one of the following: a zero-length text node; a document node with no children; an atomic value which, on casting to xs:string, produces a zero-length string; or an array which on flattening (using the array:flatten function) produces either an empty sequence or a sequence consisting entirely of insignificant items.]

[Definition: an item is *significant* if it is not *insignificant*].
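
As a reading aid, the two definitions above might be modelled roughly as follows in Python. This is a sketch under stated assumptions, not spec text: the classes TextNode, DocumentNode and ElementNode are hypothetical stand-ins for XDM nodes, a Python list stands in for an XDM array, and only string-valued atomic values are modelled (a Python str plays the role of the value after casting to xs:string).

```python
from dataclasses import dataclass, field


@dataclass
class TextNode:  # hypothetical stand-in for a text node
    value: str


@dataclass
class DocumentNode:  # hypothetical stand-in for a document node
    children: list = field(default_factory=list)


@dataclass
class ElementNode:  # hypothetical stand-in for any other kind of node
    name: str


def flatten(array):
    """Rough analogue of array:flatten: recursively unpack nested lists."""
    for member in array:
        if isinstance(member, list):
            yield from flatten(member)
        else:
            yield member


def is_insignificant(item):
    if isinstance(item, TextNode):
        return item.value == ""          # zero-length text node
    if isinstance(item, DocumentNode):
        return not item.children         # document node with no children
    if isinstance(item, list):           # array: flatten, then check every item
        return all(is_insignificant(i) for i in flatten(item))
    if isinstance(item, str):            # atomic value already cast to a string
        return item == ""
    return False                         # anything else is significant


def is_significant(item):
    return not is_insignificant(item)
```

Under this model, is_insignificant([[], []]) and is_insignificant([]) both return True, while an element node or a non-empty string is significant.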

An xsl:on-empty instruction is triggered only if every preceding sibling instruction, text node, and literal result element in the same sequence constructor returns either an empty sequence, or a sequence consisting entirely of *insignificant* items.

If an xsl:on-empty instruction is triggered, then the result of the containing sequence constructor is the result of the xsl:on-empty instruction.

<Note>This means that the (insignificant) results produced by other instructions in the sequence constructor are discarded. This is relevant mainly when the result of the sequence constructor is used for something other than constructing a node: for example if it forms the result of a function, or the value of a variable, and the function or variable specifies a required type.

When streaming, it may be necessary to buffer insignificant items in the result sequence until it is known whether the result will contain items that are not insignificant. In many common situations, however - in particular, when the sequence constructor is being used to create the content of a node - insignificant items can be discarded immediately because they do not affect the content of the node being constructed.
</Note>

<Note>In nearly all cases, the rules for xsl:on-empty are aligned with the rules for constructing complex content. If the sequence constructor within a literal result element or an xsl:element instruction includes an xsl:on-empty instruction, then the content of the element will be the value delivered by the xsl:on-empty instruction if and only if the content would otherwise be empty. There is one minor exception to this rule: if the sequence constructor delivers multiple zero-length strings, then in the absence of the xsl:on-empty instruction the element would contain whitespace, made up of the separators between these zero-length strings; but xsl:on-empty takes no account of these separators.</Note>

<Note>Attribute and namespace nodes created by the sequence constructor are significant; the xsl:on-empty instruction will not be triggered if such nodes are present. If this is not the desired effect, it is possible to partition the sequence constructor to change the scope of xsl:on-empty, for example

<ol>
  <xsl:attribute name="class" select="'numbered-list'"/>
  <xsl:sequence>
     <xsl:value-of select="xyz"/>
     <xsl:on-empty select="'The list is empty'"/>
  </xsl:sequence>
</ol>

</Note>

B. In 8.4.3, xsl:on-non-empty, change:

An xsl:on-non-empty instruction is evaluated only if there is at least one sibling node in the same sequence constructor, excluding xsl:on-empty and xsl:on-non-empty instructions, whose evaluation yields a sequence containing an item other than a zero-length text node or a document node with no children. If this condition applies, then all xsl:on-non-empty instructions in the containing sequence constructor are evaluated, and their results are included in the result of the containing sequence constructor in their proper positions.

to:

An xsl:on-non-empty instruction is evaluated only if there is at least one sibling node in the same sequence constructor, excluding xsl:on-empty and xsl:on-non-empty instructions, whose evaluation yields a sequence containing an item that is *significant*. If this condition applies, then all xsl:on-non-empty instructions in the containing sequence constructor are evaluated, and their results are included in the result of the containing sequence constructor in their proper positions.

<Note>
xsl:on-non-empty is typically used to generate headers or footers appearing before or after a list of items, where the header or footer is to be omitted when there are no items in the list.
</Note>

C. In 8.4.4, change clause 4(b) from

Otherwise, the instruction is evaluated and its results are appended to R.

to

Otherwise, the existing contents of R are discarded, the instruction is evaluated, and its results are appended to R. 

<Note>The need to discard items from R arises only when all the items in R are insignificant. Streaming implementations may therefore need a limited amount of buffering to retain insignificant items until it is known whether they will be needed. However, in many common cases an optimized implementation will be able to discard insignificant items such as empty text nodes immediately, because when a node is being constructed using the rules in CCC or CSC, such items have no effect on the final outcome.</Note>
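
The combined effect of A, B and C can be sketched in Python as below. This is a non-streaming illustration only, not the algorithm of 8.4.4: it assumes each instruction's result has already been computed, that xsl:on-empty (if present) is the last instruction in the sequence constructor, and that only strings and nested lists need to be tested for insignificance. The function name evaluate and the (kind, items) encoding are hypothetical.

```python
def is_insignificant(item):
    """Simplified (an assumption): only zero-length strings and lists whose
    members are all insignificant are treated as insignificant."""
    if isinstance(item, list):
        return all(is_insignificant(member) for member in item)
    return item == ""


def evaluate(constructor):
    """constructor is a list of (kind, items) pairs in document order, where
    kind is 'instr', 'on-empty' or 'on-non-empty', and items is the
    already-computed result of that instruction."""
    ordinary = [items for kind, items in constructor if kind == "instr"]
    any_significant = any(
        not is_insignificant(item) for items in ordinary for item in items
    )

    result = []  # plays the role of R
    for kind, items in constructor:
        if kind == "instr":
            result.extend(items)
        elif kind == "on-non-empty":
            if any_significant:
                result.extend(items)
        elif kind == "on-empty":
            if not any_significant:
                result = list(items)  # discard R, then append the on-empty result
    return result


# Body of the xsl:for-each in comment 4: the result is ['|'], not ['', '|'].
print(evaluate([("instr", [""]), ("on-empty", ["|"])]))

# Example from comment 3: nothing significant, so on-non-empty does not fire.
print(evaluate([("instr", [""]), ("on-non-empty", ["NOT EMPTY"]), ("instr", [""])]))
```

Because this sketch looks at all the results up front, it sidesteps the buffering question that the Note discusses for streaming implementations.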

D. The above rules align the treatment of arrays between xsl:on-empty, xsl:on-non-empty, CCC, and CSC: for example in each case [[],[]] is treated the same as []. It seems desirable also to bring the rules for xsl:where-populated into line. Therefore, in section 8.4.1, change

o An array (see 27.7.1 Arrays) with no members.

to

o An array (see 27.7.1 Arrays) where the result of flattening the array using the array:flatten function is either an empty sequence, or a sequence in which every item is deemed empty (applying these rules recursively).
Comment 6 Abel Braaksma 2016-02-20 14:18:38 UTC
(In reply to Michael Kay from comment #5)
> PROPOSAL:
> <snip />
I think your proposal is actually better than what we conjured up during the F2F. The noted differences with the CCC and CSC rules wrt spaces are an understandable trade-off and may even lead to fewer surprises than the reverse (i.e., where we had the rule that if a seqtor would get separators inserted, it was considered non-empty).

I think it'd be a good idea to add the edge-case to the examples, for instance:

<xsl:value-of separator="|">
    <xsl:sequence select="foo/bar/normalize-space()" />
    <xsl:on-empty>No bars found!</xsl:on-empty>
</xsl:value-of>

with the following input:

<foo>
  <bar> </bar>
  <bar></bar>
  <bar>   </bar>
</foo>

This would create "||" without xsl:on-empty, and "No bars found!" with xsl:on-empty. That is a "surprise effect", in that a non-empty result is "deemed empty" wrt our rules, which is why I suggest we include such an example, or a similar one.
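
The two outcomes can be reproduced with a small Python sketch. It is only an approximation of the stylesheet above: normalize_space is a hypothetical helper that roughly mimics XPath normalize-space() (Python's str.split() treats more characters as whitespace than XML does), and the on-empty test is reduced to "every value is a zero-length string".

```python
def normalize_space(text):
    """Rough analogue of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


bars = [" ", "", "   "]  # string values of the three <bar> elements
values = [normalize_space(b) for b in bars]

# Without xsl:on-empty: the three zero-length strings are joined by the separator.
without_on_empty = "|".join(values)

# With xsl:on-empty: every value is zero-length, so on-empty is triggered.
all_zero_length = all(v == "" for v in values)
with_on_empty = "No bars found!" if all_zero_length else "|".join(values)

print(repr(without_on_empty))  # '||'
print(repr(with_on_empty))     # 'No bars found!'
```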

----------------------

On the new rules themselves: since they no longer rely on the CCC or CSC rules, we should mention whether or not they include the expansion of [xsl:]use-attribute-sets and literal or AVT attributes in LREs. My take would be to:

1) let the result of use-attribute-sets be part of significant items
2) let literal and AVT attributes *not* be part of the significant items

The reason behind this is that use-attribute-sets requires extra processing and removal of duplicates, and they may have seqtors themselves (though in streaming they are required to be motionless).

Literal attributes and AVT-attributes are not visually part of the sequence constructor and should therefore not be part of it.

I think we should mention something similar for <xsl:copy copy-namespaces="yes">, whose copied namespaces I think should *not* be part of the significant items; only xsl:namespace should.

xsl:attribute instructions are already mentioned as being significant, I think that is a good thing.
Comment 7 Michael Kay 2016-03-31 15:32:24 UTC
The minutes from 10th March record:

Terminology: vacuous, negligible, ... Reluctantly settling on vacuous...

ACTION-2016-03-10-005: Mike Kay to introduce the term "vacuous" for empty in
xsl:on-empty.

I think that we decided to accept the proposal in comment 5 changing "insignificant" to "vacuous", but we should confirm this.
Comment 8 Michael Kay 2016-03-31 23:51:17 UTC
The proposal was accepted and has been applied.