This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 25185 - Usage absorption can take crawling expressions when TDU derives from xs:anyAtomicType
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Last Call drafts
Hardware: PC Windows NT
Importance: P2 enhancement
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2014-03-27 23:39 UTC by Abel Braaksma
Modified: 2014-08-22 11:36 UTC

Description Abel Braaksma 2014-03-27 23:39:31 UTC
Functions that take an atomized value with an occurrence indicator of one or zero-or-one could be allowed to take a crawling expression as an argument.

This is true because it is an error if the expression returns more than one node, and, in the same way as for fn:count(x), it is possible to determine at run time whether more than one node is returned, resulting in the dynamic error XPTY0004 [1].

This simplifies scenarios where the user is not interested in the depth of a certain node, knows beforehand that there will only ever be one matching node, and, if there is more than one, accepts that as an error.

Example:

<xsl:value-of select="string(proto//version)" />

Currently this is not streamable, but if there is more than one match it would result in an error anyway, and if there are zero or one matches it is streamable. Hence it can be streamed in both cases, and we can consider this a normal consuming expression.

This rule can apply to all functions, even user-defined ones, that have an argument deriving from xs:anyAtomicType with an occurrence indicator of one or zero-or-one, and it can therefore simply be added to the general streamability rules. For instance, fn:ceiling, fn:dateTime, fn:string, fn:concat, fn:format-date, and fn:error are included, while fn:data, fn:deep-equal, fn:min, and fn:max are excluded.
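
As a minimal sketch of what this would allow (the match pattern and the element names "out" and "price" are invented here, and the rule itself is only a proposal), a streamable template could apply such a function directly to a crawling selection:

<xsl:mode streamable="yes"/>

<xsl:template match="proto">
  <!-- string() expects at most one item, so under the proposed rule the
       crawling selection .//version would make this call consuming;
       XPTY0004 is raised only if more than one atomized value turns up -->
  <out><xsl:value-of select="string(.//version)"/></out>
  <!-- fn:min expects a sequence of atomic values, so it is not covered
       by the rule as stated here -->
  <!-- <xsl:value-of select="min(.//price)"/> -->
</xsl:template>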

This bug report was "inspired" by researching backwards compatibility behavior for bug 24506, comment 5.

[1] http://www.w3.org/TR/xpath-30/#ERRXPTY0004
Comment 1 Michael Kay 2014-03-27 23:59:28 UTC
I think you are right.

(You say "it is an error if the expression returns more than one node", which is not strictly correct; it's OK if the expression returns more than one node provided the atomized value is a singleton. So all selected nodes except one must have a typed value that is an empty sequence)

Some cases may be tricky to handle, but I think it works in theory.
Comment 2 Michael Kay 2014-03-28 10:04:56 UTC
For info, Saxon handles this case because before it does streamability analysis, it does type checking, and where an expression such as string(//title) expects a singleton, it rewrites it as string(zero-or-one(data(//title))). The zero-or-one() makes it streamable under the current rules.
Comment 3 Abel Braaksma 2014-03-28 12:30:36 UTC
> and where an expression such as string(//title) expects a 
> singleton, it rewrites it as string(zero-or-one(data(//title))).

Actually, under the current streamability rules, that would not be streamable, because of the addition of fn:data() and the crawling posture of //title. If you were to rewrite it as string(zero-or-one(//title)) it would work.
Comment 4 Michael Kay 2014-03-28 12:38:03 UTC
>If you were to rewrite it as string(zero-or-one(//title)) it would work.

Yes but it would produce the wrong answer if there are two title elements and one of them has an empty sequence as its typed value.
Comment 5 Michael Kay 2014-05-14 17:39:57 UTC
I've convinced myself this is streamable, though it's quite tricky in pathological cases, for example where the crawling sequence includes both list-valued elements and their text node children. If the crawling sequence includes both the element

<list> </list>

and its whitespace text node child, then the atomized value of the element is an empty sequence and the atomized value of the text node is a whitespace string, so the atomized value of the node sequence is the single whitespace xs:string value " "...
Comment 6 Michael Kay 2014-05-14 18:12:32 UTC
In fact, the logic for atomizing a sequence under these conditions is not really any easier than atomizing an arbitrary crawling sequence. The only difference is that for an arbitrary crawling sequence you potentially need memory proportional to the size of the largest atomic value times the depth of nesting of nodes in the sequence.

Because expressions like //title when used in an atomizing context almost invariably do NOT select overlapping nodes, I suggest that rather than handle the special case where the required value is singleton atomic, we allow all cases where the required type is an atomic sequence. That is, we deem data(X) to be streamable if X is crawling.
Comment 7 Michael Kay 2014-05-16 10:04:42 UTC
PROPOSAL
========

The proposal is that we distinguish atomization from other absorption operations. For atomization, we permit the operand to be crawling.

Specifically, we introduce a fifth kind of operand usage, called atomization, which differs from absorption in that, in the general streamability rules (the table in 1.b.iii.B), the entry for "Atomization/Crawling" is "Consuming" rather than "Free-Ranging".

This operand usage would apply whenever the semantics of the operation invoke atomization. For example: function calls where the required type is atomic; the data() function; AVTs; the select attribute of xsl:value-of. It would also apply to the small number of operations that get the string value of a node, for example string() and string-length(). In fact, it would apply to most cases where we currently use usage="absorption", with the exception of constructs like xsl:for-each and xsl:apply-templates and xsl:iterate where the processing of descendant elements is defined by user-written code rather than built-in code.
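
As a sketch of the breadth of that usage (the element and attribute names here are invented, and this reflects the proposal rather than adopted text), each of the following places a crawling selection in an atomizing position and would therefore be consuming rather than free-ranging:

<xsl:template match="book">
  <!-- attribute value template: atomizes the crawling selection .//isbn -->
  <summary isbn="{.//isbn}">
    <!-- select attribute of xsl:value-of: likewise atomizing -->
    <xsl:value-of select=".//keyword" separator=", "/>
  </summary>
</xsl:template>

An explicit call such as data(.//keyword) would be treated in the same way.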

I'm then inclined to rename the existing usage=absorption as usage=consumption, to preserve one-letter abbreviations for usages, and because there's a clear link between usage=consumption and sweep=consuming.

A typical implementation will work as follows: when it encounters the start tag of a selected node, it opens a buffer for the string value of that node, and adds this buffer to the end of a queue. When it encounters a text node, it copies the value to all currently open string-value buffers. When it encounters the end tag for a selected node, it computes the atomized value of that node and seals the buffer; it then delivers (and dequeues) the atomized value of all buffers that are sealed and that are not queued behind one that is still open. The number of open buffers on the queue is determined by the amount of nesting of selected nodes in the crawling sequence, which in the vast majority of practical cases will be one; if there are no nested nodes in the crawling sequence, then each atomic value will be delivered as soon as the end tag for the corresponding node is encountered.

We could extend the same mechanism to all absorption operations on crawling sequences (for example, xsl:apply-templates and xsl:for-each). The reason I don't propose doing this is that (a) the amount of data in each buffer is unbounded (as it depends on user code), and (b) with operations like apply-templates, as distinct from atomization, it is much more likely that the result of the crawling expression will actually contain nested nodes.
Comment 8 Michael Kay 2014-05-22 22:18:11 UTC
The WG today accepted the technical effect of comment 7, but with advice to the editor to reconsider the presentation and terminology. The word "consumption" was disliked. There was a suggestion that we could make do with the existing 4 operand usages, changing those that don't fit in (like apply-templates, for-each, and iterate) so they no longer use the GSR or are in some way handled as exceptions.
Comment 9 Michael Kay 2014-05-23 17:34:19 UTC
I found it was possible to make this change without changing any existing concepts; apart from changing the relevant entry in the GSR table from free-ranging to consuming, there is very little impact beyond a few examples and explanations. Instructions such as for-each, for-each-group, and iterate did not need to change; they are not affected because they do not rely on the GSR. The rules for apply-templates needed to change to disallow a climbing or crawling select expression without appeal to the GSRs. The new rules for calls to streamable user-defined functions already contained this provision.

There is some risk that I didn't find all the incidental places affected by this change, that is, notes and examples.
Comment 10 Abel Braaksma 2014-05-23 18:40:46 UTC
Great to hear the change is smaller than anticipated. Let us know once the changes are applied; then I'll spend some time going over them, including the existing examples and rules.
Comment 11 Abel Braaksma 2014-06-04 11:17:26 UTC
Correction to my previous comment: the change was already applied to the internal working draft.
Comment 12 Michael Kay 2014-08-11 21:29:38 UTC
Unfortunately this change introduced a bug.

Consider:

<xsl:for-each select="//*">
  <xsl:copy-of select="."/>
</xsl:for-each>

The rules for xsl:for-each say (rule 3)

(a) The posture of the instruction is the posture of the contained sequence constructor, assessed with the context posture and context item type set to the posture and type of the select expression.

(b) The sweep of the instruction is the wider of the sweep of the select expression and the sweep of the contained sequence constructor.

The context posture is crawling, and the posture of xsl:copy-of follows the GSR with a single operand with posture=crawling, sweep=motionless. As a result of the change to the table in the GSR, specifically the CRAWLING/ABSORPTION entry, this is now CONSUMING (previously FREE-RANGING). So the xsl:for-each as a whole is grounded/consuming, whereas the intent (in comment 7) was that this would still be roaming/free-ranging.

As far as xsl:for-each is concerned, I think we need to add a rule that if the select expression is crawling and the body is consuming then the PS is roaming/free-ranging. Similar changes may also be needed for apply-templates and for-each-group.
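
For contrast, a sketch (the element name "item" is invented) of a crawling select expression whose body is motionless rather than consuming; this case is unaffected by the problem above, since the for-each then comes out as grounded and consuming, and hence streamable:

<xsl:for-each select=".//item">
  <!-- the body only inspects the context node's name, so it is motionless -->
  <found><xsl:value-of select="name()"/></found>
</xsl:for-each>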
Comment 13 Michael Kay 2014-08-14 16:40:18 UTC
Abel pointed out that it might be possible to implement the intent by using the rules for focus-changing expressions: in effect if the controlling part of a focus-changing expression is crawling then the controlled part must be motionless.
Comment 14 Michael Kay 2014-08-21 14:01:40 UTC
I looked at the suggestion in comment 13, and I don't think it works in the way suggested. Generally, focus-changing constructs such as xsl:for-each don't use the GSR; they have individual rules, and commoning them up would be difficult. So I think we need to augment the rules for each focus-changing construct.

Specifically:

xsl:for-each (new rule 3): if the select expression is crawling and the contained sequence constructor is consuming, then roaming and free-ranging

xsl:iterate (new rule 4): if the select expression is crawling and the contained sequence constructor is consuming, then roaming and free-ranging

xsl:for-each-group (new rule 7): if the select expression is crawling and the contained sequence constructor is consuming, then roaming and free-ranging

path expressions and simple mapping expressions: no change

xsl:apply-templates: no change.
Comment 15 Michael Kay 2014-08-21 17:24:50 UTC
The WG accepted the proposal in comment #14.
Comment 16 Michael Kay 2014-08-22 11:36:29 UTC
The changes have been applied.