This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 5849 - [XSLT 2.0] xsl:number problem
Summary: [XSLT 2.0] xsl:number problem
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 2.0
Version: Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2008-07-09 23:45 UTC by Michael Kay
Modified: 2009-01-30 11:54 UTC

Description Michael Kay 2008-07-09 23:45:06 UTC
I'm not sure whether this really counts as a bug, but it is certainly an oddity.

Consider the following source document:

<doc>
  <a mark="true"/>
  <a/>
  <a/>
  <a/>
  <a mark="true"/>
  <a/>
  <a/>
  <a/>   
</doc>

and the template

<xsl:template match="a">
  <a>
    <xsl:copy-of select="@mark"/>
    <xsl:attribute name="nr">
      <xsl:number level="any" count="a" from="a[@mark='true']"/>
    </xsl:attribute>
  </a>
</xsl:template>

How would you expect the nodes to be numbered?

I think the correct answer according to the spec is:

<doc>
  <a mark="true" nr=""/>
  <a nr="2"/>
  <a nr="3"/>
  <a nr="4"/>
  <a mark="true" nr="5"/>
  <a nr="2"/>
  <a nr="3"/>
  <a nr="4"/>   
</doc>

The explanation is that when you are numbering node X, the algorithm takes no account of whether X matches the from pattern, but it does include the previous node that matched the from pattern in its count, assuming that it also matched the count pattern.

I think a less surprising answer would be:

<doc>
  <a mark="true" nr="1"/>
  <a nr="2"/>
  <a nr="3"/>
  <a nr="4"/>
  <a mark="true" nr="1"/>
  <a nr="2"/>
  <a nr="3"/>
  <a nr="4"/>   
</doc>

which would be achieved by changing

Let $F be the node sequence selected by the expression

   $S/(preceding::node()|ancestor::node())[matches-from(.)][last()]

to

Let $F be the node sequence selected by the expression

   $S/(preceding::node()|ancestor-or-self::node())[matches-from(.)][last()]

I've no idea, however, what side-effects this might have on other use cases.
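
The difference between the two expressions can be sketched outside XSLT. The following is a minimal Python model, not a conforming implementation: it assumes a flat list of sibling elements that all match the count pattern (as in the example document), and the function name number_level_any and the inclusive flag are invented for illustration. With inclusive=False it models the preceding|ancestor form; with inclusive=True it models the preceding|ancestor-or-self form.

```python
def number_level_any(matches_from, inclusive):
    """Sketch of level="any" numbering over a flat list of sibling elements.

    Assumption: every element matches the count pattern, as in the example.
    matches_from[i] says whether element i matches the from pattern.
    """
    numbers = []
    for i in range(len(matches_from)):
        # $F: the last node matching "from" among the preceding nodes,
        # plus the node itself when the ancestor-or-self axis is used.
        start = i if inclusive else i - 1
        f = next((j for j in range(start, -1, -1) if matches_from[j]), None)
        # Nodes from $F up to the current node are counted; no $F, no number.
        numbers.append("" if f is None else str(i - f + 1))
    return numbers


# The eight <a> elements of the example; True marks mark="true".
marks = [True, False, False, False, True, False, False, False]
print(number_level_any(marks, inclusive=False))  # ['', '2', '3', '4', '5', '2', '3', '4']
print(number_level_any(marks, inclusive=True))   # ['1', '2', '3', '4', '1', '2', '3', '4']
```

Under these assumptions the first call reproduces the numbering the current text appears to require, and the second reproduces the less surprising one.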

The XSLT 1.0 rule is "If the from attribute is specified, then only nodes after the first node before the current node that match the from pattern are considered." I think this suffers the same problem: a node that matches both "count" and "from" is not numbered 1, but the next counted node is numbered 2. But the sentence is so convoluted that you can read it in different ways.
Comment 1 Henry Zongaro 2008-07-10 18:55:51 UTC
I agree that the sentence that Michael Kay has quoted from XSLT 1.0 is very difficult to parse - indeed, I believe it contains a typographical error that exacerbates the problem.  However, when I read it, I believe the expected result in XSLT 1.0 is as follows:

<doc>
  <a mark="true" nr=""/>
  <a nr="1"/>
  <a nr="2"/>
  <a nr="3"/>
  <a mark="true" nr="4"/>
  <a nr="1"/>
  <a nr="2"/>
  <a nr="3"/>   
</doc>

That is, a node that matches the "from" pattern and the "count" pattern is considered to be the last node of the set that began *after* the last node that matched the "from" pattern, rather than the first node in a new set.

For convenience, I'll refer to the "a" element nodes using XPath expressions given "doc" as a context node.

The first part of the paragraph on level="any" says, "it constructs a list of length one containing the number of nodes that match the count pattern and belong to the set containing the current node and all nodes at any level of the document that are before the current node in document order."  So for a[2] that set - prior to considering the "from" pattern - consists of {a[1], a[2]}, and for a[6], that set consists of {a[1],a[2],...a[6]}.

Then we have, "If the from attribute is specified, then only nodes after the first node before the current node that match the from pattern are considered."  I think "match" here is a typographical error, and should be "matches."  The referent of "that" must be "the first node" not "nodes after the first node," because we're counting from the first node that matches the from pattern, we're not counting only the nodes that match the from pattern.  Thus, "only nodes after the first node before the current node that matches the from pattern are considered."  So the preceding node that matched the "from" pattern is not counted.

So, for a[2], the first node before the current node that matches the from pattern is a[1].  If we only consider the nodes in {a[1],a[2]} that are after a[1], we're left with {a[2]} - so the count is 1.  Similarly, for a[6], the first node before the current node that matches the from pattern is a[5].  If we only consider nodes in {a[1],a[2],...a[6]} that are after a[5], we're left with {a[6]} - so the count is again 1.

For a[5] - the second "a" element with mark="true" - the first part of level="any" has us construct the set {a[1],a[2],...a[5]}.  The qualification for the "from" attribute has us consider only the nodes after the first node before a[5] that matches the "from" pattern - i.e., a[1].  So that leaves us with the set {a[2],a[3],a[4],a[5]}, and the value of the count is 4.

For a[1], the set is initially {a[1]}.  There is no first node before a[1] that matches the "from" pattern, so I'm guessing that the set that results from the qualification for "from" is the empty set, and according to erratum E23 for XSLT 1.0, count is an empty list.
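
The reading of the XSLT 1.0 rule walked through above can be modelled in a few lines of Python. This is only a sketch of that interpretation, assuming a flat list of siblings that all match the count pattern; the function name number_xslt1_reading is invented here, and the empty result for a[1] follows the guess stated above rather than anything the 1.0 text says explicitly.

```python
def number_xslt1_reading(matches_from):
    """Sketch of the XSLT 1.0 reading: count only nodes strictly after the
    nearest preceding node that matches the from pattern.

    Assumption: every element matches the count pattern.
    """
    numbers = []
    for i in range(len(matches_from)):
        # Nearest node before the current node that matches "from".
        f = next((j for j in range(i - 1, -1, -1) if matches_from[j]), None)
        if f is None:
            numbers.append("")  # guess: empty set, hence an empty list
        else:
            numbers.append(str(i - f))  # nodes f+1 .. i, current node included
    return numbers


marks = [True, False, False, False, True, False, False, False]
print(number_xslt1_reading(marks))  # ['', '1', '2', '3', '4', '1', '2', '3']
```

The only difference from an inclusive count is that the "from" node itself is left out, which is why a[5] ends the first run with 4 instead of starting a new one.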
Comment 2 Michael Kay 2008-07-10 22:26:00 UTC
I agree entirely with the analysis of the XSLT 1.0 spec in comment #1. Indeed, the results match what Saxon 6.5.5 outputs, with one exception: Saxon numbers a[1] as nr="1". The spec doesn't say what happens if there is no node that matches the "from" pattern; Henry decided to discard all the nodes, whereas in Saxon I guess I decided to retain them all.

According to various contributors to the xsl-list, here's a survey of what various products do with this stylesheet. The eight columns are the values of the nr attribute on the 8 <a/> elements, with "-" indicating that the attribute is empty.

Saxon 6.5.5     1 1 2 3 4 1 2 3
Saxon 9.1.0.1   - 2 3 4 5 6 7 8
Saxon 9.1.0.2   - 2 3 4 5 2 3 4
Gestalt         - 2 3 4 5 6 7 8
Xalan 1.9       0 2 3 4 0 6 7 8
LibXSLT 1.1.22  1 2 3 4 1 2 3 4
Xalan-C 1.10    0 2 3 4 0 6 7 8
Xalan-J 2.7.1   0 2 3 4 0 6 7 8
Intel           1 2 3 4 5 6 7 8
MSXML3          0 1 2 3 0 1 2 3
.NET 1.0        - 1 2 3 - 1 2 3
.NET 2.0        - 1 2 3 4 1 2 3

As far as I can see the only two results that are defensible are

        (a)     - 1 2 3 4 1 2 3
        (b)     1 2 3 4 1 2 3 4

with the main argument in favour of (a) being backwards compatibility with the 1.0 spec, though since implementations are so inconsistent this cannot be a very strong argument. The XSLT 2.0 spec, which produces

        (c)    - 2 3 4 5 2 3 4

does not seem defensible: if the "from" node is regarded as the last in the previous run, then the first node after the "from" node should be numbered 1, not 2.
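
As a cross-check of the survey against the three candidate results, the rows can be compared mechanically. The Python snippet below simply transcribes the table and the sequences (a), (b) and (c) from this comment; the function name classify is invented for illustration.

```python
survey = {
    "Saxon 6.5.5": "1 1 2 3 4 1 2 3",
    "Saxon 9.1.0.1": "- 2 3 4 5 6 7 8",
    "Saxon 9.1.0.2": "- 2 3 4 5 2 3 4",
    "Gestalt": "- 2 3 4 5 6 7 8",
    "Xalan 1.9": "0 2 3 4 0 6 7 8",
    "LibXSLT 1.1.22": "1 2 3 4 1 2 3 4",
    "Xalan-C 1.10": "0 2 3 4 0 6 7 8",
    "Xalan-J 2.7.1": "0 2 3 4 0 6 7 8",
    "Intel": "1 2 3 4 5 6 7 8",
    "MSXML3": "0 1 2 3 0 1 2 3",
    ".NET 1.0": "- 1 2 3 - 1 2 3",
    ".NET 2.0": "- 1 2 3 4 1 2 3",
}
candidates = {
    "(a)": "- 1 2 3 4 1 2 3",
    "(b)": "1 2 3 4 1 2 3 4",
    "(c)": "- 2 3 4 5 2 3 4",
}


def classify(survey, candidates):
    """Group products by which candidate result their row matches exactly."""
    groups = {label: [] for label in candidates}
    for product, row in survey.items():
        for label, expected in candidates.items():
            if row.split() == expected.split():
                groups[label].append(product)
    return groups


print(classify(survey, candidates))
```

On the data as transcribed, each candidate is matched by exactly one product: .NET 2.0 for (a), LibXSLT 1.1.22 for (b) and Saxon 9.1.0.2 for (c). The other nine rows match none of them.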
Comment 3 Michael Kay 2008-10-08 20:25:33 UTC
At its telcon on 2 Oct 2008 the WG agreed in principle that it would be good to fix this so that the answer for the given use case is "1 2 3 4 1 2 3 4". The challenge is to produce a change proposal that will have this effect without having any nasty side-effects on other use cases.

The proposed change is in 12.2, under level="any", to change the second bullet so that it uses the expression

$S/(preceding::node()|ancestor-or-self::node())[matches-from(.)][last()]

in place of

$S/(preceding::node()|ancestor::node())[matches-from(.)][last()]

Now

(a) this only affects level="any"

(b) it only affects the numbering of the selected node if it matches the "from" pattern

so it seems clear that the change will be limited in scope.
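
Point (b) can be illustrated on a small nested document. The Python sketch below is a simplified model, not the spec algorithm: it considers element nodes only, relies on the fact that preceding::node()|ancestor::node() selects every non-attribute node before the context node in document order, and assumes that an empty count produces empty output. The sample document, the function name number_any_tree and the two predicates are invented for illustration.

```python
import xml.etree.ElementTree as ET


def number_any_tree(root, matches_count, matches_from, inclusive):
    """Sketch of level="any" over the elements of a tree, in document order."""
    nodes = list(root.iter())  # elements only, document order
    numbers = []
    for i in range(len(nodes)):
        # $F: last "from" match before the node (or at it, when inclusive).
        start = i if inclusive else i - 1
        f = next((j for j in range(start, -1, -1) if matches_from(nodes[j])), None)
        if f is None:
            numbers.append("")
            continue
        # Count the nodes matching "count" from $F up to the current node.
        counted = sum(1 for j in range(f, i + 1) if matches_count(nodes[j]))
        numbers.append(str(counted) if counted else "")
    return nodes, numbers


doc = ET.fromstring(
    '<doc><sec start="yes"><p/><p/><sec start="yes"><p/></sec><p/></sec></doc>'
)
is_count = lambda e: e.tag in ("sec", "p")      # hypothetical count pattern
is_from = lambda e: e.get("start") == "yes"     # hypothetical from pattern

nodes, old = number_any_tree(doc, is_count, is_from, inclusive=False)
_, new = number_any_tree(doc, is_count, is_from, inclusive=True)
changed = [i for i in range(len(nodes)) if old[i] != new[i]]
from_nodes = [i for i, e in enumerate(nodes) if is_from(e)]
print(old)  # ['', '', '2', '3', '4', '2', '3']
print(new)  # ['', '1', '2', '3', '1', '2', '3']
print(changed == from_nodes)  # True
```

In this example the only elements whose number differs between the two forms are the two sec elements that match the "from" pattern.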

It's also the case that both level="single" and level="multiple" use the ancestor-or-self axis when searching for nodes that match the "from" pattern: that is, they both treat "from" as inclusive, whereas level="any" currently treats "from" as exclusive. So the change improves consistency.

So, after sleeping on it, I've convinced myself this is a good change. If the WG agrees, I will validate it further by implementing it and seeing whether it changes the result of any other test cases.
Comment 4 Michael Kay 2008-10-10 10:21:30 UTC
I have experimentally implemented the change in comment #3 to review the effect on the test suite.

It affects the result of three xsl:number tests: numb41, numb44, and numb45.

numb45 is the test case described in the bug report, and it now produces the numbering (1,2,3,4,1,2,3,4).

numb44 is a rather artificial test designed to exercise the use of current() in a predicate: <xsl:number from="*[name()=name(current())]/*" level="any"/>. The effect of the change is that a node that matches the "from" pattern (that is, a node that has the same name as its parent) is now numbered 1; previously it was numbered based on its distance from the previous node that had the same name as its parent.

numb41 tests numbering of attribute nodes: <xsl:number level="any" count="node() | / | @*" from="@*"/>. Previously, when this was applied to an attribute node, no preceding or ancestor node could match the "from" pattern (attribute nodes never appear on those axes), so no number was allocated (blank output). Now the attribute node itself matches the "from" pattern, so it is given the number 1.

So it seems that the change has no unwanted side-effects. The only difference arises for level="any" when there is a "from" pattern and the numbered node itself matches it: that node is now numbered 1.
Comment 5 Michael Kay 2009-01-30 11:54:00 UTC
Erratum E30 has been raised as agreed by the WG on 2009-01-29. The bug is therefore being marked fixed and closed.