Bug 20416 - [SER30] Sequence Normalization Queries
Summary: [SER30] Sequence Normalization Queries
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 3.0
Version: Candidate Recommendation
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Henry Zongaro
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Depends on:
Reported: 2012-12-17 17:48 UTC by Josh Spiegel
Modified: 2012-12-21 05:20 UTC
Description Josh Spiegel 2012-12-17 17:48:36 UTC
Section 2 says that when the item-separator serialization parameter is present, sequence normalization is equivalent to constructing a document node as follows:

  declare construction preserve;
  document {
    for tumbling window $w in $seq
      start when true()
      end $end when $end instance of node()
    return if (count($w) ge 2)
      then string-join($w[count($w)-1], $sep)
      else $w
  }

fn:string-join accepts xs:string*, so if the input sequence is (1, 2, 3) I would expect this query to raise a sequence type matching error. Sequence normalization itself does not raise this error.

Also, sequence normalization places the item-separator between each pair of adjacent items in the sequence.  I don't think the above windowing query achieves this.  For example, assume the item-separator is "x".  If the input sequence is (text { 1 }, text { 2 }, text { 3 }), I think the result of the query is a document with the single text node "123", but the desired result is "1x2x3".
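The example above can be traced with a small Python model. This is only a sketch, not XQuery: `Text` is a hypothetical stand-in for a text node, `tumbling_windows` imitates the start/end conditions of the window clause, and `merge_text_nodes` assumes that adjacent text nodes in document content merge into one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical stand-in for an XQuery text node."""
    value: str


def tumbling_windows(seq):
    """Windows start at the next unconsumed item and end at the first node."""
    windows, current = [], []
    for item in seq:
        current.append(item)
        if isinstance(item, Text):  # models: end $end when $end instance of node()
            windows.append(current)
            current = []
    if current:  # end condition never met: the window runs to the end of the sequence
        windows.append(current)
    return windows


def merge_text_nodes(nodes):
    """Adjacent text nodes in document content merge into one text node."""
    return "".join(node.value for node in nodes)


seq = [Text("1"), Text("2"), Text("3")]
windows = tumbling_windows(seq)

# Every window holds exactly one item, so the count($w) ge 2 branch is never
# taken and the separator is never emitted.
content = [item for w in windows for item in w]
actual = merge_text_nodes(content)

# What sequence normalization with item-separator "x" is meant to produce.
desired = "x".join(node.value for node in seq)

print(actual, desired)
```

Under these assumptions the model yields `123` for the windowing query and `1x2x3` for the intended behaviour.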

I think the following query would better model sequence normalization when item-separator is present:

  declare construction preserve;
  document {
    for $item at $pos in $seq
    let $node :=
      if ($item instance of node()) then
        $item
      else
        text { $item }
    return
      if ($pos eq 1) then
        $node
      else
        ($sep, $node)
  }

The let clause converts atomic values in the sequence to text nodes.  The purpose of this is to avoid the blanks that the document constructor would otherwise insert between adjacent atomic values.
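The effect of the text-node conversion can be seen in a small Python model. It is a sketch under stated assumptions, not XQuery: `Text` is a hypothetical stand-in for a text node, only atomic values and text nodes are modelled, `str()` stands in for casting an atomic value to a string, and `document_text` assumes the document constructor joins adjacent atomic values with a single space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical stand-in for an XQuery text node."""
    value: str


def document_text(content):
    """Model document construction for atomics and text nodes only:
    adjacent atomic values are joined with one space, text nodes are
    copied as they are, and the results merge into one string."""
    parts = []
    prev_atomic = False
    for item in content:
        if isinstance(item, Text):
            parts.append(item.value)
            prev_atomic = False
        else:
            if prev_atomic:
                parts.append(" ")
            parts.append(str(item))
            prev_atomic = True
    return "".join(parts)


def with_text_nodes(seq, sep):
    """Model of the proposed query: wrap atomics in text nodes and
    emit the separator before every item except the first."""
    content = []
    for pos, item in enumerate(seq, start=1):
        node = item if isinstance(item, Text) else Text(str(item))
        if pos > 1:
            content.append(sep)
        content.append(node)
    return content


def without_text_nodes(seq, sep):
    """Same loop, but atomics are left as atomics (no let clause)."""
    content = []
    for pos, item in enumerate(seq, start=1):
        if pos > 1:
            content.append(sep)
        content.append(item)
    return content


wrapped = document_text(with_text_nodes([1, 2, 3], "x"))
unwrapped = document_text(without_text_nodes([1, 2, 3], "x"))
print(repr(wrapped), repr(unwrapped))
```

In this model the wrapped form gives `1x2x3`, while leaving the values atomic makes each one adjacent to the atomic separator and gives `1 x 2 x 3`.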


A second minor problem is with the query used when the item-separator is absent:

  declare construction preserve;
  document {
    for $s in $seq return
      if ($s instance of document-node())
      then $s/child::node()
      else $s
  }

Document construction already replaces each document node in its content with that node's children, so removing the document nodes manually is unnecessary.  I propose the following query instead:

  declare construction preserve;
  document { $seq }
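The equivalence of the two queries can be sketched in Python. This is a model, not XQuery: `Doc` is a hypothetical stand-in for a document node, plain strings stand in for other nodes, and `document_content` assumes the constructor replaces a document node in its content with that node's children.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document node; children holds its child nodes."""
    children: tuple


def document_content(seq):
    """Model of the document constructor: a document node in the content
    is replaced by its children; every other item is kept."""
    content = []
    for item in seq:
        if isinstance(item, Doc):
            content.extend(item.children)
        else:
            content.append(item)
    return content


def unwrap_first(seq):
    """Model of the existing query's for/if expression, which unwraps by hand."""
    result = []
    for s in seq:
        result.extend(s.children if isinstance(s, Doc) else [s])
    return result


# Strings are placeholders for element nodes.
seq = ["<a/>", Doc(children=("<b/>", "<c/>")), "<d/>"]

existing = document_content(unwrap_first(seq))  # document { for $s in $seq ... }
proposed = document_content(seq)                # document { $seq }
print(existing == proposed)
```

Both forms produce the same content in this model, because the manual unwrapping leaves nothing for the constructor's own rule to do.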
Comment 1 Henry Zongaro 2012-12-17 19:23:58 UTC
Thank you for pointing out those problems with the query I used to describe the behaviour of the item-separator serialization parameter.  I think I was so focused on ensuring adjacent atomic values would be separated by the item separator, not spaces, that I failed to ensure that the item separator would be present between each pair of adjacent items.

I think your proposed query is correct, but I'd like to take a closer look to ensure that it is.

Regarding your last point that "Document construction removes document nodes so manually removing the document nodes is unnecessary," I agree.  Looking at the history of that query in Serialization, it was introduced at a time when XQuery made it an error if the content of a document node constructor contained a document node.  After that restriction was removed, I never noticed the query in this section could be simplified.

Speaking for myself, not on behalf of the working groups.
Comment 2 Henry Zongaro 2012-12-21 05:20:22 UTC
At the joint XSLT/XQuery teleconference of 18 December 2012,[1] the working groups decided to make the changes suggested by Josh.

I also replaced the XSLT fragment that describes how the item-separator serialization parameter is treated with the following, as the original had the same problem as the XQuery expression:

  <xsl:for-each select="$seq">
    <xsl:sequence select="if (position() gt 1) then $sep else ()"/>
    <xsl:choose>
      <xsl:when test=". instance of node()">
        <xsl:sequence select="."/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>

Finally, Ghislain pointed out that the second step of sequence normalization indicates that if an item in the sequence is not atomic, it will be a node.  That was true in Serialization 1.0, but is no longer true now that function items have been added.  Similarly, the final sentence of the paragraph indicates error SENR0001 is reported if the sequence contains namespace or attribute nodes.  That should also mention functions.  I will update both accordingly.
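The updated rule can be sketched as a simple check in Python. This is an illustration under assumptions: items are represented only by a kind string, `SerializationError` and `check_items` are hypothetical names, and the error code is written as `SENR0001`, the spelling used in the Serialization specification's error list.

```python
class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""


# Item kinds that sequence normalization cannot place in the result document.
REJECTED_KINDS = {"attribute", "namespace", "function"}


def check_items(kinds):
    """Raise SENR0001 for the first rejected item kind;
    otherwise return the kinds unchanged as a list."""
    for kind in kinds:
        if kind in REJECTED_KINDS:
            raise SerializationError(f"SENR0001: cannot normalize a {kind} item")
    return list(kinds)


print(check_items(["atomic", "element", "text"]))
```

A sequence of atomic values and ordinary nodes passes the check, while a sequence containing an attribute node, a namespace node, or a function raises the error.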

These changes will be reflected in the next working draft.

Josh, as you were present when this decision was made, I will assume you are in agreement with these changes, and so mark the bug as CLOSED.

[1] https://lists.w3.org/Archives/Member/w3c-xsl-query/2012Dec/0054.html (Member-only link)