This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29692 - xsl:strip-space and packages
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Candidate Recommendation
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-06-11 08:24 UTC by Michael Kay
Modified: 2016-10-06 18:42 UTC

Description Michael Kay 2016-06-11 08:24:21 UTC
We say in 3.5.5:

If xsl:strip-space or xsl:preserve-space declarations appear within a library package, they only affect calls to the doc or document functions appearing within that package. Such a declaration within the top-level package additionally affects stripping of whitespace in the document that contains the global context item.

(a) No mention here of xsl:stream or of fn:collection.

(b) No mention here of the initial match selection.

(c) No mention here of the fact (which I thought we had discussed and agreed) that this rule has a consequence: if doc(X) is called in two different packages, with different xsl:strip-space declarations, then you necessarily get two different document nodes back (and this, at least on the surface, conflicts with what F+O says about determinism).
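
Point (c) can be pictured with a toy model. The sketch below is not how any XSLT processor works; it assumes a hypothetical `Package` class that owns its own set of strip-space element names and its own URI-to-document mapping, and a made-up `RESOURCES` table standing in for the documents behind the URIs. Under those assumptions, two packages with different declarations necessarily hold different document nodes for the same URI, while repeated calls within one package still return the same node.

```python
from xml.dom import minidom

# Hypothetical stand-in for whatever lies behind the URIs.
RESOURCES = {"http://example.com/x.xml": "<a> <b> </b> </a>"}


def strip_whitespace(node, strip_names):
    """Remove whitespace-only text children of elements named in strip_names."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if (node.nodeType == node.ELEMENT_NODE
                    and node.tagName in strip_names
                    and child.data.strip(" \t\r\n") == ""):
                node.removeChild(child)
        else:
            strip_whitespace(child, strip_names)


class Package:
    """Toy package: its own strip-space set and its own URI -> document mapping."""

    def __init__(self, strip_names):
        self.strip_names = set(strip_names)
        self.documents = {}

    def doc(self, uri):
        # Parse and strip on first use, then keep returning the same node.
        if uri not in self.documents:
            tree = minidom.parseString(RESOURCES[uri])
            strip_whitespace(tree, self.strip_names)
            self.documents[uri] = tree
        return self.documents[uri]


uri = "http://example.com/x.xml"
pkg_a = Package(strip_names={"a"})  # roughly: <xsl:strip-space elements="a"/>
pkg_b = Package(strip_names=set())  # no stripping declared

doc_a = pkg_a.doc(uri)
doc_b = pkg_b.doc(uri)
```

In this model `doc_a` and `doc_b` are distinct objects with different content (the `a` element has one child in `doc_a` and three in `doc_b`), which is the surface conflict with the F+O determinism wording that the description refers to.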
Comment 1 Michael Kay 2016-06-11 18:39:23 UTC
Concerning (c), we allude to this rule in 4.4 (Stripping Type Annotations):

The source trees to which this applies are the same as those affected by xsl:strip-space and xsl:preserve-space: see 4.5 Stripping Whitespace from a Source Tree. As with whitespace stripping, the rules for stripping of type annotations may vary from one package to another, and have the effect of modifying the mapping from URIs to document nodes defined in the XPath dynamic context; this means that two calls to the doc function (for example) supplying the same URI may produce different document nodes if the calls appear in different packages.

Also add (d): No mention of how dynamic calls on doc(), document(), and collection() are handled.
Comment 2 Michael Kay 2016-06-12 07:50:48 UTC
An edge case (which also applies to XSLT 2.0): what happens if the global context item, or some other item subjected to whitespace stripping, is itself a whitespace text node that gets stripped away?

In fact generally we don't say anything much about stripping of input nodes other than document nodes. For example, if collection() returns a sequence of sibling elements, will they still be siblings after whitespace stripping is applied?
Comment 3 Abel Braaksma 2016-06-12 11:39:52 UTC
(In reply to Michael Kay from comment #0)
>  (which I thought we had discussed and agreed) 
In Bug 23326 we say we resolved this by saying that xsl:strip-space and type annotations are applicable to xsl:stream as well, and only apply to the relevant package. I did not find a change log entry in the current draft; maybe it was overlooked and never applied.

In Bug 22663 we concluded (on differences caused by separate compilation, and the effect of the use of doc() in the static context):

Bug 22663 comment #2
"It was noted that we could require the calls on doc() in different static contexts to return results that effectively mean the document must only be read once and then have space stripping applied (ie. disallowing reading a version of the document that has changed in the interim)"

Then, on the dynamic context we said:

Bug 22663 comment #3
"We should say that the dynamic context for the doc() function depends on which package the call is contained in, and the mapping from URIs to document nodes is therefore different for different packages. We could say that these mappings are permitted to vary ONLY by applying the strip-space and strip-annotations as modifications to some base mapping which must be common across packages; but it's simpler to say nothing, which essentially means that implementations have freedom to allow the mapping in different packages to be completely independent (one resolver/catalog per package) or not, as they see fit."

Effectively: the intent was to leave it to the implementation by saying (almost) nothing.

(In reply to Michael Kay from comment #2)
> In fact generally we don't say anything much about stripping of input 
> nodes other than document nodes

I didn't find any earlier discussion on anything other than document nodes. But we do say in the spec that it applies to any node (but notice the ambiguity in the sentence):

Section 4.5:
"For the purposes of this section, the term source tree means the document containing the global context item if it is a node, any documents containing nodes present in the initial match selection, any document returned by the functions document, doc, or collection, and any document read using xsl:stream"

I read this as: the global context item (which can only be accessed in the principal package) may be something other than a document. The initial match selection *must* be a document for it to apply.

Furthermore, the text on xsl:strip-space suggests it only applies to elements, so a document with a whitespace node child is not affected.

This quote above also suggests that they do apply to all of doc, collection, document and xsl:stream.

--------

Bottom line: I think all the rules are there, albeit hard to disentangle. We could be more explicit about it. We could also allow stripping of non-document nodes. Or explicitly state it is disallowed, for clarity reasons.
Comment 4 Abel Braaksma 2016-06-12 11:48:46 UTC
On (a), (b) and (c), I think we say it all in this paragraph under 4.5:

"Formally, the stripping process modifies the mapping from URIs to document nodes defined in the XPath dynamic context. This mapping can therefore vary from one package to another. The mapping that applies to a particular call on document, doc, or collection, or a particular evaluation of xsl:stream, is affected by the xsl:strip-space and xsl:preserve-space declarations within the package in which that construct appears. This means that two calls on the doc function (for example) may return different nodes if the calls appear in different packages."

Here we mention that
1) the mapping is per package
2) it applies to doc, document, collection, xsl:stream
3) that you may get back different nodes from the same URI in different pkgs

This leaves (d) (dynamic calls to fn:doc etc). But I think that follows from this definition: it depends on the package in which the dynamic call takes place (if we have such a notion).

And the edge case (text node only), we say in 4.5:

"The stripping process takes as input a set of element names whose child whitespace text nodes are to be preserved."

Note an editorial typo here: last words should be "are to be stripped".
Comment 5 Michael Kay 2016-06-24 14:03:41 UTC
I propose the following changes:

1. Merge sections 4.4 and 4.5 into a single section (4.4 Preprocessing Source Documents) with the current sections as subsections, and put shared material in the introduction to this new section. Specifically:

<quote>
Source documents supplied as input to a transformation may be subject to preprocessing. Two kinds of preprocessing are defined: stripping of type annotations (see 4.4.1), and stripping of whitespace text nodes (see 4.4.2).

Stripping of type annotations happens before stripping of whitespace text nodes.

The source documents to which this applies are as follows:

* The document containing the global context item if it is a node

* Any documents containing nodes present in the initial match selection

* Any document containing a node that is returned by the functions document, doc, or collection

* Any document read using xsl:stream. 

Note: this list excludes documents passed as the values of stylesheet parameters or parameters of the initial template or function, trees created by functions such as parse-xml, parse-xml-fragment, analyze-string, or json-to-xml, and values returned from extension functions.

If a node other than a document node is supplied (for example as the global context item), then the preprocessing is applied to the entire document containing that node. If several nodes within the same document are supplied (for example as nodes in the initial match selection, or as nodes returned by the collection function), then the preprocessing is only applied to that document once.

The rules determining whether or not stripping of annotations and/or whitespace happens are defined at the level of a package. Declarations within a library package only affect the handling of documents loaded using a call on the document, doc, or collection functions or an evaluation of an xsl:stream instruction appearing lexically within the same package. Declarations within the top-level package also affect the processing of the global context item and the initial match selection.

The semantics of the doc, document, and collection functions are formally defined in terms of mappings from URIs to document nodes maintained within the dynamic context. The effect of the declarations that control stripping of type annotations and whitespace is therefore to modify this mapping (so it now maps the URI to a stripped document). The modification applies to the dynamic context for calls to these functions appearing within a particular package; each package therefore has a different set of mappings. This means that when two calls to the doc function appear in different packages, specifying the same absolute URI, then in general different documents are returned. An implementation MAY return the same document if it is able to determine that the effect of the annotation and whitespace stripping rules in both packages is the same.

The effect of dynamic calls to the doc, document, and collection functions is defined in the same way as for other functions with dependencies on the dynamic context. As described in 5.3.4, named function references (such as doc#1) and calls on function-lookup (for example, function-lookup("doc", 1)) are defined to retain the XPath static and dynamic context at the point of invocation as part of the closure of the resulting function item, and to use this preserved context when a dynamic function call is subsequently made using the function item.

</quote>
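
The closure rule for dynamic calls in the proposed text can be sketched in the same toy style. This is an illustration under stated assumptions, not an implementation: `Package`, `doc_ref`, `call`, and the `RESOURCES` table are all hypothetical names, with `doc_ref` playing the role of evaluating `doc#1` inside a package and `call` playing the role of a dynamic function call made from code in some other package.

```python
from xml.dom import minidom

# Hypothetical stand-in for whatever lies behind the URIs.
RESOURCES = {"http://example.com/x.xml": "<a> <b/> </a>"}


def load(uri, strip_names):
    """Parse the resource and drop whitespace-only text children of stripped elements."""
    tree = minidom.parseString(RESOURCES[uri])
    for elem in tree.getElementsByTagName("*"):
        if elem.tagName in strip_names:
            for child in list(elem.childNodes):
                if (child.nodeType == child.TEXT_NODE
                        and child.data.strip(" \t\r\n") == ""):
                    elem.removeChild(child)
    return tree


class Package:
    """Toy package with its own strip-space set and URI -> document mapping."""

    def __init__(self, name, strip_names):
        self.name = name
        self.strip_names = set(strip_names)
        self.documents = {}

    def doc(self, uri):
        if uri not in self.documents:
            self.documents[uri] = load(uri, self.strip_names)
        return self.documents[uri]

    def doc_ref(self):
        """Analogue of evaluating doc#1 in this package: the bound method
        captures this package's mapping as part of the function item."""
        return self.doc

    def call(self, func, uri):
        """Analogue of a dynamic function call made from code in this package."""
        return func(uri)


uri = "http://example.com/x.xml"
lib = Package("lib", strip_names={"a"})
main = Package("main", strip_names=set())

f = lib.doc_ref()            # function item created in the library package
via_ref = main.call(f, uri)  # invoked dynamically from the top-level package
```

Here `via_ref` is the library package's stripped document, not the one `main.doc(uri)` would return: the package that matters is the one where the function item was created, not the one where it is eventually called.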

2. In 4.4, delete this paragraph:

<quote>
The source trees to which this applies are the same as those affected by xsl:strip-space and xsl:preserve-space: see 4.5 Stripping Whitespace from a Source Tree. As with whitespace stripping, the rules for stripping of type annotations may vary from one package to another, and have the effect of modifying the mapping from URIs to document nodes defined in the XPath dynamic context; this means that two calls to the doc function (for example) supplying the same URI may produce different document nodes if the calls appear in different packages.
</quote>

3. In 4.5, delete the following paragraphs:

<quote>
For the purposes of this section, the term source tree means the document containing the global context item if it is a node, any documents containing nodes present in the initial match selection, any document returned by the functions document, doc, or collection, and any document read using xsl:stream. It does not include documents passed as the values of stylesheet parameters or parameters of the initial template or function, trees created by functions such as parse-xml, parse-xml-fragment, analyze-string, or json-to-xml, nor values returned from extension functions.

Each source tree is associated with a package: the relevant package for the global context item is the top-level package; the relevant package for a call on document, doc, or collection is the package in which that call appears; and the relevant package for evaluation of xsl:stream is the package in which that instruction appears.
</quote>

Change the following paragraph:
<quote>
Formally, the stripping process modifies the mapping from URIs to document nodes defined in the XPath dynamic context. This mapping can therefore vary from one package to another. The mapping that applies to a particular call on document, doc, or collection, or a particular evaluation of xsl:stream, is affected by the xsl:strip-space and xsl:preserve-space declarations within the package in which that construct appears. This means that two calls on the doc function (for example) may return different nodes if the calls appear in different packages.
</quote>

to
<quote>
The stripping process that applies for a particular package is determined by the xsl:strip-space and xsl:preserve-space declarations within that package.
</quote>

After 
<quote>
The xml:space attributes are not removed from the tree.
</quote>

add:
<quote>
If the stripping process strips a whitespace text node that is present in the sequence provided as the initial match selection, or in the sequence that forms the result of the collection function, then the relevant node is removed from this sequence.
</quote>
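
The sentence proposed above, as worded in this comment, can be pictured as follows. This is a rough sketch only: `strip_document` and `preprocess_selection` are hypothetical helpers, and the input sequence is simply the children of a small made-up document standing in for an initial match selection or a collection() result. Each containing document is stripped once, and any node in the sequence that was stripped away is then dropped from the sequence.

```python
from xml.dom import minidom


def strip_document(tree, strip_names):
    """Strip whitespace-only text children of the named elements.
    Returns the list of text nodes that were removed."""
    removed = []
    for elem in tree.getElementsByTagName("*"):
        if elem.tagName in strip_names:
            for child in list(elem.childNodes):
                if (child.nodeType == child.TEXT_NODE
                        and child.data.strip(" \t\r\n") == ""):
                    elem.removeChild(child)
                    removed.append(child)
    return removed


def preprocess_selection(selection, strip_names):
    """Strip each containing document once, then drop stripped nodes
    from the supplied sequence."""
    seen_docs = []
    removed = []
    for node in selection:
        # A document node is its own containing document.
        if node.nodeType == node.DOCUMENT_NODE:
            owner = node
        else:
            owner = node.ownerDocument
        if not any(owner is d for d in seen_docs):
            seen_docs.append(owner)
            removed.extend(strip_document(owner, strip_names))
    return [n for n in selection if not any(n is r for r in removed)]


tree = minidom.parseString("<a> <b/> <c/> </a>")
# Sequence of five siblings: whitespace, b, whitespace, c, whitespace.
selection = list(tree.documentElement.childNodes)
result = preprocess_selection(selection, strip_names={"a"})
```

With these inputs `result` holds only the `b` and `c` elements, which are still siblings in the stripped tree (and are now adjacent), so the model also gives one answer to the earlier question about sibling elements returned by collection().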

Delete the following paragraph:
<quote>
The effect of xsl:strip-space and xsl:preserve-space is local to the package in which they appear. Declarations within a library package only affect the handling of documents loaded using a call on the document, doc, or collection functions or an evaluation of an xsl:stream instruction appearing lexically within the same package. Declarations within the top-level package also affect the processing of the main input document.
</quote>
Comment 6 Michael Kay 2016-06-27 08:00:25 UTC
These changes have been applied to the spec; the bug is left open pending WG approval of the text.
Comment 7 Michael Kay 2016-07-20 19:16:39 UTC
The minutes from July 7 record

ACTION 2016-07-07-001: Bug 29692: Mike Kay to fix the sentence about
stripping whitespace text nodes present in the result of the
collection() function, since earlier we modelled this by saying that
the collection function didn't return these nodes in the first place.
Also include that it also applies to the global context item.

Although the minutes aren't entirely clear on the point, my recollection is that we approved the changes subject to this amendment.

The changes have been applied so the bug is now being marked as resolved.