This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 14917 - Normalization of line endings in XPath
Summary: Normalization of line endings in XPath
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 3.0
Version: Recommendation
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2011-11-23 21:32 UTC by Michael Kay
Modified: 2013-06-19 08:44 UTC

Description Michael Kay 2011-11-23 21:32:07 UTC
XPath 2.0 (and also 3.0) contains a statement (in A.2.3 line endings) that line endings in XPath expressions should be normalized according to the XML rules.

This is highly undesirable when XPath is embedded in XML (as for example in XSLT and XSD), because the line ending normalization has already been done by the XML processor. This means that it is impossible to represent a character such as x0D: writing it as &#xD; will prevent the XML parser normalizing it away, but this doesn't stop the XPath processor normalizing the resulting x0D character; and since character references are not recognized in XPath, there is no other way of expressing this character.
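
To make the double normalization concrete, here is a minimal Python sketch. It is an illustration only: the element name value-of is a stand-in for an XSLT instruction, normalize_xml10_line_endings is a hypothetical helper that models what an XPath processor following A.2.3 would do, and Python's xml.etree.ElementTree stands in for the XML processor.

```python
import re
import xml.etree.ElementTree as ET


def normalize_xml10_line_endings(text: str) -> str:
    """Hypothetical helper: XML 1.0 rule, CR LF and any lone CR become one LF."""
    return re.sub(r"\r\n?", "\n", text)


# Illustrative stylesheet fragment: the XPath string literal is meant to
# contain the character x0D, written as a character reference.
stylesheet = '<value-of select="\'a&#xD;b\'"/>'

# Stage 1: the XML parser expands the character reference, so the
# attribute value handed to the XPath processor still contains x0D.
xpath_source = ET.fromstring(stylesheet).get("select")

# Stage 2: an XPath processor that applies A.2.3 normalizes again,
# and the x0D is replaced by x0A before the expression is parsed.
normalized = normalize_xml10_line_endings(xpath_source)

print(repr(xpath_source))
print(repr(normalized))
```

The first value still contains the carriage return; the second does not, which is the loss described above.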

Furthermore, this is an undocumented incompatibility with XSLT 1.0. I suspect the only reason this has gone unnoticed is that the major XSLT 2.0 processors have not implemented this clause in the specification.

The statement first appeared in the draft of 15 September 2005, as a response to bug #1307. This bug was raised against XQuery only, but the resolution that was accepted changed both XPath and XQuery. (The decision is recorded in the minutes of meeting 266, July 2005.)

In the 7 July 2005 XQuery draft the change log states: "Removed from the main part of the document any references to line-ending normalization. Affects 3.1.1 (Literals) and 3.7.1.3 (Dir. Elem. Constructor--Content, Rule 1a). Responds to Bug 1307. Line ending normalization will be applied globally by the parser rather than by individual expressions."

The effect of this decision was to change line-ending normalization from being XQuery-only to something that applied to XPath as well.

There are a number of tests in the XSLT 2.0 conformance test suite that clearly assume writing a character reference such as &#xD; will prevent end-of-line normalization. Examples are satest-035 and unicode-001.
Comment 1 Jonathan Robie 2011-12-06 17:47:50 UTC
The Working Group agrees. Line-ending normalization will be done only in XQuery; in XPath, the host language defines whether it is done.
Comment 2 Jonathan Robie 2011-12-10 19:18:26 UTC
(In reply to comment #1)
> The Working Group agrees. Line-ending normalization will be done only in
> XQuery; in XPath, the host language defines whether it is done.

Here are the relevant paragraphs in XQuery and XPath.

XQuery:

<quote diff="unchanged">
The XQuery 3.0 processor must behave as if it normalized all line breaks on input, before parsing. The normalization should be done according to the choice to support either [XML 1.0] or [XML 1.1] lexical processing.
</quote>

XPath:

<quote diff="add">
The host language must specify whether the XPath 3.0 processor normalizes all line breaks on input, before parsing, using the rules of XML 1.0 or 1.1.
</quote>

I kept the sections that describe the rules of XML 1.0 and 1.1 in XPath as well as in XQuery.
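
For readers comparing the two rule sets, the following Python sketch shows how the XML 1.0 and XML 1.1 end-of-line rules differ. The function names are hypothetical and the code is only an approximation of the behaviour described in the XML Recommendations, not text from either specification.

```python
import re


def normalize_xml10(text: str) -> str:
    """XML 1.0: the sequence CR LF and any lone CR become a single LF."""
    return re.sub(r"\r\n?", "\n", text)


def normalize_xml11(text: str) -> str:
    """XML 1.1: CR LF, CR NEL, lone NEL (x85), LS (x2028) and lone CR become LF."""
    # Two-character sequences come first so they collapse to one LF.
    return re.sub("\r\n|\r\x85|\x85|\u2028|\r", "\n", text)


sample = "a\r\nb\rc\x85d\u2028e"

print(repr(normalize_xml10(sample)))  # NEL and LS are left untouched
print(repr(normalize_xml11(sample)))  # every line break becomes LF
```

The only difference is the treatment of x85 and x2028, which XML 1.0 does not regard as line breaks.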
Comment 3 Michael Kay 2012-07-09 08:17:43 UTC
I think the wording adopted does not adequately reflect the WG decision - it is ambiguous and easily misread to mean that normalization of line endings is always performed, but the host language can choose between the XML 1.0 and XML 1.1 rules.

I suggest instead:

<quote diff="chg">
The host language must specify whether or not the XPath 3.0 processor normalizes all
line breaks before parsing, and if it does so, whether it uses the rules of XML 1.0 or 1.1.
</quote>
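
The reading intended by this wording is a three-way choice rather than a two-way one. A rough Python sketch of that choice follows; LineEndingPolicy and prepare_xpath_source are hypothetical names used for illustration and do not appear in any specification or processor API.

```python
import re
from enum import Enum


class LineEndingPolicy(Enum):
    """Hypothetical host-language setting for XPath line-ending handling."""
    NONE = "none"        # host language has already normalized (e.g. XSLT, XSD)
    XML_1_0 = "xml-1.0"  # normalize using the XML 1.0 rules
    XML_1_1 = "xml-1.1"  # normalize using the XML 1.1 rules


def prepare_xpath_source(text: str, policy: LineEndingPolicy) -> str:
    """Apply the host language's declared policy before the expression is parsed."""
    if policy is LineEndingPolicy.NONE:
        return text
    if policy is LineEndingPolicy.XML_1_0:
        return re.sub(r"\r\n?", "\n", text)
    return re.sub("\r\n|\r\x85|\x85|\u2028|\r", "\n", text)


# An XPath string literal containing x0D, as delivered by an XML parser.
expr = "'a\rb'"
print(repr(prepare_xpath_source(expr, LineEndingPolicy.NONE)))
print(repr(prepare_xpath_source(expr, LineEndingPolicy.XML_1_0)))
```

Under the ambiguous wording a reader could conclude that the NONE option does not exist; the proposed text makes it explicit.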
Comment 4 Michael Kay 2013-03-05 15:47:00 UTC
This bug report was reopened in July 2012 but it appears that because it was classified as a 2.0 issue, the ambiguity in the wording that was added following the original resolution has gone uncorrected and has found its way into the Candidate Recommendation. I'm therefore reclassifying the bug against 3.0 in the hope that this raises its profile.
Comment 5 Jonathan Robie 2013-03-12 16:27:01 UTC
For reference, here is the 3.0 wording - Appendix D lists this as an implementation-defined item:

Which version of XML and XML Names (e.g. [XML 1.0] and [XML Names] or [XML 1.1] and [XML Names 1.1]) and which version of XML Schema (e.g. [XML Schema 1.0] or [XML Schema 1.1]) is used for the definitions of primitives such as characters and names, and for the definitions of operations such as normalization of line endings and normalization of whitespace in attribute values. It is recommended that the latest applicable version be used (even if it is published later than this specification).
Comment 6 Jonathan Robie 2013-03-12 16:30:53 UTC
Actually, this refers to A.2.3 End-of-Line Handling (http://www.w3.org/TR/xpath-30/#id-eol-handling).

The Working Group has agreed to adopt Comment #3.


(In reply to comment #3)
> I think the wording adopted does not adequately reflect the WG decision - it
> is ambiguous and easily misread to mean that normalization of line endings
> is always performed, but the host language can choose between the XML 1.0
> and XML 1.1 rules.
> 
> I suggest instead:
> 
> <quote diff="chg">
> The host language must specify whether or not the XPath 3.0 processor
> normalizes all
> line breaks before parsing, and if it does so, whether it uses the rules of
> XML 1.0 or 1.1.
> </quote>