This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 21574 - [XP3.0] C0 control characters in XPath expressions
Summary: [XP3.0] C0 control characters in XPath expressions
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 3.0
Version: Candidate Recommendation
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2013-04-03 20:08 UTC by Michael Kay
Modified: 2013-05-07 16:20 UTC

Description Michael Kay 2013-04-03 20:08:38 UTC
J2.1 item 2 states "Adopted the XML restriction that control characters #x1 to #x1F and 0x7F to 0x9F cannot appear in unescaped form in an XQuery. Resolves Bug 14921." This statement is inappropriate in the XPath specification, and raises the question of what the correct statement should be.

Bug 14921, which led to this XQuery change, had nothing to say about XPath.

Section 2 Basics says simply: "The basic building block of XPath 3.0 is the expression, which is a string of [Unicode] characters".

A1.2, constraint xml-version, says: "XML 1.0 and XML 1.1 differ in their handling of C0 control characters (specifically #x1 through #x1F, excluding #x9, #xA, and #xD) and C1 control characters (#x7F through #x9F). In XML 1.0, these C0 characters are prohibited, and the C1 characters are permitted. In XML 1.1, both sets of control characters are permitted, but only if written as character references. It is RECOMMENDED that implementations should follow the XML 1.1 rules in this respect; however, for backwards compatibility with XPath 2.0, implementations MAY allow C1 control characters to be used directly."
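
To make the ranges in the quoted constraint concrete, here is a minimal Python sketch that locates C0 and C1 control characters in an expression string. The function names are illustrative and not taken from any specification or product; the ranges are exactly those given in the quotation above.

```python
from typing import List, Optional, Tuple

# C0 characters that the quoted constraint excludes from the restricted set.
C0_EXCLUDED = {0x9, 0xA, 0xD}


def classify_control(ch: str) -> Optional[str]:
    """Return "C0" or "C1" for a control character, else None.

    Ranges follow the quoted xml-version constraint:
    C0 = #x1..#x1F excluding #x9, #xA, #xD; C1 = #x7F..#x9F.
    """
    cp = ord(ch)
    if 0x1 <= cp <= 0x1F and cp not in C0_EXCLUDED:
        return "C0"
    if 0x7F <= cp <= 0x9F:
        return "C1"
    return None


def find_controls(expr: str) -> List[Tuple[int, str, str]]:
    """List (offset, code point, class) for each control character in expr."""
    found = []
    for offset, ch in enumerate(expr):
        kind = classify_control(ch)
        if kind is not None:
            found.append((offset, f"#x{ord(ch):X}", kind))
    return found
```

For example, `find_controls("a\x01b\x85\tc")` reports the #x1 and #x85 characters and skips the tab, which the constraint excludes from the C0 set.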

This recommendation doesn't make sense for XPath, because character references don't exist in XPath and therefore C0 and C1 characters cannot be written as character references.

In practice XPath is often embedded in XML, and characters written as character references will therefore be ordinary Unicode characters by the time the XPath parser sees them. Therefore I propose that XPath expressions should allow any Unicode character legal in XML, subject only to constraints imposed by the host language.
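
The embedding point can be illustrated with a short Python sketch, assuming an XML 1.0 host document read with the standard library's `xml.etree.ElementTree`. The `rule` element, its `select` attribute, and the helper names are hypothetical; `is_xml10_char` encodes the Char production of XML 1.0 as one possible reading of "any Unicode character legal in XML".

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Hypothetical host document: the C1 character #x85 is written as a
# character reference inside an attribute that holds an XPath expression.
HOST = """<rule select="contains(., '&#x85;')"/>"""


def xpath_from_host(xml_text: str) -> str:
    """Return the XPath string as the XPath parser would receive it."""
    return ET.fromstring(xml_text).attrib["select"]


expr = xpath_from_host(HOST)
# The XML parser has already expanded the reference, so the expression
# contains the raw character U+0085 rather than the text "&#x85;".
print(repr(expr))
print(all(is_xml10_char(ord(ch)) for ch in expr))
```

Here the XML parser has resolved the reference before any XPath processing starts, so the XPath parser has no way to tell whether the character was originally written directly or as a reference.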
Comment 1 Jonathan Robie 2013-05-07 16:17:18 UTC
I agree.
Comment 2 Jonathan Robie 2013-05-07 16:20:23 UTC
The Working Group has agreed to this change.