[Bug 21574] New: [XP3.0] C0 control characters in XPath expressions

https://www.w3.org/Bugs/Public/show_bug.cgi?id=21574

            Bug ID: 21574
           Summary: [XP3.0] C0 control characters in XPath expressions
    Classification: Unclassified
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XPath 3.0
          Assignee: jonathan.robie@gmail.com
          Reporter: mike@saxonica.com
        QA Contact: public-qt-comments@w3.org

J2.1 item 2 states "Adopted the XML restriction that control characters #x1 to
#x1F and 0x7F to 0x9F cannot appear in unescaped form in an XQuery. Resolves
Bug 14921." This statement refers to XQuery, so it is out of place in the XPath
specification, and it raises the question of what the correct statement for
XPath should be.

Bug 14921, which led to this XQuery change, had nothing to say about XPath.

Section 2 Basics says simply: "The basic building block of XPath 3.0 is the
expression, which is a string of [Unicode] characters".

A1.2, constraint xml-version, says: "XML 1.0 and XML 1.1 differ in their
handling of C0 control characters (specifically #x1 through #x1F, excluding
#x9, #xA, and #xD) and C1 control characters (#x7F through #x9F). In XML 1.0,
these C0 characters are prohibited, and the C1 characters are permitted. In XML
1.1, both sets of control characters are permitted, but only if written as
character references. It is RECOMMENDED that implementations should follow the
XML 1.1 rules in this respect; however, for backwards compatibility with XPath
2.0, implementations MAY allow C1 control characters to be used directly."
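For reference, the rules summarized in the quoted constraint can be sketched in
Python roughly as follows. This is only an illustration of the quoted text: the
function names are hypothetical, the ranges are taken from the quotation rather
than from the full Char productions of XML 1.0 and XML 1.1, and characters
outside the two control ranges are not checked at all.

```python
def control_class(cp: int) -> str:
    """Classify a code point using the ranges given in the quoted constraint."""
    if 0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD):
        return "C0"
    if 0x7F <= cp <= 0x9F:
        return "C1"
    return "other"


def allowed_directly(cp: int, xml_version: str) -> bool:
    """Can this code point appear literally (not as a character reference)?

    Follows the quoted summary only; "other" code points are assumed to be
    acceptable, which is a simplification of the real XML grammars.
    """
    cls = control_class(cp)
    if cls == "other":
        return True
    if xml_version == "1.0":
        # C0 prohibited outright, C1 permitted.
        return cls == "C1"
    if xml_version == "1.1":
        # Both sets permitted, but only as character references.
        return False
    raise ValueError(f"unknown XML version: {xml_version!r}")
```

Under these assumptions, allowed_directly(0x85, "1.0") is true while
allowed_directly(0x85, "1.1") and allowed_directly(0x01, "1.0") are false.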

This recommendation doesn't make sense for XPath, because character references
don't exist in XPath and therefore C0 and C1 characters cannot be written as
character references.

In practice XPath is often embedded in XML, and any characters written there as
character references will already have been expanded to ordinary Unicode
characters by the time the XPath parser sees them. Therefore I propose that
XPath expressions should allow any Unicode character legal in XML, subject only
to constraints imposed by the host language.
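A small Python sketch may make the embedding point concrete. It assumes a
hypothetical host document with a "select" attribute, uses the standard
library's XML 1.0 parser, and checks the resulting expression text against the
XML 1.0 Char production; the element name, attribute name, and helper functions
are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical host document: an XPath expression held in an XML attribute.
# The C1 character U+0085 is written as a character reference.
doc = '<rule select="contains(., \'&#x85;\')"/>'

# The XML parser expands the reference, so the XPath parser would receive
# the raw character, not the text "&#x85;".
xpath_text = ET.fromstring(doc).get("select")


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_legal_xpath_text(expr: str) -> bool:
    """Sketch of the proposed rule: every character must be legal in XML."""
    return all(is_xml10_char(ord(ch)) for ch in expr)
```

Here xpath_text contains a literal U+0085 and passes is_legal_xpath_text, while
a string containing U+0001 does not.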

Received on Wednesday, 3 April 2013 20:08:43 UTC