This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 29962 - [XP31] Legal XML Unicode character
Summary: [XP31] Legal XML Unicode character
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-10-29 03:51 UTC by Abel Braaksma
Modified: 2016-11-03 14:05 UTC
CC List: 2 users

Description Abel Braaksma 2016-10-29 03:51:02 UTC
In Section A.1.2 (Extra-grammatical Constraints), the subsection on xml-version [1] ends with this sentence:

"XPath expressions allow any legal XML Unicode character, subject only to constraints imposed by the host language."

But we don't define the term "XML Unicode character" (it occurs only once in XP31), and the term does not exist in the XML specification either.

I would assume it means the Char production, but it could also mean all Unicode characters except U+0000 (NUL) and the surrogate code points (or something like that).

Note that the comment on the Char production in XML itself says "any Unicode character, excluding ...", but this comment is incomplete (the production itself also excludes most C0 control characters) and is therefore ambiguous [2].
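A small Python sketch may make the ambiguity concrete. It encodes the XML 1.0 (Fifth Edition) Char production, `#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`, next to a literal reading of its comment ("any Unicode character, excluding the surrogate blocks, FFFE, and FFFF"), and lists the code points on which the two disagree. The function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: the function names are hypothetical, not taken from any spec or library.

def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 (Fifth Edition) Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def matches_char_comment(cp: int) -> bool:
    """Literal reading of the production's comment: any Unicode code point
    except the surrogate block (D800-DFFF), FFFE and FFFF."""
    return (
        0 <= cp <= 0x10FFFF
        and not 0xD800 <= cp <= 0xDFFF
        and cp not in (0xFFFE, 0xFFFF)
    )


# Code points on which the comment and the production disagree.
gap = [cp for cp in range(0x110000) if matches_char_comment(cp) != is_xml10_char(cp)]

print(len(gap))  # 29
print(" ".join(f"U+{cp:04X}" for cp in gap))
```

Under these assumptions the disagreement is 29 code points, all C0 control characters (U+0000 to U+001F, minus tab, line feed and carriage return), which the comment appears to admit but the production rejects.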

If XPath is used in a host language like XSLT, it is naturally restricted by XML itself, but if it is used outside such a context, the limitation should be well-defined.

My suggestion would be to say:

"XPath expressions allow any legal Unicode character except 0000, FFFE, FFFF and surrogate blocks, subject only to constraints imposed by the host language."

This would make the character range of XPath expressions wider than the XML 1.0 character range, but many of the characters that XML 1.0 excludes can appear in XML 1.1 as character references. Escaping is out of scope for XPath itself anyway.
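To check the relationship with XML 1.1, the following Python sketch compares the proposed rule with the Char productions of XML 1.0 and XML 1.1, and with the XML 1.1 RestrictedChar production, `[#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]`. The function names are hypothetical, written only for illustration.

```python
# Illustrative sketch: function names are hypothetical. The first predicate encodes
# the rule proposed above; the others encode the XML 1.0 and XML 1.1 productions.

def is_proposed_xpath_char(cp: int) -> bool:
    """Proposed rule: any code point except U+0000, U+FFFE, U+FFFF and surrogates."""
    return (
        0 < cp <= 0x10FFFF
        and not 0xD800 <= cp <= 0xDFFF
        and cp not in (0xFFFE, 0xFFFF)
    )


def is_xml10_char(cp: int) -> bool:
    """XML 1.0 (Fifth Edition) Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char production."""
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def is_xml11_restricted_char(cp: int) -> bool:
    """XML 1.1 RestrictedChar production (allowed in a document only as character references)."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


code_points = range(0x110000)

# Does the proposed set coincide with the XML 1.1 Char production?
same_as_xml11 = all(is_proposed_xpath_char(cp) == is_xml11_char(cp) for cp in code_points)

# Characters the proposal admits that XML 1.0 does not.
beyond_xml10 = [cp for cp in code_points if is_proposed_xpath_char(cp) and not is_xml10_char(cp)]

print(same_as_xml11)  # True
print(len(beyond_xml10))  # 28
print(all(is_xml11_restricted_char(cp) for cp in beyond_xml10))  # True
```

If this sketch is right, the proposed wording admits exactly the XML 1.1 Char repertoire, and the 28 characters it adds beyond XML 1.0 are all XML 1.1 RestrictedChar characters, which may appear in an XML 1.1 document only as character references such as `&#x1;`.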

[1] https://www.w3.org/XML/Group/qtspecs/specifications/xquery-31/html/xpath-31-diff.html#parse-note-xml-version
[2] https://www.w3.org/TR/REC-xml/#charsets
Comment 1 Michael Kay 2016-10-29 08:48:43 UTC
The terms "legal" and "illegal" should never be used in a software specification, except when referring to actual law, e.g. copyright.
Comment 2 Abel Braaksma 2016-10-30 02:52:57 UTC
(In reply to Michael Kay from comment #1)
> The terms "legal" and "illegal" should never be used in a software
> specification, except when referring to actual law, e.g. copyright.
With that in mind, note that the term also occurs once in F&O, albeit in a non-normative place:

"In the fn:format-number function, some picture strings that previously were legal but had no defined meaning are now disallowed."

In XP31, the occurrence quoted in the description was the only one; XSLT has none (except in copyright links).
Comment 3 Josh Spiegel 2016-11-01 18:36:13 UTC
It seems the intent of this sentence is to permit the host language to impose further constraints with respect to these productions. Is this really necessary? Can we simply remove the sentence?