<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>29962</bug_id>
          
          <creation_ts>2016-10-29 03:51:02 +0000</creation_ts>
          <short_desc>[XP31] Legal XML Unicode character</short_desc>
          <delta_ts>2016-11-03 14:05:22 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>XPath 3.1</component>
          <version>Candidate Recommendation</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Abel Braaksma">abel.braaksma</reporter>
          <assigned_to name="Jonathan Robie">jonathan.robie</assigned_to>
          <cc>josh.spiegel</cc>
    
          <cc>mike</cc>
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>127976</commentid>
    <comment_count>0</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-10-29 03:51:02 +0000</bug_when>
    <thetext>In Section A.1.2 (Extra-grammatical Constraints), in the subsection on xml-version [1], we have the closing sentence:

&quot;XPath expressions allow any legal XML Unicode character, subject only to constraints imposed by the host language.&quot;

But we don&apos;t define &quot;XML Unicode character&quot; (the phrase occurs only once in XP31), and that term does not exist in the XML specification.

I would assume it means the Char production, but it could also mean all Unicode characters except NULL and the surrogate code points (or something like that).

Note that the comment on the Char production in XML itself says &quot;any Unicode character except ...&quot;, but this comment is incomplete (the production shows otherwise) and is therefore ambiguous [2].

If XPath is used in a host language like XSLT it is naturally restricted by XML itself, but if it is used outside such a context, the limitation should be well-defined.

My suggestion would be to say:

&quot;XPath expressions allow any legal Unicode character except 0000, FFFE, FFFF and surrogate blocks, subject only to constraints imposed by the host language.&quot;

This would make the character range of XPath expressions wider than the XML 1.0 character range, but many of the characters that XML 1.0 excludes can appear escaped (as character references) in XML 1.1. And escaping is out of scope for XPath itself anyway.
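
For illustration only, here is a rough sketch that compares the two readings as inclusive code point ranges. The XML 1.0 ranges are taken from the Char production in [2], the second list encodes the wording suggested above, and all names (XML10_CHAR, PROPOSED, allowed, only_in_proposed) are made up for this example:

```python
# Illustrative sketch only; every name here is hypothetical, not taken from any spec.

# XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR = [
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]

# Suggested wording: any Unicode code point except 0000, FFFE, FFFF
# and the surrogate blocks (D800-DFFF).
PROPOSED = [
    (0x1, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def allowed(code_point, ranges):
    """Return True if code_point falls inside one of the inclusive ranges."""
    return any(code_point in range(low, high + 1) for low, high in ranges)


def only_in_proposed(limit=0x110000):
    """List code points accepted by PROPOSED but rejected by XML10_CHAR."""
    return [
        cp
        for cp in range(limit)
        if allowed(cp, PROPOSED) and not allowed(cp, XML10_CHAR)
    ]
```

Under these assumptions, only_in_proposed() yields the C0 control characters other than tab, LF and CR, i.e. exactly the code points that the suggested wording would add on top of XML 1.0 Char.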

[1] https://www.w3.org/XML/Group/qtspecs/specifications/xquery-31/html/xpath-31-diff.html#parse-note-xml-version
[2] https://www.w3.org/TR/REC-xml/#charsets</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127977</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-10-29 08:48:43 +0000</bug_when>
    <thetext>The terms &quot;legal&quot; and &quot;illegal&quot; should never be used in a software specification, unless when referring to the actual law, e.g. copyright.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127979</commentid>
    <comment_count>2</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-10-30 02:52:57 +0000</bug_when>
    <thetext>(In reply to Michael Kay from comment #1)
&gt; The terms &quot;legal&quot; and &quot;illegal&quot; should never be used in a software
&gt; specification, unless when referring to the actual law, e.g. copyright.
With that in mind, the term occurs once in F&amp;O as well, albeit in a non-normative place:

&quot;In the fn:format-number function, some picture strings that previously were legal but had no defined meaning are now disallowed.&quot;

In XP31 the occurrence reported in this bug was the only one; in XSLT there was no mention (except for copyright links).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>128015</commentid>
    <comment_count>3</comment_count>
    <who name="Josh Spiegel">josh.spiegel</who>
    <bug_when>2016-11-01 18:36:13 +0000</bug_when>
    <thetext>
It seems the intent of this sentence is to permit the host language to impose further constraints with respect to these productions.  Is this really necessary? Can we simply remove the sentence?</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>