<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>29745</bug_id>
          
          <creation_ts>2016-07-20 13:04:06 +0000</creation_ts>
          <short_desc>[FO31] fn:parse-json edge cases</short_desc>
          <delta_ts>2016-07-26 11:23:39 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Functions and Operators 3.1</component>
          <version>Candidate Recommendation</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Tim Mills">tim</reporter>
          <assigned_to name="Michael Kay">mike</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>127001</commentid>
    <comment_count>0</comment_count>
    <who name="Tim Mills">tim</who>
    <bug_when>2016-07-20 13:04:06 +0000</bug_when>
    <thetext>How should the following be handled?

(1) parse-json(&apos;[&quot;\uD834&quot;]&apos;)
(2) parse-json(&apos;[&quot;\uD834&quot;]a&apos;)
(3) parse-json(&apos;[&quot;\udD1E&quot;]&apos;)
(4) parse-json(&apos;[&quot;a\udD1E&quot;]&apos;)
(5) parse-json(&apos;[&quot;\uD834\uD834\udD1E&quot;]&apos;)

I would guess that (1) and (3) invoke the fallback option, producing e.g. &amp;#xFFFD;.
But would (2) and (4) consume the two characters as a single badly encoded codepoint, or as two characters? That is, is the result &amp;#xFFFD;a or just &amp;#xFFFD;?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127040</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-07-22 15:21:20 +0000</bug_when>
    <thetext>The rules state:

The function is called when the JSON input contains a special character (as defined under the escape option) that is valid according to the JSON grammar, whether the special character is represented in the input directly or as an escape sequence. The function is called once for any surrogate that is not properly paired with another surrogate. The string supplied as the argument will always be a two- or six- character escape sequence, starting with a backslash, that conforms to the rules in the JSON grammar


This seems pretty clear to me. You process the input one nibble at a time, where a nibble is either a single character or an escape sequence introduced by &quot;\&quot;. If you hit a high surrogate that isn&apos;t immediately followed by a low surrogate, you emit U+FFFD and move on to the next nibble. If you hit a low surrogate that isn&apos;t immediately preceded by a high surrogate, you emit U+FFFD and move on to the next nibble.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127050</commentid>
    <comment_count>2</comment_count>
    <who name="Tim Mills">tim</who>
    <bug_when>2016-07-26 11:22:39 +0000</bug_when>
    <thetext>Agreed.

Thanks.</thetext>
  </long_desc>
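<!--
Editorial illustration, not part of the original thread. The resolution agreed
above (walk the string one unit at a time, where a unit is a character or a
backslash escape, and emit U+FFFD for any surrogate that is not properly paired)
can be sketched roughly as follows. This is a minimal sketch in Python, not the
behaviour of any particular XPath processor: the names decode_json_string_body,
REPLACEMENT, UNIT and SIMPLE are hypothetical, only the default fallback (U+FFFD)
is modelled, and the function takes the text between the JSON quotes rather than
a whole JSON document. Case (2) is read here as the string body \uD834a, which is
an assumption about what the reporter intended.

```python
import re

REPLACEMENT = "\uFFFD"  # assumed default fallback output for an unpaired surrogate

# One unit is a \uXXXX escape, a two-character escape, or a single character.
UNIT = re.compile(r'\\u[0-9A-Fa-f]{4}|\\.|.', re.DOTALL)

# The two-character escapes allowed by the JSON grammar.
SIMPLE = {'\\"': '"', '\\\\': '\\', '\\/': '/', '\\b': '\b',
          '\\f': '\f', '\\n': '\n', '\\r': '\r', '\\t': '\t'}


def is_high(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDBFF


def is_low(cp: int) -> bool:
    return 0xDC00 <= cp <= 0xDFFF


def decode_json_string_body(body: str) -> str:
    """Decode the text between the quotes of a JSON string, unit by unit."""
    units = UNIT.findall(body)
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if len(unit) == 6:  # a \uXXXX escape
            cp = int(unit[2:], 16)
            nxt = units[i + 1] if i + 1 < len(units) else ''
            if is_high(cp) and len(nxt) == 6 and is_low(int(nxt[2:], 16)):
                # Properly paired surrogates combine into one codepoint.
                low = int(nxt[2:], 16)
                out.append(chr(0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)))
                i += 2
                continue
            if is_high(cp) or is_low(cp):
                # Unpaired surrogate: emit the fallback, then move to the next unit.
                out.append(REPLACEMENT)
            else:
                out.append(chr(cp))
        elif len(unit) == 2 and unit[0] == '\\':
            if unit not in SIMPLE:
                raise ValueError('invalid escape sequence: ' + unit)
            out.append(SIMPLE[unit])
        else:
            out.append(unit)
        i += 1
    return ''.join(out)
```

Under these assumptions each unpaired surrogate consumes only its own escape
sequence, so a neighbouring ordinary character such as "a" survives as a
separate character, and in case (5) the first \uD834 is replaced while the
second pairs with \udD1E.
-->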
      
      

    </bug>

</bugzilla>