This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 29745 - [FO31] fn:parse-json edge cases
Summary: [FO31] fn:parse-json edge cases
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-07-20 13:04 UTC by Tim Mills
Modified: 2016-07-26 11:23 UTC

Description Tim Mills 2016-07-20 13:04:06 UTC
How should the following be handled?

(1) parse-json('["\uD834"]')
(2) parse-json('["\uD834a"]')
(3) parse-json('["\udD1E"]')
(4) parse-json('["a\udD1E"]')
(5) parse-json('["\uD834\uD834\udD1E"]')

My guess is that (1) and (3) invoke the fallback option, giving e.g. � (U+FFFD).
But would (2) and (4) consume the unpaired surrogate and the adjacent character together, as one badly encoded codepoint, or treat them as two separate characters? That is, should (2) give �a or just �?
Comment 1 Michael Kay 2016-07-22 15:21:20 UTC
The rules state:

The function is called when the JSON input contains a special character (as defined under the escape option) that is valid according to the JSON grammar, whether the special character is represented in the input directly or as an escape sequence. The function is called once for any surrogate that is not properly paired with another surrogate. The string supplied as the argument will always be a two- or six- character escape sequence, starting with a backslash, that conforms to the rules in the JSON grammar.


This seems pretty clear to me. You process the input one nibble at a time, where a nibble is a character or an escape sequence introduced by "\". If you hit a high surrogate that isn't followed by a low surrogate, you emit FFFD and move on to the next nibble. If you hit a low surrogate that isn't preceded by a high surrogate, you emit FFFD and move on to the next nibble.
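
For illustration only, the procedure described above could be sketched in Python roughly as follows. This is not taken from the spec or from any implementation: decode_string_body, default_fallback and the UNIT pattern are hypothetical names, the input is assumed to be the text between the quotes of a single JSON string, and only escaped surrogates are handled (not surrogates written directly, nor the other special characters that can also trigger the fallback).

```python
import re

# One unit: a six-character \uXXXX escape, a two-character escape such as \n,
# or a single literal character.
UNIT = re.compile(r'\\u[0-9A-Fa-f]{4}|\\["\\/bfnrt]|[^\\]')
SIMPLE = {'"': '"', '\\': '\\', '/': '/',
          'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}


def is_high(cp):
    return 0xD800 <= cp <= 0xDBFF


def is_low(cp):
    return 0xDC00 <= cp <= 0xDFFF


def default_fallback(escape):
    # Assumed default: ignore the escape text and return U+FFFD.
    return "\ufffd"


def decode_string_body(body, fallback=default_fallback):
    units = UNIT.findall(body)
    if "".join(units) != body:
        raise ValueError("input contains an invalid escape sequence")
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if unit.startswith("\\u"):
            cp = int(unit[2:], 16)
            nxt = units[i + 1] if i + 1 < len(units) else ""
            if is_high(cp) and nxt.startswith("\\u") and is_low(int(nxt[2:], 16)):
                # Properly paired: combine both escapes into one codepoint.
                low = int(nxt[2:], 16)
                out.append(chr(0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)))
                i += 2
                continue
            if is_high(cp) or is_low(cp):
                # Unpaired surrogate: one fallback call, then move on
                # to the next unit without consuming it.
                out.append(fallback(unit))
            else:
                out.append(chr(cp))
        elif unit.startswith("\\"):
            out.append(SIMPLE[unit[1]])
        else:
            out.append(unit)
        i += 1
    return "".join(out)
```

Under these assumptions, decode_string_body(r'\uD834a') yields U+FFFD followed by "a", decode_string_body(r'a\udD1E') yields "a" followed by U+FFFD, and decode_string_body(r'\uD834\uD834\udD1E') yields U+FFFD followed by U+1D11E, with the fallback called once.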
Comment 2 Tim Mills 2016-07-26 11:22:39 UTC
Agreed.

Thanks.