This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 28663 - [FO31] Escaping rules in fn:xml-to-json
Summary: [FO31] Escaping rules in fn:xml-to-json
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-20 15:21 UTC by Christian Gruen
Modified: 2015-10-16 18:34 UTC


Description Christian Gruen 2015-05-20 15:21:48 UTC
1. The handling of quotation marks is covered twice in the string escaping rules:

* [...], and any occurrence of quotation mark (") is replaced by \"
* [...] any occurrence of quotation mark, backspace [...]

Can the first rule be dropped? I cannot find it in the XSLT spec either.

2. FOJS0007 is to be raised if \u is "not followed by four hexadecimal digits (that is [0-9A-F]{4})". Maybe a-f (lower case) should be allowed as well?
Comment 1 Michael Kay 2015-05-29 10:00:59 UTC
I'm finding the rules quite hard to decipher, and I think we can achieve greater clarity by refactoring them as follows:

(1) If the attribute escaped="true" is present for a string value, or escaped-key="true" for a key value, then:

(1.a) any valid JSON escape sequence present in the string is copied unchanged to the output,

(1.b) any invalid JSON escape sequence results in a dynamic error [err:FOJS0007],

(1.c) any unescaped occurrence of quotation mark, backspace, form-feed, newline, carriage return, or tab is replaced by \", \b, \f, \n, \r, or \t respectively, 

(1.d) any other codepoint in the range 1-31 or 127-159 is replaced by an escape in the form \uHHHH where HHHH is the upper-case hexadecimal representation of the codepoint value.

(2) Otherwise (that is, in the absence of the attribute escaped="true" for a string value, or escaped-key="true" for a key value):

(2.a) any occurrence of backslash (\) is replaced by \\

(2.b) any occurrence of quotation mark, backspace, form-feed, newline, carriage return, or tab is replaced by \", \b, \f, \n, \r, or \t respectively, 

(2.c) any other codepoint in the range 1-31 or 127-159 is replaced by an escape in the form \uHHHH where HHHH is the upper-case hexadecimal representation of the codepoint value.
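
As an illustration only, the two branches above can be sketched in Python. This is a minimal sketch, not the normative algorithm: the names xml_to_json_escape and FOJS0007Error are hypothetical, the "escaped" parameter stands in for both escaped="true" and escaped-key="true", and the sketch assumes that \u escapes are accepted with hex digits in either case.

```python
import re

class FOJS0007Error(ValueError):
    """Hypothetical exception standing in for the dynamic error err:FOJS0007."""

# Short escapes named in rules (1.c) and (2.b).
SHORT_ESCAPES = {'"': '\\"', '\b': '\\b', '\f': '\\f',
                 '\n': '\\n', '\r': '\\r', '\t': '\\t'}

# A valid JSON escape: backslash followed by one of " \ / b f n r t,
# or by u and four hex digits (either case is an assumption of this sketch).
VALID_ESCAPE = re.compile(r'\\(?:["\\/bfnrt]|u[0-9A-Fa-f]{4})')

def _escape_char(ch):
    """Apply rules (1.c)/(1.d), which are the same as (2.b)/(2.c)."""
    if ch in SHORT_ESCAPES:
        return SHORT_ESCAPES[ch]
    cp = ord(ch)
    if 1 <= cp <= 31 or 127 <= cp <= 159:
        return '\\u%04X' % cp  # upper-case hexadecimal, four digits
    return ch

def xml_to_json_escape(text, escaped=False):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            if not escaped:
                out.append('\\\\')  # rule (2.a)
                i += 1
                continue
            match = VALID_ESCAPE.match(text, i)
            if match is None:
                # rule (1.b): invalid escape sequence
                raise FOJS0007Error('invalid escape at offset %d' % i)
            out.append(match.group(0))  # rule (1.a): copied unchanged
            i = match.end()
            continue
        out.append(_escape_char(ch))
        i += 1
    return ''.join(out)
```

With escaped left at False, a backslash followed by n is doubled to two backslashes followed by n; with escaped=True, the same input is copied through unchanged as the escape sequence \n.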

I agree that we should allow lower-case a-f in an escape sequence. At one time I was under the impression that JSON did not allow this, but that was because I overlooked an obscure clause in RFC 4234, under which quoted strings in ABNF are case-insensitive, so "A"|"B"|"C" doesn't mean what you think it does.
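
To make the difference concrete, here is a small Python sketch contrasting the upper-case-only check quoted in the description with a check that accepts either case. The names UPPER_ONLY, EITHER_CASE and accepts are hypothetical and are not taken from the spec.

```python
import re

# The check as quoted in the description: upper-case hex digits only.
UPPER_ONLY = re.compile(r'\\u[0-9A-F]{4}')

# The relaxed check: hex digits in either case.
EITHER_CASE = re.compile(r'\\u[0-9A-Fa-f]{4}')

def accepts(pattern, escape):
    """Return True if the whole string is a \\uHHHH escape under the pattern."""
    return pattern.fullmatch(escape) is not None
```

Under the first pattern an escape written with lower-case digits, such as \u00e9, would be rejected, while the second accepts both \u00E9 and \u00e9.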
Comment 2 Michael Kay 2015-05-29 10:15:15 UTC
Changed the title: the question of allowing lower-case a-f is not purely editorial.
Comment 3 Michael Kay 2015-06-02 20:19:57 UTC
The change was accepted and has been applied.
Comment 4 Abel Braaksma 2015-10-16 18:34:10 UTC
(just clearing the still-open needinfo request of this bug, and transitioning it to "closed" status)