This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 28151 - [F+O 3.1] Minor inconsistencies between parse-json() and json-to-xml()
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2015-03-06 09:17 UTC by Michael Kay
Modified: 2016-12-16 19:55 UTC

Description Michael Kay 2015-03-06 09:17:33 UTC
There are some minor inconsistencies between the functions parse-json() and json-to-xml() (newly transferred from XSLT 3.0) which ought to be addressed.

I have fixed those that are purely editorial. The others are:

A. parse-json() accepts an empty sequence as the first argument; json-to-xml() does not.

RECOMMENDATION: accept an empty sequence.

B. json-to-xml() explicitly permits a byte order mark at the start of the content. parse-json() says nothing on the subject except by reference to RFC 7159, which says: "In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error."

RECOMMENDATION: accept (and ignore) a byte order mark.
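
As a rough illustration of "accept and ignore", here is a minimal Python sketch, not taken from the spec. The function name parse_json_ignoring_bom is hypothetical, and Python's json module merely stands in for the XPath function. The input is assumed to be an already-decoded string, so the byte order mark appears as the single character U+FEFF.

```python
import json

BOM = "\ufeff"  # U+FEFF, the byte order mark as seen in a decoded string


def parse_json_ignoring_bom(text):
    """Parse JSON text, ignoring one leading byte order mark if present.

    Hypothetical helper: it mirrors the recommendation above, not any
    actual parse-json() implementation.
    """
    if text.startswith(BOM):
        text = text[len(BOM):]
    return json.loads(text)
```

Only a mark at the very start is dropped; a U+FEFF elsewhere in the text is left for the parser to deal with.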

C. Duplicate keys in maps. parse-json() provides three options: reject, use-first, and use-last. json-to-xml() says nothing on the subject, other than what can be deduced from (a) the option validate=true, which validates the resulting XML against a supplied schema that prohibits duplicates, and (b) a statement that the mapping from JSON to XML preserves order.

I don't think it's appropriate to make json-to-xml() provide exactly the same options as parse-json() here, firstly because of the interaction with schema validation, and secondly because we want the conversion to be streamable and therefore order-preserving. I think the appropriate options for json-to-xml() would be retain, reject, and use-first: retain means that duplicates are retained in the result (making it invalid against the schema, so this option is incompatible with validate=true); reject and use-first have the same meaning as for parse-json().

RECOMMENDATION: In json-to-xml(), add the option duplicates=retain|reject|use-first. If the effective value of validate is true, duplicates defaults to reject; otherwise it defaults to retain. The value duplicates=retain is incompatible with validate=true.
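
The defaulting rule and the three proposed values can be sketched in Python as follows. This is an illustration under stated assumptions rather than the spec's algorithm: the names effective_duplicates, load_object_pairs and DuplicateKeyError are hypothetical, and each JSON object is represented as an ordered list of (key, value) pairs so that retained duplicates and document order stay visible.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when the policy is 'reject' and an object repeats a key."""


def effective_duplicates(duplicates=None, validate=False):
    """Apply the proposed default and the retain/validate conflict check."""
    if duplicates is None:
        # Proposed default: reject when validating, otherwise retain.
        duplicates = "reject" if validate else "retain"
    if duplicates not in ("retain", "reject", "use-first"):
        raise ValueError("unknown duplicates value: %r" % (duplicates,))
    if duplicates == "retain" and validate:
        raise ValueError("duplicates=retain is incompatible with validate=true")
    return duplicates


def load_object_pairs(text, duplicates=None, validate=False):
    """Parse JSON, returning each object as an ordered list of pairs."""
    policy = effective_duplicates(duplicates, validate)

    def hook(pairs):
        if policy == "retain":
            return list(pairs)  # keep every pair, duplicates included
        seen = set()
        kept = []
        for key, value in pairs:
            if key in seen:
                if policy == "reject":
                    raise DuplicateKeyError(key)
                continue  # use-first: drop later occurrences
            seen.add(key)
            kept.append((key, value))
        return kept

    return json.loads(text, object_pairs_hook=hook)
```

Because use-first and retain both emit entries in the order they are encountered, neither needs to look ahead in the input. A use-last policy would have to, which is the streamability concern raised above.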

D. Unescaping and invalid characters. Both functions have an option controlling whether JSON escape sequences are unescaped, and both default to substituting xFFFD for invalid characters. The json-to-xml() function has an additional option ("fallback") to supply a user-written function to handle invalid characters.

RECOMMENDATION: Add the fallback option to parse-json().

Note also, for both functions the substitution of xFFFD happens only for non-XML characters represented as JSON escape sequences. The fallback option also handles non-XML characters that are represented directly in unescaped form (for example, C1 control characters, or cp1252 characters masquerading as C1 control characters). 

RECOMMENDATION: Both functions should handle non-XML characters represented in unescaped form in the same way as if they were written in escaped form (that is, the handling depends on the unescape and fallback options).
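
A minimal Python sketch of the substitution step, under several assumptions: "non-XML character" is taken to mean anything outside the XML 1.0 Char production, the fallback is modelled as a function that receives the character in \uXXXX form and returns a replacement string, and the names replace_non_xml and default_fallback are hypothetical. It operates on a string value after JSON unescaping.

```python
import json
import re

# Anything outside the XML 1.0 Char production (assumption: XML 1.0 rules).
NON_XML_CHAR = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def default_fallback(escape):
    """Default behaviour described above: substitute xFFFD."""
    return "\uFFFD"


def replace_non_xml(value, fallback=default_fallback):
    """Replace each non-XML character using the fallback function.

    The fallback receives the character written as a \\uXXXX escape;
    that calling convention is an assumption made for this sketch.
    """
    def repl(match):
        return fallback("\\u%04X" % ord(match.group()))

    return NON_XML_CHAR.sub(repl, value)


# Usage: a NUL that arrived as a JSON escape sequence.
decoded = json.loads('"a\\u0000b"')
cleaned = replace_non_xml(decoded)
```

With the default fallback the NUL becomes xFFFD; a caller-supplied fallback could instead return a visible marker or an empty string.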

The specification of json-to-xml() in the "Errors" section contains a spurious error condition for invalid characters, which does not match what is said in the rules. I will correct this.
Comment 1 Michael Kay 2015-03-09 09:13:42 UTC
I wrote:

>Note also, for both functions the substitution of xFFFD happens only for non-XML characters represented as JSON escape sequences. The fallback option also handles non-XML characters that are represented directly in unescaped form

I overlooked that unescaped non-XML characters cannot occur either for parse-json or for json-to-xml because in both cases the input is a string, and a string cannot contain non-XML characters. The situation could occur for json-doc, except that we have chosen to define json-doc as the composition of unparsed-text and parse-json, and this means it will fail in the same way as unparsed-text in the presence of non-XML characters. We could redefine json-doc to avoid this error (json-doc already doesn't behave EXACTLY like unparsed-text, in that it has different rules for detecting the encoding).
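
To make the composition concrete, here is a hedged Python sketch. The functions unparsed_text, parse_json and json_doc are simplified stand-ins that take bytes rather than a URI, encoding detection is reduced to a parameter, and "non-XML character" again assumes the XML 1.0 Char production. It shows why a raw non-XML character makes the composed function fail before any JSON parsing happens, whereas the same character written as a JSON escape gets through the text-reading step.

```python
import json
import re

# Anything outside the XML 1.0 Char production (assumption: XML 1.0 rules).
NON_XML_CHAR = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


class NonXmlCharacterError(ValueError):
    """Stand-in for the error raised when the text has non-XML characters."""


def unparsed_text(data, encoding="utf-8"):
    """Decode bytes and fail if the result contains a non-XML character."""
    text = data.decode(encoding)
    match = NON_XML_CHAR.search(text)
    if match:
        raise NonXmlCharacterError(
            "non-XML character U+%04X at offset %d"
            % (ord(match.group()), match.start())
        )
    return text


def parse_json(text):
    """Stand-in for parse-json(); escape handling is left to the json module."""
    return json.loads(text)


def json_doc(data, encoding="utf-8"):
    """json-doc modelled as the composition of the two functions above."""
    return parse_json(unparsed_text(data, encoding))
```

Redefining json_doc to avoid the error would mean replacing the unparsed_text step with a reader that tolerates such characters and hands them to the same substitution logic used for escape sequences.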
Comment 2 Michael Kay 2015-03-10 21:45:24 UTC
The proposals made herein were today accepted by the WG, and have been applied to the spec.