30286 – json-doc() handling of non-XML characters

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	NEW

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Functions and Operators 3.1
Version:	Recommendation
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2018-09-22 22:46 UTC by Michael Kay
Modified:	2018-09-22 22:46 UTC
CC List:	0 users

See Also:

Attachments

Description Michael Kay 2018-09-22 22:46:11 UTC

The spec of json-doc says:

If the resource contains characters that are not valid in the version of XML used by the processor, then rather than raising an error as fn:unparsed-text#1 does, the function replaces such characters by the equivalent JSON escape sequence prior to parsing.

The mechanism of "replacing such characters by the equivalent JSON escape sequence" does not achieve the required effect (unless you adopt a generous interpretation of "equivalent"). There are many places where an escape sequence is not accepted by the JSON grammar, though such cases aren't really a problem, because the only effect is to change an error at the encoding level into an error at the JSON parsing level. But there are some places, notably after "\", where inserting an escape sequence changes the meaning. For example, "\" followed by x00 should be an error, but "\\u0000" is a legitimate representation of the six characters \ u 0 0 0 0.
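The problem can be reproduced outside an XPath processor. The following is only a sketch: it uses Python's json module as a stand-in JSON parser, and is_xml10_char and naive_escape are hypothetical helpers written for this illustration, assuming XML 1.0 and a context-free, character-by-character replacement. It is not how any particular json-doc() implementation works.

```python
import json


def is_xml10_char(cp: int) -> bool:
    # Code points allowed by the XML 1.0 Char production.
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def naive_escape(text: str) -> str:
    # Hypothetical helper: replace each non-XML character with a JSON
    # backslash-u escape, ignoring the JSON context it occurs in.
    return "".join(
        ch if is_xml10_char(ord(ch)) else "\\u%04X" % ord(ch)
        for ch in text
    )


# JSON text consisting of: quote, backslash, U+0000, quote.
raw = '"\\\x00"'

try:
    json.loads(raw)
    raw_outcome = "accepted"
except json.JSONDecodeError:
    # Backslash followed by U+0000 is not a valid JSON escape.
    raw_outcome = "rejected"

# After the naive replacement the backslash pairs up with the
# backslash of the inserted escape, so the text now parses.
escaped = naive_escape(raw)
value = json.loads(escaped)

print(raw_outcome)
print(escaped)
print(len(value), list(value))
```

In this sketch the original text is rejected, while the escaped text is accepted and yields a six-character string. A replacement that is not aware of a preceding backslash therefore turns an error into a successful parse with a different meaning.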

Some of the new JSON test cases in test set misc-JsonTestSuite, adapted from http://github.com/nst/JSONTestSuite, illustrate these issues.