27478 – Serializing JSON: \t, \r, Unicode sequences

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 27478 - Serializing JSON: \t, \r, Unicode sequences

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Serialization 3.1
Version:	Working drafts
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	C. M. Sperberg-McQueen
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs


Depends on:	27477

Reported:	2014-12-01 22:54 UTC by Christian Gruen
Modified:	2014-12-05 20:32 UTC

Description Christian Gruen 2014-12-01 22:54:52 UTC

This is related to Bug 27477:

Section 4.3e of the serialization spec says that

  If the JSON output method is selected, replace
  " with \" and the newline character (&#x000A;) with \n. 

It needs to be clarified how &#9; and &#13; are to be serialized in JSON (I assume they will be '\t' and '\r'?).

If there are cases left in which Unicode characters need to be escaped in the JSON output, it may be worth specifying whether an implementation should choose upper or lower case for the hexadecimal digits (RFC 7159 allows both variants). For example, &#13; can currently be serialized as "\r", "\u000D", or "\u000d".
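
As a quick illustration of why the choice is purely cosmetic for consumers, the following minimal sketch uses Python's standard json module (my choice for illustration, not something the spec mentions) to decode the three spellings. The variable names are hypothetical.

```python
import json

# Three JSON spellings of a string holding one carriage return (U+000D).
candidates = [r'"\r"', r'"\u000D"', r'"\u000d"']

# Decode each JSON text into a Python string.
decoded = [json.loads(text) for text in candidates]

# Every spelling yields the same one-character string.
assert all(value == "\r" for value in decoded)
```

A conforming JSON parser should treat all three forms alike, so the question is only which form a serializer emits.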

Comment 1 C. M. Sperberg-McQueen 2014-12-02 17:45:59 UTC

The joint WGs discussed this bug today.  JSON (as defined by RFC 7159) doesn't require escaping of characters other than " and \ but on consideration the WGs agreed that it's probably useful to escape other characters which might otherwise frequently be corrupted in transmission over some channels.

The rules in serialization will be aligned with the xml-to-json function of XSLT [1], which says that, in addition to escaping \,

    any occurrence of quotation mark, backspace, form-feed, newline, 
    carriage return, or tab is replaced by \", \b, \f, \n, \r, or \t 
    respectively, and any other codepoint in the range 1-31 or 127-159 is 
    replaced by an escape in the form \uHHHH where HHHH is the hexadecimal 
    representation of the codepoint value.

[1] http://www.w3.org/TR/xslt-30/#func-xml-to-json
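
The quoted rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the text of any specification or the code of any implementation: the function name escape_json_string and the upper_hex parameter are hypothetical, and the parameter exists only to show that the case of the hexadecimal digits is a free choice in this sketch.

```python
def escape_json_string(text: str, upper_hex: bool = True) -> str:
    """Escape the content of a JSON string following the quoted rule.

    Hypothetical helper for illustration; upper_hex is an assumed knob
    that selects the case of the digits in \\uHHHH escapes.
    """
    # Characters that have a short, named escape.
    named = {
        "\\": "\\\\",
        '"': '\\"',
        "\b": "\\b",
        "\f": "\\f",
        "\n": "\\n",
        "\r": "\\r",
        "\t": "\\t",
    }
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in named:
            out.append(named[ch])
        elif 1 <= cp <= 31 or 127 <= cp <= 159:
            # Any other codepoint in 1-31 or 127-159 becomes \uHHHH.
            out.append("\\u" + format(cp, "04X" if upper_hex else "04x"))
        else:
            # Everything else is copied through unchanged. Codepoint 0 is
            # outside the quoted ranges and is not handled by this sketch.
            out.append(ch)
    return "".join(out)
```

The function returns only the escaped content; a caller would still wrap the result in quotation marks to form a complete JSON string.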

We believe this resolves the issues, so we are marking this Bugzilla entry RESOLVED.  

Christian, if you would review this resolution and indicate your agreement by changing the bug status to CLOSED (or your dissent by RE-OPENING it), it would be helpful.  If we don't hear from you in the next two weeks, we will assume that you are content with the resolution of the issue.

Comment 2 Christian Gruen 2014-12-02 17:57:12 UTC

Michael, thanks a lot for the summary. I completely agree with the resolution.

Before closing this bug, I have one last question, regarding the hexadecimal representation: Does HHHH mean that the hexadecimal digits A-F should be output in upper case, or is this implementation dependent?

Comment 3 Jim Melton 2014-12-05 20:28:35 UTC

I'm not Mike but I'll take a stab at answering the question anyway. 

I do NOT believe that the use of "HHHH" (as opposed to "hhhh", "Hhhh", "hHhh", etc.) was intended to imply that the hexadecimal digits must be output in upper case. In fact, it is rather more common for lowercase to be used. I don't believe that it is useful for our specs to prescribe one or the other, so I think implementation-dependent (not -defined!) is most appropriate. And that is probably worth an entry in the appendix.

I'm re-marking the bug RESOLVED/FIXED. If you agree that this is the appropriate response, please mark the bug CLOSED.

Comment 4 Christian Gruen 2014-12-05 20:32:55 UTC

Jim, thanks for the information. As suggested I'm closing the bug.