26784 – [SER 3.1] Comments on JSON Serialization Method

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: [SER 3.1] Comments on JSON Serialization Method

Status:	RESOLVED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Serialization 3.1
Version:	Working drafts
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	C. M. Sperberg-McQueen
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2014-09-11 11:50 UTC by Michael Kay
Modified:	2014-10-20 10:53 UTC
CC List:	2 users

Description Michael Kay 2014-09-11 11:50:20 UTC

Apologies: this review could have been done months ago, but I only just noticed the spec.

The comment applies to XSLT and XQuery Serialization 3.1 dated 24 April 2014, mainly section 9.

1. Serializing maps. After converting keys to strings there may be duplicate keys, e.g. the string "2014-09-11" and the date 2014-09-11 are not equal but both convert to the same string. What happens?

2. Serializing maps. The spec should state explicitly that the serialization order of entries is implementation-dependent.

3. The proposal is that nodes in the input should be atomized. I think it would generally be more useful if they were serialized (using the XML output method with default properties plus omit-xml-declaration="yes"). Also note that atomization of a node does not produce a single atomic value; it produces a sequence of atomic values.

4. indent: I think we should specify that if indent="no", the serializer must output no whitespace between tokens (none is required to satisfy the grammar). Because this could generate very long lines, while indent="yes" might generate very verbose output, perhaps there's a need for an intermediate option to insert a newline occasionally to limit the line length?

5. suppress-indentation: I can't see how this is relevant to JSON serialization.

6. We should specify which characters should be escaped using JSON escape sequences. Probably only (a) those where escaping is mandatory, e.g. backslash and double-quotes, plus \n, \t etc.; and (b) any characters that can't be represented directly in the chosen encoding. (So encoding="us-ascii" forces escaping of non-ASCII characters.)

7. Section 9 (JSON output method) says that the effect of item-separator is described in section 2 (Sequence Normalization), but section 2 (on my reading) says that it does not apply to the JSON output method. (An alternative reading is that sequence normalization is mandatory for all output methods except JSON, where it is presumably optional...)

8. There seem to be values for which no JSON serialization is defined. For example, sequences. What is the JSON serialization of (1 to 10)? Is this an error? I would be inclined to serialize any sequence of length > 1 as an array.

9. Other values that cannot be serialized into legal JSON include INF, NaN, and function items. These should probably be serialization errors.
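
To make item 6 concrete, here is a rough Python sketch of the proposed escaping rule: apply the mandatory JSON escapes, then escape only those characters that the chosen encoding cannot represent. The names json_string and escape_char are hypothetical, and Python codec names stand in for the serialization "encoding" parameter; none of this is wording from the spec.

```python
import json


def escape_char(ch: str) -> str:
    """Return the JSON \\uXXXX escape for one character (hypothetical helper)."""
    cp = ord(ch)
    if cp > 0xFFFF:
        # Characters outside the BMP are written as a UTF-16 surrogate pair.
        cp -= 0x10000
        return "\\u%04x\\u%04x" % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF))
    return "\\u%04x" % cp


def json_string(text: str, encoding: str = "utf-8") -> str:
    """Serialize text as a JSON string, escaping only where required."""
    # json.dumps handles the mandatory escapes: quotes, backslash, controls.
    literal = json.dumps(text, ensure_ascii=False)
    out = []
    for ch in literal:
        try:
            ch.encode(encoding)  # representable in the target encoding?
            out.append(ch)
        except UnicodeEncodeError:
            out.append(escape_char(ch))
    return "".join(out)


print(json_string("café", encoding="us-ascii"))  # "caf\u00e9"
print(json_string("café", encoding="utf-8"))     # "café"
```

With this rule the output stays readable under a Unicode encoding and still survives a round trip through a JSON parser when the encoding is restricted.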

Comment 1 C. M. Sperberg-McQueen 2014-09-22 16:36:34 UTC

The following remarks reflect the tentative views of the editors of the
serialization spec, after some discussion.

On 1 (duplicate map keys), we note that RFC 7159 does not impose a
uniqueness constraint on member names in an object, although it notes
that some JSON parsers will raise an error on them. The WG seems to
face a choice here:
  
  (A) Make no substantive change, on the grounds that RFC 7159 does
      not make duplicate object-member names an error.  

      Optionally, add a note to warn users that if they care about
      avoiding duplicate object names, they need to take steps to
      ensure that they are not emitted.

  (B) Raise an error, on the grounds that object member names are
      normally supposed to be unique in JSON, RFC 7159 explicitly
      warns of interoperability issues with them, and many users will
      want a serialization error rather than an error in the next JSON
      ingest step.  

      Optionally, add a serialization option to specify that duplicate
      object-member names should not raise an error.  (We'd be happy
      to let try/catch handle this, but try/catch doesn't seem to work
      for implicit serialization.)
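
The collision behind item 1, and the shape of option (B), can be
sketched in Python. This is an illustration only: stringify_keys,
DuplicateKeyError and the allow_duplicates flag are hypothetical
names, and Python's str() stands in for the key-to-string conversion
in the draft.

```python
import datetime


class DuplicateKeyError(ValueError):
    """Hypothetical serialization error for colliding member names."""


def stringify_keys(entries, allow_duplicates=False):
    """Convert map keys to strings and return (name, value) pairs.

    With allow_duplicates=False this behaves like option (B); with
    allow_duplicates=True it behaves like option (A).
    """
    seen = set()
    members = []
    for key, value in entries.items():
        name = str(key)
        if name in seen and not allow_duplicates:
            raise DuplicateKeyError("duplicate member name: " + name)
        seen.add(name)
        members.append((name, value))
    return members


# A string and a date are distinct keys, but share one string form.
sample = {
    "2014-09-11": "string key",
    datetime.date(2014, 9, 11): "date key",
}
print(len(sample))                                   # 2
print(stringify_keys(sample, allow_duplicates=True))
```

Calling stringify_keys(sample) without the flag raises
DuplicateKeyError, which is the point at which a serializer following
option (B) would report the problem.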

On 2 (implementation-dependent order of map keys): agreed.  Good
catch.

On 3 (atomization vs serialization).  The WG will need to decide this.

If we understand correctly, the differences between the current draft
and the proposal to use serialization rather than atomization can be
illustrated with the following sample map:

  map {
    "able" := <e>42</e>,
    "baker" := <f>1 2 3 4 5</f>,
    "charlie" := <f><e>1</e><e>2</e><e>3</e><e>4</e><e>5</e></f>,
    "dog" := <p>The design goals for XML are:
               <list type="ordered">
                 <item>
                   <p>XML shall be straightforwardly usable over 
                      the Internet.</p>
                 </item>
                 ...
               </list>
             </p>
  }

Under the current draft (augmented with the rules suggested below to
serialize non-empty sequences as JSON arrays and escape JSON strings
as needed), if all the elements are untyped, this would (I think)
produce

  { "able" : 42,
    "baker" : [1, 2, 3, 4, 5],
    "charlie" : "12345",
    "dog" : "The design goals for XML are:\n\t\n\t\t\n\t\t\t\nXML ..."
  }

If the 'f' element is typed as an element-only element, then the
"charlie" member would produce an error when atomization attempted to
access the typed value of the 'f' element.

If we perform serialization rather than atomization, the results
should be more like this (is this a correct understanding of the
intention?):

  {
    "able" : "<e>42</e>",
    "baker" : "<f>1 2 3 4 5</f>",
    "charlie" : "<f><e>1</e><e>2</e><e>3</e><e>4</e><e>5</e></f>",
    "dog" : "<p>The design goals for XML are:\n\t<list type=\"ordered\">\n\t\t<item>\n\t\t\t<p>XML ..."
  }

On the one hand:  atomization serializes more elements as data values,
which seems likely to be often what JSON users desire.  And on the
other:  serialization serializes more elements without errors, which 
also seems likely to be a desired property.  
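
The two treatments can be compared with a small Python sketch using
the standard xml.etree.ElementTree module. It is only an
approximation: as_atomized returns the string value of an untyped
element (it ignores schema types entirely), and as_serialized stands
in for the XML output method with omit-xml-declaration="yes". Both
function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def as_atomized(elem):
    """Approximate atomization of an untyped element by its string value."""
    return "".join(elem.itertext())


def as_serialized(elem):
    """Serialize the element as XML text, without an XML declaration."""
    return ET.tostring(elem, encoding="unicode")


able = ET.fromstring("<e>42</e>")
charlie = ET.fromstring("<f><e>1</e><e>2</e><e>3</e><e>4</e><e>5</e></f>")

atomized = {"able": as_atomized(able), "charlie": as_atomized(charlie)}
serialized = {"able": as_serialized(able), "charlie": as_serialized(charlie)}

print(json.dumps(atomized))
print(json.dumps(serialized))
```

In the atomized output the markup inside "charlie" is lost and the
five values run together as "12345"; in the serialized output the
structure survives, but every value becomes a string of XML.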

On 4 and 5 (indentation): agreed.
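
For illustration, the two indent settings in item 4 correspond
roughly to the following calls in Python's json module. The mapping
of indent="no" to separators=(",", ":") is our reading of "no
whitespace between tokens", not wording from the spec.

```python
import json

value = {"able": 42, "baker": [1, 2, 3, 4, 5]}

# indent="no": no whitespace at all between tokens.
compact = json.dumps(value, separators=(",", ":"))

# indent="yes": one possible layout; the exact whitespace is up to
# the serializer.
pretty = json.dumps(value, indent=2)

print(compact)  # {"able":42,"baker":[1,2,3,4,5]}
print(pretty)
```

Both strings parse back to the same value; only the insignificant
whitespace differs.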

On 6 (character escaping):  agreed.

On 7 (item-separator):  we propose to specify that item-separator is
not relevant to JSON output.

On 8 (serialize non-empty non-singleton sequences as arrays):  agreed.

On 9 (INF, NaN, function items, ...):  agreed.
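
A rough Python sketch of the behaviour agreed for items 8 and 9
follows. SerializationError, check_item and serialize_sequence are
hypothetical names. The treatment of the empty sequence is not
settled in this thread, so the sketch raises an error for it purely
as a placeholder.

```python
import json
import math


class SerializationError(ValueError):
    """Hypothetical error for values with no JSON serialization."""


def check_item(item):
    """Reject items that cannot be written as legal JSON (item 9)."""
    if callable(item):
        raise SerializationError("function items cannot be serialized")
    if isinstance(item, float) and not math.isfinite(item):
        raise SerializationError("INF and NaN cannot be serialized")
    return item


def serialize_sequence(items):
    """Serialize a singleton as itself, a longer sequence as an array."""
    items = [check_item(item) for item in items]
    if len(items) == 1:
        return json.dumps(items[0], allow_nan=False)
    if len(items) > 1:
        return json.dumps(items, allow_nan=False)
    # Assumption: placeholder only; the thread does not decide this case.
    raise SerializationError("empty sequence: behaviour not decided here")


print(serialize_sequence(range(1, 11)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(serialize_sequence([42]))          # 42
```

Passing allow_nan=False also makes json.dumps reject INF and NaN
nested inside arrays or maps, which check_item alone would not see.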

We hope to have draft wording reflecting these changes for the WG to
review tomorrow.

Comment 2 C. M. Sperberg-McQueen 2014-09-22 20:57:07 UTC

A revised version of the 3.1 Serialization spec which addresses most of the issues raised here (but not item 3) is at [1] (member-only link).

[1] https://www.w3.org/XML/Group/qtspecs/specifications/xslt-xquery-serialization-31/html/Overview-diff.html

Comment 3 Jonathan Robie 2014-10-02 00:20:21 UTC

Is the schema up to date?

https://www.w3.org/XML/Group/qtspecs/specifications/xslt-xquery-serialization-31/html/Overview-diff.html#serparams-schema

Comment 4 Andrew Coleman 2014-10-10 13:57:30 UTC

(In reply to Jonathan Robie from comment #3)
> Is the schema up to date?
> 
> https://www.w3.org/XML/Group/qtspecs/specifications/xslt-xquery-serialization-31/html/Overview-diff.html#serparams-schema

Yes, this has been updated with the new serialization parameters.

Comment 5 Andrew Coleman 2014-10-20 10:53:24 UTC

Closing as agreed in the joint teleconference on 2014-10-14.