Json

JSON-based instances and submissions

XForms allows the initialization, processing and serialization of instances whose data come from a JSON source by transforming the JSON value into an XML instance, and serializing it back out as JSON. The XML version of JSON has been designed to be round-trippable, and to allow XPath selectors that resemble the equivalent JavaScript selectors. For example, a JSON source like this

{"company":"example.com", "size": 50, 
 "location": { "class":"international", "places": ["Amsterdam", "London"]}}

is transformed to

<json><company>example.com</company>
      <size type="number">50</size>
      <location>
           <class>international</class>
           <places starts="array">Amsterdam</places>
           <places>London</places>
      </location>
</json>

allowing formulations such as:

<instance src="http://example.com/company"/>
<bind nodeset="location/places[1]" ...
<submission resource="http://example.com/company" ...

In general it is opaque to the form (though not to the processor) which serialization has been used.

Processing

When an XForms processor receives a JSON object [ref JSON], it does the following:

A root element <json/> is created whose content is the transformation of each of the contained members, if any, of the JSON object.
Each name/value member is encoded as an XML element whose name is the name of the member, and whose content is the transformation of the value part of the member. If the name of the member contains characters that are not NMCHARs, those characters are replaced with underscores in the element name, and an attribute name is added to the element, with the value of the original name of the member. If the value part is an array, then each value in the array is so treated, and the first such element is given an attribute starts="array". If the value of the member is not a string, array or object, an attribute type is added to the element, with one of the values "number", "boolean" or "null".
The content of a string is copied across (without the enclosing quotation marks), with the characters "<", ">", and "&" transliterated to "&lt;", "&gt;" and "&amp;", hexadecimal encodings \udddd converted to the XML equivalent &#xdddd;, and escaped characters appropriately converted:

\" to " \\ to \ \/ to / \b to ?? \f to ?? \n to ?? \r to ?? \t to ??

A number is copied across.
Objects are transformed by transforming each of the contained members, if any.
true and false are copied across.
null produces empty content.
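
The rules above can be sketched in Python as follows. This is only an illustration, not a normative algorithm: the function name json_to_instance is made up for this sketch, the test for "not an NMCHAR" is simplified to ASCII, and empty member names, names starting with a digit, nested arrays and a top-level array are not handled, since the rules above do not say what to do with them.

```python
import json
import re
import xml.etree.ElementTree as ET

# Simplified stand-in for "not an NMCHAR" (an assumption of this sketch):
# anything outside ASCII letters, digits, '.', '-' and '_' becomes '_'.
_NOT_NAME_CHAR = re.compile(r"[^A-Za-z0-9._-]")


def _add_member(parent, name, value, starts_array=False):
    """Append one element for a single (non-array) member value."""
    safe = _NOT_NAME_CHAR.sub("_", name)
    elem = ET.SubElement(parent, safe)
    if safe != name:
        elem.set("name", name)          # keep the original member name
    if starts_array:
        elem.set("starts", "array")     # marks the first element of an array
    if isinstance(value, bool):         # bool first: bool is a subclass of int
        elem.set("type", "boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        elem.set("type", "number")
        elem.text = json.dumps(value)
    elif value is None:
        elem.set("type", "null")        # null produces empty content
    elif isinstance(value, dict):
        _add_members(elem, value)
    elif isinstance(value, list):
        raise ValueError("nested arrays are not covered by these rules")
    else:
        elem.text = value               # strings are copied across


def _add_members(parent, obj):
    """Transform each member of a JSON object, flattening arrays."""
    for name, value in obj.items():
        if isinstance(value, list):
            for index, item in enumerate(value):
                _add_member(parent, name, item, starts_array=(index == 0))
        else:
            _add_member(parent, name, value)


def json_to_instance(text):
    """Return the XML instance (as a string) for a JSON object."""
    root = ET.Element("json")
    _add_members(root, json.loads(text))
    return ET.tostring(root, encoding="unicode")
```

For example, json_to_instance('{"size": [30, 40]}') returns `<json><size starts="array" type="number">30</size><size type="number">40</size></json>`. Escaping of "<", ">" and "&" in string content is left to the XML serializer.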

Examples

JSON	becomes
`{"size": "50"}`	`<json><size>50</size></json>`
`{"size": 50}`	`<json><size type="number">50</size></json>`
`{"*size": "50"}`	`<json><_size name="*size">50</_size></json>`
`{"name": {"given": "Isaac", "family": "Newton"}}`	`<json><name><given>Isaac</given><family>Newton</family></name></json>`
`{"size": [30, 40]}`	`<json><size starts="array" type="number">30</size><size type="number">40</size></json>`
`{"open": true}`	`<json><open type="boolean">true</open></json>`
`{"values": null}`	`<json><values type="null"/></json>`

Serializing

By default, any instance in XForms is serialized with the same media type it was created from.

So when an instance that originated from a JSON object is serialized, by default it is serialized as a JSON object, and that is done by reversing the transformation given above.
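
A rough sketch of that reversal in Python, assuming the instance follows the starts="array", type and name attribute conventions described under Processing. The function name instance_to_json is invented for this sketch. Note that an element with no type attribute and no children is read back as a string, so an empty object cannot be recovered under these rules.

```python
import json
import xml.etree.ElementTree as ET


def _value_of(elem):
    """Recover the JSON value represented by one element."""
    kind = elem.get("type")
    text = elem.text or ""
    if kind == "number":
        return json.loads(text)
    if kind == "boolean":
        return text == "true"
    if kind == "null":
        return None
    if len(elem):                       # child elements: a JSON object
        return _object_of(elem)
    return text                         # unmarked leaf: a string


def _object_of(parent):
    """Collect the children of an element into a JSON object."""
    result = {}
    for child in parent:
        # The name attribute, if present, carries the original member name.
        name = child.get("name", child.tag)
        value = _value_of(child)
        if child.get("starts") == "array":
            result[name] = [value]      # first element of an array
        elif isinstance(result.get(name), list):
            result[name].append(value)  # later elements of the same array
        else:
            result[name] = value
    return result


def instance_to_json(xml_text):
    """Serialize an XML instance (given as a string) back to JSON text."""
    return json.dumps(_object_of(ET.fromstring(xml_text)))
```

Applied to `<json><size starts="array" type="number">30</size><size type="number">40</size></json>`, this gives back an object whose "size" member is the array [30, 40].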

To serialize an instance with a different media type, the attribute 'mediatype' should be added to the appropriate element. For instance to serialize a JSON instance as XML, write

<submission resource="..." mediatype="application/xml" ...

Similarly, to serialize an XML instance as JSON, write

<submission resource="..." mediatype="application/json" ...

However, the instance should conform to the rules given above for constructing a JSON-based instance.

Marking Types

Problem: You need to be able to differentiate between

 {"x": true} and {"x": "true"}
 {"x": 0} and {"x": "0"}
 {"x": null} and {"x": "null"}

You only really need to differentiate for serialisation.

Version: Strings get marked specially

{"height": 1080, "width": 1920} 
	<json><height>1080</height><width>1920</width></json> 
{"name": "Mark", "age": 21}     
	<json><name type="string">Mark</name><age>21</age></json> 
{"selected": true}              
	<json><selected>true</selected></json> 
{"load": [0.31, 0.33, 0.32]}    
	<json><load array="true">0.31</load><load array="true">0.33</load><load array="true">0.32</load></json> 
{"cities": ["Amsterdam", "Paris", "London"]} 	 
	<json><cities array="true" type="string">Amsterdam</cities><cities array="true" type="string">Paris</cities><cities array="true" type="string">London</cities></json> 
{"left": {"x": 0, "y": 0}, "right": {"x": 100, "y": 100}} 	 
	<json><left><x>0</x><y>0</y></left><right><x>100</x><y>100</y></right></json> 
{"p": null}
	<json></json>

Version: Non-strings get marked specially

{"height": 1080, "width": 1920}
	<json><height type="number">1080</height><width type="number">1920</width></json>
{"name": "Mark", "age": 21}
	<json><name>Mark</name><age type="number">21</age></json>
{"selected": true}
	<json><selected type="boolean">true</selected></json>
{"load": [0.31, 0.33, 0.32]}
	<json><load array="true" type="number">0.31</load><load array="true" type="number">0.33</load><load array="true" type="number">0.32</load></json>
{"cities": ["Amsterdam", "Paris", "London"]}
	<json><cities array="true">Amsterdam</cities><cities array="true">Paris</cities><cities array="true">London</cities></json>
{"left": {"x": 0, "y": 0}, "right": {"x": 100, "y": 100}}
	<json><left><x type="number">0</x><y type="number">0</y></left><right><x type="number">100</x><y type="number">100</y></right></json>
{"p": null}
	<json></json>

I'm inclined to go with the latter, because it is more error-proof. That is to say {"x": "0"} is better than {"name": Mark}, because anything can be a string, but not anything can be a number. Mark the special cases, even though this makes the rules slightly more complicated.

Discussion points

Will we recommend or require support for unquoted names?

Is "json" the right name for the root?

It doesn't really matter, since we don't refer to it very often anyway. Alternative names might be "root" or just "data".

Should the type attribute be on the root element?

Yes, to distinguish an object from an array.

Should the mediatype be recorded in the root element?

It could be, especially if MIME parameters are used to indicate which optional features of the mapping were used.

Can @type be replaced with @xsi:type?

Probably not. For one, the type can be array or object. Also, a number may map to xsd:double or xforms:double, but it may be the processor that has out-of-band information that enables a schema datatype to be attached, e.g. by a bind.

It could be useful to have an option of using xsi:type if we change how some of the other uses of type are represented. For example, a change for null is already suggested, and that leaves object and array. There could be an attribute "depth" with values of object or array.

Should we have the option of not generating type attributes? How might we recover the JSON from the XML without them?

Yes. Sometimes they are handy, but it is equally likely that an XForm will also use a schema or a set of type binds on the data. Using a MIME parameter on the application/json mediatype, it should be possible to indicate that XForms datatype information should be used to help convert XML back to JSON.

Is type="null" the right way to indicate that an empty element is a null?

Using type="null" means it would be hard to allow the content to be null but also indicate the type that the data should take, e.g. number. This is why XML schema separates this into the xsi:nil concept. A null value would correspond to putting xsi:nil="true" on the element. If a JSON name is associated with an empty object or array, this would be distinguishable by not placing xsi:nil on the element.

One "con" is that using xsi:nil means that xmlns:xsi would have to be added to the root element. Another "con" is that xsi:nil is designed to work with a schema indication that an element is nillable. If there is a schema, then when the XML element is changed to non-empty content, then the changer has to know to change xsi:nil to false (or remove it), and this is not the default behavior of an XForms processor unfortunately. An alternative is to use a non-namespaced nil attribute, which gets rid of the namespace declaration as well as the expectation that the attribute will work with a schema. The nil attribute of true would simply indicate that empty means null, and nil attribute false (the default) would mean that empty is decided according to the type of the element, e.g. type="object|array|string" means empty object, array or string, whereas type="number|boolean" still means null. This type-based decision should be made regardless of how the type is assigned (e.g. whether by the type attribute in the XML or by XForms bind or schema datatype).

Alain thinks that a unique reserved name should be used instead of underscores. If so, would we really need the name attribute anymore? Would it still be handy?

Yes. Here's a mechanism for escaping JSON names to produce XML tag names: For each illegal character with hex code H (H may be any number of hex digits), encode with two underscores, the H, then one underscore. Starting with two underscores means that a single underscore appearing in a name does not have to be escaped. If two or more underscores appear in sequence, use two underscores followed by the number of desired underscores. Finally, JSON allows an empty name. This can be represented as an XML element with exactly two underscores, which cannot be the result of any other non-empty JSON name.

If this encoding were used, we would not still need the name. Moreover, it would not be handy to have it around unless, as an option, we could say either always generate it or never generate it. If the name attribute is only generated when escaping is needed, then an author is forced to write two kinds of expressions and to know when to write each.
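
A minimal Python sketch of this escaping, to make the proposal concrete. The function name escape_name is invented here, and the test for an "illegal character" is simplified to ASCII (letters or underscore at the start of a name; letters, digits, '-', '.' and underscore afterwards), which is an assumption rather than the real XML name rules. The rule for runs of two or more underscores is left out of this sketch.

```python
import string

# Simplified, ASCII-only approximation of XML name characters
# (an assumption of this sketch, not the full XML definition).
_START_CHARS = set(string.ascii_letters) | {"_"}
_NAME_CHARS = _START_CHARS | set(string.digits) | {"-", "."}


def escape_name(name):
    """Escape a JSON member name so it can be used as an XML element name."""
    if name == "":
        return "__"                     # the empty JSON name
    out = []
    for index, ch in enumerate(name):
        allowed = _START_CHARS if index == 0 else _NAME_CHARS
        if ch in allowed:
            out.append(ch)
        else:
            # Two underscores, the hex code of the character, one underscore.
            out.append("__%X_" % ord(ch))
    return "".join(out)
```

With this, escape_name("*size") gives "__2A_size" and escape_name("") gives "__", while a name such as "size" or "first_name" passes through unchanged.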

What to do about illegal XML characters?

In element names, transliterations such as < becoming "ampersand lt;" are not correct. The name escaping mechanism should be used.

In element content (i.e. in element values that represent JSON values), characters such as \b and \f and other control characters are not XML chars, so they should be omitted. Coming up with an escaping mechanism is problematic because it produces a processing burden to test for it, maintain it, etc. At the least, it should be possible using a MIME parameter or an attribute to indicate whether an escaping mechanism is desired or whether to simply omit illegal chars.

What to do about escaped \n, \t, \r, \b, \f?

As mentioned, \b and \f should be omitted from values due to being non XML Chars.

In a value, both \n and \t should be converted to their literal octet equivalents, i.e. 0x0A and 0x09 respectively.

In a value, \r should be converted to the entity "ampersand #xD;" in the XML serialization so that it is preserved as the literal octet 0x0D when the XML is parsed.
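
The handling of control characters suggested above could look roughly like this in Python. The function names are invented for this sketch; it simply omits characters that fall outside the XML 1.0 Char production, writes a carriage return as a character reference, and leaves tab and newline as literal characters.

```python
def is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def serialize_value(value):
    """Turn an already-unescaped JSON string into XML element content."""
    out = []
    for ch in value:
        if not is_xml_char(ch):
            continue                    # e.g. \b and \f are simply omitted
        if ch == "\r":
            out.append("&#xD;")         # survives XML line-end normalization
        elif ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            out.append(ch)              # includes literal \t and \n
    return "".join(out)
```

So a JSON string containing \b or \f loses those characters, while one ending in \r\n is written as `&#xD;` followed by a literal newline.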

Is @starts the right name? Or should arrays be indicated in a different way?

Probably not; arrays should probably be indicated differently. One problem is how to indicate/preserve an empty array value. Another is how to create a structure for arrays such that a nodeset expression produces the same number of nodes as there are array elements, including zero nodes for an empty array. A third is how to deal with an array as the root of the JSON input. A fourth is how to deal with nested arrays. A fifth problem is how to handle mutations that add array elements before the starting element.

One idea is to put an attribute on each element to indicate it is part of an array, but this puts the onus on the author to remember the attribute and in any case doesn't solve the empty array problem. Another idea is to use a second attribute on the starting array element to indicate that the array is empty, but this doesn't solve the problem of producing zero nodes for an empty array.

Arrays should be treated like objects, i.e. they have a type="array" and they produce a level of depth so that there is one child XML element for each array value. Granted, this would mean a different square bracket expression than in JavaScript, but this is really to be expected because XPath array indexing is 1-based and it does not automatically descend into structure as JavaScript does.

One suggestion for the name of the child element is "item". Another is to use the no-name element name, double underscore. From an XPath referencing perspective, an author would just use /* either way to get to the level down. Here's the company example again with some of these changes:

{"company":"example.com", "size": 50, 
 "location": { "class":"international", "places": ["Amsterdam", "London"]}}

is transformed to

<root type="object">
      <company>example.com</company>
      <size type="number">50</size>
      <location type="object">
           <class>international</class>
           <places type="array">
               <__>Amsterdam</__>
               <__>London</__>
           </places>
      </location>
</root>

allowing references such as

<bind nodeset="location/places/*[1]" ...

Here are other examples based on some of the above suggestions:

JSON	becomes
`{"size": "50"}`	`<root type="object"><size>50</size></root>`
`{"size": 50}`	`<root type="object"><size type="number">50</size></root>`
`{"*size": "50"}`	`<root type="object"><__2A_size>50</__2A_size></root>`
`{"name": {"given": "Isaac", "family": "Newton"}}`	`<root type="object"><name type="object"><given>Isaac</given><family>Newton</family></name></root>`
`{"size": [30, 40]}`	`<root type="object"><size type="array"><__ type="number">30</__><__ type="number">40</__></size></root>`
`{"open": true}`	`<root type="object"><open type="boolean">true</open></root>`
`{"values": null}`	`<root type="object"><values nil="true"/></root>`
`{"values": []}`	`<root type="object"><values type="array"></values></root>`
`{"values": {}}`	`<root type="object"><values type="object"></values></root>`
`{"values": ""}`	`<root type="object"><values></values></root>`
`{"": "generic"}`	`<root type="object"><__>generic</__></root>`
`[[1,2],[3,4]]`	`<root type="array"><__ type="array"><__ type="number">1</__><__ type="number">2</__></__><__ type="array"><__ type="number">3</__><__ type="number">4</__></__></root>`
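
For comparison, here is a rough Python sketch of the revised mapping used in these examples: objects and arrays get a type attribute and a level of depth, array items and the empty name use the element name "__", and null is marked with nil="true". The function name json_to_root is invented for this sketch, and escaping of other illegal name characters (such as the "*" in "*size") is deliberately left out.

```python
import json
import xml.etree.ElementTree as ET


def _fill(elem, value):
    """Fill elem with the representation of one JSON value."""
    if isinstance(value, dict):
        elem.set("type", "object")
        for name, member in value.items():
            # The empty JSON name maps to the no-name element "__".
            _fill(ET.SubElement(elem, name if name else "__"), member)
    elif isinstance(value, list):
        elem.set("type", "array")
        for item in value:
            _fill(ET.SubElement(elem, "__"), item)
    elif isinstance(value, bool):       # bool first: bool is a subclass of int
        elem.set("type", "boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        elem.set("type", "number")
        elem.text = json.dumps(value)
    elif value is None:
        elem.set("nil", "true")
    else:
        elem.text = value               # strings carry no type attribute


def json_to_root(text):
    """Return the XML (as a string) for any JSON value under the revised mapping."""
    root = ET.Element("root")
    _fill(root, json.loads(text))
    return ET.tostring(root, encoding="unicode")
```

Because arrays have their own element, an empty array yields zero child nodes, and both nested arrays and an array at the root of the JSON input fall out of the same recursion.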

Generalization to other serializations

(This section is Informative)

Although XForms doesn't define transformations for other mediatypes, implementors are encouraged to do so. As with JSON, ideally the mapping and serialization should be opaque to the form. For instance, a mapping for iCalendar (to take an example) could look like this:

     BEGIN:VCALENDAR
     METHOD:PUBLISH
     PRODID:-//Example/ExampleCalendarClient//EN
     VERSION:2.0
     BEGIN:VEVENT
     ORGANIZER:mailto:a@example.com
     DTSTART:19970701T200000Z
     DTSTAMP:19970611T190000Z
     SUMMARY:ST. PAUL SAINTS -VS- DULUTH-SUPERIOR DUKES
     UID:0981234-1234234-23@example.com
     END:VEVENT
     END:VCALENDAR

can be transformed to

     <VCALENDAR>
       <METHOD>PUBLISH</METHOD>
       <PRODID>-//Example/ExampleCalendarClient//EN</PRODID>
       <VERSION>2.0</VERSION>
       <VEVENT>
         <ORGANIZER>mailto:a@example.com</ORGANIZER>
         <DTSTART>19970701T200000Z</DTSTART>
         <DTSTAMP>19970611T190000Z</DTSTAMP>
         <SUMMARY>ST. PAUL SAINTS -VS- DULUTH-SUPERIOR DUKES</SUMMARY>
         <UID>0981234-1234234-23@example.com</UID>
       </VEVENT>
     </VCALENDAR>
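
As an informal illustration of how simple such a mapping can be, the following Python sketch turns BEGIN/END blocks into nested elements and every other NAME:VALUE line into a child element. The function name is invented for this sketch, and it assumes well-formed input with no folded lines and no property parameters, neither of which appears in the example above.

```python
import xml.etree.ElementTree as ET


def calendar_text_to_xml(text):
    """Map BEGIN/END blocks to nested elements and NAME:VALUE lines to leaves."""
    root = None
    stack = []                          # currently open BEGIN blocks
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        if name == "BEGIN":
            if stack:
                elem = ET.SubElement(stack[-1], value)
            else:
                elem = ET.Element(value)
                root = elem
            stack.append(elem)
        elif name == "END":
            stack.pop()
        else:
            ET.SubElement(stack[-1], name).text = value
    return ET.tostring(root, encoding="unicode")
```

Splitting at the first colon keeps values such as mailto:a@example.com intact.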

References

D. Crockford, The application/json Media Type for JavaScript Object Notation (JSON) http://www.ietf.org/rfc/rfc4627.txt

Standard ECMA-262, ECMAScript Language Specification, Edition 5.1 (June 2011) http://www.ecma-international.org/publications/standards/Ecma-262.htm

See also: Processing JSON Data (in XSLT 3) http://www.w3.org/TR/xslt-30/#json

http://www.json.org/

Discussion at Lyon FtF

(Owner Steven Steven Pemberton 16:41, 17 November 2010 (UTC))