Efficient XML Interchange Working Group Teleconference -- 05 Jul 2016

EXI4JSON

<taki> DP: If a key name cannot be represented as an NCName, then we just say you should represent it differently. We may need more guidance in the future.

<taki> http://www.w3.org/XML/EXI/docs/json/exi-for-json.html

<taki> DP: What's the next step?

<taki> DP: Do we need to provide any measurement?

<dape> https://www.w3.org/XML/EXI/docs/json/exi-for-json.html#decisionStructureChange

<taki> TK: Did we describe what we changed?

<taki> DP: C.4 describes what we changed.

When ready, I have some questions on Sections 3.1.1 (JSON object) and 3.2.8 (key Elements).

<taki> DP showing appendix D...

<taki> DP: In the new structure, key is an element name.

<taki> DP: It used to be an attribute value.

<taki> DP: This way, we can define a proper schema.

Suggestion for the Appendix D examples: insert rows with a title and brief description so that we can refer to them.

<taki> DB: I suggest describing what each row in appendix D is about.

<taki> DP: We can also describe the change by example in C.4.

<taki> DP explaining the changes by using examples...

I found the terms "keyName" and "keyNumber" a little confusing because they looked like reserved terms with special significance.

scribe: probably because you had "keyname" in 3.1.1 https://www.w3.org/XML/EXI/docs/json/exi-for-json.html#N65938

DP: yes the term "keyname" in 3.1.1 is reserved for that use

<taki> DP: "a number" is an invalid name in XML.

<taki> DP: We use *any* element name (e.g. "A"), and place the real name in "keyname" child element.
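A minimal Python sketch of the encoder-side rule DP describes, assuming the reserved names are exactly those listed in Section 3.2.8 of the draft and that the placeholder element is simply called "A", as in DP's example. The names `key_to_element` and `PLACEHOLDER` are hypothetical; the NCName test follows the XML 1.0 (Fifth Edition) Name production with the colon removed.

```python
import re
from typing import Optional, Tuple

# Reserved EXI4JSON element names, as listed in Section 3.2.8 of the draft.
RESERVED = {
    "map", "array", "string", "number", "boolean", "null", "other",
    "base64Binary", "dateTime", "time", "date", "integer", "decimal", "keyname",
}

# NCName: the XML 1.0 (Fifth Edition) Name production with ':' removed.
_START = (r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_REST = _START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
NCNAME = re.compile(f"[{_START}][{_REST}]*")

PLACEHOLDER = "A"  # hypothetical name taken from DP's example; dropped again on decoding


def key_to_element(key: str) -> Tuple[str, Optional[str]]:
    """Return (element name, j:keyname text or None) for one JSON key."""
    if NCNAME.fullmatch(key) and key not in RESERVED:
        return key, None  # usable directly as the element name
    return PLACEHOLDER, key  # real key travels in a j:keyname child
```

For example, `key_to_element("a number")` returns `("A", "a number")`, while an ordinary key such as `"age"` is used directly as the element name. Reserved words such as `"boolean"` also take the placeholder path even though they are legal NCNames.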

<taki> DB: Is "ANY_NAME" a synonym introduced by author?

<taki> DP: We will need guidance on when to use it.

<taki> DB: It would be better if we had a naming convention for it.

<taki> DP: I agree. We had a discussion.

Perhaps key "1 %" can be "NCNAME_1_%" or "NCNAME_1_escapeCharacter"?

what about <j:structure><j:keyname>a number</j:keyname></j:structure>

scribe: or <j:object><j:keyname>a number</j:keyname></j:object>

extrapolating: <j:object><j:keyname>a number</j:keyname><j:number>s1</j:number></j:object>

<taki> DP: The alternative is to define a convention that can convert between JSON name and corresponding XML name.

<brutzman_> DP: ... because you cannot have an array that includes different content models within it

<brutzman_> if we backtrack to where "a number" was changed to <j:A>, then what appears to be needed is some kind of canonicalization of any key that does not pass NCNAME requirements

<brutzman_> Please confirm that the outer <j:A> (or else <j:a_number>, or whatever canonicalization is used) goes away in the final reconstituted JSON

<brutzman_> DP: yes it goes away on reconstruction when the <j:keyname> element is found [gave example on screen capture]
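The reconstruction step DP describes can be sketched roughly as follows. This assumes (hypothetically) that a key element carries at most one direct `j:keyname` child, that the value appears as a sibling child such as `j:number` (an illustrative layout only), and that the namespace URI below stands in for the real EXI4JSON namespace. If a `j:keyname` child is present its text is the JSON key; otherwise the element's local name is.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:exi4json"  # hypothetical placeholder, not the namespace from the draft
KEYNAME_TAG = f"{{{NS}}}keyname"  # Clark notation: {namespace-uri}local-name


def json_key_for(element: ET.Element) -> str:
    """Recover the original JSON key from a key element."""
    keyname = element.find(KEYNAME_TAG)  # direct children only
    if keyname is not None:
        # The placeholder element name (e.g. "A") is discarded here.
        return keyname.text or ""
    # No j:keyname child: the local part of the element name is the key.
    return element.tag.split("}", 1)[-1]


substituted = ET.fromstring(
    '<j:A xmlns:j="urn:example:exi4json">'
    '<j:keyname>a number</j:keyname><j:number>1</j:number></j:A>'
)
plain = ET.fromstring(
    '<j:age xmlns:j="urn:example:exi4json"><j:number>3</j:number></j:age>'
)
print(json_key_for(substituted))  # a number
print(json_key_for(plain))        # age
```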

<brutzman_> So is the next requirement for consistency to have a unique canonicalization?

<brutzman_> DP: canonicalization not necessary, the naming is syntactic sugar since it goes away anyway

<brutzman_> so the critical aspect is that the substituted name is unique? Is that something tools should have the liberty to choose? That might make parsing harder...

<brutzman_> if a tool is making up names arbitrarily, the chosen values might collide with other keys in the document.

<brutzman_> signature comparison would also be thwarted
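To make the collision concern concrete: JSON allows both "a number" and "a_number" as keys of the same object, so any tool rule that rewrites illegal characters can map two distinct keys to one element name. The rewrite rule below (replace anything outside `[A-Za-z0-9_]` with `_`) is a hypothetical example, not something proposed in the draft.

```python
import re
from typing import Dict, List, Tuple


def naive_substitute(key: str) -> str:
    """Hypothetical rewrite rule: every character outside [A-Za-z0-9_] becomes '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", key)


def find_collisions(keys: List[str]) -> List[Tuple[str, str, str]]:
    """Return (first key, colliding key, shared element name) triples."""
    first_owner: Dict[str, str] = {}
    collisions = []
    for key in keys:
        name = naive_substitute(key)
        owner = first_owner.setdefault(name, key)
        if owner != key:
            collisions.append((owner, key, name))
    return collisions


print(find_collisions(["a number", "a_number", "age"]))
# [('a number', 'a_number', 'a_number')]
```

The collision only shows up at decoding time, when two different keys come back under the same name. The same rule also turns "1 %" into "1__", which is still not an NCName because it starts with a digit.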

<brutzman_> DP: duplication of key names in two forms (non-NCNAME and perhaps canonicalized) reduces compactness

<brutzman_> EXI4JSON document could note that choice of a non-NCNAME key is allowed but can reduce compactness

<brutzman_> Letting a compressor choose alternate names (A, B, C or whatever) also reduces compactness, in addition to possibly resulting in element-name collisions

<brutzman_> wouldn't JSON { "keyname" : 3 } simply map to <j:keyname>3</j:keyname>?

<brutzman_> DP: (showing example) no because "j:keyname" is reserved

<brutzman_> 3.2.8 key Elements https://www.w3.org/XML/EXI/docs/json/exi-for-json.html#N66387 "A key element is any element other than j:map, j:array , j:string , j:number , j:boolean , j:null , j:other, j:base64Binary, j:dateTime, j:time, or j:date, j:integer or j:decimal, j:keyname, and is transformed to ..."

<brutzman_> doesn't this imply that a JSON author can't directly use a reserved keyword like boolean?

<taki> DP: From the encoder's point of view, the name also can't conflict with reserved EXI4JSON element names.

thus keyname substitution is unavoidable, and thus necessary, for this entire EXI4JSON approach to work

so my recommendation is to canonicalize keyname substitution for both the reserved elements and for the non-NCNAME j:keyname elements

I suspect that since j:keyword escaping might require special handling by the EXI compressor and EXI decompressor in any case, perhaps compactness won't be affected if we find a good canonicalization scheme

one naive canonicalization: EXI4JSON_boolean or EXI4JSON_a_number or EXI4JSON_1_escapeCharacter

scribe: which has the benefit of being human readable/deducible

a more formal approach might be to adapt URL canonicalization, if that passes NCNAME requirements.

RFC 3986 Uniform Resource Identifier (URI): Generic Syntax https://tools.ietf.org/html/rfc3986 section 3.1

I don't know that URI scheme is necessarily best, but it is well understood and widely implemented.
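One way to read the RFC 3986 suggestion, sketched under explicit assumptions: percent-encoding (RFC 3986, Section 2.1) cannot be reused verbatim because `%` is not an NCName character. This hypothetical variant uses `_` as the escape character, writes each UTF-8 byte outside `[A-Za-z0-9.-]` as `_HH`, and prefixes every key with `x`, so the result starts with a letter and can never equal one of the reserved names from 3.2.8. The prefix, the escape character, and the function names are all illustrative choices, not part of the draft.

```python
# Hypothetical choices: "x" prefix and "_HH" escapes (RFC 3986 uses "%HH").
PREFIX = "x"
UNESCAPED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-."
)


def escape_key(key: str) -> str:
    """Encode a JSON key as an NCName; every other UTF-8 byte becomes _HH."""
    out = [PREFIX]
    for byte in key.encode("utf-8"):
        out.append(chr(byte) if byte in UNESCAPED else f"_{byte:02X}")
    return "".join(out)


def unescape_key(name: str) -> str:
    """Invert escape_key exactly."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not an escaped key: {name!r}")
    body = name[len(PREFIX):]
    buf = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "_":
            buf.append(int(body[i + 1:i + 3], 16))  # "_HH" -> one byte
            i += 3
        else:
            buf.append(ord(body[i]))  # unescaped ASCII character
            i += 1
    return buf.decode("utf-8")


print(escape_key("1 %"))                # x1_20_25
print(unescape_key(escape_key("1 %")))  # 1 %
```

The mapping is bijective only because every key is escaped: ordinary keys grow too ("age" becomes "xage"), which is the compactness cost DP raised. Escaping only the illegal keys would let a genuine key such as "x1_20_25" collide with the escaped form of "1 %".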

DP: what about forced lowercase?

That is just for hostname, so we would need to skip that part

Another (perhaps even simpler) approach is XML/HTML character entities https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

XML Entity Definitions for Characters (2nd Edition) W3C Recommendation 10 April 2014 http://www.w3.org/TR/xml-entity-names

DP: pointed out some difficulties on his screen capture

I think our main challenge for keyname-substitution canonicalization is uniqueness; compaction will be a lesser factor

perhaps a unique and bidirectional mapping from keys to NCNAME isn't possible? This is always the challenge for escape schemes.

since keyname substitution is required, perhaps we should instead create a string table that complements the schema (without name collisions)?

scribe: in other words, instead of just trying to solve the problem with unique escaping, is there another feature of EXI and XML that we might also apply?

we could define a substitution table in the reserved words, and thus simply make it a lookup when reconstituting JSON. This would let tools pick whatever names they wanted, unambiguously.

a tool could also choose names that were helpful for compaction and not colliding, for example: A B D E G
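A rough sketch of the substitution-table idea, assuming the table travels with the document in some form the group has not yet defined. Keys that are reserved or not NCNames get the next free short name (A, B, C, ...), skipping reserved words and any name a legal key already uses, and decoding becomes a dictionary lookup. The function names and the ASCII-only NCName test are simplifications for illustration.

```python
import itertools
import re
import string
from typing import Dict, Iterator, List

RESERVED = {
    "map", "array", "string", "number", "boolean", "null", "other",
    "base64Binary", "dateTime", "time", "date", "integer", "decimal", "keyname",
}
# Simplified, ASCII-only NCName test; the full production also allows non-ASCII letters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def short_names() -> Iterator[str]:
    """Yield A, B, ..., Z, AA, AB, ... indefinitely."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_uppercase, repeat=length):
            yield "".join(letters)


def build_substitution_table(keys: List[str]) -> Dict[str, str]:
    """Map each substitute element name to the original JSON key it replaces."""
    unique_keys = list(dict.fromkeys(keys))  # de-duplicate, keep first-seen order
    needs_sub = [k for k in unique_keys
                 if k in RESERVED or not NCNAME_ASCII.fullmatch(k)]
    # Names that are off limits: reserved words plus keys kept as-is.
    taken = RESERVED | (set(unique_keys) - set(needs_sub))
    free_names = (n for n in short_names() if n not in taken)
    return {name: key for key, name in zip(needs_sub, free_names)}


def restore_key(element_name: str, table: Dict[str, str]) -> str:
    """Decoding is a plain lookup; names not in the table are real keys."""
    return table.get(element_name, element_name)


table = build_substitution_table(["C", "1 %", "a number", "boolean"])
print(table)  # {'A': '1 %', 'B': 'a number', 'D': 'boolean'}
print(restore_key("D", table), restore_key("C", table))  # boolean C
```

Here "C" is skipped because a real key already owns it, which is one way a tool could end up with a sequence like A B D.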

wondering if there might be yet another variation here: making the EXI4JSON schema a separate namespace, then including a substitution-table mechanism there. This avoids doing two things with one schema.

Separation of the problem that causes name collisions might simplify the algorithm.

A further option might be: "JSON authors who want to take advantage of schema-informed EXI compaction should avoid using the keywords of 3.2.8 and ensure that all keys are legal NCNAME values."

That has the advantage of simplicity, for end users and for implementations.
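If the group adopts that guidance, a tool could warn authors up front. A minimal sketch of such a check, assuming the Section 3.2.8 reserved list and (as a simplification) an ASCII-only NCName test; `problem_keys` and the `$.a[0]` path notation are illustrative, not part of the draft.

```python
import json
import re
from typing import Any, Iterator, Tuple

RESERVED = {
    "map", "array", "string", "number", "boolean", "null", "other",
    "base64Binary", "dateTime", "time", "date", "integer", "decimal", "keyname",
}
# Simplified, ASCII-only NCName test.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def problem_keys(value: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path of enclosing object, key) for keys the guidance would flag."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key in RESERVED or not NCNAME_ASCII.fullmatch(key):
                yield path, key
            yield from problem_keys(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from problem_keys(child, f"{path}[{index}]")


doc = json.loads(
    '{"name": "x", "1 %": 2, "inner": {"boolean": true, "ok": [{"a number": null}]}}'
)
for where, key in problem_keys(doc):
    print(f"{where}: {key!r}")
# $: '1 %'
# $.inner: 'boolean'
# $.inner.ok[0]: 'a number'
```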

Problem cases we are worried about (keys such as "1 %") are perhaps not carefully designed object structures anyway...

old joke. Patient says: "Doctor, doctor, my arm hurts when I move it like this...."

scribe: Doctor: "OK, so don't move your arm like that. Now please pay the nurse $100."
