Efficient XML Interchange Working Group Teleconference -- 02 Aug 2016

<scribe> scribe: TK

<scribe> scribeNick: taki

EXI4JSON

Character escaping method to represent JSON names in XML names

TK: DB and DP commented on TK's proposal.

DP: I think we could just use the Unicode number.

DB: I liked the numeric form as well.

DP: The alternative is to say explicitly that people need to use upper or lower case.

<brutzman> the compacted size is the same for numeric values. however there is a drawback: in-memory size for processing is bigger because the numeric strings are bigger.

<brutzman> Using regular expression (regex) for example to convert upper to lower case is quite fast and efficient.
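A minimal sketch of the regex-based case conversion mentioned above. The escape syntax ("_" + hex code point + ".") and the name normalize_hex_case are assumptions made for illustration; the group had not settled on a syntax at this point.

```python
import re

# Hypothetical escape syntax: "_" + hex code point + "."
HEX_ESCAPE = re.compile(r"_([0-9A-Fa-f]+)\.")


def normalize_hex_case(name: str) -> str:
    """Lower-case the hex digits inside each escape, leaving other text alone."""
    return HEX_ESCAPE.sub(lambda m: m.group(0).lower(), name)
```

Only the escape sequences are rewritten, so the case of the surrounding key text is preserved.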

<brutzman> The case of special interest here is when the (perhaps very large) numeric array is in string form getting validated as XML, just prior to EXI compression.

<brutzman> In comparing the alternatives, i expect that hex unicode is simplest and best.

<brutzman> DP: test comparisons could show whether there is an impact, don't think that there will be much difference.

DP: We do have another situation. How do we express names when they are equal to one of the predefined ones?

<brutzman> DP: of note is that this technique is only applicable to keys, not values

<brutzman> Aha then, that greatly reduces (eliminates) the potential of really long strings.

<brutzman> Perhaps we could look at what is most common; likely hex

<brutzman> lower-case characters are smaller than upper-case hex characters 8)

<brutzman> since you two are implementers, you are welcome to decide hex/numeric later. the general concept of escaping with underscores + unicode value seems sound.
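The general concept (underscore plus Unicode value) could be sketched as follows. Everything specific here is an assumption for illustration: the trailing "." terminator, the function names escape_key and unescape_key, and the simplified rule that only ASCII letters, digits, "-" and "." count as legal name characters. The radix parameter reflects that hex versus numeric was left open.

```python
import re

# Hypothetical escape syntax for this sketch: "_" + code point + "."
# Simplification: only ASCII letters, digits, "-" and "." are treated as
# legal name characters; real XML Names allow many more.
_START_OK = re.compile(r"[A-Za-z]")
_REST_OK = re.compile(r"[A-Za-z0-9.\-]")


def escape_key(key: str, radix: int = 16) -> str:
    """Turn a JSON key into a string that is safe under the rule above."""
    out = []
    for i, ch in enumerate(key):
        ok = _START_OK if i == 0 else _REST_OK
        if ok.fullmatch(ch):
            out.append(ch)
        else:
            # "_" itself is escaped too, so every "_" in the output
            # marks the start of an escape sequence.
            code = format(ord(ch), "x") if radix == 16 else str(ord(ch))
            out.append(f"_{code}.")
    return "".join(out)


def unescape_key(name: str, radix: int = 16) -> str:
    """Reverse escape_key for the same radix."""
    digits = "[0-9a-f]+" if radix == 16 else "[0-9]+"
    return re.sub(
        rf"_({digits})\.",
        lambda m: chr(int(m.group(1), radix)),
        name,
    )
```

Under these assumptions, escape_key("a b") gives "a_20.b" in hex and "a_32.b" in decimal, and unescape_key restores the original key in both cases.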

<dape> {"number": 123} would become <_.number> to indicate that it is a special name that cannot be used

DB: By going numeric, we could differentiate them better.

<brutzman> the two defaults that have a hex character as first character are j:array and j:boolean

<brutzman> ... so those might need some disambiguation from hex unicode when parsing, e.g. escaped special name _boolean versus escaped unicode value _b123

<brutzman> good example. agreed this case - where a key name uses a reserved word - is important for the escaping to always work.
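A sketch of how the reserved-name case might be kept apart from escaped code points. The "_." prefix follows the <_.number> example above; the contents of RESERVED and the function names are assumptions, not taken from a draft. Because "." is not a hex digit, a name starting with "_." cannot be mistaken for an escape such as _b123.

```python
# Assumed set of names reserved by EXI4JSON for this sketch.
RESERVED = {"map", "array", "string", "number", "boolean", "null", "other"}
RESERVED_PREFIX = "_."  # taken from the <_.number> example


def protect_reserved(key: str) -> str:
    """Prefix a key that clashes with a reserved name; pass others through."""
    return RESERVED_PREFIX + key if key in RESERVED else key


def classify(name: str) -> str:
    """Tell the three cases apart by looking at the character after '_'."""
    if name.startswith(RESERVED_PREFIX):
        return "reserved-name key"
    if name.startswith("_"):
        return "escaped code point"
    return "plain"
```

With this rule, "_.boolean" is classified as a reserved-name key while "_b123." is classified as an escaped code point.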

Flag to indicate the need of un-escaping

DP: Helping decoders to know whether they need to cope with un-escaping or not.

DB: It is a good concept to consider.

<brutzman> We seem to have a good escaping mechanism now (unambiguous and complete). So, is the un-escaping flag a requirement or a hint?

<brutzman> it seems safer to not have a flag that might have an incorrect value which can break the decoding; however if it is a hint that speeds up decompression then that is worth considering

DB: If it is a hint, that is good. If not, it may cause inconsistency.

<brutzman> software engineering principle: DRY don't repeat yourself

<brutzman> https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

<brutzman> hint value seems fine if performance can be improved, but default value and mistaken value should not cause harm

<brutzman> it is a curious situation. most JSON keys will not need escaping, so the default value of a hint would likely be to not perform escaping. however if an escaped key existed, it would have to be checked anyway.

<brutzman> ... so the hint might not be relevant, if a parser always has to check for escaping regardless of the value of the hint.

<brutzman> we know that an escaping capability is essential for key names

<brutzman> DP: the escaping might be a contained character, not just the first character.

DP: Always-unescaping may not be a lot of overhead in implementations.

<brutzman> if it is a requirement instead of a hint, then we might avoid using a default value to ensure it is considered correctly. however this approach has several issues making it less desirable: possible inconsistency, and requiring an implementation to define the value of the flag.

<brutzman> (discussion of tradeoffs)

<brutzman> it is not difficult for an encoder to set this value; either escaping occurred or not.
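A sketch of an encoder setting such a flag, assuming the escaping function is supplied by the caller. The names encode_keys and decode_keys, and the flag being a single boolean per document, are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def encode_keys(
    keys: Iterable[str], escape: Callable[[str], str]
) -> Tuple[List[str], bool]:
    """Return the encoded names plus a flag saying whether any key changed."""
    originals = list(keys)
    names = [escape(k) for k in originals]
    escaped = any(n != k for n, k in zip(names, originals))
    return names, escaped


def decode_keys(
    names: Iterable[str], unescape: Callable[[str], str], escaped_hint: bool
) -> List[str]:
    """Skip un-escaping entirely when the hint says nothing was escaped."""
    if not escaped_hint:
        return list(names)
    return [unescape(n) for n in names]
```

The sketch also shows the risk raised in the discussion: if the flag is wrongly False, decode_keys returns the escaped names unchanged, so a mistaken value does cause harm unless the decoder checks anyway.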

<brutzman> potential performance impact could be measured on parsing a large unmodified EXI4JSON document by comparing differences between escape checking and no escape checking

DP: I think there is overhead. If we introduce flags, flags also introduce overhead.

<brutzman> if the overhead of checking escaping on a character-by-character basis for each key is considered nontrivial, then that would lead us to requiring the encoder to set the value (which has no performance cost)

<brutzman> TK: should we revisit the topic later?

<brutzman> the analysis seems good here, a performance test will likely give us the answer pretty easily
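One possible shape for such a performance test, as a sketch only. It times un-escaping a list of key names against passing them through untouched. The escape syntax in ESCAPE and the function names are assumptions, and a real test would run inside an EXI4JSON decoder rather than on a bare list of strings.

```python
import re
import timeit

ESCAPE = re.compile(r"_([0-9a-f]+)\.")  # hypothetical escape syntax


def with_check(names):
    """Scan every name for escapes, replacing any that are found."""
    return [ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), n) for n in names]


def without_check(names):
    """Baseline: copy the names without looking at them."""
    return list(names)


def compare(names, repeat=5):
    """Return (seconds with check, seconds without), best of `repeat` passes."""
    t_check = min(timeit.repeat(lambda: with_check(names), number=1, repeat=repeat))
    t_plain = min(timeit.repeat(lambda: without_check(names), number=1, repeat=repeat))
    return t_check, t_plain
```

Running compare on the key names of a large document that needed no escaping would give the two timings whose difference is the overhead under discussion.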

http://www.w3.org/2005/06/tracker/exi/products/15

<brutzman> (discussion of readiness to release next draft after reconciling today's progress)

<brutzman> confirmation: today's discussion relates to ISSUE-116 and ISSUE-117 (and mostly resolves them)

<brutzman> http://www.w3.org/2005/06/tracker/exi/issues/116 JSON keys invalid as XML Names

<brutzman> http://www.w3.org/2005/06/tracker/exi/issues/117 Name Clash between JSON key names and names used by EXI4JSON

Canonical EXI

Communicating EXI-C14 options

<brutzman> DP: will consider today's discussion, update draft, review remaining issues and recommend to group whether to publish next EXI4JSON draft

DP: Option 2, fragment identifier may not be necessary.
... Option 3 is not feasible any more since we have more information that cannot be represented in the EXI options document.
... If we count in bytes, there is only sometimes a 1-byte difference.
... elements are more consistent with how the rest of options are described.

<brutzman> sent reply on member list (not yet archived) expressing that consistency is important

<brutzman> inconsistent = inefficient (and a source of error) so elements seem like the appropriate choice

<brutzman> suggested topic for upcoming call: can we write a paper together for WWW2017 on application of EXI to Open Web Architecture

<brutzman> suggested topic for upcoming call: future work

<brutzman> question: how amenable is EXI to insertion of a binary data block (BDATA perhaps) similar to CDATA? The X3D binary encoding has a use case.

<caribou> it reminds me of representation maps...

<brutzman> ... presumably defining byte length of a follow-on block isn't too difficult

- DRAFT -
