IRC log of exi on 2016-08-02

Timestamps are in UTC.

14:02:23 [RRSAgent]
RRSAgent has joined #exi
14:02:23 [RRSAgent]
logging to http://www.w3.org/2016/08/02-exi-irc
14:02:25 [trackbot]
RRSAgent, make logs public
14:02:25 [Zakim]
Zakim has joined #exi
14:02:27 [trackbot]
Zakim, this will be EXIWG
14:02:27 [Zakim]
ok, trackbot
14:02:28 [trackbot]
Meeting: Efficient XML Interchange Working Group Teleconference
14:02:28 [trackbot]
Date: 02 August 2016
14:04:56 [brutzman]
brutzman has joined #exi
14:05:37 [dape]
dape has joined #exi
14:08:52 [taki]
scribe: TK
14:08:58 [taki]
scribeNick: taki
14:09:11 [taki]
TOPIC: EXI4JSON
14:09:24 [taki]
TOPIC: Character escaping method to represent JSON names in XML names
14:10:27 [taki]
TK: DB and DP commented on TK's proposal.
14:11:02 [taki]
DP: I think we could just use the (decimal) Unicode number.
14:11:48 [taki]
DB: I liked the numeric form as well.
14:12:37 [taki]
DP: The alternative is that we state explicitly whether people need to use upper or lower case.
14:12:57 [brutzman]
The compacted size is the same for numeric values. However, there is a drawback: in-memory size for processing is bigger because the numeric strings are longer.
14:14:37 [brutzman]
Using a regular expression (regex), for example, to convert upper to lower case is quite fast and efficient.
14:16:14 [brutzman]
The case of special interest here is when the (perhaps very large) numeric array is in string form, getting validated as XML just prior to EXI compression.
14:17:09 [brutzman]
In comparing the alternatives, I expect that hex Unicode is simplest and best.
14:17:37 [brutzman]
DP: test comparisons could show whether there is an impact; I don't think there will be much difference.
14:18:24 [taki]
DP: We do have another situation. How do we express names when they are equal to one of the predefined ones?
14:18:37 [brutzman]
DP: of note, this technique is only applicable to keys, not values.
14:19:13 [brutzman]
Aha then, that greatly reduces (eliminates) the potential of really long strings.
14:20:06 [brutzman]
Perhaps we could look at what is common; likely hex
14:20:26 [brutzman]
s/common/most common/
14:21:35 [brutzman]
lower-case characters are smaller than upper-case hex characters 8)
14:28:23 [brutzman]
Since you two are implementers, you are welcome to decide hex/numeric later. The general concept of escaping seems sound.
14:28:55 [brutzman]
s/escaping/escaping with underscores + unicode value/
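A minimal sketch of "escaping with underscores + Unicode value", written to compare the decimal and hex variants being debated. It assumes each escaped character becomes `_` + code point + a `.` terminator, and it uses a deliberately conservative, hypothetical rule for which characters may stay as-is (not the full XML NCName grammar). The function names `escape_key` and `unescape_key` are illustrative only.

```python
import re

# Hypothetical, conservative rule (NOT the full XML NCName grammar):
# keep ASCII letters, ASCII digits, '-' and '.'; escape everything else,
# including '_' itself, because '_' introduces an escape sequence.
_KEEP = re.compile(r"[A-Za-z0-9.\-]")


def escape_key(key: str, radix: int = 10) -> str:
    """Escape a JSON key as an XML-safe name using '_' + code point + '.'."""
    fmt = "d" if radix == 10 else "x"  # decimal ("numeric") or lower-case hex
    out = []
    for i, ch in enumerate(key):
        # A name may not start with a digit, '-' or '.', so at position 0
        # only letters are kept unchanged.
        if _KEEP.fullmatch(ch) and (i > 0 or ch.isalpha()):
            out.append(ch)
        else:
            out.append("_" + format(ord(ch), fmt) + ".")
    return "".join(out)


def unescape_key(name: str, radix: int = 10) -> str:
    """Reverse escape_key: replace every '_<code point>.' with its character."""
    digits = "0-9" if radix == 10 else "0-9a-f"
    pattern = re.compile("_([" + digits + "]+)\\.")
    return pattern.sub(lambda m: chr(int(m.group(1), radix)), name)
```

For example, `escape_key("first name")` returns `"first_32.name"`, while `escape_key("first name", 16)` returns `"first_20.name"`. The two variants differ only in how the code point is formatted; fixing hex output to lower case is one way to sidestep the upper/lower case question raised above.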
14:29:10 [dape]
{"number": 123} would become <._number> to indicate that it is a special name that cannot be used
14:30:38 [dape]
s/<._number>/<_.number>
14:32:18 [taki]
DB: By going numeric, we could differentiate them better.
14:32:48 [brutzman]
The two default names that have a hex character as their first character are j:array and j:boolean.
14:34:08 [dape]
<map><number><number>123</number></number></map>
14:34:23 [brutzman]
... so those might need some disambiguation from hex Unicode when parsing, e.g. the escaped special name _boolean versus the escaped Unicode value _b123.
14:35:15 [dape]
<map><_.number><number>123</number></_.number></map>
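The `_.` prefix in the example above can be sketched as follows. The set of reserved names is an assumption taken from the names mentioned in this discussion (the draft may define others), the helper names are hypothetical, and the sketch is only unambiguous if literal underscores in keys are escaped separately, as in the underscore + Unicode value scheme.

```python
# Assumed set of EXI4JSON element names a JSON key could clash with; these are
# the names mentioned in this discussion, and the draft may define others.
RESERVED = {"map", "array", "string", "number", "boolean", "null"}


def to_element_name(key: str) -> str:
    """Prefix a key with '_.' when it equals a reserved EXI4JSON name."""
    return "_." + key if key in RESERVED else key


def from_element_name(name: str) -> str:
    """Strip the '_.' prefix again when decoding."""
    return name[2:] if name.startswith("_.") else name


def encode_number_entry(key: str, value: int) -> str:
    """Render a one-entry object such as {"number": 123} in the XML form above."""
    name = to_element_name(key)
    return f"<map><{name}><number>{value}</number></{name}></map>"
```

`encode_number_entry("number", 123)` reproduces `<map><_.number><number>123</number></_.number></map>`. Because `.` is neither a decimal nor a hex digit, a decoder can tell `_.boolean` apart from an escaped code point such as `_b123.` under these assumptions, which addresses the disambiguation concern for j:array and j:boolean.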
14:36:59 [brutzman]
Good example. Agreed that this case, where a key name uses a reserved word, is important for the escaping to always work.
14:40:30 [taki]
TOPIC: Flag to indicate the need of un-escaping
14:41:20 [taki]
DP: The flag would help decoders know whether they need to cope with un-escaping or not.
14:42:10 [taki]
DB: It is a good concept to consider.
14:43:08 [brutzman]
We seem to have a good escaping mechanism now (unambiguous and complete). So, is the un-escaping flag a requirement or a hint?
14:44:15 [brutzman]
It seems safer not to have a flag that might have an incorrect value, which could break the decoding; however, if it is a hint that speeds up decompression, then that is worth considering.
14:44:59 [taki]
DB: If it is a hint, that is good. If not, it may cause inconsistency.
14:45:06 [brutzman]
Software engineering principle: DRY (don't repeat yourself).
14:45:40 [brutzman]
https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
14:46:39 [brutzman]
A hint value seems fine if performance can be improved, but a default value or a mistaken value should not cause harm.
14:48:41 [brutzman]
It is a curious situation. Most JSON keys will not need escaping, so the default value of a hint would likely be to not perform un-escaping. However, if an escaped key existed, it would have to be checked anyway.
14:49:27 [brutzman]
... so the hint might not be relevant, if a parser always has to check for escaping regardless of the value of the hint.
14:50:30 [brutzman]
we know that an escaping capability is essential for key names
14:50:55 [brutzman]
DP: the escaped character might occur anywhere in the name, not just as the first character.
14:52:27 [taki]
DP: Always-unescaping may not be a lot of overhead in implementations.
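One way to see why always un-escaping may be cheap: every escape sequence starts with `_`, so a decoder can return a name untouched after a single scan when no `_` is present, with no flag involved. This is a sketch under the assumptions of `_` + decimal code point + `.` for escaped characters and a `_.` prefix for reserved names; `decode_key` is a hypothetical name.

```python
import re

# Assumed escape form: '_' + decimal code point + '.', plus a '_.' prefix
# for keys that clash with reserved EXI4JSON names.
_ESCAPE = re.compile(r"_([0-9]+)\.")


def decode_key(name: str) -> str:
    """Always un-escape, but return early when the name contains no '_'."""
    # Fast path: every escape starts with '_', so a name without one is
    # returned unchanged after one scan; no hint flag is needed.
    if "_" not in name:
        return name
    if name.startswith("_."):
        name = name[2:]  # reserved-name prefix
    # Escapes may occur anywhere in the name, not only at the first character.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1))), name)
```

With this shape, the cost for the common case (no escaping) is one substring search per key, which is roughly what a hint check would save.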
14:52:51 [brutzman]
If it were a requirement instead of a hint, then we might avoid using a default value, to ensure it is considered correctly. However, this approach has several issues that make it less desirable: possible inconsistency, and requiring an implementation to define the value.
14:53:13 [brutzman]
s/define the value/define the value of a hint/
14:53:51 [brutzman]
(discussion of tradeoffs)
14:54:29 [brutzman]
it is not difficult for an encoder to set this value; either escaping occurred or not.
14:57:22 [brutzman]
The potential performance impact could be measured by parsing a large unmodified EXI4JSON document and comparing escape checking against no escape checking.
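The measurement suggested above could start as a micro-benchmark like the one below, which times "always check for escapes" against "trust a hint that says nothing is escaped" over synthetic keys that need no escaping. The key set, the escape form (`_` + decimal code point + `.`), and the function names are all assumptions; Python timings only indicate the shape of the comparison, and a real answer would come from running an actual EXI4JSON implementation on a large document.

```python
import re
import timeit

# Assumed escape form: '_' + decimal code point + '.'.
_ESCAPE = re.compile(r"_([0-9]+)\.")


def decode_key(name: str) -> str:
    """Un-escape a key, returning early when it contains no '_'."""
    if "_" not in name:
        return name
    return _ESCAPE.sub(lambda m: chr(int(m.group(1))), name)


def benchmark(n_keys: int = 10_000, repeat: int = 20) -> tuple[float, float]:
    """Return (seconds with escape checking, seconds trusting a 'no escaping' hint)."""
    # Synthetic keys for an "unmodified" document: none of them needs escaping.
    keys = [f"key{i}" for i in range(n_keys)]
    assert all(decode_key(k) == k for k in keys)  # sanity check
    always_check = timeit.timeit(lambda: [decode_key(k) for k in keys], number=repeat)
    trust_hint = timeit.timeit(lambda: list(keys), number=repeat)
    return always_check, trust_hint


if __name__ == "__main__":
    checked, hinted = benchmark()
    print(f"always check: {checked:.4f} s, trust hint: {hinted:.4f} s")
```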
14:58:48 [taki]
DP: I think there is overhead. If we introduce flags, flags also introduce overhead.
14:59:43 [brutzman]
If the overhead of checking escaping on a character-by-character basis for each key is considered nontrivial, then that would lead us to require the encoder to set the value (which has no performance cost).
15:01:22 [brutzman]
TK: should we revisit the topic later?
15:02:11 [brutzman]
the analysis seems good here, a performance test will likely give us the answer pretty easily
15:07:41 [taki]
http://www.w3.org/2005/06/tracker/exi/products/15
15:07:57 [brutzman]
(discussion of readiness to release next draft after reconciling today's progress)
15:10:36 [brutzman]
confirmation: today's discussion relates to ISSUE-116 and ISSUE-117 (and mostly resolves them)
15:11:10 [brutzman]
http://www.w3.org/2005/06/tracker/exi/issues/116 JSON keys invalid as XML Names
15:11:54 [brutzman]
http://www.w3.org/2005/06/tracker/exi/issues/117 Name Clash between JSON key names and names used by EXI4JSON
15:13:40 [taki]
TOPIC: Canonical EXI
15:14:01 [taki]
TOPIC: Communicating EXI-C14 options
15:14:19 [brutzman]
DP: will consider today's discussion, update draft, review remaining issues and recommend to group whether to publish next EXI4JSON draft
15:16:06 [taki]
DP: Option 2: a fragment identifier may not be necessary.
15:16:36 [taki]
DP: Option 3 is no longer feasible, since we now have more information that cannot be represented in the EXI options document.
15:20:15 [taki]
DP: If we count in bytes, there is only sometimes a 1-byte difference.
15:23:25 [taki]
DP: Elements are more consistent with how the rest of the options are described.
15:24:02 [brutzman]
sent reply on member list (not yet archived) expressing that consistency is important
15:28:48 [brutzman]
Inconsistent = inefficient (and a source of error), so elements seem like the appropriate choice.
15:30:35 [brutzman]
suggested topic for upcoming call: can we write a paper together for WWW2017 on application of EXI to Open Web Architecture
15:30:49 [brutzman]
suggested topic for upcoming call: future work
15:31:55 [brutzman]
question: how amenable is EXI to insertion of binary data block (BDATA perhaps) similar to CDATA? The X3D binary encoding has a use case.
15:32:38 [caribou]
it reminds me of representation maps...
15:32:49 [brutzman]
... presumably defining byte length of a follow-on block isn't too difficult
15:35:16 [taki]
rrsagent, create minutes
15:35:16 [RRSAgent]
I have made the request to generate http://www.w3.org/2016/08/02-exi-minutes.html taki
16:26:37 [Zakim]
Zakim has left #exi
18:17:06 [liam]
liam has joined #exi