Efficient XML Interchange Working Group F2F -- 26-27 Oct 2015

<trackbot> Date: 26 October 2015

<scribe> scribe: TK

<scribe> scribeNick: taki

DP: Value string handling.
... strings are well-known. Many of them in WoT, for example.
... For small documents, it is a problem.
... We do not know whether and when EXI 2 happens.
... We could do it now without a lot of effort.
... I would tell them to create enumeration of string values in schema.
... e.g. enumeration of "Audi", "Golf", "BMW"
... The drawback is that you have to modify the schema.
... Extending a schema is much better than modifying it.
... EXI is based on schema knowledge only. We should keep it. More dependency is not good.
... We could define a document that defines shared string values.
... How the string table gets created.
... Define a new EXI datatype. (exi:sharedStringValues)
... Use DTR to map string to sharedStringValues
... WoT uses JSON-LD
... There is an issue. Lots of pre-known vocabulary.
... JSON-LD 1464 bytes, CBOR 807, JSON EXI 523, EXI with shared string values 283 bytes
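
A minimal sketch of the idea DP describes above, using a toy model rather than the EXI 1.0 string table algorithm: a value table pre-populated from a shared list lets even the first occurrence of a known string be sent as a small index. The class name `ValueStringTable`, the bit-cost model, and the sample values are illustrative assumptions; `exi:sharedStringValues` is only a proposal in this discussion.

```python
import math


class ValueStringTable:
    """Toy value table, optionally pre-populated with shared strings (hypothetical)."""

    def __init__(self, shared_values=()):
        self.values = []   # index -> string
        self.index = {}    # string -> index
        for value in shared_values:
            self._add(value)

    def _add(self, value):
        if value not in self.index:
            self.index[value] = len(self.values)
            self.values.append(value)

    def encode(self, value):
        """Return ("hit", id, bits) for a known string, else ("miss", value, bits)."""
        if value in self.index:
            n = len(self.values)
            # Simple cost model: an index over n entries needs ceil(log2(n)) bits.
            bits = math.ceil(math.log2(n)) if n > 1 else 0
            return ("hit", self.index[value], bits)
        self._add(value)  # learn the string so later occurrences become hits
        # Rough cost of the literal: UTF-8 bytes only, length prefix ignored.
        return ("miss", value, 8 * len(value.encode("utf-8")))


shared = ValueStringTable(["Audi", "Golf", "BMW"])
plain = ValueStringTable()
print(shared.encode("Golf"))  # known up front, sent as an index
print(plain.encode("Golf"))   # first occurrence, sent as a literal
```

In a small document, where most strings occur only once, the pre-populated table is what turns literal strings into indexes.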

YF: If you define enumerations with lots of values, they get populated. I wonder whether it is a good idea to put all enumerations in the schema into the table.
... Maybe we should have had it in EXI 1.0
... Can we have shared values in the header?
... sharedStringValues element can have value?

DP: Those from schema and those from header should be handled separately.

YF: In schema-informed, we have DTR.
... In schema-less, we don't have sharedStringValues.

DP: schema-less support will be in EXI2.

TK: Will we have a REC for this?

DP: No. Just a NOTE.

YF: We can leave open how schema-less case is supported.

DP: We should check with Carine if we can produce a Note for this.

YF: We should check whether there is no negative impact.

DP: Are there any schema constructs other than enumeration?

DP: I am going to add better "union" support to EXI2

YF: If we can have a way to always fall back to string, that would be better.

TK: Do we have a resolution to start working on the Shared String Working Note?

DP: Let's check with CB about procedure.
... Do we have a template for Note?

<scribe> ACTION: DP to convert Shared Strings idea slides into document format. [recorded in http://www.w3.org/2015/10/26-exi-irc]

<trackbot> Created ACTION-728 - Convert shared strings idea slides into document format. [on Daniel Peintner - due 2015-11-02].

DP: JAXB may have an issue with union.
... JAXB uses java string for union typed data.
... We can still plug in EXI via SAX events. JAXB has special treatment for FastInfoset.

Open Source

DP: EXIficient will be changed to MIT-license based in a few weeks.
... I use JSON-based EXI grammar for YF's JS implementation.

DP demoing XML/EXI processing in browser based on JavaScript-based EXIficient.

DP: I re-wrote EXIficient in JavaScript.

YF: You can use JS minifier.

DP demoing JSON/EXI processing in browser.

YF: SharedString implemented in JS as well?

DP: Currently implemented in neither JS nor Java.

YF: Do you produce text while decoding?

DP: I use SAX-like events, and give to handler.
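
A small sketch of the "SAX-like events handed to a handler" pattern DP mentions, under the assumption that Python's standard `xml.sax` text parser can stand in for an EXI decoder as the event source. The handler class and the event labels `SE`, `CH`, `EE` are illustrative and are not taken from EXIficient.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Collects events instead of building a text or tree representation."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("SE", name))

    def characters(self, content):
        # A parser may deliver character data in several chunks.
        if content.strip():
            self.events.append(("CH", content))

    def endElement(self, name):
        self.events.append(("EE", name))


handler = RecordingHandler()
# The XML text parser plays the role of the decoder here (assumption).
xml.sax.parseString(b"<car><brand>Audi</brand></car>", handler)
for event in handler.events:
    print(event)
```

No intermediate text is produced; the application decides in the handler what to do with each event.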

YF: I am interested in the speed, compared with a JSON parser.
... You need to prepare your code for making optimization possible.

DP: We plan to publish on the mailing list.

<liam> [ https://www.w3.org/XML/Group/qtspecs/specifications/xpath-functions-31/html/Overview.html#json-to-xml-mapping ]

JSON schema

DP: I noticed some differences between latest and the one we use.
... The latest one has anyAttributes all over the place.

LQ: It increases entropy.

DP: Yes, but it does not affect much.

LQ: The more people use the same schema, the better.

DP: We have not gotten any response so far for the request we have made.

Johannes: Can you define extended schema yourself?

LQ: We can send it in to the public list.

<liam> [ public-xsl-query ]

Sebastian: There may be an issue in distinguishing float from double.

LQ: We will know in a few weeks whether the request will be met or not because the spec is supposed to become CR soon.

DP: We could use type cast, but the issue was type cast is costly.

LQ: Regarding the difference between REC and Note, one thing important is patent.

DP: The Shared String extension will be a short document of just a few pages. Starting as a Note should be appropriate.

<scribe> ACTION: TK to ask Liam to write a blog article about JSON support when the work is done. [recorded in http://www.w3.org/2015/10/26-exi-irc]

<trackbot> Created ACTION-729 - Ask liam to write a blog article about json support when the work is done. [on Takuki Kamiya - due 2015-11-02].

Sebastian: It is a bit confusing to name EXI as "xml"-based when it is about to support JSON.

LQ: CSS support. If EXI can be used for multiple formats, it makes EXI more useful.

DP: In WoT, the motivation is for code footprint.

LQ: How much it would help in browser, I don't know.
... I made it optional whether JSON support should be REC or not.

<AxelPolleres> Hi, I’m an observer here at #TPAC today… we did a W3C member submission a while ago for a binary compressed serialization for RDF: http://www.w3.org/Submission/2011/03/… wanted to see where EXI is and what/whether we can learn from it.

<scribe> scribe: TK

<scribe> scribeNick: taki

Recapping yesterday's discussion...

AP: Do you preserve order in JSON support?

DP: We keep the order.

AP: We can compress further if we can reorder items.

DP: In our experiments, EXI with optimized schema compressed better than SHDT in many cases.
... You may want to talk to WoT's Thing Description task force.

Canonical EXI

YF: It looks like JS has his priority use case.

<dape> https://lists.w3.org/Archives/Public/public-exi-comments/2015Oct/0008.html

TK: JS wants no-timezone normalization option.

https://lists.w3.org/Archives/Public/public-exi-comments/2015Sep/0001.html

DP: XML already does timezone normalization, right?

YF: Canonical XML doesn't do it.

DP: If you go through binding layer, timezone may be normalized.

YF: No normalization should be the default, and it should be OK.

DP: I fear that it will break if binding layer changes timezone.
... If the default is the other way around, it is fine.
... If we remove this option, then we have only two variations

RESOLUTION: We provide only two variations for Canonical EXI. One with header options, the other without header options.
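
To make the timezone concern above concrete, here is a hedged sketch of what "timezone normalization" does to a dateTime value: the instant stays the same but the lexical form changes, so two documents that differ only in this step no longer match byte for byte. The function name and the sample value are assumptions for illustration, and Python's `datetime` stands in for whatever binding layer is in use.

```python
from datetime import datetime, timezone


def normalize_to_utc(lexical: str) -> str:
    """Re-express a timezone-qualified dateTime in UTC (hypothetical helper).

    Note: a value without a timezone would be interpreted as local time
    by astimezone(), so this sketch only covers timezone-qualified input.
    """
    return datetime.fromisoformat(lexical).astimezone(timezone.utc).isoformat()


original = "2015-10-26T09:00:00+09:00"
normalized = normalize_to_utc(original)

print(original)
print(normalized)
# Same point in time, different characters on the wire.
print(datetime.fromisoformat(original) == datetime.fromisoformat(normalized))
print(original == normalized)
```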

EXI2

<dape> TK: Charter asks for EXI2

<dape> ... we started a collections of ideas for future EXI version, https://www.w3.org/XML/EXI/wiki/EXI2

<dape> YF: Depending on the "new" features we might decide whether to add it as an extension to EXI1 and later merge it to EXI2

<dape> ... for example shared string tables

<dape> ... how about real JSON schema

<dape> DP: what do you mean by JSON schema? XML schema or "real" JSON schema

<dape> YF: can be any... in the end there will be grammars

<dape> .. knowing you have X elements in this JSON object and other Y in other JSON object

<dape> DP: You could do that already now by defining such an XML schema

<dape> YF: True, but it is not in the scope of current JSON proposal. Would give better compaction

<dape> DP: Correct. However, string tables already help a lot... structure knowledge helps further.. for sure

<dape> ... but is more complex also..

<dape> YF: In JavaScript world you could just use dedicated JSON grammars

<dape> DP: requires "shared" and common grammar system

<dape> DP: dedicated XML schema mapping may cause issues (having the same element with different type in current mapping), see http://www.w3.org/TR/xmlschema-1/#cos-element-consistent

<dape> YF: w.r.t. JSON mapping we could also define grammar representation

<dape> DP: I think we can start note very simple and add such a JSON grammar definition later

<dape> DP/YF: keep the idea in mind (add note to the document in the beginning to not forget about it)

<dape> TK: https://www.w3.org/XML/EXI/wiki/EXI2#.28Improvement.29_EXI_Options_Document

<dape> DP: idea was to open up the schemaId type allowing it to type it as integers and such

<dape> YF: this is a simple improvement.. let us find others of that type and group them

<dape> TK: Next is https://www.w3.org/XML/EXI/wiki/EXI2#.28Improvement.29_SelfContained_Elements_within_Compressed_Streams

<dape> idea of large XML documents where on the one hand compression is required and on the other hand an index (e.g. to certain sub-elements) is desired

<dape> DP: With two minor changes it should be possible in EXI 1.0

<dape> a) allows having selfContained and (pre)-compression in the EXI options at the same time

<dape> b) define what happens if you have SC element (flush the stream)

<dape> YF: hard to add it to EXI1.0

<dape> .. could be a note on how to do it.. it is an extension that is easily implemented... and may find its way to EXI2.0
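
The "flush the stream" step in (b) can be illustrated with plain DEFLATE, which is an assumption about how such an extension might be realised and not a description of EXI 1.0 behaviour. The sketch uses zlib's `Z_FULL_FLUSH`, which resets the compressor state so that decompression can restart at the recorded byte offset. The function names and the sample chunks are hypothetical.

```python
import zlib


def compress_with_restart_points(chunks):
    """Compress chunks into one raw DEFLATE stream, recording restart offsets."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15 selects raw DEFLATE
    out = bytearray()
    offsets = []
    for i, chunk in enumerate(chunks):
        offsets.append(len(out))  # byte offset where this chunk starts
        out += comp.compress(chunk)
        is_last = i == len(chunks) - 1
        # A full flush ends the block and forgets the history window, so the
        # next chunk can be decoded without the data that came before it.
        out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_FULL_FLUSH)
    return bytes(out), offsets


def read_from(stream, offset):
    """Decompress from a recorded restart point to the end of the stream."""
    return zlib.decompressobj(-15).decompress(stream[offset:])


chunks = [b"<item>first</item>" * 50, b"<item>second</item>" * 50]
stream, offsets = compress_with_restart_points(chunks)
print(offsets)
print(read_from(stream, offsets[1]) == chunks[1])
```

The cost of each restart point is a somewhat worse compression ratio, because the compressor cannot refer back across the flush.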

<dape> TK: Next is "(Improvement) Reduce Overhead of xsi:type cast"

DP: Type-cast is costly.

YF: You can have a list of types, and use index.
... You need a link to base type.
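
One possible reading of YF's suggestion, sketched under stated assumptions: if every type keeps a link to its base type, the candidates for an xsi:type cast on an element declared with type T are just the types derived from T, and the cast can be sent as an index into that list. The type names, the `BASE_OF` table, and the bit-cost model are invented for illustration.

```python
import math

# Hypothetical schema: derived type -> base type (None for no base of interest).
BASE_OF = {
    "Vehicle": None,
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "SportsCar": "Car",
    "Address": None,
}


def derived_types(declared):
    """All types deriving directly or indirectly from `declared`, sorted by name."""
    result = []
    for type_name in BASE_OF:
        base = BASE_OF[type_name]
        while base is not None:  # walk the base-type links upwards
            if base == declared:
                result.append(type_name)
                break
            base = BASE_OF[base]
    return sorted(result)


def encode_type_cast(declared, actual):
    """Return (index, bits) for casting an element of type `declared` to `actual`."""
    candidates = derived_types(declared)
    index = candidates.index(actual)
    bits = math.ceil(math.log2(len(candidates))) if len(candidates) > 1 else 0
    return index, bits


print(derived_types("Vehicle"))
print(encode_type_cast("Vehicle", "Truck"))
```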

<dape> next "(Improvement) SelfContained Elements with some prior learning"

<dape> TK: shared string table relates to this issue.

<dape> next "(Improvement) Redundant second level productions in strict FALSE"

<dape> YF: I would suggest doing the reverse.. having the same 2nd level productions all the time

<dape> DP: agree

<dape> Next "(New Feature) Flush in Stream"

<dape> DP: Would provide an options that adds a "flush" production on 2nd/3rd level

<dape> TK: Flush useful in bit-packed mode only

<dape> DP: Flush in compression mode "could" also imply close the block so it might be useful there as well

<dape> DP: XMPP already uses a solution that uses EXI body for each stanza.

<dape> next "(New Feature) Streaming"

<dape> TK: Not sure about this feature

<dape> DP: I believe that David's point is that there seem to be protocols that actually wait until a certain amount of data is read (block gets full) before forwarding the data to the application

<dape> TK: In that case we can't do anything.. I think it does not relate to EXI... other formats would suffer the same problem

<dape> next "(New Feature) Canonical Format of Schema-Informed Grammar"

<dape> DP: not sure about the term "canonical" here

<dape> TK: Yes, it is meant to mean "standard" format

<dape> DP/TK: tried it in the past. Too many variations in existing implementations

<dape> TK: Let's skip it unless we have a new good idea

<dape> next "(Improvement) Schema-Informed Grammar for Wildcard Content"

<dape> DP: is it about switching grammars in the "middle" of the stream

<dape> TK: yes, element foo matches wildcard. foo is known in other grammar to which we could switch to

<dape> ... it has the advantage that you don't need to load the full set of grammars

<dape> break

<dape> next "(New Feature) BigFloat"

<dape> TK: DP: Idea to expand the range of float

<dape> TK: Not stated why we need no limit

<dape> .. currently the spec stays between the xsd datatypes

<dape> xsd spec restricts double, see http://www.w3.org/TR/xmlschema-2/#double

<dape> TK: The problem statement does not describe why we need to consider to further expand the range.

<dape> next "(New Feature) Proper support of xsd:union"

<dape> DP: Other formats like BiM do provide a codec to identify which union memberType is used. EXI could do the same

<dape> DP: we have the XBRL use cases

<dape> DP: MPEG7 schema also uses unions

<dape> TK: would xsd:union support increase complexity?

<dape> DP: I don't think so.. it may reduce size and might slightly impact encoder speed

<dape> TK: tend to think we should try this out

<dape> DP agree
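
A rough sketch of what "identifying which union memberType is used" could look like, assuming a union of an integer and a string member: the encoder tries the member types in order, writes the index of the first one that accepts the value, and then the typed value. The function name and member list are hypothetical, and Python's `int()` is only a stand-in for a real xsd:integer parser. Putting the string member last gives the always-available fallback YF asked for earlier.

```python
def encode_union(lexical, member_types):
    """Return (member index, member name, typed value) for the first matching member."""
    for index, (name, parse) in enumerate(member_types):
        try:
            return index, name, parse(lexical)
        except ValueError:
            continue  # this member type rejects the value, try the next one
    raise ValueError(f"no member type accepts {lexical!r}")


# Hypothetical union: integer first, string as the catch-all.
MEMBERS = [("xsd:integer", int), ("xsd:string", str)]

print(encode_union("42", MEMBERS))
print(encode_union("n/a", MEMBERS))
```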

<dape> next is "(Improvement) Memory-sensitive EXI Grammar approach"

<dape> DP: in cases such as MPEG they tend to use maxOccurs="10000" which leads to huge grammars

<dape> TK: it has benefit but we need to think about consequences

<dape> DP: by consequence you mean use cases that define 10 elements and would end up with a list of possibly unbounded entries

<dape> DP: I think even xerces falls back to unbounded for larger values when validating

<dape> TK: it could be as simple as if maxOccurs is greater than X we set to unbounded (but we need to check)

<dape> TK: There is an issue with the following schema:

<dape> TK: this one is valid...

<dape> .. if we change maxOccurs="10" to "unbounded" and minOccurs="1" it violates UPA

<dape> DP: it's a corner case, but yes it is an issue

<dape> DP: Could check what BiM does...
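
The threshold rule TK floats above can be sketched as follows. The threshold value, the function names, and the state-count model (one grammar state per counted occurrence) are assumptions made for illustration, and the sketch does not address the UPA corner case raised in the discussion.

```python
UNBOUNDED = None  # stand-in for maxOccurs="unbounded"


def effective_max_occurs(max_occurs, threshold=100):
    """Treat any maxOccurs above `threshold` as unbounded (hypothetical rule)."""
    if max_occurs is UNBOUNDED or max_occurs > threshold:
        return UNBOUNDED
    return max_occurs


def approx_states(min_occurs, max_occurs):
    """Very rough count of grammar states needed for one repeated particle."""
    if max_occurs is UNBOUNDED:
        return min_occurs + 1  # count up to minOccurs, then loop
    return max_occurs + 1      # one state per counted occurrence


print(approx_states(0, 10000))
print(approx_states(0, effective_max_occurs(10000)))
```

The trade-off is that a stream with more occurrences than the original maxOccurs would no longer be distinguishable by the grammar alone.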

<dape> next is "(Improvement) Better String Table in terms of Local ID"

<dape> DP: I think it is desirable but reusing unassigned ids would mean going through the list and modifying (decrementing) all existing identifiers
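
The cost DP points out can be shown with a toy table, assuming identifiers are kept dense (0..n-1): freeing one identifier forces every higher identifier to be decremented. The function name and sample data are hypothetical.

```python
def release_id(ids, value):
    """Free the id of `value`, keep ids dense, and return how many ids were changed."""
    freed = ids.pop(value)
    touched = 0
    for key, ident in ids.items():
        if ident > freed:
            ids[key] = ident - 1  # every later entry shifts down by one
            touched += 1
    return touched


local_ids = {"a": 0, "b": 1, "c": 2, "d": 3}
print(release_id(local_ids, "b"))
print(local_ids)
```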

<dape> next is "(Improvement) Number of Significant Digits"

<dape> TK: Not sure whether this is EXI issue..

<dape> .. implementation could limit digits

<dape> next is "(Improvement) EXI Properties Embedded in XML Schema".

<dape> TK: Comment from Don and Bruce

<dape> ... idea is to add header options into a schema so that the EXI processor knows what works best for EXI

<dape> DP: Apart from xsd:documentation is there anything an XML schema parser may be able to report

<dape> ... best properties depend also on size of the XML document (the smaller the document, the less well EXI compression works)

<dape> next is "Random access"

<dape> TK: not really clear what is wished here and how it differs from SelfContained we already have

<dape> next is "Fallback grammar"

<dape> TK: simplifying profile

<dape> DP: Correct, e.g. instead of creating a new grammar for an unknown element we can fall back to xsd:any type

<dape> next is "Abstract elements and types"

<dape> TK: new version might want to exclude abstract elements
