Efficient XML Interchange Working Group Teleconference

01 Mar 2016

See also: IRC log




<scribe> scribe: TK

<scribe> scribeNick: taki


DP: In EXI2 topics, there are related topics.


DP: Shared strings idea. In our use cases, JSON-LD is heavily loaded with strings that are well-known.
... Currently EXI does not provide any help for that.
... Is the group interested in this?
... The document also includes some other features.
... Grammar string, shared string and split string.

<brutzman> spelling in section 5: requierements -> requirements

DP: We can use DTRM to map string encoding to this extension string representation.

TK: I can help advance this document.

<brutzman> Presumably any computational cost of figuring out whether a given substring is part of the shared-string vocabulary only occurs during encoding. When decoding, it is essentially just a table lookup. Is that correct?

<brutzman> DP: yes that is true.

DP: Yes, that is correct. The decoder side just needs to look up the table.

<brutzman> Good, so there will be no noticeable impact on performance.

DB: No noticeable performance impact on encoder side is good.
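The encode/decode asymmetry discussed above can be sketched as follows. This is only a minimal illustration, not the EXI wire format or the proposal's actual design: the table contents, the `("ref", index)` / `("lit", value)` token shapes, and the function names are all hypothetical, and it matches whole strings rather than substrings.

```python
# Hypothetical shared-string table; the entries are illustrative only.
SHARED_STRINGS = ["@context", "@id", "@type", "http://www.w3.org/2001/XMLSchema#"]

# Encoder-side index, built once so that each membership check is a dict probe.
_INDEX = {s: i for i, s in enumerate(SHARED_STRINGS)}


def encode_string(value):
    """Return ("ref", index) for a shared string, else ("lit", value)."""
    idx = _INDEX.get(value)
    if idx is not None:
        return ("ref", idx)
    return ("lit", value)


def decode_string(token):
    """Decoder side: a shared string is resolved by a plain table lookup."""
    kind, payload = token
    if kind == "ref":
        return SHARED_STRINGS[payload]
    return payload
```

In this sketch the only search happens in `encode_string`; `decode_string` indexes into the table directly.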

DP: The idea is that the EXI WG is not the right place to define the strings.

<brutzman> Next, wondering what strings to use, are they selected by EXI working group?

<brutzman> perhaps this is an extension mechanism as well? Different groups might have different vocabulary tables? Alternatively these might just be schema enumerations.

DP: EXI currently does not pre-populate string values from schemas.
... String value tables in EXI are always empty. Grammar strings should be allowed to be pre-populated.

<brutzman> A schema-based mechanism can be a good way to solve the need for vocabulary definition. Presumably enumerations are encoded. This could be a recommended practice for EXI use of schemas.

<brutzman> I think that XML Schema offers many capabilities which we ought to take advantage of. For example, if you applied EXI compression to a well-defined W3C Recommendation schema, then it would create a string table for you based on defined elements, attributes and enumeration strings.

<brutzman> The list might be large by default, but stable schemas don't need to be installed more than once. Further the size of the table isn't an issue, especially if the schema table is external.

<brutzman> Stability of schema is simply a choice of document users: either a stable well-defined schema, or a flexible evolving schema. This approach is useful either way...

<brutzman> ... because it takes care of the coordination requirement when defining a "fixed" string table.

DP: I agree we should look for a way to allow the use of strings in schemas.

<brutzman> It would at least be good to identify relevant supporting capabilities from XML schema in the note above.

DP: Whether we use all of them, or some of them, I am not sure yet.

<brutzman> It is also possible to say (a) a shared-string table has same structure as compressed schema enumerations, (b) a schema could be annotated to let a shared-string table generator extract enumerations of interest.
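A shared-string table generator of the kind suggested in (b) might look roughly like the sketch below. It is an assumption-laden illustration: the sample schema, the function name `enumeration_strings`, and the choice to collect every `xs:enumeration` value (rather than only annotated ones) are hypothetical and not taken from any EXI document.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the XML Schema namespace.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment used only for illustration.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="unit">
    <xs:restriction base="xs:string">
      <xs:enumeration value="celsius"/>
      <xs:enumeration value="fahrenheit"/>
      <xs:enumeration value="kelvin"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def enumeration_strings(xsd_text):
    """Collect xs:enumeration values in document order, without duplicates."""
    root = ET.fromstring(xsd_text)
    table = []
    for enum in root.iter(XS + "enumeration"):
        value = enum.get("value")
        if value is not None and value not in table:
            table.append(value)
    return table


# Candidate contents for a pre-populated string table.
string_table = enumeration_strings(SAMPLE_XSD)
```

Because both sides can derive the same list from the same schema, no separate coordination of the table contents is needed in this sketch.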

<brutzman> Am happy to contribute schema considerations to the Note.

<brutzman> I think that there is a lot more we might do with XML Schema and EXI, including usage of EXI-compressed XML schemas in support of streaming and common W3C document formats.

DB: Do we talk much about schema in EXI 2?

<brutzman> I see 5 issues already... we should list new version of XML Schema v1.1 as well... these issues might be grouped together so that comprehensive XML Schema support in EXI can be pursued.

<brutzman> For example I think this also relates to various list dialogs about HTML5 and compression.

<brutzman> Does anyone know why there is no XHTML schema for HTML5?

<brutzman> Is the X in XHTML not XML?

CB: most HTML5 documents are not XML.

DB: They are loaded into DOM, then can be serialized into XHTML.

<brutzman> If an HTML syntax page for HTML5 is loaded, it goes into DOM unambiguously, and then can be serialized as XHTML unambiguously.

CB: That doesn't immediately make them XML.
... Syntactically, it is not XML. It just loads into DOM4.

<brutzman> I think if we then transform DOM into XHTML markup, we have something that EXI can compress.
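The parse-then-serialize path described above can be sketched with the Python standard library. This is a rough illustration under stated assumptions: `html.parser` is a simple tokenizer and does not implement the HTML5 parsing algorithm or its error recovery, the class and function names are hypothetical, and the resulting tree is an ElementTree, not a DOM. It only shows that HTML-syntax input can end up as well-formed XML markup that an XML-based tool could consume.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements never take an end tag in HTML syntax.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
XHTML_NS = "http://www.w3.org/1999/xhtml"


class SketchTreeBuilder(HTMLParser):
    """Builds an ElementTree from HTML-syntax input (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        # Boolean attributes (value None) are expanded to name="name".
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        if self.stack:
            el = ET.SubElement(self.stack[-1], tag, attrib)
        else:
            el = ET.Element(tag, attrib)
            if self.root is None:
                self.root = el
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xhtml(html_text):
    """Parse HTML-syntax text and serialize the tree as XML markup."""
    builder = SketchTreeBuilder()
    builder.feed(html_text)
    builder.close()
    # Declare the XHTML namespace on the root element.
    builder.root.set("xmlns", XHTML_NS)
    return ET.tostring(builder.root, encoding="unicode")


xhtml = html_to_xhtml(
    "<html><body><p>Hello<br>world</p><img src=a.png alt=x></body></html>")
```

The unclosed `<br>` and the unquoted attribute values in the input are both normalized on output, so the result can be re-parsed by an XML parser.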

<brutzman> Relevance of this use case: Accelerated Mobile Pages Project https://www.ampproject.org has a few characteristics in common

CB: That is a subset of HTML that can compress better for mobile use cases.

DP: Subsetting HTML5.
... It is still HTML. Not XML.

<brutzman> Nothing seen in AMP FAQ regarding compression

<brutzman> I bring it up to point out that faster loading of mobile web pages is considered important. There has also been press interest on this project.

<brutzman> So higher-performance HTML pages seem to be an important goal for EXI as well.

<brutzman> It looks to me like the only thing missing is an XHTML schema for HTML5. That would let us apply EXI to HTML pages right away, enabling testing on size & performance & all of the other good things that EXI excels at.

<brutzman> We might further focus on CSS compression. We would then have HTML and JSON and CSS, plus MathML... most of the way.

<brutzman> If we then tried to get EXI to work compatibly with CBOR (or another Javascript compression scheme) then a comprehensive solution is emerging.

<caribou> Robin made some experiments on using EXI on html docs a few years ago

<brutzman> If HTML5.1 isn't working on a schema, EXI working group could ask them to do so or else declare reasons why not.

<brutzman> If HTML5.1 doesn't care about XHTML XML syntax, the EXI group could define a schema and invite improvements and note limitations.

<brutzman> This doesn't break anything. What's not to like? 8)

<brutzman> Perhaps Liam or someone can give us insight into XHTML XML HTML5 issues...

<brutzman> Agreed that Robin would have an excellent insight regarding these matters also.

<liam> [ polyglot HTML is *both* XML syntax *and* HTML syntax ]

CB: Applying XPath on HTML5. There are parallel issues. HTML5 is no longer XML.

<brutzman> I think we need to be precise on this topic. HTML5 includes both HTML Syntax and XHTML Syntax, with both mapping to a common DOM in a well-defined manner.

<brutzman> HTML5 section 8 The HTML syntax https://www.w3.org/TR/html/syntax.html#syntax

<brutzman> HTML5 section 9 The XHTML syntax https://www.w3.org/TR/html/the-xhtml-syntax.html#the-xhtml-syntax

<brutzman> So if we try to produce (or contribute to) an XML schema that supports XHTML representation of HTML5 DOM, then that seems quite useful for EXI.

Canonical EXI

ISSUE-110: What to do with xml:space being preserved in strict mode

Whitespace preservation mode


DP: Canonical EXI requires the typed production to be used first.
... xml:space="preserve" conflicts with strict mode.

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.144 (CVS log)
$Date: 2016/03/01 17:03:25 $
