See also: IRC log
<scribe> Scribe: ChrisL
<Vlad> Hello everyone!
<kbx> kbx (kenjibaheux)
<Vlad> The meeting agenda is at http://lists.w3.org/Archives/Public/public-webfonts-wg/2013Oct/0001.html
(everyone now knows everyone)
Vlad: we were considering LZMA until August, but it was an uphill battle and we decided to radically change course to a different entropy coder
... we were stuck, now we have a way forward
... thanks to the Google compression team
raph: brotli is a new algorithm based on flate, and its performance is really proving itself
... a good match to the needs of webfonts
... just open sourced this morning the first of several modules - the decoder
... a specification and the encoder source will follow
... will present measurements today, on the encoder and also on the preprocessor step, which has not changed
... aimed midway between lzma and gzip; we have surpassed that while getting much faster decompression speed
... specifiability is also much better than with lzma
... per-table vs. single stream: we have interesting findings there too
... 29.21% size improvement on the Google Fonts corpus compared to WOFF 1.0
... 3.18x decompression speed loss
... preprocessing: 4.5 ns/byte; entropy coding: 12 ns/byte; WOFF 1.0: 5.1 ns/byte
raph: slides will be released publicly
<Joe_Vieira> % improvement is total library aggregate savings
raph: gzip however is hugely optimised, while this is not, yet
Vlad: preprocessing is actually de-preprocessing
... (explanation of observer rules)
raph: lzma has a 4.74% size win over brotli on the Google Fonts corpus
... but brotli has a 1.67x decompression speed win
ChrisL: so 3.18 × 1.67 = 5.3106 times slower than woff1 for lzma
raph: slide shows fonts ordered by compression gain
christopher: these are all TT?
raph: yes, need numbers for CFF as well
... the spike at the end is from badly made fonts with lots of redundant data
... for the rest of the graph they track very well
... the continuous stream is significantly better
... 1.3% compression improvement and 8.7% better speed for whole-font vs. per-table
... here we have data that the whole-font approach is better
ChrisL: the argument in woff1 was selective table decompression and byte-range fetches. neither has seen use in practice
Vlad: there is some optimisation across tables, so you really need to fetch all tables to reverse the preprocessing
... again in favour of whole-font compression
raph: do we still want gzip? brotli is 85x slower than gzip at compression
... a fast brotli compressor (same wire format) is possible
Vlad: if brotli has a few established switches to affect compression, that is good. for many users it may be better to subset the font and then compress on the fly, so compression speed is important
jyrki: it's possible to make a brotli compressor that produces a valid bytestream and is faster to compress, but with not as good compression
raph: having options in the implementation is good but there is one wire format
ChrisL: the decompressor does not have switches and doesn't care how the stream was made
raph: right
christopher: effect of per-table?
raph: with gzip, the continuous stream is on average not optimal, so there is a small benefit to compressing per table
... because the compressor uses stats from the previous table, which are not a good fit
... no significant effect on decompression speed
... preference is to simplify the spec to brotli-only byte-level compression, whole-font only
Vlad: need to revisit the per-table option as it limits optimisations; they need the whole font to work
jyrki: plan to complete spec for brotli in 6 weeks
sergei: is it "too complex to explain", like lzma?
jyrki: no! much like flate
Vlad: starting from flate makes it easier
raph: also we understand the importance of a solid spec, for security review for example. lzma did not have that
ChrisL: (explains benefit of early spec publication to stimulate review)
raph: we are within 5% of lzma, which is very positive given it's also way faster
Vlad: want a very clear statement of what exactly we are measuring, because we are mixing mtx and entropy coding together
(we agree what the baselines are and that we are comparing apples to apples)
raph: we will be sharing the spreadsheet of detailed results, has exact byte counts per option.
jyrki: brotli with filtering decompresses 1.67x faster than lzma; with no preprocessing it is 2x faster
raph: expect profiling will improve on 2x faster than lzma
jfkthame: looked at comparisons to other flate-like compressors, such as oodle or lzham?
... the question is, have you compared them?
raph: yes
... not in the slides, but we have looked at it. we don't have an apples-to-apples comparison for lzham
... lzham has a very expensive startup cost
jyrki: lzham works badly on small fonts; it catches up on large fonts but does not match what brotli does
... brotli is better in both compression speed and compression ratio
christopher: what do you mean by small and large fonts here?
raph: on the Google Fonts corpus the threshold is 2 Mbytes, so mainly CJK fonts, which go up to 4.7 Mbytes
... lzham has also not been actively developed in the last year; not production quality
... oodle is proprietary, but looking at reported performance it is similar to lzma, a little faster; there is another profile tuned for decompression speed
ChrisL: so in summary neither is better than brotli
jfkthame: need to show the others were looked at, otherwise it's an obvious open question
jyrki: you need to use the current head of lzham, and the packaging is clumsy; it needs work to be robust. Makes it difficult to repeat.
... it's for huge corpora, not tuned for font file sizes; it barely gets started
<raph> links for other compression algorithms: https://code.google.com/p/lzham/
<raph> Oodle: http://www.radgametools.com/oodlecompressors.htm
<raph> Oodle (blog post): http://cbloomrants.blogspot.nl/2012/09/09-22-12-oodle-beta-and-roadmap.html
https://code.google.com/p/font-compression-reference/
<raph> direct link to brotli code: https://code.google.com/p/font-compression-reference/source/browse/#git%2Fbrotli%2Fdec
jyrki: the starting point was to use flate with a bigger window size
... we had prior experience with the flate family
<raph> one of the implementations of flate compression is Zopfli: https://code.google.com/p/zopfli/
jyrki: noticed other things to improve, because the larger window alone was not a significant improvement
... entropy code reuse, window size, context modelling, literal count coding, distance cache, match lengths, and block end codes are more efficient
... brotli can re-use past entropy codes within the same metablock
... in flate there is a 50 byte overhead to change code; brotli can reuse any previous entropy code
... codes are referred to by number and also by order (e.g. second last)
... window size: references can reach 16MB into the past, rather than flate's 32kB
... discussion of whether 8 or 4 MB would be better for the decoder
jfkthame: impacts memory usage of the decoder
... will not fit in RAM on low-end devices or in cache on medium-end ones
behdad: also most fonts smaller than 4MB
zoltan: the size of the entire stream is sent in the header, so you never need to allocate more than the eventual decompressed size
jyrki: only make long references for large data blocks, so the cost of exceeding cache is justified
raph: mainly want to see whether the ratio is stable across architectures
jyrki: desktop and mobile cache architectures are becoming more similar
kenji: avoid picking too low a value that needs to be changed in the future
jyrki: prefer a larger limit with a hint to use lower values in practice
raph: gzip allows the window size to be specified (but in the header, not in the flate stream)
zoltan: encode the maximum backward reference (log2 of the window size) in the stream
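(scribe note: a minimal sketch of the idea just discussed: the stream header carries log2 of the window size, so the decoder can size its ring buffer up front. The 3-bit field and the 16..23 range below are illustrative assumptions, not the actual brotli wire format.)

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { const uint8_t* data; size_t bit_pos; } BitReader;

/* read n bits, LSB first (assumed bit order) */
static uint32_t read_bits(BitReader* br, int n) {
    uint32_t v = 0;
    for (int i = 0; i < n; i++, br->bit_pos++)
        v |= (uint32_t)((br->data[br->bit_pos >> 3] >> (br->bit_pos & 7)) & 1u) << i;
    return v;
}

/* size the sliding window from a log2 field in the header */
uint8_t* alloc_window(BitReader* br) {
    uint32_t wbits = 16 + read_bits(br, 3);   /* hypothetical: 2^16..2^23 bytes */
    return (uint8_t*)malloc((size_t)1 << wbits);
}
```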
jyrki: paying attention to the past two or three bytes can give more specific and thus more efficient entropy coding. this is called context modelling
... 5% gain in compression density
... e.g. a "." is much more likely to have a " " after it
jyrki: the number of pure literals is counted first, then that many literals follow with only the entropy code
... a byte count followed by huffman-coded bytes, then a back reference
... so it's a different way to express block end. inherently faster to decode
ChrisL: so removing a decision from an inner loop
jyrki: backward references can be described in terms of the four past distances, or the last two with a +/-3 offset
... this can cope with small insertions or deletions in otherwise similar strings
... flate has match lengths of 3-258 bytes. brotli has match lengths of 2 as well. they are no longer codified separately but jointly with the length of literal sequences. works well with pulse-repeating data
... match lengths are codified in a joint entropy code with literal lengths
... they have surprisingly high correlation
... in flate, any symbol could be the end code; in brotli the block length is coded at the beginning, so loop unrolling can be used
ChrisL: very clear, thank you
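(scribe note: a toy sketch of the loop-unrolling point, assuming placeholder helper functions rather than brotli's real API. In flate every decoded symbol may be the end-of-block code, so the loop must test for it; with the block length known up front, the loop bound is fixed.)

```c
#include <stdint.h>

/* placeholder primitives: assumed, not brotli's real API */
typedef struct BitReader BitReader;
extern int      huffman_decode(BitReader* br);
extern uint32_t read_block_length(BitReader* br);
extern void     emit(int symbol);
#define END_OF_BLOCK 256

/* flate-style: any symbol may end the block, so test every iteration */
void decode_flate_style(BitReader* br) {
    for (;;) {
        int sym = huffman_decode(br);
        if (sym == END_OF_BLOCK) return;
        emit(sym);
    }
}

/* brotli-style: the length is known up front; the fixed bound removes a
   branch from the inner loop and lets the compiler unroll it */
void decode_known_length(BitReader* br) {
    uint32_t n = read_block_length(br);
    for (uint32_t i = 0; i < n; i++)
        emit(huffman_decode(br));
}
```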
Vlad: how hard is it to mess up the compression stream to get at vulnerabilities? false lengths, etc.
jyrki: compression can be exploited to find protected memory. a longer window extends the attack surface
... no other ways spring to mind
raph: decompression is inherently high risk, as it gets untrusted data
... chrome looks at any new code with a rigorous security review. not yet started for brotli, but it was in progress for the older woff2 (lzma)
kenji: yes, probing with random data was started, then code review by a security engineer
raph: fuzz testing uses mutations of good data; random data checks widely different branches
... automated large-scale testing
Vlad: can the results for brotli be made available?
kenji: vulnerabilities will be disclosed and patched but the specific bad sequence is typically not disclosed
jyrki: total code size is fairly small, 1200 lines of C
raph: showing what the spec sections would look like
zoltan: (slides will be made available)
... a ring buffer of the last four distances, reused within +/-3 of the last two, giving 16 distance codes
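(scribe note: a sketch of the 16 short distance codes as described: the last four distances verbatim, plus the last two distances offset by +/-1..3. The exact code ordering is an assumption, not the spec.)

```c
#include <stdint.h>

/* dist_cache[0] is the most recent distance, dist_cache[3] the oldest */
static int32_t short_distance(int code, const int32_t dist_cache[4]) {
    if (code < 4)                      /* codes 0..3: the four cached distances */
        return dist_cache[code];
    int which  = (code - 4) / 6;       /* 0 = last distance, 1 = second last */
    int delta  = (code - 4) % 6;
    int offset = (delta < 3) ? delta - 3 : delta - 2;   /* -3..-1, +1..+3 */
    return dist_cache[which] + offset; /* tolerates small inserts/deletes */
}
```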
encoding of commands in power-of-two buckets, similar to flate
zoltan: short, medium and long categories of literal and copy lengths, with 8 buckets each. joint histogram
... the last distance is specially coded as it recurs frequently
... then build a regular huffman code over that histogram
... for fonts, the data is 4-byte aligned so we can fix the last number of bits. this assumes the distribution of the extra bits is flat, but it's not, so we have buckets with the last k bits fixed
... text, for example, does not have the 4-byte aligned structure
... the huffman code says how many bits each symbol will need. then you make a histogram and create a huffman code.
... the sequence of bit lengths has run-length coding to encode missing symbols
... bit lengths 0 to 15 plus three special codes, giving a 19 symbol alphabet
scribe: deflate has a different order.
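(scribe note: an illustrative expansion of a run-length-coded sequence of code lengths, following deflate's 19-symbol alphabet of lengths 0..15 plus three repeat codes; brotli's exact repeat semantics and ordering may differ, as the scribe line above notes.)

```c
#include <stddef.h>
#include <stdint.h>

enum { REP_PREV = 16, REP_ZERO_3 = 17, REP_ZERO_11 = 18 };

/* syms[i] is a decoded code-length symbol, extra[i] its extra-bits value.
   Assumes well-formed input (a repeat code never comes first). */
size_t expand_lengths(const uint8_t* syms, const uint8_t* extra, size_t n,
                      uint8_t* out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if (syms[i] <= 15) { out[len++] = syms[i]; continue; }
        uint8_t fill  = (syms[i] == REP_PREV) ? out[len - 1] : 0;
        size_t repeat = (syms[i] == REP_ZERO_11) ? 11u + extra[i]
                                                 : 3u + extra[i];
        while (repeat--) out[len++] = fill;
    }
    return len;  /* symbols never mentioned get length 0, i.e. unused */
}
```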
zoltan: block splitting. independent block boundaries for literals, commands and distances
... encoding of block switch symbols
... (see table on slide)
... encoding of the block split
... context modelling for literals
... B-3 to B-1 are from already decoded bytes
... look them up in the context map to find an index to the huffman code used to encode the next byte
... context modelling is sensitive to the type of data. have one or two models for fonts, others for ascii and utf-8 data
... there is also a simpler version where the context is the copy length; the distribution for small copies is much different from larger ones, so check for length 2, 3, or 4 and more
... context map encoding
... can disable context modelling with 1 bit
... this part is still under discussion
... the move-to-front transform moves many symbols to zero (see the sketch below)
... format specification. (see syntax diagram on slides)
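(scribe note: the move-to-front transform mentioned above, which biases frequently repeated context-map values toward zero. This is the standard textbook MTF, not code from the brotli source.)

```c
#include <stdint.h>
#include <string.h>

void move_to_front_encode(uint8_t* data, size_t n) {
    uint8_t mtf[256];
    for (int i = 0; i < 256; i++) mtf[i] = (uint8_t)i;
    for (size_t i = 0; i < n; i++) {
        uint8_t v = data[i];
        uint8_t j = 0;
        while (mtf[j] != v) j++;     /* find rank of the current symbol */
        memmove(mtf + 1, mtf, j);    /* shift ranks 0..j-1 down by one */
        mtf[0] = v;
        data[i] = j;                 /* emit the rank; repeats become 0 */
    }
}
```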
ChrisL: max size is 2^(8x7) = 2^56 bytes
(we agree this is plenty)
(see syntax diagrams on following slides)
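(scribe note: the 2^(8x7) figure corresponds to a size field of at most seven bytes, i.e. a maximum of 2^56. A hedged sketch of one way such a field could be read, assuming a fixed seven little-endian bytes; the actual wire format is whatever the syntax diagrams specify.)

```c
#include <stdint.h>

uint64_t read_size7(const uint8_t* p) {
    uint64_t size = 0;
    for (int i = 0; i < 7; i++)          /* 7 bytes -> at most 2^56 - 1 */
        size |= (uint64_t)p[i] << (8 * i);
    return size;
}
```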
Vlad: let's review, then agree to publish. it's a long overdue deliverable, so we should say what we have been up to.
<kbx> kenjibaheux
<scribe> scribenick: kbx
<ChrisL> http://www.w3.org/Fonts/WG/WOFF2ER/
based on last meeting's discussion, added some requirements
added a CSS3 Fonts parallel deployment requirement
clarification of the meaning of "lossless"
"MicroType Express style plus LZMA" => "Preprocessing + LZMA", with an explanation that the preprocessing is based on MicroType Express
appendix A will contain all the data we had about LZMA
about the decompression and memory requirement section: replace "entirely due" with "mostly due" and use appendix A for the actual data
how about qualifying the "memory requirement" bit?
jyrki: context modeling and backward references; I believe the latter dominates for LZMA.
raph: (basically 2x of gzip)
Candidate A: Preprocessing plus the Brotli Compression Algorithm
will add what was presented today
and eventually publish this report
it can be updated as we wish
Vlad: we could agree within this meeting, given that we have an understanding of what will go into the report
<scribe> ACTION: ChrisL to update the evaluation report with information from today [recorded in http://www.w3.org/2013/10/11-webfonts-minutes.html#action01]
<trackbot> Created ACTION-119 - Update the evaluation report with information from today [on Chris Lilley - due 2013-10-18].
RESOLUTION: publish the evaluation report
Data for Brotli comparison over Google Fonts corpus
Per-font comparison (Google Fonts): https://docs.google.com/spreadsheet/ccc?key=0AjdKc3tA4Jb0dGJpeGVxN21ONzVDX3I5SXNPdzhPQmc&usp=sharing
Whole-corpus comparison with different options (Google Fonts): https://docs.google.com/spreadsheet/ccc?key=0AjdKc3tA4Jb0dFJGZFk3NzZQNENKczJGVDZ3QTk2T1E&usp=sharing
Showing graph on third sheet of the per font spreadsheet
raph: (comments about the spreadsheet; examples of important things to call out: average compression, worst case. the Max is from a font that has inherent design issues)
discussion about rejecting invalid fonts. WOFF 2.0 does not have strict checks (this is more an OTS-level issue).
it would be interesting to run an experiment on a set of sloppily put together fonts.
raph: another interesting figure: Korean fonts (well coded fonts) with excellent compression gains
ChrisL: (is this because of inherent characteristics of Korean characters?)
Raph: (it's more about how the font is built)
ChrisL: what about ordering changes?
Raph and kbx: we had this graph but it's gone from the spreadsheet. nothing wild, just noisy => no major insights to gain there (we think)
ChrisL and Raph: pointing out that the final compression gains achieved by Brotli outperformed our initial goal of reaching half of WOFF 2.0's (LZMA) gains
Whole corpus spreadsheet, explanations of the graphs
Raph, David: LZMA compression gains also improved slightly with continuous stream
Raph and David: (pointing out that decompression speed is the current state and could be improved further since we haven't spent time optimizing for this yet)
Raph: the bottom half of the spreadsheet is for the large fonts corpus
... (decompression rate comparisons for per-stream vs. continuous stream; overhead in column H, lines 2-5 and 7-9)
<raph> (scribing earlier discussion): the main goal of this work is to optimize fonts that are already well optimized
<ChrisL> scribenick: ChrisL
Vlad: open question; I assume google has someone working on compression at the IETF
<raph> it would be possible to optimize fonts with (for example) very large kern tables but it's not a goal
raph: I assume so but not sure who that is
<raph> it might be worthwhile to rerun the experiment over a corpus that has already had the kern tables optimized, for example using the tool that Adobe presented on Wednesday at ATypI
Vlad: two options, google does it themselves, or submitted to w3c as a proposal and then liaison
<raph> that experiment would separate out the effect of cleaning up sloppily encoded fonts vs general effectiveness
(scribe, clean-up order)
Vlad: don't want us held up finalizing our spec if it's not clear who picks it up
<kbx> ChrisL: (explained upcoming changes to which level of specs can be referenced)
<kbx> Raph: suggesting that we reach out to our IETF representatives for advice on how to proceed further. That should be a matter of a 15-minute discussion.
<kbx> ChrisL: "mostly aimed at fonts" - well, what else can it do? if "http" then you get other folks involved, and that might induce delays...
<kbx> Raph: good example would be Flate (IETF)
<kbx> Vlad: additional question would be to find out if it matters who reaches out?
<kbx> scribenick: kbx
Raph: (if it happens that we have difficulties reconciling with other groups, we have a good case to make, given our use case and the data in hand)
Vlad: once the evaluation report is published, we start working on the spec
Vlad: target date for the first draft of the spec?
Raph: byte level spec or WOFF 2.0 spec?
Vlad: WOFF 2.0
Raph: co-editors?
... (volunteering)
ChrisL: (explaining alternatives for dealing with co-chairing and editorship). Should not be an issue, and if it becomes one we can always find a solution.
Vlad: (agreed to be the editor)
Raph: (welcomed the proposal)
... the target date is not blocked on Brotli
Vlad: need ability to point somewhere for Brotli
Raph: end of year sounds feasible
Vlad: this is more about an internal draft for the group
ChrisL: the first public working draft can be quite rough, but it defines the work, so it's good to sketch a large scope area (due to patent-related matters)
... (explaining how patents work in w3c; patent exclusions)
... so before we publish we should be careful about the scope
Raph: the scope of this document would be almost the same as the prior write-up. I don't see the scope changing in any way at all, because even if we had to turn around on the choice of the encoder, we haven't found opportunities to make changes to the preprocessing part.
<raph> many of the clever transformations in the preprocessing step help when you have gzip, but are not effective when you have a more sophisticated byte-level compression format
Vlad: end of the year for the internal draft sounds good
... will be bogged down for the next month or so, but Raph could start with an initial draft
Raph: (agreeing with the intent of focusing on the content rather than the spec wording aspects)
ChrisL: needs boilerplate, in HTML; normative and non-normative parts should be clearly called out
Vlad: ChrisL if you could send links to the procedures and the tools used
<raph> vlad: the empirical measurements bear out transforms having relatively less impact when using better byte-level compression
ChrisL: there is ReSpec (http://www.w3.org/respec/) which does the boilerplate and so on
Vlad: ChrisL, did you have time to talk around? (follow-up from the previous meeting)
... the original concern was that it seemed like an uphill battle. Is this still true?
ChrisL: (unfortunately, this is still true to some extent)
... noting that this isn't on the critical path for WOFF 2.0
<raph> (Adobe tool that optimizes kern tables, discussed earlier: https://github.com/adobe-type-tools/python-modules/blob/master/WriteFeaturesKernFDK.py)
ChrisL: if we can't get it done in time, we do as we did for WOFF 1.0
... not opposed; will do it if the group wants me to, but suggest that we focus on WOFF 2.0 first
Vlad: it takes almost no time to put together a draft of that top-level application, so let's write it down, send it, and take the room temperature
... (should be enough to find out how fast this will move)
<scribe> ACTION: vlad to prepare a new draft for top level media type font [recorded in http://www.w3.org/2013/10/11-webfonts-minutes.html#action02]
<trackbot> Created ACTION-120 - Prepare a new draft for top level media type font [on Vladimir Levantovsky - due 2013-10-18].
<scribe> ACTION: kuettel (david) to prepare a new draft for top level media type font [recorded in http://www.w3.org/2013/10/11-webfonts-minutes.html#action03]
<trackbot> Created ACTION-121 - (david) to prepare a new draft for top level media type font [on David Kuettel - due 2013-10-18].
-short break-
<kenjibaheux> (going over the atypi kickoff presentation about takeoff of webfonts)
<kenjibaheux> http://www.atypi.org/atypi-amsterdam-2013/amsterdam-programme/activity?a=255
<ChrisL> goo.gl/5HeqYf
<kenjibaheux> Which resolves here: https://docs.google.com/a/chromium.org/document/d/1hrPWqBHMVlGmbAc8t4xQyvr4vC9WqCLtM1-DH6vEP7k/edit
<kenjibaheux> Link to the presentation: https://docs.google.com/a/chromium.org/presentation/d/16FmH1n8JZXurZW7UUSL5B8Fdi7-69LOh4Qy188QIQic/edit#slide=id.p
<ChrisL> adjourned
<ChrisL> Vlad: many thanks to google for hosting the meeting, and in particular to the google compression team; it has been tremendous to see the work and the priority you have given it!