11:24:13 logging to http://www.w3.org/2013/10/11-webfonts-irc
11:25:01 Meeting: Fonts WG f2f meeting, Google Amsterdam
11:25:06 Chair: Vlad
11:25:13 Scribe: ChrisL
11:26:22 Hello everyone!
11:26:28 Topic: Introductions
11:26:33 kbx (kenjibaheux)
11:26:36 The meeting agenda is at http://lists.w3.org/Archives/Public/public-webfonts-wg/2013Oct/0001.html
11:28:24 (everyone now knows everyone)
11:29:47 Vlad: we were considering LZMA until August, but it was an uphill battle and we decided to radically change course to a different entropy coder
11:30:14 ... we were stuck, now we have a way forward
11:30:27 ... thanks to the Google compression team
11:31:22 Topic: Updates on compression
11:32:11 raph: brotli is a new algorithm based on flate; performance is really proving itself
11:32:22 ... good match to the needs of webfonts
11:32:41 ... just open sourced this morning the first of several modules - the decoder
11:32:58 ... a specification and the encoder source will follow
11:33:36 raph: will present measurements today, on the encoder and also on the preprocessor step, which has not changed
11:34:56 raph: aimed midway between lzma and gzip; have surpassed that while getting much faster decompression speed
11:35:09 ... specifiability is also much better than with lzma
11:35:41 raph: per-table or single stream, we have interesting findings also
11:36:24 raph: 29.21% improvement on the Google Fonts corpus compared to WOFF 1.0
11:36:57 3.18x decompression speed loss
11:37:07 preprocessing 4.5 ns/byte
11:37:15 entropy 12 ns/byte
11:37:26 woff 1.0 5.1 ns/byte
11:38:06 raph: slides will be released publicly
11:38:12 % improvement is total library aggregate savings
11:38:42 raph: gzip however is hugely optimised while this is not, yet
11:40:42 Vlad: preprocessing is actually de-preprocessing
11:41:06 Vlad: (explanation of observer rules)
11:41:31 raph: lzma size win is 4.74% on the Google Fonts corpus
11:41:45 brotli 1.67x decompression speed win
11:42:40 ChrisL: so 3.18 × 1.67 = 5.3106 times slower than woff1 for lzma
11:45:32 raph: slide shows fonts ordered by compression gain
11:45:46 christopher: these are all TT?
11:45:56 raph: yes, need numbers for CFF as well
11:46:52 raph: the spike at the end is from badly made fonts with lots of redundant data
11:47:17 raph: for the rest of the graph they track very well
11:47:54 raph: continuous stream is significantly better
11:48:51 ... 1.3% compression improvement and 8.7% better speed to do whole-font vs. per-table
11:49:18 raph: here we have data that the whole-font approach is better
11:50:12 ChrisL: the argument in woff1 was selective table decompression and byte-range fetches. neither has seen use in practice
11:50:44 Vlad: some optimisation happens across tables, so you really need to fetch all tables to reverse the preprocessing
11:50:58 ... again in favour of whole-font compression
11:51:50 raph: do we still want gzip? brotli compression is 85x slower than gzip
11:52:24 ... a fast brotli compressor (same wire format) is possible
11:54:44 Vlad: if brotli has a few established switches to affect compression, that is good. for many users, it may be better to subset the font then compress on the fly, so compression speed is important
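As a sanity check on the figures quoted above, the speeds compose as follows. A quick sketch (the variable names are mine; the figures are the ns/byte numbers from the minutes, and the small discrepancy against the quoted 3.18x presumably comes from rounding in the slide numbers):

```python
# Figures quoted in the minutes, in nanoseconds per decompressed byte.
woff2_preprocessing = 4.5   # reverse-preprocessing step
woff2_entropy = 12.0        # brotli entropy decoding
woff1 = 5.1                 # WOFF 1.0 (gzip/flate) decompression

# WOFF 2.0 decompression cost relative to WOFF 1.0:
woff2_vs_woff1 = (woff2_preprocessing + woff2_entropy) / woff1
print(round(woff2_vs_woff1, 2))  # ~3.24, close to the quoted ~3.18x speed loss

# LZMA relative to WOFF 1.0, given brotli's quoted 1.67x win over lzma:
print(round(3.18 * 1.67, 4))  # 5.3106, as ChrisL computed
```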
11:55:45 jyrki: it's possible to make a brotli compressor that makes a valid bytestream and is faster to compress, but does not compress as well
11:56:27 raph: having options in the implementation is good but there is one wire format
11:56:48 ChrisL: the decompressor does not have switches and doesn't care how the stream was made
11:56:52 raph: right
11:57:08 christopher: effect of per-table?
11:57:42 raph: with gzip, a continuous stream is on average not optimal, so it's a small benefit to compress per table
11:57:59 ... because the compressor uses stats from the previous table which are not good
11:58:14 ... no significant effect on decompression speed
11:59:02 present+: sergei, joe_vieira
12:00:05 raph: preference is to simplify the spec as brotli-only byte-level compression with whole-font-only compression
12:00:36 Vlad: need to revisit the per-table option as it limits optimisations; they need the whole font to work
12:02:28 jyrki: plan to complete the spec for brotli in 6 weeks
12:02:59 sergei: is it "too complex to explain" like lzma?
12:03:11 jyrki: no! much like flate
12:03:28 Vlad: starting from flate makes it easier
12:04:15 raph: also we understand the importance of a solid spec, for security review for example. lzma did not have that
12:07:04 ChrisL: (explains benefit of early spec publication to stimulate review)
12:07:26 raph: we are within 5% of lzma which is very positive, given it's also way faster
12:09:03 Vlad: want a very clear statement of what exactly we are measuring, because we are mixing mtx and entropy coding together
12:10:47 (we agree what the baselines are and that we are comparing apples to apples)
12:11:18 raph: we will be sharing the spreadsheet of detailed results; it has exact byte counts per option.
12:12:42 jyrki: brotli with filtering is 1.67x faster than lzma, but with no preprocessing it is 2x faster
12:13:28 raph: expect profiling will improve on 2x faster than lzma
12:14:12 jfkthame: looked at comparisons to other flate-like codecs such as oodle
12:14:28 ... or lzham
12:14:39 ... question is, have you compared them
12:14:42 raph: yes
12:15:21 raph: not in the slides but we have looked at it. we don't have an apples-to-apples comparison for lzham
12:15:57 raph: lzham has a very expensive startup cost
12:16:41 jyrki: lzham works badly on small fonts, catches up on large fonts but does not match what brotli does
12:17:19 ... [brotli is] better in compression speed and compression ratio
12:17:36 christopher: what do you mean by small and large fonts here?
12:18:44 raph: on the Google Fonts corpus the threshold is 2 Mbytes, so mainly CJK fonts, up to 4.7 Mbytes
12:19:14 raph: lzham has also not been actively developed in the last year, not production quality
12:19:53 raph: oodle is proprietary, but looking at reported perf it's similar to lzma, a little faster; there is another profile tuned for decompression speed
12:20:16 ChrisL: so in summary neither is better than brotli
12:20:45 jfkthame: need to show the others were looked at, otherwise it's an obvious open question
12:21:49 jyrki: need to use the current head of lzham and the packaging is clumsy, needs work to be robust. Makes it difficult to repeat.
12:22:23 ... it's built for huge corpora, not tuned for font file sizes; it barely gets started
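The per-table vs. continuous-stream trade-off discussed above (11:47 and 11:57) is easy to reproduce with any LZ-family codec. A toy sketch using Python's zlib as a stand-in for brotli (the function names and data are illustrative, not the corpus measurements): compressing tables as independent streams loses the cross-table redundancy a single continuous stream can exploit.

```python
import zlib

def per_table_size(tables: list[bytes]) -> int:
    # Each table compressed as an independent stream (the WOFF 1.0 model).
    return sum(len(zlib.compress(t, 9)) for t in tables)

def whole_font_size(tables: list[bytes]) -> int:
    # All tables concatenated into one continuous stream (the WOFF 2.0 model).
    return len(zlib.compress(b"".join(tables), 9))

# Toy data: two "tables" sharing redundancy across the table boundary.
tables = [b"glyf data " * 500, b"glyf data " * 500 + b"loca offsets " * 100]
print(per_table_size(tables), whole_font_size(tables))
# The continuous stream comes out smaller because back-references can
# cross table boundaries (within the window size).
```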
12:46:32 links for other compression algorithms: https://code.google.com/p/lzham/
12:47:38 Oodle: http://www.radgametools.com/oodlecompressors.htm
12:48:03 Oodle (blog post): http://cbloomrants.blogspot.nl/2012/09/09-22-12-oodle-beta-and-roadmap.html
12:48:10 https://code.google.com/p/font-compression-reference/
12:48:49 direct link to brotli code: https://code.google.com/p/font-compression-reference/source/browse/#git%2Fbrotli%2Fdec
12:51:56 Topic: Brotli motivation
12:52:15 jyrki: the starting point was to use flate with a bigger window size
12:52:52 ... we had prior experience with the flate family
12:52:59 one of the implementations of flate compression is Zopfli: https://code.google.com/p/zopfli/
12:53:21 ... noticed other things to improve, because a larger window alone was not a significant improvement
12:54:15 jyrki: entropy code reuse, window size, context modelling, literal count coding, distance cache, match lengths, and block end codes are more efficient
12:55:16 ... brotli can re-use past entropy codes within the same metablock
12:55:57 ... in flate there is a 50 byte overhead to change code; brotli can reuse any previous entropy code
12:56:46 ... codes are referred to by number and also by order (second to last)
12:58:07 ... window size: past 16MB rather than 32kB
12:58:32 ... discussion of whether 8 or 4 MB would be better for the decoder
12:58:51 jfkthame: impacts memory usage of the decoder
12:59:17 ... will not fit in RAM on low-end devices or in cache on medium-end ones
12:59:40 behdad: also most fonts are smaller than 4MB
13:00:25 zoltan: the size of the entire stream is sent in the header, so we never need to allocate more than the eventual decompressed size
13:01:04 jyrki: only make long references for large data blocks, so the cost of exceeding the cache is justified
13:01:25 raph: mainly want to see if the ratio is stable across architectures
13:01:56 jyrki: desktop and mobile cache architectures are becoming more similar
13:02:36 kenji: avoid picking too low a value that needs to be changed in the future
13:02:55 jyrki: prefer a larger limit with a hint to use lower values in practice
13:03:34 raph: gzip allows the window size to be specified (but in the header, not the flate stream)
13:04:14 zoltan: encode the maximum backward reference (log2 window size) in the stream
13:04:59 jyrki: paying attention to the past two or three bytes gives more specific and thus more efficient entropy coding. this is called context modelling
13:05:11 ... 5% gain in compression density
13:05:29 e.g. a "." is much more likely to have a " " after it
13:06:31 jyrki: the number of pure literals is counted, then come that many literals coded with the entropy code alone
13:08:03 ... a count of bytes, followed by huffman-coded bytes, then a back reference
13:08:25 ... so a different way to express block end. inherently faster to decode
13:10:18 ChrisL: so removing a decision from an inner loop
13:11:00 jyrki: backward references can be described in terms of the four past distances, or the last two with a +/-3 offset
13:12:36 ... can cope with small insertions or deletions in otherwise similar strings
13:14:09 jyrki: flate has match lengths of 3-258 bytes. brotli has match lengths of 2 as well. no longer codified separately but jointly with the length of literal sequences. works well with pulse-repeating data
13:14:39 ... match lengths codified in a joint entropy code with literal lengths
13:14:54 ... they have surprisingly high correlation
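To illustrate the context modelling idea jyrki describes (the previous bytes select the entropy code), here is a toy measurement sketch. It is mine, not brotli's actual context map: it compares the cost of one shared entropy code against codes conditioned on the preceding bytes, and it ignores the real-world cost of transmitting the code tables themselves.

```python
import math
from collections import Counter, defaultdict

def entropy_bits(counts: Counter) -> float:
    # Ideal cost, in bits, of coding all occurrences with one entropy code.
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())

def order0_bits(data: bytes) -> float:
    # One shared code for every byte (no context modelling).
    return entropy_bits(Counter(data))

def context_bits(data: bytes, order: int = 2) -> float:
    # The previous `order` bytes select which code is used for the next
    # byte, brotli-style (model transmission cost ignored here).
    buckets = defaultdict(Counter)
    for i in range(order, len(data)):
        buckets[data[i - order:i]][data[i]] += 1
    return sum(entropy_bits(c) for c in buckets.values())

data = open(__file__, "rb").read()  # any sample data
print(order0_bits(data) / 8, context_bits(data) / 8)  # context model is smaller
```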
13:15:41 jyrki: in flate, any symbol could be the end code; in brotli the block length is coded at the beginning, so loop unrolling can be used
13:16:16 ChrisL: very clear, thank you
13:16:44 Vlad: how hard is it to mess up the compression stream to get at vulnerabilities? false lengths etc
13:17:35 jyrki: compression can be exploited to find protected memory. a longer window extends the attack surface
13:17:48 ... no other ways spring to mind
13:18:24 raph: decompression is inherently high risk, it gets untrusted data.
13:19:02 ... chrome looks at any new code with a rigorous security review. not yet started for brotli but was in progress for the older woff2 (lzma)
13:19:37 kenji: yes, probing with random data was started. then code review by a security engineer
13:20:05 raph: fuzz testing uses mutations of good data. random data checks widely different branches
13:20:40 ... automated large scale testing
13:21:09 Vlad: can the results for brotli be made available?
13:22:07 kenji: vulnerabilities will be disclosed and patched, but the specific bad sequence is typically not disclosed
13:22:29 jyrki: total code size is fairly small, 1200 lines of C
13:22:45 Topic: Outline of the Brotli specification
13:23:33 raph: showing what the spec sections would look like
13:24:55 zoltan: (slides will be made available)
13:26:18 zoltan: ring buffer of the last four distances, reused within +/-3 of the last two, giving 16 distance codes
13:27:51 encoding of commands in buckets of powers of two, similar to flate
13:29:24 zoltan: short, medium and long categories of literal and copy lengths, with 8 buckets each. joint histogram
13:30:52 ... the last distance is specially coded as it recurs frequently
13:32:19 ... then build a regular huffman code over that histogram
13:34:36 ... for fonts, data is 4-byte aligned, so we can fix the last number of bits. this assumes the distribution of extra bits is flat, but it's not, so we have buckets with the last k bits fixed
13:35:08 ... text, for example, does not have the 4-byte aligned structure
13:36:08 zoltan: the huffman code says how many bits each symbol will need. then you make a histogram and create a huffman code.
13:36:34 ... the sequence of bit lengths has run length coding to encode missing symbols
13:37:06 bit lengths 0 to 15 plus three special codes, giving a 19 symbol alphabet
13:37:18 ... deflate has a different order.
13:40:03 zoltan: block splitting. independent block boundaries for literals, commands and distances
13:41:20 zoltan: encoding of block switch symbols
13:42:15 ... (see table on slide)
13:43:00 zoltan: encoding of the block split
13:43:55 zoltan: context modelling for literals
13:44:41 ... B-3 to B-1 are from already decoded bytes
13:45:05 ... look up in the context map, find an index to a huffman code used to encode the next byte
13:45:41 zoltan: context modelling is sensitive to the type of data. have one or two models for fonts, others for ascii and utf-8 data
13:47:13 ... also a simpler version where the context is the copy length; the distribution for small copies is much different to larger ones. so check for length 2, 3, or 4 and more
13:48:54 zoltan: context map encoding
13:49:11 ... can disable context modelling in 1 bit
13:49:41 ... this part is still under discussion
13:50:35 ... move-to-front transfers many symbols to zero
13:51:12 zoltan: format specification. (see syntax diagram on slides)
13:53:17 ChrisL: max size is 2^(8x7)
13:53:27 (we agree this is plenty)
13:53:54 (see syntax diagrams on following slides)
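A minimal sketch of the distance ring buffer zoltan describes (16 distance codes derived from the last four distances). The class and method names are mine, and the code assignment and initial values follow the brotli format as eventually published, which may differ in detail from the draft discussed at this meeting:

```python
class DistanceCache:
    """Toy model of brotli's recent-distance codes (illustrative only)."""

    def __init__(self):
        self.ring = [16, 15, 11, 4]  # brotli's documented initial distances

    def resolve(self, code: int) -> int:
        # Codes 0-3: the four most recent distances, newest first.
        if code < 4:
            return self.ring[code]
        # Codes 4-15: last or second-to-last distance adjusted by -1..+3 /
        # -3..-1 (never 0), which copes with small insertions or deletions
        # in otherwise similar strings. A real decoder must reject a
        # non-positive result.
        which, delta = divmod(code - 4, 6)
        offsets = [-1, 1, -2, 2, -3, 3]
        return self.ring[which] + offsets[delta]

    def push(self, distance: int):
        # A newly used distance moves to the front of the ring.
        self.ring = [distance] + self.ring[:3]
```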
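The bit-length scheme above is the same two-level idea as deflate: the code lengths are themselves entropy coded, and the lengths alone determine the actual codes. A sketch of that second step, building a canonical huffman code from a list of bit lengths (illustrative, not the decoder's actual table construction):

```python
def canonical_code(bit_lengths: list[int]) -> dict[int, tuple[int, int]]:
    # Map symbol -> (code, length), assigning codes in canonical order:
    # shorter codes first, ties broken by symbol value. Length 0 = unused.
    symbols = sorted((l, s) for s, l in enumerate(bit_lengths) if l > 0)
    code, prev_len, out = 0, 0, {}
    for length, sym in symbols:
        code <<= (length - prev_len)  # left-extend the code as lengths grow
        out[sym] = (code, length)
        code += 1
        prev_len = length
    return out

# Example: lengths (2, 1, 3, 3) give the prefix-free codes 10, 0, 110, 111.
print(canonical_code([2, 1, 3, 3]))
```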
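The move-to-front transform mentioned for context map encoding is small enough to sketch in full (the sketch is mine; the idea is that recently seen values get small indices, so runs of the same context-map value become runs of zeros, which then compress well):

```python
def move_to_front(values: list[int], alphabet_size: int) -> list[int]:
    # Replace each value with its current position in a recency-ordered
    # table, then move that value to the front of the table.
    table = list(range(alphabet_size))
    out = []
    for v in values:
        i = table.index(v)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

print(move_to_front([3, 3, 3, 7, 7, 3], 8))  # [3, 0, 0, 7, 0, 1]
```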
14:29:36 Topic: Evaluation report
14:30:02 Vlad: let's review, then agree to publish. it's a long overdue deliverable, so we should say what we have been up to.
14:30:33 kenjibaheux
14:30:42 scribenick: kbx
14:30:58 http://www.w3.org/Fonts/WG/WOFF2ER/
14:33:06 based on the last meeting's discussion, added some requirements
14:33:36 added the CSS3 Fonts parallel deployment requirement
14:33:47 clarification of the meaning of "lossless"
14:35:29 "MicroType Express style plus LZMA" => "Preprocessing + LZMA", with an explanation that the preprocessing is based on MicroType Express
14:36:37 appendix A will contain all the data we had about LZMA
14:38:06 about the decompression and memory requirement section: replace "entirely due" with "mostly due" and use appendix A for the actual data
14:38:32 how about qualifying the "memory requirement" bit?
14:39:06 jyrki: context modelling and backward references. I believe the latter dominates for LZMA.
14:40:03 raph: (basically 2x of gzip)
14:42:37 Candidate A: Preprocessing plus Brotli Compression Algorithm
14:42:55 will add what was presented today
14:43:10 and eventually publish this report
14:43:17 can be updated as we wish
14:44:06 Vlad: if we could agree within this meeting, given that we have an understanding of what will go into the report
14:44:45 action: ChrisL to update the evaluation report with information from today
14:44:46 Created ACTION-119 - Update the evaluation report with information from today [on Chris Lilley - due 2013-10-18].
14:45:45 resolution: publish the evaluation report
14:47:52 Data for the Brotli comparison over the Google Fonts corpus
14:47:54 Per-font comparison (Google Fonts): https://docs.google.com/spreadsheet/ccc?key=0AjdKc3tA4Jb0dGJpeGVxN21ONzVDX3I5SXNPdzhPQmc&usp=sharing Whole corpus comparison with different options (Google Fonts): https://docs.google.com/spreadsheet/ccc?key=0AjdKc3tA4Jb0dFJGZFk3NzZQNENKczJGVDZ3QTk2T1E&usp=sharing
14:48:58 Showing graph on the third sheet of the per-font spreadsheet
14:52:17 raph: (comments about the spreadsheet; examples of important things to call out: average compression, worst case; the max is from a font that has inherent design issues)
14:54:41 discussion about rejecting invalid fonts. WOFF 2.0 does not have strict checks (this is more an OTS-like level issue).
14:55:04 would be interesting to run an experiment on a set of sloppily put-together fonts.
14:56:04 raph: another interesting figure: Korean fonts (well coded fonts) with excellent compression gains
14:56:42 ChrisL: (is this because of inherent characteristics of Korean characters?)
14:57:35 Raph: (it's more about how the font is built)
14:58:04 ChrisL: what about ordering changes?
14:58:32 Raph and kbx: we had this graph but it's gone from the spreadsheet. nothing wild, noisy => no major insights to gain from there (we think)
14:59:45 ChrisL and Raph: pointing out that the final compression gains achieved by Brotli outperformed our initial goal of half of WOFF 2.0's (LZMA) gains
15:00:14 Whole corpus spreadsheet, explanations of the graphs
15:01:48 Raph, David: LZMA compression gains also improved slightly with a continuous stream
15:02:27 Raph and David: (pointing out that the decompression speed is the current state and could be improved further, since we haven't spent time optimizing for this yet)
15:04:39 Raph: the bottom half of the spreadsheet is for the large fonts corpus
15:06:26 Raph: (decompression rate comparisons of per-stream vs continuous-stream overhead, column H lines 2-5 and 7-9)
15:07:09 Topic: Home of the compression specification
15:07:11 (scribing earlier discussion): the main goal of this work is to optimize fonts that are already well optimized
15:07:20 scribenick: ChrisL
15:07:54 Vlad: open question; assume google has someone involved in compression at the ietf
15:08:05 it would be possible to optimize fonts with (for example) very large kern tables but it's not a goal
15:08:19 raph: I assume so but not sure who that is
15:08:49 it might be worthwhile to rerun the experiment over a corpus that has already had the kern tables optimized, for example using the tool that Adobe presented on Wednesday at ATypI
15:08:58 Vlad: two options: google does it themselves, or it is submitted to w3c as a proposal and then liaison
15:09:39 that experiment would separate out the effect of cleaning up sloppily encoded fonts vs general effectiveness
15:10:10 (scribe, clean-up order)
15:10:41 Vlad: don't want us held up finalizing our spec if it's not clear who picks it up
15:14:27 ChrisL: (explained upcoming changes to which level of specs can be referenced)
15:15:15 Raph: suggesting that we reach out to our IETF representatives for advice on how to proceed further. That should be a matter of a 15 minute discussion.
15:15:45 ChrisL: "mostly aimed at fonts", well what else can it do? if the answer is "http" then you get other folks involved and that might induce delays...
15:16:11 Raph: a good example would be Flate (IETF)
15:17:09 Vlad: an additional question would be to find out if it matters who reaches out
15:17:33 scribenick: kbx
15:20:02 Raph: (if it happens that we have difficulties reconciling with other groups, we have a good case to make given our use case and the data on hand)
15:20:38 Vlad: once the evaluation report is published, we start working on the spec
15:20:39 Topic: Plans for the WOFF2 spec
15:21:10 Vlad: target date for the first draft of the spec?
15:21:23 Raph: the byte-level spec or the WOFF 2.0 spec?
15:21:32 Vlad: WOFF 2.0
15:21:41 Raph: co-editors?
15:22:04 Raph: (volunteering)
15:23:13 ChrisL: (explaining alternatives to deal with co-chairing and editorship). Should not be an issue, and if it becomes one we can always find a solution.
15:23:28 Vlad: (agreed to be the editor)
15:23:37 Raph: (welcomed the proposal)
15:24:19 Raph: the target date is not blocked on Brotli
15:24:45 Vlad: need the ability to point somewhere for Brotli
15:24:55 Raph: end of year sounds feasible
15:25:52 Vlad: this is more about an internal draft for the group
15:27:19 ChrisL: the first public working draft can be quite rough, but it defines the work, so it's good to sketch a large scope area (due to patent-related matters)
15:28:29 ChrisL: (explaining how patents work in w3c; patent exclusions)
15:29:30 ChrisL: so before we publish we should be careful about the scope
15:30:46 Raph: the scope of this document would be almost the same as the prior write-up. I don't see the scope changing in any way at all, because even if we had to turn around on the choice of the encoder, we haven't found opportunities to make changes to the preprocessing part.
15:32:00 many opportunities to do clever transformations in the preprocessing step help when you have gzip, but they are not effective when you have a more sophisticated byte-level compression format
15:32:23 Vlad: end of the year for the internal draft sounds good
15:32:53 Vlad: will be bogged down for the next month or so, but Raph could start with an initial draft
15:33:14 Raph: (agreeing with the intent of focusing on the content rather than the spec wording aspects)
15:33:45 ChrisL: needs boilerplate, in HTML; normative and non-normative parts should be clearly called out
15:34:06 Vlad: ChrisL, if you could send links to the procedures and the tools used
15:34:10 vlad: the empirical measurements bear out transforms having relatively less impact when using better byte-level compression
15:34:50 ChrisL: there is ReSpec (http://www.w3.org/respec/) which does the boilerplate and so on
15:35:00 Topic: Font media type
15:35:28 Vlad: ChrisL, did you have time to ask around (follow-up from the previous meeting)?
15:35:53 Vlad: the original concern was that it seemed like an uphill battle. Is this still true?
15:36:18 ChrisL: (unfortunately, this is still true to some extent)
15:36:51 ChrisL: noting that this isn't on the critical path for WOFF 2.0
15:37:10 (Adobe tool that optimizes kern tables, discussed earlier: https://github.com/adobe-type-tools/python-modules/blob/master/WriteFeaturesKernFDK.py)
15:37:24 ChrisL: if we can't get it done in time, we do as we did for WOFF 1.0
15:38:08 ChrisL: not opposed, will do it if the group wants me to, but suggest that we focus on WOFF 2.0 first
15:38:47 Vlad: it takes almost no time to put together a draft of that top-level application, so let's write it down, send it, and take the room temperature
15:39:16 Vlad: (should be enough to find out how fast this will move)
15:39:47 action: vlad to prepare a new draft for top level media type font
15:39:47 Created ACTION-120 - Prepare a new draft for top level media type font [on Vladimir Levantovsky - due 2013-10-18].
15:39:57 action: kuettel (david) to prepare a new draft for top level media type font
15:39:57 Created ACTION-121 - (david) to prepare a new draft for top level media type font [on David Kuettel - due 2013-10-18].
15:41:04 Topic: Replay of the ATypI presentation about the progress of webfonts
15:41:10 -short break-
16:00:14 (going over the ATypI kickoff presentation about the takeoff of webfonts)
16:00:38 http://www.atypi.org/atypi-amsterdam-2013/amsterdam-programme/activity?a=255
16:22:42 goo.gl/5HeqYf
16:23:06 Which resolves here: https://docs.google.com/a/chromium.org/document/d/1hrPWqBHMVlGmbAc8t4xQyvr4vC9WqCLtM1-DH6vEP7k/edit
16:28:20 Link to the presentation: https://docs.google.com/a/chromium.org/presentation/d/16FmH1n8JZXurZW7UUSL5B8Fdi7-69LOh4Qy188QIQic/edit#slide=id.p
16:31:03 adjourned
16:32:42 Vlad: many thanks to google for hosting the meeting, and in particular to the google compression team; it has been tremendous to see the work and the priority you have given it!