W3C

- DRAFT -

Web Fonts Working Group Teleconference

13 Apr 2020

Attendees

Present
Persa_Zula, jpamental, sergeym, Vlad, Garret, myles, ned
Regrets
Chair
Vlad
Scribe
Persa_Zula

Contents

    Update from Myles
    Results from https://www.w3.org/Fonts/WG/track/actions/215
    Simulation results
    Next meeting
    Summary of Action Items
    Summary of Resolutions

myles: yes

Vlad: scribe?

<Vlad> scribenick: Persa_Zula

Vlad: let's go over action items and the updates from Garret on the email list last week

Update from Myles

https://www.w3.org/Fonts/WG/track/actions/212

myles: is screen-sharing a demo of the tool he built. The left-hand area shows the fonts. This example shows that with the range-based approach, you can never do 90% of the font.
... you need to know the overhead of the round-trip times as well to help estimate the optimization. Once you have a font to optimize, a representation of the web to optimize for, and an RTT, it will move glyphs around to optimize the file.
... The results he found are between 70% and 90% on this fitness function. The tool is looking good when trying this with a bunch of different fonts. The left panel shows the font file and the font glyph size. Learned from the Google Fonts corpus that the biggest fonts in Google Fonts are not Chinese, they are Korean. Next he would want to gather Korean webpage data and try the test with Korean fonts.
... overall observation is that the results are encouraging.

jpamental: is the takeaway from this that to serve 16k URLs, you would only need to serve about 8% of the font data? Is that how you read it?

myles: what the results show is that for any individual big font, downloading any webpage would take between 5% and 25% of the font data

Vlad: how many roundtrips would it take to get what you need?

myles: this doesn't show the data, but it would be easy to expose. I don't know the answer right now
... not exposed because it's part of the cost function
... the unit of RTT is bytes, not time.
... if you sort the font really poorly, you can get 0% because the overhead might be worse than the full font file load
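
(A minimal sketch in Python of the cost model described above, using hypothetical names; the app's actual code is in Swift. The key point is that round-trip overhead is charged in bytes, so a badly sorted font can score 0%:)

    def fitness(bytes_transferred, num_requests, rtt_bytes, full_font_size):
        # Hypothetical model: each round trip is charged a fixed
        # overhead expressed in bytes rather than time.
        cost = bytes_transferred + num_requests * rtt_bytes
        # A badly sorted font can need so many range requests that the
        # overhead exceeds a full download, clamping fitness to 0%.
        return max(0.0, 1.0 - cost / full_font_size)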

Vlad: you're optimizing font for a particular piece of content you're accessing?

myles: yes, that's in the corpus file. That came from the web crawl of 100k URLs he did earlier, stored as a JSON object. In the app he can export that JSON file. In the demo, it doesn't look at all 100k; he took a random sample of 1% of URLs to show us the demo
... Optimizing the font file for a particular corpus
... you can select the random sample size. And then optimize for that percentage. This percentage fitness function on the right is insensitive to the size of the random sample. You get roughly the same fitness no matter which sample size you work with.
... this means you don't have to look at all 100k to find the smaller subset
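
(A sketch of the sampling claim, assuming a hypothetical evaluate() callback that optimizes the font for a URL list and returns the fitness; per the discussion, the estimate is roughly stable across sample sizes:)

    import random

    def sampled_fitness(corpus_urls, fraction, evaluate):
        # evaluate() is a hypothetical callback standing in for the
        # optimizer; it returns the fitness for the given URLs.
        n = max(1, int(len(corpus_urls) * fraction))
        return evaluate(random.sample(corpus_urls, n))

    # Per the discussion, sampled_fitness(urls, 0.01, evaluate) should
    # come out close to sampled_fitness(urls, 0.10, evaluate).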

Vlad: the optimized font you get is "good to go" for the rest of the pages that exist on the internet?

myles: as long as your corpus is representative, yes
... his corpus right now is Chinese, but if he picked a Korean font, it wouldn't be representative

Vlad: this gives us enough data to compare this approach against other approaches we've discussed

myles: have tried with many large fonts on macOS and Google Fonts, and would like to try this with the large fonts that Monotype has offered to share

Vlad: hasn't shared anything yet because he's hit a snag with the online sharing tool

myles: will wait for Vlad to report back on that

Vlad: wants to get composite Asian fonts - not just "lots of glyphs", but also complicated composites

myles: this app flattens all composites

Vlad: in my experience, if you compare regular design vs composite, sometimes you get 2x or 3x size advantage with the composite

ned: this is like subroutines in TTFs in the Google experience
... typically we don't see CJK fonts being built that way in the modern world because they're tricky, maybe at Monotype
... he thinks very few fonts have been built with subroutines; MS did ship a version with fancy features a few years ago; now is a good time to decide a policy on composite glyphs, but he's not sure they're a critical part of CJK fonts today

Vlad: I agree, but it's a useful experiment to expand our tests to see if it has an effect

ned: This goes back to the question about what fonts are able to provide. No Apple system fonts use this technique

Garret: Can you hook this into the simulation framework on GitHub?

myles: the gate was to try it on a few fonts. This app was designed to take the next step, to contribute the code to the framework
... the results ARE encouraging, so I think now it's time to incorporate this into the GitHub repo. This app is not written in Python

Garret: there are two pieces here. The optimizer doesn't have to be Python; the only requirement is that it can run on Linux and OS X, assuming it's OS X currently

myles: there are two frontends, a CLI and a GUI. The app is written in Swift, and there's a Linux version of Swift, so it's feasible to run on other OSes

Garret: the CLI would be great if you can't port the GUI over; it would be nice if we can do that optimization from the CLI

Vlad: worst-case workaround: a separate set of pre-optimized fonts can be used

Garret: it would be good to have the code available so someone who's not us can reproduce the results
... the second piece is simulation code for this method.
... the simulation feeds in sequences of codepoints; request size and response size are needed
... happy to help work with Myles
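
(A sketch, with hypothetical names, of the simulation interface described above: each method is fed a sequence of codepoint sets and reports the request and response bytes per page view:)

    class TransferMethod:
        # Hypothetical interface: given the codepoints needed by the
        # next page view, return (request_bytes, response_bytes).
        def page_view(self, codepoints):
            raise NotImplementedError

    def simulate(method, page_views):
        # Total bytes on the wire over a sequence of page views.
        return sum(sum(method.page_view(cps)) for cps in page_views)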

myles: discovered that the results are appealing as they are, which means there's probably no need for a new font file format. The results are still an OpenType font

https://www.w3.org/Fonts/WG/track/actions/214

Vlad: any updates here?

myles: in discussion with the compression team but no update to share
... it would be valuable to get partial results that don't include server compression. Apache's default is to use zlib. Each range request gets compressed with zlib individually. But he hasn't done any investigation here.

Garret: for patch-subset, we apply Brotli to the request and include that in the cost of bytes needed. We can include this in the simulation code
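
(For the byte accounting described above, a sketch using the brotli Python module; the payload here is a stand-in, not the actual patch-subset wire format:)

    import brotli  # pip install brotli

    def request_cost(payload):
        # Bytes charged for a request after Brotli compression.
        return len(brotli.compress(payload))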

myles: the last piece on the app ( https://www.w3.org/Fonts/WG/track/actions/212 ) is that it has no export button :D

Vlad: it's important because that's the intermediate product to transfer from one project to the next if we can't port the tool itself

Results from https://www.w3.org/Fonts/WG/track/actions/215

Garret: Adobe provided some data on the performance: for these codepoint loads, how many bytes are needed to serve the request.
... was able to compare with other methods and sent the results to the email list
... the different methods are:
... Adobe Skytype is the first method; it supports CJK, most of the data is for CJK, and just the CJK results are in the simulation
... Google Fonts unicode-range: breaking down the fonts and having the client choose

Simulation results (https://lists.w3.org/Archives/Public/public-webfonts-wg/2020Apr/0005.html)

Garret: next is "optimal": in this approach you produce one subset that has EXACTLY the data you need
... patch/subset method: compute two subsets, compute the difference, apply the patch (sketched after this turn)
... whole-font method: just sending the WOFF2-compressed version of the font, and nothing else
... Results -- # of bytes compared to the optimal approach
... the whole-font method sent 12x the bytes of the optimal approach, whereas patch/subset sent only 1.2x
... computed cost of each method is another piece of data we collected
... Details of cost function are detailed in the email
... ran the simulation across page views. The higher the number, the worse the experience for the user
... did this simulation across a bunch of different network conditions
... Takeaways: incremental transfer methods have far lower costs than other state-of-the-art transfer methods.
... some exceptions: on high-RTT connections, the methods that send fewer requests win out
... otherwise the costs are lower for incremental transfer
... other observations: both the Skytype and patch/subset methods perform relatively close to each other, and both are close to the optimal approach
... check out the email for the data & details
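
(A minimal sketch of the patch/subset flow described above, with hypothetical subset() and diff() helpers; the real difference computation could be a binary patch, e.g. Brotli-based:)

    def serve_patch(font, have_codepoints, want_codepoints, subset, diff):
        # Server side: compute the subset the client already has and
        # the subset it needs, then send only the difference.
        old = subset(font, have_codepoints)
        new = subset(font, have_codepoints | want_codepoints)
        return diff(old, new)  # client patches `old` into `new`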

Vlad: do you still plan to try to implement hybrid approach?

Garret: yes, he wants to try that out to see its performance. It's likely the hybrid approach might work better, maybe not for CJK but for other scripts
... tested the simulation with subroutinized and desubroutinized fonts. Desubroutinized fonts produce lower overall byte transfers
... doesn't fully understand why yet

Vlad: from the WOFF2 days, seems to vaguely remember we discovered that Brotli compression is better on desubroutinized fonts
... the main benefit comes from applying the patch.

Garret: Significant difference: 20% more bytes on the subroutinized one
... with this set of data, remapping seemed to make things a little worse; was surprised by that. Maybe more general data across other scripts will show otherwise
... currently working on a general dataset which covers more languages; hoping to have results soon.
... didn't share fonts with Adobe. This is only using open-source font data from the ones in their system
... most requests are for Noto Sans and Source Sans
... this data is a pretty limited sample of things; not sufficient to declare final results
... a good first step

myles: one thing that might help with datasets: we can use Wikipedia. There's also an open-source dataset called "Common Crawl" that could give us textual data, but no fonts

Garret: we can start with the text data and insert the fonts at random (sketched below)

<myles> https://commoncrawl.org
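
(A sketch of the suggestion above, with hypothetical inputs such as page text extracted from Common Crawl: reduce each page to a codepoint set and pair it with a randomly chosen font:)

    import random

    def build_dataset(page_texts, fonts):
        # page_texts and fonts are hypothetical inputs; fonts are
        # assigned at random since the crawl has no font data.
        return [(set(map(ord, text)), random.choice(fonts))
                for text in page_texts]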

Garret: the dataset they're working on internally is text & fonts and how they're being used, for the future
... if we can't publish our data, maybe we should have that dataset as well
... next steps: get the 2nd dataset results published and then circle back to discuss

Vlad: one question: if Monotype gets fonts into our hands to use for simulation, they won't be available to the general public. Once we have simulation results from a font set not available to the general public, is it a problem to publish those results if we don't give people enough to reproduce them themselves?

myles: we need something that's reproducible
... we also need to publish data that is reproducible if we publish other data

Garret: if we publish two datasets, one that is and one that isn't fully reproducible, that would be fine
... if the public and non-public data don't agree with each other, we would have to investigate why

Next meeting

Vlad: we might have to use passwords for meetings going forward; Vlad will be in touch
... hoping the next meeting is April 27, let's keep in touch. Remember you might need additional steps to join the meeting.

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2020/04/13 16:59:31 $
