myles: yes
Vlad: scribe?
<Vlad> scribenick: Persa_Zula
Vlad: let's go over action items and updates from Garrett from the email list last week
myles: Is screensharing a demo of
the tool he built. The left-hand area shows the fonts. This
example shows that with the range-based approach, you can never
do 90% of the font.
... you need to know the overhead of the round-trip times as
well to help estimate the optimization. Once you have a font to
optimize, a representation of the web to optimize for, and an
RTT, it will move glyphs around to optimize the file.
... The results he found are between 70% and 90% on this
fitness function. The tool is looking good when trying this with
a bunch of different fonts. The left panel shows the font file
and the font glyph size. Learned from the GFonts corpus that the
biggest fonts in GFonts are not Chinese, they are Korean. Next
he would want to gather Korean webpage data and try the test
with Korean fonts.
... overall observation is that the results are
encouraging.
jpamental: is the takeaway from this that to serve 16k URLs, you would only need to serve about 8% of the font data? Is that how you read it?
myles: what the results show is that for any individual big font, downloading any webpage would take between 5-25% of the font data
Vlad: how many roundtrips would it take to get what you need?
myles: this doesn't show the
data, but it would be easy to expose. I don't know the answer
right now
... not exposed because it's part of the cost function
... unit of RTT is bytes, not time.
... if you sort the font really poorly, you can get 0% because
the overhead might be worse than the font file load
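The cost model described above (RTT overhead expressed in bytes, glyph order determining the ranges a page must fetch) can be sketched roughly as follows. This is a hedged illustration, not the real tool: the function names, the per-request overhead value, and the representation of glyphs and pages are all assumptions.

```python
# Illustrative sketch of a range-request fitness function: a glyph
# order determines which byte ranges a page must fetch, and each
# range request adds an RTT overhead expressed in bytes.

def bytes_for_page(order, glyph_sizes, needed, overhead):
    """Bytes to render one page: contiguous runs of needed glyphs are
    fetched as one range request each; every gap starts a new request."""
    positions = {g: i for i, g in enumerate(order)}
    starts, pos = {}, 0
    for g in order:
        starts[g] = pos
        pos += glyph_sizes[g]
    idx = sorted(positions[g] for g in needed)
    total, run_start, prev = 0, None, None
    for i in idx:
        if prev is not None and i == prev + 1:
            prev = i              # extend the current contiguous run
            continue
        if prev is not None:      # close out the previous run
            g0, g1 = order[run_start], order[prev]
            total += starts[g1] + glyph_sizes[g1] - starts[g0] + overhead
        run_start = prev = i
    if prev is not None:
        g0, g1 = order[run_start], order[prev]
        total += starts[g1] + glyph_sizes[g1] - starts[g0] + overhead
    return total

def fitness(order, glyph_sizes, corpus, overhead=1500):
    """1.0 = nothing fetched; 0.0 = as costly as fetching the whole font."""
    full = sum(glyph_sizes.values())
    spent = sum(bytes_for_page(order, glyph_sizes, p, overhead) for p in corpus)
    return 1 - spent / (full * len(corpus))
```

Under this model a badly sorted font scatters each page's glyphs across many runs, so the per-request overhead can indeed push the fitness to 0%, as noted above.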
Vlad: you're optimizing font for a particular piece of content you're accessing?
myles: yes, that's in the corpus
file. That came from the webcrawl of 100k URLs he did earlier,
in a JSON object. In the app he can export that JSON file. In
the demo, it doesn't look at all 100k. He took a random sample
of 1% of URLs to show us the demo
... Optimizing font file for a particular corpus
... you can select the random sample size. And then optimize
for that percentage. This percentage fitness function on the
right is insensitive to the size of the random sample. You get
roughly the same fitness no matter which sample size you work
with.
... this means you don't have to look at all 100k to find the
smaller subset
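The sampling observation above can be sketched in a few lines. This is illustrative only: `fitness_on` stands in for whatever fitness function is being evaluated over a list of pages, and the sample fraction is the 1% mentioned in the demo.

```python
# Sketch: fitness measured on a small random sample of pages tracks
# the full-corpus fitness, so the optimizer need not scan all 100k URLs.
import random

def estimate_fitness(corpus, fitness_on, sample_frac=0.01, seed=0):
    rng = random.Random(seed)                     # fixed seed: repeatable demo
    k = max(1, int(len(corpus) * sample_frac))
    return fitness_on(rng.sample(corpus, k))
```

If the estimate is insensitive to `sample_frac`, as reported above, the cheap sample is a safe proxy for the whole corpus.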
Vlad: the optimized font you get is "good to go" for the rest of the pages that exist on the internet?
myles: as long as your corpus is
representative, yes
... his corpus right now is Chinese, but if he picked a Korean
font, it wouldn't be representative
Vlad: this gives us enough data to compare this approach against other approaches we've discussed
myles: have tried with many large fonts on MacOS, Google Fonts, and would like to try this with large fonts that Monotype has shared
Vlad: hasn't shared anything yet because he's hit a snag with the online sharing tool
myles: will wait for Vlad to report back on that
Vlad: wants to get composite Asian fonts - not just "lots of glyphs", but also complicated composites
myles: this app flattens all composites
Vlad: in my experience, if you compare regular design vs composite, sometimes you get 2x or 3x size advantage with the composite
ned: this is like subroutines in
TTFs in the Google experience
... typically we don't see CJK fonts being built that way in the
modern world because they're tricky, maybe at Monotype
... very few fonts, he thinks, have been built with subroutines;
MS did ship a version with fancy features a few years ago;
this is a good time to decide a policy on composite glyphs --
not sure they're a critical part of CJK fonts today
Vlad: I agree, but it's a useful experiment to expand our tests to see if it has an effect
ned: This goes back to the question about what fonts are able to provide. No Apple system fonts use this technique
Garret: Can you hook this into the simulation framework on GitHub?
myles: the gate was to try it on a
few fonts. This app was designed to take the next step to
contribute the code to the framework
... the results ARE encouraging, so I think now it's time to
incorporate this into the GitHub repo. This app is not written
in Python
Garret: two pieces here -- the optimizer: this doesn't have to be Python. The only requirement is that it can run on Linux and OSX; assuming it's OSX currently
myles: there's two frontends -- CLI and GUI. The app is written in Swift. There's a Linux version of Swift, so it's feasible to run on other OSes
Garret: a CLI would be great if you can't port the GUI over; a CLI would be enough for us to run that optimization
Vlad: worst-case workaround -- a separate set of pre-optimized fonts can be used
Garret: it would be good to have
the code available so someone who's not us can reproduce the
results
... 2nd piece - simulation code for this method.
... the simulation feeds the sequences of codepoints - request
size and response size are needed
... happy to help work with Myles
myles: discovered that results are appealing as they are, which means there's probably no need for a new font file format. The results are still an opentype font
Vlad: any updates here?
myles: in discussion with
compression team but no update to share
... it would be valuable to get partial results that don't
include server compression. The Apache default is to use zlib.
Each range request gets compressed with zlib individually. But
he hasn't done any investigation here.
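The per-range compression concern above is easy to illustrate with the standard library: compressing each range individually loses the shared history that whole-file compression enjoys, so the per-range total comes out larger. The payload and range size here are synthetic stand-ins, not measurements of any real font or server.

```python
# Sketch: whole-file zlib compression vs. compressing each range
# request's bytes individually, as a per-response server would.
import zlib

data = b"glyph-table-entry;" * 2000              # stand-in for font bytes
ranges = [data[i:i + 1200] for i in range(0, len(data), 1200)]

whole_file = len(zlib.compress(data))
per_range = sum(len(zlib.compress(r)) for r in ranges)
# per_range exceeds whole_file: every range restarts compression
# from an empty window and pays its own stream headers.
```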
Garret: patch-subset, we apply brotli to the request and include that in the cost of bytes needed. Can include this in simulation code
myles: the last piece on the app ( https://www.w3.org/Fonts/WG/track/actions/212 ) is that it has no export button :D
Vlad: it's important because that's the intermediate product to transfer from one project to the next if we can't port the tool itself
Garret: Adobe provided some data
on the performance: for these codepoint loads, how many bytes
are needed to serve the request.
... was able to compare with other methods and sent to the
email list
... different methods are:
... Adobe Skytype is the first method; it supports CJK, and most
of the data is for CJK, so just CJK results are in the
simulation
... Google Fonts unicode-range; breaking down the fonts and
having the client choose one
Garret: next is "optimal"; in
this approach you produce one subset that has EXACTLY the
data you need for the page
... patch/subset method -- compute 2 subsets, compute the
differences, apply the patch
... whole font method - just sending the WOFF2 compressed
version of the font, and nothing else
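The patch/subset method above can be sketched as: keep the client's previous subset, compute the new subset, and ship only a binary diff. This is a hedged illustration; real implementations use a proper binary-diff format and this `difflib`-based version, including its per-op header cost, is purely for exposition.

```python
# Sketch of patch/subset: diff two font subsets and ship only the delta.
import difflib

def make_patch(old: bytes, new: bytes):
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # client already has these bytes
        else:
            ops.append(("data", new[j1:j2]))  # only these go on the wire
    return ops

def apply_patch(old: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        out += old[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)

def patch_size(ops) -> int:
    """Approximate wire size: shipped data plus a small per-op header."""
    return sum(len(op[1]) if op[0] == "data" else 8 for op in ops)
```

When the new subset mostly extends the old one, the patch is a small fraction of the full subset, which is the intuition behind the 1.2x-of-optimal result reported below.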
... Results -- # of bytes compared to the optimal
approach
... the whole-font method sent 12x the bytes of the optimal
approach, whereas patch/subset only sent 1.2x
... computed cost of each method is another piece of data we
collected
... Details of cost function are detailed in the email
... ran the simulation across pageviews. The higher the number
is, the worse the experience is for the user
... did this simulation across a bunch of different network
conditions
... Takeaways - incremental transfer methods have far lower
costs than other state of the art transfer methods.
... some exceptions -- on high-RTT connections, the methods
that send fewer requests win out
... otherwise the costs are lower for incr. transfer
... other observations -- both Skytype and the patch/subset
method perform relatively close to each other, and both are
close to the optimal approach
... check out the email for the data & details
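The actual cost function is in the email and not reproduced here; the following is only a generic sketch of the kind of model described: total cost grows with both round trips and bytes, so network conditions decide which method wins. All numbers are made up for illustration.

```python
# Sketch: why high-RTT links favor methods that send fewer requests.

def transfer_cost(num_requests, total_bytes, rtt_s, bandwidth_bps):
    """Crude time estimate: one RTT per request plus raw transfer time.
    Higher means a worse user experience."""
    return num_requests * rtt_s + total_bytes * 8 / bandwidth_bps

chatty = dict(num_requests=10, total_bytes=100_000)      # incremental-style
one_shot = dict(num_requests=1, total_bytes=1_200_000)   # whole-font-style

slow = dict(rtt_s=0.3, bandwidth_bps=5_000_000)   # high-RTT link
fast = dict(rtt_s=0.01, bandwidth_bps=5_000_000)  # low-RTT link

slow_chatty = transfer_cost(**chatty, **slow)     # round trips dominate
slow_one_shot = transfer_cost(**one_shot, **slow)
fast_chatty = transfer_cost(**chatty, **fast)     # fewer bytes dominate
fast_one_shot = transfer_cost(**one_shot, **fast)
```

Under these made-up numbers the single big download wins on the high-RTT link while the incremental method wins on the low-RTT link, matching the exception noted above.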
Vlad: do you still plan to try to implement hybrid approach?
Garret: yes, want to try that out
to see performance of it. Likely that the hybrid approach might
work better. Maybe not for CJK but for other scripts
... tested the simulation with subroutinized and desubroutinized
fonts. Desubroutinized fonts produce lower overall byte
transfers
... don't fully understand why yet
Vlad: from the WOFF2 days, I seem
to vaguely remember we discovered that brotli compression is
better on desubroutinized fonts
... the main benefit comes from applying the patch.
Garret: Significant difference --
20% more bytes on the subroutinized one
... with this set of data, remapping seemed to make things a
little worse -- was surprised by that. Maybe more general data
across other scripts will show otherwise
... currently working on a general dataset which covers more
languages. hoping to have results soon.
... didn't share fonts with Adobe; this is only using data from
the open-source fonts in their system
... most requests are for Noto Sans and Source Sans
... this data is a pretty limited sample of things; not
sufficient to declare final results
... good first step
myles: one thing that might help with datasets -- we can use Wikipedia. There's also an open-source dataset called "Common Crawl" that could give us textual data, but no fonts
Garret: we can start with text data and insert the fonts at random
<myles> https://commoncrawl.org
Garret: a dataset they're working
on internally is text & fonts and how they're being used,
for the future
... if we can't publish our data, maybe we should have that
dataset as well
... next steps, get 2nd dataset results published and then
circle back to discuss
Vlad: one question -- if Monotype gets fonts into our hands to use for simulation, they won't be available to the general public. Once we have simulation results from a font set not available to the general public, is it a problem to publish those results if we don't give people enough to reproduce them themselves?
myles: we need something that's
reproducible
... if we publish other results, we also need to publish data
that is reproducible
Garret: if we publish two
datasets, one that is and one that isn't fully reproducible,
that would be fine
... if the public and non-public data don't agree with each
other, we would have to investigate why
Vlad: we might have to use
passwords for next meeting going forward -- Vlad will be in
touch
... hoping next meeting is April 27, let's keep in touch.
Remember you might need additional steps to join the
meeting.
Present: Persa_Zula, jpamental, sergeym, Vlad, Garret
ScribeNick: Persa_Zula
Date: 13 Apr 2020