<Garret> Vlad: agenda: Garret sent an update on data using Monotype and other foundry fonts. Would like to review that data.
<Garret> Vlad: have a couple of questions about the differences between the two sets of results.
<myles> ScribeNick: myles
Garret: Vlad sent us some fonts
from Monotype that he thought would have interesting
characteristics and technical setups. I took those fonts and
the Adobe set of codepoint sequences, randomly assigned the
Monotype fonts to the Adobe sequences, and
ran it through the analysis again, to test
... What we saw was that performance was broadly similar to the
original Adobe data set. One difference was that transfer
efficiency is lower overall for most of the methods: about 20%
more extra bytes sent, putting it at 62% more bytes
than the optimal approach
... let me open it up and look at it again...
... Beyond that, we see some of the same trends on fast or low
RTT connections. Once you get into a situation where large
numbers of requests are heavily penalized, a single whole-font
load is the best approach
Vlad: if we look at the numbers and consider them to be our coarse estimate, the expectation is that the estimate depends on the font size itself. It was surprising that the whole-font approach produced almost identical results.
Garret: You're right. Usually the cost function is based on the size of the font. The cost function we're using caps out at a maximum: at 3 seconds, most browsers will give up on a font load. After that, any additional time isn't counted against you, so taking longer doesn't make things worse. I think what we see is that whole-font loads of CJK fonts, on most slow connections, consistently take longer than the threshold
Vlad: I see that on slow connections,
when we evaluate the optimal or patch/subset approach, there are
much larger cost-function values recorded. Those don't seem
to be capped.
... When you look at the optimal column, in the mobile 2G scenario,
you have a cost-function value of over 20 million. Yet at the
same time, whole font is over 5 million.... Basically, I'm not
sure how to reconcile them now, because it seems like some are
capped and some aren't
Garret: One thing to realize in the whole font approach is it leverages caching. If there's a sequence of 5 pages, the first page will have the big penalty, but the remaining 4 pages will be very fast. But in incremental transfer, you could have 1 font load on each of the 5 pages. If you have a really bad connection, where a single page view will include almost 3s of RTT, then you'll hit the maximum cost for all of those 5 pages. So you end up with a
higher total cost
Vlad: So the larger numbers we see are the sum of 5 different cost functions for 5 loads
Garret: The average page sequence
length is 5, and you see a ~5x multiplier. So that's why.
... In the original analysis I ran, it didn't have the capping,
and that gave skewed results. Some loads take 30s, and that
threw everything off.
... Capping it does match with the reality of how font loads
are handled in most browsers
... But i'm up for having a discussion about that
further.
... The cost function isn't set in stone
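The capped cost model Garret describes can be sketched roughly as follows. This is a hypothetical illustration only: the function names, the simple bandwidth/RTT formula, and the exact 3-second cap behavior are assumptions, not the group's actual analysis code.

```python
# Hypothetical sketch of the capped cost model discussed above.
# Names and formulas are assumptions, not the actual analysis code.

CAP_MS = 3000  # most browsers give up on a font load after ~3 seconds

def load_cost_ms(bytes_sent: int, bandwidth_bps: float,
                 rtts: int, rtt_ms: float) -> float:
    """Time to deliver bytes over a connection, capped at CAP_MS."""
    transfer_ms = bytes_sent * 8 / bandwidth_bps * 1000
    return min(transfer_ms + rtts * rtt_ms, CAP_MS)

def session_cost(page_loads_ms):
    """Sum per-page costs, each individually capped.

    A whole-font approach pays once and then hits the cache; an
    incremental approach pays something on every page.
    """
    return sum(min(t, CAP_MS) for t in page_loads_ms)

# Whole font on a slow link: one big capped hit, then 4 cache hits.
whole_font = session_cost([9000, 0, 0, 0, 0])               # -> 3000
# Incremental on a very slow link: each of 5 page loads nears the
# cap on its own, which yields roughly the ~5x multiplier observed.
incremental = session_cost([2800, 2800, 2800, 2800, 2800])  # -> 14000
```

This illustrates why, on a 5-page sequence over a very bad connection, incremental transfer can total about 5x the whole-font cost even though it transfers fewer bytes.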
Vlad: ok.
... When I look at the original incremental transfer analysis
based on adobe dataset, for desktop fast, we have a significant
advantage shown in optimal or patch/subset, and significantly
better than whole font approach. In the new simulation results,
we still have an advantage between optimal vs patch/subset, but
it's very very minimal. The whole font approach has a very low
value. Why?
Garret: I don't know yet.
... It is probably because the fonts subset
differently.
Vlad: There's only a tiny margin of benefit between the
whole-font approach and optimal.
I'm not sure the numbers correctly capture the truth. I
cannot explain the difference.
... 17.5k for whole font vs 16.5k for optimal
Garret: I can look under the hood to investigate.
Vlad: If we consider a general CJK corpus, and we compare similar fonts using regular independent glyph outlines vs composites, on average composites halve the font size: those files are about half the size of typical regular TrueType fonts. I would expect the whole-font cost to be in that range, but I'm not sure why it's such a small number.
Garret: The cost function is
nonlinear. Reducing the font size by 1/2 isn't going to reduce
the cost by 1/2
... Depending on where you are in the cost function, if you're
on the aggressive downslope, reducing bytes will have
a big effect.
... Because the fonts have different sizes, that could explain
what's going on
Vlad: I would be curious if you could dig in and explain it.
Garret: I can do that this
week.
... Despite the fact that the costs are almost equivalent,
incremental transfer is transferring significantly fewer bytes
overall
Vlad: Yes. For the final outcome,
if we're trying to pick the winner, the fact that two drastically
different sets of font data produce similar results in the cost
function tells us that one strategy is the least expensive
throughout, regardless of conditions.
... Thank you. I'm curious why the whole-font cost function gets
reduced so much compared to the other data set. We can look at it
later
Garret: I can pick a few representative samples and show what happened for those samples. That should help explain.
Vlad: Thank you.
... Anything else?
<scribe> ScribeNick: Garret
Myles: in the CSS working group
at the f2f we talked about font issues and the topic of CJK
loading came up. I mentioned to the group that the webfonts
working group is working on making webfont loads faster, which
should help with CJK
... another member Mike mentioned that they currently do this.
There's a thing called linearized PDFs.
... the pdf has a sequence of pages, and can have modifications
appended to the end of the file.
... there's a linearized pdf format that doesn't have these
modifications.
... each page has what it needs. There's resources that are
used across pages at the end of the file, and pages have
reference to the end. So fonts are resources that any page
might reference.
... so Mike works on a PDF viewer. They implemented lazy
downloading of resources, so if you need only page 7, it will only
download the resources needed for page 7. They also did this at a
per-glyph level: it will only download the glyphs used by page 7.
... this was surprising because he had already implemented
essentially the same range-request approach I'm working on. His
tool is a real shipping tool which works, so we have an
existence proof that the range-request approach works in practice.
So I asked him to join the working group.
... his approach is almost identical to the approach I'm
using. One difference is he's using blocks, which my approach
does not do. The reason blocks are valuable is for page
sequences.
... will probably add a parameter for this which can be
tuned.
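The block idea Myles describes, fetching block-aligned byte ranges so that nearby glyphs share fetches, with the block size as a tunable parameter, could be sketched like this. The helper names and the 4096-byte default are purely illustrative assumptions.

```python
# Hypothetical sketch of block-aligned byte-range requests for glyph
# data. The block size is the tunable parameter mentioned above; the
# helper names and 4096-byte default are assumptions.

BLOCK = 4096

def block_aligned_range(offset: int, length: int, block: int = BLOCK):
    """Expand a byte range outward to block boundaries, so glyphs
    that sit close together in the file share a single fetch."""
    start = (offset // block) * block
    end = ((offset + length + block - 1) // block) * block - 1
    return start, end

def range_header(spans):
    """Build an HTTP Range header value for (start, end) byte spans."""
    return "bytes=" + ", ".join(f"{s}-{e}" for s, e in spans)

span = block_aligned_range(5000, 300)  # -> (4096, 8191)
hdr = range_header([span])             # -> "bytes=4096-8191"
```

Larger blocks mean fewer round trips across a page sequence at the cost of some extra bytes per fetch, which is why the parameter is worth tuning.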
Sergey: in Windows, packages can have deltas per page. This works on glyph tables. They have islands of zero bytes where the data would be, and data is added based on which new glyphs or characters are on the page.
Myles: so second case of this technology being used.
Vlad: so his approach is just using byte ranges, with no special reorganization done?
Myles: yeah. He didn't attempt to modify font files, since this is a PDF viewer that needs to work with arbitrary PDF content.
Sergey: another example from
Windows: in DWrite, the loader can load by block; it doesn't have
special knowledge of the font. It can load a font file fragment,
not at the level of tables.
... we have system fonts that are downloaded this way. This api
is not exposed.
... only used for windows system fonts.
<sergeym> https://docs.microsoft.com/en-us/windows/win32/api/fontsub/nf-fontsub-createfontpackage
Sergey: have a link.
Vlad: I have a question for Sergey and Myles: I wonder, in the data that creates a print job, all the layout decisions are already done, so you have a glyph list instead of characters?
Sergey: yeah, but for DWrite some uses are character-based.
Vlad: for print jobs that I'm aware of they can assume layout is already done. Printer level application doesn't need to do that again. Probably something similar happening with PDFs.
Sergey: yeah since it was designed for print jobs layout is already done.
Vlad: if we remove that restriction and have the layout data available, you then have the problem of merging different subsets.
Sergey: in the link that I sent,
you can merge patches into previously sent packages. The format of
the packages is not public and is hidden from the
application.
... for the font app APIs, when we merge the package it's still a
valid font with lots of empty bytes.
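Sergey's description, a font buffer with zero-byte islands that patches fill in so the result is always one valid font, can be sketched as below. The real Windows package format is not public, so this is a generic illustration with invented names and toy data.

```python
# Hypothetical sketch of merging glyph patches into a sparse font,
# in the spirit of the Windows font-package mechanism described
# above. The real package format is not public; names and data
# here are invented for illustration.

def merge_patch(font: bytearray, patches):
    """Overwrite zero-byte islands in place with delivered glyph data.

    patches is a list of (offset, bytes) pairs. The buffer keeps its
    size, so it remains a single valid blob before and after merging.
    """
    for offset, data in patches:
        if offset + len(data) > len(font):
            raise ValueError("patch extends past end of font buffer")
        font[offset:offset + len(data)] = data
    return font

font = bytearray(16)  # islands of zeros where glyph data will go
merge_patch(font, [(4, b"\x01\x02"), (10, b"\x03")])
# The buffer stays 16 bytes; only the patched islands changed.
```

The key property, matching the discussion, is that the application always holds one complete font file, just with empty regions for glyphs not yet delivered.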
Myles: I think that's the difference between the two approaches. It sounds like it's similar to the approach I'm taking: you have one font file with pieces that are trivial to merge. With Garret's approach you have multiple files which are non-trivial to merge.
Sergey: we don't want the application to be aware of multiple fonts, and you need to be careful not to have layout split between files.
Myles: that's it for this topic.
Also have an update on the optimizer.
... been working on the optimizer program. When I showed it
last it would start with random ordering and try to make random
modifications to move to better orderings over time.
... what I've done over the past couple of weeks: instead of
starting with a random ordering, start with more intelligent
orderings, for example glyphs sorted by frequency.
There are 4 seeds which can be picked.
... that increases the fitness function by 3-5% right from the
start. This week I want to start contributing this.
... I'm at a good point to start contributing.
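The seeding idea Myles describes, starting the optimizer from a frequency-sorted glyph ordering rather than a random one, might look roughly like this. The function names are hypothetical, and this shows only one of the four seeds mentioned.

```python
# Hypothetical sketch of seeding the glyph-order optimizer. The
# names are invented; only the frequency-sorted seed (one of the
# "4 seeds" mentioned above) is shown, alongside the old random seed.
import random
from collections import Counter

def frequency_seed(glyph_usage):
    """Order glyphs by descending observed usage frequency, so the
    most-used glyphs land at the front of the file ordering."""
    counts = Counter(glyph_usage)
    return [g for g, _ in counts.most_common()]

def random_seed(glyphs, rng=None):
    """The previous starting point: a uniformly random ordering."""
    rng = rng or random.Random(0)
    order = list(glyphs)
    rng.shuffle(order)
    return order

usage = ["a", "b", "a", "c", "a", "b"]  # per-page glyph usage samples
print(frequency_seed(usage))            # -> ['a', 'b', 'c']
```

The optimizer would then make incremental modifications from whichever seed scores best, which is consistent with the reported 3-5% head start on the fitness function.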
Vlad: have you considered using
two axes of approach, using both unicode ranges and glyph
frequency?
... one of the approaches uses unicode ranges. If you treat the
unicode range as a standalone set of glyphs, I suspect there are
differences in glyph frequencies.
... for the simulation of the approach that Myles is working on,
would it make sense to apply optimization based on the frequencies
that are already present?
Myles: if it's based on frequency then it already matches what I'm doing.
Vlad: so if we have two approaches, one a traditional unicode-range based approach, and the other based on font structure: once it's done we'll have a comparison between the straightforward unicode range approach and byte ranges.
Present: Vlad, sergeym, Garret, Myles, jpamental
Regrets: Persa_Zula
Scribes: myles, Garret
Date: 11 May 2020