W3C

- DRAFT -

Web Fonts Working Group Teleconference

11 May 2020

Attendees

Present
Vlad, sergeym, Garret, Myles, jpamental
Regrets
Persa_Zula
Chair
Vlad
Scribe
myles, Garret

Contents


<Garret> Vlad: agenda: Garret sent an update on data using Monotype and other foundry fonts. Would like to review that data.

<Garret> Vlad: have a couple of questions about the differences between the two sets of results.

<myles> ScribeNick: myles

Garret: Vlad sent us some fonts from Monotype that he thought would have interesting characteristics and technical setups. I took those fonts and the Adobe set of codepoint sequences, randomly assigned the Monotype fonts to the Adobe sequences, and ran it all through the analysis again as a test.
... What we saw was that performance was broadly similar to the original Adobe data set. One difference is that transfer efficiency is lower overall for most of the methods: about 20% extra bytes sent, which puts it at 62% more bytes than the optimal approach.
... let me open it up and look at it again...
... Beyond that, we see some of the same trends on fast or low-RTT connections. Once you get into a situation where large numbers of requests are heavily penalized, then a single font is the best approach.

Vlad: if we look at these numbers and consider them to be our coarse estimate, the expectation is that the estimate depends on the font size itself. It was surprising that the whole-font approach produced almost identical results.

Garret: You're right. Usually the cost function is based on the size of the font. The cost function we're using caps out at a maximum: at 3 seconds, most browsers will give up on a font load, and after that any additional time isn't counted against you, so taking longer doesn't make things worse. I think what we see is that whole-font loads of CJK fonts, on most slow connections, consistently take longer than that threshold.

Vlad: I see that on slow connections, when we evaluate the optimal or patch/subset approach, much larger cost values are recorded. Those don't seem to be capped.
... When you look at the optimal column, in the mobile 2G scenario, the cost function value is over 20 million. Yet at the same time, whole font is over 5 million. Basically, I'm not sure how to reconcile them now because it seems like some are capped and some aren't.

Garret: One thing to realize about the whole-font approach is that it leverages caching. If there's a sequence of 5 pages, the first page will have the big penalty, but the remaining 4 pages will be very fast. But with incremental transfer, you could have one font load on each of the 5 pages. If you have a really bad connection, where a single page view already includes almost 3 s of RTT, then you'll hit the maximum cost for all of those 5 pages. So you end up with a higher total cost.

Vlad: So the larger numbers we see are the sum of 5 different cost functions for 5 loads

Garret: The average page sequence length is 5, and you see a ~5x multiplier. So that's why.
... In the original analysis I ran, it didn't have the capping, and that gave skewed results. Some loads take 30 s, and that threw everything off.
... Capping does match the reality of how font loads are handled in most browsers.
... But I'm open to discussing that further.
... The cost function isn't set in stone.
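A minimal numeric sketch of that reasoning (not the analysis code; the 3-second cap and 5-page sequence come from the discussion above, while the per-page load times are invented purely for illustration):

    # Python sketch: why a 5-page sequence gives roughly a 5x multiplier
    # under a capped per-load cost on a very slow connection.
    CAP_MS = 3000   # assumed cost cap: browsers give up on a font after ~3 s
    PAGES = 5       # average page-view sequence length in the data set

    # Whole font: one big (capped) load on the first page, then cache hits.
    whole_font = [min(9000, CAP_MS)] + [0] * (PAGES - 1)

    # Incremental transfer: a smaller load on every page, but each page view
    # already spends ~3 s in round trips, so every load hits the cap.
    incremental = [min(3200, CAP_MS)] * PAGES

    print(sum(whole_font))    # 3000
    print(sum(incremental))   # 15000, i.e. about 5x higher total cost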

Vlad: ok.
... When I look at the original incremental transfer analysis based on the Adobe data set, for desktop fast, there is a significant advantage shown for optimal and patch/subset, and both are significantly better than the whole-font approach. In the new simulation results, we still have an advantage of optimal over patch/subset, but it's very, very minimal, and the whole-font approach has a very low value. Why?

Garret: I don't know yet.
... It is probably because of how the fonts subset differently.

Vlad: There's only a tiny margin of benefit between the whole-font approach and optimal. I'm not sure the numbers correctly capture the truth; I cannot explain the difference.
... 17.5k for whole font vs 16.5k for optimal

Garret: I can look under the hood to investigate.

Vlad: If we consider a general CJK corpus and compare similar fonts using regular independent glyph outlines vs composites, on average composites save about 2x in font size; the files are roughly half the size of typical regular TrueType fonts. I would expect the whole-font cost to be in that range, so I'm not sure why it's such a small number.

Garret: The cost function is nonlinear. Reducing the font size by half isn't going to reduce the cost by half.
... It depends on where you are on the cost curve: if you're on the aggressive downslope and you reduce bytes, it will have a big effect.
... Because the fonts have different sizes, that could explain what's going on.
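A toy cost curve may make the nonlinearity concrete. The shape used here (a fixed round trip plus transfer time, clamped at 3 seconds) and all of the numbers are assumptions for illustration, not the cost function used in the analysis:

    # Python sketch of a nonlinear, capped cost: halving the bytes sent does
    # not halve the cost, and the effect depends on where you sit on the curve.
    CAP_MS = 3000          # assumed cap (font load abandoned after ~3 s)
    RTT_MS = 300           # assumed fixed round-trip overhead
    BYTES_PER_MS = 50      # assumed bandwidth (~400 kbit/s)

    def cost(total_bytes):
        return min(RTT_MS + total_bytes / BYTES_PER_MS, CAP_MS)

    # Above the cap, halving the font changes nothing:
    print(cost(400_000), cost(200_000))   # 3000 3000

    # On the steep part of the curve, the same halving has a large effect:
    print(cost(100_000), cost(50_000))    # 2300.0 1300.0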

Vlad: I would be curious if you could dig in and explain it.

Garret: I can do that this week.
... Despite the costs being almost equivalent, incremental transfer is transferring significantly fewer bytes overall.

Vlad: Yes. For the final outcome, if we're trying to pick a winner, the fact that two drastically different sets of font data produce similar cost-function results suggests that one strategy is the least expensive throughout, regardless of conditions.
... Thank you. I'm curious why the whole-font cost function gets reduced so much compared to the other data set. We can look at it later.

Garret: I can pick a few representative samples and show what happened for those samples. That should help explain.

Vlad: Thank you.
... Anything else?

<scribe> ScribeNick: Garret

Myles: in the CSS working group f2f we talked about font issues, and the topic of CJK loading came up. I mentioned to the group that the Web Fonts working group is working on making webfont loads faster, which should help with CJK.
... another member, Mike, mentioned that they currently do this. There's a thing called linearized PDFs.
... the PDF has a sequence of pages, and can have modifications appended to the end of the file.
... there's a linearized PDF format that doesn't have these modifications.
... each page has what it needs. There are resources that are used across pages stored at the end of the file, and pages have references to them. So fonts are resources that any page might reference.
... Mike works on a PDF viewer, and they implemented lazy downloading of resources: if you only need page 7, it will only download the resources needed for page 7. They also did this at a per-glyph level, so it will only download the glyphs used by page 7.
... this was surprising because he has already implemented essentially the same range-request approach I'm working on. His tool is a real, shipping tool that works, so we have an existence proof that range requests work in practice. I asked him to join the working group.
... his approach is almost identical to the approach I'm using. One difference is that he's using blocks, which my approach does not. The reason blocks are valuable is for page sequences.
... I will probably add a parameter for this which can be tuned.
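A minimal sketch of what such a range request could look like on the web side; the URL, byte offsets, and block size below are hypothetical, and this is not Mike's viewer code or the working group's implementation:

    # Python sketch: fetch only the byte ranges of a font that the current
    # page needs, using a single HTTP Range request (multi-range syntax).
    import urllib.request

    FONT_URL = "https://example.com/fonts/LargeCJK.otf"   # hypothetical URL

    def fetch_ranges(url, ranges):
        """Request several byte ranges of the font file in one round trip."""
        spec = ", ".join("%d-%d" % (start, end) for start, end in ranges)
        req = urllib.request.Request(url, headers={"Range": "bytes=" + spec})
        with urllib.request.urlopen(req) as resp:
            # For more than one range the body is multipart/byteranges.
            return resp.read()

    # Byte ranges covering the glyph data used by the current page, rounded
    # out to block boundaries (the tunable block-size parameter mentioned above).
    needed = [(0, 4095), (1048576, 1052671)]
    data = fetch_ranges(FONT_URL, needed)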

Sergey: in Windows, packages can have deltas per page. It works on glyph tables. There are islands of zero bytes where the data would be, and data is added based on what new glyphs or characters are on the page.

Myles: so that's a second case of this technology being used.

Vlad: so his approach is just using byte ranges with no special reorg done?

Myles: yeah. He didn't attempt to modify font files, since this is a PDF viewer that needs to work with arbitrary PDF content.

Sergey: another example from Windows: in DWrite, the loader can load by block and doesn't need special knowledge of the font. It can load font file fragments, not at the level of tables.
... we have system fonts that are downloaded this way. This API is not exposed.
... it's only used for Windows system fonts.

<sergeym> https://docs.microsoft.com/en-us/windows/win32/api/fontsub/nf-fontsub-createfontpackage

Sergey: have a link.

Vlad: I have a question for Sergey and Myles: in the data that creates a print job, all the layout decisions are already done, so do you have a glyph list instead of characters?

Sergey: yeah, but for DWrite some uses use characters.

Vlad: for the print jobs that I'm aware of, they can assume layout is already done. A printer-level application doesn't need to do that again. Probably something similar is happening with PDFs.

Sergey: yeah, since it was designed for print jobs, layout is already done.

Vlad: if we remove that restriction and have the layout data available, you then have the problem of merging different subsets.

Sergey: in the link that I sent, you can merge patches into previously sent packages. The format of the packages is not public and is hidden from the application.
... for the font app APIs, when we merge the package it's still a valid font, just with lots of empty bytes.

Myles: I think that's the difference between the two approaches. It sounds like it's similar to the approach I'm taking: you have one font file with pieces that are trivial to merge. With Garret's approach you have multiple files which are non-trivial to merge.

Sergey: we don't want the application to be aware of multiple fonts, and you need to be careful not to have layout split between files.

Myles: that's it for this topic. I also have an update on the optimizer.
... I've been working on the optimizer program. When I showed it last, it would start with a random ordering and try random modifications to move to better orderings over time.
... what I've done over the past couple of weeks is, instead of starting with a random ordering, start with more intelligent orderings, for example glyphs sorted by frequency. There are 4 seeds which can be picked.
... that increases the fitness function by 3-5% right off the start. This week I want to start contributing this.
... I'm at a good point to start contributing.
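A small sketch of the seeding idea; the glyph IDs and frequency counts are made up, and this is not the optimizer's actual code:

    # Python sketch: seed the optimizer with glyphs sorted by usage frequency
    # instead of a purely random ordering.
    import random

    glyph_frequency = {10: 90000, 11: 250, 12: 12000, 13: 3}   # gid -> uses

    def frequency_seed(freq):
        """Most frequently used glyphs first."""
        return sorted(freq, key=freq.get, reverse=True)

    def random_seed(freq):
        gids = list(freq)
        random.shuffle(gids)
        return gids

    # The optimizer then applies its random local modifications starting from
    # one of these seed orderings.
    print(frequency_seed(glyph_frequency))   # [10, 12, 11, 13]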

Vlad: have you considered using two axes of approach, using both unicode ranges and glyph frequency?
... one of the approaches uses unicode ranges. If you treat a unicode range as a standalone set of glyphs, I suspect there are differences in glyph frequencies within it.
... for the simulation of the approach that Myles is working on, would it make sense to apply the frequency-based optimization that's already present?

Myles: if it's based on frequency then it already matches what I'm doing.

Vlad: so we have two approaches: one is a traditional unicode-range-based approach, and the other is based on font structure. Once it's done we'll have a comparison between the straightforward unicode range approach and byte ranges.

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2020/05/11 16:59:49 $
