W3C

– DRAFT –
Web Fonts Working Group Teleconference

25 July 2023

Attendees

Present
bberning, Garret, jhudson, myles, ned, skef, Vlad
Regrets
-
Chair
-
Scribe
myles

Meeting minutes

"Binned Incremental Font Transfer" presentation

ned: in both range request and IFTB, the encoder is critical, but its output is served statically and can be cached

ned: this is off the cuff, but that's probably reasonable

ned: What is the expected behavior with regards to layout data? How does a client determine which bins it needs?

ned: Do we perform layout with the needed features, then map that to a set of glyphs? Or are you suggesting that clients calculate a closure based on the layout data in advance?

skef: the proposed design is that all the client needs to do is take the codepoints it needs and feed those through the first mapping (codepoints to bins)

skef: and it uses the bin-to-bin mapping for any relevant layout features to determine the set of bins

skef: it's the encoder's problem to make sure the bins have all the glyphs you need
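
A minimal sketch of the two-step resolution skef describes (Python; the names and data layouts are hypothetical, not the actual IFTB tables): the first mapping takes codepoints to bins, and per-feature bin-to-bin mappings then add any feature bins.

    def bins_needed(codepoints, features, cp_to_bin, feature_bin_map):
        """cp_to_bin: dict of codepoint -> bin id (first mapping).
        feature_bin_map: dict of feature tag -> {bin id -> bin id}
        (the bin-to-bin mapping for layout features)."""
        # First mapping: codepoints to bins.
        needed = {cp_to_bin[cp] for cp in codepoints if cp in cp_to_bin}
        # Second mapping: each active feature may pull in one extra
        # bin per bin already present.
        for tag in features:
            bin_map = feature_bin_map.get(tag, {})
            needed |= {bin_map[b] for b in needed if b in bin_map}
        return needed

    # e.g. bins_needed({0x41, 0x42}, ["smcp"],
    #                  {0x41: 3, 0x42: 7}, {"smcp": {3: 12}}) -> {3, 7, 12}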

Garret: This works a little like patch subset - you're getting the glyph closure rather than the specific glyphs for that text

ned: that simplifies things a lot. I like that approach because the client isn't responsible for parsing the layout table data any more than it needs to in order to perform the layout. As with aalt and nalt, you might have similar difficulties with hwid and fwid: features whose glyphs can map between one another

skef: there's a set of compromises, and it's pretty clear how a good encoder would handle them. There's a balance to be struck between keeping things separate and accepting duplication. This is a tricky thing for me to say: I've convinced myself that a sophisticated encoder can balance all those things well. But I haven't implemented it

skef: We could talk about strategies for writing encoders

skef: There are enough options, using various sub-compromises, to get good performance. palt + nalt are fairly trivial for a good encoder. These are the things that cause the biggest problems for my bad encoder currently

ned: you mean aalt and nalt

skef: Yes. Those have large mappings. If you throw all those into bin 0, then you get a big bin 0

ned: the encoder can take into consideration which features are applied

skef: It can provide different mappings for different features

ned: thank you

Garret: I had a thought while you were presenting bin-to-bin mapping. Is this supported already?

Garret: can you do something like "if you see bin A and bin B, then get bin C"?

Garret: That would help the ligature case

skef: For that particular case, I don't think there are enough substitutions like that to warrant it. We could add it, though. This is just an area where we need more discussion. Currently, the thing I have in there is: every bin for a feature is extra and outside the set you start with, and there can only be one bin that's mapped for a given bin that's present. I'm already pretty sure you're going to want to grab more than one bin that corresponds to a bin that you already have

skef: Let's say you have a bin with all the numbers in it. There's some feature that adds the billiard-ball version of all those numbers (like palt could)

skef: maybe you have most of those in their own chunk already, and you want to load that instead of duplicating it. But maybe some aren't in that chunk and there are additional substitutions. You're going to want more than one.
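
As a rough illustration of the direction skef anticipates, a hedged sketch of a one-to-many bin-to-bin mapping, plus Garret's conditional case (hypothetical structures, not part of the current design):

    def feature_bins_one_to_many(present_bins, bin_map):
        """bin_map: dict of source bin -> list of target bins."""
        extra = set()
        for b in present_bins:
            extra.update(bin_map.get(b, ()))
        return extra

    # Garret's conditional case ("if bin A and bin B are present,
    # get bin C") would need predicates over bin sets instead:
    def conditional_bins(present_bins, rules):
        """rules: list of (required bin set, target bin) pairs."""
        return {target for required, target in rules
                if required <= present_bins}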

skef: your question boils down to "are there enough ligature substitutions in the field to warrant things that are this complicated"

Garret: I think it will apply to typical opentype substitution layout

Garret: You might want to pull in more chunks rather than sticking them into the original chunks

skef: The way my encoder handles those cases now is that it looks at the possible bins to assign things to, and it picks one. In general, I've been viewing this technology as coarser-grained and inherently "yeah, you're going to get more glyphs than you would get using other systems"

skef: For cases like this, I think that's OK. You're just going to inherently get more glyphs. But we could drive this decision by gathering data.

skef: That logic would lead to a more complicated bin-to-bin mapping. But that's just one piece that we can decide on

Garret: Sounds like it needs more discussion

skef: yes

Garret: there's upcoming work to support glyph IDs beyond 16 bits. We should make sure it's supported in whatever we do

skef: yes.

Garret: incremental loading for fonts for more than 64k glyphs is very attractive

skef: I think I already did that in the chunk file

skef: that should be trivial

Garret: just making sure you have enough width in your integer types
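
Garret's width point in concrete terms (illustrative Python, not the actual chunk file layout): once glyph IDs exceed 16 bits, any field that holds one needs 24 or 32 bits.

    import struct

    def pack_gid(gid):
        # struct.pack(">H", 70000) would raise struct.error; a 16-bit
        # field cannot hold gids >= 0x10000, so use a 32-bit field.
        assert 0 <= gid <= 0xFFFFFFFF
        return struct.pack(">I", gid)  # big-endian uint32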

Garret: You mentioned an ordering requirement for tables. What is it, and why is it there?

skef: For tables, there are suggestions and there are requirements. The requirements are: a font has to have either a CFF table, a CFF2 table, a glyf table, or a glyf table plus a gvar table. Those are the only combinations that are supported. You can't mix CFF and glyf. The primary requirement is that those tables, and the loca table when it's relevant, have to appear at the very end of the file. That's so the client doesn't have to do further checksumming of the font other than those tables

skef: so the client operation is simplified

skef: in addition to that, there's a strong suggestion that the IFTB table and the cmap table appear fairly early in the font, so that a sophisticated future client could stream-decode the WOFF stream (which is a Brotli stream) and get access to those tables early. Those two tables are all you need to request the remaining bins. You can start making bin requests before the rest of the data has downloaded
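
A sketch of the ordering requirement as stated (Python; the tag set and function name are illustrative): once one of the final-group tables appears, nothing outside that group may follow, so the client never checksums past them.

    LAST_TABLES = {"CFF ", "CFF2", "glyf", "gvar", "loca"}

    def check_table_order(tags_in_file_order):
        """tags_in_file_order: table tags in physical file order."""
        seen_last = False
        for tag in tags_in_file_order:
            if tag in LAST_TABLES:
                seen_last = True
            elif seen_last:
                # A non-final table after the final group: invalid.
                return False
        return True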

skef: when you empty out a glyph, you don't fill in its slot with 0s. You just wind up with an empty entry. You'd have a loca entry that just says there are 0 bytes for the glyph
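
This matches standard loca semantics: a glyph is empty when its offset equals the next one. A small illustrative sketch:

    def glyph_lengths(loca_offsets):
        """loca_offsets: n+1 offsets for n glyphs."""
        return [loca_offsets[i + 1] - loca_offsets[i]
                for i in range(len(loca_offsets) - 1)]

    # glyph_lengths([0, 20, 20, 52]) -> [20, 0, 32]; gid 1 is an
    # empty entry with 0 bytes of glyf data.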

Garret: another comment: we've talked so far mostly about glyf & CFF. It's worth considering some of the COLR formats; some of the color formats act like a glyf table. It won't be challenging, but it's something to think about

skef: Because the chunk has the full tag name of the table, that part is already addressed

Garret: That would work for emoji, I agree. Supporting the bitmap tables for emoji fonts would be good

skef: We have looked at SVG, and there are some unfortunate mismatches between it and this approach

myles: it's not impossible

skef: And it makes the files significantly larger

skef: suppose you have a website for a popular TV show, or a news website. And a new season of the TV show is debuting, or a significant news event happens

skef: a lot of people start chatting on the forum on this page

skef: with patch subset, you've got all of these requests for subsets and augmentations that depend on the particular comments displayed on each person's machine as they join this page. And they're all different. And your service falls over, and you have to serve the whole font as a fallback

skef: We know we need something that's CDN-compatible and allows taking advantage of caching.

skef: We face this problem in actual use cases

skef: in terms of the potential adoption of this as a replacement for range request, my strategy is: we're going to do these data simulations. As we're having discussions, I'll bring this kind of scenario into the picture. We'll also consider the real-world issues a system will face when it's put into production. I'm comfortable with that setup, because we've designed this system to meet real-world needs

skef: and I think those will show through. That's my general attitude.

skef: Given those two approaches (the simulations that we do, and what I can bring in about real-world scenarios), either the group will learn why this makes sense, or we will learn why some other thing makes sense, and maybe we'll adopt that

Vlad: in the presentation, I understood the cacheability concerns. You also mentioned the probability of caching. I started thinking of two different scenarios: 1. when dynamic content is updated very fast, what you need depends on the timestamp of your visit to the page, and it will be different for many visitors. Different hits will generate different requests

Vlad: That's one scenario: How probable will cacheability really be?

Vlad: 2. You have a collection of static pages. Each page will require updates to the font file that was previously downloaded, but the sequence of viewing those pages will be different. Different sequences with different updates. How will that affect the cacheability?

skef: Everything you just mentioned (either dynamic content, or a path through static content, which is a very common thing) is a challenge for the cacheability of patch subset. With IFTB, you're breaking things down into separate bins, and you're going after the bins that hold the independent glyphs you need. But they're the same bins for everyone

skef: IFTB shouldn't have problems with the caching in this case. The fact that people need bin A or bin B means those will be that much warmer or cooler in the cache. It's a static set of files.

skef: With patch subset (especially with augmentation), I don't anticipate you'll find much caching in general with augmentations. With augmentations, the main way you get caching is that you go to a homepage, there's a most-likely second page, and you'll get hits on that. Augmentations are not very cacheable. The initial subset will be more cacheable

skef: range request is mostly discussed in terms of a desire not to have a server side (or only a simple server side). One of the main practical use cases that it and IFTB both serve is cacheability

Vlad: What about CJK? Will you bin by frequency?

skef: I've been using CJK a lot. The idea is to encode a single font with those three scripts in mind, and make sensible bins for those scripts. One of the things the encoder does is build high-frequency and middle-frequency groupings for each script. It looks at the intersection of the high-frequency groups and makes a bin for those glyphs. And then it makes bins for the intersections of the sub-groupings of each of the three

skef: It starts with a bin or two shared by Korean, Japanese, and Chinese. Then it has a bin for Korean + Japanese, then Korean + Chinese, etc.

skef: then it goes into high-frequency bins that are specific to those scripts. So you can pre-load the bins for those high frequency glyphs independently

skef: this system doesn't work at all if you don't know what the character frequencies are

skef: you have frequency data for high- and medium-frequency characters. For low-frequency glyphs, you get some benefit just by putting things in Unicode order. If someone's using the glyph for "3 in a circle", it's more likely they'll be using the glyph for "4 in a circle", despite them both being very low frequency. So using Unicode order for low frequency works well

skef: that gets you sensible orderings of low frequency characters
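
A rough sketch of the binning strategy described above (Python; frequency data and bin size are assumed inputs, and this is illustrative rather than skef's actual encoder): intersection bins for shared high-frequency codepoints first, then a Unicode-ordered low-frequency tail.

    from itertools import combinations

    def make_cjk_bins(high_freq, low_freq, bin_size):
        """high_freq: dict of script name -> set of codepoints.
        low_freq: set of remaining codepoints."""
        bins, assigned = [], set()
        # Intersection bins, largest script combinations first (e.g.
        # Korean+Japanese+Chinese, then each pair, then each script).
        scripts = sorted(high_freq)
        for r in range(len(scripts), 0, -1):
            for combo in combinations(scripts, r):
                shared = set.intersection(*(high_freq[s] for s in combo))
                shared -= assigned
                if shared:
                    bins.append(sorted(shared))
                    assigned |= shared
        # Low-frequency tail: Unicode order, fixed-size bins.
        tail = sorted(low_freq - assigned)
        bins += [tail[i:i + bin_size]
                 for i in range(0, len(tail), bin_size)]
        return bins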

Garret: That's what we did for static subsets that we serve

Garret: We made subsets of the common characters, and for everything else we used unicode order

Vlad: Your encoder work is not part of your day job. If this is integrated, replacing range request, there's a substantial amount of spec-related work that will be needed. Garret and myles can attest that a non-trivial amount of time is required.

skef: I can commit to that work as part of my ongoing participation in this working group.

Vlad: good. Just wanted to hear confirmation

skef: There's a qualitative difference in the quality of my life between improving my encoder and improving the specification. I can commit to the specification part just fine.

Garret: I liked what you proposed here. I think this is a good replacement for range request

Vlad: You (skef) have all this plus notes! Don't forget to distribute the notes!

skef: Should I just mail the PDF of my notes + slides to the list?

Vlad: OK

Vlad: I will capture what we have on IRC, and send that out as well

myles: this is a good proposal and should replace the range request method

Vlad: We appreciate it in the group

Vlad: For everything that comes up as a result of this, I'm looking forward to seeing this integrated into the spec we have, either as a part or as a satellite document.

Garret: I talked with skef before this. We agreed a good first step is to get that demo for patch subset hooked up to what skef did to show it working live

Garret: I'm looking forward to having that done

skef: Garret and i can discuss the current state of the encoder repository and get the dependencies under control

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

All speakers: Garret, myles, ned, skef, Vlad

Active on IRC: bberning, Garret, myles, skef, Vlad