Meeting minutes
"Binned Incremental Font Transfer" presentation
ned: in both range request and IFTB, the encoder is critical, but it is served statically and can be cached
ned: this is off the cuff, but that's probably reasonable
ned: What is the expected behavior with regards to layout data? How does a client determine which bins it needs?
ned: Do we perform layout with the needed features, then map that to a set of glyphs? Or are you suggesting that clients calculate a closure based on the layout data in advance?
skef: the proposed design is that all the encoder needs to do is it takes the codepoints it needs and feeds that through the first mapping (codepoints to bins)
skef: and it uses the bin to bin mapping for any relevant layout features to determine the set of bins
skef: it's the encoder's problem to make sure the bins have all the glyphs you need
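A minimal sketch of the two-stage lookup skef describes, assuming hypothetical mapping structures (the names cp_to_bin and feature_bin_map are illustrative, not part of the proposal):

```python
# Illustrative sketch of the client-side bin selection described above.
# The mapping structures are hypothetical, not the actual IFTB format:
# cp_to_bin maps a codepoint to the bin holding its glyph closure, and
# feature_bin_map maps (feature, bin) -> an extra bin for that feature.

def bins_needed(codepoints, active_features, cp_to_bin, feature_bin_map):
    # First mapping: codepoints -> bins. The encoder guarantees each bin
    # already contains the glyph closure for its codepoints.
    bins = {cp_to_bin[cp] for cp in codepoints if cp in cp_to_bin}

    # Second mapping: for each relevant layout feature, bin -> extra bin.
    # The client never parses GSUB/GPOS; it only consults this table.
    extra = set()
    for feature in active_features:
        for b in bins:
            mapped = feature_bin_map.get((feature, b))
            if mapped is not None:
                extra.add(mapped)
    return bins | extra
```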
Garret: This works a little like patch subset - you're getting the glyph closure rather than the specific glyphs for that text
ned: that simplifies things a lot. I like that approach because the client isn't responsible for parsing the layout table data any more than it needs to perform the layout. As with aalt and nalt, you might have similar difficulties with hwid and fwid - features where glyphs can map between one another
skef: there's a set of compromises, and it's pretty clear how a good encoder would handle them. There is a set of balances to be made between keeping things separate vs. accepting duplication. This is a tricky thing for me to say - I've convinced myself that a sophisticated encoder with sophisticated strategies can balance all those things well. But I haven't implemented it
skef: We could talk about strategies for writing encoders
skef: There are enough options using various sub-compromises to get good perf. palt + nalt are fairly trivial for a good encoder. These are the things that cause the biggest problem for my bad encoder currently
ned: you mean aalt and nalt
skef: Yes. Those have large mappings. If you throw all those into bin 0, then you get a big bin 0
ned: the encoder can take into consideration which features are applied
skef: It can provide different mappings for different features
ned: thank you
Garret: I had a thought while you were presenting bin-to-bin mapping. Is this supported already?
Garret: can you do something like "if you see bin A and bin B then get bin B"?
Garret: That would help the ligature case
skef: For that particular case, I don't think there are enough substitutions like that to warrant it. We could add it though. This is just an area that we need more discussion for. Currently, the thing I have in there is: every bin for a feature is extra, outside the set you start with, and there can only be one bin that's mapped for a given bin that's present. I'm already pretty sure you're going to want to grab more than one bin that corresponds to a bin that you already have
skef: Let's say you have a bin with all the numbers in it. There's some feature that adds the billiard-ball version of all those numbers (like palt could)
skef: maybe you have most of those in their own chunk already, and you want to load that instead of duplicating it. But maybe some aren't in that chunk and there are additional substitutions. You're going to want more than one.
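As a hedged illustration of the constraint being discussed (the dictionaries below are invented, not the spec's data format): today each (feature, present bin) pair can map to at most one extra bin, while the billiard-ball example argues for allowing several.

```python
# Hypothetical illustration only; feature tags and bin numbers are made up.

# Current shape: for a given feature, a present bin maps to at most one
# extra bin.
single_map = {
    ("nalt", 3): 17,        # bin 3 (digits) -> bin 17 (circled digits)
}

# Shape the example above suggests: a present bin may need several extra
# bins, e.g. one shared chunk plus a chunk holding the leftover
# substitution targets.
multi_map = {
    ("nalt", 3): [17, 42],  # circled digits live partly in bin 17, partly in 42
}
```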
skef: your question boils down to "are there enough ligature substitutions in the field to warrant things that are this complicated"
Garret: I think it will apply to typical opentype substitution layout
Garret: You might want to pull in more chunks rather than sticking them into the original chunks
skef: The way my encoder handles those cases now is it looks at the possible bins to assign things to, and it picks one. In general, I've been viewing this technology as a coarser-grained and inherently "yeah you're going to get more glyphs than you would get using other systems"
skef: For cases like this, I think that's OK. You're just going to inherently get more glyphs. But we could drive this decision by gathering data.
skef: That logic would lead to a more complicated bin-to-bin mapping. But that's just one piece that we can decide on
Garret: Sounds like it needs more discussion
skef: yes
Garret: there's upcoming work to support glyph IDs beyond 16-bits. We should make sure it's supported in whatever we do
skef: yes.
Garret: incremental loading for fonts with more than 64k glyphs is very attractive
skef: I think I already did that in the chunk file
skef: that should be trivial
Garret: just making sure you have enough width in your integer types
Garret: You mentioned an ordering requirement for tables. What is it, and why is it there?
skef: For tables, there are suggestions and there are requirements. The requirement is that a font either has to have a CFF table, a CFF2 table, a glyf table, or a glyf table + a gvar table. Those are the only combinations that are supported. You can't mix CFF and glyf. The primary ordering requirement is that those tables, and the loca table when it's relevant, have to appear at the very end of the file. That's so the client doesn't have to do further checksumming of the font other than those tables
skef: so the client operation is simplified
skef: in addition to that, there's a strong suggestion that the IFTB table and the cmap table appear fairly early in the font, so that a sophisticated future client could stream-decode the WOFF stream (which is a brotli stream) and get access to those tables early. Those 2 tables are all you need to request the remaining bins you need. You can start making bin requests before the rest of the data has downloaded
skef: when you empty out a glyph, you don't fill its slot with 0s. You just wind up with an empty entry. You'd have a loca entry that says there are 0 bytes for the glyph
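A small sketch of the zero-length loca behavior just described, using made-up offsets (standard glyf/loca semantics, not IFTB-specific code):

```python
# An "emptied" glyph is not padded with zeros; instead its adjacent loca
# offsets are equal, so it occupies 0 bytes of the glyf table.
# Offsets below are invented for illustration.

loca = [0, 120, 120, 300]            # glyph 1 spans 120..120 -> empty

def glyph_length(loca, gid):
    # Glyph data length is the difference between adjacent loca offsets.
    return loca[gid + 1] - loca[gid]

assert glyph_length(loca, 0) == 120  # loaded glyph
assert glyph_length(loca, 1) == 0    # emptied glyph: zero bytes, no zero fill
```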
Garret: another comment: we've talked so far mostly about glyf & CFF. It's worth considering some of the COLR formats. Some of the color formats act like a glyf table. It won't be challenging, but it's just something to think about
skef: Because the chunk carries the full tag name of the table, that part is already addressed
Garret: I agree this would work for emoji. Supporting the bitmap tables for emoji fonts would be good
skef: We have looked at SVG, and there are some unfortunate mismatches between that and this approach
myles: it's not impossible
skef: And it makes the files significantly larger
skef: suppose you have a website for a popular TV show, or a news website. And a new season of the TV shows is debuting, or a significant news event happens
skef: a lot of people start chatting on the forum on this page
skef: with patch subset, you've got all of these requests for subsets and augmentations that depend on the particular comments that are displaying on each person's particular machines as they join this page. And they're all different. And your service falls over, and you have to serve the whole font as a fallback
skef: We know we need something that's CDN-compatible and allows taking advantage of caching.
skef: We face this problem in actual use cases
skef: in terms of the potential adoption of this as a replacement for range request: My strategy is: we're going to do these data simulations. As we're having discussions, I'll bring into the picture this kind of scenario that we face. We'll also consider real-time issues that a system will face when it's put into production. I'm comfortable with that setup, because we've designed this system to meet real-time needs
skef: and I think those will show through. That's my general attitude.
skef: Given those 2 approaches, the simulations that we do, and what I can bring in about the realtime scenarios - either the group will learn why this makes sense, or we will learn how some other thing makes sense and maybe we'll adopt that
Vlad: in the presentation, I understand the cacheability concerns. You also mentioned the probability of caching. I started thinking of 2 different scenarios: 1. when dynamic content is updated way too fast, and what you need depends on the timestamp when you visited the page, and it will be different for many visitors. Different hits will generate different requests
Vlad: That's one scenario: How probable will cacheability really be?
Vlad: 2. You have a collection of static pages. Each page will require updates to the font file that was previously downloaded, but the sequence of viewing those pages will be different. Different sequences with different updates. How will that affect the cacheability?
skef: Everything you just mentioned (either dynamic content, or a path through static content, which is a very common thing) those are all challenges for the cacheability of patch subset. With IFTB, you're breaking things down into separate bins, and you're going after bins that hold the independent glyphs that you need. But they're the same bins
skef: IFTB shouldn't have problems with the caching in this case. The fact that people need bin A or bin B means those will be that much warmer or cooler in the cache. It's a static set of files.
skef: With patch subset (especially with augmentation) I don't anticipate you'll find that much caching in general. With augmentations, the main way you get caching is you go to a homepage, and there's a most-likely page you go to second, and you'll get hits on that. Augmentations are not very cacheable. The initial subset will be more cacheable
skef: range request is mostly discussed as a desire to not have a server side (or only a simple server side). One of the main practical use cases that both it and IFTB address is served by cacheability
Vlad: What about CJK? Will you bin by frequency?
skef: I've been using CJK a lot. The idea is to encode a single font with those 3 scripts in mind, and make sensible bins for those scripts. One of the things it does is make high-frequency and middle-frequency groupings for each script. It looks at the intersection of the high-frequency groupings and makes a bin for those glyphs. And then it makes bins for the intersections of the sub-groupings of each of the 3 scripts
skef: It starts with a bin or two shared by Korean, Japanese, and Chinese. Then it has a bin of Korean + Japanese, then Korean + Chinese, etc.
skef: then it goes into high-frequency bins that are specific to those scripts. So you can pre-load the bins for those high frequency glyphs independently
skef: this system doesn't work at all if you don't know what the character frequencies are
skef: you have frequency data for high- and medium-frequency glyphs. For low-frequency glyphs, you get some benefit just by putting things in Unicode order. If someone's using the glyph for "3 in a circle", it's more likely they'll be using the glyph for "4 in a circle", despite them both being very low frequency. So using Unicode order for low frequency works well
skef: that gets you sensible orderings of low frequency characters
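A hedged sketch of the CJK binning strategy skef outlines; the helper names and bin size are assumptions, and real frequency data would come from usage statistics rather than this code:

```python
# Sketch of the binning order described above: shared high-frequency bin,
# pairwise-intersection bins, per-script high/middle-frequency bins, then
# low-frequency codepoints chunked in Unicode order. All names invented.

def make_cjk_bins(high, mid, low, bin_size=256):
    """high and mid map script -> set of codepoints; low is the rest."""
    scripts = ("chinese", "japanese", "korean")
    bins = []

    # Glyphs that are high frequency in all three scripts come first.
    shared = high["chinese"] & high["japanese"] & high["korean"]
    bins.append(sorted(shared))

    # Then pairwise intersections of the high-frequency groupings.
    for a, b in (("chinese", "japanese"), ("chinese", "korean"),
                 ("japanese", "korean")):
        bins.append(sorted((high[a] & high[b]) - shared))

    # Then per-script high- and middle-frequency bins, so each script's
    # hot set can be preloaded independently.
    covered = set().union(*bins)
    for script in scripts:
        bins.append(sorted(high[script] - covered))
        covered |= high[script]
    for script in scripts:
        bins.append(sorted(mid[script] - covered))
        covered |= mid[script]

    # Low-frequency codepoints fall back to Unicode order: "3 in a circle"
    # and "4 in a circle" are both rare, but tend to appear together.
    rest = sorted(set(low) - covered)
    for i in range(0, len(rest), bin_size):
        bins.append(rest[i:i + bin_size])
    return bins
```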
Garret: That's what we did for static subsets that we serve
Garret: We made subsets of the common characters, and for everything else we used unicode order
Vlad: Your encoder work is not part of your day job. If this is integrated, replacing range request, there's a substantial amount of spec-related work that will be needed. Garret and myles can attest that it's a non-trivial amount of time that is required.
skef: I can commit to that work as part of my ongoing participation in this working group.
Vlad: good. Just wanted to hear confirmation
skef: There's a qualitative difference in the quality of my life between improving my encoder and improving the specification. I can commit to the specification part just fine.
Garret: I liked what you proposed here. I think this is a good replacement for range request
Vlad: You (skef) have all this plus notes! don't forget to distribute notes!
skef: Should I just mail the PDF of my notes + slides to the list?
Vlad: OK
Vlad: I will capture what we have on IRC, and send that out as well
myles: this is a good proposal and should replace the range request method
Vlad: We appreciate it in the group
Vlad: For everything that comes up as a result of this, I'm looking forward to seeing this integrated into the spec we have, either as a part or as a satellite document.
Garret: I talked with skef before this. We agreed a good first step is to get that demo for patch subset hooked up to what skef did to show it working live
Garret: I'm looking forward to having that done
skef: Garret and I can discuss the current state of the encoder repository and get the dependencies under control