Web Fonts Working Group Teleconference -- 19 Aug 2019

How do I set myself as scribe?

<myles> ScribeNick: Persa_Zula

Review open items

I need a scribe cheat sheet

Stateless vs Stateful - will update by F2F

https://www.w3.org/Fonts/WG/track/actions/208 - will be updated by F2F

Overview of https://www.w3.org/Fonts/WG/track/actions/209

myles: There are two approaches that the group has been considering. This is not about smart server vs. range requests.
... it is about whether the client should be asking the server for codepoints or for glyphs
... if the client asks for codepoints, the server has to reply with every set of glyphs those codepoints can possibly reference. On the other hand, if the client requests glyphs, the client needs to know the relationships up front. it has to have the shaping info before it gets any glyphs at all
... it runs shaping locally without any glyphs, and as a follow-up sends another request for glyphs
... the analysis is not specific to a format - it weighs the upfront costs vs. the cost spread over every request
... first, the relationship between the size of the outlines and the size of everything else. for large fonts, outlines are almost all of the font, and everything else is just a few % of the font file. for the glyph-based approach there is one round trip but it's small compared to the rest of the font file itself
... the other side, for real fonts: if a client requests a set of codepoints, how many actual glyphs would have to be sent back over the wire? interested in the metric comparing the total number of glyphs sent with the number of glyphs requested. in the glyph model, only the glyphs needed are requested. in the first model, you might need 5 glyphs and the server might give you 100.
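
A minimal sketch of the closure idea myles describes, assuming a toy font model. The `CMAP` and `SUBSTITUTIONS` tables are hypothetical illustration data, not taken from any real font, and real shaping tables are considerably richer than "set of input glyphs produces one output glyph".

```python
# Hypothetical toy font: a cmap (codepoint -> glyph ID) and substitution
# rules (set of input glyphs -> output glyph). Assumed data for illustration.
CMAP = {0x66: 1, 0x69: 2, 0x6C: 3}  # f, i, l
SUBSTITUTIONS = [
    (frozenset({1, 2}), 10),  # f + i -> "fi" ligature
    (frozenset({1, 3}), 11),  # f + l -> "fl" ligature
    (frozenset({10}), 12),    # "fi" -> an alternate form of "fi"
]


def glyph_closure(codepoints):
    """Return every glyph that the given codepoints could possibly reference."""
    glyphs = {CMAP[cp] for cp in codepoints if cp in CMAP}
    changed = True
    while changed:  # repeat until no rule adds a new glyph
        changed = False
        for inputs, output in SUBSTITUTIONS:
            if inputs <= glyphs and output not in glyphs:
                glyphs.add(output)
                changed = True
    return glyphs
```

In this toy font, a client that asks for the codepoints "f" and "i" gets four glyphs back (1, 2, 10 and 12), even if the text never actually forms the ligature.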

chris: how would this work with variable fonts

myles: work is orthogonal to variable fonts. only interested in shaping information

Vlad: if there is a font with different stylistic sets defined, you'd have to include all of them. alternatively, you'd need CSS data to find out which stylistic set is in use
... you might also need features applied.
... in arabic, you have up to 4 glyphs for each character depending on the location in the word

myles: results - most of the extra 95 glyphs are not due to optional features - it turns out they are due to required language shaping rules
... relationship between these two numbers, for an example font: if we request glyphs 0-5, how many end up in the closure? if we request GIDs 0-7, how many end up in the closure? do this until we reach the end of the file
... is the function linear? if so, the two options work similarly. if the function scales superlinearly, the codepoint-based approach doesn't work very well.
... for a particular font, you can see how superlinear the result is. did this for every font on a Windows machine, and for every font in the GFonts corpus.
... in the first email, first graph - every line represents a font. higher a line gets, the worse that font would work for codepoint approach
... next graph - for every font, we can associate the area under the curve - higher the area under the curve is worse for the codepoint approach
... table for the worst offenders
... in the google fonts corpus - similar results. except that the graph looks worse. the very last graph - the higher the dots are, the worse the codepoint approach is. many complex fonts don't work well for this codepoint approach
... overall result of this is that we can choose either system. but the codepoint based approach is really terrible for some languages. glyph approach has an equal cost for every language - and the cost turns out to be small.
... comparing a system that falls down on some languages vs. an approach that has an equal cost for all languages
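
The measurement myles outlines (request glyph IDs 0..n-1 for growing n, count the closure, then take the area under the curve) could be sketched as follows. This is an assumed reconstruction, not the script used for the emailed graphs: the rule format is a toy one, and normalising both axes to 0..1 is a choice made here so that fonts of different sizes are comparable.

```python
def closure_of_glyphs(requested, rules):
    """Expand a glyph set with every glyph its (toy) substitution rules can produce."""
    glyphs = set(requested)
    changed = True
    while changed:
        changed = False
        for inputs, output in rules:
            if inputs <= glyphs and output not in glyphs:
                glyphs.add(output)
                changed = True
    return glyphs


def closure_curve(num_glyphs, rules, step=1):
    """Closure size when requesting glyph IDs 0..n-1, for growing n."""
    return [(n, len(closure_of_glyphs(range(n), rules)))
            for n in range(step, num_glyphs + 1, step)]


def area_under_curve(curve, num_glyphs):
    """Trapezoidal area under the curve, with both axes normalised to 0..1."""
    points = [(0.0, 0.0)] + [(n / num_glyphs, c / num_glyphs) for n, c in curve]
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Under this normalisation a font with no substitution rules gives a straight diagonal with area 0.5. The more the closure outruns the request, the further the area rises above that.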

Garret: the glyph approach might blow up as well because it could request almost all of the font at the beginning. we need to analyze sequence of the codepoints. can't answer this until we have a browsing data set.
... it's premature to assume that it's unworkable to use codepoints

Variation and testing of different approaches to font enrichment

<Vlad> https://docs.google.com/document/d/1kx62tpy5hGIbHh6tHMAryon9Sgye--W_IsHTeCMlmEo/edit?usp=sharing

Vlad: we have 3 approaches to consider
... 1. patch-based approach. 2. byte-range approach suggested by myles - dependent on a standalone tool. 3. hybrid approach, using the patch update mechanism so it doesn't need the font data to be optimized, but still has a resemblance to the glyph-based approach, with byte ranges consolidated in a single patch
... we need to consider if the original patch-based approach can satisfy the original request. the approach that myles suggested might require more than one request for the first page render
... in order to be able to evaluate in real life, our framework has to account for this, and having real life data that we can use to mimic what we see out there

Garret: working on getting a dataset, might have info by the F2F
... will add the hybrid approach to the doc and an explanation of how it would work
... working on a prototype for the patch based approach in time for the F2F

myles: I need a subsetter that isn't super slow if I am to continue experiments

Garret: harfbuzz is under development, 300x faster than fonttools for this

Vlad: framework analysis - does the cost function take into account the subsetter speed

Garret: the cost to calculate the patch vs creating the subset. computing the patch is higher
... ft parses the entire font into an object tree. HB generates almost no memory allocations when working with the font

<scribe> ACTION: Garret to include cost of subsetting in the model

<trackbot> Created ACTION-210 - Include cost of subsetting in the model [on Garret Rieger - due 2019-08-26].

Codepoint set compression / obfuscation (see email threads [3,4]);

Garret: Updates on codepoint compression. Tried new methods in the 2nd round.
... Arithmetic coding (entropy coding) with the codepoint sets.
... Codepoint remapping. Use codepoint values and remap them to a continuous space. Points are much closer together than when using unicode values
... also reordered based on frequency. effort to get things to cluster together to use ranges
... used this on the top techniques that came from the first round and re-ran and the results look very good
... originally 4 bits per codepoint, now for CJK 1 bit per codepoint. very promising. can efficiently transfer things back and forth between client and server
... can be used for glyph sets just the same - just encoding integers.

myles: mapping is done per font ?

Garret: yes. first response from server sends a guide for client on how to remap. the frequency data can be ignored if needed.

myles: first request is special. should the first response include shaping info?

Garret: smart server approach allows us to send whatever we want back and forth. hybrid approach -> 1st request, codepoints. get shaping info, then talk glyphs going forward
... only incur extra penalty on the first request
... hopefully we can build this into the protocol. for a CJK maybe you always talk codepoints. Maybe in Arabic you talk glyphs instead. You can be flexible and do what's best for a given font
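
The hybrid exchange Garret outlines could look roughly like the in-memory simulation below. Everything here is hypothetical: `ToyServer`, `ToyClient`, and their methods are invented for illustration, the "shaping info" is reduced to a bare cmap, and no wire format or closure computation is implied.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a font server."""

    def __init__(self, cmap, glyph_data):
        self.cmap = cmap              # codepoint -> glyph ID
        self.glyph_data = glyph_data  # glyph ID -> outline bytes

    def first_request(self, codepoints):
        # First response: shaping info (here only the cmap) plus the glyphs needed now.
        gids = {self.cmap[cp] for cp in codepoints if cp in self.cmap}
        return dict(self.cmap), {g: self.glyph_data[g] for g in gids}

    def glyph_request(self, gids):
        return {g: self.glyph_data[g] for g in gids if g in self.glyph_data}


class ToyClient:
    """Talks codepoints on the first request and glyph IDs afterwards."""

    def __init__(self, server):
        self.server = server
        self.cmap = None   # unknown until the first response arrives
        self.glyphs = {}

    def ensure(self, text):
        codepoints = {ord(ch) for ch in text}
        if self.cmap is None:
            self.cmap, fetched = self.server.first_request(codepoints)
            kind = "codepoints"
        else:
            needed = {self.cmap[cp] for cp in codepoints if cp in self.cmap}
            fetched = self.server.glyph_request(needed - set(self.glyphs))
            kind = "glyphs"
        self.glyphs.update(fetched)
        return kind, sorted(fetched)
```

In this sketch the extra payload is only paid on the first request. Every later request names exactly the glyph IDs the client is missing.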

myles: is there ever a situation where the codepoints will be better than the glyphs?
... would requests 2-n ever want to request codepoints instead of glyphs?

Garret: for latin fonts, the non-glyph data could be bigger. if using codepoints for latin we wouldn't have to send everything up front
... we need to look at it with real world data

Vlad: including codepoints in the first request in the hybrid approach is a good twist. from the performance - you avoid costs assoc. with calculating follow up glyph closures

Garret: not that subsetting will add too much cost, but it can have the benefit of keeping it simpler

Vlad: if we just update one specific big table in the binary font with everything else being the same, would the binary patching be smaller

Garret: don't know the internals of Brotli but we should explore it
... we should include all of this in analysis
... close to being done with codepoint compression. close to where we'll be able to get for now.
... one last thing we can do to shrink the encoded size is to block up the codepoints or glyphs under one value. there is justification for that in how unicode-range is done for CJK. the tail for CJK is infrequently used. can apply to glyphs or codepoints
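
A minimal sketch of the blocking idea, assuming fixed-size blocks over an integer ID space (remapped codepoints or glyph IDs). The function names and the block size of 64 used in the note below are illustrative assumptions, and the minutes do not say how blocks would actually be chosen.

```python
def to_blocks(ids, block_size):
    """Replace each integer by the index of the fixed-size block that contains it."""
    return sorted({i // block_size for i in ids})


def expand_blocks(blocks, block_size):
    """Every integer covered by the given blocks (a superset of the original IDs)."""
    return {i for b in blocks
            for i in range(b * block_size, (b + 1) * block_size)}
```

For example, with a block size of 64 the IDs 3, 5, 130 and 131 are sent as the two block indices 0 and 2. The receiver then treats all 128 IDs in those blocks as requested, which trades some over-fetching for a smaller request.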

F2F logistics and agenda

Vlad: almost 2 weeks from day of F2F. thanks, Jason, for venue info.
... hard stop at 5pm

jpamental: The one thing we know for certain is the hard stop. That will give us the option to continue at dinner.

Vlad: dinner probably won't be meaningful technical discussion

myles: We need to gather an agenda for the timeslot

Vlad: Ambitious wish for the F2F: want to end up with a defined schedule and timeline for what has to happen next. By the F2F we need to finish everything we said we want to finish. For the day of discussion, please submit Agenda+ requests and the time period you need

John_Hudson: will there be dial-in?

jpamental: we should be able to set something up

Vlad: can set up a zoom meeting - same capability now for the whole day

AOB

Vlad: not hard set on Monday meeting, if the discussion won't be beneficial. let's prepare for F2F this week. If anything else arises we can discuss next week, otherwise see you at F2F
... Jason, please keep us posted on F2F details as you find out more

jpamental: I will own that and make sure folks have all info they need for access to venue
