W3C

- DRAFT -

Web Fonts Working Group Teleconference

05 Oct 2020

Attendees

Present
Vlad, Chris, Persa_Zula, jpamental, myles, Garret
Regrets
Chair
Vlad
Scribe
Persa_Zula

Contents



<scribe> scribeNick: Persa_Zula

Review final simulation result

Garret: Graphs for CJK section fixed, otherwise same as presented
... images are now included in the Github repo as well

Evaluation Report

Vlad: we need to get to a reasonable timeline
... a good target would be the week before: October 12-16
... the Analysis & Conclusions section will have a pronounced effect on the work for the next few years

Questions

chris: cost functions - why is it a sigmoid?

Garret: When the max cap was added, the natural way to retain the mostly exponential shape was to use a sigmoid.

chris: where does -11.5 as a constant come from?

Garret: A few constants were added to get the min and max to fit the graph

<chris> https://docs.google.com/document/d/1kx62tpy5hGIbHh6tHMAryon9Sgye--W_IsHTeCMlmEo/edit#heading=h.r6ba2qdx0wbv

Garret: thinks -11.5 was to control the general shape of the graph; hand-tuned to get the ramp to start and end where we want it to
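As a rough illustration of the sigmoid shape under discussion (the report's exact formula and variable names may differ; the constant k=-11.5 and the min/max caps below are assumptions for illustration, hand-tuned the way Garret describes):

```python
import math

def cost(t, c_min=0.0, c_max=1.0, k=-11.5, t_mid=0.5):
    """Sigmoid-shaped cost that ramps from c_min up to c_max around t_mid.

    k controls where the ramp starts and ends (hand-tuned, like the
    -11.5 constant mentioned above); c_min/c_max cap the range.
    """
    return c_min + (c_max - c_min) / (1.0 + math.exp(k * (t - t_mid)))
```

Near t=0 the cost sits at the min cap, near t=1 it saturates at the max cap, and the value of k sets how sharp the transition is.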

chris: Why were t-zed and t-m divided by 2?

Garret: Going to need to re-review

chris: pagewalks - does this observe users somehow? where did this come from?

Garret: this came from logs of traffic internal to Google; not specific samples or recruitment of folks
... it was anonymized and aggregated before being sent out

myles: you only see pageviews that use Google fonts or noto?

Garret: not specific to Google fonts only
... a walk of URLs, joined with an index of the web, collapsed to codepoints per font face. Not just Google fonts

chris: up until you sub in the fonts, it doesn't matter what fonts are used at all, just the codepoints matter. Then you add in the Google fonts?

Garret: if the font faces don't match G fonts, then pick the smallest font from the Google font collection that matches the codepoint range for the request

chris: effectively models a world where there are no local fonts?

Garret: correct

chris: lots of pagewalks, then sub-set so you get fewer. then you have languages. why not sub-sample english more and the others less?

Garret: wanted to keep distribution of languages closer to reality. that's why we created these groupings so we could see the results more clearly
... didn't want to skew the data arbitrarily

chris: it seems within each group all the languages behave the same

myles: when creating the dataset, did you parse the HTML for the font names, or loaded into browser for effects of font fallback?

Garret: just parsing the HTML with the font-face fallback chain
... and try to match a G font in that fallback chain; if that doesn't work, fallback to Google font substitution
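A minimal sketch of the matching step Garret describes - scan the parsed font-family fallback chain for a family in the Google Fonts collection, otherwise fall through to the substitution step (the function name and collection representation here are illustrative assumptions, not the actual pipeline):

```python
def match_font(fallback_chain, collection):
    """Return the first family in the CSS fallback chain that appears in
    the given collection (normalized to lowercase), or None to signal
    that the codepoint-based substitution step should run instead."""
    for family in fallback_chain:
        # Normalize CSS quoting/whitespace, e.g. '"Open Sans"' -> 'open sans'
        name = family.strip().strip('"\'').lower()
        if name in collection:
            return name
    return None  # no match: substitute a font by codepoint coverage
```

For example, `match_font(['"Open Sans"', 'Arial', 'sans-serif'], {"open sans", "roboto"})` would pick "open sans", while a chain with no match returns None.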

myles: evaluation report needs to make this clear that this is specific to google fonts' fonts

Garret: correct

Vlad: does it make any difference if it's G font corpus or something else?
... speculates that other CJK fonts, for example, are similar to these

myles: not disagreeing, but if we model the web based on what Google serves, the report should clearly indicate it

Garret: agreed

jpamental: Google is aggressive on stripping down and aggressive on what it serves (no OT features); great for performance but different than if you serve it yourself

Garret: for the unicode-range, we just use a subset cut from the original font file; it doesn't include hint-stripping or other advanced things; more representative of self-hosting in this set

chris: do we have data on absolute file size of the fonts in each 3 linguistic sets?

Garret: not on hand, but can look for it

chris: would be helpful

Garret: will collect filesize distribution

<chris> ACTION: garret to research cost function

<scribe> ACTION: Garret to collect filesize distribution for report

<trackbot> Created ACTION-225 - Research cost function [on Garret Rieger - due 2020-10-12].

<trackbot> Created ACTION-226 - Collect filesize distribution for report [on Garret Rieger - due 2020-10-12].

Section 4, Analysis and Conclusions

myles: after looking at the data, there are 2 tiers of solutions: one solution works really well if we want to invest a lot of time & energy in deploying it; the 2nd works better when there is not a lot of time, adding a direct replacement for the font

Vlad: do need to consider that we have 3 reference points of data (includes the unicode-range solution as it currently exists)
... for example, the latin graph, unicode-range is in the same ballpark as the other solutions, so the conclusion could be "do nothing" because unicode-range exists

Garret: range-request in latin and arabic categories, agreed, but in CJK space it should be considered

myles: two responses: most webfonts served today don't use unicode range. and second, unicode-range breaks shaping
... that's a serious consideration

chris: even for Latin and Arabic, breaking features is not great

Garret: for latin, range-request does worse than sending the whole font, and the whole font does retain features

chris: whole-font as the baseline was a good shift
... for latin, greek, etc, and arabic and indic, range-request is terrible in cost reduction - it's a negative effect vs sending the whole font

myles: thinks this is tweakable and is surprised it ended up this way; was designing it to minimize the number of bytes as a metric
... wished he had more time to see what can be tweaked here

chris: we can do a further work report on this point and still submit
... results show that we can't serve CJK over 2g or below connections; it's just too long of a wait time

Vlad: how do we want to structure this section for Analysis?
... by network type? by script? or by script groups?

Garret: do we want to try it with the same grouping on the report I put together, since that is how the data is put together?

Vlad: for analysis that's fine, but what about for conclusion?
... do we want just one conclusion or one per language group?

chris: there is one group that is different from the other two
... and we need to say that as a firm conclusion

Vlad: the mental model he has come up with is a 2x2 matrix where one side is the network-model switch, and the other side is the different groups of languages and scripts - 4 cells in the table, and depending on the network model and the particular script/language category, we might have 4 conclusions
... don't want conclusions to be too complex
... we cannot just make one yes or no decision on each particular method we analyzed

Garret: I think that fits; latin and arabic are close, then CJK, and then the other clear division is 2G and not 2G

Vlad: or slow and fast network

myles: slow and fast is not good enough wording

Garret: high and low latency might be better generic term
... because latency is the driver in how results diverge

Vlad: if we adopt that approach in general - didn't expect that we could come to a decision during this meeting, but maybe we can brainstorm on this and get to defined groupings and conclusions in the next few days on the mailing list
... is that reasonable?
... the beginning of the timeframe is next week when we need to finalize things

myles: we should make the matrix in its full glory and then condense it down from that
... important instead of us doing it off the cuff here

Garret: agree

chris: agree

myles: make sure we're not just considering performance; there's more than performance to consider

chris: on the deployability thing.. something client-side is needed with range-request and patch-subset?

Garret: correct

chris: range-request just requires any server, but the font has to be pre-processed - a tool is needed to do that, but it could be in an authoring tool
... for patch-subset, it requires a server module
... so, do these tools exist?
... how did you make those fonts, myles, and how easy is it to get this into authoring tools?

myles: has a project (will link here) that optimizes fonts - input is a font, output is the rearranged font
... plan is not to release it as an official product, but it would be kind of a proof of concept (maybe not the right words) - wants to show people how it's done

Garret: at the minimum, the components get flattened out and desubroutinized, and the glyph table is moved to the end

myles: the charstrings need to be at the end

Garret: is there a webserver module that can be dropped in? Not yet, but it could be converted to run on an actual server
... as it is today it would not work
... post requests from client to server is used to bypass caches, but as we design the final version we need to decide on the caching story

myles: post requests are uncachable

Garret: correct

myles: so that needs to be in the analysis

Garret: post request is not the final decision on the protocol, so some room to change that

myles: if you use a post request, it also means you can't have one single request that means different things to different servers
... can't have a patch-subset server and a range-request server that respond to the same request in the same way

chris: slight tangent - Simon Cozens is designing a streamable font format? early work
... will post a link

<myles> https://github.com/litherum/StreamableFonts

<myles> ^^^ font optimizer

Vlad: comment on PFE description and solution - no reference of binary patching implementations
... the cost of binary patching is "free" with brotli solutions

myles: there are many binary patching solutions, what is special?

Garret: performs best

myles: but what makes it free?

Vlad: get the result with no effort

myles: but you need to run new software

Vlad: two-step process. first, separate the piece of data - not free, requires work. once you have that data, you need to combine the data from previous requests with the new data.
... if you already have brotli compressor and decompressor available, in that case it would be free?

myles: brotli might be installed on the servers, but you can't just call a function and have it work

Vlad: if you disregard the steps for creating the incremental subset and focus only on the binary patches - sending them to the client to decompress and build a combined updated subset that includes the base subset and the incremental one
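The two-step flow Vlad describes can be sketched with zlib's preset-dictionary support standing in for brotli's shared-dictionary mode - an illustrative assumption, not the actual PFE protocol or its patch format:

```python
import zlib

def make_patch(base: bytes, updated: bytes) -> bytes:
    """Server side: compress the updated subset using the client's
    current subset as a preset dictionary, so bytes already on the
    client cost almost nothing to transfer."""
    comp = zlib.compressobj(zdict=base)
    return comp.compress(updated) + comp.flush()

def apply_patch(base: bytes, patch: bytes) -> bytes:
    """Client side: decompress against the same dictionary to rebuild
    the combined, updated subset (base plus incremental data)."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(patch) + decomp.flush()
```

If the compressor and decompressor are already available on both ends (Vlad's point), the patch step is just this dictionary-primed round trip; the cost that is not free is producing the subsets themselves.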

<chris> https://github.com/simoncozens/sff-spec

Garret: on the server side it's not important if it's there, but it matters on the client side; convincing a browser to add patching takes work

myles: webkit does not support brotli

Garret: edge, FF and Chrome do

<myles> CFNetwork and Core Text do, but not WebKit

Vlad: still good to mention patching can use brotli

<myles> for the record, i don't see anything about brotli in here: https://en.wikipedia.org/wiki/List_of_Apache_modules

Summary of Action Items

[NEW] ACTION: Garret to collect filesize distribution for report
[NEW] ACTION: garret to research cost function
 

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version (CVS log)
$Date: 2020/10/05 17:03:04 $
