<scribe> scribeNick: Persa_Zula
Garret: Graphs for the CJK section are fixed, otherwise the same as presented
... images are now included in the Github repo as well
Vlad: we need to get to a
reasonable timeline
... a good target would be the week before: October 12-16
... the Analysis & Conclusions section will have a more pronounced effect on the work for the next few years
chris: cost functions - why is it a sigmoid?
Garret: When the max cap was added, the natural way to retain the mostly exponential shape was to use a sigmoid.
chris: where does -11.5 as a constant come from?
Garret: a few constants were added to get the min and max to fit the graph
Garret: thinks -11.5 was to control the general shape of the graph; hand-tuned to get the ramp to start and end where we want it to
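(An illustrative sketch of the kind of capped, sigmoid-shaped cost being discussed. The normalization, the constants, and the reading of -11.5 as a steepness term are assumptions for illustration, not the report's actual formula.)

    import math

    def cost(t, t_min, t_max, max_cost=10.0, k=-11.5):
        # Illustrative capped sigmoid cost: near zero below t_min, ramping up
        # and saturating at max_cost around t_max. k is a hand-tuned steepness
        # constant, analogous to the -11.5 discussed above; the report's exact
        # formula may differ.
        x = (t - (t_min + t_max) / 2) / (t_max - t_min)
        return max_cost / (1 + math.exp(k * x))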
chris: Why were t-zed and t-m divided by 2?
Garret: Going to need to re-review
chris: pagewalks - does this observe users somehow? where did this come from?
Garret: this came from logs of
traffic internal to Google; not specific samples or recruitment
of folks
... it was anonymized and aggregated before being sent out
myles: you only see pageviews that use Google Fonts or Noto?
Garret: not specific to Google
fonts only
... a walk of URLs, joined with an index of the web, collapsed to codepoints per font face. Not just Google fonts
chris: up until you sub in the fonts, it doesn't matter what fonts are used at all, just the codepoints matter. Then you add in the Google fonts?
Garret: if the font faces don't match Google Fonts, then pick the smallest font from the Google Fonts collection that covers the codepoint range for the request
chris: effectively models a world where there are no local fonts?
Garret: correct
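(An illustrative sketch of the substitution step Garret describes: try the page's fallback chain first, otherwise pick the smallest collection font covering the observed codepoints. The catalog structure and function names are placeholders, not the actual pipeline.)

    from typing import Dict, Iterable, Optional, Set

    def smallest_covering_font(observed: Set[int],
                               catalog: Dict[str, dict]) -> Optional[str]:
        # catalog maps family name -> {"codepoints": set of ints, "size": bytes}.
        # Return the smallest family whose cmap covers every observed codepoint.
        candidates = [(meta["size"], family)
                      for family, meta in catalog.items()
                      if observed <= meta["codepoints"]]
        return min(candidates)[1] if candidates else None

    def match_or_substitute(fallback_chain: Iterable[str],
                            observed: Set[int],
                            catalog: Dict[str, dict]) -> Optional[str]:
        # Prefer a family that is named in the page's font-face fallback chain;
        # otherwise substitute the smallest covering font.
        for family in fallback_chain:
            if family in catalog:
                return family
        return smallest_covering_font(observed, catalog)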
chris: there are lots of pagewalks, then sub-sampled so you get fewer; then you have languages. Why not sub-sample English more and the others less?
Garret: wanted to keep
distribution of languages closer to reality. that's why we
created these groupings so we could see the results more
clearly
... didn't want to skew the data arbitrarily
chris: it seems within each group all the languages behave the same
myles: when creating the dataset, did you parse the HTML for the font names, or load it into a browser to get the effects of font fallback?
Garret: just parsing the HTML
with the font-face fallback chain
... and try to match a G font in that fallback chain; if that
doesn't work, fallback to Google font substitution
myles: the evaluation report needs to make it clear that this is specific to Google Fonts' fonts
Garret: correct
Vlad: does it make any difference
if it's G font corpus or something else?
... speculates that other CJK fonts, for example, are similar to these
myles: not disagreeing, but if we model the web based on what Google serves, the report should clearly indicate it
Garret: agreed
jpamental: Google is aggressive on stripping down and aggressive on what it serves (no OT features); great for performance but different than if you serve it yourself
Garret: for the unicode-range approach, we just use a subset cut from the original font file; it doesn't include hint-stripping or other advanced optimizations, so it's more representative of self-hosting in this set
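(A sketch of cutting that kind of unicode-range slice straight from the original file with fontTools, keeping hints and layout features so the result resembles self-hosting rather than an aggressively optimized serving pipeline. The option choices, paths, and codepoint range are illustrative.)

    from fontTools.ttLib import TTFont
    from fontTools.subset import Subsetter, Options

    def cut_subset(src_path, dst_path, unicodes):
        # Cut a unicode-range style slice from the original font file.
        # Hinting and all OpenType layout features are retained, which is
        # closer to a self-hosted subset than a stripped-down production build.
        options = Options()
        options.hinting = True            # keep hinting
        options.layout_features = ["*"]   # keep all layout features
        font = TTFont(src_path)
        subsetter = Subsetter(options)
        subsetter.populate(unicodes=unicodes)
        subsetter.subset(font)
        font.save(dst_path)

    # e.g. cut_subset("NotoSansJP-Regular.otf", "slice.otf",
    #                 unicodes=range(0x4E00, 0x4F00))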
chris: do we have data on the absolute file size of the fonts in each of the 3 linguistic sets?
Garret: not on hand, but can look for it
chris: would be helpful
Garret: will collect filesize distribution
<chris> ACTION: garret to research cost function
<scribe> ACTION: Garret to collect filesize distribution for report
<trackbot> Created ACTION-225 - Research cost function [on Garret Rieger - due 2020-10-12].
<trackbot> Created ACTION-226 - Collect filesize distribution for report [on Garret Rieger - due 2020-10-12].
myles: after looking at the data, there are 2 tiers of solutions: one works really well if we want to put a lot of time & energy into deploying it; the second works better when there isn't a lot of time, as a direct replacement for the font
Vlad: we do need to consider that we have 3 reference points of data (including the unicode-range solution as it currently exists)
... for example, in the latin graph, unicode-range is in the same ballpark as the other solutions, so the conclusion could be "do nothing" because unicode-range exists
Garret: for range-request in the latin and arabic categories, agreed, but in the CJK space it should be considered
myles: two responses: first, most webfonts served today don't use unicode-range; and second, unicode-range breaks shaping
... that's a serious consideration
chris: even for Latin and Arabic, breaking features is not great
Garret: for latin, range-request does worse than sending the whole font, and the whole font does retain features
chris: whole-font as the baseline
was a good shift
... for latin, greek, etc, and arabic and indic, range-request
is terrible in cost reduction - it's a negative effect vs
sending the whole font
myles: thinks this is tweakable
and is surprised it ended up this way; was designing it to
minimize the number of bytes as a metric
... wished he had more time to see what can be tweaked here
chris: we can do a further work
report on this point and still submit
... results show that we can't serve CJK over 2G or below connections; it's just too long of a wait time
Vlad: how do we want to structure
this section for Analysis?
... by network type? by script? or by script groups?
Garret: do we want to try it with the same grouping on the report I put together, since that is how the data is put together?
Vlad: for analysis that's fine,
but what about for conclusion?
... do we want just one conclusion or one per language
group?
chris: there is one group that is
different from the other two
... and we need to say that as a firm conclusion
Vlad: the mental model that he has come up with is a 2x2 matrix, where one side is the network model and the other side is the different groups of languages and scripts - 4 cells in the table, and depending on the network model and the particular script/language category, we might have 4 conclusions
... don't want conclusions to be too complex
... we cannot just make one yes or no decision on each
particular method we analyzed
Garret: I think that fits; latin and arabic are close, then CJK, and then the other clear division is 2G vs not 2G
Vlad: or slow and fast network
myles: slow and fast is not good enough wording
Garret: high and low latency
might be better generic term
... because latency is the driver in how results diverge
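(A toy model of why latency, rather than raw bandwidth, is the natural axis: every request/response pair pays a full round trip before any bytes stream. The numbers below are illustrative, not taken from the report.)

    def transfer_time(payload_bytes, round_trips, rtt_s, bandwidth_bps):
        # Toy model: each round trip costs a full RTT, then the payload streams
        # at link bandwidth. Methods that need several round trips
        # (patch-subset, range-request) diverge most on high-latency links.
        return round_trips * rtt_s + (payload_bytes * 8) / bandwidth_bps

    # e.g. a 4-round-trip incremental load of 50 kB:
    #   low latency:  transfer_time(50_000, 4, 0.03, 5e6) ~= 0.20 s
    #   high latency: transfer_time(50_000, 4, 0.75, 1e6) ~= 3.40 s (the RTTs dominate)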
Vlad: if we adopt that approach in general - didn't expect that we could come to a decision during this meeting, but maybe we can brainstorm on this and get to defined groupings and conclusions in the next few days on the mailing list
... is that reasonable?
... the beginning of the timeframe is next week when we need to
finalize things
myles: we should make the matrix in its full glory and then condense it down from that
... important to do that instead of doing it off the cuff here
Garret: agree
chris: agree
myles: make sure we're not just considering performance; there's more than performance to consider
chris: on the deployability thing.. something client-side is needed with range-request and patch-subset?
Garret: correct
chris: range-request just requires any server, but pre-processing the font to change it - a tool is needed to do that, but it could be in an authoring tool
... for patch-subset, it requires a server module
... so, do these tools exist?
... how did you make those fonts, myles, and how easy is it to get it into authoring tools?
myles: has a project (will link here) that optimizes fonts - the input is a font, the output is a font, and the output is the rearrangement
... the plan is not to release it as an official product, but it would be kind of a proof of concept (maybe not the right words) - wants to show people how it's done
Garret: at the minimum, the components get flattened out and desubroutinized, and the glyph table is moved to the end
myles: the charstrings need to be at the end
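(A sketch of the client side of the range-request approach once the glyph data sits at the end of the file: fetch the fixed tables first, then only the byte ranges of needed glyphs, using the standard HTTP Range header. The URL and offsets are placeholders.)

    import urllib.request

    def fetch_range(url, start, end):
        # Fetch bytes [start, end] of the font with a standard HTTP Range
        # request; any server with ordinary range support will do.
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            assert resp.status == 206, "server did not return partial content"
            return resp.read()

    # Placeholder usage: grab the head of the font (tables, loca/CFF index)
    # first, then only the glyph records that are actually needed.
    # head   = fetch_range("https://example.com/font-optimized.otf", 0, 65_535)
    # glyphs = fetch_range("https://example.com/font-optimized.otf", 1_200_000, 1_234_567)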
Garret: is there a webserver module that can be dropped in? not yet, but it could be converted to run on an actual server
... as it is today it would not work
... POST requests from the client to the server are used to bypass caches, but as we design the final version we need to decide on the caching story
myles: post requests are uncachable
Garret: correct
myles: so that needs to be in the analysis
Garret: the POST request is not the final decision for the protocol, so there is some room to change that
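(A hypothetical sketch of the patch-subset exchange being described: the client reports what its cached subset already covers and what it newly needs, and gets back an opaque binary patch. The endpoint, field names, and use of JSON are invented for illustration; per the discussion, even the use of POST is not final.)

    import json
    import urllib.request

    def request_patch(server_url, have_codepoints, need_codepoints):
        # Hypothetical wire format: POST the current coverage and the newly
        # needed codepoints, receive patch bytes to apply to the cached subset.
        body = json.dumps({"have": sorted(have_codepoints),
                           "need": sorted(need_codepoints)}).encode("utf-8")
        req = urllib.request.Request(server_url, data=body, method="POST",
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()  # opaque patch, applied client-side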
myles: if you use a POST request, it also means you can't have one single request that means different things to different servers
... you can't have a patch-subset server or a range-request server that responds to the same request in the same way
chris: slight tangent - Simon Cozens is designing a streamable font format? early work
... will post a link
<myles> https://github.com/litherum/StreamableFonts
<myles> ^^^ font optimizer
Vlad: comment on the PFE description and solution - there is no reference to binary patching implementations
... the cost of binary patching is "free" with brotli solutions
myles: there are many binary patching solutions, what is special?
Garret: performs best
myles: but what makes it free?
Vlad: get the result with no effort
myles: but you need to run new software
Vlad: it's a two-step process. First, separate out the piece of data - that's not free, it requires work. Once you have that data, you need to combine the data from previous requests with the new data.
... if you already have a brotli compressor and decompressor available, in that case it would be free?
myles: brotli might be installed on the servers, but you can't just call a function and have it work
Vlad: if you disregard the steps for creating the incremental subset and focus only on the binary patches - sending them to the client to decompress and build a combined, updated subset that includes the base subset and the incremental one...
<chris> https://github.com/simoncozens/sff-spec
Garret: on the server side it's not important if it's there, but it matters on the client side; convincing a browser to add patching takes work
myles: webkit does not support brotli
Garret: edge, FF and Chrome do
<myles> CFNetwork and Core Text do, but not WebKit
Vlad: still good to mention patching can use brotli
<myles> for the record, i don't see anything about brotli in here: https://en.wikipedia.org/wiki/List_of_Apache_modules
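(A simplified sketch of the two-step combine Vlad describes, using the plain Python brotli bindings as a stand-in. The shared-dictionary ("shared brotli") mode that would make the patch genuinely cheap is not assumed here, since not every binding or server module exposes it, and a real client would rebuild a valid font rather than concatenate bytes.)

    import brotli

    def make_patch(new_glyph_data: bytes) -> bytes:
        # Server side (simplified): ship the newly requested data as a brotli
        # stream. With shared-dictionary brotli, the previously delivered
        # subset would serve as the dictionary, shrinking this further.
        return brotli.compress(new_glyph_data)

    def apply_patch(cached_subset: bytes, patch: bytes) -> bytes:
        # Client side (simplified): decompress and combine with the cached
        # subset. A real implementation merges tables into a valid font;
        # concatenation here only illustrates the combine step.
        return cached_subset + brotli.decompress(patch)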