<scribe> ScribeNick: myles
Garret: We made updates to the
proposal based on previous meeting's discussions. 3 big
changes.
... 1. It would be nice for the input data to have multiple
families in a page view. We can try both ways and see which way
works for us
... 2. We talked about having a single cost function. I proposed something.
Vlad: Yes please
Garret: 3. Updated the section
about font families to test with. We shouldn't use just google
fonts, but that's just a starting place. I mentioned some
specific holes: CFF outlines, font families in scripts from
other languages, emoji, icon, or color fonts. Lastly, test both
variable and non-variable fonts
... Cost function! Let's list assumptions. The important things
are 1) We want to reward a good user experience. Here, this is
how fast the fonts will load. Most importantly, we want to
reduce "user perceptible delays". This is simplifying things,
but there is a window that if fonts load in that window it's
okay, and the longer the delay after that, the worse it will
be. The penalty for going past that point is not linear. This
is intuitive. The
browser may fall back to system fonts. Longer delays have more penalty. Lastly: The total bytes transferred is important, but is more secondary. Primary thing to worry about is the delays. If a solution is inefficient, that's bad.
Garret: With those assumptions in
mind, we have a cost function: If the network delay for the
fonts is less than the threshold, the cost is 0. After we pass
that page load time, the cost starts rising exponentially, or
maybe just non-linear. Perhaps quadratic.
... I chose exponential because it's aggressive. Low cost near
the threshold, but cost function is very large at ~3s
... We can compute total cost function by summing up the value
of that cost function across <missed>
... We can calculate bytes transfer efficiency: We can compare
bytes against the "perfect" subset
... Bytes transferred gets worked into this by wanting to see
that at minimum they have improved over the current state of
the art. Beyond that, it's mostly ignored
Vlad: We have multiple solutions. One is based on Brotli patch mode. Another is byte range requests. One is pre-compressed and the other would be compressed as the data travels. Byte transfer is difficult to compare between the two solutions.
Garret: With range requests, the
responses will be gzipped across the wire. You lose some
compression here.
... Hopefully myles can come up with some compression
methods.
... I'm interested to see how this comes out of the analysis to
see where we end up.
Vlad: I wonder if that byte transfer efficiency can be measured before Brotli
Garret: For doing the analysis,
we should be able to simulate the size of the HTTP request. We
can just use gzip.
... We want to make sure the solutions are being used in their
best form
myles: users care about what they pay for across the wire with their cell connections
Garret: Yes, we can model the HTTP requests themselves.
Vlad: The metric is: if the user is on a metered connection, how much it costs
Garret: yes.
... Hopefully we will see our solutions will be worse than
perfect but better than today's state of the art
myles: the optimal solution is a full font file?
Garret: yes. It's a lower
bound.
... We can take that set of bytes and spread it out over
different page views, or say it comes in on one page
view.
... the proposal has details about this
myles: We should be using experimental data here, either by filling in variables in the cost function or by picking a cost function
Garret: Yes. Chrome has some details we can investigate. I don't know what kind of data we could investigate
myles: we can just measure latency
Garret: But then how do you map that to vague "cost" function?
myles: i see
PeterMueller: The Lighthouse team has metrics on abandonment of page loads over time
Garret: is that data public?
PeterMueller: I haven't found any public parts. I've been dealing with <missed> and Paul Irish
Garret: If Ilya has something useful, we can talk to him!
PeterMueller: I can bring it up
jpamental: There are some starting points in the literature about abandonment after waiting 3 seconds. We should do better than that, but most research seems to point to having lost ~50% of users after 3 seconds.
Garret: Yes. the cost function should be real bad at 3s
jpamental: That's for many reasons but we don't want the fonts to be one of them.
Vlad: Speaking about reasons to lose customers, I like the idea about expected page load time, but I wonder if this is expected to be a best guess average, or if we're considering measuring page load time
Garret: I hadn't considered that. If we have a data set that is based on actual pages, for each page view, we could measure the page load on average for that page, and do it for every page in the data set. That value will be different for each page view.
Vlad: In real life, that page
load time will be different not just for each page, but for each
load itself, as conditions change.
... If you're still loading fonts when other resources arrived,
that's bad
PeterMueller: That's difficult to put in a formula. I've tried.
Vlad: It doesn't need to be put in the formula. But that "P" value can be measured
Garret: Yes, that would be more realistic.
Vlad: If we expect that our results are going to be available to the public, if someone wants to re-run the test, his test conditions may be different than ours
Garret: Under the "computing network delay" section, we should re-do the analysis on different network connections: 3G, 4G, wired broadband, etc., and have results for each configuration.
PeterMueller: WebPageTest and Lighthouse have configurations. They agree on how to categorize these configurations
Garret: I was trying to find some sample numbers to add to this.
PeterMueller: Yes. I have some data in a repository at work. But I can backtrack to where I got those numbers from
Garret: Yes, I'd love to know more
PeterMueller: I can do that.
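The per-configuration analysis described above could be simulated with a simple delay model. The profile numbers below are illustrative assumptions in the spirit of WebPageTest-style presets, not the actual values being sought:

```python
# name: (round-trip time in seconds, downlink bandwidth in bytes/s)
# These figures are placeholders, not real preset values.
PROFILES = {
    "3G":    (0.150, 1_600_000 / 8),
    "4G":    (0.070, 9_000_000 / 8),
    "wired": (0.020, 40_000_000 / 8),
}

def transfer_delay(total_bytes: int, profile: str, requests: int = 1) -> float:
    """Rough delay model: one RTT per request plus serialization time
    for the bytes at the profile's bandwidth."""
    rtt, bandwidth = PROFILES[profile]
    return requests * rtt + total_bytes / bandwidth
```

The same byte totals produced by each solution could then be fed through the cost function once per network class, yielding one set of results per configuration.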
Vlad: Sounds like we're in good shape. What about input data?
Garret: Do we want to focus on one family at a time over a set of page views, or be more realistic where a page has one or more fonts?
myles: the second seems obviously better
<PeterMueller> Webpagetest connection configurations: https://github.com/WPO-Foundation/webpagetest/blob/master/www/settings/connectivity.ini.sample
Garret: If you have multiple
fonts that are loading on a page, most are independent, they
are loaded in parallel. So it isn't as important to model that
in the analysis
... but i'm not strongly opposed to it
myles: realistic data is better
Garret: We might want to have a bandwidth limit, so if 3 families are loading at the same time, each one will get 1/3 bandwidth
PeterMueller: That's the only bottleneck
myles: we can measure those ratios
PeterMueller: It will be
difficult to model because of prioritization
... We can't model external resources (images, etc.) competing
for bandwidth
Garret: yes
Vlad: also, let's say we have 2 fonts in a page, one for headlines and one for body text, so the subset sizes will be very different, and the overhead for asking for those subsets will be similar for both. But they have different conditions because one will be a small subset and one will be a big subset.
Garret: Okay, okay, everyone, you've convinced me
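The bandwidth-splitting idea from this exchange can be sketched as below; this is an assumed even split, deliberately ignoring the rebalancing and prioritization effects raised in the discussion:

```python
def concurrent_load_times(family_bytes: list, bandwidth_bps: float) -> list:
    """Each of N concurrently loading font families gets bandwidth/N
    for its full duration, so 3 parallel loads each see 1/3 of the link.

    Simplification: in reality shares rebalance as transfers finish,
    and other page resources (images, etc.) also compete for bandwidth.
    """
    share = bandwidth_bps / len(family_bytes)
    return [size / share for size in family_bytes]
```

Under this model a small headline subset and a large body-text subset loading together would still finish at different times, as the discussion notes, since each takes its own size divided by its share.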
Vlad: We will need Unicode code points too. PeterMueller, can you discuss?
PeterMueller: This is tooling I've been working on for the last 3 years. Idea: build a tool to statically analyze your page and build the optimal subset. We pulled out that engine and I made this font inspect tool. This runs static analysis only, no JavaScript. It also explores all the permutations, the states the page can be in: looking at media queries, CSS pseudo states, etc. This is important to get the complete subset you need. But we might not need it
in this WG. So we can consider only some of the permutations.
PeterMueller: What permutations
should we explore for the engine?
... It also finds print-only text, which might not be what we
want
Garret: One permutation is
probably realistic
... but this would be pretty helpful already
PeterMueller: It's pretty battle-tested. But it doesn't work on jpamental's website; it runs out of memory.
jpamental: The calculations, maybe?
PeterMueller: It's all the CSS custom properties which adds many permutations
myles: Does it open the page in a webview or does it parse the content directly?
PeterMueller: It's a low-level
implementation. It uses JSDOM and CSSOM. No headless
browser.
... There is an option of running it in a headless browser like
chrome
myles: Do you parse CSS?
PeterMueller: yes, in an external library
myles: so this is effectively a headless browser
PeterMueller: yes
myles: this is fine as long as this browser's results match shipping browser's results
PeterMueller: In practice it does.
Garret: Sure, we can use this tool. Sounds good.
PeterMueller: The current version of the tool doesn't expose all the data that I have available (because there's a lot) but I tried to cook it down to just relevant parts: text nodes, etc.
Garret: All we need for this analysis is the list of font families and the list of Unicode code points associated with each font family.
PeterMueller: CSS properties and
values from font family block, the URL of the font, and the
text nodes
... I'll discuss whether we can figure out a good way to reduce
the permutations
... Also we want to make it run faster
Vlad: Okay! For the last portion of input data section of the proposal, about acquiring input data, I wonder if we should consider making a decision on whether we should have a synthetic page collection or using snapshots of real pages
Garret: We have some potentially promising leads on getting a data set. Let's not jump to synthetic just yet.
jpamental: If there's enough data that comes back from Peter's tool, what if we created a list of sites from Alexa?
Garret: If we need to go down the
synthetic approach, we could start with Alexa or HTTP Archive,
and do a walk between pages in that index, and use Peter's tool
to pull data from those
... The synthetic part is that the walks are not
realistic.
... It might not be that bad
myles: HTTP Archive has a fixed
snapshot
... we can't re-host the web because of licensing. HTTP Archive
solves this problem
PeterMueller: The tool can crawl as much or as little as we tell it to. We could do that if we needed the data set to be reproducible
myles: Instead of using google fonts, can we use the archive's real font files in their real content?
Garret: Licensing! If we can do
it, we should, but we might not be able to.
... We could also start with a set of blessed fonts, and then
only test pages that happen to be using those fonts.
... This could work with Google Fonts and Typekit
Vlad: A chunk of the web uses Google Fonts. I don't know whether the specific font in use will be significant. It's more likely that the font & how it's hosted is important
myles: The HTTP range request approach lives and dies based on the byte sequence of specific fonts. So having a realistic corpus is important
Garret: Yes. Also these fonts will be "optimized" according to the method
<Vlad and Myles banter about which fonts should be tested>
Vlad: Some fonts have high commercial value, with more effort put into their production; that may be why we can't get them into open source. If we want the results to be reproducible by anyone, that limits what we can use
myles: let's investigate HTTP archive.
Garret: Does HTTP Archive host font files?
myles: dunno! let's check.
Vlad: I would be surprised.
... We have 4 minutes left. Doodle poll results!
... Unfortunately only one new member added to the doodle. But
that person is only available at one time of the day, which
doesn't match other people's schedules. So maybe we should just
have a 1/month meeting to try to accommodate different
people?
... The biggest surprise is the current time isn't the best
choice for everybody.
... jpamental mentioned the current time is unacceptable, but
you're here right now ... so was that a mistake?
jpamental: It was a mistake.
Vlad: The best time is noon on
Mondays
... Noon on Mondays conflicts with the Publishing Working Group.
But the Publishing Working Group has progressed into areas I'm not
able to contribute to. So I'm willing to do that.
<jpamental> Note on Internet Archive: unfortunately it doesn't seem to include web fonts
Vlad: monday noon eastern
time
... 9:00 am Pacific
chris: that time is fine with me too
RESOLUTION: Future WebFonts WG meetings will be at noon EDT on Mondays
Garret: I will be on vacation for the next 2 weeks
ChristopherChapman: There is an offsite for me.
Vlad: Next week, some of us won't be available, but the week after that many will be available. I won't be available the second week of July. Can we continue on July 15?
everyone: okay
<jpamental> exit
Present: Vlad, Garret, Persa_Zula, PeterMueller, ChristopherChapman, jpamental, Chris, Myles
Regrets: Jonathan_Kew
ScribeNick: myles
Date: 20 Jun 2019