<myles> ScribeNick: myles
Garret: I sent out the link
earlier.
https://lists.w3.org/Archives/Public/public-webfonts-wg/2019May/0021.html
... The proposal is about starting discussions about which
solutions are the best. We have a few different possible ways
we could approach this problem. I laid out a simple framework
for that. This analysis is incomplete; I'm not trying to
measure CPU usage on clients and servers. Instead, this just
investigates network performance. There are lots of other things,
but we can look at those in the future.
... The first thing: a set of criteria that a good solution
would have, and then a set of measurements to determine whether
or not we've won. Criterion 1: The worst-case network delay that
a client who's loading fonts encounters, i.e. the time from
starting to load a font to the end of the last request.
... Interested in the worst case. Existing solutions now load
the font once, and have a bunch of cache hits later on, which
are basically free. If you average things out, it looks good.
But progressive enrichment makes the first load smaller, and
you load additional deltas later with additional page views. So
we're improving that worst case load.
... These criteria are sorted by importance.
... Criterion 2: The total number of bytes transferred
... We want to keep this number low. It would be bad if over
time you ended up with lots of data transfer
... Criterion 3 (least important): The total number of network
requests. In general we should keep this small
... Now we need a set of data. Proposal: A set of font
families, and for each one, there is a collection of page view
sequences. A "page view" is a set of unicode code points. We
can determine how many pages are in a sequence (maybe a good
pick is ~10?)
... We can simulate walking through those pages using each
strategy.
... Importantly, we don't have any data yet.
... One suggestion is that we can synthesize this based on the
web index to get somewhat realistic page views, or we can track
our users
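The page-view data model Garret outlines could be sketched as follows. This is a minimal illustration, not part of the proposal; the names `PageView`, `Sequence`, and `code_points` are hypothetical:

```python
from typing import FrozenSet, List

# A "page view" is the set of Unicode code points the page uses.
PageView = FrozenSet[int]

# A sequence is an ordered list of page views for one font family,
# e.g. ~10 pages of a simulated browsing session.
Sequence = List[PageView]

def code_points(text: str) -> PageView:
    """Reduce a page's text content to its set of code points."""
    return frozenset(ord(ch) for ch in text)

pages: Sequence = [code_points("Hello, world"), code_points("Hello again")]
```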
... The analysis! For each sequence, for each method, go
through the page views, and the method will tell us how well we
did on the different criteria. One cost metric concerns network
delay. Calculating the latency of a request uses a simplistic
approach; we could make this measurement more realistic later.
... From each sequence, we can make distributions of network
delays. We should look at just the tail (95th percentile) for
each approach. Then, for each solution, we need to write a
small simulator. Given a set of code points it has and needs,
it will tell you about which requests are necessary and their
sizes.
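The analysis loop and per-method simulator interface might look like this minimal sketch. The `Strategy` protocol, the `WholeFont` example, and the RTT/bandwidth numbers are all assumptions for illustration, not the proposal's actual design:

```python
import statistics
from typing import Iterable, List, Protocol

class Strategy(Protocol):
    """One transfer method (whole font, unicode-range, patch/subset, ...).
    Hypothetical interface: given the code points the client already has
    and the ones the next page needs, report the requests it would make."""
    def requests_for(self, have: frozenset, need: frozenset) -> List[int]:
        """Byte size of each network request made for this page view."""
        ...

class WholeFont:
    """Method 1: send the whole font file once; later pages are cache hits."""
    def __init__(self, size: int):
        self.size, self.sent = size, False
    def requests_for(self, have: frozenset, need: frozenset) -> List[int]:
        if self.sent:
            return []
        self.sent = True
        return [self.size]

def simulate(strategy: Strategy, sequence: Iterable[frozenset],
             rtt_ms: float = 100.0, bytes_per_ms: float = 1000.0):
    """Walk one page-view sequence, collecting per-page delay, total bytes,
    and total requests. The latency model is deliberately simplistic
    (one round trip per request plus transfer at a fixed bandwidth);
    the RTT and bandwidth defaults are assumptions, not measurements."""
    have: frozenset = frozenset()
    delays: List[float] = []
    total_bytes = total_requests = 0
    for need in sequence:
        sizes = strategy.requests_for(have, need)
        delays.append(sum(rtt_ms + size / bytes_per_ms for size in sizes))
        total_bytes += sum(sizes)
        total_requests += len(sizes)
        have |= need
    return delays, total_bytes, total_requests

def p95(delays: List[float]) -> float:
    """Tail of the per-page delay distribution (95th percentile)."""
    return statistics.quantiles(delays, n=100)[94] if len(delays) > 1 else delays[0]
```

For example, walking three pages with a 50 KB whole-font strategy yields one request (150 ms under the assumed network) on the first page and cache hits afterward.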
... The different methods: 1) Send whole font file 2) Using
unicode-range + language-based subsets 3) patch and subset
(Google's proposal) 4) Adobe's existing dynamic augmentation
method (Maybe Adobe can make a data set public for this?) 5)
HTTP range transfers (Apple's proposal) 6) Optimal transfer: A
lower bound for the minimal number of bytes needed to transfer.
As if you knew all the code points the user will ever need in
the future, and you
could download just those code points
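The "optimal transfer" lower bound described above could be computed like this. A sketch only: `glyph_size` is a hypothetical per-code-point byte table, and `base_overhead` stands in for fixed tables (head, hhea, ...):

```python
from typing import Dict, Iterable

def optimal_bytes(sequence: Iterable[frozenset],
                  glyph_size: Dict[int, int],
                  base_overhead: int = 0) -> int:
    """Lower bound on bytes transferred: with perfect foreknowledge,
    download the glyph data for each code point the user will ever
    need exactly once, and nothing else."""
    needed: set = set()
    for page in sequence:
        needed |= page  # union of all code points across the sequence
    return base_overhead + sum(glyph_size.get(cp, 0) for cp in sorted(needed))
```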
Garret: We'll need some fonts, of course. Google Fonts is a good place to start, but we can extend beyond it too
Vlad: With HTTP range requests, I suspect our results will be dependent on the form of the input data
Garret: Myles proposed some
method of sorting a font file
... That method would be incorporated into this method
... The way the font is arranged may evolve over time
<chris> scribenick: chris
myles: walk per font-family, correct?
Garret: yes
myles: why? measure the amount of
time it takes to view each page
... since they happen in parallel, we need a max function
Garret: yes, should be
independent per page. Should be okay to do per-page
... multiple families per page view becomes the max of the fonts
... better signal if we test per font
myles: balance between throughput and latency. your suggestion does not really take into account latency
Garret: agreed
... optimize per font, does not matter if they are all in
parallel. minimize the individual ones
myles: let's measure both to
start, see if they really are independent
... ok cool. measuring many things at the start is the right
thing. later we can hone it down
... not clear which are the best ones
Garret: yes, we don't have an ideal set of fitness functions yet
myles: measure mean as well as 95th percentile
Garret: yes
myles: do you plan on real network requests, or simulating them?
Garret: risk of measuring on same, fast network. formula intended to simulate real-world conditions
myles: browsers don't actually measure on real networks - we assume an ideal network. But Apple has network teams
Garret: same here
... we can see what data we get from early experiments
myles: in my email I have a bunch of issues, but this is overall a great general direction
petermueller: assuming each page is static, can we look at progressively loading pages, e.g. CSS in body or JS delayed loads
Garret: yes, if we can get some
data on typical behaviour
... started with static page walks as easier to get data
myles: two issues. The distribution
of data added to a page may be different from main static pages.
Need to gather from 2 different distributions. Then how to model
it
... boundary between 2 pages same as adding content to the
first page? or not?
petermueller: adding content while requests are in flight though
Garret: that is related to client-side processing times. This analysis is an earlier, simplified framework
petermueller: ok, fair
Garret: same for content added or page views. data on augmented pages can be added later
myles: initially, looking for the big picture. Then fine tuning once we have a good fitness metric with more realistic datasets, interruptions, overlapping requests etc.
Garret: yes, at that point testing in an actual browser
Vlad: maybe use irc as a simulation of content addition
myles: still not solved the flashing problem
Vlad: (scribing takes finite time)
<myles> Vlad: minor variations in screen update times might be unnoticed by users
<myles> Vlad: That may not be as much as we might think
<myles> Vlad: Even if there's a delay of 1 second, the viewer might not know
<myles> myles: It's mechanically difficult in a browser to implement this
<myles> Vlad: We don't want to add artificial delay. It will be determined by the font load. It will be different for different users on different devices
<myles> Vlad: We could adopt this model for test purposes
<myles> Vlad: for myles: you mentioned concerns about font source for testing. I wasn't sure about that concern. It's important to have a font corpus, so we can make sure that the corpus represents multiple categories of fonts
<myles> myles: partially to be fair to the web, and because Google Fonts is skewed toward Latin
<myles> Garret: Yes. Also we have almost no CFF fonts.
<myles> jfkthame: Also, about how fonts are created. I suspect multiple Google fonts come from a single set of font tooling. A whole different workflow and tooling create fonts that behave differently.
<myles> Garret: I don't know much about that, but it does seem valid
<myles> chris: We need to involve tool vendors in this.
<myles> chris: We might need to change / optimize based on the results
<myles> Vlad: The concern wasn't about the source per se, but it was about diversity
<myles> myles: What's the next step?
<myles> Garret: I have some ideas about revising the document. I'll send out an updated document
<myles> Garret: Then, we want to track down a source of input data for this.
<myles> Garret: Longer term, start a github repo where we start building this.
<myles> myles: do we need to worry about licenses and charters, etc.?
<myles> Garret: probably
<myles> chris: The main thing is allowing people to contribute
<myles> Vlad: We should be ready for this now
<myles> chris: The question is more what its scope and name should be
<myles> Garret: I've got an engineer on this who can help with this
<myles> Garret: But I'll need implementations of progressive enrichment methods
<myles> ChristopherChapman: For data, the ball is in Google's lawyers now, but on the code front <missed>. But we're on board with implementing something to compare approaches
<myles> ChristopherChapman: Adobe and Monotype have tools. KenLundeADBE is a master of making CJK fonts that break things, monotype is a master of making Arabic fonts that break things
<myles> Vlad: we have lots of Noto fonts that might help
<myles> Garret: The CJK Noto fonts are CFF but the rest are TrueType
<myles> Vlad: They have pretty good diversity
<myles> Garret: In Google Fonts, the only CFF fonts we have are CJK
<myles> chris: There are open source adobe fonts
<myles> ned: Adobe also has indic and arabic CFF fonts
Adobe open-source fonts on GitHub: https://github.com/adobe-fonts
<myles> myles: beware of constructing a corpus not matching the web at large
<myles> Vlad: myles, the quality of your tool will determine how well your solution fares
<myles> myles: yep
<myles> Vlad: jpamental isn't on the call :(
<myles> Vlad: Back then, jpamental didn't have any answers about ATypI. He doesn't have any update today and isn't here today
<ChristopherChapman> Filling in from earlier: on the code front: we at Adobe are on board with setting up a way to measure the performance of our current implementation using the framework we're discussing today
<myles> Vlad: Also, as a reaction to increased participation in the group (from Japan). It's a big penalty for them to join due to the current time (PDT morning). Do we have a better time?
<ChristopherChapman> Filling in from earlier: Kamal Mansour at Monotype is a master at producing Arabic fonts that break things
<myles> Vlad: Now that we're not dependent on WebEx, we can do it any time
<myles> petermueller: For tooling that can detect which code points are used on a sequence of pages. I can automate that.
<myles> petermueller: If the pages are static (no javascript) I already have a solution
<myles> Garret: thank you!
<Vlad> Chris, will you do the wrap up with minutes?
Present: Garret, ChristopherChapman, Vlad, jfkthame, petermuller, Ned, myles, KenLunde
Regrets: Jason_Pamental
Scribes: myles, chris
Date: 30 May 2019