W3C

- DRAFT -

Web Fonts Working Group Teleconference

31 Jan 2019

Attendees

Present
Vlad, J_Hudson, RSheeter, sergeym, chris, Garret_Rieger, jpamental, Ken_Lunde, Christopher_Chapman, Ned_Holbrook
Regrets
Chair
vlad
Scribe
myles

<scribe> ScribeNick: myles

Vlad: Thank you for joining
... There was an agenda for more than a few hours, that should be enough. The discussion probably won't be short. Hopefully we'll have enough time. The first item is something I brought up on the first call, and we didn't get there.
... Scope of work. We have some tasks that need to be completed by the group charter. Some are short term: investigation of the solutions, gathering of the use cases, investigation to make sure we don't overlook anything obvious. And putting it in a form where, if anyone asks in a year why we did something in a particular way, we can tell them to read the evaluation report and not bother us. That's a short-term goal.
... The long-term goal is whatever product we make by gathering all of this information. We won't know offhand exactly what to do, but we are all well informed about what we've tried and have decided what to use and what not to use.
... Garret and Rod presented a great writeup

Garret_Rieger: Me and Rod had some thoughts about what came up at the previous meeting.

<chris> https://docs.google.com/document/d/1AQ2VwiVwF77H2h_nuDHR1A5hRyGlpyIQYpYodMtEz1w/edit

Garret_Rieger: 1. What existing solutions exist.
... Google has tried 3 solutions so far: unicode-range to split a CJK font into small pieces. This works fine for languages like CJK, which don't make heavy use of OpenType layout features, but falls down completely if your script has layout features. For Arabic or Indic, this is a non-starter. Google Fonts uses this approach for CJK today.
... 2. TachyFont. Experimental. Incremental transfer approach. Take a full font, zero out the outline data and some other associated data, and send it to the client at the start of a session. Compression algorithms work well because there are lots of 0s. The "skeleton" is small. When the client needs outline data, it asks the server for it, the server sends it, and the client patches it in. The whole GSUB/GPOS is sent up front; this only works for outlines.
... Doesn't work well for fonts with lots of layout data.
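
A minimal sketch of the skeleton idea described above, not TachyFont's actual code. The byte layout, the function names, and the use of zlib are all assumptions for illustration: random bytes stand in for a real font, and zlib stands in for whatever compression the transfer uses. It only shows why a zeroed-out outline range compresses to almost nothing and how the client could patch outline bytes back in.

```python
import os
import zlib


def make_skeleton(font: bytes, outline_start: int, outline_end: int) -> bytes:
    """Return a copy of `font` with the outline byte range replaced by zeros."""
    zeros = bytes(outline_end - outline_start)
    return font[:outline_start] + zeros + font[outline_end:]


def patch_outlines(skeleton: bytes, offset: int, data: bytes) -> bytes:
    """Write outline bytes received from the server back into the skeleton."""
    return skeleton[:offset] + data + skeleton[offset + len(data):]


# Stand-in for a font file (hypothetical layout): 1 KB of "layout" bytes
# followed by 64 KB of "outline" bytes.
layout = os.urandom(1024)
outlines = os.urandom(64 * 1024)
font = layout + outlines

skeleton = make_skeleton(font, len(layout), len(font))
full_size = len(zlib.compress(font))
skeleton_size = len(zlib.compress(skeleton))

# Later in the session the client asks for the outline bytes and patches them in.
restored = patch_outlines(skeleton, len(layout), outlines)
```

The layout bytes are untouched in the skeleton, which matches the point that GSUB/GPOS is sent in full up front and only the outlines arrive incrementally.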

<Vlad> some additional background on TachyFont could be found in https://lists.w3.org/Archives/Public/public-webfonts-wg/2017Nov/0024.html

<chris> doesn't deal with large GSUB/GPOS

<ChristopherC> Vlad, can you mute while Garret's talking? Getting feedback from you it seems.

Garret_Rieger: This splits the logic between the server and the client. Both server and client need to understand how to modify font files. We implemented this in JS. It's an experiment. Google Play uses it to serve Noto Sans JP to Japanese customers. Never expanded beyond that. Today, Google Fonts usually uses the unicode-range solution

<ChristopherC> Much better, thanks.

<RSheeter> Google Fonts always uses unicode-range [if client supports]. For CJK we split into small pieces. For others, larger blocks.
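
A rough sketch of what splitting a font with unicode-range looks like in practice. The function names, the URL pattern, and the simple sequential chunking are assumptions for illustration; how Google Fonts actually groups code points into pieces was not described in the discussion.

```python
def to_unicode_range(code_points):
    """Format code points as a CSS unicode-range value, merging consecutive runs."""
    cps = sorted(set(code_points))
    if not cps:
        return ""
    runs = []
    start = prev = cps[0]
    for cp in cps[1:]:
        if cp != prev + 1:  # gap found: close the current run
            runs.append((start, prev))
            start = cp
        prev = cp
    runs.append((start, prev))
    return ", ".join(
        f"U+{a:X}" if a == b else f"U+{a:X}-{b:X}" for a, b in runs
    )


def split_into_subsets(code_points, subset_size):
    """Chunk the sorted code points into pieces of at most `subset_size`."""
    cps = sorted(set(code_points))
    return [cps[i:i + subset_size] for i in range(0, len(cps), subset_size)]


def font_face_rules(family, url_prefix, code_points, subset_size):
    """Emit one @font-face rule per subset; the URL pattern is hypothetical."""
    rules = []
    for i, subset in enumerate(split_into_subsets(code_points, subset_size)):
        rules.append(
            "@font-face {\n"
            f"  font-family: '{family}';\n"
            f"  src: url({url_prefix}.{i}.woff2) format('woff2');\n"
            f"  unicode-range: {to_unicode_range(subset)};\n"
            "}"
        )
    return rules
```

The browser downloads a piece only when the page uses a character inside that piece's unicode-range. Each piece is shaped independently, which is why this breaks down for scripts that rely on layout rules spanning glyphs in different pieces.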

Garret_Rieger: 3: Incremental transfer demo. When the client needs more code points in their font, the server computes a binary delta between two fonts and sends it across the wire, and the client binary-patches it into the font. This is our preferred approach. Simplifies the approach completely, keeps all logic on the server. All it needs is a binary patching algorithm and a font subsetting algorithm, both of which exist and are open source. Keeps everything simple. Hopefully will be optimal in the amount of data it has to transfer
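
A toy sketch of the binary-delta flow described above. It uses Python's difflib as a stand-in diff, which is an assumption for illustration only; it is not the patching algorithm the demo uses and would be far too slow for real font files. The server computes a patch from the subset the client already has to the subset it needs, and the client applies it without knowing anything about font structure.

```python
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes):
    """Server side: describe `new` as copies from `old` plus literal byte runs."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": the new bytes must be shipped.
            ops.append(("data", new[j1:j2]))
        # "delete" needs no operation: the old bytes are simply not copied.
    return ops


def apply_patch(old: bytes, ops) -> bytes:
    """Client side: rebuild the new file from the old file and the patch."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Hypothetical stand-ins for two subsets of the same font.
old_subset = b"HEAD" + b"glyf:A" + b"TAIL"
new_subset = b"HEAD" + b"glyf:AB" + b"TAIL"
patch = make_patch(old_subset, new_subset)
literal_bytes = sum(len(op[1]) for op in patch if op[0] == "data")
```

Only the literal runs carry new bytes over the wire; the copy operations are small references into what the client already holds.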

<Vlad> Sorry folks for extra noise

Garret_Rieger: Don't have an actual JS client yet. Hasn't been used for anything yet
... Question 2! What would customers be looking for?
... A few thoughts, ordered by priority.
... Biggest thing: Minimizing latency for using web fonts. Latency minimization will happen through minimizing bytes.
... ultimately, we should be trying to reduce latency
... Another advantage: Web developers shouldn't have to worry about how big their fonts are. Users have a difficult time using the Noto Sans fonts because they're so big.
... OpenType layout should "just work" with whatever solution we build. With unicode-range, this isn't true today.
... It should support cross-domain caching. The most effective optimization in Google Fonts is cross-domain caching.
... Ease of use for the end user is pretty important. Systems are complicated. Users should be able to drop in fonts into the system and everything just works.
... We prefer solutions that keep the client as simple as possible.
... This allows us to improve the system over time; clients pick up the improvements automatically and transparently
... The server could send more than what the client requested. Predictively
... Next: Transparent negotiation between client and server of the tools. Switch between different patching algorithms or subsetting algorithms.
... There shouldn't be a big initial cost. Each request should only give the client what they need
... This mechanism could also be used for transparent font version upgrades
... What do we measure? We have access to a public data set which has browsing behaviors. The solution should minimize latency at the client. The data is a set of code points. Latency = time to rendering text in a given font. This analysis should be done script by script. Hopefully there will be improvements across all scripts
... Should show that this solution improves real world latency
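
A sketch of the kind of comparison this measurement implies, using bytes transferred as a proxy for latency. The cost model (a fixed number of bytes per code point plus a per-request overhead) and all names are assumptions for illustration; a real analysis would use actual font sizes and the browsing data set mentioned above, script by script.

```python
def bytes_whole_font(page_views, font_cps, bytes_per_cp):
    """The whole font is downloaded once, if any page view needs any text."""
    return len(font_cps) * bytes_per_cp if any(page_views) else 0


def bytes_fixed_subsets(page_views, subsets, bytes_per_cp):
    """Each fixed subset is downloaded the first time one of its code points is needed."""
    loaded = set()
    total = 0
    for needed in page_views:
        for index, subset in enumerate(subsets):
            if index not in loaded and subset & needed:
                loaded.add(index)
                total += len(subset) * bytes_per_cp
    return total


def bytes_incremental(page_views, bytes_per_cp, overhead_per_request):
    """Each page view fetches only the code points not already on the client."""
    have = set()
    total = 0
    for needed in page_views:
        missing = needed - have
        if missing:
            total += len(missing) * bytes_per_cp + overhead_per_request
            have |= missing
    return total


# Toy input: each page view is the set of code points the page needs.
font_cps = set(range(100))
subsets = [set(range(0, 50)), set(range(50, 100))]
page_views = [{1, 2}, {2, 3}, {60}]

whole = bytes_whole_font(page_views, font_cps, bytes_per_cp=10)
fixed = bytes_fixed_subsets(page_views, subsets, bytes_per_cp=10)
incremental = bytes_incremental(page_views, bytes_per_cp=10, overhead_per_request=5)
```

With these toy numbers the incremental strategy transfers far fewer bytes, but it also makes more requests, so a real benchmark would need to account for round trips as well as bytes.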

Vlad: I remember vaguely there was an opportunity in early Google Fonts for a font request to ask for a static subset

<chris> dynamic subsetting api should still be listed in the proposed solutions, even though it's not a good one

Garret_Rieger: We have the dynamic subsetting API. If you know a priori which code points you need, Google Fonts can send you a subset for exactly those code points. But it isn't very helpful because usually people don't know exactly which code points they will be needing.

<RSheeter> Our IUC42 presentation has an overview of our path to incremental; http://unicodeconference.org/presentations-42/S5T3-Sheeter.pdf

chris: Microsoft did that with their early tool. You could hand it a webpage, from the client, and it would decide which code points you needed, and it would spit out a font that supports that.
... we don't think it's a good fit

<chris> That was their EOT creation tool

sergeym: I can talk about that from Microsoft's perspective

<RSheeter> That's Sergey, not Rod

sergeym: Font Web is almost identical to approach #2 from Google; it was for printing. It zeroed out the glyph data which was not used. The optimization was to print the first page of the document very fast. The subsetted font contained only what was in the first page of the document
... Another one is used in DWrite for system fonts. The client can request particular code points from the font and query which code points are available on the client, and it can request additional code points. The difference is that the font is requested in chunks. Based on code points, we know which parts of the font are required to cover the code points, and then blocks, 4k chunks or 64k chunks, are sent. This is a public API that you can query from DWrite. So I think this is 2 things that we have. The two that we mentioned, subsetting the font based on crawling the website, it's just defining which subset you want to create. It's just a convenience tool. There is no delta patching, there is no incremental update.

sergeym: 2 questions: Since we are in the world of browsers and HTTP, how do we cache the results on the client? Because the HTTP requests will come in the form of "give me additional code points and delta." What if the next session requests code points in a different order? How will we use it? We could develop a new caching mechanism.

Garret_Rieger: That's what we were expecting. If you start a new session, and you've already got half the font cached, we should start with that. We might need a new caching mechanism?

sergeym: Will we replay the set of deltas at the beginning of the next session?

Garret_Rieger: something like that

chris: That's definitely needed, the server doesn't have all that data. And website A and B can't just use the same data in the browser, it would have to be fetched from cache. There are two caches, system caches and PWA cache.

Garret_Rieger: we'd like to be able to use data from previous sessions

chris: Regarding patching to a new version of a font: People get around this by replacing files in the list of files to download, or preferring a local font because it's better. This should interact with PWA

<RSheeter> We've been imagining that you "enrich" from a given url and that receiving a patch more or less updates the cache entry for that url. Similar to requesting an updated asset for something expired, except now the new entry is a combination of the existing one and the patch.

<RSheeter> (total speculation ofc)
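
A speculative sketch of the "enrich a cache entry" idea above. Everything here is hypothetical: the class, the `fetch` callback (which stands in for the network request plus patch application), and the bookkeeping of covered code points. It only illustrates that a later session asks for what is missing, regardless of the order in which code points were requested before.

```python
class FontCacheEntry:
    """Hypothetical cache entry: the font bytes so far plus the code points they cover."""

    def __init__(self):
        self.font = b""
        self.covered = set()

    def missing(self, text: str) -> set:
        """Code points in `text` that the cached font does not cover yet."""
        return {ord(ch) for ch in text} - self.covered

    def apply(self, patched_font: bytes, added: set) -> None:
        """Replace the entry with the patched font and record the new coverage."""
        self.font = patched_font
        self.covered |= added


def enrich(entry: FontCacheEntry, text: str, fetch) -> set:
    """Request only the missing code points and update the entry in place."""
    want = entry.missing(text)
    if want:
        patched = fetch(frozenset(entry.covered), frozenset(want))
        entry.apply(patched, want)
    return want


# Fake server for illustration: records each request and returns placeholder bytes.
requests = []


def fake_fetch(have, want):
    requests.append(sorted(want))
    return ",".join(f"{cp:X}" for cp in sorted(have | want)).encode()


entry = FontCacheEntry()
first = enrich(entry, "abc", fake_fetch)   # nothing cached yet
second = enrich(entry, "cab", fake_fetch)  # same characters, different order
third = enrich(entry, "cad", fake_fetch)   # one new character
```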

sergeym: I assume that downloaded fonts will be specified in some way inside @font-face
... will there be any scenarios that require programmatic access through JS where the page will query which characters are already available on the client?

myles: Such a facility shouldn't be able to give any more information than is already available today

sergeym: Websites can know this information today by timing attacks

Vlad: the ideal solution shouldn't involve JS because of performance concerns

sergeym: the font loading API allows you to wait until a font is finished downloading.

Vlad: Yes, instead of a JS agent that would determine which code points to ask for. Just have a native implementation where the browser already knows which code points are present.

sergeym: The browser may not know which characters may be used. E.g. canvas, until the drawing call happens, we don't know which characters to request.

<silence>

chris: i agree

sergeym: It would probably match what we have now for @font-face and unicode-range. It should be enabled for what we have now

chris: The API lets you know when the font finishes downloading. What we're doing probably complicates the idea of "finished downloading"

sergeym: "finished" means you download the fonts which have unicode-range that intersects the given string
... it doesn't need to change much

chris: It does need to change, because picking a font is a binary choice, the font either supports the character or not. But the font could be halfway downloaded. It's the timing about when it tells you that it's loaded

sergeym: You still need to re-run font matching.

<RSheeter> If you match on unicode-range you have to match again because the file that comes down may not actually match what the range says

<chris> exactly, when is the font cleared to use

sergeym: Different @font-face rules with unicode-range. It should be exactly the same. I don't see what the difference is

<RSheeter> Agree, unicode-range has this problem already

jpamental: What about adding a flag that says "this is a partially downloaded font"? You could let the browser know yes it's here, but it's not finished. So the browser knows when it should look for more glyphs

Vlad: we're drifting off the agenda to a tangent
... we shouldn't be solving this right now

sergeym: agreed

Vlad: we can assume a patch is specific to a part of a page, rather than the whole page. We can discuss this later.
... is there anyone else who would like to share prior experiences?

ChristopherC: Two people are tied up and won't be able to contribute for a while. We're behind the binary patches idea, it's the right way to go in our experience

<ChristopherC> Bram & Persa are tied up until end of February.

Vlad: From Monotype's perspective, we've experimented with dynamic font subsetting, where JS analyzes the page. It was supposed to be for all languages, but then was limited to CJK only because of performance. It was more efficient to download the whole font for latin than to do subsetting.

<chris> oh, good to know there is a JS hit for smaller fonts

Vlad: That's why, in our minds, JS could be useful, but I would like us to at least consider an implementation where JS isn't required
... At least in a simple case.

myles: we agree

Vlad: This should be done by the browser, not JS.

Garret_Rieger: agree

<RSheeter> agree

Vlad: There are multiple approaches to font subsetting. You can aim for efficiency (only leave info in the font that is truly necessary) to minimize bytes. Merging subsets is difficult. Clashing glyph IDs. Control value table for TrueType glyphs: can't reconstruct the new table that has all the entries. Sometimes this is difficult or impossible.
... Other approaches: leave holes in the font, to fill them in later. Less optimal. Really complicated on the client side; even when it's possible it's still difficult. Considering that, the patch-based solution is preferred

<chris> It's a good idea to make that dataset though

<chris> agree on the performance analysis and benchmarks

myles: makes a spiel about benchmarking and performance

Vlad: The patch-based approach already gives you what you want
... With patches, you have two approaches: minimal bytes, or some redundancies to allow for easier integration. re-using WOFF2 compression and font format

myles: I'd love to see a benchmark that we can use to rate all the different solutions. We are doing a performance task, which we have done many times before, and we should hold ourselves to the same standard we have always held ourselves to when doing performance work. This means making benchmarks
... once we have a benchmark, I'd like to use it to inform an investigation into a new font file format that works like video files and allows using range requests to do subsetting

Garret_Rieger: Me and Rod had thought of using WOFF2. We also thought of taking the glyph table transformation and using it in the patching. I don't think it makes sense to make a delta after Brotli, but some parts of the WOFF format make sense to use in the patch

Vlad: Each delta is a standard file
... And we could develop a format for patches.
... we don't want to deviate too much from what we have
... the success criteria is how easy it would be to adopt this

Garret_Rieger: ease of use is important

myles: the function of a benchmark is a tool that helps you pick the best solution

jpamental: It should be easy to make a few test pages that are typical and have those as exemplar pages that we can make comparisons with

myles: that's exactly the sort of thing i was thinking about.

Vlad: My attempt to collect as much information as possible about prior experiences is to try to minimize prior mistakes
... we can move faster if we act just based on what we already know

myles: I agree we should use our previous experiences, but data is necessary

Vlad: i'm glad we agree

jpamental: I'm happy to help

<RSheeter> I think we want real page load sequences we can test on. And in time to test with real traffic, enable on canary + google fonts or some such.

Garret_Rieger: Google wants to help too

ChristopherC: We'd like to share some data that we have, but I have to deal with lawyers

chris: Let's keep the discussion on the mailing list
... this is part of our deliverable. It will go in the implementation report. We should discuss this in the WG.

Vlad: Yes. Agreed
... I don't want to create silos within the group. We should act as one body
... We will have to define success criteria for different stages in this group, so if you come up with something you'd like to share, we can do it on the email list
... if you want to add something to the agenda for the next call, speak now or forever hold your peace

<silence>

Vlad: bye!

<chris> (adjourned)

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/01/31 18:02:00 $
