W3C

– DRAFT –
Web Fonts Working Group Teleconference

24 September 2024

Attendees

Present
bberning, ChrisL, davelab, Dwaine, Garret, Jeff, JH, Jimmy, jpamental, Rod Garret, RodS, sergeym, Skef, Vlad
Regrets
-
Chair
Garret, Vlad
Scribe
RodS

Meeting minutes

Feedback from Dominik from Chrome

Garret: feedback from Dominik from Chrome

Garret: various formatting, nits, etc. Won't cover in detail.

Garret: he also suggested merging two sections performance considerations wrt reducing network requests

ChrisL: would prefer to lean into separation

Garret: also had a comment about error handling, lack of detail wrt CSS font loading

Garret: agreed, will firm this up

ChrisL: nuances of exactly when what errors occur need to be firmed up

Garret: also what happens if you apply a bad patch needs elaboration

Garret: there is also some language about woff2

ChrisL: we should also be more explicit that woff1 is fine, no further complication

Skef: we should also define if the initial file is an opentype font compressed however you want or if compression is part of spec. Need to clarify.

Garret: prefer to leave as regular font file and clarify application of woff2. Dropping pure brotli patch will help there.

Garret: suggests encoding considerations might want to move outside spec. Or maybe an appendix?

Rod: https://docs.google.com/document/d/1MAuXOJr7FQevxTg9NhaBOB5Y45Ak9Ad3SRyKDfND8SY/edit?usp=drivesdk is notes from my review

ChrisL: unclear on removal of encoding, having that be a black box seems like a problem

Rod: question was whether it should be in spec or not

ChrisL, Skef: want something in spec, need the practical guidance

Garret: maybe an appendix, we can discuss that later

<ChrisL> https://w3c.github.io/IFT/Overview.html#abstract

Rod: tl;dr room for an editorial pass to make it easier to read. Don't suggest reviewing step by step here; perhaps one of the editors could review and see if they deem any changes to make sense.

Garret: volunteers to review Rod's doc and see if any changes make sense

ChrisL: also enjoys reviewing docs and making edits

Rod: filed as w3c/IFT#213

ChrisL: this format defines 3 patch formats, each for its own scenarios, immediately before defining 4 :)

Garret: 4 format #s but 2 are the same patch type ... this might merit clarification

Garret: two brotli patches. Full font and per table. In theory whole font is more efficient because it spans tables, but it's incompatible with woff2 due to woff2 not having stable binary output.

Skef: and it's impractical to mix with glyph-keyed patches

Garret: leaves us with a very niche, marginally better, patch format. Suggest dropping.

ChrisL: seems like a footgun

Garret: yes, it can non-obviously not work or fail in the face of a new binary because woff2 doesn't guarantee the output

(clarification: the patch references the decoded file)

Rod: because you want to refer to the decoded data, that's the reused thing

Agreement to remove the full file brotli patch

https://github.com/w3c/IFT/issues/214, high level description left to be elaborated later

Skef: perhaps we can now change the Per Table Brotli nomenclature to better reflect what it does over how it does it

Skef: something complementary to Glyph Keyed

(updated 214 to note ^)

Garret: encoding considerations is entirely non-normative, it's all up to you!

(https://w3c.github.io/IFT/Overview.html#encoding-considerations leads with not normative)

Garret: so the question is does this deserve to be top-level, or should it be out of spec or in an appendix to make its auxiliary nature more obvious

Garret: brotli has an encoding considerations section akin to ours

Rod: prefer in spec, whether appendix or otherwise

Skef: fan of how 7 flows into 7.1

Skef: 7.1 explains how to concretely meet the normative requirements of 7

ChrisL: would like more sub-headings for 7

ChrisL: foundries, rights owners, breaking fonts if you like, etc ... these merit breaking apart

Vlad: normative followed by non-normative section works

(nobody seems to object)

Skef: prefer this way unless it's going to produce a lot of pushback, if so we could move to an appendix

Conclusion: leave it in 7

Vlad: other specs tend to leave encoding nuances non-normative to leave it open for implementations to compete there. Make an exceptionally fast or efficient or w/e encoder as long as it decodes properly.

Skef: Privacy Considerations and Security Considerations ... are they part of 7.1? Why aren't they #'d?

ChrisL: specs often don't, but we could

Skef: if not numbered ... can we more clearly divide

RodS: can we just number them unless there is some specific reason to do otherwise?

ChrisL, Garret: no specific objection, we shall go forth and # them

Garret: (but won't # appendix etc)

<ChrisL> w3c/IFT#215

Skef: in the sections discussing interpreting formats, e.g. "Interpreting Format 1" suggest adding more words to specify what this is for. This is how to understand, not necessarily the exact thing you'd write.

Garret: next up some sections discussing extensions

Vlad: COFFEE BREAK!

Privacy Review

ChrisL: privacy review surfaced an issue about minimum patch sizes, only issue to come out

<Skef2> We have a power outage in Anaheim and are continuing the meeting offline

(pause for power outage and lunch)

Issues Review

Garret: other than editorial pass, any other spec changes people are anxious to have in?

Skef: filed issues for any I had

(review of github issues)

Garret: w3c/IFT#192 Possible "all other codepoints" semantic, I proposed a change, did people review?

Skef: seems plausible, suggests an encoder issue we might want to discuss.

Skef: in switching from patch/subset to the current approach we seem to have gained an assumption that we're going to segment the font and then produce permutations

Skef: say you segmented roughly on scripts (latin, greek, etc)

Skef: say you looked at the documents in the world you'd find many with one script. Fewer with more than one script. So often you'd want two levels and then everything else. So it's a waste to segment deeper.

Skef: when developing the encoder we'll want control over depth

Garret: we do have a note about max depth under "Managing the number of patches"

(in https://w3c.github.io/IFT/Overview.html#encoding-considerations)

Skef: say you segment into 7 scripts and one catch-all. Then you have add-a-script patches for each script. So you limit max round trips.

Skef: believe use of an everything-else bucket will be common

Skef: so we should be sure this can be done efficiently

Skef: for format 1 only

Garret: to specify everything else you have to encode the full set of what's available total
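The catch-all problem described above can be sketched naively as set subtraction: without a lean-on-cmap shortcut, an "everything else" segment has to enumerate the font's full coverage minus the named segments. A minimal illustration (all names and ranges are hypothetical, not spec terms):

```python
# Naive "everything else" computation: the catch-all segment is the
# font's total codepoint coverage minus all named script segments.
# Ranges below are illustrative only.

def everything_else(total_codepoints, script_segments):
    """Codepoints supported by the font but not in any named segment."""
    covered = set().union(*script_segments) if script_segments else set()
    return total_codepoints - covered

total = set(range(0x20, 0x500))      # pretend full coverage of the font
latin = set(range(0x20, 0x180))
greek = set(range(0x370, 0x400))
rest = everything_else(total, [latin, greek])
print(len(rest))  # 752 codepoints left for the catch-all bucket
```

This is exactly the enumeration cost Garret notes; leaning on the cmap as the source of "what's supported" avoids encoding that full set explicitly.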

(discussion of cmap being the additional source for what's supported)

Garret: w3c/IFT#192 (comment) proposes the lean-on-cmap bit, will draft related changes

Skef: where are we at for w3c/IFT#201 Considerations for "desiccated" initial font files

Garret: I have some spec edits to make

Skef: if the client wants to make determinations about the font and doesn't have adequate information, e.g. codepoints, how do you proceed?

Skef: say you want the psname from the font

(postscript name from name table)

Garret: we once talked about explicit font not initialized

Skef: user will need to pay to load some patch. Just wondering how to select a patch absent a set of codepoints.

Skef: assuming any patch you load will have the data needed. You want to pick a small one.

Garret: we'd be making assumptions about the encoder, intent is the client follows the algorithm ... but this case is outside that.

(discussion of whether or not trusting that encoder will patch basic global tables into place if you load any patch is reasonable)

Garret: not sure the client can assume this

Skef: the font won't work if not, violates a "should" wrt the font working

Garret: so you'd try to grab whatever looks smallest

Skef: would prefer to say you should activate a codepoint

Garret: will add advice, suggesting choose a codepoint and light it up by following the algorithm
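The fallback discussed above ("grab whatever looks smallest" vs. "activate a codepoint") can be sketched as picking the smallest patch that covers a chosen codepoint. This is a hypothetical illustration of the client-side selection, not the spec's extension algorithm; the patch representation is invented for the example:

```python
# Hypothetical sketch: a desiccated initial font needs *some* patch loaded
# to recover basic data. The advice discussed is to pick a codepoint and
# follow the normal extension algorithm; here that reduces to choosing the
# smallest patch whose coverage includes that codepoint.

def select_patch(patches, codepoint):
    """Return the smallest patch whose coverage includes `codepoint`."""
    candidates = [p for p in patches if codepoint in p["codepoints"]]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p["size"])

patches = [
    {"id": "latin",       "codepoints": set(range(0x41, 0x5B)), "size": 4096},
    {"id": "latin-small", "codepoints": {0x41},                 "size": 512},
]
print(select_patch(patches, 0x41)["id"])  # latin-small
```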

Skef: regarding encoding, we can say there's an uncompressed initial font file. Lossless compression just works. WOFF2 has additional considerations.

Garret: notably the woff2 glyf transform does not have stable output

Garret: we should note that you decompress prior to operating on it

Skef: an IFT font is an OpenType font with specific extra tables

Garret: w3c/IFT#125 Client conformance tests is next, that has its own agenda item so maybe we can switch to that

Garret: update on w3c/IFT#101 Shared Brotli, as of August it looks like this spec is moving again

Garret: so will continue to watch, worst case we'll copy the spec text inline. I'll post an update to the issue.

Client Conformance Tests

Vlad: two buckets for entries. 1) non-normative parts of spec we won't test need to have record, what are we not testing and why. 2) everything we do want to test to ensure we know how to test.

Vlad: based on woff2 conformance testing this may be an issue, conformance testing development is time consuming

Skef: suppose you have a directory with a hierarchy of pages you load in sequence. One loads full font, one loads incremental, rendering should be the same at every step.

Skef: doesn't that test almost everything?

Garret: we need a test for every MUST, and some of them are negative. The browser must reject this font.

Skef: so not much infrastructure

Garret: right, woff2 has relatively clear sets of static files

Vlad: woff2 has a bunch of files damaged in various ways. The test is designed to load and display P/F for pass/fail.

Vlad: so we had a bunch of minimal bespoke fonts with two glyphs and almost nothing else

Garret: our MUST statements are almost all error conditions. There is also an implied requirement for tests for algorithms.

Garret: we don't require rendering identically, just that you should end up at some known end state.

Skef: I was thinking things could be described in terms of behavior

Garret: most client tests likely around applying a patch. We'll have to design tests to expose that in client-side observable behavior. We want black-box tests, look at the doc and get a clear answer.

(discussion of observation of expected result of patch application)

Garret: I volunteer as tribute!

Vlad: we want tests with a very simple pass/fail check. The people running the tests will be implementers.

Garret: we can start marking up test refs

Vlad: approach for woff2 allowed linking to specific tests which was useful

Garret: next step is to markup the spec. Not sure about timing of actually building tests, perhaps not until we have a basic Chrome implementation.

Garret: in addition to client conformance tests we might need to discuss encoder conformance tests because we have at least one normative statement there

Skef: are we going to do anything about w3c/IFT#109 It'd be interesting to discuss markup discoverability

Garret: would ideally like to discuss with Yoav

Skef: seems to imply something more sophisticated than just pick the highest priority source you support

Garret: this issue is from the patch/subset days when client could over-request, but now you just obey what the encoder tells you. The issue is probably irrelevant now.

Garret: I'll try to catch Yoav at TPAC, failing that I'll close with a comment.

Building a High Quality Encoder

Garret: final topic, how to build a high quality encoder. Not really a spec issue but interesting wrt deployment for real.

Skef: two aspects of question, neither for spec. 1) parameterization of what encoder will do. By default general, based on how docs and fonts tend to work here are some ideas about organization. Then able to tweak, influence results.

Skef: 2) how you actually build the encoder, how does it work, what is it doing

Skef: not sure what we'd say about per-table patches in terms of configuration data

Garret: in the case of a general purpose encoder for the web I think the main tuning knob is how many files you want. More files will permit higher granularity.

Garret: e.g. no more than 10k patches, my environment doesn't like large file counts

Garret: can give additional knobs, e.g. spend most on glyph key patches

Skef: if I'm doing per-table patches and I have a budget for #patches then the more finely I cut things the more round trips I have

Garret: I imagine trying to ensure things are no more than one round trip away

Garret: e.g. a patch for each subset, a patch for each pair, etc
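The file-count cost of keeping every combination one round trip away grows combinatorially with the number of segments, which is why a patch budget is the natural tuning knob. A rough sketch of the arithmetic (the function name is illustrative):

```python
import math

# If every combination of up to `max_combination` script segments must be
# reachable in a single load, the encoder precomputes one patch per
# combination: singles, pairs, and so on.

def patch_count(num_segments, max_combination):
    """Patches needed so every combination up to a given size is one load away."""
    return sum(math.comb(num_segments, k) for k in range(1, max_combination + 1))

print(patch_count(7, 2))  # 7 singles + 21 pairs = 28 patch files
```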

Skef: if the knob includes the width of patch the narrower the deeper the graph will be. Split latin in half, 2 patches just for latin, etc.

Garret: want it as close as possible to Just Works

Skef: a script might be a good split point so the input data needs to tell the encoder what a script is. Maybe we have core script, extended script for each script. Some scripts would default to separate encoding, others segment per-glyph but not per-table.

Skef: for CJK the division between core and extra might be frequency based. Per-table patch for this part, and then the rest. So you have config data about core vs data cutoff for per-table vs not.

Skef: by default I might not segment a script beyond those two buckets

Garret: a file budget enables reasoning about depth in per-table patching

Skef: consider per-table and glyph-keyed separately. Both require tuning. One way to express is target # patches. Another way is depth for per-table. On glyph-keyed side you could give a target # patches.

Skef: another tunable might be patch size, another might be #glyphs in patch

Garret: #files is based on notion that a server, say a cdn, might not want vast #s of files. That's the key limit the user has.

Skef: speaking of encoding, what about emoji

Skef: early on we talked about ligatures and I pushed back

Rod: icon fonts and emoji both have ligature problems

Skef: emoji will be more separable than icon names

Skef: in a color language font things may separate relatively well, most shapes correspond to individual characters

Skef: for emoji, the other aspect is likely more complex reuse of shapes

Skef: with enough reuse you might want to grow the base font

Skef: what you merge is different in an emoji font than a language font

Skef: you might want relatively few codepoints in a glyph keyed patch

Rod: can confirm there are a small set of highly used emoji

Skef: colorized things would use the same base outlines

Skef: if someone does shape coalescing that's a problem

(speculation about how emoji fit together)

RodS: for reference, https://unicode.org/Public/emoji/16.0/emoji-test.txt is the set you are supposed to handle

Skef: if designers reuse outlines heavily across concepts the grouping is non-obvious, we have to decide whether to group or duplicate

Garret: format 1 requires disjointness but format 2 doesn't

Garret: widely duplicated things can thus be de-duplicated in format 2, it doesn't have the format 1 requirement of 1 glyph => 1 bin

Garret: in format 2 if A maps to multiple patches all of them ultimately get loaded
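The format 2 behaviour just described (one codepoint mapping to several patches, all of which get loaded) can be sketched as a union over the mapping. The data layout below is purely illustrative; the real format 2 encoding differs:

```python
# Hedged sketch: in the format 2 model described above, a codepoint may be
# mapped to multiple patches (e.g. a shared-outlines patch plus a more
# specific one), and all mapped patches are ultimately loaded.

def patches_to_load(requested, mapping):
    """Union of all patches mapped from the requested codepoints."""
    needed = set()
    for cp in requested:
        needed.update(mapping.get(cp, ()))
    return needed

mapping = {
    0x1F600: {"emoji-core", "shared-outlines"},       # reused base shapes
    0x1F680: {"emoji-transport", "shared-outlines"},
}
print(sorted(patches_to_load({0x1F600}, mapping)))
```

This is what allows format 2 to de-duplicate widely reused data (like common emoji outlines) without format 1's one-glyph-one-bin disjointness requirement.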

(discussion of how patch mapping updates in the face of patch format 2)

Skef: I still find the interaction of encoding options and patch application a bit opaque.

Garret: the encoding reasoning differs because of how the different patch types can alter the font

Skef: for variable fonts the spec really only explains how to decode. How an encoder should build the data structures there's no guidance at all.

Skef: in our spec we haven't really explained how to write an encoder. We've specified the decoder and left the encoder as an exercise.

Garret: once formats are in common representation processing differs only in order, which ones go first

Skef: why when using per-table patches does it load only one vs all for glyph keyed

Garret: algorithm says to load one by one but it is intended that a client can make speculative loads for probable future patches

<Skef> Does anyone need a zoom link for the meeting?

leaverou2, question?

<leaverou2> oops sorry, wrong room

RodS: do we define how to build a basic encoder?

Garret: I have pseudo-code for the most basic possible per-table implementation

Garret: it's relatively easy to do pure glyph-keyed and pure per-table, mixed is trickier

Skef: we define things in terms of a subsetter, which is complex, but we can elide that

Garret: I have a simple mixed-type implementation so it's definitely possible

Skef: I worry that in the way we describe things we've over-optimized for compactness and unification vs comprehensibility. Some sections are hard to understand.

(meaning compactness of spec text)

Skef: we could switch to more of how this is how per-table works, this is how glyph-keyed works

Garret: it is possible to duplicate the wording

Garret: but the current wording does reflect how an exploratory implementation went quite closely

(additional discussion of how only client [decoder] is specified, encoder is less clear ... which is arguably by design, encoding is left open to permit ongoing innovation)

Garret: we could have pseudo-code for pure per-table, pure glyph-keyed, *and* how to combine

Skef: still have questions like where does the fi ligature go

Garret: I'll have to think about it, whether we can come up with a terse correct description

Skef: consider solving correctness using the abstraction of static subsetting, that's what mine does. Can we describe what mine does tersely?

Garret: I need to try it

Skef: my implementation puts complicated things into the base font

Skef: I think you are saying merge wherever there is an interaction

Garret: the spec pseudocode needs to be usable

Garret: we could add an informal description to the apply a patch section

RodS: how to encode is the hard part

Skef: perhaps concrete examples? - OpenType has some of this and it's useful

w3c/IFT#216 for concrete example

(side discussion of encoders and Rust)

VICTORY?

<Garret> Summary of discussion about privacy issue (w3c/IFT#207) while room was offline:

<Garret> - Just enforcing minimum group sizes on the patches doesn't really add anything, as it's possible to construct patches that work around this while still allowing for single character level granularity.

<Garret> - Single character granularity font loading is already possible via unicode range. Functionally IFT works quite similarly to unicode range and so has very similar privacy characteristics.

<Garret> - We should document in the spec that including third party resources in a site implies a level of trust in those resources (e.g. like including third party javascript or css).

<Garret> - If we do want something like minimum group sizes as part of patch subset we'd want to do the codepoint grouping prior to executing the patch subset extension algorithm. That potentially has performance implications, but may be written in a way that doesn't affect performance if the font is already well formed (disjoint groupings over a minimum size).

Minutes manually created (not a transcript), formatted by scribe.perl version 237 (Fri Oct 4 02:31:45 2024 UTC).