Meeting minutes
Feedback from Dominik from Chrome
Garret: feedback from Dominik from Chrome
Garret: various formatting, nits, etc. Won't cover in detail.
Garret: he also suggested merging the two performance considerations sections about reducing network requests
ChrisL: would prefer to lean into separation
Garret: also had a comment about error handling, lack of detail wrt CSS font loading
Garret: agreed, will firm this up
ChrisL: nuances of exactly when what errors occur need to be firmed up
Garret: also what happens if you apply a bad patch needs elaboration
Garret: there is also some language about woff2
ChrisL: we should also be more explicit that woff1 is fine, no further complication
Skef: we should also define whether the initial file is an OpenType font compressed however you want, or whether compression is part of the spec. Need to clarify.
Garret: prefer to leave it as a regular font file and clarify the application of woff2. Dropping the pure brotli patch will help there.
Garret: suggests encoding considerations might want to move outside spec. Or maybe an appendix?
Rod: https://
ChrisL: unclear on removal of encoding, having that be a black box seems like a problem
Rod: question was whether it should be in spec or not
ChrisL, Skef: want something in spec, need the practical guidance
Garret: maybe an appendix, we can discuss that later
<ChrisL> https://
Rod: tl;dr there is room for an editorial pass to make it easier to read. Don't suggest reviewing step by step here; perhaps one of the editors could review and see whether any changes make sense.
Garret: volunteers to review Rod's doc and see if any changes make sense
ChrisL: also enjoys reviewing docs and making edits
Rod: filed as w3c/
ChrisL: this format defines 3 patch formats each for its own scenarios immediately before defining 4 :)
Garret: 4 format #s but 2 are the same patch type ... this might merit clarification
Garret: two brotli patches. Full font and per table. In theory whole font is more efficient because it spans tables, but it's incompatible with woff2 due to woff2 not having stable binary output.
Skef: and it's impractical to mix with glyph-keyed patches
Garret: leaves us with a very niche, marginally better, patch format. Suggest dropping.
ChrisL: seems like a footgun
Garret: yes, it can non-obviously not work, or fail in the face of a new binary, because woff2 doesn't guarantee the output
(clarification: the patch references the decoded file)
Rod: because you want to refer to the decoded data, that's the reused thing
Agreement to remove the full file brotli patch
https://
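(Editor's sketch, not spec text: why the full-font brotli patch was a footgun. A full-font patch is a diff over the *decoded* font bytes, and woff2 does not guarantee byte-identical output across encoders, so the diff can silently misapply. The toy byte-replace "patch" below stands in for the real brotli diff.)

```python
def make_patch(old: bytes, new: bytes) -> list[tuple[int, int]]:
    # Trivial same-length "patch": (offset, new_byte) pairs where bytes differ.
    assert len(old) == len(new)
    return [(i, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]

def apply_patch(base: bytes, patch: list[tuple[int, int]]) -> bytes:
    out = bytearray(base)
    for offset, new_byte in patch:
        out[offset] = new_byte
    return bytes(out)

encoder_bytes = b"glyf bytes as laid out by WOFF2 encoder X"
client_bytes  = b"glyf bytes as laid out by woff2 encoder X"  # round trip differs

patch = make_patch(encoder_bytes, encoder_bytes.upper())
assert apply_patch(encoder_bytes, patch) == encoder_bytes.upper()  # lines up
assert apply_patch(client_bytes, patch) != client_bytes.upper()    # silent corruption
```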
Skef: perhaps we can now change the Per Table Brotli nomenclature to better reflect what it does over how it does it
Skef: something complementary to Glyph Keyed
(updated 214 to note ^)
Garret: encoding considerations is entirely non-normative, it's all up to you!
(https://w3c.github.io/IFT/Overview.html#encoding-considerations leads with not normative)
Garret: so the question is does this deserve to be top-level, or should it be outside the spec or in an appendix to make its auxiliary nature more obvious
Garret: brotli has an encoding considerations section akin to ours
Rod: prefer in spec, whether appendix or otherwise
Skef: fan of how 7 flows into 7.1
Skef: 7.1 explains how to concretely meet the normative requirements of 7
ChrisL: would like more sub-headings for 7
ChrisL: foundries, rights owners, breaking fonts if you like, etc ... these merit breaking apart
Vlad: normative followed by non-normative section works
(nobody seems to object)
Skef: prefer this way unless it's going to produce a lot of pushback, if so we could move to an appendix
Conclusion: leave it in 7
Vlad: other specs tend to leave encoding nuances non-normative to leave it open for implementations to compete there. Make an exceptionally fast or efficient or whatever encoder, as long as it decodes properly.
Skef: Privacy Considerations and Security Considerations ... are they part of 7.1? Why aren't they #'d?
ChrisL: specs often don't, but we could
Skef: if not numbered ... can we more clearly divide
RodS: can we just number them unless there is some specific reason to do otherwise?
ChrisL, Garret: no specific objection, we shall go forth and # them
Garret: (but won't # appendix etc)
<ChrisL> w3c/
Skef: in the sections discussing interpreting formats, e.g. "Interpreting Format 1", suggest adding more words to specify what this is for. This is how to understand the format, not necessarily the exact thing you'd write.
Garret: next up some sections discussing extensions
Vlad: COFFEE BREAK!
Privacy Review
ChrisL: privacy review surfaced an issue about minimum patch sizes, the only issue to come out
<Skef2> We have a power outage in Anaheim and are continuing the meeting offline
(pause for power outage and lunch)
Issues Review
Garret: other than editorial pass, any other spec changes people are anxious to have in?
Skef: filed issues for any I had
(review of github issues)
Garret: w3c/
Skef: seems plausible, suggests an encoder issue we might want to discuss.
Skef: in switching from patch/subset to the current approach we seem to have gained an assumption that we're going to segment the font and then produce permutations
Skef: say you segmented roughly on scripts (latin, greek, etc)
Skef: if you looked at the documents in the world you'd find many with one script, fewer with more than one script. So often you'd want two levels and then everything else. So it's a waste to segment deeper.
Skef: when developing the encoder we'll want control over depth
Garret: we do have a note about max depth under "Managing the number of patches"
(in https://
Skef: say you segment into 7 scripts and one catch-all. Then you have add-a-script patches for each script. So you limit max round trips.
Skef: believe use of an everything-else bucket will be common
Skef: so we should be sure this can be done efficiently
Skef: for format 1 only
Garret: to specify "everything else" you have to encode the full set of what's available in total
(discussion of cmap being the additional source for what's supported)
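(Editor's sketch of the segmentation shape Skef describes, with hypothetical script ranges; a real encoder would use Unicode script data, not these rough blocks.)

```python
SCRIPT_RANGES = {
    "latin": range(0x0000, 0x0250),
    "greek": range(0x0370, 0x0400),
    "cyrillic": range(0x0400, 0x0500),
    # ...remaining scripts in the "7 scripts" example...
}

def segment(codepoints: set[int]) -> dict[str, set[int]]:
    buckets = {name: set() for name in SCRIPT_RANGES}
    buckets["everything-else"] = set()  # the catch-all bucket
    for cp in codepoints:
        for name, block in SCRIPT_RANGES.items():
            if cp in block:
                buckets[name].add(cp)
                break
        else:
            buckets["everything-else"].add(cp)
    return buckets
```

(With one "add-a-script" patch per non-empty bucket, a document using N scripts is at most N round trips from full coverage, and typically one.)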
Garret: w3c/
Skef: where are we at for w3c/
Garret: I have some spec edits to make
Skef: if the client wants to make determinations about the font and doesn't have adequate information, e.g. codepoints, how do you proceed?
Skef: say you want the psname from the font
(postscript name from name table)
Garret: we once talked about explicit font not initialized
Skef: user will need to pay to load some patch. Just wondering how to select a patch absent a set of codepoints.
Skef: assuming any patch you load will have the data needed. You want to pick a small one.
Garret: we'd be making assumptions about the encoder, intent is the client follows the algorithm ... but this case is outside that.
(discussion of whether or not trusting that encoder will patch basic global tables into place if you load any patch is reasonable)
Garret: not sure the client can assume this
Skef: the font won't work if not, violates a "should" wrt the font working
Garret: so you'd try to grab whatever looks smallest
Skef: would prefer to say you should activate a codepoint
Garret: will add advice suggesting you choose a codepoint and light it up by following the algorithm
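(Editor's sketch of that advice; `pick_bootstrap_codepoint` is a hypothetical helper, and in practice the coverage set would come from the font's mapping data.)

```python
def pick_bootstrap_codepoint(supported_codepoints: set[int]) -> int:
    # Any supported codepoint will do; min() just makes the pick deterministic.
    return min(supported_codepoints)

coverage = {0x41, 0x42, 0x0410}  # hypothetical codepoints the font supports
target = pick_bootstrap_codepoint(coverage)
# The client then runs the normal extension algorithm for {target}; per the
# "should" above, the patched font is expected to arrive with base tables
# such as `name` in place.
```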
Skef: regarding encoding, we can say there's an uncompressed initial font file. Lossless compression just works. WOFF2 has additional considerations.
Garret: notably the woff2 glyf transform does not have stable output
Garret: we should note that you decompress prior to operating on it
Skef: an IFT font is an OpenType font with specific extra tables
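(Editor's sketch of the order of operations, using zlib as a stand-in for transport compression since plain lossless compression "just works"; woff2 additionally transforms glyf/loca, which is exactly why its decoded bytes are the reference point.)

```python
import zlib

decoded_font = b"OTTO...font tables..."     # stand-in for raw OpenType bytes
transported = zlib.compress(decoded_font)   # stand-in for woff2/gzip transport

received = zlib.decompress(transported)     # 1. strip the transport encoding
assert received == decoded_font             # lossless round trip, no surprises
# 2. only now apply IFT patches, against `received` (the decoded bytes);
# 3. re-compressing for onward transport is a separate, outer concern.
```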
Garret: w3c/
Garret: update on w3c/
Garret: so will continue to watch, worst case we'll copy the spec text inline. I'll post an update to the issue.
Client Conformance Tests
Vlad: two buckets for entries. 1) non-normative parts of the spec we won't test need a record of what we are not testing and why. 2) everything we do want to test, to ensure we know how to test it.
Vlad: based on woff2 conformance testing this may be an issue; conformance test development is time consuming
Skef: suppose you have a directory with a hierarchy of pages you load in sequence. One loads full font, one loads incremental, rendering should be the same at every step.
Skef: doesn't that test almost everything?
Garret: we need a test for every MUST, and some of them are negative. The browser must reject this font.
Skef: so not much infrastructure
Garret: right, woff2 has relatively clear sets of static files
Vlad: woff2 has a bunch of files damaged in various ways. The test is designed to load and display P/F for pass/fail.
Vlad: so we had a bunch of minimal bespoke fonts with two glyphs and almost nothing else
Garret: our MUST statements are almost all error conditions. There is also an implied requirement for tests for algorithms.
Garret: we don't require rendering identically, just that you should end up at some known end state.
Skef: I was thinking things could be described in terms of behavior
Garret: most client tests likely around applying a patch. We'll have to design tests to expose that in client-side observable behavior. We want black-box tests, look at the doc and get a clear answer.
(discussion of observation of expected result of patch application)
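(Editor's sketch of the two test shapes discussed: a positive rendering-equivalence walk plus negative tests for MUST-level rejections. The harness callbacks and page names are hypothetical.)

```python
from typing import Callable

PAGES = ["step-1.html", "step-2.html", "step-3.html"]  # hypothetical sequence

def renders_match(render: Callable[[str, str], bytes]) -> bool:
    # render(page, mode) -> raster bytes; full and incremental loading must
    # produce the same rendering at every step of the sequence.
    return all(render(p, "full-font") == render(p, "incremental") for p in PAGES)

def rejects_bad_font(load_font: Callable[[bytes], object], bad: bytes) -> bool:
    # "The browser must reject this font": failure must be observable.
    try:
        load_font(bad)
    except ValueError:  # hypothetical rejection signal
        return True
    return False
```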
Garret: I volunteer as tribute!
Vlad: we want tests with a very simple pass/fail check. The people running the tests will be implementers.
Garret: we can start marking up test refs
Vlad: approach for woff2 allowed linking to specific tests which was useful
Garret: next step is to markup the spec. Not sure about timing of actually building tests, perhaps not until we have a basic Chrome implementation.
Garret: in addition to client conformance tests we might need to discuss encoder conformance tests because we have at least one normative statement there
Skef: are we going to do anything about w3c/
Garret: would ideally like to discuss with Yoav
Skef: seems to imply something more sophisticated than just pick the highest priority source you support
Garret: this issue is from the patch/subset days when client could over-request, but now you just obey what the encoder tells you. The issue is probably irrelevant now.
Garret: I'll try to catch Yoav at TPAC, failing that I'll close with a comment.
Building a High Quality Encoder
Garret: final topic, how to build a high quality encoder. Not really a spec issue but interesting wrt deployment for real.
Skef: two aspects to the question, neither for the spec. 1) parameterization of what the encoder will do. By default be general; based on how documents and fonts tend to work, here are some ideas about organization. Then be able to tweak, influence results.
Skef: 2) how you actually build the encoder, how does it work, what is it doing
Skef: not sure what we'd say about per-table patches in terms of configuration data
Garret: in the case of a general purpose encoder for the web I think the main tuning knob is how many files you want. More files will permit higher granularity.
Garret: e.g. no more than 10k patches, my environment doesn't like large file counts
Garret: can give additional knobs, e.g. spend most on glyph keyed patches
Skef: if I'm doing per-table patches and I have a budget for #patches then the more finely I cut things the more round trips I have
Garret: I imagine trying to ensure things are no more than one round trip away
Garret: e.g. a patch for each subset, a patch for each pair, etc
Skef: if the knob includes the width of patch the narrower the deeper the graph will be. Split latin in half, 2 patches just for latin, etc.
Garret: want it as close as possible to Just Works
Skef: a script might be a good split point so the input data needs to tell the encoder what a script is. Maybe we have core script, extended script for each script. Some scripts would default to separate encoding, others segment per-glyph but not per-table.
Skef: for CJK the division between core and extra might be frequency based. Per-table patch for this part, and then the rest. So you have config data about core vs data cutoff for per-table vs not.
Skef: by default I might not segment a script beyond those two buckets
Garret: a file budget enables reasoning about depth in per-table patching
Skef: consider per-table and glyph-keyed separately. Both require tuning. One way to express is target # patches. Another way is depth for per-table. On glyph-keyed side you could give a target # patches.
Skef: another tunable might be patch size, another might be #glyphs in patch
Garret: #files is based on the notion that a server, say a CDN, might not want vast #s of files. That's the key limit the user has.
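(Editor's sketch collecting the knobs floated above into one hypothetical configuration; none of this is in the spec, since encoding is non-normative.)

```python
from dataclasses import dataclass

@dataclass
class EncoderConfig:
    max_patches: int = 10_000          # file budget: "no more than 10k patches"
    max_round_trips: int = 1           # per-table graph depth vs patch count
    max_glyphs_per_patch: int | None = None
    max_patch_bytes: int | None = None # optional cap on individual patch size
    glyph_keyed_share: float = 0.8     # "spend most on glyph keyed patches"
```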
Skef: speaking of encoding, what about emoji
Skef: early on we talked about ligatures and I pushed back
Rod: icon fonts and emoji both have ligature problems
Skef: emoji will be more separable than icon names
Skef: in a color language font things may separate relatively well, most shapes correspond to individual characters
Skef: for emoji, the other aspect is likely more complex reuse of shapes
Skef: with enough reuse you might want to grow the base font
Skef: what you merge is different in an emoji font than a language font
Skef: you might want relatively few codepoints in a glyph keyed patch
Rod: can confirm there are a small set of highly used emoji
Skef: colorized things would use the same base outlines
Skef: if someone does shape coalescing that's a problem
(speculation about how emoji fit together)
RodS: for reference, https://
Skef: if designers reuse outlines heavily across concepts the grouping is non-obvious, we have to decide whether to group or duplicate
Garret: format 1 requires disjointness but format 2 doesn't
Garret: widely duplicated things can thus be de-duplicated in format 2, it doesn't have the format 1 requirement of 1 glyph => 1 bin
Garret: in format 2 if A maps to multiple patches all of them ultimately get loaded
(discussion of how patch mapping updates in the face of patch format 2)
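(Editor's sketch of the mapping difference with toy data; the real formats use compact binary structures, not dicts.)

```python
# Format 1: disjoint -- each codepoint maps to exactly one patch ("bin").
format1_map = {0x1F600: "patch_3"}

# Format 2: overlap allowed -- shared data can live in one de-duplicated patch.
format2_map = {0x1F600: ["patch_3", "shared_outlines"]}

def format2_patches_for(cp: int) -> list[str]:
    # All patches a codepoint maps to are ultimately loaded, which is what
    # lets format 2 de-duplicate widely reused data (e.g. shared emoji shapes).
    return format2_map.get(cp, [])
```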
Skef: I still find the interaction of encoding options and patch application a bit opaque.
Garret: the encoding reasoning differs because of how the different patch types can alter the font
Skef: for variable fonts the spec really only explains how to decode. There's no guidance at all on how an encoder should build the data structures.
Skef: in our spec we haven't really explained how to write an encoder. We've specified the decoder and left the encoder as an exercise.
Garret: once formats are in common representation processing differs only in order, which ones go first
Skef: why, when using per-table patches, does it load only one, vs all of them for glyph keyed?
Garret: algorithm says to load one by one but it is intended that a client can make speculative loads for probable future patches
RodS: do we define how to build a basic encoder?
Garret: I have pseudo-code for the most basic possible per-table implementation
Garret: it's relatively easy to do pure glyph-keyed and pure per-table, mixed is trickier
Skef: we define things in terms of a subsetter, which is complex, but we can elide that
Garret: I have a simple mixed-type implementation so it's definitely possible
Skef: I worry that in the way we describe things we've over-optimized for compactness and unification vs comprehensibility. Some sections are hard to understand.
(meaning compactness of spec text)
Skef: we could switch to more of a "this is how per-table works, this is how glyph-keyed works" structure
Garret: it is possible to duplicate the wording
Garret: but the current wording does reflect how an exploratory implementation went quite closely
(additional discussion of how only client [decoder] is specified, encoder is less clear ... which is arguably by design, encoding is left open to permit ongoing innovation)
Garret: we could have pseudo-code for pure per-table, pure glyph-keyed, *and* how to combine
Skef: still have questions like where does the fi ligature go
Garret: I'll have to think about it, whether we can come up with a terse correct description
Skef: consider solving correctness using the abstraction of static subsetting, that's what mine does. Can we describe what mine does tersely?
Garret: I need to try it
Skef: my implementation puts complicated things into the base font
Skef: I think you are saying merge wherever there is an interaction
Garret: the spec pseudocode needs to be usable
Garret: we could add an informal description to the apply a patch section
RodS: how to encode is the hard part
Skef: perhaps concrete examples? - OpenType has some of this and it's useful
w3c/
(side discussion of encoders and Rust)
VICTORY?
<Garret> Summary of discussion about privacy issue (w3c/
<Garret> - Just enforcing minimum group sizes on the patches doesn't really add anything, as it's possible to construct patches that work around this while still allowing for single character level granularity.
<Garret> - Single character granularity font loading is already possible via unicode range. Functionally IFT works quite similarly to unicode range and so has very similar privacy characteristics.
<Garret> - We should document in the spec that including third party resources in a site implies a level of trust in those resources (e.g. like including third party JavaScript or CSS).
<Garret> - If we do want something like minimum group sizes as part of patch subset, we'd want to do the codepoint grouping prior to executing the patch subset extension algorithm. That potentially has performance implications, but it may be possible to write it in a way that doesn't affect performance if the font is already well formed (disjoint groupings over a minimum size).
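(Editor's sketch of the "group first, then run the algorithm" idea from the last bullet; the merge strategy is hypothetical.)

```python
def enforce_min_group_size(groups: list[set[int]], min_size: int) -> list[set[int]]:
    # Merge undersized codepoint groups into a larger neighbor *before* the
    # patch subset extension algorithm runs.
    merged: list[set[int]] = []
    for group in sorted(groups, key=len, reverse=True):
        if merged and len(group) < min_size:
            merged[-1] |= group
        else:
            merged.append(group)
    return merged

# If the input is already well formed (disjoint groupings at or over the
# minimum size), nothing merges and the pass is effectively free.
```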