Meeting minutes
status of the IFT encoder, scope of wg deliverables
Vlad: the charter has two sections: defined deliverables and open deliverables
Skef: two issues: one with the spec and how the interactions work, and one with the prototype impl
Skef: to date Google has made most of the contributions to the prototype impl
Skef: some of it is rewritten based on previous technology provided by Adobe
Skef: my understanding was that it's a joint project. It didn't have to go through discussions in this meeting, but it's still under the w3c
Skef: contributions are reviewed and go through PRs, but this document is one that Adobe has concerns about. Didn't want to review it at the time.
Skef: open to creating a space within the repository for things that are unreviewed.
Skef: not sure why it's in there.
Skef: it seems like there's a lot of decisions being made by Google.
Skef: when I thought this was a joint project.
Skef: if this is a Google project they can do what they want. However if it's under w3c there should be more give and take.
Vlad: my concern is the rights to what has been produced in this working group as part of these side projects.
Vlad: other deliverables gives us flexibility to do other things that benefit the spec development.
Chris: we shouldn't call this a reference implementation. That has more normative force which isn't the case here.
Skef: I don't perceive there being a rights issue since the license is very permissive.
Vlad: for woff1 we needed a test harness, which was primarily developed by one member of the group. When we had woff2 we needed specially crafted font files. Google hired a contractor to do this.
Vlad: I don't see an obstacle to doing what we are doing with the sample implementation.
Skef: if it's a group project what does that mean?
Skef: group has some influence
Skef: if group doesn't then it's not a group project
Vlad: we do have influence; every group member can participate
Skef: want it to be clear, if they want to merge for documentation purposes and we haven't had time to review, that other parties haven't signed off on it.
Garret: broadly agree, the document was worded poorly at the start with respect to describing its status. Happy to clarify its status at the start.
Garret: will rework the PR a bit to address this.
Garret: is there interest in code reviewing in the repo?
Scott: I'm interested in being involved in development.
Skef: looking to be more involved as well.
Garret: main goal at the moment is to get an end to end wasm type demo working again with the client. Then I'll circle back to encoder work.
Garret: also a test suite and fuzzer suite for checking the quality of segmentations.
Skef: to make the fuzzer test more effective, it would be good to be able to feed in glyphs of interest and vary those in more compelling combinations.
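A minimal sketch of what Skef describes, assuming the fuzzer takes glyph sets as input. This is not the ift-encoder fuzzer's API; the function name, the `extra` and `seed` parameters, and the glyph names are all hypothetical. It enumerates every non-empty combination of the glyphs of interest and pads each one with a few randomly chosen other glyphs.

```python
import itertools
import random


def interest_combinations(glyphs_of_interest, other_glyphs, extra=2, seed=0):
    """Yield glyph sets built around the glyphs of interest.

    Every non-empty combination of the glyphs of interest is produced once,
    padded with up to `extra` randomly chosen glyphs from `other_glyphs`.
    """
    rng = random.Random(seed)  # fixed seed so a failing case can be reproduced
    others = sorted(other_glyphs)
    interesting = sorted(glyphs_of_interest)
    for size in range(1, len(interesting) + 1):
        for combo in itertools.combinations(interesting, size):
            padding = rng.sample(others, min(extra, len(others)))
            yield frozenset(combo) | frozenset(padding)


# Hypothetical glyph names, for illustration only.
cases = list(interest_combinations({"f", "i", "f_i"}, {"a", "b", "c", "d"}))
```

Each resulting set could then be handed to whatever segmentation check the test suite runs.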
Garret: circling back to the original issue, do we want to have ift-encoder stay in w3c?
Chris: thankful for Garret's contributions so far, but would be great to see others contributing as well.
Skef: agreed it's best to continue under w3c.
gathering language/locale data
Skef: it is my belief that most documents are written in one language, and the second most common case is two languages. There's lots of locality in how documents are written.
Skef: there are cases where there is more than one use of what's considered the same language, e.g. Hong Kong Chinese vs. mainland Chinese. Code point frequencies are relatively different.
Skef: it's a goal to optimize things for languages
Skef: in particular on the table keyed side to have locale specific patches.
Skef: on the glyph keyed side it makes sense to be able to optimize the glyph keyed patches so that you don't mix high frequency in one locale with another.
Skef: so it will make sense to have a notion of locale/language. I believe I can build the relevant files. Locale/language is not just the codepoints, also punctuation.
Skef: these could feed into the encoder, with one piece of information: Per locale codepoint frequency.
<ChrisL> Reading https://
Skef: identify locale in document and associate different regions of the document and figure out codepoint frequency.
Skef: I don't have a source for this, but Google possibly does.
Skef: was hoping to get input from Google.
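A minimal sketch of the per-locale codepoint frequency data discussed above, assuming the document regions have already been labelled with a locale. The function name, the input shape, and the choice to skip whitespace are assumptions for illustration, not an agreed format.

```python
from collections import Counter, defaultdict


def codepoint_frequencies(regions):
    """Count codepoints per locale.

    regions: iterable of (locale, text) pairs, one per labelled region.
    Returns {locale: Counter mapping codepoint (int) -> occurrence count}.
    """
    freq = defaultdict(Counter)
    for locale, text in regions:
        # Punctuation is counted along with letters; whitespace is skipped
        # (an assumption made for this sketch).
        freq[locale].update(ord(ch) for ch in text if not ch.isspace())
    return dict(freq)


# Hypothetical sample input.
sample = [
    ("en", "Hello, world!"),
    ("de", "Hallo, Welt!"),
    ("en", "Hello again."),
]
freqs = codepoint_frequencies(sample)
```

Data of this shape could be produced from any of the sources listed next and fed to the encoder as one table per locale.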
Garret: three sources of data -
1. For the previous PFE analysis suite we gathered page walk data, from which you could derive language codepoint frequency.
2. We've previously publicly released our CJK segmentation + frequency data.
3. I will be looking to regather codepoint frequency data from the webcrawl index.
Chris: would be nice to publish the frequency data; it would be useful outside of this group.
Garret: yes agreed
Vlad: are spec changes needed?
Skef: don't think I need any
Skef: approach to spec development is to try and generalize, but it may end up coloured by the problems we're trying to solve
Vlad: working draft was published recently, don't have time to review in detail but it's on github.
Vlad: Chris initiated the TAG review process. If we get comments back do we need formal working group action?
Chris: if we have consensus we can resolve on github, if it's contentious we should discuss in working group.
Chris: may need to be on the agenda next time.