Meeting minutes
Introductions
anssik: Welcome to 2022! We're now restarting CG calls given advances in Model Loader API
anssik: WebML CG and WG chair, Intel
Jon_Napper: leading ChromeOS ML intelligence team at Google
Andrew_Moylan: ChromeOS ML team at Google
Bruce: working with Ningxin, performance work, Intel
Geunhyung_Kim: explainability of ML is my interest, working for Gooroomee
Chai: Windows AI team lead at Msft, also WebNN API co-editor
Honglin_Yu: ChromeOS ML team at Google, Model Loader API spec and impl, exploring this space actively
Jiawei_Qian: previously worked on handwriting recognition at Google
Jonathan_Bingham: product manager at Google, have worked with this CG for a long time, interested in both the Model Loader API and the WebNN API
Mingming: working with Ningxin on WebNN impl, Intel
Ningxin: WebNN co-editor, Intel
Ping_Yu: TF.js lead, Google, have worked with Ningxin and others
RafaelCintron: Edge team at Msft, also representing Msft in a number of other W3C groups, e.g. the Immersive Web, WebGPU, and Color on the Web groups
Raviraj: enabling the stack on Chrome OS at Intel, working with Ningxin
Model Loader API
<Jonathan> My intro: Product manager for Web ML at Google
https://
Spec and implementation progress
Honglin: the Chromium CL is just my personal prototype, not up for official review yet
… folks are welcome to review and comment, the final impl will be different
… it will be split into multiple CLs
Slideset: https://
<chai> Just a note for Anssi, you may want to add Rama (Ganesan Ramalingam) into the participant list as well
Honglin: this is a brief overview of the Model Loader API impl
Honglin: the ML context is similar to WebNN's; one difference is that the user can set the number of threads
Honglin: the ML model loader corresponds to the ML graph builder in WebNN
… this design is meant to handle the complexity of loading a model
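For context, a rough sketch of the API shape under discussion, loosely following the Model Loader API explainer; the numThreads option, the model URL, and the input name here are illustrative and may not match the final API:

```ts
// Rough sketch of the Model Loader API shape; names and signatures are
// still in flux in the prototype CL, so treat everything here as tentative.
declare const MLModelLoader: any; // not yet in lib.dom.d.ts

// Create a context; unlike WebNN, the prototype lets the caller cap
// the number of CPU threads.
const context = await (navigator as any).ml.createContext({ numThreads: 4 });

// The loader plays the role MLGraphBuilder plays in WebNN: it hides
// the complexity of parsing and loading a serialized model.
const loader = new MLModelLoader(context);

// Load a pre-trained model, e.g. a TF Lite flatbuffer (URL is made up).
const response = await fetch('mobilenet_v2.tflite');
const model = await loader.load(await response.arrayBuffer());

// Run inference: named input tensors in, named output tensors out.
const outputs = await model.compute({ input: new Float32Array(224 * 224 * 3) });
```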
Honglin: this is how the current prototype works, see the prototype CL
… all ML input and output are relayed by the browser process
Honglin: we have benchmark results for MobileNet v2; even CPU-only, the Model Loader API shows better performance than TF.js
… strong motivation to implement this API
RafaelCintron: why a separate process? The renderer process is the most secure one
Honglin: good question, we want this to be extensible to various hardware, e.g. the Pixelbook has ML-specific accelerators and we want to be able to use them, which is easier if we run this in the ML service
… it is possibly safer than the renderer, but this needs to be validated; the renderer can do JIT compilation, while in the ML service we can disable those system calls
Chai: understanding the execution path would help me better understand the relative inference performance
Honglin: this is inference time over 150 images, using a demo web site where we download the images and, after download, preprocess the data and run inference in a for loop
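To make the measurement setup concrete, a minimal sketch of the loop described; downloadAndPreprocess is a hypothetical helper, and model is the instance loaded in the earlier sketch:

```ts
// Illustrative benchmark pattern: download everything first, then time
// inference alone in a tight loop over the same model instance.
declare function downloadAndPreprocess(n: number): Promise<Float32Array[]>; // hypothetical

const images = await downloadAndPreprocess(150);
const start = performance.now();
for (const image of images) {
  await model.compute({ input: image }); // model from the earlier sketch
}
const elapsed = performance.now() - start;
console.log(`average inference: ${(elapsed / images.length).toFixed(1)} ms`);
```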
Chai: usually when running execution you'd execute the kernels; the question is which TF.js backend was used for the benchmark
Honglin: Wasm supports only a limited set of CPU instructions, while the ML service is compiled natively; this is the main reason
Honglin: results with quantized models still outperform TF Lite
Honglin: the IPC cost is 7-10 ms, which is not small; we should consider improving it
Honglin: 8 todos identified
Honglin: the graph shows how we could reduce the identified IPC cost
… in theory this cuts the IPC cost in half; this is being explored
Honglin: 5 open questions
Ningxin: the IPC cost seems high, is it caused by marshalling and unmarshalling?
… does your prototype use shared memory to transfer tensors between processes?
Honglin: the prototype currently marshalls and unmarshalls; we are considering alternatives
ningxin_hu: will follow up with you offline on this
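To illustrate the options being weighed, a hypothetical sketch that uses a Web Worker boundary as a stand-in for the renderer-to-ML-service IPC; none of this is the prototype's actual Mojo code:

```ts
// Copy vs zero-copy hand-off of a tensor across a process-like boundary.
const worker = new Worker('inference-worker.js'); // hypothetical worker script
const tensor = new Float32Array(224 * 224 * 3);

// 1. Marshalling (what the prototype does today): structured clone
//    copies the whole tensor on every call.
worker.postMessage({ input: tensor });

// 2. Transferring: the underlying ArrayBuffer moves without a copy,
//    but the sender loses access to it afterwards.
worker.postMessage({ input: tensor }, [tensor.buffer]);

// 3. Shared memory: both sides view the same bytes, so repeated
//    inference pays no per-call copy (requires cross-origin isolation).
const shared = new Float32Array(new SharedArrayBuffer(4 * 224 * 224 * 3));
worker.postMessage({ input: shared.buffer });
```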
RafaelCintron: how tied is ChromeOS to this ML service? Are you open to different inference engines besides TF Lite on ChromeOS?
Andrew_Moylan: I think yes
RafaelCintron: how tied is this to ChromeOS? Is this a ChromeOS-only API?
Jonathan: Honglin's work currently depends on ChromeOS, but we understand that is not a web standard and are talking to the Chrome browser team; I think the ML service is not in Chromium but in Chrome, and we can coordinate with Msft to ensure the ML service can also be implemented on other OSes
RafaelCintron: I'm somewhat familiar with the ML service and thought it was not so tied to TF Lite
RafaelCintron: that'd be good for any browser that is cross-process
… even our first-party apps like Office care about cross-platform, not just Windows
… how many processes can be created? Thinking of malicious usage here
Honglin: we can limit the maximum number of processes
… each model instance runs in a dedicated process, so if a web page loads 10 models there are 10 processes; we'll cap the total
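A minimal sketch of that cap, assuming a one-process-per-model policy; the limit value and all names here are made up:

```ts
// Sketch: one service process per loaded model, with a hard cap so a
// malicious page cannot spawn unbounded processes.
const MAX_MODEL_PROCESSES = 10; // assumed value; the real limit is undecided

class ModelProcessRegistry {
  private live = new Set<number>();

  spawnForModel(spawn: () => number): number {
    if (this.live.size >= MAX_MODEL_PROCESSES) {
      throw new Error('model process limit reached');
    }
    const pid = spawn(); // caller-provided process launcher
    this.live.add(pid);
    return pid;
  }

  onModelUnloaded(pid: number): void {
    this.live.delete(pid);
  }
}
```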
Dependencies, coordination topics with WebNN API
Honglin: we have discussed shareable structs, but haven't yet started on code reuse
… we have discussed the frontend; the backends still need to be explored (see the sketch below)
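One concrete shape the shared frontend could take, sketched under the assumption that both APIs consume the same MLContext; this illustrates the idea, it is not agreed API:

```ts
// "Shareable structs" idea: WebNN and the Model Loader API hang off the
// same MLContext, so device and threading options are defined once.
declare const MLGraphBuilder: any; // WebNN interface, not yet in lib.dom.d.ts
declare const MLModelLoader: any;  // Model Loader prototype interface

const context = await (navigator as any).ml.createContext();

const builder = new MLGraphBuilder(context); // WebNN: build graphs op by op
const loader = new MLModelLoader(context);   // Model Loader: load whole models
```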
Meeting cadence
anssik: first, does this meeting slot work as a recurring one to folks?
https://
[agreement]
… I propose we do either bi-weekly (to match WG) or monthly?
… or on an as-needed basis?
<Jonathan> anyone who would object isn't here, lol
Honglin: no opinion yet on cadence
Jonathan: having a recurring meeting would be valuable
… maybe next in two weeks and then once a month?
[agreement]
Chai: interop between the two APIs is a great target
… reuse of API contracts is even more useful than reuse of code
anssik: Thanks for joining everyone, thanks Honglin for the great presentation!