Meeting minutes
Introductions
anssik: Welcome to 2022! We're now restarting CG calls given advances in Model Loader API
anssik: WebML CG and WG chair, Intel
Jon_Napper: leading ChromeOS ML intelligence team at Google
Andrew_Moylan: ChromeOS ML team at Google
Bruce: working with Ningxin, performance work, Intel
Geunhyung_Kim: explainability of ML is my interest, working for Gooroomee
Chai: Windows AI team lead at Msft, also WebNN API co-editor
Honglin_Yu: ChromeOS ML team at Google, Model Loader API spec and impl, exploring this space actively
Jiawei_Qian: previously worked on handwriting recognition at Google
Jonathan_Bingham: product manager at Google, have worked with this CG for a long time, interested in both the Model Loader API and the WebNN API
Mingming: working with Ningxin on WebNN impl, Intel
Ningxin: WebNN co-editor, Intel
Ping_Yu: TF.js lead, Google, have worked with Ningxin and others
RafaelCintron: Edge team at Msft, also representing Msft in a number of other W3C groups, e.g. the Immersive Web, WebGPU, and Color on the Web groups
Raviraj: enabling the stack on Chrome OS at Intel, working with Ningxin
Model Loader API
<Jonathan> My intro: Product manager for Web ML at Google
https://
Spec and implementation progress
Honglin: the Chromium CL is just my personal prototype, not up for official review yet
… folks are welcome to review and comment, the final impl will be different
… it will be split into multiple CLs
Slideset: https://
<chai> Just a note for Anssi, you may want to add Rama (Ganesan Ramalingam) into the participant list as well
Honglin: this is a brief overview of the Model Loader API impl
Honglin: the ML context is similar to WebNN's; one difference is that the user can set the number of threads
Honglin: the ML model loader corresponds to the ML graph builder in WebNN
… this design is meant to handle the complexity of loading a model
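For context, a rough sketch of the API shape under discussion, loosely following the Model Loader API explainer; the numThreads option, the model URL, and the input name here are illustrative and may not match the final API:

```ts
// Rough sketch of the Model Loader API shape; names and signatures are
// still in flux in the prototype CL, so treat everything here as tentative.
declare const MLModelLoader: any; // not yet in lib.dom.d.ts

// Create a context; unlike WebNN, the prototype lets the caller cap
// the number of CPU threads.
const context = await (navigator as any).ml.createContext({ numThreads: 4 });

// The loader plays the role MLGraphBuilder plays in WebNN: it hides
// the complexity of parsing and loading a serialized model.
const loader = new MLModelLoader(context);

// Load a pre-trained model, e.g. a TF Lite flatbuffer (URL is made up).
const response = await fetch('mobilenet_v2.tflite');
const model = await loader.load(await response.arrayBuffer());

// Run inference: named input tensors in, named output tensors out.
const outputs = await model.compute({ input: new Float32Array(224 * 224 * 3) });
```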
Honglin: this is how the current prototype works, see the prototype CL
… all ML input and output are relayed by the browser process
Honglin: we have benchmark results for MobileNet v2; even CPU-only, the Model Loader API shows better performance than TF.js
… strong motivation to implement this API
RafaelCintron: why a separate process? The renderer process is the most secure one
Honglin: good question, we want this to be extensible to various hardware, e.g. the Pixelbook has ML-specific accelerators and we want to be able to use them, which is easier if we run this in the ML service
… it is possibly safer than the renderer, but this needs to be validated; the renderer can do JIT compilation, while in the ML service we can disable those system calls
Chai: understanding the execution path would help me better understand the relative inference performance
Honglin: this is inference time over 150 images, using a demo web site where we download the images and, after download, preprocess the data and run inference in a for loop
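To make the measurement setup concrete, a minimal sketch of the loop described; downloadAndPreprocess is a hypothetical helper, and model is the instance loaded in the earlier sketch:

```ts
// Illustrative benchmark pattern: download everything first, then time
// inference alone in a tight loop over the same model instance.
declare function downloadAndPreprocess(n: number): Promise<Float32Array[]>; // hypothetical

const images = await downloadAndPreprocess(150);
const start = performance.now();
for (const image of images) {
  await model.compute({ input: image }); // model from the earlier sketch
}
const elapsed = performance.now() - start;
console.log(`average inference: ${(elapsed / images.length).toFixed(1)} ms`);
```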
Chai: usually when running execution you'd execute the kernels; the question is which TF.js backend was used for the benchmark
Honglin: Wasm supports only a limited set of CPU instructions, while the ML service is compiled natively; this is the main reason
Honglin: results with quantized models still outperform TF Lite
Honglin: the IPC cost is 7-10 ms, which is not small; we should consider improving it
Honglin: 8 todos identified
Honglin: the graph shows how we could reduce the identified IPC cost
… in theory this cuts the IPC cost in half; this is being explored
Honglin: 5 open questions
Ningxin: the IPC cost seems high, is it caused by marshalling and unmarshalling?
… does your prototype use shared memory to transfer tensors between processes?
Honglin: the prototype currently marshalls and unmarshalls; we are considering alternatives
ningxin_hu: will follow up with you offline on this
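To illustrate the options being weighed, a hypothetical sketch that uses a Web Worker boundary as a stand-in for the renderer-to-ML-service IPC; none of this is the prototype's actual Mojo code:

```ts
// Copy vs zero-copy hand-off of a tensor across a process-like boundary.
const worker = new Worker('inference-worker.js'); // hypothetical worker script
const tensor = new Float32Array(224 * 224 * 3);

// 1. Marshalling (what the prototype does today): structured clone
//    copies the whole tensor on every call.
worker.postMessage({ input: tensor });

// 2. Transferring: the underlying ArrayBuffer moves without a copy,
//    but the sender loses access to it afterwards.
worker.postMessage({ input: tensor }, [tensor.buffer]);

// 3. Shared memory: both sides view the same bytes, so repeated
//    inference pays no per-call copy (requires cross-origin isolation).
const shared = new Float32Array(new SharedArrayBuffer(4 * 224 * 224 * 3));
worker.postMessage({ input: shared.buffer });
```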
RafaelCintron: how tied is ChromeOS to this ML service? Are you open to different inference engines besides TF Lite on ChromeOS?
Andrew_Moylan: I think yes
RafaelCintron: how tied is this to ChromeOS? Is this a ChromeOS-only API?
Jonathan: Honglin's work currently depends on ChromeOS, but we understand that is not a web standard and are talking to the Chrome browser team; I think the ML service is not in Chromium but in Chrome, and we can coordinate with Msft to ensure the ML service can also be implemented on other OSes
RafaelCintron: I'm somewhat familiar with the ML service and thought it was not so tied to TF Lite
RafaelCintron: that'd be good for any browser that is cross-process
… even our first-party apps like Office care about cross-platform, not just Windows
… how many processes can be created? Thinking of malicious usage here
Honglin: we can limit the maximum number of processes
… each model instance runs in a dedicated process, so if a web page loads 10 models there are 10 processes; we'll cap the total
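A minimal sketch of that cap, assuming a one-process-per-model policy; the limit value and all names here are made up:

```ts
// Sketch: one service process per loaded model, with a hard cap so a
// malicious page cannot spawn unbounded processes.
const MAX_MODEL_PROCESSES = 10; // assumed value; the real limit is undecided

class ModelProcessRegistry {
  private live = new Set<number>();

  spawnForModel(spawn: () => number): number {
    if (this.live.size >= MAX_MODEL_PROCESSES) {
      throw new Error('model process limit reached');
    }
    const pid = spawn(); // caller-provided process launcher
    this.live.add(pid);
    return pid;
  }

  onModelUnloaded(pid: number): void {
    this.live.delete(pid);
  }
}
```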
Dependencies, coordination topics with WebNN API
Honglin: we have discussed shareable structs, but haven't yet started on code reuse
… we have discussed the frontend; the backends still need to be explored (see the sketch below)
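One concrete shape the shared frontend could take, sketched under the assumption that both APIs consume the same MLContext; this illustrates the idea, it is not agreed API:

```ts
// "Shareable structs" idea: WebNN and the Model Loader API hang off the
// same MLContext, so device and threading options are defined once.
declare const MLGraphBuilder: any; // WebNN interface, not yet in lib.dom.d.ts
declare const MLModelLoader: any;  // Model Loader prototype interface

const context = await (navigator as any).ml.createContext();

const builder = new MLGraphBuilder(context); // WebNN: build graphs op by op
const loader = new MLModelLoader(context);   // Model Loader: load whole models
```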
Meeting cadence
anssik: first, does this meeting slot work as a recurring one to folks?
https://
[agreement]
… I propose we do either bi-weekly (to match WG) or monthly?
… or on an as-needed basis?
<Jonathan> anyone who would object isn't here, lol
Honglin: no opinion yet on cadence
Jonathan: having a recurring meeting would be valuable
… maybe next in two weeks and then once a month?
[agreement]
Chai: interop between the two APIs is a great target
… reuse of API contracts is even more useful than reuse of code
anssik: Thanks for joining everyone, thanks Honglin for the great presentation!