W3C

– DRAFT –
WebML CG Teleconference – 6 April 2022

06 April 2022

Attendees

Present
Anssi_Kostiainen, Honglin_Yu, Jiewei_Qian, Joe_Bowser, Jonathan_Bingham, Ningxin_Hu, Rafael_Cintron, Raviraj_Pinnamaraju, sangwhan, Zakim
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, Jonathan

Meeting minutes

First topic: scheduling.

How to scribe

Scheduling

Anssi: daylight saving time has shifted some of our times by one hour
… should we shift back one hour to match the original schedule?
… West coast 9pm, Finland 7am, Sydney 2pm, noon Shanghai

Rafael: either time is fine for me

Honglin: either time is fine for us

ningxin_hu: works for me

General agreement. Next month's call will be one hour earlier.

Model Loader API - Chromium implementation update

anssik: Patches in review:

https://chromium-review.googlesource.com/c/chromium/src/+/3520923

https://chromium-review.googlesource.com/c/chromiumos/platform2/+/3473544

https://chromium-review.googlesource.com/c/chromium/src/+/3521114

https://chromium-review.googlesource.com/c/chromium/src/+/3555501

https://chromium-review.googlesource.com/c/chromium/src/+/3524241

https://chromium-review.googlesource.com/c/chromium/src/+/3525653

Honglin_Yu: There are 6 CLs needed to complete the prototype. One has been submitted; five are in review.

Honglin_Yu: Patch submitted: 3520923. Patches in review: 3473544, 3521114, 3555501, 3524241, 3525653
… after the next CL, people can try with the flag
… progress has been smooth
… it's Chrome OS only
… next we'll implement WPT and other tests
… We want to collect developer feedback
… and we want to look into GPU support. Security is still the main blocker.
… We'll probably have to use the GPU process in Chromium. It's TBD how much performance gain is possible compared to CPU.
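
For orientation, a minimal sketch of what exercising the prototype behind the flag might look like, assuming the MLModelLoader shape from the explainer; names and signatures are illustrative, not final:

    // Illustrative only: API names follow the Model Loader explainer and may change.
    const context = navigator.ml.createContext({ devicePreference: 'cpu' });
    const loader = new MLModelLoader(context);

    // Load a TF Lite flatbuffer fetched from the network (hypothetical URL).
    const response = await fetch('/models/example.tflite');
    const model = await loader.load(await response.arrayBuffer());

    // Run inference; input/output names and shapes depend on the model.
    const outputs = await model.compute({ input: new Float32Array(224 * 224 * 3) });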

ningxin_hu: progress is encouraging!
… once the feature is ready for testing, we at Intel have a Chrome OS team that can help test.
… The WebNN prototype is in our local branch. We'll look into using the common interfaces and replace the local patch.
… Then hopefully we can upstream some of the WebNN implementation and use shared infrastructure.

Honglin_Yu: There are many things to discuss in sharing the implementation.

ningxin_hu: There are open issues about sync/async and other topics.
… alignment will be good for both APIs.
… GPU integration support is an important topic. Within the WG, we've discussed the spec for WebGPU interop, related to video processing.
… GPU sandboxing and GPU process are good topics to discuss for implementation.

Honglin_Yu: Video and multimedia data will be a main use case. GPU support will be needed to make it successful.
… Maybe we should figure out a way to accept a video frame and output a video frame in processing

ningxin_hu: In the WG, the GPU interop issue is a hot topic. If you can provide your input to that discussion it will be helpful.

WebNN API: Context-based graph execution methods for different threading models. #257

Honglin_Yu: The current Model Loader API is still a lot of effort, even for CPU only.

ningxin_hu: We can work on GPU together in the WG.
… I can also keep the Model Loader usage in mind during the discussion and find common themes and opportunities for alignment

Honglin_Yu: We're already making progress in aligning the two APIs.

RafaelCintron: Two meetings ago, we asked why Model Loader isn't simply a wrapper around WebNN. Why have two APIs?
… Why can't Model Loader be a helper function?

Honglin_Yu: We've discussed that from the beginning. The 2 APIs are complementary. You could also make WebNN based on Model Loader, and convert a graph into a standard model format.
… The Model Loader API can have a more direct connection with the backend. That may make it easier to upgrade, e.g., upgrading the TensorFlow backend without changing the JavaScript API.
… There's the possibility of more cutting-edge technology.
… If only WebNN were supported on some platform, Model Loader could be based on it as a polyfill.
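
To make that polyfill direction concrete, a rough sketch, assuming a hypothetical parseModel() that decodes the model format and a hypothetical applyOp() that maps its ops onto MLGraphBuilder calls:

    // Hypothetical sketch: implement Model Loader's load() on top of WebNN by
    // translating the model format's graph into MLGraphBuilder calls.
    async function loadOnWebNN(context, modelBuffer) {
      const builder = new MLGraphBuilder(context);
      const { inputs, nodes, outputs } = parseModel(modelBuffer); // assumed parser
      const operands = new Map();
      for (const { name, desc } of inputs) {
        operands.set(name, builder.input(name, desc));
      }
      for (const node of nodes) {
        // e.g. map a TF Lite CONV_2D node onto builder.conv2d(...);
        // this is where unsupported or custom ops would fail.
        operands.set(node.output, applyOp(builder, node, operands)); // assumed mapper
      }
      return builder.build(
        Object.fromEntries(outputs.map(name => [name, operands.get(name)])));
    }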

RafaelCintron: A lot of data transformations would be required to layer WebNN on top of Model Loader. It seems more efficient the other way around.

Honglin_Yu: I'm not sure I agree it's more efficient one way or the other. For Chrome OS, we have a mature TF Lite runtime environment. If we run it directly, we get native performance.
… We can have ML accelerator support.
… The implementation can be simpler with Model Loader, piping to the ML service.
… From our side, it's simpler not to base it on WebNN.
… Once WebNN reaches origin trial, we can try the other path.

RafaelCintron: I definitely agree it's simplest for you, for TF Lite.

Honglin_Yu: In the future, we could bring an ML service to every browser, whether TF or something else.
… This is the easiest way for us to get started. Chrome OS really needs a way to run ML.
… TF Lite wasn't designed to run models from arbitrary sites. There's security work to be done.

RafaelCintron: The API you're exposing to people currently is not yet a web API; it's only for Chrome OS

Honglin_Yu: We want to make it a web API, so we can have more users. Bringing it to Chromium in general will take more time.

anssik: Model Loader is an experiment at this point.
… These APIs are not competing. We're on a journey to learn.
… Chrome OS is a natural starting point for the Chrome OS team in building Model Loader.
… We're figuring out the best future. Model Loader is still in incubation.

ningxin_hu: My perspective regarding Rafael's question: WebNN is designed to be independent of the backend framework.
… To be efficient across frameworks, the operators must be a subset of what frameworks support.
… Model Loader, by contrast, can take advantage of a single underlying service.
… Implementing Model Loader purely on top of WebNN is challenging.
… Custom ops and interop are difficult.
… WebNN's op set is effectively a subset of any one model format.

Honglin_Yu: WebNN is an important core of ML. Model Loader can maybe reach the cutting edge.

jonathan: thanks for this question, Rafael
… we discussed this in another thread, also with CrOS folks at Google; part of the attraction of the Model Loader API at Google is that for WebNN all ops need to be agreed upon, which is quite heavyweight
… for the Model Loader API we have such an agreed-upon set of ops attached to a model format, and revising that model format can probably happen faster outside this group
… maybe this enables faster experimentation behind a flag and allows faster iteration
… strictly implementing the Model Loader API atop WebNN would not allow us to move this fast
… there's worry among some TF folks that the op set we're working on (in WebNN?) won't keep up with development
… the Model Loader API, on the other hand, could add another model format in the future

Raviraj: no further comments

Model Loader API - new issues

Integration of media capture transform

ningxin_hu: I'll give a high level overview.
… We investigated whether we could have a full GPU-only pipeline, leveraging new web platform developments.

[prototype] Video processing with insertable streams main thread version

ningxin_hu: The WebRTC group came up with the media capture transform API to expose frames for processing

[prototype] Video processing with insertable streams worker version

ningxin_hu: Your transformer can get a video frame and controller.

Details of the processing pipelines in issue #226

ningxin_hu: VideoFrame is a new interface in WebCodecs.

VideoFrame interface in WebCodecs spec

ningxin_hu: You can transform the video frame into a new one.
… How can WebNN help the use case of video transformations?
… One requirement is to keep data on the GPU as much as possible, to avoid transfer between GPU and CPU
… The transform itself can be done in multiple ways: WASM, WebGL, WebGPU shaders
… We're trying background blur as an example use case.
… It can be solved with shaders, WebGL or WebGPU.
… but segmenting background from foreground is an ML problem
… WebNN and Model Loader are also interesting approaches.
… I tried experimentally implementing a processing pipeline, combining WebGPU and WebNN graph APIs
… In the GitHub issue, you can read about the details.
… Maybe a WebGPU buffer can be the graph input.
… We ran a model with the WebNN graph API, which outputs a WebGPU buffer with the result.
… It labels pixels as background or object.
… We use the VideoFrame API with the results and inject the new frame into the media processing pipeline.
… The background-blurred frames can be used for local playback or WebRTC API calls, like video conferencing.
… That's the summary of the experiment
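
For reference, the plumbing described above looks roughly like this with the mediacapture-transform (insertable streams) API; runSegmentationAndBlur is a placeholder for the WebNN or Model Loader step:

    // Breakout box: MediaStreamTrackProcessor exposes camera frames as
    // VideoFrames; MediaStreamTrackGenerator re-injects the processed frames.
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    const [track] = stream.getVideoTracks();
    const processor = new MediaStreamTrackProcessor({ track });
    const generator = new MediaStreamTrackGenerator({ kind: 'video' });

    const transformer = new TransformStream({
      async transform(frame, controller) {
        // Placeholder for the ML step: segment background vs. foreground
        // (ideally keeping the tensor in a GPU buffer) and blend a blurred
        // background into a new VideoFrame.
        const blurred = await runSegmentationAndBlur(frame);
        frame.close();
        controller.enqueue(blurred);
      }
    });

    processor.readable.pipeThrough(transformer).pipeTo(generator.writable);
    // The generator is itself a video track: usable for local playback
    // (e.g. a <video> element) or a WebRTC call.
    const output = new MediaStream([generator]);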

Honglin_Yu: This is great work. Supporting this will be necessary for both APIs. It's also what we're thinking, inspired by your work.
… We should put the computation on the GPU and use video frames.
… One question: besides insertable streams, is there any other way to achieve GPU-only processing of video frames?

ningxin_hu: Good question. To my knowledge, a video frame or element could be imported into WebGPU and processed with a shader. Probably Rafael knows this domain better than me.

RafaelCintron: Ningxin is correct. Yes, you can take a video element and import it into WebGL or WebGPU. The current spec has a copy operation for WebGL; WebGPU has a direct import.
… People from Intel are working on that. It's possible with zero copies.

ningxin_hu: Honglin mentioned thinking about Model Loader taking a video frame as input and producing one as output.
… Some of the work in background blur is rendering: you blur and blend the original and blurred images into a new frame.
… These tasks could be in the scope of Model Loader or done as operations.

Honglin_Yu: I'm only thinking about it. I haven't tried to implement it yet.
… Maybe the output will be a mask, and the developer then has to composite the original frame with the mask.
… In that case, the output mask would still be on the GPU. Would that work?

ningxin_hu: This seems like one way that could work.
… We can follow up in the GitHub issues.

Honglin_Yu: We could also output a WebGPU buffer, or store the tensor in a WebGPU buffer so developers can do whatever they want.
… We're in early stages.
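
As a thought experiment only, the mask variant discussed here might look like the sketch below; none of these overloads exist today:

    // Hypothetical: the model accepts a VideoFrame and returns a segmentation
    // mask that stays on the GPU as a WebGPU buffer.
    const mask = await model.compute(videoFrame);  // assumed GPUBuffer result
    // The page then composites in a WebGPU render pass, sampling the mask to
    // choose between the sharp original and a blurred copy per pixel, and
    // wraps the result in a new VideoFrame for the media pipeline.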

Device Preferences: GPU, power, usage, performance hints

DevicePreference: "GPU" => "GpuRequired" and "GpuPreferred"

"default" v.s. "auto" in MLDevicePreference and MLPowerPreference

Add performance/usage Hints

anssik: It makes sense to discuss these issues together.

Honglin_Yu: We don't need to reach conclusions today. For TensorFlow, it may be the case that some operators are supported on the GPU and others are not.
… We may need a device preference for GpuRequired, which returns an error if any operators are not supported.
… If GpuPreferred, it can fall back to CPU.
… If we change the default to an auto option, the implementation can decide which backend to use.
… It's tricky to expose what accelerators are available.
… It's potentially more granular and specific than WebGPU.
… which has privacy concerns.
… An auto option allows supporting more backends without the developer needing to make decisions.
… Hints may be friendlier or easier to use.
… Hints can inform the choice of backend.
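
Putting the linked proposals side by side, context creation might look like the sketch below; the "gpu-required", "gpu-preferred", and "auto" spellings are under discussion, not in the spec:

    // Current spec: MLDevicePreference is "default" | "gpu" | "cpu".
    // Proposed refinements (illustrative spellings only):
    const strict = navigator.ml.createContext({ devicePreference: 'gpu-required' });
    //   -> errors if any operator cannot run on the GPU
    const lenient = navigator.ml.createContext({ devicePreference: 'gpu-preferred' });
    //   -> falls back to CPU for unsupported operators
    const handsOff = navigator.ml.createContext({
      devicePreference: 'auto',       // let the implementation pick the backend
      powerPreference: 'low-power',   // hint; vocabulary mirrors WebGPU's
    });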

anssik: This is an area where WebNN and Model Loader could share a surface, if hints make sense

-> Device selection with MLDevicePreference and MLPowerPreference #169 https://github.com/webmachinelearning/webnn/issues/169

anssik: The values of the hints could be aligned.

RafaelCintron: With regards to auto, there's precedent. It helps avoid fingerprinting. Give a wish list of things, and let the browser pick the backend.
… There needs to be a required aspect too. If the browser chooses CPU instead of GPU, that won't work for background blur.
… For the advanced user, the required mode should be available.

Honglin_Yu: Agreed, we should support requiring CPU or GPU. Maybe we should change the default option to auto.
… Hints about video, image, or canvas might let the backend smartly choose.

RafaelCintron: Like a foreshadowing of what the web developer wants to do in the future with the results.

Honglin_Yu: When we first implemented, we made Model Loader default to Auto before we looked at the spec.
… It's an initial thought.

anssik: What can we learn from mobile or desktop apps, and uses outside of the web context?
… What has worked there?

qjw: In my experience using ML on Windows, I think auto-configuration is somewhat new to the web, because we don't want to expose all the information to the developer.

anssik: This topic is of interest for WebNN as well. We'll keep this group updated.

RafaelCintron: Yes, on Windows, native APIs let you ask for the specifics of the backend. That won't work for TAG reviews. It leads to a lot of fingerprinting risk.
… For the Web, request parameters or a dictionary of required or optional parameters might make sense.
… That's a more web-y approach: let the browser decide.
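
The WebGPU precedent alluded to here, plus a hypothetical ML analogue of the required/optional split (the ML dictionary shape is invented for illustration):

    // Precedent: WebGPU lets a page express hints and hard requirements,
    // while the browser picks the actual adapter and device.
    const adapter = await navigator.gpu.requestAdapter({
      powerPreference: 'high-performance',   // a hint
    });
    const device = await adapter.requestDevice({
      requiredFeatures: ['shader-f16'],      // a hard requirement
    });

    // Hypothetical ML analogue (not in any spec): required entries fail
    // loudly, optional ones are best-effort, backend choice stays internal.
    const context = navigator.ml.createContext({
      required: { device: 'gpu' },
      optional: { powerPreference: 'low-power' },
    });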

anssik: Starting small and then expanding is easier on the web.

Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).

Diagnostics

Succeeded: s/in review 3520923,/in review

Maybe present: Anssi, anssik, Honglin, jonathan, qjw, Rafael, RafaelCintron, Raviraj