W3C

– DRAFT –
WebML WG Teleconference – 24 March 2022

24 March 2022

Attendees

Present
Anssi_Kostiainen, chai, Chai_Chaoweeraprasit, Daniel_LaLiberte, Dominique_Hazael-Massieux, Geun-Hyung, Geunhyung_Kim, James_Fletcher, Jonathan_Bingham, Ningxin_Hu, Ping_Yu, Rachel_Yager, Rafael_Cintron
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik, dom

Meeting minutes

Announcements

Ethical Principles for Web Machine Learning

10-minute intro video

Repo for Ethical Principles for Web Machine Learning

james: the consultation doc has become a draft Note, now migrated to GitHub
… it solidifies the principles (based on UNESCO's) and the guidance attached to them
… the guidance makes the principles more concrete
… risks & mitigations are the next step - they'll be even more concrete and will need input from all of you
… either via github, or through the live sessions week of Apr 4 which Dom is running

anssi: thanks, amazing work
… I invite you all to watch James' intro video

Security considerations - last call for review

PR: Update Security Considerations

anssi: I've been bugging you all to make progress on this work - security is a key component of our wide review commitment
… thanks for the feedback received which I've incorporated in PR #251

anssik: changes per review feedback:
… - Drop avoid reshaping tensors guideline
… - Note shape inference is done during the graph building stage
… - Note concrete device selection is left to the implementation
… - Update timing attack mitigation considerations

anssi: should we go ahead and merge this for another round of review?

dom: +1

<ningxin_hu> +1

anssi: let's circle back with the Chrome review and then bring it to W3C security review
… in any case, this is a first draft we'll keep iterating on, as with the rest of the document

Context-based graph execution methods for different threading models

Context-based graph execution methods for different threading models

anssi: #257 is the pull request on which we agreed to converge during our previous meeting

chai: update from last time: there has been additional feedback and we're converging on the idea of making the default device option more explicit
… with the default being the CPU
… this should resolve the problem of determining when the sync method can be used
… Further progress has been made on the interop with WebGPU
… Rafael, Brian and I have spent quite a bit of time discussing this
… we're converging on a somewhat more abstract layer that would be friendlier to implementations based on Vulkan on Linux or CoreML
… this hasn't been brought to the PR yet
… it comes down to having the context take the WebGPU queue and populate the ML workload into that queue
… which would then be submitted and executed asynchronously
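
A rough sketch of the interop direction Chai describes might look like the following (the method shapes, e.g. `createContext` taking a `GPUDevice`, are illustrative assumptions, not the agreed API):

```js
// Hypothetical sketch of the WebGPU interop direction discussed above;
// method names and shapes are assumptions, not the converged design.
const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();

// Create the ML context from the WebGPU device so both share one queue
// (assumed overload, for illustration only).
const mlContext = navigator.ml.createContext(gpuDevice);

// Building and executing the graph would record the ML workload onto the
// WebGPU queue; the work runs asynchronously once the queue is submitted.
gpuDevice.queue.submit([/* command buffers incl. the ML workload */]);
```
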

Rafael: chai has been doing a great job pushing this forward

anssi: I notice Brian shared comments in PR #255 with a response from ningxin - not sure if that was incorporated

chai: yes, he's on board with our converged direction

Device selection with MLDevicePreference and MLPowerPreference

anssi: re device selection, #169 is an existing issue on the topic

dom: I think Chai summarized my point re device selection
… it has been an explicit choice to make device selection a hint, for privacy reasons
… I'm thinking defaulting to CPU might actually be OK
… if you ask for a CPU you'll get a CPU-backed context
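
As a sketch, the hint discussed here could be expressed at context creation; the member names follow the MLDevicePreference / MLPowerPreference drafts, and defaulting to "cpu" is the direction under discussion, not a settled decision:

```js
// Illustrative only: member names and the "cpu" default reflect the
// discussion above, not a finalized API.
const context = navigator.ml.createContext({
  devicePreference: 'cpu',    // explicit CPU request → CPU-backed context
  powerPreference: 'default'  // "default" vs "auto" still being debated
});
```
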

"default" v.s. "auto" in MLDevicePreference and MLPowerPreference

anssi: model loader is also considering reusing the device preference and power preference from our spec, with similar questions

ningxin_hu: thanks to Chai, Rafael and Brian for helping move the design forward
… I've added a comment based on my investigation of the GPU-only processing pipeline
… the WebGPU/WebNN interop plays a very important role there, to make sure data stays on the GPU for efficient processing
… I've left some questions in PR #255 on resource sharing and execution order
… I look forward to the updated PR

Ningxin's response based on prototype investigation findings (in context of the older PR #255)

ningxin_hu: my scenario is that WebGPU compute shaders are used for pre- and post-processing around the WebNN compute
… my comment has pseudo-code to illustrate that usage in the WebGPU/WebNN background blur

anssi: let's land this in our spec before seeking WebGPU formal review

Integration with real-time video processing

Video processing with insertable streams main thread version

anssi: ningxin_hu has developed a media capture transform-based pipeline for background blur, one version using WebGL, the other using WebGPU/WebNN, the latter requiring a prototype implementation of WebNN in Chromium

Video processing with insertable streams worker version

Details of the processing pipelines in issue #226

ningxin: this follows up on Dom's suggestion to investigate the integration of media capture transform (a processor/generator design pattern that allows video processing via VideoFrame objects), as illustrated in webrtc-samples, both on the main thread and in a worker
… the demo I created builds on these two examples to construct a pipeline that uses only the GPU for ML processing, with background blur as the example
… As I explain in my description of the pipeline, it requires several steps:
… - get a gpu buffer for a video frame
… - a shader to blur an image
… - a step to segment the input image, separating the background from other objects, using machine learning
… - based on the segmentation map that annotates background/foreground, another shader blurs the background and leaves the foreground alone
… - this produces an output texture that can be drawn on an OffscreenCanvas to produce a VideoFrame fed into the media capture transform generator
… that can be used for video playback or for sending over a WebRTC connection
… I implemented that pipeline with WebGL: a WebGL shader plus the WebGL TF.js backend for segmentation with the TF.js DeepLabV3 model (it can be used on many object classes, but this example focuses on background detection)
… this serves as a baseline
… I initially wanted to make a WebGPU-only pipeline, but the WebGPU backend of TF.js has issues for segmentation (which I reported to the TF.js team)
… once that is resolved, I'll be able to update the sample with the WebGPU-only version
… The final pipeline I published uses WebGPU shaders with a WebNN segmentation, which illustrates the interop with WebGPU
… WebNN uses the output of the WebGPU shader as input to the compute
… the output of WebNN compute is fed into another WebGPU shader for post-processing (blending input image with output segmentation map)
… In this prototype, I still use the existing MLGraph.compute that takes GPUBuffer as input/output (I found some related issues, documented in PR #255)
… as I alluded to before
… I also ran into issues: the WebGPU backend of TF.js limitation; a WebGPU implementation bug in Chromium; and, on entry-level GPUs, the WebGL pipeline can freeze the browser UI
… Dom commented on the CPU usage and asked whether moving from the main thread to a worker would help reduce it
… it shouldn't, based on my observations
… the worker approach will mostly help with running sync versions of the frameworks
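
The per-frame pipeline described above can be sketched roughly as follows (every helper name here is an illustrative placeholder, not the actual sample code):

```js
// Rough per-frame sketch of the WebGPU/WebNN background blur pipeline;
// all helpers are placeholders for the steps listed above.
async function processFrame(videoFrame) {
  // 1. Get a GPU texture/buffer for the incoming VideoFrame
  const inputTexture = importFrameToTexture(gpuDevice, videoFrame);

  // 2. Segment background vs. foreground with WebNN (DeepLabV3),
  //    consuming the WebGPU resource as input
  const segmentationMap = await runSegmentation(mlGraph, inputTexture);

  // 3. A WebGPU shader blurs the background and blends in the unblurred
  //    foreground, guided by the segmentation map
  const outputTexture = blurAndBlend(gpuDevice, inputTexture, segmentationMap);

  // 4. Draw to an OffscreenCanvas and wrap as a VideoFrame for the media
  //    capture transform generator (playback or WebRTC send)
  return textureToVideoFrame(canvas, outputTexture);
}
```
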

RafaelCintron: congratulations ningxin_hu - this is really informative
… being able to use seamlessly the GPU buffer from WebGPU to WebNN is a great signal

anssi: +1 - getting interop right is harder but critical

dom: thanks for this amazing work, really impressive
… I wonder if you have a sense whether this estimated CPU usage (~20% with WebGPU and ~40% with WebGL) is what we should expect, or whether there are optimizations to be done

ningxin_hu: re CPU usage, it probably depends on device drivers & implementations
… I didn't observe a big difference in CPU usage between the two pipelines on my own setup

ningxin_hu: in terms of CPU usage, looking at WebGPU samples, they trigger similar CPU usage
… e.g. when running the image blur sample
… so this is probably a question to discuss with WebGPU folks

ningxin: I observed significant performance benefits using WebNN

dom: did you get a sense whether WebNN contributes to the better perf of the WebNN/WebGPU pipeline?

ningxin_hu: yes

ningxin_hu: the WebGL pipeline runs into freezes of the browser UI, in which case the FPS goes down to 0 or 1
… with an entry-level GPU
… the WebGPU/WebNN pipeline runs at over 10 FPS in that situation

RafaelCintron: if you run your analysis on Windows, we can trace the source of the CPU usage

ningxin_hu: that would be great - it is on windows; I'll follow up with you

<Zakim> dom, you wanted to ask pixel formats/color space

dom: in issue #226 we discussed that VideoFrame does not let you pick the pixel format/color space you get from the camera - does that impact the shaders?

ningxin_hu: I noted that discussion in the thread
… in my implementation, I use the WebGPU copy-to-external-texture path, which turns it into RGB format
… at this stage, this hasn't been a problem
… importExternalTexture will be a good next step, to explore its performance implications
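
For illustration, the importExternalTexture path mentioned here is a zero-copy alternative to the copy-based import; how the sample would actually bind it is an assumption:

```js
// importExternalTexture wraps the VideoFrame without an intermediate copy;
// the binding/sampling details below are assumptions about the sample.
const externalTexture = gpuDevice.importExternalTexture({ source: videoFrame });
// The external texture can then be bound in a bind group and sampled
// directly from the blur shader, avoiding the RGB copy step.
```
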

anssi: this started from a request from the WebRTC WG - is it time to report back to that WG?

dom: good question - I'll bring this up with the WebRTC WG chairs on our regular call next week, and suggest we present the outcome of this prototyping on the WebRTC WG's mid-April call
… we're pushing the limits of many bleeding edge web features in development with this prototyping

Candidate Recommendation proposed new features

WebNN should support int8 quantized models

WebNN should support int8 quantized models

anssik: looking for input on int8 support as part of our first release of WebNN

chai: +1 - quantized int8 model support for CR is important due to new NPUs coming to the market; this would otherwise be a serious shortcoming
… it also impacts device selection re NPUs
… they should be considered together

anssik: not hearing pushback, I'll mark it for CR

WebNN / WebGPU interop

WebGPU issue: WebNN / WebGPU interop

anssik: what are remaining investigation items once #257 lands?

ningxin_hu: my plan is to review Chai's update PR #257 and prototype that in Chromium, and then update the samples accordingly and that would be a good milestone for discussion with WebGPU WG

chai: I think one of the WebGPU topics is how the Vulkan/Linux community and Apple (CoreML) will implement it; when we push for WebGPU review, I assume we'll get explicit review from those communities

RafaelCintron: in the WebGPU CG, Apple is present and attends all meetings
… there are no Vulkan reps in the CG, but people who are familiar with it are participants

Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).
