Meeting minutes
Repository: webmachinelearning/webnn
Anssi: please welcome the new participants to the WG:
… Gregory Terzian as an Invited Expert, working on WebNN Servo implementation
… Tarek Ziade from Hugging Face, working on RustNN implementation, transitioning from Mozilla
Anssi: Chelsea Kohl from Flying Okole
… Joan Leon and David Mulder as individual contributors
… welcome all!
… I'm pleased to see new diverse implementation experience
… this broader implementation feedback continues to further strengthen and validate the WebNN API spec design
Web Neural Network API
MLComputePolicy-driven device selection
Anssi: PR #923
<gb> Pull Request 923 Refactor device selection: Rename to computePolicy, remove accelerated, and add fallback (by mingmingtasd) [device selection]
Anssi: we'll review the PR that implements the resolution:
Anssi: last time we resolved to convert the MLComputePolicy IDL proposal into a spec PR
… Mingming has submitted a PR #923 for review that refactors the device selection preference API to establish a more extensible framework by replacing MLPowerPreference with MLComputePolicy in MLContextOptions
… thank you Mingming for this contribution
… key changes proposed:
… - renamed preference enum MLPowerPreference -> MLComputePolicy
… - clarified extensibility
… - removed "accelerated" option
… - introduced "fallback" policy
… there's a corresponding Chromium CL providing implementation experience:
https://
Anssi: spec PR review was requested from Rafael, Dwayne, Ningxin and Reilly
… any questions or comments?
Zoltan: this expresses the developer's preference when the context is created
… I gave it a thumbs up
<RafaelCintron> I will take a look but it looks good to me so far.
Zoltan: up to MikeW and Markus to say whether they're happy
<handellm> Looks good, I'll follow up on the PR
Bounded dynamic dimension
Anssi: issue #883
<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
Anssi: next we'll discuss the proposed new bounded dynamic dimension mechanism for MLDynamicDimension
… MarkusT suggested to add minSize to complement maxSize
… maxSize use case is to inform the implementation of memory allocation
… minSize use case would be similar, Dwayne gave thumbs up
… Ningxin proposed minSize to be optional with 1 as default
… Ningxin conducted a survey for bounded dynamic dimensions, notes "supported by CoreML, TensorRT and OpenVINO"
… but notes an ONNX model cannot represent dimension bounds
… also JS ML frameworks don't support dimension bounds
… an ONNX model cannot pass dimension bounds info to EPs
… MarkusT notes min/max bounds are just hints
… this allows implementations to accept any size
Ningxin: I got the task to create the PR, we're still exploring framework support, specifically ORT Web, looking to define session options, e.g. freeDimensionBounds
… similar to freeDimensionOverrides
… with this mechanism we can satisfy the WebNN change proposal, freeDimension will require maxSize
… we're prototyping this and making good progress, testing with various models
… on the WebNN backend side, MarkusT shared we can inform the underlying runtime via some mechanism, but haven't yet prototyped that and will explore it
… another open issue is, TF.js models are more flexible wrt free dimensions. e.g. shape op can capture tensor shape at runtime followed by e.g. gather and squeeze to construct a shape at inference time
… and also have dynamic ops like reshape, where new shape is defined by dynamic tensor, like a dynamic slice
… not sure yet if we should introduce such new ops, their output would be defined by tensor rather than something that can be inferred at model build time
… we have WebNN tensor element count validation at build time, with dynamic ops it is challenging to do shape inference and validation at build time
Ningxin: I can update the issue accordingly
Tensor element count limit
<gb> Pull Request 926 Specify element count limit (by philloooo)
<gb> Issue 924 Consider specifying tensor element count limit (by philloooo) [feature request] [Agenda+]
Anssi: a proposal from Phillis to spec tensor element count limit
… WebNN API currently reports the maximum supported length of tensors in bytes via MLOpSupportLimits.maxTensorByteLength
https://
Anssi: but tensor element count is not exposed and there are apparently use cases that require both maxTensorByteLength and total element count
… Chromium implementation already checks for the tensor element count limit
https://
Anssi: PR #926 updates the "check dimensions" algorithm to test that the tensor element count is within the range of long, ~2B elements
Dwayne: this seems reasonable, in some cases element count can be different from max byte count, e.g. something fits in memory but exceeds max indexing size for the API
… we say int32 max is the limit for this
Dwayne: I did not see any dissent about this in the issue or PR discussions
Rafael: I think this is fine for now, later on we may add a context attribute so developer knows what the limit is
… it helps to give web developers less than what the implementation actually has
Dwayne: opSupportLimits could have that as a complement
Security considerations for out-of-bounds access
Anssi: we have the following inline issue in the security considerations:
… "Document operations susceptible to out-of-bounds access as a guidance to implementers"
https://
Anssi: with further implementation experience across backends I believe the group is in a good position to make progress with this issue
… the proposed path forward is to document ops the group believes are susceptible to out-of-bounds access
… and propose mitigations as a guidance to implementers
… the mitigations are most likely implementation-defined to allow each implementation to use the mitigation techniques best suited for them, thus generic language is appropriate
https://
Rafael: what's in the spec seems fine to me, the browser should be secure by default
<ningxin> +1
Rafael: I consider all ops as important in terms of security
Ningxin: I think the spec already has text about bounds, for e.g. scatter and gather we have text that talks about OOB mitigation, how to use clamp to sanitize
… another related thing, some ops are more compute intensive than others e.g. conv* may allocate a buffer for layout as an optimization
… such a buffer may be larger than the conv weights, for example
… in such cases we should do overflow checks, not sure if WebNN can impose that, it would be implementation-defined check
… to make sure the pointer does not overflow
… web spec should have non-normative section to inform the implementations
MikeW: in general, anything resulting in undefined behaviour is not good, we should, as best we can, limit undefined behaviour, have default action
… e.g. in WebGPU an OOB read would still be defined, and you get not zero but some random in-bounds data instead
https://
MikeW: this section was produced based on looking at the three major graphics APIs and observing how they behave
<RafaelCintron> https://
<DwayneR> Evidently Metal and DirectML have the same clamping behavior (clamped to buffer length rather than returning 0).
Rafael: WebGL has similar text to what happens when you index OOB
… you get a value within the buffer or all zeros; it won't be able to access data that belongs to another domain
<DwayneR> +1
<Mike_Wyrzykowski> +1
RESOLUTION: Open a meta issue to discuss out-of-bounds access issue in general and other security considerations.
WPT runners for testing on GPU devices
Anssi: in this topic, I wanted to discuss possible WPT runner improvements for GPUs
… and also check if our WebGPU folks have learnings from the WebGPU CTS work
<gb> Issue 2252 Export or move the WebGPU CTS into WPT (by kainino0x)
Anssi: I provided as background material a cost estimate from the RustNN perspective drafted by Tarek
… I expect us to be in the same ballpark for the other implementers
… Tarek's cost estimate is 6k USD / year for RustNN across Linux/Windows/macOS if usage is capped to 1 hour/day
… if we add Chromium into the mix, and relax the daily cap, we'd multiply that by some factor
GitHub-hosted GPU runners cost estimate for rustnn
Tarek: renting a box would probably be cheaper than renting GPUs per time unit
… if there are many projects that would use this infra, e.g. also WebGPU, would be good to know if partners would like to share their e.g. H100 GPU clusters to run these tests
Ningxin: I think we don't have dedicated GPU bots in Chromium, we use software only
Tarek: we also may want to consider multi-GPU environments, moving data from one GPU to another, testing with WebNN and WebGPU together
Anssi: I will check if W3C has budget for this WPT infra improvement
… I think this would be a sponsorship opportunity for interested W3C members
Rafael: I think Kai's post notes the WebGPU CTS is its own island, and Kai thinks the WebGPU CTS should be moved into WPT; WebGL is the same, its CTS being part of the Khronos Group
Tarek: Mozilla has boxes that could be used, I can also ask Hugging Face if they'd be interested in donating
… we want to have more projects to use this testing infra
<tarek8> +1
RESOLUTION: Inform the W3C team about the suggested WPT runner infra improvements to support GPUs. Share cost estimates to understand the sponsorship opportunity for W3C members.
<tarek8> servo fork: rustnn/