Meeting minutes
Repository: webmachinelearning/webnn
Anssi: please join me in welcoming the latest new participants to the WG:
… Yoav Weiss from Shopify
… Fidel Tian from Zoom
… Ali Spivak from Google
Web Neural Network API
Lower limit for conv2d/pool2d kernel sizes, dilations, strides
Anssi: issue #928
<gb> Issue 928 Consider specify lower limit for conv2d/pool2d kernel sizes, dilations, strides (by philloooo) [security-tracker] [Agenda+]
Anssi: I'd like to have a group discussion on the edge cases reported in the issue that may cause integer overflows
… and also discuss the proposed mitigations to be codified in the specification
… Phillis reports she found edge cases in pool2d and conv2d when "unreasonably large" sizes for the parameters are used
… this may cause integer overflows in some backends
… Phillis notes in real-life use cases these parameters never have such large values
… and thus proposes as mitigations parameter upper bounds that are "reasonable for all use cases"
… this helps with fuzz testing by limiting scope
… initial proposal suggests the following size limits:
… kernel size 1024
… strides and dilations 256
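As a hedged sketch only, the initially proposed bounds could be enforced at graph-build time roughly like this; the limit constants come from the proposal above, while the `Conv2dParams` shape and the `validateConv2dParams` helper are illustrative assumptions, not spec text:

```typescript
// Hypothetical upper bounds from the initial proposal (not spec-mandated).
const MAX_KERNEL_SIZE = 1024;
const MAX_STRIDE = 256;
const MAX_DILATION = 256;

// Illustrative parameter bag; the real API spreads these across MLConv2dOptions.
interface Conv2dParams {
  kernelHeight: number;
  kernelWidth: number;
  strides: [number, number];
  dilations: [number, number];
}

// Returns an error message for the first out-of-range parameter, or null if all pass.
function validateConv2dParams(p: Conv2dParams): string | null {
  if (p.kernelHeight < 1 || p.kernelHeight > MAX_KERNEL_SIZE ||
      p.kernelWidth < 1 || p.kernelWidth > MAX_KERNEL_SIZE) {
    return `kernel size must be in [1, ${MAX_KERNEL_SIZE}]`;
  }
  for (const s of p.strides) {
    if (s < 1 || s > MAX_STRIDE) return `strides must be in [1, ${MAX_STRIDE}]`;
  }
  for (const d of p.dilations) {
    if (d < 1 || d > MAX_DILATION) return `dilations must be in [1, ${MAX_DILATION}]`;
  }
  return null;
}
```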
Phillis: Dillon, who works on TFLite, proposed checks to catch "fishy" parameter sizes
<tarek> Sorry I am late.
Dwayne: I agree with limits at a high level, but I want to be cautious about picking arbitrary ranges that we merely feel should be safe
… express in terms of inputs instead, some of our samples e.g. SD use more than the initially proposed limits
… want to be careful to not break things with these limits
… Dillon's statements are in terms of input sizes and as such relative metrics
Anssi: "these rules would catch some valid graphs, but [...] unlikely that "real" graphs would violate these checks"
Dwayne: would help to have collection of cases where we have int overflows in backends
Anssi: Phillis, can that information be shared in public?
Phillis: will check with the team
Ningxin: I don't yet have an update from the investigation on my side
MarkusT: should WebNN do this, or should the individual backends? Could we expose the underlying limits of the backend, and would that cause fragmentation?
Dwayne: I'll review Dillon's checks with additional information provided
conv2d output channels validation
Anssi: issue #925
<gb> Issue 925 Mandate `output_channels % groups == 0` validation for conv2d (by lynne0326) [security-tracker] [Agenda+]
Phillis: Lynne from Google works on WebNN
Anssi: let's discuss the issue and review the proposed spec change
Phillis: this issue also triggers some errors in the backends, and we'd like to add the proposed validation for conv2d to the spec
… proposed spec change: "The specification must mandate that the total number of output channels is a multiple of the groups attribute."
Anssi: motivation: "This eliminates an entire class of potential buffer overflows before the graph is ever compiled."
Anssi: as future work, add the same validation to convTranspose2d if it gains support for groups > 1, which could otherwise cause an OOB write
<DwayneR> I think this restriction should be okay.
Ningxin: I think this is a good addition; looking at the conv2d groups attribute, it already mentions channels are divided into groups, so we just need to add this to the validation steps, +1 from me
MarkusT: it is a good change, +1 from me
Anssi: Dwayne +1'd
RESOLUTION: Add the proposed mitigation to conv2d spec to eliminate potential buffer overflows. (issue #925)
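The resolved check is a one-line divisibility test; as a minimal sketch (the `isValidConv2dGrouping` helper name is hypothetical, not spec text):

```typescript
// Resolved conv2d validation (issue #925): the total number of output
// channels must be an integral multiple of the groups attribute.
function isValidConv2dGrouping(outputChannels: number, groups: number): boolean {
  return Number.isInteger(outputChannels) && Number.isInteger(groups) &&
         groups >= 1 && outputChannels >= 1 &&
         outputChannels % groups === 0;
}
```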
MLComputePolicy naming
Anssi: PR #923
<gb> Pull Request 923 Refactor device selection: Rename to computePolicy, remove accelerated, and add fallback (by mingmingtasd) [device selection] [Agenda+]
Anssi: we are close to landing this PR
… the remaining thing we need to agree on is naming
… MikeW approved with "fallback" as the name
… I see Dwayne's "compatible" alternative name has merit and warrants discussion
… the staged PR has the following names for the MLComputePolicy:
enum MLComputePolicy {
"default",
"high-performance",
"low-power",
"fallback"
};
Anssi: the proposed alternative is "fallback" -> "compatible":
enum MLComputePolicy {
"default",
"high-performance",
"low-power",
"compatible"
};
Anssi: the Web Platform Design Principles document has a section called "Naming principles" to help spec authors choose names:
https://
Anssi: I put these two names, "fallback" and "compatible", through this test to inform the naming decision:
… - 12.1 Use common words - both "fallback" and "compatible" are readable US English
… - 12.2 Use ASCII names - both "fallback" and "compatible" pass this test with flying colors
… - 12.3 Consult others on naming - "fallback" is the more common name on the web platform
"fallback" name in WebGPU and CSS
"compatMode" name in DOM
… - 12.4 Use names that describe a purpose - "compatible" seems better aligned with TAG principles
… TAG says "Name things for what they do, not how they do it."
… "fallback" tells how to do it ("use fallback device")
… "compatible" describes behavior ("use device that provides maximum compatibility")
… - 12.5 Name things consistently - "compatible" seems more consistent within MLComputePolicy names
… TAG says "Naming schemes should aim for consistency, to avoid confusion"
… "compatible", "high-performance" and "low-power" all describe what is to be prioritized
… "fallback" describes how to do it
… the non-scientific TAG naming principles test score, using equal weight for all tests:
… - "fallback" 3/5 - fails tests 12.4 and 12.5
… - "compatible" 4/5 - fails test 12.3
… if we give more weight to naming consistency within the web platform APIs it is a tie
… my gut feeling is folks who try WebNN are also familiar with WebGPU
… I see preference from Dwayne and Zoltan for the "compatible" name, Markus seems indifferent?
MarkusH: I was wondering, as a follow-up, it would be nice to have an effective compute policy on Graph.devices
… slight preference for "fallback"
Zoltan: the original issue connected to this PR used "fallback" as the name
… the argument for "fallback" then was that WebGPU also uses it, but WebGPU ties it to an adapter concept that does not exist in WebNN
… so adapter and ComputePolicy do not go together so well
… still, a "fallback" ComputePolicy is not too far from WebGPU's fallback adapter
… just adding more considerations
<RafaelCintron> +1
Rafael: we prefer "fallback" to keep aligned with WebGPU naming
Dwayne: I can live with "fallback"
MarkusT: should we have a priority list of devices?
<zolkis> we started long ago with a priority list
Zoltan: we have an explainer with background discussion
RESOLUTION: Use MLComputePolicy "fallback" as the name for maximum compatibility preference. (PR #923)
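For illustration, the resolved enum values can be captured as a TypeScript union type; the names come from the staged PR and the resolution above, while the `isComputePolicy` guard is purely illustrative and not part of the API:

```typescript
// MLComputePolicy names as resolved in PR #923.
type MLComputePolicy = "default" | "high-performance" | "low-power" | "fallback";

const COMPUTE_POLICIES: readonly MLComputePolicy[] =
  ["default", "high-performance", "low-power", "fallback"];

// Hypothetical type guard for validating untrusted input against the enum.
function isComputePolicy(value: string): value is MLComputePolicy {
  return (COMPUTE_POLICIES as readonly string[]).includes(value);
}
```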
Anssi: other enhancements from this PR, to be addressed in a separate PR:
… - "low-latency" MLComputePolicy, example use case audio processing
… - "precision" MLContextOptions, to signal chopping off low bits is not preferred for this context
Bounded dynamic dimension
Anssi: issue #883
<gb> Issue 883 Support flexible input sizes (by huningxin) [feature request] [operator specific] [Agenda+]
Anssi: thanks Ningxin and Markus for providing new information for this consideration
… first, ORT Web does not currently support dimension bounds
… Ningxin proposed to add freeDimensionBounds similar to freeDimensionOverrides in WebNN EP session options, Guenther was pinged
Ningxin: I will catch up with Guenther
Anssi: Markus notes some backends can accept any shape size, so the min/max bounds should be considered hints
… this is backend-specific; it's a question to the ONNX team how to handle this in a compatible manner across all backends
Ningxin: two aspects: one is the ONNX Runtime API, the other is the EP interface; for the EP interface we can talk with a person on the ONNX team together with Rafael
… the SOTA image generation model Z-Image-Turbo cannot be supported by only having dynamic input dimensions, as in the previous proposal shared on Feb 11:
webmachinelearning/
<gb> https://github.com/webmachinelearning/webnn/issues/883
Anssi: to address this limitation, Bin and Wanming prototyped 9 new ops to enable Z-Image-Turbo successfully:
+ MLOperand mod(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
+ MLOperand shape(MLOperand input, optional MLOperatorOptions options = {});
+ MLOperand range(MLOperand start, MLOperand limit, MLOperand delta, optional MLOperatorOptions options = {});
+ MLOperand dynamicReshape(MLOperand input, MLOperand newShape, optional MLOperatorOptions options = {});
+ MLOperand dynamicExpand(MLOperand input, MLOperand newShape, optional MLOperatorOptions options = {});
+ MLOperand dynamicSlice(MLOperand input, MLOperand starts, MLOperand ends, optional MLOperatorOptions options = {});
+ MLOperand dynamicPad(MLOperand input, MLOperand pads, optional MLOperatorOptions options = {});
+ sequence<MLOperand> dynamicSplit(MLOperand input, MLOperand splits, unsigned long numOutputs, optional MLOperatorOptions options = {});
+ MLOperand dynamicResample2d(MLOperand input, MLOperand sizes, optional MLDynamicResample2dOptions options = {});
Ningxin: I want to get the group's feedback on these new proposed ops
… dynamic size calculation required within the graph, initial proposal allows developer to define dynamic dimensions at input, placeholder with name and bounds
… this new model and other Transformers.js models require shape calculation, using pad-to-chunks algorithm
… later we want to pad the input to multiple chunks, padding size needs to be calculated at runtime
… must get shape at runtime and return tensor shape in another tensor, int64 or uint32
… tensor size used by other calculations to calculate e.g. padding to chunk size value
… only static padding supported currently
… dynamic size should be accepted, that's why we proposed dynamicPad
… 9 proposed ops, one missing, sample element-wise unary op
… they are dynamic versions of existing ops, e.g. dynamicReshape
… range is a tensor generation op
… Z-Image-Turbo has three models
… this proposal adds 9 new ops on top of the previous proposal; both are required
… we can validate tensor shapes for static ops, but with the dynamic version of an op you don't know the output tensor size at build time
… next step to explore dispatch-time shape validation when all tensors' sizes are specified or can be inferred
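The pad-to-chunks arithmetic Ningxin describes can be sketched as a pure function; the `padToChunks` name and signature are illustrative assumptions, since in the actual proposal this value would be computed inside the graph at runtime (via `shape`, arithmetic ops such as `mod`, and `dynamicPad`) because the dimension size is unknown at build time:

```typescript
// Illustrative pad-to-chunks calculation: given a runtime dimension size and a
// chunk size, return how much padding rounds the dimension up to a whole
// number of chunks. In the proposal this arithmetic lives in the graph itself
// (shape -> arithmetic -> dynamicPad) rather than in host code.
function padToChunks(dimSize: number, chunkSize: number): number {
  if (chunkSize < 1 || dimSize < 0) throw new RangeError("invalid sizes");
  const remainder = dimSize % chunkSize;
  return remainder === 0 ? 0 : chunkSize - remainder;
}
```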
Anssi: multiple new models require these new features?
Ningxin: correct
MarkusT: thank you Ningxin for this work
Anssi: everyone happy with the direction and the proposed next steps?
Rafael: I'm supportive of this work if we can do this, given some platforms are more challenging wrt dynamic shapes
MarkusT: some NPUs may have problems supporting this, in those cases GPU or CPU could be used as a fallback
Ningxin: I think fallback compute policy provides a solution to that issue