Meeting minutes
Repository: webmachinelearning/webnn
anssik: please welcome Aditya Chhabra, Drew Morris, Adnaan Nazir, Josi Rosenfeld, Eugen Thaci, Jeffrey Phillips Freeman from CleverThis Inc. to the WebML WG and CG!
… CleverThis is a tech company with interest in open standards and responsible and explainable AI
Eugen: I'm assisting Jeffrey in this WG
anssik: my expectation is CleverThis participants will be interested in contributing to the Ethical Principles work we have in this WG
anssik: also, please welcome Kevin Petit from ARM Limited to the WebML WG
… and finally, please welcome Jonathan Schneerson representing Temporal Series AI, a startup working on AI-driven solutions for financial markets, to the WebML CG!
Incubations
anssik: our next CG meeting is at EU and APAC friendly time on Mon 28 April:
https://
anssik: the Community Group is adding a new deliverable Proofreader API to its scope
https://
anssik: please join on Mon if you're interested in the new local inference web extension experiment proposed by Tarek/Mozilla for consideration as a new CG deliverable
… we'll also discuss Prompt API feature requests: multi-modal real-time capabilities, Model Context Protocol support
… built-in AI APIs received early wide review, added shared privacy and security considerations, and i18n expert feedback helped address identified i18n issues with the language detector
WG-CG collaboration
anssik: to encourage broader participation, I pushed this WG meeting forward by 1 hour
… an optional proposal is to reuse certain CG meetings for both WebNN and Built-in AI discussions, available options:
… AMER-APAC friendly meetings ~monthly Tue 17:00 PDT / Wed 08:00 CST
… EU-APAC friendly meeting ~monthly Mon 9:00 CEST / Mon 15:00 CST
anssik: the standards group (WG) and the incubation group (CG) have different IPR policies, but if the active participants are in both we can make this happen, you can join at:
https://
anssik: finally, the intent is not to create more meetings, but reuse the existing meeting infra more flexibly
… questions, feedback, thoughts?
Christian: +1 for the EU friendly slot
Ningxin: both slots are Shanghai friendly, so good with both
<jsbell> For me: neutral; whether a discussion is useful in a given meeting depends who will be present, so knowing agenda and predicted attendance could make it work
MikeW: Tue 17:00 PDT usually works
BlinkOn 20 takeaways
anssik: BlinkOn 20 happened earlier this month
… I'd like to discuss takeaways relevant to this group
… we have representatives from non-Chromium engine projects in this group, so I felt it is helpful to update everyone on discussions relevant to our cross-browser spec efforts
… and thanks to BlinkOn organizers for publicly sharing videos from the sessions
… I found the following two talks particularly relevant to the WebML WG and CG:
Compute Abstraction for AI: Wasm, WebGPU, and WebNN
Expandable Built in AI: Opening the Vision with Shared AI Models
… in the agenda I mapped a few BlinkOn discussion topics to our spec issues
… I likely missed some relevant talks, so please fill me in
… for WebNN:
… - Expose available hardware → device selection
… - Ahead-of-time compilation → MLGraph caching
… - NPU differences & model portability → opSupportLimits future work?
… - Hybrid execution → best-effort buffer-sharing with tensors and constants
<jsbell> 3 more discussions on the "AI track":
<jsbell> Exploring Challenges with Cross Origin Resource & Model Sharing
<jsbell> Built in AI: One Year Later
<jsbell> WebGPU in Blink Discussion
jsbell: three other talks that are relevant
… cross-origin sharing that is experimental
… another built-in AI talk
… third one was WebGPU talk relevant for the interop aspects
… the broader audience did not know the specifics of compute APIs in general, so the talks were on a higher level of abstraction
anssik: for built-in AI APIs:
… - Common model format & ops → (paused) Model Loader API, core operator set
… - Model sharing → built-in models exposed to low-level compute APIs?
… - Built-in model transparency → Model Cards integration?
MikeW: WebKit contributors meeting is in the Fall
Etienne: my key takeaway was AI is everywhere, inference on the web, a lot of feedback on Kenji's and my talk about built-in AI APIs
… Prompt API with structured output gathered interest, differences between browsers, but we think developers can work around that
… shared AI models has potential to solve same-origin cache issue
Operator specific issues
anssik: today we'll review and discuss operator specific issues that reduce code complexity and improve maintainability
anssik: the `pad` operator has a few such open issues I wanted to discuss next
Clarify constraints for pad operation
anssik: issue #377
<gb> Issue 377 Need clarify constraint of 'beginningPadding' and 'endingPadding' for pad operation in "reflection" and "symmetric" mode (by BruceDai) [operator specific] [interop]
anssik: to elaborate, the issue is about the need to clarify the constraints on 'beginningPadding' and 'endingPadding' for the pad operation in "reflection" and "symmetric" modes
https://
anssik: in the issue discussion the following platform APIs and frameworks have been reviewed:
… - DML_PADDING_OPERATOR_DESC
… - tf.mirrorPad
… - torch.nn.ReflectionPad2d behavior
… and recently Phillis shared Core ML constraints for its "reflect" and "replicate" modes that map to MLPaddingModes "reflection" and "edge" respectively
… and per Ningxin's experiment, without limiting the padding size, TFLite gives different results than DirectML
… it looks like the latest proposal is to limit the padding size
… this proposal was supported by both Dwayne and Ningxin
Dwayne: correct
Ningxin: +1
anssik: rationale is the following, I think:
… - this is a safer option
… - allows for future extension to support extended wrapping if we identify models that require that feature to perform well
… - as a bonus, this behaviour could be emulated
… do we have an agreement to proceed with limiting the padding size?
<jsbell> SGTM!
anssik: no concerns, editors are free to proceed with the proposed solution
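To make the agreed proposal concrete, here is a minimal plain-JavaScript validation sketch. The exact limits are an assumption drawn from the Core ML constraints cited in the issue (input size minus 1 for "reflection", input size for "symmetric"); the function name and error type are illustrative only, not spec text:

```javascript
// Hypothetical validation for the "limit the padding size" proposal.
// Assumed limits: "reflection" can mirror at most inputSize - 1 elements
// (it excludes the edge element); "symmetric" at most inputSize (it
// includes the edge). "constant" and "edge" are left unrestricted here.
function validatePadding(inputShape, beginningPadding, endingPadding, mode) {
  for (let i = 0; i < inputShape.length; i++) {
    const limit = mode === 'reflection' ? inputShape[i] - 1
                : mode === 'symmetric'  ? inputShape[i]
                : Infinity;
    if (beginningPadding[i] > limit || endingPadding[i] > limit) {
      throw new TypeError(
          `padding ${beginningPadding[i]}/${endingPadding[i]} exceeds ` +
          `limit ${limit} for mode "${mode}" on dimension ${i}`);
    }
  }
}
```

Limiting rather than wrapping keeps the door open for the extended-wrapping future work mentioned above.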
<jsbell> For #739, I think the big question is: can we drop "symmetric" ?
<gb> Issue 739 Limited support for pad on CoreML backend (by philloooo) [operator specific] [interop]
Limited support for pad on CoreML backend
anssik: issue #739
… this issue reports findings from Core ML backend implementation, specifically constraints wrt MLPaddingMode equivalent:
https://
anssik: we touched some of this in the previous issue
anssik: Phillis reports from Core ML:
… 1. `symmetric` mode is not supported
… 2. padding for more than the last two dimensions only supports `constant` mode
… 3. If mode is `reflect` (aka `reflection`) then beginning and ending paddings can be at most the input size minus 1
… 4. If mode is `replicate` (aka `edge`) then beginning and ending paddings can be at most the input size
… we discussed 3 and 4 in context of issue #377
<gb> Issue 377 Need clarify constraint of 'beginningPadding' and 'endingPadding' for pad operation in "reflection" and "symmetric" mode (by BruceDai) [operator specific] [interop]
anssik: for 1 and 2 the question is can they be emulated?
… the issue also contains proposals for Core ML pad future work
… MikeW, have there been any updates to Core ML in this regard? How can we share this feedback with your Core ML team?
MikeW: I can take the suggestions to the Core ML team
jsbell: based on research it looks like only one backend supports symmetric, so I suggest dropping it from the spec and adding it back if it turns out to be needed
Dwayne: I'm OK with it, I don't recall a model that used it
… OK dropping symmetric, another possibility is to have opSupportLimits support for modes
… if we can just drop symmetric and emulate edge case it is OK
ningxin: if there are no well-known use cases I'm fine dropping it to simplify implementation, if use cases are identified later we can reconsider
<Joshua_Lochner> https://
Joshua_Lochner: I can help identify which models use that; I can provide additional information from the Transformers benchmarking database on models that use symmetric padding
… right now it only reports that a model uses the pad operator; I can expand it to report which mode is used
Dwayne: data on this welcome in #739
<gb> Issue 739 Limited support for pad on CoreML backend (by philloooo) [operator specific] [interop]
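For readers less familiar with the padding modes under discussion, here is a plain-JavaScript sketch of 1-D padding semantics. This is illustrative only, not the WebNN algorithm as specified; the constant value is assumed to be 0:

```javascript
// Illustrative 1-D padding. `begin`/`end` are the number of padded
// elements at each side; out-of-range indices are resolved per mode.
function pad1d(input, begin, end, mode) {
  const n = input.length;
  const at = (i) => {
    if (i >= 0 && i < n) return input[i];
    switch (mode) {
      case 'edge':       // clamp to the nearest edge element
        return input[Math.min(Math.max(i, 0), n - 1)];
      case 'reflection': // mirror around the edge, excluding it; needs pad <= n - 1
        return input[i < 0 ? -i : 2 * n - 2 - i];
      case 'symmetric':  // mirror including the edge; needs pad <= n
        return input[i < 0 ? -i - 1 : 2 * n - 1 - i];
      default:           // 'constant', assumed value 0
        return 0;
    }
  };
  const out = [];
  for (let i = -begin; i < n + end; i++) out.push(at(i));
  return out;
}
```

For input [1, 2, 3] with two elements of padding on each side, "reflection" yields [3, 2, 1, 2, 3, 2, 1], "symmetric" yields [2, 1, 1, 2, 3, 3, 2], and "edge" yields [1, 1, 1, 2, 3, 3, 3]. This also makes the two constraints above visible, and suggests "symmetric" could be emulated by concatenating reversed slices if it is dropped from the spec.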
Tensors for graph constants
<gb> MERGED Pull Request 830 Allow tensors for graph constants. (by bbernhar)
<gb> CLOSED Issue 760 Support building graphs from `MLTensor` containing constants (by bbernhar) [feature request]
anssik: over the course of the last two weeks all the suggestions in the PR were addressed, PR reviewed and merged, thank you everyone
… thanks Bryan for the PR, Bryan and Austin for prototyping, others for reviews and help
… from the IDL perspective, the change is the following:
interface MLContext {
+ Promise<MLTensor> createConstantTensor(
+     MLOperandDescriptor descriptor, AllowSharedBufferSource inputData);
  // ...
};

interface MLTensor {
  readonly attribute FrozenArray<unsigned long> shape;
  readonly attribute boolean readable;
  readonly attribute boolean writable;
+ readonly attribute boolean constant;
  undefined destroy();
};

interface MLGraphBuilder {
  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor descriptor, AllowSharedBufferSource buffer);
  // Create a scalar operand from the specified number of the specified type.
  MLOperand constant(MLOperandDataType type, MLNumber value);
+ // Create an operand from a specified constant tensor.
+ MLOperand constant(MLTensor tensor);
  // Compile the graph up to the specified output operands asynchronously.
  Promise<MLGraph> build(MLNamedOperands outputs);
};
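A sketch of how a page might use the new API shape. The descriptor values, the matmul op, and the helper name are illustrative assumptions, not part of the PR:

```javascript
// Sketch only: build a graph whose weights come from an MLTensor
// created via createConstantTensor(). `context` is an MLContext and
// `builder` an MLGraphBuilder; shapes are illustrative.
async function buildWithConstantTensor(context, builder, weightData) {
  // Upload the weight data once; the resulting MLTensor reports
  // `constant === true` and can back constants in a graph.
  const weights = await context.createConstantTensor(
      { dataType: 'float32', shape: [2, 2] }, weightData);
  // The new overload: create a graph constant from the tensor.
  const w = builder.constant(weights);
  const x = builder.input('x', { dataType: 'float32', shape: [2, 2] });
  const y = builder.matmul(x, w);
  return builder.build({ y });
}
```

The same constant tensor could presumably back constants in more than one graph, which matches the merged-model use case Ningxin mentions below.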
anssik: anything specific to report about this PR?
Dwayne: constants gives the backend an opportunity to optimize data layout, as it is not changing in the future
jsbell: question, what are the plans for ORT Web to take advantage of this?
ningxin: we could have that in our plan, we have a use case for a merged model, two subgraphs sharing constants as weights
… will update the group when available
Core operator set
anssik: #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
anssik: recently Arm Ltd folks joined the WG (welcome!) so I wanted to reinvigorate this effort that aims to identify current primitive gaps by mapping compositional fundamentals (e.g. PyTorch prims, TOSA, StableHLO) to WebNN operators
… Dwayne has produced a table to help with this mapping:
Machine Learning Operator Mapping - All Raw Operators
anssik: in particular I wanted to discuss if the TOSA mappings identified as part of this exercise have any open questions attached to them that ARM participants could help address
… for reference, the MLIR TOSA dialect implements the TOSA specification:
anssik: I note some of the recently added Wave 3 WebNN ops have not been mapped to TOSA yet in this table:
transpose, triangular, where, tile, sign, scatterNd, gatherNd, gatherElements, scatterElements, cumulativeSum, quantizeLinear, dequantizeLinear, logicalAnd, logicalOr, logicalXor, notEqual
anssik: it looks like some of these map quite directly to the TOSA 1.0.0 spec but it'd help if someone working closely with TOSA could take a look and provide feedback in this issue?
Dwayne: I remember talking with NVIDIA folks that they're planning to simplify TOSA, so have been waiting for that
ningxin: I want to note issue #817 about rounding op
<gb> Issue 817 Rounding operators (by fdwr) [feature request] [interop]
ningxin: use case for ONNX Runtime Web decomposition, missing rounding op support, rounding should be part of the core op set
<jsbell> +1 to adding rounding (already gave issue thumbs up)
MikeW: with the rounding, the behaviour should be consistent across platforms and backends
Dwayne: Ningxin tested round on Core ML, how about different compute units?
ningxin: that'd be good, I need to investigate how to do that
jsbell: I talked with Phillis about what you're requesting regarding compute units
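To illustrate the cross-platform consistency concern with a rounding op, here is a plain-JavaScript sketch of two common tie-breaking rules a backend might implement (illustrative; not tied to any specific backend or to issue #817's eventual design):

```javascript
// Platforms commonly disagree on ties (x.5): round-half-away-from-zero
// vs round-half-to-even ("banker's rounding"). Note JS's own Math.round
// rounds ties toward +Infinity (Math.round(-2.5) === -2), so we route
// through Math.abs to get half-away-from-zero.
function roundHalfAwayFromZero(x) {
  return Math.sign(x) * Math.round(Math.abs(x));
}

function roundHalfToEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1; // tie: pick the even neighbor
}
```

For example, 2.5 rounds to 3 under half-away-from-zero but to 2 under half-to-even, which is exactly the kind of divergence a spec-level rounding definition would have to pin down.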
jsbell: Dwayne, one of the things you found in your analysis was expanding conv beyond 2D to make it N-dimensional
… do you see models that require that?
<ningxin> whisper uses conv1d
Dwayne: we see conv1d cases
jsbell: Joshua_Lochner, perhaps you can add conv1d to your script?
Joshua_Lochner: can you send me a message with details and I'll follow up, I'm updating the script so everyone can run it locally
Caching mechanism for MLGraph
anssik: issue #807
<gb> Issue 807 Caching mechanism for MLGraph (by anssiko) [question] [feature request]
anssik: we made a decision at our last meeting to create a new explainer that aligns with the Chromium implementation direction, Reilly is working on that
… meanwhile, we've received very initial Chromium implementation feedback via Shiyi (thanks!)
<gb> Pull Request 227 [DO NOT SUBMIT] Model cache POC (by shiyi9801)
anssik: there's also an example how this caching feature integrates into an existing image_classification webnn-sample from Shiyi (thanks again!):
anssik: basically the web developer first stores a key locally and replaces the build() line with caching logic wrapped in a try...catch block
… the logic first tries to loadGraph() and if that fails falls back to build() followed by saveGraph()
… here's the code snippet:
try {
  console.log("try to load graph...");
  this.graph_ = await this.context_.loadGraph(this.modelCacheKey_);
  console.log("load graph succeed!");
} catch (e) {
  console.log("failed to load graph: ", e.message, " try to build graph...");
  this.graph_ = await this.builder_.build({'output': outputOperand});
  await this.context_.saveGraph(this.modelCacheKey_, this.graph_);
}
anssik: this example looks clear, any feedback or suggestions?
… Ningxin, any feedback from ORT backend model cache experimentation, e.g. does this WebNN feature fit in with ORT Web?
ningxin: I've discussed with Wanming; we'd keep it simple from the start: provide the key so the WebNN EP does what the sample shows above
… want to have a prototype for this, earlier we discussed if we should bring native ORT EPContext to the web, but that's more complex
… we rethought that and want to go with the simpler solution, and will let the group know