WebML WG Teleconference – 18 June 2026

Meeting minutes

Anssi: please join me in welcoming the latest new participants to the WG:
… Severin Ferrand from Google
… Yehonatan Daniv from Wix.com
… Julien Bataille from Rakuten Group, Inc.
… welcome all!

Announcements

TPAC 2026

Anssi: as discussed, TPAC 2026 takes place in Dublin, Ireland on 26-30 October 2026
… I have requested Monday, 26 October 2026, for this Working Group, to be confirmed by TPAC planners
… I have also started a F2F agenda issue where I invite the group to share their thoughts on potential agenda topics for the TPAC F2F meeting:

webmachinelearning/meetings#39

<gb> Issue 39 WebML WG/CG F2F Agenda - TPAC 2026 (Dublin, Ireland) (by anssiko)

Anssi: the expectation is we use the F2F to have both discussions on the shorter-term issues as well as longer-term horizon with new features for future directions
… we also provide space for sharing demos and implementation updates with the broader community

Anssi: I have planned the Community Group meeting for Tuesday, 27 October 2026, so if you attend TPAC you may want to consider attending that as well to connect with the broader community
… we operate in the same space and there is a lot of overlap between the two groups, TPAC is a prime opportunity for cross-pollination of ideas between the groups
… I expect the TPAC group schedule to be finalized later this month, I will share more information as soon as it becomes available

Anssi: questions, comments?

New charter proposal

Anssi: I want to share an update on the new charter proposal for this Working Group
… as you may have noticed, the WebMCP proposal being developed in the WebML CG has generated excitement and momentum

webmachinelearning/webmcp

Anssi: I added the WebMCP deliverable to the WebML CG last year, and I'm pleased to see the progress that has been made by the WebML CG, and the excitement WebMCP has generated within the broader web community
… WebMCP is one of the fastest growing incubations in the history of the W3C's Community Group (CG) program
… the W3C CEO and senior leadership team are supportive of evolving the WebML WG charter to include WebMCP as a deliverable
… the next step is to finalize the formal charter proposal document that outlines the proposed changes to the charter, including the addition of WebMCP as a deliverable, and submit it for review by the W3C membership
… the draft charter proposal is available for you to review:

https://github.com/w3c/charter-drafts/pull/829/changes

<gb> Pull Request 829 [wg/webmachinelearning] Adds WebMCP in scope (by plehegar)

Anssi: as an important deliverable, I expect active discussion in the W3C community, a healthy part of the process
… I welcome your feedback on the draft charter proposal on our next call
… in the meantime, if you have any immediate thoughts or questions about the draft charter proposal, please feel free to share them now or reach out to me directly

Web Neural Network API

Repository: webmachinelearning/webnn

WebGPU interop

Anssi: this topic is to discuss WebGPU interop and expected spec changes
… I have invited Bryan to the call who has signaled interest in this topic and has been actively involved in the WebGPU interop discussions
… the MLTensor Explainer discussed WebGPU interop:

https://github.com/webmachinelearning/webnn/blob/main/mltensor-explainer.md#webgpu-interop

Anssi: we have a few "webgpu interop" related issues:

"webgpu interop" issues

Anssi: issue #529 for WebNN timelines

<gb> Issue 529 Specify WebNN timelines (by a-sully) [webgpu interop]

… another issue #343 for WebGPU interop sample code

<gb> Issue 343 Add sample code for WebGPU interop (by huningxin) [editorial] [webgpu interop]

Anssi: are there any other spec changes expected either in the WebNN or WebGPU spec, or both, to enable WebGPU interop?
… have we ported over all the necessary WebGPU interop bits from the MLTensor explainer to the WebNN spec?

Reilly: we have a Chromium prototype; the JS API itself is implemented to some extent, and the platform-specific bits, e.g. HW buffers, are most mature on macOS and are being implemented on Windows and Linux with LiteRT
… it is currently in the explainer only, WebGPU interop changes have not been ported over to the WebNN spec
… Bryan made a change to make the export feature synchronous to improve pipelining

Bryan: I agree with Reilly, the explainer is out of date, a lot of people want to use this feature and would like to see this well specified

<RafaelCintron> +1 to specifying it more formally and updating the explainer.

<ningxin> +1

<Mike_Wyrzykowski> +1

Bryan: more formal specification would be helpful, does the group support specifying WebGPU interop parts in the spec?

<dwayner> +1

Ningxin: about the WebGPU interop sample, I shared a demo at TPAC 2025, I recommend moving that sample to the group's webnn-samples repo

<reillyg> +1

<ningxin> +1

<RafaelCintron> +1

RESOLUTION: The group acknowledges the importance of WebGPU interop and supports porting over the necessary features to the WebNN API spec from the MLTensor explainer.

Reilly: Bryan mentioned some difficulties, was that about undocumented buffering requirements maybe?

Bryan: today, when you import a buffer, the format each side of the APIs should use is implicit

Reilly: these could be formally documented in the spec
… when implementing this on Core ML there are some limitations in terms of data types and shapes for input buffers
… we partially worked around these, probably an open question, something to be exposed through opSupportLimits

Bryan: there was another issue about device selection that breaks interop, we need to clarify how that will work

Reilly: there's a version of context creation that takes a GPUDevice
… implicit hint to pick a device that is efficient
… some platforms may ignore this hint
… looking at ML inference frameworks, GPUs seem to like tensors in certain shapes

Bryan: we have to come up with guidelines for this

Reilly: can you open a new GH issue for this Bryan?

<ningxin> +1 to open an issue about gpudevice passing

Bryan: will do that, filing separate issues

Effective MLComputePolicy

Anssi: issue #934

<gb> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection] [Agenda+]

Anssi: last time we received an update on the Dynamic AI Offloading Protocol (DAOP) incubation by Jonathan
… we had a limited timebox for DAOP, so on this call, I wanted to share some additional information about the explainer to inform the effective MLComputePolicy discussion

The estimateQoS() API
… DAOP explainer contained estimateQoS() API for performance negotiation
… using the estimateQoS() API the web developer can get a performance tier estimate for executing the graph on the local device, which can help them make informed decisions about offloading to the cloud or executing locally

dictionary MLQoSReport {
  MLPerformanceTier performanceTier;
};
partial interface MLContext {
  Promise<MLQoSReport> estimateQoS(MLGraph graph, optional MLQoSOptions options);
};
dictionary MLQoSOptions {
  // Input characteristics
  record<DOMString, MLOperandDescriptor> inputDescriptors;
  // Weights characteristics (Optional)
  boolean weightsSparsity = false;
};

Performance Tiers

Anssi: performance tiers are represented as Tier strings to avoid fingerprinting and to allow implementations to evolve the exact tier boundaries based on their specific hardware capabilities and performance characteristics:

Anssi: Tier / Indicative Latency / Interpretation
… "excellent" / < 16 ms / Real-time (60 fps frame budget)
… "good" / < 100 ms / Interactive responsiveness
… "fair" / < 1 s / Responsive for non-real-time tasks
… "moderate" / < 10 s / Tolerable for batch or one-shot tasks
… "slow" / < 30 s / Noticeable wait
… "very-slow" / < 60 s / Long wait
… "poor" / ≥ 60 s / Likely unacceptable for most use cases
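As a rough illustration of the table above, the indicative latencies can be read as exclusive upper bounds checked from fastest to slowest. The TypeScript sketch below is not taken from the DAOP explainer: the helper name tierForLatency and the bounds table are hypothetical, and the thresholds are only the indicative values listed above, which an implementation would be free to change.

```typescript
// Tier names as listed in the minutes.
type MLPerformanceTier =
  | "excellent" | "good" | "fair" | "moderate"
  | "slow" | "very-slow" | "poor";

// Indicative exclusive upper bounds in milliseconds, fastest tier first.
// Assumption: these mirror the indicative table only; the actual
// boundaries are implementation-defined.
const INDICATIVE_BOUNDS_MS: ReadonlyArray<[MLPerformanceTier, number]> = [
  ["excellent", 16],
  ["good", 100],
  ["fair", 1000],
  ["moderate", 10000],
  ["slow", 30000],
  ["very-slow", 60000],
];

// Hypothetical helper: maps a latency estimate to the first tier whose
// bound it falls under, and to "poor" at 60 s or more.
function tierForLatency(latencyMs: number): MLPerformanceTier {
  for (const [tier, boundMs] of INDICATIVE_BOUNDS_MS) {
    if (latencyMs < boundMs) {
      return tier;
    }
  }
  return "poor";
}
```

Reporting only the tier string, rather than the latency itself, is what keeps the exposed value coarse.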

Anssi: the important point to note is that the exact tier boundaries are implementation-defined
… the explainer also provides an "Adaptive Background Blur" example to make the API more concrete:

https://github.com/webmachinelearning/daop/tree/main#example-code-adaptive-background-blur

Anssi: in this example, if the tier is one of "excellent", "good", "fair", or "moderate" the graph is executed locally, otherwise the graph is offloaded to the cloud
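The decision rule described for this example can be sketched as a small pure function. This is a minimal TypeScript sketch, not the explainer's code: chooseExecutionTarget and ExecutionTarget are hypothetical names, and the tier is passed in directly because estimateQoS() is a proposal rather than a shipped API.

```typescript
// Tier names as listed in the minutes.
type MLPerformanceTier =
  | "excellent" | "good" | "fair" | "moderate"
  | "slow" | "very-slow" | "poor";

// Hypothetical result type for this sketch.
type ExecutionTarget = "local" | "cloud";

// Runs locally for the four faster tiers and offloads otherwise,
// following the rule stated for the Adaptive Background Blur example.
function chooseExecutionTarget(tier: MLPerformanceTier): ExecutionTarget {
  switch (tier) {
    case "excellent":
    case "good":
    case "fair":
    case "moderate":
      return "local";
    default:
      return "cloud";
  }
}
```

In a page, the tier would presumably come from the performanceTier member of the MLQoSReport that the proposed estimateQoS() resolves with.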

Boolean Requirement API

Anssi: this meetsRequirement() API returns asynchronously and via events a "can meet requirement" boolean response given a tier expectation:

partial interface MLContext {
  Promise<boolean> meetsRequirement(MLGraph graph, MLPerformanceTier requiredTier, optional MLQoSOptions options);
};
interface MLQoSChangeEvent : Event {
  readonly attribute boolean meetsRequirement;
};
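The boolean answer can be thought of as a comparison between an estimated tier and the required tier. The TypeScript sketch below shows only that comparison, assuming the tiers are totally ordered as listed in the minutes; tierMeetsRequirement and TIER_ORDER are hypothetical names, and the proposed API itself is asynchronous and event-driven, which this sketch does not model.

```typescript
// Tier names as listed in the minutes.
type MLPerformanceTier =
  | "excellent" | "good" | "fair" | "moderate"
  | "slow" | "very-slow" | "poor";

// Assumed ordering, fastest first.
const TIER_ORDER: readonly MLPerformanceTier[] = [
  "excellent", "good", "fair", "moderate", "slow", "very-slow", "poor",
];

// Hypothetical helper: true when the estimated tier is at least as fast
// as the required tier.
function tierMeetsRequirement(
  estimated: MLPerformanceTier,
  required: MLPerformanceTier,
): boolean {
  return TIER_ORDER.indexOf(estimated) <= TIER_ORDER.indexOf(required);
}
```

Exposing only this boolean reveals less about the device than the tier itself, since the page learns one bit per question it asks.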

Anssi: with these DAOP insights in mind, I want to circle back to the effective MLComputePolicy discussion in issue #934

<gb> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection] [Agenda+]

Anssi: the following proposals were discussed last time:
… - 1) compilation metrics & runtime estimates by MikeW
… - 2) low latency vs. high throughput tradeoff implications by Dwayne
… - 3) strict hints to fail at build by MarkusH
… - 4) "low-latency" and "precision" hints by Dwayne

Anssi: looking at the recent comments, I see an exchange between Ningxin and MikeW about runtime drift considerations
… Ningxin acknowledged MikeW's point that system load, thermal state, and accelerator contention do change over time
… Ningxin proposed an intendedComputePolicies API, a compile-time signal to address MarkusH's use case "if fallback, route to cloud":

webmachinelearning/webnn#934 (comment)

<gb> Issue 934 Effective MLComputePolicy exposure (by anssiko) [policy selection] [Agenda+]

Ningxin: I want to find a middle ground, MikeW makes a good point that runtime characteristics are dynamic, timestamp query would be one approach
… my point was that for an application, for MarkusH's use case, the intended compute policy is the best result the context can deliver against the user's hint, decided at MLGraphBuilder.build() and stable for the MLGraph's lifetime
… the API could have some signal to let the application know the compile-time intention for graph execution plan, we make it clear this could change at runtime, if runtime monitoring is needed, other solutions need to be used

Anssi: the proposed API for this is as follows:

interface MLGraph {
  // Compile-time assignment, not a runtime guarantee.
  readonly attribute FrozenArray<MLComputePolicy> intendedComputePolicies;
};
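To make the "if fallback, route to cloud" use case concrete, here is a minimal TypeScript sketch of how an application might read such an attribute. It rests on several assumptions: the attribute is only a proposal, GraphLike and shouldRouteToCloud are hypothetical names, policies are modeled as plain strings, and "fallback" is used as a policy value only because it is the term used in this discussion.

```typescript
// Hypothetical stand-in for an MLGraph carrying the proposed attribute.
// Policies are plain strings here; the real MLComputePolicy values are
// not assumed.
interface GraphLike {
  readonly intendedComputePolicies: readonly string[];
}

// Returns true when the compile-time plan includes a fallback policy.
// This reflects the intention decided at build time, not a runtime
// guarantee.
function shouldRouteToCloud(graph: GraphLike): boolean {
  return graph.intendedComputePolicies.some((policy) => policy === "fallback");
}
```

Because the value is fixed at build time, an application that also cares about runtime drift would need a separate monitoring mechanism.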

MikeW: the association of fallback with a guaranteed CPU path seems questionable, because there can be multiple CPU paths
… hints for web apps to make a decision is a wrong API shape
… the current proposal seems to be that hints can be translated to performance characteristics, it would be better to provide performance characteristics, for instance, latency, how many milliseconds bucketized, runtime execution, bucketized

Reilly: MikeW, in the scenario where we are providing bucketed performance characteristics, how does that interact with how the graph is constructed at a later stage?
… how much of a guarantee is this, and how much could sites use this information?

MikeW: you can poll the API, or the granularity of the metrics will be made less precise

Rafael: question to MikeW, is your proposal to have the UA take the model and guess how long it takes to run and provide that information before inference?
… and during runtime, tell how long it took?

MikeW: I don't have preference and would be fine with either of these options

Rafael: the second solution seems a better fit, the guessing approach, how would you implement that on Core ML?

MikeW: we can guess when we compile the model, you get back information from Core ML, and can guess based on this, consider it a rough hint

Rafael: if all ops are running on the GPU the guess would be "fast"
… I wonder how well that would work in practice, how good a guess our "guess" would be

Reilly: I think that guessing is not required, you can run the model, and in the heat of loading the benchmark would be worse than at runtime
… possible to implement, yes, but not clear on the details
… related discussion for CPU Performance API on blink-dev
… multiple pipelines could submit to hardware, changes are dynamic, if performance is not adequate could switch to another compute unit
… applications are capable of dynamic scaling, but in ML models, loading an ML model is a significant upfront cost, loading and compiling both are costly
… the purpose of an API like this is to really allow the developer to pick something to do initially that is close to correct
… the developer may look at this performance information and see it is "mid-tier" and it is much more than they need, and they could download a bigger model in the background, while the app is running fine using a smaller model
… the space seems to me like solving the "page load time problem": how to decide, in a reasonable amount of time, what to load initially to allow the page to function
… all the solutions discussed don't solve that problem

Ningxin: revisiting MarkusH's use case, I observe it is not only about model inference performance, but also about avoiding data copies
… the use case is about noise suppression: audio comes from the CPU and the application does not want to move the work to another compute unit
… this is something I suppose the app wants to avoid
… also Markus mentions the need to get the execution to be scheduled to a low-latency enabling compute unit
… I want to add to MikeW's proposal, if we have performance characteristics, could we also indicate if there is a data copy?
… e.g. add low-latency signal?

MikeW: what Ningxin and Reilly said sounds good, both ideas are valuable and we should consider real use cases such as MarkusH's

Discrepancies with maxPool2d padding behavior

Anssi: issue #935

<gb> Issue 935 Discrepancies with maxPool2d padding behavior (by philloooo) [operator specific] [Agenda+]

Anssi: Phillis reports that maxPool2d with roundingType "ceil" produces inconsistent results for different backends when pooling windows cover only padding elements

https://www.w3.org/TR/webnn/#dom-mlroundingtype-ceil

Anssi: questions posed to the group:
… - "What is the expected behavior when a pooling window covers only padding elements for maxPool2d?"
… - "Should the spec clarify whether padding elements are treated as -Infinity (ignored) or if there is a fallback value like 0 when no valid input elements are in the window?"

Anssi: Ningxin proposed "Ignoring padding elements for maxPool2d makes sense to me. A window covering only padding would also be ignored."
… Dwayne +1'd "ignoring padding elements for maxPool2d"
… and suggested that "a window covering only padding" needs some value for the pool() op, either 0 or -Infinity

Ningxin: for maxPool2d, if we ignore the window, the value does not matter; we should make the spec text clear that a window covering only padding elements is also ignored, and we should change the output size calculation to adjust the output size with stride and padding if a window only covers the padding elements

Dwayne: I mentioned in general, you can't sample outside window and padding size, making this point moot

Ningxin: in my proposal "ignore padding elements" if the window covers partially, this is implementation-defined, spec can say ignore this
… the ONNX reference implementation produces NaN for this case

RESOLUTION: Ignore padding elements for maxPool2d and adjust output size when a window covers only padding. (issue #935)

<Mike_Wyrzykowski> +1
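One possible reading of this resolution can be sketched in one dimension. The TypeScript below is an illustration under stated assumptions, not spec text: ceilPoolOutputSize and maxPool1dCeil are hypothetical names, the "ceil" output size formula is the one commonly used by ML frameworks, and only trailing windows that start past the last real input element are dropped.

```typescript
// Output length for 1-D pooling with "ceil" rounding, minus any trailing
// windows that would cover only ending padding.
function ceilPoolOutputSize(
  inputSize: number, windowSize: number, stride: number,
  padBegin: number, padEnd: number,
): number {
  const paddedSize = inputSize + padBegin + padEnd;
  let outputSize = Math.ceil((paddedSize - windowSize) / stride) + 1;
  // In padded coordinates, real input occupies [padBegin, padBegin + inputSize).
  // A window starting at or after the end of that range holds no real element.
  while (outputSize > 0 && (outputSize - 1) * stride >= padBegin + inputSize) {
    outputSize--;
  }
  return outputSize;
}

// 1-D max pooling that skips padding positions instead of giving them a value.
function maxPool1dCeil(
  input: number[], windowSize: number, stride: number,
  padBegin: number, padEnd: number,
): number[] {
  const outputSize = ceilPoolOutputSize(
    input.length, windowSize, stride, padBegin, padEnd);
  const output: number[] = [];
  for (let o = 0; o < outputSize; o++) {
    const start = o * stride - padBegin; // window start in input coordinates
    let best = -Infinity; // only survives if the window has no real element
    for (let k = 0; k < windowSize; k++) {
      const i = start + k;
      if (i >= 0 && i < input.length) {
        best = Math.max(best, input[i]);
      }
    }
    output.push(best);
  }
  return output;
}
```

With input [5, 1, 7, 3], window 2, stride 3 and padding 1 on each side, the unadjusted ceil formula yields three windows, the last of which would contain no input element; the adjusted size is two and the result is [5, 7].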

– DRAFT –