Meeting minutes
Anssi: please welcome the new participants who joined the WG:
… Christopher Hebert from NVIDIA
… Ninja.zx Zhangxiao from ByteDance
… Jaewon Lee from Google
… welcome all!
Incubations
Dynamic AI Offloading Protocol
Repository: webmachinelearning/daop
Anssi: PR #1
<gb> Pull Request 1 Add DAOP explainer and estimateQoS() illustration with background blur demo (by jonathanding)
Anssi: today, we will review and approve the initial PR as the baseline for further incubation
… to recap the timeline:
… in January, the group resolved to create an explainer for the Dynamic AI Offloading Protocol (DAOP) and initiate prototyping
Anssi: in February, Jonathan delivered an explainer and a prototype and the group reviewed the proposal
2026-02-12 review feedback
… today, the proposed next step is to approve merging the initial PR to allow further incubation of this proposal in the WebML CG
… the WebML WG will review the progress of this CG incubation from time to time and will, separately from today's decision, resolve whether to adopt the feature into the WebNN spec after an adequate incubation period
Anssi: any comments or questions?
<Mike_Wyrzykowski> +1
<ningxin> +1
<zolkis> +1
RESOLUTION: Approve the initial Dynamic AI Offloading Protocol (DAOP) PR as the baseline for further incubation in the WebML CG.
WebNN Graph DSL and Portable File Format
Repository: webmachinelearning/proposals
Anssi: issue #16
<gb> Issue 16 Standardize a WebNN Graph DSL and Portable File Format (by tarekziade)
Anssi: for this topic I want to review and discuss use cases for the new proposal, WebNN graph DSL and portable file format
… this proposal from Tarek addresses the following use cases:
… 1) Toolchain interoperability: ONNX (or other formats) -> WebNN deployment
… 2) Team collaboration and code review
… 3) Long-term reproducibility and governance
… the target audience is ML tooling authors, framework and converter maintainers, browser implementers, and application teams shipping WebNN models at scale
… the proposal has three parts:
… (1) a human-readable text representation (.webnn)
… (2) a canonical JSON representation for tooling
… (3) an optional external weights manifest + binary container
… Dave Raggett supported the proposal in the issue
… MarkusT proposed to consider GGUF as the weight file format
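To make the three-part proposal above more concrete, here is a purely hypothetical sketch of what the canonical JSON representation (part 2) might look like for a tiny graph. The schema, key names, and the `weightsRef` field are invented for illustration; the actual format is not yet defined in issue #16.

```javascript
// Hypothetical sketch only: the real .webnn / JSON schema is undefined so far.
// A tiny graph (y = relu(matmul(x, w))) expressed as a canonical JSON object.
const graph = {
  format: "webnn-json",        // hypothetical format tag
  version: "0.1",
  inputs: [{ name: "x", dataType: "float32", shape: [1, 4] }],
  constants: [{ name: "w", dataType: "float32", shape: [4, 2], weightsRef: "w.bin" }],
  ops: [
    { op: "matmul", inputs: ["x", "w"], output: "t0" },
    { op: "relu", inputs: ["t0"], output: "y" }
  ],
  outputs: ["y"]
};

// A canonical serialization would need stable key ordering and number
// formatting as part of the spec; JSON.stringify is used here only to
// illustrate the text <-> object round trip.
const text = JSON.stringify(graph, null, 2);
const roundTripped = JSON.parse(text);
console.log(roundTripped.ops.length); // 2
```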
MarkusT: I was thinking of GGUF initially, but maybe safetensors is more appropriate
… we have Python and Rust bindings, C++ and JS for WebNN; with a serialization format we'd have the ecosystem
… a Torch-to-WebNN converter is being worked on by a colleague of mine
… my question to the group: does the group like the idea that WebNN would be not just for web browsers but also for other environments?
… also about tests, it would be great to have a more generic test suite usable from other languages
MikeW: on WebNN being for other environments, I don't think there are any objections to having support from other environments; as a naming thing, Apple prefers WebNN to be limited to the Web, but if a native environment wants to exist that is fine
MarkusT: we can think about the names, we have RustNN already for the Rust implementation, we'd like to keep the consistent interface across languages
MikeW: RustNN (as a name) would be totally fine
Rafael: I have not read the proposal in detail, but would not be opposed to specifying a serialization format
… the ops supported in the serialization format should be specified here in this group to ensure interop
… as for whether this should be implemented just in browsers, we should be open to other environments to allow use of this proposed format
MarkusT: this could be a small library that could be reused by projects
… this serialization format is defined as close to WebNN ops as possible and should map to WebIDL
Anssi: as the next step I'd invite further review from the group for this proposal
Web Neural Network API
Repository: webmachinelearning/webnn
Public API surface reduction investigation
Anssi: issue #907
<gb> Issue 907 Composite operators / subgraphs (by fdwr) [opset]
Anssi: Markus opened an issue for a proposal to reduce the public API surface with a subgraph operator library
… the group has had similar discussions earlier in context of core op set #573
<gb> Issue 573 Core operator set (by philloooo) [question] [opset]
Anssi: Markus proposed to let backends implement more complex ops, such as lstm and gru #689
<gb> Issue 689 Consider removing `lstm` and `gru` operators (by a-sully) [question] [operator specific]
Anssi: Markus' summary shared:
… 26 WebNN ops with an emulation path
… 3 ops only via emulation path
… I note the WebNN op set has evolved, guided by the following litmus test for each op:
… - are there use cases?
… - are there sample models?
… - cross-framework support?
… - implementable and performant across platforms?
Contributing: Proposing and adding a new operation
MarkusT: for RustNN I implemented WebNN per the spec, if we build composite ops we wouldn't need to do this for all backends
… with a library approach people could make a new graph and spin it into a new higher-level op
MarkusT: it would have been easier not to need to implement those 26 ops
… as for supporting lstm and gru, this discussion would not be required if we did that via a subgraph approach
Dwayne: I agree the smaller op set facilitates adoption; for that list I want to check we have emulation paths defined, and I would not remove ops before the subgraphs feature is done
MarkusT: agree to not remove any ops before subgraph feature has been added
Dwayne: if WebNN's graph builder is extended with dynamic tensor shapes, then those operators certainly become more important
MarkusT: we'd have to detect whether those ops are emulated, there are a few ops that would be interesting to carry around
… we don't want to add too many ops to the spec by default
Anssi: proposed next step to figure out a clean interface for the subgraph before advancing with the op cleanup
Ningxin: want to add an aspect from a performance perspective, we discussed that high-level ops were added due to performance
… before removing any, I strongly suggest the group investigate whether backends can fuse the ops when fed a subgraph, to understand the performance impact of possible op removal
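The "library approach" discussed above can be illustrated with a toy sketch: a higher-level op is registered once as a composition of primitive ops, so each backend only needs to implement the primitives. The API shape and names below are invented for illustration and operate on plain arrays, not WebNN tensors.

```javascript
// Toy sketch of a composite-op library (hypothetical, not the WebNN API).
// Backends implement only the primitives; composites are built from them.
const primitives = {
  mul: (a, b) => a.map((v, i) => v * b[i]),
  add: (a, b) => a.map((v, i) => v + b[i]),
  sigmoid: (a) => a.map((v) => 1 / (1 + Math.exp(-v))),
};

const composites = {};
function registerComposite(name, build) {
  composites[name] = build;
}

// silu(x) = x * sigmoid(x), expressed purely via primitives; an emulation
// path like this is what could replace some of the 26 emulatable ops.
registerComposite("silu", (x) => primitives.mul(x, primitives.sigmoid(x)));

console.log(composites.silu([0])); // [ 0 ]
```

As Ningxin notes, the open performance question is whether backends can fuse such a primitive sequence back into one kernel.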
Generalize/relax the normalize* ops
<gb> Issue 904 2D, or not 2D, that is the question (by fdwr) [opset]
<gb> Issue 490 Clarify `instanceNormalization`'s support for 1d, 2d, 3d use cases (by huningxin) [operator specific]
Anssi: initially this issue was filed by Jiewei in 2023 to rename instanceNormalization to instanceNormalization2d to clarify its 2d data use case
… we got recent input to resample2d naming, and Dwayne pointed out this is under consideration in #904
… this recent comment spurred another improvement suggestion from Dwayne
… to generalize/relax the now unnecessary rank restriction also for the normalize* ops (normalizeInstance, normalizeBatch, normalizeLayer)
Dwayne: I will update the issue #904 with new information
<Mike_Wyrzykowski> +1
<mtavenrath> +1
RESOLUTION: Generalize/relax the now unnecessary rank restriction also for the normalize* ops. (issue #904)
Create a context in-parallel steps refinement
Anssi: issue #919
<gb> Issue 919 Creating a context should be done using finer-grained in-parallel steps (by gterzian)
Anssi: Gregory who works on RustNN with Tarek and Markus proposed a fix to the WebNN spec to align its "create a context" algorithm with the latest in-parallel steps conventions
Anssi: Greg's proposal verbatim:
https://github.com/webmachinelearning/webnn/issues/919#issue-3951842470
Anssi: I think the group agrees with this proposal, so it is ready to be converted into a PR
… any questions or comments?
Power preferences and the fallback concept
Anssi: issue #911
<gb> Issue 911 accelerated should be prior to powerPreference for device selection (by mingmingtasd) [device selection]
Anssi: Zoltan shared an updated IDL sketch to make the proposal more concrete (thanks for all the iteration!)
… comments were requested from MarkusH, Mingming, MikeW, and Ningxin
Anssi: here's the proposed IDL:
enum MLComputePolicy {
  "default", // or even better: "auto",
  "high-performance",
  "low-power",
  "fallback"
};

dictionary MLContextOptions {
  MLComputePolicy computePolicy = "default";
};

[SecureContext, Exposed=(Window, Worker)]
partial interface MLContext {
  ...
  readonly attribute MLComputePolicy computePolicy;
  // or would you prefer a "boolean fallback" instead?
  readonly attribute boolean fallback;
};
Anssi: a summary of IDL changes:
… - rename enum MLPowerPreference -> enum MLComputePolicy (enum rename not web-exposed)
… - add a new "fallback" value to MLComputePolicy
… - add a new MLContext.computePolicy attribute
… - add a new MLContext.fallback attribute
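One way to picture the semantics of these changes is the following pure-JS sketch of how a requested policy might resolve to the context's readonly attributes. The resolution rules here are illustrative assumptions for discussion, not spec text, and `resolveComputePolicy` is a hypothetical helper.

```javascript
// Hypothetical sketch of the semantics under discussion: how a requested
// MLComputePolicy might resolve to MLContext.computePolicy / .fallback.
// The rules below are assumptions, not part of the proposal.
function resolveComputePolicy(requested, acceleratorAvailable) {
  const valid = ["default", "high-performance", "low-power", "fallback"];
  if (!valid.includes(requested)) throw new TypeError(`unknown policy: ${requested}`);
  // Assumption: if the preferred device is unavailable, the context
  // falls back (e.g. to CPU), and that is reflected in both attributes.
  const fallback = requested === "fallback" ||
    (requested !== "default" && !acceleratorAvailable);
  return { computePolicy: fallback ? "fallback" : requested, fallback };
}

console.log(resolveComputePolicy("high-performance", true));
// { computePolicy: "high-performance", fallback: false }
console.log(resolveComputePolicy("low-power", false));
// { computePolicy: "fallback", fallback: true }
```

Note that in this sketch the boolean is fully derivable from the enum value, which matches the later point that a separate `fallback` attribute may be redundant if "fallback" is in the enum.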
Anssi: MarkusH gave thumbs up in the issue
Zoltan: it's not a powerPreference anymore; it follows an ORT convention
Rafael: I think what the issue is about is fine; if we put fallback in the enum, the attribute on the MLContext needs to be computePolicy
Zoltan: IIRC there was a reason to expose a fallback as a boolean
… thanks for pointing that out
Rafael: I'd be fine with the WebGPU approach, powerPreference + fallback separately
… if we put this in an enum we don't need the separate boolean, to avoid double fallback
Zoltan: we should expose computePolicy, I think I agree
Anssi: would it be a reasonable next step to spin up the PR based on this IDL sketch and the improvements we just discussed?
Ningxin: I think Mingming can help with the PR
<Mike_Wyrzykowski> +1
<zolkis> +1
<ningxin> +1
RESOLUTION: Convert the MLComputePolicy IDL sketch into a spec PR. (issue #911)
Anssi: I'd like to squeeze in post-compile graph.devices as the next topic because MikeW bumped it (thanks!) and it connects with the topic we just discussed
Post-compile graph.devices
Anssi: issue #836
<gb> Issue 836 Get devices used for a graph after graph compilation (by philloooo) [device selection]
Anssi: MikeW asked whether this issue can be closed, so I wanted to bring this topic to the agenda
… I believe the remaining open tasks for the group are:
… - document use case(s) for graph.devices, MarkusH contributed the adaptation use case, others?
… - check graph.devices API and pre-compile hints (MLComputePolicy) fit well together
Anssi: ONNX Runtime is adding support for querying the device type to which the subgraph is assigned
… any new implementation experience to be shared from that?
… I'd wait for these two tasks to be completed before making a resolution on this issue
MikeW: there are strong concerns from a privacy perspective about having any mechanism that reports which device the graph runs on; we'd prefer to stick with the hints as discussed in the earlier issue and not provide web apps with information about which device the graph ran on
Rafael: wanted to ask about the Kobe resolution
MikeW: if the information is "is my graph accelerated", we can provide it via hints, i.e. whether, post-compilation, the graph runs on an accelerated device
MarkusH: wanted to understand: can we expose the effective compute policy on the graph?
MikeW: exactly that; using the same enum for both pre- and post-compilation would be acceptable
Zoltan: wanted to ask, this is relative to the whole graph; what happens when we have subgraphs, can we execute different graphs on different devices?
Rafael: consider a machine with an NPU that lacks some ops: the system could say "low-power", yes, but still not all ops are supported
… then you may find out only some of the ops run on the NPU and others on the CPU, and the experience will be worse
… for the fingerprinting concern, maybe we can have a compromise by using bucketing, e.g. "90% run on NPU", so people don't know exactly which ops run on which device
<zolkis> +1 to Rafael
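The bucketing compromise Rafael suggests could look like the following toy sketch: instead of exposing exactly which ops run where, the API would report only a coarse bucket. The thresholds and bucket names are made-up examples, not a proposal.

```javascript
// Illustrative sketch of device-assignment bucketing as a fingerprinting
// mitigation. Thresholds and labels are hypothetical.
function deviceBucket(opsOnNpu, totalOps) {
  const ratio = opsOnNpu / totalOps;
  if (ratio >= 0.9) return "mostly-npu";   // e.g. "90% run on NPU"
  if (ratio >= 0.5) return "mixed";
  return "mostly-cpu";
}

console.log(deviceBucket(27, 30)); // "mostly-npu"
console.log(deviceBucket(5, 30));  // "mostly-cpu"
```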