Web Neural Network API

W3C Working Draft,

More details about this document
This version:
https://www.w3.org/TR/2022/WD-webnn-20220324/
Latest published version:
https://www.w3.org/TR/webnn/
Editor's Draft:
https://webmachinelearning.github.io/webnn/
Previous Versions:
History:
https://www.w3.org/standards/history/webnn
Test Suite:
https://github.com/web-platform-tests/wpt/tree/master/webnn
Feedback:
GitHub
Inline In Spec
Editors:
Ningxin Hu (Intel Corporation)
Chai Chaoweeraprasit (Microsoft Corporation)
Explainer:
explainer.md
Polyfill:
webnn-polyfill / webnn-samples

Abstract

This document describes a dedicated low-level API for neural network inference hardware acceleration.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

The Web Machine Learning Working Group maintains a list of all bug reports that the group has not yet addressed. Pull requests with proposed specification text for outstanding issues are strongly encouraged.

This document was published by the Web Machine Learning Working Group as a Working Draft using the Recommendation track. This document is intended to become a W3C Recommendation.

Publication as a Working Draft does not imply endorsement by W3C and its Members. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 2 November 2021 W3C Process Document.

1. Introduction

The Web Neural Network API defines a web-friendly hardware-agnostic abstraction layer that makes use of Machine Learning capabilities of operating systems and underlying hardware platforms without being tied to platform-specific capabilities. The abstraction layer addresses the requirements of key Machine Learning JavaScript frameworks and also allows web developers familiar with the ML domain to write custom code without the help of libraries. A complementary Model Loader API defines a higher-level abstraction targeting primarily web developers.

For an illustrated introduction, please see the explainer.

2. Use cases

2.1. Application Use Cases

This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models].

2.1.1. Person Detection

A user opens a web-based video conferencing application, but she temporarily leaves from her room. The application is watching whether she is in front of her PC by using object detection (for example, using object detection approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.

When she comes back, the application automatically detects her and notifies other online users that she is active now.

2.1.2. Semantic Segmentation

A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not wish that her room and people in the background are visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+] or [MaskR-CNN] to semantically split an image into segments and replaces segments that represent other people and background with another picture.

2.1.3. Skeleton Detection

A web-based video conferencing application tracks a pose of user’s skeleton by running a machine learning model, which allows for real-time human pose estimation, such as [PoseNet] to recognize her gesture and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking on the teleconference.

2.1.4. Face Recognition

There are multiple people in the conference room and they join an online meeting using a web-based video conferencing application. The application detects faces of participants by using object detection (for example, using object detection approaches such as [SSD]) and checks whether each face was present at the previous meeting or not by running a machine learning model such as [FaceNet], which verifies whether two faces would be identical or not.

2.1.5. Facial Landmark Detection

A user wants to find new glasses that beautifully fits her on an online glasses store. The online store offers web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks like eyes, nose, mouth, etc. When she chooses a pair of glasses, the simulator properly renders the selected glasses on the detected position of eyes on her facial image.

2.1.6. Style Transfer

A user is looking for cosmetics on an online store and wondering which color may fit her face. The online store shows sample facial makeup images of cosmetics, and offers makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. She can check how the selected makeup looks like on her face by the simulator.

2.1.7. Super Resolution

A web-based video conferencing is receiving a video stream from its peer, but the resolution of the video becomes lower due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.

2.1.8. Image Captioning

For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt] which predicts explanatory words of the presentation slides.

2.1.9. Machine Translation

Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates every text into different language.

2.1.10. Emotion Analysis

A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji], which infers emotion from input texts, and displays an emoji that represents the estimated emotion.

2.1.11. Video Summarization

A web-based video conferencing application records received video streams, and it needs to reduce recorded video data to be stored. The application generates the short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].

2.1.12. Noise Suppression

A web-based video conferencing application records received audio streams, but usually the background noise is everywhere. The application leverages real-time noise suppression using Recurrent Neural Network such as [RNNoise] for suppressing background dynamic noise like baby cry or dog barking to improve audio experiences in video conferences.

2.1.13. Detecting fake video

A user is exposed to realistic fake videos generated by ‘deepfake’ on the web. The fake video can swap the speaker’s face into the president’s face to incite a user politically or to manipulate user’s opinion. The deepfake detection applications such as [FaceForensics++] analyze the videos and protect a user against the fake videos or images. When she watches a fake video on the web, the detection application alerts her of the fraud video in real-time.

2.2. Framework Use Cases

This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.

2.2.1. Custom Layer

A web application developer wants to run a DNN model on the WebNN API. However, she has found that some of activation functions like [LeakyReLU], [ELU], etc. are not included in the WebNN API. To address this issue, she constructs custom layers of the additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.

2.2.2. Network Concatenation

A web application uses a DNN model, and its model data of upper convolutional layers and lower fully-connected layers are stored in separate files, since model data of the fully-connected layers are periodically updated due to fine tuning at the server side.

Therefore, the application downloads both partial model files at first and concatenates them into a single model. When the model is updated, the application downloads fine-tuned part of the model and replace only the fully-connected layers with it.

2.2.3. Performance Adaptation

A web application developer has a concern about performance of her DNN model on mobile devices. She has confirmed that it may run too slow on mobile devices which do not have GPU acceleration. To address this issue, her web application refers to the WebNN API to confirm whether acceleration is available or not, so that the application can display the warning for devices without acceleration.

After several weeks, she has developed a tiny DNN model that can even run on CPU. In order to accommodate CPU execution, she modifies the application so that the application loads the tiny model in the case of CPU-only devices.

2.2.4. Operation Level Execution

A JavaScript ML framework is responsible for loading, interpreting and executing a ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on the hardware device, like CPU, GPU or ML accelerator. To avoid the unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute intensive operation, such as convolution 2D or matrix multiplication, the framework uses WebNN API to execute it with the ML-specific acceleration available on that selected device.

2.2.5. Integration with real-time video processing

The user experience of WebRTC-based video conferencing is enhanced using real-time video processing. For example, background blur implemented using a § 2.1.2 Semantic Segmentation model blurs the background in the user’s live camera feed. To satisfy the performance requirements of this use case, the WebNN API integrates with primitives from other Web APIs that make up the media pipeline to allow WebNN API-based transformation of real-time video streams.

3. Security Considerations

This API is disabled by default in all cross-origin frames using the § 7.2.1 Permissions Policy Integration. This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.

This API allows creation of an MLContext from a GPUDevice or WebGLRenderingContext defined by WebGPU and WebGL specifications respectively. See WebGPU Security Considerations and WebGL Security Consideration for more information regarding security characteristics of these contexts.

Once the graph is fully constructed and compiled, the input shapes into each of the operations in the graph are inferred and finalized. The bounds checking occurs when the compute method is invoked that executes the graph against the actual data. No actual data is bound to the compiled graph before this stage. It is the implementation’s responsibility to make sure proper bounds checking occurs against the shapes of the data already inferred by that time.

Document operations susceptible to out-of-bounds access as a guidance to implementers.

As a future-proofing measure, the API design allows certain operations that can be generically emulated to be deprecated for security, performance, or other reasons without breaking compatibility. This is made possible by high-level functions that are defined in terms of smaller primitive operations defined in this specifications. This enables a native implementation of a high-level function to be replaced with a polyfill implementation.

Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.

In order to not allow an attacker to target a specific implementation that may contain a flaw, the § 6.2 Device Selection mechanism is a hint only, and the concrete device selection is left to the implementation - a user agent could for instance choose never to run a model on a device with known vulnerabilities. As a further mitigation, no device enumeration mechanism is defined.

Hinting partially mitigates the concern. Investigate additional mitigations.

The API design minimizes the attack surface for the compiled computational graph. The MLGraphBuilder interface that hosts the various operations is a data definition API and as such doesn’t execute anything, only constructs data. What follows, is that the potential for an attack is limited to when binding the data to the graph before executing it by invoking the compute() method. This enables implementers to focus on hardening the compute() method. For example, by making sure it honors the boundary of data and fails appropriately when the bounds are not respected.

Purpose-built Web APIs for measuring high-resolution time mitigate against timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [hr-time-3]. The practical deployment of WebNN implementations are likely to bring enough jitter to make timing attacks impractical (e.g. because they would use IPC) but implementers are advised to consider and test their implementations against timing attacks.

3.1. Guidelines for new operations

To ensure operations defined in this specification are shaped in a way they can be implemented securely, this section includes guidelines on how operations are expected to be defined to reduce potential for implementation problems. These guidelines are expected to evolve over time to align with industry best practices:

In general, always consider the security and privacy implications as documented in [security-privacy-questionnaire] by the Technical Architecture Group and the Privacy Interest Group when adding new features.

4. Privacy Considerations

This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox.

This API exposes the minimum amount of information necessary to address the identified § 2 Use cases for the best performance and reliability of results.

No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform.

Note: The group is soliciting further input on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.

Implementers of this API are expected to be familiar with the WebGPU Privacy Considerations.

5. Ethical Considerations

The Working Group has started documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account. This work currently happens in a dedicated GitHub repository.

6. Programming Model

6.1. Overview

At the heart of neural networks is a computational graph of mathematical operations. These operations are the building blocks of modern machine learning technologies in computer vision, natural language processing, and robotics. The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks.

The MLGraph interface represents a compiled computational graph (that is, a model) and exposes a compute method to perform inference.

The MLGraphBuilder interface serves as a builder (factory) to create a MLGraph. An MLOperand is a representation of data that flows within the computational graph, which include input-values for inference, constants (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference. At inference time, every MLOperand will be bound to a tensor (the actual data).

The MLGraphBuilder interface enables the creation of MLOperands. A key part of the MLGraphBuilder interface are the operations (such as gemm() and softmax()). The operations have a functional semantics, with no side effects. Each operation invocation conceptually returns a distinct new value, without changing the value of any other MLOperand.

The build() method of the MLGraphBuilder interface is used to compile and optimize the computation graph used to compute one or more specified outputs. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion.

The compute() method of the MLGraph interface is used to execute the compiled computation graph (to perform inference). The caller supplies the input values using MLNamedInputs, binding the input MLOperands to their values. The caller supplies pre-allocated buffers for output MLOperands using MLNamedOutputs.

The runtime values (of MLOperands) are tensors, which are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).

As mentioned above, the operations have a functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation of operations such as reshape, or slice, or squeeze may return a view of its input tensor that shares the same buffer as the input tensor. (In the case of reshape or squeeze, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values.

6.2. Device Selection

An MLContext interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. An MLContext could be created from a specific GPU device such as GPUDevice or WebGLRenderingContext that is already in use by the application, in which case the corresponding GPUBuffer or WebGLBuffer resources used as graph constants, as well as the GPUTexture and WebGLTexture as graph inputs must also be created from the same device. In a multi-adapter configuration, the device used for MLContext must be created from the same adapter as the device used to allocate the resources referenced in the graph.

In a situation when a GPU context executes a graph with a constant or an input in the system memory as an ArrayBufferView, the input content is automatically uploaded from the system memory to the GPU memory, and downloaded back to the system memory of an ArrayBufferView output buffer at the end of the graph execution. This data upload and download cycles will only occur whenever the execution device requires the data to be copied out of and back into the system memory, such as in the case of the GPU. It doesn’t occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller’s perspective.

When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account the application’s power preference and device preference specified in the MLPowerPreference and MLDevicePreference options.

The following table summarizes the types of resource supported by the device selected.

Device Type ArrayBufferView GPUBuffer GPUTexture WebGLBuffer WebGLTexture
GPUDevice Yes Yes Yes No No
WebGLRenderingContext Yes No No Yes Yes
default Yes No No No No
gpu Yes No No No No
cpu Yes No No No No

7. API

7.1. navigator.ml

A ML object is available in the Window and DedicatedWorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml:

interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;

7.2. ML

enum MLDevicePreference {
  "default",
  "gpu",
  "cpu"
};

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {
  MLDevicePreference devicePreference = "default";
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  MLContext createContext(optional MLContextOptions options = {});
  MLContext createContext(WebGLRenderingContext glContext);
  MLContext createContext(GPUDevice gpuDevice);
};

The createContext() method steps are:

  1. If this's relevant global object's associated Document is not allowed to use the webnn feature, then throw a "SecurityError" DOMException and abort these steps.

  2. Let context be a new MLContext object.

  3. Switch on the method’s first argument:

    MLContextOptions
    Set context.[[contextType]] to default.
    Set context.[[devicePreference]] to the value of MLContextOptions's devicePreference member.
    Set context.[[powerPreference]] to the value of MLContextOptions's powerPreference member.
    WebGLRenderingContext
    Set context.[[contextType]] to webgl.
    Set context.[[devicePreference]] to "gpu".
    Set context.[[powerPreference]] to "default".
    GPUDevice
    Set context.[[contextType]] to webgpu.
    Set context.[[devicePreference]] to "gpu".
    Set context.[[powerPreference]] to "default".
    Otherwise
    Set context.[[contextType]] to default.
    Set context.[[devicePreference]] to "default".
    Set context.[[powerPreference]] to "default".
  4. Return context.

Note: When [[contextType]] is set to "webgl" or "webgpu", device preference "gpu" is implied and [[devicePreference]] is set to "gpu" and [[powerPreference]] is set to "default".

7.2.1. Permissions Policy Integration

This specification defines a policy-controlled feature identified by the string "webnn". Its default allowlist is 'self'.

7.3. MLContext

The MLContext interface represents a global state of neural network compute workload and execution processes. Each MLContext object has associated context type, device preference and power preference.

The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:

"default"
Context created per the user agent’s preference.
"webgl"
Context created from WebGL rendering context.
"webgpu"
Context created from WebGPU device.

The device preference indicates the preferred kind of device to be used. It is one of the following:

"default"
The user agent selects the most suitable device to use.
"gpu"
Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
"cpu"
Provides the broadest reach of software compute availability, but with limited scalability of execution performance on the more complex neural networks.

The power preference indicates preference as related to power consumption. It is one of the following:

"default"
Let the user agent select the most suitable behavior.
"high-performance"
Prioritizes execution speed over power consumption.
"low-power"
Prioritizes power consumption over other considerations such as execution speed.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {};

MLContext has the following internal slots:

[[contextType]] of type context type

The MLContext's context type.

[[devicePreference]] of type device preference

The MLContext's device preference.

[[powerPreference]] of type power preference

The MLContext's power preference.

7.4. MLOperandDescriptor

enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;

  // The dimensions field is only required for tensor operands.
  // The negative value means an unknown dimension.
  sequence<long> dimensions;
};

7.5. MLOperand

An MLOperand represents an intermediary graph being constructed as a result of compositing parts of an operation into a fully composed operation.

For instance, an MLOperand may represent a constant feeding to an operation or the result from combining multiple constants together into an operation. See also § 6 Programming Model.

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {};

See also § 3.1 Guidelines for new operations

7.6. MLOperator

Objects implementing the MLOperator interface represent activation function types. As a generic construct, this interface may be reused for other types in a future version of this specification.

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperator {};
These activations function types are used to create other operations. One such use of this interface is for when an activation function is fused into another operation such as § 7.7.4 conv2d or § 7.7.1 batchNormalization during a graph construction session.
The implementation of the MLOperator interface can simply be a struct that holds a string type of the activation function along with other properties needed. The actual creation of the activation function e.g. a § 7.7.24 sigmoid or § 7.7.21 relu can then be deferred until when the rest of the graph is ready to connect with it such as during the construction of § 7.7.4 conv2d for example.

7.7. MLGraphBuilder

The MLGraphBuilder interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.

typedef record<DOMString, MLOperand> MLNamedOperands;

dictionary MLBufferResourceView {
  required (WebGLBuffer or GPUBuffer) resource;
  unsigned long long offset = 0;
  unsigned long long size;
};

typedef (ArrayBufferView or MLBufferResourceView) MLBufferView;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor desc);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor desc, MLBufferView bufferView);

  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandType type = "float32");

  // Compile the graph up to the specified output operands
  MLGraph build(MLNamedOperands outputs);
};

7.7.1. batchNormalization

Normalize the tensor values of input features across the batch dimension using [Batch-Normalization]. For each input feature, the mean and variance values of that feature supplied in this calculation as parameters are previously computed across the batch dimension of the input during the model training phase of this operation.
dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  long axis = 1;
  float epsilon = 1e-5;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                             optional MLBatchNormalizationOptions options = {});
};
Arguments:

Returns: an MLOperand. The batch-normalized N-D tensor of the same shape as the input tensor.

When input is a 4-D tensor of the "nchw" or "nhwc" layout, options.axis should be set to 1 or 3 respectively. The axis value designates the feature or channel count dimension of the input tensor.

The behavior of this operation when the input tensor is 4-D of the "nchw" layout and the activation is of operator type relu can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
const shape = [1,-1,1,1];
return builder.relu(
  builder.add(
    builder.mul(
      builder.reshape(options.scale, shape),
      builder.div(
        builder.sub(input, builder.reshape(mean, shape)),
        builder.pow(
          builder.add(builder.reshape(variance, shape), builder.constant(options.epsilon)),
          builder.constant(0.5))
        )),
    builder.reshape(options.bias, shape)));

7.7.2. clamp

Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions {
  float minValue;
  float maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand x, optional MLClampOptions options = {});
  MLOperator clamp(optional MLClampOptions options = {});
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return x;
  } else {
    return builder.min(x, builder.constant(options.maxValue));
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(x, builder.constant(options.minValue));
  } else {
    return builder.min(
        builder.max(x, builder.constant(options.minValue)),
        builder.constant(options.maxValue));
  }
}

7.7.3. concat

Concatenates the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, long axis);
};
Arguments:

Returns: an MLOperand. The concatenated tensor of all the inputs along the axis. The output tensor has the same shape except on the dimension that all the inputs concatenated along. The size of that dimension is computed as the sum of all the input sizes of the same dimension.

7.7.4. conv2d

Compute a 2-D convolution given 4-D input and filter tensors
enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

enum MLAutoPad {
  "explicit",
  "same-upper",
  "same-lower"
};

dictionary MLConv2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter, optional MLConv2dOptions options = {});
};
Arguments:

Returns: an MLOperand. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, the spatial dimensions or the sizes of the last two dimensions of the output tensor for the nchw input layout can be calculated as follow:

output size = 1 + (input size - filter size - (filter size - 1) * (dilation - 1) + beginning padding + ending padding) / stride

A depthwise conv2d operation is a variant of grouped convolution, used in models like the MobileNet, where the options.groups = input_channels = output_channels and the shape of filter tensor is [options.groups, 1, height, width] for "oihw" layout, [height, width, 1, options.groups] for "hwio" layout, [options.groups, height, width, 1] for "ohwi" layout and [1, height, width, options.groups] for "ihwo" layout.

7.7.5. convTranspose2d

Compute a 2-D transposed convolution given 4-D input and filter tensors
enum MLConvTranspose2dFilterOperandLayout {
  "iohw",
  "hwoi",
  "ohwi"
};

dictionary MLConvTranspose2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  sequence<long> outputPadding;
  sequence<long> outputSizes;
  MLAutoPad autoPad = "explicit";
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConvTranspose2dFilterOperandLayout filterLayout = "iohw";
  MLOperand bias;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand convTranspose2d(MLOperand input, MLOperand filter,
                            optional MLConvTranspose2dOptions options = {});
};
Arguments:

Returns: an MLOperand. The output 4-D tensor that contains the transposed convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, unless the options.outputSizes values are explicitly specified, the options.outputPadding may be needed to compute the spatial dimension values of the output tensor as follow:

output size = (input size - 1) * stride + filter size + (filter size - 1) * (dilation - 1) - beginning padding - ending padding + output padding

7.7.6. element-wise binary operations

Compute the element-wise binary addition, subtraction, multiplication, division, maximum and minimum of the two input tensors.
partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};
Arguments:

Returns: an MLOperand. The output tensor that contains the result of element-wise binary operation of the two input tensors.

The element-wise binary operation will be broadcasted according to [numpy-broadcasting-rule]. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.

Operation types:

7.7.7. element-wise unary operations

Compute the element-wise unary operation for input tensor.
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand x);
  MLOperand ceil(MLOperand x);
  MLOperand cos(MLOperand x);
  MLOperand exp(MLOperand x);
  MLOperand floor(MLOperand x);
  MLOperand log(MLOperand x);
  MLOperand neg(MLOperand x);
  MLOperand sin(MLOperand x);
  MLOperand tan(MLOperand x);
};
Arguments:

Returns: an MLOperand. The output tensor that contains the result of element-wise unary operation of the input tensor. The shape of the output tensor is the same as the shape of input tensor.

Operation types:

7.7.8. elu

Calculate the exponential linear unit function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * (exp(min(0, x)) - 1).
dictionary MLEluOptions {
  float alpha = 1;
};

partial interface MLGraphBuilder {
  MLOperand elu(MLOperand x, optional MLEluOptions options = {});
  MLOperator elu(optional MLEluOptions options = {});
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.add(
          builder.max(0, x),
          builder.mul(
            builder.constant(options.alpha), 
            builder.sub(
              builder.exp(builder.min(builder.constant(0), x)), 
              builder.constant(1))));

7.7.9. gemm

Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A is a 2-D tensor with shape [M, K] or [K, M], B is a 2-D tensor with shape [K, N] or [N, K], and C is broadcastable to the shape [M, N]. A and B may optionally be transposed prior to the calculation.
dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};
Arguments:

Returns: an MLOperand. The output 2-D tensor of shape [M, N] that contains the calculated product of all the inputs.

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
if (options.aTranspose)
  a = builder.transpose(a);

if (options.bTranspose)
  b = builder.transpose(b);

let ab = builder.matmul(builder.mul(builder.constant(options.alpha), a), b);
return (c ? builder.add(ab, builder.mul(builder.constant(options.beta), c)) : ab);

7.7.10. gru

Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of the Network
enum MLRecurrentNetworkWeightLayout {
  "zrn",  // update-reset-new gate ordering
  "rzn"   // reset-update-new gate ordering
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input, MLOperand weight, MLOperand recurrentWeight, 
                        long steps, long hiddenSize, optional MLGruOptions options = {});
};
Arguments:

Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the cell output from the last time step of the network. Additionally, if returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, num_directions, batch_size, hidden_size] containing every cell outputs from each time step in the temporal sequence.

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;

if (!hiddenState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

let sequence = null;
let cellWeight = [];
let cellRecurrentWeight = [];
let cellBias = [];
let cellRecurrentBias = [];

for (let slot = 0; slot < numDirections; ++slot) {
  cellWeight.push(builder.squeeze(builder.slice(weight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellRecurrentWeight.push(builder.squeeze(builder.slice(recurrentWeight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellBias.push(options.bias ? (builder.squeeze(builder.slice(options.bias, [slot, 0], [1, -1]), { axes: [0] })) : null);
  cellRecurrentBias.push(options.recurrentBias ? 
    (builder.squeeze(builder.slice(options.recurrentBias, [slot, 0], [1, -1]), { axes: [0] })) : null);
}

for (let step = 0; step < steps; ++step) {
  let cellHidden = [];
  let cellOutput = null;

  for (let slot = 0; slot < numDirections; ++slot) {
    cellHidden.push(builder.squeeze(builder.slice(hiddenState, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  }

  for (let slot = 0; slot < numDirections; ++slot) {
    let slice = (slot == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let cellInput = builder.squeeze(builder.slice(input, [slice, 0, 0], [1, -1, -1]), { axes: [0] });

    let result = builder.reshape(
      builder.gruCell(
        cellInput, cellWeight[slot], cellRecurrentWeight[slot],
        cellHidden[slot], hiddenSize, { bias: cellBias[slot],
        recurrentBias: cellRecurrentBias[slot], resetAfter: options.resetAfter,
        layout: options.layout, activations: options.activations }),
      [1, -1, hiddenSize]);

    cellOutput = (cellOutput ? builder.concat([cellOutput, result], 0) : result);
  }

  hiddenState = cellOutput;

  if (options.returnSequence) {
    cellOutput = builder.reshape(cellOutput, [1, numDirections, -1, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, cellOutput], 0) : cellOutput);
  }
}

return (sequence ? [hiddenState, sequence] : [hiddenState]);

7.7.11. gruCell

A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
dictionary MLGruCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight, 
                  MLOperand hiddenState, long hiddenSize, optional MLGruCellOptions options = {});
};
Arguments:

Returns: an MLOperand. The 2-D tensor of shape [batch_size, hidden_size], the cell output hidden state of a single time step of the recurrent network.

The behavior of this operation when the activations of the update/reset gate and new gate are of the operator types sigmoid and tanh respectively can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
const one = builder.constant(1);
const zero = builder.constant(0);

// update gate
let z = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero), 
      (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
      ),
    builder.add(
      builder.matmul(
        input, 
        builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1]))
        ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, -1]))
        )
      )
    )
  );

// reset gate
let r = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
      ),
    builder.add(
      builder.matmul(
        input, 
        builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1]))
        ),
      builder.matmul(
        hiddenState, 
        builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1]))
        )
      )
    )
  );

// new gate
let n;
if (resetAfter) {
  n = builder.tanh(
    builder.add(
      (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
      builder.add(
        builder.matmul(
          input, 
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
          ),
        builder.mul(
          r,
          builder.add(
            (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero),
            builder.matmul(
              hiddenState, 
              builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
              )
            )
          )
        )
      )
    );
}
else {
  n = builder.tanh(
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
        ),
      builder.add(
        builder.matmul(
          input, 
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
          ),
        builder.matmul(
          builder.mul(r, hiddenState),
          builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
          )
        )
      )
    );
}

// compute the new hidden state
return builder.add(builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));

7.7.12. hardSigmoid

Calculate the non-smooth function used in place of a sigmoid function on the input tensor.
dictionary MLHardSigmoidOptions {
  float alpha = 0.2;
  float beta = 0.5;
};

partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand x, optional MLHardSigmoidOptions options = {});
  MLOperator hardSigmoid(optional MLHardSigmoidOptions options = {});
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.max(
           builder.min(
               builder.add(
                   builder.mul(builder.constant(options.alpha), x),
                   builder.constant(options.beta)), 
               builder.constant(1)),
           builder.constant(0));

7.7.13. hardSwish

Computes the nonlinear function y = x * max(0, min(6, (x + 3))) / 6 that is introduced by [MobileNetV3] on the input tensor element-wise.
partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand x);
  MLOperator hardSwish();
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.div(
           builder.mul(
               x,
               builder.max(
                   builder.constant(0),
                   builder.min(
                       builder.constant(6),
                       builder.add(x, builder.constant(3))))),
           builder.constant(6));

7.7.14. instanceNormalization

Normalize the input features using [Instance-Normalization]. Unlike § 7.7.1 batchNormalization where the mean and variance values used in the calculation are previously computed across the batch dimension during the model training phase, the mean and variance values used in the calculation of an instance normalization are computed internally on the fly per input feature.
dictionary MLInstanceNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  float epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input, 
                                optional MLInstanceNormalizationOptions options = {});
};
Arguments:

Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as the input tensor.

The behavior of this operation when the input tensor is 4-D of the "nchw" layout can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
// The mean reductions happen over the spatial dimensions of the input
// e.g. axis 2 and 3 of the input tensor.
const reduceOptions = { axes: [2,3], keepDimensions: true };
const mean = builder.reduceMean(input, reduceOptions);
const variance = builder.reduceMean(
  builder.pow(
    builder.sub(input, mean), 
    buider.constant(2)),
  reduceOptions
  );

// The scale and bias values are applied per input feature
// e.g. axis 1 of the input tensor.
const shape = [1,-1,1,1];
return builder.add(
  builder.mul(
    builder.reshape(options.scale, shape),
    builder.div(
      builder.sub(input, mean),
      buidler.pow(
        builder.add(variance, options.epsilon), 
        builder.constant(0.5))
      )
    ),
  builder.reshape(options.bias, shape)
  );

7.7.15. leakyRelu

Calculate the leaky version of rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha ∗ min(0, x).
dictionary MLLeakyReluOptions {
  float alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand x, optional MLLeakyReluOptions options = {});
  MLOperator leakyRelu(optional MLLeakyReluOptions options = {});
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.add(builder.max(builder.constant(0), x),
          builder.mul(builder.constant(options.alpha), builder.min(builder.constant(0), x)));

7.7.16. matmul

Compute the matrix product of two input tensors.
partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b);
};
Arguments:

Returns: an MLOperand. The output N-D tensor that contains the matrix product of two input tensors.

Compute the matrix product of two input tensors. It behaves as following:

7.7.17. linear

Calculate a linear function y = alpha * x + beta on the input tensor.
dictionary MLLinearOptions {
  float alpha = 1;
  float beta = 0;
};

partial interface MLGraphBuilder {
  MLOperand linear(MLOperand x, optional MLLinearOptions options = {});
  MLOperator linear(optional MLLinearOptions options = {});
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.add(
          builder.mul(x, builder.constant(options.alpha)), 
          builder.constant(options.beta));

7.7.18. pad

Inflate the tensor with constant or mirrored values on the edges.
enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions {
  MLPaddingMode mode = "constant";
  float value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input, MLOperand padding, optional MLPadOptions options = {});
};
Arguments:

Returns: an MLOperand. The padded output tensor.

// input: [[1,2,3], [4,5,6]]
const input = builder.constant(
  { type: 'float32', dimensions: [2,3] }, new Float32Array([1,2,3,4,5,6]));

// padding: [[1,1], [2,2]]
const padding = builder.constant(
  { type: 'float32', dimensions: [2,2] }, new Float32Array([1,1,2,2]));

// "constant" padded:
//    [[0,0,0,0,0,0,0],
//     [0,0,1,2,3,0,0],
//     [0,0,4,5,6,0,0],
//     [0,0,0,0,0,0,0]]
builder.pad(input, padding);

// "edge" padded:
//    [[1,1,1,2,3,3,3],
//     [1,1,1,2,3,3,3],
//     [4,4,4,5,6,6,6],
//     [4,4,4,5,6,6,6]]
builder.pad(input, padding, { mode: "edge" });

// "reflection" padded:
//    [[6,5,4,5,6,5,4],
//     [3,2,1,2,3,2,1],
//     [6,5,4,5,6,5,4],
//     [3,2,1,2,3,2,1]]
builder.pad(input, padding, { mode: "reflection" });

// "symmetric" padded:
//    [[2,1,1,2,3,3,2],
//     [2,1,1,2,3,3,2],
//     [5,4,4,5,6,6,5],
//     [5,4,4,5,6,6,5]]
builder.pad(input, padding, { mode: "symmetric" });

7.7.19. pooling operations

Compute a mean, L2 norm, or max reduction operation across all the elements within the moving window over the input tensor. See the description of each type of reduction in § 7.7.20 reduction operations.
enum MLRoundingType {
  "floor",
  "ceil"
};

dictionary MLPool2dOptions {
  sequence<long> windowDimensions;
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  MLInputOperandLayout layout = "nchw";
  MLRoundingType roundingType = "floor";
  sequence<long> outputSizes;
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};
Arguments:

Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of layout. More specifically, if the options.roundingType is "floor", the spatial dimensions of the output tensor can be calculated as follow:

output size = floor(1 + (input size - filter size + beginning padding + ending padding) / stride)

or if options.roundingType is "ceil":

output size = ceil(1 + (input size - filter size + beginning padding + ending padding) / stride)

A global pooling operation such as one for the max pooling operation is a variant of pooling where the window dimensions is the spatial dimensions (last two dimensions) of the input shape, as follow.
// 'global' max pooling
builder.maxPool2d(input);

7.7.20. reduction operations

Reduce the input along the dimensions given in axes.
dictionary MLReduceOptions {
  sequence<long> axes = null;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};
Arguments:

Returns: an MLOperand. The reduced output tensor.

Reduction types:

7.7.21. relu

Compute the rectified linear function of the input tensor.
partial interface MLGraphBuilder {
  MLOperand relu(MLOperand x);
  MLOperator relu();
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.max(builder.constant(0), x);

7.7.22. resample2d

Resample the tensor values from the source to the destination spatial dimensions according to the scaling factors.
enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResample2dOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<long> sizes;
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};
Arguments:

Returns: an MLOperand. The output 4-D tensor.

7.7.23. reshape

Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical dimensions for the subsequent operations.
partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input, sequence<long> newShape);
};
Arguments:

Returns: an MLOperand. The output tensor. The values of the output tensor are the same as values of the input tensor. The shape of the output tensor is specified by the newShape argument.

7.7.24. sigmoid

Compute the sigmoid function of the input tensor. The calculation follows the expression 1 / (exp(-x) + 1).
partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand x);
  MLOperator sigmoid();
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.div(
          builder.constant(1),
          builder.add(
            builder.exp(builder.neg(x)), 
            builder.constant(1)));

7.7.25. slice

Produce a slice of the input tensor.
dictionary MLSliceOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input, sequence<long> starts, sequence<long> sizes,
                optional MLSliceOptions options = {});
};
Arguments:

Returns: an MLOperand. The output tensor of the same rank as the input tensor with tensor values stripped to the specified starting and ending indices in each dimension.

7.7.26. softmax

Compute the softmax values of the 2-D input tensor along axis 1.
partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand x);
};
Arguments:

Returns: an MLOperand. The output 2-D tensor that contains the softmax results, of the same shape as the input tensor.

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
// This sample deploys a well-known implementation trick [1] to compute the
// exponentials of the distances to the max value, instead of the exponentials
// of the input values itself, in order to increase the numerical stability of
// the result.
// [1]: https://cs231n.github.io/linear-classify/#softmax
const max_x = builder.reduceMax(x, { axes: [1], keepDimensions: true });
const exp_x = builder.exp(builder.sub(x, max_x));
return builder.div(exp_x, builder.reduceSum(exp_x, { axes: [1], keepDimensions: true }));

7.7.27. softplus

Compute the softplus function of the input tensor. The calculation follows the expression ln(1 + exp(steepness * x)) / steepness.
dictionary MLSoftplusOptions {
  float steepness = 1;
};

partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand x, optional MLSoftplusOptions options = {});
  MLOperator softplus(optional MLSoftplusOptions options = {});
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.div(
          builder.log(
            builder.add(
              builder.exp(builder.mul(x, builder.constant(options.steepness))),
              builder.constant(1))),
          builder.constant(options.steepness));

7.7.28. softsign

Compute the softsign function of the input tensor. The calculation follows the expression x / (1 + |x|).
partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand x);
  MLOperator softsign();
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.div(x, builder.add(builder.constant(1), build.abs(x)));

7.7.29. split

Split the input tensor into a number of sub tensors along the given axis.
dictionary MLSplitOptions {
  long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(MLOperand input,
                          (unsigned long or sequence<unsigned long>) splits,
                          optional MLSplitOptions options = {});
};
Arguments:

Returns: a sequence of MLOperand. The splitted output tensors. If splits is an unsigned long, the length of the output sequence equals to splits. The shape of each output tensor is the same as input except the dimension size of axis equals to the quotient of dividing the dimension size of input along axis by splits. If splits is a sequence of unsigned long, the length of the output sequence equals to the length of splits. The shape of the i-th output tensor is the same as as input except along axis where the dimension size is splits[i].

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
// This sample shows the case that the splits parameter is an array.
const outputs = [];
let start = 0;
for (const size of splits) {
  outputs.push(builder.slice(input, [start], [size], { axis: [options.axis] }));
  start += size;
}
return outputs;

7.7.30. squeeze

Reduce the rank of a tensor by eliminating dimensions with size 1 of the tensor shape. Squeeze only affects the tensor’s logical dimensions. It does not copy or change the content in the tensor.
dictionary MLSqueezeOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand squeeze(MLOperand input, optional MLSqueezeOptions options = {});
};
Arguments:

Returns: an MLOperand. The output tensor of the same or reduced rank with the shape dimensions of size 1 eliminated.

7.7.31. tanh

Compute the hyperbolic tangent function of the input tensor. The calculation follows the expression (exp(2 * x) - 1) / (exp(2 * x) + 1).
partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand x);
  MLOperator tanh();
};
Arguments:

Returns:

The behavior of this operation can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
return builder.div(
          builder.sub(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)),
          builder.add(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)));

7.7.32. transpose

Permute the dimensions of the input tensor according to the permutation argument.
dictionary MLTransposeOptions {
  sequence<long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};
Arguments:

Returns: an MLOperand. The permuted or transposed N-D tensor.

7.8. MLGraph

The MLGraph interface represents a compiled computational graph. A compiled graph once constructed is immutable and cannot be subsequently changed.
typedef (MLBufferView or WebGLTexture or GPUTexture) MLResource;

dictionary MLInput {
  required MLResource resource;
  required sequence<long> dimensions;
};

typedef record<DOMString, (MLResource or MLInput)> MLNamedInputs;
typedef record<DOMString, MLResource> MLNamedOutputs;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {
  undefined compute(MLNamedInputs inputs, MLNamedOutputs outputs);
};

MLGraph has the following internal slots:

[[context]] of type MLContext

The context of type MLContext associated with this MLGraph.

[[inputDescriptors]] of type record<DOMString, MLOperandDescriptor>

Maps the name of an input MLOperand to its MLOperandDescriptor for all input MLOperands of this MLGraph.

[[outputNames]] of type sequence<DOMString>

Contains the names of all output MLOperands of this MLGraph.

[[implementation]]

The underlying implementation provided by the User Agent.

compute(inputs, outputs)

Compute the MLGraph given MLNamedInputs and MLNamedOutputs. Return once the compute has completed and the results in MLNamedOutputs are ready to be consumed.

Called on: MLGraph this.

Arguments:

Arguments for the MLGraph.compute(inputs, outputs) method.
Parameter Type Nullable Optional Description
inputs MLNamedInputs an MLNamedInputs. The resources and optional dimensions of inputs for the compute.
outputs MLNamedOutputs an MLNamedOutputs. The pre-allocated resources of required outputs for the compute.

Returns: undefined.

  1. If any of the following requirements are unmet, then throw a DataError DOMException and stop.

    1. For each key -> value of inputs:

      1. this.[[inputDescriptors]][key] must exist.

      2. Let inputDesc be this.[[inputDescriptors]][key].

      3. Let inputSize be 1.

      4. If value is an MLInput, then:

        1. The length of value.dimensions must be the same as the length of inputDesc.dimensions.

        2. Let i be 0.

        3. While true:

          1. Let dimension be value.dimensions[i].

          2. dimension must be greater than 0.

          3. If inputDesc.dimensions[i] is greater than 0, then dimension must be equal to inputDesc.dimensions[i].

          4. Set inputSize to the product of inputSize and dimension.

          5. Increment i by 1.

          6. If i if equal to the length of value.dimensions, then break.

      5. Else:

        1. For each dimension of inputDesc.dimensions:

          1. The value of dimension must be greater than 0.

          2. Set inputSize to the product of inputSize and dimension.

      6. If value is an MLInput, then let resource be value.resource.

      7. If value is an MLResource, then let resource be value.

      8. If resource is an ArrayBufferView, then:

        1. The kind of resource must be compatible with inputDesc.type according to this table.

        2. The length of resource must be the same as inputSize.

    2. For each key -> value of outputs:

      1. this.[[outputNames]][key] must exist.

  2. For each key -> value of inputs:

    1. Let inputDesc be this.[[inputDescriptors]][key].

    2. Let inputTensor be a new tensor for this.[[implementation]] of data type that is compatible with inputDesc.type.

    3. If value is an MLInput, then:

      1. Set the dimensions of inputTensor to value.dimensions.

    4. Else:

      1. Set the dimensions of inputTensor to inputDesc.dimensions.

    5. If value is an MLInput, then:

      1. Set the values of inputTensor to the values of value.resource.

    6. If value is an MLResource, then:

      1. Set the values of inputTensor to the values of value.

    7. Set the input of this.[[implementation]] that is associated with key to inputTensor.

  3. For each key -> value of outputs:

    1. Issue a compute request for output of this.[[implementation]] that is associated with key.

    2. Wait for the compute request to be completed.

    3. If there is an error returned by this.[[implementation]], then:

      1. Throw an OperationError DOMException and stop.

    4. Else:

      1. Let outputTensor be the output tensor returned by this.[[implementation]].

      2. If the kind of value is not compatible with the value type of outputTensor, then throw a DataError DOMException and stop.

      3. Let outputSize be 1.

      4. For each dimension of dimensions of outputTensor:

        1. Set outputSize to the product of outputSize and dimension.

      5. If outputSize is greater than the length of value, then:

        1. Throw a DataError DOMException and stop.

      6. Else:

        1. Set the values of value to the values of outputTensor.

  4. Return undefined.

Describe the algorithm steps for this.[[context]] created from WebGLRenderingContext and GPUDevice.

7.8.1. Examples

The following code showcases the computation with dynamic input dimensions.
function sizeOfShape(array) {
  return array.reduce(
      (accumulator, currentValue) => accumulator * currentValue);
}

const context = navigator.ml.createContext();

// Create a graph with dynamic shaped inputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [-1, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, -1]};
const b = builder.input('b', descB);
const c = builder.matmul(a, b);
const graph = builder.build({'c': c});

function allocateAndCompute(shapeA, shapeB, shapeC) {
  const bufferA = new Float32Array(sizeOfShape(shapeA)).fill(0.5);
  const bufferB = new Float32Array(sizeOfShape(shapeB)).fill(0.5);
  const bufferC = new Float32Array(sizeOfShape(shapeC));

  // Specify the shape of inputs when computing.
  const inputs = {
    'a': {resource: bufferA, dimensions: shapeA},
    'b': {resource: bufferB, dimensions: shapeB},
  };
  const outputs = {'c': bufferC};
  graph.compute(inputs, outputs);
  console.log(`values: ${bufferC}`);
}

allocateAndCompute([3, 4], [4, 3], [3, 3]);
allocateAndCompute([4, 4], [4, 4], [4, 4]);
allocateAndCompute([5, 4], [4, 5], [5, 5]);
The following code showcases the computation with optional outputs.
const context = navigator.ml.createContext();

// Build a graph with two outputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [3, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, 3]};
const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5);
const b = builder.constant(descB, bufferB);
const descC = {type: 'float32', dimensions: [3, 3]};
const bufferC = new Float32Array(sizeOfShape(descC.dimensions)).fill(1);
const c = builder.constant(descC, bufferC);
const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = builder.build({'d': d, 'e': e});

const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5);
const inputs = {'a': bufferA};

// Compute d.
const bufferD = new Float32Array(sizeOfShape([3, 3]));
graph.compute(inputs, {'d': bufferD});
console.log(`values: ${bufferD}`);

// Compute e.
const bufferE = new Float32Array(sizeOfShape([3, 3]));
graph.compute(inputs, {'e': bufferE});
console.log(`values: ${bufferE}`);

8. Examples

The following code gets the MLContext object.
const context = navigator.ml.createContext({powerPreference: 'low-power'});
The following code builds a graph as:
constant1 ---+
             +--- Add ---> intermediateOutput1 ---+
input1    ---+                                    |
                                                  +--- Mul---> output
constant2 ---+                                    |
             +--- Add ---> intermediateOutput2 ---+
input2    ---+
// Use tensors in 4 dimensions.
const TENSOR_DIMS = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = {type: 'float32', dimensions: TENSOR_DIMS};

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);
Compile the graph up to the output operand.
// Compile the constructed graph.
const graph = builder.build({'output': output});
The following code executes the compiled graph.
// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);
const outputBuffer = new Float32Array(TENSOR_SIZE);

// Execute the compiled graph with the specified inputs.
const inputs = {
  'input1': inputBuffer1,
  'input2': inputBuffer2,
};
const outputs = {'output': outputBuffer};
graph.compute(inputs, outputs);

console.log('Output value: ' + outputBuffer);
// Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25

9. Appendices

9.1. MLOperandType and ArrayBufferView compatibility

MLOperandType ArrayBufferView
float32 Float32Array
int32 Int32Array
uint32 Uint32Array
int8 Int8Array
uint8 Uint8Array

clarify the usage of ArrayBufferView for float16. [Issue #webmachinelearning/webnn#127]

10. Acknowledgements

This specification follows the concepts of the Android Neural Networks API C API.

Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.

Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.

Thanks to Sangwhan Moon and the W3C Technical Architecture Group for review of this specification for web architecture fit, design consistency and developer ergonomics.

Thanks to W3C Privacy Interest Group for privacy and security review and feedback.

Thanks to Alex Gough and the Chrome Security team for security review and questions.

Thanks to Michal Karzynski for sharing practical guidelines and learnings from ONNX.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[NUMPY-BROADCASTING-RULE]
The SciPy community. General Broadcasting Rules of NumPy. July 2019. URL: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html#general-broadcasting-rules
[PERMISSIONS-POLICY-1]
Ian Clelland. Permissions Policy. 16 July 2020. WD. URL: https://www.w3.org/TR/permissions-policy-1/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WEBGL-1]
Dean Jackson; Jeff Gilbert. WebGL Specification, Version 1.0. 9 August 2017. URL: https://www.khronos.org/registry/webgl/specs/latest/1.0/
[WEBGPU]
Dzmitry Malyshau; Kai Ninomiya. WebGPU. ED. URL: https://gpuweb.github.io/gpuweb/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[Batch-Normalization]
Sergey Ioffe; Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. March 2015. URL: https://arxiv.org/abs/1502.03167
[ContextualLoss]
Roey Mechrez; Itamar Talmi; Lihi Zelnik-Manor. The Contextual Loss for Image Transformation with Non-Aligned Data. July 2018. URL: https://arxiv.org/abs/1803.02077
[DeepLabv3+]
Liang-Chieh Chen; et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. August 2018. URL: https://arxiv.org/abs/1802.02611
[DeepMoji]
Bjarke Felbo; et al. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. October 2017. URL: https://arxiv.org/abs/1708.00524
[ELU]
Djork-Arné Clevert; Thomas Unterthiner; Sepp Hochreiter. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). February 2016. URL: https://arxiv.org/abs/1511.07289
[FaceForensics++]
Andreas Rössler; et al. FaceForensics++. January 2019. URL: https://github.com/ondyari/FaceForensics
[FaceNet]
Florian Schroff; Dmitry Kalenichenko; James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. June 2015. URL: https://arxiv.org/abs/1503.03832
[FAN]
Adrian Bulat; Georgios Tzimiropoulos. How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks). September 2017. URL: https://arxiv.org/abs/1703.07332
[GNMT]
Minh-Thang Luong; Eugene Brevdo; Rui Zhao. Neural Machine Translation (seq2seq) Tutorial. May 2017. URL: https://github.com/tensorflow/nmt
[GRU]
Kyunghyun Cho; et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. September 2014. URL: https://arxiv.org/pdf/1406.1078.pdf
[HR-TIME-3]
Yoav Weiss. High Resolution Time. 28 January 2022. WD. URL: https://www.w3.org/TR/hr-time-3/
[IM2TXT]
Oriol Vinyals; et al. Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge. September 2016. URL: https://arxiv.org/abs/1609.06647
[Instance-Normalization]
Dmitry Ulyanov; Andrea Vedaldi; Victor Lempitsky. Instance Normalization: The Missing Ingredient for Fast Stylization. July 2016. URL: https://arxiv.org/abs/1607.08022
[LeakyReLU]
Andrew L. Maas; Awni Y. Hannun; Andrew Y. Ng. Rectifier Nonlinearities Improve Neural Network Acoustic Models. June 2013. URL: https://pdfs.semanticscholar.org/367f/2c63a6f6a10b3b64b8729d601e69337ee3cc.pdf
[MaskR-CNN]
Kaiming He; et al. Mask R-CNN. January 2018. URL: https://arxiv.org/abs/1703.06870
[MobileNetV3]
Andrew Howard; et al. Searching for MobileNetV3. November 2019. URL: https://arxiv.org/pdf/1905.02244
[MODELS]
Machine Learning for the Web Community Group. The first-wave models. 2020. URL: https://github.com/webmachinelearning/webnn/blob/master/op_compatibility/first_wave_models.md
[OpenNMT]
Guillaume Klein; et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation. March 2017. URL: https://arxiv.org/abs/1701.02810
[PairedCycleGAN]
Huiwen Chang; et al. PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup. June 2018. URL: http://openaccess.thecvf.com/content_cvpr_2018/html/Chang_PairedCycleGAN_Asymmetric_Style_CVPR_2018_paper.html
[PoseNet]
Dan Oved. Real-time Human Pose Estimation in the Browser with TensorFlow.js. May 2018. URL: https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5
[RNNoise]
Jean-Marc Valin. Recurrent neural network for audio noise reduction. September 2017. URL: https://github.com/xiph/rnnoise
[SECURITY-PRIVACY-QUESTIONNAIRE]
Theresa O'Connor; Peter Snyder. Self-Review Questionnaire: Security and Privacy. 16 December 2021. NOTE. URL: https://www.w3.org/TR/security-privacy-questionnaire/
[SRGAN]
Christian Ledig; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. May 2017. URL: https://arxiv.org/abs/1609.04802
[SSD]
Wei Liu; et al. SSD: Single Shot MultiBox Detector. December 2016. URL: https://arxiv.org/abs/1512.02325
[Video-Summarization-with-LSTM]
Ke Zhang; et al. Video summarization with long short-term memory. October 2016. URL: http://www-scf.usc.edu/~zhan355/ke_eccv2016.pdf
[YOLO]
Joseph Redmon; et al. You Only Look Once: Unified, Real-Time Object Detection. May 2016. URL: https://arxiv.org/abs/1506.02640

IDL Index

interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;

enum MLDevicePreference {
  "default",
  "gpu",
  "cpu"
};

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {
  MLDevicePreference devicePreference = "default";
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  MLContext createContext(optional MLContextOptions options = {});
  MLContext createContext(WebGLRenderingContext glContext);
  MLContext createContext(GPUDevice gpuDevice);
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {};

enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;

  // The dimensions field is only required for tensor operands.
  // The negative value means an unknown dimension.
  sequence<long> dimensions;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperator {};

typedef record<DOMString, MLOperand> MLNamedOperands;

dictionary MLBufferResourceView {
  required (WebGLBuffer or GPUBuffer) resource;
  unsigned long long offset = 0;
  unsigned long long size;
};

typedef (ArrayBufferView or MLBufferResourceView) MLBufferView;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor desc);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor desc, MLBufferView bufferView);

  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandType type = "float32");

  // Compile the graph up to the specified output operands
  MLGraph build(MLNamedOperands outputs);
};

dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  long axis = 1;
  float epsilon = 1e-5;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                             optional MLBatchNormalizationOptions options = {});
};

dictionary MLClampOptions {
  float minValue;
  float maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand x, optional MLClampOptions options = {});
  MLOperator clamp(optional MLClampOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, long axis);
};

enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

enum MLAutoPad {
  "explicit",
  "same-upper",
  "same-lower"
};

dictionary MLConv2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter, optional MLConv2dOptions options = {});
};

enum MLConvTranspose2dFilterOperandLayout {
  "iohw",
  "hwoi",
  "ohwi"
};

dictionary MLConvTranspose2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  sequence<long> outputPadding;
  sequence<long> outputSizes;
  MLAutoPad autoPad = "explicit";
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConvTranspose2dFilterOperandLayout filterLayout = "iohw";
  MLOperand bias;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand convTranspose2d(MLOperand input, MLOperand filter,
                            optional MLConvTranspose2dOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};

partial interface MLGraphBuilder {
  MLOperand abs(MLOperand x);
  MLOperand ceil(MLOperand x);
  MLOperand cos(MLOperand x);
  MLOperand exp(MLOperand x);
  MLOperand floor(MLOperand x);
  MLOperand log(MLOperand x);
  MLOperand neg(MLOperand x);
  MLOperand sin(MLOperand x);
  MLOperand tan(MLOperand x);
};

dictionary MLEluOptions {
  float alpha = 1;
};

partial interface MLGraphBuilder {
  MLOperand elu(MLOperand x, optional MLEluOptions options = {});
  MLOperator elu(optional MLEluOptions options = {});
};

dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};

enum MLRecurrentNetworkWeightLayout {
  "zrn",  // update-reset-new gate ordering
  "rzn"   // reset-update-new gate ordering
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input, MLOperand weight, MLOperand recurrentWeight, 
                        long steps, long hiddenSize, optional MLGruOptions options = {});
};

dictionary MLGruCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight, 
                  MLOperand hiddenState, long hiddenSize, optional MLGruCellOptions options = {});
};

dictionary MLHardSigmoidOptions {
  float alpha = 0.2;
  float beta = 0.5;
};

partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand x, optional MLHardSigmoidOptions options = {});
  MLOperator hardSigmoid(optional MLHardSigmoidOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand x);
  MLOperator hardSwish();
};

dictionary MLInstanceNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  float epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input, 
                                optional MLInstanceNormalizationOptions options = {});
};

dictionary MLLeakyReluOptions {
  float alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand x, optional MLLeakyReluOptions options = {});
  MLOperator leakyRelu(optional MLLeakyReluOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b);
};

dictionary MLLinearOptions {
  float alpha = 1;
  float beta = 0;
};

partial interface MLGraphBuilder {
  MLOperand linear(MLOperand x, optional MLLinearOptions options = {});
  MLOperator linear(optional MLLinearOptions options = {});
};

enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions {
  MLPaddingMode mode = "constant";
  float value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input, MLOperand padding, optional MLPadOptions options = {});
};

enum MLRoundingType {
  "floor",
  "ceil"
};

dictionary MLPool2dOptions {
  sequence<long> windowDimensions;
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  MLInputOperandLayout layout = "nchw";
  MLRoundingType roundingType = "floor";
  sequence<long> outputSizes;
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};

dictionary MLReduceOptions {
  sequence<long> axes = null;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand relu(MLOperand x);
  MLOperator relu();
};

enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResample2dOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<long> sizes;
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input, sequence<long> newShape);
};

partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand x);
  MLOperator sigmoid();
};

dictionary MLSliceOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input, sequence<long> starts, sequence<long> sizes,
                optional MLSliceOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand x);
};

dictionary MLSoftplusOptions {
  float steepness = 1;
};

partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand x, optional MLSoftplusOptions options = {});
  MLOperator softplus(optional MLSoftplusOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand x);
  MLOperator softsign();
};

dictionary MLSplitOptions {
  long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(MLOperand input,
                          (unsigned long or sequence<unsigned long>) splits,
                          optional MLSplitOptions options = {});
};

dictionary MLSqueezeOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand squeeze(MLOperand input, optional MLSqueezeOptions options = {});
};

partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand x);
  MLOperator tanh();
};

dictionary MLTransposeOptions {
  sequence<long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};

typedef (MLBufferView or WebGLTexture or GPUTexture) MLResource;

dictionary MLInput {
  required MLResource resource;
  required sequence<long> dimensions;
};

typedef record<DOMString, (MLResource or MLInput)> MLNamedInputs;
typedef record<DOMString, MLResource> MLNamedOutputs;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {
  undefined compute(MLNamedInputs inputs, MLNamedOutputs outputs);
};

Issues Index

Document operations susceptible to out-of-bounds access as a guidance to implementers.
Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.
Hinting partially mitigates the concern. Investigate additional mitigations.
Describe the algorithm steps for this.[[context]] created from WebGLRenderingContext and GPUDevice.
clarify the usage of ArrayBufferView for float16. [Issue #webmachinelearning/webnn#127]