1. Introduction
The Web Neural Network API defines a web-friendly hardware-agnostic abstraction layer that makes use of Machine Learning capabilities of operating systems and underlying hardware platforms without being tied to platform-specific capabilities. The abstraction layer addresses the requirements of key Machine Learning JavaScript frameworks and also allows web developers familiar with the ML domain to write custom code without the help of libraries. A complementary Model Loader API defines a higher-level abstraction targeting primarily web developers.
For an illustrated introduction, please see the explainer.
2. Use cases
2.1. Application Use Cases
This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models].
Note: Please be aware that some of the use cases described here are, by their very nature, privacy-invasive. Developers who are planning to use the API for such use cases should ensure that the API is being used to benefit users, for purposes that users understand and approve. They should apply the Ethical Principles for Web Machine Learning [webmachinelearning-ethics] and implement appropriate privacy risk mitigations such as transparency, data minimisation, and user controls.
2.1.1. Person Detection
A user opens a web-based video conferencing application, but she temporarily leaves her room. The application watches whether she is in front of her PC by using object detection (for example, approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.
When she comes back, the application automatically detects her and notifies other online users that she is active now.
2.1.2. Semantic Segmentation
A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not want her room and the people in the background to be visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+] or [MaskR-CNN] to semantically split an image into segments and replaces segments that represent other people and background with another picture.
2.1.3. Skeleton Detection
A web-based video conferencing application tracks the pose of a user’s skeleton by running a machine learning model that allows for real-time human pose estimation, such as [PoseNet], to recognize her gestures and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking on the teleconference.
2.1.4. Face Recognition
There are multiple people in the conference room and they join an online meeting using a web-based video conferencing application. The application detects faces of participants by using object detection (for example, approaches such as [SSD]) and checks whether each face was present at the previous meeting by running a machine learning model such as [FaceNet], which verifies whether two faces are identical.
2.1.5. Facial Landmark Detection
A user wants to find new glasses that fit her beautifully on an online glasses store. The online store offers a web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks like eyes, nose, mouth, etc. When she chooses a pair of glasses, the simulator properly renders the selected glasses on the detected position of the eyes on her facial image.
2.1.6. Style Transfer
A user is looking for cosmetics on an online store and wondering which color may fit her face. The online store shows sample facial makeup images of cosmetics, and offers a makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. She can use the simulator to check how the selected makeup looks on her face.
2.1.7. Super Resolution
A web-based video conferencing application is receiving a video stream from its peer, but the resolution of the video becomes lower due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.
2.1.8. Image Captioning
For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt], which predicts explanatory words for the presentation slides.
2.1.9. Machine Translation
Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates each message into a different language.
2.1.10. Emotion Analysis
A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji], which infers emotion from input texts, and displays an emoji that represents the estimated emotion.
2.1.11. Video Summarization
A web-based video conferencing application records received video streams, and it needs to reduce the amount of recorded video data to be stored. The application generates a short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].
2.1.12. Noise Suppression
A web-based video conferencing application records received audio streams, which usually contain background noise. The application applies real-time noise suppression using a Recurrent Neural Network such as [RNNoise] to suppress dynamic background noise, such as a baby crying or a dog barking, and improve the audio experience in video conferences.
2.1.13. Detecting fake video
A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web. A fake video can swap the speaker’s face with the president’s face to incite a user politically or to manipulate the user’s opinion. Deepfake detection applications such as [FaceForensics++] analyze videos and protect a user against fake videos or images. When she watches a fake video on the web, the detection application alerts her to the fraudulent video in real time.
2.2. Framework Use Cases
This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.
2.2.1. Custom Layer
A web application developer wants to run a DNN model on the WebNN API. However, she has found that some activation functions like [LeakyReLU], [ELU], etc. are not included in the WebNN API. To address this issue, she constructs custom layers of the additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.
2.2.2. Network Concatenation
A web application uses a DNN model whose upper convolutional layers and lower fully-connected layers are stored in separate files, since the model data of the fully-connected layers is periodically updated by fine-tuning on the server side.
Therefore, the application first downloads both partial model files and concatenates them into a single model. When the model is updated, the application downloads the fine-tuned part of the model and replaces only the fully-connected layers with it.
2.2.3. Performance Adaptation
A web application developer is concerned about the performance of her DNN model on mobile devices. She has confirmed that it may run too slowly on mobile devices which do not have GPU acceleration. To address this issue, her web application queries the WebNN API to confirm whether acceleration is available, so that the application can display a warning on devices without acceleration.
After several weeks, she has developed a tiny DNN model that can even run on CPU. In order to accommodate CPU execution, she modifies the application so that the application loads the tiny model in the case of CPU-only devices.
2.2.4. Operation Level Execution
A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on a hardware device, such as a CPU, GPU or ML accelerator. To avoid unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute-intensive operation, such as 2D convolution or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on the selected device.
2.2.5. Integration with real-time video processing
The user experience of WebRTC-based video conferencing is enhanced using real-time video processing. For example, background blur implemented using a § 2.1.2 Semantic Segmentation model blurs the background in the user’s live camera feed. To satisfy the performance requirements of this use case, the WebNN API integrates with primitives from other Web APIs that make up the media pipeline to allow WebNN API-based transformation of real-time video streams.
3. Security Considerations
This API is disabled by default in all cross-origin frames using the § 7.2.1 Permissions Policy Integration. This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.
This API allows creation of an MLContext from a GPUDevice defined by the WebGPU specification. See WebGPU Security Considerations for more information regarding security characteristics of this context.
Once the graph is fully constructed and compiled, the input shapes into each of the operations in the graph are inferred and finalized. Bounds checking occurs when the compute method, which executes the graph against the actual data, is invoked. No actual data is bound to the compiled graph before this stage. It is the implementation’s responsibility to make sure proper bounds checking occurs against the shapes of the data already inferred by that time.
Issue: Document operations susceptible to out-of-bounds access as guidance to implementers.
As a future-proofing measure, the API design allows certain operations that can be generically emulated to be deprecated for security, performance, or other reasons without breaking compatibility. This is made possible by high-level functions that are defined in terms of smaller primitive operations defined in this specification. This enables a native implementation of a high-level function to be replaced with a polyfill implementation.
Issue: Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.
In order to not allow an attacker to target a specific implementation that may contain a flaw, the § 6.2 Device Selection mechanism is a hint only, and the concrete device selection is left to the implementation; a user agent could, for instance, choose never to run a model on a device with known vulnerabilities. As a further mitigation, no device enumeration mechanism is defined.
Issue: Hinting partially mitigates the concern. Investigate additional mitigations.
The API design minimizes the attack surface for the compiled computational graph. The MLGraphBuilder interface that hosts the various operations is a data definition API and as such doesn’t execute anything, only constructs data. It follows that the potential for an attack is limited to the point where data is bound to the graph before executing it by invoking the MLContext.compute() method. This enables implementers to focus on hardening the MLContext.compute() method, for example by making sure it honors the boundary of data and fails appropriately when the bounds are not respected.
Purpose-built Web APIs for measuring high-resolution time mitigate against timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [hr-time-3]. The practical deployment of WebNN implementations is likely to bring enough jitter to make timing attacks impractical (e.g. because they would use IPC) but implementers are advised to consider and test their implementations against timing attacks.
3.1. Guidelines for new operations
To ensure operations defined in this specification are shaped in a way they can be implemented securely, this section includes guidelines on how operations are expected to be defined to reduce potential for implementation problems. These guidelines are expected to evolve over time to align with industry best practices:
- Prefer simplicity of arguments
- Don’t use parsers for complex data formats
- If an operation can be decomposed to low level primitives:
  - Add an informative emulation path
  - Prefer primitives over new high level operations but consider performance consequences
- Operations should follow a consistent style for inputs and attributes
- Operation families such as pooling and reduction should share API shape and options
- Formalize failure cases into test cases whenever possible
- When in doubt, leave it out: API surface should be as small as possible required to satisfy the use cases, but no smaller
- Try to keep the API free of implementation details that might inhibit future evolution, do not overspecify
- Fail fast: the sooner the web developer is informed of an issue, the better
In general, always consider the security and privacy implications as documented in [security-privacy-questionnaire] by the Technical Architecture Group and the Privacy Interest Group when adding new features.
4. Privacy Considerations
This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox.
This API exposes the minimum amount of information necessary to address the identified § 2 Use cases for the best performance and reliability of results.
No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform.
Note: The group is soliciting further input on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.
Unlike WebGPU, this API does not intrinsically support custom shader authoring, and as a result is not prone to timing attacks that rely on shader caches or other persistent data. The API builds upon pre-existing shaders and lower level primitives of the browser or the underlying OS. Web developers who interface with GPUDevice are expected to be aware of WebGPU compilation cache considerations.
The WebGPU API identifies machine-specific artifacts as a privacy consideration. Given the WebNN API defines means to record an ML workload onto a WebGPU-compatible GPUCommandBuffer, compute unit scheduling may under certain circumstances introduce a fingerprint. However, similarly to WebGPU, such fingerprints are identical across most or all of the devices of each vendor, mitigating the concern. Furthermore, software implementations can be used to further eliminate such artifacts.
The WebNN API defines two developer-settable preferences to help inform § 6.2 Device Selection and allow the implementation to better select the most appropriate underlying execution device for the workload. Device type normatively indicates the kind of device and is either "cpu" or "gpu". If this type cannot be satisfied, an OperationError exception is thrown, thus this type can in some cases add two bits of entropy to the fingerprint. Power preference indicates preference as related to the power consumption and is considered a hint only and as such does not increase entropy of the fingerprint.
If a future version of this specification introduces support for a new device type that can only support a subset of MLOperandTypes, that may introduce a new fingerprint.
In general, implementers of this API are expected to apply WebGPU Privacy Considerations to their implementations where applicable.
5. Ethical Considerations
The Working Group has started documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account. The Working Group publishes and maintains an Ethical Principles for Web Machine Learning document [webmachinelearning-ethics] open to contributions from the wider community via a dedicated GitHub repository.
6. Programming Model
6.1. Overview
At the heart of neural networks is a computational graph of mathematical operations. These operations are the building blocks of modern machine learning technologies in computer vision, natural language processing, and robotics. The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks.
The MLGraph interface represents a compiled computational graph that is immutable (that is, a model).
The MLGraphBuilder interface serves as a builder (factory) to create an MLGraph.
An MLOperand is a representation of data that flows within the computational graph, which includes input values for inference, constants (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference. At inference time, every MLOperand will be bound to a tensor (the actual data).
The MLGraphBuilder interface enables the creation of MLOperands. A key part of the MLGraphBuilder interface are the operations (such as MLGraphBuilder.gemm() and MLGraphBuilder.softmax()). The operations have functional semantics, with no side effects. Each operation invocation conceptually returns a distinct new value, without changing the value of any other MLOperand.
The runtime values (of MLOperands) are tensors, which are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).
As mentioned above, the operations have a functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation of operations such as reshape, or slice, or squeeze may return a view of its input tensor that shares the same buffer as the input tensor. (In the case of reshape or squeeze, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values.
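The buffer sharing described above can be pictured with JavaScript typed arrays, which are themselves views over an ArrayBuffer. The following is only an analogy, not how any WebNN implementation is required to represent tensors; the variable names and the shape/data record layout are illustrative assumptions.

```javascript
// A 2x3 "tensor" stored row-major in a single buffer.
const buffer = new ArrayBuffer(6 * Float32Array.BYTES_PER_ELEMENT);
const tensor = new Float32Array(buffer);
tensor.set([0, 1, 2, 3, 4, 5]);

// "reshape" to 3x2: only the shape metadata differs, the entire buffer is shared.
const reshaped = { shape: [3, 2], data: new Float32Array(buffer) };

// "slice" of the second row: a view over part of the same buffer.
const secondRow = { shape: [1, 3], data: tensor.subarray(3, 6) };

// A write through one view is visible through the others, because no data was copied.
secondRow.data[0] = 42;
console.log(tensor[3], reshaped.data[3]); // 42 42
```

Because the operations have no side effects, an implementation can hand out such views for intermediate values without the sharing being observable to script.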
Before the execution, the computation graph that is used to compute one or more specified outputs needs to be compiled and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion.
There are multiple ways by which the graph may be compiled. The MLGraphBuilder.build() method compiles the graph immediately on the calling thread, which must be a worker thread running on CPU or GPU device, and returns an MLGraph. The MLGraphBuilder.buildAsync() method compiles the graph in the background without blocking the calling thread, and returns a Promise that resolves to an MLGraph. Both compilation methods produce an MLGraph that represents a compiled graph for optimal execution.
Once the MLGraph is constructed, there are multiple ways by which the graph may be executed. The MLContext.compute() method carries out the execution of the graph immediately on the calling thread, which must also be a worker thread, either on a CPU or GPU device. The execution produces the results of the computation from all the inputs bound to the graph.
The MLContext.computeAsync() method performs the execution of the graph asynchronously, either on a parallel timeline in a separate worker thread for CPU execution or on a GPU timeline in a GPU command queue. This method returns immediately without blocking the calling thread while the actual execution is offloaded to a different timeline. This type of execution is appropriate when the responsiveness of the calling thread is critical to good user experience. The computation results are placed at the bound outputs when the operation completes successfully on the offloaded timeline, at which time the calling thread is signaled. This type of execution supports both CPU and GPU devices.
In both the MLContext.compute() and MLContext.computeAsync() execution methods, the caller supplies the input values using MLNamedArrayBufferViews, binding the input MLOperands to their values. The caller then supplies pre-allocated buffers for output MLOperands using MLNamedArrayBufferViews.
The MLCommandEncoder interface created by the MLContext.createCommandEncoder() method supports a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their application. It does this by placing the workload required to initialize and compute the results of the operations in the graph onto a GPUCommandBuffer. The callers are responsible for the eventual submission of this workload on the GPUQueue through the WebGPU queue submission mechanism. Once the submitted workload is completely executed, the result is available in the bound output buffers.
6.2. Device Selection
An MLContext interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. In addition to the default method of creation with MLContextOptions, an MLContext could also be created from a specific GPUDevice that is already in use by the application, in which case the corresponding GPUBuffer resources used as graph constants, as well as the GPUTexture as graph inputs must also be created from the same device. In a multi-adapter configuration, the device used for MLContext must be created from the same adapter as the device used to allocate the resources referenced in the graph.
In a situation when a GPU context executes a graph with a constant or an input in the system memory as an ArrayBufferView, the input content is automatically uploaded from the system memory to the GPU memory, and downloaded back to the system memory of an ArrayBufferView output buffer at the end of the graph execution. These data upload and download cycles only occur when the execution device requires the data to be copied out of and back into the system memory, such as in the case of the GPU. They do not occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller’s perspective.
When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account the application’s power preference and device type specified in the MLPowerPreference and MLDeviceType options.
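As an illustration of creating a context with MLContextOptions, the sketch below requests a GPU context and falls back to the CPU. The helper name createPreferredContext is hypothetical, and the sketch assumes that an unsatisfiable device type surfaces as an OperationError thrown by createContext(); this section does not state where that exception is raised.

```javascript
// Hypothetical helper: returns an MLContext, or null when WebNN is unavailable.
// `nav` is the Navigator (or WorkerNavigator) object to probe.
function createPreferredContext(
    nav, options = { deviceType: 'gpu', powerPreference: 'low-power' }) {
  if (!nav || !nav.ml) {
    return null; // navigator.ml is not exposed in this environment
  }
  try {
    return nav.ml.createContext(options);
  } catch (err) {
    // Assumption: an unsatisfiable device type is reported as an OperationError.
    if (err.name !== 'OperationError') throw err;
    return nav.ml.createContext({ deviceType: 'cpu' });
  }
}

const preferredContext = createPreferredContext(globalThis.navigator);
console.log(preferredContext ? 'WebNN context created' : 'WebNN is not available');
```

Since the power preference is only a hint, the sketch retries only on the device type, which is the option described as normative.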
The following table summarizes the types of resources supported by contexts created through the different creation methods:
Creation method | ArrayBufferView | GPUBuffer | GPUTexture |
---|---|---|---|
MLContextOptions | Yes | No | No |
GPUDevice | Yes | Yes | Yes |
7. API
7.1. navigator.ml
An ML object is available in the Window and DedicatedWorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml:
interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;
7.2. ML
enum MLDeviceType {
  "cpu",
  "gpu"
};

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {
  MLDeviceType deviceType = "cpu";
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  MLContext createContext(optional MLContextOptions options = {});
  MLContext createContext(GPUDevice gpuDevice);
};
The createContext() method steps are:
- If this's relevant global object's associated Document is not allowed to use the webnn feature, then throw a "SecurityError" DOMException and abort these steps.
- Let context be a new MLContext object.
- Switch on the method’s first argument:
  MLContextOptions
    - Set context.[[contextType]] to default.
    - Set context.[[deviceType]] to the value of MLContextOptions's deviceType.
    - Set context.[[powerPreference]] to the value of MLContextOptions's powerPreference.
  GPUDevice
    - Set context.[[contextType]] to webgpu.
    - Set context.[[deviceType]] to "gpu".
    - Set context.[[powerPreference]] to "default".
- Return context.
7.2.1. Permissions Policy Integration
This specification defines a policy-controlled feature identified by the string "webnn". Its default allowlist is 'self'.
7.3. MLContext
The MLContext interface represents a global state of neural network compute workload and execution processes. Each MLContext object has associated context type, device type and power preference.
The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:
- "default": Context created per user preference options.
- "webgpu": Context created from WebGPU device.
The device type indicates the kind of device used for the context. It is one of the following:
- "cpu": Provides the broadest compatibility and usability across all client devices with varying degrees of performance.
- "gpu": Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
The power preference indicates preference as related to power consumption. It is one of the following:
- "default": Let the user agent select the most suitable behavior.
- "high-performance": Prioritizes execution speed over power consumption.
- "low-power": Prioritizes power consumption over other considerations such as execution speed.
typedef record<DOMString, ArrayBufferView> MLNamedArrayBufferViews;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {};
MLContext has the following internal slots:
- [[contextType]] of type context type: The MLContext's context type.
- [[deviceType]] of type device type: The MLContext's device type.
- [[powerPreference]] of type power preference: The MLContext's power preference.
When the [[contextType]] is set to default with the MLContextOptions.deviceType set to gpu, the user agent is responsible for creating an internal GPU device that operates within the context and is capable of ML workload submission on behalf of the calling application. In this setting however, only ArrayBufferView inputs and outputs are allowed in and out of the graph execution since the application has no way to know what type of internal GPU device is being created on its behalf. In this case, the user agent is responsible for automatic uploads and downloads of the inputs and outputs to and from the GPU memory using this internal device.
7.3.1. Synchronous Execution
Synchronously carries out the computational workload of a compiled graph MLGraph on the calling thread, which must be a worker thread, to produce results as defined by the operations in the graph. This method of execution requires an MLContext created with MLContextOptions. Otherwise, it throws an OperationError exception.
partial interface MLContext {
  [Exposed=(DedicatedWorker)]
  undefined compute(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
};
Arguments:
- graph: an MLGraph. The compiled graph to be executed.
- inputs: an MLNamedArrayBufferViews. The resources of inputs.
- outputs: an MLNamedArrayBufferViews. The pre-allocated resources of required outputs.
Returns: undefined.
- If any of the following requirements are unmet, then throw a DataError DOMException and stop.
  - For each key -> value of inputs:
    - graph.[[inputDescriptors]][key] must exist.
    - Let inputDesc be graph.[[inputDescriptors]][key].
    - The type of ArrayBufferView value must match inputDesc.type according to this table.
    - value.[[ByteLength]] must equal the byte length of inputDesc.
  - For each key -> value of outputs:
    - graph.[[outputDescriptors]][key] must exist.
    - Let outputDesc be graph.[[outputDescriptors]][key].
    - The type of ArrayBufferView value must match outputDesc.type according to this table.
    - value.[[ByteLength]] must equal the byte length of outputDesc.
- For each key -> value of inputs:
  - Let inputDesc be graph.[[inputDescriptors]][key].
  - Let inputTensor be a new tensor for graph.[[implementation]].
  - Set the data type of inputTensor to the one that matches the element type of ArrayBufferView value.
  - Set the dimensions of inputTensor to inputDesc.dimensions.
  - Set the values of inputTensor to the values of value.
  - Set the input of graph.[[implementation]] that is associated with key to inputTensor.
- For each key -> value of outputs:
  - Issue a compute request for output of graph.[[implementation]] that is associated with key.
  - Wait for the compute request to be completed.
  - If there is an error returned by graph.[[implementation]], then:
    - Throw an OperationError DOMException and stop.
  - Else:
    - Let outputTensor be the output tensor returned by graph.[[implementation]].
    - If the data type of outputTensor doesn’t match the element type of ArrayBufferView value, then throw a DataError DOMException and stop.
    - If the byte length of outputTensor is not equal to value.[[ByteLength]], then:
      - Throw a DataError DOMException and stop.
    - Else:
      - Set the values of value to the values of outputTensor.
- Return undefined.
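The validation performed in the first step above can be sketched in plain JavaScript. This is an illustrative model, not the normative algorithm: the names VIEW_TYPES, byteLengthOf, dataError and validateViews are hypothetical, the type-to-view mapping is an assumption standing in for the table the step refers to (which is not reproduced in this section), and a plain Error named "DataError" stands in for the DataError DOMException.

```javascript
// Assumed mapping from operand type to ArrayBufferView type (stand-in for the spec table).
const VIEW_TYPES = {
  float32: Float32Array,
  int32: Int32Array,
  uint32: Uint32Array,
  int8: Int8Array,
  uint8: Uint8Array,
};

// Stand-in for throwing a DataError DOMException.
function dataError(message) {
  return Object.assign(new Error(message), { name: 'DataError' });
}

// Byte length of a descriptor: element count times element size.
function byteLengthOf(desc) {
  const elementCount = desc.dimensions.reduce((n, d) => n * d, 1);
  return elementCount * VIEW_TYPES[desc.type].BYTES_PER_ELEMENT;
}

// Checks each named view against its descriptor: the name must exist,
// the view type must match, and the byte lengths must be equal.
function validateViews(descriptors, views) {
  for (const [key, value] of Object.entries(views)) {
    if (!Object.prototype.hasOwnProperty.call(descriptors, key)) {
      throw dataError(`no descriptor named "${key}"`);
    }
    const desc = descriptors[key];
    const expectedView = VIEW_TYPES[desc.type];
    if (expectedView === undefined || !(value instanceof expectedView)) {
      throw dataError(`view type for "${key}" does not match ${desc.type}`);
    }
    if (value.byteLength !== byteLengthOf(desc)) {
      throw dataError(`byte length mismatch for "${key}"`);
    }
  }
}
```

Running the same check over both the inputs and the outputs before any tensor is created means a malformed binding is rejected before data reaches the underlying implementation.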
7.3.1.1. Examples
// Helper assumed by this example (not part of the WebNN API):
// returns the number of elements in a tensor of the given shape.
function sizeOfShape(shape) {
  return shape.reduce((a, b) => a * b, 1);
}

const context = navigator.ml.createContext();

// Build a graph with two outputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [3, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, 3]};
const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5);
const b = builder.constant(descB, bufferB);
const descC = {type: 'float32', dimensions: [3, 3]};
const bufferC = new Float32Array(sizeOfShape(descC.dimensions)).fill(1);
const c = builder.constant(descC, bufferC);
const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = builder.build({'d': d, 'e': e});

const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5);
const inputs = {'a': bufferA};

// Compute d.
const bufferD = new Float32Array(sizeOfShape([3, 3]));
context.compute(graph, inputs, {'d': bufferD});
console.log(`values: ${bufferD}`);

// Compute e.
const bufferE = new Float32Array(sizeOfShape([3, 3]));
context.compute(graph, inputs, {'e': bufferE});
console.log(`values: ${bufferE}`);
7.3.2. Asynchronous Execution
Asynchronously carries out the computational workload of a compiled graph MLGraph on a separate timeline, either on a worker thread for the CPU execution, or on a GPU timeline for the submission of GPU workload on the command queue. The asynchronous nature of this call avoids blocking the calling thread while the computation for result is ongoing. This method of execution requires an MLContext created with MLContextOptions. Otherwise, it throws an OperationError exception.
partial interface MLContext {
  Promise<undefined> computeAsync(MLGraph graph,
                                  MLNamedArrayBufferViews inputs,
                                  MLNamedArrayBufferViews outputs);
};
Arguments:
-
graph: an
MLGraph
. The compiled graph to be executed. -
inputs: an
MLNamedArrayBufferViews
. The resources of inputs. -
outputs: an
MLNamedArrayBufferViews
. The pre-allocated resources of required outputs.
Returns: Promise<undefined>.
-
If any of the following requirements are unmet, then throw a
DataError
DOMException
and stop.-
For each key -> value of inputs:
-
graph.
[[inputDescriptors]]
[key] must exist. -
Let inputDesc be graph.
[[inputDescriptors]]
[key]. -
The type of
ArrayBufferView
value must match inputDesc.type
according to this table. -
value.[[ByteLength]] must equal the byte length of inputDesc.
-
-
For each key -> value of outputs:
-
graph.
[[outputDescriptors]]
[key] must exist. -
Let outputDesc be graph.
[[outputDescriptors]]
[key]. -
The type of
ArrayBufferView
value must match outputDesc.type
according to this table. -
value.[[ByteLength]] must equal the byte length of outputDesc.
-
-
-
Let promise be a new promise.
-
For each key -> value of inputs:
-
Let inputDesc be graph.
[[inputDescriptors]]
[key]. -
Let inputTensor be a new tensor for graph.
[[implementation]]
. -
Set the data type of inputTensor to the one that matches the element type of
ArrayBufferView
value. -
Set the dimensions of inputTensor to inputDesc.
dimensions
. -
Set the values of inputTensor to the values of value.
-
Set the input of graph.
[[implementation]]
that is associated with key to inputTensor.
-
-
For each key -> value of outputs:
-
Issue a compute request for output of graph.
[[implementation]]
that is associated with key. -
Wait for the compute request to be completed.
-
If there is an error returned by graph.
[[implementation]]
, then:-
reject promise with an
OperationError
and stop.
-
-
Else:
-
Let outputTensor be the output tensor returned by graph.
[[implementation]]
. -
Let outputDesc be graph.
[[outputDescriptors]]
[key]. -
If the data type of outputTensor doesn’t match the element type of
ArrayBufferView
value, then throw a DataError
DOMException
and stop. -
If the byte length of outputTensor is not equal to byte length of outputDesc, then:
-
reject promise with an
OperationError
and stop.
-
-
Else:
-
Set the values of value to the values of outputTensor.
-
-
If all compute requests are completed, resolve promise and stop.
-
-
-
Return promise.
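The validation requirements at the start of the steps above can be sketched in plain JavaScript. This is only an illustration and not the normative algorithm. The helper name validateNamedViews, the VIEW_TYPES lookup (a stand-in for the table referenced in the steps) and the descriptor objects with a precomputed byteLength field are all assumptions made for this example.

```javascript
// Hypothetical stand-in for the operand-type table referenced in the steps.
// float16 is omitted because this sketch does not assume a typed array for it.
const VIEW_TYPES = {
  float32: Float32Array,
  int32: Int32Array,
  uint32: Uint32Array,
  int8: Int8Array,
  uint8: Uint8Array,
};

// Returns a message describing the first unmet requirement, or null when all
// named views satisfy the descriptors. Each descriptor is assumed to look like
// { type: 'float32', byteLength: 48 }.
function validateNamedViews(descriptors, views) {
  for (const [key, value] of Object.entries(views)) {
    if (!Object.prototype.hasOwnProperty.call(descriptors, key)) {
      return `no descriptor named "${key}"`;
    }
    const desc = descriptors[key];
    const expectedType = VIEW_TYPES[desc.type];
    if (expectedType === undefined || !(value instanceof expectedType)) {
      return `view type does not match "${desc.type}" for "${key}"`;
    }
    if (value.byteLength !== desc.byteLength) {
      return `byte length mismatch for "${key}"`;
    }
  }
  return null;
}
```

A caller following the steps would run this check for both inputs and outputs and turn a non-null result into a DataError DOMException before the promise is created.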
7.3.3. WebGPU Interoperability
Creates an MLCommandEncoder interface used to record the ML workload onto a WebGPU-compatible GPUCommandBuffer, which allows mixing ML workload with other GPU workload in an application that leverages WebGPU. This method only succeeds on an MLContext created with GPUDevice. Otherwise, it throws an OperationError exception.
partial interface MLContext {
  MLCommandEncoder createCommandEncoder();
};
Returns: MLCommandEncoder. The command encoder used to record ML workload on the GPU.
7.4. MLOperandDescriptor
enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;

  // The dimensions field is only required for tensor operands.
  sequence<unsigned long> dimensions;
};
The byte length of an MLOperandDescriptor desc is the value returned by the following steps:
-
Let elementLength be 1.
-
For each dimension of desc.
dimensions
:-
Set elementLength to elementLength × dimension.
-
-
Let elementSize be the element size of one of the
ArrayBufferView
types that matches desc.type
according to this table. -
Return elementLength × elementSize.
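The steps above translate almost directly into JavaScript. The sketch below is illustrative only: the function name byteLength and the ELEMENT_SIZE lookup are hypothetical stand-ins for the table referenced in the steps, and treating a descriptor without dimensions as a single element is an assumption based on the note that dimensions is only required for tensor operands.

```javascript
// Element sizes in bytes per operand type (hypothetical stand-in for the
// table referenced in the steps).
const ELEMENT_SIZE = {
  float32: 4,
  float16: 2,
  int32: 4,
  uint32: 4,
  int8: 1,
  uint8: 1,
};

function byteLength(desc) {
  // Let elementLength be 1, then multiply by each dimension.
  let elementLength = 1;
  for (const dimension of desc.dimensions ?? []) {
    elementLength *= dimension;
  }
  const elementSize = ELEMENT_SIZE[desc.type];
  if (elementSize === undefined) {
    throw new TypeError(`unknown operand type: ${desc.type}`);
  }
  return elementLength * elementSize;
}

// A 3x4 float32 tensor holds 12 elements of 4 bytes each.
console.log(byteLength({ type: 'float32', dimensions: [3, 4] })); // 48
```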
7.5. MLOperand
An MLOperand represents an intermediary graph being constructed as a result of compositing parts of an operation into a fully composed operation.
For instance, an MLOperand may represent a constant feeding to an operation or the result from combining multiple constants together into an operation. See also § 6 Programming Model.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {};
See also § 3.1 Guidelines for new operations
7.6. MLOperator
Objects implementing the MLOperator interface represent activation function types. As a generic construct, this interface may be reused for other types in a future version of this specification.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperator {};
The MLOperator interface can simply be implemented as a struct that holds a string type of the activation function along with other properties needed. The actual creation of the activation function, e.g. a § 7.7.24 sigmoid or § 7.7.21 relu, can then be deferred until the rest of the graph is ready to connect with it, for example during the construction of § 7.7.4 conv2d.
7.7. MLGraphBuilder
The MLGraphBuilder interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.
typedef record<DOMString, MLOperand> MLNamedOperands;

dictionary MLBufferResourceView {
  required GPUBuffer resource;
  unsigned long long offset = 0;
  unsigned long long size;
};

typedef (ArrayBufferView or MLBufferResourceView) MLBufferView;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor desc);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor desc, MLBufferView bufferView);

  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandType type = "float32");

  // Compile the graph up to the specified output operands synchronously.
  [Exposed=(DedicatedWorker)]
  MLGraph build(MLNamedOperands outputs);

  // Compile the graph up to the specified output operands asynchronously.
  Promise<MLGraph> buildAsync(MLNamedOperands outputs);
};
The MLGraphBuilder.build() and MLGraphBuilder.buildAsync() methods compile the graph builder state up to the specified output operands into a compiled graph according to the type of MLContext that creates it. Since this operation can be costly in some machine configurations, the calling thread of the MLGraphBuilder.build() method must only be a worker thread to avoid potential disruption of the user experience. When the [[contextType]] of the MLContext is set to default, the compiled graph is initialized right before the MLGraph is returned. This graph initialization stage is important for optimal performance of the subsequent graph executions. See § 7.9.1 Graph Initialization for more detail.
7.7.1. batchNormalization
Normalize the tensor values of input features across the batch dimension using [Batch-Normalization]. For each input feature, the mean and variance values of that feature supplied in this calculation as parameters are previously computed across the batch dimension of the input during the model training phase of this operation.
dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  long axis = 1;
  float epsilon = 1e-5;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};
Arguments:
-
input: an
MLOperand
. The input N-D tensor. -
mean: an
MLOperand
. The 1-D tensor of the mean values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis. -
variance: an
MLOperand
. The 1-D tensor of the variance values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis. -
options: an optional
MLBatchNormalizationOptions
. The optional parameters of the operation.-
scale: an
MLOperand
. The 1-D tensor of the scaling values whose length is equal to the size of the input dimension denoted by options.axis. -
bias: an
MLOperand
. The 1-D tensor of the bias values whose length is equal to the size of the input dimension denoted by options.axis. -
axis: a
long
scalar. The index of the feature count dimension of the input shape to which the mean and variance values apply. When it’s not specified, the default value is 1. -
epsilon: a
float
scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified. -
activation: an
MLOperator
. The optional activation function that immediately follows the normalization operation.
-
Returns: an MLOperand
. The batch-normalized N-D tensor of the same shape as the input tensor.
When input is a 4-D tensor of the "nchw" or "nhwc" layout, options.axis should be set to 1 or 3 respectively. The axis value designates the feature or channel count dimension of the input tensor.
// Emulation with other operations, for a 4-D "nchw" input followed by a relu activation.
const shape = [1, -1, 1, 1];
return builder.relu(
  builder.add(
    builder.mul(
      builder.reshape(options.scale, shape),
      builder.div(
        builder.sub(input, builder.reshape(mean, shape)),
        builder.pow(
          builder.add(builder.reshape(variance, shape), builder.constant(options.epsilon)),
          builder.constant(0.5)))),
    builder.reshape(options.bias, shape)));
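For a single element, the arithmetic of the normalization reduces to a few scalar operations. The sketch below is a plain-number illustration and not part of the API: the name batchNormElement is hypothetical, and defaulting scale to 1 and bias to 0 when they are not supplied is an assumption. The optional activation is left out.

```javascript
// Normalizes one value x of a feature whose precomputed statistics are
// mean and variance, then applies the optional scale and bias.
function batchNormElement(x, mean, variance, { scale = 1, bias = 0, epsilon = 1e-5 } = {}) {
  return scale * (x - mean) / Math.sqrt(variance + epsilon) + bias;
}

// With epsilon set to 0: (3 - 1) / sqrt(4) = 1, then 2 * 1 + 0.5.
console.log(batchNormElement(3, 1, 4, { scale: 2, bias: 0.5, epsilon: 0 })); // 2.5
```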
7.7.2. clamp
Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions {
  float minValue;
  float maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand x, optional MLClampOptions options = {});
  MLOperator clamp(optional MLClampOptions options = {});
};
Arguments:
-
x: an
MLOperand
. The input tensor. -
options: an optional
MLClampOptions
. The optional parameters of the operation.-
minValue: a
float
scalar. Specifies the minimum value of the range. When it is not specified, the clamping is not performed on the lower limit of the range. -
maxValue: a
float
scalar. Specifies the maximum value of the range. When it is not specified, the clamping is not performed on the upper limit of the range.
-
Returns:
-
an
MLOperand
. The output tensor of the same shape as x. -
an
MLOperator
. The operator representing the clamp operation.
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return x;
  } else {
    return builder.min(x, builder.constant(options.maxValue));
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(x, builder.constant(options.minValue));
  } else {
    return builder.min(
      builder.max(x, builder.constant(options.minValue)),
      builder.constant(options.maxValue));
  }
}
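The same branching can be seen on plain numbers. The following sketch is only an illustration of the element-wise behavior. The name clampValue is hypothetical and the function operates on JavaScript numbers rather than on MLOperand objects.

```javascript
// Clamps one value; a missing bound leaves that side of the range open.
function clampValue(x, { minValue, maxValue } = {}) {
  let y = x;
  if (minValue !== undefined) y = Math.max(y, minValue);
  if (maxValue !== undefined) y = Math.min(y, maxValue);
  return y;
}

// Element-wise use over an array, similar to a relu6-style range.
console.log([-3, 0.5, 9].map((v) => clampValue(v, { minValue: 0, maxValue: 6 }))); // [ 0, 0.5, 6 ]
```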
7.7.3. concat
Concatenates the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, long axis);
};
Arguments:
-
inputs: a sequence of
MLOperand
. All input tensors must have the same shape, except for the size of the dimension to concatenate on. -
axis: a
long
scalar. The axis that the inputs concatenate along, with the value in the interval [0, N) where N is the rank of all the inputs.
Returns: an MLOperand. The concatenated tensor of all the inputs along the axis. The output tensor has the same shape except on the dimension that all the inputs concatenated along. The size of that dimension is computed as the sum of all the input sizes of the same dimension.
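The output shape rule above can be sketched as a small shape calculation. This is an illustrative helper only: the name concatOutputShape is hypothetical, shapes are assumed to be plain arrays of numbers, and the errors it throws are a choice made for this example.

```javascript
// Computes the shape of concatenating tensors with the given shapes along axis.
function concatOutputShape(inputShapes, axis) {
  const rank = inputShapes[0].length;
  if (!Number.isInteger(axis) || axis < 0 || axis >= rank) {
    throw new RangeError(`axis must be in the interval [0, ${rank})`);
  }
  const output = [...inputShapes[0]];
  for (const shape of inputShapes.slice(1)) {
    if (shape.length !== rank) {
      throw new RangeError('all inputs must have the same rank');
    }
    for (let i = 0; i < rank; i++) {
      if (i !== axis && shape[i] !== output[i]) {
        throw new RangeError(`dimension ${i} differs between inputs`);
      }
    }
    // Sizes along the concatenation axis add up.
    output[axis] += shape[axis];
  }
  return output;
}

console.log(concatOutputShape([[2, 3], [2, 5]], 1)); // [ 2, 8 ]
```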
7.7.4. conv2d
Compute a 2-D convolution given 4-D input and filter tensors.
enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

enum MLAutoPad {
  "explicit",
  "same-upper",
  "same-lower"
};

dictionary MLConv2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter, optional MLConv2dOptions options = {});
};
Arguments:
-
input: an
MLOperand
. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout. -
filter: an
MLOperand
. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout and options.groups. -
options: an optional
MLConv2dOptions
. The optional parameters of the operation.-
padding: a sequence of
long
of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0]. -
strides: a sequence of
long
of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1]. -
dilations: a sequence of
long
of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1]. -
autoPad: an
MLAutoPad
. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one. -
groups: a
long
scalar. The number of groups that input channels and output channels are divided into, default to 1. -
inputLayout: an
MLInputOperandLayout
. The default value is "nchw". This option specifies the layout format of the input and output tensors as follows:
"nchw":
-
input tensor: [batches, input_channels, height, width]
-
output tensor: [batches, output_channels, height, width]
"nhwc":
-
input tensor: [batches, height, width, input_channels]
-
output tensor: [batches, height, width, output_channels]
-
-
filterLayout: an
MLConv2dFilterOperandLayout
. The default value is "oihw". This option specifies the layout format of the filter tensor as follows:
"oihw":
-
[output_channels, input_channels/groups, height, width]
"hwio":
-
[height, width, input_channels/groups, output_channels]
"ohwi":
-
[output_channels, height, width, input_channels/groups]
"ihwo":
-
[input_channels/groups, height, width, output_channels]
-
-
bias: an
MLOperand
. The additional 1-D tensor with the shape of [output_channels] whose values are to be added to the convolution result. -
activation: an
MLOperator
. The optional activation function that immediately follows the convolution operation.
-
Returns: an MLOperand. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, the spatial dimensions or the sizes of the last two dimensions of the output tensor for the nchw input layout can be calculated as follows:
output size = 1 + (input size - filter size - (filter size - 1) * (dilation - 1) + beginning padding + ending padding) / stride
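As an illustration only, the output size formula above can be evaluated per spatial dimension with a small helper. The function name conv2dOutputSize and its option names are hypothetical. Rounding the division down with Math.floor is an assumption, since the formula itself does not state a rounding rule.

```javascript
// Output size of one spatial dimension (height or width) of a 2-D convolution.
function conv2dOutputSize(inputSize, filterSize,
    { stride = 1, dilation = 1, beginningPadding = 0, endingPadding = 0 } = {}) {
  // filter size + (filter size - 1) * (dilation - 1) is the dilated filter extent.
  const dilatedFilterSize = filterSize + (filterSize - 1) * (dilation - 1);
  const span = inputSize - dilatedFilterSize + beginningPadding + endingPadding;
  return 1 + Math.floor(span / stride);
}

// A 224-wide input, a 3-wide filter, stride 2 and one column of padding on each side.
console.log(conv2dOutputSize(224, 3, { stride: 2, beginningPadding: 1, endingPadding: 1 })); // 112
```

The helper is called once for the height and once for the width, each with its own entries from options.padding, options.strides and options.dilations.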
7.7.5. convTranspose2d
Compute a 2-D transposed convolution given 4-D input and filter tensors.
enum MLConvTranspose2dFilterOperandLayout {
  "iohw",
  "hwoi",
  "ohwi"
};

dictionary MLConvTranspose2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  sequence<long> outputPadding;
  sequence<long> outputSizes;
  MLAutoPad autoPad = "explicit";
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConvTranspose2dFilterOperandLayout filterLayout = "iohw";
  MLOperand bias;
  MLOperator activation;
};

partial interface MLGraphBuilder {
  MLOperand convTranspose2d(MLOperand input, MLOperand filter,
                            optional MLConvTranspose2dOptions options = {});
};
Arguments:
-
input: an
MLOperand
. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout. -
filter: an
MLOperand
. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout and options.groups. -
options: an optional
MLConvTranspose2dOptions
. The optional parameters of the operation.-
padding: a sequence of
long
of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0]. -
strides: a sequence of
long
of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1]. -
dilations: a sequence of
long
of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1]. -
outputPadding: a sequence of
long
of length 2. The padding values applied to each spatial dimension of the output tensor. These explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when the value of the options.strides is greater than 1. Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding value to be written to the output tensor. If not specified, the values are assumed to be [0,0]. -
outputSizes: a sequence of
long
of length 2. The sizes of the last two dimensions of the output tensor. When the output sizes are explicitly specified, the output padding values in options.outputPadding are ignored. If not specified, the output sizes are automatically computed. -
autoPad: an
MLAutoPad
. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one. -
groups: a
long
scalar. The number of groups that input channels and output channels are divided into, default to 1. -
inputLayout: an
MLInputOperandLayout
. The default value is "nchw". This option specifies the layout format of the input and output tensors as follows:
"nchw":
-
input tensor: [batches, input_channels, height, width]
-
output tensor: [batches, output_channels, height, width]
"nhwc":
-
input tensor: [batches, height, width, input_channels]
-
output tensor: [batches, height, width, output_channels]
-
-
filterLayout: an
MLConvTranspose2dFilterOperandLayout
. The default value is "iohw". This option specifies the layout format of the filter tensor as follows:
"iohw":
-
[input_channels, output_channels/groups, height, width]
"hwoi":
-
[height, width, output_channels/groups, input_channels]
"ohwi":
-
[output_channels/groups, height, width, input_channels]
-
-
bias: an
MLOperand
. The additional 1-D tensor with the shape of [output_channels] whose values are to be added to the transposed convolution result. -
activation: an
MLOperator
. The optional activation function that immediately follows the transposed convolution operation.
-
Returns: an MLOperand. The output 4-D tensor that contains the transposed convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, unless the options.outputSizes values are explicitly specified, the options.outputPadding may be needed to compute the spatial dimension values of the output tensor as follows:
output size = (input size - 1) * stride + filter size + (filter size - 1) * (dilation - 1) - beginning padding - ending padding + output padding
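The formula above can be sketched per spatial dimension as follows. The function name convTranspose2dOutputSize and its option names are hypothetical and chosen for this illustration only.

```javascript
// Output size of one spatial dimension of a 2-D transposed convolution.
function convTranspose2dOutputSize(inputSize, filterSize,
    { stride = 1, dilation = 1, beginningPadding = 0, endingPadding = 0, outputPadding = 0 } = {}) {
  return (inputSize - 1) * stride
      + filterSize + (filterSize - 1) * (dilation - 1)
      - beginningPadding - endingPadding
      + outputPadding;
}

console.log(convTranspose2dOutputSize(3, 3, { stride: 2 }));                   // 7
console.log(convTranspose2dOutputSize(3, 3, { stride: 2, outputPadding: 1 })); // 8
```

The two calls show why the output padding exists. If the division in the conv2d output size formula is rounded down, a stride-2 convolution with a filter size of 3 maps input sizes 7 and 8 both to an output size of 3, so the transposed operation needs outputPadding to pick between them.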
7.7.6. element-wise binary operations
Compute the element-wise binary addition, subtraction, multiplication, division, maximum and minimum of the two input tensors.
partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output tensor that contains the result of element-wise binary operation of the two input tensors.
The element-wise binary operation will be broadcasted according to [numpy-broadcasting-rule]. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
Operation types:
-
add: Add the values of the two input tensors, element-wise.
-
sub: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.
-
mul: Multiply the values of the two input tensors, element-wise.
-
div: Divide the values of the first input tensor with the values of the second tensor, element-wise.
-
max: Select the greater values of the two input tensors, element-wise.
-
min: Select the lesser values of the two input tensors, element-wise.
-
pow: Compute the values of the first input tensor to the power of the values of the second input tensor, element-wise.
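The broadcast rule for the output shape of these binary operations can be sketched as a shape calculation. This is an illustrative helper, not part of the API: the name broadcastShape is hypothetical, shapes are assumed to be plain arrays of positive sizes, and the compatibility check follows the commonly documented NumPy rule of aligning shapes from the trailing dimension.

```javascript
// Computes the output shape of an element-wise binary operation.
function broadcastShape(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const output = new Array(rank);
  for (let i = 0; i < rank; i++) {
    // Align from the trailing dimension; a missing leading dimension counts as 1.
    const a = shapeA[shapeA.length - 1 - i] ?? 1;
    const b = shapeB[shapeB.length - 1 - i] ?? 1;
    if (a !== b && a !== 1 && b !== 1) {
      throw new RangeError(`sizes ${a} and ${b} cannot be broadcast together`);
    }
    output[rank - 1 - i] = Math.max(a, b);
  }
  return output;
}

console.log(broadcastShape([3, 1, 5], [4, 5])); // [ 3, 4, 5 ]
```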
7.7.7. element-wise unary operations
Compute the element-wise unary operation for input tensor.
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand x);
  MLOperand ceil(MLOperand x);
  MLOperand cos(MLOperand x);
  MLOperand exp(MLOperand x);
  MLOperand floor(MLOperand x);
  MLOperand log(MLOperand x);
  MLOperand neg(MLOperand x);
  MLOperand sin(MLOperand x);
  MLOperand tan(MLOperand x);
};
Arguments:
-
x: an
MLOperand
. The input tensor.
Returns: an MLOperand. The output tensor that contains the result of element-wise unary operation of the input tensor. The shape of the output tensor is the same as the shape of input tensor.
Operation types:
-
abs: Compute the absolute value of the input tensor, element-wise.
-
ceil: Compute the ceiling of the input tensor, element-wise.
-
cos: Compute the cosine of the input tensor, element-wise.
-
exp: Compute the exponential of the input tensor, element-wise.
-
floor: Compute the floor of the input tensor, element-wise.
-
log: Compute the natural logarithm of the input tensor, element-wise.
-
neg: Compute the numerical negative value of the input tensor, element-wise.
-
sin: Compute the sine of the input tensor, element-wise.
-
tan: Compute the tangent of the input tensor, element-wise.
7.7.8. elu
Calculate the exponential linear unit function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * (exp(min(0, x)) - 1).
dictionary MLEluOptions {
  float alpha = 1;
};

partial interface MLGraphBuilder {
  MLOperand elu(MLOperand x, optional MLEluOptions options = {});
  MLOperator elu(optional MLEluOptions options = {});
};
Arguments:
-
x: an
MLOperand
. The input tensor. -
options: an optional
MLEluOptions
. The optional parameters of the operation.-
alpha: a
float
scalar multiplier, default to 1.
-
Returns:
-
an
MLOperand
. The output tensor of the same shape as x. -
an
MLOperator
. The operator representing the elu operation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(
    builder.constant(options.alpha),
    builder.sub(
      builder.exp(builder.min(builder.constant(0), x)),
      builder.constant(1))));
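On a single number, the expression for elu can be checked directly. The sketch below is only a scalar illustration. The name eluValue is hypothetical and the function works on JavaScript numbers rather than on MLOperand objects.

```javascript
// max(0, x) + alpha * (exp(min(0, x)) - 1) for one value.
function eluValue(x, alpha = 1) {
  return Math.max(0, x) + alpha * (Math.exp(Math.min(0, x)) - 1);
}

// Positive inputs pass through unchanged; negative inputs approach -alpha.
console.log(eluValue(2));  // 2
console.log(eluValue(-1)); // about -0.632
```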
7.7.9. gemm
Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A is a 2-D tensor with shape [M, K] or [K, M], B is a 2-D tensor with shape [K, N] or [N, K], and C is broadcastable to the shape [M, N]. A and B may optionally be transposed prior to the calculation.
dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};
Arguments:
-
a: an
MLOperand
. The first input 2-D tensor with shape [M, K] if aTranspose is false, or [K, M] if aTranspose is true. -
b: an
MLOperand
. The second input 2-D tensor with shape [K, N] if bTranspose is false, or [N, K] if bTranspose is true. -
options: an optional
MLGemmOptions
. The optional parameters of the operation.-
c: an
MLOperand
. The third input tensor. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape [M, N] according to [numpy-broadcasting-rule]. When it is not specified, the computation is done as if c is a scalar 0.0. -
alpha: a
float
scalar multiplier for the first input, default to 1.0. -
beta: a
float
scalar multiplier for the third input, default to 1.0. -
aTranspose: a
boolean
indicating if the first input should be transposed prior to calculating the output, default to false. -
bTranspose: a
boolean
indicating if the second input should be transposed prior to calculating the output, default to false.
-
Returns: an MLOperand
. The output 2-D tensor of shape [M, N] that contains the calculated product of all the inputs.
if (options.aTranspose)
  a = builder.transpose(a);

if (options.bTranspose)
  b = builder.transpose(b);

let ab = builder.matmul(builder.mul(builder.constant(options.alpha), a), b);
return (options.c ? builder.add(ab, builder.mul(builder.constant(options.beta), options.c)) : ab);
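To make the expression alpha * A * B + beta * C concrete, the following sketch evaluates it on nested JavaScript arrays. It is a reference illustration only and not how an implementation is expected to compute gemm. The names gemmReference, transpose2d and cAt are hypothetical, and c is assumed to be either omitted, a number, a 1-D array, or a 2-D array whose dimensions are 1 or match [M, N].

```javascript
// Transposes a 2-D nested array.
function transpose2d(m) {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Reads c at [i][j] with unidirectional broadcasting to [M, N].
function cAt(c, i, j) {
  if (c === undefined) return 0;                                   // treated as scalar 0.0
  if (typeof c === 'number') return c;                             // scalar
  if (!Array.isArray(c[0])) return c.length === 1 ? c[0] : c[j];   // shape [1] or [N]
  const row = c.length === 1 ? c[0] : c[i];                        // shape [1, ...] or [M, ...]
  return row.length === 1 ? row[0] : row[j];
}

function gemmReference(a, b,
    { c, alpha = 1.0, beta = 1.0, aTranspose = false, bTranspose = false } = {}) {
  const A = aTranspose ? transpose2d(a) : a; // [M, K]
  const B = bTranspose ? transpose2d(b) : b; // [K, N]
  const M = A.length, K = B.length, N = B[0].length;
  const out = [];
  for (let i = 0; i < M; i++) {
    const row = [];
    for (let j = 0; j < N; j++) {
      let sum = 0;
      for (let k = 0; k < K; k++) sum += A[i][k] * B[k][j];
      row.push(alpha * sum + beta * cAt(c, i, j));
    }
    out.push(row);
  }
  return out;
}

console.log(gemmReference([[1, 2], [3, 4]], [[5, 6], [7, 8]])); // [ [ 19, 22 ], [ 43, 50 ] ]
```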
7.7.10. gru
Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of the network.
enum MLRecurrentNetworkWeightLayout {
  "zrn",  // update-reset-new gate ordering
  "rzn"   // reset-update-new gate ordering
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                          long steps, long hiddenSize, optional MLGruOptions options = {});
};
Arguments:
-
input: an
MLOperand
. The input 3-D tensor of shape [steps, batch_size, input_size]. -
weight: an
MLOperand
. The 3-D input weight tensor of shape [num_directions, 3 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the layout argument. -
recurrentWeight: an
MLOperand
. The 3-D recurrent weight tensor of shape [num_directions, 3 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the layout argument. -
steps: a
long
scalar. The number of time steps in the recurrent network. The value must be greater than 0. -
hiddenSize: a
long
scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state. -
options: an optional
MLGruOptions
. The optional parameters of the operation.-
bias: an
MLOperand
. The 2-D input bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
recurrentBias: an
MLOperand
. The 2-D recurrent bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
initialHiddenState: an
MLOperand
. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it’s assumed to be a tensor filled with zero. -
resetAfter: a
boolean
indicating whether to apply the reset gate after or before matrix multiplication. Default to true. -
returnSequence: a
boolean
indicating whether to also return the entire sequence with every cell output from each time step in it in addition to the cell output of the last time step. Default to false. -
direction: a
MLRecurrentNetworkDirection
. The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions. -
layout: a
MLRecurrentNetworkWeightLayout
. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the second dimension of the weight and bias tensor shape. When not specified, the default layout is "zrn". -
activations: a sequence of
MLOperator
. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, it’s assumed to be the sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") function respectively.
-
Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the cell output from the last time step of the network. Additionally, if returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, num_directions, batch_size, hidden_size] containing every cell output from each time step in the temporal sequence.
const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;

if (!hiddenState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

let sequence = null;
let cellWeight = [];
let cellRecurrentWeight = [];
let cellBias = [];
let cellRecurrentBias = [];

for (let slot = 0; slot < numDirections; ++slot) {
  cellWeight.push(
    builder.squeeze(builder.slice(weight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellRecurrentWeight.push(
    builder.squeeze(builder.slice(recurrentWeight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellBias.push(options.bias ?
    (builder.squeeze(builder.slice(options.bias, [slot, 0], [1, -1]), { axes: [0] })) : null);
  cellRecurrentBias.push(options.recurrentBias ?
    (builder.squeeze(builder.slice(options.recurrentBias, [slot, 0], [1, -1]), { axes: [0] })) : null);
}

for (let step = 0; step < steps; ++step) {
  let cellHidden = [];
  let cellOutput = null;

  for (let slot = 0; slot < numDirections; ++slot) {
    cellHidden.push(
      builder.squeeze(builder.slice(hiddenState, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  }

  for (let slot = 0; slot < numDirections; ++slot) {
    let slice = (slot == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let cellInput = builder.squeeze(
      builder.slice(input, [slice, 0, 0], [1, -1, -1]), { axes: [0] });

    let result = builder.reshape(
      builder.gruCell(
        cellInput, cellWeight[slot], cellRecurrentWeight[slot],
        cellHidden[slot], hiddenSize, {
          bias: cellBias[slot], recurrentBias: cellRecurrentBias[slot],
          resetAfter: options.resetAfter, layout: options.layout,
          activations: options.activations }),
      [1, -1, hiddenSize]);

    cellOutput = (cellOutput ? builder.concat([cellOutput, result], 0) : result);
  }

  hiddenState = cellOutput;

  if (options.returnSequence) {
    cellOutput = builder.reshape(cellOutput, [1, numDirections, -1, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, cellOutput], 0) : cellOutput);
  }
}

return (sequence ? [hiddenState, sequence] : [hiddenState]);
7.7.11. gruCell
A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
dictionary MLGruCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};
partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                    MLOperand hiddenState, long hiddenSize,
                    optional MLGruCellOptions options = {});
};
Arguments:
- input: an MLOperand. The input 2-D tensor of shape [batch_size, input_size].
- weight: an MLOperand. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the layout argument.
- recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the layout argument.
- hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batch_size, hidden_size].
- hiddenSize: a long scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruCellOptions. The optional parameters of the operation.
  - bias: an MLOperand. The 1-D input bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
  - recurrentBias: an MLOperand. The 1-D recurrent bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
  - resetAfter: a boolean indicating whether to apply the reset gate after or before matrix multiplication. Defaults to true.
  - layout: an MLRecurrentNetworkWeightLayout. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the first dimension of the weight and bias tensor shapes. When not specified, the default layout is "zrn".
  - activations: a sequence of MLOperator. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, they default to the sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") functions respectively.
Returns: an MLOperand. The 2-D tensor of shape [batch_size, hidden_size], the cell output hidden state of a single time step of the recurrent network.
const one = builder.constant(1);
const zero = builder.constant(0);

// update gate
let z = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, -1]))
      )
    )
  )
);

// reset gate
let r = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1]))
      )
    )
  )
);

// new gate
let n;
if (options.resetAfter) {
  n = builder.tanh(
    builder.add(
      (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.mul(
          r,
          builder.add(
            (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
            )
          )
        )
      )
    )
  );
} else {
  n = builder.tanh(
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
      ),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.matmul(
          builder.mul(r, hiddenState),
          builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        )
      )
    )
  );
}

// compute the new hidden state
return builder.add(builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));
7.7.12. hardSigmoid
Calculate the non-smooth function used in place of a sigmoid function on the input tensor.
dictionary MLHardSigmoidOptions {
  float alpha = 0.2;
  float beta = 0.5;
};
partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand x, optional MLHardSigmoidOptions options = {});
  MLOperator hardSigmoid(optional MLHardSigmoidOptions options = {});
};
Arguments:
- x: an MLOperand. The input tensor.
- options: an optional MLHardSigmoidOptions. The optional parameters of the operation.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the hard sigmoid operation.
return builder.max(
  builder.min(
    builder.add(
      builder.mul(builder.constant(options.alpha), x),
      builder.constant(options.beta)),
    builder.constant(1)),
  builder.constant(0));
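The decomposition above amounts to max(0, min(1, alpha * x + beta)) per element. The following plain JavaScript sketch evaluates that expression on a single number; it does not use the WebNN API, and the standalone function name and its options-object signature are hypothetical, chosen only to mirror the MLHardSigmoidOptions defaults.

```javascript
// Scalar sketch of the hard sigmoid expression; not part of the WebNN API.
// The default alpha and beta mirror the MLHardSigmoidOptions defaults.
function hardSigmoid(x, { alpha = 0.2, beta = 0.5 } = {}) {
  // Clamp the linear function alpha * x + beta into the [0, 1] range.
  return Math.max(Math.min(alpha * x + beta, 1), 0);
}
```

With the default options the function is linear between x = -2.5 and x = 2.5 and saturates at 0 and 1 outside that interval.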
7.7.13. hardSwish
Computes the nonlinear function y = x * max(0, min(6, (x + 3))) / 6 that is introduced by [MobileNetV3] on the input tensor element-wise.
partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand x);
  MLOperator hardSwish();
};
Arguments:
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the hard-swish operation.
return builder.div(
  builder.mul(
    x,
    builder.max(
      builder.constant(0),
      builder.min(
        builder.constant(6),
        builder.add(x, builder.constant(3))))),
  builder.constant(6));
7.7.14. instanceNormalization
Normalize the input features using [Instance-Normalization]. Unlike § 7.7.1 batchNormalization where the mean and variance values used in the calculation are previously computed across the batch dimension during the model training phase, the mean and variance values used in the calculation of an instance normalization are computed internally on the fly per input feature.
dictionary MLInstanceNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  float epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};
partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};
Arguments:
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLInstanceNormalizationOptions. The optional parameters of the operation.
  - scale: an MLOperand. The 1-D tensor of the scaling values whose length is equal to the size of the feature dimension of the input, e.g. for the input tensor with nchw layout, the feature dimension is 1.
  - bias: an MLOperand. The 1-D tensor of the bias values whose length is equal to the size of the feature dimension of the input, e.g. for the input tensor with nchw layout, the feature dimension is 1.
  - epsilon: a float scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
  - layout: an MLInputOperandLayout. This option specifies the layout format of the input. The default value is "nchw".
Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as the input tensor.
// The mean reductions happen over the spatial dimensions of the input
// e.g. axis 2 and 3 of the input tensor.
const reduceOptions = { axes: [2, 3], keepDimensions: true };
const mean = builder.reduceMean(input, reduceOptions);
const variance = builder.reduceMean(
  builder.pow(
    builder.sub(input, mean),
    builder.constant(2)),
  reduceOptions
);

// The scale and bias values are applied per input feature
// e.g. axis 1 of the input tensor.
const shape = [1, -1, 1, 1];
return builder.add(
  builder.mul(
    builder.reshape(options.scale, shape),
    builder.div(
      builder.sub(input, mean),
      builder.pow(
        // epsilon is a plain number, so it is wrapped in a constant operand.
        builder.add(variance, builder.constant(options.epsilon)),
        builder.constant(0.5))
    )
  ),
  builder.reshape(options.bias, shape)
);
7.7.15. leakyRelu
Calculate the leaky version of the rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha ∗ min(0, x).
dictionary MLLeakyReluOptions {
  float alpha = 0.01;
};
partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand x, optional MLLeakyReluOptions options = {});
  MLOperator leakyRelu(optional MLLeakyReluOptions options = {});
};
Arguments:
- x: an MLOperand. The input tensor.
- options: an optional MLLeakyReluOptions. The optional parameters of the operation.
  - alpha: a float scalar multiplier. Defaults to 0.01.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the leaky relu operation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(
    builder.constant(options.alpha),
    builder.min(builder.constant(0), x)));
7.7.16. matmul
Compute the matrix product of two input tensors.
partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output N-D tensor that contains the matrix product of two input tensors.
The operation behaves as follows:
- If both a and b are 2-D, they are multiplied like conventional matrices and produce a 2-D tensor as the output.
- If either a or b is N-D, N > 2, it is treated as a stack of matrices with dimensions corresponding to the last two indices. The matrix multiplication is broadcast accordingly by following [numpy-broadcasting-rule]. The output is an N-D tensor whose rank is the maximum rank of the input tensors. For each dimension, except the last two, of the output tensor, its size is the maximum size along that dimension of the input tensors.
- If a is 1-D, it is converted to a 2-D tensor by prepending a 1 to its dimensions.
- If b is 1-D, it is converted to a 2-D tensor by appending a 1 to its dimensions.
- If both a and b are 1-D, the operation is a vector dot-product, which produces a scalar output.
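The shape rules for 2-D and N-D inputs can be sketched as a small shape calculation in plain JavaScript. This is an illustrative helper only: matmulOutputShape is a hypothetical name, not part of the WebNN API, and the sketch deliberately leaves out the 1-D cases, since the text above does not state the resulting output shape for a single 1-D input.

```javascript
// Hypothetical helper: computes the output shape of a matmul for inputs of
// rank 2 or higher, broadcasting the leading (batch) dimensions.
function matmulOutputShape(a, b) {
  if (a.length < 2 || b.length < 2) {
    throw new Error("this sketch only covers inputs of rank 2 or higher");
  }
  const [m, k] = a.slice(-2);
  const [k2, n] = b.slice(-2);
  if (k !== k2) {
    throw new Error("inner dimensions do not match");
  }
  const batchA = a.slice(0, -2);
  const batchB = b.slice(0, -2);
  const rank = Math.max(batchA.length, batchB.length);
  const batch = [];
  for (let i = 0; i < rank; ++i) {
    // Align the batch dimensions on the right; missing ones count as 1.
    const ia = i - (rank - batchA.length);
    const ib = i - (rank - batchB.length);
    const da = ia >= 0 ? batchA[ia] : 1;
    const db = ib >= 0 ? batchB[ib] : 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error("batch dimensions are not broadcastable");
    }
    batch.push(Math.max(da, db));
  }
  return [...batch, m, n];
}
```

For example, shapes [5, 1, 2, 3] and [7, 3, 4] are treated as stacks of 2x3 and 3x4 matrices whose batch dimensions broadcast to [5, 7].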
7.7.17. linear
Calculate a linear function y = alpha * x + beta on the input tensor.
dictionary MLLinearOptions {
  float alpha = 1;
  float beta = 0;
};
partial interface MLGraphBuilder {
  MLOperand linear(MLOperand x, optional MLLinearOptions options = {});
  MLOperator linear(optional MLLinearOptions options = {});
};
Arguments:
- x: an MLOperand. The input tensor.
- options: an optional MLLinearOptions. The optional parameters of the operation.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the linear operation.
return builder.add(
  builder.mul(x, builder.constant(options.alpha)),
  builder.constant(options.beta));
7.7.18. pad
Inflate the tensor with constant or mirrored values on the edges.
enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};
dictionary MLPadOptions {
  MLPaddingMode mode = "constant";
  float value = 0;
};
partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input, MLOperand padding, optional MLPadOptions options = {});
};
Arguments:
- input: an MLOperand. The input tensor.
- padding: an MLOperand. The 2-D tensor of integer values indicating the number of padding values to add at the beginning and end of each input dimension. The tensor has shape [n, 2] where n is the rank of the input tensor. For each dimension D of input, padding[D, 0] indicates how many values to add before the content in that dimension, and padding[D, 1] indicates how many values to add after the content in that dimension.
- options: an optional MLPadOptions. The optional parameters of the operation.
  - mode: an MLPaddingMode. The different ways to pad the tensor. When not set, it’s assumed to be "constant".
  - value: a float. The pad value when the options.mode is set to "constant". When not set, it’s assumed to be 0.
Returns: an MLOperand. The padded output tensor.
// input: [[1,2,3], [4,5,6]]
const input = builder.constant(
  { type: 'float32', dimensions: [2, 3] }, new Float32Array([1, 2, 3, 4, 5, 6]));

// padding: [[1,1], [2,2]]
const padding = builder.constant(
  { type: 'float32', dimensions: [2, 2] }, new Float32Array([1, 1, 2, 2]));

// "constant" padded:
//   [[0,0,0,0,0,0,0],
//    [0,0,1,2,3,0,0],
//    [0,0,4,5,6,0,0],
//    [0,0,0,0,0,0,0]]
builder.pad(input, padding);

// "edge" padded:
//   [[1,1,1,2,3,3,3],
//    [1,1,1,2,3,3,3],
//    [4,4,4,5,6,6,6],
//    [4,4,4,5,6,6,6]]
builder.pad(input, padding, { mode: "edge" });

// "reflection" padded:
//   [[6,5,4,5,6,5,4],
//    [3,2,1,2,3,2,1],
//    [6,5,4,5,6,5,4],
//    [3,2,1,2,3,2,1]]
builder.pad(input, padding, { mode: "reflection" });

// "symmetric" padded:
//   [[2,1,1,2,3,3,2],
//    [2,1,1,2,3,3,2],
//    [5,4,4,5,6,6,5],
//    [5,4,4,5,6,6,5]]
builder.pad(input, padding, { mode: "symmetric" });
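To make the four padding modes concrete, the following plain JavaScript sketch reproduces the padded results listed in the sample for a 2-D nested array. It does not use the WebNN API; sourceIndex and pad2d are hypothetical helper names, and the sketch assumes the padding amount never exceeds what a single mirror pass can supply (size - 1 for "reflection", size for "symmetric").

```javascript
// Maps an index in the padded coordinate space back to a source index along
// one dimension, or returns -1 when the constant pad value should be used.
function sourceIndex(i, size, mode) {
  if (i >= 0 && i < size) return i;
  switch (mode) {
    case "edge":
      // Repeat the nearest edge value.
      return i < 0 ? 0 : size - 1;
    case "reflection":
      // Mirror around the edge value without repeating it.
      return i < 0 ? -i : 2 * (size - 1) - i;
    case "symmetric":
      // Mirror around the edge, repeating the edge value.
      return i < 0 ? -i - 1 : 2 * size - 1 - i;
    default:
      return -1; // "constant"
  }
}

// Pads a 2-D nested array; padding is [[top, bottom], [left, right]].
function pad2d(input, padding, mode = "constant", value = 0) {
  const rows = input.length;
  const cols = input[0].length;
  const [[top, bottom], [left, right]] = padding;
  const output = [];
  for (let r = -top; r < rows + bottom; ++r) {
    const sr = sourceIndex(r, rows, mode);
    const row = [];
    for (let c = -left; c < cols + right; ++c) {
      const sc = sourceIndex(c, cols, mode);
      row.push(sr < 0 || sc < 0 ? value : input[sr][sc]);
    }
    output.push(row);
  }
  return output;
}
```

Calling pad2d([[1, 2, 3], [4, 5, 6]], [[1, 1], [2, 2]], "reflection") yields the same rows as the "reflection" result in the sample.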
7.7.19. pooling operations
Compute a mean, L2 norm, or max reduction operation across all the elements within the moving window over the input tensor. See the description of each type of reduction in § 7.7.20 reduction operations.
enum MLRoundingType {
  "floor",
  "ceil"
};
dictionary MLPool2dOptions {
  sequence<long> windowDimensions;
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  MLInputOperandLayout layout = "nchw";
  MLRoundingType roundingType = "floor";
  sequence<long> outputSizes;
};
partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};
Arguments:
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.layout.
- options: an optional MLPool2dOptions. The optional parameters of the operation.
  - windowDimensions: a sequence of long of length 2. The dimensions of the sliding window, [window_height, window_width]. If not present, the window dimensions are assumed to be the height and width dimensions of the input shape.
  - padding: a sequence of long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to a value other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one.
  - layout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    "nchw":
    - input tensor: [batches, channels, height, width]
    - output tensor: [batches, channels, height, width]
    "nhwc":
    - input tensor: [batches, height, width, channels]
    - output tensor: [batches, height, width, channels]
  - roundingType: an MLRoundingType. The option specifies the rounding function used to compute the output shape.
  - outputSizes: a sequence of long of length 2. The sizes of the two spatial dimensions of the output tensor. When the output sizes are explicitly specified, the options.roundingType is ignored. If not specified, the output sizes are automatically computed.
Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of layout. More specifically, if the options.roundingType is "floor", the spatial dimensions of the output tensor can be calculated as follows:
output size = floor(1 + (input size - filter size + beginning padding + ending padding) / stride)
or if options.roundingType is "ceil":
output size = ceil(1 + (input size - filter size + beginning padding + ending padding) / stride)
// 'global' max pooling
builder.maxPool2d(input);
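The two output size formulas can be sketched as a single plain JavaScript function for one spatial dimension. The function name pool2dOutputSize and its parameter names are hypothetical and not part of the WebNN API; the sketch follows the formulas exactly as written above, which do not include a dilation term.

```javascript
// Computes the output size of one spatial dimension from the formulas for
// the "floor" and "ceil" rounding types. Hypothetical helper, not WebNN API.
function pool2dOutputSize(inputSize, filterSize, beginningPadding,
                          endingPadding, stride, roundingType = "floor") {
  const unrounded =
    1 + (inputSize - filterSize + beginningPadding + endingPadding) / stride;
  return roundingType === "ceil" ? Math.ceil(unrounded) : Math.floor(unrounded);
}
```

When the window covers the whole input dimension with no padding, as in the 'global' pooling case, the function returns 1 regardless of the rounding type.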
7.7.20. reduction operations
Reduce the input along the dimensions given in axes.
dictionary MLReduceOptions {
  sequence<long> axes = null;
  boolean keepDimensions = false;
};
partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};
Arguments:
- input: an MLOperand. The input tensor.
- options: an optional MLReduceOptions. The optional parameters of the operation.
Returns: an MLOperand. The reduced output tensor.
Reduction types:
- L1: Compute the L1 norm of all the input values along the axes.
- L2: Compute the L2 norm of all the input values along the axes.
- LogSum: Compute the log value of the sum of all the input values along the axes.
- LogSumExp: Compute the log value of the sum of the exponent of all the input values along the axes.
- Max: Compute the maximum value of all the input values along the axes.
- Mean: Compute the average value of all the input values along the axes.
- Min: Compute the minimum value of all the input values along the axes.
- Product: Compute the product of all the input values along the axes.
- Sum: Compute the sum of all the input values along the axes.
- SumSquare: Compute the sum of the square of all the input values along the axes.
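As a rough illustration of the reduction types, the sketch below applies each one to a flat array of numbers in plain JavaScript, which corresponds to reducing a 1-D tensor along its only axis. The reductions object and its keys are hypothetical names, not part of the WebNN API, and the L1 and L2 entries assume the usual definitions of those norms (sum of absolute values, and square root of the sum of squares).

```javascript
// Each entry reduces a flat array of numbers to a single number.
const sumOf = (values) => values.reduce((acc, v) => acc + v, 0);

const reductions = {
  l1: (values) => sumOf(values.map((v) => Math.abs(v))),
  l2: (values) => Math.sqrt(sumOf(values.map((v) => v * v))),
  logSum: (values) => Math.log(sumOf(values)),
  logSumExp: (values) => Math.log(sumOf(values.map((v) => Math.exp(v)))),
  max: (values) => Math.max(...values),
  mean: (values) => sumOf(values) / values.length,
  min: (values) => Math.min(...values),
  product: (values) => values.reduce((acc, v) => acc * v, 1),
  sum: (values) => sumOf(values),
  sumSquare: (values) => sumOf(values.map((v) => v * v)),
};
```

For the input [-3, 4], for instance, the L1 reduction gives 7, the L2 reduction gives 5, and the SumSquare reduction gives 25.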
7.7.21. relu
Compute the rectified linear function of the input tensor.
partial interface MLGraphBuilder {
  MLOperand relu(MLOperand x);
  MLOperator relu();
};
Arguments:
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the relu operation.
return builder.max(builder.constant(0), x);
7.7.22. resample2d
Resample the tensor values from the source to the destination spatial dimensions according to the scaling factors.
enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};
dictionary MLResample2dOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<long> sizes;
  sequence<long> axes;
};
partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};
Arguments:
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLResample2dOptions. The optional parameters of the operation.
  - mode: an MLInterpolationMode. The interpolation algorithm used to fill the output tensor values. If not set, it is assumed to be the Nearest Neighbor interpolation.
  - scales: a sequence of float of length 2. Each value represents the scaling factor used to scale in each spatial dimension of input, [scale_height, scale_width]. If not set, the values are assumed to be [1.0, 1.0].
  - sizes: a sequence of long of length 2. The target sizes for each spatial dimension of input, [size_height, size_width]. When the target sizes are specified, the options.scales argument is ignored as the scaling factor values are derived from the target sizes of each spatial dimension of input.
  - axes: a sequence of long of length 2. The two consecutive dimensions of the input tensor to which the interpolation algorithm applies. The valid values in the sequence are [0, 1], [1, 2] or [2, 3]. When not specified, the sequence is assumed to be [2, 3].
Returns: an MLOperand. The output 4-D tensor.
7.7.23. reshape
Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical dimensions for the subsequent operations.
partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input, sequence<long> newShape);
};
Arguments:
- input: an MLOperand. The input tensor.
- newShape: a sequence of long. The shape of the output tensor. The number of elements implied by newShape must be the same as the number of elements in the input tensor. Only one component of newShape can be the special value of -1. The size of the dimension with the value -1 is computed so that the total size remains constant.
Returns: an MLOperand. The output tensor. The values of the output tensor are the same as values of the input tensor. The shape of the output tensor is specified by the newShape argument.
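The handling of the special value -1 in newShape can be sketched as a shape calculation in plain JavaScript. The helper name resolveNewShape is hypothetical and not part of the WebNN API; the error handling is an assumption of this sketch, since the text above only states the constraints.

```javascript
// Resolves a newShape that may contain one -1 component so that the total
// number of elements matches the input shape. Hypothetical helper.
function resolveNewShape(inputShape, newShape) {
  const total = inputShape.reduce((acc, d) => acc * d, 1);
  const wildcardCount = newShape.filter((d) => d === -1).length;
  if (wildcardCount > 1) {
    throw new Error("only one component of newShape can be -1");
  }
  const known = newShape
    .filter((d) => d !== -1)
    .reduce((acc, d) => acc * d, 1);
  if (wildcardCount === 0) {
    if (known !== total) {
      throw new Error("newShape must keep the number of elements unchanged");
    }
    return [...newShape];
  }
  if (known === 0 || total % known !== 0) {
    throw new Error("the -1 dimension cannot be resolved to an integer size");
  }
  // The -1 component takes whatever size keeps the total constant.
  return newShape.map((d) => (d === -1 ? total / known : d));
}
```

For an input of shape [2, 3, 4], a newShape of [4, -1] resolves to [4, 6].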
7.7.24. sigmoid
Compute the sigmoid function of the input tensor. The calculation follows the expression 1 / (exp(-x) + 1).
partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand x);
  MLOperator sigmoid();
};
Arguments:
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the sigmoid operation.
return builder.div(
  builder.constant(1),
  builder.add(
    builder.exp(builder.neg(x)),
    builder.constant(1)));
7.7.25. slice
Produce a slice of the input tensor.
dictionary MLSliceOptions {
  sequence<long> axes;
};
partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input, sequence<long> starts, sequence<long> sizes,
                  optional MLSliceOptions options = {});
};
Arguments:
- input: an MLOperand. The input tensor.
- starts: a sequence of long. The starting indices of the slice along the corresponding axes of the input shape. A negative index value is interpreted as counting back from the end. For example, the value -1
- sizes: a sequence of long. The lengths of the slice along the corresponding axes of the input shape. The length value of -1 selects all the remaining elements from the starting index of the given axis.
- options: an optional MLSliceOptions. The optional parameters of the operation.
  - axes: a sequence of long. The dimensions of the input shape to which starts and sizes apply. The values in the sequence are either within the [0, r-1] range where r is the input tensor rank, or the [-r, -1] range where negative values mean counting back from the end of the input shape. When not specified, the sequence is assumed to be [0,1,..r-1].
Returns: an MLOperand. The output tensor of the same rank as the input tensor with tensor values stripped to the specified starting and ending indices in each dimension.
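How starts, sizes and axes combine can be sketched as a shape-level calculation in plain JavaScript. The helper resolveSlice is a hypothetical name and not part of the WebNN API; it returns, for every dimension of the input, the first index kept and the number of elements kept, under the assumption that a negative start of -k refers to the k-th position counting back from the end of that axis.

```javascript
// Resolves negative starts, -1 sizes and optional axes into a per-dimension
// begin index and extent. Hypothetical helper, shape arithmetic only.
function resolveSlice(shape, starts, sizes, axes) {
  const rank = shape.length;
  const targetAxes = (axes === undefined ? shape.map((_, i) => i) : axes)
    .map((axis) => (axis < 0 ? axis + rank : axis));
  // Dimensions that are not listed in axes are kept whole.
  const begin = new Array(rank).fill(0);
  const extent = [...shape];
  targetAxes.forEach((axis, i) => {
    const start = starts[i] < 0 ? starts[i] + shape[axis] : starts[i];
    // A size of -1 selects all remaining elements from the start index.
    const size = sizes[i] === -1 ? shape[axis] - start : sizes[i];
    begin[axis] = start;
    extent[axis] = size;
  });
  return { begin, extent };
}
```

The call resolveSlice([2, 3, 4], [1, 0, 0], [1, -1, -1]) corresponds to the slices used in the GRU samples, where one entry of the first dimension is selected and the other dimensions are kept whole.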
7.7.26. softmax
Compute the softmax values of the 2-D input tensor along axis 1.
partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand x);
};
Arguments:
- x: an MLOperand. The input 2-D tensor.
Returns: an MLOperand. The output 2-D tensor that contains the softmax results, of the same shape as the input tensor.
// This sample deploys a well-known implementation trick [1] to compute the
// exponentials of the distances to the max value, instead of the exponentials
// of the input values themselves, in order to increase the numerical stability of
// the result.
// [1]: https://cs231n.github.io/linear-classify/#softmax
const max_x = builder.reduceMax(x, { axes: [1], keepDimensions: true });
const exp_x = builder.exp(builder.sub(x, max_x));
return builder.div(exp_x, builder.reduceSum(exp_x, { axes: [1], keepDimensions: true }));
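The effect of subtracting the row maximum can be observed with a plain JavaScript sketch that operates on a nested array instead of an MLOperand. The function name softmax2d is hypothetical and not part of the WebNN API; the sketch applies the same max-subtraction step as the sample above, row by row.

```javascript
// Row-wise softmax over a 2-D nested array, using the max-subtraction trick
// so that large input values do not overflow in Math.exp.
function softmax2d(x) {
  return x.map((row) => {
    const maxValue = Math.max(...row);
    const exps = row.map((v) => Math.exp(v - maxValue));
    const sum = exps.reduce((acc, e) => acc + e, 0);
    return exps.map((e) => e / sum);
  });
}
```

Without the subtraction, an input row such as [1000, 1000] would evaluate Math.exp(1000), which overflows to Infinity in double precision and turns the division into NaN.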
7.7.27. softplus
Compute the softplus function of the input tensor. The calculation follows the expression ln(1 + exp(steepness * x)) / steepness.
dictionary MLSoftplusOptions {
  float steepness = 1;
};
partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand x, optional MLSoftplusOptions options = {});
  MLOperator softplus(optional MLSoftplusOptions options = {});
};
Arguments:
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the softplus operation.
return builder.div(
  builder.log(
    builder.add(
      builder.exp(builder.mul(x, builder.constant(options.steepness))),
      builder.constant(1))),
  builder.constant(options.steepness));
7.7.28. softsign
Compute the softsign function of the input tensor. The calculation follows the expression x / (1 + |x|).
partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand x);
  MLOperator softsign();
};
Arguments:
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the softsign operation.
return builder.div(x, builder.add(builder.constant(1), builder.abs(x)));
7.7.29. split
Split the input tensor into a number of sub tensors along the given axis.
dictionary MLSplitOptions {
  long axis = 0;
};
partial interface MLGraphBuilder {
  sequence<MLOperand> split(MLOperand input,
                            (unsigned long or sequence<unsigned long>) splits,
                            optional MLSplitOptions options = {});
};
Arguments:
- input: an MLOperand. The input tensor.
- splits: an unsigned long or a sequence of unsigned long. If an unsigned long, it specifies the number of output tensors along the axis. The number must evenly divide the dimension size of input along options.axis. If a sequence of unsigned long, it specifies the sizes of each output tensor along the options.axis. The sum of sizes must equal the dimension size of input along options.axis.
- options: an optional MLSplitOptions. The optional parameters of the operation.
  - axis: a long. The dimension along which to split. Defaults to 0. A negative value is interpreted as counting back from the end.
Returns: a sequence of MLOperand. The split output tensors. If splits is an unsigned long, the length of the output sequence equals splits. The shape of each output tensor is the same as input except that the dimension size of axis equals the quotient of dividing the dimension size of input along axis by splits. If splits is a sequence of unsigned long, the length of the output sequence equals the length of splits. The shape of the i-th output tensor is the same as input except along axis where the dimension size is splits[i].
// This sample shows the case that the splits parameter is an array.
const outputs = [];
let start = 0;
for (const size of splits) {
  outputs.push(builder.slice(input, [start], [size], { axes: [options.axis] }));
  start += size;
}
return outputs;
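The output shapes for both forms of the splits argument can be sketched in plain JavaScript. The helper splitOutputShapes is a hypothetical name and not part of the WebNN API; throwing on invalid input is an assumption of this sketch.

```javascript
// Computes the shape of every output tensor of a split, for splits given
// either as a count or as a list of sizes. Hypothetical helper.
function splitOutputShapes(inputShape, splits, axis = 0) {
  const rank = inputShape.length;
  // A negative axis counts back from the end of the shape.
  const resolvedAxis = axis < 0 ? axis + rank : axis;
  const dimension = inputShape[resolvedAxis];
  let sizes;
  if (typeof splits === "number") {
    if (dimension % splits !== 0) {
      throw new Error("splits must evenly divide the dimension size");
    }
    sizes = new Array(splits).fill(dimension / splits);
  } else {
    const sum = splits.reduce((acc, size) => acc + size, 0);
    if (sum !== dimension) {
      throw new Error("the sum of sizes must equal the dimension size");
    }
    sizes = splits;
  }
  return sizes.map((size) => {
    const shape = [...inputShape];
    shape[resolvedAxis] = size;
    return shape;
  });
}
```

Splitting a [6, 4] input into 3 along axis 0 gives three outputs of shape [2, 4], while splits of [1, 3] along axis -1 give outputs of shape [6, 1] and [6, 3].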
7.7.30. squeeze
Reduce the rank of a tensor by eliminating dimensions with size 1 of the tensor shape. Squeeze only affects the tensor’s logical dimensions. It does not copy or change the content in the tensor.
dictionary MLSqueezeOptions {
  sequence<long> axes;
};
partial interface MLGraphBuilder {
  MLOperand squeeze(MLOperand input, optional MLSqueezeOptions options = {});
};
Arguments:
- input: an MLOperand. The input tensor.
- options: an optional MLSqueezeOptions. The optional parameters of the operation.
  - axes: a sequence of long. Indices to the shape dimensions of size 1 to eliminate. When not specified, all shape dimensions of size 1 in the tensor are eliminated.
Returns: an MLOperand. The output tensor of the same or reduced rank with the shape dimensions of size 1 eliminated.
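The effect of the axes option on the output shape can be sketched in plain JavaScript. The helper squeezeShape is a hypothetical name and not part of the WebNN API; rejecting an axis whose size is not 1 is an assumption of this sketch.

```javascript
// Removes size-1 dimensions from a shape: all of them when axes is omitted,
// otherwise only the listed ones. Hypothetical helper.
function squeezeShape(shape, axes) {
  if (axes === undefined) {
    return shape.filter((d) => d !== 1);
  }
  for (const axis of axes) {
    if (shape[axis] !== 1) {
      throw new Error("only dimensions of size 1 can be eliminated");
    }
  }
  return shape.filter((_, index) => !axes.includes(index));
}
```

This matches the way the GRU samples use squeeze with axes [0] to drop the leading dimension of a [1, ...] slice while leaving any other size-1 dimensions in place.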
7.7.31. tanh
Compute the hyperbolic tangent function of the input tensor. The calculation follows the expression (exp(2 * x) - 1) / (exp(2 * x) + 1).
partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand x);
  MLOperator tanh();
};
Arguments:
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the tanh operation.
return builder.div(
  builder.sub(
    builder.exp(builder.mul(builder.constant(2), x)),
    builder.constant(1)),
  builder.add(
    builder.exp(builder.mul(builder.constant(2), x)),
    builder.constant(1)));
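As a numerical illustration, the expression can be evaluated directly in plain JavaScript and compared against the built-in Math.tanh. The function name tanhByExpression is hypothetical; the sketch says nothing about how an implementation of the API computes the function, only how the literal expression behaves in double precision.

```javascript
// Direct double-precision evaluation of (exp(2 * x) - 1) / (exp(2 * x) + 1).
function tanhByExpression(x) {
  const e = Math.exp(2 * x);
  return (e - 1) / (e + 1);
}
```

For moderate inputs the two agree closely. For large positive inputs, exp(2 * x) overflows to Infinity and the literal expression evaluates to NaN, whereas Math.tanh returns 1, so the expression is best read as a definition of the function rather than as a recipe for computing it.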
7.7.32. transpose
Permute the dimensions of the input tensor according to the permutation argument.
dictionary MLTransposeOptions {
  sequence<long> permutation;
};
partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};
Arguments:
- input: an MLOperand. The input N-D tensor.
- options: an optional MLTransposeOptions. The optional parameters of the operation.
  - permutation: a sequence of long values. The values used to permute the output shape. When it’s not specified, it’s set to [N-1...0], where N is the rank of the input tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values in the sequence must be the same as the rank of the input tensor, and the values in the sequence must be within the range from 0 to N-1 with no two or more same values found in the sequence.
Returns: an MLOperand. The permuted or transposed N-D tensor.
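The default permutation and the validity constraints can be sketched as a shape calculation in plain JavaScript. The helper transposeShape is a hypothetical name and not part of the WebNN API, and it assumes the common convention that dimension i of the output takes its size from dimension permutation[i] of the input.

```javascript
// Computes the output shape of a transpose. Without a permutation, the
// dimensions are reversed ([N-1...0]). Hypothetical helper.
function transposeShape(inputShape, permutation) {
  const rank = inputShape.length;
  const perm = permutation === undefined
    ? inputShape.map((_, i) => rank - 1 - i)
    : permutation;
  if (perm.length !== rank) {
    throw new Error("permutation length must equal the input rank");
  }
  const inRange = perm.every((p) => Number.isInteger(p) && p >= 0 && p < rank);
  if (!inRange || new Set(perm).size !== rank) {
    throw new Error("permutation must contain each of 0..N-1 exactly once");
  }
  return perm.map((p) => inputShape[p]);
}
```

For a 2-D input the default permutation [1, 0] is the ordinary matrix transpose, which is how the gruCell sample uses transpose on the sliced weight tensors.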
7.8. MLGraph
The MLGraph interface represents a compiled computational graph. A compiled graph once constructed is immutable and cannot be subsequently changed.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {};
MLGraph has the following internal slots:
[[context]] of type MLContext
[[inputDescriptors]] of type record<DOMString, MLOperandDescriptor>
- Maps the name of an input MLOperand to its MLOperandDescriptor for all input MLOperands of this MLGraph.
[[outputDescriptors]] of type record<DOMString, MLOperandDescriptor>
- Maps the name of an output MLOperand to its MLOperandDescriptor for all output MLOperands of this MLGraph.
[[implementation]]
- The underlying implementation provided by the User Agent.
7.9. MLCommandEncoder
The MLCommandEncoder interface represents a method of execution that synchronously records the computational workload of a compiled MLGraph to a GPUCommandBuffer on the calling thread. Since the workload is not immediately executed, just recorded, this method allows more flexibility for the caller to determine how and when the recorded commands will be submitted for execution on the GPU relative to other GPU workload on the same or different queue.
typedef (GPUBuffer or GPUTexture) MLGPUResource;
typedef record<DOMString, MLGPUResource> MLNamedGPUResources;
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLCommandEncoder {};
MLCommandEncoder has the following internal slots:
[[context]] of type MLContext
- The context of type MLContext associated with this MLCommandEncoder.
[[implementation]]
- The underlying implementation provided by the User Agent.
7.9.1. Graph Initialization
Record the initialization of the MLGraph. This is a necessary step for optimal performance during graph execution as it gives the platform an opportunity to prepare and optimize constant input data for the subsequent execution of the graph. This method should only be called once per graph.

partial interface MLCommandEncoder {
  undefined initializeGraph(MLGraph graph);
};

- graph: an MLGraph. The compiled graph to be initialized with graph constant inputs.

Returns: undefined.

The graph constant inputs are the data specified through the MLGraphBuilder.constant(desc, bufferView) method as constant operands during graph construction time.

7.9.2. Dispatch Execution Commands
Record the MLGraph execution with the inputs MLNamedGPUResources and outputs MLNamedGPUResources.

partial interface MLCommandEncoder {
  undefined dispatch(MLGraph graph, MLNamedGPUResources inputs, MLNamedGPUResources outputs);
};

- graph: an MLGraph. The compiled graph to be executed.
- inputs: an MLNamedGPUResources. The resources of inputs.
- outputs: an MLNamedGPUResources. The pre-allocated resources of required outputs.

Returns: undefined.
- If any of the following requirements are unmet, then throw a DataError DOMException and stop.
    - For each key -> value of inputs:
        - graph.[[inputDescriptors]][key] must exist.
        - Let inputDesc be graph.[[inputDescriptors]][key].
        - If value is a GPUBuffer, then:
            - value.size must equal the byte length of inputDesc.
    - For each key -> value of outputs:
        - graph.[[outputDescriptors]][key] must exist.
        - Let outputDesc be graph.[[outputDescriptors]][key].
        - If value is a GPUBuffer, then:
            - value.size must equal the byte length of outputDesc.
- For each key -> value of inputs:
    - Set the input of graph.[[implementation]] that is associated with key to value.
- For each key -> value of outputs:
    - Set the output of graph.[[implementation]] that is associated with key to value.
- Issue a compute request of graph.[[implementation]].
- If there is an error returned by graph.[[implementation]], then:
    - Throw an OperationError DOMException and stop.
- Return undefined.
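The validation requirements in the first step can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the normative algorithm: `byteLength`, `dataError`, `validateResources` and `validateDispatch` are hypothetical helpers, the graph is modelled as a plain object exposing `inputDescriptors` and `outputDescriptors`, any resource with a numeric `size` is treated as a GPUBuffer, and a plain Error named "DataError" stands in for the DOMException.

```javascript
// Bytes per element for each operand type used in this sketch (assumption).
const BYTES_PER_ELEMENT = { float32: 4, int32: 4, uint32: 4, int8: 1, uint8: 1 };

// Hypothetical helper: byte length implied by a descriptor {type, dimensions}.
function byteLength(desc) {
  const elements = (desc.dimensions ?? []).reduce((count, dim) => count * dim, 1);
  return elements * BYTES_PER_ELEMENT[desc.type];
}

// Stand-in for a "DataError" DOMException so the sketch runs outside a browser.
function dataError(message) {
  const error = new Error(message);
  error.name = 'DataError';
  return error;
}

// Checks one named-resources record against a descriptor map.
// Assumption: a resource with a numeric `size` is treated as a GPUBuffer.
function validateResources(resources, descriptors, kind) {
  for (const [key, value] of Object.entries(resources)) {
    if (!Object.prototype.hasOwnProperty.call(descriptors, key)) {
      throw dataError(`unknown ${kind} "${key}"`);
    }
    const desc = descriptors[key];
    if (typeof value.size === 'number' && value.size !== byteLength(desc)) {
      throw dataError(`${kind} "${key}" does not match the descriptor byte length`);
    }
  }
}

// Mirrors the requirement checks: inputs first, then outputs.
function validateDispatch(graph, inputs, outputs) {
  validateResources(inputs, graph.inputDescriptors, 'input');
  validateResources(outputs, graph.outputDescriptors, 'output');
}
```

For a float32 descriptor with dimensions [1, 2, 2, 2], a buffer passes only when its size is 32 bytes; texture-like resources without a `size` skip the size check, as in the steps above.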
7.9.3. Generate GPU Command Buffer
Complete the recording of ML workload and return a WebGPU-compatible GPUCommandBuffer containing the recorded workload.

partial interface MLCommandEncoder {
  GPUCommandBuffer finish(optional GPUCommandBufferDescriptor descriptor = {});
};

- descriptor: an optional GPUCommandBufferDescriptor. Descriptor of the command buffer.

Returns: GPUCommandBuffer.
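The three recording steps (initialization, dispatch, finish) might be combined as in the following sketch. It rests on several assumptions: `recordAndSubmit` is a hypothetical helper, `context.createCommandEncoder()` is assumed to be the way an MLCommandEncoder is obtained (it is not defined in this section), `context` is assumed to be associated with the same `GPUDevice` as `device`, and recording initialization and dispatch in the same command buffer is assumed to be acceptable. The submission call `device.queue.submit()` is the standard WebGPU method.

```javascript
// Remembers graphs whose initialization was already recorded, because
// initializeGraph should only be called once per graph.
const initializedGraphs = new WeakSet();

// Hypothetical helper: records initialization (first call only) and one
// dispatch for `graph`, then submits the finished command buffer.
// Assumption: context.createCommandEncoder() returns an MLCommandEncoder.
function recordAndSubmit(context, device, graph, inputs, outputs) {
  const encoder = context.createCommandEncoder();
  if (!initializedGraphs.has(graph)) {
    encoder.initializeGraph(graph);
    initializedGraphs.add(graph);
  }
  encoder.dispatch(graph, inputs, outputs);
  // Nothing has executed yet; finish() only returns the recorded commands.
  const commandBuffer = encoder.finish();
  // The caller decides when the work runs relative to other GPU workload.
  device.queue.submit([commandBuffer]);
  return commandBuffer;
}
```

Because the workload is only recorded, the returned command buffer could equally be submitted together with other command buffers in a single `submit()` call instead of immediately.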
8. Examples
const context = navigator.ml.createContext({powerPreference: 'low-power'});
constant1 ---+
             +--- Add ---> intermediateOutput1 ---+
input1    ---+                                    |
                                                  +--- Mul---> output
constant2 ---+                                    |
             +--- Add ---> intermediateOutput2 ---+
input2    ---+
// Use tensors in 4 dimensions.
const TENSOR_DIMS = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = {type: 'float32', dimensions: TENSOR_DIMS};

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);
// Compile the constructed graph.
const graph = builder.build({'output': output});
// Set up the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);
const outputBuffer = new Float32Array(TENSOR_SIZE);

// Execute the compiled graph with the specified inputs.
const inputs = {
  'input1': inputBuffer1,
  'input2': inputBuffer2,
};
const outputs = {'output': outputBuffer};
context.compute(graph, inputs, outputs);

console.log('Output value: ' + outputBuffer);
// Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
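The expected output of 2.25 follows from (0.5 + 1) * (0.5 + 1) applied element-wise. The following sketch reproduces that arithmetic on typed arrays without any WebNN call, so it can be run anywhere as a reference; the `add` and `mul` helpers are illustrative stand-ins for the graph operations, not the API itself.

```javascript
const TENSOR_SIZE = 8;

// Same values as in the example: constants are 0.5, inputs are 1.
const constant1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const input1 = new Float32Array(TENSOR_SIZE).fill(1);
const input2 = new Float32Array(TENSOR_SIZE).fill(1);

// Element-wise stand-ins for the Add and Mul operations of the graph.
const add = (a, b) => a.map((x, i) => x + b[i]);
const mul = (a, b) => a.map((x, i) => x * b[i]);

// (constant1 + input1) * (constant2 + input2)
const expected = mul(add(constant1, input1), add(constant2, input2));

console.log('Expected value: ' + expected);
// Expected value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
```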
9. Appendices
9.1. MLOperandType and ArrayBufferView compatibility
MLOperandType | ArrayBufferView |
---|---|
float32 | Float32Array |
int32 | Int32Array |
uint32 | Uint32Array |
int8 | Int8Array |
uint8 | Uint8Array |
Clarify the usage of ArrayBufferView for float16. [Issue #webmachinelearning/webnn#127]
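The compatibility table can be expressed as a lookup for checking a buffer against an operand type before passing it to the API. This is a minimal sketch: `ARRAY_BUFFER_VIEW_FOR_TYPE` and `isCompatibleView` are hypothetical names, and float16 is left out because its mapping is still open in the issue above.

```javascript
// Typed-array constructor for each MLOperandType listed in the table.
// float16 is intentionally absent: its ArrayBufferView mapping is unresolved.
const ARRAY_BUFFER_VIEW_FOR_TYPE = new Map([
  ['float32', Float32Array],
  ['int32', Int32Array],
  ['uint32', Uint32Array],
  ['int8', Int8Array],
  ['uint8', Uint8Array],
]);

// Hypothetical helper: true if `view` is the typed array matching `type`.
function isCompatibleView(type, view) {
  const viewConstructor = ARRAY_BUFFER_VIEW_FOR_TYPE.get(type);
  return viewConstructor !== undefined && view instanceof viewConstructor;
}

console.log(isCompatibleView('float32', new Float32Array(8))); // true
console.log(isCompatibleView('int8', new Uint8Array(8)));      // false
```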
10. Acknowledgements
This specification follows the concepts of the Android Neural Networks API C API.
Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.
Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.
Thanks to Sangwhan Moon and the W3C Technical Architecture Group for review of this specification for web architecture fit, design consistency and developer ergonomics.
Thanks to the W3C Privacy Interest Group for privacy and security review and feedback.
Thanks to Alex Gough and the Chrome Security team for security review and questions.
Thanks to Michal Karzynski for sharing practical guidelines and learnings from ONNX.
Thanks to Kaustubha Govind and Chrome privacy reviewers for feedback and privacy considerations.