1. Introduction
The Web Neural Network API defines a web-friendly hardware-agnostic abstraction layer that makes use of Machine Learning capabilities of operating systems and underlying hardware platforms without being tied to platform-specific capabilities. The abstraction layer addresses the requirements of key Machine Learning JavaScript frameworks and also allows web developers familiar with the ML domain to write custom code without the help of libraries.
For an illustrated introduction, please see the explainer.
2. Use cases
2.1. Application Use Cases
This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models].
Note: Please be aware that some of the use cases described here are, by their very nature, privacy-invasive. Developers who are planning to use the API for such use cases should ensure that the API is being used to benefit users, for purposes that users understand and approve. They should apply the Ethical Principles for Web Machine Learning [webmachinelearning-ethics] and implement appropriate privacy risk mitigations such as transparency, data minimisation, and user controls.
2.1.1. Person Detection
A user opens a web-based video conferencing application, but she temporarily leaves her room. The application checks whether she is in front of her PC by using object detection (for example, approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.
When she comes back, the application automatically detects her and notifies other online users that she is active now.
2.1.2. Semantic Segmentation
A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not want her room and the people in the background to be visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+], [MaskR-CNN] or [SegAny] to semantically split an image into segments and replaces the segments that represent other people and the background with another picture.
2.1.3. Skeleton Detection
A web-based video conferencing application tracks the pose of a user’s skeleton by running a machine learning model that allows for real-time human pose estimation, such as [PoseNet], to recognize her gestures and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking on the teleconference.
2.1.4. Face Recognition
Multiple people in a conference room join an online meeting using a web-based video conferencing application. The application detects the faces of participants by using object detection (for example, approaches such as [SSD]) and checks whether each face was present at the previous meeting by running a machine learning model such as [FaceNet], which verifies whether two faces are identical.
2.1.5. Facial Landmark Detection
A user wants to find new glasses that fit her beautifully in an online glasses store. The online store offers a web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks like eyes, nose, mouth, etc. When she chooses a pair of glasses, the simulator renders the selected glasses at the detected position of the eyes on her facial image.
2.1.6. Style Transfer
A user is looking for cosmetics in an online store and wondering which color may suit her face. The online store shows sample facial makeup images of cosmetics, and offers a makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. She can use the simulator to check how the selected makeup looks on her face.
2.1.7. Super Resolution
A web-based video conferencing application is receiving a video stream from its peer, but the resolution of the video drops due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.
2.1.8. Image Captioning
For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt], which predicts explanatory words for the presentation slides.
2.1.9. Text-to-image
Images are a core part of modern web experiences. An ability to generate images based on text input in a privacy-preserving manner enables visual personalization and adaptation of web applications and content. For example, a web application can use as an input a natural language description on the web page or a description provided by the user within a text prompt to produce an image matching the text description. This text-to-image use case enabled by latent diffusion model architecture [LDM] forms the basis for additional text-to-image use cases. For example, inpainting where a portion of an existing image on the web page is selectively modified using the newly generated content, or the converse, outpainting, where an original image is extended beyond its original dimensions filling the empty space with generated content.
2.1.10. Machine Translation
Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates each text into a different language.
2.1.11. Emotion Analysis
A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji], which infers emotion from input texts, and displays an emoji that represents the estimated emotion.
2.1.12. Video Summarization
A web-based video conferencing application records received video streams, and it needs to reduce recorded video data to be stored. The application generates the short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].
2.1.13. Noise Suppression
A web-based video conferencing application records received audio streams, but background noise is usually present. The application uses real-time noise suppression based on a recurrent neural network such as [RNNoise] to suppress dynamic background noise, like a baby crying or a dog barking, and so improve the audio experience in video conferences.
2.1.14. Speech Recognition
Speech recognition, also known as speech to text, enables recognition and translation of spoken language into text. Example applications of speech recognition include transcription, automatic translation, multimodal interaction, real-time captioning and virtual assistants. Speech recognition improves accessibility of auditory content and makes it possible to interact with such content in a privacy-preserving manner in a textual form. Examples of common use cases include watching videos or participating in online meetings using real-time captioning. Models such as [Whisper] approach humans in their accuracy and robustness and are well positioned to improve accessibility of such use cases.
2.1.15. Text Generation
Various text generation use cases are enabled by large language models (LLM) that are able to perform tasks where a general ability to predict the next item in a text sequence is required. This class of models can translate texts, answer questions based on a text input, summarize a larger body of text, or generate text output based on a textual input. LLMs enable better performance compared to older models based on RNN, CNN, or LSTM architectures and further improve the performance of many other use cases discussed in this section. Examples of LLMs include [t5-small], [m2m100_418M], [gpt2], and [llama-2-7b].
2.1.16. Detecting fake video
A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web. A fake video can swap the speaker’s face for the president’s face to incite a user politically or to manipulate the user’s opinion. Deepfake detection applications such as [FaceForensics++] analyze the videos and protect a user against fake videos or images. When she watches a fake video on the web, the detection application alerts her to the fraudulent video in real time.
2.2. Framework Use Cases
This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.
2.2.1. Custom Layer
A web application developer wants to run a DNN model on the WebNN API. However, she has found that some activation functions, like [LeakyReLU], [ELU], etc., are not included in the WebNN API. To address this issue, she constructs custom layers of the additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.
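As a rough illustration of how such a custom layer can be composed from smaller element-wise operations, the sketch below builds LeakyReLU and ELU over plain JavaScript arrays. The helper names (`max0`, `min0`, `scale`, `add`) are hypothetical stand-ins and are not WebNN methods; a real custom layer would presumably chain the corresponding graph-building operations instead.

```javascript
// Hypothetical element-wise primitives over plain arrays. They stand in for
// graph-building operations and are NOT part of the WebNN API.
const map = (x, f) => x.map(f);
const zip = (a, b, f) => a.map((v, i) => f(v, b[i]));

const max0 = (x) => map(x, (v) => Math.max(v, 0)); // positive part
const min0 = (x) => map(x, (v) => Math.min(v, 0)); // negative part
const scale = (x, s) => map(x, (v) => v * s);
const add = (a, b) => zip(a, b, (u, v) => u + v);

// LeakyReLU(x) = max(x, 0) + alpha * min(x, 0)
function leakyRelu(x, alpha = 0.01) {
  return add(max0(x), scale(min0(x), alpha));
}

// ELU(x) = max(x, 0) + alpha * (exp(min(x, 0)) - 1)
function elu(x, alpha = 1.0) {
  const negativePart = map(min0(x), (v) => alpha * (Math.exp(v) - 1));
  return add(max0(x), negativePart);
}
```

Both functions only combine the positive and negative parts of the input, which is why they can be layered on top of a small set of primitives.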
2.2.2. Network Concatenation
A web application uses a DNN model, and its model data of upper convolutional layers and lower fully-connected layers are stored in separate files, since model data of the fully-connected layers are periodically updated due to fine tuning at the server side.
Therefore, the application first downloads both partial model files and concatenates them into a single model. When the model is updated, the application downloads the fine-tuned part of the model and replaces only the fully-connected layers with it.
2.2.3. Performance Adaptation
A web application developer is concerned about the performance of her DNN model on mobile devices. She has confirmed that it may run too slowly on mobile devices that do not have GPU acceleration. To address this issue, her web application queries the WebNN API to confirm whether acceleration is available, so that the application can display a warning on devices without acceleration.
After several weeks, she has developed a tiny DNN model that can even run on CPU. In order to accommodate CPU execution, she modifies the application so that the application loads the tiny model in the case of CPU-only devices.
2.2.4. Operation Level Execution
A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on the hardware device, like CPU, GPU or ML accelerator. To avoid unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute-intensive operation, such as convolution 2D or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on that selected device.
2.2.5. Integration with real-time video processing
The user experience of WebRTC-based video conferencing is enhanced using real-time video processing. For example, background blur implemented using a § 2.1.2 Semantic Segmentation model blurs the background in the user’s live camera feed. To satisfy the performance requirements of this use case, the WebNN API integrates with primitives from other Web APIs that make up the media pipeline to allow WebNN API-based transformation of real-time video streams.
3. Security Considerations
This specification defines a low-level API for neural network inference hardware acceleration. This API is considered a powerful feature [POWERFUL-FEATURES] because it grants low-level access to a user’s computer. To meet the authentication and confidentiality expectations of a powerful feature and to prevent man-in-the-middle attacks, all interfaces defined by this specification are only available in a secure context.
This API is disabled by default in all cross-origin frames using the § 6.4 Permissions Policy Integration. This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.
This API allows creation of an MLContext from a GPUDevice defined by the WebGPU specification. See WebGPU Security Considerations for more information regarding security characteristics of this context.
Once the graph is fully constructed and compiled, the input shapes into each of the operations in the graph are inferred and finalized. Bounds checking occurs when the compute method, which executes the graph against the actual data, is invoked. No actual data is bound to the compiled graph before this stage. It is the implementation’s responsibility to make sure proper bounds checking occurs against the shapes of the data already inferred by that time.
Issue: Document operations susceptible to out-of-bounds access as guidance to implementers.
As a future-proofing measure, the API design allows certain operations that can be generically emulated to be deprecated for security, performance, or other reasons without breaking compatibility. This is made possible by high-level functions that are defined in terms of smaller primitive operations defined in this specification. This enables a native implementation of a high-level function to be replaced with a polyfill implementation.
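To make the idea of emulation concrete, here is a minimal sketch of a high-level function (softmax) expressed purely in terms of smaller primitives, operating on plain one-dimensional arrays. The primitive helpers and the `selectSoftmax` function are hypothetical names chosen for illustration and do not correspond to WebNN methods or to any particular polyfill.

```javascript
// Hypothetical primitive operations over a 1-D array of numbers.
const reduceMax = (x) => x.reduce((m, v) => Math.max(m, v), -Infinity);
const reduceSum = (x) => x.reduce((s, v) => s + v, 0);
const sub = (x, c) => x.map((v) => v - c);
const exp = (x) => x.map((v) => Math.exp(v));
const div = (x, c) => x.map((v) => v / c);

// Emulated softmax: exp(x - max(x)) / sum(exp(x - max(x))).
// Subtracting the maximum is a common numerical-stability step.
function softmaxEmulated(x) {
  const shifted = exp(sub(x, reduceMax(x)));
  return div(shifted, reduceSum(shifted));
}

// Prefer a native implementation when the host offers one; otherwise fall
// back to the emulation path built from primitives.
function selectSoftmax(nativeSoftmax) {
  return typeof nativeSoftmax === 'function' ? nativeSoftmax : softmaxEmulated;
}
```

Because the emulated path depends only on the primitives, callers of `selectSoftmax` are unaffected if the native path is later removed.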
Issue: Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.
To prevent an attacker from targeting a specific implementation that may contain a flaw, the § 6.2 Device Selection mechanism is a hint only, and the concrete device selection is left to the implementation. A user agent could, for instance, choose never to run a model on a device with known vulnerabilities. As a further mitigation, no device enumeration mechanism is defined.
Issue: Hinting partially mitigates the concern. Investigate additional mitigations.
The API design minimizes the attack surface for the compiled computational graph. The MLGraphBuilder interface that hosts the various operations is a data definition API and as such doesn’t execute anything; it only constructs data. It follows that the potential for an attack is limited to the point where data is bound to the graph before executing it by invoking the MLContext.compute() method. This enables implementers to focus on hardening the MLContext.compute() method, for example by making sure it honors the boundary of data and fails appropriately when the bounds are not respected.
Purpose-built Web APIs for measuring high-resolution time mitigate timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [hr-time-3]. The practical deployment of WebNN implementations is likely to bring enough jitter to make timing attacks impractical (e.g. because they would use IPC), but implementers are advised to consider and test their implementations against timing attacks.
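For readers unfamiliar with the first two techniques, the sketch below shows one simple way resolution reduction and jitter can be applied to a timestamp. It is an illustration only: the function names and the resolution value used in the example are assumptions, and it does not describe how any user agent actually implements these mitigations.

```javascript
// Resolution reduction: round a timestamp down to a multiple of
// `resolutionMs`, so differences smaller than that are not observable.
function coarsen(timestampMs, resolutionMs) {
  return Math.floor(timestampMs / resolutionMs) * resolutionMs;
}

// Jitter: add a bounded random offset before coarsening. `random` is
// injectable so the behaviour can be checked deterministically.
function coarsenWithJitter(timestampMs, resolutionMs, random = Math.random) {
  const jitter = random() * resolutionMs;
  return coarsen(timestampMs + jitter, resolutionMs);
}

// Example with an arbitrary 100 ms resolution.
const reported = coarsen(1234.567, 100); // 1200
```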
3.1. Guidelines for new operations
To ensure operations defined in this specification are shaped in a way they can be implemented securely, this section includes guidelines on how operations are expected to be defined to reduce potential for implementation problems. These guidelines are expected to evolve over time to align with industry best practices:
- Prefer simplicity of arguments
- Don’t use parsers for complex data formats
- If an operation can be decomposed to low level primitives:
  - Add an informative emulation path
  - Prefer primitives over new high level operations but consider performance consequences
- Operations should follow a consistent style for inputs and attributes
- Operation families such as pooling and reduction should share API shape and options
- Formalize failure cases into test cases whenever possible
- When in doubt, leave it out: API surface should be as small as possible required to satisfy the use cases, but no smaller
- Try to keep the API free of implementation details that might inhibit future evolution, do not overspecify
- Fail fast: the sooner the web developer is informed of an issue, the better
In general, always consider the security and privacy implications as documented in [security-privacy-questionnaire] by the Technical Architecture Group and the Privacy Interest Group when adding new features.
4. Privacy Considerations
This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox.
This API exposes the minimum amount of information necessary to address the identified § 2 Use cases for the best performance and reliability of results.
No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform.
Note: The group is soliciting further input on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.
Unlike WebGPU, this API does not intrinsically support custom shader authoring and, as a result, is not prone to timing attacks that rely on shader caches or other persistent data. The API builds upon pre-existing shaders and lower level primitives of the browser or the underlying OS. Web developers who interface with GPUDevice are expected to be aware of WebGPU compilation cache considerations.
The WebGPU API identifies machine-specific artifacts as a privacy consideration. Similarly, the WebNN API’s compute unit scheduling may under certain circumstances introduce a fingerprint. However, similarly to WebGPU, such fingerprints are identical across most or all of the devices of each vendor, mitigating the concern. Furthermore, software implementations can be used to further eliminate such artifacts.
The WebNN API defines two developer-settable preferences to help inform § 6.2 Device Selection and allow the implementation to better select the most appropriate underlying execution device for the workload. An MLDeviceType normatively indicates the kind of device and is either "cpu" or "gpu". If this type cannot be satisfied, an "OperationError" DOMException is thrown, thus this type can in some cases add two bits of entropy to the fingerprint. An MLPowerPreference indicates preference as related to the power consumption and is considered a hint only and as such does not increase entropy of the fingerprint.
If a future version of this specification introduces support for a new MLDeviceType that can only support a subset of MLOperandDataTypes, that may introduce a new fingerprint.
In general, implementers of this API are expected to apply WebGPU Privacy Considerations to their implementations where applicable.
5. Ethical Considerations
The Working Group has started documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account. The Working Group publishes and maintains an Ethical Principles for Web Machine Learning document [webmachinelearning-ethics] open to contributions from the wider community via a dedicated GitHub repository.
6. Programming Model
6.1. Overview
At the heart of neural networks is a computational graph of mathematical operations. These operations are the building blocks of modern machine learning technologies in computer vision, natural language processing, and robotics. The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks.
The MLGraph interface represents a compiled computational graph that is immutable (that is, a model).
The MLGraphBuilder interface serves as a builder (factory) to construct a computational graph (its graph) that is then compiled to create an MLGraph.
In WebNN, a computational graph is composed of operators which act on data, and are the nodes of the graph. MLOperands are a representation of data that flows within the computational graph, and are the edges of the graph. MLOperands include a computational graph's input values for inference, constants (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference. An operator's input is one or more MLOperands. An operator's output is one or more MLOperands. Operators have operator-specific parameters that control their behavior, which can include zero or more activation functions, which are MLActivations.
A key part of the MLGraphBuilder interface is its methods, such as gemm() and softmax(), which create an operator that represents the actual operation to perform on the input data when the computation is run, and return a new MLOperand or MLActivation holding the operator. Methods that create an MLOperand connect any inputs and activations to the operator. Each method invocation returns a distinct new value, without changing the value of any other MLOperand.
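The separation between describing a graph and running it can be sketched with a toy builder. Everything below (`TinyOperand`, `TinyGraphBuilder`, `evaluate`, and the scalar-only arithmetic) is hypothetical and greatly simplified; it is meant only to show operators as nodes, operands as edges, and builder calls that record structure without computing anything.

```javascript
// Hypothetical miniature model: operands are edges, operators are nodes.
class TinyOperand {
  constructor(kind, details) {
    this.kind = kind; // 'input', 'constant', or 'result' (output of an operator)
    Object.assign(this, details);
  }
}

class TinyGraphBuilder {
  input(name) { return new TinyOperand('input', { name }); }
  constant(value) { return new TinyOperand('constant', { value }); }
  // Each call records an operator and returns a distinct new operand.
  // Nothing is computed here, and no existing operand is modified.
  add(a, b) { return this.operator('add', [a, b]); }
  mul(a, b) { return this.operator('mul', [a, b]); }
  operator(type, inputs) {
    return new TinyOperand('result', { operator: { type, inputs } });
  }
}

// Evaluation is a separate step that binds input names to actual numbers.
function evaluate(operand, bindings) {
  if (operand.kind === 'input') return bindings[operand.name];
  if (operand.kind === 'constant') return operand.value;
  const [a, b] = operand.operator.inputs.map((o) => evaluate(o, bindings));
  return operand.operator.type === 'add' ? a + b : a * b;
}

// Describe y = x * 2 + 1, then evaluate it for a concrete x.
const builder = new TinyGraphBuilder();
const x = builder.input('x');
const y = builder.add(builder.mul(x, builder.constant(2)), builder.constant(1));
```

Calling `evaluate(y, { x: 3 })` walks the recorded structure and only then performs arithmetic, which mirrors the split between graph construction and execution described above.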
At inference time, every MLOperand will be bound to a tensor (the actual data). Tensors are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).
Operations within the computational graph have functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation of operations such as reshape, or slice may return a view of its input tensor that shares the same buffer as the input tensor. (In the case of reshape, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values.
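Buffer sharing of this kind can be illustrated with standard JavaScript typed arrays, where `subarray()` returns a view on the same underlying ArrayBuffer. The tensor shape below (`{ data, shape }`) and the helper names are assumptions made for this sketch; as noted above, actual tensor representations are implementation dependent.

```javascript
// A hypothetical tensor: a typed-array view plus shape metadata.
function makeTensor(data, shape) {
  return { data, shape };
}

const elementCount = (shape) => shape.reduce((n, d) => n * d, 1);

// reshape: same elements, new shape; the entire buffer is shared.
function reshape(tensor, newShape) {
  if (elementCount(newShape) !== elementCount(tensor.shape)) {
    throw new RangeError('reshape must preserve the element count');
  }
  return makeTensor(tensor.data, newShape);
}

// Slice along the first axis of a row-major tensor; part of the buffer
// is shared because subarray() creates a view, not a copy.
function sliceRows(tensor, start, size) {
  const rowLength = elementCount(tensor.shape.slice(1));
  const view = tensor.data.subarray(start * rowLength, (start + size) * rowLength);
  return makeTensor(view, [size, ...tensor.shape.slice(1)]);
}

const t = makeTensor(new Float32Array([1, 2, 3, 4, 5, 6]), [3, 2]);
const reshaped = reshape(t, [2, 3]);
const lastRow = sliceRows(t, 2, 1);
```

Sharing is safe here only because operations have functional semantics: no operation writes into its input, so a view can never observe a changed value.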
Before execution, the computational graph that is used to compute one or more specified outputs needs to be converted, compiled, and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion. The user agent may also perform these optimizations during graph conversion.
The MLGraphBuilder.build() method compiles the graph in the background without blocking the calling thread, and returns a Promise that resolves to an MLGraph. The compilation step produces an MLGraph that represents a compiled graph for optimal execution.
The underlying implementation of an MLGraph will be composed of platform-specific representations of operators and operands which correspond to the MLGraphBuilder's operators and MLOperands, but which are not script-visible and may be compositions or decompositions of the graph as constructed by script.
Once the MLGraph is constructed, the MLContext.compute() method performs the execution of the graph asynchronously, either on a parallel timeline in a separate worker thread for CPU execution or on a GPU timeline in a GPU command queue. This method returns immediately without blocking the calling thread while the actual execution is offloaded to a different timeline. The caller supplies the input values using MLNamedArrayBufferViews, binding the input MLOperands to their values. The caller then supplies pre-allocated buffers for output MLOperands using MLNamedArrayBufferViews. The execution produces the results of the computation from all the inputs bound to the graph. The computation results will be placed at the bound outputs at the time the operation is successfully completed on the offloaded timeline, at which time the calling thread is signaled. This type of execution supports both the CPU and GPU device.
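A minimal sketch of this calling pattern follows. It assumes only the `compute(graph, inputs, outputs)` signature and the `MLComputeResult` shape declared for MLContext in this specification; the wrapper name `runOnce`, the operand names 'x' and 'y', and the use of Float32Array are placeholders that would depend on the actual graph.

```javascript
// Run a compiled graph once. `context` is expected to expose
// compute(graph, inputs, outputs) as declared for MLContext, and `graph`
// is expected to be an MLGraph. The operand names 'x' and 'y' are
// placeholders for whatever names the graph was built with.
async function runOnce(context, graph, inputData, outputLength) {
  const inputs = { x: inputData };
  // Output buffers are pre-allocated by the caller.
  const outputs = { y: new Float32Array(outputLength) };
  // The promise settles once execution completes on the offloaded timeline.
  const result = await context.compute(graph, inputs, outputs);
  // Read from the returned result, since it holds the views that carry
  // the computed values.
  return result.outputs.y;
}
```

Reading from the returned result rather than from the `outputs` object passed in reflects the description of the result's members as transferred views.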
6.2. Device Selection
An MLContext interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. In addition to the default method of creation with MLContextOptions, an MLContext could also be created from a specific GPUDevice that is already in use by the application.
When a GPU context executes a graph with a constant or an input in the system memory as an ArrayBufferView, the input content is automatically uploaded from the system memory to the GPU memory, and downloaded back to the system memory of an ArrayBufferView output buffer at the end of the graph execution. These data upload and download cycles occur only when the execution device requires the data to be copied out of and back into the system memory, such as in the case of the GPU. They do not occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller’s perspective.
When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account the application’s MLPowerPreference and MLDeviceType options.
6.3. Task Source
The ML task source is a task source to be used for all tasks related to asynchronous compilation and execution of MLGraphs and creation of MLContexts.
To queue an ML task given a global object global and a series of steps steps, queue a global task on the ML task source with global and steps.
6.4. Permissions Policy Integration
This specification defines a policy-controlled feature identified by the string "webnn". Its default allowlist is 'self'.
7. API
7.1. The navigator.ml interface
An ML object is available in the Window and DedicatedWorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml.
interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;
7.2. ML interface
enum MLDeviceType {
  "cpu",
  "gpu"
};

enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};

dictionary MLContextOptions {
  MLDeviceType deviceType = "cpu";
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});
  Promise<MLContext> createContext(GPUDevice gpuDevice);
};
7.2.1. MLContextOptions
The deviceType option is an MLDeviceType and indicates the application’s preference for the kind of device used for the context. It is one of the following:
- "cpu": Provides the broadest compatibility and usability across all client devices with varying degrees of performance.
- "gpu": Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
The powerPreference option is an MLPowerPreference and indicates the application’s preference as related to power consumption. It is one of the following:
- "default": Let the user agent select the most suitable behavior.
- "high-performance": Prioritizes execution speed over power consumption.
- "low-power": Prioritizes power consumption over other considerations such as execution speed.
7.2.2. createContext()
To create a context given realm realm and options (a GPUDevice or MLContextOptions), run these steps:
- Let context be a new MLContext object with realm.
- If options is a GPUDevice object,
  - Set context.[[contextType]] to "webgpu".
  - Set context.[[deviceType]] to "gpu".
  - Set context.[[powerPreference]] to "default".
- Otherwise,
  - Set context.[[contextType]] to "default".
  - If options["deviceType"] exists, then set context.[[deviceType]] to options["deviceType"]. Otherwise, set context.[[deviceType]] to "cpu".
  - If options["powerPreference"] exists, then set context.[[powerPreference]] to options["powerPreference"]. Otherwise, set context.[[powerPreference]] to "default".
- If the user agent cannot support context.[[contextType]], context.[[deviceType]] and context.[[powerPreference]], return failure.
- Return context.
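The steps above can be approximated in plain JavaScript as follows. This is a sketch for orientation rather than an implementation: `createContextState`, the `fromGpuDevice` flag (standing in for the GPUDevice check), and the `isSupported` predicate (standing in for the user agent's capability check) are all hypothetical, and `null` is used to model the "failure" result.

```javascript
// Mirrors the "create a context" steps using a plain object for the
// internal slots. `fromGpuDevice` and `isSupported` are hypothetical
// stand-ins for the GPUDevice check and the user agent's support check.
function createContextState(
  options = {},
  { fromGpuDevice = false, isSupported = () => true } = {}
) {
  const state = fromGpuDevice
    ? { contextType: 'webgpu', deviceType: 'gpu', powerPreference: 'default' }
    : {
        contextType: 'default',
        // Missing members fall back to the defaults named in the steps.
        deviceType: options.deviceType ?? 'cpu',
        powerPreference: options.powerPreference ?? 'default',
      };
  // null models the "failure" outcome of the algorithm.
  return isSupported(state) ? state : null;
}
```

For example, `createContextState({ deviceType: 'gpu' })` yields a "default" context type with device type "gpu" and the default power preference.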
The createContext(options) method steps are:
- Let global be this's relevant global object.
- If global’s associated Document is not allowed to use the webnn feature, return a new promise rejected with a "SecurityError" DOMException.
- Let realm be this's relevant realm.
- Let promise be a new promise.
- Run the following steps in parallel.
  - Let context be the result of creating a context given realm and options. If that returns failure, then queue an ML task with global to reject promise with a "NotSupportedError" DOMException and abort these steps.
  - Queue an ML task with global to resolve promise with context.
- Return promise.
The createContext(gpuDevice) method steps are:
- Let global be this's relevant global object.
- If global’s associated Document is not allowed to use the webnn feature, return a new promise rejected with a "SecurityError" DOMException.
- Let realm be this's relevant realm.
- Let promise be a new promise.
- Run the following steps in parallel.
  - Let context be the result of creating a context given realm and gpuDevice. If that returns failure, then queue an ML task with global to reject promise with a "NotSupportedError" DOMException and abort these steps.
  - Queue an ML task with global to resolve promise with context.
- Return promise.
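From the caller's side, these method steps suggest a pattern such as the one sketched below: request a preferred device and fall back when the promise rejects with "NotSupportedError". The function name and the chosen fallback order are illustrative assumptions; `ml` is expected to behave like navigator.ml as declared in the ML interface.

```javascript
// `ml` is expected to behave like navigator.ml. The GPU-first order and
// the CPU fallback are illustrative choices, not requirements.
async function createContextWithFallback(ml) {
  try {
    return await ml.createContext({
      deviceType: 'gpu',
      powerPreference: 'high-performance',
    });
  } catch (error) {
    // Per the method steps, an unsupported combination rejects with
    // "NotSupportedError". Anything else, such as "SecurityError" when the
    // feature is blocked by policy, is rethrown to the caller.
    if (error.name !== 'NotSupportedError') throw error;
    return ml.createContext({ deviceType: 'cpu' });
  }
}
```

Rethrowing other errors keeps a policy denial visible instead of masking it behind a second, equally doomed request.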
7.3. MLContext interface
The MLContext interface represents a global state of neural network compute workload and execution processes. Each MLContext object has an associated context type, MLDeviceType and MLPowerPreference.
typedef record<DOMString, ArrayBufferView> MLNamedArrayBufferViews;

dictionary MLComputeResult {
  MLNamedArrayBufferViews inputs;
  MLNamedArrayBufferViews outputs;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {
  Promise<MLComputeResult> compute(
    MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
};
MLContext has the following internal slots:
- [[contextType]], of type context type: The MLContext's context type.
- [[deviceType]], of type MLDeviceType: The MLContext's MLDeviceType.
- [[powerPreference]], of type MLPowerPreference: The MLContext's MLPowerPreference.
The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:
- "default": Context created per user preference options.
- "webgpu": Context created from WebGPU device.
When the [[contextType]] is set to default with the MLContextOptions.deviceType set to "gpu", the user agent is responsible for creating an internal GPU device that operates within the context and is capable of ML workload submission on behalf of the calling application. In this setting, however, only ArrayBufferView inputs and outputs are allowed in and out of the graph execution, since the application has no way to know what type of internal GPU device is being created on its behalf. In this case, the user agent is responsible for automatic uploads and downloads of the inputs and outputs to and from the GPU memory using this internal device.
The MLComputeResult dictionary has the following members:
- inputs, of type MLNamedArrayBufferViews: An object where the keys are the graph input names, and the values are the transferred ArrayBufferViews for the supplied input tensor values.
- outputs, of type MLNamedArrayBufferViews: An object where the keys are the graph output names, and the values are the transferred ArrayBufferViews for the computed output tensor values.
To validate buffer with descriptor given ArrayBufferView
bufferView and MLOperandDescriptor
descriptor, run the following steps:
-
If bufferView’s element type does not match descriptor.
dataType
according to this table, return false. -
If bufferView.[[ByteLength]] is not equal to descriptor’s byte length, return false.
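The two checks above can be sketched in plain JavaScript. This is a minimal illustration, not the user agent's implementation: the `VIEW_TYPES` table is an assumed mapping from data type names to typed-array constructors, and the function names `descriptorByteLength` and `validateBufferWithDescriptor` are hypothetical. Descriptors are modelled as plain objects.

```javascript
// Assumed mapping from MLOperandDataType names to typed-array constructors.
// "float16" is only included when the runtime provides Float16Array.
const VIEW_TYPES = {
  float32: Float32Array,
  int32: Int32Array,
  uint32: Uint32Array,
  int64: BigInt64Array,
  uint64: BigUint64Array,
  int8: Int8Array,
  uint8: Uint8Array,
};
if (typeof Float16Array !== 'undefined') {
  VIEW_TYPES.float16 = Float16Array;
}

// Byte length of a descriptor: product of its dimensions times the element
// size. An empty or missing dimensions list describes a scalar.
function descriptorByteLength(descriptor) {
  const elementCount = (descriptor.dimensions ?? []).reduce((n, d) => n * d, 1);
  return elementCount * VIEW_TYPES[descriptor.dataType].BYTES_PER_ELEMENT;
}

// Returns true when bufferView has both the element type and the byte
// length that the descriptor calls for, and false otherwise.
function validateBufferWithDescriptor(bufferView, descriptor) {
  const expectedType = VIEW_TYPES[descriptor.dataType];
  if (expectedType === undefined || !(bufferView instanceof expectedType)) {
    return false;
  }
  return bufferView.byteLength === descriptorByteLength(descriptor);
}
```

Under these assumptions, a `Float32Array` of 4 elements is accepted for a `float32` descriptor with dimensions `[2, 2]`, while a view of the wrong element type or length is rejected.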
To execute graph, given MLGraph
graph, MLNamedArrayBufferViews
inputs and MLNamedArrayBufferViews
outputs, run the following steps. They return undefined
, or an error.
-
Let inputResources be the input resources of graph.
[[implementation]]
. -
For each name → inputValue of inputs:
-
Let inputDescriptor be graph.
[[inputDescriptors]]
[name]. -
Let inputTensor be a new tensor for graph.
[[implementation]]
as follows:-
Set the data type of inputTensor to the one that matches inputValue’s element type.
-
Set the dimensions of inputTensor to inputDescriptor.
dimensions
. -
Set the values of elements in inputTensor to the values of elements in inputValue.
-
-
Request the underlying implementation of graph to bind inputResources[name] to inputTensor.
-
-
For each name → outputValue of outputs:
-
Issue a compute request to graph.
[[implementation]]
given name and inputResources and wait for completion.-
If that returns an error, then return an "
OperationError
"DOMException
. -
Otherwise, let outputTensor be the result.
-
-
Let outputDesc be graph.
[[outputDescriptors]]
[name]. -
If the byte length of outputTensor is not equal to outputDesc’s byte length, then return a
TypeError
. -
If outputTensor’s element type doesn’t match outputValue’s element type, then return a
TypeError
. -
Request the underlying implementation of graph to set the values of elements in outputValue to the values of elements in outputTensor.
-
-
Return
undefined
.
7.3.1. MLNamedArrayBufferViews
transfer algorithm
To transfer an MLNamedArrayBufferViews
views with realm realm:
-
Let transferredViews be a new
MLNamedArrayBufferViews
. -
For each name → view of views:
-
Let transferredBuffer be the result of transferring view’s underlying buffer.
-
Let constructor be the appropriate view constructor for the type of
ArrayBufferView
view from realm. -
Let elementsNumber be the result of view’s byte length / view’s element size.
-
Let transferredView be Construct(constructor, transferredBuffer, view.[[ByteOffset]], elementsNumber).
-
Set transferredViews[name] to transferredView.
-
-
Return transferredViews.
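The transfer steps above can be approximated in script. The sketch below is illustrative only: `transferNamedViews` is a hypothetical name, it assumes every view is a typed array (not a `DataView`), and it relies on `structuredClone` with a transfer list to detach the original buffer.

```javascript
// Transfers every typed-array view in `views` to a new view of the same
// type over the transferred buffer. The original views are detached.
function transferNamedViews(views) {
  const transferredViews = {};
  for (const [name, view] of Object.entries(views)) {
    // Read these before the transfer: the getters report 0 once the
    // underlying buffer has been detached.
    const byteOffset = view.byteOffset;
    const elementsNumber = view.byteLength / view.BYTES_PER_ELEMENT;
    const buffer = view.buffer;
    // structuredClone with a transfer list detaches `buffer` and hands its
    // contents to the returned ArrayBuffer.
    const transferredBuffer = structuredClone(buffer, {transfer: [buffer]});
    transferredViews[name] =
        new view.constructor(transferredBuffer, byteOffset, elementsNumber);
  }
  return transferredViews;
}
```

After the call, the caller can no longer read or write through the original views, which is the property `compute()` depends on while the computation is in flight.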
7.3.2. compute()
Asynchronously carries out the computational workload of a compiled MLGraph on a separate timeline, either on a worker thread for CPU execution or on a GPU timeline for the submission of GPU workload on the command queue. The asynchronous nature of this call avoids blocking the calling thread while the computation is ongoing. This method of execution requires an MLContext created with MLContextOptions. Otherwise, the returned promise is rejected with an "OperationError" DOMException.
This method transfers the input and output MLNamedArrayBufferViews to new views that share the same backing memory allocations. The transferred views are returned to the caller via the promise fulfillment, with the computation result written into the backing memory of the output views.
-
graph: an
MLGraph
. The compiled graph to be executed. -
inputs: an
MLNamedArrayBufferViews
. The resources of inputs. Will be transferred if there are no validation errors. -
outputs: an
MLNamedArrayBufferViews
. The pre-allocated resources of required outputs. Will be transferred if there are no validation errors.
Returns: Promise<MLComputeResult
>.
Note: Invocations of compute()
will fail if any of the graph
's inputs are not provided as inputs
, or if any requested outputs
do not match the graph
's outputs.
The compute(graph, inputs, outputs)
method steps are:
-
Let global be this's relevant global object.
-
Let realm be this's relevant realm.
-
If graph.
[[context]]
is not this, then return a new promise rejected with aTypeError
. -
If graph.
[[context]]
.[[contextType]]
is not "default", then return a new promise rejected with an "OperationError
"DOMException
. -
For each name → descriptor of graph.
[[inputDescriptors]]
:-
If inputs[name] does not exist, then return a new promise rejected with a
TypeError
. -
If validating buffer with descriptor given inputs[name] and descriptor returns false, then return a new promise rejected with a
TypeError
.
-
-
For each name → resource of outputs:
-
If graph.
[[outputDescriptors]]
[name] does not exist, then return a new promise rejected with aTypeError
. -
If validating buffer with descriptor given resource and graph.
[[outputDescriptors]]
[name] returns false, then return a new promise rejected with aTypeError
.
-
-
Let transferredInputs be the result of transferring
MLNamedArrayBufferViews
inputs with realm. If that threw an exception, then return a new promise rejected with that exception. -
Let transferredOutputs be the result of transferring
MLNamedArrayBufferViews
outputs with realm. If that threw an exception, then return a new promise rejected with that exception. -
Let promise be a new promise.
-
Run the following steps in parallel:
-
Invoke execute graph given graph, transferredInputs and transferredOutputs. If that returns an error, then queue an ML task with global to reject promise with an equivalent error in realm and abort these steps.
-
Let result be a new
MLComputeResult
with realm. -
Set result.
inputs
to transferredInputs. -
Set result.
outputs
to transferredOutputs. -
Queue an ML task with global to resolve promise with result.
-
-
Return promise.
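The argument validation performed by the method steps above, before anything is transferred, can be sketched as follows. This is a simplified illustration: `checkComputeArguments` is a hypothetical helper, and the graph is a plain stand-in object whose descriptors carry an assumed `type` (typed-array constructor) and `byteLength` instead of a real MLOperandDescriptor.

```javascript
// Returns null when the arguments pass validation, or a TypeError that
// describes the first problem found.
function checkComputeArguments(graph, inputs, outputs) {
  const fits = (view, descriptor) =>
      view instanceof descriptor.type && view.byteLength === descriptor.byteLength;

  // Every graph input must be supplied and must fit its descriptor.
  for (const [name, descriptor] of Object.entries(graph.inputDescriptors)) {
    if (!Object.hasOwn(inputs, name)) {
      return new TypeError(`missing input "${name}"`);
    }
    if (!fits(inputs[name], descriptor)) {
      return new TypeError(`input "${name}" does not match its descriptor`);
    }
  }

  // Every requested output must name a graph output and fit its descriptor.
  for (const [name, view] of Object.entries(outputs)) {
    if (!Object.hasOwn(graph.outputDescriptors, name)) {
      return new TypeError(`unknown output "${name}"`);
    }
    if (!fits(view, graph.outputDescriptors[name])) {
      return new TypeError(`output "${name}" does not match its descriptor`);
    }
  }
  return null;
}
```

Note that, as in the steps, extra entries in `inputs` that the graph does not name are not rejected by this check; only missing or mismatched ones are.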
7.3.2.1. Examples
The following code showcases the asynchronous computation.
const operandType = {dataType: 'float32', dimensions: [2, 2]};
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// 1. Create a computational graph 'C = 0.2 * A + B'.
const constant = builder.constant(0.2);
const A = builder.input('A', operandType);
const B = builder.input('B', operandType);
const C = builder.add(builder.mul(A, constant), B);

// 2. Compile it into an executable.
const graph = await builder.build({'C': C});

// 3. Bind inputs to the graph and execute for the result.
const bufferA = new Float32Array(4).fill(1.0);
const bufferB = new Float32Array(4).fill(0.8);
const bufferC = new Float32Array(4);
const inputs = {'A': bufferA, 'B': bufferB};
const outputs = {'C': bufferC};
const result = await context.compute(graph, inputs, outputs);

// The computed result of [[1, 1], [1, 1]] is in the buffer associated with
// the output operand.
console.log('Output value: ' + result.outputs.C);
// Note: the result.outputs.C buffer is different from the bufferC, but it
// shares the same backing memory allocation.
7.4. MLGraph
interface
The MLGraph
interface represents a compiled computational graph. A compiled graph once constructed is immutable and cannot be subsequently changed.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {};
MLGraph
has the following internal slots:
[[context]]
of typeMLContext
[[inputDescriptors]]
of type record<DOMString
,MLOperandDescriptor
>-
Maps the name of an input
MLOperand
to itsMLOperandDescriptor
for all inputMLOperand
s of thisMLGraph
. [[outputDescriptors]]
of type record<DOMString
,MLOperandDescriptor
>-
Maps the name of an output
MLOperand
to itsMLOperandDescriptor
for all outputMLOperand
s of thisMLGraph
. [[implementation]]
-
The underlying implementation provided by the User Agent.
7.5. MLOperandDescriptor
dictionary
An MLOperandDescriptor
describes the shape (dimensions) and data type of an operand. They are used to describe the inputs and constants for an MLGraph
, and every MLOperand
has an internal MLOperandDescriptor
.
enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandDataType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int64",
  "uint64",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  required MLOperandDataType dataType;
  sequence<[EnforceRange] unsigned long> dimensions = [];
};
dataType
, of type MLOperandDataType-
The operand data type.
dimensions
, of typesequence<[EnforceRange] unsigned long>
, defaulting to[]
-
The shape of the operand. It is empty for scalar operands, and non-empty for tensor operands.
The byte length of an MLOperandDescriptor
desc is the value returned by the following steps:
-
Let elementLength be 1.
-
For each dimension of desc.
dimensions
:-
Set elementLength to elementLength * dimension.
-
-
Let elementSize be the element size of one of the
ArrayBufferView
types that matches desc.dataType
according to this table. -
Return elementLength * elementSize.
A valid dimension is an integer greater than zero in the range of unsigned long
. Implementations may impose a smaller upper bound.
Should 0-size dimensions be supported? [Issue #391]
To check dimensions given MLOperandDescriptor
descriptor, run the following steps:
-
If any element of descriptor.
dimensions
is not a valid dimension, return false. -
If descriptor.
dimensions
's size is too large to be supported by the implementation, return false.The maximum number of operand dimensions is not defined, but native ML APIs usually have a maximum supported size. [Issue #456]
-
If descriptor’s byte length is not supported by the implementation, then return false.
-
Return true.
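As a rough illustration of the dimension check above, the following sketch applies the three tests in order. The `limits.maxRank` and `limits.maxByteLength` values are hypothetical implementation limits (the specification leaves both undefined), and the element size is passed in directly rather than looked up from the data type.

```javascript
const UNSIGNED_LONG_MAX = 0xFFFFFFFF;

// A valid dimension is an integer greater than zero that fits in an
// unsigned long.
function isValidDimension(value) {
  return Number.isInteger(value) && value > 0 && value <= UNSIGNED_LONG_MAX;
}

// Returns true when `dimensions` passes all three checks.
// `limits` holds hypothetical, implementation-chosen bounds.
function checkDimensions(dimensions, elementSize, limits) {
  if (!dimensions.every(isValidDimension)) return false;
  if (dimensions.length > limits.maxRank) return false;
  // Byte length: product of all dimensions times the element size.
  const byteLength = dimensions.reduce((n, d) => n * d, 1) * elementSize;
  return byteLength <= limits.maxByteLength;
}
```

An empty `dimensions` list (a scalar) passes the first two checks trivially and has a byte length equal to the element size.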
7.6. MLOperand
interface
An MLOperand
represents an intermediary graph being constructed as a result of compositing parts of an operation into a fully composed operation.
For instance, an MLOperand
may represent a constant feeding to an operation or the result from combining multiple constants together into an operation. See also § 6 Programming Model.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {
  MLOperandDataType dataType();
  sequence<unsigned long> shape();
};
MLOperand
has the following internal slots:
[[builder]]
of typeMLGraphBuilder
-
The
MLOperand
's associated builder object. [[descriptor]]
of typeMLOperandDescriptor
-
The
MLOperand
's descriptor. [[name]]
of type string-
The
MLOperand
's name (only for input operands). [[operator]]
of type operator
An MLOperand
's shape is its [[descriptor]]
.dimensions
.
An MLOperand
's rank is its shape's size.
An MLOperand
's dataType is its [[descriptor]]
.dataType
.
Since the [[builder]]
object is bound by the MLGraphBuilder()
constructor to an MLContext
object, an MLOperand
is also always bound to the same MLContext
object.
7.6.1. Creating an MLOperand
The MLOperand
objects are created by the methods of MLGraphBuilder
, internally using the following algorithms.
To create an MLOperand given MLGraphBuilder
builder and MLOperandDescriptor
desc, run the following steps:
-
Let operand be a new
MLOperand
. -
Set operand.
[[builder]]
to builder. -
Set operand.
[[descriptor]]
to desc. -
Return operand.
To copy an MLOperand given MLOperand
operand, run the following steps:
-
Let result be a new
MLOperand
. -
Set result.
[[builder]]
to operand.[[builder]]
. -
Set result.
[[descriptor]]
to operand.[[descriptor]]
. -
If operand.
[[name]]
exists, then set result.[[name]]
to operand.[[name]]
. -
Return result.
To validate operand given MLGraphBuilder
builder and MLOperand
operand, return true if operand.[[builder]]
is builder, and false otherwise.
7.6.2. dataType()
Return the data type of the MLOperand.
Returns: an MLOperandDataType. The data type of the operand.
7.6.3. shape()
Return the shape of the MLOperand.
Returns: a sequence of unsigned long. The shape of the operand.
7.7. MLActivation
interface
Objects implementing the MLActivation
interface represent activation function types.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLActivation {};
MLActivation
has the following internal slots:
[[name]]
of type string-
The
MLActivation
's name. [[builder]]
of typeMLGraphBuilder
-
A dictionary containing
MLActivation
options. [[operator]]
of type operator-
Reference to
MLActivation
's corresponding operator.
gru()
or lstm()
. Each MLActivation
has associated validation steps, which is an algorithm accepting an MLOperandDescriptor
and returning a boolean. The default activation validation steps are to return true.
7.7.1. Creating MLActivation
MLActivation
objects (including the ones passed as input to methods) are created by the methods of MLGraphBuilder
and are identified by their name. The options dictionary is defined by those methods. The actual creation of the activation function e.g. a sigmoid()
or relu()
can then be deferred until when the rest of the graph is ready to connect with it such as during the construction of lstm()
for example. To create an MLActivation given MLGraphBuilder
builder, string name, optional ordered map options, and optional algorithm validation steps, run the following steps:
-
Let activation be a new
MLActivation
. -
Set activation.
[[builder]]
to builder. -
Set activation.
[[name]]
to name. -
Let operator be an operator for the name operation, given options.
-
Set activation.
[[operator]]
to operator. -
Set activation’s validation steps to validation steps if given, or the default activation validation steps otherwise.
-
Return activation.
To validate activation given MLGraphBuilder
builder and MLActivation
activation, return true if activation.[[builder]]
is builder, and false otherwise.
7.8. MLGraphBuilder
interface
The MLGraphBuilder
interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.
typedef record<DOMString, MLOperand> MLNamedOperands;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor descriptor);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor descriptor, ArrayBufferView bufferView);

  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandDataType type = "float32");

  // Compile the graph up to the specified output operands asynchronously.
  Promise<MLGraph> build(MLNamedOperands outputs);
};
The MLGraphBuilder.build() method compiles the graph builder state up to the specified output operands into a compiled graph according to the type of MLContext that creates it. When the [[contextType]] of the MLContext is set to "default", the compiled graph is initialized right before the MLGraph is returned. This graph initialization stage is important for optimal performance of the subsequent graph executions. It typically involves a process known as "weight preprocessing", where all the constant inputs to the graph are preprocessed and cached at the operating system level for subsequent graph execution calls. The initializing inputs are typically the constant weight data specified as constant operands through the MLGraphBuilder constant() methods during graph construction time.
MLGraphBuilder
has the following internal slots:
[[context]]
of typeMLContext
-
The context of type
MLContext
associated with thisMLGraphBuilder
.
7.8.1. MLGraphBuilder
constructor
The new MLGraphBuilder(context)
constructor steps are:
-
If this's relevant global object's associated Document is not allowed to use the webnn feature, then throw a "
SecurityError
"DOMException
. -
Set this.
[[context]]
to context.
7.8.2. input operands
Create a named MLOperand
based on a descriptor, that can be used as an input.
-
name: a string name of the input.
-
descriptor: an
MLOperandDescriptor
object.
Returns: an MLOperand
object.
The input(name, descriptor)
method steps are:
-
If checking dimensions given descriptor returns false, then throw a
TypeError
. -
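Make graph connections:
-
Let operand be the result of creating an MLOperand given this and descriptor.
-
Set operand.[[name]] to name.
-
Add operand to this's graph's inputs.
-
-
Return operand.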
The MLGraphBuilder API allows creating an MLGraph without input operands. If the underlying platform doesn’t support that, implementations may add a stub input or pass constants as inputs to the graph.
7.8.3. constant operands
Create a constant MLOperand that can be used in MLGraphBuilder methods.
7.8.3.1. constant(descriptor, bufferView)
Create a constant MLOperand
of the specified data type and shape that contains the initializing data.
-
descriptor: an
MLOperandDescriptor
. The descriptor of the output tensor. -
bufferView: an
ArrayBufferView
. The view of the buffer containing the initializing data.
Returns: an MLOperand
. The constant output tensor.
The constant(descriptor, bufferView)
method steps are:
-
If checking dimensions given descriptor returns false, then throw a
TypeError
. -
If validating buffer with descriptor given bufferView and descriptor returns false, then throw a
TypeError
. -
Make graph connections:
-
Let operand be the result of creating an MLOperand given this and descriptor.
-
Let bytes be the result of getting a copy of the bytes held by the buffer source given bufferView.
-
Add operand to this's graph's constants with bytes as value.
-
-
Return operand.
7.8.3.2. constant(value, type)
Create a constant MLOperand
of the specified value and data type.
"int8"
data type, etc. -
value: a
float
number. The value of the constant. -
type: an optional
MLOperandDataType
. If not specified, it is assumed to be"float32"
.
Returns: an MLOperand
. The constant output.
The constant(value, type)
method steps are:
-
Let descriptor be a new
MLOperandDescriptor
.-
Set descriptor.
dataType
to type. -
Set descriptor.
dimensions
to an empty list.
-
-
Make graph connections:
-
Let operand be the result of creating an MLOperand given this and descriptor.
-
Add operand to this's graph's constants with value as value.
-
-
Return operand.
7.8.4. build method
Build a composed graph up to a given output operand into a computational graph asynchronously. The build(outputs)
method steps are:
-
If outputs is empty, then return a new promise rejected with a
TypeError
. -
For each name → operand of outputs:
-
If name is empty, then return a new promise rejected with a
TypeError
. -
If validating operand given this and operand returns false, then return a new promise rejected with a
TypeError
. -
If operand is in this's graph's inputs or constants, then return a new promise rejected with a
TypeError
.
-
-
Let operands be a new empty set.
-
Let operators be a new empty set.
-
Let inputs be a new empty set.
-
While queue is not empty:
-
If any MLOperands in inputs have the same [[name]], then return a new promise rejected with a TypeError.
If MLGraphBuilder can’t be re-used, then this can be simplified: enforce uniqueness in input() instead, and iteration can be done over all of the graph’s inputs instead of needing this traversal. [Issue #567]
-
Let global be this's relevant global object.
-
Let realm be this's relevant realm.
-
Let graph be a new
MLGraph
with realm. -
Set graph.
[[context]]
to this.[[context]]
. -
For each operand in inputs:
-
Set graph.[[inputDescriptors]][operand.[[name]]] to operand.[[descriptor]].
If constants' ArrayBuffers are not transferred, make copies for graph's constants here. [Issue #566]
-
-
For each name → operand of outputs:
-
Set graph.
[[outputDescriptors]]
[name] to operand.[[descriptor]]
.
-
-
Let promise be a new promise.
-
Run the following steps in parallel:
-
Let graphImpl be the result of converting this's graph with operands, operators, inputs, and outputs’s values into an implementation-defined format which can be interpreted by the underlying platform.
-
If the underlying platform does not support a requested feature, then queue an ML task with global to reject promise with an "
OperationError
"DOMException
, and abort these steps.
-
-
Set graph.
[[implementation]]
to graphImpl. -
Queue an ML task with global to resolve promise with graph.
-
-
Return promise.
NOTE: Specifying an input operand or constant operand as a graph output
results in an error, as this is usually an incorrect usage of the API. Callers can work around this by introducing an identity()
operator.
7.8.5. argMin/argMax operations
Return the index location of the minimum or maximum values of all the input values along the axes.
dictionary MLArgMinMaxOptions {
  sequence<[EnforceRange] unsigned long> axes;
  boolean keepDimensions = false;
  boolean selectLastIndex = false;
};

partial interface MLGraphBuilder {
  MLOperand argMin(MLOperand input, optional MLArgMinMaxOptions options = {});
  MLOperand argMax(MLOperand input, optional MLArgMinMaxOptions options = {});
};
MLArgMinMaxOptions
has the following members:
axes
, of typesequence<[EnforceRange] unsigned long>
-
The dimensions to reduce. The values must be in the range [0, N-1] where N is the rank of the input tensor. If not present, all dimensions are reduced. If empty, no dimensions are reduced, and the shape of the output tensor is the same as the shape of the input tensor.
keepDimensions
, of type boolean, defaulting tofalse
-
If true, retains reduced dimensions with size 1.
selectLastIndex
, of type boolean, defaulting tofalse
-
If true, select the last index instead of the first found along the axes.
-
input: an
MLOperand
. The input N-D tensor. -
options: an optional
MLArgMinMaxOptions
. The optional parameters of the operation.
Returns: an MLOperand
. The N-D tensor of the reduced shape. The values must be of type "int64"
in the range [0, N-1] where N is the corresponding size of each of the input dimensions specified by options.axes.
To create argMin/argMax operation given string op, MLOperand
input and MLArgMinMaxOptions
options, run the following steps:
-
Assert: op is one of "argMin", "argMax".
-
If validating operand with this and input returns false, then throw a
TypeError
. -
Let outputShape be the result of calculating reduction output sizes given input’s shape, options.
axes
(if it exists), and options.keepDimensions
. If that returns failure, then throw aTypeError
. -
Let desc be a new
MLOperandDescriptor
. -
Set desc.
dimensions
to outputShape. -
Make graph connections:
-
Let operator be an operator for the op operation, given options.
-
Let output be the result of creating an MLOperand given this and desc.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
The following argMin/argMax algorithms are supported.
argMin(input, options)
method steps are:
-
Let output be the result of running the create argMin/argMax operation given "argMin", input and options.
-
Return output.
argMax(input, options)
method steps are:
-
Let output be the result of running the create argMin/argMax operation given "argMax", input and options.
-
Return output.
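The effect of the axes, keepDimensions and selectLastIndex options described in this section can be illustrated on plain arrays. The sketch below is not the specification's "calculating reduction output sizes" algorithm; `reducedShape` and `argMinMax` are hypothetical helpers, the second one handles only the 1-D case, and duplicate axes are not checked.

```javascript
// Output shape after reducing `axes` of `inputShape`.
// axes === undefined reduces every dimension; an empty list reduces none.
function reducedShape(inputShape, axes, keepDimensions = false) {
  const rank = inputShape.length;
  const reduce = axes === undefined ? inputShape.map((_, i) => i) : axes;
  if (reduce.some((a) => !Number.isInteger(a) || a < 0 || a >= rank)) {
    throw new TypeError('axis out of range');
  }
  const out = [];
  inputShape.forEach((size, i) => {
    if (!reduce.includes(i)) {
      out.push(size);  // Dimension is kept as-is.
    } else if (keepDimensions) {
      out.push(1);     // Reduced dimension is retained with size 1.
    }
  });
  return out;
}

// Index of the smallest ("argMin") or largest ("argMax") element of a
// 1-D array. Ties go to the first index unless selectLastIndex is true.
function argMinMax(values, op, selectLastIndex = false) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    const better = op === 'argMin' ? values[i] < values[best]
                                   : values[i] > values[best];
    const tie = values[i] === values[best];
    if (better || (tie && selectLastIndex)) best = i;
  }
  return best;
}
```

For example, reducing axis 1 of a `[2, 3, 4]` input gives `[2, 4]`, or `[2, 1, 4]` when keepDimensions is true.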
7.8.6. batchNormalization
Normalize the values of the input tensor using [Batch-Normalization]. For each input feature, the mean and variance values of that feature are computed across all the samples in the batch dimension while the model is trained. These mean and variance values are then subsequently given to this operation during model inference.
dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  [EnforceRange] unsigned long axis = 1;
  float epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};
MLBatchNormalizationOptions
has the following members:
scale
, of type MLOperand-
The 1-D tensor of the scaling values whose size is equal to the size of the input dimension denoted by
axis
. bias
, of type MLOperand-
The 1-D tensor of the bias values whose size is equal to the size of the input dimension denoted by
axis
. axis
, of type unsigned long, defaulting to1
-
The index of the feature count dimension of the input shape, along which the mean and variance values are defined. Its value must be in the range [0, N-1] where N is the rank of the input tensor. The default value is 1, corresponding to the channel ("c") dimension in the
"nchw"
data layout. epsilon
, of type float, defaulting to1e-5
-
A small value to prevent computational error due to divide-by-zero.
-
input: an
MLOperand
. The input N-D tensor. -
mean: an
MLOperand
. Specifies the 1-D tensor of the mean values of the input features across the batch. Its size is equal to the size of the input dimension denoted byaxis
. -
variance: an
MLOperand
. The 1-D tensor of the variance values of the input features across the batch whose size is equal to the size of the input dimension denoted byaxis
. -
options: an optional
MLBatchNormalizationOptions
. Specifies the optional parameters of the operation.
Returns: an MLOperand
. The batch-normalized N-D tensor of the same shape as input.
The batchNormalization(input, mean, variance, options)
method steps are:
-
If validating operand with this and any of input, mean, variance, options.
scale
(if it exists), and options.bias
(if it exists) returns false, then throw aTypeError
. -
If input’s dataType is not
"float32"
or"float16"
, then throw aTypeError
. -
If options.
axis
is not in the range 0 to input’s rank, exclusive, then throw aTypeError
. -
If mean’s dataType is not equal to input’s dataType, then throw a
TypeError
. -
If mean’s shape is not equal to « input’s shape[options.
axis
] », then throw aTypeError
. -
If variance’s dataType is not equal to input’s dataType, then throw a
TypeError
. -
If variance’s shape is not equal to « input’s shape[options.
axis
] », then throw aTypeError
. -
Make graph connections:
-
Let operator be an operator for the "batchNormalization" operation, given input, mean, variance and options.
-
Let output be the result of creating an MLOperand given this and input.
[[descriptor]]
. -
Set output.
[[operator]]
to operator. -
Set operator’s inputs to input, mean, and variance.
-
Set operator’s output to output.
-
-
Return output.
The behavior of this operation when the input tensor is 4-D of the "nchw"
layout can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
const shape = [1, input.shape()[options.axis], 1, 1];
return builder.add(
  builder.mul(
    builder.reshape(options.scale, shape),
    builder.div(
      builder.sub(input, builder.reshape(mean, shape)),
      builder.sqrt(
        builder.add(
          builder.reshape(variance, shape),
          builder.constant(options.epsilon))))),
  builder.reshape(options.bias, shape));
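To make the arithmetic of the decomposition above concrete, here is a numeric sketch on plain arrays rather than MLOperands. It assumes a 4-D "nchw" input stored as a flat array with the channel axis at index 1; `batchNormalizationNchw` is a hypothetical name, and treating a missing scale as 1 and a missing bias as 0 is an assumption of this sketch.

```javascript
// input: flat array in "nchw" order with shape [n, c, h, w].
// mean, variance: arrays of length c.
// options.scale, options.bias: optional arrays of length c.
function batchNormalizationNchw(input, shape, mean, variance, options = {}) {
  const [n, c, h, w] = shape;
  if (input.length !== n * c * h * w) {
    throw new TypeError('input length does not match shape');
  }
  const epsilon = options.epsilon ?? 1e-5;
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Channel index of element i in nchw order.
    const channel = Math.floor(i / (h * w)) % c;
    const scale = options.scale ? options.scale[channel] : 1;
    const bias = options.bias ? options.bias[channel] : 0;
    output[i] = scale * (input[i] - mean[channel]) /
        Math.sqrt(variance[channel] + epsilon) + bias;
  }
  return output;
}
```

Each element is shifted by its channel's mean, divided by the square root of that channel's variance plus epsilon, and then scaled and biased per channel, which is the same computation the reshape-based decomposition expresses through broadcasting.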
7.8.7. cast
Cast each element in the input tensor to the target data type.
partial interface MLGraphBuilder {
  MLOperand cast(MLOperand input, MLOperandDataType type);
};
-
input: an
MLOperand
. The input N-D tensor. -
type: an
MLOperandDataType
. The target data type.
Returns: an MLOperand
. The N-D tensor of the same shape as input with each element cast to the target data type.
The cast(input, type)
method steps are:
-
If validating operand with this and input returns false, then throw a
TypeError
. -
Make graph connections:
-
Let operator be an operator for the "cast" operation, given type.
-
Let output be the result of copying an MLOperand given input.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
7.8.8. clamp
Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions {
  float minValue;
  float maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand input, optional MLClampOptions options = {});
  MLActivation clamp(optional MLClampOptions options = {});
};
minValue
, of type float-
The minimum value of the range. When it is not specified, the clamping is not performed on the lower limit of the range.
maxValue
, of type float-
The maximum value of the range. When it is not specified, the clamping is not performed on the upper limit of the range.
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return input;
  } else {
    return builder.min(input, builder.constant(options.maxValue));
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(input, builder.constant(options.minValue));
  } else {
    return builder.min(
        builder.max(input, builder.constant(options.minValue)),
        builder.constant(options.maxValue));
  }
}
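The same element-wise behavior can be checked on ordinary numbers. The following sketch mirrors the decomposition above on a plain array or typed array; `clampValues` is a hypothetical helper, not part of the API.

```javascript
// Clamps each element to [minValue, maxValue]. A bound that is left
// undefined is not applied, matching the decomposition above.
function clampValues(input, options = {}) {
  const {minValue, maxValue} = options;
  return input.map((x) => {
    let y = x;
    if (minValue !== undefined) y = Math.max(y, minValue);
    if (maxValue !== undefined) y = Math.min(y, maxValue);
    return y;
  });
}
```

With both bounds set this applies the lower bound first and the upper bound second, which is the order used by the `min(max(input, minValue), maxValue)` branch.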
To check clamp options given MLClampOptions
options, run the following steps:
7.8.8.1. clamp(input, options)
-
input: an
MLOperand
. The input tensor. -
options: an optional
MLClampOptions
. The optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
The clamp(input, options)
method steps are:
-
If validating operand with this and input returns false, then throw a
TypeError
. -
If checking clamp options given options returns false, then throw a
TypeError
. -
Make graph connections:
-
Let output be the result of copying an MLOperand given input.
-
Let operator be an operator for the "clamp" operation, given options.
minValue
and options.maxValue
. -
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
7.8.8.2. clamp(options)
-
options: an optional
MLClampOptions
. The optional parameters of the operation.
Returns: an MLActivation. The operator representing the clamp operation.
The clamp(options)
method steps are:
-
If checking clamp options given options returns false, then throw a
TypeError
. -
Let op be the result of creating an MLActivation given this, "clamp" and options.
-
Return op.
7.8.9. concat
Concatenates the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, [EnforceRange] unsigned long axis);
};
-
inputs: a sequence of
MLOperand
. All input tensors must have the same shape, except for the size of the dimension to concatenate on. -
axis: an
unsigned long
scalar. The axis that the inputs concatenate along. Its value must be in the range [0, N-1] where N is the rank of the input tensors.
Returns: an MLOperand
. The concatenated tensor of all the inputs along
the axis. The output tensor has the same shape except on the dimension
that all the inputs concatenated along. The size of that dimension is
computed as the sum of all the input sizes of the same dimension.
The concat(inputs, axis)
method steps are:
-
If validating operand with this and any item in inputs returns false, then throw a
TypeError
. -
Let first be inputs[0].
-
If axis is greater than or equal to first’s rank, then throw a
TypeError
. -
Let desc be a new
MLOperandDescriptor
. -
Set desc.
dimensions
to a clone of first’s shape. -
Set desc.
dimensions
[axis] to first’s shape[axis]. -
For each index in the range 1 to inputs’s size, exclusive:
-
Let input be inputs[index].
-
If input’s dataType is not equal to first’s dataType, then throw a
TypeError
. -
If input’s rank is not equal to first’s rank, then throw a
TypeError
. -
For each dim in the range 0 to input’s rank, exclusive:
If the shape of each corresponding dimension and type of the operands, except for those of the dimension given by axis, is not the same, fail.-
If dim is not equal to axis and if input’s shape[dim] is not equal to first’s shape[dim], then throw a
TypeError
. -
If dim is equal to axis:
-
Let size be the sum of desc.
dimensions
[axis] and input’s shape[dim]. -
If size is not a valid dimension, then throw a
TypeError
. -
Set desc.
dimensions
[axis] to size.
-
-
-
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the "concat" operation, given inputs and axis.
-
Set output.
[[operator]]
to operator. -
Set operator’s inputs to inputs.
-
Set operator’s output to output.
-
-
Return output.
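The shape validation in the steps above can be sketched on plain shape arrays. This is a minimal illustration, not part of the API: concatOutputShape is a hypothetical helper name, the data type check and the "valid dimension" check are omitted, and rejecting an empty input list is an assumption, since the steps read inputs[0] without stating what happens when the list is empty.

```javascript
// Hypothetical helper (not part of the WebNN API): computes the output shape
// of concat from plain shape arrays, mirroring the validation steps above.
function concatOutputShape(shapes, axis) {
  if (shapes.length === 0) {
    // Assumption: the steps read inputs[0], so at least one input is needed.
    throw new TypeError('at least one input shape is required');
  }
  const first = shapes[0];
  if (axis >= first.length) {
    throw new TypeError('axis must be less than the rank of the inputs');
  }
  const output = first.slice();
  for (let index = 1; index < shapes.length; ++index) {
    const shape = shapes[index];
    if (shape.length !== first.length) {
      throw new TypeError('all inputs must have the same rank');
    }
    for (let dim = 0; dim < shape.length; ++dim) {
      if (dim === axis) {
        // Sizes along the concatenation axis accumulate.
        output[axis] += shape[dim];
      } else if (shape[dim] !== first[dim]) {
        throw new TypeError(`dimension ${dim} differs from the first input`);
      }
    }
  }
  return output;
}

console.log(concatOutputShape([[2, 3], [4, 3], [1, 3]], 0)); // [7, 3]
```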
7.8.10. conv2d
Compute a 2-D convolution given 4-D input and filter tensors.
enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};
dictionary MLConv2dOptions {
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  [EnforceRange] unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
};
partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input,
                   MLOperand filter,
                   optional MLConv2dOptions options = {});
};
MLConv2dOptions
has the following members:
padding, of type sequence<[EnforceRange] unsigned long>
-
A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the convolution input. The default value is [0, 0, 0, 0].
strides, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the convolution input. The default value is [1, 1].
dilations, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied on the convolution filter (kernel). The default value is [1, 1].
groups, of type unsigned long, defaulting to 1
-
The number of groups that input channels and output channels are divided into.
inputLayout, of type MLInputOperandLayout, defaulting to "nchw"
-
Specifies the layout format of the input and output tensor as follows:
filterLayout, of type MLConv2dFilterOperandLayout, defaulting to "oihw"
-
Specifies the layout format of the filter tensor as follows:
bias, of type MLOperand
-
An additional 1-D tensor with the shape of [outputChannels] whose values are to be added to the convolution result.
-
input: an
MLOperand
. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout
. -
filter: an
MLOperand
. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout
and options.groups
. -
options: an
MLConv2dOptions
. The optional parameters of the operation.
Returns: an MLOperand
. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the options.inputLayout
value. More specifically, the spatial dimensions, or the sizes of the last two dimensions of the output tensor for the nchw input layout, can be calculated as follows:
outputSize = 1 + (inputSize - (filterSize - 1) * dilation - 1 + beginningPadding + endingPadding) / stride
Note: A depthwise conv2d operation is a variant of grouped convolution in which options.groups = inputChannels = outputChannels. In that case the shape of the filter tensor is [options.groups, 1, height, width] for "oihw" layout, [height, width, 1, options.groups] for "hwio" layout, [options.groups, height, width, 1] for "ohwi" layout and [1, height, width, options.groups] for "ihwo" layout.
To calculate conv output size given unsigned integers inputSize, filterSize, beginningPadding, endingPadding, stride and dilation, perform these steps. They return a number.
-
Let effectiveFilterSize be ( filterSize - 1 ) * dilation + 1.
-
Let outputSize be ( inputSize - effectiveFilterSize + beginningPadding + endingPadding ) / stride + 1.
-
Return outputSize.
To calculate conv2d output sizes given unsigned integers inputHeight, inputWidth, filterHeight and filterWidth, list of 4 unsigned integers padding, list of 2 unsigned integers strides, and list of 2 unsigned integers dilations, perform these steps. They return a list of 2 numbers.
-
Let outputHeight be the result of calculating conv output size given inputHeight, filterHeight, padding[0], padding[1], strides[0] and dilations[0].
-
Let outputWidth be the result of calculating conv output size given inputWidth, filterWidth, padding[2], padding[3], strides[1] and dilations[1].
-
Return « outputHeight, outputWidth ».
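The two algorithms above translate almost directly into code. The following sketch uses hypothetical function names (convOutputSize, conv2dOutputSizes) that are not part of the API. It also makes one assumption the steps do not state: the division by stride is rounded down so that the result is an integer dimension.

```javascript
// Hypothetical helper: "calculate conv output size" for one spatial dimension.
function convOutputSize(inputSize, filterSize, beginningPadding, endingPadding,
                        stride, dilation) {
  const effectiveFilterSize = (filterSize - 1) * dilation + 1;
  // Assumption: the quotient is rounded down; the steps above only say "/".
  return Math.floor(
      (inputSize - effectiveFilterSize + beginningPadding + endingPadding) /
      stride) + 1;
}

// Hypothetical helper: "calculate conv2d output sizes" for height and width.
// padding is [beginningHeight, endingHeight, beginningWidth, endingWidth].
function conv2dOutputSizes(inputHeight, inputWidth, filterHeight, filterWidth,
                           padding, strides, dilations) {
  const outputHeight = convOutputSize(
      inputHeight, filterHeight, padding[0], padding[1], strides[0], dilations[0]);
  const outputWidth = convOutputSize(
      inputWidth, filterWidth, padding[2], padding[3], strides[1], dilations[1]);
  return [outputHeight, outputWidth];
}

// A 224x224 input with a 3x3 filter, padding 1 on every side and stride 2.
console.log(conv2dOutputSizes(224, 224, 3, 3, [1, 1, 1, 1], [2, 2], [1, 1])); // [112, 112]
```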
The conv2d(input, filter, options)
method steps are:
-
If validating operand with this and any of input, filter, and options.bias (if it exists) returns false, then throw a TypeError.
-
If input’s dataType is not "float32" or "float16", then throw a TypeError.
-
If filter’s dataType is not equal to input’s dataType, then throw a TypeError.
-
If options.padding does not exist, set it to the list « 0, 0, 0, 0 ».
-
Otherwise, if options.padding's size is not 4, then throw a TypeError.
-
If options.strides does not exist, set it to the list « 1, 1 ».
-
Otherwise, if options.strides's size is not 2, then throw a TypeError.
-
If any element in options.strides is equal to 0, then throw a TypeError.
-
If options.dilations does not exist, set it to the list « 1, 1 ».
-
Otherwise, if options.dilations's size is not 2, then throw a TypeError.
-
Calculate the output shape:
-
Let inputShape be input’s shape.
-
Switch on options.
inputLayout
: -
Let filterShape be filter’s shape.
-
Switch on options.
filterLayout
:"hwio"
-
-
Let filterHeight be filterShape[0].
-
Let filterWidth be filterShape[1].
-
Let filterInputChannels be filterShape[2].
-
Let outputChannels be filterShape[3].
-
"ohwi"
-
-
Let outputChannels be filterShape[0].
-
Let filterHeight be filterShape[1].
-
Let filterWidth be filterShape[2].
-
Let filterInputChannels be filterShape[3].
-
"ihwo"
-
-
Let filterInputChannels be filterShape[0].
-
Let filterHeight be filterShape[1].
-
Let filterWidth be filterShape[2].
-
Let outputChannels be filterShape[3].
-
"oihw"
-
-
Let outputChannels be filterShape[0].
-
Let filterInputChannels be filterShape[1].
-
Let filterHeight be filterShape[2].
-
Let filterWidth be filterShape[3].
-
-
If inputChannels % options.groups is not 0, then throw a TypeError.
-
Otherwise, if inputChannels / options.groups is not equal to filterInputChannels, then throw a TypeError.
-
Let outputSizes be the result of calculating conv2d output sizes given inputHeight, inputWidth, filterHeight, filterWidth, options.padding, options.strides, and options.dilations.
-
Switch on options.
inputLayout
: -
If any item in outputShape is not a valid dimension, then throw a
TypeError
. -
Let desc be a new
MLOperandDescriptor
. -
Set desc.
dimensions
to outputShape.
-
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the "conv2d" operation, given options and filter.
-
Set output.
[[operator]]
to operator. -
Set operator’s inputs to input and filter.
-
Set operator’s output to output.
-
-
Return output.
7.8.11. convTranspose2d
Compute a 2-D transposed convolution given 4-D input and filter tensors.
enum MLConvTranspose2dFilterOperandLayout {
  "iohw",
  "hwoi",
  "ohwi"
};
dictionary MLConvTranspose2dOptions {
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  sequence<[EnforceRange] unsigned long> outputPadding;
  sequence<[EnforceRange] unsigned long> outputSizes;
  [EnforceRange] unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConvTranspose2dFilterOperandLayout filterLayout = "iohw";
  MLOperand bias;
};
partial interface MLGraphBuilder {
  MLOperand convTranspose2d(MLOperand input,
                            MLOperand filter,
                            optional MLConvTranspose2dOptions options = {});
};
MLConvTranspose2dOptions
has the following members:
padding, of type sequence<[EnforceRange] unsigned long>
-
A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the convolution input. The default value is [0, 0, 0, 0].
strides, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the convolution input. The default value is [1, 1].
dilations, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied on the convolution filter (kernel). The default value is [1, 1].
outputPadding, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2. Specifies the padding values applied to each spatial dimension of the output tensor. The explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when a value in options.strides is greater than 1.
Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding value to be written to the output tensor.
The default value is [0, 0].
outputSizes, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2. Specifies the sizes of the last two dimensions of the output tensor. When the output sizes are explicitly specified, the output padding values in outputPadding are ignored.
If not specified, the output sizes are automatically computed.
groups, of type unsigned long, defaulting to 1
-
The number of groups that input channels and output channels are divided into.
inputLayout, of type MLInputOperandLayout, defaulting to "nchw"
-
Specifies the layout format of the input and output tensor as follows:
filterLayout, of type MLConvTranspose2dFilterOperandLayout, defaulting to "iohw"
-
Specifies the layout format of the filter tensor as follows:
bias, of type MLOperand
-
An additional 1-D tensor with the shape of [outputChannels] whose values are to be added to the convolution result.
-
input: an
MLOperand
. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout
. -
filter: an
MLOperand
. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout and options.groups.
-
options: an optional
MLConvTranspose2dOptions
.
Returns: an MLOperand
. The output 4-D tensor that contains the transposed convolution result. The output shape is interpreted according to the options.inputLayout
value. More specifically, unless the options.outputSizes
values are explicitly specified, the options.outputPadding
may be needed to compute the spatial dimension values of the output tensor as follows:
outputSize = (inputSize - 1) * stride + (filterSize - 1) * dilation + 1 - beginningPadding - endingPadding + outputPadding
To calculate convtranspose output size given unsigned integers inputSize, filterSize, beginningPadding, endingPadding, stride, dilation, and outputPadding, perform these steps. They return a number.
-
Let effectiveFilterSize be ( filterSize - 1 ) * dilation + 1.
-
Let outputSize be ( inputSize - 1 ) * stride + effectiveFilterSize - beginningPadding - endingPadding + outputPadding.
-
Return outputSize.
To calculate convtranspose2d output sizes given unsigned integers inputHeight, inputWidth, filterHeight and filterWidth, list of 4 unsigned integers padding, list of 2 unsigned integers strides, list of 2 unsigned integers dilations, and list of 2 unsigned integers outputPadding, perform these steps. They return a list of 2 numbers.
-
Let outputHeight be the result of calculating convtranspose output size given inputHeight, filterHeight, padding[0], padding[1], strides[0], dilations[0], and outputPadding[0].
-
Let outputWidth be the result of calculating convtranspose output size given inputWidth, filterWidth, padding[2], padding[3], strides[1], dilations[1] and outputPadding[1].
-
Return « outputHeight, outputWidth ».
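The steps above can be sketched as follows. The function names (convTransposeOutputSize, convTranspose2dOutputSizes) are hypothetical and not part of the API. The example also illustrates why outputPadding exists: with a stride of 2 and a 3x3 filter, forward convolutions over inputs of size 7 and size 8 both yield size 3, so the transposed convolution needs outputPadding to choose between 7 and 8.

```javascript
// Hypothetical helper: "calculate convtranspose output size" for one dimension.
function convTransposeOutputSize(inputSize, filterSize, beginningPadding,
                                 endingPadding, stride, dilation, outputPadding) {
  const effectiveFilterSize = (filterSize - 1) * dilation + 1;
  return (inputSize - 1) * stride + effectiveFilterSize -
      beginningPadding - endingPadding + outputPadding;
}

// Hypothetical helper: "calculate convtranspose2d output sizes".
// padding is [beginningHeight, endingHeight, beginningWidth, endingWidth].
function convTranspose2dOutputSizes(inputHeight, inputWidth, filterHeight,
                                    filterWidth, padding, strides, dilations,
                                    outputPadding) {
  const outputHeight = convTransposeOutputSize(
      inputHeight, filterHeight, padding[0], padding[1],
      strides[0], dilations[0], outputPadding[0]);
  const outputWidth = convTransposeOutputSize(
      inputWidth, filterWidth, padding[2], padding[3],
      strides[1], dilations[1], outputPadding[1]);
  return [outputHeight, outputWidth];
}

const noPadding = [0, 0, 0, 0];
console.log(convTranspose2dOutputSizes(3, 3, 3, 3, noPadding, [2, 2], [1, 1], [0, 0])); // [7, 7]
console.log(convTranspose2dOutputSizes(3, 3, 3, 3, noPadding, [2, 2], [1, 1], [1, 1])); // [8, 8]
```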
The convTranspose2d(input, filter, options)
method steps are:
-
If validating operand with this and any of input, filter, and options.bias (if it exists) returns false, then throw a TypeError.
-
If input’s dataType is not "float32" or "float16", then throw a TypeError.
-
If filter’s dataType is not equal to input’s dataType, then throw a TypeError.
-
If options.padding does not exist, set it to the list « 0, 0, 0, 0 ».
-
Otherwise, if options.padding's size is not 4, then throw a TypeError.
-
If options.strides does not exist, set it to the list « 1, 1 ».
-
Otherwise, if options.strides's size is not 2, then throw a TypeError.
-
If any element in options.strides is equal to 0, then throw a TypeError.
-
If options.dilations does not exist, set it to the list « 1, 1 ».
-
Otherwise, if options.dilations's size is not 2, then throw a TypeError.
-
If options.outputPadding does not exist, set it to the list « 0, 0 ».
-
Otherwise, if options.outputPadding's size is not 2, then throw a TypeError.
-
If options.
outputSizes
exists: -
Otherwise:
-
If options.outputPadding[0] is greater than or equal to options.strides[0], or options.outputPadding[1] is greater than or equal to options.strides[1], then throw a TypeError.
-
-
Calculate the output shape:
-
Let inputShape be input’s shape.
-
Switch on options.
inputLayout
: -
Let filterShape be filter’s shape.
-
Switch on options.
filterLayout
:"iohw"
-
-
Let filterInputChannels be filterShape[0].
-
Let filterOutputChannels be filterShape[1].
-
Let filterHeight be filterShape[2].
-
Let filterWidth be filterShape[3].
-
"hwoi"
-
-
Let filterHeight be filterShape[0].
-
Let filterWidth be filterShape[1].
-
Let filterOutputChannels be filterShape[2].
-
Let filterInputChannels be filterShape[3].
-
"ohwi"
-
-
Let filterOutputChannels be filterShape[0].
-
Let filterHeight be filterShape[1].
-
Let filterWidth be filterShape[2].
-
Let filterInputChannels be filterShape[3].
-
-
If inputChannels is not equal to filterInputChannels, then throw a
TypeError
. -
Let outputChannels be filterOutputChannels * options.groups.
-
If options.outputSizes exists, let outputSizes be options.outputSizes.
-
Otherwise, let outputSizes be the result of calculating convtranspose2d output sizes given inputHeight, inputWidth, filterHeight, filterWidth, options.padding, options.strides, options.dilations, and options.outputPadding.
-
Switch on options.
inputLayout
: -
If any item in outputShape is not a valid dimension, then throw a
TypeError
. -
Let desc be a new
MLOperandDescriptor
. -
Set desc.
dimensions
to outputShape.
-
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the "convTranspose2d" operation, given options and filter.
-
Set output.
[[operator]]
to operator. -
Set operator’s inputs to input and filter.
-
Set operator’s output to output.
-
-
Return output.
7.8.12. Element-wise binary operations
Compute the element-wise binary addition, subtraction, multiplication, division, power, maximum and minimum of the two input tensors.
The inputs of the element-wise binary operations are broadcast according to [numpy-broadcasting-rule]. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};
Returns: an MLOperand
. The output tensor that contains the result of
element-wise binary operation of the two input tensors.
-
add: Add the values of the two input tensors, element-wise.
-
sub: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.
-
mul: Multiply the values of the two input tensors, element-wise.
-
div: Divide the values of the first input tensor by the values of the second input tensor, element-wise.
-
max: Select the greater values of the two input tensors, element-wise.
-
min: Select the lesser values of the two input tensors, element-wise.
-
pow: Compute the values of the first input tensor to the power of the values of the second input tensor, element-wise.
To create element-wise binary operation given string op, MLOperand
a and MLOperand
b, run the following steps:
-
Assert: op is one of "add", "sub", "mul", "div", "max", "min", "pow".
-
If validating operand with this and any of a and b returns false, then throw a
TypeError
. -
If a’s dataType is not equal to b’s dataType, then throw a
TypeError
. -
Let descriptor be a new
MLOperandDescriptor
. -
Set descriptor.
dimensions
to the result of bidirectionally broadcasting the shapes a’s shape and b’s shape. -
Make graph connections:
-
Let output be the result of creating an MLOperand given this and descriptor.
-
Let operator be an operator for the op operation, given a and b.
-
Set output.
[[operator]]
to operator. -
Set operator’s inputs to a and b.
-
Set operator’s output to output.
-
-
Return output.
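The output shape used in these steps comes from bidirectional broadcasting, whose definition lies outside this section. The sketch below is one reading of the rule in [numpy-broadcasting-rule], written for plain shape arrays. The name broadcastShapes is hypothetical, and returning null to signal failure is a convention of this sketch only.

```javascript
// Hypothetical helper: bidirectionally broadcasts two shapes following the
// NumPy rule. Returns the broadcast shape, or null if the shapes are
// incompatible (null is this sketch's stand-in for "failure").
function broadcastShapes(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const output = new Array(rank);
  for (let i = 0; i < rank; ++i) {
    // Align shapes at their trailing dimension; a missing leading
    // dimension behaves like a dimension of size 1.
    const dimA = shapeA[shapeA.length - 1 - i] ?? 1;
    const dimB = shapeB[shapeB.length - 1 - i] ?? 1;
    if (dimA !== dimB && dimA !== 1 && dimB !== 1) {
      return null;
    }
    output[rank - 1 - i] = dimA === 1 ? dimB : dimA;
  }
  return output;
}

console.log(broadcastShapes([8, 1, 6, 1], [7, 1, 5])); // [8, 7, 6, 5]
console.log(broadcastShapes([4, 3], [2, 3]));          // null
```

The first call shows that the output rank is the larger of the two input ranks and that each output dimension takes the larger of the two aligned sizes, as the section's introduction describes.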
The element-wise binary operation algorithms invoke the create element-wise binary operation steps as follows.
add(a, b)
method steps are:
-
Let output be the result of running the create element-wise binary operation given "add", a and b.
-
Return output.
sub(a, b)
method steps are:
-
Let output be the result of running the create element-wise binary operation given "sub", a and b.
-
Return output.
mul(a, b)
method steps are:
-
Let output be the result of running the create element-wise binary operation given "mul", a and b.
-
Return output.
div(a, b)
method steps are:
-
Let output be the result of running the create element-wise binary operation given "div", a and b.
-
Return output.
max(a, b)
method steps are:
-
Let output be the result of running the create element-wise binary operation given "max", a and b.
-
Return output.
min(a, b)
method steps are:
-
Let output be the result of running the create element-wise binary operation given "min", a and b.
-
Return output.
pow(a, b)
method steps are:
-
Let output be the result of running the create element-wise binary operation given "pow", a and b.
-
Return output.
7.8.13. Element-wise logical operations
Compare input tensors element-wise and return a uint8 tensor of values 0 or 1 for the comparisons. For single-operand operations, return the logical results of the operation.
The input tensors are broadcast according to [numpy-broadcasting-rule]. The rank of the output tensor is the maximum rank of the input tensors.
partial interface MLGraphBuilder {
  MLOperand equal(MLOperand a, MLOperand b);
  MLOperand greater(MLOperand a, MLOperand b);
  MLOperand greaterOrEqual(MLOperand a, MLOperand b);
  MLOperand lesser(MLOperand a, MLOperand b);
  MLOperand lesserOrEqual(MLOperand a, MLOperand b);
  MLOperand not(MLOperand a);
};
Returns: an MLOperand
. The output tensor that contains the result of element-wise comparison of the two input tensors.
-
equal: Compare if the values of the two input tensors are equal, element-wise.
-
greater: Compare if the values of the first input tensor are greater than the values of the second input tensor, element-wise.
-
greaterOrEqual: Compare if the values of the first input tensor are greater than or equal to the values of the second input tensor, element-wise.
-
lesser: Compare if the values of the first input tensor are less than the values of the second input tensor, element-wise.
-
lesserOrEqual: Compare if the values of the first input tensor are less than or equal to the values of the second input tensor, element-wise.
-
not: Invert the values of the input tensor to values 0 or 1, element-wise. Specifically, when the input value is non-zero, invert it to a boolean value 0. Conversely, for a zero input value, invert it to a boolean value 1.
Note: Although greaterOrEqual() and lesserOrEqual() can each be implemented in terms of the operations not(), lesser(), and greater() (in other words, builder.greaterOrEqual(a, b) is builder.not(builder.lesser(a, b))), they are specifically defined to handle NaN cases and, for performance reasons, to avoid double comparisons.
To create element-wise logical operation given string op, MLOperand a and an optional MLOperand b, run the following steps:
-
Assert: op is one of "equal", "greater", "greaterOrEqual", "lesser", "lesserOrEqual", "not".
-
If op is "not":
-
Otherwise:
-
If validating operand with this and any of a and b returns false, then throw a
TypeError
. -
If a’s dataType is not equal to b’s dataType, then throw a
TypeError
. -
Let outputShape be the result of bidirectionally broadcasting the shapes a’s shape and b’s shape. If that returns failure, then throw a
TypeError
.
-
-
Let descriptor be a new
MLOperandDescriptor
. -
Set descriptor.
dimensions
to outputShape. -
Make graph connections:
-
Let output be the result of creating an MLOperand given this and descriptor.
-
Let operator be an operator for the op operation, given a and (if op is not "not") b.
-
Set output.
[[operator]]
to operator. -
Set operator’s inputs to a and (if op is anything other than "not") b.
-
Set operator’s output to output.
-
-
Return output.
The element-wise logical operation algorithms invoke the create element-wise logical operation steps as follows.
equal(a, b)
method steps are:
-
Let output be the result of running the create element-wise logical operation given "equal", a and b.
-
Return output.
greater(a, b)
method steps are:
-
Let output be the result of running the create element-wise logical operation given "greater", a and b.
-
Return output.
greaterOrEqual(a, b)
method steps are:
-
Let output be the result of running the create element-wise logical operation given "greaterOrEqual", a and b.
-
Return output.
lesser(a, b)
method steps are:
-
Let output be the result of running the create element-wise logical operation given "lesser", a and b.
-
Return output.
lesserOrEqual(a, b)
method steps are:
-
Let output be the result of running the create element-wise logical operation given "lesserOrEqual", a and b.
-
Return output.
not(a)
method steps are:
-
Let output be the result of running the create element-wise logical operation given "not" and a.
-
Return output.
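The reason greaterOrEqual() is defined separately rather than derived from not() and lesser() can be seen with NaN inputs. The sketch below uses plain JavaScript arrays of equal length instead of MLOperand objects, omits broadcasting, and assumes IEEE 754 comparison semantics, under which every ordered comparison involving NaN is false. The helper names ending in Ref are hypothetical reference functions, not part of the API.

```javascript
// Hypothetical reference helpers over plain arrays of equal length.
function lesserRef(a, b) {
  return a.map((x, i) => (x < b[i] ? 1 : 0));
}

function greaterOrEqualRef(a, b) {
  return a.map((x, i) => (x >= b[i] ? 1 : 0));
}

function notRef(a) {
  // Non-zero becomes 0, zero becomes 1.
  return a.map((x) => (x !== 0 ? 0 : 1));
}

const lhs = [1, NaN, 3];
const rhs = [2, 2, 3];
console.log(greaterOrEqualRef(lhs, rhs));   // [0, 0, 1]
console.log(notRef(lesserRef(lhs, rhs)));   // [0, 1, 1]
```

The two results differ at the NaN element: the direct comparison reports 0, while the composed form reports 1 because lesser() yields 0 for NaN and not() then inverts it.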
7.8.14. Element-wise unary operations
Compute the element-wise unary operation for the input tensor.
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand input);
  MLOperand ceil(MLOperand input);
  MLOperand cos(MLOperand input);
  MLOperand erf(MLOperand input);
  MLOperand exp(MLOperand input);
  MLOperand floor(MLOperand input);
  MLOperand identity(MLOperand input);
  MLOperand log(MLOperand input);
  MLOperand neg(MLOperand input);
  MLOperand reciprocal(MLOperand input);
  MLOperand sin(MLOperand input);
  MLOperand sqrt(MLOperand input);
  MLOperand tan(MLOperand input);
};
-
input: an
MLOperand
. The input tensor.
Returns: an MLOperand
. The output tensor that contains the result of
element-wise unary operation of the input tensor. The shape of the output
tensor is the same as the shape of input tensor.
-
abs: Compute the absolute value of the input tensor, element-wise.
-
ceil: Compute the ceiling of the input tensor, element-wise.
-
cos: Compute the cosine of the input tensor, element-wise.
-
erf: Compute the error function [Error-Function] of the input tensor, element-wise.
-
exp: Compute the exponential of the input tensor, element-wise.
-
floor: Compute the floor of the input tensor, element-wise.
-
identity: Copy the value of the input tensor to the output tensor, element-wise.
-
log: Compute the natural logarithm of the input tensor, element-wise.
-
neg: Compute the numerical negative value of the input tensor, element-wise.
-
reciprocal: Compute the reciprocal of the input tensor, element-wise.
-
sin: Compute the sine of the input tensor, element-wise.
-
sqrt: Compute the square root of the input tensor, element-wise.
-
tan: Compute the tangent of the input tensor, element-wise.
To create element-wise unary operation given string op, MLOperand
input, and optional list allowedDataTypes, run the following steps:
-
Assert: op is one of "abs", "ceil", "cos", "erf", "exp", "floor", "identity", "log", "neg", "reciprocal", "sin", "sqrt", "tan".
-
If validating operand with this and input returns false, then throw a
TypeError
. -
If allowedDataTypes is given and it does not contain input’s dataType, then throw a
TypeError
. -
Make graph connections:
-
Let output be the result of copying an MLOperand given input.
-
Let operator be an operator for the op operation.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
The element-wise unary operation algorithms invoke the create element-wise unary operation steps as follows.
abs(input)
method steps are:
The ceil(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "ceil", input, and « "float32", "float16" ».
-
Return output.
The cos(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "cos", input, and « "float32", "float16" ».
-
Return output.
The erf(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "erf", input, and « "float32", "float16" ».
-
Return output.
The exp(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "exp", input, and « "float32", "float16" ».
-
Return output.
The floor(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "floor", input, and « "float32", "float16" ».
-
Return output.
The identity(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "identity" and input.
-
Return output.
The log(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "log", input, and « "float32", "float16" ».
-
Return output.
neg(input)
method steps are:
The reciprocal(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "reciprocal", input, and « "float32", "float16" ».
-
Return output.
The sin(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "sin", input, and « "float32", "float16" ».
-
Return output.
The sqrt(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "sqrt", input, and « "float32", "float16" ».
-
Return output.
The tan(input) method steps are:
-
Let output be the result of running the create element-wise unary operation given "tan", input, and « "float32", "float16" ».
-
Return output.
7.8.15. elu
Calculate the exponential linear unit function (ELU) on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * (exp(min(0, x)) - 1).
dictionary MLEluOptions {
  float alpha = 1;
};
partial interface MLGraphBuilder {
  MLOperand elu(MLOperand input, optional MLEluOptions options = {});
  MLActivation elu(optional MLEluOptions options = {});
};
MLEluOptions
has the following members:
alpha, of type float, defaulting to 1
-
A scalar multiplier.
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(
    builder.constant(options.alpha),
    builder.sub(
      builder.exp(builder.min(builder.constant(0), x)),
      builder.constant(1))));
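For checking expected values, the same expression can be evaluated on plain numbers. This is only a numeric sketch: eluScalar and eluArray are hypothetical names, the computation uses JavaScript double precision rather than "float32" or "float16", and no MLGraphBuilder is involved.

```javascript
// Hypothetical scalar reference: max(0, x) + alpha * (exp(min(0, x)) - 1).
function eluScalar(x, alpha = 1) {
  return Math.max(0, x) + alpha * (Math.exp(Math.min(0, x)) - 1);
}

// Hypothetical element-wise reference over a plain array.
function eluArray(values, alpha = 1) {
  return values.map((x) => eluScalar(x, alpha));
}

// Non-negative inputs pass through unchanged; negative inputs approach
// -alpha as x decreases.
console.log(eluArray([2, 0, -1]));
```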
7.8.15.1. elu(input, options)
-
input: an
MLOperand
. The input tensor. -
options: an optional
MLEluOptions
. The optional parameters of the operation.
Returns:
-
an
MLOperand
. The output tensor of the same shape as input.
The elu(input, options)
method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
If input’s dataType is not "float32" or "float16", then throw a TypeError.
-
Make graph connections:
-
Let output be the result of copying an MLOperand given input.
-
Let operator be an operator for the "elu" operation, given options.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
7.8.15.2. elu(options)
-
options: an optional
MLEluOptions
. The optional parameters of the operation.
Returns:
-
an
MLActivation
. The activation function representing the elu operation.
The elu(options)
method steps are:
-
Let op be the result of creating an MLActivation given this, "elu" and options.
-
Return op.
7.8.16. expand
Expand any dimension of size 1 of the input tensor to a larger size according to the new shape. The expansion is consistent with [numpy-broadcasting-rule]. The input dimensions must have the size of 1 or match the sizes of the corresponding output dimensions according to the new shape.
partial interface MLGraphBuilder {
  MLOperand expand(MLOperand input, sequence<[EnforceRange] unsigned long> newShape);
};
-
input: an MLOperand. The input tensor.
-
newShape: a sequence of unsigned long. The new shape the input tensor is expanded to.
Returns: an MLOperand
. The tensor with expanded size dimensions.
The expand(input, newShape)
method steps are:
-
If validating operand with this and input returns false, then throw a
TypeError
. -
Let outputDescriptor be a new
MLOperandDescriptor
. -
Set outputDescriptor.
dimensions
to the result of unidirectionally broadcasting the shapes input’s shape and newShape. -
Make graph connections:
-
Let output be the result of creating an MLOperand given this and outputDescriptor.
-
Let operator be an operator for the "expand" operation, given input and newShape.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
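The shape check performed by expand relies on unidirectional broadcasting, whose definition lies outside this section. The sketch below is one reading of it, based on the constraint stated at the top of this section: each input dimension must be 1 or equal to the corresponding dimension of the new shape, with the shapes aligned at their trailing dimension. The name unidirectionalBroadcast is hypothetical, and returning null to signal failure is a convention of this sketch only.

```javascript
// Hypothetical helper: checks whether inputShape can be expanded to newShape.
// Returns a copy of newShape on success, or null on failure.
function unidirectionalBroadcast(inputShape, newShape) {
  if (inputShape.length > newShape.length) {
    return null; // Expansion cannot reduce the rank.
  }
  for (let i = 0; i < inputShape.length; ++i) {
    // Compare dimensions starting from the trailing one.
    const inputDim = inputShape[inputShape.length - 1 - i];
    const targetDim = newShape[newShape.length - 1 - i];
    if (inputDim !== targetDim && inputDim !== 1) {
      return null;
    }
  }
  return newShape.slice();
}

console.log(unidirectionalBroadcast([3, 1], [2, 3, 4])); // [2, 3, 4]
console.log(unidirectionalBroadcast([3, 4], [3, 1]));    // null
```

Unlike the bidirectional rule used by the element-wise binary operations, only the input may be stretched here; a size 1 in newShape does not adapt to a larger input dimension, which is why the second call fails.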
7.8.17. gather
Gather values of the input tensor along an axis according to the indices.
dictionary MLGatherOptions {
  [EnforceRange] unsigned long axis = 0;
};
partial interface MLGraphBuilder {
  MLOperand gather(MLOperand input,
                   MLOperand indices,
                   optional MLGatherOptions options = {});
};
MLGatherOptions
has the following members:
axis
, of type unsigned long, defaulting to0
-
The axis along which the gathered values are obtained. Its value must be in the range [0, N-1] where N is the rank of the input tensor.
-
input: an
MLOperand
. The input N-D tensor from which the values are gathered. -
indices: an MLOperand. The indices N-D tensor of the input values to gather. The values must be of type "int32", "uint32" or "int64", and must be in the range -N (inclusive) to N (exclusive) where N is the size of the input dimension indexed by options.axis. A negative index means indexing from the end of the dimension.
-
options: an optional
MLGatherOptions
. The optional parameters of the operation.
Returns: an MLOperand
. The output N-D tensor of rank equal to the rank of input + the rank of indices - 1.
Note: The indices parameter to gather() cannot be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the required clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.
The gather(input, indices, options) method steps are:
-
If validating operand with this and any of input and indices returns false, then throw a
TypeError
. -
If indices’s dataType is not "int32", "uint32" or "int64", then throw a TypeError.
-
Let shapeInput be input’s shape and rankInput be shapeInput’s rank.
-
Let shapeIndices be indices’s shape.
-
Let axis be options.
axis
. -
If axis is greater than or equal to rankInput, then throw a
TypeError
. -
Let dimCount be zero.
-
Let rankOutput be zero.
-
Let shapeOutput be an empty list.
-
For each size of shapeInput:
-
If dimCount is equal to axis then break.
-
Set shapeOutput[dimCount] to size.
-
Increment dimCount by one.
-
-
Set rankOutput to dimCount.
-
Let dimCount be zero.
-
For each size of shapeIndices:
-
Set shapeOutput[rankOutput + dimCount] to size.
-
Increment dimCount by one.
-
-
Set rankOutput to rankOutput + dimCount.
-
Let dimCount be zero.
-
For each size of shapeInput:
-
If dimCount is less than or equal to axis then continue.
-
Set shapeOutput[rankOutput + dimCount - axis - 1] to size.
-
Increment dimCount by one.
-
-
Let desc be a new
MLOperandDescriptor
. -
Set desc.
dimensions
to shapeOutput. -
Make graph connections:
-
Let output be the result of creating an MLOperand given desc.
-
Let operator be an operator for the "gather" operation, given input, indices, and options.
-
Set output.
[[operator]]
to operator. -
Set operator’s inputs to input and indices.
-
Set operator’s output to output.
-
-
Return output.
Examples of how gather works in different slicing schemes.
// input of shape [4,3]:
//   [[ 0,  1,  2],
//    [10, 11, 12],
//    [20, 21, 22],
//    [30, 31, 32]]
const input = builder.constant(
  { dataType: 'float32', dimensions: [4, 3] },
  new Float32Array([0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32]));

const indices1 = builder.constant(
  { dataType: 'uint32', dimensions: [2] }, new Uint32Array([3, 1]));

const indices2 = builder.constant(
  { dataType: 'uint32', dimensions: [3] }, new Uint32Array([2, 1, 1]));

const indices3 = builder.constant(
  { dataType: 'uint32', dimensions: [2, 2] }, new Uint32Array([0, 1, 1, 2]));

// axis = 0 (default)
// indices of shape [2]:
//   [3,1]
// output of shape [2,3]:
//   [[30, 31, 32],
//    [10, 11, 12]]
const output1 = builder.gather(input, indices1);

// axis = 1
// indices of shape [3]:
//   [2,1,1]
// output of shape [4,3]:
//   [[ 2,  1,  1],
//    [12, 11, 11],
//    [22, 21, 21],
//    [32, 31, 31]]
const output2 = builder.gather(input, indices2, { axis: 1 });

// axis = 1
// indices of shape [2,2]:
//   [[0, 1],
//    [1, 2]]
// output of shape [4,2,2]:
//   [[[ 0,  1], [ 1,  2]],
//    [[10, 11], [11, 12]],
//    [[20, 21], [21, 22]],
//    [[30, 31], [31, 32]]]
const output3 = builder.gather(input, indices3, { axis: 1 });
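The output shape rule and the negative-index rule described for gather can be checked outside of a graph with a small plain-JavaScript sketch. This is an illustration only: gatherShape and gatherRows are hypothetical helpers operating on ordinary arrays, not part of the API, and gatherRows rejects out-of-range indices instead of clamping them as an implementation might.

```javascript
// Hypothetical helper (not part of the API): output shape of gather().
// Result = input dims before axis + all indices dims + input dims after axis.
function gatherShape(shapeInput, shapeIndices, axis = 0) {
  if (axis >= shapeInput.length) {
    throw new TypeError('axis must be less than the rank of input');
  }
  return [
    ...shapeInput.slice(0, axis),
    ...shapeIndices,
    ...shapeInput.slice(axis + 1),
  ];
}

// Hypothetical helper: gather along axis 0 of a 2-D nested array using 1-D indices.
// A negative index counts from the end of the dimension.
function gatherRows(rows, indices) {
  const n = rows.length;
  return indices.map((i) => {
    if (i < -n || i >= n) throw new RangeError(`index ${i} out of range`);
    return rows[i < 0 ? i + n : i];
  });
}

const sample = [[0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]];
console.log(gatherShape([4, 3], [2], 0));    // [2, 3]
console.log(gatherShape([4, 3], [2, 2], 1)); // [4, 2, 2]
console.log(gatherRows(sample, [3, 1]));     // [[30, 31, 32], [10, 11, 12]]
console.log(gatherRows(sample, [-1]));       // [[30, 31, 32]]
```

The rank of the computed shape is always the rank of input plus the rank of indices minus 1, matching the return value description.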
7.8.18. gelu
Compute the Gaussian error linear unit function (GELU) of the input tensor. The calculation follows the expression 0.5 * x * (1 + erf(x / sqrt(2))).
partial interface MLGraphBuilder {
  MLOperand gelu(MLOperand input);
  MLActivation gelu();
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.mul(
  builder.mul(x, builder.constant(0.5)),
  builder.add(
    builder.constant(1),
    builder.erf(builder.div(x, builder.sqrt(builder.constant(2))))));
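For a quick numeric check of the expression, the following sketch evaluates GELU on plain numbers. JavaScript has no built-in erf, so the sketch assumes the Abramowitz-Stegun polynomial approximation (formula 7.1.26, absolute error around 1.5e-7); both function names are hypothetical and the approximation is not what a user agent is expected to use.

```javascript
// Approximate erf using Abramowitz-Stegun formula 7.1.26 (an assumption of this sketch).
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// GELU as given by the expression 0.5 * x * (1 + erf(x / sqrt(2))).
function geluScalar(x) {
  return 0.5 * x * (1 + erfApprox(x / Math.sqrt(2)));
}

console.log([-1, 0, 1].map(geluScalar)); // approximately [-0.1587, 0, 0.8413]
```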
7.8.18.1. gelu(input)
- input: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The gelu(input) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "gelu" operation.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.18.2. gelu()
The gelu() method steps are:
- Let op be the result of creating an MLActivation given this and "gelu".
- Return op.
7.8.19. gemm
Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A is a 2-D tensor with shape [M, K] or [K, M], B is a 2-D tensor with shape [K, N] or [N, K], and C is unidirectionally broadcastable to the shape [M, N]. A and B may optionally be transposed prior to the calculation.
dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a,
                 MLOperand b,
                 optional MLGemmOptions options = {});
};
MLGemmOptions has the following members:
c, of type MLOperand
- The third input tensor. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape [M, N]. When it is not specified, the computation is done as if c is a scalar 0.0.
alpha, of type float, defaulting to 1.0
- A multiplier for the first input.
beta, of type float, defaulting to 1.0
- A multiplier for the third input c.
aTranspose, of type boolean, defaulting to false
- Indicates if the first input should be transposed prior to calculating the output.
bTranspose, of type boolean, defaulting to false
- Indicates if the second input should be transposed prior to calculating the output.

- a: an MLOperand. The first input 2-D tensor with shape [M, K] if aTranspose is false, or [K, M] if aTranspose is true.
- b: an MLOperand. The second input 2-D tensor with shape [K, N] if bTranspose is false, or [N, K] if bTranspose is true.
- options: an optional MLGemmOptions. The optional parameters of the operation.
Returns: an MLOperand. The output 2-D tensor of shape [M, N] that contains the calculated product of all the inputs.
The gemm(a, b, options) method steps are:
- If validating operand with this and any of a and b returns false, then throw a TypeError.
- If a’s dataType is not "float32" or "float16", then throw a TypeError.
- If b’s dataType is not equal to a’s dataType, then throw a TypeError.
- If a’s rank is not 2 or b’s rank is not 2, then throw a TypeError.
- Let shapeA be a clone of a’s shape and shapeB be a clone of b’s shape.
- If options.aTranspose is true, then reverse the order of the items in shapeA.
- If options.bTranspose is true, then reverse the order of the items in shapeB.
- If shapeA[1] is not equal to shapeB[0], then throw a TypeError.
- Let desc be a new MLOperandDescriptor.
- Set desc.dimensions to the list « shapeA[0], shapeB[1] ».
- Make graph connections:
  - Let output be the result of creating an MLOperand given this and desc.
  - Let operator be an operator for the "gemm" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s inputs to a and b.
  - Set operator’s output to output.
- Return output.
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
if (options.aTranspose)
  a = builder.transpose(a);

if (options.bTranspose)
  b = builder.transpose(b);

let ab = builder.matmul(builder.mul(builder.constant(options.alpha), a), b);
// The third input is the optional member c of options.
return (options.c ?
  builder.add(ab, builder.mul(builder.constant(options.beta), options.c)) : ab);
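To make the expression alpha * A * B + beta * C and the shape rules concrete, here is a minimal reference sketch on nested JavaScript arrays. The names gemmReference and transpose2d are hypothetical and not part of the API; the handling of c covers only the broadcast forms listed in the comment.

```javascript
// Hypothetical helper: transpose a 2-D nested array.
function transpose2d(m) {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Hypothetical reference for gemm on nested arrays (not the API itself).
function gemmReference(a, b,
    { c = 0, alpha = 1, beta = 1, aTranspose = false, bTranspose = false } = {}) {
  if (aTranspose) a = transpose2d(a);
  if (bTranspose) b = transpose2d(b);
  const M = a.length, K = a[0].length, N = b[0].length;
  if (b.length !== K) throw new TypeError('inner dimensions do not match');
  // Read c with unidirectional broadcasting: scalar, [N], [1, N], [M, 1] or [M, N].
  const cAt = (i, j) => {
    if (typeof c === 'number') return c;
    if (typeof c[0] === 'number') return c[j];
    const row = c.length === 1 ? c[0] : c[i];
    return row.length === 1 ? row[0] : row[j];
  };
  const out = [];
  for (let i = 0; i < M; ++i) {
    out.push([]);
    for (let j = 0; j < N; ++j) {
      let sum = 0;
      for (let k = 0; k < K; ++k) sum += a[i][k] * b[k][j];
      out[i].push(alpha * sum + beta * cAt(i, j));
    }
  }
  return out;
}

console.log(gemmReference([[1, 2], [3, 4]], [[5, 6], [7, 8]]));
// [[19, 22], [43, 50]]
console.log(gemmReference([[1, 2], [3, 4]], [[5, 6], [7, 8]], { alpha: 2, beta: 3, c: [1, 1] }));
// [[41, 47], [89, 103]]
```

When c is omitted the sketch behaves as if c were the scalar 0, as described for the c member.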
7.8.20. gru
Gated Recurrent Unit [GRU] recurrent network uses an update, reset, and new gate to compute the output state that rolls into the output across the temporal sequence of the network.
enum MLGruWeightLayout {
  "zrn",  // update-reset-new gate ordering
  "rzn"   // reset-update-new gate ordering
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLGruWeightLayout layout = "zrn";
  sequence<MLActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input,
                          MLOperand weight,
                          MLOperand recurrentWeight,
                          [EnforceRange] unsigned long steps,
                          [EnforceRange] unsigned long hiddenSize,
                          optional MLGruOptions options = {});
};
MLGruOptions has the following members:
bias, of type MLOperand
- The 2-D input bias tensor of shape [numDirections, 3 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the layout argument.
recurrentBias, of type MLOperand
- The 2-D recurrent bias tensor of shape [numDirections, 3 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the layout argument.
initialHiddenState, of type MLOperand
- The 3-D initial hidden state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations SHOULD use a tensor filled with zero.
resetAfter, of type boolean, defaulting to true
- Indicates whether to apply the reset gate after or before matrix multiplication.
returnSequence, of type boolean, defaulting to false
- Indicates whether to also return the entire sequence, containing the output of every time step, in addition to the output of the last time step.
direction, of type MLRecurrentNetworkDirection, defaulting to "forward"
- The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
layout, of type MLGruWeightLayout, defaulting to "zrn"
- The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the second dimension of the weight and bias tensor shape.
activations, of type sequence<MLActivation>
- Specifies a pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, implementations SHOULD use the pair of sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") functions, respectively.

- input: an MLOperand. The input 3-D tensor of shape [steps, batchSize, inputSize].
- weight: an MLOperand. The 3-D input weight tensor of shape [numDirections, 3 * hiddenSize, inputSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 3-D recurrent weight tensor of shape [numDirections, 3 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- steps: an unsigned long scalar. The number of time steps in the recurrent network. The value must be greater than 0.
- hiddenSize: an unsigned long scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruOptions. The optional parameters of the operation.
Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the cell output from the last time step of the network. Additionally, if options.returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, numDirections, batchSize, hiddenSize] containing the cell output from every time step in the temporal sequence.
The gru(input, weight, recurrentWeight, steps, hiddenSize, options) method steps are:
- If validating operand with this and any of input, weight, recurrentWeight, options.bias (if it exists), options.recurrentBias (if it exists), and options.(if it exists) returns false, then throw a TypeError.
- If options.activations exists, and validating activation with this and any item in it returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- If the dataType of either weight or recurrentWeight is not equal to input’s dataType, then throw a TypeError.
- If input’s shape[0] is not equal to steps, then throw a TypeError.
- Let batchSize be input’s shape[1].
- Let inputSize be input’s shape[2].
- Let numDirections be 2 if options.direction is "both", or 1 otherwise.
- If weight’s shape is not equal to « numDirections, 3 * hiddenSize, inputSize », then throw a TypeError.
- If recurrentWeight’s shape is not equal to « numDirections, 3 * hiddenSize, hiddenSize », then throw a TypeError.
- If hiddenSize * 6 is not a valid dimension, then throw a TypeError.
  Why hiddenSize * 6? Some underlying platforms operate on a single bias tensor which is a concatenation of bias and recurrentBias. Therefore, 3 * hiddenSize + 3 * hiddenSize must also be a valid dimension.
- If options.recurrentBias exists:
- If options. exists:
- If options.activations exists and its size is not 2, then throw a TypeError.
- If options.activations exists:
  - Let gateDescriptor be a new MLOperandDescriptor.
  - Set gateDescriptor.dimensions to « batchSize, hiddenSize ».
  - If running the validation steps of any item in options.activations with gateDescriptor returns false, then throw a TypeError.
- Calculate the output shape:
  - Let desc0 be a new MLOperandDescriptor.
  - Set desc0.dimensions to the list « numDirections, batchSize, hiddenSize ».
  - If options.returnSequence is true:
    - Let desc1 be a new MLOperandDescriptor.
    - Set desc1.dimensions to the list « steps, numDirections, batchSize, hiddenSize ».
- Make graph connections:
  - Let operator be an operator for the "gru" operation, given weight, recurrentWeight, steps, hiddenSize and options as parameters.
  - Let output0 be the result of creating an MLOperand given this and desc0.
  - If options.returnSequence is true:
    - Let output1 be the result of creating an MLOperand given this and desc1.
    - Let output be the list « output0, output1 ».
    - Set output0.[[operator]] and output1.[[operator]] to operator.
  - Otherwise:
    - Let output be the list « output0 ».
    - Set output0.[[operator]] to operator.
  - Set operator’s inputs to input, weight, and recurrentWeight.
  - If options.recurrentBias exists, then add it to operator’s inputs.
  - If options.activations exists, then add its items to operator’s activation functions.
  - Set operator’s output to output.
- Return output.
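The shape relationships that the gru arguments and outputs must satisfy can be summarized in a short sketch. gruShapes is a hypothetical helper, not part of the API; it only restates the shapes given in the argument and return descriptions so that callers can check their tensors before building the graph.

```javascript
// Hypothetical helper: expected tensor shapes for a gru() call,
// derived from the input shape [steps, batchSize, inputSize].
function gruShapes(inputShape, hiddenSize,
    { direction = 'forward', returnSequence = false } = {}) {
  const [steps, batchSize, inputSize] = inputShape;
  const numDirections = direction === 'both' ? 2 : 1;
  const shapes = {
    weight: [numDirections, 3 * hiddenSize, inputSize],
    recurrentWeight: [numDirections, 3 * hiddenSize, hiddenSize],
    bias: [numDirections, 3 * hiddenSize],
    recurrentBias: [numDirections, 3 * hiddenSize],
    initialHiddenState: [numDirections, batchSize, hiddenSize],
    // The first output is always the cell output of the last time step.
    outputs: [[numDirections, batchSize, hiddenSize]],
  };
  if (returnSequence) {
    // The second output holds the cell output of every time step.
    shapes.outputs.push([steps, numDirections, batchSize, hiddenSize]);
  }
  return shapes;
}

console.log(gruShapes([5, 2, 8], 4, { direction: 'both', returnSequence: true }));
```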
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
// Drop the leading dimension (of size 1) of an operand.
function squeeze(builder, op) {
  return builder.reshape(op, op.shape().slice(1));
}

const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;

if (!hiddenState) {
  // Zero-filled initial hidden state covering every entry of the batch.
  const desc = {
    dataType: 'float32',
    dimensions: [numDirections, batchSize, hiddenSize]
  };
  const totalSize = numDirections * batchSize * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

let sequence = null;
let currentWeight = [];
let currentRecurrentWeight = [];
let currentBias = [];
let currentRecurrentBias = [];

// Split the weights and biases per direction.
for (let dir = 0; dir < numDirections; ++dir) {
  currentWeight.push(squeeze(builder,
    builder.slice(weight, [dir, 0, 0], [1, 3 * hiddenSize, inputSize])));
  currentRecurrentWeight.push(squeeze(builder,
    builder.slice(recurrentWeight, [dir, 0, 0], [1, 3 * hiddenSize, hiddenSize])));
  currentBias.push(options.bias ?
    squeeze(builder, builder.slice(options.bias, [dir, 0], [1, 3 * hiddenSize])) : null);
  currentRecurrentBias.push(options.recurrentBias ?
    squeeze(builder, builder.slice(options.recurrentBias, [dir, 0], [1, 3 * hiddenSize])) : null);
}

for (let step = 0; step < steps; ++step) {
  let currentHidden = [];
  let currentOutput = null;

  for (let dir = 0; dir < numDirections; ++dir) {
    currentHidden.push(squeeze(builder,
      builder.slice(hiddenState, [dir, 0, 0], [1, batchSize, hiddenSize])));
  }

  for (let dir = 0; dir < numDirections; ++dir) {
    // The backward direction walks the input sequence from the last step.
    let slice = (dir == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let currentInput = squeeze(builder,
      builder.slice(input, [slice, 0, 0], [1, batchSize, inputSize]));

    let result = builder.reshape(
      builder.gruCell(
        currentInput, currentWeight[dir], currentRecurrentWeight[dir],
        currentHidden[dir], hiddenSize, {
          bias: currentBias[dir],
          recurrentBias: currentRecurrentBias[dir],
          resetAfter: options.resetAfter,
          layout: options.layout,
          activations: options.activations
        }),
      [1, batchSize, hiddenSize]);

    currentOutput = (currentOutput ? builder.concat([currentOutput, result], 0) : result);
  }

  hiddenState = currentOutput;

  if (options.returnSequence) {
    currentOutput = builder.reshape(currentOutput, [1, numDirections, batchSize, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, currentOutput], 0) : currentOutput);
  }
}

return (sequence ? [hiddenState, sequence] : [hiddenState]);
7.8.21. gruCell
A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
dictionary MLGruCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLGruWeightLayout layout = "zrn";
  sequence<MLActivation> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input,
                    MLOperand weight,
                    MLOperand recurrentWeight,
                    MLOperand hiddenState,
                    [EnforceRange] unsigned long hiddenSize,
                    optional MLGruCellOptions options = {});
};
MLGruCellOptions has the following members:
bias, of type MLOperand
- The 1-D input bias tensor of shape [3 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the layout argument.
recurrentBias, of type MLOperand
- The 1-D recurrent bias tensor of shape [3 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the layout argument.
resetAfter, of type boolean, defaulting to true
- Indicates whether to apply the reset gate after or before matrix multiplication.
layout, of type MLGruWeightLayout, defaulting to "zrn"
- The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the first dimension of the weight and bias tensor shape.
activations, of type sequence<MLActivation>
- Specifies a pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, implementations SHOULD use the pair of sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") functions, respectively.

- input: an MLOperand. The input 2-D tensor of shape [batchSize, inputSize].
- weight: an MLOperand. The 2-D input weight tensor of shape [3 * hiddenSize, inputSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [3 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batchSize, hiddenSize].
- hiddenSize: an unsigned long scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruCellOptions. The optional parameters of the operation.
Returns: an MLOperand. The 2-D tensor of shape [batchSize, hiddenSize], the cell output hidden state of a single time step of the recurrent network.
The gruCell(input, weight, recurrentWeight, hiddenState, hiddenSize, options) method steps are:
- If validating operand with this and any of input, weight, recurrentWeight, hiddenState, options.bias (if it exists), and options.recurrentBias (if it exists) returns false, then throw a TypeError.
- If options.activations exists, and validating activation with this and any item in it returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Let batchSize be input’s shape[0].
- Let inputSize be input’s shape[1].
- If the dataType of any of weight, recurrentWeight, or hiddenState is not equal to input’s dataType, then throw a TypeError.
- If weight’s shape is not equal to « 3 * hiddenSize, inputSize », then throw a TypeError.
- If recurrentWeight’s shape is not equal to « 3 * hiddenSize, hiddenSize », then throw a TypeError.
- If hiddenState’s shape is not equal to « batchSize, hiddenSize », then throw a TypeError.
- If hiddenSize * 6 is not a valid dimension, then throw a TypeError.
  Why hiddenSize * 6? Some underlying platforms operate on a single bias tensor which is a concatenation of bias and recurrentBias. Therefore, 3 * hiddenSize + 3 * hiddenSize must also be a valid dimension.
- If options.recurrentBias exists:
- If options.activations exists and its size is not 2, then throw a TypeError.
- Let desc be a new MLOperandDescriptor.
- Set desc.dimensions to the list « batchSize, hiddenSize ».
- If options.activations exists, and running the validation steps of any item in it with desc returns false, then throw a TypeError.
- Make graph connections:
  - Let output be the result of creating an MLOperand given this and desc.
  - Let operator be an operator for the "gruCell" operation, given weight, recurrentWeight, hiddenState, hiddenSize and options as parameters.
  - Set output.[[operator]] to operator.
  - Set operator’s inputs to input, weight, recurrentWeight, and hiddenState.
  - If options.recurrentBias exists, then add it to operator’s inputs.
  - If options.activations exists, then add its items to operator’s activation functions.
  - Set operator’s output to output.
- Return output.
The behavior of this operation when the weight layout is the default "zrn" layout, and the activation functions of the update/reset gate and new gate are sigmoid() and tanh() respectively, can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
const one = builder.constant(1);
const zero = builder.constant(0);

// update gate (z)
let z = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, inputSize]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, hiddenSize]))
      )
    )
  )
);

// reset gate (r)
let r = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, inputSize]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, hiddenSize]))
      )
    )
  )
);

// new gate (n)
let n;
if (options.resetAfter) {
  // The reset gate is applied after the recurrent matrix multiplication.
  n = builder.tanh(
    builder.add(
      (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, inputSize]))
        ),
        builder.mul(
          r,
          builder.add(
            (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, hiddenSize]))
            )
          )
        )
      )
    )
  );
} else {
  // The reset gate is applied to the hidden state before the matrix multiplication.
  n = builder.tanh(
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
      ),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, inputSize]))
        ),
        builder.matmul(
          builder.mul(r, hiddenState),
          builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, hiddenSize]))
        )
      )
    )
  );
}

// compute the new hidden state
return builder.add(builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));
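The same arithmetic can be followed on plain numbers. The sketch below computes one GRU cell step for a single batch row, assuming the default "zrn" layout with sigmoid and tanh activations; gruCellRow and its helpers are hypothetical names, and missing biases are treated as zero exactly as in the decomposition above.

```javascript
const sigmoid = (v) => 1 / (1 + Math.exp(-v));
// Dot product of one weight row with a vector.
const dot = (row, v) => row.reduce((s, w, k) => s + w * v[k], 0);

// Hypothetical reference: x has inputSize entries, h has hiddenSize entries,
// W is [3 * hiddenSize][inputSize], R is [3 * hiddenSize][hiddenSize].
function gruCellRow(x, h, W, R, { bias, recurrentBias, resetAfter = true } = {}) {
  const H = h.length;
  const b = bias ?? new Array(3 * H).fill(0);
  const rb = recurrentBias ?? new Array(3 * H).fill(0);
  const z = [];
  const r = [];
  for (let j = 0; j < H; ++j) {
    // "zrn" layout: rows [0, H) are the update gate, rows [H, 2H) the reset gate.
    z.push(sigmoid(b[j] + rb[j] + dot(W[j], x) + dot(R[j], h)));
    r.push(sigmoid(b[H + j] + rb[H + j] + dot(W[H + j], x) + dot(R[H + j], h)));
  }
  const resetHidden = h.map((v, k) => r[k] * v);
  const out = [];
  for (let j = 0; j < H; ++j) {
    const row = 2 * H + j; // rows [2H, 3H) are the new gate
    const n = resetAfter
      ? Math.tanh(b[row] + dot(W[row], x) + r[j] * (rb[row] + dot(R[row], h)))
      : Math.tanh(b[row] + rb[row] + dot(W[row], x) + dot(R[row], resetHidden));
    out.push(z[j] * h[j] + n * (1 - z[j]));
  }
  return out;
}

// With all-zero weights, z = 0.5 and n = 0, so the hidden state is halved.
const zeros = (rows, cols) => Array.from({ length: rows }, () => new Array(cols).fill(0));
console.log(gruCellRow([1], [2, -4], zeros(6, 1), zeros(6, 2))); // [1, -2]
```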
7.8.22. hardSigmoid
Calculate the non-smooth hard sigmoid function on the input tensor, used instead of the sigmoid function for faster computation.
dictionary MLHardSigmoidOptions {
  float alpha = 0.2;
  float beta = 0.5;
};

partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand input, optional MLHardSigmoidOptions options = {});
  MLActivation hardSigmoid(optional MLHardSigmoidOptions options = {});
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.max(
  builder.min(
    builder.add(
      builder.mul(builder.constant(options.alpha), x),
      builder.constant(options.beta)),
    builder.constant(1)),
  builder.constant(0));
MLHardSigmoidOptions has the following members:
alpha, of type float, defaulting to 0.2
- A scalar multiplier.
beta, of type float, defaulting to 0.5
- A scalar addition.
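On plain numbers, the decomposition reduces to clamping alpha * x + beta to the range [0, 1]. The following sketch uses a hypothetical function name and the default option values to show the effect element by element.

```javascript
// Hypothetical scalar version of hard sigmoid: max(min(alpha * x + beta, 1), 0).
function hardSigmoidScalar(x, { alpha = 0.2, beta = 0.5 } = {}) {
  return Math.max(Math.min(alpha * x + beta, 1), 0);
}

// Values at or below -2.5 saturate to 0 and values at or above 2.5 saturate to 1
// when the defaults alpha = 0.2 and beta = 0.5 are used.
console.log([-10, 0, 10].map((x) => hardSigmoidScalar(x))); // [0, 0.5, 1]
```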
7.8.22.1. hardSigmoid(input, options)
- input: an MLOperand. The input tensor.
- options: an optional MLHardSigmoidOptions. The optional parameters of the operation.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The hardSigmoid(input, options) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "hardSigmoid" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.22.2. hardSigmoid(options)
- options: an optional MLHardSigmoidOptions. The optional parameters of the operation.
Returns:
- an MLActivation. The activation function representing the hard sigmoid operation.
The hardSigmoid(options) method steps are:
- Let op be the result of creating an MLActivation given this, "hardSigmoid" and options.
- Return op.
7.8.23. hardSwish
Computes the nonlinear function y = x * max(0, min(6, (x + 3))) / 6 that is introduced by [MobileNetV3] on the input tensor element-wise.
partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand input);
  MLActivation hardSwish();
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.div(
  builder.mul(
    x,
    builder.max(
      builder.constant(0),
      builder.min(builder.constant(6), builder.add(x, builder.constant(3))))),
  builder.constant(6));
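A scalar version of the same formula makes the three regions of the function easy to see. The function name below is hypothetical and the sketch works on plain numbers rather than operands.

```javascript
// Hypothetical scalar version of hard swish: x * max(0, min(6, x + 3)) / 6.
function hardSwishScalar(x) {
  return x * Math.max(0, Math.min(6, x + 3)) / 6;
}

// For x <= -3 the result is zero, for x >= 3 it equals x,
// and in between it follows x * (x + 3) / 6.
console.log([-4, 0, 1.5, 3, 5].map(hardSwishScalar)); // [-0, 0, 1.125, 3, 5]
```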
7.8.23.1. hardSwish(input)
- input: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The hardSwish(input) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "hardSwish" operation.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.23.2. hardSwish()
- None.
Returns:
- an MLActivation. The activation function representing the hard-swish operation.
The hardSwish() method steps are:
- Let op be the result of creating an MLActivation given this and "hardSwish".
- Return op.
7.8.24. instanceNormalization
Normalize the input using [Instance-Normalization]. Unlike batchNormalization() where the mean and variance values used in the normalization are computed across all the samples in the batch dimension while the model is trained, the mean and variance values used in the instance normalization are computed on the fly for each input feature of each individual sample in the batch.
dictionary MLInstanceNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  float epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};
MLInstanceNormalizationOptions has the following members:
scale, of type MLOperand
- The 1-D tensor of the scaling values whose size is equal to the number of channels, i.e. the size of the feature dimension of the input. For example, for an input tensor with "nchw" layout, the size is equal to input’s shape[1].
bias, of type MLOperand
- The 1-D tensor of the bias values whose size is equal to the size of the feature dimension of the input. For example, for an input tensor with "nchw" layout, the size is equal to input’s shape[1].
epsilon, of type float, defaulting to 1e-5
- A small value to prevent computational error due to divide-by-zero.
layout, of type MLInputOperandLayout, defaulting to "nchw"
- The layout format of the input.

- input: an MLOperand. The input 4-D tensor.
- options: an optional MLInstanceNormalizationOptions. The optional parameters of the operation.
Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as input.
The instanceNormalization(input, options) method steps are:
- If validating operand with this and any of input, options.scale (if it exists), and options.bias (if it exists) returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "instanceNormalization" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
The behavior of this operation when the input tensor is 4-D of the "nchw" layout can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
// The reduction of the mean and variance values happens over the spatial dimensions of the input
// e.g. axis 2 and 3 of the input tensor.
const reduceOptions = { axes: [2, 3], keepDimensions: true };
const mean = builder.reduceMean(input, reduceOptions);
const variance = builder.reduceMean(
  builder.pow(builder.sub(input, mean), builder.constant(2)),
  reduceOptions);

// The scale and bias values are applied per input feature
// e.g. axis 1 of the input tensor.
const shape = [1, input.shape()[1], 1, 1];
return builder.add(
  builder.mul(
    builder.reshape(options.scale, shape),
    builder.div(
      builder.sub(input, mean),
      // epsilon is a plain number, so it is wrapped in a constant operand.
      builder.sqrt(builder.add(variance, builder.constant(options.epsilon)))
    )
  ),
  builder.reshape(options.bias, shape)
);
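The per-sample, per-channel statistics can also be illustrated on nested arrays in "nchw" order. This is a sketch under stated assumptions: instanceNormReference is a hypothetical helper, and treating a missing scale as 1 and a missing bias as 0 is an assumption of the sketch rather than something this section states.

```javascript
// Hypothetical reference on a nested array shaped [N][C][H][W].
function instanceNormReference(input, { scale, bias, epsilon = 1e-5 } = {}) {
  return input.map((sample) =>
    sample.map((channel, c) => {
      // Mean and variance are computed over the spatial values of one channel
      // of one sample only.
      const values = channel.flat();
      const mean = values.reduce((s, v) => s + v, 0) / values.length;
      const variance =
        values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
      const g = scale ? scale[c] : 1; // assumption: default scale of 1
      const b = bias ? bias[c] : 0;   // assumption: default bias of 0
      return channel.map((row) =>
        row.map((v) => g * (v - mean) / Math.sqrt(variance + epsilon) + b));
    }));
}

// One sample, one channel, spatial size 1x2: mean 2, variance 1.
console.log(instanceNormReference([[[[1, 3]]]]));
```

Because the statistics never mix samples or channels, scaling every value of one channel by a constant leaves that channel's normalized output almost unchanged (up to the effect of epsilon).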
7.8.25. layerNormalization
Normalize the input using [Layer-Normalization]. Unlike batchNormalization(), where the mean and variance values are computed across all the samples in the batch dimension while the model is trained, and instanceNormalization(), where the mean and variance values are computed on the fly for each input feature of each individual sample in the batch, the mean and variance values of the layer normalization are computed on the fly across all the input features of each individual sample in the batch.
dictionary MLLayerNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  sequence<[EnforceRange] unsigned long> axes;
  float epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand layerNormalization(MLOperand input,
                               optional MLLayerNormalizationOptions options = {});
};
MLLayerNormalizationOptions has the following members:
scale, of type MLOperand
- The N-D tensor of the scaling values whose shape is determined by the axes member: each value in axes indicates a dimension of the input tensor that carries scaling values. For example, for an axes value of [1,2,3], the shape of this tensor is the list of the sizes of input dimensions 1, 2 and 3. When this member is not present, the scaling value is assumed to be 1.
bias, of type MLOperand
- The N-D tensor of the bias values whose shape is determined by the axes member: each value in axes indicates a dimension of the input tensor that carries bias values. For example, for an axes value of [1,2,3], the shape of this tensor is the list of the sizes of input dimensions 1, 2 and 3. When this member is not present, the bias value is assumed to be 0.
axes, of type sequence<[EnforceRange] unsigned long>
- The indices of the input dimensions to reduce. When this member is not present, it is treated as if all dimensions except the first were given (e.g. for a 4-D input tensor, axes = [1,2,3]). That is, the mean and variance values are calculated across all the input features of each individual sample in the batch. If empty, no dimensions are reduced.
epsilon, of type float, defaulting to 1e-5
- A small value to prevent computational error due to divide-by-zero.
- input: an MLOperand. The input N-D tensor.
- options: an optional MLLayerNormalizationOptions. The optional parameters of the operation.
Returns: an MLOperand. The layer-normalized N-D tensor of the same shape as input.
The layerNormalization(input, options) method steps are:
- If validating operand with this and any of input, options.scale (if it exists), and options.bias (if it exists) returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- If options.axes does not exist, then set options.axes to a new list, either equal to the range from 1 to input’s rank, exclusive, if input’s rank is greater than 1, or an empty list otherwise.
- Otherwise, if options.axes contains duplicate values, or if any of its elements is not in the range 0 to input’s rank, exclusive, then return failure.
- For each index in the range 0 to options.axes’s size, exclusive:
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "layerNormalization" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
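The axes handling in the steps above can be sketched in plain JavaScript as follows. This is only an illustration: resolveLayerNormAxes is a hypothetical helper rather than part of the API, and signalling invalid axes with a TypeError is an assumption made for the sketch (the steps above say "return failure").

```javascript
// Hypothetical helper mirroring the default and validation rules for axes.
function resolveLayerNormAxes(rank, axes) {
  if (axes === undefined) {
    // Default: every dimension except the first, or an empty list for rank <= 1.
    const result = [];
    for (let i = 1; i < rank; ++i) result.push(i);
    return result;
  }
  const seen = new Set();
  for (const axis of axes) {
    // Each axis must be a unique integer in the range [0, rank).
    if (!Number.isInteger(axis) || axis < 0 || axis >= rank || seen.has(axis)) {
      throw new TypeError(`invalid axis ${axis} for rank ${rank}`); // assumption: failure surfaces as TypeError
    }
    seen.add(axis);
  }
  return axes.slice();
}

const defaultAxesFor4d = resolveLayerNormAxes(4); // → [1, 2, 3]
const defaultAxesFor1d = resolveLayerNormAxes(1); // → []
```

An explicitly empty list is kept as is, which matches the rule that no dimensions are reduced in that case.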
The behavior of this operation when the axes parameter is set to [1,2,3] can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
    // The reduction of the mean and variance values happens over the spatial dimensions
    // across all the input features (i.e. all channels) of the input tensor.
    const reduceOptions = { axes: [1, 2, 3], keepDimensions: true };
    const mean = builder.reduceMean(input, reduceOptions);
    const variance = builder.reduceMean(
      builder.pow(builder.sub(input, mean), builder.constant(2)),
      reduceOptions);

    // The scale and bias tensors are of the shape of the input dimensions specified
    // by the values in the axes parameter (i.e. [1,2,3]).
    return builder.add(
      builder.mul(
        options.scale,
        builder.div(
          builder.sub(input, mean),
          // epsilon is a plain number, so it is wrapped in a constant operand.
          builder.sqrt(builder.add(variance, builder.constant(options.epsilon)))
        )
      ),
      options.bias
    );
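To make the arithmetic of the decomposition concrete, here is a small plain-JavaScript sketch for the simplest case: a 2-D input of shape [batch, features] with axes = [1]. It is independent of the WebNN API, and layerNorm2d is a hypothetical name used only for this example.

```javascript
// Illustrative sketch only: layerNorm2d is not a WebNN method.
// `rows` is an array of samples, each an array of feature values.
function layerNorm2d(rows, scale, bias, epsilon = 1e-5) {
  return rows.map((row) => {
    // Mean and variance are computed across all features of one sample.
    const mean = row.reduce((sum, v) => sum + v, 0) / row.length;
    const variance = row.reduce((sum, v) => sum + (v - mean) ** 2, 0) / row.length;
    const denom = Math.sqrt(variance + epsilon);
    return row.map((v, j) => {
      const s = scale ? scale[j] : 1; // an absent scale is treated as 1
      const b = bias ? bias[j] : 0;   // an absent bias is treated as 0
      return s * (v - mean) / denom + b;
    });
  });
}

const layerNormSample = layerNorm2d([[1, 2, 3], [10, 10, 10]]);
```

The second sample has zero variance; epsilon keeps the denominator non-zero, so its normalized values are all 0 rather than NaN.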
7.8.26. leakyRelu
Calculate the leaky version of rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * min(0, x).
dictionary MLLeakyReluOptions {
  float alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand input, optional MLLeakyReluOptions options = {});
  MLActivation leakyRelu(optional MLLeakyReluOptions options = {});
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
    return builder.add(
      builder.max(builder.constant(0), x),
      builder.mul(builder.constant(options.alpha), builder.min(builder.constant(0), x)));
MLLeakyReluOptions has the following members:
alpha, of type float, defaulting to 0.01
- A scalar multiplier.
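The expression max(0, x) + alpha * min(0, x) can be checked numerically with a short plain-JavaScript sketch. The helper name leakyReluValues is hypothetical and the code does not touch the WebNN API.

```javascript
// Illustrative sketch only: applies the leaky relu expression element-wise
// to a plain array of numbers.
function leakyReluValues(values, alpha = 0.01) {
  return values.map((x) => Math.max(0, x) + alpha * Math.min(0, x));
}

const leakyReluSample = leakyReluValues([-2, 0, 3], 0.1);
// Negative inputs are scaled by alpha; non-negative inputs pass through.
```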
7.8.26.1. leakyRelu(input, options)
- input: an MLOperand. The input tensor.
- options: an optional MLLeakyReluOptions. The optional parameters of the operation.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The leakyRelu(input, options) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "leakyRelu" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.26.2. leakyRelu(options)
- options: an optional MLLeakyReluOptions. The optional parameters of the operation.
Returns:
- an MLActivation. The activation function representing the leaky relu operation.
The leakyRelu(options) method steps are:
- Let op be the result of creating an MLActivation given this, "leakyRelu" and options.
- Return op.
7.8.27. linear
Calculate a linear function y = alpha * x + beta on the input tensor.
dictionary MLLinearOptions {
  float alpha = 1;
  float beta = 0;
};

partial interface MLGraphBuilder {
  MLOperand linear(MLOperand input, optional MLLinearOptions options = {});
  MLActivation linear(optional MLLinearOptions options = {});
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
    return builder.add(
      builder.mul(x, builder.constant(options.alpha)),
      builder.constant(options.beta));
MLLinearOptions has the following members:
alpha, of type float, defaulting to 1
- A scalar multiplier.
beta, of type float, defaulting to 0
- A scalar addition.
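For reference, the effect of alpha and beta on individual values can be sketched in plain JavaScript. The helper linearValues is a hypothetical name and is not part of the API.

```javascript
// Illustrative sketch only: y = alpha * x + beta applied element-wise.
function linearValues(values, alpha = 1, beta = 0) {
  return values.map((x) => alpha * x + beta);
}

const linearSample = linearValues([-1, 0, 2], 2, 1); // → [-1, 1, 5]
```

With the default alpha = 1 and beta = 0 the operation is the identity.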
7.8.27.1. linear(input, options)
- input: an MLOperand. The input tensor.
- options: an optional MLLinearOptions. The optional parameters of the operation.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The linear(input, options) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "linear" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.27.2. linear(options)
- options: an optional MLLinearOptions. The optional parameters of the operation.
Returns:
- an MLActivation. The activation function representing the linear operation.
The linear(options) method steps are:
- Let op be the result of creating an MLActivation given this, "linear" and options.
- Return op.
7.8.28. lstm
Long Short-Term Memory [LSTM] recurrent network uses an input, output, forget, and cell gate to compute the output state that rolls into the output across the temporal sequence of the network.
enum MLLstmWeightLayout {
  "iofg", // input-output-forget-cell gate ordering
  "ifgo"  // input-forget-cell-output gate ordering
};

dictionary MLLstmOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLOperand initialHiddenState;
  MLOperand initialCellState;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLLstmWeightLayout layout = "iofg";
  sequence<MLActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstm(MLOperand input,
                           MLOperand weight,
                           MLOperand recurrentWeight,
                           [EnforceRange] unsigned long steps,
                           [EnforceRange] unsigned long hiddenSize,
                           optional MLLstmOptions options = {});
};
MLLstmOptions has the following members:
bias, of type MLOperand
- The 2-D input bias tensor of shape [numDirections, 4 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.
recurrentBias, of type MLOperand
- The 2-D recurrent bias tensor of shape [numDirections, 4 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.
peepholeWeight, of type MLOperand
- The 2-D weight tensor for peepholes of shape [numDirections, 3 * hiddenSize]. The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate, respectively.
initialHiddenState, of type MLOperand
- The 3-D initial hidden state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations SHOULD use a tensor filled with zero.
initialCellState, of type MLOperand
- The 3-D initial cell state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations SHOULD use a tensor filled with zero.
returnSequence, of type boolean, defaulting to false
- Indicates whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step.
direction, of type MLRecurrentNetworkDirection, defaulting to "forward"
- The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
layout, of type MLLstmWeightLayout, defaulting to "iofg"
- The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i), output (o), forget (f), and cell (g) gate, as indicated in the second dimension of the weight and bias tensor shapes.
activations, of type sequence<MLActivation>
- A list of three activation functions: the first one is used for the input (i), forget (f), and output (o) gate, the second one is used for the cell (g) gate, and the last is used for filtering the output cell state before combining it with the result of the output gate to form the output hidden state. When not specified, implementations SHOULD use the sequence of the sigmoid function ("sigmoid") followed by two hyperbolic tangent functions ("tanh") respectively.
- input: an MLOperand. The input 3-D tensor of shape [steps, batchSize, inputSize].
- weight: an MLOperand. The 3-D input weight tensor of shape [numDirections, 4 * hiddenSize, inputSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 3-D recurrent weight tensor of shape [numDirections, 4 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- steps: an unsigned long scalar. The number of time steps in the recurrent network. The value must be greater than 0.
- hiddenSize: an unsigned long scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLLstmOptions. The optional parameters of the operation.
Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the output hidden state from the last time step of the network. The second element is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the output cell state from the last time step of the network. Additionally, if options.returnSequence is set to true, the third element is the 4-D output tensor of shape [steps, numDirections, batchSize, hiddenSize] containing every output from each time step in the temporal sequence.
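The shapes of the returned operands follow directly from the description above. The sketch below computes them in plain JavaScript; lstmOutputShapes and its options object are hypothetical names introduced only for illustration.

```javascript
// Illustrative sketch only: derives the shapes of the operands returned by
// lstm() from steps, batchSize, hiddenSize, direction and returnSequence.
function lstmOutputShapes({ steps, batchSize, hiddenSize, direction = "forward", returnSequence = false }) {
  const numDirections = direction === "both" ? 2 : 1;
  const stateShape = [numDirections, batchSize, hiddenSize];
  // First element: last hidden state. Second element: last cell state.
  const shapes = [stateShape.slice(), stateShape.slice()];
  if (returnSequence) {
    // Optional third element: the hidden state of every time step.
    shapes.push([steps, numDirections, batchSize, hiddenSize]);
  }
  return shapes;
}

const lstmShapesSample = lstmOutputShapes({
  steps: 5, batchSize: 3, hiddenSize: 8, direction: "both", returnSequence: true,
});
// → [[2, 3, 8], [2, 3, 8], [5, 2, 3, 8]]
```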
The lstm(input, weight, recurrentWeight, steps, hiddenSize, options) method steps are:
- If validating operand with this and any of input, weight, recurrentWeight, options.bias (if it exists), options.recurrentBias (if it exists), options.peepholeWeight (if it exists), options.initialHiddenState (if it exists), and options.initialCellState (if it exists) returns false, then throw a TypeError.
- If options.activations exists, and validating activation with this and any item in it returns false, then throw a TypeError.
- Let numDirections be 2 if options.direction is "both", or 1 otherwise.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- If input’s shape[0] is not equal to steps, then throw a TypeError.
- If the dataType of either weight or recurrentWeight is not equal to input’s dataType, then throw a TypeError.
- Let batchSize be input’s shape[1].
- Let inputSize be input’s shape[2].
- If weight’s shape is not equal to « numDirections, 4 * hiddenSize, inputSize », then throw a TypeError.
- If recurrentWeight’s shape is not equal to « numDirections, 4 * hiddenSize, hiddenSize », then throw a TypeError.
- If hiddenSize * 8 is not a valid dimension, then throw a TypeError.
  Why hiddenSize * 8? Some underlying platforms operate on a single bias tensor which is a concatenation of bias and recurrentBias. Therefore, 4 * hiddenSize + 4 * hiddenSize must also be a valid dimension.
- If options.recurrentBias exists:
- If options.peepholeWeight exists:
- If options.initialHiddenState exists:
- If options.initialCellState exists:
- If options.activations exists:
  - Let gateDescriptor be a new MLOperandDescriptor.
  - Set gateDescriptor.dimensions to the list « batchSize, hiddenSize ».
  - If running the validation steps of any item in options.activations with gateDescriptor returns false, then throw a TypeError.
- Calculate the output shape:
  - Let desc be a new MLOperandDescriptor.
  - Set desc.dimensions to the list « numDirections, batchSize, hiddenSize ».
  - If options.returnSequence is true:
    - Let desc2 be a new MLOperandDescriptor.
    - Set desc2.dimensions to the list « steps, numDirections, batchSize, hiddenSize ».
- Make graph connections:
  - Let operator be an operator for the "lstm" operation, given weight, recurrentWeight, steps, hiddenSize and options.
  - Let output0 be the result of creating an MLOperand given this and desc.
  - Let output1 be the result of creating an MLOperand given this and desc.
  - If options.returnSequence is true:
    - Let output2 be the result of creating an MLOperand given this and desc2.
    - Let output be the list « output0, output1, output2 ».
    - Set output0.[[operator]], output1.[[operator]] and output2.[[operator]] to operator.
  - Otherwise:
    - Let output be the list « output0, output1 ».
    - Set output0.[[operator]] and output1.[[operator]] to operator.
  - Set operator’s inputs to input, weight, and recurrentWeight.
  - If options.recurrentBias exists, then add it to operator’s inputs.
  - If options.peepholeWeight exists, then add it to operator’s inputs.
  - If options.initialCellState exists, then add it to operator’s inputs.
  - If options.activations exists, then add its items to operator’s activation functions.
  - Set operator’s output to output.
- Return output.
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
    function squeeze(builder, op) {
      // Drop the leading dimension, which is always of size 1 here.
      return builder.reshape(op, op.shape().slice(1));
    }

    const numDirections = (options.direction == "both" ? 2 : 1);
    // batchSize and inputSize come from the input shape [steps, batchSize, inputSize].
    const batchSize = input.shape()[1];
    const inputSize = input.shape()[2];
    let hiddenState = options.initialHiddenState;
    let cellState = options.initialCellState;

    if (!hiddenState) {
      const desc = { dataType: 'float32', dimensions: [numDirections, batchSize, hiddenSize] };
      const totalSize = numDirections * batchSize * hiddenSize;
      hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
    }

    if (!cellState) {
      const desc = { dataType: 'float32', dimensions: [numDirections, batchSize, hiddenSize] };
      const totalSize = numDirections * batchSize * hiddenSize;
      cellState = builder.constant(desc, new Float32Array(totalSize).fill(0));
    }

    let sequence = null;
    let currentWeight = [];
    let currentRecurrentWeight = [];
    let currentBias = [];
    let currentRecurrentBias = [];
    let currentPeepholeWeight = [];

    // Split the per-direction weights and biases once, before the time loop.
    for (let dir = 0; dir < numDirections; ++dir) {
      currentWeight.push(squeeze(builder, builder.slice(weight, [dir, 0, 0], [1, 4 * hiddenSize, inputSize])));
      currentRecurrentWeight.push(squeeze(builder, builder.slice(recurrentWeight, [dir, 0, 0], [1, 4 * hiddenSize, hiddenSize])));
      currentBias.push(options.bias ? (squeeze(builder, builder.slice(options.bias, [dir, 0], [1, 4 * hiddenSize]))) : null);
      currentRecurrentBias.push(options.recurrentBias ? (squeeze(builder, builder.slice(options.recurrentBias, [dir, 0], [1, 4 * hiddenSize]))) : null);
      currentPeepholeWeight.push(options.peepholeWeight ? (squeeze(builder, builder.slice(options.peepholeWeight, [dir, 0], [1, 3 * hiddenSize]))) : null);
    }

    for (let step = 0; step < steps; ++step) {
      let currentHidden = [];
      let currentCell = [];
      let nextHidden = null;
      let nextCell = null;

      for (let dir = 0; dir < numDirections; ++dir) {
        currentHidden.push(squeeze(builder, builder.slice(hiddenState, [dir, 0, 0], [1, batchSize, hiddenSize])));
        currentCell.push(squeeze(builder, builder.slice(cellState, [dir, 0, 0], [1, batchSize, hiddenSize])));
      }

      for (let dir = 0; dir < numDirections; ++dir) {
        // The backward direction walks the input sequence from the last step.
        let slice = (dir == 1 || options.direction == "backward" ? steps - step - 1 : step);
        let currentInput = squeeze(builder, builder.slice(input, [slice, 0, 0], [1, batchSize, inputSize]));

        let results = builder.lstmCell(
          currentInput, currentWeight[dir], currentRecurrentWeight[dir],
          currentHidden[dir], currentCell[dir], hiddenSize,
          { bias: currentBias[dir], recurrentBias: currentRecurrentBias[dir],
            peepholeWeight: currentPeepholeWeight[dir], layout: options.layout,
            activations: options.activations });

        let output = builder.reshape(results[0], [1, batchSize, hiddenSize]);
        let cell = builder.reshape(results[1], [1, batchSize, hiddenSize]);

        nextHidden = (nextHidden ? builder.concat([nextHidden, output], 0) : output);
        nextCell = (nextCell ? builder.concat([nextCell, cell], 0) : cell);
      }

      hiddenState = nextHidden;
      cellState = nextCell;

      if (options.returnSequence) {
        nextHidden = builder.reshape(nextHidden, [1, numDirections, batchSize, hiddenSize]);
        sequence = (sequence ? builder.concat([sequence, nextHidden], 0) : nextHidden);
      }
    }

    return (sequence ? [hiddenState, cellState, sequence] : [hiddenState, cellState]);
7.8.29. lstmCell
A single time step of the Long Short-Term Memory [LSTM] recurrent network using a cell state, an input, output, and forget gate to compute the cell state and the hidden state of the next time step that rolls into the output across the temporal sequence of the network.
dictionary MLLstmCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLLstmWeightLayout layout = "iofg";
  sequence<MLActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstmCell(MLOperand input,
                               MLOperand weight,
                               MLOperand recurrentWeight,
                               MLOperand hiddenState,
                               MLOperand cellState,
                               [EnforceRange] unsigned long hiddenSize,
                               optional MLLstmCellOptions options = {});
};
MLLstmCellOptions has the following members:
bias, of type MLOperand
- The 1-D input bias tensor of shape [4 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the layout argument.
recurrentBias, of type MLOperand
- The 1-D recurrent bias tensor of shape [4 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the layout argument.
peepholeWeight, of type MLOperand
- The 1-D weight tensor for peepholes of shape [3 * hiddenSize]. The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate, respectively.
layout, of type MLLstmWeightLayout, defaulting to "iofg"
- The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i), output (o), forget (f), and cell (g) gate, as indicated in the first dimension of the weight and bias tensor shapes.
activations, of type sequence<MLActivation>
- A list of three activation functions: the first one is used for the input (i), forget (f), and output (o) gate, the second one is used for the cell (g) gate, and the last is used for filtering the output cell state before combining it with the result of the output gate to form the output hidden state. When not specified, they are assumed to be the sigmoid function ("sigmoid") followed by two hyperbolic tangent functions ("tanh") respectively.
- input: an MLOperand. The input 2-D tensor of shape [batchSize, inputSize].
- weight: an MLOperand. The 2-D input weight tensor of shape [4 * hiddenSize, inputSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [4 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batchSize, hiddenSize].
- cellState: an MLOperand. The 2-D input cell state tensor of shape [batchSize, hiddenSize].
- hiddenSize: an unsigned long scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLLstmCellOptions. The optional parameters of the operation.
Returns: a sequence of MLOperand. The first element of the sequence is the output hidden state of the current time step of the recurrent network. The following element is the output cell state. Both elements are 2-D tensors of shape [batchSize, hiddenSize].
The lstmCell(input, weight, recurrentWeight, hiddenState, cellState, hiddenSize, options) method steps are:
- If validating operand with this and any of input, weight, recurrentWeight, hiddenState, cellState, options.bias (if it exists), options.recurrentBias (if it exists), and options.peepholeWeight (if it exists) returns false, then throw a TypeError.
- If options.activations exists, and validating activation with this and any item in it returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- If the dataType of any of weight, recurrentWeight, hiddenState or cellState is not equal to input’s dataType, then throw a TypeError.
- Let batchSize be input’s shape[0].
- Let inputSize be input’s shape[1].
- If weight’s shape is not equal to « 4 * hiddenSize, inputSize », then throw a TypeError.
- If recurrentWeight’s shape is not equal to « 4 * hiddenSize, hiddenSize », then throw a TypeError.
- If hiddenState’s shape is not equal to « batchSize, hiddenSize », then throw a TypeError.
- If cellState’s shape is not equal to « batchSize, hiddenSize », then throw a TypeError.
- If hiddenSize * 8 is not a valid dimension, then throw a TypeError.
  Why hiddenSize * 8? Some underlying platforms operate on a single bias tensor which is a concatenation of bias and recurrentBias. Therefore, 4 * hiddenSize + 4 * hiddenSize must also be a valid dimension.
- If options.recurrentBias exists:
- If options.peepholeWeight exists:
- If options.activations exists:
- Let desc be a new MLOperandDescriptor.
- Set desc.dimensions to the list « batchSize, hiddenSize ».
- If options.activations exists, and running the validation steps of any item in it with desc returns false, then throw a TypeError.
- Make graph connections:
  - Let output0 be the result of creating an MLOperand given this and desc.
  - Let output1 be the result of creating an MLOperand given this and desc.
  - Let output be the list « output0, output1 ».
  - Let operator be an operator for the "lstmCell" operation, given weight, recurrentWeight, hiddenState, cellState, hiddenSize and options.
  - Set output0.[[operator]] and output1.[[operator]] to operator.
  - Set operator’s inputs to input, weight, recurrentWeight, hiddenState, and cellState.
  - If options.recurrentBias exists, then add it to operator’s inputs.
  - If options.peepholeWeight exists, then add it to operator’s inputs.
  - If options.activations exists, then add its items to operator’s activation functions.
  - Set operator’s output to output.
- Return output.
The behavior of this operation when the weight layout is the default "iofg" layout, and the activation functions of the input/forget/output gate and the cell gate/the cell state’s filter for the output hidden state are sigmoid() and tanh() respectively can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
    const zero = builder.constant(0);

    // input gate (i)
    let i = builder.sigmoid(
      builder.add(
        builder.mul(
          cellState,
          (options.peepholeWeight ? builder.slice(options.peepholeWeight, [0], [hiddenSize]) : zero)
        ),
        builder.add(
          builder.add(
            (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
            (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
          ),
          builder.add(
            builder.matmul(
              input,
              builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, inputSize]))
            ),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, hiddenSize]))
            )
          )
        )
      )
    );

    // forget gate (f)
    let f = builder.sigmoid(
      builder.add(
        builder.mul(
          cellState,
          (options.peepholeWeight ? builder.slice(options.peepholeWeight, [2 * hiddenSize], [hiddenSize]) : zero)
        ),
        builder.add(
          builder.add(
            (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
            (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
          ),
          builder.add(
            builder.matmul(
              input,
              builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, inputSize]))
            ),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, hiddenSize]))
            )
          )
        )
      )
    );

    // cell gate (g)
    let g = builder.tanh(
      builder.add(
        builder.add(
          (options.bias ? builder.slice(options.bias, [3 * hiddenSize], [hiddenSize]) : zero),
          (options.recurrentBias ? builder.slice(options.recurrentBias, [3 * hiddenSize], [hiddenSize]) : zero)
        ),
        builder.add(
          builder.matmul(
            input,
            builder.transpose(builder.slice(weight, [3 * hiddenSize, 0], [hiddenSize, inputSize]))
          ),
          builder.matmul(
            hiddenState,
            builder.transpose(builder.slice(recurrentWeight, [3 * hiddenSize, 0], [hiddenSize, hiddenSize]))
          )
        )
      )
    );

    // output gate (o)
    let o = builder.sigmoid(
      builder.add(
        builder.mul(
          cellState,
          (options.peepholeWeight ? builder.slice(options.peepholeWeight, [hiddenSize], [hiddenSize]) : zero)
        ),
        builder.add(
          builder.add(
            (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
            (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
          ),
          builder.add(
            builder.matmul(
              input,
              builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, inputSize]))
            ),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, hiddenSize]))
            )
          )
        )
      )
    );

    // output cell state (ct)
    let ct = builder.add(builder.mul(f, cellState), builder.mul(i, g));

    // output hidden state (ht)
    let ht = builder.mul(o, builder.tanh(ct));

    return [ht, ct];
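The gate arithmetic of the decomposition can be followed more easily in a scalar setting. The sketch below is a plain-JavaScript illustration for batchSize = inputSize = hiddenSize = 1 with the "iofg" ordering. It is not the WebNN API: lstmCellScalar and sigmoidScalar are hypothetical names, the two bias tensors are folded into a single bias array, and peephole weights are omitted for brevity.

```javascript
// Illustrative sketch only. weight, recurrentWeight and bias are 4-element
// arrays in "iofg" order: [input, output, forget, cell].
const sigmoidScalar = (v) => 1 / (1 + Math.exp(-v));

function lstmCellScalar(x, h, c, weight, recurrentWeight, bias = [0, 0, 0, 0]) {
  // Pre-activation of gate k: W[k] * x + R[k] * h + b[k].
  const pre = (k) => weight[k] * x + recurrentWeight[k] * h + bias[k];
  const i = sigmoidScalar(pre(0)); // input gate
  const o = sigmoidScalar(pre(1)); // output gate
  const f = sigmoidScalar(pre(2)); // forget gate
  const g = Math.tanh(pre(3));     // cell gate
  const ct = f * c + i * g;        // output cell state
  const ht = o * Math.tanh(ct);    // output hidden state
  return [ht, ct];
}

// With all-zero weights every sigmoid gate is 0.5 and the cell gate is 0,
// so the cell state is halved: ct = 0.5 * 2 = 1 and ht = 0.5 * tanh(1).
const [htZeroWeights, ctZeroWeights] = lstmCellScalar(1, 0.5, 2, [0, 0, 0, 0], [0, 0, 0, 0]);
```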
7.8.30. matmul
Compute the matrix product of two input tensors.
partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b);
};
- a: an MLOperand. The first input tensor which is at least 2-D.
- b: an MLOperand. The second input tensor which is at least 2-D.
Returns: an MLOperand. The output tensor that contains the matrix product of two input tensors.
- If both a and b are 2-dimensional, they are multiplied like conventional matrices and produce a 2-dimensional tensor as the output.
- If either a or b is N-dimensional where N > 2, it is treated as a stack of matrices with dimensions corresponding to the last two indices. The matrix multiplication is broadcast accordingly by following the [numpy-broadcasting-rule]. The output is an N-dimensional tensor whose rank is the maximum rank of the input tensors. For each dimension, except the last two, of the output tensor, its size is the maximum size along that dimension of the input tensors.
To calculate matmul output sizes, given MLOperand a and MLOperand b, run the following steps:
- Let shapeA be a clone of a’s shape.
- Let rankA be a’s rank.
- Let shapeB be a clone of b’s shape.
- Let rankB be b’s rank.
- If either rankA or rankB is less than 2, then throw a TypeError.
- Let colsA be shapeA[rankA - 1].
- Let rowsA be shapeA[rankA - 2].
- Let colsB be shapeB[rankB - 1].
- Let rowsB be shapeB[rankB - 2].
- If colsA is not equal to rowsB, then throw a TypeError.
- Let batchShapeA be a clone of shapeA with the spatial dimensions (last 2 items) removed.
- Let batchShapeB be a clone of shapeB with the spatial dimensions (last 2 items) removed.
- Let outputShape be the result of bidirectionally broadcasting the shapes batchShapeA and batchShapeB. If that returns failure, then throw a TypeError.
- Append « rowsA, colsB » to outputShape.
- Return outputShape.
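The output size calculation above can be sketched in plain JavaScript on shape arrays. The names broadcastShapes and matmulOutputShape are hypothetical; the broadcasting helper follows the usual NumPy-style rule (align trailing dimensions, a size of 1 stretches), and the check that the inner dimensions agree is included on the assumption that it applies, since it is the standard requirement for a matrix product.

```javascript
// Illustrative sketch only: shapes are plain arrays of non-negative integers.
function broadcastShapes(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const result = new Array(rank);
  for (let i = 0; i < rank; ++i) {
    // Align on trailing dimensions; a missing dimension behaves like size 1.
    const dimA = shapeA[shapeA.length - 1 - i] ?? 1;
    const dimB = shapeB[shapeB.length - 1 - i] ?? 1;
    if (dimA !== dimB && dimA !== 1 && dimB !== 1) {
      throw new TypeError(`cannot broadcast ${dimA} with ${dimB}`);
    }
    result[rank - 1 - i] = dimA === 1 ? dimB : dimA;
  }
  return result;
}

function matmulOutputShape(shapeA, shapeB) {
  if (shapeA.length < 2 || shapeB.length < 2) {
    throw new TypeError("both inputs must be at least 2-D");
  }
  const [rowsA, colsA] = shapeA.slice(-2);
  const [rowsB, colsB] = shapeB.slice(-2);
  if (colsA !== rowsB) {
    throw new TypeError(`inner dimensions differ: ${colsA} vs ${rowsB}`);
  }
  // Everything before the last two dimensions is treated as a batch shape.
  const batchShape = broadcastShapes(shapeA.slice(0, -2), shapeB.slice(0, -2));
  return [...batchShape, rowsA, colsB];
}

const matmulShape2d = matmulOutputShape([2, 3], [3, 4]);             // → [2, 4]
const matmulShapeBatched = matmulOutputShape([5, 1, 2, 3], [7, 3, 4]); // → [5, 7, 2, 4]
```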
The matmul(a, b) method steps are:
- If validating operand with this and any of a and b returns false, then throw a TypeError.
- If a’s dataType is not "float32" or "float16", then throw a TypeError.
- If b’s dataType is not equal to a’s dataType, then throw a TypeError.
- Let desc be a new MLOperandDescriptor.
- Set desc.dimensions to the result of calculating matmul output sizes given a and b.
- If that throws an error, re-throw the error.
- Make graph connections:
  - Let output be the result of creating an MLOperand given this and desc.
  - Let operator be an operator for the "matmul" operation.
  - Set output.[[operator]] to operator.
  - Set operator’s inputs to a and b.
  - Set operator’s output to output.
- Return output.
7.8.31. pad
Inflate the tensor with constant or mirrored values on the edges.
enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions {
  MLPaddingMode mode = "constant";
  float value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input,
                sequence<[EnforceRange] unsigned long> beginningPadding,
                sequence<[EnforceRange] unsigned long> endingPadding,
                optional MLPadOptions options = {});
};
MLPadOptions has the following members:
mode, of type MLPaddingMode, defaulting to "constant"
-
The different ways to pad the tensor.
value, of type float, defaulting to 0
-
The padding value when mode is set to "constant".
-
input: an MLOperand. The input tensor.
-
beginningPadding: a sequence of unsigned long. The number of padding values to add at the beginning of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, beginningPadding[d] indicates how many values to add before the content in that dimension.
-
endingPadding: a sequence of unsigned long. The number of padding values to add at the ending of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, endingPadding[d] indicates how many values to add after the content in that dimension.
-
options: an optional MLPadOptions. The optional parameters of the operation.
Returns: an MLOperand. The padded output tensor. Each dimension of the output tensor can be calculated as follows:
output size = beginning padding + input size + ending padding
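To calculate padding output sizes, given input, beginningPadding and endingPadding, run the following steps:
-
Let shape be a copy of input’s shape.
-
For each index in the range 0 to shape’s size, exclusive, add beginningPadding[index] and endingPadding[index] to shape[index].
-
Return shape.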
The pad(input, beginningPadding, endingPadding, options) method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
If beginningPadding’s size and endingPadding’s size are not both equal to input’s rank, then throw a TypeError.
-
Let desc be a copy of input.[[descriptor]].
-
Let outputShape be the result of calculating padding output sizes given input, beginningPadding and endingPadding.
-
If any item in outputShape is not a valid dimension, then throw a TypeError.
-
Set desc.dimensions to outputShape.
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the "padding" operation, given beginningPadding, endingPadding and options.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
Examples for constant, edge, reflection and symmetric padding:
// input: [[1,2,3], [4,5,6]]
const input = builder.constant(
  { dataType: 'float32', dimensions: [2, 3] },
  new Float32Array([1, 2, 3, 4, 5, 6]));

const beginningPadding = [1, 2];
const endingPadding = [1, 2];

// "constant" padded:
// [[0,0,0,0,0,0,0],
//  [0,0,1,2,3,0,0],
//  [0,0,4,5,6,0,0],
//  [0,0,0,0,0,0,0]]
builder.pad(input, beginningPadding, endingPadding);

// "edge" padded:
// [[1,1,1,2,3,3,3],
//  [1,1,1,2,3,3,3],
//  [4,4,4,5,6,6,6],
//  [4,4,4,5,6,6,6]]
builder.pad(input, beginningPadding, endingPadding, { mode: "edge" });

// "reflection" padded:
// [[6,5,4,5,6,5,4],
//  [3,2,1,2,3,2,1],
//  [6,5,4,5,6,5,4],
//  [3,2,1,2,3,2,1]]
builder.pad(input, beginningPadding, endingPadding, { mode: "reflection" });

// "symmetric" padded:
// [[2,1,1,2,3,3,2],
//  [2,1,1,2,3,3,2],
//  [5,4,4,5,6,6,5],
//  [5,4,4,5,6,6,5]]
builder.pad(input, beginningPadding, endingPadding, { mode: "symmetric" });
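To make the four padding modes concrete, the following is a minimal plain-JavaScript sketch that reproduces the example outputs above on nested arrays. The names sourceIndex and pad2d are hypothetical and not part of the API. The sketch assumes a 2-D input and padding amounts small enough that a single mirror step suffices (less than the dimension size for "reflection", at most the dimension size for "symmetric").

```javascript
// Maps an output index along one dimension back to an input index,
// or returns -1 where the constant padding value applies.
function sourceIndex(outIndex, begin, size, mode) {
  const i = outIndex - begin;        // position relative to the input content
  if (i >= 0 && i < size) return i;  // inside the original content
  switch (mode) {
    case 'constant':
      return -1;
    case 'edge':
      return i < 0 ? 0 : size - 1;
    case 'reflection':               // mirror without repeating the edge value
      return i < 0 ? -i : 2 * (size - 1) - i;
    case 'symmetric':                // mirror, repeating the edge value
      return i < 0 ? -i - 1 : 2 * size - 1 - i;
    default:
      throw new TypeError(`Unknown padding mode: ${mode}`);
  }
}

// Pads a 2-D nested array; beginningPadding and endingPadding are [rows, cols].
function pad2d(input, beginningPadding, endingPadding,
               { mode = 'constant', value = 0 } = {}) {
  const rows = input.length;
  const cols = input[0].length;
  // output size = beginning padding + input size + ending padding
  const outRows = beginningPadding[0] + rows + endingPadding[0];
  const outCols = beginningPadding[1] + cols + endingPadding[1];
  const output = [];
  for (let r = 0; r < outRows; r++) {
    const srcRow = sourceIndex(r, beginningPadding[0], rows, mode);
    const row = [];
    for (let c = 0; c < outCols; c++) {
      const srcCol = sourceIndex(c, beginningPadding[1], cols, mode);
      row.push(srcRow < 0 || srcCol < 0 ? value : input[srcRow][srcCol]);
    }
    output.push(row);
  }
  return output;
}
```

Because each dimension is mapped independently, the same index function could be applied per dimension for tensors of higher rank.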
7.8.32. Pooling operations
Compute a pooling operation across all the elements within the moving window over the input tensor.
enum MLRoundingType {
  "floor",
  "ceil"
};

dictionary MLPool2dOptions {
  sequence<[EnforceRange] unsigned long> windowDimensions;
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  MLInputOperandLayout layout = "nchw";
  MLRoundingType roundingType = "floor";
  sequence<[EnforceRange] unsigned long> outputSizes;
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};
MLPool2dOptions has the following members:
windowDimensions, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2: [windowHeight, windowWidth]. Specifies the dimensions of the sliding window. The default value for the window dimensions is the height and width dimensions of the input shape.
padding, of type sequence<[EnforceRange] unsigned long>
-
A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the input. The default value is [0,0,0,0].
strides, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the input. The default value is [1,1].
dilations, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied to the sliding window. The default value is [1,1].
layout, of type MLInputOperandLayout, defaulting to "nchw"
-
Specifies the layout format of the input and output tensor as follows: "nchw" is [batches, channels, height, width] and "nhwc" is [batches, height, width, channels].
roundingType, of type MLRoundingType, defaulting to "floor"
-
The rounding function used to compute the output shape.
outputSizes, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2. Specifies the sizes of the two spatial dimensions of the output tensor. When the output sizes are explicitly specified, the roundingType is ignored. If not specified, the output sizes are automatically computed.
-
input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.layout.
-
options: an optional MLPool2dOptions. The optional parameters of the operation.
Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of layout. More specifically, if options.roundingType is "floor", the spatial dimensions of the output tensor can be calculated as follows:
output size = floor(1 + (input size - filter size + beginning padding + ending padding) / stride)
or if options.roundingType is "ceil":
output size = ceil(1 + (input size - filter size + beginning padding + ending padding) / stride)
// 'global' max pooling
builder.maxPool2d(input);
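The output size formulas above can be sketched in plain JavaScript as follows. The helper names poolOutputSize and pool2dOutputShape are hypothetical and not part of the API. This sketch assumes dilations of 1, ignores outputSizes, and assumes that "nhwc" means [batches, height, width, channels].

```javascript
// Output size along one spatial dimension, per the floor/ceil formulas.
function poolOutputSize(inputSize, windowSize, beginningPadding, endingPadding,
                        stride, roundingType = 'floor') {
  const size =
      1 + (inputSize - windowSize + beginningPadding + endingPadding) / stride;
  return roundingType === 'ceil' ? Math.ceil(size) : Math.floor(size);
}

// Output shape of a 2-D pooling operation for a 4-D input shape.
function pool2dOutputShape(inputShape, options = {}) {
  const layout = options.layout ?? 'nchw';
  const [batches, channels, height, width] = layout === 'nchw'
      ? inputShape
      : [inputShape[0], inputShape[3], inputShape[1], inputShape[2]];
  // The window defaults to the full height and width ("global" pooling).
  const [windowHeight, windowWidth] = options.windowDimensions ?? [height, width];
  const [padTop, padBottom, padLeft, padRight] = options.padding ?? [0, 0, 0, 0];
  const [strideHeight, strideWidth] = options.strides ?? [1, 1];
  const rounding = options.roundingType ?? 'floor';
  const outHeight = poolOutputSize(
      height, windowHeight, padTop, padBottom, strideHeight, rounding);
  const outWidth = poolOutputSize(
      width, windowWidth, padLeft, padRight, strideWidth, rounding);
  return layout === 'nchw'
      ? [batches, channels, outHeight, outWidth]
      : [batches, outHeight, outWidth, channels];
}
```

With no options, the window covers the whole spatial extent, so a [1, 3, 5, 5] input yields [1, 3, 1, 1], which matches the 'global' pooling example.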
To calculate pool2d output sizes given MLInputOperandLayout layout, list of 4 unsigned integers inputShape, MLRoundingType roundingType, list of 2 unsigned integers windowDimensions, list of 4 unsigned integers padding, list of 2 unsigned integers strides, list of 2 unsigned integers dilations, and optional list of 2 unsigned integers outputSizes, perform these steps. They return a list of 4 unsigned integers.
-
Switch on layout:
-
If outputSizes is given, then:
-
Let outputHeight be outputSizes[0].
-
Let outputWidth be outputSizes[1].
-
-
Otherwise:
-
Let outputSizes be the result of calculating conv2d output sizes given inputHeight, inputWidth, windowDimensions[0], windowDimensions[1], padding, strides, and dilations.
-
Let outputHeight be outputSizes[0].
-
Let outputWidth be outputSizes[1].
-
Switch on roundingType
-
-
Switch on layout:
To create pooling operation given string op, MLOperand input, MLPool2dOptions options, and optional list allowedDataTypes, run the following steps:
-
Assert: op is one of "averagePool2d", "l2Pool2d", "maxPool2d".
-
If validating operand with this and input returns false, then throw a TypeError.
-
If allowedDataTypes is given and it does not contain input’s dataType, then throw a TypeError.
-
If options.windowDimensions exists and its size is not 2, then throw a TypeError.
-
If options.windowDimensions does not exist, set options.windowDimensions to the height and width dimensions of the shape of input.
-
If options.outputSizes exists, or if options.padding does not exist, set options.padding to the list « 0, 0, 0, 0 ».
-
If options.strides does not exist, set options.strides to the list « 1, 1 ».
-
If any value in options.strides is not greater than 0, then throw a TypeError.
-
If options.outputSizes exists:
-
If options.dilations does not exist, set options.dilations to the list « 1, 1 ».
-
If options.dilations's size is not 2, then throw a TypeError.
-
If any value in options.dilations is not greater than 0, then throw a TypeError.
-
Let desc be a copy of input.[[descriptor]].
-
Let outputShape be the result of calculating pool2d output sizes given options.layout, input’s shape, options.roundingType, options.windowDimensions, options.padding, options.strides, options.dilations, and options.outputSizes (if it exists).
-
If any item in outputShape is not a valid dimension, then throw a TypeError.
-
Set desc.dimensions to outputShape.
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the op operation, given options.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
The following pooling algorithms are supported.
The averagePool2d(input, options) method steps are:
-
Let output be the result of running the create pooling operation given "averagePool2d", input, options, and « "float32", "float16" ».
-
Return output.
The l2Pool2d(input, options) method steps are:
-
Let output be the result of running the create pooling operation given "l2Pool2d", input, options, and « "float32", "float16" ».
-
Return output.
The maxPool2d(input, options) method steps are:
-
Let output be the result of running the create pooling operation given "maxPool2d", input and options.
-
Return output.
7.8.32.1. averagePool2d
Calculate the average value for patches of a feature map, and use it to create a pooled feature map. See § 7.8.32 Pooling operations for more detail.
7.8.32.2. l2Pool2d
Apply the L2 norm function to a region of the input feature map. The L2 norm is the square root of the sum of the squares of its elements. See § 7.8.32 Pooling operations for more detail.
7.8.32.3. maxPool2d
Calculate the maximum value for patches of a feature map, and use it to create a pooled feature map. See § 7.8.32 Pooling operations for more detail.
7.8.33. prelu
Calculate the parametric version of the rectified linear function (Parametric ReLU) on the input tensor element-wise. Parametric ReLU is a type of leaky ReLU that, instead of using a fixed scalar slope such as 0.01, makes the slope (coefficient of leakage) a parameter that is learned during the model training phase. The calculation follows the expression max(0, x) + slope * min(0, x).
partial interface MLGraphBuilder {
  MLOperand prelu(MLOperand input, MLOperand slope);
};
-
input: an MLOperand. The input tensor.
-
slope: an MLOperand. The slope tensor. Its shape is either the same as, or unidirectionally broadcastable to, the shape of the input tensor input.
Returns:
-
an MLOperand. The output tensor of the same shape as input.
The prelu(input, slope) method steps are:
-
If validating operand with this and any of input and slope returns false, then throw a TypeError.
-
If input’s dataType is not "float32", "float16", "int32", or "int8", then throw a TypeError.
-
If slope’s dataType is not equal to input’s dataType, then throw a TypeError.
-
Let descriptor be a new MLOperandDescriptor.
-
Set descriptor.dimensions to the result of unidirectionally broadcasting the shapes slope’s shape and input’s shape.
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and descriptor.
-
Let operator be an operator for the "prelu" operation, given slope.
-
Set output.
[[operator]]
to operator. -
Set operator’s inputs to input and slope.
-
Set operator’s output to output.
-
-
Return output.
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(slope, builder.min(builder.constant(0), x)));
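For intuition, the same expression can be evaluated on plain numbers. The following is a minimal sketch, not part of the API: preluReference is a hypothetical helper that works on a flat array and assumes slope has one entry per element of the last dimension, so that it is broadcast along that dimension.

```javascript
// Element-wise max(0, x) + slope * min(0, x) on a flat array.
// Assumption: input.length is a multiple of slope.length, and slope is
// broadcast along the last dimension of the input.
function preluReference(input, slope) {
  return Array.from(input, (x, i) => {
    const s = slope[i % slope.length];
    return Math.max(0, x) + s * Math.min(0, x);
  });
}
```

Positive values pass through unchanged, while negative values are scaled by the matching slope. With slope [0.5, 0.25], the input [-2, 3, -4, 5] becomes [-1, 3, -2, 5].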
7.8.34. Reduction operations
Reduce the input tensor along all dimensions, or along the axes specified in the axes array parameter. For each specified axis, the dimension with that index is reduced, i.e. the resulting tensor will not contain it, unless the keepDimensions option is specified. The values of the resulting tensor are calculated using the specified reduction function that takes as parameters all the values across the reduced dimension.
dictionary MLReduceOptions {
  sequence<[EnforceRange] unsigned long> axes;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};
MLReduceOptions has the following members:
axes, of type sequence<[EnforceRange] unsigned long>
-
The dimensions to reduce. The values in the list must be in the range [0, N-1] where N is the rank of the input tensor. If not present, all dimensions are reduced. If empty, no dimensions are reduced, and the shape of the output tensor is the same as the shape of the input tensor.
keepDimensions, of type boolean, defaulting to false
-
If true, the output has the same rank as the input, setting any reduced dimensions to size 1.
-
input: an MLOperand. The input tensor.
-
options: an optional MLReduceOptions. The optional parameters of the operation.
Returns: an MLOperand. The reduced output tensor.
-
L1: Compute the L1 norm of all the input values along the axes.
-
L2: Compute the L2 norm of all the input values along the axes.
-
LogSum: Compute the log value of the sum of all the input values along the axes.
-
LogSumExp: Compute the log value of the sum of the exponent of all the input values along the axes.
-
Max: Compute the maximum value of all the input values along the axes.
-
Mean: Compute the average value of all the input values along the axes.
-
Min: Compute the minimum value of all the input values along the axes.
-
Product: Compute the product of all the input values along the axes.
-
Sum: Compute the sum of all the input values along the axes.
-
SumSquare: Compute the sum of the square of all the input values along the axes.
To calculate reduction output sizes, given a list of unsigned integers inputShape, an optional list of unsigned integers axes, and boolean keepDimensions, perform the following steps. They return a new list of unsigned integers, or failure.
-
Let inputRank be inputShape’s size.
-
If axes is not given, let axes be the range 0 to inputRank, exclusive.
-
Otherwise, if axes contains duplicate values, or if any of its elements is not in the range 0 to inputRank, exclusive, then return failure.
-
If keepDimensions is true, then let outputShape be a clone of inputShape with the size of every dimension listed in axes set to 1.
-
Otherwise, let outputShape be a clone of inputShape with every dimension listed in axes removed.
-
Return outputShape.
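The shape calculation described above can be sketched in plain JavaScript. The helper name reductionOutputShape is hypothetical, and returning null to signal failure is an assumption made for this illustration.

```javascript
// Returns the output shape of a reduction, or null on failure.
function reductionOutputShape(inputShape, axes, keepDimensions = false) {
  const inputRank = inputShape.length;
  if (axes === undefined) {
    // No axes given: reduce every dimension.
    axes = inputShape.map((_, index) => index);
  } else {
    const hasDuplicates = new Set(axes).size !== axes.length;
    const outOfRange = axes.some(
        (axis) => !Number.isInteger(axis) || axis < 0 || axis >= inputRank);
    if (hasDuplicates || outOfRange) return null;
  }
  if (keepDimensions) {
    // Reduced dimensions stay in place with size 1.
    return inputShape.map((size, index) => (axes.includes(index) ? 1 : size));
  }
  // Reduced dimensions are dropped from the shape.
  return inputShape.filter((_, index) => !axes.includes(index));
}
```

For an input shape of [2, 3, 4] reduced along axis 1, this yields [2, 1, 4] when keepDimensions is true and [2, 4] otherwise. An empty axes list leaves the shape unchanged.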
To create reduce operation given string op, MLOperand input, MLReduceOptions options, and optional list allowedDataTypes, run the following steps:
-
Assert: op is one of "reduceL1", "reduceL2", "reduceLogSum", "reduceLogSumExp", "reduceMax", "reduceMean", "reduceMin", "reduceProduct", "reduceSum", "reduceSumSquare".
-
If validating operand with this and input returns false, then throw a TypeError.
-
If allowedDataTypes is given and it does not contain input’s dataType, then throw a TypeError.
-
Let outputShape be the result of calculating reduction output sizes given input’s shape, options.axes (if it exists), and options.keepDimensions. If that returns failure, then throw a TypeError.
-
Let desc be a new MLOperandDescriptor.
-
Set desc.dimensions to outputShape.
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the op operation, given options.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
The following reduce algorithms are supported.
The reduceL1(input, options) method steps are:
The reduceL2(input, options) method steps are:
-
Let output be the result of running the create reduce operation given "reduceL2", input, options, and « "float32", "float16" ».
-
Return output.
The reduceLogSum(input, options) method steps are:
-
Let output be the result of running the create reduce operation given "reduceLogSum", input, options, and « "float32", "float16" ».
-
Return output.
The reduceLogSumExp(input, options) method steps are:
-
Let output be the result of running the create reduce operation given "reduceLogSumExp", input, options, and « "float32", "float16" ».
-
Return output.
The reduceMax(input, options) method steps are:
-
Let output be the result of running the create reduce operation given "reduceMax", input and options.
-
Return output.
The reduceMean(input, options) method steps are:
-
Let output be the result of running the create reduce operation given "reduceMean", input, options, and « "float32", "float16" ».
-
Return output.
The reduceMin(input, options) method steps are:
-
Let output be the result of running the create reduce operation given "reduceMin", input and options.
-
Return output.
The reduceProduct(input, options) method steps are:
The reduceSum(input, options) method steps are:
The behavior of several reduction operations can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
// reduceLogSum(input, options)
return builder.log(builder.reduceSum(input, options));

// reduceLogSumExp(input, options)
return builder.log(builder.reduceSum(builder.exp(input), options));

// reduceSumSquare(input, options)
return builder.reduceSum(builder.pow(input, 2), options);
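The same decompositions can be checked on plain numbers. This is a minimal sketch that reduces a flat JavaScript array over all of its elements; the function names mirror the operations but are local, hypothetical helpers, not MLGraphBuilder methods.

```javascript
// Sum of all values in a flat array.
const reduceSum = (values) => values.reduce((acc, x) => acc + x, 0);

// log(sum(x))
const reduceLogSum = (values) => Math.log(reduceSum(values));

// log(sum(exp(x)))
const reduceLogSumExp = (values) =>
    Math.log(reduceSum(values.map((x) => Math.exp(x))));

// sum(x^2)
const reduceSumSquare = (values) => reduceSum(values.map((x) => x * x));
```

For instance, reduceSumSquare([1, 2, 3]) evaluates to 14, and reduceLogSumExp([0, 0]) evaluates to the natural logarithm of 2.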
Not every underlying platform supports keepDimensions directly. This does not affect the underlying tensor data, only the shape. For example, if the input shape is [2, 3, 4], the axis is 1, and keepDimensions is true, the expected output shape is [2, 1, 4]. If the underlying platform never keeps reduced dimensions, it will produce an output shape of [2, 4]. The implementation can introduce a no-op reshape to [2, 1, 4]. A similar no-op reshape can be introduced if keepDimensions is false but the underlying platform always keeps reduced dimensions.
7.8.35. relu
Compute the rectified linear function of the input tensor.
partial interface MLGraphBuilder {
  MLOperand relu(MLOperand input);
  MLActivation relu();
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.max(builder.constant(0), x);
7.8.35.1. relu(input)
-
input: an MLOperand. The input tensor.
Returns:
-
an MLOperand. The output tensor of the same shape as input.
The relu(input) method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
If input’s dataType is not "float32", "float16", "int32", or "int8", then throw a TypeError.
-
Make graph connections:
-
Let output be the result of copying an MLOperand given input.
-
Let operator be an operator for the "relu" operation.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
7.8.35.2. relu()
The relu()
method steps are:
-
Let op be the result of creating an MLActivation given this and "relu".
-
Return op.
7.8.36. resample2d
Resample the tensor values from the source to the destination spatial dimensions according to the scaling factors.
enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResample2dOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<[EnforceRange] unsigned long> sizes;
  sequence<[EnforceRange] unsigned long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};
-
input: an MLOperand. The input 4-D tensor.
-
options: an optional MLResample2dOptions. The optional parameters of the operation.
Returns: an MLOperand. The output 4-D tensor.
MLResample2dOptions has the following members:
mode, of type MLInterpolationMode, defaulting to "nearest-neighbor"
-
The interpolation algorithm used to fill the output tensor values.
scales, of type sequence<float>
-
A list of length 2. Specifies the scaling factor in each spatial dimension of the input: [scaleHeight, scaleWidth]. The default value is [1.0, 1.0].
sizes, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2. Specifies the target sizes for each spatial dimension of the input: [sizeHeight, sizeWidth]. When the target sizes are specified, the scales argument is ignored, since the scaling factor values are derived from the target sizes of each spatial dimension of the input.
axes, of type sequence<[EnforceRange] unsigned long>
-
A list of length 2. Specifies the two consecutive dimensions of the input tensor to which the interpolation algorithm applies. The valid values in the sequence are [0, 1], [1, 2] or [2, 3]. The default value is [2, 3].
To check resample options given options, run the following steps:
-
If options.scales does not exist, set it to the list « 1.0, 1.0 ».
-
Otherwise, if any of its values is not greater than 0, or if its size is not 2, return false.
-
If options.sizes exists, and if its size is not 2, or if any of its values is not greater than 0, return false.
-
If options.axes does not exist, set it to the list « 2, 3 ».
-
Otherwise, if its value is not one of « 0, 1 », « 1, 2 », « 2, 3 », return false.
-
Return true.
To calculate resample output sizes given MLOperand input and MLResample2dOptions options, run the following steps:
-
Let desc be a new MLOperandDescriptor initialized to input.[[descriptor]].
-
For each index in the range 0 to options.axes's size, exclusive:
-
If options.sizes exists, then let size be options.sizes[index].
-
Otherwise, let size be floor(input’s shape[options.axes[index]] * options.scales[index]).
-
If size is not a valid dimension, then return failure.
-
Set desc.dimensions[options.axes[index]] to size.
-
-
Return desc.
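The output size steps above can be sketched in plain JavaScript on shape arrays. The helper name resampleOutputShape is hypothetical. The sketch assumes the options were already checked, and it treats a "valid dimension" as a positive integer, which is an assumption made for this illustration.

```javascript
// Output shape of resample2d for a 4-D input shape.
// Throws a TypeError where the steps above would return failure.
function resampleOutputShape(
    inputShape, { scales = [1.0, 1.0], sizes, axes = [2, 3] } = {}) {
  const outputShape = [...inputShape];
  axes.forEach((axis, index) => {
    // Explicit sizes take precedence over scales.
    const size = sizes !== undefined
        ? sizes[index]
        : Math.floor(inputShape[axis] * scales[index]);
    if (!Number.isInteger(size) || size <= 0) {
      throw new TypeError(`Invalid output size ${size} for axis ${axis}`);
    }
    outputShape[axis] = size;
  });
  return outputShape;
}
```

Upsampling a [1, 3, 4, 4] tensor with scales of [2, 2] gives [1, 3, 8, 8]. When sizes is given, the scales are not consulted at all.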
The resample2d(input, options) method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
If input’s dataType is not "float32" or "float16", then throw a TypeError.
-
If checking resample options given options returns false, then throw a TypeError.
-
Let desc be the result of calculating resample output sizes given input and options. If that returns failure, then throw a TypeError.
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the "resample2d" operation, given options.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
7.8.37. reshape
Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical dimensions for the subsequent operations.
partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input, sequence<[EnforceRange] unsigned long> newShape);
};
-
input: an MLOperand. The input tensor.
-
newShape: a sequence of unsigned long. The shape of the output tensor. The number of elements implied by newShape must be the same as the number of elements in the input tensor.
Returns: an MLOperand. The output tensor. The values of the output tensor are the same as values of the input tensor. The shape of the output tensor is specified by the newShape argument.
The reshape(input, newShape) method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
Let outputShape be an empty array of unsigned long.
-
If newShape’s size is 0, set outputShape to an empty list for a scalar.
-
If any item in newShape is not a valid dimension, then throw a TypeError.
-
Let inputElementCount be the product of all elements in input’s shape. Empty dimensions yield an inputElementCount of 1.
-
If the product of all values in newShape is not equal to inputElementCount, then throw a TypeError.
-
Let desc be a copy of input.[[descriptor]].
-
Set desc.dimensions to newShape.
-
Make graph connections:
-
Let output be the result of creating an MLOperand given this and desc.
-
Let operator be an operator for the "reshape" operation.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
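The element-count check in the steps above can be illustrated with a short plain-JavaScript sketch on shape arrays. The helper names elementCount and checkReshape are hypothetical, and a "valid dimension" is assumed here to be a positive integer.

```javascript
// Number of elements implied by a shape; an empty shape (scalar) yields 1.
function elementCount(shape) {
  return shape.reduce((product, size) => product * size, 1);
}

// Validates newShape against inputShape and returns the output shape.
function checkReshape(inputShape, newShape) {
  if (newShape.some((size) => !Number.isInteger(size) || size <= 0)) {
    throw new TypeError('newShape contains an invalid dimension');
  }
  if (elementCount(newShape) !== elementCount(inputShape)) {
    throw new TypeError('newShape does not preserve the number of elements');
  }
  return [...newShape];
}
```

A [2, 3, 4] tensor holds 24 elements, so it can be reshaped to [6, 4] but not to [5, 5]. A scalar, whose shape is empty, can be reshaped to [1, 1].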
Many shape-related operations such as squeeze, unsqueeze, and flatten can be generically implemented using the reshape() operation as follows:
// Returns a tensor with all specified dimensions of input of size 1 removed.
function squeeze(input, axes) {
  const shape = Array.from(input.shape());
  // With no axes given, consider every dimension.
  if (!axes || !axes.length) axes = shape.map((_, i) => i);
  // Remove from the highest axis down so earlier removals do not shift later indices.
  for (const axis of [...axes].sort((a, b) => b - a))
    if (axis < shape.length && shape[axis] == 1)
      shape.splice(axis, 1);
  return builder.reshape(input, shape);
}

// Returns a new tensor with a dimension of size one inserted at each specified position.
function unsqueeze(input, axes) {
  const shape = Array.from(input.shape());
  for (const axis of [...axes].sort((a, b) => a - b))
    shape.splice(axis, 0, 1);
  return builder.reshape(input, shape);
}

// Flattens input by reshaping it into a two-dimensional tensor, split at axis.
function flatten(input, axis) {
  const shape = input.shape();
  if (axis > shape.length) return input;
  const before = shape.slice(0, axis).reduce((a, b) => a * b, 1);
  const after = shape.slice(axis).reduce((a, b) => a * b, 1);
  return builder.reshape(input, [before, after]);
}
7.8.38. sigmoid
Compute the sigmoid function of the input tensor. The calculation follows the expression 1 / (exp(-x) + 1).
partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand input);
  MLActivation sigmoid();
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.div(
  builder.constant(1),
  builder.add(builder.exp(builder.neg(x)), builder.constant(1)));
7.8.38.1. sigmoid(input)
-
input: an MLOperand. The input tensor.
Returns:
-
an MLOperand. The output tensor of the same shape as input.
The sigmoid(input) method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
If input’s dataType is not "float32" or "float16", then throw a TypeError.
-
Make graph connections:
-
Let output be the result of copying an MLOperand given input.
-
Let operator be an operator for the "sigmoid" operation.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
7.8.38.2. sigmoid()
-
None.
Returns:
-
an MLActivation. The activation function representing the sigmoid operation.
The sigmoid() method steps are:
-
Let op be the result of creating an MLActivation given this and "sigmoid".
-
Return op.
7.8.39. slice
Produce a slice of the input tensor.
partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input,
                  sequence<[EnforceRange] unsigned long> starts,
                  sequence<[EnforceRange] unsigned long> sizes);
};
-
input: an MLOperand. The input tensor.
-
starts: a sequence of unsigned long. The starting index to slice of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, starts[d] indicates the starting index to slice in that dimension. The starting index must be in the range [0, input size - 1] in that dimension.
-
sizes: a sequence of unsigned long. The number of elements to slice of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, sizes[d] indicates the number of elements to slice in that dimension. The size must not be 0 and must satisfy the constraint starting index + size <= input size in that dimension.
Returns: an MLOperand. The output tensor of the same rank as the input tensor with tensor values stripped to the specified starting and ending indices in each dimension.
The slice(input, starts, sizes) method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
If starts’s size and sizes’s size are not both equal to input’s rank, then throw a TypeError.
-
Make graph connections:
-
Let output be the result of copying an MLOperand given input.
-
Let operator be an operator for the "slice" operation, given starts and sizes.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
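The constraints stated for starts and sizes can be sketched as a plain-JavaScript shape check. The helper name sliceOutputShape is hypothetical. Note that the per-dimension checks below come from the argument descriptions, while the method steps above only verify the lengths of starts and sizes.

```javascript
// Validates starts and sizes against inputShape and returns the shape of the
// sliced tensor, which is simply the requested sizes.
function sliceOutputShape(inputShape, starts, sizes) {
  const rank = inputShape.length;
  if (starts.length !== rank || sizes.length !== rank) {
    throw new TypeError('starts and sizes need one entry per input dimension');
  }
  for (let d = 0; d < rank; d++) {
    // The starting index must be in the range [0, input size - 1].
    if (starts[d] > inputShape[d] - 1) {
      throw new TypeError(`starts[${d}] is out of range`);
    }
    // The size must not be 0, and starting index + size <= input size.
    if (sizes[d] === 0 || starts[d] + sizes[d] > inputShape[d]) {
      throw new TypeError(`sizes[${d}] is out of range`);
    }
  }
  return [...sizes];
}
```

Slicing a [4, 5] tensor with starts [1, 2] and sizes [2, 3] selects rows 1 to 2 and columns 2 to 4, producing a [2, 3] tensor.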
7.8.40. softmax
Compute the softmax values of the N-D input tensor along the given axis.
partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand input, unsigned long axis);
  MLActivation softmax(unsigned long axis);
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
// This sample deploys a well-known implementation trick [1] to compute the
// exponentials of the distances to the max value, instead of the exponentials
// of the input values themselves, in order to increase the numerical stability
// of the result.
// [1]: https://cs231n.github.io/linear-classify/#softmax
const maxX = builder.reduceMax(x, { axes: [axis], keepDimensions: true });
const expX = builder.exp(builder.sub(x, maxX));
return builder.div(expX, builder.reduceSum(expX, { axes: [axis], keepDimensions: true }));
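The effect of subtracting the maximum can be observed with a plain-JavaScript reference that works on a flat, row-major array. This is a sketch under stated assumptions: softmaxReference is a hypothetical helper, not part of the API, and it takes the tensor shape and axis explicitly.

```javascript
// Softmax along `axis` of a row-major flat array with the given shape.
function softmaxReference(data, shape, axis) {
  const axisSize = shape[axis];
  // Number of elements between consecutive entries along the axis.
  const inner = shape.slice(axis + 1).reduce((a, b) => a * b, 1);
  const outer = shape.slice(0, axis).reduce((a, b) => a * b, 1);
  const output = new Float64Array(data.length);
  for (let o = 0; o < outer; o++) {
    for (let i = 0; i < inner; i++) {
      const base = o * axisSize * inner + i;
      // Subtract the maximum first so that exp() cannot overflow.
      let max = -Infinity;
      for (let k = 0; k < axisSize; k++) {
        max = Math.max(max, data[base + k * inner]);
      }
      let sum = 0;
      for (let k = 0; k < axisSize; k++) {
        const e = Math.exp(data[base + k * inner] - max);
        output[base + k * inner] = e;
        sum += e;
      }
      for (let k = 0; k < axisSize; k++) {
        output[base + k * inner] /= sum;
      }
    }
  }
  return output;
}
```

A row such as [1000, 1000, 1000] would overflow to Infinity if exponentiated directly, whereas here every entry evaluates to one third.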
7.8.40.1. softmax(input, axis)
-
input: an MLOperand. The input N-D tensor.
-
axis: an unsigned long scalar. The dimension the reduction will be performed on.
Returns:
-
an MLOperand. The output N-D tensor that contains the softmax results, of the same shape as input.
The softmax(input, axis) method steps are:
-
If validating operand with this and input returns false, then throw a TypeError.
-
If input’s dataType is not "float32" or "float16", then throw a TypeError.
-
If axis is greater than or equal to input’s rank, then throw a TypeError.
-
Make graph connections:
-
Let output be the result of copying an MLOperand given input.
-
Let operator be an operator for the "softmax" operation, given axis.
-
Set output.
[[operator]]
to operator. -
Set operator’s input to input.
-
Set operator’s output to output.
-
-
Return output.
7.8.40.2. softmax(axis)
Arguments:
- axis: an unsigned long scalar. The dimension the reduction will be performed on.
Returns:
- an MLActivation. The activation function representing the softmax operation.
The softmax(axis) method steps are:
- Let validationSteps given MLOperandDescriptor descriptor be these steps:
  - If axis is greater than or equal to descriptor.dimensions's size, then return false;
  - Otherwise, return true.
- Let op be the result of creating an MLActivation given this, "softmax", «[ "axis" → axis ]», and validationSteps.
- Return op.
7.8.41. softplus
Compute the softplus function of the input tensor. The calculation follows the expression ln(1 + exp(x)).
partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand input);
  MLActivation softplus();
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.log(builder.add(builder.exp(x), builder.constant(1)));
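For reference, the same expression can be evaluated element-wise in plain JavaScript. This is only an illustrative sketch on numbers, not a WebNN call, and the function name is chosen here for the example.

```javascript
// Plain-JavaScript sketch of ln(1 + exp(x)), applied element-wise.
// This mirrors the decomposition above; it is not a WebNN API.
function softplus(x) {
  return Math.log(1 + Math.exp(x));
}

const softplusOutput = [-1, 0, 1].map((x) => softplus(x));
console.log(softplusOutput);
```

At x = 0 the result is ln(2), and the outputs increase monotonically with x.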
7.8.41.1. softplus(input)
Arguments:
- input: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The softplus(input) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "softplus" operation.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.41.2. softplus()
Arguments:
- None.
Returns:
- an MLActivation. The activation function representing the softplus operation.
The softplus() method steps are:
- Let op be the result of creating an MLActivation given this and "softplus".
- Return op.
7.8.42. softsign
Compute the softsign function of the input tensor. The calculation follows the expression x / (1 + |x|).
partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand input);
  MLActivation softsign();
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.div(x, builder.add(builder.constant(1), builder.abs(x)));
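The expression x / (1 + |x|) can be checked by hand on a few values. The following plain-JavaScript sketch is illustrative only and does not use WebNN.

```javascript
// Plain-JavaScript sketch of x / (1 + |x|), applied element-wise.
// This mirrors the decomposition above; it is not a WebNN API.
function softsign(x) {
  return x / (1 + Math.abs(x));
}

const softsignOutput = [-3, 0, 1].map((x) => softsign(x));
console.log(softsignOutput); // [-0.75, 0, 0.5]
```

The output always lies strictly between -1 and 1 and has the same sign as the input.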
7.8.42.1. softsign(input)
Arguments:
- input: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The softsign(input) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "softsign" operation.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.42.2. softsign()
Arguments:
- None.
Returns:
- an MLActivation. The activation function representing the softsign operation.
The softsign() method steps are:
- Let op be the result of creating an MLActivation given this and "softsign".
- Return op.
7.8.43. split
Split the input tensor into a number of sub tensors along the given axis.
dictionary MLSplitOptions {
  [EnforceRange] unsigned long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(
      MLOperand input,
      ([EnforceRange] unsigned long or sequence<[EnforceRange] unsigned long>) splits,
      optional MLSplitOptions options = {});
};
Arguments:
- input: an MLOperand. The input tensor.
- splits: an unsigned long or a sequence of unsigned long. If an unsigned long, it specifies the number of output tensors along the axis. The number must evenly divide the dimension size of input along options.axis. If a sequence of unsigned long, it specifies the sizes of each output tensor along the options.axis. The sum of sizes must equal the dimension size of input along options.axis.
- options: an optional MLSplitOptions. The optional parameters of the operation.
Returns: a sequence of MLOperand. The split output tensors. If splits is an unsigned long, the size of the output sequence equals splits. The shape of each output tensor is the same as input except that the dimension size of axis equals the quotient of dividing the dimension size of input along axis by splits. If splits is a sequence of unsigned long, the size of the output sequence equals the size of splits. The shape of the i-th output tensor is the same as input except along axis where the dimension size is splits[i].
MLSplitOptions has the following members:
axis, of type unsigned long, defaulting to 0
  The dimension along which to split. Its value must be in the range [0, N-1] where N is the rank of the input tensor.
The split(input, splits, options) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- Let axis be options.axis.
- If splits is an unsigned long:
- If splits is a sequence of unsigned long:
- Make graph connections:
  - Let operator be an operator for the "split" operation, given splits and options.
  - Let outputs be a new list.
  - For each index in the range 0 to splitCount, exclusive:
    - Let operand be the result of copying an MLOperand given input.
    - If splits is an unsigned long, then let newDimension be operand’s shape[axis] / splits.
    - Otherwise, let newDimension be splits[index].
    - Set operand’s shape[axis] to newDimension.
    - Set operand.[[operator]] to operator.
    - Append operand to outputs.
  - Set operator’s input to input.
  - Set operator’s outputs to outputs.
- Return outputs.
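The output shapes produced by these steps can be sketched in plain JavaScript. The helper name splitOutputShapes is hypothetical, and the validation shown (throwing a TypeError when the constraints stated for splits and axis are not met) is an assumption based on the argument descriptions, not a transcription of the method steps.

```javascript
// Plain-JavaScript sketch (not a WebNN call): compute the shape of each
// output of split. splitOutputShapes is a hypothetical helper name, and the
// error handling below is an assumption based on the argument descriptions.
function splitOutputShapes(inputShape, splits, axis = 0) {
  if (axis >= inputShape.length) {
    throw new TypeError('axis must be less than the rank of the input');
  }
  const dimension = inputShape[axis];
  let sizes;
  if (typeof splits === 'number') {
    // A single number: it must evenly divide the dimension along axis.
    if (splits === 0 || dimension % splits !== 0) {
      throw new TypeError('splits must evenly divide the dimension size');
    }
    sizes = Array(splits).fill(dimension / splits);
  } else {
    // A sequence: the sizes must add up to the dimension along axis.
    const total = splits.reduce((acc, size) => acc + size, 0);
    if (total !== dimension) {
      throw new TypeError('the sum of splits must equal the dimension size');
    }
    sizes = [...splits];
  }
  // Each output keeps the input shape except along axis.
  return sizes.map((size) => {
    const shape = [...inputShape];
    shape[axis] = size;
    return shape;
  });
}

console.log(splitOutputShapes([6, 4], 3));           // [[2, 4], [2, 4], [2, 4]]
console.log(splitOutputShapes([6, 4], [1, 3], 1));   // [[6, 1], [6, 3]]
```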
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
// This sample shows the case that the splits parameter is an array.
const outputs = [];
const starts = Array(input_rank).fill(0);
// Copy the shape so that input_shape itself is not modified.
const sizes = input_shape.slice();
let start = 0;
for (const size of splits) {
  starts[options.axis] = start;
  sizes[options.axis] = size;
  outputs.push(builder.slice(input, starts, sizes));
  start += size;
}
return outputs;
7.8.44. tanh
Compute the hyperbolic tangent function of the input tensor. The calculation follows the expression (exp(2 * x) - 1) / (exp(2 * x) + 1).
partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand input);
  MLActivation tanh();
};
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
return builder.div(
    builder.sub(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)),
    builder.add(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)));
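As a sanity check, the expression can be compared against the built-in Math.tanh in plain JavaScript. The helper name tanhViaExp is hypothetical and the sketch does not use WebNN.

```javascript
// Plain-JavaScript sketch of (exp(2x) - 1) / (exp(2x) + 1).
// tanhViaExp is a hypothetical helper name used only for illustration.
function tanhViaExp(x) {
  const e2x = Math.exp(2 * x);
  return (e2x - 1) / (e2x + 1);
}

// Compare with the built-in implementation for a few moderate inputs.
for (const x of [-2, -0.5, 0, 0.5, 2]) {
  console.log(x, tanhViaExp(x), Math.tanh(x));
}
```

Evaluated literally in double precision, this form returns NaN once exp(2 * x) overflows to Infinity (for x above roughly 355), which is one reason a direct implementation is usually preferable to the decomposition.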
7.8.44.1. tanh(input)
Arguments:
- input: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as input.
The tanh(input) method steps are:
- If validating operand with this and input returns false, then throw a TypeError.
- If input’s dataType is not "float32" or "float16", then throw a TypeError.
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "tanh" operation.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
7.8.44.2. tanh()
The tanh() method steps are:
- Let op be the result of creating an MLActivation given this and "tanh".
- Return op.
7.8.45. transpose
Permute the dimensions of the input tensor according to the permutation argument.
dictionary MLTransposeOptions {
  sequence<[EnforceRange] unsigned long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};
MLTransposeOptions has the following members:
permutation, of type sequence<[EnforceRange] unsigned long>
  The values used to permute the output shape. The default value is [N-1, ..., 0], where N is the rank of the input tensor, e.g. [2,1,0] for a 3-D tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values in the sequence must be the same as the rank of the input tensor, and the values in the sequence must be within the range from 0 to N-1 with no value appearing more than once.
Arguments:
- input: an MLOperand. The input N-D tensor.
- options: an optional MLTransposeOptions. The optional parameters of the operation.
Returns: an MLOperand. The permuted or transposed N-D tensor.
The transpose(input, options) method steps are:
- If options.permutation does not exist, let options.permutation be the reversed sequence of all indices for input’s shape.
- Otherwise if options.permutation exists:
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "transpose" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
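How a permutation maps the input shape to the output shape can be sketched in plain JavaScript. The helper name transposeShape is hypothetical; the validation follows the constraints described for permutation, and the choice of TypeError is an assumption. The sketch uses the common convention that output dimension i takes the size of input dimension permutation[i].

```javascript
// Plain-JavaScript sketch (not a WebNN call): compute the output shape of a
// transpose. transposeShape is a hypothetical helper name; throwing a
// TypeError for invalid permutations is an assumption.
function transposeShape(shape, permutation) {
  const rank = shape.length;
  // Default: reversed sequence of all indices, e.g. [2, 1, 0] for rank 3.
  const perm = permutation ?? shape.map((_, i) => rank - 1 - i);
  if (perm.length !== rank) {
    throw new TypeError('permutation must have one value per dimension');
  }
  const seen = new Set();
  for (const p of perm) {
    if (!Number.isInteger(p) || p < 0 || p >= rank || seen.has(p)) {
      throw new TypeError('permutation values must be unique and in [0, N-1]');
    }
    seen.add(p);
  }
  // Output dimension i has the size of input dimension perm[i].
  return perm.map((p) => shape[p]);
}

console.log(transposeShape([2, 3, 4]));             // [4, 3, 2]
console.log(transposeShape([2, 3, 4], [0, 2, 1]));  // [2, 4, 3]
```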
7.8.46. triangular
Given a 2-D tensor (matrix), return a 2-D tensor containing either the upper or lower triangular part of the input tensor. If the input tensor has greater than 2 dimensions it is treated as a batch of matrices and the result has the same shape.
dictionary MLTriangularOptions {
  boolean upper = true;
  [EnforceRange] long diagonal = 0;
};

partial interface MLGraphBuilder {
  MLOperand triangular(MLOperand input, optional MLTriangularOptions options = {});
};
MLTriangularOptions has the following members:
upper, of type boolean, defaulting to true
  Indicates whether the upper or the lower part of the input matrix is retained in the output. True indicates that the upper part is retained.
diagonal, of type long, defaulting to 0
  Specifies how many diagonals above or below the main diagonals of the input matrix are retained or excluded. A value of 0 means no diagonals other than the main diagonals are affected.
Arguments:
- input: an MLOperand. The input tensor which is at least 2-D.
- options: an optional MLTriangularOptions. The optional parameters of the operation.
Returns: an MLOperand. The output tensor representing a triangular matrix, or batch of matrices which is the same shape as the input.
The triangular(input, options) method steps are:
- Make graph connections:
  - Let output be the result of copying an MLOperand given input.
  - Let operator be an operator for the "triangular" operation, given options.
  - Set output.[[operator]] to operator.
  - Set operator’s input to input.
  - Set operator’s output to output.
- Return output.
Examples of how triangular works in different diagonal settings.
// input:
//   [[7, 1, 2],
//    [9, 4, 8],
//    [2, 6, 3]]
const input = builder.constant(
    { dataType: 'float32', dimensions: [3, 3] },
    new Float32Array([7, 1, 2, 9, 4, 8, 2, 6, 3]));

// upper triangular matrix:
//   [[7, 1, 2],
//    [0, 4, 8],
//    [0, 0, 3]]
const upper = builder.triangular(input);

// upper triangular matrix with one additional set of diagonals excluded:
//   [[0, 1, 2],
//    [0, 0, 8],
//    [0, 0, 0]]
const upperPositive = builder.triangular(input, { diagonal: 1 });

// upper triangular matrix with one additional set of diagonals retained:
//   [[7, 1, 2],
//    [9, 4, 8],
//    [0, 6, 3]]
const upperNegative = builder.triangular(input, { diagonal: -1 });

// lower triangular matrix:
//   [[7, 0, 0],
//    [9, 4, 0],
//    [2, 6, 3]]
const lower = builder.triangular(input, { upper: false });

// lower triangular matrix with one additional set of diagonals retained:
//   [[7, 1, 0],
//    [9, 4, 8],
//    [2, 6, 3]]
const lowerPositive = builder.triangular(input, { upper: false, diagonal: 1 });

// lower triangular matrix with one additional set of diagonals excluded:
//   [[0, 0, 0],
//    [9, 0, 0],
//    [2, 6, 0]]
const lowerNegative = builder.triangular(input, { upper: false, diagonal: -1 });

// lower triangular matrix with two batches:
//   [[[7, 0, 0],
//     [9, 4, 0],
//     [2, 6, 3]],
//    [[1, 0, 0],
//     [4, 5, 0],
//     [7, 8, 9]]]
// Note: this result corresponds to a batched input of shape [2, 3, 3],
// not to the 3x3 input defined above.
const lowerWithBatches = builder.triangular(input, { upper: false });
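The examples above are consistent with a simple rule: for an element at row i and column j, the upper variant keeps it when j - i >= diagonal, and the lower variant keeps it when j - i <= diagonal. The following plain-JavaScript sketch applies that rule to a nested array; the rule is inferred from the examples, and triangular2D is a hypothetical helper name rather than part of the API.

```javascript
// Plain-JavaScript sketch (not a WebNN call) for a single 2-D matrix given
// as a nested array. The keep rule is inferred from the examples above.
function triangular2D(matrix, { upper = true, diagonal = 0 } = {}) {
  return matrix.map((row, i) =>
    row.map((value, j) => {
      const keep = upper ? j - i >= diagonal : j - i <= diagonal;
      return keep ? value : 0;
    }));
}

const matrix = [[7, 1, 2], [9, 4, 8], [2, 6, 3]];
console.log(triangular2D(matrix));                                 // [[7,1,2],[0,4,8],[0,0,3]]
console.log(triangular2D(matrix, { diagonal: 1 }));                // [[0,1,2],[0,0,8],[0,0,0]]
console.log(triangular2D(matrix, { upper: false, diagonal: -1 })); // [[0,0,0],[9,0,0],[2,6,0]]
```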
7.8.47. where
Select the values from the input or the other tensor depending on the corresponding boolean values of the condition tensor. The condition tensor is often the output of one of the element-wise logical operations.
The input tensors will be broadcasted according to [numpy-broadcasting-rule] to the final output shape. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
partial interface MLGraphBuilder {
  MLOperand where(MLOperand condition, MLOperand input, MLOperand other);
};
Arguments:
- condition: an MLOperand. The condition tensor.
- input: an MLOperand. The input tensor from which the value is selected when the condition of the corresponding element is set to true.
- other: an MLOperand. The other tensor from which the value is selected when the condition of the corresponding element is set to false.
Returns: an MLOperand. The output tensor that contains the values selected element-wise from either the input or the other tensor.
The where(condition, input, other) method steps are:
- If condition’s dataType is not equal to "uint8", then throw a TypeError.
- If input’s dataType is not equal to other’s dataType, then throw a TypeError.
- Let descriptor be a new MLOperandDescriptor.
- Set descriptor.dimensions to the result of bidirectionally broadcasting the shapes input’s shape and other’s shape.
- If condition is not bidirectionally broadcastable to descriptor.dimensions, then throw a TypeError.
- Make graph connections:
  - Let output be the result of creating an MLOperand given this and descriptor.
  - Let operator be an operator for the "where" operation, given condition, input and other.
  - Set output.[[operator]] to operator.
  - Set operator’s inputs to condition, input and other.
  - Set operator’s output to output.
- Return output.
The behavior of this operation can be generically emulated from the usage of other operations as follows, although user agents typically have a more efficient implementation. In cases where the underlying platform does not directly support an operation, this decomposition can be used as a template to guide the implementation.
const c = builder.clamp(condition, { 'minValue': 0, 'maxValue': 1 });
builder.add(
    builder.mul(input, builder.cast(c, input.dataType())),
    builder.mul(other, builder.cast(builder.not(c), other.dataType())));
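The element-wise selection itself can be sketched in plain JavaScript for the simple case where all three tensors already have the same number of elements (no broadcasting). The helper name whereFlat is hypothetical; as in the decomposition above, any non-zero condition value is treated as true.

```javascript
// Plain-JavaScript sketch (not a WebNN call): element-wise selection over
// flat arrays of equal length. Broadcasting is deliberately left out.
// whereFlat is a hypothetical helper name used only for illustration.
function whereFlat(condition, input, other) {
  if (condition.length !== input.length || input.length !== other.length) {
    throw new TypeError('this sketch requires arrays of equal length');
  }
  // Non-zero condition selects from input, zero selects from other.
  return Array.from(input, (value, i) => (condition[i] !== 0 ? value : other[i]));
}

const selected = whereFlat(
    new Uint8Array([1, 0, 1, 0]), [10, 20, 30, 40], [-1, -2, -3, -4]);
console.log(selected); // [10, -2, 30, -4]
```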
8. Algorithms
8.1. Broadcasting
Broadcasting refers to how operations treat tensors with different shapes, and follows the precedent set by [numpy-broadcasting-rule].
To unidirectionally broadcast the shapes shapeA and shapeB, perform the following steps. shapeA and shapeB are lists of positive integers, representing the dimensions of tensors, and the steps return a new list of positive integers, or failure.
- Let sizeA be shapeA’s size.
- Let sizeB be shapeB’s size.
- If sizeB > sizeA, then return failure.
- Let paddedB be a clone of shapeB.
- While paddedB’s size is less than sizeA, prepend 1 to paddedB.
- Let outputShape be a new list.
- For each index in the range 0 to sizeA, exclusive:
  - Let dimA be shapeA[index].
  - Let dimB be paddedB[index].
  - If dimA is not equal to dimB and dimA is not equal to 1, then return failure.
  - Append dimA to outputShape.
- Return outputShape.
shapeA is unidirectionally broadcastable to shapeB if unidirectionally broadcasting the shapes shapeA and shapeB does not result in failure.
To bidirectionally broadcast the shapes shapeA and shapeB, perform the following steps. shapeA and shapeB are lists of positive integers, representing the dimensions of tensors, and the steps return a new list of positive integers, or failure.
- Let sizeA be shapeA’s size.
- Let sizeB be shapeB’s size.
- Let outputSize be the maximum of sizeA and sizeB.
- Let paddedA be a clone of shapeA.
- While paddedA’s size is less than outputSize, prepend 1 to paddedA.
- Let paddedB be a clone of shapeB.
- While paddedB’s size is less than outputSize, prepend 1 to paddedB.
- Let outputShape be a new list.
- For each index in the range 0 to outputSize, exclusive:
  - Let dimA be paddedA[index].
  - Let dimB be paddedB[index].
  - If dimA is not equal to dimB, and dimA is not equal to 1, and dimB is not equal to 1, then return failure.
  - Append the maximum of dimA and dimB to outputShape.
- Return outputShape.
shapeA is bidirectionally broadcastable to shapeB if bidirectionally broadcasting the shapes shapeA and shapeB does not result in failure.
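The bidirectional broadcasting steps translate almost line by line into plain JavaScript. In this sketch the function name is chosen for illustration and failure is represented by null, which is an assumption of the example rather than something the specification prescribes.

```javascript
// Plain-JavaScript sketch of the bidirectional broadcasting steps.
// Failure is represented by null in this sketch (an assumption).
function bidirectionallyBroadcastShapes(shapeA, shapeB) {
  const outputSize = Math.max(shapeA.length, shapeB.length);
  // Prepend 1s until both shapes have outputSize dimensions.
  const paddedA = [...Array(outputSize - shapeA.length).fill(1), ...shapeA];
  const paddedB = [...Array(outputSize - shapeB.length).fill(1), ...shapeB];
  const outputShape = [];
  for (let index = 0; index < outputSize; index++) {
    const dimA = paddedA[index];
    const dimB = paddedB[index];
    if (dimA !== dimB && dimA !== 1 && dimB !== 1) {
      return null; // failure: the dimensions are incompatible
    }
    outputShape.push(Math.max(dimA, dimB));
  }
  return outputShape;
}

console.log(bidirectionallyBroadcastShapes([8, 1, 6, 1], [7, 1, 5])); // [8, 7, 6, 5]
console.log(bidirectionallyBroadcastShapes([3], [2, 3]));             // [2, 3]
console.log(bidirectionallyBroadcastShapes([2, 3], [4, 3]));          // null
```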
9. Examples
The following code gets the MLContext object.
const context = await navigator.ml.createContext({ powerPreference: 'low-power' });
Given the following build graph:
constant1 ---+
             +--- Add ---> intermediateOutput1 ---+
input1    ---+                                    |
                                                  +--- Mul---> output
constant2 ---+                                    |
             +--- Add ---> intermediateOutput2 ---+
input2    ---+
The following code implements the graph:
// Use tensors in 4 dimensions.
const TENSOR_DIMS = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = { dataType: 'float32', dimensions: TENSOR_DIMS };

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);
// Compile the constructed graph.
const graph = await builder.build({ 'output': output });
The following code executes the compiled graph.
// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);
const outputBuffer = new Float32Array(TENSOR_SIZE);

// Execute the compiled graph with the specified inputs.
const inputs = {
  'input1': inputBuffer1,
  'input2': inputBuffer2,
};
const outputs = { 'output': outputBuffer };
const result = await context.compute(graph, inputs, outputs);

console.log('Output value: ' + result.outputs.output);
// Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
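The expected output can be cross-checked without a WebNN implementation by evaluating the same arithmetic directly on typed arrays. The following sketch is a plain-JavaScript emulation of the graph, not a use of the API; its variable names are local to the sketch.

```javascript
// Plain-JavaScript emulation of the example graph (no WebNN involved).
const TENSOR_SIZE = 8;
const constant1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const input1 = new Float32Array(TENSOR_SIZE).fill(1);
const input2 = new Float32Array(TENSOR_SIZE).fill(1);

// output = (constant1 + input1) * (constant2 + input2), element-wise.
const expected = Float32Array.from(
    input1, (value, i) => (constant1[i] + value) * (constant2[i] + input2[i]));

console.log('Expected value: ' + expected);
// Expected value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
```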
10. Appendices
10.1. MLOperandDataType and ArrayBufferView compatibility
MLOperandDataType | ArrayBufferView |
---|---|
float32 | Float32Array |
float16 | Float16Array |
int32 | Int32Array |
uint32 | Uint32Array |
int8 | Int8Array |
uint8 | Uint8Array |
Float16Array is at ECMA Stage 3, signaling that its design is finished. Implementers wanting to enable this type ahead of native implementations can emulate the type by passing raw bits via Uint16Array. [Issue webnn#373]
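One way to produce such raw bits is sketched below: each number is converted to its IEEE 754 half-precision bit pattern and stored in a Uint16Array. The helper names float32ToFloat16Bits and toFloat16Bits are hypothetical and not part of this specification, and the conversion (round to nearest, ties to even) is one reasonable choice rather than a mandated one.

```javascript
// Hypothetical helpers (not part of WebNN): convert numbers to the 16 raw
// bits of an IEEE 754 half-precision value, rounding to nearest even.
const f32Scratch = new Float32Array(1);
const u32Scratch = new Uint32Array(f32Scratch.buffer);

function float32ToFloat16Bits(value) {
  f32Scratch[0] = value;
  const bits = u32Scratch[0];
  const sign = (bits >>> 16) & 0x8000;
  const exponent = (bits >>> 23) & 0xff;
  const mantissa = bits & 0x7fffff;

  if (exponent === 0xff) {
    // Infinity or NaN (a mantissa bit is set to keep NaN a NaN).
    return sign | 0x7c00 | (mantissa ? 0x200 : 0);
  }
  const halfExponent = exponent - 127 + 15;
  if (halfExponent >= 0x1f) return sign | 0x7c00; // too large: +/-Infinity
  if (halfExponent <= 0) {
    if (halfExponent < -10) return sign; // too small: +/-0
    // Subnormal half: shift the full 24-bit significand into place.
    const significand = mantissa | 0x800000;
    const shift = 14 - halfExponent;
    let half = significand >>> shift;
    const remainder = significand & ((1 << shift) - 1);
    const halfway = 1 << (shift - 1);
    if (remainder > halfway || (remainder === halfway && (half & 1))) half++;
    return sign | half;
  }
  let half = (halfExponent << 10) | (mantissa >>> 13);
  const remainder = mantissa & 0x1fff;
  if (remainder > 0x1000 || (remainder === 0x1000 && (half & 1))) half++;
  return sign | half;
}

// Pack a list of numbers into a Uint16Array of float16 bit patterns.
function toFloat16Bits(values) {
  return Uint16Array.from(values, (v) => float32ToFloat16Bits(v));
}

console.log(toFloat16Bits([1, -2, 0.5]).map((b) => b)); // Uint16Array [15360, 49152, 14336]
```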
11. Acknowledgements
This specification follows the concepts of the Android Neural Networks API C API.
Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.
Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.
Thanks to Sangwhan Moon and the W3C Technical Architecture Group for review of this specification for web architecture fit, design consistency and developer ergonomics.
Thanks to Zoltan Kis for adding algorithms and making navigating this specification a delightful experience. Thanks to Joshua Bell for aligning the specification with modern editorial conventions. Thanks to Ningxin Hu, Lisha Guo, Shiyi Zou, Mingming Xu, Junwei Fu, Bruce Dai and Bin Miao for careful review and comments.
Thanks to W3C Privacy Interest Group for privacy and security review and feedback.
Thanks to Alex Gough and the Chrome Security team for security review and questions.
Thanks to Michal Karzynski for sharing practical guidelines and learnings from ONNX.
Thanks to Kaustubha Govind and Chrome privacy reviewers for feedback and privacy considerations.
Thanks to Jiewei Qian for Chromium implementation review and feedback.
Thanks to Dwayne Robinson, Joshua Lochner and Wanming Lin for their work investigating and providing recommendations for transformer support. Additional thanks to Dwayne and Wanming for providing reviews of operator conformance and web-platform-tests implementation.
Thanks to Feng Dai for his continuous contributions that keep web-platform-tests evolving alongside the specification.