WebCodecs

W3C Working Draft, 4 May 2022

More details about this document
This version:
https://www.w3.org/TR/2022/WD-webcodecs-20220504/
Latest published version:
https://www.w3.org/TR/webcodecs/
Editor's Draft:
https://w3c.github.io/webcodecs/
Previous Versions:
History:
https://www.w3.org/standards/history/webcodecs
Feedback:
GitHub
Inline In Spec
Editors:
Chris Cunningham (Google Inc.)
Paul Adenot (Mozilla)
Bernard Aboba (Microsoft Corporation)
Participate:
Git Repository.
File an issue.
Version History:
https://github.com/w3c/webcodecs/commits

Abstract

This specification defines interfaces to codecs for encoding and decoding of audio, video, and images.

This specification does not specify or require any particular codec or method of encoding or decoding. The purpose of this specification is to provide JavaScript interfaces to implementations of existing codec technology developed elsewhere. Implementers are free to support any combination of codecs or none at all.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

Feedback and comments on this specification are welcome. GitHub Issues are preferred for discussion on this specification. Alternatively, you can send comments to the Media Working Group’s mailing-list, public-media-wg@w3.org (archives). This draft highlights some of the pending issues that are still to be discussed in the working group. No decision has been taken on the outcome of these issues including whether they are valid.

This document was published by the Media Working Group as a Working Draft using the Recommendation track. This document is intended to become a W3C Recommendation.

Publication as a Working Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 2 November 2021 W3C Process Document.

1. Definitions

Codec

Refers generically to an instance of AudioDecoder, AudioEncoder, VideoDecoder, or VideoEncoder.

Key Chunk

An encoded chunk that does not depend on any other frames for decoding. Also commonly referred to as a "key frame".

Internal Pending Output

Codec outputs such as VideoFrames that currently reside in the internal pipeline of the underlying codec implementation. The underlying codec implementation MAY emit new outputs only when new inputs are provided. The underlying codec implementation MUST emit all outputs in response to a flush.

Codec System Resources

Resources including CPU memory, GPU memory, and exclusive handles to specific decoding/encoding hardware that MAY be allocated by the User Agent as part of codec configuration or generation of AudioData and VideoFrame objects. Such resources MAY be quickly exhausted and SHOULD be released immediately when no longer in use.

Temporal Layer

A grouping of EncodedVideoChunks whose timestamp cadence produces a particular framerate. See scalabilityMode.

Progressive Image

An image that supports decoding to multiple levels of detail, with lower levels becoming available while the encoded data is not yet fully buffered.

Progressive Image Frame Generation

A generational identifier for a given Progressive Image decoded output. Each successive generation adds additional detail to the decoded output. The mechanism for computing a frame’s generation is implementer defined.

Primary Image Track

An image track that is marked by the given image file as being the default track. The mechanism for indicating a primary track is format defined.

RGB Format

A VideoPixelFormat containing red, green, and blue color channels in any order or layout (interleaved or planar), and irrespective of whether an alpha channel is present.

sRGB Color Space

A VideoColorSpaceInit containing «[ "primaries" → bt709, "transfer" → iec61966-2-1, "matrix" → rgb, "fullRange" → true]».

REC709 Color Space

A VideoColorSpaceInit containing «[ "primaries" → bt709, "transfer" → bt709, "matrix" → bt709, "fullRange" → false]».
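
For illustration, the two named color spaces above can be written out as plain objects. This is only a sketch: `ColorSpaceInitSketch`, the constant names, and `sameColorSpace` are hypothetical stand-ins and are not defined by this specification.

```typescript
// Hypothetical stand-in for VideoColorSpaceInit; only the four members
// named in the definitions above are modelled.
interface ColorSpaceInitSketch {
  primaries: string;
  transfer: string;
  matrix: string;
  fullRange: boolean;
}

// sRGB Color Space, member for member as defined above.
const SRGB_COLOR_SPACE: ColorSpaceInitSketch = {
  primaries: "bt709",
  transfer: "iec61966-2-1",
  matrix: "rgb",
  fullRange: true,
};

// REC709 Color Space, member for member as defined above.
const REC709_COLOR_SPACE: ColorSpaceInitSketch = {
  primaries: "bt709",
  transfer: "bt709",
  matrix: "bt709",
  fullRange: false,
};

// Member-wise comparison of two color space descriptions.
function sameColorSpace(a: ColorSpaceInitSketch, b: ColorSpaceInitSketch): boolean {
  return (
    a.primaries === b.primaries &&
    a.transfer === b.transfer &&
    a.matrix === b.matrix &&
    a.fullRange === b.fullRange
  );
}
```

The two color spaces share the same primaries and differ in transfer, matrix, and range.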

2. Codec Processing Model

2.1. Background

This section is non-normative.

The codec interfaces defined by the specification are designed such that new codec tasks can be scheduled while previous tasks are still pending. For example, web authors can call decode() without waiting for a previous decode() to complete. This is achieved by offloading underlying codec tasks to a separate thread for parallel execution.

This section describes threading behaviors as they are visible from the perspective of web authors. Implementers can choose to use more or fewer threads as long as the externally visible behaviors of blocking and sequencing are maintained as follows.

2.2. Control Thread and Codec Thread

All steps in this specification will run on either a control thread or a codec thread.

The control thread is the thread from which authors will construct a codec and invoke its methods. Invoking a codec’s methods will typically result in the creation of control messages which are later executed on the codec thread. Each global object has a separate control thread.

The codec thread is the thread from which a codec will dequeue control messages and execute their steps. Each codec instance has a separate codec thread. The lifetime of a codec thread matches that of its associated codec instance.

The control thread uses a traditional event loop, as described in [HTML].

The codec thread uses a specialized codec processing loop.

Communication from the control thread to the codec thread is done using control message passing. Communication in the other direction is done using regular event loop tasks.

Each codec instance has a single control message queue that is a queue of control messages.

Queuing a control message means enqueuing the message to a codec’s control message queue. Invoking codec methods will often queue a control message to schedule work.

Running a control message means performing a sequence of steps specified by the method that enqueued the message.

The codec processing loop MUST run these steps:

  1. While true:

    1. If the control message queue is empty, continue.

    2. Dequeue front message from the control message queue.

    3. Run control message steps described by front message.
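
The loop above can be sketched in a single-threaded model. This is a minimal illustration and not a User Agent implementation: `ControlMessageQueueSketch` and its method names are hypothetical, control messages are modelled as plain functions, and `drain()` returns once the queue is empty instead of spinning in the `While true` loop.

```typescript
// A control message is modelled as a function holding the steps to run.
type ControlMessageSketch = () => void;

class ControlMessageQueueSketch {
  private messages: ControlMessageSketch[] = [];

  // "Queuing a control message": append to the back of the queue.
  enqueue(message: ControlMessageSketch): void {
    this.messages.push(message);
  }

  // Number of messages that have not been run yet.
  size(): number {
    return this.messages.length;
  }

  // Dequeue the front message and run its steps until the queue is empty.
  // Returns how many messages were run.
  drain(): number {
    let ran = 0;
    while (this.messages.length > 0) {
      const front = this.messages.shift() as ControlMessageSketch;
      front();
      ran++;
    }
    return ran;
  }

  // Drop every message that has not run yet (what a reset would need).
  clear(): void {
    this.messages.length = 0;
  }
}
```

Messages run strictly in the order they were enqueued, which is the sequencing guarantee authors observe regardless of how many threads an implementation uses.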

3. AudioDecoder Interface

[Exposed=(Window,DedicatedWorker), SecureContext]
interface AudioDecoder {
  constructor(AudioDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long decodeQueueSize;

  undefined configure(AudioDecoderConfig config);
  undefined decode(EncodedAudioChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioDecoderSupport> isConfigSupported(AudioDecoderConfig config);
};

dictionary AudioDecoderInit {
  required AudioDataOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback AudioDataOutputCallback = undefined(AudioData output);

3.1. Internal Slots

[[codec implementation]]

Underlying decoder implementation provided by the User Agent.

[[output callback]]

Callback given at construction for decoded outputs.

[[error callback]]

Callback given at construction for decode errors.

[[key chunk required]]

A boolean indicating that the next chunk passed to decode() MUST describe a key chunk as indicated by [[type]].

[[state]]

The current CodecState of this AudioDecoder.

[[decodeQueueSize]]

The number of pending decode requests. This number will decrease as the underlying codec is ready to accept new input.

[[pending flush promises]]

A list of unresolved promises returned by calls to flush().

3.2. Constructors

AudioDecoder(init)
  1. Let d be a new AudioDecoder object.

  2. Assign init.output to [[output callback]].

  3. Assign init.error to [[error callback]].

  4. Assign true to [[key chunk required]].

  5. Assign "unconfigured" to [[state]].

  6. Return d.

3.3. Attributes

state, of type CodecState, readonly
Returns the value of [[state]].
decodeQueueSize, of type unsigned long, readonly
Returns the value of [[decodeQueueSize]].

3.4. Methods

configure(config)
Enqueues a control message to configure the audio decoder for decoding chunks as described by config.

NOTE: This method will trigger a NotSupportedError if the User Agent does not support config. Authors are encouraged to first check support by calling isConfigSupported() with config. User Agents don’t have to support any particular codec type or configuration.

When invoked, run these steps:

  1. If config is not a valid AudioDecoderConfig, throw a TypeError.

  2. If [[state]] is "closed", throw an InvalidStateError.

  3. Set [[state]] to "configured".

  4. Set [[key chunk required]] to true.

  5. Queue a control message to configure the decoder with config.

Running a control message to configure the decoder means running these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close AudioDecoder algorithm with NotSupportedError.

decode(chunk)
Enqueues a control message to decode the given chunk.

When invoked, run these steps:

  1. If [[state]] is not "configured", throw an InvalidStateError.

  2. If [[key chunk required]] is true:

    1. If chunk.[[type]] is not key, throw a DataError.

    2. Implementers SHOULD inspect the chunk’s [[internal data]] to verify that it is truly a key chunk. If a mismatch is detected, throw a DataError.

    3. Otherwise, assign false to [[key chunk required]].

  3. Increment [[decodeQueueSize]].

  4. Queue a control message to decode the chunk.

Running a control message to decode the chunk means performing these steps:

  1. Attempt to use [[codec implementation]] to decode the chunk.

  2. If decoding results in an error, queue a task on the control thread event loop to run the Close AudioDecoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement [[decodeQueueSize]].

  4. Let decoded outputs be a list of decoded audio data outputs emitted by [[codec implementation]].

  5. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output AudioData algorithm with decoded outputs.
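
The checks that decode() performs on the control thread can be sketched as follows. This is an illustration under stated assumptions: `DecodeGateSketch`, `decodeSketchError`, and `chunkAccepted` are hypothetical names, errors are plain Error objects whose name mirrors the DOMException name, and no real decoding takes place.

```typescript
// Hypothetical helper: a plain Error named after the DOMException used
// in the steps above (not a real DOMException).
function decodeSketchError(name: string): Error {
  const error = new Error(name);
  error.name = name;
  return error;
}

type DecodeSketchState = "unconfigured" | "configured" | "closed";

class DecodeGateSketch {
  private state: DecodeSketchState = "unconfigured";
  private keyChunkRequired = true;
  private decodeQueueSize = 0;

  // Mirrors the synchronous part of configure().
  configure(): void {
    if (this.state === "closed") throw decodeSketchError("InvalidStateError");
    this.state = "configured";
    this.keyChunkRequired = true;
  }

  // Mirrors the synchronous part of decode(); only the chunk type matters here.
  decode(chunkType: "key" | "delta"): void {
    if (this.state !== "configured") throw decodeSketchError("InvalidStateError");
    if (this.keyChunkRequired) {
      if (chunkType !== "key") throw decodeSketchError("DataError");
      this.keyChunkRequired = false;
    }
    this.decodeQueueSize++;
    // A real decoder would now queue a control message to decode the chunk.
  }

  // Stand-in for the queued task that decrements [[decodeQueueSize]].
  chunkAccepted(): void {
    if (this.decodeQueueSize > 0) this.decodeQueueSize--;
  }

  queueSize(): number {
    return this.decodeQueueSize;
  }
}
```

After configure(), a delta chunk is rejected until one key chunk has been accepted; later delta chunks pass.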

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If [[state]] is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Set [[key chunk required]] to true.

  3. Let promise be a new Promise.

  4. Queue a control message to flush the codec with promise.

  5. Append promise to [[pending flush promises]].

  6. Return promise.

Running a control message to flush the codec means performing these steps with promise.

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let decoded outputs be a list of decoded audio data outputs emitted by [[codec implementation]].

  3. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output AudioData algorithm with decoded outputs.

  4. Queue a task on the control thread event loop to run these steps:

    1. Remove promise from [[pending flush promises]].

    2. Resolve promise.
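
The bookkeeping around [[pending flush promises]] can be sketched as below. This is a simplified model, not the normative algorithm: `FlushTrackerSketch`, `runQueuedMessages`, and `abortAll` are hypothetical names, the control message queue is a plain array, and the promise is settled synchronously when its message runs instead of through a queued task.

```typescript
// One entry per unresolved promise returned by flush().
interface PendingFlushSketch {
  promise: Promise<void>;
  resolve: () => void;
  reject: (reason: Error) => void;
}

class FlushTrackerSketch {
  private pending: PendingFlushSketch[] = [];
  private queue: Array<() => void> = [];

  // Mirrors flush(): create a promise, queue a message, remember the promise.
  flush(): Promise<void> {
    let resolve: () => void = () => {};
    let reject: (reason: Error) => void = () => {};
    const promise = new Promise<void>((res, rej) => {
      resolve = () => res();
      reject = (reason: Error) => rej(reason);
    });
    const entry: PendingFlushSketch = { promise, resolve, reject };
    this.queue.push(() => {
      // Running the flush message: remove the promise, then resolve it.
      const index = this.pending.indexOf(entry);
      if (index !== -1) this.pending.splice(index, 1);
      entry.resolve();
    });
    this.pending.push(entry);
    return promise;
  }

  // Stand-in for the codec thread running every queued message.
  runQueuedMessages(): void {
    while (this.queue.length > 0) {
      const message = this.queue.shift() as () => void;
      message();
    }
  }

  // What a reset does to flushes: drop queued messages, reject what is pending.
  abortAll(reason: Error): void {
    this.queue.length = 0;
    const toReject = this.pending.splice(0, this.pending.length);
    for (const entry of toReject) entry.reject(reason);
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

Callers of a real codec should attach a rejection handler to the promise returned by flush(), since a later reset() or close() rejects it.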

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset AudioDecoder algorithm with an AbortError DOMException.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close AudioDecoder algorithm with an AbortError DOMException.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the User Agent.

NOTE: The returned AudioDecoderSupport config will contain only the dictionary members that the User Agent recognized. Unrecognized dictionary members will be ignored. Authors can detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid AudioDecoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let decoderSupport be a newly constructed AudioDecoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with decoderSupport.

  5. Return p.
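
The comparison suggested in the NOTE above can be sketched as a pair of helpers. Both are hypothetical: `cloneRecognizedMembers` only imitates a User Agent that copies the members it knows (a shallow copy, which is a simplification of the Clone Configuration algorithm), and `unrecognizedMembers` is the author-side check. The member name `futureOption` in the usage note is invented for the example.

```typescript
// A configuration dictionary modelled as a plain string-keyed object.
type ConfigSketch = { [member: string]: unknown };

// Hypothetical stand-in for the User Agent side: keep recognized members only.
function cloneRecognizedMembers(config: ConfigSketch, recognized: string[]): ConfigSketch {
  const clone: ConfigSketch = {};
  for (const key of Object.keys(config)) {
    if (recognized.indexOf(key) !== -1) clone[key] = config[key];
  }
  return clone;
}

// Author side: members that were provided but are absent from the returned config.
function unrecognizedMembers(provided: ConfigSketch, returned: ConfigSketch): string[] {
  return Object.keys(provided).filter((key) => !(key in returned));
}
```

For a provided config with members codec, sampleRate, numberOfChannels, and futureOption, and a User Agent that recognizes only the first three, `unrecognizedMembers` reports futureOption.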

3.5. Algorithms

Output AudioData (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let data be an AudioData, initialized as follows:

      1. Assign false to [[Detached]].

      2. Let resource be the media resource described by output.

      3. Let resourceReference be a reference to resource.

      4. Assign resourceReference to [[resource reference]].

      5. Let timestamp be the [[timestamp]] of the EncodedAudioChunk associated with output.

      6. Assign timestamp to [[timestamp]].

      7. If output uses a recognized AudioSampleFormat, assign that format to [[format]]. Otherwise, assign null to [[format]].

      8. Assign values to [[sample rate]], [[number of frames]], and [[number of channels]] as determined by output.

    2. Invoke [[output callback]] with data.

Reset AudioDecoder (with exception)
Run these steps:
  1. If [[state]] is "closed", throw an InvalidStateError.

  2. Set [[state]] to "unconfigured".

  3. Signal [[codec implementation]] to cease producing output for the previous configuration.

  4. Remove all control messages from the control message queue.

  5. Set [[decodeQueueSize]] to zero.

  6. For each promise in [[pending flush promises]]:

    1. Reject promise with exception.

    2. Remove promise from [[pending flush promises]].

Close AudioDecoder (with exception)
Run these steps:
  1. Run the Reset AudioDecoder algorithm with exception.

  2. Set [[state]] to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If exception is not an AbortError DOMException, queue a task on the control thread event loop to invoke the [[error callback]] with exception.
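
The state transitions of the Reset and Close algorithms can be sketched as a small state machine. This is an illustration only: `LifecycleSketch` and `lifecycleSketchError` are hypothetical names, exceptions are passed around by name as strings, and the error callback is invoked synchronously here whereas the algorithm above queues a task.

```typescript
// Hypothetical helper: a plain Error named after the DOMException in the steps.
function lifecycleSketchError(name: string): Error {
  const error = new Error(name);
  error.name = name;
  return error;
}

type LifecycleStateSketch = "unconfigured" | "configured" | "closed";

class LifecycleSketch {
  private state: LifecycleStateSketch = "unconfigured";
  private queueSize = 0;

  constructor(private onError: (exceptionName: string) => void) {}

  getState(): LifecycleStateSketch {
    return this.state;
  }

  configure(): void {
    if (this.state === "closed") throw lifecycleSketchError("InvalidStateError");
    this.state = "configured";
  }

  // Reset: refuse when closed, otherwise return to "unconfigured" and clear work.
  reset(exceptionName: string): void {
    if (this.state === "closed") throw lifecycleSketchError("InvalidStateError");
    this.state = "unconfigured";
    this.queueSize = 0;
    // Pending flush promises would be rejected with exceptionName here.
  }

  // Close: reset first, then become "closed"; report anything except AbortError.
  close(exceptionName: string): void {
    this.reset(exceptionName);
    this.state = "closed";
    if (exceptionName !== "AbortError") this.onError(exceptionName);
  }
}
```

A close() requested by the author uses AbortError and stays silent, while a close caused by an unsupported config or a decode failure reaches the error callback.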

4. VideoDecoder Interface

[Exposed=(Window,DedicatedWorker), SecureContext]
interface VideoDecoder {
  constructor(VideoDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long decodeQueueSize;

  undefined configure(VideoDecoderConfig config);
  undefined decode(EncodedVideoChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoDecoderSupport> isConfigSupported(VideoDecoderConfig config);
};

dictionary VideoDecoderInit {
  required VideoFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback VideoFrameOutputCallback = undefined(VideoFrame output);

4.1. Internal Slots

[[codec implementation]]

Underlying decoder implementation provided by the User Agent.

[[output callback]]

Callback given at construction for decoded outputs.

[[error callback]]

Callback given at construction for decode errors.

[[active decoder config]]

The VideoDecoderConfig that is actively applied.

[[key chunk required]]

A boolean indicating that the next chunk passed to decode() MUST describe a key chunk as indicated by type.

[[state]]

The current CodecState of this VideoDecoder.

[[decodeQueueSize]]

The number of pending decode requests. This number will decrease as the underlying codec is ready to accept new input.

[[pending flush promises]]

A list of unresolved promises returned by calls to flush().

4.2. Constructors

VideoDecoder(init)
  1. Let d be a new VideoDecoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign true to [[key chunk required]].

  5. Assign "unconfigured" to [[state]].

  6. Return d.

4.3. Attributes

state, of type CodecState, readonly
Returns the value of [[state]].
decodeQueueSize, of type unsigned long, readonly
Returns the value of [[decodeQueueSize]].

4.4. Methods

configure(config)
Enqueues a control message to configure the video decoder for decoding chunks as described by config.

NOTE: This method will trigger a NotSupportedError if the User Agent does not support config. Authors are encouraged to first check support by calling isConfigSupported() with config. User Agents don’t have to support any particular codec type or configuration.

When invoked, run these steps:

  1. If config is not a valid VideoDecoderConfig, throw a TypeError.

  2. If [[state]] is "closed", throw an InvalidStateError.

  3. Set [[state]] to "configured".

  4. Set [[key chunk required]] to true.

  5. Queue a control message to configure the decoder with config.

Running a control message to configure the decoder means running these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close VideoDecoder algorithm with NotSupportedError and abort these steps.

  4. Set [[active decoder config]] to config.

decode(chunk)
Enqueues a control message to decode the given chunk.

NOTE: Authors are encouraged to call close() on output VideoFrames immediately when frames are no longer needed. The underlying media resources are owned by the VideoDecoder and failing to release them (or waiting for garbage collection) can cause decoding to stall.

NOTE: VideoDecoder requires that frames are output in the order in which they are expected to be presented, commonly known as presentation order. When using some [[codec implementation]]s the User Agent will have to reorder outputs into presentation order.

When invoked, run these steps:

  1. If [[state]] is not "configured", throw an InvalidStateError.

  2. If [[key chunk required]] is true:

    1. If chunk.type is not key, throw a DataError.

    2. Implementers SHOULD inspect the chunk’s [[internal data]] to verify that it is truly a key chunk. If a mismatch is detected, throw a DataError.

    3. Otherwise, assign false to [[key chunk required]].

  3. Increment [[decodeQueueSize]].

  4. Queue a control message to decode the chunk.

Running a control message to decode the chunk means performing these steps:

  1. Attempt to use [[codec implementation]] to decode the chunk.

  2. If decoding results in an error, queue a task on the control thread event loop to run the Close VideoDecoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement [[decodeQueueSize]].

  4. Let decoded outputs be a list of decoded video data outputs emitted by [[codec implementation]] in presentation order.

  5. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output VideoFrames algorithm with decoded outputs.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If [[state]] is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Set [[key chunk required]] to true.

  3. Let promise be a new Promise.

  4. Queue a control message to flush the codec with promise.

  5. Append promise to [[pending flush promises]].

  6. Return promise.

Running a control message to flush the codec means performing these steps with promise.

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let decoded outputs be a list of decoded video data outputs emitted by [[codec implementation]].

  3. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output VideoFrames algorithm with decoded outputs.

  4. Queue a task on the control thread event loop to run these steps:

    1. Remove promise from [[pending flush promises]].

    2. Resolve promise.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset VideoDecoder algorithm with an AbortError DOMException.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close VideoDecoder algorithm with an AbortError DOMException.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the User Agent.

NOTE: The returned VideoDecoderSupport config will contain only the dictionary members that the User Agent recognized. Unrecognized dictionary members will be ignored. Authors can detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid VideoDecoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let decoderSupport be a newly constructed VideoDecoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with decoderSupport.

  5. Return p.

4.5. Algorithms

Output VideoFrames (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let timestamp and duration be the timestamp and duration from the EncodedVideoChunk associated with output.

    2. Let displayAspectWidth and displayAspectHeight be undefined.

    3. If displayAspectWidth and displayAspectHeight exist in the [[active decoder config]], assign their values to displayAspectWidth and displayAspectHeight respectively.

    4. Let colorSpace be the VideoColorSpace for output as detected by the codec implementation. If no VideoColorSpace is detected, let colorSpace be undefined.

      NOTE: The codec implementation can detect a VideoColorSpace by analyzing the bitstream. Detection is made on a best-effort basis. The exact method of detection is implementer defined and codec-specific. Authors can override the detected VideoColorSpace by providing a colorSpace in the VideoDecoderConfig.

    5. If colorSpace exists in the [[active decoder config]], assign its value to colorSpace.

    6. Let frame be the result of running the Create a VideoFrame algorithm with output, timestamp, duration, displayAspectWidth, displayAspectHeight, and colorSpace.

    7. Invoke [[output callback]] with frame.
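
The precedence rules in steps 1 to 5 above can be sketched as a pure function. The names here are hypothetical (`resolveFrameInitSketch` and the three interfaces), and the function only gathers the values that would be handed to the Create a VideoFrame algorithm; it does not create a frame.

```typescript
// Hypothetical, simplified stand-ins for the dictionaries involved.
interface FrameColorSpaceSketch {
  primaries: string;
  transfer: string;
  matrix: string;
  fullRange: boolean;
}

interface ActiveDecoderConfigSketch {
  displayAspectWidth?: number;
  displayAspectHeight?: number;
  colorSpace?: FrameColorSpaceSketch;
}

interface FrameInitSketch {
  timestamp: number;
  duration: number | null;
  displayAspectWidth: number | undefined;
  displayAspectHeight: number | undefined;
  colorSpace: FrameColorSpaceSketch | undefined;
}

function resolveFrameInitSketch(
  chunk: { timestamp: number; duration: number | null },
  detected: FrameColorSpaceSketch | undefined,
  config: ActiveDecoderConfigSketch
): FrameInitSketch {
  // Steps 2 and 3: the display aspect is taken only when both members exist.
  let displayAspectWidth: number | undefined = undefined;
  let displayAspectHeight: number | undefined = undefined;
  if (config.displayAspectWidth !== undefined && config.displayAspectHeight !== undefined) {
    displayAspectWidth = config.displayAspectWidth;
    displayAspectHeight = config.displayAspectHeight;
  }
  // Steps 4 and 5: start from the detected color space; the config overrides it.
  let colorSpace = detected;
  if (config.colorSpace !== undefined) colorSpace = config.colorSpace;
  // Step 1: timestamp and duration come from the associated chunk.
  return {
    timestamp: chunk.timestamp,
    duration: chunk.duration,
    displayAspectWidth,
    displayAspectHeight,
    colorSpace,
  };
}
```

A colorSpace supplied in the config always wins over bitstream detection, which is how authors override a wrong or missing detection.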

Reset VideoDecoder (with exception)
Run these steps:
  1. If [[state]] is "closed", throw an InvalidStateError.

  2. Set [[state]] to "unconfigured".

  3. Signal [[codec implementation]] to cease producing output for the previous configuration.

  4. Remove all control messages from the control message queue.

  5. Set [[decodeQueueSize]] to zero.

  6. For each promise in [[pending flush promises]]:

    1. Reject promise with exception.

    2. Remove promise from [[pending flush promises]].

Close VideoDecoder (with exception)
Run these steps:
  1. Run the Reset VideoDecoder algorithm with exception.

  2. Set [[state]] to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If exception is not an AbortError DOMException, queue a task on the control thread event loop to invoke the [[error callback]] with exception.

5. AudioEncoder Interface

[Exposed=(Window,DedicatedWorker), SecureContext]
interface AudioEncoder {
  constructor(AudioEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long encodeQueueSize;

  undefined configure(AudioEncoderConfig config);
  undefined encode(AudioData data);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioEncoderSupport> isConfigSupported(AudioEncoderConfig config);
};

dictionary AudioEncoderInit {
  required EncodedAudioChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedAudioChunkOutputCallback =
    undefined (EncodedAudioChunk output,
               optional EncodedAudioChunkMetadata metadata = {});

5.1. Internal Slots

[[codec implementation]]
Underlying encoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for encoded outputs.
[[error callback]]
Callback given at construction for encode errors.
[[active encoder config]]
The AudioEncoderConfig that is actively applied.
[[active output config]]
The AudioDecoderConfig that describes how to decode the most recently emitted EncodedAudioChunk.
[[state]]
The current CodecState of this AudioEncoder.
[[encodeQueueSize]]
The number of pending encode requests. This number will decrease as the underlying codec is ready to accept new input.
[[pending flush promises]]
A list of unresolved promises returned by calls to flush().

5.2. Constructors

AudioEncoder(init)
  1. Let e be a new AudioEncoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to [[state]].

  5. Assign null to [[active encoder config]].

  6. Assign null to [[active output config]].

  7. Return e.

5.3. Attributes

state, of type CodecState, readonly
Returns the value of [[state]].
encodeQueueSize, of type unsigned long, readonly
Returns the value of [[encodeQueueSize]].

5.4. Methods

configure(config)
Enqueues a control message to configure the audio encoder for encoding audio data as described by config.

NOTE: This method will trigger a NotSupportedError if the User Agent does not support config. Authors are encouraged to first check support by calling isConfigSupported() with config. User Agents don’t have to support any particular codec type or configuration.

When invoked, run these steps:

  1. If config is not a valid AudioEncoderConfig, throw a TypeError.

  2. If [[state]] is "closed", throw an InvalidStateError.

  3. Set [[state]] to "configured".

  4. Queue a control message to configure the encoder using config.

Running a control message to configure the encoder means performing these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close AudioEncoder algorithm with NotSupportedError and abort these steps.

  4. Assign config to [[active encoder config]].

encode(data)
Enqueues a control message to encode the given data.

When invoked, run these steps:

  1. If the value of data’s [[Detached]] internal slot is true, throw a TypeError.

  2. If [[state]] is not "configured", throw an InvalidStateError.

  3. Let dataClone hold the result of running the Clone AudioData algorithm with data.

  4. Increment [[encodeQueueSize]].

  5. Queue a control message to encode dataClone.

Running a control message to encode the data means performing these steps.

  1. Attempt to use [[codec implementation]] to encode the media resource described by dataClone.

  2. If encoding results in an error, queue a task on the control thread event loop to run the Close AudioEncoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement [[encodeQueueSize]].

  4. Let encoded outputs be a list of encoded audio data outputs emitted by [[codec implementation]].

  5. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedAudioChunks algorithm with encoded outputs.
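
The detached check and the clone taken by encode() can be sketched as follows. This is a simplified model with hypothetical names (`AudioDataSketch`, `EncodeGateSketch`, `encodeSketchError`): the clone shares the sample array with the original, standing in for a second reference to the same media resource, and nothing is actually encoded.

```typescript
// Hypothetical helper: a plain Error named after the exception in the steps.
function encodeSketchError(name: string): Error {
  const error = new Error(name);
  error.name = name;
  return error;
}

// Minimal stand-in for AudioData with a [[Detached]] flag.
class AudioDataSketch {
  private detached = false;

  constructor(private samples: number[], private timestamp: number) {}

  isDetached(): boolean {
    return this.detached;
  }

  getTimestamp(): number {
    return this.timestamp;
  }

  // The clone refers to the same samples; closing one does not close the other.
  clone(): AudioDataSketch {
    if (this.detached) throw encodeSketchError("InvalidStateError");
    return new AudioDataSketch(this.samples, this.timestamp);
  }

  close(): void {
    this.detached = true;
  }
}

class EncodeGateSketch {
  private configured = false;
  private encodeQueueSize = 0;
  private queued: AudioDataSketch[] = [];

  configure(): void {
    this.configured = true;
  }

  // Mirrors the synchronous part of encode().
  encode(data: AudioDataSketch): void {
    if (data.isDetached()) throw encodeSketchError("TypeError");
    if (!this.configured) throw encodeSketchError("InvalidStateError");
    const dataClone = data.clone();
    this.encodeQueueSize++;
    this.queued.push(dataClone); // stands in for queuing a control message
  }

  queueSize(): number {
    return this.encodeQueueSize;
  }

  nextQueued(): AudioDataSketch | undefined {
    return this.queued[0];
  }
}
```

Because the encoder works on its own clone, the caller can close the original data immediately after encode() returns.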

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If [[state]] is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Let promise be a new Promise.

  3. Queue a control message to flush the codec with promise.

  4. Append promise to [[pending flush promises]].

  5. Return promise.

Running a control message to flush the codec means performing these steps with promise.

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let encoded outputs be a list of encoded audio data outputs emitted by [[codec implementation]].

  3. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedAudioChunks algorithm with encoded outputs.

  4. Queue a task on the control thread event loop to run these steps:

    1. Remove promise from [[pending flush promises]].

    2. Resolve promise.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset AudioEncoder algorithm with an AbortError DOMException.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close AudioEncoder algorithm with an AbortError DOMException.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the User Agent.

NOTE: The returned AudioEncoderSupport config will contain only the dictionary members that the User Agent recognized. Unrecognized dictionary members will be ignored. Authors can detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid AudioEncoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let encoderSupport be a newly constructed AudioEncoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with encoderSupport.

  5. Return p.

5.5. Algorithms

Output EncodedAudioChunks (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let chunkInit be an EncodedAudioChunkInit with the following keys:

      1. Let data contain the encoded audio data from output.

      2. Let type be the EncodedAudioChunkType of output.

      3. Let timestamp be the timestamp from the AudioData associated with output.

      4. Let duration be the duration from the AudioData associated with output.

    2. Let chunk be a new EncodedAudioChunk constructed with chunkInit.

    3. Let chunkMetadata be a new EncodedAudioChunkMetadata.

    4. Let encoderConfig be the [[active encoder config]].

    5. Let outputConfig be a new AudioDecoderConfig that describes output. Initialize outputConfig as follows:

      1. Assign encoderConfig.codec to outputConfig.codec.

      2. Assign encoderConfig.sampleRate to outputConfig.sampleRate.

      3. Assign encoderConfig.numberOfChannels to outputConfig.numberOfChannels.

      4. Assign outputConfig.description with a sequence of codec specific bytes as determined by the [[codec implementation]]. The User Agent MUST ensure that the provided description could be used to correctly decode output.

        NOTE: The codec specific requirements for populating the description are described in the [WEBCODECS-CODEC-REGISTRY].

    6. If outputConfig and [[active output config]] are not equal dictionaries:

      1. Assign outputConfig to chunkMetadata.decoderConfig.

      2. Assign outputConfig to [[active output config]].

    7. Invoke [[output callback]] with chunk and chunkMetadata.

Reset AudioEncoder (with exception)
Run these steps:
  1. If [[state]] is "closed", throw an InvalidStateError.

  2. Set [[state]] to "unconfigured".

  3. Set [[active encoder config]] to null.

  4. Set [[active output config]] to null.

  5. Signal [[codec implementation]] to cease producing output for the previous configuration.

  6. Remove all control messages from the control message queue.

  7. Set [[encodeQueueSize]] to zero.

  8. For each promise in [[pending flush promises]]:

    1. Reject promise with exception.

    2. Remove promise from [[pending flush promises]].

Close AudioEncoder (with exception)
Run these steps:
  1. Run the Reset AudioEncoder algorithm with exception.

  2. Set [[state]] to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If exception is not an AbortError DOMException, queue a task on the control thread event loop to invoke the [[error callback]] with exception.

5.6. EncodedAudioChunkMetadata

The following metadata dictionary is emitted by the EncodedAudioChunkOutputCallback alongside an associated EncodedAudioChunk.
dictionary EncodedAudioChunkMetadata {
  AudioDecoderConfig decoderConfig;
};
decoderConfig, of type AudioDecoderConfig

An AudioDecoderConfig that authors MAY use to decode the associated EncodedAudioChunk.
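
The Output EncodedAudioChunks algorithm only attaches decoderConfig when it differs from the previously emitted one, so authors who need a config for every chunk have to remember the last one they saw. The following non-normative sketch shows one way to do that. The factory name createConfigTracker and the onChunk parameter are hypothetical and not part of this specification.

```javascript
// Hypothetical helper (not defined by WebCodecs): wraps an output callback and
// remembers the most recent decoderConfig delivered in chunk metadata.
function createConfigTracker(onChunk) {
  let latestDecoderConfig = null;

  function output(chunk, metadata = {}) {
    // decoderConfig is absent when the config is unchanged since the last chunk.
    if (metadata.decoderConfig !== undefined) {
      latestDecoderConfig = metadata.decoderConfig;
    }
    onChunk(chunk, latestDecoderConfig);
  }

  return { output, getLatestDecoderConfig: () => latestDecoderConfig };
}
```

The returned output function could be passed as the output member of the init dictionary given to the encoder constructor.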

6. VideoEncoder Interface

[Exposed=(Window,DedicatedWorker), SecureContext]
interface VideoEncoder {
  constructor(VideoEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long encodeQueueSize;

  undefined configure(VideoEncoderConfig config);
  undefined encode(VideoFrame frame, optional VideoEncoderEncodeOptions options = {});
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoEncoderSupport> isConfigSupported(VideoEncoderConfig config);
};

dictionary VideoEncoderInit {
  required EncodedVideoChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedVideoChunkOutputCallback =
    undefined (EncodedVideoChunk chunk,
               optional EncodedVideoChunkMetadata metadata = {});

6.1. Internal Slots

[[codec implementation]]
Underlying encoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for encoded outputs.
[[error callback]]
Callback given at construction for encode errors.
[[active encoder config]]
The VideoEncoderConfig that is actively applied.
[[active output config]]
The VideoDecoderConfig that describes how to decode the most recently emitted EncodedVideoChunk.
[[state]]
The current CodecState of this VideoEncoder.
[[encodeQueueSize]]
The number of pending encode requests. This number will decrease as the underlying codec is ready to accept new input.
[[pending flush promises]]
A list of unresolved promises returned by calls to flush().

6.2. Constructors

VideoEncoder(init)
  1. Let e be a new VideoEncoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to [[state]].

  5. Return e.

6.3. Attributes

state, of type CodecState, readonly
Returns the value of [[state]].
encodeQueueSize, of type unsigned long, readonly
Returns the value of [[encodeQueueSize]].

6.4. Methods

configure(config)
Enqueues a control message to configure the video encoder for encoding video frames as described by config.

NOTE: This method will trigger a NotSupportedError if the User Agent does not support config. Authors are encouraged to first check support by calling isConfigSupported() with config. User Agents don’t have to support any particular codec type or configuration.

When invoked, run these steps:

  1. If config is not a valid VideoEncoderConfig, throw a TypeError.

  2. If [[state]] is "closed", throw an InvalidStateError.

  3. Set [[state]] to "configured".

  4. Queue a control message to configure the encoder using config.

Running a control message to configure the encoder means performing these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close VideoEncoder algorithm with NotSupportedError and abort these steps.

  4. Assign config to [[active encoder config]].

encode(frame, options)
Enqueues a control message to encode the given frame.

When invoked, run these steps:

  1. If the value of frame’s [[Detached]] internal slot is true, throw a TypeError.

  2. If [[state]] is not "configured", throw an InvalidStateError.

  3. Let frameClone hold the result of running the Clone VideoFrame algorithm with frame.

  4. Increment [[encodeQueueSize]].

  5. Queue a control message to encode frameClone.

Running a control message to encode the frame means performing these steps.

  1. Attempt to use [[codec implementation]] to encode frameClone according to options.

  2. If encoding results in an error, queue a task on the control thread event loop to run the Close VideoEncoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement [[encodeQueueSize]].

  4. Let encoded outputs be a list of encoded video data outputs emitted by [[codec implementation]].

  5. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedVideoChunks algorithm with encoded outputs.
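
Because encode() operates on a clone of frame, authors can release their own reference as soon as the call returns, and encodeQueueSize can be used to avoid queueing more work than the codec can absorb. The following non-normative sketch illustrates both points. The function name encodeOrDrop and the maxQueueSize threshold are hypothetical, and dropping frames is only one possible policy.

```javascript
// Hypothetical helper (not defined by WebCodecs): submits a frame only when the
// encoder is configured and its queue is at or below maxQueueSize. The caller's
// frame is always closed, since encode() works on its own clone.
function encodeOrDrop(encoder, frame, maxQueueSize, options = {}) {
  const accepted =
    encoder.state === 'configured' && encoder.encodeQueueSize <= maxQueueSize;
  try {
    if (accepted) {
      encoder.encode(frame, options);
    }
  } finally {
    frame.close(); // release the author's reference in every case
  }
  return accepted;
}
```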

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If [[state]] is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Let promise be a new Promise.

  3. Queue a control message to flush the codec with promise.

  4. Append promise to [[pending flush promises]].

  5. Return promise.

Running a control message to flush the codec means performing these steps with promise.

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let encoded outputs be a list of encoded video data outputs emitted by [[codec implementation]].

  3. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedVideoChunks algorithm with encoded outputs.

  4. Queue a task on the control thread event loop to run these steps:

    1. Remove promise from [[pending flush promises]].

    2. Resolve promise.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset VideoEncoder algorithm with an AbortError DOMException.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close VideoEncoder algorithm with an AbortError DOMException.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the User Agent.

NOTE: The returned VideoEncoderSupport config will contain only the dictionary members that the User Agent recognized. Unrecognized dictionary members will be ignored. Authors can detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid VideoEncoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let encoderSupport be a newly constructed VideoEncoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with encoderSupport.

  5. Return p.
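
The comparison mentioned in the NOTE above can be sketched as follows. This is a non-normative illustration that only inspects top-level members. The helper name findUnrecognizedMembers is hypothetical and not part of this specification.

```javascript
// Hypothetical helper (not defined by WebCodecs): lists top-level members the
// author provided that are missing from the config echoed back by
// isConfigSupported().
function findUnrecognizedMembers(providedConfig, returnedConfig) {
  return Object.keys(providedConfig).filter(
    (key) => !Object.prototype.hasOwnProperty.call(returnedConfig, key)
  );
}

// Possible usage inside an async function, where VideoEncoder is available:
//   const support = await VideoEncoder.isConfigSupported(myConfig);
//   const ignored = findUnrecognizedMembers(myConfig, support.config);
```

Members that appear only in the returned config, such as applied defaults, are deliberately not reported.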

6.5. Algorithms

Output EncodedVideoChunks (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let chunkInit be an EncodedVideoChunkInit with the following keys:

      1. Let data contain the encoded video data from output.

      2. Let type be the EncodedVideoChunkType of output.

      3. Let timestamp be the [[timestamp]] from the VideoFrame associated with output.

      4. Let duration be the [[duration]] from the VideoFrame associated with output.

    2. Let chunk be a new EncodedVideoChunk constructed with chunkInit.

    3. Let chunkMetadata be a new EncodedVideoChunkMetadata.

    4. Let encoderConfig be the [[active encoder config]].

    5. Let outputConfig be a VideoDecoderConfig that describes output. Initialize outputConfig as follows:

      1. Assign encoderConfig.codec to outputConfig.codec.

      2. Assign encoderConfig.width to outputConfig.codedWidth.

      3. Assign encoderConfig.height to outputConfig.codedHeight.

      4. Assign encoderConfig.displayWidth to outputConfig.displayAspectWidth.

      5. Assign encoderConfig.displayHeight to outputConfig.displayAspectHeight.

      6. Assign the remaining keys of outputConfig as determined by [[codec implementation]]. The User Agent MUST ensure that the configuration is completely described such that outputConfig could be used to correctly decode output.

        NOTE: The codec specific requirements for populating the description are described in the [WEBCODECS-CODEC-REGISTRY].

    6. If outputConfig and [[active output config]] are not equal dictionaries:

      1. Assign outputConfig to chunkMetadata.decoderConfig.

      2. Assign outputConfig to [[active output config]].

    7. If encoderConfig.scalabilityMode describes multiple temporal layers:

      1. Let svc be a new SvcOutputMetadata instance.

      2. Let temporal_layer_id be the zero-based index describing the temporal layer for output.

      3. Assign temporal_layer_id to svc.temporalLayerId.

      4. Assign svc to chunkMetadata.svc.

    8. If encoderConfig.alpha is set to "keep":

      1. Let alphaSideData be the encoded alpha data in output.

      2. Assign alphaSideData to chunkMetadata.alphaSideData.

    9. Invoke [[output callback]] with chunk and chunkMetadata.

Reset VideoEncoder (with exception)
Run these steps:
  1. If [[state]] is "closed", throw an InvalidStateError.

  2. Set [[state]] to "unconfigured".

  3. Set [[active encoder config]] to null.

  4. Set [[active output config]] to null.

  5. Signal [[codec implementation]] to cease producing output for the previous configuration.

  6. Remove all control messages from the control message queue.

  7. Set [[encodeQueueSize]] to zero.

  8. For each promise in [[pending flush promises]]:

    1. Reject promise with exception.

    2. Remove promise from [[pending flush promises]].

Close VideoEncoder (with exception)
Run these steps:
  1. Run the Reset VideoEncoder algorithm with exception.

  2. Set [[state]] to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If exception is not an AbortError DOMException, queue a task on the control thread event loop to invoke the [[error callback]] with exception.
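
The relationship between the Reset and Close algorithms can be modelled with a small toy class. This is a non-normative sketch and not an encoder implementation: the class name ToyEncoderState and its fields are hypothetical, exceptions are represented by any object with a name property, and the error callback is invoked synchronously here whereas the algorithm above queues a task.

```javascript
// Toy model (not defined by WebCodecs) of the Reset and Close steps.
class ToyEncoderState {
  constructor(errorCallback) {
    this.state = 'unconfigured';
    this.encodeQueueSize = 0;
    this.pendingFlushRejecters = []; // stand-ins for [[pending flush promises]]
    this.errorCallback = errorCallback;
  }

  reset(exception) {
    if (this.state === 'closed') {
      const err = new Error('encoder is closed');
      err.name = 'InvalidStateError';
      throw err;
    }
    this.state = 'unconfigured';
    this.encodeQueueSize = 0;
    // Reject every pending flush with the given exception, then empty the list.
    for (const reject of this.pendingFlushRejecters) reject(exception);
    this.pendingFlushRejecters = [];
  }

  close(exception) {
    this.reset(exception);
    this.state = 'closed';
    // An AbortError means the author asked for the close, so no error is reported.
    if (exception.name !== 'AbortError') this.errorCallback(exception);
  }
}
```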

6.6. EncodedVideoChunkMetadata

The following metadata dictionary is emitted by the EncodedVideoChunkOutputCallback alongside an associated EncodedVideoChunk.
dictionary EncodedVideoChunkMetadata {
  VideoDecoderConfig decoderConfig;
  SvcOutputMetadata svc;
  BufferSource alphaSideData;
};

dictionary SvcOutputMetadata {
  unsigned long temporalLayerId;
};
decoderConfig, of type VideoDecoderConfig

A VideoDecoderConfig that authors MAY use to decode the associated EncodedVideoChunk.

svc, of type SvcOutputMetadata

A collection of metadata describing this EncodedVideoChunk with respect to the configured scalabilityMode.

alphaSideData, of type BufferSource

A BufferSource that contains the EncodedVideoChunk's extra alpha channel data.

temporalLayerId, of type unsigned long

A number that identifies the temporal layer for the associated EncodedVideoChunk.
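
As a non-normative illustration of how temporalLayerId might be consumed, the sketch below decides whether a chunk belongs to the layers an application wants to forward. The function name shouldForwardChunk and the maxTemporalLayerId parameter are hypothetical. The sketch assumes that chunks without svc metadata are always forwarded.

```javascript
// Hypothetical helper (not defined by WebCodecs): returns true when the chunk's
// temporal layer is at or below the requested maximum, or when no layer
// information was provided in the metadata.
function shouldForwardChunk(metadata, maxTemporalLayerId) {
  const svc = metadata ? metadata.svc : undefined;
  if (svc === undefined || svc.temporalLayerId === undefined) {
    return true;
  }
  return svc.temporalLayerId <= maxTemporalLayerId;
}
```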

7. Configurations

7.1. Check Configuration Support (with config)

Run these steps:
  1. If config is an AudioDecoderConfig or VideoDecoderConfig and the User Agent can’t provide a codec that can decode the exact profile (where present), level (where present), and constraint bits (where present) indicated by the codec string in config.codec, return false.

  2. If config is an AudioEncoderConfig or VideoEncoderConfig:

    1. If the codec string in config.codec contains a profile and the User Agent can’t provide a codec that can encode the exact profile indicated by config.codec, return false.

    2. If the codec string in config.codec contains a level and the User Agent can’t provide a codec that can encode to a level less than or equal to the level indicated by config.codec, return false.

    3. If the codec string in config.codec contains constraint bits and the User Agent can’t provide a codec that can produce an encoded bitstream at least as constrained as indicated by config.codec, return false.

  3. If the User Agent can provide a codec to support all entries of the config, including applicable default values for keys that are not included, return true.

    NOTE: The types AudioDecoderConfig, VideoDecoderConfig, AudioEncoderConfig, and VideoEncoderConfig each define their respective configuration entries and defaults.

    NOTE: Support for a given configuration can change dynamically if the hardware is altered (e.g. external GPU unplugged) or if essential hardware resources are exhausted. User Agents describe support on a best-effort basis given the resources that are available at the time of the query.

  4. Otherwise, return false.

7.2. Clone Configuration (with config)

NOTE: This algorithm will copy only the dictionary members that the User Agent recognizes as part of the dictionary type.

Run these steps:

  1. Let dictType be the type of dictionary config.

  2. Let clone be a new empty instance of dictType.

  3. For each dictionary member m defined on dictType:

    1. If m does not exist in config, then continue.

    2. If config[m] is a nested dictionary, set clone[m] to the result of recursively running the Clone Configuration algorithm with config[m].

    3. Otherwise, assign the value of config[m] to clone[m].
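
The steps above can be sketched in script as follows. This is a non-normative illustration. The schema argument is a hypothetical stand-in for the members defined on dictType: it maps each recognized member name to null for a plain value, or to another schema object for a nested dictionary.

```javascript
// Hypothetical sketch (not defined by WebCodecs): copies only the members named
// in schema, recursing into nested dictionaries.
function cloneConfiguration(config, schema) {
  const clone = {};
  for (const member of Object.keys(schema)) {
    // Skip members that do not exist in config.
    if (config[member] === undefined) continue;
    const nestedSchema = schema[member];
    clone[member] =
      nestedSchema !== null
        ? cloneConfiguration(config[member], nestedSchema)
        : config[member];
  }
  return clone;
}
```

Members that are present in config but absent from schema are dropped, which matches the NOTE at the start of this section.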

7.3. Signalling Configuration Support

7.3.1. AudioDecoderSupport

dictionary AudioDecoderSupport {
  boolean supported;
  AudioDecoderConfig config;
};
supported, of type boolean
A boolean indicating whether the corresponding config is supported by the User Agent.
config, of type AudioDecoderConfig
An AudioDecoderConfig used by the User Agent in determining the value of supported.

7.3.2. VideoDecoderSupport

dictionary VideoDecoderSupport {
  boolean supported;
  VideoDecoderConfig config;
};
supported, of type boolean
A boolean indicating whether the corresponding config is supported by the User Agent.
config, of type VideoDecoderConfig
A VideoDecoderConfig used by the User Agent in determining the value of supported.

7.3.3. AudioEncoderSupport

dictionary AudioEncoderSupport {
  boolean supported;
  AudioEncoderConfig config;
};
supported, of type boolean
A boolean indicating whether the corresponding config is supported by the User Agent.
config, of type AudioEncoderConfig
An AudioEncoderConfig used by the User Agent in determining the value of supported.

7.3.4. VideoEncoderSupport

dictionary VideoEncoderSupport {
  boolean supported;
  VideoEncoderConfig config;
};
supported, of type boolean
A boolean indicating whether the corresponding config is supported by the User Agent.
config, of type VideoEncoderConfig
A VideoEncoderConfig used by the User Agent in determining the value of supported.

7.4. Codec String

A codec string describes a given codec format to be used for encoding or decoding.

A valid codec string MUST meet the following conditions.

  1. It is valid per the relevant codec specification (see examples below).

  2. It describes a single codec.

  3. It is unambiguous about codec profile, level, and constraint bits for codecs that define these concepts.

NOTE: In other media specifications, codec strings historically accompanied a MIME type as the "codecs=" parameter (isTypeSupported(), canPlayType()) [RFC6381]. In this specification, encoded media is not containerized; hence, only the value of the codecs parameter is accepted.

NOTE: Encoders for codecs that define level and constraint bits have flexibility around these parameters, but won’t produce bitstreams that have a higher level or are less constrained than requested.

The format and semantics for codec strings are defined by codec registrations listed in the [WEBCODECS-CODEC-REGISTRY]. A compliant implementation MAY support any combination of codec registrations or none at all.

7.5. AudioDecoderConfig

dictionary AudioDecoderConfig {
  required DOMString codec;
  [EnforceRange] required unsigned long sampleRate;
  [EnforceRange] required unsigned long numberOfChannels;
  BufferSource description;
};

To check if an AudioDecoderConfig is a valid AudioDecoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
sampleRate, of type unsigned long
The number of frame samples per second.
numberOfChannels, of type unsigned long
The number of audio channels.
description, of type BufferSource
A sequence of codec specific bytes, commonly known as extradata.

NOTE: The registrations in the [WEBCODECS-CODEC-REGISTRY] describe whether/how to populate this sequence, corresponding to the provided codec.

7.6. VideoDecoderConfig

dictionary VideoDecoderConfig {
  required DOMString codec;
  BufferSource description;
  [EnforceRange] unsigned long codedWidth;
  [EnforceRange] unsigned long codedHeight;
  [EnforceRange] unsigned long displayAspectWidth;
  [EnforceRange] unsigned long displayAspectHeight;
  VideoColorSpaceInit colorSpace;
  HardwareAcceleration hardwareAcceleration = "no-preference";
  boolean optimizeForLatency;
};

To check if a VideoDecoderConfig is a valid VideoDecoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. If one of codedWidth or codedHeight is provided but the other isn’t, return false.

  3. If codedWidth = 0 or codedHeight = 0, return false.

  4. If one of displayAspectWidth or displayAspectHeight is provided but the other isn’t, return false.

  5. If displayAspectWidth = 0 or displayAspectHeight = 0, return false.

  6. Return true.
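
The validity steps above can be sketched in script as follows. This is a non-normative illustration. Codec string validation is delegated to a caller-supplied predicate, isValidCodecString, which is a hypothetical parameter because the rules for codec strings live in the codec registrations.

```javascript
// Hypothetical sketch (not defined by WebCodecs) of the validity check for a
// VideoDecoderConfig-like object.
function isValidVideoDecoderConfig(config, isValidCodecString) {
  const has = (member) => config[member] !== undefined;

  if (!isValidCodecString(config.codec)) return false;
  // Coded dimensions have to be provided together and be non-zero.
  if (has('codedWidth') !== has('codedHeight')) return false;
  if (config.codedWidth === 0 || config.codedHeight === 0) return false;
  // The same applies to the display aspect dimensions.
  if (has('displayAspectWidth') !== has('displayAspectHeight')) return false;
  if (config.displayAspectWidth === 0 || config.displayAspectHeight === 0) return false;
  return true;
}
```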

codec, of type DOMString
Contains a codec string describing the codec.
description, of type BufferSource
A sequence of codec specific bytes, commonly known as extradata.

NOTE: The registrations in the [WEBCODECS-CODEC-REGISTRY] describe whether/how to populate this sequence, corresponding to the provided codec.

codedWidth, of type unsigned long
Width of the VideoFrame in pixels, potentially including non-visible padding, and prior to considering potential ratio adjustments.
codedHeight, of type unsigned long
Height of the VideoFrame in pixels, potentially including non-visible padding, and prior to considering potential ratio adjustments.

NOTE: codedWidth and codedHeight are used when selecting a [[codec implementation]].

displayAspectWidth, of type unsigned long
Horizontal dimension of the VideoFrame’s aspect ratio when displayed.
displayAspectHeight, of type unsigned long
Vertical dimension of the VideoFrame’s aspect ratio when displayed.

NOTE: displayWidth and displayHeight can both be different from displayAspectWidth and displayAspectHeight, but have identical ratios, after scaling is applied when creating the video frame.

colorSpace, of type VideoColorSpaceInit
Configures the VideoFrame.colorSpace for VideoFrames associated with this VideoDecoderConfig. If colorSpace exists, the provided values will override any in-band values from the bitstream.
hardwareAcceleration, of type HardwareAcceleration, defaulting to "no-preference"
Hint that configures hardware acceleration for this codec. See HardwareAcceleration.
optimizeForLatency, of type boolean
Hint that the selected decoder SHOULD be configured to minimize the number of EncodedVideoChunks that have to be decoded before a VideoFrame is output.

NOTE: In addition to User Agent and hardware limitations, some codec bitstreams require a minimum number of inputs before any output can be produced.

7.7. AudioEncoderConfig

dictionary AudioEncoderConfig {
  required DOMString codec;
  [EnforceRange] unsigned long sampleRate;
  [EnforceRange] unsigned long numberOfChannels;
  [EnforceRange] unsigned long long bitrate;
};

NOTE: Codec-specific extensions to AudioEncoderConfig are described in their registrations in the [WEBCODECS-CODEC-REGISTRY].

To check if an AudioEncoderConfig is a valid AudioEncoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
sampleRate, of type unsigned long
The number of frame samples per second.
numberOfChannels, of type unsigned long
The number of audio channels.
bitrate, of type unsigned long long
The average bitrate of the encoded audio given in units of bits per second.

7.8. VideoEncoderConfig

dictionary VideoEncoderConfig {
  required DOMString codec;
  [EnforceRange] required unsigned long width;
  [EnforceRange] required unsigned long height;
  [EnforceRange] unsigned long displayWidth;
  [EnforceRange] unsigned long displayHeight;
  [EnforceRange] unsigned long long bitrate;
  [EnforceRange] double framerate;
  HardwareAcceleration hardwareAcceleration = "no-preference";
  AlphaOption alpha = "discard";
  DOMString scalabilityMode;
  BitrateMode bitrateMode = "variable";
  LatencyMode latencyMode = "quality";
};

NOTE: Codec-specific extensions to VideoEncoderConfig are described in their registrations in the [WEBCODECS-CODEC-REGISTRY].

To check if a VideoEncoderConfig is a valid VideoEncoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. If width = 0 or height = 0, return false.

  3. If displayWidth = 0 or displayHeight = 0, return false.

  4. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
width, of type unsigned long
The encoded width of output EncodedVideoChunks in pixels, prior to any display aspect ratio adjustments.

The encoder MUST scale any VideoFrame whose [[visible width]] differs from this value.

height, of type unsigned long
The encoded height of output EncodedVideoChunks in pixels, prior to any display aspect ratio adjustments.

The encoder MUST scale any VideoFrame whose [[visible height]] differs from this value.

displayWidth, of type unsigned long
The intended display width of output EncodedVideoChunks in pixels. Defaults to width if not present.
displayHeight, of type unsigned long
The intended display height of output EncodedVideoChunks in pixels. Defaults to height if not present.
NOTE: Providing a displayWidth or displayHeight that differs from width and height signals that chunks are to be scaled after decoding to arrive at the final display aspect ratio.

For many codecs this is merely pass-through information, but some codecs can sometimes include display sizing in the bitstream.
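
The defaulting behaviour of the display dimensions can be sketched as follows. This is a non-normative illustration that assumes displayWidth falls back to width and displayHeight falls back to height. The function name resolveDisplaySize is hypothetical.

```javascript
// Hypothetical helper (not defined by WebCodecs): returns the effective display
// size for a VideoEncoderConfig-like object, applying the assumed defaults.
function resolveDisplaySize(config) {
  return {
    displayWidth: config.displayWidth ?? config.width,
    displayHeight: config.displayHeight ?? config.height,
  };
}
```

For instance, a config with width 720, height 576 and displayWidth 1024 resolves to a display size of 1024 by 576, signalling that decoded frames are to be stretched horizontally.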

bitrate, of type unsigned long long
The average bitrate of the encoded video given in units of bits per second.

NOTE: Authors are encouraged to additionally provide a framerate to inform rate control.

framerate, of type double
The expected frame rate in frames per second, if known. This value, along with the frame timestamp, SHOULD be used by the video encoder to calculate the optimal byte length for each encoded frame. Additionally, the value SHOULD be considered a target deadline for outputting encoded chunks when latencyMode is set to "realtime".
hardwareAcceleration, of type HardwareAcceleration, defaulting to "no-preference"
Hint that configures hardware acceleration for this codec. See HardwareAcceleration.
alpha, of type AlphaOption, defaulting to "discard"
Whether the alpha component of the VideoFrame inputs SHOULD be kept or discarded prior to encoding. If alpha is equal to discard, alpha data is always discarded, regardless of a VideoFrame's [[format]].
scalabilityMode, of type DOMString
An encoding scalability mode identifier as defined by [WebRTC-SVC].
bitrateMode, of type BitrateMode, defaulting to "variable"
Configures encoding to use a constant or variable bitrate as defined by [MEDIASTREAM-RECORDING].

NOTE: The precise degree of bitrate fluctuation in either mode is implementation defined.

latencyMode, of type LatencyMode, defaulting to "quality"
Configures latency related behaviors for this codec. See LatencyMode.
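
As a non-normative illustration, a VideoEncoderConfig for a low-latency use case might look like the object below. All values are illustrative assumptions rather than recommendations, and whether a User Agent supports this configuration can only be determined by calling isConfigSupported().

```javascript
// Illustrative values only; support varies by User Agent and device.
const exampleVideoEncoderConfig = {
  codec: 'vp8',
  width: 1280,
  height: 720,
  bitrate: 2000000, // bits per second
  framerate: 30, // frames per second, informs rate control
  hardwareAcceleration: 'no-preference',
  bitrateMode: 'variable',
  latencyMode: 'realtime',
};
```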

7.9. Hardware Acceleration

enum HardwareAcceleration {
  "no-preference",
  "prefer-hardware",
  "prefer-software",
};

When supported, hardware acceleration offloads encoding or decoding to specialized hardware. prefer-hardware and prefer-software are hints. While User Agents SHOULD respect these values when possible, User Agents may ignore these values in some or all circumstances for any reason.

To prevent fingerprinting, if a User Agent implements [media-capabilities], the User Agent MUST ensure rejection or acceptance of a given HardwareAcceleration preference reveals no additional information on top of what is inherent to the User Agent and revealed by [media-capabilities]. If a User Agent does not implement [media-capabilities] for reasons of fingerprinting, they SHOULD ignore the HardwareAcceleration preference.

NOTE: Good examples of when a User Agent can ignore prefer-hardware or prefer-software are for reasons of user privacy or circumstances where the User Agent determines an alternative setting would better serve the end user.

Most authors will be best served by using the default of no-preference. This gives the User Agent flexibility to optimize based on its knowledge of the system and configuration. A common strategy will be to prioritize hardware acceleration at higher resolutions with a fallback to software codecs if hardware acceleration fails.

Authors are encouraged to carefully weigh the tradeoffs when setting a hardware acceleration preference. The precise tradeoffs will be device-specific, but authors can generally expect the following:

Given these tradeoffs, a good example of using "prefer-hardware" would be if an author intends to provide their own software based fallback via WebAssembly.

Alternatively, a good example of using "prefer-software" would be if an author is especially sensitive to the higher startup latency or decreased robustness generally associated with hardware acceleration.

no-preference
Indicates that the User Agent MAY use hardware acceleration if it is available and compatible with other aspects of the codec configuration.
prefer-software
Indicates that the User Agent SHOULD prefer a software codec implementation. User Agents may ignore this value for any reason.

NOTE: This can cause the configuration to be unsupported on platforms where an unaccelerated codec is unavailable or is incompatible with other aspects of the codec configuration.

prefer-hardware
Indicates that the User Agent SHOULD prefer hardware acceleration. User Agents may ignore this value for any reason.

NOTE: This can cause the configuration to be unsupported on platforms where an accelerated codec is unavailable or is incompatible with other aspects of the codec configuration.

7.10. Alpha Option

enum AlphaOption {
  "keep",
  "discard",
};

Describes how the user agent SHOULD behave when dealing with alpha channels, for a variety of different operations.

keep
Indicates that the user agent SHOULD preserve alpha channel data for VideoFrames, if it is present.
discard
Indicates that the user agent SHOULD ignore or remove VideoFrame's alpha channel data.

7.11. Latency Mode

enum LatencyMode {
  "quality",
  "realtime"
};
quality

Indicates that the User Agent SHOULD optimize for encoding quality. In this mode:

  • User Agents MAY increase encoding latency to improve quality.

  • User Agents MUST NOT drop frames to achieve the target bitrate and/or framerate.

  • framerate SHOULD NOT be used as a target deadline for emitting encoded chunks.

realtime

Indicates that the User Agent SHOULD optimize for low latency. In this mode:

  • User Agents MAY sacrifice quality to improve latency.

  • User Agents MAY drop frames to achieve the target bitrate and/or framerate.

  • framerate SHOULD be used as a target deadline for emitting encoded chunks.

7.12. Configuration Equivalence

Two dictionaries are equal dictionaries if they contain the same keys and values. For nested dictionaries, apply this definition recursively.
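
The definition above can be sketched in script as follows. This is a non-normative illustration for dictionaries represented as plain objects. The function names are hypothetical. Buffer-valued members such as description are compared by identity here, which is a simplifying assumption.

```javascript
// Hypothetical sketch (not defined by WebCodecs): true when both objects have
// the same keys and values, recursing into nested plain objects.
function equalDictionaries(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every((key) => {
    if (!Object.prototype.hasOwnProperty.call(b, key)) return false;
    const valueA = a[key];
    const valueB = b[key];
    if (isNestedDictionary(valueA) && isNestedDictionary(valueB)) {
      return equalDictionaries(valueA, valueB);
    }
    return valueA === valueB;
  });
}

// Treats buffers and views as opaque values rather than nested dictionaries.
function isNestedDictionary(value) {
  return (
    typeof value === 'object' &&
    value !== null &&
    !ArrayBuffer.isView(value) &&
    !(value instanceof ArrayBuffer)
  );
}
```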

7.13. VideoEncoderEncodeOptions

dictionary VideoEncoderEncodeOptions {
  boolean keyFrame = false;
};
keyFrame, of type boolean, defaulting to false
A value of true indicates that the given frame MUST be encoded as a key frame. A value of false indicates that the User Agent has flexibility to decide whether the frame will be encoded as a key frame.
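
As a non-normative illustration, an author who wants a key frame at a fixed cadence could derive the options for each frame as sketched below. The function name encodeOptionsFor and the keyFrameInterval parameter are hypothetical. The User Agent remains free to emit additional key frames.

```javascript
// Hypothetical helper (not defined by WebCodecs): requests a key frame for every
// keyFrameInterval-th frame, starting with frame index 0.
function encodeOptionsFor(frameIndex, keyFrameInterval) {
  return { keyFrame: frameIndex % keyFrameInterval === 0 };
}

// Possible usage, where encoder is a configured VideoEncoder:
//   encoder.encode(frame, encodeOptionsFor(frameIndex, 150));
```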

7.14. CodecState

enum CodecState {
  "unconfigured",
  "configured",
  "closed"
};
unconfigured
The codec is not configured for encoding or decoding.
configured
A valid configuration has been provided. The codec is ready for encoding or decoding.
closed
The codec is no longer usable and underlying system resources have been released.

7.15. WebCodecsErrorCallback

callback WebCodecsErrorCallback = undefined(DOMException error);

8. Encoded Media Interfaces (Chunks)

These interfaces represent chunks of encoded media.

8.1. EncodedAudioChunk Interface

[Exposed=(Window,DedicatedWorker)]
interface EncodedAudioChunk {
  constructor(EncodedAudioChunkInit init);
  readonly attribute EncodedAudioChunkType type;
  readonly attribute long long timestamp;          // microseconds
  readonly attribute unsigned long long? duration; // microseconds
  readonly attribute unsigned long byteLength;

  undefined copyTo([AllowShared] BufferSource destination);
};

dictionary EncodedAudioChunkInit {
  required EncodedAudioChunkType type;
  [EnforceRange] required long long timestamp;    // microseconds
  [EnforceRange] unsigned long long duration;     // microseconds
  required BufferSource data;
};

enum EncodedAudioChunkType {
    "key",
    "delta",
};

8.1.1. Internal Slots

[[internal data]]

An array of bytes representing the encoded chunk data.

[[type]]

Describes whether the chunk is a key chunk.

[[timestamp]]

The presentation timestamp, given in microseconds.

[[duration]]

The presentation duration, given in microseconds.

[[byte length]]

The byte length of [[internal data]].

8.1.2. Constructors

EncodedAudioChunk(init)
  1. Let chunk be a new EncodedAudioChunk object, initialized as follows:

    1. Assign init.type to [[type]].

    2. Assign init.timestamp to [[timestamp]].

    3. If init.duration exists, assign it to [[duration]], or assign null otherwise.

    4. Assign a copy of init.data to [[internal data]].

    5. Assign init.data.byteLength to [[byte length]].

  2. Return chunk.

8.1.3. Attributes

type, of type EncodedAudioChunkType, readonly

Returns the value of [[type]].

timestamp, of type long long, readonly

Returns the value of [[timestamp]].

duration, of type unsigned long long, readonly, nullable

Returns the value of [[duration]].

byteLength, of type unsigned long, readonly

Returns the value of [[byte length]].

8.1.4. Methods

copyTo(destination)

When invoked, run these steps:

  1. If the [[byte length]] of this EncodedAudioChunk is greater than the byte length of destination, throw a TypeError.

  2. Copy the [[internal data]] into destination.
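
EXAMPLE (non-normative): The constructor and copyTo() steps above can be sketched as a small plain-JavaScript model. EncodedChunkModel and toBytes are hypothetical names used only for illustration; they are not part of this specification, and a real User Agent is free to implement the steps differently.

```javascript
// Illustrative model of the constructor and copyTo() steps.
// EncodedChunkModel and toBytes are hypothetical names, not WebCodecs APIs.
class EncodedChunkModel {
  constructor(init) {
    this.type = init.type;
    this.timestamp = init.timestamp;
    // duration is optional in the init dictionary; when absent it becomes null.
    this.duration = init.duration === undefined ? null : init.duration;
    // Keep a private copy so later writes to init.data do not affect the chunk.
    this._data = toBytes(init.data).slice();
    this.byteLength = this._data.byteLength;
  }

  copyTo(destination) {
    const dest = toBytes(destination);
    // Step 1: the destination must be able to hold the whole chunk.
    if (this.byteLength > dest.byteLength) {
      throw new TypeError("destination is smaller than the chunk");
    }
    // Step 2: copy the internal data into the destination.
    dest.set(this._data);
  }
}

// View a BufferSource (ArrayBuffer or ArrayBufferView) as bytes without copying.
function toBytes(source) {
  return ArrayBuffer.isView(source)
    ? new Uint8Array(source.buffer, source.byteOffset, source.byteLength)
    : new Uint8Array(source);
}
```

Sizing the destination from byteLength before calling copyTo() avoids the TypeError path.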

8.2. EncodedVideoChunk Interface

[Exposed=(Window,DedicatedWorker)]
interface EncodedVideoChunk {
  constructor(EncodedVideoChunkInit init);
  readonly attribute EncodedVideoChunkType type;
  readonly attribute long long timestamp;             // microseconds
  readonly attribute unsigned long long? duration;    // microseconds
  readonly attribute unsigned long byteLength;

  undefined copyTo([AllowShared] BufferSource destination);
};

dictionary EncodedVideoChunkInit {
  required EncodedVideoChunkType type;
  [EnforceRange] required long long timestamp;        // microseconds
  [EnforceRange] unsigned long long duration;         // microseconds
  required BufferSource data;
};

enum EncodedVideoChunkType {
    "key",
    "delta",
};

8.2.1. Internal Slots

[[internal data]]

An array of bytes representing the encoded chunk data.

[[type]]

The EncodedVideoChunkType of this EncodedVideoChunk.

[[timestamp]]

The presentation timestamp, given in microseconds.

[[duration]]

The presentation duration, given in microseconds.

[[byte length]]

The byte length of [[internal data]].

8.2.2. Constructors

EncodedVideoChunk(init)
  1. Let chunk be a new EncodedVideoChunk object, initialized as follows:

    1. Assign init.type to [[type]].

    2. Assign init.timestamp to [[timestamp]].

    3. If duration is present in init, assign init.duration to [[duration]]. Otherwise, assign null to [[duration]].

    4. Assign a copy of init.data to [[internal data]].

    5. Assign init.data.byteLength to [[byte length]].

  2. Return chunk.

8.2.3. Attributes

type, of type EncodedVideoChunkType, readonly

Returns the value of [[type]].

timestamp, of type long long, readonly

Returns the value of [[timestamp]].

duration, of type unsigned long long, readonly, nullable

Returns the value of [[duration]].

byteLength, of type unsigned long, readonly

Returns the value of [[byte length]].

8.2.4. Methods

copyTo(destination)

When invoked, run these steps:

  1. If [[byte length]] is greater than the byte length of destination, throw a TypeError.

  2. Copy the [[internal data]] into destination.

9. Raw Media Interfaces

These interfaces represent unencoded (raw) media.

9.1. Memory Model

9.1.1. Background

This section is non-normative.

Decoded media data MAY occupy a large amount of system memory. To minimize the need for expensive copies, this specification defines a scheme for reference counting (clone() and close()).

NOTE: Authors are encouraged to call close() immediately when frames are no longer needed.

9.1.2. Reference Counting

A media resource is storage for the actual pixel data or the audio sample data described by a VideoFrame or AudioData.

The AudioData [[resource reference]] and VideoFrame [[resource reference]] internal slots hold a reference to a media resource.

VideoFrame.clone() and AudioData.clone() return new objects whose [[resource reference]] points to the same media resource as the original object.

VideoFrame.close() and AudioData.close() will clear their [[resource reference]] slot, releasing the reference to their media resource.

A media resource MUST remain alive at least as long as it continues to be referenced by a [[resource reference]].

NOTE: When a media resource is no longer referenced by a [[resource reference]], the resource can be destroyed. User Agents are encouraged to destroy such resources quickly to reduce memory pressure and facilitate resource reuse.
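
EXAMPLE (non-normative): The reference counting scheme can be modelled in a few lines of plain JavaScript. MediaResourceModel and FrameModel are hypothetical names for this sketch only; the explicit counter is an assumption about one possible implementation, since this specification only requires that a resource stays alive while it is referenced.

```javascript
// Illustrative model: MediaResourceModel and FrameModel are hypothetical names.
class MediaResourceModel {
  constructor(bytes) {
    this.bytes = bytes;
    this.refCount = 0;
    this.destroyed = false;
  }
  addRef() {
    this.refCount += 1;
    return this;
  }
  release() {
    this.refCount -= 1;
    // Once nothing references the resource, it can be destroyed.
    if (this.refCount === 0) {
      this.bytes = null;
      this.destroyed = true;
    }
  }
}

class FrameModel {
  constructor(resource) {
    // Stands in for the [[resource reference]] internal slot.
    this.resourceReference = resource.addRef();
  }
  clone() {
    if (this.resourceReference === null) {
      throw new Error("InvalidStateError: object is closed");
    }
    // New object, same media resource: no pixel or sample data is copied.
    return new FrameModel(this.resourceReference);
  }
  close() {
    // Closing twice is harmless; the reference is released only once.
    if (this.resourceReference !== null) {
      this.resourceReference.release();
      this.resourceReference = null;
    }
  }
}
```

In this model the resource is destroyed only after both the original and every clone have been closed, which is why authors are encouraged to call close() promptly.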

9.1.3. Transfer and Serialization

This section is non-normative.

AudioData and VideoFrame are both transferable and serializable objects. Their transfer and serialization steps are defined in § 9.2.6 Transfer and Serialization and § 9.4.7 Transfer and Serialization respectively.

Transferring an AudioData or VideoFrame moves its [[resource reference]] to the destination object and closes (as in close()) the source object. Authors MAY use this facility to move an AudioData or VideoFrame between realms without copying the underlying media resource.

Serializing an AudioData or VideoFrame effectively clones (as in clone()) the source object, resulting in two objects that reference the same media resource. Authors MAY use this facility to clone an AudioData or VideoFrame to another realm without copying the underlying media resource.

9.2. AudioData Interface

[Exposed=(Window,DedicatedWorker), Serializable, Transferable]
interface AudioData {
  constructor(AudioDataInit init);

  readonly attribute AudioSampleFormat? format;
  readonly attribute float sampleRate;
  readonly attribute unsigned long numberOfFrames;
  readonly attribute unsigned long numberOfChannels;
  readonly attribute unsigned long long duration;  // microseconds
  readonly attribute long long timestamp;          // microseconds

  unsigned long allocationSize(AudioDataCopyToOptions options);
  undefined copyTo([AllowShared] BufferSource destination, AudioDataCopyToOptions options);
  AudioData clone();
  undefined close();
};

dictionary AudioDataInit {
  required AudioSampleFormat format;
  required float sampleRate;
  [EnforceRange] required unsigned long numberOfFrames;
  [EnforceRange] required unsigned long numberOfChannels;
  [EnforceRange] required long long timestamp;  // microseconds
  required BufferSource data;
};

9.2.1. Internal Slots

[[resource reference]]

A reference to a media resource that stores the audio sample data for this AudioData.

[[format]]

The AudioSampleFormat used by this AudioData. Will be null whenever the underlying format does not map to an AudioSampleFormat or when [[Detached]] is true.

[[sample rate]]

The sample-rate, in Hz, for this AudioData.

[[number of frames]]

The number of frames for this AudioData.

[[number of channels]]

The number of audio channels for this AudioData.

[[timestamp]]

The presentation timestamp, in microseconds, for this AudioData.

9.2.2. Constructors

AudioData(init)
  1. If init is not a valid AudioDataInit, throw a TypeError.

  2. Let frame be a new AudioData object, initialized as follows:

    1. Assign false to [[Detached]].

    2. Assign init.format to [[format]].

    3. Assign init.sampleRate to [[sample rate]].

    4. Assign init.numberOfFrames to [[number of frames]].

    5. Assign init.numberOfChannels to [[number of channels]].

    6. Assign init.timestamp to [[timestamp]].

    7. Let resource be a media resource containing a copy of init.data.

    8. Let resourceReference be a reference to resource.

    9. Assign resourceReference to [[resource reference]].

  3. Return frame.

9.2.3. Attributes

format, of type AudioSampleFormat, readonly, nullable

The AudioSampleFormat used by this AudioData. Will be null whenever the underlying format does not map to an AudioSampleFormat or when [[Detached]] is true.

The format getter steps are to return [[format]].

sampleRate, of type float, readonly

The sample-rate, in Hz, for this AudioData.

The sampleRate getter steps are to return [[sample rate]].

numberOfFrames, of type unsigned long, readonly

The number of frames for this AudioData.

The numberOfFrames getter steps are to return [[number of frames]].

numberOfChannels, of type unsigned long, readonly

The number of audio channels for this AudioData.

The numberOfChannels getter steps are to return [[number of channels]].

timestamp, of type long long, readonly

The presentation timestamp, in microseconds, for this AudioData.

The timestamp getter steps are to return [[timestamp]].

duration, of type unsigned long long, readonly

The duration, in microseconds, for this AudioData.

The duration getter steps are to:

  1. Let microsecondsPerSecond be 1,000,000.

  2. Let durationInSeconds be the result of dividing [[number of frames]] by [[sample rate]].

  3. Return the product of durationInSeconds and microsecondsPerSecond.
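
EXAMPLE (non-normative): The duration getter steps amount to a single division and multiplication. The function name below is hypothetical. The steps above do not state a rounding rule, so this sketch returns the unrounded product.

```javascript
// Hypothetical helper mirroring the three duration getter steps.
function audioDataDurationMicros(numberOfFrames, sampleRate) {
  const microsecondsPerSecond = 1_000_000;
  const durationInSeconds = numberOfFrames / sampleRate;
  return durationInSeconds * microsecondsPerSecond;
}

// One second of audio: 48000 frames at 48000 Hz.
const oneSecond = audioDataDurationMicros(48000, 48000);
```

For instance, 480 frames at 48000 Hz correspond to roughly 10 milliseconds, that is about 10000 microseconds.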

9.2.4. Methods

allocationSize(options)

Returns the number of bytes required to hold the samples as described by options.

When invoked, run these steps:

  1. If [[Detached]] is true, throw an InvalidStateError DOMException.

  2. Let copyElementCount be the result of running the Compute Copy Element Count algorithm with options.

  3. Let destFormat be the value of [[format]].

  4. If options.format exists, assign options.format to destFormat.

  5. Let bytesPerSample be the number of bytes per sample, as defined by the destFormat.

  6. Return the product of multiplying bytesPerSample by copyElementCount.

copyTo(destination, options)

Copies the samples from the specified plane of the AudioData to the destination buffer.

When invoked, run these steps:

  1. If [[Detached]] is true, throw an InvalidStateError DOMException.

  2. Let copyElementCount be the result of running the Compute Copy Element Count algorithm with options.

  3. Let destFormat be the value of [[format]].

  4. If options.format exists, assign options.format to destFormat.

  5. Let bytesPerSample be the number of bytes per sample, as defined by the destFormat.

  6. If the product of multiplying bytesPerSample by copyElementCount is greater than destination.byteLength, throw a RangeError.

  7. Let resource be the media resource referenced by [[resource reference]].

  8. Let planeFrames be the region of resource corresponding to options.planeIndex.

  9. Copy elements of planeFrames into destination, starting with the frame positioned at options.frameOffset and stopping after copyElementCount samples have been copied. If destFormat does not equal [[format]], convert elements to the destFormat AudioSampleFormat while making the copy.

clone()

Creates a new AudioData with a reference to the same media resource.

When invoked, run these steps:

  1. If [[Detached]] is true, throw an InvalidStateError DOMException.

  2. Return the result of running the Clone AudioData algorithm with this.

close()

Clears all state and releases the reference to the media resource. Close is final.

When invoked, run the Close AudioData algorithm with this.

9.2.5. Algorithms

Compute Copy Element Count (with options)

Run these steps:

  1. Let destFormat be the value of [[format]].

  2. If options.format exists, assign options.format to destFormat.

  3. If destFormat describes an interleaved AudioSampleFormat and options.planeIndex is greater than 0, throw a RangeError.

  4. Otherwise, if destFormat describes a planar AudioSampleFormat and if options.planeIndex is greater than or equal to [[number of channels]], throw a RangeError.

  5. If [[format]] does not equal destFormat and the User Agent does not support the requested AudioSampleFormat conversion, throw a NotSupportedError DOMException. Conversion to f32-planar MUST always be supported.

  6. Let frameCount be the number of frames in the plane identified by options.planeIndex.

  7. If options.frameOffset is greater than or equal to frameCount, throw a RangeError.

  8. Let copyFrameCount be the difference of subtracting options.frameOffset from frameCount.

  9. If options.frameCount exists:

    1. If options.frameCount is greater than copyFrameCount, throw a RangeError.

    2. Otherwise, assign options.frameCount to copyFrameCount.

  10. Let elementCount be copyFrameCount.

  11. If destFormat describes an interleaved AudioSampleFormat, multiply elementCount by [[number of channels]].

  12. Return elementCount.
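
EXAMPLE (non-normative): The following sketch models the Compute Copy Element Count steps together with the size computation used by allocationSize(). The data argument is a plain object standing in for the AudioData internal slots, and all function and constant names are hypothetical. The conversion support check of step 5 is left out, so the sketch assumes every requested conversion is supported.

```javascript
// Bytes per sample for each AudioSampleFormat, derived from the sample types.
const BYTES_PER_SAMPLE = {
  "u8": 1, "s16": 2, "s32": 4, "f32": 4,
  "u8-planar": 1, "s16-planar": 2, "s32-planar": 4, "f32-planar": 4,
};

const isPlanar = (format) => format.endsWith("-planar");

// data: { format, numberOfFrames, numberOfChannels } (stand-in for internal slots).
function computeCopyElementCount(data, options) {
  const destFormat = options.format ?? data.format;
  if (!isPlanar(destFormat) && options.planeIndex > 0) {
    throw new RangeError("interleaved formats have a single plane");
  }
  if (isPlanar(destFormat) && options.planeIndex >= data.numberOfChannels) {
    throw new RangeError("planeIndex is out of range");
  }
  // Step 5 (conversion support) is omitted: assumed to always succeed here.
  const frameCount = data.numberOfFrames;
  const frameOffset = options.frameOffset ?? 0;
  if (frameOffset >= frameCount) {
    throw new RangeError("frameOffset is out of range");
  }
  let copyFrameCount = frameCount - frameOffset;
  if (options.frameCount !== undefined) {
    if (options.frameCount > copyFrameCount) {
      throw new RangeError("frameCount exceeds the available frames");
    }
    copyFrameCount = options.frameCount;
  }
  // Interleaved destinations hold one element per channel for every frame.
  return isPlanar(destFormat)
    ? copyFrameCount
    : copyFrameCount * data.numberOfChannels;
}

// Size in bytes that a destination buffer needs for the described copy.
function allocationSizeModel(data, options) {
  const destFormat = options.format ?? data.format;
  return BYTES_PER_SAMPLE[destFormat] * computeCopyElementCount(data, options);
}
```

With a stereo f32 source of 480 frames, copying plane 0 in the source format covers 960 elements, while requesting f32-planar covers 480 elements per plane.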

Clone AudioData (with data)

Run these steps:

  1. Let clone be a new AudioData initialized as follows:

    1. Let resource be the media resource referenced by data’s [[resource reference]].

    2. Let reference be a new reference to resource.

    3. Assign reference to [[resource reference]].

    4. Assign the values of data’s [[Detached]], [[format]], [[sample rate]], [[number of frames]], [[number of channels]], and [[timestamp]] slots to the corresponding slots in clone.

  2. Return clone.

Close AudioData (with data)

Run these steps:

  1. Assign true to data’s [[Detached]] internal slot.

  2. Assign null to data’s [[resource reference]].

  3. Assign 0 to data’s [[sample rate]].

  4. Assign 0 to data’s [[number of frames]].

  5. Assign 0 to data’s [[number of channels]].

  6. Assign null to data’s [[format]].

To check if an AudioDataInit is a valid AudioDataInit, run these steps:
  1. If sampleRate is less than or equal to 0, return false.

  2. If numberOfFrames = 0, return false.

  3. If numberOfChannels = 0, return false.

  4. Verify data has enough data by running the following steps:

    1. Let totalSamples be the product of multiplying numberOfFrames by numberOfChannels.

    2. Let bytesPerSample be the number of bytes per sample, as defined by the format.

    3. Let totalSize be the product of multiplying bytesPerSample with totalSamples.

    4. Let dataSize be the size in bytes of data.

    5. If dataSize is less than totalSize, return false.

  5. Return true.

Note: It’s expected that AudioDataInit's data's memory layout matches the expectations of the planar or interleaved format. There is no real way to verify whether the samples conform to their AudioSampleFormat.
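
EXAMPLE (non-normative): The validity check can be sketched as a plain function over an init-like object. The names sampleSizeInBytes and isValidAudioDataInit are hypothetical. As the note above says, only the size of data can be verified, not its layout.

```javascript
// Hypothetical helper: bytes per sample for an AudioSampleFormat string.
function sampleSizeInBytes(format) {
  const sampleType = format.replace("-planar", "");
  return { u8: 1, s16: 2, s32: 4, f32: 4 }[sampleType];
}

// Mirrors the validity steps: positive rate, non-empty, and enough bytes in data.
function isValidAudioDataInit(init) {
  if (init.sampleRate <= 0) return false;
  if (init.numberOfFrames === 0) return false;
  if (init.numberOfChannels === 0) return false;

  const totalSamples = init.numberOfFrames * init.numberOfChannels;
  const totalSize = sampleSizeInBytes(init.format) * totalSamples;
  const dataSize = init.data.byteLength;
  return dataSize >= totalSize;
}
```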

9.2.6. Transfer and Serialization

The AudioData transfer steps (with value and dataHolder) are:
  1. If value’s [[Detached]] is true, throw a DataCloneError DOMException.

  2. For all AudioData internal slots in value, assign the value of each internal slot to a field in dataHolder with the same name as the internal slot.

  3. Run the Close AudioData algorithm with value.

The AudioData transfer-receiving steps (with dataHolder and value) are:
  1. For all named fields in dataHolder, assign the value of each named field to the AudioData internal slot in value with the same name as the named field.

The AudioData serialization steps (with value, serialized, and forStorage) are:
  1. If value’s [[Detached]] is true, throw a DataCloneError DOMException.

  2. If forStorage is true, throw a TypeError.

  3. Let resource be the media resource referenced by value’s [[resource reference]].

  4. Let newReference be a new reference to resource.

  5. Assign newReference to serialized.[[resource reference]].

  6. For all remaining AudioData internal slots (excluding [[resource reference]]) in value, assign the value of each internal slot to a field in serialized with the same name as the internal slot.

The AudioData deserialization steps (with serialized and value) are:
  1. For all named fields in serialized, assign the value of each named field to the AudioData internal slot in value with the same name as the named field.

9.2.7. AudioDataCopyToOptions

dictionary AudioDataCopyToOptions {
  [EnforceRange] required unsigned long planeIndex;
  [EnforceRange] unsigned long frameOffset = 0;
  [EnforceRange] unsigned long frameCount;
  AudioSampleFormat format;
};
planeIndex, of type unsigned long

The index identifying the plane to copy from.

frameOffset, of type unsigned long, defaulting to 0

An offset into the source plane data indicating which frame to begin copying from. Defaults to 0.

frameCount, of type unsigned long

The number of frames to copy. If not provided, the copy will include all frames in the plane beginning with frameOffset.

format, of type AudioSampleFormat

The output AudioSampleFormat for the destination data. If not provided, the resulting copy will use this AudioData’s [[format]]. Invoking copyTo() will throw a NotSupportedError if conversion to the requested format is not supported. Conversion from any AudioSampleFormat to f32-planar MUST always be supported.

NOTE: Authors seeking to integrate with [WEBAUDIO] can request f32-planar and use the resulting copy to create an AudioBuffer or render via AudioWorklet.
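
EXAMPLE (non-normative): A minimal sketch of requesting f32-planar for every channel, using only allocationSize() and copyTo() as declared in the AudioData interface. The helper name copyPlanesAsFloat32 is hypothetical, and audioData is assumed to be an AudioData (or any object with the same shape, which is how the sketch can be exercised outside a browser).

```javascript
// Hypothetical helper: returns one Float32Array per channel.
function copyPlanesAsFloat32(audioData) {
  const planes = [];
  for (let planeIndex = 0; planeIndex < audioData.numberOfChannels; planeIndex++) {
    // Conversion to f32-planar is always supported, so one plane per channel exists.
    const options = { planeIndex, format: "f32-planar" };
    const byteLength = audioData.allocationSize(options);
    const plane = new Float32Array(byteLength / Float32Array.BYTES_PER_ELEMENT);
    audioData.copyTo(plane, options);
    planes.push(plane);
  }
  return planes;
}
```

Each returned plane could then be handed to Web Audio, for example through AudioBuffer's copyToChannel(), although that integration is outside the scope of this sketch.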

9.3. Audio Sample Format

An audio sample format describes the numeric type used to represent a single sample (e.g. 32-bit floating point) and the arrangement of samples from different channels as either interleaved or planar. The audio sample type refers solely to the numeric type and interval used to store the data: u8, s16, s32, or f32 for unsigned 8-bit integer, signed 16-bit integer, signed 32-bit integer, and 32-bit floating point samples respectively. The audio buffer arrangement refers solely to the way the samples are laid out in memory (planar or interleaved).

A sample refers to a single value that is the magnitude of a signal at a particular point in time in a particular channel.

A frame (or sample-frame) refers to a set of values of all channels of a multi-channel signal that happen at the exact same time.

NOTE: Consequently if an audio signal is mono (has only one channel), a frame and a sample refer to the same thing.

All audio samples in this specification use linear pulse-code modulation (Linear PCM): quantization levels are uniform between values.

NOTE: The Web Audio API, which is expected to be used with this specification, also uses Linear PCM.

enum AudioSampleFormat {
  "u8",
  "s16",
  "s32",
  "f32",
  "u8-planar",
  "s16-planar",
  "s32-planar",
  "f32-planar",
};
u8

8-bit unsigned integer samples with interleaved channel arrangement.

s16

16-bit signed integer samples with interleaved channel arrangement.

s32

32-bit signed integer samples with interleaved channel arrangement.

f32

32-bit float samples with interleaved channel arrangement.

u8-planar

8-bit unsigned integer samples with planar channel arrangement.

s16-planar

16-bit signed integer samples with planar channel arrangement.

s32-planar

32-bit signed integer samples with planar channel arrangement.

f32-planar

32-bit float samples with planar channel arrangement.

9.3.1. Arrangement of audio buffer

When an AudioData has an AudioSampleFormat that is interleaved, the audio samples from different channels are laid out consecutively in the same buffer, in the order described in the section § 9.3.3 Audio channel ordering. The AudioData has a single plane, which therefore contains [[number of frames]] * [[number of channels]] elements.

When an AudioData has an AudioSampleFormat that is planar, the audio samples from different channels are laid out in different buffers, themselves arranged in an order described in the section § 9.3.3 Audio channel ordering. The AudioData has a number of planes equal to the AudioData's [[number of channels]]. Each plane contains [[number of frames]] elements.

NOTE: The Web Audio API currently uses f32-planar exclusively.

NOTE: The following diagram exemplifies the memory layout of planar versus interleaved AudioSampleFormats

[Figure: Graphical representation of the memory layout of interleaved and planar formats]
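
EXAMPLE (non-normative): The difference between the two arrangements can be shown by converting an interleaved buffer into per-channel planes. The function name deinterleave is hypothetical, and the sketch assumes the input length is a multiple of the channel count.

```javascript
// Hypothetical helper: splits one interleaved plane into one plane per channel.
function deinterleave(interleaved, numberOfChannels) {
  const numberOfFrames = interleaved.length / numberOfChannels;
  const planes = [];
  for (let channel = 0; channel < numberOfChannels; channel++) {
    // Each plane uses the same typed-array type as the input.
    const plane = new interleaved.constructor(numberOfFrames);
    for (let frame = 0; frame < numberOfFrames; frame++) {
      // In interleaved data, the samples of one frame sit next to each other.
      plane[frame] = interleaved[frame * numberOfChannels + channel];
    }
    planes.push(plane);
  }
  return planes;
}

// Stereo input laid out as L0, R0, L1, R1.
const stereoPlanes = deinterleave(new Int16Array([10, 20, 11, 21]), 2);
```

Here the first plane holds the two left-channel samples and the second plane holds the two right-channel samples, each with [[number of frames]] elements.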

9.3.2. Magnitude of the audio samples

The minimum value and maximum value of an audio sample, for a particular audio sample type, are the values below which (respectively above which) audio clipping might occur. They are otherwise regular types, that can hold values outside this interval during intermediate processing.

The bias value for an audio sample type is the value that often corresponds to the middle of the range (but often the range is not symmetrical). An audio buffer comprised only of values equal to the bias value is silent.

Sample type   IDL type   Minimum value   Bias value   Maximum value
u8            octet      0               128          +255
s16           short      -32768          0            +32767
s32           long       -2147483648     0            +2147483647
f32           float      -1.0            0.0          +1.0

NOTE: There is no data type that can hold 24 bits of information conveniently, but audio content using 24-bit samples is common, so 32-bit integers are commonly used to hold 24-bit content.

AudioData containing 24-bit samples SHOULD store those samples in s32 or f32. When samples are stored in s32, each sample MUST be left-shifted by 8 bits. By virtue of this process, samples outside of the valid 24-bit range ([-8388608, +8388607]) will be clipped. To avoid clipping and ensure lossless transport, samples MAY be converted to f32.

NOTE: While clipping is unavoidable in u8, s16, and s32 samples due to their storage types, implementations SHOULD take care not to clip internally when handling f32 samples.
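
EXAMPLE (non-normative): A sketch of storing 24-bit samples in s32 by left-shifting each sample by 8 bits. The function name is hypothetical, and the explicit clamp to the 24-bit range is this sketch's interpretation of the clipping described above.

```javascript
// Hypothetical helper: packs 24-bit integer samples into an s32 buffer.
function pack24BitIntoS32(samples24) {
  const MIN_24 = -8388608;
  const MAX_24 = 8388607;
  const out = new Int32Array(samples24.length);
  for (let i = 0; i < samples24.length; i++) {
    // Clip to the valid 24-bit range, then move the value into the top 24 bits.
    const clipped = Math.max(MIN_24, Math.min(MAX_24, samples24[i]));
    out[i] = clipped << 8;
  }
  return out;
}
```

After the shift, the largest 24-bit value 8388607 becomes 2147483392 and the smallest value -8388608 becomes -2147483648, so the full s32 range is used apart from the low 8 bits.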

9.3.3. Audio channel ordering

When decoding, the ordering of the audio channels in the resulting AudioData MUST be the same as what is present in the EncodedAudioChunk.

When encoding, the ordering of the audio channels in the resulting EncodedAudioChunk MUST be the same as what is present in the given AudioData.

In other terms, no channel reordering is performed when encoding and decoding.

NOTE: The container either implies or specifies the channel mapping: the channel attributed to a particular channel index.

9.4. VideoFrame Interface

NOTE: VideoFrame is a CanvasImageSource. A VideoFrame can be passed to any method accepting a CanvasImageSource, including CanvasDrawImage's drawImage().

[Exposed=(Window,DedicatedWorker), Serializable, Transferable]
interface VideoFrame {
  constructor(CanvasImageSource image, optional VideoFrameInit init = {});
  constructor([AllowShared] BufferSource data, VideoFrameBufferInit init);

  readonly attribute VideoPixelFormat? format;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute DOMRectReadOnly? codedRect;
  readonly attribute DOMRectReadOnly? visibleRect;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;  // microseconds
  readonly attribute long long? timestamp;          // microseconds
  readonly attribute VideoColorSpace colorSpace;

  unsigned long allocationSize(
      optional VideoFrameCopyToOptions options = {});
  Promise<sequence<PlaneLayout>> copyTo(
      [AllowShared] BufferSource destination,
      optional VideoFrameCopyToOptions options = {});
  VideoFrame clone();
  undefined close();
};

dictionary VideoFrameInit {
  unsigned long long duration;  // microseconds
  long long timestamp;          // microseconds
  AlphaOption alpha = "keep";

  // Default matches image. May be used to efficiently crop. Will trigger
  // new computation of displayWidth and displayHeight using image’s pixel
  // aspect ratio unless an explicit displayWidth and displayHeight are given.
  DOMRectInit visibleRect;

  // Default matches image unless visibleRect is provided.
  [EnforceRange] unsigned long displayWidth;
  [EnforceRange] unsigned long displayHeight;
};

dictionary VideoFrameBufferInit {
  required VideoPixelFormat format;
  required [EnforceRange] unsigned long codedWidth;
  required [EnforceRange] unsigned long codedHeight;
  required [EnforceRange] long long timestamp;  // microseconds
  [EnforceRange] unsigned long long duration;  // microseconds

  // Default layout is tightly-packed.
  sequence<PlaneLayout> layout;

  // Default visible rect is coded size positioned at (0,0)
  DOMRectInit visibleRect;

  // Default display dimensions match visibleRect.
  [EnforceRange] unsigned long displayWidth;
  [EnforceRange] unsigned long displayHeight;

  VideoColorSpaceInit colorSpace;
};

9.4.1. Internal Slots

[[resource reference]]

A reference to the media resource that stores the pixel data for this frame.

[[format]]

A VideoPixelFormat describing the pixel format of the VideoFrame. Will be null whenever the underlying format does not map to a VideoPixelFormat or when [[Detached]] is true.

[[coded width]]

Width of the VideoFrame in pixels, potentially including non-visible padding, and prior to considering potential ratio adjustments.

[[coded height]]

Height of the VideoFrame in pixels, potentially including non-visible padding, and prior to considering potential ratio adjustments.

[[visible left]]

The number of pixels defining the left offset of the visible rectangle.

[[visible top]]

The number of pixels defining the top offset of the visible rectangle.

[[visible width]]

The width of pixels to include in the visible rectangle, starting from [[visible left]].

[[visible height]]

The height of pixels to include in the visible rectangle, starting from [[visible top]].

[[display width]]

Width of the VideoFrame when displayed after applying aspect ratio adjustments.

[[display height]]

Height of the VideoFrame when displayed after applying aspect ratio adjustments.

[[duration]]

The presentation duration, given in microseconds. The duration is copied from the EncodedVideoChunk corresponding to this VideoFrame.

[[timestamp]]

The presentation timestamp, given in microseconds. The timestamp is copied from the EncodedVideoChunk corresponding to this VideoFrame.

[[color space]]

The VideoColorSpace associated with this frame.

9.4.2. Constructors

VideoFrame(image, init)

  1. Check the usability of the image argument. If this throws an exception or returns bad, then throw an InvalidStateError DOMException.

  2. If the origin of image’s image data is not same origin with the entry settings object's origin, then throw a SecurityError DOMException.

  3. Let frame be a new VideoFrame.

  4. Switch on image:

  5. Return frame.

VideoFrame(data, init)

  1. If init is not a valid VideoFrameBufferInit, throw a TypeError.

  2. Let defaultRect be «[ "x" → 0, "y" → 0, "width" → init.codedWidth, "height" → init.codedHeight ]».

  3. Let overrideRect be undefined.

  4. If init.visibleRect exists, assign its value to overrideRect.

  5. Let parsedRect be the result of running the Parse Visible Rect algorithm with defaultRect, overrideRect, init.codedWidth, init.codedHeight, and init.format.

  6. If parsedRect is an exception, throw parsedRect.

  7. Let optLayout be undefined.

  8. If init.layout exists, assign its value to optLayout.

  9. Let combinedLayout be the result of running the Compute Layout and Allocation Size algorithm with parsedRect, init.format, and optLayout.

  10. If combinedLayout is an exception, throw combinedLayout.

  11. If data.byteLength is less than combinedLayout’s allocationSize, throw a TypeError.

  12. Let resource be a new media resource containing a copy of data. Use visibleRect and layout to determine where in data the pixels for each plane reside.

    The User Agent MAY choose to allocate resource with a larger coded size and plane strides to improve memory alignment. Increases will be reflected by codedWidth and codedHeight. Additionally, the User Agent MAY use visibleRect to copy only the visible rectangle. It MAY also reposition the visible rectangle within resource. The final position will be reflected by visibleRect.

  13. Let resourceCodedWidth be the coded width of resource.

  14. Let resourceCodedHeight be the coded height of resource.

  15. Let resourceVisibleLeft be the left offset for the visible rectangle of resource.

  16. Let resourceVisibleTop be the top offset for the visible rectangle of resource.

    The spec SHOULD provide definitions (and possibly diagrams) for coded size, visible rectangle, and display size. See #166.

  17. Let frame be a new VideoFrame object initialized as follows:

    1. Assign resourceCodedWidth, resourceCodedHeight, resourceVisibleLeft, and resourceVisibleTop to [[coded width]], [[coded height]], [[visible left]], and [[visible top]] respectively.

    2. If init.visibleRect exists:

      1. Let truncatedVisibleWidth be the value of visibleRect.width after truncating.

      2. Assign truncatedVisibleWidth to [[visible width]].

      3. Let truncatedVisibleHeight be the value of visibleRect.height after truncating.

      4. Assign truncatedVisibleHeight to [[visible height]].

    3. Otherwise:

      1. Assign [[coded width]] to [[visible width]].

      2. Assign [[coded height]] to [[visible height]].

    4. If init.displayWidth exists, assign it to [[display width]]. Otherwise, assign [[visible width]] to [[display width]].

    5. If init.displayHeight exists, assign it to [[display height]]. Otherwise, assign [[visible height]] to [[display height]].

    6. Assign init’s timestamp and duration to [[timestamp]] and [[duration]] respectively.

    7. Let colorSpace be undefined.

    8. If init.colorSpace exists, assign its value to colorSpace.

    9. Assign the result of running the Pick Color Space algorithm, with colorSpace and [[format]], to [[color space]].

  18. Return frame.
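
EXAMPLE (non-normative): The defaulting rules in sub-steps 2 to 5 of step 17 can be sketched as a plain function. The name resolveFrameGeometry is hypothetical, and the sketch assumes the User Agent kept the coded size given in init, although it MAY allocate a larger one.

```javascript
// Hypothetical helper modelling the visible and display size defaults.
// Assumes the resource coded size equals init.codedWidth x init.codedHeight.
function resolveFrameGeometry(init) {
  const codedWidth = init.codedWidth;
  const codedHeight = init.codedHeight;

  // Visible size: truncated visibleRect if given, otherwise the coded size.
  let visibleWidth = codedWidth;
  let visibleHeight = codedHeight;
  if (init.visibleRect !== undefined) {
    visibleWidth = Math.trunc(init.visibleRect.width);
    visibleHeight = Math.trunc(init.visibleRect.height);
  }

  // Display size: explicit values win, otherwise match the visible size.
  const displayWidth = init.displayWidth ?? visibleWidth;
  const displayHeight = init.displayHeight ?? visibleHeight;

  return {
    codedWidth, codedHeight,
    visibleWidth, visibleHeight,
    displayWidth, displayHeight,
  };
}
```

Passing only codedWidth and codedHeight yields visible and display sizes equal to the coded size, while supplying visibleRect alone makes the display size follow the truncated visible size.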

9.4.3. Attributes

format, of type VideoPixelFormat, readonly, nullable

Describes the arrangement of bytes in each plane as well as the number and order of the planes. Will be null whenever the underlying format does not map to a VideoPixelFormat or when [[Detached]] is true.

The format getter steps are to return [[format]].

codedWidth, of type unsigned long, readonly

Width of the VideoFrame in pixels, potentially including non-visible padding, and prior to considering potential ratio adjustments.

The codedWidth getter steps are to return [[coded width]].

codedHeight, of type unsigned long, readonly

Height of the VideoFrame in pixels, potentially including non-visible padding, and prior to considering potential ratio adjustments.

The codedHeight getter steps are to return [[coded height]].

codedRect, of type DOMRectReadOnly, readonly, nullable

A DOMRectReadOnly with width and height matching codedWidth and codedHeight and x and y at (0,0). Offered for convenience for use with allocationSize() and copyTo().

The codedRect getter steps are:

  1. If [[Detached]] is true, return null.

  2. Let rect be a new DOMRectReadOnly, initialized as follows:

    1. Assign 0 to x and y.

    2. Assign [[coded width]] and [[coded height]] to width and height respectively.

  3. Return rect.

visibleRect, of type DOMRectReadOnly, readonly, nullable

A DOMRectReadOnly describing the visible rectangle of pixels for this VideoFrame.

The visibleRect getter steps are:

  1. If [[Detached]] is true, return null.

  2. Let rect be a new DOMRectReadOnly, initialized as follows:

    1. Assign [[visible left]], [[visible top]], [[visible width]], and [[visible height]] to x, y, width, and height respectively.

  3. Return rect.

displayWidth, of type unsigned long, readonly

Width of the VideoFrame when displayed after applying aspect ratio adjustments.

The displayWidth getter steps are to return [[display width]].

displayHeight, of type unsigned long, readonly

Height of the VideoFrame when displayed after applying aspect ratio adjustments.

The displayHeight getter steps are to return [[display height]].

timestamp, of type long long, readonly, nullable

The presentation timestamp, given in microseconds. The timestamp is copied from the EncodedVideoChunk corresponding to this VideoFrame.

The timestamp getter steps are to return [[timestamp]].

duration, of type unsigned long long, readonly, nullable

The presentation duration, given in microseconds. The duration is copied from the EncodedVideoChunk corresponding to this VideoFrame.

The duration getter steps are to return [[duration]].

colorSpace, of type VideoColorSpace, readonly

The VideoColorSpace associated with this frame.

The colorSpace getter steps are to return [[color space]].

9.4.4. Internal Structures

A combined buffer layout is a struct that consists of:

A computed plane layout is a struct that consists of:

9.4.5. Methods

allocationSize(options)

Returns the minimum byte length for a valid destination BufferSource to be used with copyTo() with the given options.

When invoked, run these steps:

  1. If [[Detached]] is true, throw an InvalidStateError DOMException.

  2. If [[format]] is null, throw a NotSupportedError DOMException.

  3. Let combinedLayout be the result of running the Parse VideoFrameCopyToOptions algorithm with options.

  4. If combinedLayout is an exception, throw combinedLayout.

  5. Return combinedLayout’s allocationSize.

copyTo(destination, options)

Asynchronously copies the planes of this frame into destination according to options. The format of the data is the same as this VideoFrame's format.

NOTE: Promises that are returned by several calls to copyTo() are not guaranteed to resolve in the order they were returned.

When invoked, run these steps:

  1. If [[Detached]] is true, throw an InvalidStateError DOMException.

  2. If [[format]] is null, throw a NotSupportedError DOMException.

  3. Let combinedLayout be the result of running the Parse VideoFrameCopyToOptions algorithm with options.

  4. If combinedLayout is an exception, return a promise rejected with combinedLayout.

  5. If destination.byteLength is less than combinedLayout’s allocationSize, return a promise rejected with a TypeError.

  6. Let p be a new Promise.

  7. Let copyStepsQueue be the result of starting a new parallel queue.

  8. Enqueue the following steps to copyStepsQueue:

    1. Let resource be the media resource referenced by [[resource reference]].

    2. Let numPlanes be the number of planes as defined by [[format]].

    3. Let planeIndex be 0.

    4. While planeIndex is less than numPlanes:

      1. Let sourceStride be the stride of the plane in resource as identified by planeIndex.

      2. Let computedLayout be the computed plane layout in combinedLayout’s computedLayouts at the position of planeIndex.

      3. Let sourceOffset be the product of multiplying computedLayout’s sourceTop by sourceStride.

      4. Add computedLayout’s sourceLeftBytes to sourceOffset.

      5. Let destinationOffset be computedLayout’s destinationOffset.

      6. Let rowBytes be computedLayout’s sourceWidthBytes.

      7. Let row be 0.

      8. While row is less than computedLayout’s sourceHeight:

        1. Copy rowBytes bytes from resource starting at sourceOffset to destination starting at destinationOffset.

        2. Increment sourceOffset by sourceStride.

        3. Increment destinationOffset by computedLayout’s destinationStride.

        4. Increment row by 1.

      9. Increment planeIndex by 1.

    5. Queue a task on the control thread event loop to resolve p.

  9. Return p.
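
This example is non-normative. The row-by-row copy in step 8 can be sketched in plain JavaScript as follows. It is an illustration under stated assumptions, not the User Agent's implementation: the name copyPlanes, the sourcePlanes argument (one Uint8Array and stride per plane, standing in for the media resource), and the object shape used for each computed plane layout are hypothetical.

```javascript
// Hypothetical helper mirroring the nested loops of the copyTo() steps.
// `sourcePlanes` is an assumed stand-in for the media resource:
//   [{ data: Uint8Array, stride: number }, ...]
function copyPlanes(sourcePlanes, computedLayouts, destination) {
  for (let planeIndex = 0; planeIndex < sourcePlanes.length; planeIndex++) {
    const { data, stride: sourceStride } = sourcePlanes[planeIndex];
    const layout = computedLayouts[planeIndex];
    let sourceOffset = layout.sourceTop * sourceStride + layout.sourceLeftBytes;
    let destinationOffset = layout.destinationOffset;
    const rowBytes = layout.sourceWidthBytes;
    for (let row = 0; row < layout.sourceHeight; row++) {
      // Copy rowBytes bytes of one row, then advance both cursors by a stride.
      destination.set(
        data.subarray(sourceOffset, sourceOffset + rowBytes),
        destinationOffset
      );
      sourceOffset += sourceStride;
      destinationOffset += layout.destinationStride;
    }
  }
  return destination;
}

// A 4x2 single-plane source stored with a 6-byte stride (2 padding bytes per
// row), copied tightly packed into an 8-byte destination.
const paddedPlane = {
  data: Uint8Array.from([1, 2, 3, 4, 0, 0, 5, 6, 7, 8, 0, 0]),
  stride: 6,
};
const packed = copyPlanes(
  [paddedPlane],
  [{
    sourceTop: 0, sourceLeftBytes: 0, sourceWidthBytes: 4, sourceHeight: 2,
    destinationOffset: 0, destinationStride: 4,
  }],
  new Uint8Array(8)
);
```

The source stride and the destination stride are tracked separately, which is what allows padding to be dropped (or added) during the copy.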

clone()

Creates a new VideoFrame with a reference to the same media resource.

When invoked, run these steps:

  1. If [[Detached]] is true, throw an InvalidStateError DOMException.

  2. Return the result of running the Clone VideoFrame algorithm with this.

close()

Clears all state and releases the reference to the media resource. Close is final.

When invoked, run the Close VideoFrame algorithm with this.

9.4.6. Algorithms

Create a VideoFrame (with output, timestamp, duration, displayAspectWidth, displayAspectHeight, and colorSpace)
  1. Let frame be a new VideoFrame, constructed as follows:

    1. Assign false to [[Detached]].

    2. Let resource be the media resource described by output.

    3. Let resourceReference be a reference to resource.

    4. Assign resourceReference to [[resource reference]].

    5. If output uses a recognized VideoPixelFormat, assign that format to [[format]]. Otherwise, assign null to [[format]].

    6. Let codedWidth and codedHeight be the coded width and height of the output in pixels.

    7. Let visibleLeft, visibleTop, visibleWidth, and visibleHeight be the left, top, width and height for the visible rectangle of output.

    8. Let displayWidth and displayHeight be the display size of output in pixels.

    9. If displayAspectWidth and displayAspectHeight are provided, increase displayWidth or displayHeight until the ratio of displayWidth to displayHeight matches the ratio of displayAspectWidth to displayAspectHeight.

    10. Assign codedWidth, codedHeight, visibleLeft, visibleTop, visibleWidth, visibleHeight, displayWidth, and displayHeight to [[coded width]], [[coded height]], [[visible left]], [[visible top]], [[visible width]], [[visible height]], [[display width]], and [[display height]] respectively.

    11. Assign duration and timestamp to [[duration]] and [[timestamp]] respectively.

    12. Assign the result of running the Pick Color Space algorithm, with colorSpace and [[format]], to [[color space]].

  2. Return frame.

Pick Color Space (with overrideColorSpace and format)
  1. If overrideColorSpace is provided, return a new VideoColorSpace constructed with overrideColorSpace.

    User Agents MAY replace null members of the provided overrideColorSpace with guessed values as determined by implementer defined heuristics.

  2. Otherwise, if format is an RGB format, return a new instance of the sRGB Color Space.

  3. Otherwise, return a new instance of the REC709 Color Space.
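
This example is non-normative. A minimal sketch of the Pick Color Space steps, using plain objects in place of VideoColorSpace instances. The member values assumed here for the sRGB and REC709 color spaces are not stated in this section and are labeled as assumptions in the code; the function name pickColorSpace is hypothetical, and the optional guessing of null members by the User Agent is not modeled.

```javascript
// Assumed member values for the two default color spaces (not given in this
// section): sRGB is full range with the "rgb" matrix, REC709 is limited range.
const SRGB = { primaries: "bt709", transfer: "iec61966-2-1", matrix: "rgb", fullRange: true };
const REC709 = { primaries: "bt709", transfer: "bt709", matrix: "bt709", fullRange: false };
const RGB_FORMATS = new Set(["RGBA", "RGBX", "BGRA", "BGRX"]);

// Hypothetical helper: returns a plain object instead of a VideoColorSpace.
function pickColorSpace(overrideColorSpace, format) {
  if (overrideColorSpace !== undefined) {
    // Members absent from the override become null, as in the
    // VideoColorSpace constructor steps.
    return {
      primaries: overrideColorSpace.primaries ?? null,
      transfer: overrideColorSpace.transfer ?? null,
      matrix: overrideColorSpace.matrix ?? null,
      fullRange: overrideColorSpace.fullRange ?? null,
    };
  }
  return RGB_FORMATS.has(format) ? { ...SRGB } : { ...REC709 };
}

const rgbDefault = pickColorSpace(undefined, "BGRA");
const yuvDefault = pickColorSpace(undefined, "I420");
const overridden = pickColorSpace({ primaries: "bt470bg" }, "I420");
```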

Validate VideoFrameInit (with format, codedWidth, and codedHeight):
  1. If visibleRect exists:

    1. Let validAlignment be the result of running the Verify Rect Offset Alignment with format and visibleRect.

    2. If validAlignment is false, return false.

    3. If any attribute of visibleRect is negative or not finite, return false.

    4. If visibleRect.width == 0 or visibleRect.height == 0, return false.

    5. If visibleRect.y + visibleRect.height > codedHeight, return false.

    6. If visibleRect.x + visibleRect.width > codedWidth, return false.

  2. If codedWidth == 0 or codedHeight == 0, return false.

  3. If only one of displayWidth or displayHeight exists, return false.

  4. If displayWidth == 0 or displayHeight == 0, return false.

  5. Return true.

To check if a VideoFrameBufferInit is a valid VideoFrameBufferInit, run these steps:
  1. If codedWidth == 0 or codedHeight == 0, return false.

  2. If any attribute of visibleRect is negative or not finite, return false.

  3. If visibleRect.y + visibleRect.height > codedHeight, return false.

  4. If visibleRect.x + visibleRect.width > codedWidth, return false.

  5. If only one of displayWidth or displayHeight exists, return false.

  6. If displayWidth == 0 or displayHeight == 0, return false.

  7. Return true.

Initialize Frame From Other Frame (with init, frame, and otherFrame)
  1. Let format be otherFrame.format.

  2. If init.alpha is discard, assign otherFrame.format's equivalent opaque format to format.

  3. Let validInit be the result of running the Validate VideoFrameInit algorithm with format and otherFrame’s [[coded width]] and [[coded height]].

  4. If validInit is false, throw a TypeError.

  5. Let resource be the media resource referenced by otherFrame’s [[resource reference]].

  6. Assign a new reference for resource to frame’s [[resource reference]].

  7. Assign the following attributes from otherFrame to frame: codedWidth, codedHeight, colorSpace.

  8. Let defaultVisibleRect be the result of performing the getter steps for visibleRect on otherFrame.

  9. Let defaultDisplayWidth and defaultDisplayHeight be otherFrame’s [[display width]] and [[display height]] respectively.

  10. Run the Initialize Visible Rect and Display Size algorithm with init, frame, defaultVisibleRect, defaultDisplayWidth, and defaultDisplayHeight.

  11. If duration exists in init, assign it to frame.duration. Otherwise, assign otherFrame.duration to frame.duration.

  12. If timestamp exists in init, assign it to frame.timestamp. Otherwise, assign otherFrame.timestamp to frame.timestamp.

  13. Assign format to frame.[[format]].

Initialize Frame With Resource and Size (with init, frame, resource, width and height)
  1. Let format be null.

  2. If resource uses a recognized VideoPixelFormat, assign the VideoPixelFormat of resource to format.

  3. Let validInit be the result of running the Validate VideoFrameInit algorithm with format, width and height.

  4. If validInit is false, throw a TypeError.

  5. Assign a new reference for resource to frame’s [[resource reference]].

  6. If init.alpha is discard, assign format’s equivalent opaque format to format.

  7. Assign format to frame’s [[format]].

  8. Assign width and height to frame’s [[coded width]] and [[coded height]] respectively.

  9. Let defaultVisibleRect be a new DOMRect constructed with «[ "x" → 0, "y" → 0, "width" → width, "height" → height ]».

  10. Run the Initialize Visible Rect and Display Size algorithm with init, frame, defaultVisibleRect, width, and height.

  11. Assign init.duration to frame.duration.

  12. Assign init.timestamp to frame.timestamp.

  13. If resource has a known VideoColorSpace, assign its value to [[color space]].

  14. Otherwise, assign a new VideoColorSpace, constructed with an empty VideoColorSpaceInit, to [[color space]].

Initialize Visible Rect and Display Size (with init, frame, defaultVisibleRect, defaultDisplayWidth and defaultDisplayHeight)
  1. Let visibleRect be defaultVisibleRect.

  2. If init.visibleRect exists, assign it to visibleRect.

  3. Assign visibleRect’s x, y, width, and height, to frame’s [[visible left]], [[visible top]], [[visible width]], and [[visible height]] respectively.

  4. If displayWidth and displayHeight exist in init, assign them to frame’s [[display width]] and [[display height]] respectively.

  5. Otherwise:

    1. Let widthScale be the result of dividing defaultDisplayWidth by defaultVisibleRect.width.

    2. Let heightScale be the result of dividing defaultDisplayHeight by defaultVisibleRect.height.

    3. Multiply frame’s [[visible width]] by widthScale and round the result. Assign the rounded result to frame’s [[display width]].

    4. Multiply frame’s [[visible height]] by heightScale and round the result. Assign the rounded result to frame’s [[display height]].
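
This example is non-normative. The scaling in step 5 preserves the display aspect adjustment of the original frame when only the visible rectangle changes. The sketch below returns the computed values instead of assigning internal slots; the function name and return shape are hypothetical, and Math.round is an assumption, since the steps above do not say how ties are rounded.

```javascript
// Hypothetical helper mirroring the Initialize Visible Rect and Display Size
// steps. Returns the values that would be assigned to the frame's slots.
function initVisibleRectAndDisplaySize(init, defaultVisibleRect, defaultDisplayWidth, defaultDisplayHeight) {
  const visibleRect = init.visibleRect ?? defaultVisibleRect;
  if (init.displayWidth !== undefined && init.displayHeight !== undefined) {
    return { visibleRect, displayWidth: init.displayWidth, displayHeight: init.displayHeight };
  }
  // Derive the display size by applying the default frame's scale factors
  // to the (possibly overridden) visible rectangle.
  const widthScale = defaultDisplayWidth / defaultVisibleRect.width;
  const heightScale = defaultDisplayHeight / defaultVisibleRect.height;
  return {
    visibleRect,
    displayWidth: Math.round(visibleRect.width * widthScale),
    displayHeight: Math.round(visibleRect.height * heightScale),
  };
}

// A 720x480 visible region displayed at 960x480 (non-square pixels).
// Cropping to the left half keeps the same 4:3 horizontal stretch.
const croppedSize = initVisibleRectAndDisplaySize(
  { visibleRect: { x: 0, y: 0, width: 360, height: 480 } },
  { x: 0, y: 0, width: 720, height: 480 },
  960,
  480
);
```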

Clone VideoFrame (with frame)
  1. Let clone be a new VideoFrame initialized as follows:

    1. Let resource be the media resource referenced by frame’s [[resource reference]].

    2. Let newReference be a new reference to resource.

    3. Assign newReference to clone’s [[resource reference]].

    4. Assign all remaining internal slots of frame (excluding [[resource reference]]) to those of the same name in clone.

  2. Return clone.

Close VideoFrame (with frame)
  1. Assign null to frame’s [[resource reference]].

  2. Assign true to frame’s [[Detached]].

  3. Assign null to frame’s format.

  4. Assign 0 to frame’s [[coded width]], [[coded height]], [[visible left]], [[visible top]], [[visible width]], [[visible height]], [[display width]], and [[display height]].

  5. Assign null to frame’s [[duration]] and [[timestamp]].

Parse VideoFrameCopyToOptions (with options)
  1. Let defaultRect be the result of performing the getter steps for visibleRect.

  2. Let overrideRect be undefined.

  3. If options.rect exists:

    1. Assign the value of options.rect to overrideRect.

    2. Let validAlignment be the result of running the Verify Rect Size Alignment algorithm with [[format]] and overrideRect.

    3. If validAlignment is false, throw a TypeError.

  4. Let parsedRect be the result of running the Parse Visible Rect algorithm with defaultRect, overrideRect, [[coded width]], [[coded height]], and [[format]].

  5. If parsedRect is an exception, return parsedRect.

  6. Let optLayout be undefined.

  7. If options.layout exists, assign its value to optLayout.

  8. Let combinedLayout be the result of running the Compute Layout and Allocation Size algorithm with parsedRect, [[format]], and optLayout.

  9. Return combinedLayout.

Verify Rect Offset Alignment (with format and rect)
  1. If format is null, return true.

  2. Let planeIndex be 0.

  3. Let numPlanes be the number of planes as defined by format.

  4. While planeIndex is less than numPlanes:

    1. Let plane be the Plane identified by planeIndex as defined by format.

    2. Let sampleWidth be the horizontal sub-sampling factor of each subsample for plane.

    3. Let sampleHeight be the vertical sub-sampling factor of each subsample for plane.

    4. If rect.x is not a multiple of sampleWidth, return false.

    5. If rect.y is not a multiple of sampleHeight, return false.

    6. Increment planeIndex by 1.

  5. Return true.

Verify Rect Size Alignment (with format and rect)
  1. If format is null, return true.

  2. Let planeIndex be 0.

  3. Let numPlanes be the number of planes as defined by format.

  4. While planeIndex is less than numPlanes:

    1. Let plane be the Plane identified by planeIndex as defined by format.

    2. Let sampleWidth be the horizontal sub-sampling factor of each subsample for plane.

    3. Let sampleHeight be the vertical sub-sampling factor of each subsample for plane.

    4. If rect.width is not a multiple of sampleWidth, return false.

    5. If rect.height is not a multiple of sampleHeight, return false.

    6. Increment planeIndex by 1.

  5. Return true.
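
This example is non-normative. Both alignment checks reduce to a per-plane modulo test. The sketch below assumes integer rect members and uses a hypothetical SUBSAMPLING table whose factors are taken from the pixel format descriptions in this specification; the function names are likewise illustrative.

```javascript
// [sampleWidth, sampleHeight] for each plane of each format, taken from the
// pixel format descriptions. The table name and shape are assumptions.
const SUBSAMPLING = {
  I420: [[1, 1], [2, 2], [2, 2]],
  I420A: [[1, 1], [2, 2], [2, 2], [1, 1]],
  I422: [[1, 1], [2, 1], [2, 1]],
  I444: [[1, 1], [1, 1], [1, 1]],
  NV12: [[1, 1], [2, 2]],
  RGBA: [[1, 1]],
  RGBX: [[1, 1]],
  BGRA: [[1, 1]],
  BGRX: [[1, 1]],
};

// Offsets (x, y) must land on a sample boundary in every plane.
function verifyRectOffsetAlignment(format, rect) {
  if (format === null) return true;
  return SUBSAMPLING[format].every(
    ([sampleWidth, sampleHeight]) => rect.x % sampleWidth === 0 && rect.y % sampleHeight === 0
  );
}

// Sizes (width, height) must cover whole samples in every plane.
function verifyRectSizeAlignment(format, rect) {
  if (format === null) return true;
  return SUBSAMPLING[format].every(
    ([sampleWidth, sampleHeight]) => rect.width % sampleWidth === 0 && rect.height % sampleHeight === 0
  );
}
```

For example, an odd x offset is rejected for I420, while an odd y offset is accepted for I422 because that format is not sub-sampled vertically.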

Parse Visible Rect (with defaultRect, overrideRect, codedWidth, codedHeight, and format)
  1. Let sourceRect be defaultRect.

  2. If overrideRect is not undefined:

    1. If either of overrideRect.width or height is 0, return a TypeError.

    2. If the sum of overrideRect.x and overrideRect.width is greater than codedWidth, return a TypeError.

    3. If the sum of overrideRect.y and overrideRect.height is greater than codedHeight, return a TypeError.

    4. Assign overrideRect to sourceRect.

  3. Let validAlignment be the result of running the Verify Rect Offset Alignment algorithm with format and sourceRect.

  4. If validAlignment is false, throw a TypeError.

  5. Return sourceRect.

Compute Layout and Allocation Size (with parsedRect, format, and layout)
  1. Let numPlanes be the number of planes as defined by format.

  2. If layout is not undefined and its length does not equal numPlanes, throw a TypeError.

  3. Let minAllocationSize be 0.

  4. Let computedLayouts be a new list.

  5. Let endOffsets be a new list.

  6. Let planeIndex be 0.

  7. While planeIndex < numPlanes:

    1. Let plane be the Plane identified by planeIndex as defined by format.

    2. Let sampleBytes be the number of bytes per sample for plane.

    3. Let sampleWidth be the horizontal sub-sampling factor of each subsample for plane.

    4. Let sampleHeight be the vertical sub-sampling factor of each subsample for plane.

    5. Let sampleWidthBytes be the product of multiplying sampleWidth by sampleBytes.

    6. Let computedLayout be a new computed plane layout.

    7. Set computedLayout’s sourceTop to the result of the integer division of truncated parsedRect.y by sampleHeight.

    8. Set computedLayout’s sourceHeight to the result of the integer division of truncated parsedRect.height by sampleHeight.

    9. Set computedLayout’s sourceLeftBytes to the result of the integer division of truncated parsedRect.x by sampleWidth, multiplied by sampleBytes.

    10. Set computedLayout’s sourceWidthBytes to the result of the integer division of truncated parsedRect.width by sampleWidth, multiplied by sampleBytes.

    11. If layout is not undefined:

      1. Let planeLayout be the PlaneLayout in layout at position planeIndex.

      2. If planeLayout.stride is less than computedLayout’s sourceWidthBytes, return a TypeError.

      3. Assign planeLayout.offset to computedLayout’s destinationOffset.

      4. Assign planeLayout.stride to computedLayout’s destinationStride.

    12. Otherwise:

      NOTE: If an explicit layout was not provided, the following steps default to tight packing.

      1. Assign minAllocationSize to computedLayout’s destinationOffset.

      2. Assign computedLayout’s sourceWidthBytes to computedLayout’s destinationStride.

    13. Let planeSize be the product of multiplying computedLayout’s destinationStride and sourceHeight.

    14. Let planeEnd be the sum of planeSize and computedLayout’s destinationOffset.

    15. If planeSize or planeEnd is greater than the maximum value of unsigned long, return a TypeError.

    16. Append planeEnd to endOffsets.

    17. Assign the maximum of minAllocationSize and planeEnd to minAllocationSize.

      NOTE: The above step uses a maximum to allow for the possibility that user specified plane offsets reorder planes.

    18. Let earlierPlaneIndex be 0.

    19. While earlierPlaneIndex is less than planeIndex:

      1. Let earlierLayout be computedLayouts[earlierPlaneIndex].

      2. If endOffsets[planeIndex] is less than or equal to earlierLayout’s destinationOffset or if endOffsets[earlierPlaneIndex] is less than or equal to computedLayout’s destinationOffset, continue.

        NOTE: If plane A ends before plane B starts, they do not overlap.

      3. Otherwise, return a TypeError.

      4. Increment earlierPlaneIndex by 1.

    20. Append computedLayout to computedLayouts.

    21. Increment planeIndex by 1.

  8. Let combinedLayout be a new combined buffer layout, initialized as follows:

    1. Assign computedLayouts to computedLayouts.

    2. Assign minAllocationSize to allocationSize.

  9. Return combinedLayout.
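
This example is non-normative. The sketch below follows the Compute Layout and Allocation Size steps for the planar formats that use one byte per sample (I420, I422, and I444), so that a byte count equals a sample count. The PLANES table, the function name, and the use of thrown exceptions in place of returned ones are assumptions made for illustration.

```javascript
// Sub-sampling factors for the one-byte-per-sample planar formats only.
// Other formats are intentionally left out of this sketch.
const PLANES = {
  I420: [{ sampleWidth: 1, sampleHeight: 1 }, { sampleWidth: 2, sampleHeight: 2 }, { sampleWidth: 2, sampleHeight: 2 }],
  I422: [{ sampleWidth: 1, sampleHeight: 1 }, { sampleWidth: 2, sampleHeight: 1 }, { sampleWidth: 2, sampleHeight: 1 }],
  I444: [{ sampleWidth: 1, sampleHeight: 1 }, { sampleWidth: 1, sampleHeight: 1 }, { sampleWidth: 1, sampleHeight: 1 }],
};
const MAX_UNSIGNED_LONG = 0xffffffff;

function computeLayoutAndAllocationSize(parsedRect, format, layout) {
  const planes = PLANES[format];
  if (layout !== undefined && layout.length !== planes.length) {
    throw new TypeError("layout must describe every plane");
  }
  let minAllocationSize = 0;
  const computedLayouts = [];
  const endOffsets = [];
  for (const { sampleWidth, sampleHeight } of planes) {
    const computed = {
      sourceTop: Math.floor(Math.trunc(parsedRect.y) / sampleHeight),
      sourceHeight: Math.floor(Math.trunc(parsedRect.height) / sampleHeight),
      sourceLeftBytes: Math.floor(Math.trunc(parsedRect.x) / sampleWidth),
      sourceWidthBytes: Math.floor(Math.trunc(parsedRect.width) / sampleWidth),
    };
    if (layout !== undefined) {
      const planeLayout = layout[computedLayouts.length];
      if (planeLayout.stride < computed.sourceWidthBytes) {
        throw new TypeError("stride is smaller than a row");
      }
      computed.destinationOffset = planeLayout.offset;
      computed.destinationStride = planeLayout.stride;
    } else {
      // No explicit layout: tightly pack each plane after the previous one.
      computed.destinationOffset = minAllocationSize;
      computed.destinationStride = computed.sourceWidthBytes;
    }
    const planeSize = computed.destinationStride * computed.sourceHeight;
    const planeEnd = planeSize + computed.destinationOffset;
    if (planeSize > MAX_UNSIGNED_LONG || planeEnd > MAX_UNSIGNED_LONG) {
      throw new TypeError("plane does not fit in an unsigned long");
    }
    endOffsets.push(planeEnd);
    minAllocationSize = Math.max(minAllocationSize, planeEnd);
    // Two planes are disjoint when one ends at or before the other starts.
    computedLayouts.forEach((earlier, earlierIndex) => {
      const disjoint =
        planeEnd <= earlier.destinationOffset ||
        endOffsets[earlierIndex] <= computed.destinationOffset;
      if (!disjoint) throw new TypeError("planes overlap");
    });
    computedLayouts.push(computed);
  }
  return { computedLayouts, allocationSize: minAllocationSize };
}

// Tightly packed 4x2 I420: 8 bytes of Y, then 2 bytes each of U and V.
const tight = computeLayoutAndAllocationSize({ x: 0, y: 0, width: 4, height: 2 }, "I420");
```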

9.4.7. Transfer and Serialization

The VideoFrame transfer steps (with value and dataHolder) are:
  1. If value’s [[Detached]] is true, throw a DataCloneError DOMException.

  2. For all VideoFrame internal slots in value, assign the value of each internal slot to a field in dataHolder with the same name as the internal slot.

  3. Run the Close VideoFrame algorithm with value.

The VideoFrame transfer-receiving steps (with dataHolder and value) are:
  1. For all named fields in dataHolder, assign the value of each named field to the VideoFrame internal slot in value with the same name as the named field.

The VideoFrame serialization steps (with value, serialized, and forStorage) are:
  1. If value’s [[Detached]] is true, throw a DataCloneError DOMException.

  2. If forStorage is true, throw a TypeError.

  3. Let resource be the media resource referenced by value’s [[resource reference]].

  4. Let newReference be a new reference to resource.

  5. Assign newReference to serialized.[[resource reference]].

  6. For all remaining VideoFrame internal slots (excluding [[resource reference]]) in value, assign the value of each internal slot to a field in serialized with the same name as the internal slot.

The VideoFrame deserialization steps (with serialized and value) are:
  1. For all named fields in serialized, assign the value of each named field to the VideoFrame internal slot in value with the same name as the named field.

9.4.8. Rendering

When rendered, for example by CanvasDrawImage drawImage(), a VideoFrame MUST be converted to a color space compatible with the rendering target, unless color conversion is explicitly disabled.

Color space conversion during ImageBitmap construction is controlled by ImageBitmapOptions colorSpaceConversion. Setting this value to "none" disables color space conversion.

9.5. VideoFrame CopyTo() Options

Options to specify a rectangle of pixels to copy and the offset and stride of planes in the destination buffer.
dictionary VideoFrameCopyToOptions {
  DOMRectInit rect;
  sequence<PlaneLayout> layout;
};
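NOTE: The steps of copyTo() or allocationSize() will enforce the following requirements: the coordinates of rect must be sample-aligned as determined by [[format]], and, if layout exists, a PlaneLayout must be provided for every plane.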
rect, of type DOMRectInit

A DOMRectInit describing the rectangle of pixels to copy from the VideoFrame. If unspecified, the visibleRect will be used.

NOTE: The coded rectangle can be specified by passing VideoFrame's codedRect.

NOTE: The default rect does not necessarily meet the sample-alignment requirement and can result in copyTo() or allocationSize() rejecting.

layout, of type sequence<PlaneLayout>

The PlaneLayout for each plane in VideoFrame, affording the option to specify an offset and stride for each plane in the destination BufferSource. If unspecified, the planes will be tightly packed. It is invalid to specify planes that overlap.

9.6. DOMRects in VideoFrame

The VideoFrame interface uses DOMRects to specify the position and dimensions for a rectangle of pixels. DOMRectInit is used with copyTo() and allocationSize() to describe the dimensions of the source rectangle. VideoFrame defines codedRect and visibleRect for convenient copying of the coded size and visible region respectively.

NOTE: VideoFrame pixels are only addressable by integer numbers. All floating point values provided to DOMRectInit will be truncated.

9.7. Plane Layout

A PlaneLayout is a dictionary specifying the offset and stride of a VideoFrame plane once copied to a BufferSource. A sequence of PlaneLayouts MAY be provided to VideoFrame's copyTo() to specify how each plane is laid out in the destination BufferSource. Alternatively, callers can inspect copyTo()'s returned sequence of PlaneLayouts to learn the offset and stride for planes as decided by the User Agent.
dictionary PlaneLayout {
  [EnforceRange] required unsigned long offset;
  [EnforceRange] required unsigned long stride;
};
offset, of type unsigned long

The offset in bytes where the given plane begins within a BufferSource.

stride, of type unsigned long

The number of bytes, including padding, used by each row of the plane within a BufferSource.
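
This example is non-normative. A caller that needs padded rows, for instance to satisfy an alignment requirement of a downstream consumer, can compute a sequence of PlaneLayouts up front. The sketch below does so for an I420 frame with even dimensions; the function name alignedI420Layout and the 64-byte alignment are hypothetical choices.

```javascript
// Hypothetical helper: builds one PlaneLayout per I420 plane, with each row
// padded up to a multiple of `alignment` bytes. Assumes even width and height.
function alignedI420Layout(width, height, alignment) {
  const alignUp = (n) => Math.ceil(n / alignment) * alignment;
  const yStride = alignUp(width);
  const uvStride = alignUp(width / 2);
  const ySize = yStride * height;
  const uvSize = uvStride * (height / 2);
  return {
    layout: [
      { offset: 0, stride: yStride },               // Y plane
      { offset: ySize, stride: uvStride },          // U plane
      { offset: ySize + uvSize, stride: uvStride }, // V plane
    ],
    byteLength: ySize + 2 * uvSize,
  };
}

const padded = alignedI420Layout(100, 50, 64);
```

The resulting layout sequence could then be passed as the layout member of VideoFrameCopyToOptions, with a destination of at least byteLength bytes; the planes do not overlap because each offset starts where the previous plane ends.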

9.8. Pixel Format

Pixel formats describe the arrangement of bytes in each plane as well as the number and order of the planes. Each format is described in its own sub-section.
enum VideoPixelFormat {
  // 4:2:0 Y, U, V
  "I420",
  // 4:2:0 Y, U, V, A
  "I420A",
  // 4:2:2 Y, U, V
  "I422",
  // 4:4:4 Y, U, V
  "I444",
  // 4:2:0 Y, UV
  "NV12",
  // 32bpp RGBA
  "RGBA",
  // 32bpp RGBX (opaque)
  "RGBX",
  // 32bpp BGRA
  "BGRA",
  // 32bpp BGRX (opaque)
  "BGRX",
};

Sub-sampling is a technique where a single sample contains information for multiple pixels in the final image. Sub-sampling can be horizontal, vertical or both, and has a factor, that is the number of final pixels in the image that are derived from a sub-sampled sample.

If a VideoFrame is in I420 format, then the very first component of the second plane (the U plane) corresponds to four pixels, namely the 2x2 block of pixels in the top-left corner of the image. Consequently, the first component of the second row corresponds to the four pixels directly below those initial four top-left pixels. The sub-sampling factor is 2 in both the horizontal and vertical directions.
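
This example is non-normative. The mapping from a pixel position to the chroma sample that covers it can be sketched as follows, assuming an I420 chroma plane whose rows are tightly packed. The function name i420ChromaIndex is hypothetical.

```javascript
// Index, within the U (or V) plane, of the sample covering pixel (x, y).
// Assumes tightly packed rows of codedWidth / 2 samples and an even codedWidth.
function i420ChromaIndex(x, y, codedWidth) {
  const chromaWidth = codedWidth / 2;
  return Math.floor(y / 2) * chromaWidth + Math.floor(x / 2);
}

// In an 8-pixel-wide image, the four top-left pixels share chroma sample 0,
// and the 2x2 block directly below them starts the second chroma row.
const topLeftBlock = [[0, 0], [1, 0], [0, 1], [1, 1]].map(([x, y]) => i420ChromaIndex(x, y, 8));
const blockBelow = i420ChromaIndex(0, 2, 8);
```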

If a VideoPixelFormat has an alpha component, the format’s equivalent opaque format is the same VideoPixelFormat, without an alpha component. If a VideoPixelFormat does not have an alpha component, it is its own equivalent opaque format.
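
This example is non-normative. The equivalent opaque format relation can be expressed as a small lookup, using the pairs named in the format descriptions (I420A to I420, RGBA to RGBX, BGRA to BGRX). The function name is hypothetical.

```javascript
// Formats with an alpha component and their opaque counterparts.
const OPAQUE_EQUIVALENT = new Map([
  ["I420A", "I420"],
  ["RGBA", "RGBX"],
  ["BGRA", "BGRX"],
]);

// A format without an alpha component is its own equivalent opaque format.
function equivalentOpaqueFormat(format) {
  return OPAQUE_EQUIVALENT.get(format) ?? format;
}
```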

I420
This format is composed of three distinct planes, one plane of Luma and two planes of Chroma, denoted Y, U and V, and present in this order. It is also often referred to as Planar YUV 4:2:0.

The U and V planes are sub-sampled horizontally and vertically by a factor of 2 compared to the Y plane.

Each sample in this format is 8 bits.

There are codedWidth * codedHeight samples (and therefore bytes) in the Y plane, arranged starting at the top left in the image, in codedHeight lines of codedWidth samples.

There are codedWidth * codedHeight / 4 samples (and therefore bytes) in each of the U and V planes, arranged starting at the top left in the image, in codedHeight / 2 lines of codedWidth / 2 samples.

The codedWidth and codedHeight MUST be even. Similarly, the visible rectangle offset (visibleRect.x and visibleRect.y) MUST be even.

I420A

This format is composed of four distinct planes, one plane of Luma, two planes of Chroma, denoted Y, U and V, and one plane of alpha values, all present in this order. It is also often referred to as Planar YUV 4:2:0 with an alpha channel.

The U and V planes are sub-sampled horizontally and vertically by a factor of 2 compared to the Y and Alpha planes.

Each sample in this format is 8 bits.

There are codedWidth * codedHeight samples (and therefore bytes) in each of the Y and alpha planes, arranged starting at the top left in the image, in codedHeight lines of codedWidth samples.

There are codedWidth * codedHeight / 4 samples (and therefore bytes) in each of the U and V planes, arranged starting at the top left in the image, in codedHeight / 2 lines of codedWidth / 2 samples.

The codedWidth and codedHeight MUST be even. Similarly, the visible rectangle offset (visibleRect.x and visibleRect.y) MUST be even.

I420A's equivalent opaque format is I420.

I422

This format is composed of three distinct planes, one plane of Luma and two planes of Chroma, denoted Y, U and V, and present in this order. It is also often referred to as Planar YUV 4:2:2.

The U and V planes are sub-sampled horizontally by a factor of 2 compared to the Y plane, and not sub-sampled vertically.

Each sample in this format is 8 bits.

There are codedWidth * codedHeight samples (and therefore bytes) in the Y plane, arranged starting at the top left in the image, in codedHeight lines of codedWidth samples.

There are codedWidth * codedHeight / 2 samples (and therefore bytes) in each of the U and V planes, arranged starting at the top left in the image, in codedHeight lines of codedWidth / 2 samples.

The codedWidth MUST be even. Similarly, the horizontal visible rectangle offset (visibleRect.x) MUST be even.

I444

This format is composed of three distinct planes, one plane of Luma and two planes of Chroma, denoted Y, U and V, and present in this order. It is also often refered to as Planar YUV 4:4:4.

Each sample in this format is 8 bits. This format does not use sub-sampling.

There are codedWidth * codedHeight samples (and therefore bytes) in all three planes, arranged starting at the top left in the image, in codedHeight lines of codedWidth samples.

NV12

This format is composed of two distinct planes, one plane of Luma and then another plane for the two Chroma components. The two planes are present in this order, and are referred to as the Y plane and the UV plane respectively.

The U and V components are sub-sampled horizontally and vertically by a factor of 2 compared to the components in the Y plane.

Each sample in this format is 8 bits.

There are codedWidth * codedHeight samples (and therefore bytes) in the Y plane, arranged starting at the top left in the image, in codedHeight lines of codedWidth samples.

The UV plane is composed of interleaved U and V values, in codedWidth * codedHeight / 4 elements (of two bytes each), arranged starting at the top left in the image, in codedHeight / 2 lines of codedWidth / 2 elements. Each element is composed of two Chroma values, the U and V value, in this order.

The codedWidth and codedHeight MUST be even. Similarly, the visible rectangle offset (visibleRect.x and visibleRect.y) MUST be even.

An image in the NV12 pixel format that is 16 pixels wide and 10 pixels tall will be arranged like so in memory:
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
YYYYYYYYYYYYYYYY
UVUVUVUVUVUVUVUV
UVUVUVUVUVUVUVUV
UVUVUVUVUVUVUVUV
UVUVUVUVUVUVUVUV
UVUVUVUVUVUVUVUV

All samples are laid out linearly in memory.
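
This example is non-normative. Given the description above, the byte offsets of the Y, U, and V values for a pixel can be sketched as follows. The sketch assumes both planes are tightly packed and stored back to back in a single buffer, and that the coded width and height are even; the function name nv12Offsets is hypothetical.

```javascript
// Byte offsets of the Y, U and V values that apply to pixel (x, y), assuming
// a tightly packed Y plane immediately followed by a tightly packed UV plane.
function nv12Offsets(x, y, codedWidth, codedHeight) {
  const uvPlaneStart = codedWidth * codedHeight;
  // Each UV row holds codedWidth / 2 two-byte elements, so codedWidth bytes.
  const uOffset = uvPlaneStart + Math.floor(y / 2) * codedWidth + Math.floor(x / 2) * 2;
  return { y: y * codedWidth + x, u: uOffset, v: uOffset + 1 };
}

// For a 16x10 image the Y plane occupies bytes 0..159 and the UV plane
// occupies bytes 160..239.
const firstPixel = nv12Offsets(0, 0, 16, 10);
const innerPixel = nv12Offsets(3, 2, 16, 10);
const lastPixel = nv12Offsets(15, 9, 16, 10);
```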

RGBA

This format is composed of a single plane, that encodes four components: Red, Green, Blue, and an alpha value, present in this order.

Each sample in this format is 8 bits, and each pixel is therefore 32 bits.

There are codedWidth * codedHeight * 4 samples (and therefore bytes) in the single plane, arranged starting at the top left in the image, in codedHeight lines of codedWidth pixels (four samples each).

RGBA's equivalent opaque format is RGBX.

RGBX

This format is composed of a single plane, that encodes four components: Red, Green, Blue, and a padding value, present in this order.

Each sample in this format is 8 bits. The fourth element in each pixel is to be ignored; the image is always fully opaque.

There are codedWidth * codedHeight * 4 samples (and therefore bytes) in the single plane, arranged starting at the top left in the image, in codedHeight lines of codedWidth pixels (four samples each).

BGRA

This format is composed of a single plane, that encodes four components: Blue, Green, Red, and an alpha value, present in this order.

Each sample in this format is 8 bits.

There are codedWidth * codedHeight * 4 samples (and therefore bytes) in the single plane, arranged starting at the top left in the image, in codedHeight lines of codedWidth pixels (four samples each).

BGRA's equivalent opaque format is BGRX.

BGRX

This format is composed of a single plane, that encodes four components: Blue, Green, Red, and a padding value, present in this order.

Each sample in this format is 8 bits. The fourth element in each pixel is to be ignored; the image is always fully opaque.

There are codedWidth * codedHeight * 4 samples (and therefore bytes) in the single plane, arranged starting at the top left in the image, in codedHeight lines of codedWidth pixels (four samples each).

9.9. Video Color Space Interface

[Exposed=(Window,DedicatedWorker)]
interface VideoColorSpace {
  constructor(optional VideoColorSpaceInit init = {});

  readonly attribute VideoColorPrimaries? primaries;
  readonly attribute VideoTransferCharacteristics? transfer;
  readonly attribute VideoMatrixCoefficients? matrix;
  readonly attribute boolean? fullRange;

  [Default] VideoColorSpaceInit toJSON();
};

dictionary VideoColorSpaceInit {
  VideoColorPrimaries primaries;
  VideoTransferCharacteristics transfer;
  VideoMatrixCoefficients matrix;
  boolean fullRange;
};

9.9.1. Internal Slots

[[primaries]]

The color primaries.

[[transfer]]

The transfer characteristics.

[[matrix]]

The matrix coefficients.

[[full range]]

Indicates whether full-range color values are used.

9.9.2. Constructors

VideoColorSpace(init)
  1. Let c be a new VideoColorSpace object, initialized as follows:

    1. If primaries is present in init, assign init.primaries to [[primaries]]. Otherwise, assign null to [[primaries]].

    2. If transfer is present in init, assign init.transfer to [[transfer]]. Otherwise, assign null to [[transfer]].

    3. If matrix is present in init, assign init.matrix to [[matrix]]. Otherwise, assign null to [[matrix]].

    4. If fullRange is present in init, assign init.fullRange to [[full range]]. Otherwise, assign null to [[full range]].

  2. Return c.

9.9.3. Attributes

primaries, of type VideoColorPrimaries, readonly, nullable

The primaries getter steps are to return the value of [[primaries]].

transfer, of type VideoTransferCharacteristics, readonly, nullable

The transfer getter steps are to return the value of [[transfer]].

matrix, of type VideoMatrixCoefficients, readonly, nullable

The matrix getter steps are to return the value of [[matrix]].

fullRange, of type boolean, readonly, nullable

The fullRange getter steps are to return the value of [[full range]].

9.10. Video Color Primaries

Color primaries describe the color gamut of video samples.
enum VideoColorPrimaries {
  "bt709",      // BT.709, sRGB
  "bt470bg",    // BT.601 PAL
  "smpte170m",  // BT.601 NTSC
};
bt709
Color primaries used by BT.709 and sRGB, as described by [H.273] section 8.1 table 2 value 1.
bt470bg
Color primaries used by BT.601 PAL, as described by [H.273] section 8.1 table 2 value 5.
smpte170m
Color primaries used by BT.601 NTSC, as described by [H.273] section 8.1 table 2 value 6.

9.11. Video Transfer Characteristics

Transfer characteristics describe the opto-electronic transfer characteristics of video samples.
enum VideoTransferCharacteristics {
  "bt709",         // BT.709
  "smpte170m",     // BT.601 (functionally the same as bt709)
  "iec61966-2-1",  // sRGB
};
bt709
Transfer characteristics used by BT.709, as described by [H.273] section 8.2 table 3 value 1.
smpte170m
Transfer characteristics used by BT.601, as described by [H.273] section 8.2 table 3 value 6.
iec61966-2-1
Transfer characteristics used by sRGB, as described by [H.273] section 8.2 table 3 value 13.

9.12. Video Matrix Coefficients

Matrix coefficients describe the relationship between sample component values and color coordinates.
enum VideoMatrixCoefficients {
  "rgb",        // sRGB
  "bt709",      // BT.709
  "bt470bg",    // BT.601 PAL
  "smpte170m",  // BT.601 NTSC (functionally the same as bt470bg)
};
rgb
Matrix coefficients used by sRGB, as described by [H.273] section 8.3 table 4 value 0.
bt709
Matrix coefficients used by BT.709, as described by [H.273] section 8.3 table 4 value 1.
bt470bg
Matrix coefficients used by BT.601 PAL, as described by [H.273] section 8.3 table 4 value 5.
smpte170m
Matrix coefficients used by BT.601 NTSC, as described by [H.273] section 8.3 table 4 value 6.
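
The following non-normative sketch collects the [H.273] code points cited in sections 9.10 through 9.12 into lookup tables. The table names and the colorSpaceFromCodePoints() helper are hypothetical and not part of WebCodecs. Code points without a listed enum value are mapped to null, mirroring the nullable attributes above.

```javascript
// Hypothetical lookup tables: H.273 code point -> WebCodecs enum string.
// Only the values listed in the enum definitions above are included.
const PRIMARIES_BY_CODE_POINT = new Map([
  [1, "bt709"],
  [5, "bt470bg"],
  [6, "smpte170m"],
]);
const TRANSFER_BY_CODE_POINT = new Map([
  [1, "bt709"],
  [6, "smpte170m"],
  [13, "iec61966-2-1"],
]);
const MATRIX_BY_CODE_POINT = new Map([
  [0, "rgb"],
  [1, "bt709"],
  [5, "bt470bg"],
  [6, "smpte170m"],
]);

// Builds a plain object shaped like the color space attributes
// (primaries, transfer, matrix, fullRange). Unknown code points become null.
function colorSpaceFromCodePoints(primaries, transfer, matrix, fullRange) {
  return {
    primaries: PRIMARIES_BY_CODE_POINT.get(primaries) ?? null,
    transfer: TRANSFER_BY_CODE_POINT.get(transfer) ?? null,
    matrix: MATRIX_BY_CODE_POINT.get(matrix) ?? null,
    fullRange: typeof fullRange === "boolean" ? fullRange : null,
  };
}
```

For instance, code points 1, 13, and 0 describe sRGB content under these tables.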

10. Image Decoding

10.1. Background

This section is non-normative.

Image codec definitions are typically accompanied by a definition for a corresponding file format. Hence image decoders often perform both duties of unpacking (demuxing) as well as decoding the encoded image data. The WebCodecs ImageDecoder follows this pattern, which motivates an interface design that is notably different from that of VideoDecoder and AudioDecoder.

In spite of these differences, ImageDecoder uses the same codec processing model as the other codec interfaces. Additionally, ImageDecoder uses the VideoFrame interface to describe decoded outputs.

10.2. ImageDecoder Interface

[Exposed=(Window,DedicatedWorker), SecureContext]
interface ImageDecoder {
  constructor(ImageDecoderInit init);

  readonly attribute DOMString type;
  readonly attribute boolean complete;
  readonly attribute Promise<undefined> completed;
  readonly attribute ImageTrackList tracks;

  Promise<ImageDecodeResult> decode(optional ImageDecodeOptions options = {});
  undefined reset();
  undefined close();

  static Promise<boolean> isTypeSupported(DOMString type);
};

10.2.1. Internal Slots

[[ImageTrackList]]

An ImageTrackList describing the tracks found in [[encoded data]].

[[type]]

A string reflecting the value of the MIME type given at construction.

[[complete]]

A boolean indicating whether [[encoded data]] is completely buffered.

[[completed promise]]

The promise used to signal when [[complete]] becomes true.

[[codec implementation]]

An underlying image decoder implementation provided by the User Agent.

[[encoded data]]

A byte sequence containing the encoded image data to be decoded.

[[prefer animation]]

A boolean reflecting the value of preferAnimation given at construction.

[[pending decode promises]]

A list of unresolved promises returned by calls to decode().

[[internal selected track index]]

Identifies the image track within [[encoded data]] that is used by decoding algorithms on the codec thread.

[[tracks established]]

A boolean indicating whether the track list has been established in [[ImageTrackList]].

[[closed]]

A boolean indicating that the ImageDecoder is in a permanent closed state and can no longer be used.

[[progressive frame generations]]

A mapping of frame indices to Progressive Image Frame Generations. The values represent the Progressive Image Frame Generation for the VideoFrame which was most recently output by a call to decode() with the given frame index.

10.2.2. Constructor

ImageDecoder(init)

NOTE: Calling decode() on the constructed ImageDecoder will trigger a NotSupportedError if the User Agent does not support type. Authors are encouraged to first check support by calling isTypeSupported() with type. User Agents don’t have to support any particular type.

When invoked, run these steps:

  1. If init is not a valid ImageDecoderInit, throw a TypeError.

  2. Let d be a new ImageDecoder object. In the steps below, all mentions of ImageDecoder members apply to d unless stated otherwise.

  3. Assign [[ImageTrackList]] a new ImageTrackList initialized as follows:

    1. Assign a new list to [[track list]].

    2. Assign -1 to [[selected index]].

  4. Assign init.type to [[type]].

  5. Assign null to [[codec implementation]].

  6. If init.preferAnimation exists, assign init.preferAnimation to the [[prefer animation]] internal slot. Otherwise, assign null to the [[prefer animation]] internal slot.

  7. Assign a new list to [[pending decode promises]].

  8. Assign -1 to [[internal selected track index]].

  9. Assign false to [[tracks established]].

  10. Assign false to [[closed]].

  11. Assign a new map to [[progressive frame generations]].

  12. If init’s data member is of type ReadableStream:

    1. Assign a new list to [[encoded data]].

    2. Assign false to [[complete]].

    3. Queue a control message to configure the image decoder with init.

    4. Let reader be the result of getting a reader for data.

    5. In parallel, perform the Fetch Stream Data Loop on d with reader.

  13. Otherwise:

    1. Assert that init.data is of type BufferSource.

    2. Assign a copy of init.data to [[encoded data]].

    3. Assign true to [[complete]].

    4. Resolve [[completed promise]].

    5. Queue a control message to configure the image decoder with init.

    6. Queue a control message to decode track metadata.

  14. Return d.

Running a control message to configure the image decoder means running these steps:

  1. Let supported be the result of running the Check Type Support algorithm with init.type.

  2. If supported is false, queue a task on the control thread event loop to run the Close ImageDecoder algorithm with a NotSupportedError DOMException and abort these steps.

  3. If supported is true, assign the [[codec implementation]] internal slot with an implementation supporting init.type.

  4. Configure [[codec implementation]] in accordance with the values given for premultiplyAlpha, colorSpaceConversion, desiredWidth, and desiredHeight.

Running a control message to decode track metadata means running these steps:

  1. Run the Establish Tracks algorithm.

10.2.3. Attributes

type, of type DOMString, readonly

A string reflecting the value of the MIME type given at construction.

The type getter steps are to return [[type]].

complete, of type boolean, readonly

Indicates whether [[encoded data]] is completely buffered.

The complete getter steps are to return [[complete]].

completed, of type Promise<undefined>, readonly

The promise used to signal when complete becomes true.

The completed getter steps are to return [[completed promise]].

tracks, of type ImageTrackList, readonly

Returns a live ImageTrackList, which provides metadata for the available tracks and a mechanism for selecting a track to decode.

The tracks getter steps are to return [[ImageTrackList]].

10.2.4. Methods

decode(options)

Enqueues a control message to decode the frame according to options.

When invoked, run these steps:

  1. If [[closed]] is true, return a Promise rejected with an InvalidStateError DOMException.

  2. If [[ImageTrackList]]'s [[selected index]] is '-1', return a Promise rejected with an InvalidStateError DOMException.

  3. If options is undefined, assign a new ImageDecodeOptions to options.

  4. Let promise be a new Promise.

  5. Queue a control message to decode the image with options, and promise.

  6. Append promise to [[pending decode promises]].

  7. Return promise.

Running a control message to decode the image means running these steps:

  1. Wait for [[tracks established]] to become true.

  2. If options.completeFramesOnly is false and the image is a Progressive Image for which the User Agent supports progressive decoding, run the Decode Progressive Frame algorithm with options.frameIndex and promise.

  3. Otherwise, run the Decode Complete Frame algorithm with options.frameIndex and promise.

reset()

Immediately aborts all pending work.

When invoked, run the Reset ImageDecoder algorithm with an AbortError DOMException.

close()

Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close ImageDecoder algorithm with an AbortError DOMException.

isTypeSupported(type)

Returns a promise indicating whether the provided type is supported by the User Agent.

When invoked, run these steps:

  1. If type is not a valid image MIME type, return a Promise rejected with a TypeError.

  2. Let p be a new Promise.

  3. In parallel, resolve p with the result of running the Check Type Support algorithm with type.

  4. Return p.
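
The following non-normative sketch shows one way an author might combine the methods above. The decodeFrame() helper and its Decoder parameter are hypothetical. The parameter defaults to the global ImageDecoder and exists only so that the function can be exercised with a stand-in where the API is unavailable.

```javascript
// Decodes one frame and returns the ImageDecodeResult.
async function decodeFrame(type, data, frameIndex = 0, Decoder = globalThis.ImageDecoder) {
  if (typeof Decoder !== "function") {
    throw new Error("ImageDecoder is not available in this environment");
  }
  // User Agents are not required to support any particular type.
  if (!(await Decoder.isTypeSupported(type))) {
    throw new Error(`Unsupported image type: ${type}`);
  }
  const decoder = new Decoder({ type, data });
  try {
    // decode() rejects while no track is selected, so wait for the track list.
    await decoder.tracks.ready;
    return await decoder.decode({ frameIndex });
  } finally {
    // close() releases system resources and is final.
    decoder.close();
  }
}
```

The caller owns the returned result and is expected to call close() on its image (a VideoFrame) once the frame is no longer needed.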

10.2.5. Algorithms

Fetch Stream Data Loop (with reader)

Run these steps:

  1. Let readRequest be the following read request.

    chunk steps, given chunk
    1. If [[closed]] is true, abort these steps.

    2. If chunk is not a Uint8Array object, queue a task on the control thread event loop to run the Close ImageDecoder algorithm with a DataError DOMException and abort these steps.

    3. Let bytes be the byte sequence represented by the Uint8Array object.

    4. Append bytes to the [[encoded data]] internal slot.

    5. If [[tracks established]] is false, run the Establish Tracks algorithm.

    6. Otherwise, run the Update Tracks algorithm.

    7. Run the Fetch Stream Data Loop algorithm with reader.

    close steps
    1. Assign true to [[complete]].

    2. Resolve [[completed promise]].

    error steps
    1. Queue a task on the control thread event loop to run the Close ImageDecoder algorithm with a NotReadableError DOMException.

  2. Read a chunk from reader given readRequest.
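
The following non-normative sketch approximates the loop above in script. The bufferEncodedData() helper and its onProgress callback are hypothetical. Where the algorithm closes the decoder with a DataError, the sketch throws instead. The reader argument is any object whose read() method resolves to { done, value }, such as a ReadableStreamDefaultReader.

```javascript
// Reads every chunk from `reader` and returns the concatenated bytes.
async function bufferEncodedData(reader, onProgress = () => {}) {
  const chunks = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break; // corresponds to the close steps
    if (!(value instanceof Uint8Array)) {
      // The algorithm closes the decoder with a DataError; this sketch throws.
      throw new TypeError("chunk is not a Uint8Array");
    }
    chunks.push(value);
    total += value.byteLength;
    onProgress(total); // where Establish Tracks or Update Tracks would run
  }
  // Concatenate the collected chunks into one byte sequence.
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return bytes;
}
```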

Establish Tracks

Run these steps:

  1. Assert [[tracks established]] is false.

  2. If [[encoded data]] does not contain enough data to determine the number of tracks:

    1. If complete is true, queue a task on the control thread event loop to run the Close ImageDecoder algorithm.

    2. Abort these steps.

  3. If the number of tracks is found to be 0, queue a task on the control thread event loop to run the Close ImageDecoder algorithm and abort these steps.

  4. Let newTrackList be a new list.

  5. For each image track found in [[encoded data]]:

    1. Let newTrack be a new ImageTrack, initialized as follows:

      1. Assign this to [[ImageDecoder]].

      2. Assign tracks to [[ImageTrackList]].

      3. If image track is found to be animated, assign true to newTrack’s [[animated]] internal slot. Otherwise, assign false.

      4. If image track is found to describe a frame count, assign that count to newTrack’s [[frame count]] internal slot. Otherwise, assign 0.

        NOTE: If this was constructed with data as a ReadableStream, the frameCount can change as additional bytes are appended to [[encoded data]]. See the Update Tracks algorithm.

      5. If image track is found to describe a repetition count, assign that count to newTrack’s [[repetition count]] internal slot. Otherwise, assign 0.

        NOTE: A value of Infinity indicates infinite repetitions.

      6. Assign false to newTrack’s [[selected]] internal slot.

    2. Append newTrack to newTrackList.

  6. Let selectedTrackIndex be the result of running the Get Default Selected Track Index algorithm with newTrackList.

  7. Let selectedTrack be the track at position selectedTrackIndex within newTrackList.

  8. Assign true to selectedTrack’s [[selected]] internal slot.

  9. Assign selectedTrackIndex to [[internal selected track index]].

  10. Assign true to [[tracks established]].

  11. Queue a task on the control thread event loop to perform the following steps:

    1. Assign newTrackList to the tracks [[track list]] internal slot.

    2. Assign selectedTrackIndex to tracks [[selected index]].

    3. Resolve [[ready promise]].

Get Default Selected Track Index (with trackList)

Run these steps:

  1. If [[encoded data]] identifies a Primary Image Track:

    1. Let primaryTrack be the ImageTrack from trackList that describes the Primary Image Track.

    2. Let primaryTrackIndex be position of primaryTrack within trackList.

    3. If [[prefer animation]] is null, return primaryTrackIndex.

    4. If primaryTrack.animated equals [[prefer animation]], return primaryTrackIndex.

  2. If any ImageTracks in trackList have animated equal to [[prefer animation]], return the position of the earliest such track in trackList.

  3. Return 0.
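
The following non-normative sketch restates the steps above as a function. The primaryIndex parameter is a modelling assumption: it is the position of the Primary Image Track within the list, or -1 when the encoded data does not identify one.

```javascript
// `tracks` is an array of objects with a boolean `animated` field.
// `preferAnimation` is true, false, or null when no preference was given.
function getDefaultSelectedTrackIndex(tracks, primaryIndex, preferAnimation) {
  if (primaryIndex !== -1) {
    // Step 1: the primary track wins unless it contradicts the preference.
    if (preferAnimation === null) return primaryIndex;
    if (tracks[primaryIndex].animated === preferAnimation) return primaryIndex;
  }
  // Step 2: earliest track whose animated flag matches the preference.
  const match = tracks.findIndex((t) => t.animated === preferAnimation);
  // Step 3: fall back to the first track.
  return match !== -1 ? match : 0;
}
```

For a file with a still primary track at position 0 and an animated track at position 1, a preference of true selects position 1, while no preference selects position 0.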

Update Tracks

A track update struct is a struct that consists of a track index (unsigned long) and a frame count (unsigned long).

Run these steps:

  1. Assert [[tracks established]] is true.

  2. Let trackChanges be a new list.

  3. Let trackList be a copy of tracks' [[track list]].

  4. For each track in trackList:

    1. Let trackIndex be the position of track in trackList.

    2. Let latestFrameCount be the frame count as indicated by [[encoded data]] for the track corresponding to track.

    3. Assert that latestFrameCount is greater than or equal to track.frameCount.

    4. If latestFrameCount is greater than track.frameCount:

      1. Let change be a track update struct whose track index is trackIndex and frame count is latestFrameCount.

      2. Append change to trackChanges.

  5. If trackChanges is empty, abort these steps.

  6. Queue a task on the control thread event loop to perform the following steps:

    1. For each update in trackChanges:

      1. Let updateTrack be the ImageTrack at position update.trackIndex within tracks' [[track list]].

      2. Assign update.frameCount to updateTrack’s [[frame count]].

      3. Fire a simple event named change at updateTrack.
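
The following non-normative sketch models the two halves of the algorithm above with plain arrays. The computeTrackChanges() and applyTrackChanges() helpers are hypothetical, and the onChange callback stands in for firing the change event.

```javascript
// `knownFrameCounts[i]` is the frame count currently exposed for track i;
// `latestFrameCounts[i]` is the count now indicated by the buffered data.
function computeTrackChanges(knownFrameCounts, latestFrameCounts) {
  const trackChanges = [];
  knownFrameCounts.forEach((known, trackIndex) => {
    const latest = latestFrameCounts[trackIndex];
    // The algorithm asserts that frame counts never decrease.
    if (latest < known) throw new RangeError("frame count must not decrease");
    if (latest > known) trackChanges.push({ trackIndex, frameCount: latest });
  });
  return trackChanges;
}

// Applies the collected updates, as the queued task would.
function applyTrackChanges(tracks, trackChanges, onChange = () => {}) {
  for (const { trackIndex, frameCount } of trackChanges) {
    tracks[trackIndex].frameCount = frameCount;
    onChange(trackIndex);
  }
}
```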

Decode Complete Frame (with frameIndex and promise)
  1. Assert that [[tracks established]] is true.

  2. Assert that [[internal selected track index]] is not -1.

  3. Let encodedFrame be the encoded frame identified by frameIndex and [[internal selected track index]].

  4. Wait for any of the following conditions to be true (whichever happens first):

    1. [[encoded data]] contains enough bytes to completely decode encodedFrame.

    2. [[encoded data]] is found to be malformed.

    3. complete is true.

    4. [[closed]] is true.

  5. If [[encoded data]] is found to be malformed, run the Fatally Reject Bad Data algorithm and abort these steps.

  6. If [[encoded data]] does not contain enough bytes to completely decode encodedFrame, run the Reject Infeasible Decode algorithm with promise and abort these steps.

  7. Attempt to use [[codec implementation]] to decode encodedFrame.

  8. If decoding produces an error, run the Fatally Reject Bad Data algorithm and abort these steps.

  9. If [[progressive frame generations]] contains an entry keyed by frameIndex, remove the entry from the map.

  10. Let output be the decoded image data emitted by [[codec implementation]] corresponding to encodedFrame.

  11. Let decodeResult be a new ImageDecodeResult initialized as follows:

    1. Assign 'true' to complete.

    2. Let timestamp and duration be the presentation timestamp and duration for output as described by encodedFrame. If encodedFrame does not describe a timestamp or duration, assign null to the corresponding variable.

    3. Assign image with the result of running the Create a VideoFrame algorithm with output, timestamp, and duration.

  12. Run the Resolve Decode algorithm with promise and decodeResult.

Decode Progressive Frame (with frameIndex and promise)
  1. Assert that [[tracks established]] is true.

  2. Assert that [[internal selected track index]] is not -1.

  3. Let encodedFrame be the encoded frame identified by frameIndex and [[internal selected track index]].

  4. Let lastFrameGeneration be null.

  5. If [[progressive frame generations]] contains a map entry with the key frameIndex, assign the value of the map entry to lastFrameGeneration.

  6. Wait for any of the following conditions to be true (whichever happens first):

    1. [[encoded data]] contains enough bytes to decode encodedFrame to produce an output whose Progressive Image Frame Generation exceeds lastFrameGeneration.

    2. [[encoded data]] is found to be malformed.

    3. complete is true.

    4. [[closed]] is true.

  7. If [[encoded data]] is found to be malformed, run the Fatally Reject Bad Data algorithm and abort these steps.

  8. Otherwise, if [[encoded data]] does not contain enough bytes to decode encodedFrame to produce an output whose Progressive Image Frame Generation exceeds lastFrameGeneration, run the Reject Infeasible Decode algorithm with promise and abort these steps.

  9. Attempt to use [[codec implementation]] to decode encodedFrame.

  10. If decoding produces an error, run the Fatally Reject Bad Data algorithm and abort these steps.

  11. Let output be the decoded image data emitted by [[codec implementation]] corresponding to encodedFrame.

  12. Let decodeResult be a new ImageDecodeResult.

  13. If output is the final full-detail progressive output corresponding to encodedFrame:

    1. Assign true to decodeResult’s complete.

    2. If [[progressive frame generations]] contains an entry keyed by frameIndex, remove the entry from the map.

  14. Otherwise:

    1. Assign false to decodeResult’s complete.

    2. Let frameGeneration be the Progressive Image Frame Generation for output.

    3. Add a new entry to [[progressive frame generations]] with key frameIndex and value frameGeneration.

  15. Let timestamp and duration be the presentation timestamp and duration for output as described by encodedFrame. If encodedFrame does not describe a timestamp or duration, assign null to the corresponding variable.

  16. Assign to decodeResult’s image the result of running the Create a VideoFrame algorithm with output, timestamp, and duration.

  17. Remove promise from [[pending decode promises]].

  18. Resolve promise with decodeResult.

Resolve Decode (with promise and result)
  1. Queue a task on the control thread event loop to run these steps:

    1. If [[closed]], abort these steps.

    2. Assert that promise is an element of [[pending decode promises]].

    3. Remove promise from [[pending decode promises]].

    4. Resolve promise with result.

Reject Infeasible Decode (with promise)
  1. Assert that complete is true or [[closed]] is true.

  2. If complete is true, let exception be a RangeError. Otherwise, let exception be an InvalidStateError DOMException.

  3. Queue a task on the control thread event loop to run these steps:

    1. If [[closed]], abort these steps.

    2. Assert that promise is an element of [[pending decode promises]].

    3. Remove promise from [[pending decode promises]].

    4. Reject promise with exception.

Fatally Reject Bad Data
  1. Queue a task on the control thread event loop to run these steps:

    1. If [[closed]], abort these steps.

    2. Run the Close ImageDecoder algorithm with an EncodingError DOMException.

Check Type Support (with type)
  1. If the User Agent can provide a codec to support decoding type, return true.

  2. Otherwise, return false.

Reset ImageDecoder (with exception)
  1. Signal [[codec implementation]] to abort any active decoding operation.

  2. For each decodePromise in [[pending decode promises]]:

    1. Reject decodePromise with exception.

    2. Remove decodePromise from [[pending decode promises]].

Close ImageDecoder (with exception)
  1. Run the Reset ImageDecoder algorithm with exception.

  2. Assign true to [[closed]].

  3. Clear [[codec implementation]] and release associated system resources.

  4. Remove all entries from [[ImageTrackList]].

  5. Assign -1 to [[ImageTrackList]]'s [[selected index]].

10.3. ImageDecoderInit Interface

typedef (BufferSource or ReadableStream) ImageBufferSource;
dictionary ImageDecoderInit {
  required DOMString type;
  required ImageBufferSource data;
  PremultiplyAlpha premultiplyAlpha = "default";
  ColorSpaceConversion colorSpaceConversion = "default";
  [EnforceRange] unsigned long desiredWidth;
  [EnforceRange] unsigned long desiredHeight;
  boolean preferAnimation;
};

To determine if an ImageDecoderInit is a valid ImageDecoderInit, run these steps:

  1. If type is not a valid image MIME type, return false.

  2. If data is of type ReadableStream and the ReadableStream is disturbed or locked, return false.

  3. If data is of type BufferSource:

    1. If the result of running IsDetachedBuffer (described in [ECMASCRIPT]) on data is true, return false.

    2. If data is empty, return false.

  4. If desiredWidth exists and desiredHeight does not exist, return false.

  5. If desiredHeight exists and desiredWidth does not exist, return false.

  6. Return true.

A valid image MIME type is a string that is a valid MIME type string and for which the type, per Section 3.1.1.1 of [RFC7231], is image.
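
The following non-normative sketch approximates the validity steps above in script. It rests on several assumptions: the MIME check is a simplified pattern that does not accept parameters, a stream is recognized by the presence of a getReader() method, and only the locked state of a stream is checked because script cannot observe whether a stream is disturbed. The isValidImageDecoderInit() name is hypothetical.

```javascript
function isValidImageDecoderInit(init) {
  // Simplified MIME check (assumption): "image/" followed by token characters.
  if (typeof init.type !== "string" || !/^image\/[\w!#$&^.+-]+$/i.test(init.type)) {
    return false;
  }
  const data = init.data;
  if (data && typeof data.getReader === "function") {
    // ReadableStream: a locked stream cannot be read by the decoder.
    if (data.locked) return false;
  } else if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
    // A detached buffer reports byteLength 0, so this also rejects it.
    if (data.byteLength === 0) return false;
  } else {
    return false;
  }
  // desiredWidth and desiredHeight have to be given together or not at all.
  const hasWidth = init.desiredWidth !== undefined;
  const hasHeight = init.desiredHeight !== undefined;
  return hasWidth === hasHeight;
}
```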

type, of type DOMString

String containing the MIME type of the image file to be decoded.

data, of type ImageBufferSource

BufferSource or ReadableStream of bytes representing an encoded image file as described by type.

premultiplyAlpha, of type PremultiplyAlpha, defaulting to "default"

Controls whether decoded outputs' color channels are to be premultiplied by their alpha channel, as defined by premultiplyAlpha in ImageBitmapOptions.

colorSpaceConversion, of type ColorSpaceConversion, defaulting to "default"

Controls whether decoded outputs' color space is converted or ignored, as defined by colorSpaceConversion in ImageBitmapOptions.

desiredWidth, of type unsigned long

Indicates a desired width for decoded outputs. Implementation is best effort; decoding to a desired width MAY not be supported by all formats/decoders.

desiredHeight, of type unsigned long

Indicates a desired height for decoded outputs. Implementation is best effort; decoding to a desired height MAY not be supported by all formats/decoders.

preferAnimation, of type boolean

For images with multiple tracks, this indicates whether the initial track selection SHOULD prefer an animated track.

NOTE: See the Get Default Selected Track Index algorithm.

10.4. ImageDecodeOptions Interface

dictionary ImageDecodeOptions {
  [EnforceRange] unsigned long frameIndex = 0;
  boolean completeFramesOnly = true;
};

frameIndex, of type unsigned long, defaulting to 0

The index of the frame to decode.

completeFramesOnly, of type boolean, defaulting to true

For Progressive Images, a value of false indicates that the decoder MAY output an image with reduced detail. Each subsequent call to decode() for the same frameIndex will resolve to produce an image with a higher Progressive Image Frame Generation (more image detail) than the previous call, until finally the full-detail image is produced.

If completeFramesOnly is assigned true, or if the image is not a Progressive Image, or if the User Agent does not support progressive decoding for the given image type, calls to decode() will only resolve once the full detail image is decoded.

NOTE: For Progressive Images, setting completeFramesOnly to false can be used to offer users a preview of an image that is still being buffered from the network (via the data ReadableStream).

Upon decoding the full detail image, the ImageDecodeResult's complete will be set to true.
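
The following non-normative sketch shows how an author might use completeFramesOnly to display previews. The decodeProgressively() helper and its onPreview callback are hypothetical. The decoder argument is any object with an ImageDecoder-style decode() method.

```javascript
// Repeatedly decodes one frame with completeFramesOnly set to false and
// hands each reduced-detail result to `onPreview` until the full-detail
// image arrives, which is then returned.
async function decodeProgressively(decoder, frameIndex, onPreview) {
  for (;;) {
    const result = await decoder.decode({ frameIndex, completeFramesOnly: false });
    if (result.complete) return result;
    onPreview(result);
  }
}
```

Each call resolves with a higher Progressive Image Frame Generation than the previous one, so the loop terminates once complete is true. A rejected decode() promise propagates to the caller. In this sketch onPreview takes ownership of each intermediate image and is expected to close it after drawing.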

10.5. ImageDecodeResult Interface

dictionary ImageDecodeResult {
  required VideoFrame image;
  required boolean complete;
};

image, of type VideoFrame

The decoded image.

complete, of type boolean

Indicates whether image contains the final full-detail output.

NOTE: complete is always true when decode() is invoked with completeFramesOnly set to true.

10.6. ImageTrackList Interface

[Exposed=(Window,DedicatedWorker)]
interface ImageTrackList {
  getter ImageTrack (unsigned long index);

  readonly attribute Promise<undefined> ready;
  readonly attribute unsigned long length;
  readonly attribute long selectedIndex;
  readonly attribute ImageTrack? selectedTrack;
};

10.6.1. Internal Slots

[[ready promise]]

The promise used to signal when the ImageTrackList has been populated with ImageTracks.

NOTE: ImageTrack frameCount can receive subsequent updates until complete is true.

[[track list]]

The list of ImageTracks described by this ImageTrackList.

[[selected index]]

The index of the selected track in [[track list]]. A value of -1 indicates that no track is selected.

10.6.2. Attributes

ready, of type Promise<undefined>, readonly

The ready getter steps are to return the [[ready promise]].

length, of type unsigned long, readonly

The length getter steps are to return the length of [[track list]].

selectedIndex, of type long, readonly

The selectedIndex getter steps are to return [[selected index]].

selectedTrack, of type ImageTrack, readonly, nullable

The selectedTrack getter steps are:

  1. If [[selected index]] is -1, return null.

  2. Otherwise, return the ImageTrack from [[track list]] at the position indicated by [[selected index]].

10.7. ImageTrack Interface

[Exposed=(Window,DedicatedWorker)]
interface ImageTrack : EventTarget {
  readonly attribute boolean animated;
  readonly attribute unsigned long frameCount;
  readonly attribute unrestricted float repetitionCount;
  attribute EventHandler onchange;
  attribute boolean selected;
};

10.7.1. Internal Slots

[[ImageDecoder]]

The ImageDecoder instance that constructed this ImageTrack.

[[ImageTrackList]]

The ImageTrackList instance that lists this ImageTrack.

[[animated]]

Indicates whether this track contains an animated image with multiple frames.

[[frame count]]

The number of frames in this track.

[[repetition count]]

The number of times the animation is intended to repeat.

[[selected]]

Indicates whether this track is selected for decoding.

10.7.2. Attributes

animated, of type boolean, readonly

The animated getter steps are to return the value of [[animated]].

NOTE: This attribute provides an early indication that frameCount will ultimately exceed 0 for images where the frameCount starts at 0 and later increments as new chunks of the ReadableStream data arrive.

frameCount, of type unsigned long, readonly

The frameCount getter steps are to return the value of [[frame count]].

repetitionCount, of type unrestricted float, readonly

The repetitionCount getter steps are to return the value of [[repetition count]].

onchange, of type EventHandler

An event handler IDL attribute whose event handler event type is change.

selected, of type boolean

The selected getter steps are to return the value of [[selected]].

The selected setter steps are:

  1. If [[ImageDecoder]]'s [[closed]] slot is true, abort these steps.

  2. Let newValue be the given value.

  3. If newValue equals [[selected]], abort these steps.

  4. Assign newValue to [[selected]].

  5. Let parentTrackList be [[ImageTrackList]].

  6. Let oldSelectedIndex be the value of parentTrackList [[selected index]].

  7. If oldSelectedIndex is not -1:

    1. Let oldSelectedTrack be the ImageTrack in parentTrackList [[track list]] at the position of oldSelectedIndex.

    2. Assign false to oldSelectedTrack [[selected]].

  8. If newValue is true, let selectedIndex be the index of this ImageTrack within parentTrackList’s [[track list]]. Otherwise, let selectedIndex be -1.

  9. Assign selectedIndex to parentTrackList [[selected index]].

  10. Run the Reset ImageDecoder algorithm on [[ImageDecoder]].

  11. Queue a control message to [[ImageDecoder]]'s control message queue to update the internal selected track index with selectedIndex.

Running a control message to update the internal selected track index means running these steps:

  1. Assign selectedIndex to [[internal selected track index]].

  2. Remove all entries from [[progressive frame generations]].
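
The following non-normative sketch models the bookkeeping performed by the selected setter using plain objects. The setTrackSelected() helper and the shape of its list argument are hypothetical. The closed check, the decoder reset, and the queued control message are not modelled.

```javascript
// `list` models an ImageTrackList: { tracks: [{ selected }], selectedIndex }.
// Returns the new selected index, or null when nothing changed.
function setTrackSelected(list, index, newValue) {
  const track = list.tracks[index];
  if (track.selected === newValue) return null; // step 3
  track.selected = newValue; // step 4
  if (list.selectedIndex !== -1) {
    // step 7: deselect the previously selected track
    list.tracks[list.selectedIndex].selected = false;
  }
  // steps 8 and 9: -1 means that no track is selected
  list.selectedIndex = newValue ? index : -1;
  return list.selectedIndex;
}
```

Selecting a track therefore deselects the previous one, and deselecting the current track leaves no track selected, in which case decode() rejects with an InvalidStateError.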

10.7.3. Event Summary

change

Fired at the ImageTrack when the frameCount is altered.

11. Resource Reclamation

When resources are constrained, a User Agent MAY proactively reclaim codecs. This is particularly true in the case where hardware codecs are limited, and shared across web pages or platform apps.

To reclaim a codec, a User Agent MUST run the appropriate close algorithm (amongst Close AudioDecoder, Close AudioEncoder, Close VideoDecoder and Close VideoEncoder) with a QuotaExceededError DOMException.

The rules governing when a codec may be reclaimed depend on whether the codec is an active or inactive codec and/or a background codec.

An active codec is a codec that has received a call to encode(), decode(), configure(), flush() or reset() in the past 10 seconds, or has called its output() callback in the past 10 seconds. Additionally, VideoEncoders are considered active if they are making progress in encoding queued VideoFrames.

NOTE: Encoding large VideoFrames can take more than 10 seconds per frame. The special case for VideoEncoders ensures that they are not reclaimed if more than 10 seconds elapse between successive output() callbacks.

An inactive codec is any codec that does not meet the definition of an active codec.

A background codec is a codec whose ownerDocument (or owner set's Document, for codecs in workers) has a hidden attribute equal to true.

A User Agent MUST only reclaim a codec that is either an inactive codec, a background codec, or both. A User Agent MUST NOT reclaim a codec that is both active and in the foreground, i.e. not a background codec.
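
The following non-normative sketch expresses the rule above as a predicate. The codec argument is a hypothetical bookkeeping record, not a WebCodecs object, and its field names are assumptions. The sketch does not model any further restrictions on reclaiming active background codecs.

```javascript
const ACTIVITY_WINDOW_MS = 10000; // the 10 second window used to define an active codec

// `codec` fields (all hypothetical):
//   lastActivityMs     - time of the last control method call or output() callback
//   encodingInProgress - true for a VideoEncoder making progress on queued frames
//   documentHidden     - the hidden attribute of the owning Document
function isActiveCodec(codec, nowMs) {
  return nowMs - codec.lastActivityMs <= ACTIVITY_WINDOW_MS || codec.encodingInProgress === true;
}

// A codec may be reclaimed only if it is inactive, in the background, or both.
function mayReclaim(codec, nowMs) {
  const inactive = !isActiveCodec(codec, nowMs);
  const background = codec.documentHidden === true;
  return inactive || background;
}
```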

Additionally, User Agents MUST NOT reclaim an active background codec if it is:

12. Security Considerations

The primary security impact is that features of this API make it easier for an attacker to exploit vulnerabilities in the underlying platform codecs. Additionally, new abilities to configure and control the codecs MAY allow for new exploits that rely on a specific configuration and/or sequence of control operations.

Platform codecs are historically an internal detail of APIs like HTMLMediaElement, [WEBAUDIO], and [WebRTC]. In this way, it has always been possible to attack the underlying codecs by using malformed media files/streams and invoking the various API control methods.

For example, you can send any stream to a decoder by first wrapping that stream in a media container (e.g. mp4) and setting that as the src of an HTMLMediaElement. You can then cause the underlying video decoder to be reset() by setting a new value for <video>.currentTime.

WebCodecs makes such attacks easier by exposing low-level control over when inputs are provided and direct access to invoke the codec control methods. This also affords attackers the ability to invoke sequences of control methods that were not previously possible via the higher level APIs.

User Agents SHOULD mitigate this risk by extensively fuzzing their implementation with random inputs and control method invocations. Additionally, User Agents are encouraged to isolate their underlying codecs in processes with restricted privileges (sandbox) as a barrier against successful exploits being able to read user data.

An additional concern is exposing the underlying codecs to input mutation race conditions. Specifically, it SHOULD NOT be possible for a site to mutate a codec input or output while the underlying codec MAY still be operating on that data. This concern is mitigated by ensuring that input and output interfaces are immutable.

13. Privacy Considerations

The primary privacy impact is an increased ability to fingerprint users by querying for different codec capabilities to establish a codec feature profile. Much of this profile is already exposed by existing APIs. Such profiles are very unlikely to be uniquely identifying, but MAY be used with other metrics to create a fingerprint.

An attacker MAY accumulate a codec feature profile by calling isConfigSupported() methods with a number of different configuration dictionaries. Similarly, an attacker MAY attempt to configure() a codec with different configuration dictionaries and observe which configurations are accepted.

Attackers MAY also use existing APIs to establish much of the codec feature profile. For example, the [media-capabilities] decodingInfo() API describes what types of decoders are supported and its powerEfficient attribute MAY signal when a decoder uses hardware acceleration. Similarly, the [WebRTC] getCapabilities() API MAY be used to determine what types of encoders are supported and the getStats() API MAY be used to determine when an encoder uses hardware acceleration. WebCodecs will expose some additional information in the form of low level codec features.

A codec feature profile alone is unlikely to be uniquely identifying. Underlying codecs are often implemented entirely in software (be it part of the User Agent binary or part of the operating system), such that all users who run that software will have a common set of capabilities. Additionally, underlying codecs are often implemented with hardware acceleration, but such hardware is mass produced and devices of a particular class and manufacture date (e.g. flagship phones manufactured in 2020) will often have common capabilities. There will be outliers (some users MAY run outdated versions of software codecs or use a rare mix of custom assembled hardware), but most of the time a given codec feature profile is shared by a large group of users.

Segmenting groups of users by codec feature profile still amounts to a bit of entropy that can be combined with other metrics to uniquely identify a user. User Agents MAY partially mitigate this by returning an error whenever a site attempts to exhaustively probe for codec capabilities. Additionally, User Agents MAY implement a "privacy budget", which depletes as authors use WebCodecs and other identifying APIs. Upon exhaustion of the privacy budget, the User Agent could reduce codec capabilities to a common baseline or prompt for user approval.
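The privacy budget idea can be modelled with a very small sketch. Everything in it is a hypothetical illustration: the PrivacyBudget class, the cost of one unit per query, and the baseline set are assumptions made for the example, not behaviour defined by this or any other specification, and a real User Agent would implement such a mechanism internally rather than in script.

```javascript
// Hypothetical model only: the class name, costs, and baseline set are
// illustrative assumptions.
class PrivacyBudget {
  constructor(total) {
    this.remaining = total;
  }

  // Deducts `cost` and returns true while the budget can cover it;
  // returns false (without deducting) once it cannot.
  trySpend(cost) {
    if (cost > this.remaining) return false;
    this.remaining -= cost;
    return true;
  }
}

// Answers a capability query precisely while budget remains, and falls
// back to a common baseline once the budget is exhausted.
function answerSupportQuery(budget, codec, actualSupport, baselineSupport) {
  const source = budget.trySpend(1) ? actualSupport : baselineSupport;
  return source.has(codec);
}
```

With this model, a codec outside the baseline is reported as supported only until the budget runs out, while baseline codecs keep working for every site.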

14. Best Practices for Authors Using WebCodecs

This section is non-normative.

While WebCodecs internally operates on background threads, authors working with realtime media or in contended main thread environments are encouraged to ensure their media pipelines operate in worker contexts entirely independent of the main thread where possible. For example, realtime media processing of VideoFrames is generally done in a worker context.
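One possible shape for such a worker-side pipeline is sketched here. The function name installDecoder, the handleFrame callback, and the message shapes ("configure", "chunk", "close") are hypothetical choices made for this example; only VideoDecoder, EncodedVideoChunk, and their methods come from the IDL Index. The worker global is passed in as scope so the wiring can be exercised outside a worker.

```javascript
// Sketch of a worker-side decode loop. `scope` is the worker global
// (`self` in a DedicatedWorker); `handleFrame` is a hypothetical
// application callback. The message shapes are assumptions.
function installDecoder(scope, handleFrame) {
  const decoder = new scope.VideoDecoder({
    output: (frame) => {
      try {
        handleFrame(frame);
      } finally {
        frame.close(); // release the frame's resources promptly
      }
    },
    error: (e) => scope.postMessage({ type: "error", message: e.message }),
  });

  scope.onmessage = ({ data }) => {
    if (data.type === "configure") {
      decoder.configure(data.config);
    } else if (data.type === "chunk") {
      decoder.decode(new scope.EncodedVideoChunk(data.init));
    } else if (data.type === "close") {
      decoder.close();
    }
  };
  return decoder;
}

// Inside a worker script:
// installDecoder(self, (frame) => { /* process or render the frame */ });
```

The main thread then only posts configuration and encoded data to the worker, so decoding and frame handling are not delayed by main thread contention.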

The main thread has significant potential for high contention and jank that can go unnoticed in development, yet degrade inconsistently across devices and User Agents in the field, potentially with a dramatic impact on the end user experience. Ensuring the media pipeline is decoupled from the main thread helps provide a smooth experience for end users.

Authors using the main thread for their media pipeline ought to be sure of their target frame rates, main thread workload, how their application will be embedded, and the class of devices their users will be using.
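As a rough aid for this kind of assessment, the per-frame time budget implied by a target frame rate can be compared with the expected main thread workload. The sketch below is illustrative only; the function names and the idea of summing two measured costs are assumptions for the example, and real workloads vary from frame to frame and across devices.

```javascript
// Time available per frame, in milliseconds, at the target frame rate.
function frameBudgetMs(targetFps) {
  if (!Number.isFinite(targetFps) || targetFps <= 0) {
    throw new RangeError("targetFps must be a positive, finite number");
  }
  return 1000 / targetFps;
}

// Returns true if the measured per-frame costs (in milliseconds) of other
// main thread work plus the media pipeline fit within the frame budget.
function fitsFrameBudget(targetFps, otherMainThreadWorkMs, pipelineWorkMs) {
  return otherMainThreadWorkMs + pipelineWorkMs <= frameBudgetMs(targetFps);
}
```

For instance, at 50 frames per second the budget is 20 ms, so 5 ms of other work plus 10 ms of pipeline work fits, while 15 ms plus 10 ms does not.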

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[CSS-IMAGES-3]
Tab Atkins Jr.; Elika Etemad; Lea Verou. CSS Images Module Level 3. 17 December 2020. CR. URL: https://www.w3.org/TR/css-images-3/
[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[GEOMETRY-1]
Simon Pieters; Chris Harrelson. Geometry Interfaces Module Level 1. 4 December 2018. CR. URL: https://www.w3.org/TR/geometry-1/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[MEDIA-CAPABILITIES]
Mounir Lamouri; Chris Cunningham; Vi Nguyen. Media Capabilities. 11 January 2022. WD. URL: https://www.w3.org/TR/media-capabilities/
[MEDIASTREAM-RECORDING]
Miguel Casas-sanchez; James Barnett; Travis Leithead. MediaStream Recording. 4 June 2021. WD. URL: https://www.w3.org/TR/mediastream-recording/
[MIMESNIFF]
Gordon P. Hemsley. MIME Sniffing Standard. Living Standard. URL: https://mimesniff.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[STREAMS]
Adam Rice; et al. Streams Standard. Living Standard. URL: https://streams.spec.whatwg.org/
[SVG2]
Amelia Bellamy-Royds; et al. Scalable Vector Graphics (SVG) 2. 4 October 2018. CR. URL: https://www.w3.org/TR/SVG2/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/
[WebRTC]
Cullen Jennings; Henrik Boström; Jan-Ivar Bruaroey. WebRTC 1.0: Real-Time Communication Between Browsers. 26 January 2021. REC. URL: https://www.w3.org/TR/webrtc/
[WebRTC-SVC]
Bernard Aboba. Scalable Video Coding (SVC) Extension for WebRTC. 3 March 2022. WD. URL: https://www.w3.org/TR/webrtc-svc/

Informative References

[H.273]
Coding-independent code points for video signal type identification. December 2016. URL: https://www.itu.int/rec/T-REC-H.273/en
[MEDIA-SOURCE]
Matthew Wolenetz; et al. Media Source Extensions™. 17 November 2016. REC. URL: https://www.w3.org/TR/media-source/
[RFC6381]
R. Gellens; D. Singer; P. Frojdh. The 'Codecs' and 'Profiles' Parameters for "Bucket" Media Types. August 2011. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc6381
[RFC7231]
R. Fielding, Ed.; J. Reschke, Ed. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. June 2014. Proposed Standard. URL: https://httpwg.org/specs/rfc7231.html
[WEBAUDIO]
Paul Adenot; Hongchan Choi. Web Audio API. 17 June 2021. REC. URL: https://www.w3.org/TR/webaudio/
[WEBCODECS-CODEC-REGISTRY]
Chris Cunningham; Paul Adenot; Bernard Aboba. WebCodecs Codec Registry. 21 March 2022. NOTE. URL: https://www.w3.org/TR/webcodecs-codec-registry/

IDL Index

[Exposed=(Window,DedicatedWorker), SecureContext]
interface AudioDecoder {
  constructor(AudioDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long decodeQueueSize;

  undefined configure(AudioDecoderConfig config);
  undefined decode(EncodedAudioChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioDecoderSupport> isConfigSupported(AudioDecoderConfig config);
};

dictionary AudioDecoderInit {
  required AudioDataOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback AudioDataOutputCallback = undefined(AudioData output);

[Exposed=(Window,DedicatedWorker), SecureContext]
interface VideoDecoder {
  constructor(VideoDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long decodeQueueSize;

  undefined configure(VideoDecoderConfig config);
  undefined decode(EncodedVideoChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoDecoderSupport> isConfigSupported(VideoDecoderConfig config);
};

dictionary VideoDecoderInit {
  required VideoFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback VideoFrameOutputCallback = undefined(VideoFrame output);

[Exposed=(Window,DedicatedWorker), SecureContext]
interface AudioEncoder {
  constructor(AudioEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long encodeQueueSize;

  undefined configure(AudioEncoderConfig config);
  undefined encode(AudioData data);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioEncoderSupport> isConfigSupported(AudioEncoderConfig config);
};

dictionary AudioEncoderInit {
  required EncodedAudioChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedAudioChunkOutputCallback =
    undefined (EncodedAudioChunk output,
               optional EncodedAudioChunkMetadata metadata = {});

dictionary EncodedAudioChunkMetadata {
  AudioDecoderConfig decoderConfig;
};

[Exposed=(Window,DedicatedWorker), SecureContext]
interface VideoEncoder {
  constructor(VideoEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute unsigned long encodeQueueSize;

  undefined configure(VideoEncoderConfig config);
  undefined encode(VideoFrame frame, optional VideoEncoderEncodeOptions options = {});
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoEncoderSupport> isConfigSupported(VideoEncoderConfig config);
};

dictionary VideoEncoderInit {
  required EncodedVideoChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedVideoChunkOutputCallback =
    undefined (EncodedVideoChunk chunk,
               optional EncodedVideoChunkMetadata metadata = {});

dictionary EncodedVideoChunkMetadata {
  VideoDecoderConfig decoderConfig;
  SvcOutputMetadata svc;
  BufferSource alphaSideData;
};

dictionary SvcOutputMetadata {
  unsigned long temporalLayerId;
};

dictionary AudioDecoderSupport {
  boolean supported;
  AudioDecoderConfig config;
};

dictionary VideoDecoderSupport {
  boolean supported;
  VideoDecoderConfig config;
};

dictionary AudioEncoderSupport {
  boolean supported;
  AudioEncoderConfig config;
};

dictionary VideoEncoderSupport {
  boolean supported;
  VideoEncoderConfig config;
};

dictionary AudioDecoderConfig {
  required DOMString codec;
  [EnforceRange] required unsigned long sampleRate;
  [EnforceRange] required unsigned long numberOfChannels;
  BufferSource description;
};

dictionary VideoDecoderConfig {
  required DOMString codec;
  BufferSource description;
  [EnforceRange] unsigned long codedWidth;
  [EnforceRange] unsigned long codedHeight;
  [EnforceRange] unsigned long displayAspectWidth;
  [EnforceRange] unsigned long displayAspectHeight;
  VideoColorSpaceInit colorSpace;
  HardwareAcceleration hardwareAcceleration = "no-preference";
  boolean optimizeForLatency;
};

dictionary AudioEncoderConfig {
  required DOMString codec;
  [EnforceRange] unsigned long sampleRate;
  [EnforceRange] unsigned long numberOfChannels;
  [EnforceRange] unsigned long long bitrate;
};

dictionary VideoEncoderConfig {
  required DOMString codec;
  [EnforceRange] required unsigned long width;
  [EnforceRange] required unsigned long height;
  [EnforceRange] unsigned long displayWidth;
  [EnforceRange] unsigned long displayHeight;
  [EnforceRange] unsigned long long bitrate;
  [EnforceRange] double framerate;
  HardwareAcceleration hardwareAcceleration = "no-preference";
  AlphaOption alpha = "discard";
  DOMString scalabilityMode;
  BitrateMode bitrateMode = "variable";
  LatencyMode latencyMode = "quality";
};

enum HardwareAcceleration {
  "no-preference",
  "prefer-hardware",
  "prefer-software",
};

enum AlphaOption {
  "keep",
  "discard",
};

enum LatencyMode {
  "quality",
  "realtime"
};

dictionary VideoEncoderEncodeOptions {
  boolean keyFrame = false;
};

enum CodecState {
  "unconfigured",
  "configured",
  "closed"
};

callback WebCodecsErrorCallback = undefined(DOMException error);

[Exposed=(Window,DedicatedWorker)]
interface EncodedAudioChunk {
  constructor(EncodedAudioChunkInit init);
  readonly attribute EncodedAudioChunkType type;
  readonly attribute long long timestamp;          // microseconds
  readonly attribute unsigned long long? duration; // microseconds
  readonly attribute unsigned long byteLength;

  undefined copyTo([AllowShared] BufferSource destination);
};

dictionary EncodedAudioChunkInit {
  required EncodedAudioChunkType type;
  [EnforceRange] required long long timestamp;    // microseconds
  [EnforceRange] unsigned long long duration;     // microseconds
  required BufferSource data;
};

enum EncodedAudioChunkType {
    "key",
    "delta",
};

[Exposed=(Window,DedicatedWorker)]
interface EncodedVideoChunk {
  constructor(EncodedVideoChunkInit init);
  readonly attribute EncodedVideoChunkType type;
  readonly attribute long long timestamp;             // microseconds
  readonly attribute unsigned long long? duration;    // microseconds
  readonly attribute unsigned long byteLength;

  undefined copyTo([AllowShared] BufferSource destination);
};

dictionary EncodedVideoChunkInit {
  required EncodedVideoChunkType type;
  [EnforceRange] required long long timestamp;        // microseconds
  [EnforceRange] unsigned long long duration;         // microseconds
  required BufferSource data;
};

enum EncodedVideoChunkType {
    "key",
    "delta",
};

[Exposed=(Window,DedicatedWorker), Serializable, Transferable]
interface AudioData {
  constructor(AudioDataInit init);

  readonly attribute AudioSampleFormat? format;
  readonly attribute float sampleRate;
  readonly attribute unsigned long numberOfFrames;
  readonly attribute unsigned long numberOfChannels;
  readonly attribute unsigned long long duration;  // microseconds
  readonly attribute long long timestamp;          // microseconds

  unsigned long allocationSize(AudioDataCopyToOptions options);
  undefined copyTo([AllowShared] BufferSource destination, AudioDataCopyToOptions options);
  AudioData clone();
  undefined close();
};

dictionary AudioDataInit {
  required AudioSampleFormat format;
  required float sampleRate;
  [EnforceRange] required unsigned long numberOfFrames;
  [EnforceRange] required unsigned long numberOfChannels;
  [EnforceRange] required long long timestamp;  // microseconds
  required BufferSource data;
};

dictionary AudioDataCopyToOptions {
  [EnforceRange] required unsigned long planeIndex;
  [EnforceRange] unsigned long frameOffset = 0;
  [EnforceRange] unsigned long frameCount;
  AudioSampleFormat format;
};

enum AudioSampleFormat {
  "u8",
  "s16",
  "s32",
  "f32",
  "u8-planar",
  "s16-planar",
  "s32-planar",
  "f32-planar",
};

[Exposed=(Window,DedicatedWorker), Serializable, Transferable]
interface VideoFrame {
  constructor(CanvasImageSource image, optional VideoFrameInit init = {});
  constructor([AllowShared] BufferSource data, VideoFrameBufferInit init);

  readonly attribute VideoPixelFormat? format;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute DOMRectReadOnly? codedRect;
  readonly attribute DOMRectReadOnly? visibleRect;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;  // microseconds
  readonly attribute long long? timestamp;          // microseconds
  readonly attribute VideoColorSpace colorSpace;

  unsigned long allocationSize(
      optional VideoFrameCopyToOptions options = {});
  Promise<sequence<PlaneLayout>> copyTo(
      [AllowShared] BufferSource destination,
      optional VideoFrameCopyToOptions options = {});
  VideoFrame clone();
  undefined close();
};

dictionary VideoFrameInit {
  unsigned long long duration;  // microseconds
  long long timestamp;          // microseconds
  AlphaOption alpha = "keep";

  // Default matches image. May be used to efficiently crop. Will trigger
  // new computation of displayWidth and displayHeight using image’s pixel
  // aspect ratio unless an explicit displayWidth and displayHeight are given.
  DOMRectInit visibleRect;

  // Default matches image unless visibleRect is provided.
  [EnforceRange] unsigned long displayWidth;
  [EnforceRange] unsigned long displayHeight;
};

dictionary VideoFrameBufferInit {
  required VideoPixelFormat format;
  required [EnforceRange] unsigned long codedWidth;
  required [EnforceRange] unsigned long codedHeight;
  required [EnforceRange] long long timestamp;  // microseconds
  [EnforceRange] unsigned long long duration;  // microseconds

  // Default layout is tightly-packed.
  sequence<PlaneLayout> layout;

  // Default visible rect is coded size positioned at (0,0)
  DOMRectInit visibleRect;

  // Default display dimensions match visibleRect.
  [EnforceRange] unsigned long displayWidth;
  [EnforceRange] unsigned long displayHeight;

  VideoColorSpaceInit colorSpace;
};

dictionary VideoFrameCopyToOptions {
  DOMRectInit rect;
  sequence<PlaneLayout> layout;
};

dictionary PlaneLayout {
  [EnforceRange] required unsigned long offset;
  [EnforceRange] required unsigned long stride;
};

enum VideoPixelFormat {
  // 4:2:0 Y, U, V
  "I420",
  // 4:2:0 Y, U, V, A
  "I420A",
  // 4:2:2 Y, U, V
  "I422",
  // 4:4:4 Y, U, V
  "I444",
  // 4:2:0 Y, UV
  "NV12",
  // 32bpp RGBA
  "RGBA",
  // 32bpp RGBX (opaque)
  "RGBX",
  // 32bpp BGRA
  "BGRA",
  // 32bpp BGRX (opaque)
  "BGRX",
};

[Exposed=(Window,DedicatedWorker)]
interface VideoColorSpace {
  constructor(optional VideoColorSpaceInit init = {});

  readonly attribute VideoColorPrimaries? primaries;
  readonly attribute VideoTransferCharacteristics? transfer;
  readonly attribute VideoMatrixCoefficients? matrix;
  readonly attribute boolean? fullRange;

  [Default] VideoColorSpaceInit toJSON();
};

dictionary VideoColorSpaceInit {
  VideoColorPrimaries primaries;
  VideoTransferCharacteristics transfer;
  VideoMatrixCoefficients matrix;
  boolean fullRange;
};

enum VideoColorPrimaries {
  "bt709",      // BT.709, sRGB
  "bt470bg",    // BT.601 PAL
  "smpte170m",  // BT.601 NTSC
};

enum VideoTransferCharacteristics {
  "bt709",         // BT.709
  "smpte170m",     // BT.601 (functionally the same as bt709)
  "iec61966-2-1",  // sRGB
};

enum VideoMatrixCoefficients {
  "rgb",        // sRGB
  "bt709",      // BT.709
  "bt470bg",    // BT.601 PAL
  "smpte170m",  // BT.601 NTSC (functionally the same as bt470bg)
};

[Exposed=(Window,DedicatedWorker), SecureContext]
interface ImageDecoder {
  constructor(ImageDecoderInit init);

  readonly attribute DOMString type;
  readonly attribute boolean complete;
  readonly attribute Promise<undefined> completed;
  readonly attribute ImageTrackList tracks;

  Promise<ImageDecodeResult> decode(optional ImageDecodeOptions options = {});
  undefined reset();
  undefined close();

  static Promise<boolean> isTypeSupported(DOMString type);
};


typedef (BufferSource or ReadableStream) ImageBufferSource;
dictionary ImageDecoderInit {
  required DOMString type;
  required ImageBufferSource data;
  PremultiplyAlpha premultiplyAlpha = "default";
  ColorSpaceConversion colorSpaceConversion = "default";
  [EnforceRange] unsigned long desiredWidth;
  [EnforceRange] unsigned long desiredHeight;
  boolean preferAnimation;
};


dictionary ImageDecodeOptions {
  [EnforceRange] unsigned long frameIndex = 0;
  boolean completeFramesOnly = true;
};


dictionary ImageDecodeResult {
  required VideoFrame image;
  required boolean complete;
};


[Exposed=(Window,DedicatedWorker)]
interface ImageTrackList {
  getter ImageTrack (unsigned long index);

  readonly attribute Promise<undefined> ready;
  readonly attribute unsigned long length;
  readonly attribute long selectedIndex;
  readonly attribute ImageTrack? selectedTrack;
};


[Exposed=(Window,DedicatedWorker)]
interface ImageTrack : EventTarget {
  readonly attribute boolean animated;
  readonly attribute unsigned long frameCount;
  readonly attribute unrestricted float repetitionCount;
  attribute EventHandler onchange;
  attribute boolean selected;
};


Issues Index

The spec SHOULD provide definitions (and possibly diagrams) for coded size, visible rectangle, and display size. See #166.