Geek Week 2022 explorations
Dominique Hazaël-Massieux

WebCodecs defines raw media interfaces, in particular VideoFrame.

The connection with WebRTC goes through the MediaStreamTrack Insertable Media Processing using Streams specification, which defines VideoTrackGenerator and MediaStreamTrackProcessor.

The processing pipeline:

1. Generate a stream of VideoFrame objects.
2. Manipulate the bytes exposed by each VideoFrame object, using JavaScript, WebAssembly, WebGPU, WebNN...
3. Produce a stream of processed VideoFrame objects.
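As a minimal illustration of the raw interface, here is a sketch that constructs a VideoFrame from pixel data; the 2×2 RGBA buffer and the zero timestamp are arbitrary values for illustration:

```js
// Minimal sketch: construct a raw VideoFrame from pixel data.
// A 2x2 RGBA frame filled with mid-gray; values are arbitrary.
const data = new Uint8Array(2 * 2 * 4).fill(128);
const frame = new VideoFrame(data, {
  format: 'RGBA',
  codedWidth: 2,
  codedHeight: 2,
  timestamp: 0 // in microseconds
});
// ...process the frame, then release its memory explicitly:
frame.close();
```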
Abbreviations:

- VTG = VideoTrackGenerator
- MSTP = MediaStreamTrackProcessor
- TS = TransformStream
- WT = WebTransport
- JS = JavaScript
[Pipeline diagram: getUserMedia() → MSTP → VideoFrame stream → VideoEncoder + TS → WTSendStream … WTReceiveStream → VideoDecoder + TS → VideoFrame stream → VTG → <video>, or <canvas>]
Plug | Construct |
---|---|
None | VideoFrame (in WebCodecs) |
Stream of VideoFrame | used by VTG, MSTP |
Stream of encoded chunks | VideoEncoder + TS |
MediaStreamTrack | in WebRTC |
Ways to obtain a stream of VideoFrame objects (a sketch of the camera case follows the notes below):

- From scratch: construct VideoFrame objects and wrap them in a stream.
- From camera: getUserMedia() → MSTP.
- From a received stream: WTReceiveStream → VideoDecoder + TS.
Notes:

- RTCDataChannel could be used as well.
- Containerized media can also serve as a source, but there is no API to demux the media into a stream of encoded chunks, so demuxing is up to the application.
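A minimal sketch of the camera case, assuming an async context where MediaStreamTrackProcessor is exposed (the spec places it in dedicated workers, though Chromium has also shipped it on the main thread):

```js
// Turn the camera track into a ReadableStream of VideoFrame objects.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();

// The processor's readable side yields one VideoFrame per captured frame.
const processor = new MediaStreamTrackProcessor({ track });
const frameStream = processor.readable;

// Example consumption; each frame must be closed to release its memory.
const reader = frameStream.getReader();
const { value: frame } = await reader.read();
console.log('first frame timestamp:', frame.timestamp);
frame.close();
```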
The idea is to use a TransformStream that takes a stream of video frames as input and produces another stream of video frames as output.

Processing can be chained if needed: TransformStream → TransformStream → TransformStream.

The actual transformation logic can use pure JavaScript, WebGPU, WebNN, WebAssembly, etc.
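For instance, here is a sketch of a frame-in, frame-out TransformStream; the grayscale filter via OffscreenCanvas is just an illustrative stand-in for real processing logic:

```js
// A VideoFrame-in, VideoFrame-out TransformStream.
// Each incoming frame is drawn onto a canvas with a grayscale filter,
// re-wrapped as a new VideoFrame, and the original frame is closed.
function createGrayscaleTransform() {
  let canvas, ctx;
  return new TransformStream({
    transform(frame, controller) {
      if (!canvas) {
        canvas = new OffscreenCanvas(frame.displayWidth, frame.displayHeight);
        ctx = canvas.getContext('2d');
        ctx.filter = 'grayscale(100%)';
      }
      ctx.drawImage(frame, 0, 0);
      const processed = new VideoFrame(canvas, { timestamp: frame.timestamp });
      frame.close(); // release the original frame as early as possible
      controller.enqueue(processed);
    }
  });
}

// Transforms can be chained with pipeThrough:
// const out = frameStream
//   .pipeThrough(createGrayscaleTransform())
//   .pipeThrough(anotherTransform);
```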
Render to display: VTG → <video>.

Or render frames directly onto a <canvas>... but implementing a full video player in JavaScript is no easy task! (e.g. sync, accessibility, controls)
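A sketch of the VTG route. Per the spec, VideoTrackGenerator is exposed in dedicated workers, so in practice its track may need to be transferred to the main thread; the sketch glosses over that and assumes the track is available where the <video> element lives:

```js
// Pipe processed VideoFrames into a VideoTrackGenerator,
// then render the resulting track in a <video> element.
const generator = new VideoTrackGenerator();
processedFrameStream.pipeTo(generator.writable); // a ReadableStream<VideoFrame>

// Wrap the generated track in a MediaStream and attach it to <video>:
const video = document.querySelector('video');
video.srcObject = new MediaStream([generator.track]);
await video.play();
```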
Send somewhere: VideoEncoder + TS → WTSendStream.

Note: RTCDataChannel could be used as well.
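A sketch of the sending side. The endpoint URL, the vp8 configuration, and the 4-byte length-prefix framing are all assumptions: WebTransport streams carry raw bytes, so chunk framing is up to the application:

```js
// Encode VideoFrames and write the encoded chunks to a WebTransport stream.
const transport = new WebTransport('https://example.com:4433/video'); // hypothetical endpoint
await transport.ready;
const sendStream = await transport.createUnidirectionalStream();
const writer = sendStream.getWriter();

const encoder = new VideoEncoder({
  output(chunk) {
    const data = new Uint8Array(chunk.byteLength);
    chunk.copyTo(data);
    // 4-byte length prefix so the receiver can re-split the byte stream
    // (backpressure handling omitted for brevity)
    const header = new Uint8Array(4);
    new DataView(header.buffer).setUint32(0, data.byteLength);
    writer.write(header);
    writer.write(data);
  },
  error(e) { console.error('encode error', e); }
});
encoder.configure({ codec: 'vp8', width: 640, height: 480 });

// frameStream: a ReadableStream<VideoFrame>, consumed via async iteration.
for await (const frame of frameStream) {
  encoder.encode(frame);
  frame.close();
}
await encoder.flush();
```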
Demo: https://tidoust.github.io/media-tests/
Code: https://github.com/tidoust/media-tests/
On a VideoFrame, the timestamp attribute may be used to track the frame through the processing pipeline. However, requestVideoFrameCallback does not report the frame's timestamp.

Workaround: use requestVideoFrameCallback to detect frame changes, then copy rendered video frames onto a canvas and analyze pixels to identify the frame. A color overlay encodes the timestamp.
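A sketch of this trick; the 24-bit RGB encoding and the 32×32 overlay square are illustrative choices, not necessarily the demo's exact scheme:

```js
// Encode the frame timestamp as a color square while processing the frame...
function encodeTimestamp(ctx, timestamp) {
  const t = timestamp & 0xffffff; // keep the low 24 bits for this sketch
  ctx.fillStyle = `rgb(${(t >> 16) & 0xff}, ${(t >> 8) & 0xff}, ${t & 0xff})`;
  ctx.fillRect(0, 0, 32, 32); // top-left overlay square
}

// ...and recover it from the rendered pixels on the receiving canvas.
function decodeTimestamp(ctx) {
  const [r, g, b] = ctx.getImageData(16, 16, 1, 1).data;
  return (r << 16) | (g << 8) | b;
}
```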
Timer | Frames | Min. | Max. | Avg. | Median |
---|---|---|---|---|---|
overlay | 104 | 4 | 45 | 10 | 7 |
encoding | 104 | 15 | 245 | 20 | 17 |
decoding | 104 | 1 | 23 | 1 | 1 |
queued | 104 | 0 | 288 | 26 | 13 |
end2end | 104 | 24 | 338 | 58 | 151 |
displayed during | 101 | 20 | 59 | 38 | 39 |
Times in milliseconds
Interconnecting these APIs is not straightforward: for example, managing VideoFrame lifetimes and VideoFrame closure across workers.

The Media WG and WebRTC WG have started to discuss joint architectural considerations for the evolution of the media pipeline on the web.
Repository:
https://github.com/w3c/media-pipeline-arch
A report from Dom and François on their explorations of video processing during Geek Week 2022.