WebRTC in live media production
Presenter: Sacha Guddoy (Grabyo)
Duration: 6 minutes
Hello there. My name is Sacha Guddoy and I'm the lead front-end engineer at Grabyo.
Grabyo is a SaaS platform which delivers tools for live broadcast production to commercial broadcasters.
Some of our offerings include live broadcast production, video editing, clipping from live streams, and publishing to various endpoints.
At Grabyo, we use WebRTC in our live production offering.
The way this works is that the user sees multiple live streams coming into their web browser; they can monitor those streams and choose which ones are being output to their broadcast endpoint.
We also have multiple sidecar applications and multi-window workflows. For example, popping out a player.
One of the challenges that we face is the synchronization of streams.
What we'd like is to have multiple live feeds from different cameras coming in and to be able to switch between them. But if those feeds aren't perfectly in sync with each other, the delay is very noticeable when you cut between cameras, and it's jarring for the viewer.
When you have multiple WebRTC streams on a page, keeping them all in sync is not straightforward. The browser will do its best, but the streams aren't tied together.
So, for example, if you are cutting between different cameras, you want those camera feeds to be showing exactly at the same time. If you're doing multi-party chat, you don't want latency.
The synchronization aspect is pretty difficult. Network conditions are unpredictable, and on the client side you don't really have a way to correct for that or to reconcile the synchronization of the streams.
If there were embedded timestamps on the streams, then you potentially could do that. Using something lower level, such as WebTransport, may allow you to do that, and it may even be a more performant technology for this use case than WebRTC anyway.
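To sketch what client-side reconciliation could look like if capture timestamps were embedded in the streams: hold the faster streams back so every feed lines up with the most-delayed one. This is an illustrative helper, not an existing API; the stream names and the source of the timestamps are assumptions.

```javascript
// Sketch: align several live feeds to the most-delayed one, assuming each
// stream could expose the capture timestamp (ms) of its newest decoded frame.
// Stream IDs and the timestamp source are hypothetical.
function computeSyncDelays(latestCaptureTs) {
  // The stream whose newest frame is oldest is the bottleneck.
  const slowest = Math.min(...Object.values(latestCaptureTs));
  const delays = {};
  for (const [id, ts] of Object.entries(latestCaptureTs)) {
    // Buffer faster streams by the difference, so cuts land in sync.
    delays[id] = ts - slowest;
  }
  return delays;
}

// Example: camera B's newest frame is 120 ms older than camera A's,
// so camera A is held back by 120 ms and camera B plays immediately.
const delays = computeSyncDelays({ camA: 100120, camB: 100000 });
```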
One pattern we've been using recently is splitting workflows across different browser contexts: a pop-out window lets you monitor a specific video in one window while monitoring everything else in another.
Or you might edit your audio in one window and monitor your videos in another. In that scenario, you end up with two instances of the same WebRTC connection in your browser: if I want the video of my live stream in one window, because that's my video control suite, and the same live stream in another window, because that's my audio control suite, then I have to open two WebRTC connections. That's twice the performance overhead, twice the bandwidth, et cetera.
Think about the way shared workers work: the SharedWorker interface allows multiple contexts to share whatever is happening in that one worker. If we could do the exact same thing with a WebRTC connection, that would significantly reduce our performance overhead.
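The SharedWorker pattern the talk refers to looks roughly like this. Note that today an RTCPeerConnection cannot be created inside a shared worker, which is exactly the gap being described; this sketch only shares state and messages between windows, and the worker file name is hypothetical.

```javascript
// In each window: every browsing context that opens the same script URL
// attaches to the *same* worker instance.
// "shared-state-worker.js" is a hypothetical file name.
const worker = new SharedWorker('shared-state-worker.js');
worker.port.start();
worker.port.onmessage = (e) => console.log('state update:', e.data);
worker.port.postMessage({ type: 'subscribe', stream: 'camA' });

// In shared-state-worker.js: one copy runs no matter how many windows connect.
// An RTCPeerConnection cannot currently live here; only signaling/state can
// be shared, which is why the talk asks for WebRTC in a shared context.
const ports = [];
onconnect = (e) => {
  const port = e.ports[0];
  ports.push(port);
  port.onmessage = (msg) => {
    // Fan each update out to every connected window.
    for (const p of ports) p.postMessage(msg.data);
  };
};
```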
And these kinds of workflows are really powerful for professional desktop applications. If you are a video editor using some kind of NLE, you probably want as much screen space as you can get for your timeline, your monitors, your asset bins, et cetera. Being able to split different parts of our interface out into separate windows, so the user can position them as they see fit, is really helpful.
What are the advantages of doing this?
There's obviously less resource consumption because you only have that one connection.
There's inherent synchronization between the contexts, because the data is coming from that same connection. Now, this is probably possible today using shared workers and WebTransport, but browser support for that is not particularly great.
Accuracy is also important in this technology. More accurate timestamps would help us synchronize those streams with one another, and they would also help synchronize other things, for example an overlay or a notification in the DOM.
The capability to hook into encoding and decoding of data on a WebRTC connection would also be really useful.
Right now, the API surface of a WebRTC connection is pretty minimal, and it doesn't expose much useful information to us. Being able to put our own code into that pipeline would let us do all sorts of interesting things.
Say, for example, presenting a particular frame exactly when we want to present it, or synchronizing audio and video across different browser windows.
We could know exactly which frame is being presented before it even gets rendered to the DOM, so we could prepare DOM elements synchronized to it. We could potentially send over proprietary error-correction data to smooth over any link failures, with picture quality as a priority.
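The receive-side version of "our own code in the pipeline" is what the WebRTC Encoded Transform draft (RTCRtpScriptTransform) offers where it is supported. A minimal sketch, assuming `peerConnection` is an existing RTCPeerConnection and the worker file name is hypothetical:

```javascript
// Main thread: attach a script transform to an incoming track's receiver.
// Support varies by browser; this follows the WebRTC Encoded Transform draft.
const receiver = peerConnection.getReceivers()[0];
receiver.transform = new RTCRtpScriptTransform(
  new Worker('encoded-transform-worker.js'),
  { side: 'receive' } // app-defined options, passed through to the worker
);

// In encoded-transform-worker.js: encoded frames flow through our code
// between the network and the decoder.
onrtctransform = (event) => {
  const { readable, writable } = event.transformer;
  readable
    .pipeThrough(new TransformStream({
      transform(encodedFrame, controller) {
        // encodedFrame.data is the encoded payload. Here we could inspect
        // timestamps or strip out proprietary error-correction data before
        // handing the frame on to the decoder.
        controller.enqueue(encodedFrame); // pass through unchanged here
      },
    }))
    .pipeTo(writable);
};
```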
And going back the other way, you could do things like funny hats, chroma keying, or machine-learning analysis for background blur or embedding metadata.
A lot of this can be solved using the MediaStreamTrack Insertable Streams feature. That's still a draft specification, and I'd really love to see more browser support for it.
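For raw (decoded) frames, the draft exposes MediaStreamTrackProcessor and MediaStreamTrackGenerator, currently Chromium-only. A minimal pass-through sketch, assuming `stream` is an incoming MediaStream and `videoElement` is a `<video>` element on the page:

```javascript
// Sketch of the draft MediaStreamTrack Insertable Streams API.
// `stream` and `videoElement` are assumed to exist already.
const [track] = stream.getVideoTracks();
const processor = new MediaStreamTrackProcessor({ track }); // source of VideoFrames
const generator = new MediaStreamTrackGenerator({ kind: 'video' }); // sink

const transform = new TransformStream({
  transform(frame, controller) {
    // frame.timestamp is in microseconds. This is where we could draw
    // overlays, chroma-key, run ML, or attach metadata per frame.
    controller.enqueue(frame); // pass through unchanged in this sketch
  },
});

processor.readable.pipeThrough(transform).pipeTo(generator.writable);

// The generator is itself a MediaStreamTrack, so it can feed playback:
videoElement.srcObject = new MediaStream([generator]);
```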
Thank you for watching. I hope you enjoyed hearing about our use cases and I'm looking forward to hearing any questions and feedback. Thanks, bye.