<tidoust> scribenick: tidoust
Chris: Low-latency distribution of
media is a very important issue for media companies, so glad to
have this call.
... John will give an introduction to the business case, and then
we'll go into the detail of WebRTC with Peter.
<cpn> scribenick: cpn
slides https://docs.google.com/presentation/d/1eDIUzwMeug_XAsRHOu-mzojBPWjlfOedodfkpKQ3tDk/edit
John: This is something I've been
hearing about in the market, the last few years.
... (Slide 1) Previously, web video was done using plug-ins.
... Holes have been filled, with the video tag, MSE for adaptive
streaming, EME for protected content, WebRTC for peer to
peer.
... Traditionally we've used RTMP in Flash, but this is still a gap
on the Web.
... (Slide 2) Not too many options to replace that for real time
streaming,
... in particular extremely low latency live streaming. Not like
HLS or recent use of DASH, where there's latency due to segment
caching.
... These can still be 10-60 seconds behind live.
... (Slide 3) I was at IBC recently, hearing about this a lot. Use
cases include live sports, gaming (e.g., gambling on horse racing),
e-sports, video game streaming.
... Low latency is extremely important.
... Breaking news, for when important events are happening.
... And I heard about something unexpected. Someone at IBC
mentioned monitoring of industrial manufacturing processes using
RTMP.
... He's worried about not being able to do that once Flash is gone
from browsers.
... (Slide 4) There are people doing low latency streaming in one
way or another.
... Some solutions are not supported across all browsers, or the
latency isn't low enough.
<igarashi> we should separate the low latency requirement from the one-to-many(multicast) requirement
John: HTTP Chunked transfer with
CMAF.
... LHLS, used by Twitter with Periscope.
... Multicast HTTP/QUIC, a BBC effort.
... SRT is a proposal from Wowza, not implemented in browsers, so
still requires additional client software.
... A number of WebRTC based approaches. My understanding is it's
difficult and costly to scale, because each user requires a direct
connection to the server.
... Project Stream from Google, for low latency gaming.
... I'll hand over to Peter now.
Peter: I should mention that Project Stream is using WebRTC.
slides https://docs.google.com/presentation/d/1_xQSoIdN-srjBc-GE_vuQMkxkaer2G-mZBchLPTyY20/edit
Peter: I know how WebRTC works, not
so much about normal video distribution, I have some questions for
you too, so please help me.
... I'm coming at it from a super low latency distribution point of
view.
... (Slide 2) Questions: Can we use WebRTC for video streaming,
does it scale, can we use content protection?
... (Slide 3) We can, there are different approaches, some are more
certain, some more speculative.
... (Slide 4) WebRTC has a data channel. If you're using MSE you
could use SCTP instead of HTTP. The JavaScript receives over SCTP,
passes it to the MSE API
... (Slide 5) Pros: Server push would be a lot easier, with
DataChannel it's quite easy.
... With really low latency, it would allow out-of-order.
... Also need a congestion control algorithm.
... Congestion control is only available for TCP in the kernel. SCTP
runs in user space, so it could be easier to modify.
... Cons: you have to implement ICE, DTLS, and SCTP on your server.
That's a lot of new stuff.
... Don't need to use PeerConnection on the server, but it is
needed on the client.
... There are still issues with latency.
... (Slide 6) Some things are coming that may help. BBR will
eventually be available in SCTP.
... There's a proposal in the WebRTC Working Group to add SCTP data
channels independent of PeerConnection, and also available in
Service Workers.
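[Scribe note: a minimal sketch of the slide-4/5 pipeline Peter describes — media chunks arriving over an unordered SCTP DataChannel, re-ordered in JavaScript before being appended to MSE. The Reassembler helper and the sequence-number framing are assumptions for illustration; the browser wiring is shown only in comments:]

```javascript
// Re-order chunks that arrive out of order over an unordered
// DataChannel, so they can be appended to MSE in sequence.
// Assumes each message carries a monotonically increasing `seq`.
class Reassembler {
  constructor() {
    this.next = 0;              // next sequence number to deliver
    this.pending = new Map();   // seq -> chunk, held until deliverable
  }
  // Store the chunk; return every chunk that is now deliverable in order.
  receive(seq, chunk) {
    this.pending.set(seq, chunk);
    const ready = [];
    while (this.pending.has(this.next)) {
      ready.push(this.pending.get(this.next));
      this.pending.delete(this.next++);
    }
    return ready;
  }
}

// In the browser this would feed MSE (sketch only, not run here):
// channel.onmessage = (e) => {
//   const { seq, chunk } = decodeMessage(e.data); // app-defined framing
//   for (const c of reassembler.receive(seq, chunk)) {
//     sourceBuffer.appendBuffer(c);
//   }
// };
```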
... (Slide 7, 8) RTP receiver has a buffer that's highly tuned for
low latency.
... Packets go directly from the network to the buffer, not via
JavaScript.
... Low latency, 20ms of audio, one video frame.
... Adapts to network conditions quickly, keeping a steady
predicted bitrate.
... I don't believe you'd need to re-encode to send RTP from your
server.
... Cons: You'll need RTP on the server, SDP and PeerConnection on
the client
... To keep the latency on the buffer low, it will speed up or slow
down audio to increase/decrease the buffer delay.
... In a normal video call, you probably don't notice. Your ears
may not hear it with voice, but it could be more noticeable with
music.
... The buffer is good at working around gaps in the timeline. If
there's a 40ms gap in the timeline, it can conceal it.
... For larger gaps, you get a robot-like voice that doesn't sound
good. But it's better than dropping to silence.
... WebRTC has no concept of rewind or history. So if you want to
pause to watch content delayed or timeshifted, there's no way to do
that with WebRTC.
... There's a feature in the video element to change the playback
rate. It's not there with WebRTC.
... You need to generate keyframes on demand on the server. The RTP
client will send a signal to the server "I need a keyframe right
now".
... You need to be able to adapt the bitrate on demand. You can't
send above that bitrate without introducing queueing.
... (Slide 9) Some things are coming that may help: WebRTC without
PeerConnection and control of the jitter buffer delay.
... The Chrome implementation allows for this, but it's not exposed
in the Web API, could be added as needed.
... (Slide 10, 11) One thing that's easier with QUIC than SCTP, as
it already has BBR, is bandwidth estimation to avoid queuing.
... It's not in any browser yet; it's an editor's draft, being
implemented in Chrome behind an opt-in flag.
... (Slide 12, 13) Some speculative ideas, proposed in the Working
Group, exposing a very low level decoder inside WebRTC. This would
make the buffers inside MSE much more under your control.
... If there was a low level decoder, you'd be able to control
everything to your app-specific needs.
... Cons: This doesn't exist yet, you'd have to write a JS / WASM
library. And no-one has an idea of how EME or DRM would work into
that.
... (Slide 14, 15) Another speculative idea, can be done today, is
writing codecs in WASM. Requires writing a lot of code.
... Could be fine for audio, but a bigger issue for video.
... Also no access to DRM using this approach.
... (Slide 16) Technical gaps.
... With SCTP and MSE, the difficulty is putting this on your
server.
... RTP is difficult to implement on the server, audio acceleration
/ concealment, no rewind, also DRM.
... (Slide 17) Does it scale?
... If using the MSE approach but replacing HTTP with a WebRTC
transport, a limiting factor is adding ICE.
... RTP is rather stateful.
... You could parse the container formats you already have, and
turn them into RTP packets.
... And you need to be able to generate key frames on demand.
... (Slide 18) ICE on the server.
... We think of WebRTC as a peer to peer protocol.
... ICE also works client/server. There's a mode, ICE Lite, that's
easy to implement on a server.
... All you have to do is ack some packets. It's a fairly simple
thing, a couple hundred lines of code.
... It needs a shared secret and some negotiation, share the secret
among your servers.
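[Scribe note: a sketch of the "couple hundred lines" ICE Lite responder Peter mentions — parse a STUN Binding Request and echo a Binding Success Response with XOR-MAPPED-ADDRESS. The MESSAGE-INTEGRITY and FINGERPRINT attributes, which need the shared ICE password, are omitted, so this is illustrative rather than interoperable:]

```javascript
// Minimal ICE Lite style STUN responder: ack Binding Requests.
const STUN_MAGIC_COOKIE = 0x2112a442;

function makeBindingResponse(request, clientIp, clientPort) {
  // STUN header: 2-byte type, 2-byte length, 4-byte cookie, 12-byte txn id.
  if (request.readUInt16BE(0) !== 0x0001) {
    throw new Error('not a Binding Request');
  }
  const txnId = request.subarray(8, 20);

  // XOR-MAPPED-ADDRESS attribute (IPv4): type 0x0020, value length 8.
  const attr = Buffer.alloc(12);
  attr.writeUInt16BE(0x0020, 0);
  attr.writeUInt16BE(8, 2);
  attr.writeUInt8(0x01, 5);                                        // family: IPv4
  attr.writeUInt16BE(clientPort ^ (STUN_MAGIC_COOKIE >>> 16), 6);  // x-port
  const p = clientIp.split('.').map(Number);
  const ipNum = ((p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3]) >>> 0;
  attr.writeUInt32BE((ipNum ^ STUN_MAGIC_COOKIE) >>> 0, 8);        // x-address

  const header = Buffer.alloc(20);
  header.writeUInt16BE(0x0101, 0);       // Binding Success Response
  header.writeUInt16BE(attr.length, 2);  // length of attributes only
  header.writeUInt32BE(STUN_MAGIC_COOKIE, 4);
  txnId.copy(header, 8);                 // echo the transaction id
  return Buffer.concat([header, attr]);
}
```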
... On the other hand, QUIC may not require ICE.
... (Slide 19) SRTP on the server.
... Each server needs to know the SRTP crypto key and server
parameters.
... Divide the media into small chunks, 20ms for audio, single
video frames.
... (Slide 20) Nothing around WebRTC changes the multicast story.
They all do a per-client crypto handshake, so packets can't be
multicast.
... RTP has a mode where the crypto key can be shared, RTP SDES.
This mode has been banned by the WebRTC Working Group.
... Chrome has it, and plans to remove it eventually.
... If you're doing multicast you'll need a proxy or satellite
server, do multicast to there, then per-client crypto from there to
the client.
... (Slide 21) What about content protection?
... If using QUIC / SCTP DataChannel with MSE, there's no
change.
... But it's not possible with RTP.
... (Slide 22) MSE vs RtpReceiver.
... I read through the MSE code in Chromium and the MSE spec. I
never used MSE, so don't know how it's used in real life.
... What would I do to get it as low latency as possible?
... RtpReceiver is the transport, buffer and decoder.
... A typical HTTP / MSE implementation has similar
structure.
... Can you theoretically get the same latency from MSE?
... (Slide 23) HTTP with TCP introduces head of line blocking
issues.
... With containerized media, you need to wait for chunks to build
up, which adds latency.
... Buffers can be increased up to 3000ms audio, 9 frames of
video.
... There's no way to say that you want to interpolate.
... (Slide 24) How to make MSE better?
... You could use QUIC. There's no way to push QUIC streams into
the browser.
... To work similar to WebRTC, you want to push frames into the MSE
buffer, 20ms audio.
... I believe you can do that, something hacky, per-frame WebM,
would be nice if you could inject a single frame into the MSE
buffer.
... Would be nice if the buffer had controls for delay and
interpolation.
... (Slide 25) What's needed? RTCQuicTransport is in
progress.
... If we added an appendFrame method, we could add the h.264 or
VP8 payload with timestamps.
... Limits on how much audio to expect, 20ms to 3000ms, would be
useful to adjust these.
... Would be nice to set the interpolation behaviour, or allow the
video to skip ahead. Keep the audio going, the video looks frozen,
then resume.
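[Scribe note: a sketch of how an app might feed the appendFrame method proposed on slide 25. Neither appendFrame nor a FrameQueue exists in MSE; both names are invented here to illustrate the shape of the proposal — buffer incoming frames and keep them in presentation order, since QUIC/SCTP delivery may be slightly out of order:]

```javascript
// Hold per-frame payloads (20 ms audio, single video frames) sorted by
// timestamp, so they can be handed to a hypothetical
// sourceBuffer.appendFrame(payload, timestampMs) in presentation order.
class FrameQueue {
  constructor() {
    this.frames = []; // each entry: { timestampMs, payload }
  }
  push(frame) {
    this.frames.push(frame);
    this.frames.sort((a, b) => a.timestampMs - b.timestampMs);
  }
  // Return, in order, every frame whose presentation time has arrived.
  drainUpTo(nowMs) {
    const ready = this.frames.filter(f => f.timestampMs <= nowMs);
    this.frames = this.frames.filter(f => f.timestampMs > nowMs);
    return ready;
  }
}
```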
... I have questions for you in the IG, but I'll take some
questions from you first.
Nigel: You said that a con of RTP is
that it doesn't offer rewind and history. I thought that was a
feature of WebRTC.
... Is there in general a way to specify a rewind point with
WebRTC?
Peter: No, as soon as something is
played, it moves on.
... If you want the ability to rewind, it's something you'd have to
give up if using an RTP receiver.
... There are different ways to use WebRTC. If you take chunks from
the DataChannel you can keep the rewind.
<Zakim> kaz, you wanted to ask if it's OK to distribute the slides of John and Peter to the MEIG public list (and add the link to the minutes)
Kaz: Thank you for a great
presentation, John and Peter.
... Can we share the slides publicly?
Peter: Yes
Kaz: I wonder if it's possible to synchronize multiple video streams and text captions using this?
Peter: WebRTC gives you tracks. If
you put these in the same media stream, theoretically they'll be
synchronized.
... I know you can sync audio and video, but I'm not sure about two
videos in two separate tags.
<kaz> scribenick: kaz
Chris: This is something we've looked
at at the BBC, using WebRTC for a vision mixing application, using
separate streams.
... There was lack of synchronization for multiple video
streams.
<cpn> https://www.bbc.co.uk/rd/blog/2017-07-compositing-mixing-video-browser
<cpn> https://www.bbc.co.uk/rd/blog/2017-07-synchronising-media-browser
<cpn> scribenick: cpn
Francois: Thank you for describing
the QUIC and MSE solution. It strikes me that this could be the
simplest solution from a media perspective: QUIC is coming, MSE is
here, so combine these.
... What's the standardization status of the QUIC API? Can the
M&E IG help?
Peter: It's in a funny state. It started in the ORTC Working Group, the incubation group. Google and Microsoft have been heavily involved.
<kaz> QUIC API for WebRTC
Peter: We're implementing this inside
of Chrome.
... At the last F2F meeting, the WebRTC Working Group was undecided
whether to adopt it inside the Working Group.
... I came away with the action item to keep bringing this up.
We're incubating in the ORTC Working Group for sure.
Francois: Process-wise, the M&E IG could voice its view, to support the work, with use cases.
Peter: That probably would be helpful. Working Group members who were last supportive were asking for use cases.
Francois: Sounds like a good action
for the Interest Group.
... I am interested in the interpolation behaviour idea for MSE.
We're researching scenarios for synchronization, as Chris
mentioned, between videos.
... For example, an animation where if the video stops we don't
care. It could be attached to the HTMLMediaElement, as a behaviour
we want for regular video too, not necessarily specific to MSE.
Peter: That's an excellent point. You could probably do the same thing for the other methods I proposed.
Igarashi: Thank you for the
presentation. I am wondering if there is any way to bind WebRTC
with EME, using DataChannel,
... so that encrypted media frames are decrypted using EME?
Peter: If you're using SRTP or QUIC, the DataChannel could deliver encrypted media. If you set the keys, the media could then be decrypted.
Igarashi: I am wondering if you do this at a video or audio frame level (per-frame chunks)? So just using EME without MSE.
Peter: An EME implementation may have
a lower bound on the chunk size.
... I don't know if the EME mechanism would allow for chunks that
small.
Igarashi: One issue with MSE is that frame handling should be done by the UA, not the web app. That's behind my idea to use EME directly.
Peter: Yes, if the JS gets blocked,
the video will stall.
... It's a reason why this hasn't been adopted. It's not clear what
the performance would be. We had an idea to use Worklets, like in
Web Audio, also Houdini, the CSS pipeline.
... If you're trying to do low latency, you do want to ensure the
JavaScript doesn't pause.
Chris_Poole: I want to lend my
support to the MSE based approach. It enables latency in the 4-5
second range.
... For broadcasting, we don't need such ultra low latency. The
idea to play through a lack of data, or conceal, these would be
very helpful there.
<Will_Law> requests queue
Chris_Poole: Also, on per-frame
chunks, with ISO BMFF you can put individual frames in
chunks.
... If you're aiming for compression efficiency and video quality,
you do long-GOP encoding.
... Generally people are doing large chunks.
... But if you value low latency, you'll be doing forward
prediction. Nothing stops you doing individual frames. With WebM or
ISO BMFF you'll get the timestamps.
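[Scribe note: the per-frame chunking Chris_Poole describes relies on each frame carrying its own timing. A sketch (our own helper, not a real ISO BMFF parser) of deriving presentation timestamps from a track timescale and per-sample durations, as a trun box would encode them — 48 kHz audio in 960-sample frames gives the 20 ms spacing mentioned earlier:]

```javascript
// Compute per-frame presentation times (in ms) from a base decode
// time, per-sample durations, and the track timescale.
function presentationTimesMs(baseDecodeTime, sampleDurations, timescale) {
  const times = [];
  let t = baseDecodeTime;
  for (const d of sampleDurations) {
    times.push((t / timescale) * 1000); // convert ticks to milliseconds
    t += d;                             // advance by this sample's duration
  }
  return times;
}
```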
... Is API support needed, or do you get that anyway?
Peter: I was wondering that, so thank you for confirming.
<Will_Law> sorry, my audio seems not to be working. Will dial in
Martin: A question we have here is about scalability. Imagine an event like the FIFA world championship, what challenges will we face? Can we have thousands or more concurrent users?
Peter: I don't see why not. For each
client, you'll need a server that packetizes and encrypts the
contents. So will need a lot of servers.
... The closer the server is to the client geographically, the
better. Works better with lower round-trip times.
... There's nothing inherent to block scalability, other than
needing lots of servers.
Martin: The servers would need to support the protocols you described.
Peter: Yes, the front end servers would need to get the content from somewhere, then packetize and send to clients.
Will: Back to the thread about MSE. I
like the idea of QUIC getting data quickly to the client.
... It seems to me that putting JavaScript in the way and using
appendFrame doesn't seem like the way to go.
... Can we hook up the MSE to a stream, and remove JavaScript from
this?
Peter: Yes, a colleague of mine
brought that up. I couldn't find appendStream in the code or the
spec.
... Where does the stream come from on the QUIC side? A WHATWG
ReadableStream.
... If you're doing something where you put all the media into a
QUIC stream, we could go into how to map media into QUIC
streams.
... We have head-of-line blocking. But you wouldn't know what
serialization there is. You could say this is one big chunk of WebM
or MP4.
... How are you going to take each individual stream and plug this
down into MSE?
<tidoust> [FWIW, appendStream was removed from MSE before publication, because streams were not ready at the time, seems nothing is blocking re-introduction of appendStream now: https://github.com/w3c/media-source/issues/14]
Peter: What you're describing could
work, but not if you're using many QUIC streams, or if you're doing
something fancier over the wire using the QUIC transport.
... It would only work in some specific scenarios, and not for
everything.
<Zakim> tidoust, you wanted to wonder we can characterize "lots of servers" in comparison to current situation
Francois: Back to the scalability discussion, can we characterize the "lots of servers". Today, if you use a DASH or HLS based infrastructure, would you need many more servers than that? How does it compare in terms of processing power?
Peter: I'm not familiar with those
kinds of deployments, so it's hard for me to say.
... It's not CPU intensive at all. You may not have the same crypto
speed for SRTP encryption.
Peter: Is MSE widely adopted?
Chris: Yes absolutely. It's the building block for adaptive streaming. Libraries such as DASH.js build on top of it.
Peter: Seems like people are
interested in interpolation, but appendFrame not so much.
... How about liveness and delays? Maybe different heuristics.
Will: I think this is important. I'd like to see the ability to set a target latency. And then for the media element to adjust the playback rate.
Peter: I can talk to my coworkers who work on MSE, and tell them that people want better live stream support.
Igarashi: I'm interested in using EME directly, and concerned about JavaScript performance of appending frames.
Peter: What about the server protocols, is this a pain?
Will: I can answer on behalf of
Akamai. We've deployed QUIC. It uses twice as much CPU per bit
delivered. A cost issue, maybe also an optimisation issue.
... We're in favour of QUIC, see no problems with ICE.
Peter: Next question is around the ability to alter the media, changing bitrate.
Martin: I think this is not very practical from a video delivery point of view. If you do DASH or HLS, you've decided on specific bitrates, the keyframes are kind of fixed. This would require re-encoding of the video, which would be costly.
Peter: What kind of key-frame frequency is typical for live streams?
Martin: In HLS, the recommendation is 10
seconds, then 6, down to 2 seconds. Each segment starts with an
I-frame, so you can join every 2nd second.
... You can't join the stream at any random point.
Igarashi: To support quick channel-change, we should support 1 second.
Peter: About fixed bit rates, it's
surprising that you want this. It's not going to work at all with
live streams in varying network conditions.
... What if the network goes down?
Martin: Of course, you adapt, but with different bitrates and adaptation sets. You design the DASH or HLS stream in advance to cope with different network conditions, and switch between them.
Peter: That would work fine with different network conditions. You can select the quality level; video conference systems work like this already.
Chris_Poole: Adaptation between
bitrates, we have a few options for seamless changes, could be
tricky with ultra-low latency, so you don't have much opportunity
to change without introducing a glitch.
... The other thing to note is that the potential for adaptation is
normally linked to P-frame and I-frame locations. Today, you can't
do that adaptation at any point in the stream. You have limited
scope for doing something before needing error concealment.
Peter: There is the option of using scalable codecs. VP9 has different quality or resolution layers, so you can drop down without requiring a key frame. It's used in video conference systems for just this. Are any of you using this?
Will: No. In a video conference system, there's controlled endpoint hardware. There isn't a universally available solution we can work with.
Peter: It's in VP9, and also in AV1. Is it because you're in the h.264/h.265 world?
Will: Yes
<Zakim> kaz, you wanted to to ask if we want to have more discussion with the WebRTC WG during TPAC
Kaz: Given the discussion here,
should we continue this at TPAC?
... Could you come to the M&E IG meeting on Monday?
Peter: I'll be at TPAC, happy to talk to people there, will be in Working Group meetings, happy to have dinner or lunch.
<kaz> Chris: Thanks and looking forward to seeing many of you at TPAC!
<kaz> [adjourned]