Simulcast V1

I've been involved in a number of recent conversations around simulcast 
for WebRTC, and several implementors have indicated that it's an 
important feature for the initial release of WebRTC.

As I understand the state of play:

  * Chrome has a form of simulcasting implemented using undocumented SDP
    mangling
  * Firefox has no simulcasting implemented, but will soon
  * The WebRTC 1.0 API has no simulcast-related controls whatsoever
  * The IETF MMUSIC working group is nearing completion on a document
    (draft-ietf-mmusic-sdp-simulcast-01) that allows negotiation of
    simulcast in SDP


I also understand and sympathize with the goal to stop adding any 
non-trivial modifications to the existing WebRTC spec, so that we can 
finally publish an initial version of the document.

At the same time, the vast majority of the use cases that make sense for 
simulcast involve browsers talking to an MCU (or similar server), 
sending multiple encodings per track in the browser-to-MCU direction, 
but receiving only one encoding per track in the MCU-to-browser direction.

This is interesting, because it means that we don't really require any 
controls that indicate the desire for a browser to /receive/ simulcast 
-- all we need is the ability to indicate a willingness to send it. At 
the same time, the MCU will know what resolutions (and other variations) 
it wants to receive, and can inform the browser of this information via SDP.

Based on the foregoing, then, I propose that, instead of a non-trivial 
API change, we add a single trivial control to the existing 
RTCRtpSender objects. My strawman proposal would be something like:


------------------------------------------------------------------------

partial interface RTCRtpSender {
   attribute unsigned short maxSimulcastCount;
};

maxSimulcastCount of type unsigned short

    This attribute controls the number of simulcast streams that will be
    offered for the specific RTCRtpSender. The actual number of streams
    used for this sender will depend on the answer that is passed to
    setRemoteDescription.

------------------------------------------------------------------------

Here's how that would work (I'm going to use simulcast with two 
encodings for my examples, but extrapolating use for more streams than 
that should be obvious).

If the browser is the entity creating the offer, the script driving its 
side would, for each sender on which it wants to offer simulcast, set:

   rtpSender.maxSimulcastCount = 2;

The SDP that it gets from a subsequent createOffer would include two 
simulcast payload types (PTs). Both would have identical imageattr 
"send" ranges, indicating the range of encodings supported for 
simulcast. Only one would be offered for recv (this is just the 
resulting m-line):

    m=video 49300 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=3600; max-mbps=108000
    a=imageattr:97 send [x=[128:16:1280],y=[72:9:720]] recv [x=[128:16:1280],y=[72:9:720]]
    a=imageattr:98 send [x=[128:16:1280],y=[72:9:720]]
    a=simulcast send 97;98 recv 97
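
As a rough illustration of how a server or test script might read the 
a=simulcast line above, here is a minimal sketch. It assumes the 
attribute syntax is exactly what the examples in this message show; 
parseSimulcast is a hypothetical helper, not part of any API or draft.

```javascript
// Hypothetical helper: parse an "a=simulcast" line written in the syntax
// used by the examples in this message, e.g. "a=simulcast send 97;98 recv 97".
function parseSimulcast(line) {
  const prefix = "a=simulcast ";
  if (!line.startsWith(prefix)) {
    return null; // not a simulcast attribute
  }
  const tokens = line.slice(prefix.length).trim().split(/\s+/);
  const result = { send: [], recv: [] };
  // Tokens come in pairs: a direction, then a ";"-separated list of PTs.
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    const direction = tokens[i];
    if (direction !== "send" && direction !== "recv") {
      continue; // assumption: skip anything we do not recognize
    }
    result[direction] = tokens[i + 1].split(";").map(Number);
  }
  return result;
}

const offered = parseSimulcast("a=simulcast send 97;98 recv 97");
// Number of encodings the offerer is willing to send.
console.log(offered.send.length);
```

The length of the "send" list is the number of encodings the offerer is 
willing to send, which is what maxSimulcastCount is meant to control.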


The MCU would then communicate actual desired resolutions using imageattr 
"recv" in its answer:

    m=video 49674 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
    a=imageattr:97 send [x=[320:16:1280],y=[180:9:720]] recv [x=1280,y=720]
    a=imageattr:98 recv [x=320,y=180]
    a=simulcast recv 97;98 send 97
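
To make the role of imageattr "recv" concrete, the following sketch 
pulls the requested resolution for each payload type out of an answer 
like the one above. It is only an illustration: recvResolutions and 
parseDimension are hypothetical helpers, and the parsing assumes just 
the two forms used in this message (a fixed value such as x=320, or a 
min:step:max range such as x=[128:16:1280]).

```javascript
// Hypothetical helper: turn "320" or "[128:16:1280]" into an object.
function parseDimension(text) {
  const range = /^\[(\d+):(\d+):(\d+)\]$/.exec(text);
  if (range) {
    return { min: Number(range[1]), step: Number(range[2]), max: Number(range[3]) };
  }
  return { value: Number(text) };
}

// Hypothetical helper: map each payload type to the size given in the
// "recv" part of its a=imageattr line. Lines without "recv" are skipped.
function recvResolutions(sdp) {
  const wanted = new Map();
  for (const line of sdp.split(/\r?\n/)) {
    const attr = /^a=imageattr:(\d+) (.*)$/.exec(line.trim());
    if (!attr) continue;
    // x and y are each either a plain number or a bracketed range.
    const recv = /recv \[x=(\[[^\]]*\]|\d+),y=(\[[^\]]*\]|\d+)\]/.exec(attr[2]);
    if (!recv) continue;
    wanted.set(Number(attr[1]), {
      x: parseDimension(recv[1]),
      y: parseDimension(recv[2]),
    });
  }
  return wanted;
}

// The two imageattr lines from the answer in this message.
const answer = [
  "a=imageattr:97 send [x=[320:16:1280],y=[180:9:720]] recv [x=1280,y=720]",
  "a=imageattr:98 recv [x=320,y=180]",
].join("\r\n");

for (const [pt, size] of recvResolutions(answer)) {
  console.log(pt, size.x.value, size.y.value);
}
```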


------------------------------------------------------------------------

Conversely, if the MCU were creating the offer, it would include the 
simulcast resolutions in the offer:

    m=video 49674 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
    a=imageattr:97 send [x=[320:16:1280],y=[180:9:720]] recv [x=1280,y=720]
    a=imageattr:98 recv [x=320,y=180]
    a=simulcast recv 97;98 send 97


When the receiving JavaScript calls setRemoteDescription, the 
maxSimulcastCount on the corresponding sender(s) would be automatically 
updated according to the number of encodings indicated for each video 
m-line. And, of course, the answer created by createAnswer would 
similarly contain simulcast information matching the number of desired 
encodings from the offer:

    m=video 49300 RTP/AVP 97 98
    a=rtpmap:97 H264/90000
    a=rtpmap:98 H264/90000
    a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
    a=fmtp:98 profile-level-id=42c00b; max-fs=3600; max-mbps=108000
    a=imageattr:97 send [x=1280,y=720] recv [x=[320:16:1280],y=[180:9:720]]
    a=imageattr:98 send [x=320,y=180]
    a=simulcast send 97;98 recv 97
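
The automatic update described above can be sketched as a count of the 
"recv" encodings on each video m-line of the remote description. This 
is a hedged illustration rather than proposed spec text: 
recvEncodingCounts is a hypothetical helper, the audio m-line in the 
sample input is made up for the example, and treating an m-line with no 
a=simulcast attribute as a single encoding is an assumption.

```javascript
// Hypothetical helper: for each video m-line in a remote description,
// count the encodings the remote side wants to receive ("recv" in its
// a=simulcast line). Under this proposal, that count is the value
// maxSimulcastCount would take on the corresponding sender.
function recvEncodingCounts(sdp) {
  const counts = [];
  let inVideo = false;
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("m=")) {
      inVideo = line.startsWith("m=video ");
      // Assumption: no a=simulcast attribute means one encoding.
      if (inVideo) counts.push(1);
      continue;
    }
    if (!inVideo) continue;
    const recv = /^a=simulcast\b.*\brecv (\S+)/.exec(line);
    if (recv) {
      counts[counts.length - 1] = recv[1].split(";").length;
    }
  }
  return counts;
}

const remoteOffer = [
  "m=audio 49170 RTP/AVP 0", // illustrative only, not from the examples
  "m=video 49674 RTP/AVP 97 98",
  "a=rtpmap:97 H264/90000",
  "a=rtpmap:98 H264/90000",
  "a=simulcast recv 97;98 send 97",
].join("\r\n");

// One entry per video m-line.
console.log(recvEncodingCounts(remoteOffer));
```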


------------------------------------------------------------------------

I think this satisfies a broad range of simulcast use cases with very 
little impact on the 1.0 API. I'll also note that this is intended to be 
a first pass at simulcast; if we find that other use cases arise that 
would benefit from more granular controls, we could add them after 1.0 
in a way that I believe would be backwards compatible with the scheme I 
describe above.

-- 
Adam Roach
Principal Platform Engineer
abr@mozilla.com
+1 650 903 0800 x863

Received on Thursday, 13 August 2015 23:03:44 UTC