Re: A proposal for how we would use the SDP that comes out of the MMUSIC interim from Peter Thatcher on 2015-10-12 (public-webrtc@w3.org from October 2015)

From: Peter Thatcher <pthatcher@google.com>
Date: Mon, 12 Oct 2015 08:37:24 -0700
To: Randell Jesup <randell-ietf@jesup.org>
Cc: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <CAJrXDUGOx9B4tWURbVQCZAhFqU1vfVfauM+R3CijyYj0kCX5Uw@mail.gmail.com>
On Sun, Oct 11, 2015 at 2:22 AM, Randell Jesup <randell-ietf@jesup.org>
wrote:

> On 10/9/2015 1:56 PM, Peter Thatcher wrote:
>
>
> On Fri, Oct 9, 2015 at 9:14 AM, Byron Campen <docfaraday@gmail.com> wrote:
>
>>
>>
>> dictionary RTCRtpEncodingParameters {
>>   double scale;  // Resolution scale
>>   unsigned long rsid;  // RTP Source Stream ID
>>   // ... the rest as-is
>> }
>>
>>    I am skeptical that a resolution scale is the right tool for
>> "full/postage stamp", which is the primary use case for simulcast.
>> A conferencing service is probably going to want to define the postage
>> stamp as a fixed resolution (and probably framerate), not a scale of the
>> full resolution that can slide around.
>>
>
> Ultimately, it's the client-side JavaScript in control of what gets sent
> to the server.  The big question has always been: does the JS specify a
> fixed resolution (or height or width) or a relative one?  All of the
> discussions we've had in the past in the WebRTC working group about this
> have ended up in favor of a relative one, not a fixed one.  If the JS
> wants to send a specific resolution, it can control that on the track,
> not via the RtpEncodingParameters or the SDP.
>
> As for what conferencing services want, the one I'm very familiar with
> wants a resolution scale.  So at least the desire for a fixed resolution
> isn't universal.  And, as I already mentioned, services that do want a
> fixed resolution can send a fixed resolution from the JS via track
> controls.
>
>
> My position is that the JS (which comes from the conferencing
> server/provider!) should be in control of the resolutions, in the end.  If
> a camera provides CIF (352x288), do you really want to end up scaling up
> the "thumbnail" version from 88x72 (assuming the layout you mentioned,
> full:/2:/4 from a CIF source)?  And if the source is running HD, your
> layers would be transmitted at 1920x1080, 960x540, and 480x270 (the last
> of which is likely a fair bit larger than a "thumbnail" in most
> conferencing services).  You can
> work around the problem by monitoring the source (modulo that it can change
> resolutions depending on various factors, especially things like a window
> capture), but that adds more than a bit of complexity (pre-roll the source
> to get the resolution and/or have immediate renegotiation, and add a
> continual monitor in a hidden <video> element to pick up on resolution
> changes, more limits in getUserMedia constraints with additional fallback
> paths, etc).
>
> There are several ways to lay out conferencing services; one is a bunch of
> fixed sizes, another is adaptive to window size and participants (and there
> are more!).  Services that don't have a "gallery of thumbnails" may have a
> fixed area to display people (active talker plus last N talkers perhaps)
> in.  There is an advantage to shipping a constant resolution to each - you
> can avoid client-side scale ups/downs (and wasted bits on the scale down or
> ugly images on scale up).  (Though of the two, scale up, if moderate, isn't
> so bad).  The takeaway here is: don't assume your conferencing service is
> "the" way the feature will be used.
>
> As for track scaling to allow for fixed sizes: That does give the JS
> control.  If we can clone the tracks, and apply scales to each at the track
> level, then attach them for the separate layers instead of using
> "resolutionScale": great.  However, will codecs be able to take advantage
> of encoding N encodings from one source with track cloning and scaling?
> Likely not.  You need N senders fed from one track (and the ability to
> specify that or otherwise determine they have the same input).  So an API
> based on cloning plus track scaling will preempt the ability to use a
> multi-encoding codec.  (And also lock in a bunch of extra CPU/power use,
> etc)
>

No one is suggesting we use multiple RtpSenders.  We talked about that at
the f2f and we all realize it's not a good option.  At the f2f, we were also
in agreement that resolutionScale and track constraints are enough
control for the JS to send the desired simulcast with one RtpSender (assuming
we give the JS the control to specify multiple encodings in the first place).
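
For illustration only, here is a minimal TypeScript sketch of how a
relative scale maps to the resolutions discussed in the quoted text above.
The names `EncodingSketch` and `layerResolutions` are hypothetical, and
treating `scale` as a divisor (2 means half the width and half the height)
is an assumption taken from the "full:/2:/4" layout; the proposal itself
does not pin down those semantics.

```typescript
// Hypothetical shape mirroring the two fields proposed in this thread.
interface EncodingSketch {
  rsid: number;  // RTP source stream ID, as in the proposal
  scale: number; // ASSUMPTION: a divisor, so 2 means half width and half height
}

interface Resolution {
  width: number;
  height: number;
}

// Derive the resolution of each encoding for a given source resolution.
function layerResolutions(source: Resolution, encodings: EncodingSketch[]): Resolution[] {
  return encodings.map((e) => ({
    width: Math.floor(source.width / e.scale),
    height: Math.floor(source.height / e.scale),
  }));
}

// The "full:/2:/4" layout from the quoted discussion.
const encodings: EncodingSketch[] = [
  { rsid: 1, scale: 1 },
  { rsid: 2, scale: 2 },
  { rsid: 3, scale: 4 },
];

const describe = (layers: Resolution[]): string =>
  layers.map((r) => `${r.width}x${r.height}`).join(", ");

// CIF source: prints "352x288, 176x144, 88x72"
console.log(describe(layerResolutions({ width: 352, height: 288 }, encodings)));
// 1080p source: prints "1920x1080, 960x540, 480x270"
console.log(describe(layerResolutions({ width: 1920, height: 1080 }, encodings)));
```

The same encoding list yields very different smallest layers depending on
the source, which is the trade-off being debated: a relative scale follows
the track, while a fixed target size would have to be set on the track
itself.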



>
>
> But even if we did say "RTCRtpEncodingParameters should have a .maxWidth
> and a .maxHeight", which I doubt we will, that's somewhat orthogonal to
> this proposal.
>
>
> Ok by me - just let's not optimize for a particular application instance.
>

My point was that if you think we should have ".maxWidth" and ".maxHeight"
in RTCRtpEncodingParameters, that's a separate proposal.   So, if you want
it, please propose it.


>
>
> And here's the *subset* of the SDP from MMUSIC we could use in the offer
>> (obviously subject to change based on the results of the interim):
>>
>> m=video ...
>> ...
>> a=rsid send 1
>> a=rsid send 2
>> a=rsid send 3
>> a=simulcast rsids=1,2,3
>>
>>    The semantics of this are pretty unclear; what does each of these rids
>> mean? You can say that it is "application dependent" I suppose, but the
>> implementers of conferencing servers are going to want something a little
>> more concrete than that.
>>
>
> If the JS wants to send more information about what semantics it is giving
> to each encoding/rsid, it is more than capable of doing so in its
> signalling to the server.  We don't need to put all signalling into SDP.
> We may choose, for convenience of the JS, to put a minor amount of
> signalling in the SDP, like we put the track ID into the SDP.  If so, what
> you're really advocating for is an RSID that's a string instead of an int:
>
>
> A side-channel in the non-SDP signaling (or elsewhere) certainly is fine
> for anything the JS wants both sides to know.  The question is whether the
> SDP as defined is useful in *any* context outside of WebRTC as-is.
> Generally, it should be.  So we need to care about: a) does the draft in
> question work as a definition separate from WebRTC?  (The answer doesn't
> *have* to be yes, but likely should be yes - see comment 22.)  b) What
> features from this draft will WebRTC endpoints use?  c) What happens
> when a WebRTC endpoint using this draft talks to a non-WebRTC endpoint
> (for example, Vidyo or a Vidyo-like service) - how hard is the
> conversion/translation/etc.?
>
> So this may or may not meet the bar set by my questions just above.  If it
> doesn't, it shouldn't need much to meet it, I think.
>

As I mentioned in another email, if someone wants to propose that RSID is
a string in MMUSIC, then that would be useful in the context of WebRTC and
outside.  We could also have a line like "a=rsid send 1 big", where a
label is attached to the RSID line.  I'd be fine with that.
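
As a rough sketch of what a labelled line could look like to a consumer,
the TypeScript below parses "a=rsid" lines with an optional trailing
label.  The line syntax is only the draft form used in this thread
(subject to change at the interim), the label token is the suggestion
above rather than anything specified, and `RsidLine` and `parseRsidLines`
are hypothetical names.

```typescript
// One parsed "a=rsid" line in the draft syntax used in this thread.
interface RsidLine {
  direction: string; // e.g. "send"
  rsid: string;      // kept as a string so an int or a string ID both fit
  label?: string;    // HYPOTHETICAL optional label, e.g. "big"
}

// Collect every "a=rsid <direction> <id> [label]" line from an SDP fragment.
function parseRsidLines(sdp: string): RsidLine[] {
  const prefix = "a=rsid ";
  const result: RsidLine[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.indexOf(prefix) !== 0) continue; // not an rsid attribute
    const parts = line.slice(prefix.length).split(/\s+/);
    if (parts.length < 2) continue; // malformed: needs a direction and an ID
    const entry: RsidLine = { direction: parts[0], rsid: parts[1] };
    if (parts.length > 2) entry.label = parts[2];
    result.push(entry);
  }
  return result;
}

// Illustrative fragment; the label "small" is made up for this example.
const offerFragment = [
  "m=video ...",
  "a=rsid send 1 big",
  "a=rsid send 2",
  "a=rsid send 3 small",
  "a=simulcast rsids=1,2,3",
].join("\r\n");

for (const entry of parseRsidLines(offerFragment)) {
  console.log(entry.direction, entry.rsid, entry.label ?? "(no label)");
}
```

Keeping the ID as a string in the parsed form means the same code would
work whether the RSID ends up as an int or a string.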



>
>
> --
> Randell Jesup -- rjesup a t mozilla d o t com
> Please please please don't email randell-ietf@jesup.org!  Way too much spam
>
>
Received on Monday, 12 October 2015 15:38:33 UTC