WebRTC October 2021 Virtual Interim

14 October 2021


Present: BernardA, BrianBaldino, Carine, CullenJennings, Dom, EladAlon, Guido, Harald, Jan-Ivar, PatrickRockhill, TimP, Youenn
Chair: bernard, harald, jan-ivar

Meeting minutes

Slideset: https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0000/WEBRTCWG-2021-10-14.pdf

[ Slide 8 ]

Bernard: [reviewing agenda]

The Streams Pipeline Model (Youennf)

[ Slide 9 ]

[ Slide 10 ]

Youenn: this presentation is about topics and issues we discussed with Jan-Ivar when we explored using Streams for media pipelines
… goal is to identify blocking issues when looking at adopting streams for media pipelines

[ Slide 11 ]

Youenn: media pipelines connect sources with sinks
… sources are readablestreams and sinks writablestreams
… we would want to go from camera to network just using streams
… I'll be focusing only on video pipelines
… and we'll look at threads and intersection between frames and @@@

[ Slide 12 ]

Youenn: dealing with realtime media is better done off the main thread
… in the Web Audio API, the graph is done in the main thread but the processing is done in a dedicated audio thread
… in our case, there is no dedicated thread
… the safest assumption is to assume the video frames flow where they're set up

[ Slide 13 ]

Youenn: example 1 is a funny hat example using pipeThrough and pipeTo
… it's not clear where the video frames would flow in terms of thread
… the assumption would be that it runs in the same thread where these operations are being called
… example 2 uses a JS transform
… example 3 uses a tee - it makes it very unclear where it would be run, whether the UA would optimize it or not
… so the safest assumption, with streams being a generic mechanism, is to assume same-thread

[ Slide 14 ]

Youenn: one potential related idea is to transfer the stream to a worker
… this requires optimizations that are not standard and hard to expose to Web developers
… the current implementation in Chrome is also not compliant
… it's really hard to predict whether the optimization will kick in or not

[ Slide 15 ]

Youenn: a few examples - example 1 is the typical example where chrome will optimize after a stream transfer
… in example 2 - not clear whether optimization will happen
… in example 3 - also unclear
… and again in example 4, when using non-camera streams
… let's say you transfer an MST to another frame, and then take a stream transferred to a worker - will it be optimized? as a developer, you can never know
… as opposed to Web Audio that gives very clear spec'd guarantees

[ Slide 16 ]

Youenn: streams are a generic tool designed for flexibility - we can't give performance guarantees
… we can give that guarantee with transferable MediaStreamTrack
… this avoids the issues associated with streams when dealing with realtime streams
… additional optimizations can still happen as a bonus, but they're no longer a pre-requisite

[ Slide 17 ]

Youenn: buffering with streams happens at each transform step in the media pipeline
… a typical pipeline is like the one at the top, with greedy processing
… but in some cases you don't want to process all frames, e.g. a 1-second-old frame might be better skipped
… as does mediastreamtrackgenerator
… the second pipeline illustrates sequential processing which can be beneficial
… I think that's a safer approach
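
A minimal sketch of the "skip stale frames" idea discussed above, assuming a single-slot buffer between two pipeline steps. `FrameLike`, `LatestFrameSlot` and `makeFakeFrame` are hypothetical names for illustration only; a real pipeline would hold WebCodecs VideoFrame objects, and nothing here is taken from the slides.

```typescript
// Hypothetical stand-in for a VideoFrame: only the members this sketch needs.
interface FrameLike {
  timestamp: number; // microseconds
  close(): void;
}

// Holds at most one pending frame. A newer frame replaces the older one,
// and the replaced frame is closed immediately instead of waiting in a queue.
class LatestFrameSlot<F extends FrameLike> {
  private pending: F | undefined;
  dropped = 0;

  put(frame: F): void {
    if (this.pending !== undefined) {
      this.pending.close();
      this.dropped += 1;
    }
    this.pending = frame;
  }

  // Hands ownership of the pending frame (if any) to the caller.
  take(): F | undefined {
    const frame = this.pending;
    this.pending = undefined;
    return frame;
  }
}

// Fake frame that records its timestamp in closedLog when closed.
function makeFakeFrame(timestamp: number, closedLog: number[]): FrameLike {
  return { timestamp, close: () => { closedLog.push(timestamp); } };
}

const closedLog: number[] = [];
const slot = new LatestFrameSlot<FrameLike>();
slot.put(makeFakeFrame(0, closedLog));
slot.put(makeFakeFrame(33333, closedLog));
slot.put(makeFakeFrame(66666, closedLog));
const next = slot.take(); // only the newest frame is left to process
```

With this shape the slow step never sees more than one queued frame, at the cost of the buffering policy living in application code rather than in the stream itself.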

[ Slide 18 ]

Youenn: this is a real issue; VideoFrames are big and scarce resources
… it's also unclear for web developers what happens; buffering is hidden from them
… issue-1158 is where this is being described - there is probably a solution that will emerge
… but it's unlikely that the default behavior will be the safe behavior for stream of frames

[ Slide 19 ]

Youenn: in general for streams, the idea is that backpressure will deal with buffering
… but for us, some limited buffering might be useful to allow
… but it's hard to deal with WHATWG streams
… the stream queue is opaque to the application by design
… and the queuing strategy is very static, based on the high-water mark
… updating the strategy requires resetting your pipeline
… WHATWG streams might be able to cover the use case, but with complexity

[ Slide 20 ]

Youenn: Tee is the typical way to allow multiple consumers with streams
… tee is part of the design of the API so we should support it

[ Slide 21 ]

Youenn: but we know tee is broken when used with our videoframes stream
… structured clone might solve this, as suggested in issue 1156
… but the default behavior again won't be the right one for us

[ Slide 22 ]

Youenn: but even with structured clone, more changes are needed
… if you apply structured clone, you add hidden buffering
… if the two branches don't consume data at the same pace
… issue 1157 discusses this - so far, no clear solution to this
… streams by design aren't made to drop items

[ Slide 23 ]

Youenn: the last issue I want to discuss is lifetime management
… streams rely on garbage collection, whereas we don't want to rely on GC for videoframe
… there is no easy way to enforce who will close a VideoFrame, making it error prone for Web developers
… there is no API contract, so unclear how to solve this

<hta> +1

Youenn: maybe a dedicated subclass with built-in memory management?
… but no work has started in that direction
… if you look at the pipeline - if you change the pipeline, you need to cancel streams
… these streams might have buffer, which raises the question of GC again

[ Slide 25 ]

Youenn: we need to solve these issues: buffering, tee and lifetime management for VideoFrame
… there has been progress, but more is needed and it's unclear to me how far we can go

[ Slide 27 ]

Youenn: we need a high level of confidence that these issues can be solved before picking streams as the model for designing our APIs
… if we select streams, we should extend support for them in existing and new API (e.g. videodecoder/encoder, barcodedetector)
… this doesn't seem to be part of the plans for e.g. WebCodecs

Jan-Ivar: a couple of comments
… on backpressure, I believe a transformstream with a highwatermark of 0 will automatically apply backpressure
… wrt dynamic buffering, highwatermark is indeed static, but dynamic buffering can be dealt with using a transformstream - but not with a high water mark of 0

Youenn: I'm not optimistic of seeing the problem solved at the source level
… my understanding with life time management is that there is no API contract
… you don't know if close will be called; I like consistency
… memory management would be something we would want to design carefully

<hta> I intended to write q+

Bernard: in the current model where we don't have highwatermark

Youenn: the camera pool might have 10 video frames; with a 5 steps pipeline, 5 frames will be automatically allocated - this leaves only 5 remaining slots which might not be enough
… and some devices might have a smaller buffer of frames
… which will create variable framerates
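
The arithmetic in this example can be written out as a small sketch. It assumes each pipeline step holds a fixed number of frames at once (one per step in the example above); `freeCaptureSlots` and its parameters are hypothetical names, not part of any API.

```typescript
// Number of frames the capture pool can still hand out once every pipeline
// step is holding its share. Assumption: each step pins `framesPerStep`
// frames (1 in the example from the discussion).
function freeCaptureSlots(poolSize: number, steps: number, framesPerStep = 1): number {
  return Math.max(0, poolSize - steps * framesPerStep);
}

// Pool of 10 frames, 5-step pipeline: 5 slots remain for the camera.
const freeWithLargePool = freeCaptureSlots(10, 5);

// A device with a smaller pool of 4 frames has none left, so capture stalls
// until a step releases a frame, which shows up as a variable frame rate.
const freeWithSmallPool = freeCaptureSlots(4, 5);
```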

bernard: the lack of streams integration in webcodecs creates two queues that need to be managed
… and that's not particularly transparent, something you have to keep track of
… this can create significant memory management issues
… wrapping streams is not particularly satisfactory in our case

Harald: a couple of observations
… webcodecs did have a stream-based API for a while; MSTP and MSTG were the reason it got dropped
… we've had very few people reporting problems with these issues
… my impression is that the Stream model has been somewhat confused with the stream shim implementation
… we should have a clean model where issues are moved to implementations, not the model
… wrt tees, I have some experience with reading the CL that added tee to the spec
… worries were expressed that are very similar to ours
… tee is a bad design
… it's fairly easy to write your own JS to get the tee you want, which is quite dependent on your app
… tee doesn't respect the high water mark on down stream - tee is bad
… on the contract point, I think it's natural to say that downstream either has to call close, or pass it to something that will call close on VideoFrame
… we shouldn't depend on upstream to do anything
… we do have an issue with disrupted pipeline - that needs to be solved
… my conclusion is that some of these issues are with the description more than implementations, and some are issues we need to solve but aren't fatal
… like tee - it's not because it's possible to use it badly that we shouldn't use streams
… the streams API is superior to callbacks because it avoids re-doing it all
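
One possible reading of the contract described above (downstream either closes the frame or passes it to something that will) is sketched here. `ClosableFrame`, `Stage` and `runStage` are hypothetical names; no spec or shipped API defines this helper.

```typescript
// Hypothetical stand-in for a VideoFrame: only close() matters here.
interface ClosableFrame {
  close(): void;
}

// A stage returns the frame it forwards downstream, or undefined if it
// consumed the input without producing output.
type Stage<F extends ClosableFrame> = (frame: F) => F | undefined;

// Runs one stage and enforces the contract: the input frame is closed unless
// the very same object was forwarded, in which case the next stage owns it.
// The input is also closed if the stage throws.
function runStage<F extends ClosableFrame>(frame: F, stage: Stage<F>): F | undefined {
  let forwarded: F | undefined;
  try {
    forwarded = stage(frame);
    return forwarded;
  } finally {
    if (forwarded !== frame) frame.close();
  }
}

interface LabeledFrame extends ClosableFrame {
  label: string;
}

const closedLabels: string[] = [];
const makeLabeled = (label: string): LabeledFrame => ({
  label,
  close: () => { closedLabels.push(label); },
});

const passedThrough = runStage(makeLabeled("a"), (f) => f); // forwarded as-is
const replaced = runStage(makeLabeled("b"), () => makeLabeled("b2")); // new output frame
const consumed = runStage(makeLabeled("c"), () => undefined); // nothing forwarded
```

In this sketch nothing depends on upstream: each stage's input is accounted for by the time the stage returns.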

Youenn: I agree with you that tee is bad - salvaging it will be difficult
… doing one's tee in JS is indeed better - but you'll end up using promise-based callbacks
… but if so, why use streams?
… re other issues not being fatal, I would welcome proposals that address these concerns
… at the moment, I'm not confident we can proceed with confidence that streams is a good enough match
… if they can be solved, I agree that streams are appealing
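
A minimal sketch of the kind of hand-written tee mentioned in this exchange, assuming frames expose clone() and close() as WebCodecs VideoFrame does. `CloneableFrame`, `fanOut` and `makeCountedFrame` are hypothetical names, and this callback-based version deliberately leaves out backpressure.

```typescript
// Hypothetical stand-in for a VideoFrame.
interface CloneableFrame {
  clone(): CloneableFrame;
  close(): void;
}

type Consumer = (frame: CloneableFrame) => void;

// Gives every consumer its own clone, then closes the original. Each
// consumer owns (and must close) the clone it receives.
function fanOut(frame: CloneableFrame, consumers: Consumer[]): void {
  try {
    for (const consume of consumers) consume(frame.clone());
  } finally {
    frame.close();
  }
}

// Fake frames that track how many are currently open.
let liveFrames = 0;
function makeCountedFrame(): CloneableFrame {
  liveFrames += 1;
  let closed = false;
  return {
    clone: () => makeCountedFrame(),
    close: () => {
      if (!closed) {
        closed = true;
        liveFrames -= 1;
      }
    },
  };
}

const received: CloneableFrame[] = [];
fanOut(makeCountedFrame(), [
  (f) => { received.push(f); }, // slow branch: keeps its clone for later
  (f) => { f.close(); },        // fast branch: done with its clone right away
]);
```

After the call only the slow branch's clone is still open, so the number of live frames is visible to the application rather than hidden in per-branch queues.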

Jan-Ivar: all these issues filed on github are with the model
… they're not necessarily huge though, and I'm not sure we should block on them
… given that one API is already shipping, I think we need to converge on a standard sooner rather than later

Youenn: I'd be interested in getting a pros/cons comparison of promise callbacks vs streams

Alternative Mediacapture-transform API

[ Slide 30 ]

jib: today, the realtime media pipeline is off the main thread

[ Slide 31 ]

jib: that remains true in webrtc-encoded-transform
… the original chrome API was on main thread, but we then converged on a standardized API off the main thread
… this was important for encoded media, all the more so for raw media

[ Slide 32 ]

jib: the premise here is that the main thread is bad - "overworked & underpaid" as surma qualified during a chrome dev summit in 2019
… surma highlighted webworkers as the solution to that problem
… contention on the main thread is common and unpredictable
… and hard to detect outside of a controlled environment - as opposed to web workers

[ Slide 33 ]

jib: when webcodecs made the decision to expose the API on the main thread, they based this on non-realtime media use cases
… and they strongly encourage doing realtime processing off the main thread

[ Slide 34 ]

jib: we have a non-adopted document "mediacapture-transform" (which has shipped in Chrome 94 despite not being standardized)
… my position is that this proposal is not satisfactory because it exposes the realtime pipeline on the main thread by default, it doesn't encourage use in workers, and it relies on non-standardized optimizations
… also, now mediastreamtrack is transferable so this creates new opportunities

[ Slide 35 ]

[ Slide 36 ]

jib: having to ask the main thread all the time to interact with the API makes sense
… it's baked in the assumption of main thread

hta: that's untrue

[ Slide 37 ]

jib: for a processed (e.g. background replacement) self-view use case combined with webtransport
… tee, clone, postMessage(constraints) aren't good approaches
… whereas with track available in a worker, we have a natural API

[ Slide 38 ]

jib: the tunnel semantics of WHATWG streams are not meant to solve creating streams on the wrong realm
… MSTP is built on broken assumptions

[ Slide 39 ]

jib: I have an alternative proposal based on transferable mediastreamtrack
… the proposal focuses on video at the moment
… it encourages use on workers
… it still uses streams, despite youenn's identified issues - which I think we can find solutions for

[ Slide 40 ]

jib: we expose a readable attribute in a worker version of the MediaStreamTrack
… this keeps data off the main thread

[ Slide 41 ]

jib: a more complicated example, read & write
… this is the equivalent of mediastreamtrackgenerator
… we expose only on workers a new VideoTrackSource interface
… the example is a crop example inspired from WebCodecs
… it aligns better with the separation of source and track in the mediacapture-streams spec
… it interacts well with clone and structured cloning

[ Slide 42 ]

jib: for any video processing, you have a self-view (with high framerate) and a low-fps to send on the network
… applyConstraints works well with a peerconnection

[ Slide 43 ]

jib: now with WebTransport, using track cloning
… this shows native downscaling with applyConstraints as a workaround to using tee
… not clear how MSTG would let you do this via a worker

[ Slide 44 ]

jib: benefits: simpler API taking advantage of transferable tracks, with fewer APIs to learn
… doesn't block real-time media pipeline by default
… it has parity with MSTP & MSTG features
… similar in terms of brevity
… doesn't rely on UA optimizations
… and deals with muted sources

[ Slide 45 ]

jib: Bonus: if we want promise callbacks for stream-based, you can use "for await" on the stream

[ Slide 46 ]

jib: if you want more than a readable - this can be done with cloning, but we could also provide dedicated surface

Harald: I kind of like the proposal - it's almost totally equivalent to MSTG and MSTP
… the examples where you have posting messages to the main thread - MSTG and MSTP are designed to be available to the same contexts where tracks are
… MSTG and MSTP will need to be available on workers when MST are
… in terms of quoting Chris Needham on the Web Codecs decision - one of the motivations for main thread is the availability of other APIs on the main thread
… transferring streams as a pipeline between origin and destination context - it assumes the source is main thread, but that's not true
… with a camera, the source of the stream is the camera, not the main thread
… otherwise, I like the shape of the API; it's very similar to what I proposed

jib: I didn't mean to misrepresent these aspects; I see now that MSTG and MSTP are available in workers
… but they're not transferable
… so they would have to be created in the worker?

harald: yes

youenn: re slide 37
… re not using tee because it's bad - I agree, but I hope we should be able to use it
… with the example in slide 37, we lose back pressure
… we might be able to add it back
… in general, in terms of API shape, if we assume that we use streams, this is a good shape that solves some of the issues that I had with the prior proposal
… in general, mediacapture-main has concepts of source and track
… having a JS object that represents the source is a good thing
… similar to a readablestream that can be native or a JS object
… I think we should go there, will make it easier to extend the API and remove edge cases
… I would prefer not to rely on transferable streams, but instead rely on transferable MST
… which creates a typed way of transferring that can help fulfill the requirements we need

jib: my example may have a mistake on which track to clone - would flipping it around fix backpressure?

youenn: I don't think so
… introducing backpressure on the writablestream might do the trick

harald: backpressure cannot deal with framerate

[ Slide 47 ]

jib: tee can help with backpressure, at the cost of tee problems
… the only thing odd is the "createFrameDropper", a transform stream to drop frames
… clone/applyConstraints is a workaround if we can't solve the tee problem
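
The slide's "createFrameDropper" is not reproduced in these minutes. The following is only a sketch of what a timestamp-based dropping rule could look like, under the assumption that frames carry microsecond timestamps; `createTimestampDropper` is a hypothetical name. In a streams pipeline a predicate like this would sit inside a transform that closes the frames it rejects.

```typescript
// Returns a predicate that keeps a frame only if at least 1/targetFps
// seconds have elapsed since the last kept frame.
// Note: real capture timestamps jitter, so a practical version would
// probably want some tolerance around the interval.
function createTimestampDropper(targetFps: number): (timestampUs: number) => boolean {
  const minIntervalUs = 1000000 / targetFps;
  let lastKeptUs: number | undefined;
  return (timestampUs: number): boolean => {
    if (lastKeptUs !== undefined && timestampUs - lastKeptUs < minIntervalUs) {
      return false; // too soon after the last kept frame: drop
    }
    lastKeptUs = timestampUs;
    return true;
  };
}

// A 50 fps source (one frame every 20000 us) reduced to 25 fps.
const keep = createTimestampDropper(25);
const sourceTimestampsUs = Array.from({ length: 10 }, (_, i) => i * 20000);
const keptTimestampsUs = sourceTimestampsUs.filter((t) => keep(t));
```

Unlike backpressure, the target rate here is an explicit parameter, which is the kind of signal the discussion notes backpressure alone cannot express.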

bernard: slide-36 and -37 don't make sense to me

jib: right, I wasn't aware that MSTP and MSTG would be available in workers
… but you could still do this, and the situation would need to be handled
… but Harald is right, there are a lot of similarities between the two proposals
… the advantage is that we don't need to add a new object

bernard: re slide 33
… datachannels for instance are only available on the main thread
… the lack of consistent API support in workers was part of the challenge

jib: MSTG is a bit of an odd duck - it's also a track
… re lack of APIs, you can always transfer tracks back to the main thread when needed
… this doesn't require breaking transferable streams semantics

Harald: if you have to tell some place upstream that your frame rate is 30, then backpressure can't carry that information
… backpressure can't tell the difference between "I'm slightly late" and "I want only every other frame"
… we need to be able to carry these signals
… we haven't gotten to it yet

bernard: there may be several stages of reporting that's needed

youenn: this depends on whether sources are push or pull
… consumers need to propagate things up to the source
… backpressure may not always be the right mechanism, but we need to support it
… I also agree we need a way to carry messages back
… the fact that some of the APIs need to be done in the main thread is sad, but it still moves a lot of the heavy processing to workers, leaving only some of the plumbing on the main thread
… there may be gaps to do good media processing - if so, we should make them available in workers, and this API would help accelerate that transition

Guido: in addition to API availability on the main thread, we have first-hand feedback from app developers who WANT to do this on the main thread for their use cases
… otherwise, the two APIs are equivalent beyond their shape

dom: re use cases on main thread, is it a matter of developer experience?

guido: for certain apps, adding workers in the mix is adding a cost, not a value
… it only adds complexity and extra resource consumption

jib: even if there are such use cases, we're trying to protect a realtime media pipeline

Mediacapture Transform API

[ Slide 50 ]

Harald: [summarizes the API of MSTP and MSTG]

[ Slide 51 ]

Harald: it shipped in Chrome 94, it's actually used in products with new features based on it
… very few problems reported on it

[ Slide 52 ]

Harald: we believe the threading model is something that app developers need to pick, with encouragement from platform developers
… but dictating it is not the right approach
… Streams are transferable objects
… adding worker availability to MSTG and MSTP is a reasonable addition following the transferability of MST

[ Slide 53 ]

Harald: we need to make sure we have samples that show realistic working real-time operations
… including offthread processing

[ Slide 54 ]

Harald: in terms of improvements, we need better control of adaptation source (backpressure, synchronizing streams, framerate)
… we need to improve experience with streams that don't come from camera - not trivial to synchronize them
… we can work on these aspects once we have agreed on a common base

[ Slide 55 ]

Harald: the two proposals agree on Streams for frame delivery
… difference of opinion for availability on main thread
… the proposals differ on whether the generator or the consumer expands MST or uses a separate class
… this can be discussed
… another difference is that MSTG/MSTP is dealing with both audio and video
… where jib is focused on video only
… clear similarities on model, and distinctions that can be derived in specific issues

jib: streams are transferable, but implicit transfer of the source isn't web compatible and we should move away from it

harald: my interpretation is that the stream source is NOT on the main thread - e.g. it's attached to the camera

jib: the optimizations that chrome has been doing are not compliant with the spec AFAICT

harald: I haven't been convinced the issue is not with the spec

jib: the fact that this can't be optimized all the time would make this head scratching

harald: I find the stream spec impossible to navigate - happy to get pointers

jib: one of my slides covered the intent of the spec

harald: but it relies on the interpretation that the source is in the main thread

youenn: the algorithms described in the stream spec will need to be run in the context of the stream (not the source)
… there is some leeway in the stream spec to optimize pipethrough et al
… but not for the rest afaict
… Adam Rice (stream editor) suggested a specific optimizable stream might be needed

[ Slide 38 ]

jib: [quoting from the spec]
… it's explicitly about transfer between realms

Transferable streams: the double transfer problem #1063

jib: re exposure to main thread - for webrtc-encoded-transform, we agreed to focus off-thread only

harald: I have a bug open to allow re-enabling it on main thread
… I think this was a bad decision

Wrap up and next steps

[ Slide 55 ]

bernard: I would like to get a sense of the room on the major distinctions between the 2 proposals

jib: would also like to get a sense of whether my proposal is acceptable, and with what changes

harald: we have 2 potential starting points, I don't see any reason to pick one over the other

youenn: I want to reiterate my concerns about the difficult stream issues that I raised and for which I'm not seeing progress

dom: I think the question is about API shape (readable/VideoTrackSource vs MSTP/MSTG)

Cullen: I don't feel strongly about any of these questions, not knowing enough about the impact on implementations
… I would need more background to give an informed opinion

Bernard: So, we will bring these questions to the mailing lists

Dom: ... after discussions with the chairs

Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).