WebCodecs breakout session

18 September 2019


Peter Thatcher
tidoust, tidoust_

Meeting minutes

peter: The spec for Streams has examples of potential transform streams with a video decoder. That's great; we'd also want an encoder.

We want live streaming, faster-than-realtime encoding, RTC, media tools, gaming.

[showing a diagram]

peter: There needs to be a conversion from a track to a stream of objects that represent audio and video. These can be called frames, but the terminology is problematic because the term is overloaded.
… Chunk comes out encoded.

alex: Trying to fit the current WebRTC on top of that. Feedback mechanism from transport to decoder for packet loss for instance. How would that fit there?

SteveAnton: It's part of the "change settings"

alex: The application would capture that feedback and would call "change settings" on the encoder.

peter: Correct.
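
The feedback loop described above (the application receives transport feedback and changes encoder settings) can be sketched roughly as follows. This is only an illustration: the `EncoderSettings` and `TransportFeedback` shapes, the `nextSettings` helper, the thresholds, and the bitrate bounds are all hypothetical. Only the `bitsPerSecond` setting name comes from the discussion.

```typescript
// Hypothetical settings shape; only `bitsPerSecond` is named in the session.
interface EncoderSettings {
  bitsPerSecond: number;
}

// Hypothetical feedback report produced by the application's transport layer.
interface TransportFeedback {
  packetLossFraction: number; // 0.0 (no loss) to 1.0 (everything lost)
}

// Illustrative bounds, not taken from any proposal.
const MIN_BITRATE = 100_000;
const MAX_BITRATE = 4_000_000;

// Compute the settings the application would pass to the encoder's
// "change settings" operation after receiving a feedback report.
function nextSettings(
  current: EncoderSettings,
  feedback: TransportFeedback,
): EncoderSettings {
  let factor: number;
  if (feedback.packetLossFraction > 0.1) {
    factor = 0.5; // heavy loss: back off sharply
  } else if (feedback.packetLossFraction > 0.02) {
    factor = 0.85; // moderate loss: back off gently
  } else {
    factor = 1.05; // little or no loss: probe upwards
  }
  const target = Math.round(current.bitsPerSecond * factor);
  return {
    bitsPerSecond: Math.min(MAX_BITRATE, Math.max(MIN_BITRATE, target)),
  };
}
```

The encoder itself stays unaware of the transport here; the application owns the policy and only hands new settings to the encoder.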

?: What kind of controls are you thinking about? Many of them.

peter: I can show you when we have the IDL.
… We're thinking pretty low-level

youenn: real-time vs. non real-time for settings?

peter: Yes, we'd like this to work for both.

youenn: For real-time encoding, you're also thinking of cases where pipe is too slow.
… E.g. a 60fps input, but the pipe can only support 50fps.

peter: You'd be able to detect that, but you'd have to adjust.

paul: Because this is using Streams, you can take benefit of the backpressure mechanism.
… Depending on what you're doing, you may want to drop or buffer.
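
The drop-or-buffer choice can be illustrated with a small simulation of the 60fps/50fps example. This is a simplified model, not the Streams API: the `simulate` function, the `Policy` type, and the queue `capacity` parameter are hypothetical, and the backpressure signal is modelled as "the queue is full".

```typescript
type Policy = "drop" | "buffer";

interface SimulationResult {
  delivered: number; // frames consumed by the slow stage
  dropped: number;   // frames discarded because the queue was full
  queued: number;    // frames still waiting when the simulation ends
}

// Simulate a 60fps producer feeding a stage that can only handle 50fps.
function simulate(policy: Policy, seconds: number, capacity: number): SimulationResult {
  const TICKS_PER_SECOND = 300;                // common multiple of 60 and 50
  const PRODUCE_EVERY = TICKS_PER_SECOND / 60; // one frame every 5 ticks
  const CONSUME_EVERY = TICKS_PER_SECOND / 50; // one frame every 6 ticks

  let queued = 0;
  let delivered = 0;
  let dropped = 0;

  for (let t = 0; t < TICKS_PER_SECOND * seconds; t++) {
    if (t % PRODUCE_EVERY === 0) {
      // With the "drop" policy a full queue acts as the backpressure signal.
      if (policy === "drop" && queued >= capacity) {
        dropped++;
      } else {
        queued++;
      }
    }
    if (t % CONSUME_EVERY === 0 && queued > 0) {
      queued--;
      delivered++;
    }
  }
  return { delivered, dropped, queued };
}
```

Calling `simulate("drop", 1, 1)` keeps latency bounded at the cost of lost frames, while `simulate("buffer", 1, 1)` loses nothing but lets the queue grow by roughly ten frames per second, which is why the choice depends on the use case.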

youenn: The JS would implement that?

paul: Yes, the JS can have the signal here.
… Not clear whether there are going to be common use cases built on that.

peter: The JS would be fully in control of the buffer that gets passed to the decoder.

[discussion on execution on main thread and/or off the main thread]

youenn: usually media is main thread only.

SteveAnton: transform stream would be transferable.

peter: If the Stream that comes back out is transferable, then that should work.

?: [question on B-frames configuration. Middle-way implementation can support everything]

[pointing at github issue on timestamp]

peter: Example of decode for live streaming or RTC.
… First thing to do is to convert stream that goes out of the decoder into a pipe.
… [showing the pipeline]

Richard: The codec type, how do you differentiate? Capabilities?

peter: We're going to need some type of capability.
… We don't want separate capabilities for this, the video stack, and so on.
… We would want a way to specify hardware/software decoding, but it's not clear how to expose that here.
… Going into a decoder, you just have a byte array, and maybe a timestamp.

markw: You could imagine feeding stuff that doesn't have timestamps
… and get a sequence of frames.

peter: Right now, in the IDL it's a byte array with a timestamp.
… What comes out of the decoder was originally opaque data, but we changed it recently.
… You can render without looking at image data. Needed for funny hats and so on, but not for other scenarios.
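
The decoder input described above (a byte array, possibly with a timestamp) might be modelled as below. The names `EncodedChunk`, `DecodedFrameInfo`, and `describeOutput` are hypothetical and are not taken from the IDL under discussion. The stub does no decoding; it only shows that untimed input can still yield an ordered sequence of frames.

```typescript
// Hypothetical decoder input: encoded bytes plus an optional timestamp.
interface EncodedChunk {
  data: Uint8Array;
  timestamp?: number; // unit is left to the application in this sketch
}

// Hypothetical description of what comes out: position in the sequence,
// plus the timestamp when the input carried one.
interface DecodedFrameInfo {
  sequence: number;
  timestamp?: number;
}

// Stand-in for a decoder: one output entry per input chunk, in order.
function describeOutput(chunks: EncodedChunk[]): DecodedFrameInfo[] {
  return chunks.map((chunk, sequence) =>
    chunk.timestamp === undefined
      ? { sequence }
      : { sequence, timestamp: chunk.timestamp },
  );
}
```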

SteveAnton: We don't really want to solve raw media processing in WebCodecs

markf: In the common implementation where you have a hardware decoder, how do you model the semantics?

peter: What the web app sees would be the same whether it's hardware or software

cyril: Clarification about the output.
… Where would the packaging be done?

SteveAnton: Done by the application.

youenn: More general comment that it is exposing more stuff to the Web. Analysis in terms of security and privacy?

SteveAnton: No section for now. Anything you're thinking about?

youenn: You would expose settings for decoders/encoders. That's new capabilities exposed. Also exposing timing information, which is not easy to get now.

JakeA: Raw processing not in scope. If I get a video with 100 frames, do I get 100 frames out?

SteveAnton: There are a lot of use cases. You can render the video.

Paul: You can draw to a canvas, and get an image data in the end.
… You can take it as a texture in WebGL.

JakeA: I think you mean ImageBitmap which is the opaque one.

Paul: Maybe.
… Allows "I'd like the frame to never leave the GPU if the decoding happens on the GPU", which is very common.
… This works for the playback case and you know where it plays back. The Web browser makes the decision as in the normal case. It's very important to have this level of indirection and not manipulate the frame as an image as if it were an image on the CPU.
… Either you decode in software, or it's already in hardware, and texture conversion will happen as usual.
… Connecting to audio is going to need a number of steps.
… You might need to do asynchronous sample conversion.
… [and other steps]

Richard: Can you do something similar with echo cancellation?

Paul: already done by the Web browser

alex: echo cancellation may be hard, [example of same output and work on Chrome to make things work]

jan-ivar: where does the initial encodedAudio, encodedVideo come from?

peter: From anywhere
… Another example of encode/decode together
… [going through example on slide]
… One of the settings is bitsPerSecond.
… Example of getting some input that is containerized to different audio/video.

SteveAnton: There are lots of settings for codecs.

peter: Compared to MediaRecorder. Cannot give you faster-than-realtime, and gives you less control
… Compared to MSE, it's much lower-level
… and gives you complete control over buffers
… Compared to WebRTC, again lower-level. Decouple encoding/decoding from transport.
… Compared to WebAssembly, it gives access to hardware encoders/decoders.

Richard: Big power difference in particular.

peter: One of the questions is injecting codecs into existing APIs. These same pieces would allow injecting WASM codecs in that way.

Richard: What about image encoding/decoding?

peter: Currently not in scope. But then lots of new image formats are just video keyframes, so you could use this here.
… Assuming there's an AV1 encoder/decoder, WebCodecs could be used.
… People are interested in using GIF this way, but that's not designed for that.

Mike: The one piece is when you define the codec for reader/writer, it would be nice to associate some meaning to that string.

SteveAnton: I see what you mean but there aren't standard names.

chcunningam: I saw H.264 streams in examples. In practice it's not that but more HEVC1 followed by profiles, etc. Have you considered this?
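
As an illustration of a codec identifier followed by profile information, the sketch below parses an RFC 6381-style H.264 string such as `avc1.42E01E`. This assumes the remark refers to strings of that form. The `parseAvcCodecString` helper and the `AvcCodecInfo` type are hypothetical and not part of the proposal.

```typescript
// Fields carried by the three hex bytes that follow "avc1." in an
// RFC 6381 codecs string.
interface AvcCodecInfo {
  profileIdc: number;      // e.g. 66 = Baseline
  constraintFlags: number; // constraint_set flags byte
  levelIdc: number;        // e.g. 30 = level 3.0
}

// Parse "avc1.PPCCLL"; returns null for anything else.
function parseAvcCodecString(codec: string): AvcCodecInfo | null {
  const match = /^avc1\.([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})$/.exec(codec);
  if (match === null) {
    return null;
  }
  const [, profile = "", constraints = "", level = ""] = match;
  return {
    profileIdc: parseInt(profile, 16),
    constraintFlags: parseInt(constraints, 16),
    levelIdc: parseInt(level, 16),
  };
}
```

A plain name such as `"h264"` carries none of this information, which is one reason a short codec name alone may not be enough to configure or query a codec.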

peter: Tomorrow we have some WICG time to discuss some issues.

youenn: People might want to start with decode. The decode path may be simpler in some cases. It might be worth experimenting with this first to see whether it works or not.

peter: That's an interesting question.
… Live streaming stuff is just encode.
… Configuration of a decoder is much simpler than encoder.
… If you want more info, there's a proposal, an explainer, and a WebIDL text file that captures what we're thinking about.
… Any final comment or question?

Richard: I think that is needed.

harald: Having this capability and the shape of the API defined would make a lot of other cases easier to address.

Minutes manually created (not a transcript), formatted by Bert Bos's scribe.perl version Mon Apr 15 13:11:59 2019 UTC, a reimplementation of David Booth's scribe.perl. See history.


Maybe present: ?, alex, chcunningam, cyril, harald, JakeA, jan-ivar, markf, markw, Mike, paul, peter, Richard, SteveAnton, youenn