Web Real-Time Communications Working Group and Media Working Group and Media and Entertainment Interest Group - Joint meeting

Meeting minutes

Introduction

Slideset: https://lists.w3.org/Archives/Public/www-archive/2022Sep/att-0008/TPAC_2022_WebRTC_WG___Media_WG___MEIG_Joint_Meeting.pdf

[Slide 5]

Bernard: some context for this joint meeting

[Slide 6]

Bernard: the pandemic has brought a new wave of technology to the mass market
… podcasting (75% of China once/week), video conferencing, video streaming (VoD, live streaming)
… ...
… many of these are being compiled in the WebRTC-NV use cases and in WebTransport
… they blur the lines between streaming and realtime

Bernard: many SDOs are involved, and even within W3C many different groups work on these topics
… as a developer, you have to combine all these APIs together to build an app
… the WebCodecs sample code illustrates this, as it needs to use many different APIs

[Slide 12]

[Slide 13]

Bernard: the WebRTC WG worked with WHATWG to make WHATWG streams work better for media use cases
… converting video frames from WebCodecs into other formats is another need that has emerged
… support for running the pipeline in worker threads is still inconsistent
… using separate worker threads for the send and receive pipelines is not easy to put in place in practice

[Slide 14]

Bernard: goal is to identify owners of the identified problems and next steps

[Slide 15]

Kaz: the Web of Things WG is thinking of using WebRTC for video surveillance cameras
… I wonder about security considerations for these types of applications

Youenn: does this need DRM?

Kaz: not necessarily, but the cameras would still want to protect their data

youenn: WebRTC provides hop-by-hop encryption
… complemented by end-to-end encryption done in the encoded content
… that may be something to consider here as well

bernard: please file an issue for use cases

WebRTC NV Use Cases Repo

fluffy: it would be important to have support for end-to-end encryption; beyond JS-based encryption, it would be good to re-use our existing infrastructure (certificate management, WebAuthn) to enable E2EE
… probably anchored in MLS

Bernard: would that be separate from WebRTC then?

Cullen: yes
… we have a much better understanding of what's needed
… we need to make sure our architecture exposes APIs at points in the pipeline where the browser doesn't necessarily have access to the underlying data

Bernard: there is a related issue in the webcodecs repo, filed in the DRM context

Dom: Code samples is a good way to identify where the gaps are

ACTION: Set up a place where we can discuss cross-group or cross-API architecture issues

Issue 90/Issue 121: Pluggable codecs

[Slide 16]

Adopt feedback streams in RTCRtpScriptTransformer #90

Need APIs for using WebCodec with PeerConnection #121

Harald: once you take a media and you transform it, it no longer has the same format, which is problematic for SDP-based negotiations
… this creates issues where the pipeline is configured for the wrong media
… one property that we need from the WebRTC communications machinery is the ability to state truthfully that the content being sent doesn't conform to the spec
… there is also an issue around packetization/depacketization
… we should figure out an API that tells the communications machinery how to handle the data once you install a transformation in the pipeline
… that should operate in terms of codec description

Bernard: this is key to enabling the combination of WebCodecs and WebRTC
… also needed for E2EE

Harald: this problem has indeed arisen already in the context of E2EE
… also, should this be done in terms of packets or in terms of frames

Bernard: in terms of next steps, the idea is to solicit proposals?

Youenn: re WebCodecs, the traditional view is that if you're using webcodecs you will use web transport or datachannel to transport the data
… what's the use case for using WebCodecs inside something as integrated as PeerConnection?
… it's good to provide proposals, but I would focus on use cases and requirements

Bernard: the key reason is that neither data channels nor WebTransport will deliver low latency
… getting to 100ms can only be achieved with WebRTC

Youenn: I agree with that
… one of the main benefits of WebCodecs is for the use of hardware encoder/decoder
… when you expose that in WebCodecs, we should expose it in PC
… the main benefit would be the ability to fine-tune encoder/decoder behavior
… not clear what the benefits would be vs native encoder/decoder integration in PC

Harald: in a browser, the code supporting the codecs is the same for the two APIs
… but the capabilities exposed by the two APIs aren't
… the richest control surface would be in WebCodecs rather than in WebRTC
… you control the interface via WebCodecs with an input/output in RTP
… this requires supporting the integration

Bernard: WebCodecs hasn't been around for long, but it already exposes a lot more capabilities than WebRTC

Peter: +1 to Youenn that the same codecs are exposed in both WebRTC & WebCodecs
… but as we add new low level controls - would we want to add them in both places or in a single place?

jib: we have multiple APIs that are starting to overlap in usages
… what might help would be to ask why we prefer this or that API
… I'm hoping we can avoid "this API is better than that API"
… what makes WebCodecs better? if it's more codec support, we can add more codecs to WebRTC
… or for WHATWG stream support in WebTransport, that could be exposed in data channels
… it may not be necessary to break open WebRTC to benefit from its RTP transport

eugene: it's been mentioned that RTCDataChannel and WebTransport are not truly optimized for RTC
… can something be done about it?
… A/V is not the only application for real-time transport

Bernard: this is under discussion in a number of IETF WGs
… some things can be done but it's still a research problem
… in the WebTransport WG, we discussed interactions with the congestion control algorithms
… not sure how much momentum there would be to change congestion control for datachannels

Harald: datachannels are built on SCTP
… it's now easier to play with the congestion control for that
… understanding congestion control is hard, improving it even harder
… unlikely to be solved in the short term, in IETF

fluffy: in terms of congestion control, Reno CC can be moved to BBRv2
… in terms of driving interest on WebCodecs, the next generation of codecs driven by AI are providing substantial improvements to this space
… these are proprietary codecs that some would like to deploy at scale in browsers

Bernard: a lot of these technologies aren't that easy to do in WebCodecs
… e.g. AI integration
… pre/post-processing can be done with webcodecs

peter: re congestion control, there are some things we could do at the W3C level by exposing more information about what's happening in the transport so that apps can adjust what they send
… I've volunteered to help with that
… on the codecs side, re which API to use - even if we allow "bring your own codec" and an RTP Transport, there is still a question of the jitter buffer which is uniquely exposed in WebRTC (not in WebTransport)
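
Peter's point is that a jitter buffer, which reorders packets and absorbs arrival-time variation before decoding, is provided by WebRTC but not by WebTransport, so an app that brings its own codec over another transport would have to supply one. A minimal TypeScript sketch of what that entails, assuming a fixed target delay and ignoring sequence-number wraparound; `RtpLikePacket` and `SimpleJitterBuffer` are hypothetical names, and this is not the algorithm any browser uses.

```typescript
// Hypothetical packet shape; not an actual WebRTC or WebTransport type.
interface RtpLikePacket {
  seq: number;       // sequence number (16-bit wraparound ignored for brevity)
  arrivalMs: number; // local arrival time in milliseconds
  payload: Uint8Array;
}

// Hypothetical fixed-delay jitter buffer: holds each packet for targetDelayMs,
// releases packets in sequence order, and skips gaps that waited too long.
class SimpleJitterBuffer {
  private readonly targetDelayMs: number;
  private readonly pending = new Map<number, RtpLikePacket>();
  private nextSeq: number | null = null; // null until the first release

  constructor(targetDelayMs: number) {
    this.targetDelayMs = targetDelayMs;
  }

  push(pkt: RtpLikePacket): void {
    // Packets behind the playout point arrived too late to be useful.
    if (this.nextSeq !== null && pkt.seq < this.nextSeq) return;
    this.pending.set(pkt.seq, pkt);
  }

  pop(nowMs: number): RtpLikePacket[] {
    const out: RtpLikePacket[] = [];
    while (this.pending.size > 0) {
      // Before the first release, start from the lowest buffered sequence number.
      const seq = this.nextSeq ?? Math.min(...Array.from(this.pending.keys()));
      const pkt = this.pending.get(seq);
      if (pkt !== undefined) {
        if (nowMs - pkt.arrivalMs < this.targetDelayMs) break; // keep holding
        out.push(pkt);
        this.pending.delete(seq);
      } else {
        // Gap: declare the missing packet lost once the oldest buffered
        // packet has waited for the full target delay.
        const oldestArrival = Math.min(
          ...Array.from(this.pending.values(), (p) => p.arrivalMs),
        );
        if (nowMs - oldestArrival < this.targetDelayMs) break;
      }
      this.nextSeq = seq + 1; // released or declared lost
    }
    return out;
  }
}
```

The fixed delay is the simplest possible choice; adapting it to measured jitter is where most of the real complexity lives.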

CPN: additional control surface in WebCodecs?

Paul: we can add knobs in WebCodecs when they apply to everything
… otherwise, we add knobs in codec-specific structs in the registry
… e.g. tuning for drawing vs real-life for video
… or packet size for audio

CPN: we welcome issues for additional control surface in the webcodecs repo

youenn: one of the things we discussed was asking for higher quality in areas of the video where there is e.g. a face

Issue 131: Packetization API

[Slide 17]

Should we expose packetization level API to RTCRtpScriptTransform? #131

Youenn: rtcpeerconnection used to be a blackbox with an input and output
… webrtc-encoded-transform opens it up a little bit
… it works for some things
… but it doesn't cover all use cases
… we tried to identify use cases where there isn't good support
… e.g. when adding redundancy for audio to make it more reliable over the transport
… but adding redundancy creates the risk of packets being discarded
… giving more control over the generation of packets for a video frame would also help in the context of encryption (SPacket)
… RTP packets come with RTP headers that aren't exposed directly - it could be nice to give r/w access to apps e.g. to improve the voice-activity header
… I plan to gather use cases and requirements since there is some interest - hope others will join me
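
The audio-redundancy example can be made concrete with a minimal TypeScript sketch, loosely in the spirit of RTP redundant audio (RFC 2198) but not its wire format: each packet carries the current frame plus copies of up to `depth` previous frames, so a loss can be repaired from a later packet. `RedundantPacket`, `buildPackets` and `recoverFrames` are hypothetical names. The trade-off youenn mentions is visible here: each level of redundancy adds bytes to every packet, increasing the load on the transport.

```typescript
// Hypothetical packet layout; not RFC 2198's wire format.
interface RedundantPacket {
  seq: number;             // index of the primary (newest) frame
  primary: Uint8Array;     // encoded frame `seq`
  redundant: Uint8Array[]; // copies of frames seq-1, seq-2, ... (newest first)
}

// Sender side: attach up to `depth` previous frames to each packet.
function buildPackets(frames: Uint8Array[], depth: number): RedundantPacket[] {
  return frames.map((primary, seq) => ({
    seq,
    primary,
    redundant: frames.slice(Math.max(0, seq - depth), seq).reverse(),
  }));
}

// Receiver side: fill each frame from its own packet, or from a redundant copy
// carried by a later packet; frames no received packet covers stay null.
function recoverFrames(received: RedundantPacket[], total: number): (Uint8Array | null)[] {
  const out = new Array<Uint8Array | null>(total).fill(null);
  for (const pkt of received) {
    out[pkt.seq] = pkt.primary;
    pkt.redundant.forEach((frame, i) => {
      const seq = pkt.seq - 1 - i;
      if (out[seq] === null) out[seq] = frame;
    });
  }
  return out;
}
```

With `depth = 1`, any single isolated loss is recoverable; two consecutive losses need `depth >= 2`.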

cpn: where would that be done?

youenn: issue #131 is probably the right place

fluffy: it's hard to know how to find where the discussions happen; I'm interested

bernard: this relates to the challenge of keeping track of work happening across all these groups

fluffy: the architectural discussion is whether packetization is part of the pipeline we need to consider in our cross-group coordination

<englishm> fluffy++

cpn: we need to seed our cross-group repo with a description of where we are and where we want to go

bernard: another possibility would be a workshop to help with these cross-group discussions

jib: I played with the API which illustrated that packetization is both codec and transport specific
… is our question "where should it belong architecturally?"

bernard: great question

jake: has anyone talked with IETF on the need to surface the MTU to the encoder?

youenn: the packetizer has the info on MTU
… the question is whether the packetizer would give the info back to the app
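
Why the MTU matters to whoever packetizes: each packet's payload is bounded by the path MTU minus per-packet header overhead, so the number of packets per frame depends on it. A minimal, codec-agnostic TypeScript sketch; `packetize` is hypothetical, and real RTP payload formats (for example RFC 6184 for H.264) define codec-specific fragmentation rules rather than cutting at arbitrary byte offsets. The overhead figures (IPv4 20 bytes, UDP 8 bytes, RTP fixed header 12 bytes) ignore SRTP, header extensions and CSRCs.

```typescript
// RTP fixed header size without CSRCs or header extensions (RFC 3550).
const RTP_FIXED_HEADER_BYTES = 12;
const IPV4_HEADER_BYTES = 20; // without IP options
const UDP_HEADER_BYTES = 8;

// Hypothetical, codec-agnostic fragmentation of one encoded frame into
// payloads that fit within `mtu` once `overheadBytes` of headers are added.
function packetize(frame: Uint8Array, mtu: number, overheadBytes: number): Uint8Array[] {
  const maxPayload = mtu - overheadBytes;
  if (maxPayload <= 0) throw new RangeError("MTU too small for the header overhead");
  const payloads: Uint8Array[] = [];
  for (let offset = 0; offset < frame.length; offset += maxPayload) {
    // subarray() returns views into the frame, so no bytes are copied here.
    payloads.push(frame.subarray(offset, offset + maxPayload));
  }
  return payloads;
}

const overhead = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_FIXED_HEADER_BYTES; // 40
const parts = packetize(new Uint8Array(3000), 1200, overhead);
// parts.length is 3: payloads of 1160, 1160 and 680 bytes
```

If the app or encoder knew the MTU, it could instead size frames (or slices) to avoid a small trailing fragment.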

ACTION: Create initial architecture description

Issue 99 & Issue 141: WebCodecs & WebRTC

[Slide 18]

Encoded frame IDLs share definitions with WebCodecs #99

Relationship to WebCodecs #141

Youenn: webrtc-encoded-transform and WebCodecs share some very similar structures, but with differences, including in the mutability of some of the data
… it's a little bit inconvenient we have these different models
… but that ship has probably sailed
… for metadata, we're thinking of referring back to webcodecs
… this got support from the WebCodecs participants in the WebRTC WG meeting on Monday

Issue 70: WebCodecs & MediaStream transform

[Slide 19]


Youenn: media capture transform allows grabbing frames, modifying them, and repackaging them as a MediaStreamTrack
… this is based on the VideoFrame object
… cameras nowadays are capable of exposing information about e.g. face positions
… it's cheap to compute and could be usefully exposed to Web apps
… since this is tightly synchronized data, it would be good to package it with the video frames
… in our discussion on Monday, we agreed to have a metadata dictionary in VideoFrame
… WebRTC would expose its set of metadata for e.g. face detection
… we discussed the possibility to have webapp specific metadata e.g. through JS objects
… this could be useful in the context of AR/VR
… the initial PR won't go that far, but it should open the way to that direction
… the first use case will focus on browser-defined metadata
… these would be exposed at the top level, and there would be a .user sub-object that would need to be serializable
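
A sketch of the layering described here: browser-defined fields at the top level of a per-frame metadata dictionary, with app-defined data confined to a `.user` member that must be serializable. This is illustrative TypeScript only; `FaceRect`, `FrameMetadataSketch` and `withUserMetadata` are hypothetical names, not the VideoFrame metadata API, and JSON round-tripping stands in for structured-clone serializability, which it only approximates.

```typescript
// Hypothetical browser-defined field: a face bounding box in normalized coordinates.
interface FaceRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical metadata shape: browser fields at the top level, app data under `user`.
interface FrameMetadataSketch {
  faces?: FaceRect[];
  user?: unknown;
}

// Returns a copy of `base` with a detached copy of `user` attached.
// JSON round-tripping only approximates structured-clone serializability:
// it throws on cycles and BigInt, and does not preserve Map or Date values.
function withUserMetadata(base: FrameMetadataSketch, user: unknown): FrameMetadataSketch {
  let json: string | undefined;
  try {
    json = JSON.stringify(user);
  } catch {
    throw new TypeError("user metadata must be serializable");
  }
  // JSON.stringify returns undefined for e.g. functions and undefined.
  if (json === undefined) throw new TypeError("user metadata must be serializable");
  return { ...base, user: JSON.parse(json) };
}
```

Copying the user data when it is attached means later mutations by the app cannot change metadata already associated with a frame.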

Bernard: this has application beyond the use cases on the slide
… e.g. face detection could help encoders around faces vs blurred background
… we would need to consider whether the encoder should be expected to look at that metadata

youenn: +1

cpn: how common is face detection at the camera level?

eric: it's very common

youenn: we could also expose a face detection blackbox that would transform a camera feed and annotate it with face detection metadata

Ada: (Immersive Web WG co-chair)
… some of our WG participants are very interested in bringing more AR features to WebRTC
… they do AR by running WASM on the camera feed from WebRTC
… if you were to fire up the various AR systems on the devices, you could surface more metadata, e.g. the current position of the device or solid objects in the environment

Youenn: very important we get the AR/VR input on this work
… we're also thinking of exposing requestVideoFrame to expose some of these metadata as well

WebCodec as pass through to application metadata #189

bialpio: also from the IWWG
… we also have a use case to integrate the WebXR animation loop with video frames used in an XR session
… how would we correlate a video frame from gUM with poses?

youenn: in WebRTC, accessing frames from the camera is via a track processor in a worker
… correlating that with timestamps might work
… would be interesting to see if this is workable with readablestreams
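
Correlating camera frames with XR poses by timestamp can be sketched as a nearest-timestamp search over sorted pose samples. This minimal TypeScript sketch assumes both timestamps are in microseconds on a shared clock (WebCodecs `VideoFrame.timestamp` is in microseconds, but whether it shares a time base with XR poses is exactly the open question); `PoseSample` and `nearestPose` are hypothetical. `maxSkewUs` bounds how far apart a match may be, which matters when poses are predicted rather than measured.

```typescript
// Hypothetical pose record; `pose` could hold position and orientation.
interface PoseSample {
  timestampUs: number;
  pose: number[];
}

// Binary search for the pose closest in time to a frame; `poses` must be
// sorted by timestampUs. Returns null if none is within maxSkewUs.
function nearestPose(
  poses: PoseSample[],
  frameTimestampUs: number,
  maxSkewUs: number,
): PoseSample | null {
  if (poses.length === 0) return null;
  let lo = 0;
  let hi = poses.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (poses[mid].timestampUs < frameTimestampUs) lo = mid + 1;
    else hi = mid;
  }
  // poses[lo] is the first sample at or after the frame (or the last sample);
  // the sample just before it may be closer.
  let best = poses[lo];
  if (lo > 0) {
    const prev = poses[lo - 1];
    if (Math.abs(prev.timestampUs - frameTimestampUs) <= Math.abs(best.timestampUs - frameTimestampUs)) {
      best = prev;
    }
  }
  return Math.abs(best.timestampUs - frameTimestampUs) <= maxSkewUs ? best : null;
}
```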

bialpio: in this particular use case, WebXR would be introducing video frames, they could be added as metadata into video frames
… some devices are doing pose prediction - this might make it tricky to sync with past video frames
… the question is what would be the best place to discuss this?

youenn: VideoFrame sounds like the place where this is converging
… so webcodecs repo in the Media WG would seem good

cpn: also worth discussing in the architectural repo

ACTION: Include image capture for AR applications and stream correlation in architecture description

kaz: the Spatial Data WG is also looking at sync between location and time
… useful to consider sync with video stream as well

Bernard: the synchronization issue has come up several times
… the video track processor gives you VideoFrames with timestamps
… how to render that accurately in sync with audio
… it's not always clear that you get the sync that you want
… is that the appropriate API when dealing with all these operations?
… what's the right way to render sync'd audio/video?
… this is probably worth some sample code to figure it out
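
One common way to approach the sync question is to treat the audio output as the master clock and schedule video frames against it. A minimal TypeScript sketch under that assumption; `audioClockUs`, `scheduleVideoFrame` and the tolerance value in the usage line are hypothetical choices, not a browser algorithm, and a real renderer also has to account for audio output latency.

```typescript
// Audio clock derived from how many samples the output has played so far.
function audioClockUs(firstAudioTimestampUs: number, samplesPlayed: number, sampleRate: number): number {
  return firstAudioTimestampUs + (samplesPlayed * 1e6) / sampleRate;
}

type FrameAction = "render" | "wait" | "drop";

// Hypothetical scheduling rule with audio as the master clock: render frames
// within toleranceUs of the audio clock, hold frames that are early, and
// drop frames that are already too late.
function scheduleVideoFrame(frameTimestampUs: number, audioNowUs: number, toleranceUs: number): FrameAction {
  const driftUs = frameTimestampUs - audioNowUs; // > 0 means the frame is early
  if (driftUs > toleranceUs) return "wait";
  if (driftUs < -toleranceUs) return "drop";
  return "render";
}

const now = audioClockUs(0, 48000, 48000); // one second of audio played
const action = scheduleVideoFrame(1005000, now, 20000); // "render"
```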

fluffy: one of the architectural issues is that these various systems are working at different timing with different control loops
… how to synchronize them and render them correctly is an architectural issue - it needs the big picture of how it fits together

Bernard: half of the questions I get are in that space

ACTION: capture synchronization issues in the architecture description

paul: this metadata seems to become a central piece

<riju_> Thanks Youenn for bringing this up.. Others if there's something specific to FaceDetection you want to contribute/ask here's an explainer https://github.com/riju/faceDetection/blob/main/explainer.md

paul: it seems worth focusing on this to save trouble down the line

[Slide 20]

<jholland> apologies, I have to leave early. Thanks for an informative session.

WebCodecs

[Slide 21]

w3c/webcodecs #198 Emit metadata (SPS,VUI,SEI,...) during decoding

Emit metadata (SPS,VUI,SEI,...) during decoding #198

DanS: with our metadata proposal, the way to do it is relatively clear, the question is really platform support

w3c/webcodecs #371 AudioEncoderConfig.latencyMode (or similar) extension

AudioEncoderConfig.latencyMode (or similar) #371

Fixed audio chunk size support #405

tguilbert: these 2 issues may be the same issue

paul: indeed, probably overlap
… it's mostly something we need to do
… in terms of the codecs it will apply to: most codecs use fixed-size frames, except Opus, @@@ and FLAC (not suited to real-time)
… adding it to the Opus registry would work

Bernard: 405 is adding ptime - doesn't need to be codec specific?

<Bernard> Issue 405 is about ptime... should be for all codecs, no?
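
On whether a fixed chunk size can be codec-agnostic: the number of samples per chunk follows from the sample rate and ptime for any codec, but which ptimes a codec can actually produce is codec-specific. A small TypeScript sketch; `samplesPerChunk` and `isValidOpusFrameDuration` are hypothetical helpers, and the Opus frame durations are those defined in RFC 6716.

```typescript
// Samples per channel in one chunk of `ptimeMs` milliseconds.
function samplesPerChunk(sampleRate: number, ptimeMs: number): number {
  const samples = (sampleRate * ptimeMs) / 1000;
  if (!Number.isInteger(samples)) {
    throw new RangeError(`${ptimeMs} ms is not a whole number of samples at ${sampleRate} Hz`);
  }
  return samples;
}

// Frame durations an Opus encoder can produce (RFC 6716).
const OPUS_FRAME_DURATIONS_MS = [2.5, 5, 10, 20, 40, 60];

function isValidOpusFrameDuration(ptimeMs: number): boolean {
  return OPUS_FRAME_DURATIONS_MS.indexOf(ptimeMs) !== -1;
}

const opus20ms = samplesPerChunk(48000, 20); // 960
```

The arithmetic is generic, while the validity check is not, which is consistent with the point that a single latency or chunk-size knob may mean different things per codec.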

tguilbert: if we expose latency mode on audio encoder, the meaning may differ across codecs

<Bernard> Bernard: Question about "latency mode": for video, the knob doesn't seem to make much difference in latency...

tguilbert: so a per-codec setting might be better than a generic low-latency/high-quality toggle

w3c/webcodecs #270 Support per-frame QP configuration by VideoEncoder extension

Support per-frame QP configuration by VideoEncoder #270

eugene: QP tends to be codec specific
… the advice we're getting from people working with codecs is to not try to use a common denominator approach
… they want fine-grained controls to tune this
… so we're working towards a codec specific approach via settings in the registry
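
Per-frame QP can tie the encoder into congestion control. A toy TypeScript sketch of that idea, assuming the app has a bandwidth estimate and learns each encoded frame's size: derive a per-frame byte budget, then nudge QP up on overshoot and down on undershoot, clamped to the codec's range (0 to 51 for 8-bit H.264, which also shows that the range itself is codec-specific). `QpRange`, `frameBudgetBytes` and `nextQp` are hypothetical, the 10% dead band is arbitrary, and how the chosen QP reaches the encoder is left abstract, since the discussion points to codec-specific settings in the registry.

```typescript
// Codec-specific QP bounds, e.g. { min: 0, max: 51 } for 8-bit H.264.
interface QpRange {
  min: number;
  max: number;
}

// Per-frame byte budget from a bandwidth estimate (bits/s) and frame rate.
function frameBudgetBytes(bitrateBps: number, framerate: number): number {
  return bitrateBps / 8 / framerate;
}

// Toy controller: raise QP (coarser quantization, smaller frames) after an
// overshoot, lower it after an undershoot, hold it within a 10% dead band.
function nextQp(currentQp: number, lastFrameBytes: number, budgetBytes: number, range: QpRange): number {
  let qp = currentQp;
  if (lastFrameBytes > budgetBytes * 1.1) qp += 1;
  else if (lastFrameBytes < budgetBytes * 0.9) qp -= 1;
  return Math.min(range.max, Math.max(range.min, qp));
}

const H264_QP: QpRange = { min: 0, max: 51 };
const budget = frameBudgetBytes(1200000, 30); // 5000 bytes per frame
const qpAfterOvershoot = nextQp(30, 6000, budget, H264_QP); // 31
```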

Bernard: the cool thing about per-frame QP is that it can have an impact on congestion control
… re latency mode - I've been playing with it on the video side, when I set it to "quality", it doesn't seem to make any difference in latency
… and it generates fewer keyframes, which improves the transport latency
… is that intended?

eugene: this is implementation specific, it's hard to tell without more details on the encoder
… the API is not trying to be creative, it reflects the knobs from the encoder
… please send me links to your code sample and I'll take a look
… it's not the intended behavior

Wrap up

CPN: who are the relevant WGs we need to coordinate with?

Bernard: I had missed the XR folks in my list

Dom: For the architecture we need the WGs more than the IGs
… For the media pipeline, not sure the WHATWG Streams folks need to be involved in the architecture

Bernard: WHATWG streams are a black box; it's difficult to see how much latency they contribute

Dom: As we go through this and collaborate on code samples, some issues may need to be surfaced to WHATWG streams
… Who do we need at the table to design the pipeline?
… Suggest doing it iteratively, then convince people to contribute to it
… Let's start with those already here, then when we find pieces we don't understand in depth, reach out to the relevant groups
… Workshop with several components. It's an interesting idea to run a workshop. Once we have an initial architecture, we'll know who to invite
… So suggest starting small and iterate. Once we get stuck, extend the conversation
… The real question is to find people committed to look at it. May be harder to find than people willing to work on the solutions
… Who'd be willing to drive it?

Bernard: I'm only one person, and I don't know all the pieces

Peter: I volunteer, for the parts I know about

Bernard: Would be helpful to have sample code that illustrate problem points
… We have some in WebCodecs, spend time developing samples. Particularly encourage the AR/VR people to try out WebCodecs
… What kinds of sample code would be illustrative?

Dom: I think we should take the use cases and identify integration points. Some may not be possible; identifying what it would take to make them possible would be an outcome

ChrisN: Where to host the repo?

Dom: Something like media-pipeline-architecture under w3c github

Bernard: IETF have hackathons, could that be a useful thing? Does w3c do that?

Dom: If we have sponsors and a time and place, it's possible. I have less experience with virtual hackathons
… Could be investigated as part of the workshop idea

Cullen: An IETF hackathon could be welcome. Agree with Dom, virtual hackathons haven't been so successful compared to having people in the room
… Next IETF is in London in November. They'd be happy if we showed up to hack there

kaz: the Web of Things WG has been holding plugfests, using SoftEther VPN to support remote collaboration

cpn: we've done a few ad-hoc joint meetings between our 2 groups
… when should we plan our next?

Dom: How long would it need to do some initial work to bring to that meeting?

Bernard: Depends on the topic. A month or two for sample code.

Dom: So perhaps end of October

Bernard: Aligns with IETF

15 September 2022