Meeting minutes
Introduction
Bernard: some context for this joint meeting
Bernard: the pandemic has brought a new wave of technology to the mass market
… podcasting (75% of China listening at least once a week), video conferencing, video streaming (VOD, live streaming)
… ...
… a bunch of these are being compiled in the WebRTC-NV use cases and the WebTransport use cases
… they blur the lines between streaming and realtime
Bernard: lots of SDOs are involved, and many different groups work on these topics within W3C itself
… as a developer, you have to combine all these APIs together to build an app
… the WebCodecs sample code illustrates that, as it needs to combine many different APIs
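A minimal sketch of such a combination, assuming mediacapture-transform and WebCodecs are available; the transport step is left to the application and sendOverTransport() is a hypothetical helper:

```js
// Sketch: camera capture -> WebCodecs encode -> application transport.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();

const encoder = new VideoEncoder({
  output: (chunk, metadata) => sendOverTransport(chunk, metadata), // hypothetical helper
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'vp8', width: 640, height: 480 });

const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();
for (;;) {
  const { value: frame, done } = await reader.read();
  if (done) break;
  encoder.encode(frame);
  frame.close(); // release the capture buffer promptly
}
```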
Bernard: the WebRTC WG worked with WHATWG to make WHATWG Streams work better for media use cases
… converting video frames from WebCodecs into other formats is another need that has emerged
… support for running the pipeline in worker threads is still inconsistent
… separating worker threads for send and receive pipelines is not easy to put in place in practice
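One way apps approach this today is to transfer the capture stream into a worker; a sketch assuming transferable streams, with a hypothetical worker script name:

```js
// Sketch: move the send pipeline off the main thread by transferring the stream.
const processor = new MediaStreamTrackProcessor({ track });
const worker = new Worker('send-pipeline-worker.js'); // hypothetical script
worker.postMessage({ readable: processor.readable }, [processor.readable]);
// send-pipeline-worker.js would read VideoFrames from the transferred stream
// and feed them to a VideoEncoder, as in the previous sketch.
```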
Bernard: goal is to identify owners of the identified problems and next steps
Kaz: the Web of Things groups are thinking of using WebRTC in video surveillance cameras
… I wonder about security considerations for these types of applications
Youenn: does this need DRM?
Kaz: not necessarily, but the cameras would still want to protect their data
youenn: WebRTC provides hop-by-hop encryption
… complemented by end-to-end encryption done in the encoded content
… that may be something to consider here as well
bernard: please file an issue for use cases
fluffy: it would be important to have support for end-to-end encryption - beyond JS-based encryption, it would be good to re-use our existing infrastructure (certificate management, WebAuthn) to enable E2EE
… probably anchored into MLS
Bernard: would that be separate from WebRTC then?
Cullen: yes
… we have a much better understanding of what's needed
… we need to make sure our architecture exposes API in the pipeline where the browser doesn't necessarily have access to the underlying data
Bernard: there is a related issue in the webcodecs repo, filed in the DRM context
Dom: Code samples are a good way to identify where the gaps are
ACTION: Set up a place where we can discuss cross-group or cross-API architecture issues
Issue 90/Issue 121: Pluggable codecs
Adopt feedback streams in RTCRtpScriptTransformer #90
Need APIs for using WebCodec with PeerConnection #121
Harald: once you take media and transform it, it no longer has the same format, which is problematic for SDP-based negotiation
… this creates issues where the pipeline is configured for the wrong media
… one property we need from the WebRTC communications machinery is the ability to state truthfully that the content being sent doesn't conform to the spec
… there is also an issue around packetization/depacketization
… we should figure out an API that defines, once you install a transformation in the pipeline, how the communications machinery should handle the data
… that should operate in terms of codec description
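For context, a sketch of how a transform is installed today with webrtc-encoded-transform; the problem Harald describes is that frames written back here may no longer match the codec negotiated in SDP:

```js
// Main thread: attach a script transform to a sender (webrtc-encoded-transform).
const worker = new Worker('transform-worker.js'); // hypothetical script
sender.transform = new RTCRtpScriptTransform(worker, { side: 'send' });

// transform-worker.js: rewrite encoded frames before packetization.
onrtctransform = (event) => {
  const { readable, writable } = event.transformer;
  readable
    .pipeThrough(new TransformStream({
      transform(encodedFrame, controller) {
        // e.g. encrypt encodedFrame.data; the packetizer downstream still
        // assumes the original codec's bitstream layout.
        controller.enqueue(encodedFrame);
      },
    }))
    .pipeTo(writable);
};
```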
Bernard: this is key to enabling the combination of WebCodecs & WebRTC
… also needed for E2EE
Harald: this problem has indeed arisen already in the context of E2EE
… also, should this be done in terms of packets or in terms of frames
Bernard: in terms of next steps, the idea is to solicit proposals?
Youenn: re WebCodecs, the traditional view is that if you're using WebCodecs, you will use WebTransport or a data channel to transport the data
… what's the use case for using WebCodecs inside something very integrated like PeerConnection?
… it's good to provide proposals, but I would focus on use cases and requirements
Bernard: the key reason is that neither data channels nor WebTransport will deliver low latency
… getting to 100ms can only be achieved with WebRTC
Youenn: I agree with that
… one of the main benefits of WebCodecs is for the use of hardware encoder/decoder
… when you expose that in WebCodecs, we should expose it in PC
… the main benefits would be to finetune encoders/decoders behavior
… not clear what the benefits would be vs native encoder/decoder integration in PC
Harald: in a browser, the code supporting the codecs for the two is the same
… but the capabilities exposed by the two APIs aren't
… the most extensive control surface would be in WebCodecs rather than in WebRTC
… you control the interface via WebCodecs with an input/output in RTP
… this requires supporting the integration
Bernard: WebCodecs hasn't been around for long, but it already exposes a lot more capabilities than WebRTC
Peter: +1 to Youenn that the same codecs are exposed in both WebRTC & WebCodecs
… but as we add new low level controls - would we want to add them in both places or in a single place?
jib: we have multiple APIs that are starting to overlap in usage
… what might help would be to ask what makes us prefer this or that API
… I'm hoping we can avoid "this API is better than that API"
… what makes WebCodecs better? if it's more codec support, we can add more codecs to WebRTC
… or for WHATWG stream support in WebTransport, that could be exposed in data channels
… it may not be necessary to break open WebRTC to get the benefits of its RTP transport
eugene: it's been mentioned that RTCDataChannel and WebTransport are not truly optimized for RTC
… can something be done about it?
… A/V is not the only application for real-time transport
Bernard: this is under discussion in a number of IETF WGs
… some things can be done but it's still a research problem
… in the WebTransport WG, we discussed interactions with the congestion control algorithms
… not sure how much momentum there would be to change congestion control for datachannels
Harald: datachannels are built on SCTP
… it's now easier to play with the congestion control for that
… understanding congestion control is hard, improving it even harder
… unlikely to be solved in the short term, in IETF
fluffy: in terms of congestion control, the Reno CC could be moved to BBRv2
… in terms of driving interest in WebCodecs, the next generation of AI-driven codecs is providing substantial improvements in this space
… these are proprietary codecs that some would like to deploy at scale in browsers
Bernard: a lot of these technologies aren't that easy to do in WebCodecs
… e.g. AI integration
… pre/post-processing can be done with webcodecs
peter: re congestion control, there is something we could do at the W3C level by exposing more information about what's happening in the transport so that apps can adjust what they send
… I've volunteered to help with that
… on the codecs side, re which API to use - even if we allow "bring your own codec" and an RTP Transport, there is still a question of the jitter buffer which is uniquely exposed in WebRTC (not in WebTransport)
CPN: additional control surface in WebCodecs?
Paul: we can add knobs in WebCodecs when they apply to everything
… otherwise, we add knobs in codec-specific structs in the registry
… e.g. tuning for drawing vs real-life for video
… or packet size for audio
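A sketch of what such a registry-level knob looks like, using the codec-specific member defined in the WebCodecs Opus registration (field names follow that registration and may vary across implementations; handleChunk() is a hypothetical application handler):

```js
// Sketch: codec-specific audio knob via the WebCodecs codec registry.
const audioEncoder = new AudioEncoder({
  output: (chunk) => handleChunk(chunk), // hypothetical handler
  error: (e) => console.error(e),
});
audioEncoder.configure({
  codec: 'opus',
  sampleRate: 48000,
  numberOfChannels: 1,
  bitrate: 32000,
  opus: { frameDuration: 20000 }, // 20 ms packets, expressed in microseconds
});
```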
CPN: we welcome issues for additional control surface in the webcodecs repo
youenn: one of the things we discussed was asking for higher quality in the area of the video where there is e.g. a face
Issue 131: Packetization API
Should we expose packetization level API to RTCRtpScriptTransform? #131
Youenn: RTCPeerConnection used to be a black box with an input and an output
… webrtc-encoded-transform opens it up a little bit
… it works for some things
… but it doesn't cover all use cases
… we tried to identify use cases where there isn't good support
… e.g. when adding redundancy for audio to make it more reliable over the transport
… but adding redundancy exposes the risk of having packets discarded
… giving more control over packet generation for video frames would also help in the context of encryption (SPacket)
… RTP packets come with RTP headers that aren't exposed directly - it could be nice to give apps read/write access, e.g. to improve the voice-activity header
… I plan to gather use cases and requirements since there is some interest - hope others will join me
cpn: where would that be done?
youenn: issue #131 is probably the right place
fluffy: it's hard to know how to find where the discussions happen; I'm interested
bernard: this relates to the challenge of keeping track of work happening across all these groups
fluffy: the architectural discussion is whether packetization is part of the pipeline we need to consider in our cross-group coordination
<englishm> fluffy++
cpn: we need to seed our cross-group repo with a description of where we are and where we want to go
bernard: another possibility would be a workshop to help with these cross-group discussions
jib: I played with the API which illustrated that packetization is both codec and transport specific
… is our question "where should it belong architecturally?"
bernard: great question
jake: has anyone talked with IETF on the need to surface the MTU to the encoder?
youenn: the packetizer has the info on MTU
… the question is whether the packetizer would give the info back to the app
ACTION: Create initial architecture description
Issue 99 & Issue 141: WebCodecs & WebRTC
Encoded frame IDLs share definitions with WebCodecs #99
Relationship to WebCodecs #141
Youenn: webrtc-encoded-transform and WebCodecs share some very similar structures, but with differences, including on the mutability of some of the data
… it's a little bit inconvenient we have these different models
… but that ship has probably sailed
… for metadata, we're thinking of referring back to webcodecs
… this got support from the WebCodecs folks involved in the WebRTC WG meeting on Monday
Issue 70: WebCodecs & MediaStream transform
Youenn: mediacapture-transform allows grabbing frames, modifying them, and repackaging them as a MediaStreamTrack
… this is based on the VideoFrame object
… cameras nowadays are capable of exposing information about e.g. face positions
… it's cheap to compute and could be usefully exposed to Web apps
… since this is tightly synchronized data, it would be good to package it with the video frames
… in our discussion on Monday, we agreed to have a metadata dictionary in VideoFrame
… WebRTC would expose its set of metadata for e.g. face detection
… we discussed the possibility of having webapp-specific metadata, e.g. through JS objects
… this could be useful in the context of AR/VR
… the initial PR won't go that far, but it should open the way to that direction
… the first use case will focus on browser-defined metadata
… these would be exposed at the top level, and there would be a .user sub-object that would need to be serializable
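A sketch of that direction, with a browser-provided entry at the top level and a serializable application sub-object; member names such as faceDetection and user are illustrative only, and the constructor-side metadata goes beyond the initial PR:

```js
// Reading browser-provided metadata from a captured frame.
const { value: frame } = await frameReader.read(); // reader from a MediaStreamTrackProcessor
const metadata = frame.metadata();
console.log(metadata.faceDetection); // hypothetical browser-defined member

// Attaching serializable application data when re-wrapping the frame.
const annotated = new VideoFrame(frame, {
  timestamp: frame.timestamp,
  metadata: { ...metadata, user: { sceneAnchor: 42 } }, // hypothetical .user sub-object
});
frame.close();
```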
Bernard: this has application beyond the use cases on the slide
… e.g. face detection could help encoders around faces vs blurred background
… we would need to consider whether the encoder should be expected to look at that metadata
youenn: +1
cpn: how common is face detection at the camera level?
eric: it's very common
youenn: we could also expose a face detection black box that would transform a camera feed and annotate it with face detection metadata
Ada: (Immersive Web WG co-chair)
… some of our WG participants are very interested in bringing more AR features to WebRTC
… they do AR by running WASM on the camera feed from WebRTC
… if you were to fire up the various AR systems on the devices, you could surface more metadata, e.g. the current position of the device or solid objects in the environment
Youenn: very important we get the AR/VR input on this work
… we're also thinking of exposing some of this metadata through requestVideoFrameCallback as well
WebCodec as pass through to application metadata #189
bialpio: also from the IWWG
… we also have a use case to integrate WebXR animation loop with video frames used in XR session
… how would we correlate a video frame from gUM with poses?
youenn: in WebRTC, accessing frames from the camera is via a track processor in a worker
… correlating that with timestamps might work
… would be interesting to see if this is workable with readablestreams
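A sketch of that timestamp-correlation idea in a worker; getPoseForTimestamp() and renderWithPose() are hypothetical application code:

```js
// Worker side: the main thread transfers the MediaStreamTrackProcessor's readable.
onmessage = async ({ data: { readable } }) => {
  const reader = readable.getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    const pose = getPoseForTimestamp(frame.timestamp); // hypothetical lookup
    renderWithPose(frame, pose);                       // hypothetical rendering
    frame.close();
  }
};
```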
bialpio: in this particular use case, WebXR would be the one producing the data (poses); it could be added as metadata into video frames
… some devices are doing pose prediction - this might make it tricky to sync with past video frames
… the question is what would be the best place to discuss this?
youenn: VideoFrame sounds like the place where this is converging
… so webcodecs repo in the Media WG would seem good
cpn: also worth discussing in the architectural repo
ACTION: Include image capture for AR applications and stream correlation in architecture description
kaz: the Spatial Data on the Web WG is also looking at sync between location and time
… useful to consider sync with video stream as well
Bernard: the synchronization issue has come up several times
… the video track processor gives you VideoFrames with timestamps
… how to render that accurately in sync with audio
… it's not always clear that you get the sync that you want
… is that the appropriate API when dealing with all these operations
… what's the right way to render sync'd audio/video?
… this is probably worth some sample code to figure it out
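One candidate rendering path whose sync behavior this question applies to: writing processed frames into a generated track and letting a media element play it alongside the audio. This assumes MediaStreamTrackGenerator from the mediacapture-transform draft, plus an existing processedFrames stream, audioTrack and videoElement:

```js
// Sketch: render processed VideoFrames through a generated track.
const generator = new MediaStreamTrackGenerator({ kind: 'video' });
processedFrames.pipeTo(generator.writable); // ReadableStream of VideoFrame
videoElement.srcObject = new MediaStream([generator, audioTrack]);
await videoElement.play();
```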
fluffy: one of the architectural issues is that these various systems are working at different timing with different control loops
… how to synchronize them and render them correctly is an architectural issue - it needs the big picture of how it fits together
Bernard: half of the questions I get is in that space
ACTION: capture synchronization issues in the architecture description
paul: this metadata seems to be becoming a central piece
<riju_> Thanks Youenn for bringing this up.. Others if there's something specific to FaceDetection you want to contribute/ask here's an explainer https://
paul: it seems worth focusing on this to save trouble down the line
<jholland> apologies, I have to leave early. Thanks for an informative session.
WebCodecs
w3c/webcodecs #198 Emit metadata (SPS,VUI,SEI,...) during decoding
Emit metadata (SPS,VUI,SEI,...) during decoding #198
DanS: with our metadata proposal, the way to do it is relatively clear, the question is really platform support
w3c/webcodecs #371 AudioEncoderConfig.latencyMode (or similar) extension
AudioEncoderConfig.latencyMode (or similar) #371
Fixed audio chunk size support #405
tguilbert: these 2 issues may be the same issue
paul: indeed, probably overlap
… it's mostly something we need to do
… in terms of which codecs it will apply to: most codecs use fixed-size frames, except Opus, @@@ and FLAC (not adapted to real-time)
… adding it to the Opus registration would work
Bernard: 405 is adding ptime - doesn't need to be codec specific?
<Bernard> Issue 405 is about ptime... should be for all codecs, no?
tguilbert: if we expose latency mode on audio encoder, the meaning may differ across codecs
<Bernard> Question about "latency mode": for video, the knob doesn't seem to make much difference in latency...
tguilbert: so a per-codec setting might be better than a generic low-latency/high-quality toggle
w3c/webcodecs #270 Support per-frame QP configuration by VideoEncoder extension
Support per-frame QP configuration by VideoEncoder #270
eugene: QP tends to be codec specific
… the advice we're getting from people working with codecs is to not try to use a common denominator approach
… they want fine-grained controls to tune this
… so we're working towards a codec specific approach via settings in the registry
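A sketch of that codec-specific shape, with a per-codec member on the per-frame encode options; the field names (bitrateMode: "quantizer", av1.quantizer) follow registry drafts and may change, and sendChunk() is a hypothetical handler:

```js
// Sketch: app-driven per-frame QP via codec-specific registry settings.
const qpEncoder = new VideoEncoder({
  output: (chunk) => sendChunk(chunk), // hypothetical handler
  error: (e) => console.error(e),
});
qpEncoder.configure({
  codec: 'av01.0.04M.08',
  width: 1280,
  height: 720,
  bitrateMode: 'quantizer', // let the app drive QP frame by frame
});
qpEncoder.encode(frame, { keyFrame: false, av1: { quantizer: 30 } });
```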
Bernard: the cool thing about per-frame QP is that it can have an impact on congestion control
… re latency mode - I've been playing with it on the video side, when I set it to "quality", it doesn't seem to make any difference in latency
… and it generates fewer keyframes, which improves the transport latency
… is that intended?
eugene: this is implementation specific, it's hard to tell without more details on the encoder
… the API is not trying to be creative, it reflects the knobs from the encoder
… please send me links to your code sample and I'll take a look
… it's not the intended behavior
Wrap up
CPN: who are the relevant WGs we need to coordinate with?
Bernard: I had missed the XR folks in my list
Dom: For the architecture we need the WGs more than the IGs
… For the media pipeline, not sure WHATWG Streams needs to be involved in the architecture
Bernard: WHATWG Streams are a black box; it's difficult to see how much latency they contribute
Dom: As we go through this and collaborate on code samples, some issues may need to be surfaced to WHATWG streams
… Who do we need at the table to design the pipeline?
… Suggest doing it iteratively, then convince people to contribute to it
… Let's start with those already here, then when we find pieces we don't understand in depths, reach out to the relevant groups
… Workshop with several components. It's an interesting idea to run a workshop. Once we have an initial architecture, we'll know who to invite
… So suggest starting small and iterate. Once we get stuck, extend the conversation
… The real question is finding people committed to looking at it. They may be harder to find than people willing to work on the solutions
… Who'd be willing to drive it?
Bernard: I'm one person, but don't know all the pieces
Peter: I volunteer, for the parts I know about
Bernard: Would be helpful to have sample code that illustrates problem points
… We have some in WebCodecs, spend time developing samples. Particularly encourage the AR/VR people to try out WebCodecs
… What kinds of sample code would be illustrative?
Dom: I think we should take the use cases and identify integration points. Some may not be possible, and identifying what it would take to make them possible would be an outcome
ChrisN: Where to host the repo?
Dom: Something like media-pipeline-architecture under w3c github
Bernard: IETF have hackathons, could that be a useful thing? Does w3c do that?
Dom: If we have sponsors and a time and place, it's possible. I have less experience with virtual hackathons
… Could be investigated as part of the workshop idea
Cullen: An IETF hackathon could be welcome. Agree with Dom, virtual hackathons haven't been so successful compared to having people in the room
… Next IETF is London in November. They'd be happy if we showed up to hack there
kaz: the Web of Things WG has been holding plugfests, with SoftEther VPN to help with remote collaboration
cpn: we've done a few ad-hoc joint meetings between our 2 groups
… when should we plan our next?
Dom: How long would it need to do some initial work to bring to that meeting?
Bernard: Depends on the topic. A month or two for sample code.
Dom: So perhaps end of October
Bernard: Aligns with IETF