W3C

Media WG Teleconference

28 September 2021

Attendees

Present
Bernard Aboba, Chris Needham, Cyril Concolato, Eric Carlson, Francois Daoust, Jan-Ivar Bruaroey, Jer Noble, Mark Watson, Matt Wolenetz, Peng Liu, Yuhao Fu
Chair
Chris, Jer
Scribe
cpn

Meeting minutes

TPAC planning: WebRTC WG / Media WG joint meeting

Chris: Topics for WebRTC joint meeting?

Jer: Media capabilities and harmonising with RTC

Jan-Ivar: Main topic is MediaCapture Transform or its replacement, about exposing real-time audio and video between MediaStreamTrack and JS
… Now it seems will be Streams based. Some issues open around that
… Specific to WebCodecs are video frame lifetime, close and clone methods
… GC cleanup. If you do a tee to branch a stream into two, there's no cloning by default
… When one branch closes the video frame it stops working from the other. Want to discuss also audio issues
… We invited the Audio WG to also join
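
[Editor's illustration] The tee concern above can be modelled without any browser API. The following is a toy Python model, not WebCodecs code: `Frame` is a hypothetical stand-in for a frame with explicit `close()` and `clone()`, and `tee_shared`, `tee_cloned` and `branch_survives` are made-up helper names.

```python
class Frame:
    """Hypothetical stand-in for a media frame with an explicit lifetime."""

    def __init__(self, data):
        self._data = data
        self._closed = False

    def clone(self):
        # A clone is an independent handle onto the same data.
        if self._closed:
            raise ValueError("frame is closed")
        return Frame(self._data)

    def close(self):
        self._closed = True

    def read(self):
        if self._closed:
            raise ValueError("frame is closed")
        return self._data


def tee_shared(frame):
    # No cloning: both branches receive the very same handle.
    return frame, frame


def tee_cloned(frame):
    # Each branch gets its own handle; the original handle is released.
    left, right = frame.clone(), frame.clone()
    frame.close()
    return left, right


def branch_survives(tee):
    """Close the frame on one branch, then try to use it on the other."""
    left, right = tee(Frame(b"pixels"))
    left.close()
    try:
        right.read()
        return True
    except ValueError:
        return False
```

In this model, `branch_survives(tee_shared)` is false and `branch_survives(tee_cloned)` is true, which is the behaviour difference described for a tee without cloning.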

Bernard: TPAC is a good time for overviews or relationships between things
… So the overall direction we're going with media, using Streams to create pipelines to process media
… Future of content protection. S-Frame in WebRTC is an encrypted content form, doesn't work with WebCodecs
… Questions about transports. Looking at the overview, what's missing, how does it all fit together for developers?
… We get lots of questions from developers. How to render: WebGPU, Canvas, etc. Any recommendations?
… Coherent view on Workers

Chris: Anything on WebTransport specifically?

Bernard: WT supports workers, RTCDataChannel doesn't, but that's proposed, it's an extension spec. PeerConnection doesn't support workers
… So lots of different things, good to look at the overall picture. What are we not doing?

Jan-Ivar: Opportunity to look at the overall picture. One option is to stand up an alternative to WebRTC using the other APIs, and look at how that fits together and what does and doesn't work

Bernard: Developers ask how all the parts fit together. Could present what we think the overview is, see where there's agreement. Does it make sense?

Jan-Ivar: We'll make slides in WebRTC WG. Media WG can contribute to that
… A question could be: If a source and sink are stream-based, why not also the bit in the middle?
… Why use streams instead of promises for media capture transform?
… Why isn't encode promise based?
… The streams model would be able to handle it.

Bernard: We could present the story, issues, questions people ask. Streaming and RTC are converging, low latency streaming
… Using EME with WebRTC doesn't work today. Could put some examples in

Jer: Can you come up with the overview?

Bernard: That's for the first hour, then audio for the second hour

MSE Bytestream Format

Cyril: I opened 4 issues
… Generally, the intent was to read and understand the spec, check for conflicts with other specs, if any
… And check interop across browsers. I also wrote some unit tests, hand-editing MP4 files and feeding to an MSE based player
… I found some surprising results
… I'm migrating the tests to WPT
Issue #4 ftyp box

Cyril: Wording in the BSF spec says the ftyp box is part of the init segment, and the UA should run the error algorithm
… if there's a mismatch. It seems to ask a lot from browsers
… Requires the browser to validate. In my tests, I found the ftyp box is ignored. If I use an init segment with just a moov box it's fine
… I can put anything in the ftyp box and everything is accepted
… My understanding of what's implemented today is that we shouldn't say anything about the ftyp box
… Just say an init segment is just a moov box with some constraints
… There's a similar statement about the segment type box that could be safely ignored
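
[Editor's illustration] A minimal sketch of the relaxed reading discussed above, assuming only the basic ISO BMFF box header layout (32-bit big-endian size, four-character type, 64-bit largesize when size is 1, size 0 meaning "to the end"). It is not spec text or any browser's implementation; `top_level_boxes`, `make_box` and `looks_like_init_segment` are hypothetical names, and `moov` is the movie box that carries the initialization data.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box found in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit largesize follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            raise ValueError("box size smaller than its header")
        yield raw_type.decode("latin-1"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size, for testing only."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def looks_like_init_segment(data: bytes, ignored=("ftyp", "styp")) -> bool:
    """Relaxed check: once ftyp/styp are ignored, only a moov box remains."""
    types = [t for t, _, _ in top_level_boxes(data) if t not in ignored]
    return types == ["moov"]
```

For example, `looks_like_init_segment(make_box(b"ftyp", b"anything") + make_box(b"moov"))` holds whatever the ftyp payload contains, which mirrors the observed behaviour that the ftyp content is not validated.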

Jer: Wasn't there a move in the MEIG to have a restrictive ftyp that would throw errors in cases where extra boxes would be ignored? If we remove this, will there be a request to add it back?

ChrisN: The CMAF BSF discussion

Cyril: I argued against that. If all browsers implement the same thing, why take it out?

Jer: I agree. It's that they didn't have separate mime type so wanted to use ftyp
… I think if we can parse the file, we should, and not throw errors. Being relaxed in error handling is consistent with the web approach
… I'm fine with adding it to the list of boxes to ignore
… But concerned that others will object

Cyril: I doubt browsers will verify conformance to brands. Browsers and players try to do their best with the content

Jer: Making WPTs would answer these questions. Update the spec to match browser behaviour

Cyril: If a box contains a compatible brand the UA doesn't support, it should fail. That's the opposite of what ISO BMFF says

Jer: I wonder if what's written was the opposite of the intent

Matt: I agree that the ftyp box is superfluous in Chrome
… Is there a case where folks want to play streams but they want capability detection based on ftyp?
… As no browsers filter on this, we should remove it from the spec. It's in the list of boxes that should be skipped in implementations

Cyril: Next issue, support for edit lists

Cyril: The spec currently says browsers must support one type of edit lists
… An offset edit list, which offsets the composition time when you have B-frames
… The edit list maps the non-zero composition time to a presentation time of zero
… The spec is silent on other types of edit lists
… Can you use fractional or zero rate? Can you use empty or multiple entries in edit list?
… Rare to have interop in those tests. Mostly, browsers ignore edit lists they don't support. I think it should fire an error
… Could lead to A/V sync issues
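
[Editor's illustration] The offset edit described above comes down to one subtraction. The sketch below assumes the ISO BMFF edit list entry fields (segment duration, media time, media rate) and the convention that a media time of -1 marks an empty edit; the timescale and composition times are made-up values, and `is_simple_offset` is one possible reading of "offset edit list", not the spec's definition.

```python
from fractions import Fraction

EMPTY_EDIT = -1  # media_time value that marks an empty edit


def is_simple_offset(entries):
    """True for a single, normal-rate edit that only shifts the start.

    Each entry is a (segment_duration, media_time, media_rate) tuple.
    """
    if len(entries) != 1:
        return False
    _, media_time, media_rate = entries[0]
    return media_time != EMPTY_EDIT and media_time >= 0 and media_rate == 1


def presentation_seconds(composition_time, media_time, timescale):
    """Map a composition time to presentation time under one offset edit."""
    return Fraction(composition_time - media_time, timescale)


# Made-up example: 90 kHz timescale, 30 fps, and B-frame reordering that
# pushes the first displayed frame to composition time 6000.
timescale = 90000
composition_times = [6000, 9000, 12000]
edit = [(0, 6000, 1)]
```

With these values, the first displayed frame maps to presentation time zero and the following frames to 1/30 s and 1/15 s. Lists with an empty edit, several entries, or a fractional or zero rate fall outside `is_simple_offset`.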

Matt: I have a concern about raising a decode or parse error on content that previously played successfully
… Could be a note for clarification on which edit lists have interoperable support, and others would be ignored

Cyril: I tested fractional rates, empty edit list (should fill the timeline with a gap)
… Maybe deprecation first, then removal in a future edition

Matt: Are there components of these edit lists that interact with other parts of MSE (timestampOffset, playbackRate), such that applications couldn't polyfill them?

Jer: Hard to polyfill with muxed tracks

Matt: Do we have any stats on existence of these kinds of edit lists?

Cyril: There's one that must be supported

Matt: So a note to say the others should be ignored by implementations

Cyril: I'd prefer to say "may be" ignored, and content providers "should not" use

Matt: Makes sense, also gather data

Jer: Other uses? Offset is needed for B-frames. What about multiple playback rates, other use cases?

Cyril: Empty edits could be used, when you want to align audio and video, you can either remove some video content to start at the audio start
… or say the audio has a gap, and the player should play video without audio until audio starts

Jer: Another option, if we find that these are being used in the wild, could add a "should" statement for the empty edits, or others with valid use cases

Matt: We don't have enough data now. If there are use cases that can't be solved ergonomically in the MSE API, can address at a later time
… If we don't see people complain that playback isn't working, should we then add telemetry?

Cyril: I think it's important to document what content creators can rely on

Matt: So documenting that they may be ignored, and that content providers should not use them. File a GitHub issue so people can reply to bring it to our attention

Cyril: Next is #6, support for unknown boxes. Boxes accepted and ignored
… Not sure what is meant by valid top-level boxes

Matt: If you put an out of order box, such as a moof before a moov. The spec handles that, but are there other cases?

Cyril: ... That's the next issue on the number and order of boxes...

Cyril: I tested an unkn box

Matt: Is that in the spec, how do we know it's a top-level box?

Cyril: From where it's placed in the stream

Matt: If it's not defined as a top-level box in the normative spec

Mark: All the boxes that the ISO spec says are allowed to appear at the top level

Cyril: Anyone can add other boxes at the top level if they want

Matt: If we need to bind the MSE spec more closely to ISO BMFF, we can

Cyril: Is the intent to ignore unknown boxes at the top level?

Matt: It could indicate the stream is malformed

Cyril: Concerned it doesn't scale. Each time ISOBMFF spec changes, you'd have to change implementation
… It has happened before

Jer: If a box not defined at top level is found at top level, could throw an error. But it's OK to skip an unknown box

Cyril: Just consume the bytes and continue parsing

Matt: Could there be a malformed stream that causes the implementation to hold onto large blocks of data?

Jer: Seems like an implementation detail

Matt: Some implementations may see 2 gigabytes as too large and couldn't skip
… We have a quota exceeded mechanism. Just thinking through implementation-based considerations
… In terms of API usage, there was one user-defined box, proposed by the BSF, but that source was unaware of pre-existing top level boxes they could have used, JS level parsing
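
[Editor's illustration] One way to "consume the bytes and continue parsing" without holding onto a large unknown box is to count down the bytes to discard rather than buffer them. The sketch below is a toy incremental parser under that assumption, not how any browser does it: `SkippingParser` and its `known` list are hypothetical, and only 32-bit box sizes are handled (largesize and size 0 are left out).

```python
import struct


class SkippingParser:
    """Toy incremental parser: buffers known boxes, discards unknown ones."""

    def __init__(self, known=("ftyp", "styp", "moov", "moof", "mdat", "sidx")):
        self.known = set(known)
        self.pending = b""   # bytes received but not yet interpreted
        self.to_skip = 0     # bytes of an unknown box still to discard
        self.boxes = []      # (type, payload) of completed known boxes
        self.skipped = []    # types of the boxes that were skipped

    def append(self, chunk: bytes):
        # Discard the rest of an in-progress unknown box without storing it.
        if self.to_skip:
            n = min(self.to_skip, len(chunk))
            self.to_skip -= n
            chunk = chunk[n:]
        self.pending += chunk
        while len(self.pending) >= 8:
            size, raw_type = struct.unpack_from(">I4s", self.pending)
            box_type = raw_type.decode("latin-1")
            if size < 8:
                # Covers malformed sizes and the size 0 / size 1 forms,
                # which this sketch does not support.
                raise ValueError("unsupported or malformed box size")
            if box_type not in self.known:
                self.skipped.append(box_type)
                drop = min(size, len(self.pending))
                self.to_skip = size - drop
                self.pending = self.pending[drop:]
                continue
            if len(self.pending) < size:
                break  # wait for more data before emitting this box
            self.boxes.append((box_type, self.pending[8:size]))
            self.pending = self.pending[size:]
```

Memory use for an unknown box stays bounded by the chunk size plus one header, regardless of the declared box size. Whether a declared size is plausible at all is a separate policy question that this sketch does not address.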

Cyril: The unkn box is one I invented
… ISO BMFF recently introduced compressed boxes (gzip). The sidx can be replaced with !sdx, defined in ISO BMFF

Cyril: Let's discuss how to make the spec changes
… Should I open an issue for each problem, then a PR. Or propose a rewrite as a draft and review the whole thing?

Matt: Depends on scale. If just a few, discuss as one-offs
… Design principles about not wanting to regress

Cyril: What about small issues? I'd like to be able to rewrite the text and review as a whole
… Agree on the intent of the issues, then make a PR

Jer: Seems reasonable to close a number of issues in one PR

Matt: An issue per item sounds good, and a PR that addresses multiple

Cyril: The BSF is a Note. Why is it not on the Rec track? If there are tests and implementations can be compliant to it, why a Note?

Matt: We focused on testing MSE itself and not so much the BSFs. An implementation must support *a* BSF, so implementations may support different ones
… Allowed more flexibility at the time

Francois: Another reason it's a note relates to patents. It more directly relates to codecs, so didn't need to ask about the royalty free patent policy

ChrisN: What about Process 2021, provides a structure for registries and entries?

Matt: WebCodecs has taken a similar approach to MSE for registries. Anything we can learn from that?

Francois: Similar reasons, would make sense to have them as Rec track specs

Matt: It's been easy to propose and support new entries fairly quickly

Cyril: I think it's fine if the entries aren't all at same maturity level

Matt: I'd need to check with colleagues on that

Cyril: I created WPT for this spec. It may need some more work. Is there a link between WPT and this WG?

Matt: Existing tests were bound to the API itself, not so much a specific BSF. The tests are mostly testing the API for a supported format
… Testing BSF format more deeply is good, put into a subfolder

Cyril: I'll start a PR. How do you detect an error? Buffer range, error event, etc?

Matt: There's a proposed introspection API that could help with that

Cyril: Thank you

Matt: FPWD of MSE v2 and short name

Francois: It'll be published on Thursday, no need for a new CfC

Matt: Update the SoTD?

Francois: Yes, feel free to do that. In future we'll switch to automatic publishing to /TR

Chris: Second screen WG would like a joint meeting to talk through some issues around capability detection
… Will schedule that for an upcoming call, possibly next time?

[adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).