Meeting minutes
MSE Dynamic addition / removal of source buffers
<cpn> w3c/
<kaz> FYI, related discussion during the MEIG call on Jan 7
cpn: Daniel responded to some of the comments in the reply
cpn: The use case you've given is different DASH periods, where
… there's video with no audio, then back to video and audio, or similar
Daniel: Yes, that would be the scenario
cpn: And the workaround you're using is, if there's no audio, that you continue
… to fill the audio source buffer with dummy data, silence presumably,
… then you can maintain the continuous throughput so that playback doesn't stall due to
… an underflowing buffer?
Daniel: Yes that would be more elegant than what we have today,
… where we tear the whole thing down and set it up again for each period.
… I agree dummy data would be more elegant but I don't know if anyone does that in their players today.
… You would need dummy data or reuse part of the previous segment
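The silence-filling workaround discussed above could be sketched roughly as follows — a hypothetical illustration only; `makeSilentSegment` and the 2 s segment duration are assumptions, not anything from the discussion:

```javascript
// Pure helper: timestamps of the dummy segments needed to cover a
// period [gapStart, gapEnd) that has no real audio.
function silentSegmentTimes(gapStart, gapEnd, segmentDuration) {
  const times = [];
  for (let t = gapStart; t < gapEnd; t += segmentDuration) {
    times.push(t);
  }
  return times;
}

// In a player, each timestamp would become an appendBuffer call, where
// makeSilentSegment (hypothetical) produces a fragmented MP4 of encoded
// silence:
//
//   for (const t of silentSegmentTimes(periodStart, periodEnd, 2)) {
//     audioSourceBuffer.timestampOffset = t;
//     audioSourceBuffer.appendBuffer(makeSilentSegment(2));
//   }
```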
cpn: Two suggestions.
… 1. Use different players with different source buffer sets and switch the video player in the page
Daniel: There may be performance problems when hiding and showing players.
… Not sure about ad blocker behaviour either.
… If you're in full screen and do the switch, is it possible to continue in full screen?
… The timing of the switch might not be completely accurate.
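For reference, the two-player approach might look like the sketch below — a hedged illustration; the helper name and the pre-seek strategy are assumptions:

```javascript
// Sketch of the two-player idea: two <video> elements, each with its
// own MediaSource, toggled at a period boundary.
function swapPlayers(active, standby, switchTime) {
  // Pre-seek the standby player so decoding has started before it
  // becomes visible, narrowing the visible gap at the switch.
  standby.currentTime = switchTime;
  standby.play();
  standby.hidden = false;
  active.pause();
  active.hidden = true;
}
```

As Daniel notes above, the actual switch timing in a real page may still not be frame-accurate.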
cpn: Wonder if the switch timing is a problem when switching media sources?
Daniel: Yes that might also be a point. Would need to check how accurate it would be.
… Maybe there's a way to request removal of a buffer when empty.
… Also what about playback through unbuffered ranges.
alicia: In GStreamer we have a solution for this called gap events.
… Each track has a serialised stream of buffers, which are frames and events
… Events can be "change the coding parameters for the upcoming part of the video"
alicia: Or Gap Events where you are not going to get frames for x seconds but don't consider it an underrun
… Explicitly say that there's no data for a period. And they're format agnostic.
… Could try to do that in MSE if it works for the use cases.
… They're just like frames. Instead of AppendBuffer could call AppendGap, and specify a time
… period in the stream where there is intentionally no data.
… Would be the correct time and duration as specified.
Daniel: Sounds a bit like Chris's proposal?
cpn: Possibly [pulls up #160]
… We're interested in keeping as close to the live edge as possible and having a mode where that's
… more important than continuous live playback, so if there's a drop out skip ahead to the most recent
… playback position, to avoid buffering and delay, important for live sports for example.
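The skip-ahead behaviour #160 asks for can be approximated from script today — a minimal sketch, assuming the `waiting` event fires on the stall and that the app wants the latest seekable position:

```javascript
// On a stall, jump to the end of the latest seekable range instead of
// waiting for the buffer to refill. Sketch only; a real player would
// also guard against seek loops and apply a small back-off.
function jumpToLiveEdge(video) {
  const s = video.seekable;
  if (s.length > 0) {
    video.currentTime = s.end(s.length - 1);
  }
}

// Wiring in a page (illustrative):
//   video.addEventListener('waiting', () => jumpToLiveEdge(video));
```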
Daniel: Dave was contributing to dash.js at the time - 2016 was a long time ago!
cpn: Figuring out where this issue got to...
… In the WG this hasn't been progressed particularly. It was discussed in FOMS 2023 for example
Daniel: I think there was a hacky way to skip the gaps and it didn't work in all browsers.
cpn: We have to be a bit careful in this group - we can't design solutions because we don't have IPR coverage.
… We can identify issues with existing APIs and be very clear about the requirements we have.
… Solution design needs to happen in the MediaWG.
… We could put this onto the Media WG agenda, then we would have the browser implementers
… in the conversation as well.
Daniel: Would be good from my point of view too.
… Also want feedback from other player makers, Shaka Player, HLS.js etc
… Would like to know if these things I'm stumbling upon also affect others.
… Want feedback from implementers.
cpn: We know e.g. Joey Parrish from Shaka Player. And does Gary Katsevman still work on players?
… The other place is the video-dev Slack group, which can be a good place to get this kind of feedback.
… There's a good community of app and player developers there.
… Common use case where there's a stall and you want to get to latest playhead position.
… Though this suggests there's sparse data so you want to resume playback and then
… restart buffering later.
alicia: I do think the two use cases are separate though
… I've never been super-convinced about low latency MSE, it always felt like a job for WebRTC.
cpn: You would need the underlying media playout to be aware it was in that mode so it could
… take over without relying on the JavaScript doing something, perhaps.
… If we go back to #359, more concretely what are our next steps?
… I think its worth replying to explain the limitations of having multiple video elements.
… And also describe the latency and timing considerations and our goals, i.e. maintaining continuous
… throughput and playback, minimising gaps when there is a switchover.
… We think the switching of a media source would have the same timing issues as switching the media
… element itself.
… The full screen thing is interesting because if the video element itself is full screen then
… I can see how you would potentially have a visible impact when switching.
… But can there be a full screen container element that has video as a child, to make it smoother?
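The fullscreen-container idea can be illustrated as below — the `#player` markup and helper name are assumptions; the standard call is `Element.requestFullscreen()`, which returns a Promise and generally requires a user gesture:

```javascript
// Make a wrapper element fullscreen rather than the <video> itself,
// so swapping video children does not exit fullscreen.
function enterPlayerFullscreen(container) {
  // Prefer the standard API; fall back to the WebKit-prefixed variant.
  const request = container.requestFullscreen || container.webkitRequestFullscreen;
  if (!request) {
    return Promise.reject(new Error('Fullscreen API unavailable'));
  }
  return request.call(container);
}

// Usage (assumed markup):
//   enterPlayerFullscreen(document.getElementById('player'));
```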
alicia: Additional complication. What if the gap is at the beginning, so we have
… a stream that starts without video. How would we handle that? Would it be empty? Maybe.
… But I guess you could still use the initial resolution of the video - that's already a thing in MSE.
… When you sent the first init segment I think that sets the resolution - may be misremembering.
Daniel: I think there are always interesting edge cases.
… I just asked in the video-dev channel.
… I'm with you that the solution, and for #160, may be just to tell the buffer that it's okay to be without
… content for a while, but then you have to handle no video at the beginning.
alicia: Or you could signal "now we're in ads".
cpn: Would the web app want control over the holding image?
alicia: I was suggesting it could be the last image shown.
… Simplest solution, then can be customised by sending the last image you want to show.
cpn: I was wondering if configuration to select "last good image" or "black screen" would be needed.
alicia: I think it's better to have an explicit gap API than just a flag of ignore underruns.
… I think it would backfire if people aren't sure how they should handle underruns.
… For audio-free ads, it needs to be clear that we are not going to have audio for some time.
… For the live stream maybe the video element needs a way to set a policy of how to handle
… underruns.
cpn: Good point you might want different behaviour
alicia: Not only MSE, could be a problem in low latency even if not using MSE,
… so could be something for the video element itself.
Daniel: The Media over QUIC group was doing something similar,
… dropping video data and pausing video while audio continues, until video data comes.
… Showing last frame of video.
alicia: What if the JavaScript code to handle the underruns cannot run because some other intensive
… thing is happening. What would that do to timing?
cpn: The videos I've seen are all over web transport with canvas-based rendering.
Daniel: The use case of keeping the last frame would not be possible - they would face the same
… limitation with an MSE-based implementation that we are discussing here right now.
… If we could handle gaps by playing over them then we wouldn't need to remove and add source buffers.
… Or have an API that allows us to append dummy data to fill the buffer like virtual data or whatever you want to call it.
alicia: Could be as simple as a gap with a dictionary saying the start and end timestamp and timescale.
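A sketch of what that dictionary might look like — entirely hypothetical names (`appendGap` is not in any spec); timestamps are integer ticks at `timescale` ticks per second, as in ISO BMFF:

```javascript
// Convert a hypothetical gap dictionary into seconds, for illustration.
function gapSeconds(gap) {
  return {
    start: gap.start / gap.timescale,
    end: gap.end / gap.timescale,
  };
}

// Hypothetical usage, declaring an intentional 5 s hole from t=10 s:
//   sourceBuffer.appendGap({ start: 900000, end: 1350000, timescale: 90000 });
```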
<kaz> media-source issue 360 - Multiple attachable/detachable SourceBuffers
cpn: How does this differ from #360, multiple source buffers?
Daniel: With two source buffers you could pre-buffer the second buffer and then at the segment boundary
… signal the switch so the player can pick up the already filled source buffer.
alicia: You can already do this, right?
Daniel: You could do it with a virtual buffer, yes.
piers: A bit like the two player approach.
alicia: Assume that we are talking about milliseconds if not perfectly aligned?
piers: It's comparable to the two player approach, just taken down one level, having two
… source buffers instead of two players, and then switching between them.
Daniel: Currently you can only create one SourceBuffer per type, right?
Alicia: There's a VideoTrack API, and according to the spec you can switch tracks, in theory
… If that's the case, the problem is in the implementation
Daniel: https://
… In theory, for this use case we'd need 2 SourceBuffers per track, so 4 overall
Alicia: Not sure how well tested that is
Daniel: When I tried it, it said only one is allowed
Alicia: The Track API doesn't have good cross browser support, WebKit has some support, but experimental at the time in other browsers
Daniel: The spec does mention it as a quality of implementation issue
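The track-switching path under discussion might look like the sketch below — feature checks matter here, since many browsers throw on a second video `addSourceBuffer` call or don't expose `videoTracks` at all:

```javascript
// Select one track in a VideoTrackList-like list. Per the HTML spec,
// video track selection is exclusive; shown explicitly here.
function selectVideoTrack(videoTracks, index) {
  for (let i = 0; i < videoTracks.length; i++) {
    videoTracks[i].selected = (i === index);
  }
}

// In a page (illustrative, with feature checks):
//   if (video.videoTracks && video.videoTracks.length > 1) {
//     selectVideoTrack(video.videoTracks, 1);
//   }
```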
Nigel: Is the result of this conversation, that the requirement should be "must support at least two configurations" so you can switch between them, for both audio and video?
Nigel: Or put it differently, the requirement is for smooth transitions between configurations that may have different numbers of sources
Daniel: I think the latter
… Support one per type, but if you want to guarantee smooth transitions, e.g., for ad insertion, then it's beneficial to support two SourceBuffers per type
Alicia: It mentions the Tracks API in 3.3 activeSourceBuffers
<kaz> 3.3 activeSourceBuffers attribute
Alicia: so already covered by the spec
… Could say we have several tracks and want to switch between them. Seems already covered by activeSourceBuffers
… The spec allows playing multiple audio tracks but only one video track
Daniel: I agree it's implicitly there, but to select it you'd have to have multiple tracks
Alicia: Which you do when you have multiple SourceBuffers
Daniel: Could be good to point to this in the GitHub issue
… Do you support multiple SourceBuffers in WebKit?
Alicia: We do
Piers: It doesn't imply you can switch off the video though?
Alicia: Three features: 1) Signalling gaps in either a track or a SourceBuffer, 2) a way to tell the video element that you care more about being on time than not having underruns, and 3) Implementation of being able to switch between video tracks in an MSE context
Chris: Next steps: document requirements, limitations of multiple media elements, and timing issues with switching MediaSources
… Also gathering input from other video players
… Then bring it all to the working group for discussion
Multiple attachable/detachable SourceBuffers
Chris: I'm confused about the overlapping segments in #360
<kaz> media-source issue 360 - Multiple attachable/detachable SourceBuffers
Daniel: I think it means you play segment 1 to the end, then play parts of segment 2, when segment 1 ends you play segment 2.
Nigel: For ad insertion, you want to cut away from segment 1, then play segment 2 at the time of the ad, but the limitation is you can't replace segments or overwrite
Daniel: In this case you can replace the parts of segment 1 that you don't want to play
Alicia: Often there are timing deviations between segments, e.g., due to edit lists
Daniel: We should clarify the diagram. Multiple IDR frames, where everything is lost to the next IDR
Piers: And what kind of media encoding change you have. This could be trying to cope with a situation where the content can't be conditioned (adjusted in length so the segments all fit together).
DataCue
Rob: I'm looking at geo-tagged video, interested in timed metadata
… We proposed DataCue in WICG, which has stalled
… I've been doing some research to try to move it forward
… I wrote a polyfill that works in all browsers
… I looked into the history, DataCue was in HTML and then removed
… The issue is there isn't an exposed TextTrackCue constructor
… The original design had DataCue and VTTCue as derived classes, which have constructors
… If we could expose the TextTrackCue constructor, it would allow developers to create cues of their own, e.g., DataCue, or other varieties
… So can be a way to progress this, and allow experimentation with different cue types
… VTTCue as a specific cue for timed text and DataCue as a generic cue for timed data
… It would work as an exposed constructor. But the only constructor today is VTTCue
… And it doesn't have the vital thing for metadata, which is a type field, so you could have a pub-sub style interface for particular cue types
Nigel: I think it makes sense for VTTCue to be concrete, or for DataCue to have a constructor, and agree with the intent to have something unburdened with the additional stuff from TextTrackCue
Rob: My polyfill is based on VTTCue, but I have another for Safari where TextTrackCue has a constructor, based on a presentation from Eric and Tess from 2020 I think
… where they were trying to use TextTrackCue directly
… Exposing the constructor would enable them to do that as well, for the cue they wanted to design
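Rob's VTTCue-based approach might be sketched roughly as follows — an illustration only, not his actual polyfill; the `type`/`value` fields follow the old DataCue proposal, and the fallback base class exists only so the sketch is self-contained outside a browser:

```javascript
// Minimal DataCue-style cue built on VTTCue where available.
const CueBase = (typeof VTTCue !== 'undefined')
  ? VTTCue
  : class { // bare stand-in for environments without VTTCue
      constructor(startTime, endTime) {
        this.startTime = startTime;
        this.endTime = endTime;
      }
    };

class DataCuePolyfill extends CueBase {
  constructor(startTime, endTime, value, type) {
    super(startTime, endTime, '');
    this.value = value; // arbitrary structured metadata payload
    this.type = type;   // e.g. 'org.id3' — enables routing by cue type
  }
}
```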
Nigel: Makes sense, I'd ask what the default event handlers should be
… VTTCue has some default behaviour
Rob: Should I raise an issue?
Chris: Please, in the WICG/DataCue repo
… Also we can discuss offline, if helpful
[adjourned]
<kaz> kaz: We should clarify how to organize the discussion around these topics (and possibly some more issues around MSE). Let's talk about that during the ME Chairs call.