Meeting minutes
Slideset:
https://
Introduction
Chris: This is a joint meeting with W3C and SVTA.
Daniel: We're SVTA / DASH-IF members; we want to share feedback around MSE, in the context of media player implementations
… Pain points and issues we see as developers. Maybe we're doing something wrong, and you could provide your feedback
… We want to improve existing implementations, and to understand what you're working on: EME, other APIs
… I'd like to join calls more frequently in future
Daniel: We want to discuss each topic in turn
Thasso: I work for CastLabs, leading the player team there. We have experience dealing with MSE etc
Daniel: I'm with Fraunhofer Fokus, lead developer of dash.js, and co-chair of the SVTA players and playback WG
… Ali and Yuriy are chairs with me
Chris: This is the Media & Entertainment IG, which focuses on industry coordination, use cases, and requirements. This group can't develop specs.
… I also co-chair the Media WG, which does the standards-track spec development for MSE, EME, WebCodecs, Media Capabilities, etc.
Daniel: We have various groups and subgroups in SVTA. DASH-IF and SVTA merged
Discussion items
Daniel: We have a general structure for the discussion: how it's working today; implementation issues and implications; workarounds in players; and suggested improvements to MSE and related use cases
Buffer capacity
Daniel: Every media player buffers data. Create SourceBuffers and append data
… The app can define the size of the forward and backward buffers
… The forward buffer has a trade-off with latency
… A limitation we have is memory or buffer capacity. There's no API to query how much data we can append to the buffer
… We schedule a request for a media segment, but we get QuotaExceeded if there isn't sufficient capacity
… What would improve the behaviour is a way to query the capacity; then we can delay appending and the fetching of the segment
… What we do today is wait for the error event, then reduce the max possible buffer
… And adjust the backward and forward buffers. This would help every player with downloading segments
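As a rough sketch of the workaround described above (names, the back-off factor, and the floor are illustrative, not taken from MSE or any particular player): on a QuotaExceededError, the app shrinks its target buffer windows and retries the append later.

```typescript
// Hypothetical back-off helper for the QuotaExceeded workaround:
// shrink the forward/backward buffer targets and retry appending later.
// All names and constants here are illustrative assumptions.
interface BufferTargets {
  forwardSec: number;   // seconds of media kept ahead of playback
  backwardSec: number;  // seconds of media kept behind playback
}

const MIN_FORWARD_SEC = 10; // assumed floor; tune per device class

function shrinkOnQuotaExceeded(t: BufferTargets, factor = 0.8): BufferTargets {
  // Reduce both windows, but never drop the forward buffer below the floor,
  // since too small a forward buffer risks stalls.
  return {
    forwardSec: Math.max(MIN_FORWARD_SEC, t.forwardSec * factor),
    backwardSec: t.backwardSec * factor,
  };
}
```

In a player, this would be called from the `appendBuffer` error path before rescheduling the segment request.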
Jer: I wrote the MSE player implementation in WebKit. Do you want a general idea of how much room is available without doing appends first? Are you asking for remaining buffer size or something more general?
Daniel: I'd be fine with total buffer size. It depends on the segment duration. Giving a feeling of how much data I can append, and combine with bitrate info, would help understand if I can append or not
Nigel: In implementations, is the buffer size constant, or does it vary over time during playback?
Jer: In our implementation, it's somewhat constant. But we wouldn't want to design ourselves into a corner. We have to deal with requests from the system to jettison memory, which motivated Managed Media Source
… I would not want a fixed buffer size to be a requirement in the spec
Nigel: Does that imply it's preferable to return how much space there is right now?
Jer: That answer isn't a guarantee. The system could detect a low memory condition and that would change the answer
… Any such API couldn't provide a guarantee
Eric: And we wouldn't want an API that encourages apps to poll
Thasso: On fixed buffer size, we used to have an implementation where the buffer size was dynamic. Having the ability to deal with dynamic buffers is something that needs to be supported from the player perspective
… How would it be expressed to the client? Media time, memory?
… For my use cases, I'd want to poll this infrequently, just before downloading the next segment
Daniel: I had the same comment. If you schedule the request, you can decide if you want to query data
Jer: What's the expectation when you hit the memory limit?
… When we designed the APIs, when you get QuotaExceeded, you purge the back buffer to make room for the forward buffer. That wouldn't change with this
… I'm not sure what the benefit would be. You have the data in JS, and as you reach the end of the forward buffer, you need to append the downloaded data to prevent a stall
… It shouldn't be so expensive that you can't do it a few seconds ahead of time
… If you have at least a minute of forward buffer, as you get close to the end you'd have to purge the back buffer to append more data. Is that a problem? Do you want us to handle purging the back buffer for you?
Thasso: No, but I would be fine with it. This goes in the direction of MMS. I could be OK with depleting certain parts of the buffer but not others
… Most of my pain points are with TVs and STBs
… We discuss frequently that we like having MediaSource buffers we can retain in memory, and we'd rather not have another buffer implementation client side
… The machine needs the memory in the end; it doesn't matter if it's on the JS or the MSE side
… We want to avoid splitting it into the two worlds
… MSE appends take time, so finding the right moment can be challenging, and the time could be 2 or 200 ms before the frame can be rendered
Jer: I'm mostly familiar with high-powered devices, much more powerful than TVs or STBs typically
… For our implementation, an append doesn't have to be an entire media segment. If you're trying to keep the forward and back buffers full, with our implementation you can break the buffer into pieces to keep playback uninterrupted
… I don't know about TV implementations, whether they have enough memory
Jer: You can append parts of mdat and moov boxes, but it does require an init segment first
Daniel: We have CMAF low latency chunks, because what you get from fetch API doesn't align with CMAF chunks
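To illustrate the alignment problem just mentioned: fetch() delivers arbitrary network chunks, so a player typically buffers bytes and only appends a prefix that ends on a complete top-level box boundary, keeping the remainder for the next append. A minimal sketch (the function name is illustrative, and 64-bit largesize boxes are ignored for brevity):

```typescript
// Sketch of the re-chunking players do for low-latency CMAF: given
// accumulated bytes, return the length of the prefix made of complete
// top-level ISO BMFF boxes. Bytes beyond that belong to a partial box
// and should be retained until more data arrives. Illustrative only;
// does not handle size==0 ("to end of file") or 64-bit largesize.
function completeBoxPrefixLength(data: Uint8Array): number {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  let offset = 0;
  while (offset + 8 <= data.byteLength) {
    const size = view.getUint32(offset); // 32-bit box size, big-endian
    if (size < 8 || offset + size > data.byteLength) break; // partial box
    offset += size;
  }
  return offset; // bytes safe to append; keep the tail for later
}
```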
Jer: The MSE parsing loop understands this, and the reset step requires an init segment first
… There's no requirement that each chunk is entire
Thasso: A lot of implementations require full mdat box structures
Jer: There's no requirement for that, but some implementations might require it
Kaz: This proposed API could be harmful, allowing attackers to crash systems, so we should be careful and discuss the pros and cons
Jer: Yes, as an internet-exposed browser we worry about fingerprinting. If you have a dynamic buffer size API, you could use it for cross-site user tracking
… It could be like a super-cookie
Nigel: It's interesting that you could append partial segment data to the buffer, but at the moment it feels like no client code would know that's a good idea to do
… If the strategy is to fill the buffer as much as possible, and the client has a whole segment but only half a segment would fit, there's no information back from the API to suggest that
Jer: It's not an unrecoverable error though. Some ideas: I could imagine relaxing the requirement, so you can exceed the quota, but you can't append again until some flag is cleared
… If some implementations require a full buffer to be appended, an API could be more flexible with its buffer size requirements: accept the buffer but don't allow further appends until a flag is cleared, e.g., by a remove command
Nigel: A call to append could return the number of bytes successfully appended.
Box parsing
Daniel: We append ISO BMFF boxes; many players have their own box parser, useful for the player or app
… An example is the EMSG box, for use by player events
… As of today, MSE doesn't support parsing boxes and dispatching them to the player
… It's done in JS. WASM could be an option
… With low latency streaming, we parse MOOF and MDAT boxes, and try to append complete MOOF+MDAT combinations
… We suggest an API to allow clients to register to receive the boxes
… EMSG, PRFT for latency adjustment, ELST. We have to parse the MOOV to get the correct timescale value
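The JS parsing players do today amounts to scanning the appended bytes for top-level boxes such as emsg and prft. A minimal sketch of such a scanner (names are illustrative; real parsers also recurse into containers like moov to reach the timescale):

```typescript
// Minimal top-level ISO BMFF box scanner, the kind of JS parsing players
// implement themselves today to locate emsg/prft boxes. Illustrative only.
interface BoxHeader {
  type: string;  // four-character box type, e.g. "emsg"
  start: number; // byte offset of the box within the buffer
  size: number;  // total box size in bytes, including the header
}

function scanTopLevelBoxes(data: Uint8Array): BoxHeader[] {
  const boxes: BoxHeader[] = [];
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  let offset = 0;
  while (offset + 8 <= data.byteLength) {
    let size = view.getUint32(offset); // 32-bit size, big-endian
    const type = String.fromCharCode(
      data[offset + 4], data[offset + 5], data[offset + 6], data[offset + 7]);
    if (size === 1) {
      // size==1 signals a 64-bit largesize in the next 8 bytes
      if (offset + 16 > data.byteLength) break;
      size = view.getUint32(offset + 8) * 2 ** 32 + view.getUint32(offset + 12);
    }
    if (size < 8 || offset + size > data.byteLength) break; // incomplete box
    boxes.push({ type, start: offset, size });
    offset += size;
  }
  return boxes;
}
```

A register-for-box API as suggested above would let the browser hand these headers (and payloads) to the app instead.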
Jer: An arbitrary MP4 parser with WebCodecs lets you create your own player
… As long as the STB supports WebCodecs. WebCodecs provides low-level access to audio and video decoders
… Render to a canvas, preferably GL-backed.
Chris: The question of box parsing has come up before in the context of WebCodecs
… The preferable approach was thought to be JavaScript as it offers flexibility, but also JS parsing was considered performant enough to not need a browser-level API.
https://
https://
Chris: On emsg specifically, WebKit has the DataCue API. In this IG a while ago, we were looking at how we would do emsg parsing surfaced through DataCue events.
… That work kind of stalled. We didn't have enough active contributors pushing it forward.
… If people are interested, I would suggest getting together and moving that forward.
… It's a very targeted solution towards emsg events: immediately triggered or triggered at some point on the timeline.
… It wouldn't do the general box parsing that you're talking about, though.
https://
Thasso: We're very interested. Essentially, it means we end up with an MSE implementation that we do ourselves.
Thasso: A software implementation based on WebCodecs sounds like a good idea. But we're still lacking a lot of features, e.g., DRM
… A simple approach: register a listener for any box type, no need to do heavy lifting
Jer: I see a couple of problems here. An API to return an arbitrary box is difficult, especially if it's one the implementation doesn't understand
… There are use cases I'd like to address. EMSG is one of them
… The other case is 608/708 caption data, given regulatory requirements
… Those are embedded in the media stream, but not elevated to the subtitle rendering
… So we see websites doing the parsing themselves. But that might not be accomplished using a box parsing API; they're muxed in the mdat
Nigel: At TPAC we talked about potentially adding subtitles and captions to MSE, but then the question is how you know on the output side which mdats to pull out
… so you do what you need to for the player code.
<Zakim> nigel, you wanted to mention that this could be helpful for subtitle/caption decoding from MSE
Nigel: When you say register for ISO BMFF boxes, it's not any mdat you want, it's some particular mdat
Thasso: I agree. There's the general problem of not every implementation understanding the boxes, and an issue with nested boxes
… Maybe we could say that, for CMAF content, all boxes defined there are supported by spec, so I can pull them out
Daniel: Suggest following up offline
Codec information
Daniel: You have the changeType method. In dash.js we save the codec info in a variable
… It's not possible to ask for the current codec string, so you have to maintain it yourself. We suggest adding an API
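The dash.js-style workaround amounts to recording the codec string the app itself last passed to addSourceBuffer() or changeType(), since MSE offers no getter. A hypothetical minimal tracker (not dash.js's actual code, and not an API proposal):

```typescript
// Hypothetical app-side tracker for the workaround described above:
// the app records each codec string it passes to MSE, since there is
// no API to read the current codec back. Names are illustrative.
class CodecTracker {
  // Maps an app-chosen buffer id (e.g. "video", "audio") to the
  // MIME type / codec string last handed to addSourceBuffer/changeType.
  private codecs = new Map<string, string>();

  setCodec(bufferId: string, mimeCodec: string): void {
    // Call this alongside every addSourceBuffer() or changeType() call.
    this.codecs.set(bufferId, mimeCodec);
  }

  currentCodec(bufferId: string): string | undefined {
    return this.codecs.get(bufferId);
  }
}
```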
Jer: We had an idea in WebKit to pull codec information from the VideoTrack. It's relevant for MSE clients and for HLS and file-based downloads
… changeType requires passing a complete codec string. We've seen cargo-culting or magic strings being used for AAC or H.264. How do you know which codec string to use with Media Capabilities?
… It needs info out of band. An API to get the codec string as understood by the browser; there's some interest from browser vendors to do this
… We've heard from other clients that what they really want is a timeline-based set of information: start, end, properties
… It's an interesting use case. We want to solve aspects of this. Please bring it to the WG
Dynamic addition of SourceBuffers
Thasso: You could have a MediaSource session, maintaining a number of buffers in the session
… The issue is the inability to manage the number of buffers. I want to turn off audio, but I can only mute it. Once a buffer is removed it's gone; you get an error after adding it back
… Use cases for removing a buffer: turn off audio fully, or turn off video fully
… A text track I definitely want to turn off. Some players have workarounds, which are difficult to maintain. For audio, pushing silence isn't so complicated
… Modelling a black frame with H.264 is not too bad, but it becomes more complex for other codecs
… We want more dynamic behaviour when adding or removing buffers
Daniel: The IETF MoQ group is looking at low latency, hard with current implementations if you need to append dummy data
Jer: Two related efforts in Media WG: One is behaviour when you hit a gap in video data, continue playing and catch up, don't stall. Would solve some of these use cases
https://
Jer: The bigger issue: there's a solution for having multiple SourceBuffers in a MediaSource that aren't currently active
… Tracks are associated with the media element. Once a track is removed from the active source buffers list, it should have no impact on playthrough
… You shouldn't have to feed black frames through
… I don't think Chromium has that yet. But it would unblock this use case
… It exists in the spec but not in all implementations yet
Multiple Source Buffers
Thasso: A related use case. We implemented HLS interstitials and ran into a problem. Conditioning isn't perfect when timelines overlap
… We want to use MSE as our buffer, and make use of it later
… The problem is how to do this even with a virtual buffer. If timelines overlap, you need to be very precise. currentTime isn't accurate enough; it updates every 250 ms. So the workaround is to use requestAnimationFrame() and poll the time to work out when to append
… We want to get rid of the data earlier. Best case scenario: not do the switching myself, but be able to schedule it: when done with video track 1, play number 2, then go back to 1 if there are no gaps
… It's hard to deal with overlapping timelines on the client side
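The rAF-polling workaround reduces to a per-frame decision: since timeupdate fires only every ~250 ms, the player checks currentTime on each animation frame against the splice point. A hedged sketch of that decision helper (the function name and safety window are illustrative assumptions):

```typescript
// Illustrative decision helper for the rAF-polling workaround: on each
// animation frame the player reads currentTime and asks whether playback
// is within a small safety window of the splice point. The 100 ms
// default is an assumption, not a value from any spec or player.
function shouldSplice(
  currentTime: number, // seconds, from the media element
  spliceTime: number,  // seconds, where the next track should take over
  safetySec = 0.1      // lead time to cover append latency
): boolean {
  return currentTime >= spliceTime - safetySec;
}
```

In a player this would run inside a requestAnimationFrame loop, triggering the append/switch once it returns true.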
Jer: A couple of ideas. You shouldn't have to poll currentTime, you can use synthetic TextTrackCue events for example
Thasso: We've tried things like that, but the timing is not accurate on all implementations
Jer: We've heard this use case before, with HLS interstitials. A MediaSource you can detach and re-attach later, designed to solve the use case of switching to differently encoded content.
… It could be used to play interstitial content without having to reappend the original data. The only requirement is to seek back to the main timeline position when you do the switch
… The issue we heard from implementers is there may not be enough memory in low-end implementations to support multiple MediaSource instances. Multiple video buffers would have a similar problem, leading to more QuotaExceeded errors
Thasso: I think the limitation on embedded devices isn't necessarily the memory, it's how they initialise the hardware resources
Daniel: That's why we did the virtual buffer in dash.js
Jer: Detachable MediaSource: you have main content attached to the media element, and if you want to preload ad insertion, use a second MediaSource
… Use an audio element instead, to avoid the implementation instantiating an embedded codec. It's technically allowed by the spec, but needs some experimentation on STBs to see if it would work
… And it would require an implementation of detachable MediaSource
Summary
Daniel: I want to join calls more frequently, and we can file GH issues
Chris: Thank you for bringing these topics; we'll follow up, and your input is welcome.
… You mentioned potential presentations at the OSMART workshops; happy to do something like that.
Daniel: Yes, let's talk about that offline as well
[adjourned]