Meeting minutes
TPAC 2021 outcomes
https://
chris: TPAC 2021 had 2
meetings: first on app performance on tv devices
… should we be looking at dev technologies, canvas,
etc. there was a specific idea about control of cache usage because
of limited space on devices
… plus some more
… 2nd meeting about timing accuracy, latency, etc
on iot devices
… please state your interest and volunteer to lead
a discussion if you're interested, we can follow up on
these
… discussion between this group and MiniApps, would
like to see a collaboration between our groups
… nothing specific at this stage, but more to come
on that front
will: who is interested in cmaf over
webtransport?
chris: NHK expressed interest, had a specific question, follow up offline
Media production workshop outcomes
chris: great media
production workshop at smpte, hoping to bring some of the stuff
they raised to the wg
… participant survey to find out who's interested
and willing to contribute, it needs active participation to move
forward
… report from workshop is forthcoming. there were
25 talks or so, transcripts and presentations at link on slide
2
video.requestVideoLoadCallback proposal
https://
yuhao: at bytedance web
team, working on 2 players
… 2 proposals to present today
… 1st proposal:
video.requestVideoLoadCallback
… easy way to get request and response info from
video element, similar to video.requestVideoFrameCallback; wouldn't
cover video from MSE or MediaStream
… use case 1: report http errors with supplemental
info to user, instead of just "not supported". today there's no way
to distinguish what error happened
… use case 2: a user watching a live stream with an hls
player can get info from cdn response headers for troubleshooting, but
the video tag cannot.
… use case 3: segment request/response timing isn't
available with the video tag, as opposed to an hls player; can't
troubleshoot bad user experience because the info is hidden
<xfq> video.requestVideoLoadCallback() Explainer
yuhao: explainer:
https://
… get callbacks when a request starts, when
response happens, so timing is visible, headers are visible.
cancelable
… makes response info available just like fetch
api, content-length, response code
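a minimal sketch of how an app might consume the proposed metadata; the handler and field names (url, status) are assumptions based on the discussion, not the explainer's actual shape:

```javascript
// Hypothetical handler for the metadata the proposed callback would
// deliver. The field names (url, status) are assumptions; the real
// shape is defined in the explainer.
function describeLoadError(meta) {
  // use case 1: surface the real http error instead of "not supported"
  if (meta.status >= 400) {
    return `HTTP ${meta.status} while loading ${meta.url}`;
  }
  return null; // no error to report
}

// With the proposed API this might be wired up as (hypothetical):
// video.requestVideoLoadCallback((meta) => {
//   const err = describeLoadError(meta);
//   if (err) showErrorToUser(err);
// });
```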
jake: cancel is about the callback, not the request, right?
yuhao: yes
will: if you want this info, why not use mse and fetch? you then have all this info
yuhao: yes, that's what
we're doing. we can get all this info, it works well. but for mp4
we don't have a good stable library to play. most web devs prefer
to use video tag instead of player
… i use mse players exactly for this, but many
colleagues who want to play mp4 just want to use the video
tag
will: if you made a library for this, that would work, right? I say this because adding apis is difficult, and it might be faster to just make a library
will: to the 2nd point: I
work at a cdn, so I understand the troubleshooting point. you
shouldn't be using the ip address; apple is hiding this info, and many
people will share the same ip
… the way forward is to use common media client
data (cmcd) to append info to the request, such as a session id; that
will go into the cdn's logs and be handled consistently by the
cdn
yuhao: that makes
sense, yes. fetch api to load media streams gets session info in
the headers, but video tag we don't get to put stuff in headers so
we can't add the session id the same way, so there's a similar
problem there.
… request callback gives access to insert headers
in the request
will: query args are pretty easy and can go into the url today. yes, custom headers are harder, but query args is a reasonable hook.
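a rough sketch of the query-arg approach Will describes; note the real cmcd spec (CTA-5004) defines its own key names and packs them into a single 'CMCD' query parameter, so the bare 'sid' arg here is a simplification:

```javascript
// Append a session id to a media URL as a query arg, so it shows up
// in CDN logs. Simplified: real CMCD packs keys like sid into one
// URL-encoded 'CMCD' query parameter.
function withSessionId(mediaUrl, sessionId) {
  const u = new URL(mediaUrl);
  u.searchParams.set('sid', sessionId);
  return u.toString();
}
```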
chris: video with
source url case does seem to have a gap: the media element makes
requests itself, and there's less visibility into why a
failure happened
… is there a parallel with other elements elsewhere
on the web platform?
yuhao: recently heard a
method to get request/response info: worker can intercept video
request, but I haven't tried it yet.
… i might be able to try a demo to see if that's
useful
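such a demo might look like the following; the fetch handler only runs inside a registered service worker, so it is shown commented out, while the metadata helper can be exercised anywhere (all names are illustrative, not from the explainer):

```javascript
// Extracts the response metadata a page would want to log; inside a
// service worker fetch handler you would call this on the network
// response before returning it to the video element.
function responseMetadata(resp) {
  return {
    status: resp.status,
    contentType: resp.headers.get('content-type'),
    contentLength: resp.headers.get('content-length'),
  };
}

// In the service worker itself (only runs in a worker context):
// self.addEventListener('fetch', (event) => {
//   if (event.request.destination === 'video') {
//     event.respondWith(fetch(event.request).then((resp) => {
//       relayToPage(responseMetadata(resp)); // hypothetical relay
//       return resp;
//     }));
//   }
// });
```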
chris: interesting. anyway, for hls case you'd get this info on every request?
yuhao: yes, and we can use regex to see which requests are what. resources with manifest file vs. segments is generally easily distinguished
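the regex classification mentioned here might be sketched like this (the extensions covered are an assumption; a real deployment would match its own url patterns):

```javascript
// Classify a media request URL as manifest vs. segment by extension.
function classifyMediaRequest(url) {
  const path = new URL(url).pathname;
  if (/\.(m3u8|mpd)$/.test(path)) return 'manifest'; // HLS / DASH manifests
  if (/\.(ts|m4s|mp4|cmfv|aac)$/.test(path)) return 'segment'; // media segments
  return 'other';
}
```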
chris: clarification: so the callback would return the manifest as well? or just metadata about the response success/failure?
yuhao: data would be read-only at most to prevent plugins from editing manifest. we just need metadata for what we're doing.
chris: joey, has this come up in your discussions about building media players?
joey: all we're
building is ??-based. we haven't had requests to get response
metadata from the hls playbacks, not sure if serviceworker would
work
… possible to bypass all that with a player and
mse
… you don't have an option on ios/iphones, so i can
see how this would be useful
… seems preferable to just have mediasource
everywhere, but we don't.
chris: yes, for file playback where not using hls/dash this makes sense. have you considered raising a bug or github issue against the html spec?
chris: that seems the most likely place to raise this
joey: yes, you could
argue if there's something similar for images or other parts of the
web, maybe this applies to something lower-level applicable to
other parts of the web
… maybe serviceworker is the place for that. could
potentially raise bugs with the browser vendors. if i were doing
this i'd experiment with service workers to find out where there's
gaps
chris: yes, makes sense. big question is whether service worker can do this
joey: yes, and next big question is whether hls on ios can do this too
will: original question was: given mse is too complex to use just to get this info out, maybe serviceworker doesn't solve this either?
joey: yes, but maybe we
can do that with a library instead of extending web
… 2nd thing was cmcd injection into these requests.
if the serviceworker can do that too, then maybe we don't need to
add an api to the browser
yuhao: like will said,
most web developers use the video element to load the resource
directly, not just for us but everywhere. serviceworker might be a
good place to solve this, makes it easier to understand for web
devs and easier to use
… when i just want to play back video, if it's not
a standard library it's hard to know what to use
… adds complexity to the website. have to worry
about page size and such
… but we can make it best practice for our devs if
this works.
chris: ok, some good feedback, maybe we should capture this in meig repo to track results of this investigation. once we know what serviceworker can do we can use that as guidance on next steps
<tidoust> [I guess one difficulty with Service Worker is that you won't have access to the DOM, so no easy way to associate the request with a specific video element. This may not be fully needed though]
chris: track with a ticket in the repo
Video SEI events proposal
https://
yuhao: 2nd proposal: getting sei event from video element when frame is parsed in the video
<xfq> video SEI event Explainer
yuhao: sei event
contains info about the video: timestamp to synchronize with video
frame
… available with mse or mediastream. a conversation
with Fuqiao said they're considering extracting sei info from
webcodecs
<xfq> SEI w/ webcodecs discussions
yuhao: the open question
for me is whether we even need to decode, or can just use the
demuxer to extract the sei
… video annotations ("bullets") for text &
emoji, subtitles. useful to avoid covering people with the
bullets
… ai can detect body shapes and place the text
overlay to avoid covering people, including subtitles
<xfq> Masking in Bullet Chatting
yuhao: the reason this
works is with sei. that gets put into video stream
… so we need the same mechanism here
… 2nd use case is quiz overlays on the video, part
of interactive live events. question panel can pop up where users
can click
… needs to sync with time-limited question asked on
the video live stream. sei provides the sync info here
… 3rd use is to capture the time
delay.
… allows measurements about user experience and
performance
… proposal adds sei event for video
element
… gives the payload data for the sei info, and the
timestamp (which frame it's in)
… use the timestamp to start a timeout for various
overlays
… timestamp tells when to render the sei
info
… needed for synchronization. for now we use media
source, we can parse the sei stream
… but we still need the timestamp there to map from
video timestamps to the current system time
<tidoust> [In the body masking scenario, frame accuracy seems needed. The main way to achieve that today is through WebCodecs, so even though the app does not need to access decoded frames in theory, in practice it probably does to control the time at which the frame and the masked overlay get rendered]
yuhao: so we can
synchronize properly. helps user experience. when using webcodecs,
this would help to avoid having to remux
… so sei info is about how to synchronize the
supplemental info with the video element
chris: this fits with
another area of work in the interest group around looking at dash
emsg boxes
… sei is in the video bitstream, not in the media
container, right?
yuhao: yes.
… in china we're still using rtmp because flash was
popular before and cdn still has rtmp
… for web live streams it's easy to move from rtmp
to flv streams.
chris: in the other
work, we were exposing this in text tracks. those can have metadata
and the application can look at events that are scheduled
… the difference here is you have an event list
triggered when the frame is shown? or when it's parsed? it seems
like this has a timing requirement
… so this needs to be linked to the frame so that
overlays can avoid collision? or is the timing not as strict so it
can have some delay?
yuhao: renderer itself
can cause some time delay. that's why i'm looking to get info about
when it's parsed, so we know when to render the effect
… so parsed, not rendered. the first version of the
api asked for it when rendered. that's also good
chris: so the data extracted includes timestamp, so you can know when to apply the rendering
yuhao: yes, timestamp
is based on video's current time, not video file's time.
… synchronizing across multiple broadcasts also.
timestamps need to change for merging these kinds of
events
… so it might not increase smoothly and might come
back to 0
… complex to deal with sei timestamp, as opposed to
current time
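a sketch of the mapping described here, from an sei media timestamp to a wall-clock render time via one observed anchor pair (names and units are assumptions: media time in seconds, wall time in milliseconds):

```javascript
// Given one observed pairing of media time to wall-clock time (e.g.
// sampled when a frame was presented), compute when an SEI cue with a
// given media timestamp should be rendered. Does not handle the
// wrap-back-to-0 case the minutes mention; a real player would
// re-anchor when the timestamp resets.
function seiRenderTime(seiMediaTime, anchorMediaTime, anchorWallTime) {
  return anchorWallTime + (seiMediaTime - anchorMediaTime) * 1000;
}
```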
will: i support this. we went down a similar path for dash. it's much more efficient if the browser's internal parser just informs the app. this is possible already in javascript by parsing it yourself
Joey: I need to run to another meeting, but I would love to have SEI metadata somehow. Could be in a callback, could be in a texttrack. I agree with Will. I would not want to do this in a library.
<tidoust> [Other possibilities to expose SEI metadata would be to expose in requestVideoFrameCallback() and/or WebCodecs]
will: but much more efficient to return this from the parser that's already rendering and parsing the video
chris: we'll invite you to discuss this use case in the media timed events work. will follow up separately; this should be included in that work.
cpn: we'll be in
touch
… we have GitHub issues, etc.
… will share the information with you
… any other comments?
(none)
cpn: thanks a lot for
participating
… good use cases, well presented and
described
… we're adjourned
[adjourned]