Media and Entertainment IG – 07 December 2021

Meeting minutes

TPAC 2021 outcomes

https://docs.google.com/presentation/d/1ZLzeFogyepd0pjAPOkDkV2a_YdFZ2mPeu_EyzLCIXqY/edit?usp=sharing <- Chris's slides

chris: TPAC 2021 had 2 meetings: first on app performance on tv devices
… should we be looking at dev technologies, canvas, etc. there was a specific idea about control of cache usage because of limited space on devices
… plus some more
… 2nd meeting about timing accuracy, latency, etc on iot devices
… please state your interest and volunteer to lead a discussion if you're interested, we can follow up on these
… discussion between this group and MiniApps, would like to see a collaboration between our groups
… nothing specific at this stage, but more to come on that front
will: who is interested in cmaf over webtransport?

chris: NHK expressed interest, had a specific question, follow up offline

Media production workshop outcomes

chris: great media production workshop at smpte, hoping to bring some of the stuff they raised to the wg
… participant survey to find out who's interested and willing to contribute, it needs active participation to move forward
… report from workshop is forthcoming. there were 25 talks or so, transcripts and presentations at link on slide 2

video.requestVideoLoadCallback proposal

https://www.w3.org/2011/webtv/wiki/images/a/a5/VideoRequestVideoLoadCallback-API-Proposal.pdf <- Yuhao's Slides

yuhao: at bytedance web team, working on 2 players
… 2 proposals to present today
… 1st proposal: Video.requestVideoLoadCallback
… easy way to get request and response info from video element, similar to video.requestVideoFrameCallback, won't apply to video from MSE or MediaStream
… use case 1: report http errors with supplemental info to user, instead of just "not supported". today there's no way to distinguish what error happened
… use case 2: user watching live stream with hls can get info from cdn response headers for troubleshooting, but video tag cannot.
… use case 3: timing for segments req/response not available with video tag as opposed to hls player, cannot troubleshoot bad user experience because info is hidden

<xfq> video.requestVideoLoadCallback() Explainer

yuhao: explainer: https://github.com/leonardoFu/video-rvlc/blob/main/explainer.md
… get callbacks when a request starts, when response happens, so timing is visible, headers are visible. cancelable
… makes response info available just like fetch api, content-length, response code

jake: cancel is about the callback, not the request, right?

yuhao: yes

will: if you want this info, why not use mse and fetch? you then have all this info

yuhao: yes, that's what we're doing. we can get all this info, it works well. but for mp4 we don't have a good stable library to play. most web devs prefer to use video tag instead of player
… i use mse players exactly for this, but many colleagues who want to play mp4 just want to use the video tag

will: if you made a library for this, that would work, right? I say this because adding apis is difficult, and it seems faster to just make a library maybe

will: to 2nd point: I work at a cdn, so I understand the troubleshooting point. you shouldn't be using ip address. apple is hiding this info, and many people will have the same ip
… the way forward is to use the standard, Common Media Client Data (CMCD), to append info such as a session id to the request; the cdn will handle it consistently and it will go into their logs

yuhao: that makes sense, yes. using the fetch api to load media streams we can put session info in the headers, but with the video tag we don't get to put stuff in headers, so we can't add the session id the same way. so there's a similar problem there.
… request callback gives access to insert headers in the request

will: query args are pretty easy and can go into the url today. yes, custom headers are harder, but query args is a reasonable hook.
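A minimal sketch of the query-argument hook will describes. The parameter name `sessionId` and the example URL are hypothetical, chosen only for illustration; CMCD defines its own key names and encoding, which this sketch does not reproduce.

```typescript
// Append a session identifier to a media URL as a query argument.
// "sessionId" is a hypothetical parameter name, not one defined by CMCD.
function withSessionId(mediaUrl: string, sessionId: string): string {
  const url = new URL(mediaUrl);
  // set() replaces any existing value and percent-encodes as needed.
  url.searchParams.set("sessionId", sessionId);
  return url.toString();
}
```

The returned string could then be assigned to the video element's src, so the identifier shows up in the cdn's request logs without needing custom headers.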

chris: video with source url case does seem to have a gap--media element makes the requests, so there's less visibility into why a failure happened
… is there a parallel with other elements elsewhere on the web platform?

yuhao: recently heard a method to get request/response info: worker can intercept video request, but I haven't tried it yet.
… i might be able to try a demo to see if that's useful

chris: interesting. anyway, for hls case you'd get this info on every request?

yuhao: yes, and we can use regex to see which requests are what. resources with manifest file vs. segments is generally easily distinguished
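The regex-based distinction yuhao mentions might look like the sketch below. The file extensions are assumptions for illustration; a real deployment would match whatever naming its packager and cdn actually use.

```typescript
type RequestKind = "manifest" | "segment" | "other";

// Assumed extensions, for illustration only.
const MANIFEST_PATTERN = /\.m3u8$/i;
const SEGMENT_PATTERN = /\.(ts|m4s|mp4|aac)$/i;

// Classify an HLS-related request by the extension of its URL path.
function classifyRequest(requestUrl: string): RequestKind {
  // Match against the path only, so query arguments do not interfere.
  const path = new URL(requestUrl).pathname;
  if (MANIFEST_PATTERN.test(path)) return "manifest";
  if (SEGMENT_PATTERN.test(path)) return "segment";
  return "other";
}
```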

chris: clarification: so the callback would return the manifest as well? or just metadata about the response success/failure?

yuhao: data would be read-only at most to prevent plugins from editing manifest. we just need metadata for what we're doing.

chris: joey, has this come up in your discussions about building media players?

joey: all we're building is ??-based. we haven't had requests to get response metadata from the hls playbacks, not sure if serviceworker would work
… possible to bypass all that with a player and mse
… you don't have an option on ios/iphones, so i can see how this would be useful
… seems preferable to just have mediasource everywhere, but we don't.

chris: yes, for file playback where not using hls/dash this makes sense. have you considered raising a bug or github issue against the html spec?

chris: that seems the most likely place to raise this

joey: yes, you could argue if there's something similar for images or other parts of the web, maybe this applies to something lower-level applicable to other parts of the web
… maybe serviceworker is the place for that. could potentially raise bugs with the browser vendors. if i were doing this i'd experiment with service workers to find out where there's gaps

chris: yes, makes sense. big question is whether service worker can do this

joey: yes, and next big question is whether hls on ios can do this too

will: original question is: given mse is too complex to get this out, maybe serviceworker doesn't solve this?

joey: yes, but maybe we can do that with a library instead of extending web
… 2nd thing was cmcd injection into these requests. if the serviceworker can do that too, then maybe we don't need to add an api to the browser

yuhao: like will said, most web developers use the video element to load the resource directly, not just for us but everywhere. serviceworker might be a good place to solve this, makes it easier to understand for web devs and easier to use
… when i just want to play back video, if it's not a standard library it's hard to know what to use
… adds complexity to the website. have to worry about page size and such
… but we can make it best practice for our devs if this works.

chris: ok, some good feedback, maybe we should capture this in meig repo to track results of this investigation. once we know what serviceworker can do we can use that as guidance on next steps

<tidoust> [I guess one difficulty with Service Worker is that you won't have access to the DOM, so no easy way to associate the request with a specific video element. This may not be fully needed though]

chris: track with a ticket in the repo
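As a starting point for that investigation, here is a sketch of the read-only metadata a service worker could collect if it does see the media element's requests. Whether it sees them in practice, and on which platforms, is exactly the open question above. `ResponseLike`, `ResponseMetadata` and `describeResponse` are hypothetical names; `ResponseLike` covers only the parts of the fetch `Response` that the sketch reads.

```typescript
// The subset of a fetch Response that this sketch reads.
interface ResponseLike {
  status: number;
  ok: boolean;
  headers: { get(name: string): string | null };
}

interface ResponseMetadata {
  url: string;
  status: number;
  ok: boolean;
  contentLength: number | null;
  timeToResponseMs: number;
}

// Collect read-only metadata about a response without touching its body.
function describeResponse(
  requestUrl: string,
  response: ResponseLike,
  startedAtMs: number,
  respondedAtMs: number,
): ResponseMetadata {
  const length = response.headers.get("content-length");
  return {
    url: requestUrl,
    status: response.status,
    ok: response.ok,
    contentLength: length === null ? null : Number(length),
    // Time until the response (headers) arrived, not until the body finished.
    timeToResponseMs: respondedAtMs - startedAtMs,
  };
}

// Possible wiring inside a service worker script (an untested assumption):
// self.addEventListener("fetch", (event) => {
//   if (event.request.destination !== "video") return;
//   const startedAtMs = performance.now();
//   event.respondWith(fetch(event.request).then((response) => {
//     console.log(describeResponse(event.request.url, response, startedAtMs, performance.now()));
//     return response;
//   }));
// });
```

As tidoust notes, nothing here ties a request back to a specific video element; the worker only sees the URL.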

Video SEI events proposal

https://www.w3.org/2011/webtv/wiki/images/c/cf/Video-SEI-Event-Proposal.pdf <- Yuhao's Slides

yuhao: 2nd proposal: getting sei event from video element when frame is parsed in the video

<xfq> video SEI event Explainer

yuhao: sei event contains info about the video: timestamp to synchronize with video frame
… available with mse or mediastream. conversation with Fuqiao said they're considering taking out sei info from webcodecs

<xfq> SEI w/ webcodecs discussions

yuhao: the question put to me was whether we could skip decoding and just use the demuxer?
… video annotations ("bullets") for text & emoji, subtitles. useful to avoid covering people with the bullets
… ai stuff can place the text overlay to avoid people by body shape, including with subtitles

<xfq> Masking in Bullet Chatting

yuhao: the reason this works is with sei. that gets put into video stream
… so we need the same mechanism here
… 2nd use case is quiz overlays on the video, part of interactive live events. question panel can pop up where users can click
… needs to sync with time-limited question asked on the video live stream. sei provides the sync info here
… 3rd use is to capture the time delay.
… allows measurements about user experience and performance
… proposal adds sei event for video element
… gives the payload data for the sei info, and the timestamp (which frame it's in)
… use the timestamp to start a timeout for various overlays
… timestamp tells when to render the sei info
… needed for synchronization. for now we use media source, we can parse the sei stream
… but we still need the timestamp there to map from video timestamps to the current system time
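The timestamp-to-timeout step described above can be sketched as follows, assuming the SEI timestamp is expressed on the same timeline as the video element's current time, in seconds. `overlayDelayMs` is a hypothetical helper, not part of the proposal.

```typescript
// Wall-clock delay, in milliseconds, until media time `seiTimeSec` is reached,
// given the element's current time and playback rate.
// Returns null when playback is not advancing forward, so no timeout can be computed.
function overlayDelayMs(
  seiTimeSec: number,
  currentTimeSec: number,
  playbackRate: number = 1,
): number | null {
  if (playbackRate <= 0) return null;
  const remainingSec = (seiTimeSec - currentTimeSec) / playbackRate;
  // An SEI message whose time has already passed should be rendered immediately.
  return Math.max(0, remainingSec * 1000);
}
```

An app might pass the result to setTimeout using the element's currentTime and playbackRate at the moment the SEI is received. Timer-based scheduling like this is not frame accurate, so it fits the quiz overlay case better than body masking.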

<tidoust> [In the body masking scenario, frame accuracy seems needed. The main way to achieve that today is through WebCodecs, so even though the app does not need to access decoded frames in theory, in practice it probably does to control the time at which the frame and the masked overlay get rendered]

yuhao: so we can synchronize properly. helps user experience. when using webcodecs, this would help to avoid having to remux
… so sei info is about how to synchronize the supplemental info with the video element

chris: this fits with another area of work in the interest group around looking at dash emsg boxes
… sei is in the video bitstream, not in the media container, right?

yuhao: yes.
… in china we're still using rtmp because flash was popular before and cdn still has rtmp
… for web live streams it's easy to move from rtmp to flv streams.

chris: in the other work, we were exposing this in text tracks. those can have metadata and the application can look at events that are scheduled
… the difference here is you have an event list triggered when the frame is shown? or when it's parsed? it seems like this has a timing requirement
… so this needs to be linked to the frame so that overlays can avoid collision? or is the timing not as strict so it can have some delay?

yuhao: renderer itself can cause some time delay. that's why i'm looking to get info about when it's parsed, so we know when to render the effect
… so parsed not rendered. first version of the api asked for it when rendered. that's also good

chris: so the data extracted includes timestamp, so you can know when to apply the rendering

yuhao: yes, timestamp is based on video's current time, not video file's time.
… synchronizing across multiple broadcasts also. timestamps need to change for merging these kinds of events
… so it might not increase smoothly and might come back to 0
… complex to deal with sei timestamp, as opposed to current time

will: i support this. we went down this path for dash some. much more efficient if the browser, which is already parsing internally, just informs the app. this is possible already in javascript by parsing it

Joey: I need to run to another meeting, but I would love to have SEI metadata somehow. Could be in a callback, could be in a texttrack. I agree with Will. I would not want to do this in a library.

<tidoust> [Other possibilities to expose SEI metadata would be to expose in requestVideoFrameCallback() and/or WebCodecs]

will: but much more efficient to return this from the parser that's already rendering and parsing the video
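To illustrate the "parse it in javascript" route that will contrasts with a browser-provided event, here is a rough sketch that locates SEI NAL units in an H.264 stream. It assumes Annex B framing (start codes); FLV and MP4 carry length-prefixed NAL units and would need a different splitting step. It does not remove emulation prevention bytes or decode the SEI payload. `findSeiNalUnits` is a hypothetical helper name.

```typescript
// H.264 nal_unit_type value for SEI.
const H264_NAL_TYPE_SEI = 6;

// Return the SEI NAL units (header byte included) found in an Annex B byte stream.
function findSeiNalUnits(data: Uint8Array): Uint8Array[] {
  // Record the offset of the first byte after each 00 00 01 start code.
  const starts: number[] = [];
  for (let i = 0; i + 2 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      starts.push(i + 3);
      i += 2; // skip the rest of the start code
    }
  }

  const seiUnits: Uint8Array[] = [];
  for (let n = 0; n < starts.length; n++) {
    const begin = starts[n];
    // A NAL unit ends where the next start code begins, or at the end of the data.
    let end = n + 1 < starts.length ? starts[n + 1] - 3 : data.length;
    // Drop trailing zero bytes, including the leading 00 of a 4-byte start code.
    while (end > begin && data[end - 1] === 0) end--;
    // The low 5 bits of the first byte hold nal_unit_type.
    if (end > begin && (data[begin] & 0x1f) === H264_NAL_TYPE_SEI) {
      seiUnits.push(data.subarray(begin, end));
    }
  }
  return seiUnits;
}
```

Doing this in the page means demuxing and scanning every sample a second time, which is the inefficiency will and joey point to.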

chris: we'll invite to discuss this use case in media timed events. will follow up separately, this should be included in that work.

cpn: we'll be in touch
… we have GitHub issues, etc.
… will share the information with you
… any other comments?

(none)

cpn: thanks a lot for participating
… good use cases, well presented and described
… we're adjourned

[adjourned]
