Media & Entertainment IG monthly call

02 Feb 2018



Kazuhiro_Hoya, Geun-Hyung_Kim, Steve_Morris, Giri_Mandayam, Mohammed_Dadas, Dave_Evans, Will_Law, Tatsuya_Igarashi, Francois_Daoust, Paul_Jessop, Kazuyuki_Ashimura, Chris_O'Brien, Colin_Meerveld, George_Sarosi, Chris_Needham, Louay_Bassbouss
Chris, Kaz


<cpn> Scribe: Chris

<cpn> Scribenick: cpn

DASH Eventing and HTML5

<kaz> https://www.w3.org/2011/webtv/wiki/images/a/a5/DASH_Eventing_and_HTML5.pdf Giri's slides (Member-only)

<kaz> [Introduction]

Giri: This is a brief intro to ongoing work in MPEG, and what we've done in ATSC
... There are 2 types of events we deal with in DASH
... DASH is adaptive streaming over HTTP, designed to leverage HTTP for streaming media, live or on-demand
... Media Source Extensions and Encrypted Media Extensions, as well as the audio and video media tags deal with this
... Interactivity events, absolute or relative time
... DASH defines two ways to deliver events: in the MPD, the manifest XML file that describes the segments in the streaming service
... Then there are in-band events, in an emsg box in the ISO BMFF media track
... ISO BMFF is a packaging format defined by MPEG, the most popular format of DASH packaging.
... There are other forms, WebM being popular also
... Issue with synchronisation, media playback should be handled by the native media player
... There are two things needing synchronisation: the media player and the web page in the browser context
... emsg eventing is a more dire situation, not supported by browsers
... in the byte stream registry, there's no requirement for a browser implementation
... only custom browsers deal with emsg data, not mainstream browsers
... this was problematic in designing ATSC
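The emsg carriage Giri mentions can be made concrete with a small parser. Below is a minimal, illustrative TypeScript sketch of reading a version-0 `emsg` box as laid out in ISO/IEC 23009-1; the field order follows the spec, but the function and interface names are invented for this example and are not from any real library.

```typescript
// Illustrative sketch: parse a version-0 DASH `emsg` box from an ISO BMFF
// byte buffer. Field layout per ISO/IEC 23009-1; names here are assumptions.

interface EmsgEvent {
  schemeIdUri: string;
  value: string;
  timescale: number;
  presentationTimeDelta: number;
  eventDuration: number;
  id: number;
  messageData: Uint8Array;
}

// Read a NUL-terminated ASCII string; return the string and the next offset.
function readNullTerminated(view: DataView, offset: number): [string, number] {
  let s = "";
  while (view.getUint8(offset) !== 0) {
    s += String.fromCharCode(view.getUint8(offset));
    offset++;
  }
  return [s, offset + 1]; // skip the terminating NUL
}

function parseEmsgV0(buf: Uint8Array): EmsgEvent {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const size = view.getUint32(0);
  const type = String.fromCharCode(buf[4], buf[5], buf[6], buf[7]);
  if (type !== "emsg") throw new Error("not an emsg box");
  if (view.getUint8(8) !== 0) throw new Error("only version 0 handled in this sketch");
  let offset = 12; // past size, type, version, and 24-bit flags
  let schemeIdUri: string;
  let value: string;
  [schemeIdUri, offset] = readNullTerminated(view, offset);
  [value, offset] = readNullTerminated(view, offset);
  const timescale = view.getUint32(offset);
  const presentationTimeDelta = view.getUint32(offset + 4);
  const eventDuration = view.getUint32(offset + 8);
  const id = view.getUint32(offset + 12);
  const messageData = buf.slice(offset + 16, size);
  return { schemeIdUri, value, timescale, presentationTimeDelta, eventDuration, id, messageData };
}
```

In practice a player such as dash.js performs this kind of extraction in JavaScript precisely because, as noted above, browsers don't surface emsg themselves.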

<kaz> [How does HTML5 handle DASH events today?]

Giri: this is just my opinion, not authoritative
... HTML has TextTrackCue with an identifier, text string, start and end time, and a payload
... There can be metadata cues
... If you have a DASH event on the transmitter side, it could be transcoded from an in-band event into a text track cue, and presented in the text track
... Here's an example from the WebVTT spec
... There's a separation between cues to be handled by the player and those to be handled by the application
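The transcoding idea above can be pictured as a WebVTT metadata cue: a hypothetical translation of an in-band event into a text track cue whose payload is a JSON string. The identifier, scheme URI, and fields below are invented for illustration, not taken from the slides.

```
WEBVTT

ad-break-1
00:00:10.000 --> 00:00:10.500
{"schemeIdUri":"urn:example:ad-break","action":"start"}
```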

<kaz> [HTML5 Handling of Text Track Cues]

Giri: In HTML5, the video track allows for track specific event handlers, oncuechange event
... There was a proposal for DataCues with binary payloads
... Browser vendor support is non-existent AFAICT
... There's a 4 year old discussion on the Chromium mailing list
... HbbTV has also identified problems with short duration cues, where cues may expire before the UA could handle them
... There's a specific problem in ATSC where we try to minimise channel acquisition
... i.e., start playback as quickly as possible on channel change
... There's a danger of missing mid-stream cues if delays are introduced
... If the user has just acquired a channel, cues may be missed
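The timing problem above can be illustrated with a toy model (the names below are invented, not from any spec): an application that polls for active cues at a fixed interval can miss any cue whose duration is shorter than the polling period.

```typescript
// Illustrative sketch of why short-duration cues can be missed when an
// application polls for active cues instead of receiving a guaranteed
// event per cue.

interface Cue {
  start: number; // seconds
  end: number;   // seconds
}

// True if at least one poll tick falls inside the cue's active interval.
function cueObserved(cue: Cue, pollTimes: number[]): boolean {
  return pollTimes.some(t => t >= cue.start && t < cue.end);
}

// Poll tick timestamps: every `intervalMs` starting at t0 (seconds).
function pollTicks(t0: number, intervalMs: number, count: number): number[] {
  return Array.from({ length: count }, (_, i) => t0 + (i * intervalMs) / 1000);
}
```

Here a 150 ms cue starting at 10.05 s falls entirely between two 250 ms poll ticks and is never observed, which is the class of problem HbbTV and ATSC reported.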

<kaz> [ATSC 3.0 Approach]

Giri: ATSC 3.0 defined two playback models: the application media player (AMP) and the receiver media player (RMP)
... AMP is a standard HTML/JS app, such as dash.js
... This is suitable for certain kinds of devices, without an integrated receiver, taking advantage of a standard browser context
... Then there's the RMP. This is colocated with the AMP, and rendering takes place in conjunction with the receiver.
... Control of the RMP is done over WebSockets

<kaz> [ATSC 3.0 Event Handling]

Giri: As far as event handling is concerned, the AMP runs in the browser context, although emsg isn't supported in most browsers
... This is a problem for the AMP. As the RMP is integrated with the receiver, there's room for customisation
... The RMP can convey event data to the app over WebSockets
... Both methods have latency in event handling
... We don't see perfect solutions here in ATSC

<kaz> [Event Retrieval in ATSC 3.0]

Giri: This diagram is from ATSC. It's not synchronous. We discussed having event subscription
... We believe this is HTML5 compatible, even though we're not using the HTML video tag

<kaz> [Emerging Approach]

Giri: To address some of these issues, MPEG has started work on carriage of web resources via ISO BMFF
... It's a joint proposal from Cyril Concolato at Netflix and Thomas Stockhammer
... It allows for direct rendering, so not dependent on the application. This could take care of some of the perf issues I mentioned
... We can't force a broadcaster to write an app per service; this can be done by the content author instead
... It's work in progress

<kaz> [Conclusion]

Giri: If the receiver has an integrated runtime media player, it's possible to deal with events directly
... MPEG is considering approaches
... That completes my overview

Igarashi: Thank you Giri for the presentation
... You mentioned discussion with browser vendors, what is the issue there, why don't they support event cues?

Giri: It's the emsg that isn't supported. We're considering it for broadcast media, and I guess they are thinking more about online media
... emsg was also controversial in MPEG, not too many proponents
... not popular from a content authors point of view

Will: emsg is gaining prominence through its adoption in CMAF
... We have a strong preference for a usable emsg implementation in browsers
... The SourceBuffer is the appropriate place to extract the data
... We've started a discussion with Google, Microsoft, and Apple on this

Giri: I fully expect CTA WAVE to be involved in this. It would be great if we can get a report from them on preferred approaches

Igarashi: It's good news that CTA WAVE is considering how to handle emsg in HTML5
... Does the HTML cue API need changes to support emsg, or is it just an implementation issue?

Will: emsg can hold binary payloads and TextTrack cues are text, so you'd need to encode, e.g. with base64, so we need a way to expose arbitrary binary payloads
... Is there broader interest from the M&E IG in emsg events, and what's the preferred method to deliver events to the JS layer?
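The encoding step Will describes can be sketched as follows. This is an illustrative workaround, not a standard API: the binary emsg payload is base64-encoded so it can travel as the text of a metadata cue. A hand-rolled encoder is used to keep the sketch self-contained; in a browser you would more likely use `btoa`, and decoding back to bytes is the symmetric operation.

```typescript
// Sketch of the base64 workaround: binary event payloads must become text
// before they can ride in a TextTrack cue. Function name is an assumption.

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function payloadToCueText(payload: Uint8Array): string {
  let out = "";
  for (let i = 0; i < payload.length; i += 3) {
    const b0 = payload[i];
    const b1 = i + 1 < payload.length ? payload[i + 1] : 0;
    const b2 = i + 2 < payload.length ? payload[i + 2] : 0;
    // Pack three bytes into four 6-bit symbols, padding with '=' at the tail.
    out += B64[b0 >> 2] + B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += i + 1 < payload.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : "=";
    out += i + 2 < payload.length ? B64[b2 & 63] : "=";
  }
  return out;
}
```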

Giri: We don't really have a way to handle typed data with TextTrack cues
... With broadcast media, we worry about an explosion of track data,
... e.g., the init segment has to be repeated frequently so that players can start playing quickly

Will: Mark Vickers, who's in CTA WAVE, was involved in the DataCue work. Can DataCue be revitalised?

Francois: You mentioned synchronisation needs with event handling. Right now in HTML5, the timeline for media playback isn't necessarily the one that the app sees
... What are the synchronisation requirements there?
... What kinds of cues are used in practice? What are some good examples needing precise sync?

Giri: In smart TVs, we're doing more app logic for personalisation, e.g., ad campaigns. We want to customise to the device owner, the consumer.
... This means that client logic is needed, and ad media needs to be available and ready when the cue appears
... If there's uncertainty about how the UA surfaces event data, and as the time references aren't perfectly aligned, there may be issues with the actual insertion
... This was also a problem in TextTrack cues with several hundred milliseconds latency, you could miss an ad-insertion cue and get dead air. This is something TV manufacturers and broadcasters want to avoid.

Francois: I have another question about binary data. TextTrack cues don't support this, and DataCues aren't implemented. What is binary data used for?

Giri: It's for any other data that needs to be time aligned, that's typed, e.g., JPEG images, or simple audio files that are related to the media being played
... Anything where you don't want to deal with the round trip time of requesting the resource, so you want it in-band.

Igarashi: MPEG-DASH defines emsg as a generic format for arbitrary events. An application with a specific use may also choose to use emsg.
... In terms of frame-accurate eventing, as Francois said I don't see any specific requirement. Ad insertion won't be achieved at the app level, it's more at the system or rendering level.
... Some broadcasters maybe want to synchronise web content with the media, e.g., information about a soccer player during a game.
... I see these as rarer applications. Accuracy to only about 100 ms is needed, not frame accuracy, for broadcast systems.

Giri: The W3C specs don't guarantee 100 ms accuracy, something that HbbTV complained about.
... There are other issues than UA latency that result in missing cues. Hence the MPEG work, which should take some of the uncertainty out of processing the events.
... Frame accuracy isn't critical, but 500 ms isn't good either.

Igarashi: I think 300 ms is enough in most cases.

Giri: In my time at ATSC, I haven't seen an accurate end-to-end timeline specified, from the time the cue is introduced in the transmission infrastructure to when the client must complete its logic.
... That could be good for this group to do, no-one else is looking at this from an HTML5 point of view.

Kaz: Would it make sense to invite CTA WAVE to give an update?

<kaz> scribenick: kaz

Chris: I have discussed that with Mark
... He said he'd prefer to wait until after NAB in April, so maybe for our monthly call in May?

Kaz: Thanks for your clarification

Chris: What should the next steps be in this interest group?

<scribe> scribenick: cpn

Will: The IG brings lots of real world use cases
... If we can specify emsg event handling and timing requirements, in addition to what's coming from CTA, that would be valuable

Igarashi: I agree, and also how emsg events are used in services
... We should discuss how emsg can be used for broadcast systems, other requirements

<kaz> scribenick: kaz

Chris: We have an unfortunate schedule overlap with TTWG, who also meet on Thursday afternoons
... This topic is clearly in their area of interest, so I want to discuss together with them.
... I know that TTWG have a general issue regarding TTML browser implementations, and a proposed
... solution is to pass more responsibility to the app layer, with an extended TextTrack API.
... I'd like to move the time of this call to avoid the schedule overlap, so that we can share
... information with the TTWG guys. But I'm not sure when to move to at the moment.
... It could be moved to a Tuesday or Wednesday at a similar time.
... I will try to identify a better slot based on people's availability.
... Also, we can gather the kinds of use cases and requirements around synchronization and timing.
... We could start a comparison on the wiki, etc.
... Maybe everything is covered by the CTA's work, but would like to see input from the wider Web community
... For example, during the breakout session at TPAC, there was mention of requirements for synchronising
... web content with audiobooks. This is another group we may contact to see if we cover all their requirements.

<tidoust> Synchronized Multimedia for Publications CG

Chris: I can take an action item to do that.

Kaz: Maybe we can start some work on gathering use cases and requirements on the wiki or GitHub?

Chris: This would be useful, also with input from TTWG.
... But, it would be good to have an initial proposal for people to respond to.
... Also use cases coming from the media industry, as Igarashi mentioned.
... Unless any other points for today, we can adjourn the call.
... Thanks, Giri!
... And thank you to all for attending.
... As a reminder, it would be good to hear from you about topics for future calls. Please get in touch.


Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.147 (CVS log)
$Date: 2018/02/02 10:21:34 $