<cpn> Scribe: Chris
<cpn> Scribenick: cpn
<kaz> https://www.w3.org/2011/webtv/wiki/images/a/a5/DASH_Eventing_and_HTML5.pdf Giri's slides (Member-only)
<kaz> [Introduction]
Giri: This is a brief intro to
ongoing work in MPEG, and what we've done in ATSC
... There are 2 types of events we deal with in DASH
... DASH is adaptive streaming over HTTP, designed to leverage HTTP
for streaming media, live or on-demand
... Media Source Extensions and Encrypted Media Extensions, as well
as the audio and video media tags deal with this
... Interactivity events use absolute or relative time
... DASH defines two ways to deliver events: in the MPD manifest
XML file, it describes the segments in the streaming service
... Then there are in-band events, in an emsg box in the ISO BMFF
media track
... ISO BMFF is a packaging format defined by MPEG, the most
popular format of DASH packaging.
... There are other forms, WebM being popular also
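[A manifest-carried event appears as an EventStream inside a Period of the MPD. An illustrative fragment, with the scheme URI and payload invented for this example:]

```xml
<!-- Illustrative MPD fragment only: the scheme URI and payload are
     invented; see ISO/IEC 23009-1 for the normative definition. -->
<Period>
  <EventStream schemeIdUri="urn:example:ad-insertion" timescale="1000">
    <!-- presentationTime and duration are in timescale units (here: ms) -->
    <Event presentationTime="5000" duration="2000" id="7">opaque payload</Event>
  </EventStream>
</Period>
```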
... Issue with synchronisation, media playback should be handled by
the native media player
... There are two things needing synchronisation: the media player
and the web page in the browser context
... emsg eventing is a more dire situation, not supported by
browsers
... in the byte stream registry, there's no requirement for a
browser implementation
... only custom browsers deal with emsg data, not mainstream
browsers
... this was problematic in designing ATSC
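[For reference, the in-band path means extracting the emsg box from the ISO BMFF bytes itself. A minimal sketch of a version-0 emsg parser, following the field layout in ISO/IEC 23009-1; version 1, 64-bit box sizes, and error handling are omitted:]

```javascript
// Minimal sketch: parse a version-0 'emsg' box (ISO/IEC 23009-1) from a
// Uint8Array. Real players must also handle version 1, 64-bit sizes,
// and malformed input; this sketch does not.
function parseEmsg(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const size = view.getUint32(0);
  const type = String.fromCharCode(bytes[4], bytes[5], bytes[6], bytes[7]);
  if (type !== 'emsg') throw new Error('not an emsg box');
  const version = bytes[8]; // followed by 3 bytes of flags
  if (version !== 0) throw new Error('only version 0 handled here');
  let offset = 12;
  const readNullTerminated = () => {
    let end = offset;
    while (bytes[end] !== 0) end++;
    const s = String.fromCharCode(...bytes.subarray(offset, end));
    offset = end + 1;
    return s;
  };
  const schemeIdUri = readNullTerminated();
  const value = readNullTerminated();
  const timescale = view.getUint32(offset);
  const presentationTimeDelta = view.getUint32(offset + 4);
  const eventDuration = view.getUint32(offset + 8);
  const id = view.getUint32(offset + 12);
  const messageData = bytes.subarray(offset + 16, size); // opaque binary payload
  return { schemeIdUri, value, timescale, presentationTimeDelta,
           eventDuration, id, messageData };
}
```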
<kaz> [How does HTML5 handle DASH events today?]
Giri: this is just my opinion, not
authoritative
... HTML has TextTrackCue with an identifier, text string, start
and end time, and a payload
... There can be metadata cues
... If you have a DASH event on the transmitter side, the transmitter
could transcode in-band events into text track cues, and present them
in the text track
... Here's an example from the WebVTT spec
... There's a separation between cues to be handled by the player
and those to be handled by the application
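[Such a transcode could look roughly like the sketch below, which maps a parsed DASH event onto the fields a metadata text track cue needs; the input field names are assumptions for illustration, not from a spec:]

```javascript
// Hypothetical sketch: turn a parsed DASH event into the shape of a
// metadata text track cue (id, startTime, endTime, text). The event's
// field names are assumed for illustration.
function eventToCue(event, periodStart) {
  const start = periodStart + event.presentationTimeDelta / event.timescale;
  return {
    id: String(event.id),
    startTime: start,
    endTime: start + event.eventDuration / event.timescale,
    text: event.messageData, // assumes a textual payload; binary needs encoding
  };
}
```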
<kaz> [HTML5 Handling of Text Track Cues]
Giri: In HTML5, text tracks
allow for track-specific event handlers, such as the oncuechange event
... There was a proposal for DataCues with binary payloads
... Browser vendor support is non-existent AFAICT
... There's a 4 year old discussion on the Chromium mailing
list
... HbbTV has also identified problems with short duration cues,
where cues may expire before the UA could handle them
... There's a specific problem in ATSC where we try to minimise
channel acquisition
... i.e., start playback as quickly as possible on channel
change
... There's a danger with mid-cues if delays are introduced
... If the user just acquires a channel, cues may be missed
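[The channel-change problem can be stated concretely: a cue whose start time has already passed when playback joins never fires its enter event, even if the cue is still active. A small sketch, with illustrative names:]

```javascript
// Sketch of the acquisition problem: classify a cue relative to the
// time at which playback joins the stream. Names are illustrative.
function classifyCueAtJoin(cue, joinTime) {
  if (joinTime < cue.startTime) return 'pending';    // will fire normally
  if (joinTime < cue.endTime) return 'missed-start'; // active, enter event lost
  return 'expired';                                  // never seen at all
}
```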
<kaz> [ATSC 3.0 Approach]
Giri: ATSC 3.0 defined two playback
models: the application media player (AMP) and the receiver media
player (RMP)
... AMP is a standard HTML/JS app, such as dash.js
... This is suitable for certain kinds of devices, without an
integrated receiver, taking advantage of a standard browser
context
... Then there's the RMP. This is colocated with the AMP, and
rendering takes place in conjunction with the receiver.
... Control of the RMP is done over WebSockets
<kaz> [ATSC 3.0 Event Handling]
Giri: As far as event handling is
concerned, the AMP runs in the browser context, although emsg isn't
supported in most browsers
... This is a problem for the AMP. The RMP, as it's integrated,
there's room for customisation
... The RMP can convey event data to the app over WebSockets
... Both methods have latency in event handling
... We don't see perfect solutions here in ATSC
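[The RMP-to-app path is asynchronous by nature. ATSC 3.0 (A/344) specifies the actual receiver/app WebSocket protocol; the JSON shape below is invented purely to illustrate the app-side pattern, where event data arrives as messages and timing against the media timeline is the app's responsibility:]

```javascript
// Illustrative only: the real receiver/app protocol is defined in
// ATSC A/344, not here. This sketch just shows app-side dispatch of
// an event message received over a WebSocket.
function handleReceiverMessage(json, onEvent) {
  const msg = JSON.parse(json);
  if (msg.method === 'event') {
    onEvent(msg.params); // e.g. { schemeIdUri, startTime, duration, data }
  }
}
```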
<kaz> [Event Retrieval in ATSC 3.0]
Giri: This diagram is from ATSC. It's
not synchronous. We discussed having event subscription
... We believe this is HTML5 compatible, even though we're not
using the HTML video tag
<kaz> [Emerging Approach]
Giri: To address some of these
issues, MPEG has started work on carriage of web resources via ISO
BMFF
... It's a joint proposal from Cyril Concolato at Netflix and
Thomas Stockhammer
... It allows for direct rendering, so not dependent on the
application. This could take care of some of the perf issues I
mentioned
... We can't force a broadcaster to write an app per service; this can
be done by the content author instead
... It's work in progress
<kaz> [Conclusion]
Giri: If the receiver has an
integrated runtime media player, it's possible to deal with it
directly
... MPEG is considering approaches
... That completes my overview
Igarashi: Thank you Giri for the
presentation
... You mentioned discussion with browser vendors, what is the
issue there, why don't they support event cues?
Giri: It's the emsg that isn't
supported. We're considering it for broadcast media, and I guess
they are thinking more about online media
... emsg was also controversial in MPEG, not too many
proponents
... not popular from a content author's point of view
Will: emsg is gaining prominence
through its adoption in CMAF
... We have a strong preference for a usable emsg implementation in
browsers
... The SourceBuffer is the appropriate place to extract the
data
... We've started a discussion with Google, Microsoft, and Apple on
this
Giri: I fully expect CTA WAVE to be involved in this. It would be great if we can get a report from them on preferred approaches
Igarashi: It's good news that CTA
WAVE is considering how to handle emsg in HTML5
... Does the HTML cue API need changes to support emsg, or is it
just an implementation issue?
Will: emsg can hold binary payloads
and TextTrack cues are text, so you'd need to encode, e.g. with
base64, so we need a way to expose arbitrary binary payloads
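[Until a binary-capable cue exists, the workaround described here is to base64-encode the payload into the cue text and decode it in the app. A sketch using Node's Buffer; a browser would use btoa/atob or equivalent:]

```javascript
// Sketch of the base64 workaround for binary payloads in text cues.
// Uses Node's Buffer; browsers would use btoa/atob or a TextDecoder.
function encodeCuePayload(bytes) {
  return Buffer.from(bytes).toString('base64');
}
function decodeCuePayload(text) {
  return new Uint8Array(Buffer.from(text, 'base64'));
}
```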
... Is there broader interest from the M&E IG in emsg events,
and what's the preferred method to deliver events to the JS
layer?
Giri: We don't really have a way to
handle typed data with TextTrack cues
... With broadcast media, we worry about an explosion of track
data,
... e.g., the init segment has to be frequently repeated so that
players can start playing quickly
Will: Mark Vickers, who's in CTA WAVE, was involved in the DataCue work. Can DataCue be revitalised?
Francois: You mentioned
synchronisation needs with event handling. Right now in HTML5, the
timeline for media playback isn't necessarily the one that the app
sees
... What are the synchronisation requirements there?
... What kinds of cues are used in practice? What are some good
examples needing precise sync?
Giri: In smart TVs, we're doing more
app logic for personalisation, e.g., ad campaigns. We want to
customise to the device owner, the consumer.
... This means that client logic is needed, and ad media needs to
be available and ready when the cue appears
... If there's uncertainty about how the UA surfaces event data,
and as the time references aren't perfectly aligned, there may be
issues with the actual insertion
... This was also a problem in TextTrack cues with several hundred
milliseconds latency, you could miss an ad-insertion cue and get
dead air. This is something TV manufacturers and broadcasters want
to avoid.
Francois: I have another question about binary data. TextTrack cues don't support this, and DataCues aren't implemented. What is binary data used for?
Giri: It's for any other data that
needs to be time-aligned, that's typed, e.g., JPEG images, or simple
audio files that are related to the media being played
... Anything where you don't want to deal with the round trip time
of requesting the resource, so you want it in-band.
Igarashi: MPEG-DASH uses emsg as a
generic format for arbitrary events. A service with a specific use may
also choose to use emsg.
... In terms of frame-accurate eventing, as Francois said I don't
see any specific requirement. Ad insertion won't be achieved at the
app level, it's more at the system or rendering level.
... Some broadcasters may want to synchronise web content with
the media, e.g., information about a soccer player during a
game.
... I see these as rarer applications. Accuracy to only about 100
ms is needed, not frame accuracy, for broadcast systems.
Giri: The W3C specs don't guarantee
100 ms accuracy, something that HbbTV complained about.
... There are other issues than UA latency that result in missing
cues. Hence the MPEG work, which should take some of the
uncertainty out of processing the events.
... Frame accuracy isn't critical, but 500 ms isn't good
either.
Igarashi: I think 300 ms is enough in most cases.
Giri: In my time at ATSC, I haven't
seen an end-to-end timing budget defined, from when a cue is
introduced in the transmission infrastructure to when the client must
complete its logic.
... That could be good for this group to do, no-one else is looking
at this from an HTML5 point of view.
Kaz: Would it make sense to invite CTA WAVE to give an update?
<kaz> scribenick: kaz
Chris: I have discussed that with
Mark
... He said he'd prefer to wait until after NAB in April, so
maybe for our monthly call in May?
Kaz: tx for your clarification
Chris: What should the next steps be in this interest group?
<scribe> scribenick: cpn
Will: The IG brings lots of real
world use cases
... If we can specify emsg event handling, timing requirements, in
addition to what's coming from CTA
Igarashi: I agree, also how emsg are
used for services
... We should discuss how emsg can be used for broadcast systems,
other requirements
<kaz> scribenick: kaz
Chris: We have an unfortunate
schedule overlap with TTWG, who also meet on Thursday
afternoons
... This topic is clearly in their area of interest, so I want to
discuss together with them.
... I know that TTWG have a general issue regarding TTML browser
implementations, and a proposed
... solution is passing responsibilities more on the app layer,
with an extended TextTrack API.
... I'd like to move the time of this call to avoid the schedule
overlap, so that we can share
... information with the TTWG guys. But I'm not sure when to move
to at the moment.
... It could be moved to a Tuesday or Wednesday at a similar
time.
... I will try to identify a better slot based on people's
availability.
... Also, we can gather use cases and requirements around
synchronization and timing.
... We could start a comparison on the wiki, etc.
... Maybe everything is covered by the CTA's work, but would like
to see input from the wider Web community
... For example, during the breakout session at TPAC, there was
mention of requirements for synchronising
... web content with audiobooks. This is another group we may
contact to see if we cover all their requirements.
<tidoust> Synchronized Multimedia for Publications CG
Chris: I can take an action item to do that.
Kaz: Maybe we can start some work on gathering use cases and requirements on the wiki or GitHub?
Chris: This would be useful, also
with input from TTWG.
... But, it would be good to have an initial proposal for people to
respond to.
... Also use cases coming from the media industry, as Igarashi
mentioned.
... Unless any other points for today, we can adjourn the
call.
... Thanks, Giri!
... And thank you to all for attending.
... As a reminder, it would be good to hear from you about topics
for future calls. Please get in touch.
[adjourned]