W3C

- DRAFT -

Media and Entertainment IG
06 Mar 2018

Agenda

Attendees

Present
Kaz_Ashimura, Bob_Lund, Chris_Needham, Cyril_Concolato, Giri_Mandayam, Francois_Daoust, Geun-Hyung_Kim, Eric_Carlson, Tatsuya_Igarashi, Nigel_Megitt, Peter_tho_Pesch, Steve_Morris, Marisa_DeMeglio, John_Luther, Kazuhiro_Hoya
Regrets
Chair
Chris, Igarashi
Scribe
Chris, Kaz

Contents

  * Introduction
  * Carriage of Web Resources in ISO-BMFF
  * E-Publishing on the Web
  * Support for caption formats other than WebVTT
  * Next steps
  * Conclusion
  * Next IG meeting
  * Summary of Action Items
  * Summary of Resolutions

Introduction

<kaz> scribenick: kaz

Chris: During the previous call, Giri gave a presentation on media timed events
... ATSC work, DASH events, emsg in ISO BMFF containers, ...
... which identified potential gaps in the web platform
... That call was well attended, and the topic seemed of interest to many IG members,
... so I thought it was something that the IG should follow up on
... As part of that, I produced an initial document to summarize what we discussed
... pointing to existing work, and previous discussions

<tidoust> Use cases and gap analysis: Media timed events and synchronisation in HTML5

Chris: I would like to figure out what the IG should usefully do
... so today I'm hoping for an open discussion amongst us all,
... to think about our next steps to progress on this topic
... The document describes three use cases:
... synchronised event triggering, support for subtitle and caption formats other than WebVTT, and synchronised rendering of web resources
... I would like to invite Cyril to tell us about synchronised rendering of web resources
... I have invited Marisa to join us, as chair of the Synchronised Multimedia for Publications CG
... https://www.w3.org/community/sync-media-pub/
... Maybe you could tell us what some of your goals are, and the current status?
... On the timed text side, it's great to have members of TTWG with us today, thank you
... I've spoken with Andreas about the generic TextTrackCue proposal; he can't be here today, so I'll talk about that later
... I also want to ask Giri to talk about our next steps
... AOB?

Nigel: I sent a message to the IG recently about audio description
... client-side implementation, and requirements for capture

Chris: Yes, let's cover that as well, thank you.

Carriage of Web Resources in ISO-BMFF

<scribe> scribenick: cpn

Cyril: Here's a document I'm editing at MPEG: Carriage of Web resources in ISO BMFF
... [shares his screen]
... It started as an activity in MPEG a while ago, exploring what was needed in the MPEG space,
... to facilitate delivery of web resources: HTML, JavaScript, CSS, etc
... We weren't sure at the beginning what the output would be in terms of standards
... We've produced a committee draft, not uploaded yet, I will do that in a few days
... It's quite a light document, it doesn't define a new toolbox
... It's similar to CMAF in that sense, it describes how you use existing tools from ISO BMFF
... The two aspects we're dealing with are: carriage of timed web resources, and carriage of non-timed resources
... The difference is more in how the timing information is delivered,
... eg a resource where the timing is defined in an XML document
... What is a timed web resource? They're stored in tracks, one type carries HTML content, another type with JS, another with WebVTT metadata events
... In the HTML track, the idea is not to define a mechanism or complex processing for HTML data; the document is simply loaded at the given time by the processor
... It's as if the browser navigates from one document to another at the given time
... For JavaScript code, this could have no HTML at all, if the entire timed application is in JavaScript
... A note about emsg boxes: It's important to understand the difference between this, and the draft doc I'm presenting here
... The tracks here are first class tracks in MP4, meant to be processed in a timely manner.
... With emsg boxes, they're more targeted at the application, not meant to be replayed
... The content of the timed track in this case would be replayed
... We need to be precise about what entity in the consumer is intended to handle these events,
... is it something deep in the media player, or something in the application layer?
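
For context on the application-layer option: a minimal sketch of how script can observe events from a timed metadata track today via the TextTrack API. The kind mapping and the JSON payload shape are assumptions for illustration, not something the draft defines.

  // Assumes the UA exposes the timed web resource track as an
  // in-band text track of kind "metadata" (an assumption).
  const video = document.querySelector('video');
  const track = Array.from(video.textTracks)
    .find((t) => t.kind === 'metadata');
  if (track) {
    track.mode = 'hidden'; // dispatch cue events without rendering
    track.addEventListener('cuechange', () => {
      for (const cue of Array.from(track.activeCues)) {
        // A JSON cue payload is assumed here for illustration.
        const event = JSON.parse(cue.text);
        console.log('timed event at', cue.startTime, event);
      }
    });
  }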

Igarashi: I see the difference between the timed media track and emsg boxes, but I don't see the use cases for timed web resources

Cyril: I agree, in most cases you won't have continuous HTML changes
... The track mechanism can handle sparse events
... The question is which entity will consume the events, and what's the processing
... One thing that's not clear to me with emsg: what happens when you defragment the file?
... The emsg box in my view is something that you consume while streaming, but has no meaning outside this
... With timed tracks, content is expected to be useful separately

Bob: This distinction, is this something that should be fixed in the emsg spec?
... I can see applications where you want to replay emsg events

Cyril: Maybe it is possible to design such a player

Bob: We extended the DASH player to handle emsg events and DASH events

Cyril: Section 5.4 covers the use of URLs to web resources; the idea is to clarify how to link to such resources
... The meta box contains data that should be seen by the browser as a local cache
... If the browser loads the content, and needs some CSS, it can find it in the cache, otherwise it goes to the network
... This isn't a new idea, just highlighted in this document

<Zakim> nigel, you wanted to ask how WebVTT metadata can be made available to JavaScript code in the absence of DataCue implementations

Nigel: There's a suggestion that the data gets turned into something consumable from JS
... This implies DataCue, or is there another way to do it?

Cyril: This doc only covers storage, not how it's exposed, DataCue is one way to go

Nigel: Other mechanisms? Is it important to MPEG how implementable this is (more a process question)?

Cyril: MPEG started this because there was evidence that, with this, you could do something in the browser,
... eg, a service worker consuming an MP4 file is another way
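
A rough sketch of that service worker idea follows. extractWebResources() is a hypothetical helper, not an existing API; a real implementation would need an MP4 parser in script to read the resources out of the file's meta box.

  // sw.js - serve subresources from the MP4's meta box, as if it
  // were a local cache; otherwise fall through to the network.
  let embedded = null; // Map of URL -> Response (built lazily)

  self.addEventListener('fetch', (event) => {
    event.respondWith((async () => {
      if (!embedded) {
        // The file URL is illustrative.
        const mp4 = await fetch('/content/presentation.mp4');
        embedded = await extractWebResources(await mp4.arrayBuffer());
      }
      const hit = embedded.get(event.request.url);
      return hit ? hit.clone() : fetch(event.request);
    })());
  });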

<kaz> Chris: Thanks Cyril for presenting this information, this is really valuable input.

Igarashi: Regarding web resources, via emsg or tracks, who consumes the resources is independent of the delivery


Igarashi: Also, emsg could be used for replay as well as web resource tracks, and not just in the streaming case

Cyril: I'd like to clarify the terms we're using. We should be clear what is an event and what is a resource
... For me, an event is something that causes a trigger, shouldn't necessarily carry the resource

Igarashi: emsg could carry arbitrary binary messages

E-Publishing on the Web

Marisa: I work for the DAISY consortium,
... on talking books for the blind and visually impaired
... We work with EPUB: audio clips synchronised with fragments in an HTML5 document
... We want this in the next iteration of EPUB on the web, so we spun out a CG from the Publishing WG
... The task for our CG is to look at existing technology, ideally don't reinvent anything
... What we need is the ability to synchronise audio fragments with HTML fragments
... For example, the page of a book is open, the user presses Play, and depending on implementation / user preference
... there's a highlight that follows the phrases
... I heard that DataCue could be useful for us, and I want to learn about this group, and TTML

<Zakim> nigel, you wanted to ask if the audio is pre-recorded or synthesised

Nigel: Is the audio pre-recorded, or is it synthesised based on text?

Marisa: It's pre-recorded

Nigel: So there's no need for a screen reader
... TTML and WebVTT are predicated on playing back timed media, but in your case it seems the events are user driven
... Seems there isn't a good fit with TTML / WebVTT, a better fit could be SMIL

Marisa: SMIL is a good fit, but nobody enjoys writing it, or reading it
... We're looking to move to something simpler to ingest, and also for people to comprehend
... The SMIL files that our producers make are driven by time codes, but the user can start playback and interrupt it,
... but once playback starts, it plays from top to bottom

Nigel: TTML2 has hooks in it for playing audio files at specific times
... My understanding is that you'd need custom data in a WebVTT payload to achieve the same thing

Marisa: I've been looking for examples, but found nothing similar. In my case, the TTML wouldn't have text, only audio

Nigel: That's possible with TTML, either embedded fragments or references to external resources

Marisa: Is there a specific profile?

Nigel: I've invited people to participate, maybe as a W3C CG, to create a TTML profile for audio requirements

Marisa: How well do browsers support TTML2? This is our primary user agent base

Nigel: Browsers don't generally support it natively; in the main, it can be done in JavaScript

Chris: Anything else to mention on the possible CG, Nigel?

Nigel: Only that synchronised playback will have requirements for playback of media timed events
... In terms of solutions, we might want to look at what Web Audio does
... It hands the audio processor instructions in advance about what needs to happen and when
... It's a different model to TextTrackCue; it's instructive to see that it exists. Is it useful to extend that model into other domains?
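
As a concrete point of comparison, a small sketch of that Web Audio scheduling model, where playback instructions are handed to the audio processor ahead of time rather than fired as cues during playback. The URLs are illustrative.

  const ctx = new AudioContext();

  // Decode an audio clip and schedule it to start `when` seconds
  // from now; the scheduling happens in advance, on the audio clock.
  async function scheduleClip(url, when) {
    const data = await (await fetch(url)).arrayBuffer();
    const buffer = await ctx.decodeAudioData(data);
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start(ctx.currentTime + when);
  }

  // e.g. play two phrases three seconds apart
  scheduleClip('/audio/phrase1.mp3', 0);
  scheduleClip('/audio/phrase2.mp3', 3);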

<Zakim> ericc, you wanted to suggest that a simple "data cue" may be exactly what is needed

Eric: I'd like to suggest that DAISY's needs could be met by a simple DataCue,
... a timed event emitted based on the current time of the media file (the spoken audio in this case).
... It contains a blob of data to be interpreted by script rather than the UA.
... When a section of the audio is played by the UA, it also emits the DataCue.
... On user interaction with the page, the script would get information from the markup about the time corresponding to that phrase
... The script wouldn't have to be terribly sophisticated, and should work for what you're trying to do
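
A minimal sketch of the approach Eric describes, using script-created metadata cues in place of DataCue (which is not widely implemented). The cue times, element IDs, and data-start attribute are illustrative assumptions.

  const audio = document.querySelector('audio');
  const track = audio.addTextTrack('metadata', 'phrases');

  // One cue per phrase: [startTime, endTime, id of the HTML fragment]
  for (const [start, end, id] of [[0, 2.5, 'p1'], [2.5, 5.1, 'p2']]) {
    const cue = new VTTCue(start, end, id);
    cue.onenter = () =>
      document.getElementById(id).classList.add('highlight');
    cue.onexit = () =>
      document.getElementById(id).classList.remove('highlight');
    track.addCue(cue);
  }

  // User-driven playback: clicking a phrase seeks to its start time.
  document.querySelectorAll('[data-start]').forEach((el) => {
    el.addEventListener('click', () => {
      audio.currentTime = Number(el.dataset.start);
      audio.play();
    });
  });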

Marisa: That's how it works now, though we want to give it a refresh, move away from SMIL, maybe to something that could be implemented natively by browsers
... Is what you described possible today?

Eric: It is possible in Safari, which has an implementation of DataCue; it was in the spec several years ago
... It's been removed from the spec, but people are talking about reviving it
... It could be implemented in Safari right now
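
To make the current situation concrete: a sketch of feature-detecting DataCue with a VTTCue fallback. The four-argument constructor shape shown is WebKit's extension and is an assumption about that implementation, not part of any current spec; payload values are illustrative.

  const video = document.querySelector('video');
  const track = video.addTextTrack('metadata', 'events');
  let cue;
  if (typeof DataCue !== 'undefined') {
    // WebKit-era constructor shape (an assumption).
    cue = new DataCue(5.0, 5.5, { phrase: 'p12' }, 'org.example.phrase');
  } else {
    // Fallback: serialise the payload into a WebVTT metadata cue.
    cue = new VTTCue(5.0, 5.5, JSON.stringify({ phrase: 'p12' }));
  }
  cue.onenter = () => console.log('event fired at', video.currentTime);
  track.addCue(cue);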

<Zakim> kaz, you wanted to ask about the usage of SSML

Kaz: SSML and the speech API may be of interest too
... You mentioned using pre-recorded audio, if we use speech synthesis we could generate the audio based on SSML

Marisa: What we see with content without pre-recorded audio is that people prefer to use screen readers
... We still need pre-recorded audio for professional productions, and systems without text-to-speech

<Zakim> nigel, you wanted to note that web speech api's output is not available to Web Audio, which is a technical limitation for implementers

Nigel: The Web Speech API makes the operating system generate the speech output, but this audio isn't available to the Web Audio API
... This is a gap that we found
... Also, regarding screen readers, what's the size of the community of people who want synthesized speech, but don't have screen readers?

Marisa: That's a good question, let me find out about that

Support for caption formats other than WebVTT

Chris: I spoke to Andreas offline. He has hosted discussions at previous TPACs on the need for a generic TextTrackCue API
... I have invited him to give us an update on this when he's ready

Next steps

<kaz> scribenick: kaz

Chris: After the last call, we thought about what to do as next steps within this IG

<cpn> scribenick: cpn

Giri: We talked about making a Task Force, to gather use cases and requirements
... This sounds useful, given the discussion we've had today
... My proposal is to turn this into solid proposals for web standardisation
... This could be bringing new requirements to an existing spec, eg ISO BMFF container handling
... A Task Force with limited life span, to conclude at TPAC this year
... We can have monthly calls, and we can work on GitHub or the wiki; GitHub seems more collaborative
... We want to consider not just the streaming media use cases, but also the EPUB use cases,
... and other areas where timed metadata is useful, to cover all our interests
... I will talk with W3C staff about setting up a GitHub repo

<kaz> scribenick: kaz

Chris: I agree about GitHub; possibly the output could be a W3C IG Note, we'll see

Giri: Would like to do that after the GitHub repo is set up

Chris: We should talk about some of the details offline, for example,
... should we have separate calls for the TF?
... There are other topics that the IG could discuss, so maybe having separate calls for the TF could be a way to go
... We'll discuss and announce something to the IG

Conclusion

Chris: This is a really interesting area, thank you all for your contributions
... We've heard different views around a common area of interest
... The detail of the TF is to be announced

Kaz: Should we record the decision to create the TF as a RESOLUTION?

Chris: Yes

RESOLUTION: We'll create a dedicated TF for the Media-Timed Events topic (detail to be announced)

Next IG meeting

-> W3C Comm Team's message on Daylight Savings (member-only)

Chris: April 3
... but please note there is a daylight saving switch-over
... thank you for joining, everybody
... speak to you in one month!

[adjourned]

Summary of Action Items

Summary of Resolutions

  1. We'll create a dedicated TF for the Media-Timed Events topic (detail to be announced)
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.147 (CVS log)
$Date: 2018/03/15 19:04:58 $