<kaz> scribenick: kaz
Chris: During the previous call, Giri
gave a presentation on media timed events
... ATSC work, DASH events, emsg in ISO BMFF containers, ...
... which identified potential gaps in the web platform
... That call was well attended, the topic seemed of interest to
many IG members
... so I thought that it was something that the IG should follow up
on
... As part of that, I produced an initial document to summarize
what we discussed
... pointing to existing work, and previous discussions
<tidoust> Use cases and gap analysis: Media timed events and synchronisation in HTML5
Chris: I would like to figure out what
the IG should usefully do
... so today I'm hoping for an open discussion amongst us
all,
... to think about our next steps to progress on this topic
... The document described three use cases:
... Synchronised event triggering, support for subtitle and caption
formats other than WebVTT, and synchronised rendering of web
resources
... I would like to invite Cyril to tell us about synchronised
rendering of web resources
... I have invited Marisa to join us, as chair of the Synchronised
Multimedia for Publications CG
... https://www.w3.org/community/sync-media-pub/
... Maybe you could tell us what some of your goals are, and the
current status?
... On the timed text side, it's great to have members of TTWG with
us today, thank you
... I've spoken with Andreas about the generic TextTrackCue
proposal, he can't be here today so I'll talk about that
later
... I also want to ask Giri to talk about our next steps
... AOB?
Nigel: I sent a message to the IG
recently about audio description
... client-side implementation, and requirements for capture
Chris: Yes, let's cover that as well, thank you.
<scribe> scribenick: cpn
Cyril: Here's a document I'm editing
at MPEG: Carriage of Web resources in ISO BMFF
... [shares his screen]
... It started as an activity in MPEG a while ago, exploring what
was needed in the MPEG space,
... to facilitate delivery of web resources: HTML, JavaScript, CSS,
etc
... We weren't sure at the beginning what the output would be in
terms of standards
... We've produced a committee draft, not uploaded yet, I will do
that in a few days
... It's quite a light document, it doesn't define a new
toolbox
... It's similar to CMAF in that sense, it describes how you use
existing tools from ISO BMFF
... The two aspects we're dealing with are: carriage of timed web
resources, and carriage of non-timed resources
... The difference is more in how the timing information is
delivered,
... eg a resource where the timing is defined in an XML
document
... What is a timed web resource? They're stored in tracks: one
track type carries HTML content, another JavaScript, another
WebVTT metadata events
... In the HTML track, the idea is not to define a mechanism or
complex processing for HTML data. The document is loaded at its
given time by the processor
... It's as if the browser navigates from one document to another
at the given time
... For JavaScript code, this could have no HTML at all, if the
entire timed application is in JavaScript
... A note about emsg boxes: It's important to understand the
difference between this, and the draft doc I'm presenting
here
... The tracks here are first class tracks in MP4, meant to be
processed in a timely manner.
... With emsg boxes, they're more targeted to the application, not
meant to be replayed
... The content of the timed track in this case would be
replayed
... We need to be precise about what entity in the consumer is
intended to handle these events,
... is it something deep in the media player, or something in the
application layer?
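[For illustration, a minimal TypeScript sketch of the processing model Cyril describes for the HTML track; the HtmlSample shape and the pre-demuxed sample list are assumptions for the sketch, not structures defined by the MPEG draft:]

    // Hypothetical demuxed sample from an HTML track: each sample is a
    // complete document, presented at its time on the media timeline.
    interface HtmlSample {
      presentationTime: number; // seconds on the media timeline
      document: string;         // serialised HTML payload
    }

    // Replace the displayed document at each sample's presentation time,
    // as if the browser navigated from one document to the next.
    // Assumes `samples` is sorted by presentationTime; timeupdate is
    // coarse-grained, so a real player would schedule more precisely.
    function scheduleHtmlTrack(video: HTMLVideoElement,
                               frame: HTMLIFrameElement,
                               samples: HtmlSample[]): void {
      let next = 0;
      video.addEventListener('timeupdate', () => {
        while (next < samples.length &&
               samples[next].presentationTime <= video.currentTime) {
          frame.srcdoc = samples[next].document; // "navigate" to the new document
          next++;
        }
      });
    }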
Igarashi: I see the difference between the timed media track and emsg boxes, but I don't see the use cases for timed web resources
Cyril: I agree, in most cases you
won't have continuous HTML changes
... The track mechanism can handle sparse events
... The question is which entity will consume the events, and
what's the processing
... One thing that's not clear to me with emsg is what happens when you
defragment the file?
... The emsg box in my view is something that you consume while
streaming, but has no meaning outside this
... With timed tracks, content is expected to be useful
separately
Bob: This distinction, is this
something that should be fixed in the emsg spec?
... I can see applications where you want to replay emsg events
Cyril: Maybe it is possible to design such a player
Bob: We extended the DASH player to handle emsg events and DASH events
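[For reference, a TypeScript sketch of reading the payload of a version 0 emsg box as defined in MPEG-DASH; a real demuxer would also handle version 1, which reorders the fields and carries an absolute presentation time:]

    // Parse the payload of a version 0 `emsg` box. `bytes` starts just
    // after the FullBox header (version and flags already consumed).
    function parseEmsgV0(bytes: Uint8Array) {
      let pos = 0;
      const readString = (): string => {
        const start = pos;
        while (bytes[pos] !== 0) pos++;        // find the null terminator
        const s = new TextDecoder().decode(bytes.subarray(start, pos));
        pos++;                                 // skip the terminator
        return s;
      };
      const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
      const schemeIdUri = readString();        // identifies the event scheme
      const value = readString();
      const timescale = view.getUint32(pos); pos += 4;
      const presentationTimeDelta = view.getUint32(pos); pos += 4;
      const eventDuration = view.getUint32(pos); pos += 4;
      const id = view.getUint32(pos); pos += 4;
      const messageData = bytes.subarray(pos); // opaque, application-defined
      return { schemeIdUri, value, timescale,
               presentationTimeDelta, eventDuration, id, messageData };
    }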
Cyril: In section 5.4, the use of
URLs to web resources, the idea is to clarify how to link to such
resources
... The meta box contains data that should be seen by the browser
as a local cache
... If the browser loads the content, and needs some CSS, it can
find it in the cache, otherwise it goes to the network
... This isn't a new idea, just highlighted in this document
<Zakim> nigel, you wanted to ask how WebVTT metadata can be made available to JavaScript code in the absence of DataCue implementations
Nigel: there's a suggestion that the
data gets turned into something consumable from JS
... This implies DataCue, or is there another way to do it?
Cyril: This doc only covers storage, not how it's exposed, DataCue is one way to go
Nigel: Other mechanisms? Is it important to MPEG how implementable this is (more a process question)?
Cyril: MPEG started this as there was
evidence that with this, you could do something in the
browser,
... eg, a service worker consuming an MP4 file is another way
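[A sketch of the service worker approach Cyril mentions, in TypeScript; extractMetaBoxResources is a hypothetical helper standing in for an MP4 parser, and the /content.mp4 URL and matching requests by pathname are assumptions:]

    // Inside a service worker: treat resources carried in the MP4 meta
    // box as a local cache, falling back to the network for anything else.
    declare function extractMetaBoxResources(mp4: ArrayBuffer): Map<string, Response>;

    let metaResources: Map<string, Response> | null = null;

    self.addEventListener('fetch', (event: any) => {
      event.respondWith((async () => {
        if (metaResources === null) {
          const mp4 = await (await fetch('/content.mp4')).arrayBuffer();
          metaResources = extractMetaBoxResources(mp4);
        }
        const cached = metaResources.get(new URL(event.request.url).pathname);
        return cached ? cached.clone() : fetch(event.request);
      })());
    });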
<kaz> Chris: Thanks Cyril for presenting this information, this is really valuable input.
Igarashi: Regarding web resources, whether delivered via emsg or tracks, who consumes the resources is independent of the delivery mechanism
<cyril> RRSAgent: pointer
Igarashi: Also, emsg could be used for replay as well as web resource tracks, and not just in the streaming case
Cyril: I'd like to clarify the terms
we're using. We should be clear what is an event and what is a
resource
... For me, an event is something that causes a trigger, shouldn't
necessarily carry the resource
Igarashi: emsg could be arbitrary binary messages
Marisa: I work for the DAISY
consortium,
... on talking books for the blind and visually impaired
... We work with EPUB, audio clips synchronised with fragments in
an HTML5 document
... We want this in the next iteration of EPUB on the web, we spun
out a CG from the Publishing WG
... The task for our CG is to look at existing technology, ideally
don't reinvent anything
... What we need is the ability to synchronise audio fragments with
HTML fragments
... For example, the page of a book is open, the user presses Play,
and depending on implementation / user preference
... there's a highlight that follows the phrases
... I heard that DataCue could be useful for us, and I want to
learn about this group, and TTML
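[For illustration, a minimal TypeScript sketch of the highlight-follows-audio behaviour Marisa describes, assuming the publication supplies a list of timed phrase references; the Phrase shape and the 'highlight' class are assumptions:]

    // Toggle a highlight on the HTML fragment matching the current
    // position in the narration audio.
    interface Phrase { start: number; end: number; selector: string; }

    function followAudio(audio: HTMLAudioElement, phrases: Phrase[]): void {
      audio.addEventListener('timeupdate', () => {
        for (const p of phrases) {
          document.querySelector(p.selector)?.classList.toggle('highlight',
            audio.currentTime >= p.start && audio.currentTime < p.end);
        }
      });
    }

    // e.g. followAudio(audioEl, [{ start: 0, end: 2.5, selector: '#phrase1' }]);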
<Zakim> nigel, you wanted to ask if the audio is pre-recorded or synthesised
Nigel: Is the audio pre-recorded, or is it synthesised based on text?
Marisa: It's pre-recorded
Nigel: So there's not the need for a
screen reader
... TTML and WebVTT are predicated on playing back timed media, but
in your case it seems the events are user driven
... Seems there isn't a good fit with TTML / WebVTT, a better fit
could be SMIL
Marisa: SMIL is a good fit, but
nobody enjoys writing it, or reading it
... We're looking to move to something simpler to ingest, and also
for people to comprehend
... The SMIL files that our producers make are driven by time
codes, but the user can start playback and interrupt it,
... but once playback starts, it plays from top to bottom
Nigel: TTML2 has hooks in it for
playing audio files at specific times
... My understanding is that you'd need custom data in a WebVTT
payload to achieve the same thing
Marisa: I've been looking for examples, but found nothing similar. In my case, the TTML wouldn't have text, only audio
Nigel: That's possible with TTML, either embedded fragments or references to external resources
Marisa: Is there a specific profile?
Nigel: I've invited people to participate, maybe as a W3C CG, to create a TTML profile for audio requirements
Marisa: How well do browsers support TTML2? Browsers are our primary user agent base
Nigel: Browsers don't generally support it natively; in the main, it can be done in JavaScript
Chris: Anything else to mention on the possible CG, Nigel?
Nigel: Only that synchronised
playback will have requirements for playback of media timed
events
... In terms of solutions, we might want to look at what Web Audio
does
... It gives the processor instructions in advance about what needs
to happen and when
... It's a different model to TextTrackCue, and it's instructive to
see that it exists. Is it useful to extend that model into other
domains?
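[For comparison, Web Audio's scheduling model in a minimal TypeScript sketch: the page hands the audio processor instructions in advance, against the audio clock, rather than reacting to cues as they fire:]

    const ctx = new AudioContext();
    const osc = ctx.createOscillator();
    osc.connect(ctx.destination);
    osc.start(ctx.currentTime + 1.0); // begin exactly 1s from now, on the audio clock
    osc.stop(ctx.currentTime + 1.5);  // scheduled ahead of time, not event-driven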
<Zakim> ericc, you wanted to suggest that a simple "data cue" may be exactly what is needed
Eric: I'd like to suggest that
DAISY's needs could be met by a simple DataCue,
... a timed event emitted based on the current time of the media file
(the spoken audio in this case).
... It contains a blob of data to be interpreted by script rather
than the UA.
... When a section of the audio is played by the UA, it also emits
the DataCue.
... On user interaction with the page, the script would get
information from the markup about the time corresponding to that
phrase
... The script wouldn't have to be terribly sophisticated, and
should work for what you're trying to do
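[A sketch of what Eric describes, in TypeScript; the DataCue constructor follows the shape of the removed spec text and WebKit's implementation, so its exact signature is an assumption:]

    // Assumed shape of DataCue (old spec / WebKit extension).
    declare class DataCue extends TextTrackCue {
      constructor(startTime: number, endTime: number, value: any, type?: string);
      readonly value: any;
    }

    const audio = document.querySelector('audio')!;
    const track = audio.addTextTrack('metadata', 'phrases');
    track.mode = 'hidden';

    // The UA fires the cue at the right time; script, not the UA,
    // decides what the blob of data means.
    const cue = new DataCue(0, 2.5, { fragment: '#phrase1' }, 'org.example.phrase');
    cue.addEventListener('enter', () => {
      document.querySelector(cue.value.fragment)?.classList.add('highlight');
    });
    track.addCue(cue);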
Marisa: That's how it works now,
though we want to give it a refresh, move away from SMIL, maybe to
something that could be implemented natively by browsers
... Is what you described possible today?
Eric: It is possible in Safari, which
has an implementation of DataCue; it was in the spec several years
ago
... It's been removed from the spec, but people are talking about
reviving it
... It could be implemented in Safari right now
<Zakim> kaz, you wanted to ask about the usage of SSML
Kaz: SSML and the speech API may be
of interest too
... You mentioned using pre-recorded audio, if we use speech
synthesis we could generate the audio based on SSML
Marisa: What we see with content
without pre-recorded audio is that people prefer to use screen
readers
... We still need pre-recorded audio for professional productions,
and systems without text-to-speech
<Zakim> nigel, you wanted to note that web speech api's output is not available to Web Audio, which is a technical limitation for implementers
Nigel: The Web Speech API makes the
operating system generate the speech output, but this audio isn't
available to the Web Audio API
... This is a gap that we found
... Also, regarding screen readers, what's the size of the
community of people who want synthesised speech, but don't have
screen readers?
Marisa: That's a good question, let me find out about that
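[For reference, the Web Speech API usage Kaz and Nigel refer to, in TypeScript; the utterance text is an invented example, and the comment notes the gap Nigel describes:]

    // Synthesise speech via the Web Speech API. The OS renders the
    // audio directly; the samples never reach the page, so the output
    // can't be processed or synchronised through the Web Audio API.
    const utterance = new SpeechSynthesisUtterance('Chapter one. It was a dark night.');
    utterance.lang = 'en-GB';
    utterance.rate = 1.0;
    utterance.addEventListener('end', () => console.log('finished speaking'));
    speechSynthesis.speak(utterance);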
Chris: I spoke to Andreas offline. He
has hosted discussions at TPACs previously on the need for a
generic TextTrackCue API
... I have invited him to give us an update on this when he's
ready
<kaz> scribenick: kaz
Chris: After the last call, we thought about what to do as next steps within this IG
<cpn> scribenick: cpn
Giri: We talked about making a Task
Force, to gather use cases and requirements
... This sounds useful, given the discussion we've had today
... My proposal is to turn this into solid proposals for web
standardisation
... This could be bringing new requirements to an existing spec, eg
ISO BMFF container handling
... A Task Force with limited life span, to conclude at TPAC this
year
... We can have monthly calls; we can do it on GitHub or a wiki, but
GitHub seems more collaborative
... We want to consider not just the streaming media use cases, but
also the EPUB use cases,
... and other areas where timed metadata is useful, to cover all
our interests
... Will talk with W3C staff about setting up a GitHub repo
<kaz> scribenick: kaz
Chris: I agree about GitHub, possibly the output could be a W3C IG Note, we'll see
Giri: Would like to do that after the GitHub repo is set up
Chris: We should talk about some of the
details offline, for example,
... should we have separate calls for the TF?
... There are other topics that the IG could discuss, so maybe
having separate calls for the TF could be a way to go
... We'll discuss and announce something to the IG
Chris: This is a really
interesting area, thank you all for your contributions
... We've heard different views around a common area of
interest
... The detail of the TF is to be announced
Kaz: Should we record the decision to create the TF as a RESOLUTION?
Chris: Yes
RESOLUTION: We'll create a dedicated TF for the Media-Timed Events topic (detail to be announced)
-> W3C Comm Team's message on Daylight Savings (member-only)
Chris: The next call is April 3
... but please note there is a daylight saving switch-over
... thank you for joining, everybody
... speak to you in one month!
[adjourned]