Audio Description Community Group, Timed Text Working Group Joint Meeting

Meeting minutes

Introductions

Nigel: Nigel Megitt, BBC, chair ADCG, co-chair TTWG, one of the Editors of DAPT

ray-schwartz9: Ray Schwartz: he/him NFCU, member of ARIA

gabriel: eng on MS Edge, part of Web Audio

atsushi: w3c contact TTWG

niko: Nikolas Fairburn, Media and Entertainment Interest Group

bernd: member of Media and Entertainment Interest Group, and WICG

jcraig: James Craig, Apple Accessibility, member of TTWG, interested in audio descriptions, most active in ARIA

Adam_Page: Hilton Accessibility ARIA WG

cyril: Netflix, TTWG Co-editor

<atsushi> Hiroshi_Ohta: from LINE Yahoo Corp.

reinaldoferraz: Reinaldo Ferraz, W3C chapter Sao Paulo, observer

sprang: Google Meet, Observer

<reinaldoferraz> NIC.br

wschi: Weiwei Xu, Huawei, Media Standard Department

nigel: intros ahead of schedule
… any other agenda topics?

Agenda

DAPT profiles in TTML

Authoring and production workflows for Loc and AD

no other agenda items

cyril: have others deployed AD in recent deployments?
… MPEG-H, coding audio descriptions, etc

wschi: used in broadcast and to mix them in encode/decode

mixing not supported in browser yet

cyril: used in VOD

Robert Warren: niko agriculture, interest in the humanities

jcraig: To Cyril's question, Apple has a number of different audio description features
… as a streaming service provider with Apple TV+, I have no part of it but very proud of the work
… we do with AD and captions. Most of the Apple Original content has 9 AD languages and 40 caption track.
… On the product side, there are a number of features related to AD and captions that most people
… are not aware of. For example if you are blind and have a screen reader you can choose to have
… captions Brailled or spoken live (for translation).
… Experimented with something similar for audio descriptions.
… Eric Carlson and I demoed that 2 years ago in TPAC.
… Take AD track type of the web <video> element, parsing it on the fly and either
… speaking it or Brailling it. Silent descriptions sent live to the Braille display.
… Someone who is deafblind could enjoy them widely.
… Not deployed widely, just a tech demo
… Love to get more interest in it, add the Braille support.
… That demo was a custom implementation of WebVTT in the video player

Adam_Page: Hilton deploying more AD to the website

Adam_Page: Another data point. At Hilton, not a big video platform,
… baking the audio track in

jcraig: Second track is standard way to deploy AD now, with a tag saying it's AD, to support
… auto selection
… Technically just a standard audio track
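The auto-selection idea above can be sketched as follows. This is a minimal illustration, not any player's actual logic: the `AudioTrack` record and `select_audio_track` function are hypothetical names, and the kind labels "main" and "main-desc" are borrowed from the HTML `AudioTrack.kind` vocabulary purely as an assumption about how the AD tag might be surfaced.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AudioTrack:
    """Hypothetical track record; real players expose richer objects."""
    track_id: str
    language: str
    kind: str  # assumed labels: "main" or "main-desc" (main audio mixed with AD)


def select_audio_track(
    tracks: Sequence[AudioTrack],
    language: str,
    prefer_description: bool,
) -> Optional[AudioTrack]:
    """Pick a track in the requested language, honouring the AD preference."""
    in_language = [t for t in tracks if t.language == language]
    if not in_language:
        return None
    wanted_kind = "main-desc" if prefer_description else "main"
    for track in in_language:
        if track.kind == wanted_kind:
            return track
    # Fall back to the first track in that language when the wanted kind is missing.
    return in_language[0]
```

With a user preference for descriptions set, the described track wins when one exists in the requested language; otherwise the sketch falls back to whatever track is available in that language rather than returning nothing.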

Adam_Page: user chooses the preferred track
… most require extended.

jcraig: One of the things was descriptions longer than the natural gap in the audio, e.g.
… extended descriptions, we demoed auto pause of video in the player when that happened.
… Have not seen a lot of deployments of that.
… I think WGBH has some demos of extended description

jcraig: demo in Vancouver was extended lecture paused to describe a chart
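The auto-pause behaviour for extended descriptions can be sketched as a small calculation. This is an assumption-laden illustration rather than the demoed implementation: `extended_pause` is a hypothetical helper, and it assumes the player knows the start and end of the natural gap and the duration of the description, all in seconds.

```python
def extended_pause(gap_start: float, gap_end: float, description_duration: float) -> float:
    """Return how many seconds the video would need to pause so the description fits.

    A result of 0.0 means the description fits inside the natural gap and
    no pause is needed.
    """
    if gap_end < gap_start:
        raise ValueError("gap_end must not precede gap_start")
    gap = gap_end - gap_start
    return max(0.0, description_duration - gap)
```

For example, a 5 second description in a 2 second gap would need a 3 second pause, while a 2 second description in a 5 second gap needs none.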

nigel: BBC deploys choice between pre-mixed audio track with AD versus w/o
… also deploying a dry AD track (not mixed with main audio) plus mix data
… DVB-T is widely deployed and supported in the UK
… transport stream is specified in the UK's "d-book"

cyril: UK-specific technology?

nigel: yes, since 2006 or so

<rwarren2> "Widely available for a large amount of money" ;)

nigel: That's the broadcast standard
… online we deploy separate video files like Hilton
… lately starting to deploy Live descriptions... timing is an artform.. the describers research ahead of time

cyril: is the describer local or remote?

nigel: either... third-party service

Adam_Page: English only?

nigel: yes

wschi: thinking of replicating the live AD use case into browsers as well?

nigel: 3-4 yrs ago, demoed TTML2 with live mix instructions in the browser
… could be used for live broadcast, too
… tech demo can mix two audio tracks using mix data (well) and/or less-well mixed with text-to-speech .. (generated speech synthesis)
… could deploy as sADM, or we may deploy as a custom implementation in the BBC player

niko: NGA can include AD, and spatial position to separate Object-based audio

Object-based audio

bernd: demo and discussed in the Media and Entertainment Interest Group this past Monday....

eric_carlson: WebKit eng at Apple, inc TT

jernoble: Jer Noble... WebKit Engineer at Apple, and TT

cyril: how widely deployed is AD around the world?
… Are there countries with no AD? distribution?

nigel: BBC audio describes over 20% of our programmes, regulatory requirement is 10%.
… other countries do some percentage

jcraig: smaller percentages

cyril: is AD deployed widely in Japan?

Hiroshi_Ohta: audio subchannels are popular in Japan... for example, background data about baseball players during games.... Not as widely deployed for AD for the Blind

nigel: most recent Olympic games in Japan included additional data (NHK?) on the subchannel generating AD about the scores

??: not sure of which subchannels are auto-generated or not?

nigel: one development in AD that has been gaining in use is synthesized voices
… there is an advocacy group in UK that has been running user test experiments
… Royal National Institute for the Blind (RNIB)

Nigel: new attendee?

dana: Hi!. I'm Dana. I work on WebKit.

jcraig: Deployments in Japan: more common with streaming services.
… Japanese is one of the languages that Apple Original content localises to
… HBO is starting to ramp up as well, and starting to lead the way with signed / PiP / ASL movies,
… being deployed as separate video files because there isn't a way to compose the dry components
… and keep them in sync

nigel: on the tangent of sign interpretation, there is a new regulatory requirement in Spain
… 3cat (Catalonian broadcaster) recently demoed an HbbTV receiver?
… Got the signing stream over IP and recomposited, implementation in WASM
… I think the resolution of the signed video was lower than the main broadcast video

jcraig: resolution does not need to be as high, but high framerate is critical with sign language... easy to lose context with dropped frames

DAPT

nigel: this spec has origins going back a few years
… TTML2 could trigger audio playback, pitch, etc, audio mixing etc
… but in general TTML is a TT format... I tried to do an AD variant, but it did not get as broad uptake
… so DAPT is AD plus mainstream dubbing as use cases
… and other uses
… thinking of production workflows, video will be commissioned and produced... Loc and AD comes later as a second step
… usually need a transcript.... for SDH subtitles or localized translation subtitles
… cyril said these processes are sometimes too removed, and the dubbing plus translation can be mismatched
… trying to convince content producers to move the transcription process earlier in the chain
… a lot of the service providers use proprietary tools

<Zakim> jcraig, you wanted to mention FCC DAC report

jcraig: I can share afterwards - I'm Apple's rep to the FCC disability advisory committee
… and worked on a report for the commission with other people, which is public, I'll share the link later,
… which is effectively guidelines and recommendations for broadcasters and streamers for how
… to do exactly this, and which specific resources should be deployed widely with the original video.
… A lot of the time the contracts for production do not include the accessible alternatives, for subtitles,
… descriptions, translations etc.
… So then when the content goes to cable providers etc., the recommendations talk about this
… particular item, the audio description transcript and ideally timing, as well as the dialogue,
… should as much as possible be considered and distributed by the original distributor, to avoid this mismatch.
… Which avoids rework and mismatch when there's already prior work that's been done.
… Redubbing with different transcripts etc. causes those problems.

nigel: Chris intro?

cpn: Chris Needham: BBC, Chair Media WG

nigel: broadly speaking, DAPT useful as a production tool
… for Timed Text, audio, etc

<ray-schwartz9> Need to head to another meeting. Thanks for letting me sit in!

nigel: mostly upstream of something that would go to the client devices, but DAPT could go directly to the player, .. for braille or TTS, local audio mix, etc.
… including pans, levels, etc

nigel: intro?

youenn: Youenn Fablet, Apple

nigel: doc includes examples to help people understand the use cases
… tracking for translation, the current language and original lang ("pivot languages?")
… e.g. Norwegian to Hebrew.... probably passed through English as a pivot language
… so by tracking through this, you may have a better idea of how to avoid or correct mistranslations
… metadata describes characters (the type portrayed by actors) and other info
… metadata could differentiate visual description vs transliteration of text rendered visually on screen... (time or location chyrons, as an example)
… [scanning through the document]... showing timed text example of AD... along with mixing data
… also can include prerecorded audio
… [showing Gain attribute data]
… result is that it ducks the main program audio while AD mix is played, and re-raises the gain after the "ducking"
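The ducking behaviour described above can be sketched as a gain envelope for the main programme audio. This is a minimal sketch under stated assumptions, not the DAPT or TTML2 processing model: `main_programme_gain` is a hypothetical function, the ramps are assumed to be linear, and the default fade time and ducked level are illustrative values only.

```python
def main_programme_gain(
    t: float,
    ad_start: float,
    ad_end: float,
    fade: float = 0.3,    # assumed ramp duration in seconds (illustrative)
    ducked: float = 0.4,  # assumed linear gain while the AD plays (illustrative)
) -> float:
    """Return the main programme gain (1.0 = unchanged) at time t, in seconds.

    The gain ramps down so that it reaches `ducked` exactly when the
    description starts, holds while the description plays, and ramps back
    up to 1.0 after the description ends.
    """
    if fade <= 0:
        raise ValueError("fade must be positive")
    if t <= ad_start - fade or t >= ad_end + fade:
        return 1.0
    if t < ad_start:
        # Ramp down from 1.0 to the ducked level.
        progress = (t - (ad_start - fade)) / fade
        return 1.0 + (ducked - 1.0) * progress
    if t <= ad_end:
        # Hold the ducked level for the duration of the description.
        return ducked
    # Ramp back up from the ducked level to 1.0.
    progress = (t - ad_end) / fade
    return ducked + (1.0 - ducked) * progress
```

A player mixing locally could sample this envelope per audio block and multiply the main programme samples by the result, leaving the AD audio at its own authored level.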

<Zakim> jcraig, you wanted to ask about ducking prefs

jcraig: Screen readers often have a setting for ducking audio, not used when there's pre-mixed audio
… Is there more data here than just the gain, like a context, like "this is a ducking transition",
… because that would potentially allow the user preference for ducking.
… Is there semantic information about why the transitions are happening?

cyril: I don't think we thought about that use case, semantic signalling,
… but I see that it could easily be added - TTML is easily extended, either in the spec or with
… proprietary information.

jcraig: Talking about sub-channel audio for a baseball game earlier, some people might want
… to hear that in the same room as others who do not.
… That mixed data could be deployed to a different channel or speaker.
… That type of semantic metadata could also apply.

rwarren2: A friend who has gone blind late in his life: enjoys baseball,
… but now it's not on the radio, there's a change in the announcement style
… It's frustrating because they no longer know what the action is, because the assumption
… is that you can see what's happening.

jcraig: Anecdotally, I have a lot of blind friends into baseball, who would like that. My assumption is because the positions don't move, and you can build a mental picture based on action that is described well, like it used to be on radio.

nigel: an Irish commission researched appropriate ducking levels based on how loud the program audio is, how much to duck, and how loud the AD should be.

Investigating a Standardised Approach to Setting Audio Description Dip Values

nigel: so that the background programme sound does not drown out the description
… so "one size" does not "fit all" when it comes to audio ducking

<Zakim> nigel, you wanted to react to jcraig to answer that

nigel: anecdotal data point: when I visited VRT in Belgium, they would hand-tweak the gain to un-duck relevant noises ("door opening") during AD dialog, to improve understandability

wschi: how do you stream the XML?

nigel: could be one big file...
… Or MPEG-DASH, HLS, etc.

<wschi> ST2110-43#

cyril: RTP payload ST2110-43

wschi: could be very high bitrate?

cyril: might be similar to a lower bitrate for voice-only (not full mix)

nigel: which options would we need?

jcraig: saw one anti-pattern with a streamer who deployed Dolby Atmos, but the AD track was flattened and mixed down to stereo

Nigel: was on AD examples... There are also Dubbing examples

cyril: focus on AD to ask for feedback?

nigel: structure includes data model separate from the TTML
… recording or synthesized, with optional mix data
… within the spec, each class or object type is described... no need to have a full understanding of TTML to understand it.

cyril: request feedback on AD... are there use cases not included? identifying gaps, etc?

wschi: very expressive about audio features... are there interactive (user pref) controls about how that mix would work?

jcraig: Games are often very customisable, different sliders for different game sound effects.
… Even tweaks for things that might be considered triggers or scare warnings,
… that level of distinction.
… All custom, but deployed because users are asking for those features.

nigel: implementations... authoring... conversion tools, etc
… expecting more activity in order to meet the goals of the community need

wschi: re: deployment, is NGA not there yet?

nigel: not dependent on the format...

nigel: perhaps URI or fragment id for this?

cyril: I don't think there is a standard in ISOBMFF? to spec a subtrack of a subtrack?

jernoble: for HLS, there are variants , not really tracks...

Nigel: tech discussion should continue into the hallway

cyril: please review and provide feedback

nigel: also discussing related topics tomorrow
… hope to get to CR soon

jcraig: The FCC Disability Advisory Committee (DAC) report on "Audio Description File Transmittal for Internet Protocol Delivered Video Programming" https://www.fcc.gov/ecfs/document/10208388924441/1
… Word/PDF linked from under the Recommendations heading: https://www.fcc.gov/audio-description
… most relevant, the section at the end on "Potential Opportunities in the Audio Description Ecosystem for Participants and the
… Commission" covers recommendations like:
… - Encourage vendors to provide and content creators to request AD scripts with timestamps in addition to the AD audio files.
… - Encourage vendors to deliver these unmixed [AD] audio files to stakeholders.

Meeting Close

nigel: Thank you everyone, very interesting discussion points, we're out of time [adjourns meeting]