Meeting minutes
Introductions
Nigel: Nigel Megitt, BBC, chair ADCG, co-chair TTWG, one of the Editors of DAPT
ray-schwartz9: Ray Schwartz, he/him, NFCU, member of ARIA
gabriel: eng on MS Edge, part of Web Audio
atsushi: w3c contact TTWG
niko: Nikolas Fairburn, Media and Entertainment Interest Group
bernd: member of Media and Entertainment Interest Group, and WICG
jcraig: James Craig, Apple Accessibility, member of TTWG, interested in audio descriptions, most active in ARIA
Adam_Page: Hilton Accessibility, ARIA WG
cyril: Netflix, TTWG Co-editor
<atsushi> Hiroshi_Ohta: from LINE Yahoo Corp.
reinaldoferraz: Reinaldo Ferraz, W3C chapter Sao Paulo, observer
sprang: Google Meet, Observer
<reinaldoferraz> NIC.br
wschi: Weiwei Xu, Huawei, Media Standard Department
nigel: intros ahead of schedule
… any other agenda topics?
Agenda
DAPT profiles in TTML
Authoring and production workflows for Loc and AD
no other agenda items
cyril: have others deployed AD in recent deployments
… MPEG-H, coding audio descriptions, etc
wschi: used in broadcast and to mix them in encode/decode
… mixing not supported in browser yet
cyril: used in VOD
Robert Warren: niko agriculture, interest in the humanities
jcraig: To Cyril's question, Apple has a number of different audio description features
… as a streaming service provider with Apple TV+. I have no part in it but am very proud of the work
… we do with AD and captions. Most of the Apple Original content has 9 AD languages and 40 caption tracks.
… On the product side, there are a number of features related to AD and captions that most people
… are not aware of. For example if you are blind and have a screen reader you can choose to have
… captions Brailled or spoken live (for translation).
… Experimented with something similar for audio descriptions.
… Eric Carlson and I demoed that 2 years ago in TPAC.
… Take AD track type of the web <video> element, parsing it on the fly and either
… speaking it or Brailling it. Silent descriptions sent live to the Braille display.
… Someone who is deafblind could enjoy them widely.
… Not deployed widely, just a tech demo
… Love to get more interest in it, add the Braille support.
… That demo was a custom implementation of WebVTT in the video player
Adam_Page: Hilton deploying more AD to the website
Adam_Page: Another data point. At Hilton, not a big video platform,
… baking the audio track in
jcraig: Second track is standard way to deploy AD now, with a tag saying it's AD, to support
… auto selection
… Technically just a standard audio track
Adam_Page: user chooses the preferred track
… most require extended.
jcraig: One of the things was descriptions longer than the natural gap in the audio, e.g.
… extended descriptions, we demoed auto pause of video in the player when that happened.
… Have not seen a lot of deployments of that.
… I think WGBH has some demos of extended description
jcraig: demo in Vancouver was extended lecture paused to describe a chart
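The auto-pause behaviour described for extended descriptions can be illustrated with a minimal Python sketch. It assumes the player knows the start and end of the natural gap in the programme audio and the duration of the description. The function name and parameters are hypothetical and are not taken from any player or specification mentioned in the discussion.

```python
def extended_description_pause(gap_start: float, gap_end: float,
                               description_duration: float) -> float:
    """Return how long (in seconds) the video would need to pause so that
    a description fits, given the natural gap in the programme audio.

    A result of 0.0 means the description fits in the gap and no pause
    is needed. All names here are illustrative assumptions.
    """
    if gap_end < gap_start:
        raise ValueError("gap_end must not be earlier than gap_start")
    if description_duration < 0:
        raise ValueError("description_duration must not be negative")
    gap = gap_end - gap_start
    # Pause only for the part of the description that overruns the gap.
    return max(0.0, description_duration - gap)
```

For example, a 5 second description placed in a 2 second gap (`extended_description_pause(10.0, 12.0, 5.0)`) would need a 3 second pause, while the same description in a 10 second gap would need none.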
nigel: BBC deploys a choice between a pre-mixed audio track with AD and one without
… also deploying a dry AD track (not mixed with main audio) plus mix data
… DVB-T is widely deployed and supported in the UK
… transport stream is specified in the UK's "d-book"
cyril: UK-specific technology?
nigel: yes, since 2006 or so
<rwarren2> "Widely available for a large amount of money" ;)
nigel: That's the broadcast standard
… online we deploy separate video files like Hilton
… lately starting to deploy Live descriptions... timing is an artform.. the describers research ahead of time
cyril: is the describer local or remote?
nigel: either... third-party service
Adam_Page: English only?
nigel: yes
wschi: thinking of replicating the live AD use case into browsers as well?
nigel: 3-4 yrs ago, demoed TTML2 with live mix instructions in the browser
… could be used for live broadcast, too
… tech demo can mix two audio tracks using mix data (well) and/or, less well, mix with text-to-speech (generated speech synthesis)
… could deploy as sADM, or we may deploy as a custom implementation in the BBC player
niko: NGA can include AD, and spatial position to separate Object-based audio
Object-based audio
bernd: demo and discussed in the Media and Entertainment Interest Group this past Monday....
eric_carlson: WebKit eng at Apple, inc TT
jernoble: Jer Noble... WebKit Engineer at Apple, and TT
cyril: how widely deployed is AD around the world?
… Are there countries with no AD? distribution?
nigel: BBC audio describes over 20% of our programmes, regulatory requirement is 10%.
… other countries do some percentage
jcraig: smaller percentages
cyril: is AD deployed widely in Japan?
Hiroshi_Ohta: audio subchannels are popular in Japan... for example, background data about baseball players during games.... Not as widely deployed for AD for the Blind
nigel: most recent Olympic Games in Japan included additional data (NHK?) on the subchannel generating AD about the scores
??: not sure of which subchannels are auto-generated or not?
nigel: one development in AD that has been gaining in use is synthesized voices
… there is an advocacy group in UK that has been running user test experiments
… Royal National Institute for the Blind (RNIB)
Nigel: new attendee?
dana: Hi!. I'm Dana. I work on WebKit.
jcraig: Deployments in Japan: more common with streaming services.
… Japanese is one of the languages that Apple Original content localises to
… HBO is starting to ramp up as well, and starting to lead the way with signed / PiP / ASL movies,
… being deployed as separate video files because there isn't a way to compose the dry components
… and keep them in sync
nigel: on the tangent of sign interpretation, there is a new regulatory requirement in Spain
… 3cat (Catalonian broadcaster) recently demoed an HbbTV receiver?
… Got the signing stream over IP and recomposited, implementation in WASM
… I think the resolution of the signed video was lower than the main broadcast video
jcraig: resolution does not need to be as high, but high framerate is critical with sign language... easy to lose context with dropped frames
DAPT
nigel: this spec has origins going back a few years
… TTML2 could trigger audio playback, pitch, etc, audio mixing etc
… but in general TTML is a TT format... I tried to do an AD variant, but it did not get as broad uptake
… so DAPT is AD plus mainstream dubbing as use cases
… and other uses
… thinking of production workflows, video will be commissioned and produced... Loc and AD comes later as a second step
… usually need a transcript.... for SDH subtitles or localized translation subtitles
… cyril said these processes are sometimes too removed, and the dubbing plus translation can be mismatched
… trying to convince content producers to move the transcription process earlier in the chain
… a lot of the service providers use proprietary tools
<Zakim> jcraig, you wanted to mention FCC DAC report
jcraig: I can share afterwards - I'm Apple's rep to the FCC disability advisory committee
… and worked on a report for the commission with other people, which is public, I'll share the link later,
… which is effectively guidelines and recommendations for broadcasters and streamers for how
… to do exactly this, and which specific resources should be deployed widely with the original video.
… A lot of time the contracts for production do not include the accessible alternatives, for subtitles,
… descriptions, translations etc.
… So then when the content goes to cable providers etc., the recommendations talk about this
… particular item, the audio description transcript and ideally timing, as well as the dialogue,
… should as much as possible be considered and distributed by the original distributor, to avoid this mismatch.
… Which avoids rework and mismatch when there's already prior work that's been done.
… Redub with different transcripts etc cause those problems.
nigel: Chris intro?
cpn: Chris Needham: BBC, Chair Media WG
nigel: broadly speaking, DAPT useful as a production tool
… for Timed Text, audio, etc
<ray-schwartz9> Need to head to another meeting. Thanks for letting me sit in!
nigel: mostly upstream of something that would go to the client devices, but DAPT could go directly to the player, for braille or TTS, local audio mix, etc.
… including pans, levels, etc
nigel: intro?
youenn: Youenn Fablet, Apple
nigel: doc includes examples to help people understand the use cases
… tracking for translation, the current language and original lang ("pivot languages?")
… ex: Norwegian to Hebrew.... probably passed through English as a pivot language
… so by tracking through this, you may have a better idea of how to avoid or correct mistranslations
… metadata describes characters (the type portrayed by actors) and other info
… metadata could differentiate visual description vs transliteration of text rendered visually on screen... (time or location chyrons, as an example)
… [scanning through the document]... showing timed text example of AD... along with mixing data
… also can include prerecorded audio
… [showing Gain attribute data]
… result is that it ducks the main program audio while AD mix is played, and re-raises the gain after the "ducking"
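The ducking behaviour described above can be sketched in Python as a gain envelope for the main programme audio. This is a minimal sketch under stated assumptions: linear fades of equal length at each end of the window, and linear (not decibel) gain values. The class, function and field names, and the example value 0.39, are hypothetical and are not taken from DAPT or TTML2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuckingWindow:
    """Hypothetical description of one ducking interval (times in seconds)."""
    start: float        # when the fade-down of the main audio begins
    end: float          # when the fade-up of the main audio has completed
    ramp: float         # duration of each fade
    ducked_gain: float  # linear gain of the main audio while AD plays

    def __post_init__(self) -> None:
        if self.ramp <= 0:
            raise ValueError("ramp must be positive")
        if self.end - self.start < 2 * self.ramp:
            raise ValueError("window too short for both fades")


def main_gain_at(t: float, window: DuckingWindow,
                 normal_gain: float = 1.0) -> float:
    """Return the main programme gain at time t, assuming linear fades."""
    if t <= window.start or t >= window.end:
        return normal_gain
    fade_down_end = window.start + window.ramp
    fade_up_start = window.end - window.ramp
    if t < fade_down_end:
        # Fading down from normal_gain towards ducked_gain.
        progress = (t - window.start) / window.ramp
        return normal_gain + (window.ducked_gain - normal_gain) * progress
    if t > fade_up_start:
        # Fading back up from ducked_gain towards normal_gain.
        progress = (t - fade_up_start) / window.ramp
        return window.ducked_gain + (normal_gain - window.ducked_gain) * progress
    # Fully ducked while the description audio plays.
    return window.ducked_gain
```

With `DuckingWindow(start=0.0, end=3.0, ramp=0.3, ducked_gain=0.39)`, the main audio is at full gain before 0 s, held at 0.39 between 0.3 s and 2.7 s, and back at full gain from 3 s onwards.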
<Zakim> jcraig, you wanted to ask about ducking prefs
jcraig: Screen readers often have a setting for ducking audio, not used when there's pre-mixed audio
… Is there more data here than just the gain, like a context, like "this is a ducking transition",
… because that would potentially allow the user preference for ducking.
… Is there semantic information about why the transitions are happening?
cyril: I don't think we thought about that use case, semantic signalling,
… but I see that it could easily be added - TTML is easily extended, either in the spec or with
… proprietary information.
jcraig: Talking about sub-channel audio for a baseball game earlier, some people might want
… to hear that in the same room as others who do not.
… That mixed data could be deployed to a different channel or speaker.
… That type of semantic metadata could also apply.
rwarren2: A friend who has gone blind late in his life: enjoys baseball,
… but now it's not on the radio, there's a change in the announcement style
… It's frustrating because they no longer know what the action is, because the assumption
… is that you can see what's happening.
jcraig: Anecdotally, I have a lot of blind friends into baseball, who would like that. My assumption is that it's because the positions don't move, and you can build a mental picture based on action that is described well, like it used to be on radio.
nigel: An Irish commission researched appropriate ducking levels based on how loud the program audio is, how much to duck, and how loud the AD should be.
Investigating a Standardised Approach to Setting Audio Description Dip Values
nigel: so that the background programme sound does not drown out the description
… so "one size" does not "fit all" when it comes to audio ducking
<Zakim> nigel, you wanted to react to jcraig to answer that
nigel: anecdotal data point: when I visited VRT in Belgium, they would hand-tweak gain to allow un-ducking of relevant noise ("door opening") during AD dialog, to improve understandability
wschi: how do you stream the XML?
nigel: could be one big file...
… Or MPEG-DASH, HLS, etc.
<wschi> ST2110-43
cyril: RTP payload ST2110-43
wschi: could be very high bitrate?
cyril: might be similar to a lower bitrate for voice-only (not full mix)
nigel: which options would we need?
jcraig: saw one anti-pattern with a streamer who deployed Dolby Atmos, but the AD track was flattened and mixed down to stereo
Nigel: was on AD examples... There are also dubbing examples
cyril: focus on AD to ask for feedback?
nigel: structure includes data model separate from the TTML
… recording or synthesized, with optional mix data
… within the spec, each class or object type is described... no need to have a full understanding of TTML to understand it.
cyril: request feedback on AD... are there use cases not included? identifying gaps, etc?
wschi: very expressive about audio features... are there interactive (user pref) controls about how that mix would work?
jcraig: Games are often very customisable, different sliders for different game sound effects.
… Even tweaks for things that might be considered triggers or scare warnings,
… that level of distinction.
… All custom, but deployed because users are asking for those features.
nigel: implementations... authoring... conversion tools, etc
… expecting more activity in order to meet the goals of the community need
wschi: re: deployment, is NGA not there yet?
nigel: not dependent on the format...
nigel: perhaps URI or fragment id for this?
cyril: I don't think there is a standard in ISOBMFF? to spec a subtrack of a subtrack?
jernoble: for HLS, there are variants, not really tracks...
Nigel: tech discussion should continue into the hallway
cyril: please review and provide feedback
nigel: also discussing related topics tomorrow
… hope to get to CR soon
jcraig: The FCC Disability Advisory Committee (DAC) report on "Audio Description File Transmittal for Internet Protocol Delivered Video Programming" https://
… Word/PDF linked from under the Recommendations heading: https://
… most relevant, the section at the end on "Potential Opportunities in the Audio Description Ecosystem for Participants and the
… Commission" covers recommendations like:
… - Encourage vendors to provide and content creators to request AD scripts with timestamps in addition to the AD audio files.
… - Encourage vendors to deliver these unmixed [AD] audio files to stakeholders.
Meeting Close
nigel: Thank you everyone, very interesting discussion points, we're out of time [adjourns meeting]