14:04:09 RRSAgent has joined #audio-descriptions
14:04:09 logging to https://www.w3.org/2022/09/14-audio-descriptions-irc
14:05:23 Meeting: VTT-based Audio Descriptions for Media Accessibility - TPAC 2022 breakout
14:05:25 Chair: Eric_Carlson, James_Craig
14:05:27 Agenda: https://www.w3.org/events/meetings/7bd857cf-f322-4ab7-9430-3a28eae12de4#agenda
14:05:29 RRSAgent, make log public
14:05:31 RRSAgent, this meeting spans midnight
16:12:11 agenda+ breakout
23:25:58 present+ Léonie (tink)
23:35:40 present+ dsinger
23:35:47 present+ Gary_Katsevman
23:36:29 Present+ Nigel_Megitt
23:36:50 scribe: cyril_
23:37:40 jcraig: defining audio descriptions: similar to how closed captions work for the deaf, but for the blind
23:37:52 ... AD are alternate or additive audio tracks
23:37:56 ... for low-vision users
23:38:03 ... old-time radio shows used to do that
23:38:24 ... Apple has experience in this area
23:38:35 ... [screen shot of all the audio choices and caption choices]
23:38:59 ... We've got some experience in this area. Apple isn't just a provider of the playback hardware and software, it's also a major content distributor, and Apple TV+ original content leads the industry in support for international audio description and caption choices. This is a screen shot of the audio and caption or subtitle choices of Ted Lasso Season 1 Episode 1. There are 10 spoken languages, and 10 additional AD versions in each of those languages. 20 audio tracks in total. There are also 40 caption (SDH VTT) tracks including those 10 + 30 other languages.
23:39:57 1. Recorded audio: full mix (composited in studio) or dry mix (composited on device; new possibilities may be enabled by the Dolby AC4 "accessibility descriptor")
23:40:07 ... among the types of AD
23:40:29 ... there is a new type of Dolby AC4 with tagging, accessibility descriptors
23:40:38 James/Apple should be proud of the 20+ languages, with audio description for each, for the Ted Lasso production on Apple TV.
23:40:47 ... could allow repositioning of the audio description to the headphones or near the ears
23:41:01 ... use cases are posted on the Dolby site
23:41:11 2. Generated text-to-speech-based audio. (Will mention pros/cons later)
23:41:21 ... we are going to talk about the 2nd type, which is text to speech
23:41:28 ... we will talk about pros/cons
23:41:44 3. Text-to-braille descriptions w/ or w/o audio. No known implementations, but we hope to change that. Wanted to call this out b/c "audio descriptions" doesn't necessarily mean the end format will be audio.
23:41:53 ... another type is text to braille
23:42:14 ... it could be used for people who are deaf and blind as well
23:42:38 ... who is familiar with Extended Descriptions?
23:42:44 ... they pause the media playback
23:42:59 ... it can be awkward and we are looking for feedback
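Most of the meeting focuses on the second type: timed text descriptions delivered as a WebVTT track and synthesized on the client. As a minimal, illustrative sketch only (the file name and the example cue are hypothetical, not from the meeting), such a track is attached to a video like this:

```ts
// Illustrative sketch: attach a WebVTT description track to a video element.
// The file name is hypothetical; its cues carry plain text that is meant to be
// spoken (or brailled) rather than rendered on screen, e.g.
//   WEBVTT
//
//   00:01:05.000 --> 00:01:08.000
//   A lecturer writes a long formula on the chalkboard.
const video = document.querySelector("video") as HTMLVideoElement;

const track = document.createElement("track");
track.kind = "descriptions";          // long-standing HTML text track kind, rarely used until now
track.label = "English descriptions";
track.srclang = "en";
track.src = "descriptions.en.vtt";
video.appendChild(track);
```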
23:43:15 ... there are cases where it is necessary and some others where it's not
23:43:22 ... 2 implementations in Chromium and WebKit
23:43:43 Descriptions that pause the media playback. Appropriate in some circumstances, and less desirable in others. For example in most entertainment contexts (e.g. movies and TV shows) extended descriptions are often considered undesirable. In other contexts, the utility of extended descriptions may be necessary. We'll talk more about this in the demo.
23:43:44 eric: a WebVTT track has an attribute that describes the kind of cues in it
23:44:01 ... it has always had a kind of "description", but it was never used
23:44:04 ... the concept is simple
23:44:18 ... in the web engine we enable the caption track, and where we would render the cues to the screen
23:44:27 ... instead we pass the text to the voice synthesis engine
23:44:42 ... and like other types of captions, in the caption files there is a start time and an end time
23:45:03 ... in this experimental implementation that I added to WebKit
23:45:06 ... there are 2 modes
23:45:12 ... a cue would start at its start time
23:45:26 ... but if the utterance is longer than the cue's duration, we don't stop the audio
23:45:36 ... it may overlap the programme audio
23:45:48 ... if another cue starts, the previous cue is stopped
23:46:05 cyril_: is the demo public?
23:46:12 eric: no, it's not. I don't own the content
23:46:38 ... the descriptions were carefully authored to fit the gaps
23:46:52 [demo]
23:49:00 ... second demo of a lecturer drawing on a chalkboard
23:49:26 ... we have not yet enabled Extended Descriptions
23:49:32 ... so you'll hear overlap
23:49:59 ... obviously not a good user experience
23:50:14 ... we duck the volume of the video a bit to let the description be louder
23:50:22 q+ to ask how you decide when and by how much to duck the video audio
23:50:36 ... let's turn on Extended Descriptions now
23:51:15 ... [video is paused until the description is fully spoken]
23:51:28 That's a much better user experience.
23:51:31 ... that makes it more understandable
23:51:55 ... however in a movie trailer, where the descriptions were not authored carefully enough
23:52:34 The one with the professor documenting a formula on a chalkboard along with extended audio description that pauses the lecture.
23:52:55 ... in that case, it can be really disruptive to the experience
23:53:21 ... one of the questions that we have is: how can we let the user control that?
23:53:41 jcraig: is this just a content problem or do we need more user control at playback?
23:54:07 ... the lectures and technical discussions work well but entertainment media does not
23:54:15 ... turning a 2h movie into 3
23:54:36 JohnRochford: I'm legally blind and love audio descriptions
23:54:55 ... maybe we don't want to see a 3h movie, but I would want a lecture as was demonstrated
23:55:30 jcraig: VTT already works in the browsers
23:55:44 ... it was not as challenging as expected
23:56:00 ... another pro is that it's easy to author (but that's a con also)
23:56:10 ... it's easily internationalized
23:56:20 ... VTT could be used to augment voiced audio descriptions
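A rough sketch of the mechanism Eric describes, using the Web Speech API: the track is enabled without on-screen rendering, a newly active cue cancels the previous utterance, and the programme audio is ducked while the description is spoken. This illustrates the general approach only, not the WebKit code; the ducking amount and the single-cue assumption are arbitrary.

```ts
// Sketch of the approach described above: instead of rendering description cues,
// hand their text to the speech synthesizer and duck the programme audio a little.
// Not the WebKit implementation; the numbers and selection logic are illustrative.
const video = document.querySelector("video") as HTMLVideoElement;
const descriptions = Array.from(video.textTracks).find(t => t.kind === "descriptions");

if (descriptions) {
  descriptions.mode = "hidden";        // fire cue events without drawing cues on screen
  const normalVolume = video.volume;   // captured once so repeated ducking doesn't drift

  descriptions.addEventListener("cuechange", () => {
    const cue = descriptions.activeCues?.[0] as VTTCue | undefined;
    if (!cue) return;                  // a cue just ended: let a long utterance keep going

    speechSynthesis.cancel();          // a newly active cue stops the previous utterance
    const utterance = new SpeechSynthesisUtterance(cue.text);
    utterance.onend = () => { video.volume = normalVolume; };

    video.volume = normalVolume * 0.5; // duck the programme audio (amount is arbitrary)
    speechSynthesis.speak(utterance);
  });
}
```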
23:56:35 ... for hybrid voiced and brailled output
23:57:12 ... it's much more scalable and cheaper, but quality control is a concern
23:57:27 ... another pro is that most streamed media use manifest formats
23:57:40 ... there is a content negotiation happening
23:57:49 ... you're only going to get the bits you need
23:58:10 ... whereas if it's packaged (EPUB, ...), all the audio tracks increase the file size
23:58:30 ... this would allow the assistive technology data to not increase the file size
23:58:49 ... on to the cons: generated text to speech does not sound as good as studio-recorded audio
23:59:02 ... most of the blind community prefers the recorded version if available
23:59:20 ... you can screw up internationalization by just translating and keeping the same timing
23:59:35 ... some content providers provide low-quality TTS descriptions
23:59:45 ... it's difficult to know how long an utterance will take
23:59:53 ... different users have different preferences for speed
00:00:01 ... so sometimes the timing might not work
00:00:08 ... should we clip, compress, etc.?
00:00:20 ... what happens if it's authored the wrong way?
00:00:48 ... this does not yet work with spoken captions
00:01:02 ... Apple shipped that on tvOS and iOS several years ago
00:01:26 ... where, for deaf/blind users in a country where the language is not your primary language
00:01:39 ... you could ask the system to speak the subtitles
00:02:05 nigel: it's commonly used in Europe, called spoken subtitles
00:02:33 jcraig: in that case we have issues combining spoken captions and text-based descriptions
00:03:06 ... the most important con is that there is no way to support Extended Descriptions in live streams
00:03:22 ... you could author a VTT file that would pause the live stream
00:03:34 eric: in the current implementation, only one description track can be active at a time
00:03:40 ... it's technically possible to have more than one
00:03:52 ... this way was more straightforward to implement
00:03:56 q+ to say did you say speech rate could be controlled by user?
00:03:56 ... we are looking for feedback
00:04:38 JohnRochford: the first 2 videos that you showed, were they exemplifying the same thing?
00:04:58 eric: the first 2 were to demonstrate that it can work well with 2 pieces of content with no need to pause
00:05:17 ... the first showing of the lecture was to show that standard descriptions were not working
00:05:28 ... and the second showing was to demonstrate how they could work
00:05:43 JohnRochford: I have use cases for multiple tracks
00:05:48 ... for example to learn a language
00:05:57 eric: that's a good point
00:06:25 ... do you know what we would want to do in the case where both tracks have cues that should be active at the same time?
00:06:33 ... speak one utterance and then the next one?
00:06:44 JohnRochford: it would seem to me that it could be a user preference
00:06:52 ... simultaneous or sequential
00:07:05 ... a lot of immigrants in the US watch Sesame Street
00:07:10 nigel, you wanted to ask how you decide when and by how much to duck the video audio
00:07:18 Is this the proposal: https://github.com/WebKit/explainers/tree/main/texttracks ?
00:07:22 nigel: you said that you duck the video/audio
00:07:33 ... that's usually done as part of the audio recording process
00:07:42 ... so how do you decide in the implementation?
00:07:48 eric: this is an experiment
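The "extended descriptions" behaviour demonstrated earlier (pause playback until the description has been spoken, then resume) can be approximated in page script along the lines below. A sketch only, not the shipping WebKit or Chromium behaviour; a real implementation would presumably pause only when the utterance is going to overrun the available gap, which is exactly the timing problem discussed above.

```ts
// Sketch of the "extended descriptions" behaviour demonstrated earlier: pause
// playback while a description is spoken, then resume. Illustrative only; the
// engines' actual behaviour may differ (e.g. pausing only when the utterance
// would overrun the gap before the next dialogue).
function speakExtendedDescription(video: HTMLVideoElement, text: string): void {
  const utterance = new SpeechSynthesisUtterance(text);
  const wasPlaying = !video.paused;

  video.pause();                 // hold the programme until the utterance finishes
  utterance.onend = () => {
    if (wasPlaying) {
      void video.play();         // play() returns a promise; ignored in this sketch
    }
  };
  speechSynthesis.speak(utterance);
}
```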
00:07:59 ... it's hot off the press, so we have not spent a lot of time on it
00:08:15 jcraig: we have a user preference for audio ducking for the screen reader
00:08:42 ... we would probably implement it in a similar way, and not specific to Safari
00:09:06 nigel: the use case for setting the amount by which you duck during authoring is that the program audio loudness varies
00:09:12 ... you don't want to duck by the same amount
00:09:43 jcraig: we could have hints on the ducking
00:09:59 gkatsev: I have a couple of comments
00:10:36 ... 1. as a maintainer of video.js, we've had support for that: exposing descriptions to screen readers and using text to speech
00:10:50 ... Owen Edwards and I worked on that
00:11:01 https://github.com/OwenEdwards/videojs-speak-descriptions-track
00:11:02 ... as a plugin for video.js
00:11:31 ... we ducked the audio to a quarter of what it was
00:11:41 ... but if it would be longer, we would pause
00:11:51 jcraig: is that using the Web Speech API?
00:11:55 gkatsev: yes
00:12:10 ... there is an Elephant's Dream track that you can use publicly
00:12:23 ... and this is awesome that you're doing that
00:12:57 jasonjgw: you could use the maximum speech rate of the user
00:13:12 ... you could also dynamically vary the speech rate up to the user's maximum level
00:13:19 ... you could also pause if this does not work
00:13:56 jcraig: we did talk about compression
00:14:05 ... that is, changing the speech rate
00:14:17 ... but what do you mean by the user's highest acceptable speech rate?
00:14:38 jasonjgw: the user is going to have a comfort range
00:14:47 ... and you don't want to exceed that
00:15:07 jcraig: so you're proposing to expose a new user preference
00:15:54 jasonjgw: my second question is that I have started to do spatial audio
00:16:05 ... and it could help with having 2 audio tracks
00:16:05 great idea
00:16:29 ... reading described video in braille in real time could be quite a challenging process
00:16:41 ... I'm wondering what the user interface could be?
00:17:48 jcraig: there are braille displays that are more like a full page of braille than a line of braille
00:18:00 ... we could have a scrolled buffer that's moving up
00:18:17 ... it might depend on the content type
00:18:33 jasonjgw: the 2D braille displays are interesting
00:18:40 ... they might speed up the reading process
00:18:46 ... but it might not be enough
00:19:14 tink: 1. what type of AD might be wanted
00:19:18 ... the context will answer
00:19:31 ... with entertainment, we tend to leave the AD on
00:19:44 ... but if I'm watching on my own, I might want Extended AD
00:20:00 ... with lectures, XAD would be extremely useful
00:20:12 jcraig: do we need a different "kind" value for XAD?
00:20:19 ... you just mentioned different contexts
00:20:47 tink: the question was more about the mechanics of it
00:21:01 ... isn't the time the time between the captioning cues?
00:21:17 q+ to say that the time expressed in the cues should not be the time needed, but the time it applies to
00:21:20 jcraig: it depends on what audio happens in between
00:22:10 take it from the sound effects track?
00:22:23 jcraig: let's say we're about to overrun and pause
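jasonjgw's suggestion (speak at the user's preferred rate, speed up within the user's comfort range when the gap is short, and fall back to pausing only when even the maximum rate will not fit) might look roughly like the sketch below. The preference values and the words-per-second estimate are assumptions, and the preference itself would be new, as jcraig notes above.

```ts
// Sketch of the rate strategy suggested above: prefer the listener's normal rate,
// compress within their comfort range when the cue window is short, and signal a
// pause only when even the maximum acceptable rate will not fit.
// The preference values and the words-per-second estimate are assumptions.
interface SpeechRatePreference {
  preferredRate: number; // e.g. 1.0
  maximumRate: number;   // fastest rate the listener is comfortable with, e.g. 1.6
}

function chooseRate(
  text: string,
  windowSeconds: number,
  pref: SpeechRatePreference,
): { rate: number; needsPause: boolean } {
  const wordsPerSecondAtRate1 = 2.5; // rough average; a real engine would measure, not guess
  const secondsAtRate1 = text.split(/\s+/).length / wordsPerSecondAtRate1;

  const neededRate = secondsAtRate1 / windowSeconds; // slowest rate that still fits the window
  if (neededRate <= pref.preferredRate) return { rate: pref.preferredRate, needsPause: false };
  if (neededRate <= pref.maximumRate) return { rate: neededRate, needsPause: false };
  return { rate: pref.maximumRate, needsPause: true }; // extended behaviour: pause the video
}

// e.g. utterance.rate = chooseRate(cue.text, cue.endTime - cue.startTime, pref).rate;
```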
00:22:41 ... we could detect that through machine learning, speech recognition, or caption data
00:23:21 tink: we could also influence the authoring
00:23:33 dsinger: what is the interval of the video to which this description applies?
00:23:52 ... then it could be a decision of the user to speak over, pause, or change the reading rate
00:24:14 ... but you want to preserve correct timing, so that if the user seeks it works
00:24:23 dsinger, you wanted to say that the time expressed in the cues should not be the time needed, but the time it applies to
00:24:48 jcraig: there are creative uses of audio descriptions; one of the most impressive is to listen to the credits at the end of the movie
00:25:19 ... don't always assume that the time of the audio description is accurate, there is a creative aspect
00:25:26 zcorpan: I'm happy to see this implemented
00:25:46 ... you said earlier that in the current implementation, only one track is used
00:25:55 ... does it apply to AD tracks?
00:26:08 only one description track at a time
00:26:11 eric: only one AD track at a time; you could also have subtitles/captions if you want
00:26:30 zcorpan: a use case I envisage is for deaf/blind users with a braille output
00:27:09 JenniferS: it would be very good if documentation of the various use cases could be created so that the ideation of the solution could progress
00:27:15 ... a lot of people don't do braille
00:27:27 qq+ to mention ADPT's expression of requirements and workflow
00:28:42 nigel, you wanted to react to JenniferS to mention ADPT's expression of requirements and workflow
00:28:54 nigel: there is a thing called the Audio Description Community Group
00:29:07 ... we created a document which is a set of requirements for authoring AD
00:29:54 cyril_: I want to mention the work of the Timed Text WG on DAPT, a profile of TTML for authoring/exchanging AD: https://w3c.github.io/dapt-reqs/#define-audio-mixing-instructions-ad-process-step-4
00:29:58 -> https://w3c.github.io/adpt/#requirements ADPT Requirements
00:31:19 nigel: the BBC has a lot of AD
00:31:27 ... it's created for broadcast
00:31:34 ... so we could not do XAD
00:32:02 JohnRochford: I have a demo that I'll post
00:32:14 ... with applicability to sign language
00:32:39 ... as an extension of what you're doing
00:32:47 ... have an inset, picture-in-picture ASL
00:33:18 http://bit.ly/ASLdemo Users can watch talking-head videos and/or ASL interpreters and/or view closed captioning.
00:33:56 JohnRochford: the deaf community told us that closed captions are not enough because they don't read them well
00:34:18 RRSAgent, draft minutes
00:34:18 I have made the request to generate https://www.w3.org/2022/09/14-audio-descriptions-minutes.html vivien
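Returning to Eric's point above that only one description track is active at a time: in page script the equivalent housekeeping could look like the sketch below. The engines handle this internally; the helper name and its signature are made up for illustration.

```ts
// Sketch of the "only one description track active at a time" rule mentioned above:
// enable the chosen description track and disable the others, leaving caption and
// subtitle tracks alone. Engines do this internally; the helper name is made up.
function activateDescriptionTrack(video: HTMLVideoElement, language: string): void {
  for (const track of Array.from(video.textTracks)) {
    if (track.kind !== "descriptions") continue;  // don't touch captions/subtitles
    track.mode = track.language === language ? "hidden" : "disabled";
  }
}
```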
00:36:39 Chrome implementation: https://chromium-review.googlesource.com/c/chromium/src/+/3810947
00:40:43 jcraig: can you (or someone else) file a whatwg/html issue about a new value?
00:42:49 descriptions vs extended-descriptions
00:47:22 -> https://bbc.github.io/Adhere Adhere demo of AD in TTML2
01:08:31 WebKit patch adding support for standard descriptions: https://github.com/WebKit/WebKit/pull/3486
01:08:58 WebKit patch adding support for extended descriptions: https://github.com/WebKit/WebKit/pull/4129
01:09:41 rrsagent, make minutes
01:09:41 I have made the request to generate https://www.w3.org/2022/09/14-audio-descriptions-minutes.html jcraig
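On the "descriptions vs extended-descriptions" question: if HTML ever grew a separate kind value, authoring might look like the sketch below. To be clear, "extended-descriptions" is only a discussion point in these minutes, not a value defined by HTML today, so this is purely hypothetical.

```ts
// Purely hypothetical: "extended-descriptions" is only a discussion point in these
// minutes, not a kind value defined by HTML. A page might one day offer both
// flavours and let the engine decide whether pausing playback is appropriate.
const video = document.querySelector("video") as HTMLVideoElement;

const descriptionTracks = [
  { kind: "descriptions", src: "descriptions.en.vtt" },       // authored to fit the dialogue gaps
  { kind: "extended-descriptions", src: "extended.en.vtt" },  // hypothetical; may pause playback
];

for (const { kind, src } of descriptionTracks) {
  const track = document.createElement("track");
  track.kind = kind;      // today's engines would treat the unknown value as invalid
  track.srclang = "en";
  track.src = src;
  video.appendChild(track);
}
```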
21:51:58 zcorpan: missed your request for a whatwg issue yesterday. Do you mean a kind attr value for something like extended-description? I don't know whether that's the right solution. If you're asking for a general "consider this problem" issue, I can do that. If something else, let me know. Thanks.
21:52:23 re: "jcraig: can you (or someone else) file a whatwg/html issue about a new value?"
21:59:24 Zakim, bye
21:59:24 leaving. As of this point the attendees have been Léonie (tink), shadi, dsinger, Gary_Katsevman, Nigel_Megitt, JohnRochford, Travis, vivien, zcorpan, mbgower, yonet_, JenniferS
21:59:25 RRSAgent, bye
21:59:25 I see no action items