W3C

Audio Description Community Group

25 Oct 2018

See also: IRC log

Attendees

Present
Nigel_Megitt, Marisa_Demeglio, Eric_Carlson, Andreas_Tai, Masayoshi_Onishi, Matt_Simpson, Mark_Watson, Francois_Beaufort
Regrets
John_Birch
Chair
Nigel
Scribe
nigel

Contents

    Introductions
    Current and future status
    Requirements
    TTML2 in more detail
    Proposed Solution
    Implementation Experience
    Roles, Tools, Timelines, Next Steps
    Discussion and close
    Summary of Action Items
    Summary of Resolutions

<scribe> scribe: nigel

Introductions

Nigel: Welcome everyone to the first face to face meeting of the AD CG.
... Run through of agenda

Slides

Nigel: In the room we have:
... Nigel Megitt (BBC)

marisa: Marisa Demeglio (DAISY consortium), in the Publishing WG and interested in accessibility

ericc: Eric Carlson (Apple), on the Webkit team, mostly working on media in the web, and
... of course very interested in accessibility solutions.

Andreas: Andreas Tai (IRT), mainly work on subtitles and captions and also look at other
... accessibility. Unfortunately not yet resources for dedicating time to this, but interested
... in the status.

onishi: Onishi (NHK), NHK operates 4K and 8K broadcast services, which use TTML. I'd like
... to research use cases for TTML.

Matt: Matt Simpson (Red Bee), Head of Portfolio for Access Services, probably one of the
... biggest producers of audio description by volume for a number of clients around the world.

Nigel: Thank you all

Current and future status

Nigel: AD CG set up earlier in the year, we have a repo, an Editor, and participants.
... Goal: Get to good enough for Rec Track, add to TTWG Charter 1st half 2019

marisa: Timeline for TTML2?

Nigel: TTML2 is in Proposed Rec status, the TTWG is targeting Rec publication on 13th November.
... The AC poll is open until 1st November. Please vote if you haven't already!

Requirements

Nigel: Goal: To create an open standard exchange format to support audio description all the way from scripting to mixing.

ericc: You should look at what 3PlayMedia has.

Nigel: Thanks I will
... Are they delivering accessible text versions of AD?

ericc: Yes, both AD and extended AD, both pre-recorded and synthesised from text, and they have
... a JavaScript-based plug-in that works in modern browsers.

Nigel: That sounds great, I didn't know about that, thank you.

ericc: I haven't played with it much but it seems to work quite well.

marisa: When you talk about an accessible text what makes it accessible?

Nigel: It's delivered as text and the player can present it in an ARIA live region so that
... accessibility tools can pick it up.

marisa: And TTML makes that happen?

Nigel: It needs the player to make it happen.
... Existing Requirements - I published a wiki page of requirements a while back.

AD requirements

Nigel: Those requirements got some feedback which led to changes.
... In particular to relate them to the W3C MAUR requirements, which they align with.

<marisa> https://github.com/w3c/ttml2/wiki/Audio-Description-Requirements

Nigel: Those requirements describe the process that the document needs to support
... but not the specifics of what the document itself needs to support.
... I've done a first pass review, the main body of the spec work would be to validate that
... those TTML2 feature designators are the correct set.

<ericc> https://www.w3.org/community/audio-description/files/2018/10/AD-CG-F2F-2018-10-25.pdf

Nigel: In looking at those requirements I thought there were some constraints to consider.
... Two questions from me:
... 1. Do we ever need to be able to have more than one “description” active at the same time?

Matt: I can't see a reason for needing this - it would have to be a variation of the primary language.
... Multiple localised versions might be needed.
... I imagine that would be a single track per file.
... Yes, interesting thought.

marisa: A variation on a use case: if you have a deaf-blind user who is following the
... captions, they also need the information from the description as well as the captions.

markw: They would have both description and captions available at the same time.

Nigel: Assumptions on my part:
... Separate AD and captions files
... No AD over dialogue so not a significant issue of overlap

marisa: If viewer needs to pause AD to read it on a braille display...

Nigel: My assumption: that would also pause media.

ericc: [nods]

marisa: That's the trickiest use case I can think of

Nigel: Me too

atai: I'm not sure if immersive environments are in scope.
... A European project that IRT is involved with is exploring requirements for AD in 360º videos.
... I'm not sure if they implemented it, but one idea is to have some parts of the AD only
... activated if the user looks in a certain direction, so if this is happening in one document
... then there would be certain AD parts with the same timing but maybe not active at
... the same time.

marisa: Great use case!
... Now a deaf-blind user in a 360º video is the trickiest use case in the world I can think of!

ericc: That means in addition to a time range, in the case of a 360º video you may also
... want to have an additional selector for the viewport in which it is active.

markw: Or the location of the object it is associated with.

atai: This is very similar to the subtitle use case we showed before where you stick
... subtitles to a location. You need the same location information for AD.

markw: The user could have selections about the width of the viewport they want.

Nigel: That's a great use case - can I suggest it's a v2 thing based on the solution for
... subtitles, which we also don't know yet?

atai: I agree the solution for subtitles should apply here. That makes sense, but it would be
... good to discuss it and understand the dependencies.
... I will check with the people working on this. I don't know of any technical group working
... on audio description, so this would be a good forum for working on requirements.
... If they want to contribute something they can post it on the CG reflector.

Nigel: Good plan.
... Summarising, I don't think I've heard any requirement for multiple descriptions to be
... active at the same time, within a single language.
... My next constraint question is:
... Do we need to set media time ranges (clipBegin and clipEnd) on embedded audio?
... TTML2 allows audio to be embedded, but in our implementation work we hit a snag:
... applying media fragment URIs to a data URL is tricky.

ericc: Embedding audio as text is a terrible idea.

markw: Any reason other than the amount of data?

ericc: You have to keep the text and the decoded audio in memory at the same time,
... which is additional overhead.
... Technically it should be straightforward to seek to a point.

marisa: I don't want to implement it!

ericc: It's terrible.

atai: Is it then debatable to leave out this feature of embedded audio?

Nigel: I think so, yes, the result would be that distribution of recorded audio would have
... to be additional files alongside the TTML2 file. That has an asset management impact,
... but it also seems like good practice.
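
A minimal sketch, with a hypothetical file name and times, of how a player might honour clipBegin and clipEnd for audio referenced from the document, using Web Audio to start playback at an offset for the clip's duration:

    // Decode the referenced audio and play only the clipBegin..clipEnd range.
    // The file name and times here are illustrative, not real assets.
    const ctx = new AudioContext();

    async function playClip(url, clipBegin, clipEnd) {
      const response = await fetch(url);
      const encoded = await response.arrayBuffer();
      const buffer = await ctx.decodeAudioData(encoded);
      const source = ctx.createBufferSource();
      source.buffer = buffer;
      source.connect(ctx.destination);
      // start(when, offset, duration): begin now, from clipBegin,
      // for the length of the clip.
      source.start(ctx.currentTime, clipBegin, clipEnd - clipBegin);
    }

    playClip("description-0001.wav", 1.5, 4.0);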

ericc: High level question: I talked with Ken Harenstein who does YouTube captions, last week,
... and he told me about 3PlayMedia. He said that from their research and from talking to
... users of audio descriptions and from talking to 3PlayMedia, it was his understanding that
... many users of audio descriptions prefer speech synthesis to pre-recorded because
... partly it allows them to set the speed like they're used to doing with screen readers
... and it made extended audio descriptions less disruptive because it reduces the likelihood
... of interrupting playback of the main resource. I wonder if you have heard that too, and if
... it is true it seems that there should be information in a spec to help people who create
... these descriptions to make the right kind.

Nigel: TTML2 supports text to speech, and also players can switch off the audio
... and expose the text to screen readers instead to allow the user's screen reader to take
... over.
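
A minimal sketch of that player behaviour, with hypothetical element names and cue text: the description is written into an ARIA live region so that a screen reader or braille display can present it.

    // Expose a description cue as text so assistive technology can pick it up.
    const liveRegion = document.createElement("div");
    liveRegion.setAttribute("aria-live", "assertive");
    liveRegion.setAttribute("aria-atomic", "true");
    document.body.appendChild(liveRegion);

    function presentDescription(text) {
      liveRegion.textContent = text;  // screen readers announce the change
    }

    presentDescription("A red car pulls up outside the house.");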

marisa: I've heard that most screen readers speed up the speech.

markw: I've heard it works better speeding up synthesised speech

marisa: Of course if there's no language support for text to speech then you may still
... need pre-recorded audio.

atai: You may need to know how long the text to speech will take to author the rate correctly.

Nigel: There's a whole other world of pain in terms of distributability of web voices for text to speech.

ericc: I think the requirement is that the player pauses to allow for completion of the
... audio description, so it doesn't matter how long it takes.

marisa: What if you're switching language of AD and some are more verbose than others?

ericc: Yes, as long as the description accurately identifies the section of the media file
... that it describes then it is easy enough for the player to take care of, or at least it is the
... player's responsibility.

markw: The player could do other things like tweaking the playback speed to fit.

ericc: The Web Speech API doesn't allow access to predicting the duration of the speech.
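
A minimal sketch of that pause-and-resume behaviour with the Web Speech API; the selector, rate and cue text are illustrative assumptions:

    // Pause the main media, speak an extended description, and resume only
    // when the utterance finishes, so its duration never needs predicting.
    const video = document.querySelector("video");

    function speakExtendedDescription(text) {
      const utterance = new SpeechSynthesisUtterance(text);
      utterance.rate = 1.2;                  // user-preferred speaking rate
      utterance.onend = () => video.play();  // resume after the description
      video.pause();
      speechSynthesis.speak(utterance);
    }

    speakExtendedDescription("She opens the letter and begins to read.");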

atai: Is player behaviour in scope for this document?

ericc: Absolutely.
... It seems to me that it is because if you don't describe the behaviour of the player you
... are going to get different incompatible or non-interoperable implementations and that
... is an anti-goal.

markw: You want to describe the space of possible player behaviours; we just need to
... provide the information.

ericc: Yes, give guidelines to help implementers do the right thing, and people who create the descriptions.

Nigel: I agree, this is somewhat informative relative to the document format, but for example
... our UX people suggested that users would want to direct AD text to a screen reader
... and switch off audio presentation sometimes, or at least be able to select that.

marisa: Maybe have both audio and braille display to check spellings or do some other text-related processing.

Nigel: Yes
... In terms of user preference for synthesised or pre-recorded speech, one data point
... I learned recently is that the intelligibility of synthesised speech degrades more quickly
... in the presence of ambient sounds than human speech. The reasons are not clear.

markw: Suggests that some users would want to receive the AD in a separate earpiece
... from other audience members watching the same programme.

Matt: I think this is like dubbing vs subtitling, there may be cultural reasons for preferences.
... Our experience is it is harder to automate variable reading rate descriptions, and we find
... that invaluable to squeeze a description into a short period or let it "breathe".
... It's probably down to historical experience.

fbeaufort: I work at Google on the developer relations team.

Nigel: Any other constraints or requirements?

group: [silence]

TTML2 in more detail

Nigel: [slide on Audio Model]
... I just added this to try to explain it, because I've found it can be tricky to get across to developers
... that there is an analogy between HTML/CSS and the audio model in TTML.

markw: Players may or may not do this based on user preference, if for example someone
... is listening on a headset and there's main programme audio in the room the mixing
... preferences might change.
... [slide on the Web Audio Graph]
... This allows the audio mixing to happen with all the options that are needed in general
... in TTML2 - it may be that we only exercise a part of that solution space.
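
A minimal sketch of such a graph, with hypothetical gain values: programme and description audio pass through separate gain nodes, and the programme is ducked while a description plays.

    // Mix programme and description audio with Web Audio, ducking the
    // programme during a description.
    const ctx = new AudioContext();
    const video = document.querySelector("video");

    const programme = ctx.createMediaElementSource(video);
    const programmeGain = ctx.createGain();
    const descriptionGain = ctx.createGain();

    programme.connect(programmeGain).connect(ctx.destination);
    descriptionGain.connect(ctx.destination);

    function playDescription(buffer, when) {
      const source = ctx.createBufferSource();
      source.buffer = buffer;
      source.connect(descriptionGain);
      programmeGain.gain.setValueAtTime(0.3, when);                    // duck
      programmeGain.gain.setValueAtTime(1.0, when + buffer.duration);  // restore
      source.start(when);
    }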

Proposed Solution

Nigel: The solution that I'm proposing is a profile of TTML2
... [slide for Profile of TTML2]

ericc: Also add that a UI should be provided for controlling the speed of audio descriptions

Nigel: Yes
... The other things on this slide we already discussed.
... Is anyone thinking this is a great problem to solve but it should look completely different?

ericc: Is it a goal to define a guide for how this should work in a web browser?

Nigel: The TTML2 features are defined in terms of Web Audio, Web Speech, etc., so yes.
... The mixing might happen server side but the client side mixing options allow for a better
... range of accessible experiences.

ericc: It seems to me that a really detailed guide to implementation would be the most useful thing.
... An explicit goal should be to help producers to create content in the right way, but also
... to help people that want to deliver that to know how to make it available to the people that need it.
... Not distribution, the playback experience.
... Nicely constructed audio descriptions are not useful unless the people that need them are
... able to consume them.

Nigel: [nods]

atai: It might be interesting to identify what is missing to get a good implementation in a browser
... environment.
... It might be interesting to hear how much browser communities are interested in that
... case. A possible way to do this would be to implement a JavaScript polyfill or something.
... I'm not sure how much interest there is in native support.

ericc: Both are extremely useful. I don't know anything about 3PlayMedia but they have
... a JavaScript-based player that uses the text-to-speech API, so we know that it is possible.
... Theirs is a commercial solution. We should have a description of ...
... and as a data point I was at a conference last week about media in the web and this was
... one of the breakouts, audio descriptions and extended audio descriptions.
... It was well attended and people in the room were very interested in coming up with a
... solution that browsers could implement natively.

Nigel: I'd love to be in touch with those people.

Implementation Experience

Nigel: BBC implemented a prototype to support TTML2 Rec track work

BBC implementation

Nigel: The point here is that it is possible to do this with current browser technologies,
... even if there are some minor issues that I should raise as issues, like on Web Speech.
... Question: Any other implementation work, or people who would like to do that at this time?

marisa: I would say no, we don't have the bandwidth but I'm keeping my eye on this for
... the long term. The use cases come up all the time from the APA group. I think it is
... on the horizon, but I can't commit to anything on the same timeline as this spec.

atai: Does BBC plan to publish this software as a reference implementation?

Nigel: I would say first we should publish as open source, and then allow for some
... scrutiny, and if people agree it's at that level then great. I don't think it is now.
... It would need more work.

atai: The question is if the BBC could be motivated to provide it as a reference
... implementation. It would help if you have a complete reference implementation.

Nigel: I would like to, but I don't think the code is good enough yet.
... I'm interested in other implementations too, for example it is possible that some
... participants in AD CG might make authoring tools.

ericc: You should talk to 3Play also.

Nigel: Yes, I will. It'd be great if they would join us here.

Roles, Tools, Timelines, Next Steps

Nigel: In terms of tools, we have a GitHub repo w3c/adpt
... We have the reflector, and EBU has kindly offered to facilitate web meetings with their WebEx.
... [Next steps slide]

atai: Regarding the next steps, to move over to WG and Rec track, does it necessarily have
... to end up in the TTWG? Could it be another group?
... Could it be somewhere else?
... To make sure the right set of people are involved.

Nigel: I'm not dogmatic about this - it seems like the home of TTML is a good place for
... profiles of TTML, but if there's a better chance of getting to Rec doing it somewhere else
... then I don't mind where it happens.

atai: One other idea: when the TTML2 feature set is there it may be useful to have a
... gap listing relative to IMSC 1.1 so that if people want to reuse implementations and
... start from IMSC 1.1 rather than TTML2 then they can see what they already have.

ericc: Or which features they prefer not to use.

Nigel: Because they had implementation difficulty?

ericc: Yes, for example for someone targeting IMSC 1.1 support, if you list the features that
... are only supported in one and not the other, it could inform their decision.

Nigel: Of course the significant features in IMSC are about visual presentation and here
... we are interested in audio features, so the common core of timing is all that's really left.

Discussion and close

Nigel: We've had good discussion all the way through, so thank you everyone.

ericc: Defining this using those TTML2 features is interesting and it's good.
... It sets a fairly high bar to implement.

Nigel: It took a couple of weeks to implement.

ericc: It makes me wonder if it would be possible to have something that is more like a
... minor variation in a caption format.

Nigel: I think that's what this is.

ericc: Except for the ability to embed audio.

Nigel: That maybe took about half a day to implement. We could remove it from scope.

atai: It would be good to know what problems there are bringing this to a browser environment.

ericc: That's true. At its most basic it seems that what we have is some text and a range
... of time in another file that it applies to.

Nigel: I'm thinking of high production values where detailed audio mixing is needed.

ericc: Is that something we need for the web?

Nigel: I am aiming for a single open standard file format that content producers can use
... all the way through from content creation to broadcast and web use.

Matt: I would agree.

markw: Thinking about our chain, we create premixed versions and they seem quite high
... quality, so this might be worth considering.

atai: Thinking about the history of TTML, it started out as an authoring format and then
... began to be used for distribution and playback, which led to IMSC. I understand the
... purpose for one file for the whole chain, that's perfect, it's ideal, we should just avoid the
... pitfalls.

ericc: If the goal is to have native implementation in a browser it may be worth looking
... at the complexity with that goal in mind.
... If it is not a goal then that's fine, but if it is then keep that goal in mind.

Nigel: I am not sure. It can be done with a polyfill, but would browser makers prefer to support
... the primitives to allow that, or to implement it natively?

atai: The playback experience would be better natively.

fbeaufort: If the playback was the same would you still want native implementation?

Nigel: It would be great to avoid sending polyfill js to every page in that case, and it would
... make adoption easier if the page author just had to include a track in the video element
... and then it would play.
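
For illustration only, the page-author experience being described might look like this sketch; the src value is hypothetical and assumes a user agent that loads AD documents natively:

    // Add a descriptions track to the video element and let the user agent
    // present it. Today's browsers generally only load WebVTT via <track>.
    const video = document.querySelector("video");
    const track = document.createElement("track");
    track.kind = "descriptions";
    track.src = "programme-ad.ttml";
    track.srclang = "en";
    video.appendChild(track);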

ericc: Your polyfill is about 50KB of unminified, uncompressed JS so it's not very big.

Nigel: Thank you everyone! [adjourns meeting]

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2018/10/25 12:18:58 $