See also: IRC log
<scribe> scribe: nigel
Nigel: Welcome everyone to the first face to face meeting of the AD CG.
... Run through of agenda
Nigel: In the room we have:
... Nigel Megitt (BBC)
marisa: Marisa Demeglio (DAISY consortium), in the Publishing WG and interested in accessibility
ericc: Eric Carlson (Apple), on the WebKit team, mostly working on media in the web, and
... of course very interested in accessibility solutions.
Andreas: Andreas Tai (IRT), mainly work on subtitles and captions and also look at other
... accessibility. Unfortunately no resources yet for dedicating time to this, but interested
... in the status.
onishi: Onishi (NHK). NHK uses 4K and 8K broadcast services, which use TTML. I'd like
... to research use cases for TTML.
Matt: Matt Simpson (Red Bee), Head of Portfolio for Access Services, probably one of the
... biggest producers of audio description by volume for a number of clients around the world.
Nigel: Thank you all
Nigel: AD CG set up earlier in the year, we have a repo, an Editor, and participants.
... Goal: Get to good enough for Rec Track, add to TTWG Charter 1st half 2019
marisa: Timeline for TTML2?
Nigel: TTML2 is in Proposed Rec status, the TTWG is targeting Rec publication on 13th November.
... The AC poll is open until 1st November. Please vote if you haven't already!
Nigel: Goal: To create an open standard exchange format to support audio description all the way from scripting to mixing.
ericc: You should look at what 3PlayMedia has.
Nigel: Thanks I will
... Are they delivering accessible text versions of AD?
ericc: Yes, both AD and extended, both pre-recorded and synthetic text, and they have
... a javascript based plug-in that works in modern browsers.
Nigel: That sounds great, I didn't know about that, thank you.
ericc: I haven't played with it much but it seems to work quite well.
marisa: When you talk about an accessible text what makes it accessible?
Nigel: It's delivered as text and the player can present it in an ARIA live region so that
... accessibility tools can pick it up.
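As a sketch of the player behaviour Nigel describes, a script might route the active description text into an ARIA live region so screen readers and braille displays announce it. All identifiers below are hypothetical, and the cue shape `{ begin, end, text }` is an assumption, not the TTML2 document model:

```javascript
// Pick the cue active at a given media time, if any.
function activeCue(cues, time) {
  return cues.find(c => time >= c.begin && time < c.end) || null;
}

// Push the active description text into an aria-live region next to the
// video element, so assistive technology picks it up as it changes.
function attachDescriptionRegion(video, cues) {
  const region = document.createElement("div");
  region.setAttribute("aria-live", "polite");
  region.className = "ad-text"; // could be styled off-screen if visible text is unwanted
  video.insertAdjacentElement("afterend", region);

  let last = null;
  video.addEventListener("timeupdate", () => {
    const cue = activeCue(cues, video.currentTime);
    if (cue !== last) {
      region.textContent = cue ? cue.text : "";
      last = cue;
    }
  });
}
```

`aria-live="polite"` lets the screen reader finish its current phrase before announcing the description; a player could offer `assertive` as a user preference.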
marisa: And TTML makes that happen?
Nigel: It needs the player to make it happen.
... Existing Requirements - I published a wiki page of requirements a while back.
Nigel: Those requirements got some feedback which led to changes.
... In particular to relate them to the W3C MAUR requirements, which they align with.
<marisa> https://github.com/w3c/ttml2/wiki/Audio-Description-Requirements
Nigel: Those requirements describe the process that the document needs to support
... but not the specifics of what the document itself needs to support.
... I've done a first pass review, the main body of the spec work would be to validate that
... those TTML2 feature designators are the correct set.
<ericc> https://www.w3.org/community/audio-description/files/2018/10/AD-CG-F2F-2018-10-25.pdf
Nigel: In looking at those requirements I thought there were some constraints to consider.
... Two questions from me:
... 1. Do we ever need to be able to have more than one “description” active at the same time?
Matt: I can't see a reason for needing this - it would have to be a variation of the primary language.
... Multiple localised versions might be needed.
... I imagine that would be a single track per file.
... Yes, interesting thought.
marisa: A variation on a use case, if you have a deaf-blind user who is following the
... captions they also need the information from the description and the captions.
markw: They would have both description and captions available at the same time.
Nigel: Assumptions on my part:
... Separate AD and captions files
... No AD over dialogue so not a significant issue of overlap
marisa: If viewer needs to pause AD to read it on a braille display...
Nigel: My assumption: that would also pause media.
ericc: [nods]
marisa: That's the trickiest use case I can think of
Nigel: Me too
atai: I'm not sure if immersive environments are in scope.
... A European project that IRT is involved with is exploring requirements for AD in 360º videos.
... I'm not sure if they implemented it, but one idea is to have some parts of the AD only
... activated if the user looks in a certain direction, so if this is happening in one document
... then there would be certain AD parts with the same timing but maybe not active at
... the same time.
marisa: Great use case!
... Now a deaf-blind user in a 360º video is the trickiest use case in the world I can think of!
ericc: That means in addition to a time range, in the case of a 360º video you may also
... want to have an additional selector for the viewport in which it is active.
markw: Or the location of the object it is associated with.
atai: This is very similar to the subtitle use case we showed before where you stick
... subtitles to a location. You need the same location information for AD.
markw: The user could have selections about the width of the viewport they want.
Nigel: That's a great use case - can I suggest it's a v2 thing based on the solution for
... subtitles, which we also don't know yet?
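The direction-dependent activation idea discussed above could be sketched as a cue that carries an optional azimuth range alongside its time range. The `azimuth` field and its degree-based form are entirely hypothetical, since no solution for location-anchored subtitles or AD in 360º video exists yet:

```javascript
// A cue is active when the media time is in range AND, if the cue carries a
// (hypothetical) azimuth range in degrees, the viewer's direction falls in it.
function cueActive(cue, time, azimuthDeg) {
  if (time < cue.begin || time >= cue.end) return false;
  if (cue.azimuth === undefined) return true; // not direction-dependent

  const norm = d => ((d % 360) + 360) % 360; // normalise to [0, 360)
  const a = norm(azimuthDeg);
  const lo = norm(cue.azimuth.from);
  const hi = norm(cue.azimuth.to);
  // Handle ranges that wrap through 0º, e.g. from 350º to 20º.
  return lo <= hi ? a >= lo && a <= hi : a >= lo || a <= hi;
}
```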
atai: I agree the solution for subtitles should apply here. That makes sense, but it would be
... good to discuss it and understand the dependencies.
... I will check with the people working on this. I don't know any technical group working
... on audio description so it would be a good forum for working on requirements.
... If they want to contribute something they can post it on the CG reflector.
Nigel: Good plan.
... Summarising, I don't think I've heard any requirement for multiple descriptions to be
... active at the same time, within a single language.
... My next constraint question is:
... Do we need to set media time ranges (clipBegin and clipEnd) on embedded audio?
... TTML2 allows audio to be embedded, but in our implementation work we hit a snag:
... applying media fragment URIs to a data URL is tricky.
ericc: Embedding audio as text is a terrible idea.
markw: Any reason other than the amount of data?
ericc: You have to keep the text and the decoded audio in memory at the same time,
... which is additional overhead.
... Technically it should be straightforward to seek to a point.
marisa: I don't want to implement it!
ericc: It's terrible.
atai: Is it then worth debating whether to leave out this feature of embedded audio?
Nigel: I think so, yes, the result would be that recorded audio would have to be
... distributed as additional files alongside the TTML2 file. That has an asset management impact,
... but it also seems like good practice.
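To illustrate the snag mentioned above: browsers apply Media Fragment time ranges (e.g. `#t=3.5,7.25`) to http(s) media URLs, but not to `data:` URLs, so a player wanting clipBegin/clipEnd semantics on embedded audio would have to parse the range itself and clip the decoded buffer. A minimal sketch, covering only the simple seconds form of the `t` dimension rather than the full Media Fragments grammar:

```javascript
// Parse a trailing #t=begin,end media fragment into seconds.
// Returns { begin, end } (end may be Infinity), or null if absent.
function parseTimeFragment(url) {
  const m = /#t=(?:npt:)?(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$/.exec(url);
  if (!m || (m[1] === undefined && m[2] === undefined)) return null;
  return {
    begin: m[1] !== undefined ? parseFloat(m[1]) : 0,
    end: m[2] !== undefined ? parseFloat(m[2]) : Infinity,
  };
}
```

With the range in hand, a player could decode the embedded audio with `AudioContext.decodeAudioData` and start an `AudioBufferSourceNode` with the corresponding offset and duration.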
ericc: High level question: I talked with Ken Harenstein, who does YouTube captions, last week,
... and he told me about 3PlayMedia. He said that from their research and from talking to users of audio descriptions and from talking to 3PlayMedia, it was his understanding that
... many users of audio descriptions prefer speech synthesis to pre-recorded because
... partly it allows them to set the speed like they're used to doing with screen readers
... and it made extended audio descriptions less disruptive because it reduces the likelihood
... of interrupting playback of the main resource. I wonder if you have heard that too, and if
... it is true it seems that a spec should include information helping people who make
... these to make the right kind.
Nigel: TTML2 supports text to speech, and also players can switch off the audio
... and expose the text to screen readers instead to allow the user's screen reader to take over.
marisa: I've heard that most screen readers speed up the speech.
markw: I've heard it works better speeding up synthesised speech
marisa: Of course if there's no language support for text to speech then you may still
... need pre-recorded audio.
atai: You may need to know how long the text to speech will take to author the rate correctly.
Nigel: There's a whole other world of pain in terms of distributability of web voices for text to speech.
ericc: I think the requirement is that the player pauses to allow for completion of the
... audio description, so it doesn't matter how long it takes.
marisa: What if you're switching language of AD and some are more verbose than others?
ericc: Yes, as long as the description accurately identifies the section of the media file
... that it describes then it is easy enough for the player to take care of, or at least it is the
... player's responsibility.
markw: The player could do other things like tweaking the playback speed to fit.
ericc: The Web Speech API doesn't allow access to predicting the duration of the speech.
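A minimal sketch of the pause-until-done approach described here: since the Web Speech API exposes no way to predict an utterance's duration, the player pauses the media and resumes on the utterance's `end` event. The dependencies are passed in as parameters, in a page they would be `window.speechSynthesis` and `SpeechSynthesisUtterance`, so the sequencing can be exercised outside a browser:

```javascript
// Pause the media element, speak the description at the user's preferred
// rate, and resume playback only when synthesis signals it has finished.
function speakThenResume(media, text, rate, synth, Utterance) {
  media.pause();
  const u = new Utterance(text);
  u.rate = rate;                // user preference, as with a screen-reader rate
  u.onend = () => media.play(); // resume the main programme after the description
  synth.speak(u);
  return u;
}
```

This makes extended descriptions robust regardless of how verbose the text is in any given language, which matches the point that duration need not be known in advance.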
atai: Is player behaviour in scope for this document?
ericc: Absolutely.
... It seems to me that it is, because if you don't describe the behaviour of the player you
... are going to get different incompatible or non-interoperable implementations, and that
... is an anti-goal.
markw: You want to describe the space of possible player behaviours, we just need to
... provide the information.
ericc: Yes, give guidelines to help implementers do the right thing, and people who create the descriptions.
Nigel: I agree, this is somewhat informative relative to the document format, but for example
... our UX people suggested that users would want to direct AD text to a screen reader
... and switch off audio presentation sometimes, or at least be able to select that.
marisa: Maybe have both audio and braille display to check spellings or do some other text-related processing.
Nigel: Yes
... In terms of user preference for synthesised or pre-recorded speech, one data point
... I learned recently is that the intelligibility of synthesised speech degrades more quickly
... in the presence of ambient sounds than human speech. The reasons are not clear.
markw: Suggests that some users would want to receive the AD in a separate earpiece
... from other audience members watching the same programme.
Matt: I think this is like dubbing vs subtitling, there may be cultural reasons for preferences.
... Our experience is it is harder to automate variable reading rate descriptions, and we find
... that invaluable to squeeze a description into a short period or let it "breathe".
... It's probably down to historical experience.
fbeaufort: I work at Google on the developer relations team.
Nigel: Any other constraints or requirements?
group: [silence]
Nigel: [slide on Audio Model]
... I just added this to try to explain because I've found it can be tricky to get across to developers
... that there is an analogy with HTML/CSS and the audio model in TTML.
markw: Players may or may not do this based on user preference, if for example someone
... is listening on a headset and there's main programme audio in the room the mixing
... preferences might change.
... [slide on the Web Audio Graph]
... This allows the audio mixing to happen with all the options that are needed in general
... in TTML2 - it may be that we only exercise a part of that solution space.
Nigel: The solution that I'm proposing is a profile of TTML2
... [slide for Profile of TTML2]
ericc: Also add that a UI should be provided for controlling the speed of audio descriptions
Nigel: Yes
... The other things on this slide we already discussed.
... Is anyone thinking this is a great problem to solve but it
should look completely different?
ericc: Is it a goal to define a guide for how this should work in a web browser?
Nigel: The TTML2 features are done in terms of Web Audio, Web Speech etc. so yes.
... The mixing might happen server side but the client side mixing options allow for a better
... range of accessible experiences.
ericc: It seems to me that a really detailed guide to implementation would be the most useful thing.
... An explicit goal should be to help producers to create content in the right way, but also
... to help people that want to deliver that to know how to make it available to the people that need it.
... Not distribution, the playback experience.
... Nicely constructed audio descriptions are not useful unless the people that need them are
... able to consume them.
Nigel: [nods]
atai: It might be interesting to identify what is missing to get a good implementation in a browser
... environment.
... It might be interesting to hear how much browser communities are interested in that
... case. A possible way to do this would be to implement a javascript polyfill or something.
... I'm not sure how much interest there is in native support.
ericc: Both are extremely useful. I don't know anything about 3PlayMedia but they have
... a javascript based player that uses the text to speech API so we know that it is possible.
... Theirs is a commercial solution. We should have a description of ...
... and as a data point I was at a conference last week about media in the web and this was
... one of the breakouts, audio descriptions and extended audio descriptions.
... It was well attended and people in the room were very interested in coming up with a
... solution that browsers could implement natively.
Nigel: I'd love to be in touch with those people.
Nigel: BBC implemented a prototype to support TTML2 Rec track work
Nigel: The point here is that it is possible to do this with current browser technologies,
... even if there are some minor issues that I should raise as issues, like on Web Speech.
... Question: Any other implementation work, or people who would like to do that at this time?
marisa: I would say no, we don't have the bandwidth but I'm keeping my eye on this for
... the long term. The use cases come up all the time from the APA group. I think it is
... on the horizon, but I can't commit to anything on the same timeline as this spec.
atai: Does BBC plan to publish this software as a reference implementation?
Nigel: I would say first we should publish as open source, and then allow for some
... scrutiny, and if people agree it's at that level then great. I don't think it is now.
... It would need more work.
atai: The question is if the BBC could be motivated to provide it as a reference
... implementation. It would help if you have a complete reference implementation.
Nigel: I would like to, but I don't think the code is good enough yet.
... I'm interested in other implementations too, for example it is possible that some
... participants in AD CG might make authoring tools.
ericc: You should talk to 3Play also.
Nigel: Yes, I will. It'd be great if they would join us here.
Nigel: In terms of tools, we have a GitHub repo w3c/adpt
... We have the reflector, and EBU has kindly offered to facilitate web meetings with their WebEx.
... [Next steps slide]
atai: Regarding the next steps, to move over to WG and Rec track, does it necessarily have to end up in the TTWG? Could it be another group?
... Could it be somewhere else?
... To make sure the right set of people are involved.
Nigel: I'm not dogmatic about this - it seems like the home of TTML is a good place for
... profiles of TTML, but if there's a better chance of getting to Rec doing it somewhere else
... then I don't mind where it happens.
atai: One other idea: when the TTML2 feature set is there it may be useful to have a
... gap listing relative to IMSC 1.1 so that if people want to reuse implementations and
... start from IMSC 1.1 rather than TTML2 then they can see what they already have.
ericc: Or which features they prefer not to use.
Nigel: Because they had implementation difficulty?
ericc: Yes, for example someone targeting IMSC 1.1 support, if you list the features that
... are only supported in one and not the other, it could inform that choice.
Nigel: Of course the significant features in IMSC are about visual presentation and here
... we are interested in audio features, so the common core of timing is all that's really left.
Nigel: We've had good discussion all the way through, so thank you everyone.
ericc: Defining this using those TTML2 features is interesting and it's good.
... It sets a fairly high bar to implement.
Nigel: It took a couple of weeks to implement.
ericc: It makes me wonder if it would be possible to have something that is more like a
... minor variation in a caption format.
Nigel: I think that's what this is.
ericc: Except for the ability to embed audio.
Nigel: That maybe took about half a day to implement. We could remove it from scope.
atai: It would be good to know what problems there are bringing this to a browser environment.
ericc: That's true. At the most basic it seems that what we have is some text and a range
... of time that it applies to in another file.
Nigel: I'm thinking of high production values where detailed audio mixing is needed.
ericc: Is that something we need for the web?
Nigel: I am aiming for a single open standard file format that content producers can use
... all the way through from content creation to broadcast and web use.
Matt: I would agree.
markw: Thinking about our chain, we create premixed versions and they seem quite high
... quality, so this might be worth considering.
atai: Thinking about the history of TTML, it started out as an authoring format and then
... began to be used for distribution and playback, which led to IMSC. I understand the
... purpose for one file for the whole chain, that's perfect, it's ideal, we should just avoid the
... pitfalls.
ericc: If the goal is to have native implementation in a browser it may be worth looking
... at the complexity with that goal in mind.
... If it is not a goal then that's fine, but if it is then keep that goal in mind.
Nigel: I am not sure. It can be done with a polyfill but would browser makers like to support
... the primitives to allow that or to implement it natively?
atai: The playback experience would be better natively.
fbeaufort: If the playback was the same would you still want native implementation?
Nigel: It would be great to avoid sending polyfill js to every page in that case, and it would
... make adoption easier if the page author just had to include a track in the video element
... and then it would play.
ericc: Your polyfill is about 50KB of unminified uncompressed js so it's not very big.
Nigel: Thank you everyone! [adjourns meeting]