W3C

– DRAFT –
Synchronization Accessibility User Requirements (SAUR)

20 Oct 2021

Attendees

Present
atai, Bert_Bos, janina, Jemma, Jennie, Joshue108, Judy, kirkwood, MURATA, Nigel_Megitt, Roy, SteveNoble, tzviya
Regrets
-
Chair
Jason_White
Scribe
Joshue108

Meeting minutes

Introducing SAUR and its implications

<introductions>

JW: Thanks everyone for attending
… We are here to discuss the FPWD of Synchronization Accessibility User Requirements (SAUR)
… It has implications for future W3C guidance including future accessibility guidelines
… To define the problem..
… <Jason gives background on synchronization of accessibility-related components of multimedia, such as captions, sign language interpretation, and descriptions>
… How closely in sync do these resources need to be?
… What we have done in the Research Questions Task Force (APA) is look at the research literature and document findings.
… As well as the timing tolerances between different media resources.
… We cover enhancing comprehension via synchronization of captions, sign language interpretation, and descriptions
… There is a question around XR, Augmented environments - do they have different tolerances?
… We note the distinction between live and pre-recorded media also.
… There are various issues covered
… We need to discuss how we shall document certain things - but the intent is that other groups working on related specs can take this material and use it.
… We want to make sure this work is useful as it is developed.

JW: Going to hand over to Steve Noble - he did most of the research here and documented the findings.
… We can then discuss.

<Zakim> nigel, you wanted to note the different actors involved and their respective responsibilities to achieve the requirements

NM: This is really valuable research, in terms of being data driven.
… The Timed Text Working Group is happy to be invited.

NM: Notes this is coming up in the Media IG and other places.
… TTWG focus is on the document format - and what text should be presented at what time.
… The difference is that you can specify the time in a Timed Text doc - but what matters from the audience's perspective is what actually gets presented.
… There are many things happening in a user agent that is presenting various a11y source formats.
… These requirements explain what is good for the audience, but we need to think in terms of how to meet them.
… Playback requirements etc - how close should the UA get to honouring these requirements?
… There are different jobs and responsibilities. The concern is that in defining the end result we need to ensure we note who is responsible.

SN: Thank you. That does come through in the research.
… What we know from research is the impact on the user when one media component is out of sync with another.
… There is the question around what is possible currently, and what is experienced by the end user.
… Regarding syncing primary video and audio - if we think of a person speaking.
… That is a simple synchronization issue.
… But if they get out of sync - this causes an a11y barrier.
… For those who are lip reading, or hard of hearing, or in a noisy space.
… Really, there are constraints relating to the media.
… Those working in TV can control these things much more easily than in a Zoom meeting, for example.
… So the environment is a factor.
… However, we are trying to find metrics.
… What are our target tolerances?
… What provides the best a11y?

SN: These points are around what the technology capabilities are - what is possible?
… In this doc, we are trying to look at the issues that are known.
… Link to document: https://www.w3.org/TR/saur/

SN: I can quickly point out some of the issues we were looking at.
… e.g. the synchronization of the audio and video streams
… The research shows that a person who is not hard of hearing can experience the same issues as someone who is hard of hearing.
… This shows us that there is a range within which the audio and video can be out of sync; going beyond that range can mean an a11y barrier.
… When they get very out of sync, this gets in the way of comprehension on one level, and reduces enjoyment on another.

SN: The research shows that having the video slightly ahead (milliseconds) can actually be beneficial.
… You are starting to comprehend before you hear the audio
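
The kind of tolerance check this implies could look roughly like the sketch below. It is illustrative only: the asymmetric bounds are placeholders in the spirit of the research summarized in the SAUR, not figures taken from the document, and the function names are invented for the example.

```typescript
// Hypothetical audio/video skew check. The asymmetric bounds reflect the
// finding that a slight video lead (audio lagging) is tolerated better than
// audio leading video; the specific numbers are placeholders, not SAUR values.
const MAX_AUDIO_LEAD_MS = 45;  // audio ahead of video
const MAX_AUDIO_LAG_MS = 125;  // audio behind video (video slightly ahead)

function avSkewAcceptable(audioTimestampMs: number, videoTimestampMs: number): boolean {
  const skew = audioTimestampMs - videoTimestampMs; // > 0 means audio leads
  return skew <= MAX_AUDIO_LEAD_MS && skew >= -MAX_AUDIO_LAG_MS;
}
```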

SN: When they get out of sync it causes problems for many.
… We looked at caption synchronization. There is an issue around the rate at which captions come across.
… We attempted to provide some metrics, as well as covering caption synchronization capabilities.
… There will be a trade-off between latency and accuracy.
… There is often latency; human captioning can't match automated captioning for speed, but there is a quality trade-off.
… In live meetings, how late can the captions come? (A simple latency check is sketched after this summary.)
… We have just looked at the research, and the community will need to discuss.
… Regarding Sign Language, it is not a one to one translation.
… So there is some delay - what degree of lag is tolerable without putting the user at a disadvantage?
… Regarding video description synchronization - you are looking for available space in the audio - to describe what is going on visually.
… Impossible to have it exactly in sync.
… So what is possible, and what is the best-case scenario?
… Regarding XR - we are not aware of a lot of research in XR media timing. Some here may have insights.
… So that is an overview.
… Happy to discuss and hear from others.
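
As a rough illustration of the live-caption question above, one could measure end-to-end caption lag and compare it to a target. Everything named below is an assumption made for the sketch; the acceptable range itself is exactly what the group says still needs community discussion.

```typescript
// Hypothetical live-caption latency check. The target below is a placeholder;
// the SAUR deliberately leaves the acceptable range to community discussion.
interface CaptionEvent {
  speechStartMs: number; // when the corresponding speech began
  displayMs: number;     // when the caption text appeared on screen
}

const MAX_LIVE_CAPTION_LAG_MS = 5000; // placeholder target, not a SAUR figure

function captionLagMs(ev: CaptionEvent): number {
  return ev.displayMs - ev.speechStartMs;
}

function captionWithinTolerance(ev: CaptionEvent): boolean {
  return captionLagMs(ev) <= MAX_LIVE_CAPTION_LAG_MS;
}
```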

<Zakim> nigel, you wanted to mention that there's an asymmetry in audience experience between late and early presentation, at least of captions

NM: What you call video description may be called audio description in other places.
… You alluded to the asymmetry in audience experience between late and early presentation.
… Early captions can be harder for people to deal with than late ones.
… <Discusses point on caption latency, especially live.>

SN: Pre-recorded captions have more scope for tweaking etc.; with live captions there is a limited range of what can be done.
… There is an issue finding 'space'.

NM: From an editorial angle - they may not be exactly mapped to what is being described.
… Comprehension can happen anyway, depending on context - sync requirements do change.

SN: Exactly.

SN: <Tells anecdote about audio described Broadway show - where the jokes are delivered before they happen>

SN: It requires planning to get this right.

JS: Harks back to conversations we had around the time of HTML 5.2
… We have the Media Accessibility User Requirements - where we looked at descriptions of video being presented as audio, or output as Braille/TTS
… We outlined the ability to allow the user to consume descriptions of video presented in text.
… <gives example>
… There may be more elaborate descriptions needed depending on context.
… You may also need to pause, control the stream etc.

<nigel> Demonstrator for allowing audio description to be presented in text, based on Audio Description Profile of TTML2
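
A minimal sketch of the pause-and-describe pattern mentioned above, using the browser speech synthesis API. The pattern is an illustration under assumed inputs, not the TTML2 demonstrator's implementation.

```typescript
// Sketch: "extended" descriptions - pause the media while a long text
// description is spoken, then resume. Uses the Web Speech API for output;
// a Braille display or screen reader could consume the same text instead.
async function presentTextDescription(video: HTMLVideoElement, text: string): Promise<void> {
  video.pause(); // make room for the description, regardless of gaps in the audio
  await new Promise<void>((resolve) => {
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.onend = () => resolve();
    speechSynthesis.speak(utterance);
  });
  await video.play(); // resume once the description has finished
}
```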

JS: Entertainment and education have different requirements
… Time offsets etc

JS: We can revisit as a part of WCAG

AT: Like Nigel, I think this is a valuable collection of requirements that has come up earlier.
… It would be good to have concrete values for these tolerances
… It would be good to have a clearer guideline on what to do.
… Looking at the research is good - people are looking for this.
… Caption rate etc - also important.
… The European Broadcasting Union also has requirements.
… Secondly, it is good to summarize but who is responsible?
… It would be good to look at what kind of technical application development these requirements are targeting.
… And at what stage?

SN: What do we do next? What ranges of tolerances should be used for a11y? But what is the goal, and who will implement?

JW: We are discussing interesting questions.
… We should summarize the research findings and draw some conclusions.
… If you know of others who could review and submit issues, please notify them.
… What else can we do in the next version?

JS: Coming to conclusions.. I'd like input from Nigel, Andreas and others.
… We are looking at asking the browser to buffer content as a part of flash mitigation.
… There will be discussions later.
… If you can buffer, the machine can detect flashing and prevent it, which helps users who are sensitive.
… Would this be helpful? Tighter tolerances?
… Is this another use case?

NM: Regarding video - there are legal UK requirements to avoid flashing.
… Responsibility of the content provider.
… You could extend the scope to non-video content?
… Animations could do this, for example.
… Some video providers spend effort on managing buffer time - so it may be tricky if buffering were forced.
… Could work as an option.
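
The buffering idea could, in principle, work along these lines: hold a short window of frames and scan for rapid luminance changes before presenting. This is a simplified heuristic, not the full WCAG 2.3.1 flash definition; the frame model and threshold below are assumptions for the sketch.

```typescript
// Simplified flash scan over a buffered window of frames. WCAG 2.3.1's general
// threshold allows at most three flashes in any one-second period; a real
// implementation would also account for relative luminance and flash area.
interface BufferedFrame {
  timeMs: number;
  meanLuminance: number; // 0..1, averaged over the frame (a simplification)
}

function exceedsFlashLimit(frames: BufferedFrame[], deltaThreshold = 0.1): boolean {
  // Record the times of abrupt luminance transitions between adjacent frames.
  const transitions: number[] = [];
  for (let i = 1; i < frames.length; i++) {
    const delta = Math.abs(frames[i].meanLuminance - frames[i - 1].meanLuminance);
    if (delta >= deltaThreshold) transitions.push(frames[i].timeMs);
  }
  // A flash is a pair of opposing transitions; more than three flashes
  // (i.e. more than six transitions) within any 1000 ms window fails.
  for (let i = 0; i < transitions.length; i++) {
    let count = 1;
    for (let j = i + 1; j < transitions.length && transitions[j] - transitions[i] <= 1000; j++) {
      count++;
    }
    if (count > 6) return true;
  }
  return false;
}
```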

NM: There is also an issue around @@

JB: This is an issue around assessing a user's disability status.
… Our personalization work may help with sandboxing this.
… <Discusses non-detectable buffering implementations>

JS: IIRC the delay is very short

<Zakim> Judy, you wanted to clarify the optionality

<janina> https://groups.csail.mit.edu/infolab/publications/Barbu-Deep-video-to-video-transformations.pdf

JB: I'm encouraged by the research - this may be low frequency but substantial for those affected.

<Zakim> nigel, you wanted to ask about non-media synchronisation accessibility user requirements

NM: This has made me think - you can cause flashing in different ways.
… <NM reflects on user reaction to some interactions and responses>

NM: Should this be mentioned? Responsiveness?

JB: We want people to think about things we are missing.
… What are the other angles - thank you for thinking more broadly.

KP: As an aside, I've worked with transcripts and video - and it seems we are not talking enough about transcripts.
… Having multiple sources can mean less of a problem.
… Another useful pathway.

SN: Good point, Kim. And we haven't discussed this in the document.
… There are issues around audio and video description etc - you want this stuff to make it into the transcript.
… This becomes an issue of incorporation.
… We've not looked at this here but we should.

JS: This would be helpful - we did look at this when working on the MAUR (Media A11y)
… We got many things into the HTML spec.
… But there was a meeting (in 2013) where it was decided not to programmatically determine these things.
… That was a long time ago - things may be different now.

KM: I've got live examples of this.

JS: Yes.

KP: Also the production process is easier. I can demo.

SN: Appreciate the input on that.

Minutes manually created (not a transcript), formatted by scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).
