Accessible RTC Use Cases

From Accessible Platform Architectures Working Group

NOTE: Latest draft of AccessibleRTC is now on Github (DEC 2019)

Overview of Accessible RTC [DRAFT]

This document outlines various accessibility-related 'user requirements/needs', 'scenarios' and 'use cases' for Real Time Communication (RTC). These user needs should drive accessibility requirements in the various related specifications and in the overall architecture that enables RTC. These user needs, scenarios, and use cases come from people with disabilities who use Assistive Technology (AT) and wish to see the features described made available within RTC-enabled applications.

What is Real Time Communication(RTC)?

The traditional data exchange model is client to server. Real Time Communication (RTC) is game-changing: it is enabled in part by specifications like WebRTC, which provides real-time peer-to-peer audio, video and data exchange directly between supported web browsers, without the need for browser plugins, and enables fast applications for video/audio calls, text chat, file exchange, screen sharing and gaming. However, WebRTC is not the sole specification with responsibility for enabling accessible RTC. [1]

Accessible RTC is enabled by a combination of technologies and specifications, such as those from the Media Working Group, the Web and Networks IG, the Second Screen and Web Audio Working Groups, as well as AGWG and ARIA. APA hopes this work will inform how these groups meet their various responsibilities for enabling RTC, as well as updating use cases in various groups. For example, current work on WebRTC Next Version Use Cases can be reviewed here. [2]

Real Time Communication and Accessibility

RTC has the potential to allow improved accessibility features that will support a broad range of user needs for people with a wide range of disabilities. These needs can be met through improved audio and video quality, audio routing, captioning, improved live transcription, transfer of alternate formats such as sign language, text messaging / chat, real-time user support, and status polling.

User needs/Scenarios, Use Case definition and review process

This document outlines various accessibility-related 'user needs' for Accessible RTC. These 'user needs' should drive accessibility requirements for Accessible RTC and its related architecture. They come from people with disabilities who use Assistive Technology (AT) and wish to see the features described made available within Accessible RTC-enabled applications.

User needs are framed in a range of 'Scenarios' (which can be thought of as similar to 'User Stories'). User needs and requirements are being actively reviewed by RQTF/APA in the context of the broader scope and application of this document.

  1. The group aims to ensure we have user scenarios that embrace all of the requirements (our own as well as those found elsewhere).
  2. The group aims to cite other documents for specific requirements, where applicable.
  3. The group aims to specify additional explicit requirements ourselves, as appropriate.

User Needs and Scenarios

The following outlines a range of user needs in various scenarios. The use cases below have also been compared to existing use cases for Real-Time Text (RTT), such as the IETF Framework for Real-Time Text over IP Using the Session Initiation Protocol (RFC 5194) and the European Procurement Standard EN 301 549.

Incoming calls and Caller ID

Scenario: A screen-reader user or a user with a cognitive impairment needs to know that a call is incoming and to recognise the caller's ID in an unobtrusive way, e.g. via a symbol set or another browser notification to indicate incoming calls, or by alerting assistive technologies via relevant APIs.

E.g. a user may wish to route call notifications to a separate device such as a Brailler, and route the call itself to a regular Bluetooth headphone once they have accepted it.

QUESTION: Does WebRTC have an API for notifications? Or should applications use standard (browser-level) notifications? What about ARIA notifications? Can these outputs be routed via a user preference?
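As a non-normative illustration of one possible answer, the sketch below combines the standard browser Notifications API with an ARIA live region, so that assistive technologies are alerted to an incoming call without focus being moved. The announceIncomingCall function name and the call-status element id are assumptions made for illustration only; WebRTC itself defines no notification API.

    async function announceIncomingCall(callerId: string): Promise<void> {
      // Browser-level notification; how and where it is rendered is up to the
      // user agent and operating-system preferences.
      if ("Notification" in window) {
        if (Notification.permission !== "granted") {
          await Notification.requestPermission();
        }
        if (Notification.permission === "granted") {
          new Notification("Incoming call", { body: "Call from " + callerId });
        }
      }
      // ARIA live region (e.g. <div id="call-status" aria-live="assertive">):
      // screen readers announce the update without moving focus.
      const status = document.getElementById("call-status");
      if (status) {
        status.textContent = "Incoming call from " + callerId;
      }
    }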

Routing and Communication channel control

Scenario: A screen-reader user may have many audio output devices to manage, for example several displays and multiple sound cards handling sound output (or input). Having a range of browser-level outputs and routing options would remove the need for an analogue mixer or additional sound cards and hardware.

Similar to the WCAG 2.1 SC 'Status Messages', a blind user may choose to route updates, alerts and so on to a specific output device of their choosing, without those messages taking explicit focus. The user may also wish to have 'mixed'-type conversations. These mixed use case requirements are also mentioned in RFC 5194.

Scenario: A blind screen reader user wishes to monitor a chat stream in a video conference and may wish to direct system output, such as alerts, to a device other than the screen reader, such as a Braille output device or other hardware. This control means they can track the Braille output separately from the screen reader output, while continuing to monitor, watch or listen to a third audio source. Being able to direct each stream to the user's device of choice would give them this ability. This may also be useful for any user who wants more control over where and how their communication streams are rendered.
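A minimal, hedged sketch of one way this could be achieved today: the Audio Output Devices API (HTMLMediaElement.setSinkId) can route a remote WebRTC audio stream to a user-chosen output device in browsers that support it. The preferredLabel matching below is an illustrative assumption, and device labels are only exposed once the user has granted media permissions.

    async function routeRemoteAudio(remoteStream: MediaStream, preferredLabel: string): Promise<void> {
      const audioEl = new Audio();
      audioEl.srcObject = remoteStream;

      // Find an output device whose label matches the user's stated preference.
      const devices = await navigator.mediaDevices.enumerateDevices();
      const target = devices.find(
        (d) => d.kind === "audiooutput" && d.label.includes(preferredLabel)
      );

      // setSinkId is not implemented everywhere, so feature-detect before use.
      if (target && typeof (audioEl as any).setSinkId === "function") {
        await (audioEl as any).setSinkId(target.deviceId);
      }
      await audioEl.play();
    }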

Scenario: A deaf user may wish to move parts of a live teleconference session (as separate streams) to one or more devices for greater control. They may do this to configure aspects of the user experience in a WebRTC-enabled application, for example by sending the video stream of a sign language interpreter to a high-resolution display and managing that video stream separately, as the user may not wish to have it as part of the video they are watching. The user needs to be able to control how and where alternate content, such as subtitles or captions, is displayed.

Scenario: Users with some form of cognitive disability, or blind users, may have relative volume levels set as preferences, which could relate to importance, urgency or meaning. Relative levels of audio can be used to arrange the 'importance' of various audio outputs for any given task. When a blind user is multitasking and receiving status messages or monitoring different sound sources, the ability to set panning would give the user a broader sonic 'field' within which to place elements. This is a common technique in sound engineering and would allow a broader, richer sonic landscape.
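A short, hedged Web Audio API sketch of what such a preference could look like: each incoming stream gets its own gain (relative importance) and stereo pan position, giving the listener a wider sonic field. The placeStream helper and the example volume and pan values are purely illustrative.

    const audioCtx = new AudioContext();

    function placeStream(stream: MediaStream, volume: number, pan: number): void {
      const source = audioCtx.createMediaStreamSource(stream);
      const gain = new GainNode(audioCtx, { gain: volume });   // relative importance
      const panner = new StereoPannerNode(audioCtx, { pan });  // -1 (left) to 1 (right)
      source.connect(gain).connect(panner).connect(audioCtx.destination);
    }

    // For example: main speaker centred and loud, status alerts quieter and to the right.
    // placeStream(mainSpeakerStream, 1.0, 0);
    // placeStream(alertStream, 0.5, 0.8);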

NOTE: This issue is also highlighted in the APA Working Group Note 'Inaccessibility of CAPTCHA: Alternatives to Visual Turing Tests on the Web' as a problem with telephone-based verification, where a blind user is searching for key input while listening for audio cues. Control of volume and panning may sit squarely with the Web Audio group.

RQTF notes that this may be moved into user agent/browser or application level by XR requirements.

NOTE: Components need to be individually controlled. Multichannel audio may sit squarely with the Web Audio group. The Audio Output Devices API may fulfil some of the capabilities aspects of this use case; however, another area that needs to be explored is authorisation of access to these additional output devices. The Audio Device Client proposal may also help to provide this kind of bespoke audio routing directly in the browser (see the Audio Device Client explainer video on Vimeo).

Dynamic Audio description values in Live Conferencing

Scenario: A user may struggle to hear audio description (AD), depending on its volume level, in a live teleconferencing situation. Recommended AD sound values should be dynamic.

Quality Synchronisation and playback

Scenario: Any user watching captioning or audio description needs to be confident that they are synchronised and accurate, and that any outages or loss will be repaired while preserving context and meaning. For people with disabilities this may require special repair of broken streams in alternate tracks.

NOTE: There is currently no dedicated mechanism to transmit captions or audio descriptions in sync with WebRTC audio and video streams. There have been discussions on enabling a firmer basis for synchronisation, based on the RTT standard from IETF (RFC 4103). APA should look at whether the current situation is good enough or whether a dedicated mechanism in WebRTC for transmission of synced captions is needed.
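In the absence of a dedicated mechanism, one possible workaround is sketched below: caption cues are carried over an ordered RTCDataChannel with timing information that the receiving application can align against media playout. The channel label and cue shape are illustrative assumptions, not part of any specification.

    interface CaptionCue {
      text: string;
      startTime: number; // seconds on the shared media timeline
      endTime: number;
    }

    const pc = new RTCPeerConnection();
    const captionChannel = pc.createDataChannel("captions", { ordered: true });

    function sendCaption(cue: CaptionCue): void {
      if (captionChannel.readyState === "open") {
        captionChannel.send(JSON.stringify(cue));
      }
    }
    // The receiving peer listens for the channel via pc.ondatachannel and
    // schedules each cue for display (e.g. in an aria-live region).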

Simultaneous Voice, Text & Signing

Scenario: A deaf user wishes to talk on a call, send and receive instant messages via a text interface, and watch and/or communicate via sign language using a video stream. This could be partially enabled via RTT in WebRTC.

Support for Real Time Text (RTT)

Scenario: A deaf, speech-impaired, hard of hearing or deaf-blind user wishes to make an emergency call, send and receive related text messages instantly, and/or sign via a video stream in an emergency situation. This text aspect, and text relay services, could be enabled via RTT in WebRTC. [4] [5]

Regarding RTT, the Federal Communications Commission (FCC) supports the usage of RTT and, in 2016, adopted rules to move from text telephony (TTY) to real-time text (RTT) technology. The FCC states that RTT should be a pre-installed feature of wireless devices that is enabled by default. [3]

NOTE: The FCC is the U.S. federal agency responsible for implementing and enforcing America's communications law and regulations. We don't know if WebRTC-based systems offer emergency-call integration today; research is needed on WebRTC requirements that go beyond accessibility (some requirements of the existing specs were derived from scenarios such as 'disabling voice-activity-detection'). Does WebRTC support transmitting characters directly, as RTT does?
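WebRTC does not natively implement T.140/RFC 4103 RTT, but an application can approximate character-by-character transmission over a data channel, as in the hedged sketch below. The element id and channel label are illustrative assumptions.

    const pc = new RTCPeerConnection();
    const rttChannel = pc.createDataChannel("rtt", { ordered: true });
    const rttInput = document.querySelector<HTMLInputElement>("#rtt-input");

    rttInput?.addEventListener("input", (event) => {
      // Transmit each character (or edit) as it is produced, RTT-style,
      // rather than waiting for the user to press Enter.
      const typed = (event as InputEvent).data;
      if (typed && rttChannel.readyState === "open") {
        rttChannel.send(typed);
      }
    });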

Support for Video Relay Services (VRS) and Video Remote Interpretation (VRI)

Scenario: A deaf, speech-impaired or hard of hearing user wishes to communicate on a call using a video remote interpretation service to access sign language and interpreter services. VRI involves two parties: the deaf or hard of hearing person who is using the VRI, and the interpreter who is on screen. The interpreter can be on a videophone, web camera, or computer screen. The interpreter listens to the audio while someone speaks and interprets it to the deaf person in sign language; when the deaf or hard of hearing person wants to say something, they sign to the interpreter and the interpreter relays that message using their voice. [6]

The following is from the FCC overview of VRS:

  • VRS allows those persons whose primary language is sign language to communicate in sign language, instead of having to type what they want to say.
  • Because consumers using VRS communicate in sign language, they are able to more fully express themselves through facial expressions and body language, which cannot be expressed in text.
  • A VRS call flows back and forth just like a telephone conversation between two hearing persons. For example, the parties can interrupt each other, which they cannot do with a TRS call using a TTY (where the parties have to take turns communicating with the CA).
  • Because the conversation flows more naturally back and forth between the parties, the conversation can take place much more quickly than with text-based TRS. As a result, the same conversation is much shorter through VRS than it would be through other forms of text-based TRS.
  • VRS calls may be made between ASL users and hearing persons speaking either English or Spanish. [10]

NOTE: May relate to interoperability with third-party services; IETF has looked at standardising a way to use SIP with VRS services. There is a question about the impact of these services on end-to-end security (since these services are *by design* equivalent to man-in-the-middle attacks).

Distinguishing Sent and Received Text

Scenario: A deaf or deaf-blind user needs to be able to tell the difference between incoming text and outgoing text when using RTT functionality. WebRTC could handle the routing of this information to a format or output of choice.

NOTE: This is not WebRTC specific and may be just an accessible UI issue.


Call status data

Scenario: In a teleconference with many participants, a screen-reader user will need to know which participants are on the call, as well as their status. This status information is very important for people with disabilities as it helps to orientate them while communicating online. This critical information includes knowing who is muted or actively talking, and who has their video or camera stream enabled and who does not. The user would benefit from being able to query who is active on a video call or muted. Status polling is where a blind user is able to get a snapshot overview of all of this status information in a WebRTC application.
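Since WebRTC does not standardise the UI, status polling is sketched below at the application level: a participant roster that can be queried on demand and announced through an ARIA live region. The Participant shape and the announceRoster helper are illustrative assumptions.

    interface Participant {
      name: string;
      muted: boolean;
      videoOn: boolean;
      speaking: boolean;
    }

    function announceRoster(participants: Participant[], liveRegion: HTMLElement): void {
      const summary = participants
        .map((p) =>
          p.name + ": " + (p.muted ? "muted" : "unmuted") +
          ", video " + (p.videoOn ? "on" : "off") +
          (p.speaking ? ", speaking" : "")
        )
        .join("; ");
      // With aria-live="polite", the snapshot is announced without moving focus.
      liveRegion.textContent = "Call status: " + summary;
    }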

NOTE: This is not WebRTC specific and may be just an accessible UI issue.

Quality of service

Bandwidth for audio

Scenario: A hard of hearing user needs better stereo sound so they can have a quality experience in work calls or meetings with friends or family. Transmission aspects, such as the decibel range for audio, need to be of high quality. Industry allows higher audio resolution, but audio is still mostly transmitted in mono.

Bandwidth for video

Scenario: A hard of hearing user needs high-quality video so they can have a quality experience when watching HD video or having an HD meeting with friends or family. Transmission aspects, such as frames per second, need to be of high quality.

NOTE: EN 301 549 Section 6 recommends that, for WebRTC-enabled conferencing and communication, the application shall be able to encode and decode communication with a frequency range whose upper limit is at least 7,000 Hz. [7]

NOTE: WebRTC lets applications prioritise bandwidth dedicated to audio / video / data streams; there is also some experimental work in signalling these needs to the network layer as well as support for prioritising frame rate over resolution in case of congestion.
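A hedged sketch of those hooks: the capture constraints ask for stereo audio, and RTCRtpSender parameters are used to hint at stream priority and at favouring frame rate under congestion. Browser support for channelCount, per-encoding priority and degradationPreference varies, so these should be treated as hints rather than guarantees.

    async function setUpPrioritisedCall(): Promise<RTCPeerConnection> {
      const stream = await navigator.mediaDevices.getUserMedia({
        audio: { channelCount: { ideal: 2 } }, // ask for stereo where available
        video: true,
      });

      const pc = new RTCPeerConnection();
      stream.getTracks().forEach((track) => pc.addTrack(track, stream));

      for (const sender of pc.getSenders()) {
        const params = sender.getParameters();
        if (sender.track?.kind === "video") {
          // Hint (WebRTC Extensions) to favour frame rate over resolution
          // under congestion; not all browsers honour it.
          (params as any).degradationPreference = "maintain-framerate";
        }
        if (params.encodings && params.encodings.length > 0) {
          params.encodings[0].priority =
            sender.track?.kind === "audio" ? "high" : "medium";
        }
        await sender.setParameters(params);
      }
      return pc;
    }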

Quality of video resolution and frame rate

Scenario: A deaf user is watching a signed broadcast and needs a high quality frame rate to maintain legibility and clarity in order to understand what is being signed.

NOTE: EN 301 549 Section 6, recommends WebRTC applications should support a frame rate of at least 20 frames per second (FPS). [7]
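A minimal sketch of requesting capture that aims to meet this recommendation; the frame rate actually delivered still depends on the camera and the browser.

    async function captureSigningVideo(): Promise<MediaStream> {
      return navigator.mediaDevices.getUserMedia({
        // EN 301 549 Section 6 suggests at least 20 FPS for signed content.
        video: { frameRate: { min: 20, ideal: 30 } },
      });
    }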

Live Transcription and Captioning [Review]

Scenario: A deaf user, or a user with some form of cognitive disability, needs to access a channel containing live transcription or captioning during a conference call or broadcast, and have this presented to them in accordance with their preferences, whether that is signing or a related symbol set.

NOTE: The browser APIs needed to implement this are available, but better integration with third-party services (e.g. for sign language translation) is needed. Possibly covered by the general requirements for ToIP contained in RFC 5194.
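One hedged example of how existing browser APIs might be combined: the (non-standardised and unevenly supported) Web Speech API produces interim transcription results, which are forwarded over a data channel so that remote participants can render them according to their own preferences. The channel label is an illustrative assumption.

    const pc = new RTCPeerConnection();
    const transcriptChannel = pc.createDataChannel("transcript");

    // The Web Speech API is prefixed in some browsers and absent in others.
    const SpeechRecognitionImpl =
      (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

    if (SpeechRecognitionImpl) {
      const recogniser = new SpeechRecognitionImpl();
      recogniser.continuous = true;
      recogniser.interimResults = true;
      recogniser.onresult = (event: any) => {
        const latest = event.results[event.results.length - 1][0].transcript;
        if (transcriptChannel.readyState === "open") {
          transcriptChannel.send(latest);
        }
      };
      recogniser.start();
    }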

Assistance for Older Users or users with Cognitive disabilities

Scenario: Users with some form of cognitive disability may require assistance when using audio or video communication. A WebRTC video call could have a technical or user support channel, providing support that is customised to the needs of the user via a personalised UI, as part of a remote health care system, or simply as part of a conferencing application.

NOTE: Needs clarification/review by COGA, may be an accessible UI, or personalisation issue.

Personalised Symbol sets for users with Cognitive disabilities

Scenario: Users with some form of cognitive disability may prefer to use symbol sets for identifying the functions available in a WebRTC-enabled client, whether for voice, file or data transfer.

NOTE: WebRTC does not standardise any of the UI, so this may be an accessible UI issue.

Internet Relay Chat Style Interface required for Blind Users

Blind users who depend on text to speech (TTS) to interact with their computers and smart devices require the traditional Internet Relay Chat (IRC) style interface. This must be preserved as a configuration option in agents that implement WebRTC as opposed to having only the Real Time Text (RTT) type interface favoured by users who are deaf or hearing impaired. This is because TTS cannot reasonably translate text into comprehensible speech unless the characters to be pronounced are scheduled and transmitted in close timing to one another.

The use case for RTT is very important and should certainly be supported by WebRTC. This IRC-style use case does not compete with the use case for RTT, and both should be supportable in the text stream provided by WebRTC. We understand why seeing characters in real time, as they are typed by a remote correspondent in a telecommunications session, is important to text interface users using display screen technology. Such users should be supported in seeing those characters with very minimal latency.

Braille Users and RTT

Arguably, some braille users will also prefer the RTT model. However, braille users desiring text displayed with standard contracted braille might better be served in the manner users relying on Text to Speech (TTS) engines are served, by buffering the data to be transmitted until an end of line character is reached.

Challenges with TTS timing

As mentioned, TTS cannot reasonably translate text into comprehensible speech unless the characters to be pronounced are transmitted in close timing to one another. Typical gaps will result in stuttering and highly unintelligible utterances from the TTS engine.
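The sketch below illustrates the line-buffered ('write'/IRC-style) mode described above: characters are accumulated locally and only transmitted once the line is complete, so a TTS or contracted-braille user receives whole utterances rather than isolated characters. The channel label and element id are illustrative assumptions; an RTT-style mode would instead send each character as it is typed.

    const pc = new RTCPeerConnection();
    const textChannel = pc.createDataChannel("text");
    const chatInput = document.querySelector<HTMLInputElement>("#chat-input");
    let lineBuffer = "";

    chatInput?.addEventListener("keydown", (event) => {
      if (event.key === "Enter") {
        // Line-buffered mode: send the whole line so TTS can speak it intelligibly.
        if (lineBuffer && textChannel.readyState === "open") {
          textChannel.send(lineBuffer);
        }
        lineBuffer = "";
      } else if (event.key.length === 1) {
        lineBuffer += event.key; // accumulate rather than transmitting per keystroke
      }
    });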

NOTE: People familiar with Unix, and now Linux, command line interfaces will understand the distinction described here as that between the two applications "talk" and "write". The former functions as RTT specifies; the latter functions like a classic IRC session. Both need to be supported by WebRTC user agents.

Here are links that further describe the functionality of these two classic Unix utilities:

The talk utility is a two-way, screen-oriented communication program.

The write utility reads lines from the standard input and writes them to the terminal of the specified user.

Data table mapping User Needs with related specifications

The following table maps these user needs to related specifications, such as the requirements defined in RFC 5194 (Framework for Real-Time Text over IP Using SIP) and EN 301 549 (the EU Procurement Standard).

Overview of what specifications may address some of the use cases outlined above
User need | Related specs or groups | Mapping to RFC 5194 - Framework for Real-Time Text over IP Using SIP | Mapping to EN 301 549 - EU Procurement Standard
Incoming calls | WCAG/AGWG, ARIA. | Similar to 6.2.4.2 Alerting - RFC 5194 / pre-session set up with RTT 6.2.1 | Maps to 6.2.2.2: Programmatically determinable send and receive direction
Accessible call setup | WCAG/AGWG, ARIA. | Under 'General Requirements for ToIP' | x
Routing | Media Working Group, Web and Networks IG, Second Screen. The Audio Device Client proposal may fulfil this need and allow complex routing and management of multiple audio input and output devices. | x | x
Dynamic Audio description values | Media Working Group, Web and Networks IG, Second Screen. | x | x
Audio-subtitling/spoken subtitles | Media Working Group, Web and Networks IG, Second Screen. | x | x
Communications control | Media Working Group, Web and Networks IG. The Second Screen API may fulfil this user need; needs confirmation. HTML5 supports this, provided the streams are separable. Looks like an application implementation issue and not just a WebRTC issue; could be managed via something like a status bar. | Similar to R26 in 5.2.4 Presentation and User Requirements. | Maps to 6.2.1.2: Concurrent voice and text
Text communication data channel | Media Working Group. | Similar to R26 in RFC 5194, 5.2.4 Presentation and User Requirements. NOTE: Very similar user requirement to 'Audio Routing and Communication channel control'. | x
Control relative volume and panning position for multiple audio | Web Audio Working Group. Multichannel may be mostly covered in the Web Audio group space with some WebRTC requirements, and also by the Audio Device Client proposal. | x | Maps to 6.2.1.2: Concurrent voice and text. NOTE: Very similar user requirement to 'Audio Routing and Communication channel control'.
Support for Real Time Text | WebRTC. | Similar to R26 in RFC 5194, 5.2.4 Presentation and User Requirements. NOTE: Very similar user requirement to 'Audio Routing and Communication channel control'. | x
Simultaneous Voice, Text & Signing | Could be partially enabled via RTT in WebRTC. | Relates to RFC 5194, under R2-R8. | x
Support for Video Relay Services (VRS) and Video Remote Interpretation (VRI) | May relate to interoperability with third-party services. | Relates to RFC 5194, under R21-R23. | x
Distinguishing Sent and Received Text | May relate to interoperability with third-party services. This is not WebRTC specific and may be just an accessible UI issue. | Relates to RFC 5194, under R16, but this does NOT fully address our use case requirement. | Maps to 6.2.2.1: Visually distinguishable display
Warning and recovery of lost data | This is not WebRTC specific and may be just an accessible UI issue. | Relates to RFC 5194, under R14-R15. | x
Quality of video resolution and frame rate | x | x | EN 301 549 Section 6 recommends WebRTC applications should support a frame rate of at least 20 frames per second
Assistance for Older Users or users with Cognitive disabilities | Needs clarification/review by COGA; may be an accessible UI or personalisation issue. | Relates to RFC 5194, Transport Requirements/ToIP and Relay Services. [To what degree? Are there specific requirements missing that we need to cover?] | x
Identify Caller | WCAG/AGWG, ARIA. This may be a candidate for removal, as identity may be handled by the browser via Identity for WebRTC 1.0. We may need to co-ordinate with another group that manages identity mechanisms in the browser, if doing so supports our overall use case. | Similar to R27 in RFC 5194, 5.2.4 Presentation and User Requirements. | Maps to 6.3 Caller ID
Live Transcription and Captioning | The browser APIs needed to implement this are available, but better integration with third-party services (e.g. for sign language translation) is needed. Possibly covered by the general requirements for ToIP contained in RFC 5194. | Covered under 5.2.3 (transcoding service requirements), referring to relay services that provide conversion from speech to text, or text to speech, to enable communication. | x

References

Acknowledgements

This work is supported by the EC-funded WAI-Guide Project. Many thanks to Janina Sajka, Jason White, Dominique Hazael-Massieux, Steve Lee and Estella Oncins Noguer for their very useful input and feedback.