EME and accessibility

Abstract

During an Advisory Committee review of the Encrypted Media Extensions (EME) specification, concerns were raised regarding potential accessibility barriers for people with disabilities. These included the possibility that there might be barriers to accessing accessibility information such as captions, or that there might be barriers to researching innovative approaches to accessibility-related modifications of the video stream.

We have not found any such barriers in our investigation. This document describes the general process used to review potential accessibility concerns in W3C specifications during development, summarizes the results of accessibility testing of the EME specification, and notes the disposition of ideas for future research on accessible video regardless of encryption.

We note that questions about EME and accessibility are part of a broader dialog around the EME specification, and for those issues, we refer readers to the EME FAQ.

1. Introduction

1.1 Process for investigating potential accessibility issues in W3C specifications

Accessibility of digital technologies is essential if we are to support an inclusive society for the billion people in the world with disabilities. The constant evolution of digital technologies creates the possibility for accessibility innovations as well as the introduction of new barriers. Therefore, W3C standards development work includes accessibility review by the Accessible Platform Architectures Working Group (APA WG) for all specifications during the development process, to identify potential barriers to accessibility and new opportunities for accessibility support. If new barriers or opportunities are identified, these are addressed until a satisfactory resolution is reached.

1.2 Accessibility requirements specific to online media

In 2015 W3C published a comprehensive list of Media Accessibility User Requirements ("MAUR"), which describes the accessibility support needed for online media, including captions, audio description of video, and transcripts. MAUR is occasionally updated and expanded as new requirements emerge. During the development of HTML5 by the HTML Working Group, features in HTML5 were added and adapted to ensure that requirements in MAUR drafts were addressed. Work on the HTML5 video element focused on addressing the user accessibility requirements in the MAUR.

1.3 Accessibility issues potentially relevant to EME and MAUR

Because of W3C's commitment to accessibility, the possibility of accessibility issues in EME has been a consideration since early in the development of the specification. Due diligence on this concern involved face-to-face meetings between accessibility experts in the APA and the HTML Working Groups starting in 2013. Repeated analysis and testing have shown no barriers to accessing captions, transcripts, or audio description of video.

Digital rights management of media content has been present in various forms for years. Accessibility information for media has generally been transmitted in the clear rather than in an encrypted form. This is what is recommended ("SHOULD") in the EME specification. Even in cases where accessibility information might be encrypted (for instance, when open captions might be carried in a primary video file), accessibility information "MUST" be decrypted along with the primary video file. A summary of tests confirming this is available below.

1.4 Potentially novel accessibility opportunities in online media

Recently, some people have suggested that research on innovations in accessibility adaptations might be blocked in some way by encryption, and also that there might be a risk incurred by accessibility researchers or by individuals with disabilities in exploring or using innovations. We have found nothing to substantiate that concern. Each innovation that has been suggested is not only possible to research, but best researched, on open video streams, completely independently from encryption. Whenever the potential for accessibility innovation arises in APA WG reviews, it can be referred to the APA WG's Research Questions Task Force (RQTF) for follow-up, as several of these use cases have already been.

There might also be concerns about whether application of such innovations might be blocked in some way in the presence of encryption. As a general rule, many types of pre-processing and post-processing of media, whether accessibility-related or not, are applied at different stages in the media production process, and often by different providers (for instance, addition of CGI effects, or addition of audio description). When such modifications are made to encrypted materials, permissions are arranged to accommodate these modifications. Based on a scenario-by-scenario analysis, these examples should be no different, as explained below.

2. Access to Captions, Audio Description, and Transcripts

The Encrypted Media Extensions specification addresses relevant MAUR requirements in the section at https://www.w3.org/TR/encrypted-media/#media-requirements, which defines "Unencrypted In-band Support Content":

In-band support content, such as captions, described audio, and transcripts, SHOULD NOT be encrypted.

Additionally, the EME specification defines how accessibility support content must be provided, if it is encrypted at any point:

Implementations that choose to support encrypted support content MUST provide the decrypted data to the user agent to be processed in the same way as equivalent unencrypted timed text tracks.

2.1 Closed captions

For web-based video, closed captions and transcripts (as well as enhanced subtitles) are transmitted as text data along with the video and audio streams, and are not visible until the user elects to turn them on. The EME specification notes that most encryption implementations do not support encryption of text data, which covers captions and transcripts because these are generally text files. Based on our experiments, while it is theoretically possible to encrypt text streams, current support seems absent or defective (see webm_crypt or MP4Box). Video websites (such as YouTube or Netflix) currently favor using a JavaScript library with a separate text file to display closed captions instead of relying on built-in caption functionality.

The EME specification explicitly advises using a separate text file for accessibility information, and using the timed text track element to associate it with the video content. Alternatively, an approach such as MP4Box.js can be used, which also allows for audio synthesis of audio descriptions when using text track cues. Combining encrypted video/audio content with unencrypted captions does not impede rendering of the text track or its audio synthesis.
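As a minimal sketch of the separate-text-file approach, the following TypeScript serializes a list of caption cues into WebVTT text that could be served in the clear alongside an encrypted video. The `CaptionCue` type and the `formatTimestamp` and `toWebVtt` functions are hypothetical names introduced for illustration; they are not part of EME or of any library.

```typescript
// Hypothetical cue shape: times are in seconds of media time.
interface CaptionCue {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as a WebVTT timestamp of the form hh:mm:ss.ttt.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return zeroPad(h, 2) + ":" + zeroPad(m, 2) + ":" + zeroPad(s, 2) + "." + zeroPad(ms, 3);
}

// Serialize cues into a WebVTT document: a "WEBVTT" header followed by
// cue blocks separated by blank lines.
function toWebVtt(cues: CaptionCue[]): string {
  const blocks = cues.map(
    (c) => formatTimestamp(c.start) + " --> " + formatTimestamp(c.end) + "\n" + c.text
  );
  return ["WEBVTT"].concat(blocks).join("\n\n") + "\n";
}
```

The resulting file could then be referenced from a track element placed inside the video element (for example `<track kind="captions" src="captions.vtt" srclang="en">`, where the file name is an assumption), independently of whether the video and audio streams are encrypted.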

2.2 Open captions

Open captions are usually captions that have been merged with the video track and cannot be turned off; therefore these are decrypted at the same time as the primary video stream is decrypted. But because they are part of the video stream, they cannot be accessed as text, whether the video stream is encrypted or not, which is disadvantageous for users who might need to configure the format or positioning of the caption display. Therefore the EME specification does not recommend this approach, but it allows it in order to accommodate legacy material that is open captioned in this manner. The EME specification also allows open captions carried in a separate unencrypted text track, which is preferable for users.

2.3 Audio Description

Audio description (also known as video description, and described video) is most often an audio recording timed and recorded to fit into natural pauses in a video program. It is an additional narration track intended primarily for blind and visually impaired consumers of visual media, though it can also be a beneficial support for others, including people with some kinds of cognitive disabilities. As an alternate audio stream, it is typically presented in the clear (unencrypted), in which case it needs neither encryption nor decryption. While it is possible to encrypt an audio description of video (within one single media container or as a separate one), any such implementations are required by the EME specification to be decryptable just as any other audio stream would be. It is also possible that description of video may be presented as a text track, as an accommodation for individuals who can neither see nor hear well, or are deaf-blind. Again, as with captions, these text tracks are presented in the clear and therefore decryption is not an issue.

3. Use Cases for Potential Further Research

Use cases for innovative accessibility adaptations have been suggested, together with concerns that EME might in some way impede research on these types of adaptations. As already noted, each of these types of adaptations is most appropriately researched on open video streams; and as always, W3C is interested in exploring any such ideas in its Working Groups. Application of such enhancements can be made available through agreements with hosts of encrypted content; and some encrypted media hosts may choose to incorporate these enhancements directly into their services.

Use cases recently cited include:

3.1 Color Daltonization

Color daltonization involves shifting the color scheme to adapt to the needs of users with atypical color perception. For instance, someone with red-green color blindness could view a video stream that has been color-shifted to take advantage of a color palette that is more easily distinguishable for them. Tools for daltonization of static images on the Web already exist at the browser level. Color daltonization of video requires real-time processing of video frames and may be best processed at the operating system driver level, rather than at the browser level. But as capabilities for daltonization extend from static images to video streams, these enhancements could be applied by arrangement with hosts of encrypted content, or in some cases along with other post-processing adaptations of decrypted content.
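To illustrate the kind of per-frame work that real-time color shifting implies, here is a minimal sketch that applies a 3x3 color matrix to every pixel of an RGBA frame buffer. It is not a validated daltonization transform: `applyColorMatrix` is a hypothetical helper, and any matrix passed to it would have to come from color-vision research rather than from this example.

```typescript
// Apply a 3x3 color matrix (row-major, 9 numbers) to an RGBA frame.
// The alpha channel is copied unchanged. Uint8ClampedArray clamps each
// result to the 0..255 range and rounds it to an integer.
function applyColorMatrix(
  rgba: Uint8ClampedArray,
  matrix: ReadonlyArray<number>
): Uint8ClampedArray {
  if (matrix.length !== 9) {
    throw new RangeError("matrix must contain exactly 9 numbers");
  }
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    const r = rgba[i];
    const g = rgba[i + 1];
    const b = rgba[i + 2];
    out[i] = matrix[0] * r + matrix[1] * g + matrix[2] * b;
    out[i + 1] = matrix[3] * r + matrix[4] * g + matrix[5] * b;
    out[i + 2] = matrix[6] * r + matrix[7] * g + matrix[8] * b;
    out[i + 3] = rgba[i + 3]; // alpha is left as-is
  }
  return out;
}
```

The cost of this loop grows with frame size and frame rate, which is one reason such processing may be better placed lower in the rendering stack than in page script.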

3.2 Flash Mitigation

Exposure to flashing lights at certain intensities or to certain visual patterns can trigger discomfort (for instance headaches) or seizures in some individuals. Flash mitigation involves detecting and buffering rapid changes in the contrast and brightness of colors and areas on a web page in order to mitigate potential triggers in people with photosensitive seizure disorders. For instance, an animated GIF that rotates rapidly between contrasting colors, or a video stream that includes a strobing sequence, might trigger seizures in different people depending on their sensitivities. Just as with color daltonization, this requires real-time processing of video frames that may be best processed at the operating system driver level. But the existence of encryption in no way blocks research on flash mitigation. Application of these enhancements to encrypted content would be a question of arranging permissions with the host of the encrypted media, if the host did not directly incorporate these.
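The detection half of flash mitigation can be sketched as follows, assuming each frame has already been reduced to a single average luminance value between 0 and 1. The function name, the 0.1 luminance step, and the limit of three flashes per second are illustrative assumptions loosely modeled on published flash guidance; this is a simplification, not a conformance test.

```typescript
// Returns true if the luminance series contains more than `maxFlashes`
// flashes inside any window shorter than one second. A "flash" is counted
// here as a pair of opposing luminance changes of at least `minDelta`
// between consecutive frames (a simplifying assumption).
function exceedsFlashLimit(
  luma: number[],
  fps: number,
  minDelta: number = 0.1,
  maxFlashes: number = 3
): boolean {
  // Record the frame index of each large change that reverses direction.
  const transitions: number[] = [];
  let lastDirection = 0;
  for (let i = 1; i < luma.length; i++) {
    const delta = luma[i] - luma[i - 1];
    if (Math.abs(delta) < minDelta) {
      continue;
    }
    const direction = delta > 0 ? 1 : -1;
    if (direction !== lastDirection) {
      transitions.push(i);
      lastDirection = direction;
    }
  }
  // maxFlashes + 1 flashes require twice that many opposing transitions.
  const needed = 2 * (maxFlashes + 1);
  for (let start = 0; start + needed - 1 < transitions.length; start++) {
    const spanInFrames = transitions[start + needed - 1] - transitions[start];
    if (spanInFrames < fps) {
      return true;
    }
  }
  return false;
}
```

A mitigation step would then act on the flagged segments, for instance by smoothing luminance over time; that part is outside the scope of this sketch.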

3.3 Time Scale Modification

Time scale modification involves slowing down or speeding up recorded audio and/or video content. This could be an accessibility accommodation in the case of someone who needs a spoken version of text at either a faster or slower speed than the original recording. Slower speech could for instance support improved comprehension by someone with an auditory processing disorder, while faster speech could support more efficient processing of audio content for someone who is used to listening to audio content at high speed in screen readers or audiobooks. Again there is nothing to block research on time scale modification in open video streams, and application of these could be arranged by permission if the host did not directly incorporate the feature.
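The simplest user-facing form of this adaptation is a playback speed control. The sketch below assumes a player object that exposes a writable `playbackRate` property, as HTML media elements do; the `RateControllable` interface, the function names, and the 0.25 to 4 range are hypothetical choices made for illustration, and any pitch correction is left to the underlying player.

```typescript
// Anything with a writable playbackRate, such as an HTML media element.
interface RateControllable {
  playbackRate: number;
}

// Assumed bounds for this sketch; real players may accept other ranges.
const MIN_RATE = 0.25;
const MAX_RATE = 4;

// Clamp the requested rate into the assumed range, apply it, and
// return the rate that was actually set.
function setPlaybackSpeed(media: RateControllable, requested: number): number {
  if (!isFinite(requested) || requested <= 0) {
    throw new RangeError("playback rate must be a positive, finite number");
  }
  const rate = Math.min(MAX_RATE, Math.max(MIN_RATE, requested));
  media.playbackRate = rate;
  return rate;
}

// How long a stretch of media takes to play at a given rate.
function wallClockSeconds(mediaSeconds: number, rate: number): number {
  return mediaSeconds / rate;
}
```

Because captions and text descriptions are keyed to media time rather than wall-clock time, they stay synchronized with the content when the rate changes.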

3.4 Automatic Generation of Captions

Improvements in auto-generation of captions, and human-assisted correction of such draft captions, can be researched without barriers on open video streams. For encrypted media, in some cases the provider of auto-generated captioning services is the same as the host of encrypted media, facilitating access to auto-generation services for those media files. If the auto-generated captioning and hosting are provided by different organizations, access for auto-generation services and for human correction of such draft captions can be arranged by permissions as with current captioning services. In the case that the captions provided with encrypted media are considered to be of inadequate quality, the host of the media might need encouragement to provide captions of an appropriate quality in places where there are not already requirements for this, or otherwise to enable alternative captions to be viewed along with the media. Because the time position of an HTML5 video is unencrypted, an alternative caption track can be associated with video files in place of the existing one(s).
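Because the playback position is available to script, choosing which alternative caption to display reduces to a lookup by time. The following is a minimal sketch, assuming the alternative captions are held as plain objects; `AltCaption` and `captionsAt` are hypothetical names, not part of any specification.

```typescript
// Hypothetical alternative caption: active from start (inclusive)
// to end (exclusive), both in seconds of media time.
interface AltCaption {
  start: number;
  end: number;
  text: string;
}

// Return the text of every caption that is active at the given time,
// in the order the captions were supplied.
function captionsAt(captions: AltCaption[], currentTime: number): string[] {
  return captions
    .filter((c) => c.start <= currentTime && currentTime < c.end)
    .map((c) => c.text);
}
```

In a browser, the time value would typically be read from the `currentTime` property of the video element, for instance in a `timeupdate` event handler, and the returned text rendered in an overlay next to or on top of the video.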

3.5 Other use cases

For other use cases that might emerge as interest increases in potential accessibility innovations in online media, we encourage people to raise these on the WAI Interest Group list for discussion. The Research Questions Task Force captures issues needing further investigation, and has already captured the issues of color daltonization and flash mitigation of streaming video. For those interested in helping research or implement accessibility innovations on the Web, please let us know.