During an Advisory Committee review of the Encrypted Media Extensions (EME) specification, as part of its progression through the Recommendation process, concerns were raised about accessibility for people with disabilities: the specification could restrict access to online media content and its associated accessibility information. This document lists the results of our investigations into those concerns.
In 2015, W3C published the most comprehensive list of Media Accessibility User Requirements we have ever produced and, consequently, work on the HTML video element has focused on addressing those user requirements. Those requirements provide support for existing solutions, including captions, audio descriptions of video, and transcripts.
The Encrypted Media Extensions specification addresses some of the user requirements in section 8.5.3, Unencrypted In-band Support Content:
In-band support content, such as captions, described audio, and transcripts, SHOULD NOT be encrypted.
Implementations that choose to support encrypted support content MUST provide the decrypted data to the user agent to be processed in the same way as equivalent unencrypted timed text tracks.
For web-based video, closed captions (as well as enhanced subtitles and transcripts) are transmitted as text data along with the video and audio streams and are not visible until the user elects to turn them on. The EME specification notes that encrypted text data is not generally supported by implementations, which is the case for captions and transcripts as well. In fact, based on our own experiments, while it is theoretically possible to encrypt text streams, current support seems absent or defective (see webm_crypt or MP4Box). Given the current state of implementations for text streams (encrypted or not) and the recommendations in the EME specification, it is best to use a separate text file and to associate it with the video content using a timed text track element (or to use an approach like MP4Box.js). It is not possible to encrypt a separate text file. This approach also allows for audio synthesis of audio descriptions when using text track cues. Combining encrypted video/audio content with unencrypted captions does not impede the overlay display of the text track or its audio synthesis.
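As an illustration of the separate-text-file approach, the following is a minimal sketch that writes caption cues into a WebVTT file that could sit next to the (possibly encrypted) video. The function names format_timestamp and to_webvtt and the cue tuples are hypothetical, introduced only for this example; they are not part of EME or of any tool mentioned above.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues):
    """Build the text of a WebVTT file from (start, end, text) cue tuples.

    The cue structure is an assumption made for this sketch; times are
    expressed in seconds from the start of the media.
    """
    blocks = ["WEBVTT"]  # mandatory file signature
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Cues are separated from the header and from each other by blank lines.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_webvtt([(1.0, 4.0, "[Door creaks open]"),
                     (4.5, 7.0, "Who is there?")]))
```

The resulting file would typically be referenced from a track element placed inside the video element (with its kind, src, and srclang attributes set), so the user agent handles the captions as an ordinary unencrypted text track.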
Open captions are captions that have been merged with the video track and cannot be turned off. As such, they are part of the video stream and cannot be accessed as text, whether the video stream is encrypted or not.
Described video traditionally consists of audio recordings timed to fit into natural pauses in the program. It is an additional narration track intended primarily for blind and visually impaired consumers of visual media. As an alternate audio stream, it can be encrypted (within a single media container or as a separate one) and, per the EME specification, implementations are then expected to decrypt it like any other audio stream.
Additional use cases, such as seizure disorders or atypical color perception, would need further investigation. The procedures or conditions described below would require real-time processing of the video frames to improve the accessibility of the video content and may significantly impact the performance of the video engine. Note that, independently of the EME specification, those issues are also relevant for unencrypted video and for animated images and content (through CSS and/or JS). While some operating systems allow for limited color contrast adjustments, further studies would be needed to establish when and how the processing of colors ought to happen (such as at the OS driver level, at the user agent level, or in extra devices) and its impact on Web technologies, if any; and likewise for flash mitigation.
Daltonization is a procedure for recoloring an image so that it can be viewed more easily by a person with a color vision deficiency, such as red-green color blindness.
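To give a sense of the per-pixel work involved, here is a rough sketch of one common daltonization recipe for protanopia (a form of red-green color blindness): simulate how the pixel is perceived, then shift the lost information into channels the viewer can still distinguish. The matrix coefficients are the ones widely circulated in open-source daltonization code and are reproduced here as assumptions, not as authoritative values; the sketch also works directly on 8-bit RGB values without linearization, which is a simplification.

```python
# ASSUMED coefficients (commonly seen in open-source daltonization code).
RGB_TO_LMS = [
    [17.8824, 43.5161, 4.11935],
    [3.45565, 27.1554, 3.86714],
    [0.0299566, 0.184309, 1.46709],
]
PROTANOPIA = [  # replaces the missing L response with a mix of M and S
    [0.0, 2.02344, -2.52581],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
ERROR_SHIFT = [  # moves the lost red information into green and blue
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-component vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def inverse_3x3(m):
    """Invert a 3x3 matrix using the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


LMS_TO_RGB = inverse_3x3(RGB_TO_LMS)


def daltonize(rgb):
    """Recolor one (r, g, b) pixel, 0-255 per channel, for protanopia."""
    lms = mat_vec(RGB_TO_LMS, rgb)
    simulated = mat_vec(LMS_TO_RGB, mat_vec(PROTANOPIA, lms))
    error = [o - s for o, s in zip(rgb, simulated)]  # what the viewer loses
    shift = mat_vec(ERROR_SHIFT, error)
    return tuple(max(0, min(255, round(o + s))) for o, s in zip(rgb, shift))


if __name__ == "__main__":
    print(daltonize((255, 0, 0)))      # pure red gains green and blue
    print(daltonize((128, 128, 128)))  # neutral gray is left unchanged
```

Applying this to every pixel of every frame is what makes the real-time cost a concern, independently of whether the video is encrypted.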
Exposure to flashing lights at certain intensities or to certain visual patterns can trigger discomfort (such as headaches) or seizures in some individuals. Generally, flashing lights are most likely to trigger seizures at frequencies between 5 and 30 flashes per second (Hertz); this effect can potentially be mitigated server-side, but doing so would require more research.
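A server-side check along these lines could, for instance, estimate the flash rate from per-frame luminance before content is published. The sketch below is a simplified illustration under stated assumptions: each frame is reduced to one mean luminance value between 0 and 1, a flash is counted as a pair of opposing luminance changes of at least 0.1, and the 5 to 30 Hz band is taken from the paragraph above. The function names are hypothetical.

```python
def count_flashes(luminance, threshold=0.1):
    """Count flashes in a sequence of per-frame mean luminance values (0..1).

    A flash is taken to be a pair of opposing luminance changes, each of at
    least `threshold` (an assumed value for this sketch).
    """
    if not luminance:
        return 0
    transitions = 0
    direction = 0            # +1 rising, -1 falling, 0 not yet known
    extreme = luminance[0]   # last local extreme seen
    for value in luminance[1:]:
        delta = value - extreme
        if direction != 0 and delta * direction > 0:
            extreme = value  # still moving the same way: extend the extreme
        elif abs(delta) >= threshold:
            direction = 1 if delta > 0 else -1
            transitions += 1
            extreme = value
    return transitions // 2  # two opposing transitions make one flash


def flash_rate(luminance, fps, threshold=0.1):
    """Average number of flashes per second over the whole sequence."""
    if not luminance or fps <= 0:
        return 0.0
    duration = len(luminance) / fps
    return count_flashes(luminance, threshold) / duration


def in_risk_band(rate, low=5.0, high=30.0):
    """True when the rate falls in the 5 to 30 flashes-per-second band."""
    return low <= rate <= high


if __name__ == "__main__":
    strobe = [float(i % 2) for i in range(30)]  # one second at 30 fps
    rate = flash_rate(strobe, fps=30)
    print(rate, in_risk_band(rate))
```

A production check would need a sliding time window, the flashing area on screen, and color-specific (for example saturated red) transitions, which this sketch deliberately leaves out.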