Well-deployed technologies

HTML5 adds two tags that dramatically improve the integration of multimedia content on the Web: the <video> and <audio> tags. Respectively, these tags allow embedding video and audio content, and make it possible for Web developers to interact much more freely with that content than they would through plug-ins. They make multimedia content first-class citizens of the Web, the same way images have been for the past 20 years.

The playback content can be streamed, augmented and completed via Media Source Extensions that lets developers buffer and generate media content in JavaScript, thus allowing Web application developers to create libraries that can handle adaptive streaming formats and protocols.

For the distribution of media whose content needs specific protection from copy, Encrypted Media Extensions (EME) enables Web applications to render encrypted media streams based on Content Decryption Modules (CDM).

While the new HTML5 tags allow to play multimedia content, the HTML Media Capture defines a markup-based mechanism to access captured multimedia content using attached camera and microphones, a very common feature on mobile devices. Direct manipulation of streams from camera and microphones is possible through the Media Capture and Streams API.

The Canvas 2D Context API enables modifying images, which in turn opens up the possibility of video editing, thus bringing multimedia manipulation capabilities to the Web platform.

FeatureSpecificationMaturityCurrent implementations
Video playbackvideo element in HTML 5.1REC

Shipped
shipped in firefoxshipped in chromeshipped in edgeshipped in safari

Audio playbackaudio element in HTML 5.1REC

Shipped
shipped in firefoxshipped in chromeshipped in edgeshipped in safari

Generation of media contentMedia Source Extensions™REC

Shipped
shipped in firefoxshipped in chromeshipped in edgeshipped in safari

Protected content playbackEncrypted Media ExtensionsREC

Shipped
shipped in chromeshipped in edgeshipped in firefoxshipped in safari

Capturing audio/videoHTML Media CapturePR
Media Capture and StreamsCR

Shipped
shipped in firefoxshipped in chromeshipped in edgeshipped in safari

Image & Video analysis, modificationHTML Canvas 2D ContextREC

Shipped
shipped in firefoxshipped in chromeshipped in edgeshipped in safari

Technologies in progress

Beyond the declarative approach enabled by the <audio> element, the Web Audio API provides a full-fledged audio processing API, which includes support for low-latency playback of audio content.

As users increasingly own more and more connected devices, the need to get these devices to work together increases as well:

  • The Presentation API offers the possibility for a Web page to open and control a page located on another screen from a mobile device, opening the road for multi-screen Web applications.
  • The Remote Playback API focuses more specifically on controlling the rendering of media on a separate device.
  • The Audio Output Devices API offers similar functionality for audio streams, enabling a Web application to pick on which audio output devices a given sound should be played on.

The Web Real-Time Communications Working Group is building an API to record streams from camera and microphones into files, and another API to use access to cameras to take photos programatically.

The Web Real-Time Communications Working Group is the host of specifications for a wider set of communication opportunities:

  • Peer-to-peer connection across devices,
  • P2P Audio and video streams allowing for real-time communications between users.
FeatureSpecificationMaturityCurrent implementations
Audio playbackWeb Audio APIWD

In development
indevelopment in safari

Shipped
shipped in firefoxshipped in chromeshipped in edge

Distributed renderingPresentation APICR

In development
indevelopment in firefoxindevelopment in safari

Shipped
shipped in chrome

Remote Playback APICR

Under consideration
consideration in firefox

In development
indevelopment in safari

Shipped
shipped in chrome

Audio Output Devices APICR

Under consideration
consideration in edge

Shipped
shipped in chrome

Capturing audio/videoMediaStream RecordingWD

Under consideration
consideration in edge

Shipped
shipped in chromeshipped in firefox

"MediaStream Image Capture"WD

In development
indevelopment in firefox

Shipped
shipped in chrome

P2P connections and audio/video streamsWebRTC 1.0: Real-time Communication Between BrowsersCR

Shipped
shipped in firefoxshipped in chromeshipped in edgeshipped in safari

Exploratory work

Mobile devices often expose shortcuts to handle the audio output of a main application (e.g. a music player) from a lock screen or the notification areas. The underlying operating system is in charge of determining which of these applications should have the media focus. The Media session specification would expose these changes of focus to Web applications.

The Multi-Device Timing Community Group is exploring another aspect of multi-device media rendering: its Timing Object specification enables to keep video, audio and other data streams in close synchrony, across devices and independently of the network topology. This effort needs support from interested parties to progress.

The Picture in Picture proposal would allow applications to initiate and control the rendering of a video in a separate miniature window that is viewable above all other activities.

To improve the interoperability of implementations of the Presentation API and Remote Playback API, in particular between the first and second screen, the Second Screen Community Group is discussing requirements for an Open Screen Protocol.

New mobile screens can render content in high resolution using a broader color space beyond the classical sRGB color space. To adapt to wide-gamut displays, all the graphical systems of the Web will need to adapt to these broader color spaces. CSS Colors Level 4 is proposing to define CSS colors in color spaces beyond the classical sRGB. Similarly, work on making canvas color-managed should enhance the support for colors in HTML Canvas.

The WebVR specification is a low-level API that allows applications to access and control head-mounted displays (HMD) using JavaScript. It is a critical enabler to render 360° video content in Virtual Reality headsets and in mobile devices used as such.

Mobile devices have widely heterogeneous decoding (and encoding) capabilities. To improve the user experience and take advantage of advanced device capabilities when they are available, media providers e.g. need to know whether the user's device can decode a particular codec at a given resolution, bitrate and framerate. Will the playback be smooth and power efficient? Can the display render HDR and wide color gamut content? The Media Capabilities specification defines an API to expose that information, with a view to replacing the more basic and vague isTypeSupported() and canPlayType() functions defined in HTML.

Media providers also need some mechanism to assess the user's perceived playback quality to alter the quality of content transmitted using adaptive streaming. The Media Playback Quality specification, initially part of Media Source Extensions, exposes metrics on the number of frames that were displayed or dropped.

Video processing using the Canvas API is very CPU-intensive. Beyond traditional video processing, modern GPUs often provide advanced vision processing capabilities (e.g. face and objects recognition) that would have direct applicability e.g. in augmented reality applications. The Shape Detection API is exploring this space.

Features not covered by ongoing work

Color Management
To ensure the proper rendering of videos with high-dynamic range (HDR) and wide-gamut colors, content providers would need to determine whether the underlying device and browser have proper support for this. Similarly, content providers need a mechanism to match colors to mix HDR content and Standard Dynamic Range (SDR) content. The Color on the Web Community Group allows color experts from various fields to share ideas and discuss technical solutions to improve the state of Color on the Web.
Native support for 360° video rendering
While it is already possible to render 360° videos within a <video> element, integrated support for the rendering of 360° videos would allow to hide the complexity of the underlying adaptive streaming logic to applications, letting Web browsers optimize streaming and rendering on their own.
The Canvas API provide capabilities to do image and video processing, but these capabilities are limited by their reliance on the CPU for execution; modern GPUs provide hardware-acceleration for a wide range of operations, but the browsers don't provide hooks to these. The GPU for the Web Community Group is discussing solutions to expose GPU computation functionality to Web applications, which could eventually allow web applications to process video streams efficiently, taking advantage of the GPU power.

Discontinued features

Network service discovery
The Network Service Discovery API was to offer a lower-level approach to the establishment of multi-device operations, by providing integration with local network-based media renderers, such as those enabled by DLNA, UPnP, etc. This effort was discontinued out of privacy concerns and lack of interest from implementers. The current approach is to let the user agent handle network discovery under the hoods, as done in the Presentation API and Remote Playback API.