1. Introduction
This section is non-normative.

This specification relies on exposing the following sets of properties:
- An API to query the user agent with regards to the decoding and encoding abilities of the device based on information such as the codecs, profile, resolution, bitrates, etc. The API exposes information such as whether the playback should be smooth and power efficient. The intent of the decoding capabilities API is to provide a powerful replacement to APIs such as isTypeSupported() or canPlayType(), which are vague and mostly help the callers know if something cannot be decoded, but not how well it should perform.
- Better information about the display properties, such as supported color gamut or dynamic range abilities, in order to pick the right content for the display and avoid providing HDR content to an SDR display.
- Real-time feedback about the playback, so an adaptive streaming website can alter the quality of the content based on actual user-perceived quality. Such information will allow websites to react to a spike of CPU/GPU usage in real time. It is expected that this will be tackled as part of the [media-playback-quality] specification.
2. Decoding and Encoding Capabilities
2.1. Media Configurations
2.1.1. MediaConfiguration
dictionary MediaConfiguration {
  VideoConfiguration video;
  AudioConfiguration audio;
};

dictionary MediaDecodingConfiguration : MediaConfiguration {
  required MediaDecodingType type;
  MediaCapabilitiesKeySystemConfiguration keySystemConfiguration;
};

dictionary MediaEncodingConfiguration : MediaConfiguration {
  required MediaEncodingType type;
};
The input to the decoding capabilities is represented by a MediaDecodingConfiguration dictionary and the input of the encoding
capabilities by a MediaEncodingConfiguration dictionary.
For a MediaConfiguration to be a valid MediaConfiguration, all of the following conditions MUST be true:

- audio and/or video MUST be present.
- audio MUST be a valid audio configuration if present.
- video MUST be a valid video configuration if present.
For a MediaDecodingConfiguration to be a valid MediaDecodingConfiguration, all of the following conditions MUST be true:

- It MUST be a valid MediaConfiguration.
- If keySystemConfiguration is present:

For a MediaDecodingConfiguration to describe [ENCRYPTED-MEDIA], a keySystemConfiguration MUST be present.
2.1.2. MediaDecodingType
enum MediaDecodingType {
  "file",
  "media-source",
};
A MediaDecodingConfiguration has two types:

- file is used to represent a configuration that is meant to be used for plain file playback.
- media-source is used to represent a configuration that is meant to be used for playback of a MediaSource as defined in the [media-source] specification.
2.1.3. MediaEncodingType
enum MediaEncodingType {
  "record",
  "transmission"
};
A MediaEncodingConfiguration can have one of two types:

- record is used to represent a configuration for recording of media, e.g. using MediaRecorder as defined in [mediastream-recording].
- transmission is used to represent a configuration meant to be transmitted over electronic means (e.g. using RTCPeerConnection as defined in [webrtc]).
2.1.4. MIME types
In the context of this specification, a MIME type is also called content
type. A valid media MIME type is a string that is a valid
MIME type per [mimesniff]. If the MIME type does not imply a
codec, the string MUST also have one and only one parameter that is
named codecs with a value describing a single media codec.
Otherwise, it MUST contain no parameters.
A valid audio MIME type is a string that is a valid media
MIME type and for which the type per [RFC7231] is
either audio or application.
A valid video MIME type is a string that is a valid media
MIME type and for which the type per [RFC7231] is
either video or application.
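As an illustration, the rules above can be approximated in script. The sketch below is not the [mimesniff] parsing algorithm and does not model MIME types whose subtype already implies a codec, so treat it purely as an approximation of the intent:

```javascript
// Illustrative approximation only: real user agents parse MIME types per
// [mimesniff]. Returns the top-level type string, or null if invalid.
function parseMediaMIMEType(s) {
  const [essence, ...params] = s.split(';').map((part) => part.trim());
  const match = /^([a-z]+)\/[a-z0-9.+-]+$/i.exec(essence);
  if (!match) return null;
  // At most one parameter, and it must name a single codec via `codecs`.
  if (params.length > 1) return null;
  if (params.length === 1 && !/^codecs=("?)[^,"]+\1$/.test(params[0])) {
    return null;
  }
  return match[1].toLowerCase();
}

const isValidVideoMIMEType = (s) =>
  ['video', 'application'].includes(parseMediaMIMEType(s));
const isValidAudioMIMEType = (s) =>
  ['audio', 'application'].includes(parseMediaMIMEType(s));
```

Note how a content type carrying two codecs (e.g. a muxed audio+video file) is rejected: each track is described by its own configuration with a single codec.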
2.1.5. VideoConfiguration
dictionary VideoConfiguration {
  required DOMString contentType;
  required unsigned long width;
  required unsigned long height;
  required unsigned long long bitrate;
  required double framerate;
  boolean hasAlphaChannel;
  HdrMetadataType hdrMetadataType;
  ColorGamut colorGamut;
  TransferFunction transferFunction;
};
The contentType member
represents the MIME type of the video track.
To check if a VideoConfiguration configuration is a valid video configuration, the following steps MUST be run:

1. If configuration’s contentType is not a valid video MIME type, return false and abort these steps.
2. If framerate is not finite or is not greater than 0, return false and abort these steps.
3. Return true.
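The steps above can be sketched as follows; isValidVideoMIMEType here is a simplified stand-in for the §2.1.4 check, not the real [mimesniff]-based parsing:

```javascript
// Simplified stand-in for the "valid video MIME type" check of §2.1.4.
function isValidVideoMIMEType(contentType) {
  return /^(video|application)\/[a-z0-9.+-]+(\s*;\s*codecs=.+)?$/i.test(contentType);
}

// Sketch of the "valid video configuration" steps.
function isValidVideoConfiguration(configuration) {
  // Step 1: contentType must be a valid video MIME type.
  if (!isValidVideoMIMEType(configuration.contentType)) return false;
  // Step 2: framerate must be finite and greater than 0.
  if (!Number.isFinite(configuration.framerate) || configuration.framerate <= 0) {
    return false;
  }
  // Step 3: all checks passed.
  return true;
}
```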
The width and height members represent
respectively the visible horizontal and vertical encoded pixels in the
encoded video frames.
The bitrate member
represents the average bitrate of the video track given in units of bits
per second. In the case of a video stream encoded at a constant bit rate
(CBR) this value should be accurate over a short term window. For the
case of variable bit rate (VBR) encoding, this value should be usable to
allocate any necessary buffering and throughput capability to
provide for the uninterrupted decoding of the video stream over the
long-term based on the indicated contentType.
The framerate member
represents the framerate of the video track. The framerate is the number
of frames used in one second (frames per second). It is represented as a
double.
The hasAlphaChannel member
represents whether the video track contains alpha channel information. If
true, the encoded video stream can produce per-pixel alpha channel information
when decoded. If false, the video stream cannot produce per-pixel alpha channel
information when decoded. If undefined, the UA should determine whether the
video stream encodes alpha channel information based on the indicated contentType, if possible. Otherwise, the UA should
presume that the video stream cannot produce alpha channel information.
If present, the hdrMetadataType member represents that the video track includes the specified HDR
metadata type, which the UA needs to be capable of interpreting for tone
mapping the HDR content to a color volume and luminance of the output
device. Valid inputs are defined by HdrMetadataType.
If present, the colorGamut member represents that the video track is delivered in the specified
color gamut, which describes a set of colors in which the content is
intended to be displayed. If the attached output device also supports
the specified color, the UA needs to be able to cause the output device
to render the appropriate color, or something close enough. If the
attached output device does not support the specified color, the UA
needs to be capable of mapping the specified color to a color supported
by the output device. Valid inputs are defined by ColorGamut.
If present, the transferFunction member represents that the video track requires the specified transfer
function to be understood by the UA. Transfer function describes the
electro-optical algorithm supported by the rendering capabilities of a
user agent, independent of the display, to map the source colors in the
decoded media into the colors to be displayed. Valid inputs are defined
by TransferFunction.
2.1.6. HdrMetadataType
enum HdrMetadataType {
  "smpteSt2086",
  "smpteSt2094-10",
  "smpteSt2094-40"
};
If present, HdrMetadataType describes the capability to interpret HDR metadata
of the specified type.
The VideoConfiguration may contain one of the following types:

- smpteSt2086, representing the static metadata type defined by [SMPTE-ST-2086].
- smpteSt2094-10, representing the dynamic metadata type defined by [SMPTE-ST-2094].
- smpteSt2094-40, representing the dynamic metadata type defined by [SMPTE-ST-2094].
2.1.7. ColorGamut
enum ColorGamut {
  "srgb",
  "p3",
  "rec2020"
};
The VideoConfiguration may contain one of the following types:

- srgb, representing the [sRGB] color gamut.
- p3, representing the DCI P3 color gamut. This color gamut includes the srgb gamut.
- rec2020, representing the ITU-R Recommendation BT.2020 color gamut. This color gamut includes the p3 gamut.
2.1.8. TransferFunction
enum TransferFunction {
  "srgb",
  "pq",
  "hlg"
};
The VideoConfiguration may contain one of the following types:

- srgb, representing the transfer function defined by [sRGB].
- pq, representing the "Perceptual Quantizer" transfer function defined by [SMPTE-ST-2084].
- hlg, representing the "Hybrid Log Gamma" transfer function defined by BT.2100.
2.1.9. AudioConfiguration
dictionary AudioConfiguration {
  required DOMString contentType;
  DOMString channels;
  unsigned long long bitrate;
  unsigned long samplerate;
  boolean spatialRendering;
};
The contentType member
represents the MIME type of the audio track.
To check if an AudioConfiguration configuration is a valid audio configuration, the following steps MUST be run:

1. If configuration’s contentType is not a valid audio MIME type, return false and abort these steps.
2. Return true.
The channels member
represents the audio channels used by the audio track.
The channels needs to be defined as a double (2.1, 4.1, 5.1, ...), an unsigned short (number of channels) or as an enum value. The current
definition is a placeholder.
The bitrate member
represents the average bitrate of the audio track. The bitrate
is the number of bits used to encode a second of the audio track.
The samplerate member represents the samplerate of the audio track. The samplerate is the
number of samples of audio carried per second.

The samplerate is expressed in Hz (i.e. number of samples of audio per second). Samplerate
values are sometimes expressed in kHz, which represents the number of
thousands of samples of audio per second; 44100 Hz is equivalent to 44.1 kHz.
The spatialRendering member indicates that the audio SHOULD be rendered spatially. The
details of spatial rendering SHOULD be inferred from the contentType. If not present, the UA MUST
presume spatialRendering is not required. When true, the
user agent SHOULD only report this configuration as supported if it can support spatial
rendering for the current audio output device without falling back to a
non-spatial mix of the stream.
2.1.10. MediaCapabilitiesKeySystemConfiguration
dictionary MediaCapabilitiesKeySystemConfiguration {
  required DOMString keySystem;
  DOMString initDataType = "";
  MediaKeysRequirement distinctiveIdentifier = "optional";
  MediaKeysRequirement persistentState = "optional";
  sequence<DOMString> sessionTypes;
  KeySystemTrackConfiguration audio;
  KeySystemTrackConfiguration video;
};
This dictionary refers to a number of types defined by [ENCRYPTED-MEDIA] (EME). Sequences of EME types are
flattened to a single value whenever the intent of the sequence was to
have requestMediaKeySystemAccess() choose a subset it supports.
With MediaCapabilities, callers provide the sequence across multiple
calls, ultimately letting the caller choose which configuration to use.
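For example, to probe several robustness levels a caller builds one configuration per combination and queries each in turn. The key system name and robustness strings below are illustrative placeholders, not values defined by this specification:

```javascript
// Since keySystemConfiguration holds single values where EME takes sequences,
// a caller probes one combination per decodingInfo() call.
const robustnessLevels = ['SW_SECURE_DECODE', 'HW_SECURE_ALL']; // illustrative

const configurations = robustnessLevels.map((robustness) => ({
  type: 'media-source',
  video: {
    contentType: 'video/mp4; codecs=avc1.640028',
    width: 1920, height: 1080, bitrate: 2646242, framerate: 30,
  },
  keySystemConfiguration: {
    keySystem: 'com.example.drm',  // hypothetical key system name
    initDataType: 'cenc',
    video: { robustness },
  },
}));

// Each entry would then be passed to
// navigator.mediaCapabilities.decodingInfo(...) in turn, and the caller
// picks among the configurations reported as supported.
```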
The keySystem member represents a keySystem name as described in [ENCRYPTED-MEDIA].
The initDataType member represents a single value from the initDataTypes sequence
described in [ENCRYPTED-MEDIA].
The distinctiveIdentifier member represents a distinctiveIdentifier requirement as
described in [ENCRYPTED-MEDIA].
The persistentState member represents a persistentState requirement as described in [ENCRYPTED-MEDIA].
The sessionTypes member represents a sequence of required sessionTypes as
described in [ENCRYPTED-MEDIA].
The audio member
represents a KeySystemTrackConfiguration associated with the AudioConfiguration.
The video member
represents a KeySystemTrackConfiguration associated with the VideoConfiguration.
2.1.11. KeySystemTrackConfiguration
dictionary KeySystemTrackConfiguration {
  DOMString robustness = "";
  DOMString? encryptionScheme = null;
};
The robustness member represents a robustness level as described in [ENCRYPTED-MEDIA].
The encryptionScheme member represents an encryptionScheme as described in [ENCRYPTED-MEDIA-DRAFT].
2.2. Media Capabilities Information
dictionary MediaCapabilitiesInfo {
  required boolean supported;
  required boolean smooth;
  required boolean powerEfficient;
};

dictionary MediaCapabilitiesDecodingInfo : MediaCapabilitiesInfo {
  required MediaKeySystemAccess keySystemAccess;
  MediaDecodingConfiguration configuration;
};

dictionary MediaCapabilitiesEncodingInfo : MediaCapabilitiesInfo {
  MediaEncodingConfiguration configuration;
};
A MediaCapabilitiesInfo has associated supported, smooth, powerEfficient fields which are
booleans.
Authors can use powerEfficient in concordance
with the Battery Status API [battery-status] in order to determine
whether the media they would like to play is appropriate for the user
configuration. It is worth noting that even when a device is not power
constrained, high power usage has side effects such as increasing the
temperature or fan noise.
A MediaCapabilitiesDecodingInfo has associated keySystemAccess which is a MediaKeySystemAccess or null as appropriate.
A MediaCapabilitiesDecodingInfo has an associated configuration which
is the decoding configuration properties used to generate the MediaCapabilitiesDecodingInfo.
A MediaCapabilitiesEncodingInfo has an associated configuration which
is the encoding configuration properties used to generate the MediaCapabilitiesEncodingInfo.
If the encrypted decoding configuration is supported, the
resulting MediaCapabilitiesDecodingInfo will include a MediaKeySystemAccess. Authors may use this to create MediaKeys and set up encrypted playback.
2.3. Algorithms
2.3.1. Create a MediaCapabilitiesEncodingInfo
Given a MediaEncodingConfiguration configuration, this
algorithm returns a MediaCapabilitiesEncodingInfo. The following steps are
run:
1. Let info be a new MediaCapabilitiesEncodingInfo instance. Unless stated otherwise, reading and writing apply to info for the next steps.
2. Set info’s configuration to be a new MediaEncodingConfiguration. For every property in configuration, create a new property with the same name and value in info’s configuration.
3. If the user agent is able to encode the media represented by configuration, set supported to true. Otherwise set it to false.
4. If the user agent is able to encode the media represented by configuration at a pace that allows encoding frames at the same pace as they are sent to the encoder, set smooth to true. Otherwise set it to false.
5. If the user agent is able to encode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false. The user agent SHOULD NOT take into consideration the current power source in order to determine the encoding power efficiency unless the device’s power source has side effects such as enabling different encoding modules.
6. Return info.
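The algorithm can be sketched as follows; the ua.* hooks are hypothetical stand-ins for the user agent's internal knowledge of its encoders, not real APIs:

```javascript
// Sketch of "Create a MediaCapabilitiesEncodingInfo". The ua argument
// bundles hypothetical hooks into the user agent's encoder knowledge.
function createMediaCapabilitiesEncodingInfo(configuration, ua) {
  const info = {};
  // Step 2: copy every property of the input into info's configuration.
  info.configuration = { ...configuration };
  // Step 3: can the media be encoded at all?
  info.supported = ua.canEncode(configuration);
  // Step 4: can frames be encoded at the pace they are sent to the encoder?
  info.smooth = ua.canEncodeInRealTime(configuration);
  // Step 5: can it be encoded in a power efficient manner?
  info.powerEfficient = ua.canEncodeEfficiently(configuration);
  return info;
}
```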
2.3.2. Create a MediaCapabilitiesDecodingInfo
Given a MediaDecodingConfiguration configuration, this
algorithm returns a MediaCapabilitiesDecodingInfo. The following
steps are run:
1. Let info be a new MediaCapabilitiesDecodingInfo instance. Unless stated otherwise, reading and writing apply to info for the next steps.
2. Set info’s configuration to be a new MediaDecodingConfiguration. For every property in configuration, create a new property with the same name and value in info’s configuration.
3. If configuration.keySystemConfiguration is present:
   1. Set keySystemAccess to the result of running the Check Encrypted Decoding Support algorithm with configuration.
   2. If keySystemAccess is not null, set supported to true. Otherwise set it to false.
4. Otherwise, run the following steps:
   1. Set keySystemAccess to null.
   2. If the user agent is able to decode the media represented by configuration, set supported to true.
   3. Otherwise, set it to false.
5. If the user agent is able to decode the media represented by configuration at a pace that allows a smooth playback, set smooth to true. Otherwise set it to false.
6. If the user agent is able to decode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false. The user agent SHOULD NOT take into consideration the current power source in order to determine the decoding power efficiency unless the device’s power source has side effects such as enabling different decoding modules.
7. Return info.
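A sketch of the decoding variant, highlighting the keySystemConfiguration branch. Here ua.checkEncryptedDecodingSupport stands in for the Check Encrypted Decoding Support algorithm and the other ua.* hooks for user-agent internals; none are real APIs:

```javascript
function createMediaCapabilitiesDecodingInfo(configuration, ua) {
  // Steps 1-2: new info carrying a copy of the input configuration.
  const info = { configuration: { ...configuration } };
  if (configuration.keySystemConfiguration !== undefined) {
    // Step 3: supported is tied to the Check Encrypted Decoding Support result.
    info.keySystemAccess = ua.checkEncryptedDecodingSupport(configuration);
    info.supported = info.keySystemAccess !== null;
  } else {
    // Step 4: clear (non-encrypted) content.
    info.keySystemAccess = null;
    info.supported = ua.canDecode(configuration);
  }
  // Steps 5-6: smoothness and power efficiency.
  info.smooth = ua.canDecodeSmoothly(configuration);
  info.powerEfficient = ua.canDecodeEfficiently(configuration);
  return info;
}
```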
2.3.3. Check Encrypted Decoding Support
Given a MediaDecodingConfiguration config with a keySystemConfiguration present, this algorithm returns a MediaKeySystemAccess or null as appropriate. The
following steps are run:
1. If the keySystem member of config.keySystemConfiguration is not one of the Key Systems supported by the user agent, return null. String comparison is case-sensitive.
2. Let origin be the origin of the calling context’s Document.
3. Let implementation be the implementation of config.keySystemConfiguration.keySystem.
4. Let emeConfiguration be a new MediaKeySystemConfiguration, and initialize it as follows:
   1. Set the initDataTypes attribute to a sequence containing config.keySystemConfiguration.initDataType.
   2. Set the distinctiveIdentifier attribute to config.keySystemConfiguration.distinctiveIdentifier.
   3. Set the persistentState attribute to config.keySystemConfiguration.persistentState.
   4. Set the sessionTypes attribute to config.keySystemConfiguration.sessionTypes.
   5. If audio is present in config, set the audioCapabilities attribute to a sequence containing a single MediaKeySystemMediaCapability, initialized as follows:
      1. Set the contentType attribute to config.audio.contentType.
      2. If config.keySystemConfiguration.audio is present:
         1. Set the robustness attribute to config.keySystemConfiguration.audio.robustness.
         2. Set the encryptionScheme attribute to config.keySystemConfiguration.audio.encryptionScheme.
   6. If video is present in config, set the videoCapabilities attribute to a sequence containing a single MediaKeySystemMediaCapability, initialized as follows:
      1. Set the contentType attribute to config.video.contentType.
      2. If config.keySystemConfiguration.video is present:
         1. Set the robustness attribute to config.keySystemConfiguration.video.robustness.
         2. Set the encryptionScheme attribute to config.keySystemConfiguration.video.encryptionScheme.
5. Let supported configuration be the result of executing the Get Supported Configuration algorithm on implementation, emeConfiguration, and origin.
6. If supported configuration is NotSupported, return null and abort these steps.
7. Let access be a new MediaKeySystemAccess object, and initialize it as follows:
   1. Set the keySystem attribute to config.keySystemConfiguration.keySystem.
   2. Let the configuration value be supported configuration.
   3. Let the cdm implementation value be implementation.
8. Return access.
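Step 4 above is essentially a data translation; it can be sketched as follows, using plain objects in place of the EME dictionaries:

```javascript
// Sketch of building the emeConfiguration that EME's Get Supported
// Configuration algorithm expects, from a MediaDecodingConfiguration
// (here `config`) whose keySystemConfiguration is present.
function buildEmeConfiguration(config) {
  const ksc = config.keySystemConfiguration;
  const emeConfiguration = {
    // Single values become one-element sequences (see the flattening note).
    initDataTypes: [ksc.initDataType],
    distinctiveIdentifier: ksc.distinctiveIdentifier,
    persistentState: ksc.persistentState,
    sessionTypes: ksc.sessionTypes,
  };
  if (config.audio !== undefined) {
    const capability = { contentType: config.audio.contentType };
    if (ksc.audio !== undefined) {
      capability.robustness = ksc.audio.robustness;
      capability.encryptionScheme = ksc.audio.encryptionScheme;
    }
    emeConfiguration.audioCapabilities = [capability];
  }
  if (config.video !== undefined) {
    const capability = { contentType: config.video.contentType };
    if (ksc.video !== undefined) {
      capability.robustness = ksc.video.robustness;
      capability.encryptionScheme = ksc.video.encryptionScheme;
    }
    emeConfiguration.videoCapabilities = [capability];
  }
  return emeConfiguration;
}
```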
2.4. Navigator and WorkerNavigator extension
[Exposed=Window]
partial interface Navigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

[Exposed=Worker]
partial interface WorkerNavigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};
2.5. Media Capabilities Interface
[Exposed=(Window, Worker)]
interface MediaCapabilities {
  [NewObject] Promise<MediaCapabilitiesDecodingInfo> decodingInfo(MediaDecodingConfiguration configuration);
  [NewObject] Promise<MediaCapabilitiesInfo> encodingInfo(MediaEncodingConfiguration configuration);
};
The decodingInfo() method MUST run the following steps:
1. If configuration is not a valid MediaDecodingConfiguration, return a Promise rejected with a newly created TypeError.
2. If configuration.keySystemConfiguration is present, run the following substeps:
   1. If the global object is of type WorkerGlobalScope, return a Promise rejected with a newly created DOMException whose name is InvalidStateError.
   2. If the result of running Is the environment settings object settings a secure context? [secure-contexts] with the global object’s relevant settings object is not "Secure", return a Promise rejected with a newly created DOMException whose name is SecurityError.
3. Let p be a new promise.
4. In parallel, run the Create a MediaCapabilitiesDecodingInfo algorithm with configuration and resolve p with its result.
5. Return p.
Note, calling decodingInfo() with a keySystemConfiguration present
may have user-visible effects, including requests for user consent. Such
calls should only be made when the author intends to create and use a MediaKeys object with the provided configuration.
The encodingInfo() method MUST run the following steps:

1. If configuration is not a valid MediaConfiguration, return a Promise rejected with a newly created TypeError.
2. Let p be a new promise.
3. In parallel, run the Create a MediaCapabilitiesEncodingInfo algorithm with configuration and resolve p with its result.
4. Return p.
3. Security and Privacy Considerations
This specification does not introduce any security-sensitive information or APIs, but it provides easier access to some information that can be used to fingerprint users.
3.1. Decoding/Encoding and Fingerprinting
The information exposed by the decoding/encoding capabilities can already be discovered via experimentation with the exception that the API will likely provide more accurate and consistent information. This information is expected to have a high correlation with other information already available to the web pages as a given class of device is expected to have very similar decoding/encoding capabilities. In other words, high end devices from a certain year are expected to decode some type of videos while older devices may not. Therefore, it is expected that the entropy added with this API isn’t going to be significant.
HDR detection is more nuanced. Adding colorGamut, transferFunction, and hdrMetadataType has the potential to add significant entropy. However, for UAs whose decoders are implemented in software and therefore whose capabilities are fixed across devices, this feature adds no effective entropy. Additionally, for many cases, devices tend to fall into large categories, within which capabilities are similar thus minimizing effective entropy.
If an implementation wishes to implement a fingerprint-proof version of this specification, it would be recommended to fake a given set of capabilities (e.g. decode up to 1080p VP9, etc.) instead of always returning yes or always returning no, as the latter approach could considerably degrade the user’s experience. Another mitigation could be to limit these Web APIs to top-level browsing contexts. Yet another is to use a privacy budget that throttles and/or blocks calls to the API above a threshold.
3.2. Display and Fingerprinting
The information exposed by the display capabilities can already be accessed via CSS for the most part. The specification also provides default values when the user agent does not wish to expose the feature for privacy reasons.
4. Examples
4.1. Query recording capabilities with encodingInfo()
<script>
  const configuration = {
    type: 'record',
    video: {
      contentType: 'video/webm;codecs=vp8',
      width: 640,
      height: 480,
      bitrate: 10000,
      framerate: 29.97
    }
  };
  navigator.mediaCapabilities.encodingInfo(configuration)
    .then((result) => {
      console.log(configuration.video.contentType + ' is:'
        + (result.supported ? '' : ' NOT') + ' supported,'
        + (result.smooth ? '' : ' NOT') + ' smooth and'
        + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
    })
    .catch((err) => {
      console.error(err, ' caused encodingInfo to throw');
    });
</script>