Copyright © 2021 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This document defines a set of JavaScript APIs that let a Web application manage how audio is rendered on the user audio output devices.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
The WebRTC and Devices and Sensors Working Groups intend to publish this specification as a Candidate Recommendation soon. Consequently, this is a request for wide review of this document.
This document was published by the Web Real-Time Communications Working Group as a Candidate Recommendation Draft. This document is intended to become a W3C Recommendation.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list. Please send them to public-webrtc@w3.org (subscribe, archives).
Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. A Candidate Recommendation Draft integrates changes from the previous Candidate Recommendation that the Working Group intends to include in a subsequent Candidate Recommendation Snapshot.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 15 September 2020 W3C Process Document.
This section is non-normative.
This proposal allows JavaScript to direct the audio output of a media element to permitted devices other than the system or user agent default. This can be helpful in a variety of real-time communication scenarios as well as general media applications. For example, an application can use this API to programmatically direct output to a device such as a Bluetooth headset or speakerphone.
HTMLMediaElement Extensions
This section specifies additions to the HTMLMediaElement [HTML] when the Audio Output Devices API is supported.
When the HTMLMediaElement constructor is invoked, the user agent MUST add the following initializing step:
Let the element have a [[SinkId]] internal slot, initialized to "".
WebIDL
partial interface HTMLMediaElement {
  [SecureContext] readonly attribute DOMString sinkId;
  [SecureContext] Promise<undefined> setSinkId (DOMString sinkId);
};
sinkId of type DOMString, readonly
This attribute contains the ID of the audio device through which output is being delivered, or the empty string if output is delivered through the user-agent default device. If nonempty, this ID should be equal to the deviceId attribute of one of the MediaDeviceInfo values returned from enumerateDevices().
On getting, the attribute MUST return the value of the [[SinkId]] slot.
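This example is non-normative. As a rough illustration of how sinkId relates to the result of enumerateDevices(), the sketch below maps a sinkId value to a readable label. The helper name describeSink and its return strings are hypothetical and are not defined by this specification.

```javascript
// Hypothetical helper (not part of this specification): map a sinkId value to
// a human-readable label using a list of MediaDeviceInfo-like objects, such
// as the result of enumerateDevices().
function describeSink(sinkId, devices) {
  // The empty string means output goes to the user-agent default device.
  if (sinkId === "") {
    return "user-agent default device";
  }
  const match = devices.find(
    (d) => d.kind === "audiooutput" && d.deviceId === sinkId
  );
  if (!match) {
    // The device may have been unplugged since it was selected.
    return "unknown device";
  }
  // Labels can be empty strings, so fall back to a placeholder.
  return match.label || "unlabeled audio output";
}

// Browser usage (assumes `audio` is an HTMLMediaElement in a secure context):
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   console.log(describeSink(audio.sinkId, devices));
```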
setSinkId
Sets the ID of the audio device through which audio output should be rendered, if the application is permitted to play out of the given device.
When this method is invoked, the user agent must run the following steps:
1. Let document be the current settings object's responsible document.
2. If document is not allowed to use the feature identified by "speaker-selection", return a promise rejected with a new DOMException whose name is NotAllowedError.
3. Let element be the HTMLMediaElement object on which this method was invoked.
4. Let sinkId be the method's first argument.
5. If sinkId is equal to element's [[SinkId]], return a promise resolved with undefined.
6. Let p be a new promise.
7. Run the following substeps in parallel:
    1. If sinkId is not the empty string and does not match any audio output device identified by the result that would be provided by enumerateDevices(), reject p with a new DOMException whose name is NotFoundError and abort these substeps.
    2. If sinkId is not the empty string, and the application would not be permitted to play audio through the device identified by sinkId if it weren't the current user agent default device, reject p with a new DOMException whose name is NotAllowedError and abort these substeps.
    3. Switch the underlying audio output device for element to the audio device identified by sinkId.
    4. If the preceding substep failed, reject p with a new DOMException whose name is AbortError, and abort these substeps.
    5. Queue a task that runs the following steps:
        1. Set element's [[SinkId]] to sinkId.
        2. Resolve p.
8. Return p.
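This example is non-normative. The following sketch shows one way an application might call setSinkId and distinguish the rejection cases listed in the steps above. The function name routeAudio and the strings it returns are illustrative assumptions, not part of this API.

```javascript
// A minimal sketch: route a media element to a given output device and report
// the outcome. `routeAudio` and its return strings are illustrative only.
async function routeAudio(element, deviceId) {
  // Feature-detect, since not every user agent implements setSinkId.
  if (typeof element.setSinkId !== "function") {
    return "unsupported";
  }
  try {
    await element.setSinkId(deviceId);
    // Once the promise resolves, element.sinkId reflects the new device.
    return "ok";
  } catch (err) {
    switch (err.name) {
      case "NotAllowedError": // blocked by policy, or no permission for the device
        return "not-allowed";
      case "NotFoundError": // deviceId matches no known audio output device
        return "not-found";
      case "AbortError": // switching the underlying device failed
        return "switch-failed";
      default:
        throw err; // anything else is unexpected in this sketch
    }
  }
}

// Browser usage (assumes `audio` is an HTMLMediaElement in a secure context):
//   const result = await routeAudio(audio, someDeviceId);
//   if (result !== "ok") console.warn("Could not switch output:", result);
```

Passing the empty string as deviceId asks for the user-agent default device, which skips the NotFoundError and NotAllowedError checks in the substeps above.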
New audio devices may become available to the user agent, or an
        audio device (identified by a media element's sinkId attribute) that had
        previously become unavailable may become available
        again, for example, if it is unplugged and later plugged back in.
In this scenario, the user agent must run the following steps:
Let sinkId be the identifier for the newly available device.
For each media element whose sinkId attribute is equal to
            sinkId:
The following paragraph is non-normative.
If the application wishes to react to the device
        change, the application can listen to the 
        devicechange event and query
        enumerateDevices() for the list of updated
        devices.
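This example is non-normative. A possible way to react to device changes, as described in the paragraph above, is sketched here. The helper watchAudioOutputs is a hypothetical name; it takes the MediaDevices object as a parameter so that the sketch does not depend on a particular global.

```javascript
// Hypothetical helper: call `onDevices` with the current list of audio output
// devices every time the set of media devices changes.
function watchAudioOutputs(mediaDevices, onDevices) {
  const handler = async () => {
    const devices = await mediaDevices.enumerateDevices();
    onDevices(devices.filter((d) => d.kind === "audiooutput"));
  };
  mediaDevices.addEventListener("devicechange", handler);
  // Return a function that stops watching.
  return () => mediaDevices.removeEventListener("devicechange", handler);
}

// Browser usage:
//   const stop = watchAudioOutputs(navigator.mediaDevices, (outputs) => {
//     console.log("Audio outputs now available:", outputs.map((d) => d.label));
//   });
//   // later: stop();
```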
MediaDevices Extensions
This section specifies additions to the MediaDevices when the Audio Output Devices API is supported.
WebIDL
partial interface MediaDevices {
  Promise<MediaDeviceInfo> selectAudioOutput(optional AudioOutputOptions options = {});
};
selectAudioOutput
Prompts the user to select a specific audio output device.
When the selectAudioOutput method is called, the User Agent MUST run the following steps:
1. If the relevant global object of this does not have transient activation, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
2. Let options be the method's first argument.
3. Let deviceId be options.deviceId.
4. Let p be a new promise.
5. Run the following steps in parallel:
    1. Let descriptor be a PermissionDescriptor named "speaker-selection".
    2. If descriptor's permission state is "denied", reject p with a new DOMException whose name attribute has the value NotAllowedError, and abort these steps.
    3. Probe the User Agent for available audio output devices.
    4. If there is no audio output device, reject p with a new DOMException whose name attribute has the value NotFoundError and abort these steps.
    5. If deviceId is not "" and matches an id previously exposed by selectAudioOutput in an earlier browsing session, the user agent MAY decide, based on its previous decision of whether to persist this id or not for this set of origins, to run the following sub steps:
        1. Let device be the device identified by deviceId, if available.
        2. If device is available, resolve p with either deviceId or a freshly rotated device id for device, and abort the in-parallel steps.
    6. Prompt the user to choose an audio output device, with descriptor.
    7. If the result of the request is "denied", reject p with a new DOMException whose name attribute has the value NotAllowedError and abort these steps.
    8. Let deviceInfo be a new MediaDeviceInfo object to represent the selected audio output device.
    9. Add deviceInfo.deviceId to [[explicitlyGrantedAudioOutputDevices]].
    10. Resolve p with deviceInfo.
6. Return p.
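This example is non-normative. The steps above might be exercised from application code roughly as follows. The function name chooseOutput and the shape of its return value are assumptions made for this sketch. Because selectAudioOutput requires transient activation, the function is expected to be called from a user gesture handler such as a click listener.

```javascript
// A minimal sketch: prompt for an output device and route `element` to it.
// `chooseOutput` and its result object are illustrative only.
async function chooseOutput(mediaDevices, element) {
  // Feature-detect, since not every user agent implements selectAudioOutput.
  if (typeof mediaDevices.selectAudioOutput !== "function") {
    return { status: "unsupported" };
  }
  try {
    const device = await mediaDevices.selectAudioOutput();
    await element.setSinkId(device.deviceId);
    return { status: "ok", deviceId: device.deviceId, label: device.label };
  } catch (err) {
    // From selectAudioOutput:
    //   InvalidStateError: called without transient activation.
    //   NotAllowedError: the permission is denied.
    //   NotFoundError: no audio output device is available.
    // Rejections from setSinkId are reported through the same path.
    return { status: "failed", reason: err.name };
  }
}

// Browser usage (assumes `button` and `audio` exist in the page):
//   button.addEventListener("click", async () => {
//     const result = await chooseOutput(navigator.mediaDevices, audio);
//     console.log(result);
//   });
```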
Once a device is exposed after a call to selectAudioOutput, it MUST be listed by
            enumerateDevices() for the current browsing context.
If the promise returned by selectAudioOutput is resolved,
            then the user agent MUST ensure the document is both immediately
            allowed to play media in an
            HTMLMediaElement, and immediately
            allowed to start an
            AudioContext, without needing any additional user gesture.
This is imprecise due to the current lack of standardization of autoplay in browsers.
This dictionary describes the options that can be used to obtain access to an audio output device.
WebIDL
dictionary AudioOutputOptions {
  DOMString deviceId = "";
};
AudioOutputOptions Members
deviceId of type DOMString, defaulting to ""
When the value of this dictionary member is not "", and matches the id previously exposed by selectAudioOutput in an earlier session, the user agent MAY opt to skip prompting the user in favor of resolving with this id or a new rotated id for the same device, assuming that device is currently available.
Applications that wish to rely on user agents supporting persisted device ids must pass these through selectAudioOutput successfully before they will work with setSinkId. The reason for this requirement is that a persisted id exposes fingerprinting information; its cost is that the user may be prompted if the device is not available or the user agent decides not to honor the device id.
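This example is non-normative. One way an application might reuse an id saved in an earlier session is sketched below. The helper name restoreOutput, the storage parameter (any object with getItem and setItem, for example window.localStorage), and the key "preferredSinkId" are all assumptions of this sketch. The call still needs transient activation, and the user agent may prompt instead of honoring the saved id.

```javascript
// Hypothetical helper: pass a previously saved device id back through
// selectAudioOutput before using the result with setSinkId.
async function restoreOutput(mediaDevices, element, storage) {
  // The key name is an arbitrary choice for this example.
  const saved = storage.getItem("preferredSinkId") || "";
  // The user agent MAY skip the prompt for a known id, and MAY resolve with a
  // rotated id for the same device.
  const device = await mediaDevices.selectAudioOutput({ deviceId: saved });
  // Store and use the id that was returned, not the one that was passed in.
  storage.setItem("preferredSinkId", device.deviceId);
  await element.setSinkId(device.deviceId);
  return device.deviceId;
}

// Browser usage, from a click handler:
//   await restoreOutput(navigator.mediaDevices, audio, window.localStorage);
```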
This document extends the Web platform with the ability to direct audio output to non-default devices, when user permission is given. User permission is necessary because playing audio out of a non-default device may be unexpected behavior to the user, and may cause a nuisance. For example, suppose a user is in a library or other quiet public place where she is using a laptop with system audio directed to a USB headset. Her expectation is that the laptop’s audio is private and she will not disturb others. If any Web application can direct audio output through arbitrary output devices, a mischievous website may play loud audio out of the laptop’s external speakers without the user’s consent.
To prevent these kinds of nuisance scenarios, the user agent must acquire the user’s consent to access non-default audio output devices. This would prevent the library example outlined earlier, because the application would not be permitted to play out audio from the system speakers.
The specification adds no permission requirement to the default audio output device.
The user agent may explicitly obtain user consent to play audio out of non-default output devices using selectAudioOutput.
Implementations MUST also support implicit consent via the getUserMedia() permission prompt; when an audio input device is permitted and opened via getUserMedia(), this also permits access to any associated audio output devices (i.e., those with the same groupId). This conveniently handles the common case of wanting to route both input and output audio through a headset or speakerphone device.
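This example is non-normative. The implicit-consent case described above could be used along these lines: after a microphone has been opened, look for audio output devices that share its groupId. The helper name outputsForMicrophone is hypothetical.

```javascript
// Hypothetical helper: given the deviceId of an opened microphone and a list
// of MediaDeviceInfo-like objects, return the audio outputs in the same group.
function outputsForMicrophone(micDeviceId, devices) {
  const mic = devices.find(
    (d) => d.kind === "audioinput" && d.deviceId === micDeviceId
  );
  if (!mic) {
    return [];
  }
  return devices.filter(
    (d) => d.kind === "audiooutput" && d.groupId === mic.groupId
  );
}

// Browser usage (assumes a secure context and a granted microphone prompt):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const micId = stream.getAudioTracks()[0].getSettings().deviceId;
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   const [headsetOut] = outputsForMicrophone(micId, devices);
//   if (headsetOut) await audio.setSinkId(headsetOut.deviceId);
```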
On page load, run the following step:
On the relevant global object,
          create an internal slot: [[explicitlyGrantedAudioOutputDevices]],
          used to store devices that the user grants explicitly through selectAudioOutput,
          initialized to an empty set.
This specification specifies the exposure decision algorithm for devices other than camera and microphone. The algorithm runs as follows, with device, microphoneList and cameraList as input:
1. Let document be the current settings object's responsible document.
2. Let deviceInfo be a new MediaDeviceInfo object to represent the device.
3. If document is not allowed to use the feature identified by "speaker-selection", or deviceInfo.kind is not "audiooutput", return false.
4. If deviceInfo.deviceId is in [[explicitlyGrantedAudioOutputDevices]], return true.
5. If deviceInfo.groupId is the same as the groupId of any microphone in microphoneList, return true.
6. Return false.
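This example is non-normative. The exposure decision steps above can be modelled as a small pure function. In this sketch, policyAllows stands in for the "speaker-selection" permissions policy check and granted stands in for the [[explicitlyGrantedAudioOutputDevices]] set; both parameters, and the function name, are assumptions. The cameraList input is omitted because none of the steps consult it.

```javascript
// Illustrative model of the exposure decision steps; not an implementation
// of any user agent's internals.
function shouldExposeDevice(deviceInfo, microphoneList, granted, policyAllows) {
  // Policy check and kind check.
  if (!policyAllows || deviceInfo.kind !== "audiooutput") {
    return false;
  }
  // Explicit grant through selectAudioOutput.
  if (granted.has(deviceInfo.deviceId)) {
    return true;
  }
  // Implicit grant through a microphone in the same group.
  if (microphoneList.some((mic) => mic.groupId === deviceInfo.groupId)) {
    return true;
  }
  return false;
}

// Example: a headset output is exposed because its microphone is in the list.
const exampleMics = [{ kind: "audioinput", deviceId: "m1", groupId: "headset" }];
const exampleOut = { kind: "audiooutput", deviceId: "o1", groupId: "headset" };
console.log(shouldExposeDevice(exampleOut, exampleMics, new Set(), true)); // true
```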
This specification defines one
      policy-controlled feature identified by the string
      "speaker-selection".
      It has a
      default allowlist of "self".
      
A document's permissions policy
        determines whether any content in that document is
        allowed to use selectAudioOutput to prompt the user for
        an audio output device, or
        allowed to use setSinkId to change the device
        through which audio output should be rendered, to a non-system-default
        user-permitted device. For selectAudioOutput this is
        enforced by the prompt the user to choose algorithm.
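This example is non-normative. Because the default allowlist is "self", a cross-origin nested document is not allowed to use the feature unless the embedder delegates it. A sketch of such delegation through the iframe allow attribute follows. The helper name embedWithSpeakerSelection and the URL in the usage comment are hypothetical.

```javascript
// Hypothetical helper: create an iframe that delegates the
// "speaker-selection" feature to the embedded document.
function embedWithSpeakerSelection(doc, src) {
  const frame = doc.createElement("iframe");
  frame.src = src;
  // The allow attribute carries the permissions policy for this frame.
  frame.allow = "speaker-selection";
  return frame;
}

// Browser usage (the URL is a placeholder):
//   document.body.append(
//     embedWithSpeakerSelection(document, "https://widget.example/call")
//   );
```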
        
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY and MUST in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.
The following people have contributed directly to the development of this specification: Harald Alvestrand, Rick Byers, Dominique Hazael-Massieux (via the HTML5Apps project), Philip Jägenstedt, Victoria Kirst, Shijun Sun, Martin Thomson, Chris Wilson.