1. Introduction
People increasingly consume audio and video through the Web, which has become a primary channel for this kind of content. However, media on the Web often lacks seamless integration with the underlying platform. The Audio Session API addresses this gap by enhancing media handling on platforms that support audio session management or similar audio focus features. The API improves how web-based audio interacts with other applications, allowing audio to be mixed with or played exclusively of other audio, depending on the context, to provide a more consistent and integrated media experience across devices.
Additionally, some platforms automatically manage a site’s audio session based on media playback and the APIs used to play audio. However, this behavior might not always align with user expectations. This API allows developers to override the default behavior and gain more control over an audio session.
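For instance, a page could hint its intent before any playback starts. The helper below is a hypothetical sketch (its name and shape are not part of the API): it applies a preferred session type only when the user agent exposes navigator.audioSession, so it degrades gracefully elsewhere.

```javascript
// Hypothetical helper: apply a preferred audio session type when the
// Audio Session API is available. `nav` is any Navigator-like object.
// Returns true if the hint was applied, false otherwise.
function applyAudioSessionType(nav, type) {
  if (nav && 'audioSession' in nav && nav.audioSession) {
    nav.audioSession.type = type;
    return true;
  }
  return false;
}

// In a page: applyAudioSessionType(navigator, 'transient');
```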
2. Concepts
A web page can do audio processing in various ways, combining different APIs like HTMLMediaElement or AudioContext. This audio processing has a start and a stop, which aggregates all the different audio APIs being used. An audio session represents this aggregated audio processing. It allows web pages to express the general nature of the audio processing done by the web page.
An audio session can be of a particular type, and be in a particular state. An audio session manages the audio for a set of individual sources (microphone recording) and sinks (audio rendering), named audio session elements.
An audio session element has a number of properties:
- A default type, which is used to compute the audio session type in the case of "auto".
- An audible flag, which is true if the element is playing/recording audio, or false otherwise.
An audio session element is an audible element if its audible flag is true.
Additionally, an audio session element has associated steps for dealing with various state changes. By default, each of these steps is an empty list of steps:
- Element update steps, which are run whenever the audio session state changes.
- Element suspend steps, which are run when the audio session state moves from active to either interrupted or inactive.
- Element resume steps, which are run when the audio session state moves from interrupted to active.
A top-level browsing context has a selected audio session. Whenever any audio session changes, the user agent updates which audio session is the selected audio session.
A top-level browsing context is said to have audio focus if its selected audio session is not null and its state is active.
3. The AudioSession interface
AudioSession is the main interface for this API. It is accessed through the Navigator interface (see § 4 Extensions to the Navigator interface).
[Exposed=Window]
interface AudioSession : EventTarget {
  attribute AudioSessionType type;
  readonly attribute AudioSessionState state;
  attribute EventHandler onstatechange;
};
To create an AudioSession object in realm, run the following steps:
- Let audioSession be a new AudioSession object in realm, initialized with the following internal slots:
  - [[type]] to store the audio session type, initialized to "auto".
  - [[state]] to store the audio session state, initialized to "inactive".
  - [[elements]] to store the audio session elements, initialized to an empty list.
  - [[interruptedElements]] to store the audio session elements that were interrupted while being audible, initialized to an empty list.
  - [[appliedType]] to store the type applied to the audio session, initialized to "auto".
  - [[isTypeBeingApplied]] flag to store whether the type is being applied to the audio session, initialized to false.
- Return audioSession.
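The internal slots above can be pictured as plain fields on an object. The class below is a hypothetical model for illustration only, not the real implementation:

```javascript
// Hypothetical model of the internal slots from the creation steps above.
class AudioSessionSlots {
  constructor() {
    this.type = 'auto';              // [[type]]
    this.state = 'inactive';         // [[state]]
    this.elements = [];              // [[elements]]
    this.interruptedElements = [];   // [[interruptedElements]]
    this.appliedType = 'auto';       // [[appliedType]]
    this.isTypeBeingApplied = false; // [[isTypeBeingApplied]]
  }
}
```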
Each AudioSession object is uniquely tied to its underlying audio session.
The AudioSession state attribute reflects its audio session state. On getting, it MUST return the AudioSession [[state]] value.
The AudioSession type attribute reflects its audio session type, except for "auto". On getting, it MUST return the AudioSession [[type]] value. On setting, it MUST run the following steps with newValue being the new value being set on audioSession:
- If audioSession.[[type]] is equal to newValue, abort these steps.
- Set audioSession.[[type]] to newValue.
- Update the type of audioSession.
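The early return in the first step makes assigning the current value a no-op; only a genuine change reaches the type-update step. A minimal sketch of this logic on a plain stand-in object (hypothetical, for illustration; the platform-side "update the type" step is stubbed as a counter):

```javascript
// Stand-in object mirroring the type setter steps above.
let updateCount = 0; // counts how often "update the type" would run
const session = {
  _type: 'auto',
  get type() { return this._type; },
  set type(newValue) {
    if (this._type === newValue) return; // step 1: unchanged, abort
    this._type = newValue;               // step 2: store the new value
    updateCount++;                       // step 3: update the type (stub)
  },
};
```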
3.1. Audio session types
By convention, there are several different audio session types for different purposes. In the API, these are represented by the AudioSessionType enum:
playback
- Playback audio, used for video or music playback, podcasts, etc. It should not mix with other playback audio, and may pause all other audio indefinitely.
transient
- Transient audio, such as a notification ping. It usually plays on top of playback audio (and may also "duck" persistent audio).
transient-solo
- Transient solo audio, such as driving directions. It should pause/mute all other audio and play exclusively. When transient-solo audio ends, the paused/muted audio should resume.
ambient
- Ambient audio, which is mixable with other types of audio. This is useful in special cases, such as when the user wants to mix audio from multiple pages.
play-and-record
- Play and record audio, used for recording audio. This is useful when the microphone is in use, or in video conferencing applications.
auto
- Auto lets the user agent choose the best audio session type according to the use of audio by the web page. This is the default type of AudioSession.
enum AudioSessionType {
  "auto",
  "playback",
  "transient",
  "transient-solo",
  "ambient",
  "play-and-record"
};
An AudioSessionType is an exclusive type if it is playback, play-and-record or transient-solo.
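This classification can be expressed as a small predicate. The helper below is hypothetical (the function name is not part of the API), but its membership test follows the definition above:

```javascript
// Hypothetical predicate: true exactly for the exclusive AudioSessionType values.
const EXCLUSIVE_TYPES = new Set(['playback', 'play-and-record', 'transient-solo']);

function isExclusiveType(type) {
  return EXCLUSIVE_TYPES.has(type);
}
```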
3.2. Audio session states
An audio session can be in one of the following states, which are represented in the API by the AudioSessionState enum:
active
- The audio session is playing sound or recording microphone input.
interrupted
- The audio session is not playing sound nor recording microphone input, but can resume once the interruption ends.
inactive
- The audio session is not playing sound nor recording microphone input.
enum AudioSessionState {
  "inactive",
  "active",
  "interrupted"
};
The audio session's state may change, which will automatically update the state of its AudioSession object.
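Because AudioSession inherits from EventTarget, these updates surface to the page as statechange events. The class below is a hypothetical mock (built on the standard EventTarget, which the real interface also extends) illustrating how a page observes the transitions:

```javascript
// Hypothetical mock of an AudioSession-like object, for illustration only.
class MockAudioSession extends EventTarget {
  constructor() {
    super();
    this.state = 'inactive';
    this.onstatechange = null;
  }

  // Simulates the user agent changing the session state.
  _setState(newState) {
    if (this.state === newState) return;
    this.state = newState;
    const event = new Event('statechange');
    this.dispatchEvent(event);
    if (typeof this.onstatechange === 'function') this.onstatechange(event);
  }
}

// A page would listen the same way on navigator.audioSession:
const mockSession = new MockAudioSession();
const seenStates = [];
mockSession.onstatechange = () => seenStates.push(mockSession.state);
mockSession._setState('active');
mockSession._setState('interrupted');
```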
4. Extensions to the Navigator interface
Each Window has an associated AudioSession, which is an AudioSession object. It represents the default audio session that is used by the user agent to automatically set up the audio session parameters.
The user agent will request or abandon audio focus when audio session elements start or finish playing.
Upon creation of the Window object, its associated AudioSession MUST be set to a newly created AudioSession object with the Window object’s relevant realm.
The associated AudioSession’s list of elements is updated dynamically as audio sources and sinks of the Window object are created or removed.
[Exposed=Window]
partial interface Navigator {
  // The default audio session that the user agent will use
  // when media elements start/stop playing.
  readonly attribute AudioSession audioSession;
};
5. Privacy considerations
6. Security considerations
7. Examples
7.1. A site sets its audio session type proactively to "play-and-record"
navigator.audioSession.type = 'play-and-record';
// From now on, volume might be set based on 'play-and-record'.
...
// Start playing remote media
remoteVideo.srcObject = remoteMediaStream;
remoteVideo.play();
// Start capturing
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => {
    localVideo.srcObject = stream;
  });
7.2. A site reacts upon interruption
navigator.audioSession.type = "play-and-record";
// From now on, volume might be set based on 'play-and-record'.
...
// Start playing remote media
remoteVideo.srcObject = remoteMediaStream;
remoteVideo.play();
// Start capturing
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => {
    localVideo.srcObject = stream;
  });
navigator.audioSession.onstatechange = async () => {
  if (navigator.audioSession.state === "interrupted") {
    localVideo.pause();
    remoteVideo.pause();
    // Make it clear to the user that the call is interrupted.
    showInterruptedBanner();
    for (const track of localVideo.srcObject.getTracks()) {
      track.enabled = false;
    }
  } else {
    // Let user decide when to restart the call.
    const shouldRestart = await showOptionalRestartBanner();
    if (!shouldRestart) {
      return;
    }
    for (const track of localVideo.srcObject.getTracks()) {
      track.enabled = true;
    }
    localVideo.play();
    remoteVideo.play();
  }
};
8. Acknowledgements
The Working Group acknowledges the following people for their invaluable contributions to this specification:
- Becca Hughes
- Mounir Lamouri
- Zhiqiang Zhang