Technique G56: Mixing audio files so that non-speech sounds are at least 20 decibels lower than the speech audio content

About this Technique

This technique relates to 1.4.7: Low or No Background Audio (Sufficient).

This technique applies to any technology.

Description

The objective of this technique is to allow authors to include sound behind speech without making it too hard for people who are hard of hearing to understand the speech. Making sure that the foreground speech is 20 dB louder than the background sound makes the speech roughly 4 times as loud as the background audio in terms of perceived loudness. For information on decibels (dB), refer to About Decibels.
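As a rough illustration of what a 20 dB difference means, the sketch below converts a level difference into three ratios. The function names are hypothetical. The amplitude and power conversions are the standard decibel definitions; the perceived-loudness conversion relies on the common rule of thumb that a 10 dB increase sounds about twice as loud, which is an approximation and not an exact law.

```python
def amplitude_ratio(db: float) -> float:
    """Sound-pressure (amplitude) ratio for a level difference in dB."""
    return 10 ** (db / 20)


def power_ratio(db: float) -> float:
    """Power (intensity) ratio for a level difference in dB."""
    return 10 ** (db / 10)


def perceived_loudness_ratio(db: float) -> float:
    """Approximate perceived-loudness ratio.

    Assumption: each +10 dB is heard as roughly a doubling of loudness.
    """
    return 2 ** (db / 10)


if __name__ == "__main__":
    difference_db = 20.0
    print("amplitude ratio:", amplitude_ratio(difference_db))
    print("power ratio:", power_ratio(difference_db))
    print("perceived loudness ratio:", perceived_loudness_ratio(difference_db))
```

For a 20 dB difference this gives an amplitude ratio of about 10, a power ratio of about 100, and a perceived-loudness ratio of about 4, which is where the "4 times" figure comes from.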

Examples

Example 1: An announcer speaking over a riot scene

An announcer is describing a riot scene. The volume of the riot scene is adjusted so that it is 20 dB lower than the announcer's volume before the scene is mixed with the announcer's speech.
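The adjustment in this example could be sketched as follows. This is a minimal illustration, not a production mixer: it assumes both tracks are lists of float samples in the range -1.0 to 1.0 at the same sample rate, it measures the unweighted RMS level of each whole track (a real workflow would measure only the passages where speech is present), and it does not guard against clipping. All function names are hypothetical.

```python
import math


def level_db(samples):
    """Unweighted RMS level in dB relative to full scale (1.0).

    Raises ValueError for an all-zero (silent) track, since log10(0) is undefined.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def attenuate_background(speech, background, margin_db=20.0):
    """Scale the background so its RMS level is at least margin_db below the speech."""
    difference_db = level_db(speech) - level_db(background)
    # Only attenuate; a background that is already quiet enough is left alone.
    gain_db = min(0.0, difference_db - margin_db)
    gain = 10 ** (gain_db / 20)
    return [sample * gain for sample in background]


def mix(speech, background):
    """Sum two tracks sample by sample, padding the shorter one with silence."""
    length = max(len(speech), len(background))
    padded_speech = speech + [0.0] * (length - len(speech))
    padded_background = background + [0.0] * (length - len(background))
    return [a + b for a, b in zip(padded_speech, padded_background)]


if __name__ == "__main__":
    announcer = [0.5, -0.5] * 100   # stand-in for the speech track
    riot = [0.4, -0.4] * 150        # stand-in for the riot scene
    quiet_riot = attenuate_background(announcer, riot)
    mixed = mix(announcer, quiet_riot)
    print("margin after adjustment (dB):", level_db(announcer) - level_db(quiet_riot))
```

The background is attenuated before mixing, as in the example, so the speech track itself is never altered.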

Example 2: Sufficient audio contrast between a narrator and background music

This example demonstrates a voice with music in the background in which the voice is the appropriate 20 dB above the background. The voice (foreground) is recorded at -17.52 decibels (average RMS) and the music (background) is at -37.52 decibels, which makes the foreground 20 decibels louder than the background.

Audio example

Audio Example: Foreground is 20 decibels above the background (mp3)

Transcript of audio example (good contrast)

"Usually the foreground refers to a voice that is speaking and should be understood. My speaking voice right now is 20 decibels above the background which is the music. This is an example of how it should be done."

Visual example of the recording above

The audio example above is visually represented below in a snapshot of the file in an audio editor. A section is highlighted that contains foreground and background. It is a much larger wave than the section that contains only background.

A display showing a sound wave. The left third of the sound wave is just background noise, registering from -10 to +10 decibels. The remaining right-hand section is foreground and background noise showing a range from -50 to +40 decibels. — Visual representation of sufficient contrast.

Example 3: Insufficient audio contrast between a narrator and background music

Audio example of the failure

This example demonstrates a voice with music in the background in which the voice is not 20 dB above the background. The voice (foreground) is at -18 decibels and the music (background) is at about -16 decibels, so the two levels are only about 2 decibels apart, far short of the required 20 decibels.

Audio Example: Foreground is less than 20 decibels above the background (mp3)

Transcript of audio example (bad contrast)

"This is an example of a voice that is not loud enough against the background. The voice which is the foreground is only about 2 decibels above the background. Therefore is difficult to understand for a person who is hard of hearing. It is hard to discern one word from the next. This is an example of what not to do."

Visual example of the failure

The highlighted section contains foreground and background. The wave is almost the same size as the section that contains only background, which means the background is too loud in comparison to the foreground voice.

A display showing a sound wave. A short section on the left is just background noise with a range of -35 dB to +35 dB. The main section of the wave has background and foreground noise, with a range of almost -70 dB to +70 dB. The final background-only section has a spike of a similar size. — Visual representation of bad contrast.

Related Resources

No endorsement implied.

Audio creation / contrast tutorial

Tests

Procedure

1. Locate the loudest passages of background content between passages of foreground speech.
2. Measure the volume of the background content in dB(A) SPL.
3. Measure the volume of the foreground speech in dB(A) SPL.
4. Subtract the background value from the foreground value.
5. Check that the result is 20 or greater.
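The procedure above can be sketched in code. Note the assumptions: a true dB(A) SPL reading requires A-weighting and a calibrated measurement chain, whereas this sketch uses the unweighted RMS level of digital samples relative to full scale, which only approximates the difference between the two levels. The function names and the small floating-point tolerance are hypothetical choices for illustration.

```python
import math


def level_db(samples):
    """Unweighted RMS level in dB relative to full scale (1.0).

    Assumption: stands in for a dB(A) SPL reading; no A-weighting is applied.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def meets_margin(foreground_db, background_db, margin_db=20.0, tolerance_db=1e-9):
    """Steps 4 and 5: subtract the levels and check the result is 20 dB or greater.

    tolerance_db absorbs floating-point rounding in the subtraction.
    """
    return foreground_db - background_db >= margin_db - tolerance_db


if __name__ == "__main__":
    # Steps 1 to 3: measure a background-only passage and a speech passage.
    background_only = [0.02, -0.02] * 200   # stand-in for a loud background passage
    speech_passage = [0.3, -0.3] * 200      # stand-in for foreground speech
    print(meets_margin(level_db(speech_passage), level_db(background_only)))

    # Levels quoted in Examples 2 and 3.
    print(meets_margin(-17.52, -37.52))
    print(meets_margin(-18.0, -16.0))
```

With the levels quoted in Example 2 (-17.52 and -37.52 decibels) the check passes, and with the levels quoted in Example 3 (-18 and -16 decibels) it fails.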

Expected Results

#5 is true