Understanding Audio Normalization

Note: for tips on managing your audio settings in Vocal Video, go here.

We take great pains to produce videos at the best visual quality possible. But that's just half the battle. In order for your video to stand out, you need excellent audio as well.

Your responses are likely to run the gamut of audio quality. Some smartphones have a great mic, but a respondent may be holding the phone too far away. If a user is too close to their microphone, you're likely to hear a lot of clipping and pops. Not to mention some people just speak a lot more loudly than others.

Furthermore, you should also consider your video in the context of all the other audio one might encounter. Have you ever been switching channels on your television or running through the radio dial and been shocked at the relative loudness of one station to the next? While some broadcasters hope this will grab your attention, we've all had the experience of hitting play on a video and immediately regretting the blaring sound.

All You Need Is LUFS

But what is loudness? While most listeners only have access to a single volume knob, audio is made up of a number of different frequencies, from low-end bass to high-end treble. The human ear has evolved different sensitivities to each of these frequencies. So the process can't be as simple as "setting everything to medium."

In 2011, ITU-R BS.1770-2 defined the Loudness, K-weighted, relative to Full Scale (LKFS) international standard. This mouthful of an acronym established a uniform way to define loudness. And, because it was such a mouthful, better marketing prevailed with a rebranding of the term to Loudness Units relative to Full Scale, or LUFS.

With the LUFS standard, we have a way to measure, target, and adjust audio. So if someone is scrolling through social media while listening to streaming audio and your video catches their eye, they won't need to adjust their volume at all when they hit play.
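As a rough illustration of "measure, target, and adjust," the sketch below computes how much gain a clip needs to reach a target loudness. The function name and the example values are hypothetical and are not Vocal Video's actual settings. It also assumes the clip's integrated loudness has already been measured by a loudness meter.

```python
def gain_to_target(measured_lufs: float, target_lufs: float) -> tuple[float, float]:
    """Return the gain needed to move a clip from its measured loudness to the target.

    The first value is the gain in decibels (positive = louder, negative = quieter).
    The second is the equivalent linear factor to multiply each audio sample by.
    """
    gain_db = target_lufs - measured_lufs
    linear_factor = 10 ** (gain_db / 20)  # standard dB-to-amplitude conversion
    return gain_db, linear_factor


# Hypothetical example: a quiet response measured at -23 LUFS, raised to a -16 LUFS target.
gain_db, factor = gain_to_target(measured_lufs=-23.0, target_lufs=-16.0)
print(f"Apply {gain_db:+.1f} dB (multiply samples by {factor:.2f})")
```

Because loudness units map one-to-one onto decibels of gain, a clip that measures 7 LU below the target needs roughly 7 dB of boost.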

So LUFS can help your video play nicely with other media. What about audio variation between respondents? And what if you include a soundtrack in your video? When mastering audio tracks in Vocal Video, we have two main goals:

Maintain a consistent volume between speakers
Ensure soundtracks enhance the experience and never drown out a voice

LUFS can help here, as well. Remember the LU (loudness units) portion of the acronym? If we assign loudness units to each speaker, and the soundtrack, we can calculate a Loudness Range (LRA). The greater the range, the more variation there is between the quietest segment and the loudest.
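The idea of comparing the quietest and loudest segments can be sketched in a few lines. This is a deliberately simplified illustration. The standardized Loudness Range measurement (EBU Tech 3342) is computed from the statistical distribution of short-term loudness values rather than a plain maximum minus minimum. The segment names and loudness values below are made up.

```python
def loudness_spread(segment_lufs: dict[str, float]) -> float:
    """Return the difference, in loudness units, between the loudest and quietest segment."""
    values = list(segment_lufs.values())
    return max(values) - min(values)


# Hypothetical per-segment measurements, in LUFS.
segments = {
    "speaker_a": -18.0,   # close to the microphone
    "speaker_b": -27.5,   # holding the phone far away
    "soundtrack": -30.0,  # background music
}
print(f"Spread: {loudness_spread(segments):.1f} LU")
```

A large spread is a sign that individual segments need different amounts of adjustment before the whole track will sit at a comfortable level.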

We can then use this range to adjust the entire audio track to fit a comfortable spectrum. Ideally we'll preserve the dynamic range, so louder portions meant to stand out still have impact, but nothing is ever too hard to hear. Additionally, no volume will ever be "pushed into the red," where the signal maxes out and results in clipping or distortion.
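One simple way to keep a boost from pushing audio "into the red" is to cap the gain by the headroom left under a peak ceiling. The sketch below shows that idea only. The function name, the -1.0 dBFS ceiling, and the example values are assumptions for illustration, and a production normalizer may use a limiter or other processing instead.

```python
def safe_gain_db(measured_lufs: float, peak_dbfs: float,
                 target_lufs: float, ceiling_dbfs: float = -1.0) -> float:
    """Return the gain (dB) toward the target loudness, capped so peaks stay under the ceiling."""
    wanted = target_lufs - measured_lufs  # gain needed to hit the loudness target
    headroom = ceiling_dbfs - peak_dbfs   # gain available before the peak reaches the ceiling
    return min(wanted, headroom)


# Hypothetical quiet clip with loud peaks: wants +10 dB, but only 3 dB of headroom exists.
print(safe_gain_db(measured_lufs=-26.0, peak_dbfs=-4.0, target_lufs=-16.0))

# Hypothetical loud clip: it needs to come down by 4 dB, so headroom is not a concern.
print(safe_gain_db(measured_lufs=-12.0, peak_dbfs=-2.0, target_lufs=-16.0))
```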

Choosing a Standard

Setting these exact parameters can be subjective, but at the end of the day, there is one generally agreed-upon standard. The BBC does a ton of work in this field and has put together a comprehensive series of recommendations called Audio Engineering Society (AES) Recommendation 1004.

We figure if it's good enough for the BBC, it's good enough for our customers. So that's the audio profile we target with our default normalization.

To get the most out of this normalization, we suggest leaving all your video & audio scenes at the default (100%) volume setting. This will provide the normalization algorithm with the maximum amount of information when normalizing. For soundtracks, a level of about 15% is good for most songs, but if you're uploading your own file, you may need to adjust based on the type of music.

Normalizing Only Speakers vs. All Audio

Because soundtracks play continuously while respondents are speaking, you might want the music to get quieter automatically whenever someone is talking. If so, let us know, and we'll enable our "Adjust Soundtrack Dynamically" setting in your account. With that setting enabled, audio normalization will apply to all of the audio in your video, not just the speakers.

With this setting, if your video starts with a logo scene and a dramatic music cue, you'll hear the full impact of the song. However, once someone starts talking, we'll fade out the soundtrack so the subject's audio is in primary focus. If you include a text scene between speakers, the soundtrack will come back into focus until the next speaker begins.
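The fade-down and fade-up behavior described above is often called "ducking." The sketch below is a minimal, hypothetical illustration of the concept and not the actual "Adjust Soundtrack Dynamically" implementation. It assumes the timeline has been split into equal time slices, each flagged as containing speech or not. The gain levels and fade step are made-up values.

```python
def duck_envelope(speech_active: list[bool], full_gain: float = 1.0,
                  ducked_gain: float = 0.25, step: float = 0.25) -> list[float]:
    """Return a soundtrack gain for each time slice, fading toward a lower level during speech."""
    gains = []
    current = full_gain
    for active in speech_active:
        target = ducked_gain if active else full_gain
        # Move at most one step toward the target so the change is a fade, not a jump.
        if current < target:
            current = min(target, current + step)
        else:
            current = max(target, current - step)
        gains.append(current)
    return gains


# Logo scene (no speech), a speaker, then a text scene (no speech).
timeline = [False, True, True, True, True, False, False, False, False]
print(duck_envelope(timeline))
# -> [1.0, 0.75, 0.5, 0.25, 0.25, 0.5, 0.75, 1.0, 1.0]
```

The soundtrack plays at full level over the logo scene, fades down while the speaker talks, and returns to full level during the text scene.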

We feel these options let you get well-produced audio tracks with minimal effort. Unfortunately, because audio normalization is applied during the post-production phase of video publishing, we can't offer a real-time preview during the editing process.

Fine-tuning Your Audio

If you're not getting good results from audio normalization, or would like to customize your video further, you can opt out of audio normalization globally or on a video-by-video basis (see here for how). If you choose to go this route, the Vocal Video editor will be a true what-you-hear-is-what-you-get experience.

When editing without normalization, you can change the volume of individual video or audio scenes by using the volume control under the scene settings menu.

You can also adjust the soundtrack volume in the soundtrack panel. If a particular song isn't working for the cadence of your video, we'd encourage you to try another song from our collection of fully licensed soundtracks. Or, you can take things even further and upload your own music file (just make sure you have secured the appropriate rights to use the song).
