Basic Principles of Audio Processing

From Librivox wiki


This is the second part of a series of short articles written by a sound engineer with many years' experience. The idea is to explain in plain language how to make a quality sound file.

Post Production 1: Equalisation (EQ) and mains hum removal

Post production tools fall into the categories of

  • FX
  • EQ
  • Dynamics
  • Special tools

What are they all about?

FX stands for effects: flange, phaser, fuzz box etc. We don't really need that in LibriVox so let's chuck it out and forget it. Job done. (If you really insist on reading The Pit and The Pendulum like John Laurie, then just chuck loads of reverb on, you'll wing it.)

EQ stands for equalisation: tone, high, middle and low, bass and treble--you know what I mean. Pretty easy; most people are comfortable with that concept and need read no further. But there are one or two tricks that can be done with digital EQ that are really quite effective. More later.

Dynamics is the one that most people don't understand, simply because there has never been a layman's equivalent, but honestly it's no more difficult than working the bass and treble on your old Hi Fi. Dynamics is about manipulating the level of sounds, whereas EQ is about manipulating the tone colour. I can use a compressor with my eyes shut but I have to think about EQ, so dynamics must be easier.

The one special tool I use is Sound Forge noise reduction--there will be a chapter on that later in this series.

Generally I would EQ before processing the dynamics; this is because EQ affects the loudness of frequency bands so you always have to do some dynamics after EQ even if you've already done it first.

So let's deal with EQ first.

When I was a kid I never touched the cheap substandard tone controls on record players because of a sincere belief I held, which years later was echoed almost word for word by a big record producer. He said, “Tone controls should be absent from record players. After I've spent 12 hours slaving over a million pound mixer you don't have the right to go messing with my mix with a cheap crappy tone control.”

The EQ available in recording software is infinitely better than the old record players because it doesn't require million quid desks, just a bit of nifty mathematics.

EQ is not a black art if you go at it purposefully.

If a sound is low in one band or high in another, EQ can correct it. If it sounds good it probably is--if it ain't broke don't fix it.

Frequency bands

All sound is made up of various frequencies. Perfect human hearing runs from 20 cycles per second to 20,000 cycles per second; as you get older the upper limit drops to 12-14k. Cycles per second in sound is labelled hertz, named after Heinrich Hertz, the Victorian German physicist who pioneered multinational car rental (now would I lie to you?).

If you play a 20 hertz tone through a speaker that can play that low you will feel it more than you will hear it.

  • 20 – 100 hertz is where the oomph of a kick drum lies in dance music.
  • 100 – 7000 hertz is where we speak and sing, our area of interest.
  • 100 – 300 hertz is the bottom end of a voice, the bit that a big man has more of than a petite woman. This is the area you could call 'boom'.
  • 300 – 900 hertz is the main body of a voice, the part that is pronounced in operatic singing. This part on its own sounds 'honky'.
  • 3000 – 5000 hertz is the fine detail in speech that makes it intelligible. This is the area that disappears on a cheap crappy tone control.
  • 6000 – 7000 hertz is the letter S, and the lush swoosh of a splash cymbal, this quite naturally is the 'hissy bit'.
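The list above can be turned into a little lookup table if you like to see these things written down. This is only a sketch: the band edges are the approximate ones from the list, and the names `BANDS` and `name_band` are made up for the illustration.

```python
# Approximate voice bands from the list above; the names are informal.
BANDS = [
    (20, 100, "kick-drum oomph"),
    (100, 300, "boom"),
    (300, 900, "honk"),
    (3000, 5000, "intelligibility"),
    (6000, 7000, "hiss"),
]


def name_band(freq_hz):
    """Return the informal band name for a frequency, or None if the list has no name for it."""
    for low, high, name in BANDS:
        if low <= freq_hz < high:
            return name
    return None
```

For example, `name_band(200)` gives "boom", while a frequency such as 1500 hertz falls between the named bands and gives `None`.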

When you give the frequency bands names like boom, honk and hiss it helps you to identify approximate frequencies by ear. I once walked out of a terribly mixed Darkness concert muttering that it was all below 400 with nothing above 1k. In layman's terms, it sounded like a party in the house next door but amplified to the threshold of pain.

Coincidentally, electric guitars occupy all the same frequencies as the human voice--perhaps that explains why they've always been so popular. When I used to mix records, my method was to EQ the guitars down a bit from 3000 – 5000 (the intelligibility part) and raise voices in the same band. Consequently the voice came through the wall of guitar sound with just a small adjustment.

EQ controls

Small inexpensive mixers have EQ controls labelled high, mid and low and can be quite useful as long as they've been set at good intervals.

Audacity has lots of plugins available for it. The essential two to have would be:

  • Parametric
  • Graphic

Graphic EQ

Graphic EQ is the most familiar one to most people: the bands are labelled with frequencies and you can raise or lower the sliders to lift or cut in that band. Anyone can play a graphic EQ by ear. Just try a slider; if it doesn't do what you want put it back and try another. Totally unscary EQ.

Parametric EQ

This is a particularly useful tool for removing mains hum.

Parametric EQ is what you find on pro mixers. You still have the high, mid and low controls, but for each band you have three controls:

  • Q
  • Freq
  • level

Q is a great mystery, a black art known only to those adepts who have been baptised by Phil Spector. Not! Q translates as bandwidth. You can make a very narrow notch on the frequency spectrum by turning up the Q or a wider gentle slope by turning it down. If you find an EQ with the bandwidth labelled as Q then high Q means narrow band and low Q means wide band, it's really that simple. If you're not that confident about adjusting the bandwidth then just set it about half way and use the freq and level controls--you won't go far wrong.
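Some plugins label this control in octaves of bandwidth rather than as Q. The usual conversion between the two can be sketched as below; the function names are made up for the illustration, and a particular plugin may define its bandwidth slightly differently.

```python
import math


def q_from_octaves(bandwidth_octaves):
    """Convert a bandwidth in octaves to the equivalent Q."""
    ratio = 2.0 ** bandwidth_octaves      # upper band edge divided by lower band edge
    return math.sqrt(ratio) / (ratio - 1.0)


def bandwidth_hz(centre_hz, q):
    """Approximate width of the band in hertz: centre frequency divided by Q."""
    return centre_hz / q
```

A one-octave band works out at a Q of about 1.4, and a very narrow band of 0.02 of an octave at a Q of roughly 70, which at 50 hertz is a notch less than 1 hertz wide. High Q, narrow band.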

The Freq control is labelled in hertz and simply allows you to centre that band on a particular frequency, so you can be quite specific about where you want to lift or cut.

The level control should start off centred and is capable of adding or removing up to 15dB at the centre frequency, tapering off each side according to the Q setting. A good trick I learned is to turn up the level to +15dB and swing the frequency control across the band. As soon as you hit the problem frequency it will stick out like a sore thumb. You can then return the level to zero and continue down to make a gentle cut.

I go to the trouble of explaining Parametric EQ because it is the magic bullet that can kill a mains hum and you'd never know it was there. You couldn't do it with a graphic EQ. More on that later.

The greatest problem you can get on recordings, the most annoying, and the easiest to remove is a constant drone such as mains hum.

A constant drone noise may not be a mains hum but all you have to do is identify the frequencies and then you can filter them out.

Highlight a bit of the file that is supposed to be silent and pull up the Spectrum Analyser. The only things here that will show up on the Analyser are random noise and the hum or buzz.

The Audacity Spectrum tool presents itself as a graph and it can be a useful tool if you want to identify a problem sound. You just need to look for an unusual spike in the display and that is your problem tone. Put the mouse cursor over it and the frequency will be identified for you.
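If you would rather let the computer hunt for the spike, the same idea can be sketched in a few lines. This is not how the Audacity tool works internally; it is a plain single-frequency measurement stepped across the low end, and the names `tone_strength` and `find_spike` and the 20 to 200 hertz search range are assumptions for the illustration.

```python
import math


def tone_strength(samples, sample_rate, freq_hz):
    """Magnitude of one frequency in the samples (a single-bin DFT)."""
    re = im = 0.0
    for n, x in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * n / sample_rate
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return math.hypot(re, im) / len(samples)


def find_spike(samples, sample_rate, low_hz=20, high_hz=200):
    """Scan in whole-hertz steps and return the frequency with the largest magnitude."""
    return max(range(low_hz, high_hz + 1),
               key=lambda f: tone_strength(samples, sample_rate, f))
```

Feed it a second or so of the "silent" part of the file. It is slow on long selections, so keep the selection short.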

The nature of human hearing is that if you have to raise EQ it's better done with a wide Q but if you have to cut out a problem frequency with EQ it's better done with a narrow Q.

Parametric is my favourite tool for this job.

Mains supply in America is alternating current at 60 hertz; in Europe it's 50 hertz. If the mains induces interference in your recorded signal, then it's at mains frequency and sometimes at the second harmonic (twice the mains frequency) of 100 or 120 hertz, so if I have a sound with a mains hum the spectrum analyser tells me which side of the Atlantic it was recorded on.

Set the bandwidth to narrowest (high Q). On an Audacity plugin a value of 0.02 of an octave is a good start. Set the centre frequency to 60 Hz for the U.S. or 50 Hz for the U.K., pull down the level control to -25 and process it. If there is still a droning hum it might be that there is a second harmonic of the mains at 100 or 120 Hz. In that case you could use a multiband EQ and do the two notches in one go.

If you then spectrum analyse the same bit of file you'll now see a narrow chasm in the graph where you notched out the offending frequency. The hole is so narrow and so far below the human voice that this filtering will simply clean your sound.
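For the curious, here is roughly what that "nifty mathematics" looks like. It is a minimal sketch of one parametric band using the widely published peaking-filter formulas from Robert Bristow-Johnson's Audio EQ Cookbook, not the code of any particular Audacity plugin. The function names are made up, and it assumes the -25 on the level control means -25dB.

```python
import cmath
import math


def peaking_cut(sample_rate, centre_hz, gain_db, bandwidth_octaves):
    """Biquad coefficients (b, a) for one parametric band (RBJ cookbook peaking filter)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) * math.sinh(
        math.log(2.0) / 2.0 * bandwidth_octaves * w0 / math.sin(w0))
    a0 = 1.0 + alpha / amp
    b = [(1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a


def apply_biquad(samples, b, a):
    """Run the samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def response_db(b, a, sample_rate, freq_hz):
    """Gain of the filter at one frequency, in dB."""
    z = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))
```

With `peaking_cut(44100, 50, -25, 0.02)` the response is 25dB down at 50 hertz and essentially untouched by 100 hertz, which is the narrow chasm described above. To take out the second harmonic as well, build a second band at 100 or 120 hertz and run the samples through both filters in turn.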

If you've still got hiss or random noise in the silent parts there are a couple of tools you can use to clean them out too. More on that in the noise removal section.

Post Production 2: Dynamics


Dynamics is a word that means movement, and in our case it refers to the movement of the level meter, the way the sound bounces up and down. The dynamic range of a sound is basically the amount of difference between the loudest and quietest sounds. A good LibriVox reader knows how to use the level of the voice to create ambience--this is good control of dynamics. You don't miss a word of the tale.

At the other end of the scale, listening to someone with a monotonous unvarying voice is like trying to read a book with a bag on your head: you have to keep rewinding because it couldn't hold your ears.

The human voice is the most dynamic sound in the studio, with the possible exception of the Fender Rhodes Piano, a hideous beast that should never be seen in public without a hard limiter on its output socket. But levity aside, LibriVox is all about the voice and a voice will always benefit from some judicious dynamic control.

There are a number of dynamics tools:

  • Compressor
  • Limiter
  • Gate
  • Expander

Now let's whittle that down a bit: a limiter is an aggressive compressor; an expander is a soft gate. So now we're down to two: compressor/limiter and expander/gate. Now let's whittle it down even more: both the gate and the compressor are automatic volume controls--it's just that one works the other way up.

So basically all of your dynamics processing will be done by the same methodology. There is a great tool for Audacity called MDA dynamics and it can be found via the Audacity website. In one pass it covers compression, limiting, gating and expansion.


First let's talk about compression.

To explore these ideas you really need to be able to see something in front of you similar to what I am imagining so fire up your audio editor, load up a voice recording and find Effect | Normalise and normalise the file. All this does is to raise it up to the top limit so that you see the same things I am describing.
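In case it helps to see it spelled out, peak normalising amounts to the following. This is a sketch only, assuming samples are held as numbers between -1 and 1; the function name is made up and Audacity's own effect has more options.

```python
def normalise(samples, peak=1.0):
    """Scale samples so the loudest one just reaches `peak` (1.0 is full scale)."""
    loudest = max((abs(x) for x in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)      # silence stays silence
    gain = peak / loudest
    return [x * gain for x in samples]
```

Every sample is multiplied by the same number, so the shape of the wave and the dynamic range are unchanged; the whole thing is just turned up.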

Using an audio compressor is a listening art that only works really well after a few years of practice, but using a compressor on Audacity or similar can be done visually since you can see the wave in front of you.

Human speech is the most dynamic sound in the studio: uncompressed, it disappears in a mix since most of it is very quiet and the loud spikes prevent it from being turned up. As a solo performance, the wide dynamics lessen its listenability. I listen to LibriVox books in two environments: on an MP3 player and in my workshop. The MP3 player has a sensible max sound level which is reduced by uncompressed sound. A poor listening environment like my workshop benefits from compressed sound, allowing all elements of the sound to come through without needing the amp turned up.

Using a compressor is easy enough once one knows what it does. When you have your recording and look at it on a digital editor you can see that there are lots of spikes, generally at the start of words, with the remainder of the word tailing off at a much lower level. That spike sets the upper level and the rest of your word falls below the ambient noise in the workshop reducing the intelligibility of the story.

What a compressor does is to monitor the level and according to the settings it has, as soon as the level rises above a preset point it turns it down. Then as soon as the level falls back, the compressor turns it up again.

The result is that all of the spikes have been reduced in level, reducing the difference between the upper and lower levels. The sound level is more consistent; it can now be turned up across the board raising the average level.

There are five controls common to audio compressors, all of which appear on its computer equivalent.

1. Threshold
2. Ratio
3. Attack
4. Release
5. Makeup gain


Threshold is expressed in minus decibels. (-dB).

This is the level above which the compressor will react. If you look at the display on your editor, you will see that digital audio is labelled as zero dB at the top and goes to negative numbers as your signal gets lower. Your recording uses the full height of the available headroom. You'll see that the upper parts are only thin spikes and the main body of the speech is lower down. You would set your threshold somewhere above the main body of the sound so that it will only react to the spikes.

Looking at the side of the wave window you will, according to the display settings, see the levels labelled as minus dB numbers, and you can see that -12dB would be a reasonable level to set the threshold to. The compressor will only react above this, so everything below it would be unaffected.

By default Audacity labels the side bar between 0 and 1, but it can be changed to show in dB.
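The two scales are the same thing measured differently, and converting between them is standard arithmetic. A small sketch, with made-up function names:

```python
import math


def amp_to_db(amplitude):
    """Convert a linear level (1.0 = full scale) to decibels."""
    if amplitude <= 0.0:
        return float("-inf")      # true silence has no finite dB value
    return 20.0 * math.log10(amplitude)


def db_to_amp(level_db):
    """Convert decibels back to a linear level (0 dB = 1.0)."""
    return 10.0 ** (level_db / 20.0)
```

So a threshold of -12dB sits at about 0.25 on the 0 to 1 scale, and half way up the display (0.5) is only about 6dB down.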


Ratio is expressed as one number to another. (e.g. 2:1)

Once you have your threshold set, the ratio controls just how much the excess level will be reduced, and it works thus. If a level is 12dB above the threshold the excess level will be reduced by the factor set as ratio. If ratio is 2:1 the 12dB excess will end up a 6dB excess. If you make it 3:1 it will end up as a 4dB excess, and so on. If you wanted to limit the level to the threshold, a ratio above 10:1 would squash the 12dB excess almost completely (to about 1dB or less), enabling you to subsequently raise the level of the whole file by nearly 12dB.

In Audacity you won't find these things labelled in dB: 0 to 1 seems to be the standard way of quantifying them, but it will still work with a little trial and error.
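The threshold and ratio arithmetic is simple enough to write as a one-liner. A sketch, working in dB, with a made-up function name:

```python
def compressed_level_db(level_db, threshold_db, ratio):
    """Output level in dB for a given input level, threshold and ratio (e.g. 2 for 2:1)."""
    if level_db <= threshold_db:
        return level_db           # below the threshold nothing is touched
    return threshold_db + (level_db - threshold_db) / ratio
```

With the threshold at -12dB, a spike at 0dB comes out at -6dB with a 2:1 ratio and at -8dB with 3:1, while anything at -20dB passes through unchanged.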


Attack is expressed in milliseconds (ms)

Attack determines how fast the compressor will react. Musically there is a wide range of settings according to the instrument, but in the world of spoken word recordings, especially digital ones, I always set this control to zero so that nothing gets by before the compressor does its work.


Release is expressed in milliseconds (ms)

This one determines how long it takes the compressor to restore normal level control after the sound drops back below the threshold. In the digital domain this one has to have a number, otherwise the compressor would clip the waves, creating unpleasant distortion. 30 milliseconds is a setting that you can't go wrong with.

Makeup Gain

Makeup gain is expressed in decibels (dB)

On an analogue compressor this allows the output volume to be turned up, since compressing the spikes has given us some free headroom. In the digital domain it's best to leave this set at zero since it's better to have some headroom to play with until you've finished processing your file.
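Putting the five controls together, a bare-bones compressor might look like the sketch below. It is an illustration of the idea under simple assumptions (samples between -1 and 1, attack fixed at zero, a one-pole smoothed release), not the algorithm used by Audacity or MDA dynamics, and the function name and defaults are made up.

```python
import math


def compress(samples, sample_rate, threshold_db=-12.0, ratio=3.0,
             release_ms=30.0, makeup_db=0.0):
    """Feed-forward compressor with instant attack and a smoothed release."""
    # One-pole smoothing coefficient: closer to 1.0 means a slower release.
    release = math.exp(-1.0 / (release_ms / 1000.0 * sample_rate))
    makeup = 10.0 ** (makeup_db / 20.0)
    reduction_db = 0.0            # current gain reduction; 0 dB means no change
    out = []
    for x in samples:
        level_db = 20.0 * math.log10(max(abs(x), 1e-10))
        excess = max(level_db - threshold_db, 0.0)
        wanted_db = excess - excess / ratio     # reduction this sample asks for
        if wanted_db > reduction_db:
            reduction_db = wanted_db            # attack of zero: turn down at once
        else:
            # release: ease back towards the wanted reduction
            reduction_db = wanted_db + (reduction_db - wanted_db) * release
        out.append(x * makeup * 10.0 ** (-reduction_db / 20.0))
    return out
```

A full-scale spike with the threshold at -12dB and a 2:1 ratio is turned down by 6dB straight away, and the gain then creeps back up over the release time rather than snapping back. The makeup gain defaults to zero, as suggested above.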

So for anyone who's interested, please have a go. Take a raw file and try compressing it different ways. When you've finished with it, normalise it and save it as a new file, play the old one and the new one and compare the difference.

If you've learned something useful give me some feedback (PM chaoscollective on the LibriVox forum). If my efforts have made a difference, I'll write instructions for removing hums, hiss, random noise and equalisation.

The Limiter

The limiter does not need explanation: it's a cheap compressor with a ratio higher than 10:1. It's a wicked machine, the work of the Beelzebub and all his little minions such as tacky pop music producers; it is designed only to make their records seem louder than last week's number one. Give it no quarter. Its best use is as a doorstop to hold open the studio door on nice days.

The Gate

The original noise reduction tool.

A gate is another one of those dark devices available only to us adepts. No not really, it's just the opposite of a compressor, with one extra control. It's called an expander and that extra control is called range.

The gate is a bit of a mystery to most people simply because there is only one similar device in the non-studio world, and it's nice to have a similar device to think about since it gives a spark of understanding right away.

The only equivalent is the squelch control on a two-way radio, so if you have never used one you will not know what I mean. CB, ham radio, and site walkie-talkies all have it. When you use a radio like this, if your opposite number is not pressing his mic key and sending you a signal you hear a hash of white noise--static in old money.

The squelch control is a gate: you turn up the threshold and the radio goes silent because the static is not strong enough to trigger the squelch to open. As soon as your pal on the other radio keys his mic, a radio wave is present in your receiver, which is much stronger than the static and so opens the squelch/gate. There you go, it's now much more of a pleasure to use that radio without all that nasty noise when no one is talking. Noise removal for two way radio, it works just the same for audio recordings. Read on...

The gate is a track cleaning device that predates digital recording. When all of the instruments were recorded to tape and had to be passed through the mixer in real time, the hardware gate was essential.

In a studio, it is common to use about eight mics on the drums: a stereo pair plus individual mics on single drums, especially the snare. With so many mics, the drum sound can get a bit wild and woolly and difficult to mix, so all the single drum mics are gated. What the gate does is the inverse of what the compressor does. It monitors the level against the threshold: when the sound is above the threshold the gate is open and the sound is unaffected; when the sound falls below the threshold the gate closes, shutting off the sound channel. On a drum mix this means that there are not always eight mics contributing to the wild and woolly sound; the channels only open when the drum is hit.

Another use of a gate: imagine the recording of Creep by Radiohead (check it out on YouTube). The verse is played on a bass and one gentle clean guitar. The second guitarist is waiting for the chorus with a guitar sound that can kill at ten paces. While he waits his amp will be humming; all guitar amps do. While he's not playing, the gate is shut and his track is beautifully silent. When he does start to play, the gate opens and the amp hum is buried under the guitar sound. When he is finished the gate discreetly closes and the hum is gone again.

All modern gates, including their VST counterparts, have a range control: instead of closing the gate you can set the range control to reduce the level rather than shut the channel. When it's used like this it's called an expander, which is the opposite of a compressor: it opens up the dynamic range rather than reducing it. A compressor reduces the loudest sounds; a gate reduces the quietest sounds.

So... what you need to do is:

1. Set your compressor to do its work to the top end of the dynamic range and when the make up gain has turned up the background and the breathing, you can use the gate to treat the bottom of the dynamic range. Start by setting the range control to infinite or as high as you can, so that the gate shuts--this way you can hear when the threshold is crossed.

2. Adjust the threshold so that the words come through and the breathing doesn't, now you've got the threshold between the words and the breathing.

3. Bring the range control up so that instead of closing, the gate reduces, you'll find that between the words the background is now lower.

4. The jump between levels will probably be quite abrupt, which brings us to the last two controls: attack and release. The attack controls how long the gate takes to open, so will be at the start of a word; the release controls the closing or reducing time so will occur at the end of a word.

By giving these settings long enough values, the gate will open and close smoothly without abrupt changes in level which might be noticeable.

On a snare drum you would have a very short attack to catch the impact and a fairly long release so that the gate doesn't slam shut and cut off the tail. On a voice you can have a longer attack and still need a fairly long release so that you don't cut off the word ends. But again, it's down to the nature of the speaker, you have to experiment until you get the right sound. But just knowing what the functions are will enable you to achieve what you want.
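The four gate controls can be sketched in code much like the compressor. Again this is only an illustration under simple assumptions (samples between -1 and 1, a basic peak follower to decide whether someone is speaking), not the workings of any real plugin; the function name and default settings are made up.

```python
import math


def gate(samples, sample_rate, threshold_db=-40.0, range_db=-12.0,
         attack_ms=5.0, release_ms=100.0):
    """Soft gate: while the level is below the threshold, turn the signal down by range_db."""
    attack = math.exp(-1.0 / (attack_ms / 1000.0 * sample_rate))
    release = math.exp(-1.0 / (release_ms / 1000.0 * sample_rate))
    threshold = 10.0 ** (threshold_db / 20.0)
    floor = 10.0 ** (range_db / 20.0)     # gain while the gate is "closed"
    envelope = 0.0
    gain = floor                          # start with the gate closed
    out = []
    for x in samples:
        # Peak follower: jumps up with the wave, decays at the release rate.
        envelope = max(abs(x), envelope * release)
        target = 1.0 if envelope >= threshold else floor
        # Opening uses the attack time, closing uses the release time.
        coeff = attack if target > gain else release
        gain = target + (gain - target) * coeff
        out.append(x * gain)
    return out
```

A very negative range such as -120 makes it behave like a hard gate that shuts completely; a gentler range such as -12 gives the soft expander behaviour, where the background between words is lowered rather than removed.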

It's not just breathing you can bring down with this--if you can hear the computer fan or passing traffic between the words, soft gating will take them down.

I once was mixing a Bob Dylan cover song and we noticed a strange sound. The backing vocalist was asthmatic and during the verses whilst wearing headphones with a very loud monitor mix in them she wasn't aware that the sensitive vocal mic was recording her wheezing breath. I simply plugged a gate into the channel and set the threshold, no more wheezing in the stereo mix. “Cracking Contraption, Gromit!”