Basic Principles of Audio Processing
This is the second part of a series of short articles written by a sound engineer with many years' experience. The idea is to explain in plain language how to make a quality sound file.
Post Production 1: Equalisation (EQ) and mains hum removal
Post production tools fall into the categories of
- FX
- EQ
- Dynamics
- Special tools
So what are they all about?
FX stands for effects: flange, phaser, fuzz box etc. We don't really need that in LibriVox so let's chuck it out and forget it. Job done. (If you really insist on reading The Pit and The Pendulum like John Laurie, then just chuck loads of reverb on, you'll wing it.)
EQ stands for equalisation: tone, high, middle and low, bass and treble--you know what I mean. Pretty easy, most people are comfortable with that concept and need read no further. But there are one or two tricks that can be done with digital EQ that are really quite effective--more on that later.
Dynamics is the one that most people don't understand, simply because there has never been a layman's equivalent, but honestly it's no more difficult than working the bass and treble on your old Hi Fi. Dynamics is about manipulating the level of sounds, whereas EQ is about manipulating the tone colour. I can use a compressor with my eyes shut but I have to think about EQ, so dynamics must be easier.
The one special tool I use is Sound Forge noise reduction--there will be a chapter on that later in this series.
Generally I would EQ before processing the dynamics. EQ changes the loudness of individual frequency bands, so it alters the overall levels: if you processed the dynamics first you would only have to do some of that work again after the EQ.
So let's deal with EQ first.
When I was a kid I never touched the cheap sub-standard tone controls on record players because of a sincere belief I held, which years later was echoed almost word for word by a big record producer. He said, "Tone controls should be absent from record players. After I've spent 12 hours slaving over a million-pound mixer you don't have the right to go messing with my mix with a cheap crappy tone control."
The EQ available in recording software is infinitely better than the old record players because it doesn't require million quid desks, just a bit of nifty mathematics.
EQ is not a black art if you go at it purposefully.
If a sound is low in one band or high in another, EQ can correct it. If it sounds good it probably is--if it ain't broke don't fix it.
Frequency bands
All sound is made up of various frequencies; perfect human hearing extends from 20 cycles per second to 20,000 cycles per second, and when you get older the upper limit drops to 12-14k. Cycles per second in sound are labelled hertz, named after Heinrich Hertz, the Victorian German physicist who pioneered multinational car rental (now would I lie to you?).
If you play a 20 hertz tone through a speaker that can play that low you will feel it more than you will hear it.
- 20 – 100 hertz is where the oomph of a kick drum lies in dance music.
- 100 – 7000 hertz is where we speak and sing, our area of interest.
- 100 – 300 hertz is the bottom end of a voice, the bit that a big man has more of than a petite woman. This is the area you could call 'boom'.
- 300 – 900 hertz is the main body of a voice, the part that is pronounced in operatic singing. This part on its own sounds 'honky'.
- 3000 – 5000 hertz is the fine detail in speech that makes it intelligible. This is the area that disappears on a cheap crappy tone control.
- 6000 – 7000 hertz is the letter S and the lush swoosh of a splash cymbal; this, quite naturally, is the 'hissy bit'.
When you give the frequency bands names like boom, honk and hiss it helps you to identify approximate frequencies by ear. I once walked out of a terribly mixed Darkness concert muttering that it was all below 400 with nothing above 1k. In layman's terms, it sounded like a party in the house next door, but amplified to the threshold of pain.
Coincidentally, electric guitars occupy all the same frequencies as the human voice--perhaps that explains why they've always been so popular. When I used to mix records, my method was to EQ the guitars down a bit from 3000 – 5000 (the intelligibility part) and raise voices in the same band. Consequently the voice came through the wall of guitar sound with just a small adjustment.
EQ controls
Small inexpensive mixers have EQ controls labelled high, mid and low, and these can be quite useful as long as the bands have been set at good intervals.
Audacity has lots of plugins available for it. The essential two to have would be:
- Parametric
- Graphic
Graphic EQ
Graphic EQ is the most familiar one to most people: the bands are labelled with frequencies and you can raise or lower the sliders to lift or cut in that band. Anyone can play a graphic EQ by ear. Just try a slider; if it doesn't do what you want, put it back and try another. Totally unscary EQ.
Parametric EQ
Parametric EQ is what you find on pro mixers. You still have the high, mid and low controls, but for each band you have three controls:
- Q
- Freq
- Level
Q is a great mystery, a black art known only to those adepts who have been baptised by Phil Spector. Not! Q translates as bandwidth. You can make a very narrow notch in the frequency spectrum by turning up the Q, or a wider, gentler slope by turning it down. If you find an EQ with the bandwidth labelled as Q, then high Q means narrow band and low Q means wide band--it's really that simple. If you're not that confident about adjusting the bandwidth then just set it about half way and use the freq and level controls--you won't go far wrong.
The Freq control is labelled in hertz and simply allows you to centre that band on a particular frequency, so you can be quite specific about where you want to lift or cut.
The level control should start off centred and is capable of adding or removing up to 15dB at the centre frequency, tapering off each side according to the Q setting. A good trick I learned is to turn up the level to +15dB and swing the frequency control across the band. As soon as you hit the problem frequency it will stick out like a sore thumb. You can then return the level to zero and continue down to make a gentle cut.
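If you like to see the maths behind those three controls, here is a minimal Python sketch of a single parametric band, based on the widely published 'Audio EQ Cookbook' peaking filter. The function and parameter names are my own invention, and it assumes a mono recording loaded as floating-point samples:

    import numpy as np
    from scipy.signal import lfilter

    def peaking_eq(samples, fs, freq, q, gain_db):
        # One band of a parametric EQ: lift or cut by gain_db at freq,
        # with the width of the bell set by q (high q = narrow band).
        A = 10 ** (gain_db / 40)            # amplitude factor from dB
        w0 = 2 * np.pi * freq / fs          # centre frequency in radians/sample
        alpha = np.sin(w0) / (2 * q)        # bandwidth term
        b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
        a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
        return lfilter(b / a[0], a / a[0], samples)

A q of around 1 gives a gentle musical lift; a q of 30 or more gives the surgical notch we'll use on mains hum below.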
I go to the trouble of explaining Parametric EQ because it is the magic bullet that can kill a mains hum and you'd never know it was there. You couldn't do it with a graphic EQ. More on that later.
The greatest problem you can get on recordings, the most annoying, and the easiest to remove is a constant drone such as mains hum.
A constant drone noise may not be a mains hum but all you have to do is identify the frequencies and then you can filter them out.
Highlight a bit of the file that is supposed to be silent and pull up the spectrum analyser. The only things that will show up are random noise and the hum or buzz.
The Audacity spectrum tool presents itself as a graph and is useful for identifying a problem sound. Just look for an unusual spike in the display: that is your problem tone. Put the mouse cursor over it and the frequency will be identified for you.
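For the curious, this is roughly what the analyser is doing under the hood--a short sketch (the function name is mine) that returns the frequency of the tallest spike in a clip of supposed silence:

    import numpy as np

    def find_problem_tone(silence, fs):
        # Find the loudest single frequency in a supposedly silent clip.
        window = np.hanning(len(silence))               # tame spectral leakage
        spectrum = np.abs(np.fft.rfft(silence * window))
        freqs = np.fft.rfftfreq(len(silence), 1 / fs)
        return freqs[np.argmax(spectrum)]               # frequency of the tallest spike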
The nature of human hearing is that if you have to raise EQ it's better done with a wide Q but if you have to cut out a problem frequency with EQ it's better done with a narrow Q.
Parametric is my favourite tool for this job.
Mains supply in America is alternating current at 60 hertz; in Europe it's 50 hertz. If the mains induces interference in your recorded signal, it's at mains frequency and sometimes also at its first harmonic (100 or 120 hertz), so if I have a sound with a mains hum the spectrum analyser tells me which side of the Atlantic it was recorded on.
Set the bandwidth to narrowest (high Q); on an Audacity plugin a value of 0.02 of an octave is a good start. Set the centre frequency to 60 Hz (U.S.) or 50 Hz (Europe), pull down the level control to -25dB and process it. If there is still a droning hum, there may be a first harmonic of the mains at 100 or 120 Hz; in that case you could use a multiband EQ and do the two notches in one go.
If you then spectrum analyse the same bit of file you'll now see a narrow chasm in the graph where you notched out the offending frequency. The hole is so narrow and so far below the human voice that this filtering will simply clean your sound.
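If you'd rather script the whole job, scipy has a ready-made notch filter that does the same thing as the parametric trick above. A minimal sketch, assuming a mono file as floating-point samples (the function name and Q value are my own choices):

    from scipy.signal import iirnotch, filtfilt

    def remove_hum(samples, fs, mains=50.0):
        # Notch out the mains fundamental and its first harmonic at
        # double the frequency (use mains=60.0 for a U.S. recording).
        for freq in (mains, 2 * mains):
            b, a = iirnotch(freq, Q=30.0, fs=fs)   # high Q = very narrow notch
            samples = filtfilt(b, a, samples)      # filter without phase shift
        return samples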
If you've still got hiss or random noise in the silent parts there are a couple of tools you can use to clean them out too. More on that in the noise removal section.
Post Production 2: Dynamics Part 1
Dynamics
Dynamics is a word that means movement, and in our case it refers to the movement of the level meter, the way the sound bounces up and down. The dynamic range of a sound is basically the amount of difference between the loudest and quietest sounds. A good LibriVox reader knows how to use the level of the voice to create ambience--this is good control of dynamics. You don't miss a word of the tale.
At the other end of the scale, listening to someone with a monotonous unvarying voice is like trying to read a book with a bag on your head: you have to keep rewinding because it couldn't hold your ears.
The human voice is the most dynamic sound in the studio, with the possible exception of the Fender Rhodes Piano, a hideous beast that should never be seen in public without a hard limiter on its output socket. But levity aside, LibriVox is all about the voice and a voice will always benefit from some judicious dynamic control.
There are a number of dynamics tools:
- Compressor
- Limiter
- Gate
- Expander
Now let's whittle that down a bit: a limiter is an aggressive compressor; an expander is a soft gate. So now we're down to two: compressor/limiter and expander/gate. Now let's whittle it down even more: both the gate and the compressor are automatic volume controls--it's just that one works the other way up.
So basically all of your dynamics processing will be done by the same methodology. There is a great tool for Audacity called MDA dynamics and it can be found via the Audacity website. In one pass it covers compression, limiting, gating and expansion.
Compression
First let's talk about compression.
To explore these ideas you really need to see something in front of you similar to what I am imagining, so fire up your audio editor, load a voice recording, find Effect | Normalise and normalise the file. All this does is raise the whole file up to the top limit, so that you see the same things I am describing.
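For the curious, normalising is nothing more than one multiplication across the whole file--a sketch, assuming mono floating-point samples between -1 and 1 (the function name and default target are mine):

    import numpy as np

    def normalise(samples, peak_db=-1.0):
        # Scale the whole file so its loudest sample sits at peak_db
        # below full scale (a -1dB target leaves a little safety margin).
        target = 10 ** (peak_db / 20)
        return samples * (target / np.max(np.abs(samples)))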
Using an audio compressor is a listening art that only works really well after a few years of practice, but using a compressor on Audacity or similar can be done visually since you can see the wave in front of you.
As I said, the human voice is the most dynamic sound in the studio: uncompressed, it disappears in a mix, since most of it is very quiet and the loud spikes prevent it from being turned up. As a solo performance, the wide dynamics lessen its listenability. I listen to LibriVox books in two environments: on an MP3 player and in my workshop. The MP3 player has a sensible maximum sound level, so uncompressed sound ends up quieter on average. A poor listening environment like my workshop benefits from compressed sound, which lets every element of the sound come through without the amp being turned up.
Using a compressor is easy enough once you know what it does. Look at your recording in a digital editor and you can see lots of spikes, generally at the start of words, with the remainder of each word tailing off at a much lower level. The spikes set the upper level, and the rest of the word falls below the ambient noise in the workshop, reducing the intelligibility of the story.
What a compressor does is monitor the level and, according to its settings, turn the signal down as soon as the level rises above a preset point; then, as soon as the level falls back, the compressor turns it up again.
The result is that all of the spikes have been reduced in level, reducing the difference between the upper and lower levels. The sound level is more consistent; it can now be turned up across the board raising the average level.
There are five controls common to audio compressors, all of which appear on their computer equivalents.
1. Threshold
2. Ratio
3. Attack
4. Release
5. Makeup gain
Threshold
Threshold is expressed in negative decibels (-dB).
This is the level above which the compressor will react. If you look at the display in your editor, you will see that digital audio is labelled as zero dB at the top and goes to negative numbers as your signal gets lower. Your recording uses the full height of the available headroom, but the upper parts are only thin spikes and the main body of the speech is lower down. You would set your threshold somewhere above the main body of the sound so that it only reacts to the spikes.
Looking at the side of the wave window you will, according to the display settings, see the levels labelled as minus dB numbers, and you can see that -12dB would be a reasonable level to set the threshold to; the compressor will only react above this, so everything below it is unaffected.
By default Audacity labels the side bar between 0 and 1, but it can be changed to show in dB.
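The two scales are related by a simple bit of maths: dB = 20 × log10(amplitude), so 0.5 on the linear scale is about -6dB and 0.25 is about -12dB. In Python terms (function names mine):

    import math

    def to_db(amplitude):
        # Convert the 0-to-1 linear scale to decibels.
        return 20 * math.log10(amplitude)

    def from_db(db):
        # And back again: -12 dB comes out as roughly 0.25.
        return 10 ** (db / 20)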
Ratio
Ratio is expressed as one number to another. (e.g. 2:1)
Once you have your threshold set, the ratio controls just how much the excess level will be reduced, and it works thus: if a level is 12dB above the threshold, the excess will be divided by the factor set as the ratio. If the ratio is 2:1 the 12dB excess will end up as a 6dB excess; if you make it 3:1 it will end up as a 4dB excess, and so on. If you wanted to limit the level to the threshold, a ratio above 10:1 would squash the 12dB excess down to barely 1dB, enabling you to subsequently raise the level of the whole file by almost 12dB.
In Audacity you won't find these things labelled in dB: 0 to 1 seems to be the standard way of quantifying them, but it will still work with a little trial and error.
Attack
Attack is expressed in milliseconds (ms)
Attack determines how fast the compressor will react. Musically there is a wide range of settings according to the instrument, but in the world of spoken word recordings, especially digital ones, I always set this control to zero so that nothing gets by before the compressor does its work.
Release
Release is expressed in milliseconds (ms)
This one determines how long it takes the compressor to restore normal level after the sound drops back below the threshold. In the digital domain it has to be a non-zero number, otherwise the compressor would follow individual wave cycles and clip them, creating unpleasant distortion. 30 milliseconds is a setting that you can't go wrong with.
Makeup Gain
Makeup gain is expressed in decibels (dB)
On an analogue compressor this allows the output volume to be turned up, since compressing the spikes has given us some free headroom. In the digital domain it's best to leave this set at zero since it's better to have some headroom to play with until you've finished processing your file.
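To pull the five controls together, here is a minimal per-sample compressor sketch in Python. It is an illustration of the method described above, not the code of any particular plugin; the function and its defaults are my own, and it assumes mono floating-point samples:

    import numpy as np

    def compress(samples, fs, threshold_db=-12.0, ratio=3.0,
                 attack_ms=0.0, release_ms=30.0, makeup_db=0.0):
        # Smoothing coefficients for the envelope follower; an attack of
        # zero means the envelope jumps up instantly, as recommended above.
        attack = np.exp(-1000.0 / (fs * attack_ms)) if attack_ms > 0 else 0.0
        release = np.exp(-1000.0 / (fs * release_ms)) if release_ms > 0 else 0.0
        out = np.empty_like(samples)
        env = 0.0
        for i, x in enumerate(samples):
            level = abs(x)
            # The envelope rises at the attack rate and falls at the release rate.
            coeff = attack if level > env else release
            env = coeff * env + (1.0 - coeff) * level
            env_db = 20 * np.log10(max(env, 1e-10))
            excess = env_db - threshold_db
            # Above the threshold, only excess/ratio of the excess survives.
            gain_db = excess / ratio - excess if excess > 0 else 0.0
            out[i] = x * 10 ** ((gain_db + makeup_db) / 20)
        return out

With these defaults, a spike 12dB over the threshold comes out only 4dB over it, just as in the ratio example earlier.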
So for anyone who's interested, please have a go. Take a raw file and try compressing it different ways. When you've finished with it, normalise it and save it as a new file, play the old one and the new one and compare the difference.
If you've learned something useful give me some feedback (PM chaoscollective on the LibriVox forum). If my efforts have made a difference, I'll write instructions for removing hums, hiss, random noise and equalisation.
The Limiter
The limiter needs little explanation: it's a cheap compressor with a ratio higher than 10:1. It's a wicked machine, the work of Beelzebub and all his little minions, such as tacky pop music producers; it is designed only to make their records seem louder than last week's number one. Give it no quarter. Its best use is as a doorstop to hold open the studio door on nice days.