Basic Principles of Audio Processing: Difference between revisions

From Librivox wiki
<font color="red"><b>WORK IN PROGRESS</b></font> - RuthieG


This is the second part of a series of short articles written by a sound engineer with many years' experience. The idea is to explain in plain language how to make a quality sound file.


== Digital recording, a brief overview ==
== Post Production 1: Equalisation (EQ) and mains hum removal==


Recording to a computer these days is cheap and relatively easy. In 1997 recording software cost me 150 pounds. These days there are much better free and open source applications to cover all that one can do in a studio. Audacity is one of the best. I'll try to refer to Audacity as much as I can so that what I write can be tried by all those who have the inclination.
Post production tools fall into the categories of  
* FX
* EQ
* Dynamics 
* Special tools


Digital and tape, what's the difference?
What are they all about?


=== <b>Tape</b> ===
'''FX''' stands for effects: flange, phaser, fuzz box etc. We don't really need that in LibriVox so let's chuck it out and forget it. Job done. (If you really insist on reading ''The Pit and The Pendulum'' like John Laurie, then just chuck loads of reverb on, you'll wing it.)


* Magnetic tape records sound as a continuously variable magnetic field along the length of the tape; all of us over teenage are familiar with cassettes and possibly eight track cartridges. Magnetic tape has advantages and disadvantages against digital.
'''EQ''' stands for equalisation: tone, high middle and low, bass and treble--you know what I mean. Pretty easy, most people are comfortable with that concept and need read no further. But there are one or two tricks that can be done with digital EQ that are really quite effective, more later.


* Tape recordings degrade over time.
'''Dynamics''' is the one that most people don't understand, simply because there has never been a layman's equivalent, but honestly it's no more difficult than working the bass and treble on your old Hi Fi. Dynamics is about manipulating the ''level'' of sounds, whereas EQ is about manipulating the ''tone colour''. I can use a compressor with my eyes shut but I have to think about EQ, so dynamics must be easier.


* Even the best tape has inbuilt hiss.
The one '''special tool''' I use is [http://en.wikipedia.org/wiki/Sound_Forge Sound Forge] noise reduction--there will be a chapter on that later in this series.


* Tape has to be moved extremely accurately both in position and speed, requiring very high quality hardware. Just one part out of adjustment can ruin a recording.
Generally I would EQ before processing the dynamics; this is because EQ affects the loudness of frequency bands so you always have to do some dynamics after EQ even if you've already done it first.


* Tape distorts the sound; admittedly it distorts the sound in a musically pleasing way such that Pink Floyd, Elton John and Kate Bush have never recorded in digital studios.
So let's deal with EQ first.


* Tape forgives high recording levels; the nature of magnetism is that when tape is over-saturated it makes music sound nicer, so much so that in studios it is deliberately over-driven to achieve this richly pleasing effect.
When I was a kid I never touched the cheap substandard tone controls on record players because of a sincere belief I held which years later was echoed almost word for word by a big record producer. He said, “Tone controls should be absent from record players. After I've spent 12 hours slaving over a million pound mixer you don't have the right to go messing with my mix with a cheap crappy tone control.”


=== <b>Digital</b> ===
The EQ available in recording software is infinitely better than the old record players because it doesn't require million quid desks, just a bit of nifty mathematics.


Where the waves on tape are stored as field strength, digital recordings are stored as a long series of numbers, which is what computers excel at. In fact storing and shifting numbers is the only thing a computer can do, but they can do it very fast, so much so that the numbers can be coloured spots on a screen or voltages on a speaker. Once you start shifting the numbers quick enough, the pictures move and the speaker sings: video and audio.  
EQ is not a black art if you go at it purposefully.  


==== ''How it works'' ====
If a sound is low in one band or high in another, EQ can correct it. If it sounds good it probably is--if it ain't broke don't fix it.


The ever changing voltage caused by a sound vibration from microphone or mixer is presented to a tiny measuring circuit called an Analogue to Digital converter. At predetermined intervals the circuit measures the microphone voltage and assigns a number as its value. At CD quality this voltage is measured 44,100 times per second. You may have heard mention of 16 bit and 24 bit and 32 bit sound; this refers to the accuracy to which the measurements are taken. A 16 bit binary number can take 65,536 different values, about 65,000 in old money. CD is 16 bit 44.1k, therefore roughly every 44th of a millisecond a measurement is taken and stored as a number between 0 and about 65,000.
=== Frequency bands ===


To play back that recording you do quite the opposite, every 44th of a millisecond a number is taken from memory and presented to another circuit, the Digital to Analogue converter, which then produces a voltage on its output in proportion to the number. The ever changing numbers produce an ever changing voltage which drives the speaker and you've got your sound vibrations back.
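The measuring and playing back described above can be sketched in a few lines of Python. This is only an illustration, not how any particular converter is built: the function names are made up for the example, the "microphone voltage" is a sine wave between -1 and +1, and the stored numbers run from 0 to 65,535 to match the description here (real 16 bit audio files usually store signed numbers instead).

```python
import math

SAMPLE_RATE = 44_100        # CD quality: measurements per second
BIT_DEPTH = 16              # CD quality: bits per measurement
LEVELS = 2 ** BIT_DEPTH     # 65,536 possible values

def to_digital(voltage):
    """Analogue to digital: map a voltage in -1.0..+1.0 to an integer 0..65535."""
    voltage = max(-1.0, min(1.0, voltage))   # the converter cannot measure beyond its range
    return round((voltage + 1.0) / 2.0 * (LEVELS - 1))

def to_analogue(number):
    """Digital to analogue: turn the stored integer back into a voltage."""
    return number / (LEVELS - 1) * 2.0 - 1.0

def tone_voltage(freq_hz, n):
    """Hypothetical microphone voltage for sample number n of a pure tone."""
    return math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)

# Record 10 milliseconds of a 440 hertz tone, then play it back.
count = round(SAMPLE_RATE * 0.01)
samples = [to_digital(tone_voltage(440.0, n)) for n in range(count)]
playback = [to_analogue(s) for s in samples]

# The steps are tiny: the worst rounding error is about half of one step.
worst_error = max(abs(tone_voltage(440.0, n) - v) for n, v in enumerate(playback))
print(len(samples), "samples, worst rounding error", worst_error)
```

The rounding error is the "steps" in the signal, and at 16 bits it is far too small to hear.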
All sound is made up of various frequencies; perfect human hearing hears from 20 cycles per second to 20,000 cycles per second. When you get older your limit drops to 12-14k. Cycles per second in sound is labelled [http://en.wikipedia.org/wiki/Hertz hertz], named after Heinrich Hertz, the Victorian German physicist who pioneered multinational car rental (now would I lie to you?).


Digital recordings when looked at really closely do not look like a smooth curvy wave, they look like a series of steps, but the steps are so small and the duration of them so short that you don't hear the steps, it smooths out to an uncannily accurate reproduction, which is why even the smallest cheapest MP3 player sounds much better than the best cassette player.
If you play a 20 hertz tone through a speaker that can play that low you will ''feel'' it more than you will hear it.  


Simple really, when someone sensible takes all the waffle away, isn't it?
* 20 – 100 hertz is where the oomph of a kick drum lies in dance music.


==== ''Benefits and risks of digital recording'' ====
* 100 – 7000 hertz is where we speak and sing, our area of interest.


Digital recordings are very accurate, the accuracy determined only by the quality of the DA and AD converters.
* 100 – 300 hertz is the bottom end of a voice, the bit that a big man has more of than a petite woman. This is the area you could call 'boom'.


But there is a risk: if the signal going in ''exceeds'' the measuring capacity of the converter it can't possibly get a higher number than 65k or a lower one than zero. Digital does not forgive overdrive. Digital distortion will make you throw off your headphones; it is about as pleasant as the sound made by a scallywag dragging a sharp key along the side of your brand new car.
* 300 – 900 hertz is the main body of a voice, the part that is pronounced in operatic singing. This part on its own sounds 'honky'.  


Consequently when making a recording it is imperative to see to it that the signal '''never''' reaches and crosses 0dB DFS (Digital Full Scale).
* 3000 – 5000 hertz is the fine detail in speech that makes it intelligible. This is the area that disappears on a cheap crappy tone control.  


On digital equipment zero decibels is the measure of the highest level, all other values are expressed as minus decibel numbers, all the way down to minus 96dB in the case of 16 bit or CD quality.
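The minus 96dB figure comes straight from the number of bits. As a rough sketch (the function names are invented for this example), the level of the smallest possible step relative to full scale is 20 times the base-10 logarithm of one divided by the number of values:

```python
import math

def floor_db(bits):
    """Level of the smallest step relative to full scale, in dB (always negative)."""
    return 20 * math.log10(1 / 2 ** bits)

def fraction_of_full_scale(level_db):
    """How big a signal at level_db is, as a fraction of the maximum (0dB)."""
    return 10 ** (level_db / 20)

for bits in (16, 24):
    print(bits, "bit floor:", round(floor_db(bits), 1), "dB")

# A peak at -12dB uses only about a quarter of the available height.
print("-12dB is", round(fraction_of_full_scale(-12), 3), "of full scale")
```

Each extra bit buys roughly another 6dB of room underneath the 0dB ceiling.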
* 6000 – 7000 hertz is the letter S, and the lush swoosh of a splash cymbal, this quite naturally is the 'hissy bit'.  


When I record to digital tape, I record with a maximum peak value of -12 dB; in film soundtracks the value is more like -20dB. This leaves room for unexpected peaks to remain undistorted; they can be compressed later, but [http://en.wikipedia.org/wiki/Clipping_(audio) clipping] (going over the maximum level) can't be fixed easily.
When you give the frequency bands names like boom, honk and hiss it helps you to identify approximate frequencies by ear. I once walked out of a terribly mixed Darkness concert muttering that it was all below 400 with nothing above 1k. In layman's terms, it sounded like a party in the house next door but amplified to the threshold of pain.


In the days of vinyl records, the instruments were recorded to 24 track tape resulting in tape hiss. When it came time to mixdown, the tape signals were passed through processors and effects units and the mixing desk, picking up electronic noise along the way. The mix then went down to a stereo master tape picking up more tape hiss. The master was then taken to the pressing plant where it was passed through yet more processors in the mastering process picking up yet more electronic noise until it was cut to a master pressing disc, which was then used to press records, which ended up on your turntable. The fact that the resultant record sounded extremely clean and nice explains why a decent analogue studio costs a million pounds, but an equivalent [http://en.wikipedia.org/wiki/Pro_Tools Pro Tools] set up costs fourteen thousand.
Coincidentally, electric guitars occupy all the same frequencies as the human voice--perhaps that explains why they've always been so popular. When I used to mix records, my method was to EQ the guitars down a bit from 3000 – 5000 (the intelligibility part) and raise voices in the same band. Consequently the voice came through the wall of guitar sound with just a small adjustment.  


=== Decibels (dB) an explanation ===
=== EQ controls ===


Audio, whether floating through the air or as an electrical voltage, is measured in decibels, a decibel being one tenth of a bel. When processing sound it is useful to understand decibels, which are not like other measurements. For a start the decibel is [http://en.wikipedia.org/wiki/Logarithmic_scale logarithmic], because human hearing is logarithmic.
Small inexpensive mixers have EQ controls labelled high, mid and low and can be quite useful as long as they've been set at good intervals.  


If I give you a ten pound load to carry, you would rate it as some value of heaviness. Then if I gave you another ten pound load to carry, it would feel twice as heavy. Hearing is not like this. Ten watts of audio power playing in a room could be measured at some position as 90dB, twenty watts would register a level of 93dB at the same spot, not 180dB.
Audacity has lots of plugins available for it. The essential two to have would be:
* Parametric
* Graphic


That is the rule of thumb. For every 3dB up, double the actual power; for every 3dB down divide by two. This measuring in air is called SPL (sound pressure level) and is a consistent measurement: 90dB SPL is the same volume wherever it occurs.
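The rule of thumb is just the logarithm at work, and a small Python sketch shows the arithmetic. The function names are made up for the example; the only formula assumed is the standard one for power ratios, ten times the base-10 logarithm of the ratio.

```python
import math

def power_ratio_to_db(ratio):
    """Convert a ratio of two powers into decibels."""
    return 10 * math.log10(ratio)

def spl_after_power_change(spl_db, old_watts, new_watts):
    """New level at the same spot when only the amplifier power changes."""
    return spl_db + power_ratio_to_db(new_watts / old_watts)

print(round(power_ratio_to_db(2), 2))                    # doubling the power
print(round(spl_after_power_change(90, 10, 20), 1))      # ten watts up to twenty
print(round(spl_after_power_change(90, 10, 5), 1))       # ten watts down to five
```

Doubling the power adds about 3dB and halving it takes about 3dB away, whatever level you start from.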
==== Graphic EQ ====


* 85dB SPL is the recommended limit for long term industrial exposure without protection
Graphic EQ is the most familiar one to most people: the bands are labelled with frequencies and you can raise or lower the sliders to lift or cut in that band. Anyone can play a graphic EQ by ear. Just try a slider; if it doesn't do what you want put it back and try another. Totally unscary EQ.
* a jackhammer is 110dB SPL 
* a jumbo jet on take off is 120dB SPL
* Motorhead once played a gig above 120dB SPL. (Long live King Lemmy!)
* 140 dB SPL causes instant and permanent deafness.  


When talking about signal levels, decibels are used in different contexts. In an analogue channel of a mixer, dB is a relative measurement where 0dB represents the upper limit of signal strength for that channel but with some leeway above it.
==== Parametric EQ ====


In digital recordings the decibel is referred to as dB DFS [http://en.wikipedia.org/wiki/Digital_full_scale (Digital Full Scale)] where 0dB DFS is the '''absolute upper limit''' and all values are measured in minus values, there is no such thing as a positive dB in Digital Full Scale.
''This is a particularly useful tool for removing mains hum.''


When talking about “line level” electrical signals that pass audio between equipment, there are two reference points: 0dBu is defined as 775 millivolts and 0dBV as 1 volt. Perversely, pro equipment has its 0dB point at a line level of +4dBu and consumer equipment has a line level of -10dBV.
Parametric EQ is what you find on pro mixers. You still have the high, mid and low controls, but for each band you have three controls:
* Q
* Freq
* level


Confused? I am, but the important bit is the part about logarithmic hearing, just remember that part and you'll have a feel for what decibels measure.
'''Q''' is a great mystery, a black art known only to those adepts who have been baptised by Phil Spector. Not! Q translates as bandwidth. You can make a very narrow notch on the frequency spectrum by turning up the Q or a wider gentle slope by turning it down. If you find an EQ with the bandwidth labelled as Q then high Q means narrow band and low Q means wide band, it's really that simple. If you're not that confident about adjusting the bandwidth then just set it about half way and use the freq and level controls--you won't go far wrong.


== Digital recording: How to do it from scratch ==
The '''Freq''' control is labelled in hertz and simply allows you to centre that band on a particular frequency, so you can be quite specific about where you want to lift or cut.


To make decent recordings for LibriVox is easy. If you know that LibriVox exists you already have the most expensive bit of kit you'll need.
The '''level''' control should start off centred and is capable of adding or removing up to 15dB at the centre frequency, tapering off each side according to the Q setting. A good trick I learned is to turn up the level to +15dB and swing the frequency control across the band. As soon as you hit the problem frequency it will stick out like a sore thumb. You can then return the level to zero and continue down to make a gentle cut.
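For the curious, here is a rough Python sketch of the "nifty mathematics" behind one parametric band, showing how Q, freq and level shape the curve. It uses the widely published "Audio EQ Cookbook" peaking filter formulas, which is an assumption on my part: it is not necessarily what any particular Audacity plugin does, and the function names are invented for the example.

```python
import cmath
import math

def peaking_band(sample_rate, freq_hz, q, level_db):
    """Filter coefficients (b0, b1, b2, a0, a1, a2) for one parametric EQ band."""
    amp = 10 ** (level_db / 40)
    w0 = 2 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)          # higher Q gives a smaller alpha, a narrower band
    cos_w0 = math.cos(w0)
    return (1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp,
            1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp)

def gain_db(band, sample_rate, freq_hz):
    """How many dB the band lifts or cuts a tone at freq_hz."""
    b0, b1, b2, a0, a1, a2 = band
    z = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    response = (b0 + b1 * z + b2 * z * z) / (a0 + a1 * z + a2 * z * z)
    return 20 * math.log10(abs(response))

RATE = 44_100
wide = peaking_band(RATE, 400, q=1, level_db=15)      # low Q: gentle slope
narrow = peaking_band(RATE, 400, q=10, level_db=15)   # high Q: narrow spike

for freq in (200, 400, 800):
    print(freq, "Hz  wide:", round(gain_db(wide, RATE, freq), 1),
          "dB  narrow:", round(gain_db(narrow, RATE, freq), 1), "dB")
```

Both bands give the full +15dB at the centre frequency, but an octave either side the narrow one has almost no effect while the wide one is still lifting.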


All you need to add to that is:
I go to the trouble of explaining Parametric EQ because it is the magic bullet that can kill a mains hum and you'd never know it was there. You couldn't do it with a graphic EQ. More on that later.


* Recording software (You can't go wrong with [http://wiki.librivox.org/index.php/Audacity_1-2-3 Audacity].)
The greatest problem you can get on recordings, the most annoying, and the easiest to remove is a constant drone such as mains hum.
* A microphone (not expensive, if it sounds OK then it is)
* For some mics, a small mixer to act as a pre amp and gain control for the microphone. (Look up Behringer mixers, best value in the business)
* A pop shield


=== Microphones ===
A constant drone noise may not be a mains hum but all you have to do is identify the frequencies and then you can filter them out.


There are condenser mics and dynamic mics.
Highlight a bit of the file that is supposed to be silent and pull up the Spectrum Analyser. The only things here that will show up on the Analyser are random noise and the hum or buzz.


If you have a condenser microphone it needs power: either it has a battery compartment or it doesn't. If it doesn't, you need a mixer with a 48 volt phantom power button. This sends the mic power up the signal cable.
The Audacity Spectrum tool presents itself as a graph and it can be a useful tool if you want to identify a problem sound. You just need to look for an unusual spike in the display and that is your problem tone. Put the mouse cursor over it and the frequency will be identified for you.
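What the spectrum tool does under the bonnet can be sketched in plain Python. This is an illustration only: the "silent" bit of file is faked here as a 50 hertz hum plus a little random noise, the sample rate is kept very low so the slow, simple method below finishes quickly, and the function name is made up for the example.

```python
import cmath
import math
import random

def spectrum(samples, sample_rate):
    """Simple (slow) Fourier analysis: a list of (frequency in hertz, level) pairs."""
    n = len(samples)
    result = []
    for k in range(n // 2):
        total = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        result.append((k * sample_rate / n, abs(total) / n))
    return result

random.seed(1)       # fixed seed so the made-up noise is repeatable
RATE = 1000          # samples per second, deliberately low for this demo
COUNT = 500          # half a second of "silence"

# Pretend silence: a hum at 50 hertz buried in low-level random noise.
quiet_bit = [0.1 * math.sin(2 * math.pi * 50 * i / RATE) + random.uniform(-0.01, 0.01)
             for i in range(COUNT)]

# The unusual spike in the graph is simply the loudest frequency.
peak_freq, peak_level = max(spectrum(quiet_bit, RATE), key=lambda pair: pair[1])
print("problem tone at", peak_freq, "hertz")
```

In Audacity you do the same job by eye: the tallest spike in the silent section is the frequency to notch out.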


If you have a dynamic mic you don't need phantom power. A word of warning here, with most dynamic mics, if you switch on phantom power you will instantly need a new mic.
The nature of human hearing is that if you have to raise EQ it's better done with a wide Q but if you have to cut out a problem frequency with EQ it's better done with a narrow Q.  
 
There are three types of connection for mics:
 
* USB
* 6mm or 3.5mm Jack (unbalanced)
* XLR (balanced)
 
The USB type has a flat connector that plugs into the rectangular USB port on your computer. XLR is a substantial plug with three pins. 6mm Jack is the round silver plug that you would plug into an electric guitar; 3.5 mm Jack is the small round plug that fits into the sound card on your computer.
 
The reason for the different types is simply that 3 core XLR is a method of getting interference in the mic cable to cancel itself out. 2 core Jack is less expensive. The USB mic is particularly useful for LibriVox purposes, as it is Plug and Play and suffers less from background noise.
 
=== Mixer ===
 
'''Note:''' it is entirely possible that you won't need a mixer. I know of at least one laptop which could record a nice clean signal through the mic socket but in my experience the mic socket in my desktop machine was filthy with noise. USB mics also don't require a mixer.
 
Your mixer doesn't have to mix anything, the smallest Behringer has one mic channel and costs peanuts. What it does though is give you a beautiful clean pre-amplifier, gain controls and a set of EQ controls to set the right tonal balance. No microphone has a flat frequency response and if yours does not suit your tastes you can adjust this with the EQ.


To record to the built-in sound interface on a PC, you'll need an output cable which will have two 6mm mono jacks at one end and one 3.5mm stereo jack at the other. The small jack will go into the line socket on your sound card, the 6mm monos will go into the unbalanced outputs from your mixer. Your mixer may only have RCA connectors for unbalanced output, in which case get a cable to suit those. (RCA connectors are also known as phonos and are to be found on the back of CD players and on Playstations, one red, one white, the yellow on a Playstation is video.)
Parametric is my favourite tool for this job.


=== Pop shield ===
Mains supply in America is alternating current at 60 hertz; in Europe it's 50 hertz. If the mains induces interference in your recorded signal, then it's at mains frequency and sometimes at the second harmonic (twice the mains frequency) of 100 or 120 hertz, so if I have a sound with a mains hum the spectrum analyser tells me which side of the Atlantic it was recorded on.


This is not essential but if you're close to the mic it will prevent popping on Bs and Ps. Say “Buh” and “Puh” to the palm of your hand and feel the blast of air. That blast is not sound, it is wind, and wind plays havoc with microphones; that's why TV and film recordists have a big hairy dog on a stick. The hairy dog is hollow with the mic floating in the middle. Sound can get through the hairy dog's furry coat but wind can't.
Set the bandwidth to narrowest (high Q). On an Audacity plugin a value of 0.02 of an octave is a good start. Set the centre frequency to 60Hz U.S. or 50Hz U.K., pull down the level control to -25 and process it. If there is still a droning hum it might be that there is a second harmonic of the mains at 100 or 120 hertz. In that case you could use a multiband EQ and do the two notches in one go.


A pop shield is a disc of tenuous fabric suspended four inches in front of the mic, you can make one with a pop sock and a wire coat-hanger. Hey, maybe that's why they're called pop socks, they stop your mic popping.
If you then spectrum analyse the same bit of file you'll now see a narrow chasm in the graph where you notched out the offending frequency. The hole is so narrow and so far below the human voice that this filtering will simply clean your sound.
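To see why such a narrow cut leaves the voice alone, here is a rough Python sketch. It swaps in a dedicated notch filter (the "Audio EQ Cookbook" notch, which cuts the centre frequency completely) rather than a parametric band pulled down to -25, so treat it as an illustration of the idea and not as what the plugin does. The hum and the 300 hertz "voice" are test tones, the function names are invented, and the bandwidth to Q conversion is the usual analogue approximation.

```python
import math

def octaves_to_q(octaves):
    """Approximate Q for a bandwidth given in octaves."""
    return 1 / (2 * math.sinh(math.log(2) / 2 * octaves))

def notch_coefficients(sample_rate, freq_hz, q):
    """Normalised notch filter coefficients: feed-forward b, feedback a."""
    w0 = 2 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (1 / a0, -2 * math.cos(w0) / a0, 1 / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a

def apply_filter(samples, b, a):
    """Run the samples through the filter one at a time."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    """Average level of a stretch of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

RATE = 8000                      # low sample rate to keep the demo quick
COUNT = RATE * 5                 # five seconds
q = octaves_to_q(0.02)           # roughly 72: a very narrow notch
b, a = notch_coefficients(RATE, 50.0, q)

hum = [0.2 * math.sin(2 * math.pi * 50.0 * n / RATE) for n in range(COUNT)]
voice = [0.5 * math.sin(2 * math.pi * 300.0 * n / RATE) for n in range(COUNT)]

# Compare levels over the last second, once the filter has settled.
last = slice(COUNT - RATE, COUNT)
hum_before, hum_after = rms(hum[last]), rms(apply_filter(hum, b, a)[last])
voice_before, voice_after = rms(voice[last]), rms(apply_filter(voice, b, a)[last])
print("hum:", round(hum_before, 4), "->", round(hum_after, 6))
print("voice tone:", round(voice_before, 4), "->", round(voice_after, 4))
```

A filter this narrow takes a moment to settle, which is why the levels are compared over the final second of the test tones.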


=== Setting up the signal chain, when using a mixer ===
If you've still got hiss or random noise in the silent parts there are a couple of tools you can use to clean them out too. More on that in the noise removal section.


Put your mic cable in your mic channel. Connect your mixer output to the sound card and you're ready to set the signal.
== Post Production 2: Dynamics ==


There are at least three level controls on your mixer and there's a reason for them all to be there, it's called gain staging. Simply said, the level controls ensure that the signal level is optimum at each stage.
=== Dynamics ===


Bigger mixers have linear faders, smaller mixers have rotary faders: they do the same job. Faders are always labelled such that when set at 0dB there is room to move up as well as down which is where you get your freedom to adjust the mix if you're mixing. The 0dB setting is where the signal passes through unchanged in value.  
Dynamics is a word that means movement, and in our case it refers to the movement of the level meter, the way the sound bounces up and down. The dynamic range of a sound is basically the amount of difference between the loudest and quietest sounds. A good LibriVox reader knows how to use the level of the voice to create ambience--this is good control of dynamics. You don't miss a word of the tale.


Set your mic channel fader and master output fader to 0dB. Just where the mic connects there will be a small rotary control labelled trim, turn it right down. In this position, on the smallest inexpensive mixer you will be able to use the output meter to set the mic trim level. The output meter on my small Behringer is three LEDs, like a traffic light. Bigger mixers may have a bar graph or a needle meter.
At the other end of the scale, listening to someone with a monotonous unvarying voice is like trying to read a book with a bag on your head: you have to keep rewinding because it couldn't hold your ears.


At this point you get your sound source going into the microphone. e.g. Read a book.
The human voice is the most dynamic sound in the studio, with the possible exception of the Fender Rhodes Piano, a hideous beast that should never be seen in public without a hard limiter on its output socket. But levity aside, LibriVox is all about the voice and a voice will always benefit from some judicious dynamic control.


While slowly turning up the trim control, watch the meter: if it's labelled in decibels you want to be bouncing around somewhere near 0dB. But don't peak too far over 0dB.
There are a number of dynamics tools:


In the case of my traffic light meter you want some green light on most of the time with the occasional flash of yellow light; if the red light comes up the signal is too high.
* Compressor
* Limiter
* Gate
* Expander


Next look on the mic channel, there should be a red LED usually labelled 'Clip'. If this lights up, the channel is overloading, so turn down the trim a bit. You're nearly there.
Now let's whittle that down a bit: a '''limiter''' is an aggressive '''compressor'''; an '''expander''' is a soft '''gate'''. So now we're down to two: '''compressor/limiter''' and '''expander/gate'''. Now let's whittle it down even more: both the gate and the compressor are automatic volume controls--it's just that one works the other way up.


Now turn down the master output fader and start looking at the recording meter on Audacity which should be on Record and Pause.  
So basically all of your dynamics processing will be done by the same methodology. There is a great tool for Audacity called  '''MDA dynamics''' and it can be found via the [http://wiki.audacityteam.org/index.php?title=VST_Plug-ins#List_of_functional_Plug-ins Audacity website]. In one pass it covers compression, limiting, gating and expansion.


With your sound source still going, start turning up the output fader on the mixer. The reason for this is that 'line level' on a PC is lower than 'line out' on a mixer, so 0dB would be too loud. We have to come up at it from below. This is the '''gain staging''' process: setting the gain at each stage so it's the correct level for that stage.
==== Compression ====


If you find that the computer seems too sensitive, double click the speaker icon in the system tray. When the Windows mixer opens, click Options | Properties. Change the check box from playback to recording. The 'line in' fader may be all the way to the top, drag it down a bit to make the computer less sensitive to the mixer.
First let's talk about compression.  


Back to the mixer: with Audacity running and in Record and Pause you'll have a meter to watch. Bring up the mixer output until the highest peak is reaching -12 dB on the recorder meter.
To explore these ideas you really need to be able to see something in front of you similar to what I am imagining so fire up your audio editor, load up a voice recording and find '''Effect | Normalise''' and normalise the file. All this does is to raise it up to the top limit so that you see the same things I am describing.  
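Normalising is about the simplest processing there is. A minimal Python sketch, assuming the samples are plain numbers between -1 and +1 (the function name is made up and this is not Audacity's own code):

```python
def normalise(samples, peak=1.0):
    """Scale every sample by the same amount so the loudest one just reaches peak."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return list(samples)          # pure silence: nothing to raise
    gain = peak / loudest
    return [s * gain for s in samples]

quiet_recording = [0.1, -0.25, 0.2, 0.05]
print(normalise(quiet_recording))
```

Every sample is multiplied by the same figure, so the shape of the wave and the difference between loud and quiet parts are untouched; only the overall level moves.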


At this point, decide if you'll be monitoring the recording on headphones--probably not because all headphones leak sound and it will loop back through the mic. But you do need to do a bit whilst monitoring just to check that it actually sounds OK. While monitoring you can apply a bit of EQ on the mixer; for instance you can take a bit of low EQ off if you are coming out too boomy.  
Using an audio compressor is a listening art that only works really well after a few years of practice, but using a compressor on Audacity or similar can be done visually since you can see the wave in front of you.  


Monitoring must always be done from the end of the line so don't use the headphone socket on the mixer, use the 'line out' socket on the sound card.
Human speech is the most dynamic sound in the studio: uncompressed, it disappears in a mix since most of it is very quiet and the loud spikes prevent it from being turned up. As a solo performance, the wide dynamics lessen its listenability. I listen to LibriVox books in two environments: on an MP3 player and in my workshop. The MP3 player has a sensible max sound level which is reduced by uncompressed sound. A poor listening environment like my workshop benefits from compressed sound, allowing all elements of the sound to come through without needing the amp turned up.  


If all is OK then you are ready to make your recording. You have just set up a signal chain like a professional would. See? I told you it was easy.
Using a compressor is easy enough once one knows what it does. When you have your recording and look at it on a digital editor you can see that there are lots of spikes, generally at the start of words, with the remainder of the word tailing off at a much lower level. That spike sets the upper level and the rest of your word falls below the ambient noise in the workshop reducing the intelligibility of the story.  


The rest of it is a breeze, just click off the pause and yap.
What a compressor does is to monitor the level and according to the settings it has, as soon as the level rises above a preset point it turns it down. Then as soon as the level falls back, the compressor turns it up again.  


The result is that all of the spikes have been reduced in level, reducing the difference between the upper and lower levels. The sound level is more consistent; it can now be turned up across the board raising the average level.


There are five controls common to audio compressors, all of which appear on its computer equivalent.


1. Threshold<br />
2. Ratio<br />
3. Attack<br />
4. Release<br />
5. Makeup gain<br />


===== Threshold =====
 


''Threshold is expressed in minus decibels.'' (-dB).


This is the level above which the compressor will react. If you look at the display on your editor, you will see that digital audio is labelled as zero dB at the top and goes to negative numbers as your signal gets lower. Your recording uses the full height of the available headroom. You'll see that the upper parts are only thin spikes and the main body of the speech is lower down. You would set your threshold somewhere above the main body of the sound so that it will only react to the spikes.


Looking at the side of the wave window you will, according to the display settings, see the levels labelled as minus dB numbers and you can see that -12dB would be a reasonable level to set the threshold to. The compressor will only react above this, so everything below it would be unaffected.
 


By default Audacity labels the side bar between 0 and 1, but it can be changed to show in dB.  


===== Ratio =====


''Ratio is expressed as one number to another.'' (e.g. 2:1)


* 100 – 7000 hertz is where we speak and sing, our area of interest.  
Once you have your threshold set, the ratio controls just how much the excess level will be reduced, and it works thus. If a level is 12dB above the threshold the excess level will be reduced by the factor set as ratio. If ratio is 2:1 the 12dB excess will end up as a 6dB excess; if you make it 3:1 it will end up as a 4dB excess, and so on. If you wanted to limit the level to the threshold, a ratio above 10:1 would ensure that the 12dB excess is squashed almost completely (to about 1dB), enabling you to subsequently raise the level of the whole file by almost 12dB.


* 100 – 300 hertz is the bottom end of a voice, the bit that a big man has more of than a petite woman. This is the area you could call 'boom'.
In Audacity you won't find these things labelled in dB: 0 to 1 seems to be the standard way of quantifying them, but it will still work with a little trial and error.


* 300 – 900 hertz is the main body of a voice, the part that is pronounced in operatic singing. This part on its own sounds 'honky'.
===== Attack =====


* 3000 – 5000 hertz is the fine detail in speech that makes it intelligible. This is the area that disappears on a cheap crappy tone control.
''Attack is expressed in milliseconds'' (ms)


* 6000 – 7000 hertz is the letter S, and the lush swoosh of a splash cymbal, this quite naturally is the 'hissy bit'.  
Attack determines how fast the compressor will react. Musically there is a wide range of settings according to the instrument, but in the world of spoken word recordings, especially digital ones, I always set this control to zero so that nothing gets by before the compressor does its work.


When you give the frequency bands names like boom, honk and hiss it helps you to identify approximate frequencies by ear. I once walked out of a terribly mixed Darkness concert muttering that it was all below 400 with nothing above 1k. In layman's terms, it sounded like a party in the house next door but amplified to the threshold of pain.
===== Release =====


Coincidentally, electric guitars occupy all the same frequencies as the human voice--perhaps that explains why they've always been so popular. When I used to mix records, my method was to EQ the guitars down a bit from 3000 – 5000 (the intelligibility part) and raise voices in the same band. Consequently the voice came through the wall of guitar sound with just a small adjustment.
''Release is expressed in milliseconds'' (ms)


=== EQ controls ===
This one determines how long it takes the compressor to restore normal level control after the sound drops back below the threshold. In the digital domain this one has to have a number, otherwise the compressor would clip the waves, creating unpleasant distortion. 30 milliseconds is a setting that you can't go wrong with.


Small inexpensive mixers have EQ controls labelled high, mid and low and can be quite useful as long as they've been set at good intervals.
===== Makeup Gain =====


Sound Forge has 3 types of EQs:
''Makeup gain is expressed in decibels'' (dB)


* Graphic
On an ''analogue'' compressor this allows the output volume to be turned up, since compressing the spikes has given us some free headroom. In the ''digital'' domain it's best to leave this set at zero since it's better to have some headroom to play with until you've finished processing your file.  
* Parametric
* Paragraphic.  


==== Graphic EQ ====
So for anyone who's interested, please have a go. Take a raw file and try compressing it different ways. When you've finished with it, normalise it and save it as a new file, play the old one and the new one and compare the difference.


Graphic EQ is the most familiar one to most people: the bands are labelled with frequencies and you can raise or lower the sliders to lift or cut in that band. Anyone can play a graphic EQ by ear. Just try a slider; if it doesn't do what you want put it back and try another. Totally unscary EQ.
If you've learned something useful give me some feedback (PM chaoscollective on the [http://librivox.org/forum/index.php LibriVox forum]). If my efforts have made a difference, I'll write instructions for removing hums, hiss, random noise and equalisation.  


==== Parametric EQ ====  
==== The Limiter ====


Parametric EQ is what you find on pro mixers. You still have the high mid and low controls, but for each band you have three controls
The '''limiter''' does not need explanation: it's a cheap compressor with a ratio higher than 10:1. It's a wicked machine, the work of Beelzebub and all his little minions such as tacky pop music producers; it is designed only to make their records seem louder than last week's number one. Give it no quarter. Its best use is as a doorstop to hold open the studio door on nice days.
* Freq
* Q
* level


The '''Freq''' control is labelled in hertz and simply allows you to center that band on a particular frequency, so you can be quite specific about where you want to lift or cut.
==== The Gate ==== 


'''Q''' is a great mystery, a black art known only to those adepts who have been baptised by Phil Spector. Not! Q translates as bandwidth. You can make a very narrow notch on the frequency spectrum by turning up the Q or a wider gentle slope by turning it down. If you find an EQ with the bandwidth labelled as Q then high Q means narrow band and low Q means wide band, it's really that simple. If you're not that confident about adjusting the bandwidth then just set it about half way and use the freq and level controls--you won't go far wrong.
''The original noise reduction tool.''


The '''level''' control should start off centred and is capable of adding or removing up to 15dB at the centre frequency, tapering off each side according to the Q setting. A good trick I learned is to turn up the level to +15dB and swing the frequency control across the band, as soon as you hit the problem frequency it will stick out like a sore thumb. You can then return the level to zero and continue down to make a gentle cut.
A gate is another one of those dark devices available only to us adepts. No not really, it's just the opposite of a compressor, with one extra control. It's called an expander and that extra control is called range.  


I go to the trouble of explaining Parametric EQ because it is the magic bullet that can kill a mains hum and you'd never know it was there. You couldn't do it with a graphic EQ. More on that later.  
The gate is a bit of a mystery to most people simply because there is only one similar device in the non-studio world, and it's nice to have a similar device to think about since it gives a spark of understanding right away.  


The greatest problem you can get on recordings, the most annoying, and the easiest to remove is a constant drone such as mains hum.
If you have never used a two-way radio, you will not know what I mean. The only equivalent is the squelch control on a two-way radio. CB, Ham radio, and site walkie-talkies all have it. When you use a radio like this, if your opposite number is not pressing his mic key and sending you a signal you hear a hash of white noise--static in old money.  


A constant drone noise may not be a mains hum but all you have to do is identify the frequencies and then you can filter them out.
The squelch control is a gate: you turn up the threshold and the radio goes silent because the static is not strong enough to trigger the squelch to open. As soon as your pal on the other radio keys his mic, a radio wave is present in your receiver, which is much stronger than the static and so opens the squelch/gate. There you go, it's now much more of a pleasure to use that radio without all that nasty noise when no one is talking. Noise removal for two way radio, it works just the same for audio recordings. Read on...


Highlight a bit of the file that is supposed to be silent and pull up the Spectrum Analyser. The only things here that will show up on the Analyser are random noise and the hum or buzz.
The gate is a track cleaning device that predates digital recording. When all of the instruments were recorded to tape and had to be passed through the mixer in real time, the hardware gate was essential.


The Sound Forge spectrum analyser presents itself as a graph and it can be a useful tool if you want to identify a problem sound. There just have to be plugins for all the other programmes too--seek and you will find.
In a studio, it is common to use about eight mics on the drums. A stereo pair and individual mics on single drums, especially the snare. With so many mics, the drum sound can get a bit wild and woolly and difficult to mix, so all the single drum mics are gated. What the gate does is the inverse of what the compressor does. It monitors the level at the threshold, when the sound is above the threshold the gate is open, the sound is unaffected. When the sound falls below the threshold the gate closes shutting off the sound channel. On a drum mix this means that there are not always eight mics contributing to the wild and woolly sound, the channels only open when the drum is hit.  


The base of the graph is Frequency and the side bar is Level in dB. The line across the graph shows the level of various frequencies. The line is generally pretty straight, slightly canted to the right and tails off at the top frequencies. If there is an obvious hump or spike in the line you have a problem that is easy to find. Zoom in on the spike. You can identify the center frequency in the hump or spike and remember it while you dig up the parametric EQ.  
Another use of a gate: imagine the recording of ''Creep'' by Radiohead (check it out on YouTube): the verse is played on a bass and one gentle clean guitar. The second guitarist is waiting for the chorus with a guitar sound that can kill at ten paces. While he waits his amp will be humming; all guitar amps do. While he's not playing, the gate is shut and his track is beautifully silent. When he does start to play, the gate opens and the amp hum is buried under the guitar sound. When he is finished the gate discreetly closes and the hum is gone again.


The nature of human hearing is that if you have to raise EQ it's better done with a wide Q but if you have to cut out a problem frequency with EQ it's better done with a narrow Q.  
All modern gates including their VST counterparts have a range control: instead of closing the gate you can set the range control to reduce the level rather than shut the channel. When it's used like this it's called an expander, which is the opposite of a compressor, it opens up the dynamic range rather than reduces it. A compressor reduces the loudest sounds; a gate reduces the quietest sounds.  


==== Paragraphic EQ ====
So... what you need to do is:


Paragraphic EQ is my favourite tool for this job, it's a four band parametric EQ with a little graph which shows you what you are about to do to a sound.  
1. Set your compressor to do its work to the top end of the dynamic range and when the make up gain has turned up the background and the breathing, you can use the gate to treat the bottom of the dynamic range. Start by setting the '''range control''' to infinite or as high as you can, so that the gate shuts--this way you can hear when the threshold is crossed.  


There are presets: one of the presets is 60hz mains hum removal. All the presets do is set the controls.  
2. Adjust the '''threshold''' so that the words come through and the breathing doesn't, now you've got the threshold between the words and the breathing.  


Mains supply in America is alternating current at 60 hertz; in Europe it's 50 hertz. If the mains induces interference in your recorded signal, then it's at mains frequency and sometimes also at the second harmonic (100 or 120 hertz), so if I have a sound with a mains hum the spectrum analyser tells me which side of the Atlantic it was recorded on.
3. Bring the '''range control''' up so that instead of closing, the gate reduces, you'll find that between the words the background is now lower.  


If you've got the Sound Forge paragraphic EQ showing, you can pull down the preset for mains hum removal and you'll see what I describe.  
4. The jump between levels will probably be quite abrupt, which brings us to the last two controls: attack and release. The '''attack''' controls how long the gate takes to open, so will be at the start of a word; the '''release''' controls the closing or reducing time so will occur at the end of a word.  


Other parametric EQs will do the job just as well though.
By giving these settings long enough values, the gate will open and close smoothly without abrupt changes in level which might be noticeable.  


Set the bandwidth to narrowest (high Q). On Sound Forge, this will narrow down to one note. Set the centre frequency to 60 Hz (U.S.) or 50 Hz (U.K.) and pull down the level control to -25. On the Sound Forge EQ, all of the four bands can have the same settings so you can stack four filters to give 100 decibels of hum removal. Your little graph will have a 100 decibel hole one note wide where the mains hum should be. Now when you consider that there are only 96 decibels to the bottom of a CD, that's pretty good filtering. If there is still a droning hum it might be that there is a second harmonic of the mains at 100 or 120 hertz. In that case I usually use two filters for each frequency: 50 dB is as good as dead.
On a snare drum you would have a very short attack to catch the impact and a fairly long release so that the gate doesn't slam shut and cut off the tail. On a voice you can have a longer attack and still need a fairly long release so that you don't cut off the word ends. But again, it's down to the nature of the speaker, you have to experiment until you get the right sound. But just knowing what the functions are will enable you to achieve what you want.  


If you then spectrum analyse the same bit of file you'll now see a narrow chasm in the graph with a tiny mains hum spike in the bottom of it at about -80dB. The hole is so narrow and so far below the human voice that this filtering will simply clean your sound.
It's not just breathing you can bring down with this--if you can hear the computer fan or passing traffic between the words, soft gating will take them down.


If you've still got hiss or random noise in the silent parts there are a couple of tools you can use to clean them out too. More on that in the Noise Removal section.
I once was mixing a Bob Dylan cover song and we noticed a strange sound. The backing vocalist was asthmatic and during the verses whilst wearing headphones with a very loud monitor mix in them she wasn't aware that the sensitive vocal mic was recording her wheezing breath. I simply plugged a gate into the channel and set the threshold, no more wheezing in the stereo mix. ''“Cracking Contraption, Gromit!”''

Latest revision as of 13:38, 22 April 2010

WORK IN PROGRESS - RuthieG

This is the second part of a series of short articles written by a sound engineer with many years' experience. The idea is to explain in plain language how to make a quality sound file.

Post Production 1: Equalisation (EQ) and mains hum removal

Post production tools fall into the categories of

  • FX
  • EQ
  • Dynamics
  • Special tools

What are they all about?

FX stands for effects: flange, phaser, fuzz box etc. We don't really need that in LibriVox so let's chuck it out and forget it. Job done. (If you really insist on reading The Pit and The Pendulum like John Laurie, then just chuck loads of reverb on, you'll wing it.)

EQ stands for equalisation: tone, high middle and low, bass and treble--you know what I mean. Pretty easy, most people are comfortable with that concept and need read no further. But there are one or two tricks that can be done with digital EQ that are really quite effective, more later.

Dynamics is the one that most people don't understand, simply because there has never been a layman's equivalent, but honestly it's no more difficult than working the bass and treble on your old Hi Fi. Dynamics is about manipulating the level of sounds, whereas EQ is about manipulating the tone colour. I can use a compressor with my eyes shut but I have to think about EQ, so dynamics must be easier.

The one special tool I use is Sound Forge noise reduction--there will be a chapter on that later in this series.

Generally I would EQ before processing the dynamics; this is because EQ affects the loudness of frequency bands so you always have to do some dynamics after EQ even if you've already done it first.

So let's deal with EQ first.

When I was a kid I never touched the cheap sub standard tone controls on record players because of a sincere belief I held which years later was echoed almost word for word by a big record producer. He said, “Tone controls should be absent from record players. After I've spent 12 hours slaving over a million pound mixer you don't have the right to go messing with my mix with a cheap crappy tone control.”

The EQ available in recording software is infinitely better than the old record players because it doesn't require million quid desks, just a bit of nifty mathematics.

EQ is not a black art if you go at it purposefully.

If a sound is low in one band or high in another, EQ can correct it. If it sounds good it probably is--if it ain't broke don't fix it.

Frequency bands

All sound is made up of various frequencies. Perfect human hearing ranges from 20 cycles per second to 20,000 cycles per second; as you get older your upper limit drops to 12-14k. Cycles per second in sound is labelled hertz, named after Heinrich Hertz, the Victorian German physicist who pioneered multinational car rental (now would I lie to you?).

If you play a 20 hertz tone through a speaker that can play that low you will feel it more than you will hear it.

  • 20 – 100 hertz is where the oomph of a kick drum lies in dance music.
  • 100 – 7000 hertz is where we speak and sing, our area of interest.
  • 100 – 300 hertz is the bottom end of a voice, the bit that a big man has more of than a petite woman. This is the area you could call 'boom'.
  • 300 – 900 hertz is the main body of a voice, the part that is pronounced in operatic singing. This part on its own sounds 'honky'.
  • 3000 – 5000 hertz is the fine detail in speech that makes it intelligible. This is the area that disappears on a cheap crappy tone control.
  • 6000 – 7000 hertz is the letter S, and the lush swoosh of a splash cymbal, this quite naturally is the 'hissy bit'.
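
If you like to see such things written down, the band names above can be kept as a small lookup table. This is only a sketch in Python: the band edges are the ones from the list, while the names `VOICE_BANDS` and `describe` are made up for the illustration and are not part of Audacity or any other editor.

```python
# Band edges in hertz, copied from the list above; the labels are informal.
VOICE_BANDS = [
    (20, 100, "kick-drum oomph"),
    (100, 300, "boom"),
    (300, 900, "honk"),
    (3000, 5000, "intelligibility"),
    (6000, 7000, "hiss"),
]


def describe(freq_hz):
    """Return the nickname of the band containing freq_hz, or None."""
    for low, high, name in VOICE_BANDS:
        if low <= freq_hz < high:
            return name
    return None  # the list above leaves some ranges unnamed


print(describe(200))   # boom
print(describe(4000))  # intelligibility
print(describe(1500))  # None
```

So a spike at 200 hertz in a spectrum display is a 'boom' problem, and one at 6500 hertz is a 'hiss' problem.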

When you give the frequency bands names like boom, honk and hiss it helps you to identify approximate frequencies by ear. I once walked out of a terribly mixed Darkness concert muttering that it was all below 400 with nothing above 1k. In layman's terms, it sounded like a party in the house next door but amplified to the threshold of pain.

Coincidentally, electric guitars occupy all the same frequencies as the human voice--perhaps that explains why they've always been so popular. When I used to mix records, my method was to EQ the guitars down a bit from 3000 – 5000 (the intelligibility part) and raise voices in the same band. Consequently the voice came through the wall of guitar sound with just a small adjustment.

EQ controls

Small inexpensive mixers have EQ controls labelled high, mid and low and can be quite useful as long as they've been set at good intervals.

Audacity has lots of plugins available for it. The essential two to have would be:

  • Parametric
  • Graphic

Graphic EQ

Graphic EQ is the most familiar one to most people: the bands are labelled with frequencies and you can raise or lower the sliders to lift or cut in that band. Anyone can play a graphic EQ by ear. Just try a slider; if it doesn't do what you want put it back and try another. Totally unscary EQ.

Parametric EQ

This is a particularly useful tool for removing mains hum.

Parametric EQ is what you find on pro mixers. You still have the high, mid and low controls, but for each band you have three controls:

  • Q
  • Freq
  • level

Q is a great mystery, a black art known only to those adepts who have been baptised by Phil Spector. Not! Q translates as bandwidth. You can make a very narrow notch on the frequency spectrum by turning up the Q or a wider gentle slope by turning it down. If you find an EQ with the bandwidth labelled as Q then high Q means narrow band and low Q means wide band, it's really that simple. If you're not that confident about adjusting the bandwidth then just set it about half way and use the freq and level controls--you won't go far wrong.
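
For the curious, the link between Q and bandwidth can be put into numbers. The sketch below assumes the usual engineering definition (Q is the centre frequency divided by the bandwidth in hertz) and the standard conversion from a bandwidth given in octaves; individual plugins may label or scale the control differently, and the function names are invented for the example.

```python
import math


def q_from_octaves(n_octaves):
    """Convert a bandwidth in octaves to Q (standard conversion formula)."""
    ratio = 2.0 ** n_octaves
    return math.sqrt(ratio) / (ratio - 1.0)


def bandwidth_hz(centre_hz, q):
    """Width of the band in hertz for a given centre frequency and Q."""
    return centre_hz / q


print(round(q_from_octaves(1.0), 3))   # 1.414: one octave is a wide, gentle band
print(q_from_octaves(0.02))            # roughly 72: a very narrow notch
print(bandwidth_hz(1000.0, 10.0))      # 100.0
```

High Q, narrow band; low Q, wide band, exactly as described above.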

The Freq control is labelled in hertz and simply allows you to center that band on a particular frequency, so you can be quite specific about where you want to lift or cut.

The level control should start off centred and is capable of adding or removing up to 15dB at the centre frequency, tapering off each side according to the Q setting. A good trick I learned is to turn up the level to +15dB and swing the frequency control across the band. As soon as you hit the problem frequency it will stick out like a sore thumb. You can then return the level to zero and continue down to make a gentle cut.

I go to the trouble of explaining Parametric EQ because it is the magic bullet that can kill a mains hum and you'd never know it was there. You couldn't do it with a graphic EQ. More on that later.

The greatest problem you can get on recordings, the most annoying, and the easiest to remove is a constant drone such as mains hum.

A constant drone noise may not be a mains hum but all you have to do is identify the frequencies and then you can filter them out.

Highlight a bit of the file that is supposed to be silent and pull up the Spectrum Analyser. The only things here that will show up on the Analyser are random noise and the hum or buzz.

The Audacity Spectrum tool presents itself as a graph and it can be a useful tool if you want to identify a problem sound. You just need to look for an unusual spike in the display and that is your problem tone. Put the mouse cursor over it and the frequency will be identified for you.

The nature of human hearing is that if you have to raise EQ it's better done with a wide Q but if you have to cut out a problem frequency with EQ it's better done with a narrow Q.

Parametric is my favourite tool for this job.

Mains supply in America is alternating current at 60 hertz; in Europe it's 50 hertz. If the mains induces interference in your recorded signal, then it's at mains frequency and sometimes also at the second harmonic (100 or 120 hertz), so if I have a sound with a mains hum the spectrum analyser tells me which side of the Atlantic it was recorded on.
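
The same detective work can be done without a graph. The sketch below is an illustration only: it uses the Goertzel algorithm (a cheap way of measuring a single frequency) to compare the energy at 50 and 60 hertz in a stretch of "silence". The function names and the synthetic test signal are assumptions made for the example; a real recording would be read from a file.

```python
import math


def goertzel_power(samples, sample_rate, freq_hz):
    """Relative power of one frequency in samples (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def guess_mains(samples, sample_rate):
    """Return 50 or 60, whichever mains frequency is stronger."""
    p50 = goertzel_power(samples, sample_rate, 50.0)
    p60 = goertzel_power(samples, sample_rate, 60.0)
    return 50 if p50 > p60 else 60


# One second of synthetic "silence" at 8 kHz containing only a faint 60 Hz hum.
rate = 8000
hum = [0.01 * math.sin(2 * math.pi * 60 * n / rate) for n in range(rate)]
print(guess_mains(hum, rate))  # 60
```

This only tells you which of the two is louder; it does not tell you whether there is a hum worth removing at all, so your ears and the spectrum display still have the final say.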

Set the bandwidth to narrowest (high Q). On an Audacity plugin a value of 0.02 of an octave is a good start. Set the centre frequency to 60 Hz (U.S.) or 50 Hz (U.K.), pull down the level control to -25 and process it. If there is still a droning hum it might be that there is a second harmonic of the mains at 100 or 120 hertz. In that case you could use a multiband EQ and do the two notches in one go.
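
To show what the "nifty mathematics" behind such a notch looks like, here is a minimal sketch of a digital notch filter. It is not what any particular plugin does: it uses the widely published "Audio EQ Cookbook" notch formulas, which cut the centre frequency completely rather than by 25 dB, and the function names, the Q of 30 and the test tones are all assumptions chosen for the illustration.

```python
import math


def notch_coefficients(centre_hz, q, sample_rate):
    """Biquad notch coefficients (Audio EQ Cookbook form), normalised by a0."""
    w0 = 2.0 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(samples, b, a):
    """Run the samples through the filter, one sample at a time."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


rate = 8000
b, a = notch_coefficients(60.0, 30.0, rate)
hum = [math.sin(2 * math.pi * 60 * n / rate) for n in range(2 * rate)]
voice = [math.sin(2 * math.pi * 200 * n / rate) for n in range(2 * rate)]
# Compare levels over the last half second, after the filter has settled.
print("60 Hz after notch:", rms(apply_biquad(hum, b, a)[-4000:]))
print("200 Hz after notch:", rms(apply_biquad(voice, b, a)[-4000:]))
```

The 60 hertz tone all but vanishes while the 200 hertz tone, down in the 'boom' of a voice, passes through practically untouched, which is why a narrow cut is so hard to hear.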

If you then spectrum analyse the same bit of file you'll now see a narrow chasm in the graph where you notched out the offending frequency. The hole is so narrow and so far below the human voice that this filtering will simply clean your sound.

If you've still got hiss or random noise in the silent parts there are a couple of tools you can use to clean them out too. More on that in the noise removal section.

Post Production 2: Dynamics

Dynamics

Dynamics is a word that means movement, and in our case it refers to the movement of the level meter, the way the sound bounces up and down. The dynamic range of a sound is basically the amount of difference between the loudest and quietest sounds. A good LibriVox reader knows how to use the level of the voice to create ambience--this is good control of dynamics. You don't miss a word of the tale.

At the other end of the scale, listening to someone with a monotonous unvarying voice is like trying to read a book with a bag on your head: you have to keep rewinding because it couldn't hold your ears.

The human voice is the most dynamic sound in the studio, with the possible exception of the Fender Rhodes Piano, a hideous beast that should never be seen in public without a hard limiter on its output socket. But levity aside, LibriVox is all about the voice and a voice will always benefit from some judicious dynamic control.

There are a number of dynamics tools:

  • Compressor
  • Limiter
  • Gate
  • Expander

Now let's whittle that down a bit: a limiter is an aggressive compressor; an expander is a soft gate. So now we're down to two: compressor/limiter and expander/gate. Now let's whittle it down even more: both the gate and the compressor are automatic volume controls--it's just that one works the other way up.

So basically all of your dynamics processing will be done by the same methodology. There is a great tool for Audacity called MDA dynamics and it can be found via the Audacity website. In one pass it covers compression, limiting, gating and expansion.

Compression

First let's talk about compression.

To explore these ideas you really need to be able to see something in front of you similar to what I am imagining so fire up your audio editor, load up a voice recording and find Effect | Normalise and normalise the file. All this does is to raise it up to the top limit so that you see the same things I am describing.
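
In case "normalise" sounds mysterious, here is the idea as a few lines of Python. It is a sketch of plain peak normalisation only, assuming samples stored as numbers between -1 and 1; the real effect in an editor may do more (removing DC offset, for instance), and the function name is invented for the example.

```python
def normalise(samples, peak=1.0):
    """Scale the samples so the loudest one just reaches the given peak."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]


print(normalise([0.1, -0.25, 0.2]))  # [0.4, -1.0, 0.8]
```

Every sample is multiplied by the same number, so the shape of the wave and its dynamic range are unchanged; only the overall level moves.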

Using an audio compressor is a listening art that only works really well after a few years of practice, but using a compressor on Audacity or similar can be done visually since you can see the wave in front of you.

Human speech is the most dynamic sound in the studio: uncompressed, it disappears in a mix since most of it is very quiet and the loud spikes prevent it from being turned up. As a solo performance, the wide dynamics lessen its listenability. I listen to LibriVox books in two environments: on an MP3 player and in my workshop. The MP3 player has a sensible max sound level which is reduced by uncompressed sound. A poor listening environment like my workshop benefits from compressed sound, allowing all elements of the sound to come through without needing the amp turned up.

Using a compressor is easy enough once one knows what it does. When you have your recording and look at it on a digital editor you can see that there are lots of spikes, generally at the start of words, with the remainder of the word tailing off at a much lower level. That spike sets the upper level and the rest of your word falls below the ambient noise in the workshop reducing the intelligibility of the story.

What a compressor does is to monitor the level and according to the settings it has, as soon as the level rises above a preset point it turns it down. Then as soon as the level falls back, the compressor turns it up again.

The result is that all of the spikes have been reduced in level, reducing the difference between the upper and lower levels. The sound level is more consistent; it can now be turned up across the board raising the average level.

There are five controls common to audio compressors, all of which appear on its computer equivalent.

1. Threshold
2. Ratio
3. Attack
4. Release
5. Makeup gain

Threshold

Threshold is expressed in minus decibels. (-dB).

This is the level above which the compressor will react. If you look at the display on your editor, you will see that digital audio is labelled as zero dB at the top and goes to negative numbers as your signal gets lower. Your recording uses the full height of the available headroom. You'll see that the upper parts are only thin spikes and the main body of the speech is lower down. You would set your threshold somewhere above the main body of the sound so that it will only react to the spikes.

Looking at the side of the wave window you will, according to the display settings, see the levels labelled as minus dB numbers, and you can see that -12dB would be a reasonable level to set the threshold to. The compressor will only react above this, so everything below it is unaffected.

By default Audacity labels the side bar between 0 and 1, but it can be changed to show in dB.
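
If your editor only shows the 0 to 1 scale, the conversion to and from dB is a one-liner each way. This sketch assumes the usual convention that 1.0 on the linear scale is 0dB (full scale); the function names are made up for the example.

```python
import math


def amplitude_to_db(amplitude):
    """Convert a linear level (1.0 = full scale) to decibels."""
    return 20.0 * math.log10(amplitude)


def db_to_amplitude(db):
    """Convert decibels back to a linear level."""
    return 10.0 ** (db / 20.0)


print(round(db_to_amplitude(-12.0), 3))  # 0.251
print(round(amplitude_to_db(0.5), 2))    # -6.02
```

So a threshold of -12dB sits at about a quarter of the way up the 0 to 1 scale, and halfway up is only about 6dB down.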

Ratio

Ratio is expressed as one number to another. (e.g. 2:1)

Once you have your threshold set, the ratio controls just how much the excess level will be reduced, and it works thus. If a level is 12dB above the threshold the excess level will be reduced by the factor set as ratio. If ratio is 2:1 the 12dB excess will end up as a 6dB excess; if you make it 3:1 it will end up as a 4dB excess, and so on. If you wanted to limit the level to the threshold, a ratio above 10:1 would ensure that the 12dB excess is squashed almost completely (to about 1dB), enabling you to subsequently raise the level of the whole file by almost 12dB.
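
The ratio arithmetic can be checked with a few lines of Python. This is just the sum described above written out, assuming levels in dB and a simple "hard knee" compressor; the function name is invented for the example.

```python
def compress_level_db(level_db, threshold_db, ratio):
    """Output level of a hard-knee compressor for a given input level."""
    if level_db <= threshold_db:
        return level_db  # below the threshold nothing happens
    return threshold_db + (level_db - threshold_db) / ratio


# A spike at 0dB with the threshold at -12dB is a 12dB excess.
for ratio in (2, 3, 10):
    out = compress_level_db(0.0, -12.0, ratio)
    print(ratio, round(out + 12.0, 1))  # remaining excess: 6.0, 4.0, 1.2
```

At 10:1 only about 1.2dB of the original 12dB excess remains, which is why ratios of that order are treated as limiting.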

In Audacity you won't find these things labelled in dB: 0 to 1 seems to be the standard way of quantifying them, but it will still work with a little trial and error.

Attack

Attack is expressed in milliseconds (ms)

Attack determines how fast the compressor will react. Musically there is a wide range of settings according to the instrument, but in the world of spoken word recordings, especially digital ones, I always set this control to zero so that nothing gets by before the compressor does its work.

Release

Release is expressed in milliseconds (ms)

This one determines how long it takes the compressor to restore normal level control after the sound drops back below the threshold. In the digital domain this one has to have a number, otherwise the compressor would clip the waves, creating unpleasant distortion. 30 milliseconds is a setting that you can't go wrong with.
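
Attack and release are easier to picture once you see how a compressor might track the level. The sketch below is one common textbook design, not the workings of any particular plugin: a level follower that jumps up at the attack speed and falls back at the release speed. It assumes the times are "time constants" (the time to cover about 63% of the distance), and the function names are invented for the example.

```python
import math


def smoothing_coeff(time_ms, sample_rate):
    """One-pole smoothing coefficient; a time of zero means instant."""
    if time_ms <= 0.0:
        return 0.0
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))


def follow_envelope(samples, sample_rate, attack_ms=0.0, release_ms=30.0):
    """Track the level: rise at the attack speed, fall at the release speed."""
    attack = smoothing_coeff(attack_ms, sample_rate)
    release = smoothing_coeff(release_ms, sample_rate)
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


# A single spike followed by silence, at 1000 samples per second.
spike = [1.0] + [0.0] * 99
env = follow_envelope(spike, 1000)
print(env[0])   # 1.0: zero attack catches the spike immediately
print(env[30])  # about 0.37: the release is still letting go 30 ms later
```

With a release of zero the follower would snap back within a single sample, and the gain would jump about inside each wave cycle, which is the distortion described above.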

Makeup Gain

Makeup gain is expressed in decibels (dB)

On an analogue compressor this allows the output volume to be turned up, since compressing the spikes has given us some free headroom. In the digital domain it's best to leave this set at zero since it's better to have some headroom to play with until you've finished processing your file.

So for anyone who's interested, please have a go. Take a raw file and try compressing it different ways. When you've finished with it, normalise it and save it as a new file, play the old one and the new one and compare the difference.

If you've learned something useful give me some feedback (PM chaoscollective on the LibriVox forum). If my efforts have made a difference, I'll write instructions for removing hums, hiss, random noise and equalisation.

The Limiter

The limiter does not need explanation: it's a cheap compressor with a ratio higher than 10:1. It's a wicked machine, the work of Beelzebub and all his little minions such as tacky pop music producers; it is designed only to make their records seem louder than last week's number one. Give it no quarter. Its best use is as a doorstop to hold open the studio door on nice days.

The Gate

The original noise reduction tool.

A gate is another one of those dark devices available only to us adepts. No not really, it's just the opposite of a compressor, with one extra control. It's called an expander and that extra control is called range.

The gate is a bit of a mystery to most people simply because there is only one similar device in the non-studio world, and it's nice to have a similar device to think about since it gives a spark of understanding right away.

If you have never used a two-way radio, you will not know what I mean. The only equivalent is the squelch control on a two-way radio. CB, Ham radio, and site walkie-talkies all have it. When you use a radio like this, if your opposite number is not pressing his mic key and sending you a signal you hear a hash of white noise--static in old money.

The squelch control is a gate: you turn up the threshold and the radio goes silent because the static is not strong enough to trigger the squelch to open. As soon as your pal on the other radio keys his mic, a radio wave is present in your receiver, which is much stronger than the static and so opens the squelch/gate. There you go, it's now much more of a pleasure to use that radio without all that nasty noise when no one is talking. Noise removal for two way radio, it works just the same for audio recordings. Read on...

The gate is a track cleaning device that predates digital recording. When all of the instruments were recorded to tape and had to be passed through the mixer in real time, the hardware gate was essential.

In a studio, it is common to use about eight mics on the drums: a stereo pair, plus individual mics on single drums, especially the snare. With so many mics, the drum sound can get a bit wild and woolly and difficult to mix, so all the single drum mics are gated. What the gate does is the inverse of what the compressor does. It compares the level with the threshold: when the sound is above the threshold the gate is open and the sound is unaffected; when the sound falls below the threshold the gate closes, shutting off the sound channel. On a drum mix this means that there are not always eight mics contributing to the wild and woolly sound; the channels only open when the drum is hit.
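For anyone who likes to see the idea as code, here is a rough sketch of a gate that is simply open or shut. It looks at the audio in short blocks, passes a block untouched if its peak reaches the threshold and mutes it otherwise. The function name, the block size and the sample values are all made up for illustration; a real gate follows the level continuously rather than in fixed blocks.

```python
def gate_blocks(samples, threshold, block_size):
    """Mute every block whose peak level stays below the threshold."""
    gated = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        peak = max(abs(s) for s in block)
        if peak >= threshold:
            gated.extend(block)                # gate open: sound unaffected
        else:
            gated.extend([0.0] * len(block))   # gate shut: channel silent
    return gated


# Hypothetical drum mic: quiet spill, then a hit, then quiet spill again.
spill = [0.01, -0.01, 0.01, -0.01]
hit = [0.8, -0.6, 0.4, -0.2]
signal = spill + hit + spill

gated = gate_blocks(signal, threshold=0.1, block_size=4)
print(gated)
```

Only the four samples of the hit survive; the spill either side of it is replaced by silence.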

Another use of a gate: imagine the recording of Creep by Radiohead (check it out on YouTube). The verse is played on a bass and one gentle clean guitar. The second guitarist is waiting for the chorus with a guitar sound that can kill at ten paces. While he waits his amp will be humming, as all guitar amps do. While he's not playing, the gate is shut and his track is beautifully silent; when he does start to play, the gate opens and the amp hum is buried under the guitar sound. When he is finished the gate discreetly closes and the hum is gone again.

All modern gates, including their VST counterparts, have a range control: instead of closing the gate you can set the range control to reduce the level rather than shut the channel. When it's used like this it's called an expander, which is the opposite of a compressor: it opens up the dynamic range rather than reducing it. A compressor reduces the loudest sounds; a gate reduces the quietest sounds.
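The range control can be sketched as a gain that depends on which side of the threshold the sound is on. This is only an illustration under my own assumptions: the function name, the -40 dB threshold and the 12 dB range are hypothetical, and an infinite range stands in for a gate that shuts completely.

```python
import math


def gate_gain(level_db, threshold_db, range_db):
    """Linear gain: 1.0 above the threshold, turned down by range_db below it."""
    if level_db >= threshold_db:
        return 1.0                       # gate open, level untouched
    if math.isinf(range_db):
        return 0.0                       # infinite range: gate fully shut
    return 10.0 ** (-range_db / 20.0)    # finite range: reduce, don't mute


open_gain = gate_gain(-30.0, -40.0, 12.0)       # a word, above the threshold
soft_gain = gate_gain(-50.0, -40.0, 12.0)       # breathing, reduced by 12 dB
shut_gain = gate_gain(-50.0, -40.0, math.inf)   # breathing, gate fully shut

print(open_gain, soft_gain, shut_gain)
```

With a 12 dB range the quiet material is multiplied by roughly 0.25 instead of being cut to nothing, so the background drops without vanishing.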

So... what you need to do is:

1. Set your compressor to do its work on the top end of the dynamic range. When the make-up gain has turned up the background and the breathing, you can use the gate to treat the bottom of the dynamic range. Start by setting the range control to infinite or as high as you can, so that the gate shuts completely--this way you can hear when the threshold is crossed.

2. Adjust the threshold so that the words come through and the breathing doesn't; now you've got the threshold between the words and the breathing.

3. Bring the range control back from that extreme setting so that instead of closing, the gate only reduces the level. You'll find that between the words the background is now lower rather than gone.

4. The jump between levels will probably be quite abrupt, which brings us to the last two controls: attack and release. The attack controls how long the gate takes to open, so it acts at the start of a word; the release controls the closing or reducing time, so it acts at the end of a word.

By giving these settings long enough values, the gate will open and close smoothly without abrupt changes in level which might be noticeable.

On a snare drum you would have a very short attack to catch the impact and a fairly long release so that the gate doesn't slam shut and cut off the tail. On a voice you can have a longer attack, but you still need a fairly long release so that you don't cut off the word ends. Again, it's down to the nature of the speaker, and you have to experiment until you get the right sound. Just knowing what the functions are will enable you to achieve what you want.
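Attack and release can be pictured as a smoothing of the gate's gain, so that it glides between "reduced" and "open" instead of jumping. The sketch below uses a simple one-pole smoother, which is my own choice of method rather than a description of how any particular gate is built; the function name, the 1000 Hz control rate, the 5 ms attack and the 50 ms release are likewise assumptions for illustration.

```python
import math


def smooth_gate_gain(targets, sample_rate, attack_ms, release_ms):
    """Glide towards each target gain; both times must be greater than zero."""
    attack_coeff = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release_coeff = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    gain = targets[0] if targets else 0.0
    smoothed = []
    for target in targets:
        # Opening (gain rising) uses the attack time, closing uses the release.
        coeff = attack_coeff if target > gain else release_coeff
        gain = coeff * gain + (1.0 - coeff) * target
        smoothed.append(gain)
    return smoothed


# Hypothetical gain targets: reduced (0.25) between words, open (1.0) on a word.
targets = [0.25] * 10 + [1.0] * 100 + [0.25] * 100
smoothed = smooth_gate_gain(targets, sample_rate=1000,
                            attack_ms=5.0, release_ms=50.0)

print(round(smoothed[14], 3), round(smoothed[114], 3))
```

Five steps after the word starts, the gain has covered most of the distance to fully open; five steps after the word ends, it has barely begun to fall. That is the short attack and long release at work.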

It's not just breathing you can bring down with this--if you can hear the computer fan or passing traffic between the words, soft gating will take them down.

I was once mixing a Bob Dylan cover song and we noticed a strange sound. The backing vocalist was asthmatic and, during the verses, whilst wearing headphones with a very loud monitor mix in them, she wasn't aware that the sensitive vocal mic was recording her wheezing breath. I simply plugged a gate into the channel and set the threshold, and there was no more wheezing in the stereo mix. “Cracking Contraption, Gromit!”