Basic Principles of Audio Processing: Difference between revisions

From Librivox wiki
RuthieG (talk | contribs)
Continuing article 2
<font color="red"><b>WORK IN PROGRESS</b></font> - RuthieG


This is the second part of a series of short articles written by a sound engineer with many years' experience. The idea is to explain in plain language how to make a quality sound file.

== Digital recording, a brief overview ==

Recording to a computer these days is cheap and relatively easy. In 1997 recording software cost me 150 pounds. These days there are much better free and open source applications that cover everything one can do in a studio. Audacity is one of the best. I'll refer to Audacity as much as I can so that anyone with the inclination can try what I write.

Digital and tape, what's the difference?

=== <b>Tape</b> ===

* Magnetic tape records sound as a continuously variable magnetic field along the length of the tape; all of us past our teens are familiar with cassettes and possibly eight track cartridges. Magnetic tape has advantages and disadvantages compared with digital.

* Tape recordings degrade over time.

* Even the best tape has inbuilt hiss.

* Tape has to be moved extremely accurately both in position and speed, requiring very high quality hardware. Just one part out of adjustment can ruin a recording.

* Tape distorts the sound; admittedly it distorts the sound in a musically pleasing way such that Pink Floyd, Elton John and Kate Bush have never recorded in digital studios.

* Tape forgives high recording levels; the nature of magnetism is that when tape is over-saturated it makes music sound nicer, so much so that in studios it is deliberately over-driven to achieve this richly pleasing effect.

=== <b>Digital</b> ===

Where the waves on tape are stored as field strength, digital recordings are stored as a long series of numbers, which is what computers excel at. In fact storing and shifting numbers is the only thing a computer can do, but they can do it very fast, so much so that the numbers can be coloured spots on a screen or voltages on a speaker. Once you start shifting the numbers quick enough, the pictures move and the speaker sings: video and audio.

==== ''How it works'' ====

The ever changing voltage caused by a sound vibration from a microphone or mixer is presented to a tiny measuring circuit called an Analogue to Digital converter. At predetermined intervals the circuit measures the microphone voltage and assigns a number as its value. At CD quality this voltage is measured 44,100 times per second. You may have heard mention of 16 bit, 24 bit and 32 bit sound; this refers to the accuracy to which the measurements are taken. A 16 bit binary number can hold 65,536 different values, about 65,000 in old money. CD is 16 bit 44.1k, so roughly every 44th of a millisecond a measurement is taken and stored as a number between 0 and 65,535.
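The arithmetic behind those figures is easy to check for yourself. This is only a rough sketch: the function names are made up for illustration, and it treats each measurement as a plain unsigned binary number, as the description above does.

```python
CD_SAMPLE_RATE_HZ = 44_100  # measurements per second at CD quality


def levels(bits: int) -> int:
    """How many different values a binary number of this width can hold."""
    return 2 ** bits


def sample_interval_ms(sample_rate_hz: int) -> float:
    """Time between two measurements, in milliseconds."""
    return 1000.0 / sample_rate_hz


for bits in (16, 24, 32):
    print(f"{bits} bit: {levels(bits):,} possible values")

print(f"one measurement every {sample_interval_ms(CD_SAMPLE_RATE_HZ):.4f} ms")
```

The interval works out at about 0.0227 ms, which is the "44th of a millisecond".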

To play back that recording you do quite the opposite: every 44th of a millisecond a number is taken from memory and presented to another circuit, the Digital to Analogue converter, which then produces a voltage on its output in proportion to the number. The ever changing numbers produce an ever changing voltage which drives the speaker and you've got your sound vibrations back.

Digital recordings, when looked at really closely, do not look like a smooth curvy wave; they look like a series of steps. But the steps are so small and so short that you don't hear them. The result smooths out to an uncannily accurate reproduction, which is why even the smallest, cheapest MP3 player sounds much better than the best cassette player.
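To get a feel for how small the steps are, here is a minimal sketch of the record and playback round trip. It assumes an idealised converter that maps a voltage between -1 and +1 onto the numbers 0 to 65,535; the function names `record` and `play_back` are invented for the example and have nothing to do with any real converter chip.

```python
import math


def record(freq_hz, sample_rate_hz, n_samples, bits=16):
    """Measure a sine wave at regular intervals and store each reading as a whole number."""
    top = 2 ** bits - 1
    numbers = []
    for n in range(n_samples):
        voltage = math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)  # -1.0 .. +1.0
        numbers.append(round((voltage + 1.0) / 2.0 * top))
    return numbers


def play_back(numbers, bits=16):
    """Turn the stored numbers back into voltages in proportion to their value."""
    top = 2 ** bits - 1
    return [2.0 * k / top - 1.0 for k in numbers]


numbers = record(1000, 44_100, 441)
voltages = play_back(numbers)

# Largest difference between the original wave and the stepped reproduction.
worst_error = max(
    abs(v - math.sin(2 * math.pi * 1000 * n / 44_100))
    for n, v in enumerate(voltages)
)
print(f"worst error: {worst_error:.7f} of full voltage")
```

With 16 bits the worst error can never be more than half a step, which is a tiny fraction of the full voltage swing.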

Simple really, when someone sensible takes all the waffle away, isn't it?

==== ''Benefits and risks of digital recording'' ====

Digital recordings are very accurate; the accuracy is determined only by the quality of the AD and DA converters.

But there is a risk: if the signal going in ''exceeds'' the measuring capacity of the converter it can't possibly get a higher number than 65k or a lower one than zero. Digital does not forgive overdrive. Digital distortion will make you throw off your headphones; it is about as pleasant as the sound made by a scallywag dragging a sharp key along the side of your brand new car.
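Here is a rough sketch of what the converter does when the signal is too hot. It assumes an idealised 16 bit converter centred on 32,768; the names `convert` and `count_clipped` are hypothetical.

```python
import math

FULL_SCALE = 65_535  # highest number a 16 bit converter can produce
MID_POINT = 32_768   # number produced for zero volts in this sketch


def convert(voltage):
    """Return (stored number, clipped?) for a voltage where -1..+1 is the legal range."""
    raw = round(MID_POINT + voltage * 32_767)
    pinned = max(0, min(FULL_SCALE, raw))  # the converter cannot go outside 0..65535
    return pinned, pinned != raw


def count_clipped(gain, n_samples=100):
    """Count how many samples of one sine cycle get pinned at the limits."""
    return sum(
        convert(gain * math.sin(2 * math.pi * i / n_samples))[1]
        for i in range(n_samples)
    )


print("half level:  ", count_clipped(0.5), "clipped samples")
print("double level:", count_clipped(2.0), "clipped samples")
```

At half level nothing is pinned. At double level a large part of every cycle is flattened against the limits, and that flat top is the nasty noise you hear.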

Consequently when making a recording it is imperative to see to it that the signal '''never''' reaches or crosses 0 dBFS (decibels relative to Digital Full Scale).

On digital equipment zero decibels marks the highest level; all other values are expressed as negative decibel numbers, all the way down to about minus 96dB in the case of 16 bit or CD quality.

When I record to digital tape, I record with a maximum peak value of -12 dB; in film soundtracks the value is more like -20dB. This leaves room for unexpected peaks to remain undistorted. Peaks can be compressed later, but [http://en.wikipedia.org/wiki/Clipping_(audio) clipping] (going over the maximum level) can't be fixed easily.
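The figures above can be worked out with the usual amplitude formula, 20 times the log of the ratio to full scale. This is a sketch only, with made-up function names, and it takes full scale as 1.0 for convenience.

```python
import math


def to_dbfs(amplitude, full_scale=1.0):
    """Level of an amplitude relative to digital full scale (0 dB is the top)."""
    return 20 * math.log10(amplitude / full_scale)


def from_dbfs(level_db):
    """Fraction of full scale that a given level corresponds to."""
    return 10 ** (level_db / 20)


# The smallest 16 bit step compared with the whole 16 bit range.
print(f"bottom of 16 bit: {to_dbfs(1, 2 ** 16):.1f} dB")

# How much of the available range the suggested peak levels actually use.
for peak_db in (-12, -20):
    print(f"{peak_db} dB peak uses {from_dbfs(peak_db):.1%} of full scale")
```

A peak of -12 dB uses only about a quarter of the range, so a surprise has to be four times bigger than your expected peak before it clips.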

In the days of vinyl records, the instruments were recorded to 24 track tape resulting in tape hiss. When it came time to mixdown, the tape signals were passed through processors and effects units and the mixing desk, picking up electronic noise along the way. The mix then went down to a stereo master tape picking up more tape hiss. The master was then taken to the pressing plant where it was passed through yet more processors in the mastering process picking up yet more electronic noise until it was cut to a master pressing disc, which was then used to press records, which ended up on your turntable. The fact that the resultant record sounded extremely clean and nice explains why a decent analogue studio costs a million pounds, but an equivalent [http://en.wikipedia.org/wiki/Pro_Tools Pro Tools] set up costs fourteen thousand.

=== Decibels (dB) an explanation ===

Audio, whether floating through the air or as an electrical voltage, is measured in decibels; a decibel is one tenth of a bel. When processing sound it is useful to understand decibels, which are not like other measurements. For a start the decibel is [http://en.wikipedia.org/wiki/Logarithmic_scale logarithmic], because human hearing is logarithmic.

If I give you a ten pound load to carry, you would rate it as some value of heaviness. Then if I gave you another ten pound load to carry, it would feel twice as heavy. Hearing is not like this. Ten watts of audio power playing in a room could be measured at some position as 90dB; twenty watts would register a level of 93dB at the same spot, not 180dB.

That is the rule of thumb. For every 3dB up, double the actual power; for every 3dB down divide by two. This measuring in air is called SPL (sound pressure level) and is a consistent measurement: 90dB SPL is the same volume wherever it occurs.

* 85dB SPL is the recommended limit for long term industrial exposure without protection
* a jackhammer is 110dB SPL
* a jumbo jet on take off is 120dB SPL
* Motorhead once played a gig above 120dB SPL. (Long live King Lemmy!)
* 140 dB SPL can cause immediate and permanent hearing damage.
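The 3dB rule of thumb comes straight out of the logarithm: for power, decibels are 10 times the log of the ratio. A small sketch, with illustrative function names:

```python
import math


def power_ratio_to_db(ratio):
    """Change in level, in dB, for a given change in power."""
    return 10 * math.log10(ratio)


def db_to_power_ratio(db):
    """Change in power for a given change in level in dB."""
    return 10 ** (db / 10)


# Ten watts measured as 90 dB; what does twenty watts register?
print(f"20 W: {90 + power_ratio_to_db(20 / 10):.1f} dB")

# Going up 3 dB, 6 dB and 10 dB in terms of actual power.
for step in (3, 6, 10):
    print(f"+{step} dB is {db_to_power_ratio(step):.2f} times the power")
```

Doubling the power adds just over 3dB, and ten times the power adds exactly 10dB.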

When talking about signal levels, decibels are used in different contexts. In an analogue channel of a mixer, dB is a relative measurement where 0dB represents the upper limit of signal strength for that channel but with some leeway above it.

In digital recordings the decibel is referred to as dBFS [http://en.wikipedia.org/wiki/Digital_full_scale (Digital Full Scale)] where 0 dBFS is the '''absolute upper limit''' and all values are measured in minus values. There is no such thing as a positive dB in Digital Full Scale.

When talking about “line level” electrical signals that pass audio between equipment, two reference voltages are in use: 0dBu is defined as 775 millivolts and 0dBV as 1 volt. Perversely, pro equipment has its 0dB point at a line level of +4dBu and consumer equipment has a line level of -10dBV.
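To see what the two line levels mean in actual volts, here is a sketch using the usual definitions (0 dBu is about 0.775 volts, 0 dBV is 1 volt). The function names are invented for the example.

```python
import math

DBU_REFERENCE_VOLTS = math.sqrt(0.6)  # about 0.775 V, the 0 dBu reference
DBV_REFERENCE_VOLTS = 1.0             # the 0 dBV reference


def dbu_to_volts(level_dbu):
    return DBU_REFERENCE_VOLTS * 10 ** (level_dbu / 20)


def dbv_to_volts(level_dbv):
    return DBV_REFERENCE_VOLTS * 10 ** (level_dbv / 20)


pro = dbu_to_volts(4)         # professional line level
consumer = dbv_to_volts(-10)  # consumer line level

print(f"pro line level:      {pro:.3f} V")
print(f"consumer line level: {consumer:.3f} V")
print(f"difference:          {20 * math.log10(pro / consumer):.1f} dB")
```

The pro signal is nearly 12dB hotter than the consumer one, which is why a mixer output can easily overload a computer's line input.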

Confused? I am, but the important bit is the part about logarithmic hearing, just remember that part and you'll have a feel for what decibels measure.

== Digital recording: How to do it from scratch ==

To make decent recordings for LibriVox is easy. If you know that LibriVox exists you already have the most expensive bit of kit you'll need.

All you need to add to that is:

* Recording software (You can't go wrong with [http://wiki.librivox.org/index.php/Audacity_1-2-3 Audacity].)
* A microphone (not expensive, if it sounds OK then it is)
* For some mics, a small mixer to act as a pre amp and gain control for the microphone. (Look up Behringer mixers, best value in the business)
* A pop shield

=== Microphones ===

There are condenser mics and dynamic mics.

If you have a condenser microphone it needs power: either it has a battery compartment or it doesn't. If it doesn't, you need a mixer with a 48 volt phantom power button. This sends the mic power up the signal cable.

If you have a dynamic mic you don't need phantom power. A word of warning here: with some dynamic mics and cable wirings, if you switch on phantom power you may instantly need a new mic, so leave it switched off.

There are three types of connection for mics:

* USB
* 6mm or 3.5mm Jack (unbalanced)
* XLR (balanced)

The USB type has a flat connector that plugs into the rectangular USB port on your computer. XLR is a substantial plug with three pins. 6mm Jack is the round silver plug that you would plug into an electric guitar; 3.5 mm Jack is the small round plug that fits into the sound card on your computer.

The reason for the different types is simply that 3 core XLR is a method of getting interference in the mic cable to cancel itself out. 2 core Jack is less expensive. The USB mic is particularly useful for LibriVox purposes, as it is Plug and Play and suffers less from background noise.

=== Mixer ===

'''Note:''' it is entirely possible that you won't need a mixer. I know of at least one laptop which could record a nice clean signal through the mic socket but in my experience the mic socket in my desktop machine was filthy with noise. USB mics also don't require a mixer.

Your mixer doesn't have to mix anything, the smallest Behringer has one mic channel and costs peanuts. What it does though is give you a beautiful clean pre-amplifier, gain controls and a set of EQ controls to set the right tonal balance. No microphone has a flat frequency response and if yours does not suit your tastes you can adjust this with the EQ.

To record to the built-in sound interface on a PC, you'll need an output cable which will have two 6mm mono jacks at one end and one 3.5mm stereo jack at the other. The small jack will go into the line socket on your sound card, the 6mm monos will go into the unbalanced outputs from your mixer. Your mixer may only have RCA connectors for unbalanced output, in which case get a cable to suit those. (RCA connectors are also known as phonos and are to be found on the back of CD players and on Playstations: one red, one white; the yellow on a Playstation is video.)

=== Pop shield ===

This is not essential but if you're close to the mic it will prevent popping on Bs and Ps. Say “Buh” and “Puh” to the palm of your hand and feel the blast of air. That blast is not sound, it is wind, and wind plays havoc with microphones. That's why TV and film recordists have a big hairy dog on a stick. The hairy dog is hollow with the mic floating in the middle. Sound can get through the hairy dog's furry coat but wind can't.

A pop shield is a disc of tenuous fabric suspended four inches in front of the mic, you can make one with a pop sock and a wire coat-hanger. Hey, maybe that's why they're called pop socks, they stop your mic popping.

=== Setting up the signal chain, when using a mixer ===

Put your mic cable in your mic channel. Connect your mixer output to the sound card and you're ready to set the signal.

There are at least three level controls on your mixer and there's a reason for them all to be there: it's called gain staging. Simply said, the level controls ensure that the signal level is optimum at each stage.

Bigger mixers have linear faders, smaller mixers have rotary faders: they do the same job. Faders are always labelled such that when set at 0dB there is room to move up as well as down which is where you get your freedom to adjust the mix if you're mixing. The 0dB setting is where the signal passes through unchanged in value.

Set your mic channel fader and master output fader to 0dB. Just where the mic connects there will be a small rotary control labelled trim, turn it right down. In this position, on the smallest inexpensive mixer you will be able to use the output meter to set the mic trim level. The output meter on my small Behringer is three LEDs, like a traffic light. Bigger mixers may have a bar graph or a needle meter.

At this point you get your sound source going into the microphone, e.g. read a book.

While slowly turning up the trim control, watch the meter: if it's labelled in decibels you want to be bouncing around somewhere near 0dB. But don't peak too far over 0dB.

In the case of my traffic light meter you want some green light on most of the time with the occasional flash of yellow light; if the red light comes up the signal is too high.

Next look on the mic channel, there should be a red LED usually labelled 'Clip'. If this lights up, the channel is overloading, so turn down the trim a bit. You're nearly there.

Now turn down the master output fader and start looking at the recording meter on Audacity which should be on Record and Pause.

With your sound source still going, start turning up the output fader on the mixer. The reason for this is that 'line level' on a PC is lower than 'line out' on a mixer, so 0dB would be too loud. We have to come up at it from below. This is the '''gain staging''' process: setting the gain at each stage so it's the correct level for that stage.

If you find that the computer seems too sensitive, double click the speaker icon in the system tray. When the Windows mixer opens, click Options | Properties. Change the check box from playback to recording. The 'line in' fader may be all the way to the top, drag it down a bit to make the computer less sensitive to the mixer.

Back to the mixer: with Audacity running and in Record and Pause you'll have a meter to watch. Bring up the mixer output until the highest peak is reaching -12 dB on the recorder meter.
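Gain staging is easy to picture once you remember that gains in decibels simply add up along the chain. The sketch below uses invented figures purely for illustration (the mic level and trim gain are not taken from any real mixer), and it treats every level as relative to one common reference.

```python
def levels_through_chain(source_db, stages):
    """Return the signal level after each stage; gains in dB simply add."""
    level = source_db
    report = []
    for name, gain_db in stages:
        level += gain_db
        report.append((name, level))
    return report


# Hypothetical figures: a quiet mic signal, trim turned up until the meter
# sits around 0dB, channel fader left at 0dB, master pulled down for the PC.
chain = [
    ("mic trim", 50),
    ("channel fader", 0),
    ("master output fader", -12),
]

for name, level in levels_through_chain(-50, chain):
    print(f"after {name}: {level:+d} dB")
```

Each stage is checked in turn, so nothing overloads in the middle of the chain even though the final level looks fine.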

At this point, decide if you'll be monitoring the recording on headphones--probably not because all headphones leak sound and it will loop back through the mic. But you do need to do a bit whilst monitoring just to check that it actually sounds OK. While monitoring you can apply a bit of EQ on the mixer; for instance you can take a bit of low EQ off if you are coming out too boomy.

Monitoring must always be done from the end of the line so don't use the headphone socket on the mixer, use the 'line out' socket on the sound card.

If all is OK then you are ready to make your recording. You have just set up a signal chain like a professional would. See? I told you it was easy.

The rest of it is a breeze, just click off the pause and yap.


== Post Production 1: Equalisation (EQ) and mains hum removal==

Revision as of 09:27, 10 April 2010


Post production tools fall into the categories of

  • FX
  • EQ
  • Dynamics
  • Special tools

What are they all about?

FX stands for effects: flange, phaser, fuzz box etc. We don't really need that in LibriVox so let's chuck it out and forget it. Job done. (If you really insist on reading The Pit and The Pendulum like John Laurie, then just chuck loads of reverb on, you'll wing it.)

EQ stands for equalisation: tone, high middle and low, bass and treble--you know what I mean. Pretty easy, most people are comfortable with that concept and need read no further. But there are one or two tricks that can be done with digital EQ that are really quite effective, more later.

Dynamics is the one that most people don't understand, simply because there has never been a layman's equivalent, but honestly it's no more difficult than working the bass and treble on your old Hi Fi. Dynamics is about manipulating the level of sounds, whereas EQ is about manipulating the tone colour. I can use a compressor with my eyes shut but I have to think about EQ, so dynamics must be easier.

The one special tool I use is Sound Forge noise reduction--there will be a chapter on that later in this series.

Generally I would EQ before processing the dynamics; this is because EQ affects the loudness of frequency bands so you always have to do some dynamics after EQ even if you've already done it first.

So let's deal with EQ first.

When I was a kid I never touched the cheap sub standard tone controls on record players because of a sincere belief I held which years later was echoed almost word for word by a big record producer. He said, “Tone controls should be absent from record players. After I've spent 12 hours slaving over a million pound mixer you don't have the right to go messing with my mix with a cheap crappy tone control.”

The EQ available in recording software is infinitely better than the tone controls on old record players because it doesn't require million quid desks, just a bit of nifty mathematics.

EQ is not a black art if you go at it purposefully.

If a sound is low in one band or high in another, EQ can correct it. If it sounds good it probably is--if it ain't broke don't fix it.

Frequency bands

All sound is made up of various frequencies; perfect human hearing hears from 20 cycles per second to 20,000 cycles per second. When you get older your upper limit drops to 12-14k. Cycles per second in sound is labelled hertz, named after Heinrich Hertz, the Victorian German physicist who pioneered multinational car rental (now would I lie to you?).

If you play a 20 hertz tone through a speaker that can play that low you will feel it more than you will hear it.

  • 20 – 100 hertz is where the oomph of a kick drum lies in dance music.
  • 100 – 7000 hertz is where we speak and sing, our area of interest.
  • 100 – 300 hertz is the bottom end of a voice, the bit that a big man has more of than a petite woman. This is the area you could call 'boom'.
  • 300 – 900 hertz is the main body of a voice, the part that is pronounced in operatic singing. This part on its own sounds 'honky'.
  • 3000 – 5000 hertz is the fine detail in speech that makes it intelligible. This is the area that disappears on a cheap crappy tone control.
  • 6000 – 7000 hertz is the letter S, and the lush swoosh of a splash cymbal, this quite naturally is the 'hissy bit'.

When you give the frequency bands names like boom, honk and hiss it helps you to identify approximate frequencies by ear. I once walked out of a terribly mixed Darkness concert muttering that it was all below 400 with nothing above 1k. In layman's terms, it sounded like a party in the house next door but amplified to the threshold of pain.
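If it helps, the band names can be written down as a little lookup table. This is just a sketch of the list above: "boom", "honk" and "hiss" are the nicknames used here, while "oomph" and "detail" are labels picked for the example, and frequencies that fall between the listed bands simply get no name.

```python
# (low edge in Hz, high edge in Hz, nickname), taken from the list of bands above
BANDS = [
    (20, 100, "oomph"),
    (100, 300, "boom"),
    (300, 900, "honk"),
    (3000, 5000, "detail"),
    (6000, 7000, "hiss"),
]


def band_name(freq_hz):
    """Return the nickname of the band a frequency falls in, or None if it has none."""
    for low, high, name in BANDS:
        if low <= freq_hz < high:
            return name
    return None


for freq in (60, 200, 500, 4000, 6500, 2000):
    print(freq, "Hz:", band_name(freq))
```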

Coincidentally, electric guitars occupy all the same frequencies as the human voice--perhaps that explains why they've always been so popular. When I used to mix records, my method was to EQ the guitars down a bit from 3000 – 5000 (the intelligibility part) and raise voices in the same band. Consequently the voice came through the wall of guitar sound with just a small adjustment.

EQ controls

Small inexpensive mixers have EQ controls labelled high, mid and low and can be quite useful as long as they've been set at good intervals.

Sound Forge has 3 types of EQs:

  • Graphic
  • Parametric
  • Paragraphic.

Graphic EQ

Graphic EQ is the most familiar one to most people: the bands are labelled with frequencies and you can raise or lower the sliders to lift or cut in that band. Anyone can play a graphic EQ by ear. Just try a slider; if it doesn't do what you want put it back and try another. Totally unscary EQ.

Parametric EQ

Parametric EQ is what you find on pro mixers. You still have the high, mid and low controls, but for each band you have three controls:

  • Freq
  • Q
  • level

The Freq control is labelled in hertz and simply allows you to center that band on a particular frequency, so you can be quite specific about where you want to lift or cut.

Q is a great mystery, a black art known only to those adepts who have been baptised by Phil Spector. Not! Q translates as bandwidth. You can make a very narrow notch on the frequency spectrum by turning up the Q or a wider gentle slope by turning it down. If you find an EQ with the bandwidth labelled as Q then high Q means narrow band and low Q means wide band, it's really that simple. If you're not that confident about adjusting the bandwidth then just set it about half way and use the freq and level controls--you won't go far wrong.
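To put numbers on it, the textbook relationship is that bandwidth equals centre frequency divided by Q. The sketch below assumes that definition; individual EQs may measure their bandwidth slightly differently, and the function names are invented for the example.

```python
import math


def bandwidth_hz(centre_hz, q):
    """Width of the band in hertz: high Q gives a narrow band, low Q a wide one."""
    return centre_hz / q


def band_edges(centre_hz, q):
    """Lower and upper edges of the band, placed symmetrically on a log scale."""
    half = 1 / (2 * q)
    root = math.sqrt(1 + half * half)
    return centre_hz * (root - half), centre_hz * (root + half)


for q in (0.5, 1, 10, 50):
    low, high = band_edges(1000, q)
    print(f"Q {q:>4}: {bandwidth_hz(1000, q):7.1f} Hz wide, {low:7.1f} to {high:7.1f} Hz")
```

At 1000 hertz a Q of 1 covers roughly 600 to 1600 hertz, while a Q of 50 covers only 20 hertz either side, which is the kind of narrow notch you want for cutting a single problem frequency.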

The level control should start off centred and is capable of adding or removing up to 15dB at the centre frequency, tapering off each side according to the Q setting. A good trick I learned is to turn up the level to +15dB and swing the frequency control across the band, as soon as you hit the problem frequency it will stick out like a sore thumb. You can then return the level to zero and continue down to make a gentle cut.

I go to the trouble of explaining Parametric EQ because it is the magic bullet that can kill a mains hum and you'd never know it was there. You couldn't do it with a graphic EQ. More on that later.

The greatest problem you can get on recordings, the most annoying, and the easiest to remove is a constant drone such as mains hum.

A constant drone noise may not be a mains hum but all you have to do is identify the frequencies and then you can filter them out.

Highlight a bit of the file that is supposed to be silent and pull up the Spectrum Analyser. The only things here that will show up on the Analyser are random noise and the hum or buzz.

The Sound Forge spectrum analyser presents itself as a graph and it can be a useful tool if you want to identify a problem sound. There just have to be plugins for all the other programmes too--seek and you will find.

The base of the graph is Frequency and the side bar is Level in dB. The line across the graph shows the level of various frequencies. The line is generally pretty straight, slightly canted to the right and tails off at the top frequencies. If there is an obvious hump or spike in the line you have a problem that is easy to find. Zoom in on the spike. You can identify the center frequency in the hump or spike and remember it while you dig up the parametric EQ.
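If your software has no spectrum analyser, the same hunt for a spike can be sketched in a few lines. This is not how Sound Forge does it; it is a plain single-frequency Fourier measurement run on a made-up "silent" passage containing a faint 50 hertz hum and a little random noise. All names and figures are assumptions for the example.

```python
import math
import random


def level_at(samples, sample_rate_hz, freq_hz):
    """Amplitude of one frequency component in the samples (single-bin Fourier sum)."""
    re = im = 0.0
    for n, x in enumerate(samples):
        angle = 2 * math.pi * freq_hz * n / sample_rate_hz
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return 2 * math.hypot(re, im) / len(samples)


def find_spike(samples, sample_rate_hz, candidates_hz):
    """Return the candidate frequency with the highest level."""
    return max(candidates_hz, key=lambda f: level_at(samples, sample_rate_hz, f))


RATE = 2000  # a low sample rate keeps this toy example quick
rng = random.Random(0)
silence = [
    0.01 * math.sin(2 * math.pi * 50 * n / RATE) + rng.uniform(-0.001, 0.001)
    for n in range(RATE)  # one second of "silence"
]

spike_hz = find_spike(silence, RATE, range(20, 201))
spike_db = 20 * math.log10(level_at(silence, RATE, spike_hz))
print(f"spike at {spike_hz} Hz, about {spike_db:.0f} dB")
```

The random noise spreads thinly over every frequency, so the hum stands well clear of it, just like the spike on the analyser graph.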

The nature of human hearing is that if you have to raise EQ it's better done with a wide Q but if you have to cut out a problem frequency with EQ it's better done with a narrow Q.

Paragraphic EQ

Paragraphic EQ is my favourite tool for this job, it's a four band parametric EQ with a little graph which shows you what you are about to do to a sound.

There are presets: one of the presets is 60Hz mains hum removal. All the presets do is set the controls.

Mains supply in America is alternating current at 60 hertz, in Europe it's 50 hertz. If the mains induces interference in your recorded signal, then it's at mains frequency and sometimes also at the second harmonic of 100 or 120 hertz, so if I have a sound with a mains hum the spectrum analyser tells me which side of the Atlantic it was recorded on.
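That bit of detective work can be written down as a tiny sketch. The function name, the one hertz tolerance and the choice to check only the first three multiples are all assumptions made for the example.

```python
def mains_family(spike_hz, tolerance_hz=1.0, max_harmonic=3):
    """Return 50 or 60 if the spike sits on a low multiple of that mains frequency, else None."""
    for mains_hz in (50, 60):
        for k in range(1, max_harmonic + 1):
            if abs(spike_hz - k * mains_hz) <= tolerance_hz:
                return mains_hz
    return None


for spike in (50, 100, 60, 120, 440):
    family = mains_family(spike)
    if family is None:
        print(f"{spike} Hz: not mains hum")
    else:
        print(f"{spike} Hz: {family} Hz mains")
```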

If you've got the Sound Forge paragraphic EQ showing, you can pull down the preset for mains hum removal and you'll see what I describe.

Other parametric EQs will do the job just as well though.

Set the bandwidth to narrowest (high Q); on Sound Forge, this will narrow down to one note. Set the centre frequency to 60Hz (U.S.) or 50Hz (U.K.) and pull down the level control to -25. On the Sound Forge EQ, all of the four bands can have the same settings so you can stack four filters to give 100 decibels of hum removal. Your little graph will have a 100 decibel hole one note wide where the mains hum should be. Now when you consider that there's only 96 decibels to the bottom of a CD that's pretty good filtering. If there is still a droning hum it might be that there is a harmonic of the mains at 100 or 120 hertz. In that case I usually use two filters for each frequency: 50 dB is as good as dead.
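For the curious, here is roughly what a narrow notch looks like as arithmetic. This is not Sound Forge's filter, whose workings aren't described here; it is a single second-order notch built from the widely published "Audio EQ Cookbook" formulas, tried on a pure 50 hertz hum and on a 1000 hertz tone standing in for the voice. The sample rate, Q value and names are assumptions for the example.

```python
import math


def notch_coefficients(centre_hz, q, sample_rate_hz):
    """Second-order notch coefficients (Audio EQ Cookbook form), normalised so a0 = 1."""
    w0 = 2 * math.pi * centre_hz / sample_rate_hz
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (1 / a0, -2 * math.cos(w0) / a0, 1 / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def notch(samples, centre_hz, q, sample_rate_hz):
    """Run the samples through the notch filter and return the filtered samples."""
    (b0, b1, b2), (a1, a2) = notch_coefficients(centre_hz, q, sample_rate_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


RATE = 8000
hum = [math.sin(2 * math.pi * 50 * n / RATE) for n in range(2 * RATE)]
voice = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(2 * RATE)]

settled = slice(RATE, None)  # ignore the first second while the filter settles
hum_left = rms(notch(hum, 50, 10, RATE)[settled])
voice_left = rms(notch(voice, 50, 10, RATE)[settled])

print(f"hum after filtering:   {hum_left:.6f}")
print(f"voice after filtering: {voice_left:.4f} (was {rms(voice[settled]):.4f})")
```

Once the filter has settled the hum is all but gone, while the 1000 hertz tone comes through practically untouched, which is the whole point of cutting with a narrow Q.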

If you then spectrum analyse the same bit of file you'll now see a narrow chasm in the graph with a tiny mains hum spike in the bottom of it at about -80dB. The hole is so narrow and so far below the human voice that this filtering will simply clean your sound.

If you've still got hiss or random noise in the silent parts there are a couple of tools you can use to clean them out too. More on that in the Noise Removal section.