Tech Specs

From Librivox wiki
Revision as of 21:05, 17 February 2021 by TriciaG

Quick Facts!

Please submit final contributions with the following parameters:

  • Sample Frequency: 44.1 kHz (44,100 Hz)
  • File format: MP3 (MPEG-1 Audio Layer III)
  • Bit rate: 128 kbps constant bit rate (CBR)
  • Recording Mode: Mono (make sure completed file is audible in BOTH EARS when heard through headphones)
  • Volume: around 89 dB (-16 to -21 LUFS)
  • Sample size: 16 bit

If you're not sure whether your settings are correct, try the Checker tool.

If you don't know how to do that, don't despair. For more information, check out Audacity 1-2-3. You can also post your file to the Listeners and Editors Wanted forum, or tell the book coordinator that you are unsure so they can check. In most cases the files will be fine! :-)

Record a test if you're uncertain.
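For the technically curious, here is a rough idea of what a settings check can look at. This is a minimal Python sketch, not the Checker tool: it assumes the file is plain MPEG audio, optionally preceded by an ID3v2 tag, and decodes only the 4-byte header of the first audio frame (format, bit rate, sample frequency and channel mode). It does not measure volume, sample size is not stored in an MP3 frame header, and it does not confirm that every later frame uses the same bit rate. The function names are made up for this example.

```python
# Sketch: decode the header of the first MPEG audio frame in an MP3 file.
# Assumption: only the first frame is inspected; a CBR file should repeat
# the same settings in every frame, but that is not verified here.

# MPEG-1 Layer III bit rates in kbps, indexed by the 4-bit bitrate field.
# Index 0 ("free format") and 15 (invalid) are mapped to None.
MPEG1_L3_BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                          160, 192, 224, 256, 320, None]
MPEG1_SAMPLE_RATES_HZ = [44100, 48000, 32000, None]  # index 3 is reserved
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def skip_id3v2(data: bytes) -> int:
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:  # tag size is stored as four 7-bit "syncsafe" bytes
            size = (size << 7) | (byte & 0x7F)
        footer = 10 if data[5] & 0x10 else 0  # footer-present flag
        return 10 + size + footer
    return 0


def first_frame_header(data: bytes) -> dict:
    """Find the first frame sync after any ID3v2 tag and decode its fields."""
    pos = skip_id3v2(data)
    while pos + 4 <= len(data):
        b0, b1, b2, b3 = data[pos:pos + 4]
        if b0 == 0xFF and (b1 & 0xE0) == 0xE0:  # 11 set bits = frame sync
            return {
                "mpeg1": ((b1 >> 3) & 0b11) == 0b11,
                "layer3": ((b1 >> 1) & 0b11) == 0b01,
                # The two lookups below are only meaningful for MPEG-1 Layer III.
                "bitrate_kbps": MPEG1_L3_BITRATES_KBPS[b2 >> 4],
                "sample_rate_hz": MPEG1_SAMPLE_RATES_HZ[(b2 >> 2) & 0b11],
                "channel_mode": CHANNEL_MODES[b3 >> 6],
            }
        pos += 1
    raise ValueError("no MPEG frame sync found")


def matches_spec(h: dict) -> bool:
    """Compare a decoded header with the Quick Facts values."""
    return (h["mpeg1"] and h["layer3"] and h["bitrate_kbps"] == 128
            and h["sample_rate_hz"] == 44100 and h["channel_mode"] == "mono")
```

To try it on a file, read the whole file as bytes, for example `with open("chapter01.mp3", "rb") as f: print(matches_spec(first_frame_header(f.read())))`, where `chapter01.mp3` is a placeholder name.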

Why do we require specific technical settings?

The list of parameters involved in recording audio and then converting it to MP3 is long, and the list of possible combinations is even longer. Even though all these techniques and file formats are standardised, not every program and not every device works well with every possible combination. The values given above form a very common combination that should "just work (TM)" with every player out there. So by adhering to the standards above, you make life easier for the coordinators.

They are also a set that provides CD quality while recording and, after compression, a quality that in most cases is hardly distinguishable from a CD.
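To put numbers on that, the arithmetic below compares uncompressed PCM audio with a 128 kbps MP3. It's a sketch under stated assumptions: "CD quality" is taken as 44.1 kHz, 16-bit stereo, 1 kbps is 1000 bit/s, and file sizes ignore tags and other overhead.

```python
# Rough bit-rate arithmetic. Assumptions: "CD quality" means 44.1 kHz,
# 16-bit, 2-channel PCM; MP3 bit rates use 1 kbps = 1000 bit/s.
SAMPLE_RATE_HZ = 44100
SAMPLE_SIZE_BITS = 16
MP3_BITRATE_BPS = 128_000


def pcm_bitrate(channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return SAMPLE_RATE_HZ * SAMPLE_SIZE_BITS * channels


cd_stereo = pcm_bitrate(2)                        # 1,411,200 bit/s
mono_pcm = pcm_bitrate(1)                         #   705,600 bit/s
ratio = mono_pcm / MP3_BITRATE_BPS                # about 5.5 : 1 for a mono recording
mp3_bytes_per_minute = MP3_BITRATE_BPS * 60 // 8  # 960,000 bytes
pcm_bytes_per_minute = mono_pcm * 60 // 8         # 5,292,000 bytes

print(f"CD stereo PCM: {cd_stereo} bit/s, mono PCM: {mono_pcm} bit/s")
print(f"128 kbps MP3 is about {ratio:.1f}x smaller than mono PCM")
print(f"per minute: {pcm_bytes_per_minute} bytes PCM vs {mp3_bytes_per_minute} bytes MP3")
```

So a minute of mono narration shrinks from a little over 5 MB to just under 1 MB.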

Finally, our host requires these specifications to derive other file formats for us. Using different specifications would limit or prevent that.

But isn't a sample rate other than 44.1 kHz better suited to voice recording?

There has been a lot of discussion about this, since there are tempting reasons to use other sample rates. The complete official position is the following:

  1. 44.1 kHz is the default sample rate in lots of software. We need to get files at this rate because this is the rate our host uses for its Flash player.
  2. Please don't use any rate other than 44,100 Hz. If you accidentally send us a file at a different rate, our catalogers will resample it for you and advise you on how to record at a sample rate we can use. Don't worry, your work will never be wasted!
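How the catalogers resample is not described here. As one illustration, a common technique (polyphase resampling, offered for example by SciPy's `scipy.signal.resample_poly(x, up, down)`) converts between two rates using an integer up/down ratio. The sketch below only computes those two factors for a few example source rates; the filtering itself is left out, and `resample_factors` is a hypothetical helper.

```python
from math import gcd

TARGET_HZ = 44100  # the required LibriVox sample rate


def resample_factors(source_hz: int, target_hz: int = TARGET_HZ) -> tuple[int, int]:
    """Return (up, down) so that source_hz * up / down == target_hz."""
    g = gcd(source_hz, target_hz)  # reduce the ratio to lowest terms
    return target_hz // g, source_hz // g


for rate in (48000, 32000, 22050):
    up, down = resample_factors(rate)
    print(f"{rate} Hz -> {TARGET_HZ} Hz: upsample by {up}, downsample by {down}")
```

For a 48 kHz recording this gives 147/160, which is why converting from 48 kHz is more work than simply dropping samples.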

What do the specifications mean?

The following is technical and explains only the most prominent parameters mentioned above. For other terms, you might want to visit the Glossary.

File Format

In the early days of PC-based multimedia, computer makers were happy if their systems could store and play back audio data at all. Nobody thought about exchanging audio with other users. This is why every system that has been around for more than 10 years has its own audio format: for Sun it's .au, for Microsoft .wav; Macs have their own, SGI had one, and probably many other companies did too. However, they all contained PCM data plus some additional information, mainly the recording parameters. So for PCM audio data, most of these formats have been made obsolete by Microsoft's .wav.
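To see that a .wav file really is PCM data plus a small description of the recording parameters, the sketch below writes one second of 16-bit mono 44.1 kHz silence with Python's standard `wave` module and then reads the parameters back out of the file's `fmt ` chunk by hand. It assumes a plain PCM WAV file; `read_fmt_chunk` is a name made up for this example.

```python
import io
import struct
import wave


def read_fmt_chunk(data: bytes) -> dict:
    """Return the recording parameters stored in a WAV file's 'fmt ' chunk."""
    riff, _size, wave_id = struct.unpack("<4sI4s", data[:12])
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack("<4sI", data[pos:pos + 8])
        if chunk_id == b"fmt ":
            (fmt_tag, channels, rate, byte_rate,
             block_align, bits) = struct.unpack("<HHIIHH", data[pos + 8:pos + 24])
            return {"format_tag": fmt_tag, "channels": channels,
                    "sample_rate": rate, "byte_rate": byte_rate,
                    "block_align": block_align, "bits_per_sample": bits}
        pos += 8 + chunk_size + (chunk_size & 1)  # chunks are padded to even length
    raise ValueError("no 'fmt ' chunk found")


# Write one second of silence as 16-bit mono 44.1 kHz PCM into memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)        # 2 bytes = 16 bit
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 44100)

params = read_fmt_chunk(buf.getvalue())
print(params)  # format_tag 1 means uncompressed PCM
```

Everything after the header is just the samples themselves, which is why the various PCM formats were easy to replace with one another.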

For compressed audio, however, the representation of the audio data may change significantly. This creates the need for a separate file format for data compressed with each particular technique, such as MP3, AAC, Ogg Vorbis or RealMedia.


We can hear because sound waves enter the ear and are transmitted to the brain. During that whole process the wave is analogue, or in other words, continuous: at every possible point in time the wave has a certain strength, and that strength can take any value. The number of possible time-strength pairs is therefore infinite (really!).

However, a computer cannot handle infinity, so the wave has to be digitized. During that process, sample points in time are chosen and the strength of the wave is measured at each of them. A chip called an A/D (analogue-to-digital) converter is responsible for that, and a sound card usually does little more than that. This process has a name: Pulse Code Modulation, or PCM.
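The sampling step can be sketched in a few lines. This assumes a pure 440 Hz sine as the "continuous" wave (a hypothetical test tone), measures it at 44,100 evenly spaced instants per second and rounds each measurement to a signed 16-bit integer; the resulting list of integers is PCM data.

```python
import math
import struct

SAMPLE_RATE_HZ = 44100   # samples per second
TONE_HZ = 440.0          # hypothetical test tone (concert A)
MAX_16BIT = 32767        # largest value a signed 16-bit sample can hold


def sample_sine(n_samples: int, amplitude: float = 0.5) -> list[int]:
    """Measure a continuous sine wave at discrete instants and round each
    measurement to a signed 16-bit integer, i.e. produce PCM samples."""
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE_HZ  # time of the n-th sample, in seconds
        strength = amplitude * math.sin(2 * math.pi * TONE_HZ * t)  # in [-1, 1]
        samples.append(round(strength * MAX_16BIT))  # store as an integer
    return samples


pcm = sample_sine(5)
raw = struct.pack(f"<{len(pcm)}h", *pcm)  # little-endian 16-bit, as in .wav files
print(pcm)
print(len(raw), "bytes")
```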

All audio that enters a computer or another digital system is initially represented as PCM data. But since PCM data takes up a lot of storage space, more intelligent ways of storing that information have been invented, among them MP3 and the GSM compression used in mobile phones. They basically rely on one of two principles:

  1. The human ear does not hear everything that is actually contained in the PCM data. MP3, Ogg Vorbis and AAC are all based on this property.
  2. PCM data is highly redundant: each value can nearly be calculated if the preceding values are known. GSM and the ADPCM format found on Windows PCs use this property.
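Principle 2 can be illustrated with the simplest possible predictor: assume each sample equals the previous one and store only the difference. This plain delta coding is not ADPCM or GSM, which use better predictors and also quantize what they store, but it shows why redundancy helps: for a smooth, hypothetical 200 Hz tone the differences are much smaller than the samples themselves, so they need fewer bits, and nothing is lost when decoding.

```python
import math


def delta_encode(samples: list[int]) -> list[int]:
    """Store the first sample, then only the change from the previous one."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Undo delta_encode by adding the differences back up."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out


# A smooth 200 Hz tone sampled at 44.1 kHz: neighbouring samples are close.
tone = [round(16000 * math.sin(2 * math.pi * 200 * n / 44100)) for n in range(44100)]
deltas = delta_encode(tone)

print("largest sample:", max(abs(s) for s in tone))
print("largest delta: ", max(abs(d) for d in deltas[1:]))
```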

So the compression algorithms work out which parts of the PCM data they can leave out without audibly changing the sound at playback time. To do so, they usually need a lot of mathematically and computationally complex transformations, and often intermediate results have to be stored so the audio can be reconstructed at playback.

Sample Frequency

This determines the highest frequency that can be recorded. If it is set too low, your recordings will sound dull and have a metallic feel to them.

The sample frequency is the number of measurements per second taken by the A/D converter. The unit hertz (Hz) means 1/second, so a sample frequency of 44.1 kHz means that 44,100 samples are taken every second.

There is a useful theorem, the Nyquist-Shannon sampling theorem, that states the following:

If the highest tone in a piece of audio is known, and the sample frequency is more than twice that highest tone, then the original wave can be reconstructed exactly from the digital data.

In other words, with a sample frequency of 44.1 kHz, the highest frequency that can be recorded without loss of quality is just under 22,050 Hz, half the sample frequency. The highest frequency a human can perceive depends on age and is estimated at around 18 kHz for very young people and around 12 kHz for elderly people without any explicit hearing damage. The highest tone in the human voice is estimated at around 8 kHz, so a sample frequency of 16 kHz would generally suffice for the voice alone. However, a recording also contains sounds besides the voice itself that lie above 8 kHz and make it sound better.
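In formula form, the condition is fs > 2 × fmax, so half the sample frequency (22,050 Hz at 44.1 kHz) is the limit. The sketch below shows what happens to a tone above that limit when it is sampled anyway: its samples are indistinguishable from those of a lower, false tone (this is called aliasing). It assumes pure tones and ideal sampling; real A/D converters normally filter such tones out before sampling, and the function names are made up for this example.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency the sample rate can represent: half the rate."""
    return sample_rate_hz / 2


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears at after sampling (frequency folding)."""
    return abs(tone_hz - round(tone_hz / sample_rate_hz) * sample_rate_hz)


def sampled(tone_hz: float, sample_rate_hz: float, n: int) -> list[float]:
    """Values of cos(2*pi*f*t) at the first n sample instants."""
    return [math.cos(2 * math.pi * tone_hz * k / sample_rate_hz) for k in range(n)]


fs = 44100
print(nyquist_hz(fs), nyquist_hz(16000))   # 22050.0 8000.0
print(alias_hz(30000, fs))                 # 14100: a 30 kHz tone folds down
# The samples of the 30 kHz tone and of a 14.1 kHz tone are indistinguishable:
same = all(math.isclose(a, b, abs_tol=1e-6)
           for a, b in zip(sampled(30000, fs, 100), sampled(14100, fs, 100)))
print(same)
```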

Sample Size

This parameter determines the dynamic range of your recording. The higher the number, the livelier your recording will sound: you will be able to hear subtler changes in the volume of your reading, such as whispering and shouting.

This parameter determines the accuracy of each measurement (or sample). The bigger the number, the better the quality. Rounding the actual measurements to one of the values the computer can represent is called quantization, and the noise introduced by doing so is called quantization error.
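Two quick numbers make this concrete. Using the common rule of thumb that each bit adds about 6 dB of dynamic range (20 × log10(2^bits)), 16 bit gives roughly 96 dB. The sketch also rounds an arbitrary value to 8 and 16 bits to show that the quantization error never exceeds half a step. It treats samples as numbers between -1 and 1 and ignores clipping at full scale; the function names are made up for this example.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)


def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1) to the nearest of 2**bits evenly spaced levels.
    Clipping at full scale is ignored in this sketch."""
    step = 2 / 2 ** bits  # spacing between representable values
    return round(x / step) * step


for bits in (8, 16, 24):
    print(f"{bits:2d} bit: about {dynamic_range_db(bits):.0f} dB")

x = 0.123456789  # an arbitrary "true" measurement
for bits in (8, 16):
    err = abs(quantize(x, bits) - x)  # quantization error
    print(f"{bits:2d} bit: error {err:.2e} (at most half a step, {1 / 2 ** bits:.2e})")
```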

The standard for CDs is a sample size of 16 bit. This is sufficient to store audio at a very high quality if the whole recording process has been properly balanced. If you plan to do audio manipulation such as amplification, normalizing or compressing, you should use the highest sample size your hardware and software support. Once all editing is done, you can lower the sample size again.
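Why a higher sample size helps while editing can be shown with an exaggerated, hypothetical edit: turn the volume down to 1 % and then back up again. In this sketch, Python floats stand in for a high-resolution working format (such as a 32-bit float setting, if your editor offers one), and 16-bit integers stand in for saving the intermediate result at CD sample size. Rounding after each step throws information away; rounding once at the end does not.

```python
import math


def to_int16(x: float) -> int:
    """Round to the nearest integer and clip to the signed 16-bit range."""
    return max(-32768, min(32767, round(x)))


# A hypothetical recording: 1000 samples of a 440 Hz tone at 16 bit.
original = [to_int16(10000 * math.sin(2 * math.pi * 440 * n / 44100))
            for n in range(1000)]

# Edit: turn the volume down to 1 %, then back up 100 times.
# (a) Store the intermediate result as 16-bit integers (rounding twice).
low_res = [to_int16(to_int16(s * 0.01) * 100) for s in original]
# (b) Keep the intermediate result at higher precision, round once at the end.
high_res = [to_int16(s * 0.01 * 100) for s in original]

err_low = max(abs(a - b) for a, b in zip(low_res, original))
err_high = max(abs(a - b) for a, b in zip(high_res, original))
print("max error, 16-bit intermediate:", err_low)
print("max error, high-precision intermediate:", err_high)
```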

Further information (WWW Links)

Primer on PC Audio

The Primer on PC Audio (by High Criteria, the producer of Total Recorder) covers much of the same material as the Short Course above, but in written form. It also discusses compression formats, such as MP3.

Note that the primer covers material (such as Transfer of Audio from LPs and Cassettes to CDs) which is not relevant to LibriVox.