Tech Specs

From Librivox wiki
Revision as of 18:35, 20 May 2009 by Jc

Quick Facts!

Please submit final contributions with the following parameters.

  • Sample Frequency: 44.1 kHz (44,100 Hz)
  • File format: MP3
  • Bit rate: 128 kbps constant bit rate
  • Recording Mode: Mono (make sure the completed file is audible in BOTH EARS when heard through headphones)
  • Sample size: 16 bit
  • ID3 Tag: ID3v2. Please fill in only the Name, Artist and Album tags (as specified in the project's top post). All other necessary tags (like Genre) will be filled in during cataloging.

If you don't know how to do that, don't despair. Check out the sections about editing (Audacity, GarageBand) for more information, post your file to the Listeners and Editors wanted forum, or tell the book coordinator that you are unsure so s/he can check. In most cases the files will be fine! :-)

(Record a test if you're uncertain.) LibriVox is about books, not about technical specifications.

Why do we require specific technical settings?

The list of parameters when recording audio and subsequently converting it to MP3 is long. The list of possible combinations is even longer. Although all these techniques and file formats are standardised, not every piece of software or hardware works well with every possible combination. The values given above form a very common combination that should "just work (TM)" with every player out there. So by adhering to the standards above, you make life easier for the coordinators.

They also give CD quality while recording and, after compression, a quality that in most cases is hardly distinguishable from a CD.
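To put rough numbers on the settings above, the following sketch compares the data rate of the uncompressed mono recording with the 128 kbps MP3. It assumes that "128 kbps" means 128,000 bits per second and ignores file headers and tags; the function names are made up for this illustration.

```python
# Rough size arithmetic for the recommended settings (illustration only).
SAMPLE_RATE_HZ = 44_100      # samples per second
SAMPLE_SIZE_BITS = 16        # bits per sample
CHANNELS = 1                 # mono
MP3_BIT_RATE_BPS = 128_000   # assumption: 128 kbps = 128,000 bits per second


def pcm_bit_rate(sample_rate_hz, sample_size_bits, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * sample_size_bits * channels


def bytes_per_minute(bit_rate_bps):
    """Bytes needed to store one minute of audio at the given bit rate."""
    return bit_rate_bps * 60 // 8


pcm_bps = pcm_bit_rate(SAMPLE_RATE_HZ, SAMPLE_SIZE_BITS, CHANNELS)
pcm_minute = bytes_per_minute(pcm_bps)
mp3_minute = bytes_per_minute(MP3_BIT_RATE_BPS)

print(pcm_bps)                  # 705600 bits per second before compression
print(pcm_minute)               # 5292000 bytes per minute before compression
print(mp3_minute)               # 960000 bytes per minute as MP3
print(pcm_minute / mp3_minute)  # the MP3 is roughly 5.5 times smaller
```

So one minute of reading takes a little under 1 MB as an MP3, compared with a bit over 5 MB for the uncompressed mono recording.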

But using other sample rates than 44.1 kHz is better suited to voice recording, isn't it?

There has been a lot of discussion, since there are tempting reasons to use other sample rates. Therefore the complete official version is the following:

  1. 44.1 kHz is the default sample rate in lots of software. We need files at this rate because this is what our host uses for its Flash player.
  2. Please don't use any rate other than 44,100 Hz. If you accidentally send us something at another rate, our catalogers will resample it for you and advise you on how to record at a sample rate we can use. Don't worry, your work will never be wasted!

What do the specifications mean?

The following will be technical and only explain the prominent parameters mentioned above. For questions on other terms you might want to visit the Glossary.

File Format

In the early days of PC-based multimedia, computer makers were happy if their system could store and play back audio data at all. Nobody thought about exchanging audio with other users. This is why every system that has been around for more than 10 years has its own audio format: Sun calls its format .au, Microsoft's is .wav, Macs have their own, SGI had one, and there were probably many more. However, they all contained PCM data with some additional information, mainly the recording parameters. As a result, most of these formats have been made obsolete by Microsoft's .wav for PCM audio data.

Compressed audio, however, may represent the data very differently. Each compression technique, such as MP3, AAC, Ogg Vorbis or RealMedia, therefore needs its own file format to hold the data it produces.
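As an illustration of "PCM data plus the recording parameters", here is a minimal sketch using the `wave` module from Python's standard library. It writes one second of silence to an in-memory .wav file and reads the parameters back from the header. The helper names are invented for this example, and nothing here is part of the LibriVox workflow.

```python
import io
import wave

SAMPLE_RATE_HZ = 44_100
SAMPLE_WIDTH_BYTES = 2   # 16 bit
CHANNELS = 1             # mono


def write_silent_wav(seconds):
    """Return the bytes of a .wav file holding `seconds` of silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_out:
        # These three calls fill in the recording parameters in the header.
        wav_out.setnchannels(CHANNELS)
        wav_out.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_out.setframerate(SAMPLE_RATE_HZ)
        frame_count = int(SAMPLE_RATE_HZ * seconds)
        # The PCM data itself: one 16 bit zero sample per frame.
        wav_out.writeframes(b"\x00" * (frame_count * SAMPLE_WIDTH_BYTES * CHANNELS))
    return buffer.getvalue()


def read_wav_parameters(wav_bytes):
    """Read the recording parameters back out of the .wav header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_in:
        return {
            "channels": wav_in.getnchannels(),
            "sample_size_bits": wav_in.getsampwidth() * 8,
            "sample_rate_hz": wav_in.getframerate(),
            "frames": wav_in.getnframes(),
        }


wav_bytes = write_silent_wav(1)
params = read_wav_parameters(wav_bytes)
print(params)
```

The same `read_wav_parameters` idea can be pointed at a real .wav export to check that it is mono, 16 bit and 44,100 Hz before converting it to MP3.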

Sample Frequency

This determines the highest frequency that can be recorded. If set too low, your recordings will sound dull and have a metallic feeling to them.

The sample frequency is the number of measurements per second taken by the A/D converter. The unit hertz (Hz) is the same as 1/second. A sample frequency of 44.1 kHz therefore means that 44,100 samples are taken in one second.
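The following sketch mimics what the A/D converter does, using a computed sine wave instead of a microphone signal. The 440 Hz test tone and the function name are arbitrary choices for this illustration.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, duration_s):
    """Measure a sine wave `sample_rate_hz` times per second."""
    sample_count = int(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(sample_count)
    ]


samples = sample_sine(440, 44_100, 1.0)
print(len(samples))   # 44100 measurements for one second of audio
print(1 / 44_100)     # time between two measurements, in seconds
```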

There is a nice law (the Nyquist-Shannon sampling theorem) that states the following:

If the highest tone found in a piece of audio is known, and the sample frequency is more than twice as high as that highest tone, it is possible to exactly reconstruct the original wave from the digital data.

Put another way, with a sample frequency of 44.1 kHz the highest frequency that can be recorded without a loss of quality is 22,050 Hz. The highest frequency a human can perceive depends on age and is estimated at around 18 kHz for very young people and at around 12 kHz for elderly people without any explicit hearing damage. The highest tone in the human voice is estimated at around 8 kHz. With that information, a sample frequency of 16 kHz would generally suffice. However, a recording also contains sounds besides the voice that lie above 8 kHz and make it sound better.
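The arithmetic behind this can be sketched as follows. The second function shows what happens to a pure tone above the limit: instead of disappearing, it "folds back" and shows up at a lower frequency (aliasing). This is a simplified model of an ideal sampler; real recording hardware normally filters such tones out beforehand. Both function names are made up for this illustration.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be captured at this sample rate."""
    return sample_rate_hz / 2


def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after sampling.

    Tones up to half the sample rate are unchanged; higher tones fold back.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_frequency(44_100))           # 22050.0
print(nyquist_frequency(16_000))           # 8000.0
print(apparent_frequency(8_000, 44_100))   # 8000, recorded as it is
print(apparent_frequency(10_000, 16_000))  # 6000, a 10 kHz tone is misrecorded
```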

Sample Size

This parameter determines the dynamic range of your recording. The higher the number, the more lively your recording will sound. You will be able to hear more subtle changes in the volume of your reading, like whispering and shouting.

This parameter determines the accuracy of each measurement (or sample). The bigger the number, the better the quality. The process of rounding the actual measurements to one of the values representable in the computer is called quantization, and the noise introduced by doing so is called quantization error.

The standard for CDs is a sample size of 16 bit. This is sufficient to store audio at a very high quality, provided the whole recording process has been properly balanced. If you plan to do audio manipulation such as amplification, normalizing or compressing, you should use the highest sample size available from your hardware and software. After all editing is done, you can lower the sample size again.
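The quantization described above can be sketched in a few lines. The scaling used here (samples between -1.0 and 1.0 mapped to signed integers) is one common convention and an assumption of this example, as are the function names. The dynamic range figure uses the usual approximation of about 6 dB per bit.

```python
import math


def quantize(value, bits):
    """Round a sample in [-1.0, 1.0] to the nearest integer level."""
    max_level = 2 ** (bits - 1) - 1   # 32767 for 16 bit
    clipped = max(-1.0, min(1.0, value))
    return round(clipped * max_level)


def dequantize(level, bits):
    """Turn an integer level back into a value in [-1.0, 1.0]."""
    return level / (2 ** (bits - 1) - 1)


def dynamic_range_db(bits):
    """Approximate ratio between the loudest and quietest level, in dB."""
    return 20 * math.log10(2 ** bits)


original = 0.3
for bits in (8, 16):
    level = quantize(original, bits)
    error = abs(original - dequantize(level, bits))
    print(bits, level, error, round(dynamic_range_db(bits), 1))
```

Running it shows that the rounding error at 16 bit is far smaller than at 8 bit, and that 16 bit corresponds to a dynamic range of roughly 96 dB.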

Further information (WWW Links)

A Short Course in Digital Audio

A Short Course in Digital Audio (by Syntrillium Software Corp., the originator of Cool Edit) is a narrated animation which introduces the concepts of analog audio (sound waves and their graphical representation, conversion to an electrical signal by a microphone, and analog recording and playback) and digital audio (digitizing or sampling, Wave files, and PC sound cards).

It is a program which runs under MS Windows (attachment: a_short_course_in_digital_audio__narrated_animation.exe); the file size is 4.4 MBytes.

Note that the hyperlinks and references within the course to the Cool Edit and Syntrillium Web sites are obsolete (since Cool Edit is now Adobe Audition), and that the brief discussion of MIDI files is not relevant to LibriVox.

Primer on PC Audio

The Primer on PC Audio (by High Criteria, the producer of Total Recorder) covers much of the same material as the Short Course above, but in written form. It also discusses compression formats, such as MP3.

Note that the primer covers material (such as Transfer of Audio from LPs and Cassettes to CDs) which is not relevant to LibriVox.

Guide for audiobook narrators

Free online book for new narrators at Read By The Author dot com

It also goes into the issues involved in editing and doing retakes.