THE COST OF LIVING - DOES 96KHZ MAKE SENSE? - Yamaha

The ‘96K’ debate may already have begun when CBS started selling Billy Joel’s 52nd Street on Compact Disc (CD) in 1982. A few years earlier, Philips and Sony, in a rare collaborative mood, had decided that the CD standard would use 16-bit words to represent digital audio and 44,100 samples to represent one second of audio: a compromise between audio quality and storage capacity.
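For a sense of the storage side of that compromise, the sketch below works out the raw data rate of CD audio from the two figures above and the two channels (stereo) of the CD format. It counts audio samples only and ignores the error-correction and subcode overhead a real disc also carries; the helper name `pcm_bit_rate` is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100   # samples per second, CD standard
BITS_PER_SAMPLE = 16      # word length, CD standard
CHANNELS = 2              # CD audio is stereo


def pcm_bit_rate(fs_hz: int, bits: int, channels: int) -> int:
    """Raw linear PCM data rate in bits per second (audio samples only)."""
    return fs_hz * bits * channels


bit_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
bytes_per_minute = bit_rate // 8 * 60  # 8 bits per byte, 60 seconds per minute
print(f"CD audio: {bit_rate:,} bit/s, about {bytes_per_minute / 1e6:.1f} MB per minute")
```

Run as is, it reports about 1.41 million bit/s, roughly 10.6 MB per minute of stereo audio.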

In 1982, A/D and D/A convertors relied on analogue anti-aliasing and reconstruction filters. These filters had to be extremely steep, applying a brick-wall slope between 20kHz and the Nyquist frequency of 22.05kHz (half the 44.1kHz sampling rate), and they introduced artifacts that were difficult to hide, which already started the debate on using a higher sample rate. The issue was soon resolved by oversampling and digital filter techniques.
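The steepness problem can be put in numbers. The sketch below assumes, as the text does, that the filter must pass everything up to 20kHz and block everything from the Nyquist frequency upwards, and compares the transition band available at the CD rate with the one an analogue filter gets when the converter oversamples four times. The 4x factor and the function names are illustrative assumptions, not details of any particular converter.

```python
PASSBAND_EDGE_HZ = 20_000  # top of the audio band that must pass untouched


def nyquist_hz(fs_hz: float) -> float:
    """Highest frequency a system sampling at fs_hz can represent: fs / 2."""
    return fs_hz / 2


def transition_band_hz(fs_hz: float) -> float:
    """Room the analogue filter has to roll off between 20 kHz and Nyquist."""
    return nyquist_hz(fs_hz) - PASSBAND_EDGE_HZ


CD_RATE_HZ = 44_100
OVERSAMPLING = 4  # illustrative factor; real converters use various ratios

for label, fs in [("CD rate", CD_RATE_HZ),
                  (f"{OVERSAMPLING}x oversampled", OVERSAMPLING * CD_RATE_HZ)]:
    print(f"{label:16s} Nyquist {nyquist_hz(fs) / 1000:6.2f} kHz, "
          f"transition band {transition_band_hz(fs) / 1000:6.2f} kHz")
```

With 4x oversampling the analogue filter gets a transition band more than thirty times wider (68.2kHz instead of 2.05kHz); the steep part of the job moves into a digital filter, which can be made very sharp without depending on analogue component tolerances.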

Frequency range.

Most broadcast and live systems today use a sampling frequency of 48kHz, which is slightly less demanding on the anti-aliasing filter than 44.1kHz because the gap between 20kHz and the 24kHz Nyquist frequency is wider. Raising the sampling frequency to 96kHz extends the audio reproduction range from 20kHz to about 40kHz. However, human hearing really ends at around 20kHz: the highest-tuned hair cells in the cochlea respond to about that frequency with a very narrow bandwidth, and no hair cells respond to higher frequencies. So why do we need 96kHz?
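One way to see what the extra bandwidth buys is to ask how much of the representable band (zero up to the Nyquist frequency, half the sampling rate) lies above the roughly 20kHz limit of hearing. The sketch assumes a hard 20kHz limit, which is a simplification; real converters also leave some filter headroom below Nyquist, which is why the text quotes about 40kHz rather than 48kHz for a 96kHz system. `HEARING_LIMIT_HZ` and `share_above_hearing` are illustrative names.

```python
HEARING_LIMIT_HZ = 20_000  # approximate upper limit of human hearing, per the text


def nyquist_hz(fs_hz: float) -> float:
    """Highest representable frequency: half the sampling rate."""
    return fs_hz / 2


def share_above_hearing(fs_hz: float) -> float:
    """Fraction of the band 0..fs/2 that lies above the hearing limit."""
    ny = nyquist_hz(fs_hz)
    return max(0.0, ny - HEARING_LIMIT_HZ) / ny


for fs in (48_000, 96_000, 192_000):
    print(f"{fs / 1000:5.0f} kHz: Nyquist {nyquist_hz(fs) / 1000:4.0f} kHz, "
          f"{share_above_hearing(fs):.0%} of the band above 20 kHz")
```

At 48kHz roughly a sixth of the band lies above 20kHz; at 96kHz it is more than half, and at 192kHz nearly four fifths.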

Timing resolution.

The answer is timing resolution. For continuous signals, a 48kHz system can reproduce time relationships very accurately. But at the start and end of audio signals, the timing resolution breaks down to about 21 microseconds, the reciprocal of the sampling rate (1/48,000 of a second is about 20.8 microseconds). The challenge is that the human hearing system can detect time differences smaller than 21 microseconds (known as the dichotic difference). For example, experienced listeners in ideal listening conditions have been reported to localise sounds with an accuracy of better than five degrees. Since localisation partly depends on the difference in arrival time between the left and right ears, an average head size of 20cm (with sound travelling at about 343 m/s) puts the time difference between left and right for a sound coming from the side (90 degrees) at about half a millisecond. Scaling linearly, each degree then accounts for about six microseconds.
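The arithmetic in this paragraph can be checked with a few lines. The sketch assumes a speed of sound of 343 m/s (air at about 20 °C) and, like the text, treats the path between the ears as a straight 20cm line and scales the side-on difference linearly with angle; real heads diffract sound around them, so this is a rough model. It then compares the result with the sample period at 48, 96 and 192kHz. The function names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius
HEAD_WIDTH_M = 0.20         # average head size used in the text
US = 1e6                    # seconds -> microseconds


def side_itd_s(head_width_m: float = HEAD_WIDTH_M,
               c_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Interaural time difference for a source at 90 degrees (straight path)."""
    return head_width_m / c_m_s


def itd_per_degree_s() -> float:
    """Linear approximation: spread the 90-degree difference evenly over angle."""
    return side_itd_s() / 90


def sample_period_s(fs_hz: float) -> float:
    """Time between samples: the reciprocal of the sampling rate."""
    return 1 / fs_hz


print(f"ITD at 90 degrees: {side_itd_s() * US:.0f} microseconds")
print(f"ITD per degree:    {itd_per_degree_s() * US:.1f} microseconds")
for fs in (48_000, 96_000, 192_000):
    print(f"{fs / 1000:.0f} kHz sample period: {sample_period_s(fs) * US:.1f} microseconds")
```

With these assumptions the side-on difference comes out at roughly 0.58 milliseconds, in line with the half-millisecond figure in the text, or about 6.5 microseconds per degree; of the three rates, only the 192kHz sample period (about 5.2 microseconds) is shorter than that.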

A sampling rate of 192kHz, with a sample period of about 5.2 microseconds, would be enough to capture this resolution. However, most listening is done in less controlled circumstances: a home living room, a hotel room, a car, or a rock concert with a large PA system and thousands of fellow listeners. In those cases, the 21-microsecond resolution of 48kHz may be more than enough. Only in controlled listening situations, with perfect acoustics, perfect speakers and a single listener in the perfect sweet spot, is 96kHz worth going for.
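The mapping from a required timing resolution to a sampling rate can be written as a small lookup. This sketch assumes the text's simplified model, in which timing resolution equals the sample period 1/fs, and picks the lowest common rate whose period is no longer than the target; the list of rates and the function name are assumptions for illustration.

```python
from typing import Optional

COMMON_RATES_HZ = (44_100, 48_000, 96_000, 192_000)  # checked in ascending order


def lowest_rate_for(resolution_s: float) -> Optional[int]:
    """Lowest common sampling rate whose sample period 1/fs is at most resolution_s.

    Returns None if even the highest rate in the list is too coarse.
    """
    for fs in COMMON_RATES_HZ:
        if 1 / fs <= resolution_s:
            return fs
    return None


# Targets taken from the text's figures: about 21 microseconds (the 48kHz period)
# and about 6 microseconds (the figure the text associates with ideal conditions).
for target_us in (21, 6):
    print(f"{target_us} microseconds -> {lowest_rate_for(target_us * 1e-6)} Hz")
```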

Cost.

Going up from 48kHz to 96kHz brings three significant cost factors. First, the system’s DSP capacity has to be doubled (quadrupled if you use convolution, since a filter of the same duration needs twice as many taps, computed twice as often), which means more and/or more powerful, and therefore more expensive, DSP chip sets, backbone structure and memory. Second, audio transport (infrastructure) capacity also has to be doubled to carry twice as many samples per second. With most Gigabit audio networks that is no problem: the network can easily cope with the extra data. But if conventional audio transport is used, the doubled sampling rate is paid for by halving the channel count; for example, an AES10 (MADI) cable drops from 64 to 32 channels. Finally, for recording, hard disk capacity needs to be doubled, since twice the amount of data has to be stored. All together, a big cost increase. Oh, and if you go to 192kHz, everything doubles again...
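The three cost factors scale in a simple way with the sampling rate, and the sketch below tabulates them relative to 48kHz. It assumes that plain per-sample processing scales linearly, that direct FIR convolution with an impulse response of fixed duration scales with the square of the rate, that a MADI link behaves as a fixed pool of sample slots (64 channels at 48kHz, as in the text), and that recordings use 24-bit samples. The 24-bit word length and all names are assumptions, not figures from the text.

```python
BASE_FS_HZ = 48_000
MADI_CHANNELS_AT_BASE = 64  # AES10 (MADI) channel count at 48 kHz, from the text
BITS_PER_SAMPLE = 24        # assumption: typical recording word length


def rate_ratio(fs_hz: int) -> float:
    return fs_hz / BASE_FS_HZ


def relative_dsp_load(fs_hz: int, convolution: bool = False) -> float:
    """DSP load relative to 48 kHz.

    Per-sample processing: proportional to fs.
    Direct FIR convolution of a fixed-duration impulse response: proportional
    to fs squared (twice the taps, evaluated twice as often, when fs doubles).
    """
    r = rate_ratio(fs_hz)
    return r * r if convolution else r


def madi_channels(fs_hz: int) -> int:
    """Channels on a link modelled as a fixed pool of sample slots."""
    return int(MADI_CHANNELS_AT_BASE / rate_ratio(fs_hz))


def storage_mb_per_channel_hour(fs_hz: int) -> float:
    """Uncompressed recording size for one channel over one hour, in megabytes."""
    return fs_hz * BITS_PER_SAMPLE / 8 * 3600 / 1e6


for fs in (48_000, 96_000, 192_000):
    print(f"{fs // 1000:3d} kHz: DSP x{relative_dsp_load(fs):g}, "
          f"convolution x{relative_dsp_load(fs, convolution=True):g}, "
          f"MADI {madi_channels(fs)} ch, "
          f"{storage_mb_per_channel_hour(fs):.1f} MB per channel-hour")
```

The slot-pool model is a simplification of how MADI actually frames higher rates, but it reproduces the 64-to-32 channel figure quoted above, and the table makes the closing remark concrete: at 192kHz the linear factors double again while convolution quadruples again.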

Rule of thumb.

As a rule of thumb, a sampling rate of 48kHz is more than enough for very high-quality live and domestic audio system performance. That is why most live mixers run at 48kHz, and why we still play CDs at 44.1kHz without many complaints. But if the production is made for active listening in a controlled environment, live or through a recording, then 96kHz may be an option, though it comes at a high cost.
